FilterPagesLuaScript
The path of a Lua script that contains a custom function to use for deciding whether to ingest pages.
NOTE: This feature is available only if your Web Connector license includes dynamic corpus functionality.
The script must contain a function named shouldIngestPage that returns true to ingest the page or false to ignore it. You can optionally return a list of links to override the links that the connector extracted from the page. For example, to ingest the page but not follow any of the links on it, return true together with an empty list.
The function should look like this:
    function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
        -- do something to decide return value...

        -- to ingest the page and follow links extracted by the connector
        return true

        -- to ingest the page but not follow any links
        return true, {}

        -- to ignore the page
        return false
    end
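The link-override return value described above can be sketched as follows. This is a minimal illustration, not the connector's own example: the ALLOWED_PREFIX value and the startsWith helper are hypothetical, and the code assumes that links arrives as an ordinary Lua array of strings.

```lua
-- Hypothetical prefix: only links under this path are followed.
local ALLOWED_PREFIX = "http://www.example.com/docs/"

-- Illustrative helper (not part of the connector): true if s begins with prefix.
local function startsWith(s, prefix)
    return s:sub(1, #prefix) == prefix
end

function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
    -- Build a replacement list that keeps only the links under the allowed prefix.
    local kept = {}
    for _, link in ipairs(links) do
        if startsWith(link, ALLOWED_PREFIX) then
            kept[#kept + 1] = link
        end
    end

    -- Ingest the page, but follow only the links that passed the filter.
    return true, kept
end
```

Returning the filtered list as the second value replaces the links that the connector extracted, so links outside the prefix are not followed from this page.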
The arguments supplied to the function are:
Argument | Type | Description |
---|---|---|
url | string | The page URL. |
contentType | string | The MIME content type. |
contentFilename | string | The path to the file that contains the page content. |
textContentFilename | string | The path to the file that contains the text that was extracted from the page (or nil if text could not be extracted). |
links | list of strings | The links that were extracted from the page. |
depth | integer | The page depth (the number of links that were followed from the starting point in order to reach the page). |
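To show how these arguments might be combined, here is a hedged sketch that ingests only HTML pages within a depth limit whose extracted text mentions a keyword. MAX_DEPTH and KEYWORD are hypothetical values chosen for illustration, and the code assumes that the standard Lua io library is available to scripts run by the connector.

```lua
local MAX_DEPTH = 3          -- hypothetical depth limit
local KEYWORD = "connector"  -- hypothetical keyword to look for (lowercase)

function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
    -- Ignore pages that are too many links away from the starting point.
    if depth > MAX_DEPTH then
        return false
    end

    -- Ignore non-HTML content. A plain-text search is used because the
    -- content type can carry a suffix such as "; charset=utf-8".
    if not contentType or not contentType:lower():find("text/html", 1, true) then
        return false
    end

    -- textContentFilename is nil when no text could be extracted.
    if textContentFilename == nil then
        return false
    end

    -- Read the extracted text; ignore the page if the file cannot be opened.
    local f = io.open(textContentFilename, "r")
    if not f then
        return false
    end
    local text = f:read("*a")
    f:close()

    -- Ingest the page only if the keyword appears (case-insensitive, plain match).
    if text:lower():find(KEYWORD, 1, true) then
        return true
    end
    return false
end
```

Because this sketch returns only one value when it ingests a page, the links that the connector extracted are followed unchanged.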
An example script, FilterPages_binarycat.lua, is included with the connector. This script decides whether to ingest a page by calling the IDOL Category component and running the action BinaryCatQuery.
Type: | String (file path) |
Default: | |
Required: | No |
Configuration Section: | TaskName or FetchTasks |
Example: | FilterPagesLuaScript=MyCustomPageSelection.lua |
See Also: |