Choose the Content to Index with a Lua Script
Web Connector supports dynamic corpus functionality. This means that you can use IDOL analytics such as categorization to decide whether to ingest content. You can also use this feature to filter the links that are extracted from a page. The connector runs a Lua script to decide whether to ingest the page and which links to follow, so you can also implement a custom algorithm for deciding which pages to index.
NOTE: This feature is available only if your Web Connector license includes dynamic corpus functionality.
The script must contain a function named shouldIngestPage that returns true to ingest the page or false to ignore it. You can optionally return a second value, a list of links, to override the links that the connector extracted from the page. For example, to ingest the page but not follow any of its links, return true together with an empty list.
The function should look like this:

    function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
        -- do something to decide return value...

        -- to ingest the page and follow links extracted by the connector
        return true

        -- to ingest the page but not follow any links
        return true, {}

        -- to ignore the page
        return false
    end

The arguments supplied to the function are:
Argument | Type | Description |
---|---|---|
url | string | The page URL. |
contentType | string | The MIME content type. |
contentFilename | string | The path to the file that contains the page content. |
textContentFilename | string | The path to the file that contains the text that was extracted from the page (or nil if text could not be extracted). |
links | list of strings | The links that were extracted from the page. |
depth | integer | The page depth (the number of links that were followed from the starting point in order to reach the page). |
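
As an illustration of how these arguments might be combined, the following sketch ingests only HTML pages, keeps only links under one URL prefix, and stops following links at a fixed depth. The names MAX_DEPTH and ALLOWED_PREFIX and their values are hypothetical, and the sketch assumes that links arrives as an ordinary Lua array of strings; check both assumptions against your connector before relying on it.

```lua
-- Hypothetical policy values, chosen only for illustration.
local MAX_DEPTH = 3
local ALLOWED_PREFIX = "https://www.example.com/docs/"

function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
    -- Ignore anything that is not HTML (the content type may carry a charset suffix).
    if type(contentType) ~= "string"
        or not contentType:lower():find("text/html", 1, true) then
        return false
    end

    -- At the depth limit, ingest the page but follow no further links.
    if depth >= MAX_DEPTH then
        return true, {}
    end

    -- Keep only the links that start with the allowed prefix.
    -- Assumption: links is a plain Lua array of strings.
    local kept = {}
    for _, link in ipairs(links or {}) do
        if link:sub(1, #ALLOWED_PREFIX) == ALLOWED_PREFIX then
            kept[#kept + 1] = link
        end
    end

    return true, kept
end
```

Returning the filtered list as the second value replaces the links that the connector extracted, so links outside the prefix are never followed from this page.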
An example script, FilterPages_binarycat.lua, is included with the connector. This script decides whether to ingest a page by calling the IDOL Category component and running the action BinaryCatQuery.
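
The bundled script is not reproduced here. A custom algorithm does not have to call IDOL at all; the following minimal sketch decides from the extracted text alone by counting keyword occurrences. The keyword list, the threshold, and the helper countOccurrences are hypothetical, and the sketch assumes that the standard Lua io library is available to scripts run by the connector.

```lua
-- Hypothetical keywords and threshold, for illustration only.
local KEYWORDS = { "invoice", "payment" }
local MIN_HITS = 2

-- Count plain-text (non-pattern) occurrences of needle in haystack.
local function countOccurrences(haystack, needle)
    local count, pos = 0, 1
    while true do
        local s, e = haystack:find(needle, pos, true)
        if not s then
            return count
        end
        count = count + 1
        pos = e + 1
    end
end

function shouldIngestPage(url, contentType, contentFilename, textContentFilename, links, depth)
    -- textContentFilename is nil when no text could be extracted.
    if not textContentFilename then
        return false
    end

    -- Assumption: the io library is available in the connector's Lua environment.
    local file = io.open(textContentFilename, "r")
    if not file then
        return false
    end
    local text = file:read("*a"):lower()
    file:close()

    local hits = 0
    for _, keyword in ipairs(KEYWORDS) do
        hits = hits + countOccurrences(text, keyword)
    end

    -- Returning only a boolean keeps the links extracted by the connector.
    return hits >= MIN_HITS
end
```

Because the function returns a single value, the connector continues to follow the links it extracted from any page that is ingested.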
To configure the connector to run your script, set the configuration parameter FilterPagesLuaScript.
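
A configuration fragment might look like the following. The section name MyTask and the script path are hypothetical, and placing the parameter in a fetch task section is an assumption; confirm the correct section and path format in the Web Connector configuration reference.

```
// Hypothetical task section and script path, for illustration only.
[MyTask]
FilterPagesLuaScript=scripts/FilterPages_binarycat.lua
```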