CFS can index documents into multiple indexes. Normally, CFS indexes identical data into every index, but you might want to manipulate documents depending on the index that they are sent to. For example, if you are using Vertica to analyze structured information, you might want to remove the content from the documents indexed into Vertica, but keep the content in documents that are indexed into IDOL.
You cannot use import and index tasks to manipulate documents in this way, because those tasks affect documents sent to all of the indexes. To manipulate the documents sent to a single index, you can run a Lua script during the indexing process.
The script must define a handler function:
function handler(document, operation)
    -- do something, for example
    document:deleteField("UNINTERESTING_FIELD")
    return true
end
The operation argument specifies the documents that you want to run the script on. This argument is a string and can be set to add, update, or remove:
add - manipulate documents that are being added to the index. Ingest-adds are sent when a connector finds new documents in a repository, or when a document's content is changed (the old document is removed, and the new document added).
update - manipulate documents that represent metadata updates.
remove - manipulate documents that represent information deleted from the source repository.
To index the document, the handler function must return true. To discard the document, return false.
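As a minimal sketch of how the operation argument and the return value can work together, the handler below deletes a field from documents that are being added, discards metadata updates, and lets removals through unchanged. The only document method it uses is deleteField, as shown above. The field name and the decision to discard updates are illustrative assumptions, not recommendations.

```lua
-- Illustrative handler; the field name and the policy for each
-- operation are assumptions chosen for this example.
function handler(document, operation)
    if operation == "add" then
        -- New or changed documents: drop a field this index does not need.
        document:deleteField("UNINTERESTING_FIELD")
        return true
    elseif operation == "update" then
        -- Metadata updates: returning false discards the document.
        return false
    end
    -- "remove" (and anything else): index the document unchanged.
    return true
end
```

Because the script is attached to a single index, a document discarded here is still sent to the other indexes.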
To manipulate documents before indexing
In a section of the configuration file specified by the IndexerSections configuration parameter, set the IndexLuaScript parameter. This parameter specifies the path to the script that you want to run. For example:
[Indexing]
IndexerSections=IdolServer,Vertica

[Vertica]
IndexerType=Library
LibraryDirectory=indexerdlls
LibraryName=verticaIndexer
ConnectionString=DSN=VERTICA
TableName=my_flex_table
IndexLuaScript=./scripts/remove_content.lua
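The script named in this example, remove_content.lua, is not shown. A possible sketch follows, matching the Vertica scenario described at the start of this section. It assumes that the document object provides a setContent method that replaces the document content. Check this against the Lua method reference for your CFS version before relying on it.

```lua
-- Hypothetical remove_content.lua.
-- Assumption: document:setContent(text) replaces the document content.
function handler(document, operation)
    if operation == "add" or operation == "update" then
        -- Clear the content so that only the metadata fields are indexed.
        document:setContent("")
    end
    -- Index every document, including removals, into this index.
    return true
end
```

With the configuration above, this script runs only for documents sent to the Vertica indexer, so documents indexed into IDOL keep their content.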