Manipulate Documents Before Indexing

CFS can index documents into multiple indexes. Normally, CFS indexes identical data into every index, but you might want to manipulate documents depending on the index that they are sent to. For example, if you are using Vertica to analyze structured information, you might want to remove the content from the documents indexed into Vertica, but keep the content in documents that are indexed into IDOL.

You cannot use import and index tasks to manipulate documents in this way, because those tasks affect documents sent to all of the indexes. To manipulate the documents sent to a single index, you can run a Lua script during the indexing process.

The script must define a handler function:

    function handler(document, operation)
        -- do something, for example
        document:deleteField("UNINTERESTING_FIELD")
        return true
    end

The operation argument tells the script what kind of indexing operation the document represents, so that the script can act only on the documents you choose. This argument is a string and can be set to add, update, or remove:

  • add - manipulate documents that are being added to the index. Ingest-adds are sent when a connector finds new documents in a repository, or when a document's content is changed (the old document is removed, and the new document added).
  • update - manipulate documents that represent metadata updates.
  • remove - manipulate documents that represent information deleted from the source repository.
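As an illustration of the operation argument, the following sketch deletes a field only from documents that are being added, and indexes everything else unchanged. It assumes that the operation string arrives exactly as listed above, in lower case. The field name is the same placeholder used in the earlier example.

```lua
-- Sketch only: UNINTERESTING_FIELD is a placeholder field name.
-- Assumes the operation string is exactly "add", "update", or "remove".
function handler(document, operation)
    if operation == "add" then
        -- Strip the field from newly added (or re-added) documents only.
        document:deleteField("UNINTERESTING_FIELD")
    end
    -- Index every document, whatever the operation.
    return true
end
```

Documents that represent updates or removals pass through this script untouched.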

To index the document, the handler function must return true. To discard the document, return false.
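The return value can be combined with the operation argument. The sketch below discards documents that represent removals, so that information deleted from the source repository is not removed from this particular index. Whether that is desirable depends on how you use the index; it is shown here only to illustrate returning false, and it assumes that the operation string is passed in lower case.

```lua
-- Sketch only: discards removals, indexes adds and updates.
-- Assumes the operation string is exactly "add", "update", or "remove".
function handler(document, operation)
    if operation == "remove" then
        -- Returning false discards the document for this index.
        return false
    end
    -- Returning true sends the document on to the index.
    return true
end
```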

To manipulate documents before indexing

  1. Open the CFS configuration file.
  2. In a section of the configuration file specified by the IndexerSections configuration parameter, set the IndexLuaScript parameter. This parameter specifies the path to the script that you want to run. For example:

    [Indexing]
    IndexerSections=IdolServer,Vertica

    [Vertica]
    IndexerType=Library
    LibraryDirectory=indexerdlls
    LibraryName=verticaIndexer
    ConnectionString=DSN=VERTICA
    TableName=my_flex_table
    IndexLuaScript=./scripts/remove_content.lua

  3. Save and close the configuration file.