Reject All Invalid Documents

The BadFilesFilter task rejects all documents that are considered to be invalid:

  • Documents that have binary content.
  • Documents for which import errors have occurred.
  • Documents that have too high a proportion of symbolic content.
  • Documents where the average word length is too long or too short.

BadFilesFilter must be configured as a Post task.

BadFilesFilter reads its configuration parameters from the section of the configuration file that you name in the Post parameter, after the colon. In this section you can set parameters for each filter. In the following example, the MinimumAverage and MaximumAverage parameters configure the word length filter:

[ImportTasks]
Post0=BadFilesFilter:BadFilesFilterSettings

[BadFilesFilterSettings]
MinimumAverage=3.0
MaximumAverage=10.0
OnErrorIndexerSections=IdolErrorServer
IndexDatabase=IdolErrorReview
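
To illustrate what the word length thresholds mean, the following Python sketch shows one way an average word length check could work. It is not the BadFilesFilter implementation: the function names are hypothetical, splitting words on whitespace is an assumption, and treating both limits as inclusive is an assumption. Refer to the Connector Framework Server Reference for the actual behavior.

```python
# Illustrative sketch only; not the BadFilesFilter implementation.
# Assumptions: words are separated by whitespace, and both limits are inclusive.

def average_word_length(text: str) -> float:
    """Return the mean length of whitespace-separated words (0.0 if there are none)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def passes_word_length_filter(text: str,
                              minimum_average: float = 3.0,
                              maximum_average: float = 10.0) -> bool:
    """Return True if the average word length lies within the configured limits."""
    average = average_word_length(text)
    return minimum_average <= average <= maximum_average
```

With the limits from the example configuration (3.0 and 10.0), ordinary prose would typically pass such a check, while text made up of single characters or of very long unbroken strings would not.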

For information about the parameters that you can use to configure this task, refer to the Connector Framework Server Reference.