Reject All Invalid Documents
The BadFilesFilter task rejects all documents that are considered to be invalid:
- Documents that have binary content.
- Documents for which import errors have occurred.
- Documents that have too high a proportion of symbolic content.
- Documents where the average word length is too long or too short.
BadFilesFilter must be configured as a Post task.
BadFilesFilter reads configuration parameters from the section of the configuration file that you specify in the Post parameter. In this section you can set parameters for each filter. In the example below, two parameters (MinimumAverage and MaximumAverage) have been set to configure the word length filter:
[ImportTasks]
Post0=BadFilesFilter:BadFilesFilterSettings

[BadFilesFilterSettings]
MinimumAverage=3.0
MaximumAverage=10.0
OnErrorIndexerSections=IdolErrorServer
IndexDatabase=IdolErrorReview
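To get a feel for what the word length thresholds in the example above mean, the following Python sketch computes an average word length and compares it against the same bounds. It is an illustration only, not the BadFilesFilter implementation: the function names are hypothetical, and it assumes that words are whitespace-separated tokens and that both bounds are inclusive, neither of which is stated here.

```python
# Illustrative sketch only; NOT how BadFilesFilter is implemented.
# Assumptions: words are whitespace-separated tokens, bounds are inclusive.

def average_word_length(text: str) -> float:
    """Return the mean token length, or 0.0 if the text has no tokens."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def within_word_length_bounds(
    text: str,
    minimum_average: float = 3.0,   # mirrors MinimumAverage in the example
    maximum_average: float = 10.0,  # mirrors MaximumAverage in the example
) -> bool:
    """Return True if the average word length lies within the bounds."""
    average = average_word_length(text)
    return minimum_average <= average <= maximum_average


if __name__ == "__main__":
    samples = [
        "the quick brown fox",  # ordinary prose
        "a b c d",              # average too short
        "x" * 40,               # a single very long token
    ]
    for sample in samples:
        print(repr(sample[:20]), within_word_length_bounds(sample))
```

With these bounds, ordinary prose passes the check, while text dominated by single characters or by one very long unbroken token falls outside the range.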
For information about the parameters that you can use to configure this task, refer to the Connector Framework Server Reference.