Reject Documents with Symbolic Content

The SymbolicContentFilter task calculates the proportion of symbolic characters in the content of a document. If that proportion exceeds the percentage specified by the MaxSymbolicCharactersPercent parameter, the task rejects the document.

A symbolic character is any character between U+2000 and U+2FFF.
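
The check described above can be illustrated with a short Python sketch. This is not the Connector Framework Server implementation; it is only a model of the rule under stated assumptions. The function names symbolic_percent and is_rejected are hypothetical, the range is assumed to include both U+2000 and U+2FFF, and the proportion is assumed to be measured against every character in the content (including whitespace).

```python
def symbolic_percent(text: str) -> float:
    """Return the percentage of characters in text that lie in U+2000..U+2FFF.

    Assumption: both end points are included and every character,
    whitespace included, counts toward the total.
    """
    if not text:
        return 0.0  # empty content contains no symbolic characters
    symbolic = sum(1 for ch in text if 0x2000 <= ord(ch) <= 0x2FFF)
    return 100.0 * symbolic / len(text)


def is_rejected(text: str, max_symbolic_percent: float) -> bool:
    """Model the rejection rule: reject only when the limit is exceeded."""
    return symbolic_percent(text) > max_symbolic_percent


if __name__ == "__main__":
    # 92 ordinary letters plus 9 bullet characters (U+2022).
    sample = "a" * 92 + "\u2022" * 9
    print(round(symbolic_percent(sample), 2), is_rejected(sample, 8))
```

Under these assumptions, a document whose content is exactly at the limit is kept, because the task rejects a document only when the proportion exceeds the limit.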

You can configure the SymbolicContentFilter task as a Post task. Specify the parameters for the task in a named section of the configuration file. For example:

[ImportTasks]
Post0=SymbolicContentFilter:SymbolicContentFilterSettings

[SymbolicContentFilterSettings]
MaxSymbolicCharactersPercent=8
OnErrorIndexerSections=IdolErrorServer
IndexDatabase=IdolErrorReview

For information about the parameters that you can use to configure this task, refer to the Connector Framework Server Reference.