Reject Documents with Symbolic Content
The SymbolicContentFilter task calculates the proportion of symbolic characters in a document. If the proportion of symbolic characters in the document content exceeds the limit specified by the MaxSymbolicCharactersPercent parameter, the document is rejected.
Symbolic characters are defined as any character between U+2000 and U+2FFF.
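The check described above can be illustrated with a minimal Python sketch. It is not the task's actual implementation. The function names symbolic_percent and should_reject are hypothetical, and the sketch assumes that the proportion is counted per Unicode code point over the whole document content, that the range U+2000 to U+2FFF is inclusive at both ends, and that "exceeds" means strictly greater than the limit.

```python
def symbolic_percent(text: str) -> float:
    """Return the percentage of characters in text that lie in U+2000..U+2FFF.

    Assumption: both ends of the range are inclusive, and every code point
    in the content counts toward the total.
    """
    if not text:
        return 0.0  # an empty document contains no symbolic characters
    symbolic = sum(1 for ch in text if 0x2000 <= ord(ch) <= 0x2FFF)
    return 100.0 * symbolic / len(text)


def should_reject(text: str, max_symbolic_percent: float) -> bool:
    """Return True if the symbolic percentage exceeds the configured limit.

    Assumption: the comparison is strict, so a document exactly at the
    limit is kept.
    """
    return symbolic_percent(text) > max_symbolic_percent


if __name__ == "__main__":
    # Two of the four characters (U+2022 and U+2192) are in the symbolic range.
    sample = "ab\u2022\u2192"
    print(symbolic_percent(sample))
    print(should_reject(sample, 8))
```

Under these assumptions, a limit of 8 would keep ordinary prose, which rarely contains characters from this range, and reject content that consists largely of bullets, arrows, box-drawing characters, and similar symbols.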
The SymbolicContentFilter task can be configured as a Post task. The parameters that are passed to the task are specified in a named section of the configuration file. For example:
[ImportTasks]
Post0=SymbolicContentFilter:SymbolicContentFilterSettings
[SymbolicContentFilterSettings]
MaxSymbolicCharactersPercent=8
OnErrorIndexerSections=IdolErrorServer
IndexDatabase=IdolErrorReview
For information about the parameters that you can use to configure this task, refer to the Connector Framework Server Reference.