IngestAsPlainText
Web Connector creates an IDOL document for each web page that you have chosen to ingest. Each document is ingested with an HTML file that represents the web page after it has been processed (the connector can clip irrelevant content, remove scripts, convert hyperlinks to absolute URLs, and so on). Connector Framework Server (CFS) populates the DRECONTENT field of each document by extracting text from its associated file.
Alternatively, set IngestAsPlainText=TRUE to configure Web Connector to ingest content as plain text. In this case, Web Connector downloads each page that you have chosen to ingest, processes it (performing clipping, removing scripts, and so on), and then extracts the text itself. The text is added to the DRECONTENT document field, and the connector sends metadata-only documents to CFS (the documents do not have associated HTML files).
Using the connector to extract text is likely to produce better results than extraction by CFS (KeyView), because the connector processes HTML using the HTML Document Object Model (DOM).
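To illustrate the plain-text path, the following Python sketch removes script and style content from a page and places the remaining text in the DRECONTENT field of a document that has no attached file. It is only an approximation of the behavior described above: it uses the standard library's streaming `html.parser` rather than a DOM, performs no clipping or link rewriting, and represents the document as a plain dict. The names `PlainTextExtractor` and `to_plain_text_document` and the dict layout are hypothetical and are not part of any Web Connector or CFS API.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def to_plain_text_document(url, html):
    """Hypothetical metadata-only document: extracted text goes in DRECONTENT, no file is attached."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return {"DREREFERENCE": url, "DRECONTENT": parser.text()}


if __name__ == "__main__":
    page = "<html><head><script>var x = 1;</script></head><body><p>Hello</p> <p>world</p></body></html>"
    print(to_plain_text_document("http://example.com/", page))
```

A structure-aware approach like this can tell markup that carries no readable content (here, scripts and styles) from body text, which is the general reason parsing the HTML structure tends to beat treating the file as opaque input.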
Type: | Boolean |
Default: | False |
Required: | No |
Configuration Section: | TaskName or FetchTasks or Default |
Example: | IngestAsPlainText=TRUE |
See Also: |
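The Configuration Section row lists TaskName, FetchTasks, and Default. The following Python sketch resolves the value for a given task, assuming the lookup order common to IDOL connectors, in which a value in the task section overrides [FetchTasks] and a value in [FetchTasks] overrides [Default]; confirm the exact precedence for your connector version. The task names `NewsSite` and `Intranet` and the helper `ingest_as_plain_text` are hypothetical, and the fragment omits every other setting a real configuration needs.

```python
import configparser

# Hypothetical configuration fragment; only IngestAsPlainText is shown.
CONFIG = """
[Default]
IngestAsPlainText=FALSE

[FetchTasks]
IngestAsPlainText=TRUE

[NewsSite]

[Intranet]
IngestAsPlainText=FALSE
"""


def ingest_as_plain_text(config, task):
    """Resolve IngestAsPlainText, assuming the order: task section, then FetchTasks, then Default."""
    for section in (task, "FetchTasks", "Default"):
        # has_option returns False when the section does not exist.
        if config.has_option(section, "IngestAsPlainText"):
            return config.getboolean(section, "IngestAsPlainText")
    return False  # documented default


if __name__ == "__main__":
    cfg = configparser.ConfigParser()
    cfg.read_string(CONFIG)
    for task in ("NewsSite", "Intranet"):
        print(task, ingest_as_plain_text(cfg, task))
```

Under this assumed order, NewsSite inherits TRUE from [FetchTasks], while Intranet overrides it with its own FALSE.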