The Ingestion Process
The following chart provides a summary of the ingestion process.
Documents are submitted to Connector Framework Server (CFS) through the ingest action. If the document has metadata only, CFS runs any processing tasks that have been configured, and the document is then ready for indexing. If the document has an associated file, the ingestion process depends on the file format.
- All files apart from IDOL IDX and XML. Most documents that have an associated file are added to the import queue so that the information in the file can be extracted by KeyView or other processing tasks. For information about the import process, see The Import Process.
- IDOL IDX files. An IDX file contains one or more documents in IDOL IDX format, so CFS attempts to parse the file. If parsing is successful then the IDOL documents are returned to the ingest queue as metadata-only documents. If parsing is not successful then CFS adds the document to the import queue so that the IDX file is processed by KeyView. Parsing an IDX file is preferable to processing it with KeyView, because although KeyView can extract the text, it cannot extract the structure information that divides the text into separate documents, content sections, and metadata fields.
- XML files. Many systems export information in XML format and CFS has features to help you convert XML into IDOL documents.
CFS can run a transformation on an ingested XML file. This is an optional step but can be useful in cases where your XML files do not resemble IDOL documents or you are processing XML from many sources and the files have different schemas. You can configure any number of transformations and CFS runs the first transformation where the ingested XML matches the specified schema. You can also configure a default transformation that CFS runs when an XML file does not match any of your schemas. When a transformation is configured but is not successful, CFS adds the document to the import queue so that the XML is processed by KeyView.
After an XML transformation is successful, or when no transformation is configured, CFS attempts to convert the XML into IDOL documents. The conversion is performed by mapping elements in the XML to IDOL documents and document fields. If the conversion is successful, the resulting documents are returned to the ingest queue as metadata-only documents. If the conversion does not result in any IDOL documents but the XML was transformed after matching a schema, CFS does not treat this as a failure and does not index any documents. Otherwise, CFS adds the document to the import queue so that the XML is processed by KeyView.
Parsing an XML file is usually preferable to processing it with KeyView, because although KeyView can extract the text, it does not preserve the structure information (the XML tags are discarded).
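The routing rules described above can be summarized as a small decision function. The following Python sketch is an illustration only and is not CFS code or a CFS API: the names `route_document`, `Outcome`, and `Transformation`, and the `parse_idx` and `convert_xml` callables, are hypothetical stand-ins for the behavior described in this section. It also assumes that when transformations are configured but no schema matches and no default transformation exists, CFS proceeds directly to conversion, and that a default transformation does not count as a schema match.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class Outcome(Enum):
    READY_FOR_INDEXING = "run processing tasks, then index"
    REQUEUE_AS_METADATA = "return documents to the ingest queue as metadata-only"
    IMPORT_QUEUE = "add to the import queue for KeyView"
    NOTHING_TO_INDEX = "no documents produced, not treated as a failure"


@dataclass
class Transformation:
    """Hypothetical model of one configured XML transformation."""
    matches: Callable[[str], bool]          # does the XML match this schema?
    apply: Callable[[str], Optional[str]]   # returns None if the transform fails


def route_document(
    file_format: Optional[str],
    content: str = "",
    parse_idx: Callable[[str], Optional[list]] = lambda _: None,
    convert_xml: Callable[[str], list] = lambda _: [],
    transformations: Sequence[Transformation] = (),
    default_transformation: Optional[Transformation] = None,
) -> Outcome:
    # Metadata-only document: no associated file.
    if file_format is None:
        return Outcome.READY_FOR_INDEXING

    # IDX: try to parse, fall back to KeyView if parsing fails.
    if file_format == "idx":
        docs = parse_idx(content)
        return Outcome.REQUEUE_AS_METADATA if docs else Outcome.IMPORT_QUEUE

    if file_format == "xml":
        # Use the first transformation whose schema matches, else the default.
        chosen = next((t for t in transformations if t.matches(content)), None)
        matched_schema = chosen is not None
        if chosen is None:
            chosen = default_transformation

        xml = content
        if chosen is not None:
            xml = chosen.apply(content)
            if xml is None:
                # A configured transformation was not successful.
                return Outcome.IMPORT_QUEUE

        docs = convert_xml(xml)
        if docs:
            return Outcome.REQUEUE_AS_METADATA
        if matched_schema:
            # Transformed after matching a schema, but produced no documents.
            return Outcome.NOTHING_TO_INDEX
        return Outcome.IMPORT_QUEUE

    # Every other file format goes to the import queue.
    return Outcome.IMPORT_QUEUE
```

For example, under these assumptions `route_document("pdf")` returns `Outcome.IMPORT_QUEUE`, while `route_document(None)` returns `Outcome.READY_FOR_INDEXING`. Modeling the parser and converter as injected callables keeps the sketch focused on the ordering of the checks rather than on how parsing or conversion is actually performed.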