Adds Parameter
The adds parameter specifies XML that describes the document to ingest. A document must have a unique reference. It can consist of metadata only, content only, or metadata and content. You can specify content using either plain text or a file.
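For example, a document that consists of metadata only might look like the following sketch. It uses only elements that appear in the other examples in this section and omits both the source and pages elements, so the document has no content. The reference and the field names and values are placeholders.

```xml
<adds>
  <add>
    <document>
      <!-- Placeholder reference: every document needs a unique reference -->
      <reference>http://www.example.com/metadata-only</reference>
      <!-- Metadata fields only; no source or pages element, so no content -->
      <metadata name="Field1" value="Value1"/>
      <metadata name="Field2" value="Value2"/>
    </document>
  </add>
</adds>
```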
The following XML describes a document with metadata and a file:
<adds>
  <add>
    <document>
      <reference>http://www.example.com/</reference>
      <metadata name="Field1" value="Value1"/>
      <metadata name="Field2" value="Value2"/>
    </document>
    <source filename="MyFile.doc" lifetime="permanent"/>
  </add>
</adds>
You can specify XML metadata by using the xmlmetadata element:
<adds>
  <add>
    <document>
      <reference>http://www.example.com/</reference>
      <xmlmetadata>
        <Field1>Value1</Field1>
        <Field2>
          <SubFieldOne>First</SubFieldOne>
          <SubFieldTwo>Second</SubFieldTwo>
        </Field2>
      </xmlmetadata>
    </document>
    <source filename="MyFile.doc" lifetime="permanent"/>
  </add>
</adds>
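In the preceding examples, the source is a file on the file system. You can also specify the source as a base64-encoded string, by using the content attribute of the source element. In the following example, the string decodes to the text "Some text" followed by a carriage return and line feed: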
<adds>
  <add>
    <document>
      <reference>http://www.example.com/</reference>
    </document>
    <source content="U29tZSB0ZXh0DQo="/>
  </add>
</adds>
You can also specify the plain text for each section using the pages element:
<adds>
  <add>
    <document>
      <reference>http://www.example.com/</reference>
      <pages content="Page 1 content"/>
      <pages content="Page 2 content"/>
    </document>
  </add>
</adds>
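A document that has both metadata and plain-text content could, as a sketch, combine the metadata element from the first example with the pages elements from the previous one. This assumes the two kinds of element can appear together inside one document element, as the description of documents that consist of "metadata and content" suggests; the values are placeholders.

```xml
<adds>
  <add>
    <document>
      <reference>http://www.example.com/</reference>
      <!-- Metadata fields, as in the first example -->
      <metadata name="Field1" value="Value1"/>
      <!-- Plain-text content: each pages element maps to a separate DRESECTION -->
      <pages content="Page 1 content"/>
      <pages content="Page 2 content"/>
    </document>
  </add>
</adds>
```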
To ingest multiple documents, specify multiple <add> elements. The following table describes the XML elements.
XML element | Description |
---|---|
add (required) | The add element describes a document to ingest. To ingest several documents, use several add elements. |
document (required) | The document element contains the reference and, optionally, the metadata and content of the document. |
reference (required) | The reference element provides a unique reference (DREREFERENCE) for the document. |
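metadata (optional) | The metadata element specifies a single metadata field, using the name and value attributes. To add several fields, use several metadata elements. |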
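xmlmetadata (optional) | The xmlmetadata element specifies metadata as XML. Each child element is a metadata field, and fields can contain nested sub-fields. |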
pages (optional) | The pages element specifies any filtered document content that should be sent with the document. You can use multiple pages elements, and each one maps to a separate DRESECTION. |
page (optional) | The page element specifies the content for a single DRESECTION. Specify the content as plain text. |
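source (optional) | The source element specifies the file or content to ingest. You must set either the filename attribute (the path to a file) or the content attribute (a base64-encoded string). If you specify the location of a file using the filename attribute, you can also set the lifetime attribute; the examples in this section use lifetime="permanent". |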
Example
<adds>
  <add>
    <document>
      <reference>C:\Autonomy\newfs\data\050309-020409.xls</reference>
      <xmlmetadata>
        <AUTN_GROUP>fs</AUTN_GROUP>
        <AUTN_IDENTIFIER>PGlkIHM9IlRBU0sxIiByPSJDOlxBdXRvbm9teVxuZXdmc1xkYXRhXDA1MDMwOS0wMjA0MDkueGxzIi8+</AUTN_IDENTIFIER>
        <CREATED>2012-Feb-13 10:42:56.232479</CREATED>
        <DocTrackingId>9f3684f499aef4a9c025a43d8125029f</DocTrackingId>
        <DREDBNAME>Test</DREDBNAME>
        <FILESIZE>61440</FILESIZE>
        <LASTACCESSED>2012-Feb-13 10:42:56.232479</LASTACCESSED>
        <LASTCHANGED>2012-Feb-14 09:16:38.250813</LASTCHANGED>
        <LASTMODIFIED>2009-Apr-06 10:21:15.032472</LASTMODIFIED>
      </xmlmetadata>
    </document>
    <source filename="C:/Autonomy/newfs/data/050309-020409.xls" lifetime="permanent"/>
  </add>
  <add>
    <document>
      <reference>C:\Autonomy\newfs\data\070610-100610.xls</reference>
      <xmlmetadata>
        <AUTN_GROUP>fs</AUTN_GROUP>
        <AUTN_IDENTIFIER>PGlkIHM9IlRBU0sxIiByPSJDOlxBdXRvbm9teVxuZXdmc1xkYXRhXDA3MDYxMC0xMDA2MTAueGxzIi8+</AUTN_IDENTIFIER>
        <CREATED>2012-Feb-13 10:42:56.232479</CREATED>
        <DocTrackingId>4d7fbaa368fa1727177b9f1ef06caa57</DocTrackingId>
        <DREDBNAME>Test</DREDBNAME>
        <FILESIZE>54784</FILESIZE>
        <LASTACCESSED>2012-Feb-13 10:42:56.232479</LASTACCESSED>
        <LASTCHANGED>2012-Feb-14 09:16:38.235226</LASTCHANGED>
        <LASTMODIFIED>2010-Jun-14 13:52:24.041742</LASTMODIFIED>
      </xmlmetadata>
    </document>
    <source filename="C:/Autonomy/newfs/data/070610-100610.xls" lifetime="permanent"/>
  </add>
</adds>
Ingest an IDX File
You can ingest an IDX file by using the <source> element to specify the path to the file.
<adds>
  <add>
    <source filename="data.idx" />
  </add>
</adds>
If the <add> element includes a <document> element that specifies a reference, metadata, or content:
- The reference is ignored. The references in the IDX file are always used.
- The metadata is merged with the metadata for every document in the IDX file.
- The content is inserted before the content of every document in the IDX file.
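As a sketch of these rules, the following request adds one metadata field to every document in the IDX file. The file name data.idx comes from the previous example; the reference value and the field name and value are placeholders. The reference is included only because the document element requires one, and CFS ignores it in favor of the references in the IDX file.

```xml
<adds>
  <add>
    <document>
      <!-- Placeholder: ignored, because the references in the IDX file are always used -->
      <reference>placeholder-reference</reference>
      <!-- Merged with the metadata of every document in data.idx -->
      <metadata name="Field1" value="Value1"/>
    </document>
    <source filename="data.idx" />
  </add>
</adds>
```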
If the IDX file contains sectioned documents, the sections are merged into a single document.
If CFS fails to parse the IDX file, and no documents were successfully extracted, the IDX is processed as a regular file. If CFS fails to parse the IDX file, but at least one document was successfully extracted, an error is logged and the remainder of the file is not processed.