ConvertXMLToDocuments

Many systems can output data in XML format. The ConvertXMLToDocuments processor automatically detects any FlowFile that represents an XML file and attempts to parse the XML into IDOL documents. The processor produces a new FlowFile for each IDOL document.

TIP: To be parsed successfully, the input XML must conform to the standard IDOL document structure (the same format as created by the WriteDocument processor when you choose XML format).

Properties

Name Default Value Description
IDOL License Service   An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.

Document Registry Service   A DocumentRegistryServiceImpl controller service that manages and updates a document registry database. This ensures that documents are indexed in the correct order.
Commit Batch Size 100 The processor outputs documents in batches to limit memory use and allow subsequent tasks to begin processing the documents sooner. This property specifies the maximum batch size.
Content Paths "DRECONTENT" A comma-separated list of possible paths to a node that contains the document content. Specify the paths relative to the node identified by "Document Root Paths". Use a forward slash (/) to represent levels in the XML hierarchy. If multiple content nodes are identified for a single document, a document is produced with multiple sections.
Document Root Paths "DOCUMENT" A comma-separated list of paths to nodes that contain a single document. Specify the paths relative to the root of the XML. Use a forward slash (/) to represent levels in the XML hierarchy. Any elements contained within the specified node are added to the document as metadata.
Fail If Document Has No Reference True Specifies whether to fail XML parsing when a document parsed from the XML does not have a reference field (as defined by the "Reference Paths" property). When you set this property to FALSE and no reference field is found, the document will have an empty reference.
Include Root Path False A Boolean value that specifies whether to include the node specified by "Document Root Paths" in the document. You might set this property to TRUE if your root node has attributes that you need to include in the document.
Max Documents   The maximum number of documents to create from an XML file. The processor stops parsing the XML file after this number of documents has been created.
Maximum File Size In Memory 1073741824 The maximum size, in bytes, of an XML file to load into memory for parsing (the default is 1 GiB). Files larger than this size are parsed using file streams, which is less memory-intensive but slower.
Reference Paths "DREREFERENCE" A comma-separated list of possible paths to a node that contains the document reference. Specify the paths relative to the node identified by "Document Root Paths". Use a forward slash (/) to represent levels in the XML hierarchy. The XML for each document must contain exactly one node that matches one of the specified paths.
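
The path properties above can be hard to picture without an example. The following is a minimal Python sketch of how "Document Root Paths", "Reference Paths", "Content Paths", "Fail If Document Has No Reference", and "Max Documents" might interact. It is not the processor's implementation. The DOCUMENTS wrapper element, the DRETITLE field, and all function names are assumptions made for illustration; only DOCUMENT, DREREFERENCE, and DRECONTENT come from the property defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input. The DOCUMENTS wrapper and the DRETITLE field are
# assumptions; DOCUMENT, DREREFERENCE and DRECONTENT are the property defaults.
SAMPLE_XML = """<DOCUMENTS>
  <DOCUMENT>
    <DREREFERENCE>doc-1</DREREFERENCE>
    <DRETITLE>First document</DRETITLE>
    <DRECONTENT>Section one.</DRECONTENT>
    <DRECONTENT>Section two.</DRECONTENT>
  </DOCUMENT>
  <DOCUMENT>
    <DREREFERENCE>doc-2</DREREFERENCE>
    <DRECONTENT>Only section.</DRECONTENT>
  </DOCUMENT>
</DOCUMENTS>"""


def split_paths(value):
    """Split a comma-separated property value into a list of paths."""
    return [part.strip() for part in value.split(",") if part.strip()]


def find_all(node, paths):
    """Return every element under node that matches any of the given paths."""
    return [match for path in split_paths(paths) for match in node.findall(path)]


def extract_documents(xml_text, root_paths="DOCUMENT",
                      reference_paths="DREREFERENCE",
                      content_paths="DRECONTENT",
                      fail_if_no_reference=True, max_documents=None):
    """Illustrative stand-in for the processor's parsing step."""
    root = ET.fromstring(xml_text)
    documents = []
    # Document Root Paths are resolved relative to the root of the XML.
    for doc_node in find_all(root, root_paths):
        if max_documents is not None and len(documents) >= max_documents:
            break  # Max Documents: stop once the limit is reached.
        # Reference and content paths are relative to the document node.
        references = find_all(doc_node, reference_paths)
        if not references and fail_if_no_reference:
            raise ValueError("document has no reference field")
        # Exactly one reference node is expected; this sketch takes the first
        # match, or an empty reference when none is found.
        reference = (references[0].text or "").strip() if references else ""
        contents = find_all(doc_node, content_paths)
        used = {id(element) for element in references + contents}
        documents.append({
            "reference": reference,
            # Several content nodes yield a document with several sections.
            "sections": [(c.text or "") for c in contents],
            # Remaining child elements are kept as metadata fields.
            "metadata": [(child.tag, child.text)
                         for child in doc_node if id(child) not in used],
        })
    return documents
```

Calling extract_documents(SAMPLE_XML) on the sample yields two documents, the first with two sections. A slash-separated value such as "BODY/DRECONTENT" (a hypothetical path) would select content nodes nested one level deeper, because ElementTree uses the same forward-slash convention for hierarchy.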

Relationships

Name Description
extracted New FlowFiles for individual IDOL documents that were extracted from an XML file.
failure Original FlowFiles that represent an XML file but could not be parsed because of XML parsing errors.
processed Original FlowFiles that represent an XML file and were parsed successfully. Original FlowFiles are routed to this relationship when they contain valid IDOL documents in XML format.
unprocessed Original FlowFiles that do not represent an XML file.
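
The routing described by the relationships above can be sketched as a small Python function. This is an illustration only: the source does not say how the processor detects XML, so looks_like_xml is a hypothetical heuristic, and route is an invented name. The sketch also models only XML syntax errors as failures, and it treats well-formed XML with no matching document nodes as processed, which the source does not confirm.

```python
import xml.etree.ElementTree as ET


def looks_like_xml(data):
    """Hypothetical check; the processor's real detection logic is not documented here."""
    return data.lstrip().startswith(b"<")


def route(data, root_path="DOCUMENT"):
    """Return (relationship for the original FlowFile, extracted documents)."""
    if not looks_like_xml(data):
        # Not an XML file: the original is passed on untouched.
        return "unprocessed", []
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # XML file that cannot be parsed: the original goes to failure.
        return "failure", []
    # One new item per document node; each would go to "extracted".
    extracted = [ET.tostring(node, encoding="unicode")
                 for node in root.findall(root_path)]
    return "processed", extracted
```

In a flow, the "extracted" output would typically feed the next ingest step, while "unprocessed" lets non-XML FlowFiles continue to other processors.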