xml_config

Configure the elements and attributes to extract from XML documents with a specified format ID or root element.

For more information, see Configure Element Extraction for XML Documents.

Syntax

Configuration& xml_config(
    Format docFormat,
    const std::string& root,
    const std::string& metadata_include,
    const std::string& metadata_exclude,
    const std::string& content_include,
    const std::string& content_exclude,
    const std::string& content_attributes
);

docFormat

The format ID as detected by the KeyView detection module. This ID determines the file type to which these extraction settings apply. See Obtain Format Information for more information on format ID values.

If you are adding configuration settings for a custom XML document type, set docFormat to Format::Unknown_Fmt.

root

The file's root element. If docFormat is set to Format::Unknown_Fmt, the root element is used to determine the file type to which these settings apply. Otherwise, root is ignored.
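As an illustration of the custom document type case, the following minimal sketch registers settings keyed on a root element. It does not use the KeyView SDK headers: Format, XmlRule, and Configuration here are stand-in types that only mirror the documented signature and record the call. The root element invoice, the element names header and lines, and the use of empty strings for unused lists are assumptions made for the example, and should be checked against the SDK and against Syntax for Specifying Elements.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the KeyView SDK declarations.
enum class Format { Unknown_Fmt };

// One recorded set of XML extraction settings.
struct XmlRule {
    Format docFormat;
    std::string root;
    std::string metadata_include;
    std::string metadata_exclude;
    std::string content_include;
    std::string content_exclude;
    std::string content_attributes;
};

class Configuration {
public:
    // Same parameter list as the documented xml_config. This stub only stores
    // the arguments and returns *this, so that calls can be chained.
    Configuration& xml_config(Format docFormat,
                              const std::string& root,
                              const std::string& metadata_include,
                              const std::string& metadata_exclude,
                              const std::string& content_include,
                              const std::string& content_exclude,
                              const std::string& content_attributes) {
        rules_.push_back({docFormat, root, metadata_include, metadata_exclude,
                          content_include, content_exclude, content_attributes});
        return *this;
    }

    const std::vector<XmlRule>& rules() const { return rules_; }

private:
    std::vector<XmlRule> rules_;
};

// Hypothetical custom XML type whose root element is <invoice>.
// Format::Unknown_Fmt means the settings are matched on the root element.
Configuration make_invoice_config() {
    Configuration config;
    config.xml_config(Format::Unknown_Fmt,
                      "invoice",  // root (hypothetical)
                      "header",   // metadata_include (hypothetical element)
                      "",         // metadata_exclude (assumed: empty means none)
                      "lines",    // content_include (hypothetical element)
                      "",         // content_exclude (assumed: empty means none)
                      "");        // content_attributes (no attribute values)
    return config;
}
```

Because xml_config returns Configuration&, further configuration calls can be chained on the same object.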

metadata_include

The elements extracted from the file as metadata. All other elements are extracted as text.

metadata_exclude

The child elements in the included metadata elements that are not extracted from the file as metadata.

For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in metadata_exclude because it is binary data and should not be extracted.
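The relationship between metadata_include and metadata_exclude can be pictured with a small conceptual model. This is only a sketch of the behavior described above, not KeyView's implementation: the function extracted_as_metadata and the representation of an element as a path of names from the root are assumptions made for illustration, and the root name VisioDocument is used only as sample data.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Conceptual model only (not the KeyView algorithm).
// path holds the element names from the document root down to the element
// being tested. The element counts as metadata when it is, or sits inside,
// an included element and no excluded element appears below that point.
bool extracted_as_metadata(const std::vector<std::string>& path,
                           const std::vector<std::string>& include,
                           const std::vector<std::string>& exclude) {
    auto contains = [](const std::vector<std::string>& names,
                       const std::string& name) {
        return std::find(names.begin(), names.end(), name) != names.end();
    };

    // Locate the first element on the path that is an included metadata element.
    auto included = std::find_if(
        path.begin(), path.end(),
        [&](const std::string& name) { return contains(include, name); });
    if (included == path.end()) {
        return false;  // not inside any included metadata element
    }

    // An excluded child anywhere below the included element removes the element.
    return std::none_of(
        included + 1, path.end(),
        [&](const std::string& name) { return contains(exclude, name); });
}
```

With include set to DocumentProperties and exclude set to PreviewPicture, the path VisioDocument, DocumentProperties, Title is treated as metadata in this model, while VisioDocument, DocumentProperties, PreviewPicture is not.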

You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting.

content_include

The elements extracted from the file as content text.

content_exclude

The child elements in the included content elements that are not extracted from the file as content text.

content_attributes

The attribute values extracted from the file. If attributes are not defined here, attribute values are not extracted.

NOTE: For more information about how to specify elements, see Syntax for Specifying Elements.