The Eduction configuration elements that you can define in the XML file are described in Eduction Parameters.
The following XML configuration file example shows all the available XML elements:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sample Eduction XML configuration file for the edktool utility -->
<Eduction>
  <!-- Global Settings (Defaults shown) -->
  <MatchWholeWord>true</MatchWholeWord>
  <SuppressMatchLogging>false</SuppressMatchLogging>
  <MaxEntityLength>256</MaxEntityLength>
  <AllowOverlaps>false</AllowOverlaps>
  <EnableComponents>false</EnableComponents>
  <OutputSimpleMatchInfo>true</OutputSimpleMatchInfo>
  <MatchCase>true</MatchCase>
  <DocumentDelimiterCSVs>*/DOCUMENT</DocumentDelimiterCSVs>
  <CantHaveFields>
    <CantHaveField>*/DRESTORECONTENT</CantHaveField>
    <CantHaveField>*/CHECKSUM</CantHaveField>
    <CantHaveField>*/DREWORDCOUNT</CantHaveField>
    <CantHaveField>*/DRETYPE</CantHaveField>
    <CantHaveField>*/IMPORTBODYLEN</CantHaveField>
    <CantHaveField>*/IMPORTMETALEN</CantHaveField>
    <CantHaveField>*/IMPORTLINKLEN</CantHaveField>
    <CantHaveField>*/IMPORTTITLELEN</CantHaveField>
    <CantHaveField>*/IMPORTQUALITY</CantHaveField>
    <CantHaveField>*/DREPAGE</CantHaveField>
    <CantHaveField>*/DREFILENAME</CantHaveField>
    <CantHaveField>*/dredoctype</CantHaveField>
  </CantHaveFields>
  <!-- Eduction grammar (resource) files to load -->
  <ResourceFiles>
    <ResourceFile>phone.ecr</ResourceFile>
    <ResourceFile>jargon.ecr</ResourceFile>
  </ResourceFiles>
  <!-- IDOL databases to search. Applies only to IDOL IDX or IDOL XML input documents -->
  <Databases>
    <Database>Contact</Database>
    <Database>Customer</Database>
  </Databases>
  <!-- Document fields to search. Ignored for plain text input documents (DRECONTENT is the default) -->
  <SearchFields>
    <SearchField>DREREFERENCE</SearchField>
    <SearchField>DRETITLE</SearchField>
    <SearchField>DRECONTENT</SearchField>
  </SearchFields>
  <!-- Definitions of search zones within a document -->
  <Zones>
    <Zone>
      <Name>Summary</Name>
      <StartPattern>Executive Summary</StartPattern>
      <EndPattern>Introduction</EndPattern>
    </Zone>
    <Zone>
      <Name>Body</Name>
      <StartPattern>Introduction</StartPattern>
    </Zone>
  </Zones>
  <!-- Fields generated from a match. Always required, but applies only to IDOL IDX or IDOL XML input documents where the output is also a modified IDOL document -->
  <TargetFields>
    <TargetField>
      <Name>PHONE</Name>
      <AllowDuplicates>false</AllowDuplicates>
    </TargetField>
  </TargetFields>
  <!-- Eduction grammar entities used for searching -->
  <Entities>
    <Entity>
      <Name>phone/all</Name>
      <TargetField>PHONE</TargetField>
      <MatchRange>1,2-4</MatchRange>
      <MinScore>0.5</MinScore>
      <Zone>Summary</Zone>
      <Zone>Body</Zone>
    </Entity>
  </Entities>
</Eduction>
If Eduction reads an IDOL XML data file, you must configure DocumentDelimiterCSVs and at least one CantHaveField entry in the CantHaveFields setting. If these settings are not present, Eduction defaults to DOCUMENT and EDUCTION_DUMMY_FIELD, respectively.
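As a minimal sketch for IDOL XML input, the following configuration sets both required elements explicitly. It reuses the phone.ecr grammar, the phone/all entity, the PHONE target field, and the */DOCUMENT and */DRESTORECONTENT values from the full example above. It assumes that the global settings and the other elements left out here (Databases, SearchFields, Zones, MatchRange, MinScore) can be omitted and fall back to their defaults; check Eduction Parameters to confirm which elements your setup requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch for IDOL XML input; omitted elements are assumed to use their defaults -->
<Eduction>
  <!-- Required for IDOL XML input: the element that delimits each document -->
  <DocumentDelimiterCSVs>*/DOCUMENT</DocumentDelimiterCSVs>
  <!-- Required for IDOL XML input: at least one CantHaveField entry -->
  <CantHaveFields>
    <CantHaveField>*/DRESTORECONTENT</CantHaveField>
  </CantHaveFields>
  <!-- Grammar file to load -->
  <ResourceFiles>
    <ResourceFile>phone.ecr</ResourceFile>
  </ResourceFiles>
  <!-- Target field that receives matches (always required) -->
  <TargetFields>
    <TargetField>
      <Name>PHONE</Name>
    </TargetField>
  </TargetFields>
  <!-- Entity to search for, mapped to the target field -->
  <Entities>
    <Entity>
      <Name>phone/all</Name>
      <TargetField>PHONE</TargetField>
    </Entity>
  </Entities>
</Eduction>
```

Setting both elements explicitly, rather than relying on the DOCUMENT and EDUCTION_DUMMY_FIELD defaults, makes it clear which element delimits documents in the input file.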