This option extracts entities from a document. It can print the output to a file, or to the console. You can use this option to test your grammars.
You can use wildcard expressions in the -e and -g parameters; see Wildcard Expressions in edktool for more information.
You can enable redaction on extracted matches in edktool either by setting RedactedOutput to True in the edktool configuration file, or by specifying a redaction file with the -r parameter at the command line. Note that edktool only performs redaction on fields that you have configured as IDOL search fields.
If you have specified an IDX file to perform extraction on, existing fields are preserved in their unredacted form, and a redacted copy of each search field is added to the IDX file, with _REDACTED appended to the original field name. For example:
#DREREFERENCE 1
#DREFIELD DRECONTENT_REDACTED="The driver ########## was questioned."
#DRECONTENT
The driver Joe Bloggs was questioned.
#DREENDDOC
If you have specified a plaintext file to perform extraction on, the entities that edktool identifies as matches are redacted from the input text to form the redacted output. For example:
Input:
The driver Joe Bloggs was questioned.
Output:
The driver ########## was questioned.
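The plaintext example above can be reproduced with a short sketch. This is not how edktool is implemented; it only illustrates the visible behavior in the example, assuming that each character of a match is replaced by one # character. The function name redact and the (start, end) character-offset representation of a match are hypothetical.

```python
def redact(text, spans, mask="#"):
    """Replace every character inside each (start, end) span with the mask character.

    Assumption: spans are character offsets into text, with end exclusive.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


text = "The driver Joe Bloggs was questioned."
name = "Joe Bloggs"
start = text.index(name)  # locate the match for illustration only
print(redact(text, [(start, start + len(name))]))
```

Because the mask is applied per character, the redacted text keeps the same length as the input, which matches the ten # characters that stand in for the ten characters of "Joe Bloggs" in the example.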
Eduction sends redacted output to the file specified in the -r parameter. If you do not specify this parameter but you have enabled redaction in the configuration file, Eduction displays redacted output in the console after the list of matches, unless you have specified the -q parameter at the command line to enable Quiet mode. In Quiet mode, redacted output is not displayed in the console.
-l <licensefile> | The file containing a valid license key for Eduction. If you do not specify a license key at the command line, |
-i <inputfile> | The file to perform entity extraction on. The input file can be an IDOL IDX file, an IDOL XML file, or a plain text file. It must be UTF-8 encoded. NOTE: If the input file is an XML file, the configuration file (in either IDOL configuration file format or XML format) must contain entries for the |
-c <configfile> | A configuration file that controls the extraction. The configuration file can be either an IDOL Server style .CFG configuration file or an XML configuration file. See Configuration Files for Eduction Settings. You can specify one or more grammar files and one or more entities in place of a configuration file. Specifying a configuration file overrides the grammar or entity parameters. |
-g <grammarfile> | A grammar file to use when If you provide a grammar file but do not specify any entities with |
-e <entity> | The entities to extract when |
-o <outputfile> | The file containing the results of the extraction. The content of the optional output file depends on the type of input file provided and whether the If the input file type is an IDOL file and the If the input file is a plain text file or an IDOL file with the If the input file is an IDOL file, the output file also contains document information. |
-m | Produce match results for IDOL input files. |
-q | (Optional) Sets “Quiet Mode” so that descriptive messages and redacted output are suppressed, and the output consists of the XML matchlist only (that is, an XML document with all the matches and any configured metadata). |
-r <redaction_file> | A copy of the input file, with all matches redacted. For example, if you specified an IDX input file, the content is sent to the redaction file as follows, with the redactions made in place: #DREREFERENCE 1 #DRECONTENT The driver ########## was questioned. #DREENDDOC |
-p | Set this parameter if you want to use a plaintext grammar file rather than an XML grammar file as the input text to extract from. |
The extract option requires an input file (in IDOL IDX, IDOL XML, or plain text format) and either a configuration file or a grammar file. If you do not provide a configuration file, edktool searches the input file for any specified entities in the specified grammar (or all entities, if none are specified). For example, in the simplest command line:
C:\>edktool e -i myData.txt -g grammar1.ecr,grammar2.ecr
edktool is invoked with no configuration file. It uses the command-line arguments to process the data file myData.txt with the grammar files grammar1.ecr and grammar2.ecr. Eduction identifies all the entities in the two grammar files, and matches on these. The output is sent to the console in XML format, identifying matches in the data file and using the entity names to generate field names for the matches that contain the matched data. Assuming myData.txt is a plain text file, the entire body of the file is matched.
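If you call edktool from a script, the command line above can be assembled programmatically. The following is a minimal sketch that uses only the parameters described in this section (-i, -c, -g, -o, -r, -q). The helper name build_extract_command is hypothetical, and the sketch only builds the argument list; it does not run edktool.

```python
def build_extract_command(input_file, grammars=None, config=None,
                          output=None, redaction_file=None, quiet=False):
    """Build an argument list for the edktool extract option (hypothetical helper)."""
    if not config and not grammars:
        # The extract option needs a configuration file or a grammar file.
        raise ValueError("provide a configuration file or at least one grammar file")
    args = ["edktool", "e", "-i", input_file]
    if config:
        # A configuration file overrides grammar parameters, so -g is skipped here.
        args += ["-c", config]
    else:
        # Grammar files are comma-separated, as in the example command line.
        args += ["-g", ",".join(grammars)]
    if output:
        args += ["-o", output]
    if redaction_file:
        args += ["-r", redaction_file]
    if quiet:
        args.append("-q")
    return args


cmd = build_extract_command("myData.txt", grammars=["grammar1.ecr", "grammar2.ecr"])
print(" ".join(cmd))
```

Returning a list rather than a single string means that the result can be passed directly to subprocess.run without shell quoting concerns, assuming edktool is on the path.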
You can also specify the -p parameter at the command line to extract matches from a plaintext grammar file. The plaintext grammar file must be in the format described in Plaintext Grammar File Format.