The Passage Extractor entity extraction file gives Answer Server a map that specifies which components to use to extract entities for each question classification.
When you ask a question, Passage Extractor classifies it with the question classifier and then finds matching documents and document sections in the data store. It uses IDOL Content highlighting to find the most relevant passages, which it uses as candidate answers. Passage Extractor then uses Eduction, an Agentstore component, or both to find entities in the candidate answers that match the question classification.
For example, if you have an Agent entity database with the names of plants, and you send a question that Passage Extractor classifies as plants, Passage Extractor uses the Agentstore component to find the relevant plant entities in the candidate answer text.
By default, if you configure an Agentstore component, Passage Extractor uses Agentstore for the HUM:gr, ENTY:plant, ENTY:animal, and ENTY:lang classifications, and for all LOC classifications. It uses both Eduction and Agentstore for the HUM:ind question classification, and Eduction only for all other question classifications.
You can modify these mappings in the entity extraction file, for example if you create additional Agent entity files for your data.
You do not need to specify an entity type to extract for every question classification. If a question classification does not appear in the entity extraction file, Passage Extractor does not attempt to extract entities. This might be appropriate for many question classifications (for example, if the appropriate answer is a long description, there might not be a corresponding entity).
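The per-classification lookup that this file drives can be sketched in Python as follows. This is an illustration only, assuming the JSON structure described later in this section; the function name components_for and the shape of the returned dictionary are hypothetical, and the sketch does not reproduce Answer Server's internal implementation.

```python
import json
from typing import Optional

# Hypothetical sample that follows the structure of the entity extraction file.
ENTITY_MAP_JSON = """
{
  "entity_map": [
    {"entity_type": "HUM:ind",
     "agentstore": {"databases": ["people"]},
     "eduction": {"entities": ["hum/ind"]}},
    {"entity_type": "ENTY:plant",
     "agentstore": {"databases": ["organisms"],
                    "fieldtext": "MATCH{PLANTAE,VIRIDIPLANTAE}:ORGANISMS_KINGDOM"}},
    {"entity_type": "NUM:count",
     "eduction": {"entities": ["num/count"]},
     "corroborate": false}
  ]
}
"""


def components_for(entity_map: dict, question_class: str) -> Optional[dict]:
    """Return the extraction settings for a question class, or None.

    None means the class is not listed, so no entity extraction happens.
    """
    for entry in entity_map["entity_map"]:
        if entry["entity_type"] == question_class:
            agentstore = entry.get("agentstore")
            return {
                "eduction_entities": entry.get("eduction", {}).get("entities", []),
                "agentstore_databases": agentstore["databases"] if agentstore else [],
                "agentstore_fieldtext": agentstore.get("fieldtext") if agentstore else None,
                # corroborate is optional and defaults to true.
                "corroborate": entry.get("corroborate", True),
            }
    return None


entity_map = json.loads(ENTITY_MAP_JSON)
print(components_for(entity_map, "ENTY:plant"))
print(components_for(entity_map, "DESC:def"))  # prints None: DESC:def is not listed
```

In this sketch, ENTY:plant resolves to an Agentstore-only lookup restricted by FieldText, while an unlisted classification returns None, which mirrors the rule that unlisted classifications get no entity extraction.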
Passage Extractor also attempts to corroborate the candidate answers by comparing how often particular entities occur. In most cases, this improves the quality of the returned answers.
In some cases, corroboration might not be appropriate. For example, if valid answers include very common words (such as "one" and "two"), these words might occur in many places and be falsely corroborated as likely answers. For this reason, corroboration is turned off for the NUM:count entity type in the default entity extraction JSON file.
You might also want to turn off corroboration if likely answers occur only once in your data set. In these cases, you can modify the entity extraction JSON file to turn off corroboration for particular entity types.
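The general idea behind frequency-based corroboration can be illustrated with a simple count. This is not Answer Server's scoring method; it is a hypothetical sketch that assumes each candidate answer comes with a list of extracted entities, and it ranks entities by how many candidate answers mention them. The function name corroborate is illustrative only.

```python
from collections import Counter


def corroborate(candidates: list[list[str]]) -> list[tuple[str, int]]:
    """Rank entities by how many candidate answers mention them (hypothetical)."""
    support = Counter()
    for entities in candidates:
        # dict.fromkeys drops duplicates within one candidate but keeps their order,
        # so each candidate answer supports a given entity at most once.
        support.update(list(dict.fromkeys(entities)))
    # most_common sorts by count; ties keep first-seen order.
    return support.most_common()


candidates = [
    ["Paris", "one"],
    ["Paris", "one"],
    ["Lyon", "one"],
]
print(corroborate(candidates))
```

In this toy input, the common word "one" outranks the plausible answer "Paris" because it appears in every candidate. This is the kind of false corroboration that the default file avoids by setting corroborate to false for NUM:count.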
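The entity extraction file lists question classifications, which match the values that you use in the classifier training file. For each question classification, it also contains at least one of the following: a list of Eduction entities to extract, or a list of Agentstore databases that contain relevant entities.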
When there is an Agentstore database, you can also specify Agent FieldText to use in a query to the Agentstore entity database for the question classification.
The entity extraction file is a JSON file, with the following structure:
{
  "entity_map": [
    {
      "entity_type": "QuestionClass1",
      "agentstore": {
        "databases": [ListOfAgentstoreDatabases],
        "fieldtext": "FieldTextRestriction"
      },
      "eduction": {"entities": [ListOfEductionEntities]},
      "corroborate": Boolean
    },
    {
      "entity_type": "QuestionClass2",
      "agentstore": {
        "databases": [ListOfAgentstoreDatabases],
        "fieldtext": "FieldTextRestriction"
      },
      "eduction": {"entities": [ListOfEductionEntities]},
      "corroborate": Boolean
    }
    ...
  ]
}
where:
QuestionClassN is the name of the question classification (for example, HUM:ind).
ListOfEductionEntities is an array of relevant Eduction entities.
ListOfAgentstoreDatabases is an array of databases in the Agentstore component that contain relevant entities.
FieldTextRestriction is an IDOL FieldText expression to use to restrict the Agent query in the specified database.
Boolean is true or false, and specifies whether to corroborate candidate answers for the question classification.
You must specify at least one of the eduction or agentstore properties for each question classification. If you specify the agentstore property, the databases property is required, but fieldtext is not.
If you do not want to use entity extraction for a particular question classification, do not include it in the entity extraction file.
The corroborate property is optional. The default value is true.
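The structural rules above can be checked before you restart Answer Server with a small validator like the following. This is a hypothetical helper, not a tool shipped with Answer Server; it checks only the rules stated in this section, and Answer Server might apply further checks of its own.

```python
import json


def validate_entity_map(document: dict) -> list[str]:
    """Return a list of problems found in an entity extraction document."""
    problems = []
    entries = document.get("entity_map")
    if not isinstance(entries, list):
        return ["top-level 'entity_map' array is missing"]
    for index, entry in enumerate(entries):
        name = entry.get("entity_type", f"entry {index}")
        if "entity_type" not in entry:
            problems.append(f"entry {index}: 'entity_type' is missing")
        # At least one of eduction or agentstore is required.
        if "eduction" not in entry and "agentstore" not in entry:
            problems.append(f"{name}: specify 'eduction', 'agentstore', or both")
        # Inside agentstore, databases is required but fieldtext is optional.
        agentstore = entry.get("agentstore")
        if agentstore is not None and "databases" not in agentstore:
            problems.append(f"{name}: 'agentstore' requires 'databases'")
        # corroborate is optional (default true) but must be a JSON boolean.
        if "corroborate" in entry and not isinstance(entry["corroborate"], bool):
            problems.append(f"{name}: 'corroborate' must be true or false")
    return problems


sample = json.loads(
    '{"entity_map": ['
    '{"entity_type": "ENTY:plant", "agentstore": {"fieldtext": "MATCH{PLANTAE}:ORGANISMS_KINGDOM"}},'
    '{"entity_type": "NUM:count", "corroborate": false}'
    ']}'
)
for problem in validate_entity_map(sample):
    print(problem)
```

Both entries in this sample are invalid: the first has an agentstore property without databases, and the second has neither eduction nor agentstore.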
The following example gives some of the question classifications in the default entity extraction file:
{
  "entity_map": [
    {
      "entity_type": "HUM:ind",
      "agentstore": {"databases": ["people"]},
      "eduction": {"entities": ["hum/ind"]}
    },
    {
      "entity_type": "NUM:date",
      "eduction": {"entities": ["num/date", "date/*"]}
    },
    {
      "entity_type": "ENTY:plant",
      "agentstore": {
        "databases": ["organisms"],
        "fieldtext": "MATCH{PLANTAE,VIRIDIPLANTAE}:ORGANISMS_KINGDOM"
      }
    },
    {
      "entity_type": "NUM:count",
      "eduction": {"entities": ["num/count"]},
      "corroborate": false
    },
    ...
  ]
}
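The default entity extraction file, which is included in your Answer Server installation, is appropriate for most installations. However, you might need to modify the file, for example if you create additional Agent entity files for your data, add new question classifications, or want to turn off corroboration for particular entity types.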
To update the entity extraction file
Open the entity extraction JSON file in a text editor.
Make the necessary modifications. You can add, delete, or update the details for any question classification.
To turn off corroboration, add the corroborate property to the entry for a particular question classification and set it to false. For example:
{
  "entity_type": "NUM:count",
  "eduction": {"entities": ["num/count"]},
  "corroborate": false
}
Save and close the entity extraction file.
Restart Answer Server for your changes to take effect.
If you add new question classifications that do not exist in the classifier training file, you must also update the classifier training file and retrain the classifier. See Train Passage Extractor Classifiers.
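If you prefer to script the corroboration change rather than edit the file by hand, a sketch like the following might help. It assumes only that the file follows the structure shown above; the function name set_corroboration is hypothetical, and the demonstration runs on a temporary copy rather than on your installation's file. Because json.dump rewrites the whole file, key order is kept but the original whitespace is not, so keep a backup. You still need to restart Answer Server afterwards.

```python
import json
import os
import tempfile


def set_corroboration(path: str, entity_type: str, enabled: bool) -> None:
    """Set 'corroborate' for one question classification in the file at path.

    Raises KeyError if the classification is not in the file.
    """
    with open(path, encoding="utf-8") as f:
        document = json.load(f)
    for entry in document["entity_map"]:
        if entry["entity_type"] == entity_type:
            entry["corroborate"] = enabled
            break
    else:
        raise KeyError(entity_type)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(document, f, indent=2)
        f.write("\n")


# Demonstration on a temporary file, not a real installation file.
sample = {"entity_map": [{"entity_type": "NUM:count", "eduction": {"entities": ["num/count"]}}]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(sample, tmp)
    path = tmp.name

set_corroboration(path, "NUM:count", False)
with open(path, encoding="utf-8") as f:
    updated = json.load(f)
os.remove(path)  # clean up the temporary file
print(json.dumps(updated, indent=2))
```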
You can use the EntityExtractionFile configuration parameter to set the location of the entity extraction file. If you want to move or rename the entity extraction file, or use a different file for any reason, you must update the value of this parameter to specify the name and location of the new file.