Use Eduction

Eduction is a useful part of pre-index processing, which allows you to create fields with extracted values automatically. If your documents have consistent, well-defined formats, you can set up an Eduction task to extract different parts of the document to different fields. For example, if the documents are letters, you can extract the name, address, and date from each document into pre-defined fields.
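To make the letter example concrete, here is a minimal Python sketch of extracting a name, an address, and a date into pre-defined fields. Eduction defines entities in its own grammar syntax, not Python regular expressions, so the patterns and the field names (`LETTER_DATE`, `LETTER_NAME`, `LETTER_ADDRESS`) are hypothetical stand-ins that only illustrate the idea of mapping each extracted value to a field.

```python
import re

# Hypothetical entity definitions: output field name -> regex with one capture
# group. Real Eduction grammars use their own syntax; these only stand in.
ENTITY_PATTERNS = {
    "LETTER_DATE": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE),
    "LETTER_NAME": re.compile(r"^Dear\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*),", re.MULTILINE),
    "LETTER_ADDRESS": re.compile(r"^Address:\s*(.+?)\s*$", re.MULTILINE),
}


def extract_fields(text: str) -> dict[str, list[str]]:
    """Return a mapping of field name to every value matched in the text."""
    fields: dict[str, list[str]] = {}
    for field, pattern in ENTITY_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:  # only create the field when something was extracted
            fields[field] = matches
    return fields


letter = """Date: 2024-03-15
Address: 12 High Street, Cambridge
Dear Jane Smith,
Thank you for your enquiry."""

print(extract_fields(letter))
```

Because the letters share a consistent layout, a fixed set of patterns is enough to fill the same fields for every document.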

You can also use Eduction to extract information from your data that you want to use as metadata.

Eduction is useful because, after you set it up, it extracts information automatically and consistently. The syntax that you use to define entities is also very expressive.

One use of Eduction is to extract information from the results of Optical Character Recognition (OCR), for example to extract metadata automatically from the recognized text. Another major use of Eduction is Sentiment Analysis.

You can also use Eduction to extract personally identifiable information (PII), so that you can determine what user data you store, for compliance with data protection regulations such as the GDPR. OpenText provides a special package of PII grammars to help you find and extract PII from your data. For more information, refer to the IDOL PII Package Technical Note.
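As a rough illustration of working out what user data you store, the following Python sketch scans a set of documents for e-mail addresses and reports which documents contain them. The regular expression is a deliberately naive, hypothetical stand-in; it is not part of the IDOL PII Package, whose grammars cover many more PII types and formats.

```python
import re

# Naive, hypothetical e-mail pattern used only for illustration; it is not
# one of the IDOL PII Package grammars.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_pii(documents: dict[str, str]) -> dict[str, list[str]]:
    """Map each document reference to the e-mail addresses found in it."""
    report: dict[str, list[str]] = {}
    for reference, text in documents.items():
        found = EMAIL.findall(text)
        if found:  # list only documents that actually contain PII
            report[reference] = found
    return report


docs = {
    "doc1.txt": "Contact jane.smith@example.com for details.",
    "doc2.txt": "No personal data in this memo.",
}
print(find_pii(docs))
```

The resulting report lists only the documents that hold matching values, which is the kind of inventory that data protection requests typically need.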

Choose the Entities to Extract

When you decide how to use Eduction, consider your data and how you want to retrieve information. Include all the entities that you are likely to want to search for, but keep the total number of extracted values to a minimum.

Eduction uses additional indexing resources, and the extra document fields increase both the index size and the time it takes to index data. Extracting only the minimum set of useful entities gives you the most efficient index, which in turn makes retrieval more efficient.
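The trade-off can be sketched in Python by keeping only the entity fields you expect to search on and counting how many values you avoid indexing. The extracted data and field names here are hypothetical. In practice, configuring only the entities you need avoids extracting the others at all; this sketch filters after extraction only to make the difference visible.

```python
# Hypothetical output of an extraction step: field name -> extracted values.
extracted = {
    "PERSON": ["Jane Smith", "John Doe"],
    "DATE": ["2024-03-15"],
    "PHONE": ["01223 000000"],
    "ORGANIZATION": ["Example Ltd"],
}

# Hypothetical set of entities you expect to search on.
WANTED_ENTITIES = {"PERSON", "DATE"}


def select_entities(
    fields: dict[str, list[str]], wanted: set[str]
) -> tuple[dict[str, list[str]], int]:
    """Keep only the wanted fields; also return how many values were dropped."""
    kept = {name: values for name, values in fields.items() if name in wanted}
    dropped = sum(len(values) for name, values in fields.items() if name not in wanted)
    return kept, dropped


kept, dropped = select_entities(extracted, WANTED_ENTITIES)
print(kept)
print(dropped)
```

Every dropped value is one less field value to store and index, which is where the savings in index size and indexing time come from.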

Use Entities

After you extract entities, you can add them to document fields. This process is known as document tagging.

IDOL Server provides several ways to make retrieval more efficient for different types of data. For example, you can apply optimized field properties to your entity fields so that IDOL can filter on and retrieve the entity values more quickly.

How you use the entity tag fields depends partly on the type of entity. If you add each extracted entity to its own field, you can treat each field differently, maximizing the value that you get while minimizing the cost in index size and processing time.
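Document tagging with one field per entity type can be sketched in Python by writing an IDX document, assuming the basic IDX layout of `#DREREFERENCE`, one `#DREFIELD` line per value, `#DRECONTENT`, and `#DREENDDOC`. The field names `PERSON` and `DATE` are hypothetical, and the sketch does not escape double quotes or newlines inside values.

```python
def to_idx(reference: str, content: str, entities: dict[str, list[str]]) -> str:
    """Build one IDX document, writing each entity type to its own field."""
    lines = [f"#DREREFERENCE {reference}"]
    for field, values in entities.items():
        for value in values:
            # Sketch assumption: values contain no double quotes or newlines.
            lines.append(f'#DREFIELD {field}="{value}"')
    lines.append("#DRECONTENT")
    lines.append(content)
    lines.append("#DREENDDOC")
    return "\n".join(lines) + "\n"


print(to_idx("letter-001", "Dear Jane Smith,",
             {"PERSON": ["Jane Smith"], "DATE": ["2024-03-15"]}))
```

Because each entity type lands in its own field, you can configure IDOL Server to handle the `PERSON` and `DATE` fields differently rather than mixing all extracted values into a single field.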