Case Sensitive Matches
By default, Eduction matches characters case sensitively, which has better performance than case insensitive matching. When you require case insensitive matching, there are several ways to configure it:
- Configure MatchCase.
- Configure individual grammars, entities, and entries with case sensitivity options, in a custom grammar file.
- Use case normalization.
Micro Focus recommends that you always create and use Eduction grammars that allow you to do case sensitive matching, because it has better performance. Most of the standard grammars come with entities using common and appropriate case styles. Some also have different entities for different case styles. If your data uses a consistent case, it is unlikely that you need to use case insensitive matching.
Configure MatchCase
The simplest way to turn off case sensitivity is to set the MatchCase configuration parameter to False in the configuration file.
If you run Eduction with MatchCase=False, entities are optimized for case-insensitive matching when they are loaded. This can increase the time required to initialize Eduction, so if you regularly use a grammar file with MatchCase=False, you can optimize the entities for case-insensitive matching at compile time instead. See Optimize Case-Insensitive Matching.
MatchCase applies to all matches, so it can have a significant performance impact. In general, Micro Focus recommends that you use one of the other options to enable case insensitive matching. For the best performance, write a grammar file using only upper or lower case and then normalize your input (see Case Normalization).
Configure Custom Grammars
When you create your own custom XML grammar files, you can configure each grammar, entity, and entry individually to be case sensitive or case insensitive.
When you configure case sensitivity at a lower level, it overrides the higher level settings. Additionally, if you reference the entity in another entity, it maintains its own case sensitivity setting.
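The override rule can be pictured with a small sketch. This is only an illustration of the precedence described above, not Eduction code: the function name and its arguments are hypothetical, and it assumes that the MatchCase configuration parameter acts as the fallback when no level sets case sensitivity explicitly.

```python
from typing import Optional


def effective_match_case(
    match_case_default: bool,
    grammar: Optional[bool] = None,
    entity: Optional[bool] = None,
    entry: Optional[bool] = None,
) -> bool:
    """Return the case sensitivity that applies to a single entry.

    Hypothetical helper for illustration only. True means case sensitive,
    False means case insensitive, and None means "not set at this level".
    """
    # Check the most specific level first: entry, then entity, then grammar.
    for setting in (entry, entity, grammar):
        if setting is not None:
            return setting
    # Nothing is set explicitly, so use the global default
    # (assumed here to be the MatchCase configuration parameter).
    return match_case_default
```

For example, an entity marked case sensitive inside a case insensitive grammar keeps its own setting: `effective_match_case(True, grammar=False, entity=True)` returns `True`.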
Most entities in the standard grammars do not have case sensitivity set explicitly, giving you the flexibility to use case sensitivity as required in your grammars.
NOTE: If you design an entity for case insensitive matching, it is important that entries in the entity have a consistent case style to ensure that all matches are extracted correctly. You should use all lower case, all upper case, or all initial capitals, but not a mixture.
Eduction uses an optimization technique for case insensitive matching that might not extract every possible match if you do not define the entity consistently.
Case Normalization
Case sensitive matching generally has better performance than case insensitive matching. When you require case insensitive matching, you can use case normalization to give the same performance as case-sensitive matching.
When you want to use case normalization:
- Do not set case sensitivity explicitly in grammars and entities.
- Set the MatchCase configuration parameter to True.
- Create all entries in your entities in either all lower case, or all upper case.
- Set CaseNormalization to:
  - LOWER if all your entities are lower case.
  - UPPER if all your entities are upper case.
Eduction normalizes the input data accordingly before the (case sensitive) matching. This process means that both your input and grammars are all in the same case, so the matching is effectively case insensitive, with the performance benefits of case sensitive matching.
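The idea behind case normalization can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or reproduce the Eduction engine: the entry list, the `normalize` and `extract` functions, and the use of a regular expression as the case sensitive matcher are all assumptions made for the example.

```python
import re

# Hypothetical entity: every entry is written in lower case.
ENTRIES = ["london", "new york", "paris"]

# A case sensitive pattern built from the lower-case entries.
PATTERN = re.compile("|".join(re.escape(entry) for entry in ENTRIES))


def normalize(text: str, mode: str) -> str:
    """Convert the input to a single case, mirroring the LOWER and UPPER options."""
    if mode == "LOWER":
        return text.lower()
    if mode == "UPPER":
        return text.upper()
    raise ValueError(f"unsupported normalization mode: {mode}")


def extract(text: str, mode: str = "LOWER") -> list:
    """Normalize the input, then run ordinary case sensitive matching on it."""
    normalized = normalize(text, mode)
    return [match.group(0) for match in PATTERN.finditer(normalized)]
```

With these definitions, `extract("Flights from LONDON to New York")` finds both place names, while running `PATTERN` directly against the original mixed-case text finds none. The matcher itself never performs a case insensitive comparison, which is where the performance benefit comes from.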
For more information about these configuration parameters, see CaseNormalization and MatchCase.