This parameter specifies how to normalize Chinese, Japanese, and Korean (CJK) data before extraction.
You can set the following values:
Kana. Normalize half-width kana to full-width kana.
OldNew. Normalize old kanji to new kanji.
Number. Normalize Chinese or kanji number characters to ASCII number characters.
HWNum. Normalize full-width number characters to ASCII number characters.
HWAlpha. Normalize full-width alphabet characters to ASCII alphabet characters.
SimpChi. Normalize traditional Chinese to simplified Chinese.
FWJamo. Normalize half-width jamo to full-width jamo.
Separate multiple options with commas.
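To make the width-related options more concrete, the following Python sketch shows roughly what this kind of normalization does to text. It is an illustration only, not the product's implementation: the helper names `hw_num`, `hw_alpha`, `kana`, and `normalize` are hypothetical, and the exact character coverage of the real options is an assumption. The script-conversion options (OldNew, Number, SimpChi, FWJamo) need mapping tables and are not covered.

```python
import re
import unicodedata

# Hypothetical helpers: the names echo the option names for readability only.
# Full-width ASCII variants sit exactly 0xFEE0 above their ASCII counterparts.
_FULLWIDTH_OFFSET = 0xFEE0


def hw_num(text: str) -> str:
    """Map full-width digits (U+FF10..U+FF19) to ASCII digits."""
    return "".join(
        chr(ord(ch) - _FULLWIDTH_OFFSET) if "\uff10" <= ch <= "\uff19" else ch
        for ch in text
    )


def hw_alpha(text: str) -> str:
    """Map full-width Latin letters (U+FF21..U+FF3A, U+FF41..U+FF5A) to ASCII."""
    return "".join(
        chr(ord(ch) - _FULLWIDTH_OFFSET)
        if "\uff21" <= ch <= "\uff3a" or "\uff41" <= ch <= "\uff5a"
        else ch
        for ch in text
    )


# Runs of half-width katakana, including the half-width voicing marks.
_HALF_KANA = re.compile("[\uff65-\uff9f]+")


def kana(text: str) -> str:
    """Convert half-width kana runs to full-width, applying NFKC to those runs only."""
    return _HALF_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group(0)), text
    )


_STEPS = {"HWNum": hw_num, "HWAlpha": hw_alpha, "Kana": kana}


def normalize(text: str, setting: str) -> str:
    """Apply each comma-separated option in order, as in a value like 'HWNum,Kana'."""
    for name in setting.split(","):
        text = _STEPS[name.strip()](text)
    return text
```

For example, `normalize("ＩＤ１２ ｶﾅ", "HWNum,HWAlpha,Kana")` applies the three helpers in turn. Restricting NFKC to the half-width kana runs keeps the sketch from altering other characters that full NFKC normalization would also change.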
Type: | String |
Default: | None |
Required: | No |
Configuration Section: | Eduction |
Example: | CJKNormalization=SimpChi,Kana |