CJKNormalization

Specifies how to normalize Chinese, Japanese, and Korean (CJK) text before extraction.

You can set the following values:

  • Kana. Normalize half-width kana to full-width kana.
  • OldNew. Normalize old kanji forms to new kanji forms.
  • Number. Normalize Chinese or kanji numeral characters to ASCII digits.
  • HWNum. Normalize full-width digits to ASCII digits.
  • HWAlpha. Normalize full-width alphabetic characters to ASCII letters.
  • SimpChi. Normalize traditional Chinese to simplified Chinese.
  • FWJamo. Normalize half-width jamo to full-width jamo.
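
To make the width-related options above concrete, the following Python sketch shows the kind of character folding that Kana, HWNum, and HWAlpha describe. It is an illustration only, not the product's implementation: the function names are hypothetical, and the use of Unicode NFKC for kana is an assumption about what "full-width kana" means here. The product may treat edge cases differently.

```python
import re
import unicodedata

# Full-width ASCII variants sit at a fixed offset from their ASCII counterparts.
_FW_OFFSET = 0xFEE0

# Half-width katakana, including the half-width voiced and semi-voiced marks.
_HALFWIDTH_KANA = re.compile("[\uFF66-\uFF9F]+")


def fold_fullwidth_digits(text):
    """Illustrates HWNum: map full-width digits (U+FF10..U+FF19) to ASCII 0-9."""
    return "".join(
        chr(ord(c) - _FW_OFFSET) if "\uFF10" <= c <= "\uFF19" else c
        for c in text
    )


def fold_fullwidth_letters(text):
    """Illustrates HWAlpha: map full-width A-Z and a-z to ASCII letters."""
    return "".join(
        chr(ord(c) - _FW_OFFSET)
        if "\uFF21" <= c <= "\uFF3A" or "\uFF41" <= c <= "\uFF5A"
        else c
        for c in text
    )


def widen_halfwidth_kana(text):
    """Illustrates Kana: widen runs of half-width katakana using NFKC.

    NFKC is applied only to the matched runs, so other characters are
    left untouched. A base kana followed by a half-width voiced mark is
    composed into a single full-width character.
    """
    return _HALFWIDTH_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )
```

Keeping each mapping in its own function mirrors the way the options can be enabled independently.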

To apply more than one option, separate the values with commas.

Type: String
Default: None
Required: No
Configuration Section: Any section that you have defined for Eduction settings.

Example: CJKNormalization=SimpChi,Kana