Per-Language Sentence-Breaking Files
For languages in which words are not delimited by spaces (Japanese, Chinese, Thai, and Korean), the IDOL Content component uses sentence-breaking libraries. In a default IDOL Content component installation, these files are stored in the IDOL/langfiles directory.
If you run Content on a UNIX platform, set the LD_LIBRARY_PATH environment variable to include the directory that contains the sentence-breaking files, so that Content can find the files that it requires.
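As an illustration only, the following Python sketch builds an environment in which a sentence-breaking directory is placed at the front of LD_LIBRARY_PATH. The function name env_with_langfiles and the path /opt/IDOL/langfiles are hypothetical; substitute the langfiles directory of your own installation.

```python
import os


def env_with_langfiles(langfiles_dir, base_env=None):
    """Return a copy of the environment with langfiles_dir first in LD_LIBRARY_PATH.

    langfiles_dir is an assumed location; use the directory from your installation.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is a colon-separated list of directories on UNIX.
    others = [p for p in existing.split(":") if p and p != langfiles_dir]
    env["LD_LIBRARY_PATH"] = ":".join([langfiles_dir] + others)
    return env


if __name__ == "__main__":
    # Hypothetical path, shown only to demonstrate the resulting value.
    env = env_with_langfiles("/opt/IDOL/langfiles", {"LD_LIBRARY_PATH": "/usr/lib"})
    print(env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as the env argument of subprocess.run when you start Content from a script. Existing entries are kept, so other libraries remain reachable.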
The following tables list the files that the individual languages require.
- Japanese
NT                      UNIX
japanesebreaking.dll    japanesebreaking.so
\jpn-cha\cforms.cha     /jpn-cha/cforms.cha
\jpn-cha\chadic.da      /jpn-cha/chadic.da
\jpn-cha\chadic.lex     /jpn-cha/chadic.lex
\jpn-cha\chasenrc       /jpn-cha/chasenrc
\jpn-cha\connect.cha    /jpn-cha/connect.cha
\jpn-cha\ctypes.cha     /jpn-cha/ctypes.cha
\jpn-cha\grammar.cha    /jpn-cha/grammar.cha
\jpn-cha\matrix.cha     /jpn-cha/matrix.cha
\jpn-cha\table.cha      /jpn-cha/table.cha
libchasen.dll
- Traditional Chinese
NT                      UNIX
chinesebreaking.dll     chinesebreaking.so
big5togb.txt            big5togb.txt
wordlist.txt            wordlist.txt
chineseconvlist.txt     chineseconvlist.txt
- Simplified Chinese
NT                      UNIX
chinesebreaking.dll     chinesebreaking.so
big5togb.txt            big5togb.txt
wordlist.txt            wordlist.txt
chineseconvlist.txt     chineseconvlist.txt
- Thai
NT                      UNIX
thaibreaking.dll        thaibreaking.so
thaidict.txt            thaidict.txt
thaiconvlist.txt        thaiconvlist.txt
- Korean
NT                      UNIX
koreanbreaking.dll      koreanbreaking.so
main.dat                main.dat
prob.dat                prob.dat
main.fst                main.fst
prob.fst                prob.fst
pos.nam                 pos.nam
tag.nam                 tag.nam
tagout.nam              tagout.nam
connection.txt          connection.txt
StopPosNam.txt          StopPosNam.txt
TagName.txt             TagName.txt
koreanconvlist.txt      koreanconvlist.txt
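
The file lists above can be turned into a simple presence check. The following Python sketch is not part of IDOL; the names REQUIRED_UNIX_FILES and missing_files are hypothetical, and it assumes that the UNIX paths in the tables are relative to the langfiles directory.

```python
import os

# UNIX file names copied from the tables above.
# Assumption: each path is relative to the langfiles directory.
_CHINESE = ["chinesebreaking.so", "big5togb.txt", "wordlist.txt",
            "chineseconvlist.txt"]

REQUIRED_UNIX_FILES = {
    "japanese": [
        "japanesebreaking.so",
        "jpn-cha/cforms.cha", "jpn-cha/chadic.da", "jpn-cha/chadic.lex",
        "jpn-cha/chasenrc", "jpn-cha/connect.cha", "jpn-cha/ctypes.cha",
        "jpn-cha/grammar.cha", "jpn-cha/matrix.cha", "jpn-cha/table.cha",
    ],
    "traditional chinese": _CHINESE,
    "simplified chinese": _CHINESE,
    "thai": ["thaibreaking.so", "thaidict.txt", "thaiconvlist.txt"],
    "korean": [
        "koreanbreaking.so", "main.dat", "prob.dat", "main.fst", "prob.fst",
        "pos.nam", "tag.nam", "tagout.nam", "connection.txt",
        "StopPosNam.txt", "TagName.txt", "koreanconvlist.txt",
    ],
}


def missing_files(langfiles_dir, language):
    """Return the files listed for `language` that are not in langfiles_dir."""
    required = REQUIRED_UNIX_FILES[language.lower()]
    return [name for name in required
            if not os.path.isfile(os.path.join(langfiles_dir, name))]
```

For example, missing_files("IDOL/langfiles", "Thai") returns an empty list when all three Thai files are present, and otherwise returns the names of the files that could not be found.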