Introduction

Language identification determines which language is being spoken in audio. You can run either closed-set or open-set language identification:

  • Closed-set identification is the default mode. Media Server expects the language to be one of a list that you specify with the Languages parameter when you configure the analysis task. If you know which languages are likely to occur, OpenText recommends setting the Languages parameter, because restricting the possible languages can increase accuracy and improve performance. If the ingested audio includes speech in a language that is not in the list, Media Server still attempts to identify it as one of the listed languages.
  • Open-set identification allows Media Server to decide that speech is in an unknown language. If you know which languages are likely to occur, you should still set the Languages parameter to limit the possible options. Be aware that open-set identification might label speech in a known language as unknown. OpenText recommends that you use open-set identification only if you need to recognize an unknown language as unknown, or if you want to determine whether the speech is in a particular language and do not need to identify other languages.
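
The difference between the two modes can be illustrated with a small conceptual sketch. This is not Media Server code and does not describe how Media Server scores audio internally. The function identify_language, the per-language scores, and the threshold value are all hypothetical, chosen only to show why closed-set identification always returns a listed language while open-set identification can return "unknown".

```python
from typing import Dict, Iterable, Optional

UNKNOWN = "unknown"


def identify_language(
    scores: Dict[str, float],
    languages: Optional[Iterable[str]] = None,
    open_set: bool = False,
    threshold: float = 0.5,
) -> str:
    """Pick a language label from hypothetical per-language scores.

    scores:    illustrative confidence values per language (assumed input).
    languages: optional list restricting the candidates, analogous in
               spirit to the Languages parameter described above.
    open_set:  if True, a weak best match is reported as "unknown".
    threshold: hypothetical cut-off used only in open-set mode.
    """
    if languages is not None:
        allowed = set(languages)
        candidates = {lang: s for lang, s in scores.items() if lang in allowed}
    else:
        candidates = dict(scores)

    if not candidates:
        raise ValueError("no candidate languages to choose from")

    # Closed-set behavior: always return the best-scoring candidate.
    best = max(candidates, key=candidates.get)

    # Open-set behavior: reject the best candidate if it is not convincing.
    if open_set and candidates[best] < threshold:
        return UNKNOWN
    return best


# Hypothetical scores for a clip that is actually in German.
clip_scores = {"english": 0.30, "french": 0.25, "german": 0.45}

closed = identify_language(clip_scores, languages=["english", "french"])
opened = identify_language(
    clip_scores, languages=["english", "french"], open_set=True
)
```

With these made-up scores, the closed-set call is forced to choose one of the two listed languages even though neither matches well, whereas the open-set call rejects the weak match. Raising the hypothetical threshold makes more known-language speech fall into "unknown", which mirrors the trade-off described above.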