Media Server

Media Server is an ACI server. For details of changes that affect all ACI servers, see ACI Server Framework.

23.2.2

Media Server 23.2.2 resolves an issue that prevented more than one speech-to-text model version from being loaded for the same language. For example, it was not possible to run tasks that used both the small and large models to process ENUK speech.
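The following session configuration fragment illustrates what the fix allows: two speech-to-text tasks that process ENUK speech with different model versions. It is a sketch only. The task names, the EngineN entries in [Session], Type=Video, and Type=SpeechToText are assumptions rather than values taken from these notes. ModelVersion, LanguagePack, and the small and large values are described in the 23.2.0 notes.

```ini
; Sketch only: task names, EngineN entries, and Type values are
; assumptions; check Media Server Help for the exact values.
[Session]
Engine0=VideoIngest
Engine1=SpeechFast
Engine2=SpeechAccurate

[VideoIngest]
; Assumed ingest engine type
Type=Video

[SpeechFast]
Type=SpeechToText
LanguagePack=ENUK
; Fastest of the new models
ModelVersion=small

[SpeechAccurate]
Type=SpeechToText
LanguagePack=ENUK
; Most accurate of the new models
ModelVersion=large
```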

23.2.0

New Features

  • Media Server supports new speech-to-text models that offer significantly better accuracy, especially for English speech. The speech-to-text analysis engine has a new configuration parameter, ModelVersion. The default value (ModelVersion=legacy) uses the same models as Media Server 12.x. To use one of the new speech-to-text models, set this parameter to either small (the fastest of the new models) or large (the most accurate). Custom language models and custom word dictionaries are neither necessary nor supported with the new models, because the vocabulary of the new models is not limited by their training. Due to their size, the new models are not included in the Media Server package and must be downloaded separately.
  • Media Server can perform speaker clustering, which segments an audio recording by speaker. There is a new analysis engine for this (Type=ClusterSpeech). Speaker clustering does not require training, but it does require an appropriate speech-to-text language pack to be installed.
  • Transcript alignment (action=AlignAudioTranscript) has a new parameter named IngestDateTime. You can use this to configure the start time for the timestamps. For example, set this parameter when you want the timestamps to match the time when the video was broadcast.
  • Face recognition accuracy has been significantly improved.
  • OCR accuracy has been improved when processing high resolution images in scene mode.
  • The OCR WordResult track supports scrolling text (often seen below television news broadcasts).
  • Strict action validation has been extended so that it also validates any session configuration that you pass to a process action. When strict action validation is enabled, the process action fails immediately if the configuration includes unknown parameters, or references a configuration section that contains no parameters.
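The fragment below sketches a session configuration that uses the new speaker clustering engine and that strict action validation would reject for the two reasons described above. The task names, the EngineN entries, Type=Video, Type=OCR, and the parameter UnknownSetting are hypothetical and exist only to illustrate the failure cases.

```ini
; Hypothetical session configuration passed to a process action.
[Session]
Engine0=VideoIngest
Engine1=Speakers
Engine2=TextOCR

[VideoIngest]
Type=Video

[Speakers]
; Speaker clustering engine (new in 23.2.0). It needs an installed
; speech-to-text language pack; see Media Server Help for how to select it.
Type=ClusterSpeech
; Hypothetical unknown parameter: with strict action validation, the
; process action fails immediately.
UnknownSetting=TRUE

; Referenced by Engine2 but contains no parameters: this also causes
; the process action to fail under strict action validation.
[TextOCR]
```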

Resolved Issues

  • Media Server could terminate unexpectedly when running Optical Character Recognition (OCR).
  • Some temporary files produced by KeyView were written to the system temporary directory, rather than the location specified by the TempDir parameter in the [Paths] section of the configuration file. Some temporary files were not deleted when no longer required.

Notes

  • The older session configuration format, which was replaced in Media Server 12.0, is no longer supported. Media Server 23.2 only reads the list of processing tasks from the [Session] section of the configuration file. The older section names, such as [Ingest], [Analysis], [Transform], [Encoding], and [Output], are now ignored.
  • Media Server no longer supports the alias Image_1 for the Default_Image track. In your session configuration files, replace Image_1 with Default_Image. This change does not affect fully-qualified track names such as TaskName.Image_1, where TaskName is the name of an ingest task.
  • The deprecated parameters Language and CustomLM have been removed from the speech-to-text analysis engine. Use the LanguagePack and CustomLanguageModel parameters instead.
  • Strict action validation has been enabled by default. If you prefer to disable this, set the configuration parameter StrictActionValidation=FALSE in the [Server] section of the configuration file.
  • Records in the OCR WordData and WordResult tracks now have a unique UUID for each word. In earlier versions of Media Server, all of the words on the same line had the same UUID. The parentID field of each word record now provides the UUID of the parent line.
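The configuration changes in these notes can be sketched as a single before-and-after example. The task names, the EngineN entries, the Type and Input parameter names, and the ENUK value are assumptions for illustration. The Image_1 to Default_Image change, the Language to LanguagePack change, and the StrictActionValidation setting come from the notes above.

```ini
; Sketch of a Media Server 23.2 session configuration. Tasks are listed
; only in [Session]; task names and EngineN/Type/Input entries are assumed.
[Session]
Engine0=VideoIngest
Engine1=TextOCR
Engine2=Transcribe

[VideoIngest]
Type=Video

[TextOCR]
Type=OCR
; Before 23.2: Input=Image_1 (alias no longer supported)
Input=Default_Image

[Transcribe]
Type=SpeechToText
; Before 23.2: Language=ENUK (parameter removed)
LanguagePack=ENUK

; Server configuration file: only needed to opt out of strict action
; validation, which is now enabled by default.
[Server]
StrictActionValidation=FALSE
```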

Deprecated Features

The following features are deprecated and might be removed in a future release.

Category | Deprecated Feature | Deprecated Since
Speech processing | The SampleFrequency parameter, for speech-to-text and for the AlignAudioTranscript action, has been deprecated. Media Server can now determine the correct sample frequency from the language pack. | 23.2.0
Transcript alignment | The output of the AlignAudioTranscript action has been updated. The startTime, duration, and endTime fields (which provide timestamps in milliseconds from the beginning of the file) have been deprecated. There is now a timestamp field that provides timestamp information in both ISO 8601 format and in epoch microseconds. | 23.2.0
Face detection | In the XML output from face detection there are fields named outOfPlaneAngleX, outOfPlaneAngleY, and percentageInImage. The macros and Lua table entries that were available to access these data were named outofplaneanglex, outofplaneangley, and percentageinimage (note the difference in case). Media Server has been updated so that the macros and Lua table entries are consistent with the XML output. The all-lowercase names have been deprecated and will be removed in a future release. | 23.2.0
OCR | The Blacklist and Whitelist configuration parameters have been deprecated. Use the parameters DisabledCharacters and ExtraEnabledCharacters instead. | 23.2.0
Actions | The GetLatestRecord action. The new actions KeepLatestRecords and GetLatestRecords provide more control over what to store and retrieve. | 12.5.0
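A sketch of updating task sections for the deprecated speech and OCR parameters follows. The section names and Type values are assumptions, and the angle-bracket values are placeholders rather than real syntax. Check Media Server Help for the value format that DisabledCharacters and ExtraEnabledCharacters accept, because it might not match Blacklist and Whitelist exactly.

```ini
; Illustrative task sections; names and Type values are assumptions.
[Transcribe]
Type=SpeechToText
LanguagePack=ENUK
; Deprecated since 23.2.0. Media Server determines the sample frequency
; from the language pack, so this line can be removed:
; SampleFrequency=<value>

[TextOCR]
Type=OCR
; Deprecated since 23.2.0:
; Blacklist=<characters>
; Whitelist=<characters>
; Replacement parameters (placeholder values):
DisabledCharacters=<characters to disable>
ExtraEnabledCharacters=<characters to enable>
```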

Supported Platforms

Media Server 23.2.0 is supported on the following platforms.

Windows (x86-64)

  • Windows Server 2022
  • Windows Server 2019
  • Windows Server 2016
  • Windows Server 2012

Linux (x86-64)

The minimum supported versions of particular distributions are:

  • Red Hat Enterprise Linux (RHEL) 7
  • CentOS 7
  • SUSE Linux Enterprise Server (SLES) 12
  • Ubuntu 14.04
  • Debian 8

Documentation

The following documentation is available for Media Server version 23.2.0.

  • Media Server Help