Key IDOL Concepts
The IDOL product family contains many different components, and a large range of features and functions. This section provides some information about the basic concepts in IDOL.
Unstructured Data
Digital data generally falls into two forms. Structured data, such as the contents of a relational database, is well organized and easily searchable by computers. Unstructured data is the more common human-readable information, such as documents, video, audio, and image files. IDOL can manage both forms of data, but its greatest strength is its ability to extract meaning and useful insights from unstructured data.
IDOL Text
IDOL Text is the part of IDOL that provides search, analytics, and data enrichment for unstructured text sources. IDOL Text includes the main text index, which allows you to search your text-based data. It also includes data enrichment such as categorization, data clustering, and entity extraction (Eduction, which finds useful snippets of information such as names and addresses).
IDOL Rich Media
IDOL Rich Media is the part of IDOL that provides analytics and data enrichment for multimedia sources. Rich Media support is provided by IDOL Media Server.
Media Server analyzes video files and streams, images, and audio to extract information about their content. It can run analysis operations such as face recognition, number plate recognition, speech-to-text, and speaker identification.
IDOL KeyView
KeyView is the part of IDOL that processes documents in their native format to extract the text content and export it for easy viewing. It is embedded into other IDOL components, such as NiFi Ingest and the IDOL View component.
In NiFi Ingest, KeyView performs format detection, which reads the file to detect the correct format, so that you can process the file correctly. For text-based file formats, KeyView then finds and extracts the text. It can also extract files from containers (such as ZIP archives, files with embedded images or imported subfiles, or emails with attachments), and process the subfiles.
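The following minimal sketch illustrates the general idea of format detection followed by format-specific processing. It is not the KeyView implementation or API. It assumes a small table of well-known file signatures, and the names `SIGNATURES`, `detect_format`, and `plan_processing` are hypothetical.

```python
# Illustrative only: a tiny signature-based detector, not KeyView itself.
# Each entry pairs the leading bytes of a file with a format label.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),            # ZIP container
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def detect_format(header: bytes) -> str:
    """Return a format label based on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def plan_processing(fmt: str) -> str:
    """Choose a processing step for a detected format (hypothetical step names)."""
    if fmt == "pdf":
        return "extract_text"          # text-based format
    if fmt == "zip":
        return "extract_subfiles"      # container: unpack, then process each subfile
    return "pass_through"              # nothing format-specific to do in this sketch


if __name__ == "__main__":
    for sample in (b"%PDF-1.7\n", b"PK\x03\x04rest", b"plain text"):
        fmt = detect_format(sample)
        print(fmt, plan_processing(fmt))
```

Detecting the format from the file content, rather than from the file extension, means that a misnamed file is still routed to the correct processing step.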
In View, KeyView renders an original document into HTML format, which you can use for easy viewing in a Web browser, for example to provide a document preview in your search application.
KeyView is also available as the following SDKs, which allow you to embed KeyView functionality in your own custom applications:
- The KeyView Filter SDK detects and extracts text content from a variety of files.
- The KeyView Export SDK renders original document formats into HTML or XML for easy Web viewing.
- The KeyView Viewing SDK renders files for viewing in Windows applications.
- The Panopticon SDK allows you to decrypt files that have been protected with Microsoft Azure Rights Management System (RMS).
IDOL Ingest
IDOL Ingest is the part of IDOL that retrieves your data from your various repositories.
You retrieve data from your repositories (such as databases, file systems, Web sites, and email) by using IDOL Connectors. Each connector contacts a repository and retrieves the data that it contains.
Ingest uses an embedded version of KeyView to detect the format of each file, and processes the file accordingly. For example, it might extract text from text-based files, perform Optical Character Recognition (OCR) on images to find text, or use speech-to-text on video to convert audio to text.
In a wider IDOL setup, the ingest components can send the text data to your IDOL index. You can also use IDOL Ingest without an IDOL text index to extract and process content from your repositories.
IDOL Ingest and its connectors are available in two formats:
- IDOL NiFi Ingest is a newer format based on Apache NiFi, where the connectors and ingest component are all available in one place, accessible with a Web user interface. You can use the interface to create complex workflows and manage all your connectors, data enrichment, and document flows.
- Connector Framework Server (CFS) is the older, ACI server-based ingest component (see ACI Servers). In this case, all the connectors are also ACI servers, and CFS performs much of its data enrichment by connecting to other IDOL ACI servers, such as Media Server and Category.
In addition to retrieval, many connectors provide further features. For example, they can allow you to:
- View and browse all the documents in a particular repository.
- Access the repository to view the original documents.
- Retrieve documents and push them to a different repository.
ACI Servers
ACI servers are IDOL components that use a common interface, the ACI API. ACI servers use a Web API framework that allows you to send HTTP requests (known as actions) and receive XML responses.
ACI servers share a lot of common configuration concepts, and many standard actions.
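As a rough illustration of the action and response pattern, the following sketch builds an action URL and reads two fields from an XML response. The host, port, URL layout, sample response, and element names are assumptions made for this example, and the helper names `build_action_url` and `parse_response` are hypothetical. Check the reference for your component for the exact request and response formats.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_action_url(host: str, port: int, action: str, **params: str) -> str:
    """Build the URL for an action (the URL layout here is an assumption)."""
    query = urlencode({"action": action, **params})
    return f"http://{host}:{port}/{query}"


# Made-up sample response for illustration; a real server returns more detail.
SAMPLE_RESPONSE = """\
<autnresponse>
  <action>GETSTATUS</action>
  <response>SUCCESS</response>
  <responsedata/>
</autnresponse>
"""


def parse_response(xml_text: str) -> dict:
    """Extract the action name and status from an XML response string."""
    root = ET.fromstring(xml_text)
    return {
        "action": root.findtext("action"),
        "response": root.findtext("response"),
    }


if __name__ == "__main__":
    # Hypothetical host and port; no request is actually sent in this sketch.
    print(build_action_url("localhost", 9100, "Query", Text="solar power"))
    print(parse_response(SAMPLE_RESPONSE))
```

Because every ACI server follows the same request and response pattern, a small helper like this could be reused across components, with only the host, port, and action parameters changing.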
IDOL Server
IDOL Server refers to a set of components that together perform most of the IDOL Text Analytics back end functionality.
The following components are part of IDOL Server:
- Content. Text indexing and query.
- Category. Categorization and clustering.
- Community. User management and security.
- View. HTML document viewing for Web browsers.
IDOL Server also includes an Agentstore component, which is a separately configured Content component for storing agents and categories.
In very simple training and testing environments, you can use these components in a unified form with an IDOL Proxy component. In this case you send all actions to IDOL Proxy, which distributes them to the appropriate component.
In production environments, OpenText recommends that you send actions directly to the appropriate component, without using IDOL Proxy. The component-based setup improves scalability and reduces overhead, as well as making it easier to maintain and configure your system.
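The difference between the two setups can be pictured as a routing table. In the sketch below, a single entry point looks up the component for each action, which is conceptually what a proxy does, whereas in a component-based setup the client already knows the endpoint. The mapping of actions to components and the ports are illustrative assumptions, and `ROUTES`, `ENDPOINTS`, and `resolve` are hypothetical names.

```python
# Illustrative mapping of action names to components (assumed for this sketch).
ROUTES = {
    "query": "Content",
    "categoryquery": "Category",
    "userread": "Community",
    "view": "View",
}

# Hypothetical host and port for each component.
ENDPOINTS = {
    "Content": ("localhost", 9100),
    "Category": ("localhost", 9020),
    "Community": ("localhost", 9030),
    "View": ("localhost", 9080),
}


def resolve(action: str) -> tuple:
    """Return (component, host, port) for an action, as a proxy-style lookup."""
    component = ROUTES.get(action.lower())
    if component is None:
        raise ValueError(f"no component registered for action {action!r}")
    host, port = ENDPOINTS[component]
    return component, host, port


if __name__ == "__main__":
    print(resolve("Query"))
    print(resolve("UserRead"))
```

In a component-based setup, the client application holds the equivalent of `ENDPOINTS` in its own configuration, so each request goes straight to the right component without the extra hop.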
Distributed Systems
It is often advisable or useful to install an IDOL system across multiple servers. This kind of setup is known as a distributed system.
For IDOL Text, there are two ACI server components that manage indexing and querying over a distributed index. These are the Distributed Index Handler (DIH) and Distributed Action Handler (DAH).
You can also use other common networking tools to distribute the stateless parts of your IDOL system, for example for load-balancing.
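To make the idea of a distributed index more concrete, the following toy sketch spreads documents across several child engines and then merges query results from all of them. It is a conceptual model only, not how DIH or DAH are implemented or configured. All class names are hypothetical, and the round-robin distribution and term-count scoring are assumptions chosen to keep the example short.

```python
from itertools import cycle


class ChildEngine:
    """A stand-in for one index in the distributed system."""

    def __init__(self, name: str):
        self.name = name
        self.docs = {}

    def index(self, reference: str, text: str) -> None:
        self.docs[reference] = text

    def query(self, term: str) -> list:
        """Return (reference, score) pairs; score is a simple term count."""
        term = term.lower()
        hits = []
        for reference, text in self.docs.items():
            score = text.lower().split().count(term)
            if score:
                hits.append((reference, score))
        return hits


class IndexDistributor:
    """Sends each document to the next child engine in turn (round-robin)."""

    def __init__(self, children: list):
        self._next_child = cycle(children)

    def index(self, reference: str, text: str) -> str:
        child = next(self._next_child)
        child.index(reference, text)
        return child.name


class QueryDistributor:
    """Sends a query to every child engine and merges the results."""

    def __init__(self, children: list):
        self.children = children

    def query(self, term: str) -> list:
        merged = []
        for child in self.children:
            merged.extend(child.query(term))
        # Highest score first; ties are ordered by reference for stability.
        return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    engines = [ChildEngine("a"), ChildEngine("b")]
    indexer = IndexDistributor(engines)
    indexer.index("doc1", "solar power solar")
    indexer.index("doc2", "wind power")
    indexer.index("doc3", "solar farm")
    print(QueryDistributor(engines).query("solar"))
```

The client sees a single index and a single result list, while the indexing and query work is shared between the child engines.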
IDOL Security
IDOL provides methods to secure access to your data, and communications between different users and components.
- User authentication. At the front end, users must log on before they can query IDOL. IDOL can authenticate users against an existing directory service, such as Microsoft Active Directory.
- Document security. Many data repositories have security features that apply permissions to files, so that only authorized users can view them. IDOL Document security maintains these access restrictions in your IDOL index, so that queries return only documents that the logged-in user has permission to view.
- Index encryption. You can encrypt your IDOL text index on disk, to prevent unauthorized access. For more information about index encryption, refer to the IDOL Content Component Help.
- Secure communications. IDOL supports TLS/SSL for secure communications. IDOL also supports GSSAPI for authentication and secure communications.
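The document security behavior described above can be pictured as a permission check on each query result. The following sketch is a simplified model, not the IDOL implementation. It assumes that each indexed document carries the set of groups allowed to view it, and the names `Document` and `visible_results` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    reference: str
    allowed_groups: frozenset   # groups permitted to view the document (assumed model)


def visible_results(results: list, user_groups: set) -> list:
    """Keep only the documents that share at least one group with the user."""
    groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & groups]


if __name__ == "__main__":
    results = [
        Document("hr/salaries.xlsx", frozenset({"hr"})),
        Document("wiki/onboarding.html", frozenset({"hr", "engineering"})),
        Document("eng/design.docx", frozenset({"engineering"})),
    ]
    for doc in visible_results(results, {"engineering"}):
        print(doc.reference)
```

Storing the access information with each indexed document is what allows the check to happen at query time, without contacting the original repository for every result.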