Glossary
-
Autonomy Content Infrastructure. ACI is the method that Autonomy components use to communicate with each other.
-
A server component that runs on the Autonomy Content Infrastructure (ACI).
-
A request sent to an ACI Server.
-
A process that searches for information about a specific topic, according to a training query or text.
-
Adaptive Probabilistic Concept Modeling. A statistical method that Knowledge Discovery uses to calculate query result weights and ranks.
-
A set of criteria that define a particular topic, which you can use to categorize documents that contain content relevant to the topic.
-
An automatically identified set of related documents.
-
A group of users with related interests or expertise.
-
Autonomy Connectors extract content from source repositories and send it to CFS for processing and indexing.
-
CFS processes data from connectors and sends it to the Content component for indexing.
-
A Content component database is a data pool that stores indexed information. You can restrict queries to a particular database.
-
DAH distributes actions to multiple versions of a Knowledge Discovery component. It allows you to use failover, load balancing, or distributed content.
-
DIH distributes index actions to multiple versions of the Content component. It allows you to copy or distribute your content.
-
The process of extracting entities (patterns of text) from documents.
-
A word, phrase, or block of information that the Eduction component can match and extract from documents.
-
In Eduction, an entity is a word, phrase, or block of information. An entity can be a specific text string, such as a name, or it can be a pattern of text such as an address or phone number. You define the pattern in a grammar, which Eduction uses to find the entities in documents.
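As a rough illustration of matching an entity by pattern rather than by fixed string, the sketch below uses a Python regular expression in place of a grammar. It does not use Eduction or its grammar format; the pattern, the function name, and the sample text are all assumptions made for the example.

```python
import re

# Hypothetical pattern standing in for a grammar: a simple
# three-three-four digit phone number shape (an assumption).
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_entities(text):
    """Return every substring of the text that matches the pattern."""
    return PHONE_PATTERN.findall(text)


matches = extract_entities("Call 555-123-4567 or 555-987-6543 after noon.")
print(matches)
```

A fixed-string entity, such as a name, would simply be a pattern with no variable parts.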
-
The process of extracting text, metadata, and subfiles from documents. File Content Extraction performs this extraction process in Knowledge Discovery.
-
A group of settings that instruct a connector how to retrieve data from a repository. Connectors can run fetch tasks automatically, or in response to an action.
-
Fields define different parts of content in Knowledge Discovery documents, such as the title, content, and metadata information.
-
A product that extracts data, including text, metadata, and subfiles from over 1,000 different file types.
-
In Eduction, a grammar is a pattern that defines an entity.
-
An encoded value that identifies the source of a document in Knowledge Discovery. Connectors and CFS add identifiers to every document that they create for indexing into the Content component. They store this value in the AUTN_IDENTIFIER field.
-
A format of documents that the Content component uses for indexing.
-
Importing is the process where CFS, using File Content Extraction, extracts metadata, content, and subfiles from items retrieved by a connector. CFS adds the information to documents so that it is indexed into the Content component. Importing allows Content to use the information in a repository, without needing to process the information in its native format.
-
A processing task run by CFS on new documents before they are indexed into the Content component. Import tasks can run before or after (pre- or post-) File Content Extraction filtering.
-
The data index contains document content and field information for analysis and retrieval.
-
A processing task run by CFS on documents. Update-index tasks run when a document's metadata (but not its content) is updated. Delete-index tasks run when a document is deleted from a repository.
-
The process of adding information to the Content component index. This process includes all linguistic processing, storing information for optimized retrieval, and storing document content.
-
Ingestion converts information that exists in a repository into documents that can be indexed into the Content component. Ingestion starts when a connector finds new documents in a repository, or documents that have been updated or deleted, and sends this information to CFS. Ingestion includes the import process, and processing tasks that can modify and enrich the information in a document.
-
The Knowledge Discovery data platform allows you to convert your source content to an index of useful data that you can search and analyze.
-
An embedded scripting language that you can use to write custom scripts to expand certain Knowledge Discovery functionality.
-
Information about a user, based on the concepts in documents that they access. The Community component automatically creates profiles for users, according to their interests.
-
A search string that you can use to retrieve information from the Content component that matches your syntax. Queries include simple conceptual queries, and complicated strings with Boolean, proximity, and field restriction expressions.
-
A server that manipulates incoming searches to modify the query, modify the results set, or return promotions.
-
A string that is used to identify a document. This might be a title or a URL, and allows the Content component to identify documents for retrieval, indexing, and deduplication.
-
Security includes anything that makes sure that only authorized users can access or perform actions on data. It includes making sure that only permitted users can view and retrieve documents, user authentication, and secure communications.
-
In clustering with snapshots, a seed is a potential cluster. It contains a document, and suggested conceptually similar documents from the Content component index.
-
A form of Eduction that identifies positive and negative sentiment in text.
-
The process of finding a common root for related words. The Content component uses this common root to search, so it finds the concept rather than the particular term.
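The idea of a common root can be sketched with a deliberately naive suffix stripper. This is not the stemming algorithm that the Content component uses; the suffix list, the minimum root length, and the function name are assumptions chosen only to show related words collapsing to one root.

```python
# Toy suffix list (an assumption); real stemmers are language-specific
# and considerably more careful.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


roots = {toy_stem(w) for w in ["walk", "walks", "walked", "walking"]}
print(roots)
```

Because all four forms share one root, a search for any of them can match documents that contain the others.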
-
A list of stop words. The Content component discards these terms at index and query time, which can improve performance.
-
A word in a language that occurs frequently and does not add much meaning to text.
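A minimal sketch of discarding stop words from text before it is indexed or queried. The stop word set and the whitespace tokenization are assumptions for the example, not the list or the processing that the Content component applies.

```python
# Hypothetical stop word list (an assumption, not a product default).
STOP_WORDS = {"the", "a", "of", "and", "is"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


kept = remove_stop_words("The history of the internet")
print(kept)
```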
-
A file that has been extracted from a container (such as a .ZIP archive).
-
A few sentences or paragraphs that describe what a document is about. The Content component can automatically create summaries from document content.
-
A hierarchical structure of categories, or other information, which gives you an overview of your data.
-
The basic entity that the Content component indexes; for example, a word after stemming.
-
Terms and Weights. These are used in categorization to specify the most important terms for a category topic.
-
Text, documents, and query syntax used to define the topic that an agent or category must match.
-
Wildcard searches (and some advanced search techniques) use the index of unstemmed terms. This index contains all the exact terms that occur in documents.
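The sketch below shows why a wildcard lookup needs the exact terms: the pattern is compared against each unstemmed term as it occurred in the documents. It uses Python's standard `fnmatch` module with `*` as the wildcard; the term set, the function name, and the wildcard syntax are assumptions, not the Content component's query syntax.

```python
from fnmatch import fnmatchcase


def wildcard_lookup(pattern, unstemmed_terms):
    """Return the exact terms that match a shell-style wildcard pattern."""
    return sorted(term for term in unstemmed_terms if fnmatchcase(term, pattern))


# Hypothetical set of exact terms found in documents.
terms = {"connect", "connector", "connectors", "content"}
hits = wildcard_lookup("connect*", terms)
print(hits)
```

Matching against stemmed terms instead would lose the distinction between forms such as "connector" and "connectors".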
-
A Knowledge Discovery component that converts files in a repository to HTML formats for viewing in a Web browser.