Glossary
-
See ACL.
-
Autonomy Content Infrastructure. A technology layer that automates operations on unstructured information for cross-enterprise applications. ACI enables an automated and compatible business-to-business, peer-to-peer infrastructure.
-
A server component that runs on the Autonomy Content Infrastructure (ACI). ACI servers use the ACI API to accept queries and actions, and return XML responses.
-
Access Control List. A metadata string associated with a document that defines which users and groups are permitted to access the document.
-
A request sent to an ACI server.
-
A domain controller for the Microsoft Windows operating system, which uses LDAP to authenticate users and computers on a network.
-
See APCM.
-
A process that searches for information about a specific topic. An administrator can create agents for users or allow users to create their own agents. See Also: explicit agent, implicit agent
-
An index that stores agents and profiles.
-
A field that stores Boolean agents (Boolean or proximity expressions that legacy technologies use to categorize documents). You can then query the Agentstore component with text and an AgentBoolean field to return categories whose Boolean agent matches this text.
-
Automatic Language Detection. The process of automatically detecting the language of a particular document, and indexing it into the Content component according to the rules for the detected language.
-
An automatic process for alerting users, by email, text, or message, when new content is added to the index that matches their agents or profiles. See Also: mailing.
-
A component that processes natural language questions, and returns direct answers.
-
Adaptive Probabilistic Concept Modelling. A technique whereby terms are given a weight according to their statistical importance in the Content component index. Terms can have a weight between 0 and 255.
-
Automatic Query Guidance. A set of operations that use the results from query summaries. AQG includes dynamic thesaurus generation, automatic query disambiguation, query refinement, and rapid clustering of a results set.
-
The process of checking user credentials (user names, passwords, and PIN codes) against a Community component or external security repository. The authentication process identifies a user, and allows the Community component to confirm their access permissions for different documents.
-
An internal Content component document rank, which determines the order in which two or more documents return in a results list when the relevance or other sort option is equal.
-
See ALD.
-
See AQG.
-
See ACI.
-
A type of query that uses Boolean terms such as AND, OR, and NOT to specify matching criteria.
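As an illustration only, the following Python sketch evaluates a Boolean expression against the terms in a document. The nested-tuple query representation and the `matches` function are hypothetical and are not the Content component query syntax.

```python
def matches(query, terms):
    """Return True if the set of document terms satisfies the query tree.

    A query is either a plain term (str) or a tuple whose first item is
    an operator name ("AND", "OR", "NOT") followed by sub-queries.
    This representation is an assumption made for the example.
    """
    if isinstance(query, str):
        return query in terms
    op, *args = query
    if op == "AND":
        return all(matches(a, terms) for a in args)
    if op == "OR":
        return any(matches(a, terms) for a in args)
    if op == "NOT":
        return not matches(args[0], terms)
    raise ValueError(f"unknown operator: {op}")


doc_terms = set("the cat sat on the mat".split())
query = ("AND", "cat", ("OR", "mat", "rug"), ("NOT", "dog"))
print(matches(query, doc_terms))
```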
-
The process of matching documents against the available categories, and optionally tagging the document with category information.
-
A set of criteria that define a particular topic, which you can use to categorize documents that contain content relevant to the topic.
-
A component that manages categorization and clustering.
-
An index that stores categories.
-
Text or documents that define a topic or subject for a particular category. When the Category component categorizes documents, it matches document content to similar category training.
-
Connector Framework Server. A component that processes the information that is retrieved by connectors. CFS uses File Content Extraction to extract document content and metadata from over 1,000 different file types. When the information has been processed, it is sent to a Content component index or Distributed Index Handler (DIH).
-
A set of documents that Knowledge Discovery identifies as being related. Each cluster represents a concept area, which contains a set of items that share common properties. Clustering data allows you to make trends and developments in data visible.
-
The process of grouping documents into sets (clusters) that have related content. Each cluster represents a concept area, which contains a set of items that share common properties. Clustering data allows you to make trends and developments in data visible.
-
A query operation that combines two or more query results into a specified smaller number of results. The most usual case is to combine two or more sections of the same document as a single query result. It can also combine results by a reference or metadata field value.
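The idea of combining results by a field value can be sketched as follows. This is a minimal illustration, assuming that each result is a dictionary with a `relevance` key and that the highest-scoring result in each group is the one to keep; the function name and data layout are hypothetical.

```python
def combine_by_field(results, field):
    """Collapse results that share the same value of `field`.

    Keeps the highest-relevance result per group (an assumption for
    this example) and returns the survivors sorted by relevance.
    """
    best = {}
    for result in results:
        key = result[field]
        if key not in best or result["relevance"] > best[key]["relevance"]:
            best[key] = result
    return sorted(best.values(), key=lambda r: r["relevance"], reverse=True)


results = [
    {"reference": "book.pdf", "section": 1, "relevance": 71.0},
    {"reference": "book.pdf", "section": 4, "relevance": 88.5},
    {"reference": "memo.txt", "section": 1, "relevance": 64.2},
]
combined = combine_by_field(results, "reference")
print(combined)
```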
-
All the people in a user network neighborhood. It allows users to find other people in the community who have been looking at similar documents, or have agents that are similar to their agents.
-
A component that manages users and communities.
-
A brief summary of each result document that returns for a query. The concept summary displays a few sentences that are typical of the result content (these sentences can be from different parts of the result document).
-
A type of query that allows you to search for documents that match the concept that your query text defines, rather than matching the particular keywords in your text. See Also: query.
-
A component (for example File System Connector) that retrieves information from a local or remote repository (for example, a file system, database, or Web site).
-
See CFS.
-
A component that manages the data index and performs most of the search and retrieval operations from the index.
-
A conceptual summary of the result document that is biased by the terms of the query. A context summary includes sentences that are particularly relevant to the terms in the query (these sentences can be from different parts of the result document).
-
The process that Connectors and Web crawlers use to retrieve content from Web resources, by recursively following hyperlinks from an initial page. See Also: spidering
-
Distributed Action Handler. A component that distributes actions to multiple copies of a Knowledge Discovery component. It allows you to use failover, load balancing, or distributed content.
-
An index that stores content data. You can customize how to store data in the data index by configuring appropriate settings in the Content component configuration file.
-
A Content component data pool that stores indexed information. The administrator can set up one or more databases, and specify how data is fed to the databases. By default, the Content component contains the databases News and Archive, and the Agentstore component contains the databases Profile, Agent, Activated, and Deactivated.
-
The default user role in the Community component. A default user has only the privileges that have been allocated to this default role.
-
Distributed Index Handler. A component that allows you to efficiently split and index extremely large quantities of data into multiple copies of the Content component. DIH allows you to create a scalable solution that delivers high performance and high availability. It provides a flexible way to batch, route, and categorize the indexing of internal and external content into the Content component.
-
See DAH.
-
See DIH.
-
An integrated security solution to protect your data. At the front end, authentication checks that users are allowed to access the system that contains the result data. At the back end, entitlement checking and authentication combine to ensure that query results contain only documents that the user is allowed to see, from repositories that the user has permission to access. For more information, refer to the Document Security Administration Guide.
-
A type of automatic query guidance (AQG) that provides a list of similar terms and concepts for a particular query.
-
The process of extracting entities (patterns of text) from documents.
-
In Eduction, an entity is a word, phrase, or block of information that the Eduction component can match and extract from documents. An entity can be a specific text string, such as a name, or it can be a pattern of text such as an address or phone number. You define the pattern in a grammar, which Eduction uses to find the entities in documents.
-
See Eduction.
-
A Knowledge Discovery operation to find groups of users with a particular set of expertise or interests.
-
An agent that users explicitly create for themselves. See Also: agent, implicit agent
-
See XML.
-
The process of extracting text, metadata, and subfiles from documents. File Content Extraction performs this extraction process in Knowledge Discovery.
-
See parametric search.
-
The process of downloading documents from the repository in which they are stored (such as a local folder, Web site, Lotus Domino server, and so on), importing them to IDX format, and indexing them into the Content component.
-
A group of settings that instruct a Connector how to retrieve data from a repository. Connectors can run fetch tasks automatically, or in response to an action.
-
Fields define different parts of content in documents in the Knowledge Discovery index, such as the title, content, and metadata information.
-
A syntax string that defines a matching criteria in FieldText.
-
A type of query that searches for particular content in a particular document field. See Also: query, field
-
The component that extracts data, including text, metadata, and subfiles from over 1,000 different file types. File Content Extraction can also convert documents to HTML format for viewing in a Web browser.
-
See parametric search.
-
A search for a location, based on coordinates. With appropriate content, Knowledge Discovery can search for a specific location, or locations in a particular area, or within a specified distance of a point.
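A search "within a specified distance of a point" can be illustrated with the haversine great-circle formula. This is a sketch under stated assumptions: the document layout, field names, and `within_distance` function are hypothetical, and the formula is a standard approximation rather than a description of how Knowledge Discovery computes distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(docs, lat, lon, radius_km):
    """Return the documents whose coordinates lie within radius_km of a point."""
    return [d for d in docs
            if haversine_km(lat, lon, d["lat"], d["lon"]) <= radius_km]


docs = [
    {"title": "Cambridge", "lat": 52.2053, "lon": 0.1218},
    {"title": "Paris", "lat": 48.8566, "lon": 2.3522},
]
# Documents within 100 km of central London.
nearby = within_distance(docs, 51.5074, -0.1278, 100)
print([d["title"] for d in nearby])
```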
-
In Eduction, a grammar is a file that defines the entities that you want to extract. It can be a simple list of entities, or a pattern that defines what the entity looks like. See Also: Eduction, entity
-
The ability for Knowledge Discovery to connect related documents to results, by using suggestions. See Also: suggest
-
An encoded value that identifies the source of a document in the Content component. Connectors and CFS add identifiers to every document that they create for indexing into Content. They store this value in the AUTN_IDENTIFIER field.
-
A structured file format that can be indexed into the Content component. You can use a connector to import files into this format, or you can manually create IDX files.
-
An agent that is created as part of a user profile. When you profile a user, Knowledge Discovery creates these agents for a user, according to the documents and search results that the user views. See Also: agent, explicit agent
-
After a document has been downloaded from the repository in which it is stored, it is imported to an IDX or XML file format. This process is called importing.
-
The data index contains document content and field information for analysis and retrieval.
-
A command to index data, or to maintain and manipulate the data index.
-
Fields that the Content component processes linguistically when it stores them. Store fields that contain text that you want to query frequently as Index fields. Content applies stemming and stop word lists to text in Index fields before it stores them, which allows Content to process queries for these fields more quickly. Typically, DRETITLE and DRECONTENT are set up as Index fields.
-
The process of storing data in the Content component. Content stores data in different field types (such as index, numeric, and ordinary fields). It is important to store data in appropriate field types to ensure optimized performance.
-
See IQL.
-
Intelligent Query Logic. Functionality that allows you to set up rules to return a particular set of documents, or to run a secondary query in response to an initial keyword or conceptual query.
-
The first frame following a significant scene change. Keyframes are often used as preview images for video clips.
-
A family of products that allow you to collect, ingest, index, and process unstructured, semi-structured, and structured information from multiple sources and repositories.
-
Lightweight Directory Access Protocol. Applications can use LDAP to retrieve information from a server. LDAP is used for directory services (such as corporate email and telephone directories) and user authentication. See also: active directory, primary domain controller.
-
License Server enables you to license and run multiple Knowledge Discovery solutions. You must have a License Server on a machine with a known, static IP address.
-
Also referred to as "links". Terms in query text that are also contained in the result documents that the Content component returns for a query.
-
An embedded scripting language that you can use to write custom scripts to expand certain Knowledge Discovery functionality.
-
An automatic process for sending an email to users when new content is added to the Content component index that matches their agents or profiles. See Also: alerting
-
A security setup where Connectors index documents into the Content component index with an encrypted access control list (ACL), which Knowledge Discovery uses to match user permissions for the document. With this method, Knowledge Discovery does not need to check the original data repository to check the security information every time a user attempts to access the document. See Also: ACL
-
The ability for computers to act on the meaning of content. This includes conceptual searching, and also workflows that automatically process documents according to their content.
-
A component that analyzes video files and streams, image files, and audio to extract information about their content. Media Server can run analysis operations such as face recognition, number plate recognition, speech-to-text, and speaker identification.
-
The process of answering a question that is asked in normal speech-style language, rather than in query language. Answer Server can process natural language questions and return answers.
-
A component that manages access permissions for your users. It communicates with your repositories and components to apply access permissions to documents.
-
A type of query that returns a list of all possible values of a specified field for documents that match a particular standard query. You can use the values to find matching documents with a particular property. This process is also known as filtering or faceted search. Compare With: FieldText
-
Personal Identification Number. A security feature that is used in addition to a user ID and password.
-
A server computer in a Microsoft Windows domain that controls various computer resources. See also: active directory, LDAP.
-
Role-based capabilities that determine, for example, whether a user is allowed to access specific data.
-
Information about a user that is based on the concepts in documents that the user reads. Every time a user opens a document, the Community component updates their profile. This process allows the administrator to bring new documents that match the interests in a user profile to the attention of the users.
-
Targeted content that you want to display to users but that is not included in the search results, such as advertisements.
-
A component that accepts incoming actions and distributes them to the appropriate subcomponent. Proxy also performs some maintenance operations to make sure that the subcomponents are running, and to start and stop them when necessary.
-
A component that modifies user queries and manipulates the results, for example to return promotions, remove particular query terms, or add synonyms.
-
A text string that you submit to the Content component, which analyzes the concept of the query text and returns documents that are conceptually similar to it. You can submit queries to Content to perform several kinds of search, such as natural language, Boolean, bracketed Boolean, and keyword.
-
See QMS.
-
A query operation that determines the important topics and phrases in a set of documents. Query summaries are used in Automatic Query Guidance (AQG). See Also: AQG.
-
A brief summary of each result document that returns for a query. The quick summary displays the first few sentences of the result document.
-
The process of removing sensitive content from output. Knowledge Discovery supports text redaction through Eduction, and face redaction in Media Server.
-
A string that is used to identify a document. This might be a title or a URL, and allows the Content component to identify documents for retrieval, indexing, and deduplication.
-
Fields used to identify documents. At index time the Content component can use ReferenceType fields to eliminate duplicate copies of documents. It uses them at query time to filter results.
-
The similarity that a particular query result has to the initial query. The Content component assigns each result a percentage relevance score according to how closely it matches the query criteria.
-
The process used to increase the accuracy of agents by indicating which of the results that return to you are most relevant to your query. The retrained agent then returns more relevant results.
-
The ability to analyze audio, images, and videos for additional value and information, such as speech to text, optical character recognition (OCR), face recognition and identification, and object classification.
-
A set of privileges that an administrator can allocate to a Community component user.
-
Part of the Data Admin application that allows you to manage Knowledge Discovery content.
-
The process of separating long documents into multiple sections for indexing. The number of sections increases in proportion to the size of the document. This process ensures that when you, for example, query for text that is relevant to a specific part of a book, the Content component can find the appropriate section and return it. If the book was not indexed in sections, Content might not find the text you searched for, because it might not be conceptually relevant to the entire book.
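The effect of sectioning can be sketched with a simple splitter that cuts a document into fixed-size runs of words, so that the number of sections grows with document length. The `max_words` parameter and the function name are hypothetical; the actual sectioning rules are set in the Content component configuration and are not shown here.

```python
def split_into_sections(text, max_words=100):
    """Split text into consecutive sections of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


book = " ".join(f"word{i}" for i in range(250))
sections = split_into_sections(book, max_words=100)
print(len(sections), [len(s.split()) for s in sections])
```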
-
See section breaking.
-
Security includes anything that makes sure that only authorized users can access or perform actions on data. It includes making sure that only permitted users can view and retrieve documents, user authentication, and secure communications.
-
In clustering with snapshots, a seed is a potential cluster. It contains a document, and suggested conceptually similar documents from the index.
-
A form of Eduction that identifies positive and negative sentiment in text.
-
Internal raw data from which you can extract clusters. You can thus generate cluster information and spectrographs.
-
A query type that allows you to search for a term by using a phonetic spelling.
-
A type of query that allows you to search for a term by using a phonetic spelling.
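To show what matching by phonetic spelling can look like, the following sketch implements the classic American Soundex code, under which words that sound alike receive the same four-character key. Using Soundex here is an assumption for illustration; this glossary does not state which phonetic algorithm the Content component applies.

```python
# Map each consonant to its Soundex digit.
CODES = {}
for digit, letters in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word):
    """Return the four-character American Soundex key for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels have no code and reset `previous`
        if code and code != previous:
            key.append(code)
        previous = code
    return ("".join(key) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))
```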
-
A graphical representation of the results of clustering. The spectrograph displays clusters of documents, and the similarities between different clusters.
-
The process that Connectors use to retrieve content from Web resources, by recursively following hyperlinks from an initial page.
-
The process of extracting the morphological root of a word. In languages, some words have a common root. The Content component includes stemming algorithms that reduce words to this form. This process allows Content to match concepts regardless of the grammatical use of words. In English, for example, the words 'help', 'helpful', 'helping', and 'helped' all reduce to their stem 'help' without significant loss of meaning.
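The English example above can be reproduced with a deliberately naive suffix stripper. This is a toy sketch only: the suffix list and the minimum stem length are assumptions, and the stemming algorithms in the Content component are more sophisticated.

```python
SUFFIXES = ("ful", "ing", "ed")  # hypothetical, far from complete
MIN_STEM_LENGTH = 3


def naive_stem(word):
    """Strip one known suffix if a stem of reasonable length remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[:-len(suffix)]
    return word


stems = [naive_stem(w) for w in ["help", "helpful", "helping", "helped"]]
print(stems)
```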
-
A very common word that occurs too frequently to be useful for searching. Stop words include articles (for example, the) and prepositions (for example, to or from). Stop words are language-specific. You can use a stop word list to allow the Content component to discard these words at index and query time to save index space and improve retrieval performance.
-
Also called stop list. A list (located in the Content component langfiles directory) that contains common words (stop words) that Content does not store. Words such as the, and, or a occur too frequently to carry any significance, and Content does not require them to understand the concept of the text.
-
The process of removing the words listed in the stop word list from documents before they are stored in the Content component, and from query text before it is matched against indexed content.
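As a minimal sketch of this process, the same filter can be applied to document text at index time and to query text at query time. The stop word set below is a small hypothetical sample, not the contents of a Content component stop word list.

```python
STOP_WORDS = {"the", "and", "or", "a", "to", "from"}  # sample list only


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


indexed_terms = remove_stop_words("The cat and the dog")
query_terms = remove_stop_words("a dog from the shelter")
print(indexed_terms, query_terms)
```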
-
A set of query results, which is stored in the Content component for you to re-use later in other operations. When you store a state, Content provides a state token, which you can use to retrieve the stored state.
-
A file that has been extracted from a container (such as a .ZIP archive).
-
A type of query that returns documents that contain similar concepts to a particular document, rather than matching a particular query string. See Also: query
-
A few sentences or paragraphs that describe what a document is about. Knowledge Discovery can automatically create summaries from document content.
-
A file that allows the Content component to handle synonym queries. A synonym query returns results which are conceptually similar to the query terms, and conceptually similar to the synonyms that are available for the query terms. A synonym file contains comma-separated lists of synonym strings for words. You can specify lists for each language type that you have set up in the Content component in this file.
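The following sketch shows how comma-separated synonym lists could be loaded and used to expand a query term. The sample lines, the one-group-per-line layout, and the function names are assumptions for illustration; the exact file layout, including how language sections are marked, is not described in this glossary.

```python
def load_synonyms(lines):
    """Build a lookup from each word to its whole synonym group."""
    table = {}
    for line in lines:
        group = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in group:
            table.setdefault(word, set()).update(group)
    return table


def expand(term, table):
    """Return the term together with any synonyms, sorted alphabetically."""
    term = term.lower()
    return sorted(table.get(term, {term}))


synonyms = load_synonyms(["car, automobile, vehicle", "quick, fast"])
print(expand("car", synonyms), expand("bike", synonyms))
```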
-
A type of query that returns documents that contain synonyms for a particular search term, as well as documents that contain the exact term. See Also: query
-
The process of adding extra information to documents. The tag might be a category, or entities returned from Eduction. Tagging usually adds a field to a document, which you can use to search by the name of a tag.
-
An automatically created hierarchical structure of clusters or other information. A taxonomy provides you with an overview of the information landscape, and an insight into specific areas of the information.
-
The basic entity that the Content component indexes (for example, a word in a document after it has been stemmed).
-
See TNW.
-
Terms and Weights. These values are used in categorization to define the most important terms that define a category topic.
-
Text, documents, and query syntax used to define the topic that an agent or category must match.
-
See UQL.
-
A security setup where Knowledge Discovery checks the security entitlement of a user against the original data repositories in real time when the user attempts to access a document. With this method, Knowledge Discovery always has the current security information, but the response can be slow because of the additional connection to the repository. Compare With: mapped security
-
Universal Query Language. A name for the Content component query syntax, which you can use for keyword, conceptual, Boolean, and Wildcard searches.
-
A component that converts files in a repository to HTML formats for viewing in a Web browser.
-
A character that stands in for any character or group of characters in a query.
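Wildcard matching can be illustrated with Python's standard `fnmatch` module, where `*` stands for any group of characters and `?` for any single character. These shell-style semantics are an assumption used for the example; check the Content component documentation for the wildcard characters that it actually supports.

```python
from fnmatch import fnmatchcase

terms = ["help", "helpful", "helping", "held", "shelf"]


def wildcard_match(pattern, candidates):
    """Return the candidate terms that match a shell-style wildcard pattern."""
    return [t for t in candidates if fnmatchcase(t, pattern)]


print(wildcard_match("help*", terms))
print(wildcard_match("hel?", terms))
```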
-
Extensible Markup Language. XML is a language that defines the different attributes of document content in a format that can be read by humans and machines. In Knowledge Discovery, you can index documents in XML format. Knowledge Discovery can also return action responses in XML format.
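As a sketch of how a client might read an XML action response, the following example parses a small document with Python's standard `xml.etree.ElementTree` module. The element names in `SAMPLE_RESPONSE` are hypothetical and do not reproduce the actual response schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; real element names and namespaces may differ.
SAMPLE_RESPONSE = """
<response>
  <status>SUCCESS</status>
  <hits>
    <hit><reference>doc-1</reference><title>First result</title></hit>
    <hit><reference>doc-2</reference><title>Second result</title></hit>
  </hits>
</response>
"""


def parse_hits(xml_text):
    """Return (reference, title) pairs for every <hit> element."""
    root = ET.fromstring(xml_text)
    return [(hit.findtext("reference"), hit.findtext("title"))
            for hit in root.iter("hit")]


hits = parse_hits(SAMPLE_RESPONSE)
print(hits)
```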