Index Documents into Vertica

FileNet P8 Connector can index documents into Vertica, so that you can run queries on structured fields (document metadata).

Depending on the metadata contained in your documents, you could investigate the average age of documents in a repository. You might want to answer questions such as: How much time has passed since the documents were last updated? How many files are regularly updated? Does this represent a small proportion of the total number of documents? Who are the most active users?

TIP: In most cases, OpenText recommends sending documents to a Connector Framework Server (CFS). CFS extracts metadata and content from any files that the connector has retrieved, and can manipulate and enrich documents before they are indexed. CFS also has the capability to insert documents into more than one index, for example IDOL Server and a Vertica database. For information about sending documents to CFS, see Send Data to Connector Framework Server

Prerequisites

  • FileNet P8 Connector supports indexing into Vertica 7.1 and later.
  • You must install the appropriate Vertica ODBC drivers (version 7.1 or later) on the machine that hosts FileNet P8 Connector. If you want to use an ODBC Data Source Name (DSN) in your connection string, you will also need to create the DSN. For more information about installing Vertica ODBC drivers and creating the DSN, refer to the Vertica documentation.

New, Updated and Deleted Documents

When documents are indexed into Vertica, FileNet P8 Connector adds a timestamp that contains the time when the document was indexed. The field is named VERTICA_INDEXER_TIMESTAMP and the timestamp is in the format YYYY-MM-DD HH:NN:SS.

When a document in a data repository is modified, FileNet P8 Connector adds a new record to the database with a new timestamp. All of the fields are populated with the latest data. The record describing the older version of the document is not deleted. You can create a projection to make sure your queries only return the latest record for a document.

When FileNet P8 Connector detects that a document has been deleted from a repository, the connector inserts a new record into the database. The record contains only the DREREFERENCE and the field VERTICA_INDEXER_DELETED set to TRUE.

Fields, Sub-Fields, and Field Attributes

Documents that are created by connectors can have multiple levels of fields, and field attributes. A database table has a flat structure, so this information is indexed into Vertica as follows:

  • Document fields become columns in the flex table. An IDOL document field and the corresponding database column have the same name.
  • Sub-fields become columns in the flex table. A document field named my_field with a sub-field named subfield results in two columns, my_field and my_field.subfield.
  • Field attributes become columns in the flex table. A document field named my_field, with an attribute named my_attribute results in two columns, my_field holding the field value and my_field.my_attribute holding the attribute value.

Prepare the Vertica Database

Indexing documents into a standard database is problematic, because documents do not have a fixed schema. A document that represents an image has different metadata fields to a document that represents an e-mail message. Vertica databases solve this problem with flex tables. You can create a flex table without any column definitions, and you can insert a record regardless of whether a referenced column exists.

You must create a flex table before you index data into Vertica.

When creating the table, consider the following:

  • Flex tables store entire records in a single column named __raw__. The default maximum size of the __raw__ column is 128K. You might need to increase the maximum size if you are indexing documents with large amounts of metadata.
  • Documents are identified by their DREREFERENCE. OpenText recommends that you do not restrict the size of any column that holds this value, because this could result in values being truncated. As a result, rows that represent different documents might appear to represent the same document. If you do restrict the size of the DREREFERENCE column, ensure that the length is sufficient to hold the longest DREREFERENCE that might be indexed.

To create a flex table without any column definitions, run the following query:

   create flex table my_table();

To improve query performance, create real columns for the fields that you query frequently. For documents indexed by a connector, this is likely to include the DREREFERENCE:

   create flex table my_table(DREREFERENCE varchar NOT NULL);

You can add new column definitions to a flex table at any time. Vertica automatically populates new columns with values for existing records. The values for existing records are extracted from the __raw__ column.

For more information about creating and using flex tables, refer to the Vertica Documentation or contact Vertica technical support.

Send Data to Vertica

To send documents to a Vertica database, follow these steps.

To send data to Vertica

  1. Stop the connector.
  2. Open the connector’s configuration file in a text editor.
  3. In the [Ingestion] section, set the following parameters:

    EnableIngestion To enable ingestion, set this parameter to true.
    IngesterType To send data to a Vertica database, set this parameter to Indexer.

    For example:

    [Ingestion]
    EnableIngestion=TRUE
    IngesterType=Indexer
  4. In the [Indexing] section, set the following parameters:

    IndexerType To send data to a Vertica database, set this parameter to Library.
    LibraryDirectory The directory that contains the library to use to index data.
    LibraryName The name of the library to use to index data. You can omit the .dll or .so file extension. Set this parameter to verticaIndexer.
    ConnectionString The connection string to use to connect to the Vertica database.
    TableName The name of the table in the Vertica database to index the documents into. The table must be a flex table and must exist before you start indexing documents. For more information, see Prepare the Vertica Database.

    For example:

    [Indexing]
    IndexerType=Library
    LibraryDirectory=indexerdlls
    LibraryName=verticaIndexer
    ConnectionString=DSN=VERTICA
    TableName=my_flex_table
  5. Save and close the configuration file.