HPE CFS can index documents into Vertica, so that you can run queries on structured fields (document metadata).
Depending on the metadata contained in your documents, you could:
When documents are indexed into Vertica, CFS adds a timestamp that contains the time when the document was indexed. The field is named VERTICA_INDEXER_TIMESTAMP and the timestamp is in the format YYYY-MM-DD HH:NN:SS.
When a document in a data repository is modified, CFS adds a new record to the database with a new timestamp. All of the fields are populated with the latest data. The record describing the older version of the document is not deleted. You can create a projection to make sure your queries only return the latest record for a document.
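The "latest record" rule can be illustrated outside the database with a minimal Python sketch. It is not Vertica projection syntax. It assumes that rows have already been fetched as dictionaries and that DREREFERENCE identifies a document; the function name latest_records and the title column are hypothetical.

```python
from datetime import datetime

# YYYY-MM-DD HH:NN:SS expressed as a strptime pattern.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def latest_records(rows):
    """Return one row per DREREFERENCE, keeping the row with the newest timestamp."""
    latest = {}
    for row in rows:
        ref = row["DREREFERENCE"]
        stamp = datetime.strptime(row["VERTICA_INDEXER_TIMESTAMP"], TIMESTAMP_FORMAT)
        if ref not in latest or stamp > latest[ref][0]:
            latest[ref] = (stamp, row)
    return [row for _, row in latest.values()]

# Hypothetical sample data: doc1 was modified once, so it has two records.
rows = [
    {"DREREFERENCE": "doc1", "VERTICA_INDEXER_TIMESTAMP": "2016-03-01 09:15:00", "title": "Draft"},
    {"DREREFERENCE": "doc1", "VERTICA_INDEXER_TIMESTAMP": "2016-03-02 11:30:00", "title": "Final"},
    {"DREREFERENCE": "doc2", "VERTICA_INDEXER_TIMESTAMP": "2016-03-01 10:00:00", "title": "Report"},
]
current = latest_records(rows)
```

A projection performs the equivalent selection inside the database, so that queries do not need to repeat this logic.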
When a connector detects that a document has been deleted from a repository, CFS inserts a new record into the database. The record contains only the DREREFERENCE and the field VERTICA_INDEXER_DELETED set to TRUE.
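One way to account for deletion records when reading results is sketched below in Python. It assumes that rows are available as dictionaries, that VERTICA_INDEXER_DELETED is empty (None) on ordinary records, and that a deleted reference is not indexed again later. The function names and the title column are hypothetical.

```python
def deleted_references(rows):
    """Collect the DREREFERENCE of every document that has a deletion record."""
    return {
        row["DREREFERENCE"]
        for row in rows
        if row.get("VERTICA_INDEXER_DELETED") is True
    }

def exclude_deleted(rows):
    """Drop all records, including the deletion marker, for deleted documents."""
    deleted = deleted_references(rows)
    return [row for row in rows if row["DREREFERENCE"] not in deleted]

# Hypothetical sample data: doc2 was indexed and later deleted.
rows = [
    {"DREREFERENCE": "doc1", "title": "Report", "VERTICA_INDEXER_DELETED": None},
    {"DREREFERENCE": "doc2", "title": "Memo", "VERTICA_INDEXER_DELETED": None},
    {"DREREFERENCE": "doc2", "VERTICA_INDEXER_DELETED": True},
]
remaining = exclude_deleted(rows)
```

If a repository can re-create a document with the same reference after deleting it, this simple filter is not sufficient and the record order would also need to be considered.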
Documents that are created by connectors and processed by CFS can have multiple levels of fields, and field attributes. A database table has a flat structure, so this information is indexed into Vertica as follows:
A field named my_field with a sub-field named subfield results in two columns, my_field and my_field.subfield.
A field named my_field, with an attribute named my_attribute, results in two columns, my_field holding the field value and my_field.my_attribute holding the attribute value.
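The naming scheme described above can be sketched in Python as follows. The dictionary layout used to represent a document field (the keys value, attributes, and subfields) and the function name flatten are hypothetical, chosen only for illustration. The sketch also assumes that the same dotted naming applies to deeper levels of nesting.

```python
def flatten(name, field, columns=None):
    """Map a nested field to flat column names joined with dots."""
    if columns is None:
        columns = {}
    # The field itself becomes a column holding the field value.
    columns[name] = field.get("value")
    # Each attribute becomes a column named <field>.<attribute>.
    for attr_name, attr_value in field.get("attributes", {}).items():
        columns[f"{name}.{attr_name}"] = attr_value
    # Each sub-field becomes a column named <field>.<subfield>.
    for sub_name, sub_field in field.get("subfields", {}).items():
        flatten(f"{name}.{sub_name}", sub_field, columns)
    return columns

# Hypothetical field with one attribute and one sub-field.
field = {
    "value": "parent value",
    "attributes": {"my_attribute": "attribute value"},
    "subfields": {"subfield": {"value": "child value"}},
}
columns = flatten("my_field", field)
```

For this input, the resulting column names are my_field, my_field.my_attribute, and my_field.subfield.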