DocumentEmbeddings
The Document Embeddings processor generates vector embeddings for your documents. You can use these embeddings in an IDOL Content component index for vector search and vector document suggestion.
A vector is an array of numbers that represents an object, typically a piece of text or an image. The Document Embeddings processor generates vectors for text by using a model that you configure. In general, the vectors generated for pieces of text with a similar meaning are close together in the vector space. Therefore, embeddings allow you to perform vector-based comparisons to find similar content. In IDOL, you can use embeddings to perform vector-based suggestion and to search for similar documents.
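To make "close in the vector space" concrete, the following minimal sketch compares vectors by cosine similarity, one common closeness measure. It is illustrative only: the three-element vectors are hand-written toy values rather than output from any real model, the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the measure that IDOL itself uses is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to query."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


# Toy vectors (assumed values, not produced by a real embedding model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

ranking = rank_by_similarity(cat, {"kitten": kitten, "invoice": invoice})
```

With these toy values, `kitten` ranks ahead of `invoice` because its vector points in nearly the same direction as `cat`. Real embeddings typically have hundreds of dimensions, but the comparison works the same way.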
The processor uses the configured model to generate a vector from your ingest document and writes the result to the configured document field. If the model returns offsets, the processor includes the offset information for each vector. Depending on the ingest document and the model, the processor might generate more than one embedding.
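As a rough illustration of why one document can yield several embeddings with offsets, the sketch below splits text into fixed-size chunks and produces one vector per chunk. Everything in it is an assumption made for illustration: the chunking rule, the `start`/`end` keys, and the `toy_embed` stand-in are hypothetical and do not describe the processor's actual behavior or output format.

```python
def split_with_offsets(text, max_chars):
    """Split text into chunks of at most max_chars, recording character offsets."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    for start in range(0, len(text), max_chars):
        end = min(start + max_chars, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for a model; the numbers carry no meaning."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def embed_document(text, max_chars=20):
    """Return one record per chunk: its offsets plus a toy vector."""
    return [
        {"start": c["start"], "end": c["end"], "vector": toy_embed(c["text"])}
        for c in split_with_offsets(text, max_chars)
    ]


# A 45-character document with a 20-character limit yields three records.
records = embed_document("a" * 45, max_chars=20)
```

The offsets let a consumer map each vector back to the span of source text it represents. A real model might instead split on sentences or tokens.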
Properties
Name | Default Value | Description |
---|---|---|
IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server. |
Document Embeddings Service | | The DocumentEmbeddingsServiceImpl to use to generate the embeddings. |
Document Field Name | | The name of the document field that NiFi Ingest should use to store the embeddings information. To use these embeddings in an IDOL index, this field name must match a |
Relationships
Name | Description |
---|---|
success | Successfully processed FlowFiles are routed to this relationship. |
failure | FlowFiles that could not be processed successfully are routed to this relationship. |