DocumentEmbeddingsServiceImpl

A service that the DocumentEmbeddings processor uses to generate embeddings for the text in your documents.

This service uses an embedding model from Hugging Face, which NiFi Ingest downloads and caches the first time the service runs.

Properties

Name Default Value Description
Idol License Service   An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.
Model Name  

The name of the model to use to generate embeddings.

The model name must exactly match the name of a model on Hugging Face. You can choose any sentence-transformer model. OpenText recommends the sentence-transformers/sentence-t5-large model (https://huggingface.co/sentence-transformers/sentence-t5-large).

The first time that you use this model, NiFi Ingest downloads it from Hugging Face and caches it for future use.

Cache Directory ~/.cache/huggingface/hub The location where NiFi Ingest caches your model files.
Device CPU The type of processor to use to run your embeddings model: cpu or cuda. If you set this option to cuda, you must have a CUDA-compatible GPU.
Embedding Precision   The number of decimal places to use in the embedding values. You can use a value between 1 and 10 (inclusive).
Model Max Sequence Length  

The maximum chunk size permitted by the model that you use to generate embeddings.

Model Minimum Final Sequence Length   The minimum length of the final chunk of text used to generate embeddings, when multiple embeddings are required for a piece of text.
Model Sequence Overlap  

The length of overlap required for text used to generate successive embeddings, when multiple embeddings are required for a piece of text.
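
Example

The following Python sketch illustrates how the Model Max Sequence Length, Model Sequence Overlap, Model Minimum Final Sequence Length, and Embedding Precision properties might interact when a piece of text is too long for a single embedding. It is not the service's actual implementation. The function names chunk_with_overlap and round_embedding are hypothetical, lengths are assumed to be counted in tokens, and the handling of a short final chunk (starting it earlier so that it reaches the minimum length) is an assumption made for illustration.

```python
from typing import List, Sequence


def chunk_with_overlap(
    tokens: Sequence[str],
    max_length: int,
    overlap: int,
    min_final_length: int,
) -> List[List[str]]:
    """Split tokens into overlapping chunks (hypothetical helper).

    Assumption: a final chunk shorter than min_final_length is started
    earlier, so that it contains exactly min_final_length tokens.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be in the range [0, max_length)")
    if not 0 <= min_final_length <= max_length:
        raise ValueError("min_final_length must be in the range [0, max_length]")

    # Text that fits in one chunk needs only one embedding.
    if len(tokens) <= max_length:
        return [list(tokens)] if tokens else []

    step = max_length - overlap  # how far each successive chunk advances
    chunks: List[List[str]] = []
    start = 0
    while True:
        end = start + max_length
        if end >= len(tokens):
            # Final chunk: extend it backwards if it would be too short.
            if len(tokens) - start < min_final_length:
                start = max(0, len(tokens) - min_final_length)
            chunks.append(list(tokens[start:]))
            break
        chunks.append(list(tokens[start:end]))
        start += step
    return chunks


def round_embedding(values: Sequence[float], precision: int) -> List[float]:
    """Round each embedding value to `precision` decimal places (1 to 10)."""
    if not 1 <= precision <= 10:
        raise ValueError("precision must be between 1 and 10 (inclusive)")
    return [round(v, precision) for v in values]


if __name__ == "__main__":
    words = [f"t{i}" for i in range(11)]
    for chunk in chunk_with_overlap(words, max_length=4, overlap=1, min_final_length=3):
        print(chunk)
    print(round_embedding([0.123456, -0.98765], precision=3))
```

In this sketch, a larger overlap produces more chunks (and therefore more embeddings) for the same text, while the minimum final length prevents a very short trailing chunk from producing an embedding based on only a few tokens.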