NiFi Ingest
24.2.0
New Features
- The AnalyzeMedia processor that was introduced in IDOL 24.1 now accepts a user-defined session configuration. The configuration is similar to an IDOL Media Server session configuration, with some exceptions. For example, you must ingest data by using the NiFiInput engine, and output data by using the NiFiOutput engine. For more information, refer to the IDOL NiFi Ingest documentation.
- The method for using LLMs to generate embeddings in the Document Embeddings service has been improved.
Rather than requiring you to create your own model files by using a script, NiFi can now download and cache the models directly from Hugging Face. As a result, NiFi supports a wider range of models, and it no longer requires the Python script or external Python libraries.
To use LLMs, you must now set the Model Name parameter in your Document Embeddings service configuration to specify the Hugging Face model to use. You can also set the optional Cache Directory parameter to the location where NiFi must store the cached model files.
For information about the models that you can use for different configurations, refer to the NiFi Ingest Help.
IMPORTANT: As part of this change, the Model Path and Tokenizer Path parameters have been removed, and are no longer supported.
- In the Document Embeddings service, you can now control the precision of the generated embeddings by using the Embedding Precision parameter. This parameter sets the number of decimal places to use in the embedding values.
- In the Document Embeddings service, you can now configure embedding models and generative LLMs to use a CUDA-compatible GPU device by setting the new Device parameter to cuda.
- The advanced configuration UI for the NISTRDSFilter processor includes a cancel button, so that you can cancel the download of NIST RDS hash sets.
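The effect of the Embedding Precision parameter described above can be illustrated with a short Python sketch. It assumes that the parameter rounds each value to the given number of decimal places in the same way as Python's built-in round(). The service's actual rounding rule is not stated in these notes, and round_embedding is a hypothetical helper, not part of the Document Embeddings service.

```python
def round_embedding(values: list[float], precision: int) -> list[float]:
    """Round each embedding component to `precision` decimal places.

    Illustration only: assumes Embedding Precision behaves like Python's
    round(); the Document Embeddings service may round differently.
    """
    if precision < 0:
        raise ValueError("precision must be non-negative")
    return [round(v, precision) for v in values]


embedding = [0.123456789, -0.987654321, 0.5]
print(round_embedding(embedding, 3))
```

A lower precision shortens the text of each value, which might reduce the size of indexed documents. The trade-off is some loss of accuracy in later similarity comparisons.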
Resolved Issues
There were no resolved issues in this release.
24.1.0
New Features
- The ExecuteDocumentPython processor can install and use third-party modules. There is a new "Packages" tab in the advanced configuration interface that you can use to install modules.
- The Eduction processor has an improved guided setup wizard that makes it easier to select entities.
- A new AnalyzeMedia processor has been added to the IDOL NiFi Media components. This processor can run analysis on rich media files without an IDOL Media Server (all analysis takes place within NiFi). The processor can run one of several preset configurations that perform tasks such as face detection and recognition, OCR, or speech-to-text.
- A new DocumentEmbeddings processor has been added to allow you to generate vector embeddings that you can include in your documents to index into IDOL. You can download this processor in the separate IDOL Document Embeddings package.
- The Redaction processor can now be used for highlighting. It has a new property named "Opacity" that accepts a value from zero to one. With the default value (1), redacted regions are completely opaque. Values between zero and one produce partially opaque regions, so that you can use the processor to highlight content rather than hide it.
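One way to picture the Opacity property is standard alpha compositing. In that model, each output pixel is a weighted mix of the redaction color and the original pixel. The sketch below assumes that model. apply_redaction is a hypothetical helper, and these notes do not specify the Redaction processor's actual blending method.

```python
def apply_redaction(
    pixel: tuple[int, int, int],
    box_color: tuple[int, int, int],
    opacity: float,
) -> tuple[int, int, int]:
    """Blend a redaction color over one RGB pixel.

    Assumes simple alpha compositing: opacity 1 fully covers the pixel,
    opacity 0 leaves it unchanged, and values in between mix the two.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(
        round(opacity * c + (1.0 - opacity) * p)
        for p, c in zip(pixel, box_color)
    )


print(apply_redaction((200, 100, 50), (0, 0, 0), 1.0))  # fully redacted
print(apply_redaction((200, 100, 50), (0, 0, 0), 0.5))  # half-transparent
```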
Resolved Issues
There were no resolved issues in this release.
23.4.0
New Features
- A new type of controller service has been introduced to simplify OAuth authentication. The OAuthServiceImpl is a NiFi controller service that provides a suitable endpoint for all IDOL Ingest processors that need to provide an OAuth redirect URL. IDOL Ingest processors can provide their own redirect URLs but these include component names and version numbers, so you might need to update your OAuth applications when you upgrade your IDOL NiFi Ingest components. Using the OAuthServiceImpl, you can set up your OAuth applications with a common redirect URL that does not change.
- A new ExecuteDocumentPython processor has been added, so that you can create and run Python scripts that interact with your FlowFiles and IDOL documents.
- You can set dynamic properties on the KeyViewFilterServiceImpl and KeyViewExportServiceImpl controller services to modify settings in the KeyView formats.ini or formats_e.ini configuration file. For example, you can configure the reader to use to process each file format.
- The Eduction processor has an improved guided setup wizard that makes it easier to select entities.
- The NISTRDSFilter processor has been updated to support the NIST RDSv3 hash sets.
Resolved Issues
There were no resolved issues in this release.
23.3.0
New Features
- NiFi Ingest can enrich IDOL documents by adding more information about a user. There are two new processors, named EnrichUserFromAzureAD and EnrichUserFromLDAP. These query Azure AD or an LDAP directory to obtain information about a user, and add the information to the IDOL document. The processors can read a user ID or e-mail address from either a FlowFile attribute or an IDOL document metadata field. You might use one of these processors when a connector writes a user ID to an IDOL document, and you want to add more information about the user, such as their display name.
- The Eduction processor can extract entities from tables (such as comma-separated or tab-separated data, or the output from KeyView filtering when you set "Output Table Info" to TRUE), using information in the table headers to provide context. For more information about the supported table formats, refer to the Eduction documentation.
- The Universal Redaction template supports tables detected by OCR.
- IDOL NiFi connectors support a new advanced property named adv:AllowedClusterNodesRegex. By default, a connector runs on all nodes of a cluster. If you set this property, the connector runs only on nodes where the hostname matches the specified regular expression.
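The behavior described for adv:AllowedClusterNodesRegex can be sketched as a simple predicate. This is an illustration only. connector_should_run is a hypothetical helper, and whether NiFi requires the whole hostname to match (as re.fullmatch does here) or only part of it is an assumption.

```python
import re
from typing import Optional


def connector_should_run(hostname: str, allowed_nodes_regex: Optional[str]) -> bool:
    """Decide whether a connector runs on a given cluster node.

    Mirrors the documented rule: run on every node when the property is
    unset, otherwise only where the hostname matches. Full-string matching
    is an assumption of this sketch.
    """
    if not allowed_nodes_regex:
        return True  # property not set: run on all nodes
    return re.fullmatch(allowed_nodes_regex, hostname) is not None


nodes = ["nifi-node-1.example.com", "nifi-node-2.example.com", "ingest-3.example.com"]
pattern = r"nifi-node-\d+\.example\.com"
print([n for n in nodes if connector_should_run(n, pattern)])
```

Under full matching, a shorter pattern such as nifi-node-\d+ would not match nifi-node-1.example.com. Writing patterns that cover the whole hostname, anchored with ^ and $, avoids depending on either matching behavior.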
Resolved Issues
- NiFi could terminate unexpectedly when a NiFi Ingest media processor (for example the OpticalCharacterRecognition processor) was used to process an image with a very large number of pixels.
- The Content component NiFi controller service, ContentServiceImpl, was unable to obtain a license correctly.
Notes
- When you use the ContentFromHTML or RenderHTML processors, the embedded web browser is no longer permitted to navigate away from the source page. If necessary, you can allow navigation by setting the new configuration parameter AllowNavigations=TRUE.
23.2.0
New Features
- The ConvertXMLToDocuments processor has a guided setup wizard.
- The performance of external ODBC datastores has been improved (connectors store their state information in an external database when you set the State Database Service property).
Resolved Issues
- Lower than expected performance was observed when using an external document registry database. This could result in slow indexing, for example with the PutIDOL processor. The performance of the document registry has been improved.
- The KeyViewFilterDocument processor could fail to process FlowFiles, reporting the error "An invalid or illegal xml character is specified".
- The RemoveDocumentPart processor did not delete temporary .MSG or .EML files if it was used in a dataflow following a KeyViewExtractFiles processor with the "Merge mails" property set to TRUE.
- Reduced memory usage for some IDOL NiFi processors.
- The OpticalCharacterRecognition processor could terminate unexpectedly.
- When using the OpticalCharacterRecognition processor, some temporary files produced by KeyView were written to the system temporary directory, rather than the location specified in the KeyView Export Service. Some temporary files were not deleted when no longer required.
Notes
- The minimum version of Apache NiFi supported by the IDOL NiFi Ingest components is now NiFi 1.15.3.
- Due to an issue with the storage of OAuth tokens in previous versions of the RMSDecrypt and RMSEncrypt processors, you might need to obtain OAuth tokens again.
Deprecated Features
The following features are deprecated and might be removed in a future release.
Category | Deprecated Feature | Deprecated Since |
---|---|---|
Clipping | In the ContentFromHTML and RenderHTML processors, the SMARTPRINT clipping mode has been deprecated. | 23.3.0 |