NiFi Ingest
NiFi Ingest is a set of IDOL components for data retrieval and enrichment that run within an open-source framework called Apache NiFi. NiFi Ingest provides a new way to ingest data into IDOL.
Ingest Data into IDOL
IDOL is a platform that helps you get the most benefit from large quantities of information. Before you can start using IDOL, you need to index your data. Your organization is likely to have data in many different formats, distributed across many different kinds of repository. The process of extracting information from repositories and preparing it for indexing into the IDOL index is called ingestion.
IDOL Connectors connect to repositories and retrieve content. There are IDOL Connectors for over 150 types of repository, including:
- Local and network file systems.
- Web sites and social media feeds.
- Document and content management systems such as Microsoft SharePoint.
- E-mail servers such as Microsoft Exchange.
- Database servers such as Microsoft SQL Server, Oracle, and MySQL.
- Cloud services such as OneDrive and Google Drive.
After connectors have retrieved information from a repository, but before it is indexed, the information is usually processed and enriched.
Typically, files that contain text are filtered by IDOL KeyView, which extracts the text so that IDOL does not need to process information in its native format. Media files, such as images or audio recordings, can be sent to an IDOL Media Server, which can perform media analysis such as optical character recognition or speech-to-text. The information can be standardized, so that information that originated in different repositories is stored in the same document fields and can be used more effectively. You can discard irrelevant content so that it does not pollute the IDOL index.
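To make the standardization and discard steps concrete, the following is a minimal conceptual sketch in Python. It is not IDOL or NiFi code: the field names, the alias table, the `standardize`, `is_relevant`, and `prepare` functions, and the minimum-length rule are all hypothetical, chosen only to illustrate mapping repository-specific fields onto shared document fields and dropping content that is not worth indexing.

```python
# Conceptual sketch only. All names and rules here are hypothetical and
# are not part of IDOL, NiFi Ingest, or Apache NiFi.

# Hypothetical mapping from repository-specific field names to shared names.
FIELD_ALIASES = {
    "sharepoint": {"Title": "title", "Editor": "author"},
    "exchange": {"Subject": "title", "From": "author"},
}


def standardize(source: str, fields: dict) -> dict:
    """Rename repository-specific fields to the shared field names."""
    aliases = FIELD_ALIASES.get(source, {})
    doc = {aliases.get(name, name): value for name, value in fields.items()}
    doc["source"] = source  # record where the document came from
    return doc


def is_relevant(doc: dict, min_chars: int = 20) -> bool:
    """Assumed relevance rule: keep documents with enough extracted text."""
    return len(doc.get("content", "").strip()) >= min_chars


def prepare(records: list) -> list:
    """Standardize each (source, fields) record and discard irrelevant ones."""
    prepared = []
    for source, fields in records:
        doc = standardize(source, fields)
        if is_relevant(doc):
            prepared.append(doc)
    return prepared


records = [
    ("sharepoint", {"Title": "Q3 report", "Editor": "alice",
                    "content": "Quarterly revenue grew in all regions."}),
    ("exchange", {"Subject": "Re: lunch", "From": "bob", "content": "ok"}),
]
documents = prepare(records)
```

In this sketch, the SharePoint record and the Exchange record end up using the same `title` and `author` fields, and the short e-mail is discarded before it would reach the index. In a real deployment, these steps are performed by the ingestion components rather than by hand-written code like this.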
Before NiFi Ingest, IDOL Connectors were deployed as discrete components and an IDOL Connector Framework Server (CFS) was used to coordinate the processing and enrichment tasks.
Use NiFi Ingest
NiFi Ingest helps you use Apache NiFi to build a custom ingestion pipeline for IDOL. You can create an ingestion pipeline based on NiFi, instead of deploying IDOL Connectors and IDOL Connector Framework Server.
NiFi Ingest, combined with the Apache NiFi framework, provides features that:
- Improve visibility. Apache NiFi provides a graphical interface that you use to build your ingestion pipeline. When you start ingesting documents, you can use the same interface to monitor processing speed and queue sizes, and identify bottlenecks and ingestion errors. The Apache NiFi framework also has built-in support for tracking documents through the ingestion process.
- Improve customization. You can create an ingestion pipeline that is customized to your use case, with less need for custom Lua scripts.
- Improve control. You can stop parts of the ingestion pipeline and make changes, without stopping the entire system.
- Improve performance and reliability. Apache NiFi can scale to process extremely large volumes of data. You can distribute processing across multiple NiFi instances.
NiFi Ingest provides processors that connect to your data repositories, in a similar way to IDOL Connectors. The NiFi Ingest distribution also provides processors that enrich your data, in a similar way to CFS import tasks. For example, the distribution includes a processor for KeyView filtering and a processor for sending files to an IDOL Media Server for further analysis.
After retrieving and enriching your data, NiFi Ingest can index the resulting documents into your IDOL Server.