Universal Redaction

The NiFi Ingest package includes an example template for "universal redaction" (Universal_Redaction.xml). This template contains a process group that accepts IDOL document FlowFiles and redacts personally identifiable information (PII) from both image and document file formats. The output from the dataflow is a redacted image of the original file.

For example, if the input is a Microsoft Word document, the dataflow uses a KeyViewExportToHtml processor to replace the Word document with an HTML representation of the content. The HTML is then rendered and replaced by an image. Each word in the text (with its location in the rendered image) is added to the document metadata. Alternatively, if the input is an image, Optical Character Recognition is used to extract the text.

For both documents and images, an Eduction processor locates any PII in the extracted text. A redacted copy of the text is added to the document metadata and a RedactImage processor is used to redact the rendered image.
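The redaction step can be illustrated with a minimal Python sketch. This is not the template's implementation: the Word class, the redact function, and the sample coordinates are hypothetical, and a toy regular expression for email addresses stands in for the Eduction grammars that the real dataflow uses. It shows only the general idea that, once each word has a known location in the rendered image, a PII match yields both a redacted copy of the text and a list of rectangles to black out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word and its bounding box in the rendered image (hypothetical layout)."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Assumption: a toy pattern for email addresses, standing in for an Eduction grammar.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(words):
    """Return (redacted_text, boxes), where boxes locate the words that matched."""
    redacted, boxes = [], []
    for w in words:
        if EMAIL.fullmatch(w.text):
            redacted.append("[REDACTED]")
            boxes.append((w.left, w.top, w.width, w.height))
        else:
            redacted.append(w.text)
    return " ".join(redacted), boxes


words = [
    Word("Contact", 10, 10, 60, 12),
    Word("jane.doe@example.com", 75, 10, 150, 12),
    Word("today", 230, 10, 40, 12),
]
text, boxes = redact(words)
print(text)   # Contact [REDACTED] today
print(boxes)  # [(75, 10, 150, 12)]
```

In the actual dataflow, the matching is performed by the Eduction processor and the rectangles are drawn by the RedactImage processor; the sketch only models how the two pieces of information relate.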

The process group has the following outputs:

  • Success - Processing was successful. The file in the input IDOL document FlowFile was replaced by a redacted image.
  • Failure - Processing failed.
  • Unprocessed - The file type is not supported. For example, a document cannot be processed if the KeyView HTML Export SDK cannot export it.
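The relationship between the three outputs can be summarized with a short Python sketch. It is a conceptual model only: the Outcome enumeration, the classify function, and the SUPPORTED_EXTENSIONS set are hypothetical names, and the real process group decides whether a file is supported by attempting the export or OCR, not by checking a file extension.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    UNPROCESSED = "Unprocessed"


# Assumption: a stand-in for "the file can be exported to HTML or is an image".
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".png", ".jpg"}


def classify(file_name, redact_fn):
    """Model which output a file would be routed to."""
    dot = file_name.rfind(".")
    extension = file_name[dot:].lower() if dot != -1 else ""
    if extension not in SUPPORTED_EXTENSIONS:
        # Unsupported file types are not attempted at all.
        return Outcome.UNPROCESSED
    try:
        redact_fn(file_name)
    except Exception:
        # Any error during processing routes the file to Failure.
        return Outcome.FAILURE
    return Outcome.SUCCESS


def always_ok(file_name):
    return None


def always_fails(file_name):
    raise RuntimeError("redaction failed")


print(classify("report.docx", always_ok))     # Outcome.SUCCESS
print(classify("report.docx", always_fails))  # Outcome.FAILURE
print(classify("archive.xyz", always_ok))     # Outcome.UNPROCESSED
```

A downstream flow would typically handle each output separately, for example by indexing files from Success and sending files from Unprocessed for manual review.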