Input and Output Methods

Some of the previous examples show File Content Extraction taking input from a file (keyview::io::InputFile) and sending output to a file (keyview::io::OutputFile). The C++ API allows you to take input from and write output to other data sources by making use of generic types for input and output. For example, the signature of the Session::open method is:

template <typename InputType>
Document open(InputType& input);

...and the signature of the Document::filter method is:

template <typename OutputType>
void filter(OutputType& output);

Some input and output types are defined in Keyview_IO.hpp. These are InputFile, OutputFile, and InMemoryFile. You can create your own input and output types to read from and write to any data source you like.

Input Methods

To create a custom input type, create a class that reads from your data source and provides read, seek, and tell methods. The methods you write must conform to the example signatures in the keyview::io::InputFile class defined in Keyview_IO.hpp. You can then pass instances of your class into Session::open.
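The following is a minimal sketch of a custom input type that reads from a string held in memory. The class name StringInput is hypothetical, and the read, seek, and tell signatures shown here are assumptions made for illustration (modelled on the write signature used later in this section and on the standard SEEK_SET, SEEK_CUR, and SEEK_END constants). Check them against the InputFile class in Keyview_IO.hpp before you use them.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>    // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <string>
#include <utility>

// Hypothetical custom input type backed by a std::string.
// ASSUMPTION: these signatures are illustrative only. Copy the real
// ones from the InputFile class in Keyview_IO.hpp.
class StringInput
{
   public:
      explicit StringInput(std::string data) : data_(std::move(data)) {}

      // Copy up to count bytes into buffer and return the number of
      // bytes copied (0 when no data remains).
      int64_t read(char* buffer, int64_t count)
      {
         const int64_t remaining = static_cast<int64_t>(data_.size()) - pos_;
         const int64_t n = std::min(count, remaining);
         if (n <= 0) return 0;
         std::memcpy(buffer, data_.data() + pos_, static_cast<std::size_t>(n));
         pos_ += n;
         return n;
      }

      // Move the read position. Returns false if the target is out of range.
      bool seek(int64_t offset, int whence)
      {
         const int64_t size = static_cast<int64_t>(data_.size());
         int64_t base = 0;
         if (whence == SEEK_SET) base = 0;
         else if (whence == SEEK_CUR) base = pos_;
         else if (whence == SEEK_END) base = size;
         else return false;

         const int64_t target = base + offset;
         if (target < 0 || target > size) return false;
         pos_ = target;
         return true;
      }

      // Report the current read position in bytes from the start.
      int64_t tell() const { return pos_; }

   private:
      std::string data_;
      int64_t pos_ = 0;
};
```

If the real signatures match, an instance of such a class could be passed to Session::open in the same way as an InputFile.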

Output Methods

To create a custom output type, create a class with a write method that writes to your data source. Your write method must conform to the example signature in the keyview::io::OutputFile class, also defined in Keyview_IO.hpp. You can then pass instances of this class into any Filter API function that takes an OutputType (such as Document::filter or Subfile::extract).

For example: 

class MyOutput
{
   public:
      int64_t write(const char* ptr, int64_t count)
      {
         // process the output
         return count;
      }
};

You then pass an instance of this class when you call filter, for example:

MyOutput output; // Create your custom output object
auto doc = session.open(input);
doc.filter(output);

File Content Extraction calls the write function you have implemented once for each chunk of data that it filters. The process is:

  1. Your code calls session.open then doc.filter.

  2. File Content Extraction opens the file and starts to read it.

  3. File Content Extraction finds some text and calls the write method of your output object (in your custom code).

  4. Your code now has control again. The arguments to the write call give you the text and the number of bytes it contains. You can process the text in any way you need, and then return to File Content Extraction. Return the number of bytes written to request more text, or return 0 to stop the filtering process (see Input and Output Methods).

  5. If there is more text to filter, and your write method requested more, File Content Extraction returns to step 2. Otherwise, it returns from doc.filter().
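The callback sequence above can be sketched without the SDK. In the following example, CollectingOutput is a custom output type that appends each chunk to a string, and fakeFilter is a hypothetical stand-in for Document::filter that is not part of the API. It exists only to show when the write method is called and how its return value controls the loop. How File Content Extraction actually divides text into chunks is not specified here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Custom output type that collects every chunk it receives.
class CollectingOutput
{
   public:
      int64_t write(const char* ptr, int64_t count)
      {
         text.append(ptr, static_cast<std::size_t>(count));
         ++calls;
         return count;   // Step 4: request more text.
      }

      std::string text;
      int calls = 0;
};

// HYPOTHETICAL stand-in for Document::filter, for illustration only.
// It feeds pre-made chunks to the output object.
template <typename OutputType>
void fakeFilter(const std::vector<std::string>& chunks, OutputType& output)
{
   for (const std::string& chunk : chunks)        // Step 2: read more of the file.
   {
      if (chunk.empty()) continue;
      const int64_t written =                     // Step 3: hand the text to write.
         output.write(chunk.data(), static_cast<int64_t>(chunk.size()));
      if (written == 0) return;                   // Step 5: the output asked to stop.
   }
}
```

With the real API, the loop is inside File Content Extraction, and your code supplies only the output class.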

A class can be valid as both an InputType and OutputType.

If you are writing the document's text to an output using the Document::filter method, you can enable partial filtering by implementing a custom OutputType as shown in the previous example. File Content Extraction calls the write method you implement once for each chunk of text that it filters. The count argument is the number of bytes in that chunk. Chunks can vary significantly in size. After you have the text you need, you can return 0 to stop the filtering process.
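As an illustration of partial filtering, the following sketch keeps at most a fixed number of bytes and then returns 0 from write. The class name LimitedOutput and its limit parameter are hypothetical. Only the write signature and the meaning of its return value are taken from this section.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical output type that keeps at most `limit` bytes of filtered text.
class LimitedOutput
{
   public:
      explicit LimitedOutput(std::size_t limit) : limit_(limit) {}

      int64_t write(const char* ptr, int64_t count)
      {
         if (count <= 0) return 0;

         // Keep only as much of this chunk as still fits under the limit.
         const std::size_t room = limit_ - text_.size();
         const std::size_t take = std::min(room, static_cast<std::size_t>(count));
         text_.append(ptr, take);

         // Return count to request the next chunk, or 0 to stop filtering.
         return text_.size() < limit_ ? count : 0;
      }

      const std::string& text() const { return text_; }

   private:
      std::size_t limit_;
      std::string text_;
};
```

Because chunks vary in size, the limit is usually reached part way through a chunk, so the sketch truncates that chunk. Truncating by byte count can split a multi-byte character, so you might prefer to trim the result at a character boundary.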

TIP: Instead of writing a custom output method for partial filtering, you might find it simpler to read from the istream provided on the Document object. See Streaming Filtered Text.