Filtering

Filter SDK enables you to filter many different types of documents. Filtering is the process of extracting the text from a document without the application-specific markup. In addition to text extraction, the filtering process can include the following:

  • Subfile extraction—exposes all subfiles for filtering. See Use the File Extraction API.
  • File format extraction—detects a file's format and reports it to the API, which in turn reports it to the developer's application. See Obtain Format Information.
  • Metadata extraction—extracts selected metadata (document properties) from a file. See Extract Metadata.
  • Character set conversion—controls the character set of the output text. See Character Encoding.
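
To make the idea of filtering concrete, the following is a minimal conceptual sketch in Python. It does not use the Filter SDK API. It uses only the Python standard library to strip HTML markup from a string and then encode the remaining text in a chosen output character set. The names TextOnlyParser and filter_text are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects character data and discards the tags (the markup)."""

    def __init__(self):
        # convert_charrefs=True turns references such as &eacute; into text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.chunks.append(data)


def filter_text(markup: str, output_charset: str = "utf-8") -> bytes:
    """Hypothetical helper: return the text of `markup` without tags,
    encoded in `output_charset` (an illustration of character set conversion)."""
    parser = TextOnlyParser()
    parser.feed(markup)
    parser.close()  # flush any text that is still buffered
    # Collapse runs of whitespace so the output is a single clean line.
    text = " ".join("".join(parser.chunks).split())
    # Characters that the target charset cannot represent are replaced.
    return text.encode(output_charset, errors="replace")


if __name__ == "__main__":
    print(filter_text("<p>Hello <b>world</b></p>"))
    print(filter_text("<p>caf&eacute;</p>", output_charset="latin-1"))
```

This sketch handles only one format and treats every run of character data as text, including the contents of script or style elements. A filtering SDK performs the equivalent step for many file formats, and can combine it with the subfile, format, and metadata operations listed above.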