Filtering
Filter SDK enables you to filter many different types of documents. Filtering is the process of extracting the text from a document without the application-specific markup. The filtering process can also include the following:
- Subfile extraction—exposes all subfiles for filtering. See Use the File Extraction API.
- File format detection—detects a file's format, and reports the information to the API, which in turn reports the information to the developer's application. See File Format Detection.
- Metadata extraction—extracts selected metadata (document properties) from a file. See Extract Metadata.
- Character set conversion—controls the character set of both the input and the output text. See Convert Character Sets.
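To make the idea of filtering concrete, the following is a minimal conceptual sketch in Python. It does not use the Filter SDK API. It uses only the Python standard library, and the names `TextOnlyFilter` and `filter_text` are hypothetical helpers invented for this illustration. It assumes the input is HTML and shows two of the ideas described above: extracting text without markup, and controlling the character set of the input and the output.

```python
from html.parser import HTMLParser


class TextOnlyFilter(HTMLParser):
    """Collects character data and discards tags.

    Hypothetical helper for illustration only; not part of Filter SDK.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.chunks.append(data)


def filter_text(raw: bytes, source_charset: str, target_charset: str = "utf-8") -> bytes:
    """Strip markup from HTML bytes and re-encode the remaining text."""
    # Input side of character set conversion: bytes -> str.
    markup = raw.decode(source_charset)

    parser = TextOnlyFilter()
    parser.feed(markup)
    parser.close()

    # Collapse runs of whitespace left behind by the removed markup.
    text = " ".join("".join(parser.chunks).split())

    # Output side of character set conversion: str -> bytes.
    return text.encode(target_charset)


if __name__ == "__main__":
    sample = "<html><body><p>Caf\u00e9 <b>menu</b></p></body></html>".encode("latin-1")
    print(filter_text(sample, "latin-1").decode("utf-8"))
```

A real filter handles many proprietary and binary formats, which this sketch does not attempt. It is intended only to show the shape of the operation: markup in, plain text out, in a chosen character set.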