Using Document Objects

As shown in Start Using the Filter API, you obtain a document object for each document that you want to process. The document object provides access to the data within a document, including its file format, metadata, text, and subfiles.

This topic provides some additional guidance about using document objects in your code.

A document object must be created and used on the same thread that created the parent session.
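
The following sketch shows one way to respect this rule when you move work off the main thread: the session is created, and every document is opened and used, inside the same worker function. The helper name process_on_own_thread and its parameters open_session and handle_document are hypothetical names introduced for illustration. They are not part of the Filter API.

```python
import threading


def process_on_own_thread(open_session, file_paths, handle_document):
    """Create a session and use its documents entirely on one worker thread.

    open_session is a hypothetical zero-argument callable that returns a
    session context manager; handle_document is a hypothetical callable that
    receives each open document object.
    """
    results = []
    errors = []

    def worker():
        try:
            # The session is created here, on the worker thread, so every
            # document opened from it is created and used on this thread too.
            with open_session() as session:
                for path in file_paths:
                    with session.open(path) as doc:
                        results.append(handle_document(doc))
        except Exception as exc:
            # Hand the failure back to the calling thread.
            errors.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()

    if errors:
        raise errors[0]
    return results
```

With the names used elsewhere in this topic, a call might look like process_on_own_thread(lambda: kv.FilterSession(bin_directory_path, license), [file_path], lambda doc: len(doc.subfiles)). Returning plain values such as counts, rather than the document objects themselves, keeps the objects from being used on another thread.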

Document Objects and Session Configuration

Changing session configuration options can change the data that is returned by methods of a document object, even if you created the document object before changing the configuration.

Consider the following example. You have a document with embedded images but no subfiles, and you run the following code:

with kv.FilterSession(bin_directory_path, license) as session:
    session.config.extract_images(False)

    with session.open(file_path) as doc:
        len(doc.subfiles)  # Returns 0

        session.config.extract_images(True)
        len(doc.subfiles)  # Returns 3

Changing your session configuration invalidates objects that you previously obtained through the document object. Rather than keeping references to those objects, access the data again through the members of the document object.

Password-Protected Documents

When you have a document object, you might find that it is password protected. For instance, doc.info.encrypted may be True, or filtering the document might raise a KeyViewError. If you know the password, you can configure the session to use that password and then access the document's contents, as in the following code sample.

with session.open(file_path) as doc:
    # Calling doc.filter(output_file_path) here, before the password is set,
    # would raise an exception.
    session.config.password("secret123")
    doc.filter(output_file_path)  # Does not raise an exception
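
If you do not know in advance whether a document needs a password, the same steps can be wrapped in a small helper. This is a minimal sketch, not part of the Filter API: the function name filter_with_password and its error_type parameter are hypothetical. It uses only the members shown above (doc.info.encrypted, session.config.password, and doc.filter), and it assumes that you pass the exception class raised for protected documents, such as the KeyViewError mentioned earlier, as error_type.

```python
def filter_with_password(session, doc, output_file_path, password,
                         error_type=Exception):
    """Filter a document, supplying a password if the document needs one.

    error_type is a hypothetical parameter: pass the exception class that
    filtering raises for protected documents.
    """
    # If the document reports that it is encrypted, set the password first.
    if doc.info.encrypted:
        session.config.password(password)

    try:
        doc.filter(output_file_path)
    except error_type:
        # The first attempt failed, so set the password and retry once.
        # If the retry also fails, the exception propagates to the caller.
        session.config.password(password)
        doc.filter(output_file_path)
```

Because an exception can have causes other than a missing password, the helper retries only once and lets a second failure reach the caller.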