Using Document Objects
As shown in Start Using the Filter API, you obtain a document object for each document that you want to process. The document object provides access to the data within a document, including its file format, metadata, text, and subfiles.
This topic provides some additional guidance about using document objects in your code.
A document object must be created and used on the same thread that created the parent session.
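One way to catch accidental cross-thread use during development is a small guard object. The following is a minimal sketch, not part of the Filter API: `ThreadBound` is a hypothetical helper that records the thread on which it was created and raises an error if an attribute is read from any other thread.

```python
import threading


class ThreadBound:
    """Hypothetical debugging guard (not part of the Filter API).

    Wraps any object and rejects attribute access from a thread other
    than the one that created the wrapper.
    """

    def __init__(self, target):
        self._target = target
        self._owner = threading.get_ident()  # thread that created the wrapper

    def __getattr__(self, name):
        # Called only for names not found on the wrapper itself, that is,
        # for every attribute that belongs to the wrapped object.
        if threading.get_ident() != self._owner:
            raise RuntimeError(
                f"'{name}' accessed from a different thread than the one "
                "that created this object"
            )
        return getattr(self._target, name)
```

Under these assumptions, you could wrap a document object inside the block that opens it, for example `doc = ThreadBound(doc)`, and do the same for the session on the thread that creates it. The guard checks only attribute access on the wrapper. It does not cover objects returned by the wrapped object, and it cannot itself be used as the target of a `with` statement.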
Document Objects and Session Configuration
Changing session configuration options can change the data that is returned by methods of a document object, even if you created the document object before changing the configuration.
Consider the following example. You have a document with embedded images but no subfiles and you run the following code:
with kv.FilterSession(bin_directory_path, license) as session:
    session.config.extract_images(False)
    with session.open(file_path) as doc:
        len(doc.subfiles)  # Returns 0
        session.config.extract_images(True)
        len(doc.subfiles)  # Returns 3
NOTE: Changing your session configuration invalidates objects previously obtained through the document object. Rather than keeping references to those objects, access the data again through the document object's members after you change the configuration.
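The difference between a value read before a configuration change and a value read after it can be illustrated without the SDK. The classes below are stand-ins written for this sketch only. They are not the Filter API, and they model only the assumption that `subfiles` depends on the current `extract_images` setting. In the real API an invalidated object might fail in other ways than simply being out of date.

```python
class StandInConfig:
    """Stand-in for a session configuration (illustration only)."""

    def __init__(self):
        self.images_enabled = False

    def extract_images(self, enabled):
        self.images_enabled = enabled


class StandInSession:
    """Stand-in for a session that owns a configuration."""

    def __init__(self):
        self.config = StandInConfig()


class StandInDocument:
    """Stand-in document whose subfiles depend on the session configuration."""

    def __init__(self, session, embedded_images):
        self._session = session
        self._embedded_images = list(embedded_images)

    @property
    def subfiles(self):
        # Recomputed on every access, so it reflects the current configuration.
        if self._session.config.images_enabled:
            return list(self._embedded_images)
        return []


session = StandInSession()
doc = StandInDocument(session, ["img1", "img2", "img3"])

stale = doc.subfiles                 # read before the configuration change
session.config.extract_images(True)
fresh = doc.subfiles                 # read again through the document object

print(len(stale), len(fresh))        # prints "0 3"
```

The value held in `stale` does not pick up the new setting, which is why the data is read again through `doc` after the configuration changes.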
Password Protected Documents
When you have a document object, you might find that it is password protected. For instance, doc.info.encrypted may be True, or filtering the document might raise a KeyViewError. If you know the password, you can configure the session to use that password and then access the document's contents, as in the following code sample.
with session.open(file_path) as doc:
    # doc.filter(output_file_path) - would raise exception
    session.config.password("secret123")
    doc.filter(output_file_path)  # does not raise exception
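If you have several candidate passwords, the same pattern can be wrapped in a retry loop. The following is a minimal sketch under stated assumptions: `filter_with_candidate_passwords` is a hypothetical helper, not part of the Filter API. It uses only the `doc.filter(...)` and `session.config.password(...)` calls shown above, and it takes the exception type as a parameter (for example, the KeyViewError class) so that the sketch does not depend on how the SDK is imported. It also assumes that a failed call to `filter` can safely be repeated with the same output path.

```python
def filter_with_candidate_passwords(session, doc, output_path, passwords,
                                    error_type=Exception):
    """Filter a document, trying each candidate password after a failure.

    Returns the password that worked, or None if the first attempt
    succeeded without one. Re-raises the last error if every candidate fails.
    """
    try:
        doc.filter(output_path)
        return None  # no password was needed
    except error_type as first_error:
        last_error = first_error

    for candidate in passwords:
        # Configure the session with the candidate, then retry the filter.
        session.config.password(candidate)
        try:
            doc.filter(output_path)
            return candidate
        except error_type as error:
            last_error = error

    raise last_error
```

A call might look like `filter_with_candidate_passwords(session, doc, output_file_path, ["secret123"], error_type=kv.KeyViewError)`. Because the error can be raised for reasons other than a missing password, a final failure here does not prove that every password was wrong.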