Start Using the Filter API
Begin your Python program by importing the keyview.filter module that you installed in Install the Filter Python Module.
import keyview.filter as kv
Create a Session
To use the Filter API, first create a session:
bin_directory_path = 'C:\\KeyviewFilterSDK\\WINDOWS_X86_64\\bin\\'
license = 'YOUR_LICENSE_KEY'
with kv.FilterSession(bin_directory_path, license) as session:
    # Configure the session and open a document
bin_directory_path should be the path to the directory that contains the Filter SDK binaries, and license should be your license key.
The FilterSession class is a context manager, so you can use the with statement to ensure that resources are automatically freed when no longer required.
To use the Filter API in a multi-threaded application, create a unique session on each thread and do not share a session or document between threads.
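The one-session-per-thread rule can be sketched as follows. This is a minimal illustration, not part of the SDK: filter_in_threads, session_factory, and handle are hypothetical names. session_factory is assumed to be a zero-argument callable that returns a new session (for example, lambda: kv.FilterSession(bin_directory_path, license)), and handle is any function that takes an open document.

```python
import queue
import threading


def filter_in_threads(session_factory, file_paths, handle, num_threads=4):
    """Process file_paths on num_threads threads, one session per thread.

    session_factory: hypothetical zero-argument callable returning a new
        session, e.g. lambda: kv.FilterSession(bin_directory_path, license).
    handle: hypothetical callable that receives an open document.
    """
    pending = queue.Queue()
    for path in file_paths:
        pending.put(path)

    results = {}
    results_lock = threading.Lock()

    def worker():
        # Each thread creates its own session, so sessions and documents
        # never cross thread boundaries.
        with session_factory() as session:
            while True:
                try:
                    path = pending.get_nowait()
                except queue.Empty:
                    return
                with session.open(path) as doc:
                    outcome = handle(doc)
                with results_lock:
                    results[path] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Error handling is omitted to keep the sketch short: an exception raised by handle ends that worker thread, and the remaining files are picked up by the other threads.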
Open a Document
A Document object provides access to the data within a file, including the file format, text, metadata, and any subfiles. Obtain a Document object from a file or stream by calling the open() method on your FilterSession. For example:
with session.open(file_path) as doc:
    # Use the document object to detect the file format, filter text,
    # access metadata, and extract subfiles
Determine the Format of a Document
You can find the format of a document by using the info property on the Document object. For example:
with session.open(file_path) as doc:
    # Print information about the document
    print(f"Format Name: \t {doc.info.doc_format.name}")
    print(f"Version: \t {doc.info.version}")
    print(f"Category Name:\t {doc.info.doc_class.name}")
    print(f"Encrypted: \t {doc.info.encrypted}")
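If you want to log or serialize the format information rather than print it, the same fields can be gathered into a dictionary. The sketch below uses only the attributes shown above. summarize_info is a hypothetical helper name, and the stand-in object with placeholder values exists only so that the example runs without the SDK.

```python
from types import SimpleNamespace


def summarize_info(doc):
    """Return the fields of doc.info shown above as a plain dictionary."""
    info = doc.info
    return {
        "format": info.doc_format.name,
        "version": info.version,
        "category": info.doc_class.name,
        "encrypted": info.encrypted,
    }


# Stand-in object with placeholder values, for illustration only. In real
# code, pass the object returned by session.open(file_path).
fake_doc = SimpleNamespace(
    info=SimpleNamespace(
        doc_format=SimpleNamespace(name="ExampleFormat"),
        version=1,
        doc_class=SimpleNamespace(name="ExampleCategory"),
        encrypted=False,
    )
)
summary = summarize_info(fake_doc)
print(summary)
```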
Filter a Document
To filter a document and write the text to a file or stream, call the filter() method on a Document object. For example:
with session.open(file_path) as doc:
    try:
        # Filter to a file
        doc.filter(output_file_path)
        # Alternatively, filter to a stream
        # with open(output_file_path, "w", encoding="utf-8") as txt_stream:
        #     while line := doc.text.readline():
        #         txt_stream.write(line)
        #         txt_stream.flush()
    except kv.KeyViewError as e:
        print(e)
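The stream alternative in the comments above can be wrapped in a small reusable function. This is a sketch under one assumption taken from that commented code: doc.text.readline() returns an empty string once all text has been read. copy_text is a hypothetical name, and the stand-in document lets the example run without the SDK.

```python
import io
from types import SimpleNamespace


def copy_text(doc, out_stream):
    """Copy filtered text line by line from doc.text to out_stream.

    Returns the number of lines written. Requires Python 3.8 or later
    for the := operator.
    """
    lines_written = 0
    while line := doc.text.readline():
        out_stream.write(line)
        lines_written += 1
    out_stream.flush()
    return lines_written


# Stand-in document for illustration only; a real one comes from
# session.open(file_path).
fake_doc = SimpleNamespace(text=io.StringIO("first line\nsecond line\n"))
buffer = io.StringIO()
count = copy_text(fake_doc, buffer)
```

Because the function accepts any writable text stream, the same code can write to a file opened with encoding="utf-8", to an in-memory buffer, or to sys.stdout.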
Access Metadata
You can access document metadata through the metadata property on a Document object. For example:
for element in doc.metadata:
    # Process standardized elements where possible, but also process
    # non-standard elements that have no standardized alternative.
    # Ignore duplicate elements output for backwards compatibility.
    if not element.has_standard_alternative and not element.is_superseded:
        if element.standard_key != kv.MetadataKey.Other:
            print(f"[Standardized] {element.standard_key}: {element.value}")
        else:
            print(f"[Non-standard] {element.key}: {element.value}")
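The same selection rule can collect the elements into dictionaries instead of printing them. The sketch below is illustrative only: collect_metadata and its other_key parameter are hypothetical. With the real module you would pass doc.metadata and kv.MetadataKey.Other. Here, stand-in elements and the placeholder string "OTHER" let the example run without the SDK.

```python
from types import SimpleNamespace


def collect_metadata(elements, other_key):
    """Group metadata values using the selection rule shown above.

    elements: iterable of metadata elements, such as doc.metadata.
    other_key: the value that marks a non-standard element; with the
        real module this would be kv.MetadataKey.Other.
    """
    standardized, non_standard = {}, {}
    for element in elements:
        # Skip elements that have a standardized alternative or that
        # are superseded.
        if element.has_standard_alternative or element.is_superseded:
            continue
        if element.standard_key != other_key:
            standardized.setdefault(element.standard_key, []).append(element.value)
        else:
            non_standard.setdefault(element.key, []).append(element.value)
    return standardized, non_standard


def _element(key, standard_key, value, alt=False, superseded=False):
    # Builds a stand-in element with placeholder values, for illustration.
    return SimpleNamespace(key=key, standard_key=standard_key, value=value,
                           has_standard_alternative=alt, is_superseded=superseded)


sample = [
    _element("title", "Title", "Report"),
    _element("custom_field", "OTHER", "abc"),
    _element("old_title", "OTHER", "Report", alt=True),
    _element("legacy", "Title", "Report", superseded=True),
]
standardized, non_standard = collect_metadata(sample, other_key="OTHER")
```

Values are stored in lists so that repeated keys do not overwrite one another.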
Extract Subfiles
The subfiles property on a Document object provides access to the document's subfiles. Each element returned by the iterator contains information about the subfile, and a method that you can use to extract it:
for file in doc.subfiles:
    print(f"Subfile: {file.index}, Size: {file.size}")
    if file.type != kv.SubfileType.Folder:
        file.extract(generate_output_path(file))
In this example, generate_output_path is a function that you would write, which returns a suitable path for writing the extracted subfile.
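One possible shape for generate_output_path is sketched below. It is an assumption for illustration, not SDK code: it relies only on the index attribute shown above, and the output_dir parameter and its default value are hypothetical.

```python
import os


def generate_output_path(file, output_dir="extracted"):
    """Return a path for an extracted subfile, named after its index.

    output_dir is a hypothetical default; it is created if it is missing.
    """
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f"subfile_{file.index}")
```

Naming the output after the subfile index avoids depending on names stored inside the container, which might collide or contain characters that are not valid in a file path.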