Start Using the Filter API

Begin your Python program by importing the keyview.filter module that you installed in "Install the Filter Python Module":

import keyview.filter as kv

Create a Session

To use the Filter API, first create a session:

bin_directory_path = 'C:\\KeyviewFilterSDK\\WINDOWS_X86_64\\bin\\'
license = 'YOUR_LICENSE_KEY'
with kv.FilterSession(bin_directory_path, license) as session:
    # Configure the session and open a document
    pass  # placeholder so that the block is valid Python

Set bin_directory_path to the path of the directory that contains the Filter SDK binaries, and set license to your license key.

The FilterSession class is a context manager, so you can use the with statement to ensure that its resources are freed automatically when the block exits.

To use the Filter API in a multi-threaded application, create a unique session on each thread and do not share a session or document between threads.
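
The one-session-per-thread rule can be sketched with a thread pool. This is a minimal illustration, not the SDK's prescribed pattern: process_one, process_all, make_session, and handle_document are hypothetical names. make_session is a callable that you supply (for example, lambda: kv.FilterSession(bin_directory_path, license)), and handle_document is whatever you want to do with each Document.

```python
from concurrent.futures import ThreadPoolExecutor


def process_one(path, make_session, handle_document):
    # Hypothetical helper: the session and document are created, used, and
    # closed inside one call, so neither object crosses a thread boundary.
    with make_session() as session:
        with session.open(path) as doc:
            return handle_document(doc)


def process_all(paths, make_session, handle_document, max_workers=4):
    # Each task runs process_one on a worker thread with its own session.
    # Results are returned in the same order as the input paths.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_one, p, make_session, handle_document)
                   for p in paths]
        return [f.result() for f in futures]
```

Creating a session for every file is the simplest way to guarantee isolation. If that turns out to be too costly for your workload, one alternative is to keep a single session per worker thread, for example in a threading.local object.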

Open a Document

A Document object provides access to the data within a file, including the file format, text, metadata, and any subfiles. Obtain a Document object from a file or stream by calling the open() method on your FilterSession. For example:

with session.open(file_path) as doc:
    # Use the document object to detect the file format, filter text,
    # access metadata, and extract subfiles
    pass  # placeholder so that the block is valid Python

Determine the Format of a Document

You can find the format of a document by using the info property on the Document object. For example:

with session.open(file_path) as doc:
    # Print information about the document
    print(f"Format Name:  \t {doc.info.doc_format.name}")
    print(f"Version:      \t {doc.info.version}")
    print(f"Category Name:\t {doc.info.doc_class.name}")
    print(f"Encrypted:    \t {doc.info.encrypted}")    
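
The same properties can also be gathered into a dictionary, for example to log them or to skip encrypted files before filtering. The following is a minimal sketch: summarize_info is a hypothetical helper, not part of the Filter API, and it reads only the attributes used in the example above.

```python
def summarize_info(info):
    # `info` is expected to look like doc.info. It is read only through the
    # doc_format.name, version, doc_class.name, and encrypted attributes.
    return {
        "format": info.doc_format.name,
        "version": info.version,
        "category": info.doc_class.name,
        "encrypted": info.encrypted,
    }
```

Inside the with block you would call summarize_info(doc.info) and, for instance, check the "encrypted" entry before deciding whether to filter the document.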

Filter a Document

To filter a document and write the text to a file or stream, call the filter() method on a Document object. For example:

with session.open(file_path) as doc:
    try:
        # Filter to a file
        doc.filter(output_file_path)
        
        # Alternatively, filter to a stream
        # with open(output_file_path, "w", encoding="utf-8") as txt_stream:
        #     while line := doc.text.readline():
        #         txt_stream.write(line)
        #         txt_stream.flush()
                  
    except kv.KeyViewError as e:
        print(e)
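
The commented-out stream alternative above can be factored into a small helper. This sketch assumes that doc.text behaves like a text stream whose readline() returns an empty string at the end of the text, as the loop above implies; write_text is a hypothetical name, not part of the Filter API.

```python
def write_text(text_stream, output_file_path):
    # Copy lines until readline() returns an empty string (end of text).
    # Returns the number of lines written.
    count = 0
    with open(output_file_path, "w", encoding="utf-8") as out:
        while line := text_stream.readline():
            out.write(line)
            count += 1
    return count
```

You would call write_text(doc.text, output_file_path) inside the try block. The output file is flushed when the with block closes it, so the per-line flush() is omitted here.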

Access Metadata

You can access document metadata through the metadata property on a Document object. For example:

for element in doc.metadata:
    
    # Process standardized elements where possible, but also process
    # non-standard elements that have no standardized alternative.
    # Ignore duplicate elements that are output for backwards compatibility.

    if not element.has_standard_alternative and not element.is_superseded:
        if element.standard_key != kv.MetadataKey.Other:
            print(f"[Standardized] {element.standard_key}: {element.value}")
        else:
            print(f"[Non-standard] {element.key}: {element.value}")
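
If you want to keep the metadata rather than print it, the same filtering rules can be applied while building dictionaries. This is a sketch under stated assumptions: collect_metadata is a hypothetical helper, other_key stands for kv.MetadataKey.Other, and standard_key values are assumed to be hashable (for example, enum members).

```python
def collect_metadata(elements, other_key):
    # Split metadata into standardized and non-standard dictionaries, using
    # the same filtering rules as the loop above.
    standardized, non_standard = {}, {}
    for element in elements:
        if element.has_standard_alternative or element.is_superseded:
            continue  # skipped, exactly as in the loop above
        if element.standard_key != other_key:
            standardized[element.standard_key] = element.value
        else:
            non_standard[element.key] = element.value
    return standardized, non_standard
```

You would call collect_metadata(doc.metadata, kv.MetadataKey.Other). If the same key occurs more than once, the last value wins; store lists instead if you need every occurrence.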

Extract Subfiles

The subfiles property on a Document object provides access to the document's subfiles. Each element returned by the iterator provides information about the subfile and an extract() method that you can use to extract it:

for file in doc.subfiles:
    print(f"Subfile: {file.index}, Size: {file.size}")

    if file.type != kv.SubfileType.Folder:
        file.extract(generate_output_path(file))

In this example, generate_output_path is a function that you would write, which returns a suitable path for writing the extracted subfile.
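
One possible shape for such a function is sketched below. It is only an illustration: the output_dir parameter and the subfile_<index> naming scheme are assumptions, and it relies on index being unique within a single container.

```python
import os


def generate_output_path(file, output_dir="extracted"):
    # Hypothetical helper: name each subfile by its index, which the
    # iterator above exposes, and make sure the target directory exists.
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f"subfile_{file.index}")
```

Naming by index avoids trusting any name stored inside the container. If you build paths from stored names instead, sanitize them first so that an extracted file cannot be written outside the output directory.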