Extract Subfiles

To filter all files in a container file, you must open the container and extract its subfiles to either a file or a stream. Repeat the extraction until every subfile has been extracted and is available for filtering. After a subfile is extracted, you can call Filter API methods to filter the data.

If you want to filter a container file and its subfiles into a single output file, you must extract all files from the container, filter each file, and then append the filtered text to that same output file.

You can iterate over subfile information by calling the Subfiles method on a document object. Each element returned by the iterator contains information about the subfile, and a method to let you extract it:

foreach (Subfile file in myDoc.Subfiles()) using (file)
{
    Console.WriteLine($"Subfile {file.Index()}, size: {file.Size()}");
    if (file.GetSubfileType() != SubfileType.Folder)
    {
        string outputPath = GenerateOutputPath(file);
        file.Extract(outputPath);
    }
}

In this example, GenerateOutputPath() is a function that returns the path you want to use for the extracted subfile. If the name of the subfile does not matter (for example, the subfile will be passed into KeyView for further processing), you could use a unique identifier such as a GUID. If you instead choose to base the filename on Subfile.RawName(), which is the path the container file provides, you should protect against directory traversal attacks (where the name of the subfile contains a relative or absolute path).
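
One way to guard against directory traversal is to resolve the candidate path and confirm that it still lies inside the intended output directory. The following is a minimal sketch, not part of the KeyView API: the class SubfilePaths and the method SafeOutputPath are hypothetical names, and it assumes that falling back to a GUID-based name is acceptable when the raw name is empty or escapes the output directory.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of the KeyView API.
public static class SubfilePaths
{
    // Returns a path inside outputDir for a subfile whose container-provided
    // name is rawName. If rawName is empty, absolute, or climbs out of
    // outputDir (for example "../../x"), a GUID-based name is used instead.
    public static string SafeOutputPath(string outputDir, string rawName)
    {
        // Normalize the output directory and make sure it ends with a
        // separator, so that "/out" does not match "/outside/file".
        string root = Path.GetFullPath(outputDir);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
        {
            root += Path.DirectorySeparatorChar;
        }

        if (!string.IsNullOrWhiteSpace(rawName))
        {
            // Path.Combine returns rawName unchanged when it is rooted, and
            // Path.GetFullPath collapses any ".." segments, so both absolute
            // and relative escapes fail the prefix check below.
            string candidate = Path.GetFullPath(Path.Combine(root, rawName));
            bool insideRoot = candidate.StartsWith(root, StringComparison.Ordinal)
                              && candidate.Length > root.Length;
            if (insideRoot)
            {
                return candidate;
            }
        }

        // Assumption: an opaque unique name is an acceptable fallback.
        return Path.Combine(root, Guid.NewGuid().ToString("N"));
    }
}
```

With a helper like this, GenerateOutputPath(file) could return SubfilePaths.SafeOutputPath(outputDir, file.RawName()), where outputDir is a directory you choose. If the raw name contains subdirectories, you would also need to create the parent directory (for example with Directory.CreateDirectory) before calling Extract.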