Tab Delimited Output for Spreadsheets and Embedded Tables

You can use File Content Extraction to convert spreadsheets, embedded tables in word processing documents (for example, Microsoft Word documents), and tables detected by Optical Character Recognition (OCR) to tab-delimited form.

In this format, File Content Extraction inserts a tab character between each cell, and a line break between each row. Tab and line break characters in the cells are replaced with spaces. For spreadsheets, this format ensures that tabs exist between empty cells, which can be useful when you need to keep the table structure after filtering.
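
The layout described above can be illustrated with a minimal Python sketch. It is not part of File Content Extraction: the helper names to_tab_delimited and parse_tab_delimited are hypothetical, the sample rows are invented, and the sketch assumes that rows are separated by a single "\n" character and that each tab or line break character inside a cell becomes one space.

```python
def to_tab_delimited(rows):
    """Emulate the described layout: a tab between cells, a line break between rows."""
    lines = []
    for row in rows:
        cells = []
        for cell in row:
            # Assumption: each tab or line break character inside a cell
            # becomes one space, so it cannot be mistaken for a separator.
            for ch in ("\t", "\r", "\n"):
                cell = cell.replace(ch, " ")
            cells.append(cell)
        lines.append("\t".join(cells))
    # Assumption: rows are separated by a single "\n".
    return "\n".join(lines)


def parse_tab_delimited(text):
    """Split tab-delimited text back into rows, keeping empty cells in place."""
    # str.split("\t") returns empty strings for empty cells, so every
    # remaining cell stays in its original column.
    return [line.split("\t") for line in text.split("\n")]


# Invented sample data for illustration only.
rows = [
    ["Name", "Region", "Total"],
    ["Widget", "", "12"],          # empty middle cell
    ["Multi\nline", "a\tb", "3"],  # embedded line break and tab
]
text = to_tab_delimited(rows)
parsed = parse_tab_delimited(text)
print(parsed[1])  # ['Widget', '', '12']
```

Because the empty cell still produces a tab, the second row keeps three columns, and a consumer that filters or reorders rows can still rely on column positions.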

To enable tab-delimited output for spreadsheets and embedded tables

  • In the .NET API, call the method TabDelimited on your session configuration. For example:

    session.Config().TabDelimited(true);

  • In formats.ini, set the following parameter:

    [Options]
    TabDelimited=TRUE

Table Output for Eduction

For files that contain multiple tables, File Content Extraction can insert delimiters between the tables in its output, in a form that Eduction understands. This option allows Eduction to extract entity data from tables.

To use this option, you must enable tab-delimited output and set the target character set to KVCS_UTF8.

To enable table delimiters for spreadsheets and embedded tables

  • In the .NET API, call the method OutputTableDelimiters on your session configuration. For example:

    session.Config().OutputTableDelimiters(true);

  • In formats.ini, set the following parameter:

    [Options]
    OutputTableDelimiters=TRUE

For more information about table extraction in Eduction, refer to the Eduction User and Programming Guide.