Tab Delimited Output for Spreadsheets and Embedded Tables

You can use File Content Extraction to convert spreadsheets, embedded tables in word processing documents (for example, Microsoft Word documents), and tables detected by Optical Character Recognition (OCR) to tab-delimited form.

In this format, File Content Extraction inserts a tab character between cells and a line break between rows. Tab and line break characters that occur inside a cell are replaced with spaces. For spreadsheets, this format ensures that tabs exist between empty cells, which can be useful when you need to keep the table structure after filtering.
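
The following Java sketch illustrates the shape of the format described above. It is not part of the File Content Extraction API: the class and method names (TabDelimitedSketch, sanitizeCell, toTabDelimited) are hypothetical, and it assumes that each tab or line break character inside a cell becomes a single space and that every row ends with a line break.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TabDelimitedSketch {

    // Replace tab and line break characters inside a cell with spaces.
    // Assumption: one space per replaced character.
    public static String sanitizeCell(String cell) {
        if (cell == null) {
            return "";
        }
        return cell.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
    }

    // Join the cells of each row with tabs and end each row with a line break.
    // Empty cells still contribute a tab, so column positions are preserved.
    public static String toTabDelimited(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(row.stream()
                    .map(TabDelimitedSketch::sanitizeCell)
                    .collect(Collectors.joining("\t")));
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> table = List.of(
                List.of("Name", "", "Notes"),
                List.of("Widget", "42", "line one\nline two"));
        System.out.print(toTabDelimited(table));
    }
}
```

In the first row, the empty middle cell produces two consecutive tabs, so "Notes" stays in the third column.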

To enable tab delimited output for spreadsheets and embedded tables

  • In the Java API, call the setTabDelimited method on the filter object, for example:

    filter.setTabDelimited(true);

  • In formats.ini, set the following parameter. (This is an alternative approach; you do not need to do this if you have already configured the feature through the API.)

    [Options]
    TabDelimited=TRUE
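
Because tab-delimited output keeps a tab for every empty cell, downstream code can recover the column positions. The following Java sketch shows one way to read such text back into rows. The class name TabDelimitedReader and its parse method are hypothetical and are not part of the File Content Extraction API. The sketch relies on the standard behavior of String.split: a negative limit keeps trailing empty strings, so empty cells at the end of a row are not lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TabDelimitedReader {

    // Split tab-delimited text into rows of cells, keeping empty cells.
    public static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        if (text.isEmpty()) {
            return rows;
        }
        // The default split drops trailing empty strings, so the final
        // line break does not produce an extra empty row.
        for (String line : text.split("\r?\n")) {
            // A limit of -1 keeps trailing empty cells.
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = parse("Name\t\tNotes\nWidget\t42\t\n");
        for (List<String> row : rows) {
            System.out.println(row.size() + " cells: " + row);
        }
    }
}
```

Calling split without the negative limit would silently drop the empty third cell of the second row, which is the kind of structure loss that tab-delimited output is meant to avoid.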

Table Output for Eduction

For files that contain multiple tables, File Content Extraction can insert delimiters between the tables in its output, in a form that Eduction understands. This option allows Eduction to extract entity data from the tables.

To use this option, you must enable tab-delimited output and set the target character set to KVCS_UTF8.

To enable table delimiters for spreadsheets and embedded tables

  • In the Java API, call the setOutputTableDelimiters method on the filter object, for example:

    filter.setOutputTableDelimiters(true);

  • In formats.ini, set the following parameter. (This is an alternative approach; you do not need to do this if you have already configured the feature through the API.)

    [Options]
    OutputTableDelimiters=TRUE

For more information about table extraction in Eduction, refer to the Eduction User and Programming Guide.