Create Coding Files

The coding files are simple text files that describe the entity, property, and qualifier codes in your Fact Bank system. They also define any aliases for the entities, properties, and qualifiers, and map each alias to the same code as the canonical name.

A Fact Bank system requires four coding files:

  • code_to_property.txt. Assigns a unique code to each property and qualifier in your data, as well as a canonical human-readable name and the data type.
  • property_to_code.txt. An inverse mapping of the property and qualifier codes, including any aliases.
  • code_to_entity.txt. Assigns a unique code to each entity in your data (that is, things for which you might want to know the values of a property).
  • entity_to_code.txt. An inverse mapping of the entity codes, including any aliases.

The following sections use a simple example to show how to create the coding files from your data.

Example Data

The example data starts with facts, organized in a table. This version uses CSV format:

product_name, color, buy_price, sell_price, sold_last_year
alpha, red, 10, 12, 3500000
beta, blue, 11, 13, 2000000
gamma, green, 9, 10, 1000000

For this example, you might want to be able to answer questions such as:

  • What is the purchase price for alpha?
  • What was the selling price of beta?
  • What color is gamma?

Generate the Property Code Files

The properties in your data are the values that you want to find in the Fact Bank. For a table like the one in this example, the properties are the columns in the table.

The code_to_property.txt coding file assigns a unique code to each property and qualifier. This coding file also defines the canonical human-readable name for the property or qualifier, and sets its type. You can use the following types:

  • string. The property or qualifier values are strings.
  • time. The property or qualifier values are times in the ISO format YYYY-MM-DDTHH:NN:SS.
  • entity. The property or qualifier values are entity codes. In this case, you must list the entity code in the code_to_entity.txt file, and you must list the possible values for this entity in the entity_to_code.txt file. This option allows you to map multiple values to the same qualifier code.

NOTE: If your data values contain punctuation characters, such as commas (,) and equals signs (=), you must percent-encode the value in the coding files. For example, use %3D for an equals sign.
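The following Python sketch shows one way to percent-encode a value before you write it to a coding file. The helper name encode_value is hypothetical. The sketch assumes that spaces can stay unencoded (as in the sample files in this section) and that encoding every other punctuation character is acceptable. Check both assumptions against your Fact Bank configuration.

```python
from urllib.parse import quote


def encode_value(value):
    """Percent-encode punctuation in a coding-file value.

    Assumption: spaces are left as-is, because the sample coding files
    contain unencoded spaces (for example, "buying price"). Letters,
    digits, and the characters _ . - ~ are never encoded by quote().
    """
    return quote(value, safe=" ")


print(encode_value("a=b,c"))
print(encode_value("buying price"))
```

With this helper, an equals sign becomes %3D and a comma becomes %2C, while a name such as buying price is unchanged.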

For example, the following sample is the code_to_property.txt file for the example data in the previous section.

PRODUCT_NAME=product,string
COLOR=color,string
BUY_PRICE=buying price,string
SELL_PRICE=selling price,string
SOLD_LAST_YEAR=sold last year,string

The property_to_code.txt coding file contains the inverse mapping of the code_to_property.txt file, without the type information. You can also include aliases for a property or qualifier name. Each alias goes on its own line and maps to the same code as the canonical name.

For example, the following sample is the property_to_code.txt file for the example data. It includes the alias purchase price for the BUY_PRICE code and the alias sale price for the SELL_PRICE code.

product=PRODUCT_NAME
color=COLOR
buying price=BUY_PRICE
purchase price=BUY_PRICE
selling price=SELL_PRICE
sale price=SELL_PRICE
sold last year=SOLD_LAST_YEAR
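If you have many properties, you might prefer to generate both property coding files from a single definition. The following Python sketch is one possible approach, not a tool supplied with Fact Bank. The PROPERTIES table and the function names are hypothetical, and the percent-encoding step assumes that spaces can stay unencoded.

```python
from urllib.parse import quote

# Hypothetical definition table: code -> (canonical name, type, aliases).
PROPERTIES = {
    "PRODUCT_NAME": ("product", "string", []),
    "COLOR": ("color", "string", []),
    "BUY_PRICE": ("buying price", "string", ["purchase price"]),
    "SELL_PRICE": ("selling price", "string", ["sale price"]),
    "SOLD_LAST_YEAR": ("sold last year", "string", []),
}


def encode(value):
    # Percent-encode punctuation such as "," and "=". Spaces are kept
    # (an assumption based on the sample files).
    return quote(value, safe=" ")


def code_to_property_lines(properties):
    # One line per code, in the form CODE=canonical name,type
    return [
        f"{code}={encode(name)},{ptype}"
        for code, (name, ptype, _aliases) in properties.items()
    ]


def property_to_code_lines(properties):
    # One line per canonical name and per alias, in the form name=CODE
    lines = []
    for code, (name, _ptype, aliases) in properties.items():
        for label in [name, *aliases]:
            lines.append(f"{encode(label)}={code}")
    return lines


print("\n".join(code_to_property_lines(PROPERTIES)))
print("\n".join(property_to_code_lines(PROPERTIES)))
```

To create the files, write each list of lines to code_to_property.txt and property_to_code.txt respectively. Keeping the aliases next to the canonical name makes it harder for the two files to drift apart.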

Generate the Entity Code Files

The entities are the things that you want to find the property values for. For the example table, the obvious choice is the product_name column, so each product (alpha, beta, and gamma) is an entity.

The code_to_entity.txt coding file assigns a unique code to each entity.

NOTE: If your data values contain punctuation characters, such as commas (,) and equals signs (=), you must percent-encode the value in the coding files. For example, use %3D for an equals sign.

For example, the following sample is the code_to_entity.txt file for the example data.

ALPHA=alpha
BETA=beta
GAMMA=gamma

The entity_to_code.txt coding file contains the inverse mapping of the code_to_entity.txt file. You can also include aliases for the entity names. Each alias goes on its own line and maps to the same code as the canonical name.

For example, the following sample is the entity_to_code.txt file for the example data. It includes the alias alpha one for the ALPHA code.

alpha=ALPHA
beta=BETA
gamma=GAMMA
alpha one=ALPHA
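The entity coding files can also be derived directly from the example CSV data. The following Python sketch is illustrative only. The make_code convention (uppercase, with spaces replaced by underscores), the ALIASES table, and the function names are assumptions, not Fact Bank requirements. The example names contain no punctuation, so the sketch omits percent-encoding.

```python
import csv
import io

# The example data from this section, inlined so that the sketch is self-contained.
CSV_TEXT = """product_name, color, buy_price, sell_price, sold_last_year
alpha, red, 10, 12, 3500000
beta, blue, 11, 13, 2000000
gamma, green, 9, 10, 1000000
"""

# Hypothetical alias table: entity code -> additional names.
ALIASES = {"ALPHA": ["alpha one"]}


def entity_names(csv_text, column="product_name"):
    # skipinitialspace drops the space that follows each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [row[column] for row in reader]


def make_code(name):
    # Assumed convention: uppercase the name and replace spaces with underscores.
    return name.strip().upper().replace(" ", "_")


def code_to_entity_lines(names):
    # One line per entity, in the form CODE=name
    return [f"{make_code(name)}={name}" for name in names]


def entity_to_code_lines(names, aliases):
    # Canonical names first, then one line per alias, in the form name=CODE
    lines = [f"{name}={make_code(name)}" for name in names]
    for code, alias_list in aliases.items():
        lines.extend(f"{alias}={code}" for alias in alias_list)
    return lines


names = entity_names(CSV_TEXT)
print("\n".join(code_to_entity_lines(names)))
print("\n".join(entity_to_code_lines(names, ALIASES)))
```

For real data, you would read the CSV file from disk, and percent-encode any names that contain punctuation before you write the lines to code_to_entity.txt and entity_to_code.txt.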