Set up a SQL Database for Coding Information
The coding information in Fact Bank describes the entity, property, and qualifier codes in your Fact Bank system. It also defines any aliases for any of the entities, properties, and qualifiers, and maps all aliases to the same code.
You can store the coding information as a series of text files (see Create Coding Files), or in a SQL database.
You configure your Fact Bank to use a coding database by setting the CodifierType parameter in your Fact Bank system to odbc
, and creating a configuration section for the coding database. For more information, see Configure the Fact Bank System.
Create a Codings SQL Database
Answer Server can connect to any RDBMS that supports SQL. The most fully tested options are:
- SQLite
- PostgreSQL (version 9.3 or later is required)
The SQL coding database has four required tables:
property_codes
. Assigns a unique code to each property and qualifier in your data, as well as a canonical human-readable name and the data type.properties
. An inverse mapping of the property and qualifier codes, including any aliases.entity_codes
. Assigns a unique code to each entity in your data (that is, things for which you might want to know the values of a property).entities
. An inverse mapping of the entity codes, including any aliases.
Your Answer Server installation includes a PostgreSQL schema file for these tables, and a utility script that uses the psql
utility to apply the schema to a database. These files are available in the /factbank/schemas/postgresql/codings
directory in your installation.
The following sections describe these tables in more detail.
Property_Codes Table
The properties in your data are the values that you want to find in the Fact Bank. The property_codes
table contains a unique code for each property and qualifier. This table also defines the canonical human-readable name for the property or qualifier, and sets its type.
Column | Type | Description |
---|---|---|
id
|
text | The ID for the property or qualifier. |
code
|
text | The unique code for the property or qualifier. |
canonical_name
|
text | The canonical name for the property (any aliases that you define in the properties table refer back to this name). |
type
|
text |
The type of the property. You can use the following types:
|
Properties Table
The properties
table contains the inverse mapping of the property_codes
file, without the type information. You can also include aliases for a value, in a separate row.
Column | Type | Description |
---|---|---|
id
|
text | The ID for the property or qualifier value. |
name
|
text | The name of the property or qualifier. This value can be an alias. |
code_id
|
text | The ID of the row in the property_codes table that corresponds to this property or qualifier. |
Entity_Codes Table
The entities are the things that you want to find the property values for. The entity_codes
table assigns a unique code to each entity.
Column | Type | Description |
---|---|---|
id
|
text | The ID for the entity. |
code
|
text | The unique code for the entity. |
canonical_name
|
text | The primary name for the entity (any aliases that you define in the entities table refer back to this name). |
weight
|
integer | The entity weight. If entities with similar names are returned as candidate answers, the entity with the higher weight scores more highly. |
Entities Table
The entities
table contains the inverse mapping of the entity_codes
file. You can also include aliases for a value, in a separate row.
Column | Type | Description |
---|---|---|
id
|
text | The ID for the entity alias. |
name
|
text | The name of the entity. This value can be an alias. |
code_id
|
text | The ID of the row in the entity_codes table that corresponds to this entity. |
Import Codings into a Database from Coding Files
The Answer Server installation includes a Python script utility, sql_import.py
, to allow you to migrate from an existing set of coding files to a SQL codings database.
To run the script, you need:
- an existing set of coding files
- the details of an IDOL Content component that the script can connect to. The script uses this Content to generate stems of the canonical names that it inserts into the database. This Content must have the same stemming configuration as Fact Bank for the language that the codings apply to. Content must be running before you run the script.
To import your codings from an existing set of coding files, you run the script against the directory that contains the coding files, and provide the details of the IDOL Content component.
This script creates the required tables in your database, with the VARCHAR lengths correctly sized for your data. It also inserts the codings into the tables for the odbc fact codifier.
NOTE: The insertion is transactional. If the script is successful, it imports all the data. If there are any errors, the script does not import any data.