Index Documents into Knowledge Graph

Knowledge Graph accepts documents for indexing in JSON format. You can also use the Python script IDXtoJSON.py, included in the Knowledge Graph installation directory, to convert and index an existing IDX document. For example, you can use this script on an IDX document that you have exported from IDOL Server.

When you index a document, Knowledge Graph extracts the values of fields that you have configured as nodes in your edge configuration, and creates the graph of nodes and edges. After indexing, it creates a persistent store of the graph on disk.

Index a JSON Document

To index a JSON document, send the IndexDocs action to Knowledge Graph, using a POST request method, and set the Data parameter to the contents of the JSON document. For example:

http://12.3.4.56:9000/action=IndexDocs&Data=[{'title': 'document 1', 'myfield': 'value1'}, {'title': 'document 2', 'myfield': 'value 2'}]
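
The following Python 3 sketch shows one way such a POST request might be assembled with the standard library. It is an illustration only. The function names build_indexdocs_request and send_request are hypothetical, and it assumes that Knowledge Graph accepts the action and Data parameters as a form-encoded POST body, which you should confirm against your own server.

```python
import json
import urllib.parse
import urllib.request


def build_indexdocs_request(host, port, documents):
    """Build (but do not send) a POST request for the IndexDocs action."""
    # Assumption: the action name and the Data parameter are sent as a
    # form-encoded body. Adjust this if your server expects another layout.
    body = urllib.parse.urlencode({
        "action": "IndexDocs",
        "Data": json.dumps(documents),
    }).encode("utf-8")
    return urllib.request.Request(
        "http://{}:{}/".format(host, port),
        data=body,
        method="POST",
    )


def send_request(request, timeout=30):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To index the two documents from the example above, you could call send_request(build_indexdocs_request("12.3.4.56", 9000, documents)), where documents is a list of dictionaries such as [{"title": "document 1", "myfield": "value1"}, {"title": "document 2", "myfield": "value 2"}].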

Index an IDX Document

You can use the IDXtoJSON.py script to convert an existing IDX document to JSON and index it into Knowledge Graph.

NOTE: The script indexes all your documents in batches, without saving the graph index to disk. After indexing is complete, it runs the Persist action to save the graph index to disk. If you have set the Weighting configuration parameter to Shortstep, it also runs the RegenerateWeights action to generate weights.

To run this script, you must have:

  • Python version 2 or 3.
  • The Requests module.

To index an IDX document using the script

  • Open a command prompt in your Knowledge Graph installation directory, and run the following command:

    python IDXtoJSON.py --input InputPath --server ServerDetails --batchsize BatchSize

    where:

    InputPath is the file name and path to the input IDX file that you want to index into Knowledge Graph.
    ServerDetails is the host name and ACI port of the Knowledge Graph that you want to index into. Use the format host:port.
    BatchSize is the number of documents from the IDX file to include in each POST request to Knowledge Graph. The default value is 10000.

    NOTE: Large batch sizes can fail if the request becomes too big. If you do not want to reduce the batch size, you can increase the value of the MaxFileUploadSize configuration parameter in the [Server] section of the Knowledge Graph configuration file. For more information, refer to the Knowledge Graph Component Reference.

    For example:

    python IDXtoJSON.py --input /c/graphdata/graphinput.idx --server localhost:13000 --batchsize 1000
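
The batching behavior described in this section can be pictured with the following Python sketch. It is not the code of IDXtoJSON.py. The function names batches and plan_actions are hypothetical, and the sketch only lists the order of actions that the notes above describe, without contacting a server.

```python
def batches(documents, batch_size=10000):
    """Yield successive lists that contain at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


def plan_actions(documents, batch_size=10000, shortstep_weighting=False):
    """Return the ordered list of action names for one indexing run."""
    # One IndexDocs request for each batch of documents.
    actions = ["IndexDocs" for _ in batches(documents, batch_size)]
    # Save the graph index to disk once, after all batches are indexed.
    actions.append("Persist")
    # Regenerate weights only when Weighting is set to Shortstep.
    if shortstep_weighting:
        actions.append("RegenerateWeights")
    return actions
```

For example, 25 documents with a batch size of 10 give three IndexDocs requests (10, 10, and 5 documents), followed by a single Persist action.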

Use Connector Framework Server to Index Documents

The IDOL Connector Framework Server (CFS) processes files of different types and extracts the text into an IDOL document format. You can configure CFS to index into Knowledge Graph by using the IDOLACIIndexer library. This library converts the normal CFS output to JSON format so that Knowledge Graph can index it.

For more information about how to configure CFS and use it to retrieve documents, refer to the Connector Framework Server Administration Guide.

To configure CFS to index into Knowledge Graph, add the following configuration to your CFS configuration file:

[Indexing]
IndexerSections=Graph

[Graph]
IndexerType=Library
LibraryDirectory=IndexerDLLPath
LibraryName=IDOLACIIndexer
ACIIndexHost=GraphServerHost
ACIIndexPort=GraphServerPort

where:

IndexerDLLPath is the path to your version of the IDOLACIIndexer library.
GraphServerHost is the host name or IP address of the machine that your Knowledge Graph is installed on.
GraphServerPort is the ACI Port of your Knowledge Graph.

By default, CFS creates XML documents. The library converts these documents to JSON. In this process, any XML attributes become JSON fields, and the JSON field name is the XML attribute name with the @ symbol as a prefix. For example, the following XML:

<Field1>Value1</Field1>
<Field2 Attribute1="AttValue1">Value2</Field2>

becomes the following in JSON format:

{
   "Field1" : "Value1",
   "Field2" : [
      "Value2",
      {
         "@Attribute1" : "AttValue1"
      }
   ]
}
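
The attribute mapping above can be reproduced with a short Python sketch. It is an illustration of the naming rule only, not the IDOLACIIndexer implementation. The function name fields_to_json and the wrapping doc element are hypothetical, and the sketch assumes that each field name occurs once and has no nested elements.

```python
import xml.etree.ElementTree as ET


def fields_to_json(xml_fragment):
    """Map a flat run of XML fields to a JSON-style dictionary."""
    # Wrap the fragment in a hypothetical root element so that it parses.
    root = ET.fromstring("<doc>{}</doc>".format(xml_fragment))
    result = {}
    for field in root:
        value = field.text or ""
        if field.attrib:
            # Each XML attribute becomes a field with an @ prefix.
            attributes = {"@" + name: attr for name, attr in field.attrib.items()}
            result[field.tag] = [value, attributes]
        else:
            result[field.tag] = value
    return result
```

Passing the two example fields to fields_to_json returns a dictionary in which Field1 maps to its value and Field2 maps to a list that holds the value and the prefixed attribute. You can serialize the result with json.dumps.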