Categorize Documents

Categorization analyzes the concepts in a document and, if those concepts match categories in your Category component, adds category information to the document. Categorizing documents lets you alert users to new content that matches their interests, help them find information through taxonomies, and help them identify similar documents.

To use categorization, you must first create and train categories in your Category component. CFS queries Category by sending the CategorySuggestFromText action for each document, and Category returns information about any matching categories. A document that does not match any category is not categorized. For information about how to create and train categories, refer to the Knowledge Discovery Administration Guide.
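The match-or-skip behavior described above can be pictured with a small, self-contained Lua sketch. It does not call Category. The suggest_categories_stub function, its keyword lookup, the category names, and the categories field are all hypothetical stand-ins, and a plain Lua table stands in for a CFS document. Real categorization matches concepts rather than keywords.

```lua
-- Hypothetical stand-in for the answer from Category: returns a list of
-- matching category names, or an empty list when nothing matches.
-- The keyword lookup is a toy substitute for concept matching.
function suggest_categories_stub(text)
    local trained = { finance = "Finance", football = "Sport" } -- made-up categories
    local matches = {}
    local lowered = text:lower()
    for keyword, category in pairs(trained) do
        if lowered:find(keyword, 1, true) then
            matches[#matches + 1] = category
        end
    end
    table.sort(matches)
    return matches
end

-- A plain table stands in for a CFS document in this sketch.
function categorize_document(document)
    local matches = suggest_categories_stub(document.content)
    if #matches == 0 then
        return false                 -- no match: the document is left unchanged
    end
    document.categories = matches    -- illustrative field name
    return true
end

local doc = { content = "Finance team results" }
print(categorize_document(doc), table.concat(doc.categories or {}, ", "))
```

A document whose content matches none of the made-up categories is returned untouched, which mirrors the "not categorized" case.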

To categorize documents

  1. Stop CFS.
  2. Open the CFS configuration file.
  3. Create an import task to run the CategorySuggestFromText Lua script that is supplied with CFS. For example:

    [ImportTasks]
    Post0=Lua:./scripts/CategorySuggestFromText.lua

  4. Open the CategorySuggestFromText.lua script in a text editor.
  5. Modify the variables in the script so that the script sends actions to your Category component:

    Line       Variable name          Value
    178        idolCategorizeHost     The host name or IP address of your Category component.
    179        idolCategorizePort     The ACI port of your Category component. The port argument
                                      of the send_aci_action function expects a number, so do not
                                      surround the port number with quotation marks.
    184        timeoutMilliseconds    The time, in milliseconds, that CFS waits for a response
                                      from your Category component. If CFS does not receive a
                                      response within this time and all retries are exhausted,
                                      the document is not categorized. You should not need to
                                      modify the default value, which is 60 seconds
                                      (60000 milliseconds).
    185        retries                The number of times that CFS retries a request if the first
                                      attempt is not successful.
    186-192    sslParameters          A table of SSL parameters for connecting to Category. For
                                      more information about the SSL parameters that you can set,
                                      refer to the Connector Framework Server Reference.

    For example:

    local idolCategorizeHost = "10.0.0.1"
    local idolCategorizePort = 9000
    
    ...
    
    local timeoutMilliseconds = 30000
    local retries = 3
    local sslParameters =
    {
        SSLMethod = "SSLV23",
        --SSLCertificate = "host1.crt",
        --SSLPrivateKey = "host1.key",
        --SSLCACertificate = "trusted.crt"
    }

  6. Save and close the script.
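
As a rough illustration of how the settings from step 5 are used, the sketch below passes them to send_aci_action, the function named in the description of idolCategorizePort. The argument order, the QueryText parameter name, the categorize helper, and the stub that stands in for send_aci_action outside CFS are assumptions made for illustration. Check the supplied script and the Connector Framework Server Reference for the actual call.

```lua
-- Settings as they might look after step 5 (example values only).
local idolCategorizeHost = "10.0.0.1"
local idolCategorizePort = 9000          -- a number, not "9000"
local timeoutMilliseconds = 30000
local retries = 3
local sslParameters = { SSLMethod = "SSLV23" }

-- Stand-in so that the sketch runs outside CFS. Inside CFS the real
-- function is already provided, so this stub is skipped.
-- The argument order is an assumption.
if not send_aci_action then
    function send_aci_action(host, port, action, parameters, timeout, retryCount, ssl)
        assert(type(port) == "number", "port must be a number, not a string")
        return string.format("<stub action='%s' host='%s' port='%d'/>", action, host, port)
    end
end

-- Hypothetical helper: asks Category to suggest categories for some text.
function categorize(text)
    local parameters = { QueryText = text }   -- parameter name is an assumption
    return send_aci_action(idolCategorizeHost, idolCategorizePort,
        "CategorySuggestFromText", parameters,
        timeoutMilliseconds, retries, sslParameters)
end

print(categorize("quarterly sales report"))
```

The type check in the stub shows why the port must not be quoted: a string such as "9000" fails the check, whereas the number 9000 passes.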