Configure the Passage Extractor LLM System

The Answer Server configuration file contains information about the subcomponents in your Passage Extractor LLM systems.

For any Passage Extractor LLM system, you must configure the host and port details of your data store, which is an IDOL Content component that contains the documents that Answer Server uses to find answers. You must also configure the LLM module to use to generate or extract answers.

The Passage Extractor LLM system also uses question classifiers to determine the type of a question. The classifier is required. The Answer Server installation includes classifiers for some languages, but for others you must train a classifier yourself.

The following procedure describes how to configure the Passage Extractor LLM system in Answer Server.

For more details about the configuration parameters for the Passage Extractor LLM system, see Passage Extractor LLM System Configuration Parameters.

To configure the Passage Extractor LLM System

  1. Open the Answer Server configuration file in a text editor.

  2. Find the [Systems] section, or create one if it does not exist. This section contains a list of systems, which refer to the associated configuration sections for each system.

  3. After any existing systems, add an entry for your new Passage Extractor system. For example:

    [Systems]
    0=MyAnswerBank
    1=MyFactBank
    2=MyPassageExtractor
    3=MyPassageExtractorLLM
  4. Create a configuration section for your Passage Extractor LLM system, with the name that you specified. For example, [MyPassageExtractorLLM].

  5. Set Type to PassageExtractorLLM.

  6. Set IDOLHost and IDOLACIPort to the host name and ACI Port of the IDOL Content component that contains the documents that you want to use to find answers.

    NOTE: If you want to use synonyms to expand queries, set these parameters to the host and port of the Query Manipulation Server (QMS) that provides access to your synonyms. Set the host and port of the Content component in the QMS configuration file instead. For more information about how to enable synonyms, see Use Synonyms to Expand Queries.

  7. Set ModuleID to the name of the configuration section that provides details of the LLM module to use. For information about how to configure this, see Configure the LLM Module.

  8. Set the ClassifierFile parameter to the path of the question classifier file, and set LabelFile to the path of the label file.

    TIP: The Answer Server installation includes classifier and label files for English and German. For example, to use the default files for the English language, set ClassifierFile to the location of the svm_en.dat file, and set LabelFile to the location of the labels_en.dat file.

    If you want to train your own classifier or are configuring a Passage Extractor system for use with another language, set the ClassifierFile and LabelFile parameters to the locations where you want Answer Server to save the question classifier and label files, when you perform training. For information about how to train classifiers, see Train Passage Extractor Classifiers.

  9. Save and close the configuration file.

  10. Restart Answer Server for your changes to take effect.

For example:

[MyPassageExtractorLLM]
Type=PassageExtractorLLM
// Data store IDOL
IDOLHost=localhost
IDOLACIPort=6002
// Classifier Files
ClassifierFile=./passageextractor/classifiertraining/svm_en.dat
LabelFile=./passageextractor/classifiertraining/labels_en.dat

// Module to use
ModuleID=LLMExtractiveQuestionAnswering-Small

[LLMExtractiveQuestionAnswering-Small]
Type=ExtractiveQuestionAnsweringLLM
ModelPath=LLMFiles/model.pt
TokenizerPath=LLMFiles/tokenizer.spiece.model

Change the Passage Extractor Language

The default installation of passage extractor includes example question classifier training files for English. To use passage extractor in another language, you must:

  • create a new question classifier in the new language. See Train Passage Extractor Classifiers.
  • set the Language configuration parameter to the appropriate language, either in the [Server] section (to set the language for all of Answer Server), or in the passage extractor LLM system configuration section (to set the language for just passage extractor). You might also want to set the StopList parameter. See Language Configuration.
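For example, a hypothetical system configuration for French might look like the following. The file names and the StopList value here are placeholders: you create the classifier and label files yourself when you train the classifier, and you provide your own stop list.

  [MyPassageExtractorLLM]
  Type=PassageExtractorLLM
  IDOLHost=localhost
  IDOLACIPort=6002
  Language=french
  StopList=french.dat
  ClassifierFile=./passageextractor/classifiertraining/svm_fr.dat
  LabelFile=./passageextractor/classifiertraining/labels_fr.dat
  ModuleID=LLMExtractiveQuestionAnswering-Small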

Configure the LLM Module

For a passage extractor LLM system, you must have an LLM module that defines the locations of the model files that Answer Server must use. You can use the following types of models: 

  • Extractive or Generative LLM. These options use model files that you download and convert into a format that Answer Server can use directly. See Create the Model Files.

  • Extractive or Generative Lua Script. These options use a Lua script to produce answers. For example, you might want to use your Lua script to generate answers using an HTTP endpoint. For these modules, Answer Server calls the Lua script and processes the response to use as the answers to a question. See Create a Question Answering Lua Script.

The extractive question answering types find answers in your documents and return the exact text as it occurs in the original documents. The generative models can generate new text to answer the question, based on the information that they find in your documents.

Create the Model Files

You can create the required model files by using the export_transformers_model.py script. The script downloads an LLM from Hugging Face and converts it into a format that Answer Server can use to extract or generate answers.

The export_transformers_model.py script is installed in the tools directory of your Answer Server installation. This directory also includes a requirements.txt file that allows you to install the necessary dependencies for the script.

To create your model

  1. Install the requirements for the export_transformers_model.py script by using pip with the requirements.txt file. For example:

    pip install -r requirements.txt
  2. Run the export_transformers_model.py script with the following arguments:

    model

    The model to download from Hugging Face.

    model-type

    The type of model to create. For extractive question answering, set this argument to extractive. For generative question answering, set this argument to generative.

    You can also optionally set the following arguments:

    output

    The file name to use for the generated model file. The default value is model.pt.

    output-spiece

    The file name to use for the sentencepiece tokenizer file. The default value is spiece.model.

    cache

    The location of the cache for model downloads. The default value is .cache.

    When the script finishes, it outputs the name and location of the model and tokenizer files that it creates. You use these values in your LLM Module configuration (see ModelPath and TokenizerPath).
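For example, the following commands sketch the creation of an extractive model. The model name deepset/roberta-base-squad2 is a hypothetical example only, and the exact argument syntax (shown here as double-dash options) might differ in your version of the script.

    pip install -r requirements.txt
    python export_transformers_model.py --model deepset/roberta-base-squad2 --model-type extractive --output extractivemodel.pt --output-spiece extractivetokenizer.spiece.model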

Create a Question Answering Lua Script

You can configure a Lua script to perform generative or extractive question answering. For example, you might want to use your Lua script to generate answers using an HTTP endpoint.

For extractive question answering, your script must define a function called perform_extractive_qa, which must accept two parameters:

  • the first parameter is a string that represents the question.

  • the second parameter is a string that defines the context from which the answer must be extracted.

The function must return a string that represents the answer to the question that has been extracted from the context. It can optionally also return a float between 0 and 1 corresponding to a score for the answer.

For example:

function perform_extractive_qa(question, context)
   return context, 0.95
end
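A slightly fuller sketch follows. The matching logic here is purely illustrative (it naively returns the first sentence of the context with a fixed score), not a real question answering algorithm; in practice your script would call out to an external service or model.

function perform_extractive_qa(question, context)
   -- Naive placeholder: return the first sentence of the context
   -- as the "answer", with a fixed low confidence score.
   local answer = string.match(context, "^[^%.]*%.?") or context
   return answer, 0.5
end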

For generative question answering, your script must define a function called perform_generative_qa, which must accept two parameters:

  • the first parameter is a string that represents the question.

  • the second parameter is a string that defines the context from which the answer must be extracted.

The function must return a string that represents the answer to the question, based on the provided context.

For example:

function perform_generative_qa(question, context)
   return string.format("Here is your answer: %s", string.sub(context, 1, 100))
end

Configure the LLM Module

After you create your model files, or your Lua script, you must configure an LLM module section in your configuration file, which you refer to in your Passage Extractor LLM system configuration.

To configure the LLM Module

  1. Open the Answer Server configuration file in a text editor.

  2. Create a new configuration section for your LLM module. This is the configuration section name that you use in the ModuleID parameter when you configure the passage extractor LLM system. For example, [LLMExtractiveQA].

  3. In the new configuration section, set the Type parameter to one of the following values:

    • ExtractiveQuestionAnsweringLLM. Use an extractive LLM to extract answers from IDOL documents.

    • GenerativeLLM. Use a generative LLM to generate answers based on information in your IDOL documents.

    • ExtractiveQuestionAnsweringLua. Use a Lua script to extract answers from IDOL documents.

    • GenerativeLua. Use a Lua script to generate answers based on information in your IDOL documents.

  4. Set the required parameters for your module type:

    • For a module that uses an LLM (that is, when Type is set to ExtractiveQuestionAnsweringLLM or GenerativeLLM), set ModelPath and TokenizerPath to the path and file name for the model and tokenizer files that you created (see Create the Model Files).

    • For a module that uses a Lua script (that is, when Type is set to ExtractiveQuestionAnsweringLua or GenerativeLua), set Script to the path and file name for your Lua script.

    For example:

    [LLMExtractiveQuestionAnswering-Small]
    Type=ExtractiveQuestionAnsweringLLM
    ModelPath=modelfiles/extractivemodel.pt
    TokenizerPath=modelfiles/extractivetokenizer.spiece.model
    
    [LLMGenerativeQuestionAnswering-Large]
    Type=GenerativeLLM
    ModelPath=modelfiles/generativemodel.pt
    TokenizerPath=modelfiles/generativetokenizer.spiece.model
    
    [ExtractiveQuestionAnsweringLuaScript]
    Type=ExtractiveQuestionAnsweringLua
    Script=LLMscripts/extractive_script.lua
    
    [GenerativeQuestionAnsweringLuaScript]
    Type=GenerativeLua
    Script=LLMscripts/generative_script.lua
  5. Save and close the configuration file.