Configure Candidate Retrieval for Passage Extractor

When you send an Ask action to your Passage Extractor or Passage Extractor LLM system, Answer Server performs the following steps: 

  1. Question Classification. Answer Server analyzes the input question to classify its key elements.

  2. Candidate Retrieval. Answer Server uses the information from the classified question to send a query to the IDOL Content component to retrieve documents that might contain an answer.

  3. Answer Generation. Answer Server finds the likely answer from the candidate documents, and returns it.
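For example, you might send a question to Answer Server with an Ask action like the following. The host, port, and system name here are illustrative:

http://localhost:12000/action=Ask&Text=Which+team+won+the+league&SystemNames=MyPassageExtractor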

For both the candidate retrieval step and the answer generation step, you can choose whether to use the standard IDOL conceptual approach or a third-party LLM approach.

To use an LLM approach for the answer generation step, you configure a Passage Extractor LLM system (see Configure the Passage Extractor LLM System). To use the IDOL conceptual approach, you configure a standard Passage Extractor system (see Configure the Passage Extractor System).

To configure candidate retrieval, you use different settings in your system configuration, depending on the approach that you want to use.

IDOL Conceptual Candidate Retrieval

The default method of candidate retrieval uses IDOL conceptual search to retrieve answers from your IDOL Content component index. You must use this option when you do not have vectors indexed in your IDOL Content component.

Conceptual candidate retrieval does not require any additional configuration, but you can add options to refine your results. For example, you can set ACIMaxResults to limit the maximum number of results to retrieve, and CandidateRetrievalDefaults to the name of a configuration section that defines other parameters to use in the IDOL Content component query.
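For example, the following settings limit candidate retrieval to 50 results and apply additional query defaults. The section names are arbitrary, and the DatabaseMatch restriction is an illustration that assumes your IDOL Content component index has a database named News:

[MyPassageExtractor]
Type=PassageExtractor
...
ACIMaxResults=50
CandidateRetrievalDefaults=RetrievalDefaults

[RetrievalDefaults]
DatabaseMatch=News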

IDOL Vector Search Candidate Retrieval

You can use IDOL vector search when your IDOL Content component contains vectors in VectorType fields. For more information about the IDOL Content component vector search, refer to the IDOL Content component Help.

To configure Answer Server to send vector searches to IDOL, you must do some additional configuration in your Passage Extractor system configuration:

  • Set AnswerCandidateEmbeddingsSettings to enable vector search. This parameter points to a configuration section where you define additional settings for the vector search.

  • Create a vector settings configuration section. This section must set VectorField to define the VectorType field to query in the IDOL Content component index, and must specify the model to use to generate embeddings for the query text that Answer Server creates from your questions.

  • Create an embedding generation configuration section. This section provides details of the embedding generation model that you want to use. You can create the model files by using the export_transformers_model.py script, which is included in your Answer Server installation (see Create Embedding Model Files), or you can use a Lua script for embedding generation (see Create an Embedding Generation Lua Script).
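These settings fit together as shown in the following skeleton. The section names are examples, and a complete configuration appears at the end of this section:

[MyPassageExtractor]
Type=PassageExtractor
...
AnswerCandidateEmbeddingsSettings=VectorSettings

[VectorSettings]
EmbeddingsConfig=EmbeddingsGenerator
VectorField=VECTORA

[EmbeddingsGenerator]
Type=Transformer
...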

NOTE: When you use vector search, you can still use the other candidate retrieval configuration parameters. For example, you can set ACIMaxResults to limit the maximum number of results to retrieve, and CandidateRetrievalDefaults to a configuration section that defines other parameters to use in the IDOL Content component query.

Create Embedding Model Files

You can create the model files required for embedding generation by using the export_transformers_model.py script. The script downloads an LLM from Hugging Face and converts it into a format that Answer Server can use to generate the embeddings.

You can choose any sentence-transformer sentencepiece model to generate embeddings. The model you use must produce the same kind of embeddings as the model that you use to create vectors in your IDOL Content component.

TIP: To find models appropriate for embedding generation, you can use the Hugging Face tags for sentence similarity. For example: https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending

The export_transformers_model.py script is installed in the tools directory of your Answer Server installation. This directory also includes a requirements.txt file to allow you to install the necessary dependencies for the script.

To create your model

  1. Install the requirements for the export_transformers_model.py script by using pip with the requirements.txt file. For example:

    pip install -r requirements.txt
  2. Run the export_transformers_model.py script with the following arguments:

    model

    The model to download from Hugging Face.

    model-type

    The type of model to create. For embedding generation, set this parameter to sentence-transformer.

    You can also optionally set the following arguments:

    output

    The file name to use for the generated model file. The default value is model.pt.

    output-spiece

    The file name to use for the sentencepiece tokenizer file. The default value is spiece.model.

    cache

    The location of the cache for model downloads. The default value is .cache.
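    For example, the following invocation is a sketch only. The exact argument syntax might differ in your version of the script, and the model name is just an illustration of a sentencepiece-based sentence-transformer model:

    python export_transformers_model.py --model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --model-type sentence-transformer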

    When the script finishes, it outputs the name and location of the model and tokenizer files that it creates. You use these values in your Embeddings configuration (see ModelPath and TokenizerPath).

Create an Embedding Generation Lua Script

You can create your own Lua script to generate the embedding data, by using any method that you choose. For example, you can use the Lua script to access a third-party embedding generation API, such as the Hugging Face Inference API or the Inference Endpoints service.

The script must define a function called generateembeddings. This function must accept a single parameter, which is a string representing the text to generate embeddings for.

The function must return a table of tables, where each inner table contains floating-point numbers. Each inner table corresponds to one embedding vector for the text.

The function can optionally return a second table of tables, which contains the offsets for the embeddings, in the following format:

{{embedding_1_start_offset, embedding_1_end_offset}, {embedding_2_start_offset, embedding_2_end_offset}, {embedding_3_start_offset, embedding_3_end_offset}, ...}

NOTE: If you want to use offset information in IDOL (for indexing or querying), you must use UTF-8 byte offsets.

For example:

function generateembeddings(text)
   return {{1,2,3,4}, {5,6,7,8}, {9,10,11,12}}, {{34,48}, {48, 124}, {124, 156}}
end

In this example, the function returns three embeddings in the first table (which contains three nested tables). The second table contains the offset values. The vector {1,2,3,4} starts at offset 34 and ends at 48. The second vector {5,6,7,8} starts at 48 and ends at 124, and so on.
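The following script is a more complete, self-contained sketch. It returns one placeholder vector for each line of the input text, and computes UTF-8 byte offsets (assumed here to be zero-based with an exclusive end, which matches the adjacent ranges in the example above). A real script would replace the placeholder vectors with output from your embedding model or from a remote embedding API:

function generateembeddings(text)
   local embeddings = {}
   local offsets = {}
   local pos = 1
   for line in string.gmatch(text, "[^\n]+") do
      -- string.find with plain=true returns 1-based byte indexes;
      -- Lua strings are byte strings, so these are UTF-8 byte offsets
      local first, last = string.find(text, line, pos, true)
      embeddings[#embeddings + 1] = {0.0, 0.0, 0.0, 0.0}  -- placeholder vector
      offsets[#offsets + 1] = {first - 1, last}
      pos = last + 1
   end
   return embeddings, offsets
end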

Configure Answer Server for Embedding Generation and Search

You configure embedding generation in a configuration section of the Answer Server configuration file; you can use any suitable name for this section. You then refer to this section from your Passage Extractor system.

To configure Answer Server to generate embeddings and use vector search

  1. Open your configuration file in a text editor.

  2. Create a new configuration section for your embedding generation model, for example:

    [EmbeddingsGenerator]
  3. Set the Type parameter to the type of model you want to use. For example:

    [EmbeddingsGenerator]
    Type=Transformer
  4. Set additional parameters for your model. The required parameters depend on the type of model. For example, a Transformer model requires ModelPath and TokenizerPath to specify the model and tokenizer files that you created:
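    [EmbeddingsGenerator]
    Type=Transformer
    ModelPath=modelfiles/model.pt
    TokenizerPath=modelfiles/tokenizer.spiece.model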

  5. Find the Passage Extractor or Passage Extractor LLM system that you want to configure to use vector search.

  6. Set the AnswerCandidateEmbeddingsSettings configuration parameter to the name of a configuration section where you define vector search settings. For example:

    AnswerCandidateEmbeddingsSettings=VectorSettings
  7. Create a new configuration section with the name you set.

  8. In this new section, set EmbeddingsConfig to the name of your embeddings generator configuration section. Set VectorField to the name of the IDOL Content component VectorType field that contains the vector values that you want to search. For example:

    [VectorSettings]
    EmbeddingsConfig=EmbeddingsGenerator
    VectorField=VECTORA
  9. Save and close the configuration file.

For example:

[PassageExtractorSystem]
IDOLHost=localhost
IDOLACIPort=6002
Type=PassageExtractor
...
AnswerCandidateEmbeddingsSettings=VectorSettings

[VectorSettings]
EmbeddingsConfig=EmbeddingsGenerator
VectorField=VECTORA

[EmbeddingsGenerator]
Type=Transformer
ModelPath=modelfiles/model.pt
TokenizerPath=modelfiles/tokenizer.spiece.model
ModelMaxSequenceLength=128