Content Component

Content Component is an ACI server. For details of changes that affect all ACI servers, see ACI Server Framework.

24.2.0

New Features

  • The prediction of total results for the GetQueryTagValues action has been optimized to improve performance.

  • Vector queries on sectioned documents have been improved. Previously, the VECTOR operator could only return the first section of a document. Now, when you have source metadata for your vectors, and you have generated embeddings from the text in a section-broken field, the query returns the document ID of the relevant section.

  • The BIASVAL FieldText operator now accepts an optional third argument, fuzzy distance. A scaled boost is applied if the field value for the hit document is within this fuzzy distance of the query value. An exact match earns the full boost, as before.

  • The performance of vector index compaction during a DRECOMPACT operation has been improved.

  • The processing of the AUTONOMY_SECURITY_V4_TRIM_CONTEXT_EXT_MAPPED security type has been optimized.

    NOTE: You must also update the shared security library (mapped_security.dll or mapped_security.so) to see these improvements.

  • The third-party LZ4 library used for NodetableCompression has been updated to version 1.9.4.

  • The language files now include the makeda utility (in the langfiles/jpn-cha directory) to allow you to create custom dictionary data files for the ChaSen Japanese sentence-breaking library.

  • The profile of relevancy scores assigned to VECTOR matches has been improved to better reflect the quality of results.
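
The three-argument form of BIASVAL described above can be sketched with a small helper that builds the FieldText string. This is only an illustration: the function name biasval is hypothetical, the field and values are made up, and the helper does not escape special characters in values. How Content measures the fuzzy distance and scales the boost is not described here, so the helper only shows where the optional argument goes.

```python
from typing import Optional, Union


def biasval(field: str, value: Union[str, int, float], boost: Union[int, float],
            fuzzy_distance: Optional[Union[int, float]] = None) -> str:
    """Build a BIASVAL FieldText expression (hypothetical helper).

    Arguments are emitted in the order value, boost, and then the optional
    fuzzy distance as the third argument.
    """
    args = [str(value), str(boost)]
    if fuzzy_distance is not None:
        # Optional third argument: the fuzzy distance.
        args.append(str(fuzzy_distance))
    return "BIASVAL{%s}:%s" % (",".join(args), field)


# Two-argument form, and the three-argument form with a fuzzy distance.
exact_only = biasval("COLOUR", "red", 10)
with_fuzzy = biasval("COLOUR", "red", 10, 2)
```

Omitting the third argument yields the same two-argument expression that earlier versions accepted.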

Resolved Issues

  • In some circumstances, the Content index port could become unresponsive. This issue affected versions 23.4 and 24.1.0 on Windows platforms.

  • When a large number (typically, thousands) of vector field instances with identical values were indexed, some expected results were missed from a vector search.

  • The VECTOR operator did not function correctly if Ngram was configured (and not restricted to multibyte characters).

  • Intermittently, an index command could briefly be reported as having status -37 (Failed), before going on to report that it had completed successfully.

  • If CollectFieldStatistics was enabled, but the source document for a Suggest action had no suitable metadata fields to use for structured suggestions, the action could fail with an error "The supplied fieldtext was invalid".

  • When using a vector index with SearchUncommittedDocuments set to True, processing large numbers of small index jobs could become slower than necessary.

  • In some cases, GetQueryTagValues counted documents with values outside of a specified range. This usually happened when a document with a value outside the range was counted when a document with a higher doc ID had a value within the range.

24.1.0

New in this Release

  • You can include additional metadata in VectorType fields. For example:

    #DREFIELD MYVEC="0.1,0.2,0.3;source=DRECONTENT[0:55]"

    This example includes the source field and offset information for the embedding. The NiFi document embeddings generator automatically creates this metadata.

  • You can now include any available vector metadata in your Query or Suggest action responses by setting the new VectorMetadata parameter to True. This parameter has an effect for Query actions that use the VECTOR operator, and for Suggest actions with the UseVectors parameter set to True.

  • You can now highlight text that matches in a VECTOR query, as long as you print the vector source field in the results. Content highlights the text that was used to generate the vector that matches the query.

    Vector highlighting works for all highlight options (for example, sentence highlighting highlights all sentences that overlap with the vector source text). For terms highlighting, Content highlights only entire words, even if the vector offset is in the middle of a word. Proximity highlighting is currently the same as terms highlighting, as there is no defined proximity between vector and non-vector fields.

  • You can now request a vector summary for query hits, by using the new vector option in the Summary parameter. For example:

    action=query&text=VECTOR{0.1,0.2,0.3}:VECTORA&maxresults=10&summary=vector

    The summary consists of the text corresponding to the vectors that match in the document, concatenated with an ellipsis (...). The matching text is term-aligned, so that if the vector includes a partial term in the source field, the summary includes the whole term.

    If a document returns without matching a query vector, IDOL generates a quick summary.
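
As a rough sketch of the vector features above, the following Python splits a VectorType field value into its vector and metadata, and builds a query string that requests a vector summary and vector metadata. The function names are hypothetical, and the assumption that metadata follows the vector as semicolon-separated key=value pairs is inferred only from the single source= example shown above. The offsets inside the brackets are left uninterpreted.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def parse_vector_field(value: str) -> Tuple[List[float], Dict[str, str]]:
    """Split '0.1,0.2;key=value' into a vector and a metadata dictionary.

    Assumption: metadata entries are separated from the vector (and from
    each other) by semicolons, as in the source= example.
    """
    vector_part, *meta_parts = value.split(";")
    vector = [float(x) for x in vector_part.split(",")]
    metadata: Dict[str, str] = {}
    for part in meta_parts:
        key, _, val = part.partition("=")
        metadata[key.strip()] = val.strip()
    return vector, metadata


def vector_summary_query(text: str, max_results: int = 10) -> str:
    """Build query parameters with Summary=vector and VectorMetadata=True."""
    params = {
        "action": "query",
        "text": text,
        "maxresults": max_results,
        "summary": "vector",
        "vectormetadata": "True",
    }
    # Keep braces, commas, and colons readable, as in the examples above.
    return urlencode(params, safe="{},:")


vector, metadata = parse_vector_field("0.1,0.2,0.3;source=DRECONTENT[0:55]")
query = vector_summary_query("VECTOR{0.1,0.2,0.3}:VECTORA")
```

The query string would be appended to the address of your Content ACI port; the host and port are deliberately left out because they depend on your deployment.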

Resolved Issues

  • There was an issue with the security type AUTONOMY_SECURITY_V4_TRIM_CONTEXT_EXT_MAPPED (for Content Manager). In some circumstances, users who were not permitted to bypass exclusions could be incorrectly denied access to documents.

  • When using the SYNONYM operator in a query, relevancy scores did not always reflect the total number of occurrences of synonym terms in the matched documents.

  • The VECTOR operator did not return results from the GetQueryTagValues action unless maxneighbors was set in the VECTOR operator, or the MaxValues parameter was set in the action.

23.4.0

New in this Release

  • Content now allows you to collect basic statistics about the distribution and data content of your document fields, such as:

    • The total number of individual occurrences of each field, and the number of distinct documents each appears in.

    • How many distinct values are observed for each field.

    • Occurrence counts for values that might be parsed as numeric, date, or geographic values.

    • Distribution (minimum, maximum, and mean) of numeric and date values, and value lengths, for each field.

    You can configure statistics collection by enabling the new CollectFieldStatistics configuration parameter in the [Server] section.

    For an existing index, you can also use the new RegenerateFieldStatsIndex configuration parameter to generate field statistics at startup.

    When you enable field statistics:

    • You can retrieve statistics for each field by using the GetTagNames action with the new FieldStats parameter set to True.

    • The Suggest action uses structural information from the source documents as well as the unstructured information to find relevant documents.

    • The TermGetBest action can return information about the occurrences of structured field and value pairs, when you set the new FieldStats parameter to True.

    • The GetQueryTagValues action can return total occurrences for non-parametric fields (that is, when you set both AllowNonParametricFields and DocumentCount to True).

  • You can now index vector values into Content, and use these for queries. A vector in a document is a comma-separated list of floating point values. You can generate vectors by using many different models. Content can then use these vectors to find documents that are similar to a vector value that you use in the new VECTOR operator in a Query action, or to perform Suggest queries.

    To configure Content to process and use vector values, you must use the new VectorType field property for the field that contains the vector values. You can update an existing index to use these vector values by setting the RegenerateVectorIndex configuration parameter, or by using the DREREGENERATE index action.

    You can configure the method to use to determine how close vectors are to one another by setting the DistanceMetric parameter in the [VectorIndex] configuration section. You can also change the directory that Content uses to store the vector index files by setting the VectorPath parameter in the [Paths] section.

    For more information, refer to the IDOL Content Component Help.

  • The spellcheck phase of a query now respects timeouts.

  • Indexing performance has been improved when sending documents to Content in small batches.

  • The mapped security library has been updated. The security type AUTONOMY_SECURITY_V4_TRIM_CONTEXT_EXT_MAPPED (for Content Manager) now supports exclusions.

  • Performance has been improved for cases where several index actions were issued sequentially with a pause to wait for each to complete before sending the next (for example, from a script or application that polls for a finished status between running each action).
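
Because a vector is a comma-separated list of floating point values, turning an embedding into an indexable field value or a VECTOR query is mostly string formatting. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the field name MYVEC is illustrative, and the #DREFIELD line follows the form used elsewhere in these notes. Whether Content accepts every float notation that Python can produce (for example, exponent notation for very small values) is not confirmed here.

```python
from typing import Sequence


def format_vector(values: Sequence[float]) -> str:
    """Render an embedding as a comma-separated list of floating point values."""
    # repr() keeps full precision; very small or very large values may be
    # rendered in exponent notation, which is an untested assumption here.
    return ",".join(repr(float(v)) for v in values)


def dre_field_line(field: str, values: Sequence[float]) -> str:
    """Build a #DREFIELD line holding the vector (hypothetical helper)."""
    return '#DREFIELD %s="%s"' % (field, format_vector(values))


def vector_query_text(field: str, values: Sequence[float]) -> str:
    """Build the Text value for a Query action that uses the VECTOR operator."""
    return "VECTOR{%s}:%s" % (format_vector(values), field)


embedding = [0.1, 0.2, 0.3]
index_line = dre_field_line("MYVEC", embedding)
query_text = vector_query_text("MYVEC", embedding)
```

The field named in the query must be one that you have configured with the VectorType property.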

Resolved Issues

  • When requesting value details from a numeric field (with the GetQueryTagValues action and ValueDetails set to True), results were sometimes missing from multi-section documents.

  • When Content was archiving index actions, and the index log stream was configured to report messages at Full log level, sending an index action with the NoArchive parameter set to True could cause an unexpected interruption of service.

  • Geospatial queries could time out when the XMLFullStructure configuration parameter was set to True and there were a large number of geospatial fields (more than approximately 10000).

23.3.0

New in this Release

  • The handling of reasons has been improved to merge overlapping reasons. For example, the query text "James Watt" DNEAR Jr previously gave the reasons James Watt and Watt Jr. It now returns the single reason James Watt Jr.

  • The efficiency of suggesting spelling corrections has been improved. This change gives particular improvements when UnstemmedMinDocOccs is configured to a value less than the current SpellCheckCorrectMinDocOccs setting.

  • Several updates and improvements have been made to the BIAS FieldText operators:

    • The new BIASRANGE operator has been added. This operator allows you to bias the score of results that fall within a particular date range. It also allows you to reduce the score bias for values within a specified range outside this optimum range. For example:

      BIASRANGE{21/08/2011,25/08/2011,172800,86400,10}:DATE

      This example boosts the score by 10% for documents with a DATE field value in the range 21/08/2011 to 25/08/2011 (inclusive). It gives a smaller boost (on a linear scale) for documents within 172800s (two days) before 21/08/2011, and 86400s (one day) after.

    • The new BIASNRANGE operator has been added. This operator allows you to bias the score of results that contain a value within a specified range in a specified field, and to reduce the score bias linearly for values within a specified range outside this optimum range. For example:

      FieldText=BIASNRANGE{100,150,20,40,10}:*/PRICE

      A document whose PRICE field value is between 100 and 150 has its weight increased by 10%. This boost decreases linearly to 0% at 80 and lower, and 190 and higher.

    • The BIASVAL operator now supports an empty value for its first argument. For example, BIASVAL{,10}:COLOUR applies a score boost to any result document that does not have a COLOUR field, or has a COLOUR field with an empty value.

      NOTE: BIASVAL still requires two arguments, so BIASVAL{10}:COLOUR is not valid syntax.

    • You can now use all BIAS field specifiers in FieldTextField fields for use with AgentBoolean queries (that is, BIAS, BIASDATE, BIASDISTCARTESIAN, BIASDISTSPHERICAL, BIASVAL, BIASRANGE, and BIASNRANGE are now supported for AgentBoolean queries).

  • You can now use an open-ended range in the NRANGE field operator by setting one of the values to a period (.). For example, NRANGE{.,5}:NUM means that the NUM field must contain a value of 5 or less.

  • The GetQueryTagValues value response when DocumentCount is set to True now includes the total number of occurrences for each value in the server.
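
The linear falloff described for BIASNRANGE can be modeled in a few lines of Python. This is a sketch of the arithmetic in the PRICE example only, not Content's scoring implementation; the function name and parameter names are hypothetical, and the argument order (lower bound, upper bound, width below, width above, boost) is read from that example.

```python
def biasnrange_boost(value: float, low: float, high: float,
                     below: float, above: float, boost: float) -> float:
    """Return the percentage boost for a field value (illustrative model).

    Values inside [low, high] earn the full boost. Outside the range the
    boost falls linearly to zero over `below` units under `low` and over
    `above` units beyond `high`.
    """
    if low <= value <= high:
        return float(boost)
    if value < low:
        distance, width = low - value, below
    else:
        distance, width = value - high, above
    if width <= 0 or distance >= width:
        return 0.0
    return boost * (1.0 - distance / width)


# BIASNRANGE{100,150,20,40,10}:*/PRICE, evaluated at a few PRICE values.
for price in (80, 90, 100, 125, 150, 170, 190):
    print(price, biasnrange_boost(price, 100, 150, 20, 40, 10))
```

In this model a PRICE of 90 sits halfway between 80 and 100, and a PRICE of 170 sits halfway between 150 and 190, so both receive half of the 10% boost.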

Resolved Issues

  • When used in conjunction with the WHEN operator in XML full-structure mode, the TERM and TERMEXACT FieldText specifiers failed to return some documents that should have matched.

  • The indexer thread could be blocked for an extended time when attempting to delete a file, if the target had been removed in the meantime by an external process.

  • When rebuilding the unstemmed index with RegenerateUnstemmedIndex, numeric/alphanumeric terms were sometimes excluded, regardless of the configured IndexNumbers value.

  • The Content component NiFi processor, ContentServiceImpl, was unable to obtain a license correctly.

23.2.0

New in this Release

  • Loading has been optimized for ACLType fields that have also been configured as MemcachedType (see NodetableCacheFields).

    NOTE: This change is only relevant to security models where the DLL load is required for evaluation.

  • The QueryCacheMaxMemKB configuration parameter has been added to the [Security] section. Set this parameter to a value in KB to enable a per-query cache that speeds up security checks for cases where there are many non-unique ACLs in the system (for example, where security is inherited from a top level folder). If the same ACL has already been evaluated during the query, Content does not need to call the security DLL again. You can set QueryCacheMaxMemKB to -1 for an unlimited cache size, or 0 to disable the cache.

    NOTE: This change is only relevant to security models where the DLL load is required for evaluation (for example, there is no need to use this parameter with NT_V4 security).
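
The idea behind the per-query cache can be illustrated with a small memoizing wrapper. This is a conceptual sketch only: the class name, the way memory use is estimated (the byte length of the ACL string), and the policy of not caching new entries once the limit is reached are all assumptions, not a description of how Content implements QueryCacheMaxMemKB.

```python
from typing import Callable, Dict, List, Optional


class PerQueryAclCache:
    """Remember ACL check results for the lifetime of one query (sketch)."""

    def __init__(self, max_mem_kb: int, check_acl: Callable[[str], bool]) -> None:
        # Mirrors the documented values: -1 is unlimited, 0 disables the cache.
        self.enabled = max_mem_kb != 0
        self.max_bytes: Optional[int] = None if max_mem_kb == -1 else max_mem_kb * 1024
        self.check_acl = check_acl
        self.results: Dict[str, bool] = {}
        self.used_bytes = 0

    def is_permitted(self, acl: str) -> bool:
        if acl in self.results:
            # Same ACL already evaluated during this query: skip the check.
            return self.results[acl]
        permitted = self.check_acl(acl)
        cost = len(acl.encode("utf-8"))  # assumed size estimate
        has_room = self.max_bytes is None or self.used_bytes + cost <= self.max_bytes
        if self.enabled and has_room:
            self.results[acl] = permitted
            self.used_bytes += cost
        return permitted


calls: List[str] = []


def slow_check(acl: str) -> bool:
    """Stand-in for an expensive call into a security library."""
    calls.append(acl)
    return "alice" in acl.split(",")


cache = PerQueryAclCache(64, slow_check)
decisions = [cache.is_permitted(acl) for acl in ["alice,bob"] * 3 + ["carol"]]
```

With many documents that inherit the same ACL, the expensive check runs once per distinct ACL rather than once per document.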

Resolved Issues

  • In some cases, Content failed to return hits for terms that existed only in the index cache and not in any indexed documents when SearchUncommittedDocuments was set to True.

  • Content could spuriously log an error "Dynterm list is NULL for term". This error tended to happen for terms with a large number (millions) of occurrences, in servers where documents were regularly deleted and the index compacted.

  • When the Active Directory contained a group name that ends with a space character, the Content security index could become invalid after the component was restarted.

  • When the saved best terms cache file was invalid, the Content application could shut down during a DRECOMPACT operation. Content now automatically rebuilds the cache if it cannot load the saved file.