Source Code Identification

When KeyView auto-detects a file that contains source code, it can attempt to identify the programming language that it is written in.

When you do not enable source code identification, files containing source code may be identified as ASCII text files, causing the application to treat them in the same way as ordinary text. However, in many instances, it can be useful to route these files elsewhere or filter them out. For example, indexing source code into an IDOL index has minimal value and could bloat the engine with terms that are of no use in retrieval. You can use source code identification to identify files containing a particular programming language as a more specific format.

You can set source code identification to different levels.

Option Description
KVSOURCECODE_OFF Do not enable source code identification.
KVSOURCECODE_ENABLED Enable source code identification for the most common source code formats. This option can detect KeyView formats 498-545, which would otherwise be detected as ASCII_Text_Fmt.
KVSOURCECODE_EXTENDED Enable source code identification for all supported source code formats. This option might lead to false positives in some cases (for example, a C++ file might be identified as a rarer format). This option can detect KeyView formats 498-545 and 749-907, which would otherwise be detected as ASCII_Text_Fmt.

For the complete list of source code formats supported for both options, see Supported Formats.
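
As an illustration of how an application might route files after detection, the following sketch checks whether a detected format number falls inside the ranges quoted for each level (498-545 and 749-907). It is a simplification under stated assumptions: the enum and both function names are hypothetical and are not part of the KeyView headers, and the range check stands in for the authoritative list in Supported Formats.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three detection levels. These are NOT
   the constants or values defined in the KeyView headers. */
typedef enum {
    SKETCH_LEVEL_OFF,
    SKETCH_LEVEL_ENABLED,
    SKETCH_LEVEL_EXTENDED
} SketchLevel;

/* Returns true if format_id lies in a range that the given level can
   detect, using the ranges quoted in the option descriptions. */
bool in_source_code_range(int format_id, SketchLevel level)
{
    bool common = (format_id >= 498 && format_id <= 545);
    bool rarer = (format_id >= 749 && format_id <= 907);

    switch (level) {
    case SKETCH_LEVEL_ENABLED:
        return common;
    case SKETCH_LEVEL_EXTENDED:
        return common || rarer;
    default:
        return false; /* with identification off, neither range is reported */
    }
}

/* Hypothetical routing rule: keep source code out of the index. */
bool should_index(int format_id, SketchLevel level)
{
    return !in_source_code_range(format_id, level);
}
```

An application could call should_index() with the format number reported by auto-detection and send rejected files to a separate destination. Because some formats in the adSOURCECODE class can be detected without this feature, a real filter would probably also consult the file class.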

TIP: Source code identification does not correspond exactly to the adSOURCECODE file class. Most files that are detected by source code identification have this file class, but there are exceptions. Also be aware that some formats in the adSOURCECODE class can be detected without enabling source code identification.

To configure source code identification

  • In the C API, call the function fpSetConfig() and set the flag KVFLT_SOURCECODEDETECTION.

    (*fpSetConfig)(pKVFilter, KVFLT_SOURCECODEDETECTION, KVSOURCECODE_ENABLED, NULL);

    Setting the option through fpSetConfig overrides any settings in formats.ini.

  • In formats.ini, set the following parameter to the appropriate level. (This is an alternative approach; you do not need to do this if you have configured this feature through the API.)

    [Options]
    SourceCodeDetection=KVSOURCECODE_ENABLED
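
The precedence between the two approaches can be pictured with a small sketch. It is only a model of the rule that a value set through fpSetConfig overrides formats.ini; it is not KeyView's implementation. The enum, parse_level(), and effective_level() are hypothetical names, and two points are assumptions: that the value text is matched case-sensitively, and that the level is off when nothing is configured.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the three levels (not KeyView header values). */
typedef enum {
    CFG_LEVEL_OFF,
    CFG_LEVEL_ENABLED,
    CFG_LEVEL_EXTENDED
} CfgLevel;

/* Map the text of a SourceCodeDetection value to a level.
   Returns false if the text is not one of the three documented names.
   Assumption: names are compared case-sensitively. */
bool parse_level(const char *value, CfgLevel *out)
{
    if (strcmp(value, "KVSOURCECODE_OFF") == 0) {
        *out = CFG_LEVEL_OFF;
        return true;
    }
    if (strcmp(value, "KVSOURCECODE_ENABLED") == 0) {
        *out = CFG_LEVEL_ENABLED;
        return true;
    }
    if (strcmp(value, "KVSOURCECODE_EXTENDED") == 0) {
        *out = CFG_LEVEL_EXTENDED;
        return true;
    }
    return false;
}

/* Pick the level that applies: an API setting wins over formats.ini. */
CfgLevel effective_level(bool api_was_set, CfgLevel api_level,
                         bool ini_was_set, CfgLevel ini_level)
{
    if (api_was_set) {
        return api_level;
    }
    if (ini_was_set) {
        return ini_level;
    }
    return CFG_LEVEL_OFF; /* assumption: off unless explicitly enabled */
}
```

For example, if formats.ini requests KVSOURCECODE_EXTENDED but the application sets KVSOURCECODE_ENABLED through the API, this model yields the API value.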