fingerprint_string
The fingerprint_string function generates a list of fingerprints from an input string. These fingerprints can be compared with fingerprints generated from a different string to ascertain whether any parts of the strings match.
This function is intended to process document-sized portions of text. It returns multiple fingerprints because the input is divided into shorter sections (chunks) before the fingerprints are generated, so that you can find matches between two strings that are not identical but do contain common chunks of content.
Syntax
fingerprint_string( input[, min_chars[, mask_size[, num_bytes]]] )
Arguments
Argument | Description |
---|---|
input | (string) The input string to generate fingerprints from. |
min_chars | (integer) The minimum chunk size, in UTF-8 characters. The default value is 10. |
mask_size | (integer) Determines the frequency at which chunks are generated from the input. Increasing the value by one should approximately halve the number of chunks. Decreasing the value by one should approximately double the number of chunks. Specify a value from 1 to 31. The default value is 8. |
num_bytes | (integer) The length of the substring to check when calculating string chunk boundaries. Micro Focus recommends using the default value, which is 48. |
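The effect of mask_size on the number of chunks can be estimated with simple arithmetic. The following sketch relies only on the halving and doubling rule described in the table. The helper name relative_chunk_count is hypothetical and is not part of the product API, and the result is an approximation of the trend, not a guaranteed chunk count.

```lua
-- Hypothetical helper (not part of the product API): estimates how the number
-- of chunks changes relative to the default mask_size of 8, assuming each
-- step of one approximately halves or doubles the chunk count.
local DEFAULT_MASK_SIZE = 8

local function relative_chunk_count(mask_size)
    assert(mask_size >= 1 and mask_size <= 31, "mask_size must be from 1 to 31")
    return 2 ^ (DEFAULT_MASK_SIZE - mask_size)
end

print(relative_chunk_count(8))   -- baseline (default mask_size)
print(relative_chunk_count(10))  -- about a quarter as many chunks
print(relative_chunk_count(6))   -- about four times as many chunks
```

A larger mask_size therefore gives fewer fingerprints per document, and a smaller one gives more.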
Returns
(strings) One or more fingerprint strings, returned as separate return values.
Example
To map the return values to a table, surround the function call with braces. For example:
local fingerprints = { fingerprint_string("here is the input string") };
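After the fingerprints from two strings are in tables, they can be compared to look for common chunks. The following is a minimal sketch of one way to do this. The helper count_shared_fingerprints is hypothetical, and the tables hold placeholder values because the format of real fingerprints is not described here. In practice each table would be built as shown above, with { fingerprint_string(text) }.

```lua
-- Hypothetical helper (not part of the product API): counts the distinct
-- fingerprints that appear in both lists.
local function count_shared_fingerprints(a, b)
    -- Build a lookup set from the first list.
    local seen = {}
    for _, fp in ipairs(a) do
        seen[fp] = true
    end

    -- Count entries of the second list that are in the set.
    local shared = 0
    for _, fp in ipairs(b) do
        if seen[fp] then
            shared = shared + 1
            seen[fp] = nil  -- count each distinct fingerprint only once
        end
    end
    return shared
end

-- Placeholder values standing in for real fingerprints (assumption).
local first  = { "fp-a", "fp-b", "fp-c" }
local second = { "fp-b", "fp-c", "fp-d" }

print(count_shared_fingerprints(first, second))  -- 2
```

A non-zero count suggests that the two strings contain at least one chunk of common content. How many shared fingerprints count as a meaningful match depends on the chunking settings and is left to the caller.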