text_to_docs
The text_to_docs function splits a file into multiple documents.
Syntax
text_to_docs(doc, sectionName, filename)
Arguments
Argument | Description |
---|---|
doc | (LuaDocument) The document that you want to divide into multiple documents. |
sectionName | (string) The name of the section in the CFS configuration file that contains the TextToDocs configuration parameters. For information about these parameters, see TextToDocs Task Parameters. |
filename | (string) The file that contains the text to be converted (the original file that resulted in the document). |
Returns
LuaDocuments. A list of document objects representing the documents that are produced.
Example
You might have a connector that ingests files from a repository, but want to split those files into multiple documents. The following example uses the get_filename function to find the path of the file associated with an ingested document, and uses the text_to_docs function to generate multiple documents from that file. The file is split according to the settings in the [MyTextToDocs] section of the CFS configuration file. The example then calls the ingest function to add each resulting document to the ingest queue. Each generated document is given a PROCESSED field, so that when it passes through the handler again it is not split a second time.
```lua
function handler(document)
  -- Skip documents that this script has already generated
  if document:hasField("PROCESSED") then
    return true
  end

  local file = get_filename(document)
  local docs = text_to_docs(document, "MyTextToDocs", file)
  for _, doc in ipairs(docs) do
    -- Mark each new document so that it is not split again
    doc:addField("PROCESSED", "YES")
    ingest(doc)
  end
  return true
end
```
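In this example, the handler returns true for the original document, so the original documents are also indexed. If you want to index only the documents generated by the text_to_docs function, you could return false from the handler function after the new documents have been ingested. Continue to return true for documents that have the PROCESSED field; otherwise the generated documents are discarded as well when they pass through the handler.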
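As a sketch, the variant that indexes only the generated documents could look like this. It uses the same [MyTextToDocs] section and the CFS-provided get_filename, text_to_docs, and ingest functions, and it assumes (as the original example does) that documents passed to ingest run through the same handler again. Only the final return value differs from the original example.

```lua
-- Sketch: index only the documents generated by text_to_docs.
-- Assumes ingested documents pass through this handler again.
function handler(document)
  -- Generated documents carry the PROCESSED field; keep them
  if document:hasField("PROCESSED") then
    return true
  end

  local file = get_filename(document)
  local docs = text_to_docs(document, "MyTextToDocs", file)
  for _, doc in ipairs(docs) do
    doc:addField("PROCESSED", "YES")
    ingest(doc)
  end
  -- Return false so that the original, unsplit document is not indexed
  return false
end
```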