Retrieve Documents from a Repository
The following table lists the fetch actions that retrieve information from a repository.
Action | Description | Method to override |
---|---|---|
action=fetch&fetchaction=synchronize | Sends ingest commands to the ingest target to bring it up to date with what is contained in the repository. | Synchronize |
action=fetch&fetchaction=synchronize&identifiers=... | Forces a synchronize of the documents listed by the identifiers action parameter, whether or not they have changed. Ingest deletes are sent to the ingest target if the documents have been deleted. | SynchronizeIds |
action=fetch&fetchaction=Collect | Retrieves the content and metadata of the specified documents from the repository. | Collect |
action=View | Retrieves a single document from the repository. | View |
The synchronize action has already been demonstrated in A Complete Synchronize Action and Make an Incremental Synchronize Action.
SynchronizeIds, Collect, and View are all provided with IDocInfo objects. These contain no metadata or content, but each contains the identifier of a document to retrieve. Your connector must try to set the content and metadata for these documents from the repository. For an individual IDocInfo (doc), indicate success or failure to retrieve the document by performing the operations shown in the following table:
Method | Success operation | Failure operation |
---|---|---|
SynchronizeIds | doc.Success(); | doc.Failed(message); |
Collect | doc.Success(); | doc.Failed(message); |
View | Return as normal | Throw an exception from the View method. |
You can throw exceptions for any fatal errors, such as network failures that cause the retrieval of all documents to fail, from any of the methods.
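The distinction between a per-document failure and a fatal error can be illustrated with a small, self-contained sketch. It does not use the connector framework: SketchDoc is a hypothetical stand-in that models only the Success and Failed calls from the table, and the fetch delegate is an assumed placeholder for whatever call reads from your repository.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the framework's IDocInfo. Only the
// Success/Failed reporting described in the table is modeled.
public class SketchDoc
{
    public string Reference { get; }
    public string Status { get; private set; } = "unreported";
    public string Message { get; private set; } = "";

    public SketchDoc(string reference) { Reference = reference; }

    public void Success() { Status = "success"; }

    public void Failed(string message)
    {
        Status = "failed";
        Message = message;
    }
}

public static class ReportingSketch
{
    // "fetch" is an assumed placeholder for the repository read.
    public static void RetrieveAll(IEnumerable<SketchDoc> docs, Func<string, string> fetch)
    {
        foreach (SketchDoc doc in docs)
        {
            try
            {
                fetch(doc.Reference);
                doc.Success();
            }
            catch (FileNotFoundException e)
            {
                // A problem with this document only: report it and continue.
                doc.Failed(e.Message);
            }
            // Any other exception (for example, a network failure) is not
            // caught here, so it propagates to the caller as a fatal error.
        }
    }
}
```

In this sketch a missing file marks one document as failed while the loop continues, whereas any other exception type ends the whole operation.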
View and Collect Example
The collect fetch action and the view action might appear very similar, and they are often implemented to share most of their code. However, there are some important differences:
Collect is an asynchronous fetch action. It should be able to handle stop requests if the action is likely to take some time.
View is a synchronous action, so it should be quick to execute.
Collect retrieves the content (file or text) and metadata for multiple documents when provided with the document identifiers.
View retrieves the content (file) for a single document when provided with the document's identifier; metadata might also be retrieved but is discarded later.
Collect should handle any exception that might occur from an individual document so that remaining documents are still processed.
View might throw any exception caused by the attempt to retrieve the single document.
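As a rough illustration of these differences, the following self-contained sketch contrasts a collect-style loop with a view-style call. It does not use the connector framework: CollectItem is a hypothetical stand-in for a document, and a .NET CancellationToken is assumed here as a substitute for the framework's own stop-request mechanism, which this section does not describe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a document to retrieve.
public class CollectItem
{
    public string Reference { get; }
    public string Outcome { get; set; } = "pending";

    public CollectItem(string reference) { Reference = reference; }
}

public static class CollectSketch
{
    // Collect-style: many documents, stoppable, per-document error handling.
    // Returns the number of documents that were attempted.
    public static int CollectAll(IList<CollectItem> items,
                                 Action<CollectItem> retrieve,
                                 CancellationToken stop)
    {
        int attempted = 0;
        foreach (CollectItem item in items)
        {
            // Assumed stand-in for checking whether a stop was requested.
            if (stop.IsCancellationRequested) break;

            try
            {
                retrieve(item);
                item.Outcome = "success";
            }
            catch (Exception e)
            {
                // One bad document must not prevent the rest from being processed.
                // A real connector might rethrow errors it considers fatal.
                item.Outcome = "failed: " + e.Message;
            }
            attempted++;
        }
        return attempted;
    }

    // View-style: one document, synchronous, exceptions reach the caller.
    public static void ViewOne(CollectItem item, Action<CollectItem> retrieve)
    {
        retrieve(item);
    }
}
```

Checking for a stop once per document keeps the loop responsive without adding cost inside the retrieval itself.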
The following sample code shows how collect and view might be implemented for a basic connector, using the file system as a repository (like the connector introduced in A Complete Synchronize Action):
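    // Requires: using System.IO; (for File and FileNotFoundException)

    public override void Collect(ICollectTask task)
    {
        foreach (IDocInfo doc in task.Docs)
        {
            try
            {
                CollectDocument(doc);
                doc.Success();
            }
            catch (FileNotFoundException e)
            {
                // Report the failure for this document and continue with the rest.
                doc.Failed(e.Message);
            }
        }
    }

    public override void View(IViewTask task)
    {
        // An exception thrown here signals that the view failed.
        CollectDocument(task.Doc);
    }

    // Shared by Collect and View: attaches the file on disk to the document.
    private void CollectDocument(IDocInfo doc)
    {
        if (File.Exists(doc.Identifier.Reference))
        {
            doc.File = DocFile.Create(doc.Identifier.Reference, false);
        }
        else
        {
            throw new FileNotFoundException("File Not Found", doc.Identifier.Reference);
        }
    }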
The view action is provided with a single document, IViewTask.Doc, while the collect action is provided with multiple documents, ICollectTask.Docs. For each document, set the File property to the file (if there is one) and update the Document property with any metadata.
If the information is successfully retrieved from the repository and set on the document, call the Success method to indicate that the document was handled successfully. If there is a problem, call the Failed method with a description of the error that can be reported to the user. If Collect calls neither the Success nor the Failed method for a document, failure is assumed and a warning message is written to the logs.
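The rule for unreported documents can be pictured with a toy model. This is not the framework's code: TrackedDoc, Report, and FinalizeOutcomes are hypothetical names, and the warning text is invented for illustration. The sketch only shows why it is safer to call Success or Failed explicitly for every document.

```csharp
using System;
using System.Collections.Generic;

public enum Report { None, Success, Failed }

// Hypothetical stand-in for a document that records which method was called.
public class TrackedDoc
{
    public string Reference { get; }
    public Report Reported { get; private set; } = Report.None;

    public TrackedDoc(string reference) { Reference = reference; }

    public void Success() { Reported = Report.Success; }
    public void Failed(string message) { Reported = Report.Failed; }
}

public static class OutcomeSketch
{
    // Toy model of the rule: any document left unreported after collect
    // is treated as failed, and a warning is produced for it.
    public static List<string> FinalizeOutcomes(IEnumerable<TrackedDoc> docs)
    {
        var warnings = new List<string>();
        foreach (TrackedDoc doc in docs)
        {
            if (doc.Reported == Report.None)
            {
                doc.Failed("No result reported");
                warnings.Add("No result reported for " + doc.Reference + "; assuming failure");
            }
        }
        return warnings;
    }
}
```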