Incremental Synchronize

The incremental synchronize implemented by the sample code in IncrementalSynchronize.java works by maintaining the following datastore table:

Reference    LastModified    PendingAction    Seen
...          ...             ...              ...

The connector adds one row to the table for every file that it finds in the repository. The full path of the file is used as the reference.

Ignore Unchanged Files

The connector writes the file's last modified date to the "LastModified" column. In subsequent cycles, if the connector encounters a file for which the last modified date has not changed, it does not ingest it again.
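The check described above can be sketched as follows. This is a minimal illustration and not the code from IncrementalSynchronize.java: the class LastModifiedTracker, its method name, and the in-memory map that stands in for the "LastModified" column are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the "LastModified" column, keyed by reference. */
class LastModifiedTracker {
    private final Map<String, Long> lastModifiedByReference = new HashMap<>();

    /**
     * Stores the file's current last modified time and reports whether the
     * file is new or has changed since the time stored previously.
     */
    boolean recordAndCheckChanged(String reference, long lastModifiedMillis) {
        Long previous = lastModifiedByReference.put(reference, lastModifiedMillis);
        // previous == null: no row existed, so this is the first time the file is seen.
        // Otherwise the file needs ingesting only if the stored time differs.
        return previous == null || previous != lastModifiedMillis;
    }
}
```

In this sketch, a file is skipped in a later cycle whenever recordAndCheckChanged returns false for its reference.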

Delete Unseen Items

At the beginning of the synchronize action, the connector sets the value of the "Seen" column to “UNSEEN” for all rows (if there are any rows). As the connector loops over the files in the repository, it updates the record for each file so that the value of the "Seen" column is “SEEN”. When the connector has finished looping over the files, any records that still have “UNSEEN” in the "Seen" column correspond to files that have been deleted from the repository. The connector can send ingest-deletes for these files to the ingest target.
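The mark-and-sweep pattern in this section might look like the following sketch. The class SeenColumnSketch and its methods are hypothetical, and a map stands in for the "Seen" column of the datastore table; the sample code may organize this differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the "Seen" column, keyed by reference. */
class SeenColumnSketch {
    private final Map<String, String> seenByReference = new LinkedHashMap<>();

    /** Start of the synchronize action: reset every existing row to UNSEEN. */
    void markAllUnseen() {
        seenByReference.replaceAll((reference, seen) -> "UNSEEN");
    }

    /** Called for each file found in the repository; adds the row if it is missing. */
    void markSeen(String reference) {
        seenByReference.put(reference, "SEEN");
    }

    /** After the loop: rows still UNSEEN belong to files no longer in the repository. */
    List<String> unseenReferences() {
        List<String> unseen = new ArrayList<>();
        for (Map.Entry<String, String> row : seenByReference.entrySet()) {
            if ("UNSEEN".equals(row.getValue())) {
                unseen.add(row.getKey());
            }
        }
        return unseen;
    }
}
```

A synchronize cycle in this sketch calls markAllUnseen once, markSeen for every file it finds, and then treats the references returned by unseenReferences as candidates for an ingest-delete.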

Pending Actions

When the connector determines that a file needs to be ingested into, or deleted from, the ingest target, it sets the "PendingAction" column in the file’s record. A pending action can be one of the following:

  • None – There is nothing to do; the ingest target has the latest version of this file.
  • Add – An ingest add for this file needs to be sent to the ingest target.
  • Replace – An ingest replace for this file needs to be sent to the ingest target.
  • Remove – An ingest delete for this file needs to be sent to the ingest target.

The connector later loops through the records in the datastore table and performs whatever actions are needed.

When a file has been ingested successfully, the "PendingAction" for the file is set to “None”. When a file is ingest-deleted successfully, the record for the file is deleted.

If ingestion of a file is unsuccessful, the "PendingAction" is not changed. The connector attempts to perform the action again on the next synchronize cycle.
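The handling of pending actions described in this section can be sketched as below. The names PendingActionProcessor and IngestTargetSketch are hypothetical, the map stands in for the "PendingAction" column, and the boolean result of send is an assumed way of reporting success; none of these are taken from the sample code.

```java
import java.util.Iterator;
import java.util.Map;

enum PendingAction { NONE, ADD, REPLACE, REMOVE }

/** Hypothetical ingest target; send returns true when the action succeeded. */
interface IngestTargetSketch {
    boolean send(PendingAction action, String reference);
}

class PendingActionProcessor {
    /** Loops over the rows (reference -> pending action) and performs each action. */
    static void process(Map<String, PendingAction> table, IngestTargetSketch target) {
        Iterator<Map.Entry<String, PendingAction>> rows = table.entrySet().iterator();
        while (rows.hasNext()) {
            Map.Entry<String, PendingAction> row = rows.next();
            PendingAction action = row.getValue();
            if (action == PendingAction.NONE) {
                continue; // the ingest target already has the latest version
            }
            if (!target.send(action, row.getKey())) {
                continue; // failure: leave the action in place for the next cycle
            }
            if (action == PendingAction.REMOVE) {
                rows.remove(); // successful ingest-delete: drop the record
            } else {
                row.setValue(PendingAction.NONE); // successful add or replace
            }
        }
    }
}
```

Because a failed action leaves the row untouched, simply running process again on the next synchronize cycle retries it.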

NOTE: The sample code scans all the files in the file system on every synchronize cycle, checking every file against the datastore to see if it is new or has changed. Some repositories provide a more efficient method to determine what has changed. Connectors for these repositories might not need to store information for every file in the datastore.