Processing and processing agent
The following FAQs address questions about file processing and the processing agent.
Some tasks require multiple processes, or steps, to complete, and the requested task may be between steps, waiting to be picked up for the next step. For example, you want to send the items in a workbook to a target. For the items in this workbook, only the metadata was indexed, and the source repository and destination are managed by different agent clusters. In this scenario, the items must be collected before they can be sent to the defined target. After the items are collected, the task may show a "waiting" status while it waits to be picked up to send the items to the target.
The assigned agent may not be reachable. If the task status remains "waiting", ensure that the agents in the agent cluster assigned to the task are running and accessible. Specifically, verify that the agentAPI service is running on the agent host assigned to perform the task.
Fusion tracks file deletions when a processing job runs against a Fusion repository. A job run occurs when a repository is updated, either on a schedule or manually from the Manage Repositories page in Connect (click the inline update icon for the repository or the Update button in the repository detail panel).
File systems
Fusion tracks file deletions by directly comparing with the original file system location identified by the repository path. Items are removed from the Fusion index seven days after the deletion is detected. If an item within a container file (such as ZIP) is deleted in the original file system location, the item is removed from the index as part of updating the container file when the Fusion job run occurs. In this case, the item may be removed from Fusion sooner than seven days after deletion is detected.
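The seven-day rule above can be illustrated with a minimal Python sketch. This is not Fusion's implementation: the `update_index` function, the dictionary used as an index, and the assumption that removal is evaluated during a job run are all hypothetical, chosen only to show how a detection timestamp and a grace period interact.

```python
from datetime import datetime, timedelta

# Grace period stated in the text above; everything else here is illustrative.
GRACE_PERIOD = timedelta(days=7)


def update_index(index, present_paths, now):
    """Simulate one job run against a file system repository.

    index maps each indexed path to the time its deletion was first
    detected, or None while the file is still present at the source.
    """
    for path in list(index):
        if path in present_paths:
            # File exists at the repository path: no pending deletion.
            index[path] = None
        elif index[path] is None:
            # First run that notices the file is missing: record detection time.
            index[path] = now
        elif now - index[path] >= GRACE_PERIOD:
            # Seven days have passed since detection: drop the item.
            del index[path]
    return index


index = {"a.txt": None, "b.txt": None}
first_run = datetime(2024, 1, 1)
update_index(index, {"a.txt"}, first_run)                       # b.txt flagged
update_index(index, {"a.txt"}, first_run + timedelta(days=7))   # b.txt removed
```

In this sketch, a file that reappears before the grace period ends has its pending deletion cleared. Whether Fusion behaves the same way is not stated in the text above.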
Exchange
Fusion does not detect deletions from Exchange. Fusion retains items it has already processed until a delete action is initiated from Fusion.
SharePoint
Fusion tracks the deletion of managed SharePoint items using the SharePoint change logs. Each time processing is run on a dataset—on a schedule, or on demand—Fusion checks the SharePoint logs for deleted items. For each managed item that is deleted in SharePoint, Fusion deletes that item from the Fusion index. If an item within a container file (such as ZIP) is deleted in SharePoint, the item is removed from the index as part of updating the container file when the Fusion job run occurs.
To ensure accurate tracking of items deleted from SharePoint, ensure that the SharePoint datasets in Fusion are updated more often than the maximum number of days SharePoint logs are kept. For example, if your SharePoint logs are configured to be stored for 60 days, verify that your SharePoint datasets are updated at least every 59 days.
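The scheduling rule above is simple arithmetic, sketched here in Python. The function names are hypothetical and are not part of any Fusion or SharePoint API. The only fact taken from the text is that the update interval must be shorter than the log retention period.

```python
def max_update_interval_days(log_retention_days):
    """Longest update interval (in whole days) that stays inside the log retention window."""
    if log_retention_days < 2:
        raise ValueError("retention must be at least 2 days to leave room for an update")
    return log_retention_days - 1


def is_schedule_safe(update_interval_days, log_retention_days):
    """True when updates run more often than the logs are purged."""
    return update_interval_days < log_retention_days


print(max_update_interval_days(60))  # 59
print(is_schedule_safe(59, 60))      # True
print(is_schedule_safe(60, 60))      # False
```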
Content Manager
Fusion tracks the deletion of managed Content Manager items using the Content Manager delete events. Each time processing is run on a repository—on a schedule, or on demand—Fusion checks the delete events. For each managed item that is deleted in Content Manager, Fusion deletes that item from the Fusion index. If an item within a container file (such as ZIP) is deleted from Content Manager, the item is removed from the index as part of updating the container file when the Fusion job run occurs.
To ensure accurate tracking of items deleted from Content Manager, make sure that the Content Manager repositories in Fusion are updated more often than the Content Manager administrator purges delete events. For example, if your Content Manager administrator purges delete events every 60 days, verify that your Content Manager repositories are updated at least every 59 days.
Google Drive
Fusion tracks the deletion of managed Google Drive items using the change log for the Google Drive defined by the Fusion repository. Each time processing is run on a repository, whether on a schedule or on demand, Fusion checks the change logs for deleted items. For each managed item that is deleted in Google Drive, Fusion deletes that item from the Fusion index. If an item within a container file (such as ZIP) is deleted in Google Drive, the item is removed from the index as part of updating the container file when the Fusion job run occurs.
To ensure accurate tracking of items deleted from Google Drive, ensure that the Google Drive repositories in Fusion are updated more often than the maximum number of days Google Drive change logs are kept. For example, the default retention for change logs is 30 days. Verify that your Google Drive repositories are updated at least every 29 days.
As the processing agent reads each file, it generates a SHA (Secure Hash Algorithm) checksum of the file. SHA is a hash function that takes an input, in this case the file, and produces an item hash value, which is stored in the Fusion index. In effect, Fusion generates a fingerprint that identifies a file based on its contents. If multiple files have the same contents, regardless of name or extension, those files are identified as duplicates.
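Content-based duplicate detection of this kind can be sketched with Python's standard `hashlib` module. The text does not say which SHA variant Fusion uses, so SHA-256 is an assumption here, and the `file_hash` and `find_duplicates` helpers are hypothetical names for illustration only.

```python
import hashlib
from collections import defaultdict


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents (SHA-256 is an assumed variant)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash and return only the groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_hash(path)].append(str(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Because only the file contents are hashed, two files named `report.docx` and `copy.txt` with identical bytes fall into the same group, which matches the "regardless of name or extension" behavior described above.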