Google Drive connection
If you will be creating datasets that process Google Workspace's Drive data, you must complete additional tasks to enable processing by OpenText Core Data Discovery & Risk Insights.
NOTE: OpenText Core Data Discovery & Risk Insights supports processing of data from Google Workspace's Drive; data from personal Google Drives is not supported.
OpenText Core Data Discovery & Risk Insights uses a service account to access the user drives within your Google Workspace, both to process items and to send items to a Google Drive target. Using a service account allows access to the drives without requiring individual end-user (employee) consent, and the access does not expire.
In OpenText Core Data Discovery & Risk Insights, a dataset to process Google Drive data is associated with a single user account for Google Drive.
Requirements
Prior to beginning the connection tasks, you must have the following in place.
-
A Google Workspace for the domain that includes the desired users' drives.
-
A Google project within the Workspace, with the User Type set to Internal.
Set the User Type for the project in Google Cloud Platform, under APIs & Services > OAuth consent screen.
Review your Google Cloud document quotas to ensure ideal performance. For more information about document quotas, see the Google Cloud documentation at https://cloud.google.com/docs/quota.
Configure Google Drive connection
Complete the following tasks to enable OpenText Core Data Discovery & Risk Insights to connect to Google Drive, process items from it, and send data to it.
TIP: If accessing multiple Google Drive workspaces, you may need to create multiple service accounts. Additionally, creating separate accounts for use by sources and targets may allow for greater flexibility.
-
Log on to Google Cloud Platform as a Google Workspace (formerly G Suite) administrator.
-
Create a new service account for the desired project.
When creating the account,
-
do not select any roles for the Grant this service account access to project (optional) step.
-
do not grant any user access for the Grant users access to this service account (optional) step.
-
Create a service account key.
When creating the service account key,
-
select JSON as the Key Type.
-
download the JSON key file. You will need this file to create a Google Drive source or target in OpenText Core Data Discovery & Risk Insights: select it when prompted for the "Certificate Upload (JSON)."
IMPORTANT: Your new public/private keypair is generated and downloaded to your machine; it serves as the only copy of this key. You are responsible for storing it securely. If you lose this keypair, you will need to generate a new one.
-
In the Service Account Details, select Enable Google Workspace Domain-wide Delegation.
Make note of your Client ID.
-
From the APIs & Services dashboard, click ENABLE APIS AND SERVICES.
Search for and enable Google Drive API if not already enabled.
-
Log on to your domain's Google Workspace (formerly G Suite) Admin console as a super administrator.
-
Navigate to Security > API Controls > Domain Wide Delegation > Manage Domain Wide Delegation and click Add New for the API Client.
-
Type the Client ID previously generated and set the following API scopes.
-
https://www.googleapis.com/auth/drive
-
https://www.googleapis.com/auth/admin.directory.user
Click Authorize.
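Before you upload the key file, it can help to confirm that the downloaded JSON looks like a service account key and to read the Client ID from it. The following is a minimal sketch using only the Python standard library, not part of the product. The function name `summarize_key` and all example values are hypothetical. The field names (`type`, `client_id`, `client_email`, `private_key`) are the ones normally found in a Google service account key file, and the `client_id` field is assumed to match the Client ID noted in the steps above. The scopes are joined with commas on the assumption that you enter them as a single comma-delimited string.

```python
import json

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "client_id", "client_email", "private_key")

# The two API scopes listed in the steps above.
SCOPES = (
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/admin.directory.user",
)


def summarize_key(key_json: str) -> dict:
    """Check service account key text and return the values needed later."""
    key = json.loads(key_json)
    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("unexpected key type: " + repr(key["type"]))
    return {
        "client_id": key["client_id"],
        "client_email": key["client_email"],
        "scopes": ",".join(SCOPES),
    }


# Hypothetical placeholder key; a real file is read from disk instead.
example_key = json.dumps({
    "type": "service_account",
    "client_id": "123456789012345678901",
    "client_email": "discovery@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n(placeholder)\n-----END PRIVATE KEY-----\n",
})

summary = summarize_key(example_key)
print("Client ID:", summary["client_id"])
print("Scopes:", summary["scopes"])
```

To check a real key, read the file contents (for example with `open(path, encoding="utf-8").read()`) and pass the text to `summarize_key`. Avoid printing or logging the `private_key` value.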
Configure web proxy settings (optional)
The Google Drive processor service controlled by the processing agent requires connectivity to the OpenText Core Data Discovery & Risk Insights cloud components, which are often outside the local network where the agent host servers reside. Although direct connectivity is ideal, some environments require a web proxy for the agent systems to reach the OpenText Core Data Discovery & Risk Insights cloud.
-
On the machine hosting the OpenText Core Data Discovery & Risk Insights processing agent, log on to the agent administration UI.
From the Start menu, click OpenText Core Data Discovery & Risk Insights Agent > Agent Admin.
-
In the navigation pane, click Advanced Settings.
-
In the Category list, click Google Drive Processor.
-
Complete the following options.
Option               Description
Proxy address URL    Type the URL of the web proxy.
Proxy bypass list    Type a comma-separated list of addresses that do not use the proxy server.
-
Click Save. You can close the agent administration UI.
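If you prepare the proxy values ahead of time, a short script can catch formatting mistakes before you type them into the agent administration UI. This is a minimal sketch using only the Python standard library. The function names and example addresses are hypothetical, and the check that the proxy URL uses an `http` or `https` scheme is an assumption, not a documented requirement of the product.

```python
from urllib.parse import urlsplit


def check_proxy_url(url: str) -> str:
    """Return the trimmed proxy URL, or raise ValueError if it looks malformed."""
    cleaned = url.strip()
    parts = urlsplit(cleaned)
    # Assumption: the proxy is addressed with an http or https URL.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("not a usable proxy URL: " + repr(url))
    return cleaned


def build_bypass_list(addresses) -> str:
    """Join addresses into the comma-separated form used by the bypass list."""
    cleaned = [item.strip() for item in addresses if item.strip()]
    return ",".join(cleaned)


# Hypothetical example values.
proxy_url = check_proxy_url(" http://proxy.example.com:8080 ")
bypass = build_bypass_list(["localhost", " 10.0.0.5 ", ""])
print(proxy_url)
print(bypass)
```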
Google Drive as a target
You can send data managed by OpenText Core Data Discovery & Risk Insights to a destination on Google Drive. Each destination is associated with a source, defines a specific location on Google Drive, and is accessed using a Google user account.
IMPORTANT: OpenText Core Data Discovery & Risk Insights supports sending data to a Google Drive target using a single processing agent.
Before creating Google Drive targets, create at least one agent cluster with a single processing agent. When you create Google Drive targets, select an agent cluster with a single agent. The destination directories for your targets (or any portion of them) must not overlap in any way.
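When you plan several targets, you can check a list of destination paths for overlap before you create them. The sketch below is one interpretation of the rule, not the product's own validation: it treats two relative paths as overlapping when they are identical or when one is nested inside the other, and it compares path segments case-sensitively. The function names and example paths are hypothetical, and the check assumes all paths belong to the same drive.

```python
from itertools import combinations


def segments(path: str) -> list:
    """Split a relative path into its non-empty folder names."""
    return [part for part in path.split("/") if part]


def overlaps(first: str, second: str) -> bool:
    """True when the paths are equal or one contains the other."""
    a, b = segments(first), segments(second)
    shorter = min(len(a), len(b))
    # Compare whole folder names so that "hr" and "hr2" do not match.
    return a[:shorter] == b[:shorter]


def find_overlaps(paths) -> list:
    """Return every pair of destination paths that overlap."""
    return [(a, b) for a, b in combinations(paths, 2) if overlaps(a, b)]


# Hypothetical destination paths on one drive.
planned = ["hr/recruiting", "hr/recruiting/2024", "finance"]
print(find_overlaps(planned))
```

Under this interpretation, an empty path (the drive root) overlaps with every other destination on that drive.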
When you create a target, you specify whether the target is a shared drive or a user drive. For a shared drive, the account you define to access the drive must be the owner of the drive or have Editor permission for it.
The destination path is a relative path and is case sensitive. If the case in the path you type differs from the case in the path on Google Drive, a new folder is created at the level of the case difference. For example, suppose the relative path on Google Drive is hr/recruiting and you type hr/Recruiting. Items sent to this destination go to a new sub-directory /Recruiting under the /hr directory, alongside the existing /recruiting sub-directory.
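To catch this kind of mistake before creating a target, you can compare the path you intend to type against the path as it exists on Google Drive. The following minimal sketch reports the first folder level at which the two paths differ only by case. The function name is hypothetical, and the existing path is supplied by hand here; the sketch does not query Google Drive.

```python
def first_case_mismatch(existing: str, typed: str):
    """Return (level, existing_name, typed_name) for the first folder that
    differs only by case, or None if there is no case-only difference."""
    pairs = zip(existing.split("/"), typed.split("/"))
    for level, (have, want) in enumerate(pairs):
        if have == want:
            continue
        if have.casefold() == want.casefold():
            return level, have, want
        # The names differ by more than case, so this is a different path.
        return None
    return None


# The example from the paragraph above.
print(first_case_mismatch("hr/recruiting", "hr/Recruiting"))
```

A non-None result means a new folder would be created at that level (level 0 is the first folder in the relative path).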
Google Drive deletion tracking
OpenText Core Data Discovery & Risk Insights tracks the deletion of managed Google Drive items using the change log for the Google Drive defined by the source in Connect. Each time processing runs on a dataset (on a schedule or on demand), the application checks the change logs for deleted items. For each managed item that is deleted in Google Drive, the application deletes that item from the application index. If an item within a container file (such as a ZIP file) is deleted in Google Drive, the item is removed from the index when the container file is updated during the job run.
To track items deleted from Google Drive accurately, update the Google Drive datasets in Connect more often than the maximum number of days that Google Drive change logs are kept. For example, the default retention for change logs is 30 days, so verify that your Google Drive datasets are updated at least every 29 days.
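The timing rule can be expressed as a simple comparison. The sketch below is illustrative only: the function names are hypothetical, and the 30-day default is the retention value quoted above. It checks whether an update interval is shorter than the retention period and computes the latest date by which the next update should run.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 30  # default change log retention quoted above


def schedule_is_safe(update_interval_days: int,
                     retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """True when datasets are updated more often than change logs expire."""
    return update_interval_days < retention_days


def latest_safe_update(last_update: date,
                       retention_days: int = DEFAULT_RETENTION_DAYS) -> date:
    """Latest date for the next update that still falls within retention."""
    return last_update + timedelta(days=retention_days - 1)


print(schedule_is_safe(29))
print(schedule_is_safe(30))
print(latest_safe_update(date(2024, 1, 1)))
```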