Identifiers

The identifiers fetch action retrieves a list of items that are present in the repository and returns an identifier for each item. Front-end applications can use this action to provide an interface for browsing a repository. You can use the identifiers that this action returns in other connector actions that require identifiers.

By default, the identifiers action returns identifiers only for items that would be synchronized using the specified task configuration. If you have set configuration parameters to exclude certain items, those items are not returned. To see excluded items, set the action parameter ShowExcluded to TRUE.

TIP: The identifiers fetch action does not expand container files, and does not provide identifiers for sub-files.

Type: Asynchronous

Parameter Name Description Required
Config A base-64 encoded configuration. The configuration parameters that are set override the same parameters in the connector's configuration file. No
ConfigSection The name of the configuration file section that contains the task settings. Yes
ContainersOnly A Boolean value (default false) that specifies whether to return only those items that represent containers. No
FilterTypes A comma-separated list of the types of items to return identifiers for. If you omit this parameter, the action returns items of all types. No
Identifiers A comma-separated list of identifiers. The action returns identifiers (and status information if supported) for these items and their ancestors, but does not return descendant items (to do this, set the ParentIdentifiers parameter instead). Set this parameter or ParentIdentifiers
IdentifiersAction The name of an action to perform on the returned identifiers. Only the collect fetch action is available. If the action you specify would require additional parameters, specify them as parameters to this action. No
MaxDepth The maximum depth that the connector crawls in the repository (from ParentIdentifiers). The default maximum depth is 1. To specify no limit, set this parameter to 0 (zero). Be aware that if you increase the maximum depth or specify an unlimited maximum depth, the action could take a long time to complete. No
ParentIdentifiers A comma-separated list of identifiers. The action returns identifiers (and status information if supported) for these items, and for ancestors and descendants of these items. To specify the root of the repository, set this parameter to ROOT. Set this parameter or Identifiers
ShowAncestors A Boolean value (default true) that specifies whether to return the identifiers of ancestors for items specified by ParentIdentifiers or Identifiers. The action returns parent items up to the root of the repository. No
ShowAttributes A Boolean value (default true) that specifies whether to show attributes in the response. For example, shows whether an item is a container, and shows whether a document could be ingested to represent the item. No
ShowDocStatus A Boolean value (default false) that specifies whether to show status information. This can include the ingestion status for each document and the modification history for items in the repository. No
ShowExcluded A Boolean value (default false) that specifies whether to return identifiers for excluded items, that is, items that would not be synchronized because they are excluded by the task configuration. By default, the action returns only items that would be synchronized. To see items that would be ignored, set this parameter to true. No
ShowMetadata

A comma-separated list of the basic metadata fields to return for each item. The following values are supported:

  • createdDate - the date when the item was created.
  • modifiedDate - the date when the item was last modified.
  • sizeBytes - the size of the item in bytes.

If you omit this parameter, the connector does not return metadata. Not all connectors support this feature, and some platforms, repositories, or items might not support all of the metadata fields.

No
ShowNames A Boolean value (default true) that specifies whether the response shows a display name for each item (if one is available). No
ShowTypes A Boolean value (default true) that specifies whether the response shows the type of item that each identifier represents. No
Override_Config_Parameters

Any other action parameters that you set override settings in the connector's configuration file. For example:

/action=fetch&fetchaction=...
&[Section]Parameter=Value

where [Section] (optional) is the name of a configuration file section, Parameter is the name of a configuration parameter, and Value is the parameter value.

No

Example

The following example sends the identifiers fetch action to the connector.

http://localhost:7054/action=Fetch&FetchAction=Identifiers
                                  &ConfigSection=MyTask
                                  &ParentIdentifiers=ROOT
                                  &MaxDepth=2
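
A request like the one above can also be assembled in code. The following Python sketch is illustrative only: the helper name build_identifiers_url and its arguments are hypothetical, and it assumes the connector accepts standard percent-encoded query values. It only builds the URL string and does not contact a connector.

```python
from urllib.parse import urlencode


def build_identifiers_url(host, port, config_section,
                          parent_identifiers=None, identifiers=None, **options):
    """Build the URL for an identifiers fetch action (hypothetical helper).

    Extra keyword arguments (for example MaxDepth=2 or ShowExcluded=True)
    are appended as additional action parameters.
    """
    # The parameter table says to set ParentIdentifiers or Identifiers.
    if not parent_identifiers and not identifiers:
        raise ValueError("Set parent_identifiers or identifiers")

    params = [
        ("action", "Fetch"),
        ("FetchAction", "Identifiers"),
        ("ConfigSection", config_section),
    ]
    if parent_identifiers:
        params.append(("ParentIdentifiers", ",".join(parent_identifiers)))
    if identifiers:
        params.append(("Identifiers", ",".join(identifiers)))

    for name, value in options.items():
        # Render Python booleans as true/false; everything else as text.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append((name, str(value)))

    # Keep commas readable so comma-separated lists look like the example.
    return f"http://{host}:{port}/{urlencode(params, safe=',')}"


print(build_identifiers_url("localhost", 7054, "MyTask",
                            parent_identifiers=["ROOT"], MaxDepth=2))
```

The printed URL has the same parameters as the example above, on a single line.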

Response

This action is asynchronous, so Hadoop Connector always returns success accompanied by a token. You can use this token with the QueueInfo action to retrieve the status of your request.

The response contains an <identifiers parent_identifier="..."> element for each of the identifiers passed to the action in the ParentIdentifiers or Identifiers action parameter. If you use the ParentIdentifiers action parameter and set the MaxDepth action parameter to a value greater than 1, the response also contains an <identifiers parent_identifier="..."> element for descendant items that have child identifiers, down to the requested depth. The parent_identifier attribute specifies the identifier of the item. These elements can also include the following attributes:

  • self="true" indicates that this is one of the identifiers you passed to the action in the ParentIdentifiers or Identifiers parameter.
  • ancestor="true" indicates that this item is an ancestor of one of the identifiers you passed to the action in the ParentIdentifiers or Identifiers parameter. You can hide these elements by setting the action parameter ShowAncestors=False.
  • descendant="true" indicates that this item is a descendant of one of the identifiers you passed to the action in the ParentIdentifiers parameter. You can limit the number of descendants that are returned by setting the action parameter MaxDepth.

Each <identifiers parent_identifier="..."> element contains <identifier ...> elements for items that are direct descendants of the parent item.
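
The nesting described above can be read with a standard XML parser. The following Python sketch is an illustration under stated assumptions: the sample XML is invented for this example, the wrapping <response> element and the identifier values (ID_DATA and so on) are hypothetical, and it assumes that each <identifier> element carries its identifier as element text, which this page does not state. Check a real response from your connector before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment, for illustration only.
SAMPLE = """
<response>
  <identifiers parent_identifier="ROOT" self="true">
    <identifier name="/data" type="Directory" attributes="container">ID_DATA</identifier>
    <identifier name="readme.txt" type="File" attributes="document">ID_README</identifier>
  </identifiers>
  <identifiers parent_identifier="ID_DATA" descendant="true">
    <identifier name="part-0000" type="File" attributes="document">ID_PART0</identifier>
  </identifiers>
</response>
"""


def group_children(xml_text):
    """Map each parent_identifier to its roles and its direct children."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for group in root.iter("identifiers"):
        parent = group.get("parent_identifier")
        # self, ancestor, and descendant are optional marker attributes.
        roles = [r for r in ("self", "ancestor", "descendant")
                 if group.get(r) == "true"]
        children = []
        for child in group.findall("identifier"):
            item = dict(child.attrib)
            # Assumption: the identifier value is the element text.
            item["identifier"] = (child.text or "").strip()
            children.append(item)
        result[parent] = {"roles": roles, "children": children}
    return result


tree = group_children(SAMPLE)
for parent, info in tree.items():
    names = [c.get("name") for c in info["children"]]
    print(parent, info["roles"], names)
```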

An <identifiers ...> or <identifier ...> element can provide the following attributes:

  • name - a display name for the item (if one is available).
  • attributes - contains a comma-separated list of attributes for the item.

    • If an item has the container attribute, it can contain other items. To retrieve the identifiers of the child items, increase the value of MaxDepth or run the action again, using the identifier of the container as the value of the ParentIdentifiers action parameter.
    • If an item has the document attribute, it is a file or has metadata that can be ingested.
  • meta_* - these attributes contain basic metadata for the item. They are present in the response only if you set the ShowMetadata action parameter. This feature is not supported by all connectors, and some platforms, repositories, or items might not support all of the metadata fields.

    • meta_createdDate - the date when the item was created, in epoch seconds.
    • meta_modifiedDate - the date when the item was last modified, in epoch seconds.
    • meta_sizeBytes - the size of the item, in bytes.
  • type - the type of item that the identifier represents.

  • exclude - indicates whether the item is excluded from being synchronized by the task configuration. This attribute can have any combination of the values self (the item itself is excluded) and children (all descendants of the item are excluded, to an unlimited depth). For example, exclude="self,children" means that the item and its descendants are both excluded. Excluded items and the exclude attribute are returned in the response only when you set the action parameter ShowExcluded=TRUE. Be aware that an identifier might not be marked as excluded but could be excluded because one of its ancestors has the exclude attribute exclude="children".
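
The attributes listed above can be interpreted as in the following Python sketch. It is illustrative only: the function describe_item and the ancestor_excludes_children flag are hypothetical names, the input is a plain dictionary of attribute names and values rather than a real response, and comparing attribute values case-insensitively is an assumption.

```python
from datetime import datetime, timezone


def _csv_set(value):
    """Split a comma-separated attribute value into a lowercase set."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def describe_item(attrs, ancestor_excludes_children=False):
    """Interpret the attributes of one <identifiers> or <identifier> element.

    ancestor_excludes_children should be True when any ancestor of the item
    has exclude="children", because the item is then excluded even though
    it is not marked as excluded itself.
    """
    flags = _csv_set(attrs.get("attributes", ""))
    exclude = _csv_set(attrs.get("exclude", ""))
    info = {
        "is_container": "container" in flags,
        "is_document": "document" in flags,
        "excluded": "self" in exclude or ancestor_excludes_children,
        "excludes_children": "children" in exclude or ancestor_excludes_children,
    }
    # meta_* dates are given in epoch seconds; convert them to UTC datetimes.
    for key in ("meta_createdDate", "meta_modifiedDate"):
        if key in attrs:
            info[key] = datetime.fromtimestamp(int(attrs[key]), tz=timezone.utc)
    if "meta_sizeBytes" in attrs:
        info["meta_sizeBytes"] = int(attrs["meta_sizeBytes"])
    return info


item = describe_item({"attributes": "document",
                      "meta_modifiedDate": "0",
                      "meta_sizeBytes": "1024"})
print(item["is_document"], item["meta_modifiedDate"].isoformat())
```

When walking a tree of results, pass ancestor_excludes_children=True to every item below a parent whose excludes_children value is true, so that inherited exclusions are not missed.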

The identifiers action for the Hadoop Connector can return the following types:

Directory

  • Possible attributes: Container
  • Description: A directory in the Hadoop file system.
  • Display name: The path of the directory.
  • Supported metadata fields: (none)
  • Child identifier types: Directory, File

File

  • Possible attributes: Document
  • Description: A file in the Hadoop file system.
  • Display name: The file name.
  • Supported metadata fields: modifiedDate, sizeBytes
  • Child identifier types: (none)