The identifiers fetch action retrieves a list of the documents that are present in the repository and generates an identifier for each document. Front-end applications might use this action to provide an interface for browsing a repository.
The action returns identifiers for only those documents that meet the requirements defined in the task configuration. For example, if you use configuration parameters to exclude certain Web pages, the connector does not return identifiers for those Web pages.
Tip: The identifiers fetch action does not expand container files and does not provide identifiers for sub-files.
Type: Asynchronous
Parameter Name | Description | Required |
---|---|---|
Config | A base-64 encoded configuration. The configuration parameters that are set override the same parameters in the connector's configuration file. | No |
ConfigSection | The name of the configuration file section that contains the task settings. | Yes |
ContainersOnly | A Boolean value (default false) that specifies whether to return only those items that represent containers. | No |
IdentifiersAction | The name of an action to perform on the returned identifiers. Currently only the collect fetch action is available. If the action you specify requires additional parameters, specify them as parameters to this action. | No |
ParentIdentifiers | A comma-separated list of identifiers. The action returns identifiers for items that exist below these identifiers in the repository. To specify the root of the repository, set this parameter to ROOT. | Yes |
ShowAttributes | A Boolean value (default true) that specifies whether to show attributes in the response. The possible attributes are determined by the type of repository that you are connecting to. | No |
ShowNames | A Boolean value (default true) that specifies whether the response shows the URL for each item. | No |
ShowTypes | A Boolean value (default true) that specifies whether the response shows the type of item that each identifier represents. The possible types are determined by the type of repository that you are connecting to. | No |
Override_Config_Parameters | Any other action parameters that you set override settings in the connector's configuration file. For example: /action=fetch&fetchaction=... | No |
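The Config parameter expects a base-64 encoded configuration. The following Python sketch shows one way to produce such a value. It assumes standard (not URL-safe) base-64 and UTF-8 text, neither of which is stated above, and the `Url` setting in the sample section is a hypothetical parameter used only for illustration.

```python
import base64


def encode_config(config_text: str) -> str:
    """Return config_text as a base-64 string (standard alphabet assumed)."""
    return base64.b64encode(config_text.encode("utf-8")).decode("ascii")


# Hypothetical task section; the parameter name "Url" is an assumption.
config_text = "[MyTask]\nUrl=http://www.example.com\n"
encoded = encode_config(config_text)
print(encoded)
```

Standard base-64 output can contain `+`, `/`, and `=`, so the encoded value would probably need percent-encoding before it is placed in a request URL.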
The following example sends the identifiers fetch action to the connector. The connector returns the items that it finds by crawling the site from the URL specified in the task configuration:
http://localhost:1234/action=Fetch&FetchAction=Identifiers&ConfigSection=MyTask&ParentIdentifiers=ROOT
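A request like the one above can also be assembled programmatically. The sketch below is a minimal illustration, not part of the connector: the function name `build_identifiers_url` is hypothetical, and the host and port are the example values from this page.

```python
from urllib.parse import urlencode


def build_identifiers_url(host, port, config_section, parent_identifiers, **extra):
    """Build the request URL for the identifiers fetch action.

    parent_identifiers is a list that is joined into the comma-separated
    form that the ParentIdentifiers parameter expects.
    """
    params = {
        "FetchAction": "Identifiers",
        "ConfigSection": config_section,
        "ParentIdentifiers": ",".join(parent_identifiers),
    }
    # Optional parameters such as ShowAttributes or ContainersOnly.
    params.update(extra)
    # safe="," keeps the separating commas literal in the query string.
    return f"http://{host}:{port}/action=Fetch&{urlencode(params, safe=',')}"


url = build_identifiers_url("localhost", 1234, "MyTask", ["ROOT"])
print(url)
```

Optional parameters from the table can be passed as keyword arguments, for example `build_identifiers_url("localhost", 1234, "MyTask", ["ROOT"], ShowAttributes="false")`.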
The fetch action is asynchronous, so it returns a token:
<autnresponse>
  <action>FETCH</action>
  <response>SUCCESS</response>
  <responsedata>
    <token>MTYuMjguOTQuMTcyOjEyMzQ6RkVUQ0g6MTQzOTgyOTY0OTQ2NjAxMjY4ODQ1NTM2</token>
  </responsedata>
</autnresponse>
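A client typically needs to pull the token out of this response before it can check on the task. The following sketch does that with Python's standard XML parser. It assumes the response has exactly the element names shown above, with no XML namespace prefixes, and `extract_token` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def extract_token(response_xml: str) -> str:
    """Return the token from a fetch action response, or raise on failure."""
    root = ET.fromstring(response_xml)
    status = root.findtext("response")
    if status != "SUCCESS":
        raise RuntimeError(f"fetch action did not succeed: {status!r}")
    token = root.findtext("responsedata/token")
    if token is None:
        raise RuntimeError("response does not contain a token")
    # Tokens may be surrounded by whitespace in the XML.
    return token.strip()


sample = (
    "<autnresponse><action>FETCH</action><response>SUCCESS</response>"
    "<responsedata><token> "
    "MTYuMjguOTQuMTcyOjEyMzQ6RkVUQ0g6MTQzOTgyOTY0OTQ2NjAxMjY4ODQ1NTM2"
    " </token></responsedata></autnresponse>"
)
print(extract_token(sample))
```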
You can use the token with the QueueInfo action to retrieve the response:
<autnresponse>
  <action>QUEUEINFO</action>
  <response>SUCCESS</response>
  <responsedata>
    <actions>
      <action>
        <status>Finished</status>
        <queued_time>2015-Aug-17 17:40:49</queued_time>
        <time_in_queue>1</time_in_queue>
        <process_start_time>2015-Aug-17 17:40:50</process_start_time>
        <time_processing>2</time_processing>
        <process_end_time>2015-Aug-17 17:40:52</process_end_time>
        <documentcounts>
          <documentcount task="MyTask" errors="0" seen="108" />
        </documentcounts>
        <fetchaction>IDENTIFIERS</fetchaction>
        <identifiers parent_identifier="[identifier1]">
          <identifier name="http://hp.com/example/012345" type="Page" attributes="container,document">[identifier2]</identifier>
          <identifier name="http://hp.com/example/012346" type="Page" attributes="container,document">[identifier3]</identifier>
          <identifier name="http://hp.com/example/012347" type="Page" attributes="container,document">[identifier4]</identifier>
          <identifier name="http://hp.com/example/012348" type="Page" attributes="container,document">[identifier5]</identifier>
          <identifier name="http://hp.com/example/012349" type="Page" attributes="container,document">[identifier6]</identifier>
          ...
        </identifiers>
        <token>MTYuMjguOTQuMTcyOjEyMzQ6RkVUQ0g6MTQzOTgyOTY0OTQ2NjAxMjY4ODQ1NTM2</token>
      </action>
    </actions>
  </responsedata>
</autnresponse>
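A front-end application that browses the repository would need to turn this response into a list of items. The sketch below shows one possible approach with Python's standard library. It assumes the element and attribute names shown in the response above, without namespace prefixes, and `parse_identifiers` is a hypothetical helper name. The sample is a shortened version of the response.

```python
import xml.etree.ElementTree as ET


def parse_identifiers(queueinfo_xml: str) -> list:
    """Return one dict per <identifier> element in a QueueInfo response."""
    root = ET.fromstring(queueinfo_xml)
    items = []
    for group in root.iter("identifiers"):
        parent = group.get("parent_identifier")
        for node in group.findall("identifier"):
            # attributes is a comma-separated list; it is absent when
            # ShowAttributes=false, so default to an empty string.
            attrs = node.get("attributes", "")
            items.append({
                "identifier": (node.text or "").strip(),
                "parent": parent,
                "name": node.get("name"),
                "type": node.get("type"),
                "attributes": [a for a in attrs.split(",") if a],
            })
    return items


sample = """
<autnresponse><action>QUEUEINFO</action><response>SUCCESS</response>
<responsedata><actions><action><status>Finished</status>
<identifiers parent_identifier="[identifier1]">
  <identifier name="http://hp.com/example/012345" type="Page"
              attributes="container,document">[identifier2]</identifier>
  <identifier name="http://hp.com/example/012346" type="Page"
              attributes="container,document">[identifier3]</identifier>
</identifiers>
</action></actions></responsedata></autnresponse>
"""

items = parse_identifiers(sample)
for item in items:
    print(item["identifier"], item["name"], item["type"], item["attributes"])
```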
The following information is available for each item:
- The parent_identifier attribute specifies the identifier of the parent item. For example, a page might be crawled by following a link from the index page of the site. In this case the parent identifier would be the identifier of the index page.
- The name attribute shows the URL of the page.
- The type attribute shows the type of item that the identifier represents:
  - Canonical Link - the item is a page but it contains a canonical link to another page. If IgnoreCanonicalLinks=False, the connector does not crawl the page for links or ingest the page.
  - Page - the item is a Web page.
- The attributes attribute contains a comma-separated list of attributes for the page:
  - A page with the container attribute would be crawled by the (synchronize) fetch task for links to other pages. If the page contains a canonical link to another page (type="Canonical Link"), the connector only follows the canonical link.
  - A page with the document attribute would be ingested by the (synchronize) fetch task.
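The meaning of the type and attributes values can be summarized in a few lines of code. This is only a sketch of the rules described above, not connector logic: the function name `describe_item` and the keys of the returned dictionary are hypothetical.

```python
def describe_item(item_type: str, attributes: str) -> dict:
    """Summarize what the synchronize fetch task would do with an item.

    item_type is the value of the type attribute and attributes is the
    comma-separated value of the attributes attribute.
    """
    attrs = {a.strip() for a in attributes.split(",") if a.strip()}
    is_container = "container" in attrs
    return {
        # Pages with the container attribute are crawled for links.
        "crawled_for_links": is_container,
        # Pages with the document attribute are ingested.
        "ingested": "document" in attrs,
        # For a canonical-link page, only the canonical link is followed.
        "follows_only_canonical_link": is_container and item_type == "Canonical Link",
    }


print(describe_item("Page", "container,document"))
print(describe_item("Canonical Link", "container"))
```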