Retrieve Documents from Drupal

To retrieve information from a Drupal content management system, create a new fetch task by following these steps. The connector runs fetch tasks automatically, based on the schedule that is configured in the configuration file.

To create a new fetch task

  1. Stop the connector.
  2. Open the configuration file in a text editor.
  3. In the [FetchTasks] section of the configuration file, specify the number of fetch tasks using the Number parameter. If you are configuring the first fetch task, type Number=1. If one or more fetch tasks have already been configured, increase the value of the Number parameter by one (1). Below the Number parameter, specify the names of the fetch tasks, starting from zero (0). For example:

    [FetchTasks]
    Number=1
    0=MyTask
  4. Below the [FetchTasks] section, create a new TaskName section. The name of the section must match the name of the new fetch task. For example:

    [FetchTasks]
    Number=1
    0=MyTask
    
    [MyTask]
  5. In the new section, set the following configuration parameters:

    DrupalHost The machine that hosts the Drupal content management system.
    ViewUris A list of view URIs that the connector can use to list top-level entities. For more information, see Configure Drupal for the Connector.
    BasicUsername The user name to use to authenticate with the Drupal API.
    BasicPassword The password to use to authenticate with the Drupal API. For information about how to encrypt the password before entering it into the configuration file, see Encrypt Passwords.
    EntitySections A comma-separated list of configuration file sections that define the entity types to retrieve.
  6. Create a section for each of the entity types that you want to retrieve, and set the following configuration parameters in each section:

    Name The name of the entity type. For example, node, media, or user.
    IdAttribute The path to, or name of, the attribute in the response from the Drupal REST API that contains the entity's unique ID number.
    ContentAttribute The path to, or name of, the attribute in the response from the Drupal REST API that contains the entity's content. This information becomes the content of the ingested document.
    ContentAttributeIsUrl If the JSON attribute identified by ContentAttribute contains a URL to a file, set this parameter to true.
    ScourEntities A comma-separated list of the entity types to retrieve by following a link from an entity that is being processed. The connector can follow links from the entities that it retrieves to entities of different types, and to other entities of the same type. Any object in a JSON response that contains only the attributes 'id', 'resource', and 'url' is treated as a link to another entity. The entity names that you specify must correspond to the 'resource' attribute in the response from the Drupal API. You can use the wildcard * to specify all entity types. You can also set this parameter in the TaskName section, to provide a default value for all of your entity definitions.

    For example:

    [MyTask]
    BasicUsername=user
    BasicPassword=password
    DrupalHost=https://drupal.example.com
    ViewUris=/views/all_content
    EntitySections=ENTITY_NODE,ENTITY_MEDIA,ENTITY_USER
    ScourEntities=*
    
    [ENTITY_NODE]
    Name=node
    IdAttribute=nid
    ContentAttribute=body/value
    
    [ENTITY_USER]
    Name=user
    IdAttribute=uid
    
    [ENTITY_MEDIA]
    Name=media
    IdAttribute=mid
    ContentAttribute=field_media_image/url
    ContentAttributeIsUrl=true
    
  7. Save and close the configuration file.
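
The numbering and naming rules in the steps above can be checked after you edit the configuration file. The following Python sketch is not part of the connector. It assumes that the configuration file can be read by Python's configparser module (a real file might contain syntax that configparser rejects), and the names check_fetch_tasks and SAMPLE_CONFIG are hypothetical, introduced only for illustration. It checks that every task index below Number has a name, that each task has a matching section, and that every entry in EntitySections points to a section that sets Name.

```python
import configparser

# Sample taken from the example in this topic (written flush left, because
# configparser treats indented lines as continuations of the previous value).
SAMPLE_CONFIG = """
[FetchTasks]
Number=1
0=MyTask

[MyTask]
BasicUsername=user
BasicPassword=password
DrupalHost=https://drupal.example.com
ViewUris=/views/all_content
EntitySections=ENTITY_NODE,ENTITY_MEDIA,ENTITY_USER
ScourEntities=*

[ENTITY_NODE]
Name=node
IdAttribute=nid
ContentAttribute=body/value

[ENTITY_USER]
Name=user
IdAttribute=uid

[ENTITY_MEDIA]
Name=media
IdAttribute=mid
ContentAttribute=field_media_image/url
ContentAttributeIsUrl=true
"""


def check_fetch_tasks(config_text):
    """Return a list of problems found in the fetch task configuration."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)

    if not parser.has_section("FetchTasks"):
        return ["missing [FetchTasks] section"]

    problems = []
    tasks = parser["FetchTasks"]
    number = tasks.getint("Number", fallback=0)
    for index in range(number):  # task names are numbered from zero
        name = tasks.get(str(index))
        if name is None:
            problems.append(f"[FetchTasks] has no entry {index}=")
            continue
        if not parser.has_section(name):
            problems.append(f"no [{name}] section for task {index}")
            continue
        listed = parser[name].get("EntitySections", "")
        for entity in filter(None, (s.strip() for s in listed.split(","))):
            if not parser.has_section(entity):
                problems.append(f"[{name}] lists missing section [{entity}]")
            elif "Name" not in parser[entity]:
                problems.append(f"[{entity}] does not set Name")
    return problems


if __name__ == "__main__":
    for problem in check_fetch_tasks(SAMPLE_CONFIG):
        print(problem)
```

An empty result means that the sketch found no inconsistencies. It does not confirm that the connector accepts the file, or that the parameter values are valid for your Drupal system.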
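
The link rule described for ScourEntities can be illustrated as follows. This Python sketch is only an approximation of the documented behavior, not the connector's implementation. It assumes that an entity name corresponds to the 'resource' attribute by exact match, and the function name find_links and the sample response are hypothetical.

```python
import json

# Per the ScourEntities description, a link is an object that contains
# only these three attributes.
LINK_KEYS = {"id", "resource", "url"}


def find_links(value, scour_entities):
    """Yield link objects found anywhere in a decoded JSON value.

    scour_entities is the parsed ScourEntities list, for example
    ["node", "user"], or ["*"] for all entity types.
    Assumption: names are compared with 'resource' by exact match.
    """
    if isinstance(value, dict):
        if set(value) == LINK_KEYS:
            if "*" in scour_entities or value["resource"] in scour_entities:
                yield value
            return  # a link object has no nested content to inspect
        for child in value.values():
            yield from find_links(child, scour_entities)
    elif isinstance(value, list):
        for child in value:
            yield from find_links(child, scour_entities)


# Hypothetical response; the real attributes depend on your Drupal views.
SAMPLE_RESPONSE = json.loads("""
{
  "nid": 12,
  "author": {"id": 3, "resource": "user", "url": "https://drupal.example.com/user/3"},
  "image": {"id": 7, "resource": "media", "url": "https://drupal.example.com/media/7"},
  "body": {"value": "Hello", "url": "https://drupal.example.com/node/12"}
}
""")

if __name__ == "__main__":
    for link in find_links(SAMPLE_RESPONSE, ["*"]):
        print(link["resource"], link["id"])
```

In the sample, the "body" object is not treated as a link because it does not contain exactly the three link attributes. With ScourEntities=user only the "author" link would be selected, and with ScourEntities=* both "author" and "image" would be.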