Retrieve Information from Hadoop

To automatically retrieve content from Hadoop, create a new fetch task by following these steps. The connector runs each fetch task automatically, based on the schedule that is configured in the configuration file.

To create a new fetch task

  1. Stop the connector.
  2. Open the configuration file in a text editor.
  3. In the [FetchTasks] section of the configuration file, specify the number of fetch tasks using the Number parameter. If you are configuring the first fetch task, type Number=1. If one or more fetch tasks have already been configured, increase the value of the Number parameter by one (1). Below the Number parameter, specify the names of the fetch tasks, starting from zero (0). For example:

    [FetchTasks]
    Number=1
    0=MyTask
  4. Below the [FetchTasks] section, create a new section for the fetch task. The section name must match the task name that you specified in the [FetchTasks] section. For example:

    [FetchTasks]
    Number=1
    0=MyTask
    
    [MyTask]
  5. In the new section, set the following parameters:

    FileSystemRootUri    The root URI of the file system to connect to.
    FileSystemPath       The path in the file system at which to start crawling for files.

    For example:

    [MyTask]
    FileSystemRootUri=hdfs://hadoop:8020/
    FileSystemPath=/home/hadoop/files
  6. (Optional) Set additional parameters to specify which files the connector retrieves. For more information about the available parameters, refer to the Hadoop Connector (CFS) Reference.

  7. Save and close the configuration file. You can now start the connector.
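
As an illustration of how the Number parameter and the zero-based task list grow when another task is added, a configuration with two fetch tasks might look like the following sketch. The second task name (MySecondTask) and its path (/home/hadoop/archive) are hypothetical values chosen for this example, and the sketch assumes that both tasks crawl the same file system:

    [FetchTasks]
    Number=2
    0=MyTask
    1=MySecondTask

    [MyTask]
    FileSystemRootUri=hdfs://hadoop:8020/
    FileSystemPath=/home/hadoop/files

    [MySecondTask]
    FileSystemRootUri=hdfs://hadoop:8020/
    FileSystemPath=/home/hadoop/archive

Here Number equals the count of listed tasks, the list indexes run from 0 to Number minus one, and each listed name has a section of the same name.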