Detect Errors

The IDOL Web Connector makes a request to a web server for each of the pages that you want to process. The response to each request includes an HTTP status code, usually "200", which indicates that the request was successful.

If a problem occurs, the web server should provide an appropriate HTTP status code to the connector, for example "404" if the requested page is not found or "500" if there is a server error.

However, some web servers return an HTTP 200 status code even when there is a problem, and report the failure only in the content of the page, for example as an error message. If you are indexing a web site that behaves in this way, you can configure error detection so that the connector can take appropriate action.
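
To illustrate why the status code alone is not enough, here is a minimal Python sketch. The function name `classify_by_status` and the sample response are hypothetical and are not part of the connector; they only show how a page that reports "200" with an error message in its body passes a status-only check.

```python
def classify_by_status(status_code):
    """Classify a response using only its HTTP status code."""
    if 200 <= status_code < 300:
        return "ok"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    return "other"


# Hypothetical "soft" error: the server reports success,
# but the page body describes a failure.
status_code = 200
body = "<h1>Page not found</h1>"

# A status-only check cannot see the message in the body.
result = classify_by_status(status_code)
```

Here `result` is "ok" even though the body contains an error message, which is the situation that error detection is intended to handle.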

To detect error messages

  1. Stop the connector and open the configuration file.
  2. Modify your fetch task by adding the parameter ErrorPageSections. This parameter specifies the names of the sections in the connector's configuration file that contain settings for error detection. For example:

    [MyTask0]
    ...
    ErrorPageSections0=LoginError

  3. Create a new section in the configuration file, using the name you specified in the previous step, and set the following parameters:

    ErrorPageUrlRegex: A Perl-compatible regular expression to match the full URLs of the pages on which you want to run error detection.
    ErrorPageCssSelector: A CSS selector that identifies an element that must exist on the page, for the connector to detect an error. If you do not set this parameter, the connector detects an error for any page that matches ErrorPageUrlRegex.
    ErrorPageAttribute: The name of an attribute that must exist on the element specified by ErrorPageCssSelector, for the connector to detect an error.
    ErrorPageRegex: A Perl-compatible regular expression that the value of the selected element or attribute must match, for the connector to detect an error.
    ErrorPageHttpErrorCode: An HTTP status code that specifies how the connector behaves when an error is detected. For example, if you set this parameter to 401, the connector behaves as if there is an authentication error and the synchronize task (correctly) fails.

    For example, the following configuration detects errors on any page named login.html. The connector detects an error when there is an element in the page with an ID of authresult, and the element contains the text failed.

    [LoginError]
    ErrorPageUrlRegex=.*/login\.html
    ErrorPageCssSelector=#authresult
    ErrorPageRegex=.*failed.*
    ErrorPageHttpErrorCode=401

    One example of a matching element would be:

    <h1 id="authresult">Login failed</h1>

    For more information about the configuration parameters you can use to detect and handle errors, refer to the Web Connector Reference.

  4. Save and close the configuration file.
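
The matching rules described in step 3 can be approximated in Python. This is an illustrative sketch only, not the connector's implementation. It makes several assumptions: the function `detect_error`, the helper class `_IdCollector`, and the dictionary `LOGIN_ERROR` are hypothetical names; only simple `#id` selectors are supported; and regular expressions are assumed to match the whole value, using Python's `re` module rather than a Perl-compatible engine.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _IdCollector(HTMLParser):
    """Records the attributes and text of the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.attrs = None   # attributes of the matched element, once found
        self.text = ""      # text found inside the matched element
        self._depth = 0     # greater than 0 while inside the matched element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag not in VOID_TAGS:
                self._depth += 1
        elif self.attrs is None and dict(attrs).get("id") == self.element_id:
            self.attrs = dict(attrs)
            if tag not in VOID_TAGS:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def detect_error(url, html, section):
    """Return the configured status code if the page counts as an error, else None."""
    if not re.fullmatch(section["ErrorPageUrlRegex"], url):
        return None  # error detection does not run on this page
    selector = section.get("ErrorPageCssSelector")
    if selector is not None:
        if not selector.startswith("#"):
            raise ValueError("this sketch only supports '#id' selectors")
        collector = _IdCollector(selector[1:])
        collector.feed(html)
        collector.close()
        if collector.attrs is None:
            return None  # the required element is not on the page
        attribute = section.get("ErrorPageAttribute")
        if attribute is None:
            value = collector.text
        elif attribute in collector.attrs:
            value = collector.attrs[attribute] or ""
        else:
            return None  # the required attribute is missing
        pattern = section.get("ErrorPageRegex")
        # Assumption: the pattern must match the entire value.
        if pattern is not None and not re.fullmatch(pattern, value, re.DOTALL):
            return None
    # With no selector set, any page whose URL matches is treated as an error.
    return int(section["ErrorPageHttpErrorCode"])


# Mirrors the [LoginError] example; the dot is escaped to match a literal ".".
LOGIN_ERROR = {
    "ErrorPageUrlRegex": r".*/login\.html",
    "ErrorPageCssSelector": "#authresult",
    "ErrorPageRegex": ".*failed.*",
    "ErrorPageHttpErrorCode": "401",
}

page = '<html><body><h1 id="authresult">Login failed</h1></body></html>'
code = detect_error("https://example.com/app/login.html", page, LOGIN_ERROR)
```

With the sample page, `code` is 401. A page at the same URL whose `authresult` element does not contain the text failed, or a page at a URL that does not match ErrorPageUrlRegex, yields `None`.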