Web Connector

The Web Connector is an IDOL connector that retrieves information from the World Wide Web. The connector can crawl the Web by following the links that exist on each page, or retrieve the resources listed in a site map.

The Web Connector uses an embedded browser to process Web pages and therefore provides several advantages over the HTTP Connector:

  • The Web Connector supports JavaScript and dynamic pages. If a page uses JavaScript, the scripts run before the connector crawls the page for links and ingests it. As a result, the connector can crawl links generated by scripts, and the ingested page is the final source after all scripts have run.
  • The Web Connector can interpret the structure of a page, as defined by the HTML markup, in addition to reading the content. As a result, when configuring the connector to extract links or submit forms, you can identify page elements using CSS selectors rather than regular expressions.