Web sites might have pages, such as RSS feeds, that contain XML rather than HTML. This section describes how you can use HPE Web Connector to process XML pages.
HPE Web Connector identifies XML pages by the MIME type returned in the Content-Type header of the response from the web server.
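As an illustration only, the following Python sketch shows one way a crawler might classify a response as XML from its Content-Type header. The set of MIME types that HPE Web Connector actually treats as XML is not stated here. The rule below (text/xml, application/xml, or any type ending in +xml, such as application/rss+xml) is an assumption, and the function names are hypothetical.

```python
# Hypothetical sketch: classify a response as XML from its Content-Type header.
# ASSUMPTION: the MIME types below approximate what the connector treats as XML;
# this is not the connector's actual rule.
XML_MIME_TYPES = {"application/xml", "text/xml"}


def mime_type(content_type: str) -> str:
    """Return the bare MIME type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def is_xml_content_type(content_type: str) -> bool:
    """True for text/xml, application/xml, and any '+xml' structured type."""
    mt = mime_type(content_type)
    return mt in XML_MIME_TYPES or mt.endswith("+xml")


if __name__ == "__main__":
    for header in ("application/rss+xml; charset=UTF-8",
                   "text/xml",
                   "text/html; charset=utf-8"):
        print(header, "->", is_xml_content_type(header))
```

Under this assumed rule, application/xhtml+xml also counts as XML. A real implementation might choose to treat XHTML as HTML instead.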
XML pages can include a processing instruction that tells a web browser to apply an XSL transformation to the page. Sometimes the transformation converts the XML into HTML. The processing instruction looks similar to this:
<?xml-stylesheet type="text/xsl" href="transform_to_html.xsl"?>
If the Content-Type header indicates that the page contains XML but no XSL transformation is provided, the connector ingests the page as an XML document.
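If the header indicates that the page contains XML and an XSL transformation is provided, the connector applies the transformation and processes the result as if it were HTML. This means that the connector can, for example, extract and follow the links in the transformed page, as it would for an ordinary HTML page.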
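The decision described above can be sketched in Python with the standard library's xml.dom.minidom, which keeps top-level processing instructions as nodes of the document. This is a hypothetical illustration, not the connector's implementation: the function names are made up, and only the text/xsl type from the example processing instruction is treated as an XSL transformation.

```python
import re
from xml.dom import minidom

# Pseudo-attributes inside <?xml-stylesheet ...?>, e.g. type="text/xsl" href="...".
_PSEUDO_ATTR = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_xsl_stylesheet(xml_text: str):
    """Return the href of the first text/xsl xml-stylesheet instruction, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {m.group(1): (m.group(2) if m.group(2) is not None else m.group(3))
                     for m in _PSEUDO_ATTR.finditer(node.data)}
            # ASSUMPTION: only type="text/xsl" is treated as an XSL transformation.
            if attrs.get("type") == "text/xsl" and "href" in attrs:
                return attrs["href"]
    return None


def processing_mode(xml_text: str) -> str:
    """XSL instruction present -> process as HTML; otherwise ingest as XML."""
    return "html" if find_xsl_stylesheet(xml_text) else "xml"
```

The sketch only detects the stylesheet. Applying it requires an XSLT processor, which the Python standard library does not include (the third-party lxml package provides lxml.etree.XSLT).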
As a result, you can use the HPE Web Connector to process an RSS feed that is transformed into HTML. The Web Connector, like the RSS Connector, retrieves the content contained in the feed, such as page titles and summaries. In addition, the Web Connector can follow the links contained in the feed and ingest the content on the associated pages.
In the Web Connector task configuration, specify the URL of the feed using the Url parameter and set Depth=1. Setting Depth=1 ensures that the connector follows the links from the RSS feed, but does not follow any links that are extracted from the associated web pages:
[FetchTasks]
Number=1
0=RssFeed

[RssFeed]
Url=http://www.example-news-website.com/feed/rss.xml
Depth=1
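To make the effect of Depth=1 concrete, here is a minimal Python model of a depth-limited, breadth-first crawl over a made-up link graph. It assumes that the feed URL itself is at depth 0, which matches the behavior described above, but it is a sketch of the idea rather than the connector's crawler. The LINKS table, the page URLs other than the feed, and the crawl function are hypothetical.

```python
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches: each URL maps to the
# links that would be extracted from that page. Story URLs are made up.
FEED = "http://www.example-news-website.com/feed/rss.xml"
LINKS = {
    FEED: ["http://www.example-news-website.com/story-1",
           "http://www.example-news-website.com/story-2"],
    "http://www.example-news-website.com/story-1":
        ["http://www.example-news-website.com/related-a"],
    "http://www.example-news-website.com/story-2": [],
}


def crawl(start_url: str, max_depth: int) -> list:
    """Breadth-first crawl that retrieves pages at most max_depth links from start_url."""
    fetched, seen = [], {start_url}
    queue = deque([(start_url, 0)])          # ASSUMPTION: the start URL is depth 0
    while queue:
        url, depth = queue.popleft()
        fetched.append(url)                  # the page is retrieved and ingested
        if depth == max_depth:
            continue                         # links on this page are not followed
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched
```

With max_depth=1, crawl(FEED, 1) returns the feed and the two story pages, but not related-a, which is linked only from a story page.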
When the HPE Web Connector applies an XSL transformation to an XML document and logging is configured with LogLevel=Full, you will see the following messages in the synchronize log:
WKOOP:Applying XSL transfrom
WKOOP:XSL Transform applied to XML document
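If you want to check for these messages programmatically, a small Python helper like the following could count them in the lines of a synchronize log. The helper name is hypothetical. The marker strings are copied verbatim from the messages above, including the log's own spelling "transfrom". Because the exact prefix of each log line is not described here, the sketch uses a substring match.

```python
# Marker strings copied verbatim from the synchronize log messages.
APPLYING = "WKOOP:Applying XSL transfrom"
APPLIED = "WKOOP:XSL Transform applied to XML document"


def xsl_events(log_lines):
    """Count the 'applying' and 'applied' messages in an iterable of log lines.

    Hypothetical helper: uses substring matching because the log line prefix
    (timestamps, thread IDs, and so on) is not assumed.
    """
    counts = {"applying": 0, "applied": 0}
    for line in log_lines:
        if APPLYING in line:
            counts["applying"] += 1
        if APPLIED in line:
            counts["applied"] += 1
    return counts
```

Because iterating over an open text file yields its lines, you can pass an open log file object directly to xsl_events.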