Render an Image, Thumbnail, or PDF for Ingested Pages

The HPE Web Connector can render an image, thumbnail image, or PDF of each Web page that it ingests.

By default, when you configure the connector to create one or more of these files, each file is indexed as the content of a separate document, alongside the indexed Web page. Alternatively, you can configure the connector to write the files to a folder.

If you write rendered images, PDF files, and thumbnails to a folder, the connector adds metadata fields to each document that contain the paths of the associated files. The fields are named:

When the connector sends ingest-removes for deleted documents, it also deletes any rendered images, PDF files, and thumbnails associated with those documents.

To render an image, thumbnail, or PDF for each ingested page

  1. Stop the connector and open the configuration file.
  2. Modify your fetch task by adding the following parameters:

    CreateImageRendition: To render an image for each ingested page, set this parameter to true.
    CreateThumbnailRendition: To render a thumbnail image for each ingested page, set this parameter to true.
    CreatePDFRendition: To render a PDF copy of each ingested page, set this parameter to true.
    RenditionsFilePath: The path of the folder to write rendered images, PDF files, and thumbnails to. The folder must already exist, and the user running the connector must have permission to write files to the folder. If you do not set this parameter, the connector indexes each file as the content of a separate document.
    FullPageRender: Specifies whether images and thumbnail images show the full page (true) or only the top part of the page that you would see when viewing the page in a web browser (false).
    RenditionImageFormat: The image format for images and thumbnail images.
    RenditionImageQuality: The image quality for JPEG and PNG images and thumbnail images. Specify an integer value from 0 to 100, where lower values represent higher compression (usually resulting in a smaller file size), and higher values represent higher quality.
    ThumbnailRenditionWidth: The maximum width for thumbnail images, in pixels.
    ThumbnailRenditionHeight: The maximum height for thumbnail images, in pixels.
  3. Save and close the configuration file.
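
As a rough sketch of step 2, a fetch task that renders both a full-page image and a PDF copy of each page might look like the following. The section name and URL are placeholders, the quality value of 80 is an arbitrary illustration, and the // comment lines assume that the connector accepts that comment style in its configuration file. Because RenditionsFilePath is not set, each rendered file would be indexed as the content of a separate document.

```ini
// Hypothetical fetch task; the section name and URL are placeholders.
[MyTask]
Url=http://www.example.com
CreateImageRendition=true
CreatePDFRendition=true
FullPageRender=true
RenditionImageFormat=png
// Arbitrary illustrative value within the documented 0 to 100 range.
RenditionImageQuality=80
```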

Example

The following example renders a thumbnail for each page as a PNG image that has a maximum width of 350 pixels and shows only the top part of the page:

[MyTask]
Url=http://www.autonomy.com
...
CreateThumbnailRendition=true
RenditionImageFormat=png
ThumbnailRenditionWidth=350
FullPageRender=FALSE
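
A variation on this example, sketched under the assumption that a folder for rendered files already exists and is writable by the user running the connector, writes the thumbnails to that folder instead of indexing them as separate documents. The folder path and the height limit of 200 pixels are hypothetical, and the // comment line assumes that the connector accepts that comment style.

```ini
[MyTask]
Url=http://www.autonomy.com
...
CreateThumbnailRendition=true
RenditionImageFormat=png
ThumbnailRenditionWidth=350
ThumbnailRenditionHeight=200
FullPageRender=false
// Hypothetical path; the folder must already exist and be writable.
RenditionsFilePath=C:\WebConnector\Renditions
```

With a configuration like this, each indexed page document would carry a metadata field containing the path of its thumbnail, and the connector would delete the thumbnail when it sends an ingest-remove for the page.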
