SitemapFile

The path to a plain text file that contains a list of pages to ingest. The file must contain one URL on each line.
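For illustration, a file in this format might look like the following sketch. The example.com addresses are placeholders chosen for this example, not values from the product.

```
https://www.example.com/
https://www.example.com/products/overview.html
https://www.example.com/docs/getting-started.html
```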

The connector ingests only the pages that are listed in the file; it does not crawl the web by following links. To filter the URLs contained in the file, you can set further parameters, including UrlCantHaveRegex and UrlMustHaveRegex. By default the connector respects the robots exclusion protocol (robots.txt); you can change this behavior by setting the parameter FollowRobotProtocol.
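As a rough sketch of how these parameters might be combined, the following fragment reads pages from a list file and keeps only URLs that contain a /docs/ path segment. The task name MyTask0 and the regular expression are hypothetical, and the [FetchTasks] layout is an assumption about how the task section is declared. Check the reference for UrlMustHaveRegex to confirm how the expression is matched against each URL.

```
// Hypothetical task definition; the section names are assumptions.
[FetchTasks]
Number=1
0=MyTask0

[MyTask0]
// Ingest only the pages listed in this file (one URL per line).
SitemapFile=my-list-of-urls.txt
// Illustrative filter: keep URLs that include /docs/ in the path.
UrlMustHaveRegex=.*/docs/.*
```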

The file can be generated manually or by an external process, and can be updated. If a URL is removed from the file, the connector sends an ingest-delete for that page on the next synchronize cycle.

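TIP: Web Connector can retrieve information in one of the following ways:

Crawl a site by following links from one or more starting URLs. Set the parameter Url.

Ingest the pages that are listed in a sitemap. Set the parameter SitemapUrl.

Ingest the pages that are listed in a plain text file. Set the parameter SitemapFile.

In each case the other parameters are ignored. If you set more than one of them, SitemapUrl has precedence, followed by SitemapFile, followed by Url.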

Type: String
Default:  
Required: You must set Url, SitemapUrl, or SitemapFile
Configuration Section: TaskName or FetchTasks
Example: SitemapFile=my-list-of-urls.txt
See Also:

Url

SitemapUrl