StripQueryParametersRegex
A regular expression to match query parameters that you want to remove from URLs.
You can set this parameter to prevent the connector from ingesting multiple documents for the same content. For example, when crawling a website, the connector might find the following links:
http://www.example.com/page.php?parameter=something
http://www.example.com/page.php?parameter=something-else
The connector must ingest these as separate pages, because the web server could use the query parameter value to customize the page content. However, if the query parameter does not affect the page content, you could set StripQueryParametersRegex=parameter so that the connector ingests a single document for http://www.example.com/page.php.
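The effect of the setting can be illustrated with a short Python sketch. This is not the connector's implementation. It assumes that the regular expression is tested against each query parameter name and must match the whole name, which may differ from the connector's actual matching rules. The function name strip_query_parameters is hypothetical and chosen only for this illustration.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_parameters(url, pattern):
    """Return url without the query parameters whose names match pattern.

    Assumption: the pattern must match the entire parameter name.
    The connector itself may apply the expression differently.
    """
    name_re = re.compile(pattern)
    parts = urlsplit(url)
    # Keep only the name/value pairs whose name does not match the pattern.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name_re.fullmatch(name)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


links = [
    "http://www.example.com/page.php?parameter=something",
    "http://www.example.com/page.php?parameter=something-else",
]

# Both links reduce to the same URL, so only one document would be ingested.
unique_urls = {strip_query_parameters(link, "parameter") for link in links}
```

Under these assumptions, unique_urls contains a single entry for the two example links. Parameters whose names do not match the expression are left in place, so pages that genuinely depend on a query parameter remain distinct.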
Type: | Regular expression |
Default: | |
Required: | No |
Configuration Section: | TaskName or FetchTasks |
Example: | StripQueryParametersRegex=parameter|ignore |
See Also: | StripFragmentIdentifiers |