StripQueryParametersRegex

A regular expression to match query parameters that you want to remove from URLs.

You can set this parameter to prevent the connector from ingesting multiple documents for the same content. For example, when crawling a website, the connector might find the following links:

http://www.example.com/page.php?parameter=something
http://www.example.com/page.php?parameter=something-else

Unless you configure it otherwise, the connector must ingest these as separate pages, because the web server could use the query parameter value to customize the page content. However, if the query parameter does not affect the page content, you could set StripQueryParametersRegex=parameter. The connector then removes the parameter from both links and ingests a single document for http://www.example.com/page.php.
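The effect of stripping can be illustrated with a short Python sketch. This is not the connector's implementation. The function name strip_query_parameters is hypothetical, and the sketch assumes that the regular expression is tested against each query parameter name as a whole; the connector's exact matching rules are not described here and might differ.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_parameters(url: str, pattern: str) -> str:
    """Return url without the query parameters whose names match pattern.

    Hypothetical helper for illustration only. Assumption: the pattern
    must match the whole parameter name (re.fullmatch).
    """
    regex = re.compile(pattern)
    parts = urlsplit(url)
    # Keep only the name/value pairs whose name does not match the pattern.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not regex.fullmatch(name)
    ]
    # Rebuild the URL; an empty query string drops the trailing "?".
    return urlunsplit(parts._replace(query=urlencode(kept)))


links = [
    "http://www.example.com/page.php?parameter=something",
    "http://www.example.com/page.php?parameter=something-else",
]
# Both links reduce to the same URL, so only one document would be ingested.
unique_urls = {strip_query_parameters(link, "parameter") for link in links}
print(unique_urls)
```

Under the same assumption, a pattern such as parameter|ignore removes both of the named parameters and leaves any other query parameters in place.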

Type: Regular expression
Default:  
Required: No
Configuration Section: TaskName or FetchTasks
Example: StripQueryParametersRegex=parameter|ignore
See Also: StripFragmentIdentifiers