CantHaveCheck
Specifies where the strings defined using the parameter CantHaveCSVs must not appear. If the connector finds any of the strings in any of the places specified, the page is discarded.
This parameter accepts a bitmask. To build the value, add together the numbers for the options that you want from the following list.
URL: 1 | The connector checks the page URL. If the URL contains any of the strings specified by CantHaveCSVs, the connector discards the page. |
Page header: 4 | The connector checks the HTML <HEAD> tag of a page. If the element contains any of the strings specified by CantHaveCSVs, the connector discards the page. |
Page content: 8 | The connector checks the content of the page. If the content contains any of the strings specified by CantHaveCSVs, the connector discards the page. |
Case insensitive: 64 | Add 64 to the value to allow case-insensitive matches. Note that you must also specify which part(s) of the page to check. |
Before download: 128 | If you add 128 to the CantHaveCheck value, the connector determines whether a page contains any of the strings specified in the parameter CantHaveCSVs before it downloads it. If a page contains any of the CantHaveCSVs strings, it is not downloaded. |
Spider check cache URL: 256 | If you enter 256, the connector checks previously retrieved URLs from the spider structure cache whenever CantHaveCSVs is modified to determine whether the URLs have changed. If a URL fails this check, it is deleted. |
Valid site structure: 512 | If you enter 512, the connector rechecks the CantHaveCSVs values for the site to ensure the site is still valid before it updates it. If you do not include this setting, changes to these values are never checked. If the site is not valid, it is not downloaded. |
Spider strip content: 1024 | If you enter 1024, the connector unescapes any HTML entities in downloaded pages. This can affect other functionality. For example, if the date format of a page contains HTML entities, these are removed before a date check is performed. |
Spider Check Content Type: 2048 | The connector checks the content type of the page for the strings specified in CantHaveCSVs before downloading it. If the content type contains any of the CantHaveCSVs strings, it is not downloaded. |
If you enter 0, the connector does not check for CantHaveCSVs.
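Because the value is a sum of individual option numbers, it can help to compute it rather than add by hand. The following is a minimal sketch in Python, not part of the connector. The class name, member names, and the `build_mask` helper are hypothetical labels chosen for illustration, and only the numeric values come from the list above.

```python
from enum import IntFlag


class CantHaveCheck(IntFlag):
    """Hypothetical names for the documented CantHaveCheck option values."""
    URL = 1
    PAGE_HEADER = 4
    PAGE_CONTENT = 8
    CASE_INSENSITIVE = 64
    BEFORE_DOWNLOAD = 128
    SPIDER_CHECK_CACHE_URL = 256
    VALID_SITE_STRUCTURE = 512
    SPIDER_STRIP_CONTENT = 1024
    SPIDER_CHECK_CONTENT_TYPE = 2048


def build_mask(*options):
    """Combine options into one integer (hypothetical helper).

    Bitwise OR gives the same result as adding the numbers, but an
    option listed twice is not counted twice.
    """
    mask = 0
    for option in options:
        mask |= int(option)
    return mask


# Check the URL and the page content, ignoring case: 1 + 8 + 64.
mask = build_mask(
    CantHaveCheck.URL,
    CantHaveCheck.PAGE_CONTENT,
    CantHaveCheck.CASE_INSENSITIVE,
)
print(f"CantHaveCheck={mask}")
```

Calling `build_mask()` with no options returns 0, which corresponds to the setting that disables the check.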
Type: | Integer |
Default: | 4 |
Required: | No |
Configuration Section: | TaskName or Default |
Example: | CantHaveCheck=77 |
See Also: |
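The example value 77 breaks down as 64 + 8 + 4 + 1, that is, a case-insensitive check of the URL, the page header, and the page content. The short Python sketch below shows one way to decode any value back into its options. The `OPTIONS` table and `decode` function are hypothetical helpers for illustration, and they use only the numbers documented above.

```python
# Documented option numbers mapped to their labels from the list above.
OPTIONS = {
    1: "URL",
    4: "Page header",
    8: "Page content",
    64: "Case insensitive",
    128: "Before download",
    256: "Spider check cache URL",
    512: "Valid site structure",
    1024: "Spider strip content",
    2048: "Spider Check Content Type",
}


def decode(value):
    """Return the labels of the options whose bit is set in value."""
    return [label for number, label in OPTIONS.items() if value & number]


print(decode(77))
# prints ['URL', 'Page header', 'Page content', 'Case insensitive']
```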