MustHaveCheck

Specifies where the strings defined by the MustHaveCSVs parameter must appear. Unless the connector finds at least one of the strings in every location you specify, it discards the page.

This parameter accepts a bitmask value. To create it, add together the values of the options you want from the following list.

URL (1): The connector checks the page URL. Unless the URL contains at least one of the strings specified by MustHaveCSVs, the connector discards the page.

Page header (4): The connector checks the HTML <HEAD> element of the page. Unless the element contains at least one of the strings specified by MustHaveCSVs, the connector discards the page.

Page content (8): The connector checks the content of the page. Unless the page content contains at least one of the strings specified by MustHaveCSVs, the connector discards the page.

Case insensitive (64): Add 64 to allow case-insensitive matches. You must also specify which part(s) of the page to check.

Before download (128): The connector determines whether a page contains any of the strings specified by MustHaveCSVs before it downloads the page. Unless the page contains at least one of the strings, it is not downloaded. You must also specify which part(s) of the page the connector should check.

Spider check cache URL (256): Whenever MustHaveCSVs is modified, the connector checks the URLs previously retrieved from the spider structure cache to determine whether they have changed. If a URL fails this check, it is deleted. If you add 256, you must also add 1 (URL).

Valid site structure (512): Before the connector updates a site, it rechecks the MustHaveCSVs values for the site to ensure that the site is still valid. If the site is not valid, it is not downloaded. If you do not add 512, changes to these values are never checked.

Spider strip content (1024): The connector unescapes any HTML entities in downloaded pages. This can affect other functionality. For example, if the date format of a page contains HTML entities, these entities are unescaped (and so no longer present) before a date check is performed.

Spider check content type (2048): Before downloading a page, the connector checks the content type of the page for the strings specified by MustHaveCSVs. Unless the content type contains at least one of the strings, the page is not downloaded.

If you set MustHaveCheck to 0, the connector does not check for the MustHaveCSVs strings.
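Because MustHaveCheck is a sum of independent option values, it can help to build and sanity-check the number before putting it in the configuration file. The following is a minimal sketch, not part of the connector: the class MustHaveCheck, its member names, and the validate function are hypothetical, and the only rules it checks are the ones stated in the option list (64 and 128 need at least one of 1, 4 or 8; 256 needs 1).

```python
from enum import IntFlag


class MustHaveCheck(IntFlag):
    """Option values for the MustHaveCheck bitmask (member names are hypothetical)."""
    URL = 1
    PAGE_HEADER = 4
    PAGE_CONTENT = 8
    CASE_INSENSITIVE = 64
    BEFORE_DOWNLOAD = 128
    SPIDER_CHECK_CACHE_URL = 256
    VALID_SITE_STRUCTURE = 512
    SPIDER_STRIP_CONTENT = 1024
    SPIDER_CHECK_CONTENT_TYPE = 2048


# Every documented bit, and the bits that say *where* to look for the strings.
ALL = sum(member.value for member in MustHaveCheck)
LOCATIONS = int(MustHaveCheck.URL | MustHaveCheck.PAGE_HEADER | MustHaveCheck.PAGE_CONTENT)


def validate(value: int) -> list:
    """Return a list of problems with a MustHaveCheck value (empty if none found)."""
    value = int(value)  # work with a plain int so any bit pattern is accepted
    problems = []
    unknown = value & ~ALL
    if unknown:
        problems.append(f"unknown bits set: {unknown}")
    has_location = bool(value & LOCATIONS)
    if value & MustHaveCheck.CASE_INSENSITIVE and not has_location:
        problems.append("64 (case insensitive) needs at least one of 1, 4 or 8")
    if value & MustHaveCheck.BEFORE_DOWNLOAD and not has_location:
        problems.append("128 (before download) needs at least one of 1, 4 or 8")
    if value & MustHaveCheck.SPIDER_CHECK_CACHE_URL and not value & MustHaveCheck.URL:
        problems.append("256 (spider check cache URL) needs 1 (URL)")
    return problems


if __name__ == "__main__":
    value = (MustHaveCheck.URL | MustHaveCheck.PAGE_HEADER
             | MustHaveCheck.PAGE_CONTENT | MustHaveCheck.CASE_INSENSITIVE)
    print(f"MustHaveCheck={int(value)}")  # MustHaveCheck=77
    print(validate(value))                # []
    print(validate(64 + 256))             # reports both missing dependencies
```

Combining named flags with | rather than adding raw numbers by hand avoids counting the same option twice, which would silently set a different bit.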

Type: Integer
Default: 0
Required: No
Configuration Section: TaskName or Default
Example: MustHaveCheck=77
See Also:

MustHaveCSVs
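The example value MustHaveCheck=77 is 1 (URL) + 4 (Page header) + 8 (Page content) + 64 (Case insensitive): the connector checks the URL, the <HEAD> element, and the page content, matching case-insensitively. As a small illustrative sketch (the decompose helper is hypothetical, not a connector feature), any value can be split back into its option values by testing each bit:

```python
def decompose(mask: int) -> list:
    """Return the individual option values (powers of two) that add up to mask."""
    return [1 << bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(decompose(77))  # [1, 4, 8, 64]
```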
