MaxPageDate

The MaxPageDate parameter filters pages by date before ingestion. The connector ingests only pages that are older than the specified date. It does not ingest newer pages, although it might still follow links from those pages, depending on the value of SpiderDateFilteredPages.

Specify the date in one of the following ways:

  • An absolute date in any of the formats specified by DateFormats.
  • A relative date, starting with the character '-' followed by a time duration. The duration is measured back from the time when the task begins, so the limit rolls forward each time the task runs. Relative dates support the units year(s), month(s), week(s), day(s), hour(s), minute(s), and second(s). You can abbreviate each unit to its first letter, with one exception: m means months, so minutes abbreviates to n. For example:

       -3months 1week
       -3m 1w

    A single value without a unit is read in seconds, so the following are equivalent:

       -1day
       -86400s
       -86400

    Positive values are accepted if they begin with the character '+', but be aware that positive values represent times in the future.
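
The relative-date rules above can be illustrated with a short Python sketch. It is not the connector's implementation: the function names parse_relative, resolve_cutoff, and is_ingested are hypothetical, the sign is assumed to apply to the whole duration, and months and years are approximated as 30 and 365 days because the calendar handling is not described here.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per abbreviated unit. The fixed lengths for years and months are
# assumptions of this sketch, not documented connector behaviour.
UNIT_SECONDS = {
    "y": 365 * 86400,  # years (approximation)
    "m": 30 * 86400,   # months (approximation); note that m is NOT minutes
    "w": 7 * 86400,
    "d": 86400,
    "h": 3600,
    "n": 60,           # minutes abbreviate to n
    "s": 1,
}
UNIT_NAMES = {"year": "y", "month": "m", "week": "w", "day": "d",
              "hour": "h", "minute": "n", "second": "s"}
_TOKEN = re.compile(r"(\d+)([a-z]*)")


def parse_relative(value):
    """Turn a value such as '-3months 1week' into a signed timedelta."""
    text = value.strip().lower()
    if text[:1] not in ("+", "-"):
        raise ValueError("a relative date must start with '-' or '+'")
    sign = -1 if text[0] == "-" else 1
    tokens = text[1:].split()
    if not tokens:
        raise ValueError("missing duration")
    total = 0
    for token in tokens:
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError("cannot parse %r" % token)
        amount, unit = int(match.group(1)), match.group(2)
        if unit == "":
            key = "s"  # a value without a unit is read in seconds
        elif unit in UNIT_SECONDS:
            key = unit  # single-letter abbreviation
        else:
            name = unit[:-1] if unit.endswith("s") else unit  # drop plural s
            if name not in UNIT_NAMES:
                raise ValueError("unknown unit %r" % unit)
            key = UNIT_NAMES[name]
        total += amount * UNIT_SECONDS[key]
    return timedelta(seconds=sign * total)


def resolve_cutoff(value, task_start):
    """The limit is relative to the time when the task begins."""
    return task_start + parse_relative(value)


def is_ingested(page_date, cutoff):
    """Only pages older than the cutoff are ingested."""
    return page_date < cutoff
```

Under these assumptions, a task that begins at 2015-02-17 00:00 UTC with the value -7days has a cutoff of 2015-02-10 00:00 UTC, and a page dated 2015-02-12 is not ingested.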

To filter pages by date, the connector must be able to extract a date from the page URL, the page content, or the HTTP headers. Configure how the date is extracted by setting DateInUrl, PageDateSelector, or PageDateHeader.

If you set the parameter MaxPageAge, the connector ignores MaxPageDate.

Type: Date or relative date (UTC time)
Default:  
Required: No
Configuration Section: TaskName or FetchTasks
Example:

To retrieve pages that have not been modified since 17 February 2015:

MaxPageDate=2015-Feb-17

To ignore pages that have been modified in the last 7 days:

MaxPageDate=-7days

See Also: MinPageDate