MinPageDate

The MinPageDate parameter filters pages by date. The connector ingests only pages that are newer than the specified date. Older pages are not ingested, though links from these pages might still be followed, depending on the value of SpiderDateFilteredPages.

Specify the date in one of the following ways:

  • An absolute date in any of the formats specified by DateFormats.
  • A relative date, starting with the character '-' followed by a time duration. The duration is measured back from the time when the task begins, which results in a rolling limit. Relative dates support the units year(s), month(s), week(s), day(s), hour(s), minute(s), and second(s). You can shorten the value by abbreviating each unit to its first letter, except that minutes abbreviates to n because m is used for months. For example:

       -3months 1week
       -3m 1w

    A single value without a unit is read in seconds, so the following are equivalent:

       -1day
       -86400s
       -86400

    Positive values are accepted if they begin with the character '+', but be aware that positive values represent times in the future.
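
The relative date rules above can be sketched in Python. This is an illustration only, not the connector's implementation: the function names are hypothetical, months and years are assumed to be fixed lengths (30 and 365 days), the sign is assumed to apply to the whole duration, and any term without a unit is treated as seconds.

```python
import re
from datetime import datetime, timedelta, timezone

DAY = 86400  # seconds in one day

# Assumption: fixed-length months and years, for illustration only.
# The connector may use calendar arithmetic instead.
_FULL = {"year": 365 * DAY, "month": 30 * DAY, "week": 7 * DAY,
         "day": DAY, "hour": 3600, "minute": 60, "second": 1}

# First-letter abbreviations; minutes uses "n" because "m" means months.
_ABBREV = {"y": "year", "m": "month", "w": "week", "d": "day",
           "h": "hour", "n": "minute", "s": "second"}


def unit_seconds(unit):
    """Return the number of seconds in one unit ('' means seconds)."""
    if unit == "":
        return 1
    singular = unit[:-1] if unit.endswith("s") else unit
    name = _ABBREV.get(unit, singular)
    if name not in _FULL:
        raise ValueError("unknown unit: " + unit)
    return _FULL[name]


def relative_offset(value):
    """Convert a relative date such as '-3m 1w' to a signed timedelta."""
    text = value.strip().lower()
    if text[:1] not in ("+", "-"):
        raise ValueError("a relative date must start with '+' or '-'")
    sign = -1 if text[0] == "-" else 1
    terms = text[1:].split()
    if not terms:
        raise ValueError("missing duration")
    total = 0
    for term in terms:
        match = re.fullmatch(r"(\d+)([a-z]*)", term)
        if match is None:
            raise ValueError("cannot parse term: " + term)
        total += int(match.group(1)) * unit_seconds(match.group(2))
    return timedelta(seconds=sign * total)


def cutoff(value, task_start):
    """Resolve a relative date against the time when the task begins."""
    return task_start + relative_offset(value)


start = datetime(2015, 7, 1, tzinfo=timezone.utc)
print(cutoff("-30days", start).isoformat())
```

Because the offset is applied to the task start time each time the task runs, the resulting cutoff moves forward from one run to the next.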

TIP: With relative dates, items that were previously ingested but no longer meet the limit are removed from the IDOL index. For example, some pages might be ingested when you first synchronize with a Web site. On the next synchronization cycle, any of those pages that now exceed the maximum age are removed from the IDOL index.
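
The rolling behavior can be illustrated with a short Python sketch. The URLs, page dates, and function name are hypothetical, and whether a page dated exactly at the limit is kept is an assumption here (the sketch keeps only strictly newer pages).

```python
from datetime import datetime, timedelta, timezone


def pages_within_limit(page_dates, task_start, max_age):
    """Return the URLs whose date is newer than task_start - max_age."""
    limit = task_start - max_age
    # Assumption: a page dated exactly at the limit is not kept.
    return {url for url, date in page_dates.items() if date > limit}


utc = timezone.utc
page_dates = {  # hypothetical pages and their extracted dates
    "http://example.com/old": datetime(2015, 5, 20, tzinfo=utc),
    "http://example.com/new": datetime(2015, 6, 5, tzinfo=utc),
}
max_age = timedelta(days=30)  # corresponds to MinPageDate=-30days

# First cycle: the limit is 11 May 2015, so both pages qualify.
first = pages_within_limit(page_dates, datetime(2015, 6, 10, tzinfo=utc), max_age)

# Later cycle: the limit has rolled forward to 01 June 2015.
second = pages_within_limit(page_dates, datetime(2015, 7, 1, tzinfo=utc), max_age)

# Pages that qualified before but no longer meet the limit.
removed = first - second
print(sorted(removed))
```

In this sketch the page dated 20 May 2015 qualifies in the first cycle and drops out in the later one, which mirrors the removal described in the tip.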

To filter pages by date, the connector must be able to extract a date from the page URL, page content, or HTTP headers. Configure how to extract the date by setting DateInUrl, PageDateSelector, or PageDateHeader.

If you set the parameter MinPageAge, this parameter is ignored.

Type: Date or relative date (UTC time)
Default:  
Required: No
Configuration Section: TaskName or FetchTasks
Example:

To retrieve pages that were modified on or after 01 June 2015:

MinPageDate=2015-June-01

To retrieve pages that have been modified in the last 30 days:

MinPageDate=-30days

See Also: MaxPageDate