Some Web sites require that you log on before you can access content. To retrieve content from these Web sites, configure the connector to log on to the site.
To log on to a Web site that uses Basic, HTTP Digest, or NTLM version 2 authentication, specify a user name and password by setting the configuration parameters AuthUser and AuthPassword. For example:
[MyTask]
Url=http://www.autonomy.com
AuthUser=user
AuthPassword=pass
Alternatively, you can set the AuthSection parameter and set the AuthUser and AuthPassword parameters in a different section of the configuration file. This means you can use the same user name and password for several tasks:
[MyTask]
Url=http://www.autonomy.com
AuthSection=MyAuthSection

[AnotherTask]
Url=http://www.hp.com
AuthSection=MyAuthSection

[MyAuthSection]
AuthUser=user
AuthPassword=pass
If you create a fetch task to crawl more than one site, or you need to specify more than one set of credentials for a site, you can use multiple sections containing authentication details. The AuthSection parameter accepts multiple values. Set the AuthUrlRegex parameter in each section to specify the URLs for which those authentication details are used. For example:
[MyTask]
Url=http://www.autonomy.com
AuthSection0=LogOnAuthAutonomy
AuthSection1=LogOnAuthAutonomySubDomain

[LogOnAuthAutonomy]
AuthUrlRegex=.*www\.autonomy\.com/.*
AuthUser=MyAutonomyUsername
AuthPassword=MyAutonomyP4ssw0rd

[LogOnAuthAutonomySubDomain]
AuthUrlRegex=.*subsite\.autonomy\.com/.*
AuthUser=MyAutonomySubsiteUsername
AuthPassword=MyAutonomySubsiteP4ssw0rd
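For the other case mentioned above, more than one set of credentials on a single site, the same approach could plausibly be split by path rather than by host. The following is a sketch only: the /staff/ and /partners/ paths, the section names, and the credentials are hypothetical, the // comment lines assume the configuration file accepts that comment style, and the regular expressions are written so that no URL matches both sections.

```ini
[MyTask]
Url=http://www.autonomy.com
AuthSection0=LogOnAuthStaff
AuthSection1=LogOnAuthPartners

// Hypothetical: credentials for pages under /staff/
[LogOnAuthStaff]
AuthUrlRegex=.*www\.autonomy\.com/staff/.*
AuthUser=MyStaffUsername
AuthPassword=MyStaffP4ssw0rd

// Hypothetical: credentials for pages under /partners/
[LogOnAuthPartners]
AuthUrlRegex=.*www\.autonomy\.com/partners/.*
AuthUser=MyPartnerUsername
AuthPassword=MyPartnerP4ssw0rd
```

Keeping the patterns mutually exclusive avoids depending on how the connector chooses between sections when more than one matches, which this section does not describe.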
If the Web site does not use Basic, HTTP Digest, or NTLM version 2 authentication, the connector might be able to log on by submitting a form.
Configure the connector to submit a form by setting the following configuration parameters:
FormUrlRegex | A regular expression to identify the page that contains the log-on form. The connector does not attempt to submit form data unless the URL of a page matches the regular expression.
InputSelector | A list of CSS2 selectors to identify the form fields to populate. Specify the selectors in a comma-separated list or by using numbered parameters.
InputValue | The values to use for the form fields specified by the InputSelector parameter. Specify the values in a comma-separated list or by using numbered parameters.
SubmitSelector | A CSS2 selector that identifies the form element to use to submit the form.
ValidateFormData | A Boolean value that specifies whether the connector attempts to validate the data supplied to complete a form. The connector can validate the data based on the types of the input elements.
You can set these parameters in the [TaskName] section of the configuration file, for example:
[MyTask]
Url=http://www.autonomy.com
FormUrlRegex=.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyUsername
InputValue1=MyP4ssw0rd!
SubmitSelector=input[name=login]
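The parameter descriptions above note that InputSelector and InputValue also accept comma-separated lists. As a sketch, the same task might be written as follows, assuming that values are paired with selectors in the order listed and that no individual selector or value contains a comma:

```ini
// Sketch: comma-separated form of the numbered parameters in the previous example
[MyTask]
Url=http://www.autonomy.com
FormUrlRegex=.*login\.php
InputSelector=input[name=username],input[name=password]
InputValue=MyUsername,MyP4ssw0rd!
SubmitSelector=input[name=login]
```

The numbered form is probably the safer choice when a value, such as a password, might itself contain a comma.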
To specify the information in a separate section of the configuration file, set the FormsSection parameter:
[MyTask]
Url=http://www.autonomy.com
FormsSection=LogOnForm

[LogOnForm]
FormUrlRegex=.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyUsername
InputValue1=MyP4ssw0rd!
SubmitSelector=input[name=login]
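ValidateFormData is described above but does not appear in the examples. A minimal sketch of enabling it in the separate form section follows. The value True is an assumption about how the connector spells Boolean values, the // comment line assumes the configuration file accepts that comment style, and the parameter's default is not stated here.

```ini
[LogOnForm]
FormUrlRegex=.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyUsername
InputValue1=MyP4ssw0rd!
SubmitSelector=input[name=login]
// Assumed Boolean syntax; asks the connector to check values against the input element types
ValidateFormData=True
```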
To submit different forms during a single task, you can create multiple sections containing form settings. The FormsSection parameter accepts multiple values. In each section, use the FormUrlRegex parameter to identify the page that contains the form:
[MyTask]
Url=http://www.autonomy.com
FormsSection0=LogOnFormAutonomy
FormsSection1=LogOnFormAutonomySubDomain

[LogOnFormAutonomy]
FormUrlRegex=.*www\.autonomy\.com/.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyAutonomyUsername
InputValue1=MyAutonomyP4ssw0rd!
SubmitSelector=input[name=login]

[LogOnFormAutonomySubDomain]
FormUrlRegex=.*subsite\.autonomy\.com/.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MySubsiteUsername
InputValue1=MySubsiteP4ssw0rd!
SubmitSelector=input[name=login]