Log On to a Web Site

Some Web sites require that you log on before you can access content. To retrieve content from these Web sites, configure the connector to log on to the site.

Basic, HTTP Digest, and NTLMv2 Authentication

To log on to a Web site that uses Basic, HTTP Digest, or NTLM version 2 authentication, specify a user name and password by setting the configuration parameters AuthUser and AuthPassword. For example:

[MyTask]
Url=http://www.autonomy.com
AuthUser=user
AuthPassword=pass

Alternatively, you can set the AuthSection parameter and set the AuthUser and AuthPassword parameters in a different section of the configuration file. This means you can use the same user name and password for several tasks:

[MyTask]
Url=http://www.autonomy.com
AuthSection=MyAuthSection

[AnotherTask]
Url=http://www.hp.com
AuthSection=MyAuthSection

[MyAuthSection]
AuthUser=user
AuthPassword=pass

If you create a fetch task to crawl more than one site, or you need to specify more than one set of credentials for a site, you can use multiple sections containing authentication details. The AuthSection parameter accepts multiple values, which you can specify as numbered parameters (AuthSection0, AuthSection1, and so on). In each section, set the AuthUrlRegex parameter to specify the URLs to which the authentication details apply. For example:

[MyTask]
Url=http://www.autonomy.com
AuthSection0=LogOnAuthAutonomy
AuthSection1=LogOnAuthAutonomySubDomain

[LogOnAuthAutonomy]
AuthUrlRegex=.*www\.autonomy\.com/.*
AuthUser=MyAutonomyUsername
AuthPassword=MyAutonomyP4ssw0rd

[LogOnAuthAutonomySubDomain]
AuthUrlRegex=.*subsite\.autonomy\.com/.*
AuthUser=MyAutonomySubsiteUsername
AuthPassword=MyAutonomySubsiteP4ssw0rd
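
To check which URLs a given AuthUrlRegex value would cover before you run a task, you can try the patterns against sample URLs outside the connector. The following Python sketch is an illustration only and is not part of the connector. It assumes that a pattern must match the whole URL and that sections are tried in numbered order. This page confirms neither assumption, and the connector's regular expression engine might differ from Python's re module. The function name pick_auth_section and the sample URLs are hypothetical.

```python
import re

# Patterns copied from the example sections above, in AuthSection0, AuthSection1 order.
AUTH_SECTIONS = [
    ("LogOnAuthAutonomy", r".*www\.autonomy\.com/.*"),
    ("LogOnAuthAutonomySubDomain", r".*subsite\.autonomy\.com/.*"),
]


def pick_auth_section(url):
    """Return the name of the first section whose pattern matches the whole URL.

    Returns None when no pattern matches. Whole-URL matching is an assumption
    made for this sketch, not documented connector behavior.
    """
    for name, pattern in AUTH_SECTIONS:
        if re.fullmatch(pattern, url):
            return name
    return None


if __name__ == "__main__":
    samples = (
        "http://www.autonomy.com/products/index.html",
        "http://subsite.autonomy.com/docs/",
        "http://www.autonomy.com",
        "http://www.hp.com/",
    )
    for url in samples:
        print(url, "->", pick_auth_section(url))
```

Under these assumptions, the URL http://www.autonomy.com (with no trailing slash) matches neither pattern, because both patterns require a / after the host name. If the connector behaves the same way, a pattern such as .*www\.autonomy\.com.* would be needed to cover that URL.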

Submit a Form

If the Web site does not use Basic, HTTP Digest, or NTLMv2 authentication, the connector might be able to log on by submitting a form.

Configure the connector to submit a form by setting the following configuration parameters:

FormUrlRegex

A regular expression to identify the page that contains the log-on form. The connector does not attempt to submit form data unless the URL of a page matches the regular expression.

InputSelector

A list of CSS2 selectors to identify the form fields to populate. Specify the selectors in a comma-separated list or by using numbered parameters.

InputValue

The values to use for the form fields specified by the InputSelector parameter. Specify the values in a comma-separated list or by using numbered parameters.

SubmitSelector

A CSS2 selector that identifies the form element to use to submit the form.

ValidateFormData

A Boolean value that specifies whether the connector attempts to validate the data supplied to complete a form. The connector can validate the data based on the types of the input elements.

You can set these parameters in the [TaskName] section of the configuration file, for example:

[MyTask]
Url=http://www.autonomy.com
FormUrlRegex=.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyUsername
InputValue1=MyP4ssw0rd!
SubmitSelector=input[name=login]

To specify the information in a separate section of the configuration file, set the FormsSection parameter:

[MyTask]
Url=http://www.autonomy.com
FormsSection=LogOnForm

[LogOnForm]
FormUrlRegex=.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyUsername
InputValue1=MyP4ssw0rd!
SubmitSelector=input[name=login]
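
The examples above suggest that each InputValueN supplies the value for the field matched by the InputSelectorN with the same number. A mismatch between the two lists, for instance because two parameters ended up on the same line, is easy to miss by eye. As a minimal sketch, and not part of the connector, the following Python script reads a form section with the standard configparser module and pairs the numbered parameters. The helper names numbered and pair_inputs are hypothetical, and the sketch assumes that the configuration file is close enough to INI syntax for configparser to read it.

```python
import configparser
import re

# A copy of the [LogOnForm] section from the example above.
CONFIG_TEXT = r"""
[LogOnForm]
FormUrlRegex=.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyUsername
InputValue1=MyP4ssw0rd!
SubmitSelector=input[name=login]
"""


def numbered(section, prefix):
    """Collect the values of prefix0, prefix1, ... from a section, keyed by number."""
    found = {}
    for key, value in section.items():
        match = re.fullmatch(re.escape(prefix) + r"(\d+)", key)
        if match:
            found[int(match.group(1))] = value
    return found


def pair_inputs(config_text, section_name):
    """Return (selector, value) pairs in numbered order.

    Raises ValueError when the selector numbers and value numbers differ.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names in the case written in the file
    parser.read_string(config_text)
    section = parser[section_name]
    selectors = numbered(section, "InputSelector")
    values = numbered(section, "InputValue")
    if selectors.keys() != values.keys():
        raise ValueError(
            "InputSelector numbers %s do not match InputValue numbers %s"
            % (sorted(selectors), sorted(values))
        )
    return [(selectors[i], values[i]) for i in sorted(selectors)]


if __name__ == "__main__":
    for selector, value in pair_inputs(CONFIG_TEXT, "LogOnForm"):
        print(selector, "<-", value)
```

If InputValue0 were accidentally left on the same line as InputSelector1, the sketch would find selector numbers 0 and 1 but only value number 1, and would raise a ValueError instead of returning pairs.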

To submit different forms during a single task, you can create multiple sections containing form settings. The FormsSection parameter accepts multiple values. In each section, use the FormUrlRegex parameter to identify the page that contains the form:

[MyTask]
Url=http://www.autonomy.com
FormsSection0=LogOnFormAutonomy
FormsSection1=LogOnFormAutonomySubDomain

[LogOnFormAutonomy]
FormUrlRegex=.*www\.autonomy\.com/.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MyAutonomyUsername
InputValue1=MyAutonomyP4ssw0rd!
SubmitSelector=input[name=login]

[LogOnFormAutonomySubDomain]
FormUrlRegex=.*subsite\.autonomy\.com/.*login\.php
InputSelector0=input[name=username]
InputSelector1=input[name=password]
InputValue0=MySubsiteUsername
InputValue1=MySubsiteP4ssw0rd!
SubmitSelector=input[name=login]
