This section describes how to troubleshoot common problems that might occur when you set up the HTTP Connector.
If the connector cannot connect to the Web site that you want to index, check whether the connector machine is behind a proxy server. If it is, use the configuration parameters ProxyHost and ProxyPort (or ProxyFromLua) to specify the host name or IP address, and the port, of the proxy server.
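If other HTTP clients on the machine already use a proxy, you can often find its address in the environment or in the system settings. The following Python sketch is not part of the connector. It reads the proxy that Python's standard library detects (urllib.request.getproxies) and converts a proxy URL into ProxyHost and ProxyPort lines. The helper name proxy_config_lines is hypothetical. Check the connector's configuration reference to find the section where these parameters belong.

```python
from urllib.parse import urlsplit
from urllib.request import getproxies


def proxy_config_lines(proxy_url):
    """Hypothetical helper: turn a proxy URL such as
    'http://proxy.example.com:8080' into ProxyHost/ProxyPort lines."""
    # urlsplit only recognizes host and port when a scheme is present.
    if "://" not in proxy_url:
        proxy_url = "http://" + proxy_url
    parts = urlsplit(proxy_url)
    port = parts.port  # raises ValueError if the port is not a valid number
    if parts.hostname is None or port is None:
        raise ValueError("proxy URL must include a host and a port: " + proxy_url)
    return ["ProxyHost=" + parts.hostname, "ProxyPort=" + str(port)]


if __name__ == "__main__":
    # getproxies() reads proxy environment variables (for example http_proxy)
    # and, on Windows and macOS, falls back to the system proxy settings.
    detected = getproxies()
    proxy = detected.get("http") or detected.get("https")
    if proxy is None:
        print("No proxy detected in the environment or system settings.")
    else:
        print("Detected proxy:", proxy)
        try:
            for line in proxy_config_lines(proxy):
                print(line)
        except ValueError as err:
            print("Could not derive proxy settings:", err)
```

If the script finds no proxy, the machine might still reach the Web through a proxy that is configured somewhere Python does not look. In that case, ask your network administrator for the host and port.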
If pages are not indexed, set the configuration parameter LogVerbose=true. You can then view the synchronize log file to see the links that the connector extracts from each page. Check your configuration to ensure that it does not exclude the pages that you want to index. The connector cannot parse JavaScript, so it does not find links that are contained in JavaScript, and it does not index the pages those links point to.
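To see why the connector misses links that JavaScript creates, consider how a static HTML parser finds links. It reads href attributes in the markup and never runs scripts. The following Python sketch illustrates this with the standard library's html.parser module. It is a simplified model, not the connector's own link-extraction code, and the sample page is made up.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, as a static (non-JavaScript) parser would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<html><body>
  <a href="/products.html">Products</a>
  <!-- The real target is in onclick; a static parser only sees href="#". -->
  <a href="#" onclick="location.href='/support.html'">Support</a>
  <script>
    // This link exists only after the script runs, so it is never seen.
    document.write('<a href="/news.html">News</a>');
  </script>
</body></html>
"""

print(extract_links(page))  # ['/products.html', '#']
```

Only /products.html is discovered as a real page. Content behind /support.html and /news.html stays unindexed unless the connector can reach those pages through ordinary links elsewhere on the site.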
Some Web sites require visitors, and therefore the connector, to log on before they can retrieve content. To index these sites, set the LoginMethod configuration parameter and provide the credentials in the connector’s configuration file.
To determine the correct method to use to log in to a Web site, you can:
If you configure the connector to log on to a Web site by submitting a form, ensure that the connector submits all of the required fields.
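Login forms often contain hidden fields, such as session or anti-forgery tokens, in addition to the visible user name and password boxes. A server can reject a submission that leaves these fields out. One way to check which fields a form submits is to list the named controls in the form's HTML. The following Python sketch does this with html.parser. The sample form and its field names are hypothetical. The sketch gives a rough list only: it ignores fields that JavaScript adds or changes at run time, and it does not account for disabled controls.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Record each <form>'s action and the (name, type, default value) of the
    named <input>, <select>, and <textarea> controls inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []  # list of {"action": str or None, "fields": [...]}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if not name:
                return  # controls without a name are not submitted
            field_type = attrs.get("type") or "text" if tag == "input" else tag
            self._current["fields"].append((name, field_type, attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


login_page = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log on">
</form>
"""

lister = FormFieldLister()
lister.feed(login_page)
lister.close()
for form in lister.forms:
    print("Form action:", form["action"])
    for name, field_type, value in form["fields"]:
        print(f"  {name} (type={field_type}) default={value!r}")
```

In this example, a submission that sends only username and password omits csrf_token. Hidden token values often change for every session, so a fixed value copied from one page view might not work for later logons.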
You might see this error if the system has allocated all available TCP ports.
Operating systems typically allocate a limited number of ports to applications that need to make outbound connections. Also, when a connection is closed, the operating system waits before a port is released and can be reused.
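The following estimate shows why ports run out. Assume that the connector opens a new connection for each request and that every closed connection keeps its port for the full wait period. Under those assumptions, the sustainable rate of new outbound connections is roughly the number of available ports divided by the wait time. The following Python sketch does this arithmetic. The figures in the example are illustrative assumptions, not the defaults of any particular system. Real TCP stacks can reuse ports more flexibly than this simple model assumes.

```python
def max_new_connections_per_second(available_ports, wait_seconds):
    """Simplified upper bound on sustained new outbound connections per second
    when each closed connection holds its port for wait_seconds."""
    if available_ports <= 0 or wait_seconds <= 0:
        raise ValueError("available_ports and wait_seconds must be positive")
    return available_ports / wait_seconds


# Illustrative figures only: 16,384 ports and a 120-second wait.
rate = max_new_connections_per_second(16384, 120)
print(f"About {rate:.1f} new connections per second before ports run out")
```

In this model, the limit rises in proportion to the number of ports and falls in proportion to the wait time. This is why increasing MaxUserPort or reducing TcpTimedWaitDelay can help on Windows.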
If you encounter this issue on a Windows system, you can set the Windows registry parameter MaxUserPort so that more ports are available for use. You can also shorten the time that the operating system waits before releasing a port by setting the registry parameter TcpTimedWaitDelay. Both parameters are set in the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
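Before you change these values, you might want to check whether they are already set. The following Python sketch reads them with the standard winreg module, which is available only on Windows. On other systems the function returns an empty dictionary. The sketch only reads the registry. Changing these parameters typically requires administrative rights and a restart of the computer. See Microsoft's documentation for the valid ranges.

```python
import sys

TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
PARAM_NAMES = ("MaxUserPort", "TcpTimedWaitDelay")


def read_tcpip_params(names=PARAM_NAMES):
    """Return {name: value or None} for values under the Tcpip Parameters key.
    None means the value is not set, so the operating system default applies.
    Returns an empty dict on non-Windows systems."""
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard library module

    results = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
        for name in names:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # the value does not exist under this key
            results[name] = value
    return results


if __name__ == "__main__":
    params = read_tcpip_params()
    if not params:
        print("Not a Windows system; nothing to read.")
    for name, value in params.items():
        shown = "not set (default applies)" if value is None else value
        print(f"{name}: {shown}")
```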