Monitor the Status of IDOL Components

You can submit standard HTTP requests to the IDOL ACI and Service ports to get information about the status of your IDOL components, such as whether individual components are running. You can use this information for manual analysis, or to configure automated monitoring tools such as Nagios, which can send HTTP requests and check the responses by using the check_http plugin (for more information, refer to the Nagios documentation).

In addition to submitting actions, you can use the Overview tab on the Status page of IDOL Admin to view information about the status and availability of your components.

You can also use statistics to monitor your components.

Availability Checks

You can use the following methods to check the availability of your components.

Check That a Component is Running

To check that a component is running, submit an /action=GetStatus request to the service port of the component. If the component is running, the response body includes <RESPONSE>Running</RESPONSE>.
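
The following Python sketch shows one way to automate this check. The function names, the host name, and the port number 9102 are assumptions for illustration only; substitute the service port configured for your component.

```python
from urllib.request import urlopen

# Marker that the GetStatus response contains when the component is running.
RUNNING_MARKER = "<RESPONSE>Running</RESPONSE>"


def status_url(host: str, service_port: int) -> str:
    """Build the GetStatus URL for a component's service port."""
    return f"http://{host}:{service_port}/action=GetStatus"


def fetch(url: str, timeout: float = 5.0) -> str:
    """Send an HTTP GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def is_component_running(body: str) -> bool:
    """Return True if the response body reports the component as running."""
    return RUNNING_MARKER in body
```

For example, is_component_running(fetch(status_url("idolhost", 9102))) returns True when the component reports that it is running. The fetch function raises an OSError subclass if the service port cannot be contacted, so a monitoring script would typically treat that exception as a failed check.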

Check That an ACI Port is Responding

To check that an IDOL component’s ACI port is responding, and that the component has threads available to handle incoming ACI requests, send an /action=GetPID request. If the ACI port is responding normally, the response body includes <response>SUCCESS</response>.

NOTE: This action is available on all ACI servers, but is particularly relevant for components with ACI threads that are processing long-running requests (for example, Content or DAH).

The advantage of this method is that it has very little impact on processing time, performance, and network overhead.
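
As a rough illustration, the GetPID check can be wrapped in a function that returns a Nagios-style exit code (0 for OK, 2 for CRITICAL). This is a sketch rather than a finished plugin: the function names and the three-second timeout are assumptions, and the get parameter exists only so that the HTTP call can be replaced in tests.

```python
from typing import Callable, Tuple
from urllib.request import urlopen

# Marker that the GetPID response contains when the ACI port is responding.
SUCCESS_MARKER = "<response>SUCCESS</response>"


def http_get(url: str, timeout: float) -> str:
    """Send an HTTP GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_aci_port(
    url: str,
    timeout: float = 3.0,  # assumed value; tune it for your environment
    get: Callable[[str, float], str] = http_get,
) -> Tuple[int, str]:
    """Return a Nagios-style (exit_code, message) pair for a GetPID URL."""
    try:
        body = get(url, timeout)
    except OSError as exc:
        # Connection failures and timeouts are OSError subclasses. A timeout
        # can mean that no ACI thread was free to handle the request.
        return 2, f"CRITICAL: no response from {url} ({exc})"
    if SUCCESS_MARKER in body:
        return 0, "OK: ACI port responding"
    return 2, "CRITICAL: unexpected GetPID response"
```

A script could call check_aci_port("http://idolhost:9100/action=GetPID"), where the host and port are placeholders, print the message, and exit with the returned code.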

Monitor ACI Thread Status From the Service Port

To check the status of ACI threads, use the ACIThreadStatus service port action. The action returns information on the current state of all ACI threads, even if the ACI port is unavailable (for example, if all threads are servicing long-running actions).

TIP: You can use this action to detect and flag long-running queries in real time.

The action returns the following information:

  • Whether a thread is in progress (that is, currently servicing a request).

  • Whether a thread is idle.

  • The number of accepted connections (ACI requests which have been accepted, but cannot yet be processed because no ACI thread is available to handle them).

  • In the case of active threads, the action that is being processed, and the currently elapsed time.
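
The sketch below shows how a monitoring script might use this information once it has been parsed. The ThreadInfo record and its field names are hypothetical and do not reflect the actual element names in the ACIThreadStatus response; the 30-second threshold is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadInfo:
    # Hypothetical record: these field names are illustrative only and are
    # not the element names returned by ACIThreadStatus.
    in_progress: bool
    action: Optional[str] = None
    elapsed_seconds: float = 0.0


def long_running(threads: List[ThreadInfo], threshold_seconds: float = 30.0) -> List[ThreadInfo]:
    """Return the active threads that have been busy for at least the threshold."""
    return [
        t for t in threads
        if t.in_progress and t.elapsed_seconds >= threshold_seconds
    ]


def all_threads_busy(threads: List[ThreadInfo]) -> bool:
    """Return True if every thread is servicing a request (none are idle)."""
    return bool(threads) and all(t.in_progress for t in threads)
```

If all_threads_busy returns True and the number of accepted connections is greater than zero, new ACI requests are waiting for a thread, which is a reasonable condition to alert on.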

Check That the Index Port is Responding

This method is relevant to components with an index port (for example, Content and DIH).

To check that the index port can be contacted and is responding to incoming requests, send the /PING? request to the index port. If the index port is running normally, this returns the string PING SUCCESS.

Check That Children are Responding

This method is relevant to components with children (for example, IDOL Proxy, DIH, and DAH).

To check that child components are running normally, submit the ACI GetStatus action. The response contains the following information on the current state of the child components:

  • IDOL Proxy: The component/componentname/status response is RUNNING if the child component is running normally.

    TIP: Because the Proxy component allocates port numbers to its children, it might be easier to monitor their state by using the Proxy, rather than by contacting the children directly.

  • DIH: The engines/engine/status response is UP if the engine is running normally. The response reports the last-known state of the engine (that is, it might be cached, with a maximum age dependent on the configured ping time).

  • DAH: The engines/engine/status response is ONLINE if the engine is running and queryable. DAH collects current information from its children when it receives a GetStatus request, so child statuses are current; however, if one or more children are down, this might incur a timeout penalty when attempting to connect.
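
For DIH and DAH, the engines/engine/status values can be extracted from the GetStatus response with a short script. The following is a minimal sketch that assumes the response is well-formed XML; it ignores XML namespaces rather than assuming a particular prefix, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def engine_statuses(xml_text: str) -> List[str]:
    """Return the text of every engines/engine/status element, in document order."""
    root = ET.fromstring(xml_text)
    statuses = []
    for engines in root.iter():
        if local_name(engines.tag) != "engines":
            continue
        for engine in engines:
            if local_name(engine.tag) != "engine":
                continue
            for child in engine:
                if local_name(child.tag) == "status":
                    statuses.append((child.text or "").strip())
    return statuses


def unhealthy_statuses(xml_text: str, healthy_value: str) -> List[str]:
    """Return the statuses that differ from the expected healthy value."""
    return [
        s for s in engine_statuses(xml_text)
        if s.upper() != healthy_value.upper()
    ]
```

Pass "UP" as healthy_value for a DIH response and "ONLINE" for a DAH response. An empty list from unhealthy_statuses means that every listed engine reported the expected state.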

Check the Status of the Content Component

The IDOL Content component’s GetStatus action returns information that can indicate the status of the component. For example:

  • If the number of documents is much smaller than the number of committed_documents, you might need to compact the engine.

  • If the number of total_terms increases suddenly, or is significantly higher than in comparable engines, this might indicate the presence of documents containing many “junk” terms. This might be because the documents have been incorrectly extracted, or because the OCR process has been unsuccessful. (You can use the TermGetAll action to investigate terms with unusually low or high document occurrences.)

  • If the number of fieldcodes increases suddenly, or is significantly higher than in comparable engines, this might indicate the presence of documents containing many “junk” metadata fields.

  • Unexpected values for mindatestring or maxdatestring might indicate that the engine’s DateFormatCSVs parameter is misconfigured, or that incoming documents lack date information, or contain dates and times in an unexpected format.

  • You can check the timestamp and outcome of the last validation run on various indexes, as well as the timestamp for the last run-time error (that is, any error returned from that index during the course of a query or index command).
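
The first three checks above can be expressed as simple threshold rules. The sketch below assumes that you have already read the documents, committed_documents, total_terms, and fieldcodes values from two GetStatus responses into dictionaries; the function name and the threshold values (0.5 and 2.0) are arbitrary assumptions, not recommended settings.

```python
from typing import Dict, List, Optional


def content_warnings(
    current: Dict[str, int],
    previous: Optional[Dict[str, int]] = None,
    compact_ratio: float = 0.5,   # assumed threshold
    growth_factor: float = 2.0,   # assumed threshold
) -> List[str]:
    """Return warning messages based on values taken from Content GetStatus."""
    warnings = []

    # Far fewer live documents than committed documents suggests compaction.
    if current["documents"] < compact_ratio * current["committed_documents"]:
        warnings.append("documents is much smaller than committed_documents; consider compacting")

    # A sudden increase compared to the previous sample suggests junk data.
    if previous is not None:
        for key, what in (("total_terms", "terms"), ("fieldcodes", "metadata fields")):
            if current[key] > growth_factor * previous[key]:
                warnings.append(f"{key} increased suddenly; check for junk {what}")

    return warnings
```

Calling content_warnings with the latest sample as current and an earlier sample as previous returns an empty list when none of the rules is triggered.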