Scale IDOL: Distributed Architecture
The Distributed Index Handler (DIH) can automatically send each server a subset of incoming documents. The Distributed Action Handler (DAH) performs query distribution and result collation.
See Also: Architecture Considerations
Configure Indexing (DIH)
For information about configuring the DIH for indexing, and how to decide on a distribution method, see Distributed Index Handler.
The DIH can distribute index actions to any component with an index port, including other DIHs. A system with multiple levels of DIH is generally referred to as a tiered architecture. A tiered indexing structure adds overhead and complexity, and is often unnecessary. The following points describe why you might consider a tiered architecture, and why it might not be necessary:
- A tiered architecture can reduce the number of child servers for any one DIH. A single DIH can distribute to several hundred child servers, so it is usually sufficient for all but the largest distributed IDOL deployments. However, depending on your deployment, you might find that two levels of DIH are more manageable than a single DIH with hundreds of child servers.
- You can use a tiered architecture to mix distribution and mirroring. However, you can also use server groups functionality to set up a single DIH to distribute and mirror content, which often eliminates the need for a tier of additional DIHs.
- In a tiered architecture, you can mix different distribution modes.
  CAUTION: An architecture with multiple distribution modes usually inherits the disadvantages of all modes, and in many cases different modes are incompatible. In general, Micro Focus does not recommend this approach.
- A tiered architecture might reduce network traffic in modes where the full IDX or XML document is sent to all child servers (that is, simple distribution mode). In this case, the parent DIH might distribute one copy of the document to each physical machine, and a DIH on each machine can distribute the documents between multiple IDOL Server instances on the machine. Alternatively, you can often reduce the network traffic by using a different distribution method.
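The network traffic point in the last item can be illustrated with some simple arithmetic. The following Python sketch is a simplified model, not a measurement of DIH behavior: it assumes that every child server receives the full document, that traffic between processes on the same machine is not counted, and that all machines host the same number of IDOL Server instances. The function and parameter names are hypothetical.

```python
def copies_over_network(machines: int, engines_per_machine: int, tiered: bool) -> int:
    """Count the copies of one document that the parent DIH sends over the network.

    Simplified model of simple distribution mode: every child server receives
    the full document, and traffic inside a machine is ignored.
    """
    if tiered:
        # One copy per physical machine; the DIH on that machine
        # passes the document on to the local instances.
        return machines
    # Without a tier, the parent sends one copy to every instance directly.
    return machines * engines_per_machine


flat = copies_over_network(machines=10, engines_per_machine=4, tiered=False)
tiered = copies_over_network(machines=10, engines_per_machine=4, tiered=True)
print(flat, tiered)  # 40 10
```

In this model the saving grows with the number of instances on each machine, and disappears when each machine runs a single instance.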
Using a tiered architecture increases the overhead of index status polling, because an IndexerGetStatus request can cascade down from a parent DIH to its child servers, and its children’s child servers. To avoid this, Micro Focus recommends configuring the following option for all DIHs below the top level:
[Server]
IndexerGetStatusPolling=False
You can optionally override this configuration manually for each IndexerGetStatus action by setting the PollChildren action parameter to True.
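To give a rough sense of the polling overhead, the following Python sketch counts how many status requests are handled across a tiered system for a single IndexerGetStatus request sent to the top-level DIH. It is an illustrative model only: the Node class, its polls_children flag (standing in for the IndexerGetStatusPolling setting), and the tree shape are assumptions made for this example, not part of the DIH.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A DIH or Content server in a hypothetical tiered indexing tree."""
    name: str
    polls_children: bool = True  # stands in for IndexerGetStatusPolling
    children: list["Node"] = field(default_factory=list)


def count_status_requests(node: Node) -> int:
    """Count the requests handled in the tree for one status request to `node`."""
    total = 1  # the request that this node receives
    if node.polls_children:
        for child in node.children:
            total += count_status_requests(child)
    return total


def build_tree(mid_dihs: int, engines_per_dih: int, lower_polling: bool) -> Node:
    """Build a top-level DIH with `mid_dihs` child DIHs, each with its own engines."""
    mids = [
        Node(f"dih-{i}", lower_polling,
             [Node(f"content-{i}-{j}") for j in range(engines_per_dih)])
        for i in range(mid_dihs)
    ]
    return Node("top-dih", True, mids)


print(count_status_requests(build_tree(5, 20, lower_polling=True)))   # 106
print(count_status_requests(build_tree(5, 20, lower_polling=False)))  # 6
```

With polling disabled below the top level, the count in this model depends only on the number of intermediate DIHs, not on the number of Content servers beneath them.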
Configure Querying (DAH)
For information about configuring the DAH for distributed querying, and how to decide on a distribution method, see Distributed Action Handler.
Generally, the broad scheme of DAH configuration is determined by the distribution method chosen in the DIH. In most cases, Micro Focus recommends using simple combinator mode to collate results from distributed servers.
As with indexing, you can create a tiered architecture of DAHs. Unlike DIH, which forwards a single index action to multiple children, DAH must receive multiple query responses and combine them into one. This step can incur a performance impact as the number of children increases, because DAH must hold a sufficient number of hits from each child server in its memory to ensure that it returns the correct documents to the user.
Consider forwarding a request for 10,000 hits to 100 children. The DAH might end up receiving, sorting, and combining up to 1,000,000 candidate documents.
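The arithmetic behind this figure, and the kind of merge involved, can be sketched in Python. This is a simplified illustration and not the DAH implementation: it assumes that each child returns its hits already sorted by descending score, and that a hit can be represented as a (score, reference) tuple. The function names are hypothetical.

```python
import heapq
from itertools import islice


def candidate_hits(children: int, max_results: int) -> int:
    """Upper bound on the hits a combinator might receive for one query."""
    return children * max_results


def combine(child_results, max_results):
    """Merge per-child hit lists (sorted by descending score) into the top hits."""
    # heapq.merge expects ascending input, so order by the negated score.
    merged = heapq.merge(*child_results, key=lambda hit: -hit[0])
    return list(islice(merged, max_results))


print(candidate_hits(children=100, max_results=10_000))  # 1000000

child_results = [
    [(0.9, "a"), (0.4, "b")],
    [(0.8, "c"), (0.7, "d")],
]
print(combine(child_results, 3))  # [(0.9, 'a'), (0.8, 'c'), (0.7, 'd')]
```

The second child contributes two of the three final hits, which is why the combinator cannot request fewer hits from each child without risking an incorrect result.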
Often the most logical way to organize query distribution is to configure a DAH on each physical server, reporting to a top-level DAH. This child DAH server pre-combines hits from all Content indexes on the server, ensuring that only one result set per machine is sent over the network. For deployments with a large number of physical servers, additional levels of DAH combining hits from groups of machines might also be required.
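The following Python sketch illustrates the per-machine pre-combining idea under simplified assumptions: hits are (score, reference) tuples with distinct scores, and each machine-level combinator keeps as many hits as the query requested. The function names and sample data are hypothetical and do not reflect how DAH is implemented.

```python
import heapq


def top_hits(result_sets, max_results):
    """Return the highest-scoring hits across several result sets, best first."""
    all_hits = (hit for results in result_sets for hit in results)
    return heapq.nlargest(max_results, all_hits)


def tiered_query(machines, max_results):
    """Pre-combine on each machine, then combine once more at the top level.

    `machines` is a list of machines, each a list of per-engine result sets.
    Returns the final hits and the number of result sets sent to the top level.
    """
    per_machine = [top_hits(engines, max_results) for engines in machines]
    return top_hits(per_machine, max_results), len(per_machine)


machines = [
    [[(0.91, "doc1"), (0.40, "doc2")], [(0.85, "doc3"), (0.10, "doc4")]],
    [[(0.77, "doc5"), (0.65, "doc6")], [(0.95, "doc7"), (0.30, "doc8")]],
]

flat_sets = [results for engines in machines for results in engines]
flat_hits = top_hits(flat_sets, 3)
tiered_hits, sets_sent = tiered_query(machines, 3)

print(tiered_hits == flat_hits)   # True
print(len(flat_sets), sets_sent)  # 4 2
```

Both layouts return the same hits in this example, but the tiered layout sends two result sets to the top level instead of four.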
TIP: From DAH 10.3.0, a DAH in simple combinator mode supports server groups in the same way as DIH, which allows you to eliminate a tier of DAH used purely for mirroring.
Configure IDOL Content
In the Content component, you do not need to set any special configuration parameters to specify that the component is part of a distributed system. However, Micro Focus strongly recommends that all child servers in the deployment use the same configuration files. Using a setup where different child servers have different settings might have unexpected or unpredictable results, particularly for field types and language configurations. The details of the IDOL Server configuration also play a large part in sizing the deployment.
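One way to check that child servers share a configuration is to compare the configuration files directly. The following Python sketch groups files by the SHA-256 digest of their contents. It is a deliberately strict illustration: a byte-for-byte comparison also flags differences that might be intentional in your deployment (for example, port numbers or paths), and the function names are hypothetical.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def group_by_content(config_paths):
    """Group configuration files by the SHA-256 digest of their contents."""
    groups = defaultdict(list)
    for path in config_paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return list(groups.values())


def configs_match(config_paths):
    """Return True if all the files have identical contents."""
    return len(group_by_content(config_paths)) <= 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the configuration file of each child server as an argument.
    for group in group_by_content(sys.argv[1:]):
        print("identical:", ", ".join(group))
```

If the script prints more than one group, at least one child server has a configuration file that differs from the others.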
When you run multiple Content components on one physical server, you might need to consider the impact on file system I/O. If all servers share storage on the same disk, the throughput for a single server is often less than if each server stored its index on a dedicated drive. These effects are particularly noticeable if multiple processes attempt to write data simultaneously. When servers share the same local storage media, you might need to configure a flush lock file for that drive:
[Server]
// This file must be the same for all engines with data on C:
FlushLockFile=C:/IDOL/Lockfile
Only the process that holds the lock on the flush lock file can write data to the disk.
NOTE: The flush lock file is intended only for use on local storage devices, and not on network storage.
For more information, see The Index Flush Process.
Micro Focus generally recommends that the servers in a distributed architecture are simple instances of the Content component, without IDOL Proxy or other components. The Content component does not require any other components for indexing documents.
Similarly, Micro Focus recommends that any intermediate DAH and DIH servers in a tiered architecture are plain component instances, rather than Distributed IDOL installations. You might want to configure IDOL Proxy with DAH and DIH at the top level of the architecture to provide a single point of access for your system. However, this option adds an additional HTTP step for all connections. Where practical, it is preferable to modify your client applications to use the separate DAH (query component) and DIH (index component) ACI ports.
Next: Size IDOL