The default IDOL Server unified installation includes all the main HPE IDOL components, and allows you to set up a small index and query your documents. However, most real-world uses of HPE IDOL outgrow this scenario fairly rapidly. Keeping all documents in a single instance of HPE IDOL Server becomes unsustainable in a growing system.
As the amount of data increases, so does the size of the HPE IDOL Server index on disk. If you continue to index data, the index might eventually use all the available disk space on the machine. In practice, HPE IDOL Server requires a certain amount of free space to continue normal operations, so it automatically stops indexing documents before the disk is full.
In normal operation, HPE IDOL Server holds speed-critical components of the index in memory. As the index size increases, so does the amount of physical memory required. When the memory required exceeds the physical memory available, parts of the process's virtual memory are swapped out to disk (for example, to a page file), which reduces performance. As the process memory usage increases, the amount of free memory available to the operating system to cache frequently accessed data from disk also decreases, which reduces performance further.
Query time tends to increase as the number of documents and the number of term occurrences (the volume of data to search) increase. When you add documents, HPE IDOL Server must merge the new data with the existing index, so indexing performance can also decrease as the HPE IDOL Server index grows.
You can distribute an HPE IDOL index by running multiple instances of the Content component, each of which indexes a different subset of documents. Each instance is a self-contained index that contains a fraction of the complete body of documents. At query time, the results from the individual instances are collated, so that a search covers the combined set of indexed documents.
The individual indexes are generally spread across multiple physical servers, allowing the disk, memory, and processor footprint of the combined index to expand beyond the limitations of a single machine. Even on a single machine, it is common practice to spread documents across several smaller instances of the Content component, rather than a single large instance. This approach can still improve performance, because the individual instances can work in parallel on the sub-indexes, and return results more quickly than a single large HPE IDOL Server instance.
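The following Python sketch illustrates the general idea of splitting documents across several self-contained indexes, querying them in parallel, and collating the results. It is an illustration only and does not use any HPE IDOL API: the `ContentShard` and `DistributedIndex` classes, the hash-based document placement, and the occurrence-count score are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
import zlib


class ContentShard:
    """Hypothetical stand-in for one Content instance: a tiny in-memory index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # document reference -> text

    def index(self, reference, text):
        self.docs[reference] = text

    def query(self, term, max_results):
        # Toy relevance score: how many times the term occurs in the document.
        term = term.lower()
        hits = []
        for reference, text in self.docs.items():
            score = text.lower().split().count(term)
            if score:
                hits.append((score, reference))
        return heapq.nlargest(max_results, hits)


class DistributedIndex:
    """Hypothetical front end that spreads documents over several shards."""

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, reference):
        # Assumed placement rule: a stable hash of the reference, so the
        # same document always goes to the same shard.
        key = zlib.crc32(reference.encode("utf-8"))
        return self.shards[key % len(self.shards)]

    def index(self, reference, text):
        self.shard_for(reference).index(reference, text)

    def query(self, term, max_results=10):
        # Each shard searches its own sub-index in parallel.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(
                pool.map(lambda shard: shard.query(term, max_results), self.shards)
            )
        # Collate the partial result lists into one ranked list.
        merged = [hit for partial in partials for hit in partial]
        return heapq.nlargest(max_results, merged)
```

Each shard returns at most `max_results` hits, so the collation step only has to merge a small number of candidates, regardless of how many documents each shard holds.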
In addition to splitting data between servers, a distributed index can hold multiple copies of a document in different servers, for purposes of load balancing, failover, or disaster recovery.
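As a rough sketch of how multiple copies of a document can support load balancing and failover, the following Python example writes every document to all copies and rotates queries between them, skipping any copy that is unavailable. The `Replica` and `MirroredIndex` classes and the round-robin policy are hypothetical and are not part of HPE IDOL; a real deployment must also bring a failed copy back up to date, which this sketch ignores.

```python
class Replica:
    """Hypothetical stand-in for one server that holds a full copy of the documents."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # document reference -> text
        self.available = True

    def index(self, reference, text):
        self.docs[reference] = text

    def query(self, term):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        term = term.lower()
        return sorted(
            ref for ref, text in self.docs.items() if term in text.lower().split()
        )


class MirroredIndex:
    """Hypothetical front end that keeps identical copies on several replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self._next = 0  # replica that receives the next query first

    def index(self, reference, text):
        # Simplification: every copy receives every document immediately.
        for replica in self.replicas:
            replica.index(reference, text)

    def query(self, term):
        # Load balancing: start from a different replica on each query.
        start = self._next
        self._next = (self._next + 1) % len(self.replicas)
        # Failover: if a replica does not respond, try the next one.
        for offset in range(len(self.replicas)):
            replica = self.replicas[(start + offset) % len(self.replicas)]
            try:
                return replica.name, replica.query(term)
            except ConnectionError:
                continue
        raise ConnectionError("no replica available")
```

For example, with two replicas `a` and `b`, successive queries alternate between them while both are available. If `a` becomes unavailable, every query is answered by `b`.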