Size IDOL
A major part of the design of a distributed IDOL architecture depends on:
- How many documents you need to index into each instance of Content.
- How many Content servers must run on each physical server (or equivalently, how many documents in total you need to index on each physical server).
If you know the total number of documents, the total number of physical servers required follows easily from these two points. However, in many cases you do not know the final number of documents, but instead need to scale the system over time. In this case, you must choose a DIH distribution mode that allows you to dynamically add child servers.
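When the total is known, the calculation can be sketched as follows. This is a minimal illustration only: the function name and the example figures are hypothetical, and the per-instance limit must come from your own testing.

```python
def physical_servers_needed(total_documents, docs_per_content, contents_per_server):
    """Estimate how many physical servers are needed to hold all documents.

    All three inputs are assumptions that you supply:
    - total_documents: total number of documents to index
    - docs_per_content: maximum documents per Content instance
    - contents_per_server: Content instances per physical server
    """
    if min(total_documents, docs_per_content, contents_per_server) <= 0:
        raise ValueError("all inputs must be positive")

    docs_per_server = docs_per_content * contents_per_server
    # Round up with integer arithmetic: a partly filled server is still a server.
    return (total_documents + docs_per_server - 1) // docs_per_server


# Hypothetical figures: 120 million documents, 10 million per Content
# instance, 4 Content instances per physical server.
servers = physical_servers_needed(120_000_000, 10_000_000, 4)
print(servers)  # 3
```

Because the result rounds up, a single document over a multiple of the per-server capacity requires one more physical server.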
The upper limit on the number of documents for a child server is likely to be determined by performance requirements. Multiple smaller servers tend to perform better than a single large one, because they can work in parallel on queries.
The key question is how many documents a server can index while still being small enough to meet your performance requirements. This value depends on the local IDOL configuration, the type of data that you want to index, the type of queries that you will use, and the pattern of indexing and querying. You can derive very rough estimates by comparing with similar systems on similar hardware, but the only reliable way to get useful statistics is to test your proposed system. That is, run an instance of IDOL in the configuration you intend to use, on candidate hardware, and monitor performance while you index realistic data and send realistic queries.
When simulating load and assessing performance, you might also want to consider the following points:
- Do you need to query servers during indexing, or can you set up indexing to occur only during quiet times when query load is low (for example, overnight)?
- How many queries do you expect IDOL to handle simultaneously?
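One way to gather figures for a given level of simultaneous queries is a small timing harness such as the sketch below. It is not part of IDOL: `measure_query_latency` and `send_query` are hypothetical names, and `send_query` stands for whatever function you write to send one realistic query to your test system and wait for the response.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_query_latency(send_query, queries, concurrency):
    """Send queries with a fixed number of concurrent workers.

    send_query is a caller-supplied (hypothetical) function that sends one
    query and blocks until the response arrives. Returns latency statistics
    in seconds.
    """
    queries = list(queries)
    if not queries:
        raise ValueError("at least one query is required")

    def timed(query):
        start = time.perf_counter()
        send_query(query)
        return time.perf_counter() - start

    # Each worker thread sends one query at a time, so max_workers bounds
    # the number of queries in flight simultaneously.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, queries))

    # Nearest-rank 95th percentile of the sorted latencies.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "count": len(latencies),
        "median": statistics.median(latencies),
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }
```

Running the harness once while indexing is in progress and once while it is idle, with the same queries and concurrency, gives a direct comparison for the first point above.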
These statistics can give you a sensible maximum size for a single instance of Content, and also its likely system resource usage (footprint on disk and process memory size). After these values are known, you can determine the two values listed at the start of this section. The total disk usage by all Content servers on a machine cannot exceed the space available. As outlined in the introduction, normal memory usage by all IDOL processes on a server should ideally fit in the machine’s physical RAM, with enough free space remaining for the OS to effectively cache file system data.
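The disk and memory constraints can be combined into a rough per-machine limit, as in the following sketch. The function name, parameters, and figures are all hypothetical. The sketch counts only Content instances, so the memory used by any other IDOL processes on the machine must be included in the reserved amount.

```python
def max_content_instances(disk_available_gb, ram_total_gb,
                          disk_per_instance_gb, ram_per_instance_gb,
                          ram_reserved_gb):
    """Estimate how many Content instances fit on one physical server.

    ram_reserved_gb is memory kept free for the OS file system cache and
    for any other processes on the machine (an assumption you choose).
    """
    if disk_per_instance_gb <= 0 or ram_per_instance_gb <= 0:
        raise ValueError("per-instance usage must be positive")

    # Disk limit: total index footprint cannot exceed the space available.
    by_disk = int(disk_available_gb // disk_per_instance_gb)

    # Memory limit: instances must fit in RAM after the reserve is set aside.
    usable_ram_gb = ram_total_gb - ram_reserved_gb
    by_ram = int(usable_ram_gb // ram_per_instance_gb) if usable_ram_gb > 0 else 0

    # The tighter of the two constraints determines the result.
    return max(0, min(by_disk, by_ram))


# Hypothetical machine: 2000 GB disk, 128 GB RAM, 32 GB reserved.
# Hypothetical measured footprint per instance: 300 GB disk, 20 GB RAM.
print(max_content_instances(2000, 128, 300, 20, 32))  # 4
```

In this example the disk allows six instances but memory allows only four, so memory is the limiting resource.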