The WHERE Constraint

The Acu4GL WHERE constraint is an external variable that gives the developer some control over the data searches that result from database queries. It can help to improve performance in some situations. This topic describes purpose of the WHERE constraint and shows how it is implemented.

COBOL data processing is based on keyed READ operations following a positioning operation. Records are read until the key value being processed changes. Because traditional COBOL data processing is based on a B+ tree file system, the overhead for such operations is relatively minor.

RDBMS data processing introduces a new level of complexity to data processing. The database's query optimizer receives the SQL query for the COBOL operation being performed and then builds a working set of data that satisfies that query. Because the database optimizer has many different possible execution methods, this can result in poor performance if the optimizer chooses a query execution path that is less than optimal.

Performance degradation may also result from the fact that queries generated by COBOL operations result in unbounded index queries. Unbounded queries are generated because COBOL positioning operations (Start Not Less Than and Start Not Greater Than) provide only one of the bounding conditions for the working set, instead of both an upper and lower boundary.

As an example, consider the case in which an application needs to process all items in a warehouse on aisle 17, shelf 8, and bin 2. If each of these items is a field in a key, the COBOL program might generate the following query for a READ operation:

Note: The following example applies to the Acu4GL for Oracle product. The SQL generated differs based on the interface.
 
SELECT * FROM warehouse_items WHERE 
   aisle = 17 and 
   shelf = 8 and 
   bin = 2 
ORDER BY aisle ASC, shelf ASC, bin ASC;

This query achieves the desired result but has one problem. For the COBOL program to end its processing, it must read a record that has a new value for bin. The COBOL application has no way of specifying an upper boundary for the read operation, so when all rows of data from bin 2 have been read, the Connector attempts to read the next record by generating the following query:

SELECT * FROM warehouse_items WHERE 
   aisle = 17 and 
   shelf = 8 and 
   bin > 2 
ORDER BY aisle ASC, shelf ASC, bin ASC;

This query causes the database query optimizer to gather all records pertaining to items on the remainder of shelf 8 to build its working set. This is excessive from the COBOL application's point of view, because the COBOL program needs only the first record of the working set to determine that it has finished processing.

This problem can be even more serious if the application is processing the last bin on a shelf. Because there are no more bins on that shelf, the query would drop down a level and generate the following:

SELECT * FROM warehouse_items WHERE 
   aisle = 17 and 
   shelf > 8 
ORDER BY aisle ASC, shelf ASC, bin ASC;

This would select all items on the remainder of that aisle of the warehouse, which could be a very large working set if each aisle had 130 shelves.

The database query optimizer normally does not build the entire working set if it has been properly tuned, but instead performs INDEXED READS to process the query. This means that the query optimizer traverses an index tree to retrieve the records, much as COBOL index files do, as opposed to using combinations of indexes and sort and merge operations.

It can be helpful to influence precisely which information is to be returned. If the precise scope of the record processing performed by the read operations is known at compile time (or before the query is executed), information retrieval can be more finely tuned.

Acu4GL provides a method by which the COBOL programmer can provide additional information to the database query optimizer by providing more specific selection information to the Connector. This selection information is added to the WHERE clause of the SQL queries generated by the Connector. This can be particularly useful in providing upper boundaries to queries being generated, with the result that the working set is smaller.

Upper boundaries can be provided on the key segments for a select, or any other selection criteria needed to constrain the working set to just the desired subset of rows. This additional information is added to generated queries with the AND condition. It is not possible for the application developer to specify a larger working set than would otherwise have resulted. The developer may only constrain the working set to a smaller subset.