Data Express for distributed data (in both the MVS knowledge base configuration and the XDB knowledge base configuration)
lets you analyze data content by using the sampling feature.
This feature helps you understand the content of a data element and assign the element to the appropriate class. The sampling process
provides two types of results:
- Standard sampling: a list of the values of the data elements selected for the process, each with its number of occurrences.
  A person can easily check this result through the user interface.
- Compressed sampling: a statistical distribution of the values of the data elements selected for the process, represented in a grid
  indexed by the first character and the number of characters of each value contained in the data element. The result is designed
  to be used by an algorithm that assigns classes by analogy (data elements having the same class are assumed to have the same
  statistical distribution of values).
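The two result types can be illustrated with a minimal sketch. It is not the Data Express implementation: the function names
standard_sampling and compressed_sampling are hypothetical, and the grid is assumed to be a simple count of values per
(first character, number of characters) cell, which is one plausible reading of the description above.

```python
from collections import Counter


def standard_sampling(values):
    """Return (value, occurrences) pairs, most frequent first.

    Hypothetical helper: mirrors the "list of values with occurrences"
    described for standard sampling.
    """
    return Counter(values).most_common()


def compressed_sampling(values):
    """Return a grid that counts values per (first character, length) cell.

    Hypothetical helper: the cell layout is an assumption, not the
    product's actual storage format.
    """
    grid = Counter()
    for value in values:
        text = str(value)
        first_char = text[0] if text else ""  # empty values get an empty key
        grid[(first_char, len(text))] += 1
    return grid
```

For the sample values ROSSI, RUSSO, BIANCHI, ROSSI, the standard result lists ROSSI with 2 occurrences and the other values with 1
each, while the grid holds 3 values in the cell (R, 5) and 1 value in the cell (B, 7).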
The result of the sampling can be used to assign classes in two different ways:
- Manual assignment: check the standard sampling result, then assign classes to data elements through
the "Work with data elements" feature. Note that the user interface is the same as in Data Express for MVS.
- Assignment from sampling result: in this case, compressed sampling results are used. This process consists
of two separate steps:
  - First, define one or more prototypes for each class. This is done in the "Work with data elements"
    feature by dragging a "Selected data element attribute" to a "Class sample", which associates with the class the compressed
    sampling of a data element known to contain data of that class.
  - Then start an interactive process that assigns the class to the data elements included in its parameters,
    based on the similarity of the compressed sampling results. The process compares the compressed sampling
    of the class to the compressed sampling of each data element included in its parameters and, if they match,
    assigns the class to that data element. Note that Data Express for MVS uses a similar but distinct procedure,
    based on the submission of a batch job from the "Work with jobs" feature.
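The comparison step of the assignment from sampling result can be sketched as follows. This manual does not state how Data Express
measures whether two compressed samplings match, so everything here is an assumption for illustration: the function names, the
use of histogram intersection as the similarity measure, and the 0.8 threshold are all hypothetical.

```python
def normalize(grid):
    """Convert cell counts into relative frequencies that sum to 1."""
    total = sum(grid.values())
    if total == 0:
        return {}
    return {cell: count / total for cell, count in grid.items()}


def similarity(grid_a, grid_b):
    """Histogram intersection of two grids, from 0.0 (disjoint) to 1.0 (same shape).

    Assumed measure; the product's actual comparison is not documented here.
    """
    a, b = normalize(grid_a), normalize(grid_b)
    return sum(min(a[cell], b[cell]) for cell in a.keys() & b.keys())


def assign_classes(prototypes, elements, threshold=0.8):
    """Assign to each data element the class of its best-matching prototype.

    prototypes: list of (class_name, grid) pairs; a class may appear more
                than once, since a class can have several prototypes.
    elements:   dict mapping data element name to its grid.
    threshold:  hypothetical minimum similarity required for a match.
    """
    assignments = {}
    for element_name, grid in elements.items():
        best_class, best_score = None, 0.0
        for class_name, prototype_grid in prototypes:
            score = similarity(prototype_grid, grid)
            if score > best_score:
                best_class, best_score = class_name, score
        if best_class is not None and best_score >= threshold:
            assignments[element_name] = best_class
    return assignments
```

Normalizing the grids before comparing them lets data elements with different numbers of sampled values be compared on the
shape of their distribution rather than on raw counts. Data elements that match no prototype closely enough are left
unassigned.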
The sampling process is described in more detail in the "Using Distributed Sampling" chapter, and it is the subject
of a tutorial in the present manual.
The verification of standard sampling and compressed sampling results is described in detail in the "Work with data store" chapter
of the Front End User Guide.
The manual class assignment and the prototype definition are described in detail in the "Work with data elements" chapter
of the Front End User Guide.
The assignment from sampling result is described in detail in the present manual, in the "Sampling result" paragraph of the
"Importing classes" chapter.