Abstract
Obstacles hindering scientific research may be broadly categorized
into two separate but overlapping groups. The first category, concerned
mainly with issues of throughput, includes the challenges inherent in
the efficient management and visualization of large-scale datasets.
The second category includes difficulties innate to the task of
gaining insight from datasets of high complexity.
Query-driven visualization (QDV) is well suited for performing
analysis and visualization on datasets which are both large and
highly complex. Tools like FastBit leverage highly efficient
(in terms of both speed and compression) data management
techniques to rapidly identify and visualize regions of interest
within a dataset. These regions are user-specified and take the
form of boolean range queries. Because such regions tend to be
smaller subsets of the original dataset, time and effort spent on
analysis, visualization and interpretation are significantly reduced.
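
As a minimal sketch of what a boolean range query selects, the following Python example ANDs together one inclusive interval per variable and returns the indices of the matching records. The record layout, the variable names and the `select` helper are illustrative assumptions, and the linear scan merely stands in for the indexed lookup that a tool like FastBit would perform.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: one dict of variable name -> value per grid cell.
Record = Dict[str, float]
# A range query maps each constrained variable to an inclusive (low, high) interval.
RangeQuery = Dict[str, Tuple[float, float]]


def select(records: List[Record], query: RangeQuery) -> List[int]:
    """Return indices of records that satisfy every range condition (logical AND)."""
    hits = []
    for i, rec in enumerate(records):
        if all(lo <= rec[var] <= hi for var, (lo, hi) in query.items()):
            hits.append(i)
    return hits


# Made-up sample values, for illustration only.
records = [
    {"temperature": 300.0, "O2": 0.21},
    {"temperature": 1500.0, "O2": 0.05},
    {"temperature": 1800.0, "O2": 0.02},
    {"temperature": 1600.0, "O2": 0.15},
]
query = {"temperature": (1400.0, 2000.0), "O2": (0.0, 0.1)}
region = select(records, query)  # indices forming the region of interest
```

Only the second and third records satisfy both conditions, so the region of interest is a strict subset of the input.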
Contribution
The information provided by the generated solution of a boolean
range query helps to define spatial regions where specifically
defined events occur. Beyond indicating these regions, however, the
solution is a black box providing no additional information. Origins
and directions of entropy change for chemical reactions, gradient
directions and locations of flame fronts, vortex cores, etc. are all
examples of phenomena which are broadly characterizable through
boolean range queries, but little is understood of the interactions
and behavior which lie in the domain of these characterizations.
In such phenomena, it is the behavioral trends between variables, or
groups of variables, which are more important in providing insight
than the traits of individual variables. Thus, the challenge is to
identify these behavioral trends and utilize them to construct
coherent and meaningful visualizations which convey information
about the phenomenon of interest.
The novel contributions of this work are new techniques which
extend the capabilities of QDV by providing intuitive insight in
determining:
- how relationships between variables interact to generate the
phenomenon of interest in complex datasets, and
- what role other variables play in creating/altering
these interactions.
Approach
We utilize the cumulative distribution functions (CDFs) generated
by the solution of a given query. The CDF of a query is an
n-dimensional field where each dimension corresponds to one of
the n variables in the query. Each of the n fields indicates the
population of data from a given variable that satisfies a particular query.
Succinctly, the CDF of the query is an aggregate of 1-D histograms
(one for each variable).
In QDV, the solution set for a query is a list of records which
satisfy a set of variable-dependent range-restrictive conditions. The
CDFs for these variables are formed by integrating over the solution
space and accumulating the values given by the respective
functional mappings independently as a histogram. Examining the
CDFs of the query's variables reveals initial information about
statistical regions of interest.
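
The accumulation described above can be sketched as follows: for each query variable, the values in the solution set are binned into a 1-D histogram over that variable's query range, and a running sum turns the histogram into a normalized cumulative distribution. The bin count, the helper names (`histogram`, `cumulative`) and the sample values are assumptions made for illustration; this is not the implementation used in this work.

```python
from typing import Dict, List, Tuple


def histogram(values: List[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    return counts


def cumulative(counts: List[int]) -> List[float]:
    """Running sum of a histogram, normalized so the last entry is 1.0."""
    total = sum(counts)
    out: List[float] = []
    running = 0
    for c in counts:
        running += c
        out.append(running / total if total else 0.0)
    return out


# Hypothetical solution set: records that already satisfied the query.
solution = [
    {"temperature": 1500.0, "O2": 0.05},
    {"temperature": 1800.0, "O2": 0.02},
    {"temperature": 1900.0, "O2": 0.01},
    {"temperature": 1450.0, "O2": 0.09},
]
# The query's range restriction for each variable (assumed values).
ranges: Dict[str, Tuple[float, float]] = {
    "temperature": (1400.0, 2000.0),
    "O2": (0.0, 0.1),
}

hists: Dict[str, List[int]] = {}
cdfs: Dict[str, List[float]] = {}
for var, (lo, hi) in ranges.items():
    column = [rec[var] for rec in solution]
    hists[var] = histogram(column, lo, hi, bins=3)  # one 1-D histogram per variable
    cdfs[var] = cumulative(hists[var])
```

Each variable is processed independently, which matches the description of the query's CDF as an aggregate of per-variable 1-D histograms.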
We extend this analysis further to reveal trends between a query's
variables by defining correlation fields between pairs of variables.
These mappings exist both for variables expressed in the query and
those excluded from the query. The correlation field created by any
particular pair of variables is used in conjunction with the CDFs
of each of the query’s variables to reveal, both visually and
statistically, trends of behavior and interaction between the variables
defined in the given query.
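
The text does not spell out how a correlation field is computed, so the sketch below makes an explicit assumption: the field value at each sample is the Pearson correlation of the two variables within a small window around that sample. A 1-D sequence is used for brevity where the actual data would be a 3-D grid, and the function names, window radius and sample values are all hypothetical.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_field(a: List[float], b: List[float], radius: int = 1) -> List[float]:
    """Assumed definition: correlation of a and b in a window around each sample."""
    n = len(a)
    field = []
    for i in range(n):
        lo = max(0, i - radius)       # window is truncated at the boundaries
        hi = min(n, i + radius + 1)
        field.append(pearson(a[lo:hi], b[lo:hi]))
    return field


# Made-up profiles across a flame: oxygen is consumed while ethylene
# first builds up and then decays.
oxygen = [0.20, 0.18, 0.15, 0.10, 0.06, 0.05]
ethylene = [0.00, 0.01, 0.03, 0.02, 0.01, 0.00]
field = correlation_field(oxygen, ethylene, radius=1)
```

Under this assumption the field is negative where ethylene rises as oxygen falls and positive where both decrease together, which is the kind of pairwise trend a per-variable histogram cannot show.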
Results
We apply our approach to a dataset that models turbulent combustion for
a methane-based V-flame (see Figure 1). The simulated dataset, consisting
of 38 variables, is generated from the DRM-19 subset of the GRI-Mech 1.2
methane mechanism for chemical kinetics. This mechanism models the
combustion behavior of methane by considering 20 chemical species and 84
fundamental reactions. Our goal is to provide insight into the interactions
between these various species.
The following image depicts iso-surfaces of temperature at increasing
values. Each iso-surface is colored by the correlation field constructed from
the variables oxygen and ethylene. As the temperature increases (left to right,
top to bottom) we observe a full-spectrum color change from red to blue. Red
regions indicate high positive correlation between oxygen and ethylene, and
blue regions indicate strong negative correlation. Green regions indicate
independence between the two variables; they also mark regions of increased
entropy where flame-front regions exist.
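
One way to obtain a red-green-blue coloring of this kind is to map the correlation value linearly onto hue. The sketch below is an assumption about the color map, not a description of the transfer function used to produce the images; `correlation_to_rgb` is a hypothetical helper built on Python's standard `colorsys` module.

```python
import colorsys
from typing import Tuple


def correlation_to_rgb(c: float) -> Tuple[float, float, float]:
    """Map a correlation in [-1, 1] to RGB: +1 -> red, 0 -> green, -1 -> blue."""
    c = max(-1.0, min(1.0, c))   # clamp out-of-range input
    hue = (1.0 - c) / 3.0        # 0 (red) through 1/3 (green) to 2/3 (blue)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


red = correlation_to_rgb(1.0)     # strong positive correlation
green = correlation_to_rgb(0.0)   # no correlation
blue = correlation_to_rgb(-1.0)   # strong negative correlation
```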

This image depicts a cut-away view of the third time step
from the image above where temperature has been rendered through a correlation
field constructed from oxygen and ethylene. Here the iso-surface of temperature
(in green) is shown to "thread" the highest iso-surface values for ethylene
(in blue). The iso-volume formed by temperature is the region where the flame fronts exist.
