N-Dimensional Brushing and XmdvTool

by Allen Martin

What is a brush?

Definition #1 - Something you paint with.

When you use a paintbrush to paint an area, you can think of the brush as defining a region of interest. The area you paint changes color to reflect the fact that you have selected it.

Definition #2 - Something you use to select data points.

In a visualization system, data points map to some type of graphical entity on the screen. Interesting structures in the visualization often correspond to interesting relationships in the data. A key principle of data visualization is that the user should have as much control as possible in deciding which data is important, rather than having the system make an arbitrary decision based on cutoff limits. Brushing an interesting graphical structure is a way for the user to tell the visualization system which data is important.

Definition #3 - A region in N-space.

In multi-dimensional visualization, a brush is a region defined in the same N-dimensional space as the data. Points that lie within the region are said to be contained by the brush.

The commonly accepted definition of the term brush in computer visualization is: a graphical entity that contains a subset of the data being visualized and is controlled by the user in a quick, interactive, and intuitive manner. Most work on brushing has used one- or two-dimensional brushes, although PRIM-9, one of the first visualization systems to use a brushing concept, had brushes with the same dimensionality as the data (up to 9 dimensions) [FIS75]. Low-dimensional brushes typically operate in the display space of the visualization system. Ward studied high-dimensional brushes in his XmdvTool system [WAR94a] [WAR94b]; these brushes give the user more power in specifying which data the brush covers.

Brush Characteristics

The characteristics of a brush define how it interacts with the data; in particular, they determine which data points the brush contains.

Shape

Hyperbox

A hyperbox (or N-dimensional rectangular parallelepiped) is an axis-aligned rectangular region in N-space. It can be defined either by a minimum and maximum or by a center and size in each dimension. A point lies within the hyperbox if its value in each dimension lies within the bounds for that dimension.
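
A minimal Python sketch of the hyperbox containment test, accepting either description of the box. The names box_from_center and in_hyperbox are hypothetical (not XmdvTool's own code), and the bounds are assumed to be inclusive.

```python
from typing import List, Sequence, Tuple


def box_from_center(center: Sequence[float], size: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Convert a center/size description into per-dimension (min, max) bounds."""
    mins = [c - s / 2.0 for c, s in zip(center, size)]
    maxs = [c + s / 2.0 for c, s in zip(center, size)]
    return mins, maxs


def in_hyperbox(point: Sequence[float], mins: Sequence[float], maxs: Sequence[float]) -> bool:
    """True if every coordinate of `point` lies within that dimension's [min, max] bounds."""
    if not len(point) == len(mins) == len(maxs):
        raise ValueError("point and bounds must have the same number of dimensions")
    return all(lo <= x <= hi for x, lo, hi in zip(point, mins, maxs))


# A 3-D brush centered at 0.5 in every dimension, with a different size per dimension.
mins, maxs = box_from_center([0.5, 0.5, 0.5], [0.2, 0.4, 1.0])
```

Here the point (0.5, 0.5, 0.9) is contained, while (0.5, 0.8, 0.5) is not, because 0.8 exceeds the upper bound of 0.7 in the second dimension.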

Hyperellipse

A hyperellipse is an elliptical region defined in N-space. It is specified by a center point and a radius in each dimension. Containment is tested by computing the point's radial distance from the center, with each dimension scaled by its radius; the point is contained if this normalized distance is at most 1.
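
A minimal sketch of the hyperellipse test, assuming an axis-aligned ellipse with positive radii. Dividing each offset by that dimension's radius turns the ellipse into a unit sphere, so the test becomes "squared normalized distance at most 1". The function names are hypothetical.

```python
from typing import Sequence


def normalized_distance_sq(point: Sequence[float], center: Sequence[float],
                           radii: Sequence[float]) -> float:
    """Squared distance from the center, with each dimension divided by its radius."""
    return sum(((x - c) / r) ** 2 for x, c, r in zip(point, center, radii))


def in_hyperellipse(point: Sequence[float], center: Sequence[float],
                    radii: Sequence[float]) -> bool:
    """A point is contained if its normalized distance from the center is at most 1."""
    return normalized_distance_sq(point, center, radii) <= 1.0
```

With center (0, 0) and radii (2, 1), the point (2, 0) lies exactly on the boundary and is contained, but the corner (2, 1) of the surrounding box is not, which is where a hyperellipse brush differs from a hyperbox with the same center and extents.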

Boundary

In traditional visualization systems that allow brushing, a point lies either inside or outside the brush, never in between. By varying the brush boundary, it becomes possible for points to be partially covered by the brush. Combined with a brush operation whose effect depends on the amount of coverage, such as highlighting whose intensity follows containment, this lets the user quickly see not only the points covered by a brush but also the points near it.

Step edge

The step edge is the standard brush boundary used in visualization systems. A point is either entirely inside or entirely outside the brush, so containment is a binary decision at the boundary.

Ramp

With a ramped brush boundary, containment falls off linearly from 1 at the inner edge of the boundary region to 0 at its outer edge. A point halfway across the boundary region therefore has a containment of 1/2.
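
A minimal sketch of step and ramped boundaries for a hyperbox brush. Each boundary profile is a function of t, the normalized position across the boundary region (0 at the inner edge, 1 at the outer edge). The names are hypothetical, and combining the per-dimension values with min() (a point is only as contained as its worst dimension) is an assumption of this sketch, not something the text specifies.

```python
from typing import Callable, Sequence

Profile = Callable[[float], float]


def step_profile(t: float) -> float:
    """Step edge: contained (1) inside the brush core, not contained (0) anywhere past it."""
    return 1.0 if t <= 0.0 else 0.0


def ramp_profile(t: float) -> float:
    """Ramp: containment falls linearly from 1 at t = 0 to 0 at t = 1."""
    return min(1.0, max(0.0, 1.0 - t))


def box_containment(point: Sequence[float], mins: Sequence[float], maxs: Sequence[float],
                    width: float, profile: Profile) -> float:
    """Containment of `point` in a hyperbox with core [mins, maxs] and a boundary
    region `width` wide around it; per-dimension values are combined with min()."""
    if width <= 0.0:
        raise ValueError("boundary width must be positive")
    result = 1.0
    for x, lo, hi in zip(point, mins, maxs):
        outside = max(lo - x, x - hi, 0.0)  # distance beyond the core in this dimension
        result = min(result, profile(outside / width))
    return result
```

For a unit-square core with a boundary 0.2 wide, the point (1.1, 0.5) sits halfway across the boundary: the ramp gives it a containment of about 0.5, while the step edge gives 0.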

Other

Any normalized function of one variable can be used as a brush boundary profile, for example a logarithmic or Gaussian drop-off.
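
As one illustration, a Gaussian drop-off can be used as a boundary profile. This sketch assumes "normalized" means the profile equals 1 at the inner edge (t = 0) and 0 at the outer edge (t = 1), so the raw Gaussian is shifted and rescaled to hit both values; the sharpness parameter and the function name are hypothetical.

```python
import math


def gaussian_profile(t: float, sharpness: float = 4.0) -> float:
    """Gaussian drop-off across the boundary, rescaled so that f(0) = 1 and f(1) = 0.

    t is the normalized position across the boundary region (0 = inner edge,
    1 = outer edge); a larger `sharpness` makes containment fall off sooner.
    """
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    floor = math.exp(-sharpness)  # raw Gaussian value at t = 1, subtracted so f(1) = 0
    return (math.exp(-sharpness * t * t) - floor) / (1.0 - floor)
```

With the default sharpness, a point halfway across the boundary gets a containment of about 0.36 rather than the ramp's 1/2. Any other function with the same signature, such as a logarithmic drop-off, could be substituted.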

Brush Operations

Points selected by the brush are a subset of the entire data set. Any numerical or visualization operation that can be performed on the entire data set can also be performed on the data selected by the brush.

Highlighting

Highlighting is one of the most fundamental brush operations. Points that are contained by the brush are colored differently from other points to make them stand out.
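
A minimal sketch of highlighting driven by containment: each point's color is blended from a base color toward a highlight color in proportion to how much the brush contains it. The gray and red colors and the function name are assumptions for illustration; with a step-edge brush this reduces to plain two-color highlighting.

```python
from typing import Tuple

RGB = Tuple[float, float, float]

GRAY: RGB = (0.6, 0.6, 0.6)  # assumed color for unbrushed points
RED: RGB = (1.0, 0.0, 0.0)   # assumed highlight color


def highlight_color(containment: float, base: RGB = GRAY, highlight: RGB = RED) -> RGB:
    """Blend from `base` to `highlight` in proportion to the point's containment.

    With a step-edge brush containment is 0 or 1, so points are either plain or
    fully highlighted; with a ramped boundary, nearby points get an intermediate color.
    """
    c = min(1.0, max(0.0, containment))
    r, g, b = (lo + (hi - lo) * c for lo, hi in zip(base, highlight))
    return (r, g, b)
```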

Masking

Masking hides points from the display. Either the points covered by the brush or the points not covered by it may be masked. A common problem in visualization systems is clutter: when too many things are displayed at the same time, they tend to obscure each other. Masking removes uninteresting data so the user can better concentrate on the data that is important. Becker and Cleveland identified highlighting and masking as the two fundamental brush operations when they explored the principles of brushing [BEC87].
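
A minimal sketch of both masking modes, assuming the brush is given as a predicate that reports whether a point is covered. The names visible_points and brush, and the sample data, are hypothetical.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def visible_points(points: List[Point], in_brush: Callable[[Point], bool],
                   mask_inside: bool) -> List[Point]:
    """Return the points left on screen after masking.

    mask_inside=True hides the points covered by the brush;
    mask_inside=False hides every point the brush does not cover.
    """
    return [p for p in points if in_brush(p) != mask_inside]


def brush(p: Point) -> bool:
    """Hypothetical brush: covers points whose first coordinate exceeds 0.5."""
    return p[0] > 0.5


data = [(0.1, 0.2), (0.7, 0.9), (0.4, 0.6), (0.95, 0.1)]
```

Masking the brushed points leaves (0.1, 0.2) and (0.4, 0.6) on screen; masking the unbrushed points leaves the other two.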

Moving Average

Haslett et al. used a graphical method to display the average of the points currently contained by a brush [HAS91]. This average can be added to the display simply as another data point.
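
A minimal sketch of the idea, not Haslett et al.'s implementation: compute the per-dimension mean of the brushed points and append it to the displayed points. Calling it again after each brush movement keeps the average current. The function and brush names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]


def brush_average(points: List[Point], in_brush: Callable[[Point], bool]) -> Optional[List[float]]:
    """Per-dimension mean of the points the brush currently contains (None if it contains none)."""
    selected = [p for p in points if in_brush(p)]
    if not selected:
        return None
    n_dims = len(selected[0])
    return [sum(p[d] for p in selected) / len(selected) for d in range(n_dims)]


def in_left_half(p: Point) -> bool:
    return p[0] < 5.0  # hypothetical brush


data = [[0.0, 0.0], [2.0, 4.0], [10.0, 10.0]]
average = brush_average(data, in_left_half)
# Plot the average like any other data point.
display_points = data + [average] if average is not None else data
```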

Multiple Brush Operations

By allowing multiple brushes to coexist, it is possible to specify binary or even N-ary brush operations, giving the user more control in specifying regions of interest. For example, the union of two non-overlapping brushes creates a disjoint region in N-space.
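
A minimal sketch of N-ary brush combination, assuming each brush returns a containment value in [0, 1]. Using max for union, min for intersection, and min(a, 1 - b) for difference are the standard fuzzy-set operators; they are an assumption here, chosen because they also behave sensibly for partially contained points. All names are hypothetical.

```python
from typing import Callable, Sequence

Point = Sequence[float]
Brush = Callable[[Point], float]  # returns containment in [0, 1]


def box_brush(mins: Sequence[float], maxs: Sequence[float]) -> Brush:
    """Step-edge hyperbox brush: containment 1 inside the bounds, 0 outside."""
    def contains(p: Point) -> float:
        return 1.0 if all(lo <= x <= hi for x, lo, hi in zip(p, mins, maxs)) else 0.0
    return contains


def brush_union(*brushes: Brush) -> Brush:
    return lambda p: max(b(p) for b in brushes)


def brush_intersection(*brushes: Brush) -> Brush:
    return lambda p: min(b(p) for b in brushes)


def brush_difference(a: Brush, b: Brush) -> Brush:
    return lambda p: min(a(p), 1.0 - b(p))


left = box_brush([0.0, 0.0], [1.0, 1.0])
right = box_brush([3.0, 3.0], [4.0, 4.0])
either = brush_union(left, right)  # a disjoint region made of two separate boxes
```

The union covers points in either box but not the gap between them, such as (2, 2).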

Linking

Linking is the ability to select data in one display and see the same data selected in another display. This is useful when multiple methods of visualization are being used in conjunction. Becker and Cleveland [BEC87], Haslett et al. [HAS91], and Ward [WAR94a] have all used linking in visualization systems.
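
One common way to implement linking is a shared selection object that notifies every registered view when the selection changes. The sketch below assumes that design; the LinkedSelection class and the two logging "views" are hypothetical stand-ins, not XmdvTool's architecture.

```python
from typing import Callable, List, Set

Listener = Callable[[Set[int]], None]


class LinkedSelection:
    """Shared selection of data-point indices; every registered view is notified on change."""

    def __init__(self) -> None:
        self._selected: Set[int] = set()
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def select(self, indices: Set[int]) -> None:
        """Called by whichever view the user brushed in; pushes the change to all views."""
        self._selected = set(indices)
        for listener in self._listeners:
            listener(set(self._selected))  # give each view its own copy

    @property
    def selected(self) -> Set[int]:
        return set(self._selected)


# Two hypothetical views that simply record what they were told to highlight.
scatterplot_log: List[Set[int]] = []
parallel_coords_log: List[Set[int]] = []

selection = LinkedSelection()
selection.subscribe(scatterplot_log.append)
selection.subscribe(parallel_coords_log.append)
selection.select({1, 3})  # e.g. the user brushes points 1 and 3 in the scatterplot view
```

Because views only talk to the shared selection, a new display method can be linked by subscribing it, without changes to the existing views.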

XmdvTool

XmdvTool stands for "X11 Multi-Dimensional Visualization Tool" and was developed by Ward [WAR94a] as a platform for combining different methods of visualizing high dimensional data. It is composed of four visualization techniques: scatterplots, glyphs, parallel coordinates, and dimensional stacking.

Scatterplots

Scatterplots are a widely used method of visualizing multivariate data. Each pair of dimensions is used to create a parallel projection onto a standard x-y plot, and all of these plots are displayed together in a large grid.

Here is a screenshot showing the XmdvTool scatterplot display.
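
A minimal sketch of the projections behind a scatterplot grid: for N dimensions it produces N(N-1) plots, one per ordered pair of distinct dimensions. Omitting the diagonal and keying plots by (x dimension, y dimension) are assumptions of this sketch; XmdvTool's exact layout is not described here.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def scatterplot_matrix(data: List[Sequence[float]]) -> Dict[Tuple[int, int], List[Tuple[float, float]]]:
    """Project the data onto every ordered pair of distinct dimensions.

    Key (i, j) holds the 2-D points (x_i, x_j), i.e. the plot with dimension i on the
    x axis and dimension j on the y axis; the diagonal i == j is omitted.
    """
    n_dims = len(data[0])
    return {(i, j): [(row[i], row[j]) for row in data]
            for i, j in permutations(range(n_dims), 2)}


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
grid = scatterplot_matrix(data)
```

For three dimensions this yields six plots; cell (0, 2) holds (1.0, 3.0) and (4.0, 6.0).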

Glyphs

In computer visualization, the term glyph describes any graphical entity that data points map to. Attributes of the glyph such as size, shape, and color are controlled by the data point's values in different dimensions. XmdvTool uses star glyphs: each dimension maps to a radial axis protruding from a central point, and for each data point, the value in that dimension sets the length of a "spoke" along that axis. Finally, a line connecting the ends of the spokes is drawn.

Here is a screenshot showing the XmdvTool glyph display.
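
A minimal sketch of star-glyph geometry, assuming values are already normalized to [0, 1], spokes are evenly spaced starting at the positive x axis and proceeding counterclockwise, and glyph size is a single radius. These conventions and the function name are assumptions, not XmdvTool's drawing code.

```python
import math
from typing import List, Sequence, Tuple


def star_glyph(values: Sequence[float], radius: float = 1.0,
               center: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[float, float]]:
    """Return the end point of each spoke of a star glyph.

    Dimension k is drawn along a spoke at angle 2*pi*k/n, and its (already
    normalized, 0..1) value scales the spoke length. Joining the returned points
    in order, and back to the first, gives the glyph's outline.
    """
    n = len(values)
    cx, cy = center
    ends = []
    for k, v in enumerate(values):
        angle = 2.0 * math.pi * k / n
        length = v * radius
        ends.append((cx + length * math.cos(angle), cy + length * math.sin(angle)))
    return ends


outline = star_glyph([1.0, 1.0, 1.0, 1.0])
```

A four-dimensional point with every value at its maximum produces spoke ends at (1, 0), (0, 1), (-1, 0), and (0, -1), up to floating-point rounding, so its outline is a diamond.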

Parallel Coordinates

In the parallel coordinates display method, each dimension corresponds to one of a set of uniformly spaced vertical axes. A data point maps to one point on each axis, positioned according to its value in that dimension. The data point is displayed by drawing a polyline across all of the axes that connects these points.

Here is a screenshot showing the XmdvTool parallel coordinate display.
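
A minimal sketch of the parallel-coordinates mapping for a single data point, assuming each dimension is linearly scaled from a known [min, max] range to the axis height and that a constant dimension is drawn mid-axis. The function name and the unit-square display are hypothetical.

```python
from typing import List, Sequence, Tuple


def parallel_coords_polyline(point: Sequence[float], mins: Sequence[float], maxs: Sequence[float],
                             width: float = 1.0, height: float = 1.0) -> List[Tuple[float, float]]:
    """Vertices of the polyline for one data point.

    Axis k sits at x = k * width / (n - 1); the point's value in dimension k is
    scaled from [mins[k], maxs[k]] to a height in [0, height] along that axis.
    """
    n = len(point)
    spacing = width / (n - 1) if n > 1 else 0.0
    vertices = []
    for k, (value, lo, hi) in enumerate(zip(point, mins, maxs)):
        t = (value - lo) / (hi - lo) if hi > lo else 0.5  # constant dimension: place mid-axis
        vertices.append((k * spacing, t * height))
    return vertices
```

For example, the point (0, 5, 10) with every range [0, 10] becomes the rising line through (0, 0), (0.5, 0.5), and (1, 1).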

Dimensional Stacking

Dimensional stacking is a hierarchical visualization technique where dimensions are embedded within other dimensions. The highest order dimensions divide the display into rectangular regions. Within each of these regions, the next highest order dimensions divide the display into further rectangular regions, and so on. For more information on this display technique, refer to [LEB91] and [TIP93].

Here is a screenshot showing the XmdvTool dimensional stacking display.
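
Because each outer dimension's region is subdivided by the next dimension, the cell a point lands in along one screen axis is a mixed-radix number whose outermost dimension is the most significant digit. The sketch below assumes equal-width binning of each dimension and a caller-chosen split of dimensions between the horizontal and vertical axes; the function names are hypothetical, and [LEB91] and [TIP93] describe the actual technique.

```python
from typing import List, Sequence, Tuple


def discretize(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value in [lo, hi] to one of n_bins equal-width bins (clamped at the ends)."""
    if hi <= lo:
        return 0
    b = int((value - lo) / (hi - lo) * n_bins)
    return min(max(b, 0), n_bins - 1)


def stacked_index(bins: Sequence[int], sizes: Sequence[int]) -> int:
    """Mixed-radix position along one screen axis; bins[0] belongs to the outermost dimension."""
    index = 0
    for b, size in zip(bins, sizes):
        index = index * size + b
    return index


def stack_cell(point: Sequence[float], x_dims: List[int], y_dims: List[int],
               ranges: Sequence[Tuple[float, float]], n_bins: Sequence[int]) -> Tuple[int, int]:
    """Column and row of the display cell for `point`.

    x_dims / y_dims list the dimensions embedded horizontally / vertically,
    from the outermost (highest order) to the innermost.
    """
    def axis_index(dims: List[int]) -> int:
        bins = [discretize(point[d], ranges[d][0], ranges[d][1], n_bins[d]) for d in dims]
        return stacked_index(bins, [n_bins[d] for d in dims])
    return axis_index(x_dims), axis_index(y_dims)
```

With three bins per dimension, an outer bin of 2 and an inner bin of 1 give column 2 * 3 + 1 = 7 of the 9 columns.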

=====================================================================

References

[BEC87]: Becker, Richard A., and Cleveland, William S. (1987). "Brushing Scatterplots," Technometrics, vol. 29, no. 2, pp. 127-142.
[FIS75]: Fisherkeller, Mary A., Friedman, Jerome H., and Tukey, John W. (1975). "PRIM-9: An Interactive Multidimensional Data Display and Analysis System," in Dynamic Graphics for Statistics, W. S. Cleveland and M. E. McGill, eds., pp. 91-109. Pacific Grove, CA: Wadsworth & Brooks/Cole.
[HAS91]: Haslett, J., Bradley, R., Craig, P., Unwin, A., and Wills, G. (1991). "Dynamic Graphics for Exploring Spatial Data with Application to Locating Global and Local Anomalies," The American Statistician (Statistical Computing section), vol. 45, no. 3, pp. 234-242.
[LEB91]: LeBlanc, J. (1991). "N-Land: A Visualization Tool for N-Dimensional Data," Master's Thesis, WPI.
[TIP93]: Tipnis, R. (1993). "Visualization of High Dimensional Data Sets: Enhancements to N-Land," Master's Thesis, WPI.
[WAR94a]: Ward, Matthew O. (1994). "XmdvTool: Integrating Multiple Methods for Visualizing Multivariate Data," Proceedings of Visualization '94.
[WAR94b]: Ward, Matthew O. (1994). "N-Dimensional Brushes: Gaining Insights into Relationships in N-D," Working Document.

=====================================================================

Allen Martin / amartin@cs.wpi.edu