by Allen Martin
When you paint an area with a paintbrush, you can think of the brush as defining a region of interest. The area that you paint changes color to show that you have selected it.
Definition #2 - Something you use to select data points.
In a visualization system, data points map to some type of graphical entity on the screen. Interesting structures in the visualization often correspond to interesting relationships in the data. One of the key theories of data visualization is that the user should have as much control as possible in deciding what data is important, rather than having the system make an arbitrary decision using cutoff limits. Brushing an interesting graphical structure is a way of giving the visualization system user feedback about what data is important.
Definition #3 - A region in N-space.
In multi-dimensional visualization, a brush is a region defined in a space of the same dimensionality as the data. Points that are within the region are said to be contained by the brush.
The standard accepted definition of the term brush in computer visualization is: a graphical entity that contains a subset of the data being visualized and is controlled by the user in a quick, interactive, and intuitive manner. Most work on brushing has used one- or two-dimensional brushes, although PRIM-9, one of the first visualization systems to use a brushing concept, had brushes of the same dimensionality as the data (up to 9 dimensions) [FIS75]. Low-dimensional brushes typically work in the display space of the visualization system. Ward studied high-dimensional brushes in his XmdvTool system [WAR94a] [WAR94b]. These brushes give the user more power in specifying what data should be covered by the brush.
A hyperbox (or N-dimensional rectangular parallelepiped) is an orthogonal region in N space. It can be defined by a min/max or a center/size in each dimension. A point lies within the hyperbox if its value in each dimension lies within the bounds for that dimension.
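The containment test for such a box can be sketched in a few lines of Python. This is only an illustration, not code from any particular system: the names `in_hyperbox` and `bounds_from_center` are hypothetical, the bounds are assumed to be inclusive, and `size` is assumed to mean the full extent of the box in each dimension.

```python
from typing import List, Sequence, Tuple


def in_hyperbox(point: Sequence[float],
                lo: Sequence[float],
                hi: Sequence[float]) -> bool:
    """Return True if the point lies within [lo[i], hi[i]] in every dimension."""
    if not (len(point) == len(lo) == len(hi)):
        raise ValueError("point and bounds must have the same dimension")
    # The point is contained only if every per-dimension test passes.
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))


def bounds_from_center(center: Sequence[float],
                       size: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Convert a center/size description into min/max bounds.

    Assumption: size[i] is the full width of the box in dimension i.
    """
    lo = [c - s / 2 for c, s in zip(center, size)]
    hi = [c + s / 2 for c, s in zip(center, size)]
    return lo, hi
```

For example, `in_hyperbox([1, 2, 3], [0, 0, 0], [5, 5, 5])` is true, while changing the second coordinate to 6 takes the point outside the box.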
A hyperellipse is an elliptical region defined in N space. It is specified by a central point and a radius in each dimension. The test for containment is done by computing a radial distance from the center point.
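One common way to realize this radial test is to scale each coordinate offset by the radius in that dimension, so that the boundary of the region sits at distance 1. The sketch below assumes that normalization and an axis-aligned ellipse; the function names are hypothetical and not taken from any specific tool.

```python
import math
from typing import Sequence


def radial_distance(point: Sequence[float],
                    center: Sequence[float],
                    radii: Sequence[float]) -> float:
    """Distance from the center, scaled so the brush boundary is at 1.0.

    Assumption: every radius is strictly positive.
    """
    if any(r <= 0 for r in radii):
        raise ValueError("radii must be positive")
    return math.sqrt(sum(((x - c) / r) ** 2
                         for x, c, r in zip(point, center, radii)))


def in_hyperellipse(point: Sequence[float],
                    center: Sequence[float],
                    radii: Sequence[float]) -> bool:
    """Return True if the point lies inside or on the elliptical region."""
    return radial_distance(point, center, radii) <= 1.0
```

With center `[0, 0]` and radii `[2, 1]`, the point `[1, 0]` is contained, while `[2, 1]` is not, even though each of its coordinates is individually within the radius for its dimension.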
The step edge boundary is the standard brush boundary used in visualization systems. A point is either inside or outside the brush, resulting in a discrete decision at each boundary.
With a ramped brush boundary, containment falls off linearly from the start of the boundary to the end of the boundary. A point halfway across the boundary would have a containment of 1/2.
Any normalized function of one variable can be used as a brush boundary. Examples could be logarithmic or Gaussian drop-off.
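The three boundary types can be expressed as one function that maps a distance to a degree of containment between 0 and 1. This is a minimal sketch under stated assumptions: the names `ramp`, `gaussian`, and `degree_of_containment` are hypothetical, the boundary is described by the distances at which it starts and ends, and the Gaussian width `sigma` is an arbitrary illustrative value. A step edge is the special case where the boundary has zero width.

```python
import math
from typing import Callable


def ramp(t: float) -> float:
    """Linear drop-off: 1 at the start of the boundary, 0 at the end."""
    return 1.0 - t


def gaussian(t: float, sigma: float = 0.4) -> float:
    """Gaussian drop-off, equal to 1 at t = 0 (sigma is an assumed value)."""
    return math.exp(-(t * t) / (2.0 * sigma * sigma))


def degree_of_containment(distance: float,
                          start: float,
                          end: float,
                          falloff: Callable[[float], float] = ramp) -> float:
    """Containment in [0, 1] for a point at the given distance.

    Fully contained before the boundary starts, not contained after it
    ends, and given by falloff(t) in between, where t runs from 0 to 1
    across the boundary. With start == end this is a step edge.
    """
    if distance <= start:
        return 1.0
    if distance >= end:
        return 0.0
    t = (distance - start) / (end - start)
    return falloff(t)
```

For a boundary that runs from distance 1.0 to 2.0, a point at distance 1.5 gets a containment of 0.5 with the ramp, matching the linear case described above.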
Highlighting is one of the most fundamental brush operations. Points that are contained by the brush are colored differently from other points to make them stand out.
Masking is the ability to keep selected points from being displayed by the system. Either the points covered by the brush or the points not covered by the brush may be masked. One of the problems of many visualization systems is clutter: when too many things are displayed at the same time, they tend to obscure each other. Masking makes it possible to remove uninteresting data so that the user can concentrate on the data that is important. Becker and Cleveland identified highlighting and masking as the two fundamental brush operations when they explored the principles of brushing [BEC87].
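Both operations reduce to applying a containment test to every point. The following sketch assumes the brush is available as a function `contains(point)` that returns true or false; the function names, the color strings, and the `hide_brushed` flag are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def highlight(points: List[Point],
              contains: Callable[[Point], bool],
              brushed_color: str = "red",
              default_color: str = "gray") -> List[Tuple[Point, str]]:
    """Pair every point with a color; brushed points get the highlight color."""
    return [(p, brushed_color if contains(p) else default_color)
            for p in points]


def mask(points: List[Point],
         contains: Callable[[Point], bool],
         hide_brushed: bool = False) -> List[Point]:
    """Return only the points that should still be displayed.

    hide_brushed=False hides the points outside the brush;
    hide_brushed=True hides the points covered by the brush.
    """
    return [p for p in points if bool(contains(p)) != hide_brushed]
```

Highlighting keeps every point on screen and changes only its color, whereas masking changes which points are drawn at all, which is why it helps with clutter.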
Haslett et al. used a graphical method to display the average of points currently contained by a brush [HAS91]. This average can be added to the display as simply another data point.
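Computing such an average is straightforward once the brushed points are known. The sketch below is not the method of [HAS91]; it simply assumes a per-dimension arithmetic mean and a hypothetical `contains(point)` test for the brush.

```python
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]


def brushed_mean(points: List[Point],
                 contains: Callable[[Point], bool]) -> Optional[List[float]]:
    """Per-dimension mean of the points covered by the brush.

    Returns None when the brush covers no points.
    """
    selected = [p for p in points if contains(p)]
    if not selected:
        return None
    n = len(selected)
    # zip(*selected) groups the values of each dimension together.
    return [sum(column) / n for column in zip(*selected)]
```

The result has the same number of dimensions as the data, so it can be passed to the same drawing code as any other data point.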
Multiple Brush Operations
By allowing the coexistence of multiple brushes it is possible to specify binary or even N-ary brush operations. This gives the user more control in specifying regions of interest. For example, by specifying the union of two brushes, a disjoint region in N space can be created.
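If each brush is represented as a containment test, brush operations become ordinary logical combinations of those tests. The sketch below assumes that representation; `box`, `union`, `intersection`, and `difference` are hypothetical helper names, and the hyperbox brush uses inclusive bounds.

```python
from typing import Callable, Sequence

Point = Sequence[float]
Brush = Callable[[Point], bool]


def box(lo: Sequence[float], hi: Sequence[float]) -> Brush:
    """A hyperbox brush with inclusive bounds in every dimension."""
    return lambda p: all(l <= x <= h for x, l, h in zip(p, lo, hi))


def union(*brushes: Brush) -> Brush:
    """Covers a point if any of the given brushes covers it (N-ary)."""
    return lambda p: any(b(p) for b in brushes)


def intersection(*brushes: Brush) -> Brush:
    """Covers a point only if all of the given brushes cover it (N-ary)."""
    return lambda p: all(b(p) for b in brushes)


def difference(a: Brush, b: Brush) -> Brush:
    """Covers the points in brush a that are not in brush b (binary)."""
    return lambda p: a(p) and not b(p)
```

For instance, `union(box([0, 0], [1, 1]), box([5, 5], [6, 6]))` covers two separate squares, which is a disjoint region that no single hyperbox could describe.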
Scatterplots are a widely used method of visualizing high dimensional data. Each pair of dimensions is used to create a parallel projection on a standard xy plot. All these plots are displayed together in a large grid.
Here is a screenshot showing the XmdvTool scatterplot display.
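The grid of pairwise projections can be sketched as follows. This is an illustration rather than XmdvTool's implementation: the function name is hypothetical, and the layout assumes that the column index selects the x dimension and the row index selects the y dimension, with every ordered pair (including the diagonal) given a cell.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def scatterplot_matrix(points: List[Point],
                       n_dims: int) -> Dict[Tuple[int, int], List[Tuple[float, float]]]:
    """Map each grid cell (row, col) to the 2-D projection drawn in it.

    Assumption: cell (row, col) plots dimension `col` on the x axis and
    dimension `row` on the y axis.
    """
    return {
        (row, col): [(p[col], p[row]) for p in points]
        for row in range(n_dims)
        for col in range(n_dims)
    }
```

A dataset with N dimensions therefore produces an N by N grid, so the number of plots grows quadratically with the number of dimensions.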
In computer visualization, the term glyph describes any graphical entity that data points map to. The attributes of the glyph, such as size, shape, and color, are controlled by the values of the data point in different dimensions. XmdvTool uses star glyphs. Each dimension maps to a radial axis protruding from a central point. For each data point, the value in a dimension determines the length of a "spoke" along that dimension's axis. Finally, a line is drawn connecting the ends of these spokes.
Here is a screenshot showing the XmdvTool glyph display.
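The geometry of a star glyph can be sketched as below. The details are assumptions made for illustration, not a description of XmdvTool's code: axes are spaced at equal angles starting from the positive x direction, values are scaled linearly between a per-dimension minimum and maximum, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def star_glyph(values: Sequence[float],
               lo: Sequence[float],
               hi: Sequence[float],
               center: Tuple[float, float] = (0.0, 0.0),
               radius: float = 1.0) -> List[Tuple[float, float]]:
    """Return the spoke end points of a star glyph, one per dimension.

    Drawing a closed line through the returned points, in order,
    gives the outline of the glyph.
    """
    n = len(values)
    vertices = []
    for i, (v, l, h) in enumerate(zip(values, lo, hi)):
        # Normalize the value to [0, 1]; a degenerate range maps to 0.
        t = (v - l) / (h - l) if h > l else 0.0
        angle = 2.0 * math.pi * i / n  # evenly spaced radial axes
        vertices.append((center[0] + radius * t * math.cos(angle),
                         center[1] + radius * t * math.sin(angle)))
    return vertices
```

A point with large values in every dimension yields a large, nearly regular polygon, while a point that is large in only one dimension yields a thin spike along that axis.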
In the parallel coordinates display method, each dimension corresponds to one of a set of uniformly spaced vertical axes. A data point maps to one point on each axis, positioned according to its value in that dimension. The data point is displayed by drawing a polyline across all of the axes, connecting these points.
Here is a screenshot showing the XmdvTool parallel coordinate display.
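The mapping from a data point to its polyline can be sketched as follows. The function name, the unit-sized drawing area, and the linear scaling of each axis between a per-dimension minimum and maximum are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def parallel_polyline(values: Sequence[float],
                      lo: Sequence[float],
                      hi: Sequence[float],
                      width: float = 1.0,
                      height: float = 1.0) -> List[Tuple[float, float]]:
    """Return the polyline vertices for one data point.

    Axis i is the vertical line at x = i * spacing; the vertex on that
    axis is placed at a height proportional to the normalized value.
    """
    n = len(values)
    spacing = width / (n - 1) if n > 1 else 0.0
    vertices = []
    for i, (v, l, h) in enumerate(zip(values, lo, hi)):
        # Normalize to [0, 1]; a degenerate range maps to 0.
        t = (v - l) / (h - l) if h > l else 0.0
        vertices.append((i * spacing, height * t))
    return vertices
```

In this display, a hyperbox brush corresponds to an interval on each axis, and a data point is contained when its polyline passes through every interval.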
Dimensional stacking is a hierarchical visualization technique in which dimensions are embedded within other dimensions. The highest-order dimensions divide the display into rectangular regions. Within each of these regions, the next-highest-order dimensions divide the region into further rectangular regions, and so on. For more information on this display technique, refer to [LEB91] and [TIP93].
Here is a screenshot showing the XmdvTool dimensional stacking display.
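The nesting can be illustrated by computing the grid cell that a data point falls into. This sketch rests on several assumptions that go beyond the description above: each dimension has already been discretized into a small number of bins, the dimensions are listed from highest order to lowest, and they alternate between the horizontal and vertical directions. The function name is hypothetical; see [LEB91] for the actual technique.

```python
from typing import Sequence, Tuple


def stacked_cell(bins: Sequence[int],
                 cardinalities: Sequence[int]) -> Tuple[int, int]:
    """Return the (column, row) of the innermost cell for one data point.

    bins[i] is the bin index of the point in dimension i, in
    range(cardinalities[i]). Even-indexed dimensions subdivide
    horizontally and odd-indexed dimensions subdivide vertically
    (an assumed convention).
    """
    col = row = 0
    for i, (b, k) in enumerate(zip(bins, cardinalities)):
        if not 0 <= b < k:
            raise ValueError("bin index out of range")
        if i % 2 == 0:
            col = col * k + b  # each outer column is split into k inner ones
        else:
            row = row * k + b  # each outer row is split into k inner ones
    return col, row
```

With cardinalities `[2, 2, 3, 3]`, the first two dimensions form a 2 by 2 outer grid and the last two form a 3 by 3 grid inside each outer cell, so the bins `[1, 0, 2, 1]` land in column 5, row 1 of the overall 6 by 6 grid.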