Animating Multidimensional Scaling to Visualize Large N-Dimensional Data Sets

Chris L. Bentley
Computer Science Department
Worcester Polytechnic Institute

Overview

Goal: Present enhancements to Multidimensional Scaling (MDS) to extend technique to large data sets ( points) with many dimensions (up to 50)

What is N-dimensional data?
How can we analyse it?
How can we visualize it?
Focus on one technique: Multidimensional Scaling
Enhancements to traditional MDS: MAVIS
Conclusions & Future Work

Visualization

Science produces a numeric representation of our world
We search for: patterns, clusters, anomalies, outliers, things which make a difference to us.
Raw numeric data is difficult to interpret
Scientific Visualization translates numbers Images
This plays to the innate strengths of human visual perception

What is N-dimensional data?

We are familiar with 2 dimensional data defined by coordinate pairs (x,y). The Cartesian plane makes axes explicit, depicting entire data space. We can infer coordinate values of each point from the axes

2-column table makes values explicit - leaves us to imagine data space

N-dimensional data consists of points in a space with N coordinate axes

In table form, N-dimensional data is a mxn matrix. Each row i defines a data point and is an n-vector. Each column is an axis or dimension.

Analysing N-Dimensional Data

How do we find information in an mxn matrix of numbers?

Statistical methods - reduce data to summarizing values: mean, variance of columns or of entire matrix, measures of correlation or covariance between dimensions.
Multivariate regression analysis - finds the line, plane, hyperplane which summarizes the data with least error
Cluster analysis - groups data points with closest n-vectors, reduces number of data points. Cluster centroids can be treated as proxy data points

Visualizing N-Dimensional Data

How do we picture data with dimensions? There are four basic approaches

look at only a subset of the dimensions (e.g. all pairs) Multiple views can be displayed simultaneously - scatterplot matrices
rearrange axes so that they are no longer orthogonal - star glyphs, parallel coordinates, dimensional stacking (XmdvTool)
provide interactive manipulation of the data, so that user can explore data from multiple perspectives (PRIM-9)
reduce dimensionality of the data so 2D/3D graphics techniques can be used

Reducing Dimensionality

Goal: to reduce mxn matrix to mxp, where

Several methods can be used:

Factor Analysis - finds subset of columns (dimensions) which explain majority of variance in all columns of matrix
Principal Component Analysis - finds subset of columns (dimensions) which explain majority of correlation in all columns of matrix
Dimensional Clustering - combine dimensions with greatest correlation - each iteration reduces dimensionality by 1
Kohonen Self Organizing Feature Maps - uses neural network to arrive at matrix of weights that summarizes all n columns
Multidimensional Scaling (MDS) - developed between 1938 - 1962 by Richardson, Torgerson, Shepard, Kruskal. Initially used for displaying multivariate data from psychological testing

Multidimensional Scaling: Central Idea

measure all interpoint distances in n-dimensional data space
place 2D display points so they obey same mutual distance relationships

Is a perfect 2D configuration always possible? No.

Distances

distances in any dimension are always scalar values - they are one dimensional, thus they provide a bridge from nD to 2D
similarity data is often analogous to distance
2D distance formula is familiar Pythogorean Theorem:
generalized distance formula in n-dimensions:
even more general Minkowski metric:

Stress

If the nD distance equals the 2D distance then points i and j are correctly placed relative to one another
If then there is stress between the data and the 2D representation

For all points in data set:
Or the variant:

Minimizing Stress

MDS seeks a configuration of 2D display points that is a global minimum of stress
This can be calculated using: gradient descent, simplex method, simulated annealing, or other optimization techniques

Equivalent technique is the force based vector approach:

1       calculate nD distances between data points
2       generate starting set of 2D display points
3       calculate 2D distances between display points
4       while stress decreases
4.1         calculate stress for each distance pair
4.2         use stress to scale vector between points
4.3         update point positions by summing vectors

Figure shows vectors acting on point A, and cummulative vector in which point A will be updated

Measuring MDS

After t iterations of MDS, points will have moved into a low stress configuration
How good is a given MDS solution? How do we measure this?
1. Shepard plot - 2D distances vs. nD distances
2. Stress plot - graph of overall stress should drop
3. Correlation plot - graph of correlation between 2D distances and nD distances should approach 1.0 Calculated using Pearson's r:

Figure 1. shows Shepard plot
Figure 2. shows Stress graph
Figure 3. shows Correlation graph

Problems With MDS

Between m data points there are distance relationships
Thus stress is a function of variables
Time complexity: each iteration of MDS is quadratic. In total MDS is between and
Space complexity: to store the distance matrices for data points would require terabytes of memory (7 million megabytes)
As data dimensionality increases MDS has more trouble finding a global minimum of stress.

Solution #1: Subset-MDS

How to extend MDS so that it can be used with large data sets?

If distances are the problem don't use them all!
Use only a subset of all distances: Subset-MDS
If each point maintains distance relations to k others MDS will use only km distances - linear space, time complexity
Subset-MDS sacrifices some accuracy for tractability

Questions:

How should we select which k distances to retain?
For given k, how well does Subset-MDS approximate MDS?

Subset-MDS: Selecting Distances

Subset-MDS could select k random distances for each point

Or, Subset-MDS could retain closest k/2 & farthest k/2 distances

First Figure shows starting configuration of 5-dimensional data on iris flowers mapped to 3D. Dimensions with largest variance are used to initialize display coordinates
Second Figure shows Subset-MDS solution using closest k/2 & farthest k/2 distances
Third Figure shows Subset-MDS solution using random distance selection

Subset-MDS vs. MDS

The Figure below shows that the correlations achieved by Subset-MDS improves as the subset size grows.
In fact k = 10 achieves good correlations
This is probabaly due to ``transitive relationships'' between data points
If A is near to B and B is near to C then A will be placed near C for free

Solution #2: Animation + Randomness

As data dimensionality increases MDS cannot achieve low stress solutions
Rather than let MDS terminate in a poor local stress minimum, why not add random noise to point positions to keep stress high?
This will prevent MDS from equilibrating, and it will escape local minima, or at least cycle between a set of neighboring minima
We will get more information from this cycling than from static
This method similar to the simulated annealing method for escaping local minima equilibrium

Figure shows points of 3D tetrahedron animating towards low stress configuration in 2D display space.
Red indicates points are highly stressed

Random Perturbation

Random noise can be added to 2D display point coordinates

Or point updates can be scaled so points overshoot low stress position

It may be necessary to interpolate between successive positions

Figure shows points of 3D cube trapped in a high stress configuration.
Rerunning same MDS procedure with random perturbation of display points achieved lower stress solution

Conclusion

Summary

MDS was briefly introduced
Subset-MDS was proposed as linear time O(m) approximation
Initial results indicate approximation to MDS is good
Animating MDS with random perturbation was proposed
More research needs to be done on this

Future Work

Implement gradient descent MDS for comparison
Implement simulated annealing MDS for comparison
Research on adaptive variants of Subset-MDS
Adding clustering to preprocess large data sets

About this document ...

Next: About this document

Matthew Ward
Thu Oct 10 13:28:20 EDT 1996