Chris L. Bentley

Computer Science Department

Worcester Polytechnic Institute

**Overview**

Goal: Present enhancements to Multidimensional Scaling (MDS) that extend the technique to large data sets with many dimensions (up to 50)

- What is *N*-dimensional data?
- How can we analyse it?
- How can we visualize it?
- Focus on one technique: Multidimensional Scaling
- Enhancements to traditional MDS: **MAVIS**
- Conclusions & Future Work

**Visualization**

- Science produces a numeric representation of our world
- We search for: patterns, clusters, anomalies, outliers, things which make a *difference* to us
- Raw numeric data is difficult to interpret
- Scientific Visualization translates **numbers** into *images*
- This plays to the innate strengths of human visual perception

**What is N-dimensional data?**

We are familiar with 2-dimensional data defined by coordinate pairs (*x*,*y*).
The Cartesian plane makes the axes explicit, depicting the entire data space; we
can infer the coordinate values of each point from the axes.

A 2-column table makes the values explicit, but leaves us to imagine the data space.

*N*-dimensional data consists of points in a space with *N* coordinate axes

In table form, *N*-dimensional data is an *m*×*n* matrix. Each row *i*
defines a data point and is an *n*-vector. Each column is an axis or dimension.

**Analysing N-Dimensional Data**

How do we find information in an *m*×*n* matrix of numbers?

- Statistical methods - reduce the data to summarizing values: mean, variance of columns or of the entire matrix, measures of correlation or covariance between dimensions
- Multivariate regression analysis - finds the line, plane, or hyperplane which summarizes the data with least error
- Cluster analysis - groups data points with the closest *n*-vectors, reducing the number of data points. Cluster centroids can be treated as proxy data points
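The column summaries above are simple to compute directly. A minimal pure-Python sketch (the 4×3 `data` matrix is made up for illustration; variances here are population variances):

```python
# Hypothetical m x n data matrix: 4 points (rows), 3 dimensions (columns).
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.9, 0.5],
]

def column(matrix, j):
    """Extract dimension j as a list of values."""
    return [row[j] for row in matrix]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance of one dimension."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance between two dimensions."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Summarize each dimension, then relate two dimensions.
for j in range(3):
    col = column(data, j)
    print(j, mean(col), variance(col))
print(covariance(column(data, 0), column(data, 1)))
```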

**Visualizing N-Dimensional Data**

How do we picture data with *N* dimensions? There are four basic approaches:

- look at only a subset of the dimensions (e.g. all pairs); multiple views can be displayed simultaneously - scatterplot matrices
- rearrange the axes so that they are no longer orthogonal - star glyphs, parallel coordinates, dimensional stacking (**XmdvTool**)
- provide interactive manipulation of the data, so that the user can explore the data from multiple perspectives (**PRIM-9**)
- reduce the dimensionality of the data so 2D/3D graphics techniques can be used

**Reducing Dimensionality**

Goal: to reduce the *m*×*n* matrix to *m*×*p*, where *p* < *n*

Several methods can be used:

- Factor Analysis - finds a subset of columns (dimensions) which explains the majority of the variance in all columns of the matrix
- Principal Component Analysis - finds linear combinations of the columns (dimensions) which explain the majority of the variance and correlation in all columns of the matrix
- Dimensional Clustering - combines the dimensions with the greatest correlation; each iteration reduces dimensionality by 1
- Kohonen Self-Organizing Feature Maps - use a neural network to arrive at a matrix of weights that summarizes all *n* columns
- Multidimensional Scaling (MDS) - developed between 1938 and 1962 by Richardson, Torgerson, Shepard, and Kruskal; initially used for displaying multivariate data from psychological testing
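One step of the dimensional-clustering idea can be sketched as follows; `merge_most_correlated` is a hypothetical helper (not from the talk) that averages the two most correlated columns, reducing dimensionality by 1:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two columns (0.0 if degenerate)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def merge_most_correlated(matrix):
    """One dimensional-clustering step: replace the two most correlated
    columns with their element-wise mean (merged column goes last)."""
    n = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n)]
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: abs(pearson(cols[p[0]], cols[p[1]])))
    merged = [(x + y) / 2 for x, y in zip(cols[a], cols[b])]
    keep = [cols[j] for j in range(n) if j not in (a, b)] + [merged]
    return [list(row) for row in zip(*keep)]
```

Repeating this step drives an *m*×*n* matrix down toward *m*×*p*, one dimension at a time.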

**Multidimensional Scaling: Central Idea**

- measure all interpoint distances in the *n*-dimensional data space
- place 2D *display points* so they obey the same mutual distance relationships

Is a perfect 2D configuration always possible? No.

**Distances**

- distances in any dimension are always scalar values - they are one-dimensional, thus they provide a *bridge* from *n*D to 2D
- similarity data is often analogous to distance
- the 2D distance formula is the familiar Pythagorean Theorem: $d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$
- the generalized distance formula in *n* dimensions: $d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$
- the even more general Minkowski metric: $d_{ij} = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^r \right)^{1/r}$
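The three formulas collapse into two small functions; with `r=2` the Minkowski metric reduces to the Euclidean case, and in two dimensions that is exactly the Pythagorean Theorem:

```python
import math

def euclidean(p, q):
    """Generalized n-dimensional Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def minkowski(p, q, r=2):
    """Minkowski metric; r=2 gives Euclidean, r=1 gives Manhattan."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

print(euclidean([0, 0], [3, 4]))             # 2D Pythagorean case -> 5.0
print(minkowski([0, 0, 0], [1, 2, 2], r=1))  # Manhattan distance  -> 5.0
```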

**Stress**

- If the *n*D distance $\delta_{ij}$ equals the 2D distance $d_{ij}$, then points *i* and *j* are correctly placed relative to one another
- If $\delta_{ij} \neq d_{ij}$, then there is **stress** between the data and the 2D representation

- For all points in the data set: $S = \sum_{i<j} (\delta_{ij} - d_{ij})^2$
- Or the normalized variant: $S = \sqrt{\sum_{i<j} (\delta_{ij} - d_{ij})^2 \,/\, \sum_{i<j} \delta_{ij}^2}$
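The normalized variant can be computed directly from the two distance lists; a minimal sketch, assuming `nd_dists` and `d2_dists` are parallel lists of pairwise distances:

```python
def stress(nd_dists, d2_dists):
    """Normalized (Kruskal-style) stress between the nD distances and
    the current 2D display distances; 0.0 means a perfect embedding."""
    num = sum((d - dh) ** 2 for d, dh in zip(nd_dists, d2_dists))
    den = sum(d ** 2 for d in nd_dists)
    return (num / den) ** 0.5
```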

**Minimizing Stress**

- MDS seeks a configuration of 2D display points that is a *global minimum* of stress
- This can be calculated using gradient descent, the simplex method, simulated annealing, or other optimization techniques
- An equivalent technique is the *force-based* vector approach:

1. calculate *n*D distances between data points
2. generate a starting set of 2D display points
3. calculate 2D distances between display points
4. while stress decreases:
   4.1 calculate stress for each distance pair
   4.2 use stress to scale the vector between points
   4.3 update point positions by summing the vectors

The figure shows the vectors acting on point *A*, and the cumulative vector along
which point *A* will be updated.
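The force-based loop can be sketched in a few lines of pure Python. This is an illustration, not the talk's implementation: a fixed iteration count stands in for the "while stress decreases" test, and the names are made up:

```python
import math
import random

def dist(p, q):
    """Euclidean distance in any dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mds(points, iters=200, step=0.1, seed=0):
    """Force-based MDS sketch: every pair of 2D display points is pulled
    or pushed toward its original n-D separation."""
    m = len(points)
    # step 1: nD distances between data points
    target = {(i, j): dist(points[i], points[j])
              for i in range(m) for j in range(i + 1, m)}
    # step 2: generate a starting set of 2D display points
    rng = random.Random(seed)
    disp = [[rng.random(), rng.random()] for _ in range(m)]
    for _ in range(iters):
        forces = [[0.0, 0.0] for _ in range(m)]
        for (i, j), t in target.items():
            # step 3: current 2D distance between display points
            d = dist(disp[i], disp[j]) or 1e-9
            # steps 4.1-4.2: stress on this pair scales the vector i -> j
            err = step * (d - t) / d
            for k in range(2):
                delta = err * (disp[j][k] - disp[i][k])
                forces[i][k] += delta
                forces[j][k] -= delta
        # step 4.3: update point positions by summing the vectors
        for i in range(m):
            disp[i][0] += forces[i][0]
            disp[i][1] += forces[i][1]
    return disp
```

Running this on the vertices of a shape that embeds exactly in 2D (e.g. a triangle given in 3D coordinates) recovers the original pairwise distances up to rotation and reflection.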

**Measuring MDS**

- After *t* iterations of MDS, the points will have moved into a *low stress* configuration
- How good is a given MDS solution? How do we measure this?
- Shepard plot - 2D distances vs. *n*D distances
- Stress plot - the graph of overall stress should drop over iterations
- Correlation plot - the graph of correlation between 2D distances and *n*D distances should approach 1.0; calculated using *Pearson's r*: $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \,/\, \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$
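Pearson's *r* as used for the correlation plot, in a self-contained function:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: +1.0 for a perfect linear relationship between the
    2D and nD distance lists, approaching 1.0 as the MDS solution improves."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den
```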


- Figure 1. shows Shepard plot
- Figure 2. shows Stress graph
- Figure 3. shows Correlation graph

**Problems with MDS**

- Between *m* data points there are *m*(*m*-1)/2 distance relationships
- Thus **stress** is a function of all *m*(*m*-1)/2 distance pairs
- Time complexity: each iteration of MDS is quadratic in *m*; the total cost grows with the number of iterations required
- Space complexity: storing the distance matrices for very large data sets would require *terabytes* of memory (7 million megabytes)
- As data dimensionality increases, MDS has more trouble finding a global minimum of stress
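The quadratic blow-up is easy to see numerically; a small sketch, assuming 8-byte double-precision storage per distance:

```python
def pair_count(m):
    """Number of pairwise distances between m points: m(m-1)/2."""
    return m * (m - 1) // 2

# Pair counts and (assumed) 8-byte-per-distance storage cost.
for m in (100, 10_000, 1_000_000):
    gigabytes = 8 * pair_count(m) / 1e9
    print(m, pair_count(m), gigabytes, "GB")
```

At a million points the distance matrix alone is on the order of terabytes, which is why full MDS does not scale.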

**Solution #1: Subset-MDS**

How can we extend MDS so that it can be used with large data sets?

- If distances are the problem, don't use them all!
- Use only a subset of all distances: **Subset-MDS**
- If each point maintains distance relations to *k* others, MDS will use only *km* distances - linear, *O*(*km*), space and time complexity
- Subset-MDS sacrifices some accuracy for tractability

Questions:

- How should we select which *k* distances to retain?
- For a given *k*, how well does Subset-MDS approximate MDS?

**Subset-MDS: Selecting Distances**

Subset-MDS could select *k* random distances for each point

Or, Subset-MDS could retain closest *k*/2 & farthest *k*/2 distances

- The first figure shows the starting configuration of 5-dimensional data on iris flowers mapped to 3D. The dimensions with the largest variance are used to initialize the display coordinates
- The second figure shows the Subset-MDS solution using the closest *k*/2 & farthest *k*/2 distances
- The third figure shows the Subset-MDS solution using random distance selection
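The two selection strategies can be sketched as small helpers (illustrative names, not the talk's code); each returns, for point *i*, the indices of the *k* partners whose distances Subset-MDS would retain:

```python
import math

def dist(p, q):
    """Euclidean distance in any dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def random_subset(points, i, k, rng):
    """Strategy 1: keep k random partners for point i."""
    others = [j for j in range(len(points)) if j != i]
    return rng.sample(others, k)

def near_far_subset(points, i, k):
    """Strategy 2: keep point i's closest k/2 and farthest k/2 partners."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: dist(points[i], points[j]))
    half = k // 2
    return others[:half] + others[-half:]
```

The MDS inner loop then iterates only over these *km* retained pairs instead of all *m*(*m*-1)/2.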

- The figure below shows that the correlation achieved by Subset-MDS improves as the subset size grows
- In fact, *k* = 10 already achieves good correlations
- This is probably due to "transitive relationships" between data points
- If **A** is near to **B** and **B** is near to **C**, then **A** will be placed near **C** **for free**

**Solution #2: Animation + Randomness**

- As data dimensionality increases, MDS cannot achieve low stress solutions
- Rather than let MDS terminate in a poor local stress minimum, why not add random noise to the point positions to keep stress high?
- This will prevent MDS from equilibrating, and it will escape local minima, or at least cycle between a set of neighboring minima
- We will get more information from this cycling than from a static configuration
- This method is similar to simulated annealing's way of escaping local minima

- The figure shows the points of a 3D tetrahedron animating towards a low stress configuration in 2D display space.
- Red indicates points that are highly stressed

Random noise can be added to 2D display point coordinates

Or point updates can be scaled so points overshoot low stress position

It may be necessary to interpolate between successive positions
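Adding noise to the display coordinates is a one-liner; `perturb` is a hypothetical helper illustrating the idea (uniform noise, with an optional seed for reproducibility):

```python
import random

def perturb(display_points, scale=0.05, seed=None):
    """Kick each 2D display coordinate with small uniform noise so MDS
    cannot settle into a local stress minimum."""
    rng = random.Random(seed)
    return [[x + rng.uniform(-scale, scale) for x in point]
            for point in display_points]
```

Between MDS passes, replacing the display points with `perturb(display_points)` keeps stress high enough for the configuration to escape and keep moving.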

- The figure shows the points of a 3D cube trapped in a high stress configuration.
- Rerunning the same MDS procedure with random perturbation of the display points achieved a lower stress solution

**Summary**

- MDS was briefly introduced
- Subset-MDS was proposed as a linear-time *O*(*m*) approximation
- Initial results indicate the approximation to MDS is good
- Animating MDS with random perturbation was proposed
- More research needs to be done on this

**Future Work**

- Implement gradient descent MDS for comparison
- Implement simulated annealing MDS for comparison
- Research adaptive variants of Subset-MDS
- Add clustering to preprocess large data sets

Thu Oct 10 13:28:20 EDT 1996