Detection and Visualization of Cyclic Multivariate Data

Last modified: 04/22/97 

Abstract 

Cyclic multivariate data is encountered in a variety of disciplines, including astronomy and astrophysics, medicine, time series analysis, global change studies, and production management. This data is characterized by a cyclic, or periodic, component in one or more dimensions. A visualization that enhances the cyclic nature of the data and makes it accessible is desirable, as it would give the viewer greater insight into the data set. Few techniques exist that enhance the cyclic nature of data, and of these fewer still accommodate many dimensions. A modification of the existing XmdvTool is proposed, integrating original visualizations for cyclic multivariate data and extending existing visualizations where appropriate to enhance the cyclic nature of the data. A mechanism for automatic cycle detection, as well as an interface for interactive cycle specification, is also proposed, giving the viewer full control of cycles within the data.


Introduction 

Cyclic behavior is everywhere. Consider the pounding of ocean waves on a beach, or a pounding heartbeat. Consider the Earth rotating on its axis once each day, or revolving around the sun once each year. Consider the fluctuations of the economy, or the fluctuations in wild animal populations. Consider the daily routines of people in a large city, or the vibrations of molecules in a gas. Consider techno music, or the hum of alternating current. Whether due to nature, or imposed artificially by humans, cyclic behavior can be found everywhere. 

As these behaviors are often interesting to study and analyze, cyclic data becomes as popular as the behavior it models. The data is acquired no differently than acyclic data; its cyclic nature is only determined when analyzing the data. Often the data is compiled into a time series in which the data is cyclic with respect to time. However, the data may be cyclic with respect to space (e.g., waves are cyclic with respect to both space and time) or any other variable. 

Ever more frequently, interesting data is multivariate, that is, varying in multiple dimensions. Cyclic data is no exception. Consider each of the examples in the opening paragraph. Many variables may be associated with each cycle. We may be interested not only in an ocean wave's position with respect to time, but also its amplitude, velocity, temperature, and salinity, some of which may also be cyclic. Multivariate data may have variables that are cyclic with respect to the same dimension, variables that are cyclic with respect to different dimensions, or variables that are not cyclic at all. Even when cyclic with respect to the same dimension, variables may be out of phase or have different cycle lengths, or periods. (See Figures 1-3.)







Previously, data of this cyclic multivariate type, or cyclic univariate data for that matter, was visualized like any other data. The data would simply be graphed or rendered without regard to its cyclic nature. An electrocardiogram is a good example of this. (See Figure 4.) Any correlations between data of adjacent cycles would have to be visually compiled by the viewer. While somewhat effective for small univariate data sets with a well-defined cyclic nature, this method fails when applied to large multivariate data sets of unknown or mixed periodicity. Such data is found in the variety of disciplines mentioned previously.



With the increasing occurrence of cyclic multivariate data, and a striking lack of effective techniques for visualizing it, new visualization techniques are needed to address this data type.


Background 

Detection 

Unless the cycles are explicit in the data set, some analysis of the data must be performed to determine the cycles. Cycle detection in fact is auxiliary to the goal of this thesis, but it is a crucial process. If the detection phase yields cycles that are inaccurate, we cannot hope to visualize the original data effectively, regardless of how elegant the visualization is. 

Much research has been done concerning cycle detection, also known as periodic correlation. A survey of some of this work has been conducted. One or more of the methods surveyed will be incorporated into the automatic cycle detection mechanism described later in this thesis proposal. I will draw upon the work of Helmut Mayer [Mayer93], H.B. Hwarng [Hwarng95a,b], Bloomfield, et al. [Bloomfield94], Wikle, et al. [Wikle95], and Wang, et al. [Wang93]. 

Visualization 

A significant amount of research has been conducted in the area of multivariate data, especially within the last decade. Cyclic data has received much less attention, although some examples of how to enhance the display of cyclic data have been found. To the best of my knowledge, visualization of data that is both cyclic and multivariate has received no attention at all, which is the impetus for this thesis work.

Various methods for visualizing multivariate data have been integrated into a single software package by Matthew Ward [Ward94]. These include scatter plots, star glyphs, and dimensional stacking. The package also includes an n-dimensional brushing tool, improved upon by Allen Martin [Martin95]. This software is publicly available and will provide a basis for experimentation with cyclic multivariate data. 

A few interesting visualizations of cyclic data were discovered during the research of this topic. The first (Figure 5), attributed to Antonio Gabaglio, illustrates low-dimensional multivariate data with time spiraling from the center outward, the period being one year [Tufte83]. The second (Figure 6), from the same source, is attributed to Jacques Bertin. He shows a spiraling representation of univariate data, as well as a stacking of cycles on top of one another. Finally (Figure 7), circular histograms were detailed by Fisher, which do not preserve the cycle number, but do effectively summarize the intra-cycle relationships of the univariate data [Fisher93]. 
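As a rough illustration of the circular histogram idea, the following minimal Python sketch counts univariate observations by their position within a cycle and discards the cycle number. The function name, the default bin count, and the representation of each position as a fraction in [0,1) are assumptions made for this example; they are not details taken from [Fisher93].

```python
def circular_histogram(offsets, bins=12):
    """Count observations per angular sector of a cycle.

    offsets: positions within a cycle, each a fraction in [0, 1)
             (hypothetical representation chosen for this sketch).
    bins:    number of equal sectors around the circle.
    """
    counts = [0] * bins
    for offset in offsets:
        # Sector index; the modulo guards against an offset of exactly 1.0.
        counts[int(offset * bins) % bins] += 1
    return counts


# Observations from any number of cycles fall into the same four sectors.
sector_counts = circular_histogram([0.0, 0.1, 0.5, 0.5, 0.99], bins=4)
```

Each count would then be drawn as a wedge or bar at the angle of its sector, so that observations from every cycle accumulate at the same place on the circle.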







Viewers of cyclic data have long settled for visualizations that depend on the human vision system to detect patterns. Such a visualization might have time on one axis, like the electrocardiogram. Nelson Max, et al., describe a four-dimensional visualization with this property [Max93]. The viewer must scan the visualization to decide if the data looks cyclic, and then scan the adjacent cycles to see what change occurred on a cycle-by-cycle basis. While the human vision system is in general well suited to this task, it only works locally, not globally. If the data set is large, the system breaks down, as we can only distinguish change over adjacent cycles, rather than all cycles. Notice the difference between each representation of the data in Figure 6. The inter-cycle changes are easily detected for all cycles in the second and third representations, whereas the first representation offers no such help. Multivariate data only adds insult to this injury, as the scanning must be done for multiple dimensions unless there is sufficient prior knowledge of the data set to expedite this process. 
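The stacking of cycles discussed above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the data is univariate, evenly sampled, and has a known, constant period given as a number of samples. The function names are hypothetical.

```python
def stack_cycles(values, period):
    """Split a series into consecutive rows of `period` samples each.

    A trailing incomplete cycle is dropped so that all rows align.
    """
    return [values[i:i + period]
            for i in range(0, len(values) - period + 1, period)]


def inter_cycle_change(rows):
    """Difference between each cycle and the one before it, sample by sample."""
    return [[cur - prev for prev, cur in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(rows, rows[1:])]


series = [1, 2, 3, 2, 3, 4, 3, 4, 5, 9]
rows = stack_cycles(series, period=3)   # one row per complete cycle
changes = inter_cycle_change(rows)      # one row per adjacent pair of cycles
```

Once the cycles are aligned as rows, every cycle can be compared against every other at the same position, rather than only against its neighbor in a long strip.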

By the same principle, animations such as those described by Rhyne, et al., which display one time unit's worth of data per frame, suffer the same deficiency [Rhyne93]. Our vision system can detect periodicity in the animation if it exists, but the animation is even worse than the static visualization for comparing cycles: the data of one cycle cannot be compared with that of the next, because it disappears with each new frame!

A successful visualization of cyclic multivariate data would allow comparison between all cycles simultaneously, and could be employed in the context of many dimensions. Preliminary research suggests that such a visualization does not yet exist, so its design and implementation are the goal of this thesis.


Approach 

Detection 

XmdvTool will be extended such that, should a user wish to perform cyclic analysis on a data set, XmdvTool will apply an n-dimensional periodic correlation derived solely or in part from one or more of the cycle detection papers referenced above.  This algorithm will result in the following:

The algorithm is expected to take roughly O(n^2 k^2) time, where n is the number of dimensions and k is the size of the data set.  This is because every dimension must be correlated against every other dimension.  Preferably, the user would provide the algorithm with restrictions based on knowledge of the data set, such as to consider only time as a causal variable, or to limit cycle lengths to periods of less than 3 months.  Such restrictions would reduce computational requirements considerably.  In fact, some such restrictions may be enforced to reduce the scope of this algorithm and subsequent visualizations.  Two such limitations will be the following:

Cycle boundaries resulting from this algorithm, represented by markers in a conventional XmdvTool visualization, may then be adjusted or totally replaced interactively by the user.
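To make the idea of automatic detection concrete, here is a minimal Python sketch that estimates the period of a single, evenly sampled dimension from its sample autocorrelation. It is a deliberately simple stand-in, not the n-dimensional periodic correlation proposed here and not any of the cited methods. The function names and the `min_lag` and `max_lag` parameters are assumptions; `max_lag` plays the role of a user restriction on cycle length.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive integer `lag`."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # a constant series has no cyclic component
    covariance = sum((values[i] - mean) * (values[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


def estimate_period(values, min_lag=2, max_lag=None):
    """Return (lag, score) for the lag with the highest autocorrelation.

    max_lag defaults to half the series length; a smaller value restricts
    the search to shorter cycles and reduces the work done.
    """
    if max_lag is None:
        max_lag = len(values) // 2
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        score = autocorrelation(values, lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# A synthetic series that repeats every 12 samples.
signal = [math.sin(2 * math.pi * i / 12) for i in range(120)]
period, score = estimate_period(signal, max_lag=20)
```

The search costs on the order of k operations per candidate lag for a series of k samples, so bounding the candidate lags is where a user-supplied restriction on cycle length would pay off in this sketch.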

The original data set is augmented with two new special dimensions, one for cycle number, and one for cycle offset.  The former indicates to which cycle the datum belongs, while the latter corresponds to the position within a cycle with the range [0,1) where 0 means we are on a cycle boundary, and 0.5 means we are in the middle of a cycle.  These special dimensions could be treated like any other dimensions by conventional XmdvTool visualizations, or could be employed as cues for the new cyclic-enhanced visualizations. 
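The two special dimensions can be illustrated with a short Python sketch. It assumes that the cycle boundaries are available as a sorted list of positions along the dimension in which the data is cyclic (for example, time), with the last entry closing the final cycle. The function name and this boundary representation are hypothetical and are not part of XmdvTool.

```python
from bisect import bisect_right


def augment_with_cycles(positions, boundaries):
    """Return a (cycle_number, cycle_offset) pair for each position.

    positions:  values of the dimension in which the data is cyclic.
    boundaries: sorted cycle boundaries; cycle i spans
                [boundaries[i], boundaries[i + 1]).
    """
    augmented = []
    for p in positions:
        if not boundaries[0] <= p < boundaries[-1]:
            raise ValueError(f"position {p} lies outside all cycles")
        # Index of the last boundary that is <= p.
        cycle_number = bisect_right(boundaries, p) - 1
        start, end = boundaries[cycle_number], boundaries[cycle_number + 1]
        # Fraction of the cycle already elapsed, in [0, 1).
        cycle_offset = (p - start) / (end - start)
        augmented.append((cycle_number, cycle_offset))
    return augmented


# Two cycles of unequal length: [0, 10) and [10, 15).
pairs = augment_with_cycles([0, 5, 10, 12.5], [0, 10, 15])
```

Because each offset is normalized by the length of its own cycle, cycles of different lengths still map onto the same [0,1) range, and the pairs can simply be recomputed after the user adjusts a boundary.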

Visualization 

In addition to the trivial case by which XmdvTool treats the cycle number and cycle offset as ordinary dimensions, one or more new cyclic-enhanced visualizations will be implemented and tested.  These new visualizations will map the cycle number and cycle offset dimensions in a visually intuitive manner, with the goals of maintaining inter-cycle relationships across all cycles as well as intra-cycle relationships. 
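One possible mapping of this kind, in the spirit of the spiral displays described in the Background section, places each datum on an outward spiral: the cycle offset becomes an angle and the cycle number selects the turn of the spiral. The Python sketch below is only an assumption about what such a mapping could look like, not a committed design; the function name and the `ring_spacing` parameter are hypothetical.

```python
import math


def spiral_position(cycle_number, cycle_offset, ring_spacing=1.0):
    """Map (cycle number, cycle offset) to x, y coordinates on a spiral.

    The offset in [0, 1) becomes an angle, so equal offsets in different
    cycles share a ray from the center. The radius grows continuously,
    so consecutive cycles form successive turns of the spiral.
    """
    angle = 2 * math.pi * cycle_offset
    radius = ring_spacing * (cycle_number + cycle_offset)
    return radius * math.cos(angle), radius * math.sin(angle)


# The same offset in cycles 1, 2, and 3 lands on one ray, one ring apart.
points = [spiral_position(c, 0.25) for c in (1, 2, 3)]
```

In such a layout, intra-cycle relationships are read around the spiral and inter-cycle relationships are read along a ray, with the remaining dimensions left to be encoded in the glyph drawn at each position.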

Some new visualization types that will be explored are the following: 
Deliverables 

The following is a list of deliverables which are required for the successful completion of this thesis:

The following is a list of deliverables which are not required for the successful completion of this thesis, but which may be desirable:

Resources 

Software development will commence on a variety of systems and platforms, including a DEC AlphaStation 500/400 running Digital Unix and an Intel Pentium 90 running Linux.  Any computer running the X Window System is a candidate for software testing.  If XmdvTool is at any point extended to use Motif widgets or the OpenGL graphics API, commercial versions of these packages will be required for the Linux development machine. 

Suitable data sets for testing and demonstration will be acquired at no cost from public data repositories. 


Schedule 

The majority of the thesis work will be completed during the months of June through August, 1997.  This will include implementation of the cycle detection mechanism and interactive cycle specification interface, as well as the new cyclic-enhanced visualizations.  Testing of visualizations and interfaces will be conducted during the months of September and October, 1997.  The thesis paper will be written during the months of September through December, 1997, with submission in December. 


References 
[Bloomfield94]  Peter Bloomfield, Harry L. Hurd, Robert B. Lund, 1994, Periodic Correlation in Stratospheric Ozone Data.  Journal of Time Series Analysis, 15 (2), 127-50. 
[Fisher93]  N. I. Fisher, 1993, Statistical Analysis of Circular Data, 22, 29.  Cambridge University Press, Cambridge. 
[Hwarng95a]  H. B. Hwarng, 1995, Multilayer Perceptrons for Detecting Cyclic Data on Control Charts.  International Journal of Production Research, 33 (11), 3101-17.
[Hwarng95b]  H. Brian Hwarng, 1995, Proper and Effective Training of a Pattern Recognizer for Cyclic Data.  IIE Transactions, 27 (6), 746-56.
[Martin95]  Allen R. Martin, Matthew O. Ward, 1995, High Dimensional Brushing for Interactive Exploration of Multivariate Data.  Proceedings of IEEE Conference on Visualization (Visualization '95), 271-8. 
[Max93]  Nelson Max, Roger Crawfis, Dean Williams, 1993, Visualization for Climate Modeling.  IEEE Computer Graphics & Applications, 13 (4), 34-40. 
[Mayer93]  Helmut Mayer, 1993, Time-Series Analysis in Cyclic Stratigraphy: An Example from the Cretaceous of the Southern Alps, Italy.  Mathematical Geology, 25 (7), 975-1001. 
[Rhyne93]  Theresa Rhyne, Mark Bolstad, Penny Rheingans, Lynne Petterson, Walter Shackelford, 1993, Visualizing Environmental Data at the EPA.  IEEE Computer Graphics & Applications, 13 (2), 34-8. 
[Tufte83]  Edward R. Tufte, 1983, The Visual Display of Quantitative Information, 72, 169.  Graphics Press, Cheshire, Connecticut. 
[Wang93]  Huang-Xin Wang, Robert de Paola, William I. Norwood, 1993, Analysis of Intermittent Periodic Modes within Complex Data.  Physical Review Letters, 71 (18), 3039-42. 
[Ward94]  Matthew O. Ward, 1994, XmdvTool: Integrating Multiple Methods for Visualizing Multivariate Data.  Proceedings of IEEE Conference on Visualization (Visualization '94), 326-33. 
[Wikle95]  Christopher K. Wikle, Peter J. Sherman, Tsing-Chang Chen, 1995, Identifying Periodic Components in Atmospheric Data Using a Family of Minimum Variance Spectral Estimators.  Journal of Climate, 8 (10), 2352-63. 



Copyright © 1997 by Benj Lipchak. 
All rights reserved.