30 Years of Multidimensional Multivariate Visualization

Pak Chung Wong, R. Daniel Bergeron
pcw@cs.unh.edu, rdb@cs.unh.edu
Department of Computer Science
University of New Hampshire
Durham, New Hampshire 03824, USA

Abstract

We present a survey of multidimensional multivariate (mdmv) visualization techniques developed during the last three decades. This subfield of scientific visualization deals with the analysis of data with multiple parameters or factors, and the key relationships among them. The course of development is roughly organized into four stages, within which major milestones are discussed. Recently developed techniques are explored with examples.

1 Introduction

Multidimensional multivariate visualization is an important subfield of scientific visualization. It was studied separately by statisticians and psychologists long before computer science was deemed a discipline. The appearance of low-priced personal computers and workstations during the 1980s breathed new life into graphical analysis of mdmv data. This research topic was one of the short-term goals included in the 1987 National Science Foundation (NSF) sponsored workshop on Visualization in Scientific Computing [MDB87]. The quest for effective and efficient mdmv visualization techniques has expanded since then.

This paper attempts to trace three decades of intensive development in this visualization field. It is by no means a comprehensive survey. We provide a brief history along with a description of the principal concepts of some mdmv visualization techniques. Recently developed mdmv visualization techniques are discussed in detail with examples. A remark on the trends of mdmv visualization research is also given.

2 Four Stages of Multidimensional Multivariate Visualization Development

The last three decades of mdmv visualization development can be roughly divided into four stages. The classic exploratory data analysis (EDA) book by Tukey [Tuk77], the 1987 NSF workshop on Visualization in
Scientific Computing [MDB87], and the IEEE Visualization ’91 conference [NR91] are the watersheds defining these stages. The first stage was primarily concerned with the graphical presentation of either one or two variate data. The second stage was dominated by Tukey’s exploratory data analysis. Scientists started looking at graphical data with a different perspective. Although most of the graphics were still two dimensional, scientists were able to encode data with multiple parameters, i.e., multivariate data, into meaningful two dimensional plots. The momentum of this work carried on through the next stage, when NSF recognized the importance of mdmv data visualization. The involvement of computer scientists accelerated the growth of the research by computerizing many of the old ideas and developing many new ones. The mission was formally defined and many promising concepts were developed during the following few years. The final (current) stage is concerned with the elaboration and assessment of mdmv visualization techniques. It remains to be seen whether the existing mdmv visualization concepts can lead to better visualization of a problem and better understanding of the underlying science.

This discussion of mdmv visualization is far from complete. There are other important topics, including volume visualization and vector/tensor field visualization, that are not covered. The principal concepts and research issues related to these subjects can be found in [Nie92, PvW92, KHK+94, HPvW94].

2.1 Pre–1976 The Searching Stage

Scientists have studied multivariate visualization since 1782, when Crome used point symbols to show the geographical distribution in Europe of 56 commodities [Col93]. In 1950, Gibson [Gib50] started the research on visual texture perception. Later, Pickett and White [PW66] proposed mapping data sets onto artificial graphical objects composed of lines. This texture mapping work was further investigated by Pickett [Pic70], and was eventually computerized [PG88]. Chernoff [Che73] presented his arrays of cartoon faces for multivariate data in 1973. In this well-known technique, variates are mapped to the shape of the cartoon faces and their facial features, including nose, mouth, and eyes. These faces are then displayed in a two dimensional graph.

The searching stage can be characterized by relatively small data sets and by data visualization tools that usually consisted of colored pencils and graph paper. The graphical output was mostly two dimensional xy-displays. Statisticians were the dominant research force during this period. Graphics were used to bring out the key features of the data, suggest statistical analysis methods to apply to the data, and present the conclusions [Fis70].

2.2 1977–1985 The Awakening Stage

Tukey’s exploratory data analysis signified a new era of scientific data visualization. Exploratory data analysis is more than a tool; it is a way of thinking. It teaches people how to visually decode information from the data. When the personal computer arrived, it became the scientist’s most powerful tool ever. Now scientists could visualize data beyond two dimensions interactively. Painfully long calculations could suddenly be carried out in real time. Statisticians could visualize data during each stage of the analysis instead of waiting until the final results were available. The availability of other computer hardware, such as high resolution color displays, also gave the study of mdmv visualization new opportunities.

During this stage, two and three dimensional spatial data were the most common data types being studied,
although multivariate data started gaining more attention. Asimov [Asi85] presented the grand tour technique for viewing projections of multivariate data on two dimensional planes. Earth resource satellites sent out decades ago are still transmitting data continuously. Gigabyte sized multivariate data had arrived.

2.3 1986–1991 The Discovery Stage

The 1987 NSF workshop formally declared the need for two and three dimensional spatial object visualization. The two dimensional projection of multivariate data sets was also included as one of the short-term potential targets for scientific visualization research. Once the mission was defined, scientists started pushing hard on the representation and visualization of mdmv data. The limited availability of high speed graphics hardware during the previous stage was gradually overcome. A majority of research was directed away from the development of exploratory data analysis tools, which rely heavily on statistical measures, towards colorful high dimensional graphics that required high speed computations. Some of the mdmv visualization concepts developed during this stage include: grand tour methods [BA86], parallel coordinates [IRC87, ID87, ID90], iconography [PG88, BG89b, Bed90, Lev91], worlds within worlds [FB90a, FB90b], dimension stacking [LWW90], hierarchical axis [MGTS90, MTS91a, MTS91b], hyperbox [AC91], and various ideas collected in [Cle93, CMM93]. Some of these techniques attempt to show all dimensions and all variates visually as one display, whereas others aim at direct manipulation graphics, in which the user interactively selects subsets for display by using an input device such as a mouse. Virtual reality [FB90a, FB90b] began to appear in the mdmv visualization literature.

2.4 1992–present The Elaboration and Assessment Stage

In 1990 and 1991, there were at least fourteen mdmv related papers published in the IEEE Visualization conferences. A total of four have been published in the three visualization conferences since then. This stage so far has been a period of retrenchment in the development of new mdmv visualization techniques. Some of the most recently developed tools are, each in a different way, elaborations of work done in previous stages. For example, HyperSlice [vWvL93] attempts to combine the panel matrix of a scatterplot matrix with the direct manipulation of brushing [BC87]. AutoVisual [BF92, BF93] is an extended version of worlds within worlds with a new rule-based interface. XmdvTool [War94] integrates four existing mdmv visualization tools (dimension stacking, scatterplot matrix, glyphs, and parallel coordinates) into one system with enhanced n-dimensional brushing.

The research in mdmv visualization has also diversified into multidisciplinary collaborations. Attempts to combine sound with graphics [SPW92, SBG92] are currently being made. The concept of a rule-based queue [BF92, BF93] was also introduced. One of the latest research issues of mdmv visualization is the need to evaluate the correctness, effectiveness, and usefulness of mdmv visualization techniques. Similar concerns also appear in the other fields of visualization research [RET+94, HPvW94].
3 Terminology

Unfortunately, the mdmv literature suffers from ill-defined and inconsistent terminology. The term dimensionality is especially overloaded. Mathematicians consider dimension as the number of independent variables in an algebraic equation. Engineers take dimension as measurements of any sort (breadth, length, height, and thickness). Even the prefix multi is frequently interchanged with another prefix, hyper. In the statistics literature, the prefix multi means two or more, indicating a natural breakpoint between one and two dimensions in probabilistic methods. For the breakpoint between three and four (or beyond), the prefix hyper is used [Cle93]. We use the prefix multi to refer to dimensionality of two or more.

Beddow [Bed92] points out the difference between multidimensional objects and multidimensional data. Multidimensional objects are spatial objects, and our goal is to understand their geometry. The most common forms are two dimensional images and three dimensional volumes. They can best be described as n-dimensional Euclidean spaces R^n. Multidimensional data, on the other hand, refers to the study of relationships among multiple parameters. Mathematically these parameters can be classified into two categories: dependent and independent [KK93]. Some statisticians prefer the terms factor and response [Cle93]. A variable is said to be dependent if it is a function of another variable, the independent variable. The relationship of an independent variable x and a dependent variable y can best be described by the mathematical equation y = f(x). We adopt the convention that the term multidimensional refers to the dimensionality of the independent variables, while the term multivariate refers to the dimensionality of the dependent variables [BCH+94]. This is by far the most popular way to describe the dimensionality of mdmv data sets in the scientific visualization literature.
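As a small illustration of this naming convention, the Python sketch below counts the independent and dependent variables of a data set and builds a compact label in which d counts the independent variables and v counts the dependent ones. The `MdmvSchema` class, its field names, and the ocean-profile variables are hypothetical and chosen only for illustration; they are not taken from any of the systems cited in this survey.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MdmvSchema:
    """Hypothetical description of an mdmv data set (illustration only)."""
    independent: Tuple[str, ...]  # variables that span the dimensions
    dependent: Tuple[str, ...]    # variates observed at each point

    def label(self) -> str:
        # "d" counts independent variables, "v" counts dependent ones.
        return f"{len(self.independent)}d{len(self.dependent)}v"

    def split(self, record: Dict[str, float]):
        # Separate one observation into its coordinate and its variates.
        point = tuple(record[name] for name in self.independent)
        values = tuple(record[name] for name in self.dependent)
        return point, values


# Assumed example: three variates sampled over depth and time.
ocean = MdmvSchema(independent=("depth", "time"),
                   dependent=("salinity", "temperature", "oxygen"))

sample = {"depth": 10.0, "time": 3.5,
          "salinity": 35.1, "temperature": 12.4, "oxygen": 6.2}

print(ocean.label())        # 2d3v
print(ocean.split(sample))  # ((10.0, 3.5), (35.1, 12.4, 6.2))
```

Keeping the two groups separate in the schema makes it explicit which variables a technique treats as coordinates and which values it must encode visually.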
For example, a three dimensional volume space in which both temperature and pressure are observed and recorded at various locations produces 3d2v data.

Beddow [Bed92] argues that analytic methods used to explore n-dimensional Euclidean spaces R^n are not appropriate for general multivariate analysis. In mdmv visualization research, the emphasis shifts away from the strict mathematical definition of dependent and independent variates towards the broader definition of multiple variables or factors. This happens not only in mdmv scientific visualization research but also in statistical studies. The tools are different, but the goal is the same: to find the hidden relationships between the variables (also known as fitting in statistics).

In general, raw scientific data can be categorized into a hierarchy of data types. The most general and the lowest of the hierarchy is nominal data, whose values have no inherent ordering. For example, the names of the fifty states are nominal data. The next higher type of the hierarchy is ordinal data, whose values are ordered, but for which no meaningful distance metric exists. The seven rainbow colors (i.e., red, orange, ...) belong to this category. The highest of the hierarchy is metric data, which has a meaningful distance metric between any two values. Times, distances, and temperatures are examples. If we bin metric data into ranges, it becomes ordinal data. If we further remove the ordering constraints, the data is nominal. Some of the visualization techniques included in this survey are specially designed to handle metric data (see Sections 5.2.2 and 5.2.9).

The above 3d2v temperature/pressure example more or less implies that each three dimensional coordinate contains simple (i.e., neither a set nor an interval) and atomic (i.e., not composite) values of pressure and temperature. This is different from the case when we measure, for example, the chemical contents of a volume. Each coordinate
now has a set (instead of a simple value) of composite data (i.e., chemical elements). The varying number of values of a variate plotted at any single dimensional point is known as the density of that coordinate.

4 Fundamental Objective and Approach

The main objectives of mdmv visualization are to visually summarize an mdmv data set, and to find key trends and relationships among the variates. Different properties and characteristics of the data may change the way we carry out visualization, but not its goals.

The traditional two dimensional point and line plots are among the most commonly used visualization techniques for data with a low number of variates. This technique can be enhanced by putting an array of plots into one display, so as to add another variate to the visual presentation. This approach is discussed in Sections 5.1.5 and 5.2.2.

We can also map the variates of the data into graphical primitives of different colors, sizes, shapes, and locations (see Sections 5.2.4, 5.2.5, and 5.2.6). The display of all dimensions and all variates creates texture patterns of a kind, and can provide critical insights needed for scientific discovery.

For large scientific data (larger than the number of pixels of a display), we can display a certain portion of the data and allow the user to navigate the rest interactively. This is described in Sections 5.2.2, 5.2.7, 5.2.8, and 5.2.9.

Most of the visualization techniques assume a Euclidean space environment. Orthogonal axes, however, are not always the best choice to plot data. Sections 5.2.7, 5.2.10, and 5.2.11 give some alternatives.

A powerful visualization technique is to display the data frame by frame according to a time variate. This animation approach is discussed in Sections 5.3.1, 5.3.2, and 5.3.3.

5 Multidimensional Multivariate Visualization and Concepts

The body of this paper covers the principal concepts and brief history of some of the popular mdmv visualization techniques. During the last decade, hundreds of so-called new mdmv visualization techniques have been invented. (Refer to [KK93] for more details in this regard.) A majority of them are designed for special purposes such as volume visualization and vector/tensor field visualization, which are not covered in our discussion. Some of the rest are merely ad hoc tools that produce pretty pictures. They are difficult to create and their results are hard to interpret. We are interested in techniques that are founded on a solid basis and that have potential for practical value.

Categorizing mdmv visualization techniques is a difficult task. Possible criteria for such a categorization include the goal of the visualization, the type and/or dimensionality of the data, the dimensionality of the visualization technique, etc. We have not found a convincing set of criteria that cleanly differentiates the visualization techniques we wish to describe. We have chosen to group the techniques into those based on 2-variate displays, those based on multivariate displays, and those using time as an animation parameter:

Techniques based on 2-variate displays include the fundamental 2-variate displays and simultaneous views
of 2-variate displays. Most of them were developed in the statistics world. Both visual perception and statistical fitting of the data are of major concern. The data size is relatively small, usually on the order of hundreds of items. The graphics are mostly variations on two dimensional point and line plots.

Multivariate displays are the basis for many recently developed mdmv techniques, most of which use colorful graphics created by high-speed graphics computation. The data is usually larger and more complicated. A majority of the techniques were developed within the period of 1987–1991.

Animation is a powerful tool for visualizing mdmv scientific data. Various movie animation techniques on mdmv data, and a scalar visualization animation model are presented. In principle, any single frame visualization technique can be extended to animation if the data can be represented as a time series showing two-way correlations.

5.1 Techniques Based on 2-variate Displays

This section highlights some of the tools and summarizes the general approach developed based on 2-variate displays. The discussion is based upon the book by Cleveland [Cle93], which has a good collection of elegant visualization techniques developed by Cleveland, Tukey, and others throughout the 80’s. Tukey’s exploratory data analysis [Tuk77] is an important milestone of data visualization; most of the techniques were developed with pencil and paper during the early 70’s. Cleveland’s work emphasizes the structure of data and the validity of statistical models fitted to data. A majority of the visualization techniques are two dimensional, with the exception of isosurface plotting. Color is rarely used. Most of the tools show correlations between two variates. Our discussion skips the formulas, algorithms, and theories; only the concepts and techniques are presented.
5.1.1 Data Types

The basic data types for statistical data analysis are univariate, bivariate, trivariate, and hypervariate, which represent data with one dimension and one, two, three, and four or more variates, respectively. Cleveland also describes the multiway data type for data with higher dimensionality.

5.1.2 Reference Grids

The most common display unit in statistics visualization is a two dimensional scatterplot, as depicted in the left panel of Figure 1. In the middle panel, simple grid lines are drawn for enhancement of pattern perception, not for plotting accuracy. Grids are drawn at equal intervals instead of at numerical values. These reference lines are particularly powerful when we need to do scanning and matching of a matrix of scatterplots.

5.1.3 Fitted Curve

In statistics, fitting means finding a description of a data set. For example, if a data set fits a normal distribution, the whole data set can then be described by two numbers: its mean and standard deviation. In
statistics visualization, fitting means finding a smooth curve that describes the underlying pattern. In the right panel of Figure 1, a curve fit to the data is plotted; a pattern not apparent from the scatterplot before may suddenly emerge. Fitting formulas are not discussed in this paper; [Tay90, Cle93] are good references for this matter.

Figure 1: Left: A simple 2D scatterplot. Middle: A scatterplot with visual reference grids. Right: A fitted curve is included in the plot.

5.1.4 Banking

The perception of the orientations of line segments can be enhanced by adjusting the aspect ratio of the graph. The aspect ratio of a graph is defined as the height of the data rectangle divided by the width. A line segment with an orientation of 45° or −45° is the best to convey linear properties of the curve. This technique is known as the banking to 45° principle [CMM93]. In Figure 2, the same curve is plotted in three different aspect ratios. Only the upper left panel shows both a curve on the left and a straight line on the right. The banking method is covered in [Cle93].

Figure 2: The same curve is plotted in three different aspect ratios. The upper left one conveys more information than the other two.

5.1.5 Scatterplot Matrix

One of the more popular statistics mdmv visualization techniques is the scatterplot matrix, which presents multiple adjacent scatterplots. Each display panel in a scatterplot matrix is identified by its row and column numbers in the matrix. For example, the identity of the upper left panel of the matrix in Figure 3 is (1,3), and the lower right panel is (3,1). The empty diagonal panels denote the variable names. Panel (2,1) is a scatterplot of parameter X against Y while panel (1,2) is the reverse, i.e., Y versus X. In a scatterplot matrix, every variate is treated identically. The
basic idea is to visually link features in one panel with features in others. The redundancy is designed to improve the effect of visual linking. The technique is further enhanced with the help of reference grids. The pattern can be detected in both horizontal and vertical directions. The concept of linking is also discussed in [BMMS91].

Figure 3: A scatterplot matrix display of data with three variates X, Y, and Z.

The idea of pairwise adjacencies of variables is also a basis for the hyperbox [AC91], hierarchical axis [MGTS90, MTS91a, MTS91b], and HyperSlice [vWvL93]. Despite the popularity of the scatterplot matrix in mdmv visualization applications, nobody knows the identity of the original inventor [Cle93]. The technique was first presented in [CCKT83]. A variety of powerful tools using this kind of multi-panel display are presented in [Cle93]. The scatterplot matrix is also implemented in XmdvTool [War94].

5.1.6 Other Two Dimensional Analytical Techniques

Cleveland’s book also includes other powerful graphical techniques such as the mean-difference plot, quantile-quantile plot, spread-location plot, given plot, and conditional plot; fitting tools such as loess and bisquare; and visual perception techniques such as jittering and outlier deletion.

5.2 Multivariate Visualization Techniques

The scatterplot matrix uses multiple 2-way displays in an effort to provide correlation information among many variates simultaneously. The techniques described in this section are, however, aimed at extending the possibilities of multivariate correlation. All the techniques, with the exception of brushing and parallel coordinates, were developed after the 1987 NSF workshop. All of them claim positive results with real life mdmv scientific data.

These techniques are also aimed at presenting much larger data sets than those appropriate for the statistical visualization techniques. Today’s scientific data is huge; terabyte sized data will soon be common.
A static scatterplot is just not big enough to display more than a few hundred data items. These techniques are broadly categorized into five sub-groups:

Brushing allows direct manipulation of an mdmv visualization display. Only brushing a scatterplot matrix is described.
Panel matrix involves pairwise two dimensional plots of adjacent variates. Techniques included are HyperSlice and hyperbox. Both of them are elaborations of the scatterplot matrix.

Iconography uses variates to determine values of parameters of small graphical objects, called icons or glyphs. Thousands of data points are represented by thousands of these icons, which create a visual display characterized by varying texture patterns determined by the data. The mappings of data values to graphical parameters are usually chosen to generate texture patterns that hopefully bring insight into the data. Three iconographic techniques are described: stick figure icon, autoglyph, and color icons.

Hierarchical displays map a subset of variates into different hierarchical levels of the display. Hierarchical axis, dimension stacking, and worlds within worlds belong to this group. These techniques support, or at least enable, dynamic interactive analysis.

Non-Cartesian displays map data into non-Cartesian axes. They include parallel coordinates and VisDB. Parallel coordinates is the only technique that is capable of studying both multidimensional objects and multidimensional data.

5.2.1 Brushing

Brushing was first presented in [BC87]. It is included as one of the many direct manipulation techniques in [Cle93]. There are two kinds of brushing for a scatterplot matrix: labeling and enhanced linking. Labeling involves an interactive brush (e.g., a mouse pointer) that causes information label(s) to pop up for particular display item(s). In enhanced linking, the brush is an adjustable rectangle. It is used to cover a set of points in one of the panels. Figure 4 shows a rectangle brush in panel (3,2). Data inside the rectangle is displayed with a “+” instead of an “o”. The same changes are applied to the corresponding data points in the other panels. By looking at different panels and comparing the vertical and horizontal extent of the brush, this enhanced linking technique provides a powerful direct manipulation tool for visual conditioning analysis. It is shown that the effect of brushing is more intense in a dynamic interactive display. In general, brushing can be added to many other mdmv visualization techniques [War94]. More applications can be found in [Cle93].

Figure 4: Enhanced brushing with the square brush located on panel (3,2).
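The enhanced linking idea can be illustrated with a minimal sketch. This is not the implementation of [BC87]; the function names `brush` and `panel_symbols`, the 0-based variate indices, and the rectangle bounds are all assumptions made for illustration. The brush selects data items by their position in one panel, and every other panel then draws the same items with the highlighted symbol.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]  # one data item: one value per variate


def brush(data: List[Point], x_var: int, y_var: int,
          x_range: Tuple[float, float],
          y_range: Tuple[float, float]) -> List[bool]:
    """Flag each data item that falls inside a rectangular brush.

    The brush lives in the panel that plots variate `x_var` horizontally
    and variate `y_var` vertically (hypothetical 0-based indices).
    """
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    return [x_lo <= p[x_var] <= x_hi and y_lo <= p[y_var] <= y_hi
            for p in data]


def panel_symbols(data: List[Point], selected: List[bool],
                  x_var: int, y_var: int) -> List[Tuple[float, float, str]]:
    """Return (x, y, symbol) for one panel; brushed items get '+', others 'o'.

    The selection flags are shared by all panels, which is what links them.
    """
    return [(p[x_var], p[y_var], "+" if s else "o")
            for p, s in zip(data, selected)]


# Three variates per item, in the order X, Y, Z.
data = [(1.0, 5.0, 2.0), (2.0, 1.0, 9.0), (3.0, 4.0, 4.0), (8.0, 2.0, 7.0)]

# Brush the panel that plots Z horizontally against Y vertically.
selected = brush(data, x_var=2, y_var=1, x_range=(0.0, 5.0), y_range=(3.0, 6.0))

# The same items are highlighted in the X-versus-Y panel.
for x, y, symbol in panel_symbols(data, selected, x_var=0, y_var=1):
    print(x, y, symbol)
```

Keeping a single list of flags indexed by data item, rather than per-panel state, is what makes the highlight appear consistently in every panel.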
5.2.2 HyperSlice

HyperSlice [vWvL93] is one of the techniques invented during the elaboration and assessment stage. Like the scatterplot matrix, it has a matrix of panels, although each individual scatterplot is replaced with color or grey shaded graphics representing a scalar function of the variates. Furthermore, panels along the diagonal show the scalar function in terms of a single variate.

HyperSlice defines a focal point of interest $c = (c_1, c_2, \ldots, c_n)$ and a set of scalar widths $w_i$, where $i = 1, \ldots, n$. Only data within the range $R = [c_i - w_i/2,\ c_i + w_i/2]$ are displayed in the panel matrix. The rest of the data only appears if the user steers the focal point near it. Color Plate 1 shows the display of a HyperSlice of four variates. Like the coordinate system used in the scatterplot matrix, a HyperSlice panel is identified by a horizontal and a vertical coordinate. For an off-diagonal panel $i, j$ such that $i \neq j$, the color shows the value of the scalar function that results from fixing the values of all variates except $i$ and $j$ to the values of the focal point, while varying $i$ and $j$ over their ranges in $R$. The diagonal panels show a graph of the scalar function versus one variate which changes over its range in $R$.

Figure 5: Navigate a five variate HyperSlice by dragging panel (4,2).

The most important improvement of HyperSlice over the traditional scatterplot matrix is the idea of interactively navigating in the data around a user defined focal point. The user changes the focal point by interacting with any of the panels, as shown in Figure 5. The user moves the mouse into any panel and defines a direction by button down, move, and up. For example, the boldface arrow in panel (4,2) represents such an interaction. The direction of each arrow shows the motion of the focal point when the focal point is dragged in panel (4,2).
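The panel definitions and the focal point interaction described so far can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [vWvL93]: the function names, the regular sampling grid, the 0-based variate indices, and the convention that the focal point moves in the direction of the drag are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

ScalarFn = Callable[[Sequence[float]], float]


def samples(c: Sequence[float], w: Sequence[float], k: int, n: int) -> List[float]:
    """n evenly spaced values of variate k over [c_k - w_k/2, c_k + w_k/2] (n >= 2)."""
    lo = c[k] - w[k] / 2.0
    step = w[k] / (n - 1)
    return [lo + t * step for t in range(n)]


def off_diagonal_panel(f: ScalarFn, c: Sequence[float], w: Sequence[float],
                       i: int, j: int, n: int = 5) -> List[List[float]]:
    """Values of f for panel (i, j), i != j.

    Variates i and j vary over their ranges in R; all other variates stay
    fixed at the focal point. Rows follow variate j, columns follow variate i.
    """
    grid = []
    for y in samples(c, w, j, n):
        row = []
        for x in samples(c, w, i, n):
            p = list(c)
            p[i], p[j] = x, y
            row.append(f(p))
        grid.append(row)
    return grid


def diagonal_panel(f: ScalarFn, c: Sequence[float], w: Sequence[float],
                   i: int, n: int = 5) -> List[Tuple[float, float]]:
    """(value of variate i, f) pairs for the graph shown in diagonal panel i."""
    graph = []
    for x in samples(c, w, i, n):
        p = list(c)
        p[i] = x
        graph.append((x, f(p)))
    return graph


def drag_focal_point(c: Sequence[float], i: int, j: int,
                     dx: float, dy: float) -> List[float]:
    """Return a new focal point after a drag of (dx, dy) in panel (i, j).

    Only variates i and j change, so only panels involving i or j shift
    within the image plane.
    """
    moved = list(c)
    moved[i] += dx
    moved[j] += dy
    return moved


# Example scalar function of three variates: squared distance from the origin.
def f(p: Sequence[float]) -> float:
    return sum(x * x for x in p)


c = [0.0, 0.0, 0.0]   # focal point
w = [2.0, 2.0, 2.0]   # widths
print(off_diagonal_panel(f, c, w, 0, 1, n=3))
print(diagonal_panel(f, c, w, 2, n=3))
print(drag_focal_point(c, 0, 1, 0.5, -0.25))
```

After a drag, every panel would be recomputed from the new focal point, which is why a single interaction changes several panels at once.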
Notice that the length (magnitude) of the vertical arrows across the X2 row is the same as the vertical component of the arrow in (4,2). Similarly, each horizontal arrow in column X4 has the same length as the horizontal component of the arrow in panel (4,2). Panels solely related to X1, X3, and X5 move perpendicular to the image plane. Since the matrix is somewhat similar to an orthogonal matrix (along the grey diagonal panel), the motion on the upper left half is the mirror projection of the lower right.

Interactive data navigation is a welcome addition to direct manipulation graphics. The use of the width scalar supports the notion of multiresolution analysis, and begins to address more than two-way correlations. Changing the focal point in one panel affects two variates which in turn results in simultaneous visual changes in displays of