R Visuals iplots.tex, b2648c8 on 2011/04/01

A visual tour of interactive graphics with R
Christophe Lalanne∗
March, 2011

Abstract: We describe here a simple use of the iPlots R package for interactive data analysis. The idea is to use brushing in linked graphics to foster exploratory analysis and model diagnostics. Other R packages are discussed.

Packages: iPlots • rgl • rggobi

1 Motivations

Far from being an exhaustive review of interactive and dynamic statistical graphics, the idea here is to review some of the available capabilities in R. A larger review is provided in Cook and Swayne (2007), using the GGobi software and its R interface.

We will focus on two aspects of interactive visualization, namely brushing (Becker and Cleveland, 1988) and 3D interactivity.

2 The iPlots eXtreme package

The iPlots eXtreme package, aka Acinonyx (Urbanek, 2009), is available from . It should supersede the traditional iPlots package. Although its functionalities may appear rather limited at the moment, it already allows the user to explore data in an interactive manner, with linking and brushing enabled by default.

Let us assume a simple linear model of the form $y_i = 0.4 \times x_i + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, 1^2)$, that can be readily simulated in R as follows:

    set.seed(101)
    n [...]

Well, printing the object merely summarizes the type of object that is being plotted, and its address in memory. More information can be gathered by looking at its class:

∗ E-mail: ch.lalanne|at|gmail.com. Text available on www.aliquote.org, in /articles/tech/rvisuals
    [1] "iScatterplot" "iPlot" "iVisual" "iObject"

In fact, our scatterplot is a subclass of iPlot. It does not support the 'formula' interface, so data must be entered separately as x and y. However, overplotting is handled through transparency, which results in nice-looking plots while giving a feel for the 2D density.

Now, adding a regression line is as simple as

    ip + lm(y ~ x)

If we ask for a histogram of the $x_i$, the new plot will be automatically linked to the previous one. Note that it opens a new graphics device, but we will learn shortly how to put them in a common frame.

    ihist(x)

The top panel shows a scatterplot and a histogram for the same data, after we selected a certain range of x values. On the bottom panel, we do the reverse and select statistical units in the scatterplot.

3 The rgl package

The rgl package, , uses OpenGL as a rendering engine, and provides interesting 3D viewing options that are otherwise lacking in R.

To get a feel for rgl capabilities, just try demo(bivar), which displays a parametric density surface of a bivariate normal distribution.
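For two independent standard normal variables, such a parametric surface is simply the product of the two marginal densities, $f(x, y) = \phi(x)\,\phi(y)$. As a minimal sketch that needs only base R, the surface can be computed as an outer product. The grid range of [-3, 3], the grid size, and the use of persp() for a static view are assumptions made here for illustration; they are not taken from the rgl demo.

```r
# Grid over which the density is evaluated (range and size are assumptions)
ngrid <- 40
xgrid <- seq(-3, 3, length.out = ngrid)
ygrid <- seq(-3, 3, length.out = ngrid)

# For independent standard normals, f(x, y) = dnorm(x) * dnorm(y),
# so the whole surface is the outer product of the marginal densities
bi.z <- outer(dnorm(xgrid), dnorm(ygrid))

# Sanity check: a Riemann sum over the grid should be close to 1
cell <- diff(xgrid)[1] * diff(ygrid)[1]
sum(bi.z) * cell

# Static base-graphics view of the same surface
persp(xgrid, ygrid, bi.z, theta = 30, phi = 30,
      xlab = "x", ylab = "y", zlab = "density")
```

The resulting matrix has rows indexed by x and columns by y, which is also the layout expected for the z argument of surface3d() in rgl.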
The code behind demo(bivar) is rather simple; here is a snipped version:

    library(rgl)   # spheres3d(), surface3d()
    library(MASS)  # kde2d()

    n <- 50; ngrid <- 40
    x <- rnorm(n); y <- rnorm(n)
    # Kernel density estimate on an ngrid x ngrid grid
    denobj <- kde2d(x, y, n=ngrid)
    den.z <- denobj$z
    xgrid <- denobj$x
    ygrid <- denobj$y
    # Density of two independent standard normals on the same grid
    bi.z <- dnorm(xgrid) %*% t(dnorm(ygrid))
    zscale <- 20
    # Draws simulated data
    spheres3d(x, y, rep(0,n), radius=0.1)
    # Draws non-parametric density
    surface3d(xgrid, ygrid, den.z*zscale, alpha=0.5)
    # Draws parametric density
    surface3d(xgrid, ygrid, bi.z*zscale, front="lines")

As an example, the following piece of code shows how PCA basically works. We first generate a matrix of random data with a specific covariance structure, and then show the first three principal axes. Part of the code shown below comes from the excellent tutorials on Information Visualisation by Ross Ihaka.

    sim.cor.data <- function(n=30, p=2, rho=0.6, sigma=1) {
      require(mvtnorm)
      # Covariance decays with the distance |i - j| between variables
      H <- abs(outer(1:p, 1:p, "-"))
      V <- sigma * rho^H
      X <- rmvnorm(n, rep(0,p), V)
      return(X)
    }
    X <- sim.cor.data(n=100, p=5)
    X.pca <- prcomp(X, scale.=TRUE)

Now, constructing the 3D plots is done as follows.

    rgl.open()
    rgl.bg(color="white")
    # display the 3D cloud
    rgl.points(X.pca$x[,1:3], col="black", size=5, point_antialias=TRUE)
    # set up a reference plane
    xyz.lims <- apply(X.pca$x[,1:3], 2, range)
    bot.plane <- min(xyz.lims[1,3]) - diff(xyz.lims[,3])/10
    # this second assignment overrides the line above
    bot.plane <- mean(X.pca$x[,3])
    rgl.surface(seq(xyz.lims[1,1], xyz.lims[2,1], length=10),
                seq(xyz.lims[1,2], xyz.lims[2,2], length=10),
                rep(bot.plane, 10*10), color="#CCCCFF", front="lines")

To capture the output, we can use rgl.snapshot(filename), where filename is the name of the PNG file to be saved.

Instead of a reference plane, we could directly draw unit vectors

    rgl.lines(c(0,1), c(0,0), c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), c(0,1), c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), c(0,0), c(0,1), col="red", lwd=2)

or axes (ranging from the minimum to the maximum observed values)

    rgl.lines(xyz.lims[,1], c(0,0), c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), xyz.lims[,2], c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), c(0,0), xyz.lims[,3], col="red", lwd=2)
    # label the three axes x, y, z just beyond their upper ends
    rgl.texts(c(xyz.lims[2,1]+.5, -.15, -.15),
              c(-.15, xyz.lims[2,2]+.5, -.15),
              c(-.15, -.15, xyz.lims[2,3]+.5),
              letters[24:26], col="red")

Both results are shown below.

Finally, an rgl device cannot be brushed, but the scene can be spun (here, through 360°) with:

    for (i in seq(0, 360, by = 1)) {
      rgl.viewpoint(theta = i, phi = 0)
      Sys.sleep(1/60)
    }

There are alternative and more practical ways to do the above, as found in, e.g., ordirgl in the vegan package, or the BiplotGUI package that provides a complete environment for
manipulating biplots (Gower and Hand, 1996), in 2D or 3D. Those seeking a more direct application of the commands discussed here can try to adapt the sphpca function in the psy package (Falissard, 1996).

4 Back to the basics

So far, we have only talked about dedicated environments for interactive visualization. However, base R functionality might still prove useful in some cases. In fact, the tcltk package offers a simple way to attach interactive buttons to the current device.

Let's say we want to interactively display the most extreme individuals on a given matrix of scores. 'Extreme' could mean many things, but for now assume this is a percentile-based measure: for example, the 5th and 95th percentiles are used to flag individuals having extremely low or high scores.

    library(lattice)  # splom()

    filter.perc <- function(x, cutoff=c(.05,.95), id=NULL, collate=FALSE) {
      # lower and upper thresholds
      lh <- quantile(x, cutoff, na.rm=TRUE)
      out <- list(x.low=which(x < lh[1]), x.high=which(x > lh[2]))
      # optionally return identifiers instead of positions
      if (!is.null(id)) {
        out[["x.low"]] <- id[out[["x.low"]]]
        out[["x.high"]] <- id[out[["x.high"]]]
      }
      if (collate) out <- unique(c(out[["x.low"]], out[["x.high"]]))
      return(out)
    }
    n <- 500
    scores <- replicate(5, rnorm(n, mean=sample(20:40, 1)))
    idx <- apply(scores, 2, filter.perc, id=NULL, collate=TRUE)
    # colour 2 (red) for flagged individuals, 1 (black) otherwise
    my.col <- as.numeric(1:n %in% unique(unlist(idx))) + 1
    splom(~ scores, pch=19, col=my.col, alpha=.5, cex=.6)

A simple display of the distribution of these five series of scores is shown below, with individuals in red corresponding to those falling below the 5th or above the 95th percentile on at least one series. (Also, keep in mind that this flagging is done in a purely univariate manner.)

Now, what about varying the thresholds for highlighting individuals? Instead of repeating the same steps, we could simply add a dynamic selector to this display. Using aplpack::slider, this can be implemented as follows:

    do.it <- function() {
      require(aplpack)
      update.display <- function(...) {
        value <- slider(no=1)
        idx <- apply(scores, 2, filter.perc, cutoff=c(value, 1-value),
                     id=NULL, collate=TRUE)
[Figure: scatter plot matrix of the five series of scores, panels V1 to V5.]

        my.col [...]

5 Miscellaneous

TODO
• discuss 3D PCA in psy
• mention BiplotGUI
• discuss ordirgl in vegan

    library(Rcmdr)
    attach(mtcars)
    scatter3d(wt, disp, mpg)
References

Cook, D. and Swayne, D. (2007). Interactive and Dynamic Graphics for Data Analysis With R and GGobi. Springer. http://www.ggobi.org/book/.

Becker, R. and Cleveland, W. (1988). Brushing scatterplots. In Cleveland, W. and McGill, M., editors, Dynamic Graphics for Statistics, pages 201-224. Wadsworth & Brooks/Cole, Belmont, CA.

Urbanek, S. (2009). iPlots eXtreme. Next-generation interactive graphics for analysis of large data. In UseR! 2009 Conference. http://www.r-project.org/conferences/useR-2009/slides/Urbanek.pdf.

Gower, J. and Hand, D. (1996). Biplots. Chapman & Hall, London, UK.

Falissard, B. (1996). A spherical representation of a correlation matrix. Journal of Classification, 13(2), 267-280.