Multivariate Linear Models in R

An Appendix to An R Companion to Applied Regression, Second Edition

John Fox & Sanford Weisberg

Last revision: 28 July 2011

Abstract

The multivariate linear model is

\[
\underset{(n \times m)}{\mathbf{Y}} = \underset{(n \times (k+1))}{\mathbf{X}} \; \underset{((k+1) \times m)}{\mathbf{B}} + \underset{(n \times m)}{\mathbf{E}}
\]

where Y is a matrix of n observations on m response variables; X is a model matrix with columns for k + 1 regressors, typically including an initial column of 1s for the regression constant; B is a matrix of regression coefficients, one column for each response variable; and E is a matrix of errors. This model can be fit with the lm function in R, where the left-hand side of the model comprises a matrix of response variables, and the right-hand side is specified exactly as for a univariate linear model (i.e., with a single response variable). This appendix to Fox and Weisberg (2011) explains how to use the Anova and linearHypothesis functions in the car package to test hypotheses for parameters in multivariate linear models, including models for repeated-measures data.

1 Basic Ideas

The multivariate linear model accommodates two or more response variables. The theory of multivariate linear models is developed very briefly in this section. Much more extensive treatments may be found in the recommended reading for this appendix.

The multivariate general linear model is

\[
\underset{(n \times m)}{\mathbf{Y}} = \underset{(n \times (k+1))}{\mathbf{X}} \; \underset{((k+1) \times m)}{\mathbf{B}} + \underset{(n \times m)}{\mathbf{E}}
\]

where Y is a matrix of n observations on m response variables; X is a model matrix with columns for k + 1 regressors, typically including an initial column of 1s for the regression constant; B is a matrix of regression coefficients, one column for each response variable; and E is a matrix of errors.¹ The contents of the model matrix are exactly as in the univariate linear model (as described in Ch.
4 of An R Companion to Applied Regression, Fox and Weisberg, 2011; hereafter, the "R Companion"), and may contain, therefore, dummy regressors representing factors, polynomial or regression-spline terms, interaction regressors, and so on.

The assumptions of the multivariate linear model concern the behavior of the errors: Let \(\boldsymbol{\varepsilon}_i'\) represent the ith row of E. Then \(\boldsymbol{\varepsilon}_i' \sim N_m(\mathbf{0}, \boldsymbol{\Sigma})\), where \(\boldsymbol{\Sigma}\) is a nonsingular error-covariance matrix, constant across observations; \(\boldsymbol{\varepsilon}_i'\) and \(\boldsymbol{\varepsilon}_{i'}'\) are independent for \(i \neq i'\); and X is fixed or independent

¹A typographical note: B and E are, respectively, the upper-case Greek letters Beta and Epsilon. Because these are indistinguishable from the corresponding Roman letters B and E, we will denote the estimated regression coefficients as \(\widehat{\mathbf{B}}\) and the residuals as \(\widehat{\mathbf{E}}\).
of E. We can write more compactly that \(\mathrm{vec}(\mathbf{E}) \sim N_{nm}(\mathbf{0}, \mathbf{I}_n \otimes \boldsymbol{\Sigma})\). Here, vec(E) ravels the error matrix row-wise into a vector, and \(\otimes\) is the Kronecker-product operator.

The maximum-likelihood estimator of B in the multivariate linear model is equivalent to equation-by-equation least squares for the individual responses:

\[
\widehat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
\]

Procedures for statistical inference in the multivariate linear model, however, take account of the fact that there are several, generally correlated, responses.

Paralleling the decomposition of the total sum of squares into regression and residual sums of squares in the univariate linear model, there is in the multivariate linear model a decomposition of the total sum-of-squares-and-cross-products (SSP) matrix into regression and residual SSP matrices. We have

\[
\underset{(m \times m)}{\mathbf{SSP}_T} = \mathbf{Y}'\mathbf{Y} - n\bar{\mathbf{y}}\bar{\mathbf{y}}'
= \widehat{\mathbf{E}}'\widehat{\mathbf{E}} + \left(\widehat{\mathbf{Y}}'\widehat{\mathbf{Y}} - n\bar{\mathbf{y}}\bar{\mathbf{y}}'\right)
= \mathbf{SSP}_R + \mathbf{SSP}_{Reg}
\]

where \(\bar{\mathbf{y}}\) is the (m × 1) vector of means for the response variables; \(\widehat{\mathbf{Y}} = \mathbf{X}\widehat{\mathbf{B}}\) is the matrix of fitted values; and \(\widehat{\mathbf{E}} = \mathbf{Y} - \widehat{\mathbf{Y}}\) is the matrix of residuals.

Many hypothesis tests of interest can be formulated by taking differences in \(\mathbf{SSP}_{Reg}\) (or, equivalently, \(\mathbf{SSP}_R\)) for nested models. Let \(\mathbf{SSP}_H\) represent the incremental SSP matrix for a hypothesis. Multivariate tests for the hypothesis are based on the m eigenvalues \(\lambda_j\) of \(\mathbf{SSP}_H \mathbf{SSP}_R^{-1}\) (the hypothesis SSP matrix "divided by" the residual SSP matrix), that is, the values of \(\lambda\) for which

\[
\det\left(\mathbf{SSP}_H \mathbf{SSP}_R^{-1} - \lambda \mathbf{I}_m\right) = 0
\]

The several commonly employed multivariate test statistics are functions of these eigenvalues:

\[
\begin{aligned}
\text{Pillai-Bartlett Trace, } T_{PB} &= \sum_{j=1}^{m} \frac{\lambda_j}{1 + \lambda_j} \\
\text{Hotelling-Lawley Trace, } T_{HL} &= \sum_{j=1}^{m} \lambda_j \\
\text{Wilks's Lambda, } \Lambda &= \prod_{j=1}^{m} \frac{1}{1 + \lambda_j} \\
\text{Roy's Maximum Root, } \lambda_1
\end{aligned}
\tag{1}
\]

By convention, the eigenvalues of \(\mathbf{SSP}_H \mathbf{SSP}_R^{-1}\) are arranged in descending order, and so \(\lambda_1\) is the largest eigenvalue. There are F approximations to the null distributions of these test statistics.
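The decomposition and the statistics in Equation 1 are easy to verify numerically. The following base-R sketch (the variable names are ours, chosen for illustration) carries out the computation for the iris data that are analyzed in Section 2, reproducing the values reported there by Anova (Pillai 1.19, Wilks 0.02, and so on):

```r
Y <- as.matrix(iris[, 1:4])                       # n = 150 observations, m = 4 responses
X <- model.matrix(~ Species, data = iris)         # k + 1 = 3 columns
B.hat <- solve(t(X) %*% X, t(X) %*% Y)            # equation-by-equation least squares
E.hat <- Y - X %*% B.hat                          # matrix of residuals
SSP.R <- t(E.hat) %*% E.hat                       # residual SSP matrix
ybar <- colMeans(Y)
SSP.T <- t(Y) %*% Y - nrow(Y) * tcrossprod(ybar)  # total SSP matrix
SSP.reg <- SSP.T - SSP.R                          # regression SSP matrix
# eigenvalues of the "ratio" of hypothesis to residual SSP matrices
lambda <- Re(eigen(SSP.reg %*% solve(SSP.R))$values)
c(Pillai = sum(lambda/(1 + lambda)), HL = sum(lambda),
  Wilks = prod(1/(1 + lambda)), Roy = max(lambda))
```

Only p = min(q, m) = 2 of the four eigenvalues are nonzero here, because the Species term has two degrees of freedom; the remaining eigenvalues are numerically zero and do not affect the statistics.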
For example, for Wilks's Lambda, let s represent the degrees of freedom for the term that we are testing (i.e., the number of columns of the model matrix X pertaining to the term). Define

\[
\begin{aligned}
r &= n - k - 1 - \frac{m - s + 1}{2} \\
u &= \frac{ms - 2}{4} \\
t &= \begin{cases}
\sqrt{\dfrac{m^2 s^2 - 4}{m^2 + s^2 - 5}} & \text{for } m^2 + s^2 - 5 > 0 \\
1 & \text{otherwise}
\end{cases}
\end{aligned}
\tag{2}
\]
Rao (1973, p. 556) shows that under the null hypothesis,

\[
F_0 = \frac{1 - \Lambda^{1/t}}{\Lambda^{1/t}} \times \frac{rt - 2u}{ms}
\tag{3}
\]

follows an approximate F-distribution with ms and rt − 2u degrees of freedom, and that this result is exact if min(m, s) ≤ 2 (a circumstance under which all four test statistics are equivalent).

Even more generally, suppose that we want to test the linear hypothesis

\[
H_0\colon \underset{(q \times (k+1))}{\mathbf{L}} \; \underset{((k+1) \times m)}{\mathbf{B}} = \underset{(q \times m)}{\mathbf{C}}
\tag{4}
\]

where L is a hypothesis matrix of full-row rank q ≤ k + 1, and the right-hand-side matrix C consists of constants (usually 0s).² Then the SSP matrix for the hypothesis is

\[
\mathbf{SSP}_H = \left(\widehat{\mathbf{B}}'\mathbf{L}' - \mathbf{C}'\right) \left[\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}'\right]^{-1} \left(\mathbf{L}\widehat{\mathbf{B}} - \mathbf{C}\right)
\]

and the various test statistics are based on the p = min(q, m) nonzero eigenvalues of \(\mathbf{SSP}_H \mathbf{SSP}_R^{-1}\) (and the formulas in Equations 1, 2, and 3 are adjusted by substituting p for m).

When a multivariate response arises because a variable is measured on different occasions, or under different circumstances (but for the same individuals), it is also of interest to formulate hypotheses concerning comparisons among the responses. This situation, called a repeated-measures design, can be handled by linearly transforming the responses using a suitable model matrix, for example extending the linear hypothesis in Equation 4 to

\[
H_0\colon \underset{(q \times (k+1))}{\mathbf{L}} \; \underset{((k+1) \times m)}{\mathbf{B}} \; \underset{(m \times v)}{\mathbf{P}} = \underset{(q \times v)}{\mathbf{C}}
\tag{5}
\]

Here, the response-transformation matrix P provides contrasts in the responses (see, e.g., Hand and Taylor, 1987, or O'Brien and Kaiser, 1985). The SSP matrix for the hypothesis is

\[
\underset{(v \times v)}{\mathbf{SSP}_H} = \left(\mathbf{P}'\widehat{\mathbf{B}}'\mathbf{L}' - \mathbf{C}'\right) \left[\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}'\right]^{-1} \left(\mathbf{L}\widehat{\mathbf{B}}\mathbf{P} - \mathbf{C}\right)
\]

and test statistics are based on the p = min(q, v) nonzero eigenvalues of \(\mathbf{SSP}_H (\mathbf{P}'\mathbf{SSP}_R\mathbf{P})^{-1}\).

2 Fitting and Testing Multivariate Linear Models in R

Multivariate linear models are fit in R with the lm function.
The procedure is the essence of simplicity: The left-hand side of the model is a matrix of responses, with each column representing a response variable and each row an observation; the right-hand side of the model and all other arguments to lm are precisely the same as for a univariate linear model (as described in Chap. 4 of the R Companion). Typically, the response matrix is composed from individual response variables via the cbind function.

The anova function in the standard R distribution is capable of handling multivariate linear models (see Dalgaard, 2007), but the Anova and linearHypothesis functions in the car package may also be employed, in a manner entirely analogous to that described in the R Companion (Sec. 4.4) for univariate linear models.

²Cf. Sec. 4.4.5 of the R Companion for linear hypotheses in univariate linear models.
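As a quick check on this setup, the following sketch (simulated data; the variable names are ours) fits a two-response model with cbind and confirms the point made above: the coefficients from the multivariate fit coincide, response by response, with those from the corresponding univariate regressions.

```r
set.seed(42)
x  <- rnorm(20)
y1 <- 1 + 2*x + rnorm(20)
y2 <- 3 - x + rnorm(20)
fit.mlm <- lm(cbind(y1, y2) ~ x)   # multivariate fit, class c("mlm", "lm")
fit.y1  <- lm(y1 ~ x)              # univariate fit for the first response
all.equal(coef(fit.mlm)[, "y1"], coef(fit.y1))   # should be TRUE
```

The multivariate fit nevertheless carries more information than the separate fits, because the residual covariances among the responses are needed for the multivariate tests described in Section 1.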
Figure 1: Three species of irises in the Anderson/Fisher data set: setosa (left), versicolor (center), and virginica (right). Source: The photographs are respectively by Radomil Binek, Danielle Langlois, and Frank Mayfield, and are distributed under the Creative Commons Attribution-Share Alike 3.0 Unported license (first and second images) or the Creative Commons Attribution-Share Alike 2.0 Generic license (third image); they were obtained from the Wikimedia Commons.

We briefly demonstrate the use of these functions in this section.

To illustrate multivariate linear models, we will use data collected by Anderson (1935) on three species of irises in the Gaspé Peninsula of Quebec, Canada. The data are of historical interest in statistics, because they were employed by R. A. Fisher (1936) to introduce the method of discriminant analysis. The data frame iris is part of the standard R distribution:

> library(car)
> some(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
25           4.8         3.4          1.9         0.2     setosa
47           5.1         3.8          1.6         0.2     setosa
67           5.6         3.0          4.5         1.5 versicolor
73           6.3         2.5          4.9         1.5 versicolor
104          6.3         2.9          5.6         1.8  virginica
109          6.7         2.5          5.8         1.8  virginica
113          6.8         3.0          5.5         2.1  virginica
131          7.4         2.8          6.1         1.9  virginica
140          6.9         3.1          5.4         2.1  virginica
149          6.2         3.4          5.4         2.3  virginica

The first four variables in the data set represent measurements (in cm) of parts of the flowers, while the final variable specifies the species of iris. (Sepals are the green leaves that comprise the calyx of the plant, which encloses the flower.) Photographs of examples of the three species of irises (setosa, versicolor, and virginica) appear in Figure 1. Figure 2 is a scatterplot matrix of the four measurements classified by species, showing within-species 50 and 95% concentration ellipses (see Sec.
4.3.8 of the R Companion); Figure 3 shows boxplots for each of the responses by species:

> scatterplotMatrix(~ Sepal.Length + Sepal.Width + Petal.Length
+     + Petal.Width | Species,
+     data=iris, smooth=FALSE, reg.line=FALSE, ellipse=TRUE,
+     by.groups=TRUE, diagonal="none")
Figure 2: Scatterplot matrix for the Anderson/Fisher iris data, showing within-species 50 and 95% concentration ellipses.
Figure 3: Boxplots for the response variables in the iris data set classified by species.

> par(mfrow=c(2, 2))
> for (response in c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
+     Boxplot(iris[, response] ~ Species, data=iris, ylab=response)

As the photographs suggest, the scatterplot matrix and boxplots for the measurements reveal that versicolor and virginica are more similar to each other than either is to setosa. Further, the ellipses in the scatterplot matrix suggest that the assumption of constant within-group covariance matrices is problematic: While the shapes and sizes of the concentration ellipses for versicolor and virginica are reasonably similar, the shapes and sizes of the ellipses for setosa are different from the other two.

We proceed nevertheless to fit a multivariate one-way ANOVA model to the iris data:

> mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
+     ~ Species, data=iris)
> class(mod.iris)
[1] "mlm" "lm"

> mod.iris

Call:
lm(formula = cbind(Sepal.Length, Sepal.Width, Petal.Length,
    Petal.Width) ~ Species, data = iris)

Coefficients:
                   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
(Intercept)         5.006         3.428        1.462         0.246
Speciesversicolor   0.930        -0.658        2.798         1.080
Speciesvirginica    1.582        -0.454        4.090         1.780

> summary(mod.iris)
Response Sepal.Length :

Call:
lm(formula = Sepal.Length ~ Species, data = iris)

Residuals:
   Min     1Q Median     3Q    Max
-1.688 -0.329 -0.006  0.312  1.312

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         5.0060     0.0728   68.76   <2e-16
. . .
Response Sepal.Width :
. . .
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         3.4280     0.0480   71.36   <2e-16
. . .

Response Petal.Length :
. . .
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.4620     0.0609    24.0   <2e-16
. . .

Response Petal.Width :
. . .
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.2460     0.0289     8.5    2e-14
Speciesversicolor   1.0800     0.0409    26.4   <2e-16
Speciesvirginica    1.7800     0.0409    43.5   <2e-16

Residual standard error: 0.205 on 147 degrees of freedom
Multiple R-squared: 0.929,  Adjusted R-squared: 0.928
F-statistic: 960 on 2 and 147 DF,  p-value: <2e-16
The lm function returns an S3 object of class c("mlm", "lm"). The printed representation of the object simply shows the estimated regression coefficients for each response, and the model summary is the same as we would obtain by performing separate least-squares regressions for the four responses.

We use the Anova function in the car package to test the null hypothesis that the four response means are identical across the three species of irises:³

> (manova.iris <- Anova(mod.iris))

Type II MANOVA Tests: Pillai test statistic
        Df test stat approx F num Df den Df Pr(>F)
Species  2      1.19     53.5      8    290 <2e-16

> class(manova.iris)
[1] "Anova.mlm"

> summary(manova.iris)

Type II MANOVA Tests:

Sum of squares and products for error:
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length       38.956      13.630       24.625       5.645
Sepal.Width        13.630      16.962        8.121       4.808
Petal.Length       24.625       8.121       27.223       6.272
Petal.Width         5.645       4.808        6.272       6.157

------------------------------------------

Term: Species

Sum of squares and products for the hypothesis:
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length        63.21      -19.95       165.25       71.28
Sepal.Width        -19.95       11.34       -57.24      -22.93
Petal.Length       165.25      -57.24       437.10      186.77
Petal.Width         71.28      -22.93       186.77       80.41

Multivariate Tests: Species
                 Df test stat approx F num Df den Df Pr(>F)
Pillai            2      1.19     53.5      8    290 <2e-16
Wilks             2      0.02    199.1      8    288 <2e-16
Hotelling-Lawley  2     32.48    580.5      8    286 <2e-16
Roy               2     32.19   1167.0      4    145 <2e-16

The Anova function returns an object of class "Anova.mlm" which, when printed, produces a multivariate-analysis-of-variance ("MANOVA") table, by default reporting Pillai's test statistic;

³The Manova function in the car package is equivalent to Anova applied to a multivariate linear model.
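As a check on the arithmetic, the "Sum of squares and products for error" in this summary is just \(\widehat{\mathbf{E}}'\widehat{\mathbf{E}}\) from the multivariate fit, which can be verified in base R (a sketch; the model is refit here so that the chunk stands alone):

```r
mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
               ~ Species, data = iris)
SSP.error <- crossprod(residuals(mod.iris))   # E'E, the residual SSP matrix
round(SSP.error, 3)
```

The result matches the error SSP matrix printed by summary(manova.iris), for example 38.956 for Sepal.Length on the diagonal.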
summarizing the object produces a more complete report. The object returned by Anova may also be used in further computations, for example, for displays such as HE plots (Friendly, 2007; Fox et al., 2009; Friendly, 2010). Because there is only one term (beyond the regression constant) on the right-hand side of the model, in this example the type-II test produced by default by Anova is the same as the sequential test produced by the standard R anova function:

> anova(mod.iris)
Analysis of Variance Table

            Df Pillai approx F num Df den Df Pr(>F)
(Intercept)  1  0.993     5204      4    144 <2e-16
Species      2  1.192       53      8    290 <2e-16
Residuals  147

The null hypothesis is soundly rejected.

The linearHypothesis function in the car package may be used to test more specific hypotheses about the parameters in the multivariate linear model. For example, to test for differences between setosa and the average of versicolor and virginica, and for differences between versicolor and virginica:

> linearHypothesis(mod.iris, "0.5*Speciesversicolor + 0.5*Speciesvirginica",
+     verbose=TRUE)

Hypothesis matrix:
                                             (Intercept) Speciesversicolor
0.5*Speciesversicolor + 0.5*Speciesvirginica           0               0.5
                                             Speciesvirginica
0.5*Speciesversicolor + 0.5*Speciesvirginica              0.5

Right-hand-side matrix:
                                             Sepal.Length Sepal.Width
0.5*Speciesversicolor + 0.5*Speciesvirginica            0           0
                                             Petal.Length Petal.Width
0.5*Speciesversicolor + 0.5*Speciesvirginica            0           0

Estimated linear function (hypothesis.matrix %*% coef - rhs):
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       1.256       -0.556        3.444        1.430

Sum of squares and products for the hypothesis:
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length        52.58      -23.28       144.19       59.87
Sepal.Width        -23.28       10.30       -63.83      -26.50
Petal.Length       144.19      -63.83       395.37      164.16
Petal.Width         59.87      -26.50       164.16       68.16

Sum of squares and products for error: