Visual Hypothesis Tests in Multivariate Linear Models: The heplots Package for R John Fox Michael Friendly Georges Monette McMaster University York University York University 6 Februrary 2007 Abstract Hypothesis-error (or "HE")plots,introduced by Friendly (2006,2007),permit the visualization of hypothesis tests in multivariate linear models by representing hypothesis and error matrices of sums of squares and cross-products as ellipses.This paper describes the implementation of these methods in R, as well as their extension,for example from two to three dimensions and by scaling hypothesis ellipses and ellipsoids in a natural manner relative to error.The methods,incorporated in the heplots package for R,exploit new facilities in the car package for testing linear hypotheses in multivariate linear models and for constructing MANOVA tables for these models,including models for repeated measures. 1 Introduction This paper introduces the heplots package for R,which implements and extends the methods described in Friendly(2006,2007)for visualizing hypothesis tests in multivariate linear models.The paper begins with a brief description of multivariate linear models;proceeds to explain how dispersion matrices can be represented by ellipses or ellipsoids:describes new facilities in the car package (associated with Fox.2002)for testing linear hypotheses in multivariate linear models and for constructing multivariate analysis-of-variance tables; and illustrates the use of the functions in the heplots package for two and three-dimensional visualization of hypothesis tests in multivariate analysis of variance and regression. 2 Multivariate Linear Models The univariate linear model y=XB+8 (1) is surely the most familiar of statistical models.In Equation 1,y is an n x 1 column vector of observations on a response variable;X is an n x p model matrix of full column rank that is either fixed or,if random, independent of the n x 1 vector of errors e;and the p x 1 vector of regression coefficients B is to be estimated from the data..As is also familiar,under the standard assumptions that the errors are normally and independently distributed with zero expectations and common variance,si~NID(0,2)or equivalently eNn(0,02In),the least squares estimator, B=(XTx)-xTy is the maximum-likelihood estimator of B.Here,Nn denotes the multivariate-normal distribution for n variables,0 is the n x 1 zero vector,and In is the order-n identity matrix. In the multivariate linear model (e.g.,Timm,1975), Y=XB+E the response vector y is replaced by an n x m matrix of responses Y,where each column represents a distinct response variable,B is a pxm matrix of regression coefficients,and E is an nxm matrix of errors.Under the 入y
Visual Hypothesis Tests in Multivariate Linear Models: The heplots Package for R John Fox McMaster University Michael Friendly York University Georges Monette York University 6 Februrary 2007 Abstract Hypothesis-error (or “HE”) plots, introduced by Friendly (2006, 2007), permit the visualization of hypothesis tests in multivariate linear models by representing hypothesis and error matrices of sums of squares and cross-products as ellipses. This paper describes the implementation of these methods in R, as well as their extension, for example from two to three dimensions and by scaling hypothesis ellipses and ellipsoids in a natural manner relative to error. The methods, incorporated in the heplots package for R, exploit new facilities in the car package for testing linear hypotheses in multivariate linear models and for constructing MANOVA tables for these models, including models for repeated measures. 1 Introduction This paper introduces the heplots package for R, which implements and extends the methods described in Friendly (2006, 2007) for visualizing hypothesis tests in multivariate linear models. The paper begins with a brief description of multivariate linear models; proceeds to explain how dispersion matrices can be represented by ellipses or ellipsoids; describes new facilities in the car package (associated with Fox, 2002) for testing linear hypotheses in multivariate linear models and for constructing multivariate analysis-of-variance tables; and illustrates the use of the functions in the heplots package for two and three-dimensional visualization of hypothesis tests in multivariate analysis of variance and regression. 2 Multivariate Linear Models The univariate linear model y = Xβ + ε (1) is surely the most familiar of statistical models. In Equation 1, y is an n × 1 column vector of observations on a response variable; X is an n × p model matrix of full column rank that is either fixed or, if random, independent of the n × 1 vector of errors ε; and the p × 1 vector of regression coefficients β is to be estimated from the data.. As is also familiar, under the standard assumptions that the errors are normally and independently distributed with zero expectations and common variance, εi ∼ NID(0, σ2) or equivalently ε ∼ Nn(0, σ2In), the least squares estimator, βb = (XT X) −1XT y is the maximum-likelihood estimator of β. Here, Nn denotes the multivariate-normal distribution for n variables, 0 is the n × 1 zero vector, and In is the order-n identity matrix. In the multivariate linear model (e.g., Timm, 1975), Y = XB + E the response vector y is replaced by an n×m matrix of responses Y, where each column represents a distinct response variable, B is a p×m matrix of regression coefficients, and E is an n×m matrix of errors. Under the 1
assumption that the rows of E are independent.and that each row is multivariately normally distributed with zero expectation and common covariance matrix,~Nn(0,E)or equivalently vec(E)~Nnp(0,In ) the least squares estimator B=(XTX)-XTY is the maximum-likelihood estimator or B.Here,the 0 vectors are respectively of order n x 1 and np x 1, and represents the Kronecker product. Hypothesis tests for multivariate linear models also closely parallel those for univariate linear models. Consider the linear hypothesis Ho:L3=0 in the univariate linear model,where L is a g x p hypothesis matrix of rank q and 0 is the qx 1 zero vector. Under this hypothesis, LLXTX-LL3 P0= 9 SS/q EE SSE/(n-P) n-卫 is distributed as F with q and n-pdegrees of freedom.The quantity sLTL is the sum of squares for the hypothesis,y is the vector of residuals,SSg is the sum of squares for error,and s2=/(n-p)is the estimated error variance.To test the analogous hypothesis in the multivariate linear model, Ho:LB=0 (2) where 0 is now the gx m zero matrix,we compute the m x m hypothesis sum of squares and products matrix SSP BTLT L(XTX)-LT]-LB and the m x m error sum of squares and products matrix SSPE=ETE where E=Y-XB is the matrix of residuals.Multivariate tests of the hypothesis are based on the s min(g,m)nonzero latent roots A1 >A2>...>As of the matrix SSP relative to the matrix SSPE, that is.the values of A for which det(SSPH-λSSPE)=0 These are also the ordinary latent roots of of SSPSSP,that is,the values of for which det(SSPHSSP1-AIm)=0 The corresponding latent vectors give a set of s orthogonal linear combinations of the responses that pro- duce maximal univariate F statistics for the hypothesis in Equation 2.The several commonly employed multivariate test statistics are functions of the latent roots: Pillai's trace. j=1 Hotelling-Lawley trace. THL=∑男=1 1 Wilks's Lambda A= 1+ Roy's maximum root, There is an F approximation to the null distribution of each of these test statistics. In a univariate linear model,it is common to provide F tests for each term in the model,summarized in an analysis-of-variance (ANOVA)table.The hypothesis sums of squares for these tests can be expressed as differences in the error sums of squares for nested models.For example,dropping each term in the model
assumption that the rows of E are independent, and that each row is multivariately normally distributed with zero expectation and common covariance matrix, εT i ∼ Nn(0, Σ) or equivalently vec(E) ∼ Nnp(0, In ⊗ Σ), the least squares estimator Bb = (XT X) −1XT Y is the maximum-likelihood estimator or B. Here, the 0 vectors are respectively of order n × 1 and np × 1, and ⊗ represents the Kronecker product. Hypothesis tests for multivariate linear models also closely parallel those for univariate linear models. Consider the linear hypothesis H0: Lβ = 0 in the univariate linear model, where L is a q × p hypothesis matrix of rank q and 0 is the q × 1 zero vector. Under this hypothesis, F0 = βbT LT [L(XT X) −1 LT ] −1 Lβb q bε T bε n − p = SSH/q SSE/(n − p) is distributed as F with q and n − p degrees of freedom. The quantity SSH = βbT LT [L(XT X) −1 LT ] −1 Lβb is the sum of squares for the hypothesis, bε = y − Xβb is the vector of residuals, SSE = bε T bε is the sum of squares for error, and s2 = bε T bε/(n − p) is the estimated error variance. To test the analogous hypothesis in the multivariate linear model, H0: LB = 0 (2) where 0 is now the q×m zero matrix, we compute the m×m hypothesis sum of squares and products matrix SSPH = Bb TLT [L(XT X) −1 LT ] −1 LBb and the m × m error sum of squares and products matrix SSPE = EbTEb where Eb = Y − XBb is the matrix of residuals. Multivariate tests of the hypothesis are based on the s = min(q,m) nonzero latent roots λ1 > λ2 > ··· > λs of the matrix SSPH relative to the matrix SSPE, that is, the values of λ for which det(SSPH − λSSPE)=0 These are also the ordinary latent roots of of SSPHSSP−1 E , that is, the values of λ for which det(SSPHSSP−1 E − λIm)=0 The corresponding latent vectors give a set of s orthogonal linear combinations of the responses that produce maximal univariate F statistics for the hypothesis in Equation 2. The several commonly employed multivariate test statistics are functions of the latent roots: Pillai’s trace, TP = Xp j=1 λj 1 + λj Hotelling-Lawley trace, THL = Pp j=1 λj Wilks’s Lambda, Λ = Yp j=1 1 1 + λj Roy’s maximum root, λ1 There is an F approximation to the null distribution of each of these test statistics. In a univariate linear model, it is common to provide F tests for each term in the model, summarized in an analysis-of-variance (ANOVA) table. The hypothesis sums of squares for these tests can be expressed as differences in the error sums of squares for nested models. For example, dropping each term in the model 2
in turn and contrasting the resulting residual sum of squares with that for the full model produces so-called Type-III tests;adding terms to the model sequentially produces so-called Type-I tests;and testing each term after all terms in the model with the exception of those to which it is marginal produces so-called Type-II tests.Closely analogous multivariate analysis-of-variable (MANOVA)tables can be formed similarly by taking differences in error sum of squares and products matrices. In some contexts-for example.when the response variables represent repeated measures of the same variable over time-it is also of interest to entertain a design and hypotheses on the response (see,e.g., O'Brien and Kaiser,1985).Such tests can be formulated by extending the linear hypothesis in Equation 2 to Ho:LBP=0 where the m x k matrix P provides contrasts in the responses. 3 Data Ellipses and Ellipsoids The data ellipse,described by Dempster(1969)and Monette(1990),is a device for visualizing the relationship between two variables,Yi and Y2.Let D(y)=(y-y)TS-1(y-y)represent the squared Mahalanobis distance of the point y=(,y2)T from the centroid of the data y=(Y1,Y2)T.The data ellipse Ee of size c is the set of all points y with Di(y)less than or equal to c2: (y:s)={y:(y-)Ts-1(y-)≤2} (3) Here.S is the sample covariance matrix. s=-y-可 n-1 Selecting c=1 produces the "standard"data ellipse,as illustrated in Figure 1:The perpendicular "shadows"of the ellipse on the axes mark off twice the standard deviation of each variable;the regression line for Y2 on Yi intersects the points of vertical tangency on the boundary of the ellipse;and the correlation between the two variables is proportional to the length of the line from the bottom of the ellipse to the point of vertical tangency at the right.Many other properties of correlation and regression can be visualized using the data ellipse (see,e.g.,Monette,1990). These properties of the data ellipse hold regardless of the joint distribution of the variables,but if the variables are bivariate normal,then the data ellipse represents a contour of constant density in their joint distribution.In this case,D(y)has a large-sample x2 distribution with 2 degrees of freedom,and so,for example,taking c2=x2(0.95)=5.996 encloses approximately 95 percent of the data.Alternatively,in small samples,we can take 2=2m-Fm-2≈2n2m-2 n-2 but this typically makes little difference visually. The generalization of the data ellipse to more than two variables is immediate:Applying Equation 3 to y =(42,43)T,for example,produces a data ellipsoid in three dimensions.For m multivariate-normal variables,selecting c2=x(1-a)encloses approximately 100(1-a)percent of the data.Again,for greater precision,we can use 2=ma-Fnn-m≈mEnn-m n-m 4 Implementation of Tests for Multivariate Linear Models in the car Package Tests for multivariate linear models are implemented in the car package as S3 methods for the generic linear.hypothesis and Anova functions,with Manova provided as a synonym for the latter.The Anova function computes partial (so-called "Types II and III")hypothesis tests,as opposed to the anova function in the stats package,which computes sequential ("Type-I")tests;these tests coincide in one-way and balanced designs.Several examples of the use of these functions are given in this section. 3
in turn and contrasting the resulting residual sum of squares with that for the full model produces so-called Type-III tests; adding terms to the model sequentially produces so-called Type-I tests; and testing each term after all terms in the model with the exception of those to which it is marginal produces so-called Type-II tests. Closely analogous multivariate analysis-of-variable (MANOVA) tables can be formed similarly by taking differences in error sum of squares and products matrices. In some contexts – for example, when the response variables represent repeated measures of the same variable over time – it is also of interest to entertain a design and hypotheses on the response (see, e.g., O’Brien and Kaiser, 1985). Such tests can be formulated by extending the linear hypothesis in Equation 2 to H0: LBP = 0 where the m × k matrix P provides contrasts in the responses. 3 Data Ellipses and Ellipsoids The data ellipse, described by Dempster (1969) and Monette (1990), is a device for visualizing the relationship between two variables, Y1 and Y2. Let D2 M(y)=(y − y)T S−1(y − y) represent the squared Mahalanobis distance of the point y = (y1, y2)T from the centroid of the data y = (Y 1, Y 2)T . The data ellipse Ec of size c is the set of all points y with D2 M(y) less than or equal to c2: Ec(y; S,y) ≡ © y: (y − y) T S−1(y − y) ≤ c2ª (3) Here, S is the sample covariance matrix, S = Pn i=1(y − y)T (y − y) n − 1 Selecting c = 1 produces the “standard” data ellipse, as illustrated in Figure 1: The perpendicular “shadows” of the ellipse on the axes mark off twice the standard deviation of each variable; the regression line for Y2 on Y1 intersects the points of vertical tangency on the boundary of the ellipse; and the correlation between the two variables is proportional to the length of the line from the bottom of the ellipse to the point of vertical tangency at the right. Many other properties of correlation and regression can be visualized using the data ellipse (see, e.g., Monette, 1990). These properties of the data ellipse hold regardless of the joint distribution of the variables, but if the variables are bivariate normal, then the data ellipse represents a contour of constant density in their joint distribution. In this case, D2 M(y) has a large-sample χ2 distribution with 2 degrees of freedom, and so, for example, taking c2 = χ2 2(0.95) = 5.99 ≈ 6 encloses approximately 95 percent of the data. Alternatively, in small samples, we can take c2 = 2(n − 1) n − 2 F2,n−2 ≈ 2F2,n−2 but this typically makes little difference visually. The generalization of the data ellipse to more than two variables is immediate: Applying Equation 3 to y = (y1, y2, y3)T , for example, produces a data ellipsoid in three dimensions. For m multivariate-normal variables, selecting c2 = χ2 m(1−α) encloses approximately 100(1−α) percent of the data. Again, for greater precision, we can use c2 = m(n − 1) n − m Fm,n−m ≈ mFm,n−m 4 Implementation of Tests for Multivariate Linear Models in the car Package Tests for multivariate linear models are implemented in the car package as S3 methods for the generic linear.hypothesis and Anova functions, with Manova provided as a synonym for the latter. The Anova function computes partial (so-called “Types II and III”) hypothesis tests, as opposed to the anova function in the stats package, which computes sequential (“Type-I”) tests; these tests coincide in one-way and balanced designs. Several examples of the use of these functions are given in this section. 3
98 60 2×「×s2 2×S2 8 99 品 0 2×s1 40 50 55 60 Y Figure 1:The standard data ellipse,showing the standard deviation of each variable (s1 and s2),their means(given by the solid black dot),the line for the regression of Y2 on Y1,and the correlation between the variables (r). 4.1 One-Way MANOVA:Romano-British Pottery Tubb,Parker,and Nickless(1980)(Tubb et al.,1980)used atomic absorption spectrophotometry to analyze data on the element composition of 26 samples of Romano-British pottery found at four different kiln sites in Britain with a view to determining if the chemical content of aluminium,iron,magnesium,calcium and sodium could differentiate those sites:see also Hand et al (1994:252).If so.the chemical content of pottery of unknown origin might be used for classification purposes.The data thus comprise a one-way MANOVA design with four groups and five response variables. The data for this example are in the data frame Pottery in the car package: library(heplots) Pottery Site Al Fe Mg Ca Na 1 L1 anedyrn14.47.004.300.150.51 2 L1 anedyrn13.87.083.430.120.17 3 L1 anedyrn14.67.093.880.130.20 25Ash1 eyRai1s14.82.740.670.030.05 26Ash1 eyRai1s19.11.640.600.100.03 table(Pottery$site) AshleyRails Caldicot IsleThorns Llanedyrn 5 2 6 14 The ellipses in the output (...)represent elided lines. In R.multivariate linear models are fit by the Im function.returning an object of class mlm.Here,we fit a one-way MANOVA model to the Pottery data.The print method for the object returned by the Anova function gives a brief display of the multivariate test for Site,using the Pillai trace statistic by default.A more detailed display,including the SSP and SSPE matrices and all four multivariate tests is provided
35 40 45 50 55 60 70 80 90 100 110 120 130 Y1 Y2 2 × s1 2× s2 2 × r × s2 - Figure 1: The standard data ellipse, showing the standard deviation of each variable (s1 and s2), their means (given by the solid black dot), the line for the regression of Y2 on Y1, and the correlation between the variables (r). 4.1 One-Way MANOVA: Romano-British Pottery Tubb, Parker, and Nickless (1980)(Tubb et al., 1980) used atomic absorption spectrophotometry to analyze data on the element composition of 26 samples of Romano-British pottery found at four different kiln sites in Britain with a view to determining if the chemical content of aluminium, iron, magnesium, calcium and sodium could differentiate those sites; see also Hand et al (1994: 252). If so, the chemical content of pottery of unknown origin might be used for classification purposes. The data thus comprise a one-way MANOVA design with four groups and five response variables. The data for this example are in the data frame Pottery in the car package: > library(heplots) > Pottery Site Al Fe Mg Ca Na 1 Llanedyrn 14.4 7.00 4.30 0.15 0.51 2 Llanedyrn 13.8 7.08 3.43 0.12 0.17 3 Llanedyrn 14.6 7.09 3.88 0.13 0.20 ... 25 AshleyRails 14.8 2.74 0.67 0.03 0.05 26 AshleyRails 19.1 1.64 0.60 0.10 0.03 > table(Pottery$Site) AshleyRails Caldicot IsleThorns Llanedyrn 5 2 5 14 The ellipses in the output (...) represent elided lines. In R, multivariate linear models are fit by the lm function, returning an object of class mlm. Here, we fit a one-way MANOVA model to the Pottery data. The print method for the object returned by the Anova function gives a brief display of the multivariate test for Site, using the Pillai trace statistic by default. A more detailed display, including the SSPH and SSPE matrices and all four multivariate tests is provided 4
by the summary method for Anova.mlm objects (suppressing the univariate test for each response,which is given by default): pottery.mod F) Site31.55394.2984 15 602.413e-05*** Signif.c0de8:0J**’0.0013*’0.01J*’0.05’.’0.1’’1 >All 4 multivariate tests summary(Anova(pottery.mod),univariate=FALSE,digits=4) Type II MANOVA Tests: Sum of squares and products for error: Al Fe Mg Ca Na A148.28817.080070.608010.106470.58896 Fe7.080110.950850.52706-0.155190.06676 Mg0.60800.5270615.429610.435380.02762 Ca0.1065-0.155190.435380.051490.01008 Na0.58900.066760.02762 0,010080.19929 Term:Site Sum of squares and products for the hypothesis: Al Fe Mg Ca Na A1175.610-149.296-130.810-5.8892-5.3723 Fe-149.296134.222117.7454.82185.3259 Mg-130.810 117.745 103.3514.20924.7105 Ca -5.889 4.822 4.2090.20470.1548 Na -5.372 5.326 4.7110.15480.2582 Multivariate Tests:Site Df test stat approx F num Df den Df Pr(>F) Pillai 3.00 1.55 4.3015.0060.002.41e-05** Wilks 3.00 0.01 13.0915.0050.091.84e-12*** Hotelling-Lawley 3.00 35.44 39.3815.0050.00F) (Intercept)10.99523.075 18<2.2e-16*** 5
by the summary method for Anova.mlm objects (suppressing the univariate test for each response, which is given by default): > pottery.mod Anova(pottery.mod) Type II MANOVA Tests: Pillai test statistic Df test stat approx F num Df den Df Pr(>F) Site 3 1.5539 4.2984 15 60 2.413e-05 *** --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 > # All 4 multivariate tests > summary(Anova(pottery.mod), univariate=FALSE, digits=4) Type II MANOVA Tests: Sum of squares and products for error: Al Fe Mg Ca Na Al 48.2881 7.08007 0.60801 0.10647 0.58896 Fe 7.0801 10.95085 0.52706 -0.15519 0.06676 Mg 0.6080 0.52706 15.42961 0.43538 0.02762 Ca 0.1065 -0.15519 0.43538 0.05149 0.01008 Na 0.5890 0.06676 0.02762 0.01008 0.19929 ------------------------------------------ Term: Site Sum of squares and products for the hypothesis: Al Fe Mg Ca Na Al 175.610 -149.296 -130.810 -5.8892 -5.3723 Fe -149.296 134.222 117.745 4.8218 5.3259 Mg -130.810 117.745 103.351 4.2092 4.7105 Ca -5.889 4.822 4.209 0.2047 0.1548 Na -5.372 5.326 4.711 0.1548 0.2582 Multivariate Tests: Site Df test stat approx F num Df den Df Pr(>F) Pillai 3.00 1.55 4.30 15.00 60.00 2.41e-05 *** Wilks 3.00 0.01 13.09 15.00 50.09 1.84e-12 *** Hotelling-Lawley 3.00 35.44 39.38 15.00 50.00 anova(pottery.mod) Analysis of Variance Table Df Pillai approx F num Df den Df Pr(>F) (Intercept) 1 0.99 523.07 5 18 < 2.2e-16 *** 5
Site 31.55 4.30 15 602.413e-05*** Residuals 22 Signif.c0des:0J**’0.0013**’0.01’*’0.05’.’0.1’’1 There is,therefore,strong evidence against the null hypothesis of no differences in mean vectors across sites. 4.2 Two-Way MANOVA:Plastic Film Data For a slightly more complex example,we use textbook data from Johnson and Wichern (1992:266)on an experiment conducted to determine the optimum conditions for extruding plastic film.Three responses(tear resistance,film gloss,and opacity)were measured in relation to two factors:rate of extrusion (Low/High) and amount of an additive (Low/High).Again,the data are in the heplots package: Plastic tear gloss opacity rate additive 1 6.59.5 4.4 Low LoW 6.2 9.9 6.4 Low Low 3 5.8 9.6 3.0L0w LoW 4 6.5 9.6 4.1 Low LOW 6.5 9.2 0.8 Low LoW 6 6.9 9.1 5.7 Low High 7 7.2 10.0 2.0 Low High 6.9 9.9 3.9 Low Hig融 9 6.1 9.5 1.9 Low High 10 6.3 9.4 5.7 Low Hig助 11 6.7 9.1 2.8 High LOW 12 6.6 9.3 4.1 High LOW 13 7.2 8.3 3.8 High Low 14 7.1 8.4 1.6 High LOW 15 6.8 8.5 3.4 High LOw 16 7.1 9.2 8.4 High High 17 7.0 8.8 5.2 High High 18 7.2 9.7 6.9 High High 19 7.5 10.1 2.7 High High 20 7.6 9.2 1.9 High High We fit the two-way MANOVA model and display the Anova results,using Roy's maximum root test. Both main effects are significant,but their interaction is not: plastic.mod F) rate 1 1.61887.5543 3 140.003034** additive 1 0.9119 4.2556 3 140.024745* rate:additive 1 0.28681.3385 3 140.301782 Sig题if.c0des:0’**?0.001’*30.01’*’0.05).’0.1’)1 Again,we get the same tests from anova,this time because the data are balanced (so that sequential and Type-II tests coincide): 6
Site 3 1.55 4.30 15 60 2.413e-05 *** Residuals 22 --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 There is, therefore, strong evidence against the null hypothesis of no differences in mean vectors across sites. 4.2 Two-Way MANOVA: Plastic Film Data For a slightly more complex example, we use textbook data from Johnson and Wichern (1992: 266) on an experiment conducted to determine the optimum conditions for extruding plastic film. Three responses (tear resistance, film gloss, and opacity) were measured in relation to two factors: rate of extrusion (Low/High) and amount of an additive (Low/High). Again, the data are in the heplots package: > Plastic tear gloss opacity rate additive 1 6.5 9.5 4.4 Low Low 2 6.2 9.9 6.4 Low Low 3 5.8 9.6 3.0 Low Low 4 6.5 9.6 4.1 Low Low 5 6.5 9.2 0.8 Low Low 6 6.9 9.1 5.7 Low High 7 7.2 10.0 2.0 Low High 8 6.9 9.9 3.9 Low High 9 6.1 9.5 1.9 Low High 10 6.3 9.4 5.7 Low High 11 6.7 9.1 2.8 High Low 12 6.6 9.3 4.1 High Low 13 7.2 8.3 3.8 High Low 14 7.1 8.4 1.6 High Low 15 6.8 8.5 3.4 High Low 16 7.1 9.2 8.4 High High 17 7.0 8.8 5.2 High High 18 7.2 9.7 6.9 High High 19 7.5 10.1 2.7 High High 20 7.6 9.2 1.9 High High We fit the two-way MANOVA model and display the Anova results, using Roy’s maximum root test. Both main effects are significant, but their interaction is not: > plastic.mod Anova(plastic.mod, test.statistic="Roy") Type II MANOVA Tests: Roy test statistic Df test stat approx F num Df den Df Pr(>F) rate 1 1.6188 7.5543 3 14 0.003034 ** additive 1 0.9119 4.2556 3 14 0.024745 * rate:additive 1 0.2868 1.3385 3 14 0.301782 --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 Again, we get the same tests from anova, this time because the data are balanced (so that sequential and Type-II tests coincide): 6
anova(plastic.mod,test="Roy") Analysis of Variance Table Df Roy approx F num Df den Df Pr(>F) (Intercept) 11275.2 5950.9 心 14F) SES 1 0.378512.1818 602.507e-06*** ◇ 1 0.0403 0.8400 3 60 0.477330 8 0.0927 2.0437 600.117307 ns 1 0.1928 4.7779 600.004729** na 1 0.2313 6.0194 600.001181** 88 1 0.0499 1.0504 3 600.376988 S1gmif.codes:0’**’0.001’*’0.01’*’0.05’.’0.1’’1 This multivariate linear model is of interest because,although the multivariate tests for two of the covariates (ns and na)are highly significant.univariate multiple regression tests for the separate responses [from summary(rohwer.mod)]are relatively weak.We can test the 5 df hypothesis that all covariates have null effects for all responses as a linear hypothesis (suppressing display of the error and hypothesis SSP matrices)
> anova(plastic.mod, test="Roy") Analysis of Variance Table Df Roy approx F num Df den Df Pr(>F) (Intercept) 1 1275.2 5950.9 3 14 Rohwer group SES SAT PPVT Raven n s ns na ss 1 1 Lo 49 48 8 1 2 6 12 16 2 1 Lo 47 76 13 5 14 14 30 27 3 1 Lo 11 40 13 0 10 21 16 16 ... 68 2 Hi 98 74 15 2 6 14 25 17 69 2 Hi 50 78 19 5 10 18 27 26 Initially (and optimistically), we fit the MANCOVA model that allows different means for the two SES groups on the responses, but constrains the slopes for the PA covariates to be equal. > rohwer.mod Anova(rohwer.mod) Type II MANOVA Tests: Pillai test statistic Df test stat approx F num Df den Df Pr(>F) SES 1 0.3785 12.1818 3 60 2.507e-06 *** n 1 0.0403 0.8400 3 60 0.477330 s 1 0.0927 2.0437 3 60 0.117307 ns 1 0.1928 4.7779 3 60 0.004729 ** na 1 0.2313 6.0194 3 60 0.001181 ** ss 1 0.0499 1.0504 3 60 0.376988 --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 This multivariate linear model is of interest because, although the multivariate tests for two of the covariates (ns and na) are highly significant, univariate multiple regression tests for the separate responses [from summary(rohwer.mod)] are relatively weak. We can test the 5 df hypothesis that all covariates have null effects for all responses as a linear hypothesis (suppressing display of the error and hypothesis SSP matrices), 7
Regr F) Pillai 5.00 0.6658 3.536915.00186.002.309e-05*** Wilks 5.00 0.4418 3.811815.00166.038.275e-06*** Hotelling-Lawley 5.00 1.0309 4.032115.00176.002.787e-06*** Roy 5.00 0.7574 9.39245.0062,001,062e-06*** -- S1gif.codes:0’**’0.001’*’0.01’*’0.05’.’0.1’’1 As explained,in the MANCOVA model rohwer.mod we have assumed homogeneity of slopes for the predictors,and the test of SES relies on this assumption.We can test this as follows,adding interactions of SES with each of the covariates: rohwer.mod2 F) SES 1 0.391211,7822 3 55 4,55e-06**米 n 0.0790 1.5727 550.2063751 8 0.1252 2.6248 3 550.0595192 ns 1 0.2541 6.2461 3 550.0009995*** na 0.3066 8.1077 0 550.0001459*** 88 1 0.0602 1.1738 3 550.3281285 SES:n 1 0.0723 1.4290 550.2441738 SES:s 0.0994 2.0240 3 550.1211729 SES:ns 1 0.1176 2.4425 550.0738258 SES:na 0.1480 3.1850 550.0308108* SES:ss 0.0573 1.1150 550.3509357 S1g即if.codes:0’**’0.001’*’0.01’*)0.05’.’0.1’’1 It appears from the above that there is only weak evidence of unequal slopes from the separate SES: terms.The evidence for heterogeneity is stronger,however,when these terms are tested collectively using the linear.hypothesis function: >(coefs F) Pillai 5.00000.4179381.84522615.0000171.00000.0320861* Wilks 5.00000.6235821.89361315.0000152.23220.0276949* Hotelling-Lawley 5.00000.5386511.927175 15.0000161.00000.0239619* Roy 5.00000.3846494.384997 5.000057.00000.0019053** Signif.codes: 03***30.0013**30.013*)0.053.30.1331 8
> Regr print(Regr, digits=5, SSP=FALSE) Multivariate Tests: Df test stat approx F num Df den Df Pr(>F) Pillai 5.00 0.6658 3.5369 15.00 186.00 2.309e-05 *** Wilks 5.00 0.4418 3.8118 15.00 166.03 8.275e-06 *** Hotelling-Lawley 5.00 1.0309 4.0321 15.00 176.00 2.787e-06 *** Roy 5.00 0.7574 9.3924 5.00 62.00 1.062e-06 *** --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 As explained, in the MANCOVA model rohwer.mod we have assumed homogeneity of slopes for the predictors, and the test of SES relies on this assumption. We can test this as follows, adding interactions of SES with each of the covariates: > rohwer.mod2 Anova(rohwer.mod2) Type II MANOVA Tests: Pillai test statistic Df test stat approx F num Df den Df Pr(>F) SES 1 0.3912 11.7822 3 55 4.55e-06 *** n 1 0.0790 1.5727 3 55 0.2063751 s 1 0.1252 2.6248 3 55 0.0595192 . ns 1 0.2541 6.2461 3 55 0.0009995 *** na 1 0.3066 8.1077 3 55 0.0001459 *** ss 1 0.0602 1.1738 3 55 0.3281285 SES:n 1 0.0723 1.4290 3 55 0.2441738 SES:s 1 0.0994 2.0240 3 55 0.1211729 SES:ns 1 0.1176 2.4425 3 55 0.0738258 . SES:na 1 0.1480 3.1850 3 55 0.0308108 * SES:ss 1 0.0573 1.1150 3 55 0.3509357 --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 It appears from the above that there is only weak evidence of unequal slopes from the separate SES: terms. The evidence for heterogeneity is stronger, however, when these terms are tested collectively using the linear.hypothesis function: > (coefs print(linear.hypothesis(rohwer.mod2, coefs[grep(":", coefs)]), SSP=FALSE) Multivariate Tests: Df test stat approx F num Df den Df Pr(>F) Pillai 5.0000 0.417938 1.845226 15.0000 171.0000 0.0320861 * Wilks 5.0000 0.623582 1.893613 15.0000 152.2322 0.0276949 * Hotelling-Lawley 5.0000 0.538651 1.927175 15.0000 161.0000 0.0239619 * Roy 5.0000 0.384649 4.384997 5.0000 57.0000 0.0019053 ** --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 8
4.4 Repeated-Measures MANOVA:O'Brien and Kaiser's Data O'Brien and Kaiser (1985:Table 7)describe an imaginary study in which 16 female and male subjects,who are divided into three treatments,are measured on an unspecified response variable at a pretest,post-test, and a follow-up session:during each session,they are measured at five occasions at intervals of one hour. The design,therefore,has two between-subject and two within-subject factors.The data are in the data frame OBrienKaiser in the car package: OBrienKaiser treatment gender pre.1 pre.2 pre.3 pre.4 pre.5 post.1 post.2 post.3 post.4 post.5 1 control M 1 2 4 2 1 3 2 3 2 control M 4 4 2 3 3 3 control M 5 6 6 > 6 4 control F 5 4 57 5 P 3 5 3 5 control F 3 4 67 3 6 6 3 M 9 9 9 9 10 9 7 5 5 6 4 5 1 7 8 10 8 8 A F 2 3 6 5 9 F 3 3 6 5 6 1 10 M 4 4 5 6 6 8 8 11 12 BBBBBB 3 7 8 10 9 6 14 52 6 8 6 7 15 F 4 24 9 62 F 4 5225 6337 567 > fup.1 fup.2 fup.3 fup.4 fup.5 2 4 23456789 7 3564 4473 4 3 269561 4 10 9 9 6 6 1 5 5 4 7 8 11 5 6 980 57 16436864857 68 3 7 7 1 6 15 77 888 8377 16 0668 8 10 The contrasts specified for each between-subject factor correspond to what was employed in the original source: contrasts(OBrienKaiserStreatment) [,1][,2] control -2 0 B 1 1 9
4.4 Repeated-Measures MANOVA: O’Brien and Kaiser’s Data O’Brien and Kaiser (1985: Table 7) describe an imaginary study in which 16 female and male subjects, who are divided into three treatments, are measured on an unspecified response variable at a pretest, post-test, and a follow-up session; during each session, they are measured at five occasions at intervals of one hour. The design, therefore, has two between-subject and two within-subject factors. The data are in the data frame OBrienKaiser in the car package: > OBrienKaiser treatment gender pre.1 pre.2 pre.3 pre.4 pre.5 post.1 post.2 post.3 post.4 post.5 1 control M 1 2 4 2 132532 2 control M 4 4 5 3 422353 3 control M 5 6 5 7 745754 4 control F 5 4 7 5 422353 5 control F 3 4 6 4 367863 6 A M 7 8 7 9 9 9 9 10 8 9 7 A M 5 5 6 4 5 7 7 8 10 8 8 A F23532 2 4 8 6 5 9 A F33464 4 5 6 4 1 10 B M 4 4 5 3 4 6 7 6 8 8 11 B M 3 3 4 2 3 5 4 7 5 4 12 B M 6 7 8 6 3 9 10 11 9 6 13 B F 5 5 6 8 6 4 6 6 8 6 14 B F 2 2 3 1 2 5 6 7 5 2 15 B F 2 2 3 4 4 6 6 7 9 7 16 B F 4 5 7 5 4 7 7 8 6 7 fup.1 fup.2 fup.3 fup.4 fup.5 1 23244 2 45641 3 76976 4 44534 5 43643 6 9 10 11 9 6 7 8 9 11 9 8 8 66756 9 54754 10 8 8 9 7 8 11 5 6 8 6 5 12 8 7 10 8 7 13 7 7 8 10 8 14 6 7 8 6 3 15 7 7 8 6 7 16 7 8 10 8 7 The contrasts specified for each between-subject factor correspond to what was employed in the original source: > contrasts(OBrienKaiser$treatment) [,1] [,2] control -2 0 A 1 -1 B 11 9
contrasts(OBrienKaiserSgender) [,1] 1 M -1 We fit a multivariate linear model to the O'Brien and Kaiser data,using mod.ok F) (Intercept) 0.967296.389 1 109.241e-09*** treatment 3 0.441 3.940 2 100.0547069. gender 1 0.268 3.659 1 100.0848003 treatment:gender 0.364 2.855 2 100.1044692 10
> contrasts(OBrienKaiser$gender) [,1] F 1 M -1 We fit a multivariate linear model to the O’Brien and Kaiser data, using > mod.ok phase hour idata idata phase hour 1 pretest 1 2 pretest 2 3 pretest 3 4 pretest 4 5 pretest 5 6 posttest 1 7 posttest 2 8 posttest 3 9 posttest 4 10 posttest 5 11 followup 1 12 followup 2 13 followup 3 14 followup 4 15 followup 5 The MANOVA employing this intra-subject data frame along with a crossed design on the intra-subject factors phase and hour is obtained as follows: (av.ok F) (Intercept) 1 0.967 296.389 1 10 9.241e-09 *** treatment 2 0.441 3.940 2 10 0.0547069 . gender 1 0.268 3.659 1 10 0.0848003 . treatment:gender 2 0.364 2.855 2 10 0.1044692 10