REVIEW & INTERPRETATION

GGE Biplot vs. AMMI Analysis of Genotype-by-Environment Data

Weikai Yan,* Manjit S. Kang, Baoluo Ma, Sheila Woods, and Paul L. Cornelius

Reproduced from Crop Science. Published by Crop Science Society of America. All copyrights reserved. CROP SCIENCE, VOL. 47, MARCH–APRIL 2007.

Published in Crop Sci 47:643–655 (2007). doi: 10.2135/cropsci2006.06.0374 © Crop Science Society of America, 677 S. Segoe Rd., Madison, WI 53711 USA

ABSTRACT

The use of genotype main effect (G) plus genotype-by-environment (GE) interaction (G+GE) biplot analysis by plant breeders and other agricultural researchers has increased dramatically during the past 5 yr for analyzing multi-environment trial (MET) data. Recently, however, its legitimacy was questioned by a proponent of Additive Main Effect and Multiplicative Interaction (AMMI) analysis. The objectives of this review are: (i) to compare GGE biplot analysis and AMMI analysis on three aspects of genotype-by-environment data (GED) analysis, namely mega-environment analysis, genotype evaluation, and test-environment evaluation; (ii) to discuss whether G and GE should be combined or separated in these three aspects of GED analysis; and (iii) to discuss the role and importance of model diagnosis in biplot analysis of GED. Our main conclusions are: (i) both GGE biplot analysis and AMMI analysis combine rather than separate G and GE in mega-environment analysis and genotype evaluation, (ii) the GGE biplot is superior to the AMMI1 graph in mega-environment analysis and genotype evaluation because it explains more G+GE and has the inner-product property of the biplot, (iii) the discriminating power vs. representativeness view of the GGE biplot is effective in evaluating test environments, which is not possible in AMMI analysis, and (iv) model diagnosis for each dataset is useful, but accuracy gain from model diagnosis should not be overstated.

W. Yan and B.L. Ma, Eastern Cereal and Oilseed Research Centre (ECORC), Agric. and Agri-Food Canada (AAFC), 960 Carling Ave., Ottawa, ON, Canada, K1A 0C6; M.S. Kang, Dep. of Agronomy & Environ. Mgmt., Louisiana State Univ. Agric. Center, Baton Rouge, LA 70803-2110; S. Woods, Cereal Research Center (CRC), AAFC, 195 Dafoe Road, Winnipeg, MB, Canada, R3T 2M9; P.L. Cornelius, Dep. of Plant and Soil Sciences and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40506. ECORC contribution number: 06-688. Received 9 June 2006. *Corresponding author (yanw@agr.gc.ca).

Abbreviations: AEC, average environment coordination; AMMI, Additive Main Effect and Multiplicative Interaction; G, genotype main effect; GE, genotype-by-environment interaction; GED, genotype-by-environment data (for a single trait); GGE, genotype main effect plus genotype-by-environment interaction; IPC, interaction principal component; MET, multi-environment trials; NID, normally and independently distributed; PC, principal component; SREG, Sites (Environments) Regression model; SVD, singular value decomposition; SVP, singular value partitioning.

Plant breeders and geneticists, as well as statisticians, have a long-standing interest in investigating and integrating G and GE in selecting superior genotypes in crop performance trials (Barah et al., 1981; Kang, 1988, 1993; Eskridge, 1990; Kang and Pham, 1991; Hühn, 1996; Yan et al., 2000). Many statistical methods have been developed for GED analysis, including AMMI analysis (Gauch, 1992) and GGE biplot analysis (Yan and Kang, 2003; Yan and Tinker, 2006).

The biplot (Gabriel, 1971) has become a popular data visualization tool in many scientific research areas, including psychology, medicine, business, sociology, ecology, and agricultural sciences. Earlier uses of biplots in GED analyses include Bradu and Gabriel (1978), Kempton (1984), and Cooper and DeLacy (1994). The biplot tool has become increasingly popular among plant breeders and agricultural researchers since its use in cultivar evaluation and mega-environment investigation (Yan et al.,
2000). Yan et al. (2000) referred to biplots based on singular value decomposition (SVD) of environment-centered or within-environment standardized GED as “GGE biplots,” because these biplots display both G and GE, which are the two sources of variation that are relevant to cultivar evaluation (Kang, 1988, 1993; Gauch and Zobel, 1996; Yan and Kang, 2003).

The commonly used GGE biplot is based on the Sites Regression (SREG) linear-bilinear (multiplicative) model (Cornelius et al., 1996), which can be written as

$$\bar{y}_{ij} - \mu_j = \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \varepsilon_{ij} \qquad [1]$$

where $\bar{y}_{ij}$ is the cell mean of genotype $i$ in environment $j$; $\mu_j$ is the mean value in environment $j$; $i = 1, \ldots, g$; $j = 1, \ldots, e$, $g$ and $e$ being the numbers of cultivars and environments, respectively; and $t$ is the number of principal components (PC) used or retained in the model, with $t \le \min(e, g - 1)$. The model is subject to the constraint $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_t \ge 0$ and to orthonormality constraints on the $\alpha_{ik}$ scores, that is, $\sum_{i=1}^{g} \alpha_{ik}\alpha_{ik'} = 1$ if $k = k'$ and $\sum_{i=1}^{g} \alpha_{ik}\alpha_{ik'} = 0$ if $k \ne k'$, with similar constraints on the $\gamma_{jk}$ scores [defined by replacing symbols $(i, g, \alpha)$ with $(j, e, \gamma)$]. The $\varepsilon_{ij}$ are assumed $\mathrm{NID}(0, \sigma^2/r)$, where $r$ is the number of replications within an environment.

The least squares solution for $\mu_j$ is the empirical mean ($\bar{y}_{.j}$) for the $j$th environment, and the least squares solutions for parameters in the term $\lambda_k \alpha_{ik} \gamma_{jk}$ (for $i = 1, \ldots, g$; $j = 1, \ldots, e$) are obtained from the $k$th PC of the SVD of the matrix $Z = [z_{ij}]$, where $z_{ij} = \bar{y}_{ij} - \bar{y}_{.j}$. The maximum number of PCs available for estimating the model parameters is $p = \mathrm{Rank}(Z)$. In general, $p \le \min(e, g - 1)$, with equality holding in most cases.
For $k = 1, 2, 3, \ldots$, $\alpha_{ik}$ and $\gamma_{jk}$ have also been characterized as primary, secondary, tertiary, etc., multiplicative effects of the $i$th cultivar and $j$th environment (for first usage of such terminology in a multiplicative model context, see Seyedsadr and Cornelius, 1992). Thus, Eq. [1] may be described as modeling the deviations of the cell means from the environment means as a sum of PCs, each of which is the product of a cultivar score ($\alpha_{ik}$), an environment score ($\gamma_{jk}$), and a scale factor (the singular value, $\lambda_k$).

The GGE biplot is constructed from the first two PCs from the SVD of $Z$ with “markers,” one for each cultivar, plotted with $\hat{\lambda}_1^{f}\hat{\alpha}_{i1}$ as abscissa and $\hat{\lambda}_2^{f}\hat{\alpha}_{i2}$ as ordinate. Similarly, markers for environments are plotted with $\hat{\lambda}_1^{1-f}\hat{\gamma}_{j1}$ as abscissa and $\hat{\lambda}_2^{1-f}\hat{\gamma}_{j2}$ as ordinate. The exponent $f$, with $0 \le f \le 1$, is used to rescale the cultivar and environment scores to enhance visual interpretation of the biplot for a particular purpose. Specifically, singular values are allocated entirely to cultivar scores if $f = 1$ [this is “cultivar-focused” scaling (Yan, 2002)], or entirely to environment scores if $f = 0$ (“environment-focused” scaling); and $f = 0.5$ will allocate the square roots of the $\hat{\lambda}_k$ values to cultivar scores and also to environment scores (“symmetric” scaling).

Mathematically, a GGE biplot is a graphical representation of the rank 2 least squares approximation of the rank $p$ matrix $Z$. This representation is unique except for possible simultaneous sign changes on all $\hat{\alpha}_{i1}$ and $\hat{\gamma}_{j1}$ and/or all $\hat{\alpha}_{i2}$ and $\hat{\gamma}_{j2}$. An important property of the biplot is that the rank 2 approximation of any entry in the original matrix $Z$ can be computed by taking the inner product of the corresponding genotype and environment vectors, i.e.,

$$\left(\hat{\lambda}_1^{f}\hat{\alpha}_{i1},\ \hat{\lambda}_2^{f}\hat{\alpha}_{i2}\right)\left(\hat{\lambda}_1^{1-f}\hat{\gamma}_{j1},\ \hat{\lambda}_2^{1-f}\hat{\gamma}_{j2}\right)' = \hat{\lambda}_1\hat{\alpha}_{i1}\hat{\gamma}_{j1} + \hat{\lambda}_2\hat{\alpha}_{i2}\hat{\gamma}_{j2}.$$

This is known as the inner-product property of the biplot.
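The construction described above can be sketched numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' software: the function name `gge_biplot_scores` and the small yield matrix are hypothetical and serve only as illustration. It centers the data by environment, takes the SVD, partitions the singular values with the exponent f, and checks that the inner products of the markers give the same rank 2 approximation of Z whatever value of f is chosen.

```python
import numpy as np

def gge_biplot_scores(y, f=0.5):
    """Rank-2 genotype and environment markers for a GGE biplot (illustrative helper).

    y : (g, e) array of genotype-by-environment cell means.
    f : singular value partitioning exponent, 0 <= f <= 1.
    """
    y = np.asarray(y, dtype=float)
    z = y - y.mean(axis=0, keepdims=True)        # z_ij = y_ij - environment mean
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    geno = u[:, :2] * s[:2] ** f                 # lambda_k^f * alpha_ik
    env = vt[:2, :].T * s[:2] ** (1.0 - f)       # lambda_k^(1-f) * gamma_jk
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()  # share of G+GE in PC1 and PC2
    return z, geno, env, explained

# Hypothetical 4-genotype x 3-environment yield means (made-up values).
y_demo = np.array([[4.5, 3.1, 5.9],
                   [4.4, 3.5, 5.7],
                   [3.4, 2.7, 5.3],
                   [4.9, 4.4, 5.5]])

z_demo, geno_sym, env_sym, share = gge_biplot_scores(y_demo, f=0.5)  # symmetric scaling
_, geno_cul, env_cul, _ = gge_biplot_scores(y_demo, f=1.0)           # cultivar-focused scaling

# Inner-product property: marker inner products give the rank 2 fit of Z for any f.
approx_sym = geno_sym @ env_sym.T
approx_cul = geno_cul @ env_cul.T
print(np.allclose(approx_sym, approx_cul))
```

Changing f moves the markers on the plot but leaves every genotype-by-environment inner product unchanged, which is why the choice of scaling affects only which visual comparisons (among genotypes or among environments) are read directly from distances.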
The GGE biplot methodology (Yan et al., 2000; Yan, 2001, 2002; Yan and Kang, 2003; Yan and Tinker, 2006) consists of a set of biplot interpretation methods, whereby important questions regarding genotype evaluation and test-environment evaluation can be visually addressed. Increasingly, plant breeders and other agronomists have found GGE biplots useful in mega-environment analysis (Yan and Rajcan, 2002; Casanoves et al., 2005; Samonte et al., 2005; Yan and Tinker, 2005b; Dardanellia et al., 2006), genotype evaluation (Bhan et al., 2005; Malvar et al., 2005; Voltas et al., 2005; Kang et al., 2006), test-environment evaluation (Yan and Rajcan, 2002; Blanche and Myers, 2006; Thomason and Phillips, 2006), trait-association and trait-profile analyses (Yan and Rajcan, 2002; Morris et al., 2004; Ober et al., 2005), and heterotic pattern analysis (Yan and Hunt, 2002; Narro et al., 2003; Andio et al., 2004; Bertoia et al., 2006).

The legitimacy of GGE biplot analysis was, however, recently questioned by Gauch (2006), who concluded that, for GED analyses, AMMI analysis was either superior or equal to GGE biplot analysis. The objectives of this review and interpretation paper are: (i) to compare GGE biplot analysis and AMMI analysis on three aspects of GED analysis, namely, mega-environment analysis, genotype evaluation, and test-environment evaluation; (ii) to discuss whether G and GE should be combined or separated in GED analysis; and (iii) to discuss the importance of model diagnosis in SVD-based analysis of GED. This discussion should enhance agricultural researchers’ understanding of biplot analysis of GED.

THREE ASPECTS OF GED ANALYSIS USING GGE BIPLOTS

The analysis of GED (i.e., MET data for a single trait) should include three major aspects: (i) mega-environment analysis; (ii) test-environment evaluation; and (iii) genotype evaluation (Yan and Kang, 2003). We use the yield data of 18 winter wheat (Triticum aestivum L.) genotypes (G1 to G18) tested at nine Ontario locations (E1 to E9) (Table 1) as an example to illustrate the three aspects of biplot analysis. The same dataset was used extensively in Yan and Kang (2003) and Yan and Tinker (2006). When
supplemental information (e.g., data on environmental or genotypic covariates) is available, a fourth aspect, which is to understand the causes of G and GE, can be included (Yan and Hunt, 2001; Yan and Kang, 2003; Yan and Tinker, 2005b, 2006).

Mega-environment Analysis

A GGE biplot is constructed by plotting the first principal component (PC1) scores of the genotypes and the environments against their respective scores for the second principal component (PC2) that result from SVD of environment-centered or environment-standardized GED. The “which-won-where” view of the GGE biplot (Yan et al., 2000) is an effective visual tool in mega-environment analysis. It consists of an irregular polygon and a set of lines drawn from the biplot origin and intersecting each of the sides at right angles. The vertices of the polygon are the genotype markers located farthest away from the biplot origin in various directions, such that all genotype markers are contained within the resulting polygon. A line that starts from the biplot origin and perpendicularly intersects a polygon side represents the set of hypothetical environments in which the two cultivars defining that side perform equally; the relative ranking of the two cultivars would be reversed in environments on opposite sides of the line (the so-called “crossover GE”). Therefore, the perpendicular lines to the polygon sides divide the biplot into sectors, each having its own winning cultivar. The winning cultivar for a sector is the vertex cultivar at the intersection of the two polygon sides whose perpendicular lines form the boundary of that sector; it is positioned usually, but not necessarily, within its winning sector (see Yan, 2002 for a detailed example).
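The polygon and sector logic described above can be approximated in a few lines. The sketch below is only an illustration under stated assumptions: the marker coordinates are made up, the helper names `convex_hull_indices` and `which_won_where` are hypothetical, and the polygon is obtained with a standard convex-hull routine (Andrew's monotone chain) rather than by any procedure given in the source. The winner in each environment is read from the inner products, and it always falls on a polygon vertex because a linear function over a set of points is maximized at a hull vertex.

```python
import numpy as np

def convex_hull_indices(points):
    """Indices of the convex-hull vertices (counter-clockwise), monotone chain."""
    pts = np.asarray(points, dtype=float)
    order = sorted(range(len(pts)), key=lambda i: (pts[i, 0], pts[i, 1]))

    def cross(o, a, b):
        # z-component of (pts[a] - pts[o]) x (pts[b] - pts[o])
        return ((pts[a, 0] - pts[o, 0]) * (pts[b, 1] - pts[o, 1])
                - (pts[a, 1] - pts[o, 1]) * (pts[b, 0] - pts[o, 0]))

    def half_hull(indices):
        hull = []
        for i in indices:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], i) <= 0:
                hull.pop()
            hull.append(i)
        return hull

    lower = half_hull(order)
    upper = half_hull(order[::-1])
    return lower[:-1] + upper[:-1]

def which_won_where(geno, env):
    """Index of the winning genotype in each environment (rank 2 approximation)."""
    fitted = np.asarray(geno) @ np.asarray(env).T   # genotypes x environments
    return fitted.argmax(axis=0)

# Hypothetical biplot markers (PC1, PC2); rows 3 and 4 lie inside the polygon.
geno_markers = np.array([[1.0, 0.8], [1.2, -0.9], [-1.5, 0.1],
                         [0.2, 0.1], [0.5, -0.2]])
env_markers = np.array([[0.9, 0.6], [0.8, -0.7], [1.0, 0.1]])

vertices = convex_hull_indices(geno_markers)
winners = which_won_where(geno_markers, env_markers)
print(sorted(vertices), winners.tolist())   # [0, 1, 2] [0, 1, 1]
```

In this made-up example the first environment falls in the sector won by genotype 0 and the other two in the sector won by genotype 1, which is the kind of crossover pattern the which-won-where view is meant to reveal.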
If all environment markers fall into a single sector, this indicates that, to a rank-two approximation, a single cultivar had the highest yield in all environments. If environment markers fall into different sectors, this indicates that different cultivars won in different sectors. Revealing the which-won-where pattern of a GED set is an intrinsic property of the GGE biplot rendered by the inner-product property of the biplot (Yan and Kang, 2003). Once a GGE biplot is constructed, the polygon and the lines that divide the biplot into sectors can be drawn by hand without further calculation.

In the which-won-where view of the GGE biplot (Fig. 1) based on the data in Table 1, the nine environments fell into two sectors with different winning cultivars. Specifically, G18 was the highest yielding cultivar in E5 and E7 (but only slightly higher than several other cultivars with markers in close proximity to G18), and G8 was the highest yielding cultivar in the other environments. This crossover GE suggests that the target environments may be divided into different mega-environments.

Since a mega-environment is defined as a group of locations that consistently share the best set of genotypes or cultivars across years (Yan and Rajcan, 2002), data from multiple years are essential to decide whether or not the target region can be divided into different mega-environments. Furthermore, a definitive conclusion must be based on data in which the same (sub-)set of genotypes is tested at the same (sub-)set of test locations across multiple years. Repeatable environment grouping is necessary, but not sufficient, for declaring different mega-environments. For example, even if the target environments can be subdivided into Group 1 and Group 2 repeatedly across years, the target environment still may not be meaningfully divided if cultivar A and B win in Groups 1 and 2, respectively, in 1 yr, but the which-won-where pattern is reversed in another year.
Table 1. Mean yield (Mg ha−1) of 18 winter wheat cultivars (G1 to G18) tested at nine Ontario locations (E1 to E9) in 1993.

Genotype  E1    E2    E3    E4    E5    E6    E7    E8    E9    Mean
G1        4.46  4.15  2.85  3.08  5.94  4.45  4.35  4.04  2.67  4.00
G2        4.42  4.77  2.91  3.51  5.70  5.15  4.96  4.39  2.94  4.31
G3        4.67  4.58  3.10  3.46  6.07  5.03  4.73  3.90  2.62  4.24
G4        4.73  4.75  3.38  3.90  6.22  5.34  4.23  4.89  3.45  4.54
G5        4.39  4.60  3.51  3.85  5.77  5.42  5.15  4.10  2.83  4.40
G6        5.18  4.48  2.99  3.77  6.58  5.05  3.99  4.27  2.78  4.34
G7        3.38  4.18  2.74  3.16  5.34  4.27  4.16  4.06  2.03  3.70
G8        4.85  4.66  4.43  3.95  5.54  5.83  4.17  5.06  3.57  4.67
G9        5.04  4.74  3.51  3.44  5.96  4.86  4.98  4.51  2.86  4.43
G10       5.20  4.66  3.60  3.76  5.94  5.35  3.90  4.45  3.30  4.46
G11       4.29  4.53  2.76  3.42  6.14  5.25  4.86  4.14  3.15  4.28
G12       3.15  3.04  2.39  2.35  4.23  4.26  3.38  4.07  2.10  3.22
G13       4.10  3.88  2.30  3.72  4.56  5.15  2.60  4.96  2.89  3.80
G14       3.34  3.85  2.42  2.78  4.63  5.09  3.28  3.92  2.56  3.54
G15       4.38  4.70  3.66  3.59  6.19  5.14  3.93  4.21  2.93  4.30
G16       4.94  4.70  2.95  3.90  6.06  5.33  4.30  4.30  3.03  4.39
G17       3.79  4.97  3.38  3.35  4.77  5.30  4.32  4.86  3.38  4.24
G18       4.24  4.65  3.61  3.91  6.64  4.83  5.01  4.36  3.11  4.48
Mean      4.36  4.44  3.14  3.49  5.68  5.06  4.24  4.36  2.90  4.19

The necessary and sufficient condition for mega-environment division is a repeatable which-won-where pattern rather than merely a repeatable environment-grouping pattern (Yan and Rajcan, 2002; Yan and Kang, 2003). If the which-won-where or crossover patterns are repeatable across years and, hence, the target environment can be divided into subregions or mega-environments, as in the barley example given in Yan and Tinker (2005b), the GE that causes the crossovers among winning genotypes can be exploited by selecting in and for each mega-environment. If the crossover GE patterns are not repeatable across years, the GE cannot be exploited. Rather, it must be avoided by selecting high yielding and stable genotypes across target environments. Appropriate mega-environment analysis should classify the target environment into one of three possible
types (Table 2). Type 1 is the easiest target environment one can hope for, but it is usually an overoptimistic expectation. Type 2 suggests opportunities for exploiting some of the GE. Such opportunities should not be overlooked if they exist, which is the whole point of mega-environment analysis and GE analysis. Type 3 is the most challenging target environment and, unfortunately, also the most common one.

Genotype evaluation and test-environment evaluation become meaningful only after the mega-environment issue is addressed. Within a single mega-environment, cultivars should be evaluated for their mean performance and stability across environments (Fig. 2); and the test environments should be evaluated for being, or not being, representative of the target environment and for their power to discriminate among genotypes (Fig. 3).

Genotype Evaluation

Genotype evaluation is meaningful only for a specific mega-environment, and an ideal genotype should have both high mean performance and high stability within a mega-environment. Assuming that the mega-environment differentiation in Fig. 1 is repeatable across years, genotype evaluation should be conducted for each mega-environment. Figure 2 is the “Average Environment Coordination” (AEC) view (Yan, 2001) of the GGE biplot involving the seven environments in the G8 niche identified in Fig. 1. This AEC view is based on genotype-focused singular value partitioning (SVP), that is, the singular values are entirely partitioned into the genotype scores (GGE biplot option “SVP = 1”) (Yan, 2002). This AEC view with SVP = 1 is also referred to as the “Mean vs. Stability” view because it facilitates genotype comparisons based on mean performance and stability across environments within a mega-environment.
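As a rough numerical companion to the Mean vs. Stability view, the sketch below computes two coordinates per genotype: its projection onto the axis through the biplot origin and the average environment, and its projection onto the perpendicular axis. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `aec_coordinates` and the yield matrix are hypothetical, genotype-focused partitioning is taken to mean that all singular values are assigned to the genotype scores, and the average environment is taken as the mean of the environment PC1 and PC2 scores.

```python
import numpy as np

def aec_coordinates(y):
    """Mean and stability coordinates of genotypes in an AEC-style view (illustrative)."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean(axis=0, keepdims=True)       # environment-centered data
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    geno = u[:, :2] * s[:2]                     # genotype-focused: singular values on genotypes
    env = vt[:2, :].T                           # environment scores without singular values
    avg_env = env.mean(axis=0)                  # the "average environment" marker
    # Assumes the average environment is not at the origin (G is not negligible).
    axis = avg_env / np.linalg.norm(avg_env)    # unit vector along the AEC abscissa
    perp = np.array([-axis[1], axis[0]])        # unit vector along the AEC ordinate
    mean_coord = geno @ axis                    # larger value = higher fitted mean
    stab_coord = geno @ perp                    # closer to zero = more stable
    return mean_coord, stab_coord, geno, env

# Hypothetical 4-genotype x 3-environment yield means (made-up values).
y_demo = np.array([[4.5, 3.1, 5.9],
                   [4.4, 3.5, 5.7],
                   [3.4, 2.7, 5.3],
                   [4.9, 4.4, 5.5]])

mean_coord, stab_coord, geno_aec, env_aec = aec_coordinates(y_demo)
fitted_rank2 = geno_aec @ env_aec.T                    # rank 2 fit of the centered data
avg_len = np.linalg.norm(env_aec.mean(axis=0))
ranking = np.argsort(-mean_coord)                      # genotypes from highest to lowest mean
print(ranking, np.abs(stab_coord).round(3))
```

By linearity of the inner product, `mean_coord * avg_len` equals the row means of the rank 2 fitted values, so the abscissa coordinate is proportional to the rank 2 approximation of each genotype's main effect.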
The axis of the AEC abscissa, or “average environment axis,” is the single-arrowed line that passes through the biplot origin and the “average environment,” which is at the center of the small circle with coordinates .1 .2 (, ) γ γ ˆ ˆ , i.e., means of environment PC1 and PC2 scores. The axis of the AEC ordinate is the double-arrowed line that passes through the biplot origin and is perpendicular to the AEC abscissa. Because of the inner-product property of the biplot, the projections of the genotype markers on the “average environment axis” are proportional to the rank-two approximation of the genotype means and represent the main eff ects of the genotypes, G. The arrow shown on the axis of the AEC abscissa points in the direction of higher mean performance of the genotypes and, consequently ranks the genotypes with respect to mean performance. Unless G is too small to be meaningful, the ranking of the genotypes on the AEC abscissa is always perfectly or highly correlated with G, the correlation being 1.0 for the current example. Thus, the genotypes are ranked according to G as follows: G8 > G4 = G10 > G5 = G9 = G15 = G16 = G17 = G18 > G6 > G2 > Mean = G11 > G3 > G13 > G1 > G14 > G7 > G12. Since GGE represents G+GE and since the AEC abscissa approximates the genotypes’ contributions to G, the AEC ordinate must approximate the genotypes’ contributions to GE, which is a measure of their stability or instability. Thus, G4 was the most stable genotype, as it was located almost on the AEC abscissa and had a nearzero projection onto the AEC ordinate. This indicates that its rank was highly consistent across environments within this mega-environment. In contrast, G17 and G6 were two of the least stable genotypes with above average mean performance. Yan (2001) defi ned an “ideal” genotype on the basis of both mean performance and stability, and the genoFigure 1. The “which-won-where” view of the GGE biplot based on the G × E data in Table 1. 
The data were not transformed (“Transform = 0”), not scaled (“Scaling = 0”), and were environmentcentered (“Centering = 2”). The biplot was based on environment-focused singular value partitioning (“SVP = 2”) and therefore is appropriate for visualizing the relationships among environments. It explained 78% of the total G+GE. The genotypes are labeled as G1 to G18 and the environments are labeled as E1 to E9
types can be ranked based on their biplot distance from the ideal genotype. Dimitrios Baxevanos (personal communication, 2006) found this GGE distance to be more repeatable across years than either mean performance or a stability index.

Test Environment Evaluation

The purpose of test-environment evaluation is to identify test environments that effectively identify superior genotypes for a mega-environment. An “ideal” test environment should be both discriminating of the genotypes and representative of the mega-environment. Figure 3 is the same GGE biplot as Fig. 2 except that it is based on environment-focused scaling (Yan, 2002), that is, the singular values were entirely partitioned into the environment scores (“SVP = 2”) so that it is appropriate for studying the relationships among test environments. This type of AEC view can be referred to as the “Discriminating power vs. Representativeness” view of the GGE biplot. It can be helpful in evaluating each of the test environments with respect to the following questions:

1. Is the test environment capable of discriminating among the genotypes, i.e., does it provide much information about the differences among genotypes?
2. Is it representative of the mega-environment?
3. Does it provide unique information about the genotypes?

When the data are not scaled (or standardized) (“Scaling = 0”), the length of an environment vector is proportional to the standard deviation of cultivar means in the environment, which is a measure of the discriminating power of the environment, assuming that the experimental errors of the test environments are comparable. Test environments with longer vectors (like E1 in our example) are more discriminating of the genotypes.
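The link between vector length and discriminating power can be checked numerically. The following is a minimal sketch, assuming a genotype-by-environment table of means `Y` (rows are genotypes, columns are environments) filled with simulated values; the function name `gge_env_scores` and the data are illustrative and do not come from the paper or from any GGE biplot software. When all PCs are retained, the environment vector length equals the square root of (number of genotypes minus 1) times the standard deviation of the genotype means in that environment; with only PC1 and PC2 the length is an approximation that can only be shorter.

```python
import numpy as np

def gge_env_scores(Y, n_pc=2):
    """Environment scores of an environment-centered GGE biplot with SVP = 2."""
    Yc = Y - Y.mean(axis=0, keepdims=True)          # Centering = 2: remove environment means
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Vt[:n_pc].T * s[:n_pc]                   # singular values go entirely to environments

# Simulated data (hypothetical): 18 genotypes x 9 environments with unequal spread.
rng = np.random.default_rng(0)
spread = np.linspace(0.5, 2.0, 9)
Y = 5.0 + rng.normal(size=(18, 9)) * spread

sd = Y.std(axis=0, ddof=1)                          # SD of genotype means per environment
len_full = np.linalg.norm(gge_env_scores(Y, n_pc=9), axis=1)
len_2d = np.linalg.norm(gge_env_scores(Y, n_pc=2), axis=1)

print(np.round(len_full / sd, 3))                   # constant ratio across environments
print(np.round(len_2d / len_full, 3))               # share of each vector shown by PC1 and PC2
```

The second printed row indicates how much of each environment's vector the two-dimensional biplot actually displays, which matters when judging whether a short vector reflects low discrimination or a poor fit.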
If a test environment marker falls close to the biplot origin, that is, if the test environment has a very short vector, it means that all genotypes performed similarly in it and therefore it provided little or no information about the genotype differences. A short vector could also mean that the environment is not well represented by PC1 and PC2 if the biplot does not explain most of the GGE of the data.

A second usage of Fig. 3 is to indicate the test environments’ representativeness of the mega-environment. Since the AEC abscissa is the “average-environment axis,” test environments that have small angles with it (e.g., E2, E3, E4, E6, and E9) are more representative of the mega-environment than those that have larger angles with it (e.g., E1 and E8). This follows from the fact that when SVP = 2, the cosine of the angle between any environment vector and the “average-environment axis” approximates the correlation coefficient between the genotype values in that environment and the genotype means across the environments.

Figure 2. The “mean vs. stability” view of the GGE biplot based on a subset of the G × E data in Table 1. The data were not transformed (“Transform = 0”), not scaled (“Scaling = 0”), and were environment-centered (“Centering = 2”). The biplot was based on genotype-focused singular value partitioning (“SVP = 1”) and therefore is appropriate for visualizing the similarities among genotypes. It explained 79.5% of the total G+GE for the subset.

Table 2. Three types of target environment based on mega-environment analysis.

With crossover GE, repeatable across years. Type 2: target environment consisting of multiple mega-environments. Strategy: select specifically adapted genotypes for each mega-environment. A single-year multilocation trial may be sufficient.
With crossover GE, not repeatable across years. Type 3: target environment consisting of a single but complex mega-environment. Strategy: select a set of cultivars for the whole region based on both mean performance and stability, using data from multiyear and multilocation tests.
No crossover GE. Type 1: target environment consisting of a single, simple mega-environment. Strategy: testing at a single test location in a single year suffices to select a single best cultivar.

Based on Fig. 3, a test environment may be classified into one of three types (Table 3). Type 1 environments have short vectors and provide little or no information about the genotypes and, therefore, should not be used as test environments. Type 2 environments have long vectors and small angles with the AEC abscissa and are ideal for selecting superior genotypes. If budgetary constraints allow only a few test environments, Type 2 test environments are the first choice. Type 3 environments have long vectors and large angles with the AEC abscissa (e.g., E1); they cannot be used in selecting superior genotypes, but are useful in culling unstable genotypes.

Useful test environments should be further examined for their uniqueness. Some environments may never provide unique information, as they are always similar to some other environment(s) in separating and ranking the genotypes. Some (not all) of these environments can be dropped without losing much information about the genotypes. Testing cost can be reduced and efficiency improved by using a minimum set of test environments. Identification and removal of noninformative and redundant test locations (not environments) must be based on multiyear data. In Fig. 3, five environments (E2, E3, E4, E6, and E9) were highly correlated in their ranking of the genotypes, indicating that these environments produced similar information about the genotypes. If this pattern repeats across years, then it can be concluded that some of them are redundant and can be dropped. In analyzing a multiyear Ontario soybean performance trial dataset, Yan and Rajcan (2002) reported one test location that was always highly correlated with one of the other three locations in ranking genotypes and was regarded as a redundant test location.

Yan (2001) defined an “ideal” test environment, which is a virtual environment that has the longest vector of all test environments (most discriminating) and is located on the AEC abscissa (most representative).
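The three environment types and the ideal test environment described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of any particular software: the average environment is taken as the mean of the PC1 and PC2 environment scores, the function name `evaluate_test_environments` is hypothetical, and the two cut-offs (`short_frac` for a "short" vector, `max_angle_deg` for a "small" angle) are arbitrary values chosen only to make the example concrete. The paper itself classifies environments visually and gives no numeric thresholds.

```python
import numpy as np

def evaluate_test_environments(env_scores, short_frac=0.3, max_angle_deg=45.0):
    """Classify environments from their PC1/PC2 scores (SVP = 2 assumed)."""
    env_scores = np.asarray(env_scores, dtype=float)
    avg_env = env_scores.mean(axis=0)                  # average-environment marker
    aec_axis = avg_env / np.linalg.norm(avg_env)       # unit vector along the AEC abscissa
    length = np.linalg.norm(env_scores, axis=1)        # discriminating power
    cosine = (env_scores @ aec_axis) / length          # representativeness
    angle = np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))
    ideal = aec_axis * length.max()                    # longest vector, placed on the AEC abscissa
    distance = np.linalg.norm(env_scores - ideal, axis=1)
    # Hypothetical thresholds: Type 1 = short vector; Type 2 = long vector, small angle;
    # Type 3 = long vector, large angle.
    env_type = np.where(length < short_frac * length.max(), 1,
                        np.where(angle <= max_angle_deg, 2, 3))
    return {"length": length, "angle_deg": angle,
            "distance_to_ideal": distance, "type": env_type}

# Made-up scores for four environments.
scores = np.array([[2.0, 0.1], [1.8, -0.2], [0.1, 0.05], [0.6, 1.9]])
result = evaluate_test_environments(scores)
print(result["type"])
print(np.argsort(result["distance_to_ideal"]))         # environments ordered by closeness to the ideal
```

Sorting by `distance_to_ideal` gives one numeric counterpart to reading distances off the biplot; whether a classification is trustworthy still depends on the pattern repeating across years.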
Test environments can be visually ranked for their usefulness in identifying superior genotypes based on the distances on the GGE biplot between their markers and the marker of the ideal test environment. Blanche and Myers (2006) used this idea creatively in their study of cotton test locations.

Test-environment evaluation should be an important aspect of GED analysis. Analysis of historical MET data can lead to the identification of a minimum set of test environments (locations) for cultivar evaluation. For example, E3 may be regarded as an ideal test location and E1, E3, and E8 may constitute a minimum set of test locations if the pattern shown in Fig. 3 is repeatable across years. For quantitative trait loci (QTL) mapping studies, the identification of a few discriminating and representative test environments (locations) is even more crucial because it is usually not feasible to test a large number of genotypes in many environments (Anna McClung, personal communication, 2006).

Figure 3. The “discriminating power vs. representativeness” view of the GGE biplot based on a subset of the G × E data in Table 1. The data were not transformed (“Transform = 0”), not scaled (“Scaling = 0”), and were environment-centered (“Centering = 2”). The biplot was based on environment-focused singular value partitioning (“SVP = 2”) and therefore is appropriate for visualizing the relationships among environments. It explained 79.5% of the total G+GE for the subset.

THREE ASPECTS OF GED ANALYSIS USING AMMI GRAPHS

Mega-Environment Analysis

The AMMI1 graph, first proposed in Gauch and Zobel (1997), was designed to address the which-won-where pattern. In this graph, the abscissa represents the environment scores for the first interaction principal component (IPC1) and the ordinate represents the “nominal yield” based on genotype mean yield (G) and IPC1. Each genotype is represented by a straight line defined by that genotype’s mean yield and IPC1 score (i.e., regression on the environment IPC1 score). Ebdon and Gauch (2002b) claimed that mega-environment classification based on this method should be virtually the same as that based on a GGE biplot like Fig. 1. This may be true in some cases, but, even then, the GGE biplot is more advantageous in several aspects. First, the GGE biplot always explains more G+GE than the AMMI1 graph and is, therefore, a more accurate presentation of the GGE of the data. For example, for a rice dataset, the GGE biplot and the AMMI1 graph explained 77.3 and 64.6% of the total G+GE, respectively (Samonte et al., 2005). Second, the which-won-where patterns are not always easy to visualize in the AMMI1 graph, particularly when many genotypes and test environments are involved, as shown in Fig. 2 of Ebdon and Gauch (2002b). This is because, in the AMMI1 graph, the environments can be labeled only along the abscissa rather than across the graph, and the genotypes are represented by straight lines rather than by dots. Moreover, whereas the which-won-where view of a GGE biplot is an intrinsic property of the GGE biplot, the AMMI1 graph is a completely different graph from the AMMI1 biplot. Therefore, the AMMI1 graph is better viewed as a tool for presenting conclusions rather than as a tool for discovering which-won-where patterns.

The GGE biplot was criticized by Ebdon and Gauch (2002b) and Gauch (2006) for not being able to reveal which-won-where patterns if more than two PCs are required to approximate the data. This problem, however, is easily solved by generating GGE biplots for each group of environments, as exemplified in Fig. 2 vs. Fig. 1. In contrast, this remains a challenge in AMMI analysis if more than one IPC is required. Although Gauch (1992) proposed an AMMI2 graph for mega-environment analysis when two IPCs are needed to approximate the data, its usefulness has not been demonstrated thus far. This graph is a plot of test environments defined by their IPC1 and IPC2 scores; the test environments are grouped by the IPC1 and IPC2 scores, and the winning genotypes for each group are identified from the genotype-by-environment table of “predicted” yield based on the AMMI2 model and superimposed on the graph.
Because one must go to the predicted yield table to identify the winning genotypes, this graph is better understood as a conclusion-presentation tool rather than a pattern-discovery tool, while pattern discovery is the primary interest of GED analysis. Gauch (1992) envisioned this graph as a 3D plot with G being a third dimension perpendicular to the IPC1 vs. IPC2 plane. Even so, it is still not capable of identifying the which-won-where patterns, because G and the IPC scores are not in the same units, just as the two axes of the AMMI1 biplot are not in the same units (more discussion on this later). Moreover, neither the AMMI1 nor the AMMI2 graph has the inner-product property of a true biplot, which is the underpinning of biplot analysis. Gauch (1992) hypothesized that the universal winners would be located near the origin of the IPC1 vs. IPC2 graph. This may be true in some cases; however, the universal losers would also be located exactly in the same area. The usefulness of this AMMI2 graph has never been demonstrated, even in the work of the AMMI advocates where AMMI2 (AMMI with two IPCs) and AMMI7 (AMMI with seven IPCs) were identified as the best models (Ebdon and Gauch, 2002b).

Genotype Evaluation

The AMMI1 biplot (Zobel et al., 1988) is the most well-known and appealing component of AMMI analysis. Its abscissa represents the main effects (G and E) and its ordinate represents the IPC1 scores. Therefore, it provides a means of visualizing the mean performance (G) and the stability (IPC1) of the genotypes simultaneously. However, although regarded as a biplot, the AMMI1 biplot does not have the most important property of a true biplot, namely the inner-product property. As a result, the performance of a given genotype in a given environment cannot be accurately visualized even if it fully displays the data. This is why a different AMMI1 graph (Gauch and Zobel, 1997) is needed for visualizing the which-won-where pattern as discussed above.
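Two of the properties discussed so far can be illustrated numerically: the inner-product property of the GGE biplot, and the earlier claim that the GGE biplot always explains at least as much G+GE as the AMMI1 graph. The sketch below uses simulated data and hypothetical variable names. It relies on the fact that the AMMI1 fit (genotype main effect plus IPC1) is itself a matrix of rank two or less approximating the environment-centered data, so by the Eckart-Young theorem it cannot beat the first two principal components of the same data.

```python
import numpy as np

# Simulated genotype x environment means (hypothetical): 12 genotypes, 6 environments.
rng = np.random.default_rng(1)
Y = rng.normal(size=(12, 6)) + rng.normal(size=(12, 1))

Yc = Y - Y.mean(axis=0, keepdims=True)       # G + GE: environment main effects removed
total_ss = np.sum(Yc ** 2)

# GGE biplot: first two PCs of the environment-centered data (SVP = 2).
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
geno = U[:, :2]                              # genotype markers
env = Vt[:2].T * s[:2]                       # environment markers
gge_fit = geno @ env.T                       # inner products of genotype and environment markers
gge_explained = 1.0 - np.sum((Yc - gge_fit) ** 2) / total_ss

# AMMI1: genotype main effect plus the first interaction principal component.
G = Yc.mean(axis=1, keepdims=True)           # genotype main effects
GE = Yc - G                                  # doubly centered interaction residuals
Ui, si, Vit = np.linalg.svd(GE, full_matrices=False)
ammi1_fit = G + si[0] * np.outer(Ui[:, 0], Vit[0])
ammi1_explained = 1.0 - np.sum((Yc - ammi1_fit) ** 2) / total_ss

print(f"GGE biplot (PC1 + PC2): {gge_explained:.3f} of G+GE")
print(f"AMMI1 (G + IPC1):       {ammi1_explained:.3f} of G+GE")
```

In `gge_fit`, each entry is the inner product of one genotype marker and one environment marker, which is what allows a genotype's performance in an environment to be read directly from the biplot.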
There are two other reasons why the AMMI1 biplot is less useful to breeders than the GGE biplot. First, it always explains less G+GE than the GGE biplot. Second, its shape is completely subjective because the axes are in different units (original unit for the abscissa and square root of the original unit for the ordinate). Unlike the GGE biplot, the AMMI1 biplot also presents the environment main effects of the test environments or E, which is irrelevant to cultivar and test-environment evaluation (Yan and Kang, 2003).

Test Environment Evaluation

Although identifying test environments for effective genotype evaluation is an important component of GED analysis, which has a great impact in plant breeding, it has not been a research topic in AMMI analysis. The AMMI1 biplot (Zobel et al., 1988) displays the test environments by their main effects E and IPC1 scores, but it provides no information on the environment’s ability in identifying superior cultivars.

Table 3. Three types of test environments based on test environment evaluation.

Representative and discriminating. Type 2: Ideal for selecting superior genotypes.
Not representative but discriminating. Type 3: Useful for culling inferior genotypes.
Nondiscriminating. Type 1: Useless.

G AND GE: JOINTLY OR SEPARATELY?

Gauch (2006) criticized GGE biplot analysis for not explicitly separating G from GE and concluded that AMMI analysis was “always superior” to GGE biplot analysis for its clear separation of G from GE. However, it is neither in the interests of plant breeders, nor in the interests of growers, to base selection of cultivars either solely on G or on GE (Kang, 1993). We believe that GGE biplot analysis achieves much more than AMMI analysis relative to the three objectives of GED analysis, namely mega-environment analysis, test-environment evaluation, and genotype evaluation. In this section, we will examine from a more theoretical point of view whether G and GE should be separated.

G and GE Must Be Considered Simultaneously in Genotype Evaluation

There is no disagreement among AMMI users and GGE biplot users on this issue; all agree that G and GE must be considered simultaneously in genotype evaluation. The GGE biplot was designed to include both G and GE. The AMMI1 graph for mega-environment analysis and the AMMI1 biplot for genotype evaluation also contain both G and GE; they might as well be called “GGE graphs.” They differ from the GGE biplot only in that they contain less G+GE and have less functionality than the GGE biplot. The AMMI analysis separates G from GE first and then puts them together again, whereas GGE biplot analysis deals with G+GE directly. Therefore, explicit separation of G from GE in AMMI analysis does not lead to the conclusion that it is superior to GGE biplot analysis.

G and GE Are Mathematical Definitions

Gauch (2006) argued that AMMI analysis was superior to other methods because it clearly separated G and GE and that G and GE have different agricultural implications, with G representing wide adaptation and GE representing specific adaptation. Indeed, if wide adaptation is high performance across environments, it is fair to say that G represents wide adaptation, but only within the confines of the test environments. If specific adaptation is high performance in specific environments, however, it is determined by G+GE, not by GE alone.
The GE is a component of specific adaptation; it alone has no defined agricultural implications, because a genotype interacting positively with an environment can have the lowest yield in that environment.

It should also be recognized that G and GE are fundamentally a mathematical partitioning of the total variation of a GED set. Their correspondence to biological and agricultural implications is not automatic. The G and GE can be regarded as representing different biological interpretations only if it is shown that G and GE are under the control of distinct genes or genetic interactions. There is much evidence that the expression of genes and their effects on crop productivity are affected by the environment, and the same gene can exert different effects on crop productivity in different regions (vernalization genes and photoperiodism genes are the most notable examples), but there is little evidence for the existence of genes whose expression is completely independent of the environment, particularly for those that control agronomically important traits. The QTL studies involving multiple environments often reveal that major QTL are responsible for both G and GE (e.g., Romagosa et al., 1996; Tinker et al., 1996; de Koeyer et al., 2004). An AMMI analysis often reveals strong correlations between G and genotypic scores for GE (e.g., see figures in Ebdon and Gauch, 2002a), suggesting common genetic controls for both G and GE. Therefore, the stance that G and GE must be treated as distinct entities in GED analysis is neither plausible nor supported by agricultural and biological evidence.

The G and GE Are Interchangeable

There is no clear biological boundary between G and GE; G and GE are interchangeable. It is understood that G, the genotype main effect, is always specific to the environments in which it is estimated. It has no meaning when separated from its environmental context.
The G estimated from a small range of environments can be GE if put into a wider scope of environments. Conversely, GE estimated in a wider range of environments can become G if the environments are subdivided. In other words, G and GE can be interpreted only in the context of the actual set of cultivars evaluated in the actual set of environments. Recognition of the interchangeability between G and GE is the sole justification for mega-environment analysis, as discussed earlier. The GE becomes G if the scope of the environments is narrowed; G becomes GE when the scope of environments is widened. The gist of mega-environment analysis is to seek opportunities to subdivide the target environment into subregions (mega-environments) so that some repeatable GE can be converted into G.

In summary, G and GE must be considered simultaneously in mega-environment analysis, genotype evaluation, and test-environment evaluation; separation of G from GE is primarily a mathematical manipulation that is not always supported by biological evidence. Combining G and GE in GGE biplot analysis is essential for addressing plant breeding and agricultural problems. It is an intention rather than a mistake, a strength rather than a weakness.

The Utility of the AEC Is Beyond Reseparation of G from GE

The AEC view of the GGE biplot (Fig. 2 and 3) does reseparate G from GE whenever G is sizable, as pointed out by Gauch (2006); however, as discussed in the previous sections, it partitions GGE in a way that genotype evaluation and test-environment evaluation can be visually addressed in terms familiar to researchers without sacrificing the inner-product property of the biplot. That is, the AEC allows genotypes to be evaluated by their mean performance and stability, and test environments evaluated by their discriminating power and representativeness. Such functionality has not been shown for any other GED analysis methodology.
In a GGE biplot, the vector length of a genotype, which is the distance from the biplot origin to the position of the genotype marker, approximates the genotype’s contribution to GGE. When all environments are on the same side of the AEC ordinate (i.e., when G is large enough to be meaningful), the Mean vs. Stability view of the GGE biplot (Fig. 2) partitions this GGE into the genotype’s contribution to G (projection onto the AEC abscissa) and its contribution to GE (projection onto the AEC ordinate). This property allows identification of “ideal” genotypes (a large and positive contribution to G and a small contribution to GE) for a given mega-environment. Many breeders have found this application of GGE biplots to be useful. However, as illustrated in earlier sections, the AEC view of the GGE biplot is used only for genotype evaluation for a single mega-environment, where the GE is either small (a simple mega-environment) or not exploitable (a complex mega-environment).

The length of an environment vector in the GGE biplot approximates the environment’s discriminating power. When all environments are on the same side of the AEC ordinate (i.e., when the G in the data is large enough to be meaningful), the “Discriminating power vs. Representativeness” view of the GGE biplot (Fig. 3) partitions this discriminating power into two components: discrimination on G (projection onto the AEC abscissa) and discrimination on GE (projection onto the AEC ordinate), whereby test environments ideal for selecting high-yielding and stable genotypes can be identified.

Gauch (2006) considered E as an essential component for environment evaluation. Although E is essential for environment evaluation for nonbreeding purposes, it is irrelevant for identifying test environments that are superior for genotype evaluation.
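The partition of a genotype vector into its G and GE parts can be sketched numerically. The following is a minimal illustration, not the GGEbiplot implementation: it assumes an environment-centered table of means, genotype-focused singular-value partitioning, and an AEC abscissa drawn through the biplot origin and the average of the environment scores. The function name `mean_vs_stability` and the returned keys are hypothetical.

```python
import numpy as np

def mean_vs_stability(table):
    """Project genotype vectors onto the AEC axes of a two-axis GGE biplot.

    table: genotypes (rows) x environments (columns) of trait means.
    """
    y = np.asarray(table, dtype=float)
    centered = y - y.mean(axis=0)            # remove E, leaving G + GE

    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    geno = u[:, :2] * s[:2]                  # singular values on genotypes (assumed)
    env = vt[:2].T                           # environment scores on PC1 and PC2

    avg_env = env.mean(axis=0)               # the "average environment" point
    norm = np.linalg.norm(avg_env)
    if norm == 0:
        raise ValueError("no G in the data: the AEC abscissa is undefined")
    abscissa = avg_env / norm                # unit vector along the AEC abscissa
    ordinate = np.array([-abscissa[1], abscissa[0]])  # perpendicular unit vector

    return {
        "vector_length": np.linalg.norm(geno, axis=1),   # contribution to GGE
        "mean_axis": geno @ abscissa,                    # contribution to G
        "stability_axis": np.abs(geno @ ordinate),       # contribution to GE
        "avg_env_norm": norm,
        # True when all environments lie on the same side of the AEC ordinate
        "same_side": bool(np.all(env @ abscissa > 0)),
    }
```

Because the two AEC axes are perpendicular, the squared vector length equals the sum of the two squared projections, which is the partition described above. The sign of the projection onto the ordinate depends on an arbitrary choice of direction, so the sketch reports its absolute value.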
MODEL DIAGNOSIS AND ACCURACY GAIN

Model Diagnosis Is Useful

We agree with Gauch (2006) that model diagnosis for each dataset is useful. Many methods have been proposed to determine how many PCs are required to fully approximate a two-way table of data, which can be used to determine whether a biplot under-fits or over-fits the data. Currently, we (Yan, Ma, and Cornelius) are investigating alternative methods for addressing two questions: (i) how does one know if the biplot is adequate in approximating the two-way table that is under investigation, and (ii) what should one do if the biplot is inadequate. Briefly, whenever the biplot is judged as inadequate, attempts should be made to divide the data into subsets based on environmental and/or genotypic groups revealed in the biplot, as demonstrated in the above example. Data subdivision should stop when the biplot is judged as sufficient in displaying the patterns of the subset or when there are no clear patterns (environmental or genotypic groupings) in the biplot.

Accuracy Gain from Model Diagnosis Should Not Be Overstated

Great accuracy gain and many “free observations” are claimed for model diagnosis and identification of “predictively accurate” models in AMMI analysis. For example, Ebdon and Gauch (2002b) reported for a perennial ryegrass (Lolium perenne L.) performance dataset that a statistical efficiency of 5.6 was achieved by using the AMMI2 model (AMMI with two IPCs), which was converted to 101 844 “free observations” or a saving of $1,000,000 (Gauch, 2006). However, this claim can be justified only if all of the following conditions are met: (1) the accuracy that was achieved by the “best model” is absolutely necessary; (2) the cultivar recommendations are made exactly as suggested by the “best model”; and (3) future performances are exactly the same as expected from the current data.
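The conversion from a statistical efficiency to “free observations” can be retraced with a short calculation. The sketch below assumes the common reading of statistical efficiency, under which a model with efficiency SE fitted to N actual observations is taken to be as accurate as raw means from SE × N observations. Neither this definition nor the value of N is given in the text above; N is only back-calculated here from the two reported figures.

```latex
N_{\text{free}} = (\mathrm{SE} - 1)\,N
\quad\Longrightarrow\quad
N = \frac{N_{\text{free}}}{\mathrm{SE} - 1}
  = \frac{101\,844}{5.6 - 1}
  = 22\,140
```

Under that assumption, the reported figures are consistent with a dataset of about 22 140 observations.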
Condition 1 is met only if adopting the best model leads to different cultivar recommendations, bearing in mind that, in practice, multiple cultivars rather than a single one are recommended for each mega-environment. Condition 2 is often false due to practical considerations. For example, AMMI1 was used in mega-environment analysis and cultivar recommendation, even though AMMI2 and AMMI7 were identified as the best models for two turfgrass datasets (Ebdon and Gauch, 2002b), which renders the model diagnosis completely irrelevant. Condition 3 is almost always false because genotype × year and genotype × location × year interactions are inevitable.

Pertaining to Condition 3, the term “predictive success” used in AMMI analysis must be interpreted properly. There is a fundamental difference between predicting future performance and “predicting” past performance (cross-validation). It is the former that is important, and it remains a question whether the best model identified through cross-validation is truly more predictive of future performance (Sneller and Dombek, 1995). Therefore, model diagnosis is useful, but accuracy gain from model diagnosis must not be overstated.

As Gauch (2006) pointed out, GED analysis is first of all an agricultural issue rather than a statistical one. Therefore, it is important to understand how cultivars are selected and recommended in the real world to have a realistic assessment of gains from model diagnosis. Breeders do not select cultivars on the basis of only a single trait (e.g., yield), because superior cultivars must meet requirements for multiple breeding objectives. Breeders do not select just one genotype with respect to a trait, because breeding objectives are often negatively associated, and it is rare to find a genotype that is best for everything (Yan and Wallace, 1995). For the same reason, agronomists always recommend a set of cultivars, rather than a single cultivar,
to the growers for any given region. Consequently, the choice among similar models may not affect cultivar selection and recommendations, and the argument of Gauch (2006) that using a suboptimal statistical model in GED analysis is like “turning the clock back on plant breeding” is an overstatement. In practice, it suffices to classify the genotypes into a few categories based on each breeding objective (trait), e.g., excellent, acceptable, and unacceptable, and to select those that are excellent or at least acceptable for all of the breeding objectives. Therefore, understanding the patterns in a GED set is more important than getting some “accurate” estimates, and the GGE biplot is an effective tool for this purpose.

The Penalty for Not Conducting Model Diagnosis

It is important to have a realistic understanding of the penalty when the GGE biplot under-fits or over-fits the G+GE of the data. When the data are over-fitted, some of the patterns in the biplot can be spurious. This can be easily prevented if formal statistical tests are conducted before any serious decisions are made. Furthermore, this situation happens only when the dataset is small and, thus, it is normally not a problem. When under-fitting is suspected, it is important to understand that the GGE biplot still presents the most important patterns of the GGE in the GED. These patterns are not only directly meaningful; they also serve as a guide for data subdivision so that additional patterns can be explored. Continued data subdivision without a stopping criterion may eventually lead to data over-fitting.
This can be prevented by conducting a formal statistical test or by imposing some practical considerations so that subdivision terminates when a feasible number of mega-environments (or groups of environments) is defined (Ebdon and Gauch, 2002b). By definition, the GGE biplot always displays the most important patterns of the G+GE in the GED. Therefore, if no pattern is seen from the biplot, it means that there is no clear pattern in the data; the question about the adequacy of the biplot becomes irrelevant and the search for patterns should stop. Yan and Tinker (2005b) presented an example of environment subdivision based on the GGE biplot patterns.

Essential Information about a Biplot

Since biplot analysis has been increasingly used in GED analysis and multivariate data analysis, this section discusses specifications of a biplot that are essential for its correct interpretation. Although the focus of this paper is on GGE biplot analysis, it is important to be aware that many different types of biplots can be constructed based on a single two-way dataset. All types of biplots are useful depending on the research objectives (Yan and Tinker, 2005b). The GGE biplots presented in this paper were generated using the “GGEbiplot” software first reported in Yan (2001), later detailed in Yan and Kang (2003), and more recently summarized in Yan and Tinker (2006). One feature of the GGEbiplot software that has not been described previously is that it provides information on the methods of data transformation, centering, scaling, and singular-value partitioning associated with the biplot, along with its goodness of fit (see top-left corner in Fig. 1–3), which is essential for correct interpretation of the biplot.

“Transform = 0” indicates that the data were not transformed before biplot analysis. Other transformation options in GGEbiplot include: (i) transformation to natural logarithm; (ii) transformation to base 10 logarithm; and (iii) square-root transformation.
The purpose of transformation is to normalize or stabilize the data and thereby to linearize the relationships among variables. For example, log transformation is usually desirable for biplot analysis of gene expression data (Pittelkow and Wilson, 2003).

“Scaling = 0” means that the data were not rescaled (i.e., not divided by anything). Other data scaling options in GGEbiplot include: 1, rescaled by the within-environment standard deviation; 2, rescaled by the within-environment standard errors; 3, rescaled by the environmental means. The purpose of scaling is to put the variables (environments) in comparable ranges (i.e., max − min). Scaling is optional for GED, but it is necessary if the variables are of different units. For the GED of a given trait, which are expressed in the same unit of measure in all environments, use of “Scaling = 0” will retain the information of differential standard deviations in different environments, which may be used as a measure of the discriminating ability of the environments. Use of “Scaling = 1” will remove this information and assume all environments to be equally important. Use of “Scaling = 2” can remove any heterogeneity among environments with regard to their experimental errors while retaining the information about the environments’ discriminating ability. Replicated data are required for using this option. Use of “Scaling = 3” removes the differences in unit and data range among variables while retaining the discriminating ability of the environments. Therefore, this option may have some advantage over “Scaling = 1.” The choice of a transformation method and of a scaling method is dataset and research purpose specific.

“Centering = 2” indicates that the data were environment centered (i.e., the main effect E was removed from the data and the biplot displays only G+GE).
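The transformation, scaling, and environment-centering choices described so far, together with the singular-value partitioning and goodness of fit reported with each biplot, can be sketched as a single function. This is an illustration under stated assumptions rather than the GGEbiplot code: the function name, the string labels for the transformations, the exponent parameter `f`, and the order of operations (transform, then center, then divide) are all assumptions. Only the scaling codes 0 to 3 follow the text.

```python
import numpy as np

def gge_biplot(table, transform=None, scaling=0, std_errors=None,
               f=1.0, n_axes=2):
    """Return genotype scores, environment scores, and goodness of fit.

    table      : genotypes (rows) x environments (columns) of means
    transform  : None, "ln", "log10", or "sqrt" (labels are illustrative)
    scaling    : 0 none, 1 within-environment SD, 2 within-environment SE,
                 3 environment means
    std_errors : per-environment standard errors, required for scaling=2
    f          : hypothetical partitioning exponent; 1.0 assigns the
                 singular values to genotypes, 0.0 to environments
    """
    y = np.asarray(table, dtype=float)
    if transform is not None:
        y = {"ln": np.log, "log10": np.log10, "sqrt": np.sqrt}[transform](y)

    env_means = y.mean(axis=0)
    centered = y - env_means                 # environment centering: removes E

    if scaling == 1:
        centered = centered / y.std(axis=0, ddof=1)
    elif scaling == 2:
        if std_errors is None:
            raise ValueError("scaling=2 needs std_errors from replicated data")
        centered = centered / np.asarray(std_errors, dtype=float)
    elif scaling == 3:
        centered = centered / env_means
    elif scaling != 0:
        raise ValueError("scaling must be 0, 1, 2, or 3")

    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    geno = u[:, :n_axes] * s[:n_axes] ** f
    env = vt[:n_axes].T * s[:n_axes] ** (1.0 - f)
    # Share of the total G+GE sum of squares captured by the retained axes
    fit = float((s[:n_axes] ** 2).sum() / (s ** 2).sum())
    return geno, env, fit
```

With no scaling and the singular values assigned to the environments (`f=0.0`), the full-rank environment vector lengths are proportional to the within-environment standard deviations, which is the sense in which “Scaling = 0” retains information on discriminating ability. With `scaling=1` those lengths become equal, so all environments are treated as equally important.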
Other centering options in GGEbiplot include: 0, no centering; 1, grand mean centered, useful when both row main effects and column main effects are of interest; and 3, double-centered. The choice of a centering method is also research purpose specific. Use of “Centering = 0” is useful for visualizing the original data and is effective for datasets whose grand mean is close to 0. “Centering = 0” was used in studying QTL-by-environment interactions (Yan and