Mol Genet Genomics (2015) 290:239–255
DOI 10.1007/s00438-014-0912-7

ORIGINAL PAPER

Genome-wide analysis of the MADS-box gene family in Brassica rapa (Chinese cabbage)

Weike Duan · Xiaoming Song · Tongkun Liu · Zhinan Huang · Jun Ren · Xilin Hou · Ying Li

Received: 29 June 2014 / Accepted: 28 August 2014 / Published online: 13 September 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract The MADS-box gene family is an ancient and well-studied transcription factor family that functions in almost every developmental process in plants. There are a number of reports about the MADS-box family in different plant species, but a systematic analysis of the MADS-box transcription factor family in Brassica rapa (Chinese cabbage) is still lacking. In this study, 160 MADS-box transcription factors were identified from the entire Chinese cabbage genome and compared with the MADS-box factors from 21 other representative plant species. A detailed list of MADS proteins from these 22 species was sorted. Phylogenetic analysis of the BrMADS genes, together with their Arabidopsis and rice counterparts, showed that the BrMADS genes were categorised into type I (Mα, Mβ, Mγ) and type II (MIKCC, MIKC*) groups, and the MIKCC proteins were further divided into 13 subfamilies. The Chinese cabbage type II group has 95 members, which is twice as many as the Arabidopsis type II group, indicating that the Chinese cabbage type II genes have been retained more frequently than the type I genes. Finally, RNA-seq transcriptome data and quantitative real-time PCR analysis revealed that BrMADS genes are expressed in a tissue-specific manner similar to Arabidopsis. Interestingly, a number of BrMIKC genes showed responses to different abiotic stress treatments, suggesting a function for some of the genes in these processes as well. Taken together, the characterization of the B. rapa MADS-box family presented here will certainly help in the selection of appropriate candidate genes and further facilitate functional studies in Chinese cabbage.

Keywords Abiotic stress · Chinese cabbage · Genome-wide analysis · MADS-box transcription factor · qRT-PCR

Communicated by S. Hohmann.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-014-0912-7) contains supplementary material, which is available to authorized users.

W. Duan · X. Song · T. Liu · Z. Huang · J. Ren · X. Hou · Y. Li (*)
State Key Laboratory of Crop Genetics and Germplasm Enhancement, Ministry of Agriculture, Nanjing Agricultural University, Nanjing 210095, People’s Republic of China
e-mail: yingli@njau.edu.cn

W. Duan · X. Song · T. Liu · Z. Huang · J. Ren · X. Hou · Y. Li
Key Laboratory of Biology and Germplasm Enhancement of Crops in East China, Ministry of Agriculture, Nanjing Agricultural University, Nanjing 210095, People’s Republic of China

Introduction

MADS-box genes encode transcription factors that are involved in developmental control and signal transduction in eukaryotes (Riechmann and Meyerowitz 1997). These genes are found in fungi (Passmore et al. 1988), animals (Norman et al. 1988) and plants (Sommer et al. 1990; Yanofsky et al. 1990). They constitute a large gene family, which is named after a few of its earliest members: MCM1 (from yeast) (Passmore et al. 1988), AGAMOUS (from A. thaliana) (Yanofsky et al. 1990), DEFICIENS (from Antirrhinum majus) (Sommer et al. 1990) and SRF (from Homo sapiens) (Norman et al. 1988). Previous studies of the MADS-box genes have included thorough comparative analyses of their roles in plant growth and development. However, there are relatively few analyses of the response of these genes to stress conditions. Brassica rapa ssp. pekinensis (Chinese cabbage) is one of the subspecies of Brassica rapa. This subspecies, which originated in China, is one of the most economically significant vegetable crops
in Asia. Moreover, Chinese cabbage has become a vegetable that is grown worldwide due to its high yield and good quality. Thus, the growth, development and flowering time of this plant are significant for its yield. Recently, the Chinese cabbage (Chiifu-401-42) genome has been sequenced, and this sequence enables the analysis of MADS-box genes across the entire genome (Wang et al. 2011). This genome has undergone triplication events since its divergence from Arabidopsis (13–17 mya) (Wang et al. 2011); however, a high degree of sequence similarity and conserved genome structure remain between these two species. These traits make B. rapa a good species in which to study the retention and ortholog groups of MADS-box genes during genome duplication events.

MADS proteins are characterised by the presence of a conserved DNA-binding domain of 58–60 amino acids in the N-terminal region, which is known as the MADS domain and which binds to CArG boxes (Yanofsky et al. 1990). Based on phylogenetic analysis, the plant MADS gene family is divided into two large lineages, type I and type II, which were generated by an ancestral gene duplication event (Alvarez-Buylla et al. 2000; Becker and Theißen 2003). The type I genes encode SRF-like domain proteins, whereas type II genes encode MEF2-like proteins (De Bodt et al. 2003). The plant type II proteins are named MIKC after their four domains: in addition to the MADS (M) domain, MIKC-type proteins contain the I (intervening), K (keratin-like) and C (C-terminal) domains (Cho et al. 1999). The I domain contributes to dimer formation (Henschel et al. 2002). The K domain is characterised by a coiled-coil structure, which primarily mediates the dimerisation of MADS proteins (Díaz-Riquelme et al. 2009). The C domain functions in transcriptional activation and in the formation of higher order protein complexes (Honma and Goto 2001). MIKC-type genes have been further divided into two subgroups, MIKCC and MIKC*, based on sequence divergence at the I domain (Henschel et al. 2002). The MIKC* genes encode proteins that tend to have longer I domains and a duplicated K domain. The type I lineage groups genes with a relatively simple gene structure (only one or two exons) that lack the K domain and that have common ancestors. The type I genes are subdivided into three groups, Mα, Mβ and Mγ, based on the sequence of the MADS domain and on the presence of additional motifs. The function of the type I genes appears to be restricted to female gametophyte development (AGL80 and AGL61) and seed development (PHE1, PHE2, AGL23, AGL28, AGL40, AGL62) (Köhler et al. 2003; Bemer et al. 2010; Colombo et al. 2008; Masiero et al. 2011).

Plant MIKC genes were first identified as floral organ identity genes in Antirrhinum majus and in Arabidopsis (Sommer et al. 1990; Yanofsky et al. 1990). Biologists have made great progress in elucidating the roles of these genes in plant development. Further genetic and molecular analyses of their biological functions have focused on flower organogenesis, in which these genes act as the major components of the well-known ABCDE model: sepals (A + E), petals (A + B + E), stamens (B + C + E), carpels (C + E), and ovules (D + E) (Zahn et al. 2006). Briefly, a previous study of Arabidopsis MIKC genes classified these genes into five functional classes as follows: class A includes APETALA1 (AP1); class B includes PISTILLATA (PI) and AP3; class C includes AGAMOUS (AG); class D includes SEEDSTICK/AGAMOUS-LIKE11 (STK/AGL11); and class E includes SEPALLATA (SEP1, SEP2, SEP3, and SEP4) (Pinyopich et al. 2003).
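The combinatorial logic of the ABCDE model summarised above can be written down as a small lookup. The sketch below is only an illustration of the class-to-organ rules as listed in the text; the dictionary and function names are hypothetical and are not part of the original study.

```python
# Illustrative encoding of the ABCDE model as summarised in the text.
# All names here (CLASS_GENES, ORGAN_CLASSES, organ_for, genes_for) are
# hypothetical helpers introduced for this sketch only.

# Arabidopsis genes assigned to each functional class in the text.
CLASS_GENES = {
    "A": ["AP1"],
    "B": ["PI", "AP3"],
    "C": ["AG"],
    "D": ["STK/AGL11"],
    "E": ["SEP1", "SEP2", "SEP3", "SEP4"],
}

# Class combinations that specify each floral organ.
ORGAN_CLASSES = {
    "sepal": {"A", "E"},
    "petal": {"A", "B", "E"},
    "stamen": {"B", "C", "E"},
    "carpel": {"C", "E"},
    "ovule": {"D", "E"},
}


def organ_for(active_classes):
    """Return the organ whose class combination matches exactly, else None."""
    active = set(active_classes)
    for organ, classes in ORGAN_CLASSES.items():
        if classes == active:
            return organ
    return None


def genes_for(organ):
    """Return the sorted list of class genes expected to act in an organ."""
    return sorted(g for c in ORGAN_CLASSES[organ] for g in CLASS_GENES[c])
```

For example, `organ_for(["B", "C", "E"])` looks up the stamen combination, and `genes_for("carpel")` lists the class C and class E genes named in the text.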
Other MIKC genes were later identified as being involved in different regulatory steps: (1) determination of flowering time, with genes that include Suppressor of Overexpression of Constans1 (SOC1) (Samach et al. 2000; Moon et al. 2003a, b), AGAMOUS-LIKE GENE 24 (AGL24) (Liu et al. 2008), Short Vegetative Phase (SVP) (Lee et al. 2007), MADS Affecting Flowering (MAF1/FLM), Flowering Locus C (FLC) (Michaels and Amasino 1999; Ratcliffe et al. 2003) and AGL15, AGL18 (Adamczyk et al. 2007); (2) fruit ripening, with genes that include SHATTERPROOF 1–2 (SHP1, SHP2) and FUL (Liljegren et al. 2000); and (3) seed pigmentation and embryo development, with genes that include TRANSPARENT TESTA16 (TT16) (Nesi et al. 2002). Apart from reproductive development, MIKC genes such as AGL12 and AGL17 also function in vegetative development and root development (Tapia-López et al. 2008).

Some MIKCC genes have already been shown to play key roles in controlling flowering time in Brassica, such as BrFLC1, BrFLC2, BrFLC3, BcFLC, BrAGL20 and BnAP3 (Pylatuik et al. 2003; Hong et al. 2012; Liu et al. 2013). For example, the overexpression of BrAGL20 can significantly affect the flowering time of B. napus, and BrFLC genes act similarly to AtFLC, with lower expression in early-flowering Chinese cabbage (Hong et al. 2012). Furthermore, plant growth and development are influenced greatly by numerous plant growth regulators and environmental factors. Gibberellin (GA) promotes flower formation and regulates flowering time in biennial plants. Its involvement in flower initiation in plants is well established, and there is growing insight into the mechanisms by which floral induction is achieved (Mutasa-Göttgens and Hedden 2009). Salicylic acid (SA) also regulates flowering time because SA-deficient plants are late flowering (Martínez et al. 2004). Abscisic acid (ABA) regulates many aspects of plant growth and development (Bezerra et al. 2004; Wilmowicz et al. 2008). As important environmental stress factors, cold and heat also regulate plant growth and development. To learn more about the response of B. rapa MADS-box genes to abiotic stresses, we selected these five treatments to explore in this study.
Flower development is controlled by a complex network of interactions between transcription factors, most of them belonging to the MADS-box family (Airoldi and Davies 2012). To get a better picture of the size and phylogeny of the MADS-box family in plants, we sorted and compared the MADS-box genes from 22 different plant species. To better understand these transcription factors in Chinese cabbage, we identified 160 MADS-box genes and analysed the phylogenetic relationships, conserved motifs, retention and ortholog groups between these Chinese cabbage MADS-box genes and Arabidopsis MADS-box genes. We further studied the chromosomal locations, gene duplication and tissue-specific expression of BrMADS genes. The expression of all of the BrMIKCC genes was also investigated under different treatments, which included GA, SA, ABA, heat and cold.

Materials and methods

Identification of MADS-box gene family in Chinese cabbage

All the files related to Brassica genome sequence data that were used for the identification and annotation of MADS proteins were downloaded from the Brassica database (BRAD; http://brassicadb.org/brad/) (Wang et al. 2011). Proteins with SRF-TF domains (PF00319) were retrieved from the Pfam 27.0 database (http://Pfam.sanger.ac.uk/) (Punta et al. 2012). The hidden Markov model (HMM) was used to identify the putative MADS proteins in Chinese cabbage (Finn et al. 2011). To obtain the proteins, we first used the tool hmmsearch with an expected value (e-value) cut-off of 1.0. Then, we verified these sequences using the tool SMART (http://smart.embl-heidelberg.de/) (Letunic et al. 2012), the Pfam database (http://Pfam.sanger.ac.uk/) and the NCBI database (http://www.ncbi.nlm.nih.gov/).

Sequence retrieval

The Arabidopsis thaliana MADS proteins were retrieved from the TAIR database (http://www.arabidopsis.org/) according to a previous report by Parenicova et al. (2003). The dataset of predicted Oryza sativa MADS proteins was retrieved from previous analyses by Arora et al. (2007). A MADS-box domain was not found in LOC_Os02g01360 (OsMADS60), LOC_Os12g31010 (OsMADS67), and LOC_Os08g20460 (OsMADS69). The MADS proteins of Populus trichocarpa, Medicago truncatula, Glycine max, Cucumis sativus, Citrus sinensis, Citrus clementina, Vitis vinifera, Sorghum bicolor, Zea mays, Selaginella moellendorffii and Physcomitrella patens were retrieved from a previous report. The Pfam database (http://pfam.sanger.ac.uk/) was used to screen the genome assemblies of Prunus persica, Arabidopsis lyrata, Capsella rubella, Thellungiella halophila, Solanum tuberosum, Solanum lycopersicum, Aquilegia coerulea and Volvox carteri. The genome data were downloaded from the genome browser Phytozome (http://www.phytozome.net/), and the evolutionary relationships of these species were determined using Phytozome and the public database PGDD (http://chibba.agtec.uga.edu/duplication/) (Lee et al. 2013).

Phylogenetic analysis

In the phylogenetic tree, the Arabidopsis MADS proteins were used to classify the Chinese cabbage MADS proteins into different groups. Full-length sequences of MADS proteins of Chinese cabbage and Arabidopsis were aligned using the ClustalW2 program with default parameters (Thompson et al. 1997). A phylogenetic tree was then constructed by the neighbour-joining method, and bootstrap values were calculated with 1,000 replications using MEGA5.2 (Tamura et al. 2011). Additionally, a phylogenetic tree of Arabidopsis MADS proteins was used to assess the reliability of this method, and to test and verify the classification, a phylogenetic tree of Chinese cabbage, Arabidopsis, rice and grapevine was built. To estimate the nucleotide divergence between sequences, all nucleotide sequences of Chinese cabbage MADS-box genes were also analysed by MEGA5.2 using the Jukes-Cantor model.
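As a rough illustration of what a Jukes-Cantor divergence estimate involves, the sketch below computes the standard Jukes-Cantor distance, d = -(3/4) ln(1 - (4/3) p), from the proportion p of differing sites between two pre-aligned nucleotide sequences. It is a minimal stand-alone example, not the MEGA5.2 implementation; the function names are hypothetical, and the handling of gaps and ambiguous bases (skipping any site where either sequence is not A, C, G or T) is an assumption made for this sketch.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among comparable aligned positions.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch, similar in spirit to pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at saturation; report infinity.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For two aligned sequences that differ at one of eight sites (p = 0.125), the corrected distance is slightly larger than p, reflecting the model's allowance for multiple substitutions at the same site.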
Bootstrap (1,000 replicates) analyses were also performed for this estimation.

Identification of conserved motifs and gene structure

To identify the conserved motifs in full-length Chinese cabbage and Arabidopsis MADS proteins, the Multiple Expectation Maximisation for Motif Elicitation (MEME) program version 4.9.0 (Bailey et al. 2009) was used with default parameters, except for the following: (1) the optimum motif width was set to ≥10 and ≤100; and (2) the maximum number of motifs was set to 15. The MEME motifs were annotated using the SMART program (http://smart.embl-heidelberg.de) and the Pfam database. The coding sequences (CDS) and genomic DNA sequences of Chinese cabbage MADS-box genes were used to reveal the gene structure using the tool GSDS (http://gsds.cbi.pku.edu.cn/).

Ortholog groups of MADS-box genes in Brassica and Arabidopsis genome

The program OrthoMCL (http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi) (Li et al. 2003) was used to identify
242 Mol Genet Genomics (2015)290:239-255 the homologous genes of MADS-box between Chinese completed for this cultivar;thus,this cultivar is a typical cabbage and Arabidopsis.Briefly,the tools BLASTP, cultivar for Chinese cabbage research.Seeds were grown with an e-value 85%), tions.Five micrograms of each sample were reverse and then Ks values were calculated for all pair-wise align- transcribed into cDNA using the PrimeScript RT rea- ments of these genes,which previously obtained by blast, gent Kit (TaKaRa).The specific primers of Chinese using the method of Nei and Gojobori as implemented in cabbage MADS-box genes and the housekeeping actin KaKs_calculator (Zhang et al.2006).Lastly,based on phy- gene(Bra028615)were designed using the Primer Pre- logenies,the nucleotide divergence (Dist <0.1)was used mier 5.0 software (Supplementary Table 11).To verify as the final standard (Lynch and Conery 2000).The pur- the primer specificity,we used the program BLAST ple lines were used to link the duplicate genes on different against the Chinese cabbage genome.The qRT-PCR chromosomes. 
assays were performed with three biological and three technical replicates.Each reaction was performed in a Chinese cabbage RNA-seg data analysis 20 uL reaction mixture containing a diluted cDNA sam- ple as the template,2x Power SYBR Green PCR Mas- For the expression profiling of Chinese MADS-box genes, ter Mix(Applied Biosystems),and 400 nM each of for- we utilised the Illumina RNA-seq data that were previously ward and reverse gene-specific primers.The reactions generated and analysed by Tong et al.(2013).Six tissues were performed using a MyiQ Single-Color Real-Time of B.rapa accession Chiifu-401-42,including callus,root, PCR Detection System (Bio-Rad,Hercules,CA)with stem,leaf,flower,and silique,were analysed.Two sam- the following cycling profile:94 C for 30 s,followed ples of root and leaf tissues were generated from different by 40 cycles at 94 C for 10 s,and 58 C for 30 s.A batches of plants.The transcript abundance is expressed as melting curve(61 cycles at 65 C for 10 s)was gener- fragments per kilobase of exon model per million mapped ated to verify the specificity of the amplification (Song reads (FPKM)values.Heat maps for Chinese cabbage et al.2013).The relative expression ratio of each gene MADS-box genes were generated,which have positive was calculated using the comparative C,value method FPKM values in at least one or more of the samples. (Heid et al.1996).The MADS-box gene expression cluster from each stress treatment was analysed using Plant material and treatments the Cluster program (http://bonsai.hgc.jp/~mdehoon/ software/cluster/software.htm)(Eisen et al.1998).and The Chinese cabbage cultivar Chiifu-401-42 was used the results were shown using the TreeView software for the experiments.Whole genome sequencing has been (http://jtreeview.sourceforge.net/). Springer
the homologous MADS-box genes between Chinese cabbage and Arabidopsis. Briefly, BLASTP (with an e-value ≤1e−10) and orthomclPairs were applied to find orthologs, in-paralogs and co-orthologs in these two species. Circos (Krzywinski et al. 2009) was used to link these genes to chromosomes, and Cytoscape was used to build the network of these relationships (Shannon et al. 2003).

Chromosome localisation and gene duplications

To determine the physical locations of the MADS-box genes, the start and end positions of all MADS-box genes on each chromosome were obtained from the BRAD database. An in-house Perl program was used to draw the location images of the Chinese cabbage MADS-box genes. The position of each Chinese cabbage MADS-box gene on the blocks was verified by searching for homologous genes between Arabidopsis and the three B. rapa subgenomes, namely the least fractionated (LF), medium fractionated (MF1) and most fractionated (MF2) subgenomes (http://brassicadb.org/brad/searchSynteny.php) (Wang et al. 2011; Cheng et al. 2013). To determine the gene duplications, first, the CDS sequences of the Chinese cabbage MADS-box genes were blasted against each other (evalue 85 %), and then Ks values were calculated for all pair-wise alignments of these genes, which were previously obtained by BLAST, using the method of Nei and Gojobori as implemented in KaKs_calculator (Zhang et al. 2006). Lastly, based on phylogenies, the nucleotide divergence (Dist <0.1) was used as the final standard (Lynch and Conery 2000). Purple lines were used to link the duplicate genes on different chromosomes.

Chinese cabbage RNA-seq data analysis

For the expression profiling of the Chinese cabbage MADS-box genes, we utilised the Illumina RNA-seq data that were previously generated and analysed by Tong et al. (2013). Six tissues of B.
rapa accession Chiifu-401-42, including callus, root, stem, leaf, flower and silique, were analysed. Two samples of root and leaf tissues were generated from different batches of plants. The transcript abundance is expressed as fragments per kilobase of exon model per million mapped reads (FPKM). Heat maps were generated for the Chinese cabbage MADS-box genes that have positive FPKM values in at least one of the samples.

Plant material and treatments

The Chinese cabbage cultivar Chiifu-401-42 was used for the experiments. Whole genome sequencing has been completed for this cultivar; thus, it is a typical cultivar for Chinese cabbage research. Seeds were grown in pots containing a soil:vermiculite mixture (3:1) in the greenhouse of Nanjing Agricultural University, and the controlled-environment growth chamber was programmed for 16 h of light at 25 °C and 8 h of dark at 20 °C (Song et al. 2013). One month later, seedlings at the five-leaf stage were transferred to growth chambers set at 4 or 38 °C, under identical light intensity and day length, as the cold and heat treatments. Simultaneously, for acclimation, some plants were cultured in 1/2 Hoagland's solution in plastic containers, with the pH at 6.5 (Jensen and Bassham 1966). After 5 days of acclimatisation, plants were cultured in the following four treatments: (1) Control; (2) 100 μM ABA; (3) 100 μM GA; (4) 100 μM SA. At 4 and 12 h after treatment, the young leaf samples were collected, frozen in liquid nitrogen and stored at −70 °C for further analysis.

RNA isolation and quantitative real-time PCR

Total RNA was isolated from 100 mg of frozen tissue using an RNA kit (RNAsimply Total RNA Kit, Tiangen, Beijing, China) according to the manufacturer's instructions. Five micrograms of each sample were reverse transcribed into cDNA using the PrimeScript RT reagent Kit (TaKaRa).
The specific primers for the Chinese cabbage MADS-box genes and the housekeeping actin gene (Bra028615) were designed using the Primer Premier 5.0 software (Supplementary Table 11). To verify primer specificity, the primers were searched against the Chinese cabbage genome using BLAST. The qRT-PCR assays were performed with three biological and three technical replicates. Each reaction was performed in a 20 μL reaction mixture containing a diluted cDNA sample as the template, 2× Power SYBR Green PCR Master Mix (Applied Biosystems), and 400 nM each of forward and reverse gene-specific primers. The reactions were performed using a MyiQ Single-Color Real-Time PCR Detection System (Bio-Rad, Hercules, CA) with the following cycling profile: 94 °C for 30 s, followed by 40 cycles of 94 °C for 10 s and 58 °C for 30 s. A melting curve (61 cycles at 65 °C for 10 s) was generated to verify the specificity of the amplification (Song et al. 2013). The relative expression ratio of each gene was calculated using the comparative Ct value method (Heid et al. 1996). The MADS-box gene expression cluster from each stress treatment was analysed using the Cluster program (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm) (Eisen et al. 1998), and the results were displayed using the TreeView software (http://jtreeview.sourceforge.net/).
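The comparative Ct calculation can be illustrated with a short Python sketch. It assumes the widely used 2^(−ΔΔCt) formulation with roughly 100 % amplification efficiency, the actin gene as the reference and the untreated control as the calibrator. The function name and all Ct values are hypothetical and are not taken from this study.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated, target_control, ref_control):
    """Return the 2^(-ddCt) fold change of a target gene (assumed formulation).

    Each argument is a list of Ct values from technical replicates.
    """
    # Normalise the target gene to the reference (actin) gene in each condition.
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    # Compare the treated sample with the control (calibrator) sample.
    dd_ct = d_ct_treated - d_ct_control
    # Assumes the amount of product doubles every cycle (100 % efficiency).
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values for one gene, three technical replicates each.
fold = relative_expression(
    target_treated=[24.0, 24.0, 24.0],
    ref_treated=[18.0, 18.0, 18.0],
    target_control=[26.0, 26.0, 26.0],
    ref_control=[18.0, 18.0, 18.0],
)
print(f"relative expression: {fold:.2f}")
```

A value above 1 indicates higher expression in the treated sample than in the control, and a value below 1 indicates lower expression. If primer efficiencies differ markedly from 100 %, an efficiency-corrected model would be more appropriate than this sketch.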
Results

Identification and classification of MADS-box genes in Chinese cabbage and comparative analyses

To identify the putative MADS proteins in the Chinese cabbage genome, an HMM search was performed, which identified 164 proteins. Subsequently, all 164 protein sequences were subjected to Pfam and SMART analyses, which resulted in the identification of 162 MADS proteins, named BrMADS001 to BrMADS162 according to the hmmsearch e-value (Supplementary Table 1). Simultaneously, by performing a homology search against Arabidopsis and by analyzing the gene structure, two genes were removed: BrMADS047 and BrMADS124 contained other functional domains, and their homologs were non-MADS genes (Supplementary Fig. 1).

To pre-classify the Chinese cabbage MADS-box genes, a phylogenetic tree with the Arabidopsis MADS-box genes was built (Supplementary Fig. 2). In total, 95 genes were determined to be type II MADS-box genes (including MIKCc and MIKC*), twofold more than in Arabidopsis. The remaining 65 genes were confirmed to be type I MADS-box genes (including the Mα, Mβ and Mγ groups), which is comparable to the number in Arabidopsis.

To perform comparative genomic analyses, we searched for MADS protein-coding sequences in the genomes of 21 other plant species. Some of these genes have been published previously, while others are described in this work for the first time (Supplementary Tables 2 and 3). The evolutionary relationships of the species and the number of MADS-box genes in their genomes are shown in Fig. 1. The data coloured green were analysed for the first time in this work. The pre-classified groups of these species were based on their phylogenetic relationships with the Arabidopsis MADS-box genes. The data show that the number of MADS-box genes in Algae, Bryophyta and Pteridophyta is lower than that in Angiospermae.
Since several whole genome duplication (WGD) events occurred during angiosperm evolution, it is likely that this higher number is caused by an elevated duplication frequency, in combination with an increased retention of MADS-box genes that were subjected to neofunctionalization and gained important functions in angiosperm flower development (Doebley and Lukens 1998; Theissen et al. 2000; Nam et al. 2003).

Fig. 1 Evolutionary relationships of the species and the numbers of MADS-box family members in each species. The left side of the figure shows the evolutionary relationships of the species; the right side shows the numbers of MADS-box family members in each species. The data coloured blue were described in this work, and the data coloured green were published in previous works (colour figure online)
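As a quick arithmetic check on the classification counts reported in this subsection, the following Python sketch tallies the Chinese cabbage type I and type II genes and compares the type II count with Arabidopsis. The Arabidopsis type II count (46) is the figure given in the Fig. 2 legend; the variable names are illustrative only.

```python
# Counts reported in the text for Chinese cabbage (B. rapa).
br_type2 = 95
br_type1 = 65
# Arabidopsis type II count as given in the Fig. 2 legend.
at_type2 = 46

# Total number of BrMADS genes after classification.
br_total = br_type1 + br_type2
# Fold difference in type II genes between the two species.
fold_type2 = br_type2 / at_type2

print(f"B. rapa MADS-box genes: {br_total}")
print(f"type II fold vs Arabidopsis: {fold_type2:.2f}")
```

The total agrees with the 160 BrMADS genes retained after filtering, and the ratio is consistent with the roughly twofold difference in type II genes described above.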
Copy number variation and differential retention of MADS-box genes in Chinese cabbage

A comparison of the homologous MADS-box genes in Arabidopsis and the three B. rapa subgenomes (LF, MF1 and MF2) using the BRAD database revealed that most BrMADSs on the conserved collinear blocks have been well conserved throughout the divergent evolution of Arabidopsis and B. rapa (Cheng et al. 2012) (Supplementary Table 4). The gene dosage hypothesis predicts that genes whose products are dose-sensitive, interacting either with other proteins or in networks, should be over-retained (Thomas et al. 2006; Birchler and Veitia 2007). The type II proteins have been shown to function in large complexes during flower development, while it is still unclear how the type I proteins perform their functions. Interestingly, type II genes have been retained after triplication and fractionation in B. rapa at a significantly higher rate than the type I genes (Supplementary Fig. 3a). Most (74 %) type II genes were retained in two or three copies, which is significantly greater than the retention of type I genes (15 %) (Supplementary Fig. 3a), while most (65 %) of the type I genes were completely lost. The proportion of homoeologs retained varied among the three sub-genomes (Supplementary Fig. 3b). In the LF sub-genome, more MADS-box gene homoeologs were retained than in the other two sub-genomes. The retention of type II gene homoeologs among the sub-genomes was greater than that of type I gene homoeologs (Supplementary Fig. 3b).

Phylogenetic and classification analysis of BrMADS genes

To examine the phylogenetic relationships between BrMADS genes in detail, independent phylogenetic trees were constructed with Arabidopsis and rice type I and type II proteins (Supplementary Fig. 4 and Fig. 2). The type I proteins were divided into three subfamilies, Mα (27), Mβ (16) and Mγ (22), whereas the type II proteins were divided into 13 subgroups (Supplementary Table 5).
Subgroup TM3-like (SOC1) contained the highest number (16) of BrMADS type II proteins, whereas subgroups AGL12, AGL6 and Bs (TT16) had the fewest members, with only three. Other subgroups contained from four to ten members (Supplementary Fig. 3c). In addition, in the type II group, eleven genes were identified as MIKC*-type.

Finally, we visualized the phylogenetic relationship of the BrMADS proteins with the Arabidopsis, rice, soybean and grapevine MADS proteins by building an unrooted tree of the full-length MADS protein sequences. The phylogenetic tree divided these proteins into five distinct subfamilies (MIKCc, MIKC*, Mα, Mβ, Mγ) (Supplementary Fig. 2c).

Identification of conserved motifs and gene structure

To compare the differences in the protein structure, MEME was used to identify the conserved motifs among the Chinese cabbage and Arabidopsis MADS proteins. The type I and type II MADS proteins of these two species were compared in separate analyses, and for each comparison, fifteen conserved motifs, named motif 1 to motif 15, were identified (Supplementary Fig. 5 and Fig. 3). In general, the MADS proteins that clustered in the same subgroup shared a similar motif composition, which indicates functional similarities among members of the same subgroup (Parenicová et al. 2003). The Arabidopsis and Chinese cabbage MADS proteins were found to have a similar structure for every subgroup in type II, except for BrMADS031, 060, 112 and 103, which have incomplete domains. However, in type I, the protein structure was divergent (Supplementary Fig. 5). This finding indicates that the C-terminal part of the MADS domain in the Mα, Mβ and Mγ groups is more divergent than that in the MIKC group. In type I, apart from the MADS domain, each of the groups shows a different motif profile, and none of these motifs can be annotated using the tool SMART.
The protein motifs shared by the Arabidopsis and Brassica type I proteins within a clade show that there is also conservation beyond the MADS domain, although proteins of one clade sometimes show some variation in the motif profile, for example BrMADS118, 128, 144, 136 and 157 in Mγ and BrMADS106, 108 and 113 in Mβ (Supplementary Fig. 5).

Simultaneously, the protein structure of BrMADS was analysed using the program MEME. As expected, commonly shared motifs tend to occur within the same group. The motifs were detected by the tool SMART (Supplementary Fig. 6). It will be interesting to characterise the functions of the common motifs within the newly designated groups in relation to the functions of these genes.

In addition to the protein structure, the gene structure was also analysed. We found that all type II BrMADS genes have at least three exons, while the type I genes have at most two exons, consistent with the AtMADS genes (Parenicová et al. 2003). Furthermore, the first exon (approximately 180 bp) of type II genes consistently encodes the MADS domain. Supplementary Fig. 6 gives an overview of the structures of the Chinese cabbage MADS genes and proteins.

Ortholog groups, chromosomal localization and gene duplication of MADS-box genes

Most angiosperm plant lineages have experienced one or more rounds of ancient polyploidy (Lee et al. 2013). Chinese cabbage has undergone genome triplication since its divergence from Arabidopsis (Wang et al. 2011). Generally,
the gene number in the Chinese cabbage genome was notably less than three times the Arabidopsis gene number because some genes were lost during polyploidy speciation. Additionally, both segmental and tandem gene duplications have significant impacts on the expansion and evolution of gene families in plant genomes. In this study, we analysed the ortholog groups between Chinese cabbage and Arabidopsis MADS-box genes using the OrthoMCL program. We identified 67 orthologous gene pairs and 120 co-orthologous gene pairs in the MADS proteins of these two species (Supplementary Table 6). They were visualised using the Circos software (Fig. 4). Among the orthologous gene pairs of Chinese cabbage and Arabidopsis, we found more Chinese cabbage MADS-box homologous genes on Arabidopsis chromosomes 5 and 1 than on the other chromosomes. To gain further insight into the correlation of the MADS-box genes in Chinese cabbage and Arabidopsis, networks of MADS-box genes were constructed using the orthologs of these two species (Supplementary Fig. 7). Among the orthologous gene pairs of Chinese cabbage and Arabidopsis, 16 Arabidopsis MADS-box genes had no ortholog among the Chinese cabbage MADS-box genes; these genes have been duplicated in Arabidopsis after the split. Fifty Arabidopsis MADS-box genes have only one ortholog in Chinese cabbage; these genes were present before the split, but two of the three copies have been lost after the B. rapa genome triplication (Supplementary Fig. 7a). A further 42 Arabidopsis genes have co-orthologs in Chinese cabbage; these genes were preferentially retained after the triplication (Supplementary Fig. 7b, c and d). Meanwhile, we found 71 and 60 in-paralogous gene pairs in Arabidopsis and Chinese cabbage,

Fig. 2 Phylogenetic tree of Chinese cabbage, Arabidopsis and rice type II MADS-box proteins.
Phylogenetic analysis of 182 type II MADS proteins from Chinese cabbage (95), Arabidopsis (46) and rice (41) shows similar groups in all three plant species. The 13 clades formed by the type II MADS proteins are marked in different colours (colour figure online)
Fig. 3 Phylogenetic relationships and conserved motif compositions of Chinese cabbage and Arabidopsis type II MADS proteins. The neighbour-joining tree of Chinese cabbage and Arabidopsis type II MADS-box genes and their motif locations
respectively (Supplementary Table 5 and Fig. 4). From this analysis, we found that gene duplication events after the divergence of Chinese cabbage and Arabidopsis resulted in a high number of paralogous and co-orthologous genes in both species. While in Arabidopsis the type I subfamily has predominantly expanded, in Chinese cabbage it is the type II family that has expanded.

The physical map positions of the MADS-box genes on the Chinese cabbage chromosomes were identified (Fig. 5). Among the 160 BrMADS genes, two genes (BrMADS150 and BrMADS134) could not be anchored on any of the Chinese cabbage chromosomes. BrMADS150 and BrMADS134 are on Scaffold 000343 and 000385, respectively. The other 158 BrMADS genes were distributed non-randomly on the 10 Chinese cabbage chromosomes (Fig. 5a). Chromosomes 2 and 9 contain the most MADS-box genes (15 and 16 %, respectively), whereas chromosome 8 contains the fewest (6 %) (Fig. 5b). We also found that some MADS-box genes cluster together in a region of the chromosome. For example, 16 genes clustered at the end of chromosome 2, and almost all of these genes belong to BrMIKCc. Type I and type II genes also show a differential distribution on the Chinese cabbage chromosomes. The type I genes are distributed evenly across all ten chromosomes, whereas genes from type II are

Fig. 4 Ortholog groups of MADS-box genes in the B. rapa and Arabidopsis genomes. Ten Chinese cabbage chromosomes and five Arabidopsis chromosomes are shown in different random colours with their names on the periphery. The lines in the figure represent four types of gene pairs: orthologous gene pairs are coloured blue, co-orthologous gene pairs black, Chinese cabbage paralogous gene pairs yellow and Arabidopsis paralogous gene pairs red. The figure was created using the software Circos (colour figure online)
more numerous than those from type I on chromosomes 03, 07 and 08 (Fig. 5c). Interestingly, this differs from Arabidopsis, where the MIKC genes are distributed evenly across all five chromosomes, whereas the type I genes are located mainly on chromosomes 1 and 5 (Parenicová et al. 2003).

Duplicated genes from eukaryotic transcription factor families have originated predominantly from inter-chromosomal duplications (Friedman and Hughes 2001). The large size of the MADS-box gene family in B. rapa suggests that this family underwent frequent duplication events during evolution. To learn more about the duplication of these genes, we defined the duplicated genes based on their Ks values and phylogenetic criteria (Supplementary Tables 7, 8 and 9). Furthermore, these genes, which share similar gene and protein structures, were mapped onto the chromosomes and the phylogenetic tree (Fig. 5 and Supplementary Fig. 8). The duplicated genes clustered closely together at the tips of the phylogenetic tree. Most of the duplicated MADS-box genes arose through segmental duplication (39 duplications), whereas others arose through tandem duplication (6 duplications) (Fig. 5). Tandem duplications have produced MADS-box gene clusters or hotspots, whereas segmental duplications have produced many homologs of MADS-box genes on different chromosomes, as indicated with purple lines. The divergence times of duplicated BrMADS gene pairs ranged from 0.18 to 11.29 million years ago (MYA) and averaged 6.43 MYA, which indicates that the divergence of duplicated MADS family members in B. rapa mostly accompanied the triplication events (5–9 MYA) (Supplementary Table 9) (Wang et al. 2011).
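The step from Ks values to divergence times can be illustrated with a short sketch. It assumes the commonly used relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The rate of 1.5 × 10^-8 used below is a value often applied to Brassicaceae, but it is an assumption here because this section does not state the rate; the function names and the example Ks values are likewise hypothetical and are not taken from Supplementary Table 9.

```python
# Minimal sketch: converting Ks of duplicated gene pairs into divergence times.
# ASSUMPTION: T = Ks / (2 * r), with r = 1.5e-8 synonymous substitutions per
# site per year. The rate is illustrative; the section above does not give it.
ASSUMED_RATE = 1.5e-8


def divergence_time_mya(ks, rate=ASSUMED_RATE):
    """Return the estimated divergence time in million years ago (MYA)."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6


def summarise_times(ks_values, rate=ASSUMED_RATE):
    """Return (minimum, maximum, mean) divergence time in MYA."""
    if not ks_values:
        raise ValueError("ks_values must not be empty")
    times = [divergence_time_mya(ks, rate) for ks in ks_values]
    return min(times), max(times), sum(times) / len(times)


def within_window(time_mya, lower=5.0, upper=9.0):
    """Check whether a time falls in a window such as the 5-9 MYA triplication."""
    return lower <= time_mya <= upper


if __name__ == "__main__":
    # Hypothetical Ks values for four duplicated pairs (not real data).
    example_ks = [0.03, 0.18, 0.24, 0.30]
    low, high, mean = summarise_times(example_ks)
    print(f"range: {low:.2f}-{high:.2f} MYA, mean: {mean:.2f} MYA")
    for ks in example_ks:
        t = divergence_time_mya(ks)
        print(f"Ks={ks:.2f} -> {t:.2f} MYA, in 5-9 MYA window: {within_window(t)}")
```

Because the estimate scales inversely with the assumed rate, a different rate shifts every divergence time proportionally, so the comparison with the 5–9 MYA window is only as reliable as the rate chosen.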
Fig. 5 Distribution of the BrMADS genes on the ten Chinese cabbage chromosomes. a The 158 BrMADS genes are non-randomly distributed across the conserved collinear blocks of each chromosome. Type I and type II genes are coloured blue and red, respectively. Chromosome numbers are indicated above each chromosome, followed by the numbers of type I and type II genes. The MADS-box genes present on duplicated chromosomal segments are connected by blue lines between the two relevant chromosomes. The tandemly duplicated genes are boxed. The conserved collinear blocks on each chromosome are labeled A–X and are colour-coded according to inferred ancestral chromosomes following an established convention. b The percentages of BrMADS genes on each chromosome are shown in the pie chart. c The percentages of BrMADS type I and type II genes on each chromosome are shown in the doughnut chart (colour figure online)

Differential expression of BrMADS genes in various tissues

To identify tissue-specific expression profiles of BrMADS genes, we utilised transcriptome data derived from Illumina RNA-Seq reads that were generated and analysed by Tong et al. (2013). The transcript abundance of the 160 BrMADSs in six different tissues (callus, root, stem, leaf, flower and silique) was obtained; however, almost all of the type I BrMADSs were either transcribed at too low a level to be detected or had spatial and temporal expression patterns that resulted in no expression in the RNA-seq