Yang et al. / Comparative transcriptome analysis of Phalaenopsis flowers
© The Author(s) 2014 Published by Polish Botanical Society Acta Soc Bot Pol 83(3):191–199

used false discovery rate (FDR) to determine the threshold of the P-value in multiple tests.
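The text says only that an FDR threshold was applied to P-values from multiple tests. It does not name the procedure. The sketch below assumes the Benjamini–Hochberg step-up adjustment, which is a common choice in DEG pipelines. It converts raw per-gene P-values into FDR-adjusted values that can then be compared against a cutoff. The function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumption: BH is used here for illustration; the source only says "FDR".
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        q = p_values[idx] * m / rank
        running_min = min(running_min, q)
        adjusted[idx] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives approximately `[0.02, 0.04, 0.04, 0.02]`. A gene would pass the study's FDR criterion if its adjusted value is below 0.001.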
To identify DEGs, we used a combination of two criteria: an expression fold change of |log2 Ratio| ≥ 1 and an FDR-adjusted P-value < 0.001. We performed GO and KEGG functional enrichment analysis to determine which DEGs were significantly enriched in GO terms and metabolic pathways (P ≤ 0.05) compared with the selected transcriptome background. Significance was calculated according to the formula:

$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

where N is the number of genes with GO/KEGG annotations, n is the number of DEGs in the set of N genes, M is the number of genes mapped to a certain GO/KEGG term, and m is the number of DEGs in the set of M genes.

Results

Short-read de novo sequencing and assembly

We pooled equal amounts of RNA from the two samples (petals and labella) to construct cDNA libraries for transcriptome sequencing and analysis on an Illumina Genome Analyzer IIx platform. With this paired-end approach, raw reads were generated from both ends of the cDNA fragments. We then filtered the data using stringent quality criteria (i.e., removal of sequences shorter than 65 bp or with CycleQ20 values below 100%). This yielded 10 734 813 clean reads comprising 2 134 924 797 nucleotides (nt) from the petal library and 16 224 038 clean reads comprising 3 276 861 889 nt from the labella library. The clean reads were assembled de novo into contigs using the Trinity software package. The sequencing and assembly results are presented in Tab. 1. A total of 1 057 077 contigs were assembled from petals and 2 105 819 from labella. Their mean lengths were 75 bp and 67 bp, respectively, and their N50 values were 99 bp and 65 bp. Of these, 16 999 (1.61%) petal contigs and 17 334 (0.82%) labella contigs were longer than 500 bp. The contigs were further assembled into 55 101 scaffolds for petals and 67 026 for labella, with mean lengths of 811 bp and 948 bp, respectively. Scaffold N50 lengths were 1280 bp for petals and 1168 bp for labella. In total, 15 207 petal scaffolds and 24 853 labella scaffolds corresponded to transcripts longer than 1 kb. Finally, 37 723 unigenes were obtained from the petals, with a unigene N50 length of 1125 bp and a total length of 26 656 163 nt. Similarly, 34 020 unigenes were generated from the labella data, with a unigene N50 length of 1398 bp and a total length of 27 636 110 nt. These unigenes were then organized into a transcriptome database for identifying putative genes related to flower color and floral differentiation.

DEG analysis between white petal and labella libraries

A primary goal of transcriptome sequencing is to compare gene expression levels across samples. In this study, DEGs were identified by comparing gene expression between the petal and labella libraries. Unigene expression was calculated using the RPKM method [11]. DEGs with at least two-fold differences (FDR < 0.001 and |log2 Ratio| ≥ 1) between the petal and labella libraries are shown in Fig. 2. Using these criteria, we identified 2736 DEGs between petals and labella. Of these, 1277 were up-regulated and 1459 were down-regulated in petals. More genes were expressed only in the petals (243) than only in the labella (62). This result suggests that petal and labella development may involve substantially different processes.

DEG functional annotation

To validate and annotate the assembled DEGs, the 2736 DEGs were subjected to BLASTX comparisons (E-value ≤ 1 × 10⁻⁵) against several public databases to identify putative functions of the unigene sequences. As a result, 2698, 2183, 2319, 2706, 2446, 837, and 1077 DEGs had homologous sequences in the Nr, Nt, SwissProt, TrEMBL, GO, KEGG, and COG databases, respectively.

Tab. 1 Summary of sequencing and assembly results

| Assembled data | Statistics of data production | White petals | Red labella |
|---|---|---|---|
| Reads | No. of reads | 10 734 813 | 16 224 038 |
| | Total nucleotides (nt) | 2 134 924 797 | 3 276 861 889 |
| | GC percentage (%) | 46.80 | 45.08 |
| | Q20 percentage (%) | 100.00 | 100.00 |
| Contigs | No. of contigs | 1 057 077 | 2 105 819 |
| | Total nucleotides (nt) in contigs | 79 618 808 | 141 300 083 |
| | Length of N50 (bp) | 99 | 65 |
| | Mean length of contigs (bp) | 75 | 67 |
| | No. of contigs above 500 bp | 16 999 | 17 334 |
| Scaffolds | No. of scaffolds | 55 101 | 67 026 |
| | Total nucleotides (nt) in scaffolds | 44 692 377 | 63 535 969 |
| | Length of N50 (bp) | 1280 | 1443 |
| | Mean length of scaffolds (bp) | 811 | 948 |
| | No. of scaffolds above 500 bp | 23 777 | 40 620 |
| Unigenes | No. of unigenes | 37 723 | 34 020 |
| | Total nucleotides (nt) in unigenes | 26 656 163 | 27 636 110 |
| | Length of N50 (bp) | 1125 | 1398 |
| | Average length of unigenes (bp) | 707 | 812 |
| | No. of unigenes above 500 bp | 15 903 | 15 984 |
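The enrichment significance formula in the Methods is the upper tail of a hypergeometric distribution. It gives the probability of observing at least m DEGs among the M genes of a term when n DEGs are drawn from N annotated genes. A minimal sketch follows, assuming Python 3.8+ for `math.comb`. It evaluates the formula term by term as written. Numerator and denominator are kept as exact integers so that "1 minus a sum close to 1" does not lose precision. The function name and argument order are illustrative only and are not taken from the authors' pipeline.

```python
from math import comb


def enrichment_p_value(N: int, n: int, M: int, m: int) -> float:
    """P-value for over-representation of DEGs in one GO/KEGG term.

    N: genes with GO/KEGG annotation (background)
    n: DEGs among those N genes
    M: genes mapped to the term
    m: DEGs among those M genes

    Evaluates P = 1 - sum_{i=0}^{m-1} C(M, i) C(N-M, n-i) / C(N, n),
    keeping numerator and denominator as exact integers until the end.
    """
    if not (0 <= M <= N and 0 <= n <= N and 0 <= m <= min(n, M)):
        raise ValueError("counts must satisfy 0 <= m <= min(n, M) and n, M <= N")
    total = comb(N, n)  # ways to draw n DEGs from N annotated genes
    # Ways to draw fewer than m DEGs from the M genes of the term.
    lower_tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m))
    return (total - lower_tail) / total
```

With toy counts, `enrichment_p_value(N=10, n=3, M=4, m=2)` returns 1/3. A term would be called enriched at P ≤ 0.05. The result should agree with `scipy.stats.hypergeom.sf(m - 1, N, M, n)` if SciPy is available.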
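Tab. 1 reports N50 values for contigs, scaffolds, and unigenes. N50 is the length L such that sequences of length L or longer together contain at least half of all assembled nucleotides. A minimal sketch of that standard definition follows. The function name is hypothetical, and the toy lengths in the usage note are invented and are not data from this study.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of sequence lengths (bp).

    Sort lengths from longest to shortest and return the first length at
    which the running total reaches at least half of all nucleotides.
    """
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding total / 2
            return length
    raise AssertionError("unreachable: running total always reaches total")
```

For the toy lengths `[2, 3, 4, 5, 6, 7, 8]` (35 bp in total), the cumulative sum reaches 21 bp at the 6 bp sequence, so `n50` returns 6.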
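The DEG analysis normalizes expression with RPKM [11] and then applies the two thresholds stated in the Methods. The sketch below uses the standard RPKM definition, RPKM = 10⁹ × C / (N × L). Here C is the number of reads mapped to a unigene, N is the total number of mapped reads in the library, and L is the unigene length in bp. The text does not say how genes detected in only one library were handled. As an assumption, such genes receive an infinite log2 ratio so that they pass the fold-change test. The FDR value is taken as an input rather than computed. All function names are hypothetical.

```python
import math


def rpkm(mapped_reads: int, total_mapped_reads: int, length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return mapped_reads * 1e9 / (total_mapped_reads * length_bp)


def log2_ratio(petal: float, labella: float) -> float:
    """log2(petal / labella); +/-inf if the gene is seen in only one library."""
    if petal == 0 and labella == 0:
        return 0.0  # not expressed in either library
    if labella == 0:
        return math.inf  # expressed only in petals (assumed handling)
    if petal == 0:
        return -math.inf  # expressed only in labella (assumed handling)
    return math.log2(petal / labella)


def is_deg(petal_rpkm: float, labella_rpkm: float, fdr: float,
           max_fdr: float = 0.001, min_abs_log2: float = 1.0) -> bool:
    """Apply FDR < 0.001 and |log2 Ratio| >= 1 (at least two-fold)."""
    return fdr < max_fdr and abs(log2_ratio(petal_rpkm, labella_rpkm)) >= min_abs_log2
```

For example, 500 reads on a 1000 bp unigene in a library of 10 million mapped reads gives `rpkm(500, 10_000_000, 1000) == 50.0`. A gene at 40 vs 10 RPKM with FDR 1e-4 passes `is_deg`, while 15 vs 10 RPKM does not.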