[Fig. 2 diagram (Human Sample Workflow Process): tissue samples pass through DNA/RNA extraction, library construction, pre-sequencing, sequencing, post-sequencing, pre-assembly, and assembly, with QC checks along the way. Responsible groups shown include DNA Resources, the Pre-Sequencing Lab, the Sequencing Lab, Content Systems, and the Assembly Team.]

[...] genome, and even a modest error rate can reduce the effectiveness of assembly. In addition, maintaining the validity of mate-pair information is absolutely critical for the algorithms described below.
Procedural controls were established for maintaining the validity of sequence mate-pairs as sequencing reactions proceeded through the process, including strict rules built into the LIMS. The accuracy of sequence data produced by the Celera process was validated in the course of the Drosophila genome project (26). By collecting data for the entire human genome in a single facility, we were able to ensure uniform quality standards and the cost advantages associated with automation, an economy of scale, and process consistency.

2 Genome Assembly Strategy and Characterization

Summary. We describe in this section the two approaches that we used to assemble the genome. One method involves the computational combination of all sequence reads with shredded data from GenBank to generate an independent, nonbiased view of the genome. The second approach involves clustering all of the fragments to a region or chromosome on the basis of mapping information. The clustered data were then shredded and subjected to computational assembly. Both approaches provided essentially the same reconstruction of assembled DNA sequence with proper order and orientation. The second method provided slightly greater sequence coverage (fewer gaps) and was the principal sequence used for the analysis phase. In addition, we document the completeness and correctness of this assembly process

Fig. 2. Flow diagram for sequencing pipeline. Samples are received, selected, and processed in compliance with standard operating procedures, with a focus on quality within and across departments. Each process has defined inputs and outputs with the capability to exchange samples and data with both internal and external entities according to defined quality guidelines. Manufacturing pipeline processes, products, quality control measures, and responsible parties are indicated and are described further in the text.
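Both approaches in the Summary depend on "shredding" longer sequence into short, overlapping fragments before assembly. The sketch below illustrates that idea only. The function name `shred` and the fragment length and step size are hypothetical values chosen for illustration; they are not parameters given in the text and do not represent the authors' implementation.

```python
def shred(sequence, read_len=500, step=250):
    """Cut a sequence into overlapping fragments.

    read_len and step are illustrative assumptions, not values from the text.
    Returns a list of (start_offset, fragment) tuples.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    # A sequence no longer than one fragment is returned whole.
    if len(sequence) <= read_len:
        return [(0, sequence)]
    starts = list(range(0, len(sequence) - read_len + 1, step))
    # Add one final fragment if the regular steps leave the tail uncovered.
    if starts[-1] + read_len < len(sequence):
        starts.append(len(sequence) - read_len)
    return [(s, sequence[s:s + read_len]) for s in starts]


if __name__ == "__main__":
    for offset, fragment in shred("ACGTACGTACG", read_len=4, step=3):
        print(offset, fragment)
```

With a step smaller than the fragment length, neighboring fragments overlap, which is what lets an assembler rejoin them. Keeping the start offset alongside each fragment is a convenience for checking the result here; it is not something the text says was retained.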
THE HUMAN GENOME. SCIENCE, VOL 291, 16 FEBRUARY 2001, p. 1308. www.sciencemag.org (downloaded from www.sciencemag.org on September 27, 2009)