construction. Volunteers of diverse backgrounds were accepted on a first-come, first-taken basis. Samples were obtained after discussion with a genetic counsellor and written informed consent. The samples were made anonymous as follows: the sampling laboratory stripped all identifiers from the samples, applied random numeric labels, and transferred them to the processing laboratory, which then removed all labels and relabelled the samples. All records of the labelling were destroyed. The processing laboratory chose samples at random from which to prepare DNA and immortalized cell lines. Around 5–10 samples were collected for every one that was eventually used.
Because no link was retained between donor and DNA sample, the identity of the donors for the libraries is not known, even by the donors themselves. A more complete description can be found at http://www.nhgri.nih.gov/Grant_info/Funding/Statements/RFA/human_subjects.html.

During the pilot phase, centres showed that sequence-tagged sites (STSs) from previously constructed genetic and physical maps could be used to recover BACs from specific regions. As sequencing expanded, some centres continued this approach, augmented with additional probes from flow sorting of chromosomes to obtain long-range coverage of specific chromosomes or chromosomal regions (refs 89–94).

For the large-scale sequence production phase, a genome-wide physical map of overlapping clones was also constructed by systematic analysis of BAC clones representing 20-fold coverage of the human genome (ref. 86). Most clones came from the first three sections of the RPCI-11 library, supplemented with clones from sections of the RPCI-13 and CalTech D libraries (Table 1). DNA from each BAC clone was digested with the restriction enzyme HindIII, and the sizes of the resulting fragments were measured by agarose gel electrophoresis. The pattern of restriction fragments provides a 'fingerprint' for each BAC, which allows different BACs to be distinguished and the degree of overlap to be assessed. We used these restriction-fragment fingerprints to determine clone overlaps, and thereby assembled the BACs into fingerprint clone contigs.

The fingerprint clone contigs were positioned along the chromosomes by anchoring them with STS markers from existing genetic and physical maps. Fingerprint clone contigs were tied to specific STSs initially by probe hybridization and later by direct search of the sequenced clones. To localize fingerprint clone contigs that did not contain known markers, new STSs were generated and placed onto chromosomes (ref. 95). Representative clones were also positioned by fluorescence in situ hybridization (FISH) (ref. 86 and C. McPherson, unpublished).

We selected clones from the fingerprint clone contigs for sequencing according to various criteria. Fingerprint data were reviewed (refs 86, 90) to evaluate overlaps and to assess clone fidelity (to bias against rearranged clones; refs 83, 96). STS content information and BAC end sequence information were also used (refs 91, 92). Where possible, we tried to select a minimally overlapping set spanning a region. However, because the genome-wide physical map was constructed concurrently with the sequencing, continuity in many regions was low in early stages. These small fingerprint clone contigs were nonetheless useful in identifying validated, nonredundant clones

articles   866   NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com

Table 1 Key large-insert genome-wide libraries

The last three columns refer to sequenced clones used in construction of the draft genome sequence.

| Library name* | GenBank abbreviation | Vector type | Source DNA | Library segment or plate numbers | Enzyme digest | Average insert size (kb) | Total number of clones in library | Number of fingerprinted clones† | BAC-end sequence (ends/clones/clones with both ends sequenced)‡ | Number of clones in genome layout§ | Number‖ | Total bases (Mb)¶ | Fraction of total from library |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Caltech B | CTB | BAC | 987SK cells | All | HindIII | 120 | 74,496 | 16 | 2/1/1 | 528 | 518 | 66.7 | 0.016 |
| Caltech C | CTC | BAC | Human sperm | All | HindIII | 125 | 263,040 | 144 | 21,956/14,445/7,255 | 621 | 606 | 88.4 | 0.021 |
| Caltech D1 (CITB-H1) | CTD | BAC | Human sperm | All | HindIII | 129 | 162,432 | 49,833 | 403,589/226,068/156,631 | 1,381 | 1,367 | 185.6 | 0.043 |
| Caltech D2 (CITB-E1) | | BAC | Human sperm | All | | | | | | | | | |
| | | | | 2,501–2,565 | EcoRI | 202 | 24,960 | | | | | | |
| | | | | 2,566–2,671 | EcoRI | 182 | 46,326 | | | | | | |
| | | | | 3,000–3,253 | EcoRI | 142 | 97,536 | | | | | | |
| RPCI-1 | RP1 | PAC | Male, blood | All | MboI | 110 | 115,200 | 3,388 | | 1,070 | 1,053 | 117.7 | 0.028 |
| RPCI-3 | RP3 | PAC | Male, blood | All | MboI | 115 | 75,513 | | | 644 | 638 | 68.5 | 0.016 |
| RPCI-4 | RP4 | PAC | Male, blood | All | MboI | 116 | 105,251 | | | 889 | 881 | 95.5 | 0.022 |
| RPCI-5 | RP5 | PAC | Male, blood | All | MboI | 115 | 142,773 | | | 1,042 | 1,033 | 116.5 | 0.027 |
| RPCI-11 | RP11 | BAC | Male, blood | All | | 178 | 543,797 | 267,931 | 379,773/243,764/134,110 | 19,405 | 19,145 | 3,165.0 | 0.743 |
| | | | | 1 | EcoRI | 164 | 108,499 | | | | | | |
| | | | | 2 | EcoRI | 168 | 109,496 | | | | | | |
| | | | | 3 | EcoRI | 181 | 109,657 | | | | | | |
| | | | | 4 | EcoRI | 183 | 109,382 | | | | | | |
| | | | | 5 | MboI | 196 | 106,763 | | | | | | |
| Total of top eight libraries | | | | | | | 1,482,502 | 321,312 | 805,320/484,278/297,997 | 25,580 | 25,241 | 3,903.9 | 0.916 |
| Total all libraries | | | | | | | | 354,510 | 812,594/488,017/100,775 | 30,445 | 29,298 | 4,260.5 | 1 |

* For the CalTech libraries (ref. 82), see http://www.tree.caltech.edu/lib_status.html; for RPCI libraries (ref. 83), see http://www.chori.org/bacpac/home.htm.
† For the FPC map and fingerprinting (refs 84–86), see http://genome.wustl.edu/gsc/human/human_database.shtml.
‡ The number of raw BAC end sequences (ends/clones/clones with both ends sequenced) available for use in human genome sequencing. Typically, for clones in which sequence was obtained from both ends, more than 95% of both end sequences contained at least 100 bp of nonrepetitive sequence. BAC-end sequencing of RPCI-11 and of the CalTech libraries was done at The Institute for Genomic Research, the California Institute of Technology and the University of Washington High Throughput Sequencing Center. The sources for the Table were http://www.ncbi.nlm.nih.gov/genome/clone/BESstat.shtml and refs 87, 88.
§ These are the clones in the sequenced-clone layout map (http://genome.wustl.edu/gsc/human/Mapping/index.shtml) that were pre-draft, draft or finished.
‖ The number of sequenced clones used in the assembly. This number is less than that in the previous column owing to removal of a small number of obviously contaminated, combined or duplicated projects. In addition, not all of the clones from completed chromosomes 21 and 22 were included here, because only the available finished sequence from those chromosomes was used in the assembly.
¶ The number reported is the total sequence from the clones indicated in the previous column. Potential overlap between clones was not removed here, but Ns were excluded.

© 2001 Macmillan Magazines Ltd
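The overlap step described in the text compares HindIII fingerprints to determine clone overlaps and then merges overlapping clones into fingerprint clone contigs. The Python sketch below illustrates that logic. It is a minimal illustration under stated assumptions, not the consortium's procedure. The FPC software behind the map cited in Table 1 scores overlaps with the Sulston score, which estimates the probability that shared bands match by coincidence. This sketch swaps in a plain count of shared fragment sizes instead. The function names `shared_bands` and `assemble_contigs`, the use of base-pair sizes, the matching tolerance `tol` and the threshold `min_shared` are all hypothetical.

```python
from itertools import combinations


def shared_bands(a, b, tol):
    """Count fragments of fingerprint `a` that pair with a distinct fragment of `b`
    whose size differs by at most `tol`. Both lists are sorted first, and a
    two-pointer sweep pairs the smallest compatible sizes."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1  # sizes agree within tolerance: count as a shared band
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to match b[j] or any later fragment of b
        else:
            j += 1  # b[j] is too small to match a[i] or any later fragment of a
    return matches


def assemble_contigs(fingerprints, min_shared, tol):
    """Group clones into contigs: any pair sharing at least `min_shared` bands is
    treated as overlapping, and overlapping pairs are merged with union-find."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for x, y in combinations(fingerprints, 2):
        if shared_bands(fingerprints[x], fingerprints[y], tol) >= min_shared:
            parent[find(x)] = find(y)

    contigs = {}
    for name in fingerprints:
        contigs.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in contigs.values())


if __name__ == "__main__":
    # Hypothetical HindIII fragment sizes (bp) for four clones.
    fps = {
        "A": [500, 1200, 2300, 3100, 4500],
        "B": [1205, 2290, 3100, 4510, 6000],
        "C": [3095, 4505, 6010, 7000, 8200],
        "D": [900, 1800, 5200, 9900],
    }
    print(assemble_contigs(fps, min_shared=3, tol=20))  # [['A', 'B', 'C'], ['D']]
```

In the example, clones A and C share only two bands. They still land in the same contig because each overlaps B. This transitive, single-linkage merging is what lets contigs grow, and it is also why a single false-positive overlap can chain unrelated clones together.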
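The text also notes that, where possible, a minimally overlapping set of clones spanning a region was selected for sequencing. The source does not say how that set was chosen. One standard way to approximate the choice, offered here only as an assumption, is greedy interval cover: given each clone's left and right position on a contig, repeatedly pick the clone that extends coverage furthest. This returns the fewest clones that cover the region, which tends to keep overlaps small but does not explicitly minimize them. The function name `minimal_tiling_path` and the coordinate units are hypothetical.

```python
def minimal_tiling_path(clones, start, end):
    """Pick the fewest clones whose (left, right) intervals jointly cover [start, end].

    `clones` maps a clone name to its (left, right) position on a contig.
    Returns the chosen names from left to right, or None if a gap leaves part
    of the region uncovered.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered = start  # right edge of the region covered so far
    i = 0
    while covered < end:
        best = None
        # Among clones that begin at or before the covered edge, keep the one
        # reaching furthest to the right.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            if best is None or ordered[i][1][1] > best[1][1]:
                best = ordered[i]
            i += 1
        if best is None or best[1][1] <= covered:
            return None  # no clone extends coverage: the region has a gap
        path.append(best[0])
        covered = best[1][1]
    return path


if __name__ == "__main__":
    # Hypothetical clone positions (kb) along one fingerprint clone contig.
    contig = {
        "c1": (0, 150),
        "c2": (100, 260),
        "c3": (120, 200),
        "c4": (240, 400),
        "c5": (250, 330),
    }
    print(minimal_tiling_path(contig, 0, 400))  # ['c1', 'c2', 'c4']
```

Returning None for a gap mirrors the situation the text describes early in the project: small, discontinuous contigs could still yield validated clones but could not yet supply a continuous path across a region.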