Massachusetts Institute of Technology: Animal Genetics (Genetics) course teaching resources (English lecture notes), Lecture 10: Analysis of Gene Sequences

Classically, genes are identified by their function. That is, the existence of a gene is recognized because mutations in the gene give an observable phenotypic change. Historically, many genes have been discovered because of their effects on phenotype.

Now, in the era of genomic sequencing, many genes of no known function can be detected by looking for patterns in DNA sequences. The simplest method, which works for bacterial and phage genes (but not for most eukaryotic genes, as we will see later), is to look for stretches of sequence that lack stop codons. These are known as "open reading frames" or ORFs. This works because 3 of the 64 possible codons are stop codons, so a random sequence should contain an average of one stop codon in every 21 codons. Thus, the probability of a random occurrence of even a short open reading frame of, say, 100 codons without a stop codon is very small: $(61/64)^{100} \approx 8.2 \times 10^{-3}$.

Identifying genes in DNA sequences from higher organisms is usually more difficult than in bacteria. This is because in humans, for example, gene coding sequences are separated by long sequences that do not code for proteins. Moreover, genes of higher eukaryotes are interrupted by introns, which are sequences that are spliced out of the RNA before translation. The presence of introns breaks up the open reading frames into short segments, making them much harder to distinguish from non-coding sequences.

The maps below show 50 kbp segments of DNA from yeast, Drosophila, and humans. The dark grey boxes represent coding sequences and the light grey boxes represent introns. The boxes above the line are transcribed to the right and the boxes below are transcribed to the left. Names have been assigned to each of the identified genes. Although the yeast genes are much like those of bacteria (few introns and packed closely together), the Drosophila and human genes are spread apart and interrupted by many introns. Sophisticated computer algorithms were used to identify these dispersed gene sequences.
[Figure: gene maps of 50 kbp DNA segments (scale 0 to 50 kbp) from three organisms, with gene labels:
Saccharomyces cerevisiae: YFL046W, YFL040W, YFL030W, RGD2, FET5, TUB2, RPO41, YFL034W, HAC1, STE2, SEC53, ACT1, MOB2, RIM15, CAK1, BST1, EPL1, YFL044C, YPT1, RPL22B, CAF16, YFL042C, GYP8
Drosophila melanogaster: CG3131, CG16987, CG2964, CG15400, CG3123, syt
Human: GATA1, HDAC6, LOC139168, PCSK1N]
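The ORF argument above can be made concrete with a short sketch. The Python below is an illustration only, not the gene-finding software referred to in the notes. It assumes the standard genetic code (stop codons TAA, TAG, TGA), treats an ORF simply as a run of codons without a stop, as the text does, and does not require a start codon. The function names `prob_no_stop`, `reverse_complement`, and `stop_free_stretches` are hypothetical.

```python
# Assumption: standard genetic code, written as DNA triplets.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def prob_no_stop(n_codons):
    """Chance that n random codons contain no stop (61 of 64 codons are not stops)."""
    return (61 / 64) ** n_codons


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def stop_free_stretches(seq, min_codons):
    """Yield (strand, frame, start, end) for stop-free runs of >= min_codons codons.

    Coordinates are 0-based offsets on the strand being scanned; end is exclusive.
    Both strands are scanned because genes can be transcribed in either direction.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            # Position just past the last complete codon in this frame.
            last = frame + 3 * ((len(s) - frame) // 3)
            run_start = frame
            for pos in range(frame, last, 3):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - run_start) // 3 >= min_codons:
                        yield (strand, frame, run_start, pos)
                    run_start = pos + 3  # next run begins after the stop codon
            # A run may also extend to the end of the sequence.
            if (last - run_start) // 3 >= min_codons:
                yield (strand, frame, run_start, last)


if __name__ == "__main__":
    # Reproduces the estimate in the text: about 8.2e-3 for 100 codons.
    print(prob_no_stop(100))
```

For a bacterial or phage sequence, a call such as `list(stop_free_stretches(dna, 100))` would list candidate ORFs in all six reading frames. As the text explains, the same scan is much less informative on intron-rich eukaryotic DNA, where coding sequence is broken into short segments.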

DNA Sequencing

Consider a segment of DNA about 1000 base pairs long that we wish to sequence.

(1) The two DNA strands are separated. This is done by heating to 100 °C, which melts the base-pairing hydrogen bonds that hold the strands together.
(2) A short oligonucleotide (ca. 18 bases) designed to be complementary to the end of one of the strands is allowed to anneal to the single-stranded DNA. The resulting DNA hybrid looks much like the general polymerase substrate shown previously.
(3) DNA polymerase is added along with the four nucleotide precursors (dATP, dGTP, dCTP, and dTTP). The mixture is then divided into four separate reactions, and to each reaction a small quantity of a different dideoxynucleotide precursor is added. Dideoxynucleotide precursors are abbreviated ddATP, ddGTP, ddCTP, and ddTTP.
(4) The polymerase reactions are allowed to proceed and, using one of a variety of methods, radiolabel is incorporated into the newly synthesized DNA.
(5) After the DNA polymerase reactions are complete, the samples are melted and run on a gel system that allows DNA strands of different lengths to be resolved. The DNA sequence can be read from the gel by noting the positions of the radiolabeled fragments.

The crucial element of the sequencing reactions is the added dideoxynucleotides. These molecules are identical to the normal nucleotide precursors in all respects except that they lack a hydroxyl group at their 3' position (3' OH).

[Figure: chemical structures of a dNTP and a ddNTP]

Thus dideoxynucleotides can be incorporated into DNA, but once a dideoxynucleotide has been incorporated, further elongation stops because the resulting DNA no longer has a free 3' OH end. Each of the four reactions contains one of the dideoxynucleotides, added at about 1% of the concentration of the normal nucleotide precursors. Thus, for example, in the reaction with added ddATP, about 1% of the elongated chains will terminate at the position of each A in the sequence.
Once all of the elongating chains have been terminated there will be a population of labeled chains that have terminated at the position of each A in the sequence
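
The logic of the four dideoxy reactions can be sketched in a few lines of Python. This is a simplified illustration, not a model of the actual chemistry. It assumes the newly synthesized strand is given directly as a string, measures fragment lengths in bases past the primer, and treats every occurrence of a base as terminating independently with probability p = 0.01 (the "about 1%" in the text). All function names are hypothetical.

```python
def termination_lengths(new_strand, dd_base):
    """Lengths of chains that end at each occurrence of dd_base in the new strand."""
    return [i + 1 for i, base in enumerate(new_strand.upper()) if base == dd_base]


def run_four_reactions(new_strand):
    """One lane per dideoxynucleotide: ddATP, ddCTP, ddGTP, ddTTP."""
    return {base: termination_lengths(new_strand, base) for base in "ACGT"}


def read_gel(lanes):
    """Read the sequence by ordering bands from shortest to longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def fraction_terminating(k, p=0.01):
    """Fraction of chains stopping at the k-th occurrence of the base.

    Assumption: each occurrence terminates independently with probability p,
    so a chain must pass the first k - 1 occurrences before stopping at the k-th.
    """
    return p * (1 - p) ** (k - 1)


if __name__ == "__main__":
    lanes = run_four_reactions("GATTACA")
    print(lanes["A"])       # fragment lengths seen in the ddATP lane
    print(read_gel(lanes))  # sequence reconstructed from all four lanes
```

Because each position in the new strand appears as a band in exactly one lane, sorting the bands by length recovers the sequence, which is what reading the gel from bottom to top accomplishes. Under the independence assumption the band intensity falls off only slowly with length, which is why a low dideoxynucleotide concentration lets long fragments be represented.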