Lecture 19 EUKARYOTIC GENES AND GENOMES I

For the last several lectures we have been looking at how one can manipulate prokaryotic genomes and how prokaryotic genes are regulated. In the next several lectures we will be considering eukaryotic genes and genomes, and how model eukaryotic organisms are used to study eukaryotic gene function. During the course of the next six lectures we will think about the genes and genomes of some commonly used model organisms, the yeast Saccharomyces cerevisiae and the mouse Mus musculus. But first let's look at how the genes and genomes of these organisms compare to E. coli at one extreme, and humans at the other.

Species           Chromosomes   Map (cM)   DNA content/    Year sequence      Genes/     Genes have
                                           haploid (Mb)    completed          haploid    introns?
E. coli           1             N/A        5               1997               4,200      no
S. cerevisiae     16            4000       12              1997               5,800      rarely
C. elegans        6             300        100             1998               19,000     nearly all
D. melanogaster   4             280        180             2000               14,000     nearly all
M. musculus       20            1700       3000            2002 draft,        22,500?    nearly all
                                                           2005 finished?
H. sapiens        23            3300       3000            2001 draft,        22,500?    nearly all
                                                           2003 finished

Note: genome = DNA content of a complete haploid set of chromosomes = DNA content of a gamete (sperm or egg)
cM = centiMorgan = 1% recombination
Mb = megabase = 1 million base-pairs of DNA
Kb = kilobase = 1 thousand base-pairs of DNA

Let's think about the number of genes in an organism and the size of the organism's genome. The average protein is about 300 amino acids long, requiring 300 triplet codons (900 bp), or roughly 1 Kb of DNA. Thus it makes sense that to encode 4,200 genes E. coli requires a genome of about 5 million base pairs. However, the human genome encodes about 22,500 proteins, which should require a genome of, let's say, 25 million base pairs. Instead, humans have a genome of ~3,000 million base pairs, or ~3,000 Mb, i.e., ~3 billion base pairs. In other words, there is about 100-fold more DNA in the human genome than is required for encoding 22,500 proteins. What is it all doing? Some of it constitutes promoters upstream of each gene, some is structural DNA around centromeres
and telomeres (the ends of chromosomes), some is simply intergenic DNA (noncoding regions between genes), but much of it is present as introns.

What does it mean that "genes have introns"? This represents one of the fundamental organizational differences between prokaryotic and eukaryotic genes. Eukaryotic genes turn out to be interrupted by long DNA sequences that do not encode protein; these "intervening sequences" are called introns. The DNA segments that are ultimately expressed as protein, i.e., the DNA sequence that contains triplet codon information, are called exons. The intronic sequences are removed from the primary transcript by splicing.

[Figure: a gene on the chromosome (ds DNA), with exons 1, 2 and 3 separated by introns -> transcription -> primary transcript (ss RNA) -> addition of 5' cap (MeG), 3' polyadenylation (AAAAA) and splicing out of introns -> mRNA (ss RNA) with exons 1-2-3 joined, read from AUG to stop -> translation -> protein (amino acids)]

A major consequence of this arrangement is the potential for alternative splicing to produce different protein species from the same gene and primary transcript. This can greatly amplify the complexity of mammals (and other eukaryotes), giving many thousands more possible proteins. Note that lower eukaryotes such as the yeast S. cerevisiae have only ~5% of their genes interrupted by introns, whereas in multicellular organisms like humans >90% of all genes are interrupted by anywhere between 2 and 60 introns, with most genes having between 5 and 12 introns.

[Figure: comparable stretches of chromosome from Saccharomyces cerevisiae (about two dozen labeled genes, e.g. ACT1, TUB2, HAC1, STE2), Drosophila melanogaster (six labeled genes, e.g. syt) and human (four labeled genes: GATA1, HDAC6, LOC139168, PCSK1N), drawn on a common 0-50 scale.] Figure by MIT OCW
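The two ideas above can be sketched in a few lines of Python. Splicing joins the exons and discards the introns. Alternative splicing can then give more than one mRNA from the same primary transcript. The sketch models only exon skipping, which is one of several known modes. The toy sequence, its coordinates, and the function names `splice` and `exon_skipping_isoforms` are all hypothetical. The sequence was chosen so that each exon is a whole number of codons.

```python
from itertools import combinations

def splice(primary: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a primary transcript, dropping the introns.

    Exons are given as 0-based, half-open (start, end) coordinates.
    """
    return "".join(primary[start:end] for start, end in exons)

def exon_skipping_isoforms(exons: list[tuple[int, int]]):
    """Yield exon sets that always keep the first and last exon and any
    subset of the internal exons (a simplified cassette-exon model)."""
    first, *internal, last = exons
    for k in range(len(internal), -1, -1):        # largest isoform first
        for kept in combinations(internal, k):
            yield [first, *kept, last]

# Hypothetical toy primary transcript: exons in upper case, introns in lower
# case; each intron starts with GU and ends with AG.
pre_mrna = "AUGGCC" + "guaaag" + "GAAUUC" + "guacag" + "CCGUAA"
exons = [(0, 6), (12, 18), (24, 30)]

for iso in exon_skipping_isoforms(exons):
    print(splice(pre_mrna, iso))
# AUGGCCGAAUUCCCGUAA   (exons 1-2-3)
# AUGGCCCCGUAA         (exon 2 skipped)
```

Both isoforms begin with AUG and end with the UAA stop codon. The skipped exon is six nucleotides long, so skipping it removes two codons without shifting the reading frame.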
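The genome-size argument at the start of this lecture is simple arithmetic. The Python sketch below redoes it for every species in the table, using the lecture's rounded figure of ~1 Kb of coding DNA per gene. It counts only codons, with no promoters, introns or other noncoding DNA, so it is a rough estimate rather than a measurement. The names `GENOMES`, `coding_mb` and `excess_fold` are illustrative.

```python
# Back-of-the-envelope estimate from the lecture: an average protein of ~300
# amino acids needs 300 codons x 3 bp = 900 bp, rounded to ~1 kb per gene.
BP_PER_GENE = 1_000

# (genes per haploid genome, haploid DNA content in Mb), from the table in this lecture
GENOMES = {
    "E. coli": (4_200, 5),
    "S. cerevisiae": (5_800, 12),
    "C. elegans": (19_000, 100),
    "D. melanogaster": (14_000, 180),
    "M. musculus": (22_500, 3_000),
    "H. sapiens": (22_500, 3_000),
}

def coding_mb(genes: int) -> float:
    """Mb of DNA needed just to hold the codons of `genes` average proteins."""
    return genes * BP_PER_GENE / 1_000_000

def excess_fold(genes: int, genome_mb: float) -> float:
    """How many times larger the genome is than that coding requirement."""
    return genome_mb / coding_mb(genes)

for species, (genes, mb) in GENOMES.items():
    print(f"{species:<16} needs ~{coding_mb(genes):5.1f} Mb, "
          f"has {mb:>5} Mb: ~{excess_fold(genes, mb):.0f}x")
```

For E. coli the ratio comes out near 1 (5 / 4.2 ≈ 1.2). For humans it is about 133 (3,000 / 22.5), the same order of magnitude as the ~100-fold excess quoted in the text.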
Gene Regulation in Yeast

In the next few lectures we will consider how eukaryotic genes and genomes can be manipulated and studied, and we will begin with an example of examining how genes are regulated in S. cerevisiae. First, let's figure out how to use some neat genetics to identify some regulated genes, and in the next lecture we will figure out how one can use genetics to dissect the mechanism of that regulation.

Characterizing function and regulation of S. cerevisiae genes: For this we are going to combine a few neat genetic tools that you learned about in Prof. Kaiser's lectures, namely a library of yeast genomic fragments cloned into a bacterial plasmid, a modified transposon (mini-Tn7), and the lacZ gene embedded within the transposon. In this experiment the lacZ gene is going to be used as a reporter for the transcriptional activity of yeast genes.

[Figure: Mini-Tn7 = Tn7TR - lacZ - URA3 - tet - Tn7TR. Tn7TR ends: required for transposition; lacZ: reporter of transcription; URA3: selection in yeast; tet: selection in E. coli. The mini-Tn7 is inserted into yeast genomic DNA in E. coli, and the same insertion is then carried into yeast.]

[Figure: Tn7 donor + yeast genomic plasmid library in E. coli -> random yeast insertion library]

The mini-Tn7 is introduced into a population of E. coli that harbor a plasmid library of the S. cerevisiae genome; i.e., each E. coli cell is home to a plasmid that contains a different segment of the S. cerevisiae genome, such that the whole genome is represented many times over in this population of E. coli. The mini-Tn7 is allowed to transpose by integrating into either the plasmid DNA or the bacterial DNA. The original donor DNA that carries the mini-Tn7 cannot replicate, so the only cells that become tetracycline resistant are those that have integrated the mini-Tn7 into the plasmid or the E. coli chromosome; these are selected as tetracycline-resistant colonies. Plasmid DNA is purified from these transformants and retransformed into tetracycline-sensitive E. coli; the resulting tetracycline-resistant bacteria harbor only plasmids that have an integrated mini-Tn7 transposon. Plasmid is isolated
from these cells, and the yeast genomic fragments are released by digestion with an appropriate restriction enzyme. So now we have a library of yeast genomic fragments, each of which has the transposon inserted; these genomic fragments can be transformed into S. cerevisiae cells that are ura3-. Each Ura+ transformant colony will have recombined a Tn7 transposon-containing genomic fragment into its genome. This essentially gives us a library of yeast with transposons randomly integrated into their genome.

Note that the lacZ gene in the transposon does not carry its own transcription or translation start site. But if the transposon inserts downstream of a yeast gene promoter in the correct orientation and in the correct triplet codon reading frame, the lacZ gene comes under the control of that promoter. When transcription is activated from that promoter a LacZ-fusion protein is expressed, and most LacZ-fusion proteins display robust β-galactosidase activity.
• One in two insertions will be in the incorrect orientation and will not produce a LacZ-fusion protein
• Only one in three correct-orientation insertions can produce a LacZ-fusion protein
• At most, only one in six insertions produces a functional LacZ-fusion protein

[Figure: mini-Tn7 (Tn7TR, lacZ, URA3) inserted downstream of the promoter, transcription start and translation start (AUG in the mRNA) of gene X. The fusion protein runs N- gene X encoded amino acids, mini-Tn7 encoded amino acids, LacZ encoded amino acids -C, and has β-galactosidase activity.]

Yeast cells expressing β-galactosidase activity can easily be detected by growth in the presence of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside, better known as X-gal. LacZ cleaves X-gal to release a chemical moiety that, once oxidized, gives a brilliant blue color...and so the colonies turn bright blue!

There are at least two useful things to come out of such a collection of yeast strains:
(1) Any transposon that integrated into a gene will essentially disrupt that gene and is likely to cause a null mutation.
(2) For transposons that integrate into a yeast gene such that the lacZ gene is in frame with the gene's coding region, the level of β-galactosidase activity in these cells becomes a reporter for the transcription of that gene.
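The "one in six" bound is the product of two independent odds. The Python sketch below enumerates them under simplifying assumptions. An insertion within the coding region of gene X is taken to be equally likely to land in either orientation. It is also taken to be equally likely to land in any of the three reading frames. The names `ORIENTATIONS`, `FRAMES` and `makes_fusion` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ORIENTATIONS = ("same as gene X", "opposite to gene X")
FRAMES = (0, 1, 2)   # codon offset of the gene X / mini-Tn7 junction

def makes_fusion(orientation: str, frame: int) -> bool:
    """lacZ is expressed as a fusion only if it points the same way as gene X
    and continues gene X's triplet reading frame (offset 0 here)."""
    return orientation == ORIENTATIONS[0] and frame == 0

# Assume every orientation/frame combination is equally likely.
outcomes = list(product(ORIENTATIONS, FRAMES))
p_fusion = Fraction(sum(makes_fusion(o, f) for o, f in outcomes), len(outcomes))

p_right_orientation = Fraction(1, len(ORIENTATIONS))     # 1/2
p_in_frame_given_orientation = Fraction(1, len(FRAMES))  # 1/3
assert p_fusion == p_right_orientation * p_in_frame_given_orientation
print(p_fusion)  # 1/6
```

Some insertions land outside coding regions, and some produce fusions with little β-galactosidase activity. Both can only lower this fraction, which is why the text says "at most".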
Here are just two examples of how such a library can be used: (1) to identify genes that protect cells against a DNA-damaging agent that causes cancer (let's take as an example one of the many compounds found in tobacco smoke); and (2) to identify genes whose transcription is up-regulated in response to being exposed to this tobacco smoke chemical.

The chemical we'll use as an example is 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK). The yeast random insertion library is first plated out so that individual cells give rise to colonies; these colonies are then replicated onto test plates. To screen the library for genes that protect against the cell killing that can be induced by NNK, the colonies are replica plated onto agar medium that either does or does not contain a high dose of NNK. To screen the library for genes that are transcriptionally regulated in the presence of this nasty carcinogenic compound, the colonies are replica plated onto agar medium containing either X-gal alone or X-gal plus a low dose of NNK.

[Figure: Random library of Tn7lacZ insertion mutants. Left, phenotypic screen for NNK sensitivity: minus NNK vs. plus NNK (high dose), with an NNK-sensitive strain indicated. Right, screen for NNK-regulated genes: X-gal vs. X-gal + NNK (low dose).]

Interesting colonies can be retrieved from the master plate for further study and for identification (and subsequent cloning) of the gene responsible for the interesting phenotype. Once we have identified a gene that is transcriptionally up- or down-regulated in response to an environmental change, how can we use genetics to figure out how regulation is achieved? This is the topic of the next lecture.
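The two replica-plate comparisons amount to simple per-colony rules. Below is a minimal Python sketch. It assumes each colony has been scored by eye for growth and for blue intensity on an arbitrary integer scale. The `Colony` fields, the scoring scale and the four example colonies are hypothetical and are not data from this screen.

```python
from dataclasses import dataclass

@dataclass
class Colony:
    """Replica-plate readouts for one colony (hypothetical scoring scheme)."""
    name: str
    grows_minus_nnk: bool   # growth on the control plate without NNK
    grows_high_nnk: bool    # growth on the high-dose NNK plate
    blue_xgal: int          # blue intensity on X-gal alone (0 = white)
    blue_xgal_nnk: int      # blue intensity on X-gal + low-dose NNK

def nnk_sensitive(c: Colony) -> bool:
    # Grows without NNK but not with it: the disrupted gene may protect against NNK.
    return c.grows_minus_nnk and not c.grows_high_nnk

def regulation(c: Colony) -> str:
    # Compare the lacZ reporter with and without NNK for the same colony.
    if c.blue_xgal_nnk > c.blue_xgal:
        return "induced"
    if c.blue_xgal_nnk < c.blue_xgal:
        return "repressed"
    return "unchanged"

# Made-up readouts for four colonies, purely to exercise the logic.
colonies = [
    Colony("A", True, True, 0, 0),
    Colony("B", True, False, 1, 1),
    Colony("C", True, True, 1, 3),
    Colony("D", True, True, 3, 1),
]
print([c.name for c in colonies if nnk_sensitive(c)])  # ['B']
print([(c.name, regulation(c)) for c in colonies])
# [('A', 'unchanged'), ('B', 'unchanged'), ('C', 'induced'), ('D', 'repressed')]
```

Colonies flagged by either rule are the ones to retrieve from the master plate.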