Molecular Biology Problem Solver: A Laboratory Guide. Edited by Alan S. Gerstein
Copyright © 2001 by Wiley-Liss, Inc. ISBNs: 0-471-37972-7 (Paper); 0-471-22390-5 (Electronic)

16 Eukaryotic Expression
John J. Trill, Robert Kirkpatrick, Allan R. Shatzman, and Alice Marcy

Section A: A Practical Guide to Eukaryotic Expression
  Planning the Eukaryotic Expression Project
    What Is the Intended Use of the Protein and What Quantity Is Required?
    What Do You Know about the Gene and the Gene Product?
    Can You Obtain the cDNA?
    Expression Vector Design and Subcloning
    Selecting an Appropriate Expression Host
    Selecting an Appropriate Expression Vector
  Implementing the Eukaryotic Expression Experiment
    Media Requirements, Gene Transfer, and Selection
    Scale-up and Harvest
    Gene Expression Analysis
  Troubleshooting
    Confirm Sequence and Vector Design
    Investigate Alternate Hosts
  A Case Study of an Expressed Protein from cDNA to Harvest
  Summary
Section B: Working with Baculovirus
  Planning the Baculovirus Experiment
    Is an Insect Cell System Suitable for the Expression of Your Protein?
    Should You Express Your Protein in an Insect Cell Line or Recombinant Baculovirus?
    Procedures for Preparing Recombinant Baculovirus
    Criteria for Selecting a Transfer Vector
    Which Insect Cell Host Is Most Appropriate for Your Situation?
  Implementing the Baculovirus Experiment
    What's the Best Approach to Scale-Up?
    What Special Considerations Are There for Expressing Secreted Proteins?
    What Special Considerations Are There for Expressing Glycosylated Proteins?
    What Are the Options for Expressing More Than One Protein?
    How Can You Obtain Maximal Protein Yields?
    What Is the Best Way to Process Cells for Purification?
  Troubleshooting
    Suboptimal Growth Conditions
    Viral Production Problems
    Mutation
    Solubility Problems
  Summary
Bibliography

SECTION A: A PRACTICAL GUIDE TO EUKARYOTIC EXPRESSION

Recombinant gene expression in eukaryotic systems is often the only viable route to the large-scale production of authentic, posttranslationally modified proteins. It is becoming increasingly easy to find a suitable system to overexpress virtually any gene product, provided that it is properly engineered into an appropriate expression vector. Commercially available systems provide a wide range of possibilities for expression in mammalian, insect, and lower eukaryotic hosts, each claiming the highest possible expression levels with the least amount of effort. Indeed, many of these systems do offer vast improvements in their ease of use and rapid end points over technologies available as recently as 5 to 10 years ago. In addition, methods of transferring DNA into cells have advanced in parallel, enabling transfection efficiencies approaching 100%. However, one still needs to carefully consider the most
appropriate vector and host system that is compatible with a particular expression need. This will largely depend on the type of protein being expressed (e.g., secreted, membrane-bound, or intracellular) and its intended use. No one system can or should be expected to meet all expression needs.

In this section we will attempt to outline the critical steps involved in the planning and implementation of a successful eukaryotic expression project. Planning the project will begin by answering pertinent questions such as what is known about the protein being expressed, what is its function, what is the intended use of the product, will the protein be tagged, how much protein is needed, and how soon will it be needed. Based on these considerations, an appropriate host or vector system can be chosen that will best meet the anticipated needs. Considerations during the implementation phase of the project will include the best method of gene transfer, stable compared to transient expression, selection methods for stable lines, and clonal compared to polyclonal selection. Finally, we will discuss anticipated outcomes from various methods, commonly encountered problems, and possible solutions to these problems.

PLANNING THE EUKARYOTIC EXPRESSION PROJECT

What Is the Intended Use of the Protein and What Quantity Is Required?

Protein quantity is an important consideration, since substantial time and effort are required to achieve gram quantities, while production of 10 to 100 milligrams is often easily obtained from a few liters of cell culture. Therefore we tend to group the expressed proteins into the following three categories: target, reagent, and therapeutic protein. This is helpful both in choosing an appropriate expression system and in determining how much is enough to meet immediate needs (Table 16.1).

Targets

Protein targets represent the majority of expressed proteins used in classical pharmaceutical drug discovery, which involves the configuration of a high-throughput screen (HTS) of a chemical or natural product library in order to find selective antagonists or agonists of the protein's biological activity. Protein targets include enzymes (e.g., kinases or proteases), receptors (e.g., 7
transmembrane, nuclear hormone, integrin), and their ligands and membrane transporters (e.g., ion channels). In basic terms, sufficient quantities of a protein target need to be supplied in order to run the HTS. The actual amounts depend on the size of a given library to be screened and the number of hits that are obtained, which will then need to be further characterized. As a rule of thumb, for purified proteins such as enzymes and receptor ligands, amounts around 10 mg are usually needed to support the screen. For nonpurified proteins such as receptors, one needs to think in terms of cell number and the growth properties of the cell line. For most cell lines, screens are configured by plating between 100,000 and 300,000 cells per milliliter. By way of example, a typical screen of one million compounds in multiwell formats (e.g., 96, 384, or 1536 well) could use between 0.5 and 1.5 × 10^9 cells. The smaller the volume of the screen, the fewer cells will be required.

Because protein targets require a finite amount of protein, one has the flexibility of choosing from virtually any expression system. Consequently the selection of the system for producing a target protein really depends on considerations other than quantity. The most important goal is to achieve a product with the highest possible biological activity. This will enable a screen to be configured with the least amount of protein and will give the best chance of establishing a screen with the highest possible signal-to-background ratio. Other considerations include the type of protein being expressed (e.g., intracellular, secreted, and membrane-associated proteins). As discussed below, stable cell systems tend to be more amenable to secreted and membrane-associated proteins, while intracellular proteins are often produced very efficiently from lytic systems such as baculovirus. Whatever system is used, it should be scaled appropriately to meet the needs of HTS.

Table 16.1 Categories of Expressed Proteins

Class of Protein   Examples                 Expression Amount         Appropriate System
Target             Enzymes and receptors    For screening: 10 mg      Stable insect
                                            For structural            Baculovirus
                                              studies: 100 mg         Mammalian
                                                                      Yeast
Reagent            Modifying enzymes        <10 mg                    Stable insect
                   Enzyme substrates                                  Baculovirus
                                                                      Mammalian
                                                                      Yeast
Therapeutic        Therapeutic monoclonal   g/L                       Mammalian (CHO,
                     antibody (mAb)                                     myelomas)
                   Cytokine
                   Hormone

A subset of target proteins are those that are used for structural studies. In order to grow crystals that are of sufficient quality to yield high-resolution structures, it is particularly important to begin with properly folded, processed, active protein. Proteins used for structural studies are often supplied at very high concentrations (>5 mg/ml) and must be free of heterogeneity. Glycosylation is often problematic because its addition and trimming tend to be heterogeneous (Hsieh and Robbins, 1984; Kornfeld and Kornfeld, 1985). As a result it is often necessary to enzymatically remove some or all of the carbohydrate before crystals can be formed. As a starting point, one often needs approximately 10 mg of absolutely pure protein so that crystallization conditions can be tested and optimized, with the total protein requirement often exceeding 100 mg.

In order to avoid the issue of glycosylation in structural studies altogether, one can express the protein in a glycosylation-deficient host (Stanley, 1989). Alternatively one can remove glycosylation sites by site-directed mutagenesis prior to expression. However, these are very empirical methods that do not often work well for a variety of reasons, including the need in some cases to maintain glycosylation for proper solubility. Thus, for direct expression of a nonglycosylated protein, a first-pass expression approach would likely involve a bacterial system in which high-level expression of nonglycosylated protein is more readily attained.

Reagents

A second category of expressed proteins is reagents. These are proteins that are not directly required to configure a screen but are needed to either evaluate compounds in secondary assays or to help produce a target protein itself.
Examples of reagent proteins include full-length substrates that are replaced by synthetic peptides for screening. Enzyme substrates themselves are often cleaved to produce biologically active species whose activities can be assessed in vitro. Reagent proteins can also include processing enzymes that are required for the in vitro activation of a purified protein (e.g., cleavage of a zymogen or phosphorylation by an upstream activating kinase). Also included in this category are gene orthologues from species other than the one being used in the screen, whose expression will be used to support animal studies and to determine the cross-species selectivity or activity of selected compounds.
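
The cell-number rule of thumb given earlier for screening nonpurified targets (100,000 to 300,000 cells per milliliter, and 0.5 to 1.5 × 10^9 cells for one million compounds) can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function name and the 5 µl per-well assay volume are assumptions that do not come from the text. That volume was chosen because it is the one that makes the quoted figures agree, and it would be typical only of a high-density plate format.

```python
def cells_for_screen(n_compounds, well_volume_ul, cells_per_ml):
    """Estimate total cells needed to plate one well per compound.

    n_compounds    -- number of compounds (one well each, no replicates)
    well_volume_ul -- plated volume per well in microliters (assumed value)
    cells_per_ml   -- plating density in cells per milliliter
    """
    well_volume_ml = well_volume_ul / 1000.0
    return n_compounds * well_volume_ml * cells_per_ml


N_COMPOUNDS = 1_000_000   # library size used in the text's example
WELL_VOLUME_UL = 5.0      # assumption: not stated in the text

low = cells_for_screen(N_COMPOUNDS, WELL_VOLUME_UL, 100_000)
high = cells_for_screen(N_COMPOUNDS, WELL_VOLUME_UL, 300_000)
print(f"Cells required: {low:.1e} to {high:.1e}")
```

Under these assumptions the estimate spans 0.5 to 1.5 × 10^9 cells, matching the range quoted for a one-million-compound screen. Larger well volumes, replicates, or control wells would raise the requirement proportionally.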

Reagent proteins are usually required in much lower amounts than target proteins. Some can even be purchased commercially in sufficient quantities to meet the required need. Others, because of price or the required quantity, may necessitate recombinant expression. But, since only small quantities are usually required (<10 mg), it is possible to choose an expression system with features that will favor efficient and rapid expression. Furthermore the expression scale can be minimized. The bottom line is that reagent proteins should be the least resource intensive to produce. One should avoid trying to overproduce reagent proteins or scaling them to quantities that will never be used.

Therapeutics

In contrast to reagent proteins, therapeutic protein agents are the most demanding in terms of resource. Therapeutic proteins have intrinsic biological properties like medical drugs. The ultimate objective for expression of a therapeutic protein is the production of clinical-grade protein approaching or exceeding gram per liter quantities. For most expression systems this is not readily achievable. Other than bacterial and yeast expression, the most robust system for producing these levels is the Chinese hamster ovary (CHO) system. Due to the lack of proper post-translational modifications (e.g., glycosylation) in bacteria and yeast, CHO cell expression is often the only choice to achieve sufficient expression. Examples of therapeutic proteins, produced in CHO cells, include humanized monoclonal antibodies (Trill, Shatzman, and Ganguly, 1995), tPA (tissue plasminogen activator; Spellman et al., 1989), and cytokines (Sarmiento et al., 1994). In many cases months are spent selecting and amplifying lines with appropriate growth properties and expression levels to meet production criteria.

What Do You Know about the Gene and the Gene Product?

Information about the gene product, or for that matter its homologues or orthologues, enables one to make an educated guess as to which eukaryotic expression system is best to use. Is there anything published in the literature about the gene, or is it completely uncharacterized? Do we know in what tissue the gene is expressed, based on either Northern blot analysis or quantitative or semiquantitative RT-PCR measures? Other factors to determine are whether the protein to be expressed is secreted, cytosolic, or membrane-bound. If it is a receptor, is it a homodimer, heterodimer, multimeric, single, or multispanning
transmembrane receptor or anchored to the surface (e.g., through a glycosylphosphatidylinositol (GPI) linkage)?

Fortunately we usually have the luxury of working with genes that are at least partially characterized by their biological properties. But what about the genes of unknown origin or function? In this new age of genomics, many of the genes we obtain are “like” genes, belonging to large families of related genes that share only a minimal percentage of homology with a known gene. Despite these similarities there is often no way to know whether the same expression and purification methods used for one orthologue or homologue will be effective for another. Thus one is immediately faced with the challenging prospect of having to consider multiple expression strategies in order to get the protein expressed and purified to sufficient levels in an active form, in addition to not knowing what activity to look for.

Can You Obtain the cDNA?

Before embarking on an expression project you will need to locate a cDNA copy of the gene of interest. It is also possible in theory to express genomic DNA containing introns, provided that the expression host will recognize the proper splice junctions. In practice, however, this is not often the most efficient route to expression because it is not usually known how the introns will affect expression levels or whether the desired splice variant will be expressed. Furthermore most mammalian genes are interrupted by multiple intron sequences that can span many kilobases in length. This can make subcloning of genomic DNA considerably more difficult than for the corresponding cDNA.

The three most common ways to obtain a known gene of interest are purchase from a distributor of clones from the Integrated Molecular Analysis of Genomes and their Expression (IMAGE) consortium (http://image.llnl.gov/), request from a published source such as an academic lab, or RT-PCR cloning from RNA derived from a cell or tissue source.
IMAGE clones can be found by performing a BLAST search of an electronic database such as GenBank, which can be accessed at the National Library of Medicine PubMed browser (http://www.ncbi.nlm.nih.gov/PubMed/). From there you can quickly determine whether a sequence is present and whether it is full length, and you can find publications related to the gene and possible sources of the gene (tissue sources, personal contacts, etc.). Most expressed sequence tags (ESTs) matching the gene of interest are available as IMAGE clones. The trick is to find one that is full length. It is
If the gene is in a library, you may also need to trim the 5 and 3 UTR (untranslated region) and to add restriction sites and/or a signal sequence if one is not already present. You may also want to add sEditor's note: In addition to the planning recommended by the authors, it is wise to ask commercial suppliers of expression systems about the existence of patents relating to the components of an expression vector(i.e, promoters) or the use of proteins produced by a patented expression vector/system Trill et al
easy to determine whether an EST is likely to contain a full-length sequence, provided that it is derived from a directional oligo(dT)-primed library and was sequenced from the 5′ end: search for an ATG and an upstream in-frame stop codon. Once you identify a full-length EST, you should be able to obtain the corresponding IMAGE clone from Incyte Genomics, LifeSeq Public Incyte clones (http://www.incyte.com/reagents/index.shtml), Research Genetics (http://www.resgen.com), or the American Type Culture Collection (ATCC, http://www.atcc.org). If the gene is published, you can also try contacting the author who cloned it to obtain a cDNA clone. Most labs, both academic and pharmaceutical/biotech, will honor a request for a cDNA clone if it is published.

Alternatively, you may consider deriving the gene de novo by RT-PCR using the sequence obtained above. Depending on the size, abundance, and tissue distribution of the mRNA, a PCR approach could be straightforward or complex. One may isolate RNA from tissue, generate cDNA from the RNA using reverse transcriptase, design PCR primers, and amplify the gene of interest. Alternatively, one may simply purchase a cDNA library from which to PCR amplify the gene. Several vendors carry a wide array of high-quality cDNA libraries derived from human and animal tissues. For example, cDNA libraries for virtually every major human or murine tissue/organ can be obtained from Invitrogen (http://www.invitrogen.com./catalog_project/index.html) or Clontech (http://www.clontech.com/products/catalog/Libraries/index.html). These companies obtain their samples from sources under Federal Guidelines.*

*Editor's note: In addition to the planning recommended by the authors, it is wise to ask commercial suppliers of expression systems about the existence of patents relating to the components of an expression vector (e.g., promoters) or to the use of proteins produced by a patented expression vector/system.

Expression Vector Design and Subcloning

Perhaps the most critical step in the process of expressing a gene is the vector design and subcloning. As much an art as a science, it nevertheless requires complete precision. In many cases you will need to amplify the gene by PCR from RNA. If the gene is in a library, you may also need to trim the 5′ and 3′ UTRs (untranslated regions) and to add restriction sites and/or a signal sequence if one is not already present. You may also want to add
epitope tags for detection and purification (e.g., a His6 tag). When PCR is involved, the gene will eventually need to be entirely resequenced in order to rule out PCR-induced mutations, which can occur at a low frequency. If mutations are found, they will need to be repaired, adding to the time required to generate the final expression construct. The best practice is to start with a high-fidelity polymerase that has a proofreading function (3′→5′ exonuclease activity) to minimize PCR errors.

Sequence Information

If you are lucky enough to obtain DNA from a known source, a new litany of questions will need to be answered. Are a sequence and a restriction map available? Do you know what vector the gene has been cloned into? Has the gene been sequenced in its entirety? How much do you trust the source from which you have received the gene? It is usually best to have the gene resequenced so that you know the junctions and restriction sites and can assure yourself that you are indeed working with the correct gene.

What do you do if there are differences between your sequence and the published sequence? You will need to decide whether the difference is due to a mutation, an artifact of the PCR reaction, a gene polymorphism, or an error in the published sequence. A search of an EST database, coupled with a comparison with genes of other species, can help distinguish whether the error is in the database or due to a polymorphism. Sequencing multiple, independently derived clones can also help answer these questions.

Control Regions

We now have a gene with a confirmed sequence. But which control regions are present? Does the gene contain a Kozak sequence, 5′-GCC(A/G)CCAUGG-3′, required to promote efficient translational initiation of the open reading frame (ORF) in a vertebrate host (Kozak, 1987), or an equivalent sequence, 5′-CAAAACAUG-3′, for expression in an insect host (Cavener, 1987)? If this sequence is missing, it is essential to add it to your expression vector. It is also advisable to trim the gene to remove any unnecessary sequences upstream of the ATG. The 5′ noncoding region may contain sequences (e.g., upstream ATGs or secondary structures) that may inhibit translation from the actual start. A noncoding sequence at the 3′ end may destabilize the message.
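The control-region checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a validated tool: the function name start_context_report, the example sequence, and the idea of passing the start position as an index are all hypothetical. The two patterns are simply the consensus sequences quoted in the text, written as DNA, and the script tests for an exact match to them.

```python
import re

# Start-codon contexts quoted in the text, written as DNA (T in place of U).
VERTEBRATE_CONTEXT = re.compile(r"GCC[AG]CCATGG")  # 5'-GCC(A/G)CCAUGG-3'
INSECT_CONTEXT = "CAAAACATG"                       # 5'-CAAAACAUG-3'


def start_context_report(cdna, start):
    """Summarize the context of the intended start codon.

    cdna  -- cDNA sequence written 5' to 3', including any 5' UTR
    start -- 0-based index of the A of the intended ATG (hypothetical input)
    """
    seq = cdna.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no ATG at the given start position")

    upstream = max(0, start - 6)
    # Six bases upstream, the ATG itself, and the base that follows it.
    window = seq[upstream:start + 4]
    return {
        "window": window,
        "vertebrate_consensus": bool(VERTEBRATE_CONTEXT.fullmatch(window)),
        "insect_consensus": seq[upstream:start + 3] == INSECT_CONTEXT,
        # Any ATG in the 5' noncoding region could compete with the real start.
        "upstream_atgs": [m.start() for m in re.finditer("(?=ATG)", seq[:start])],
    }


# Made-up sequence: a short 5' UTR containing one upstream ATG, then the ORF.
example = "GGATGCTTAAGCCACCATGGCTAGC"
report = start_context_report(example, example.index("ATGGCT"))
print(report)
```

An exact match to the full consensus is a strict criterion, so a negative result is best read as a prompt to inspect the start context and the 5′ noncoding region by eye rather than as proof that translation will be poor.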
Epitope Tags and Cleavage Sites

Another sequence you might need to add to your gene is an epitope tag or a fusion partner, with or without a protease cleavage site. This will aid in the identification of your protein product (via Western blot, ELISA, or immunofluorescence) and assist in protein purification. Among the various epitope tags available are FLAG® (DYKDDDDK) (Hopp et al., 1988), influenza hemagglutinin or HA (YPYDVPDYA) (Niman et al., 1983), His6 (HHHHHH) (Lilius et al., 1991), and c-myc (EQKLISEEDL) (Evan et al., 1985). The more popular protease cleavage sites used to remove the tag from the protein are thrombin (VPR'GS; Chang, 1985), factor Xa (IEGR'; Nagai and Thogersen, 1984), PreScission protease (LEVLFQ'GP; Cordingley et al., 1990), and enterokinase (DDDDK'; Matsushima et al., 1994); the apostrophe marks the position of cleavage. One may also use larger fusion partners such as the Fc region of human IgG1 or GST. It is crucial to choose a protease that is not predicted to cleave within the protein itself, although this does not preclude spurious cleavages. The benefits and drawbacks of utilizing epitope tags are discussed in greater detail below in the section "Gene Expression Analysis."

Subcloning

Your gene is now ready to be cloned into an expression vector of your choice, provided that you have already decided what system to use. This will traditionally involve the use of restriction enzymes to precisely excise the gene on a DNA fragment, which is subsequently ligated into a donor expression vector at the same or compatible sites. If appropriate unique restriction sites are not located in the flanking regions, they can be added by PCR (incorporating the sequence onto the end of the amplification primer) or by site-directed mutagenesis.

Recent technological advances also offer the possibility of subcloning without restriction enzymes. These new-age cloning systems are based on recombinase-mediated gene transfer. Invitrogen offers ECHO™ and Gateway™ cloning technologies, while Clontech markets the Creator™ gene cloning and expression system. Recombinases essentially perform restriction and ligation in a single step, thereby eliminating the time-consuming process of purifying restriction fragments for subcloning and ligating them. These new systems are particularly advantageous when transferring the same gene into multiple expression vectors for expression in different host systems.
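For the traditional restriction-enzyme route, the step of adding flanking sites by PCR can be sketched as follows. This is a minimal illustration under several assumptions that do not come from the text: the choice of EcoRI and XhoI (whose recognition sequences, GAATTC and CTCGAG, are standard), the 18-nucleotide annealing region, the four extra 5′ bases ("GCGC") placed ahead of each site, and the function names are all hypothetical choices. The sketch first confirms that the chosen sites do not occur inside the ORF, since an internal site would cut the insert during subcloning.

```python
# Recognition sequences for two widely used enzymes (illustrative choice).
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def internal_sites(orf, enzymes):
    """Map each enzyme to the 0-based positions of its site inside the ORF."""
    orf = orf.upper()
    hits = {}
    for name in enzymes:
        site = SITES[name]
        positions = [i for i in range(len(orf) - len(site) + 1)
                     if orf[i:i + len(site)] == site]
        if positions:
            hits[name] = positions
    return hits


def design_primers(orf, five_prime="EcoRI", three_prime="XhoI",
                   anneal_len=18, clamp="GCGC"):
    """Build primers that append a restriction site to each end of an ORF.

    Both example sites are palindromic, so the same site string can be
    written 5' to 3' in either primer.
    """
    orf = orf.upper()
    clashes = internal_sites(orf, [five_prime, three_prime])
    if clashes:
        raise ValueError(f"site(s) also present inside the ORF: {clashes}")
    forward = clamp + SITES[five_prime] + orf[:anneal_len]
    reverse = clamp + SITES[three_prime] + reverse_complement(orf[-anneal_len:])
    return forward, reverse


# Made-up 42-nt ORF used only to exercise the functions.
orf = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCTAA"
forward_primer, reverse_primer = design_primers(orf)
print(forward_primer)
print(reverse_primer)
```

In practice the sites would be chosen to match the polylinker of the expression vector, and the forward primer is also a natural place to introduce a start-codon context appropriate for the intended host. Because the product is PCR derived, it would still need to be resequenced as discussed earlier.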