15 E. coli Expression Systems
Peter A. Bell

Molecular Biology Problem Solver: A Laboratory Guide. Edited by Alan S. Gerstein
Copyright © 2001 by Wiley-Liss, Inc.
ISBNs: 0-471-37972-7 (Paper); 0-471-22390-5 (Electronic)

Expression Vector Structure
  What Makes a Plasmid an Expression Vector?
  Is a Stronger Promoter Always Desirable?
  Why Do Promoters Leak and What Can You Do about It?
  What Factors Affect the Level of Translation?
  What Can Affect the Stability of the Protein in the Cell?
Which Protein Expression System Suits Your Needs?
  Track Record
  What Do You Know about the Gene to Be Expressed?
  What Do You Know about Your Protein?
  Advertisements for Commercial Expression Vectors Are Very Promising. What Levels of Expression Should You Expect?
  Which E. coli Strain Will Provide Maximal Expression for Your Clone?
  Why Should You Select a Fusion System?
  When Should You Avoid a Fusion System?
  Is It Necessary to Cleave the Tag off the Fusion Protein?
  Will Extra Amino Acid Residues Affect Your Protein of Interest after Digestion?
Working with Expression Systems
  What Are the Options for Cloning a Gene for Expression?
  Is Screening Necessary Prior to Expression?
  What Aspects of Growth and Induction Are Critical to Success?
  What Are the Options for Lysing Cells?
Troubleshooting
  No Expression of the Protein
  The Protein Is Expressed, but It Is Not the Expected Size Based on Electrophoretic Analysis
  The Protein Is Insoluble, Now What?
  Solubility Is Essential. What Are Your Options?
  The Protein Is Made, but Very Little Is Full-Length; Most of It Is Cleaved to Smaller Fragments
  Your Fusion Protein Won’t Bind to Its Affinity Resin
  Your Fusion Protein Won’t Digest
  Cleavage of the Fusion Protein with a Protease Produced Several Extra Bands
  Extra Protein Bands Are Observed after Affinity Purification
  Must the Protease Be Removed after Digestion of the Fusion Protein?
Bibliography

Over the past decade the variety of hosts and vector systems for recombinant protein expression has increased dramatically. Researchers now select from among mammalian, insect, yeast, and prokaryotic hosts, and the number of vectors available for use in these organisms continues to grow. With the increased availability of cDNAs and protein coding sequence information, these and other systems, including ones yet to be developed, will certainly be important in the future. Despite the development of eukaryotic systems, E. coli remains the most widely used host for recombinant protein expression. E. coli is easy to transform, grows quickly in simple media, and requires inexpensive equipment for growth and storage. In most cases, E. coli can be made to produce adequate amounts of protein suitable for the intended application. The purpose of this chapter is to guide the user in selecting the appropriate host and troubleshooting the process of recombinant protein expression.

EXPRESSION VECTOR STRUCTURE

What Makes a Plasmid an Expression Vector?

Vectors for expression in E. coli contain, at a minimum, the following elements:
• A transcriptional promoter.
• A ribosome binding site.
• A translation initiation site.
• A selective marker (e.g., antibiotic resistance).
• An origin of replication.

Anything that affects these elements can affect the level of protein expression. At a minimum, a transcriptional promoter in E. coli consists of two DNA hexamers located at positions -35 and -10 relative to the transcriptional start site. Together these elements mediate binding of RNA polymerase, a multisubunit complex of about 500 kDa. Suppliers of expression vectors have selected highly active, inducible promoter sequences, and there is usually little need to be concerned about the promoter until a problem is encountered in expression. Commonly used promoters and their regulation are listed in Table 15.1.

Is a Stronger Promoter Always Desirable?

A strong promoter may not be best for all situations. Overproduction of RNA may saturate the translation machinery, so maximizing RNA synthesis may be neither desirable nor necessary. A weaker promoter may actually give higher steady-state levels of soluble, intact protein than one that is rapidly induced.

Table 15.1 Characteristics of Popular Prokaryotic Promoters

Promoter   Source                                  Regulation/Inducer (Concentration)  Strength
LacUV5     Lactose operon                          lacI/IPTG (0.1–1 mM)                Strong
Trp        Tryptophan operon                       trpR/3-beta-indoleacrylic acid      Strong
Tac        Hybrid of -35 Trp and -10 lac promoter  lacI/IPTG (0.1–1 mM)                Strong
PL         Phage lambda                            Lambda cI repressor/heat            Strong
Phage T5   T5 phage                                lacI (2 operators)/IPTG (0.1–1 mM)  Strong
Arabinose  Arabinose operon                        AraBAD/arabinose (1–10 mM)          Variable
T7         T7 phage RNA polymerase                 lacI/IPTG (0.1–1 mM)                Very strong
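The two-hexamer promoter structure described above can be illustrated with a short Python sketch that scans a DNA sequence for candidate -35/-10 pairs. It assumes the standard sigma-70 consensus hexamers TTGACA and TATAAT and a 15 to 19 bp spacer, none of which are given in this chapter, and the function names are hypothetical. It is a teaching aid for the structure, not a promoter predictor; natural promoters rarely match both consensus hexamers exactly.

```python
# Minimal sketch: look for sigma-70-style promoter candidates, i.e., a -35
# hexamer and a -10 hexamer separated by a 15-19 bp spacer.
# Assumptions (not from this chapter): consensus TTGACA (-35) and TATAAT (-10);
# natural promoters rarely match both exactly, so mismatches are tolerated.

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_promoter_candidates(seq: str, max_mismatch: int = 1,
                             spacer: range = range(15, 20)) -> list:
    """Return (start_of_-35, start_of_-10, total_mismatches) tuples, best first.

    Positions are 0-based indices of the first base of each hexamer.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mismatch:
            continue
        for gap in spacer:
            j = i + 6 + gap  # start of the -10 hexamer
            if j + 6 > len(seq):
                break  # larger gaps would also run off the end
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mismatch:
                hits.append((i, j, mm35 + mm10))
    return sorted(hits, key=lambda hit: hit[2])


# Usage: a synthetic sequence with a perfect consensus pair and a 17-bp spacer
demo = "GGG" + MINUS35 + "C" * 17 + MINUS10 + "GGG"
print(find_promoter_candidates(demo))  # [(3, 26, 0)]
```

Tolerating a mismatch per hexamer reflects the point that real promoters are only approximations of the consensus; raising `max_mismatch` quickly produces many spurious hits in long sequences.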
Why Do Promoters Leak and What Can You Do about It?

Most promoters have some background activity. Promoters regulated by the lactose operator/repressor drive a small amount of transcription in the absence of added inducer (e.g., IPTG). To minimize this leakage, 10% glucose can be added to the medium to repress the lactose induction pathway, the growth temperature can be reduced to 15 to 30°C, and a minimal medium that contains no trace amounts of lactose can be used. Promoter leakage is a problem mainly when the expressed protein is highly toxic to the cells.

The tightly regulated T7 promoter has very low background because little T7 RNA polymerase is made in the absence of inducer in specifically engineered host cells such as BL21(DE3)/pLysS. It has been estimated that the fold induction of transcription in the T7-driven pET vector system is greater than 1000, whereas the magnitude of induction obtained with lac repressor-regulated promoters is generally about 50-fold.

What Factors Affect the Level of Translation?

Translation can be affected by the nucleotides adjacent to the ATG initiator codon, the amino acid residue immediately following the initiator, and secondary structures in the vicinity of the start site. Most commercially available expression vectors use optimal ATG and Shine-Dalgarno sequences. Secondary structures in the mRNA contributed by the gene of interest can prevent ribosome binding (Tessier et al., 1984; Looman et al., 1986; Lee et al., 1987). In addition, the downstream box AAUCACAAAGUG, found after the initiator codon in many bacterial genes, can also enhance translation initiation. Changing the amino terminal coding sequence of the gene of interest so that it more closely matches this consensus may improve the rate of translation of the mRNA (Etchegaray and Inouye, 1999).

What Can Affect the Stability of the Protein in the Cell?

One of the first steps in protein degradation in E. coli is the enzymatic removal of the amino terminal methionine residue. This reaction, catalyzed by methionyl aminopeptidase, occurs more slowly when the amino acid in the +2 position has a larger side chain (Hirel et al., 1989; Lathrop et al., 1992). When the methionine residue is intact, the protein is resistant to all but endopeptidase cleavage. Tobias et al. (1991) have determined the relationship between a protein’s amino terminal amino acid
and its stability in bacteria, that is, the “N-end rule.” They reported protein half-lives of only 2 minutes when any of the following amino acids was present at the amino terminus: Arg, Lys, Phe, Leu, Trp, and Tyr. In contrast, all other amino acids conferred half-lives of >10 hours when present at the amino terminus of the protein examined. Because the residue in the +2 position becomes the new amino terminus once the initiator methionine is removed, one should examine the sequence to be expressed for the residue in this position. If it is among those that destabilize the protein, it may be worth the effort to change it to one that confers stability.

WHICH PROTEIN EXPRESSION SYSTEM SUITS YOUR NEEDS?

Track Record

What systems are currently used in the laboratory or by others in the field? If the protein coding sequence of interest is well characterized, and the protein or its close relatives have been expressed successfully by others in the field, it is wise to try the same expression system. Go with what has worked in the past. If nothing else, results obtained using the familiar system will serve as a starting point. For example, most recombinant expression of mammalian Src homology 2 (SH2) protein interaction domains has been done using the pGEX vector series, and similar examples of preferred systems are found in other fields of research. If little is known about the protein to be expressed, it is best to take stock of what information there is before entering the lab. Before beginning any experimentation, it is wise to answer the following questions.

What Do You Know about the Gene to Be Expressed?

Source

In general, simple globular proteins from prokaryotic and eukaryotic sources are good candidates for expression in E. coli. Monomeric proteins of average size (<60 kDa) with few cysteines or prosthetic groups (e.g., heme and metals) will likely give good production. Secreted eukaryotic proteins and membrane-bound proteins, especially those with several transmembrane domains, are likely to be problematic in E. coli. Solubility of recombinant proteins in E. coli can also be estimated by a mathematical analysis of the amino acid sequence (Wilkinson and Harrison, 1991).
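Two of the sequence checks described above can be run before any cloning: flagging a destabilizing +2 residue under the N-end rule, and estimating solubility from the amino acid sequence. The Python sketch below assumes a one-letter protein sequence beginning with the initiator Met. For solubility it uses a later, simplified two-parameter revision of the Wilkinson and Harrison (1991) model, based on the fraction of turn-forming residues (N, G, P, S) and the net charge. The constants are as commonly quoted for that revision and should be treated as assumptions to verify against the original publications. The function names are hypothetical.

```python
# Minimal sketch of two pre-expression checks on a one-letter protein sequence.
# Assumptions: the sequence begins with the initiator Met, and the solubility
# constants are those commonly quoted for the revised two-parameter
# Wilkinson-Harrison model (verify against the original papers before use).

# Residues reported to give ~2-minute half-lives at the amino terminus
N_END_DESTABILIZING = set("RKFLWY")

LAMBDA1 = 15.43      # weight for turn-forming residues (N, G, P, S)
LAMBDA2 = -29.56     # weight for the net-charge term
CV_CRITICAL = 1.71   # discriminant threshold


def n_end_flag(protein: str) -> bool:
    """True if the +2 residue is destabilizing once Met1 is removed."""
    return len(protein) > 1 and protein[1].upper() in N_END_DESTABILIZING


def solubility_cv(protein: str) -> float:
    """Canonical variable from turn-forming and net-charge fractions."""
    p = protein.upper()
    if not p:
        raise ValueError("empty sequence")
    n = len(p)
    turn = sum(p.count(aa) for aa in "NGPS") / n
    net = (p.count("R") + p.count("K") - p.count("D") - p.count("E")) / n
    return LAMBDA1 * turn + LAMBDA2 * abs(net - 0.03)


def predict_solubility(protein: str) -> str:
    """'insoluble' if CV exceeds the threshold, otherwise 'soluble'."""
    return "insoluble" if solubility_cv(protein) > CV_CRITICAL else "soluble"


# Usage with a short, made-up sequence
seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(n_end_flag(seq), predict_solubility(seq))
```

Both checks are crude filters. A flagged +2 residue or an "insoluble" call is a reason to consider a change (e.g., a different second residue or a fusion partner), not proof that expression will fail.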
Presence of a Start Codon

Some expression vectors provide the start codon for translation initiation, while others rely on the start codon of the gene you’re trying to express. Note that in E. coli, the ribosome binding site and the start codon are typically separated by only 5 to 12 base pairs. You should therefore incorporate this spacing requirement into your cloning strategy when the start codon is provided by the gene you plan to express.

GC Content

Coding sequences with high GC (>70%) content may reduce the level of expression of a protein in E. coli. Check the sequence using a DNA analysis program.

Codon Usage

Codon usage may also affect the level of protein expression. If the gene of interest contains codons not commonly used in E. coli, low expression may result from depletion of the tRNAs for the rarer codons. When one or more rare codons are encountered, translational pausing may result, slowing the rate of protein synthesis and exposing the mRNA to degradation. This potential problem is of particular concern when the sequence encodes a protein >60 kDa, when rare codons are found at high frequency, or when multiple rare codons are found over a short distance of the coding sequence. For example, rare codons for arginine found in tandem can create a recognition sequence for ribosome binding (e.g., AGGAGG) that closely approximates the Shine-Dalgarno sequence UAAGGAGG. This may bind ribosomes nonproductively and block translation from the bona fide ribosome binding site (RBS) at the initiator codon further upstream. Nonetheless, the appearance of a rare codon does not necessarily lead to poor expression. It is best to try expression of the native gene first, and make changes later if these seem warranted. Strategies include mutating the gene of interest to use optimal codons for the host organism, and co-transforming the host with rare tRNA genes. In one example, introduction into the E. coli host of a rare arginine (AGG) tRNA resulted in a severalfold increase in the expression of a protein that uses the AGG codon (Hua et al., 1994). In another case, substitution of the rare arginine codon AGG with the E. coli-preferred CGU improved expression (Robinson et al., 1984). Other work has shown that rare codons can account for decreased expression of the gene of interest in E. coli (Zhang, Zubay, and Goldman, 1991; Sorensen, Kurland, and Pederson, 1989). Rare codons may have an even more dramatic
effect on translation when they occur close to the initiator codon (Chen and Inouye, 1990). While codon usage is neither the only nor the most important factor, be aware that it may influence translation efficiency.

Secondary Structure

Secondary structures that occur near the start codon may block translation initiation (Gold et al., 1981; Buell et al., 1985) or serve as translation pause sites, resulting in premature termination and truncated protein. These can be found using DNA or RNA analysis software. Structures with clear stems longer than eight bases may be disrupted by site-specific mutation or by making all or a portion of the coding sequence synthetically. Depending on the size of the gene and the importance of obtaining high expression levels, it may be worth synthesizing the gene. This has generally been done by synthesizing overlapping oligonucleotides that, when annealed, can be extended using PCR and ligated to form the full-length coding sequence. There are several examples where this approach has been used to optimize codon usage for E. coli (Koshiba et al., 1999; Beck von Bodman et al., 1986). In addition, if one takes on the work and expense of synthesizing a gene, secondary structures in the predicted RNA that might stall translation can be removed, and sites for restriction endonucleases can be introduced.

Size of a Gene or Protein

As a rule, very large (>100 kDa) and very small (<5 kDa) proteins are more difficult to express in E. coli. Small polypeptides with little secondary structure tend to be rapidly degraded in E. coli. Degradation can be minimized by expressing such short oligopeptides as concatemers with proteolytic or chemical cleavage sites between the monomeric units (Hostomsky, Smrt, and Paces, 1985). Short peptides are also successfully expressed as fusion proteins. Fusion with GST, maltose-binding protein (MBP), or other larger, well-folded partners will tend to stabilize a short peptide, making expression possible and purification relatively simple. One publication has shown MBP to be superior to other large fusion proteins at stabilizing short polypeptides (Kapust and Waugh, 1999). At the other extreme, proteins above 60 kDa are best made with small affinity tags, such as FLAG or His6, or on their own, without any fusion. While there is no clear upper limit, the larger the protein, the lower the yield is likely to be.
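Several of the gene-level checks in this section (presence of an ATG start, GC content above 70%, rare and tandem rare arginine codons, and protein size outside roughly 5 to 100 kDa) can be combined into a quick screen of a coding sequence before cloning. The Python sketch below is illustrative only. The function names are hypothetical. The rare-codon set is deliberately limited to the arginine codons AGG (the example discussed above) and AGA, which is an assumption. Size is estimated with the common approximation of about 110 Da per residue. Secondary structure prediction needs dedicated RNA-folding software and is not attempted.

```python
# Minimal sketch of a quick screen for a coding sequence (DNA, 5'->3',
# expected to start at ATG). Thresholds follow the rules of thumb in this
# section; the rare-codon set and the ~110 Da/residue figure are assumptions.

RARE_ARG_CODONS = {"AGG", "AGA"}   # illustrative, deliberately partial list
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_KDA = 0.110            # approximate mass of one residue, in kDa


def codons(cds: str) -> list:
    """Split a coding sequence into in-frame triplets (leftover bases dropped)."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def screen_cds(cds: str) -> dict:
    """Flag start codon, GC content, rare/tandem Arg codons, and protein size."""
    seq = cds.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    triplets = codons(seq)
    rare = [i for i, c in enumerate(triplets) if c in RARE_ARG_CODONS]
    rare_set = set(rare)
    # Adjacent rare Arg codons (e.g., AGGAGG) can mimic a Shine-Dalgarno site
    tandem = [i for i in rare if i + 1 in rare_set]
    has_stop = bool(triplets) and triplets[-1] in STOP_CODONS
    est_kda = (len(triplets) - int(has_stop)) * AVG_RESIDUE_KDA
    return {
        "starts_with_atg": seq.startswith("ATG"),
        "gc_fraction": gc,
        "high_gc": gc > 0.70,
        "rare_arg_codons": rare,      # codon indices, 0 = start codon
        "tandem_rare_arg": tandem,    # index of the first codon of each pair
        "est_kda": est_kda,
        "size_flag": est_kda > 100 or est_kda < 5,
    }


# Usage with a made-up ORF carrying an AGG-AGG pair right after the ATG
report = screen_cds("ATG" + "AGGAGG" + "GCT" * 50 + "TAA")
print(report["tandem_rare_arg"], round(report["est_kda"], 2))  # [1] 5.83
```

As the text cautions, a flagged rare codon is not by itself a reason to redesign the gene; the screen only points out where to look if expression of the native sequence turns out to be poor.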
What Do You Know about Your Protein?

Cysteines

There are many things that E. coli does not do well, or at all. If the protein of interest is naturally multimeric, or requires posttranslational modifications for activity, E. coli may be a poor choice of expression host. Disulfide bonds, formed between two cysteines in an expressed protein, are made inefficiently in the reducing environment of the E. coli cytoplasm (Bessette et al., 1999; Derman et al., 1993). If the protein is produced and can be purified from E. coli, in vitro oxidation of the cysteines may be tried (Dodd et al., 1995). Alternatively, the gene of interest can be cloned in a vector that includes a signal sequence (e.g., OmpA, geneIII, or phoA) that will direct the recombinant protein to the relatively oxidizing environment of the E. coli periplasm, where disulfide formation is more efficient. Strains of E. coli that are deficient in thioredoxin reductase (trxB) permit proper disulfide formation in the cytoplasm (Derman et al., 1993; Yasukawa et al., 1995). Subsequent work has produced strains that lack both trxB and glutathione oxidoreductase (gor) and give better rates of disulfide formation than those seen in the native E. coli periplasm (Bessette et al., 1999).

Membrane Bound

If the protein to be expressed is naturally associated with membranes and/or has at least one transmembrane domain, adding a secretion signal to the amino terminus may help to maximize expression of functional protein. Signal sequences, about 20 residues long, are derived from proteins that are naturally secreted into the periplasmic space, such as pelB, OmpA, OmpT, MalE, alkaline phosphatase (phoA), or geneIII of filamentous phage (Izard and Kendall, 1994). A protein with an amino-terminal signal will be directed to the inner membrane of E. coli, and the carboxy-terminal portion of the protein will be translocated into the periplasmic space. Depending on the hydrophobicity of the protein of interest, it may not translocate entirely into the periplasm but instead remain associated with the inner membrane. Secretion may help protect proteins from proteolytic attack (Pines and Inouye, 1999), or at least reduce aggregation of hydrophobic proteins in the cytoplasm and minimize inclusion body formation. Because of the oxidizing environment of the periplasmic space, proteins that contain one or more disulfide bonds are best secreted. The presence of an N-terminal signal sequence appears to
be necessary but not sufficient to direct a target protein to the periplasm. Translocation across the outer membrane and into the growth medium is inefficient; in most cases, target proteins found in the growth medium are the result of damage to the cell envelope and do not represent true secretion (Stader and Silhavy, 1990). Translocation across the inner membrane of E. coli is incompletely understood (reviewed by Wickner, Driessen, and Hartl, 1991), and the efficiency of export will depend on the individual target protein. Export efficiency cannot currently be predicted from the protein sequence, although some generalizations have been made about the sequence immediately following the signal peptide (Boyd and Beckwith, 1990; Yamane and Mizushima, 1988). Target proteins may therefore be found in the cytoplasm (with the signal sequence uncleaved) or in the periplasm in partially processed form, in place of or in addition to the expected processed periplasmic species. In some cases the proportion of protein that is exported can be increased by lowering the temperature to between 15 and 30°C during induction.

Post-translational Modification

E. coli does not glycosylate or phosphorylate proteins or recognize eukaryotic proteolytic processing signals, so take this into account when designing the cloning strategy. If proteolytic processing is needed, it is best to express only the coding sequence for the fully processed protein. If the protein of interest requires glycosylation for activity, and full activity is important in the final use, consider a eukaryotic host such as Pichia, insect cells, or mammalian cells.

Is the Protein Potentially Toxic?

Consider whether the protein of interest is likely to have a toxic effect on the host cell. Where the function of the protein is known, this can be guessed at with some accuracy. For example, nonspecific proteases, nucleases, or pore-forming membrane proteins might all be expected to have some toxic effect on E. coli.
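Because a clone encoding a toxic product is under selection in the host, it can be worth checking the sequence of the clone for frameshifts and premature stop codons. The Python sketch below is one minimal way to do this, assuming the insert is supplied as the coding strand from the ATG start codon through the intended stop codon and that the standard genetic code applies; `check_orf` is a hypothetical helper written for illustration, not part of any library.

```python
"""Minimal sketch: check that a sequenced insert still encodes an intact ORF.

Assumptions: `insert` is the coding strand, read from the ATG start codon
through the intended stop codon, and the standard genetic code applies.
`check_orf` is a hypothetical helper written for illustration.
"""
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons in the standard genetic code


def check_orf(insert: str) -> list[str]:
    """Return a list of problems; an empty list means no lesion was detected."""
    seq = "".join(insert.split()).upper()  # drop whitespace/newlines from pasted reads
    problems: list[str] = []
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(seq) % 3 != 0:
        # A 1- or 2-nt insertion or deletion shifts the reading frame.
        problems.append(f"length {len(seq)} is not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # Any in-frame stop before the last codon truncates the protein.
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {number}")
    return problems


if __name__ == "__main__":
    intact = "ATGGCTAAAGGTTAA"  # Met-Ala-Lys-Gly-stop
    mutant = "ATGGCTTAAGGTTAA"  # AAA -> TAA at codon 3
    print(check_orf(intact))  # []
    print(check_orf(mutant))  # ['premature stop codon TAA at codon 3']
```

The check catches only lesions visible from length and codon content; a missense change in a critical residue, or a larger deletion that happens to preserve the reading frame, still requires aligning the sequence against the expected clone.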
Expression of toxic proteins may be very low, and there will be strong selective pressure on cells to eliminate the gene of interest by point mutations that shift the translation frame, introduce a stop codon, or change an amino acid residue critical to the protein’s function. Larger deletions of parts of the plasmid may also be seen. If there is a suggestion that the gene product will be toxic, use an expression vector with a tightly regulated promoter (e.g., T7, pET
vectors). Minimize propagation of the cells to avoid opportunities for mutation and recombination.

Must Your Protein Be Functional?

Each requirement placed on a recombinant protein will affect the choice of expression system. If a protein is to be used only to raise antibodies, it need not be soluble or active, and the production of inclusion bodies (aggregates of improperly folded protein) in E. coli may be all that is needed. If, however, a protein’s biological activity will be assayed, or if it is to be used in structural studies (NMR, crystallography, etc.), a properly folded and soluble form will be required.

Will Structural Changes (Additional or Fewer Amino Acids) Affect Your Application?

Depending on how a gene is inserted in an expression vector, additional sequences may be added to the clone, and these may lead to extra amino acid residues at the N- or C-terminus of the final expressed protein. In many cases these will have no deleterious effect, but if structural studies or precise comparisons to a native protein are planned, it is wise to eliminate amino acids added by cloning steps. PCR amplification is the most commonly used method to generate inserts for expression, and proper design of PCR primers can eliminate most or all additional residues from the protein.

Is the Sequence of Your Protein Recognized by Specific Proteases?

If you plan to express your gene in a fusion vector that provides an internal protease cleavage site for removal of the affinity tag (discussed below), check that your native protein is not recognized by the protease. Most proteases are highly specific, but thrombin has a variety of secondary cleavage sites (Chang, 1985).

Advertisements for Commercial Expression Vectors Are Very Promising. What Levels of Expression Should You Expect?

There are several systems available for protein expression: mammalian, insect, yeast, and E. coli.
While it is impossible to predict the yield of any given protein from these systems, some rough guidelines can be given. For any vector it is possible that no expression will be seen! Reported yields in stably transfected mammalian cells are in the range of 1 to 100 µg/10⁶