Ab Initio Structure Prediction & Protein Design 7.91 Amy Keating
Ab initio prediction
• Ab initio = “from the beginning”; in the strictest sense, uses first principles, not information about other protein structures
• In practice, all methods rely on empirical observations about other structures
  – Force fields
  – Knowledge-based scoring functions
  – Training sets
  – Fragment structures
A good review: Bonneau, R, and D Baker. "Ab Initio Protein Structure Prediction: Progress and Prospects." Annu Rev Biophys Biomol Struct. 30 (2001): 173-89.
Approaches to ab initio folding
• Full molecular dynamics (MD) with explicit solvation (e.g. IBM Blue Gene)
  – VERY expensive
  – May not work
• Reduced complexity models
  – No side chains (sometimes no main chain atoms either!)
  – Reduced degrees of freedom
  – On- or off-lattice
  – Generally have a solvation-based score and a knowledge-based residue-residue interaction term
  – Sometimes used as a first step to prune the enormous conformational space; resolution is then increased for later fine-tuning
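To make the reduced-complexity idea concrete, here is a minimal sketch of an on-lattice model with one bead per residue, scored by a solvation-like burial term plus a residue-residue contact term. The two-class (H/P) alphabet, the contact table, the weights, and the function names are all illustrative assumptions, not parameters from any published potential.

```python
from itertools import combinations

# Hypothetical toy parameters chosen for illustration only.
HYDROPHOBIC = set("AVLIMFWC")
CONTACT_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}


def residue_class(aa):
    """Reduce the 20-letter alphabet to hydrophobic (H) or polar (P)."""
    return "H" if aa in HYDROPHOBIC else "P"


def is_self_avoiding_walk(coords):
    """True if consecutive beads are lattice neighbors and no site is reused."""
    if len(set(coords)) != len(coords):
        return False
    return all(abs(x1 - x2) + abs(y1 - y2) == 1
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))


def lattice_score(sequence, coords, solvation_weight=0.5):
    """Score a 2D square-lattice conformation (lower is better)."""
    if len(sequence) != len(coords) or not is_self_avoiding_walk(coords):
        raise ValueError("coords must be a self-avoiding walk matching the sequence")
    classes = [residue_class(aa) for aa in sequence]
    occupied_neighbors = [0] * len(coords)
    contact = 0.0
    for i, j in combinations(range(len(coords)), 2):
        (x1, y1), (x2, y2) = coords[i], coords[j]
        if abs(x1 - x2) + abs(y1 - y2) == 1:
            occupied_neighbors[i] += 1
            occupied_neighbors[j] += 1
            if j - i > 1:  # count only contacts between non-bonded beads
                contact += CONTACT_ENERGY[tuple(sorted((classes[i], classes[j])))]
    # Solvation-like term: penalize each exposed lattice face of a hydrophobic bead.
    solvation = sum(solvation_weight * (4 - n)
                    for n, c in zip(occupied_neighbors, classes) if c == "H")
    return contact + solvation
```

With these toy parameters, a compact square of four hydrophobic beads scores lower than the same chain laid out in a straight line, which is the qualitative behavior such a score is meant to capture.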
ROSETTA - the most successful approach to ab initio prediction
• David Baker, U. Washington, Seattle
• Based on the idea that the possible conformations of any short peptide fragment (3-9 residues) are well represented by the structures it is observed to adopt in the PDB
• Generate a library of different possible structures for each sequence segment
• Search the possible combinations of these for ones that are protein-like by various criteria
ROSETTA fragment libraries
• Remove all homologs of the protein to be modeled (>25% sequence identity)
• For each 9-residue segment in the target, use sequence similarity and secondary structure similarity (compare predicted secondary structure for target to fragment secondary structure) to select ~25 fragments
• Because secondary structure is influenced by tertiary structure, ensure that the fragments span different secondary structures
• The extent to which the fragments cluster around a consensus structure is correlated with how good a model the fragment is likely to be for the target
Example 9-residue segment from the slide figure: LSERTVARS
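The selection steps above can be sketched roughly as follows. This is a simplified illustration, not the ROSETTA implementation: the `Fragment` type, the `pick_fragments` function, the use of plain per-position identity as the similarity measure, and the equal weighting of the two terms are all assumptions. The step that forces the picked fragments to span different secondary structures is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one candidate fragment taken from a known structure."""
    sequence: str
    secondary_structure: str   # one letter per residue, e.g. H, E, or L
    parent_identity: float     # sequence identity of the parent protein to the target (0-1)


def match_fraction(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_fragments(window_seq, window_ss, candidates,
                   n=25, max_identity=0.25, ss_weight=1.0):
    """Return the n best-scoring fragments for one target window.

    window_ss is the *predicted* secondary structure of the target window.
    """
    scored = []
    for frag in candidates:
        if frag.parent_identity > max_identity:
            continue  # drop fragments that come from homologs of the target
        if len(frag.sequence) != len(window_seq):
            continue  # only compare fragments of the same length as the window
        score = (match_fraction(window_seq, frag.sequence)
                 + ss_weight * match_fraction(window_ss, frag.secondary_structure))
        scored.append((score, frag))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frag for _, frag in scored[:n]]
```

Calling `pick_fragments` once per 9-residue window of the target (and again per 3-residue window) would give the per-position libraries that the search step draws from.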
ROSETTA search algorithm
Monte Carlo/Simulated Annealing
• Structures are assembled from fragments by:
  – Begin with a fully extended chain
  – Randomly replace the conformation of one 9-residue segment with the conformation of one of its neighbors in the library
  – Evaluate the move: accept or reject based on an energy function
  – Make another random move…
  – After a prescribed number of cycles, switch to 3-residue fragment moves
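The loop described above might look like the sketch below. Several details are assumptions made for illustration: the conformation is stored as a list of (phi, psi) pairs, moves are accepted with the standard Metropolis criterion, the temperature is lowered linearly, and the `assemble` function, its parameters, and the extended-chain angles are hypothetical names and values. The energy function is passed in by the caller.

```python
import math
import random

EXTENDED = (-150.0, 150.0)  # assumed (phi, psi) for a fully extended residue


def assemble(n_residues, library9, library3, energy,
             cycles9=2000, cycles3=2000, t_start=2.0, t_end=0.1, seed=0):
    """Fragment-insertion Monte Carlo with a linear annealing schedule.

    library9 / library3 map a window start index to a list of fragments;
    each fragment is a sequence of 9 (or 3) (phi, psi) pairs. Every start
    index is assumed to satisfy start + fragment length <= n_residues.
    """
    rng = random.Random(seed)
    torsions = [EXTENDED] * n_residues          # begin with an extended chain
    current = energy(torsions)
    best, best_energy = list(torsions), current
    total = cycles9 + cycles3
    step = 0
    # First 9-residue moves, then switch to 3-residue moves.
    for library, size, cycles in ((library9, 9, cycles9), (library3, 3, cycles3)):
        starts = sorted(library)
        for _ in range(cycles):
            temperature = t_start + (t_end - t_start) * step / max(total - 1, 1)
            step += 1
            start = rng.choice(starts)              # pick a random window
            fragment = rng.choice(library[start])   # pick one of its library fragments
            trial = torsions[:start] + list(fragment) + torsions[start + size:]
            trial_energy = energy(trial)
            delta = trial_energy - current
            # Metropolis criterion: always accept downhill, sometimes accept uphill.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                torsions, current = trial, trial_energy
                if current < best_energy:
                    best, best_energy = list(torsions), current
    return best, best_energy
```

Passing the energy function as an argument keeps the search separate from the scoring, so the same loop can be run with a toy energy for testing or with a more realistic score.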
ROSETTA scoring function
P(structure | sequence) = P(structure) × P(sequence | structure) / P(sequence)
• The sequence is constant, so P(sequence) is the same for every candidate structure
• The remaining terms need to be estimated for decoys built from fragments
Main contributions to P(structure)
  – Secondary structure packing (e.g. ensure β-strands form β-sheets)
  – VdW packing
Simons et al. PROTEINS (1999) 34, 82-95
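One common way to turn this relation into an energy-like score is to take the negative logarithm. The slide does not show this step, so the form below is a sketch under that assumption, and the symbol $E$ is introduced here only for illustration.

```latex
\begin{align*}
-\log P(\text{structure} \mid \text{sequence})
  &= -\log P(\text{structure})
     - \log P(\text{sequence} \mid \text{structure})
     + \log P(\text{sequence}) \\
E(\text{structure})
  &= -\log P(\text{structure})
     - \log P(\text{sequence} \mid \text{structure})
\end{align*}
```

Because $\log P(\text{sequence})$ is the same for every decoy of a given target, dropping it does not change the ranking: the decoy with the lowest $E$ is the one with the highest $P(\text{structure} \mid \text{sequence})$.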
Native-like structures have characteristic secondary structure packing
Example: β-strand dipeptide vector
Simons, KT, I Ruczinski, C Kooperberg, BA Fox, C Bystroff, and D Baker. "Improved Recognition of Native-like Protein Structures Using a Combination of Sequence-dependent and Sequence-independent Features of Proteins." Proteins 34, no. 1 (1 January 1999): 82-95.
β-strand packing geometry can detect native-like structures
Simons, KT, I Ruczinski, C Kooperberg, BA Fox, C Bystroff, and D Baker. "Improved Recognition of Native-like Protein Structures Using a Combination of Sequence-dependent and Sequence-independent Features of Proteins." Proteins 34, no. 1 (1 January 1999): 82-95.
ROSETTA scoring function
P(structure | sequence) = P(structure) × P(sequence | structure) / P(sequence)
• The sequence is constant, so P(sequence) is the same for every candidate structure
• The remaining terms need to be estimated for decoys built from fragments