Comparing Protein Structures
7.91 Amy Keating
Why?
- detect evolutionary relationships
- identify recurring motifs
- detect structure/function relationships
- predict function
- assess predicted structures
- classify structures - used for many purposes
Algorithms for detecting structure similarity
Dynamic Programming
- works on 1D strings, so the problem must be reduced to this form
- can’t accommodate topological changes
- example: Sequential Structure Alignment Program (SSAP)
3D Comparison/Clustering
- identify secondary structure elements or fragments
- look for a similar arrangement of these between different structures
- allows for different topology, large insertions
- example: Vector Alignment Search Tool (VAST)
Distance Matrix
- identify contact patterns of groups that are close together
- compare these for different structures
- fast, insensitive to insertions
- example: Distance ALIgnment Tool (DALI)
Unit vector RMS
- map structure to sphere of vectors
- minimize the difference between spheres
- fast, insensitive to outliers
- example: Matching Molecular Models Obtained from Theory (MAMMOTH)
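To make the distance-matrix idea concrete, here is a minimal sketch of the representation it starts from: an all-against-all distance matrix and a contact map. This is not DALI itself; the function names and the 8.0 cutoff are assumptions chosen for illustration.

```python
import math

def distance_matrix(coords):
    """All-against-all distances between points (e.g. one atom per residue)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def contact_map(coords, cutoff=8.0):
    """True where two points lie within `cutoff` (an assumed value)."""
    return [[d <= cutoff for d in row] for row in distance_matrix(coords)]

# Hypothetical three-point "structure" and a copy rotated 90 degrees about z.
original = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (20.0, 0.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (-4.0, 3.0, 0.0), (0.0, 20.0, 0.0)]

# The contact pattern does not depend on how the structure is oriented,
# so two structures can be compared without superimposing them first.
print(contact_map(original) == contact_map(rotated))
```

Because internal distances ignore orientation, this representation avoids the superposition step that 3D comparison needs.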
SSAP - Sequential Structure Alignment Program
How about using dynamic programming? Any problems here?
Taylor & Orengo, JMB (1989) 208, 1-22
SSAP - Sequential Structure Alignment Program
How about using dynamic programming? Any problems here?
1. How will you evaluate if two positions are similar?
   - residue type
   - exposure to solvent
   - secondary structure
   - relationship to other atoms
2. The score that you give to an alignment of 2 residues depends on the other residues:
   ALIGNMENT depends on SUPERPOSITION, but SUPERPOSITION depends on ALIGNMENT
Taylor, WR, and CA Orengo. "Protein Structure Alignment." J Mol Biol. 208, no. 1 (5 July 1989): 1-22.
SSAP - Sequential Structure Alignment Program
For each pair of residues (i,k), assume their equivalence. How similar are their environments with respect to other residues?
[Figure: residue i in one protein and residue k in the other, each shown with its distances to the other residues of its own structure.]
$s_{ik} = \sum_{j,l} a / (|d_{ij} - d_{kl}| + b)$, so $s_{ik}$ is large if $d_{ij}$ and $d_{kl}$ are similar.
Which j and l should you compare with each other?
Images adapted from Taylor, WR, and CA Orengo. "Protein Structure Alignment." J Mol Biol. 208, no. 1 (5 July 1989): 1-22.
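The individual terms of this score can be tabulated for every candidate (j,l) once an equivalence (i,k) is assumed. The sketch below does that for point coordinates. The function name and the default values of a and b are assumptions for illustration; the slide does not give values for the constants.

```python
import math

def environment_scores(coords_a, coords_b, i, k, a=50.0, b=2.0):
    """Matrix of a / (|d_ij - d_kl| + b) for every j in A and every l in B,
    given the assumed equivalence of residue i (in A) and residue k (in B)."""
    d_i = [math.dist(coords_a[i], p) for p in coords_a]  # d_ij for all j
    d_k = [math.dist(coords_b[k], p) for p in coords_b]  # d_kl for all l
    return [[a / (abs(d_ij - d_kl) + b) for d_kl in d_k] for d_ij in d_i]

# Hypothetical coordinates: B is a translated copy of A.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
coords_b = [(10.0, -4.0, 2.0), (11.0, -4.0, 2.0), (13.0, -4.0, 2.0)]

scores = environment_scores(coords_a, coords_b, i=0, k=0)
for row in scores:
    print([round(s, 2) for s in row])
```

Entries where the two distances agree reach the maximum a/b, and entries shrink as the distances diverge.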
Answer: use the j’s and l’s that give the best score
[Figure: score matrix comparing the vectors from atom i to the other residues of its protein with the vectors from atom k to the other residues of its protein.]
NOTE: this gives an ALIGNMENT of how the residues of sequence A align with those of sequence B, when viewed from the perspective of i and k.
BUT, which i’s and k’s should you compare?
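Picking the best-scoring j's and l's is an ordinary dynamic programming problem over the score matrix. A minimal sketch follows, assuming a simple global alignment with a constant penalty for each unmatched row or column. The function name, the gap scheme, and the example numbers are illustrative assumptions, not the published SSAP parameters.

```python
def best_path(scores, gap=1.0):
    """Return (total, pairs) for the best order-preserving matching of
    rows (j) to columns (l); each unmatched row or column costs `gap`."""
    n, m = len(scores), len(scores[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        best[r][0] = best[r - 1][0] - gap
    for c in range(1, m + 1):
        best[0][c] = best[0][c - 1] - gap
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            best[r][c] = max(best[r - 1][c - 1] + scores[r - 1][c - 1],
                             best[r - 1][c] - gap,
                             best[r][c - 1] - gap)
    # Trace back from the corner to recover which (j, l) pairs were matched.
    pairs, r, c = [], n, m
    while r > 0 and c > 0:
        if best[r][c] == best[r - 1][c - 1] + scores[r - 1][c - 1]:
            pairs.append((r - 1, c - 1))
            r, c = r - 1, c - 1
        elif best[r][c] == best[r - 1][c] - gap:
            r -= 1
        else:
            c -= 1
    pairs.reverse()
    return best[n][m], pairs

# Hypothetical 2 x 3 score matrix: the best path skips the middle column.
total, pairs = best_path([[5.0, 0.0, 0.0], [0.0, 0.0, 4.0]], gap=1.0)
print(total, pairs)  # 8.0 [(0, 0), (1, 2)]
```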
ALL OF THEM!
Then combine the results and take a consensus via another round of dynamic programming = “double dynamic programming”
[Figure: score matrices for vectors from i = C compared with vectors from k = F, and for vectors from i = C compared with vectors from k = V, combined into a single Protein A vs. Protein B matrix.]
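The two rounds can be sketched end to end as below. This is a simplified illustration rather than the published method: it uses distances only, it does not force each lower-level path through (i,k), it compares every (i,k) pair with no pre-filtering, and the constants a, b and gap are assumed values.

```python
import math

def align(scores, gap):
    """Global dynamic programming; returns (total, matched (row, col) pairs)."""
    n, m = len(scores), len(scores[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        best[r][0] = best[r - 1][0] - gap
    for c in range(1, m + 1):
        best[0][c] = best[0][c - 1] - gap
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            best[r][c] = max(best[r - 1][c - 1] + scores[r - 1][c - 1],
                             best[r - 1][c] - gap, best[r][c - 1] - gap)
    pairs, r, c = [], n, m
    while r > 0 and c > 0:
        if best[r][c] == best[r - 1][c - 1] + scores[r - 1][c - 1]:
            pairs.append((r - 1, c - 1))
            r, c = r - 1, c - 1
        elif best[r][c] == best[r - 1][c] - gap:
            r -= 1
        else:
            c -= 1
    return best[n][m], pairs[::-1]

def double_dp(coords_a, coords_b, a=50.0, b=2.0, gap=5.0):
    """First round: one alignment per assumed equivalence (i, k).
    Second round: align the matrix that accumulates those results."""
    n, m = len(coords_a), len(coords_b)
    summary = [[0.0] * m for _ in range(n)]
    for i in range(n):
        d_i = [math.dist(coords_a[i], p) for p in coords_a]
        for k in range(m):
            d_k = [math.dist(coords_b[k], p) for p in coords_b]
            low = [[a / (abs(x - y) + b) for y in d_k] for x in d_i]
            _, path = align(low, gap)
            for j, l in path:  # add this view's vote for each matched pair
                summary[j][l] += low[j][l]
    return align(summary, gap)

# Hypothetical coordinates: B is a translated copy of A.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
coords_b = [(10.0, -4.0, 2.0), (11.0, -4.0, 2.0), (13.0, -4.0, 2.0)]
total, pairs = double_dp(coords_a, coords_b)
print(pairs)  # [(0, 0), (1, 1), (2, 2)]
```

Comparing all (i,k) pairs makes the cost grow quickly with chain length, since each pair requires its own lower-level alignment.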
Instead of using distances, use vectors to include some directionality:
$s_{ik} = \sum_{j,l} a / (|d_{ij} - d_{kl}| + b)$ becomes $s_{ik} = \sum_{j,l} a / (|\vec{V}_{ij} - \vec{V}_{kl}| + b)$
Can also include other information about residues i and k if desired (e.g. sequence or environment information):
$s_{ik} = \sum_{j,l} (a + F(i,k)) / (|\vec{V}_{ij} - \vec{V}_{kl}| + b)$
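A small sketch of a single term shows what the vectors add. It assumes both vectors are already expressed in comparable local reference frames, which the slide does not spell out. The function names, the default a and b, and the `f_ik` bonus argument standing in for F(i,k) are illustrative assumptions.

```python
import math

def distance_term(v_ij, v_kl, a=50.0, b=2.0):
    """a / (|d_ij - d_kl| + b): compares vector lengths only."""
    d_ij = math.hypot(*v_ij)
    d_kl = math.hypot(*v_kl)
    return a / (abs(d_ij - d_kl) + b)

def vector_term(v_ij, v_kl, a=50.0, b=2.0, f_ik=0.0):
    """(a + F(i,k)) / (|V_ij - V_kl| + b): compares length and direction."""
    diff = [x - y for x, y in zip(v_ij, v_kl)]
    return (a + f_ik) / (math.hypot(*diff) + b)

# Two neighbours at the same distance but in perpendicular directions.
v_ij = (3.0, 0.0, 0.0)
v_kl = (0.0, 3.0, 0.0)

print(distance_term(v_ij, v_kl))           # 25.0, the distances match exactly
print(vector_term(v_ij, v_kl) < 25.0)      # True, the directions differ
print(vector_term(v_ij, v_ij, f_ik=10.0))  # 30.0, identical vectors plus a bonus
```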
It is important to assess whether detected similarities are SIGNIFICANT. Various statistical criteria have been used. General idea: How “surprising” is the discovery of a shared structure?
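One common way to put a number on "surprising" is a Z-score: how many standard deviations the observed score lies above the scores obtained for unrelated comparisons. The slide does not name a specific statistic, so this sketch and its background numbers are purely illustrative assumptions.

```python
import statistics

def z_score(observed, background):
    """Standard deviations by which `observed` exceeds the background mean."""
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    return (observed - mu) / sigma

# Hypothetical similarity scores from comparisons of unrelated structures.
background = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(z_score(10.0, background), 2))  # 4.95
```

A high Z-score means the match is unlikely to be a typical chance similarity, provided the background set is a fair sample of unrelated pairs.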
Structural Classification of Proteins
• Structure vs. structure comparisons (e.g. using DALI) reveal related groups of proteins
• Structurally similar proteins with detectable sequence homology are assumed to be evolutionarily related
• Similarities between non-homologous proteins suggest convergent evolution to a favorable or useful fold
• A number of different groups have proposed classification schemes
  – SCOP (by hand)
  – CATH (uses SSAP)
  – FSSP (uses DALI)