Scoring system should:
favor matching identical or related amino acids
penalize poor matches and gaps

To get a good scoring system, we need to know:
how often a particular amino acid pair is found in related proteins compared with its occurrence by chance (this is the information contained in the substitution matrix)
...and when a gap would be a better choice

Deriving realistic substitution matrices:
First, we need to know the frequency with which one amino acid substitutes for another in related proteins, P(ab), compared with the probability that the two would be paired by chance, based on the relative frequencies of each amino acid in proteins, q(a) and q(b).
Call this the "odds ratio": P(ab) / [q(a) q(b)]
If we do this for all positions in an alignment, then the overall odds for the alignment will be the product of the odds ratios at each position. But multiplication is computationally expensive, so take the log of each odds ratio and add them instead.
Matrices like PAM and BLOSUM are derived from these log-odds ratios. They contain positive and negative numbers reflecting the likelihood of amino acid substitutions in related proteins: positive for pairs seen more often than expected by chance, negative for pairs seen less often.
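
The log-odds calculation can be sketched in a few lines of Python. This is only an illustration: the frequencies in q and p are toy values invented for the example (not taken from PAM, BLOSUM, or any real data set), and the names log_odds, scores, and columns are hypothetical. Base-2 logs are an assumed choice; published matrices also scale and round the values to integers.

```python
import math

def log_odds(p_ab, q_a, q_b, base=2.0):
    """Log of the odds ratio P(ab) / (q(a) * q(b))."""
    return math.log(p_ab / (q_a * q_b), base)

# Toy background frequencies q(a): invented for illustration only.
q = {"L": 0.10, "I": 0.06, "W": 0.01}

# Toy pair frequencies P(ab) in related proteins: also invented.
p = {("L", "L"): 0.04, ("L", "I"): 0.012, ("L", "W"): 0.0005}

# One log-odds score per amino acid pair (a miniature "substitution matrix").
scores = {(a, b): log_odds(freq, q[a], q[b]) for (a, b), freq in p.items()}

# A three-column ungapped alignment, written as a list of aligned pairs.
columns = [("L", "L"), ("L", "I"), ("L", "W")]

# Multiplying the odds ratios column by column...
product_of_odds = math.prod(p[c] / (q[c[0]] * q[c[1]]) for c in columns)

# ...gives the same answer (after taking the log) as adding the log-odds scores.
sum_of_scores = sum(scores[c] for c in columns)

for pair, s in scores.items():
    print(pair, round(s, 3))
print("sum of log-odds:", round(sum_of_scores, 3))
print("log of product: ", round(math.log(product_of_odds, 2.0), 3))
```

With these toy numbers the L/L pair occurs four times more often than chance and scores about 2, L/I occurs twice as often and scores about 1, and L/W occurs half as often and scores about -1. The sum of the three scores matches the log of the product of the three odds ratios, which is why an alignment can be scored by simple addition.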