7.91 – Lecture #5
Database Searching & Molecular Phylogenetics
Michael Yaffe
[Figure: example tree for taxa A, B, C, D, written (((A,B)C)D)]
Outline
• Distance Matrix Methods
• Neighbor-Joining Method and Related Neighbor Methods
• Maximum Likelihood
• Parsimony
  • Branch and Bound
  • Heuristic Searching
• Consensus Trees
• Software (PHYLIP, PAUP)
• The Tree of Life
Transformed Distance Method
UPGMA assumes a constant rate of evolution across all lineages, which can lead to wrong tree topologies.
Different rates of evolution across different lineages can be allowed if you normalize using an external reference that diverged early, i.e. use an outgroup.
Define dD = average distance between the outgroup D and all ingroups.
d'ij = (dij - diD - djD)/2 + dD
Now use d'ij to do the clustering.
This comes from the insight that the ingroups evolved separately from each other ONLY AFTER they diverged from the outgroup.
[Figure: tree with ingroups A, B, C and outgroup D]
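Example
(dAB is the distance between A and B)

Species   A     B     C
B         9
C         8     11
D         12    15    10

Use D as outgroup: dD = (12 + 15 + 10)/3 = 37/3

Transformed distances d'ij:

Species   A       B
B         10/3
C         16/3    16/3

Now use UPGMA to build the tree.
[Figure: tree of A, B, C, D with labeled branch lengths; labels as extracted: 6, 3, 3, 6, 2, 1]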
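The transformed distances in this example can be checked with a short script. This is a minimal sketch, not part of the lecture: the function and variable names (`transformed_distances`, `dist`, `first_cluster`) are illustrative assumptions, and exact fractions are used so the results can be compared directly with the table.

```python
from fractions import Fraction
from itertools import combinations

# Pairwise distances from the example table, keyed by alphabetically sorted pair.
dist = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
}

def d(x, y):
    """Look up a distance regardless of argument order."""
    return dist[tuple(sorted((x, y)))]

def transformed_distances(ingroups, outgroup):
    """Return (dD, {pair: d'ij}) using d'ij = (dij - diD - djD)/2 + dD."""
    # dD: average distance between the outgroup and all ingroups.
    d_out = Fraction(sum(d(i, outgroup) for i in ingroups), len(ingroups))
    table = {}
    for i, j in combinations(ingroups, 2):
        table[(i, j)] = Fraction(d(i, j) - d(i, outgroup) - d(j, outgroup), 2) + d_out
    return d_out, table

d_D, d_prime = transformed_distances(["A", "B", "C"], "D")
print("dD =", d_D)
for pair, value in d_prime.items():
    print(pair, value)

# UPGMA would start by clustering the pair with the smallest transformed distance.
first_cluster = min(d_prime, key=d_prime.get)
print("first cluster:", first_cluster)
```

With these inputs the smallest transformed distance belongs to the pair (A, B), so a UPGMA run on the transformed matrix would join A and B first.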
Neighbor's Relation Method
Variant of UPGMA that pairs species in a way that creates a tree with minimal overall branch lengths.
Pairs of sequences separated by only 1 node are said to be neighbors.
[Figure: unrooted four-taxon tree; A and B (terminal branches a, b) sit at one end of a single central branch e, C and D (terminal branches c, d) at the other]
For this tree topology:
dAC + dBD = dAD + dBC = a + b + c + d + 2e = dAB + dCD + 2e
For neighbor relations, the four-point condition will be true:
dAB + dCD < dAC + dBD
and
dAB + dCD < dAD + dBC
So we just have to consider all pairwise arrangements and determine which one satisfies the four-point condition.
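The four-point test can be sketched in a few lines. The distances below are borrowed from the lecture's earlier example (dAB = 9, dAC = 8, dBC = 11, dAD = 12, dBD = 15, dCD = 10); reusing them here is an assumption made for illustration, and the helper name `neighbor_pairing` is hypothetical.

```python
# Distances from the earlier example, stored symmetrically.
raw = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
}
d = {}
for (i, j), value in raw.items():
    d[i, j] = d[j, i] = value

def neighbor_pairing(names, d):
    """For four taxa, return the pairing with the smallest distance sum and all sums."""
    w, x, y, z = names
    sums = {
        ((w, x), (y, z)): d[w, x] + d[y, z],
        ((w, y), (x, z)): d[w, y] + d[x, z],
        ((w, z), (x, y)): d[w, z] + d[x, y],
    }
    best = min(sums, key=sums.get)
    return best, sums

best, sums = neighbor_pairing(["A", "B", "C", "D"], d)
for pairing, total in sums.items():
    print(pairing, total)
print("neighbors:", best)

# From dAC + dBD = dAB + dCD + 2e, the central branch length follows directly.
central_branch = (sums[(("A", "C"), ("B", "D"))] - sums[best]) / 2
print("central branch e =", central_branch)
```

For these distances the pairing (A,B) with (C,D) gives the smallest sum, and the other two sums are equal, as the identity above requires for a tree in which A,B and C,D are neighbors.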
Neighbor-Joining Methods
Start with a star-like tree. Find neighbors sequentially to minimize the total length of all branches.
[Figure: star-like tree being resolved step by step into neighbor pairs]
Studier & Keppler 1988:
Q12 = (N-2) d12 - Σi d1i - Σi d2i
where N is the number of sequences and any 2 sequences can be 1 and 2.
Try all possible sequence combinations. The pair that gives the smallest Q12 is joined as neighbors; repeating the procedure on the remaining sequences yields the final tree.
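One selection step of this criterion can be sketched as follows. The distance values are reused from the lecture's earlier four-species example purely for illustration, and `q_matrix` is a hypothetical helper name; this covers only the choice of the first pair, not the full algorithm (joining the pair and recomputing distances are omitted).

```python
from itertools import combinations

# Distances from the earlier example, stored symmetrically.
raw = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
}
d = {}
for (i, j), value in raw.items():
    d[i, j] = d[j, i] = value

def q_matrix(names, d):
    """Compute Q_ij = (N - 2) * d_ij - sum_k d_ik - sum_k d_jk for every pair."""
    n = len(names)
    # Row totals: sum of distances from each sequence to all the others.
    totals = {i: sum(d[i, k] for k in names if k != i) for i in names}
    return {
        (i, j): (n - 2) * d[i, j] - totals[i] - totals[j]
        for i, j in combinations(names, 2)
    }

Q = q_matrix(["A", "B", "C", "D"], d)
for pair, value in Q.items():
    print(pair, value)

smallest = min(Q.values())
candidates = [pair for pair, value in Q.items() if value == smallest]
print("pairs with smallest Q:", candidates)
```

With four sequences the pairs (A, B) and (C, D) tie for the smallest Q, which is expected: joining either pair describes the same unrooted tree.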
Maximum Likelihood
• A purely statistical method.
• Probabilities for every nucleotide substitution in a set of aligned sequences are considered.
• Calculation of probabilities is complex since the ancestor is unknown.
• Test all possible trees and calculate the aggregate probability of each.
• The tree with the single highest aggregate probability is the most likely to reflect the true phylogenetic tree.
VERY COMPUTATIONALLY INTENSE
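To get a feel for why testing all possible trees is so expensive, the sketch below counts candidate topologies. It relies on the standard result, not stated on the slide, that n sequences admit (2n - 5)!! distinct unrooted bifurcating trees; the function name `count_unrooted_trees` is illustrative.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Four sequences give only 3 trees, but ten sequences already give more than two million, and each tree needs its own likelihood calculation.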
Parsimony
Parsimony: a derogatory term from the 1930s and 1940s used to describe someone who was especially careful with spending money.
Biologically: attach preference to an evolutionary pathway that minimizes the number of mutational events, since
(1) mutations are rare events, and
(2) the more unlikely events a model postulates, the less likely the model is to be true.
Parsimony: a character-based method, NOT a distance-based method.
Parsimony
For parsimony analysis, positions in a sequence alignment fall into one of two categories: informative and uninformative.

            Position
Sequence    1  2  3  4  5  6
1           G  G  G  G  G  G
2           G  G  G  A  G  T
3           G  G  A  T  A  G
4           G  A  T  C  A  T

Only 3 possible unrooted trees you can make…
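The split into informative and uninformative positions can be sketched as below. The slide does not spell out the criterion, so this assumes the usual definition: a position is informative when at least two different nucleotides each occur in at least two sequences. The names `sequences` and `is_informative` are illustrative.

```python
from collections import Counter

# The four aligned sequences from the table, one string per sequence.
sequences = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]

def is_informative(column):
    """True if at least two different characters each appear in two or more sequences."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

# zip(*sequences) walks the alignment column by column; positions are 1-based.
informative = [
    pos
    for pos, column in enumerate(zip(*sequences), start=1)
    if is_informative(column)
]
print("informative positions:", informative)
```

Under that assumed definition only positions 5 and 6 of this alignment are informative; the other four positions cost the same number of changes on every one of the three unrooted trees, so they cannot favor one tree over another.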