Introduction to algorithms 6.046J/18,401J/SMA5503 Lecture 15 Prof charles e. leiserson
Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 15 Prof. Charles E. Leiserson
Dynamic programming Design technique, like divide-and-conquer Example: Longest Common Subsequence (LCs) Given two sequences x[l.. m] and yll.n], find a longest subsequence common to them both a' not the 2 :A CB DA B BCBA :B D C A B A LCS(, y) functional notation but not a function c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.2
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.2 Dynamic programming Design technique, like divide-and-conquer. Example: Longest Common Subsequence (LCS) • Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x:A B C B D A B y:B D C A B A “a” not “the” BCBA = LCS(x, y) functional notation, but not a function
Brute-force LCS algorithm Check every subsequence of[1. m to see if it is also a subsequence of[l Analysis Checking =O(n) time per subsequence 2m subsequences of x(each bit-vector of length m determines a distinct subsequence OIX Worst-case running time=O(n2m exponential time c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.3
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.3 Brute-force LCS algorithm Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n]. Analysis • Checking = O(n) time per subsequence. • 2m subsequences of x (each bit-vector of length m determines a distinct subsequence of x). Worst-case running time = O(n2m) = exponential time
Towards a better algorithm Simplification 1. Look at the length of a longest-common subsequence 2. Extend the algorithm to find the lcs itself. Notation: Denote the length of a sequence s Strategy: Consider prefixes of x and y Define ci,j=LCs([1.i,yll.i Then, cm, n]=LCS(x, y) c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.4
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.4 Towards a better algorithm Simplification: 1. Look at the length of a longest-common subsequence. 2. Extend the algorithm to find the LCS itself. Strategy: Consider prefixes of x and y. • Define c[i, j] = | LCS(x[1 . . i], y[1 . . j])|. • Then, c[m, n] = | LCS(x, y)|. Notation: Denote the length of a sequence s by |s|
Recursive formulation Theorem c门=c+1=1+1 ifx=yl] max c[i-1,1, c[i,j-1]) otherwise Proof. Case x[=yU小 X y etz.k=LCS(x[.il,yll.D, where ci,jI k. Then, =k=xi, or else z could be extended Thus, zll. k-l is Cs ofx. i-1 and yll.j-11 c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.5
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.5 Recursive formulation Theorem. c[i, j] = c[i–1, j–1] + 1 if x[i] = y[j], max{c[i–1, j], c[i, j–1]} otherwise. Let z[1 . . k] = LCS(x[1 . . i], y[1 . . j]), where c[i, j] = k. Then, z[k] = x[i], or else z could be extended. Thus, z[1 . . k–1] is CS of x[1 . . i–1] and y[1 . . j–1]. Proof. Case x[i] = y[j]: L 1 2 i m L 1 2 j n x: y: =
Proof (continued) Clain:[1..k1=LCs(x1.+-1],yl1.-1]) Suppose w is a longer CS ofx[1. i-1 and y[l.j-1, that is, w>k-1. Then, cut and paste:wzk(w concatenated with z[)is a common subsequence of x[1. i] and yll.i with ll=[k]I>k Contradiction, proving the claim Thus, ci-1,j-1=k-1, which implies that ci, j1 c[i-1,-1]+1. Other cases are similar. c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.6
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.6 Proof (continued) Claim: z[1 . . k–1] = LCS(x[1 . . i–1], y[1 . . j–1]). Suppose w is a longer CS of x[1 . . i–1] and y[1 . . j–1], that is, |w| > k–1. Then, cut and paste: w || z[k] (w concatenated with z[k]) is a common subsequence of x[1 . . i] and y[1 . . j] with |w || z[k]| > k. Contradiction, proving the claim. Thus, c[i–1, j–1] = k–1, which implies that c[i, j] = c[i–1, j–1] + 1. Other cases are similar
Dynamic-programming hallmark #1 Optimal substructure An optimal solution to a problem (instance) contains optimal solutions to subproblems If z=LCs(, y), then any prefix of z is an LCS of a prefix of x and a prefix of y c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.7
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.7 Dynamic-programming hallmark #1 Optimal substructure An optimal solution to a problem (instance) contains optimal solutions to subproblems. If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y
Recursive algorithm for LCs LCS(,y, i,j) ifx=yli then ci,j]<LCS(,y,i-1,j-1)+1 else c[i, j]<max LCS(x,y, i-1,j) LCS(x,y,i,j-1)) Worst-case: xi*yljil, in which case the algorithm evaluates two subproblems. each with only one parameter decremented c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.8
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.8 Recursive algorithm for LCS LCS(x, y, i, j) if x[i] = y[ j] then c[i, j] ← LCS(x, y, i–1, j–1) + 1 else c[i, j] ← max{LCS(x, y, i–1, j), LCS(x, y, i, j–1)} Worst-case: x[i] ≠ y[ j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented
Recursion tree m=3.n=4 subproblem mtn 2,2 Height=m+n= work potentially exponential but were solving subproblems already solved c 2001 by Charles E Leiserson Introduction to Agorithms Day 26 L15.9
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.9 same subproblem , but we’re solving subproblems already solved! Recursion tree m = 3, n = 4: 3,43,4 2,42,4 1,41,4 3,33,3 2,32,3 3,23,2 1,31,3 2,22,2 Height = m + n ⇒ work potentially exponential. 2,32,3 1,31,3 2,22,2 m+n
Dynamic-programming hallmark #2 Overlapping subproblems a recursive solution contains a ¨ small number of distinct subproblems repeated many times The number of distinct LCs subproblems for two strings of lengths m and n is only mn c 2001 by Charles E Leiserson Introduction to Agorithms Dav26L15.10
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 26 L15.10 Dynamic-programming hallmark #2 Overlapping subproblems A recursive solution contains a “small” number of distinct subproblems repeated many times. The number of distinct LCS subproblems for two strings of lengths m and n is only mn