综合练习:如何查资料、整理资料 Lecture 3. Dynamic ■资料来源 Programming ■资料使用 清华大学宋斌 Dynamic programming 动态规划方法回顾 a Longest common subsequence ■与分治方法有没有共同点?【不同点】 ■ Printing neatly 子问题空间?决定复杂性因子之 子问题的治与合?复杂性因子之二 最优值 ■技巧: Memoization备忘录法 ■最优解 讨论] 最长公共子序列 问题定义 m A finite alphabet a finite set R of strings from A', and f Awith w I>= K such that w each x element of r? -complete for alA(R)>=2
1 清华大学 宋斌恒 1 Lecture 3. Dynamic Programming 清华大学 宋斌恒 清华大学 宋斌恒 2 综合练习:如何查资料、整理资料 n 资料来源 n 资料整理 n 资料使用 清华大学 宋斌恒 3 Dynamic programming n Longest common subsequence n Printing neatly n Edit distance n Optimal binary search tree n 技巧:Memoization[备忘录法] 清华大学 宋斌恒 4 动态规划方法回顾 n 与分治方法有没有共同点?【不同点】 n 子问题空间?决定复杂性因子之一 n 子问题的治与合?复杂性因子之二 n 最优值 n 最优解 n [讨论] 清华大学 宋斌恒 5 最长公共子序列 清华大学 宋斌恒 6 问题定义 n Given : n A finite alphabet A, finite set R of strings from A*, and apositiveinteger K n Is There : n A string w element of A* with | w | >= K such that w is a subsequence of each x element of R ? n The problem remains NP-complete for all A ( R ) >= 2 n Analogous Longest Common Substring problem is trivially solvable in polynomial time
实例 些应用 I Given as input, the strings ■·Stng-to- String Correction( spell checking a. Distance Between Strings(database keys) ■ FGHABD Minimum Cost Edit Operation Between Strings A, B, D, AB ■· Amino Acid Sequences in Molecular Biology a The Longest Common Subsequences Are ■· Data Compression Techniques nAB Longest Common subsequence Longest Common subsequence ■ Background m In bioinformatics, it needs often to compare the DNA a Two possible DNA string of two or more different organisms S1 ACCGGTCGAGTGCGCGGAAGCCGGC m A strand of DNA consists of a string of molecules m S2= GTCGTTCGGAATCGGCTTGCCGTTG called base, where the possible bases are Adenine:腺嘌呤 how "similar" the two strands are Guanine:鸟嘌呤 Cytosine:胞核嘧啶,氧氨嘧啶 Thymine:胸腺嘧啶 m A DNa can be represented by a string over the finite set ( AC, G, T] 清华太学未证想 What is the similarity of two strings? Concepts a Issue: The similarity can be defined in many a Subsequence and substring ways, here we use the "longest common s A substring of a string is a continuous part of ubsequence to describe the similarity the string Show your opinions! [Many] A subsequence of a string x is a string y with its elements in y in the same orde
2 清华大学 宋斌恒 7 实例 n Given as input, the strings : n DEABC n ABCD n FGHABD n Common Subsequences Are : n A, B, D, AB n The Longest Common Subsequences Are : n AB 清华大学 宋斌恒 8 一些应用 n •String - to - String Correction (spell checking & correction) n •Distance Between Strings (database keys) n •Minimum Cost Edit Operation Between Strings n •Amino Acid Sequences in Molecular Biology n •Data Compression Techniques 清华大学 宋斌恒 9 Longest Common subsequence n Background: n In bioinformatics, it needs often to compare the DNA of two or more different organisms. n A strand of DNA consists of a string of molecules called base, where the possible bases are: n Adenine:腺嘌呤 n Guanine:鸟嘌呤 n Cytosine:胞核嘧啶, 氧氨嘧啶 n Thymine:胸腺嘧啶 n A DNA can be represented by a string over the finite set [A,C,G,T] 清华大学 宋斌恒 10 Longest Common subsequence n Two possible DNA string; n S1 = ACCGGTCGAGTGCGCGGAAGCCGGC n S2 = GTCGTTCGGAATCGGCTTGCCGTTG n Our goal: Compare the two string to determine how “similar”the two strands are. 清华大学 宋斌恒 11 What is the similarity of two strings? n Issue: The similarity can be defined in many ways, here we use the “longest common subsequence”to describe the similarity n Show your opinions![Many] 清华大学 宋斌恒 12 Concepts n Subsequence and substring n A substring of a string is a continuous part of the string n A subsequence of a string x is a string y with its elements in y in the same order
Longest Common Subsequence Problem Stepl: Analyzing the problem I Given two sequences aA brute-force method: [Naive X= be two ■ Proof of"tfxn° yn then zk=xm° yn and zk1isan sequences and let Z=<z1, Z2,.,Zk? be a LCS LCS ofx and of x and y xm n, then z xm yn and zk-1 common subsequence of X andY, hich is a contradiction hence we have Xm-, and Yn-1 ZXm n and then, kxm implies that Z, is ar m Z,, is a common subsequence of Xm, andY m If xm -yo then zk -y, and zk is an LCS of Xn If not, there are another common subsequence andY which is longer than Zk, and then we append 清华太学未证想 Continuing proof of Theorem 15.1 Step 2: A recursive solution a Proof of"If xm l=yn, then(zk l=xm implies that a Let C[i,j be the length of the LCS of X, and Y, then Zk is an LCS of Xm-1 and Yy) a If z is not an LCS of Xm, and,, then it will conclude a contradiction that z is an lcs of x i,j=0 c={41-1,-]+ Mi=y max(cl,j-1, ci-1,j x*y
3 清华大学 宋斌恒 13 Longest Common Subsequence Problem n Given two sequences n X= n Y= n Find the longest common subsequence, where a common subsequence of two strings is another string y such that it is a subsequence of both of the two given strings. 清华大学 宋斌恒 14 Step1: Analyzing the problem n A brute-force method: [Naïve] n How many subsequences for string x? n Check each one is also a subsequence of string y. n Complexity? 清华大学 宋斌恒 15 Theorem 15.1 Optimal substructure of an LCS n Let X= ,Y= be two sequences and let Z= be a LCS of X and Y. n If xm=yn , then zk=xm=yn and Zk-1 is an LCS of Xm-1 and Yn-1 n If xm !=yn , then zk != xm implies that Zk is an LCS of Xm-1 and Yn n If xm != yn , then zk != yn and Zk is an LCS of Xm and Yn-1 清华大学 宋斌恒 16 Proof of Theorem 15.1 n Proof of “If xm=yn , then zk=xm=yn and Zk-1 is an LCS of Xm-1 and Yn-1” n If zk !=xm ‡we could append xm to Z to obtain a longer common subsequence of X and Y, which is a contradiction, hence we have zk=xm=yn and n Zk-1 is a common subsequence of Xm-1 and Yn-1 , which we will show it is an LCS of Xm-1 and Yn-1 . If not, there are another common subsequence which is longer than Zk-1 and then we append yn to it, which is a contradiction. 清华大学 宋斌恒 17 Continuing proof of Theorem 15.1 n Proof of “If xm !=yn , then (zk != xm implies that Zk is an LCS of Xm-1 and Yn )” n If Zk is not an LCS of Xm-1 and Yn , then it will conclude a contradiction that Z is an LCS of X and Y. 清华大学 宋斌恒 18 Step 2: A recursive solution n Let C[i,j] be the length of the LCS of Xi and Yj , then ï î ï í ì - - ¹ - - + = = = i j i j c i j c i j x y c i j x y i j c i j max{ [ , 1], [ 1, ]} [ 1, 1] 1 0 , 0 [ , ]
Step 3: Computing the length of an LCs Step 4: Constructing the LCS LCS-Length(X, Modification for the last algorithm: store the information of choices. Here after the b(i j] stands for the choice ③Foi∈1tm[0∥初始化 of computing cl小 ④Forj←0ton{c0=0 ∥初始化 ⑤Fori∈1tom,j∈1ton 画 o if xi=yi then [c[ijcl-1j-11+1: b[ii-upperleft continue) D if xi=y then clF=C[1j-1+1: continue 2 if c[i- j-1Pc[i-1 then (c[j=c[-j-1F b[i-jF-upper ②计c1c1 h then{} 1] continue} continue) ③c[lFc1d ③c[}=c-1Jb]et ⑦} Print-LCS(b, X, ij) How to improve your code O If i=0 or jo then return a Save the space used in the last algorithm ① Print-LCS(b,X-1}1) ② Print x ■ Complexity analysis ③ Return a Complete the LCS ■备忘录法 o If(o, j]=upper Print-Lcs(b, X, i-1,); return; Hcs(b, X, i, j ); 清华太学未证想 如何分析问题 子问题空间 ■优美打印为例: ■原问题?不优美度(一段文字) ■什么是优美打印问题 考虑不考虑首行缩进问题? ■英文打印每行空格的差别要小:一般是空格个 ■一般子问题:从某位置开始的子文字段 数的三次方的和达到极小。【举例说明】 问题构造原问题? ■ 是香可以由这类子问题构造原问题的 有没有更小的子问题空间?
4 清华大学 宋斌恒 19 Step 3: Computing the length of an LCS LCS-Length(X,Y) ① m=length[X] ② n=length[Y] ③ For i fl 1 to m {c[i,0]=0} //初始化 ④ For j fl 0 to n {c[0,j]=0} //初始化 ⑤ For i fl 1 to m, j fl 1 to n ⑥ { ① if xi=yj then {c[i,j]=c[i-1,j-1]+1; continue} ② if c[i,j-1]>c[i-1,j] then {c[i,j]=c[i.j-1]; continue} ③ c[i,j]=c[i-1,j] ⑦ } 清华大学 宋斌恒 20 Step 4: Constructing the LCS Modification for the last algorithm: store the information of choices. Here after the b[i,j] stands for the choice of computing c[i,j]. ⑤ For i fl 1 to m, j fl 1 to n ⑥ { ① if xi=yj then {c[i,j]=c[i-1,j-1]+1; b[i,j]=upperleft; continue} ② if c[i,j-1]>c[i-1,j] then {c[i,j]=c[i.j-1]; b[i,j]=upper; continue} ③ c[i,j]=c[i-1,j]; b[i,j]=left ⑦ } 清华大学 宋斌恒 21 Print-LCS(b,X,i,j) ① If i=0 or j=0 then return ② If b[i,j]=upperleft then{ ① Print-LCS(b,X,i-1,j-1) ② Print Xi ; ③ Return; ③ } ④ If b[i,j]=upper Print-LCS (b,X,i-1,j); return; ⑤ Print-LCS(b,X,i,j); ⑥ return; 清华大学 宋斌恒 22 How to improve your code n Save the space used in the last algorithm n How to write the code neatly n Complexity analysis n Complete the LCS n 备忘录法 清华大学 宋斌恒 23 如何分析问题 n 优美打印为例: n 什么是优美打印问题: n 英文打印每行空格的差别要小:一般是空格个 数的三次方的和达到极小。【举例说明】 清华大学 宋斌恒 24 子问题空间 n 原问题?不优美度(一段文字) n 考虑不考虑首行缩进问题? n 不优美度(空字符数,一段文字) n 一般子问题:从某位置开始的子文字段 n ?分析子问题维数? n ?如何由子问题构造原问题? n 特殊子问题:所有子文字段都从头排版 n ?子问题问题维数? n 是否可以由这类子问题构造原问题的解? n 有没有更小的子问题空间?
Analyzing Printing neatly again Recursive formula 断句点 个如是的众来 断句点确定后留下两类问题 的不优美度,假 类问题的不优美度要考虑最后一行数值:另 类问题的不优美度不考虑最后一行的数值 ■D(1,n)!递归 ■说明上述选择方法、证明上述选择方法的正确 a D(j)=min ek D(k+1.j)1 ■初始值? ■递归公式准备工作如何做?【见下页】 初始值( Initial values) 最优子结构 当k)一计+所有词的字符数≤行字符数时【词与词 ■原问题的解是最优的 之间空格数为】 ■C(uj)=行字符数一K(〕]3 ■如何说明原问题的解是由子问题的最优解构成 ■D(uj)户=0 的,这是分析问题的关键 ■如果编程:请问最严重的异常情况是什么? ■思考:如果字符宽度不同,上述方法有效否? ■进一步:如果字符宽度不同,其宽度可以无级缩放, 每行两头完全对齐,如何定义优美度 清华太学未证想 优美打印问题变种 编辑距离 ■字符宽度不同 ■从一个字符串到另一个字符串可以由若干变换构成 ■可以调整某些字符的宽度,如空格,但调整要 计算在优美度内 可以调整所有字符的宽度 ■其它语言如汉语的优美度问题 n Twiddle ■图文混排问题等等 ■x=oy,其中o是一连串的变换=t2…tn
5 清华大学 宋斌恒 25 Analyzing Printing neatly again n 断句点 n 断句点确定后留下两类问题 n 一类问题的不优美度要考虑最后一行数值;另 一类问题的不优美度不考虑最后一行的数值; n 说明上述选择方法、证明上述选择方法的正确 性。 n 递归公式准备工作如何做?【见下页】 清华大学 宋斌恒 26 Recursive Formula n 请问:如果C(i,j)表示从第i个词开始到第j个词结束段落 包括最后一行的不优美度,如果D(i,j)表示从第i个词开 始到第j个词结束段落不包括最后一行的不优美度,假 设有n个词,原问题的不优美度为 n D(1,n)!递归公式? n C(i,j)=mini<k<j {C(i,k)+ C(k+1,j)} n D(i,j)=mini<k<j {C(i,k)+ D(k+1,j)} n 初始值? 清华大学 宋斌恒 27 初始值(Initial values) n 当k(i,j)=j-i+所有词的字符数≤行字符数时【词与词 之间空格数为1】 n C(i,j)=[行字符数-k(i,j)]3 n D(i,j)=0 n 如果编程:请问最严重的异常情况是什么? n 思考:如果字符宽度不同,上述方法有效否? n 进一步:如果字符宽度不同,其宽度可以无级缩放, 每行两头完全对齐,如何定义优美度? 清华大学 宋斌恒 28 最优子结构 n 原问题的解是最优的 n 如何说明原问题的解是由子问题的最优解构成 的,这是分析问题的关键 清华大学 宋斌恒 29 优美打印问题变种 n 字符宽度不同 n 可以调整某些字符的宽度,如空格,但调整要 计算在优美度内。 n 可以调整所有字符的宽度。 n 其它语言如汉语的优美度问题 n 图文混排问题等等 清华大学 宋斌恒 30 编辑距离 n 从一个字符串到另一个字符串可以由若干变换构成: n Copy n Replace n Delete n Insert n Twiddle n Kill n x=s(y), 其中s是一连串的变换s=t1 t2… tn
变换成本与距离 Optimal binary search tree 费用最小的o的费用就是和y之间的距离 ■ Binary Search Tree? ■C[x,yJ=C[yml+ twiddle可以 twiddle, a Search tree 可以 replace m Examples? cyn]=C1ymH+ delete可以 delet I Optimal: under some condition C区1yc[Xyn]k可以k m The elements appear with different ■最优子结构:最优解可以分解为子问题的最优解的综合。 frequencie a How to formally define it A search tree Definition for the optimal binary search tree istinct keys stored from smaller value to larger value a We need n+1 dummy keys for values not in K, these dummy keys are do d1,..., dn a The appearance probabilities of k and d are Find a search tree such that: 清华太学未证想 Cost of the searches are minimum The structure of an optimal binary search tree a The expectation cost of search: [See p358 a We need a structure that the optimal solution 15.16] can be constructed from one or many optimal ■ Esearch cost in solution for subproblems Sum[(depth(*)+1)pl+ a Any sub tree of binary search tree contains sum((depth()+1)如] the keys continuing from k to k, and also th Where p are the occurrence possibility of k, q dummy keys from d- to d
6 清华大学 宋斌恒 31 变换成本与距离 n Cost(s)=SCost(ti ) n 我们称由x到y的费用最小的s的费用就是x和y之间的距离。 n 子问题空间如何确定: n C[xn ,ym]=C[xn-2 ,ym-2 ]+twiddle 可以 twiddle, n C[xn ,ym]=C[xn-1 ,ym-1 ]+copy 可以 copy, n C[xn ,ym]=C[xn-1 ,ym-1 ]+replace 可以 replace, n C[xn ,ym]=C[xn-1 ,ym]+delete 可以 delete, n C[xn ,ym]=C[xn ,ym-1 ]+insert 可以 insert, n C[xn ,ym]=C[xn-s ,ym]+kill 可以 kill, n 最优子结构:最优解可以分解为子问题的最优解的综合。 n 问题: 清华大学 宋斌恒 32 Optimal binary search tree n Binary Search Tree? n Binary tree n Search tree n Examples? n Optimal: under some condition: n The elements appear with different frequencies n How to formally define it? 清华大学 宋斌恒 33 A search tree k1 k3 k4 k2 d0 d1 d2 d3 d4 清华大学 宋斌恒 34 Definition for the optimal binary search tree n Given a sequence K= of n distinct keys stored orderly from smaller value to larger value. n We need n+1 dummy keys for values not in K, these dummy keys are d0 ,d1 ,… ,dn n The appearance probabilities of ki and dj are given n Find a search tree such that: 清华大学 宋斌恒 35 Cost of the searches are minimum n The expectation cost of search:[See p358 15.16] n E[search cost in T]= Sum[(depth(ki )+1)*pi ]+ sum[(depth(di )+1)*qi ]+ Where pi are the occurrence possibility of ki , qi are the occurrence possibility of di 清华大学 宋斌恒 36 The structure of an optimal binary search tree n We need a structure that the optimal solution can be constructed from one or many optimal solution for subproblems n Any sub tree of binary search tree contains the keys continuing from ki to kj and also the dummy keys from di-1 to dj
费用递归函数 The optimal structure of the binary search tree e[jl, the optimal If the tree is optimal then its all subtree is xpectation value for the ■ Prove it! the key k to k, we have ■ej|=p+(e1]+w(F 1)+er+1+wr+1J) ■e}=e[r1}+er+1』w( Chapter notes aR. Bellman' s systematic study on dynamic programming in 1955 with solid math base solution by Hu and Shing a Knuth posed a question is there any solution the answer is yes with a solution with o(mn/lg n)by Masket and Paterson 清华太学未证想
7 清华大学 宋斌恒 37 费用递归函数 n e[i,j ], the optimal expectation value for the optimal tree n If kr is the root of the optimal subtree containing the key ki to kj , we have n e[i,j ]=pr+(e[i,r-1]+w(i,r- 1))+e[r+1,j]+w(r+1,j)) n e[i,j ]=e[i,r-1]+e[r+1,j]+w(i,j) k1 k3 k4 k2 d0 d1 d2 d3 d4 清华大学 宋斌恒 38 The optimal structure of the binary search tree n If the tree is optimal, then its all subtree is optimal. n Prove it! 清华大学 宋斌恒 39 Chapter notes n R. Bellman’s systematic study on dynamic programming in 1955 with solid math base n Matrix chain production has an O(n lg n) solution by Hu and Shing n Knuth posed a question is there any solution whose complexity is less than O(mn) for LCS, the answer is yes with a solution with O(mn/lg n) by Masket and Paterson