Randomized Algorithm for motif Detection Lusheng Wang Dept of Computer Science City University of Hong Kong Web:http://www.cs.cityu.edu.hk/wlwang e-mail: Iwang @cs. cityu. edu.hk 2021/1/29
2021/1/29 1 Randomized Algorithm for Motif Detection Lusheng Wang Dept. of Computer Science City University of Hong Kong Web: http://www.cs.cityu.edu.hk/~lwang e-mail: lwang@cs.cityu.edu.hk
Outline Review our theoretical results 2. A software for motif detection 3. Other problems we have been working Parametric tools RNA secondary structure comparison Computing translocation distance between genomes 2021/1/29
2021/1/29 2 Outline: 1. Review our theoretical results 2. A software for motif detection 3. Other problems we have been working --Parametric tools RNA secondary structure comparison --Computing translocation distance between genomes
Distinguishing Substring Selection Problem Instance: Given a set B of n, (bad) strings of length at least L, a setG of n,(good) strings, two thresholds db and < Objective: Find a string s such that for each string b E B there exists a substring t of b with lengthl such that d(S,1)≤db and for any string g∈G, (S,g)≥ Bad sequences s: motif Good sequences 2021/1/29
2021/1/29 3 Distinguishing Substring Selection Problem Instance: Given a set B of n1 (bad) strings of length at least L, a set G of n2 (good) strings, two thresholds db and dg ( db < dg ). Objective: Find a string s such that for each string b B there exists a substring t of b with length L such that d(s, t) db and for any string g G, d(s, g) dg . Bad sequences Good sequences s: motif
Applications 1. Finding targets for Potential drugs (T Jiang, C. Trendall, S, Wang, T. Wareham, X. Zhang, 98) (K. Lanctot, M. Li, B Ma, S. Wang, and L. Zhang 1999) bad strings in b are from bacteria --good strings are from Humans find a substring s of length L that is conserved in all bad strings, but not conserved in good strings use s to screen chemicals those selected chemicals can then be tested as potential broad-range antibiotics 2021/1/29
2021/1/29 4 Applications 1. Finding Targets for Potential Drugs (T. Jiang, C. Trendall, S, Wang, T. Wareham, X. Zhang, 98) (K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang 1999) -- bad strings in B are from Bacteria. --good strings are from Humans -- find a substring s of length L that is conserved in all bad strings, but not conserved in good strings. -- use s to screen chemicals -- those selected chemicals can then be tested as potential broad-range antibiotics
Applications 2. Creating Diagnostic Probes for Bacterial Infection (T. Brown, G.A. Leonard, E.D. Booth, G. Kneale, 1990) a group of closely related pathogenic bacteria find a substring that occurs in each of the bacterial sequences(with as few substitutions as possible) and does not occurs in the host(Human) 2021/1/29
2021/1/29 5 Applications 2. Creating Diagnostic Probes for Bacterial Infection (T. Brown, G.A. Leonard, E.D. Booth, G. Kneale, 1990) -- a group of closely related pathogenic bacteria -- find a substring that occurs in each of the bacterial sequences (with as few substitutions as possible) and does not occurs in the host (Human)
Applications 3. Creating Universal PCR Primers 4. Creating Unbiased Consensus Sequences 5. Anti-sense Drug design 6. In coding theory(computer science) 2021/1/29
2021/1/29 6 Applications 3. Creating Universal PCR Primers 4. Creating Unbiased Consensus Sequences 5. Anti-sense Drug Design 6. In coding theory (computer science)
Distinguishing string selection Problem Instance: Given a set B of n,(bad) strings of length L, a set G of n2(good)strings of length L, two thresholds db and d Center string Good Strings 2021/1/29
2021/1/29 7 Distinguishing String Selection Problem Instance: Given a set B of n1 (bad) strings of length L, a set G of n2 (good) strings of length L, two thresholds db and dg ( db < dg ). Objective: Find a string s such that for each string b B, d(s, b) db and for any string g G, d(s, g) dg . Bad strings Center string Good Strings
Closest string problem Instance: Given a set B of n strings of length L, and an integer d Objective: Find a string s of length L such that for each string s∈B,a(s,s;)≤d Theorem: Closest string problem is NP-hard. (Lanctot, Li, Ma, Wang, Zhang, 1999) 2021/1/29 8
2021/1/29 8 Closest String Problem Instance: Given a set B of n strings of length L, and an integer d Objective: Find a string s of length L such that for each string si B, d(s, si ) d Theorem: Closest string problem is NP-hard. (Lanctot, Li, Ma, Wang, Zhang, 1999)
Closest Substring Problem Instance: Given a set b of n strings of length at least L and an integer d Objective: Find a string s of length l such that for each string S,E B, there is a substring t; of s,(with length D) such that a(,s)≤d Theorem: Closest substring problem is NP-hard (K. Lanctot, M. Li, B Ma, S. Wang, and L. Zhang 1999) A PTAS was given in M. Li, B Ma, and L. Wang 2000 2021/1/29
2021/1/29 9 Closest Substring Problem Instance: Given a set B of n strings of length at least L, and an integer d Objective: Find a string s of length L such that for each string si B, there is a substring t i of si (with length L) such that d(s, si ) d Theorem: Closest substring problem is NP-hard. (K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang 1999) A PTAS was given in M. Li, B. Ma, and L. Wang 2000
Consensus pattern Instance: Given a set B of n strings of different lengths and an integer L Objective: Find a string s of length L and for each string S;E B, find a substring t; of s, such that d(s,t Is minimized Theorem: The consensus pattern problem is NP-hard There is a PTAs for consensus pattern(M Li, B Ma, and L Wang, 1999) 2021/1/29 10
2021/1/29 10 Consensus Pattern Instance: Given a set B of n strings of different lengths, and an integer L, Objective: Find a string s of length L and for each string si B, find a substring t i of si such that n d(s, t i ) i=1 is minimized. Theorem: The consensus pattern problem is NP-hard. There is a PTAs for consensus pattern. (M. Li, B. Ma, and L. Wang, 1999)