String Matching algorithm Overview& Analysis By cyclone@ NSlab, RIIT Sep.62008
String Matching Algorithm Overview & Analysis By cyclone @ NSlab, RIIT Sep. 6 2008
Structure ● Algorithm overview o Performance experiments o Solution and future work o Bibliography and resources
Structure ⚫Algorithm overview ⚫Performance experiments ⚫Solution and future work ⚫Bibliography and resources
Definition o Given an alphabet bet S, a pattern P of length m and a text T of length n, find if P is in Tor the position( s)p matches a substring of T, where usually m<<n o Considering the pattern P Ostring exact string matching O string with errors approximate string matching O regular expression regular expression matching
Definition ⚫ Given an alphabet bet S, a pattern P of length m and a text T of length n, find if P is in T or the position (s) P matches a substring of T, where usually m<<n ⚫ Considering the pattern P string exact string matching string with errors approximate string matching regular expression regular expression matching
Categories ● Category1 O Single string matching algorithms O Multiple string matching algorithms o Category 2 O Prefix based algorithms O Suffix based Algorithms O Factor based Algorithms o Category 3 O Automaton based algorithms O Trie based algorithms O Table based algorithms
Categories ⚫ Category 1 Single string matching Algorithms Multiple string matching Algorithms ⚫ Category 2 Prefix based Algorithms Suffix based Algorithms Factor based Algorithms ⚫ Category 3 Automaton based Algorithms Trie based Algorithms Table based Algorithms ⚫ ……
Prefix based algorithms b o The search is done forward in the search window o Search for the longest prefix of the window Which is also a prefix of the pattern(s) oall the characters are read
Prefix based algorithms ⚫The search is done forward in the search window ⚫Search for the longest prefix of the window which is also a prefix of the pattern(s) ⚫all the characters are read
Suffix based algorithms o The search is done backwards along the search window Search for the longest suffix of the window which is also a suffix of the pattern(s) not all the characters are read which leads to sublinear average-case algorithms
Suffix based algorithms ⚫The search is done backwards along the search window ⚫Search for the longest suffix of the window which is also a suffix of the pattern(s) ⚫not all the characters are read, which leads to sublinear average-case algorithms
Factor based algorithms a o The search is done backwards along the search window o Search for the longest suffix of the window which is also a factor of the pattern(s) o not all the characters are read o Require to recognize the set of factors of the pattern( s)and this is quite complex
Factor based algorithms ⚫ The search is done backwards along the search window ⚫ Search for the longest suffix of the window which is also a factor of the pattern(s) ⚫ not all the characters are read ⚫ Require to recognize the set of factors of the pattern(s) and this is quite complex
Roadmap Single Multiple Prefix KMP AC_、A6 C Modified BⅣ CW(AC_BM) Suffix →SBMH BMH WM Factor BDM SBDM BOM SBOM
Roadmap
KMP ●MP: border the longest prefiⅸ v of the pattern which is also the suffix of the portion u of the text ●KMP: Longest border add“C(u+1) not equal to C(v+1)
KMP ⚫ MP: border the longest prefix v of the pattern which is also the suffix of the portion u of the text ⚫ KMP: Longest border add “C (u+1) not equal to C(v+1)