Q-gram distance
◼ Let q be a positive integer. Given a string s, the set of q-grams of s, denoted G(s), is obtained by sliding a window of length q over the characters of s.
◼ For example, if q = 2:
G(“Harrison Ford”) = {'Ha', 'ar', 'rr', 'ri', 'is', 'so', 'on', 'n ', ' F', 'Fo', 'or', 'rd'}
G(“Harison Fort”) = {'Ha', 'ar', 'ri', 'is', 'so', 'on', 'n ', ' F', 'Fo', 'or', 'rt'}
◼ Distance(s1, s2) = 1 − |G(s1) ∩ G(s2)| / |G(s1) ∪ G(s2)|, i.e. one minus the Jaccard similarity of the two q-gram sets, so identical sets give 0.
◼ Distance(“Harrison Ford”, “Harison Fort”) = 1 − 10/13 ≈ 0.23, since the two sets share 10 q-grams and their union contains 13.
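The computation above can be checked with a short Python sketch. The function names `qgrams` and `qgram_distance` are illustrative and not from the slide. Returning 0 when neither string has any q-grams (both shorter than q) is an assumption, since the slide does not cover that case.

```python
def qgrams(s, q=2):
    """Return the set G(s): all substrings of s of length q."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def qgram_distance(s1, s2, q=2):
    """Return 1 - |G(s1) ∩ G(s2)| / |G(s1) ∪ G(s2)|."""
    g1, g2 = qgrams(s1, q), qgrams(s2, q)
    union = g1 | g2
    if not union:
        # Assumption: two strings with no q-grams are treated as identical.
        return 0.0
    return 1 - len(g1 & g2) / len(union)


if __name__ == "__main__":
    a, b = "Harrison Ford", "Harison Fort"
    print(len(qgrams(a)), len(qgrams(b)))   # 12 11
    print(len(qgrams(a) & qgrams(b)))       # 10
    print(round(qgram_distance(a, b), 2))   # 0.23
```

Because G(s) is a set, repeated q-grams in a string count only once. A multiset (bag) variant would need `collections.Counter` instead and can give different values.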