◼ Character-based metrics
  ◼ Advantages: work well for estimating distance between strings that differ due to typographical errors or abbreviations
  ◼ Disadvantages: expensive and less accurate for larger strings
◼ Token-based metrics
  ◼ View strings as “bags of tokens”, disregarding the order in which the tokens occur

Character-based
  ◼ Edit distance
  ◼ Affine gap distance
  ◼ Smith-Waterman distance
  ◼ Jaro distance metric
  ◼ Q-gram distance

Token-based
  ◼ Atomic strings
  ◼ WHIRL
  ◼ Q-grams with tf.idf
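
To make the contrast between the two families concrete, here is a minimal Python sketch. It assumes plain Levenshtein edit distance as the character-based example and a Jaccard overlap of whitespace-separated token bags as a simple stand-in for the token-based family. The function names `edit_distance` and `token_bag_similarity` are hypothetical, and the token measure is not WHIRL or tf.idf weighting, only an illustration of ignoring token order.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def token_bag_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two token multisets; token order is ignored.
    (Assumed illustrative measure, not taken from the slides.)"""
    bag_a = Counter(a.lower().split())
    bag_b = Counter(b.lower().split())
    union = sum((bag_a | bag_b).values())
    if union == 0:
        return 1.0  # two empty strings are treated as identical
    return sum((bag_a & bag_b).values()) / union
```

Under these assumptions, `edit_distance("Jon Smith", "John Smith")` is 1, so the character-based measure handles the typo well. `token_bag_similarity("John Smith", "Smith John")` is 1.0 because the swapped order is disregarded, whereas the edit distance between those two strings is large.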