Stop Words and Stemming
◼ From a given Stop Word List
  ◼ [a, about, again, are, the, to, of, …]
  ◼ Remove them from the documents
◼ Or, determine stop words
  ◼ Given a large enough corpus of common English
  ◼ Sort the list of words in decreasing order of their occurrence frequency in the corpus
  ◼ Zipf's law: Frequency * rank ≈ constant
    ◼ most frequent words tend to be short
    ◼ most frequent 20% of words account for 60% of usage
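The two approaches on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own implementation: the function names (`tokenize`, `remove_stop_words`, `frequency_ranking`, `derive_stop_words`), the regex tokenizer, and the `top_k` cutoff are all assumptions introduced here, and the stop list contains only the words shown on the slide.

```python
import re
from collections import Counter

# Stop list taken from the slide; a real list would be much longer.
STOP_WORDS = {"a", "about", "again", "are", "the", "to", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Approach 1: drop every token that appears in a given stop list."""
    return [t for t in tokens if t not in stop_words]


def frequency_ranking(tokens):
    """Sort words by decreasing frequency; ties are broken alphabetically.

    Returns (rank, word, frequency, rank * frequency) tuples. Under Zipf's
    law the last value stays roughly constant on a large corpus.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


def derive_stop_words(tokens, top_k):
    """Approach 2: treat the top_k most frequent words as stop words (hypothetical cutoff)."""
    return {word for _, word, _, _ in frequency_ranking(tokens)[:top_k]}
```

For example, `remove_stop_words(tokenize("The history of the web"))` keeps only `history` and `web`. The rank-times-frequency column from `frequency_ranking` becomes meaningful only on a large corpus; on a few sentences it will not look constant.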