Index Compression: Dictionary Compression

Fixed-width terms are wasteful

▪ Most of the bytes in the Term column are wasted: we allot 20 bytes for 1-letter terms.
▪ And we still can't handle supercalifragilisticexpialidocious or hydrochlorofluorocarbons.
▪ Written English averages ~4.5 characters/word.
▪ Ave. dictionary word in English: ~8 characters
▪ How do we use ~8 characters per dictionary term?
▪ Short words dominate token counts but not type average.
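The points above can be made concrete with a small sketch. It assumes ASCII terms and the 20-byte slot from the slide; the sample terms, the sample token stream, and the helper names `fixed_width_waste` and `average_length` are illustrative assumptions, not part of the original lecture.

```python
TERM_WIDTH = 20  # bytes allotted per term slot, as on the slide


def fixed_width_waste(terms, width=TERM_WIDTH):
    """Return (bytes_allotted, bytes_used, too_long) for a fixed-width term column.

    Terms longer than `width` cannot be stored whole; they are counted as
    filling their slot and are also reported in `too_long`.
    """
    lengths = [len(t.encode("ascii")) for t in terms]
    allotted = width * len(terms)
    used = sum(min(n, width) for n in lengths)
    too_long = [t for t, n in zip(terms, lengths) if n > width]
    return allotted, used, too_long


def average_length(words):
    """Mean number of characters per word in `words`."""
    words = list(words)
    return sum(len(w) for w in words) / len(words)


# Hypothetical dictionary terms, chosen only for illustration.
sample_terms = [
    "a",
    "aardvark",
    "zebra",
    "hydrochlorofluorocarbons",
    "supercalifragilisticexpialidocious",
]
allotted, used, too_long = fixed_width_waste(sample_terms)
print(f"allotted={allotted} bytes, used={used} bytes, wasted={allotted - used} bytes")
print("terms that do not fit:", too_long)

# Hypothetical token stream: short words repeat often, so they pull the
# per-token average down, while each distinct type counts once in the
# per-type (dictionary) average.
tokens = ["the", "of", "the", "a", "dictionary", "the", "of"]
token_avg = average_length(tokens)
type_avg = average_length(set(tokens))
print(f"average token length={token_avg:.2f}, average type length={type_avg:.2f}")
```

In this toy data the per-token average is lower than the per-type average, which mirrors the slide's contrast between ~4.5 characters per word in running text and ~8 characters per dictionary word.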