Index Compression
Collection Statistics

Exercises
▪ Compute the vocabulary size M for this scenario:
▪ Looking at a collection of web pages, you find that there are 3,000 different terms in the first 10,000 tokens and 30,000 different terms in the first 1,000,000 tokens.
▪ Assume a search engine indexes a total of 20,000,000,000 (2 × 10^10) pages, containing 200 tokens each on average.
▪ What is the size of the vocabulary of the indexed collection as predicted by Heaps' law?
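One way to approach the exercise, sketched under the assumption that Heaps' law is taken in its usual form M = k · T^b (M the vocabulary size, T the number of tokens): the two observations fix both parameters. Dividing them gives 30,000 / 3,000 = (1,000,000 / 10,000)^b, so 10 = 100^b and b = 0.5; then k = 3,000 / 10,000^0.5 = 30. The indexed collection holds T = 2 × 10^10 × 200 = 4 × 10^12 tokens, so the prediction is M = 30 × (4 × 10^12)^0.5 = 6 × 10^7 terms. The short Python sketch below repeats this arithmetic; the function name `fit_heaps` and the variable names are illustrative and not taken from the slides.

```python
import math


def fit_heaps(t1, m1, t2, m2):
    """Fit M = k * T**b to two (tokens, vocabulary size) observations.

    Hypothetical helper for this exercise: taking logs of both
    observations and subtracting eliminates k and leaves b.
    """
    b = math.log(m2 / m1) / math.log(t2 / t1)
    k = m1 / t1 ** b
    return k, b


# Observations from the exercise.
k, b = fit_heaps(10_000, 3_000, 1_000_000, 30_000)

# Total tokens in the indexed collection: pages * average tokens per page.
total_tokens = 20_000_000_000 * 200

# Vocabulary size predicted by Heaps' law for the full collection.
predicted_vocab = k * total_tokens ** b

print(f"k = {k:.4g}, b = {b:.4g}")
print(f"predicted vocabulary size M = {predicted_vocab:.4g}")
```

The printed values should agree with the hand calculation, up to floating-point rounding.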