Index Compression
Collection Statistics
Vocabulary vs. collection size
▪ Heaps' law: M = kT^b
▪ M is the size of the vocabulary, T is the number of tokens in the collection
▪ Typical values: 30 ≤ k ≤ 100 and b ≈ 0.5
▪ In a log-log plot of vocabulary size M vs. T, Heaps' law predicts a line with slope about ½
▪ It is the simplest possible relationship between the two in log-log space
▪ An empirical finding ("empirical law")
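The log-log claim follows from taking logarithms of Heaps' law: log M = log k + b log T, which is a straight line in (log T, log M) with slope b and intercept log k. The following is a minimal Python sketch of that relationship, not a reference implementation. The function names `heaps_vocabulary` and `fit_heaps` are hypothetical, and the defaults k = 50 and b = 0.5 are illustrative values picked from the typical range quoted above, not measurements from any collection.

```python
import math


def heaps_vocabulary(num_tokens, k=50.0, b=0.5):
    """Predicted vocabulary size M = k * T**b (k and b are assumed values)."""
    return k * num_tokens ** b


def fit_heaps(token_counts, vocab_sizes):
    """Fit a least-squares line through (log T, log M) and return (k, b).

    The slope of the line is b and the intercept is log k.
    Needs at least two distinct, positive token counts.
    """
    xs = [math.log(t) for t in token_counts]
    ys = [math.log(m) for m in vocab_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space
    k = math.exp(mean_y - b * mean_x)   # exp(intercept)
    return k, b


if __name__ == "__main__":
    # Synthetic points generated from the law itself, so the fit should
    # recover the assumed parameters; real collections will be noisier.
    tokens = [10_000, 100_000, 1_000_000, 10_000_000]
    vocab = [heaps_vocabulary(t) for t in tokens]
    k_hat, b_hat = fit_heaps(tokens, vocab)
    print(f"k = {k_hat:.2f}, b = {b_hat:.3f}")
```

With these assumed parameters, a collection of one million tokens would be predicted to have about 50,000 distinct terms. On real data, the (T, M) pairs would come from counting distinct terms as the collection is scanned, and the fitted slope would be compared against the b ≈ 0.5 rule of thumb.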