Index Compression
Collection Statistics
Heaps' Law (Fig 5.1, p81)

For RCV1, the dashed line $\log_{10} M = 0.49 \log_{10} T + 1.64$ is the best least-squares fit.
Thus $M = 10^{1.64} T^{0.49}$, so $k = 10^{1.64} \approx 44$ and $b = 0.49$.
Good empirical fit for Reuters RCV1!
For the first 1,000,020 tokens, the law predicts 38,323 terms; actually, 38,365 terms.
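The arithmetic on this slide can be checked with a short sketch. It assumes the usual reading of Heaps' law, where M is the vocabulary size (number of distinct terms) and T is the number of tokens. The function and variable names are hypothetical, chosen only for illustration. The slide's prediction appears to use the rounded value k = 44 rather than the unrounded 10^1.64, so the sketch does the same.

```python
def params_from_loglog_fit(slope, intercept):
    """Convert the line log10(M) = slope * log10(T) + intercept
    into Heaps' law parameters (k, b) for M = k * T**b."""
    return 10 ** intercept, slope


def heaps_vocabulary_size(num_tokens, k, b):
    """Estimate the vocabulary size M = k * T**b."""
    return k * num_tokens ** b


# Parameters of the least-squares line reported for RCV1.
k_unrounded, b = params_from_loglog_fit(slope=0.49, intercept=1.64)

# Assumption: the prediction is made with k rounded to the nearest integer.
k = round(k_unrounded)

predicted = heaps_vocabulary_size(1_000_020, k=k, b=b)
actual = 38_365  # observed number of terms, from the slide
relative_error = abs(predicted - actual) / actual

print(f"k = {k_unrounded:.2f} (rounded to {k}), b = {b}")
print(f"predicted = {predicted:.0f}, actual = {actual}")
print(f"relative error = {relative_error:.2%}")
```

With the rounded k the prediction comes out at roughly 38,323 terms, close to the observed 38,365. Using the unrounded k gives a noticeably lower estimate, which shows how sensitive the prediction is to the fitted intercept.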