Information Retrieval and Data Mining  2019/3/7  22

Review: Key Steps of Tokenization

• After all documents have been converted into (term, docID) pairs, the list is sorted alphabetically by term. Entries with the same term are then ordered by docID.
• Note: comparing terms (as strings) may take more CPU cycles.
• We focus on this sorting step: there are 100M tokens to sort.

Before sorting        After sorting
Term        Doc #     Term        Doc #
I           1         ambitious   2
did         1         be          2
enact       1         brutus      1
julius      1         brutus      2
caesar      1         caesar      1
I           1         caesar      2
was         1         caesar      2
killed      1         capitol     1
i'          1         did         1
the         1         enact       1
capitol     1         hath        2
brutus      1         I           1
killed      1         I           1
me          1         i'          1
so          2         it          2
let         2         julius      1
it          2         killed      1
be          2         killed      1
with        2         let         2
caesar      2         me          1
the         2         noble       2
noble       2         so          2
brutus      2         the         1
hath        2         the         2
told        2         told        2
you         2         was         1
caesar      2         was         2
was         2         with        2
ambitious   2         you         2
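The sort step can be sketched in a few lines of Python. This is a minimal sketch, not the course's implementation: the input is the toy (term, docID) list from the table, and the function names `sort_pairs` and `sort_by_term_id` are hypothetical. The case-insensitive sort key is an assumption chosen so that "I" falls between "hath" and "i'" as in the table. The termID variant is also an assumption, added only to illustrate the CPU-cost remark: mapping each term to an integer first lets the sort compare integers instead of strings, but the resulting order then follows termID assignment rather than the alphabet.

```python
# Hypothetical toy input: the (term, docID) pairs in the order the tokenizer
# emitted them (Doc 1 first, then Doc 2), copied from the table.
PAIRS = [
    ("I", 1), ("did", 1), ("enact", 1), ("julius", 1), ("caesar", 1),
    ("I", 1), ("was", 1), ("killed", 1), ("i'", 1), ("the", 1),
    ("capitol", 1), ("brutus", 1), ("killed", 1), ("me", 1),
    ("so", 2), ("let", 2), ("it", 2), ("be", 2), ("with", 2),
    ("caesar", 2), ("the", 2), ("noble", 2), ("brutus", 2), ("hath", 2),
    ("told", 2), ("you", 2), ("caesar", 2), ("was", 2), ("ambitious", 2),
]


def sort_pairs(pairs):
    """Sort alphabetically by term (case-insensitive, an assumption), then by docID."""
    return sorted(pairs, key=lambda p: (p[0].lower(), p[1]))


def sort_by_term_id(pairs):
    """Hypothetical variant: map terms to integer termIDs, then sort integer pairs.

    Returns the sorted (termID, docID) list and the term -> termID dictionary.
    termIDs are assigned in order of first occurrence, so the sorted order
    follows that assignment, not alphabetical order.
    """
    term_ids = {}
    id_pairs = []
    for term, doc_id in pairs:
        # len(term_ids) is evaluated before insertion, so IDs start at 0.
        tid = term_ids.setdefault(term.lower(), len(term_ids))
        id_pairs.append((tid, doc_id))
    id_pairs.sort()  # compares small integers instead of strings
    return id_pairs, term_ids


if __name__ == "__main__":
    for term, doc_id in sort_pairs(PAIRS):
        print(f"{term:<12}{doc_id}")
```

For example, `sort_pairs(PAIRS)[:4]` returns `[('ambitious', 2), ('be', 2), ('brutus', 1), ('brutus', 2)]`. Sorting on the (term, docID) tuple in one pass gives both orderings at once; with 100M tokens, an in-memory `sorted` call like this may not fit, which is why the sort step deserves separate attention.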