
Web Search and Mining — course teaching resources (PPT slides): Lecture 08 Scoring and results assembly

Resource type: document library; format: PPT; pages: 48; size: 446KB

Computing Scores in a Complete Search System
Web Search and Mining, Lecture 8: Scoring and results assembly

Recap: tf-idf weighting
▪ The tf-idf weight of a term is the product of its tf weight and its idf weight:
  w_{t,d} = (1 + log10 tf_{t,d}) × log10(N / df_t)
▪ Best known weighting scheme in information retrieval
▪ Increases with the number of occurrences within a document
▪ Increases with the rarity of the term in the collection
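The tf-idf formula above translates directly into code. A minimal sketch (the function name and signature are my own, not from the slides):

```python
import math

def tfidf_weight(tf, df, N):
    """tf-idf weight: (1 + log10 tf) * log10(N / df).

    tf: term frequency of the term in the document (assumed > 0)
    df: document frequency of the term (number of docs containing it)
    N:  total number of documents in the collection
    """
    return (1 + math.log10(tf)) * math.log10(N / df)
```

Note the two factors behave as the slide states: the first grows with in-document occurrences, the second grows as the term becomes rarer in the collection (df falls relative to N).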

Recap: Queries as vectors
▪ Key idea 1: Do the same for queries: represent them as vectors in the space
▪ Key idea 2: Rank documents according to their proximity to the query in this space
▪ proximity = similarity of vectors

Recap: cosine(query, document)

  cos(q, d) = (q · d) / (|q| |d|)                     (dot product over unit vectors)
            = Σ_{i=1..|V|} q_i d_i / ( sqrt(Σ_{i=1..|V|} q_i²) · sqrt(Σ_{i=1..|V|} d_i²) )

cos(q, d) is the cosine similarity of q and d … or, equivalently, the cosine of the angle between q and d.
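The cosine formula can be sketched for sparse term-weight vectors (the dict-based representation here is an illustrative choice, not from the slides):

```python
import math

def cosine(q, d):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts.

    Numerator: dot product over shared terms.
    Denominator: product of the two vector lengths (L2 norms).
    """
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d)
```

Identical vectors score 1.0 (angle 0), vectors with no shared terms score 0.0 (orthogonal), matching the geometric reading on the slide.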

This lecture
▪ Speeding up vector space ranking
▪ Putting together a complete search system
▪ Will require learning about a number of miscellaneous topics and heuristics

Efficient Ranking: Computing cosine scores

COSINESCORE(q)
 1  float Scores[N] = 0
 2  float Length[N]
 3  for each query term t
 4  do calculate w_{t,q} and fetch postings list for t
 5     for each pair (d, tf_{t,d}) in postings list
 6     do Scores[d] += w_{t,d} × w_{t,q}
 7  Read the array Length
 8  for each d
 9  do Scores[d] = Scores[d] / Length[d]
10  return Top K components of Scores[]
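The COSINESCORE pseudocode above can be sketched in Python. One simplifying assumption: the postings here carry precomputed document weights w_{t,d} rather than raw tf values, and the data-structure choices (dicts for the index and lengths) are my own:

```python
import heapq

def cosine_score(query_weights, postings, lengths, K):
    """Sketch of COSINESCORE over an inverted index.

    query_weights: {term: w_tq}, the query term weights
    postings:      {term: [(doc_id, w_td), ...]}, weights precomputed
    lengths:       {doc_id: document vector length}
    """
    scores = {}
    # Term-at-a-time accumulation (lines 3-6 of the pseudocode).
    for t, w_tq in query_weights.items():
        for d, w_td in postings.get(t, []):
            scores[d] = scores.get(d, 0.0) + w_td * w_tq
    # Length-normalize (lines 7-9).
    for d in scores:
        scores[d] /= lengths[d]
    # Top-K selection (line 10).
    return heapq.nlargest(K, scores.items(), key=lambda kv: kv[1])
```

Only documents that share at least one term with the query ever enter `scores`, which is why the postings-driven loop is cheaper than scoring every document.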

Efficient Ranking: Efficient cosine ranking
▪ Find the K docs in the collection “nearest” to the query ⇒ K largest query-doc cosines.
▪ Efficient ranking:
  ▪ Computing a single cosine efficiently.
  ▪ Choosing the K largest cosine values efficiently.
▪ Can we do this without computing all N cosines?
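For the second sub-problem, choosing the K largest values, a size-K heap avoids sorting all N scores: O(N log K) rather than O(N log N). A minimal sketch using the standard library:

```python
import heapq

def top_k(scores, K):
    """Select the K highest-scoring (doc_id, score) pairs without a full sort.

    heapq.nlargest maintains a min-heap of size K while scanning the N
    scores once, so work is O(N log K) instead of O(N log N).
    """
    return heapq.nlargest(K, scores.items(), key=lambda kv: kv[1])
```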

Efficient Ranking: Efficient cosine ranking
▪ What we’re doing in effect: solving the K-nearest neighbor problem for a query vector
▪ In general, we do not know how to do this efficiently for high-dimensional spaces
▪ But it is solvable for short queries, and standard indexes support this well

Special case – unweighted queries
▪ No weighting on query terms
▪ Assume each query term occurs only once
▪ Then for ranking, don’t need to normalize query vector
▪ Slight simplification of algorithm from Lecture 7

Faster cosine: unweighted query

FASTCOSINESCORE(q)
 1  float Scores[N] = 0
 2  for each d
 3  do initialize Length[d] to the length of doc d
 4  for each query term t
 5  do calculate w_{t,q} and fetch postings list for t
 6     for each pair (d, tf_{t,d}) in postings list
 7     do add wf_{t,d} to Scores[d]
 8  Read the array Length[d]
 9  for each d
10  do divide Scores[d] by Length[d]
11  return Top K components of Scores[]

Figure 7.1 A faster algorithm for vector space scores.
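The unweighted-query variant above drops the w_{t,q} multiplication: each query term simply adds its document weight to the accumulator. A sketch under the same simplifying assumption as before (postings carry precomputed weights wf_{t,d}; the data structures are my own choices):

```python
import heapq

def fast_cosine_score(query_terms, postings, lengths, K):
    """Sketch of FASTCOSINESCORE for unweighted queries.

    Because every query term has weight 1 and the query vector is never
    normalized, each posting's weight wf_td is added to the score directly.
    postings: {term: [(doc_id, wf_td), ...]}
    lengths:  {doc_id: document vector length}
    """
    scores = {}
    for t in query_terms:
        for d, wf_td in postings.get(t, []):
            scores[d] = scores.get(d, 0.0) + wf_td   # line 7: no w_tq factor
    for d in scores:
        scores[d] /= lengths[d]                      # lines 9-10
    return heapq.nlargest(K, scores.items(), key=lambda kv: kv[1])
```

Skipping query normalization is safe for ranking because dividing every score by the same constant |q| cannot change the order of documents.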
