Outline
• Vector Space Model (VSM)
• Latent Semantic Model (LSI)
• Language Model (LM)
CCF-ADL at Zhengzhou University, June 25-27, 2010
Simple flow of retrieval process
[Figure: flowchart. Information Need → Representation → Query; Text Objects → Representation → Indexed Objects; Query and Indexed Objects → Comparison → Retrieved Objects → Evaluation/Feedback]
[Figure: screenshot of a Google search results page for the query "latent semantic indexing", annotated with the labels "Relevance Feedback" and "Query Expansion"]
Vector Space Model
Documents as vectors
• Each document j can be viewed as a vector: each term is a dimension, and its value is the log-scaled tf.idf weight
• So we have a vector space
  – terms are axes
  – docs live in this space
  – high-dimensional: even with stemming, it may have 20,000+ dimensions

Term                        D1    D2    D3    D4    D5    D6   …
China (中国)                4.1   0.0   3.7   5.9   3.1   0.0
culture (文化)              4.5   4.5   0     0     11.6  0
Japan (日本)                0     3.5   2.9   0     2.1   3.9
overseas students (留学生)  0     3.1   5.1   12.8  0     0
education (教育)            2.9   0     0     2.2   0     0
Beijing (北京)              7.1   0     0     0     4.4   3.8
…
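As a rough illustration of how such a term-document matrix might be filled in, the sketch below computes log-scaled tf.idf weights in Python. The slide does not give the exact formula, so the weighting (1 + log10 tf) · log10(N / df) is an assumption (one common variant). The toy corpus and the function name `tfidf_vectors` are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into log-scaled tf.idf vectors.

    Assumed weighting (not stated on the slide):
        w = (1 + log10(tf)) * log10(N / df)   if tf > 0, else 0
    """
    n_docs = len(docs)
    vocab = sorted({t for d in docs for t in d})        # one axis per term
    df = Counter(t for d in docs for t in set(d))       # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)                                 # raw term frequency
        vec = [
            (1 + math.log10(tf[t])) * math.log10(n_docs / df[t]) if tf[t] > 0 else 0.0
            for t in vocab
        ]
        vectors.append(vec)
    return vocab, vectors

# Hypothetical toy corpus, already tokenized
docs = [
    ["china", "culture", "beijing", "beijing"],
    ["culture", "japan", "student"],
    ["china", "japan", "student", "student"],
]
vocab, vectors = tfidf_vectors(docs)
for vec in vectors:
    print([round(w, 2) for w in vec])
```

Each printed row is one document vector; the columns follow the sorted vocabulary, so a real collection would yield one column per distinct (stemmed) term.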
Intuition
• Postulate: documents that are "close together" in the vector space talk about the same things.
[Figure: document vectors d1 to d5 plotted against term axes t1, t2, t3, with angles θ and φ between vectors]
• Use cases: query-by-example; free-text query as a vector
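Cosine similarity
• The "closeness" of vectors d1 and d2 can be measured by the size of the angle between them.
• Concretely, the cosine of that angle (θ in the figure) can be used as the vector similarity.
• Vectors are normalized by length (normalization):

$$|\vec{d}_j| = \sqrt{\sum_{i=1}^{M} w_{i,j}^2} = 1$$

$$\mathrm{sim}(\vec{d}_j, \vec{d}_k) = \frac{\vec{d}_j \cdot \vec{d}_k}{|\vec{d}_j|\,|\vec{d}_k|} = \frac{\sum_{i=1}^{M} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{M} w_{i,j}^2}\,\sqrt{\sum_{i=1}^{M} w_{i,k}^2}}$$

[Figure: vectors d1 and d2 plotted against term axes t1, t2, t3, separated by angle θ]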
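The formula can be tried on the numbers from the "Documents as vectors" slide. The following is a minimal Python sketch, not a reference implementation: the helper names `norm`, `normalize`, `dot`, and `cosine` are chosen for illustration, and each list holds one document column of that table (six term weights, in row order).

```python
import math

# Columns D1..D6 of the term-document table, one weight per term (row order)
docs = {
    "D1": [4.1, 4.5, 0.0, 0.0, 2.9, 7.1],
    "D2": [0.0, 4.5, 3.5, 3.1, 0.0, 0.0],
    "D3": [3.7, 0.0, 2.9, 5.1, 0.0, 0.0],
    "D4": [5.9, 0.0, 0.0, 12.8, 2.2, 0.0],
    "D5": [3.1, 11.6, 2.1, 0.0, 0.0, 4.4],
    "D6": [0.0, 0.0, 3.9, 0.0, 0.0, 3.8],
}

def dot(a, b):
    """Inner product: sum over terms of w_{i,j} * w_{i,k}."""
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    """Euclidean length: sqrt of the sum of squared weights."""
    return math.sqrt(dot(v, v))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = norm(v)
    return [w / length for w in v]

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (norm(a) * norm(b))

# After length normalization, cosine similarity is just a dot product.
unit = {name: normalize(v) for name, v in docs.items()}

print(round(cosine(docs["D3"], docs["D4"]), 3))
print(round(dot(unit["D3"], unit["D4"]), 3))   # same value, via unit vectors
print(round(cosine(docs["D3"], docs["D6"]), 3))
```

Normalizing once at indexing time means that each comparison at query time needs only the dot product, which is why the slide pairs the similarity formula with length normalization.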
Latent Semantic Model
Vector Space Model: Pros
• Automatic selection of index terms
• Partial matching of queries and documents (dealing with the case where no document contains all search terms)
• Ranking according to similarity score (dealing with large result sets)
• Term weighting schemes (improves retrieval performance)
• Various extensions
  – Document clustering
  – Relevance feedback (modifying query vector)
• Geometric foundation
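Partial matching and ranking can be seen on the small term-document table from the "Documents as vectors" slide. The sketch below is only an illustration under stated assumptions: the query is encoded as a binary vector (the slides do not specify query weighting), the term labels are English glosses of the slide's terms, and the function name `rank` is invented here.

```python
import math

# Term axes (English glosses of the slide's terms) and document columns D1..D6
terms = ["China", "culture", "Japan", "overseas students", "education", "Beijing"]
docs = {
    "D1": [4.1, 4.5, 0.0, 0.0, 2.9, 7.1],
    "D2": [0.0, 4.5, 3.5, 3.1, 0.0, 0.0],
    "D3": [3.7, 0.0, 2.9, 5.1, 0.0, 0.0],
    "D4": [5.9, 0.0, 0.0, 12.8, 2.2, 0.0],
    "D5": [3.1, 11.6, 2.1, 0.0, 0.0, 4.4],
    "D6": [0.0, 0.0, 3.9, 0.0, 0.0, 3.8],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_terms):
    """Score every document against a binary query vector, best first."""
    q = [1.0 if t in query_terms else 0.0 for t in terms]   # assumed weighting
    scores = {name: cosine(q, vec) for name, vec in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# No document in the table contains both "Japan" and "education",
# yet every document matches one of them and receives a score.
ranking = rank({"Japan", "education"})
for name, score in ranking:
    print(name, round(score, 3))
```

A Boolean AND query over these two terms would return nothing, whereas the vector model still orders all six documents by similarity score.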
I guess this page is about a blackberry...?
[Figure: a page on which the word "blackberry" appears repeatedly]