当前位置:高等教育资讯网  >  中国高校课件下载中心  >  大学文库  >  浏览文档

上海交通大学:《挖掘海量数据集 Mining Massive Datasets》课程教学资源(PPT讲稿)Lecture 07 链接分析 Link Analysis

资源类别:文库,文档格式:PPT,文档页数:84,文件大小:1.04MB,团购合买
▪ PageRank ▪ Topic-Sensitive PageRank ▪ Hubs and Authorities ▪ Spam Detection
点击下载完整版文档(PPT)

Link Analysis Mining Massive Datasets Wu-Jun li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 7: Link Analysis

Link Analysis 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 7: Link Analysis Mining Massive Datasets

Link Analysis Link Analysis algorithms PageRank Hubs and authorities Topic-Sensitive PageRank Spam Detection Algorithms Other interesting topics we wont cover Detecting duplicates and mirrors Mining for communities community detection (Refer to Chapter 10 of the textbook)

Link Analysis 2 Link Analysis Algorithms ▪ PageRank ▪ Hubs and Authorities ▪ Topic-Sensitive PageRank ▪ Spam Detection Algorithms ▪ Other interesting topics we won’t cover ▪ Detecting duplicates and mirrors ▪ Mining for communities (community detection) (Refer to Chapter 10 of the textbook)

Link Analysis Outline PageRank Topic-Sensitive PageRank Hubs and authorities Spam detection

Link Analysis 3 Outline ▪ PageRank ▪ Topic-Sensitive PageRank ▪ Hubs and Authorities ▪ Spam Detection

Link Analysis PageRank Ranking web pages Web pages are not equally important www.joe-schmoe.comvwww.stanford.edu Inlinks as votes www.stanford.eduhas23,400inlinks www.joe-schmoe.comhas1inlink Are all inlinks equal? Recursive question!

Link Analysis 4 Ranking web pages ▪ Web pages are not equally “important” ▪ www.joe-schmoe.com v www.stanford.edu ▪ Inlinks as votes ▪ www.stanford.edu has 23,400 inlinks ▪ www.joe-schmoe.com has 1 inlink ▪ Are all inlinks equal? ▪ Recursive question! PageRank

Link Analysis PageRank Simple recursive formulation Each link's vote is proportional to the importance of Its source page If page P with importance x has n outlinks, each link gets x/n votes Page p's own importance is the sum of the votes on its inlinks

Link Analysis 5 Simple recursive formulation ▪ Each link’s vote is proportional to the importance of its source page ▪ If page P with importance x has n outlinks, each link gets x/n votes ▪ Page P’s own importance is the sum of the votes on its inlinks PageRank

Link Analysis PageRank Simple flow"model The web in 1839 y/2 Yahoo y=y/2+a/2 a=y 2+m a/2 y m=a/2 m Amazon Soft a/2 a m

Link Analysis 6 Simple “flow” model The web in 1839 Yahoo Amazon M’soft y a m y/2 y/2 a/2 a/2 m y = y /2 + a /2 a = y /2 + m m = a /2 PageRank

Link Analysis PageRank Solving the flow equations 3 equations 3 unknowns, no constants No unique solution All solutions equivalent modulo scale factor Additional constraint forces uniqueness yta+m 1 ny=2/5,a=2/5,m=1/5 Gaussian elimination method works for small examples but we need a better method for large graphs

Link Analysis 7 Solving the flow equations ▪ 3 equations, 3 unknowns, no constants ▪ No unique solution ▪ All solutions equivalent modulo scale factor ▪ Additional constraint forces uniqueness ▪ y+a+m = 1 ▪ y = 2/5, a = 2/5, m = 1/5 ▪ Gaussian elimination method works for small examples, but we need a better method for large graphs PageRank

Link Analysis PageRank Matrix formulation Matrix M has one row and one column for each web page Suppose page j has n outlinks fj→ i, then m=1/n ■E|seM:=0 M is a column stochastic matrix Columns sum to 1 Suppose r is a vector with one entry per web page r; is the importance score of page i Call it the rank vector

Link Analysis 8 Matrix formulation ▪ Matrix M has one row and one column for each web page ▪ Suppose page j has n outlinks ▪ If j i, then Mij=1/n ▪ Else Mij=0 ▪ M is a column stochastic matrix ▪ Columns sum to 1 ▪ Suppose r is a vector with one entry per web page ▪ ri is the importance score of page i ▪ Call it the rank vector ▪ |r| = 1 PageRank

Link Analysis PageRank Example Suppose page j links to 3 pages, including i

Link Analysis 9 Example Suppose page j links to 3 pages, including i i j M r r = i 1/3 PageRank

Link Analysis PageRank Eigenvector formulation The flow equations can be written r= Mr So the rank vector is an eigenvector of the stochastic web matrix In fact, its first or principal eigenvector, with corresponding eigenvalue 1

Link Analysis 10 Eigenvector formulation ▪ The flow equations can be written r = Mr ▪ So the rank vector is an eigenvector of the stochastic web matrix ▪ In fact, its first or principal eigenvector, with corresponding eigenvalue 1 PageRank

点击下载完整版文档(PPT)VIP每日下载上限内不扣除下载券和下载次数;
按次数下载不扣除下载券;
24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
共84页,可试读20页,点击继续阅读 ↓↓
相关文档

关于我们|帮助中心|下载说明|相关软件|意见反馈|联系我们

Copyright © 2008-现在 cucdc.com 高等教育资讯网 版权所有