10 K. BRYAN AND T. LEISE

is the second largest eigenvalue of $M$.
Moreover, for $M$ of the form $M = (1-m)A + mS$ with $A$ column-stochastic and all $S_{ij} = 1/n$, it can be shown that $|\lambda_2| \le 1-m$ (see, e.g., [1], Theorem 5.10). As a result, the power method will converge much more rapidly than indicated by $c^k \|\mathbf{x}_0 - \mathbf{q}\|_1$. Nonetheless, the value of $c$ in Proposition 4 provides a very simple bound on the convergence of the power method here. It is easy to see that since all entries of $M$ are at least $m/n$, we will always have $c \le 1 - 2m/n$ in Proposition 4.

As a practical matter, note that the $n \times n$ positive matrix $M$ has no zero entries, so the multiplication $M\mathbf{v}$ for $\mathbf{v} \in \mathbb{R}^n$ will typically take $O(n^2)$ multiplications and additions, a formidable computation if $n = 8{,}000{,}000{,}000$. But equation (3.2) shows that if $\mathbf{x}$ is positive with $\|\mathbf{x}\|_1 = 1$, then the multiplication $M\mathbf{x}$ is equivalent to $(1-m)A\mathbf{x} + m\mathbf{s}$. This is a far more efficient computation, since $A$ can be expected to contain mostly zeros (most web pages link to only a few other pages). We've now proved our main theorem:

Theorem 4.2. The matrix $M$ defined by (3.1) for a web with no dangling nodes will always be a positive column-stochastic matrix and so have a unique $\mathbf{q}$ with positive components such that $M\mathbf{q} = \mathbf{q}$ and $\sum_i q_i = 1$. The vector $\mathbf{q}$ may be computed as the limit of the iterations $\mathbf{x}_k = (1-m)A\mathbf{x}_{k-1} + m\mathbf{s}$, where $\mathbf{x}_0$ is any initial vector with positive components and $\|\mathbf{x}_0\|_1 = 1$.

The eigenvector $\mathbf{x}$ defined by equation (3.2) also has a probabilistic interpretation. Consider a web-surfer on a web of $n$ pages with no dangling nodes. The surfer begins at some web page (it doesn't matter where) and randomly moves from web page to web page according to the following procedure: if the surfer is currently at a page with $r$ outgoing links, he either randomly chooses any one of these links with uniform probability $\frac{1-m}{r}$, OR he jumps to any randomly selected page on the web, each with probability $\frac{m}{n}$ (note that $r \cdot \frac{1-m}{r} + n \cdot \frac{m}{n} = 1$, so this accounts for everything he can do).
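The iteration of Theorem 4.2 is straightforward to sketch in code. The following is a minimal illustration (the function name `pagerank` and the tiny three-page example web are ours, not from the paper); a real web would use a sparse representation of $A$, since most of its entries are zero.

```python
import numpy as np

def pagerank(A, m=0.15, tol=1e-12, max_iter=1000):
    """Power iteration x_k = (1-m) A x_{k-1} + m s, with s = (1/n, ..., 1/n).

    A is the column-stochastic link matrix (dense here for clarity;
    scipy.sparse would be used for a realistic web).
    """
    n = A.shape[0]
    s = np.full(n, 1.0 / n)
    x = s.copy()                      # any positive x0 with ||x0||_1 = 1
    for _ in range(max_iter):
        x_new = (1 - m) * A @ x + m * s
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Illustrative 3-page web: pages 1 and 2 each link only to page 3;
# page 3 links to pages 1 and 2. Columns of A sum to 1.
A = np.array([[0.0, 0.0, 0.5],
              [0.0, 0.0, 0.5],
              [1.0, 1.0, 0.0]])
q = pagerank(A)
```

Since each iterate satisfies $\|\mathbf{x}_k\|_1 = 1$, the limit `q` sums to one and satisfies $M\mathbf{q} = \mathbf{q}$ up to the tolerance; page 3, which is linked to by both other pages, receives the highest score.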
The surfer repeats this page-hopping procedure ad infinitum. The component $x_j$ of the normalized vector $\mathbf{x}$ in equation (3.2) is the fraction of time that the surfer spends, in the long run, on page $j$ of the web. More important pages tend to be linked to by many other pages and so the surfer hits those most often.

Exercise 14. For the web in Exercise 11, compute the values of $\|M^k\mathbf{x}_0 - \mathbf{q}\|_1$ and $\frac{\|M^k\mathbf{x}_0 - \mathbf{q}\|_1}{\|M^{k-1}\mathbf{x}_0 - \mathbf{q}\|_1}$ for $k = 1, 5, 10, 50$, using an initial guess $\mathbf{x}_0$ not too close to the actual eigenvector $\mathbf{q}$ (so that you can watch the convergence). Determine $c = \max_{1 \le j \le n} |1 - 2\min_{1 \le i \le n} M_{ij}|$ and the absolute value of the second largest eigenvalue of $M$.

Exercise 15. To see why the second largest eigenvalue plays a role in bounding $\frac{\|M^k\mathbf{x}_0 - \mathbf{q}\|_1}{\|M^{k-1}\mathbf{x}_0 - \mathbf{q}\|_1}$, consider an $n \times n$ positive column-stochastic matrix $M$ that is diagonalizable. Let $\mathbf{x}_0$ be any vector with non-negative components that sum to one. Since $M$ is diagonalizable, we can create a basis of eigenvectors $\{\mathbf{q}, \mathbf{v}_1, \ldots, \mathbf{v}_{n-1}\}$, where $\mathbf{q}$ is the steady state vector, and then write $\mathbf{x}_0 = a\mathbf{q} + \sum_{k=1}^{n-1} b_k \mathbf{v}_k$. Determine $M^k\mathbf{x}_0$, and then show that $a = 1$ and the sum of the components of each $\mathbf{v}_k$ must equal 0. Next apply Proposition 4 to prove that, except for the non-repeated eigenvalue $\lambda = 1$, the other eigenvalues are all strictly less than one in absolute value. Use this to evaluate $\lim_{k \to \infty} \frac{\|M^k\mathbf{x}_0 - \mathbf{q}\|_1}{\|M^{k-1}\mathbf{x}_0 - \mathbf{q}\|_1}$.

Exercise 16. Consider the link matrix
\[
A = \begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} \\ 1 & \frac{1}{2} & 0 \end{pmatrix}.
\]
Show that $M = (1-m)A + mS$ (all $S_{ij} = 1/3$) is not diagonalizable for $0 \le m < 1$.

Exercise 17. How should the value of $m$ be chosen? How does this choice affect the rankings and the computation time?

REFERENCES

[1] A. Berman and R. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York, 1979.
[2] M. W. Berry and M. Browne, Understanding Search Engines: Mathematical Modeling and Text Retrieval, Second Edition, SIAM, Philadelphia, 2005.
[3] M. Bianchini, M. Gori, and F. Scarselli, Inside PageRank, ACM Trans. Internet Tech., 5 (2005), pp. 92–128.
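As a numerical sanity check for Exercise 16 (not a proof, and for one illustrative value of $m$ only), one can verify that the repeated eigenvalue $-\frac{1-m}{2}$ of $M$ has a one-dimensional eigenspace, so $M$ is defective:

```python
import numpy as np

# Link matrix from Exercise 16 (columns sum to 1)
A = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 0.5],
              [1.0, 0.5, 0.0]])
m = 0.15                          # one illustrative choice of m in [0, 1)
S = np.full((3, 3), 1.0 / 3.0)
M = (1 - m) * A + m * S

# On the subspace of vectors whose components sum to zero, S acts as 0,
# so M inherits A's repeated eigenvalue -1/2 scaled by (1 - m).
lam = -(1 - m) / 2

# Geometric multiplicity = 3 - rank(M - lam*I); here it is 1, strictly
# less than the algebraic multiplicity 2, so M is not diagonalizable.
geo_mult = 3 - np.linalg.matrix_rank(M - lam * np.eye(3))
```

The exercise asks for the general argument for every $m \in [0, 1)$; this check only confirms the phenomenon at a single $m$.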