Using this we obtain

\[
P(X_1 \in B \mid X_0 \in A)
= \frac{P(X_0 \in A,\, X_1 \in B)}{P(X_0 \in A)}
= \left( \frac{1}{\mathrm{vol}(V)} \sum_{i \in A,\, j \in B} w_{ij} \right)
  \left( \frac{\mathrm{vol}(A)}{\mathrm{vol}(V)} \right)^{-1}
= \frac{\sum_{i \in A,\, j \in B} w_{ij}}{\mathrm{vol}(A)} .
\]

Now the proposition follows directly with the definition of Ncut. □

This proposition leads to a nice interpretation of Ncut, and hence of normalized spectral clustering. It tells us that when minimizing Ncut, we actually look for a cut through the graph such that a random walk seldom transitions from A to its complement \bar{A} and vice versa.
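To make this identity concrete, here is a minimal numerical sketch (not part of the tutorial); the weight matrix W and the partition A, B below are arbitrary illustrative choices. It checks that, for the random walk started in its stationary distribution π_i = d_i / vol(V), the quantity P(X1 ∈ B | X0 ∈ A) equals Σ_{i∈A, j∈B} w_ij / vol(A), and that Ncut(A, Ā) coincides with P(Ā | A) + P(A | Ā).

import numpy as np

# Small symmetric weight matrix of a connected, undirected graph (illustrative values).
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
d = W.sum(axis=1)                    # degrees d_i
vol = d.sum()                        # vol(V)
P = W / d[:, None]                   # transition matrix P = D^{-1} W
pi = d / vol                         # stationary distribution pi_i = d_i / vol(V)

A = [0, 1]                           # one side of the cut; B is its complement
B = [2, 3]

# P(X0 in A, X1 in B) = sum_{i in A, j in B} pi_i * p_ij = (1/vol(V)) * sum w_ij
joint = sum(pi[i] * P[i, j] for i in A for j in B)
prob_B_given_A = joint / pi[A].sum()
direct = W[np.ix_(A, B)].sum() / d[A].sum()          # sum_{i in A, j in B} w_ij / vol(A)
print(np.isclose(prob_B_given_A, direct))            # True

# Ncut(A, B) = cut(A, B) * (1/vol(A) + 1/vol(B)) = P(B|A) + P(A|B)
cut = W[np.ix_(A, B)].sum()
ncut = cut * (1.0 / d[A].sum() + 1.0 / d[B].sum())
prob_A_given_B = W[np.ix_(B, A)].sum() / d[B].sum()
print(np.isclose(ncut, prob_B_given_A + prob_A_given_B))   # True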
The commute distance

A second connection between random walks and graph Laplacians can be made via the commute distance on the graph. The commute distance (also called resistance distance) c_ij between two vertices v_i and v_j is the expected time it takes the random walk to travel from vertex v_i to vertex v_j and back (Lovász, 1993; Aldous and Fill, in preparation). The commute distance has several nice properties which make it particularly appealing for machine learning. As opposed to the shortest path distance on a graph, the commute distance between two vertices decreases if there are many different short ways to get from vertex v_i to vertex v_j. So instead of just looking for the one shortest path, the commute distance looks at the set of short paths. Points which are connected by a short path in the graph and lie in the same high-density region of the graph are considered closer to each other than points which are connected by a short path but lie in different high-density regions of the graph. In this sense, the commute distance seems particularly well-suited to be used for clustering purposes.

Remarkably, the commute distance on a graph can be computed with the help of the generalized inverse (also called pseudo-inverse or Moore-Penrose inverse) L† of the graph Laplacian L. In the following we denote by e_i = (0, ..., 0, 1, 0, ..., 0)' the i-th unit vector. To define the generalized inverse of L, recall that by Proposition 1 the matrix L can be decomposed as L = U Λ U', where U is the matrix containing all eigenvectors as columns and Λ the diagonal matrix with the eigenvalues λ_1, ..., λ_n on the diagonal. As at least one of the eigenvalues is 0, the matrix L is not invertible. Instead, we define its generalized inverse as L† := U Λ† U', where Λ† is the diagonal matrix with diagonal entries 1/λ_i if λ_i ≠ 0 and 0 if λ_i = 0. The entries of L† can be computed as

\[
l^{\dagger}_{ij} = \sum_{k=2}^{n} \frac{1}{\lambda_k}\, u_{ik} u_{jk} .
\]

The matrix L† is positive semi-definite and symmetric. For further properties of L† see Gutman and Xiao (2004).

Proposition 6 (Commute distance) Let G = (V, E) be a connected, undirected graph. Denote by c_ij the commute distance between vertex v_i and vertex v_j, and by L† = (l†_ij)_{i,j=1,...,n} the generalized inverse of L. Then we have

\[
c_{ij} = \mathrm{vol}(V)\,\bigl(l^{\dagger}_{ii} - 2\, l^{\dagger}_{ij} + l^{\dagger}_{jj}\bigr)
       = \mathrm{vol}(V)\,(e_i - e_j)'\, L^{\dagger}\, (e_i - e_j) .
\]

This result has been published by Klein and Randic (1993), where it was proved by methods of electrical network theory. For a proof using first-step analysis for random walks see Fouss, Pirotte, Renders, and Saerens (2007). There also exist other ways to express the commute distance with the help of graph Laplacians. For example, a method in terms of eigenvectors of the normalized Laplacian L_sym can be found as Corollary 3.2 in Lovász (1993), and a method computing the commute distance with the help of determinants of certain sub-matrices of L can be found in Bapat, Gutman, and Xiao (2003).

Proposition 6 has an important consequence. It shows that √c_ij can be considered as a Euclidean distance function on the vertices of the graph. This means that we can construct an embedding which maps the vertices v_i of the graph on points z_i ∈ R^n such that the Euclidean distances between the points z_i coincide with the commute distances on the graph.
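As a complement to Proposition 6, the following sketch (again not part of the tutorial; the weight matrix is an arbitrary illustrative choice) computes L† both with numpy's pseudo-inverse and via the eigendecomposition formula above, and evaluates the commute distance with both expressions from the proposition.

import numpy as np

# Unnormalized Laplacian L = D - W of a small connected, undirected graph.
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
d = W.sum(axis=1)
L = np.diag(d) - W
vol = d.sum()

# Generalized inverse via the Moore-Penrose pseudo-inverse ...
L_dag = np.linalg.pinv(L)

# ... or via the eigendecomposition L = U Lambda U', inverting only the nonzero eigenvalues.
lam, U = np.linalg.eigh(L)
lam_dag = np.zeros_like(lam)
nonzero = lam > 1e-10
lam_dag[nonzero] = 1.0 / lam[nonzero]
L_dag_eig = U @ np.diag(lam_dag) @ U.T
print(np.allclose(L_dag, L_dag_eig))                  # True

def commute_distance(i, j):
    # c_ij = vol(V) * (l_ii - 2 l_ij + l_jj)
    return vol * (L_dag[i, i] - 2 * L_dag[i, j] + L_dag[j, j])

# The quadratic-form expression from Proposition 6 gives the same value.
e = np.eye(L.shape[0])
i, j = 0, 3
print(np.isclose(commute_distance(i, j),
                 vol * (e[i] - e[j]) @ L_dag @ (e[i] - e[j])))   # True

For large graphs one would typically avoid forming the dense pseudo-inverse explicitly and instead solve linear systems in L for the required entries; the small dense example above only serves to illustrate the formulas.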