Of course, the relaxation we discussed above is not unique. For example, a completely different relaxation which leads to a semi-definite program is derived in Bie and Cristianini (2006), and there might be many other useful relaxations. The reason why the spectral relaxation is so appealing is not that it leads to particularly good solutions. Its popularity is mainly due to the fact that it results in a standard linear algebra problem which is simple to solve.

6 Random walks point of view

Another line of argument to explain spectral clustering is based on random walks on the similarity graph. A random walk on a graph is a stochastic process which randomly jumps from vertex to vertex.
We will see below that spectral clustering can be interpreted as trying to find a partition of the graph such that the random walk stays long within the same cluster and seldom jumps between clusters. Intuitively this makes sense, in particular together with the graph cut explanation of the last section: a balanced partition with a low cut will also have the property that the random walk does not have many opportunities to jump between clusters. For background reading on random walks in general we refer to Norris (1997) and Brémaud (1999), and for random walks on graphs we recommend Aldous and Fill (in preparation) and Lovász (1993). Formally, the transition probability of jumping in one step from vertex v_i to vertex v_j is proportional to the edge weight w_ij and is given by p_ij := w_ij / d_i. The transition matrix P = (p_ij)_{i,j=1,...,n} of the random walk is thus defined by

P = D⁻¹W.

If the graph is connected and non-bipartite, then the random walk always possesses a unique stationary distribution π = (π_1, . . . , π_n), where π_i = d_i / vol(V). Obviously there is a tight relationship between L_rw and P, as L_rw = I − P. As a consequence, λ is an eigenvalue of L_rw with eigenvector u if and only if 1 − λ is an eigenvalue of P with eigenvector u. It is well known that many properties of a graph can be expressed in terms of the corresponding random walk transition matrix P; see Lovász (1993) for an overview. From this point of view it does not come as a surprise that the largest eigenvectors of P and the smallest eigenvectors of L_rw can be used to describe cluster properties of the graph.

Random walks and Ncut

A formal equivalence between Ncut and transition probabilities of the random walk has been observed in Meila and Shi (2001).

Proposition 5 (Ncut via transition probabilities) Let G be connected and non-bipartite. Assume that we run the random walk (X_t)_{t∈ℕ} starting with X_0 in the stationary distribution π.
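These facts are easy to check numerically. The following NumPy sketch builds a small connected, non-bipartite similarity graph (the weights are hypothetical, chosen only for illustration), forms P = D⁻¹W, and verifies that π_i = d_i / vol(V) is stationary and that the spectra of L_rw and P are related by λ ↔ 1 − λ:

```python
import numpy as np

# A small weighted, connected, non-bipartite similarity graph
# (hypothetical weights for illustration; vertices 0,1,2 form a triangle,
# so the graph contains an odd cycle and is not bipartite).
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
d = W.sum(axis=1)               # degrees d_i
P = np.diag(1.0 / d) @ W        # transition matrix P = D^{-1} W

# Stationary distribution pi_i = d_i / vol(V)
pi = d / d.sum()
print(np.allclose(pi @ P, pi))  # True: pi P = pi

# L_rw = I - P, so lambda is an eigenvalue of L_rw
# iff 1 - lambda is an eigenvalue of P
L_rw = np.eye(len(d)) - P
ev_L = np.sort(np.linalg.eigvals(L_rw).real)
ev_P = np.sort(1 - np.linalg.eigvals(P).real)
print(np.allclose(ev_L, ev_P))  # True
```

Note that P is not symmetric, but it is similar to the symmetric matrix D^{-1/2} W D^{-1/2}, so its eigenvalues are real.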
For disjoint subsets A, B ⊂ V, denote by P(B|A) := P(X_1 ∈ B | X_0 ∈ A). Then:

\[
\mathrm{Ncut}(A, \overline{A}) = P(\overline{A} \mid A) + P(A \mid \overline{A}).
\]

Proof. First of all observe that

\[
P(X_0 \in A, X_1 \in B)
= \sum_{i \in A,\, j \in B} P(X_0 = i, X_1 = j)
= \sum_{i \in A,\, j \in B} \pi_i p_{ij}
= \sum_{i \in A,\, j \in B} \frac{d_i}{\mathrm{vol}(V)} \, \frac{w_{ij}}{d_i}
= \frac{1}{\mathrm{vol}(V)} \sum_{i \in A,\, j \in B} w_{ij}.
\]
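Proposition 5 can also be checked numerically. The sketch below (using the same hypothetical weights as before, with an arbitrary partition A, Ā) computes Ncut directly from the cut weights and degrees, and compares it with the transition probabilities of a walk started in the stationary distribution:

```python
import numpy as np

# Same hypothetical graph as above.
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
d = W.sum(axis=1)
vol = d.sum()                     # vol(V)
A, B = [0, 1], [2, 3]             # a partition A, A-bar

# Ncut(A, A-bar) = cut(A, A-bar)/vol(A) + cut(A, A-bar)/vol(A-bar)
cut = W[np.ix_(A, B)].sum()       # sum of w_ij across the cut
ncut = cut / d[A].sum() + cut / d[B].sum()

# With X0 ~ pi: P(X0 in A, X1 in B) = cut(A,B)/vol(V)  (the observation
# in the proof), and P(X0 in A) = vol(A)/vol(V), hence:
p_B_given_A = (cut / vol) / (d[A].sum() / vol)
p_A_given_B = (cut / vol) / (d[B].sum() / vol)
print(np.isclose(ncut, p_B_given_A + p_A_given_B))  # True
```

The conditional probabilities collapse to cut(A, Ā)/vol(A) and cut(A, Ā)/vol(Ā), which is exactly why the two sides of Proposition 5 agree.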