1. We want to find a partition such that points in different clusters are dissimilar to each other, that is we want to minimize the between-cluster similarity. In the graph setting, this means to minimize cut(A, Ā).

2. We want to find a partition such that points in the same cluster are similar to each other, that is we want to maximize the within-cluster similarities W(A, A) and W(Ā, Ā).

Both RatioCut and Ncut directly implement the first objective by explicitly incorporating cut(A, Ā) in the objective function. However, concerning the second point, the two algorithms behave differently. Note that

W(A, A) = W(A, V) − W(A, Ā) = vol(A) − cut(A, Ā).

Hence, the within-cluster similarity is maximized if cut(A, Ā) is small and if vol(A) is large. As this is exactly what we achieve by minimizing Ncut, the Ncut criterion implements the second objective. This can be seen even more explicitly by considering yet another graph cut objective function, namely the MinMaxCut criterion introduced by Ding, He, Zha, Gu, and Simon (2001):

MinMaxCut(A_1, ..., A_k) := \sum_{i=1}^{k} cut(A_i, Ā_i) / W(A_i, A_i).

Compared to Ncut, which has the terms vol(A) = cut(A, Ā) + W(A, A) in the denominator, the MinMaxCut criterion only has W(A, A) in the denominator. In practice, Ncut and MinMaxCut are often minimized by similar cuts, as a good Ncut solution will have a small value of cut(A, Ā) anyway and hence the denominators are not so different after all. Moreover, relaxing MinMaxCut leads to exactly the same optimization problem as relaxing Ncut, namely to normalized spectral clustering with the eigenvectors of L_rw. So one can see in several ways that normalized spectral clustering incorporates both clustering objectives mentioned above.

Now consider the case of RatioCut. Here the objective is to maximize |A| and |Ā| instead of vol(A) and vol(Ā). But |A| and |Ā| are not necessarily related to the within-cluster similarity, as the within-cluster similarity depends on the edges and not on the number of vertices in A. For instance, just think of a set A which has very many vertices, all of which only have very low weighted edges to each other. Minimizing RatioCut does not attempt to maximize the within-cluster similarity, and the same is then true for its relaxation by unnormalized spectral clustering.

So this is our first important point to keep in mind: Normalized spectral clustering implements both clustering objectives mentioned above, while unnormalized spectral clustering only implements the first objective.
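To make these quantities concrete, the following minimal numerical sketch (not part of the tutorial; the weight matrix and the helper names w_sum and two_way are purely illustrative) evaluates cut(A, Ā), vol(A), and W(A, A) for one candidate cluster of a small graph, checks the identity W(A, A) = vol(A) − cut(A, Ā), and compares the RatioCut, Ncut, and MinMaxCut values of the same partition.

```python
import numpy as np

# Symmetric weight matrix of a small graph with 6 vertices:
# vertices 0-2 form one tightly connected group, vertices 3-5 a second group,
# with a single weak edge (weight 0.1) between the two groups.
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.6, 0.9],
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.9, 0.8, 0.0],
])

A = [0, 1, 2]                                    # candidate cluster A
Ac = [i for i in range(len(W)) if i not in A]    # its complement A-bar

def w_sum(S, T):
    """W(S, T) = sum of all weights w_ij with i in S and j in T."""
    return W[np.ix_(S, T)].sum()

cut_A  = w_sum(A, Ac)       # cut(A, A-bar)
vol_A  = W[A, :].sum()      # vol(A) = W(A, V)
within = w_sum(A, A)        # W(A, A), the within-cluster similarity

# Identity from the text: W(A, A) = vol(A) - cut(A, A-bar)
print(within, vol_A - cut_A)   # identical up to floating point rounding

def two_way(criterion):
    """Evaluate a 2-way cut criterion as a sum over A and its complement."""
    return criterion(A, Ac) + criterion(Ac, A)

ratio_cut  = two_way(lambda S, T: w_sum(S, T) / len(S))          # |S| in denominator
ncut       = two_way(lambda S, T: w_sum(S, T) / W[S, :].sum())   # vol(S) in denominator
minmax_cut = two_way(lambda S, T: w_sum(S, T) / w_sum(S, S))     # W(S, S) in denominator
print(ratio_cut, ncut, minmax_cut)
```

For the partition that separates the two dense groups, cut(A, Ā) is small, so the Ncut and MinMaxCut values come out very close to each other, which is exactly the denominator argument made above, while the RatioCut denominators only reflect the group sizes, not the edge weights inside the groups.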
Consistency issues

A completely different argument for the superiority of normalized spectral clustering comes from a statistical analysis of both algorithms. In a statistical setting one assumes that the data points x_1, ..., x_n have been sampled i.i.d. according to some probability distribution P on some underlying data space X. The most fundamental question is then the question of consistency: if we draw more and more data points, do the clustering results of spectral clustering converge to a useful partition of the underlying space X?

For both normalized spectral clustering algorithms, it can be proved that this is indeed the case (von Luxburg, Bousquet, and Belkin, 2004, 2005; von Luxburg, Belkin, and Bousquet, to appear). Mathematically, one proves that as we take the limit n → ∞, the matrix L_sym converges in a strong sense