form disconnected parts in the 10-nearest neighbor graph, in which case the eigenvectors are given as in Propositions 2 and 4. The next two rows show the results for the fully connected graph. As the Gaussian similarity function is always positive, this graph only consists of one connected component. Thus, eigenvalue 0 has multiplicity 1, and the first eigenvector is the constant vector. The following eigenvectors carry the information about the clusters. For example, in the unnormalized case (last row), if we threshold the second eigenvector at 0, then the part below 0 corresponds to clusters 1 and 2, and the part above 0 to clusters 3 and 4. Similarly, thresholding the third eigenvector separates clusters 1 and 4 from clusters 2 and 3, and thresholding the fourth eigenvector separates clusters 1 and 3 from clusters 2 and 4. Altogether, the first four eigenvectors carry all the information about the four clusters. In all the cases illustrated in this figure, spectral clustering using k-means on the first four eigenvectors easily detects the correct four clusters.
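To make this procedure concrete, here is a minimal sketch (not from the original text) of unnormalized spectral clustering as used in the figure: build the Laplacian L = D - W, take its first four eigenvectors, and run k-means on the rows of the resulting embedding. The weight matrix W is assumed to be given, and the function name is illustrative.

```python
# Minimal sketch, assuming a symmetric, nonnegative weight matrix W.
import numpy as np
from sklearn.cluster import KMeans

def unnormalized_spectral_clustering(W, k=4):
    d = W.sum(axis=1)                     # vertex degrees
    L = np.diag(d) - W                    # unnormalized Laplacian L = D - W
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    U = eigvecs[:, :k]                    # first k eigenvectors as columns
    # Row i of U is the spectral embedding of vertex i; cluster the rows.
    # The thresholding described above amounts to reading off the sign
    # pattern of a single column, e.g. eigvecs[:, 1] > 0 for the second
    # eigenvector.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```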
5 Graph cut point of view

The intuition of clustering is to separate points in different groups according to their similarities. For data given in form of a similarity graph, this problem can be restated as follows: we want to find a partition of the graph such that the edges between different groups have a very low weight (which means that points in different clusters are dissimilar from each other) and the edges within a group have high weight (which means that points within the same cluster are similar to each other). In this section we will see how spectral clustering can be derived as an approximation to such graph partitioning problems.

Given a similarity graph with adjacency matrix W, the simplest and most direct way to construct a partition of the graph is to solve the mincut problem. To define it, recall the notation W(A, B) := \sum_{i \in A, j \in B} w_{ij} and \overline{A} for the complement of A. For a given number k of subsets, the mincut approach simply consists in choosing a partition A_1, \ldots, A_k which minimizes

cut(A_1, \ldots, A_k) := \frac{1}{2} \sum_{i=1}^{k} W(A_i, \overline{A}_i).

Here we introduce the factor 1/2 for notational consistency; otherwise we would count each edge twice in the cut. In particular for k = 2, mincut is a relatively easy problem and can be solved efficiently, see Stoer and Wagner (1997) and the discussion therein. However, in practice it often does not lead to satisfactory partitions. The problem is that in many cases, the solution of mincut simply separates one individual vertex from the rest of the graph. Of course this is not what we want to achieve in clustering, as clusters should be reasonably large groups of points. One way to circumvent this problem is to explicitly request that the sets A_1, \ldots, A_k are "reasonably large". The two most common objective functions to encode this are RatioCut (Hagen and Kahng, 1992) and the normalized cut Ncut (Shi and Malik, 2000). In RatioCut, the size of a subset A of a graph is measured by its number of vertices |A|, while in Ncut the size is measured by the weights of its edges vol(A). The definitions are:

RatioCut(A_1, \ldots, A_k) := \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \overline{A}_i)}{|A_i|} = \sum_{i=1}^{k} \frac{cut(A_i, \overline{A}_i)}{|A_i|}

Ncut(A_1, \ldots, A_k) := \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \overline{A}_i)}{vol(A_i)} = \sum_{i=1}^{k} \frac{cut(A_i, \overline{A}_i)}{vol(A_i)}.

Note that both objective functions take a small value if the clusters A_i are not too small. In particular, the minimum of the function \sum_{i=1}^{k} (1/|A_i|) is achieved if all |A_i| coincide, and the minimum of \sum_{i=1}^{k} (1/vol(A_i)) is achieved if all vol(A_i) coincide. So what both objective functions try to achieve is that the clusters are "balanced", as measured by the number of vertices or edge weights, respectively.
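As a concrete illustration (not part of the original text), the following sketch evaluates the three objectives above for a given partition, using the first form of each definition. The helper names are my own; W is assumed to be a symmetric weight matrix and the partition is passed as a list of index arrays.

```python
# Illustrative sketch; `parts` is a list of index arrays A_1, ..., A_k
# forming a partition of the vertex set {0, ..., n-1}.
import numpy as np

def W_between(W, A, B):
    # W(A, B) = sum_{i in A, j in B} w_ij
    return W[np.ix_(A, B)].sum()

def cut_value(W, parts):
    # cut(A_1, ..., A_k) = 1/2 * sum_i W(A_i, complement of A_i)
    n = W.shape[0]
    return 0.5 * sum(W_between(W, A, np.setdiff1d(np.arange(n), A))
                     for A in parts)

def ratio_cut(W, parts):
    # RatioCut divides each cut term by the number of vertices |A_i|
    n = W.shape[0]
    return 0.5 * sum(W_between(W, A, np.setdiff1d(np.arange(n), A)) / len(A)
                     for A in parts)

def ncut(W, parts):
    # Ncut divides each cut term by vol(A_i), the sum of degrees inside A_i
    n = W.shape[0]
    deg = W.sum(axis=1)
    return 0.5 * sum(W_between(W, A, np.setdiff1d(np.arange(n), A)) / deg[A].sum()
                     for A in parts)
```

Dividing each cut term by |A_i| or vol(A_i) is what penalizes partitions that isolate single vertices: a singleton contributes its full cut weight, while balanced clusters divide theirs by roughly n/k. The balancing claim itself, that the sums of reciprocals are minimized by equal sizes, can be verified in one line via the Cauchy-Schwarz inequality (a verification not spelled out in the text): for positive a_1, \ldots, a_k with fixed sum s,

```latex
\[
  \Big(\sum_{i=1}^{k} \frac{1}{a_i}\Big)\Big(\sum_{i=1}^{k} a_i\Big)
  \;\ge\; \Big(\sum_{i=1}^{k} \frac{1}{\sqrt{a_i}}\,\sqrt{a_i}\Big)^{2} = k^2,
  \qquad\text{hence}\qquad
  \sum_{i=1}^{k} \frac{1}{a_i} \;\ge\; \frac{k^2}{s},
\]
% with equality if and only if a_1 = \dots = a_k = s/k. Taking a_i = |A_i|
% (s = n) or a_i = vol(A_i) (s = vol(V)) gives the two statements above.
```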
Unfortunately, introducing balancing conditions makes the previously simple-to-solve mincut problem