Finally, note that the choice of the number of clusters and the choice of the connectivity parameters of the neighborhood graph affect each other. For example, if the connectivity parameter of the neighborhood graph is so small that the graph breaks into, say, k0 connected components, then choosing k0 as the number of clusters is a valid choice. However, as soon as the neighborhood graph is connected, it is not clear how the number of clusters and the connectivity parameters of the neighborhood graph interact. Both the choice of the number of clusters and the choice of the connectivity parameters of the graph are difficult problems on their own, and to our knowledge nothing non-trivial is known about their interaction.

8.4 The k-means step

The three spectral clustering algorithms we presented in Section 4 use k-means as the last step to extract the final partition from the real-valued matrix of eigenvectors. First of all, note that there is nothing principled about using the k-means algorithm in this step. In fact, as we have seen from the various explanations of spectral clustering, this step should be very simple if the data contains well-expressed clusters. For example, in the ideal case of completely separated clusters we know that the eigenvectors of L and L_rw are piecewise constant. In this case, all points x_i which belong to the same cluster C_s are mapped to exactly the sample point y_i, namely to the unit vector e_s ∈ R^k. In such a trivial case, any clustering algorithm applied to the points y_i ∈ R^k will be able to extract the correct clusters.

While it is somewhat arbitrary exactly which clustering algorithm one chooses in the final step of spectral clustering, one can argue that at least the Euclidean distance between the points y_i is a meaningful quantity to look at. We have seen that the Euclidean distance between the points y_i is related to the "commute distance" on the graph, and in Nadler, Lafon, Coifman, and Kevrekidis (2006) the authors show that the Euclidean distances between the y_i are also related to a more general "diffusion distance". Also, other uses of the spectral embeddings (e.g., Bolla (1991) or Belkin and Niyogi (2003)) show that the Euclidean distance in R^d is meaningful.

Instead of k-means, people also use other techniques to construct the final solution from the real-valued representation. For example, in Lang (2006) the authors use hyperplanes for this purpose. A more advanced post-processing of the eigenvectors is proposed in Bach and Jordan (2004). Here the authors study the subspace spanned by the first k eigenvectors, and try to approximate this subspace as well as possible using piecewise constant vectors. This also leads to minimizing certain Euclidean distances in the space R^k, which can be done by some weighted k-means algorithm.
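To make this step concrete, the following is a minimal Python sketch of the embedding-plus-k-means pipeline, shown for the unnormalized Laplacian L = D - W. The similarity matrix W, the function name, and the library choices are illustrative assumptions on our part, not part of the tutorial.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_kmeans_step(W, k):
    """Embed the points via the first k eigenvectors of L = D - W,
    then extract the final partition by running k-means on the rows y_i."""
    d = W.sum(axis=1)                        # vertex degrees
    L = np.diag(d) - W                       # unnormalized graph Laplacian
    # eigenvalues come in ascending order; keep the first k eigenvectors
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # row i of U is the embedded point y_i in R^k; with well-separated
    # clusters these rows are (close to) piecewise constant
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

For the normalized variants of Section 4 one would instead solve the generalized eigenproblem Lu = λDu (for L_rw), or take the eigenvectors of L_sym and additionally normalize the rows of U before clustering.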
8.5 Which graph Laplacian should be used?

A fundamental question related to spectral clustering is which of the three graph Laplacians should be used to compute the eigenvectors. Before deciding this question, one should always look at the degree distribution of the similarity graph. If the graph is very regular and most vertices have approximately the same degree, then all the Laplacians are very similar to each other, and will work equally well for clustering. However, if the degrees in the graph are very broadly distributed, then the Laplacians differ considerably. In our opinion, there are several arguments which advocate for using normalized rather than unnormalized spectral clustering, and in the normalized case to use the eigenvectors of L_rw rather than those of L_sym.
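As a quick diagnostic, one can summarize how far the similarity graph is from being regular before committing to a Laplacian. The coefficient-of-variation summary below is an illustrative choice of ours, not prescribed by the text.

```python
import numpy as np

def degree_spread(W):
    """Relative spread of the vertex degrees of the similarity graph.
    A value near 0 means the graph is nearly regular, so all three
    Laplacians behave similarly; a large value signals broadly
    distributed degrees, where the choice of Laplacian matters."""
    d = W.sum(axis=1)            # degree d_i = sum_j w_ij
    return d.std() / d.mean()    # coefficient of variation of the degrees
```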
Clustering objectives satisfied by the different algorithms

The first argument in favor of normalized spectral clustering comes from the graph partitioning point of view. For simplicity, let us discuss the case k = 2. In general, clustering has two different objectives: