[Figure 4: Three data sets (top row: histograms of the samples; bottom row: the smallest 10 eigenvalues of L_rw). See text for more details.]

Braun, and Buhmann, 2004; Ben-David, von Luxburg, and Pál, 2006). Of course, all those methods can also be used for spectral clustering. Additionally, one tool which is particularly designed for spectral clustering is the eigengap heuristic, which can be used for all three graph Laplacians. Here the goal is to choose the number k such that all eigenvalues λ_1, ..., λ_k are very small, but λ_{k+1} is relatively large. There are several justifications for this procedure. The first one is based on perturbation theory, where we observe that in the ideal case of k completely disconnected clusters, the eigenvalue 0 has multiplicity k, and then there is a gap to the (k+1)-th eigenvalue λ_{k+1} > 0. Other explanations can be given by spectral graph theory: many geometric invariants of the graph can be expressed or bounded with the help of the first eigenvalues of the graph Laplacian. In particular, the sizes of cuts are closely related to the sizes of the first eigenvalues. For more details on this topic we refer to Bolla (1991), Mohar (1997) and Chung (1997).

We would like to illustrate the eigengap heuristic on the toy example introduced in Section 4. For this purpose we consider similar data sets as in Section 4, but to vary the difficulty of clustering we consider Gaussians with increasing variance. The first row of Figure 4 shows the histograms of the three samples. We construct the 10-nearest neighbor graph as described in Section 4, and plot the eigenvalues of the normalized Laplacian L_rw on the different samples (the results for the unnormalized Laplacian are similar). The first data set consists of four well-separated clusters, and we can see that the first 4 eigenvalues are approximately 0. Then there is a gap between the 4th and 5th eigenvalue, that is, |λ_5 − λ_4| is relatively large. According to the eigengap heuristic, this gap indicates that the data set contains 4 clusters. The same behavior can also be observed for the results of the fully connected graph (already plotted in Figure 1). So we can see that the heuristic works well if the clusters in the data are very well pronounced. However, the more noisy or overlapping the clusters are, the less effective this heuristic becomes. For the second data set, where the clusters are more "blurry", there is still a gap between the 4th and 5th eigenvalue, but it is not as clear to detect as in the previous case. Finally, in the last data set there is no well-defined gap; the differences between consecutive eigenvalues are approximately the same. On the other hand, the clusters in this data set overlap so much that many non-parametric algorithms will have difficulty detecting them, unless they make strong assumptions on the underlying model. In this particular example, even for a human looking at the histogram it is not obvious what the correct number of clusters should be. This illustrates that, like most methods for choosing the number of clusters, the eigengap heuristic usually works well if the data contains very well-pronounced clusters, but in ambiguous cases it also returns ambiguous results.
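To make the heuristic concrete, the following is a minimal sketch in Python using NumPy and SciPy. It is not the tutorial's own code: the helper names knn_graph and eigengap_k are ours, the graph uses unweighted 0/1 edges rather than Gaussian similarities, and the sample data only mimics the toy setting of Section 4.

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.linalg import eigh

    def knn_graph(X, k=10):
        # Symmetric k-nearest-neighbor graph with 0/1 weights: i and j are
        # connected if either is among the other's k nearest neighbors.
        D = cdist(X, X)
        n = len(X)
        W = np.zeros((n, n))
        idx = np.argsort(D, axis=1)[:, 1:k + 1]  # skip column 0 (the point itself)
        for i in range(n):
            W[i, idx[i]] = 1.0
        return np.maximum(W, W.T)

    def eigengap_k(W, kmax=10):
        # The eigenvalues of L_rw = I - D^{-1} W equal those of the similar
        # symmetric matrix L_sym = I - D^{-1/2} W D^{-1/2}, which a standard
        # symmetric eigensolver handles reliably.
        d = W.sum(axis=1)
        d_isqrt = 1.0 / np.sqrt(d)
        L_sym = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]
        lam = eigh(L_sym, eigvals_only=True, subset_by_index=[0, kmax - 1])
        gaps = np.diff(lam)                 # gaps lambda_{i+1} - lambda_i
        return int(np.argmax(gaps)) + 1, lam  # k = position of the largest gap

    # Toy data in the spirit of Section 4: four 1-d Gaussians on the real line.
    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(m, 0.25, 50) for m in (2, 4, 6, 8)])[:, None]
    k_hat, lam = eigengap_k(knn_graph(X, k=10))
    print("estimated number of clusters:", k_hat)
    print("smallest 10 eigenvalues:", np.round(lam, 4))

For well-separated clusters the first four eigenvalues are close to 0 and the largest gap lies between λ_4 and λ_5, so the sketch returns k = 4; as the component variances grow, the gaps flatten out and the argmax becomes unreliable, exactly the behavior described above. Working on L_sym instead of L_rw is purely a numerical convenience, since the two matrices have the same spectrum.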