completely coincide with (0,...,0,1,0,...0),but do so up to some small error term.Hence,if the perturbations are not too large,then k-means algorithm will still separate the groups from each other. 7.1 The formal perturbation argument The formal basis for the perturbation approach to spectral clustering is the Davis-Kahan theorem from matrix perturbation theory.This theorem bounds the difference between eigenspaces of symmetric matrices under perturbations.We state those results for completeness,but for background reading we refer to Section V of Stewart and Sun(1990)and Section VII.3 of Bhatia(1997).In perturbation theory, distances between subspaces are usually measured using "canonical angles"(also called "principal angles").To define principal angles,let Vi and V2 be two p-dimensional subspaces of Rd,and Vi and V2 two matrices such that their columns form orthonormal systems for Vi and V2,respectively.Then the cosines cos;of the principal angles are the singular values of V'V2.For p=1,the so defined canonical angles coincide with the normal definition of an angle.Canonical angles can also be defined if V and V2 do not have the same dimension,see Section V of Stewart and Sun(1990),Section VII.3 of Bhatia (1997),or Section 12.4.3 of Golub and Van Loan (1996).The matrix sin (V1,V2)will denote the diagonal matrix with the sine of the canonical angles on the diagonal. Theorem7(Davis-.Kahan)LetA,H∈Rnxn be symmetric matrices,and let‖·‖be the Frobenius norm or the two-norm for matrices,respectively.Consider A:=A+H as a perturbed version of A. Let S CR be an interval.Denote by os (A)the set of eigenvalues of A which are contained in S1, and by Vi the eigenspace corresponding to all those eigenvalues (more formally,Vi is the image of the spectral projection induced by os,(A)).Denote by os (A)and Vi the analogous quantities for A. Define the distance between S and the spectrum of A outside of S as 6=min{lA-sl:入eigenvalue of A,入年S1,s∈S} Then the distance d(vi,Vi):=lsin (vi,Vi)between the two subspaces Vi and Vi is bounded by d(.)s 6 For a discussion and proofs of this theorem see for example Section V.3 of Stewart and Sun(1990). Let us try to decrypt this theorem,for simplicity in the case of the unnormalized Laplacian(for the normalized Laplacian it works analogously).The matrix A will correspond to the graph Laplacian L in the ideal case where the graph has k connected components.The matrix A corresponds to a perturbed case,where due to noise the k components in the graph are no longer completely discon- nected,but they are only connected by few edges with low weight.We denote the corresponding graph Laplacian of this case by L.For spectral clustering we need to consider the first k eigenvalues and eigenvectors of L.Denote the eigenvalues of L by A1,...An and the ones of the perturbed Laplacian L by A1,...,An.Choosing the interval Si is now the crucial point.We want to choose it such that both the first k eigenvalues of L and the first k eigenvalues of L are contained in S1.This is easier the smaller the perturbation H=L-L and the larger the eigengap s-+is.If we manage to find such a set,then the Davis-Kahan theorem tells us that the eigenspaces corresponding to the first k eigenvalues of the ideal matrix L and the first k eigenvalues of the perturbed matrix L are very close to each other,that is their distance is bounded by/6.Then,as the eigenvectors in the ideal case are piecewise constant on the connected components,the same will approximately be true in the perturbed case..How good“approximately”is depends on the norm of the perturbation‖H‖and the distance 6 between S1 and the (k+1)st eigenvector of L.If the set S1 has been chosen as the interval [0,Ak],then 6 coincides with the spectral gap+1-.We can see from the theorem that the larger this eigengap is,the closer the eigenvectors of the ideal case and the perturbed case are,and hence the better spectral clustering works.Below we will see that the size of the eigengap can also be used in a 17completely coincide with (0, . . . , 0, 1, 0, . . . 0)0 , but do so up to some small error term. Hence, if the perturbations are not too large, then k-means algorithm will still separate the groups from each other. 7.1 The formal perturbation argument The formal basis for the perturbation approach to spectral clustering is the Davis-Kahan theorem from matrix perturbation theory. This theorem bounds the difference between eigenspaces of symmetric matrices under perturbations. We state those results for completeness, but for background reading we refer to Section V of Stewart and Sun (1990) and Section VII.3 of Bhatia (1997). In perturbation theory, distances between subspaces are usually measured using “canonical angles” (also called “principal angles”). To define principal angles, let V1 and V2 be two p-dimensional subspaces of ❘ d , and V1 and V2 two matrices such that their columns form orthonormal systems for V1 and V2, respectively. Then the cosines cos Θi of the principal angles Θi are the singular values of V 0 1V2. For p = 1, the so defined canonical angles coincide with the normal definition of an angle. Canonical angles can also be defined if V1 and V2 do not have the same dimension, see Section V of Stewart and Sun (1990), Section VII.3 of Bhatia (1997), or Section 12.4.3 of Golub and Van Loan (1996). The matrix sin Θ(V1, V2) will denote the diagonal matrix with the sine of the canonical angles on the diagonal. Theorem 7 (Davis-Kahan) Let A, H ∈ ❘ n×n be symmetric matrices, and let k · k be the Frobenius norm or the two-norm for matrices, respectively. Consider A˜ := A + H as a perturbed version of A. Let S1 ⊂ ❘ be an interval. Denote by σS1 (A) the set of eigenvalues of A which are contained in S1, and by V1 the eigenspace corresponding to all those eigenvalues (more formally, V1 is the image of the spectral projection induced by σS1 (A)). Denote by σS1 (A˜) and V˜ 1 the analogous quantities for A˜. Define the distance between S1 and the spectrum of A outside of S1 as δ = min{|λ − s|; λ eigenvalue of A, λ 6∈ S1, s ∈ S1}. Then the distance d(V1, V˜ 1) := k sin Θ(V1, V˜ 1)k between the two subspaces V1 and V˜ 1 is bounded by d(V1, V˜ 1) ≤ kHk δ . For a discussion and proofs of this theorem see for example Section V.3 of Stewart and Sun (1990). Let us try to decrypt this theorem, for simplicity in the case of the unnormalized Laplacian (for the normalized Laplacian it works analogously). The matrix A will correspond to the graph Laplacian L in the ideal case where the graph has k connected components. The matrix A˜ corresponds to a perturbed case, where due to noise the k components in the graph are no longer completely disconnected, but they are only connected by few edges with low weight. We denote the corresponding graph Laplacian of this case by L˜. For spectral clustering we need to consider the first k eigenvalues and eigenvectors of L˜. Denote the eigenvalues of L by λ1, . . . λn and the ones of the perturbed Laplacian L˜ by λ˜ 1, . . . , λ˜ n. Choosing the interval S1 is now the crucial point. We want to choose it such that both the first k eigenvalues of L˜ and the first k eigenvalues of L are contained in S1. This is easier the smaller the perturbation H = L − L˜ and the larger the eigengap |λk − λk+1| is. If we manage to find such a set, then the Davis-Kahan theorem tells us that the eigenspaces corresponding to the first k eigenvalues of the ideal matrix L and the first k eigenvalues of the perturbed matrix L˜ are very close to each other, that is their distance is bounded by kHk/δ. Then, as the eigenvectors in the ideal case are piecewise constant on the connected components, the same will approximately be true in the perturbed case. How good “approximately” is depends on the norm of the perturbation kHk and the distance δ between S1 and the (k + 1)st eigenvector of L. If the set S1 has been chosen as the interval [0, λk], then δ coincides with the spectral gap |λk+1 −λk|. We can see from the theorem that the larger this eigengap is, the closer the eigenvectors of the ideal case and the perturbed case are, and hence the better spectral clustering works. Below we will see that the size of the eigengap can also be used in a 17