3 GENERALIZATIONS 5 the index of the points 21,...,2n.Further we define sAm)=me N) to be the n-th shatter coefficient of the class A. The index is the number of different subsets of given n points that can be identified by intersecting with sets in A.Then obviously the shatter coefficient is the maximal number of different subsets that can be picked out.It measures the diversity of the class.Clearly s(A,n)<2n as there are only 2n subsets of n points.The largest integer h for which s(A,h)=2h is called the vc-dimension of the dlass A.It plays a crucial role in the statistical learning theory (see [Vap0o]). Theorem 5 (Vapnik-Chervonenkis).FOR ANY PFOEABIIITY MEASUFE AND ClASS of SES A,AND fOR ANY N ANDE>0, P sup v(A)-vn(A)I>e<8s(A,n)e-ne2/32, LAEA Proof.We modify the proof of Theorem 3:In step 3 we now argue that the supremum is actually a maximum over at most s(A,n)sets. ▣ REM A (Measurability).The supremum in the theorem is not always measurable.In fact this must be checked for every family A. REM AFk (Optimal Exponent).For the sake of brevity we followed Pollards([Pol84])proof instead of the original one by Vapnik([VC71).In particular the exponent -ne2/32 is worse than the -ne2/8 in the original paper.The best known exponent for the Glivenko Cantelli Theorem is -2ne2 ([Mas90]).For the Vapnik Chervonenkis inequality several refinements exist that play with the prefactor and the exponent.See for instance [Dev82]. In fact it should be clear that the statement could also be given in a probabilistic manner: Theorem 6. P{sup v(A)-(A川>e 8ENA(Z....Zn)}e-ne/2 、AA For real world applications this statement is more difficult to hande,but it is of great theo- retical interest.We say that a uniform law of large numbers holds if sup vn(A)-v(A)0 in probability AEA It follows from the theorem that this is the case if 1og{(N4(Z,,Zn)》一0 Vapnik and Chervonenkis showed is [VC71 Vap98 that this condition is also necessary for the uniform law of large numbers.Another characterization is given by Talagrand [Tal87,who showed that the uniform law of large numbers holds if and only if there does not exist a set A CRd with v(A)>0 such that,with probability one,the set {21,...,Zn}nA is shattered by A.3 GENERALIZATIONS 5 the index of the points z1, . . . , zn. Further we dene s(A, n) := max z1,...,zn∈Rd NA(z1, . . . , zn) to be the n-th shatter coecient of the class A. The index is the number of dierent subsets of given n points that can be identied by intersecting with sets in A. Then obviously the shatter coecient is the maximal number of dierent subsets that can be picked out. It measures the diversity of the class. Clearly s(A, n) ≤ 2 n as there are only 2 n subsets of n points. The largest integer h for which s(A, h) = 2h is called the vc-dimension of the class A. It plays a crucial role in the statistical learning theory (see [Vap00]). Theorem 5 (Vapnik-Chervonenkis). For any probability measure ν and class of sets A, and for any n and ² > 0, P ½ sup A∈A |ν(A) − νn(A)| > ²¾ ≤ 8s(A, n)e−n²2/32 , Proof. We modify the proof of Theorem 3: In step 3 we now argue that the supremum is actually a maximum over at most s(A, n) sets. Remark (Measurability). The supremum in the theorem is not always measurable. In fact this must be checked for every family A. Remark (Optimal Exponent). For the sake of brevity we followed Pollards([Pol84]) proof instead of the original one by Vapnik ([VC71]). In particular the exponent −n²2/32 is worse than the −n²2/8 in the original paper. The best known exponent for the Glivenko Cantelli Theorem is −2n²2 ([Mas90]). For the Vapnik Chervonenkis inequality several renements exist that play with the prefactor and the exponent. See for instance [Dev82]. In fact it should be clear that the statement could also be given in a probabilistic manner: Theorem 6. P ½ sup A∈A |ν(A) − νn(A)| > ²¾ ≤ 8E {NA(Z1, . . . , Zn)} e −n²2/32 , For real world applications this statement is more dicult to handle, but it is of great theoretical interest. We say that a uniform law of large numbers holds if sup A∈A |νn(A) − ν(A)| → 0 in probability It follows from the theorem that this is the case if log E {(NA(Z1, . . . , Zn))} n → 0 Vapnik and Chervonenkis showed is [VC71, Vap98] that this condition is also necessary for the uniform law of large numbers. Another characterization is given by Talagrand [Tal87], who showed that the uniform law of large numbers holds if and only if there does not exist a set A ⊆ R d with ν(A) > 0 such that, with probability one, the set {Z1, . . . , Zn} ∩ A is shattered by A