正在加载图片...
However,the division also reflects the chronological development of the mea- sures:the counting pairs measures date from the 1970s and 1980s,the mea- sures based on set overlaps from the 1990s and the information-theoretical measures have been developed in the last years (2002/2003). Till this day,there is no formalization of the problem of comparing clus- terings.We think that a set of axioms would be helpful in detecting and defining"good"measures.As a first step towards such a set of axioms,we give in Section 6 aspects and properties that have to be taken into account. 2 Definitions and notations Let X be a finite set with cardinality =n.A clustering C is a set [C1,...,C}of non-empty disjoint subsets of X such that their union equals X.The set of all clusterings of X is denoted by P(X).For a clustering C={C1,...,Ck}we assume Cil>0 for all i=1,...,k.A trivial clustering is either the one-clustering that consist of just one cluster or the singleton clustering in which every element forms its own cluster. Let C={C1,...,C}P(X)denote a second clustering of X.The con- fusion matrix M=(mij)(or contingency table)of the pair C,C is a k x e- matrix whose ij-th entry equals the number of elements in the intersection of the clusters Ci and C: m=ICinC,1≤i≤k,1≤j≤e. Clustering C'is a refinement of C (and C is a coarsening of C),if each cluster of C'is contained in a cluster of C,formally: C∈C3C∈C:CcC. The product CxC'of two clusterings C,C'is the coarsest common refinement of the two clusterings: C×C={CnC|C∈C,Cg∈C,CnC≠0}. The product C xC'is again a clustering,and if C'is a refinement of C,then Cx C=C. 3 Measures based on counting pairs A very intuitional approach to comparing clusterings is counting pairs of objects that are"classified"in the same way in both clusterings,i.e.pairs of 3However, the division also reflects the chronological development of the mea￾sures: the counting pairs measures date from the 1970s and 1980s, the mea￾sures based on set overlaps from the 1990s and the information-theoretical measures have been developed in the last years (2002/2003). Till this day, there is no formalization of the problem of comparing clus￾terings. We think that a set of axioms would be helpful in detecting and defining ”good” measures. As a first step towards such a set of axioms, we give in Section 6 aspects and properties that have to be taken into account. 2 Definitions and notations Let X be a finite set with cardinality |X| = n. A clustering C is a set {C1, . . . , Ck} of non-empty disjoint subsets of X such that their union equals X. The set of all clusterings of X is denoted by P(X). For a clustering C = {C1, . . . , Ck} we assume |Ci | > 0 for all i = 1, . . . , k. A trivial clustering is either the one-clustering that consist of just one cluster or the singleton clustering in which every element forms its own cluster. Let C 0 = {C 0 1 , . . . , C0 ` } ∈ P(X) denote a second clustering of X. The con￾fusion matrix M = (mij ) (or contingency table) of the pair C, C 0 is a k × `- matrix whose ij-th entry equals the number of elements in the intersection of the clusters Ci and C 0 j : mij = |Ci ∩ C 0 j |, 1 ≤ i ≤ k, 1 ≤ j ≤ `. Clustering C 0 is a refinement of C (and C is a coarsening of C 0 ), if each cluster of C 0 is contained in a cluster of C, formally: ∀C 0 j ∈ C0 ∃Ci ∈ C : C 0 j ⊆ Ci . The product C×C0 of two clusterings C, C 0 is the coarsest common refinement of the two clusterings: C × C0 = {Ci ∩ C 0 j | Ci ∈ C, C0 j ∈ C0 , Ci ∩ C 0 j 6= ∅}. The product C × C0 is again a clustering, and if C 0 is a refinement of C, then C × C0 = C 0 . 3 Measures based on counting pairs A very intuitional approach to comparing clusterings is counting pairs of objects that are ”classified” in the same way in both clusterings, i.e. pairs of 3
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有