The most common performance measure for this problem is the fraction of correctly classified (respectively misclassified) elements among all elements. For Rand, comparing two clusterings was a natural extension of this problem, with a corresponding extension of the performance measure: instead of counting single elements, he counts correctly classified pairs of elements. Thus, the Rand Index is defined by:

\[ R(C, C') = \frac{2(n_{11} + n_{00})}{n(n-1)} \]

R ranges from 0 (no pair classified in the same way under both clusterings) to 1 (identical clusterings). The value of R depends on both the number of clusters and the number of elements. Morey and Agresti showed that the Rand Index is highly dependent upon the number of clusters [2]. In [4], Fowlkes and Mallows show that in the (unrealistic) case of independent clusterings the Rand Index converges to 1 as the number of clusters increases, which is undesirable for a similarity measure.

3.2.2 Adjusted Rand Index

The expected value of the Rand Index of two random partitions does not take a constant value (e.g. zero). Thus, Hubert and Arabie proposed an adjustment [3] which assumes a generalized hypergeometric distribution as null hypothesis: the two clusterings are drawn randomly with a fixed number of clusters and a fixed number of elements in each cluster (the number of clusters in the two clusterings need not be the same). The adjusted Rand Index is then the (normalized) difference of the Rand Index and its expected value under the null hypothesis. It is defined as follows [6]:

\[ R_{adj}(C, C') = \frac{\sum_{i=1}^{k} \sum_{j=1}^{\ell} \binom{m_{ij}}{2} - t_3}{\frac{1}{2}(t_1 + t_2) - t_3} \]

where

\[ t_1 = \sum_{i=1}^{k} \binom{|C_i|}{2}, \qquad t_2 = \sum_{j=1}^{\ell} \binom{|C'_j|}{2}, \qquad t_3 = \frac{2 t_1 t_2}{n(n-1)} \]

This index has expected value zero for independent clusterings and maximum value 1 (for identical clusterings). The significance of this measure has to be put into question because of the strong assumptions it makes on the distribution.
Meila [7] notes that some pairs of clusterings may result in negative index values.
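To make the two definitions concrete, the following is a minimal Python sketch that evaluates the Rand Index and the adjusted Rand Index from two label vectors. It is an illustration under stated assumptions, not a reference implementation: the function names (`pair_counts`, `rand_index`, `adjusted_rand_index`) are hypothetical, clusterings are assumed to be given as equal-length lists of cluster labels, n_11 is taken to be the number of pairs placed in the same cluster by both clusterings and n_00 the number of pairs separated by both, and m_ij is taken to be the number of elements shared by clusters C_i and C'_j. The value returned when the denominator of the adjusted index vanishes is a convention chosen here, not something the text specifies.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Return n, sum_ij C(m_ij, 2), t1 and t2 for two label vectors."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same elements")
    n = len(labels_a)
    # m_ij: number of elements in cluster i of the first clustering
    # and cluster j of the second (assumed meaning of m_ij).
    contingency = Counter(zip(labels_a, labels_b))
    same_in_both = sum(comb(m, 2) for m in contingency.values())
    t1 = sum(comb(size, 2) for size in Counter(labels_a).values())
    t2 = sum(comb(size, 2) for size in Counter(labels_b).values())
    return n, same_in_both, t1, t2


def rand_index(labels_a, labels_b):
    """R = 2 (n11 + n00) / (n (n - 1))."""
    n, n11, t1, t2 = pair_counts(labels_a, labels_b)
    if n < 2:
        raise ValueError("need at least two elements to form a pair")
    total_pairs = comb(n, 2)
    # Pairs separated by both clusterings, by inclusion-exclusion.
    n00 = total_pairs - t1 - t2 + n11
    return (n11 + n00) / total_pairs


def adjusted_rand_index(labels_a, labels_b):
    """R_adj = (sum_ij C(m_ij, 2) - t3) / ((t1 + t2) / 2 - t3)."""
    n, same_in_both, t1, t2 = pair_counts(labels_a, labels_b)
    if n < 2:
        raise ValueError("need at least two elements to form a pair")
    t3 = 2 * t1 * t2 / (n * (n - 1))
    denominator = 0.5 * (t1 + t2) - t3
    if denominator == 0:
        # Both clusterings are trivial (one cluster, or all singletons).
        # Returning 1.0 here is an assumed convention.
        return 1.0
    return (same_in_both - t3) / denominator


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    print(rand_index(a, [1, 1, 0, 0]), adjusted_rand_index(a, [1, 1, 0, 0]))
    print(rand_index(a, [0, 1, 0, 1]), adjusted_rand_index(a, [0, 1, 0, 1]))
```

In the second call the two clusterings agree on only 2 of the 6 pairs, so the Rand Index is 1/3 while the adjusted index is -0.5, a small instance of the negative values mentioned above.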