Chap. 1 Introduction 3 12
lower similarity threshold, we perceive nine clusters. Which answer is correct? Looking at the data at multiple scales may actually help in analyzing its structure. Thus the crucial problem in identifying clusters in data is to specify what proximity is and how to measure it. As is to be expected, the notion of proximity is problem dependent.
Clustering techniques offer several advantages over a manual grouping process. First, a clustering program can apply a specified objective criterion consistently to form the groups. Human beings are excellent cluster seekers in two and often in three dimensions, but different individuals do not always identify the same clusters in data. The proximity measure defining similarity among objects depends on an individual's educational and cultural background. Thus it is quite common for different human subjects to form different groups in the same data, especially when the groups are not well separated. Second, a clustering algorithm can form the groups in a fraction of time required by a manual grouping, particularly if a long list of descriptors or features is associated with each object. The speed, reliability, and consistency of a clustering algorithm in organizing data together constitute an overwhelming reason to use it. A clustering algorithm relieves a scientist or data analyst of the treacherous job of "looking" at a pattern matrix or a similarity matrix to detect clusters. A data analyst's time is better spent in analyzing or interpreting the results provided by a clustering algorithm.
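The earlier remark that the number of perceived clusters depends on the threshold used can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a method taken from the text: the data are made up, proximity is taken to be Euclidean distance, and a "cluster" is taken to be a connected set of points linked whenever two points lie within a chosen distance cutoff. The function name `threshold_clusters` and its parameters are hypothetical.

```python
from itertools import combinations
from math import dist


def threshold_clusters(points, threshold):
    """Group points into connected components of the graph that links
    every pair of points whose Euclidean distance is <= threshold.
    (Assumed definition of a cluster, chosen for illustration only.)"""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of points that are close enough.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Collect the points that share a representative.
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())


# Hypothetical data: two well-separated groups, each made of two tight pairs.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0),
          (20.0, 0.0), (20.0, 1.0), (23.0, 0.0), (23.0, 1.0)]

fine = threshold_clusters(points, threshold=1.5)    # strict proximity requirement
coarse = threshold_clusters(points, threshold=4.0)  # looser proximity requirement
print(len(fine), len(coarse))  # prints: 4 2
```

The same eight points yield four clusters under the strict cutoff and two under the loose one. Neither answer is "the" correct one; each reflects a different choice of what counts as proximity.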
Clustering is also useful in implementing the "divide and conquer" strategy to reduce the computational complexity of various decision-making algorithms in pattern recognition. For example, the nearest-neighbor decision rule is a popular technique in pattern recognition (Duda and Hart, 1973). However, finding the nearest neighbor of a test pattern can be very time consuming if the number of training patterns or prototypes is large. Fukunaga and Narendra (1975) used the well-known partitional clustering algorithm, ISODATA (Chapter 3), to decompose the patterns, and then in conjunction with the branch-and-bound method obtained an efficient algorithm to compute nearest neighbors. Similarly, Fukunaga and Short (1978) used clustering for problem localization, whereby a simple decision rule can be implemented in local regions or clusters of the pattern space. The applications of clustering continue to grow.
Consider the problem of grouping various colleges and universities in the United States to illustrate the factors in clustering problems. Schools can be clustered based on their geographical location, size of the student body, size of the campus, tuition fee, or offerings of various professional graduate programs. The factors depend on the goal of the analysis. The shapes and sizes of the clusters formed will depend on which particular attribute is used in defining the similarity between colleges. Interesting and challenging clustering problems arise when several attributes are taken together to construct clusters. One cluster could represent private, midwestern, and primarily liberal arts colleges with fewer than 1000 students and another can represent large state universities. The features or attributes that we have mentioned so far can easily be measured. What about such attributes as quality of education, quality of faculty, and the quality of campus life, which