Factors of Clustering
◼ What data could be used in clustering?
    Large or small, Gaussian or non-Gaussian, etc.
◼ Which clustering algorithm? (cost function)
    Partition-based (e.g. k-means)
    Model-based (e.g. EM algorithm)
    Density-based (e.g. DBSCAN)
    Genetic, spectral, …
◼ Choosing (dis)similarity measures: a critical step in clustering
    Euclidean distance, …
    Pearson linear correlation, …
◼ How to evaluate the clustering result? (cluster validity)
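
To illustrate why the choice of (dis)similarity measure is called a critical step, here is a minimal sketch comparing the two measures named above. The function names, the example vectors, and the conversion of Pearson correlation into a dissimilarity as 1 - r are assumptions made for illustration, not taken from the slide.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson linear correlation coefficient r, in [-1, 1].
    Undefined (division by zero) if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pearson_dissimilarity(x, y):
    """Assumed convention: 1 - r, so 0 means the same shape and 2 the opposite shape."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical example profiles.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [10.0, 20.0, 30.0, 40.0]  # same trend as a, much larger scale
profile_c = [4.0, 3.0, 2.0, 1.0]      # similar magnitude to a, opposite trend

for name, other in (("b", profile_b), ("c", profile_c)):
    print(
        f"a vs {name}: "
        f"euclidean={euclidean_distance(profile_a, other):.3f}, "
        f"pearson_dissimilarity={pearson_dissimilarity(profile_a, other):.3f}"
    )
```

With these vectors, Euclidean distance places c closer to a than b, while the Pearson-based dissimilarity places b closer to a than c. The same data can therefore yield different clusters depending only on the measure chosen.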