Google News IPhone activation headaches still trouble users ·They didn't pick Computerworld-1 hour ago July 02,2007(Computerworld)--It took lain Gillott 47 hours to activate his iPhone after waiting in the Texas heat Friday afternoon to buy one. all3,400,217 Most iPhone users thrilled but a few are iRate Reuters Local6.com Apple iPhone Arrives in the US Techtree.com Forbes-ZDNet-Ars Technica-Wired News related articles all 562 news articles by hand... McCain Considers Ways to Reshape Campaign Washington Post-35 minutes ago By Alec MacGillis Sen.John McCain's presidential campaign today Or Amazon.com announced widespread cutbacks and said it was considering whether Seattle Po过 to accept public campaign funds after another disappointing Intelligencer fundraising effort that has left the Arizona Republican with... McCain's Troubles Mount New York Times 。Or Netflix.. McCain Campaign Struggling,Reduces Staff ABC News CBS News-Reuters-Angus Reid Global Monitor-Sarasota Herald-Tribune all 291 news articles
Google News • They didn’t pick all 3,400,217 related articles by hand… • Or Amazon.com • Or Netflix…
Other less glamorous things... ·Hospital Records 。Scientific Imaging -Related genes,related stars,related sequences ·Market Research -Segmenting markets,product positioning Social Network Analysis ·Data mining Image segmentation
Other less glamorous things... • Hospital Records • Scientific Imaging – Related genes, related stars, related sequences • Market Research – Segmenting markets, product positioning • Social Network Analysis • Data mining • Image segmentation…
The Distance Measure o How the similarity of two elements in a set is determined,e.g. -Euclidean Distance The Euclidean distance between points P-(p1,p2,...Pn)and -(1,2,...,n).in Euclidean n-space.is defined as: V(p-g+()++(P-)=(p-9)2. A common notation for distance isp]-[where [p]=[p1,p2,...,Pn]and [q=g1,92,...,gn]are vectors. -Manhattan Distance -Inner Product Space Maximum Norm -Or any metric you define over the space
The Distance Measure • How the similarity of two elements in a set is determined, e.g. – Euclidean Distance – Manhattan Distance – Inner Product Space – Maximum Norm – Or any metric you define over the space…
Types of Algorithms Hierarchical Clustering vs. Partitional Clustering
• Hierarchical Clustering vs. • Partitional Clustering Types of Algorithms
Hierarchical Clustering ●● Builds or breaks up a hierarchy of clusters
Hierarchical Clustering • Builds or breaks up a hierarchy of clusters
Partitional Clustering 00 O 0 o O O O O 00 o 0 Partitions set into all clusters simultaneously
Partitional Clustering • Partitions set into all clusters simultaneously
Partitional Clustering 0 0 0 Partitions set into all clusters simultaneously
Partitional Clustering • Partitions set into all clusters simultaneously
K-Means Clustering Simple Partitional Clustering Choose the number of clusters.k Choose k points to be cluster centers 。Then
K-Means Clustering • Simple Partitional Clustering • Choose the number of clusters, k • Choose k points to be cluster centers • Then…
K-Means Clustering iterate Compute distance from all points to all k- centers Assign each point to the nearest k-center Compute the average of all points assigned to all specific k-centers Replace the k-centers with the new averages ]
K-Means Clustering iterate { Compute distance from all points to all kcenters Assign each point to the nearest k-center Compute the average of all points assigned to all specific k-centers Replace the k-centers with the new averages }
But! The complexity is pretty high: -k n O(distance metric )num (iterations) 0 Moreover,it can be necessary to send tons of data to each Mapper Node. Depending on your bandwidth and memory available,this could be impossible
But! • The complexity is pretty high: – k * n * O ( distance metric ) * num (iterations) • Moreover, it can be necessary to send tons of data to each Mapper Node. Depending on your bandwidth and memory available, this could be impossible