
Peking University: "Mass Data Processing: Cloud Computing" course teaching resources (PPT slides), Clustering

Document format: PPT, 51 slides, 464 KB

Google News
[Screenshot: Google News story clusters from July 2, 2007, e.g. iPhone activation coverage (562 related articles) and McCain campaign coverage (291 related articles)]
• They didn't pick all 3,400,217 related articles by hand…
• Or Amazon.com
• Or Netflix…

Other less glamorous things...
• Hospital Records
• Scientific Imaging
  – Related genes, related stars, related sequences
• Market Research
  – Segmenting markets, product positioning
• Social Network Analysis
• Data mining
• Image segmentation…

The Distance Measure
• How the similarity of two elements in a set is determined, e.g.
  – Euclidean Distance
    The Euclidean distance between points $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$, in Euclidean n-space, is defined as:
    $$\sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
    A common notation for this distance is $\lVert \mathbf{p} - \mathbf{q} \rVert$, where $\mathbf{p} = [p_1, p_2, \ldots, p_n]$ and $\mathbf{q} = [q_1, q_2, \ldots, q_n]$ are vectors.
  – Manhattan Distance
  – Inner Product Space
  – Maximum Norm
  – Or any metric you define over the space…
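As a minimal sketch of three of the measures listed above (Euclidean, Manhattan, and maximum norm), the following Python functions may help make the definitions concrete. The function names are illustrative and do not come from the slides; points are assumed to be equal-length sequences of numbers.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def max_norm(p: Sequence[float], q: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    p, q = (0.0, 0.0), (3.0, 4.0)
    print(euclidean(p, q), manhattan(p, q), max_norm(p, q))
```

The same pair of points can be close under one measure and far under another, so the choice of measure shapes which elements end up clustered together.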

Types of Algorithms
• Hierarchical Clustering
vs.
• Partitional Clustering

Hierarchical Clustering
• Builds or breaks up a hierarchy of clusters
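One way to picture the "builds" direction is a bottom-up (agglomerative) pass that starts with every point in its own cluster and repeatedly merges the two closest clusters. The sketch below is only an illustration under assumptions the slides do not state: it uses Euclidean distance, measures cluster closeness by the nearest pair of members (single linkage), and stops at a caller-chosen number of clusters. The names `agglomerate` and `target_clusters` are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def agglomerate(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain.

    Assumption: cluster distance is the distance between the closest
    pair of members (single linkage); the slides do not name a linkage.
    """
    if target_clusters < 1:
        raise ValueError("target_clusters must be at least 1")
    clusters = [[p] for p in points]  # every point starts alone
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so i stays valid
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(agglomerate(pts, 2))
```

Recording each merge instead of stopping early would give the full hierarchy; the "breaks up" direction works the other way, starting from one cluster and splitting it.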

Partitional Clustering
• Partitions set into all clusters simultaneously

K-Means Clustering
• Simple Partitional Clustering
• Choose the number of clusters, k
• Choose k points to be cluster centers
• Then…

K-Means Clustering
iterate {
    Compute distance from all points to all k-centers
    Assign each point to the nearest k-center
    Compute the average of all points assigned to each k-center
    Replace the k-centers with the new averages
}
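The iteration above can be sketched in a few lines of Python. This is a single-machine illustration, not the course's implementation, and it rests on several assumptions: Euclidean distance, initial centers sampled at random from the input points, a fixed iteration count instead of a convergence test, and a center that attracts no points is left where it is. The names `k_means`, `iterations`, and `seed` are hypothetical.

```python
import math
import random


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=10, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: seed centers from the data
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            # Compute distance to every center, assign to the nearest one.
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            groups[nearest].append(p)
        # Replace each center with the average of its assigned points;
        # an empty group keeps its old center (an assumption of this sketch).
        centers = [mean(g) if g else centers[c] for c, g in enumerate(groups)]
    return centers, groups


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers, groups = k_means(pts, k=2)
    print(sorted(centers))
```

In practice the loop usually stops once the assignments no longer change, and the result can depend on which initial centers were drawn.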

But!
• The complexity is pretty high:
  – k * n * O(distance metric) * num(iterations), where k is the number of clusters and n is the number of points
• Moreover, it can be necessary to send tons of data to each Mapper Node. Depending on your bandwidth and memory available, this could be impossible.
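To get a feel for the size of that product, the following back-of-the-envelope helper multiplies the four factors. It assumes the distance metric costs time proportional to the number of dimensions (true of Euclidean and Manhattan distance); the function name and the sample numbers are hypothetical and chosen only for illustration.

```python
def kmeans_distance_ops(k: int, n: int, dim: int, iterations: int) -> int:
    """Rough count of per-coordinate operations for k-means.

    Assumption: one distance evaluation costs about `dim` operations,
    so the total is k * n * dim * iterations.
    """
    return k * n * dim * iterations


if __name__ == "__main__":
    # Hypothetical workload: 10 clusters, one million points,
    # 100 dimensions, 20 iterations.
    print(kmeans_distance_ops(k=10, n=1_000_000, dim=100, iterations=20))
```

With these illustrative numbers the count is 2 * 10^10 coordinate operations, and every one of the n points has to be compared against all k centers in each iteration.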
