
Machine Learning course teaching resources (lecture notes): A Discussion of Some Issues in (Text) Clustering (Thinking in Clustering)

Resource category: library document; format: PDF; pages: 16; file size: 1.41 MB


Thinking in (Text) Clustering
(No math, be not afraid)
Text Mining & NLP & ML
Yueshen Xu (lecturer)
ysxu@xidian.edu.cn / xuyueshen@163.com
Data and Knowledge Engineering Research Center, Xidian University


Outline
- Background
- What can be clustered?
- Problems in K-XXX (Means/Medoid/Center…)
  - Similarity Measure
  - Convex and Concave
- Problems in Gaussian Mixture Model
- Problems in Matrix Factorization
- Multinomial and Sparsity
Scope: basics, not state-of-the-art
Keywords: Clustering, K-Means/Medoid, Similarity Computation, GMM, MF, Multinomial Distribution
2017/4/13, Software Engineering


Background
- Information Overload
  - Big Data, Cloud Computing, Artificial Intelligence, Deep Learning, etc.
- We need:
  - Summarization
  - Visualization
  - Dimensionality Reduction


Background
- Dimensionality Reduction (DR)
  - Clustering: Text Clustering, Webpage Clustering, Image Clustering…
  - Summarization: Document Summarization, Image Summarization…
  - Factorization: Rating Matrix Factorization, Image Non-negative Factorization
- Basic Requirements: Automatic, Applicable, Explainable
- → Clustering (Text)


Some Concepts
- Related Research Areas
  - Dimensionality Reduction (DR)
  - Text Mining
  - Natural Language Processing
  - Computational Linguistics
  - Information Retrieval
  - Artificial Intelligence
  - (Text) Clustering
- [Figure: diagram relating Information Retrieval, Computational Linguistics, Natural Language Processing, LSA/Topic Model, Text Mining, DR, Data Mining, Artificial Intelligence, Machine Learning, Machine Translation, and (Text) Clustering]
- We all know what (text) clustering is, right?
- A widely accepted topic, since everyone knows it


What can be clustered?
- Data Sample 1: (1.2, 1.4, 2.234, 3.231), (8.2, 6.4, 4.243, 5.41), (5.234, 3.56, 4.454, 6.78)
- Data Sample 2: (1), (0), (1), (0), (1), (1), (1), (0), (1), (0)
- Data Sample 3: (China, modern, people, gov.), (policy, paper, conference, chair), (report, solution, UN, UK)
- Data Sample 4: (aaabbbccc), (dddfffggg), (hhhiiiijjj)
- Data Sample 5: (▲▼♦), (♣♠█), (■□●)


Is there anything that cannot be clustered?
- Yes, but not related to us
What can be clustered?
- Anything over which a similarity measure can be defined
- All kinds of data can be clustered
[Figure: examples labeled "Matrix" and "topology"]
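One way to read "anything over which a similarity measure can be defined" is that each kind of data sample only needs its own pairwise score before a clustering method can be applied. The sketch below is illustrative and not taken from the slides: the function names are hypothetical, and pairing cosine similarity with numeric vectors, Jaccard similarity with word tuples, and character-bigram overlap with strings is an assumption about what a reasonable measure might be.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty collections are treated as identical
    return len(set_a & set_b) / len(union)


def char_ngram_similarity(s, t, n=2):
    """Jaccard similarity over the sets of character n-grams of two strings."""
    def grams(text):
        return {text[i:i + n] for i in range(len(text) - n + 1)}
    return jaccard_similarity(grams(s), grams(t))


# Samples copied from the "What can be clustered?" slide.
v1 = (1.2, 1.4, 2.234, 3.231)
v2 = (8.2, 6.4, 4.243, 5.41)
doc1 = ("China", "modern", "people", "gov.")
doc2 = ("policy", "paper", "conference", "chair")

numeric_score = cosine_similarity(v1, v2)
word_score = jaccard_similarity(doc1, doc2)
string_score = char_ngram_similarity("aaabbbccc", "dddfffggg")
```

Once such a function exists for a data type, any method that works from pairwise similarities can in principle be used on it; whether the resulting clusters are meaningful depends on how well the measure matches the data.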


K-Means Trap
Defects of K-Means, K-Medoid, K-XXX
- How many K?
- Where are the initial centers?
- Do the data really form a sphere?
- Do the data really follow Minkowski/Euclidean distance?
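The question about initial centers can be made concrete with a toy case. The following is a minimal sketch of Lloyd-style K-Means in plain Python, not the lecturer's code; the four points and both sets of starting centers are made up for illustration. On the same data, one start converges to a tight left/right split and the other gets stuck in a much worse top/bottom split.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Plain K-Means from the given initial centers.

    Returns (labels, centers, sse), where sse is the sum of squared
    distances from each point to its assigned center.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Hypothetical data: the corners of a wide, flat rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_labels, good_centers, good_sse = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
bad_labels, bad_centers, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
```

Both runs have converged in the sense that another iteration changes nothing, which is why practical implementations usually restart from several initializations and keep the run with the lowest error.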


How about these?
- What kind of data does K-XXX fit better?
- What kind of data do methods relying on distance/similarity computation fit better?
- CONVEX
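A small experiment can illustrate why the answer is "convex". Assigning points to the nearest center always carves the plane into convex regions, so a cluster shaped like a ring around another cluster cannot be recovered. The sketch below assumes two hypothetical concentric rings (radii 1 and 5, twelve points each) and a minimal K-Means written for this example; it then repeats the run on a one-dimensional radius feature, an assumed change of representation under which the two groups become convex.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Minimal K-Means from the given initial centers; returns the labels."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def ring(radius, n=12):
    """n evenly spaced points on a circle; the half-step offset avoids x == 0."""
    return [
        (radius * math.cos(2 * math.pi * (i + 0.5) / n),
         radius * math.sin(2 * math.pi * (i + 0.5) / n))
        for i in range(n)
    ]


inner, outer = ring(1.0), ring(5.0)
points = inner + outer
truth = [0] * len(inner) + [1] * len(outer)

# K-Means on raw (x, y): the rings are cut into a left half and a right half.
xy_labels = kmeans(points, centers=[(-1.0, 0.0), (1.0, 0.0)])

# K-Means on the distance from the origin: the rings are recovered.
radius_feature = [(math.hypot(x, y),) for x, y in points]
r_labels = kmeans(radius_feature, centers=[(0.0,), (6.0,)])
```

In the raw coordinates each K-Means cluster ends up with half of the inner ring and half of the outer ring, while on the radius feature the labels match the rings. The data did not change, only the space in which distance is measured.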


Alternative
- Gaussian Mixture Model
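The slide only names the Gaussian Mixture Model, so the following is a hedged sketch of the general idea rather than the lecture's formulation: a one-dimensional, two-component mixture fitted with a basic EM loop in plain Python. The data, the initial parameters, and the function names are all assumptions made for illustration. Unlike K-Means, each component carries its own variance and each point receives a soft responsibility instead of a hard assignment.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, means, variances, weights, n_iter=50, var_floor=1e-6):
    """Basic EM for a 1-D Gaussian mixture, starting from the given parameters.

    Returns (means, variances, weights, resp), where resp[i][j] is the
    responsibility of component j for data[i] from the final E-step.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, spread)  # guard against collapse
            weights[j] = nj / len(data)
    return means, variances, weights, resp


# Hypothetical data: a tight group around 1 and a spread-out group around 5.
narrow = [0.8, 0.9, 1.0, 1.1, 1.2]
wide = [3.0, 4.0, 5.0, 6.0, 7.0]
data = narrow + wide

means, variances, weights, resp = fit_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
# Hard labels, if needed, are the most responsible component per point.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

The point 3.0 is equidistant from the two group means (1.0 and 5.0), so a nearest-center rule has no basis for choosing; the mixture assigns it to the wide component because that component's larger variance makes 3.0 far more plausible there. GMM still shares several of the questions raised for K-XXX, such as how many components to use and how to initialize them.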
