UESTC Graduate Course: Machine Learning
Lecture 8: Unsupervised Learning
Jiasheng Hao (郝家胜), Ph.D., Associate Professor
Email: hao@uestc.edu.cn
School of Automation Engineering, Center for Robotics
University of Electronic Science and Technology of China, Chengdu 611731
Reference: Zhou Zhihua, Machine Learning (《机器学习》)
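Introduction
Supervised vs. unsupervised learning:
- Supervised training: every sample in the training set has already been labeled with its class.
- Unsupervised training: the training samples are unlabeled.
Our goal is to discover the particular structure hidden in the data.
[Figure: scatter plot of unlabeled samples; axes X1 and X2]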
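To make "discovering structure in unlabeled samples" concrete, here is a minimal sketch in Python. It assumes k-means (Lloyd's algorithm) as the grouping method; the slide itself names no algorithm. The sample points, the function name `kmeans`, and the farthest-point initialization are illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=20):
    """Plain k-means on a list of coordinate tuples (illustrative sketch)."""
    # Assumption: deterministic farthest-point initialization, chosen only
    # so that the example is reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled samples in the (X1, X2) plane: two visible groups.
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No class labels are supplied anywhere; the two groups emerge from the distances between the samples alone.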
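Introduction
Unsupervised recognition is widely applicable:
- Collecting and labeling a large sample set is time-consuming and laborious. Example: recorded speech.
- The problem can be approached in reverse: train on a large unlabeled sample set first, then label the resulting groups by hand. Example: data mining applications.
- When the properties of the patterns to be classified change over time, unsupervised methods can substantially improve classifier performance. Example: an automatic food classifier whose foods change with the season.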
Introduction
Example
- News classification: Google News, for instance, collects news stories from the web, groups them into clusters by topic, and displays the stories in each cluster together.
[Figure: screenshot of a news aggregator grouping coverage of the Northern California Camp Fire from several outlets (Fox News, The New York Times, Los Angeles Times, WRAL.com, ABC10.com KXTV)]
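The idea of grouping stories by topic can be sketched in a few lines of Python. This is not how Google News works; it is a toy single-pass "leader" grouping on bag-of-words cosine similarity. The headlines, the function names, and the 0.3 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Word-count vector of a headline (lowercased, split on whitespace)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_by_topic(headlines, threshold=0.3):
    """Single-pass grouping: a headline joins the first cluster whose
    first member (its 'leader') is similar enough, else starts a new one."""
    vectors = [bag_of_words(h) for h in headlines]
    clusters = []  # each cluster is a list of headline indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar leader found
    return clusters

# Hypothetical headlines on two topics.
headlines = [
    "wildfire death toll rises in california",
    "central bank raises interest rates again",
    "california wildfire destroys thousands of homes",
    "interest rates rise as central bank acts",
]
print(group_by_topic(headlines))  # [[0, 2], [1, 3]]
```

Real systems would use richer text features and a more robust clustering method; the point here is only that shared vocabulary, not a topic label, drives the grouping.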
Introduction
Example
- Catching unknown fraud and money-laundering attacks early: the DataVisor solution can automatically detect new types of attack without training labels or historical fraud samples, uncovering unknown systemic and large-scale risks, and can stop attackers before they do damage. DataVisor's anti-fraud work covers fraudulent behavior such as malicious registration, account theft, loan fraud, and fake-traffic inflation. Its strength is feature computation: accurate data cleaning, field extraction, field splitting, and field combination. Clustering on these features catches fraud rings efficiently and stops fraud in time. In some fraud scenarios, the accuracy of DataVisor's unsupervised learning can reach as high as 99%.
Introduction
Example
- Image clustering
[Figure: image clustering example]
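Introduction
Example
- Organizing computing clusters, social network analysis, market segmentation, astronomical data analysis, and so on.
[Figure: four panels: Organize computing clusters; Social network analysis; Market segmentation; Astronomical data analysis. Credit: Andrew Ng]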
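Introduction
Unsupervised methods can help with feature selection:
- Use unsupervised methods to extract basic features that are useful for subsequent classification: data-independent "smart preprocessing" and "smart feature extraction".
- Revealing the internal structure and regularities of the observed data makes it possible to design targeted classifiers more effectively.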
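As one possible illustration of unsupervised feature extraction, the sketch below computes the first principal component of unlabeled data by power iteration and projects each sample onto it. Principal component analysis is an assumption here; the slide does not name a method. The data and the function name are hypothetical.

```python
def first_principal_component(rows, iters=100):
    """Return (direction, scores): the leading principal direction of the
    data and each sample's one-dimensional projection onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize;
    # the vector converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical unlabeled samples lying roughly along a diagonal line.
data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]
direction, scores = first_principal_component(data)
print(direction)  # unit vector along the direction of greatest variance
print(scores)     # one extracted feature per sample
```

The single extracted feature keeps most of the variation in the two original coordinates, so a later classifier could work with one input instead of two.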
Introduction
- Yann LeCun: unsupervised learning represents the future of AI technology.
  - Reinforcement learning (cherry): the machine predicts a scalar reward given once in a while. A few bits for some samples.
  - Supervised learning (icing): the machine predicts a category or a few numbers for each input. 10 to 10,000 bits per sample.
  - Unsupervised learning (cake): the machine predicts any part of its input for any observed part, for example future frames in videos. Millions of bits per sample.
- Yoshua Bengio: unsupervised learning is the key to breakthroughs in deep learning and a major challenge for AI today; it may take decades to solve.
Outline
- Unsupervised learning tasks
- Prototype-based learning methods
- The EM algorithm
- Dynamic clustering methods
- New methods