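Applications of Mathematical Statistics in Chemistry
Chapter 7: Fundamentals of Machine Learning
Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Department of Chemistry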
Slides prepared by Li Zhenhua

§7.0 What Is Machine Learning?

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
Mitchell, T. (1997). Machine Learning. McGraw Hill.

In other words, a learning program should:
● improve over task T,
● with respect to performance measure P,
● based on experience E.
What Can Machine Learning Do?

● Pattern recognition
● Computer vision
● Natural language processing
● Data mining
● Big data
● Statistical learning
● Speech recognition
● Autonomous driving
Basic Concepts of Machine Learning

● Data set
● Sample (also called instance or example)
● Feature (also called attribute or independent variable)
● Feature value (attribute value)
● Attribute space, sample space
● Feature vector
● Output, target (also called response or dependent variable)

Example: demand for a product as a function of consumers' average income and the product's price. The output is y; the inputs are (x1, x2).

Demand y (output):        100    75    80    70    50
Income x1 (feature 1):   1000   600  1200   500   300
Price x2 (feature 2):       5     7     6     6     8
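To make the vocabulary concrete, here is a minimal sketch of the demand data set as plain Python objects. The names `feature_names`, `samples`, and `first_as_dict` are illustrative assumptions, not notation from the slides.

```python
# Demand data from the table: each sample pairs a feature vector with a target.
feature_names = ("income", "price")  # x1, x2 (names chosen for illustration)

samples = [
    # (feature vector (x1, x2), target y)
    ((1000, 5), 100),
    ((600, 7), 75),
    ((1200, 6), 80),
    ((500, 6), 70),
    ((300, 8), 50),
]

n_samples = len(samples)          # number of samples (instances)
n_features = len(samples[0][0])   # dimension of each feature vector

# Attribute values of the first sample, keyed by feature name.
first_features, first_target = samples[0]
first_as_dict = dict(zip(feature_names, first_features))

print(n_samples, n_features)
print(first_as_dict, first_target)
```

Each tuple in `samples` is one point of the sample space; the second entry of the tuple is the response that a supervised method would try to predict.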
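Matrix Representation of a Data Set

A data set with n samples and m features is stored as a matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$. A vector is an $n \times 1$ matrix, $\mathbf{y} \in \mathbb{R}^{n}$. For the demand example, with columns income and price:

$$
\mathbf{X} =
\begin{pmatrix}
1000 & 5 \\
600 & 7 \\
1200 & 6 \\
500 & 6 \\
300 & 8
\end{pmatrix}
\in \mathbb{R}^{5 \times 2},
\qquad
\mathbf{x}_1 =
\begin{pmatrix} 1000 \\ 600 \\ 1200 \\ 500 \\ 300 \end{pmatrix},
\qquad
\mathbf{x}_2 =
\begin{pmatrix} 5 \\ 7 \\ 6 \\ 6 \\ 8 \end{pmatrix},
\qquad
\mathbf{y} =
\begin{pmatrix} 100 \\ 75 \\ 80 \\ 70 \\ 50 \end{pmatrix}
$$

Matrix elements: $X_{ij}$ is the element in row i and column j of $\mathbf{X}$; $y_i$ is the i-th element of $\mathbf{y}$.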
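The same representation can be sketched with NumPy (an assumption; the slides do not name a library). The values come from the demand table, and the helper `element` is a hypothetical name that translates the 1-based indices $X_{ij}$ used here into NumPy's 0-based indexing.

```python
import numpy as np

# Data matrix X in R^{5x2}: one row per sample, columns = (income, price).
X = np.array(
    [
        [1000, 5],
        [600, 7],
        [1200, 6],
        [500, 6],
        [300, 8],
    ],
    dtype=float,
)
# Target vector y in R^5 (demand).
y = np.array([100, 75, 80, 70, 50], dtype=float)

n, m = X.shape   # n samples, m features
x1 = X[:, 0]     # first feature column (income)
x2 = X[:, 1]     # second feature column (price)


def element(M, i, j):
    """Return M_ij using 1-based row and column indices."""
    return M[i - 1, j - 1]


print(X.shape, y.shape)
print(element(X, 3, 1))  # X_31: income of the third sample
print(y[1])              # y_2: demand of the second sample
```

Keeping samples in rows and features in columns matches the convention $\mathbf{X} \in \mathbb{R}^{n \times m}$ and is also the layout most numerical libraries expect.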
Types of Machine Learning

Supervised learning: the computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.

Related settings in which the training signal is incomplete or restricted:

● Semi-supervised learning: the computer is given only an incomplete training signal, that is, a training set with some (often many) of the target outputs missing.
● Active learning: the computer can obtain training labels only for a limited set of instances (based on a budget), and must also optimize its choice of which objects to acquire labels for. When used interactively, these can be presented to the user for labeling.
● Reinforcement learning: training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.
Starting point:

● Response measurement Y
● Vector of p predictor measurements X
● In the regression problem, Y is quantitative (e.g. price, blood pressure).
● In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
● We have training data $(x_1, y_1), \dots, (x_N, y_N)$. These are observations (examples, instances) of these measurements.
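The difference between the two problems can be sketched on the demand data. This is an illustration under stated assumptions, not a method prescribed by the slides: the regression uses an ordinary least-squares fit, the class labels "high"/"low" are hypothetical (made up by thresholding demand at 75), and the classifier is a simple 1-nearest-neighbour rule on standardized features.

```python
import numpy as np

# Training data (x_i, y_i), i = 1..N, with x = (income, price).
X = np.array([[1000, 5], [600, 7], [1200, 6], [500, 6], [300, 8]], dtype=float)
y_reg = np.array([100, 75, 80, 70, 50], dtype=float)  # quantitative response

# Regression: least-squares fit of y ~ b0 + b1*income + b2*price.
A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y_reg, rcond=None)
y_hat = A @ coef                           # fitted demand values

# Classification: Y takes values in a finite, unordered set.
# Hypothetical labels, invented for illustration only.
y_cls = np.where(y_reg >= 75, "high", "low")

# Standardize features so that income does not dominate the distance.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma


def predict_1nn(x_new):
    """Return the label of the closest training sample (1-nearest neighbour)."""
    z = (np.asarray(x_new, dtype=float) - mu) / sigma
    distances = np.linalg.norm(Z - z, axis=1)
    return str(y_cls[np.argmin(distances)])


print(coef)                     # fitted intercept and slopes
print(predict_1nn([1100, 5]))   # class label for a new, unlabeled sample
```

In the regression case the prediction is a number on a continuous scale; in the classification case it is one of a fixed set of labels, so the notion of "error" differs (squared deviation versus misclassification).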
Supervised Learning: Training

[Figure: training stage. Inputs are Feature #1, Feature #2, Feature #3, …, Feature N together with the known labels; the labels are already KNOWN and are used to build the model.]
Supervised Learning: Prediction

[Figure: prediction stage. Inputs are Feature #1, Feature #2, Feature #3, …, Feature N; the labels are NOT KNOWN. The model built during training is used to produce the goal labels.]
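The two stages, building a model from known labels and then using it where labels are not known, can be sketched as a small fit/predict class. The class name `LeastSquaresModel`, the choice of a linear least-squares model, and the new feature row are all assumptions made for illustration.

```python
import numpy as np


class LeastSquaresModel:
    """Minimal linear model with separate training and prediction stages."""

    def fit(self, X, y):
        # Training: features AND labels are known; estimate the coefficients.
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        # Prediction: only features are known; apply the stored coefficients.
        A = np.column_stack([np.ones(len(X)), X])
        return A @ self.coef_


# Training stage: demand data with known labels.
X_train = np.array([[1000, 5], [600, 7], [1200, 6], [500, 6], [300, 8]], dtype=float)
y_train = np.array([100, 75, 80, 70, 50], dtype=float)
model = LeastSquaresModel().fit(X_train, y_train)

# Prediction stage: a hypothetical new sample whose label is not known.
X_new = np.array([[800, 6]], dtype=float)
y_pred = model.predict(X_new)
print(y_pred)
```

Separating `fit` from `predict` mirrors the two figures: the labels enter only in the first call, and the second call sees features alone.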
Types of Machine Learning

Unsupervised learning: no labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

● There is no outcome variable, just a set of predictors (features) measured on a set of samples.
● The objective is fuzzier than in supervised learning:
  ● find groups of samples that behave similarly;
  ● find features that behave similarly;
  ● find linear combinations of features with the most variation.
● It is difficult to know how well you are doing.
● It differs from supervised learning, but can be useful as a preprocessing step for supervised learning.
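As one concrete instance of "linear combinations of features with the most variation", the sketch below applies principal component analysis (PCA), computed through a singular value decomposition, to the two features of the demand data. PCA is named here as an assumption about what the bullet refers to; the slides do not name a method at this point. No labels are used.

```python
import numpy as np

# Features only (income, price); the demand column is deliberately left out.
X = np.array([[1000, 5], [600, 7], [1200, 6], [500, 6], [300, 8]], dtype=float)

# Standardize, because the two features are on very different scales.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardized data: rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
components = Vt                            # each row is a unit-length direction
explained_var = s**2 / (len(Z) - 1)        # variance along each direction
explained_ratio = explained_var / explained_var.sum()

# Scores: coordinates of every sample along the principal directions.
scores = Z @ Vt.T

print(components[0])       # weights of the first linear combination
print(explained_ratio)     # share of total variance per component
```

The first row of `components` gives the weights of the linear combination with the largest variance. The `scores` could then be passed to a supervised method, which is the preprocessing use mentioned above.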