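Fudan University: course teaching resources (lecture notes) for "Applications of Mathematical Statistics in Chemistry", Chapter 7: Fundamentals of Machine Learning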

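§7.0 What is machine learning
§7.1 Correlation and the correlation coefficient
§7.2 Significance test of the correlation coefficient
§7.3 Linear and nonlinear regression
§7.4 Confidence interval for the simple linear regression line
§7.5 Multivariate quadratic regression in MATLAB
§7.6 Goodness-of-fit test for the regression equation
§7.7 Logistic Regression (Intuition)
§7.8 Support Vector Machines
§7.9 Training, Validation, and Test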

Document format: PDF; pages: 96; file size: 9.55 MB

Applications of Mathematical Statistics in Chemistry, Chapter 7: Fundamentals of Machine Learning

Slides prepared by Li Zhenhua (李振华).

§7.0 What is machine learning

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, T. (1997). Machine Learning. McGraw Hill.)

That is, the program should:
- improve over task T,
- with respect to performance measure P,
- based on experience E.

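What machine learning can do: pattern recognition, computer vision, natural language processing, big data, data mining, statistical learning, speech recognition, autonomous driving.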

Basic concepts of machine learning

- Data set
- Sample: also called instance or example
- Feature: also called attribute or independent variable
- Value of a feature: attribute value
- Attribute space, sample space
- Feature vector
- Output, target: also called response or dependent variable

Example: demand for a commodity versus consumers' average income and the commodity price.

Variable       Role                 Values for the 5 samples
Demand (y)     output (y)            100     75     80     70     50
Income (x1)    input, feature 1     1000    600   1200    500    300
Price (x2)     input, feature 2        5      7      6      6      8

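Matrix representation of the data set

Matrix: $\mathbf{X} \in \mathbb{R}^{n \times m}$. For the demand data, with columns income and price:

\[
\mathbf{X} = \begin{bmatrix} 1000 & 5 \\ 600 & 7 \\ 1200 & 6 \\ 500 & 6 \\ 300 & 8 \end{bmatrix} \in \mathbb{R}^{5 \times 2}
\]

Vector: an $n \times 1$ matrix, an element of $\mathbb{R}^{n}$:

\[
\mathbf{x}_1 = \begin{bmatrix} 1000 \\ 600 \\ 1200 \\ 500 \\ 300 \end{bmatrix}, \qquad
\mathbf{x}_2 = \begin{bmatrix} 5 \\ 7 \\ 6 \\ 6 \\ 8 \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} 100 \\ 75 \\ 80 \\ 70 \\ 50 \end{bmatrix}
\]

Matrix element: $X_{ij}$ is the element in row $i$, column $j$ of $\mathbf{X}$; $y_i$ is the $i$-th element of $\mathbf{y}$.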
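
The matrix notation maps directly onto array code. The following is a minimal sketch, assuming NumPy (the notes themselves do not prescribe a library) and using the values from the demand, income and price table; the helper name `element` is hypothetical and exists only to bridge the 1-based indices of the notes and NumPy's 0-based indexing.

```python
import numpy as np

# Feature matrix X: n = 5 samples (rows), m = 2 features (columns: income, price).
X = np.array(
    [
        [1000, 5],
        [600, 7],
        [1200, 6],
        [500, 6],
        [300, 8],
    ],
    dtype=float,
)

# Target vector y (demand): one entry per sample.
y = np.array([100, 75, 80, 70, 50], dtype=float)

# The feature vectors x1 (income) and x2 (price) are the columns of X.
x1 = X[:, 0]
x2 = X[:, 1]

n, m = X.shape


def element(matrix, i, j):
    """Return X_ij using the 1-based row/column indices of the notes.

    Hypothetical helper: NumPy itself counts from 0, so X_ij is matrix[i - 1, j - 1].
    """
    return matrix[i - 1, j - 1]


print("X has shape", X.shape, "and y has shape", y.shape)
print("X_31 =", element(X, 3, 1))
```

Storing one sample per row and one feature per column is the layout most numerical libraries expect, so the same `X` and `y` can be handed to a fitting routine unchanged.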

Types of machine learning

- Supervised learning: the computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
- Semi-supervised learning: the computer is given only an incomplete training signal: a training set with some (often many) of the target outputs missing.
- Active learning: the computer can only obtain training labels for a limited set of instances (based on a budget), and also has to optimize its choice of objects to acquire labels for. When used interactively, these can be presented to the user for labeling.
- Reinforcement learning: training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.

Starting point:
- Response measurement Y.
- Vector of p predictor measurements X.
- In the regression problem, Y is quantitative (e.g. price, blood pressure).
- In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
- We have training data (x_1, y_1), …, (x_N, y_N). These are observations (examples, instances) of these measurements.
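
The difference between the two problems lies only in the type of Y, which a small sketch can make concrete. It assumes NumPy and reuses the demand data as training pairs. The class labels ("high"/"low") are invented here by thresholding demand at 75 and are not part of the notes. The predictor is a 1-nearest-neighbour rule, chosen for illustration only because it handles both kinds of response without modification.

```python
import numpy as np

# Training data (x_i, y_i): predictors are income and price, N = 5 observations.
X_train = np.array(
    [[1000, 5], [600, 7], [1200, 6], [500, 6], [300, 8]], dtype=float
)

# Regression: the response is quantitative (demand).
y_quantitative = np.array([100, 75, 80, 70, 50], dtype=float)

# Classification: the response comes from a finite, unordered set.
# ASSUMPTION: labels invented for illustration by thresholding demand at 75.
y_categorical = np.where(y_quantitative >= 75, "high", "low")


def nearest_neighbor_predict(X, y, x_new):
    """Return the response of the training observation closest to x_new.

    Features are standardized first so that income (hundreds) does not
    dominate price (single digits) in the Euclidean distance.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    z_train = (X - mean) / std
    z_new = (np.asarray(x_new, dtype=float) - mean) / std
    distances = np.linalg.norm(z_train - z_new, axis=1)
    return y[np.argmin(distances)]


x_new = [1000, 6]  # hypothetical new observation: income 1000, price 6
predicted_demand = nearest_neighbor_predict(X_train, y_quantitative, x_new)
predicted_class = nearest_neighbor_predict(X_train, y_categorical, x_new)
print("regression output:", predicted_demand)
print("classification output:", predicted_class)
```

The same function returns a number in the regression case and a label in the classification case; only the response vector passed in changes.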

Supervised learning, training: the labels are already KNOWN.
Feature #1, Feature #2, Feature #3, …, Feature N, together with the known labels → build model.

Supervised learning, after training: the labels are NOT KNOWN.
Feature #1, Feature #2, Feature #3, …, Feature N → use the model built during training → goal: labels.
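
The two phases (build a model from labelled data, then apply it where labels are unknown) can be sketched as a pair of functions. This is an illustrative sketch, not the method of the notes: it assumes NumPy, picks an ordinary least-squares linear model as the "model", and the names `fit_linear_model` and `predict` and the new observation are hypothetical.

```python
import numpy as np


def fit_linear_model(X, y):
    """Training phase: least-squares fit of y ~ b0 + b1*x1 + ... + bm*xm."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X_new):
    """Prediction phase: apply the fitted coefficients to unlabelled features."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    design = np.column_stack([np.ones(len(X_new)), X_new])
    return design @ coef


# Training: features (income, price) with KNOWN labels (demand).
X_train = np.array(
    [[1000, 5], [600, 7], [1200, 6], [500, 6], [300, 8]], dtype=float
)
y_train = np.array([100, 75, 80, 70, 50], dtype=float)
coef = fit_linear_model(X_train, y_train)

# After training: features only, label NOT known.
x_new = [1000, 6]  # hypothetical new observation: income 1000, price 6
y_new = predict(coef, x_new)
print("fitted coefficients (b0, b1, b2):", coef)
print("predicted demand for the new observation:", y_new[0])
```

Keeping fitting and prediction as separate functions mirrors the slides: the labels enter only in `fit_linear_model`, while `predict` sees features alone.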

Types of machine learning

Unsupervised learning
- No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
- No outcome variable, just a set of predictors (features) measured on a set of samples.
- The objective is more fuzzy:
  - find groups of samples that behave similarly;
  - find features that behave similarly;
  - find linear combinations of features with the most variation.
- It is difficult to know how well you are doing.
- It is different from supervised learning, but can be useful as a preprocessing step for supervised learning.
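
Finding the linear combinations of features with the most variation is commonly done by principal component analysis (PCA). The sketch below assumes NumPy and computes PCA through a singular value decomposition of the standardized feature matrix. The notes do not name this technique or this route, so it should be read as one possible illustration. Only the features (income, price) are used; no labels are involved.

```python
import numpy as np

# Features only: income and price for the 5 samples. No outcome variable is used.
X = np.array(
    [[1000, 5], [600, 7], [1200, 6], [500, 6], [300, 8]], dtype=float
)

# Standardize each feature (zero mean, unit variance) so that income,
# measured in hundreds, does not dominate purely because of its scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: the rows of Vt are unit-length weight vectors
# defining the linear combinations, ordered from most to least variation.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
components = Vt

# Scores: the value of each linear combination for each sample.
scores = Z @ components.T

# Variance of each linear combination across the samples
# (population variance, matching the ddof=0 standardization above).
explained_variance = s**2 / len(Z)

print("weights of the first combination:", components[0])
print("variance captured by each combination:", explained_variance)
```

Because both standardized features have variance 1, the captured variances sum to 2, so the share of the first entry shows how much of the total variation a single combined feature would retain if used as a preprocessing step.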
