Statistical Learning Theory and Applications
Lecture 3: Regression Models
Instructor: Quan Wen
SCSE@UESTC, Fall 2021
Outline (Level 1)
1. A Case
2. Least Squares Method
3. From Linear to Nonlinear: Using Linear Model
4. How Regression Got Its Name
5. Probability Interpretation
6. Bias–Variance Dilemma
Topics:
• Basic theoretical concepts, properties, and calculations of regression analysis
• Derivation and calculation of least squares
• Probability interpretation of regression analysis
• Regression analysis of nonlinear functions
• Bias–variance dilemma of regression analysis

Key points and difficulties:
• Key points: derivation and calculation of least squares
• Difficulties: probability interpretation of regression analysis
1. A Case

Investigate the trend of housing prices, with the following data:

Year   Area (m²)   Price (10k$)
1999          70              6
2000          60              6
2001         120             20
2002         125             26
 ...         ...            ...

We usually expect to use such data to predict the future trend of house prices.
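As a rough illustration of fitting such data, the sketch below runs an ordinary least-squares fit of price against area for the four rows above; treating area as the only feature (and adding an intercept) is an assumption made purely for illustration:

```python
import numpy as np

# Housing data from the table above (assumed feature: area in m^2).
area = np.array([70.0, 60.0, 120.0, 125.0])
price = np.array([6.0, 6.0, 20.0, 26.0])  # in 10k$

# Design matrix with a constant column for the intercept.
X = np.column_stack([np.ones_like(area), area])

# Ordinary least squares fit: price ~ w0 + w1 * area.
w, *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"intercept = {w[0]:.3f}, slope = {w[1]:.3f}")

# Predict the price of a hypothetical 100 m^2 house.
print("predicted price for 100 m^2:", w[0] + w[1] * 100)
```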
Let $x = [x_1, x_2, \cdots, x_M]^T$ be a regressor, with each dimension representing a feature input. $d$ corresponds to an output of $x$. Their dependency can be expressed by a linear regression model as follows:
$$d = \sum_{i=1}^{M} w_i x_i + \varepsilon$$
1. $w_1, w_2, \cdots, w_M$: a set of fixed but unknown parameters.
2. $\varepsilon$: the expected error of the model. "Fixed" means that we assume the environment is stable and static.

Written in vector-matrix form:
$$d = w^T x + \varepsilon$$
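To make the model concrete, here is a minimal sketch that draws samples from $d = w^T x + \varepsilon$; the parameter values and the Gaussian form of $\varepsilon$ are illustrative assumptions, not part of the model definition:

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 3, 5                           # feature dimension, number of samples
w_true = np.array([2.0, -1.0, 0.5])   # fixed but unknown parameters (illustrative)

X = rng.normal(size=(N, M))           # N regressors x^1, ..., x^N as rows
eps = rng.normal(scale=0.1, size=N)   # error term (assumed Gaussian for illustration)

d = X @ w_true + eps                  # d^i = w^T x^i + eps_i for each sample
print(d)
```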
Outline (Level 2)
2. Least Squares Method
   • Numeric Approach
   • Analytic Approach
2. Least Squares Method
2.1. Numeric Approach

Assuming a training set $\Omega = \{(x^1, d^1), (x^2, d^2), \cdots, (x^N, d^N)\}$, define the following cost function:
$$J_\Omega(w) = \frac{1}{2} \sum_{i=1}^{N} \varepsilon_i^2(w) = \frac{1}{2} \sum_{i=1}^{N} \left(d^i - w^T x^i\right)^2$$

Through the gradient descent algorithm, we can obtain $w$:
$$w_{t+1} = w_t - \eta \frac{\partial}{\partial w} J_\Omega(w_t)$$
• $\eta$: step size (called the learning rate in machine learning)
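A minimal sketch of this numeric approach, assuming the batch gradient $\frac{\partial}{\partial w} J_\Omega(w) = \sum_{i=1}^{N} (w^T x^i - d^i)\, x^i$ (the single-sample form is derived on the next slide) and an illustrative fixed step size $\eta$:

```python
import numpy as np

def least_squares_gd(X, d, eta=0.01, steps=1000):
    """Minimize J(w) = 0.5 * sum_i (d_i - w^T x_i)^2 by gradient descent.

    X: (N, M) matrix whose rows are the regressors x^i.
    d: (N,) vector of outputs d^i.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - d      # (w^T x^i - d^i) for all i
        grad = X.T @ residual     # sum_i (w^T x^i - d^i) * x^i
        w -= eta * grad           # w_{t+1} = w_t - eta * dJ/dw
    return w

# Toy check on data generated from a known linear model.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
w_true = np.array([1.5, -0.7])
d = X @ w_true
print(least_squares_gd(X, d))  # should approach w_true
```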
The gradient descent algorithm is based on the following observation: if the real-valued function $F(x)$ is differentiable and defined at $a$, then $F(x)$ descends fastest along $-\nabla F(a)$, the direction opposite to the gradient at $a$.

If $\Omega$ has only one sample:
$$\frac{\partial}{\partial w} J_\Omega(w) = \frac{1}{2} \times \frac{\partial}{\partial w}\left(w^T x - d\right) \times 2 \times \left(w^T x - d\right) = x\left(w^T x - d\right) = \underbrace{\left(w^T x - d\right)}_{\text{scalar}}\, x$$
using the denominator layout (Hessian formulation) for the gradient.
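As a quick sanity check on this derivation, the sketch below compares the closed-form single-sample gradient $(w^T x - d)\,x$ against a numerical finite-difference gradient of $J(w) = \frac{1}{2}(d - w^T x)^2$; all values here are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=4)
d = 1.0
w = rng.normal(size=4)

# Single-sample cost J(w) = 0.5 * (d - w^T x)^2.
J = lambda w: 0.5 * (d - w @ x) ** 2

# Closed-form gradient from the derivation above.
grad_analytic = (w @ x - d) * x

# Central finite-difference approximation, coordinate by coordinate.
h = 1e-6
grad_numeric = np.array([
    (J(w + h * e) - J(w - h * e)) / (2 * h)
    for e in np.eye(4)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # expect True
```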