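Renmin University of China: Nonparametric Statistics (非参数统计) course teaching resources (lecture notes, comprehensive edition), Chapter 8: Nonparametric Regression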

2016/12/21

Chapter 8: Nonparametric Regression
Reference: Wang Xing (2014), 非参数统计 (Nonparametric Statistics), chap. 8
Wang Xing. Office phone: 86-10-82500167. Email: wangxingwisdom@126.com

Outline
• Kernel smoothing regression
• Local polynomial regression
• Robust regression
• *K-nearest-neighbor regression
• *Orthogonal series regression
• *B-Spline

Parametric & partial parametric

1. Nonparametric regression
• The aim of a regression analysis is to produce a reasonable approximation to the unknown response function m, where for n data points $(X_i, Y_i)$ the relationship can be modeled as

$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \dots, n \qquad (1)$$

• Unlike the parametric approach, where the function m is fully described by a finite set of parameters, nonparametric modeling accommodates a very flexible form of the regression curve (a highly adaptable form of regression).

A second version of the model, indexed by t, also appears on the slide. Its symbols are partly lost in extraction; what remains is consistent with the heteroscedastic form $Y_t = m(X_t) + \sigma(X_t)\varepsilon_t$.

Motivation
• It provides a versatile method of exploring a general relationship between variables and can be used to test for nonlinearity. It offers a richer view of how variables are related, including nonlinear structure.
• It gives predictions of observations yet to be made without reference to a fixed parametric model.
• It provides a tool for finding spurious observations by studying the influence of isolated points.
• It constitutes a flexible method of substituting for missing values or interpolating between adjacent X-values.

Basic principle of smoothing regression
• A reasonable approximation to the regression curve m(x) is the mean of the response variables near a point x. This local averaging procedure can be defined as

$$\hat m(x) = n^{-1} \sum_{i=1}^{n} W_{ni}(x)\, Y_i \qquad (2)$$

Every smoothing method to be described is of the form (2).

Kernel smoothing describes the shape of the weight function $W_{ni}(x)$ by a density function K with a scale parameter that adjusts the size and the form of the weights near x. The kernel K is a continuous, bounded and symmetric real function which integrates to 1. The weights are

$$W_{hi}(x) = K_h(x - X_i) / \hat f_h(x) \qquad (3)$$

where $\hat f_h(x) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i)$ and $K_h(u) = h^{-1} K(u/h)$.
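The local average (2) with kernel weights (3) can be sketched in a few lines of R. This is only an illustration under stated assumptions: the function names (`epan`, `kernel_h`, `f_hat`, `kernel_weights`, `local_average`) are hypothetical, the Epanechnikov kernel is one common choice of K, and the data are simulated rather than taken from the course.

```r
# Epanechnikov kernel: a symmetric, bounded density on [-1, 1] (assumed choice of K)
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Scaled kernel K_h(u) = K(u / h) / h
kernel_h <- function(u, h, K = epan) K(u / h) / h

# Kernel density estimate f_h(x) = n^{-1} sum_i K_h(x - X_i)
f_hat <- function(x, X, h) mean(kernel_h(x - X, h))

# Kernel weights (3): W_hi(x) = K_h(x - X_i) / f_h(x)
kernel_weights <- function(x, X, h) kernel_h(x - X, h) / f_hat(x, X, h)

# Local average (2): m_hat(x) = n^{-1} sum_i W_hi(x) Y_i
local_average <- function(x, X, Y, h) mean(kernel_weights(x, X, h) * Y)

set.seed(1)                              # simulated data, for illustration only
X <- sort(runif(200, 0, 3))
Y <- sin(2 * X) + rnorm(200, sd = 0.3)

w <- kernel_weights(1.5, X, h = 0.3)
mean(w)                                  # the weights average to 1 by construction
local_average(1.5, X, Y, h = 0.3)        # estimate of m(1.5) = sin(3)
```

Because the weights are divided by $\hat f_h(x)$, they always average to one, so (2) is a genuine weighted mean of the responses near x.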

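Kernel Smoothing
• The Nadaraya-Watson estimator is defined by

$$\hat m_h(x) = \frac{n^{-1} \sum_{i=1}^{n} K_h(x - X_i)\, Y_i}{n^{-1} \sum_{i=1}^{n} K_h(x - X_i)} \qquad (4)$$

Mean squared error: $d_M(x, h) = E[\hat m_h(x) - m(x)]^2$. As $n \to \infty$, $h \to 0$, $nh \to \infty$, we have the following result:

$$d_M(x, h) \approx (nh)^{-1} \sigma^2 c_K + h^4 d_K^2 [m''(x)]^2 / 4 \qquad (5)$$

where $\sigma^2 = \mathrm{var}(\varepsilon_i)$, $c_K = \int K^2(u)\,du$, $d_K = \int u^2 K(u)\,du$. The first term is the variance and the second is the squared bias, so as h increases the bias increases while the variance decreases.

Figure 2. The Epanechnikov kernel $K(u) = 0.75(1 - u^2)\, I(|u| \le 1)$.

Figure 3. The effective kernel weights $K_h(x - \cdot) / \hat f_h(x)$ for the food versus net income data set, at x = 1 and x = 2.5, for h = 0.1 (label 1), h = 0.2 (label 2), h = 0.3 (label 3), with the Epanechnikov kernel.

The amount of averaging is controlled by a smoothing parameter. The choice of smoothing parameter is a balance between bias and variance. In the N-W estimator the choice of kernel has very little effect, while the bandwidth has a comparatively large effect. (Figure: how the fitted pattern changes as the bandwidth varies.)

Local Regression
• The local regression method:
 Around each target point $x_0$, take the neighborhood that contains the fraction s = k/n of the points closest to $x_0$.
 Assign a weight $K_{i0}$ to each point in the neighborhood according to its distance from $x_0$; points outside the neighborhood get weight 0.
 Fit by weighted least squares, so that the estimated parameters satisfy $\min_{\beta_0, \beta_1} \sum_{i=1}^{n} K_{i0} (y_i - \beta_0 - \beta_1 x_i)^2$.
 Combine the fits at all target points to obtain the fitted prediction model.
 When there are many predictors, consider running the local regression on a selected subset of them.
 Keep the dimension at about 3 or 4 at most; in higher dimensions the stability of the model is easily limited by the sparsity of the training set.

2. Local polynomial regression

Recall the standard nonparametric model:

$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \dots, n \qquad (1)$$

Near a point $x_0$, a Taylor expansion gives

$$m(x) = m(x_0) + m'(x_0)(x - x_0) + \frac{m''(x_0)}{2!}(x - x_0)^2 + \dots + \frac{m^{(p)}(x_0)}{p!}(x - x_0)^p + O\big((x - x_0)^{p+1}\big)$$

Fit a local polynomial near the point to be estimated:

$$\min_{\beta} \sum_{t=1}^{n} \Big\{ Y_t - \sum_{j=0}^{p} \beta_j (X_t - x_0)^j \Big\}^2 K_h(X_t - x_0)$$

The local polynomial fit in matrix form is

$$\min_{\beta} \, (y - X\beta)^T W (y - X\beta)$$

To implement local polynomial estimation we need to choose the order p of the polynomial, the bandwidth h, and the kernel function K. These choices are of course interrelated. When $h = \infty$, the local polynomial fit becomes a global polynomial fit, and the order p determines the complexity of the model.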
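The weighted least-squares problem in matrix form has the familiar closed-form solution $\hat\beta = (X^T W X)^{-1} X^T W y$, with $\hat\beta_0$ estimating $m(x_0)$. A minimal R sketch follows, assuming the Epanechnikov kernel; `local_poly` and its arguments are hypothetical names, and the data are simulated for illustration.

```r
# Epanechnikov kernel K(u) = 0.75 (1 - u^2) I(|u| <= 1)
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Local polynomial fit of order p at a single point x0.
# Returns beta_0, ..., beta_p; beta_0 estimates m(x0) and
# factorial(j) * beta_j estimates the j-th derivative of m at x0.
local_poly <- function(x0, X, Y, h, p = 1) {
  w  <- epan((X - x0) / h) / h            # kernel weights K_h(X_t - x0)
  D  <- outer(X - x0, 0:p, `^`)           # design matrix with columns (X_t - x0)^j
  WD <- D * w                             # each row of D scaled by its weight
  beta <- solve(crossprod(D, WD), crossprod(WD, Y))   # (D'WD)^{-1} D'WY
  drop(beta)
}

set.seed(2)                               # simulated data, for illustration only
X <- runif(300, 0, 3)
Y <- sin(2 * X) + rnorm(300, sd = 0.3)

# Local linear estimate of m on a grid of target points
grid <- seq(0.2, 2.8, by = 0.1)
fit  <- sapply(grid, function(x0) local_poly(x0, X, Y, h = 0.4, p = 1)[1])
```

With p = 0 the same function reduces to the Nadaraya-Watson estimator (4), since the weighted least-squares constant is just the kernel-weighted mean of the responses.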

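Unlike in parametric models, the complexity of a local polynomial fit is controlled by the bandwidth h. The order p is usually small, so the choice of p becomes less important. If the goal is to estimate $m^{(\nu)}$, then when $p - \nu$ is odd the local polynomial fit automatically corrects the boundary bias. Furthermore, when $p - \nu$ is odd, the order-p fit contains one extra parameter compared with the order $p - 1$ fit, yet it does not increase the variance of the estimate of $m^{(\nu)}$. This extra parameter creates an opportunity to reduce the bias, especially in the boundary regions. On the other hand, the choice of the bandwidth h plays an important role in polynomial fitting. Too large a bandwidth causes oversmoothing and produces excessive modeling bias, while too small a bandwidth leads to undersmoothing and gives noisy estimates.

(Figure: results of local regression with different window widths.)

3. Robust regression: LOWESS (locally weighted scatterplot smoother)
• Basic idea: local linear estimation combined with robust weights in the smoothing step (observations with large residuals have their weights reduced).

$$\mathrm{MAD} = \mathrm{median}\big(|r_i - \mathrm{median}(r_i)|\big)$$

    #Step 1
    #Defining the window width
    # TIME and LIBERAL are assumed to be numeric vectors already in the workspace.
    plot(TIME, LIBERAL, xlab="Time (in days)", ylab="Liberal Support", type='n', main="Defining the Window Width")
    # The statements between `ord` and the first `which.diff]` were lost in extraction.
    # The lines marked "reconstructed" are inferred from how the variables are used below.
    # focal.index and n.neighbors are hypothetical placeholders: the original values
    # are not recoverable and must be set before running.
    ord <- order(TIME)                          # reconstructed
    time <- TIME[ord]                           # reconstructed
    Lib <- LIBERAL[ord]                         # reconstructed
    x0 <- time[focal.index]                     # reconstructed: focal point
    diffs <- abs(time - x0)                     # reconstructed: distances to the focal point
    which.diff <- sort(diffs)[n.neighbors]      # reconstructed: half-width of the window
    # Points outside the window in gray, points inside as open circles
    points(time[diffs > which.diff], Lib[diffs > which.diff], pch=16, cex=2, col=gray(.75))
    points(time[diffs <= which.diff], Lib[diffs <= which.diff], cex=2)
    x.n <- time[diffs <= which.diff]
    y.n <- Lib[diffs <= which.diff]
    text(locator(1), "Window Width")            # interactive: click to place the label

    #Step 2
    #Applying the Tricube Weight
    #Tricube function
    tricube <- function(z) {
      ifelse(abs(z) < 1, (1 - (abs(z))^3)^3, 0)
    }
    #Bisquare weight
    bisquare <- function(z) {
      ifelse(abs(z) < 1, (1 - (abs(z))^2)^2, 0)
    }
    plot(range(TIME), c(0,1), xlab="Time (in days)", ylab="Tricube Weight", type='n', main="The Tricube Weight")
    abline(v=c(x0-which.diff, x0+which.diff), lty=2)
    abline(v=x0)
    xwts <- seq(x0-which.diff, x0+which.diff, len=250)
    lines(xwts, tricube((xwts-x0)/which.diff), lty=1, lwd=2)
    points(x.n, tricube((x.n - x0)/which.diff), cex=2)

    #Step 3
    #The local polynomial
    plot(TIME, LIBERAL, xlab="Time (in days)", ylab="Liberal Support", type='n', main="Local Linear Regression")
    abline(v=c(x0-which.diff, x0+which.diff), lty=2)
    abline(v=x0)
    points(x.n, y.n, cex=2)
    # Weighted least-squares line inside the window, with tricube weights
    mod <- lm(y.n ~ x.n, weights=tricube((x.n-x0)/which.diff))
    # The original called reg.line(), which is not in base R; abline() draws the
    # same fitted line, extended across the whole plot.
    abline(mod, lwd=2, col=1)
    points(x0, predict(mod, data.frame(x.n=x0)), pch=16, cex=1.8)
    text(locator(1), "Fitted Value of Y at Focal X")   # interactive: click to place the label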
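The robust reweighting idea can be sketched end to end as follows. This is a hedged illustration, not the course's implementation: the helper names (`wls_fit_at`, `local_linear_pass`, `robust_lowess`) are hypothetical, the data are simulated, and the cutoff of 6 MADs in the bisquare weights is an assumption borrowed from the usual LOWESS convention, applied here to the MAD as defined above.

```r
tricube  <- function(z) ifelse(abs(z) < 1, (1 - abs(z)^3)^3, 0)
bisquare <- function(z) ifelse(abs(z) < 1, (1 - z^2)^2, 0)

# Weighted least-squares line through (x, y), evaluated at x0
wls_fit_at <- function(x0, x, y, w) {
  xc  <- x - x0
  mx  <- sum(w * xc) / sum(w)
  my  <- sum(w * y) / sum(w)
  sxx <- sum(w * (xc - mx)^2)
  slope <- if (sxx > 0) sum(w * (xc - mx) * (y - my)) / sxx else 0
  my - slope * mx                          # value of the fitted line at x0
}

# One pass of local linear fitting at every observed x.
# rw holds robustness weights that multiply the tricube weights.
local_linear_pass <- function(x, y, span, rw = rep(1, length(x))) {
  k <- ceiling(span * length(x))           # number of points in each window
  sapply(seq_along(x), function(i) {
    d <- abs(x - x[i])
    half_width <- sort(d)[k]               # distance to the k-th nearest point
    w <- tricube(d / half_width) * rw
    wls_fit_at(x[i], x, y, w)
  })
}

# Robust fit: downweight points with large residuals, then refit
robust_lowess <- function(x, y, span = 0.5, iterations = 2) {
  fitted <- local_linear_pass(x, y, span)
  for (it in seq_len(iterations)) {
    r   <- y - fitted
    mad <- median(abs(r - median(r)))      # MAD of the residuals
    if (mad == 0) break                    # perfect fit, nothing to reweight
    rw  <- bisquare(r / (6 * mad))         # assumed cutoff: 6 MADs
    fitted <- local_linear_pass(x, y, span, rw)
  }
  fitted
}

set.seed(3)                                # simulated data with one gross outlier
x <- seq(0, 10, length.out = 100)
truth <- 2 + 0.5 * x
y <- truth + rnorm(100, sd = 0.2)
y[50] <- y[50] + 20

plain  <- local_linear_pass(x, y, span = 0.5)
robust <- robust_lowess(x, y, span = 0.5, iterations = 2)
c(plain = plain[50] - truth[50], robust = robust[50] - truth[50])
```

In practice the built-in `lowess()` and `loess()` functions in R provide tested implementations; the sketch is only meant to make the reweighting step visible.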
