1.1. SPLINE REGRESSION

The only difficulty is the poor conditioning of the truncated power basis, which will result in inaccuracies in the calculation of \hat{\beta}. It is for this reason that the B-spline basis was introduced. Using this basis, we re-formulate the regression model as

    y_j = \sum_{i=0}^{p+k} \beta_i B_{i,p}(x_j) + \varepsilon_j        (1.6)

or in vector-matrix form

    y = B\beta + \varepsilon

where the (j, i) element of B is B_{i,p}(x_j). The least-squares estimate of \beta is then

    \hat{\beta} = (B^T B)^{-1} B^T y.

The orthogonality of B-splines which are far enough apart results in a banded matrix B^T B, which has better conditioning properties than the matrix T^T T. The bandedness property actually allows for the use of more efficient numerical techniques in computing \hat{\beta}. Again, all of the usual regression techniques are available. The only drawback with this model is that the coefficients are uninterpretable, and the B-splines are a little less intuitive than the truncated power functions.

We have been assuming that the knots are known. In general, they are unknown, and they must be chosen. Badly chosen knots can result in bad approximations.
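The conditioning and bandedness claims above are easy to check numerically. The following Python sketch (the text itself uses R; this is an illustrative translation, not the book's code) builds the B-spline design matrix B via the Cox-de Boor recursion, compares the conditioning of B^T B with that of T^T T for the truncated power basis spanning the same spline space, and computes the least-squares fit \hat{\beta} = (B^T B)^{-1} B^T y. The knot locations and the simulated data are assumed for the example.

```python
import numpy as np

def bspline_basis(x, t, p):
    """Evaluate all B-spline basis functions B_{i,p} at the points x
    via the Cox-de Boor recursion.  `t` is a full knot vector with the
    boundary knots repeated p+1 times."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(t, dtype=float)
    m = len(t) - 1
    # degree 0: indicator of the half-open knot span [t_i, t_{i+1})
    B = np.zeros((len(x), m))
    for i in range(m):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # include the right endpoint in the last non-empty span
    last = max(i for i in range(m) if t[i] < t[i + 1])
    B[x == t[-1], last] = 1.0
    # Cox-de Boor recursion up to degree p
    for k in range(1, p + 1):
        Bnew = np.zeros((len(x), m - k))
        for i in range(m - k):
            left = (x - t[i]) / (t[i + k] - t[i]) * B[:, i] \
                if t[i + k] > t[i] else 0.0
            right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * B[:, i + 1] \
                if t[i + k + 1] > t[i + 1] else 0.0
            Bnew[:, i] = left + right
        B = Bnew
    return B  # shape (len(x), len(t) - p - 1)

# cubic spline (p = 3) with interior knots at 0.25, 0.5, 0.75 on [0, 1]
p = 3
interior = [0.25, 0.5, 0.75]
t = np.r_[[0.0] * (p + 1), interior, [1.0] * (p + 1)]
x = np.linspace(0.0, 1.0, 200)
B = bspline_basis(x, t, p)                 # B-spline design matrix

# truncated power basis T for the same spline space: 1, x, x^2, x^3, (x-xi)_+^3
T = np.column_stack([x ** j for j in range(p + 1)] +
                    [np.clip(x - xi, 0.0, None) ** p for xi in interior])

G = B.T @ B                                # banded: G[i, j] = 0 when |i - j| > p
cond_B = np.linalg.cond(G)
cond_T = np.linalg.cond(T.T @ T)
print(f"cond(B'B) = {cond_B:.1e}, cond(T'T) = {cond_T:.1e}")

# least-squares fit to simulated noisy data
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(len(x))
beta_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ beta_hat
```

Because basis functions more than p indices apart have disjoint supports, B^T B is exactly banded, and a banded Cholesky solver could be used in place of the generic least-squares call above.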
Because the spline regression problem can be formulated as an ordinary regression problem with a transformed predictor, it is possible to apply variable selection techniques such as backward selection to choose a set of knots. The usual approach is to start with a set of knots located at a subset of the order statistics of the predictor. Then backward selection is applied, using the truncated power basis form of the model. Each time a basis function is eliminated, the corresponding knot is eliminated. The method has drawbacks, notably the ill-conditioning of the basis as mentioned earlier.

Figure 1.3 exhibits an example of a least-squares spline with automatically generated knots, applied to a data set consisting of titanium measurements.³ A version of backward selection was used to generate these knots; the stopping rule used was similar to the Akaike Information Criterion (AIC) discussed in Chapter 6. Although this least-squares spline fit to these data is better than what could be obtained using polynomial regression, it is unsatisfactory in many ways: the flat regions are not modelled smoothly enough, and the peak is cut off.

³To obtain Figure 1.3, type

    attach(titanium)
    y.lm <- lm(g ~ bs(temperature, knots=c(755, 835, 905, 975),
                      Boundary.knots=c(550, 1100)))
    plot(titanium)
    lines(temperature, predict(y.lm))
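As a rough illustration of the knot-selection procedure just described, here is a Python sketch of greedy backward deletion of knots using the truncated power basis and an AIC-style stopping rule. The synthetic data, the candidate knot grid, and the exact form of the criterion are assumptions made for the example; the titanium data and the book's precise stopping rule are not reproduced here.

```python
import numpy as np

def truncated_power_design(x, knots, p=3):
    """Design matrix for the truncated power basis with given interior knots."""
    cols = [x ** j for j in range(p + 1)]
    cols += [np.clip(x - k, 0.0, None) ** p for k in knots]
    return np.column_stack(cols)

def aic(rss, n, n_par):
    # AIC-style criterion for a Gaussian regression model (up to constants)
    return n * np.log(rss / n) + 2 * n_par

def rss_for(x, y, knots, p=3):
    X = truncated_power_design(x, knots, p)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r), X.shape[1]

def backward_knot_selection(x, y, knots, p=3):
    """Greedily delete one knot at a time, keeping the deletion that most
    improves the criterion; stop when no single deletion improves it."""
    knots = list(knots)
    n = len(x)
    rss, n_par = rss_for(x, y, knots, p)
    best = aic(rss, n, n_par)
    while knots:
        trials = [knots[:i] + knots[i + 1:] for i in range(len(knots))]
        scores = []
        for tr in trials:
            rss, n_par = rss_for(x, y, tr, p)
            scores.append(aic(rss, n, n_par))
        j = int(np.argmin(scores))
        if scores[j] >= best:
            break                       # no deletion improves the criterion
        best, knots = scores[j], trials[j]
    return knots, best

# synthetic data standing in for the titanium measurements
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 150))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(len(x))

# candidate knots at a subset of the order statistics of the predictor
candidates = list(np.quantile(x, np.linspace(0.1, 0.9, 9)))
selected, crit = backward_knot_selection(x, y, candidates)
print(f"kept {len(selected)} of {len(candidates)} candidate knots")
```

Each deletion removes one truncated power function, and hence one knot, mirroring the correspondence between basis functions and knots noted in the text; the ill-conditioning drawback applies here too, since every refit uses the truncated power design matrix.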