DENSITY ESTIMATION 593


“bandwidth.bcv”, while in R it is invoked by bw = “bw.bcv”.
Scott (1992, page 167) pointed out that $\lim_{h \to \infty} \mathrm{BCV}(h) = 0$ and hence he recommended that $h_{BCV}$ be taken as the largest local minimizer less than or equal to the oversmoothed bandwidth $h_{OS}$. On the other hand, Jones, Marron and Sheather (1996) recommend that $h_{BCV}$ be taken as the smallest local minimizer, since they claim it gives better empirical performance.

Scott and Terrell (1987) showed that

$$n^{1/10}\left(\frac{h_{BCV}}{h_{AMISE}} - 1\right)$$

has an asymptotic $N(0, \sigma^2_{BCV})$ distribution. A related result holds for least squares cross-validation, namely, that

$$n^{1/10}\left(\frac{h_{LSCV}}{h_{AMISE}} - 1\right)$$

has an asymptotic $N(0, \sigma^2_{LSCV})$ distribution (Hall and Marron, 1987a; Scott and Terrell, 1987). According to Wand and Jones (1995, page 80), the ratio of the two asymptotic variances for the Gaussian kernel is

$$\frac{\sigma^2_{LSCV}}{\sigma^2_{BCV}} \approx 15.7,$$

thus indicating that bandwidths obtained from least squares cross-validation are expected to be much more variable than those obtained from biased cross-validation.

3.3 Plug-in Methods

The slow rate of convergence of LSCV and BCV encouraged much research on faster converging methods. A popular approach, commonly called plug-in methods, is to replace the unknown quantity $R(f'')$ in the expression for $h_{AMISE}$ given by (3) with an estimate. The method is commonly thought to date back to Woodroofe (1970), who proposed it for estimating the density at a given point. Estimating $R(f'')$ by $R(\hat{f}''_g)$ requires the user to choose the bandwidth $g$ for this so-called pilot estimate. There are many ways this can be done. We next describe the “solve-the-equation” plug-in approach developed by Sheather and Jones (1991), since this method is widely recommended (e.g., Simonoff, 1996, page 77; Bowman and Azzalini, 1997, page 34; Venables and Ripley, 2002, page 129).

Different versions of the plug-in approach depend on the exact form of the estimate of $R(f'')$.
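The basic plug-in step described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel, for which $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$, and a pilot bandwidth $g$ supplied by the user; it is not the Sheather and Jones rule for choosing $g$. The function names `phi4`, `roughness_f2` and `plug_in_bandwidth` are hypothetical and introduced only for illustration. For a Gaussian pilot estimate, $R(\hat{f}''_g)$ has a closed form as a double sum over the data.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def phi4(u):
    """Fourth derivative of the standard normal density at u."""
    return (u**4 - 6.0 * u**2 + 3.0) * math.exp(-0.5 * u * u) / SQRT_2PI


def roughness_f2(data, g):
    """R(f_hat''_g): integral of the squared second derivative of a
    Gaussian-kernel density estimate with pilot bandwidth g.

    Closed form: n^-2 * sum_i sum_j phi4_s(X_i - X_j), with s = sqrt(2) * g
    and phi4_s(x) = s^-5 * phi4(x / s).
    """
    n = len(data)
    s = math.sqrt(2.0) * g
    total = sum(phi4((xi - xj) / s) for xi in data for xj in data)
    return total / (n * n * s**5)


def plug_in_bandwidth(data, g):
    """Replace R(f'') in the h_AMISE expression by R(f_hat''_g).

    Assumes a Gaussian kernel: R(K) = 1 / (2 sqrt(pi)), mu_2(K) = 1.
    """
    n = len(data)
    r_k = 1.0 / (2.0 * math.sqrt(math.pi))
    mu2 = 1.0
    return (r_k / (mu2**2 * roughness_f2(data, g))) ** 0.2 * n ** (-0.2)


if __name__ == "__main__":
    sample = [-1.2, -0.3, 0.1, 0.4, 1.5]  # made-up illustrative data
    pilot_g = 0.5                          # assumed pilot bandwidth
    print(plug_in_bandwidth(sample, pilot_g))
```

The resulting bandwidth depends on the pilot bandwidth `g`, which is why the choice of `g` distinguishes the different versions of the plug-in approach.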
The Sheather and Jones (1991) approach is based on writing $g$, the pilot bandwidth for the estimate $R(\hat{f}'')$, as a function of $h$, namely,

$$g(h) = C(K)\left[\frac{R(f'')}{R(f''')}\right]^{1/7} h^{5/7},$$

and estimating the resulting unknown functionals of $f$ using kernel density estimates with bandwidths based on normal rules of thumb. In this situation, the only unknown in the following equation is $h$:

$$h = \left[\frac{R(K)}{\mu_2(K)^2 R(\hat{f}''_{g(h)})}\right]^{1/5} n^{-1/5}.$$

The Sheather–Jones plug-in bandwidth $h_{SJ}$ is the solution to this equation. In S-PLUS, $h_{SJ}$ is invoked by width = “bandwidth.sj”, while in R it is invoked by bw = “bw.SJ”. In SAS PROC KDE, this method is called Sheather–Jones plug-in (METHOD = SJPI).

Under smoothness assumptions on the underlying density,

$$n^{5/14}\left(\frac{h_{SJ}}{h_{AMISE}} - 1\right)$$

has an asymptotic $N(0, \sigma^2_{SJ})$ distribution. Thus, the Sheather–Jones plug-in bandwidth has a relative convergence rate of order $n^{-5/14}$, which is much faster than the $n^{-1/10}$ rate of BCV. Most of the improvement is because BCV effectively uses the same bandwidth to estimate $R(f'')$ as it does to estimate $f$, while the Sheather–Jones plug-in approach uses different bandwidths. However, it is important to note that the Sheather–Jones plug-in approach assumes more smoothness of the underlying density than either LSCV or BCV.

Jones, Marron and Sheather (1996) found that for easy to estimate densities [i.e., those for which $R(f'')$ is relatively small], the distribution of $h_{SJ}$ tends to be centered near $h_{AMISE}$ and has much lower variability than the distribution of $h_{LSCV}$. For hard to estimate densities [i.e., those for which $|f''(x)|$ varies widely], they found that the distribution of $h_{SJ}$ tends to be centered at values larger than $h_{AMISE}$ (and thus oversmooths) and again has much lower variability than the distribution of $h_{LSCV}$.

A number of authors recommended that density estimates be drawn with more than one value of the bandwidth.
Scott (1992, page 161) advised looking at “a sequence of (density) estimates based on the sequence of smoothing parameters $h = h_{OS}/1.05^k$ for $k = 0, 1, 2, \ldots$,
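The sequence of smoothing parameters in the quotation above can be generated with a short sketch. The value of the oversmoothed bandwidth and the number of terms are assumptions made here for illustration, since the excerpt gives neither; the function name `bandwidth_sequence` is hypothetical.

```python
def bandwidth_sequence(h_os, num_terms, ratio=1.05):
    """Return h_os / ratio**k for k = 0, 1, ..., num_terms - 1.

    h_os is the oversmoothed bandwidth, supplied by the caller.
    num_terms is an assumed cutoff; the source does not state one.
    """
    return [h_os / ratio**k for k in range(num_terms)]


if __name__ == "__main__":
    # Assumed starting bandwidth of 1.0 and 10 terms, for illustration only.
    for k, h in enumerate(bandwidth_sequence(1.0, 10)):
        print(k, round(h, 4))
```

Each successive bandwidth is about 5% smaller than the previous one, so the estimates move gradually from the oversmoothed one toward rougher ones.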
