590 S. J. SHEATHER

In SAS, PROC KDE produces kernel density estimates based on the usual Gaussian kernel (i.e., the Gaussian density with mean 0 and standard deviation 1), whereas S-PLUS has a function density which produces kernel density estimates with a default kernel, the Gaussian density with mean 0 and standard deviation 1/4. Thus, the bandwidths described in what follows must be multiplied by 4 when used in S-PLUS. The program R also has a function density which produces kernel density estimates with a default kernel, the Gaussian density with mean 0 and standard deviation 1.

3.1 Rules of Thumb

The computationally simplest method for choosing a global bandwidth $h$ is based on replacing $R(f'')$, the unknown part of $h_{\mathrm{AMISE}}$, by its value for a parametric family expressed as a multiple of a scale parameter, which is then estimated from the data. The method seems to date back to Deheuvels (1977) and Scott (1979), who each proposed it for histograms. However, the method was popularized for kernel density estimates by Silverman (1986, Section 3.2), who used the normal distribution as the parametric family.

Let $\sigma$ and IQR denote the standard deviation and interquartile range of $X$, respectively. Take the kernel $K$ to be the usual Gaussian kernel. Assuming that the underlying distribution is normal, Silverman (1986, pages 45 and 47) showed that (3) reduces to

$$h_{\mathrm{AMISE}_{\mathrm{NORMAL}}} = 1.06\,\sigma\, n^{-1/5} \quad\text{and}\quad h_{\mathrm{AMISE}_{\mathrm{NORMAL}}} = 0.79\,\mathrm{IQR}\, n^{-1/5}.$$

Jones, Marron and Sheather (1996) studied the Monte Carlo performance of the normal reference bandwidth based on the standard deviation, that is, they considered

$$h_{\mathrm{SNR}} = 1.06\, S\, n^{-1/5},$$

where $S$ is the sample standard deviation.
In SAS PROC KDE, this method is called the simple normal reference (METHOD = SNR). Jones, Marron and Sheather (1996) found that $h_{\mathrm{SNR}}$ had a mean that was usually unacceptably large and thus often produced oversmoothed density estimates.

Furthermore, Silverman (1986, page 48) recommended reducing the factor 1.06 in the previous equation to 0.9 in an attempt not to miss bimodality and using the smaller of two scale estimates. This rule is commonly used in practice and it is often referred to as Silverman's reference bandwidth or Silverman's rule of thumb. It is given by

$$h_{\mathrm{SROT}} = 0.9\, A\, n^{-1/5},$$

where $A = \min\{\text{sample standard deviation},\ (\text{sample interquartile range})/1.34\}$. In SAS PROC KDE, this method is called Silverman's rule of thumb (METHOD = SROT). In R, Silverman's bandwidth is invoked by bw = "nrd0" (it is computed by the function bw.nrd0). In S-PLUS, Silverman's bandwidth with constant 1.06 rather than 0.9 is invoked by width = "nrd".

Terrell and Scott (1985) and Terrell (1990) developed a bandwidth selection method based on the maximal smoothing principle so as to produce oversmoothed density estimates. The method is based on choosing the "largest degree of smoothing compatible with the estimated scale of the density" (Terrell, 1990, page 470). Looking back at (3), this amounts to finding, for a given value of scale, the density $f$ with the smallest value of $R(f'')$. Taking the variance $\sigma^2$ as the scale parameter, Terrell (1990, page 471) found that the beta(4, 4) family of distributions with variance $\sigma^2$ minimizes $R(f'')$. For the standard Gaussian kernel this leads to the oversmoothed bandwidth

$$h_{\mathrm{OS}} = 1.144\, S\, n^{-1/5}.$$

In SAS PROC KDE, this method is called oversmoothed (METHOD = OS).

Comparing the oversmoothed bandwidth with the normal reference bandwidth $h_{\mathrm{SNR}}$, we see that the oversmoothed bandwidth is about 1.08 times as large ($1.144/1.06 \approx 1.08$). Thus, in practice there is often very little visual difference between density estimates produced using either the oversmoothed bandwidth or the normal reference bandwidth.
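The three rules of thumb in this section differ only in the constant and the scale estimate, which a short sketch can make concrete. The Python below is an illustration under stated assumptions, not the SAS, R or S-PLUS implementation: the function names h_snr, h_srot, h_os and sample_iqr are hypothetical, the quartiles use linear interpolation between order statistics (one of several common conventions), and the data are simulated purely for demonstration.

```python
import math
import random
import statistics


def sample_iqr(x):
    # Assumed convention: quartiles by linear interpolation between order
    # statistics; other quantile definitions give slightly different values.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q3 - q1


def h_snr(x):
    """Simple normal reference bandwidth: 1.06 * S * n^(-1/5)."""
    return 1.06 * statistics.stdev(x) * len(x) ** (-1 / 5)


def h_srot(x):
    """Silverman's rule of thumb: 0.9 * A * n^(-1/5), A = min(S, IQR/1.34)."""
    a = min(statistics.stdev(x), sample_iqr(x) / 1.34)
    return 0.9 * a * len(x) ** (-1 / 5)


def h_os(x):
    """Oversmoothed bandwidth: 1.144 * S * n^(-1/5)."""
    return 1.144 * statistics.stdev(x) * len(x) ** (-1 / 5)


# Simulated standard normal sample, for illustration only.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
print("h_SNR :", h_snr(data))
print("h_SROT:", h_srot(data))
print("h_OS  :", h_os(data))
```

Because $h_{\mathrm{OS}}$ and $h_{\mathrm{SNR}}$ share the scale estimate $S$, their ratio is 1.144/1.06 for any sample, whereas $h_{\mathrm{SROT}}$ can be much smaller when the interquartile range is small relative to $S$. Per the text, each value would need to be multiplied by 4 for use with the S-PLUS default kernel.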
3.2 Cross-Validation Methods

A measure of the closeness of $\hat f$ and $f$ for a given sample is the integrated squared error (ISE), which is given by

$$\begin{aligned}
\mathrm{ISE}(\hat f_h) &= \int \bigl(\hat f_h(y) - f(y)\bigr)^2\,dy \\
&= \int \bigl(\hat f_h(y)\bigr)^2\,dy \;-\; 2\int \hat f_h(y)\,f(y)\,dy \;+\; \int f^2(y)\,dy.
\end{aligned}$$

Notice that the last term on the right-hand side of the previous expression does not involve $h$.
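The expansion of the ISE into three terms can be checked numerically in a case where the true density is known. The following Python sketch assumes a standard normal $f$, a Gaussian-kernel estimate, and trapezoidal integration on $[-8, 8]$. All function names (gauss_pdf, kde, integrate, ise_terms) and the simulated sample are illustrative choices, not part of the paper.

```python
import math
import random


def gauss_pdf(u):
    """Standard normal density; used as both the kernel and the true f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(y, sample, h):
    """Gaussian kernel density estimate at y with bandwidth h."""
    return sum(gauss_pdf((y - xi) / h) for xi in sample) / (len(sample) * h)


def integrate(g, lo=-8.0, hi=8.0, steps=1600):
    """Composite trapezoidal rule for g on [lo, hi]."""
    dy = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, steps):
        total += g(lo + k * dy)
    return total * dy


def ise_terms(sample, h, f=gauss_pdf):
    """Return (int fhat^2, int fhat*f, int f^2) for bandwidth h."""
    t1 = integrate(lambda y: kde(y, sample, h) ** 2)
    t2 = integrate(lambda y: kde(y, sample, h) * f(y))
    t3 = integrate(lambda y: f(y) ** 2)
    return t1, t2, t3


# Simulated sample, for illustration only.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
for h in (0.2, 0.5, 1.0):
    t1, t2, t3 = ise_terms(sample, h)
    direct = integrate(lambda y: (kde(y, sample, h) - gauss_pdf(y)) ** 2)
    print(f"h={h}: direct={direct:.6f} expanded={t1 - 2 * t2 + t3:.6f} last={t3:.6f}")
```

Across bandwidths the first two terms change while the last term stays fixed (for a standard normal $f$ it equals $1/(2\sqrt{\pi})$), so minimizing the ISE over $h$ is equivalent to minimizing the first two terms alone.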