Bowman (1984) proposed choosing the bandwidth as the value of $h$ that minimizes the estimate of the two other terms in the last expression, namely
\[
\frac{1}{n}\sum_{i=1}^{n}\int \bigl(\hat f_{-i}(y)\bigr)^{2}\,dy \;-\; \frac{2}{n}\sum_{i=1}^{n}\hat f_{-i}(X_i), \tag{5}
\]
where $\hat f_{-i}(y)$ denotes the kernel estimator constructed from the data without the observation $X_i$. The method is commonly referred to as least squares cross-validation, since it is based on the so-called leave-one-out density estimator $\hat f_{-i}(y)$. Rudemo (1982) proposed the same technique from a slightly different viewpoint. Bowman and Azzalini (1997, page 37) provided an explicit expression for (5) for the Gaussian kernel.

Stone (1984, pages 1285-1286) provided the following straightforward demonstration that the second term in (5) is an unbiased estimate of the second term in ISE. Observe
\[
E\Bigl[\int \hat f_h(y) f(y)\,dy\Bigr]
= \int\!\!\int \frac{1}{h} K\Bigl(\frac{y-x}{h}\Bigr) f(x)\,dx\, f(y)\,dy
= E\Bigl[\frac{1}{h} K\Bigl(\frac{Y-X}{h}\Bigr)\Bigr].
\]
This leads to the unbiased estimate of $\int \hat f_h(y) f(y)\,dy$:
\[
\frac{1}{n(n-1)h}\sum_{i\neq j} K\Bigl(\frac{X_i-X_j}{h}\Bigr) = \frac{1}{n}\sum_{i=1}^{n}\hat f_{-i}(X_i).
\]
Hall (1983, page 1157) showed that
\[
\frac{1}{n}\sum_{i=1}^{n}\int \bigl(\hat f_{-i}(y)\bigr)^{2}\,dy = \int \bigl(\hat f_h(y)\bigr)^{2}\,dy + O_p\Bigl(\frac{1}{n^{2}h}\Bigr)
\]
and hence changed the least squares cross-validation (LSCV) based criterion from (5) to
\[
\mathrm{LSCV}(h) = \int \bigl(\hat f_h(y)\bigr)^{2}\,dy - \frac{2}{n}\sum_{i=1}^{n}\hat f_{-i}(X_i),
\]
since "it is slightly simpler to compute, without affecting the asymptotics." This version is the one used by most authors. We denote the value of $h$ that minimizes $\mathrm{LSCV}(h)$ by $h_{\mathrm{LSCV}}$. Least squares cross-validation is also referred to as unbiased cross-validation, since
\[
E[\mathrm{LSCV}(h)] = E\Bigl[\int \bigl(\hat f_h(y)-f(y)\bigr)^{2}\,dy\Bigr] - \int f^{2}(y)\,dy = \mathrm{MISE} - \int f^{2}(y)\,dy.
\]

In S-PLUS, $h_{\mathrm{LSCV}}$ is invoked by width = "bandwidth.ucv", while in R it is invoked by bw = "bw.ucv". The least squares cross-validation function $\mathrm{LSCV}(h)$ can have more than one local minimum (Hall and Marron, 1991). Thus, in practice, it is prudent to plot $\mathrm{LSCV}(h)$ and not just rely on the result of a minimization routine. Jones, Marron and Sheather (1996) recommended that the largest local minimizer of $\mathrm{LSCV}(h)$ be used as $h_{\mathrm{LSCV}}$, since this value produces better empirical performance than the global minimizer. The Bowman and Azzalini (1997) library of S-PLUS functions contains the function cv(y, h), which produces values of $\mathrm{LSCV}(h)$ for the data set y over the vector of different bandwidth values in h.
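As a concrete illustration of these points, the R sketch below evaluates Hall's form of $\mathrm{LSCV}(h)$ for a standard Gaussian kernel over a grid of bandwidths, plots it so that multiple local minima can be seen, takes the largest local minimizer, and compares the result with R's built-in bw.ucv() selector. The closed forms for the two terms rely on the convolution of two standard normal densities being a $N(0,2)$ density; the function names lscv() and largest_local_min(), the bandwidth grid and the simulated data are illustrative choices only, not part of any of the software cited above.

```r
# Sketch of LSCV(h) = int f_hat_h(y)^2 dy - (2/n) sum_i f_hat_{-i}(X_i)
# for a standard Gaussian kernel (illustrative, assumed names and settings).
lscv <- function(h, x) {
  n <- length(x)
  d <- outer(x, x, "-") / h                                  # (X_i - X_j)/h for all pairs
  term1 <- sum(dnorm(d, sd = sqrt(2))) / (n^2 * h)           # int f_hat_h(y)^2 dy
  term2 <- 2 * (sum(dnorm(d)) - n * dnorm(0)) /              # (2/n) sum_i f_hat_{-i}(X_i),
           (n * (n - 1) * h)                                 # off-diagonal kernel terms only
  term1 - term2
}

# Largest interior local minimizer of LSCV(h) on a grid (Jones, Marron and
# Sheather, 1996, recommendation); falls back to the grid minimum if none exists.
largest_local_min <- function(hs, vals) {
  i <- 2:(length(vals) - 1)
  loc <- i[vals[i] < vals[i - 1] & vals[i] < vals[i + 1]]
  if (length(loc) == 0) hs[which.min(vals)] else max(hs[loc])
}

set.seed(1)
x    <- rnorm(100)
hs   <- seq(0.05, 1.5, length.out = 200)
vals <- sapply(hs, lscv, x = x)

plot(hs, vals, type = "l", xlab = "h", ylab = "LSCV(h)")     # inspect, as recommended above
h_lscv <- largest_local_min(hs, vals)
h_ucv  <- bw.ucv(x)                                          # built-in UCV/LSCV selector
c(largest_local_min = h_lscv, bw.ucv = h_ucv)
```

In current versions of R, bw.ucv() is the selector behind the bw = "ucv" option of density(), so a bandwidth chosen this way can be passed on directly, for example as density(x, bw = h_lscv).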
For the least squares cross-validation based criterion, by using the representation
\[
\mathrm{LSCV}(h) = \frac{1}{nh}R(K) + \frac{2}{n^{2}h}\sum_{i<j}\gamma\Bigl(\frac{X_i-X_j}{h}\Bigr), \tag{6}
\]
where $\gamma(c) = \int K(w)K(w+c)\,dw - 2K(c)$, Scott and Terrell (1987) showed that
\[
E[\mathrm{LSCV}(h)] = \frac{1}{nh}R(K) + \frac{h^{4}}{4}\,\mu_2(K)^{2}R(f'') - R(f) + O(n^{-1})
= \mathrm{AMISE}\{\hat f_h\} - R(f) + O(n^{-1}).
\]
Thus, least squares cross-validation essentially provides estimates of $R(f'')$, the only unknown quantity in $\mathrm{AMISE}\{\hat f_h\}$.

For a given set of data, denote the bandwidth that minimizes $\mathrm{ISE}(\hat f_h)$ by $\hat h_{\mathrm{ISE}}$. A number of authors (e.g., Gu, 1998) argued that the ideal bandwidth is the random quantity $\hat h_{\mathrm{ISE}}$, since it minimizes the ISE for the given sample. However, $\hat h_{\mathrm{ISE}}$ is an inherently difficult quantity to estimate. In particular, Hall and Marron (1987a) showed that the smallest possible relative error for any data-based bandwidth $\hat h$ is
\[
\frac{\hat h}{\hat h_{\mathrm{ISE}}} - 1 = O_p\bigl(n^{-1/10}\bigr).
\]
Hall and Marron (1987b) and Scott and Terrell (1987) showed that the least squares cross-validation bandwidth $h_{\mathrm{LSCV}}$ achieves this best possible convergence rate. In particular, they showed that
\[
n^{1/10}\Bigl(\frac{h_{\mathrm{LSCV}}}{\hat h_{\mathrm{ISE}}} - 1\Bigr)
\]
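To give a rough empirical sense of the comparison between $h_{\mathrm{LSCV}}$ and $\hat h_{\mathrm{ISE}}$, the following sketch simulates data from a known density, approximates $\hat h_{\mathrm{ISE}}$ by numerically minimizing $\mathrm{ISE}(\hat f_h)$ over a grid (possible here only because the true $f$ is known), and reports the relative error of the LSCV bandwidth returned by bw.ucv(). The function ise(), the grids, the sample size and the seed are illustrative choices; this is a sketch of the idea, not a reproduction of any study cited above.

```r
# Approximate h_ISE for simulated N(0,1) data by minimizing a Riemann-sum
# version of ISE(h) = int (f_hat_h(y) - f(y))^2 dy, then compare with bw.ucv().
ise <- function(h, x, grid) {
  fhat <- sapply(grid, function(y) mean(dnorm((y - x) / h)) / h)  # kernel estimate on grid
  sum((fhat - dnorm(grid))^2) * diff(grid[1:2])                   # Riemann approximation
}

set.seed(2)
x    <- rnorm(200)
grid <- seq(-4, 4, length.out = 401)
hs   <- seq(0.05, 1.5, length.out = 150)

h_ise  <- hs[which.min(sapply(hs, ise, x = x, grid = grid))]      # approximate h_ISE
h_lscv <- bw.ucv(x)                                               # LSCV/UCV bandwidth

# Relative error of the LSCV bandwidth, the quantity whose best possible
# rate is O_p(n^{-1/10}) in the discussion above.
h_lscv / h_ise - 1
```

Repeating this over several seeds typically shows sizable relative errors even at moderate sample sizes, consistent with the slow $n^{-1/10}$ rate discussed above.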