Statistical Science
2004, Vol. 19, No. 4, 588–597
DOI 10.1214/088342304000000297
© Institute of Mathematical Statistics, 2004

Density Estimation

Simon J. Sheather

Abstract. This paper provides a practical description of density estimation based on kernel methods. An important aim is to encourage practicing statisticians to apply these methods to data. As such, reference is made to implementations of these methods in R, S-PLUS and SAS.

Key words and phrases: Kernel density estimation, bandwidth selection, local likelihood density estimates, data sharpening.

1. INTRODUCTION

Density estimation has experienced a wide explosion of interest over the last 20 years. Silverman's (1986) book on this topic has been cited over 2000 times. Recent texts on smoothing which include detailed density estimation include Bowman and Azzalini (1997), Simonoff (1996) and Wand and Jones (1995). Density estimation has been applied in many fields, including archaeology (e.g., Baxter, Beardah and Westwood, 2000), banking (e.g., Tortosa-Ausina, 2002), climatology (e.g., Ferreyra et al., 2001), economics (e.g., DiNardo, Fortin and Lemieux, 1996), genetics (e.g., Segal and Wiemels, 2002), hydrology (e.g., Kim and Heo, 2002) and physiology (e.g., Paulsen and Heggelund, 1996).

This paper provides a practical description of density estimation based on kernel methods. An important aim is to encourage practicing statisticians to apply these methods to data. As such, reference is made to implementations of these methods in R, S-PLUS and SAS.

Section 2 provides a description of the basic properties of kernel density estimators. It is well known that the performance of kernel density estimators depends crucially on the value of the smoothing parameter, commonly referred to as the bandwidth. We describe methods for selecting the value of the bandwidth in Section 3. In Section 4, we describe two recent important improvements to kernel methods, namely, local likelihood density estimates and data sharpening. We compare the performance of some of the methods that have been discussed using a new example involving data from the U.S. PGA tour in Section 5. Finally, in Section 6 we provide some overall recommendations.

2. THE BASICS OF KERNEL DENSITY ESTIMATION

Let $X_1, X_2, \ldots, X_n$ denote a sample of size $n$ from a random variable with density $f$. The kernel density estimate of $f$ at the point $x$ is given by

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{1}$$

where the kernel $K$ satisfies $\int K(x)\,dx = 1$ and the smoothing parameter $h$ is known as the bandwidth. In practice, the kernel $K$ is generally chosen to be a unimodal probability density symmetric about zero. In this case, $K$ satisfies the conditions

$$\int K(y)\,dy = 1, \qquad \int y K(y)\,dy = 0, \qquad \int y^2 K(y)\,dy = \mu_2(K) > 0.$$

A popular choice for $K$ is the Gaussian kernel, namely,

$$K(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right). \tag{2}$$

Throughout this section we consider a small generated data set to illustrate the ideas presented. The data consist of a random sample of size $n = 10$ from a normal mixture distribution made up of observations from $N(\mu = -1, \sigma^2 = (1/3)^2)$ and $N(\mu = 1, \sigma^2 = (1/3)^2)$, each with probability 0.5. Figure 1 shows a kernel estimate of the density for these data using the

Simon J. Sheather is Professor of Statistics, Australian Graduate School of Management, University of New South Wales and the University of Sydney, Sydney, NSW 2052, Australia (e-mail: simonsh@agsm.edu.au).
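As a rough illustration of (1) and (2), the following Python sketch evaluates a Gaussian kernel density estimate on a grid for a generated sample of size n = 10 from the normal mixture described in Section 2. It is not taken from the paper: the use of NumPy, the function names `gaussian_kernel`, `kde` and `draw_mixture`, the random seed, the evaluation grid and the bandwidth h = 0.3 are all assumptions chosen for illustration, and the generated sample will differ from the one behind the paper's figures.

```python
import numpy as np


def gaussian_kernel(y):
    """Gaussian kernel K(y) = exp(-y^2 / 2) / sqrt(2 * pi), as in (2)."""
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)


def kde(x, sample, h):
    """Evaluate the kernel density estimate (1) at each point in x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    # u[j, i] = (x_j - X_i) / h for every grid point x_j and observation X_i
    u = (x[:, None] - sample[None, :]) / h
    # Sum the kernel over observations and scale by 1 / (n * h)
    return gaussian_kernel(u).sum(axis=1) / (sample.size * h)


def draw_mixture(n, rng):
    """Draw n values from 0.5 * N(-1, (1/3)^2) + 0.5 * N(1, (1/3)^2)."""
    # Pick the component mean for each observation with probability 0.5 each
    means = rng.choice([-1.0, 1.0], size=n)
    return rng.normal(loc=means, scale=1.0 / 3.0)


rng = np.random.default_rng(0)       # arbitrary seed (assumption)
sample = draw_mixture(10, rng)       # n = 10, as in Section 2
grid = np.linspace(-4.0, 4.0, 801)   # evaluation grid (assumption)
h = 0.3                              # illustrative bandwidth only (assumption)
density = kde(grid, sample, h)

# Riemann-sum check that the estimate integrates to roughly 1
area = density.sum() * (grid[1] - grid[0])
print(f"approximate integral of the estimate: {area:.4f}")
```

Because the Gaussian kernel is itself a density, the estimate is nonnegative and integrates to one for any h > 0. A smaller h gives a rougher estimate and a larger h a smoother one; choosing h is the subject of Section 3.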