Computers & Geosciences 31 (2005) 579–587
www.elsevier.com/locate/cageo

Multivariate outlier detection in exploration geochemistry☆

Peter Filzmoser a,*, Robert G. Garrett b, Clemens Reimann c

a Institute of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr. 8-10, A-1040 Wien, Austria
b Geological Survey of Canada, Natural Resources Canada, 601 Booth Street, Ottawa, Ontario, Canada, K1A 0E8
c Geological Survey of Norway, N-7491 Trondheim, Norway

Received 16 November 2004; accepted 16 November 2004

Abstract

A new method for multivariate outlier detection able to distinguish between extreme values of a normal distribution and values originating from a different distribution (outliers) is presented. To facilitate visualising multivariate outliers spatially on a map, the multivariate outlier plot is introduced. In this plot different symbols refer to a distance measure from the centre of the distribution, taking into account the shape of the distribution, and different colours are used to signify the magnitude of the values for each variable. The method is illustrated using a real geochemical data set from far-northern Europe. It is demonstrated that important processes such as the input of metals from contamination sources and the contribution of sea-salts via marine aerosols to the soil can be identified and separated.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Multivariate outliers; Robust statistics; Exploration geochemistry; Background
1. Introduction

The detection of data outliers and unusual data structures is one of the main tasks in the statistical analysis of geochemical data. Traditionally, despite the fact that geochemistry data sets are almost always multivariate, outliers are most frequently sought for each single variable in a given data set (Reimann et al., 2005). The search for outliers is usually based on location and spread of the data. The higher (lower) the analytical result of a sample, the greater is the distance of the observation from the central location of all observations; outliers thus, typically, have large distances. The definition of an outlier limit or threshold, dividing background data from outliers, has found much attention in the geochemical literature and to date no universally applicable method of identifying outliers has been proposed (see discussion in Reimann et al., 2005). In this context, background is defined by the properties, location and spread, of geochemical samples that represent the natural variation of the material being studied in a specific area that are uninfluenced by extraneous and exotic processes such as those related to rare rock types, mineral deposit forming processes, or anthropogenic contamination. In geochemistry, outliers are generally observations resulting from a secondary process and not extreme values from the background distribution. Samples where the analytical values are derived from a secondary process, be it mineralisation or contamination, do not need to be especially high (or low) in relation to all values of a variable in a data set, and thus attempts to identify these samples with classical univariate methods commonly fail. However,

☆ Code available from server at http://cran.r-project.org/.
* Corresponding author. Tel.: +43 1 58801 10733; fax: +43 1 58801 10799.
E-mail addresses: p.filzmoser@tuwien.ac.at (P. Filzmoser), garrett@gsc.NRCan.gc.ca (R.G. Garrett), Clemens.Reimann@ngu.no (C. Reimann).

0098-3004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2004.11.013
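The Introduction's point that a sample from a secondary process need not be especially high or low in any single variable can be illustrated with a small simulation. The sketch below is not the method of the paper: it uses simulated data, a median/MAD rule as a stand-in for a univariate check, and a classical (non-robust) Mahalanobis distance with a chi-square cutoff as a stand-in for a multivariate distance that accounts for the shape of the distribution. All variable names, the correlation of 0.9, and the thresholds are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "background": two strongly correlated variables
# (hypothetical standardised concentrations, not real data).
cov_true = np.array([[1.0, 0.9],
                     [0.9, 1.0]])
background = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=500)

# One sample standing in for a secondary process: neither value is
# extreme on its own, but the pair contradicts the background correlation.
odd_sample = np.array([[1.5, -1.5]])
data = np.vstack([background, odd_sample])


def robust_z(x):
    """Per-variable distance from the median, scaled by the MAD."""
    med = np.median(x, axis=0)
    mad = 1.4826 * np.median(np.abs(x - med), axis=0)
    return (x - med) / mad


def mahalanobis_sq(x):
    """Squared Mahalanobis distance from the sample mean (classical estimates)."""
    diff = x - x.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


z = robust_z(data)
d2 = mahalanobis_sq(data)

# Univariate rule (assumed threshold): flag if any variable has |z| > 3.
univariate_flag = np.any(np.abs(z) > 3.0, axis=1)

# Multivariate rule (assumed threshold): 0.975 quantile of the chi-square
# distribution with 2 degrees of freedom, which has the closed form
# -2 * ln(1 - 0.975).
cutoff = -2.0 * np.log(0.025)
multivariate_flag = d2 > cutoff

print("odd sample flagged by univariate rule:  ", bool(univariate_flag[-1]))
print("odd sample flagged by multivariate rule:", bool(multivariate_flag[-1]))
```

In this simulation the added sample lies well inside the range of each variable taken separately, so a per-variable check has no reason to flag it, while its distance from the centre measured relative to the covariance structure is large. With real data the classical mean and covariance would themselves be affected by outliers, which is why robust estimates are generally preferred for this purpose.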