132 Austrian Journal of Statistics, Vol. 34 (2005), No. 2, 127-138

[Figure 3 plots. Left panel (η = 1.5): estimated overlap (probability mass) against dimension p. Right panel (overlap = 10%): estimated outlier distance against dimension p.]

Figure 3: Effect of the dimension on the overlap (left) and of the choice of η (right) of a shifted distribution.

quence of the heavier tails of the T3 distribution. Method BG gives the best results in this respect. The breakdown in the curve of correctly identified outliers already occurs at a lower percentage of simulated outliers. Method BG gives rather poor results in the low-dimensional situation (left picture) but very good results for p = 30 (right picture). If we take the same value η = 1.5 of the shift outliers as in the previous simulation, the results for the correctly identified outliers are comparable to the right picture of Figure 2.
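A setup of this kind, T3 data contaminated with normal shift outliers, might be simulated roughly as follows. This is a minimal sketch, not the authors' code. It assumes that the clean observations follow a multivariate T3 distribution with identity scatter and that the shift outliers are normal with mean η in every coordinate and identity covariance. The function name, the outlier fraction, and the seed are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_t3_with_shift_outliers(n, p, eta, outlier_frac, seed=0):
    """Return (X, is_outlier) for an illustrative contaminated sample.

    Assumptions (not taken from the paper): clean rows are multivariate t
    with 3 degrees of freedom and identity scatter; outlier rows are
    N(eta * 1, I).
    """
    rng = np.random.default_rng(seed)
    n_out = int(round(outlier_frac * n))
    n_clean = n - n_out

    # Multivariate t_3: a standard normal vector divided by sqrt(chi2_3 / 3),
    # with one chi-square draw shared by all coordinates of a row.
    z = rng.standard_normal((n_clean, p))
    w = rng.chisquare(df=3, size=(n_clean, 1)) / 3.0
    clean = z / np.sqrt(w)

    # Shift outliers: standard normal, moved by eta in every coordinate.
    outliers = rng.standard_normal((n_out, p)) + eta

    X = np.vstack([clean, outliers])
    is_outlier = np.concatenate(
        [np.zeros(n_clean, dtype=bool), np.ones(n_out, dtype=bool)]
    )
    return X, is_outlier


# Illustrative values only: p = 30 and eta = 1.5 as in the text above,
# with a hypothetical 10% share of outliers.
X, is_outlier = simulate_t3_with_shift_outliers(
    n=1000, p=30, eta=1.5, outlier_frac=0.10
)

# Under the assumed shift, the outlier centre lies at Euclidean distance
# eta * sqrt(p) from the origin, so a fixed eta separates the two groups
# more strongly as the dimension grows.
centre_distance = 1.5 * np.sqrt(30)
```

Under these assumptions the distance between the two centres grows like η√p, which would be one way to read why a smaller η is enough in higher dimensions.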
Remark 2: Note that the critical values for the methods BG and FGR were computed with the multivariate normal distribution as the model distribution. Since we used the T3 distribution here as the “clean data” distribution, it would be correct to compute the critical values under this model. However, one aspect of this simulation was to see the effect of deviations from the model.

3.3 Skewed Data with Shift Normal Outliers

Deviations from normality often occur in practical applications, and here we study the effect of asymmetric data. The simulation setup is similar to the previous one, with the difference that we take the absolute values of the data generated from the T3 distribution. The normal shift outliers are placed at η = 3 (for p = 5, n = 200) and η = 1.5 (for p = 30, n = 1000), respectively. The results (Figure 5) are consistent with the results of the previous experiments. The percentages of wrongly identified outliers are comparable to those of the previous experiment with the T3 distribution, but they decrease with increasing outlier percentage. Again, method BG gives the best results in this respect. Methods FGR and RZ perform very well at identifying the outliers, whereas BG has difficulties.

3.4 Sensitivity with Respect to the Choice of the Parameters

The parameters for the different outlier detection methods were fixed in the previous experiments. Of course, it is of interest whether this choice has a severe influence on the performance