Variance Reduction with Monte Carlo Estimates of Error Rates in Multivariate Classification

C. Weihs*                          G. Calzolari
Fachbereich Statistik              Dipartimento di Statistica
Universität Dortmund               Università degli Studi di Firenze

M. C. Röhl
Königstein/Ts.

August 1999

* D-44221 Dortmund, Germany, Tel. ++49-231-755-4363, e-mail: weihs@amadeus.statistik.uni-dortmund.de

Abstract

In this paper, control variates are proposed to speed up Monte Carlo simulations to estimate expected error rates in multivariate classification.

KEY WORDS: classification, control variates, error rate, Monte Carlo simulation, variance reduction

1 Introduction

The aim of this paper is to speed up Monte Carlo simulations applied to multivariate classification. The most interesting performance measure in classification is the misclassification error.

In the case of given group densities, there are two possibilities to calculate the error rate: numerical integration, or Monte Carlo simulation, which is the only feasible method in higher dimensions. In this paper, we focus on the Monte Carlo error estimate. This approach suffers from the variability of the error rates, because the error rate is now a random variable. Therefore, every principle to reduce this variance is welcome. In the literature various variance reduction techniques have been proposed, among them antithetic variables and control variates (see, e.g., [1]). Here,
we will concentrate on control variates and demonstrate their variance reduction potential in our special problem.

The paper is organized as follows: In section 2 we give a brief introduction to multivariate classification. In section 3 we propose two different control variates, which are studied and compared in section 4 by means of some examples. The paper closes with a conclusion in section 5.

2 Multivariate Classification

Classification deals with the allocation of objects to g predetermined groups 1, 2, ..., g, say. The aim is to minimize the misclassification error (rate) over all possible future allocations, given the group densities p_i(x) (i = 1, 2, ..., g). The minimal error rate is the so-called Bayes error.

We measure d features (variables) of the objects which we consider important for discrimination between the objects. These can be continuous features (GNP, consumption, etc.) or discrete ones (number of firms, number of inhabitants, etc.).

Once the group densities are specified, in order to minimize the error rate we allocate an object with feature vector x to group i if

    p_i(x) > p_j(x)   (j ≠ i).                                        (1)

Classification methods often assume the group densities p_i(x) to be normal. Then there are at least two modelling possibilities (see, e.g., [2]): estimate the same covariance matrix for all groups (LDA, linear discriminant analysis) or estimate a different covariance matrix for each group (QDA, quadratic discriminant analysis). Of course, both methods also estimate different mean vectors for each group. In this paper we take QDA as the adequate, and thus standard, classification procedure.

Often we additionally want to reduce the dimension from d to d' = 1 or 2 to enhance human perception (dimension reduction). The construction of a d'-space with minimal error rate, given the group densities p_i(x) in d-space, can be done by modern optimization techniques, for example Simulated Annealing [3]. In each optimization step, a projection space is proposed. Then we determine the group densities (either estimated by means of the projected data or directly derived from the projected densities of the original space) [4], and calculate the error rate in the projection space.
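Before turning to variance reduction, a minimal illustration of allocation rule (1) with fully specified densities may help; this sketch (Python) is not from the paper, and the group parameters are merely examples:

    import numpy as np
    from scipy.stats import multivariate_normal

    def allocate(x, mus, covs):
        """Allocate feature vector x to the group with the highest density,
        following rule (1); equal a-priori probabilities are assumed."""
        dens = [multivariate_normal(m, c).pdf(x) for m, c in zip(mus, covs)]
        return int(np.argmax(dens))

    # two hypothetical groups in d = 2 dimensions
    mus = [np.zeros(2), np.array([2.0, 0.0])]
    covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
    print(allocate(np.array([1.2, 0.3]), mus, covs))  # -> 1, i.e. the second group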
In this paper, we suppose that the projection space is fixed, so that we already have the group densities available. Of course, the following approach can also be applied during optimization, at each optimization step.

3 Variance Reduction by Control Variates

3.1 General Ideas

What we have to calculate is the error rate given the group densities. In one dimension, this can easily be done by numerical integration, because we only have to find the intersection points of the different group densities (determined by p_i(x) = p_j(x)) and then calculate integrals like

    ∫_a^b p_i(x) dx,   a, b ∈ R,                                      (2)

where p_i(x) denotes an arbitrary known group density.

But in two or more dimensions, the borderlines between the group densities do not have such simple shapes, even when we assume equal group covariance matrices. Therefore, integration can only be done by means of a grid in two or more dimensions.

Another possibility to calculate the error rate is Monte Carlo simulation. We generate random realizations from the group densities and allocate them according to our classification rule (1). This approach suffers from the variability of the error rates, because the error rate is now a random variable.

In order to reduce the Monte Carlo variance of the error rate we introduce control variates (cv). The object of interest is the error rate error. We write this in a more complicated but helpful way as

    error = error_cv + (error − error_cv)                             (3)

with a new random variable error_cv. We want to compute the expectation of these error rates:

    E(error) = E(error_cv) + E(error − error_cv).                     (4)

The idea behind control variates is to choose a random variable error_cv such that we can calculate E(error_cv) exactly (no variance) and error and error_cv are positively correlated. So a sensible way of estimating E(error) would be

    Ê_cv(error) = E(error_cv) + Ê(error − error_cv),                  (5)
where the first term on the right hand side has no variance and the second term is computed as the sample mean of Monte Carlo replicates. Then the variance of the right hand side of (5) is

    Var(error − error_cv)/N,                                          (6)

where N is the sample size used to determine error and error_cv, and

    Var(error − error_cv) = Var(error) + Var(error_cv) − 2 Cov(error, error_cv).   (7)

Now it becomes clear that a large positive correlation between error and error_cv can reduce the variance compared to the "naive" estimator Ê_MC(error), i.e. the sample mean of Monte Carlo replicates of error with variance Var(Ê_MC(error)) = Var(error)/N. We can even do better. We can use the equation

    E(error) = α E(error_cv) + E(error − α error_cv)                  (8)

to select the parameter α that minimizes the variance

    Var(error − α error_cv),                                          (9)

leading to

    α = Cov(error, error_cv) / Var(error_cv),                         (10)

which is almost equal to the correlation coefficient ρ when Var(error) ≈ Var(error_cv) holds. The final result is then

    min_α Var(error − α error_cv) = (1 − ρ²) Var(error),              (11)

i.e. there can always be a gain when ρ ≠ 0.

Considering the above arguments, what we look for as a control variate procedure is any classification method which gives results as highly correlated with QDA as possible, and for which the exact expected error rate is easily computable. Moreover, one should avoid control variates for which the additional computational effort is so high that the overall computation time increases even in the case of variance reduction.

3.2 Two Specific Control Variates

What is, then, a suitable control variate in our context? We will discuss two possibilities. In both cases the control variate procedure assumes a somewhat simplified problem situation to be true in order to simplify the Monte Carlo procedure. In the first procedure, the covariance matrices of the different groups are assumed to be identical. In the second procedure, the possibly high dimensional problem is optimally projected to one dimension. Note that we assume the problems under study to have normal group densities with individual covariance matrices. Thus, QDA is taken to be the standard classification method.
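To make the estimator concrete, the following sketch (Python; not the authors' code, and the names are ours) applies (5), (8) and (10) to W replicates of a pair of error rates, given an exactly known expectation of the control variate; the closing example uses purely synthetic, positively correlated numbers:

    import numpy as np

    def cv_estimate(err, err_cv, e_cv_exact):
        """Control variate estimate of E(error): alpha from (10) is estimated
        from the W replicates and plugged into (8); e_cv_exact is the exactly
        known E(error_cv)."""
        err, err_cv = np.asarray(err), np.asarray(err_cv)
        alpha = np.cov(err, err_cv)[0, 1] / err_cv.var(ddof=1)      # eq. (10)
        est = alpha * e_cv_exact + (err - alpha * err_cv).mean()    # eq. (8)
        var_naive = err.var(ddof=1) / err.size                      # naive MC variance
        var_cv = (err - alpha * err_cv).var(ddof=1) / err.size      # reduced variance
        return est, var_naive, var_cv

    # synthetic replicates for illustration only
    rng = np.random.default_rng(1)
    common = 0.02 * rng.standard_normal(100)
    err    = 0.15 + common + 0.01 * rng.standard_normal(100)
    err_cv = 0.16 + common + 0.01 * rng.standard_normal(100)
    print(cv_estimate(err, err_cv, 0.16))  # var_cv well below var_naive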
1. The first idea is to utilize the error rate computed by LDA as error_cv, based on the assumption of equal covariance matrices for all the groups. The error rate error is calculated by QDA from N random realizations drawn from the group densities. To get Ê_MC(error) we generate W such error rates and average. Therefore we use N × W random vectors. Now we take the same random vectors and apply LDA with the same, so-called pooled, covariance matrix for all groups to calculate error rates error_cv. If Σ_i is the assumed covariance matrix for group i, then (∑_{i=1}^g (N_i − 1) Σ_i)/(N − g) is the pooled covariance matrix, where N_i is the number of realizations in group i. The differences of the W corresponding estimates error and error_cv are used to calculate Ê(error − error_cv). At last, we calculate E(error_cv) in an exact manner (so that we have no variance) by numerical integration based on the densities with pooled covariance matrices. We now have all the ingredients we need for an efficiency comparison with the naive Monte Carlo estimator. The variance of the naive estimator is calculated from the sample of size W of the estimated error rates error, and the variance of the control variate estimator from the sample of size W of (error − error_cv). This approach has the drawback that we have to calculate an exact integral in a projection space which might be two dimensional or of even higher dimension, with rather ugly borderlines.

2. A second possibility to determine the error rate error is to use another control variate: the error rate of an "optimal" one dimensional projection. This can be obtained by the largest eigenvalue and the corresponding eigenvector of QDA in the original space, or by direct minimization of the error rate. We do the same as in 1 to obtain Ê_MC(error). But then we project the random vectors onto the optimally discriminating direction, taking into account the different covariance structures, and build the differences of corresponding error estimates to compute Ê(error − error_cv). Now, the exact calculation of E(error_cv) is simply a one dimensional integration with clear-cut intersection points. This speeds up the procedure compared to 1. To construct the optimally discriminating one dimensional projection we follow an idea in [5], where it was proposed to project on the first eigenvector v1 of MM′, where

    M = (μ_g − μ_1, ..., μ_2 − μ_1, Σ_g − Σ_1, ..., Σ_2 − Σ_1),       (12)

where the μ_i are the group means and the Σ_i are (again) the group covariance matrices, i = 1, ..., g. The projected means, variances and feature vectors then have the form μ_i* = v1′μ_i, σ_i*² = v1′Σ_i v1, and x* = v1′x; a small computational sketch follows.
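A sketch (Python; naming is ours, not the paper's) of how v1 and the projected parameters might be computed, reading (12) so that the mean differences and the columns of the covariance differences enter M column-wise; the column order does not affect MM′:

    import numpy as np

    def projection_direction(mus, covs):
        """First eigenvector v1 of M M' with M as in (12): columns are the
        mean differences mu_i - mu_1 and the columns of Sigma_i - Sigma_1."""
        d = len(mus[0])
        blocks = [(m - mus[0]).reshape(d, 1) for m in mus[1:]]
        blocks += [c - covs[0] for c in covs[1:]]
        M = np.hstack(blocks)
        _, V = np.linalg.eigh(M @ M.T)   # eigenvalues in ascending order
        return V[:, -1]                  # eigenvector of the largest eigenvalue (up to sign)

    def project_params(mus, covs, v1):
        """Projected means mu_i* = v1' mu_i and variances sigma_i*^2 = v1' Sigma_i v1."""
        return [float(v1 @ m) for m in mus], [float(v1 @ c @ v1) for c in covs]

    # the two groups of the first simulation in section 4 as a check:
    mus = [np.zeros(2), np.array([2.0, 0.0])]
    covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
    v1 = projection_direction(mus, covs)
    print(v1, project_params(mus, covs, v1))  # v1 = (1, 0)': projected N(0,1), N(2,1)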
In order to represent adequate control variates, the additional computation time of procedures 1 and 2 has to be small relative to the computation time of naive Monte Carlo. That this is the case, leaving aside the computation of the exact expected error rates, should be clear from the following arguments.

- Naive Monte Carlo estimates the means and the covariance matrices of the groups, and evaluates the corresponding estimated group densities for each observation.

- Procedure 1 additionally needs to compute the mean of the estimated covariance matrices of the groups, and to evaluate, for each observation, the group densities corresponding to the pooled covariance matrix in each group.

- Procedure 2 additionally computes the 'difference matrix' M, its first eigenvector v1, and the corresponding projections of the group means and covariance matrices, and evaluates the corresponding 1D normal densities at each projected observation.

Therefore, in procedures 1 and 2 the 'preparation' of the density evaluation does not depend on the number of observations, resulting in a much smaller additional 'preparation time' than the preparation time for naive Monte Carlo for big numbers of observations. Moreover, in procedure 2 the additional density evaluations are also much quicker than the density evaluations in naive Monte Carlo, since they are in 1D.

In procedure 1 the exact expected error rates have to be calculated numerically, in general. For the exact expected error rates in procedure 2, however, an analytic formula can even be derived. This will be done in the next subsection. In section 4 we will demonstrate the differences between procedures 1 and 2 by some examples.

3.3 Exact expected error rates

In procedure 2, exact expected error rates have to be calculated for univariate normal projected distributions. In this case a general formula for the exact expected error rate can be given, depending on the intersection points of the univariate normal densities corresponding to the projected group means and variances. In order to illustrate the idea, let us discuss the cases of 2 and 3 groups. Moreover, let us assume equal a-priori probabilities 1/g for all g groups, for simplicity. In the simulations in the following sections, we will also discuss these cases only.

In the case of 2 groups, let the intersection point of the two normal densities be x12. Then, obviously, the exact expected error rate corresponding to these densities is (cp. figure 1)

    E(error_cv) = (1 − Φ1(x12) + Φ2(x12))/2,                          (13)

where Φ1 is the normal distribution with mean to the left of x12, and Φ2 the distribution with mean to the right.

[Figure 1: The error rate of the left group is gray shaded.]
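The intersection points in (13) solve φ1(x) = φ2(x), a quadratic in x after taking logs. A minimal sketch (Python; helper names and the root-selection heuristic for unequal variances are ours, not from the paper):

    import numpy as np
    from scipy.stats import norm

    def intersections(mu1, s1, mu2, s2):
        """Solve log phi_1(x) = log phi_2(x); two roots for unequal variances."""
        a = 1.0/(2*s2**2) - 1.0/(2*s1**2)
        b = mu1/s1**2 - mu2/s2**2
        c = mu2**2/(2*s2**2) - mu1**2/(2*s1**2) + np.log(s2/s1)
        if abs(a) < 1e-12:                 # equal variances: a single crossing
            return np.array([-c / b])
        return np.sort(np.roots([a, b, c]))

    def exact_error_2groups(mu1, s1, mu2, s2):
        """Eq. (13); assumes mu1 < mu2. With unequal variances we pick the
        crossing between the two means as the relevant x12 (a heuristic)."""
        r = intersections(mu1, s1, mu2, s2)
        inside = r[(r >= mu1) & (r <= mu2)]
        x12 = inside[0] if inside.size else r[0]
        return (1 - norm.cdf(x12, mu1, s1) + norm.cdf(x12, mu2, s2)) / 2

    # N(0,1) vs N(2,1): x12 = 1, error = Phi(-1), i.e. the 15.87% quoted
    # for procedure 2 in the first simulation of section 4.1
    print(exact_error_2groups(0.0, 1.0, 2.0, 1.0))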
In the case of 3 groups, let the distribution indices again be chosen so that a lower index indicates a lower mean. We are now interested in the relative location of the intersection points of the 3 densities. The error rate of the leftmost group 1 is determined by the first intersection of its density, on the right hand side, with one of the densities of the other groups. For the rightmost group 3 the same is true for the densities on the left. The error rate of the middle group 2 is, correspondingly, determined by the first intersection points of its density with the other densities on the left and on the right. For simplicity, let us now assume that the relevant intersection points are x12, determining the error of group 1 and the 'left part' of the error of group 2, and x23, determining the error of group 3 and the 'right part' of the error of group 2. This leads to the following formula for the exact error rate corresponding to the 3 groups:

    E(error_cv) = ((1 − Φ1(x12)) + (Φ2(x12) + (1 − Φ2(x23))) + Φ3(x23))/3.   (14)

As an example, consider 3 groups with group means μ1 = −3, μ2 = −2, and μ3 = 0, and with standard deviations σ1 = 2.037, σ2 = 0.9406, and σ3 = 1. These parameters lead to intersection points x12 = −3.17 and x23 = −1, as well as an exact error rate E(error_cv) = 31.45%.

For procedure 1, we only succeeded in finding a general analytic formula for the exact expected error rate in the case of 2 groups. Procedure 1 assumes equal covariance matrices for all groups. This covariance matrix is estimated by the pooled covariance matrix Σ̂ = (Σ̂1 + Σ̂2)/2, where Σ̂_i is the estimated covariance matrix of group i, i = 1, 2. For normal group distributions with means μ1 and μ2 and a common covariance matrix Σ, one can show (see [6], p. 12) that the exact error rate is

    E(error_cv) = Φ(−0.5 δ12),   where   δ12 = √((μ2 − μ1)′ Σ⁻¹ (μ2 − μ1))    (15)

and Φ is the distribution function of the standard normal.
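As a quick numerical check of (14) on the example above (Python; the intersection points are taken from the text, where they are rounded):

    from scipy.stats import norm

    mus, sds = [-3.0, -2.0, 0.0], [2.037, 0.9406, 1.0]
    x12, x23 = -3.17, -1.0          # relevant intersection points from the text

    err = ((1 - norm.cdf(x12, mus[0], sds[0]))      # group 1
           + norm.cdf(x12, mus[1], sds[1])          # group 2, 'left part'
           + (1 - norm.cdf(x23, mus[1], sds[1]))    # group 2, 'right part'
           + norm.cdf(x23, mus[2], sds[2])) / 3     # group 3
    print(f"{err:.2%}")  # about 31.4%; 31.45% with the unrounded x12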
4 Simulations

4.1 Known densities

In this subsection we assume that the group densities are fully known, so that parameter estimation is superfluous. This means in particular that QDA as well as LDA is carried out with the correct densities. In this way the outcome does not depend on the goodness of the parameter estimation. In the next subsection, we will discuss the case when density parameters have to be estimated.

In all examples, sample size N = 100 is used for each group. Also, W = 100 is used. In order to be independent of the drawn random vectors, every experiment was repeated V = 100 times, and the means of the mean error rates and the corresponding standard deviations, as well as the correlation coefficients, will be reported in what follows.

First Simulation:

First we compare procedures 1 and 2 using two groups with the following parameters of normal distributions:

    μ1 = (0, 0)′   and   Σ1 = | 1  0 |
                              | 0  1 |                                (16)

as well as

    μ2 = (2, 0)′   and   Σ2 = | 1    0.5 |
                              | 0.5  1   |.                           (17)

The true expected error, calculated by exact integration to be able to judge the following results, is approximately 14.97%.

By means of the naive Monte Carlo estimator we obtain

    Ê_MC(error) = (15.00 ± 2.53)%,                                    (18)

where 2.53% is the estimated standard deviation of Ê_MC(error). Obviously, the bias is negligible.

With procedure 2 one obtains

    Ê(error − error_cv) = (−0.92 ± 1.65)%                             (19)

and E(error_cv) equals 15.87% (exact integration). Summing up for the right hand side of equation (5), we arrive at (14.95 ± 1.65)%. This expression shows a distinctly lower variance than (18). The mean estimated correlation coefficient is ρ = 0.79. The lowest standard deviation we can get by (8) is therefore 1.55%. This corresponds to a variance reduction of more than 60% in relation to the naive Monte Carlo.

Moreover, with procedure 1 one obtains

    Ê(error − error_cv) = (−0.02 ± 0.61)%                             (20)

and E(error_cv) equals 15.08% (exact integration). Summing up for the right hand side of equation (5), we arrive at (15.06 ± 0.61)%. This expression shows an even much lower variance than with procedure 2, which indicates that LDA is a very adequate method for this example. Indeed, the mean estimated correlation coefficient is ρ = 0.97.
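The following sketch (Python; not the authors' code) reproduces the setup of the first simulation with procedure 2 as the control variate; for (16)-(17) the first eigenvector of MM′ is the first coordinate axis, so the projected groups are N(0,1) and N(2,1) with exact E(error_cv) = Φ(−1) ≈ 15.87%:

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    rng = np.random.default_rng(2)
    mus  = [np.zeros(2), np.array([2.0, 0.0])]
    covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
    N, W = 100, 100

    v1 = np.array([1.0, 0.0])                         # optimal direction for (16)-(17)
    pm = [float(v1 @ m) for m in mus]                 # projected means: 0, 2
    ps = [float(np.sqrt(v1 @ c @ v1)) for c in covs]  # projected sds: 1, 1
    e_cv_exact = norm.cdf(-1.0)                       # exact E(error_cv) = 15.87%

    rvs = [multivariate_normal(m, c) for m, c in zip(mus, covs)]
    err, err_cv = np.empty(W), np.empty(W)
    for w in range(W):
        e_qda = e_proj = 0
        for i, rv in enumerate(rvs):
            x = rv.rvs(size=N, random_state=rng)
            dens = np.column_stack([r.pdf(x) for r in rvs])
            e_qda += np.sum(np.argmax(dens, axis=1) != i)  # rule (1), true densities
            z = x @ v1                                     # projected realizations
            wrong = (norm.pdf(z, pm[1], ps[1]) > norm.pdf(z, pm[0], ps[0])) != (i == 1)
            e_proj += np.sum(wrong)                        # 1D rule on the projection
        err[w], err_cv[w] = e_qda / (2 * N), e_proj / (2 * N)

    print(err.mean(), err.std(ddof=1))                 # naive MC, cf. (18)
    print(e_cv_exact + (err - err_cv).mean(),          # control variate, cf. (5)
          (err - err_cv).std(ddof=1))                  # cf. (19)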
Second Simulation:

Now we compare procedures 1 and 2 by an example with three different groups which do not separate as nicely as in the previous simulation. In addition to the two groups of the first simulation we use a third group with the following parameters of a normal distribution:

    μ3 = (3, 0)′   and   Σ3 = | 2    0.3 |
                              | 0.3  2.4 |.                           (21)

The true expected error rate is approximately 28.44%.

The results of naive Monte Carlo and the two control variate procedures are summarized in Table 1.

    Sim   naiveMC   proc1   proc2   var1   var2   cor1   cor2   min1   min2   mvar1   mvar2
     1      2.53     0.61    1.65    94%    57%   0.97   0.79   0.61   1.55    94%     62%
     2      2.55     2.31    1.96    18%    41%   0.59   0.70   2.06   1.82    35%     49%
     3      2.55     1.93    2.45    43%     8%   0.74   0.43   1.72   2.30    55%      9%

    Table 1: Monte Carlo standard deviations, variance reductions, and correlations for known densities.

Note that 'Sim' indicates the simulation number, 'naiveMC' the estimated mean standard deviation of the naive Monte Carlo, 'proc1' and 'proc2' the corresponding standard deviations of the control variate procedures, 'var1' and 'var2' the corresponding percentages of variance reduction, 'cor1' and 'cor2' the mean correlation coefficients, 'min1' and 'min2' the corresponding minimal standard deviations of the control variate procedures, and 'mvar1' and 'mvar2' the corresponding maximum percentages of variance reduction.

Analysing Table 1, note particularly that for simulation 2 procedure 2 leads to a bigger variance reduction than procedure 1, but that the maximum variance reduction is nevertheless smaller than for simulation 1, since the univariate projected constellation of the groups is more complicated in this example. The bad performance of procedure 1 indicates that in this example the covariance matrices of the different groups cannot be assumed to be approximately equal.

Third Simulation:

Up to now, the examples were mainly one dimensional in that the groups were shifted in the first component only. Since this might lead to an overoptimistic judgement of procedure 2, the third example is the same as the second, but with

    μ3 = (1, 1)′,                                                     (22)

i.e. with a mean shifted in both directions.

The corresponding Monte Carlo results can also be found in Table 1. Note in particular that now procedure 1 is again very adequate, but procedure 2, unfortunately, does not lead to a substantial variance reduction, and might thus even cause an increase in computer time. The problems of procedure 2 also become clear noting that the exact expected error rate is 45% for this procedure, in contrast to a true expected error rate of around 33%.
As a preliminary conclusion, this indicates that procedure 2 is probably useful only if a 1-dimensional projection does not alter the problem too much. Naturally, a similar statement is true for procedure 1, but the assumption of equal, or at least similar, covariance structures is, most of the time, not that problematic. That the amount of variance reduction depends on the 'similarity' of the control variate to the true error rate could already have been deduced from equation (11). On the other hand, for procedure 1 the exact expected error rate is not easily computable, in general, whereas procedure 2 is much quicker.

4.2 Estimated densities

Since in practice the distributions of the grouped observations are not known, the implementation of the control variate procedures has to be adapted somewhat. For the purpose of this paper we nevertheless assume normal distributions for convenience, so that only the corresponding distribution parameters have to be estimated.

Moreover, since the densities have to be estimated from the observations, the true expected misclassification rate has to be estimated by means of a resampling method in order to avoid overoptimism. As the resampling method we use leave-one-out cross-validation, i.e. we eliminate one observation, estimate the densities from the remaining observations, and predict the class of the eliminated observation by means of these estimated densities. This is done for each observation.

This causes two problems. First, the extra loop for resampling leads to such a big computational effort that the number of replicates of the whole Monte Carlo experiment is reduced to V = 10 for simulation 1 and to V = 5 otherwise. Second, the exact expectations in procedures 1 and 2 should not have to be computed for each resampled sample. Thus, we propose to compute the exact expectation from the "observed" sample only, and to use this value for all resampled samples as well. Moreover, for the purpose of the simulations for this paper, we decided to use the exact expectations from the densities used to generate the observations in order to reduce the computational effort. Finally, we used the same example densities as in the preceding section to be able to judge the extra variance caused by parameter estimation.

The results of the Monte Carlo simulations can be found in Table 2. Note the increase of variance and the very small correlation ρ with procedure 2 in the third simulation. The optimal standard deviation reachable by (8) in this case would thus be 2.53, which is only very slightly lower than the 2.57 of the naive Monte Carlo. Since, nevertheless, the results are very similar to the results of the simulations with known densities, the conclusions from the last subsection appear to be valid also in the case of density parameters that have to be estimated.
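A sketch of the leave-one-out loop with QDA parameter estimation (Python; the names are ours, and the control variate bookkeeping as well as the W and V replication loops are omitted):

    import numpy as np
    from scipy.stats import multivariate_normal

    def loo_error(X, y, groups):
        """Leave-one-out cross-validated error rate: refit the group means and
        covariance matrices without the held-out observation, then allocate it
        by rule (1); equal a-priori probabilities are assumed."""
        n, errors = len(y), 0
        for k in range(n):
            keep = np.arange(n) != k
            dens = []
            for g in groups:
                Xg = X[keep & (y == g)]
                mu = Xg.mean(axis=0)
                S = np.cov(Xg, rowvar=False)   # per-group covariance (QDA)
                dens.append(multivariate_normal(mu, S).pdf(X[k]))
            errors += int(groups[int(np.argmax(dens))] != y[k])
        return errors / n

    # data drawn from the first simulation's densities, as an illustration
    rng = np.random.default_rng(3)
    X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), size=100),
                   rng.multivariate_normal([2, 0], [[1, 0.5], [0.5, 1]], size=100)])
    y = np.repeat([0, 1], 100)
    print(loo_error(X, y, [0, 1]))  # cross-validated QDA error rate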