P. Deb, P.K. Trivedi / Journal of Health Economics 21 (2002) 601–625, 607

heterogeneity in a finite, usually small, number of latent classes, each of which may be regarded as a "type" or a "group". Second, the finite mixture approach is semiparametric: it does not require any distributional assumptions for the mixing variable.
The approach is an alternative to either nonparametric estimation or forcing the data through the straitjacket of a one-component parametric density. Third, the results of Laird (1978) and Heckman and Singer (1984) suggest that estimates of such finite mixture models may provide good numerical approximations even if the underlying mixing distribution is continuous. The structure of the moments given above shows how the mixture model "decomposes" the information contained in a one-component model.[3] Finally, the choice of a continuous mixing density for some parametric count models is sometimes restrictive and computationally intractable because the marginal density may not have an analytical solution.

Note that in the NB model, the response of E(y) to a covariate x is fixed by exp(xβ). If observations in the right tail have a different response to changes in x, the NB model could not capture that effect. The TPM loosens the parametric straitjacket by allowing different parameters for Pr(y = 0) and E(y|y > 0). However, the TPM is not likely to capture differential responses to changes in x in the right tail of the distribution because the response of E(y|y > 0) to a covariate x is fixed by exp(xβ). In the LCM, the response of E(y) to a covariate x is determined by two or more sets of interactions between parameters and covariates (depending on the number of components), and is therefore more likely to accommodate differential responsiveness.[4]

A finite mixture characterization is especially attractive if the mixture components have a natural interpretation. However, this is not essential. A finite mixture may be simply a way of flexibly and parsimoniously modeling the data, with each mixture component providing a local approximation to some part of the true distribution. A caveat to the foregoing discussion is that the LCM may fit the data better simply because outliers, influential observations or contaminated observations are present in the data.
The LCM will capture this phenomenon through additional mixture components. Hence it is desirable that the hypothesis of the LCM should be supported both by a priori reasoning and by meaningful a posteriori differences in the behavior of latent classes.

2.5. Maximum likelihood and cluster-robust standard errors

Both TPM and LCM are estimated using (pseudo) maximum likelihood. The standard TPM is computationally simple because the two parts of the likelihood function can be estimated separately. On the other hand, estimation of the LCM is not straightforward. A comprehensive discussion of maximum likelihood estimation of the LCM can be found in McLachlan and Peel (2000). The likelihood functions of finite mixture models can have multiple local maxima, so it is important to ensure that the algorithm converges to the global maximum. In general, random perturbation or grid search techniques, or algorithms such as simulated annealing (Goffe et al., 1994), designed to seek the global optimum, may be utilized. In this study, to ensure against the possibility of achieving

[3] Lindsay (1995) provides a detailed theoretical analysis; Haughton (1997) surveys computational issues and available software.
[4] We thank an anonymous referee for suggesting this intuition.
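The differential responsiveness discussed above can be illustrated numerically. The sketch below (not the authors' code; the two-covariate design, mixing probabilities, and coefficient values are all hypothetical) contrasts a one-component mean exp(xβ), whose semi-elasticity with respect to a covariate is the constant coefficient everywhere, with a two-component mixture mean Σ_j π_j exp(xβ_j), whose semi-elasticity varies with x between the component-specific coefficients:

```python
import numpy as np

# Hypothetical two-component mixture: component j has mean exp(x @ beta_j).
pi = np.array([0.7, 0.3])             # mixing probabilities (illustrative)
beta = np.array([[0.1, 0.2],          # beta_1: "low-user" class
                 [1.0, 0.9]])         # beta_2: "high-user" class

def mixture_mean(x):
    """E(y|x) = sum_j pi_j * exp(x @ beta_j)."""
    return float(pi @ np.exp(beta @ x))

def nb_mean(x, b):
    """One-component mean exp(x @ b): d log E / dx_k = b_k everywhere."""
    return float(np.exp(x @ b))

x = np.array([1.0, 0.5])              # constant plus one covariate
eps = 1e-6

# Numerical semi-elasticity d log E(y|x)/dx_1 for the mixture: it varies
# with x because the weights on the component means exp(x @ beta_j) shift.
dx = np.array([0.0, eps])
grad = (np.log(mixture_mean(x + dx)) - np.log(mixture_mean(x))) / eps
print(grad)   # lies strictly between beta[0, 1] = 0.2 and beta[1, 1] = 0.9

# For the one-component mean the same derivative is exactly the coefficient.
g_nb = (np.log(nb_mean(x + dx, beta[0])) - np.log(nb_mean(x, beta[0]))) / eps
print(g_nb)   # equals 0.2 up to floating-point error
```

Moving x into the right tail shifts weight toward the high-mean component, pulling the mixture's semi-elasticity toward that component's coefficient; the one-component model has no such mechanism.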
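The multi-start strategy for guarding against local maxima can be sketched as follows. This is an illustrative EM routine for a two-component Poisson mixture on simulated data, not the authors' estimation procedure; the component means, mixing probability, number of starting values, iteration count, and clipping constants are all assumptions:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Simulate from a two-component Poisson mixture (illustrative parameters):
# with probability 0.6 draw from Poisson(1), otherwise from Poisson(6).
n = 2000
z = rng.random(n) < 0.6
y = np.where(z, rng.poisson(1.0, n), rng.poisson(6.0, n)).astype(float)

lgamma = np.vectorize(math.lgamma)

def log_pois(y, lam):
    # Poisson log pmf: y*log(lam) - lam - log(y!)
    return y * np.log(lam) - lam - lgamma(y + 1.0)

def em_fit(y, pi0, lam0, iters=300):
    """EM for a two-component Poisson mixture from one starting value."""
    pi, lam = pi0, np.array(lam0, dtype=float)
    for _ in range(iters):
        # E-step: posterior class-membership probabilities
        lw = np.stack([np.log(pi) + log_pois(y, lam[0]),
                       np.log(1.0 - pi) + log_pois(y, lam[1])])
        m = lw.max(axis=0)
        ll_i = m + np.log(np.exp(lw - m).sum(axis=0))   # log-sum-exp
        w = np.exp(lw - ll_i)
        # M-step, clipped to keep components away from degeneracy
        pi = float(np.clip(w[0].mean(), 1e-6, 1 - 1e-6))
        lam = np.clip(w @ y / w.sum(axis=1), 1e-6, None)
    return ll_i.sum(), pi, lam

# Several random starting values; keep the run with the best log likelihood.
fits = [em_fit(y, rng.uniform(0.2, 0.8), rng.uniform(0.5, 8.0, size=2))
        for _ in range(10)]
best_ll, best_pi, best_lam = max(fits, key=lambda f: f[0])
print(best_ll, best_pi, np.sort(best_lam))
```

Random restarts are the crudest of the safeguards listed above; grid searches over starting values or simulated annealing (Goffe et al., 1994) pursue the same goal more systematically.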