P. Deb, P.K. Trivedi / Journal of Health Economics 21 (2002) 601–625

and let $H = [A \;\; B]$. Then

$$\mathrm{GoF} = \mathbf{1}' H (H'H)^{+} H' \mathbf{1} \tag{2.12}$$

where $\mathbf{1}$ is a column vector of ones, i.e. GoF is $NR^2$ from the regression of $\mathbf{1}$ on $H$ (see Andrews, 1988a, Appendix 5, for details). Our implementation of the GoF adjusts for cluster effects by first summing the elements of $H$ within clusters.

There is an important point of detail concerning our use of the above test. Because our sample size is quite large, a model comparison based on significance tests of fixed size is likely to lead to the rejection of all models we will consider. This is a well-known difficulty of hypothesis testing with fixed significance levels in the classical framework.
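As an illustration only, the quadratic form in Eq. (2.12) can be sketched in a few lines of NumPy. The sketch assumes that the matrix H has already been built (one row per observation), that the superscript + denotes the Moore-Penrose pseudoinverse, and that cluster membership is given as a vector of labels. The function name `gof_statistic` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def gof_statistic(H, clusters=None):
    """Compute 1'H (H'H)^+ H'1 for an (N x q) matrix H.

    Assumptions (not from the paper): H is supplied by the caller, and
    `clusters` is an optional length-N vector of cluster labels. When it is
    given, the rows of H are summed within clusters before the quadratic
    form is evaluated.
    """
    H = np.asarray(H, dtype=float)
    if clusters is not None:
        labels, inverse = np.unique(np.asarray(clusters), return_inverse=True)
        inverse = inverse.reshape(-1)
        summed = np.zeros((labels.size, H.shape[1]))
        # Accumulate each row of H into the row belonging to its cluster.
        np.add.at(summed, inverse, H)
        H = summed
    column_sums = H.sum(axis=0)  # this is H'1
    # pinv is assumed to be the generalized inverse meant by the "+" superscript.
    return float(column_sums @ np.linalg.pinv(H.T @ H) @ column_sums)
```

Without the cluster adjustment, the value returned equals N times the uncentered R² from a least-squares regression of a vector of ones on H, which is the interpretation stated after Eq. (2.12).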
Moreover, previous investigations of the properties of this test suggest that, at conventional critical values, it leads to overrejection of the true null (Deb and Trivedi, 1997; Cameron and Trivedi, 1998). However, it seems appropriate to rank models by the P-values associated with the test, the model with the largest P-value being preferred. Hence in addition to the formal $\chi^2$-test we use the size of the statistic informally as a measure of fit with a smaller statistic indicating better fit. Furthermore, we also graphically compare empirical and fitted cell probabilities.

2.7. Cross-validation

A common criticism of in-sample model selection methods is that they induce over-fitting in the case of complicated models. Consequently, the selected model may not be the best model. This bias can be avoided by treating one sample as a “training sample” used for estimation, and then using a second “hold-out” sample for forecast comparison.

Using parameter estimates from the training sample, we calculate three measures of performance for each model using the hold-out sample. The log-likelihood value is the most direct measure of the out-of-sample fit of the model. In order to continue to penalize models with large numbers of parameters, we also use the AIC. We do not use the BIC because it adds a penalty for the sample size in addition to a penalty for the number of parameters which is not appropriate in a cross-validation exercise. Finally, we use a modified version of the Andrews statistic as a heuristic with the expectation that models with better fit will have smaller values of the modified GoF in the hold-out sample. We modify Eq. (2.12) to

$$\mathrm{MGoF} = \mathbf{1}' A (A'A)^{+} A' \mathbf{1}$$

where $H$ is replaced by $A$, i.e. we assume that the parameters exactly maximized the likelihood function in the training sample.

3. Data and summary statistics

We use data from the RHIE for this study.
The experiment, conducted by the RAND Corporation from 1974 to 1982, is the longest and largest controlled social experiment in medical care research. The main goal of the experiment was to assess how a patient’s use of health services is affected by types of health insurance, including both fee-for-service and health maintenance organizations (HMOs). In the RHIE, data were collected from about 8000 enrollees in 2823 families, from six sites across the country. Each family was enrolled in one of fourteen different HIS insurance plans for either 3 or 5 years. The plans