Limited Dependent Variables
Economics 20 - Prof. Anderson

$P(y = 1 \mid x) = G(\beta_0 + x\beta)$
$y^* = \beta_0 + x\beta + u$, $y = \max(0, y^*)$

Binary Dependent Variables

- Recall the linear probability model (LPM), which can be written as $P(y = 1 \mid x) = \beta_0 + x\beta$.
- A drawback of the linear probability model is that its predicted values are not constrained to lie between 0 and 1.
- An alternative is to model the probability as a function $G(\beta_0 + x\beta)$, where $0 < G(z) < 1$.
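To make the drawback concrete, here is a minimal sketch that fits an LPM by one-regressor OLS and inspects the fitted values. The data are hypothetical toy numbers chosen for illustration, and `ols_simple` is a helper name introduced here, not something from the course.

```python
# Hypothetical toy data: one regressor x and a binary outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


b0, b1 = ols_simple(x, y)

# In the LPM the fitted values are read as estimates of P(y = 1 | x).
fitted = [b0 + b1 * xi for xi in x]

print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print(f"smallest fitted value = {min(fitted):.4f}")
print(f"largest fitted value  = {max(fitted):.4f}")
```

With these numbers the fitted "probabilities" fall below 0 at the low end of x and above 1 at the high end, which is exactly the problem that a function $G$ bounded between 0 and 1 avoids.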

The Probit Model

- One choice for $G(z)$ is the standard normal cumulative distribution function (cdf).
- $G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv$, where $\phi(z)$ is the standard normal density, so $\phi(z) = (2\pi)^{-1/2}\exp(-z^2/2)$.
- This case is referred to as a probit model.
- Since it is a nonlinear model, it cannot be estimated by our usual methods.
- Use maximum likelihood estimation instead.
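As a small illustration of the probit response probability, the sketch below evaluates $\Phi$ and $\phi$ using only the Python standard library, via the identity $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The coefficient values are hypothetical, and the function names are introduced here for illustration.

```python
import math


def norm_pdf(z):
    """Standard normal density: (2*pi)^(-1/2) * exp(-z^2 / 2)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(b0, b1, x):
    """P(y = 1 | x) = Phi(b0 + b1 * x) for a single regressor."""
    return norm_cdf(b0 + b1 * x)


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.5
for x in (-4, 0, 2, 4, 8):
    print(f"x = {x:>2}: P(y = 1 | x) = {probit_prob(b0, b1, x):.4f}")
```

Unlike the LPM, every value printed lies strictly between 0 and 1, however extreme the index $\beta_0 + x\beta$ becomes within this range.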

The Logit Model

- Another common choice for $G(z)$ is the logistic function, which is the cdf of a standard logistic random variable.
- $G(z) = \exp(z)/[1 + \exp(z)] = \Lambda(z)$
- This case is referred to as a logit model, or sometimes as a logistic regression.
- Both functions have similar shapes: they are increasing in $z$, most quickly around 0.
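The similarity in shape can be checked numerically. The sketch below tabulates $\Lambda(z)$ next to $\Phi(z)$ and evaluates the logistic density $\Lambda(z)[1 - \Lambda(z)]$, which is largest at $z = 0$. The function names are introduced here for illustration.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z)), computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_pdf(z):
    """Derivative of Lambda(z), which equals Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def norm_cdf(z):
    """Standard normal cdf, for comparison with the logistic cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


grid = [-3, -2, -1, 0, 1, 2, 3]
print(" z   Lambda(z)   Phi(z)   slope of Lambda")
for z in grid:
    print(f"{z:>2}   {logistic_cdf(z):.4f}      {norm_cdf(z):.4f}   {logistic_pdf(z):.4f}")
```

Both columns rise from near 0 to near 1, and the slope column peaks at $z = 0$, where it equals 0.25.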

Probits and Logits

- Both the probit and logit are nonlinear and require maximum likelihood estimation.
- There is no real reason to prefer one over the other.
- Traditionally the logit was seen more often, mainly because the logistic function leads to a more easily computed model.
- Today, probit is easy to compute with standard packages, so it has become more popular.
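To show what maximum likelihood estimation involves, here is a minimal sketch of a logit fit by Newton's method for a single regressor. It is not how any particular package implements estimation; the data are hypothetical, and `fit_logit`, `score`, and `log_likelihood` are names introduced here. The log-likelihood being maximized is $\sum_i \{ y_i \log \Lambda(\beta_0 + \beta_1 x_i) + (1 - y_i)\log[1 - \Lambda(\beta_0 + \beta_1 x_i)] \}$.

```python
import math

# Hypothetical toy data (the outcomes overlap in x, so the MLE is finite).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]


def logistic_cdf(z):
    """Lambda(z), computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(b0, b1):
    """Logit log-likelihood at (b0, b1)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def score(b0, b1):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        resid = yi - logistic_cdf(b0 + b1 * xi)
        g0 += resid
        g1 += resid * xi
    return g0, g1


def fit_logit(max_iter=50, tol=1e-10):
    """Newton's method: beta <- beta + (negative Hessian)^(-1) * score."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1 = score(b0, b1)
        # Entries of the 2x2 negative Hessian, sum of p(1-p) * [1, x]'[1, x].
        h00 = h01 = h11 = 0.0
        for xi in x:
            p = logistic_cdf(b0 + b1 * xi)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


b0_hat, b1_hat = fit_logit()
print(f"b0_hat = {b0_hat:.4f}, b1_hat = {b1_hat:.4f}")
print(f"log-likelihood at the estimates = {log_likelihood(b0_hat, b1_hat):.4f}")
```

A probit fit follows the same pattern with $\Phi$ in place of $\Lambda$, although its score and Hessian are less tidy, which is part of why the logit was historically easier to compute.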

Interpretation of Probits and Logits (in particular vs. LPM)

- In general we care about the effect of $x$ on $P(y = 1 \mid x)$, that is, we care about $\partial p/\partial x$.
- For the linear case, this is easily computed as the coefficient on $x$.
- For the nonlinear probit and logit models, it's more complicated: $\partial p/\partial x_j = g(\beta_0 + x\beta)\beta_j$, where $g(z)$ is $dG/dz$.
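The formula $\partial p/\partial x_j = g(\beta_0 + x\beta)\beta_j$ can be sketched directly. In the example below, $g$ is $\phi$ for the probit and $\Lambda(1 - \Lambda)$ for the logit; the coefficients and the evaluation point are hypothetical, and the function names are introduced here for illustration.

```python
import math


def norm_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def index(b0, b, x):
    """The linear index b0 + x*b."""
    return b0 + sum(bk * xk for bk, xk in zip(b, x))


def marginal_effect_probit(b0, b, x, j):
    """dP/dx_j = phi(b0 + x*b) * b_j."""
    return norm_pdf(index(b0, b, x)) * b[j]


def marginal_effect_logit(b0, b, x, j):
    """dP/dx_j = Lambda(z) * (1 - Lambda(z)) * b_j, with z = b0 + x*b."""
    p = logistic_cdf(index(b0, b, x))
    return p * (1.0 - p) * b[j]


# Hypothetical coefficients and evaluation point (for example, sample means).
b0, b = -1.0, [0.5, -0.2]
x_eval = [2.0, 1.0]

for j in range(len(b)):
    print(f"x_{j + 1}: probit effect = {marginal_effect_probit(b0, b, x_eval, j):.4f}, "
          f"logit effect = {marginal_effect_logit(b0, b, x_eval, j):.4f}")
```

Because $g$ is always positive, the sign of the effect matches the sign of $\beta_j$, but its size depends on where $x$ is evaluated. At an index of zero the scale factor is about 0.4 for the probit and exactly 0.25 for the logit, compared with 1 for the LPM.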

Interpretation (continued)

- It is clearly incorrect to just compare the coefficients across the three models.
- We can still compare the sign and significance (based on a standard t test) of the coefficients, though.
- To compare the magnitude of effects, we need to calculate the derivatives, say at the means.
- Stata will do this for you in the probit case.

The Likelihood Ratio Test

- Unlike the LPM, where we can compute F statistics or LM statistics to test exclusion restrictions, we need a new type of test.
- Maximum likelihood estimation (MLE) will always produce a log-likelihood, $L$.
- Just as in an F test, you estimate the restricted and unrestricted models, then form $LR = 2(L_{ur} - L_r) \sim \chi^2_q$, where $q$ is the number of restrictions being tested.
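The test can be sketched in a few lines once the two log-likelihoods are in hand. The example below computes $LR = 2(L_{ur} - L_r)$ and its $\chi^2_q$ p-value using only the standard library. The log-likelihood values are hypothetical, and `chi2_sf` and `lr_test` are helper names introduced here; `chi2_sf` uses the closed-form upper-tail expressions for integer degrees of freedom.

```python
import math


def chi2_sf(stat, q):
    """Upper-tail probability P(chi2_q > stat) for integer q >= 1."""
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    if q % 2 == 0:
        # Even q: exp(-half) * sum_{k=0}^{q/2 - 1} half^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, q // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd q: erfc(sqrt(half)) plus a finite sum of gamma-function terms.
    total = math.erfc(math.sqrt(half))
    for k in range((q - 1) // 2):
        total += math.exp(-half) * half ** (k + 0.5) / math.gamma(k + 1.5)
    return total


def lr_test(ll_unrestricted, ll_restricted, q):
    """Return the LR statistic and its chi-square p-value with q restrictions."""
    lr = 2.0 * (ll_unrestricted - ll_restricted)
    return lr, chi2_sf(lr, q)


# Hypothetical log-likelihoods from an unrestricted and a restricted fit,
# where the restricted model drops q = 2 regressors.
lr, p_value = lr_test(ll_unrestricted=-100.0, ll_restricted=-104.0, q=2)
print(f"LR = {lr:.3f}, p-value = {p_value:.4f}")
```

Since the unrestricted log-likelihood can never be lower than the restricted one, the statistic is nonnegative, and a small p-value is evidence against the exclusion restrictions.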

Goodness of Fit

- Unlike the LPM, where we can compute an $R^2$ to judge goodness of fit, we need new measures of goodness of fit.
- One possibility is a pseudo $R^2$ based on the log likelihood and defined as $1 - L_{ur}/L_r$.
- We can also look at the percent correctly predicted: a predicted probability above .5 is treated as a prediction of $y = 1$ (and otherwise of $y = 0$), and the prediction counts as correct when it matches the observed $y$.
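Both measures are easy to compute from fitted probabilities. The sketch below assumes that the restricted model is the intercept-only model, whose fitted probability is the sample mean of $y$; that is the usual convention for this pseudo $R^2$, but it is an assumption here because the slide only writes $L_r$. The outcomes and fitted probabilities are hypothetical, and the function names are introduced for illustration.

```python
import math

# Hypothetical outcomes and fitted probabilities from some probit or logit fit.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p_hat = [0.10, 0.20, 0.60, 0.40, 0.70, 0.80, 0.55, 0.90]


def log_likelihood(y, p):
    """Binary-response log-likelihood for outcomes y and probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def pseudo_r2(ll_unrestricted, ll_restricted):
    """Pseudo R-squared: 1 - L_ur / L_r."""
    return 1.0 - ll_unrestricted / ll_restricted


def percent_correct(y, p, cutoff=0.5):
    """Share of observations (in percent) where the 0/1 prediction matches y."""
    hits = sum(1 for yi, pi in zip(y, p) if (pi > cutoff) == (yi == 1))
    return 100.0 * hits / len(y)


# Assumption: the restricted model has only an intercept, so every
# observation gets the same fitted probability, the sample mean of y.
y_bar = sum(y) / len(y)
ll_ur = log_likelihood(y, p_hat)
ll_r = log_likelihood(y, [y_bar] * len(y))

print(f"L_ur = {ll_ur:.4f}, L_r = {ll_r:.4f}")
print(f"pseudo R-squared = {pseudo_r2(ll_ur, ll_r):.4f}")
print(f"percent correctly predicted = {percent_correct(y, p_hat):.1f}")
```

Both log-likelihoods are negative and $|L_{ur}| \le |L_r|$ when the unrestricted model nests the restricted one, so the pseudo $R^2$ lies between 0 and 1.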

Latent Variables

- Sometimes binary dependent variable models are motivated through a latent variables model.
- The idea is that there is an underlying variable $y^*$ that can be modeled as $y^* = \beta_0 + x\beta + e$, but we only observe $y = 1$ if $y^* > 0$, and $y = 0$ if $y^* \le 0$.
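The link between the latent variable and the response probability can be checked by simulation. If $e$ is standard normal (an assumption made here, which yields the probit), then $P(y = 1 \mid x) = P(e > -(\beta_0 + x\beta)) = 1 - \Phi(-(\beta_0 + x\beta)) = \Phi(\beta_0 + x\beta)$ by symmetry. The sketch below draws $y^*$ at a fixed $x$ and compares the share of $y = 1$ with $\Phi(\beta_0 + x\beta)$; the coefficients, sample size, and seed are hypothetical choices.

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_share(b0, b1, x, n, rng):
    """Share of draws with y = 1, where y = 1 exactly when y* > 0."""
    ones = 0
    for _ in range(n):
        y_star = b0 + b1 * x + rng.gauss(0.0, 1.0)  # latent variable
        if y_star > 0:
            ones += 1
    return ones / n


# Hypothetical parameter values and a fixed seed for reproducibility.
rng = random.Random(20)
b0, b1, x = -0.5, 0.8, 1.0

share = simulate_share(b0, b1, x, 100_000, rng)
print(f"simulated share of y = 1: {share:.4f}")
print(f"Phi(b0 + b1 * x):         {norm_cdf(b0 + b1 * x):.4f}")
```

Drawing $e$ from a standard logistic distribution instead would give the logit response probability $\Lambda(\beta_0 + x\beta)$ by the same argument.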