EC2610, Fall 2004

GMM Notes for EC2610

1 Introduction

These notes provide an introduction to GMM estimation. Their primary purpose is to make the reader familiar enough with GMM to be able to solve problem set assignments. For the more theoretical foundations, properties, and extensions of GMM, or to better understand its workings, the interested reader should consult any of the standard graduate econometrics textbooks, e.g., by Greene, Wooldridge, Hayashi, or Hamilton, as well as the original GMM article by Hansen (1982). Available lecture notes for graduate econometrics courses, e.g. by Chamberlain (Ec 2140) and by Pakes and Porter (Ec 2144), also contain very useful reviews of GMM.

The Generalized Method of Moments provides asymptotic properties for estimators and is general enough to include many other commonly used techniques, like OLS and ML. Having such an umbrella encompassing many estimators is very useful, as one does not have to derive each estimator's properties separately. With such a wide range, it is not surprising to see GMM used extensively, but one should also be careful about when it is appropriate to apply. Since GMM deals with asymptotic properties, it works well for large samples, but it does not provide an answer when the sample size is small, nor does it say what a "large enough" sample size is. Also, when applying GMM, one may forgo certain desirable properties, like efficiency.

2 GMM Framework

2.1 Definition of GMM Estimator

Let $x_i$, $i = 1, \dots, n$, be i.i.d. random draws from the unknown population distribution $P$. For a known function $\psi$, the parameter $\theta_0 \in \Theta$ (usually also in the interior of $\Theta$) is known to satisfy the key moment condition:

$$E[\psi(x_i, \theta_0)] = 0 \qquad (1)$$

This equation provides the core of GMM estimation. The appropriate function $\psi$ and the parameter $\theta_0$ are usually derived from a theoretical model. Both $\psi$ and $\theta_0$ can be vector valued and not necessarily of the same size. Let the size of $\psi$ be $q$, and the size of $\theta$ be $p$. The mean is 0 only at the true parameter value $\theta_0$, which is assumed to be unique over some neighborhood around $\theta_0$.
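For concreteness, the simplest illustration of this setup is estimating a population mean. Take the moment function

$$\psi(x_i, \theta) = x_i - \theta, \qquad E[\psi(x_i, \theta_0)] = E[x_i] - \theta_0 = 0,$$

so that $q = p = 1$; the sample counterpart $\frac{1}{n}\sum_i (x_i - \hat\theta) = 0$ then delivers $\hat\theta = \bar{x}$, the sample mean.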
Along with equation (1), one also imposes certain boundedness conditions on the second moment and on the partial derivatives:

$$E[\psi(x_i, \theta_0)\,\psi(x_i, \theta_0)'] \equiv \Phi < \infty$$

and

$$\left| \frac{\partial^2 \psi_j(x, \theta)}{\partial \theta_k \, \partial \theta_l} \right| \le m(x)$$

for all $\theta \in \Theta$, where $E[m(x)] < \infty$.

The sample analog of equation (1) is

$$\frac{1}{n}\sum_i \psi(x_i, \hat\theta) = 0 \qquad (2)$$

If $q = p$, this system has as many equations as unknowns and can usually be solved exactly. If $q > p$, then we're "over-identified," and a solution will not exist for most functions $\psi$. A natural approach for the latter case might be to try to get the left-hand side as close to 0 as possible, with "closeness" defined over some norm $\|\cdot\|_{A_n}$:

$$\|y\|_{A_n} = y' A_n^{-1} y$$

where $A_n$ is a $q$-by-$q$ symmetric, positive definite matrix. Another approach could be to find the solution to (2) by making some linear combination of the $\psi_j$ equations equal to 0. That is, for some $p$-by-$q$ matrix $C_n$ of rank $p$, solve:

$$C_n \left[ \frac{1}{n}\sum_i \psi(x_i, \hat\theta) \right] = 0 \qquad (3)$$

which gives us $p$ equations with $p$ unknowns.

In fact, both approaches are equivalent, and GMM estimation is set up to do exactly that. That is, when $p = q$, GMM is just-identified and we can usually solve for $\hat\theta$ exactly. When $q > p$, we're in the over-identified case and, for some appropriate matrix $A_n$ (or $C_n$), the GMM estimate $\hat\theta$ is found by:

$$\hat\theta = \arg\min_{\theta \in \Theta} \left[ \frac{1}{n}\sum_i \psi(x_i, \theta) \right]' A_n^{-1} \left[ \frac{1}{n}\sum_i \psi(x_i, \theta) \right] \qquad (4)$$

(or, equivalently, by solving equation (3)). The choice of $A_n$ will be discussed later, but for now assume $A_n \to \Psi$ a.s., where $\Psi$ is also symmetric and positive definite.
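To make the objective in equation (4) concrete, here is a minimal numerical sketch. It assumes NumPy and SciPy are available and uses a hypothetical over-identified design: draws from an exponential distribution with mean $\theta_0$, with the two moments $E[x - \theta] = 0$ and $E[x^2 - 2\theta^2] = 0$, so $q = 2$ and $p = 1$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical over-identified example (q = 2 moments, p = 1 parameter):
# x_i ~ Exponential with mean theta0, so E[x] = theta0 and E[x^2] = 2*theta0^2.
rng = np.random.default_rng(0)
theta0 = 2.0
x = rng.exponential(scale=theta0, size=5000)

def psi(x, theta):
    """Stacked moment functions; returns an (n, q) array."""
    return np.column_stack([x - theta, x**2 - 2.0 * theta**2])

def gmm_objective(theta, x, A_n):
    """Sample objective from equation (4): gbar' A_n^{-1} gbar."""
    gbar = psi(x, theta).mean(axis=0)          # (1/n) sum_i psi(x_i, theta)
    return gbar @ np.linalg.solve(A_n, gbar)   # quadratic form with A_n^{-1}

A_n = np.eye(2)                                # equal weighting for now
res = minimize_scalar(gmm_objective, bounds=(0.1, 10.0), args=(x, A_n),
                      method="bounded")
print("GMM estimate:", res.x)                  # should be close to theta0
```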
2.2 Asymptotic Properties of GMM

Given the above setup, GMM provides two key results: consistency and asymptotic normality. Consistency shows that our estimator gives us the "right" answer, and asymptotic normality provides us with a variance-covariance matrix, which we can use for hypothesis testing. More specifically, the estimator $\hat\theta$ found via equation (3) satisfies $\hat\theta \to \theta_0$ a.s. (consistency), and

$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Lambda) \qquad (5)$$

(asymptotic normality), where

$$\Lambda = \Gamma \Phi \Gamma' \qquad \text{and} \qquad \Gamma = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}$$

(Looking at the above properties, one can draw obvious similarities between the GMM estimator and the Delta Method.)

To do hypothesis testing, let $\overset{A}{\sim}$ denote the asymptotic distribution. Then, equation (5) implies:

$$\hat\theta \overset{A}{\sim} N\!\left(\theta_0, \tfrac{1}{n}\Lambda\right)$$

where

$$\Lambda = \Gamma \Phi \Gamma' = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}\, \Phi\, \Psi^{-1} D\, (D'\Psi^{-1}D)^{-1} \qquad (6)$$

$\Phi$ and $D$ are population means defined at the true parameter value, and $\Psi$ is the probability limit of $A_n$. When computing the variance matrix for a given sample, one usually replaces the population means with sample means, the true parameter value with the estimated value, and $\Psi$ with $A_n$:

$$\Phi = E[\psi(x_i; \theta_0)\,\psi(x_i; \theta_0)'] \;\approx\; \hat\Phi = \frac{1}{n}\sum_i \psi(x_i; \hat\theta)\,\psi(x_i; \hat\theta)'$$

$$D = E\!\left[\frac{\partial \psi(x_i; \theta_0)}{\partial \theta'}\right] \;\approx\; \hat D = \frac{1}{n}\sum_i \left.\frac{\partial \psi(x_i; \theta)}{\partial \theta'}\right|_{\theta = \hat\theta}$$

$$\Psi \approx A_n$$

The standard errors are obtained from:

$$SE_k = \sqrt{\tfrac{1}{n}\hat\Lambda_{kk}}$$

where $\hat\Lambda_{kk}$ is the $k$th diagonal entry of $\hat\Lambda$.
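The plug-in formulas above take only a few lines of code. The sketch below reuses the hypothetical exponential example from the previous snippet; for simplicity it plugs in the sample mean as a stand-in for the GMM estimate (it is consistent here, even if it is not the exact minimizer of the identity-weighted objective), and it uses the analytic derivative of those particular moments.

```python
import numpy as np

# Sandwich variance for the hypothetical exponential example:
# psi(x, theta) = (x - theta, x^2 - 2*theta^2), so dpsi/dtheta = (-1, -4*theta).
rng = np.random.default_rng(0)
theta0 = 2.0
x = rng.exponential(scale=theta0, size=5000)
n = x.size

theta_hat = x.mean()                    # stand-in for the GMM estimate (p = 1)

# Phi_hat = (1/n) sum_i psi_i psi_i'    (q x q, here 2 x 2)
psi = np.column_stack([x - theta_hat, x**2 - 2.0 * theta_hat**2])
Phi_hat = psi.T @ psi / n

# D_hat = (1/n) sum_i dpsi_i/dtheta'    (q x p); constant across observations here
D_hat = np.array([[-1.0], [-4.0 * theta_hat]])

A_n = np.eye(2)                         # weighting matrix used in estimation
A_inv = np.linalg.inv(A_n)

# Gamma_hat = (D'A^{-1}D)^{-1} D'A^{-1},  Lambda_hat = Gamma Phi Gamma'
Gamma_hat = np.linalg.inv(D_hat.T @ A_inv @ D_hat) @ D_hat.T @ A_inv
Lambda_hat = Gamma_hat @ Phi_hat @ Gamma_hat.T

se = np.sqrt(np.diag(Lambda_hat) / n)   # SE_k = sqrt(Lambda_kk / n)
print("theta_hat:", theta_hat, "SE:", se)
```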
2.3 Optimal Weighting Matrices

2.3.1 Choice of $A_n$

Having established the properties of GMM, we now turn to the choice of the weighting matrices $A_n$ and $C_n$. When GMM is just-identified, one can usually solve for $\hat\theta$ from equation (2). This is equivalent to finding a unique minimum point in equation (4) for any positive definite matrix $A_n$. Also, $D$ will be square, and since it has full rank, it will be invertible. Then, the variance matrix will be:

$$\begin{aligned}
\Lambda &= \Gamma \Phi \Gamma' \\
&= (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}\, \Phi\, \Psi^{-1} D\, (D'\Psi^{-1}D)^{-1} \\
&= D^{-1}\Psi D'^{-1}\, D'\Psi^{-1}\, \Phi\, \Psi^{-1} D\, D^{-1}\Psi D'^{-1} \\
&= D^{-1}\, \Phi\, D'^{-1}
\end{aligned}$$

As expected, the choice of $A_n$ doesn't affect the asymptotic distribution in the just-identified case.

For the over-identified case, the choice of the weight matrix will now matter for $\hat\theta$. However, since the consistency and asymptotic normality results of GMM do not depend on the choice of $A_n$ (as long as it is symmetric and positive definite), we get our main results for any choice of $A_n$. In such a case, the most common choice is the identity matrix:

$$A_n = I_q$$

Then, $\Psi = I_q$ and

$$\Gamma = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} = (D'D)^{-1} D'$$

and the approximate variance-covariance matrix will be:

$$\frac{1}{n}\Lambda = \frac{1}{n}\Gamma \Phi \Gamma' = \frac{1}{n}(D'D)^{-1} D'\, \Phi\, D\, (D'D)^{-1}$$

(This is the format of the GMM variance-covariance matrix Prof. Pakes uses in the IO lecture notes.)

Given that one is free to choose which particular $A_n$ to use, one can try to pick the weighting matrix to give GMM other desirable properties as well, like efficiency. From equation (6), we know that:

$$\Lambda = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}\, \Phi\, \Psi^{-1} D\, (D'\Psi^{-1}D)^{-1}$$
Since we are now free to pick $\Psi$, one can choose it to minimize the variance:

$$\Psi^* = \arg\min_{\Psi} \Lambda = \arg\min_{\Psi}\; (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}\, \Phi\, \Psi^{-1} D\, (D'\Psi^{-1}D)^{-1}$$

It is easy to show that the minimum is equal to:

$$\min_{\Psi}\; (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}\, \Phi\, \Psi^{-1} D\, (D'\Psi^{-1}D)^{-1} = (D'\Phi^{-1}D)^{-1}$$

which is obtained at

$$\Psi = \Phi$$

(Substituting $\Psi = \Phi$ into the sandwich formula immediately gives $(D'\Phi^{-1}D)^{-1}D'\Phi^{-1}\Phi\Phi^{-1}D(D'\Phi^{-1}D)^{-1} = (D'\Phi^{-1}D)^{-1}$; showing that no other $\Psi$ does better takes a bit more algebra.) The above solution has very intuitive appeal: moments with larger variances are assigned smaller weights in the estimation.

2.3.2 2-Step GMM Estimation

The above result gives rise to 2-step GMM estimation, in the spirit of FGLS:

1. Pick $A_n = I$ (equal weighting), and solve for the 1st-stage GMM estimate $\hat\theta_1$. Since $\hat\theta_1$ is consistent, $\frac{1}{n}\sum_i \psi(x_i; \hat\theta_1)\,\psi(x_i; \hat\theta_1)'$ will be a consistent estimate of $\Phi$.

2. Pick $A_n = \frac{1}{n}\sum_i \psi(x_i; \hat\theta_1)\,\psi(x_i; \hat\theta_1)'$, and obtain the 2nd-stage GMM estimate $\hat\theta_2$. The variance matrix $\frac{1}{n}\hat\Lambda_{\hat\theta_2}$ will then be the smallest.
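A minimal sketch of the two-step recipe, again on the hypothetical exponential example and assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Two-step GMM on the hypothetical exponential example (q = 2, p = 1).
rng = np.random.default_rng(0)
theta0 = 2.0
x = rng.exponential(scale=theta0, size=5000)

def psi(x, theta):
    return np.column_stack([x - theta, x**2 - 2.0 * theta**2])

def gmm_estimate(x, A_n):
    """Minimize the objective from equation (4) over a bounded interval."""
    def objective(theta):
        gbar = psi(x, theta).mean(axis=0)
        return gbar @ np.linalg.solve(A_n, gbar)
    return minimize_scalar(objective, bounds=(0.1, 10.0), method="bounded").x

# Step 1: equal weighting gives a consistent first-stage estimate.
theta1 = gmm_estimate(x, np.eye(2))

# Step 2: weight by the estimated second moment of psi evaluated at theta1.
psi1 = psi(x, theta1)
A_n = psi1.T @ psi1 / x.size
theta2 = gmm_estimate(x, A_n)

print("first stage:", theta1, "second stage:", theta2)
```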
2.3.3 Choice of $C_n$

It should be clear by now how equations (3) and (4) are related to each other, and correspondingly, how $A_n$ and $C_n$ are related. By differentiating the minimization problem in equation (4), we obtain the FOC:

$$\left[ \frac{1}{n}\sum_i \frac{\partial \psi(x_i; \hat\theta)}{\partial \theta'} \right]' A_n^{-1} \left[ \frac{1}{n}\sum_i \psi(x_i; \hat\theta) \right] = 0 \qquad (7)$$

If we now define

$$C_n \equiv \left[ \frac{1}{n}\sum_i \frac{\partial \psi(x_i; \hat\theta)}{\partial \theta'} \right]' A_n^{-1}$$

equation (7) turns into (3).

One caveat should be pointed out. We specified that equation (3) is a linear combination of the $\psi_j(x; \hat\theta)$, i.e. $C_n$ is a matrix of constants. But in equation (7), $C_n$ will in general depend on the solution of the equation, $\hat\theta$. This can be easily circumvented if we look at the 2nd-stage GMM solution and use the 1st-stage estimate $\hat\theta_1$ in $C_n$. That is, in the second step we would normally solve:

$$\left[ \frac{1}{n}\sum_i \frac{\partial \psi(x_i; \hat\theta_2)}{\partial \theta'} \right]' A_n^{-1} \left[ \frac{1}{n}\sum_i \psi(x_i; \hat\theta_2) \right] = 0$$

where $A_n$ is obtained from the 1st stage. We can instead solve for a different 2nd-stage estimate $\hat\theta_2'$:

$$\left[ \frac{1}{n}\sum_i \frac{\partial \psi(x_i; \hat\theta_1)}{\partial \theta'} \right]' A_n^{-1} \left[ \frac{1}{n}\sum_i \psi(x_i; \hat\theta_2') \right] = 0$$

Since $\hat\theta_1$ is consistent and asymptotically normal, $\hat\theta_2'$ will once again be consistent and asymptotically normal, as well as efficient among the class of GMM estimators. And now $C_n$ is a matrix of constants when solving for $\hat\theta_2'$.

3 Applications of GMM

3.1 Ordinary Least Squares

Since GMM does not impose any restrictions on the functional form of $\psi$, it can easily be applied to simple linear as well as non-linear moment conditions. (It can also be extended to continuous but non-differentiable functions.) The usefulness of GMM is perhaps more evident for non-linear estimation, but one can become more familiar with GMM by drawing parallels with other standard techniques.

For the case of OLS, we have:

$$y_i = x_i'\beta + \varepsilon_i$$

with the zero covariance condition:

$$E(x_i \varepsilon_i) = 0$$

The latter is the key GMM moment condition, and can be rewritten as:

$$E(\psi(x_i, \beta)) = E\big(x_i\,(y_i - x_i'\beta)\big) = 0$$

The sample analog becomes:

$$\frac{1}{n}\sum_i x_i\,(y_i - x_i'\hat\beta) = 0$$

Since these are $k$ equations with $k$ unknowns, GMM is just-identified, with the unique solution:

$$\hat\beta_{GMM} = \left( \sum_i x_i x_i' \right)^{-1} \left( \sum_i x_i y_i \right)$$
which corresponds to the OLS solution. For the variance-covariance matrix we need to compute only $\Phi$ and $D$:

$$\Phi = E\big(\psi(x_i, \beta)\psi(x_i, \beta)'\big) = E(x_i \varepsilon_i \varepsilon_i x_i') = E(\varepsilon_i^2\, x_i x_i') \;\approx\; \hat\Phi = \frac{1}{n}\sum_i e_i^2\, x_i x_i'$$

where $e_i = y_i - x_i'\hat\beta$, and

$$D = E\!\left[\frac{\partial \psi(x_i, \beta_0)}{\partial \beta'}\right] = -E(x_i x_i') \;\approx\; \hat D = -\frac{1}{n}\sum_i x_i x_i'$$

(the minus sign cancels in the sandwich formula below). Then, the variance-covariance matrix will equal:

$$\begin{aligned}
\frac{1}{n}\hat\Lambda &= \frac{1}{n}\hat D^{-1}\hat\Phi\, \hat D'^{-1} \\
&= \frac{1}{n}\left( \frac{1}{n}\sum_i x_i x_i' \right)^{-1} \left( \frac{1}{n}\sum_i e_i^2\, x_i x_i' \right) \left( \frac{1}{n}\sum_i x_i x_i' \right)^{-1} \\
&= \left( \sum_i x_i x_i' \right)^{-1} \left( \sum_i e_i^2\, x_i x_i' \right) \left( \sum_i x_i x_i' \right)^{-1}
\end{aligned}$$

This is also known as the White formula for heteroskedasticity-consistent standard errors.

If we instead assume homoskedasticity, one can obtain a simpler version of the variance matrix. With homoskedasticity,

$$\Phi = E(\varepsilon_i^2\, x_i x_i') = E\big(E(\varepsilon_i^2 \mid x_i)\, x_i x_i'\big) = E\big(E(\varepsilon_i^2)\, x_i x_i'\big) = E(\varepsilon_i^2)\, E(x_i x_i') \;\approx\; \hat\Phi = \left( \frac{1}{n}\sum_i e_i^2 \right) \left( \frac{1}{n}\sum_i x_i x_i' \right)$$

Then,

$$\frac{1}{n}\hat\Lambda = \left( \frac{1}{n}\sum_i e_i^2 \right) \left( \sum_i x_i x_i' \right)^{-1}$$

which is the variance estimate for the homoskedastic case.
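The OLS-as-GMM mapping is easy to check numerically. The sketch below uses a made-up heteroskedastic data-generating process (purely for illustration) and computes the just-identified GMM/OLS estimate, the White sandwich standard errors, and the homoskedastic version:

```python
import numpy as np

# Hypothetical heteroskedastic design: two regressors (constant + one covariate).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([1.0, 0.5])
eps = rng.normal(size=n) * (1.0 + 0.5 * np.abs(X[:, 1]))   # heteroskedastic errors
y = X @ beta0 + eps

# GMM / OLS estimate: (sum x_i x_i')^{-1} (sum x_i y_i)
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# White variance: (sum xx')^{-1} (sum e^2 xx') (sum xx')^{-1}
e = y - X @ beta_hat
meat = (X * e[:, None]**2).T @ X                  # sum_i e_i^2 x_i x_i'
V_white = np.linalg.solve(XtX, meat) @ np.linalg.inv(XtX)
se_white = np.sqrt(np.diag(V_white))

# Homoskedastic variance: ((1/n) sum e^2) (sum xx')^{-1}
V_homo = (e @ e / n) * np.linalg.inv(XtX)
se_homo = np.sqrt(np.diag(V_homo))

print("beta_hat:", beta_hat)
print("White SEs:", se_white)
print("homoskedastic SEs:", se_homo)
```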
3.2 Instrumental Variables

Suppose again

$$y_i = x_i'\beta + \varepsilon_i$$

but for IV estimation we have

$$E(w_i \varepsilon_i) = 0$$

where $w_i$ is not necessarily equal to $x_i$. We only require $E(w_i x_i') \ne 0$ in order to be able to invert matrices. The sample analog now becomes:

$$\frac{1}{n}\sum_i w_i\,(y_i - x_i'\hat\beta) = 0$$

If $w_i$ and $x_i$ have the same dimension, then we're again in the just-identified case, with the unique solution:

$$\hat\beta_{GMM} = \left( \sum_i w_i x_i' \right)^{-1} \left( \sum_i w_i y_i \right) = \hat\beta_{IV}$$

If the number of instruments exceeds the number of right-hand-side variables, we're in the over-identified case. We can go ahead with 2-step estimation, but a particular choice of the weighting matrix deserves attention. If we set

$$A_n = \frac{1}{n}\sum_i w_i w_i'$$

the FOC for GMM becomes:

$$\left[ \frac{1}{n}\sum_i \frac{\partial \psi(x_i; \hat\beta)}{\partial \beta'} \right]' A_n^{-1} \left[ \frac{1}{n}\sum_i \psi(x_i; \hat\beta) \right] = 0$$

$$\left( \frac{1}{n}\sum_i w_i x_i' \right)' \left( \frac{1}{n}\sum_i w_i w_i' \right)^{-1} \left( \frac{1}{n}\sum_i w_i\,(y_i - x_i'\hat\beta) \right) = 0$$

Let

$$\hat\Pi = \left( \sum_i w_i w_i' \right)^{-1} \left( \sum_i w_i x_i' \right) \qquad (8)$$

$\hat\Pi$ is then the matrix of regression coefficients of $x_i$ on $w_i$. Then we have:

$$\hat\Pi' \left( \sum_i w_i\,(y_i - x_i'\hat\beta) \right) = 0 \qquad (9)$$

$$\sum_i \hat\Pi' w_i\,(y_i - x_i'\hat\beta) = 0$$

Note that

$$\hat\Pi' w_i = (w_i'\hat\Pi)' = \hat x_i \qquad \text{and} \qquad \sum_i \hat x_i x_i' = \sum_i \hat x_i \hat x_i'$$

where $\hat x_i$ are the fitted values of $x_i$ from (8). Equation (9) then becomes:

$$\sum_i \hat x_i\,(y_i - \hat x_i'\hat\beta) = 0$$

This is the solution to Two-Stage Least Squares (2SLS). The first stage is the regression of the right-hand-side variables on the instruments, and the second stage is the regression of the dependent variable on the fitted values of the right-hand-side variables. Thus, with GMM we're able to obtain the 2SLS estimates and their correct standard errors. (The usual way of presenting 2SLS is to regress only the "problematic" right-hand-side variables on the instruments, and then use their fitted values. The right-hand-side variables that are not correlated with the error term are included among the instruments, so their fitted values are equal to themselves. We're then running the exact same regressions.)
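The equivalence between GMM with $A_n = \frac{1}{n}\sum_i w_i w_i'$ and 2SLS can be checked numerically. The sketch below uses a hypothetical simulated design (one endogenous regressor, two excluded instruments, so $q = 3 > p = 2$) and computes the estimator both ways:

```python
import numpy as np

# Hypothetical IV design: x_endog is correlated with the error through u.
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=(n, 2))                      # two excluded instruments
u = rng.normal(size=n)
x_endog = w @ np.array([1.0, -0.5]) + u + rng.normal(size=n)
eps = u + rng.normal(size=n)                     # correlated with x_endog
X = np.column_stack([np.ones(n), x_endog])       # p = 2 right-hand-side variables
W = np.column_stack([np.ones(n), w])             # q = 3 instruments (q > p)
beta0 = np.array([1.0, 0.5])
y = X @ beta0 + eps

# GMM with A_n proportional to W'W:
# beta = (X'W (W'W)^{-1} W'X)^{-1} X'W (W'W)^{-1} W'y
WtW_inv = np.linalg.inv(W.T @ W)
XtW = X.T @ W
beta_gmm = np.linalg.solve(XtW @ WtW_inv @ W.T @ X, XtW @ WtW_inv @ W.T @ y)

# 2SLS: regress X on W (equation (8)), then y on the fitted values.
Pi_hat = np.linalg.solve(W.T @ W, W.T @ X)       # first-stage coefficients
X_fit = W @ Pi_hat
beta_2sls = np.linalg.solve(X_fit.T @ X_fit, X_fit.T @ y)

print("GMM:", beta_gmm)                          # the two should agree
print("2SLS:", beta_2sls)
```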
3.3 Maximum Likelihood

Suppose now that we know the family of distributions $p(\cdot; \theta)$ from which the $x_i$ are drawn, but do not know the true parameter value $\theta_0$. The Maximum Likelihood approach to finding $\theta_0$ is:

$$\hat\theta_{ML} = \arg\max_{\theta \in \Theta} p(x_1, \dots, x_n \mid \theta) = \arg\max_{\theta \in \Theta} \prod_i p(x_i \mid \theta)$$

The maximum point is invariant to monotonic transformations, and so:

$$\hat\theta_{ML} = \arg\max_{\theta \in \Theta} \log \prod_i p(x_i \mid \theta) = \arg\max_{\theta \in \Theta} \frac{1}{n}\sum_i \log p(x_i \mid \theta)$$

The FOC becomes:

$$\frac{1}{n}\sum_i \frac{\partial \log p(x_i \mid \hat\theta)}{\partial \theta'} = 0 \qquad (10)$$

If we let $\psi(x_i \mid \theta) = \frac{\partial \log p(x_i \mid \theta)}{\partial \theta'}$, the score function, then equation (10) can serve as the sample analog of a key moment condition of the form:

$$E\!\left[ \frac{\partial \log p(x_i \mid \theta_0)}{\partial \theta'} \right] = 0$$

When doing ML estimation, the above equation will usually hold. If in doubt, you should consult the references.
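As a final illustration, take the hypothetical exponential family used earlier: $p(x \mid \theta) = \exp(-x/\theta)/\theta$, whose score is $-1/\theta + x/\theta^2$, so the score-based moment condition delivers $\hat\theta = \bar{x}$, matching the MLE. A minimal sketch (NumPy/SciPy assumed):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Score-as-moment sketch: x_i ~ Exponential with mean theta0,
# log p(x | theta) = -log(theta) - x/theta, score = -1/theta + x/theta^2.
rng = np.random.default_rng(0)
theta0 = 2.0
x = rng.exponential(scale=theta0, size=5000)

def score(x, theta):
    return -1.0 / theta + x / theta**2

# ML by direct maximization of the average log-likelihood ...
neg_loglik = lambda theta: np.mean(np.log(theta) + x / theta)
theta_ml = minimize_scalar(neg_loglik, bounds=(0.1, 10.0), method="bounded").x

# ... coincides with solving the sample score condition (10), which for this
# family has the closed-form solution theta = mean(x).
theta_gmm = x.mean()

print("ML estimate:", theta_ml)
print("score-based GMM estimate:", theta_gmm)
print("average score at the estimate:", score(x, theta_gmm).mean())  # ~ 0
```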