Ch. 11 Panel Data Models

Data sets that combine time series and cross sections are common in econometrics. For example, the published statistics of the OECD contain numerous series of economic aggregates observed yearly for many countries. The PSID is a study of roughly 6000 families and 15000 individuals who have been interviewed periodically from 1968 to the present. Panel data sets are more oriented toward cross-section analysis; they are wide but typically (relatively) short. Heterogeneity across units is an integral part of the analysis.

Recall that the (multiple) linear model is used to study the relationship between a dependent variable and several independent variables. That is,
\[
y = f(x_1, x_2, \ldots, x_k) + \varepsilon = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon = x'\beta + \varepsilon,
\]
where $y$ is the dependent or explained variable, $x_i$, $i = 1, \ldots, k$, are the independent or explanatory variables, and $\beta_i$, $i = 1, \ldots, k$, are unknown coefficients that we are interested in learning about, either through estimation or through hypothesis testing. The term $\varepsilon$ is an unobservable random disturbance. In the following, we will see that panel data sets provide a richer source of information but also require some more complex stochastic specifications. The fundamental advantage of a panel data set over a cross section is that it allows the researcher greater flexibility in modeling differences in behavior across individuals.

The basic framework for this statistical model is of the form
\[
y_{it} = x_{it}'\beta + z_i'\alpha + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T.
\]
There are $k$ regressors in $x_{it}$, not including a constant term. The heterogeneity, or individual effect, is $z_i'\alpha$, where $z_i$ contains a constant term and a set of individual- or group-specific variables, which may be observed, such as race, sex, location, and so on, or unobserved, such as family-specific characteristics, individual heterogeneity in skill or preference, and so on, all of which are taken to be constant over time $t$. The various cases we will consider are:
1. Pooled Regression: If $z_i$ contains only a constant term, then there are no individual-specific characteristics in this model. All we need to do is pool the data,
\[
y_{it} = x_{it}'\beta + \alpha + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T,
\]
and OLS provides consistent and efficient estimates of the common $\beta$ and $\alpha$.

2. Fixed Effects: If $z_i'\alpha = \alpha_i$, the fixed-effects approach takes $\alpha_i$ as a group-specific constant term in the regression model,
\[
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T.
\]

3. Random Effects: If the unobserved individual heterogeneity can be assumed to be uncorrelated with the included variables, then the model may be formulated as
\[
y_{it} = x_{it}'\beta + E(z_i'\alpha) + [z_i'\alpha - E(z_i'\alpha)] + \varepsilon_{it}
       = x_{it}'\beta + \alpha + u_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T.
\]
The random-effects approach specifies that $u_i$ is a group-specific random element, similar to $\varepsilon_{it}$ except that for each group there is but a single draw that enters the regression identically in each period.

1 Fixed Effects

This formulation of the model assumes that differences across units can be captured in differences in the constant term. Each $\alpha_i$ is treated as an unknown parameter to be estimated. Let $y_i$ and $X_i$ be the $T$ observations for the $i$th unit, let $i$ be a $T \times 1$ column of ones, and let $\varepsilon_i$ be the associated $T \times 1$ vector of disturbances. Then
\[
y_i = X_i\beta + i\alpha_i + \varepsilon_i, \quad i = 1, 2, \ldots, N.
\]
It is also assumed that the disturbance terms are well behaved, that is,
\[
E(\varepsilon_i) = 0; \quad E(\varepsilon_i\varepsilon_i') = \sigma^2 I_T; \quad E(\varepsilon_i\varepsilon_j') = 0 \ \text{if}\ i \neq j.
\]
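The fixed-effects specification above can be checked with a small simulation. The following sketch (NumPy; the dimensions, coefficient values, and simulated data are all hypothetical) generates a panel with group-specific constants and estimates them jointly with $\beta$ by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 5, 10, 2                                  # hypothetical panel sizes
beta = np.array([1.5, -0.7])                        # true slope coefficients
alpha = rng.normal(size=N)                          # group-specific constants
X = rng.normal(size=(N * T, k))                     # rows stacked unit by unit
D = np.kron(np.eye(N), np.ones((T, 1)))             # NT x N unit-dummy matrix
y = X @ beta + D @ alpha + 0.1 * rng.normal(size=N * T)

# OLS on the dummy-variable regression of y on [X, D]
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
beta_hat, alpha_hat = coef[:k], coef[k:]
```

With $T = 10$ observations per unit and a small disturbance standard deviation, `beta_hat` and `alpha_hat` land close to the values used in the simulation.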
Observations on all the cross-section units can be stacked as
\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta
+
\begin{bmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix},
\]
or, in more compact form,
\[
y = X\beta + D\alpha + \varepsilon,
\]
where $y$ and $\varepsilon$ are $NT \times 1$, $X$ is $NT \times k$, $\beta$ is $k \times 1$, and $D = [d_1\ d_2\ \ldots\ d_N]$ is $NT \times N$, with $d_i$ a dummy variable indicating the $i$th unit. This model is usually referred to as the least squares dummy variable (LSDV) model. Since this model satisfies the ideal conditions, the OLS estimator is BLUE. By using the familiar partitioned regression of Ch. 6, the slope estimator would be
\[
\hat\beta = (X'M_D X)^{-1}X'M_D y,
\]
where $M_D = I_{NT} - D(D'D)^{-1}D'$.

Lemma:
\[
M_D = I_{NT} - D(D'D)^{-1}D' =
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix},
\]
where $M^0 = I_T - \frac{1}{T} ii'$ is the demeaning matrix.
Proof: By definition,
\[
D'D =
\begin{bmatrix} i'i & 0 & \cdots & 0 \\ 0 & i'i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & i'i \end{bmatrix}
=
\begin{bmatrix} T & 0 & \cdots & 0 \\ 0 & T & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T \end{bmatrix}_{N \times N},
\]
and therefore
\[
I_{NT} - D(D'D)^{-1}D' =
\begin{bmatrix} I_T - \frac{1}{T}ii' & 0 & \cdots & 0 \\ 0 & I_T - \frac{1}{T}ii' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I_T - \frac{1}{T}ii' \end{bmatrix}
=
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix}.
\]
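The lemma is easy to confirm numerically. A minimal sketch (small hypothetical sizes $N$ and $T$):

```python
import numpy as np

N, T = 3, 4                                         # small hypothetical sizes
i = np.ones((T, 1))                                 # T x 1 column of ones
M0 = np.eye(T) - (i @ i.T) / T                      # M0 = I_T - (1/T) ii'
D = np.kron(np.eye(N), i)                           # NT x N dummy matrix
MD = np.eye(N * T) - D @ np.linalg.inv(D.T @ D) @ D.T

assert np.allclose(D.T @ D, T * np.eye(N))          # D'D = T I_N
assert np.allclose(MD, np.kron(np.eye(N), M0))      # block diagonal in M0
assert np.allclose(MD @ MD, MD)                     # M_D is idempotent
```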
It is easy to see that the matrix $M_D$ is idempotent and that
\[
M_D y =
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} y_1 - \bar y_1 i \\ y_2 - \bar y_2 i \\ \vdots \\ y_N - \bar y_N i \end{bmatrix}
\]
and
\[
M_D X =
\begin{bmatrix} M^0 X_1 \\ M^0 X_2 \\ \vdots \\ M^0 X_N \end{bmatrix},
\]
where the scalar $\bar y_i = \frac{1}{T}\sum_{t=1}^T y_{it}$, $i = 1, 2, \ldots, N$. Let $X_i = [x_{i1}\ x_{i2}\ \ldots\ x_{ik}]$; then $M^0 X_i = [M^0 x_{i1}\ M^0 x_{i2}\ \ldots\ M^0 x_{ik}]$. Therefore $M^0 x_{ij} = x_{ij} - \bar x_{ij} i$, $j = 1, 2, \ldots, k$, with $\bar x_{ij} = \frac{1}{T}\sum_{t=1}^T x_{ijt}$. Denoting $\bar x_i = [\bar x_{i1}\ \bar x_{i2}\ \ldots\ \bar x_{ik}]'$, the least squares regression of $M_D y$ on $M_D X$ is equivalent to the regression of $[y_{it} - \bar y_i]$ on $[x_{it} - \bar x_i]$.

The dummy variable coefficients can be recovered from
\[
D'y = D'X\hat\beta + D'D\hat\alpha + D'e,
\]
or
\[
\hat\alpha = (D'D)^{-1}D'(y - X\hat\beta),
\]
since $D'e = 0$.
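Both facts — that regressing the group-demeaned $y$ on the group-demeaned $X$ reproduces the LSDV slopes, and that $\hat\alpha = (D'D)^{-1}D'(y - X\hat\beta)$ recovers the group constants — can be confirmed on simulated data. A sketch (hypothetical sizes, coefficients, and data):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 4, 8, 2                                   # hypothetical panel sizes
beta = np.array([0.5, 2.0])
X = rng.normal(size=(N * T, k))                     # rows stacked unit by unit
D = np.kron(np.eye(N), np.ones((T, 1)))             # NT x N unit dummies
y = X @ beta + D @ rng.normal(size=N) + 0.1 * rng.normal(size=N * T)

# (a) LSDV slopes from the full dummy regression of y on [X, D]
b_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0][:k]

# (b) Slopes from regressing (y_it - ybar_i) on (x_it - xbar_i)
Y = y.reshape(N, T)
Xp = X.reshape(N, T, k)
yd = (Y - Y.mean(axis=1, keepdims=True)).reshape(-1)
Xd = (Xp - Xp.mean(axis=1, keepdims=True)).reshape(-1, k)
b_within = np.linalg.lstsq(Xd, yd, rcond=None)[0]
assert np.allclose(b_lsdv, b_within)                # (a) and (b) coincide

# Recover the group constants: (D'D)^{-1} D'(y - X b) = ybar_i - xbar_i' b
a_hat = np.linalg.solve(D.T @ D, D.T @ (y - X @ b_within))
assert np.allclose(a_hat, Y.mean(axis=1) - Xp.mean(axis=1) @ b_within)
```

The final assertion is exactly the reduction derived next: each $\hat\alpha_i$ is the group-mean residual $\bar y_i - \bar x_i'\hat\beta$.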
This implies that
\[
\begin{bmatrix} \hat\alpha_1 \\ \hat\alpha_2 \\ \vdots \\ \hat\alpha_N \end{bmatrix}
= \frac{1}{T}
\begin{bmatrix} i' & 0 & \cdots & 0 \\ 0 & i' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & i' \end{bmatrix}
\begin{bmatrix} y_1 - X_1\hat\beta \\ y_2 - X_2\hat\beta \\ \vdots \\ y_N - X_N\hat\beta \end{bmatrix}
=
\begin{bmatrix} \frac{1}{T}\sum_{t=1}^T (y_{1t} - x_{1t}'\hat\beta) \\ \frac{1}{T}\sum_{t=1}^T (y_{2t} - x_{2t}'\hat\beta) \\ \vdots \\ \frac{1}{T}\sum_{t=1}^T (y_{Nt} - x_{Nt}'\hat\beta) \end{bmatrix}
=
\begin{bmatrix} \bar y_1 - \bar x_1'\hat\beta \\ \bar y_2 - \bar x_2'\hat\beta \\ \vdots \\ \bar y_N - \bar x_N'\hat\beta \end{bmatrix}.
\]

Exercise: Let the fixed-effects model be partitioned as $y = X\hat\beta + D\hat\alpha + e$. Show that the variance of $\hat\beta$ is $Var(\hat\beta) = \sigma^2 (X'M_D X)^{-1}$.

Proof:
\[
\hat\beta = (X'M_D X)^{-1}X'M_D y = \beta + (X'M_D X)^{-1}X'M_D\varepsilon,
\]
therefore,
\[
\begin{aligned}
Var(\hat\beta) &= E[(\hat\beta - \beta)(\hat\beta - \beta)'] \\
&= E[((X'M_D X)^{-1}X'M_D\varepsilon)((X'M_D X)^{-1}X'M_D\varepsilon)'] \\
&= E[(X'M_D X)^{-1}X'M_D\varepsilon\varepsilon' M_D X(X'M_D X)^{-1}] \\
&= \sigma^2 (X'M_D X)^{-1}X'M_D I_{NT} M_D X(X'M_D X)^{-1} \\
&= \sigma^2 (X'M_D X)^{-1}X'M_D X(X'M_D X)^{-1} \\
&= \sigma^2 (X'M_D X)^{-1}.
\end{aligned}
\]
With the above results, the appropriate estimator of $Var(\hat\beta)$ is therefore
\[
\widehat{Var}(\hat\beta) = s^2 (X'M_D X)^{-1},
\]
where the disturbance variance estimator $s^2$ is
\[
s^2 = \frac{(y - X\hat\beta - D\hat\alpha)'(y - X\hat\beta - D\hat\alpha)}{NT - N - k}
    = \frac{\sum_{i=1}^N\sum_{t=1}^T (y_{it} - x_{it}'\hat\beta - \hat\alpha_i)^2}{NT - N - k}.
\]

Exercise: Show that $Var(\hat\alpha_i) = \frac{\sigma^2}{T} + \bar x_i'\, Var(\hat\beta)\, \bar x_i$.

1.1 Testing the Significance of the Group Effects

Consider the null hypothesis
\[
H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N = \alpha.
\]
Under this null hypothesis, the efficient estimator is pooled least squares. The F ratio used for the test would be
\[
F_{N-1,\,NT-N-k} = \frac{(R^2_{LSDV} - R^2_{Pooled})/(N-1)}{(1 - R^2_{LSDV})/(NT - N - k)},
\]
where $R^2_{LSDV}$ indicates the $R^2$ from the dummy variables model and $R^2_{Pooled}$ indicates the $R^2$ from the pooled or restricted model with only a single overall
constant.

Example: Example 13.2 at p. 292 of Greene, where N = 6, k = 3, and T = 15 (see Ex. 7.2 on p. 118).

Exercise: Reproduce the first, third, and fourth rows of the results in Table 13.1 on p. 292 of Greene.

1.2 The Within- and Between-Groups Estimators

We could formulate a pooled regression model in three ways. First, the original formulation is
\[
y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T. \tag{1}
\]
In terms of deviations from the group means,
\[
y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + \varepsilon_{it} - \bar\varepsilon_i, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T, \tag{2}
\]
and in terms of the group means,
\[
\bar y_i = \alpha + \bar x_i'\beta + \bar\varepsilon_i, \quad i = 1, 2, \ldots, N. \tag{3}
\]
To estimate $\beta$ by OLS in (1), we would use the total sums of squares and cross products,
\[
S^{Total}_{xx} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{\bar x})(x_{it} - \bar{\bar x})'
\quad\text{and}\quad
S^{Total}_{xy} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{\bar x})(y_{it} - \bar{\bar y}),
\]
where $\bar{\bar x} = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T x_{it}$ and $\bar{\bar y} = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T y_{it}$. In (2), the moment matrices we use are the within-group (i.e., deviations from the group means) sums of squares and cross products,
\[
S^{Within}_{xx} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)(x_{it} - \bar x_i)'
\]
and
\[
S^{Within}_{xy} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)(y_{it} - \bar y_i).
\]
Finally, for (3), the mean of the group means is the overall mean (i.e., $\frac{1}{N}\sum_{i=1}^N \bar y_i = \bar{\bar y}$). Therefore the moment matrices are the between-groups sums of squares and cross products,
\[
S^{Between}_{xx} = \sum_{i=1}^N T(\bar x_i - \bar{\bar x})(\bar x_i - \bar{\bar x})'
\quad\text{and}\quad
S^{Between}_{xy} = \sum_{i=1}^N T(\bar x_i - \bar{\bar x})(\bar y_i - \bar{\bar y}).
\]
It is easy to verify that
\[
S^{Total}_{xx} = S^{Within}_{xx} + S^{Between}_{xx}
\quad\text{and}\quad
S^{Total}_{xy} = S^{Within}_{xy} + S^{Between}_{xy}.
\]
Therefore, there are three possible least squares estimators of $\beta$ corresponding to these decompositions. The least squares estimator in the pooled regression is
\[
\hat\beta^{Total} = (S^{Total}_{xx})^{-1}S^{Total}_{xy} = (S^{Within}_{xx} + S^{Between}_{xx})^{-1}(S^{Within}_{xy} + S^{Between}_{xy}).
\]
The within-groups estimator is
\[
\hat\beta^{Within} = (S^{Within}_{xx})^{-1}S^{Within}_{xy}.
\]
This is the LSDV estimator computed earlier. An alternative estimator would be the between-groups estimator,
\[
\hat\beta^{Between} = (S^{Between}_{xx})^{-1}S^{Between}_{xy}.
\]
This is the least squares estimator based on the $N$ sets of group means. From the preceding expressions,
\[
S^{Within}_{xy} = S^{Within}_{xx}\hat\beta^{Within}
\]
and
\[
S^{Between}_{xy} = S^{Between}_{xx}\hat\beta^{Between},
\]
we have
\[
\begin{aligned}
\hat\beta^{Total} &= (S^{Within}_{xx} + S^{Between}_{xx})^{-1}(S^{Within}_{xx}\hat\beta^{Within} + S^{Between}_{xx}\hat\beta^{Between}) \\
&= (S^{Within}_{xx} + S^{Between}_{xx})^{-1}S^{Within}_{xx}\hat\beta^{Within} + (S^{Within}_{xx} + S^{Between}_{xx})^{-1}S^{Between}_{xx}\hat\beta^{Between} \\
&= F^{Within}\hat\beta^{Within} + F^{Between}\hat\beta^{Between},
\end{aligned}
\]
where $F^{Within} = (S^{Within}_{xx} + S^{Between}_{xx})^{-1}S^{Within}_{xx}$ and $F^{Within} + F^{Between} = (S^{Within}_{xx} + S^{Between}_{xx})^{-1}(S^{Within}_{xx} + S^{Between}_{xx}) = I$. That is, the pooled OLS estimator is a matrix-weighted average of the within- and between-groups estimators.

2 Random Effects

Consider the model
\[
y_{it} = x_{it}'\beta + \alpha + u_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T,
\]
where there are $k$ regressors including a constant, and now the single constant term $\alpha$ is the mean of the unobserved heterogeneity, $E(z_i'\alpha)$. The component $u_i$ is the random heterogeneity specific to the $i$th observation and is constant through time. We assume further that
\[
E(\varepsilon_{it}) = E(u_i) = 0;\quad E(\varepsilon_{it}^2) = \sigma^2_\varepsilon;\quad E(u_i^2) = \sigma^2_u;\quad E(\varepsilon_{it}u_j) = 0 \ \text{for all}\ i,\ t,\ \text{and}\ j;
\]
\[
E(\varepsilon_{it}\varepsilon_{js}) = 0 \ \text{if}\ t \neq s \ \text{or}\ i \neq j;\quad E(u_i u_j) = 0 \ \text{if}\ i \neq j.
\]
Denote $\eta_{it} = \varepsilon_{it} + u_i$, let $y_i$ and $X_i$ (including the constant term) be the $T$ observations for the $i$th unit, let $i$ be a $T \times 1$ column of ones, and let $\eta_i = [\eta_{i1}, \eta_{i2}, \ldots, \eta_{iT}]'$,