Chapter 9 Regression on Dummy Explanatory Variables
9.1 The Nature of Dummy Variables
⚫ 1. Concept: Dummy variables (also called indicator variables, binary variables, categorical variables, or dichotomous variables) represent qualitative variables in a regression model.
For example: sex, race, color, religion, nationality, marital status, etc.
Qualitative variables can be quantified by constructing artificial variables that take on the values 1 or 0:
0: indicates the absence of an attribute
1: indicates the presence (or possession) of that attribute
Dummy variable (D): a variable that assumes the values 0 and 1.
⚫ 2. ANOVA model: Regression models that contain only dummy explanatory variables are called analysis-of-variance (ANOVA) models.
Y_i = B_1 + B_2 D_i + u_i
ANOVA models are commonly used in fields such as sociology, psychology, education, and market research.
(1) Dummy variables generally take on the values 1 or 0 and are nonstochastic; that is, their values are fixed.
(2) Estimation: Dummy explanatory variables do not pose any new estimation problems. Under the assumptions of the CLRM, we can use the customary OLS method to estimate the parameters of models that contain dummy variables.
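As an illustration of point (2), the sketch below fits the ANOVA model by OLS on a small made-up sample. The wage figures and the names `wages`, `d`, and `ols_simple` are assumptions for illustration, not taken from the text. With a single 0/1 regressor, the OLS intercept should equal the mean of the base group (D = 0) and the slope should equal the difference between the two group means.

```python
from statistics import mean

def ols_simple(x, y):
    """Closed-form OLS for y = b1 + b2*x + u with one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Hypothetical data: hourly wage and a dummy (1 = group A, 0 = base group).
wages = [21.0, 19.0, 20.0, 24.0, 26.0, 25.0, 23.0]
d = [0, 0, 0, 1, 1, 1, 1]

b1, b2 = ols_simple(d, wages)
mean_base = mean(w for w, di in zip(wages, d) if di == 0)
mean_other = mean(w for w, di in zip(wages, d) if di == 1)

print(f"b1 = {b1:.4f}, mean of base group = {mean_base:.4f}")
print(f"b2 = {b2:.4f}, difference in group means = {mean_other - mean_base:.4f}")
```

The two printed pairs should agree, which is why B_2 in an ANOVA model is read as the difference in mean Y between the two categories.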
9.2 Regression with One Quantitative Variable and One Qualitative Variable with Two Categories: ANCOVA Models
Y_i = B_1 + B_2 D_i + B_3 X_i + u_i (9.6)
Features:
1. If a qualitative variable has m categories, introduce (m - 1) dummy variables. If there are only two categories, use only one dummy variable.
2. The assignment of the values 1 and 0 to the two categories, such as male and female, is arbitrary.
3. The category that is assigned the value 0 is often referred to as the base, benchmark, control, comparison, or omitted category.
4. The coefficient B_2 attached to the dummy variable D is called the differential intercept coefficient because it tells by how much the intercept of the category that receives the value 1 differs from the intercept of the base category.
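Feature 4 can be seen by taking conditional expectations of (9.6), assuming E(u_i | D_i, X_i) = 0 as under the CLRM:

```latex
\begin{aligned}
E(Y_i \mid D_i = 0, X_i) &= B_1 + B_3 X_i \\
E(Y_i \mid D_i = 1, X_i) &= (B_1 + B_2) + B_3 X_i \\
E(Y_i \mid D_i = 1, X_i) - E(Y_i \mid D_i = 0, X_i) &= B_2
\end{aligned}
```

The two regression lines are therefore parallel: they share the slope B_3, and their intercepts differ by B_2.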
9.3 Regression on a Quantitative Variable and a Qualitative Variable with More Than Two Classes or Categories: Introduce m - 1 Dummy Variables
⚫ 1. Model:
Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + B_4 X_i + u_i (9.13)
E(Y_i | D_2 = 0, D_3 = 0, X_i) = B_1 + B_4 X_i (9.14)
E(Y_i | D_2 = 1, D_3 = 0, X_i) = (B_1 + B_2) + B_4 X_i (9.15)
E(Y_i | D_2 = 0, D_3 = 1, X_i) = (B_1 + B_3) + B_4 X_i (9.16)
⚫ 2. Estimate
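Before the model can be estimated, the two dummies D_2 and D_3 have to be constructed from the three-category variable. The following is a minimal sketch of the m - 1 rule; the category labels, the helper name `make_dummies`, and the choice of base category are hypothetical. The base category is coded 0 on every dummy, which corresponds to the case in (9.14).

```python
def make_dummies(values, base):
    """Encode a qualitative variable with m categories as m - 1 dummy columns.

    The base (omitted) category is coded 0 on every dummy.
    Returns (names, rows), where rows[i] holds the dummies for values[i].
    """
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base category {base!r} not found in data")
    names = [c for c in categories if c != base]
    rows = [[1 if v == c else 0 for c in names] for v in values]
    return names, rows

# Hypothetical sample: region of residence with m = 3 categories.
region = ["south", "north", "west", "south", "west"]
names, rows = make_dummies(region, base="west")

print(names)
for label, row in zip(region, rows):
    print(label, row)
```

Each observation gets two dummy values rather than three; adding a third dummy for the base category alongside the intercept would make the regressors perfectly collinear.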
9.4 Regression on One Quantitative Variable and Two Qualitative Variables
⚫ 1. Model:
Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + B_4 X_i + u_i (9.18)
E(Y_i | D_2 = 0, D_3 = 0, X_i) = B_1 + B_4 X_i (9.19)
E(Y_i | D_2 = 1, D_3 = 0, X_i) = (B_1 + B_2) + B_4 X_i (9.20)
E(Y_i | D_2 = 0, D_3 = 1, X_i) = (B_1 + B_3) + B_4 X_i (9.21)
E(Y_i | D_2 = 1, D_3 = 1, X_i) = (B_1 + B_2 + B_3) + B_4 X_i (9.22)
⚫ 2. Estimate
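Equations (9.19) to (9.22) can be tabulated directly. The sketch below uses made-up coefficient values (the numbers and the function name `conditional_mean` are assumptions, not estimates from the text). It shows that in model (9.18) the intercept shift for D_2 = D_3 = 1 is the sum B_2 + B_3 of the two separate shifts, while the slope B_4 is common to all four groups.

```python
from itertools import product

# Hypothetical coefficients for Y = B1 + B2*D2 + B3*D3 + B4*X + u.
B1, B2, B3, B4 = 10.0, 2.0, -1.5, 0.8

def conditional_mean(d2, d3, x):
    """E(Y | D2 = d2, D3 = d3, X = x) under model (9.18)."""
    return B1 + B2 * d2 + B3 * d3 + B4 * x

# Intercept of each group's regression line: the conditional mean at X = 0.
intercepts = {(d2, d3): conditional_mean(d2, d3, 0.0)
              for d2, d3 in product((0, 1), repeat=2)}

for (d2, d3), a in intercepts.items():
    print(f"D2={d2}, D3={d3}: intercept = {a}")
```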
9.5 A Generalization
We can extend our model to include more than one quantitative variable and more than two qualitative variables, provided that the number of dummies for each qualitative variable is one less than the number of categories of that variable.
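The counting rule can be made concrete with a short sketch. The function name `count_parameters` and the category counts below are hypothetical, and the model is assumed to include an intercept.

```python
def count_parameters(n_quantitative, categories_per_qualitative):
    """Count dummies and total coefficients (including the intercept)."""
    for m in categories_per_qualitative:
        if m < 2:
            raise ValueError("a qualitative variable needs at least 2 categories")
    n_dummies = sum(m - 1 for m in categories_per_qualitative)
    n_coefficients = 1 + n_quantitative + n_dummies
    return n_dummies, n_coefficients

# Hypothetical model: 2 quantitative regressors and 3 qualitative variables
# with 2, 3, and 4 categories respectively.
n_dummies, n_coefficients = count_parameters(2, [2, 3, 4])
print(n_dummies, n_coefficients)  # 6 dummies, 9 coefficients
```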
9.6 Structural Stability of Regression Models: The Dummy Variable Approach
The Chow test did not tell us whether the difference between the two regressions lies in their intercepts, their slopes, or both.
Y_t = A_1 + A_2 X_t + u_{1t} (9.23)
Y_t = B_1 + B_2 X_t + u_{2t} (9.24)
1. A_1 = B_1 and A_2 = B_2: coincident regressions; the two regressions are identical.
2. A_1 ≠ B_1 but A_2 = B_2: parallel regressions; the two regressions have different intercepts but the same slope.
3. A_1 = B_1 but A_2 ≠ B_2: concurrent regressions; the two regressions have the same intercept but different slopes.
4. A_1 ≠ B_1 and A_2 ≠ B_2: dissimilar regressions; the two regressions differ in both intercept and slope.
To find out which of these possibilities holds, we can use the dummy variable technique and estimate the model:
Y_t = C_1 + C_2 D_t + C_3 X_t + C_4 (D_t X_t) + u_t (9.25)
E(Y_t | D_t = 0, X_t) = C_1 + C_3 X_t (9.26)
E(Y_t | D_t = 1, X_t) = (C_1 + C_2) + (C_3 + C_4) X_t (9.27)
C_2: differential intercept coefficient; C_4: differential slope coefficient
Compare these with the two separate regressions:
Y_t = A_1 + A_2 X_t + u_{1t} (9.23)
Y_t = B_1 + B_2 X_t + u_{2t} (9.24)
When A_1 = C_1 and A_2 = C_3, (9.26) is the same as (9.23).
When B_1 = C_1 + C_2 and B_2 = C_3 + C_4, (9.27) is the same as (9.24).
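The correspondence between (9.25) and the two separate regressions implies C_2 = B_1 - A_1 and C_4 = B_2 - A_2, so the four cases differ only in which of C_2 and C_4 is zero. The sketch below checks this numerically on hypothetical noise-free data. The regime coefficients and the helper names `solve` and `ols` are assumptions for illustration, and OLS is computed from the normal equations in plain Python so the example has no external dependencies.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """OLS via the normal equations (X'X) c = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free data from two regimes:
#   D = 0: Y = 1.0 + 0.5 X   (A1 = 1.0, A2 = 0.5)
#   D = 1: Y = 3.0 + 0.2 X   (B1 = 3.0, B2 = 0.2)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [(3.0 + 0.2 * xi) if di else (1.0 + 0.5 * xi) for xi, di in zip(x, d)]

# Regressors of (9.25): constant, D, X, and the interaction D*X.
rows = [[1.0, di, xi, di * xi] for xi, di in zip(x, d)]
c1, c2, c3, c4 = ols(rows, y)

print(f"C1 = {c1:.4f}, C2 = {c2:.4f}, C3 = {c3:.4f}, C4 = {c4:.4f}")
print(f"B1 = C1 + C2 = {c1 + c2:.4f}, B2 = C3 + C4 = {c3 + c4:.4f}")
```

Because the data contain no error term, the single regression reproduces both regimes up to rounding. With real data the estimates of C_2 and C_4 would come with standard errors, and their individual significance would indicate whether the intercepts, the slopes, or both differ.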
Advantages of the dummy variable approach:
1. Instead of running the three regressions (7.54), (7.55), and (7.56) required by the Chow test, the dummy variable approach requires just one regression.
2. From the differential intercept and differential slope coefficients, we can pinpoint the source(s) of the difference.