4-1 Chapter 4 Further issues with the classical linear regression model
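4-2 Chapter Objectives
Continue the discussion of the classical linear regression model:
• Understand the various methods for judging how good a model is
• The problems that ordinary least squares (OLS) may encounter, and how to deal with them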
4-3 1 Goodness of Fit Statistics
• We would like some measure of how well our regression model actually fits the data.
• Goodness of fit statistics measure this, i.e. how well the sample regression function (SRF) fits the data.
• The most common goodness of fit statistic is known as $R^2$. One way to define $R^2$ is as the square of the correlation coefficient between $y$ and $\hat{y}$.
• For another explanation, recall that what we are interested in doing is explaining the variability of $y$ about its mean value, $\bar{y}$, i.e. the total sum of squares (TSS, the total variation):
$$TSS = \sum_t (y_t - \bar{y})^2$$
• We can split the TSS into two parts: the part which we have explained (the explained sum of squares, ESS) and the part which we did not explain using the model (the residual sum of squares, RSS).
4-4 Defining $R^2$
• That is, TSS = ESS + RSS:
$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2$$
• The goodness of fit statistic is
$$R^2 = \frac{ESS}{TSS}$$
or equivalently
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
• $R^2$ must always lie between zero and one. To understand this, consider two extremes:
RSS = TSS, i.e. ESS = 0, so $R^2$ = ESS / TSS = 0
ESS = TSS, i.e. RSS = 0, so $R^2$ = ESS / TSS = 1
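The decomposition and the equivalent definitions of $R^2$ can be checked numerically. The sketch below is illustrative only: the data are made up, and the helper names (`mean`, `fit_line`, `correlation`) are hypothetical, not taken from the slides. It fits a regression with a constant and one regressor, then computes $R^2$ three ways.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """OLS estimates (intercept, slope) for y_t = a + b*x_t + u_t."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sum(
        (xt - x_bar) ** 2 for xt in x
    )
    return y_bar - slope * x_bar, slope


def correlation(a, b):
    """Sample correlation coefficient between two equal-length series."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b))
    var_a = sum((p - a_bar) ** 2 for p in a)
    var_b = sum((q - b_bar) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.8, 5.1, 7.4, 7.6]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xt for xt in x]
y_bar = mean(y)

tss = sum((yt - y_bar) ** 2 for yt in y)               # total sum of squares
ess = sum((ft - y_bar) ** 2 for ft in y_hat)           # explained sum of squares
rss = sum((yt - ft) ** 2 for yt, ft in zip(y, y_hat))  # residual sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
r2_from_corr = correlation(y, y_hat) ** 2

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"R^2: {r2_from_ess:.4f}, {r2_from_rss:.4f}, {r2_from_corr:.4f}")
```

All three values of $R^2$ should agree up to floating-point rounding, and TSS should equal ESS + RSS. Note that this relies on the regression including a constant term.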
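4-5 The Limit Cases: $R^2 = 0$ and $R^2 = 1$
[Figure: two panels plotting $y_t$ against $x_t$, one for $R^2 = 0$ (with $\bar{y}$ marked) and one for $R^2 = 1$.]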
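The two limit cases can also be reproduced with a few numbers. This is a minimal sketch with made-up data: in the first series $y$ has zero sample covariance with $x$, so the fitted line is flat at $\bar{y}$; in the second, $y$ is an exact linear function of $x$. The function name `r_squared` is hypothetical.

```python
def r_squared(x, y):
    """R^2 from an OLS regression of y on a constant and x."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sum(
        (xt - x_bar) ** 2 for xt in x
    )
    intercept = y_bar - slope * x_bar
    rss = sum((yt - intercept - slope * xt) ** 2 for xt, yt in zip(x, y))
    tss = sum((yt - y_bar) ** 2 for yt in y)
    return 1 - rss / tss


x = [1.0, 2.0, 3.0, 4.0]
y_unrelated = [2.0, 4.0, 4.0, 2.0]  # zero sample covariance with x
y_exact = [3.0, 5.0, 7.0, 9.0]      # exactly y = 1 + 2x

r2_zero = r_squared(x, y_unrelated)  # RSS = TSS, so R^2 = 0
r2_one = r_squared(x, y_exact)       # RSS = 0, so R^2 = 1

print(r2_zero, r2_one)
```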
4-6 Problems with $R^2$ as a Goodness of Fit Measure
1. $R^2$ is defined in terms of variation about the mean of $y$, so that if a model is reparameterised (rearranged) and the dependent variable changes, $R^2$ will change.
2. $R^2$ never falls if more regressors are added to the regression, e.g. consider:
Regression 1: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
Regression 2: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
$R^2$ will always be at least as high for regression 2 relative to regression 1.
3. $R^2$ quite often takes on values of 0.9 or higher for time series regressions, so it is of little help in discriminating between models.
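The first two problems can be illustrated on simulated data. The sketch below makes several assumptions that are not in the slides: the data are randomly generated with arbitrary coefficients, `x4` is pure noise, and the helpers `solve` and `ols` are a hypothetical minimal OLS implementation based on the normal equations. For problem 1, the model is rearranged by subtracting `x2` from both sides, which leaves the residuals unchanged but alters the dependent variable.

```python
import random


def solve(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(regressors, y):
    """Regress y on a constant plus the given columns; return (R^2, RSS)."""
    cols = [[1.0] * len(y)] + regressors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(len(y))]
    y_bar = sum(y) / len(y)
    rss = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    tss = sum((yt - y_bar) ** 2 for yt in y)
    return 1 - rss / tss, rss


# Simulated data with arbitrary, made-up coefficients.
random.seed(0)
T = 200
x2 = [random.gauss(0, 1) for _ in range(T)]
x3 = [random.gauss(0, 1) for _ in range(T)]
x4 = [random.gauss(0, 1) for _ in range(T)]  # irrelevant regressor
u = [random.gauss(0, 1) for _ in range(T)]
y = [1 + 2 * a - 0.3 * b + e for a, b, e in zip(x2, x3, u)]

# Problem 1: rearranging the model changes the dependent variable and R^2,
# even though the residuals (and hence the RSS) are unchanged.
r2_y, rss_y = ols([x2, x3], y)
y_minus_x2 = [yt - a for yt, a in zip(y, x2)]
r2_rearranged, rss_rearranged = ols([x2, x3], y_minus_x2)

# Problem 2: adding a regressor, even an irrelevant one, cannot lower R^2.
r2_small, _ = ols([x2, x3], y)
r2_large, _ = ols([x2, x3, x4], y)

print(f"Same RSS ({rss_y:.3f} vs {rss_rearranged:.3f}), "
      f"different R^2 ({r2_y:.3f} vs {r2_rearranged:.3f})")
print(f"R^2 without x4: {r2_small:.4f}, with x4: {r2_large:.4f}")
```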
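4-7 Adjusted $R^2$
• In order to get around these problems, a modification is often made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:
$$\bar{R}^2 = 1 - \left[ \frac{T-1}{T-k} (1 - R^2) \right]$$
where $T$ is the number of observations and $k$ is the number of regressors (including the constant).
• So if we add an extra regressor, $k$ increases and, unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall.
• $\bar{R}^2$ can be used to decide whether a given variable should be included in the model.
• There are still problems with the criterion:
1. It is a "soft" rule: if models are selected by this criterion alone, they will contain many marginally significant or insignificant variables.
2. There is no distribution for $\bar{R}^2$ or $R^2$, so hypothesis tests cannot be used to compare whether one model's goodness of fit is significantly higher than another's.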
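A small worked example shows how the adjustment operates. The numbers below ($T = 30$, $R^2$ rising from 0.500 to 0.505 when a fourth regressor is added) are hypothetical, chosen only to illustrate a case where the gain in $R^2$ is too small to offset the lost degree of freedom. The function name `adjusted_r_squared` is likewise an assumption.

```python
def adjusted_r_squared(r2, T, k):
    """Adjusted R^2 = 1 - (T - 1) / (T - k) * (1 - R^2).

    T is the number of observations, k the number of regressors
    including the constant.
    """
    return 1 - (T - 1) / (T - k) * (1 - r2)


# Hypothetical numbers, for illustration only.
T = 30
before = adjusted_r_squared(0.500, T, k=3)  # three regressors
after = adjusted_r_squared(0.505, T, k=4)   # one extra regressor, tiny R^2 gain

print(f"adjusted R^2 before: {before:.4f}")
print(f"adjusted R^2 after:  {after:.4f}")
```

Here $R^2$ rises slightly but $\bar{R}^2$ falls, so on this criterion the extra variable would be left out.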
4-8 2 A Regression Example: Hedonic House Pricing Models
• Hedonic models are used to value real assets, especially housing, and view the asset as representing a bundle of characteristics.
• Des Rosiers and Thériault (1996) consider the effect of various amenities on rental values for buildings and apartments in 5 sub-markets in the Quebec area of Canada.
• The rental value in Canadian dollars per month (the dependent variable) is a function of 9 to 14 variables, depending on the area under consideration.
• The paper employs 1990 data. For the Quebec City region there are 13,378 observations, and the 12 explanatory variables are:
4-9
LnAGE - log of the apparent age of the property
NBROOMS - number of bedrooms
AREABYRM - area per room (in square metres)
ELEVATOR - a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT - a dummy variable = 1 if the unit is located in a basement; 0 otherwise
OUTPARK - number of outdoor parking spaces
INDPARK - number of indoor parking spaces
NOLEASE - a dummy variable = 1 if the unit has no lease attached to it; 0 otherwise
LnDISTCBD - log of the distance in kilometres to the central business district
SINGLPAR - percentage of single parent families in the area where the building stands
DSHOPCNTR - distance in kilometres to the nearest shopping centre
VACDIFF1 - vacancy difference between the building and the census figure
4-10 Hedonic House Pricing Models
• Examine the signs and sizes of the coefficients.
– The coefficient estimates themselves show the Canadian dollar rental price per month of each feature of the dwelling.
• The coefficient on the constant term often has little useful interpretation.
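To make the reading of such coefficients concrete, the sketch below uses a handful of the variable names listed earlier with coefficient values that are entirely hypothetical: they are invented for illustration and are not the estimates reported in the paper. The function `predicted_rent` and the example dwelling are also assumptions. The sketch compares fitted rents for dwellings that differ in one feature at a time.

```python
import math

# HYPOTHETICAL coefficients, invented for illustration only.
# They are NOT the estimates reported by Des Rosiers and Theriault.
coefficients = {
    "constant": 280.0,
    "LnAGE": -50.0,
    "NBROOMS": 45.0,
    "ELEVATOR": 90.0,
    "LnDISTCBD": -20.0,
}


def predicted_rent(dwelling):
    """Fitted monthly rent: constant plus coefficient times value per feature."""
    return coefficients["constant"] + sum(
        coefficients[name] * value for name, value in dwelling.items()
    )


# A made-up dwelling: 10 years old, 2 bedrooms, no elevator, 5 km from the CBD.
base = {
    "LnAGE": math.log(10),
    "NBROOMS": 2,
    "ELEVATOR": 0,
    "LnDISTCBD": math.log(5),
}

with_elevator = dict(base, ELEVATOR=1)
one_more_bedroom = dict(base, NBROOMS=3)
twice_as_far = dict(base, LnDISTCBD=math.log(10))

# Dummy variable: the coefficient is the rent difference when the feature is present.
elevator_effect = predicted_rent(with_elevator) - predicted_rent(base)
# Count variable: the coefficient is the rent difference per extra unit.
bedroom_effect = predicted_rent(one_more_bedroom) - predicted_rent(base)
# Logged variable: doubling the distance changes rent by coefficient * ln(2).
distance_effect = predicted_rent(twice_as_far) - predicted_rent(base)

print(f"elevator: {elevator_effect:+.2f} dollars per month")
print(f"extra bedroom: {bedroom_effect:+.2f} dollars per month")
print(f"doubling distance to CBD: {distance_effect:+.2f} dollars per month")
```

For dummy and count variables the coefficient can be read directly in dollars per month. For the logged variables (LnAGE, LnDISTCBD) the coefficient is the effect of a one-unit change in the log, so a proportional change in the underlying variable is the natural comparison.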