3-1 Chapter 3 A brief overview of the classical linear regression model
3-2 1 Regression
• Regression is probably the single most important tool at the econometrician’s disposal.
• What is regression analysis? It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variable(s)).
• Regression attempts to explain movements in the dependent variable by movements in the independent variable(s).
3-3 Some Notation
• Denote the dependent variable by y and the independent variable(s) by $x_1, x_2, \ldots, x_k$, where there are k independent variables.
• Some alternative names for the y and x variables:

  y                     x
  dependent variable    independent variables
  regressand            regressors
  effect variable       causal variables
  explained variable    explanatory variables

• Note that there can be many x variables, but we will limit ourselves to the case where there is only one x variable to start with. In our set-up, there is only one y variable.
3-4 2 Regression is different from Correlation
• If we say y and x are correlated, it means that we are treating y and x in a completely symmetrical way.
• In regression, we treat the dependent variable (y) and the independent variable(s) (x’s) very differently. The y variable is assumed to be random or “stochastic” in some way, i.e. to have a probability distribution. The x variables are, however, assumed to have fixed (“non-stochastic”) values in repeated samples.
• Regression as a tool is more flexible and powerful than correlation.
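To make the asymmetry concrete, here is a minimal Python sketch that computes a sample correlation and two regression slopes by hand. The data values and the helper names `cross_dev`, `correlation` and `slope` are illustrative assumptions, not taken from the slides. The slope formula is the standard least-squares one, which the chapter has not yet introduced at this point.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def cross_dev(a, b):
    """Sum of products of deviations of a and b from their means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def correlation(a, b):
    """Sample correlation coefficient; symmetric in its two arguments."""
    return cross_dev(a, b) / sqrt(cross_dev(a, a) * cross_dev(b, b))

def slope(dep, indep):
    """Least-squares slope from regressing dep on indep; not symmetric."""
    return cross_dev(dep, indep) / cross_dev(indep, indep)

# Hypothetical illustrative data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("corr(x, y)      =", correlation(x, y))
print("corr(y, x)      =", correlation(y, x))
print("slope of y on x =", slope(y, x))
print("slope of x on y =", slope(x, y))
```

Swapping the arguments leaves the correlation unchanged, whereas the slope changes once the roles of dependent and independent variable are exchanged.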
3-5 3 Simple Regression
• For simplicity, say k = 1. This is the situation where y depends on only one x variable.
• Examples of the kind of relationship that may be of interest include:
  – How asset returns vary with their level of market risk.
  – Measuring the long-term relationship between stock prices and dividends.
  – Constructing an optimal hedge ratio.
3-6 Simple Regression: An Example
• Suppose that we have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index:

  Year, t   Excess return on fund XXX     Excess return on market index
            = $r_{XXX,t} - r_{ft}$        = $r_{mt} - r_{ft}$
  1         17.8                          13.7
  2         39.0                          23.2
  3         12.8                           6.9
  4         24.2                          16.8
  5         17.2                          12.3

• We therefore want to find whether there appears to be a relationship between x and y given the data that we have. The first stage would be to form a scatter plot of the two variables.
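Before drawing anything, a quick look at the five observations can be sketched in Python. The variable names are assumptions made for this illustration; the numbers are the ones given above. Sorting the pairs by the market's excess return is a rough substitute for eyeballing a scatter plot, not a formal test.

```python
# Excess returns from the data above, years t = 1..5
fund_excess = [17.8, 39.0, 12.8, 24.2, 17.2]    # r_XXX,t - r_ft
market_excess = [13.7, 23.2, 6.9, 16.8, 12.3]   # r_mt - r_ft

# Pair the observations and sort them by the market excess return
pairs = sorted(zip(market_excess, fund_excess))

for market, fund in pairs:
    print(f"market {market:5.1f}   fund {fund:5.1f}")

# True when the fund's excess return rises at every step up in the market's
fund_sorted = [fund for _, fund in pairs]
rises_together = all(a < b for a, b in zip(fund_sorted, fund_sorted[1:]))
print("fund return rises with market return:", rises_together)
```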
3-7 Graph (Scatter Diagram)
[Figure: scatter plot of the excess return on fund XXX (vertical axis, 0 to 45) against the excess return on the market portfolio (horizontal axis, 0 to 25)]
3-8 Finding a Line of Best Fit
• We can use the general equation for a straight line, y = a + bx, to get the line that best “fits” the data.
• However, this equation (y = a + bx) is completely deterministic.
• Is this realistic? No. So what we do is add a random disturbance term, u, into the equation, writing $\alpha$ and $\beta$ in place of a and b:
  $y_t = \alpha + \beta x_t + u_t$, where t = 1, 2, 3, 4, 5
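The difference between the deterministic line and the model with a disturbance, y_t = α + β x_t + u_t, can be sketched with simulated data. Everything numeric here is an assumption for illustration: the values of `alpha` and `beta`, the x values, and the choice of a standard normal distribution for u_t are not taken from the slides.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

alpha, beta = 1.0, 2.0          # hypothetical parameter values
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical x_t for t = 1,...,5

# Deterministic part: every point lies exactly on the line
line = [alpha + beta * xt for xt in x]

# Random disturbance u_t; the normal distribution is an assumption of this sketch
u = [random.gauss(0.0, 1.0) for _ in x]

# Observed values: the line plus the disturbance
y = [lt + ut for lt, ut in zip(line, u)]

for t, (xt, lt, yt) in enumerate(zip(x, line, y), start=1):
    print(f"t={t}  x={xt:.1f}  line={lt:.2f}  y={yt:.2f}  u={yt - lt:+.2f}")
```

The printed `u` column is the gap between each simulated observation and the line, which is what the disturbance term represents.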
3-9 Why do we include a Disturbance term?
• The disturbance term can capture a number of features:
  – We always leave out some determinants of $y_t$.
  – There may be errors in the measurement of $y_t$ that cannot be modelled.
  – Random outside influences on $y_t$ which we cannot model.
3-10 Determining the Regression Coefficients
• So how do we determine what $\alpha$ and $\beta$ are?
• Choose $\alpha$ and $\beta$ so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).
[Figure: scatter of data points with a fitted line; axes y and x]
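One way to make "minimising the vertical distances" operational is sketched below for the fund XXX data. Two assumptions go beyond this slide: the distances are squared before being summed (the usual least-squares criterion), and x is taken to be the market excess return with y the fund excess return, as the scatter diagram's axes suggest. The function names are hypothetical.

```python
def sum_sq_distances(alpha, beta, x, y):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yt - (alpha + beta * xt)) ** 2 for xt, yt in zip(x, y))

def least_squares_line(x, y):
    """Closed-form alpha and beta that minimise sum_sq_distances."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    s_xx = sum((xt - x_bar) ** 2 for xt in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Fund XXX example: x = market excess return, y = fund excess return
x = [13.7, 23.2, 6.9, 16.8, 12.3]
y = [17.8, 39.0, 12.8, 24.2, 17.2]

alpha_hat, beta_hat = least_squares_line(x, y)
best = sum_sq_distances(alpha_hat, beta_hat, x, y)
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}, sum of squares = {best:.2f}")

# Nudging either coefficient away from the estimates makes the fit worse
shifted = []
for d_alpha, d_beta in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    worse = sum_sq_distances(alpha_hat + d_alpha, beta_hat + d_beta, x, y)
    shifted.append(worse)
    print(f"shift ({d_alpha:+.1f}, {d_beta:+.1f}): sum of squares = {worse:.2f}")
```

Under these assumptions the estimates come out at roughly -1.74 for the intercept and 1.64 for the slope, and every shifted line has a larger sum of squared distances than the fitted one.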