Applied Statistics Lecture Notes
Dayu Wu

1 Revision

1. $E(\sum x_i) = \sum E x_i$
2. $\mathrm{Var}(x+y) = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x,y)$
3. $\mathrm{Cov}(a_1x+b_1y,\ a_2x+b_2y) = a_2\,\mathrm{Cov}(a_1x+b_1y,\ x) + b_2\,\mathrm{Cov}(a_1x+b_1y,\ y)$
4. Hypothesis test: $H_0$, $H_1$, $p$-value
5. If $x \sim N(\mu, \sigma^2)$, then $\frac{x-\mu}{\sigma} \sim N(0,1)$ and the $(1-\alpha)$ CI is $\left(\mu - \sigma z_{\alpha/2},\ \mu + \sigma z_{\alpha/2}\right)$

2 Linear Regression

Basic Concepts

1. $y_i = f(x_i) + \epsilon_i$
2. Gauss-Markov conditions:
   • $E(\epsilon_i) = 0$
   • $\mathrm{Var}(\epsilon_i) = \sigma^2$
   • $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$
3. Regression model: $E(y \mid x) = \beta_0 + \beta_1 x$, with $y = (y_1, \dots, y_n)^T$ and $x = (x_1, \dots, x_n)^T$
4. $\epsilon_i = y_i - \beta_0 - \beta_1 x_i$
5. $e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$
6. $\sum (x_i - \bar x)\,\bar y = 0$

Estimation: OLSE

1. Loss function: $Q = \sum e_i^2 = \sum (y_i - \beta_0 - \beta_1 x_i)^2$
2. First-order conditions: $\frac{\partial Q}{\partial \beta_0} = 0$ and $\frac{\partial Q}{\partial \beta_1} = 0$, i.e. $-2\sum (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$ and $-2\sum x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$. Equivalently: $\sum e_i = 0$ and $\sum x_i e_i = 0$.
3. The fitted line passes through the center $(\bar x, \bar y)$: $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$
4. $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
5. $\hat\beta_1 = \frac{L_{xy}}{L_{xx}} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \frac{\sum (x_i - \bar x)\, y_i}{\sum (x_i - \bar x)^2} = \frac{\sum x_i y_i - n\bar x \bar y}{\sum x_i^2 - n\bar x^2}$
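As a quick numerical check of the OLSE formulas above, here is a minimal numpy sketch; the toy data and every variable name (x, y, beta1_hat, and so on) are illustrative, not from the notes:

    import numpy as np

    # Toy paired sample (hypothetical values, for illustration only).
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    x_bar, y_bar = x.mean(), y.mean()
    Lxx = np.sum((x - x_bar) ** 2)           # L_xx = sum (x_i - x_bar)^2
    Lxy = np.sum((x - x_bar) * (y - y_bar))  # L_xy = sum (x_i - x_bar)(y_i - y_bar)

    beta1_hat = Lxy / Lxx                    # slope estimate L_xy / L_xx
    beta0_hat = y_bar - beta1_hat * x_bar    # intercept estimate y_bar - beta1_hat * x_bar

    e = y - beta0_hat - beta1_hat * x        # residuals e_i
    print(beta1_hat, beta0_hat)
    print(e.sum(), (x * e).sum())            # both ~ 0: the first-order conditions

The two printed sums vanish up to floating-point error, matching $\sum e_i = 0$ and $\sum x_i e_i = 0$.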
Estimation: MLE

1. Assumption: $\epsilon_i \sim N(0, \sigma^2)$
2. Probability density function: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
3. Likelihood function: $L(\beta_0, \beta_1, \sigma^2) = \prod f_{y_i} = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum (y_i - \beta_0 - \beta_1 x_i)^2\right\}$
4. Log-likelihood: $\log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum (y_i - \beta_0 - \beta_1 x_i)^2$
5. First-order condition: $\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (y_i - \beta_0 - \beta_1 x_i)^2 = 0$
6. $\hat\sigma^2 = \frac{1}{n}\sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = \frac{1}{n}\sum e_i^2$

Estimation: $\hat\beta_1$

1. $\hat\beta_1$ is a linear combination of the $y_i$: $\hat\beta_1 = \frac{\sum (x_i - \bar x)\, y_i}{\sum (x_i - \bar x)^2} = \sum \frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\, y_i$
2. Unbiased: $E\hat\beta_1 = \sum \frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\, E y_i = \sum \frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\,(\beta_0 + \beta_1 x_i) = \beta_1$
3. $\mathrm{Var}(\hat\beta_1) = \sum \left(\frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\right)^2 \mathrm{Var}(y_i) = \frac{\sigma^2}{\sum (x_i - \bar x)^2} = \frac{\sigma^2}{L_{xx}}$
4. $\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{L_{xx}}\right)$

Estimation: $\hat\beta_0$

1. $\hat\beta_0$ is a linear combination of the $y_i$: $\hat\beta_0 = \bar y - \hat\beta_1 \bar x = \frac{1}{n}\sum y_i - \bar x \sum \frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\, y_i$
2. Unbiased: $E\hat\beta_0 = E(\bar y - \hat\beta_1 \bar x) = \beta_0$
3. $\mathrm{Cov}(\bar y, \hat\beta_1) = \mathrm{Cov}\left(\frac{1}{n}\sum y_i,\ \sum \frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\, y_i\right) = \frac{1}{n}\sum \frac{x_i - \bar x}{\sum (x_j - \bar x)^2}\,\mathrm{Var}(y_i) = 0$
4. $\mathrm{Cov}(y_i, y_j) = \mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$
5. $\mathrm{Var}(\hat\beta_0) = \mathrm{Var}(\bar y - \hat\beta_1 \bar x) = \mathrm{Var}(\bar y) + \bar x^2\,\mathrm{Var}(\hat\beta_1) - 2\bar x\,\mathrm{Cov}(\bar y, \hat\beta_1) = \left(\frac{1}{n} + \frac{\bar x^2}{L_{xx}}\right)\sigma^2$
6. $\hat\beta_0 \sim N\left(\beta_0, \left(\frac{1}{n} + \frac{\bar x^2}{L_{xx}}\right)\sigma^2\right)$
7. $\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = \mathrm{Cov}(\bar y - \bar x \hat\beta_1, \hat\beta_1) = \mathrm{Cov}(\bar y, \hat\beta_1) - \bar x\,\mathrm{Var}(\hat\beta_1) = -\frac{\bar x}{L_{xx}}\sigma^2$
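A small simulation sketch, assuming the normal-error model of this section, to check the sampling distribution of $\hat\beta_1$ empirically; the simulated design and all names (rng, slopes, ...) are mine:

    import numpy as np

    rng = np.random.default_rng(0)
    n, beta0, beta1, sigma = 30, 1.0, 2.0, 0.5
    x = np.linspace(0.0, 10.0, n)
    Lxx = np.sum((x - x.mean()) ** 2)

    # Draw many samples and re-estimate the slope each time.
    slopes = []
    for _ in range(10_000):
        y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
        slopes.append(np.sum((x - x.mean()) * y) / Lxx)  # beta1_hat as a linear combination of y_i
    slopes = np.array(slopes)

    print(slopes.mean(), beta1)           # unbiasedness: E beta1_hat = beta1
    print(slopes.var(), sigma**2 / Lxx)   # Var(beta1_hat) = sigma^2 / L_xx

Both empirical values should land close to their theoretical counterparts.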
Estimation: $\hat y$

1. $\hat y = \hat\beta_0 + \hat\beta_1 x$
2. Unbiased: $E\hat y = \beta_0 + \beta_1 x = Ey$
3. $\mathrm{Var}(\hat y) = \mathrm{Var}(\hat\beta_0) + x^2\,\mathrm{Var}(\hat\beta_1) + 2x\,\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = \left(\frac{1}{n} + \frac{(x-\bar x)^2}{L_{xx}}\right)\sigma^2$
4. $\hat y \sim N\left(\beta_0 + \beta_1 x,\ \left(\frac{1}{n} + \frac{(x-\bar x)^2}{L_{xx}}\right)\sigma^2\right)$

t test

1. $H_0: \beta_1 = 0$; under $H_0$, $\hat\beta_1 \sim N\left(0, \frac{\sigma^2}{L_{xx}}\right)$
2. $\hat\sigma^2 = \frac{1}{n-2}\sum e_i^2 = \frac{1}{n-2}\sum (y_i - \hat y_i)^2$
3. $t = \frac{\hat\beta_1}{\sqrt{\hat\sigma^2 / L_{xx}}} = \frac{\hat\beta_1 \sqrt{L_{xx}}}{\hat\sigma} \sim t_{n-2}$ under $H_0$

F test

1. $SST = \sum (y_i - \bar y)^2 = L_{yy}$
2. $SSR = \sum (\hat y_i - \bar y)^2 = \frac{L_{xy}^2}{L_{xx}}$; under $H_0$, $SSR/\sigma^2 \sim \chi^2_1$
3. $SSE = \sum (y_i - \hat y_i)^2 = (n-2)\hat\sigma^2$, with $SSE/\sigma^2 \sim \chi^2_{n-2}$
4. $SST = SSR + SSE$
5. $F = \frac{SSR/1}{SSE/(n-2)} \sim F_{1, n-2}$ under $H_0$ (a numerical check follows at the end of this page)

Correlation

1. $r = \frac{L_{xy}}{\sqrt{L_{xx}L_{yy}}} = \hat\beta_1 \sqrt{\frac{L_{xx}}{L_{yy}}}$
2. $r^2 = \frac{SSR}{SST} = \frac{L_{xy}^2}{L_{xx}L_{yy}}$
3. $R^2 = 1 - \frac{SSE}{SST}$
4. $t = \frac{\sqrt{n-2}\, r}{\sqrt{1-r^2}} \sim t_{n-2}$

Residual

1. $Ee_i = 0$
2. $\mathrm{Var}(e_i) = \mathrm{Var}(y_i - \hat y_i) = \left(1 - \frac{1}{n} - \frac{(x_i - \bar x)^2}{L_{xx}}\right)\sigma^2 = (1 - h_i)\sigma^2$
3. Leverage: $h_i = \frac{1}{n} + \frac{(x_i - \bar x)^2}{L_{xx}}$
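The identities $t^2 = F$ and $r^2 = R^2$ in simple regression can be verified directly; a minimal sketch with hypothetical data (all names mine):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.2, 2.3, 2.8, 4.1, 4.9, 6.3])
    n = len(x)

    Lxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Lxx
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    SST = np.sum((y - y.mean()) ** 2)        # = L_yy
    SSR = np.sum((y_hat - y.mean()) ** 2)
    SSE = np.sum((y - y_hat) ** 2)
    sigma2_hat = SSE / (n - 2)

    t = b1 * np.sqrt(Lxx) / np.sqrt(sigma2_hat)  # t statistic for H0: beta1 = 0
    F = (SSR / 1) / (SSE / (n - 2))              # F statistic
    print(t**2, F)                               # equal in simple regression
    print(SSR / SST, 1 - SSE / SST)              # r^2 = R^2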
Confidence Interval

1. $\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{L_{xx}}\right)$
2. $\hat\beta_0 \sim N\left(\beta_0, \left(\frac{1}{n} + \frac{\bar x^2}{L_{xx}}\right)\sigma^2\right)$
3. $\hat y \sim N\left(\beta_0 + \beta_1 x,\ \left(\frac{1}{n} + \frac{(x-\bar x)^2}{L_{xx}}\right)\sigma^2\right)$
4. For example: $t = \frac{\hat\beta_1 - \beta_1}{\sqrt{\hat\sigma^2 / L_{xx}}} \sim t_{n-2}$. Thus $P\left(\left|\frac{\hat\beta_1 - \beta_1}{\sqrt{\hat\sigma^2 / L_{xx}}}\right| \le t_{\alpha/2}(n-2)\right) = 1 - \alpha$, so the $(1-\alpha)$ CI is $\left(\hat\beta_1 - \frac{\hat\sigma}{\sqrt{L_{xx}}}\, t_{\alpha/2},\ \hat\beta_1 + \frac{\hat\sigma}{\sqrt{L_{xx}}}\, t_{\alpha/2}\right)$

3 Multiple Regression

Basic Concepts

1. $\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ 1 & x_{21} & \dots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$
2. $Y = X\beta + \epsilon$
3. $\epsilon_i \sim N(0, \sigma^2)$, iid
4. $\epsilon \sim N(0, \sigma^2 I_n)$

Estimation: OLS

1. The first-order conditions give $X^T e = 0$, i.e. $X^T(Y - X\hat\beta) = 0$. We have $X^T Y = X^T X \hat\beta$. When $\det(X^T X) \neq 0$, $\hat\beta = (X^T X)^{-1} X^T Y$.
2. $\hat Y = X\hat\beta = X(X^T X)^{-1} X^T Y = HY$
3. Hat matrix: $H^2 = H$, $\mathrm{tr}(H) = \sum h_{ii} = p+1$
4. $e = Y - \hat Y = (I_n - H)Y$
5. $\mathrm{Cov}(e, e) = \sigma^2 (I_n - H)$, $\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$
6. $\hat\sigma^2 = \frac{1}{n-p-1}\sum e_i^2$
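A minimal numpy sketch of the matrix formulas above; the simulated design and all names are mine, and np.linalg.solve replaces the explicit inverse in $\hat\beta$ for numerical stability:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # first column of ones
    beta = np.array([1.0, 2.0, -1.0, 0.5])
    Y = X @ beta + rng.normal(0.0, 0.3, n)

    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ Y)  # solves X'X beta_hat = X'Y
    H = X @ np.linalg.inv(XtX) @ X.T          # hat matrix H
    e = Y - H @ Y                             # residuals (I - H)Y
    sigma2_hat = e @ e / (n - p - 1)          # unbiased variance estimate

    print(beta_hat)
    print(np.trace(H), p + 1)                 # tr(H) = p + 1
    print(np.allclose(H @ H, H))              # idempotence H^2 = H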
Estimation: MLE

1. $Y \sim N(X\beta, \sigma^2 I_n)$
2. $L = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}(Y - X\beta)^T (Y - X\beta)\right\}$

Propositions

1. $\hat\beta = (X^T X)^{-1} X^T Y$
2. $E(\hat\beta) = \beta$
3. $\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$
4. Gauss-Markov: $E(Y) = X\beta$, $\mathrm{Var}(Y) = \sigma^2 I_n$
5. $\mathrm{Cov}(\hat\beta, e) = 0$
6. $Y \sim N(X\beta, \sigma^2 I_n)$
7. $\hat\beta \sim N(\beta, \sigma^2 (X^T X)^{-1})$
8. $\frac{SSE}{\sigma^2} \sim \chi^2_{n-p-1}$

Test

1. $H_0: \beta_1 = \dots = \beta_p = 0$; $F = \frac{SSR/p}{SSE/(n-p-1)} \sim F(p, n-p-1)$
2. $H_0: \beta_j = 0$; $t_j = \frac{\hat\beta_j}{\sqrt{c_{jj}}\,\hat\sigma} \sim t_{n-p-1}$, where $c_{jj}$ is the $j$-th diagonal entry of $(X^T X)^{-1}$ (see the sketch at the end of this section)
3. $F_j = t_j^2$
4. $(1-\alpha)$ CI of $\beta_j$: $\left(\hat\beta_j - t_{\alpha/2}\sqrt{c_{jj}}\,\hat\sigma,\ \hat\beta_j + t_{\alpha/2}\sqrt{c_{jj}}\,\hat\sigma\right)$

Standardization

1. $x^*_{ij} = \frac{x_{ij} - \bar x_j}{\sqrt{L_{jj}}}$
2. $y^*_i = \frac{y_i - \bar y}{\sqrt{L_{yy}}}$
3. $\beta^*_j = \frac{\sqrt{L_{jj}}}{\sqrt{L_{yy}}}\,\hat\beta_j$

Correlation

1. $r = \begin{pmatrix} 1 & r_{12} & \dots & r_{1p} \\ r_{21} & 1 & \dots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \dots & 1 \end{pmatrix}$
2. $r^2_{y1;2} = \frac{SSE(x_2) - SSE(x_1, x_2)}{SSE(x_2)}$
3. $r_{12;3\dots p} = \frac{-\Delta_{12}}{\sqrt{\Delta_{11}\Delta_{22}}}$
4. $r_{12;3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
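As forward-referenced in the Test list above, a self-contained sketch of the per-coefficient t tests and CIs using the diagonal entries $c_{jj}$ of $(X^T X)^{-1}$; the data are simulated and scipy's t distribution supplies quantiles (all names mine):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0.0, 0.3, n)

    C = np.linalg.inv(X.T @ X)                 # c_jj on the diagonal
    beta_hat = C @ X.T @ Y
    e = Y - X @ beta_hat
    sigma2_hat = e @ e / (n - p - 1)

    se = np.sqrt(np.diag(C) * sigma2_hat)      # sqrt(c_jj) * sigma_hat
    t_stats = beta_hat / se                    # t_j ~ t_{n-p-1} under H0: beta_j = 0
    p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - p - 1)

    t_crit = stats.t.ppf(0.975, df=n - p - 1)  # two-sided, alpha = 0.05
    ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
    print(t_stats, p_vals)
    print(ci)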
4 Violation of Regression Assumptions

Heteroscedasticity

1. Spearman rank correlation: $r_s = 1 - \frac{6}{n(n^2-1)}\sum d_i^2$, where $d_i$ is the rank difference
2. $t = \frac{\sqrt{n-2}\, r_s}{\sqrt{1-r_s^2}} \sim t_{n-2}$

Weighted Least Squares

1. Loss function: $Q = \sum w_i e_i^2 = \sum w_i (y_i - \beta_0 - \beta_1 x_i)^2$
2. $\hat\beta_w = (X^T W X)^{-1} X^T W Y$

Box-Cox

1. $\lambda \neq 0$: $y^{(\lambda)} = \frac{y^\lambda - 1}{\lambda}$
2. $\lambda = 0$: $y^{(\lambda)} = \log y$

Autocorrelation

1. $\hat\rho = \sum_{t=2}^n e_t e_{t-1} \Big/ \left(\sqrt{\sum_{t=2}^n e_t^2}\,\sqrt{\sum_{t=2}^n e_{t-1}^2}\right)$
2. $DW = \sum_{t=2}^n (e_t - e_{t-1})^2 \Big/ \sum_{t=1}^n e_t^2 \approx 2(1 - \hat\rho) \in [0, 4]$

Outlier, High Leverage Point, and Influential Point

1. Outlier: large $|e_i|$, extreme $y_i$
2. High leverage point: extreme $x_i$
3. Influential point: removing it yields a noticeably different regression equation
4. Extreme $X$: Cook's distance $D_i = \frac{e_i^2}{(p+1)\hat\sigma^2} \cdot \frac{h_{ii}}{(1-h_{ii})^2}$ (a numerical sketch follows at the end of these notes)
5. Extreme $Y$: deleted residual $e_{(i)} = \frac{e_i}{1 - h_{ii}}$

5 Variable Selection

1. Full model and selected model
2. Criteria: $R^2_a$, AIC, $C_p$
3. Forward, backward, and stepwise selection
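Closing with the diagnostics of Section 4: a small sketch computing the Durbin-Watson statistic and Cook's distance for a simple regression fit; the data and all names are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 40, 1                                # one predictor
    x = np.linspace(0.0, 10.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

    X = np.column_stack([np.ones(n), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    e = y - H @ y                               # residuals
    h = np.diag(H)                              # leverages h_ii
    sigma2_hat = e @ e / (n - p - 1)

    DW = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)          # ~ 2(1 - rho_hat)
    D = e**2 / ((p + 1) * sigma2_hat) * h / (1 - h) ** 2   # Cook's distance D_i
    print(DW)
    print(np.argsort(D)[-3:])                   # three most influential observations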