Linear Model
YING SHEN, SSE, Tongji University, Sep. 2016
The basic form of the linear model
Given a sample $\boldsymbol{x} = (x_1, x_2, \ldots, x_d)^T$ with $d$ attributes, the linear model tries to learn a prediction function using a linear combination of all attributes, i.e.
$f(\boldsymbol{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$
The vector form of the function is $f(\boldsymbol{x}) = \boldsymbol{w}^T \boldsymbol{x} + b$, where $\boldsymbol{w} = (w_1, w_2, \ldots, w_d)^T$.
Once $\boldsymbol{w}$ and $b$ have been learned from samples, $f$ will be determined. For example (predicting how good a watermelon is from its attributes):
$f_{\text{good melon}}(\boldsymbol{x}) = 0.2 \cdot x_{\text{color}} + 0.5 \cdot x_{\text{root}} + 0.3 \cdot x_{\text{knock sound}} + 1$
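To make the prediction function concrete, here is a minimal Python sketch of evaluating such a linear model on one sample; the weights and bias follow the watermelon example above, while the sample's attribute values are assumed purely for illustration.

# Evaluate a linear model f(x) = w1*x1 + ... + wd*xd + b for one sample.
# Weights follow the illustrative example above; the attribute values of the
# sample (color, root, knock sound) are made-up numbers for this sketch.
w = [0.2, 0.5, 0.3]
b = 1.0
x = [0.7, 0.9, 0.6]

f_x = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear combination of all attributes plus b
print(f_x)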
Linear regression
Given a dataset $D = \{(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \ldots, (\boldsymbol{x}_m, y_m)\}$ with $\boldsymbol{x}_i = (x_{i1}, x_{i2}, \ldots, x_{id})^T$, the task of linear regression is to learn a linear model that predicts, for a new sample $\boldsymbol{x}'$, a value close to its true value $y'$.
When $d = 1$, $\boldsymbol{x}_i = x_i$. Example data:
Hours Spent Studying:  4    9    10   14   4    7    12   22   1    3    8
Math SAT Score:        390  580  650  730  410  530  600  790  350  400  590
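For later reference, the table can be written down directly as two Python lists; this is just the data shown above, with no preprocessing assumed.

# Hours spent studying (x) and Math SAT score (y), copied from the table above.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3, 8]
sat   = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400, 590]
assert len(hours) == len(sat)   # m = 11 samples, each with a single attribute (d = 1)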
Linear regression
We will learn a linear regression model $f(x_i) = w x_i + b$, such that $f(x_i) \simeq y_i$.
How do we determine $w$ and $b$?
Linear regression
Mean squared error (MSE) is a commonly used performance measure:
$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i' - y_i)^2$, where $y_i' = f(x_i)$ is the predicted value.
We want to minimize the MSE between $f(x_i)$ and $y_i$:
$(w^*, b^*) = \arg\min_{(w,b)} \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2 = \arg\min_{(w,b)} \sum_{i=1}^{m} \left(y_i - w x_i - b\right)^2$
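As a small sketch, the MSE objective can be evaluated for any candidate pair (w, b) on the single-attribute data above; the function name mse and the trial values of w and b below are assumptions made for illustration only.

# Mean squared error of a candidate line f(x) = w*x + b on the data.
def mse(w, b, xs, ys):
    m = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / m

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3, 8]
sat   = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400, 590]
print(mse(25.0, 350.0, hours, sat))   # arbitrary trial values of w and b

The least-squares estimate is the (w, b) pair that makes this quantity as small as possible.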
Linear regression
The method of determining the fitting model based on MSE is called the least squares method.
In the linear regression problem, the least squares method aims to find a line such that the sum of squared (vertical) distances from all the samples to the line is smallest.
Pre-requisite
A stationary point of a differentiable function of one variable is a point of the domain of the function where the derivative is zero.
Single-variable function: $f(x)$ is differentiable in $(a, b)$. At a stationary point $x_0$,
$\frac{df}{dx}\Big|_{x_0} = 0$
Two-variable function: $f(x, y)$ is differentiable in its domain. At a stationary point $(x_0, y_0)$,
$\frac{\partial f}{\partial x}\Big|_{(x_0, y_0)} = 0, \quad \frac{\partial f}{\partial y}\Big|_{(x_0, y_0)} = 0$
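As a small worked example (not from the slides), take the two-variable function $f(x, y) = (x - 1)^2 + (y + 2)^2$; setting both partial derivatives to zero recovers its single stationary point:

$\frac{\partial f}{\partial x} = 2(x - 1) = 0, \qquad \frac{\partial f}{\partial y} = 2(y + 2) = 0 \quad \Longrightarrow \quad (x_0, y_0) = (1, -2)$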
Pre-requisite
In the general case, if $\boldsymbol{x}_0$ is a stationary point of $f(\boldsymbol{x})$, $\boldsymbol{x} \in \mathbb{R}^{n \times 1}$, then
$\frac{\partial f}{\partial x_1}\Big|_{\boldsymbol{x}_0} = 0, \quad \frac{\partial f}{\partial x_2}\Big|_{\boldsymbol{x}_0} = 0, \quad \ldots, \quad \frac{\partial f}{\partial x_n}\Big|_{\boldsymbol{x}_0} = 0$
Proposition: Let $f$ be a differentiable function of $n$ variables defined on the convex set $S$, and let $\boldsymbol{x}_0$ be in the interior of $S$. If $f$ is convex, then $\boldsymbol{x}_0$ is a global minimizer of $f$ in $S$ if and only if it is a stationary point of $f$ (i.e. $\frac{\partial f}{\partial x_i}\big|_{\boldsymbol{x}_0} = 0$ for $i = 1, \ldots, n$).
[Figure: examples of convex and concave functions]
Parameter estimation
The function $E_{(w,b)} = \sum_{i=1}^{m} (y_i - w x_i - b)^2$ is a convex function.
Since $E_{(w,b)}$ is convex, its minimum is attained at the stationary point, i.e. where
$\frac{\partial E}{\partial w} = 0, \quad \frac{\partial E}{\partial b} = 0$
with
$\frac{\partial E}{\partial w} = 2\left(w \sum_{i=1}^{m} x_i^2 - \sum_{i=1}^{m} (y_i - b)\, x_i\right), \quad \frac{\partial E}{\partial b} = 2\left(m b - \sum_{i=1}^{m} (y_i - w x_i)\right)$
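For completeness, one intermediate step (applying the chain rule to each squared term) shows where these two expressions come from:

$\frac{\partial E}{\partial w} = \sum_{i=1}^{m} 2\,(y_i - w x_i - b)(-x_i) = 2\left(w \sum_{i=1}^{m} x_i^2 - \sum_{i=1}^{m} (y_i - b)\, x_i\right)$

$\frac{\partial E}{\partial b} = \sum_{i=1}^{m} 2\,(y_i - w x_i - b)(-1) = 2\left(m b - \sum_{i=1}^{m} (y_i - w x_i)\right)$

Because $E_{(w,b)}$ is a convex quadratic in $(w, b)$, the stationary point obtained by setting these to zero is the global minimum.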
Parameter estimation
Solving these equations yields closed-form expressions for $w$ and $b$:
$w = \frac{\sum_{i=1}^{m} y_i (x_i - \bar{x})}{\sum_{i=1}^{m} x_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2}, \quad b = \frac{1}{m} \sum_{i=1}^{m} (y_i - w x_i) = \bar{y} - w \bar{x}$
where $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$ are the means of $x$ and $y$.
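A minimal sketch of these closed-form expressions in plain Python, applied to the study-hours example from earlier; the function name fit_linear_regression is an assumption made for this sketch, not something defined in the slides.

# Closed-form least-squares fit of a single-attribute linear model f(x) = w*x + b.
def fit_linear_regression(xs, ys):
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    w = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / (sum(x * x for x in xs) - sum(xs) ** 2 / m)
    b = y_bar - w * x_bar   # equivalently (1/m) * sum of (y_i - w*x_i)
    return w, b

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3, 8]
sat   = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400, 590]
w, b = fit_linear_regression(hours, sat)
print(w, b)   # fitted slope and intercept for predicting SAT score from study hours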