Statistical Learning Theory and Applications
Lecture 9: Data Representation - Non-Parametric Model
Instructor: Quan Wen, SCSE@UESTC, Fall 2021
Outline (Level 1)
1. Density Estimation
2. Histogram Method
3. Parzen Window
4. k-Nearest Neighbors
5. Summary of Non-Parametric Models
1. Density Estimation

Basic concepts:
1. Density estimation: estimating the probability density function p(x) from a given set of training samples D = {x_1, x_2, ..., x_N}.
2. Estimated density: denoted by p̂(x).
3. The training samples are assumed i.i.d., distributed according to p(x).
4. Parametric estimation: estimate the parameter vector θ of a density p(x; θ).
5. Non-parametric estimation: estimate the function p̂ : X → R itself.
6. Because only a finite number of training samples is available, the estimated density will contain some error.
▶ Parametric density estimation assumes that the global form of the distribution (i.e., its functional form) is known.
▶ In practice, however, we often know nothing about the true distribution.
▶ Non-parametric models can be applied to any probability distribution, without assuming that its form is known (a small numerical sketch of this contrast follows below).
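To make the contrast concrete, here is a minimal Python sketch (my own illustration, not part of the lecture): a parametric Gaussian fitted by maximum likelihood is compared with a crude non-parametric estimate that just counts the fraction of samples in a small interval around each query point. The bimodal mixture, the window half-width h, and the query points are all assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a bimodal mixture that no single Gaussian describes well.
n = 2000
data = np.concatenate([rng.normal(-2.0, 0.5, n // 2),
                       rng.normal(+2.0, 0.5, n // 2)])

def parametric_gaussian(x, samples):
    """Parametric estimate: assume p(x; mu, sigma) is Gaussian, fit mu and sigma by ML."""
    mu, sigma = samples.mean(), samples.std()
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def nonparametric_local_fraction(x, samples, h=0.25):
    """Non-parametric estimate: fraction of samples within [x - h, x + h],
    divided by the interval length 2h (a crude local density)."""
    k = np.sum(np.abs(samples - x) <= h)
    return k / (len(samples) * 2.0 * h)

for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:+.1f}   parametric: {parametric_gaussian(x, data):.3f}   "
          f"non-parametric: {nonparametric_local_fraction(x, data):.3f}")
```

On this bimodal data the single fitted Gaussian places substantial density at x = 0, where hardly any samples lie, while the local-count estimate follows the two modes; that flexibility is exactly what the non-parametric approach provides.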
Outline (Level 2)
1. Density Estimation
   - Probability of a Region R
   - Probability Density of a Region R
1.1. Probability of a Region R

▶ Assume $n$ samples $x_1, x_2, \ldots, x_n$ are drawn from the probability density $p(x)$. Then the probability $P$ that a vector $x$ falls in a region $R$ is
  $P = \int_R p(x')\, dx'$.
▶ For $n$ samples, the probability that exactly $k$ of them fall in the region $R$ is given by the binomial distribution:
  $P_k = \binom{n}{k} P^k (1 - P)^{n-k}$.
▶ The expectation and variance of the random variable $k$ are
  $E[k] = nP$, $\mathrm{Var}(k) = nP(1 - P)$.
▶ It is not easy to calculate $E[k] = \sum_{k=0}^{n} k P_k$ and $\mathrm{Var}[k] = \sum_{k=0}^{n} (k - E[k])^2 P_k$ directly from these sums (a numerical check of this setup is sketched below).
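As a sanity check of the setup above, the following sketch (assuming, purely for illustration, that p(x) is the standard normal density and R = [0, 1]) computes P = ∫_R p(x') dx' in closed form and compares it with the fraction of samples that actually land in R.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Illustrative choices: p(x) is the standard normal density, R = [0, 1].
a, b = 0.0, 1.0

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

P = Phi(b) - Phi(a)  # P = integral of p over R

n = 10_000
samples = rng.standard_normal(n)
k = int(np.sum((samples >= a) & (samples <= b)))  # number of samples falling in R

print(f"P         = {P:.4f}")
print(f"k / n     = {k / n:.4f}   (should be close to P)")
print(f"E[k] = nP = {n * P:.1f},   observed k = {k}")
```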
Binomial Distribution: Deduction of E[k] and Var(k)

▶ Mean: write $k = X_1 + \cdots + X_n$, where the $X_i \in \{0, 1\}$ are independent Bernoulli random variables with $\Pr(X_i = 1) = P$, so $E[X_i] = P$.
▶ Therefore
  $E[k] = E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] = \underbrace{P + \cdots + P}_{n\ \text{times}} = nP$.
▶ Variance: since the $X_i$ are independent and $\mathrm{Var}(X_i) = P(1 - P)$,
  $\mathrm{Var}(k) = \mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n) = n\, \mathrm{Var}(X_1) = nP(1 - P)$.
  (A quick simulation check of these two formulas follows below.)
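The two formulas can be verified numerically by simulating k as a sum of n independent Bernoulli(P) variables; this is only a sketch, and n = 50, P = 0.7 are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(2)

n, P, trials = 50, 0.7, 100_000  # illustrative values
# Each row is one experiment: k = X_1 + ... + X_n with X_i ~ Bernoulli(P).
k = rng.binomial(1, P, size=(trials, n)).sum(axis=1)

print(f"empirical   E[k] = {k.mean():.3f}   Var(k) = {k.var():.3f}")
print(f"theoretical E[k] = {n * P:.3f}   Var(k) = {n * P * (1 - P):.3f}")
```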
[Figure: binomial relative-frequency distributions of $k/n$ (curves for $n = 20, 50, 100$), peaked around $P = 0.7$; the peak sharpens as $n$ grows.]
▶ For the relative frequency $k/n$ we therefore have
  $E[k/n] = E[k]/n = P$, $\mathrm{Var}[k/n] = \mathrm{Var}[k]/n^2 = P(1 - P)/n$.
▶ When $n$ is very large, $k/n$ has a sharp distribution concentrated at the mean $P$. Therefore
  $P \approx \dfrac{k}{n}$
  (the sketch below shows numerically how quickly $k/n$ concentrates around $P$).
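The concentration of k/n around P, which the figure above illustrates, can also be checked numerically. In this sketch P = 0.7 (as in the figure) and the sample sizes are illustrative; the empirical standard deviation of k/n should track sqrt(P(1 - P)/n).

```python
import numpy as np

rng = np.random.default_rng(3)

P, trials = 0.7, 100_000
for n in (20, 50, 100, 1000):
    # Relative frequency k/n over many repeated experiments of size n.
    freq = rng.binomial(n, P, size=trials) / n
    theory = np.sqrt(P * (1 - P) / n)
    print(f"n = {n:5d}   mean(k/n) = {freq.mean():.4f}   "
          f"std(k/n) = {freq.std():.4f}   (theory: {theory:.4f})")
```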