16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde

Lecture 7

Last time: Moments of the Poisson distribution from its generating function.

$$ G(s) = e^{\mu(s-1)} $$

$$ \frac{dG}{ds} = \mu e^{\mu(s-1)}, \qquad \frac{d^2G}{ds^2} = \mu^2 e^{\mu(s-1)} $$

$$ \bar{X} = \left.\frac{dG}{ds}\right|_{s=1} = \mu $$

$$ \overline{X^2} = \left.\left(\frac{d^2G}{ds^2} + \frac{dG}{ds}\right)\right|_{s=1} = \mu^2 + \mu $$

$$ \sigma^2 = \overline{X^2} - \bar{X}^2 = \mu^2 + \mu - \mu^2 = \mu $$

Example: Using a telescope to measure the intensity of an object.

Photon flux → photoelectron flux. The number of photoelectrons is Poisson distributed. During an observation we cause N photoelectron emissions. N is the measure of the signal.

$$ S = \bar{N} = \lambda t, \qquad \sigma_N^2 = \mu = \lambda t, \qquad \sigma_N = \sqrt{\lambda t} $$

$$ \frac{S}{\sigma_N} = \frac{\lambda t}{\sqrt{\lambda t}} = \sqrt{\lambda t} = \sqrt{N} $$

For a signal-to-noise ratio of 10, we require N = 100 photoelectrons. All this follows from the property that the variance is equal to the mean. This is an unbounded experiment, whereas the binomial distribution is for a fixed number of trials n.
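As a quick sanity check on the variance-equals-mean property and the resulting $\sqrt{N}$ signal-to-noise ratio, here is a minimal Monte Carlo sketch in Python. The rate and observation time are arbitrary choices, and `poisson_sample` is our own illustrative helper, not anything from the notes.

```python
# Sketch: for a Poisson count, variance = mean, so SNR = N/sqrt(N) = sqrt(N).
# The rate lam and observation time t_obs are arbitrary illustrative values.
import random
from math import sqrt

random.seed(0)

def poisson_sample(mu: float) -> int:
    """Draw one Poisson(mu) variate by counting unit-rate exponential interarrivals."""
    k, t = 0, random.expovariate(1.0)
    while t < mu:
        k += 1
        t += random.expovariate(1.0)
    return k

lam, t_obs = 10.0, 10.0          # photoelectron rate and observation time
mu = lam * t_obs                 # expected count N = 100

counts = [poisson_sample(mu) for _ in range(5_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)

print(f"mean ~ {mean:.1f}, variance ~ {var:.1f}  (equal for a Poisson count)")
print(f"SNR = mean/std ~ {mean / sqrt(var):.1f}  (sqrt(100) = 10)")
```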
3. The Poisson Approximation to the Binomial Distribution

The binomial distribution, like the Poisson, is that of a random variable taking only non-negative integral values. Since it involves factorials, the binomial distribution is not very convenient for numerical application. We shall show under what conditions the Poisson expression serves as a good approximation to the binomial expression, and thus may be used for convenience.

$$ b(k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} $$

Consider a large number of trials, n, with small probability of success in each, p, such that the mean of the distribution, np, is of moderate magnitude. Define $\mu \equiv np$, with $n$ large and $p$ small, so that $p = \mu/n$.

Recalling Stirling's formula,

$$ n! \sim \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}, \qquad \lim_{n\to\infty} \frac{n!}{\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}} = 1 $$

we have

$$ b(k) = \frac{n!}{k!\,(n-k)!} \left(\frac{\mu}{n}\right)^k \left(1-\frac{\mu}{n}\right)^{n-k} \approx \frac{\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}}{k!\, \sqrt{2\pi}\, (n-k)^{n-k+\frac{1}{2}} e^{-(n-k)}} \left(\frac{\mu}{n}\right)^k \left(1-\frac{\mu}{n}\right)^{n-k} $$

$$ = \frac{\mu^k}{k!} \left(1-\frac{k}{n}\right)^{-\left(n-k+\frac{1}{2}\right)} e^{-k} \left(1-\frac{\mu}{n}\right)^{n-k} \approx \frac{\mu^k e^{-\mu}}{k!} $$

as $n$ becomes large relative to $k$. The relative error in this approximation is of order of magnitude

$$ \text{Rel. Error} \sim \frac{(k-\mu)^2}{n} $$

However, for values of k much smaller or larger than $\mu$, the probability becomes small.
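A short numerical comparison illustrates how closely the Poisson expression tracks the binomial near the mean, and how the relative error grows with $(k-\mu)^2$. This is only a sketch; the values of n and p are arbitrary choices.

```python
# Compare the exact binomial pmf to its Poisson approximation.
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability b(k) = C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability mu^k e^(-mu) / k!."""
    return mu**k * exp(-mu) / factorial(k)

n, p = 1000, 0.005          # many trials, small success probability
mu = n * p                  # moderate mean, mu = 5

for k in range(11):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, mu)
    # The relative error grows roughly like (k - mu)^2 / n, as stated above.
    print(f"k={k:2d}  binomial={b:.6f}  poisson={q:.6f}  rel.err={(q - b) / b:+.5f}")
```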
The Normal Distribution

Outline:
1. Describe the common use of the normal distribution
2. The practical employment of the Central Limit Theorem
3. Relation to tabulated functions
   - Normal distribution function
   - Normal error function
   - Complementary error function

1. Describe the common use of the normal distribution

Normally distributed variables appear repeatedly in physical situations.
• Voltage across the plate of a vacuum tube
• Radar angle tracking noise
• Atmospheric gust velocity
• Wave height in the open sea

2. The practical employment of the Central Limit Theorem

$X_i\ (i = 1, 2, \ldots, n)$ are independent random variables. Define the sum of these $X_i$ as

$$ S = \sum_{i=1}^{n} X_i, \qquad \bar{S} = \sum_{i=1}^{n} \bar{X}_i, \qquad \sigma_S^2 = \sum_{i=1}^{n} \sigma_i^2 $$

Then under the condition

$$ \lim_{n\to\infty} \frac{\beta}{\sigma_S^3} = 0, \qquad \beta = \sum_{i=1}^{n} \beta_i, \qquad \beta_i = \overline{\left|X_i - \bar{X}_i\right|^3} $$

the limiting distribution of $S$ is the normal distribution.
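As an illustration, the following Monte Carlo sketch standardizes a sum of strongly skewed exponential summands (an arbitrary choice of distribution) and compares the empirical distribution of the standardized sum against the normal distribution function at a test point.

```python
# Sketch: the standardized sum of n i.i.d. skewed variables approaches the
# normal distribution. Sample sizes and the test point are arbitrary choices.
import random
from math import erf, sqrt

def phi(x: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

random.seed(0)
n, trials = 50, 20_000
mean_i, var_i = 1.0, 1.0             # exponential(1): mean 1, variance 1

x = 1.0                              # test point for the standardized sum
count = 0
for _ in range(trials):
    s = sum(random.expovariate(1.0) for _ in range(n))
    z = (s - n * mean_i) / sqrt(n * var_i)   # standardize S
    if z <= x:
        count += 1

print(f"empirical P(Z <= {x}) = {count / trials:.4f}")
print(f"normal    Phi({x})    = {phi(x):.4f}")
```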
Note that this is true for any distributions of the $X_i$. These are sufficient conditions under which the theorem can be proved; it is not clear that they are necessary.

Notice that each of the noises mentioned earlier depends on the accumulated effect of a great many small causes, e.g., voltage across the plate: electrons traveling from cathode to plate.

It is convenient to work with the characteristic function since we are dealing with the sum of independent variables.

Normal probability density function:

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}} $$

Normal probability distribution function:

$$ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(u-m)^2}{2\sigma^2}}\, du = \int_{-\infty}^{\frac{x-m}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv $$

where

$$ m = \bar{X}, \qquad v = \frac{u-m}{\sigma}, \qquad dv = \frac{1}{\sigma}\, du $$

This integral with the integrand normalized is tabulated. It is called the normal probability function and symbolized with $\Phi$:

$$ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv $$

This is a different $x$. Note the relationship between this and the quantity $x$ previously defined; we use $x$ again here because this is how $\Phi$ is usually written.
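In modern practice the tabulated function is evaluated numerically. A minimal sketch: $F(x)$ follows from the same change of variable, $F(x) = \Phi\!\left(\frac{x-m}{\sigma}\right)$, computed here with Python's `math.erf` (the relation between $\Phi$ and the error function is developed just below).

```python
# Sketch: evaluate the normal distribution function F(x) by standardizing
# and applying Phi(x) = (1/2)[1 + erf(x/sqrt(2))].
from math import erf, sqrt

def normal_cdf(x: float, m: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) for a normal variable with mean m and standard deviation sigma."""
    z = (x - m) / sigma                      # change of variable v = (u-m)/sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example: probability of falling within one standard deviation of the mean.
print(normal_cdf(1.0) - normal_cdf(-1.0))    # ~0.6827
```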
Not only this function but also its first several derivatives, which appear in analytic work, are tabulated.

3. Relation to tabulated functions

Even more generally available are the closely related functions:

Error function:
$$ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\, du $$

Complementary error function:
$$ \operatorname{cerf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}\, du $$

$$ \Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] $$

The characteristic function of the normal distribution is

$$ \phi(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{jtx}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx = \frac{e^{jtm}}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{j\sqrt{2}\,\sigma t y}\, e^{-y^2}\, dy, \qquad \text{where } y = \frac{x-m}{\sqrt{2}\,\sigma} $$

$$ = \frac{e^{jtm}}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left[\cos\left(\sqrt{2}\,\sigma t y\right) + j\sin\left(\sqrt{2}\,\sigma t y\right)\right] e^{-y^2}\, dy = \frac{e^{jtm}}{\sqrt{\pi}} \int_{-\infty}^{\infty} \cos\left(\sqrt{2}\,\sigma t y\right) e^{-y^2}\, dy $$

since the sine term is odd and integrates to zero. Then

$$ \phi(t) = \frac{e^{jtm}}{\sqrt{\pi}}\, \sqrt{\pi}\, e^{-\frac{\sigma^2 t^2}{2}} = e^{jtm - \frac{\sigma^2 t^2}{2}} $$

Differentiation of this form will correctly yield the first two moments of the distribution.
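As a symbolic sanity check of that last remark, a sympy sketch (the variable names are our own choices) differentiates $\phi(t)$ at $t = 0$ and recovers the mean and second moment via $\overline{X^n} = \frac{1}{j^n}\left.\frac{d^n\phi}{dt^n}\right|_{t=0}$.

```python
# Sketch: differentiate phi(t) = exp(jtm - sigma^2 t^2 / 2) to get moments.
import sympy as sp

t = sp.symbols('t', real=True)
m = sp.symbols('m', real=True)
sigma = sp.symbols('sigma', positive=True)

phi = sp.exp(sp.I * t * m - sigma**2 * t**2 / 2)

# E[X^n] = (1/j^n) * d^n(phi)/dt^n evaluated at t = 0
first_moment = sp.simplify(sp.diff(phi, t).subs(t, 0) / sp.I)
second_moment = sp.simplify(sp.diff(phi, t, 2).subs(t, 0) / sp.I**2)

print(first_moment)                                   # m
print(sp.expand(second_moment))                       # m**2 + sigma**2
print(sp.simplify(second_moment - first_moment**2))   # variance: sigma**2
```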
Most important property of normal variables: any linear combination (weighted sum) of normal variables, whether independent or not, is another normal variable.

Note that for zero-mean variables

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad \phi(t) = e^{-\frac{\sigma^2 t^2}{2}} $$

Both are Gaussian forms.

The Normal Approximation to the Binomial Distribution

The binomial distribution deals with the outcomes of n independent trials of an experiment. Thus if n is large, we should expect the binomial distribution to be well approximated by the normal distribution. The approximation is given by the normal distribution having the same mean and variance. Thus

$$ b(k, n, p) \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k-np)^2}{2npq}} $$

The relative error is of the order of

$$ \frac{(k-np)^3}{(npq)^2} $$

The relative fit is good near the mean if npq is large, and degenerates in the tails where the probability itself is small.
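A brief numerical sketch (n, p, and the test points are arbitrary choices) illustrates both statements: the relative fit is good near the mean when npq is large, and degenerates out in the tail.

```python
# Compare the exact binomial pmf to the matched-moments normal density.
from math import comb, exp, pi, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_density(k: float, mean: float, var: float) -> float:
    return exp(-(k - mean)**2 / (2 * var)) / sqrt(2 * pi * var)

n, p = 100, 0.5
mean, var = n * p, n * p * (1 - p)      # npq = 25 here, comfortably large

for k in (50, 55, 60, 70):              # near the mean, then out in the tail
    b = binomial_pmf(k, n, p)
    g = normal_density(k, mean, var)
    print(f"k={k}  binomial={b:.6e}  normal={g:.6e}  rel.err={(g - b) / b:+.4f}")
```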
The Normal Approximation to the Poisson Distribution

The Poisson distribution also depends on the outcomes of independent events. If there are enough of them,

$$ P(k, \mu) \approx \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(k-\mu)^2}{2\mu}} $$

The relative error is of the order of

$$ \frac{(k-\mu)^3}{\mu^2} $$

and the relative fit is subject to the same behavior as the binomial approximation.

Interpretation of a continuous distribution approximating a discrete one: the value of the normal density function at any k approximates the value of the discrete distribution for that value of k. Think of spreading the area of each impulse over a unit interval. Then the height of each rectangle is the probability that the corresponding value of k will be taken. The normal curve approximates this step-wise function.

[Figure: discrete probabilities P(k = x) for k = 0, 1, ..., 5 spread into unit-width rectangles, with the approximating normal curve overlaid.]

Note that in summing the probabilities for values of k in some interval, the approximating normal curve should be integrated over that interval plus ½ on each end to get all the probability associated with those values of k:

$$ P(N_1 \le X \le N_2) = \sum_{k=N_1}^{N_2} P(k) \approx \int_{N_1-\frac{1}{2}}^{N_2+\frac{1}{2}} \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(x-\mu)^2}{2\mu}}\, dx = \Phi\!\left(\frac{N_2+\frac{1}{2}-\mu}{\sqrt{\mu}}\right) - \Phi\!\left(\frac{N_1-\frac{1}{2}-\mu}{\sqrt{\mu}}\right) $$
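The continuity correction is easy to check numerically. In this sketch (μ and the interval endpoints are arbitrary choices), the corrected normal integral closely matches the exact Poisson sum.

```python
# Sketch: continuity-corrected normal approximation to a Poisson sum.
from math import erf, exp, factorial, sqrt

def phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def poisson_pmf(k: int, mu: float) -> float:
    return mu**k * exp(-mu) / factorial(k)

mu, n1, n2 = 25.0, 20, 30

exact = sum(poisson_pmf(k, mu) for k in range(n1, n2 + 1))
# Integrate the normal curve over [n1 - 1/2, n2 + 1/2].
approx = phi((n2 + 0.5 - mu) / sqrt(mu)) - phi((n1 - 0.5 - mu) / sqrt(mu))

print(f"exact Poisson sum     = {exact:.5f}")
print(f"corrected normal      = {approx:.5f}")
```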
Multidimensional Normal Distribution

Probability density function:

$$ f(x) = \frac{1}{(2\pi)^{\frac{n}{2}} \sqrt{|M|}} \exp\left[-\frac{1}{2}\left(x-\bar{X}\right)^T M^{-1} \left(x-\bar{X}\right)\right] $$

Assuming zero mean, which is often the case:

$$ f(x) = \frac{1}{(2\pi)^{\frac{n}{2}} \sqrt{|M|}} \exp\left[-\frac{1}{2}\, x^T M^{-1} x\right] $$

For zero-mean variables, contours of constant probability density are given by

$$ x^T M^{-1} x = c^2 $$

This is not expressed in principal coordinates if the $X_i$ are correlated. We need only the rudimentary properties of eigenvalues and eigenvectors: $M$ is symmetric and full rank, so

$$ M v_i = \lambda_i v_i, \qquad v_i^T v_j = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases} $$

This probability density function can be better visualized in terms of its principal coordinates. These coordinates are defined by the directions of the eigenvectors of the covariance matrix. The appropriate transformation is

$$ y = Vx, \qquad V = \begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix} $$

Thus $y_i$ is the component of $x$ in the direction $v_i$. In terms of the new variable $y$, the contours of constant probability density are
$$ x^T M^{-1} x = y^T V M^{-1} V^T y = y^T Y^{-1} y, \qquad Y^{-1} = V M^{-1} V^T, \quad Y = V M V^T $$

Since $M V^T = M \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{bmatrix}$,

$$ Y = V M V^T = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} $$

$Y$ is the covariance matrix for the random variable $Y = VX$, so the $\lambda_i$ are the variances of the $Y_i$.

$$ Y^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_n} \end{bmatrix} $$

$$ y^T Y^{-1} y = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} + \cdots + \frac{y_n^2}{\lambda_n} = c^2 $$

These are the principal coordinates, with intercepts at $y_i = \pm c\sqrt{\lambda_i}$, where $\sqrt{\lambda_i}$ is the standard deviation of $y_i$.
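The diagonalization above is easy to exercise numerically. This sketch (the covariance entries are arbitrary choices) forms $V$ from the eigenvectors of $M$, confirms that $VMV^T$ is diagonal with the eigenvalues on the diagonal, and checks on a sample that the components of $y = Vx$ are uncorrelated.

```python
# Sketch: principal coordinates of a zero-mean bivariate normal.
import numpy as np

# Covariance matrix M of the correlated variables X1, X2 (arbitrary values).
M = np.array([[4.0, 1.5],
              [1.5, 1.0]])

# Eigen-decomposition of the symmetric covariance matrix.
lam, vecs = np.linalg.eigh(M)      # columns of `vecs` are eigenvectors v_i
V = vecs.T                         # rows of V are v_i^T, so y = V x

# Y = V M V^T should be diagonal, with the eigenvalues (variances of Y_i).
Y = V @ M @ V.T
print(np.round(Y, 10))
print("variances of Y_i:", lam)

# Sample check: the components of y = Vx should be uncorrelated.
rng = np.random.default_rng(0)
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=M, size=200_000)
y = x @ V.T
print("sample covariance of y:\n", np.round(np.cov(y.T), 3))
```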
Note that two random variables, each having a normal distribution singly, do not necessarily have a binormal joint distribution. However, if the random variables are independent and normally distributed, their joint distribution is clearly a multidimensional normal distribution.

Two-dimensional case, continued:

$$ m_{ij} = \overline{X_i X_j} $$

$$ f(x_1, x_2) = \frac{1}{2\pi\sqrt{m_{11} m_{22} - m_{12}^2}} \exp\left[-\frac{m_{22} x_1^2 - 2 m_{12} x_1 x_2 + m_{11} x_2^2}{2\left(m_{11} m_{22} - m_{12}^2\right)}\right] $$

In terms of the symbols

$$ m_{11} = \sigma_1^2, \qquad m_{22} = \sigma_2^2, \qquad m_{12} = \rho_{12}\, \sigma_1 \sigma_2 $$

this becomes

$$ f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho_{12}^2}} \exp\left[-\frac{\left(\frac{x_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{x_1}{\sigma_1}\right)\left(\frac{x_2}{\sigma_2}\right) + \left(\frac{x_2}{\sigma_2}\right)^2}{2\left(1-\rho_{12}^2\right)}\right] $$

Note that if a set of random variables having the multidimensional normal distribution is uncorrelated, they are independent. This is not true in general.
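As a consistency check, the sketch below (all numeric values are arbitrary choices) evaluates both the $m_{ij}$ form and the $(\sigma, \rho)$ form of the bivariate density at a few points; the two should agree.

```python
# Sketch: the matrix form and the (sigma, rho) form of the bivariate normal
# density are the same function, written two ways.
from math import exp, pi, sqrt

m11, m22, m12 = 4.0, 1.0, 1.5          # covariance entries (arbitrary)
s1, s2 = sqrt(m11), sqrt(m22)
rho = m12 / (s1 * s2)                  # correlation coefficient rho_12

def f_matrix(x1: float, x2: float) -> float:
    det = m11 * m22 - m12**2
    q = (m22 * x1**2 - 2 * m12 * x1 * x2 + m11 * x2**2) / det
    return exp(-q / 2) / (2 * pi * sqrt(det))

def f_rho(x1: float, x2: float) -> float:
    u, v = x1 / s1, x2 / s2
    q = (u**2 - 2 * rho * u * v + v**2) / (1 - rho**2)
    return exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho**2))

for pt in [(0.0, 0.0), (1.0, -0.5), (2.0, 1.0)]:
    print(pt, f_matrix(*pt), f_rho(*pt))   # the two forms should match
```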