16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde

Lecture 4

Last time: Left off with characteristic function.

4. Prove $\phi_S(t) = \prod_{i=1}^{n} \phi_{X_i}(t)$ where $S = X_1 + X_2 + \dots + X_n$ ($X_i$ independent)

Let $S = X_1 + X_2 + \dots + X_n$, where the $X_i$ are independent.

$$\phi_S(t) = E\left[e^{jtS}\right] = E\left[e^{jt(X_1 + X_2 + \dots + X_n)}\right] = E\left[e^{jtX_1}\right] E\left[e^{jtX_2}\right] \cdots E\left[e^{jtX_n}\right] = \prod_{i=1}^{n} \phi_{X_i}(t)$$

The expectation of the product factors into a product of expectations because the $X_i$ are independent.

This is the main reason why use of the characteristic function is convenient. The same result follows, less directly, from the fact that the density function of the sum of n independent random variables is the n-fold convolution of the individual density functions, together with the fact that convolution in the direct variable domain becomes multiplication in the transform domain.

5. Maclaurin series expansion of $\phi(t)$

Because $f(x)$ is non-negative and $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (or, even better, $\int_{-\infty}^{\infty} |f(x)|\,dx = 1$), it follows that $\int_{-\infty}^{\infty} |f(x)|\,dx$ converges, so that $f(x)$ is Fourier transformable. Thus the characteristic function $\phi(t)$ exists for all distributions, and the inverse relation $\phi(t) \to f(x)$ holds for all distributions.

If, in addition, the moments of all orders exist and do not grow too quickly with order, $\phi(t)$ is analytic for real values of $t$. (Existence of $\phi(t)$ alone is not enough: the Cauchy distribution has $\phi(t) = e^{-|t|}$, which is not differentiable at $t = 0$.) It can then be expanded in a power series about $t = 0$, valid wherever the series converges:

$$\phi(t) = \phi(0) + \phi^{(1)}(0)\,t + \frac{1}{2!}\phi^{(2)}(0)\,t^2 + \dots + \frac{1}{n!}\phi^{(n)}(0)\,t^n + \dots$$

$$\phi(t) = \int_{-\infty}^{\infty} f(x)\,e^{jtx}\,dx, \qquad \phi(0) = 1$$
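As a numerical check of property 4 (a sketch, not part of the original notes), the Python below builds the distribution of the sum of two fair dice by direct convolution and compares its characteristic function with the product of the individual characteristic functions. The fair-die distribution, the helper names `char_fn` and `pmf_of_sum`, and the test point t = 0.7 are all assumptions chosen for illustration.

```python
import cmath
from itertools import product


def char_fn(pmf, t):
    """phi(t) = E[exp(j t X)] for a discrete pmf given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


def pmf_of_sum(pmf_a, pmf_b):
    """pmf of A + B for independent A and B (a discrete convolution)."""
    out = {}
    for (a, pa), (b, pb) in product(pmf_a.items(), pmf_b.items()):
        out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


# Hypothetical example distribution: one fair six-sided die.
die = {k: 1 / 6 for k in range(1, 7)}
two_dice = pmf_of_sum(die, die)        # distribution of S = X1 + X2

t = 0.7                                # arbitrary test point
phi_sum = char_fn(two_dice, t)         # E[exp(j t S)] from the pmf of S
phi_product = char_fn(die, t) ** 2     # phi_X1(t) * phi_X2(t)
gap = abs(phi_sum - phi_product)
print(gap)
```

The printed gap should sit at the level of floating-point rounding, which illustrates both routes described in property 4: convolution in the direct variable domain and multiplication in the transform domain.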
Differentiating $\phi(t) = \int_{-\infty}^{\infty} f(x)\,e^{jtx}\,dx$ $n$ times with respect to $t$ under the integral sign gives

$$\frac{d^n \phi(t)}{dt^n} = \int_{-\infty}^{\infty} f(x)\,(jx)^n e^{jtx}\,dx$$

$$\phi^{(n)}(0) = j^n \int_{-\infty}^{\infty} x^n f(x)\,dx = j^n\,\overline{X^n}$$

$$\phi(t) = 1 + j\overline{X}\,t + \frac{1}{2!} j^2\,\overline{X^2}\,t^2 + \dots + \frac{1}{n!} j^n\,\overline{X^n}\,t^n + \dots$$

The coefficients of the expansion are given by the moments of the distribution. Thus the characteristic function can be determined from the moments. Similarly, the moments can be determined from the characteristic function directly by

$$\overline{X^n} = \frac{1}{j^n} \left. \frac{d^n \phi(t)}{dt^n} \right|_{t=0}$$

or by expanding $\phi(t)$ into its power series in some other way and identifying the coefficients of the various powers of $t$.

The Generating Function

The generating function has its most useful application to random variables which take integer values only. Examples of such would be the number of telephone calls into a switchboard in a certain time interval, the number of cars entering a toll station in a certain time interval, the number of times a 7 is thrown in n tosses of 2 dice, etc.

For integer-valued random variables, the Generating Function yields the same advantages as the Characteristic Function and is of simpler form.

Consider a random variable which takes the integer values $k$:

$$P(X = k) = p_k \qquad (k = 0, 1, 2, \dots)$$

For a discrete distribution you can sum in lieu of integration. The Characteristic Function for this random variable is

$$\phi(t) = E\left[e^{jtX}\right] = \sum_{k=0}^{\infty} p_k\,e^{jtk} = \sum_{k=0}^{\infty} p_k \left(e^{jt}\right)^k$$

If we define a new variable $s = e^{jt}$, we have

$$G(s) = \sum_{k=0}^{\infty} p_k\,s^k$$
which is called the Generating Function. It has all the interesting properties of the characteristic function. Note that $t \to 0$ corresponds to $s \to 1$.

Let's establish the connection between moments of a distribution and the generating function:

$$\frac{dG}{ds} = \sum_{k=0}^{\infty} k\,p_k\,s^{k-1}$$

$$\frac{d^2G}{ds^2} = \sum_{k=0}^{\infty} k(k-1)\,p_k\,s^{k-2} = \sum_{k=0}^{\infty} k^2 p_k\,s^{k-2} - \sum_{k=0}^{\infty} k\,p_k\,s^{k-2}$$

Just calculate $\left.\frac{dG}{ds}\right|_{s=1}$ and $\left.\frac{d^2G}{ds^2}\right|_{s=1}$ and reorganize them in terms of $\overline{X}$ and $\overline{X^2}$:

$$\left.\frac{dG}{ds}\right|_{s=1} = \sum_{k=0}^{\infty} k\,p_k = \overline{X} \qquad \leftarrow \text{1st moment expression}$$

$$\left.\frac{d^2G}{ds^2}\right|_{s=1} = \sum_{k=0}^{\infty} k^2 p_k - \sum_{k=0}^{\infty} k\,p_k = \overline{X^2} - \overline{X}$$

$$\overline{X^2} = \left.\frac{d^2G}{ds^2}\right|_{s=1} + \left.\frac{dG}{ds}\right|_{s=1} \qquad \leftarrow \text{2nd moment expression}$$

Each moment is a linear combination of the derivative of the same order and the lower-order derivatives, all evaluated at $s = 1$.

The generating function for the sum of independent integer-valued variables is the product of their generating functions. This is harder to prove than the same property of the characteristic function, but it does, in fact, hold true.

Multiple Random Variables

To characterize a joint set of random variables, define a probability distribution function

$$F(\mathbf{x}) = P(X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n)$$

This is called the joint probability distribution function.

Properties:

If any of the arguments $x_i$ goes to $-\infty$, then $F(\mathbf{x}) \to 0$.

$$\lim_{\text{any } x_i \to -\infty} F(\mathbf{x}) = 0$$
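If all of the $x_i$ go to $\infty$, then $F(\mathbf{x}) \to 1$.

$$\lim_{\text{all } x_i \to \infty} F(\mathbf{x}) = 1$$

$F(\mathbf{x})$ is monotonically non-decreasing in each $x_i$.

Define the joint density function by differentiation:

$$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{x})}{\partial x_1\,\partial x_2 \cdots \partial x_n}$$

$$f(\mathbf{x}) \ge 0, \quad \forall\,\mathbf{x}$$

$$F_{x_1 \dots x_n}(x_1, \dots, x_n) = \int_{-\infty}^{x_1} du_1 \cdots \int_{-\infty}^{x_n} du_n\, f_{x_1 \dots x_n}(u_1, \dots, u_n)$$

Setting each $x_i \to \infty$,

$$\int_{-\infty}^{\infty} du_1 \cdots \int_{-\infty}^{\infty} du_n\, f_{x_1 \dots x_n}(u_1, \dots, u_n) = 1$$

The distribution function of a subset of the variables, $X_1, \dots, X_k$ with $k < n$, follows by letting the remaining arguments go to $\infty$:

$$F_{x_1, \dots, x_k}(x_1, \dots, x_k) = P(X_1 \le x_1, \dots, X_k \le x_k)$$

$$= P(X_1 \le x_1, \dots, X_k \le x_k, X_{k+1} \le \infty, \dots, X_n \le \infty)$$

$$= F_{x_1, \dots, x_n}(x_1, \dots, x_k, \infty, \dots, \infty)$$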
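The limiting properties and the marginal distribution function above can be illustrated with a small discrete case. The sketch below is an illustration under stated assumptions rather than anything from the lecture: the joint pmf table and the function names `joint_cdf` and `marginal_cdf_x1` are hypothetical, and a discrete distribution is used so that F can be evaluated by summation instead of integration.

```python
import math

# Hypothetical joint pmf of two dependent integer-valued variables (X1, X2).
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def joint_cdf(x1, x2):
    """F(x1, x2) = P(X1 <= x1, X2 <= x2), summed over the pmf table."""
    return sum(p for (a, b), p in pmf.items() if a <= x1 and b <= x2)


def marginal_cdf_x1(x1):
    """F_{x1}(x1), obtained by sending the other argument to +infinity."""
    return joint_cdf(x1, math.inf)


limit_low = joint_cdf(-math.inf, 5)          # one argument at -infinity
limit_high = joint_cdf(math.inf, math.inf)   # all arguments at +infinity

# P(X1 <= 0) summed directly from the table, for comparison with F(0, inf).
direct = sum(p for (a, _), p in pmf.items() if a <= 0)

print(limit_low, limit_high, marginal_cdf_x1(0), direct)
```

Sending an argument to infinity removes the corresponding constraint from the event, which is why `marginal_cdf_x1(0)` and `direct` agree.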
For the density function:

$$f_{x_1, \dots, x_k}(x_1, \dots, x_k) = \frac{\partial^k}{\partial x_1\,\partial x_2 \cdots \partial x_k} F_{x_1, \dots, x_k}(x_1, \dots, x_k)$$

$$= \frac{\partial^k}{\partial x_1\,\partial x_2 \cdots \partial x_k} F_{x_1, \dots, x_n}(x_1, \dots, x_k, \infty, \dots, \infty)$$

$$= \frac{\partial^k}{\partial x_1\,\partial x_2 \cdots \partial x_k} \int_{-\infty}^{x_1} du_1 \cdots \int_{-\infty}^{x_k} du_k \int_{-\infty}^{\infty} du_{k+1} \cdots \int_{-\infty}^{\infty} du_n\, f_{x_1, \dots, x_n}(u_1, \dots, u_n)$$

$$= \int_{-\infty}^{\infty} du_{k+1} \cdots \int_{-\infty}^{\infty} du_n\, f_{x_1, \dots, x_n}(x_1, \dots, x_k, u_{k+1}, \dots, u_n)$$

$$= \int_{-\infty}^{\infty} dx_{k+1} \cdots \int_{-\infty}^{\infty} dx_n\, f_{x_1, \dots, x_n}(x_1, \dots, x_n)$$

The last line only renames the dummy variables of integration.

Marginal density

If you integrate the joint density over all variables but one, the result is referred to as the marginal density of the remaining variable.

$$f_{x_i}(x_i) = \underbrace{\int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n}_{n-1 \text{ integrals: all except } x_i} f_{x_1, \dots, x_n}(x_1, \dots, x_n)$$

Mutually independent sets of random variables

Definition of independence:

$$P[X_1 \in s_1, X_2 \in s_2, \dots] = P[X_1 \in s_1]\,P[X_2 \in s_2] \cdots$$

for any sets $s_1, s_2, \dots$

The product rule holds for joint probability distribution and density functions for independent random variables.

$$F_{x_1, x_2, x_3, \dots}(x_1, x_2, x_3, \dots) = F_{x_1}(x_1)\,F_{x_2}(x_2)\,F_{x_3}(x_3) \cdots$$

$$f_{x_1, x_2, x_3, \dots}(x_1, x_2, x_3, \dots) = f_{x_1}(x_1)\,f_{x_2}(x_2)\,f_{x_3}(x_3) \cdots$$

Expectations

$$E[g(\mathbf{X})] = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, g(\mathbf{x})\,f(\mathbf{x})$$
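The marginal density and the expectation integral above can be tied together numerically. The following is a minimal sketch under stated assumptions: the joint density f(x1, x2) = x1 + x2 on the unit square is a hypothetical example, the helper names are invented for illustration, and a simple midpoint rule stands in for the integrals. It computes E[X1] once from the joint density and once from the marginal density of X1.

```python
def midpoint_integral(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def joint_density(x1, x2):
    """Hypothetical joint density on the unit square: f(x1, x2) = x1 + x2."""
    return x1 + x2 if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0 else 0.0


def marginal_x1(x1):
    """f_{x1}(x1): the joint density integrated over x2 (analytically x1 + 1/2)."""
    return midpoint_integral(lambda x2: joint_density(x1, x2), 0.0, 1.0)


def expectation(g):
    """E[g(X1, X2)] as a double integral of g * f over the unit square."""
    return midpoint_integral(
        lambda x1: midpoint_integral(
            lambda x2: g(x1, x2) * joint_density(x1, x2), 0.0, 1.0),
        0.0, 1.0)


# E[X1] via the joint density, and via the marginal density of X1.
mean_joint = expectation(lambda x1, x2: x1)
mean_marginal = midpoint_integral(lambda x1: x1 * marginal_x1(x1), 0.0, 1.0)
print(mean_joint, mean_marginal)
```

For this density the analytic value is E[X1] = 7/12. Both routes approximate it, because integrating x1 f(x1, x2) over x2 first leaves x1 times the marginal density.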
For the sum of multiple random variables:

$$E[X_1 + X_2 + \dots + X_n] = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, (x_1 + x_2 + \dots + x_n)\, f_{x_1, \dots, x_n}(x_1, \dots, x_n)$$

$$= \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, x_1\, f_{x_1, \dots, x_n}(x_1, \dots, x_n) + \dots + \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, x_n\, f_{x_1, \dots, x_n}(x_1, \dots, x_n)$$

$$= \int_{-\infty}^{\infty} x_1 f_{x_1}(x_1)\,dx_1 + \int_{-\infty}^{\infty} x_2 f_{x_2}(x_2)\,dx_2 + \dots + \int_{-\infty}^{\infty} x_n f_{x_n}(x_n)\,dx_n$$

$$= E[X_1] + E[X_2] + \dots + E[X_n]$$

This relation is true whether or not the $X_i$ are independent.

For the product of multiple independent random variables:

$$E[X_1 X_2 \cdots X_n] = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, (x_1 x_2 \cdots x_n)\, f_{x_1, \dots, x_n}(x_1, \dots, x_n)$$

$$= \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, (x_1 x_2 \cdots x_n)\, f_{x_1}(x_1)\,f_{x_2}(x_2) \cdots f_{x_n}(x_n)$$

$$= \int_{-\infty}^{\infty} x_1 f_{x_1}(x_1)\,dx_1 \int_{-\infty}^{\infty} x_2 f_{x_2}(x_2)\,dx_2 \cdots \int_{-\infty}^{\infty} x_n f_{x_n}(x_n)\,dx_n$$

$$= E[X_1]\,E[X_2] \cdots E[X_n]$$
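The two results above can be contrasted on a small discrete example. This is a sketch under stated assumptions: both joint pmf tables are hypothetical, sums replace the integrals, and exact rational arithmetic is used so that the comparisons are not blurred by rounding. The sum rule should hold for both tables, while the product rule should hold only for the independent one.

```python
from fractions import Fraction as Fr


def expect(pmf, g):
    """E[g(X1, X2)] for a joint pmf given as {(x1, x2): probability}."""
    return sum(p * g(x1, x2) for (x1, x2), p in pmf.items())


# Hypothetical dependent pair: X1 and X2 tend to take the same value.
dependent = {(0, 0): Fr(4, 10), (0, 1): Fr(1, 10),
             (1, 0): Fr(1, 10), (1, 1): Fr(4, 10)}

# Hypothetical independent pair: joint pmf built as a product of marginals.
independent = {(a, b): pa * pb
               for a, pa in {0: Fr(1, 4), 1: Fr(3, 4)}.items()
               for b, pb in {0: Fr(2, 3), 1: Fr(1, 3)}.items()}


def check(pmf):
    """Return (sum rule holds, product rule holds) for the given joint pmf."""
    e1 = expect(pmf, lambda x1, x2: x1)
    e2 = expect(pmf, lambda x1, x2: x2)
    sum_ok = expect(pmf, lambda x1, x2: x1 + x2) == e1 + e2
    prod_ok = expect(pmf, lambda x1, x2: x1 * x2) == e1 * e2
    return sum_ok, prod_ok


print(check(dependent))
print(check(independent))
```

For the dependent table, E[X1 X2] = 2/5 while E[X1] E[X2] = 1/4, so the product rule fails even though the sum rule still holds.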