正在加载图片...
12 CHAPTER 2.ENTROPY AND MUTUAL INFORMATION We now generalize the above theorem to a more general case. Theorem 2.2.3 (Chain rule for entropy).Let X1,X2.XN be discrete random vari- ables drawn according to P(1,z2.,N).Then Proof. HX1,X2,.,XN)=E-logP(X1,X2,·,XN川 =E-log IP(XnlX1,.Xn-1) (2.8) -含n =∑H(X.lX:,.,Xn-i) n=1 ◇ If X are independent of each other,then H(X1:X2.XN)=>H(Xn). (2.9) n=1 Similarly,we have H(K1,X2,.,XwY)=H(XnlX1,.,Xn-1,Y) 2.3 Properties of the entropy function Let X be a discrete random variable with alphabet={k,=1,2,. the pmf of by三Prx全e花Then the entropy fcam2g written as where p=(p1.p2,.PK)is a K-vector of probabilities. Property 2.3.1.H(p)>0 (Non-negativity of entropy) Poof.Since0≤pw≤l,we have logp≤0.Hence,.H(p)≥0. ◇ Definition 2.3.1.A function f()is said to be converU over an interval (a,b)if for every x1,x2∈(a,b)and0≤入≤l, f(A1+(1-)x2)≤f(x)+(1-)fx2) A function f is said to be strictly conve if equality holds only=1. 12 CHAPTER 2. ENTROPY AND MUTUAL INFORMATION We now generalize the above theorem to a more general case. Theorem 2.2.3 (Chain rule for entropy). Let X1, X2, . . . , XN be discrete random vari￾ables drawn according to P(x1, x2, . . . , xN ). Then H(X1, X2, . . . , XN ) = ∑ N n=1 H(Xn|X1, X2, . . . , Xn−1) Proof. H(X1, X2, · · · , XN ) = E[− log P(X1, X2, · · · , XN )] = E [ − log ∏ N n=1 P(Xn|X1, · · · , Xn−1) ] (2.8) = ∑ N n=1 E[− log P(Xn|X1, · · · , Xn−1)] = ∑ N n=1 H(Xn|X1, · · · , Xn−1) If Xn are independent of each other, then H(X1, X2, · · · , XN ) = ∑ N n=1 H(Xn). (2.9) Similarly, we have H(X1, X2, · · · , XN |Y ) = ∑ N n=1 H(Xn|X1, · · · , Xn−1, Y ). 2.3 Properties of the entropy function Let X be a discrete random variable with alphabet X = {xk, k = 1, 2, · · · , K}. Denote the pmf of X by pk = P r(X = xk), xk ∈ X . Then the entropy H(X) of X can be written as H(p) = − ∑ K k=1 pk log pk where p = (p1, p2, · · · , pK) is a K-vector of probabilities. Property 2.3.1. H(p) ≥ 0 (Non-negativity of entropy) Proof. Since 0 ≤ pk ≤ 1, we have log pk ≤ 0. Hence, H(p) ≥ 0. Definition 2.3.1. A function f(x) is said to be convex-∪ over an interval (a, b) if for every x1, x2 ∈ (a, b) and 0 ≤ λ ≤ 1, f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2) A function f is said to be strictly convex-∪ if equality holds only when λ = 0 or λ = 1
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有