20 CHAPTER 2. ENTROPY AND MUTUAL INFORMATION

From the symmetry, it follows that

I(Y; Z) ≥ I(X; Y)

This theorem demonstrates that no processing of Z, deterministic or random, can increase the information that Z contains about X.

Corollary 2.4.3. In particular, if Y = f(Z), we have I(X; Z) ≥ I(X; f(Z)).

Proof. X → Z → f(Z) forms a Markov chain. □

Corollary 2.4.4. If X → Z → Y, then I(X; Z|Y) ≤ I(X; Z).

Proof. Using the fact that I(X; Y) ≥ 0, the corollary follows immediately from (2.22). □
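The data processing inequality can also be checked numerically. The sketch below is ours, not the text's: it builds the joint distribution of a Markov chain X → Z → Y as a cascade of two binary symmetric channels (the crossover probabilities 0.1 and 0.2 are arbitrary illustrative choices) and verifies that I(X; Z) ≥ I(X; Y) and I(Y; Z) ≥ I(X; Y).

```python
import itertools
import math

def mutual_information(pab):
    """I(A;B) in bits from a joint distribution {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in pab.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in pab.items() if p > 0)

# Markov chain X -> Z -> Y: X is a fair bit, Z is X passed through a BSC
# with crossover eps1, and Y is Z passed through a second BSC with
# crossover eps2.  (The channel parameters are illustrative assumptions.)
eps1, eps2 = 0.1, 0.2
joint = {}  # joint distribution of (x, z, y)
for x, z, y in itertools.product([0, 1], repeat=3):
    p = 0.5
    p *= (1 - eps1) if z == x else eps1
    p *= (1 - eps2) if y == z else eps2
    joint[(x, z, y)] = p

def marginal_pair(joint, i, j):
    """Marginalize the 3-tuple distribution onto components i and j."""
    out = {}
    for xyz, p in joint.items():
        key = (xyz[i], xyz[j])
        out[key] = out.get(key, 0.0) + p
    return out

i_xz = mutual_information(marginal_pair(joint, 0, 1))
i_xy = mutual_information(marginal_pair(joint, 0, 2))
i_zy = mutual_information(marginal_pair(joint, 1, 2))

print(f"I(X;Z) = {i_xz:.4f} bits")
print(f"I(X;Y) = {i_xy:.4f} bits")
print(f"I(Y;Z) = {i_zy:.4f} bits")
assert i_xz >= i_xy and i_zy >= i_xy  # data processing inequality
```

Processing Z through the second channel can only destroy information about X, so I(X; Y) comes out smallest of the three, as the theorem requires.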
Expressing the data processing inequality in terms of entropies, we have

H(X) − H(X|Z) ≥ H(X) − H(X|Y) ⇒ H(X|Z) ≤ H(X|Y)    (2.23)

The average uncertainty H(X|Z) about the input of a channel given the output is called the equivocation on the channel, and thus the above inequality yields the intuitively satisfying result that this uncertainty or equivocation can never decrease as we go further from the input on a sequence of cascaded channels.

Example 2.4.1. Suppose that the random vector [X, Y, Z] is equally likely to take on values in {[000], [010], [100], [101]}. Then

H(X) = H2(1/2) = 1 bit

Note that P_{Y|X}(0|1) = 1, so that H(Y|X = 1) = 0. Similarly, P_{Y|X}(0|0) = 1/2, so that

H(Y|X = 0) = H2(1/2) = 1 bit

Thus, H(Y|X) = (1/2) × 1 = 1/2 bit. Next,

H(Z|XY) = ∑_{x,y} P(x, y) H(Z|X = x, Y = y) = (1/4)(0) + (1/4)(0) + (1/2)(1) = 1/2 bit

and

H(XYZ) = H(X) + H(Y|X) + H(Z|XY) = 1 + 1/2 + 1/2 = 2 bits.

Because P_Y(1) = 1/4 and P_Y(0) = 3/4, we have

H(Y) = H2(1/4) = 0.811 bits

We see that H(Y|X) = 1/2 < H(Y). However, H(Y|X = 0) = 1 > H(Y). Furthermore,

I(X; Y) = H(Y) − H(Y|X) = 0.811 − 0.5 = 0.311 bits

In words, the first component of the random vector [X, Y, Z] gives 0.311 bits of information about the second component, and vice versa.
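The arithmetic in Example 2.4.1 can be verified mechanically from the joint distribution. The following sketch (helper names like `entropy` and `marginal` are ours, not the text's) computes H(X), H(Y|X), H(XYZ), and I(X; Y) directly from the four equally likely outcomes:

```python
import math
from collections import Counter

# The four equally likely outcomes of [X, Y, Z] from Example 2.4.1.
outcomes = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 0, 1)]
joint = {w: 0.25 for w in outcomes}

def entropy(dist):
    """Entropy in bits of a distribution {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idxs):
    """Marginalize the joint distribution onto the given components."""
    out = Counter()
    for w, p in joint.items():
        out[tuple(w[i] for i in idxs)] += p
    return out

h_xyz = entropy(joint)                                # H(XYZ)
h_x = entropy(marginal(joint, (0,)))                  # H(X)
h_y = entropy(marginal(joint, (1,)))                  # H(Y)
h_y_given_x = entropy(marginal(joint, (0, 1))) - h_x  # H(Y|X) = H(XY) - H(X)
i_xy = h_y - h_y_given_x                              # I(X;Y)

print(f"H(X)    = {h_x} bits")          # 1.0
print(f"H(Y|X)  = {h_y_given_x} bits")  # 0.5
print(f"H(XYZ)  = {h_xyz} bits")        # 2.0
print(f"I(X;Y)  = {i_xy:.3f} bits")     # 0.311
```

The conditional entropy is computed via the chain rule H(Y|X) = H(XY) − H(X) rather than by averaging over X = x as in the text; both routes give 1/2 bit, and the remaining values match 1, 2, and 0.311 bits.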