Principles of Information Science Chapter 3 Measures of Information
3-1 Measures of Random Syntactic Information
Shannon Theory of Information
Key points:
1. Information is something that can be used to remove uncertainty.
2. The amount of information can then be measured by the amount of uncertainty it removes.
3. In communications, only the waveform is of concern, while meaning and value are ignored.
4. Uncertainty, and thus information, is statistical in nature, so statistical mathematics is sufficient.
Shannon Theorem of Random Entropy
The measure of uncertainty will take the form
$$H_S(p_1, \ldots, p_N) = -k \sum_{n=1}^{N} p_n \log p_n$$
if the conditions below are satisfied:
(1) $H_S$ should be a continuous function of $p_n$, for all $n$;
(2) $H_S$ should be a monotonically increasing function of $N$ when $p_n = 1/N$ for all $n$;
(3) $H_S$ should observe the rule of stepped weighting summation.
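The formula can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function name `shannon_entropy` and the default values `k=1.0` and `base=2.0` are assumptions chosen for illustration. It evaluates the sum, skipping zero-probability terms, and prints the values for uniform distributions so that condition (2) can be inspected.

```python
import math

def shannon_entropy(probs, k=1.0, base=2.0):
    """Return -k * sum(p * log(p)); zero-probability terms contribute nothing."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return k * sum(-p * math.log(p, base) for p in probs if p > 0)

# Condition (2): with p_n = 1/N the value should grow as N grows.
uniform_values = [shannon_entropy([1.0 / n] * n) for n in range(1, 6)]
print(uniform_values)
```

The printed list should be strictly increasing, starting from zero for N = 1, where there is no uncertainty at all.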
The rule of stepped weighting summation:
[Figure: a direct choice among $x_1, x_2, x_3$ with probabilities $p_1 = 1/2$, $p_2 = 1/3$, $p_3 = 1/6$, and the equivalent two-step choice: first between two branches of probability 1/2 each, then, within the second branch, between outcomes of probability 2/3 and 1/3.]
$$H_S(1/2, 1/3, 1/6) = H_S(1/2, 1/2) + \tfrac{1}{2} H_S(2/3, 1/3)$$
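As a quick numerical check of this example, the sketch below compares the direct and the two-step evaluation. It assumes $k = 1$ and base-2 logarithms, and the helper name `H` is introduced here for illustration only.

```python
import math

def H(*probs):
    """Entropy in bits of the given probabilities (assumes k = 1, base 2)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

direct = H(1/2, 1/3, 1/6)                      # one choice among three outcomes
stepped = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)    # two successive choices
print(direct, stepped)
```

Both values come out at roughly 1.459 bits. The second-stage entropy is weighted by 1/2 because the second choice is only made half of the time.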
Proof:
(a) In the case of equal probabilities
Let $H_S(1/N, \ldots, 1/N) = A(N)$.
By using condition (3), it is then easy to obtain
$$A(MN) = H_S(1/MN, \ldots, 1/MN) = H_S(1/M, \ldots, 1/M) + \sum_{i=1}^{M} (1/M)\, H_S(1/N, \ldots, 1/N) = A(M) + A(N)$$
Then $A(N^2) = 2A(N)$, $A(S^a) = aA(S)$, $A(t^b) = bA(t)$.
For any given $b$, it is always possible to find a proper $a$ such that
$$S^a \le t^b < S^{a+1} \quad (*)$$
or equivalently
$$\frac{a}{b} \le \frac{\log t}{\log S} < \frac{a}{b} + \frac{1}{b} \quad (**)$$
On the other hand, from (*) and the monotonicity condition (2) we have
$$A(S^a) \le A(t^b) < A(S^{a+1})$$
or
$$aA(S) \le bA(t) < (a+1)A(S)$$
or
$$\frac{a}{b} \le \frac{A(t)}{A(S)} < \frac{a}{b} + \frac{1}{b} \quad (***)$$
Thus
$$\left| \frac{A(t)}{A(S)} - \frac{\log t}{\log S} \right| < \frac{1}{b}$$
Since $b$ can be taken arbitrarily large, the two ratios must be equal, and we have $A(t) = k \log t$, where $k$ is a positive constant.
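The squeezing argument can be watched at work numerically. The sketch below is an illustration only: the values `S = 2` and `t = 3` and the helper name `find_a` are hypothetical choices, not taken from the slides. For each $b$ it finds the integer $a$ satisfying (*) with exact integer arithmetic and prints how close $a/b$ is to $\log t / \log S$.

```python
import math

def find_a(S, t, b):
    """Largest integer a with S**a <= t**b, using exact integer arithmetic."""
    target = t ** b
    a = 0
    power = S            # invariant: power == S ** (a + 1)
    while power <= target:
        a += 1
        power *= S
    return a

S, t = 2, 3              # hypothetical example values
for b in (1, 10, 100):
    a = find_a(S, t, b)
    gap = abs(a / b - math.log(t) / math.log(S))
    print(b, a, gap, 1 / b)   # the gap stays below 1/b
```

As $b$ grows, the printed gap shrinks below $1/b$, which is the same bound that forces $A(t)/A(S)$ and $\log t / \log S$ to coincide.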
(b) In the case of unequal and rational probabilities
Let
$$p_i = \frac{n_i}{\sum_{j=1}^{N} n_j}$$
where $n_i$ is a positive integer for all $i$. Then the unequal distribution entropy $H_S(p_1, \ldots, p_N)$ can be related to the case of equal probabilities, in which the $i$-th state is split into $n_i$ equally likely sub-states:
$$H_S\Big(\underbrace{\tfrac{1}{\sum_j n_j}, \ldots, \tfrac{1}{\sum_j n_j}}_{n_1}, \ldots, \underbrace{\tfrac{1}{\sum_j n_j}, \ldots, \tfrac{1}{\sum_j n_j}}_{n_i}, \ldots, \underbrace{\tfrac{1}{\sum_j n_j}, \ldots, \tfrac{1}{\sum_j n_j}}_{n_N}\Big)$$
On one hand, by the result of (a), we have
$$H_S\Big(\tfrac{1}{\sum_j n_j}, \ldots, \tfrac{1}{\sum_j n_j}\Big) = A\Big(\sum_{j=1}^{N} n_j\Big) = k \log \sum_{j=1}^{N} n_j$$
On the other hand, by condition (3),
$$H_S\Big(\tfrac{1}{\sum_j n_j}, \ldots, \tfrac{1}{\sum_j n_j}\Big) = H_S(p_1, \ldots, p_N) + \sum_{i=1}^{N} p_i\, H_S(1/n_i, \ldots, 1/n_i) = H_S(p_1, \ldots, p_N) + \sum_{i=1}^{N} p_i A(n_i) = H_S(p_1, \ldots, p_N) + k \sum_{i=1}^{N} p_i \log n_i$$
Hence, using $\sum_{i=1}^{N} p_i = 1$,
$$H_S(p_1, \ldots, p_N) = k \Big[ \log \sum_{j=1}^{N} n_j - \sum_{i=1}^{N} p_i \log n_i \Big] = -k \sum_{i=1}^{N} p_i \log \frac{n_i}{\sum_{j=1}^{N} n_j} = -k \sum_{i=1}^{N} p_i \log p_i$$
(c) If the $p_i$ are irrational, the equation is also valid, by the continuity condition (1).
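The decomposition used in step (b) can be checked on a small case. The sketch below assumes $k = 1$ and base-2 logarithms, and the integer counts `n = [3, 2, 1]` are hypothetical values chosen for illustration. It compares the entropy of the uniform distribution over all sub-states with the entropy of the $p_i$ plus the weighted within-group term.

```python
import math

n = [3, 2, 1]                      # hypothetical integer counts n_i
total = sum(n)
p = [n_i / total for n_i in n]     # p_i = n_i / sum of all counts

entropy_p = sum(-p_i * math.log2(p_i) for p_i in p)
within_groups = sum(p_i * math.log2(n_i) for p_i, n_i in zip(p, n))
uniform = math.log2(total)         # A(total) with k = 1 and base 2

print(uniform, entropy_p + within_groups)
```

The two printed numbers agree up to floating-point rounding, so subtracting the within-group term from the uniform entropy recovers $-\sum_i p_i \log p_i$.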
In the case of ideal observation
$$I(p_1, \ldots, p_N) = H_S(p_1, \ldots, p_N) - H_S(1, 0, \ldots, 0) = H_S(p_1, \ldots, p_N) = -k \sum_{i=1}^{N} p_i \log p_i \quad (0 \log 0 = 0)$$
Let $N = 2$ and $p_1 = p_2$, let the base of the logarithm take the value 2, and let $H_S(1/2, 1/2) = 1$ bit; then $k = 1$. Therefore we have
$$I(p_1, \ldots, p_N) = H_S(p_1, \ldots, p_N) = -\sum_{i=1}^{N} p_i \log_2 p_i \quad \text{(bits)}$$
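A short sketch of this relation follows, with $k = 1$ and base-2 logarithms as stated above. The function names `entropy_bits` and `information_ideal` are illustrative assumptions. The information gained is computed as the prior uncertainty minus the uncertainty left once the state is known with certainty.

```python
import math

def entropy_bits(probs):
    """H_S in bits; terms with p = 0 are skipped (0 log 0 = 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def information_ideal(probs):
    """I = H_S(p_1, ..., p_N) - H_S(1, 0, ..., 0) for an ideal observation."""
    posterior = [1.0] + [0.0] * (len(probs) - 1)   # state known with certainty
    return entropy_bits(probs) - entropy_bits(posterior)

print(information_ideal([0.5, 0.5]))     # two equally likely states
print(information_ideal([1 / 8] * 8))    # eight equally likely states
```

Because the posterior entropy is zero, the information obtained equals the prior entropy: one bit for a fair binary choice and three bits for eight equally likely states.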
Conclusion
For a system with $N$ possible states and their associated probabilities $p_1, \ldots, p_N$, the average a priori uncertainty about the state of the system is $H_S(p_1, \ldots, p_N)$.
The average amount of information about the system obtained after observation in the ideal case is numerically equal to the uncertainty it removes.