Chapter 2 Entropy & Mutual Information (Shannon's measure of information)

This chapter introduces the basic concepts and definitions required for the discussion later, mainly: entropy, mutual information, and relative entropy.

2.1 Entropy

Let X be a discrete random variable with alphabet $\mathcal{X}$ and PMF $P_X(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$. For convenience, we will often write simply $P(x)$ for $P_X(x)$.

Definition 2.1.1. The entropy of a discrete r.v. X is defined as

$$H(X) = \sum_{x \in \mathcal{X}} P(x) \log_b \frac{1}{P(x)} = -\sum_{x \in \mathcal{X}} P(x) \log_b P(x) \qquad (2.1)$$

When $b = 2$, the unit is called the bit (binary digit); when $b = e$, the unit is called the nat (natural unit). Conversion between bases is easy: $\log_b x = (\log_b a)(\log_a x)$, hence $H_b(X) = (\log_b a)\, H_a(X)$. Unless otherwise specified, we will take all logarithms to base 2, hence all entropies will be measured in bits.

In the above definition, we use the convention that $0 \log 0 = 0$. Note that, equivalently, many books adopt the convention that the summation is taken over the corresponding support set. The support set of $P_X$, denoted by $S_X$, is the set of all $x \in \mathcal{X}$ such that $P(x) > 0$; i.e., $S_X = \mathrm{supp}(P_X) = \{x : P(x) > 0\}$.

The entropy $H(X)$ is also called the uncertainty of X, meaning that it is a measure of the randomness of X.
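As a quick numerical check of Definition 2.1.1 and the base-conversion identity, here is a minimal sketch (the function name `entropy` is just an illustrative choice, not from the text):

```python
import math

def entropy(probs, b=2.0):
    """Shannon entropy H(X) = -sum_x P(x) log_b P(x), with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, b) for p in probs if p > 0)

# A fair coin carries exactly one bit of uncertainty.
print(entropy([0.5, 0.5]))  # 1.0

# Base conversion: H_e(X) = (log_e 2) * H_2(X), i.e. nats = ln(2) * bits.
p = [0.2, 0.3, 0.5]
print(abs(entropy(p, b=math.e) - math.log(2) * entropy(p)) < 1e-12)  # True
```

Skipping the zero-probability terms in the sum implements the equivalent "sum over the support set" convention mentioned above.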
Note that the entropy $H(X)$ depends on the probabilities of the different outcomes of X, but not on the names of the outcomes. For example:

$\mathcal{X} = \{\text{Green}, \text{Blue}, \text{Red}\}$, $\quad \mathcal{Y} = \{\text{Sunday}, \text{Monday}, \text{Friday}\}$

$P(X) : 0.2,\ 0.3,\ 0.5 \qquad P(Y) : 0.2,\ 0.3,\ 0.5$