16.36: Communication Systems Engineering
Lecture 2: Entropy
Eytan Modiano
Information content of a random variable
• Random variable X
  – Outcome of a random experiment
  – Discrete R.V. takes on values from a finite set of possible outcomes
  – PMF: P(X = y) = P_X(y)
• How much information is contained in the event X = y?
  – Will the sun rise today? Revealing the outcome of this experiment provides no information
  – Will the Celtics win the NBA championship? Since this is unlikely, revealing yes provides more information than revealing no
• Events that are less likely contain more information than likely events
Measure of Information
• I(xi) = amount of information revealed by the outcome X = xi
• Desirable properties of I(x):
  1. If P(x) = 1 or P(x) = 0, then I(x) = 0
  2. If 0 < P(x) < 1, then I(x) > 0
  3. If P(x) < P(y), then I(x) > I(y)
  4. If x and y are independent events, then I(x,y) = I(x) + I(y)
• The above is satisfied by: I(x) = log2(1/P(x))
• The base of the log is not critical
  – Base 2 => information measured in bits
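As a quick illustration (my own sketch, not from the slides; the helper name self_information is hypothetical), the values below show how less likely outcomes carry more information:

```python
import math

def self_information(p: float) -> float:
    """Self-information in bits of an outcome with probability p: I = log2(1/p)."""
    if p <= 0 or p > 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / p)

# Less likely events carry more information:
for p in (1.0, 0.5, 0.1, 0.01):
    print(f"P = {p:>5}: I = {self_information(p):.3f} bits")
# P = 1.0 gives 0 bits; P = 0.01 gives about 6.64 bits.
```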
Entropy
• A measure of the information content of a random variable
• X ∈ {x1, …, xM}
• H(X) = E[I(X)] = ∑i P(xi) log2(1/P(xi))
• Example: binary experiment
  – X = x1 with probability p
  – X = x2 with probability (1-p)
  – H(X) = p log2(1/p) + (1-p) log2(1/(1-p)) = Hb(p)
  – H(X) is maximized at p = 1/2, with Hb(1/2) = 1
• Not surprising that the result of a binary experiment can be conveyed using one bit
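A minimal numerical sketch (my own; the function name binary_entropy is not from the slides) evaluating Hb(p) at a few points and confirming that the maximum of 1 bit occurs at p = 1/2:

```python
import math

def binary_entropy(p: float) -> float:
    """Hb(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with 0*log(1/0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))

for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.99):
    print(f"p = {p:<4}: Hb(p) = {binary_entropy(p):.4f} bits")
# The maximum, 1 bit, occurs at p = 0.5.
```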
Simple bounds on entropy
• Theorem: Given a random variable X with M possible values,
  0 <= H(X) <= log2(M)
  a) H(X) = 0 if and only if P(xi) = 1 for some i
  b) H(X) = log2(M) if and only if P(xi) = 1/M for all i
• Proof of (a) is obvious
• Proof of (b) requires the log inequality: if x > 0, then ln(x) <= x - 1
  – Equality if x = 1
[Figure: plot of y = x - 1 and y = ln(x), illustrating ln(x) <= x - 1]
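To see the bounds numerically, here is a small sketch (illustrative only; the PMFs below are made up) showing that a degenerate PMF attains H = 0 and a uniform PMF over M = 8 outcomes attains log2(8) = 3 bits:

```python
import math

def entropy(pmf):
    """H(X) = sum_i P_i * log2(1/P_i), in bits (terms with P_i = 0 contribute 0)."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

M = 8
degenerate = [1.0] + [0.0] * (M - 1)   # all probability mass on one outcome
uniform = [1.0 / M] * M                # equally likely outcomes
skewed = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.0078125]

print(entropy(degenerate))   # 0.0            (lower bound, attained)
print(entropy(uniform))      # 3.0 = log2(8)  (upper bound, attained)
print(entropy(skewed))       # strictly between 0 and 3
```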
Proof, continued
Consider the sum ∑_{i=1..M} Pi log(1/(M Pi)). By the log inequality:
  ∑_{i=1..M} Pi log(1/(M Pi)) <= ∑_{i=1..M} Pi (1/(M Pi) - 1) = ∑_{i=1..M} (1/M - Pi) = 0, with equality when Pi = 1/M
Writing this in another way:
  ∑_{i=1..M} Pi log(1/(M Pi)) = ∑_{i=1..M} Pi log(1/Pi) + ∑_{i=1..M} Pi log(1/M) <= 0, with equality when Pi = 1/M
That is, ∑_{i=1..M} Pi log(1/Pi) <= ∑_{i=1..M} Pi log(M) = log(M)
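A numerical check of the key step (my own sketch; the randomly generated PMFs are arbitrary): the sum ∑ Pi log2(1/(M Pi)) comes out negative for non-uniform PMFs and zero for the uniform one, matching the equality condition:

```python
import math, random

def key_sum(pmf):
    """sum_i P_i * log2(1/(M*P_i)); the slide's argument shows this is <= 0."""
    M = len(pmf)
    return sum(p * math.log2(1.0 / (M * p)) for p in pmf if p > 0)

random.seed(0)
for _ in range(3):
    w = [random.random() for _ in range(6)]
    pmf = [x / sum(w) for x in w]          # a random (non-uniform) PMF on 6 outcomes
    print(round(key_sum(pmf), 6))          # negative

print(round(key_sum([1.0 / 6] * 6), 6))    # 0.0 at the uniform PMF (equality case)
```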
Joint Entropy
• Joint entropy: H(X,Y) = ∑_{x,y} p(x,y) log(1/p(x,y))
• Conditional entropy: H(X|Y) = uncertainty in X given Y
  – H(X|Y=y) = ∑_x p(x|Y=y) log(1/p(x|Y=y))
  – H(X|Y) = E[H(X|Y=y)] = ∑_y p(Y=y) H(X|Y=y)
  – H(X|Y) = ∑_{x,y} p(x,y) log(1/p(x|Y=y))
• In general, for random variables X1, …, Xn:
  H(Xn | X1, …, Xn-1) = ∑_{x1,…,xn} p(x1, …, xn) log(1/p(xn | x1, …, xn-1))
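The formulas for H(X,Y) and H(X|Y) can be evaluated directly from a joint PMF. The sketch below is my own; the 2x2 joint PMF is hypothetical:

```python
import math

def H_joint(pxy):
    """H(X,Y) = sum_{x,y} p(x,y) * log2(1/p(x,y))."""
    return sum(p * math.log2(1.0 / p) for row in pxy for p in row if p > 0)

def H_cond(pxy):
    """H(X|Y) = sum_{x,y} p(x,y) * log2(1/p(x|y)), where p(x|y) = p(x,y)/p(y)."""
    py = [sum(pxy[x][y] for x in range(len(pxy))) for y in range(len(pxy[0]))]
    return sum(p * math.log2(py[y] / p)
               for x, row in enumerate(pxy)
               for y, p in enumerate(row) if p > 0)

# Hypothetical joint PMF of two binary random variables (rows = x, columns = y).
pxy = [[0.4, 0.1],
       [0.2, 0.3]]
print(H_joint(pxy))   # H(X,Y)
print(H_cond(pxy))    # H(X|Y), which equals H(X,Y) - H(Y)
```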
Rules for entropy
1. Chain rule: H(X1, …, Xn) = H(X1) + H(X2|X1) + H(X3|X2,X1) + … + H(Xn|Xn-1, …, X1)
2. H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
3. If X1, …, Xn are independent, then H(X1, …, Xn) = H(X1) + H(X2) + … + H(Xn)
   If they are also identically distributed (i.i.d.), then H(X1, …, Xn) = nH(X1)
4. H(X1, …, Xn) <= H(X1) + H(X2) + … + H(Xn) (with equality if independent)
   Proof: use the chain rule and notice that H(X|Y) <= H(X): entropy is not increased by additional information
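To illustrate rule 2 (a special case of the chain rule), the following sketch (my own; the joint PMF is hypothetical) computes H(X,Y), H(X), and H(Y|X) independently from their definitions and checks that H(X,Y) = H(X) + H(Y|X):

```python
import math

def H(pmf):
    """Entropy in bits of a 1-D PMF."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

# Hypothetical joint PMF p(x,y); rows index x, columns index y.
pxy = [[0.4, 0.1],
       [0.2, 0.3]]

px = [sum(row) for row in pxy]                                        # marginal of X
H_XY = sum(p * math.log2(1.0 / p) for row in pxy for p in row if p > 0)

# H(Y|X) computed directly from its definition: sum_{x,y} p(x,y) * log2(p(x)/p(x,y))
H_Y_given_X = sum(p * math.log2(px[x] / p)
                  for x, row in enumerate(pxy)
                  for p in row if p > 0)

# Chain rule (rule 2): H(X,Y) = H(X) + H(Y|X) -- the two printed values agree.
print(round(H_XY, 6), round(H(px) + H_Y_given_X, 6))
```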
Mutual Information
• X, Y random variables
• Definition: I(X;Y) = H(Y) - H(Y|X)
• Notice that H(Y|X) = H(X,Y) - H(X) => I(X;Y) = H(X) + H(Y) - H(X,Y)
• I(X;Y) = I(Y;X) = H(X) - H(X|Y)
• Note: I(X;Y) >= 0 (with equality if X and Y are independent)
  – Because H(Y) >= H(Y|X)
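A final sketch (my own; both joint PMFs are hypothetical) evaluates I(X;Y) = H(X) + H(Y) - H(X,Y): it is positive for dependent variables and zero when the joint PMF factors into its marginals:

```python
import math

def H(pmf):
    """Entropy in bits of a 1-D PMF."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), computed from a joint PMF (rows = x, cols = y)."""
    px = [sum(row) for row in pxy]
    py = [sum(pxy[x][y] for x in range(len(pxy))) for y in range(len(pxy[0]))]
    H_XY = sum(p * math.log2(1.0 / p) for row in pxy for p in row if p > 0)
    return H(px) + H(py) - H_XY

# Dependent X, Y: I(X;Y) > 0
print(mutual_information([[0.4, 0.1],
                          [0.2, 0.3]]))
# Independent X, Y (joint is the product of uniform marginals): I(X;Y) = 0
print(mutual_information([[0.25, 0.25],
                          [0.25, 0.25]]))
```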