Chain rule. Another basic fact is the chain rule: H(X,Y) = H(X) + H(Y|X). Think of it as follows: the uncertainty of the pair (X,Y) is that of one variable X, plus that of the other variable Y after X is known. The chain rule also works with conditional entropy, H(X,Y|Z) = H(X|Z) + H(Y|X,Z), and with more variables:

H(X_1, …, X_n | Z) = H(X_1|Z) + H(X_2|X_1, Z) + ⋯ + H(X_n|X_1, …, X_{n−1}, Z)
                   ≤ H(X_1|Z) + H(X_2|Z) + ⋯ + H(X_n|Z),

where the inequality is usually referred to as the subadditivity of entropy.

Relative entropy. Another important concept is the relative entropy of two distributions p and q:

H(p||q) = Σ_{x∈X} p(x) log_2 (p(x)/q(x)).

Relative entropy can sometimes serve as a measure of the distance between two distributions. A basic property is that it is nonnegative.

Fact. H(p||q) ≥ 0, with equality iff p = q.

Proof. Use the fact that log_2 x = (ln x)/(ln 2) ≤ (x−1)/(ln 2), applied to q(x)/p(x) in the definition of H(p||q): −H(p||q) = Σ_x p(x) log_2 (q(x)/p(x)) ≤ (1/ln 2) Σ_x p(x)(q(x)/p(x) − 1) = 0, with equality iff q(x) = p(x) for every x.

Mutual information. For a pair (X,Y) with joint distribution p(x,y), the mutual information between X and Y is

I(X;Y) = H(p(x,y) || p(x)p(y)) = H(X) − H(X|Y) = H(X) + H(Y) − H(X,Y).

It is a good exercise to verify the above equalities. The second one has a clear explanation: we mentioned earlier that conditioning always reduces entropy. By how much does it reduce it? Exactly by the mutual information. So the mutual information is the uncertainty of X minus the uncertainty of X when Y is known; in this way, it measures how much information about X is contained in Y.
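The identities in this section are easy to sanity-check numerically. Below is a minimal sketch (not part of the notes) using NumPy; the small joint distribution pxy and the helper functions entropy and relative_entropy are arbitrary illustrative choices, introduced only for this check.

```python
# A minimal numerical sketch of the identities above: the chain rule
# H(X,Y) = H(X) + H(Y|X), subadditivity, nonnegativity of the relative
# entropy H(p||q), and the three expressions for the mutual information I(X;Y).
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a distribution given as a 1-D array."""
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

def relative_entropy(p, q):
    """H(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# An arbitrary 2x3 joint distribution p(x, y): rows index X, columns index Y.
pxy = np.array([[0.10, 0.25, 0.15],
                [0.20, 0.05, 0.25]])

px = pxy.sum(axis=1)                  # marginal distribution of X
py = pxy.sum(axis=0)                  # marginal distribution of Y

H_XY = entropy(pxy.ravel())
H_X  = entropy(px)
H_Y  = entropy(py)

# Conditional entropies, e.g. H(Y|X) = sum_x p(x) H(Y | X = x).
H_Y_given_X = sum(px[i] * entropy(pxy[i] / px[i]) for i in range(len(px)))
H_X_given_Y = sum(py[j] * entropy(pxy[:, j] / py[j]) for j in range(len(py)))

# Chain rule: H(X,Y) = H(X) + H(Y|X).
assert np.isclose(H_XY, H_X + H_Y_given_X)

# Subadditivity: H(X,Y) <= H(X) + H(Y).
assert H_XY <= H_X + H_Y + 1e-12

# Nonnegativity of relative entropy, with equality iff p = q.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.5, 0.3, 0.2])
assert relative_entropy(p, q) >= 0
assert np.isclose(relative_entropy(p, p), 0.0)

# Mutual information computed in the three equivalent ways.
I_1 = relative_entropy(pxy.ravel(), np.outer(px, py).ravel())  # H(p(x,y) || p(x)p(y))
I_2 = H_X - H_X_given_Y
I_3 = H_X + H_Y - H_XY
assert np.isclose(I_1, I_2) and np.isclose(I_2, I_3)

print(f"H(X,Y) = {H_XY:.4f}, H(X) + H(Y|X) = {H_X + H_Y_given_X:.4f}")
print(f"I(X;Y) = {I_1:.4f}")
```

All of the assertions pass for any valid joint distribution, since the chain rule and the three expressions for I(X;Y) are exact identities, while subadditivity and the nonnegativity of H(p||q) are the inequalities stated above.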