Lecture Notes on Information Theory and Coding

Baoming Bai
State Key Lab. of ISN, Xidian University
bmbai@mail.xidian.edu.cn

May 26, 2020
Goals: The goal of this class is to establish an understanding of the intrinsic properties of the transmission of information and the relation between coding and the fundamental limits of information transmission in the context of communication.

Course Outline:
• Entropy and Mutual Information (measures of information)
• Source coding
• Channel capacity
• The Gaussian channel
• Coding for a noisy channel (block coding principles)
• Rate distortion theory

The core content can be summarized as follows: information theory (IT) characterizes the fundamental performance limits of communication, and coding (source coding, channel coding, network coding) provides the means of approaching those limits.

Textbook: T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd edition, New York: Wiley, 2006. (Chinese translation: China Machine Press, 2008; photocopy of the 1st edition: Tsinghua University Press, 2003.)

References:
(1) C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, 1948.
(2) R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.
(3) R. J. McEliece, The Theory of Information and Coding, 1977. (This is a very readable small book.)
(4) Raymond W. Yeung, Information Theory and Network Coding, Springer, 2008. (Chinese translation: Higher Education Press, 2011.)
(5) J. Massey, Digital Information Theory, course notes, ETH.
(6) M. Medard, Transmission of Information, course notes, MIT.
(7) Yumin Wang and Hui Li, Information Theory and Coding Theory, 2nd edition, Higher Education Press, 2013.
(8) Peiliang Qiu, Information Theory and Coding, Higher Education Press, 2003.
(9) Zuyun Fu, Information Theory, Publishing House of Electronics Industry.
(10) R. G. Gallager, “Claude E. Shannon: A retrospective on his life, work, and impact,” IEEE Trans. Inform. Theory, vol. 47, no. 7, pp. 2681-2695, Nov. 2001.
Chapter 1

Introduction

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. — Shannon

In 1948, Claude E. Shannon published a landmark paper entitled “A Mathematical Theory of Communication”. This paper laid the groundwork of an entirely new scientific discipline, “Information Theory”.

Information theory studies the transmission, processing and utilization of information.

1.1 Relationship of information theory and communication theory

Information theory answers two fundamental questions in communication theory:

1. What is the ultimate data compression? (the entropy H)
2. What is the ultimate transmission rate? (the channel capacity C)

Information theory also suggests means of achieving these ultimate limits of communication (e.g., random coding, ML decoding).

Figure 1.1: Schematic of the relationship between information theory and communication theory (which also involves estimation theory, network and switching theory, etc.). Shannon theory is mainly concerned with the fundamental limits.

A communication system transmits signals; the signal is the carrier of the message, and it is the uncertain component of the message that constitutes the information.
• Information theory in the narrow sense (Shannon theory): Building on the work of his predecessors, Shannon studied communication systems using probabilistic and statistical methods. He revealed that what a communication system conveys is information, and that the central problem of system design is how to transmit information effectively and reliably in the presence of noise and interference. He pointed out that this goal can be achieved by coding, and he proved theoretically the best performance limits that a communication system can attain.

• General information theory: besides Shannon theory, this also includes the theory of optimal reception (signal detection, estimation and modulation theory), noise theory, and so on.

• Information theory in the broad sense.

Information theory is the foundational theory of communication and information systems, and the driving force and source of the development of modern communications:

I have often remarked that the transistor and information theory, two Bell Laboratories breakthroughs within months of each other, have launched and powered the vehicle of modern digital communications. Solid state electronics provided the engine while information theory gave us the steering wheel with which to guide it. — Viterbi, IT News Lett., 1998.

• Source coding theorem → data compression techniques → the evolution of wireless systems from 1G to 2G
• Channel coding theorem → error control coding (Turbo, LDPC) → 3G
• Data processing theorem → soft-decision decoding
• Gaussian noise is the worst additive noise + multiuser information theory → CDMA, multiuser detection
• MIMO capacity theory → space-time coding, precoding → LTE, 4G
• Multiuser information theory → cooperative communication, network coding → next-generation wireless systems

Recent work on the information-theoretic aspects of communication has concentrated on 1) network information theory and 2) MIMO systems.

1.2 What is information? (Measure of information)

For Shannon theory, information is what we receive when uncertainty is reduced.

How should the amount of information be measured?

• The amount of information should satisfy I ≥ 0.
• The amount of information should depend on the probability P(x).
• For independent events, P(X, Y) = P(X)P(Y), the amounts should add: I = I(X) + I(Y).

These requirements lead to a measure of the form log(1/P_X(x)), the self-information of the event X = x.
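As a quick numerical illustration of these requirements, the following Python sketch (not part of the original notes; the probability values are chosen only for illustration) computes the self-information log(1/p) of individual events and checks that it is non-negative and additive for independent events.

import math

def self_information(p, base=2):
    """Self-information I(x) = log(1/p) of an event with probability p (bits for base 2)."""
    assert 0 < p <= 1
    return math.log(1.0 / p, base)

# A rare event carries more information than a common one.
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits

# Additivity for independent events: P(x, y) = P(x) P(y)
px, py = 0.5, 0.25
assert math.isclose(self_information(px * py),
                    self_information(px) + self_information(py))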
1.3 Applications

• Data compression: voice coders, MPEG, the LZ algorithm.
• Modems.
• Deep space communication (where coding was called a “marriage made in heaven”).
• CDMA, MIMO, 4G.
• Physical layer security (information-theoretic security).
• Quantum communication.
• The stock market.

1.4 Historical notes

• Sampling theorem: 1928, by Nyquist.

• Hartley’s measure of information (1928): I(X) = log L, where L is the number of possible values of X.

• Information theory: 1948, by Shannon. It investigates how efficient and reliable communication can be achieved.

• Why the word “entropy”? When Shannon discussed the question with von Neumann, von Neumann suggested “entropy” for two reasons: 1. Your uncertainty function is already called entropy in statistical mechanics. 2. Nobody really knows what entropy is, so in a debate you will always have the advantage.

• In Shannon’s original 1948 paper, he neither used the term “mutual information” nor introduced a special symbol for it; he always worked with differences of uncertainties. The term “mutual information” and the notation I(X; Y) were introduced later by Fano.

• Shannon was born in Michigan in 1916. In 1936, he received B.S. degrees in both electrical engineering and mathematics from the Univ. of Michigan. He received his M.S. and Ph.D. degrees from MIT. In 1941, he joined Bell Labs. In 1958, he accepted a permanent appointment at MIT. He later bought a large house, which he filled with all kinds of gadgets and toys.

• Shannon’s Master’s thesis was on Boolean algebra and switching; the first paper he published based on this work won the 1940 Alfred Noble Prize for the best paper in engineering published by an author under 30. It is widely recognized today as the foundation of the switching field and as one of the most important Master’s theses ever written. His Ph.D. dissertation, “An Algebra for Theoretical Genetics,” was completed in 1940. This thesis was never published.

• In 1961, Shannon published a pioneering paper, “Two-way Communication Channels”, which established the foundation for the discipline now known as multiuser information theory. Later, N. Abramson published his paper “The Aloha System - Another Alternative for Computer Communications” in 1970, which introduced the concept of multiple access over a shared common channel. The information-theoretic approach to multiaccess communication began in 1972 with a coding theorem developed by Ahlswede and Liao. In 1972, T. Cover published the paper “Broadcast Channels”.
• In 1995, Gallager proposed combining queueing theory with information theory for multiaccess. In 1998, Ephremides published the paper “Information theory and communication networks: An unconsummated union”. In 2000, R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung published the celebrated paper “Network information flow”, which introduced the idea of network coding. Also in 2000, P. Gupta and P. R. Kumar published the paper “The capacity of wireless networks”, which introduced the notion of transport capacity. Since about 2003, research has been moving towards an information theory of large networks.

1.5 A model of digital communication systems

Figure 1.2: Block diagram of a digital communication system: source → source encoder → ECC encoder → digital modulator → (modulation) channel with noise n(t) and interference → digital demodulator → ECC decoder → source decoder → sink. The ECC encoder and digital modulator together form the channel encoder (bits in, symbols out, waveform transmitted); the demodulator and ECC decoder form the channel decoder.

• source: discrete/continuous; memoryless/with memory.
• encoder: converts the messages into signals suitable for transmission over physical channels.
• channel: wireless/cable, disk.
• interference.
• sink: destination.

1.6 Review of Probability

Bayes’ rule:

P(A|B) = P(A ∩ B) / P(B) = P(B|A) P(A) / P(B)

For a discrete random variable (r.v.),

• Probability Mass Function (PMF): P_X(x) = P(X = x) denotes the probability of the event that X takes on the value x.

For a continuous r.v.,
• Cumulative Distribution Function (CDF): F_X(x) = P(X ≤ x)

• Probability Density Function (PDF): p_X(x) = (d/dx) F_X(x)
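As a minimal numerical check of Bayes’ rule (not part of the original notes; the probabilities below are made up for illustration), the following Python sketch computes a posterior from a prior and a likelihood via the total probability of B.

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B), with made-up numbers.
p_a = 0.6                # prior P(A)
p_b_given_a = 0.9        # likelihood P(B|A)
p_b_given_not_a = 0.2    # P(B | not A)

# Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior via Bayes' rule
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(B) = {p_b:.3f}, P(A|B) = {p_a_given_b:.3f}")   # P(B) = 0.620, P(A|B) = 0.871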
Chapter 2

Entropy & Mutual Information (Shannon’s measure of information)

This chapter introduces the basic concepts and definitions required for the later discussion, mainly entropy, mutual information, and relative entropy.

2.1 Entropy

Let X be a discrete random variable with alphabet X and PMF P_X(x) = Pr{X = x}, x ∈ X. For convenience, we will often write simply P(x) for P_X(x).

Definition 2.1.1. The entropy of a discrete r.v. X is defined as

H(X) = ∑_{x∈X} P(x) log_b (1/P(x)) = −∑_{x∈X} P(x) log_b P(x)    (2.1)

When b = 2, the unit is called the bit (binary digit); when b = e, the unit is called the nat (natural unit). (Conversion is easy: log_b x = (log_b a)(log_a x) ⇒ H_b(X) = (log_b a) H_a(X).) Unless otherwise specified, we will take all logarithms to base 2; hence all entropies will be measured in bits.

In the above definition, we use the convention that 0 log 0 = 0. Note that, equivalently, many books adopt the convention that the summation is taken over the corresponding support set. The support set of P_X, denoted by S_X, is the set of all x ∈ X such that P(x) > 0; i.e., S_X = supp(P_X) = {x : P(x) > 0}.

The entropy H(X) is also called the uncertainty of X, meaning that it is a measure of the randomness of X. Note that the entropy H(X) depends on the probabilities of the different outcomes of X, but not on the names of the outcomes. For example, let

X = {Green, Blue, Red}, with P(X): 0.2, 0.3, 0.5
Y = {Sunday, Monday, Friday}, with P(Y): 0.2, 0.3, 0.5
Then H(X) = H(Y).

Remark: The entropy of X can also be interpreted as the expected value of log(1/P(X)) (i.e., the average uncertainty):

H(X) = E[log(1/P(X))]    (2.2)

where we define E[F(x)] = ∑_{x∈X} P_X(x) F(x). Recall that I(x) = log(1/P_X(x)) is the self-information of the event X = x, so H(X) = E[I(x)] is also referred to as the average self-information.

An immediate consequence of the definition is that H(X) ≥ 0.

Example 2.1.1. Let

X = 1 with probability p, and X = 0 with probability 1 − p.

Then

H(X) = −p log p − (1 − p) log(1 − p)    (2.3)

Equation (2.3) is often called the binary entropy function and is denoted by H_2(p). Its graph is shown in Fig. 2.1. We can see that H(X) = 1 bit when p = 1/2.

Figure 2.1: The binary entropy function H(p) as a function of p.

Example 2.1.2. Let

X = a with probability 1/2, b with probability 1/4, c with probability 1/8, and d with probability 1/8.

Then H(X) = −(1/2) log(1/2) − (1/4) log(1/4) − (1/8) log(1/8) − (1/8) log(1/8) = 7/4 bits.

On the other hand, if X takes on values in X = {a, b, c, d} with equal probability, then H(X) = −4 × (1/4) log(1/4) = 2 bits = log |X|. We can see that the uniform distribution over the range X is the maximum entropy distribution over this range. (In other words, the entropy of X is maximized when its values are equally likely.)
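To make the numbers in these examples concrete, here is a small Python sketch (not part of the original notes) that computes the entropy of a finite distribution in bits; it reproduces the 7/4-bit value of Example 2.1.2, the 2-bit uniform case, and the value H_2(1/2) = 1 bit of the binary entropy function.

import math

def entropy(probs, base=2):
    """H(X) = -sum_x P(x) log P(x), with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Example 2.1.2: P = (1/2, 1/4, 1/8, 1/8)  ->  7/4 = 1.75 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))

# Uniform distribution over 4 outcomes -> log2(4) = 2 bits (the maximum)
print(entropy([0.25] * 4))

# Binary entropy function H2(p) at p = 1/2 -> 1 bit
def h2(p):
    return entropy([p, 1 - p])

print(h2(0.5))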