

Information Theory and Network Coding

Information Technology: Transmission, Processing, and Storage

Series Editors:
Robert Gallager, Massachusetts Institute of Technology, Cambridge, Massachusetts
Jack Keil Wolf, University of California at San Diego, La Jolla, California

Information Theory and Network Coding
Raymond W. Yeung

Networks and Grids: Technology and Theory
Thomas G. Robertazzi

CDMA Radio with Repeaters
Joseph Shapira and Shmuel Miller

Digital Satellite Communications
Giovanni Corazza, ed.

Immersive Audio Signal Processing
Sunil Bharitkar and Chris Kyriakakis

Digital Signal Processing for Measurement Systems: Theory and Applications
Gabriele D’Antona and Alessandro Ferrero

Coding for Wireless Channels
Ezio Biglieri

Wireless Networks: Multiuser Detection in Cross-Layer Design
Christina Comaniciu, Narayan B. Mandayam and H. Vincent Poor

The Multimedia Internet
Stephen Weinstein

MIMO Signals and Systems
Horst J. Bessai

Multi-Carrier Digital Communications: Theory and Applications of OFDM, 2nd Ed
Ahmad R.S. Bahai, Burton R. Saltzberg and Mustafa Ergen

Performance Analysis and Modeling of Digital Transmission Systems
William Turin

Wireless Communications Systems and Networks
Mohsen Guizani

Interference Avoidance Methods for Wireless Systems
Dimitrie C. Popescu and Christopher Rose

Stochastic Image Processing
Chee Sun Won and Robert M. Gray

Coded Modulation Systems
John B. Anderson and Arne Svensson

Communication System Design Using DSP Algorithms: With Laboratory Experiments for the TMS320C6701 and TMS320C6711
Steven A. Tretter

(continued after index)

Raymond W. Yeung

Information Theory and Network Coding

Springer

Raymond W. Yeung
The Chinese University of Hong Kong
Department of Information Engineering
Hong Kong
People’s Republic of China
whyeung@ie.cuhk.edu.hk

Series Editors

Robert Gallager
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Cambridge, MA, USA

Jack Keil Wolf
University of California, San Diego
Department of Electrical and Computer Engineering
La Jolla, CA, USA

ISBN: 978-0-387-79233-0
e-ISBN: 978-0-387-79234-7
Library of Congress Control Number: 2008924472

© 2008 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com

To my parents and my family

Preface

This book is an evolution from my book A First Course in Information Theory published in 2002, when network coding was still in its infancy. The last few years have witnessed the rapid development of network coding into a research field of its own in information science. With its root in information theory, network coding has not only brought about a paradigm shift in network communications at large, but also had significant influence on such specific research fields as coding theory, networking, switching, wireless communications, distributed data storage, cryptography, and optimization theory. While new applications of network coding keep emerging, the fundamental results that lay the foundation of the subject are more or less mature. One of the main goals of this book therefore is to present these results in a unifying and coherent manner.

While the previous book focused only on information theory for discrete random variables, the current book contains two new chapters on information theory for continuous random variables, namely the chapter on differential entropy and the chapter on continuous-valued channels. With these topics included, the book becomes more comprehensive and is more suitable to be used as a textbook for a course in an electrical engineering department.

What Is in This Book

Out of the 21 chapters in this book, the first 16 chapters belong to Part I, Components of Information Theory, and the last 5 chapters belong to Part II, Fundamentals of Network Coding. Part I covers the basic topics in information theory and prepares the reader for the discussions in Part II. A brief rundown of the chapters will give a better idea of what is in this book.

Chapter 1 contains a high-level introduction to the contents of this book. First, there is a discussion on the nature of information theory and the main results in Shannon’s original paper in 1948 which founded the field. There are also pointers to Shannon’s biographies and his works.

Chapter 2 introduces Shannon’s information measures for discrete random variables and their basic properties. Useful identities and inequalities in information theory are derived and explained. Extra care is taken in handling joint distributions with zero probability masses. There is a section devoted to the discussion of maximum entropy distributions. The chapter ends with a section on the entropy rate of a stationary information source.

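For readers who like to experiment, the following minimal Python sketch (an illustrative aside added to this excerpt, not material from the book) computes the measures of Chapter 2 for a small, arbitrarily chosen joint distribution and checks the identity I(X;Y) = H(X) + H(Y) - H(X,Y) numerically.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as an array; zero-probability terms
    contribute nothing, by the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# An arbitrary joint pmf p(x, y) over a 2x2 alphabet (rows: x, columns: y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)              # marginal distribution of X
p_y = p_xy.sum(axis=0)              # marginal distribution of Y

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy)
I_XY = H_X + H_Y - H_XY             # mutual information I(X;Y)

print(f"H(X) = {H_X:.4f}, H(Y) = {H_Y:.4f}, "
      f"H(X,Y) = {H_XY:.4f}, I(X;Y) = {I_XY:.4f}")
```

For this particular joint pmf the output is H(X) = H(Y) = 1 bit, H(X,Y) ≈ 1.7219 bits, and I(X;Y) ≈ 0.2781 bits.
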
Chapter 3 is an introduction to the theory of I-Measure which establishes a one-to-one correspondence between Shannon’s information measures and set theory. A number of examples are given to show how the use of information diagrams can simplify the proofs of many results in information theory. Such diagrams are becoming standard tools for solving information theory problems.

Chapter 4 is a discussion of zero-error data compression by uniquely decodable codes, with prefix codes as a special case. A proof of the entropy bound for prefix codes which involves neither the Kraft inequality nor the fundamental inequality is given. This proof facilitates the discussion of the redundancy of prefix codes.

Chapter 5 is a thorough treatment of weak typicality. The weak asymptotic equipartition property and the source coding theorem are discussed. An explanation of the fact that a good data compression scheme produces almost i.i.d. bits is given. There is also an introductory discussion of the Shannon–McMillan–Breiman theorem. The concept of weak typicality will be further developed in Chapter 10 for continuous random variables.

Chapter 6 contains a detailed discussion of strong typicality which applies to random variables with finite alphabets. The results developed in this chapter will be used for proving the channel coding theorem and the rate-distortion theorem in the next two chapters.

The discussion in Chapter 7 of the discrete memoryless channel is an enhancement of the discussion in the previous book. In particular, the new definition of the discrete memoryless channel enables rigorous formulation and analysis of coding schemes for such channels with or without feedback. The proof of the channel coding theorem uses a graphical model approach that helps explain the conditional independence of the random variables.

Chapter 8 is an introduction to rate-distortion theory. The version of the rate-distortion theorem here, proved by using strong typicality, is a stronger version of the original theorem obtained by Shannon.

In Chapter 9, the Blahut–Arimoto algorithms for computing the channel capacity and the rate-distortion function are discussed, and a simplified proof for convergence is given. Great care is taken in handling distributions with zero probability masses.

Chapters 10 and 11 are devoted to the discussion of information theory for continuous random variables. Chapter 10 introduces differential entropy and related information measures, and their basic properties are discussed. The asymptotic equipartition property for continuous random variables is proved. The last section on maximum differential entropy distributions echoes the section in Chapter 2 on maximum entropy distributions.

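The weak asymptotic equipartition property that drives Chapter 5, and whose continuous counterpart appears in Chapter 10, is easy to observe numerically. The short Python sketch below is an illustrative addition (not taken from the book); it assumes an i.i.d. binary source with P(X = 1) = 0.2 and shows the empirical quantity -(1/n) log2 p(X^n) settling around the entropy H(X) as the block length n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# An i.i.d. binary source with P(X = 1) = p, and its entropy H(X) in bits.
p = 0.2
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

for n in [10, 100, 1000, 10000]:
    x = rng.random(n) < p                                   # one source sequence of length n
    empirical = -np.where(x, np.log2(p), np.log2(1 - p)).sum() / n
    print(f"n = {n:5d}: -(1/n) log2 p(X^n) = {empirical:.4f}  (H(X) = {H:.4f})")
```

By the weak AEP, the printed values concentrate around H(X) ≈ 0.7219 bits as n increases.
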

Chapter 11 discusses a variety of continuous-valued channels, with the continuous memoryless channel being the basic building block. In proving the capacity of the memoryless Gaussian channel, a careful justification is given for the existence of the differential entropy of the output random variable. Based on this result, the capacity of a system of parallel/correlated Gaussian channels is obtained. Heuristic arguments leading to the formula for the capacity of the bandlimited white/colored Gaussian channel are given. The chapter ends with a proof of the fact that zero-mean Gaussian noise is the worst additive noise.

Chapter 12 explores the structure of the I-Measure for Markov structures. Set-theoretic characterizations of full conditional independence and Markov random field are discussed. The treatment of Markov random field here may be too specialized for the average reader, but the structure of the I-Measure and the simplicity of the information diagram for a Markov chain are best explained as a special case of a Markov random field.

Information inequalities are sometimes called the laws of information theory because they govern the impossibilities in information theory. In Chapter 13, the geometrical meaning of information inequalities and the relation between information inequalities and conditional independence are explained in depth. The framework for information inequalities discussed here is the basis of the next two chapters.

Chapter 14 explains how the problem of proving information inequalities can be formulated as a linear programming problem. This leads to a complete characterization of all information inequalities provable by conventional techniques. These inequalities, called Shannon-type inequalities, can be proved by the software package ITIP, which is available on the World Wide Web. It is also shown how Shannon-type inequalities can be used to tackle the implication problem of conditional independence in probability theory.

Shannon-type inequalities are all the information inequalities known during the first half century of information theory. In the late 1990s, a few new inequalities, called non-Shannon-type inequalities, were discovered. These inequalities imply the existence of laws in information theory beyond those laid down by Shannon. In Chapter 15, we discuss these inequalities and their applications.

Chapter 16 explains an intriguing relation between information theory and group theory. Specifically, for every information inequality satisfied by any joint probability distribution, there is a corresponding group inequality satisfied by any finite group and its subgroups, and vice versa. Inequalities of the latter type govern the orders of any finite group and its subgroups. Group-theoretic proofs of Shannon-type information inequalities are given. At the end of the chapter, a group inequality is obtained from a non-Shannon-type inequality discussed in Chapter 15. The meaning and the implication of this inequality are yet to be understood.

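As a small worked instance giving the flavor of this correspondence (an example added here, using the standard identification of H(X_i) with log(|G|/|G_i|) for subgroups G_i of a finite group G, rather than anything stated in this preface), the basic inequality H(X_1) + H(X_2) >= H(X_1, X_2) becomes a statement about group orders:

```latex
% Illustrative instance of the information-group correspondence (added example).
% Identify H(X_i) with \log(|G|/|G_i|), where G_1, G_2 are subgroups of a finite group G.
\[
  H(X_1) + H(X_2) \ge H(X_1, X_2)
  \quad\longleftrightarrow\quad
  \log\frac{|G|}{|G_1|} + \log\frac{|G|}{|G_2|} \ge \log\frac{|G|}{|G_1 \cap G_2|}.
\]
```

The right-hand side is the group inequality |G_1||G_2| <= |G||G_1 ∩ G_2|, which holds because the set G_1G_2 has exactly |G_1||G_2|/|G_1 ∩ G_2| elements and is contained in G.
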

Chapter 17 starts Part II of the book with a discussion of the butterfly network, the primary example in network coding. Variations of the butterfly network are analyzed in detail. The advantage of network coding over store-and-forward in wireless and satellite communications is explained through a simple example. We also explain why network coding with multiple information sources is substantially different from network coding with a single information source.

In Chapter 18, the fundamental bound for single-source network coding, called the max-flow bound, is explained in detail. The bound is established for a general class of network codes.

In Chapter 19, we discuss various classes of linear network codes on acyclic networks that achieve the max-flow bound to different extents. Static network codes, a special class of linear network codes that achieves the max-flow bound in the presence of channel failure, are also discussed. Polynomial-time algorithms for constructing these codes are presented.

In Chapter 20, we formulate and analyze convolutional network codes on cyclic networks. The existence of such codes that achieve the max-flow bound is proved.

Network coding theory is further developed in Chapter 21. The scenario in which more than one information source is multicast in a point-to-point acyclic network is discussed. An implicit characterization of the achievable information rate region, which involves the framework for information inequalities developed in Part I, is proved.

How to Use This Book

[Figure: chart of the dependency among the chapters in Part I (Chapters 1-16) and Part II (Chapters 17-21)]