ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
School of Computer and Communication Sciences

Handout 18                                          Information Theory and Coding
Notes on Random Coding                                         December 12, 2003

Random Coding

In this note we prove the achievability part of the channel coding theorem for discrete memoryless channels without feedback. The discussion is largely based on Chapter 5 of R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.

1. Discrete Memoryless Channels Without Feedback

Throughout this note we will fix the channel we wish to communicate over. Let $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output alphabets of this channel. We will assume that the channel is discrete, i.e., that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. The behavior of the channel will be completely described by specifying for each $k \geq 1$ the function
$$P_k : \mathcal{X}^k \times \mathcal{Y}^k \to \mathbb{R}, \qquad (x_1^k, y_1^k) \mapsto P_k(y_k \mid x_1^k, y_1^{k-1}),$$
which gives the probability of receiving the letter $y_k$ at the output at time $k$, given the past and current inputs to the channel and the past outputs of the channel.

We make the following two assumptions: the channel is memoryless, and it is used without feedback. We will call a channel memoryless if
$$P_k(y_k \mid x_1^k, y_1^{k-1}) = P(y_k \mid x_k),$$
which in words means that the channel keeps no memory of the past inputs and outputs in determining the output at time $k$. Note also that the channel does not behave differently at different times: the function $P$ on the right-hand side is not an explicit function of $k$.

If a memoryless channel is used without feedback, i.e., if
$$P_{X_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k \mid x_1^{k-1}, y_1^{k-1}) = P_{X_k \mid X_1^{k-1}}(x_k \mid x_1^{k-1})$$
(in words: if the channel inputs do not depend on the past channel outputs), then
$$\begin{aligned}
P_{Y_1^n \mid X_1^n}(y_1^n \mid x_1^n)
&= \frac{P_{X_1^n, Y_1^n}(x_1^n, y_1^n)}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^n P_{X_k, Y_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k, y_k \mid x_1^{k-1}, y_1^{k-1})}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^n P_{X_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k \mid x_1^{k-1}, y_1^{k-1}) \, P_{Y_k \mid X_1^k, Y_1^{k-1}}(y_k \mid x_1^k, y_1^{k-1})}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^n P_{X_k \mid X_1^{k-1}}(x_k \mid x_1^{k-1}) \, P_{Y_k \mid X_k}(y_k \mid x_k)}{P_{X_1^n}(x_1^n)} \\
&= \prod_{k=1}^n P(y_k \mid x_k),
\end{aligned}$$
where we use the memoryless and without-feedback conditions at the fourth equality. From now on, we will restrict our attention to channels used without feedback. With some abuse of notation we will let $P(y \mid x)$ denote the probability of receiving the sequence $y = y_1^n$ at the output of the channel when the channel input is the sequence $x = x_1^n$. If the channel is memoryless, we see from above that
$$P(y \mid x) = \prod_{k=1}^n P(y_k \mid x_k).$$
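To make the memoryless factorization concrete, here is a minimal numerical sketch (our own illustration, not part of the handout) that represents a DMC by its transition matrix $W[x, y] = P(y \mid x)$ and evaluates $P(y \mid x)$ for sequences as a product of per-letter probabilities; the binary symmetric channel and the crossover probability 0.1 are illustrative assumptions.

import numpy as np

# A DMC is fully described by a |X| x |Y| matrix W with W[x, y] = P(y | x).
# Assumed example: binary symmetric channel with crossover probability 0.1.
eps = 0.1
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

def prob_output_given_input(W, x_seq, y_seq):
    # P(y_1^n | x_1^n) = prod_k P(y_k | x_k) for a DMC used without feedback.
    return np.prod([W[x, y] for x, y in zip(x_seq, y_seq)])

print(prob_output_given_input(W, [0, 1, 1, 0], [0, 1, 0, 0]))  # (1-eps)^3 * eps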
2. Block Codes

A block code with $M$ messages and block length $n$ is a mapping from a set of $M$ messages $\{1, \dots, M\}$ to channel input sequences of length $n$. Thus, a block code is specified when we specify the $M$ channel input sequences
$$c_1 = (c_{1,1}, \dots, c_{1,n}), \quad \dots, \quad c_M = (c_{M,1}, \dots, c_{M,n})$$
the messages are mapped into. We will call $c_m$ the codeword for message $m$. To send message $m$ with such a block code we simply give the sequence $c_m$ to the channel as input.

A decoder for such a block code is a mapping from channel output sequences $\mathcal{Y}^n$ to the set of $M$ messages $\{1, \dots, M\}$. For a given decoder, let $D_m \subset \mathcal{Y}^n$ denote the set of channel outputs which are mapped to message $m$. Since an output sequence $y$ is mapped to exactly one message, the $D_m$'s form a collection of disjoint sets whose union is $\mathcal{Y}^n$.

We define the rate of a block code with $M$ messages and block length $n$ as $\frac{\ln M}{n}$, and given such a code and a decoder we define
$$P_{e,m} = \sum_{y \notin D_m} P(y \mid c_m),$$
the probability of a decoding error when message $m$ is sent. Further define
$$P_{e,\mathrm{ave}} = \frac{1}{M} \sum_{m=1}^{M} P_{e,m} \qquad \text{and} \qquad P_{e,\max} = \max_{1 \leq m \leq M} P_{e,m}$$
as the average and maximal (both over the possible messages) error probabilities of such a code and decoder.

Among the many possible decoding methods, the rule that minimizes $P_{e,\mathrm{ave}}$ is the maximum likelihood rule. Given a channel output sequence $y$, the maximum likelihood rule decodes to a message $m$ for which
$$P(y \mid c_m) \geq P(y \mid c_{m'}) \quad \text{for every } m' \neq m,$$
and if there is more than one such $m$, it chooses one of them arbitrarily. We will restrict ourselves in the following to the maximum likelihood rule.
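The maximum likelihood rule translates directly into code. The following sketch (again our own, reusing the assumed BSC above with a hypothetical two-codeword repetition code) decodes by maximizing $P(y \mid c_m)$, breaking ties arbitrarily, and computes $P_{e,m}$ exactly by enumerating all output sequences.

import numpy as np
from itertools import product

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])

# Hypothetical block code: M = 2 messages, block length n = 3 (repetition code).
codewords = [(0, 0, 0), (1, 1, 1)]

def likelihood(c, y):
    return np.prod([W[ck, yk] for ck, yk in zip(c, y)])

def ml_decode(y):
    # max keeps the first maximizer on ties, an arbitrary (allowed) choice.
    return max(range(len(codewords)), key=lambda m: likelihood(codewords[m], y))

def p_error(m, n=3):
    # P_{e,m}: total probability of the outputs decoded to some other message.
    return sum(likelihood(codewords[m], y)
               for y in product(range(2), repeat=n) if ml_decode(y) != m)

print(p_error(0))  # 3*eps^2*(1-eps) + eps^3: majority decoding on the BSC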
3. Error Probability for Two Codewords

Consider now the case $M = 2$, so the block code consists of two codewords, $c_1$ and $c_2$. We will find a bound on $P_{e,m}$ for the maximum likelihood decoding rule.

Suppose message 1 is to be sent. The channel input is then $c_1$, and the probability of receiving $y$ at the channel output is $P(y \mid c_1)$. An error will occur if the received sequence is not in $D_1$. Since $P(y \mid c_2) \geq P(y \mid c_1)$ for every $y$ not in $D_1$ (but there may be $y$'s for which $P(y \mid c_2) = P(y \mid c_1)$ and $y \in D_1$), we have
$$\begin{aligned}
P_{e,1} = \sum_{y \notin D_1} P(y \mid c_1)
&\leq \sum_{y :\, P(y \mid c_2) \geq P(y \mid c_1)} P(y \mid c_1)
 = \sum_{y} P(y \mid c_1) \, \mathbb{1}_{\{P(y \mid c_2) \geq P(y \mid c_1)\}} \\
&\leq \sum_{y :\, P(y \mid c_2) \geq P(y \mid c_1)} P(y \mid c_1) \frac{P(y \mid c_2)^s}{P(y \mid c_1)^s} \\
&\leq \sum_{y} P(y \mid c_1) \frac{P(y \mid c_2)^s}{P(y \mid c_1)^s} \qquad \text{for any } s \geq 0 \\
&= \sum_{y} P(y \mid c_1)^{1-s} P(y \mid c_2)^s.
\end{aligned}$$
The choice $s = 1/2$ gives us
$$P_{e,1} \leq \sum_{y} \sqrt{P(y \mid c_1) P(y \mid c_2)},$$
and by symmetry the same quantity also upper bounds $P_{e,2}$.

For a memoryless channel $P(y \mid c_m) = \prod_{k=1}^n P(y_k \mid c_{m,k})$, and we obtain
$$\begin{aligned}
P_{e,m} \leq \sum_{y} \sqrt{P(y \mid c_1) P(y \mid c_2)}
&= \sum_{y_1} \cdots \sum_{y_n} \sqrt{P(y_1 \mid c_{1,1}) P(y_1 \mid c_{2,1})} \cdots \sqrt{P(y_n \mid c_{1,n}) P(y_n \mid c_{2,n})} \\
&= \Big[ \sum_{y_1} \sqrt{P(y_1 \mid c_{1,1}) P(y_1 \mid c_{2,1})} \Big] \cdots \Big[ \sum_{y_n} \sqrt{P(y_n \mid c_{1,n}) P(y_n \mid c_{2,n})} \Big] \\
&= \prod_{k=1}^n \Big[ \sum_{y} \sqrt{P(y \mid c_{1,k}) P(y \mid c_{2,k})} \Big].
\end{aligned}$$
For example, for a binary symmetric channel with crossover probability $\epsilon$ we see that
$$\sum_{y} \sqrt{P(y \mid c_{1,k}) P(y \mid c_{2,k})} =
\begin{cases} 1 & \text{if } c_{1,k} = c_{2,k}, \\ 2\sqrt{\epsilon(1-\epsilon)} & \text{else,} \end{cases}$$
and we obtain
$$P_{e,m} \leq \big[ 4\epsilon(1-\epsilon) \big]^{d/2},$$
where $d$ is the number of places in which $c_1$ and $c_2$ differ (i.e., $d = |\{k : c_{1,k} \neq c_{2,k}\}|$).
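This bound is easy to check numerically. In the sketch below (assumed codewords and $\epsilon$, chosen only for illustration), the quantity pe1 is the intermediate bound $\sum_{y : P(y|c_2) \geq P(y|c_1)} P(y \mid c_1)$, which the derivation above shows is dominated by the Bhattacharyya expression $[4\epsilon(1-\epsilon)]^{d/2}$.

import numpy as np
from itertools import product

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
c1, c2 = (0, 0, 0, 0), (1, 1, 0, 1)   # the codewords differ in d = 3 places
n = len(c1)

def likelihood(c, y):
    return np.prod([W[ck, yk] for ck, yk in zip(c, y)])

# Upper bound on P_{e,1}: mass of outputs where c2 is at least as likely as c1.
pe1 = sum(likelihood(c1, y) for y in product(range(2), repeat=n)
          if likelihood(c2, y) >= likelihood(c1, y))

d = sum(a != b for a, b in zip(c1, c2))
print(pe1, (4 * eps * (1 - eps)) ** (d / 2))  # 0.028 <= 0.216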
4. Error Probability for Two Randomly Chosen Codewords

Suppose now that the codewords $c_1$ and $c_2$ are chosen randomly and independently, each from a distribution $Q$ on $\mathcal{X}^n$. Observe that the codewords are then random variables $C_1$ and $C_2$, and the probability that the block code is the particular block code with codewords $c_1$ and $c_2$ is given by $Q(c_1) Q(c_2)$. The error probabilities $P_{e,m}$ are now random variables as well, since they are functions of $C_1$ and $C_2$. Let $\bar{P}_{e,m}$ denote the expectation of $P_{e,m}$.

We will now give an upper bound on $\bar{P}_{e,m}$ in two different ways. The first is the more straightforward, but the second will turn out to be conceptually more useful. We will take $m = 1$, but it is clear by symmetry that $\bar{P}_{e,1} = \bar{P}_{e,2}$.

Method 1: Write
$$\bar{P}_{e,1} = E[P_{e,1}] = \sum_{c_1} \sum_{c_2} Q(c_1) Q(c_2) \, E[P_{e,1} \mid C_1 = c_1, C_2 = c_2].$$
But when we are given $c_1$ and $c_2$, $P_{e,1}$ is no longer random, and we can use the bound of the last section on the probability of error to get
$$\begin{aligned}
\bar{P}_{e,1} &\leq \sum_{c_1} \sum_{c_2} Q(c_1) Q(c_2) \sum_{y} \sqrt{P(y \mid c_1)} \sqrt{P(y \mid c_2)} \\
&= \sum_{y} \Big[ \sum_{c_1} Q(c_1) \sqrt{P(y \mid c_1)} \Big] \Big[ \sum_{c_2} Q(c_2) \sqrt{P(y \mid c_2)} \Big]
 = \sum_{y} \Big[ \sum_{c} Q(c) \sqrt{P(y \mid c)} \Big]^2,
\end{aligned}$$
where the last equality follows by noting that the two bracketed sums are identical, since they differ only by the index of summation.

Method 2: Write
$$\bar{P}_{e,1} = E[P_{e,1}] = \sum_{c_1} \sum_{y} Q(c_1) P(y \mid c_1) \, E[P_{e,1} \mid C_1 = c_1, Y = y].$$
Note that here we are computing the expectation by first conditioning on the transmitted and received sequences. Now observe that given $C_1 = c_1$ and the received sequence $y$, an error is sure not to occur if $C_2$ is chosen such that $P(y \mid C_2) < P(y \mid c_1)$; otherwise we can upper bound $P_{e,1}$ by 1. Thus, given $C_1 = c_1$ and $y$, we have
$$P_{e,1} \leq \begin{cases} 1 & \text{if } P(y \mid C_2) \geq P(y \mid c_1), \\ 0 & \text{if } P(y \mid C_2) < P(y \mid c_1) \end{cases}
\;=\; \mathbb{1}_{\{P(y \mid C_2) \geq P(y \mid c_1)\}}.$$
Taking expectations we see that
$$E[P_{e,1} \mid C_1 = c_1, Y = y] \leq \Pr(B_2),$$
where $B_2$ is the event that $C_2$ is chosen such that $P(y \mid C_2) \geq P(y \mid c_1)$. We can bound $\Pr(B_2)$ by
$$\Pr(B_2) = \sum_{c_2} Q(c_2) \, \mathbb{1}_{\{P(y \mid c_2) \geq P(y \mid c_1)\}} \leq \sum_{c_2} Q(c_2) \frac{P(y \mid c_2)^s}{P(y \mid c_1)^s} \qquad \text{for any } s \geq 0.$$
Taking $s = 1/2$ and substituting back, we obtain the same bound as before,
$$\bar{P}_{e,1} \leq \sum_{y} \Big[ \sum_{c} Q(c) \sqrt{P(y \mid c)} \Big]^2.$$
For memoryless channels the bound simplifies if $Q(c)$ is chosen to be a product distribution, $Q(c) = \prod_{k=1}^n Q(c_k)$. In this case we obtain
$$\bar{P}_{e,1} \leq \bigg[ \sum_{y} \Big[ \sum_{x} Q(x) \sqrt{P(y \mid x)} \Big]^2 \bigg]^n.$$
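As a sanity check on these expressions (a sketch under the same assumed BSC and a uniform per-letter $Q$), for a tiny block length one can enumerate all codeword pairs and output sequences, evaluate the expectation of the indicator bound directly, and compare it with $\sum_y [\sum_c Q(c) \sqrt{P(y \mid c)}]^2$ and with its single-letter form.

import numpy as np
from itertools import product

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
n = 3
q = np.array([0.5, 0.5])   # assumed per-letter input distribution Q

def likelihood(c, y):
    return np.prod([W[ck, yk] for ck, yk in zip(c, y)])

def Q(c):
    return np.prod([q[ck] for ck in c])

seqs = list(product(range(2), repeat=n))

# Expectation over C1, C2 ~ Q of sum_y P(y|C1) 1{P(y|C2) >= P(y|C1)}.
expected_bound = sum(Q(c1) * Q(c2) * likelihood(c1, y)
                     for c1 in seqs for c2 in seqs for y in seqs
                     if likelihood(c2, y) >= likelihood(c1, y))

# Closed form sum_y [ sum_c Q(c) sqrt(P(y|c)) ]^2 and its single-letter form.
closed_form = sum(sum(Q(c) * np.sqrt(likelihood(c, y)) for c in seqs) ** 2
                  for y in seqs)
per_letter = sum(sum(q[x] * np.sqrt(W[x, y]) for x in range(2)) ** 2
                 for y in range(2))
print(expected_bound, closed_form, per_letter ** n)  # the last two coincide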
5. Average Error Probability of a Randomly Chosen Code

Consider now a code with $M$ codewords, each codeword $c_m$ chosen independently according to the probability distribution $Q$. Just as in the above section, the probability that the block code constructed is a particular block code with codewords $c_1, \dots, c_M$ is $\prod_{m=1}^M Q(c_m)$. The error probabilities $P_{e,m}$ are again random variables, and again let $\bar{P}_{e,m}$ denote the expectation of $P_{e,m}$. By the symmetry of the construction with respect to permutations of the codewords, we see that $\bar{P}_{e,m}$ does not depend on $m$, and it will suffice to analyze $\bar{P}_{e,1}$. We will take the extension of 'method 2' of the previous section as our analysis method, and write
$$\bar{P}_{e,1} = \sum_{c_1} \sum_{y} Q(c_1) P(y \mid c_1) \, E[P_{e,1} \mid C_1 = c_1, Y = y].$$
Now, for a given $c_1$ and $y$, define for each $m \geq 2$ the event $B_m$ that codeword $C_m$ is chosen such that $P(y \mid C_m) \geq P(y \mid c_1)$, i.e., that codeword $m$ is at least as likely as the transmitted codeword. Then, just as in the previous section,
$$\begin{aligned}
E[P_{e,1} \mid C_1 = c_1, Y = y] &\leq \Pr\Big( \bigcup_{m=2}^M B_m \Big) \\
&\leq \min\Big\{ 1, \sum_{m=2}^M \Pr(B_m) \Big\} \\
&\leq \Big[ \sum_{m=2}^M \Pr(B_m) \Big]^\rho \qquad \text{for all } \rho \in [0, 1].
\end{aligned}$$
The second inequality above is just the union bound; the third inequality holds because for $\rho \in [0, 1]$, $x \leq x^\rho$ when $x \in [0, 1]$, and $1 \leq x^\rho$ when $x \geq 1$.

Observe now that
$$\Pr(B_m) = \sum_{c_m} Q(c_m) \, \mathbb{1}_{\{P(y \mid c_m) \geq P(y \mid c_1)\}} \leq \sum_{c_m} Q(c_m) \frac{P(y \mid c_m)^s}{P(y \mid c_1)^s} = \sum_{c} Q(c) \frac{P(y \mid c)^s}{P(y \mid c_1)^s}$$
for any $s \geq 0$, and thus
$$E[P_{e,1} \mid C_1 = c_1, Y = y] \leq \Big[ (M-1) \sum_{c} Q(c) \frac{P(y \mid c)^s}{P(y \mid c_1)^s} \Big]^\rho.$$
Substituting back we get
$$\bar{P}_{e,1} \leq (M-1)^\rho \sum_{y} \Big[ \sum_{c_1} Q(c_1) P(y \mid c_1)^{1-s\rho} \Big] \Big[ \sum_{c} Q(c) P(y \mid c)^s \Big]^\rho.$$
Choosing now $s = 1/(1+\rho)$ (this choice in fact minimizes the bound) and observing that for this choice $1 - s\rho = s$, and that
$$\Big[ \sum_{c_1} Q(c_1) P(y \mid c_1)^{1-s\rho} \Big] = \Big[ \sum_{c} Q(c) P(y \mid c)^s \Big]$$
since the two summations differ only by the summation index, we get
$$\bar{P}_{e,1} \leq (M-1)^\rho \sum_{y} \Big[ \sum_{c} Q(c) P(y \mid c)^{1/(1+\rho)} \Big]^{1+\rho}.$$
If we now specialize this bound to discrete memoryless channels and choose $Q(c) = \prod_{k=1}^n Q(c_k)$, we get that for every $\rho \in [0, 1]$,
$$\bar{P}_{e,1} \leq (M-1)^\rho \bigg[ \sum_{y} \Big[ \sum_{x} Q(x) P(y \mid x)^{1/(1+\rho)} \Big]^{1+\rho} \bigg]^n.$$
If $M = \lceil e^{nR} \rceil$, then $M - 1 \leq e^{nR}$, and we can summarize the above as

Theorem 1. Given a discrete memoryless channel described by $P(y \mid x)$, for any block length $n$ and any $R \geq 0$, consider constructing a random block code with $M = \lceil e^{nR} \rceil$ codewords by choosing each letter of each codeword independently according to a distribution $Q$ on $\mathcal{X}$. Then the expected average error probability of this random code satisfies
$$\bar{P}_{e,\mathrm{ave}} \leq \exp\Big\{ -n \max_{\rho \in [0,1]} \big[ E_0(\rho, Q) - \rho R \big] \Big\},
\qquad \text{where} \qquad
E_0(\rho, Q) = -\ln \sum_{y} \Big[ \sum_{x} Q(x) P(y \mid x)^{1/(1+\rho)} \Big]^{1+\rho}.$$

Since the expected error probability cannot be better than the error probability of the best code, we also get

Corollary 1. Given a discrete memoryless channel described by $P(y \mid x)$, for any distribution $Q$ on $\mathcal{X}$, any block length $n$ and any $R > 0$, there exists a code of block length $n$ and rate at least $R$ with
$$P_{e,\mathrm{ave}} \leq \exp\{ -n E_r(R, Q) \},
\qquad \text{where} \qquad
E_r(R, Q) = \max_{\rho \in [0,1]} \big[ E_0(\rho, Q) - \rho R \big].$$

Note that the corollary above establishes the existence of codes of a certain rate with a guarantee on their $P_{e,\mathrm{ave}}$, but suggests no mechanism to find them. However, if one does carry out the experiment of constructing a code by randomly choosing its codewords, the probability that the code obtained is much worse than the average is small; in particular, the Markov inequality tells us that the probability that a code constructed in this way has $P_{e,\mathrm{ave}}$ larger than $\alpha \bar{P}_{e,\mathrm{ave}}$ is small:
$$\Pr[P_{e,\mathrm{ave}} \geq \alpha \bar{P}_{e,\mathrm{ave}}] \leq 1/\alpha \qquad \text{for } \alpha > 1.$$
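The functions $E_0(\rho, Q)$ and $E_r(R, Q)$ of Theorem 1 and Corollary 1 are straightforward to evaluate numerically. A sketch (assuming the BSC and uniform input distribution used earlier; the grid over $\rho$ is a crude stand-in for the exact maximization):

import numpy as np

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = np.array([0.5, 0.5])

def E0(rho):
    # E0(rho, Q) = -ln sum_y [ sum_x Q(x) P(y|x)^{1/(1+rho)} ]^{1+rho}
    inner = (q[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log((inner ** (1.0 + rho)).sum())

def Er(R, grid=np.linspace(0.0, 1.0, 1001)):
    # Random coding exponent: max over rho in [0, 1] of E0(rho, Q) - rho R.
    return max(E0(rho) - rho * R for rho in grid)

R = 0.2                      # rate in nats; I(Q) is about 0.368 nats here
print(Er(R))                 # positive, since R < I(Q)
print(np.exp(-100 * Er(R)))  # bound on P_e,ave at block length n = 100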
[Figure 1: $E_0(\rho, Q)$ as a function of $\rho$, together with the line $\rho I(Q)$.]

Thus, the difficulty in finding good practical codes is not that good codes are rare; the difficulty is in finding good codes for which there are practical methods to encode and decode.

The corollary above makes guarantees on $P_{e,\mathrm{ave}}$, but for a code constructed by random coding the value of $P_{e,\max}$ may be much higher than $P_{e,\mathrm{ave}}$. Can we ensure the existence of codes with small $P_{e,\max}$?

Theorem 2. Given a discrete memoryless channel described by $P(y \mid x)$, for any distribution $Q$ on $\mathcal{X}$, any block length $n$ and any $R > 0$, there exists a block code of length $n$ and rate at least $R$ with
$$P_{e,\max} \leq 4 \exp\{ -n E_r(R, Q) \}.$$

Proof. Let $M_0 = \lceil 2 e^{nR} \rceil$. We know that there is a code with $M_0$ codewords which satisfies
$$P_{e,\mathrm{ave}} = \frac{1}{M_0} \sum_{m=1}^{M_0} P_{e,m} \leq (M_0 - 1)^\rho \exp\{ -n E_0(\rho, Q) \}$$
for every $\rho \in [0, 1]$. Since $(M_0 - 1)^\rho \leq 2^\rho e^{n \rho R} \leq 2 e^{n \rho R}$, we see that for this code
$$P_{e,\mathrm{ave}} = \frac{1}{M_0} \sum_{m=1}^{M_0} P_{e,m} \leq 2 \exp\{ -n E_r(R, Q) \}.$$
Now, among the $M_0$ numbers $P_{e,1}, \dots, P_{e,M_0}$ there cannot be more than $M_0/2$ which exceed twice the average value $P_{e,\mathrm{ave}}$. Thus, among these $M_0$ numbers there exist at least $\lceil e^{nR} \rceil$ which satisfy
$$P_{e,m} \leq 2 P_{e,\mathrm{ave}} \leq 4 \exp\{ -n E_r(R, Q) \}.$$
Keeping only these codewords, and noticing that in the maximum likelihood decoder that corresponds to this smaller code the decoding sets can only be enlarged, we see that we have constructed a code with the desired properties.

5.1. Properties of $E_0$ and $E_r$

The value of the existence theorems we just proved depends on whether the bound we have on the probability of error can be made arbitrarily small for the set of rates of interest. Define
$$I(Q) = \sum_{x} \sum_{y} Q(x) P(y \mid x) \ln \frac{P(y \mid x)}{\sum_{x'} Q(x') P(y \mid x')}$$
as the value of the mutual information between the input and output when the input distribution is $Q$.
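$I(Q)$ can be evaluated directly from this definition. A short sketch (same assumed BSC and uniform $Q$ as before; it assumes all channel transition probabilities are positive, so the logarithm is always defined):

import numpy as np

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = np.array([0.5, 0.5])

def mutual_information(q, W):
    # I(Q) = sum_{x,y} Q(x) P(y|x) ln( P(y|x) / sum_{x'} Q(x') P(y|x') )
    py = q @ W                              # output distribution
    return float(np.sum(q[:, None] * W * np.log(W / py[None, :])))

print(mutual_information(q, W))  # about 0.368 nats = ln 2 - H_b(0.1)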
We will now show that for every rate $R < I(Q)$ we have $E_r(R, Q) > 0$, and thus that for every rate $R < I(Q)$ the error probability can be made to decay exponentially in the block length. Observe that $E_r(R, Q) > 0$ whenever for some $\rho \in [0, 1]$,
$$E_0(\rho, Q) - \rho R > 0$$
is satisfied.

[Figure 2: $E_r(R, Q)$ as a maximum of the linear functions $R \mapsto E_0(\rho, Q) - \rho R$, drawn for $\rho = 1, 1/2, 1/4$; the maximum is positive for every $R < I(Q)$.]

It now follows that if $R < E_0(\rho, Q)/\rho$ for some $\rho \in (0, 1]$, then $E_0(\rho, Q) - \rho R > 0$ and thus $E_r(R, Q) > 0$. But
$$I(Q) = \frac{\partial E_0(\rho, Q)}{\partial \rho}\bigg|_{\rho=0} = \lim_{\rho \to 0} \frac{E_0(\rho, Q) - E_0(0, Q)}{\rho} = \lim_{\rho \to 0} \frac{E_0(\rho, Q)}{\rho},$$
where the last equality follows from the fact that $E_0(0, Q) = -\ln \sum_{x,y} Q(x) P(y \mid x) = -\ln 1 = 0$. Thus, by the definition of the limit above, for every $R < I(Q)$ there exists a $\rho \in (0, 1]$ with $E_0(\rho, Q)/\rho > R$. Hence for every $R < I(Q)$, $E_r(R, Q) > 0$. Since the above argument holds for any input distribution $Q$, it holds in particular for the $Q$ that maximizes $I(Q)$. This proves the existence of codes with arbitrarily small probability of error for every rate less than the capacity $\max_Q I(Q)$.

The approach we used here yields the function $E_r(R, Q)$, called the random coding error exponent. To prove the achievability part of the coding theorem it would suffice to show that for any rate $R$ below capacity there exists a sequence of rate-$R$ codes of length $n$ whose error probability tends to zero as $n$ tends to infinity. The random coding error exponent argument, in addition to proving this, also characterizes the decay of the probability of error as $n$ tends to infinity.
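The two facts used above can also be checked numerically: the slope of $E_0(\rho, Q)$ at $\rho = 0$ equals $I(Q)$, and $E_r(R, Q) > 0$ for $R < I(Q)$. A final sketch (the helper functions are repeated so the block is self-contained; the BSC and uniform $Q$ remain illustrative assumptions):

import numpy as np

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = np.array([0.5, 0.5])

def E0(rho):
    inner = (q[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log((inner ** (1.0 + rho)).sum())

def mutual_information():
    py = q @ W
    return float(np.sum(q[:, None] * W * np.log(W / py[None, :])))

# Slope of E0 at rho = 0 approximates I(Q); note E0(0) = 0.
h = 1e-6
print(E0(h) / h, mutual_information())   # both about 0.368 nats

# For any R < I(Q), some rho in (0, 1] makes E0(rho) - rho R positive:
R = 0.9 * mutual_information()
grid = np.linspace(0.0, 1.0, 1001)
print(max(E0(r) - r * R for r in grid) > 0)   # True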