ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
School of Computer and Communication Sciences

Handout 18                                          Information Theory and Coding
Notes on Random Coding                                         December 12, 2003

Random Coding

In this note we prove the achievability part of the channel coding theorem for discrete memoryless channels without feedback. The discussion is largely based on Chapter 5 of R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.

1. Discrete Memoryless Channels Without Feedback

Throughout this note we will fix the channel we wish to communicate over. Let $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output alphabets of this channel. We will assume that the channel is discrete, i.e., that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. The behavior of the channel will be completely described by specifying for each $k \geq 1$ the function
$$P_k : \mathcal{X}^k \times \mathcal{Y}^k \to \mathbb{R}, \qquad (x_1^k, y_1^k) \mapsto P_k(y_k \mid x_1^k, y_1^{k-1}),$$
which gives the probability of receiving the letter $y_k$ at the output at time $k$, given the past and current inputs to the channel and the past outputs of the channel.

We make the following two assumptions: the channel is memoryless, and it is used without feedback. We will call a channel memoryless if
$$P_k(y_k \mid x_1^k, y_1^{k-1}) = P(y_k \mid x_k),$$
which in words means that the channel keeps no memory of the past inputs and outputs in determining the output at time $k$. Note also that the channel does not behave differently at different times: the function $P$ on the right-hand side is not an explicit function of $k$.

If a memoryless channel is used without feedback, i.e., if
$$P_{X_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k \mid x_1^{k-1}, y_1^{k-1}) = P_{X_k \mid X_1^{k-1}}(x_k \mid x_1^{k-1})$$
(in words: if the channel inputs do not depend on the past channel outputs), then
$$\begin{aligned}
P_{Y_1^n \mid X_1^n}(y_1^n \mid x_1^n)
&= \frac{P_{X_1^n, Y_1^n}(x_1^n, y_1^n)}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^n P_{X_k, Y_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k, y_k \mid x_1^{k-1}, y_1^{k-1})}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^n P_{X_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k \mid x_1^{k-1}, y_1^{k-1}) \, P_{Y_k \mid X_1^k, Y_1^{k-1}}(y_k \mid x_1^k, y_1^{k-1})}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^n P_{X_k \mid X_1^{k-1}}(x_k \mid x_1^{k-1}) \, P_{Y_k \mid X_k}(y_k \mid x_k)}{P_{X_1^n}(x_1^n)} \\
&= \prod_{k=1}^n P(y_k \mid x_k),
\end{aligned}$$
where we use the memoryless and without-feedback conditions at the fourth equality. From now on, we will restrict our attention to channels used without feedback. With some abuse of notation we will let $P(y \mid x)$ denote the probability of receiving the sequence $y = y_1^n$ at the output of the channel when the channel input is the sequence $x = x_1^n$. If the channel is memoryless, we see from above that
$$P(y \mid x) = \prod_{k=1}^n P(y_k \mid x_k).$$
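To make the memoryless factorization concrete, here is a minimal numerical sketch (our own illustration, not part of the handout) that represents a DMC by its transition matrix $W[x, y] = P(y \mid x)$ and evaluates $P(y \mid x)$ for sequences as a product of per-letter probabilities; the binary symmetric channel and the crossover probability 0.1 are illustrative assumptions.

import numpy as np

# A DMC is fully described by a |X| x |Y| matrix W with W[x, y] = P(y | x).
# Assumed example: binary symmetric channel with crossover probability 0.1.
eps = 0.1
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

def prob_output_given_input(W, x_seq, y_seq):
    # P(y_1^n | x_1^n) = prod_k P(y_k | x_k) for a DMC used without feedback.
    return np.prod([W[x, y] for x, y in zip(x_seq, y_seq)])

print(prob_output_given_input(W, [0, 1, 1, 0], [0, 1, 0, 0]))  # (1-eps)^3 * eps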
2. Block Codes

A block code with $M$ messages and block length $n$ is a mapping from a set of $M$ messages $\{1, \dots, M\}$ to channel input sequences of length $n$. Thus, a block code is specified when we specify the $M$ channel input sequences
$$c_1 = (c_{1,1}, \dots, c_{1,n}), \quad \dots, \quad c_M = (c_{M,1}, \dots, c_{M,n})$$
the messages are mapped into. We will call $c_m$ the codeword for message $m$. To send message $m$ with such a block code we simply give the sequence $c_m$ to the channel as input.

A decoder for such a block code is a mapping from channel output sequences $\mathcal{Y}^n$ to the set of $M$ messages $\{1, \dots, M\}$. For a given decoder, let $D_m \subset \mathcal{Y}^n$ denote the set of channel outputs which are mapped to message $m$. Since an output sequence $y$ is mapped to exactly one message, the $D_m$'s form a collection of disjoint sets whose union is $\mathcal{Y}^n$.

We define the rate of a block code with $M$ messages and block length $n$ as $\frac{\ln M}{n}$, and given such a code and a decoder we define
$$P_{e,m} = \sum_{y \notin D_m} P(y \mid c_m),$$
the probability of a decoding error when message $m$ is sent. Further define
$$P_{e,\mathrm{ave}} = \frac{1}{M} \sum_{m=1}^{M} P_{e,m} \qquad \text{and} \qquad P_{e,\max} = \max_{1 \leq m \leq M} P_{e,m}$$
as the average and maximal (both over the possible messages) error probabilities of such a code and decoder.

Among the many possible decoding methods, the rule that minimizes $P_{e,\mathrm{ave}}$ is the maximum likelihood rule. Given a channel output sequence $y$, the maximum likelihood rule decodes to a message $m$ for which
$$P(y \mid c_m) \geq P(y \mid c_{m'}) \quad \text{for every } m' \neq m,$$
and if there is more than one such $m$, it chooses one of them arbitrarily. We will restrict ourselves in the following to the maximum likelihood rule.
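The maximum likelihood rule translates directly into code. The following sketch (again our own, reusing the assumed BSC above with a hypothetical two-codeword repetition code) decodes by maximizing $P(y \mid c_m)$, breaking ties arbitrarily, and computes $P_{e,m}$ exactly by enumerating all output sequences.

import numpy as np
from itertools import product

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])

# Hypothetical block code: M = 2 messages, block length n = 3 (repetition code).
codewords = [(0, 0, 0), (1, 1, 1)]

def likelihood(c, y):
    return np.prod([W[ck, yk] for ck, yk in zip(c, y)])

def ml_decode(y):
    # max keeps the first maximizer on ties, an arbitrary (allowed) choice.
    return max(range(len(codewords)), key=lambda m: likelihood(codewords[m], y))

def p_error(m, n=3):
    # P_{e,m}: total probability of the outputs decoded to some other message.
    return sum(likelihood(codewords[m], y)
               for y in product(range(2), repeat=n) if ml_decode(y) != m)

print(p_error(0))  # 3*eps^2*(1-eps) + eps^3: majority decoding on the BSC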
3. Error Probability for Two Codewords

Consider now the case $M = 2$, so the block code consists of two codewords, $c_1$ and $c_2$. We will find a bound on $P_{e,m}$ for the maximum likelihood decoding rule.

Suppose message 1 is to be sent. The channel input is then $c_1$, and the probability of receiving $y$ at the channel output is $P(y \mid c_1)$. An error will occur if the received sequence is not in $D_1$. Since $P(y \mid c_2) \geq P(y \mid c_1)$ for every $y$ not in $D_1$ (but there may be $y$'s for which $P(y \mid c_2) = P(y \mid c_1)$ and $y \in D_1$), we have
$$\begin{aligned}
P_{e,1} = \sum_{y \notin D_1} P(y \mid c_1)
&\leq \sum_{y :\, P(y \mid c_2) \geq P(y \mid c_1)} P(y \mid c_1)
 = \sum_{y} P(y \mid c_1) \, \mathbb{1}_{\{P(y \mid c_2) \geq P(y \mid c_1)\}} \\
&\leq \sum_{y :\, P(y \mid c_2) \geq P(y \mid c_1)} P(y \mid c_1) \frac{P(y \mid c_2)^s}{P(y \mid c_1)^s} \\
&\leq \sum_{y} P(y \mid c_1) \frac{P(y \mid c_2)^s}{P(y \mid c_1)^s} \qquad \text{for any } s \geq 0 \\
&= \sum_{y} P(y \mid c_1)^{1-s} P(y \mid c_2)^s.
\end{aligned}$$
The choice $s = 1/2$ gives us
$$P_{e,1} \leq \sum_{y} \sqrt{P(y \mid c_1) P(y \mid c_2)},$$
and by symmetry the same quantity also upper bounds $P_{e,2}$.

For a memoryless channel $P(y \mid c_m) = \prod_{k=1}^n P(y_k \mid c_{m,k})$, and we obtain
$$\begin{aligned}
P_{e,m} \leq \sum_{y} \sqrt{P(y \mid c_1) P(y \mid c_2)}
&= \sum_{y_1} \cdots \sum_{y_n} \sqrt{P(y_1 \mid c_{1,1}) P(y_1 \mid c_{2,1})} \cdots \sqrt{P(y_n \mid c_{1,n}) P(y_n \mid c_{2,n})} \\
&= \Big[ \sum_{y_1} \sqrt{P(y_1 \mid c_{1,1}) P(y_1 \mid c_{2,1})} \Big] \cdots \Big[ \sum_{y_n} \sqrt{P(y_n \mid c_{1,n}) P(y_n \mid c_{2,n})} \Big] \\
&= \prod_{k=1}^n \Big[ \sum_{y} \sqrt{P(y \mid c_{1,k}) P(y \mid c_{2,k})} \Big].
\end{aligned}$$
For example, for a binary symmetric channel with crossover probability $\epsilon$ we see that
$$\sum_{y} \sqrt{P(y \mid c_{1,k}) P(y \mid c_{2,k})} =
\begin{cases} 1 & \text{if } c_{1,k} = c_{2,k}, \\ 2\sqrt{\epsilon(1-\epsilon)} & \text{else,} \end{cases}$$
and we obtain
$$P_{e,m} \leq \big[ 4\epsilon(1-\epsilon) \big]^{d/2},$$
where $d$ is the number of places in which $c_1$ and $c_2$ differ (i.e., $d = |\{k : c_{1,k} \neq c_{2,k}\}|$).
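This bound is easy to check numerically. In the sketch below (assumed codewords and $\epsilon$, chosen only for illustration), the quantity pe1 is the intermediate bound $\sum_{y : P(y|c_2) \geq P(y|c_1)} P(y \mid c_1)$, which the derivation above shows is dominated by the Bhattacharyya expression $[4\epsilon(1-\epsilon)]^{d/2}$.

import numpy as np
from itertools import product

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
c1, c2 = (0, 0, 0, 0), (1, 1, 0, 1)   # the codewords differ in d = 3 places
n = len(c1)

def likelihood(c, y):
    return np.prod([W[ck, yk] for ck, yk in zip(c, y)])

# Upper bound on P_{e,1}: mass of outputs where c2 is at least as likely as c1.
pe1 = sum(likelihood(c1, y) for y in product(range(2), repeat=n)
          if likelihood(c2, y) >= likelihood(c1, y))

d = sum(a != b for a, b in zip(c1, c2))
print(pe1, (4 * eps * (1 - eps)) ** (d / 2))  # 0.028 <= 0.216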
4. Error Probability for Two Randomly Chosen Codewords

Suppose now that the codewords $c_1$ and $c_2$ are chosen randomly and independently, each from a distribution $Q$ on $\mathcal{X}^n$. Observe that the codewords are then random variables $C_1$ and $C_2$, and the probability that the block code is the particular block code with codewords $c_1$ and $c_2$ is given by $Q(c_1) Q(c_2)$. The error probabilities $P_{e,m}$ are now random variables as well, since they are functions of $C_1$ and $C_2$. Let $\bar{P}_{e,m}$ denote the expectation of $P_{e,m}$.

We will now give an upper bound on $\bar{P}_{e,m}$ in two different ways. The first is the more straightforward, but the second will turn out to be conceptually more useful. We will take $m = 1$, but it is clear by symmetry that $\bar{P}_{e,1} = \bar{P}_{e,2}$.

Method 1: Write
$$\bar{P}_{e,1} = E[P_{e,1}] = \sum_{c_1} \sum_{c_2} Q(c_1) Q(c_2) \, E[P_{e,1} \mid C_1 = c_1, C_2 = c_2].$$
But when we are given $c_1$ and $c_2$, $P_{e,1}$ is no longer random, and we can use the bound of the last section on the probability of error to get
$$\begin{aligned}
\bar{P}_{e,1} &\leq \sum_{c_1} \sum_{c_2} Q(c_1) Q(c_2) \sum_{y} \sqrt{P(y \mid c_1)} \sqrt{P(y \mid c_2)} \\
&= \sum_{y} \Big[ \sum_{c_1} Q(c_1) \sqrt{P(y \mid c_1)} \Big] \Big[ \sum_{c_2} Q(c_2) \sqrt{P(y \mid c_2)} \Big]
 = \sum_{y} \Big[ \sum_{c} Q(c) \sqrt{P(y \mid c)} \Big]^2,
\end{aligned}$$
where the last equality follows by noting that the two bracketed sums are identical, since they differ only by the index of summation.

Method 2: Write
$$\bar{P}_{e,1} = E[P_{e,1}] = \sum_{c_1} \sum_{y} Q(c_1) P(y \mid c_1) \, E[P_{e,1} \mid C_1 = c_1, Y = y].$$
Note that here we are computing the expectation by first conditioning on the transmitted and received sequences. Now observe that given $C_1 = c_1$ and the received sequence $y$, an error is sure not to occur if $C_2$ is chosen such that $P(y \mid C_2) < P(y \mid c_1)$; otherwise we can upper bound $P_{e,1}$ by 1. Thus, given $C_1 = c_1$ and $y$, we have
$$P_{e,1} \leq \begin{cases} 1 & \text{if } P(y \mid C_2) \geq P(y \mid c_1), \\ 0 & \text{if } P(y \mid C_2) < P(y \mid c_1) \end{cases}
\;=\; \mathbb{1}_{\{P(y \mid C_2) \geq P(y \mid c_1)\}}.$$
Taking expectations we see that
$$E[P_{e,1} \mid C_1 = c_1, Y = y] \leq \Pr(B_2),$$
where $B_2$ is the event that $C_2$ is chosen such that $P(y \mid C_2) \geq P(y \mid c_1)$. We can bound $\Pr(B_2)$ by
$$\Pr(B_2) = \sum_{c_2} Q(c_2) \, \mathbb{1}_{\{P(y \mid c_2) \geq P(y \mid c_1)\}} \leq \sum_{c_2} Q(c_2) \frac{P(y \mid c_2)^s}{P(y \mid c_1)^s} \qquad \text{for any } s \geq 0.$$
Taking $s = 1/2$ and substituting back, we obtain the same bound as before,
$$\bar{P}_{e,1} \leq \sum_{y} \Big[ \sum_{c} Q(c) \sqrt{P(y \mid c)} \Big]^2.$$
For memoryless channels the bound simplifies if $Q(c)$ is chosen to be a product distribution, $Q(c) = \prod_{k=1}^n Q(c_k)$. In this case we obtain
$$\bar{P}_{e,1} \leq \bigg[ \sum_{y} \Big[ \sum_{x} Q(x) \sqrt{P(y \mid x)} \Big]^2 \bigg]^n.$$
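As a sanity check on these expressions (a sketch under the same assumed BSC and a uniform per-letter $Q$), for a tiny block length one can enumerate all codeword pairs and output sequences, evaluate the expectation of the indicator bound directly, and compare it with $\sum_y [\sum_c Q(c) \sqrt{P(y \mid c)}]^2$ and with its single-letter form.

import numpy as np
from itertools import product

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
n = 3
q = np.array([0.5, 0.5])   # assumed per-letter input distribution Q

def likelihood(c, y):
    return np.prod([W[ck, yk] for ck, yk in zip(c, y)])

def Q(c):
    return np.prod([q[ck] for ck in c])

seqs = list(product(range(2), repeat=n))

# Expectation over C1, C2 ~ Q of sum_y P(y|C1) 1{P(y|C2) >= P(y|C1)}.
expected_bound = sum(Q(c1) * Q(c2) * likelihood(c1, y)
                     for c1 in seqs for c2 in seqs for y in seqs
                     if likelihood(c2, y) >= likelihood(c1, y))

# Closed form sum_y [ sum_c Q(c) sqrt(P(y|c)) ]^2 and its single-letter form.
closed_form = sum(sum(Q(c) * np.sqrt(likelihood(c, y)) for c in seqs) ** 2
                  for y in seqs)
per_letter = sum(sum(q[x] * np.sqrt(W[x, y]) for x in range(2)) ** 2
                 for y in range(2))
print(expected_bound, closed_form, per_letter ** n)  # the last two coincide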
5. Average Error Probability of a Randomly Chosen Code

Consider now a code with $M$ codewords, each codeword $c_m$ chosen independently according to the probability distribution $Q$. Just as in the above section, the probability that the block code constructed is a particular block code with codewords $c_1, \dots, c_M$ is $\prod_{m=1}^M Q(c_m)$. The error probabilities $P_{e,m}$ are again random variables, and again let $\bar{P}_{e,m}$ denote the expectation of $P_{e,m}$. By the symmetry of the construction with respect to permutations of the codewords, we see that $\bar{P}_{e,m}$ does not depend on $m$, and it will suffice to analyze $\bar{P}_{e,1}$. We will take the extension of 'method 2' of the previous section as our analysis method, and write
$$\bar{P}_{e,1} = \sum_{c_1} \sum_{y} Q(c_1) P(y \mid c_1) \, E[P_{e,1} \mid C_1 = c_1, Y = y].$$
Now, for a given $c_1$ and $y$, define for each $m \geq 2$ the event $B_m$ that codeword $C_m$ is chosen such that $P(y \mid C_m) \geq P(y \mid c_1)$, i.e., that codeword $m$ is at least as likely as the transmitted codeword. Then, just as in the previous section,
$$\begin{aligned}
E[P_{e,1} \mid C_1 = c_1, Y = y] &\leq \Pr\Big( \bigcup_{m=2}^M B_m \Big) \\
&\leq \min\Big\{ 1, \sum_{m=2}^M \Pr(B_m) \Big\} \\
&\leq \Big[ \sum_{m=2}^M \Pr(B_m) \Big]^\rho \qquad \text{for all } \rho \in [0, 1].
\end{aligned}$$
The second inequality above is just the union bound; the third inequality holds because for $\rho \in [0, 1]$, $x \leq x^\rho$ when $x \in [0, 1]$, and $1 \leq x^\rho$ when $x \geq 1$.

Observe now that
$$\Pr(B_m) = \sum_{c_m} Q(c_m) \, \mathbb{1}_{\{P(y \mid c_m) \geq P(y \mid c_1)\}} \leq \sum_{c_m} Q(c_m) \frac{P(y \mid c_m)^s}{P(y \mid c_1)^s} = \sum_{c} Q(c) \frac{P(y \mid c)^s}{P(y \mid c_1)^s}$$
for any $s \geq 0$, and thus
$$E[P_{e,1} \mid C_1 = c_1, Y = y] \leq \Big[ (M-1) \sum_{c} Q(c) \frac{P(y \mid c)^s}{P(y \mid c_1)^s} \Big]^\rho.$$
Substituting back we get
$$\bar{P}_{e,1} \leq (M-1)^\rho \sum_{y} \Big[ \sum_{c_1} Q(c_1) P(y \mid c_1)^{1-s\rho} \Big] \Big[ \sum_{c} Q(c) P(y \mid c)^s \Big]^\rho.$$
Choosing now $s = 1/(1+\rho)$ (this choice in fact minimizes the bound) and observing that for this choice $1 - s\rho = s$, and that
$$\Big[ \sum_{c_1} Q(c_1) P(y \mid c_1)^{1-s\rho} \Big] = \Big[ \sum_{c} Q(c) P(y \mid c)^s \Big]$$
since the two summations differ only by the summation index, we get
$$\bar{P}_{e,1} \leq (M-1)^\rho \sum_{y} \Big[ \sum_{c} Q(c) P(y \mid c)^{1/(1+\rho)} \Big]^{1+\rho}.$$
If we now specialize this bound to discrete memoryless channels and choose $Q(c) = \prod_{k=1}^n Q(c_k)$, we get that for every $\rho \in [0, 1]$,
$$\bar{P}_{e,1} \leq (M-1)^\rho \bigg[ \sum_{y} \Big[ \sum_{x} Q(x) P(y \mid x)^{1/(1+\rho)} \Big]^{1+\rho} \bigg]^n.$$
If $M = \lceil e^{nR} \rceil$, then $M - 1 \leq e^{nR}$, and we can summarize the above as

Theorem 1. Given a discrete memoryless channel described by $P(y \mid x)$, for any block length $n$ and any $R \geq 0$, consider constructing a random block code with $M = \lceil e^{nR} \rceil$ codewords by choosing each letter of each codeword independently according to a distribution $Q$ on $\mathcal{X}$. Then the expected average error probability of this random code satisfies
$$\bar{P}_{e,\mathrm{ave}} \leq \exp\Big\{ -n \max_{\rho \in [0,1]} \big[ E_0(\rho, Q) - \rho R \big] \Big\},
\qquad \text{where} \qquad
E_0(\rho, Q) = -\ln \sum_{y} \Big[ \sum_{x} Q(x) P(y \mid x)^{1/(1+\rho)} \Big]^{1+\rho}.$$

Since the expected error probability cannot be better than the error probability of the best code, we also get

Corollary 1. Given a discrete memoryless channel described by $P(y \mid x)$, for any distribution $Q$ on $\mathcal{X}$, any block length $n$ and any $R > 0$, there exists a code of block length $n$ and rate at least $R$ with
$$P_{e,\mathrm{ave}} \leq \exp\{ -n E_r(R, Q) \},
\qquad \text{where} \qquad
E_r(R, Q) = \max_{\rho \in [0,1]} \big[ E_0(\rho, Q) - \rho R \big].$$

Note that the corollary above establishes the existence of codes of a certain rate with a guarantee on their $P_{e,\mathrm{ave}}$, but suggests no mechanism to find them. However, if one does carry out the experiment of constructing a code by randomly choosing its codewords, the probability that the code obtained is much worse than the average is small; in particular, the Markov inequality tells us that the probability that a code constructed in this way has $P_{e,\mathrm{ave}}$ larger than $\alpha \bar{P}_{e,\mathrm{ave}}$ is small:
$$\Pr[P_{e,\mathrm{ave}} \geq \alpha \bar{P}_{e,\mathrm{ave}}] \leq 1/\alpha \qquad \text{for } \alpha > 1.$$
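The functions $E_0(\rho, Q)$ and $E_r(R, Q)$ of Theorem 1 and Corollary 1 are straightforward to evaluate numerically. A sketch (assuming the BSC and uniform input distribution used earlier; the grid over $\rho$ is a crude stand-in for the exact maximization):

import numpy as np

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = np.array([0.5, 0.5])

def E0(rho):
    # E0(rho, Q) = -ln sum_y [ sum_x Q(x) P(y|x)^{1/(1+rho)} ]^{1+rho}
    inner = (q[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log((inner ** (1.0 + rho)).sum())

def Er(R, grid=np.linspace(0.0, 1.0, 1001)):
    # Random coding exponent: max over rho in [0, 1] of E0(rho, Q) - rho R.
    return max(E0(rho) - rho * R for rho in grid)

R = 0.2                      # rate in nats; I(Q) is about 0.368 nats here
print(Er(R))                 # positive, since R < I(Q)
print(np.exp(-100 * Er(R)))  # bound on P_e,ave at block length n = 100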
[Figure 1: $E_0(\rho, Q)$ as a function of $\rho$, together with the line $\rho I(Q)$.]

Thus, the difficulty in finding good practical codes is not that good codes are rare; the difficulty is in finding good codes for which there are practical methods to encode and decode.

The corollary above makes guarantees on $P_{e,\mathrm{ave}}$, but for a code constructed by random coding the value of $P_{e,\max}$ may be much higher than $P_{e,\mathrm{ave}}$. Can we ensure the existence of codes with small $P_{e,\max}$?

Theorem 2. Given a discrete memoryless channel described by $P(y \mid x)$, for any distribution $Q$ on $\mathcal{X}$, any block length $n$ and any $R > 0$, there exists a block code of length $n$ and rate at least $R$ with
$$P_{e,\max} \leq 4 \exp\{ -n E_r(R, Q) \}.$$

Proof. Let $M_0 = \lceil 2 e^{nR} \rceil$. We know that there is a code with $M_0$ codewords which satisfies
$$P_{e,\mathrm{ave}} = \frac{1}{M_0} \sum_{m=1}^{M_0} P_{e,m} \leq (M_0 - 1)^\rho \exp\{ -n E_0(\rho, Q) \}$$
for every $\rho \in [0, 1]$. Since $(M_0 - 1)^\rho \leq 2^\rho e^{n \rho R} \leq 2 e^{n \rho R}$, we see that for this code
$$P_{e,\mathrm{ave}} = \frac{1}{M_0} \sum_{m=1}^{M_0} P_{e,m} \leq 2 \exp\{ -n E_r(R, Q) \}.$$
Now, among the $M_0$ numbers $P_{e,1}, \dots, P_{e,M_0}$ there cannot be more than $M_0/2$ which exceed twice the average value $P_{e,\mathrm{ave}}$. Thus, among these $M_0$ numbers there exist at least $\lceil e^{nR} \rceil$ which satisfy
$$P_{e,m} \leq 2 P_{e,\mathrm{ave}} \leq 4 \exp\{ -n E_r(R, Q) \}.$$
Keeping only these codewords, and noticing that in the maximum likelihood decoder that corresponds to this smaller code the decoding sets can only be enlarged, we see that we have constructed a code with the desired properties.

5.1. Properties of $E_0$ and $E_r$

The value of the existence theorems we just proved depends on whether the bound we have on the probability of error can be made arbitrarily small for the set of rates of interest. Define
$$I(Q) = \sum_{x} \sum_{y} Q(x) P(y \mid x) \ln \frac{P(y \mid x)}{\sum_{x'} Q(x') P(y \mid x')}$$
as the value of the mutual information between the input and output when the input distribution is $Q$.
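$I(Q)$ can be evaluated directly from this definition. A short sketch (same assumed BSC and uniform $Q$ as before; it assumes all channel transition probabilities are positive, so the logarithm is always defined):

import numpy as np

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = np.array([0.5, 0.5])

def mutual_information(q, W):
    # I(Q) = sum_{x,y} Q(x) P(y|x) ln( P(y|x) / sum_{x'} Q(x') P(y|x') )
    py = q @ W                              # output distribution
    return float(np.sum(q[:, None] * W * np.log(W / py[None, :])))

print(mutual_information(q, W))  # about 0.368 nats = ln 2 - H_b(0.1)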
We will now show that for every rate $R < I(Q)$ we have $E_r(R, Q) > 0$, and thus that for every rate $R < I(Q)$ the error probability can be made to decay exponentially in the block length. Observe that $E_r(R, Q) > 0$ whenever for some $\rho \in [0, 1]$,
$$E_0(\rho, Q) - \rho R > 0$$
is satisfied.

[Figure 2: $E_r(R, Q)$ as a maximum of the linear functions $R \mapsto E_0(\rho, Q) - \rho R$, drawn for $\rho = 1, 1/2, 1/4$; the maximum is positive for every $R < I(Q)$.]

It now follows that if $R < E_0(\rho, Q)/\rho$ for some $\rho \in (0, 1]$, then $E_0(\rho, Q) - \rho R > 0$ and thus $E_r(R, Q) > 0$. But
$$I(Q) = \frac{\partial E_0(\rho, Q)}{\partial \rho}\bigg|_{\rho=0} = \lim_{\rho \to 0} \frac{E_0(\rho, Q) - E_0(0, Q)}{\rho} = \lim_{\rho \to 0} \frac{E_0(\rho, Q)}{\rho},$$
where the last equality follows from the fact that $E_0(0, Q) = -\ln \sum_{x,y} Q(x) P(y \mid x) = -\ln 1 = 0$. Thus, by the definition of the limit above, for every $R < I(Q)$ there exists a $\rho \in (0, 1]$ with $E_0(\rho, Q)/\rho > R$. Hence for every $R < I(Q)$, $E_r(R, Q) > 0$. Since the above argument holds for any input distribution $Q$, it holds in particular for the $Q$ that maximizes $I(Q)$. This proves the existence of codes with arbitrarily small probability of error for every rate less than the capacity $\max_Q I(Q)$.

The approach we used here yields the function $E_r(R, Q)$, called the random coding error exponent. To prove the achievability part of the coding theorem it would suffice to show that for any rate $R$ below capacity there exists a sequence of rate-$R$ codes of length $n$ whose error probability tends to zero as $n$ tends to infinity. The random coding error exponent argument, in addition to proving this, also characterizes the decay of the probability of error as $n$ tends to infinity.
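The two facts used above can also be checked numerically: the slope of $E_0(\rho, Q)$ at $\rho = 0$ equals $I(Q)$, and $E_r(R, Q) > 0$ for $R < I(Q)$. A final sketch (the helper functions are repeated so the block is self-contained; the BSC and uniform $Q$ remain illustrative assumptions):

import numpy as np

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = np.array([0.5, 0.5])

def E0(rho):
    inner = (q[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log((inner ** (1.0 + rho)).sum())

def mutual_information():
    py = q @ W
    return float(np.sum(q[:, None] * W * np.log(W / py[None, :])))

# Slope of E0 at rho = 0 approximates I(Q); note E0(0) = 0.
h = 1e-6
print(E0(h) / h, mutual_information())   # both about 0.368 nats

# For any R < I(Q), some rho in (0, 1] makes E0(rho) - rho R positive:
R = 0.9 * mutual_information()
grid = np.linspace(0.0, 1.0, 1001)
print(max(E0(r) - r * R for r in grid) > 0)   # True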