CSCI5370 Quantum Computing                                      November 25, 2013

Lecture 11: Quantum Information III - Source Coding

Lecturer: Shengyu Zhang        Scribe: Hing Yin Tsang

11.1 Holevo's bound

Suppose Alice has an information source $X$ that generates symbol $x$ with probability $p_x$, and her goal is to send this information to Bob. Classically, Alice can encode the symbol $x$ as a codeword of a certain length, say $C_x \in \{0,1\}^n$, send it to Bob, and let Bob decode it to figure out what $x$ is. Let $Y$ be the symbol Bob obtains after decoding; then $I(X;Y)$ essentially measures how much information about $X$ Bob can extract. It is clear that $I(X;Y) \le I(C_X;Y) = H(C_X) - H(C_X|Y) \le H(C_X) \le n$, where the first inequality follows from the data processing inequality. This means that if we want $Y$ to contain much information about $X$, we need a long codeword.

But what if we have a quantum channel that allows us to transmit qubits? In this case, Alice can encode $x$ using qubits rather than classical bits, and the codeword she sends is a quantum state of dimension $2^n$, where $n$ is the number of qubits. After receiving the quantum state, Bob can perform a measurement on it to obtain the variable $Y$ and try to figure out what $X$ is. It is natural to ask whether we can encode $X$ using a small number of qubits. Again we are interested in the quantity $I(X;Y)$. From the above discussion, we know that $I(X;Y)$ is upper bounded by the logarithm of the number of different states Alice can send, and the issue with classical bits is that a single bit can only represent 2 different states. A qubit, however, is much more powerful, since it can represent infinitely many different states! It seems that we can do much better in the quantum case. But after a moment of thought, we realize that encoding $X$ using one qubit does not solve the problem, since Bob cannot distinguish the states perfectly unless they are orthogonal! In general, the fewer qubits Alice uses, the harder it is for Bob to distinguish the states and hence to extract the information about $X$. So there is a trade-off between the number of qubits Alice uses and how well Bob can distinguish the states, and how $I(X;Y)$ relates to the quantum states is not clear. It turns out that one can upper bound $I(X;Y)$ in terms of the von Neumann entropies of certain quantum states.

Theorem 11.1 (Holevo's bound). Let $X$ be a random variable with $\Pr[X = x] = p_x$, and suppose Alice sends the state $\rho_x$ to Bob when $X = x$. Then for any measurement, described by POVM elements $\{E_y\}$, on the state Bob receives, with measurement outcome $Y$,
$$I(X;Y) \le S(\rho) - \sum_x p_x S(\rho_x), \quad \text{where } \rho = \sum_x p_x \rho_x.$$

This theorem tells us what we can hope for if we want $Y$ to contain much information about $X$. Indeed, since $S(\rho)$ is at most the logarithm of the dimension of the quantum state, which is nothing but the number of qubits sent, it is now clear that quantum does not buy us much.

Proof. The idea is to view everything as a quantum system consisting of three parts $P$, $Q$ and $M$, where $Q$ represents the quantum system $\rho_x$ Alice sends to Bob, and $P$, $M$ represent the classical information $X$ and $Y$. The result then follows by applying suitable inequalities for the von Neumann entropy. Formally, we consider the quantum state
$$\rho^{PQM} = \sum_x p_x \underbrace{|x\rangle\langle x|}_{P} \otimes \underbrace{\rho_x}_{Q} \otimes \underbrace{|0\rangle\langle 0|}_{M}.$$
The joint system $PQ$ corresponds to the situation where Alice prepares the state $\rho_x$ with probability $p_x$. Bob then performs a measurement with POVM elements $\{E_y\}$ on the system $Q$ and stores the measurement outcome in $M$, without touching $P$.
After this step the state becomes
$$\rho^{P'Q'M'} = \sum_x p_x \underbrace{|x\rangle\langle x|}_{P'} \otimes \sum_y \underbrace{\sqrt{E_y}\,\rho_x \sqrt{E_y}}_{Q'} \otimes \underbrace{|y\rangle\langle y|}_{M'}.$$
Now consider the quantity $S(P;Q)$. It is clear that $S(P;Q) = S(P;Q,M)$ since $M$ is independent of $P$ and $Q$. We also have $S(P;Q,M) \ge S(P';Q',M')$: the measurement applies only to the systems $Q$ and $M$, so it does not increase the mutual information between $P$ and $QM$. Finally, observe that $S(P';Q',M') \ge S(P';M')$, since discarding a system does not increase mutual information. So we have the inequality
$$S(P;Q) \ge S(P';M').$$
We claim that $S(P';M') = I(X;Y)$ and $S(P;Q) = S(\rho) - \sum_x p_x S(\rho_x)$. The first claim follows from the fact that
$$\rho^{P'M'} = \sum_x p_x\, |x\rangle\langle x| \otimes \sum_y p_{y|x}\, |y\rangle\langle y| = \sum_{x,y} p_{x,y}\, |x\rangle\langle x| \otimes |y\rangle\langle y|,$$
so $S(P';M')$ is exactly the classical mutual information between $X$ and $Y$. For the second claim, by definition $S(P;Q) = S(\rho_P) + S(\rho_Q) - S(\rho_{PQ})$. Observe that $S(\rho_P) = H(X)$, $S(\rho_Q) = S(\rho)$, and, by the joint entropy theorem for the von Neumann entropy, $S(\rho_{PQ}) = H(X) + \sum_x p_x S(\rho_x)$. Putting everything together yields $S(P;Q) = S(\rho) - \sum_x p_x S(\rho_x)$, which completes the proof. □
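To see the bound in action, here is a small numerical sketch (in Python with numpy; the ensemble, the measurement, and the helper names are our own choices for illustration) that computes the Holevo quantity $\chi = S(\rho) - \sum_x p_x S(\rho_x)$ for two non-orthogonal qubit states and compares it with $I(X;Y)$ for a computational-basis measurement:

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy S(rho) in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def shannon(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# Ensemble: X uniform on {0,1}; rho_0 = |0><0|, rho_1 = |+><+| (non-orthogonal).
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
p = np.array([0.5, 0.5])

rho = sum(px * rx for px, rx in zip(p, states))
chi = vn_entropy(rho) - sum(px * vn_entropy(rx) for px, rx in zip(p, states))

# Measurement in the computational basis: POVM elements E_0=|0><0|, E_1=|1><1|.
E = [np.outer(ket0, ket0), np.diag([0.0, 1.0])]
joint = np.array([[px * np.trace(Ey @ rx).real for Ey in E]
                  for px, rx in zip(p, states)])          # joint p(x, y)
I_xy = (shannon(joint.sum(axis=1)) + shannon(joint.sum(axis=0))
        - shannon(joint.flatten()))

print(f"chi = {chi:.4f} bits, I(X;Y) = {I_xy:.4f} bits")  # I(X;Y) <= chi
```

Here $I(X;Y) \approx 0.31$ bits while $\chi \approx 0.60$ bits, consistent with Holevo's bound; by Theorem 11.1, no choice of POVM can push $I(X;Y)$ above $\chi$.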
11.2 Application: quantum random access codes

Holevo's bound says that transmitting $n$ bits of classical information requires $n$ qubits. However, in some cases we are not interested in knowing every bit of $X$ (for instance, when we are querying a few entries of a database). Note that Holevo's bound does not say that many qubits are needed in this setting, since $I(X_i; Y_i)$ is small (at most 1) for each fixed $i$. Classically, it is known that the codeword must have at least $(1 - H(\varepsilon))m$ bits, where $m$ is the length of the original message and $\varepsilon$ is the success probability (this bound is almost tight, as there exist classical random access codes of length $(1 - H(\varepsilon))m + O(\log m)$). Can we do better using quantum mechanics? It turns out that the answer is no. Nayak gave a very simple proof using Holevo's bound. First let us define what a quantum random access code is.

Definition 11.2 (Quantum random access code). Let $m, n$ be positive integers and $\varepsilon > 0$. An $(m, n, \varepsilon)$-quantum random access code is a function $C$ that maps an $m$-bit string to a quantum state of dimension $2^n$ such that for each $1 \le i \le m$ there exists a measurement $\{M_i^0, M_i^1\}$ for which the probability that the measurement outcome is $b$ is at least $\varepsilon$ whenever the $i$-th bit of the string is $b$.

Theorem 11.3 (Nayak's bound). Let $\varepsilon \in [1/2, 1]$. If an $(m, n, \varepsilon)$-quantum random access code exists, then $n \ge (1 - H(\varepsilon))m$.

Proof. We use the following lemma, which follows from Holevo's bound and Fano's inequality:

Lemma 11.4. Let $\rho = \frac{1}{2}(\rho_0 + \rho_1)$. If there exists a measurement $\{M_0, M_1\}$ such that $\mathrm{Tr}(M_b \rho_b) \ge \varepsilon$ for $b \in \{0,1\}$, where $\varepsilon \in [1/2, 1]$, then
$$S(\rho) \ge (1 - H(\varepsilon)) + \tfrac{1}{2}\big(S(\rho_0) + S(\rho_1)\big).$$

Now suppose we have an $(m, n, \varepsilon)$-quantum random access code. Let $\rho_x$ be the encoding of $x$, and let $\rho = \sum_x \frac{1}{2^m}\rho_x = \frac{1}{2}(\rho_0 + \rho_1)$, where $\rho_b = \sum_{x : x_1 = b} \frac{1}{2^{m-1}}\rho_x$. By the definition of a quantum random access code, there exists a measurement $\{M_1^0, M_1^1\}$ such that $\mathrm{Tr}(M_1^b \rho_x) \ge \varepsilon$ for every $\rho_x$ with $x_1 = b$. It follows by the linearity of the trace that $\mathrm{Tr}(M_1^b \rho_b) \ge \varepsilon$. So by Lemma 11.4,
$$S(\rho) \ge (1 - H(\varepsilon)) + \tfrac{1}{2}\big(S(\rho_0) + S(\rho_1)\big).$$
It is not difficult to see that each $\rho_b$, decomposed according to the second bit as $\rho_b = \frac{1}{2}(\rho_{b0} + \rho_{b1})$, still satisfies the condition of Lemma 11.4, so
$$S(\rho) \ge 2(1 - H(\varepsilon)) + \tfrac{1}{4}\big(S(\rho_{00}) + S(\rho_{01}) + S(\rho_{10}) + S(\rho_{11})\big),$$
and by induction, $S(\rho) \ge m(1 - H(\varepsilon)) + \frac{1}{2^m}\sum_{b \in \{0,1\}^m} S(\rho_b) \ge m(1 - H(\varepsilon))$. Since $\rho$ lives in dimension $2^n$, we have $S(\rho) \le n$, and hence $n \ge (1 - H(\varepsilon))m$, as desired. □
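Before proving Lemma 11.4, it is worth seeing what Nayak's bound says numerically. The following sketch (Python; the values of $m$ and $\varepsilon$ are arbitrary illustrations) evaluates the lower bound $n \ge (1 - H(\varepsilon))m$:

```python
import math

def binary_entropy(eps):
    """H(eps) = -eps*log2(eps) - (1-eps)*log2(1-eps), with H(0) = H(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

m = 1000  # message length in bits
for eps in (0.51, 0.75, 0.95, 0.99):
    n_min = (1 - binary_entropy(eps)) * m
    print(f"eps = {eps:4.2f}: need n >= {n_min:7.1f} qubits for m = {m} bits")
```

Even at success probability $\varepsilon = 0.95$, more than 700 qubits are needed to encode 1000 bits, so quantum random access codes save very little.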
Proof of Lemma 11.4. Let $X$ be a uniform random bit, and let $Y$ be the outcome of applying the measurement $\{M_0, M_1\}$ to the state $\rho = \frac{1}{2}(\rho_0 + \rho_1)$. Then by Holevo's bound, we have
$$S(\rho) \ge I(X;Y) + \tfrac{1}{2}\big(S(\rho_0) + S(\rho_1)\big).$$
Since $\mathrm{Tr}(M_b \rho_b) \ge \varepsilon$, we have $\Pr[X = Y] \ge \varepsilon$, and hence $I(X;Y) = H(X) - H(X|Y) = 1 - H(X|Y)$. By Fano's inequality, with $p = \Pr[X \ne Y]$, we have $H(X|Y) \le H(p) + p \log_2(|\mathcal{X}| - 1) = H(p) \le H(\varepsilon)$, since $|\mathcal{X}| = 2$ and $p \le 1 - \varepsilon \le 1/2$. It follows that $I(X;Y) \ge 1 - H(\varepsilon)$, and the proof is completed. □

For completeness, we also give a proof of Fano's inequality. The following proof is quite standard and can be found in most textbooks on information theory.

Lemma 11.5 (Fano's inequality). Let $X, Y$ be random variables, where $X$ has support $\mathcal{X}$. If $\Pr[X \ne Y] = p$, then
$$H(X|Y) \le H(p) + p \log_2(|\mathcal{X}| - 1).$$

Proof. Let $Z$ be the indicator random variable
$$Z = \begin{cases} 1, & \text{if } X = Y, \\ 0, & \text{if } X \ne Y. \end{cases}$$
Now consider $H(X|Y)$. Since $H(Z|X,Y) = 0$, we have
\begin{align*}
H(X|Y) &= H(X, Z|Y) \\
&= H(Z|Y) + H(X|Y, Z) \\
&\le H(Z) + H(X|Y, Z) \\
&= H(p) + p\,H(X|Y, Z = 0) + (1 - p)\,H(X|Y, Z = 1) \\
&= H(p) + p\,H(X|Y, Z = 0) \\
&\le H(p) + p \log_2(|\mathcal{X}| - 1),
\end{align*}
where the first inequality holds because conditioning does not increase entropy, the last equality uses $H(X|Y, Z = 1) = 0$ (since $Z = 1$ implies $X = Y$), and the final inequality follows from the fact that if $X \ne Y$ then the number of possible outcomes of $X$ is at most $|\mathcal{X}| - 1$. □
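A quick numerical check of Fano's inequality (Python with numpy; the joint distribution is randomly generated purely for illustration), interpreting $Y$ directly as the guess for $X$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint distribution p(x, y) over a 4-symbol alphabet (rows: x, cols: y).
joint = rng.random((4, 4))
joint /= joint.sum()

def H(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# H(X|Y) = H(X,Y) - H(Y)
h_x_given_y = H(joint.flatten()) - H(joint.sum(axis=0))

# Error probability p = Pr[X != Y] and the Fano bound H(p) + p*log2(|X|-1).
p_err = 1.0 - np.trace(joint)
bound = H(np.array([p_err, 1 - p_err])) + p_err * np.log2(4 - 1)

print(f"H(X|Y) = {h_x_given_y:.4f} <= {bound:.4f}")
```

The inequality holds for any joint distribution, so re-running with different seeds never violates the bound.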
11.3 Source coding theorems

11.3.1 Classical source coding theorem

Suppose Alice has an information source (a random variable) $X$ that generates symbol $x$ with probability $p_x$. For efficient storage or transmission, she wants to encode the information using short codewords, so that when she sends a codeword to Bob, he can decode it and recover the information. Ideally, Alice wants an encoding and decoding scheme under which the probability that Bob recovers the symbol $x$ is arbitrarily close to 1. Intuitively, about $H(X)$ bits should suffice, since $H(X)$ measures the information content of $X$. However, for any fixed $X$, it is impossible to achieve arbitrarily small error probability if the codeword is shorter than $\log_2 |\mathcal{X}|$. Nevertheless, consider the asymptotic version of the problem: Alice has $n$ independent copies of $X$, namely $X^n$, which generates the string $x^n = x_1 \ldots x_n$ with probability $\prod_{i=1}^n p_{x_i}$, and she wants to assign a codeword to each $x^n$ so that Bob can decode the codeword and recover $x^n$ with probability tending to 1 as $n \to \infty$. It turns out that this is achievable at a rate (defined as the length of the codeword divided by $n$) arbitrarily close to $H(X)$; moreover, rate $H(X)$ is also necessary.

Theorem 11.6 (Shannon's source coding theorem). Let $X^n$ be $n$ i.i.d. copies of $X$. Then the following holds:
1. If $R > H(X)$, there exist $C_n : \mathcal{X}^n \to \{0,1\}^{nR}$ and $D_n : \{0,1\}^{nR} \to \mathcal{X}^n$ such that
$$\lim_{n \to \infty} \Pr_{x^n \sim X^n}[D_n(C_n(x^n)) = x^n] = 1.$$
2. If $R < H(X)$, then for any $C_n : \mathcal{X}^n \to \{0,1\}^{nR}$ and $D_n : \{0,1\}^{nR} \to \mathcal{X}^n$,
$$\lim_{n \to \infty} \Pr_{x^n \sim X^n}[D_n(C_n(x^n)) = x^n] < 1.$$

To prove the theorem, we need the following result about typical sequences.

Theorem 11.7 (Typical sequence theorem). Fix $\varepsilon > 0$ and $\delta > 0$, and let
$$T_\varepsilon^n = \{x^n \in \mathcal{X}^n : 2^{-n(H(X)+\varepsilon)} \le p(x^n) \le 2^{-n(H(X)-\varepsilon)}\}.$$
Then for $n$ sufficiently large, the following holds:
$$\Pr[X^n \in T_\varepsilon^n] \ge 1 - \delta, \tag{1}$$
$$(1 - \delta)\, 2^{n(H(X)-\varepsilon)} \le |T_\varepsilon^n| \le 2^{n(H(X)+\varepsilon)}. \tag{2}$$

Proof. For the first item, by the definition of $T_\varepsilon^n$ we have
\begin{align*}
\Pr[X^n \in T_\varepsilon^n] &= \Pr\big[2^{-n(H(X)+\varepsilon)} \le p(X^n) \le 2^{-n(H(X)-\varepsilon)}\big] \\
&= \Pr\bigg[\bigg|\frac{1}{n}\sum_{i=1}^n \log \frac{1}{p(X_i)} - H(X)\bigg| \le \varepsilon\bigg] \\
&= \Pr\bigg[\bigg|\frac{1}{n}\sum_{i=1}^n \log \frac{1}{p(X_i)} - \mathbb{E}\Big[\log \frac{1}{p(X)}\Big]\bigg| \le \varepsilon\bigg] \\
&\ge 1 - \delta
\end{align*}
for sufficiently large $n$, where the inequality follows from the weak law of large numbers and the fact that the $-\log p(X_i)$ are i.i.d.

For the second item, by (1) we have $1 - \delta \le \Pr[X^n \in T_\varepsilon^n] \le 1$, which implies
$$1 - \delta \le \sum_{x^n \in T_\varepsilon^n} p(x^n) \le 1.$$
Now, it is immediate from the definition of $T_\varepsilon^n$ that
$$1 - \delta \le |T_\varepsilon^n| \cdot 2^{-n(H(X)-\varepsilon)} \quad \text{and} \quad |T_\varepsilon^n| \cdot 2^{-n(H(X)+\varepsilon)} \le 1,$$
as desired. □
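The typical set is easy to explore numerically. Since the probability of a Bernoulli($q$) sequence depends only on its number of ones, the sketch below (Python; the parameter values are arbitrary illustrations) computes $\Pr[X^n \in T_\varepsilon^n]$ and $|T_\varepsilon^n|$ exactly by grouping sequences by weight:

```python
from math import comb, log2

q, n, eps = 0.3, 200, 0.05                 # Bernoulli(q) source, block length, epsilon
H = -q * log2(q) - (1 - q) * log2(1 - q)   # H(X) for the source

prob_T, size_T = 0.0, 0
for k in range(n + 1):                     # k = number of ones in the sequence
    # every sequence with k ones has the same probability q^k (1-q)^(n-k)
    logp = k * log2(q) + (n - k) * log2(1 - q)
    if -n * (H + eps) <= logp <= -n * (H - eps):   # typicality condition
        prob_T += comb(n, k) * 2.0 ** logp
        size_T += comb(n, k)

print(f"Pr[X^n in T] = {prob_T:.4f}")
print(f"log2|T| = {log2(size_T):.1f}, window: [{n*(H-eps):.1f}, {n*(H+eps):.1f}]")
```

The printed values are consistent with (1) and (2): $\log_2 |T_\varepsilon^n| \approx nH(X)$, and increasing $n$ pushes $\Pr[X^n \in T_\varepsilon^n]$ toward 1.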
Proof of Theorem 11.6. For the first item, let $\varepsilon > 0$ be such that $R - \varepsilon > H(X)$. The encoder $C_n$ simply assigns each sequence $x^n \in T_\varepsilon^n$ a distinct $nR$-bit string, and every other sequence an arbitrary $nR$-bit string; by (2), $|T_\varepsilon^n| \le 2^{n(H(X)+\varepsilon)} < 2^{nR}$, so there are enough strings. The decoder outputs the typical sequence in $C_n^{-1}(y^n)$. If $x^n$ is typical, there is no error. So the error probability is at most the probability that $x^n \notin T_\varepsilon^n$, which can be made arbitrarily small by (1).

For the second item, let $A_n = \{x^n \in \mathcal{X}^n : D_n(C_n(x^n)) = x^n\}$. It is clear that $|A_n| \le 2^{nR}$, since $A_n \subseteq D_n(\{0,1\}^{nR})$. It suffices to show that $\Pr[X^n \in A_n]$ is small. Now take $\varepsilon > 0$ such that $R + \varepsilon < H(X)$, and write
$$\Pr[X^n \in A_n] \le \Pr[X^n \in A_n \cap T_\varepsilon^n] + \Pr[X^n \notin T_\varepsilon^n].$$
Fix $\delta > 0$ small. For $n$ sufficiently large, the second term is at most $\delta$ by (1), and the first term can be upper bounded by
$$\Pr[X^n \in A_n \cap T_\varepsilon^n] \le |A_n| \cdot 2^{-n(H(X)-\varepsilon)} \le 2^{-n(H(X)-R-\varepsilon)},$$
which tends to 0 as $n \to \infty$. It follows that the success probability does not tend to 1 as $n \to \infty$, and the proof is completed. □
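The achievability argument can be turned into a toy compressor directly (a sketch with illustrative parameters; practical compressors use arithmetic coding rather than explicit enumeration): index the typical sequences, transmit the index, and accept a decoding failure exactly when the source emits an atypical string.

```python
import itertools
from math import log2, ceil

q, n, eps = 0.2, 20, 0.1                      # Bernoulli(q) source, block length
H = -q * log2(q) - (1 - q) * log2(1 - q)      # entropy of the source, ~0.722

def logp(seq):
    """log2-probability of a binary sequence under the i.i.d. source."""
    k = sum(seq)
    return k * log2(q) + (n - k) * log2(1 - q)

# Enumerate the typical set and give each typical sequence a distinct index:
# this is exactly the encoder from the achievability proof.
typical = [s for s in itertools.product((0, 1), repeat=n)
           if -n * (H + eps) <= logp(s) <= -n * (H - eps)]
encode = {s: i for i, s in enumerate(typical)}   # C_n on typical inputs
decode = {i: s for s, i in encode.items()}       # D_n
assert decode[encode[typical[0]]] == typical[0]  # round trip on a typical input

rate = ceil(log2(len(typical))) / n              # codeword bits per source symbol
p_success = sum(2.0 ** logp(s) for s in typical) # decoding succeeds iff typical

print(f"|T| = {len(typical)}, rate = {rate:.2f}, H(X) = {H:.3f}")
print(f"Pr[correct decoding] = {p_success:.3f}")
```

At this small block length the success probability is modest (about 0.6), but it tends to 1 as $n$ grows while the rate stays near $H(X) \approx 0.72$.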
11.3.2 Quantum source coding theorem

In the quantum case, Alice is given a state $\rho^{\otimes n}$, a product of $n$ copies of $\rho$, and her goal is to encode it using as few qubits as possible while still allowing Bob to recover a state close to $\rho^{\otimes n}$ after decoding. Here closeness is measured by the fidelity. We have the following theorem:

Theorem 11.8 (Schumacher's quantum source coding theorem). Let $\rho \in \mathcal{H}$ be an i.i.d. quantum source. Then the following holds:
1. If $R > S(\rho)$, there exist $C_n : \mathcal{H}^{\otimes n} \to \mathbb{C}^{2^{nR}}$ and $D_n : \mathbb{C}^{2^{nR}} \to \mathcal{H}^{\otimes n}$ such that
$$\lim_{n \to \infty} F\big(\rho^{\otimes n}, D_n(C_n(\rho^{\otimes n}))\big) = 1.$$
2. If $R < S(\rho)$, then for any $C_n : \mathcal{H}^{\otimes n} \to \mathbb{C}^{2^{nR}}$ and $D_n : \mathbb{C}^{2^{nR}} \to \mathcal{H}^{\otimes n}$, we have
$$\lim_{n \to \infty} F\big(\rho^{\otimes n}, D_n(C_n(\rho^{\otimes n}))\big) < 1.$$

Its proof is similar to that of Theorem 11.6; the main difference is that it uses the typical subspace theorem, which is an analogue of the typical sequence theorem.
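To connect Theorem 11.8 with the typical subspace, here is a final numerical sketch (Python with numpy; the density matrix is an arbitrary example chosen for illustration). It computes $S(\rho)$ for a single-qubit source and the dimension of the typical subspace of $\rho^{\otimes n}$, whose logarithm is roughly the number of qubits Schumacher compression needs:

```python
import numpy as np
from math import comb, log2

# A qubit source: rho has eigenvalues (lam, 1 - lam); here a slightly mixed state.
rho = np.array([[0.85, 0.10],
                [0.10, 0.15]])
lam, mu = np.linalg.eigvalsh(rho)[::-1]        # eigenvalues, largest first
S = -lam * log2(lam) - mu * log2(mu)           # von Neumann entropy S(rho)

n, eps = 100, 0.05
# Eigenvalues of rho^{tensor n} are lam^k * mu^(n-k); the typical subspace is
# spanned by the eigenvectors whose eigenvalue satisfies the typicality condition.
dim_typical = sum(
    comb(n, k) for k in range(n + 1)
    if -n * (S + eps) <= k * log2(lam) + (n - k) * log2(mu) <= -n * (S - eps))

print(f"S(rho) = {S:.4f}")
print(f"n = {n}: log2(dim typical subspace) = {log2(dim_typical):.1f} "
      f"vs n*S(rho) = {n*S:.1f} (full space: {n} qubits)")
```

The typical subspace has dimension roughly $2^{nS(\rho)}$, so the $n$-qubit source compresses to about $nS(\rho)$ qubits, mirroring the classical typical-set argument.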