        k=ij[k][ip[(c+2) % 10][7 & m++]];
    }
    for (j=0;j<=9;j++)                       /* Find which appended digit will check properly. */
        if (!ij[k][ip[j][m & 7]]) break;
    *ch=j+48;                                /* Convert to ASCII. */
    return k==0;
}

CITED REFERENCES AND FURTHER READING:

McNamara, J.E. 1982, Technical Aspects of Data Communication, 2nd ed. (Bedford, MA: Digital Press). [1]
da Cruz, F. 1987, Kermit, A File Transfer Protocol (Bedford, MA: Digital Press). [2]
Morse, G. 1986, Byte, vol. 11, pp. 115–124 (September). [3]
LeVan, J. 1987, Byte, vol. 12, pp. 339–341 (November). [4]
Sarwate, D.V. 1988, Communications of the ACM, vol. 31, pp. 1008–1013. [5]
Griffiths, G., and Stones, G.C. 1987, Communications of the ACM, vol. 30, pp. 617–620. [6]
Wagner, N.R., and Putter, P.S. 1989, Communications of the ACM, vol. 32, pp. 106–110. [7]

20.4 Huffman Coding and Compression of Data

A lossless data compression algorithm takes a string of symbols (typically ASCII characters or bytes) and translates it reversibly into another string, one that is on the average of shorter length. The words "on the average" are crucial; it is obvious that no reversible algorithm can make all strings shorter: there just aren't enough short strings to be in one-to-one correspondence with longer strings. Compression algorithms are possible only when, on the input side, some strings, or some input symbols, are more common than others. These can then be encoded in fewer bits than rarer input strings or symbols, giving a net average gain.

There exist many, quite different, compression techniques, corresponding to different ways of detecting and using departures from equiprobability in input strings. In this section and the next we shall consider only variable-length codes with defined-word inputs. In these, the input is sliced into fixed units, for example ASCII characters, while the corresponding output comes in chunks of variable size. The simplest such method is Huffman coding [1], discussed in this section. Another example, arithmetic compression, is discussed in §20.5.

At the opposite extreme from defined-word, variable-length codes are schemes that divide up the input into units of variable length (words or phrases of English text, for example) and then transmit these, often with a fixed-length output code. The most widely used code of this type is the Ziv-Lempel code [2]. References [3-6] give the flavor of some other compression techniques, with references to the large literature.

The idea behind Huffman coding is simply to use shorter bit patterns for more common characters. We can make this idea quantitative by considering the concept of entropy. Suppose the input alphabet has $N_{ch}$ characters, and that these occur in the input string with respective probabilities $p_i$, $i = 1, \ldots, N_{ch}$, so that $\sum_i p_i = 1$. Then the fundamental theorem of information theory says that strings of such characters require, on the average, at least $H = -\sum_i p_i \log_2 p_i$ bits per character to encode.
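To make the tree-building step concrete, here is a minimal, self-contained sketch of Huffman code construction in C. It is not the Numerical Recipes routine itself, and the five-character alphabet and its probabilities are invented for illustration. The sketch repeatedly merges the two least-probable unused nodes into a new parent, then reads each character's bit pattern off the finished tree.

    /* Minimal sketch of Huffman code construction (illustrative only).
       The alphabet sym[] and probabilities prob[] are assumptions made
       up for this example, not data from the text. */
    #include <stdio.h>

    #define NCH 5                  /* number of input characters */
    #define MAXNODE (2*NCH - 1)    /* total nodes in the finished tree */

    int main(void)
    {
        const char sym[NCH] = { 'a', 'b', 'c', 'd', 'e' };
        double prob[MAXNODE] = { 0.35, 0.25, 0.20, 0.15, 0.05 };
        int parent[MAXNODE], bit[MAXNODE], used[MAXNODE] = {0};
        int i, j, k, n, s1, s2;

        for (n = NCH; n < MAXNODE; n++) {
            /* Find the two unused nodes of smallest probability. */
            s1 = s2 = -1;
            for (i = 0; i < n; i++) {
                if (used[i]) continue;
                if (s1 < 0 || prob[i] < prob[s1]) { s2 = s1; s1 = i; }
                else if (s2 < 0 || prob[i] < prob[s2]) s2 = i;
            }
            /* Merge them into a new internal node n. */
            prob[n] = prob[s1] + prob[s2];
            parent[s1] = parent[s2] = n;
            bit[s1] = 0;                  /* branch labels: 0 and 1 */
            bit[s2] = 1;
            used[s1] = used[s2] = 1;
        }
        parent[MAXNODE-1] = -1;           /* the root has no parent */

        /* Read each code off the tree by walking leaf-to-root,
           then print it reversed (root-to-leaf order). */
        for (i = 0; i < NCH; i++) {
            char code[MAXNODE];
            for (k = 0, j = i; parent[j] >= 0; j = parent[j])
                code[k++] = '0' + bit[j];
            printf("%c (p=%.2f): ", sym[i], prob[i]);
            while (k > 0) putchar(code[--k]);
            putchar('\n');
        }
        return 0;
    }

On these probabilities the sketch assigns 2-bit codes to the three common characters and 3-bit codes to the two rare ones, for an average of 2.20 bits per character, close to the entropy bound of about 2.12 bits given by the formula above.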