910 Chapter 20. Less-Numerical Algorithms

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

20.5 Arithmetic Coding

We saw in the previous section that a perfect (entropy-bounded) coding scheme would use Li = −log2 pi bits to encode character i (in the range 1 ≤ i ≤ Nch), if pi is its probability of occurrence. Huffman coding gives a way of rounding the Li's to close integer values and constructing a code with those lengths. Arithmetic coding [1], which we now discuss, actually does manage to encode characters using noninteger numbers of bits! It also provides a convenient way to output the result not as a stream of bits, but as a stream of symbols in any desired radix. This latter property is particularly useful if you want, e.g., to convert data from bytes (radix 256) to printable ASCII characters (radix 94), or to case-independent alphanumeric sequences containing only A-Z and 0-9 (radix 36).

In arithmetic coding, an input message of any length is represented as a real number R in the range 0 ≤ R < 1. The longer the message, the more precision required of R. This is best illustrated by an example, so let us return to the fictitious language, Vowellish, of the previous section. Recall that Vowellish has a 5-character alphabet (A, E, I, O, U), with occurrence probabilities 0.12, 0.42, 0.09, 0.30, and 0.07, respectively. Figure 20.5.1 shows how a message beginning "IOU" is encoded: The interval [0, 1) is divided into segments corresponding to the 5 alphabetical characters; the length of a segment is the corresponding pi. We see that the first message character, "I", narrows the range of R to 0.37 ≤ R < 0.46.
This interval is now subdivided into five subintervals, again with lengths proportional to the pi's. The second message character, "O", narrows the range of R to 0.3763 ≤ R < 0.4033. The "U" character further narrows the range to 0.37630 ≤ R < 0.37819. Any value of R in this range can be sent as encoding "IOU". In particular, the binary fraction .011000001 is in this range, so "IOU" can be sent in 9 bits. (Huffman coding took 10 bits for this example, see §20.4.)

Of course there is the problem of knowing when to stop decoding. The fraction .011000001 represents not simply "IOU," but "IOU...," where the ellipses represent an infinite string of successor characters. To resolve this ambiguity, arithmetic coding generally assumes the existence of a special (Nch+1)th character, EOM (end of message), which occurs only once at the end of the input. Since EOM has a low probability of occurrence, it gets allocated only a very tiny piece of the number line.

In the above example, we gave R as a binary fraction. We could just as well have output it in any other radix, e.g., base 94 or base 36, whatever is convenient for the anticipated storage or communication channel.

You might wonder how one deals with the seemingly incredible precision required of R for a long message. The answer is that R is never actually represented all at once. At any given stage we have upper and lower bounds for R represented as a finite number of digits in the output radix. As digits of the upper and lower bounds become identical, we can left-shift them away and bring in new digits at the low-significance end. The routines below have a parameter NWK for the number of working digits to keep around. This must be large enough to make the chance of an accidental degeneracy vanishingly small. (The routines signal if a degeneracy ever occurs.) Since the process of discarding old digits and bringing in new ones is performed identically on encoding and decoding, everything stays synchronized.