910 Chapter 20. Less-Numerical Algorithms 20.5 Arithmetic Coding We saw in the previous section that a perfect(entropy-bounded)coding scheme would use Li =-log2 pi bits to encode character i(in the range 1 i<Nch), if p;is its probability of occurrence.Huffman coding gives a way of rounding the Li's to close integer values and constructing a code with those lengths.Arithmetic coding [1],which we now discuss,actually does manage to encode characters using noninteger numbers of bits!It also provides a convenient way to output the result not as a stream of bits,but as a stream of symbols in any desired radix.This latter property is particularly useful if you want,e.g,to convert data from bytes(radix 256)to printable ASClI characters(radix 94).or to case-independent alphanumeric sequences containing only A-Z and 0-9(radix 36). 县 In arithmetic coding,an input message of any length is represented as a real number R in the range 0<R<1.The longer the message,the more precision required of R.This is best illustrated by an example,so let us return to the fictitious language,Vowellish,of the previous section.Recall that Vowellish has a 5 character alphabet (A.E.I.O.U).with occurrence probabilities 0.12.0.42.0.09.0.30.and 0.07,respectively.Figure 20.5.1 shows how a message beginning"IOU"is encoded: 9 The interval [0,1)is divided into segments corresponding to the 5 alphabetical characters;the length of a segment is the corresponding pi.We see that the first message character,"I",narrows the range of R to 0.37 <R<0.46.This interval is now subdivided into five subintervals,again with lengths proportional to the pi's.The second message character,"O",narrows the range of R to 0.3763 <R<0.4033. 至26 9 The "U"character further narrows the range to 0.37630 R<0.37819.Any value of R in this range can be sent as encoding"IOU".In particular,the binary fraction OF SCIENTIFIC .011000001 is in this range,so"IOU"can be sent in 9 bits.(Huffman coding took 6 10 bits for this example,see $20.4.) Of course there is the problem of knowing when to stop decoding.The fraction .011000001 represents not simply"IOU,"but"IOU...,"where the ellipses represent an infinite string of successor characters.To resolve this ambiguity,arithmetic coding generally assumes the existence of a special Nch+1th character,EOM (end of message),which occurs only once at the end of the input.Since EOM 3方V Numerical Recipes 10621 has a low probability of occurrence,it gets allocated only a very tiny piece of the number line. uction, 43106 In the above example,we gave R as a binary fraction.We could just as well have output it in any other radix,e.g.,base 94 or base 36,whatever is convenient (outside for the anticipated storage or communication channel. 腿 You might wonder how one deals with the seemingly incredible precision North required of R for a long message.The answer is that R is never actually represented all at once.At any give stage we have upper and lower bounds for R represented as a finite number of digits in the output radix.As digits of the upper and lower bounds become identical,we can left-shift them away and bring in new digits at the low-significance end.The routines below have a parameter NWK for the number of working digits to keep around.This must be large enough to make the chance of an accidental degeneracy vanishingly small.(The routines signal if a degeneracy ever occurs.)Since the process of discarding old digits and bringing in new ones is performed identically on encoding and decoding,everything stays synchronized
910 Chapter 20. Less-Numerical Algorithms Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). 20.5 Arithmetic Coding We saw in the previous section that a perfect (entropy-bounded) coding scheme would use Li = − log2 pi bits to encode character i (in the range 1 ≤ i ≤ Nch), if pi is its probability of occurrence. Huffman coding gives a way of rounding the Li’s to close integer values and constructing a code with those lengths. Arithmetic coding [1], which we now discuss, actually does manage to encode characters using noninteger numbers of bits! It also provides a convenient way to output the result not as a stream of bits, but as a stream of symbols in any desired radix. This latter property is particularly useful if you want, e.g., to convert data from bytes (radix 256) to printable ASCII characters (radix 94), or to case-independent alphanumeric sequences containing only A-Z and 0-9 (radix 36). In arithmetic coding, an input message of any length is represented as a real number R in the range 0 ≤ R < 1. The longer the message, the more precision required of R. This is best illustrated by an example, so let us return to the fictitious language, Vowellish, of the previous section. Recall that Vowellish has a 5 character alphabet (A, E, I, O, U), with occurrence probabilities 0.12, 0.42, 0.09, 0.30, and 0.07, respectively. Figure 20.5.1 shows how a message beginning “IOU” is encoded: The interval [0, 1) is divided into segments corresponding to the 5 alphabetical characters; the length of a segment is the corresponding pi. We see that the first message character, “I”, narrows the range of R to 0.37 ≤ R < 0.46. This interval is now subdivided into five subintervals, again with lengths proportional to the p i’s. The second message character, “O”, narrows the range of R to 0.3763 ≤ R < 0.4033. The “U” character further narrows the range to 0.37630 ≤ R < 0.37819. Any value of R in this range can be sent as encoding “IOU”. In particular, the binary fraction .011000001 is in this range, so “IOU” can be sent in 9 bits. (Huffman coding took 10 bits for this example, see §20.4.) Of course there is the problem of knowing when to stop decoding. The fraction .011000001 represents not simply “IOU,” but “IOU... ,” where the ellipses represent an infinite string of successor characters. To resolve this ambiguity, arithmetic coding generally assumes the existence of a special Nch + 1th character, EOM (end of message), which occurs only once at the end of the input. Since EOM has a low probability of occurrence, it gets allocated only a very tiny piece of the number line. In the above example, we gave R as a binary fraction. We could just as well have output it in any other radix, e.g., base 94 or base 36, whatever is convenient for the anticipated storage or communication channel. You might wonder how one deals with the seemingly incredible precision required of R for a long message. The answer is that R is never actually represented all at once. At any give stage we have upper and lower bounds for R represented as a finite number of digits in the output radix. As digits of the upper and lower bounds become identical, we can left-shift them away and bring in new digits at the low-significance end. The routines below have a parameter NWK for the number of working digits to keep around. This must be large enough to make the chance of an accidental degeneracy vanishingly small. (The routines signal if a degeneracy ever occurs.) Since the process of discarding old digits and bringing in new ones is performed identically on encoding and decoding, everything stays synchronized
20.5 Arithmetic Coding 911 1.0 0.46 0.4033 0.37819 A A A A 0.9 0.45 0.3780 0.400 0.8 0.3778 0.44 0.7 E E 0.395 E 0.3776 E 0.43 http://www.nr.com or call 1-800-872- Permission is readable files 0.6 0.3774 0.42 0.5 0.390 0.3772 (including this one) granted fori 0.41 0.4 0.3770 internet 0.40 0.385 0.3 0.3768 from NUMERICAL RECIPES IN C: 0.39 0 0 0 0.2 0.3766 0.380 州pWer:s to -7423(North America t users to make one paper 0.1 0.38 0.3764 1988-1992 by Cambridge University Press.Programs U U U 0.0 0.37 0.3763 0.37630 send Figure 20.5.1. Arithmetic coding of the message "IOU."in the fictitious language Vowellish. Successive characters give successively finer subdivisions of the initial interval between 0 and 1.The final value can be output as the digits of a fraction in any desired radix.Note how the subinterval allocated to dir Copyright (C) to a character is proportional to its probability of occurrence. The routine arcmak constructs the cumulative frequency distribution table used ectcustser to partition the interval at each stage.In the principal routine arcode.when an THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521 interval of size jdif is to be partitioned in the proportions of some n to some ntot, say,then we must compute (n*jdif)/ntot.With integer arithmetic,the numerator v@cam is likely to overflow,and,unfortunately,an expression like jdif/(ntot/n)is not equivalent.In the implementation below,we resort to double precision floating .Further reproduction, 1988-1992 by Numerical Recipes arithmetic for this calculation.Not only is this inefficient,but different roundoff -43108-5 errors can(albeit very rarely)make different machines encode differently,though any one type of machine will decode exactly what it encoded,since identical roundoff (outside errors occur in the two processes.For serious use,one needs to replace this floating North Software. calculation with an integer computation in a double register(not available to the C programmer). The internally set variable minint,which is the minimum allowed number visit website of discrete steps between the upper and lower bounds,determines when new low- machine significance digits are added.minint must be large enough to provide resolution of all the input characters.That is,we must have pi x minint 1 for all i.A value of 100Nch,or 1.1/minpi,whichever is larger,is generally adequate.However,for safety,the routine below takes minint to be as large as possible,with the product minint*nradd just smaller than overflow.This results in some time inefficiency, and in a few unnecessary characters being output at the end of a message.You can
20.5 Arithmetic Coding 911 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 A E I O U A E I O U A E I O U A E I O U 0.46 0.42 0.41 0.37 0.385 0.380 0.4033 0.3763 0.37819 0.37630 0.3780 0.3764 0.44 0.43 0.390 0.395 0.400 0.3766 0.3768 0.3772 0.3774 0.3778 0.3776 0.45 0.40 0.39 0.38 0.3770 Figure 20.5.1. Arithmetic coding of the message “IOU...” in the fictitious language Vowellish. Successive characters give successively finer subdivisions of the initial interval between 0 and 1. The final value can be output as the digits of a fraction in any desired radix. Note how the subinterval allocated to a character is proportional to its probability of occurrence. The routine arcmak constructs the cumulative frequency distribution table used to partition the interval at each stage. In the principal routine arcode, when an interval of size jdif is to be partitioned in the proportions of some n to some ntot, say, then we must compute (n*jdif)/ntot. With integer arithmetic, the numerator is likely to overflow; and, unfortunately, an expression like jdif/(ntot/n) is not equivalent. In the implementation below, we resort to double precision floating arithmetic for this calculation. Not only is this inefficient, but different roundoff errors can (albeit very rarely) make different machines encode differently, though any one type of machine will decode exactly what it encoded, since identical roundoff errors occur in the two processes. For serious use, one needs to replace this floating calculation with an integer computation in a double register (not available to the C programmer). The internally set variable minint, which is the minimum allowed number of discrete steps between the upper and lower bounds, determines when new lowsignificance digits are added. minint must be large enough to provide resolution of all the input characters. That is, we must have pi × minint > 1 for all i. A value of 100Nch, or 1.1/ min pi, whichever is larger, is generally adequate. However, for safety, the routine below takes minint to be as large as possible, with the product minint*nradd just smaller than overflow. This results in some time inefficiency, and in a few unnecessary characters being output at the end of a message. You can
912 Chapter 20.Less-Numerical Algorithms decrease minint if you want to live closer to the edge. A final safety feature in arcmak is its refusal to believe zero values in the table nfreq;a 0 is treated as if it were a 1.If this were not done,the occurrence in a message of a single character whose nfreq entry is zero would result in scrambling the entire rest of the message.If you want to live dangerously,with a very slightly more efficient coding,you can delete the IMAX(,1)operation. #include "nrutil.h" #include ANSI header file containing integer ranges. #define MC 512 #ifdef ULONG_MAX Maximum value of unsigned long. #define MAXINT (ULONG_MAX >1) #else 19881992 #define MAXINT 2147483647 #endif Here MC is the largest anticipated value of nchh;MAXINT is a large positive integer that does 1800 not overflow. FO州fro typedef struct unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad; arithcode; (Nort server void arcmak(unsigned long nfreq[],unsigned long nchh,unsigned long nradd, arithcode *acode) America computer, THE Given a table nfreq[1..nchh]of the frequency of occurrence of nchh symbols,and given ART a desired output radix nradd,initialize the cumulative frequency table and other variables for arithmetic compression in the structure acode. Program unsigned long j; if (nchh MC)nrerror("input radix may not exceed Mc in arcmak.") if (nradd 256)nrerror("output radix may not exceed 256 in arcmak."); acode->minint=MAXINT/nradd; acode->nch=nchh; acode->nrad=nradd; 188819920 OF SCIENTIFIC COMPUTING(ISBN acode->ncumfq[1]=0; for (j=2;jnch+1;j++) acode->ncumfq[j]=acode->ncumfq[j-1]+IMAX(nfreq[j-1],1); acode->ncum=acode->ncumfq[acode->nch+2]=acode->ncumfq[acode->nch+1]+1; Numerical Recipes 10-621 -43108 The structure acode must be defined and allocated in your main program with statements like this: (outside North Software. #include "nrutil.h" #define MC 512 Maximum anticipated value of nchh in arcmak #define NWK 20 Keep this value the same as in arcode,below. typedef struct unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad; arithcode; 4 arithcode acode; acode.ilob=(unsigned long *)lvector(1,NWK); Allocate space within acode. acode.iupb=(unsigned long *)lvector(1,NWK); acode.ncumfq=(unsigned long *)Ivector(1,MC+2);
912 Chapter 20. Less-Numerical Algorithms Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). decrease minint if you want to live closer to the edge. A final safety feature in arcmak is its refusal to believe zero values in the table nfreq; a 0 is treated as if it were a 1. If this were not done, the occurrence in a message of a single character whose nfreq entry is zero would result in scrambling the entire rest of the message. If you want to live dangerously, with a very slightly more efficient coding, you can delete the IMAX( ,1) operation. #include "nrutil.h" #include ANSI header file containing integer ranges. #define MC 512 #ifdef ULONG_MAX Maximum value of unsigned long. #define MAXINT (ULONG_MAX >> 1) #else #define MAXINT 2147483647 #endif Here MC is the largest anticipated value of nchh; MAXINT is a large positive integer that does not overflow. typedef struct { unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad; } arithcode; void arcmak(unsigned long nfreq[], unsigned long nchh, unsigned long nradd, arithcode *acode) Given a table nfreq[1..nchh] of the frequency of occurrence of nchh symbols, and given a desired output radix nradd, initialize the cumulative frequency table and other variables for arithmetic compression in the structure acode. { unsigned long j; if (nchh > MC) nrerror("input radix may not exceed MC in arcmak."); if (nradd > 256) nrerror("output radix may not exceed 256 in arcmak."); acode->minint=MAXINT/nradd; acode->nch=nchh; acode->nrad=nradd; acode->ncumfq[1]=0; for (j=2;jnch+1;j++) acode->ncumfq[j]=acode->ncumfq[j-1]+IMAX(nfreq[j-1],1); acode->ncum=acode->ncumfq[acode->nch+2]=acode->ncumfq[acode->nch+1]+1; } The structure acode must be defined and allocated in your main program with statements like this: #include "nrutil.h" #define MC 512 Maximum anticipated value of nchh in arcmak. #define NWK 20 Keep this value the same as in arcode, b elow. typedef struct { unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad; } arithcode; ... arithcode acode; ... acode.ilob=(unsigned long *)lvector(1,NWK); Allocate space within acode. acode.iupb=(unsigned long *)lvector(1,NWK); acode.ncumfq=(unsigned long *)lvector(1,MC+2);
20.5 Arithmetic Coding 913 Individual characters in a message are coded or decoded by the routine arcode, which in turn uses the utility arcsum #include #include #define NWK 20 #define JTRY(j,k,m)((long)((((double)(k))*((double)(j)))/((double)(m)))) This macro is used to calculate (k*j)/m without overflow.Program efficiency can be improved by substituting an assembly language routine that does integer multiply to a double register. typedef struct unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad; arithcode; 83g void arcode(unsigned long *ich,unsigned char **codep,unsigned long *lcode, granted for (including unsigned long *lcd,int isign,arithcode *acode) Compress (isign=1)or decompress (isign =-1)the single character ich into or out of 11.800 the character array *codep[1..lcode],starting with byte *codep [lcd]and(if necessary) incrementing lcd so that,on return,lcd points to the first unused byte in *codep.Note that the structure acode contains both information on the code,and also state information on from NUMERICAL RECIPES I 18881892 the particular output being written into the array *codep.An initializing call with isign=0 is required before beginning any *codep array,whether for encoding or decoding.This is in addition to the initializing call to arcmak that is required to initialize the code itself.A call with ich=nch (as set in arcmak)has the reserved meaning "end of message." 9 void arcsum(unsigned long iin[],unsigned long iout ]unsigned long ja, int nwk,unsigned long nrad,unsigned long nc); ART void nrerror(char error_text[]); int j,ki Program unsigned long ihi,ja,jh,jl,m; if (!isign){ Initialize enough digits of the upper and lower bounds. acode->idifsacode->nrad-1; for (j=NWK;j>=1;j--){ acode->iupb[j]=acode->nrad-1; acode->ilob[j]=0; acode->nc=j; 188 if (acode->jdif acode->minint)return; Initialization complete OF SCIENTIFIC COMPUTING(ISBN acode->jdif=(acode->jdif+1)*acode->nrad-1; 1920 nrerror("NWK too small in arcode."); else 0621 1f(1s1gm>0){ If encoding,check for valid input character. if (*ich acode->nch)nrerror("bad ich in arcode."); Numerical Recipes -43108 else If decoding.locate the character ich by bisection. ja=(+codep)[*lcd]-acode->ilob[acode->nc]; for (j=acode->nc+1;jnrad; Software. ja +=((*codep)[*lcd+j-acode->nc]-acode->ilob[j]); ihi=acode->nch+1; Amer *ich=0; while (ihi-(*ich)>1){ m=(*ich+ih1)>>1; if (ja >JTRY(acode->jdif,acode->ncumfq[m+1],acode->ncum)) *ich=m: else ihi=m; if (*ich =acode->nch)return Detected end of message Following code is common for encoding and decoding.Convert character ich to a new subrange [ilob,iupb)
20.5 Arithmetic Coding 913 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Individual characters in a message are coded or decoded by the routine arcode, which in turn uses the utility arcsum. #include #include #define NWK 20 #define JTRY(j,k,m) ((long)((((double)(k))*((double)(j)))/((double)(m)))) This macro is used to calculate (k*j)/m without overflow. Program efficiency can be improved by substituting an assembly language routine that does integer multiply to a double register. typedef struct { unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad; } arithcode; void arcode(unsigned long *ich, unsigned char **codep, unsigned long *lcode, unsigned long *lcd, int isign, arithcode *acode) Compress (isign = 1) or decompress (isign = −1) the single character ich into or out of the character array *codep[1..lcode], starting with byte *codep[lcd] and (if necessary) incrementing lcd so that, on return, lcd points to the first unused byte in *codep. Note that the structure acode contains both information on the code, and also state information on the particular output being written into the array *codep. An initializing call with isign=0 is required before beginning any *codep array, whether for encoding or decoding. This is in addition to the initializing call to arcmak that is required to initialize the code itself. A call with ich=nch (as set in arcmak) has the reserved meaning “end of message.” { void arcsum(unsigned long iin[], unsigned long iout[], unsigned long ja, int nwk, unsigned long nrad, unsigned long nc); void nrerror(char error_text[]); int j,k; unsigned long ihi,ja,jh,jl,m; if (!isign) { Initialize enough digits of the upper and lower bounds. acode->jdif=acode->nrad-1; for (j=NWK;j>=1;j--) { acode->iupb[j]=acode->nrad-1; acode->ilob[j]=0; acode->nc=j; if (acode->jdif > acode->minint) return; Initialization complete. acode->jdif=(acode->jdif+1)*acode->nrad-1; } nrerror("NWK too small in arcode."); } else { if (isign > 0) { If encoding, check for valid input character. if (*ich > acode->nch) nrerror("bad ich in arcode."); } else { If decoding, locate the character ich by bisection. ja=(*codep)[*lcd]-acode->ilob[acode->nc]; for (j=acode->nc+1;jnrad; ja += ((*codep)[*lcd+j-acode->nc]-acode->ilob[j]); } ihi=acode->nch+1; *ich=0; while (ihi-(*ich) > 1) { m=(*ich+ihi)>>1; if (ja >= JTRY(acode->jdif,acode->ncumfq[m+1],acode->ncum)) *ich=m; else ihi=m; } if (*ich == acode->nch) return; Detected end of message. } Following code is common for encoding and decoding. Convert character ich to a new subrange [ilob,iupb)
914 Chapter 20.Less-Numerical Algorithms jh=JTRY(acode->jdif,acode->ncumfq[*ich+2],acode->ncum); jl=JTRY(acode->jdif,acode->ncumfq[*ich+1],acode->ncum); acode->idif=jh-il; arcsum(acode->ilob,acode->iupb,jh,NWK,acode->nrad,acode->nc); arcsum(acode->ilob,acode->ilob,jl,NWK,acode->nrad,acode->nc); How many leading digits to output (if encoding)or skip over? for (j=acode->nc;jnch acode->iupb[j]!acode->ilob[j])break; if (*lcd *lcode){ fprintf(stderr,"Reached the end of the 'code'array.\n"); fprintf(stderr,"Attempting to expand its size.\n"); *lcode +=*lcode/2; if ((*codep=(unsigned char *)realloc(*codep, (unsigned)(*lcode*sizeof(unsigned char))))==NULL){ nrerror("Size expansion failed"); 18881892 if (isign 0)(*codep)[*lcd]=(unsigned char)acode->ilob[j]; 100 ++(*1cd); if (j>NWK) return; Ran out of message.Did someone forget to encode a from NUMERICAL RECIPESI acode->nc=j; terminating ncd? for(j=0;acode->jdifminint;j++) How many digits to shift? acode->jdif *=acode->nrad; if (acode->nc-j 1)nrerror("NWK too small in arcode."); if (j){ Shift them. for (k=acode->nc;kiupb [k-j]=acode->iupb[k]; ART acode->ilob[k-j]=acode->ilob[k]; Programs acode->nc -j; strictly proh for (k=NWK-j+1;kiupb[k]=acode->ilob [k]=0; return; Normal return. 2 to dir rectcustser OF SCIENTIFIC COMPUTING(ISBN void arcsum(unsigned long iin[],unsigned long iout []unsigned long ja, int nwk,unsigned long nrad,unsigned long nc) Used by arcode.Add the integer ja to the radix nrad multiple-precision integer iin [nc..nwk]. v@cam Return the result in iout [nc..nwk]. 10-621 int j,karry=0; 1988-1992 by Numerical Recipes 43108 unsigned long jtmp; for (j=nwk;j>nc;j--){ jtmp=ja; ja /nrad; Software. iout [j]=iin[j]+(jtmp-ja*nrad)+karry; if (iout[i]>=nrad){ ying of iout[j]-=nrad; (outside North America) karry=1; else karry=0; iout [nc]=iin[nc]+ja+karry; If radix-changing,rather than compression,is your primary aim (for example to convert an arbitrary file into printable characters)then you are of course free to set all the components of nfreq equal,say,to 1
914 Chapter 20. Less-Numerical Algorithms Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). jh=JTRY(acode->jdif,acode->ncumfq[*ich+2],acode->ncum); jl=JTRY(acode->jdif,acode->ncumfq[*ich+1],acode->ncum); acode->jdif=jh-jl; arcsum(acode->ilob,acode->iupb,jh,NWK,acode->nrad,acode->nc); arcsum(acode->ilob,acode->ilob,jl,NWK,acode->nrad,acode->nc); How many leading digits to output (if encoding) or skip over? for (j=acode->nc;jnch && acode->iupb[j] != acode->ilob[j]) break; if (*lcd > *lcode) { fprintf(stderr,"Reached the end of the ’code’ array.\n"); fprintf(stderr,"Attempting to expand its size.\n"); *lcode += *lcode/2; if ((*codep=(unsigned char *)realloc(*codep, (unsigned)(*lcode*sizeof(unsigned char)))) == NULL) { nrerror("Size expansion failed"); } } if (isign > 0) (*codep)[*lcd]=(unsigned char)acode->ilob[j]; ++(*lcd); } if (j > NWK) return; Ran out of message. Did someone forget to encode a acode->nc=j; terminating ncd? for(j=0;acode->jdifminint;j++) How many digits to shift? acode->jdif *= acode->nrad; if (acode->nc-j nc;kiupb[k-j]=acode->iupb[k]; acode->ilob[k-j]=acode->ilob[k]; } } acode->nc -= j; for (k=NWK-j+1;kiupb[k]=acode->ilob[k]=0; } return; Normal return. } void arcsum(unsigned long iin[], unsigned long iout[], unsigned long ja, int nwk, unsigned long nrad, unsigned long nc) Used by arcode. Add the integer ja to the radix nrad multiple-precision integer iin[nc..nwk]. Return the result in iout[nc..nwk]. { int j,karry=0; unsigned long jtmp; for (j=nwk;j>nc;j--) { jtmp=ja; ja /= nrad; iout[j]=iin[j]+(jtmp-ja*nrad)+karry; if (iout[j] >= nrad) { iout[j] -= nrad; karry=1; } else karry=0; } iout[nc]=iin[nc]+ja+karry; } If radix-changing, rather than compression, is your primary aim (for example to convert an arbitrary file into printable characters) then you are of course free to set all the components of nfreq equal, say, to 1
20.6 Arithmetic at Arbitrary Precision 915 CITED REFERENCES AND FURTHER READING: Bell,T.C.,Cleary,J.G.,and Witten,I.H.1990,Text Compression (Englewood Cliffs,NJ:Prentice- Hall). Nelson,M.1991.The Data Compression Book(Redwood City.CA:M&T Books). Witten,I.H.,Neal,R.M.,and Cleary,J.G.1987,Communications of the ACM,vol.30,pp.520- 540.[1] 20.6 Arithmetic at Arbitrary Precision 83g nted for Let's compute the number a to a couple of thousand decimal places.In doing so,we'll learn some things about multiple precision arithmetic on computers and Cam meet quite an unusual application of the fast Fourier transform(FFT).We'll also develop a set of routines that you can use for other calculations at any desired level of arithmetic precision. RECIPESI To start with,we need an analytic algorithm for m.Useful algorithms are quadratically convergent,i.e.,they double the number of significant digits at each iteration.Quadratically convergent algorithms for are based on the AGM (arithmetic geometric mean)method,which also finds application to the calculation of elliptic integrals(cf.$6.11)and in advanced implementations of the ADI method for elliptic partial differential equations($19.5).Borwein and Borwein [1]treat this subject,which is beyond our scope here.One of their algorithms for starts with 、三.色g” the initializations IENTIFIC X0=V2 61 T0=2+V2 (20.6.1) %=迈 (ISBN and then,fori=0,1,...,repeats the iteration Numerica 10.621 (+左) Recipes 43108 Xi+1= Xi+1+1 (outside Ti+1= Y+1/ (20.6.2) North Software. YivXi+1+- Y+1= Xi+1 Y+1 The value m emerges as the limit Too. Now,to the question of how to do arithmetic to arbitrary precision:In a high-level language like C,a natural choice is to work in radix(base)256,so that character arrays can be directly interpreted as strings of digits.At the very end of our calculation,we will want to convert our answer to radix 10,but that is essentially a frill for the benefit of human ears,accustomed to the familiar chant,"three point
20.6 Arithmetic at Arbitrary Precision 915 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). CITED REFERENCES AND FURTHER READING: Bell, T.C., Cleary, J.G., and Witten, I.H. 1990, Text Compression (Englewood Cliffs, NJ: PrenticeHall). Nelson, M. 1991, The Data Compression Book (Redwood City, CA: M&T Books). Witten, I.H., Neal, R.M., and Cleary, J.G. 1987, Communications of the ACM, vol. 30, pp. 520– 540. [1] 20.6 Arithmetic at Arbitrary Precision Let’s compute the number π to a couple of thousand decimal places. In doing so, we’ll learn some things about multiple precision arithmetic on computers and meet quite an unusual application of the fast Fourier transform (FFT). We’ll also develop a set of routines that you can use for other calculations at any desired level of arithmetic precision. To start with, we need an analytic algorithm for π. Useful algorithms are quadratically convergent, i.e., they double the number of significant digits at each iteration. Quadratically convergent algorithms for π are based on the AGM (arithmetic geometric mean) method, which also finds application to the calculation of elliptic integrals (cf. §6.11) and in advanced implementations of the ADI method for elliptic partial differential equations (§19.5). Borwein and Borwein [1] treat this subject, which is beyond our scope here. One of their algorithms for π starts with the initializations X0 = √ 2 π0 =2+ √ 2 Y0 = √4 2 (20.6.1) and then, for i = 0, 1,... , repeats the iteration Xi+1 = 1 2 Xi + 1 √Xi πi+1 = πi Xi+1 + 1 Yi + 1 Yi+1 = Yi Xi+1 + 1 Xi+1 Yi + 1 (20.6.2) The value π emerges as the limit π∞. Now, to the question of how to do arithmetic to arbitrary precision: In a high-level language like C, a natural choice is to work in radix (base) 256, so that character arrays can be directly interpreted as strings of digits. At the very end of our calculation, we will want to convert our answer to radix 10, but that is essentially a frill for the benefit of human ears, accustomed to the familiar chant, “three point