independently random sequences of these characters (a conservative, but not always realistic assumption) require, on the average, at least

\[
H = -\sum_i p_i \log_2 p_i \tag{20.4.1}
\]

bits per character. Here $H$ is the entropy of the probability distribution. Moreover, coding schemes exist which approach the bound arbitrarily closely. For the case of equiprobable characters, with all $p_i = 1/N_{ch}$, one easily sees that $H = \log_2 N_{ch}$, which is the case of no compression at all. Any other set of $p_i$'s gives a smaller entropy, allowing some useful compression.

Notice that the bound of (20.4.1) would be achieved if we could encode character $i$ with a code of length $L_i = -\log_2 p_i$ bits: Equation (20.4.1) would then be the average $\sum_i p_i L_i$. The trouble with such a scheme is that $-\log_2 p_i$ is not generally an integer. How can we encode the letter "Q" in 5.32 bits? Huffman coding makes a stab at this by, in effect, approximating all the probabilities $p_i$ by integer powers of $1/2$, so that all the $L_i$'s are integral. If all the $p_i$'s are in fact of this form, then a Huffman code does achieve the entropy bound $H$.
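These numbers are easy to check directly. The short C sketch below is illustrative only (it is not library code, and the probability $1/40$ assigned to "Q" is a made-up assumption chosen to reproduce the 5.32 bits quoted above); it evaluates the ideal code length $-\log_2 p_i$ and the entropy bound of equation (20.4.1):

    #include <stdio.h>
    #include <math.h>

    /* entropy bound of equation (20.4.1): H = -sum_i p_i log2 p_i */
    double entropy(const double p[], int n)
    {
        double h = 0.0;
        int i;
        for (i = 0; i < n; i++)
            if (p[i] > 0.0) h -= p[i] * log2(p[i]);
        return h;
    }

    int main(void)
    {
        double uniform[4] = {0.25, 0.25, 0.25, 0.25};
        /* a letter with assumed probability 1/40 "wants" ~5.32 bits */
        printf("-log2(1/40) = %.2f bits\n", -log2(1.0/40.0));
        /* equiprobable case: H = log2 Nch, here log2 4 = 2 bits */
        printf("H(uniform, 4) = %.2f bits\n", entropy(uniform, 4));
        return 0;
    }

(Compile with -lm; log2 is C99.) The second line reproduces the no-compression equiprobable case $H = \log_2 N_{ch}$.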
The construction of a Huffman code is best illustrated by example. Imagine a language, Vowellish, with the $N_{ch} = 5$ character alphabet A, E, I, O, and U, occurring with the respective probabilities 0.12, 0.42, 0.09, 0.30, and 0.07. Then the construction of a Huffman code for Vowellish is accomplished in the following table:

    Node         Stage:   1       2       3       4       5

     1   A:             0.12    0.12■
     2   E:             0.42    0.42    0.42    0.42■
     3   I:             0.09■
     4   O:             0.30    0.30    0.30■
     5   U:             0.07■
     6   UI:                    0.16■
     7   AUI:                           0.28■
     8   AUIO:                                  0.58■
     9   EAUIO:                                         1.00

Here is how it works, proceeding in sequence through $N_{ch}$ stages, represented by the columns of the table. The first stage starts with $N_{ch}$ nodes, one for each letter of the alphabet, containing their respective relative frequencies. At each stage, the two smallest probabilities are found, summed to make a new node, and then dropped from the list of active nodes. (A "block" ■ denotes the stage where a node is dropped.) All active nodes (including the new composite) are then carried over to the next stage (column). In the table, the names assigned to new nodes (e.g., AUI) are inconsequential. In the example shown, it happens that (after stage 1) the two smallest nodes are always an original node and a composite one; this need not be true in general.
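The merging procedure just described is straightforward to carry out mechanically. The following short C program is a minimal sketch of that bookkeeping (illustrative only, not the Numerical Recipes hufmak routine; the array layout and variable names are our own): at each stage it finds the two smallest active probabilities, sums them into a new composite node, drops the two merged nodes, and carries everything else forward.

    #include <stdio.h>

    #define NCH 5   /* number of characters in the Vowellish alphabet */

    int main(void)
    {
        /* 2*NCH-1 slots: NCH leaves plus at most NCH-1 composites */
        double prob[2*NCH-1] = {0.12, 0.42, 0.09, 0.30, 0.07}; /* A E I O U */
        int active[2*NCH-1], nactive = NCH, nnode = NCH, i;

        for (i = 0; i < NCH; i++) active[i] = i;  /* stage 1: all leaves */

        while (nactive > 1) {
            /* find positions s1,s2 of the two smallest active nodes */
            int s1 = 0, s2 = 1, t;
            if (prob[active[s2]] < prob[active[s1]]) { t = s1; s1 = s2; s2 = t; }
            for (i = 2; i < nactive; i++) {
                if (prob[active[i]] < prob[active[s1]]) { s2 = s1; s1 = i; }
                else if (prob[active[i]] < prob[active[s2]]) s2 = i;
            }
            /* sum them into a new composite node ... */
            prob[nnode] = prob[active[s1]] + prob[active[s2]];
            printf("node %d: %.2f + %.2f -> %.2f\n", nnode + 1,
                   prob[active[s1]], prob[active[s2]], prob[nnode]);
            /* ... then drop both merged nodes, carry the composite over */
            active[s1] = nnode++;
            active[s2] = active[--nactive];
        }
        return 0;
    }

Run on the Vowellish probabilities, this prints the four merges 0.07+0.09 -> 0.16, 0.12+0.16 -> 0.28, 0.28+0.30 -> 0.58, and 0.42+0.58 -> 1.00, i.e., exactly composite nodes 6 through 9 of the table (node numbers are printed 1-based to match).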