834 IEEE Transactions on Consumer Electronics,Vol.48,No.4,NOVEMBER 2002 A HIGH-SPEED,PROGRAMMABLE,CSD COEFFICIENT FIR FILTER Zhangwen Tang,Jie Zhang and Hao Min ASIC System State-Key Laboratory,Fudan University,Shanghai 200433,P.R.China Xin回回回……。 Abstrac-A new high-speed,programmable FIR filter is presented,which is a multiplierless filter with CSD encoding coefficients.In this paper,we propose a new programmable X-0X-0)X-i--h0 CSD encoding structure to make CSD coefficients programmable.Compared with the conventional FIR structure with Booth multipliers,this coding structure improves the speed of filter and decreases the area.In the end of this paper,we design a 10-blts,18-taps video luminance filter with the presented filter structure.The completed filter core occupies 6.86.8 mm2 of silicon area in 0.6um 2P2M Yout CMOS technology,and its maximum work frequency is 100MHz. Fig.1.Structure of traditional FIR filter Index terms-Finite Impulse Response filter,Application cannot accommodate such high sample rates without an Specific Integrated Circuit,Canonic Signed-digit Encode, excessive amount of parallel processing.And for dedicated Booth Multiplier.Wallace Adder Tree. applications,the flexibility of a filter with high-speed multipliers [2]is not necessary. 1.INTRODUCTION In this paper,we present a new high-speed,CSD coefficient FIR filter structure.Through studying CSD coefficient F inite Impulse response (FIR)filters have been used in filters,Booth multipliers and high-speed adders,we consumer electronics products more and more widely, propose a new programmable CSD encoding structure to such as video and communication circuits.Hence,higher make CSD coefficients programmable.With this structure, performance in speed and area is demanded.The traditional we can implement any order of high-speed FIR filters,and FIR filter structure [1],as shown in Fig.I,has not yet met the critical path is almost not proportional to the tap the high-speed demand of high performance systems number.In the following,we will respectively address CSD because of the limit of multiplier and adder circuits in speed encoding in FIR filters in the second part,programmable and area. CSD encoding structure in the third part and the structure of partial-product adder tree in the fourth part.In the end, The transform function of FIR filter is described by we adopt the presented structure to implement a 10-bits,18- taps video luminance filter. y(n)= h(k)x(n-k) (1) k=0 II.CSD ENCODING IN FIR FILTER The critical path of the FIR filter in Fig.I is TM+MTA,TM In many applications,coefficients are fixed in FIR filter. is the delay of one multiplication,TA is the delay of one Thus we can simply shift the data bus to the left or right by adder,and M is the tap number.It is evident that the critical an appropriate number of bits and employ a small number path is rapidly increasing with the tap number of FIR filter. of adders/subtracters instead of multipliers.The resulting hardware complexity is a small fraction of the complexity High-speed digital filtering applications (such as,sample of a general filter with multipliers and thus a significantly rates in excess of 20MHz)generally require the use of larger number of taps can be integrated into a single chip. custom application specific integrated circuits (ASICs), because programmable signal processors (such as DSPs) As we all know,any fraction can be described by [3]. This work is supported by AM (Applied Material)Funds (NO.0108)of X= -P shanghai,in P.R.China.Zhangwen Tang,Jie Zhang and Hao Min are with (2) ASIC Systern State-key Laboratory,Fudan University,220 Handan Road,Shanghai 200433,P.R.China. Contributed Paper Original manuscript received March 25,2001 Revised manuscript received May 16,2002 00983063/00$10.002002IEEE
Z.Tang et al.:A High-Speed,Programmable,CSD Coefficient FIR Filter 835 Table 1.CSD Encoding Operation CSD Definition Operation "e1 Encoding 00000 20 Shift 0 bit 00001-01110 2124 Shift 1 to 14 bits 0 01111 0 Zero 10000 Shift 0 bit and Negate 10001-11110 2*124 Shift】tol4bits and Negate 11111 0 Zero og0s01is立zd由o南@4a帖05 The input of CSD encoding structure is 5-bits'signed Freqisncy binary number.MSB is a signed bit,which represents 0¥)=-72" 切02)-+7" negating operation.Four low bits represent the number of 0=00·-24 阿=00=- )=0》=+2.2* 闭=0)24,24- shifting right.If four low bits are all one"1111",the output 0=物x2*+z*-2 is all zero.In the end of this structure,three outputs (partial 9=0功=-2-2 products)of one tap are added together. Fig.2.Frequency response and CSD coefficients for 10- Employing the above programmabie CSD encoding bits,18-taps luminance filter structure,the partial-product number and the internal data length both decrease.As we all know,NbitsxNbits Booth multiplier has [N/2]partial products [5][6]and the interal data length is 2N bits.However,CSD encoding structure has only three partial products and the internal data length is a little larger than N(usually,N+2 or N+4)to guarantee the truncation error less than quantization noise.Thus the Fig.3.Structure of one tap CSD encoding programmable CSD encoding structure is more advantageous in the complexity and compatibility than where sk∈{l,0,l}and p∈{0,l,,M}.The representation filter structure with Booth multipliers given by (2)has M+I total (ternary)digits and L nonzero digits.Canonic signed-digit(CSD)representation is defined as the minimal representation for which has no two nonzero IV.PARTIAL PRODUCT ADDER TREE digits s being adjacent.Thus the number of adders/subtracters required to realize a CSD coefficient is A.Wallace Adder Tree in Booth Multiplier one less than the number of nonzero digits in the fraction. For any coefficient in FIR filters can been translated into CSD coefficient [4],we develop a MATLAB program to Multiplier is a fundamental unit in digital signal processing generate and optimize the CSD coefficients of general FIR circuits,and most of multipliers (Fig.4)in [5][6]employ filter.The CSD coefficients and frequency response modified Booth algorithm and parallel Wallace adder tree diagram for a 10-bits,18-taps luminance filter are shown as It consists of three modules:Booth encoder,partial- Fig.2. product adder array and final adder.By employing modified Booth algorithm,the number of partial products decreases to half.If the bit number of multiplier is N,then III.PROGRAMMABLE CSD ENCODING STRUCTURE there are [N/2]partial products.The partial product adder array adds the whole [N/2]partial products and generates The complexity of FIR filter can decrease rapidly with CSD 2N-bits Carry and Sum.In the interest of improving the parallelity,it adopts 4:2 compression adder instead of 3:2 coefficient multipliers instead of fixed coefficient full adder in partial product adder array.In the end,the multipliers,but the compatibility decreases too.In this final adder adds the 2N-bits C (Carry)and S (Sum)and paper we employ a new programmable CSD encoding generates the multiplier product. structure to decrease the complexity and increase the compatibility. B.Partial Product Adder Tree in FIR Filters Fig.3 is one-tap CSD encoding structure.Its CSD coefficients are generated by MATLAB program.There are In the FIR filter design,we propose a special adder tree for no more than three nonzero digits in one CSD coefficient. FIR filters through studying the above Wallace adder tree The shift operation according to the position of nonzero used in Booth multiplier.The partial-product adder tree digit is shown as Table 1. used in FIR filters is shown as Fig.5.There are three
836 IEEE Transactions on Consumer Electronics,Vol.48,No.4,NOVEMBER 2002 Multiplier(N bits) Multiplicand(N bits) BOOTH Encoder Aa Booth 7 Full Adder Block Partlal products generator cin Cout Partial Product Adder Array Block C.A.C 4:2 Compression Adder Final Adder Carry Select Adder &CLA Wallace Adder Tree Product(2N bits) Fig.4.Wallace adder tree in Booth multiplier 立 Carry Addor 4:2 Compression Adder 3525150 Partial Product Adder Array 儿甍幕 Fig.5.Adder array in FIR filter Fig.6.Carry selected adder with carry look-ahead adder partial products in one-tap CSD encoding structure,thus M- taps FIR filter has 3M partial products.In order to add all these partial products,we need [log23M]level partial- V.IMPLEMENTATION OF PROGRAMMABLE CSD FIR FILTER product adders (4:2 compression adder)to generate two partial products C(Carry)and S(Sum). A programmable CSD coefficient filter (Fig.7)is implemented with the above modules.The top is programmable CSD encoding structure,the middle is a 4:2 C.Final Adder compression adder tree,and the bottom is a carry-selected adder with carry look-ahead adders.To eliminate the DC The critical path of partial product adder array in FIR filter gain,the filtered signal is output to a DC gain cancelled is only the delay of [2xlog23M]serial full adders.However, module. final adder is a (N+2 or N+4)bits full adder(2 or 4 is the guarantee bits to decrease the truncation error),its delay is The presented filter is more advantageous in the speed and greater than the one in adder array.Thus we need to adopt area than the conventional filter with Booth multipliers. carry-selected adder (CSA)with carry look-ahead adder Performance comparison between two structures is shown (CLA).Carry look-ahead adder is shown as Fig.6,which in Table 2.Tea is the delay of one 4:2 compression adder. employs two 4-bits CLA to buildup one 8-bits CSA.It can Tn is the delay of one final adder.From Table 2,it can be be concluded that the critical path is one 8-bits CLA and concluded that the critical path of the proposed structure is some multiplexers,thus the adder of this structure can work almost not proportional to the tap number M. at high speed
Z.Tang et al.:A High-Speed,Programmable,CSD Coefficient FIR Filter 837 VIIL.REFERENCE [1]Zhou Yaohua et al.,Digital Signal Processing (Fudan University Press,Shanghai,1996),pp.65. [2]Jun Rim Choi,Lak Hyun Jang,et al.,"Structured Design of a 288-TAP FIR Filter by Optimized Partial Product Tree Compression",IEEE J.Solid-State 4:2 Circuit vol 32,pp.468-476,March,1997. [3]A.Avizienis,"Signed Digit Number Representation for I2C Slave Fast Parallel Arithmetic,"IRE Trans. Electron SDA Controller Computers,vol.EC-10,pp.389-400,1961. Partial [4]H.Samueli,"An improved search algorithm for the Product design of multiplierless FIR filters with powers-of-two Adder Array coefficients",IEEE Trans.Circuits Syst.,vol.36,pp. 1044-1047,July1989. Carry Lookahead [5]M.Nagamatsu et al.,"A 15ns 32x32-bit CMOS Fin选】der ar-l)Yout" Multiplier with an Improved Parallel structure,"/EEE J.Solid-State Circuit vol 25,pp.494-497,April,1990. DC gain Bodule [6]Junji Mori,et al,"A 10ns 54x54-b parallel structured Yout Full Array Multiplier with 0.5-um COMS Fig.7.Programmable,CSD coefficient FIR filter Technology,"IEEE J.Solid-State Ciruit vol 26.pp. 600-606.April1,1991. Table 2.Performance comparison Performance Proposed Conventional comparison structure structure Zhangwen Tang received the B.S. Partial product in 3 [N/2] one tap and M.S.degree in Electrical Internal word N+2(4) 2×N Engineering from Fudan University, length Shanghai,P.R.China in 1999 and Critical path [2×log23l [2×log2TN/2] of adder tree XT ×Ta 2001 respectively,and now is Critical path of [2×Iog23M [2x1og2[N/2]] pursuing the Ph.D.from Fudan filter ×Ta+Ta xTo+M*T University.From 1999 he was with In digital video encoder (DVE)system,we design a 10-bits, the ASIC and System State-Key 18-taps high-speed luminance filter using the proposed FIR Laboratory,where he worked on the design of DVE filter structure.Its frequency response diagram is shown as (Digital Video Encoder).His research interests include Fig.2.To download the coefficients into FIR filters,we design a slave-mode I'C bus controller.The whole chip is digital video signal processing,mixed signal VLSI deign implemented in 0.6-um 2P2M CMOS technology.The and CMOS TV tuner. filter core area is 6.8mmx6.8mm,the maximum work frequency is 100MHz. Hao Min received the B.S.and M.S.degree in VI.CONCLUSION electrical engineering from Fudan University,Shanghai, In this paper we present a new high-speed,programmable P.R.China in 1985 and 1988 respectively.In 1991 he FIR filter,which is a multiplierless filter with CSD received the Ph.D.degree in material science from encoding coefficients.A new programmable CSD encoding Fudan. structure is proposed to make CSD coefficients programmable.Compared with the conventional FIR From 1996 to 1998 he was a visiting scholar in Stanford structure with Booth multiplier,this coding structure University,CA.He is now the director of ASIC and improves the speed of filter and decreases the area.And we design a 10-bits,18-taps video luminance filter with the System State-key Laboratory.He has published more presented filter structure.The completed filter core than ten technical papers in journals and conferences. occupies 6.8x6.8 mm'of silicon area in 0.6-um 2P2M His research interests include mixed signal VLSI design CMOS technology,and its maximum work frequency is 100MHz. and digital signal processing VII.ACKNOWLEDGMENT The authors wish to thank Wenhong Li for providing the help in using EDA tools and Yawei Guo,Cheng Chen for DRC and LVS of the layout of the whole chip