Signal Processing:Image Communication 28 (2013)1435-1447 Contents lists available at ScienceDirect IMAGE Signal Processing:Image Communication ELSEVIER journal homepage:www.elsevier.com/locate/image Fuzzy quantization based bit transform for low bit-resolution CrossMark motion estimation Chuan-Ming Song 3.b.*,Yanwen Guob,Xiang-Hai Wang 3.b,Dan Liua College of Computer and Information Technology.Liaoning Normal University,Dalian 116029.China National Key Laboratory for Novel Software Technology.Nanjing University.Nanjing 210093.China ARTICLE INFO ABSTRACT Article history: This study proposes a novel fuzzy quantization based bit transform for low bit-resolution Received 6 February 2013 motion estimation.We formalize the procedure of bit resolution reduction by two Received in revised form successive steps,namely interval partitioning and interval mapping.The former is a 6 August 2013 Accepted 24 September 2013 many-to-one mapping which determines motion estimation performance,while the latter Available online 12 October 2013 is a one-to-one mapping.To gain a reasonable interval partitioning.we propose a non- uniform quantization method to compute coarse thresholds.They are then refined by Keywords: using a membership function to solve the mismatch of pixel values near threshold caused Video coding by camera noise,coding distortion,etc.Afterwards,we discuss that the sum of absolute Motion estimation difference(SAD)is one of the fast matching metrics suitable for low bit-resolution motion Block matching Fuzzy quantization estimation in the sense of mean squared errors.A fuzzy quantization based low bit- Low bit-resolution resolution motion estimation algorithm is consequently proposed.Our algorithm not only Low bit-depth can be directly employed in video codecs,but also be applied to other fast or complexity scalable motion estimation algorithms.Extensive experimental results show that the proposed algorithm can always achieve good motion estimation performances for video sequences with various characteristics.Compared with one-bit transform,multi- thresholding two-bit transform,and adaptive quantization based two-bit transform,our bit transform separately gains 0.98 dB,0.42 dB,and 0.24 dB improvement in terms of average peak signal-to-noise ratio,with less computational cost as well. 2013 Elsevier B.V.All rights reserved. 1.Introduction heterogeneous networks in real time.These clients may have varying capabilities in display resolution,computing power. Various video services,such as surveillance,video tele- network bandwidth,etc.An efficient video codec is essential phony/conferencing,mobile streaming.wireless LAN video in for the video services in such a challenging scenario.Sullivan home network,and even beyond high definition video,are et al.[3]believed that most of the efficiency improvement of becoming available in more and more application scenarios state-of-art video codecs,e.g,H.264/AVC,results from better with the rapid development of Internet,wireless communica- temporal prediction and compensation.Motion estimation/ tion,and pervasive computing technologies [1.2].Mass videos compensation therefore plays a key role in coding system. are required to be reliably delivered to diverse clients over Unfortunately,motion estimation is usually remarked as the most computationally intensive component,consuming up to 50%[4],even to 60-80%[5.of the computation evolved in the .Corresponding author at:College of Computer and Information entire codec.Such a heavy computation load will inhibit Technology.Liaoning Normal University.Dalian 116029,China. practical video communication on portable and wireless Te1:+8615140397690. E-mail addresses:chmsong@163.com, devices with limited battery power.Hence an efficient motion chmsong@Innu.edu.cn(C.-M.Song).ywguo@nju.edu.cn (Y.Guo). estimation algorithm with low computational complexity is of xhwang@Innu.edu.cn (X.-H.Wang),liudan_di@163.com (D.Liu). great importance for real-time video services cross platforms. 0923-5965/S-see front matter 2013 Elsevier B.V.All rights reserved. http://dx.doi.org/10.1016/j.image.2013.09.007
Fuzzy quantization based bit transform for low bit-resolution motion estimation Chuan-Ming Song a,b,n , Yanwen Guo b , Xiang-Hai Wang a,b , Dan Liu a a College of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China b National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China article info Article history: Received 6 February 2013 Received in revised form 6 August 2013 Accepted 24 September 2013 Available online 12 October 2013 Keywords: Video coding Motion estimation Block matching Fuzzy quantization Low bit-resolution Low bit-depth abstract This study proposes a novel fuzzy quantization based bit transform for low bit-resolution motion estimation. We formalize the procedure of bit resolution reduction by two successive steps, namely interval partitioning and interval mapping. The former is a many-to-one mapping which determines motion estimation performance, while the latter is a one-to-one mapping. To gain a reasonable interval partitioning, we propose a nonuniform quantization method to compute coarse thresholds. They are then refined by using a membership function to solve the mismatch of pixel values near threshold caused by camera noise, coding distortion, etc. Afterwards, we discuss that the sum of absolute difference (SAD) is one of the fast matching metrics suitable for low bit-resolution motion estimation in the sense of mean squared errors. A fuzzy quantization based low bitresolution motion estimation algorithm is consequently proposed. Our algorithm not only can be directly employed in video codecs, but also be applied to other fast or complexity scalable motion estimation algorithms. Extensive experimental results show that the proposed algorithm can always achieve good motion estimation performances for video sequences with various characteristics. Compared with one-bit transform, multithresholding two-bit transform, and adaptive quantization based two-bit transform, our bit transform separately gains 0.98 dB, 0.42 dB, and 0.24 dB improvement in terms of average peak signal-to-noise ratio, with less computational cost as well. & 2013 Elsevier B.V. All rights reserved. 1. Introduction Various video services, such as surveillance, video telephony/conferencing, mobile streaming, wireless LAN video in home network, and even beyond high definition video, are becoming available in more and more application scenarios with the rapid development of Internet, wireless communication, and pervasive computing technologies [1,2]. Mass videos are required to be reliably delivered to diverse clients over heterogeneous networks in real time. These clients may have varying capabilities in display resolution, computing power, network bandwidth, etc. An efficient video codec is essential for the video services in such a challenging scenario. Sullivan et al. [3] believed that most of the efficiency improvement of state-of-art video codecs, e.g., H.264/AVC, results from better temporal prediction and compensation. Motion estimation/ compensation therefore plays a key role in coding system. Unfortunately, motion estimation is usually remarked as the most computationally intensive component, consuming up to 50% [4], even to 60–80% [5], of the computation evolved in the entire codec. Such a heavy computation load will inhibit practical video communication on portable and wireless devices with limited battery power. Hence an efficient motion estimation algorithm with low computational complexity is of great importance for real-time video services cross platforms. Contents lists available at ScienceDirect journal homepage: www.elsevier.com/locate/image Signal Processing: Image Communication 0923-5965/$ - see front matter & 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.image.2013.09.007 n Corresponding author at: College of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China. Tel.: þ86 15140397690. E-mail addresses: chmsong@163.com, chmsong@lnnu.edu.cn (C.-M. Song), ywguo@nju.edu.cn (Y. Guo), xhwang@lnnu.edu.cn (X.-H. Wang), liudan_dl@163.com (D. Liu). Signal Processing: Image Communication 28 (2013) 1435–1447
1436 C-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 Various fast algorithms have been proposed so far to quantization based motion estimation is discussed in lower motion estimation cost.They can be roughly classi- Section 5.We examine the proposed approach in Section 6 fied into four categories. and conclude the paper in the last section. The first category of algorithms searches for motion vector in a subset of possible candidate vectors,reducing the number of search locations per macroblock.Represen- 2.Related works tative algorithms include three-step search [6].diamond search [7].cross-diamond-hexagonal search [8],hexagon- A considerable amount of research has been done on based search [9],unsymmetrical-cross multi-hexagon-grid bit transforms during the last decade.State-of-the-art search [5.motion estimation using dynamic models [10. transforms are mainly based on three approaches. adaptive neighborhood elimination algorithm [11].adap- The first approach is filtering.Feng et al.25 trans- tive motion search range prediction [12].and fast sub- formed an eight-bit pixel to its one-bit representation by pixel motion estimation 13. comparing the pixel against the mean value.Mizuki et al. The main concept of the second category is to compute [26,27]obtained the binary representation by an edge matching errors using a fraction of pixels,reducing operation detection procedure.Natarajan et al.[28,52,38]employed number per matching.Typical algorithms involve partial a 2-D multi-bandpass filter of size 17 x 17 to implement distortion search [14],multiresolution motion estimation the one-bit transform.Erturk et al.[29-31,33]proposed a [15-17],block matching with an adaptive pattern [18,191. novel filter kernel using shift operation instead of multi- and advanced spatial hierarchical motion estimation[20. plication in [28].Lee et al.[37,39,49-51]detected a zero- Low-complexity matching metrics are the core of the cross phase with mean as the DC bias and used the binary third category.Commonly used matching metrics include phase deviation as the bit transform result.Luo et al.[34] sum of absolute differences(SAD).pel difference classifi- proposed a linear and symmetric filter to better register cation (PDC)[21],minimized maximum error(MiniMax) average spatial characteristics and to facilitate a more [22].different pixel counts(DPC).bit exclusive-OR(BXOR) accurate binary representation.Although many improve- [23].and mean-predicted sum of absolute differences [24]. ments have been contributed,these algorithms always The fourth category of algorithms performs motion involve convolution operations and high computational estimation on low bit-resolution pixels whose values are overhead as a result. mapped to from eight bit-resolution pixels [25-61].These Some studies exploit frequency-domain techniques for algorithms contribute to simple hardware implementa- less computation.Wu et al.[32]presented a gradient- tion,low power consumption,and memory bandwidth. based thresholding in the discrete cosine transform(DCT) A cost-effective and real-time video codec can thus be domain.As video coders always work in the DCT domain. realized on a low-power device with limited capabilities. this method incurs little overhead.Erturk et al.[47]carried In this study.we focus on the last category of algo- out the binarization in wavelet domain by only retaining rithms,considering their potential in computational cap- filtered lowest frequency coefficients. ability constraint devices.Our main contribution is a bit Another limitation of filtering-based bit transforms is transform method which maps eight-bit values to those that they are only adequate for one-bit motion estimation. with lower bit-resolution.It is the crucial component of In the case of higher bit-resolution,the filtering is required low bit-resolution motion estimation,determining the executing for multiple times. codec performance.We first formalize bit transform by The second approach is pixel truncation.Baek et al.43 interval partitioning and interval mapping.Through presented a criterion,named reduced bits mean absolute exploiting quantization theory,we then find an initial difference,which only uses the most n significant bits to solution to interval partitioning.To resolve false partition- compute the matching errors when the bit-resolution is n. ing caused by inter-frame noises,a membership function is He et al.44,57,60,59,58 truncated the least significant employed to refine the initial thresholds. bits to realize the pixel resolution reduction.The number Our approach has the following benefits: of truncated bits can be fixed or adaptively defined. Nevertheless,Patras et al.[62]stated that motion- 1.Bit transform is bound to result in pixels'accuracy loss compensated differences follow an independent Laplacian as well as performance degradation of motion estima- distribution.It indicates that the matching errors always tion.Addressing bit transform by quantization enables occur on insignificant bits.If these bits are truncated, us to better utilize existing theory.This facilitates pixels in the area around the best-matched macroblock reducing as much accuracy loss as possible so as to could be converted into the same low-resolution values. guarantee motion estimation efficiency. The resulting matching errors corresponding to all candi- 2.Membership function achieves good robustness to date vectors in this area will be a constant,disabled from coding distortions,camera noises,etc.It can effectively distinguishing the optimal motion vector.Moshnyaga61 eliminate the mismatch due to hard thresholding.and addressed this issue by adaptively truncating the most improve prediction quality of complex videos with high significant bits according to the data variation. spatial details or fast motion. Truncating the least significant bits tends to influence the motion estimation performance for sequences with The remainder of this paper is organized as follows. low spatial detail or low amount of movement,and vice Section 2 gives an overview of previous works.Sections 3 versa.Therefore,the least significant bit truncation and and 4 describe the fuzzy quantization approach.A fuzzy the most significant bit truncation should be skillfully
Various fast algorithms have been proposed so far to lower motion estimation cost. They can be roughly classified into four categories. The first category of algorithms searches for motion vector in a subset of possible candidate vectors, reducing the number of search locations per macroblock. Representative algorithms include three-step search [6], diamond search [7], cross-diamond-hexagonal search [8], hexagonbased search [9], unsymmetrical-cross multi-hexagon-grid search [5], motion estimation using dynamic models [10], adaptive neighborhood elimination algorithm [11], adaptive motion search range prediction [12], and fast subpixel motion estimation [13]. The main concept of the second category is to compute matching errors using a fraction of pixels, reducing operation number per matching. Typical algorithms involve partial distortion search [14], multiresolution motion estimation [15–17], block matching with an adaptive pattern [18,19], and advanced spatial hierarchical motion estimation [20]. Low-complexity matching metrics are the core of the third category. Commonly used matching metrics include sum of absolute differences (SAD), pel difference classification (PDC) [21], minimized maximum error (MiniMax) [22], different pixel counts (DPC), bit exclusive-OR (BXOR) [23], and mean-predicted sum of absolute differences [24]. The fourth category of algorithms performs motion estimation on low bit-resolution pixels whose values are mapped to from eight bit-resolution pixels [25–61]. These algorithms contribute to simple hardware implementation, low power consumption, and memory bandwidth. A cost-effective and real-time video codec can thus be realized on a low-power device with limited capabilities. In this study, we focus on the last category of algorithms, considering their potential in computational capability constraint devices. Our main contribution is a bit transform method which maps eight-bit values to those with lower bit-resolution. It is the crucial component of low bit-resolution motion estimation, determining the codec performance. We first formalize bit transform by interval partitioning and interval mapping. Through exploiting quantization theory, we then find an initial solution to interval partitioning. To resolve false partitioning caused by inter-frame noises, a membership function is employed to refine the initial thresholds. Our approach has the following benefits: 1. Bit transform is bound to result in pixels' accuracy loss as well as performance degradation of motion estimation. Addressing bit transform by quantization enables us to better utilize existing theory. This facilitates reducing as much accuracy loss as possible so as to guarantee motion estimation efficiency. 2. Membership function achieves good robustness to coding distortions, camera noises, etc. It can effectively eliminate the mismatch due to hard thresholding, and improve prediction quality of complex videos with high spatial details or fast motion. The remainder of this paper is organized as follows. Section 2 gives an overview of previous works. Sections 3 and 4 describe the fuzzy quantization approach. A fuzzy quantization based motion estimation is discussed in Section 5. We examine the proposed approach in Section 6 and conclude the paper in the last section. 2. Related works A considerable amount of research has been done on bit transforms during the last decade. State-of-the-art transforms are mainly based on three approaches. The first approach is filtering. Feng et al. [25] transformed an eight-bit pixel to its one-bit representation by comparing the pixel against the mean value. Mizuki et al. [26,27] obtained the binary representation by an edge detection procedure. Natarajan et al. [28,52,38] employed a 2-D multi-bandpass filter of size 17 17 to implement the one-bit transform. Ertürk et al. [29–31,33] proposed a novel filter kernel using shift operation instead of multiplication in [28]. Lee et al. [37,39,49–51] detected a zerocross phase with mean as the DC bias and used the binary phase deviation as the bit transform result. Luo et al. [34] proposed a linear and symmetric filter to better register average spatial characteristics and to facilitate a more accurate binary representation. Although many improvements have been contributed, these algorithms always involve convolution operations and high computational overhead as a result. Some studies exploit frequency-domain techniques for less computation. Wu et al. [32] presented a gradientbased thresholding in the discrete cosine transform (DCT) domain. As video coders always work in the DCT domain, this method incurs little overhead. Ertürk et al. [47] carried out the binarization in wavelet domain by only retaining filtered lowest frequency coefficients. Another limitation of filtering-based bit transforms is that they are only adequate for one-bit motion estimation. In the case of higher bit-resolution, the filtering is required executing for multiple times. The second approach is pixel truncation. Baek et al. [43] presented a criterion, named reduced bits mean absolute difference, which only uses the most n significant bits to compute the matching errors when the bit-resolution is n. He et al. [44,57,60,59,58] truncated the least significant bits to realize the pixel resolution reduction. The number of truncated bits can be fixed or adaptively defined. Nevertheless, Patras et al. [62] stated that motioncompensated differences follow an independent Laplacian distribution. It indicates that the matching errors always occur on insignificant bits. If these bits are truncated, pixels in the area around the best-matched macroblock could be converted into the same low-resolution values. The resulting matching errors corresponding to all candidate vectors in this area will be a constant, disabled from distinguishing the optimal motion vector. Moshnyaga [61] addressed this issue by adaptively truncating the most significant bits according to the data variation. Truncating the least significant bits tends to influence the motion estimation performance for sequences with low spatial detail or low amount of movement, and vice versa. Therefore, the least significant bit truncation and the most significant bit truncation should be skillfully 1436 C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447
C.-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 1437 combined depending on the pixel distributions,instead of Divide f into two steps,namely interval partitioning independently used. t:G-G.gr and interval mapping m:G-B.rb,and The third approach is quantization.Lee et al.[40,41] then we have f(g)=m(t(g)).i.e.. utilized an adaptive quantization to implement a two-bit transform.Ertuirk and Erturk [4,42,53]proposed a multi- T1, -1<g≤T1 thresholding with mean value and approximated standard r2, T1<g≤T2 deviation.Furthermore,Kim et al.[54,55]made use of t(g)= r3. T2<g≤T3 (1) positive and negative second derivatives to improve [4,42,53].Wang et al.[48]selected a low-pass filtered T2"-1<g≤255 version of the current frame as a mid point,and then applied a uniform quantization with a pre-defined step and size to accomplish the bit transform. Both truncation and quantization approaches can be 0, r=r used for bit transform whose resolution is more than one 1, r=r2 bit.Their computational complexity is obviously lower m(r)= r=r3 (2) than that of the filtering approach.Moreover,the trunca- tion is equivalent to a uniform quantization in essence.The 2 T=Tw quantization thus provides an efficient strategy for bit transform The interval partitioning is a many-to-one mapping,while However,current quantization based methods compute the interval mapping is a one-to-one mapping.Hence the thresholds on macroblock basis.Since a search window former decides the threshold selection as well as motion compasses several macroblocks,the low bit-resolution estimation performance.To preserve as much important reference tends to display discontinuities along two adja- information as possible in the low bit-resolution repre- cent blocks if different thresholds are selected [32].The sentation,we need to find an optimal mapping t that motion estimation accuracy will be influenced by the minimizes blocking artifact.Even if a uniform threshold is adopted by the truncation-based approach,it cannot adapt to scene 255 D=Eg-tg)》]=∑pg)g-tg (3) changes and would fail to provide a fine grain scratch for low bit-resolution motion estimation. in which Ef.]and p()denote expectation operator and On the other hand,state-of-art bit transforms adopt a hard thresholding manner which regards pixel values probability distribution function (PDF)of pixel values, respectively.However,finding t is equivalent to simulta- lying on the opposite sides of the threshold as a mismatch even if their values are close.Because of camera noises, neously determining an optimal set of r and non-translational motion,quantization errors during cod- Tj(ie(1.2,....2N).je(1.2.....2N-1)).This is a nonlinear ing.etc..there always exist differences between two best problem and cannot be solved with any ease. matched macroblocks [63].Consequently,hard threshold- Assume that g is a discrete random variable whose PDF ing would take the best pair as a mismatch.Urhan et al. is p.As we know,optimum scalar quantization is to decide a set of thresholds and quanta that minimizes reconstruc- 35,36]counted the pixel values away from threshold tion errors.It has the same issue with interval partitioning. within a certain distance D as a match regardless of their This motivates us to recourse quantization theory to solve values.But they neglected adjusting D to video character- istics.Large D will ignore the pixel differences that should Egs.(1)and(3).According to the quantization theory.a have excluded invalid candidate vectors.On the contrary. uniform quantizer yields better approximation than a non- small D cannot resist inter-frame noises and is not able to uniform one at high resolution [64].and vice versa.We thus adopt different strategies to accomplish the interval avoid an improper match. In general,little attention has been payed to an adap- partitioning depending on the bit-resolution. In the case that bit-resolution is higher than a threshold tive uniform thresholding and its mechanism to deal with R,we use a set of uniform thresholds to define Ti as inter-frame noises,even though they will definitely con- tribute to performance improvement. T=28-Nj-1. (4) 3.A bit transform based on quantization Set To=-1,then we have ri=[(Ti+Ti-1)/2]. (5) Motion estimation with low bit-resolution is always inferior to that with full bit-resolution in terms of motion- When bit-resolution is lower than R,non-uniform compensated quality.The main reason is that the bit thresholds are employed.Max [65]and Lloyd [66]ever transform leads to a data loss.It is therefore crucial to proposed iterative trial-and-error procedures to adaptively reserve as much information of full resolution video as compute non-uniform quantization steps.They also gave a possible.In this section,we present a quantization based list of parameters for signals following uniform,Gaussian, bit transform to address this issue. and Laplacian distributions.Unfortunately,hardly video SetG={0,1,2,,2551.B={0,1,2,,2w-1.Then the signals obey any explicit and identical probability distribu- bit transform mapping eight-bit values to those with N bit- tion.This forces us to run their methods frame-wise resolution can be formalized by a map f:G-B.gb. to assign optimum thresholds for each frame.Obviously
combined depending on the pixel distributions, instead of independently used. The third approach is quantization. Lee et al. [40,41] utilized an adaptive quantization to implement a two-bit transform. Ertürk and Ertürk [4,42,53] proposed a multithresholding with mean value and approximated standard deviation. Furthermore, Kim et al. [54,55] made use of positive and negative second derivatives to improve [4,42,53]. Wang et al. [48] selected a low-pass filtered version of the current frame as a mid point, and then applied a uniform quantization with a pre-defined step size to accomplish the bit transform. Both truncation and quantization approaches can be used for bit transform whose resolution is more than one bit. Their computational complexity is obviously lower than that of the filtering approach. Moreover, the truncation is equivalent to a uniform quantization in essence. The quantization thus provides an efficient strategy for bit transform. However, current quantization based methods compute thresholds on macroblock basis. Since a search window compasses several macroblocks, the low bit-resolution reference tends to display discontinuities along two adjacent blocks if different thresholds are selected [32]. The motion estimation accuracy will be influenced by the blocking artifact. Even if a uniform threshold is adopted by the truncation-based approach, it cannot adapt to scene changes and would fail to provide a fine grain scratch for low bit-resolution motion estimation. On the other hand, state-of-art bit transforms adopt a hard thresholding manner which regards pixel values lying on the opposite sides of the threshold as a mismatch even if their values are close. Because of camera noises, non-translational motion, quantization errors during coding, etc., there always exist differences between two best matched macroblocks [63]. Consequently, hard thresholding would take the best pair as a mismatch. Urhan et al. [35,36] counted the pixel values away from threshold within a certain distance D as a match regardless of their values. But they neglected adjusting D to video characteristics. Large D will ignore the pixel differences that should have excluded invalid candidate vectors. On the contrary, small D cannot resist inter-frame noises and is not able to avoid an improper match. In general, little attention has been payed to an adaptive uniform thresholding and its mechanism to deal with inter-frame noises, even though they will definitely contribute to performance improvement. 3. A bit transform based on quantization Motion estimation with low bit-resolution is always inferior to that with full bit-resolution in terms of motioncompensated quality. The main reason is that the bit transform leads to a data loss. It is therefore crucial to reserve as much information of full resolution video as possible. In this section, we present a quantization based bit transform to address this issue. Set G ¼ f0; 1; 2;…; 255g, B ¼ f0; 1; 2; …; 2N 1g. Then the bit transform mapping eight-bit values to those with N bitresolution can be formalized by a map f : G-B, g↦b. Divide f into two steps, namely interval partitioning t : G-G, g↦r and interval mapping m : G-B, r↦b, and then we have fðgÞ ¼ mðtðgÞÞ, i.e., tðgÞ ¼ r1; 1ogrT1 r2; T1ogrT2 r3; T2ogrT3 ⋮ r2N ; T2N 1ogr255 8 >>>>>>>>>>>: ; ð1Þ and mðrÞ ¼ 0; r ¼ r1 1; r ¼ r2 2; r ¼ r3 ⋮ 2N 1; r ¼ r2N 8 >>>>>>>>>>>: : ð2Þ The interval partitioning is a many-to-one mapping, while the interval mapping is a one-to-one mapping. Hence the former decides the threshold selection as well as motion estimation performance. To preserve as much important information as possible in the low bit-resolution representation, we need to find an optimal mapping t that minimizes D ¼ E½ðgtðgÞÞ2 ¼ ∑ 255 g ¼ 0 pðgÞ½gtðgÞ2; ð3Þ in which E½ and pðÞ denote expectation operator and probability distribution function (PDF) of pixel values, respectively. However, finding t is equivalent to simultaneously determining an optimal set of ri and TjðiAf1; 2;…; 2Ng; jAf1; 2; …; 2N 1gÞ. This is a nonlinear problem and cannot be solved with any ease. Assume that g is a discrete random variable whose PDF is p. As we know, optimum scalar quantization is to decide a set of thresholds and quanta that minimizes reconstruction errors. It has the same issue with interval partitioning. This motivates us to recourse quantization theory to solve Eqs. (1) and (3). According to the quantization theory, a uniform quantizer yields better approximation than a nonuniform one at high resolution [64], and vice versa. We thus adopt different strategies to accomplish the interval partitioning depending on the bit-resolution. In the case that bit-resolution is higher than a threshold R, we use a set of uniform thresholds to define Tj as Tj ¼ 28Nj1: ð4Þ Set T0 ¼ 1, then we have ri ¼ ⌊ðTi þTi1Þ=2⌋: ð5Þ When bit-resolution is lower than R, non-uniform thresholds are employed. Max [65] and Lloyd [66] ever proposed iterative trial-and-error procedures to adaptively compute non-uniform quantization steps. They also gave a list of parameters for signals following uniform, Gaussian, and Laplacian distributions. Unfortunately, hardly video signals obey any explicit and identical probability distribution. This forces us to run their methods frame-wise to assign optimum thresholds for each frame. Obviously, C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447 1437
1438 C.-M.Song et aL Signal Processing:Image Communication 28(2013)1435-1447 this is unpractical with respect to the computational may have Me(x,y)>Tj and Mc(x,y≤Ti,orMc(xy)≤T complexity. and Mc(x.y)>Tj.At this moment,Me(x,y)and Mc(x.y)will As we know,the uniform quantization is the optimum be mapped to different intervals and produce matching quantizer if a signal's PDF is a uniform distribution.This errors.For low bit-resolution motion estimation,the match- conclusion enlightens us on an efficient solution to the ing error of a pixel is usually small.For example,the interval partitioning.Provided that we can find a trans- maximum matching error is just four in the case of two- form e from g to a random variable g'e (0.1,2.....255) bit motion estimation.This indicates that an inaccurate satisfying g'~U(0,255),we will easily obtain the optimum interval mapping brings about 25%matching deviation at thresholds of g'.Afterwards,the interval partitioning of g least.Such a large deviation will inevitably lead to a false can be realized through inversely transforming the com- motion vector.In contrast,the maximum matching error of puted thresholds using e-1.Based on this idea,we present full bit-resolution motion estimation is 255,thus small our method below. matching error,e.g.one,takes up a negligible deviation. This is the reason why low bit-resolution motion estima- Algorithm 1 (Non-uniform interval partitioning). tion,especially one-or two-bit motion estimation,is sensitive to the threshold selection.To summarize,a bit 1.Calculate histogram p of the input frame. transform is unfavorable for motion estimation when ignor- 2.Employ histogram equalization to map g to g',namely ing inter-frame noises and pixel changes. e(g) 255×pk) (6) 4.1.Membership function 3.Uniformly quantize to obtain threshold To find optimum motion vectors,it is necessary to T=28-j-1. estimate the probability that each pixel value within a 4.Compute T by an inverse transform e-1,ie.. certain distance from initial threshold belongs to its Tj=e-1(T'j).Since e is not an one-to-one mapping. current interval.If the probability is low,modify the initial there does not exist an inverse transform.Here,we threshold such that each pixel value around it is positioned define e-1 as in the right interval.In our study.we use a membership e-1(T)=min(k)s.te(k≥T,k∈{0,1,2,,255 function to describe the probability of a pixel value (7) belonging to each interval. Definition.(T1,T2...Tw gives an initial interval par- After the non-uniform interval partitioning above, titioning of G.obtaining 2N intervals namely Zo.Z..... Eq.(5)is utilized again to compute (ri). Z2w_1 We define the probability that g is in Zj as member- Algorithm 1 is able to handle video frames obeying any ship whose value depends on a membership function.The PDF with low computational complexity.It should be membership function is described as follows: noted that,however,Algorithm 1 only offers a suboptimal solution because histogram equalization cannot produce M1(g) p(s)ds (9) an ideal uniform distribution for discrete pixel values. Substitute (ri)into Eq.(2).and combine it with Eq.(1). Then we map the current and reference frames to B,on M(g)=1 Pi1(s)ds. pi(s)ds. (10) which the low bit-resolution motion estimation is exe- cuted.Now,our bit transform still adopts a hard threshold- ing manner.But [35]observed that the hard thresholding M(g)= P2w(s)ds, (11) always leads to performance degradation.We will analyze the reason in the next subsection.Afterwards,we will where M(g)denotes the membership function.Py is the introduce a fuzzy logic into the bit transform. probability density function of Tj.which relates to inter- frame noises and video signal itself.In this study.we assume that the inter-frame noises are caused by camera 4.Threshold refinement using fuzzy logic capability and coding distortion,and they are independent Block-based motion estimation generally assumes that of each other and Mc(x,y).Since the light condition a pixel and its correspondence in a successive frame have changes between successive frames could be ignored,we the same value.Indeed,it is not the case.Due to camera do not take it into consideration.Further,suppose that the capability,coding distortion,light changes,etc.,the cur- noises due to camera capability and coding distortion rent macroblock is usually different from its best matched follow normal distributions,namely N(0,2)and N(O,) macroblock.Let Me and M,be the macroblocks separately in respectively.then we define py as current and reference frames,respectively.M denotes the 1 6-T)2 optimal prediction of the current macroblock.Then we have Pi(S)= exp (12) 2(c2+o) 2(c2+) M=M(x+u.y+v)=Mc(x.y)+n(x.y). (8) Fig.1 depicts the membership function in the case of N=4. where (u,v)and n(,)are motion vector and random noises, Note that different probability density functions can also respectively.Under this condition,M(x,y)Mc(x,y)holds be accepted for different Ti according to the application for most pixels(xy).If we employ a hard threshold T.we scenarios
this is unpractical with respect to the computational complexity. As we know, the uniform quantization is the optimum quantizer if a signal's PDF is a uniform distribution. This conclusion enlightens us on an efficient solution to the interval partitioning. Provided that we can find a transform e from g to a random variable g′Af0; 1; 2; …; 255g satisfying g′ Uð0; 255Þ, we will easily obtain the optimum thresholds of g′. Afterwards, the interval partitioning of g can be realized through inversely transforming the computed thresholds using e1 . Based on this idea, we present our method below. Algorithm 1 (Non-uniform interval partitioning). 1. Calculate histogram p of the input frame. 2. Employ histogram equalization to map g to g′, namely eðgÞ ¼ ∑ g k ¼ 0 255 pðkÞ $ %: ð6Þ 3. Uniformly quantize g′ to obtain threshold T′j ¼ 28Nj1. 4. Compute Tj by an inverse transform e1, i.e., Tj ¼ e1ðT′jÞ. Since e is not an one-to-one mapping, there does not exist an inverse transform. Here, we define e1 as e1ðT′jÞ ¼ minfkg s:t: eðkÞZT′j; kAf0; 1; 2; …; 255g: ð7Þ After the non-uniform interval partitioning above, Eq. (5) is utilized again to compute frig. Algorithm 1 is able to handle video frames obeying any PDF with low computational complexity. It should be noted that, however, Algorithm 1 only offers a suboptimal solution because histogram equalization cannot produce an ideal uniform distribution for discrete pixel values. Substitute frig into Eq. (2), and combine it with Eq. (1). Then we map the current and reference frames to B, on which the low bit-resolution motion estimation is executed. Now, our bit transform still adopts a hard thresholding manner. But [35] observed that the hard thresholding always leads to performance degradation. We will analyze the reason in the next subsection. Afterwards, we will introduce a fuzzy logic into the bit transform. 4. Threshold refinement using fuzzy logic Block-based motion estimation generally assumes that a pixel and its correspondence in a successive frame have the same value. Indeed, it is not the case. Due to camera capability, coding distortion, light changes, etc., the current macroblock is usually different from its best matched macroblock. Let Mc and Mr be the macroblocks separately in current and reference frames, respectively. M^ n c denotes the optimal prediction of the current macroblock. Then we have M^ n c ¼ Mrðxþu; yþvÞ ¼ Mcðx; yÞþnðx; yÞ; ð8Þ where (u,v) and nð; Þ are motion vector and random noises, respectively. Under this condition, M^ n c ðx; yÞaMcðx; yÞ holds for most pixels (x,y). If we employ a hard threshold Tj, we may have M^ n c ðx; yÞ4Tj and Mcðx; yÞrTj, or M^ n c ðx; yÞrTj and Mcðx; yÞ4Tj. At this moment, M^ n c ðx; yÞ and Mcðx; yÞ will be mapped to different intervals and produce matching errors. For low bit-resolution motion estimation, the matching error of a pixel is usually small. For example, the maximum matching error is just four in the case of twobit motion estimation. This indicates that an inaccurate interval mapping brings about 25% matching deviation at least. Such a large deviation will inevitably lead to a false motion vector. In contrast, the maximum matching error of full bit-resolution motion estimation is 255, thus small matching error, e.g., one, takes up a negligible deviation. This is the reason why low bit-resolution motion estimation, especially one- or two-bit motion estimation, is sensitive to the threshold selection. To summarize, a bit transform is unfavorable for motion estimation when ignoring inter-frame noises and pixel changes. 4.1. Membership function To find optimum motion vectors, it is necessary to estimate the probability that each pixel value within a certain distance from initial threshold belongs to its current interval. If the probability is low, modify the initial threshold such that each pixel value around it is positioned in the right interval. In our study, we use a membership function to describe the probability of a pixel value belonging to each interval. Definition. fT1; T2;…; T2N 1g gives an initial interval partitioning of G, obtaining 2N intervals namely Z0, Z1, …, Z2N 1. We define the probability that g is in Zj as membership whose value depends on a membership function. The membership function is described as follows: M1ðgÞ ¼ Z þ1 g p1ðsÞ ds; ð9Þ MjðgÞ ¼ 1 Z þ1 g pj1ðsÞ ds Z g 1 pjðsÞ ds; ð10Þ M2N ðgÞ ¼ Z g 1 p2N 1ðsÞ ds; ð11Þ where Mi(g) denotes the membership function. pj is the probability density function of Tj, which relates to interframe noises and video signal itself. In this study, we assume that the inter-frame noises are caused by camera capability and coding distortion, and they are independent of each other and Mcðx; yÞ. Since the light condition changes between successive frames could be ignored, we do not take it into consideration. Further, suppose that the noises due to camera capability and coding distortion follow normal distributions, namely Nð0; s2 c Þ and Nð0; s2 dÞ, respectively, then we define pj as pjð Þ¼s 1 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2πðs2 c þs2 dÞ q exp ðsTjÞ 2 2ðs2 c þs2 dÞ " #: ð12Þ Fig. 1 depicts the membership function in the case of N¼4. Note that different probability density functions can also be accepted for different Tj according to the application scenarios. 1438 C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447
C.-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 1439 simplify Eq.(13)to 2 256 - Mj(g) (15) From Eq.(15).T;is closely related to weights of the corresponding intervals.Large difference between Zj-1 and IZjl (suppose Zj-1>Z)indicates that the numbers T T of pixel values locating in Zj-1 and Z vary considerably. The probability is high that the pixel values within a Fig.1.General view of membership function. certain distance from threshold T belong to Zi.since these pixel values are much probably partitioned into Zi_1 due to noises.Hence,the T"should be moved toward Ti-1 from Tj.Based on this inference,we propose to adaptively 4.2.Initial threshold refinement refine T according to Zil.The procedure is shown in Algorithm 2. Upon defining the membership function,the optimum interval partitioning corresponds to (Ti....T)that Algorithm 2(Initial threshold refinement). maximizes 1.Set j=0. Mj(g)p(g)s.tTj-Tl≤Vo2+ (13) 2.f☑1≤2562/N.then let△=V02+后×256/ N-ZD/256/N=Vc2+1×(256-1Z1×N)/256, in which To=-1 and T>w =255.The last term in Eq.(13) IZ'jl=IZjl+A.Otherwise,set IZ'jl=IZjl.go to Step 4. constrains difference between (Tj}and (Tj).Otherwise, 3.Let j=j+1.If j=N.go to Step 4.Otherwise,go to (T;)may diverge from the suboptimal thresholds (Tj) Step 2. solved in Section 3.As we know about the normal 4.Let L=EilZ'jl.refine the length and threshold of distribution,nearly 70%of the random variation amplitude each interval by setting IZ'j=256 x IZ'jl/L,je is within one variance.Therefore,we seek T;around Ti in a 01,,2-1h.T0=-1,T2w=255,T=Ti-1+ range of one variance.In the following,we discuss our 1Z-1l-1,ie1,2,2w-1n. solution to Eq.(13)under two conditions. First,when p is the uniform distribution,p()can be In the Step 2 of the above algorithm,A is used to control extracted as a common factor from Eq.(13).Now,that Eq. (13)is maximum is equivalent to maximizing computational complexity.The small the 4 is,the less the complexity will be.Furthermore,(256/N-IZjD)/(256/N)= (256-lZ分×N/256 is to normalize△so that the length Mjg)st.T-T引≤Vo2+ (14) variation of each adjusted interval is within one variance j=1g=了-1+1 +as discussed in Eq.(13). The remaining question is how to compute2+ From Fig.1 we can find that the solution of Ti is the Since we assume the inter-frame noises are independent intersection of M and Mj+1.i.e..T.This tells us that of Me(x.y).we have=++where adr and o uniform quantization provides the optimal interval parti- denote separately the variances of pixel values in the tioning for uniform distribution,needing not to be refined. current and reference frames.respectively.We thus calcu- Second,p is non-uniform.Since most image pixels follow non-uniform distributions,video sequences are late a2+o匠by la2-o usually in this case.We could employ an optimization By substituting the computed (Ti]into Eqs.(1).(2)and method to resolve Eq.(14)for each frame.This will (3),we obtain the low bit-resolution representation B of consume considerable computational complexity.Here each frame.Before presenting our low complexity motion we present a compromise method for less computation. estimation approach on B.we discuss the computational Although the first condition can hardly be true for complexity of our bit transform. natural videos,its solution sheds a light on how to treat the second condition.The fundamental idea is to replace 4.3.Computational complexity of our bit transform p()of each interval using its average probability p so that Eq.(13)is transformed to a similar form to Eq.(14). Without particularly pointed out,addition and subtrac- Because T is computed by means of histogram equaliza- tion operations are generally called "addition"hereafter, tion,pixels'accumulate probability distribution in each and multiplication and division called "multiplication" Zi(ie(0,1.....2N-1))is approximately the same.Let The computational complexity of our proposed bit trans- Zil=Ti+1-Ti be the interval length.Then the smaller form depends on two steps. the Zil is,the higher its p will be.Furthermore,the interval First,histogram equalization.Assume that the size of length of a uniform distribution is 256/N,we thus use video frame is wx h pixels.Counting the histogram p of 256/(N x IZil)as a weight to roughly measure p and the current frame requires wx h additions.e can be
4.2. Initial threshold refinement Upon defining the membership function, the optimum interval partitioning corresponds to fT″ 1; …; T″ 2N 1g that maximizes ∑ 2N j ¼ 1 ∑ T″ j g ¼ T″ j 1 þ1 MjðgÞpðgÞ s:t: jTj T″ j jr ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q ð13Þ in which T″ 0 ¼ 1 and T″ 2N ¼ 255. The last term in Eq. (13) constrains difference between fTjg and fT″ jg. Otherwise, fT″ jg may diverge from the suboptimal thresholds fTjg solved in Section 3. As we know about the normal distribution, nearly 70% of the random variation amplitude is within one variance. Therefore, we seek T″ j around Tj in a range of one variance. In the following, we discuss our solution to Eq. (13) under two conditions. First, when p is the uniform distribution, pðÞ can be extracted as a common factor from Eq. (13). Now, that Eq. (13) is maximum is equivalent to maximizing ∑ 2N j ¼ 1 ∑ T″ j g ¼ T″ j 1 þ1 MjðgÞ s:t: jTj T″ j jr ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q : ð14Þ From Fig. 1 we can find that the solution of T″ j is the intersection of Mj and Mjþ1, i.e., Tj. This tells us that uniform quantization provides the optimal interval partitioning for uniform distribution, needing not to be refined. Second, p is non-uniform. Since most image pixels follow non-uniform distributions, video sequences are usually in this case. We could employ an optimization method to resolve Eq. (14) for each frame. This will consume considerable computational complexity. Here we present a compromise method for less computation. Although the first condition can hardly be true for natural videos, its solution sheds a light on how to treat the second condition. The fundamental idea is to replace pðÞ of each interval using its average probability p so that Eq. (13) is transformed to a similar form to Eq. (14). Because Tj is computed by means of histogram equalization, pixels' accumulate probability distribution in each ZiðiAf0; 1;…; 2N 1gÞ is approximately the same. Let jZij ¼ Tiþ1 Ti be the interval length. Then the smaller the jZij is, the higher its p will be. Furthermore, the interval length of a uniform distribution is 256/N, we thus use 256=ðN jZijÞ as a weight to roughly measure p and simplify Eq. (13) to ∑ 2N j ¼ 1 256 ðTj Tj1ÞN ∑ T″ j g ¼ T″ j 1 þ1 Mjð Þ g 8 < : 9 = ;: ð15Þ From Eq. (15), T″ j is closely related to weights of the corresponding intervals. Large difference between jZj1j and jZjj (suppose jZj1j4jZjj) indicates that the numbers of pixel values locating in Zj1 and Zj vary considerably. The probability is high that the pixel values within a certain distance from threshold Tj belong to Zj, since these pixel values are much probably partitioned into Zj1 due to noises. Hence, the T″j should be moved toward Tj1 from Tj. Based on this inference, we propose to adaptively refine Tj according to jZij. The procedure is shown in Algorithm 2. Algorithm 2 (Initial threshold refinement). 1. Set j¼0. 2. If jZjjr256λ=N, then let Δ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q ð256= NjZjjÞ=ð256=NÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q ð256jZjj NÞ=256, jZ′jj¼jZjjþΔ. Otherwise, set jZ′jj¼jZjj, go to Step 4. 3. Let j ¼ jþ1. If j¼N, go to Step 4. Otherwise, go to Step 2. 4. Let L ¼ ∑jjZ′jj, refine the length and threshold of each interval by setting jZ′jj ¼ 256 jZ′jj=L, jA f0; 1; …; 2N 1g, T″ 0 ¼ 1, T″ 2N ¼ 255, T″ i ¼ T″ i1 þ jZ′i1j1, iAf1; 2; …; 2N 1gÞ. In the Step 2 of the above algorithm, λ is used to control computational complexity. The small the λ is, the less the complexity will be. Furthermore, ð256=NjZjjÞ=ð256=NÞ ¼ ð256jZjj NÞ=256 is to normalize Δ so that the length variation of each adjusted interval is within one variance ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q , as discussed in Eq. (13). The remaining question is how to compute ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q . Since we assume the inter-frame noises are independent of Mcðx; yÞ, we have s2 rf ¼ s2 c þs2 d þs2 cf where s2 cf and s2 rf denote separately the variances of pixel values in the current and reference frames. respectively. We thus calculate ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q by ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi js2 cf s2 rf j q . By substituting the computed fT″ jg into Eqs. (1), (2) and (3), we obtain the low bit-resolution representation B of each frame. Before presenting our low complexity motion estimation approach on B, we discuss the computational complexity of our bit transform. 4.3. Computational complexity of our bit transform Without particularly pointed out, addition and subtraction operations are generally called “addition” hereafter, and multiplication and division called “multiplication”. The computational complexity of our proposed bit transform depends on two steps. First, histogram equalization. Assume that the size of video frame is w h pixels. Counting the histogram p of the current frame requires w h additions. e can be Fig. 1. General view of membership function. C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447 1439
1440 C-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 b metric is adopted.It can distinguish two situations only. 000⊕111 i.e,match and mismatch.Nevertheless,the number of matching situations is more than two when N>1.This implies that DPC cannot well exploit the information 1 011⊕100 provided by high bit-resolution pixels. 11⊕00 The BXOR metric was also researched [59,67].This metric can be regarded as an operation on two N-dimen- 10⊕01 sional vectors in a duality field(0,1,⊕,⑧)where“⊕" 00 10 and "denote "Exclusive-OR"and "AND"operations. Fig.2.Inaccurate matching evaluation by BXOR metric.(a)Two-bit BXOR respectively.Unfortunately,the output of BXOR does not metric.(b)Three-bit BXOR metric. completely conform to the real difference between the two pixel values.For example,if two pixels'representations are Table 1 separately "01"and "10",the metric output is "11"(Fig.2 Test video sequences and formats. (a))which means that their difference is three.Obviously. the correct difference equals one.In such a case,the BXOR Sequence name Format Resolution Frame rate(fps) metric cannot distinguish between(01,10)and(00.11). Flower CIF 352×288 30 Actually,(01,10)is a better match than (00,11).For three- Football CIF 352×288 30 bit matching.BXOR metric presents the same matter as Mobile CIF 352×288 30 shown in Fig.2(b). Paris CIF 352×288 30 We believe that SAD is still the accurate and efficient Stefan CIF 352×288 30 Tennis CIF 352×288 30 matching metric for low bit-resolution motion estimation. Tempete CIF 352×288 30 In our study.we adopt a bit-truncation SAD [23]defined as City 4CIF 704×576 30 s-1S-1 Crew 4CIF 704×576 30 h(u,v)= +N-1(x.y)-+N-1(x+u.y+v)l. Harbour 4CIF 704×576 30 X=0y=0 Carphone QCIF 176×144 30 (16 Grandma QCIF 176×144 30 in which Ic(,)and I(,)are separately pixel values of the current and reference frames.q:q+N-1 indicates that computed using 256 multiplications and 255 additions. operations are only executed on from the g-th to the Solving e-needs 255 comparison operations.Note that, (g+N-1)-th bit of each byte.On one hand,we could when N>R,the histogram equalization is not required,so store multiple pixel values in one byte,available for are the above operations. parallel processing.On the other hand,we are able to Second,threshold refinement.The mean value #c andr create a table in memory to save the matching errors of the current and reference frames are computed by total under all situations to realize metric (16)with low com- 2 x(w x h-1)additions.In order to calculate the variance putational cost.Using+N-1(x.y)and+N-1(x+u. r and f.2 x wx h multiplications and 2 x wx h addi- y+v)as an offset,we then employ the XLAT instruction tions are required.Furthermore,one addition,one abso- of the assembly language to locate the matching error in lute conversion,and one square root operations are the aforementioned table.This method requires less clock needed to obtain o+.Afterwards,Step 2 of cycles than sequentially performing subtraction and abso- lute conversion operations. Algorithm 2 executes at most 2N+additions and 2x (2N-1)+2 multiplications.Step 4 is composed of 3x 5.2.Fuzzy quantization based low bit-resolution motion (2N-1)additions and 2 x (2N-1)multiplications.In addi- estimation tion,quantizing each pixel needs at most (2N-1) comparisons. On the basis of the concept mentioned above,we present a low bit-resolution motion estimation approach 5.Low bit-resolution motion estimation based on fuzzy quantization.The current and reference frames are first processed by the bit transform proposed in This section first talks over the matching metric sui- Sections 3 and 4,respectively.Next,one of the conven- table for low bit-resolution motion estimation.A novel tional search strategies,e.g.full search,is applied for motion estimation approach is then developed. motion estimation using h as the matching metric.This approach is summarized in Algorithm 3. 5.1.Matching metric for low bit-resolution motion Algorithm 3 (Low bit-resolution motion estimation). estimation 1.Separately compute the mean pixel values #=Ellc]and An efficient matching metric is one of the important r=E[lr]of the current and reference frames,and their components for motion estimation.Bahari et al.59 found variance od E[l2]-E [lc]and =E[F]-E[l]. that DPC and BXOR metrics show good performance in the 2.Carry out the histogram equalization (see Eq.(6))on low bit-resolution motion estimation.Among most low reference frame to obtain g'. bit-resolution motion estimation algorithms,the DPC 3.Uniformly quantize g'and determine (T'j)
computed using 256 multiplications and 255 additions. Solving e1 needs 255 comparison operations. Note that, when NZR, the histogram equalization is not required, so are the above operations. Second, threshold refinement. The mean value μc and μr of the current and reference frames are computed by total 2 ðw h1Þ additions. In order to calculate the variance s2 cf and s2 rf , 2 w h multiplications and 2 w h additions are required. Furthermore, one addition, one absolute conversion, and one square root operations are needed to obtain ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q . Afterwards, Step 2 of Algorithm 2 executes at most 2ðN þ1Þ additions and 2 ð2N 1Þþ2 multiplications. Step 4 is composed of 3 ð2N 1Þ additions and 2 ð2N 1Þ multiplications. In addition, quantizing each pixel needs at most ð2N 1Þ comparisons. 5. Low bit-resolution motion estimation This section first talks over the matching metric suitable for low bit-resolution motion estimation. A novel motion estimation approach is then developed. 5.1. Matching metric for low bit-resolution motion estimation An efficient matching metric is one of the important components for motion estimation. Bahari et al. [59] found that DPC and BXOR metrics show good performance in the low bit-resolution motion estimation. Among most low bit-resolution motion estimation algorithms, the DPC metric is adopted. It can distinguish two situations only, i.e, match and mismatch. Nevertheless, the number of matching situations is more than two when N41. This implies that DPC cannot well exploit the information provided by high bit-resolution pixels. The BXOR metric was also researched [59,67]. This metric can be regarded as an operation on two N-dimensional vectors in a duality field ðf0; 1g; ; Þ where “” and “” denote “Exclusive-OR” and “AND” operations, respectively. Unfortunately, the output of BXOR does not completely conform to the real difference between the two pixel values. For example, if two pixels' representations are separately “01” and “10”, the metric output is “11” (Fig. 2 (a)) which means that their difference is three. Obviously, the correct difference equals one. In such a case, the BXOR metric cannot distinguish between (01, 10) and (00, 11). Actually, (01, 10) is a better match than (00, 11). For threebit matching, BXOR metric presents the same matter as shown in Fig. 2(b). We believe that SAD is still the accurate and efficient matching metric for low bit-resolution motion estimation. In our study, we adopt a bit-truncation SAD [23] defined as hðu; vÞ ¼ ∑ S1 x ¼ 0 ∑ S1 y ¼ 0 jI q:qþN 1 c ðx; yÞI q:qþN 1 r ðxþu; yþvÞj; ð16Þ in which Icð; Þ and Irð; Þ are separately pixel values of the current and reference frames. q : qþN1 indicates that operations are only executed on from the q-th to the ðqþN1Þth bit of each byte. On one hand, we could store multiple pixel values in one byte, available for parallel processing. On the other hand, we are able to create a table in memory to save the matching errors under all situations to realize metric (16) with low computational cost. Using I q:qþN 1 c ðx; yÞ and I q:qþN 1 r ðxþu; yþvÞ as an offset, we then employ the XLAT instruction of the assembly language to locate the matching error in the aforementioned table. This method requires less clock cycles than sequentially performing subtraction and absolute conversion operations. 5.2. Fuzzy quantization based low bit-resolution motion estimation On the basis of the concept mentioned above, we present a low bit-resolution motion estimation approach based on fuzzy quantization. The current and reference frames are first processed by the bit transform proposed in Sections 3 and 4, respectively. Next, one of the conventional search strategies, e.g., full search, is applied for motion estimation using h as the matching metric. This approach is summarized in Algorithm 3. Algorithm 3 (Low bit-resolution motion estimation). 1. Separately compute the mean pixel values μc ¼ E½Ic and μr ¼ E½Ir of the current and reference frames, and their variance s2 cf ¼ E½I 2 c E2½Ic and s2 rf ¼ E½I 2 r E2½Ir. 2. Carry out the histogram equalization (see Eq. (6)) on reference frame to obtain g′. 3. Uniformly quantize g′ and determine fT′jg. Fig. 2. Inaccurate matching evaluation by BXOR metric. (a) Two-bit BXOR metric. (b) Three-bit BXOR metric. Table 1 Test video sequences and formats. Sequence name Format Resolution Frame rate (fps) Flower CIF 352 288 30 Football CIF 352 288 30 Mobile CIF 352 288 30 Paris CIF 352 288 30 Stefan CIF 352 288 30 Tennis CIF 352 288 30 Tempete CIF 352 288 30 City 4CIF 704 576 30 Crew 4CIF 704 576 30 Harbour 4CIF 704 576 30 Carphone QCIF 176 144 30 Grandma QCIF 176 144 30 1440 C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447
C.-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 1441 a 方分0年i23 2A河前订23 45678910 456789 1112131415617 日23456丽 1819202122¥24 81920122 ●7282 6而行23 7到5活T2行 45B789期 48878910 中231M6花 112365能 1892的172 8920212 e ,+p 5新行53 45678910 45678910 1123145活7 111234567 18192022 81920222 28 428 Fig.3.Comparisons of the motion-compensated 3rd frame produced by our algorithm with those by previous methods at the bit-rate of 128 kbps.The macroblock size is 16 x 16 pixels.(a)Original 3rd frame.(b)1BT,PSNR=22.17.(c)MT-2BT,PSNR=22.11.(d)AQ-2BT.PSNR=22.19.(e)8BT.PSNR=22.48. ()FQ-2BT,PSNR=22.34. 4.Use e-1(Eq.(7))to solve initial thresholds (Tj). on the first 90 frames of several test sequences in various 5.Compute a2+abyV√o2+o径=Vlo2-al formats,which are listed in Table 1.The experimental platform is MPEG-4 verification software with version 6.Employ Algorithm 2 to calculate (Ti)and perform the number "Microsoft-FDAM1-2.3-001213".The parameters bit transform on current and reference frames.Note are set as default configuration except the ones below:(1) that the fuzzy quantization thresholds of the current GOP size is 16 with 1 I-frame and 15 P-frames.(2)Rate and reference frames are identical. control algorithm is "TM5".(3)The quantization type is 7.Conduct block-based full search on low bit-resolution "H263".Motion estimation was carried out with integer- representations,using h to evaluate matching differences pel accuracy,using a block size of 16 x 16 pixels and 8 x 8 pixels in a search range of +16 pixels.In addition,the parameters 4 and R of our bit transform are respectively set 6.Experimental results and discussions to 0.625 and 4 which are determined through experi- ments.The quality of decoded video is evaluated by the To verify the effectiveness of our bit transform in Peak Signal-to-Noise Ratio (PSNR).We also compared the motion estimation,extensive experiments were conducted results against several representative methods,including
4. Use e1 (Eq. (7)) to solve initial thresholds fTjg. 5. Compute ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q by ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi s2 c þs2 d q ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi js2 cf s2 rf j q . 6. Employ Algorithm 2 to calculate fT″ jg and perform the bit transform on current and reference frames. Note that the fuzzy quantization thresholds of the current and reference frames are identical. 7. Conduct block-based full search on low bit-resolution representations, using h to evaluate matching differences. 6. Experimental results and discussions To verify the effectiveness of our bit transform in motion estimation, extensive experiments were conducted on the first 90 frames of several test sequences in various formats, which are listed in Table 1. The experimental platform is MPEG-4 verification software with version number “Microsoft-FDAM1-2.3-001213”. The parameters are set as default configuration except the ones below: (1) GOP size is 16 with 1 I-frame and 15 P-frames. (2) Rate control algorithm is “TM5”. (3) The quantization type is “H263”. Motion estimation was carried out with integerpel accuracy, using a block size of 16 16 pixels and 8 8 pixels in a search range of 716 pixels. In addition, the parameters λ and R of our bit transform are respectively set to 0.625 and 4 which are determined through experiments. The quality of decoded video is evaluated by the Peak Signal-to-Noise Ratio (PSNR). We also compared the results against several representative methods, including Fig. 3. Comparisons of the motion-compensated 3rd frame produced by our algorithm with those by previous methods at the bit-rate of 128 kbps. The macroblock size is 16 16 pixels. (a) Original 3rd frame. (b) 1BT, PSNR¼22.17. (c) MT-2BT, PSNR¼22.11. (d) AQ-2BT, PSNR¼22.19. (e) 8BT, PSNR¼22.48. (f) FQ-2BT, PSNR¼22.34. C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447 1441
1442 C-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 a b 1 88日123 0年123 45678910 4567886 121314151617 12.3146 1819202122 8920212; ●7282 $分mT2万 器7面万2写 456789 4587897 1t234也形 n2314面67 89202T22 8920272 728 728 e 后072了 西50i2写 4667890 256789酒 112134历6 112131456 18920212 8920212 528 728 Fig.4.Comparisons of the motion-compensated 3rd frame produced by our algorithm with those by previous methods at the bit-rate of 128 kbps.The macroblock size is 8 x 8 pixels.(a)Original 3rd frame.(b)1BT,PSNR=21.80.(c)MT-2BT,PSNR=21.97.(d)AQ-2BT.PSNR=22.11.(e)8BT.PSNR=22.79.(f) FQ-2BT.PSNR=22.34. filtering-based one-bit full search [28](1BT).adaptive bits per second).Figs.3 and 4 illustrate the reconstructed quantization-based two-bit full search [23](AQ-2BT). 3rd frames of"Mobile Calendar"sequence using various multithresholding-based two-bit full search [4](MT-2BT) motion estimation algorithms with a block size of 16 x 16 and eight-bit full search with SAD matching metric(8BT). and 8 x 8 pixels,respectively. Since state-of-the-art literatures research only one-or From Fig.3,we can find that the reconstructed frames by two-bit motion estimation,the experimental results of 1BT and MT-2BT algorithms present the most serious ringing our algorithm(FQ-2BT)are not discussed in the case of artifacts at edges.Several numbers on the calendar,e.g.10,17. three-and higher-bit representation in this study. and 24,are very blurred even missing.The result by AQ-2BT algorithm suffers less ringing artifact around number bor- 6.1.Subjective quality comparison ders,but still has number"17"lost and one fake edge at the left side of the calendar.Although the PSNR of FO-2BT is In the first experiment,we compared the subjective 0.14 dB lower than that of 8BT,FQ-2BT and 8BT algorithms quality of decoded frames by our algorithm with those by contribute similar subjective quality which displays clearer representative algorithms at the bit-rate of 128 kbps(kilo numbers than those of 1BT,MT-2BT,and AQ-2BT
filtering-based one-bit full search [28] (1BT), adaptive quantization-based two-bit full search [23] (AQ-2BT), multithresholding-based two-bit full search [4] (MT-2BT), and eight-bit full search with SAD matching metric (8BT). Since state-of-the-art literatures research only one- or two-bit motion estimation, the experimental results of our algorithm (FQ-2BT) are not discussed in the case of three-and higher-bit representation in this study. 6.1. Subjective quality comparison In the first experiment, we compared the subjective quality of decoded frames by our algorithm with those by representative algorithms at the bit-rate of 128 kbps (kilo bits per second). Figs. 3 and 4 illustrate the reconstructed 3rd frames of “Mobile & Calendar” sequence using various motion estimation algorithms with a block size of 16 16 and 8 8 pixels, respectively. From Fig. 3, we can find that the reconstructed frames by 1BT and MT-2BT algorithms present the most serious ringing artifacts at edges. Several numbers on the calendar, e.g. 10, 17, and 24, are very blurred even missing. The result by AQ-2BT algorithm suffers less ringing artifact around number borders, but still has number “17” lost and one fake edge at the left side of the calendar. Although the PSNR of FQ-2BT is 0.14 dB lower than that of 8BT, FQ-2BT and 8BT algorithms contribute similar subjective quality which displays clearer numbers than those of 1BT, MT-2BT, and AQ-2BT. Fig. 4. Comparisons of the motion-compensated 3rd frame produced by our algorithm with those by previous methods at the bit-rate of 128 kbps. The macroblock size is 8 8 pixels. (a) Original 3rd frame. (b) 1BT, PSNR¼21.80. (c) MT-2BT, PSNR¼21.97. (d) AQ-2BT, PSNR¼22.11. (e) 8BT, PSNR¼22.79. (f) FQ-2BT, PSNR¼22.34. 1442 C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447
C.-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 1443 Table 2 Average PSNR(dB)of various reconstructed sequences by each algorithm with a block size of 16 x 16 pixels Sequence Bit-rate(kbps) 8BT 1BT MT-2BT AQ-2BT NUQ-2BT FQ-2BT Flower 128 23.09 22.81 22.88 22.89 22.87 23.00 384 23.14 22.86 22.92 22.94 22.94 23.06 500 23.39 22.92 22.97 23.12 2315 23.39 Football 128 28.60 26.92 27.85 27.75 27.77 27.96 384 29.03 26.99 27.92 27.88 27.87 28.05 500 3019 27.13 28.52 28.81 28.63 28.95 Mobile 128 22.17 21.76 21.81 21.88 22.02 22.04 384 22.22 21.82 21.85 2194 22.06 22.09 500 22.26 21.85 21.88 21.97 22.10 22.13 Paris 128 25.03 24.35 24.56 24.60 24.75 24.76 384 26.64 24.97 25.42 25.59 26.07 26.12 500 27.85 26.27 26.82 26.98 27.32 27.37 Stefan 128 24.14 23.59 23.83 23.76 23.87 23.87 384 24.36 23.67 23.92 23.86 23.95 23.98 500 24.99 24.04 24.42 24.46 24.55 24.61 Tennis 128 27.50 25.96 26.72 26.76 26.82 26.85 384 28.48 26.28 27.12 2731 27.50 27.55 500 29.26 27.06 27.79 28.18 28.25 28.33 Tempete 128 24.32 23.82 23.92 23.96 24.03 24.09 384 24.98 2412 24.18 24.38 24.55 24.66 500 25.69 24.83 24.92 25.10 25.27 25.36 City 1024 27.64 25.94 26.18 2626 26.74 26.81 1500 29.12 27.12 27.21 27.64 28.08 28.23 2048 30.23 28.74 28.83 2926 29.38 29.49 Crew 1024 30.83 28.87 29.51 29.46 29.60 29.69 1500 32.34 29.63 30.48 30.65 31.04 31.08 2048 33.61 31.29 32.11 32.25 32.44 32.47 Harbour 1024 26.10 25.22 25.36 25.56 25.67 25.69 1500 27.17 25.48 25.65 26.14 26.36 26.46 2048 26.79 26.79 27.02 27.62 27.67 27.80 Carphone 27.48 26.22 26.55 26.61 26.88 26.90 4 28.53 26.29 26.67 26.98 27.57 27.62 12 32.28 29.48 30.55 31.11 31.33 21.40 Grandma 2 28.94 27.63 28.02 28.12 28.13 28.26 64 31.32 28.20 28.79 3022 30.09 30.41 128 35.14 33.18 34.20 34.74 34.29 34.38 In contrast,all algorithms achieve less ringing artifacts of the bit transform only with non-uniform interval in Fig.4 than those in Fig.3.This indicates that motion partitioning (NUQ-2BT).Table 2 shows the performance estimation with a block size of 8 x 8 pixels gains better comparison results for a block size of 16 x 16 pixels,while prediction for image details than that with 16 x 16 macro- Table 3 lists the results for a block size of 8x8 pixels. blocks.However,low bit-resolution pixels always present Figs.5-8 illustrate the PSNR curves of decoded "Flower", poorer details than eight-bit depth pixels.so that the "Paris","Harbour",and "Grandma"sequences by six motion estimation with small macroblocks tends to incur motion estimation algorithms.As can be seen from aperture effect and even ubiquitous matching.Hence, Tables 2 and 3,the proposed bit transform and its motion there is a slight drop in the PSNR of low bit-resolution estimation are superior to other bit transforms and low motion estimation algorithms,except for our algorithm. bit-resolution motion estimation methods including the The PSNR decrease of 1BT,MT-2BT,and AQ-2BT is sepa- NUQ-2BT in terms of average PSNR.Moreover,the perfor- rately 0.37 dB.0.14 dB,and 0.08 dB.It demonstrates that mance of our algorithm is very stable on each frame our bit transform reaches the minimum information loss demonstrated in Figs.5-8. and better preserves the details and texture features of Among four previous low bit-resolution methods,1BT original video frames among all bit transform methods, performs worst whose average PSNR is 1.88 dB lower than obtaining high motion-compensation efficiency. that of 8BT algorithm,since one-bit representation cannot provide rich enough pattern features for block matching. 6.2.Objective quality comparison Two-bit depth pixels contain more detailed features of video scenes.The MT-2BT algorithm thus leads to a PSNR In the second experiment,we compared our bit trans- of up to 0.56 dB higher than 1BT,which is 1.32 dB lower form with previous methods in terms of PSNR.To better than 8BT.Nevertheless,the bit transform of MT-2BT understand the two individual steps of our algorithm, adopts varying thresholds for different macroblocks.This namely the non-uniform interval partitioning and the will inevitably produce discontinuities at the bounda- threshold refinement,we also analyzed the performance ries of neighboring macroblocks in one search window
In contrast, all algorithms achieve less ringing artifacts in Fig. 4 than those in Fig. 3. This indicates that motion estimation with a block size of 8 8 pixels gains better prediction for image details than that with 16 16 macroblocks. However, low bit-resolution pixels always present poorer details than eight-bit depth pixels, so that the motion estimation with small macroblocks tends to incur aperture effect and even ubiquitous matching. Hence, there is a slight drop in the PSNR of low bit-resolution motion estimation algorithms, except for our algorithm. The PSNR decrease of 1BT, MT-2BT, and AQ-2BT is separately 0.37 dB, 0.14 dB, and 0.08 dB. It demonstrates that our bit transform reaches the minimum information loss and better preserves the details and texture features of original video frames among all bit transform methods, obtaining high motion-compensation efficiency. 6.2. Objective quality comparison In the second experiment, we compared our bit transform with previous methods in terms of PSNR. To better understand the two individual steps of our algorithm, namely the non-uniform interval partitioning and the threshold refinement, we also analyzed the performance of the bit transform only with non-uniform interval partitioning (NUQ-2BT). Table 2 shows the performance comparison results for a block size of 16 16 pixels, while Table 3 lists the results for a block size of 8 8 pixels. Figs. 5–8 illustrate the PSNR curves of decoded “Flower”, “Paris”, “Harbour”, and “Grandma” sequences by six motion estimation algorithms. As can be seen from Tables 2 and 3, the proposed bit transform and its motion estimation are superior to other bit transforms and low bit-resolution motion estimation methods including the NUQ-2BT in terms of average PSNR. Moreover, the performance of our algorithm is very stable on each frame demonstrated in Figs. 5–8. Among four previous low bit-resolution methods, 1BT performs worst whose average PSNR is 1.88 dB lower than that of 8BT algorithm, since one-bit representation cannot provide rich enough pattern features for block matching. Two-bit depth pixels contain more detailed features of video scenes. The MT-2BT algorithm thus leads to a PSNR of up to 0.56 dB higher than 1BT, which is 1.32 dB lower than 8BT. Nevertheless, the bit transform of MT-2BT adopts varying thresholds for different macroblocks. This will inevitably produce discontinuities at the boundaries of neighboring macroblocks in one search window. Table 2 Average PSNR (dB) of various reconstructed sequences by each algorithm with a block size of 16 16 pixels. Sequence Bit-rate (kbps) 8BT 1BT MT-2BT AQ-2BT NUQ-2BT FQ-2BT Flower 128 23.09 22.81 22.88 22.89 22.87 23.00 384 23.14 22.86 22.92 22.94 22.94 23.06 500 23.39 22.92 22.97 23.12 23.15 23.39 Football 128 28.60 26.92 27.85 27.75 27.77 27.96 384 29.03 26.99 27.92 27.88 27.87 28.05 500 30.19 27.13 28.52 28.81 28.63 28.95 Mobile 128 22.17 21.76 21.81 21.88 22.02 22.04 384 22.22 21.82 21.85 21.94 22.06 22.09 500 22.26 21.85 21.88 21.97 22.10 22.13 Paris 128 25.03 24.35 24.56 24.60 24.75 24.76 384 26.64 24.97 25.42 25.59 26.07 26.12 500 27.85 26.27 26.82 26.98 27.32 27.37 Stefan 128 24.14 23.59 23.83 23.76 23.87 23.87 384 24.36 23.67 23.92 23.86 23.95 23.98 500 24.99 24.04 24.42 24.46 24.55 24.61 Tennis 128 27.50 25.96 26.72 26.76 26.82 26.85 384 28.48 26.28 27.12 27.31 27.50 27.55 500 29.26 27.06 27.79 28.18 28.25 28.33 Tempete 128 24.32 23.82 23.92 23.96 24.03 24.09 384 24.98 24.12 24.18 24.38 24.55 24.66 500 25.69 24.83 24.92 25.10 25.27 25.36 City 1024 27.64 25.94 26.18 26.26 26.74 26.81 1500 29.12 27.12 27.21 27.64 28.08 28.23 2048 30.23 28.74 28.83 29.26 29.38 29.49 Crew 1024 30.83 28.87 29.51 29.46 29.60 29.69 1500 32.34 29.63 30.48 30.65 31.04 31.08 2048 33.61 31.29 32.11 32.25 32.44 32.47 Harbour 1024 26.10 25.22 25.36 25.56 25.67 25.69 1500 27.17 25.48 25.65 26.14 26.36 26.46 2048 26.79 26.79 27.02 27.62 27.67 27.80 Carphone 32 27.48 26.22 26.55 26.61 26.88 26.90 64 28.53 26.29 26.67 26.98 27.57 27.62 128 32.28 29.48 30.55 31.11 31.33 21.40 Grandma 32 28.94 27.63 28.02 28.12 28.13 28.26 64 31.32 28.20 28.79 30.22 30.09 30.41 128 35.14 33.18 34.20 34.74 34.29 34.38 C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447 1443
1444 C.-M.Song et aL Signal Processing:Image Communication 28 (2013)1435-1447 Table 3 Average PSNR(dB)of various reconstructed sequences by each algorithm with a block size of 8 x 8 pixels Sequence Bit-rate(kbps】 8BT 1BT MT-2BT AQ-2BT NUQ-2BT FQ-2BT Flower 128 23.29 22.47 22.66 22.73 22.63 22.91 384 23.32 22.49 22.68 22.76 22.66 22.94 500 23.34 22.51 22.70 22.78 22.67 22.96 Football 128 29.91 27.06 28.19 28.35 28.07 28.41 384 29.99 27.09 28.25 28.41 28.09 28.46 500 30.03 27.14 2828 28.45 28.12 28.50 Mobile 128 22.65 21.37 21.73 21.84 22.02 22.05 384 22.68 21.39 21.75 21.87 22.05 22.08 500 22.71 21.41 21.77 21.89 22.07 22.10 Paris 128 25.63 24.02 24.38 24.57 24.82 24.83 384 25.81 2417 24.50 24.74 25.01 25.02 500 26.34 24.24 24.59 24.83 2537 25.41 Stefan 128 24.55 23.06 23.67 23.69 23.79 23.80 384 24.62 23.12 23.72 23.75 23.87 28.89 500 24.67 23.15 23.76 23.80 23.91 23.93 Tennis 128 28.55 25.96 26.99 27.12 26.87 27.07 384 28.65 25.98 26.99 27.17 26.96 27.14 500 28.71 26.03 26.99 27.19 27.03 27.19 Tempete 128 24.95 23.43 23.72 23.88 24.00 24.06 384 25.02 23.48 23.68 23.94 24.05 2410 500 25.08 23.51 23.79 23.98 2410 24.15 City 1024 28.52 25.59 26.36 26.53 2684 26.88 1500 28.58 25.64 26.40 26.59 26.89 26.93 2048 28.68 25.68 26.45 26.66 26.95 27.02 Crew 1024 32.48 28.76 29.86 28.42 30.00 30.01 1500 32.56 28.81 29.91 30.36 30.05 30.07 2048 32.63 28.86 29.99 30.43 30.10 30.15 Harbour 1024 27.25 24.86 25.38 24.99 25.86 25.88 1500 27.30 24.90 25.42 25.79 2591 25.92 2048 27.39 24.95 25.49 25.86 25.97 26.00 Carphone 32 28.77 25.97 26.64 26.87 27.08 27.11 28.88 26.05 26.71 26.99 27.14 27.15 128 29.28 26.19 26.95 27.22 28.14 28.15 Grandma 29.73 26.70 27.87 28.10 28.13 28.23 64 29.95 26.81 28.03 28.30 2833 28.41 128 31.89 2712 28.48 29.03 29.49 30.37 255 29 o 8BT (gp)awey 25 0-1BT 28.5 -MT-2BT 24.5 ¥-AQ-2BT NUQ-2BT 24 -0-FQ-2BT 27.5 25 23 26.5 22.5 26 22 一AQ-2BT 25.5 9NU0-2B1 0-F0-28T 21.5 10 20 3040506070 80 25 2 10 20 30405060 7080 90 Frame Number Frame Number Fig.5.PSNR performance comparison among different motion estima- Fig.6.PSNR performance comparison among different motion estima- tion algorithms for "Flower"at the bit-rate of 500 kbps.The macroblock tion algorithms for "Paris"at the bit-rate of 500 kbps.The macroblock size is16×16 pixels. size is16×16 pixels. Furthermore,MT-2BT uses the variance of a video frame to introducing an adaptive quantization strategy and a uni- complete bit transform.Because the encoder's scalar form threshold for blocks in one search window.AQ-2BT quantization always affects pixels'variance,the bit trans- algorithm improves the bit transform quality and achieves form results tend to be inaccurate.Both the two factors of 0.17 dB higher PSNR than that of MT-2BT.But AQ-2BT does MT-2BT mislead its search for optimum matching block so not take scalar quantization noises into consideration as to degrade the encoder's coding efficiency.Through either.Its hard thresholding technique brings about an
Furthermore, MT-2BT uses the variance of a video frame to complete bit transform. Because the encoder's scalar quantization always affects pixels' variance, the bit transform results tend to be inaccurate. Both the two factors of MT-2BT mislead its search for optimum matching block so as to degrade the encoder's coding efficiency. Through introducing an adaptive quantization strategy and a uniform threshold for blocks in one search window, AQ-2BT algorithm improves the bit transform quality and achieves 0.17 dB higher PSNR than that of MT-2BT. But AQ-2BT does not take scalar quantization noises into consideration either. Its hard thresholding technique brings about an Table 3 Average PSNR (dB) of various reconstructed sequences by each algorithm with a block size of 8 8 pixels. Sequence Bit-rate (kbps) 8BT 1BT MT-2BT AQ-2BT NUQ-2BT FQ-2BT Flower 128 23.29 22.47 22.66 22.73 22.63 22.91 384 23.32 22.49 22.68 22.76 22.66 22.94 500 23.34 22.51 22.70 22.78 22.67 22.96 Football 128 29.91 27.06 28.19 28.35 28.07 28.41 384 29.99 27.09 28.25 28.41 28.09 28.46 500 30.03 27.14 28.28 28.45 28.12 28.50 Mobile 128 22.65 21.37 21.73 21.84 22.02 22.05 384 22.68 21.39 21.75 21.87 22.05 22.08 500 22.71 21.41 21.77 21.89 22.07 22.10 Paris 128 25.63 24.02 24.38 24.57 24.82 24.83 384 25.81 24.17 24.50 24.74 25.01 25.02 500 26.34 24.24 24.59 24.83 25.37 25.41 Stefan 128 24.55 23.06 23.67 23.69 23.79 23.80 384 24.62 23.12 23.72 23.75 23.87 28.89 500 24.67 23.15 23.76 23.80 23.91 23.93 Tennis 128 28.55 25.96 26.99 27.12 26.87 27.07 384 28.65 25.98 26.99 27.17 26.96 27.14 500 28.71 26.03 26.99 27.19 27.03 27.19 Tempete 128 24.95 23.43 23.72 23.88 24.00 24.06 384 25.02 23.48 23.68 23.94 24.05 24.10 500 25.08 23.51 23.79 23.98 24.10 24.15 City 1024 28.52 25.59 26.36 26.53 26.84 26.88 1500 28.58 25.64 26.40 26.59 26.89 26.93 2048 28.68 25.68 26.45 26.66 26.95 27.02 Crew 1024 32.48 28.76 29.86 28.42 30.00 30.01 1500 32.56 28.81 29.91 30.36 30.05 30.07 2048 32.63 28.86 29.99 30.43 30.10 30.15 Harbour 1024 27.25 24.86 25.38 24.99 25.86 25.88 1500 27.30 24.90 25.42 25.79 25.91 25.92 2048 27.39 24.95 25.49 25.86 25.97 26.00 Carphone 32 28.77 25.97 26.64 26.87 27.08 27.11 64 28.88 26.05 26.71 26.99 27.14 27.15 128 29.28 26.19 26.95 27.22 28.14 28.15 Grandma 32 29.73 26.70 27.87 28.10 28.13 28.23 64 29.95 26.81 28.03 28.30 28.33 28.41 128 31.89 27.12 28.48 29.03 29.49 30.37 Fig. 5. PSNR performance comparison among different motion estimation algorithms for “Flower” at the bit-rate of 500 kbps. The macroblock size is 16 16 pixels. Fig. 6. PSNR performance comparison among different motion estimation algorithms for “Paris” at the bit-rate of 500 kbps. The macroblock size is 16 16 pixels. 1444 C.-M. Song et al. / Signal Processing: Image Communication 28 (2013) 1435–1447