第11卷第1期 智能系统学报 Vol.11 No.1 2016年2月 CAAI Transactions on Intelligent Systems Feh.2016 D0I:10.11992/is.201507073 网络出版地址:htp:/www.cmki.net/kcms/detail/23.1538.tp.201509030.1456.002.html Efficient tracker based on sparse coding with Euclidean local structure-based constraint WANG Hongyuan',ZHANG Ji',CHEN Fuhua2 (1.School of Information Science and Engineering,Changzhou University,Changzhou,Jiangsu,China 213164;2.Department of Nat- ural Science and Mathematics,West Liberty University,West Virginia,United States 26074) Abstract:Sparse coding (SC)based visual tracking(I-tracker)is gaining increasing attention,and many related algorithms are developed.In these algorithms,each candidate region is sparsely represented as a set of target tem- plates.However,the structure connecting these candidate regions is usually ignored.Lu proposed an NLSSC-tracker with non-local self-similarity sparse coding to address this issue,which has a high computational cost.In this study, we propose an Euclidean local-structure constraint based sparse coding tracker with a smoothed Euclidean local structure.With this tracker,the optimization procedure is transformed to a small-scale I,-optimization problem,sig- nificantly reducing the computational cost.Extensive experimental results on visual tracking demonstrate the effectiveness and efficiency of the proposed algorithm. Keywords:euclidean local-structure constraint;I,-tracker;sparse coding;target tracking CLC Number:TP18;TP301.6 Document Code:A Article ID:1673-4785(2016)01-0136-12 Citation:WANG Hongyuan,ZHANG Ji,CHEN Fuhua.Efficient tracker based on sparse coding with Euclidean local struc- ture-based constraint[J].CAAI Transactions on Intelligent Systems,2016,11(1):136-147. Recently,visual target tracking was widely used Based on sparse coding (SC;also referred to as in security surveillance,navigation,human-computer sparse sensing or compressive sensing)(7,Mei pro- interaction,and other applications(2.In a video se- posed an I-tracker for generative trackings,ad- quence,targets for tracking often change dynamically dressing occlusion,corruption,and some other chal- and uncertainly because of disturbance phenomena lenging issues.However,this tracker incurs a very such as occlusion,noisy and varying illumination,and high computational cost to achieve efficient tracking object appearance.Many tracking algorithms were pro- (see section 2.1 and Fig.1 for details),and the local posed in the last twenty years that can be divided into structures of similar regions are ignored,which may two categories:generative tracking and discriminant cause the instability and even failure of the I-tracker. tracking algorithms Generative algorithms (e.g., Indeed,the sparse coefficients,for representing six eigen tracker,mean-shift tracker,incremental tracker, similar regions (CR-CR)under ten template regions covariance tracker[2])adopt appearance models to ex- (T-To)with original l,-tracker,are diversified (Fig. press the target observations,whereas discriminant al- 3).Considering CR,and CR,for example,we can gorithms (e.g.,TLD3],ensemble tracking,and see that although the latter is almost the partial occlu- MILTrack[s])view tracking as a classification prob- sion version of the former,their sparse representations lem,thus attempting to distinguish the target from the are very different.Tracking CR(the woman's face) backgrounds.Here,we present a new generative algo- may fail,because the tracker is likely to incorrectly rithm. consider the region Ts(the book)as its target. Received Date:2015-07-31.Online Pulication:2015-09-30. Contrary to expectations,Xu proved that a sparse Foundation Item:National Natural Foundation of China under Grant (61572085,61502058). algorithm cannot be stable and that similar signals may Corresponding Author:Hongyuan Wang.E-mail:hywang@cczu.edu.cn. not exhibit similar sparse coefficients Thus,a
第 11 卷第 1 期 智 能 系 统 学 报 Vol.11 №.1 2016 年 2 月 CAAI Transactions on Intelligent Systems Feb. 2016 DOI:10.11992 / tis.201507073 网络出版地址:http: / / www.cnki.net / kcms/ detail / 23.1538.tp.201509030.1456.002.html Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint WANG Hongyuan 1 , ZHANG Ji 1 , CHEN Fuhua 2 (1. School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu, China 213164; 2. Department of Nat⁃ ural Science and Mathematics, West Liberty University, West Virginia, United States 26074) Abstract:Sparse coding (SC) based visual tracking (l 1 ⁃tracker) is gaining increasing attention, and many related algorithms are developed. In these algorithms, each candidate region is sparsely represented as a set of target tem⁃ plates. However, the structure connecting these candidate regions is usually ignored. Lu proposed an NLSSC⁃tracker with non⁃local self⁃similarity sparse coding to address this issue, which has a high computational cost. In this study, we propose an Euclidean local⁃structure constraint based sparse coding tracker with a smoothed Euclidean local structure. With this tracker, the optimization procedure is transformed to a small⁃scale l 1 ⁃optimization problem, sig⁃ nificantly reducing the computational cost. Extensive experimental results on visual tracking demonstrate the effectiveness and efficiency of the proposed algorithm. Keywords:euclidean local⁃structure constraint; l 1 ⁃tracker; sparse coding; target tracking CLC Number:TP18; TP301.6 Document Code:A Article ID:1673⁃4785(2016)01⁃0136⁃12 Citation:WANG Hongyuan, ZHANG Ji, CHEN Fuhua. Efficient tracker based on sparse coding with Euclidean local struc⁃ ture⁃based constraint[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1): 136⁃147. Received Date:2015⁃07⁃31. Online Pulication:2015⁃09⁃30. Foundation Item: National Natural Foundation of China under Grant (61572085,61502058). Corresponding Author:Hongyuan Wang. E⁃mail: hywang@ cczu.edu.cn. Recently, visual target tracking was widely used in security surveillance, navigation, human⁃computer interaction, and other applications [1⁃2] . In a video se⁃ quence, targets for tracking often change dynamically and uncertainly because of disturbance phenomena such as occlusion, noisy and varying illumination, and object appearance. Many tracking algorithms were pro⁃ posed in the last twenty years that can be divided into two categories: generative tracking and discriminant tracking algorithms [1⁃2] . Generative algorithms ( e. g., eigen tracker, mean⁃shift tracker, incremental tracker, covariance tracker [2] ) adopt appearance models to ex⁃ press the target observations, whereas discriminant al⁃ gorithms ( e. g., TLD [3] , ensemble tracking [4] , and MILTrack [5] ) view tracking as a classification prob⁃ lem, thus attempting to distinguish the target from the backgrounds. Here, we present a new generative algo⁃ rithm. Based on sparse coding ( SC; also referred to as sparse sensing or compressive sensing) [6⁃7] , Mei pro⁃ posed an l 1 ⁃tracker for generative tracking [8⁃9] , ad⁃ dressing occlusion, corruption, and some other chal⁃ lenging issues. However, this tracker incurs a very high computational cost to achieve efficient tracking (see section 2.1 and Fig.1 for details), and the local structures of similar regions are ignored, which may cause the instability and even failure of the l 1 ⁃tracker. Indeed, the sparse coefficients, for representing six similar regions (CR1 -CR6 ) under ten template regions ( T1 -T10 ) with original l 1 ⁃tracker, are diversified (Fig. 3). Considering CR1 and CR4 , for example, we can see that although the latter is almost the partial occlu⁃ sion version of the former, their sparse representations are very different. Tracking CR4 ( the womans face) may fail, because the tracker is likely to incorrectly consider the region T8(the book) as its target. Contrary to expectations, Xu proved that a sparse algorithm cannot be stable and that similar signals may not exhibit similar sparse coefficients [10] . Thus, a
1 WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint ·137. trade-off occurs between sparsity and stability when de- 1 Related works signing a learning algorithm.In addition,instability in the I,-optimization problem affects the performance of 1.1 Sparse coding and the I,-tracker the 1-tracker. Sparse coding is an attractive signal reconstruction Lu developed a NLSSS-tracker NLSSST)based method proposed by Candes that reconstructs a sig- on SC applying a non-local self-similarity constraint by nal y R"x with an over-complete dictionary D introducing the geometrical information of the set of R)with a sparse coefficient vectoreR.The candidates as a smoothing term to alleviate the instabil- SC formulation can be written as the lo-norm-constrain- ity of the -tracker However,its low efficiency (e- ed optimization problem as follows: ven slower than the original l-tracker,Table 4)re- min,lly-Dcli+all cllo (1) which is NP-hard,where‖·‖edenotes the vector's stricts its applicability in real-time tracking.In this Frobenius norm(i.e.,l2-nom),and‖·‖locounts study,motivated by the robustness of the I-tracker and the number of non-zero elements of the vector.Candes stability of NISSST,we propose a novel tracker, proved that the I-norm Il is the tightest upper called ELSS-tracker (ELSST),that is both robust and bound of the lo-nom‖·Io,and thus,Eq.(1)can efficient.The main contributions of this study are as be rewritten as the following I-optimization prob- follows: lem[67]: 1)An efficient tracker,i.e.,ELSST,is developed min,lly-Dcl+a‖lcli (2) by considering the local structure of the set of target Based on SC,Mei presented a nice I-tracker for candidates.In contrast to the Lu5s and Mei5s-track- robust trackingFig.1).Considering that the target e our tracker is more stable and sparse. is located in the latest frame,the I-tracker is initial- ized in the new arrival frame and N candidate regions 2)The proposed tracker shows excellent perform- are generated with Bayesian inference (Fig.la,b). ance in tracking different video sequences with regard With n templates learned from previous tracking and to scale,occlusion,pose variations,background clut- 2m trivial templates (m positive ones and m negative ter,and illumination changes. ones,where m is the dimension of 1D stretched image, The rest of this study is organized as follows:I- Fig.Ic),Eq.(2)can be solved Fig.1d,e,f).With and NLSSS-tracker are introduced in section 2:in sec- positive and negative trivial templates,Mei added a tion 3,we analyze the disadvantages of these two track- non-negative constraint c0 in Eq.(2),with which ers and propose our tracker;experimental results with the reconstruction errors of all candidate regions with our tracker and four comparison algorithms are reported SC coefficients can be used to determine the weights for each candidate,and the object in the new arrival frame in section 4;the conclusion and future work are sum- can be located with the sum of the weighted candidates. marized in section 5. The dictionaries updating strategies can be seen in target template T Trival Template/ (a)New arrival frame (b)N candidate (c)Over-complete regions dictionary Cr∈RW1 C∈RW C∈RmN cwith sparity constraint minllcll (d)Sample (e)Dictionaries (f)Coefficient of represention Fig.1 Original I-tracker algorithm
trade⁃off occurs between sparsity and stability when de⁃ signing a learning algorithm. In addition, instability in the l 1 ⁃optimization problem affects the performance of the l 1 ⁃tracker. Lu developed a NLSSS⁃tracker (NLSSST) based on SC applying a non⁃local self⁃similarity constraint by introducing the geometrical information of the set of candidates as a smoothing term to alleviate the instabil⁃ ity of the l 1 ⁃tracker [11] . However, its low efficiency (e⁃ ven slower than the original l 1 ⁃tracker, Table 4) re⁃ stricts its applicability in real⁃time tracking. In this study, motivated by the robustness of the l 1 ⁃tracker and stability of NLSSST, we propose a novel tracker, called ELSS⁃tracker (ELSST), that is both robust and efficient. The main contributions of this study are as follows: 1)An efficient tracker, i.e., ELSST, is developed by considering the local structure of the set of target candidates. In contrast to the Lu5s [11] and Mei5s⁃track⁃ er [8⁃9] , our tracker is more stable and sparse. 2) The proposed tracker shows excellent perform⁃ ance in tracking different video sequences with regard to scale, occlusion, pose variations, background clut⁃ ter, and illumination changes. The rest of this study is organized as follows: l 1 ⁃ and NLSSS⁃tracker are introduced in section 2; in sec⁃ tion 3, we analyze the disadvantages of these two track⁃ ers and propose our tracker; experimental results with our tracker and four comparison algorithms are reported in section 4; the conclusion and future work are sum⁃ marized in section 5. 1 Related works 1.1 Sparse coding and the l 1 ⁃tracker Sparse coding is an attractive signal reconstruction method proposed by Candes [6⁃7] that reconstructs a sig⁃ nal y ∈ R m×1 with an over⁃complete dictionary D ∈ R m×(n+2m) with a sparse coefficient vectorc ∈ R n×1 . The SC formulation can be written as the l 0 ⁃norm⁃constrain⁃ ed optimization problem as follows: minc‖y - Dc‖2 F + α‖c‖0 (1) which is NP⁃hard, where ‖·‖F denotes the vector’s Frobenius norm ( i. e., l 2 ⁃norm), and ‖·‖0 counts the number of non⁃zero elements of the vector. Candes proved that the l 1 ⁃norm ‖·‖1 is the tightest upper bound of the l 0 ⁃norm ‖·‖0 , and thus, Eq.(1) can be rewritten as the following l 1 ⁃optimization prob⁃ lem [6⁃7] : minc ‖y - Dc‖2 F + α ‖c‖1 (2) Based on SC, Mei presented a nice l 1 ⁃tracker for robust tracking [8⁃9] (Fig. 1). Considering that the target is located in the latest frame, the l 1 ⁃tracker is initial⁃ ized in the new arrival frame and N candidate regions are generated with Bayesian inference ( Fig. 1a, b). With n templates learned from previous tracking and 2m trivial templates (m positive ones and m negative ones, where m is the dimension of 1D stretched image, Fig. 1c), Eq.(2) can be solved (Fig. 1d,e,f). With positive and negative trivial templates, Mei added a non⁃negative constraint c≥0 in Eq. (2), with which the reconstruction errors of all candidate regions with SC coefficients can be used to determine the weights for each candidate, and the object in the new arrival frame can be located with the sum of the weighted candidates. The dictionaries updating strategies can be seen in [8⁃9] . Fig.1 Original l 1 ⁃tracker algorithm 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·137·
·138. 智能系统学报 第11卷 1.2 Non-local self-similarity based sparse coding rameter enforcing similarity,and s,is the normalization for tracking (NLSSST) factor.The weight measures the similarity between Recently,Xu indicated the trade-off between the K-neighborhood of y;and y:.Lu's algorithm actual- sparsity and stability in sparse regularized algo- ly attempts to solve the following: rithms[].Moreover,Yang pointed out the same A-op- timization issue in pattern classification Based on minc I Y-DC+a cll+B II C-CWIl the fact that lots of similar regions exist in all N candi- (4) dates generated by Bayesian inference,Lu proposed Taking the solution of the I-tracker from Eq.(2) his tracker with the non-local self-similarity constraint as the initial coefficients co,Eq.(4)can be solved as 含s-含6P=1c-cw1 through iterative computations.However,the high (3) computational cost of the original I,-tracker and itera- i=1 where c;and c,are the sparse coefficients corresponding tive procedure for maintaining the neighborhood con- to the candidate regions y;and y,respectively,and w straints of sparse coefficients make NLSSST inefficient is the weight assigned to c.Given N m-dimensional in achieving real-timing tracking.In contrast to Fig.1, candidates Y=[y.yx]E RTxN,the first K-closest the schematic diagram of NLSSST presented in Fig.2, candidate point aroundy,is denoted by N(y),and includes an additional neighborhood constraint between 1_Iwy-W2☑ the weight w=一e where h is a pa- y;and N(y:). target template T trival template t 图图 (a)N Candidate regions (b)Over-complete dictionary Cr∈RxN C+∈RmxN C∈RmxN with sparity yNx()y\{y,N(y》 constraint min c (c)Sample (d)Dictionaries (e)Coefficient of representation Fig.2 Lu's NLSSST Algorithm 2 Euclidean local structure-based sparse known that the 12-norm is much more commonly used for measuring the distance between two vectors and is coding for tracking (ELSST) much easier to optimize than the I-norm.Thus,we To circumvent the heavy computation burden of take the former to measure the relationships between the 1-tracker and NISSST (Table 4),we propose an the sparse coefficient vectors,which are close to each efficient tracker,called ELSST,that considers the lo- other,i.e.,the Euclidean local-structure constraint, cal Euclidean structures of the candidates. and the latter I-norm of C to maintain the sparsity of 2.1 Original euclidean local structure constraint the optimization as follows: sparse coding Original ELSSC) It is evident from Eq.(4)that NLSSST attempts minc I Y-DC+a C+B I C-CWll to solve a double I-norm problem.However,it is well (5)
1.2 Non⁃local self⁃similarity based sparse coding for tracking (NLSSST) Recently, Xu indicated the trade⁃off between sparsity and stability in sparse regularized algo⁃ rithms [10] . Moreover, Yang pointed out the same A⁃op⁃ timization issue in pattern classification [12] . Based on the fact that lots of similar regions exist in all N candi⁃ dates generated by Bayesian inference, Lu proposed his tracker with the non⁃local self⁃similarity constraint as ∑ n i = 1 ci - ∑ K j = 1 wji cj 2 = ‖C - CW‖2 F (3) where ci and cj are the sparse coefficients corresponding to the candidate regions yi and yj, respectively, and wji is the weight assigned to cj . Given N m⁃dimensional candidates Y = y1 … yN [ ] ∈ R m×N , the first K⁃closest candidate point aroundyj is denoted by NK ( yj ), and the weight wji = 1 sj e - ‖NK (y j ) -NK (y i )‖2 h , where h is a pa⁃ rameter enforcing similarity, and sj is the normalization factor. The weight wji measures the similarity between the K⁃neighborhood of yj and yi . Lu’s algorithm actual⁃ ly attempts to solve the following: minC ‖Y - DC‖2 F + α ‖C‖1 + β ‖C - CW‖1 (4) Taking the solution of the l 1 ⁃tracker from Eq.(2) as the initial coefficients c0 , Eq. ( 4) can be solved through iterative computations [11] . However, the high computational cost of the original l 1 ⁃tracker and itera⁃ tive procedure for maintaining the neighborhood con⁃ straints of sparse coefficients make NLSSST inefficient in achieving real⁃timing tracking. In contrast to Fig. 1, the schematic diagram of NLSSST presented in Fig. 2, includes an additional neighborhood constraint between yi and NK(yi). Fig. 2 Lus NLSSST Algorithm 2 Euclidean local structure⁃based sparse coding for tracking (ELSST) To circumvent the heavy computation burden of the l 1 ⁃tracker and NLSSST (Table 4), we propose an efficient tracker, called ELSST, that considers the lo⁃ cal Euclidean structures of the candidates. 2.1 Original euclidean local structure constraint sparse coding (Original ELSSC) It is evident from Eq. (4) that NLSSST attempts to solve a double l 1 ⁃norm problem. However, it is well known that the l 2 ⁃norm is much more commonly used for measuring the distance between two vectors and is much easier to optimize than the l 1 ⁃norm. Thus, we take the former to measure the relationships between the sparse coefficient vectors, which are close to each other, i. e., the Euclidean local⁃structure constraint, and the latter l 1 ⁃norm of C to maintain the sparsity of the optimization as follows: minC ‖Y - DC‖2 F + α ‖C‖1 + β ‖C - CW‖2 F (5) ·138· 智 能 系 统 学 报 第 11 卷
1 WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint ·139. Table 1 Optimization for ELS constraint based SC(ELSSC) Input:Given N data points Y=[yY]ER,over-complete dictionary DR) Output:Sparse matrix C=[ce]R(x Parameters:Maxiumn iteration number J=10,neighborhood size K=5,all-zero vector co,a=0.01,B=0.5,y=0.001 1:For each point y,compute the nearest K neighborhoods N(y)and weights 2:Compute the SVD-decomposition of D UJVT,where VER()(2m) 3:compute ID'D‖,and set‖D'D|+2 B randomly 4:For i=1:N 5:For t=1:J 6:If lec)<,break inner iteration 7:0omie0=,6-,0=[Dy+290,-w+(y-28)c-DD,e-"],md0=VeR 8:Repesent”thparse colficient vector”,ie,optimize子l”-lke”tallo"l, 9:End 10:End Equation (5)is the objective function of our Eu- clidean local structure constraint-based SC and can be +7291a-ol-宁Ie-nl店 solved through iterative computation.In particular,at 1 lI3-6:D9〉+aIe0,-2pc9,-)+ the t-th iteration,for a single candidate y:in Y,Eq. (5)can be written as follows: 子"1-g-9e)+Y,81o+ 2 mineof(c()=min Iy:-De+ a ll e :+B lc)-0)(6) 〈De”,D〉-2ID,l3+BI-wI经= where.At the t-th iteration for the optimization of c,c,j is fixed.Therefore,we can 子1e"3-De)-29(c0,g-)- regard 1)as a constant.To solve Eq.(6),we intro- (y-2B)c.co+(Def Dco)+a llel+ duce the following surrogate function as presented in [11]: (兮%店+7291l+811》- 6o)=分1e-6l-71n0-l目 子Ic”-0+aI0,+R (7) (8) where is convex.According to Daubechies,when where AI-D'D is a strictly positive definite matrix,c,co) 1 =[D'y:+280+(y-28)c-D De-] 2 is strictly convex for any co with respect to c.Hence,in and our experiments,the constant A is set accordingly (A y-28;Table 1).Once the over-complete dictionary D is R=lxl+Y291o1-1l+ fixed,we can derive the following convex objective func- tion from Eq.(7): BIwI-子I” )lyDeare fixed at the t-th iteration.Thus,we can simplify
Table 1 Optimization for ELS constraint based SC(ELSSC) Input:Given N data points Y= [ y1… yN ]∈R m×N ,over⁃complete dictionary D∈R m×(n+2m) Output:Sparse matrix C= [c1… cN ]∈R (n+2m)×N Parameters: Maxiumn iteration number J = 10,neighborhood size K= 5,all⁃zero vector c0 ,α= 0.01,β = 0.5,γ = 0.001 1:For each point yi,compute the nearest K neighborhoods NK(yi) and weights wji 2:Compute the SVD⁃decomposition of D = UΣV T ,where V∈R (n+2m)×(n+2m) 3:compute ‖D TD‖, and set‖D TD‖+2β randomly 4:For i = 1:N 5: For t = 1:J 6: If ‖c (t) i - c (t-1) i ‖2 < τ , break inner iteration 7:Compute θ (t) i = ∑j wji c (t-1) j ,v (t) i = 1 γ [D T yi + 2βθi (t-1) + (γ - 2β)ci t-1 - D TDi c t-1 ] ,and x (t) i =Vv (t) i ∈R (n+2m)×1 8: Represent x (t) i with sparse coefficient vector c (t) i ,i.e.,optimize γ 2 ‖x (t) i -Vc (t) i ‖2 2 +α‖c (t) i ‖1 9: End 10:End Equation (5) is the objective function of our Eu⁃ clidean local structure constraint⁃based SC and can be solved through iterative computation. In particular, at the t⁃th iteration, for a single candidate yi in Y, Eq. (5) can be written as follows: minci (t) f(ci (t) ) = minci (t) ‖yi - Dci (t)‖2 2 + α ‖c (t) i ‖1 + β ‖c (t) i - θ (t-1) i ‖2 2 (6) where θ (t) i = ∑ jwji c (t-1) j . At the t⁃th iteration for the optimization of ci,cj,i ≠ j is fixed. Therefore, we can regard θ (t-1) i as a constant. To solve Eq. (6), we intro⁃ duce the following surrogate function as presented in [11]: ψ(ci,c0) = λ 2 ‖c (t) i - c0‖2 2 - 1 2 ‖Dc (t) i - Dc0‖2 2 (7) where λ is convex. According to Daubechies [13] , when λI - D TD is a strictly positive definite matrix, ψ(ci,c0) is strictly convex for any c0 with respect to ci . Hence, in our experiments, the constant λ is set accordingly ( λ = γ - 2β; Table 1). Once the over⁃complete dictionary D is fixed, we can derive the following convex objective func⁃ tion from Eq. (7): f(c (t) i ) = 1 2 ‖yi - Dc (t) i ‖2 2 + α ‖c (t) i ‖1 + β ‖c (t) i - θ (t-1) i ‖2 2 + γ - 2β 2 ‖c (t) i - c0‖2 2 - 1 2 ‖Dc (t) i - Dc0‖2 2 = 1 2 ‖yi‖2 2 - 〈yi,Dc (t) i 〉 + α ‖c (t) i ‖1 - 2β〈c (t) i ,θ (t-1) i 〉 + γ 2 ‖c (t) i ‖2 2 - (γ - 2β)〈c (t) i ,c0〉 + γ - 2β 2 ‖c0‖2 2 + 〈Dc (t) i ,Dc0 〉 - 1 2 ‖Dc0‖2 2 + β ‖θ (t-1) i ‖2 2 = γ 2 ‖c (t) i ‖2 2 - 〈yi,Dc (t) i 〉 - 2β〈c (t) i ,θ (t-1) i 〉 - (γ - 2β)〈c (t) i ,c0〉 + 〈Dc (t) i ,Dc0〉 + α ‖c (t) i ‖1 + ( 1 2 ‖yi‖2 2 + γ - 2β 2 ‖c0‖2 2 + β ‖θ (t-1) i ‖2 2 ) = γ 2 ‖c (t) i - v (t) i ‖2 2 + α ‖c (t) i ‖1 + R (8) where v (t) i = 1 γ D T yi + 2βθ (t-1) i + (γ - 2β)c t-1 i - D TDc t-1 [ ] and R = 1 2 ‖yi‖2 2 + γ - 2β 2 ‖c0‖ - 1 2 ‖Dc0‖2 2 + β ‖θ (t-1) i ‖2 2 - γ 2 ‖v (t) i ‖2 2 are fixed at the t⁃th iteration. Thus, we can simplify 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·139·
·140· 智能系统学报 第11卷 Eg.(8)as follows: e)=子Ixw-reoI+alel )=+a (9) (12) To solve Eq.(9)using SVD,we decompose the over- where V"=[V'I'-I"]E R"x3,I'denotes the n-or- complete dictionary DR(asD=UV,where dered identity matrix,andx'is the first n rows of x;in U∈RmXm,∑∈Rmx(a+2m)andV∈Ra+2m)x(a+2). Eq.(10). Since V is an orthogonal matrix,Eq.(9)can be re- 2.3 Original and improved ELSSC-tracker written as fc")=子I9-e”+a1c”, Based on the above algorithm,our tracker can be obtained with the framework of the original I,-track- (10) er(Table 2).We need to iteratively solve the large- where x=Vo).Consequently,we can transform the scale l,-optimization problem in Eq.(10)twice,up to optimization problem with the Euclidean local structure three times for each candidate in the algorithm,and constraint in Eq.(6)to a pure l-optimization problem more than five times in NLSSST.The initial sparse in Eq.(10),i.e.,to represent the given signalxwith coefficients co are considered as all-zero vectors and it- sparse coefficientsc under the new dictionary V eratively solve the problem without any I,-optimization R(2m)x(+2m).The procedure of Euclidean local-struc- issues,as in Table 1 in [11].Nevertheless,we find ture constraint based sparse coding (ELSSC)is sum- that,in NLSSST,it is more effective and accurate to marized in Table 1 and is very different from the opti- initialize co as the solution of the I-optimization prob- mization procedure followed for NISSSC],even lem.Therefore,the computation complexity of our though the difference between their objective functions tracker is of the same order of magnitude as that of the seems very small (Eqs.(4)and (5),respectively).1-tracker and NLSSST.When we resize all n 10 tar- 2.2 Improved euclidean local structure constraint gets and N=200 candidate regions to 40 x 40,i.e., sparse coding (Improved ELSSC) m 1 600 Figs.1 and 2),then the over-complete If m in Eq.(10)is large,it is time-consuming to dictionary D is 1 600 x 3 210 and the orthogonal ma- obtain the optimization result c,as that in I-optimiza- trix V is 3 210 x 3 210 in Eq.(10).It is very difficult tion and NISSSC.Fortunately,in terms of SVD and to solve the corresponding I-optimization problem with the structure of D Figs.1 and 2),we have such a D in 1-tracker and NISSST)or V in our D UEV [UEV,I,-I]=[T,I,-I] ELSST). (11) With the improved ELSSC,is the first ten rows where I denotes the m-ordered identity matrix.'is the of and V'consists of the first ten rows and first ten first n rows of V'consists of the first n rows and the columns of V.Thus,each iteration of each candidate first n columns of V,and mn.As a result,when region in ELSST can be reduced from the large-scale constructing the dictionary V in Eq.(10),only the I-optimization problem to a much smaller one because first n rows and first n columns of V must be prepared,of the much smaller scale V E Rox10.To overcome whereas the remaining parts of V are not considered to the problem of occlusions in tracking,the analogous make any contribution to the target templates T.Thus,trivial templates are used to construct the new dictiona- the large scale optimization in Eq.(10)can be re-ryRx,i.e.,a ten-ordered identity matrix and duced to a much smaller one as follows: ten-ordered negative identity matrix
Eq. (8) as follows: f(c (t) i ) = γ 2 ‖c (t) i - v (t) i ‖2 2 + α ‖c (t) i ‖1 (9) To solve Eq. (9) using SVD, we decompose the over⁃ complete dictionary D∈R m×(n+2m) as D = UΣV T ,where U ∈ R m×m ,Σ ∈ R m×(n+2m) and V ∈ R (n+2m) ×(n+2m) . Since V is an orthogonal matrix, Eq. (9) can be re⁃ written as f(c (t) i ) = γ 2 ‖x (t) i - Vc (t) i ‖2 2 + α ‖c (t) i ‖1 (10) where x (t) i = Vv (t) i . Consequently, we can transform the optimization problem with the Euclidean local structure constraint in Eq. (6) to a pure l 1 ⁃optimization problem in Eq. (10), i.e., to represent the given signalx (t) i with sparse coefficientsc (t) i under the new dictionary V ∈ R (n+2m) ×(n+2m) . The procedure of Euclidean local⁃struc⁃ ture constraint based sparse coding (ELSSC) is sum⁃ marized in Table 1 and is very different from the opti⁃ mization procedure followed for NLSSSC [11] , even though the difference between their objective functions seems very small (Eqs. (4) and (5), respectively). 2.2 Improved euclidean local structure constraint sparse coding (Improved ELSSC) If m in Eq. (10) is large, it is time⁃consuming to obtain the optimization result ci, as that in l 1 ⁃optimiza⁃ tion and NLSSSC. Fortunately, in terms of SVD and the structure of D (Figs. 1 and 2), we have D = UΣV T = UΣV′ T [ ,I, - I] = [T,I, - I] (11) where I denotes the m⁃ordered identity matrix. Σ′ is the first n rows of Σ, V′ consists of the first n rows and the first n columns of V, and m≫n. As a result, when constructing the dictionary V in Eq. ( 10), only the first n rows and first n columns of V must be prepared, whereas the remaining parts of V are not considered to make any contribution to the target templates T. Thus, the large scale optimization in Eq. ( 10) can be re⁃ duced to a much smaller one as follows: f(c′ (t) i ) = γ 2 ‖x′ (t) i - V″c′ (t) i ‖2 2 + α ‖c′ (t) i ‖1 , (12) where V″ = [V′ I′ - I″] ∈ R n×3n ,I′ denotes the n⁃or⁃ dered identity matrix, and xi ′ is the first n rows of xi in Eq.(10). 2.3 Original and improved ELSSC⁃tracker Based on the above algorithm, our tracker can be obtained with the framework of the original l 1 ⁃track⁃ er [8⁃9] (Table 2). We need to iteratively solve the large⁃ scale l 1 ⁃optimization problem in Eq. (10) twice, up to three times for each candidate in the algorithm, and more than five times in NLSSST. The initial sparse coefficients c0 are considered as all⁃zero vectors and it⁃ eratively solve the problem without any l 1 ⁃optimization issues, as in Table 1 in [11]. Nevertheless, we find that, in NLSSST, it is more effective and accurate to initialize c0 as the solution of the l 1 ⁃optimization prob⁃ lem. Therefore, the computation complexity of our tracker is of the same order of magnitude as that of the l 1 ⁃tracker and NLSSST. When we resize all n = 10 tar⁃ gets and N = 200 candidate regions to 40 × 40, i.e., m = 1 600 ( Figs. 1 and 2), then the over⁃complete dictionary D is 1 600 × 3 210 and the orthogonal ma⁃ trix V is 3 210 × 3 210 in Eq. (10). It is very difficult to solve the corresponding l 1 ⁃optimization problem with such a D ( in l 1 ⁃tracker and NLSSST) or V ( in our ELSST). With the improved ELSSC, Σ′ is the first ten rows of Σ, and V′ consists of the first ten rows and first ten columns of V. Thus, each iteration of each candidate region in ELSST can be reduced from the large⁃scale l 1 ⁃optimization problem to a much smaller one because of the much smaller scale V′ ∈ R 10×10 . To overcome the problem of occlusions in tracking, the analogous trivial templates are used to construct the new dictiona⁃ ryV″ ∈ R 10×30 , i.e., a ten⁃ordered identity matrix and ten⁃ordered negative identity matrix. ·140· 智 能 系 统 学 报 第 11 卷
1 WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint ·141. Table 2 Euclidean Local Structure based Tracking ELSS-tracker) Input:Given a video stream for tracking,location of the target l in frame #1 Output:Tracking results of each frame 1:Set s=1,select 10 template regions extremely near the target in #1,then resize and stretch them to beTR 2:While not reach the end of the video sequence,ss+1 3:Pick N=200 candidate regions around the latest target location in frame #s,and stretch to be yRx 4:Construct D with T,positive and negative identity matries,likewise in Fig.I and Fig2 5:Solve ELSSC-optimization with Y and D in Tab.1,and denote the optimization result as c 6:Compute the reconstruct errors eVeand the normalized weight where exp(ea) 7:Locate the object for tracking with the weighted sum of all 200 candidate regions and in frame #s 8:Select 10 regions that extremely nearby the object as the new target templates T 9:End 3 Experiments ELSSC were verified.The experiments were designed with the Face sequence in the VOT 2013 benchmark 3.1 Experimental setting datasetis]as follows:six similar regions were repre- In order to evaluate the proposed tracker,experi- sented (CR,..,CR,their means and standard deri- ments on 12 video sequences were conducted,inclu- vations illustrate the similarity)sparsely with template ding Surfer,Dudek,Faceocc2,Animal,Girl,Stone,T=[T,To]R0xio from two regions apart Car,Cup,Face,Juice,Singer,Sunshade,Bike,Car from each other the red region and the green one). Dark,and Jumping These sequences covered al- Evidently,T is over-completed,and the entire diction- most all challenges in tracking,including occlusion ary DR is constructed likewise in Figs.1 (even heavy occlusion),motion blur,rotation,scale and 2. variation,illumination variation,and complex back- The sparse coefficients of CR,.,CR generated ground.For comparison,we used four state-of-the-art with the 1-,the NISSSC-,the original ELSSC-,and algorithms with the same initial positions and the same the improved ELSSC-optimization are plotted in Fig.3. representations of the targets.They were the incremen-In particular,six similar regions have very different tal learning-based tracker (IVT,a common discrimi- representation coefficients,when using the original I- nant tracker)[14],the covariance-based tracker (Cov- optimization problem,which ignores the structure in- Track,a generative tracker on Lie-group)(,the formation between regions.The results of the other tracker (a generative tracking method),and the three algorithms are much more stable,because of NISSST.All the experiments were run on a comput- preservation of the structural information.If two regions er with a 2.67 GHz CPU and a 2 GB memory. are similar to each other,they also have similar sparse The main parameters used in our experiments coefficients.This improves the robustness of tracking; are set as follows:the number of candidate regions otherwise,the tracker may degenerate or even fail to N=200,the number of template regions is n track.CR for example,with I-optimization,can be 10,and the candidates and targets are resized to represented by T2,Ts,T,T,and T1,and the track- 40×40. er may fail to track the top of the book.Meanwhile,ex- 3.2 Experimental results for sparsity and stability perimental results show that,NLSSSC and our two The stability and sparsity of the original sparse ELSSC are sparser than the original 1 -optimization coding,the NISSSC,and the original and improved problem
Table 2 Euclidean Local Structure based Tracking (ELSS⁃tracker) Input:Given a video stream for tracking,location of the target l 1 in frame #1 Output:Tracking results of each frame 1:Set s = 1,select 10 template regions extremely near the target in #1,then resize and stretch them to be T ∈ R 1 600×10 2:While not reach the end of the video sequence,s←s+1 3:Pick N= 200 candidate regions around the latest target location l s⁃1 in frame #s,and stretch to be Y ∈ R 1 600×200 4:Construct D with T, positive and negative identity matries,likewise in Fig.1 and Fig2 5:Solve ELSSC⁃optimization with Y and D in Tab.1,and denote the optimization result as c (s) i 6:Compute the reconstruct errors e (s) i = ‖x (s) i - Vc (s) i ‖2 2 and the normalized weight w (s) i = w (s) i /∑i w (s) i , where w (s) i = exp( - e (s) i / α) 7:Locate the object for tracking with the weighted sum of all 200 candidate regions and w (s) i in frame #s 8:Select 10 regions that extremely nearby the object as the new target templates T 9:End 3 Experiments 3.1 Experimental setting In order to evaluate the proposed tracker, experi⁃ ments on 12 video sequences were conducted, inclu⁃ ding Surfer, Dudek, Faceocc2, Animal, Girl, Stone, Car, Cup, Face, Juice, Singer, Sunshade, Bike, Car Dark, and Jumping [17⁃19] . These sequences covered al⁃ most all challenges in tracking, including occlusion (even heavy occlusion), motion blur, rotation, scale variation, illumination variation, and complex back⁃ ground. For comparison, we used four state⁃of⁃the⁃art algorithms with the same initial positions and the same representations of the targets. They were the incremen⁃ tal learning⁃based tracker ( IVT, a common discrimi⁃ nant tracker) [14] , the covariance⁃based tracker (Cov⁃ Track, a generative tracker on Lie⁃group) [15] , the l 1 ⁃ tracker ( a generative tracking method) [8⁃9] , and the NLSSST [11] .All the experiments were run on a comput⁃ er with a 2.67 GHz CPU and a 2 GB memory. The main parameters used in our experiments are set as follows: the number of candidate regions N = 200, the number of template regions is n = 10, and the candidates and targets are resized to 40 × 40. 3.2 Experimental results for sparsity and stability The stability and sparsity of the original sparse coding, the NLSSSC, and the original and improved ELSSC were verified. The experiments were designed with the Face sequence in the VOT 2013 benchmark dataset [18] as follows: six similar regions were repre⁃ sented (CR1 ,…,CR6 , their means and standard deri⁃ vations illustrate the similarity) sparsely with template T = T1 ,…,T10 [ ] ∈ R 1 600×10 from two regions apart from each other ( the red region and the green one). Evidently, T is over⁃completed, and the entire diction⁃ ary D ∈ R 1 600×3 210 is constructed likewise in Figs. 1 and 2. The sparse coefficients of CR1 ,…, CR6 generated with the l 1 ⁃, the NLSSSC⁃, the original ELSSC⁃, and the improved ELSSC⁃optimization are plotted in Fig. 3. In particular, six similar regions have very different representation coefficients, when using the original l 1 ⁃ optimization problem, which ignores the structure in⁃ formation between regions. The results of the other three algorithms are much more stable, because of preservation of the structural information. If two regions are similar to each other, they also have similar sparse coefficients. This improves the robustness of tracking; otherwise, the tracker may degenerate or even fail to track. CR4 for example, with l 1 ⁃optimization, can be represented by T2 , T8 , T6 , T7 , and T1 , and the track⁃ er may fail to track the top of the book. Meanwhile, ex⁃ perimental results show that, NLSSSC and our two ELSSC are sparser than the original l 1 ⁃optimization problem. 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·141·
.142. 智能系统学报 第11卷 Mean:65.4463 CR2 Mean:747100 Mean:707238 CR d292313 5i25.2076 Std247166 4an22681 Mean.64.199 CR4 Men:56.3894 St30.6914 CRS Sd27.0634CR Sd318258 6 candidate regions(CR1-CR6) 10 template regions(TI-T10) L1-tracker 0.4 NLSSSC-tracke Original ELSSC-tracke 02 100.040.05 006 0.0 0.11 140.07 320100.i10 00700800的 wse co质iens withN忆SSYC'-rakr Fig.3 Comparisions of sparsity and stability with the original 1,-.NLSSSC-.and our ELSSC-optimization.The sparse coefficients only are accurated to the second decimal place. 3.3 Experimental results for visual target tracking large size of candidates based on the definition of in- We evaluate the investigated algorithms compara-tegral image,the feature extraction of these candidates tively,using the center location errors,the average is so fast,that its cost can be ignored),which makes success rates,and the average frames per second.The it robust for occlusion,scale variation,and blur. results are shown in Figs.4&5 and in Tables 3&4.The NISSST and our original and improved trackers all templates of NISSST,the original ELSST,and the im-work well,when the targets are occluded;our two proved ELSST are shown in Fig.4(g-o).Overall,our trackers work even better. original and improved trackers outperform the other For motion blur,our two trackers work better than state-of-the-art algorithms. IVT and the original l-tracker.Moreover,CovTracking For occlusion,five algorithms,except IVT,func- also reveals its ability to handle blur (e.g.,#4,#9, tion satisfactorily,especially at #206,#366 of the and #38 in Fig.4(d,o).In the former sequence,the Dudek sequence in Fig.4(b)(the head in tracking is animal runs and jumps fast (motion blur)with a lot of covered by the hand and glasses),#143,#265,#496 water splashing occlusion),while in the latter,the of the Faceocc2 sequence in Fig.4(c)(the head in man ropes skipping and the camera cannot take the tracking is covered by the book),#85,#108,#433 of clear face of the man.IVT and I,-tracker fail both from the Girl sequence in Fig.4(e)(the head in tracking #4 in Fig.4(d),and never recover after that.Our o- turns right,turns back,and blocks someone else),riginal and improved ELSS lost the target in #31 and and #56,#104,#301 of the Face sequence in Fig.4 #41,then recovered in #33 and #44 Fig.4(d)).In (i)(the head in tracking is also covered by the #12 to #21 and #44 to #71,the improved ELSST works book).After the target recovers from occlusion,these better than original ELSST,CovTracking,I-tracker, five trackers can seek it quickly.IVT works poorly,e- and NLSSST. ven loses the target in #10 of the Girl sequence Fig.5 For rotation and scale variation,our trackers also (e)),because the number of positive and negative perform robustly (Figs.4(a,c,e,g,j)and 5(a,c,e, samples is limited (considering the learning g,j).When the surfer falls forward and backward,the efficiency),and the incremental updating of the girl turns left and right,moves towards and away from classifier in IVT is less effective.CovTracking has a the camera,the man turns left and right,the car turns
Fig. 3 Comparisions of sparsity and stability with the original l 1 ⁃, NLSSSC⁃, and our ELSSC⁃optimization. The sparse coefficients only are accurated to the second decimal place. 3.3 Experimental results for visual target tracking We evaluate the investigated algorithms compara⁃ tively, using the center location errors, the average success rates, and the average frames per second. The results are shown in Figs. 4&5 and in Tables 3&4. The templates of NLSSST, the original ELSST, and the im⁃ proved ELSST are shown in Fig. 4(g⁃o). Overall, our original and improved trackers outperform the other state⁃of⁃the⁃art algorithms. For occlusion, five algorithms, except IVT, func⁃ tion satisfactorily, especially at # 206, # 366 of the Dudek sequence in Fig. 4 (b) (the head in tracking is covered by the hand and glasses), #143, #265, #496 of the Faceocc2 sequence in Fig. 4 ( c) ( the head in tracking is covered by the book), #85, #108, #433 of the Girl sequence in Fig .4 (e) (the head in tracking turns right, turns back, and blocks someone else), and #56, #104, #301 of the Face sequence in Fig. 4 ( i ) ( the head in tracking is also covered by the book). After the target recovers from occlusion, these five trackers can seek it quickly. IVT works poorly, e⁃ ven loses the target in #10 of the Girl sequence (Fig. 5 (e)), because the number of positive and negative samples is limited ( considering the learning efficiency), and the incremental updating of the classifier in IVT is less effective. CovTracking has a large size of candidates (based on the definition of in⁃ tegral image, the feature extraction of these candidates is so fast, that its cost can be ignored), which makes it robust for occlusion, scale variation, and blur. NLSSST and our original and improved trackers all work well, when the targets are occluded; our two trackers work even better. For motion blur, our two trackers work better than IVT and the original l 1 ⁃tracker. Moreover, CovTracking also reveals its ability to handle blur ( e. g., #4, #9, and #38 in Fig. 4(d,o). In the former sequence, the animal runs and jumps fast (motion blur) with a lot of water splashing ( occlusion), while in the latter, the man ropes skipping and the camera cannot take the clear face of the man. IVT and l 1 ⁃tracker fail both from #4 in Fig. 4(d), and never recover after that. Our o⁃ riginal and improved ELSS lost the target in #31 and #41, then recovered in #33 and #44 (Fig. 4(d)). In #12 to #21 and #44 to #71, the improved ELSST works better than original ELSST, CovTracking, l 1 ⁃tracker, and NLSSST. For rotation and scale variation, our trackers also perform robustly (Figs. 4(a,c,e,g,j) and 5( a,c,e, g,j). When the surfer falls forward and backward, the girl turns left and right, moves towards and away from the camera, the man turns left and right, the car turns ·142· 智 能 系 统 学 报 第 11 卷
1WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint .143. over,and the juice bottle becomes bigger and smaller to evaluate the success rate.In general,from the above in Surfer,Girl,Faceocc2,Car,and Juice sequence, analysis,we find that our original and improved respectively,five trackers except IVT perform well,es- ELSSC-trackers perform almost the same,and the for- pecially the NLSSS-tracker and our two ELSSC-trackers. mer is slightly better,especially in the Dudek, In a complex background and with high illumina- Faceoce2,Surfer,Stone,CarDark,and Jumping se- tion variance Fig.4(f)),there are many similar quences (Fig.5(a,b,c,f,n,o).However,we also stones to track.The I,-tracker and our two trackers find from Table 4,which summarizes the average work better than other three trackers.Cov-tracker fails, frames per second,that the improved ELSST works because it extracts edge information of targets as one much faster than the original ELSST and almost all the dimension of features,and in this sequences,edge of other trackers;IVT is faster than the improved ELSST targets are ambiguous and hard to be distinct.Similar when dealing with Surfer and Dudek sequences,but its results are obtained from Fig.4(h,1,m). success rate is much worse than that of the improved Table 3 summarizes the average success rates. ELSST.It is sensitive under the phenomena of occlu- Given the tracking results R and the ground-truth Rc, sion,rotation,and target motion blur.The original I,- we use the detection criterion in the PASCAL VOC tracker performs well in most frames,but it is also challenget],i.e., time-consuming and fails to track sometimes;Cov- area(Rr∩Rc) Tracking is suitable for occlusion and rotation,but fails score area(Rr URc) when facing a complex background. #219 14 出 (a)Sequence surfer (375 frames) (b)Sequence dudek (1145 frames) (c)Sequence faceoce2 (819 frames) *2 (d)Sequence animal (71 frames) (e)Sequence girl(500 frames) (f)Sequence S tone (593 frames) 106 175 231 (g)Sequence car(374 frames) (h)Sequence cup(303 frames (i)Sequence face (415 frames) 5【年 (j)Sequence juice (404 frames) (k)Sequence singer(351 frames) (1)Sequence sunshade (172 frames)
over, and the juice bottle becomes bigger and smaller in Surfer, Girl, Faceocc2, Car, and Juice sequence, respectively, five trackers except IVT perform well, es⁃ pecially the NLSSS⁃tracker and our two ELSSC⁃trackers. In a complex background and with high illumina⁃ tion variance ( Fig. 4 ( f)), there are many similar stones to track. The l 1 ⁃tracker and our two trackers work better than other three trackers. Cov⁃tracker fails, because it extracts edge information of targets as one dimension of features, and in this sequences, edge of targets are ambiguous and hard to be distinct. Similar results are obtained from Fig. 4(h,l,m). Table 3 summarizes the average success rates. Given the tracking results RT and the ground⁃truth RG , we use the detection criterion in the PASCAL VOC challenge [16] ,i.e., score = area(RT ∩ RG ) area(RT ∪ RG ) to evaluate the success rate. In general, from the above analysis, we find that our original and improved ELSSC⁃trackers perform almost the same, and the for⁃ mer is slightly better, especially in the Dudek, Faceocc2, Surfer, Stone, CarDark, and Jumping se⁃ quences (Fig. 5 ( a, b, c, f, n, o). However, we also find from Table 4, which summarizes the average frames per second, that the improved ELSST works much faster than the original ELSST and almost all the other trackers; IVT is faster than the improved ELSST when dealing with Surfer and Dudek sequences, but its success rate is much worse than that of the improved ELSST. It is sensitive under the phenomena of occlu⁃ sion, rotation, and target motion blur. The original l 1 ⁃ tracker performs well in most frames, but it is also time⁃consuming and fails to track sometimes; Cov⁃ Tracking is suitable for occlusion and rotation, but fails when facing a complex background. 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·143·
·144 智 能系统学 报 第11卷 (m)Sequence bike(228 frames) (n)Sequence cardark(393 frames) (o)Sequence jumping(313 frames) Fig.4 Some tracking results C-tracker NLSSSC-tracker 350 NLSSSC-tracker acker 800 C-tracker 250 ELSS SC-trackerl 300 racker ELSSSC-tracker2 ELSSSC-tracker2 250 II-tracke 700 -11-tracker 200 I-tracker -VT IVT 200 8 150 100 50 100 (E 50 100150200250300350 #Frame #Frame #Frame (a)Sequence surfer(375 frames) (b)Sequence dudek(1145 frames) (c)Sequence faceocc2(819 frames) NLSSSC-tracker NLSSSC-tracker er 300 700 150 NLSSSC-acker 250 racker 600 -tracker2 IVT 200 CovTracking 100 400 CovTracking 300 100 200 0 50 100 nk 100 200.300 400 500 050100150200250300350400 #Frame #F ame #Frame (d)Sequence animal(71 frames) (e)Sequence girl(500 frames) (f)Sequence S tone (593 frames) 350 LSSSC-tracker NISSSC-trac 250 NLSSSC-tracker 300 ELSSSC-trackerl 120 C-trac -tracker 250 ELSSSC-tracker2 100 ELSSSC-tracker2 200 SC-tracke I-tracker tracker cker 200 VT IVT 20 150 CovTracking 00 CovTracking 100 100 50 50 85 0 50 100150200250300350 0 50 100150200 250300 0 50 100150200250300350400 #Frame #Frame #Frame (g)Sequence car(374 frames) (h)Sequence cup (303 frames) (i)Sequence face (415 frames) NLSSSC-tracker -NLSSSC-tracker 400 SC-tracker 500 SC-tracker 500 -tracker 350 ELSSSC-tracker2 ELSSSC-tracker2 C-tracker2 300 11-tracker 400 -tracker IVT 250 CovTracking cking 300 CevTracking 200 150 200 100 5s0 100 100 0 50 100150200250300350 400 0 50 100150200250300350 0 20 40 60 80100120140160 #Frame #Frame #Frame (j)Sequence juice (404 frames) (k)Sequence singer(351 frames) (1)Sequence sunshade (172 frames) NLSSSC-tra cke ker SSS C-tracker NLSSSC-tracker 400 150 -tracker 250 C-tracker 350 I-tracker ELSSSC-tracker2 300 200 II-tracker CovTracking 100 IVT 200 CoyTracking 50 100 50 50 的 50 100 150 200 050100150200250300350400 0 50100150200250300 #Frame #Frame #Frame (m)Sequence bike (228 frames) (n)Sequence cardark(393 frames) (o)Sequence jumping (313 frames) Fig.5 Quantitative evaluation in terms of center location error in pixel)
Fig. 4 Some tracking results Fig. 5 Quantitative evaluation in terms of center location error (in pixel) ·144· 智 能 系 统 学 报 第 11 卷
1 WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint .145. Table 3 Average Success Rates Video IVT CovTrack 1-tracker NLSSST ELSST1 ELSST2 Sufer 0.0515 0.4770 0.0388 0.4646 0.4667 0.4052 Dudek 0.2011 0.4216 0.6215 0.6528 0.6726 0.6604 Faceocc2 0.4553 0.3918 0.6084 0.4579 0.5747 0.4641 Animal 0.0218 0.2701 0.0336 0.3692 0.4078 0.4117 Girl 0.0228 0.2171 0.4869 0.4853 0.4006 0.4693 Stone 0.0974 0.1114 0.5834 0.4109 0.6611 0.6572 Car 0.0607 0.1858 0.0956 0.3418 0.3278 0.3825 Cup 0.6300 0.3769 0.5598 0.5738 0.5238 0.5637 Face 0.3341 0.2806 0.0479 0.5248 0.5496 0.5827 Juice 0.0743 0.4218 0.5111 0.5299 0.5186 0.5835 Singer 0.3326 0.1361 0.1184 0.5790 0.4781 0.5651 Sunshade 0.0481 0.1803 0.5257 0.5348 0.4743 0.4948 Bike 0.0576 0.3721 0.0451 0.4438 0.3608 0.3917 CarDark 0.0831 0.3087 0.0790 0.0110 0.4208 0.3737 Jumping 0.0577 0.2755 0.0711 0.0847 0.4530 0.4505 The best two results are shown in bold.Our original and improved algorithms are shown in the last two columns,respectively. Table 4 Average Frames per Second Video IVT CovTrack 1-tracker NLSSST ELSSTI ELSST2 Sufer 2.8649 1.5707 0.0358 0.0141 0.0156 2.3469 Dudek 3.3211 1.2454 0.0388 0.0171 0.0179 3.2454 Faceoce2 2.7886 1.1278 0.0180 0.0107 0.0142 3.1278 Animal 1.8979 1.2534 0.0312 0.0150 0.0071 3.2534 Girl 1.6548 1.2209 0.0376 0.0167 0.0098 3.2209 Stone 1.2903 1.8890 0.0271 0.0144 0.0146 4.1138 Car 3.6841 2.8502 0.0621 0.0525 0.0365 6.2253 Cup 7.8175 3.5479 0.0798 0.0677 0.0538 6.3949 Face 6.7422 2.8961 0.0543 0.0417 0.0546 6.1681 Juice 7.0489 3.9297 0.0635 0.0665 0.0586 5.4738 Singer 6.0959 2.8026 0.0195 0.0683 0.0481 6.1891 Sunshade 7.3027 2.7905 0.0713 0.0587 0.0781 6.0601 Bike 6.9747 2.7192 0.0163 0.0320 0.0210 5.8400 CarDark 3.7041 1.3951 0.0226 0.0552 0.0295 2.4221 Jumping 7.3296 2.6080 0.0520 0.0476 0.0579 3.5519 The best two results are shown in bold.Our original and improved algorithms are shown in the last two columns,respectively
Table 3 Average Success Rates Video IVT CovTrack l 1 ⁃tracker NLSSST ELSST1 ELSST2 Sufer 0.051 5 0.477 0 0.038 8 0.464 6 0.466 7 0.405 2 Dudek 0.201 1 0.421 6 0.621 5 0.652 8 0.672 6 0.660 4 Faceocc2 0.455 3 0.391 8 0.608 4 0.457 9 0.574 7 0.464 1 Animal 0.021 8 0.270 1 0.033 6 0.369 2 0.407 8 0.411 7 Girl 0.022 8 0.217 1 0.486 9 0.485 3 0.400 6 0.469 3 Stone 0.097 4 0.111 4 0.583 4 0.410 9 0.661 1 0.657 2 Car 0.060 7 0.185 8 0.095 6 0.341 8 0.327 8 0.382 5 Cup 0.630 0 0.376 9 0.559 8 0.573 8 0.523 8 0.563 7 Face 0.334 1 0.280 6 0.047 9 0.524 8 0.549 6 0.582 7 Juice 0.074 3 0.421 8 0.511 1 0.529 9 0.518 6 0.583 5 Singer 0.332 6 0.136 1 0.118 4 0.579 0 0.478 1 0.565 1 Sunshade 0.048 1 0.180 3 0.525 7 0.534 8 0.474 3 0.494 8 Bike 0.057 6 0.372 1 0.045 1 0.443 8 0.360 8 0.391 7 CarDark 0.083 1 0.308 7 0.079 0 0.011 0 0.420 8 0.373 7 Jumping 0.057 7 0.275 5 0.071 1 0.084 7 0.453 0 0.450 5 The best two results are shown in bold. Our original and improved algorithms are shown in the last two columns, respectively. Table 4 Average Frames per Second Video IVT CovTrack l 1 ⁃tracker NLSSST ELSST1 ELSST2 Sufer 2.864 9 1.570 7 0.035 8 0.014 1 0.015 6 2.346 9 Dudek 3.321 1 1.245 4 0.038 8 0.017 1 0.017 9 3.245 4 Faceocc2 2.788 6 1.127 8 0.018 0 0.010 7 0.014 2 3.127 8 Animal 1.897 9 1.253 4 0.031 2 0.015 0 0.007 1 3.253 4 Girl 1.654 8 1.220 9 0.037 6 0.016 7 0.009 8 3.220 9 Stone 1.290 3 1.889 0 0.027 1 0.014 4 0.014 6 4.113 8 Car 3.684 1 2.850 2 0.062 1 0.052 5 0.036 5 6.225 3 Cup 7.817 5 3.547 9 0.079 8 0.067 7 0.053 8 6.394 9 Face 6.742 2 2.896 1 0.054 3 0.041 7 0.054 6 6.168 1 Juice 7.048 9 3.929 7 0.063 5 0.066 5 0.058 6 5.473 8 Singer 6.095 9 2.802 6 0.019 5 0.068 3 0.048 1 6.189 1 Sunshade 7.302 7 2.790 5 0.071 3 0.058 7 0.078 1 6.060 1 Bike 6.974 7 2.719 2 0.016 3 0.032 0 0.021 0 5.840 0 CarDark 3.704 1 1.395 1 0.022 6 0.055 2 0.029 5 2.422 1 Jumping 7.329 6 2.608 0 0.052 0 0.047 6 0.057 9 3.551 9 The best two results are shown in bold. Our original and improved algorithms are shown in the last two columns, respectively. 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·145·