CAAI Transactions on Intelligent Systems, Vol. 3, No. 5, Oct. 2008

Facial expression recognition based on local SVM classifiers

SUN Zheng-xing, XU Wen-hui
(State Key Lab for Novel Software Technology, Nanjing University, Nanjing 210093, China)

Abstract: This paper presents a novel technique for the identification of facial expressions in video sources. The method uses two steps: facial expression feature extraction and expression classification. First, we used an active shape model (ASM) based on a facial point tracking system to extract the geometric features of facial expressions in videos. Then a new type of local support vector machine (LSVM) was created to classify the facial expressions. Four classifiers (KNN, SVM, KNN-SVM, and LSVM) were compared. The results on the Cohn-Kanade database showed the effectiveness of our method.

Keywords: facial expression recognition; local SVM; active shape model; geometry feature

CLC number: TP391  Document code: A  Article ID: 1673-4785(2008)05-0455-12

Received date: 2008-07-11.
Foundation item: National High Technology Research and Development Program (863) of China (2007AA01Z334); National Natural Science Foundation of China (69903006, 60373065, 0721002).
Corresponding author: SUN Zheng-xing. E-mail: szx@nju.edu.cn

Automatic facial expression recognition has attracted a lot of attention in recent years due to its potentially vital role in applications, particularly those using human-centered interfaces. Many applications, such as virtual reality, video-conferencing, user profiling, and customer satisfaction studies for broadcast and web services, require efficient facial expression recognition in order to achieve their desired results. Therefore, the impact of facial expression recognition on the above-mentioned applications is constantly growing.

Several approaches have been reported in the literature to automate the recognition of facial expressions in mug shots or video sequences. Early methods used mug shots of expressions that captured characteristic images at the apex [1-2]. However, according to psychologists [3], analysis of video sequences produces more accurate and robust recognition of facial expressions. These methods can be categorized based on the data and features they use, as well as the classifiers created for expression recognition. In summary, the classifiers include the nearest neighbor classifier [4], neural networks [5], SVM [6], Bayesian networks [7], the AdaBoost classifier [6], and the hidden Markov model [8]. The data used for automated facial expression analysis (AFEA) can be geometric features or texture features, and for each there are different feature extraction methods. Though facial expression recognition has made remarkable progress, recognizing facial expressions with high accuracy is a difficult problem [9]. AFEA and its effective use in computing present a number of difficult challenges. In general, two main processes can be distinguished in tackling the problem: 1) identification of features that contain useful information and reduction of the dimensions of feature vectors in order to design better classifiers; 2) design and implementation of robust classifiers that can learn the underlying models of facial expressions.

We propose a new classifier for facial expression recognition, which comes from the ideas used in the KNN-SVM algorithm. Ref. [10] proposed this algorithm for visual object recognition. This method combines SVM and KNN classifiers and implements accurate local classification by using KNN to select relevant training data for the SVM. In order to classify a sample x, it first selects the k training samples nearest to x, and then uses these k samples to train an SVM model, which is then used to make the decision. KNN-SVM builds a maximal margin classifier in the neighborhood of a test sample using the feature space induced by the SVM's kernel function. But this classifier decouples the nearest-neighbor search from the SVM learning algorithm. Once the k nearest neighbors have been identified, the SVM algorithm completely ignores their similarities to the given test example. We therefore present a new classifier based on KNN-SVM, called local SVM (LSVM), which incorporates neighborhood information into SVM learning. The principle behind LSVM is that it reduces the impact of support vectors located far away from a given test example.

In this paper, a system for automatically recognizing the six universal facial expressions (anger, disgust, fear, joy, sadness, and surprise) in video sequences using geometrical features and a novel class of SVM called LSVM is proposed.
The system detects frontal faces in video sequences, and then geometrical features of some key facial points are extracted using active shape model (ASM) based tracking. In each video sequence, the first frame shows a neutral expression while the last frame shows an expression with maximum intensity. For each frame, we extract geometric features as a static feature vector, which represents facial contour information during changes of expression. Finally, by subtracting the static features of the first frame from those of the last, we get dynamic geometric information for classifier input. An LSVM classifier is then used for classification into the six basic expression types.

The rest of the paper is organized as follows. Section 1 reviews facial expression recognition studies. In Section 2 we briefly describe our facial point tracking system and the features extracted for classification of facial expressions. Section 3 describes the local SVM classifier used for classifying the six basic facial expressions in the video sequences. Experiments, performance evaluations, and discussions are given in Section 4. Finally, Section 5 gives conclusions about our work.

1 Related work

Psychological studies have suggested that facial motion is fundamental to the recognition of facial expression. Experiments conducted by Bassili [11] demonstrated that humans do a better job recognizing expressions from dynamic images as opposed to mug shots. Facial expressions are usually described in two ways: as combinations of action units, or as universal expressions. The facial action coding system (FACS) was developed to describe facial expressions using a combination of action units (AU) [12]. Each action unit corresponds to specific muscular activity that produces momentary changes in facial appearance.
Universal expressions are studied as a complete representation of a specific type of internal emotion, without breaking up expressions into muscular units. The most commonly studied universal expressions include happiness, anger, sadness, fear, and disgust. In this study, universal
expressions were analyzed using the facial expression coding system.

Many automated facial expression analysis methods have been developed [13]. Mase [14] used optical flow (OF) to recognize facial expressions. He was one of the first to use image-processing techniques to recognize facial expressions. Black and Yacoob [15] used local parameterized models of image motion to recover non-rigid motion. Once recovered, these parameters were used as inputs to a rule-based classifier to recognize the six basic facial expressions. Ref. [16] used lower face tracking to extract mouth shape features and used them as inputs to an HMM-based facial expression recognition system (recognizing neutral, happy, sad, and an open mouth). Bartlett [17] automatically detects frontal faces in the video stream and classifies them into seven classes in real time: neutral, anger, disgust, fear, joy, sadness, and surprise. An expression recognizer receives image regions produced by a face detector, and then a Gabor representation of the facial image region is formed to be later processed by a bank of SVM classifiers. In Ref. [18], facial feature detection and tracking is based on active infrared illumination in order to provide visual information under variable lighting and head motion. The classification is performed using a dynamic Bayesian network (DBN). Cohen et al. [18] proposed a method for static and dynamic segmentation and classification of facial expressions. For the static case, a DBN is used, organized in a tree structure. For the dynamic approach, a multi-level hidden Markov model (HMM) classifier is employed.

These methods are similar in that they first extract some features from the images, then these features are used as inputs into a classification system, and the outcome is one of the pre-selected emotion categories.
They differ mainly in the features extracted from the video images and in the classifiers used to distinguish between the different emotions. In the following sections, an automatic geometric feature based method is proposed, and then LSVM classifiers are used for recognizing facial expressions from video sequences.

2 Geometrical feature extraction

Our work focused on the design of classifiers for improving recognition accuracy, following the extraction of geometric features using a model-based face tracking system. That is, the proposed process for facial expression recognition is composed of two steps: first, ASM-based geometric information extraction; next, LSVM-based classification. Geometric feature information extraction is performed by ASM-based automatic locating and tracking, while the classification of geometric information is performed by an LSVM classifier. Fig. 1 shows the proposed facial expression recognition scheme.

Fig. 1 Process of facial expression recognition for video sequences
For each input video sequence, an AdaBoost-based face detector is applied to detect frontal and near-frontal faces in the first frame. Inside detected faces, our method identifies some important facial landmarks using the active shape model (ASM). ASM automatically localizes the facial feature points in the first frame and then tracks the feature points through the video frames as the facial expression evolves through time. The first frame shows a neutral expression while the last frame shows an expression with the greatest intensity. For each frame, we extract distance parameters between some key facial points. Finally, by subtracting the distance parameters of the first frame from those of the last frame, we get the geometric features for classification. An LSVM classifier is then used for classification into the six basic expression types.

2.1 ASM based locating and tracking

ASM [19] is employed to extract shape information on specific faces in each frame of the video sequence. The use of a face detection algorithm as a prior step has the advantage of speeding up the search for the shape parameters during ASM-based processing. ASM is built from sets of prominent points known as landmarks, computing a point distribution model (PDM) and a local image intensity model around each of those points. The PDM is constructed by applying PCA to an aligned set of shapes, each represented by landmarks. The original shapes $u_i$ and their model representations $b_i$ ($i = 1, 2, \ldots, N$) are related by means of the mean shape $\bar{u}$ and the eigenvector matrix $\phi$:

$$b_i = \phi^T (u_i - \bar{u}), \quad u_i = \bar{u} + \phi b_i. \tag{1}$$

To reduce the dimensions of the representation, it is possible to use only the eigenvectors corresponding to the largest eigenvalues. Therefore, Eq. (1) becomes an approximation, with an error depending on the magnitude of the excluded eigenvalues.
Furthermore, under Gaussian assumptions, each component of the $b_i$ vectors is constrained to ensure that only valid shapes are represented, as follows:

$$|b_i^m| \le \beta \sqrt{\lambda_m}, \quad 1 \le i \le N, \; 1 \le m \le M, \tag{2}$$

where $\beta$ is a regulating parameter usually set between 1 and 3 according to the desired degree of flexibility in the shape model, $M$ is the number of retained eigenvectors, and $\lambda_m$ is the $m$-th eigenvalue of the covariance matrix. The intensity model is constructed by computing the second-order statistics of normalized image gradients, sampled at each side of the landmarks, perpendicular to the shape's contour, hereinafter referred to as the profile. In other words, the profile is a fixed-size vector of values (in this case, pixel intensity values) sampled along the perpendicular to the contour such that the contour passes right through the middle of the perpendicular. The matching procedure is an alternation of image-driven landmark displacements and statistical shape constraining based on the PDM. It is usually performed in a multi-resolution fashion in order to enhance the capture range of the algorithm. The landmark displacements are individually determined using the intensity model, by minimizing the Mahalanobis distance between the candidate gradient and the model's mean.

To extract facial feature points in case of expression variation, we trained an active shape model from the JAFFE (Japanese female facial expression) database [19], which contains 219 images from 10 individual Japanese females. For each subject there are six basic facial expressions (anger, disgust, fear, happiness, sadness, surprise) and a neutral face. Sixty-eight landmarks are used to define the face shape, as shown in Fig. 2.

Fig. 2 ASM training sample
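As a minimal sketch of Eqs. (1) and (2), the following Python functions project a shape onto the retained eigenvectors, clamp each coefficient to the range allowed by $\beta$ and the corresponding eigenvalue, and rebuild the shape. The function names, the toy four-dimensional shape vector, the two orthonormal eigenvectors, their eigenvalues, and the choice $\beta = 2$ are illustrative assumptions; they are not taken from the model trained on JAFFE.

```python
import math


def project(shape, mean, eigvecs):
    """Eq. (1), first part: b = phi^T (u_i - mean).

    eigvecs is a list of orthonormal eigenvectors (the columns of phi).
    """
    diff = [s - m for s, m in zip(shape, mean)]
    return [sum(e * d for e, d in zip(vec, diff)) for vec in eigvecs]


def constrain(b, eigvals, beta=3.0):
    """Eq. (2): clamp each b_m to [-beta*sqrt(lambda_m), +beta*sqrt(lambda_m)]."""
    clamped = []
    for b_m, lam in zip(b, eigvals):
        limit = beta * math.sqrt(lam)
        clamped.append(max(-limit, min(limit, b_m)))
    return clamped


def reconstruct(b, mean, eigvecs):
    """Eq. (1), second part: u_i is approximated by mean + phi b."""
    shape = list(mean)
    for b_m, vec in zip(b, eigvecs):
        for k, e in enumerate(vec):
            shape[k] += b_m * e
    return shape


# Toy data (hypothetical): a 4-dimensional "shape" and 2 retained eigenvectors.
mean_shape = [1.0, 2.0, 3.0, 4.0]
eigvecs = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
eigvals = [4.0, 1.0]
candidate = [3.75, 4.25, 5.75, 6.25]

b = project(candidate, mean_shape, eigvecs)      # raw shape coefficients
b_valid = constrain(b, eigvals, beta=2.0)        # coefficients limited by Eq. (2)
shape_valid = reconstruct(b_valid, mean_shape, eigvecs)
```

In this toy case the first coefficient exceeds its limit and is pulled back, so the rebuilt shape differs from the candidate; in an ASM search this is the step that keeps the displaced landmarks on a plausible face shape.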
Fig. 3 Facial characteristic points

2.2 Facial characteristic points model

The shape information extracted by ASM from a face image is used to compute a set of distance parameters that describe the appearance of facial features. ASM extracts 68 facial points; however, some of these do not reflect changes in facial expressions. The first step is the selection of the 20 optimal key facial points, those which change the most with changes in expression. These key points $P_i$ are defined as the facial characteristic points (FCPs, Fig. 3), which were derived from the Kobayashi & Hara model [2]. In the second step, the FCPs are converted into distance parameters. This parameterization has the advantage of providing the classifier with data that encode the most important aspects of the facial expressions. The distance parameters are computed as Euclidean distances between fixed pairs of key points. The complete list of such distance parameters is given in Table 1. In Table 1, $(P_i, P_j)_x$ represents the horizontal distance between points $P_i$ and $P_j$, and $(P_i, P_j)_y$ represents the vertical distance between points $P_i$ and $P_j$. Because most movement is in the vertical direction when facial expressions change, most of the distance parameters measure vertical distance. We extracted the differences between the distance parameters of the last and the first frames as the geometric features. The geometric features capture the subtle changes in facial expression over the video sequence. Let $V_{end}$ be the distance parameters of the last frame and $V_{begin}$ be the distance parameters of the first frame; then

$$x_i = V_{end} - V_{begin}, \quad i \in \{1, 2, \ldots, N\}, \tag{3}$$

where $x_i$ is the geometric feature of the $i$-th video sequence, which is defined as the difference between the static features of the last frame and the first frame. The dimension of the geometric feature $x_i$ is 18.
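The feature construction in Eq. (3) can be sketched in a few lines of Python. The point pairs follow Table 1; everything else is an assumption made for illustration: the function names, the representation of the FCPs as a list of 20 `(x, y)` tuples indexed by point number, the use of absolute coordinate differences as the horizontal and vertical distances, and the toy coordinates.

```python
# (i, j, axis) for v1..v18 as listed in Table 1:
# axis "y" is a vertical distance, axis "x" a horizontal one.
DISTANCE_PAIRS = [
    (0, 1, "y"), (0, 2, "y"), (3, 4, "y"), (3, 5, "y"),
    (0, 14, "y"), (3, 14, "y"), (7, 9, "y"), (6, 8, "y"),
    (6, 9, "y"), (11, 13, "y"), (10, 12, "y"), (10, 13, "y"),
    (14, 16, "y"), (15, 18, "y"), (14, 15, "y"), (14, 17, "y"),
    (15, 17, "x"), (14, 19, "y"),
]


def distance_parameters(fcps):
    """Return the 18 distance parameters of one frame.

    fcps: list of 20 (x, y) tuples; fcps[k] is the point P_k.
    Assumption: a distance is the absolute coordinate difference.
    """
    params = []
    for i, j, axis in DISTANCE_PAIRS:
        k = 0 if axis == "x" else 1
        params.append(abs(fcps[i][k] - fcps[j][k]))
    return params


def geometric_feature(first_frame_fcps, last_frame_fcps):
    """Eq. (3): x = V_end - V_begin, an 18-dimensional vector."""
    v_begin = distance_parameters(first_frame_fcps)
    v_end = distance_parameters(last_frame_fcps)
    return [e - b for e, b in zip(v_end, v_begin)]


# Toy sequence (hypothetical coordinates): only P0 moves between the
# neutral first frame and the apex last frame.
neutral = [(float(k), float(k)) for k in range(20)]
apex = list(neutral)
apex[0] = (0.0, -2.0)

x = geometric_feature(neutral, apex)
```

Only the parameters that involve $P_0$ (v1, v2, and v5) change in this toy sequence, so the resulting feature vector is zero everywhere else.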
Table 1 The set of distance parameters

| $v_i$ | Meaning | Visual feature | $v_i$ | Meaning | Visual feature | $v_i$ | Meaning | Visual feature |
|---|---|---|---|---|---|---|---|---|
| $v_1$ | $(P_0, P_1)_y$ | Left eyebrow | $v_7$ | $(P_7, P_9)_y$ | Left eye | $v_{13}$ | $(P_{14}, P_{16})_y$ | Mouth |
| $v_2$ | $(P_0, P_2)_y$ | Left eyebrow | $v_8$ | $(P_6, P_8)_y$ | Left eye | $v_{14}$ | $(P_{15}, P_{18})_y$ | Mouth |
| $v_3$ | $(P_3, P_4)_y$ | Right eyebrow | $v_9$ | $(P_6, P_9)_y$ | Left eye | $v_{15}$ | $(P_{14}, P_{15})_y$ | Mouth |
| $v_4$ | $(P_3, P_5)_y$ | Right eyebrow | $v_{10}$ | $(P_{11}, P_{13})_y$ | Right eye | $v_{16}$ | $(P_{14}, P_{17})_y$ | Mouth |
| $v_5$ | $(P_0, P_{14})_y$ | Left eyebrow | $v_{11}$ | $(P_{10}, P_{12})_y$ | Right eye | $v_{17}$ | $(P_{15}, P_{17})_x$ | Mouth |
| $v_6$ | $(P_3, P_{14})_y$ | Right eyebrow | $v_{12}$ | $(P_{10}, P_{13})_y$ | Right eye | $v_{18}$ | $(P_{14}, P_{19})_y$ | Chin |

3 Facial expression recognition based on local SVM

Effective facial expression recognition is a key problem in automated facial expression analysis. The KNN [20] and SVM [21] classifiers have been successfully applied to facial expression recognition and have improved its accuracy. We propose a further improvement, an LSVM classifier for facial expression recognition. It has its roots in the KNN-SVM [10] classifier, but KNN-SVM decouples the nearest-neighbor search from the SVM learning algorithm: once the K nearest neighbors have been identified, the SVM algorithm completely ignores their similarities to the given
test example. We therefore incorporated neighborhood information into SVM learning to improve on the classification accuracy of KNN-SVM.

3.1 Nearest neighbors and SVM

In this part we give a brief description of the nearest-neighbor and SVM classifiers. Assume a classification problem with samples $D = \{(x_i, y_i)\}$, $i = 1, 2, \ldots, N$, where $x_i \in R^d$ and $y_i \in \{1, -1\}$. For the K-nearest neighbor (KNN) algorithm, given a point $x'$ in the $d$-dimensional feature space, an ordering function $f_{x'}: R^d \to R$ is defined. A typical ordering function is based on the Euclidean metric:

$$f_{x'}(x) = \|x - x'\|.$$

By means of an ordering function, it is possible to order the entire set of training samples $x$ with respect to $x'$. This is equivalent to defining a function $r_{x'}: \{1, \ldots, N\} \to \{1, \ldots, N\}$ that reorders the indexes of the $N$ training points. We define this function recursively:

$$
\begin{aligned}
r_{x'}(1) &= \arg\min_{i = 1, \ldots, N} \|\phi(x_i) - \phi(x')\|^2, \\
r_{x'}(j) &= \arg\min_{\substack{i = 1, \ldots, N \\ i \neq r_{x'}(1), \ldots, r_{x'}(j-1)}} \|\phi(x_i) - \phi(x')\|^2, \quad j = 2, \ldots, N.
\end{aligned}
\tag{4}
$$

In this way, $x_{r_{x'}(j)}$ represents the $j$-th point in the set $D = \{(x_i, y_i)\}$ in terms of distance from $x'$, namely the $j$-th nearest neighbor of $x'$, with $f_{x'}(x_{r_{x'}(j)}) = \|x_{r_{x'}(j)} - x'\|$ being its distance from $x'$ and $y_{r_{x'}(j)}$ its class label. Given the above definition, the decision rule of the KNN classifier for binary classification is defined by

$$KNN(x') = \mathrm{sign}\left(\sum_{i=1}^{k} y_{r_{x'}(i)}\right). \tag{5}$$

Support vector machines (SVMs) are based on statistical learning theory [22]. In the classification context, the decision rule of an SVM is generally given by $SVM(x) = \mathrm{sign}(w \cdot \phi(x) + b)$, where $\phi: R^d \to F$ is a mapping into some transformed feature space $F$, and $w \in F$ and $b \in R$ are parameters chosen to minimize an upper bound on the expected risk while minimizing the empirical risk.
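The ordering function of Eq. (4) and the binary KNN rule of Eq. (5) can be illustrated with a short Python sketch. It assumes the plain Euclidean metric in the input space (that is, $\phi$ is the identity), and it resolves a zero vote sum in favor of class $+1$; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """f_{x'}(x) = ||x - x'||, the ordering function based on the Euclidean metric."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def neighbor_order(x_query: Sequence[float], xs: Sequence[Sequence[float]]) -> List[int]:
    """Training indexes sorted by increasing distance to x_query (the map r_{x'})."""
    return sorted(range(len(xs)), key=lambda i: euclidean(xs[i], x_query))


def knn_predict(x_query: Sequence[float], xs: Sequence[Sequence[float]],
                ys: Sequence[int], k: int) -> int:
    """Sign of the summed labels (+1 or -1) of the k nearest neighbors."""
    order = neighbor_order(x_query, xs)
    vote = sum(ys[i] for i in order[:k])
    # Assumption: a tie (vote == 0) is assigned to class +1.
    return 1 if vote >= 0 else -1
```

Sorting once yields the same ordering as the recursive definition, since the $j$-th step of the recursion simply picks the closest point not chosen in the previous $j-1$ steps. Choosing an odd $k$ avoids ties in the binary case.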
The bound on the expected risk is composed of an empirical risk term and a complexity term that depends on the VC dimension of the linear separator. Controlling or minimizing both terms permits control of the generalization error in a theoretically well-founded way. The learning procedure of an SVM can be summarized as follows. The minimization of the complexity term is achieved by minimizing the quantity $\frac{1}{2}\|w\|^2$, namely by maximizing the class separation margin. The empirical risk term is controlled through the following constraint:

$$y_i (w \cdot \phi(x_i) + b) \geq 1 - \xi_i, \tag{6}$$

where $\xi_i \geq 0$, $i = 1, 2, \ldots, N$. The presence of the slack variables $\xi_i$ allows some misclassification in the training set. In fact, during model building, a nonlinear SVM is trained to solve the following optimization problem:

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j), \\
\text{s.t.} \quad & \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq c, \quad i = 1, 2, \ldots, n.
\end{aligned}
\tag{7}
$$

By reformulating this optimization problem with Lagrange multipliers $\alpha_i$ ($i = 1, \ldots, N$), it is possible to write the following decision rule:

$$SVM(x) = \mathrm{sign}\left(\sum_{i=1}^{N} \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b\right). \tag{8}$$

Here the mapping $\phi$ appears only in the dot products $\phi(x_i) \cdot \phi(x)$. This is an important property, which allows the classification problem to be kernelized. Indeed, if a kernel function $k(\cdot, \cdot)$ satisfies Mercer's theorem, it is possible to substitute $k(x_i, x)$ for $\phi(x_i) \cdot \phi(x)$ in Eq. (8), thus obtaining a decision rule expressed as:

$$SVM(x) = \mathrm{sign}\left(\sum_{i=1}^{N} \alpha_i y_i \, k(x_i, x) + b\right). \tag{9}$$

3.2 KNN-SVM classifier

KNN-SVM [10] combines locality with the search for a large-margin separating surface by partitioning the entire transformed feature space through an ensemble of local maximal-margin hyperplanes. In order to classify
a given point $x'$ in the input space, we first find its K nearest neighbors in the transformed feature space $F$, and then search for an optimal separating hyperplane only over these K nearest neighbors. In practice, this means that an SVM is built over the neighborhood of each test point $x'$. Accordingly, the constraints in Eq. (6) become:

$$y_{r_{x'}(i)} \left[ w \cdot \phi(x_{r_{x'}(i)}) + b \right] \geq 1 - \xi_{r_{x'}(i)}, \quad i = 1, \ldots, k, \tag{10}$$

where $r_{x'}: \{1, \ldots, N\} \to \{1, \ldots, N\}$ is the function defined in Eq. (4), which reorders the indexes of the training points. In this way, $x_{r_{x'}(j)}$ is the $j$-th point of the set $D$ in terms of distance from $x'$, and thus

$$j < k \Rightarrow \|\phi(x_{r_{x'}(j)}) - \phi(x')\| < \|\phi(x_{r_{x'}(k)}) - \phi(x')\|$$

because of the monotonicity of the quadratic operator. The computation is expressed in terms of kernels as:

$$
\begin{aligned}
\|\phi(x) - \phi(x')\|^2 &= \langle \phi(x), \phi(x) \rangle_F + \langle \phi(x'), \phi(x') \rangle_F - 2 \langle \phi(x), \phi(x') \rangle_F \\
&= k(x, x) + k(x', x') - 2k(x, x').
\end{aligned}
\tag{11}
$$

In the case of linear kernels, the ordering function can be built using the Euclidean distance, whereas if the kernel is not linear, the ordering can be different. If the kernel is the RBF kernel, the ordering function is equivalent to using the Euclidean metric. The decision rule associated with the method is:

$$SVMNN(x') = \mathrm{sign}\left(\sum_{i=1}^{k} \alpha_{r_{x'}(i)} \, y_{r_{x'}(i)} \, k(x_{r_{x'}(i)}, x') + b\right). \tag{12}$$

3.3 Local support vector machines

KNN-SVM is a combination of KNN and SVM. However, this method discards the nearest-neighbor information in the SVM learning algorithm: once the K nearest neighbors are identified, the SVM algorithm completely ignores their similarity to the given test example when solving the dual optimization problem given in Eq. (7).

We therefore developed a new LSVM algorithm, which incorporates neighborhood information directly into SVM learning. The principle of LSVM is to reduce the impact of support vectors located far away from a given test example.
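The kernel-induced distance of Eq. (11), which drives the neighbor search in KNN-SVM, can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the RBF parameter `gamma` and its default value are assumptions not taken from the text.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """k(a, b) = a . b"""
    return sum(p * q for p, q in zip(a, b))


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """k(a, b) = exp(-gamma * ||a - b||^2); gamma = 0.5 is an arbitrary choice."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def kernel_sq_distance(kernel: Kernel, x: Vector, z: Vector) -> float:
    """Squared feature-space distance, k(x, x) + k(z, z) - 2 k(x, z), as in Eq. (11)."""
    return kernel(x, x) + kernel(z, z) - 2.0 * kernel(x, z)


def k_nearest_in_feature_space(kernel: Kernel, x_query: Vector,
                               xs: Sequence[Vector], k: int) -> List[int]:
    """Indexes of the k training points closest to x_query in the feature space."""
    order = sorted(range(len(xs)),
                   key=lambda i: kernel_sq_distance(kernel, xs[i], x_query))
    return order[:k]
```

For the RBF kernel, $k(x, x) = 1$ for every $x$, so the squared distance reduces to $2 - 2k(x, x')$, which increases monotonically with the Euclidean distance. This is why the RBF ordering coincides with the Euclidean one, as stated above.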
The impact of distant support vectors can be reduced by weighting the classification error of each training example according to its similarity to the test example. The similarity is captured by a distance function $\sigma$, the same approach as that used by KNN.

For each test sample $x'$, we construct its local SVM model by solving the following optimization problem:

$$
\begin{aligned}
\min \quad & \frac{1}{2} \|w\|_2^2 + c \sum_{i=1}^{n} \sigma(x', x_i) \, \xi_i, \\
\text{s.t.} \quad & y_i (w^T x_i - b) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad i = 1, 2, \ldots, n,
\end{aligned}
\tag{13}
$$

where $\sigma(x', x_i)$ is the L2 distance between $x'$ and $x_i$. The solution to Eq. (13) identifies the decision surface as well as the local neighborhood of the samples. The function $\sigma$ penalizes training examples that are located far away from the test example. As a result, classification of the test example depends only on the support vectors in its local neighborhood. To further appreciate the role of the weight function, consider the dual form of Eq. (13):

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j), \\
\text{s.t.} \quad & \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq c \, \sigma(x', x_i), \quad i = 1, 2, \ldots, n.
\end{aligned}
\tag{14}
$$

Compared to Eq. (7), the difference between LSVM and SVM is that the upper bound on $\alpha_i$ has been modified from $c$ to $c \, \sigma(x', x_i)$. This modification has the following two effects: it reduces the impact of distant support vectors, and non-support vectors of the nonlinear SVM may become support vectors of LSVM.

3.4 LSVM in facial expression recognition

For facial expression recognition using LSVM, geometric features are used as the input. Six classes were considered in the experiments, each one representing one of the basic facial expressions (anger, disgust, fear, happiness, sadness, and surprise). The LSVM classifies geometric features as one of these six basic facial expressions. Pseudo code of the basic version of
the facial expression algorithm is given in Fig. 4.

Input: geometric feature sample of a facial expression, $x'$; training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in R^d$ is the $i$-th geometric feature and $y_i \in \{1, 2, 3, 4, 5, 6\}$ is its facial expression class; number of nearest neighbors $k$.
Output: facial expression class $y_p \in \{1, 2, 3, 4, 5, 6\}$.
1. Find the $k$ samples $(x_i, y_i)$ with minimal values of $k(x_i, x_i) - 2k(x', x_i)$.
2. Train a modified multi-class SVM model on the $k$ selected samples; the modified SVM model incorporates the neighborhood information.
3. Classify $x'$ using this model to obtain the result $y_p$.
4. Return $y_p$.

Fig. 4 The LSVM classifier for facial expression recognition

The LSVM makes binary decisions. There are a number of methods for making multi-class decisions with a set of binary classifiers. We employed a pair-wise partitioning strategy. For pair-wise partitioning (1:1), the SVMs were trained to discriminate all pairs of emotions. For six categories, that makes 15 SVMs.

4 Experiments and evaluations

In order to validate our proposed approach for facial expression recognition, we carried out experiments on a machine with a Pentium 4 2.0 GHz CPU, 1 GB memory, Windows XP, and Visual C++ 6.0. The Cohn-Kanade database [23] was used to recognize facial expressions as one of the six basic facial expression classes (anger, disgust, fear, happiness, sadness, and surprise). Each video sequence starts with a neutral expression and ends with the peak of the facial expression. This database is annotated with AUs (Action Units). These combinations of AUs were translated into facial expressions according to Ref. [24], in order to define the corresponding ground truth for the facial expressions. All the subjects were used to form the database for the experiments.
The database contains 480 video sequences, comprising 84 expressions of "fear", 105 of "surprise", 92 of "sadness", 36 of "anger", 56 of "disgust" and 107 of "happiness". The upper row of Fig. 5 shows the extraction of facial feature points in the initial frames of the video sequences for the 6 basic expression types, while the lower row shows that of the last frames of those video sequences.

Fig. 5 ASM-based facial feature point extraction examples: (a) happy, (b) disgust, (c) fear, (d) sad, (e) anger, (f) surprise

In our experiments, three classification algorithms, KNN, nonlinear SVM and SVM-NN, were compared with our LSVM classifier to show its effectiveness. Both KNN-SVM and LSVM employ a linear kernel. The parameters of the classification algorithms, i.e. the k in KNN, c in SVM, bandwidth $\lambda$ in the RBF
kernel and k in LSVM, were determined by 10-fold cross validation on the training set. To implement the proposed LSVM algorithm, we modified the C++ code of the LIBSVM tool (http://www.csie.ntu.edu.tw/~cjlin/libsvm) developed by Chang and Lin to use $c\,\sigma$ as its upper bound constraint for $\alpha$ instead of $c$.

In order to make maximal use of the available data and produce averaged classification accuracy results, the experimental results reported in this study were obtained by applying 5-fold cross validation to the data sets. More specifically, all image sequences contained in the database were divided into six classes, each one corresponding to one of the six basic facial expressions to be recognized. Five sets, each containing 20% of the data for each class, chosen randomly, were created. One set was used as the test set, while the remaining sets formed the training set. After the classification procedure was performed, the samples forming the test set were incorporated into the current training set, and a new set of samples (20% of the samples for each class) was extracted to form the new test set. The remaining samples formed a new training set. This procedure was repeated five times. The average classification accuracy is the mean value of the percentages of correctly classified facial expressions.

First, we tested the facial expression recognition accuracy of our proposed LSVM classifier. Confusion matrices were used to evaluate accuracy. The confusion matrix is a matrix containing information about the actual class label (in its columns) and the label obtained through classification (in its rows). The diagonal entries of the confusion matrix are the rates of correctly classified facial expressions, while the off-diagonal entries correspond to misclassification rates.
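The evaluation protocol described above (five per-class 20% folds, a mean accuracy, and a confusion matrix of rates) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the classifier is passed in as a user-supplied callback `classify(train_idx, test_idx)`, the fixed random seed is only for repeatability, and every fold is assumed to be non-empty.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indexes into n_folds sets, each with about 1/n_folds of every class."""
    rng = random.Random(seed)  # assumption: fixed seed, for repeatability only
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indexes in by_class.values():
        rng.shuffle(indexes)
        for pos, idx in enumerate(indexes):
            folds[pos % n_folds].append(idx)  # deal samples out round-robin
    return folds


def confusion_rates(actual, predicted, classes):
    """Percentages with predicted labels in rows and actual labels in columns."""
    pos = {c: n for n, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        counts[pos[p]][pos[a]] += 1
    totals = [sum(row[col] for row in counts) for col in range(len(classes))]
    return [[100.0 * row[col] / totals[col] if totals[col] else 0.0
             for col in range(len(classes))] for row in counts]


def mean_accuracy(labels, classify, n_folds=5, seed=0):
    """Average test accuracy (%) over the folds, each fold serving once as the test set."""
    folds = stratified_folds(labels, n_folds, seed)
    scores = []
    for t, test_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != t for i in fold]
        predicted = classify(train_idx, test_idx)
        hits = sum(1 for i, p in zip(test_idx, predicted) if labels[i] == p)
        scores.append(100.0 * hits / len(test_idx))
    return sum(scores) / n_folds
```

Dealing the shuffled samples of each class out round-robin keeps the class proportions nearly equal across folds even when a class size, such as the 36 "anger" sequences, is not divisible by five.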
The confusion matrix shown in Table 2 presents the results obtained using the LSVM classifier. From this table, it can be seen that our method achieves 89.11% overall recognition of facial expressions. The confusion matrix confirms that some expressions are harder to differentiate than others. Expressions identified as happiness or surprise are recognized with the highest accuracy (91.32% and 92.48%, respectively). For disgust, the recognition rate was 88.32%; for fear, 89.64%; for sadness, 86.38%; and for anger, 86.54%. As can be seen, the most ambiguous facial expression was sadness. The main reason is that both surprise and happiness cause obvious geometric shape changes when the facial expression moves from neutral to peak, while the others may not produce enough geometric information to be as clearly discriminated.

Table 2 Confusion matrix based on LSVM (results in %)

| Inputs | Happy | Surprise | Disgust | Fear | Sad | Anger |
|---|---|---|---|---|---|---|
| Happy | 91.32 | 1.79 | 2.11 | 0.32 | 2.58 | 1.88 |
| Surprise | 2.68 | 92.48 | 1.83 | 1.32 | 0 | 1.69 |
| Disgust | 2.09 | 2.17 | 88.32 | 3.08 | 1.08 | 3.26 |
| Fear | 0 | 0 | 3.57 | 89.64 | 4.36 | 2.43 |
| Sad | 1.62 | 3.56 | 2.46 | 1.78 | 86.38 | 4.20 |
| Anger | 2.29 | 0 | 1.71 | 3.86 | 5.60 | 86.54 |

In addition, we conducted experiments to evaluate the performance of our proposed algorithm in comparison to the KNN, nonlinear SVM and SVM-NN algorithms. A 5-fold cross validation was also employed in these experiments. Table 3 summarizes the results. Firstly, observe that, for the six basic facial expressions, nonlinear SVM outperforms the KNN algorithm: the accuracy of SVM is 85.06%, while the accuracy of KNN is only 78.96%. Secondly, observe that SVM-NN fails to improve the accuracy over nonlinear SVM. In fact, the SVM-NN performance degrades slightly, as classification accuracy drops from 85.06% to 85.03% when using SVM-NN instead of nonlinear SVM. One possible explanation for the poor performance of SVM-NN is the difficulty of choosing the right number of nearest neighbors (K) when the number of training examples is small. We observed that the LSVM algorithm consistently outperformed nonlinear SVM and KNN-SVM for the six basic facial expressions. We also found that both surprise and happiness were recognized with higher accuracy than other facial expressions, with the exception of the KNN classifier. Fig. 6 shows the ROC curves of the four classifiers. It demonstrates that the LSVM classifier outperforms the SVM, KNN-SVM and KNN classifiers.

Fig. 6 ROC curves of the four classifiers

Table 3 Classification accuracies (%) for SVM, KNN, KNN-SVM and LSVM

| Inputs | Happy | Surprise | Disgust | Fear | Sad | Anger | Average accuracy |
|---|---|---|---|---|---|---|---|
| LSVM | 91.32 | 92.48 | 88.32 | 89.64 | 86.38 | 86.54 | 89.11 |
| SVM | 88.09 | 88.67 | 85.86 | 85.71 | 82.41 | 79.62 | 85.06 |
| SVM-NN | 87.55 | 87.92 | 84.23 | 84.68 | 84.97 | 80.85 | 85.03 |
| KNN | 82.09 | 78.38 | 79.36 | 80.62 | 77.41 | 75.92 | 78.96 |

Fig. 7 shows the recognition results of our proposed system for the six basic facial expressions. Our method recognizes facial expressions from video sequences. The first frame shows a neutral expression, while the last frame shows an expression with great intensity. In the last frame, we extract geometric features and classify the expression using LSVM.

Fig. 7 Facial expression recognition results in our proposed system

5 Conclusion

In this paper, we proposed an automatic method for recognizing prototypical expressions that include anger, disgust, fear, joy, sadness and surprise. We tracked the facial feature points using ASM and extracted geometric features from video sequences. To improve facial expression recognition accuracy, we pres-