to build the gesture recognition module. For the gesture authentication module, we compile libSVM [27] as native code to implement the classifier. The model is trained offline on a remote machine (MacBook Air, 1.3 GHz Intel Core i5, 4 GB RAM). In this section, we evaluate the two modules of our system individually and report performance in terms of TPR, FPR, accuracy, and F1-score for both gesture recognition and authentication.

Fig. 7: (a) Confusion matrix of command gestures (sitting, tiny). TPR: 92.87%, FPR: 5.7%. (b) Confusion matrix of command gestures (sitting, normal). TPR: 96.99%, FPR: 2.4%. (c) Confusion matrix of command gestures (walking, normal). TPR: 94.88%, FPR: 4.6%.

A. Gesture Recognition

We prime our system with eight command gesture templates: nod and shake, left and right three times, triangle and rectangle, and clockwise/counter-clockwise circle. We also prepare our system for number and alphabet input. We choose these head gestures in order to evaluate the capability of gesture recognition across a variety of gestures.

Gesture Recognition with Command Gestures. For each of these command gestures we collect gesture data three times: once with as little head movement as possible (tiny) while sitting, a second time with a normal/natural amount of head movement (normal) while sitting, and a third time with a normal amount of head movement while walking. This experiment is repeated for multiple rounds, with each round collecting about 10 gestures. The accuracy results in Fig. 7 are presented as confusion matrices.

1) Gestures while sitting: From the results in Fig. 7 (a, b), we can see that for several gestures, such as nod, left3, right3, and shake, the accuracy is perfect, even in the tiny case. The reason is that each of these gestures has a repeating pattern, which distinguishes it from other miscellaneous movements. The most easily confused gestures are the clockwise circle and the triangle, because both trace similar shapes in a clockwise direction. When the user tries to make her gesture very tiny, the head movement space is suppressed greatly, which makes these two gestures indistinguishable. Since our system allows users to define their own gestures, we can notify them when a new gesture is too similar to a pre-existing template to ensure the best user experience.

2) Gestures while walking: When a user is walking, it is natural that she will perform gestures in an obvious, unconstrained way; otherwise the gesture would simply be buried in the noise of her walking. From the confusion matrix in Fig. 7(c), we find a minor deterioration of accuracy in recognizing gestures such as right3 and rect, which we believe is caused by the noise of walking movement. However, the triangle and clockwise circle become more distinguishable, as we find they are easier for the user to perform while walking than while sitting.

Number and Alphabet Input. Next, we evaluate gesture recognition accuracy when head gestures are used as a number and alphabet input method. Users are asked to draw 0-9 and a-z at least 10 times each. While 35 of the 36 gestures are identified with 100% accuracy, the only error is one instance of the number 9 mis-recognized as the number 7. The failure is due to a limitation of template matching: 7 and 9 are simply too similar if the user does not write them carefully using head movement. One way to improve this is to explore different styles of writing them, which is out of the scope of this paper.

Gesture Recognition Performance. To demonstrate the performance of gesture recognition, we evaluate it on the processing of the 36 number (0-9) and alphabet (a-z) gestures. Firstly, we want to determine the proper down-sampling length n_ED for calculating the Euclidean distance used in the kNN search and n_DTW for calculating the DTW distance used in template matching. In Fig. 8, we evaluate the gesture recognition accuracy at different down-sampling lengths (n_ED) and numbers of nearest neighbours (k). We find that setting n_ED to 10 gives the best accuracy. In Fig. 9, we vary n_DTW in the linear scan using the DTW distance metric. The time cost grows exponentially with the input length, while the accuracy reaches a satisfactory level when the down-sampling length is as small as 40 or 50. Next, we show the processing speedup of our scheme against the linear-scan baseline. The results are shown in Fig. 10. We set n_ED = 10 and n_DTW = 50. The number of nearest neighbours to be searched can be set to 10-14, which is a reasonable trade-off between processing speed and accuracy based on Fig. 10 and Fig. 8. The running time is reduced by 70% when k = 10 and by 55% when k = 14. We use k = 14 in our system.
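To make the two-stage matching concrete, the sketch below illustrates the idea on a single 1-D motion channel: candidate templates are first ranked by Euclidean distance on signals down-sampled to n_ED points, and only the k nearest neighbours are re-scored with DTW at n_DTW points. The helper names (downsample, dtw_distance, recognize) and the single-channel simplification are ours for illustration; the actual system operates on multi-axis motion data.

```python
import numpy as np

def downsample(seq, n):
    """Resample a 1-D gesture signal to n points by linear interpolation."""
    seq = np.asarray(seq, dtype=float)
    old_idx = np.linspace(0.0, 1.0, len(seq))
    new_idx = np.linspace(0.0, 1.0, n)
    return np.interp(new_idx, old_idx, seq)

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def recognize(query, templates, labels, n_ed=10, n_dtw=50, k=14):
    """Coarse kNN on n_ed-point signals (Euclidean distance), then DTW
    re-ranking of the k candidate templates at n_dtw points."""
    q_coarse = downsample(query, n_ed)
    coarse = [np.linalg.norm(q_coarse - downsample(t, n_ed)) for t in templates]
    candidates = np.argsort(coarse)[:k]          # k nearest neighbours

    q_fine = downsample(query, n_dtw)
    best = min(candidates,
               key=lambda i: dtw_distance(q_fine, downsample(templates[i], n_dtw)))
    return labels[best]
```

In practice each template would carry all sensor axes, and the per-axis distances could be summed before ranking; the default parameter values above simply mirror the choices reported in this subsection (n_ED = 10, n_DTW = 50, k = 14).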
B. Authentication Evaluation

We have collected motion sensor data from 18 users while they answer questions using head gestures, gathering around 100-150 trials for each gesture of each user. These data are pre-processed and used for feature extraction, model training, and evaluation.

Impact of the Number of Training Samples. Before training the model, we want to decide on an appropriate number of training samples, since it is a trade-off between authentication accuracy and user convenience. We run the one-class SVM (OCSVM) training process with 10-fold cross validation. Based on the trained models, we also build a one-class ensemble SVM classifier (OCESVM). As plotted in Fig. 12, we increase the fraction of samples used for training from 0.1 to 1.0, use the rest as test samples, and report the average TPR and FPR over all users. We find that 30 samples (a 0.2 ratio) is sufficient to achieve an average TPR higher than 70% while keeping the average FPR lower than 0.3%. OCESVM shows a large gain in TPR and a slight deterioration of FPR when the sizes of training samples
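The training-size experiment can be reproduced in spirit with the following sketch, which uses scikit-learn's OneClassSVM as a stand-in for the libSVM-based implementation. The feature matrices, the nu and gamma values, and the helper name tpr_fpr_for_ratio are illustrative assumptions; the ensemble variant (OCESVM) and the 10-fold cross validation are omitted for brevity.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def tpr_fpr_for_ratio(genuine, impostor, ratio, nu=0.1, gamma="scale", seed=0):
    """Train a one-class SVM on a fraction of one user's gesture features and
    report TPR on the held-out genuine samples and FPR on other users' samples.

    genuine:  (n, d) feature matrix of the enrolled user
    impostor: (m, d) feature matrix pooled from the remaining users
    ratio:    fraction of genuine samples used for training (0 < ratio < 1)
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(genuine))
    n_train = max(1, int(ratio * len(genuine)))
    train, test = genuine[idx[:n_train]], genuine[idx[n_train:]]

    clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train)

    tpr = np.mean(clf.predict(test) == 1)        # genuine samples accepted
    fpr = np.mean(clf.predict(impostor) == 1)    # impostor samples accepted
    return tpr, fpr

# Sweep the training ratio as in Fig. 12; feature extraction and data loading
# are assumed to have produced user_features and other_users_features arrays.
# for r in np.arange(0.1, 1.0, 0.1):
#     print(r, tpr_fpr_for_ratio(user_features, other_users_features, r))
```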