Algorithm 1 Build KD-Tree
1: procedure BUILDKDTREES(T, nED)
2:   for each template t in T do
3:     downsample t to length nED
4:     store the result in Tdown
5:   end for
6:   build a KD-tree Tr from Tdown using Euclidean distance
7: end procedure

Algorithm 2 kNN search
1: procedure KNNSEARCH(Tr, t, k)
2:   put the k nearest neighbors of target t in tree Tr into C
3:   for each candidate in C do
4:     run DTW on the target and the candidate
5:   end for
6:   return the index of the minimal DTW distance
7: end procedure

Fig. 6: K-S test results for gesture Nod and Shake

answers. These questions could be something like “Is Ford the maker of your first car?”, “Is red your favorite color?”, etc. Next, the user is also involved in contributing an initial training set, from which a classifier model can be built. Because the classifier requires some training samples before sufficient accuracy is achieved (>30 training samples in our evaluation), we optionally propose that the system can leverage the gesture recognition module to opportunistically collect instances of the “nod” and “shake” gestures. Whenever GlassGesture recognizes these gestures, we store these instances for classifier training in the authentication module.

Feature Set. We select statistical features such as mean, standard deviation (std), root mean square (rms), kurtosis, skewness, median absolute deviation (mad), zero-crossing rate (zcr), and inter-quartile range (iqr). We also add features such as energy, duration, and inter-axis correlation (iac).
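Algorithms 1 and 2 above can be sketched in Python as follows. This is a minimal 1-D illustration under our own assumptions: the function names and the SciPy KD-tree are our choices, the system itself operates on 3-axis sensor traces, and it uses FastDTW [26] rather than the quadratic textbook DTW shown here.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_kdtrees(templates, n_ed):
    """Algorithm 1: downsample each template to length n_ed and index
    the downsampled copies in a KD-tree under Euclidean distance."""
    down = np.array([np.interp(np.linspace(0, len(t) - 1, n_ed),
                               np.arange(len(t)), t) for t in templates])
    return cKDTree(down), down

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW on 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_search(tree, down, templates, target, k, n_ed):
    """Algorithm 2: prefilter candidates with the KD-tree, then rank
    the k candidates by exact DTW distance and return the best index."""
    t_down = np.interp(np.linspace(0, len(target) - 1, n_ed),
                       np.arange(len(target)), target)
    _, idx = tree.query(t_down, k=k)
    idx = np.atleast_1d(idx)
    dists = [dtw_distance(target, templates[i]) for i in idx]
    return int(idx[int(np.argmin(dists))])
```

The KD-tree query keeps the expensive DTW computation restricted to the k most promising templates, which is the point of the two-stage design.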
Additionally, we add a new category of features called peak features (including average peak-to-peak duration, average peak-to-peak amplitude, and peak number) by analysing peaks in the motion sensor data, which we have found effective at characterizing movements like head gestures. We collect motion sensor data of gesture samples from 18 users (gender: 14 male / 4 female; age: 20-30: 11, 30-40: 5, 40+: 2) while they are answering yes-or-no questions using head gestures. We extract features from the raw accelerometer and gyroscope data on each axis, 84 unique features in total for each sample. To test the effectiveness of the selected features, we run a two-sample Kolmogorov-Smirnov test (K-S test) to see whether the features of different users come from distinguishable distributions. From the results in Fig. 6, we find that all the p-values returned by the K-S test are smaller than the significance level (0.05), which indicates the effectiveness of the selected features.

Training and Classification. SVM classifiers have been widely used in biometric-based authentication systems, and radial basis function (RBF) kernels have been shown to have good performance [13], [14]. For the authentication problem, a one-class classifier is the most practical model since, at the training phase, the system can only gather training samples from the authorized user. However, if the system is eventually able to gather enough negative instances, the one-class classifier might be outperformed by a two-class classifier. Therefore, for practicality, we report the one-class classifier results to assess our gesture-based authentication system. The training phase happens offline. We use a grid search with 10-fold cross validation to find the optimal parameters for the one-class SVM classifier (OCSVM) with the RBF kernel. To further improve the classification performance, we employ a one-class ensemble SVM method (OCESVM) [24] to combine multiple classifiers.
The basic idea is that we collect and rank multiple sets of parameters by the true positive rate (TPR) with a constraint on the false positive rate (FPR < 1%) during the grid search. Then the top-r (we set r = 5 empirically) classifiers are chosen to form an ensemble classifier using majority voting on the classification decisions. We use the trained model to classify the test samples. The test samples can be labeled in one of two ways: 1) samples from the authorized user; 2) samples from some other, unauthorized user. We will present the evaluation of our training and classification in the next section.

Feature Selection. While our features are extracted from three axes, it is possible that a gesture in 3D space may be well characterized by features extracted from data of only one (1D) or two axes (2D). Therefore, we apply recursive feature elimination (RFE) [25] to eliminate redundant or useless features, which increases accuracy and reduces delay. In RFE, the training process recursively selects the subset of features that works best on preliminary data. However, RFE usually works with multi-class classifiers, not one-class classifiers. Therefore, we propose a new way of applying RFE to one-class classification. The training set in one-class classification consists entirely of positive instances (the same class label). The basic idea is to divide the training set evenly into several groups and manually assign each group a different virtual class label, turning the one-class training set into a multi-class one. When applying RFE to this “fake” multi-class training set, we use 10-fold cross validation and vote on the features in each run. Since the features that top the voting result contribute most to differentiating those virtual groups, we eliminate features with more than e votes and finally train a one-class SVM classifier with the remaining features. The problem here is how to determine the value of e.
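A sketch of this virtual-label RFE scheme is below. The base estimator, group count, and per-fold RFE budget are our assumptions for illustration; the paper does not fix them.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC, OneClassSVM

def one_class_rfe(X, n_groups=3, n_select=10, e=3, seed=0):
    """Divide the positive-only training set into groups with virtual
    class labels, vote in each CV fold on the features RFE keeps for
    separating the groups, drop features with more than e votes (they
    mostly capture within-user variation), and train an OCSVM on the
    remaining features."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, n_groups, size=len(X))       # virtual class labels
    votes = np.zeros(X.shape[1], dtype=int)
    folds = KFold(n_splits=10, shuffle=True, random_state=seed)
    for train_idx, _ in folds.split(X):
        rfe = RFE(LinearSVC(max_iter=10000), n_features_to_select=n_select)
        rfe.fit(X[train_idx], y[train_idx])
        votes += rfe.support_.astype(int)            # one vote per kept feature
    keep = votes <= e
    if not keep.any():                               # guard: keep least-voted features
        keep = votes == votes.min()
    clf = OneClassSVM(kernel="rbf", gamma="scale").fit(X[:, keep])
    return clf, keep
```

Features that consistently separate the virtual groups reflect variation within the same user's gestures rather than the user's identity, which is why the top-voted features are the ones eliminated.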
Through our experiments we empirically set e = 3, which gives the best performance in most of our trials. We will evaluate this feature selection scheme later.

IV. EVALUATION

Currently, we have implemented GlassGesture as a Google Glass application. We adopt the FastDTW implementation [26]