head gestures have different movement distributions along slightly different gestures.We evaluate nprw through our each axis.For example,a nodding gesture is much stronger experiments in the evaluation section. in the z-axis the the y-axis or z-axis.Weights are calculated by the std on each axis of the template as C.Head-Gesture-based Authentication std(Gtz) Basic Idea.As we mentioned previously,Glass does not Dx三 have a robust authentication scheme.To secure the interface in std(Gtz)+std(Gty)+std(Gtz) GlassGesture,we propose the use of signatures extracted from The best match (minimal D(l,lt))is optimized in the sense simple head gestures.In order to lead the user to perform a of an optimal alignment of those samples.We can say that natural and instinctual gesture,a"yes or no"security question, G matches Gt,if dtw(G,Gt)is below a certain threshold. that can be answered using head gestures,is presented on the To recognize which gesture is present in a given window,we near-eye display.The user answers with head movements.In need to run DTW iterating all templates.Whichever template this way,the instinctual gestures(nodding and shaking)can be has the lowest DTW distance with the target,and is below a considered consistent head movements.After that,the answer safety threshold,is selected as the recognition result. (gestures)will be verified by the system.Features are extracted 3)Efficient Similarity Search:DTW is a pair-wise template from motion sensors,then fed into a trained classifier.If matching algorithm,which means that to detect a gesture the answer is correct and the classifier labels the gesture as naively,we need to traverse all gesture templates.It costs belonging to the user,the user will be accepted.Otherwise,it O(N2)to compare two time series at length of N (we set will reject the user.Thus,we form a two-factor authentication I=lt =N for simplicity),which is not efficient when there scheme.While we mainly test the feasibility of the "nod" is a large number of gesture templates.We propose several and "shake"gestures,since they convey social meanings in schemes to optimize the performance. answering questions,we do not rule out the possibility of Firstly,to reduce the search complexity,we want to build other head gestures.This scheme has several advantages over a k-dimensional (k-d)tree to do k-Nearest Neighbor (kNN) the existing authentication done on the touchpad.First,the searches.However,tree branch pruning based on the triangle user does not have to remember anything,as the signatures inequality will introduce errors if applied directly on DTW we extract are inherent in their movement/gesture.Second, distances between gesture templates,since DTW distance is nod and shake are simple gestures taking almost no effort a non-metric and does not satisfy the triangle inequality [23]. from user.Finally,an attacker cannot brute-force this system Therefore,we build the tree using Euclidean distance (ED) even with significant effort,because 1)the near-eye display instead,which is a metric distance,and therefore preserves is a private display,which can prevent shoulder surfing on the triangle inequality,allowing us to do pruning safely. the secure questions;2)the signature of the head gestures are Secondly,to further reduce the computation,we down- hard to observe by the human eye,unaided by any special sample the inputs before calculating the ED.Then we build equipment.Furthermore they are difficult to forge even with the k-d tree.To recognize a target gesture,we first use the explicit knowledge of the features. down-sampled target gesture to do the kNN search over the Threat Model.We have identified three types of possible k-d tree.Then,we iterate over all k candidate templates to attackers.The Type-/attacker has no prior information what- calculate the DTW distance with the target to find the best soever.This attacker simply has physical access to the user's match with no down-sampling for the best accuracy. Glass and attempts to authenticate as the user.Type-I attacks The construction of a k-d tree is given in Alg.1.And the are very likely to fail and ultimately amount to a brute force kNN search is given in Alg.2.Say we have m templates, attack,which can be mitigated by locking the device after a which are all of length N.It costs O(m*N2)when iterating few consecutive authentication failures.The Type-II attacker over all the templates to match a target gesture,using DTW. may know the answer to the user specific security questions, The set of m gesture templates in N-space (each template but will try to authenticate with head gestures in their own is of length N)can be firstly down-sampled to nED-space natural styles (not trying to imitate the correct user's motions (each template is at nEp length,nED <N).We build a k-d or features).The Type-Ill attacker,the most powerful attacker, tree of O(m)size in O(m logm)time to process the down- not only knows the answers to the security questions,but sampled templates,of which the cost can be amortized.The also is able to observe authentication instances (e.g.through kNN search query can be answered in O(mED+k),where a video clip).The attacker can try to perform the gesture in a k is the number of query results.In total,the time cost is similar manner as the owner,in an attempt to fool the system. O(m"ED +k+k*N2). Note that,there is no security mechanism which can guarantee Lastly,we can also down-sample the gesture data before that the attacker will not be able to obtain the data on the running DTW after the kNN search.The time cost will become device once the attacker has physical access.The proposed O(m"ED+k+k*nprw2)where npTw<N is the down- authentication method can slow the attacker down,foil naive sampled length for DTW.However,it is non-trivial to choose or inexperienced attackers,and make the task of extracting proper nprw,since we don't want the down-sampling to data from the device more difficult. remove important features of the time series.If this is the Authentication Setup.In this offline setup phase,the user case,then the DTW algorithm may fail at differentiating two first needs to establish a large set of security questions withhead gestures have different movement distributions along each axis. For example, a nodding gesture is much stronger in the x-axis the the y-axis or z-axis. Weights are calculated by the std on each axis of the template as wx = std(Gtx) std(Gtx) + std(Gty) + std(Gtz) The best match (minimal D(l, lt)) is optimized in the sense of an optimal alignment of those samples. We can say that G matches Gt, if dtw(G, Gt) is below a certain threshold. To recognize which gesture is present in a given window, we need to run DTW iterating all templates. Whichever template has the lowest DTW distance with the target, and is below a safety threshold, is selected as the recognition result. 3) Efficient Similarity Search: DTW is a pair-wise template matching algorithm, which means that to detect a gesture naively, we need to traverse all gesture templates. It costs O(N2 ) to compare two time series at length of N (we set l = lt = N for simplicity), which is not efficient when there is a large number of gesture templates. We propose several schemes to optimize the performance. Firstly, to reduce the search complexity, we want to build a k-dimensional (k-d) tree to do k-Nearest Neighbor (kNN) searches. However, tree branch pruning based on the triangle inequality will introduce errors if applied directly on DTW distances between gesture templates, since DTW distance is a non-metric and does not satisfy the triangle inequality [23]. Therefore, we build the tree using Euclidean distance (ED) instead, which is a metric distance, and therefore preserves the triangle inequality, allowing us to do pruning safely. Secondly, to further reduce the computation, we downsample the inputs before calculating the ED. Then we build the k-d tree. To recognize a target gesture, we first use the down-sampled target gesture to do the kNN search over the k-d tree. Then, we iterate over all k candidate templates to calculate the DTW distance with the target to find the best match with no down-sampling for the best accuracy. The construction of a k-d tree is given in Alg. 1. And the kNN search is given in Alg. 2. Say we have m templates, which are all of length N. It costs O(m ∗ N2 ) when iterating over all the templates to match a target gesture, using DTW. The set of m gesture templates in N-space (each template is of length N) can be firstly down-sampled to nED-space (each template is at nED length, nED ≪ N). We build a k-d tree of O(m) size in O(m log m) time to process the downsampled templates, of which the cost can be amortized. The kNN search query can be answered in O(m 1 nED + k), where k is the number of query results. In total, the time cost is O(m 1 nED + k + k ∗ N2 ). Lastly, we can also down-sample the gesture data before running DTW after the kNN search. The time cost will become O(m 1 nED +k +k ∗nDTW 2 ) where nDTW ≪ N is the downsampled length for DTW. However, it is non-trivial to choose proper nDTW , since we don’t want the down-sampling to remove important features of the time series. If this is the case, then the DTW algorithm may fail at differentiating two slightly different gestures. We evaluate nDTW through our experiments in the evaluation section. C. Head-Gesture-based Authentication Basic Idea. As we mentioned previously, Glass does not have a robust authentication scheme. To secure the interface in GlassGesture, we propose the use of signatures extracted from simple head gestures. In order to lead the user to perform a natural and instinctual gesture, a “yes or no” security question, that can be answered using head gestures, is presented on the near-eye display. The user answers with head movements. In this way, the instinctual gestures (nodding and shaking) can be considered consistent head movements. After that, the answer (gestures) will be verified by the system. Features are extracted from motion sensors, then fed into a trained classifier. If the answer is correct and the classifier labels the gesture as belonging to the user, the user will be accepted. Otherwise, it will reject the user. Thus, we form a two-factor authentication scheme. While we mainly test the feasibility of the “nod” and “shake” gestures, since they convey social meanings in answering questions, we do not rule out the possibility of other head gestures. This scheme has several advantages over the existing authentication done on the touchpad. First, the user does not have to remember anything, as the signatures we extract are inherent in their movement/gesture. Second, nod and shake are simple gestures taking almost no effort from user. Finally, an attacker cannot brute-force this system even with significant effort, because 1) the near-eye display is a private display, which can prevent shoulder surfing on the secure questions; 2) the signature of the head gestures are hard to observe by the human eye, unaided by any special equipment. Furthermore they are difficult to forge even with explicit knowledge of the features. Threat Model. We have identified three types of possible attackers. The Type-I attacker has no prior information whatsoever. This attacker simply has physical access to the user’s Glass and attempts to authenticate as the user. Type-I attacks are very likely to fail and ultimately amount to a brute force attack, which can be mitigated by locking the device after a few consecutive authentication failures. The Type-II attacker may know the answer to the user specific security questions, but will try to authenticate with head gestures in their own natural styles (not trying to imitate the correct user’s motions or features). The Type-III attacker, the most powerful attacker, not only knows the answers to the security questions, but also is able to observe authentication instances (e.g. through a video clip). The attacker can try to perform the gesture in a similar manner as the owner, in an attempt to fool the system. Note that, there is no security mechanism which can guarantee that the attacker will not be able to obtain the data on the device once the attacker has physical access. The proposed authentication method can slow the attacker down, foil naive or inexperienced attackers, and make the task of extracting data from the device more difficult. Authentication Setup. In this offline setup phase, the user first needs to establish a large set of security questions with