TouchID: User Authentication on Mobile Devices via Inertial-Touch Gesture Analysis

XINCHEN ZHANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
YAFENG YIN*, State Key Laboratory for Novel Software Technology, Nanjing University, China
LEI XIE, State Key Laboratory for Novel Software Technology, Nanjing University, China
HAO ZHANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
ZEFAN GE, State Key Laboratory for Novel Software Technology, Nanjing University, China
SANGLU LU, State Key Laboratory for Novel Software Technology, Nanjing University, China

Due to the widespread use of mobile devices, it is essential to authenticate users on mobile devices to prevent sensitive information leakage. In this paper, we propose TouchID, which jointly uses the touch sensor and the inertial sensor for gesture analysis, to provide a touch gesture based user authentication scheme. Specifically, TouchID utilizes the touch sensor to analyze the on-screen gesture while using the inertial sensor to analyze the device's motion caused by the touch gesture, and then combines the unique features of the on-screen gesture and the device's motion for user authentication. To mitigate the intra-class difference and reduce the inter-class similarity, we propose a spatial alignment method for sensor data and segment the touch gesture into multiple sub-gestures in the space domain, to keep the stability of the same user and enhance the discriminability of different users. To provide a uniform representation of touch gestures with different topological structures, we present a four-part based feature selection method, which classifies a touch gesture into a start node, an end node, the turning node(s), and the smooth paths, and then selects effective features from these parts based on the Fisher Score. In addition, considering the uncertainty of the user's postures, which may change the sensor data of the same touch gesture, we propose a multi-threshold kNN based model to adaptively tolerate the posture difference for user authentication. Finally, we implement TouchID on commercial smartphones and conduct extensive experiments to evaluate it. The experimental results show that TouchID achieves good performance for user authentication, i.e., a low equal error rate of 4.90%.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing.

Additional Key Words and Phrases: User authentication, Touch gesture, Mobile devices, Inertial-touch gesture analysis

ACM Reference Format:
Xinchen Zhang, Yafeng Yin, Lei Xie, Hao Zhang, Zefan Ge, and Sanglu Lu. 2020. TouchID: User Authentication on Mobile Devices via Inertial-Touch Gesture Analysis. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 37, 4, Article 162 (December 2020), 29 pages. https://doi.org/10.1145/1122445.1122456

*Corresponding Author

Authors' addresses: Xinchen Zhang, State Key Laboratory for Novel Software Technology, Nanjing University, China, xczhang@smail.nju.edu.cn; Yafeng Yin, State Key Laboratory for Novel Software Technology, Nanjing University, China, yafeng@nju.edu.cn; Lei Xie, State Key Laboratory for Novel Software Technology, Nanjing University, China, lxie@nju.edu.cn; Hao Zhang, State Key Laboratory for Novel Software Technology, Nanjing University, China, h.zhang@smail.nju.edu.cn; Zefan Ge, State Key Laboratory for Novel Software Technology, Nanjing University, China, zefan@smail.nju.edu.cn; Sanglu Lu, State Key Laboratory for Novel Software Technology, Nanjing University, China, sanglu@nju.edu.cn.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

© 2020 Association for Computing Machinery.
2474-9567/2020/12-ART162 $15.00
https://doi.org/10.1145/1122445.1122456
1 INTRODUCTION

With the widespread use of mobile devices (e.g., smartphones, smartwatches, and tablets) in daily life, more and more sensitive information such as photos, emails, chat messages and bank accounts is stored on mobile devices; thus it is essential to authenticate users to prevent sensitive information leakage [22]. Traditionally, knowledge based authentication schemes were used, which mainly required the user to input predefined PIN codes or patterns for authentication. Unfortunately, these schemes are vulnerable to various attacks such as the smudge attack [2], shoulder surfing attack [49, 54], and password inference attack [56, 57, 62]. To overcome these issues, physiological feature based authentication schemes were proposed, which mainly utilized unique fingerprints [40, 42] and face features [14, 21] for authentication. It is convenient to stretch out a finger or show the face for authentication, so these schemes are adopted in many devices. However, capturing fingerprints or face features often requires dedicated or expensive hardware, e.g., a fingerprint sensor or a high-resolution camera. Besides, physiological feature based authentication schemes can be vulnerable to replaying and spoofing attacks [13, 26, 38, 41]. For example, a fake fingerprint generated by a 3D printer can achieve an average attack success rate of 80% [26], while images, videos or 3D face masks can be used to spoof face authentication systems [13].

In fact, both the knowledge based and the physiological feature based authentication schemes utilize "what" the user knows or "what" the user has for user authentication, while ignoring "how" the user performs during the authentication process, which is the focus of this paper. To solve this problem, behavioral biometrics based authentication schemes were proposed, which focus on the differences of user behaviors in performing gestures, for example, using the unique vibration characteristics of finger knuckles when being tapped [8], performing a sequence of rhythmic taps/slides on a device screen [9], using the hand's geometry in taps generated by different fingers [51], or using the inter-stroke relationship between multiple fingers in touch gestures [43]. These methods often work with common sensors embedded in many off-the-shelf devices instead of the dedicated sensors used in physiological feature based authentication schemes. However, the existing work often designed customized gestures for user authentication, while such gestures are rarely adopted in commercial mobile devices.

Different from the existing work, this paper aims to provide an online user authentication scheme, TouchID, which is expected to resist common attacks like the smudge attack [2], shoulder surfing attack [49, 54], password inference attack [56, 57, 62], and spoofing attacks [26, 38, 41]. In TouchID, the user performs a widely adopted graphic pattern based touch gesture with one finger on a commercial mobile device for user authentication, as shown in Fig. 1. The unlock operations are the same as the existing graphic pattern based unlocking method and easy to use. The graphic pattern is common and has been integrated into many COTS devices (e.g., Android-powered smartphones, which have an 87% global market share [28]) and applications (e.g., the payment software Alipay [4] and the image management software Safe Vault [58]). When compared with the existing work [43, 47, 51] designing customized gestures, TouchID does not use specific features like the displacements between different fingers or the relationship between a series of customized gestures; thus user authentication in TouchID can be more challenging. As shown in Fig. 1, when performing a touch gesture, TouchID leverages the touch sensor and the inertial sensor to capture the on-screen gesture and the device's motion, respectively. Then, TouchID utilizes the unique biometric features of the user's finger and the touch behavior, i.e., the geometry of the on-screen gesture and the micro movement of the device, to perform user authentication in an online manner. However, to achieve the above goal, it is necessary to mitigate the intra-class difference and reduce the inter-class similarity of gestures, i.e., enhancing the stability of gestures from the same user while enhancing the discriminability of gestures from different users. Specifically, we need to solve the following three challenges to provide the user authentication scheme TouchID.
Fig. 1. A typical application scenario of TouchID

The first challenge is how to mitigate the intra-class difference and reduce the inter-class similarity among gestures. To address this challenge, we first conduct extensive experimental studies to observe how the finger moves in a touch gesture and how the device moves in response to the gesture. We find that time differences among touch gestures can affect the intra-class difference and inter-class similarity. Therefore, we propose a spatial alignment method to align the sensor data in the space domain, and then segment the touch gesture into multiple sub-gestures to highlight the sub-gestures which contribute more to enhancing the stability of the same user and the discriminability of different users.

The second challenge is how to represent touch gestures with different topological structures in a uniform way. The touch gestures corresponding to different graphic patterns have different numbers of sub-gestures, and the sub-gestures themselves can also be different, which may lead to different representations of a touch gesture. To address this challenge, we propose a four-part feature selection method, which classifies a touch gesture into four parts, i.e., a start node, an end node, turning node(s), and smooth paths, regardless of the topological structure of the touch gesture. Then, we select effective features for each part based on the Fisher Score. Finally, we can represent each gesture with a feature vector consisting of the uniform feature set.

The third challenge is how to tolerate the uncertainty caused by different body postures and hand postures. When performing a touch gesture, the user can sit, lie, or stand, and she/he can interact with the device with one hand or two hands; these different postures will lead to inconsistency of the sensor data for the same touch gesture. To address this challenge, we design a multi-threshold kNN based model to adaptively separate the touch gestures under different postures into different clusters, and then perform user authentication in each cluster. In addition, to reduce the computation overhead of the multi-threshold kNN, we only use a small number of samples for training.

We make three main contributions in this paper. 1) We conduct an extensive experimental study to observe the finger's movement and the device's motion when performing a touch gesture, and then propose a spatial alignment method to align the touch gesture in the space domain and segment the gesture into sub-gestures, to enhance the stability of the same user and the discriminability of different users. 2) Based on a comprehensive analysis of touch sensor data and inertial sensor data, we propose a four-part feature selection method to represent touch gestures with different topological structures in a uniform way, and select effective features based on the Fisher Score by considering both the intra-class stability and the inter-class discriminability. In addition, we propose a multi-threshold kNN based model to mitigate the effect of different postures. 3) We implement TouchID on an Android-powered smartphone and conduct extensive experiments to evaluate the efficiency of TouchID.
The experimental results show that TouchID can achieve a very low equal error rate for user authentication and outperforms the existing solutions.

2 RELATED WORK

When considering the unique features in user behaviors, a variety of gesture based user authentication schemes have been proposed. The gestures include hand gestures [8, 20], eye movements [11, 17], lip motions [37, 48], heartbeats [25, 36, 52], touch gestures [1, 7, 9, 12, 16, 31, 34, 43, 46, 47, 51, 55, 59–61], and so on. Among these gestures, touch gestures are often used to authenticate owners of mobile devices, since users often interact with mobile devices through touch gestures. In this paper, we summarize the related work using touch gestures for user authentication on mobile devices. From the perspective of the sensors used for monitoring the touch gestures, we classify the related work into three categories, i.e., touch sensor based, inertial sensor based, and inertial-touch based user authentication.

Touch sensor based user authentication: When performing an on-screen gesture, the touch sensor can provide the coordinates of fingertips and the sizes of touch areas, which can be used for inferring the moving directions, moving speeds, and moving trajectories of fingertips, the relative distances among multiple fingers, etc. The unique features in touch gestures can be used to differentiate users. Until now, many touch-related gestures, e.g., swiping [1, 9, 12, 16, 55, 59, 60], tapping [9, 34, 61], zooming in/out [60], and some user-defined gestures [47], have been proposed for user authentication. When performing a single-touch swiping gesture on the screen, Frank et al. [16] investigated whether a classifier can continuously authenticate users. Chen et al. [9] required users to perform a sequence of rhythmic tapping or swiping gestures on a multi-touch mobile device, and then extracted features from the rhythmic gestures to authenticate users. Moreover, Song et al. [47] studied multi-touch swiping gestures and showed that both the hand geometry and the behavioral biometric can be recorded in these gestures and used for user authentication. Besides, some methods also proposed the usage of image-based features for modeling touchscreen data. Zhao et al. [60] proposed a novel Graphic Touch Gesture Feature (GTGF) to extract identity traits from swiping and zooming in/out gestures, where the intensity values and shapes of the GTGF dynamically represent the moving trace and pressure. The existing work indicates that the touch sensor can be used for on-screen gesture authentication. However, with only the touch sensor, these methods mainly focus on the geometry features on the screen. To get more distinguishable features, they may require the user to perform gestures in a rhythm or adopt user-defined gestures.

Inertial sensor based user authentication: The inertial sensor refers to the accelerometer, gyroscope, magnetometer, or any combination of the three sensors. It can measure the device's motion related to the touch gesture for user authentication. For example, when tapping on the smartphone, Sitová et al. [46] used the accelerometer, gyroscope, and magnetometer to capture the micro hand movements and dynamic orientation changes during several tapping gestures for user authentication. Chen et al. [8] used the accelerometer in smartwatches to capture the vibration characteristics when a user taps her/his finger knuckles, and then extracted features from the vibration signals to authenticate users. Shen et al. [7] used the accelerometer, gyroscope, and magnetometer to capture behavioral traits during swiping gestures and built a one-class Markov-based decision procedure to authenticate users. Guerra-Casanova et al. [20] used the accelerometer in a mobile device to depict the behavioral characteristics when a user performs hand gestures while holding the mobile device. The existing work indicates that the inertial sensor can be used for touch gesture based user authentication. However, different from the touch sensor, the inertial sensor mainly captures indirect sensor data of the touch gesture, i.e., using the sensor data corresponding to device motions to infer the characteristics of touch gestures, and it is often used to authenticate simple or short gestures like tapping and swiping.

Inertial-touch based user authentication: When combining the touch sensor and the inertial sensor, it is possible to capture the on-screen gesture and the device's motion at the same time, thus enriching the sensor data
for user authentication. When using customized touch gestures on the smartphone for user authentication, Shahzad et al. [43] designed and chose 10 effective gestures, including 3 single-touch gestures and 7 multi-touch gestures, and then proposed GEAT, which extracted behavioral features from the touch sensor and the accelerometer for user authentication. Jain et al. [31] utilized the accelerometer, orientation sensor, and touch sensor to capture the touch characteristics during swiping gestures, and used the modified Hausdorff distance as the classifier to authenticate users. Wang et al. [51] also utilized the accelerometer, gyroscope, and touch sensor to capture the geometry of the user's hand when performing user-defined gestures that require the user to tap four times with one finger or tap once with four fingers. The existing work indicates that using the touch sensor as well as the inertial sensor can capture both the on-screen features and the motion features of the device. However, the existing work tends to introduce multi-touch gestures and user-defined gestures, to capture more specific features corresponding to a user (e.g., the relations between fingers in a gesture) for better authentication.

To capture both the on-screen gesture and the device motions, we utilize the touch sensor, accelerometer, and gyroscope for user authentication. In regard to the touch gesture, the existing work mainly used customized gestures for user authentication, while such gestures are rarely used in commercial mobile devices. In this paper, the user only performs a widely adopted unlock gesture, i.e., the graphic pattern based touch gesture, with one finger for user authentication. Due to the lack of specific features like the displacements between different fingertips in multi-touch gestures and the relationship between a series of customized gestures, the user authentication in this paper, which only uses a widely adopted single touch gesture, can be more challenging.

Fig. 2. Performing a touch gesture on a smartphone

3 OBSERVATIONS AND MODELING

In this section, we conduct extensive experiments to observe how the user performs a touch gesture and how the gesture affects the sensor data. Unless otherwise specified, the user performs a common unlock gesture on a 3 × 3 grid on the Samsung Galaxy S9 smartphone with a 5.9-inch screen, as shown in Fig. 2. Then, we use the touch sensor and the inertial sensor (i.e., accelerometer and gyroscope) to collect the sensor data for analysis. The sampling rates of the touch sensor, accelerometer, and gyroscope are set to 60 Hz, 100 Hz, and 100 Hz, respectively.

3.1 Finger Movements and Device Motions during a Touch Gesture

A touch gesture can be measured with the fingertip's coordinates on the screen, the fingertip's touch sizes along the trajectory, and the device's motions along the time. When the fingertip touches the screen, it will generate the following data, i.e., the coordinate, the touch size, and the pressure of the fingertip. However, due to the limitation of
the Android API [27], the measured pressure in many smartphones is either 0 or 1, i.e., touching or non-touching, which is too coarse-grained to analyze the touch force. Therefore, we use the device's motion caused by the touch gesture to represent the pressure indirectly. As shown in Fig. 2, when performing a touch gesture, we can obtain the moving trajectory and touch sizes of the fingertip along the time from the touch sensor. In regard to the device's motion, it is caused by the resultant force of the gravity $F_g$, the hand grasping the phone $F_h$, and the fingertip pressing the screen $F_p$. When holding the device statically in a fixed orientation, the forces from the gravity and the hand can be treated as constant forces, thus the device's motion is mainly caused by the finger's pressure. That is to say, the device's motion measured by the embedded accelerometer and gyroscope can represent the finger's pressure. Consequently, we can combine the on-screen gesture and the device's motion to describe a touch gesture on the mobile device.

Fig. 3. Touch sensor data and inertial sensor data of two touch gestures for user 1: (a) velocity (touch sensor), (b) direction (touch sensor), (c) acceleration-Z (inertial sensor), (d) rotation-Y (inertial sensor)

Fig. 4. Touch sensor data and inertial sensor data of two touch gestures for user 2: (a) velocity (touch sensor), (b) direction (touch sensor), (c) acceleration-Z (inertial sensor), (d) rotation-Y (inertial sensor)

3.2 Feasibility of User Authentication

The touch gestures are expected to demonstrate the stability of gestures from the same user, while demonstrating the discriminability of gestures from different users. To explore whether the touch gesture can be used for user authentication, we first invite two users and each one performs the gesture 'L' twice on the same smartphone, as shown in Fig. 2. In Fig. 3 and Fig. 4, we show the velocity and direction inferred from the touch sensor data, as well as the linear acceleration in the z-axis and the angular velocity in the y-axis measured by the inertial sensor, for each user, respectively. The solid and dashed lines in the same figure indicate that the gestures from the same user have a high consistency. When comparing the figures in the same column of Fig. 3 and Fig. 4, we can conclude that the gestures from different users have differences in sensor data.

To measure the similarity (or difference) between the sensor data from different gestures, we introduce the operations of normalization and interpolation, as well as the metric of Root Mean Squared Error (RMSE) [6]. For simplicity, we use $d_{1i}, i \in [1, n_1]$ and $d_{2i}, i \in [1, n_2]$ to represent the time-series data for the first gesture and
the second gesture, and then we normalize $d_{pi}, p \in [1, 2]$ with Eq. (1). Here, $d'_{pi}$ denotes the normalized data, $d_{p\min} = \{d_{pi} \mid d_{pi} \le d_{pj}, \forall j \neq i\}$, and $d_{p\max} = \{d_{pi} \mid d_{pi} \ge d_{pj}, \forall j \neq i\}$.

$$d'_{pi} = \begin{cases} \dfrac{d_{pi} - d_{p\min}}{d_{p\max} - d_{p\min}}, & d_{p\min} \neq d_{p\max}, \\ 0, & d_{p\min} = d_{p\max}. \end{cases} \qquad (1)$$

Considering that the lengths of $d'_{1i}$ and $d'_{2i}$ can be different, i.e., $n_1 \neq n_2$, we introduce a linear interpolation algorithm [10] to make the lengths of $d'_{1i}$ and $d'_{2i}$ equal. Suppose $n_1 > n_2$; we need to interpolate data points into $d'_{2i}$ to change its length to $n_1$. Consequently, the interval between consecutive data points is changed to $\frac{n_2-1}{n_1-1}$, and the $k$th data point of the second gesture is computed by Eq. (2), where $k \in [2, n_1]$; when $k = 1$, $d'_{2k} = d_{21}$.

$$d'_{2k} = d_{2i} + (d_{2i+1} - d_{2i}) \cdot \left(k \cdot \frac{n_2-1}{n_1-1} - i\right), \quad i = \left\lfloor k \cdot \frac{n_2-1}{n_1-1} \right\rfloor \qquad (2)$$

At this time, we have obtained the time-series data $d'_{1i}$ and $d'_{2k}$ with the same length $n_1$. Then, we can use Eq. (3) to calculate the similarity (or difference), i.e., the RMSE value $r_{12}$, between them. Here, $r_{12} \in [0, 1]$; the smaller the value of $r_{12}$, the higher the similarity (i.e., the smaller the difference).

$$r_{12} = \sqrt{\frac{1}{n_1} \sum_{k=1}^{n_1} \left(d'_{1k} - d'_{2k}\right)^2} \qquad (3)$$

To use the RMSE value to illustrate the stability or discriminability among gestures, we invite three users and each user performs gesture 'L' 50 times. Then we calculate the RMSE values of the sensor data from the same user and from different users, respectively. According to Fig. 5, the RMSE value corresponding to the same user (i.e., 'U$i$-U$j$', $i = j$) is generally less than that corresponding to different users (i.e., 'U$i$-U$j$', $i \neq j$). It indicates that the gestures from the same user keep their similarity while the gestures from different users have unavoidable differences.

Fig. 5. The RMSE values between three users along each type of sensor data: (a) velocity, (b) direction, (c) acceleration-Z, (d) rotation-Y
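As a concrete illustration of Eqs. (1)-(3), the following Python sketch normalizes two time series, resamples them to a common length, and computes their RMSE. This is a minimal sketch rather than the paper's implementation: the arrays are stand-ins for one dimension of sensor data (e.g., the y-axis angular velocity of two gesture samples), and numpy's built-in linear interpolation is used, which follows the same idea as Eq. (2).

```python
import numpy as np

def normalize(d):
    """Min-max normalize a 1-D time series into [0, 1] (Eq. (1))."""
    d = np.asarray(d, dtype=float)
    d_min, d_max = d.min(), d.max()
    if d_min == d_max:
        return np.zeros_like(d)
    return (d - d_min) / (d_max - d_min)

def interpolate_to(d, n_target):
    """Linearly resample a 1-D series to n_target points (the idea of Eq. (2))."""
    d = np.asarray(d, dtype=float)
    old_idx = np.linspace(0.0, 1.0, num=len(d))
    new_idx = np.linspace(0.0, 1.0, num=n_target)
    return np.interp(new_idx, old_idx, d)

def rmse(d1, d2):
    """RMSE between two gestures' series after normalization and resampling (Eq. (3))."""
    a, b = normalize(d1), normalize(d2)
    n = max(len(a), len(b))
    a, b = interpolate_to(a, n), interpolate_to(b, n)
    return np.sqrt(np.mean((a - b) ** 2))

# Example: compare two synthetic samples of different lengths.
sample1 = np.sin(np.linspace(0, 3, 70))          # stand-ins for real sensor readings
sample2 = np.sin(np.linspace(0, 3, 55)) + 0.05
print(rmse(sample1, sample2))                    # small value -> similar gestures
```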
3.3 Sensor Data Alignment in Space Domain

The sensor data of touch gestures has unavoidable differences in the time domain while keeping non-negligible consistencies in the space domain. Due to the uncertainty of user behaviors, when a user performs the same gesture multiple times, the duration of each gesture can be different, which will reduce the stability of the gestures from the same user and lead to authentication errors. However, due to the layout constraint of the on-screen gesture, i.e., the fixed locations of nodes, the trajectories of the gestures corresponding to the same graphic pattern keep essential consistencies. This motivates us to align the sensor data of the gestures based on the graphic pattern, to improve the stability of gestures from the same user.

Fig. 6. Temporal characteristics of touch gestures: (a) duration distribution for four users, (b) angular velocity data in the y-axis of two samples from the same user, (c) temporal alignment results

Fig. 7. Spatial characteristics of touch gestures: (a) sensor data is spatially consistent, (b) spatial distribution and definition of nodes, (c) spatial alignment results

In fact, data alignment is an effective technique in data processing and has been used in many scenarios, such as signal alignment in communications (e.g., beam alignment in RADAR [18], optical axis alignment of the transmitter and receiver in LiDAR [15], C/A code alignment of the receiver and satellite in GPS [32]), point matching in point set registration [33, 39], sequence alignment in videos [5], and so on. Take the sequence alignment task [5] as an example: it leverages both spatial displacements and temporal variations between image frames as cues to correlate two different video sequences of the same dynamic scene in time and in space. Differently, we adopt the layout constraint in the space domain as a spatial cue to align the time-series sensor data, as described below.

Unavoidable time difference among touch gestures: To demonstrate the time difference among touch gestures, we invite four users to perform the gesture 'L' on the screen, as shown in Fig. 2. Each one performs the same gesture 50 times. As shown in Fig. 6(a), the durations of gestures corresponding to the same graphic pattern 'L' can be different, whether the gestures are performed by the same user or by different users. Specifically, in Fig. 6(b), we show the angular velocities in the y-axis of two gestures corresponding to 'L' from the same user. The duration difference between the two gestures (i.e., sample 1 and sample 2) is about 100 ms. At this time, to calculate the similarity between them, a temporal alignment method is often adopted, e.g., using the linear interpolation algorithm [10] in the time domain to make the number of data points in sample 1 and that in sample 2 equal, as shown in Fig. 6(c). However, this temporal alignment method may break the consistency between the gestures, i.e., decreasing the stability of gestures from the same user, as the misaligned peaks in Fig. 6(c) show. It indicates that it is inappropriate to align the sensor data in the time domain.
Non-negligible space consistency among gestures: Different from the sensor data in the time domain, touch gestures in the space domain are constrained by the layout of the lock screen, e.g., the 3 × 3 grid in Fig. 2. Consequently, the gestures corresponding to the same graphic pattern keep their consistency in the space domain. In Fig. 7(a), we show the moving velocity of the fingertip in each touch gesture whose graphic pattern is 'L'. We can find that the moving velocities of the two gestures have a high consistency in the space domain, e.g., smaller velocities in nodes and larger velocities between nodes. It indicates that it is possible to align the sensor data in the space domain to keep the stability of gestures.

To achieve the above goal, we first define the touch gesture on the screen in the space domain. As shown in Fig. 7(b), we use $p_i(x_i, y_i)$, $i \in [1, n]$ to represent the coordinate of the $i$th point in the moving trajectory of a touch gesture, while using the pair $\langle O_k, r_k \rangle$ to represent the $k$th node in the lock screen. Here, $O_k(x_{o_k}, y_{o_k})$, $k \in [1, m]$, and $r_k$ represent the center and the radius of the node, respectively. When considering the layout constraint of the lock screen, the points in a moving trajectory can be classified into in-node points (i.e., blue points in Fig. 7(b)) and out-node points (i.e., red points in Fig. 7(b)). That is to say, a touch gesture can be represented with in-node points and out-node points by turns along the time.

For an on-screen point $p_i(x_i, y_i)$, if it satisfies Eq. (4), it is located in the $k$th node; otherwise, it is out of the $k$th node.

$$\sqrt{(x_i - x_{o_k})^2 + (y_i - y_{o_k})^2} \le r_k, \quad k \in [1, m] \tag{4}$$

In this way, we can represent all the $n_k$ points in the $k$th node as $p_{k_j}$, $j \in [1, n_k]$, $k_j < k_{j+1}$ in sequence, based on the occurrence time of each point. In regard to the non-node points occurring between the $k$th node and the $(k+1)$th node, they are represented as $[p_{k_{n_k}+1}, p_{(k+1)_1-1}]$. For simplicity, we use $N_k$ and $C_{k,k+1}$ to represent the set of points in the $k$th node and the connection part between the $k$th node and the $(k+1)$th node, as shown in Fig. 7(b). For a node $N_k$, $k \in [1, m]$ (or a connection part $C_{k,k+1}$), we use the linear interpolation algorithm shown in Eq. (2) to align the sensor data of different gestures in the $k$th node (or in $C_{k,k+1}$). That is to say, in a node $N_k$ the number of data points from different gestures is the same, and likewise in a connection part $C_{k,k+1}$ the number of data points from different gestures is the same. When applying the linear interpolation in $N_k$ or $C_{k,k+1}$, each time we align the sensor data in one dimension, e.g., the coordinates in the x-axis, the coordinates in the y-axis, the touch areas along the time, etc. Finally, we can align all the sensor data in the space domain.

By introducing the spatial alignment, the angular velocity in Fig. 6(b) is transformed into Fig. 7(c), which solves the problem of the time difference between the gestures corresponding to the same graphic pattern. When compared with the temporal alignment result shown in Fig. 6(c), the spatial alignment result in Fig. 7(c) keeps a higher consistency of the gestures from the same user.
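The following is a minimal sketch of this spatial alignment, assuming the lock-screen layout is given as node centers and radii; the function and variable names are illustrative, not taken from the authors' implementation. It classifies trajectory points as in-node or out-node with Eq. (4), splits the trajectory into alternating node segments $N_k$ and connection segments $C_{k,k+1}$, and resamples each segment to a common per-segment length.

```python
import numpy as np

def in_node(x, y, centers, radii):
    """Return the index of the node containing (x, y) per Eq. (4), or -1 if none."""
    d = np.hypot(centers[:, 0] - x, centers[:, 1] - y)
    hits = np.where(d <= radii)[0]
    return int(hits[0]) if hits.size else -1

def split_segments(points, centers, radii):
    """Split an ordered (n, 2) trajectory into alternating in-node / out-node segments."""
    segments, labels = [], []
    current = []
    cur_label = in_node(points[0, 0], points[0, 1], centers, radii) >= 0
    for x, y in points:
        label = in_node(x, y, centers, radii) >= 0
        if label != cur_label and current:
            segments.append(np.array(current))
            labels.append(cur_label)
            current, cur_label = [], label
        current.append((x, y))
    segments.append(np.array(current))
    labels.append(cur_label)
    return segments, labels  # labels[i] is True for a node N_k, False for C_{k,k+1}

def align_segment(seg_values, num_points):
    """Resample one dimension of a segment (e.g., its x-coordinates) to num_points."""
    t_src = np.linspace(0.0, 1.0, num=len(seg_values))
    t_dst = np.linspace(0.0, 1.0, num=num_points)
    return np.interp(t_dst, t_src, seg_values)
```

In the full pipeline, the corresponding segments of different gestures (the same $N_k$ or $C_{k,k+1}$) would be resampled to the same length so that their sensor data can be compared point by point.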
To quantitatively measure the similarity (difference) of the spatially (temporally) aligned sensor data, we collect 50 samples of the gesture 'L' performed by the same user and calculate the RMSE value of the sensor data. The average RMSE value of the temporally and the spatially aligned sensor data is 0.282 (standard deviation = 0.081) and 0.157 (standard deviation = 0.045), respectively. As mentioned before, a low RMSE value means high similarity. Therefore, our spatial alignment method can keep the stability among gestures and reduce the intra-class difference for better user authentication.

3.4 Fine-grained Modeling for Gesture Segmentation

The parts of a touch gesture inside nodes and those outside nodes have different properties, especially the parts located at turning points. Therefore, after aligning the sensor data in the space domain, we further segment the touch gesture into several sub-gestures, to highlight the sub-gestures that contribute more to user authentication. In Fig. 8, we illustrate the mean acceleration in the x-axis over 50 samples corresponding to a whole touch gesture 'L', to the $i$th segmented sub-gesture, and to the $j$th segmented sub-gesture of 'L', for different users. The overlap in Fig. 8(a) indicates the low discriminability of whole gestures. However, the little overlap in Fig. 8(b) indicates the high discriminability of the $i$th segmented sub-gesture, while the large overlap in Fig. 8(c) indicates the very poor discriminability of the $j$th segmented sub-gesture. It means that different segments of a gesture have different stability and discriminability. Thus, it is necessary and meaningful to segment the whole touch gesture into sub-gestures, in order to extract the sub-gestures having a good stability for the same user and a good discriminability across different users.
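One simple way to quantify how well a sub-gesture feature separates users, in the spirit of the Fisher Score used later for feature selection, is to compare the between-user spread of the feature (e.g., the mean x-axis acceleration of a sub-gesture) with its within-user spread. The sketch below only illustrates this idea on assumed data; it is not the authors' selection procedure.

```python
import numpy as np

def fisher_like_score(samples_by_user):
    """Ratio of between-user scatter to within-user scatter for one scalar feature;
    larger values suggest the feature separates users better.

    samples_by_user: list of 1-D arrays, one array of feature values per user.
    """
    overall_mean = np.mean(np.concatenate(samples_by_user))
    between = sum(len(s) * (np.mean(s) - overall_mean) ** 2 for s in samples_by_user)
    within = sum(np.sum((s - np.mean(s)) ** 2) for s in samples_by_user)
    return between / within if within > 0 else np.inf

# Illustrative values of the mean x-axis acceleration of one sub-gesture
# for two users (50 samples each), mimicking the separation in Fig. 8(b).
rng = np.random.default_rng(0)
user1 = rng.normal(-0.18, 0.02, 50)
user2 = rng.normal(-0.08, 0.02, 50)
print(fisher_like_score([user1, user2]))  # well-separated users yield a large score
```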
Fig. 8. Distribution of the feature value for the whole gesture and for different sub-gesture parts: (a) x-axis acceleration of the entire gesture; (b) x-axis acceleration of the $i$th sub-gesture; (c) x-axis acceleration of the $j$th sub-gesture.

Fig. 9. Gesture segmentation scheme: the touch sensor data and the inertial sensor data go through data filtering & synchronization, spatial alignment of sensor data, and gesture segmentation based on graphic patterns, which outputs each sub-gesture's sensor data.

To segment the touch gesture, we need to split the inertial sensor data and the touch sensor data into each sub-gesture. As shown in Fig. 9, the gesture segmentation consists of three steps, i.e., data filtering and synchronization, spatial alignment of sensor data, and gesture segmentation based on the graphic patterns. Firstly, we use a moving average filter to remove the high-frequency noise in the inertial sensor data and the touch sensor data. Besides, considering the difference between the sampling rates of the touch sensor (i.e., 60 Hz) and the inertial sensor (i.e., 100 Hz), we introduce the linear interpolation described in Eq. (2) to synchronize the sensor data and make them have a uniform sampling rate, i.e., 100 Hz. Secondly, we use the spatial alignment method described in Section 3.3 to align the sensor data of gestures corresponding to the same graphic pattern, to keep the stability of gestures from the same user. Thirdly, we use the layout of the lock screen, i.e., the locations of the nodes, to segment the touch gesture into in-node sub-gestures and out-node sub-gestures by turns, as the blue segments and the red segments shown in Fig. 7(b). As mentioned in Section 3.3, we use $[p_{k_1}, p_{k_{n_k}}]$ to represent the in-node points in the $k$th node. Accordingly, the times of the first and the last data points occurring in the $k$th node are represented as $t_{k_1}$ and $t_{k_{n_k}}$, respectively. Therefore, the sensor data occurring in $[t_{k_1}, t_{k_{n_k}}]$ is split into the sub-gesture located in the $k$th node, while the sensor data occurring in $[t_{k_{n_k}+1}, t_{(k+1)_1-1}]$ is split into the sub-gesture located in the connection part between the $k$th node and the $(k+1)$th node.

4 DATA ANALYSIS AND FEATURE SELECTION

According to the observations in Section 3, the touch sensor data and the inertial sensor data of touch gestures can be used for authentication, since the sensor data shows the similarity of gestures from the same user and the difference of gestures from different users. However, the uncertainty of user behaviors may reduce the intra-class similarity and reduce the inter-class difference. Therefore, it is necessary to analyze the sensor data in detail