AirContour: Building Contour-based Model for In-Air Writing Gesture Recognition

YAFENG YIN and LEI XIE, State Key Laboratory for Novel Software Technology, Nanjing University, China
TAO GU, RMIT University, Australia
YIJIA LU and SANGLU LU, State Key Laboratory for Novel Software Technology, Nanjing University, China

Recognizing in-air hand gestures will benefit a wide range of applications such as sign-language recognition, remote control with hand gestures, and “writing” in the air as a new way of text input. This article presents AirContour, which focuses on in-air writing gesture recognition with a wrist-worn device. We propose a novel contour-based gesture model that converts human gestures to contours in 3D space and then recognizes the contours as characters. Unlike 2D contours, 3D contours may suffer from problems such as contour distortion caused by different viewing angles, contour differences caused by different writing directions, and contour distribution across different planes. To address these problems, we introduce Principal Component Analysis (PCA) to detect the principal/writing plane in 3D space, and then tune the projected 2D contour in the principal plane through reversing, rotating, and normalizing operations, to put the 2D contour in the right orientation and at a normalized size under a uniform view. After that, we propose both an online approach, AC-Vec, and an offline approach, AC-CNN, for character recognition. The experimental results show that AC-Vec achieves an accuracy of 91.6% and AC-CNN achieves an accuracy of 94.3% for gesture/character recognition, both outperforming the existing approaches.
CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing design and evaluation methods; Empirical studies in ubiquitous and mobile computing;

Additional Key Words and Phrases: AirContour, in-air writing, contour-based gesture model, principal component analysis (PCA), gesture recognition

ACM Reference format:
Yafeng Yin, Lei Xie, Tao Gu, Yijia Lu, and Sanglu Lu. 2019. AirContour: Building Contour-based Model for In-Air Writing Gesture Recognition. ACM Trans. Sen. Netw. 15, 4, Article 44 (October 2019), 25 pages. https://doi.org/10.1145/3343855

This work is supported by National Natural Science Foundation of China under Grant Nos. 61802169, 61872174, 61832008, 61321491; JiangSu Natural Science Foundation under Grant No. BK20180325; the Fundamental Research Funds for the Central Universities under Grant No. 020214380049; Australian Research Council (ARC) Discovery Project Grants DP190101888 and DP180103932. This work is partially supported by Collaborative Innovation Center of Novel Software Technology and Industrialization.

Authors’ addresses: Y. Yin, L. Xie (corresponding author), Y. Lu, and S. Lu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; emails: {yafeng, lxie}@nju.edu.cn, lyj@smail.nju.edu.cn, sanglu@nju.edu.cn; T. Gu, School of Computer Science and Information Technology, RMIT University, Melbourne VIC 3000, Australia; email: tao.gu@rmit.edu.au.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from permissions@acm.org. © 2019 Association for Computing Machinery. 1550-4859/2019/10-ART44 $15.00 https://doi.org/10.1145/3343855 ACM Transactions on Sensor Networks, Vol. 15, No. 4, Article 44. Publication date: October 2019
1 INTRODUCTION

With the advancement of rich embedded sensors, mobile and wearable devices (e.g., smartphones, smartwatches) have been widely used in activity recognition [21, 23, 26, 31, 37, 41, 45] and benefit many human-computer interactions, e.g., motion-sensing games [25], sign-language recognition [12], in-air writing [1], and so on. As a typical interaction mode, writing in the air has attracted wide attention [6, 9, 10, 36, 39]. It allows users to write characters freely in the air with the arm and hand, without focusing attention on the small screen or tiny keys of a device [2]. As shown in Figure 1, a user carrying or wearing a sensor-embedded device writes in the air, and the gesture is recognized as a character. Recognizing in-air writing gestures is a key technology for writing gesture-based interaction in the air and can be used in many scenarios. For example, a user can “write” commands in the air to control an unmanned aerial vehicle (UAV) while looking at the scene transmitted from the UAV in a virtual reality (VR) headset, avoiding having to take off the VR headset and input the commands with a controller. Another example is replacing traditional on-screen text input by “writing” the text message in the air, thus allowing users to interact with mobile or wearable devices that have a tiny screen or no screen at all. Besides, when one of the user's hands is occupied, typing on a keyboard becomes inconvenient; the sensor-assisted in-air input technology can be used to capture hand gestures and lay them out as text or images [1]. Compared with existing handwriting, voice, or camera-based input, in-air writing with inertial sensors can tolerate limited screen size, environmental noise, and poor lighting conditions. In this article, we focus on recognizing in-air writing gestures as characters.

In inertial sensor-based gesture recognition, many approaches have been proposed.
Some data-driven approaches [2, 7, 10, 15, 35] tend to extract features from sensor data to train classifiers for gesture recognition, while paying little attention to human activity analysis. If the user performs gestures with more degrees of freedom, i.e., the gestures may have large variations in speed, size, or orientation, then this type of approach may fail to recognize them with high accuracy. In contrast, some pattern-driven approaches [1, 13, 32] try to capture the moving patterns of gestures for activity recognition. For example, Agrawal et al. [1] utilize segmented strokes and a grammar tree to recognize capital letters in a 2D plane. However, due to the complexity of analyzing human activities, this type of approach may redefine the gesture patterns or constrain the gestures to a limited area (e.g., a limited 2D plane), which may degrade the user experience. To track continuous in-air gestures, Shen et al. [29] utilize a 5-DoF arm model and HMM to track the 3D posture of the arm. However, in 3D space, tracking is not directly linked to recognition, especially when the trajectory (e.g., a handwriting trajectory) lies in different planes. Therefore, it remains a challenging task to apply the existing approaches to recognize in-air writing gestures that occur in 3D space with more degrees of freedom while guaranteeing user experience.

To address the aforementioned issues, in this article we explore contours to represent in-air writing gestures and propose a novel contour-based gesture model, where the “contour” is represented as a sequence of coordinate points over time. We use an off-the-shelf wrist-worn device (e.g., a smartwatch) to collect sensor data, and our basic idea is to build a 3D contour model for each gesture and utilize the contour feature to recognize gestures as characters, as illustrated in Figure 1.
Since the gesture contour keeps the essential movement patterns of in-air gestures, it can tolerate the intra-class variability of gestures. It is worth noting that while the proposed “contour-gesture” model is applied to in-air writing gesture recognition in this work, it can also be used in sign-language recognition and remote control with hand gestures [40]. However, different from 2D contours, building 3D contours presents several challenges, i.e., contour distortion caused by different viewing angles, contour difference caused by different writing directions, and contour distribution across different planes, making it difficult to recognize 3D contours as 2D characters.
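To make the core idea concrete, the principal-plane detection via PCA can be sketched in a few lines: PCA on the 3D contour points yields two in-plane directions (the writing plane) and a normal, and projecting onto the in-plane directions gives a 2D contour that can then be normalized. This is a minimal, hypothetical illustration, not the authors' implementation; the function and variable names are ours, and the full pipeline additionally applies reversing and rotating calibrations.

```python
import numpy as np

def project_to_principal_plane(contour_3d):
    """PCA: project a 3D gesture contour (an N x 3 array of coordinate
    points over time) onto its principal/writing plane."""
    centered = contour_3d - contour_3d.mean(axis=0)
    # Rows of vt are the principal directions, strongest variance first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_basis = vt[:2]   # two directions spanning the writing plane
    normal = vt[2]         # least-variance direction: the plane normal
    return centered @ plane_basis.T, normal

def normalize_contour(contour_2d):
    """Scale a 2D contour so the larger side of its bounding box is 1."""
    span = contour_2d.max(axis=0) - contour_2d.min(axis=0)
    return contour_2d / span.max()

# Example: an "o"-like circular gesture written in a tilted plane.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
u = np.array([1.0, 0.0, 0.0])                  # in-plane direction 1
v = np.array([0.0, np.cos(0.3), np.sin(0.3)])  # in-plane direction 2 (tilted)
gesture = np.outer(np.cos(t), u) + np.outer(np.sin(t), v)
contour_2d, normal = project_to_principal_plane(gesture)
```

Because the synthetic gesture lies exactly in the plane spanned by u and v, the recovered normal is (up to sign) their cross product, and the projected points keep their in-plane geometry.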
Fig. 1. AirContour: in-air writing gesture recognition based on contours.

To solve this problem, we first describe the range of viewing angles based on the way the device is worn, which indicates the possible writing directions. We then apply Principal Component Analysis (PCA) to detect the principal/writing plane, i.e., the plane in which most of the contour is located or to which it is close. After that, we calibrate the 2D projected contour in the principal plane for gesture/character recognition, while considering the distortion caused by dimensionality reduction and the difference in gesture sizes.

We make the following contributions in this article:

• To the best of our knowledge, we are the first to propose a contour-based gesture model to recognize in-air writing gestures. The model is designed to solve the new challenges of 3D gesture contours, e.g., observation ambiguity and the uncertain orientation and distribution of 3D contours, and to tolerate the intra-class variability of gestures. The contour-based gesture model can be applied not only to in-air writing gesture recognition, but also to many other scenarios such as sign-language recognition, motion-sensing games, and remote control with hand gestures.

• To recognize gesture contours in 3D space as characters in a 2D plane, we introduce PCA for dimensionality reduction and a series of calibrations for 2D contours. Specifically, we first utilize PCA to detect the principal/writing plane, and then project the 3D contour onto the principal plane for dimensionality reduction. After that, we calibrate the 2D contour in the principal plane through reversing, rotating, and normalizing operations, to put it in the right orientation and at a normalized size under a uniform view, i.e., to make the 2D contour suitable for character recognition.

• We conduct extensive experiments to verify the efficiency of the proposed contour-based gesture model.
In addition, based on the model, we propose an online approach, AC-Vec, and an offline approach, AC-CNN, to recognize 2D contours as characters. The experimental results show that AC-Vec and AC-CNN achieve an accuracy of 91.6% and 94.3%, respectively, for gesture/character recognition, and both outperform the existing approaches.

2 RELATED WORK

In this section, we describe and analyze the state of the art related to in-air gesture recognition, tracking, writing in the air, and handwritten character recognition, focusing especially on inertial sensor-based techniques.

In-air gesture recognition: Parate et al. [26] design a mobile solution called RisQ to detect smoking gestures and sessions with a wristband, using a machine learning pipeline to process sensor data. Blank et al. [7] present a system for table tennis stroke detection and classification by attaching inertial sensors to table-tennis rackets. Thomaz et al. [31] describe the implementation
and evaluation of an approach to infer eating moments using a 3-axis accelerometer in a smartwatch. Xu et al. [35] build a classifier to identify users' hand and finger gestures utilizing the essential features of accelerometer and gyroscope data measured from a smartwatch. Huang et al. [18] build a system to monitor brushing quality using a manual toothbrush modified by attaching small magnets to the handle and an off-the-shelf smartwatch. These approaches typically extract features from sensor data and apply machine learning techniques for gesture recognition.

In-air gesture tracking: Zhou et al. [42–44] utilize a kinematic chain to track human upper-limb motion by placing multiple devices on the arm. Cutti et al. [11] utilize joint angles to track the movements of the upper limbs by placing sensors on the chest, shoulder, arm, and wrist. Chen et al. [8] design a wearable system consisting of a pair of magnetometers on the fingers and a permanent magnet affixed to the thumb, and introduce uTrack to convert the thumb and fingers into a continuous input system (e.g., 3D pointing). Shen et al. [29] utilize a 5-DoF arm model and HMM to track the 3D posture of the arm, using both motion and magnetic sensors in a smartwatch. In fact, accurate in-air gesture tracking in real time can be very challenging. Besides, obtaining the 3D moving trajectory does not by itself mean recognizing in-air gestures. In this article, we do not require accurate trajectory tracking; instead, we aim to obtain the gesture contour and recognize it as a character.

Writing in the air: Zhang et al. [39] quantize data into small integral vectors based on acceleration orientation and then use HMM to recognize the 10 Arabic numerals. Wang et al. [32] present IMUPEN to reconstruct motion trajectories and recognize handwritten digits. Bashir et al. [6] use a pen equipped with inertial sensors and apply DTW to recognize handwritten characters. Agrawal et al.
[1] recognize handwritten capital letters and Arabic numerals in a 2D plane based on strokes and a grammar tree, using the built-in accelerometer of a smartphone. Amma et al. [2] design a glove equipped with inertial sensors and use SVM, HMM, and a statistical language model to recognize capital letters, sentences, and so on. Deselaers et al. [13] present GyroPen to reconstruct the writing path for pen-like interaction. Xu et al. [36] utilize a continuous-density HMM and the Viterbi algorithm to recognize handwritten digits and letters using inertial sensors. In this article, we focus on single in-air character recognition without the assistance of a language model. For a character, we do not define specific strokes or require pen-up events for stroke segmentation, while tolerating the intra-class variability caused by writing speeds, gesture sizes, and writing directions, as well as the observation ambiguity caused by viewing angles in 3D space.

Handwritten character recognition: In addition to inertial sensor-based approaches, many image processing techniques [3, 14, 16] have also been adopted for recognizing handwritten characters in a 2D plane (i.e., an image). Bahlmann et al. [4] combine DTW and SVMs to establish a Gaussian DTW (GDTW) kernel for online recognition of UNIPEN handwriting data. Rayar et al. [28] propose a preselection method for CNN-based classification and evaluate it on handwritten character recognition in images. Rao et al. [27] propose a newly designed network structure based on an extended nonlinear kernel residual network to recognize handwritten characters on the MNIST and SVHN datasets. These approaches focus on recognizing hand-moving trajectories in a 2D plane, while our article focuses on transforming a 3D gesture into a proper 2D contour and then utilizing the contour's space-time features to recognize contours as characters.
3 TECHNICAL CHALLENGES AND DEFINITIONS IN IN-AIR GESTURE RECOGNITION

3.1 Intra-class Variability in Sensor Data

As shown in Figure 2, even when the user performs the same type of gesture (e.g., writes “t”), the sensor data can be quite different due to the variation of writing speeds (Figure 2(a)), gesture sizes (Figure 2(b)), writing directions (Figure 2(c)), and so on. This indicates that directly using the features extracted from sensor data may fail to recognize in-air gestures accurately. The definitions of speeds, sizes, and directions can be found in Section 6.2.

Fig. 2. Linear acceleration of writing the same character “t”: (a) different speeds, (b) different sizes, (c) different directions.

To handle the intra-class variability of in-air gestures, e.g., the variation of speed, amplitude, and orientation, we present the contour-based gesture model, which utilizes contours to correlate sensor data with human gestures. The “contour” is represented as a sequence of coordinate points over time. Additionally, to avoid the differences caused by facing directions, we transform the sensor data from the device coordinate system to the human coordinate system shown in Figure 5(a), i.e., we analyze the 3D contours in a human coordinate system. In this article, we take the instance of writing characters in the air to illustrate the contour-based gesture model. The characters refer to the alphabet, i.e., “a”–“z,” and we use the terms “character” and “letter” interchangeably throughout the article. It is worth mentioning that in-air writing letters can be different from printed letters due to joined-up writing. In particular, we remove the dot of “i” and “j,” and use “ι” to represent the letter “l” for simplification.
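The device-to-human coordinate transformation mentioned above requires the device's orientation. As a generic sketch (the article does not detail the representation at this point; we assume orientation is available as a unit quaternion, e.g., from the device's rotation-vector sensor, and the names below are ours), a device-frame vector can be rotated into the reference frame as follows:

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v from the device frame into the reference (human)
    frame, given the device orientation as a unit quaternion
    q = (w, x, y, z); equivalent to v' = q v q*."""
    w, x, y, z = q
    # Standard rotation matrix of a unit quaternion.
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    return R @ v

# Example: a device yawed 90 degrees about the vertical (z) axis maps its
# own x-axis onto the reference frame's y-axis.
q_90z = (np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4))
ref_vec = quat_rotate(q_90z, np.array([1.0, 0.0, 0.0]))  # ~ [0, 1, 0]
```

Applying such a rotation to each acceleration sample removes the dependence on how the user faces and how the watch sits on the wrist, so that contours from different users are analyzed in one common frame.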
3.2 Difference between 2D Contours and 3D Contours Usually,people get used to recognizing and reading handwritten characters in a 2D plane,e.g.,on a piece of paper.Therefore,we can map a 2D gesture contour with a 2D character for recogni- tion.However,based on extensive observations and experimental study,we find that 3D contour recognition is quite different from 2D contour recognition.In fact,recognizing 3D contours as 2D characters is a challenging task,due to the contour distortion caused by viewing angles,con- tour difference caused by writing directions,and contour distribution across different planes,as described below. 3.2.1 Viewing Angles.There is a uniform viewing angle for a 2D character contour,while there are multiple viewing angles for a 3D character contour.In a predefined plane-coordinate system, the 2D gesture contour is discriminative and can be used for character recognition;it is consistent with people's cognition habits for handwriting letters.However,in 3D space,even in a predefined coordinate system,we can look at the 3D contour from different viewing angles,thus the observed 3D contour can be quite different.As shown in Figure 3,when we look at the 3D contour of"t" from left to right,the shape and orientation of the character contour change a lot,as the contour located in the red circle in Figure 3(a),Figure 3(b),and Figure 3(c)indicates.For a character,its contour consists of one or several strokes in a sequential order and right orientation.If the char- acter contour changes,then it can lead to the misrecognition of characters.For example,when we ACM Transactions on Sensor Networks,Vol.15,No.4.Article 44.Publication date:October 2019
Fig. 2. Linear acceleration of writing the same character “t.”

extracted features from sensor data may fail to recognize in-air gestures accurately. The definitions of speeds, sizes, and directions can be found in Section 6.2.

To handle the intra-class variability of in-air gestures, e.g., the variation of speed, amplitude, and orientation of gestures, we present the contour-based gesture model, which utilizes contours to correlate sensor data with human gestures. The “contour” is represented as a sequence of coordinate points over time. Additionally, to avoid the differences caused by facing directions, we transform the sensor data from the device coordinate system to the human coordinate system shown in Figure 5(a), i.e., we analyze the 3D contours in the human coordinate system. In this article, we take the instance of writing characters in the air to illustrate the contour-based gesture model. The characters refer to the alphabet, i.e., “a”–“z,” and we use the terms “character” and “letter” interchangeably throughout the article. It is worth mentioning that in-air written letters can differ from printed letters due to joined-up writing. In particular, we remove the dot of “i” and “j,” and use “ι” to represent the letter “l” for simplification.

3.2 Difference between 2D Contours and 3D Contours

Usually, people are used to recognizing and reading handwritten characters in a 2D plane, e.g., on a piece of paper. Therefore, we can map a 2D gesture contour to a 2D character for recognition. However, based on extensive observations and experimental study, we find that 3D contour recognition is quite different from 2D contour recognition.
In fact, recognizing 3D contours as 2D characters is a challenging task, due to the contour distortion caused by viewing angles, the contour difference caused by writing directions, and the contour distribution across different planes, as described below.

3.2.1 Viewing Angles. There is a uniform viewing angle for a 2D character contour, while there are multiple viewing angles for a 3D character contour. In a predefined plane-coordinate system, the 2D gesture contour is discriminative and can be used for character recognition; it is consistent with people’s cognitive habits for handwritten letters. However, in 3D space, even in a predefined coordinate system, we can look at the 3D contour from different viewing angles, and thus the observed 3D contour can be quite different. As shown in Figure 3, when we look at the 3D contour of “t” from left to right, the shape and orientation of the character contour change a lot, as indicated by the contour located in the red circle in Figure 3(a), Figure 3(b), and Figure 3(c). For a character, its contour consists of one or several strokes in sequential order and the right orientation. If the character contour changes, it can lead to the misrecognition of characters. For example, when we
Fig. 3. Observed 3D contours from different viewing angles.

Fig. 4. Different contours from the same viewing angle.

look at the contours of “b” and “q” from different viewing angles, we may see similar shapes, and it may be difficult to distinguish them. Therefore, it is expected to select a proper viewing angle to mitigate the confusion about character contours.

3.2.2 Writing Directions. From a uniform view, 2D contours of the same character are similar, while the 3D contours can be quite different, due to uncertain writing directions. On a 2D plane, the contours of the same character keep the essential shape feature. Even if the orientation of a 2D contour changes, e.g., the 2D contour rotates in the plane, it still keeps the shape feature of the contour. However, in 3D space, even if we look at the contours of the same character from the same viewing angle, the observed contours can be quite different, as shown by the contours in the red circles in Figure 4. This is because the user can write in-air gestures towards different directions. Intuitively, if we can adaptively project the 3D contour into a corresponding coordinate plane (e.g., the xh − zh plane, yh − zh plane, or xh − yh plane), we may mitigate the contour distortion caused by writing directions.

3.2.3 Contour Distribution. A 2D contour lies in a plane, while a 3D contour can be distributed across different planes. In Figure 5(a), we show the human coordinate system (human-frame for short) xh − yh − zh. When the user writes in the air, her/his hand can move left and right, up and down, and thus the in-air gesture generates a 3D contour across different planes. In this case, the in-air contour may be mainly located in or close to the plane A3B3C3D3 while not being parallel to any coordinate plane, and we cannot directly project the in-air contour into a coordinate plane (e.g., the xh − zh plane) for dimensionality reduction.
As shown in Figure 5(b), the 3D contour of “k” is distributed across different planes, and the 3D contour is mainly located in or close to the red plane, instead of any coordinate plane (e.g., the blue plane in Figure 5(a)). Here, the red plane is called the principal plane or writing plane, which contains or is close to most of the points in the 3D
Fig. 5. In-air gesture across different planes.

Fig. 6. Viewing angles for writing with different hands.

contour, i.e., the projected contour in the principal plane keeps the essential feature of the 3D contour. Therefore, we aim to adaptively project the 3D contour into the principal plane and obtain the essential contour feature of the handwritten character, as shown by the contour “k” in the red circle in Figure 5(b).

3.3 Some Definitions about In-air Gestures

According to Section 3.2, an improper viewing angle will lead to the distortion of the observed gesture contour. To mitigate the confusion or misrecognition of gesture contours caused by viewing angles, we first define the appropriate range of viewing angles based on people’s writing habits, i.e., when the user writes in the air, her/his eyes track the movement of the hand naturally. As shown in Figure 6(a), when the user writes with the left hand, she/he tends to write in front, to the left side, or below; the corresponding viewing angle comes from behind, from the right side, or from above. Accordingly, we select a reference coordinate plane for each viewing angle, i.e., the xh − zh plane, yh − zh plane, and xh − yh plane, respectively. Similarly, as shown in Figure 6(b), when the user writes with the right hand in front, to the right side, or below, the corresponding viewing angle comes from behind, from the left side, or from above. The selected reference coordinate planes under these viewing angles are the xh − zh plane, (−yh) − zh plane, and xh − yh plane, respectively. Therefore, there is a mapping relationship between a reference coordinate plane and a viewing angle. With the selected reference coordinate plane, the user will not view a character contour in the right orientation as a reversed contour (referring to Figure 3(a) and Figure 3(c)).
It is worth mentioning that the selected reference coordinate plane is used to indicate the possible orientation of the projected contour in the principal plane, as described in Section 4.2. It does not mean that the user can only write on the xh − zh, yh − zh, or xh − yh planes; in fact, the user can write towards arbitrary directions in 3D space.
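The hand/direction-to-plane mapping described above can be summarized as a small lookup table. The following sketch is our own illustration, not the authors’ implementation, and the string labels are assumed names rather than the paper’s notation:

```python
# Sketch of the mapping in Section 3.3: (writing hand, writing direction)
# -> (reference coordinate plane, viewing angle). Labels are illustrative.
REFERENCE_PLANE = {
    ("left",  "front"): ("xh-zh",    "from behind"),
    ("left",  "side"):  ("yh-zh",    "from the right side"),
    ("left",  "below"): ("xh-yh",    "from above"),
    ("right", "front"): ("xh-zh",    "from behind"),
    ("right", "side"):  ("(-yh)-zh", "from the left side"),
    ("right", "below"): ("xh-yh",    "from above"),
}

def reference_plane(hand: str, direction: str) -> str:
    """Return the reference coordinate plane for a writing posture."""
    return REFERENCE_PLANE[(hand, direction)][0]
```

Note that writing in front maps to the same plane for both hands, while writing to the side flips the sign of the yh-axis for the right hand.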
Fig. 7. Different principal planes.

Here, the hand (i.e., left hand or right hand) and the writing direction, i.e., in front, to the left side, to the right side, or below, determine the viewing angle. To detect which hand writes in the air, we introduce an initial gesture before writing, i.e., the user stands with the hands down and then extends the arm wearing the device outward until the arm is parallel to the floor. In the human coordinate system, if the hand moves left, then the user writes with the left hand. Otherwise, the user writes with the right hand. The human coordinate system will be described later in the System Design section. To detect the writing direction and project the 3D contour into a 2D plane properly, we introduce the 3D contour-based gesture model, as described below.

4 3D CONTOUR-BASED GESTURE MODEL

Based on the accelerometer, gyroscope, and magnetometer of the wrist-worn device, we can obtain the 3D contour of the in-air gesture. However, according to Section 3.2, due to the uncertainty of the viewing angle, writing direction, and contour distribution, it is essential to find a plane that yields a proper projection of the 3D contour for character recognition. To solve this issue, we first introduce Principal Component Analysis (PCA) to adaptively detect the principal/writing plane. Then, we detect the reference coordinate plane and determine the viewing angle. After that, we tune the 2D contour in the principal plane to get the character contour in the right orientation and normalized size.

4.1 Principal Plane Detection with PCA

As mentioned before, to get a proper projected 2D contour for character recognition, we need to detect the principal/writing plane, which contains or is close to most of the points in the 3D contour, as indicated by the red plane in Figure 7(a), Figure 7(b), and Figure 7(c). It is worth noting that the principal plane may not be parallel to any coordinate plane, as shown in Figure 7.
In this article, we utilize Principal Component Analysis (PCA) [30] to reduce the dimensionality of the 3D contour and detect the principal plane adaptively, as described below.

For convenience, we use $x_i = (x_{i1}, x_{i2}, x_{i3})^T$, $i \in [1, n]$ to represent the contour (i.e., point sequence) in the xh-axis, yh-axis, and zh-axis of the human coordinate system. First, we introduce the centralization operation to update the coordinates $x_i$ of the contour, i.e., $x_{i1} = x_{i1} - \frac{1}{n}\sum_{j=1}^{n} x_{j1}$, $x_{i2} = x_{i2} - \frac{1}{n}\sum_{j=1}^{n} x_{j2}$, $x_{i3} = x_{i3} - \frac{1}{n}\sum_{j=1}^{n} x_{j3}$. Then, we use $\omega_i = (\omega_{i1}, \omega_{i2}, \omega_{i3})^T$, $i \in [1, 2]$ to represent the orthonormal basis vectors of the principal plane. Here, $\|\omega_i\|_2 = 1$, $\omega_i^T \omega_j = 0$, $i \neq j$. As shown in Figure 8, for the point $x_i$ in the human-frame, its projection point in the principal plane is $y_i = (y_{i1}, y_{i2})^T = \Omega^T x_i$, where $\Omega = (\omega_1, \omega_2)$. Then, we can use $y_i$ to reconstruct the coordinates of $x_i$ as $\hat{x}_i$, as shown in Equation (1). The distance between $x_i$ and $\hat{x}_i$ is $d_i = \|x_i - \hat{x}_i\|_2$:

$$\hat{x}_i = \sum_{j=1}^{2} y_{ij}\,\omega_j = \Omega(\Omega^T x_i). \quad (1)$$
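As a concrete sketch (our own illustration, not the authors’ implementation), the centralization, projection, and reconstruction steps above can be written with NumPy; the principal-plane basis Ω is obtained from the eigendecomposition of XXᵀ that Section 4.1 derives:

```python
import numpy as np

def principal_plane(X: np.ndarray):
    """X: 3 x n matrix of contour points (one column per point) in the
    human-frame. Returns (Omega, Y): the 3 x 2 orthonormal basis of the
    principal plane and the projected 2D contour y_i = Omega^T x_i."""
    Xc = X - X.mean(axis=1, keepdims=True)         # centralization
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T)   # eigendecomposition of X X^T
    order = np.argsort(eigvals)[::-1]              # omega_1: largest eigenvalue
    Omega = eigvecs[:, order[:2]]                  # (omega_1, omega_2)
    return Omega, Omega.T @ Xc

def avg_distance(X: np.ndarray, Omega: np.ndarray) -> float:
    """Average distance d-bar between each x_i and its reconstruction
    x_hat_i = Omega (Omega^T x_i), per Equation (1)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Xhat = Omega @ (Omega.T @ Xc)                  # Equation (1)
    return float(np.linalg.norm(Xc - Xhat, axis=0).mean())
```

For a contour whose points already lie in one plane, the average distance is numerically zero and Ω spans that plane, which matches the intuition behind the principal/writing plane.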
Fig. 8. The principle of writing plane detection with PCA.

Fig. 9. Relationship between contours and principal plane.

When the average distance $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$ reaches the minimal value, the plane represented by the orthonormal basis vectors $\Omega = (\omega_1, \omega_2)$ is the principal/writing plane, as shown in Equation (2):

$$\arg\min_{\Omega} \frac{1}{n}\sum_{i=1}^{n} \|x_i - \hat{x}_i\|_2 \quad \text{s.t.} \; \Omega^T\Omega = I. \quad (2)$$

By combining Equation (1) and Equation (2), we can transform the objective in Equation (2) into Equation (3), where $X = (x_1, x_2, \ldots, x_n)$, while $tr$ means the trace of a matrix, i.e., the sum of the elements on the main diagonal of the matrix:

$$\arg\max_{\Omega} tr(\Omega^T X X^T \Omega) \quad \text{s.t.} \; \Omega^T\Omega = I. \quad (3)$$

After that, we use the Lagrange multiplier method to obtain the orthonormal basis vectors $\{\omega_1, \omega_2\}$, based on the eigenvalue decomposition of $XX^T$, as shown in Equation (4). The eigenvector with the largest eigenvalue is $\omega_1$, while the eigenvector with the second largest eigenvalue is $\omega_2$. In the principal plane, we use $\omega_1$ and $\omega_2$ to represent the xp-axis and yp-axis of the principal plane, respectively:

$$XX^T \omega_i = \lambda_i \omega_i. \quad (4)$$

As shown in Figure 9(a), the black line and the green line respectively denote the first basis vector $\omega_1$ and the second basis vector $\omega_2$, while the red plane containing $\omega_1$ and $\omega_2$ is the detected principal plane. In the principal plane, we can obtain the projected 2D contour, as shown in Figure 9(b). However, due to the information loss of dimensionality reduction, there may exist the
problems such as reversal and skew of the projected contour, which need further calibration. It is worth mentioning that lowercase letters are different from capital letters; the shapes of different lowercase letters observed from different viewing angles can be similar, e.g., “b” and “q,” “d” and “p,” thus both the orientation and the writing order of a character are important for lowercase letter recognition. It is essential to calibrate the projected 2D contour into the right orientation and normalized size under a uniform view for character recognition.

4.2 Reference Coordinate Plane Detection

According to Section 4.1 and Figure 9, the projected 2D contour in the principal plane has a high probability of keeping the shape feature of the in-air contour, while still having problems such as reversal and skew, i.e., the orientation of the contour is changed. Thus, we need to calibrate the 2D contour in the principal plane. To achieve this goal, we first detect the reference coordinate plane and determine the viewing angle. Here, the reference coordinate plane is used to indicate the viewing angle and the possible orientation of the projected contour in the principal plane. In regard to the user, she/he can perform the gesture towards arbitrary directions; the writing plane may not be parallel to any coordinate plane.

4.2.1 Axis Projection Calculation. We project the xh-axis, yh-axis, and zh-axis of the human-frame into the principal plane and then compare the lengths of the projected axes to determine the reference coordinate plane. According to Section 4.1, in the principal plane, the orthonormal basis vectors $\omega_1, \omega_2$ represent the xp-axis and yp-axis, respectively. With the xp-axis and yp-axis, we further calculate the zp-axis as $\omega_3 = \omega_1 \times \omega_2$ to establish the principal-plane coordinate system (principal-frame for short). Here, $\omega_1$, $\omega_2$, $\omega_3$ are described in the human-frame.
While in the principal-frame, we can represent the xp-axis, yp-axis, and zp-axis as the unit vectors $(1, 0, 0)^T$, $(0, 1, 0)^T$, and $(0, 0, 1)^T$, respectively. By comparing $\omega_1$, $\omega_2$, $\omega_3$ in the human-frame with the xp-axis, yp-axis, zp-axis in the principal-frame, we can get the rotation matrix $R_{hp}$, which transforms coordinates from the human-frame to the principal-frame, as shown in Equation (5) and Equation (6):

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = R_{hp} \, [\omega_1 \ \omega_2 \ \omega_3], \quad (5)$$

$$R_{hp} = [\omega_1 \ \omega_2 \ \omega_3]^{-1}. \quad (6)$$

With the rotation matrix $R_{hp}$, we then calculate the projection of each axis of the human-frame in the principal plane. For convenience, we use $u_i$, $i \in [1, 3]$ to represent the xh-axis, yh-axis, and zh-axis, respectively. For $u_i$, its coordinates in the principal-frame are $q_i$, where $q_i = R_{hp} u_i$. Then, we get the projection $v_i$ of $q_i$ in the principal plane with $M$, i.e., setting the coordinate value in the zp-axis to zero, as shown in Equation (7):

$$v_i = M q_i = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} q_i. \quad (7)$$

As shown in Figure 10(b), Figure 11(b), Figure 12(b), Figure 13(b), and Figure 14(b), we represent the projected axes of the xh-axis, yh-axis, and zh-axis (of the human-frame) in the principal plane with black, green, and fuchsia dashed lines, respectively.

4.2.2 Reference Plane Detection. In Figure 15, we show how to utilize the length of the projected axis to detect the reference coordinate plane. Intuitively, if the projection of axis $u_i$ has the shortest length in the principal plane, it indicates that the coordinate plane perpendicular to $u_i$ has the highest probability of being parallel to the principal plane and should be selected as the
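The axis-projection steps of Section 4.2.1 and the shortest-projection test of Section 4.2.2 can be sketched as follows. This is our own illustration under the notation above; the sign handling that distinguishes the right hand’s (−yh) − zh plane is omitted for brevity:

```python
import numpy as np

def detect_reference_plane(omega1: np.ndarray, omega2: np.ndarray) -> str:
    """Project the human-frame axes u_i into the principal plane and pick
    the coordinate plane perpendicular to the axis with the shortest
    projection (sign handling for the (-yh)-zh case is omitted)."""
    omega3 = np.cross(omega1, omega2)                    # z_p-axis
    R_hp = np.linalg.inv(np.column_stack([omega1, omega2, omega3]))  # Eq. (6)
    M = np.diag([1.0, 1.0, 0.0])                         # zero the z_p value, Eq. (7)
    lengths = [np.linalg.norm(M @ (R_hp @ u))            # v_i = M q_i, q_i = R_hp u_i
               for u in np.eye(3)]                       # u_i: xh-, yh-, zh-axis
    shortest = int(np.argmin(lengths))                   # axis most normal to the plane
    return ["yh-zh", "xh-zh", "xh-yh"][shortest]
```

For example, if the principal plane coincides with the xh − zh plane, the yh-axis projects to (almost) zero length, so the xh − zh plane is selected as the reference coordinate plane.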