This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3114224, IEEE Internet of Things Journal
IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. XX, XX 2021
2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

the top-right corner of the user interface can be used to remove the character. Second, considering the language rules of regular text, we introduce the Bayesian method [33] to correct erroneous keystroke sequences. That is, given a keystroke sequence, we calculate the likelihood of each possible word and select the word with the largest likelihood. In this way, we can tolerate errors such as false-negative, false-positive and wrongly detected keystrokes.

Algorithm 2: Keystroke detection and localization
Input: The consecutive frames. The $i$th fingertip in the $j$th frame is $(x_i^{(j)}, y_i^{(j)})$, which is transformed to $(x_i^{(j)\prime}, y_i^{(j)\prime})$ in the $k$th frame, $k = j + 5$.
if $\sqrt{(x_i^{(j)\prime} - x_i^{(k)})^2 + (y_i^{(j)\prime} - y_i^{(k)})^2} < d$ then
    if the $j$th frame has no keystroke then
        Detect a new keystroke.
    else if $\sqrt{(x_i^{(k-1)\prime} - x_i^{(k)})^2 + (y_i^{(k-1)\prime} - y_i^{(k)})^2} < d$ then
        Detect a new keystroke.
if a new keystroke is detected then
    Select the fingertip with the largest coordinate variation.
    Match the fingertip with a key by coordinates.
Output: The located keystroke.

V. PERFORMANCE EVALUATION

We deploy DynaKey on a Samsung Galaxy S9 smartphone, which serves as a head-mounted camera device, as shown in Fig. 2(a). The smartphone runs Android 9.0. We use the Microsoft HoloLens [13] keyboard layout and print it on a sheet of A4 paper. Unless otherwise specified, the camera frame rate is 30 fps, the gyroscope sampling rate is 200 Hz, and the image size is 800×480 pixels. We conduct our experiments in an office environment.
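To make the default settings concrete, the short Python calculation below derives the interval between consecutive frames, the time spanned by the five-frame gap $k = j + 5$ used in Algorithm 2, and the number of gyroscope samples that arrive per frame. It is a back-of-the-envelope sketch that assumes both sensors deliver data at exactly their nominal rates, which real devices only approximate; the function name `frame_timing` is hypothetical.

```python
def frame_timing(frame_rate_hz: float = 30.0, gyro_rate_hz: float = 200.0,
                 lookahead_frames: int = 5) -> dict:
    """Nominal timing for a fixed-rate camera and gyroscope (hypothetical helper).

    Defaults: 30 fps camera and 200 Hz gyroscope (the default settings in the text)
    and the five-frame gap k = j + 5 of Algorithm 2.
    """
    frame_interval_ms = 1000.0 / frame_rate_hz  # time between two consecutive frames
    return {
        "frame_interval_ms": frame_interval_ms,
        # time between frame j and frame k = j + lookahead_frames
        "lookahead_ms": lookahead_frames * frame_interval_ms,
        # gyroscope samples arriving during one frame interval
        "gyro_samples_per_frame": gyro_rate_hz / frame_rate_hz,
    }


if __name__ == "__main__":
    for name, value in frame_timing().items():
        print(f"{name}: {value:.2f}")
    # frame_interval_ms: 33.33
    # lookahead_ms: 166.67
    # gyro_samples_per_frame: 6.67
```

Under these assumptions, the two frames compared in Algorithm 2 are about 167 ms apart at 30 fps; if the gap is kept at five frames, it covers a shorter time span at higher frame rates.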
We recruit twelve volunteers to participate in the experiments, and each subject types a predefined set of 1600 characters. The collected data are sanitized to ensure that no private or identifying information is retained. We first evaluate the performance of key tracking and keystroke localization. Then we evaluate how camera jitter, frame size and frame rate affect the performance of key tracking and keystroke localization. We also evaluate the performance of DynaKey in complex scenarios to explore its usage modes. After that, we evaluate the latency and energy consumption of DynaKey. Finally, we evaluate DynaKey on text input and compare it with state-of-the-art text input methods.

[Fig. 13. Performance of key tracking and keystroke localization. (a) Key tracking accuracy in 100 frames (x-axis: frames; y-axes: pixel deviation in pixels, and IoU). (b) Keystroke localization performance with and without key tracking (localization accuracy, localization error, false positive rate and false negative rate, in %).]

A. Performance Metrics

To measure the accuracy of key tracking, we use $E_r = \frac{1}{z}\sum_{i=1}^{z}\sqrt{(x_{m_i} - x_{g_i})^2 + (y_{m_i} - y_{g_i})^2}$ to represent the average pixel deviation between the calculated coordinates of the cross points forming the keyboard layout and the ground truth, and $\overline{IoU} = \frac{1}{n}\sum_{i=1}^{n} IoU_i$ to represent the average intersection over union [25] between the calculated key areas and the ground truth. A smaller $E_r$ and a larger $\overline{IoU}$ indicate more accurate tracking. Here, $(x_{m_i}, y_{m_i})$ and $(x_{g_i}, y_{g_i})$ represent the calculated coordinates and the ground truth of the $i$th cross point, respectively, and $IoU_i$ represents the intersection over union between the calculated area of the $i$th key and its ground truth. $z = 55$ is the number of cross points and $n = 40$ is the number of keys, as shown in Fig. 5(a).
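The two tracking metrics can be computed as in the following Python sketch. It assumes that every key area is given as a convex polygon in pixel coordinates (for example, its four corner points listed in order around the boundary); the function names are hypothetical, and computing the overlap with Sutherland-Hodgman polygon clipping is an illustrative choice, not necessarily how the authors computed $IoU_i$.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


def pixel_deviation(calc: Sequence[Point], truth: Sequence[Point]) -> float:
    """E_r: mean Euclidean distance between calculated and ground-truth cross points."""
    if not calc or len(calc) != len(truth):
        raise ValueError("expected two non-empty point lists of equal length")
    return sum(math.dist(m, g) for m, g in zip(calc, truth)) / len(calc)


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    pts = list(poly)
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])) / 2.0


def clip_convex(subject: Sequence[Point], clip: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: the part of `subject` lying inside the convex polygon `clip`."""
    orient = 1.0 if signed_area(clip) > 0 else -1.0
    clip_pts = list(clip)

    def inside(p: Point, a: Point, b: Point) -> bool:
        # True if p lies on the inner side of the directed clip edge a -> b (or on it).
        return orient * ((b[0] - a[0]) * (p[1] - a[1])
                         - (b[1] - a[1]) * (p[0] - a[0])) >= 0

    def crossing(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Point where segment p -> q meets the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clip_pts, clip_pts[1:] + clip_pts[:1]):
        if not output:
            break  # nothing left to clip: the polygons do not overlap
        candidates, output = output, []
        for idx, cur in enumerate(candidates):
            prev = candidates[idx - 1]  # idx - 1 == -1 wraps to the last vertex
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(crossing(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(crossing(prev, cur, a, b))
    return output


def key_iou(calc_key: Sequence[Point], true_key: Sequence[Point]) -> float:
    """IoU_i of one key whose calculated and true areas are convex polygons."""
    inter_pts = clip_convex(calc_key, true_key)
    inter = abs(signed_area(inter_pts)) if len(inter_pts) >= 3 else 0.0
    union = abs(signed_area(calc_key)) + abs(signed_area(true_key)) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(calc_keys: Sequence[Sequence[Point]],
             true_keys: Sequence[Sequence[Point]]) -> float:
    """Average IoU over the n keys."""
    if not calc_keys or len(calc_keys) != len(true_keys):
        raise ValueError("expected two non-empty key lists of equal length")
    return sum(key_iou(c, t) for c, t in zip(calc_keys, true_keys)) / len(calc_keys)


if __name__ == "__main__":
    key = [(0, 0), (45, 0), (45, 25), (0, 25)]   # a 45 x 25 pixel key
    shifted = [(x + 3, y) for x, y in key]       # the same key, 3 pixels to the right
    print(pixel_deviation(key, shifted))         # 3.0
    print(round(key_iou(shifted, key), 3))       # 0.875
```

In the usage example, the four corners stand in for cross points: a uniform 3-pixel shift gives $E_r = 3$ pixels, and the overlap of 42 × 25 pixels over a union of 1200 pixels gives an IoU of 0.875.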
We obtain the ground truth by manually identifying the pre-marked coordinates of cross points and keys in each image frame. To measure the performance of keystroke localization, we use four metrics: localization accuracy, localization error, false positive rate (FPR) and false negative rate (FNR). The localization accuracy is the ratio of correctly located keystrokes to the number of keystrokes performed by the subject. The localization error is the ratio of falsely located keystrokes to the number of keystrokes performed by the subject. FPR and FNR are defined as the ratios of falsely detected keystrokes and missed keystrokes, respectively, to the number of keystrokes performed by the subject.

B. Accuracy of Key Tracking and Keystroke Localization

In the experiment, a subject is instructed to type on the keyboard in her/his own way and may move her/his head naturally while typing. We evaluate the accuracy of key tracking over 100 frames using the pixel deviation $E_r$ and the average intersection over union $\overline{IoU}$ defined above. As shown in Fig. 13(a), $E_r$ ranges from 0 to 5 pixels across frames, and its average over the frames is less than 3 pixels. Compared with the key size of 45 × 25 pixels, a deviation below 3 pixels is negligible, since 3 pixels is only about 7% of the key width and 12% of the key height. Meanwhile, the average $\overline{IoU}$ exceeds 93%, indicating that the calculated key areas closely coincide with the ground truth. In short, DynaKey accurately tracks the coordinate changes of keys across frames while tolerating head movements during typing.

To evaluate keystroke localization in dynamic scenes, we instruct a subject to press all the keys on the keyboard, both without and with the key tracking module. Fig. 13(b) shows that without the key tracking module, the keystroke localization accuracy is only about 66.9%, while the localization error and false negative rate are high (15.3% and 17.6%, respectively).
This may be due mainly to the mismatch between each key's location and its coordinates in dynamic camera views. With the key tracking module, the keystroke localization accuracy increases significantly, from 66.9% to 95.5%, and the localization error, false positive rate and false negative rate fall to 1.9%, 2.1% and 2.6%, respectively. These results demonstrate that DynaKey accurately locates keystrokes and that the key tracking module plays a critical role in keystroke localization in dynamic scenes.

Authorized licensed use limited to: Nanjing University. Downloaded on December 03,2021 at 08:56:41 UTC from IEEE Xplore. Restrictions apply
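All four localization metrics share one denominator, the number of keystrokes performed by the subject. Reading the definitions in Section V-A, every performed keystroke is either correctly located, located on a wrong key, or missed, so localization accuracy, localization error and FNR should sum to 100%, while FPR counts extra detections and falls outside that sum; the with-tracking values reported above (95.5% + 1.9% + 2.6% = 100%) are consistent with this reading. The Python sketch below computes the metrics from such a tally. The class and field names are hypothetical, the three-way partition is an assumption of the sketch, and the demo counts are illustrative, not measured data.

```python
from dataclasses import dataclass


@dataclass
class LocalizationCounts:
    """Hypothetical tally of keystroke outcomes for one session."""
    performed: int         # keystrokes the subject actually performed
    correct: int           # performed keystrokes detected and located on the right key
    mislocated: int        # performed keystrokes detected but assigned to a wrong key
    missed: int            # performed keystrokes that were never detected
    false_detections: int  # detections with no corresponding performed keystroke


def localization_metrics(c: LocalizationCounts) -> dict:
    """Return the four ratios in percent, all normalized by `performed`."""
    if c.performed <= 0:
        raise ValueError("performed must be positive")
    # Assumption of this sketch: each performed keystroke is exactly one of
    # correct, mislocated or missed.
    if c.correct + c.mislocated + c.missed != c.performed:
        raise ValueError("correct + mislocated + missed must equal performed")
    return {
        "accuracy": 100.0 * c.correct / c.performed,
        "localization_error": 100.0 * c.mislocated / c.performed,
        "fpr": 100.0 * c.false_detections / c.performed,
        "fnr": 100.0 * c.missed / c.performed,
    }


if __name__ == "__main__":
    # Illustrative counts only, chosen so the ratios reproduce the
    # with-tracking percentages reported above.
    demo = LocalizationCounts(performed=1000, correct=955, mislocated=19,
                              missed=26, false_detections=21)
    for name, value in localization_metrics(demo).items():
        print(f"{name}: {value:.1f}%")
```

Because FPR is normalized by performed keystrokes rather than by detections, it can in principle exceed the other error terms without affecting the accuracy, error and FNR sum.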