AirTyping: A Mid-Air Typing Scheme based on Leap Motion

Hao Zhang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, H.Zhang@smail.nju.edu.cn
Yafeng Yin, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, yafeng@nju.edu.cn
Lei Xie, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, lxie@nju.edu.cn
Sanglu Lu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, sanglu@nju.edu.cn

ABSTRACT

In Human-Computer Interaction (HCI), many gesture-based schemes have been adopted to reduce the dependence on bulky devices such as physical keyboards and joysticks. As a typical HCI technology, text input has attracted much attention, and many virtual or wearable keyboards have been proposed. To further remove the keyboard and allow people to type in a device-free way, we propose AirTyping, i.e., a mid-air typing scheme based on Leap Motion. During the typing process, the Leap Motion Controller captures the typing gestures with cameras and provides the coordinates of finger joints. Then, AirTyping detects the possible keystrokes, infers the typed words with a Bayesian method, and outputs the typed word sequence. The experimental results show that our system can detect keystrokes and infer the typed text efficiently, i.e., the true positive rate of keystroke detection is 92.2%, while the top-1 inferred word is the typed word with an accuracy of 90.2%.

CCS CONCEPTS

• Human-centered computing → Text input; Gestural input.

KEYWORDS

Mid-Air Typing; Leap Motion; Human-Computer Interaction

ACM Reference Format:
Hao Zhang, Yafeng Yin, Lei Xie, and Sanglu Lu. 2020. AirTyping: A Mid-Air Typing Scheme based on Leap Motion. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers (UbiComp/ISWC ’20 Adjunct), September 12–16, 2020, Virtual Event, Mexico. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3410530.3414387

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8076-8/20/09.

1 INTRODUCTION

The development of human activity recognition technology has brought new ways for Human-Computer Interaction (HCI). Specifically, people can perform gestures with arms, hands, or even fingers to interact with computers and devices, e.g., when playing motion sensing games, without using joysticks or specially designed controllers. As a typical HCI technology, text input has attracted much attention, and many gesture-based input schemes [1][2][3] have been proposed to remove the dependence on physical keyboards. However, the existing work tends to introduce a virtual keyboard or wearable sensors for text input. To further remove the constraints of keyboards and wearable sensors, we propose AirTyping, i.e., a mid-air typing scheme based on Leap Motion, as shown in Fig. 1. When the user types words in mid-air over the Leap Motion Controller (LMC) with standard fingering, AirTyping utilizes the LMC to track the coordinates of finger joints and infers the typed words for text input. AirTyping can be used in scenarios where a keyboard is inconvenient, or where the privacy of text input needs to be protected by not showing a visible keyboard layout.

Figure 1: Mid-air typing based on Leap Motion.

However, without a keyboard layout, it is difficult to map a finger's movement to a specific keystroke, which brings the challenges of keystroke detection and recognition. Specifically, considering that fingers not making a keystroke can also move, we introduce the bending angles of fingers, the movement trend of a finger in consecutive coordinates, and the time difference between keystrokes to detect the finger most likely to be making a keystroke. Besides, considering possible wrong, false positive, and false negative detected
keystrokes, and the one-to-many mapping between a finger and characters in standard fingering, we introduce a Bayesian method to infer the typed word sequence for text input.

2 RELATED WORK

To remove the dependence on physical keyboards, CamK [3] is designed to input text into small devices by using a panel with a keyboard layout. Microsoft Hololens [1] provides a projection keyboard in front of the user for text input. Removing the constraint of a virtual keyboard layout, RF-glove [2] recognizes finger movements using RF signals for mid-air interaction, but it can be easily affected by variations in the environment. Considering the above limitations of layouts or environments, the Leap Motion Controller [6] is introduced for mid-air HCI. Assam et al. [5] build a Google Chrome extension to facilitate web browsing, but the involved gestures are not suitable for text input. ATK [4] enables freehand typing in the air, but it requires the number of keystrokes and the number of characters in a word to be exactly the same. Different from the existing work, we aim to provide a mid-air typing scheme based on Leap Motion for text input, with no limitation on the number of keystrokes or characters in a word.

3 SYSTEM DESIGN

3.1 System Overview

Figure 2 shows the main components of the AirTyping system. The inputs are the coordinates of finger joints captured by the Leap Motion Controller (LMC), while the output is a sentence composed of inferred words. While the user types in mid-air over the LMC, we first utilize the Keystroke Detection module to analyze the finger movements based on the coordinates of finger joints and detect possible keystrokes. Then, in the Word Inference module, we utilize a Bayesian method to infer the most likely word for each keystroke sequence, where sequences are separated by the space key. With the inferred words, AirTyping outputs the sentence (i.e., word sequence) typed by the user for text input.

Figure 2: System Framework.

3.2 Keystroke Detection

The keystroke detection module is designed to analyze the finger movements from the coordinates of finger joints and detect possible keystrokes.

3.2.1 Calculating bending angles of fingers. Intuitively, during a keystroke, the fingertip first bends down, then stays at the lowest position for a short duration, and finally moves away. Therefore, we can detect a possible keystroke by measuring the bending angle of a finger. For convenience, we introduce $\theta$ to describe the bending angle, which is formed by the distal phalanx and the corresponding metacarpal bone, as shown in Fig. 1. Since there is no metacarpal bone for the thumb, we use the $\theta$ between the distal and proximal phalanx for the thumb. When $\theta > \epsilon_b$, there is a possible keystroke. We set $\epsilon_b = 45^\circ$ empirically.

3.2.2 Detecting the possible keystroke. Based on the bending angle of a finger, we can find the fingers which may be making a keystroke. To determine the single finger making the keystroke, we first compare the y-coordinate of each fingertip (the coordinate system of the LMC is shown in Fig. 2), and then select the finger $f$ having the smallest y-coordinate as the finger making a keystroke, as shown in Fig. 1. This is because the finger pressing a key usually reaches the lowest position compared with the other fingers. Then, we verify whether the selected finger really makes a keystroke, based on the movement trend of the finger. We compare three consecutive y-coordinates of the finger $f$, i.e., $y_{i-1}$, $y_i$, $y_{i+1}$. If the finger's movement satisfies the 'V'-shaped feature, i.e., $y_i < y_{i-1}$ and $y_i < y_{i+1}$, we detect a possible keystroke corresponding to the finger $f$.

3.2.3 Removing false positive keystrokes. Finally, to remove false duplicate keystrokes within a short duration, we introduce the time difference between two consecutive keystrokes. A newly detected possible keystroke is treated as a keystroke only when the time difference between it and the last confirmed keystroke is larger than $\Delta t$. According to [7], the duration of a keystroke is usually about 185 ms, thus we set $\Delta t = 185$ ms.

3.3 Word Inference

After detecting the keystrokes, we infer the typed word based on the keystroke sequence. However, due to the interference of other fingers' movements, there is a probability of detecting a wrong, false positive, or false negative keystroke. Besides, without a keyboard layout, we cannot map a keystroke to a character (i.e., 'a'-'z') directly. That is to say, there is no one-to-one mapping between the keystroke and the character. As shown in Fig. 3, the user types with standard fingering, thus the finger making a keystroke corresponds to multiple characters, e.g., the 2nd finger corresponds to the characters 'w', 's' and 'x'. Therefore, to infer the typed word from the keystrokes, we introduce the Bayesian method, as described below.

Considering that words are separated by the space key, we first use the space key, i.e., keystrokes made by the 5th or 6th finger, to separate the keystrokes. Then, we get a specific keystroke sequence corresponding to a word. For convenience, we use $K_i$, $i \in \mathbb{N}^+$, to represent the keystroke sequence, and $W_j$, $j \in \mathbb{N}^+$, to represent a word in the dictionary. Then, the probability $P(W_j \mid K_i)$ is the probability that the keystroke sequence $K_i$ is inferred as the word $W_j$. The word $W_j$ for which $P(W_j \mid K_i)$ achieves the highest value is chosen as the inferred word.
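Before turning to the likelihood itself, the detection rules of Section 3.2 and the space-key segmentation just described can be sketched together. This is only an illustrative sketch, not the authors' implementation: the `Frame` record, the function names, and the assumption that bending angles and fingertip y-coordinates have already been computed per finger are all hypothetical, and no Leap Motion SDK call is used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

BEND_THRESHOLD_DEG = 45.0  # epsilon_b from Section 3.2.1
MIN_GAP_S = 0.185          # Delta t from Section 3.2.3, in seconds
SPACE_FINGERS = {5, 6}     # the 5th and 6th fingers press the space key


@dataclass
class Frame:
    """Hypothetical per-sample record: time, bending angle and fingertip y per finger."""
    t: float
    angle: Dict[int, float]  # finger id (1..10) -> bending angle in degrees
    y: Dict[int, float]      # finger id (1..10) -> fingertip y-coordinate


def candidate_finger(frame: Frame) -> Optional[int]:
    """Among fingers bent past the threshold, pick the one with the lowest fingertip."""
    bent = [f for f, a in frame.angle.items() if a > BEND_THRESHOLD_DEG]
    if not bent:
        return None
    return min(bent, key=lambda f: frame.y[f])


def detect_keystrokes(frames: Sequence[Frame]) -> List[int]:
    """Return the fingers judged to have made a keystroke, in time order."""
    keystrokes: List[int] = []
    last_time: Optional[float] = None
    for i in range(1, len(frames) - 1):
        prev, cur, nxt = frames[i - 1], frames[i], frames[i + 1]
        f = candidate_finger(cur)
        if f is None:
            continue
        # 'V'-shaped feature: y_i < y_{i-1} and y_i < y_{i+1}.
        if not (cur.y[f] < prev.y[f] and cur.y[f] < nxt.y[f]):
            continue
        # Accept only if more than Delta t has passed since the last keystroke.
        if last_time is not None and cur.t - last_time <= MIN_GAP_S:
            continue
        keystrokes.append(f)
        last_time = cur.t
    return keystrokes


def split_words(fingers: Sequence[int]) -> List[List[int]]:
    """Split a finger stream into per-word keystroke sequences at space keystrokes."""
    words: List[List[int]] = []
    current: List[int] = []
    for f in fingers:
        if f in SPACE_FINGERS:
            if current:
                words.append(current)
                current = []
        else:
            current.append(f)
    if current:
        words.append(current)
    return words
```

Each list returned by `split_words` plays the role of one keystroke sequence in the inference that follows.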
According to Bayes' theorem, $P(W_j \mid K_i)$ can be calculated with Eq. (1),

$$P(W_j \mid K_i) = \frac{P(W_j) \times P(K_i \mid W_j)}{P(K_i)} \tag{1}$$

where $P(W_j)$ is the prior probability representing the occurrence frequency of word $W_j$, $P(K_i)$ is the probability of detecting the
keystroke sequence $K_i$, and $P(K_i \mid W_j)$ is the likelihood function, which estimates the probability of the keystroke sequence $K_i$ given the word $W_j$. Since each word is treated as having the same occurrence frequency, i.e., $P(W_j)$ is equal among all words, and $P(K_i)$ is also the same for all words, Eq. (1) reduces to the calculation of the likelihood function $P(K_i \mid W_j)$, i.e., $P(W_j \mid K_i) \propto P(K_i \mid W_j)$, which is calculated from the number and the permutations of keystrokes.

Figure 3: The standard fingering for typing.

3.3.1 Inference with the number of keystrokes. Intuitively, if a keystroke sequence $K_i$ exactly matches a word $W_j$, the number of keystrokes in $K_i$ should be the same as the number of characters in $W_j$. Therefore, the closer the number of keystrokes is to the number of characters in a word, the higher the likelihood of that word. For example, when four keystrokes are detected, the word “they” (i.e., 4 chars) will have a higher likelihood than the word “a” (i.e., 1 char) or “therefore” (i.e., 9 chars). Specifically, we use the number of keystrokes $n_i$ in $K_i$ to calculate the likelihood $P_n(n_i \mid W_j)$ for $K_i$ based on the word $W_j$, as described in Eq. (2).

$$P_n(n_i \mid W_j) = 1 - \frac{|n_i - l_j|}{l_{max} + \delta} \tag{2}$$

Here, $l_j$ is the length of word $W_j$, and $l_{max}$ is the maximum length difference of the words in the dictionary, i.e., the difference between the longest and shortest word length. According to the adopted word set [8], we set $l_{max} = 16$ in this paper. To avoid the probability $P_n(n_i \mid W_j)$ being set to 0, we introduce a tolerance factor $\delta = 0.01$. For a better illustration, we assume four keystrokes are detected, and we calculate the likelihood of the words “they”, “a” and “therefore” with Eq. (2). As shown in Eq. (3), the likelihood of the word “they” is higher than that of the words “a” or “therefore”, since the number (i.e., four) of chars in “they” is closer to the number (i.e., four) of keystrokes.

$$\begin{aligned}
P_n(n_i = 4 \mid W_j = \text{“a”}) &= 1 - \frac{|4 - 1|}{16 + 0.01} = 0.813 \\
P_n(n_i = 4 \mid W_j = \text{“they”}) &= 1 - \frac{|4 - 4|}{16 + 0.01} = 1 \\
P_n(n_i = 4 \mid W_j = \text{“therefore”}) &= 1 - \frac{|4 - 9|}{16 + 0.01} = 0.688
\end{aligned} \tag{3}$$

3.3.2 Inference with the Permutations of Keystrokes. In addition to the number of keystrokes, the finger sequence making the keystroke sequence also affects the likelihood, since each finger maps to more than one character. As shown in Fig. 3, when the user types in standard fingering, each finger presses a fixed set of keys, i.e., the keys that have the same color as the fingertip. Therefore, we further use the permutations of keystrokes to calculate the likelihood $P_m(K_i \mid W_j)$ for the keystroke sequence $K_i$ based on the word $W_j$.

Figure 4: The permutations for “that”, “the” and “to”. (a) Four permutations for “that”. (b) One permutation for “the”. (c) Three permutations for “to”.

Specifically, for a detected keystroke sequence $K_i = (k_1, k_2, \ldots, k_{n_i})$ and a word $W_j = (w_1, w_2, \ldots, w_{l_j})$ in the dictionary, $k_p$ is the $p$th keystroke, and it is represented by the finger $f(k_p)$ making the keystroke, $p \in [1, n_i]$, $f(k_p) \in [1, 10]$, while $w_q$ is the $q$th letter of the word, $w_q \in [\text{‘a’}, \text{‘z’}]$, $q \in [1, l_j]$. If $n_i \le l_j$, the number of the permutations of keystrokes in $K_i$ for the word $W_j$ is $A(l_j, n_i) = \frac{l_j!}{(l_j - n_i)!}$, which denotes all possible cases of replacing $n_i$ letters in the word $W_j$ with the $n_i$ keystrokes. If $n_i > l_j$, the number of permutations changes to $A(n_i, l_j)$, which denotes all cases of replacing $l_j$ keystrokes in $K_i$ with the $l_j$ letters in the word $W_j$.
Therefore, the likelihood $P_m(K_i \mid W_j)$ can be represented with Eq. (4),

$$P_m(A \mid W_j) = \begin{cases}
\displaystyle\sum_{m \in A(n_i, l_j)} \; \prod_{p \in [1, n_i],\, q \in [1, l_j]} P(w_q \in S_{f(k_p)} \mid m), & n_i > l_j \\[2ex]
\displaystyle\sum_{m \in A(l_j, n_i)} \; \prod_{p \in [1, n_i],\, q \in [1, l_j]} P(w_q \in S_{f(k_p)} \mid m), & n_i \le l_j
\end{cases} \tag{4}$$

where $m$ denotes one case among all permutations $A$. Given the permutation $m$, when the letter $w_q$ is in the key set $S_{f(k_p)}$ typed by the finger $f(k_p)$, $P(w_q \in S_{f(k_p)} \mid m)$ is calculated as $\frac{1}{|S_{f(k_p)}|}$. For example, the letter ‘t’ is in the key set of the finger 4, so it is calculated as $\frac{1}{|S_4|} = \frac{1}{6}$. If the letter $w_q$ is not in the key set $S_{f(k_p)}$, the probability $P(w_q \in S_{f(k_p)} \mid m)$ is set to the tolerance factor $\delta$.

For instance, if the user wants to type the word “the”, the 4th, 7th and 3rd fingers will press in sequence. Based on Fig. 3, the detected keystroke sequence $K_i$ will be represented as ‘4’-‘7’-‘3’, and the number of keystrokes in $K_i$ is $n_i = 3$. In regard to the word $W_j$, if $W_j$ = “that”, the length of $W_j$ is $l_j = 4$. Since $n_i < l_j$, the number of permutations is $A(l_j, n_i)$. The corresponding possible permutations are shown in Fig. 4(a). Then the likelihood of the keystroke sequence $K_i$ based on the word $W_j$ can be calculated with Eq. (5).

$$P_m(A(l_j, n_i) \mid W_j = \text{“that”}) = \frac{1}{|S_4|} \times \frac{1}{|S_7|} \times \delta + \frac{1}{|S_4|} \times \frac{1}{|S_7|} \times \delta + \frac{1}{|S_4|} \times \delta^2 + \delta^3 = 5.75 \times 10^{-4} \tag{5}$$

where $\frac{1}{|S_4|}$ is the probability of typing ‘t’ by finger 4, and $\frac{1}{|S_7|}$ is the probability of typing ‘h’ by finger 7. The four components represent the four permutation cases shown in Fig. 4(a).
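The two likelihood terms defined so far, Eq. (2) and Eq. (4), can be sketched in a few lines. This is a hedged illustration under stated assumptions rather than the authors' code. The per-finger letter sets in `FINGER_KEYS` are an assumed reading of the standard fingering in Fig. 3 (letters only, fingers numbered 1 to 10 from left to right), and the function names are hypothetical. The sketch also assumes that the cases in Fig. 4 are order-preserving selections, so it enumerates them with `itertools.combinations`; this reproduces the four cases of Fig. 4(a), but it is an interpretation of the figure, not the closed-form count given in the text.

```python
from itertools import combinations
from math import prod
from typing import Dict, Sequence

DELTA = 0.01  # tolerance factor delta
L_MAX = 16    # l_max for the adopted word set

# Assumed letter sets per finger (hypothetical reading of Fig. 3).
# Fingers 5 and 6 are the thumbs on the space key and carry no letters.
FINGER_KEYS: Dict[int, str] = {
    1: "qaz", 2: "wsx", 3: "edc", 4: "rfvtgb",
    7: "yhnujm", 8: "ik", 9: "ol", 10: "p",
}


def length_likelihood(n_keys: int, word: str) -> float:
    """Eq. (2): 1 - |n_i - l_j| / (l_max + delta)."""
    return 1.0 - abs(n_keys - len(word)) / (L_MAX + DELTA)


def match_probability(finger: int, letter: str) -> float:
    """1/|S_f| if the letter belongs to the finger's key set, otherwise delta."""
    keys = FINGER_KEYS.get(finger, "")
    return 1.0 / len(keys) if letter in keys else DELTA


def permutation_likelihood(fingers: Sequence[int], word: str) -> float:
    """Eq. (4): sum over alignment cases of the product of per-pair probabilities."""
    n, l = len(fingers), len(word)
    total = 0.0
    if n <= l:
        # Choose which n letters of the word the keystrokes stand for.
        for idx in combinations(range(l), n):
            total += prod(match_probability(f, word[q]) for f, q in zip(fingers, idx))
    else:
        # Choose which l keystrokes stand for the letters of the word.
        for idx in combinations(range(n), l):
            total += prod(match_probability(fingers[p], w) for p, w in zip(idx, word))
    return total
```

For the keystrokes ‘4’-‘7’-‘3’ and the word “that”, `permutation_likelihood([4, 7, 3], "that")` sums the same four terms as Eq. (5) and returns a value close to the one printed there; small differences can arise from rounding or from the assumed key sets.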
UbiComp/ISWC ’20 Adjunct, September 12–16, 2020, Virtual Event, Mexico Hao Zhang et al.

Figure 5: The detection accuracy of the finger making a keystroke (‘null’ means no keystrokes). Axes: detected finger making a keystroke (horizontal) versus actual finger making a keystroke (vertical).

If the length of the word and that of the keystroke sequence are the same (i.e., $n_i = l_j$), there is only one permutation for them, as shown in Fig. 4(b), and the likelihood for the word “the” is shown in Eq. (6). If $n_i > l_j$, we assume the word $W_j$ = “to”. Here $n_i = 3$ and $l_j = 2$, so the number of permutations becomes $A(n_i, l_j) = 3$. The corresponding possible permutations are shown in Fig. 4(c). We calculate the likelihood of $K_i$ based on the word $W_j$ with Eq. (7):

$$
P_m\big(A(l_j, n_i) \mid W_j = \text{``the''}\big) = \frac{1}{|S_4|} \times \frac{1}{|S_7|} \times \frac{1}{|S_3|} = 1.85 \times 10^{-2}
\tag{6}
$$

$$
P_m\big(A(n_i, l_j) \mid W_j = \text{``to''}\big) = \frac{1}{|S_4|} \times \delta + \frac{1}{|S_4|} \times \delta + \delta^2 = 4.33 \times 10^{-3}
\tag{7}
$$

3.3.3 Combination of the number and the permutation of keystrokes. Finally, we combine the probabilities for the number and the permutations of the keystroke sequence, and formulate the result as $P(K_i \mid W_j) = P_n(n_i \mid W_j) \times P_m(A \mid W_j)$. For a detected keystroke sequence $K_i$, we first retain only the words that satisfy $|n_i - l_j| \le \Delta n$, then calculate the likelihood for each retained word, and select the word with the highest probability as the inferred result. Here, $\Delta n = 2$.

4 PERFORMANCE EVALUATION

We implement AirTyping based on the Leap Motion Controller (LMC), as shown in Fig. 1. The LMC uses its embedded cameras and infrared LEDs to provide the coordinates of finger joints at a sampling rate of 60 Hz. The user types about 15 cm above the LMC, and the inferred words are sent to the display for text input.
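The selection rule of Section 3.3.3, which is applied to each detected keystroke sequence, can be sketched as follows. This is only a sketch under assumptions: the function name `rank_candidates` and its parameters are hypothetical, and the two likelihood terms $P_n$ and $P_m$ are passed in as callables rather than implemented, since their concrete forms are defined in the paper's method section. Returning the top-k words instead of a single word is a convenience that mirrors the top-k evaluation in Fig. 6.

```python
from typing import Callable, List, Sequence, Tuple


def rank_candidates(
    keystrokes: Sequence[int],
    dictionary: Sequence[str],
    number_likelihood: Callable[[int, str], float],
    permutation_likelihood: Callable[[Sequence[int], str], float],
    delta_n: int = 2,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank dictionary words by P(K_i | W_j) = P_n(n_i | W_j) * P_m(A | W_j)."""
    n = len(keystrokes)
    scored = []
    for word in dictionary:
        # Keep only words whose length is within delta_n of the keystroke count.
        if abs(n - len(word)) > delta_n:
            continue
        score = number_likelihood(n, word) * permutation_likelihood(keystrokes, word)
        scored.append((word, score))
    # Highest likelihood first; the first entry is the inferred word.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With `delta_n = 2` as in the text, a three-keystroke sequence is only compared against words of length one to five, which keeps the per-sequence cost well below a full pass over the dictionary's likelihood computation.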
The adopted dictionary includes the 5000 most frequently used words downloaded from Word Frequency Data [8].

Firstly, we evaluate the performance of the keystroke detection module. Specifically, each finger makes 50 keystrokes. As shown in Fig. 5, the average detection accuracy of the finger making a keystroke reaches 92.2%, and the false positive rate (i.e., treating non-keystrokes as keystrokes) and false negative rate (i.e., treating keystrokes as non-keystrokes) are 1.7% and 5.4%, respectively. Thus we can accurately analyze the finger movements and detect the possible keystrokes.

In addition, we test the performance of the word inference module. Specifically, we calculate the likelihood function for each word and obtain the candidate words in the dictionary. As shown in Fig. 6, when the keystrokes are detected accurately, the probability that the top-1 candidate word with the highest likelihood is the typed/target word reaches 90.2%. Besides, the probability that the top-3 candidate words contain the typed/target word reaches 98.5%. Overall, we can detect keystrokes and infer the typed words accurately, and provide an efficient mid-air typing scheme for text input.

Figure 6: The probability that top-k candidate words contain the typed/target word. Axes: candidate words (horizontal) versus probability in % (vertical). Bar values: top-1 90.16, top-2 96.56, top-3 98.48, top-4 99.3, top-5 99.62, top-6 99.9, top-7 99.96, top-12 99.98, top-27 100.

5 CONCLUSION

In this paper, we propose AirTyping, which allows people to type in mid-air based on Leap Motion. To detect the possible keystrokes, we introduce the bending angles of fingers, the movement trend of a finger in consecutive coordinates, and the time difference between keystrokes. To infer the typed word sequence, we introduce a Bayesian method and calculate the likelihood function from the number and the permutation of keystrokes.
The experimental results show that AirTyping can detect the keystrokes and infer the typed text efficiently, i.e., the true positive rate of keystroke detection is 92.2%, while the top-1 inferred word matches the typed word with an accuracy of 90.2%.

ACKNOWLEDGMENTS

This work is supported by National Natural Science Foundation of China under Grant Nos. 61802169, 61872174, 61832008, 61902175, 61906085; JiangSu Natural Science Foundation under Grant Nos. BK20180325, BK20190293; the Key R&D Program of Jiangsu Province under Grant No. BE2018116. This work is partially supported by Collaborative Innovation Center of Novel Software Technology and Industrialization. Yafeng Yin is the corresponding author.

REFERENCES

[1] Microsoft Hololens, https://www.microsoft.com/en-us/hololens.
[2] L. Xie and C. Wang and A. X. Liu and J. Sun and S. Lu, Multi-Touch in the Air: Concurrent Micromovement Recognition Using RF Signals, in IEEE/ACM Transactions on Networking, 2018.
[3] Y. Yin and Q. Li and L. Xie and S. Yi and E. Novak and S. Lu, CamK: Camera-Based Keystroke Detection and Localization for Small Mobile Devices, in IEEE Transactions on Mobile Computing, 2018.
[4] X. Yi and C. Yu and M. Zhang and S. Gao and K. Sun and Y. Shi, ATK: Enabling ten-finger freehand typing in air based on 3d hand tracking data, in ACM Symposium on User Interface Software and Technology, 2015.
[5] A. Boudjelthia and S. Nasim and J. Eskola and J. Adeegbe and O. Hourula and S. Klakegg and D. Ferreira, Enabling Mid-air Browser Interaction with Leap Motion, in ACM International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers, 2018.
[6] Leap Motion, https://developer.leapmotion.com.
[7] An average professional typist types usually in speeds of 50 to 80 wpm, https://en.wikipedia.org/wiki/Words-per-minute.
[8] Word Frequency Data Set, https://www.wordfrequency.info.