CamK: a Camera-based Keyboard for Small Mobile Devices

Yafeng Yin†, Qun Li‡, Lei Xie†, Shanhe Yi‡, Ed Novak‡, Sanglu Lu†
†State Key Laboratory for Novel Software Technology, Nanjing University, China
‡College of William and Mary, Williamsburg, VA, USA
Email: †yyf@dislab.nju.edu.cn, †{lxie, sanglu}@nju.edu.cn, ‡liqun@cs.wm.edu, ‡{syi, ejnovak}@cs.wm.edu

Abstract—Due to the smaller size of mobile devices, on-screen keyboards become inefficient for text entry. In this paper, we present CamK, a camera-based text-entry method, which uses an arbitrary panel (e.g., a piece of paper) with a keyboard layout to input text into small devices. CamK captures images during the typing process and uses image processing techniques to recognize the typing behavior. The principle of CamK is to extract the keys, track the user's fingertips, and detect and localize keystrokes. To achieve high accuracy of keystroke localization and a low false positive rate of keystroke detection, CamK introduces initial training and online calibration. Additionally, CamK optimizes computation-intensive modules to reduce the time latency. We implement CamK on a mobile device running Android. Our experimental results show that CamK can achieve above 95% accuracy of keystroke localization, with only 4.8% false positive keystrokes. When compared to on-screen keyboards, CamK can achieve a 1.25X typing speedup for regular text input and 2.5X for random character input.

I. INTRODUCTION

Recently, mobile devices have converged to a relatively small form factor (e.g., smartphones, Apple Watch), in order to be carried everywhere easily, while avoiding carrying bulky laptops all the time. Consequently, interacting with small mobile devices involves many challenges; a typical example is text input without a physical keyboard.

Currently, many visual keyboards have been proposed. However, wearable keyboards [1], [2] introduce additional equipment.
On-screen keyboards [3], [4] usually take up a large area on the screen and only support a single finger for text entry. Projection keyboards [5]–[9] often need an infrared or visible light projector to display the keyboard to the user. Audio signal [10] or camera based visual keyboards [11]–[13] remove the additional hardware. By leveraging the microphone to localize the keystrokes, UbiK [10] requires the user to click keys with their fingertips and nails to make an audible sound, which is not typical of typing. Existing camera based keyboards either slow the typing speed [12] or must be used in controlled environments [13]. They cannot provide a user experience similar to that of physical keyboards [11].

In this paper, we propose CamK, a more natural and intuitive text-entry method, in order to provide a PC-like text-entry experience. CamK works with the front-facing camera of the mobile device and a paper keyboard, as shown in Fig. 1. CamK takes pictures as the user types on the paper keyboard, and uses image processing techniques to detect and localize keystrokes. CamK can be used in a wide variety of scenarios, e.g., the office, coffee shops, outdoors, etc.

Fig. 1. A typical use case of CamK.

There are three key technical challenges in CamK. (1) High accuracy of keystroke localization: The inter-key distance in the paper keyboard is only about two centimeters [10]. While using image processing techniques, there may exist a position deviation between the real fingertip and the detected fingertip. To address this challenge, CamK introduces initial training to get the optimal parameters for image processing. Besides, CamK uses an extended region to represent the detected fingertip, aiming to tolerate the position deviation. In addition, CamK utilizes the features of a keystroke (e.g., the visually obstructed area of the pressed key) to verify the validity of a keystroke.
(2) Low false positive rate of keystroke detection: A false positive occurs when a non-keystroke (i.e., a period in which no fingertip is pressing any key) is treated as a keystroke. To address this challenge, CamK combines keystroke detection with keystroke localization. If no valid key is pressed by the fingertip, CamK removes the possible non-keystroke. Besides, CamK introduces online calibration to further remove false positive keystrokes.

(3) Low latency: When the user presses a key on the paper keyboard, CamK should output the character of the key without any noticeable latency. Usually, the computation in image processing is heavy, leading to a large time latency in keystroke localization. To address this challenge, CamK reduces the sizes of images, optimizes the image processing pipeline, adopts multiple threads, and removes the operations of writing/reading images, in order to make CamK work on the mobile device.

We make the following contributions in this paper.

• We propose a novel text-entry method, CamK. CamK only uses the camera of the mobile device and a paper
keyboard. CamK allows the user to type with all the fingers and provides a similar user experience to using physical keyboards.

• We design a practical framework for CamK, which can detect and localize keystrokes with high accuracy, and output the character of the pressed key without any noticeable time latency. Based on image processing, CamK can extract the keys, track the user's fingertips, and detect and localize keystrokes. Besides, CamK introduces the initial training to optimize the image processing result and utilizes online calibration to reduce the false positive keystrokes. Additionally, CamK optimizes the computation-intensive modules to reduce the time latency, in order to make CamK work on mobile devices.

• We implement CamK on a smartphone running Google's Android operating system (version 4.4.4). We first measure the performance of each module in CamK. Then, we invite nine users1 to evaluate CamK in a variety of real-world environments. We compare the performance of CamK with other methods, in terms of keystroke localization accuracy and text-entry speed.

II. OBSERVATIONS OF A KEYSTROKE

In order to show the feasibility of localizing keystrokes based on image processing techniques, we first show the observations of a keystroke. Fig. 2 shows the frames/images captured by the camera during two consecutive keystrokes. The origin of coordinates is located in the top left corner of the image, as shown in Fig. 2(a). We call the hand located in the left area of the image the left hand, while the other is called the right hand, as shown in Fig. 2(b). From left to right, the fingers are called finger i in sequence, i ∈ [1, 10], as shown in Fig. 2(c). The fingertip pressing the key is called StrokeTip. The key pressed by StrokeTip is called StrokeKey.

• The StrokeTip has the largest vertical coordinate among the fingers on the same hand. An example is finger 9 in Fig. 2(a).
However, this feature may not work well for thumbs, which should be identified separately.

• The StrokeTip stays on the StrokeKey for a certain duration, as shown in Fig. 2(c) - Fig. 2(d). If the position of the fingertip keeps unchanged, a keystroke may happen.

• The StrokeTip is located in the StrokeKey, as shown in Fig. 2(a), Fig. 2(d).

• The StrokeTip obstructs the StrokeKey from the view of the camera, as shown in Fig. 2(d). The ratio of the visually obstructed area to the whole area of the key can be used to verify whether the key is pressed.

• The StrokeTip has the largest vertical distance from the remaining fingertips of the corresponding hand. As shown in Fig. 2(a), the vertical distance dr between the StrokeTip (i.e., Finger 9) and the remaining fingertips in the right hand is larger than that (dl) in the left hand. Considering the difference caused by the distance between the camera and the fingertip, sometimes this feature may not be satisfied. Thus, this feature is used to assist in keystroke localization, instead of directly determining a keystroke.

1 All data collection in this paper has gone through the IRB approval.

III. SYSTEM DESIGN

As shown in Fig. 1, CamK works with a mobile device (e.g., a smartphone) with an embedded camera and a paper keyboard. The smartphone uses the front-facing camera to watch the typing process. The paper keyboard is placed on a flat surface. The objective is to let the keyboard layout be located in the camera's view, while making the keys in the camera's view look as large as possible. CamK does not require the keyboard layout to be fully located in the camera's view, because some users may only want to input letters or digits. Even if the user places only the relevant part of the keyboard in the camera's view, CamK can still work. CamK consists of the following four components: key extraction, fingertip detection, keystroke detection and localization, and text-entry determination.
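To make the division of labor among these four components concrete, here is a toy, runnable sketch of the per-frame control flow in Python. All function bodies are illustrative stand-ins operating on made-up data (key bounding boxes, fingertip tuples); they are not CamK's actual implementation.

```python
# Toy stand-ins for CamK's four components; the real system operates on
# camera frames with image processing, these operate on toy data so the
# control flow between the components is clear.

def key_extraction(_image):
    """Run once before typing: map each key to its bounding box
    (x0, y0, x1, y1) on the paper keyboard (illustrative boxes)."""
    return {"A": (0, 0, 10, 10), "S": (10, 0, 20, 10)}

def fingertip_detection(frame):
    """Per frame: return the detected fingertip coordinates."""
    return frame["tips"]

def keystroke_detection(tips, prev_tips, keys):
    """A fingertip that stayed put between frames and lies inside a key
    is taken as a keystroke; return the pressed key's label, else None."""
    for tip, prev in zip(tips, prev_tips):
        if tip != prev:               # fingertip still moving: no keystroke
            continue
        for label, (x0, y0, x1, y1) in keys.items():
            if x0 <= tip[0] < x1 and y0 <= tip[1] < y1:
                return label
    return None

def text_entry_determination(label):
    """Output the character of the pressed key."""
    return label

keys = key_extraction(None)
frames = [{"tips": [(5, 5)]}, {"tips": [(5, 5)]}]   # fingertip rests on "A"
typed = keystroke_detection(fingertip_detection(frames[1]),
                            fingertip_detection(frames[0]), keys)
char = text_entry_determination(typed)
```

The real components refine each of these stand-ins: fingertip detection uses hand contours, and the static check tolerates small position deviations, as described in the following subsections.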
Fig. 3. Architecture of CamK (key extraction: keyboard detection, key segmentation, mapping; fingertip detection: hand segmentation, fingertip discovery; keystroke detection and localization: candidate fingertip selection; text-entry determination: output).

A. System Overview

The architecture of CamK is shown in Fig. 3. The input is the image taken by the camera and the output is the character of the pressed key. Before a user begins typing, CamK uses Key Extraction to detect the keyboard and extract each key from the image. When the user types, CamK uses Fingertip Detection to extract the user's hands and detect the fingertip based on the shape of a finger, in order to track the fingertips. Based on the movements of fingertips, CamK uses Keystroke Detection and Localization to detect a possible keystroke and localize the keystroke. Finally, CamK uses Text-entry Determination to output the character of the pressed key.

B. Key Extraction

Without loss of generality, CamK adopts the common QWERTY keyboard layout, which is printed in black and white on a piece of paper, as shown in Fig. 1. In order to eliminate background effects, we first detect the boundary of the keyboard. Then, we extract each key from the keyboard. Therefore, key extraction contains three parts: keyboard detection, key segmentation, and mapping the characters to the keys, as shown in Fig. 3.
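As a hypothetical illustration of the third part (mapping characters to keys, which CamK does by locating the other keys relative to the space key, as described under key segmentation below), the sketch below turns a made-up table of relative key offsets into key centers. The offset values and helper names are ours, not CamK's calibration data.

```python
# Hypothetical relative-location table: each key's center is expressed in
# units of one regular key width/height relative to the space key's center.
# The offsets below are made up for illustration, not CamK's layout data.
REL_OFFSETS = {            # (columns to the right, rows up) from space
    "V": (-1.0, 3.0),
    "B": (0.0, 3.0),
    "N": (1.0, 3.0),
}

def locate_keys(space_center, key_w, key_h, offsets=REL_OFFSETS):
    """Turn relative offsets into image coordinates; y grows downward
    (the image origin is the top-left corner), so 'rows up' subtracts."""
    sx, sy = space_center
    return {k: (sx + dx * key_w, sy - dy * key_h)
            for k, (dx, dy) in offsets.items()}

# Space key detected at (160, 200) with regular keys about 20x20 pixels:
centers = locate_keys(space_center=(160, 200), key_w=20, key_h=20)
```

Anchoring the layout on the space key is convenient because it is the largest key and therefore the most reliably segmented.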
Fig. 2. Frames during two consecutive keystrokes ((a)–(e): frames 1–5; (a) marks the origin O(0, 0) and the distances dl, dr; (b) the left/right hands; (c) the finger numbers 1–10).

1) Keyboard detection: We use the Canny edge detection algorithm [14] to obtain the edges of the keyboard. Fig. 4(b) shows the edge detection result of Fig. 4(a). However, the interference edges (e.g., the paper's edge, the longest edge in Fig. 4(b)) should be removed. Based on Fig. 4(b), the edges of the keyboard should be close to the edges of the keys. We use this feature to remove pitfall edges; the result is shown in Fig. 4(c). Additionally, we adopt the dilation operation [15] to join the dispersed edge points which are close to each other, in order to get better edges/boundaries of the keyboard. After that, we use the Hough transform [12] to detect the lines in Fig. 4(c). Then, we use the uppermost line and the bottom line to describe the position range of the keyboard, as shown in Fig. 4(d). Similarly, we can use the Hough transform [12] to detect the left/right edge of the keyboard. If there are no suitable edges detected by the Hough transform, it is usually because the keyboard is not perfectly located in the camera's view. In this case, we simply use the left/right boundary of the image to represent the left/right edge of the keyboard. As shown in Fig. 4(e), we extend the four edges (lines) to get four intersections P1(x1, y1), P2(x2, y2), P3(x3, y3), P4(x4, y4), which are used to describe the boundary of the keyboard.

Fig. 4. Keyboard detection and key extraction ((a) an input image; (b) Canny edge detection result; (c) optimization for edges; (d) position range of the keyboard; (e) keyboard boundary P1–P4; (f) key segmentation result).

2) Key segmentation: With the known location of the keyboard, we can extract the keys based on color segmentation.
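The corner points P1–P4, the keyboard area Sb computed from them, and the valid-key-area filter used in key segmentation can be sketched in pure Python. The edge coordinates below are illustrative; only the formula for Sb and the values αl = 0.15, αh = 5 come from the paper.

```python
def intersect(l1, l2):
    """Intersection of two infinite lines, each given as two points
    ((x1, y1), (x2, y2)); returns None if the lines are parallel."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        return None
    a, b = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def keyboard_area(p1, p2, p3, p4):
    """Sb = (|P1P2 x P1P4| + |P3P4 x P3P2|) / 2 for the quadrilateral
    with corners P1..P4 in cyclic order."""
    return (abs(cross(p1, p2, p4)) + abs(cross(p3, p4, p2))) / 2

def is_valid_key(s_k, s_b, n_avg, a_l=0.15, a_h=5):
    """Keep a white contour as a key only if its area lies within
    [a_l, a_h] times the average regular-key area Sb / Navg."""
    return a_l * s_b / n_avg <= s_k <= a_h * s_b / n_avg

# Illustrative edges of a slightly tilted keyboard in image coordinates:
top, bottom = ((0, 10), (100, 12)), ((0, 90), (100, 92))
left, right = ((2, 0), (0, 100)), ((98, 0), (100, 100))
p1, p2 = intersect(top, left), intersect(top, right)
p3, p4 = intersect(bottom, right), intersect(bottom, left)
s_b = keyboard_area(p1, p2, p3, p4)
```

Splitting the quadrilateral into the two triangles P1P2P4 and P3P4P2 makes the area formula exact even when the keyboard appears tilted or trapezoidal in the image.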
In YCrCb space, the color coordinate (Y, Cr, Cb) of a white pixel is (255, 128, 128), while that of a black pixel is (0, 128, 128). Thus, we can use only the difference in the Y value between pixels to distinguish the white keys from the black background. If a pixel is located in the keyboard while satisfying 255 − εy ≤ Y ≤ 255, the pixel belongs to a key. The offset εy ∈ N of Y is mainly caused by lighting conditions. εy can be estimated in the initial training (see section IV-A). The initial/default value of εy is εy = 50.

When we obtain the white pixels, we need to get the contours of the keys and separate the keys from one another. While considering the pitfall areas, such as small white areas which do not belong to any key, we first estimate the area of a key. Based on Fig. 4(e), we use P1, P2, P3, P4 to calculate the area Sb of the keyboard as Sb = (1/2) · (|P1P2 × P1P4| + |P3P4 × P3P2|), where P1P2, P1P4, P3P4, P3P2 denote the corresponding vectors. Then, we calculate the area of each key. We use N to represent the number of keys in the keyboard. Considering the size difference between keys, we treat larger keys (e.g., the space key) as multiple regular keys (e.g., A-Z, 0-9). For example, the space key is treated as five regular keys. In this way, N becomes Navg, the number of regular-key equivalents. Then, we can estimate the average area of a regular key as Sb/Navg. In addition to the size difference between keys, different distances between the camera and the keys can also affect the area of a key in the image. Therefore, we introduce αl, αh to describe the range of a valid area Sk of a key as Sk ∈ [αl · Sb/Navg, αh · Sb/Navg]. We set αl = 0.15, αh = 5 in CamK, based on extensive experiments.

The key segmentation result of Fig. 4(e) is shown in Fig. 4(f). Then, we use the location of the space key (biggest key) to locate other keys, based on the relative locations between keys.

C. Fingertip Detection

In order to detect keystrokes, CamK needs to detect the fingertips and track the movements of fingertips.
Fingertip detection consists of hand segmentation and fingertip discovery.

1) Hand segmentation: Skin segmentation [15] is a common method used for hand detection. In YCrCb color space, a pixel (Y, Cr, Cb) is determined to be a skin pixel if it satisfies Cr ∈ [133, 173] and Cb ∈ [77, 127]. However, the threshold values of Cr and Cb can be affected by the surroundings, such as lighting conditions. It is difficult to choose suitable threshold values for Cr and Cb. Therefore, we combine Otsu's method [16] and the red channel in YCrCb color space for skin segmentation.

In YCrCb color space, the red channel Cr is essential to human skin coloration. Therefore, for a captured image, we use the grayscale image that is split based on the Cr channel as the input for Otsu's method. Otsu's method [16] can automatically perform clustering-based image thresholding, i.e., it can calculate the optimal threshold to separate the foreground and background. Therefore, this skin segmentation approach can tolerate the effects caused by the environment, such as lighting conditions. For the input image in Fig. 5(a), the hand segmentation result is shown in Fig. 5(b), where the white regions represent the hand regions, while the black regions
represent the background. However, around the hands, there exist some interference regions, which may change the contours of the fingers, resulting in detecting wrong fingertips. Thus, CamK introduces the erosion and dilation operations [17]. We first use the erosion operation to isolate the hands from the keys and separate each finger. Then, we use the dilation operation to smooth the edges of the fingers. Fig. 5(c) shows the optimized result of hand segmentation. Intuitively, if the color of the user's clothes is close to his/her skin color, the hand segmentation result will become worse. In this case, we only focus on the hand region located in the keyboard area. Due to the color difference between the keyboard and human skin, CamK can still extract the hands efficiently.

Fig. 5. Fingertip detection: (a) An input image, (b) Hand segmentation, (c) Optimization, (d) Fingers' contour, (e) Fingertip discovery, (f) Fingertips.

2) Fingertip discovery: After we extract the fingers, we need to detect the fingertips. As shown in Fig. 6(a), the fingertip is usually a convex vertex of the finger. For a point Pi(xi, yi) located on the contour of a hand, by tracing the contour, we can select the point Pi−q(xi−q, yi−q) before Pi and the point Pi+q(xi+q, yi+q) after Pi. Here, i, q ∈ N. We calculate the angle θi between the two vectors PiPi−q and PiPi+q according to Eq. (1). To simplify the calculation of θi, we map θi into the range [0°, 180°]. If θi ∈ [θl, θh] (θl < θh) and Pi lies below both of its neighbors, i.e., yi > yi−q and yi > yi+q, then Pi is a candidate vertex. Otherwise, Pi will not be a candidate vertex. If there are multiple candidate vertexes, such as P'i in Fig. 6(a), we choose the vertex with the largest vertical coordinate, shown as Pi in Fig. 6(a), because this point has the largest probability to be a fingertip.
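The candidate-vertex test above can be written out in a few lines. The following is a simplified illustration, not CamK's implementation; it uses the thresholds θl = 60°, θh = 150° given in the paper and assumes image coordinates with the origin at the top left, so a larger y means lower in the image:

```python
import math

# Thresholds from the paper: theta_l = 60 deg, theta_h = 150 deg.
THETA_L, THETA_H = 60.0, 150.0

def vertex_angle(p_prev, p, p_next):
    """Angle (degrees) at p between vectors p->p_prev and p->p_next (Eq. 1)."""
    v1 = (p_prev[0] - p[0], p_prev[1] - p[1])
    v2 = (p_next[0] - p[0], p_next[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def is_candidate_vertex(p_prev, p, p_next):
    """A contour point is a fingertip candidate if its angle lies in
    [theta_l, theta_h] and it is below both neighbors (larger y, since
    the image origin is at the top left)."""
    theta = vertex_angle(p_prev, p, p_next)
    return THETA_L <= theta <= THETA_H and p[1] > p_prev[1] and p[1] > p_next[1]
```

A sharp right-angle vertex below its neighbors, such as (10, 10) between (0, 0) and (20, 0), passes the test, while a flat contour run (angle near 180°) does not.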
Based on extensive experiments, we set θl = 60°, θh = 150°, and q = 20 in this paper.

θi = arccos( (PiPi−q · PiPi+q) / (|PiPi−q| · |PiPi+q|) )    (1)

Considering the specificity of thumbs, which may press a key (e.g., the space key) in a different way from the other fingers, the relative positions of Pi−q, Pi, Pi+q may change. Fig. 6(b) shows the thumb of the left hand. Obviously, Pi−q, Pi, Pi+q do not satisfy yi > yi−q and yi > yi+q. Therefore, we use (xi − xi−q) · (xi − xi+q) > 0 to describe the relative locations of Pi−q, Pi, Pi+q for thumbs. Then, we choose the vertex with the largest vertical coordinate as the fingertip.

Fig. 6. Features of a fingertip: (a) Fingertips (excluding thumbs), (b) A thumb.

In fingertip detection, we only need to examine the points located on the bottom edge of the hand (from the leftmost point to the rightmost point), such as the blue contour of the right hand in Fig. 5(d). The shape feature θi and the vertical coordinates yi along the bottom edge are shown in Fig. 5(e). If we can detect five fingertips in a hand with θi and yi−q, yi, yi+q, we do not treat the thumb specially. Otherwise, we detect the fingertip of the thumb in the rightmost area of the left hand or the leftmost area of the right hand according to θi and xi−q, xi, xi+q. The detected fingertips of Fig. 5(a) are marked in Fig. 5(f).

D. Keystroke Detection and Localization

When CamK detects the fingertips, it tracks them to detect a possible keystroke and localize it. The keystroke localization result can also be used to remove false positive keystrokes. We illustrate the whole process of keystroke detection and localization together.

1) Candidate fingertip in each hand: CamK allows the user to use all fingers for text entry, thus a keystroke may be caused by either the left or right hand.
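The per-hand candidate selection described next (the bottom-most contour point per hand) and the subsequent static-fingertip check of Eq. (2) can be sketched as follows. This is a minimal illustration, not CamK's code; it assumes each hand's contour is given as a list of (x, y) points in image coordinates, and uses the paper's empirical threshold ∆r = 5:

```python
import math

DELTA_R = 5  # empirical threshold from the paper (Eq. 2)

def candidate_fingertip(contour):
    """Pick the contour point with the largest vertical coordinate,
    i.e., the bottom-most point, since y grows downward in image
    coordinates."""
    return max(contour, key=lambda p: p[1])

def is_static(tip_prev, tip_curr, delta_r=DELTA_R):
    """Eq. (2): a candidate fingertip is treated as static (a keystroke
    probably happens) if it moved at most delta_r pixels between two
    consecutive frames."""
    return math.hypot(tip_curr[0] - tip_prev[0],
                      tip_curr[1] - tip_prev[1]) <= delta_r
```

In CamK this check is applied to the left-hand and right-hand candidates separately, frame over frame.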
According to the observations (see section II), the fingertip pressing a key (i.e., the StrokeTip) usually has the largest vertical coordinate in that hand. Therefore, we first select the candidate fingertip with the largest vertical coordinate in each hand. We use Cl and Cr to represent the points located on the contours of the left hand and right hand, respectively. For all points in Cl, if a point Pl(xl, yl) satisfies yl ≥ yj for all Pj ∈ Cl, j ≠ l, then Pl is selected as the candidate fingertip of the left hand. Similarly, we can get the candidate fingertip Pr(xr, yr) of the right hand. In this step, we only need to get Pl and Pr to know the moving states of the hands; it is unnecessary to detect the other fingertips.

2) Moving or staying: As described in the observations, when the user presses a key, the fingertip stays at that key for a certain duration. Therefore, we can use the location variation of the candidate fingertip to detect a possible keystroke. In frame i, we use Pli(xli, yli) and Pri(xri, yri) to represent the candidate fingertips of the left hand and right hand, respectively. As shown in Fig. 5, the interference regions around a fingertip may affect its contour, so there may exist a position deviation between the real fingertip and the detected fingertip. Therefore, if the candidate fingertips in frames i − 1 and i satisfy Eq. (2), the fingertips are treated as static, i.e., a keystroke probably happens. Based on extensive experiments, we set ∆r = 5 empirically.

√((xli − xli−1)² + (yli − yli−1)²) ≤ ∆r,
√((xri − xri−1)² + (yri − yri−1)²) ≤ ∆r.    (2)

3) Discovering the pressed key: For a keystroke, the fingertip is located at the key and a part of the key will be visually
obstructed by that fingertip, as shown in Fig. 2(d). We treat the thumb as a special case, and also select it as a candidate fingertip at first. Then, we get the candidate fingertip set Ctip = {Pli, Pri, left thumb in frame i, right thumb in frame i}. After that, we can localize the keystroke by using Alg. 1.

Eliminating impossible fingertips: For convenience, we use Pi to represent a fingertip in Ctip, i.e., Pi ∈ Ctip, i ∈ [1, 4]. If a fingertip Pi is not located in the keyboard region, CamK eliminates it from the candidate fingertips Ctip.

Fig. 7. Candidate keys and candidate fingertips: (a) Candidate keys, (b) Locating a fingertip.

Selecting the nearest candidate keys: For each candidate fingertip Pi, we first search for the candidate keys which are probably pressed by Pi. As shown in Fig. 7(a), although the real fingertip is Pi, the detected fingertip is P̂i. We use P̂i to search for the candidate keys. We use Kcj(xcj, ycj) to represent the centroid of key Kj. We get the two rows of keys nearest the location P̂i(x̂i, ŷi) (i.e., the rows with the two smallest |ycj − ŷi|). For each row, we select the two nearest keys (i.e., the keys with the two smallest |xcj − x̂i|). In Fig. 7(a), the candidate key set Ckey consists of K1, K2, K3, K4. Fig. 8(a) shows the candidate keys of the fingertip in each hand.

Keeping candidate keys containing the candidate fingertip: If a key is pressed by the user, the fingertip will be located in that key. Thus we use the location of the fingertip P̂i(x̂i, ŷi) to verify whether a candidate key contains the fingertip, in order to remove the invalid candidate keys. As shown in Fig. 7(a), there exists a small deviation between the real fingertip and the detected fingertip. Therefore, we extend the range of the detected fingertip to Ri, as shown in Fig. 7(a).
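The row-then-column search above can be sketched as a short helper. This is a simplified illustration, not CamK's code; it assumes the key centroids are known from key extraction and that keys in the same row share the same centroid y-coordinate:

```python
def nearest_candidate_keys(tip, key_centroids, rows=2, keys_per_row=2):
    """Pick the `rows` key rows closest to the detected fingertip, then
    the `keys_per_row` nearest keys in each of those rows -- four
    candidate keys in total. `key_centroids` maps key label -> (xc, yc);
    keys in the same row are assumed to share the same yc (a
    simplification for this sketch)."""
    x, y = tip
    # Group keys by their row (shared yc value).
    by_row = {}
    for label, (xc, yc) in key_centroids.items():
        by_row.setdefault(yc, []).append((label, xc))
    # The rows with the smallest vertical distance |yc - y|.
    nearest_rows = sorted(by_row, key=lambda yc: abs(yc - y))[:rows]
    candidates = []
    for yc in nearest_rows:
        row = sorted(by_row[yc], key=lambda item: abs(item[1] - x))
        candidates.extend(label for label, _ in row[:keys_per_row])
    return candidates
```

For a fingertip detected between two rows, this returns the two nearest keys of each of the two nearest rows, which are then filtered by the containment and coverage-ratio tests described next.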
If any point Pk(xk, yk) in the range Ri is located in a candidate key Kj, P̂i is considered to be located in Kj. Ri is calculated as {Pk ∈ Ri | √((x̂i − xk)² + (ŷi − yk)²) ≤ ∆r}; we set ∆r = 5 empirically.

As shown in Fig. 7(b), a key is represented as a quadrangle ABCD. If a point is located in ABCD, then, when we move around ABCD clockwise, the point lies on the right side of each edge of ABCD. As shown in Fig. 2(a), the origin of coordinates is located at the top left corner of the image. Therefore, if a fingertip P ∈ Ri satisfies Eq. (3), it is located in the key, and CamK keeps that key as a candidate key. Otherwise, CamK removes the key from the candidate key set Ckey. In Fig. 7(a), K1, K2 are the remaining candidate keys. The candidate keys from Fig. 8(a) that contain the fingertip are shown in Fig. 8(b).

AB × AP ≥ 0,  BC × BP ≥ 0,  CD × CP ≥ 0,  DA × DP ≥ 0.    (3)

Calculating the coverage ratios of candidate keys: The pressed key is visually obstructed by the fingertip, as the dashed area of key K1 shown in Fig. 7(a). We use the coverage ratio to measure the visually obstructed area of a candidate key, in order to remove the wrong candidate keys. For a candidate key Kj, whose area is Skj and whose visually obstructed area is Dkj, the coverage ratio is ρkj = Dkj / Skj. For a larger key (e.g., the space key), we update ρkj by multiplying by a key size factor fj, i.e., ρkj = min((Dkj / Skj) · fj, 1), where fj = SKj / S̄k. Here, S̄k denotes the average area of a key, as described in section III-B2. If ρkj ≥ ρl, the key Kj remains a candidate key. Otherwise, CamK removes it from the candidate key set Ckey. We set ρl = 0.25 in this paper. For each hand, if there is more than one candidate key, we keep the key with the largest coverage ratio as the final candidate key. For a candidate fingertip, if there is no candidate key associated with it, the candidate fingertip will be eliminated. Fig.
8(c) shows each candidate fingertip and its associated key.

Fig. 8. Candidate fingertips/keys in each step: (a) Keys around the fingertip, (b) Keys containing the fingertip, (c) Visually obstructed key, (d) Vertical distance with remaining fingertips.

4) Vertical distance with remaining fingertips: Until now, there is at most one candidate fingertip in each hand. If there are no candidate fingertips, we infer that no keystroke happens. If there is only one candidate fingertip, that fingertip is the StrokeTip, and the associated candidate key is the StrokeKey. However, if there are two candidate fingertips, we utilize the vertical distance between each candidate fingertip and the remaining fingertips of the same hand to choose the most probable StrokeTip, as shown in Fig. 2(a).

We use Pl(xl, yl) and Pr(xr, yr) to represent the candidate fingertips of the left hand and right hand, respectively. Then, we calculate the distance dl between Pl and the remaining fingertips of the left hand, and the distance dr between Pr and the remaining fingertips of the right hand. Here, dl = |yl − (1/4) · Σ_{j=1, j≠l}^{5} yj|, while dr = |yr − (1/4) · Σ_{j=6, j≠r}^{10} yj|, where yj represents the vertical coordinate of fingertip j. If dl > dr, we choose Pl as the StrokeTip. Otherwise, we choose Pr as the StrokeTip. The associated key of the StrokeTip is the pressed key StrokeKey. In Fig. 8(d), we choose fingertip 3 in the left hand as the StrokeTip. However, based on the observations, the
distance between the camera and hands may affect the value of dl (dr). Therefore, we do not discard the unselected candidate fingertip (e.g., fingertip 8 in Fig. 8(d)); we display its associated key as a candidate key, and the user can select that candidate key for text input (see Fig. 1).

Algorithm 1: Keystroke localization
Input: Candidate fingertip set Ctip in frame i.
Output: The pressed key.
Remove fingertips out of the keyboard from Ctip.
for Pi ∈ Ctip do
    Obtain candidate key set Ckey with the four nearest keys around Pi.
    for Kj ∈ Ckey do
        if Pi is located in Kj then
            Calculate the coverage ratio ρkj of Kj.
            if ρkj < ρl then remove Kj from Ckey.
        else remove Kj from Ckey.
    if Ckey ≠ ∅ then
        Select Kj with the largest ρkj from Ckey;
        Pi and Kj form a combination <Pi, Kj>.
    else remove Pi from Ctip.
if Ctip = ∅ then no keystroke occurs; return.
if |Ctip| = 1 then return the associated key of the only fingertip.
For each hand, select the <Pi, Kj> with the largest ratio ρkj.
Use <Pl, Kl> (<Pr, Kr>) to represent the fingertip and its associated key in the left (right) hand.
Calculate dl (dr) between Pl (Pr) and the remaining fingertips of the left (right) hand.
if dl > dr then return Kl; else return Kr.

IV. OPTIMIZATIONS FOR KEYSTROKE LOCALIZATION AND IMAGE PROCESSING

A. Initial Training

Optimal parameters for image processing: For key segmentation (see section III-B2), εy is used to tolerate the change of Y caused by the environment. Initially, εy = 50. CamK updates εyi = εyi−1 + 1 until the number of extracted keys decreases, then it stops. Then, CamK resets εy to 50 and updates εyi = εyi−1 − 1 until the number of extracted keys decreases, then it stops. In this process, the value εyi at which CamK extracts the maximum number of keys is selected as the optimal value of εy.

In hand segmentation, CamK uses erosion and dilation operations, which each use a kernel B [17] to process the images. In order to get a suitable size for B, the user first puts his/her hands on the home row of the keyboard, as shown in Fig. 5(a). For simplicity, we set the kernel sizes for erosion and dilation to be equal. The initial kernel size is z0 = 0.
Then, CamK updates zi = zi−1 + 1. When CamK can localize each fingertip in the correct key with zi, CamK sets the kernel size as z = zi.

Frame rate selection: CamK sets the initial/default frame rate of the camera to f0 = 30 fps (frames per second), which is usually the maximal frame rate of many smartphones. For the ith keystroke, the number of frames containing the keystroke is denoted n0i. When the user has pressed u keys, we can get the average number of frames during a keystroke as n̄0 = (1/u) · Σ_{i=1}^{u} n0i. In fact, n̄0 reflects the duration of a keystroke. When the frame rate f changes, the number of frames in a keystroke n̄f changes. Intuitively, a smaller value of n̄f can reduce the image processing time, while a larger value of n̄f can improve the accuracy of keystroke localization. Based on extensive experiments (see section V-C), we set n̄f = 3, thus f = ⌈f0 · n̄f / n̄0⌉.

B. Online Calibration

Removing false positive keystrokes: Sometimes, the fingers may keep still even though the user does not type any key. CamK may treat such a non-keystroke as a keystroke by chance, leading to an error. Thus we introduce a temporary character to mitigate this problem.

In the process of pressing a key, the StrokeTip moves towards the key, stays at that key, and then moves away; the vertical coordinate of the StrokeTip first increases, then pauses, then decreases. If CamK has detected a keystroke in n̄f consecutive frames, it displays the current character on the screen as a temporary character. In the next frame(s), if the position of the StrokeTip does not satisfy the features of a keystroke, CamK cancels the temporary character. This does not have much impact on the user's experience, because of the very short time between two consecutive frames. Besides, CamK also displays the candidate keys around the StrokeTip, and the user can choose them for text input.
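The frame-rate rule above can be written as a short helper. This is a sketch, not CamK's implementation; the paper only gives f = ⌈f0 · n̄f / n̄0⌉, and the additional cap at the camera's maximum rate f0 is our assumption:

```python
import math

F0 = 30          # default (maximal) camera frame rate, in fps
N_F_TARGET = 3   # desired frames per keystroke (see section V-C)

def select_frame_rate(frames_per_keystroke):
    """Pick the smallest frame rate that still yields about N_F_TARGET
    frames per keystroke: f = ceil(f0 * n_f / n_0), where n_0 is the
    average number of frames observed per keystroke at rate f0.
    Capping at F0 is an assumption not stated in the paper."""
    n0_bar = sum(frames_per_keystroke) / len(frames_per_keystroke)
    return min(F0, math.ceil(F0 * N_F_TARGET / n0_bar))
```

For example, if keystrokes span 6 frames on average at 30 fps, 15 fps already yields the 3 frames per keystroke that CamK needs, roughly halving the image-processing load.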
Movement of smartphone or keyboard: CamK presumes that the smartphone and the keyboard are kept at stable positions during its usage life-cycle. For best results, we recommend that the user tape the paper keyboard onto the panel. However, to alleviate the effect caused by movements of the mobile device or the keyboard, we offer a simple solution. If the user uses the Delete key on the screen multiple times (e.g., more than 3 times), it may indicate that CamK cannot output the characters correctly, i.e., the device or keyboard may have moved. In that case, CamK asks the user to move his/her hands away from the keyboard for relocation. After that, the user can continue typing.

C. Real Time Image Processing

Because image processing is rather time-consuming, it is difficult to make CamK work on a mobile device. Take the Samsung GT-I9100 smartphone as an example: when the image size is 640 ∗ 480 pixels, it needs 630 ms to process one image to localize a keystroke. When considering the time cost for taking images and for processing consecutive images to track fingertips for keystroke detection, the time cost for localizing a keystroke increases to 1320 ms, which would lead to a very low input speed and a bad user experience. Therefore, we introduce the following optimizations for CamK.
Adaptively changing image sizes: We use small images (e.g., 120 ∗ 90 pixels) between two keystrokes to track the fingertips, and use a large image (e.g., 480 ∗ 360 pixels) for keystroke localization.

Optimizing the large-size image processing: When we detect a possible keystroke at (xc, yc) in frame i − 1, we focus on a small area Se = {P(xi, yi) ∈ Se | |xi − xc| ≤ ∆x, |yi − yc| ≤ ∆y} of frame i to localize the keystroke. We set ∆x = 40, ∆y = 20 by default.

Multi-thread processing: CamK adopts three threads to detect and localize keystrokes in parallel, i.e., a capturing thread to take images, a tracking thread for keystroke detection, and a localizing thread for keystroke localization.

Processing without writing and reading images: CamK directly stores the bytes of the source data to a text file in binary mode, instead of writing/reading images.

V. PERFORMANCE EVALUATION

We implement CamK on the Samsung GT-I9100 smartphone running Google's Android operating system (version 4.4.4). The Samsung GT-I9100 has a 2-megapixel front-facing camera. We use the layout of the AWK (Apple Wireless Keyboard) as the default keyboard layout, which is printed on a piece of US Letter sized paper. Unless otherwise specified, the frame rate is 15 fps and the image size is 480 ∗ 460 pixels. CamK works in the office. We first evaluate each component of CamK. Then, we invite 9 users to use CamK and compare its performance with other text-entry methods.

A. Localization accuracy for known keystrokes

In order to verify whether CamK has obtained the optimal parameters for image processing, we measure the accuracy of keystroke localization when CamK knows a keystroke is happening. The user presses the 59 keys (excluding the PC function keys: the first row, and five keys in the last row) on the paper keyboard sequentially. We press each key fifty times. The localization result is shown in Fig. 9; the localization accuracy is close to 100%. This means that CamK can adaptively select suitable values for the parameters used in image processing.

B. Accuracy of keystroke localization and false positive rate of keystroke detection

In order to verify whether CamK can utilize the features of a keystroke and online calibration for keystroke detection and localization, we conduct experiments in three typical scenarios: an office environment, a coffee shop, and outdoors. Usually, in the office, the color of the light is close to white. In the coffee shop, the red part of the light is similar to that of human skin. Outdoors, the sunlight is basic/pure light. In each test, a user randomly makes N = 500 keystrokes. Suppose CamK localizes Na keystrokes correctly and wrongly treats Nf non-keystrokes as keystrokes. We define the accuracy as pa = Na/N and the false positive rate as pf = min(Nf/N, 1). We show the results of these experiments in Fig. 10, which shows that CamK can achieve high accuracy (larger than 90%) with a low false positive rate (about 5%). In the office, the localization accuracy reaches 95%.

C. Frame rate

As described in section IV-A, the frame rate affects the number of images n̄f during a keystroke. Obviously, with a larger value of n̄f, CamK can more easily detect and localize the keystroke; on the contrary, with a smaller value CamK may miss keystrokes. Based on Fig. 11, when n̄f ≥ 3, CamK has good performance; when n̄f > 3, there is no obvious performance improvement. However, increasing n̄f means introducing more images for processing, which may increase the time latency. Considering the accuracy, false positives, and time latency, we set n̄f = 3. Besides, we invite 5 users to test the duration ∆t of a keystroke. ∆t represents the time during which the StrokeTip is located in the StrokeKey from the view of the camera. Based on Fig. 12, ∆t is usually larger than 150 ms. When n̄f = 3, the required frame rate is less than the maximum frame rate (30 fps), so CamK can work under the frame rate limitation of the smartphone. Therefore, n̄f = 3 is a suitable choice.

D. Impact of image size

We first measure the performance of CamK by adopting the same size for each image. Based on Fig. 13, as the size of the image increases, the performance of CamK becomes better. When the size is smaller than 480 ∗ 360 pixels, CamK cannot extract the keys correctly and the performance is rather bad. When the size of the image is 480 ∗ 360 pixels, the performance is good, and further increasing the size does not bring obvious improvement. However, increasing the image size increases the image processing time and the power consumption (measured by a Monsoon power monitor [18]) for processing an image, as shown in Fig. 14. Based on section IV-C, CamK adaptively changes the sizes of the images. In order to guarantee high accuracy and a low false positive rate while reducing the time latency and power consumption, the size of the large image is set to 480 ∗ 380 pixels.

In Fig. 15, as the size of the small image decreases from 480 ∗ 360 to 120 ∗ 90 pixels, CamK keeps high accuracy with a low false positive rate. When the size of the small image decreases further, the accuracy decreases a lot and the false positive rate increases a lot. As the image size decreases, the time cost and power consumption for locating a keystroke keep decreasing, as shown in Fig. 16. Combining Fig. 15 and Fig. 16, the size of the small image is set to 120 ∗ 90 pixels.

E. Time latency and power consumption

Based on Fig. 16, the time cost for locating a keystroke is about 200 ms, which is comparable to the duration of a keystroke, as shown in Fig. 12. This means that while the user stays on the pressed key, CamK can output the text without noticeable time latency. The time latency is within 50 ms, or even smaller, which is well below human response time [10].

In addition, we measure the power consumption of the Samsung GT-I9100 smartphone in the following states: (1) idle with the screen on; (2) writing an email; (3) keeping the camera in preview mode (frame rate 15 fps); (4) running CamK (frame rate 15 fps) for text entry. The power consumption in each state is 516 mW, 1189 mW, 1872 mW, and 2245 mW, respectively. The power consumption of CamK is a little high. Yet as a new
Adaptively changing image sizes: We use small images (e.g., 120 ∗ 90 pixels) between two keystrokes to track the fingertips, and use a large image (e.g., 480 ∗ 360 pixels) for keystroke localization. Optimizing the large-size image processing: When we detect a possible keystroke at (xc, yc) in frame i−1, we focus on a small area Sc = {P(xi, yi) | |xi − xc| ≤ ∆x, |yi − yc| ≤ ∆y} of frame i to localize the keystroke. We set ∆x = 40, ∆y = 20 by default. Multi-thread processing: CamK adopts three threads to detect and localize the keystroke in parallel, i.e., a capturing thread to take images, a tracking thread for keystroke detection, and a localizing thread for keystroke localization. Processing without writing and reading images: CamK directly stores the bytes of the source data to a text file in binary mode, instead of writing and reading image files.

V. PERFORMANCE EVALUATION

We implement CamK on the Samsung GT-I9100 smartphone running Google's Android operating system (version 4.4.4). The Samsung GT-I9100 has a 2-megapixel front-facing camera. We use the layout of the AWK (Apple Wireless Keyboard) as the default keyboard layout, which is printed on a piece of US Letter sized paper. Unless otherwise specified, the frame rate is 15fps and the image size is 480 ∗ 360 pixels. CamK works in the office. We first evaluate each component of CamK. Then, we invite 9 users to use CamK and compare its performance with that of other text-entry methods.

A. Localization accuracy for known keystrokes

In order to verify whether CamK has obtained the optimal parameters for image processing, we measure the accuracy of keystroke localization when CamK knows a keystroke is happening. The user presses the 59 keys (excluding the PC function keys: the first row, and five keys in the last row) on the paper keyboard sequentially, pressing each key fifty times. The localization result is shown in Fig. 9; the localization accuracy is close to 100%.
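The per-key accuracy behind a confusion matrix like Fig. 9 is simply the diagonal entry of each row divided by the number of presses of that key (fifty per key here). A minimal sketch of the computation; the 3-key matrix below is illustrative, not data from the paper:

```python
def per_key_accuracy(confusion):
    """confusion[i][j] = number of times key i was localized as key j.
    Returns the fraction of correct localizations for each key."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Illustrative 3-key example with 50 presses per key.
matrix = [
    [50,  0,  0],   # key 1: always localized correctly
    [ 1, 49,  0],   # key 2: one press localized as key 1
    [ 0,  0, 50],   # key 3: always correct
]
print(per_key_accuracy(matrix))  # [1.0, 0.98, 1.0]
```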
This near-100% accuracy means that CamK can adaptively select suitable values for the parameters used in image processing.

B. Accuracy of keystroke localization and false positive rate of keystroke detection

In order to verify whether CamK can utilize the features of a keystroke and online calibration for keystroke detection and localization, we conduct experiments in three typical scenarios: an office environment, a coffee shop, and outdoors. Usually, in the office, the color of the light is close to white. In the coffee shop, the red component of the light is similar to that of human skin. Outdoors, the sunlight approximates pure white light. In each test, a user randomly makes Nk = 500 keystrokes. Suppose CamK localizes Na keystrokes correctly and wrongly treats Nf non-keystrokes as keystrokes. We define the accuracy as pa = Na/Nk, and the false positive rate as pf = min(Nf/Nk, 1). Fig. 10 shows the results of these experiments: CamK achieves high accuracy (larger than 90%) with a low false positive rate (about 5%). In the office, the localization accuracy reaches 95%.

C. Frame rate

As described in Section IV-A, the frame rate affects the number of images n¯f captured during a keystroke. Obviously, with a larger n¯f, CamK can more easily detect and localize the keystroke; conversely, with a small n¯f, CamK may miss keystrokes. Based on Fig. 11, when n¯f ≥ 3, CamK has good performance, and when n¯f > 3 there is no obvious further improvement. However, increasing n¯f means introducing more images for processing, which may increase the time latency. Considering the accuracy, false positive rate, and time latency together, we set n¯f = 3. Besides, we invite 5 users to measure the duration ∆t of a keystroke; ∆t represents the time during which the StrokeTip stays on the StrokeKey from the view of the camera. Based on Fig. 12, ∆t is usually larger than 150ms. When n¯f = 3, the required frame rate is less than the maximum frame rate (30fps), so CamK can work within the frame rate limitation of the smartphone.
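The frame-rate headroom can be checked with simple arithmetic: capturing n¯f frames during a keystroke of duration ∆t requires a rate of at least n¯f/∆t. A quick sanity check with the values reported above (150ms is the lower bound from Fig. 12):

```python
n_f = 3        # frames needed within one keystroke
dt_ms = 150    # minimum keystroke duration in milliseconds (Fig. 12)
max_fps = 30   # camera's maximum frame rate

required_fps = 1000 * n_f / dt_ms   # frames per second needed
print(required_fps)                 # 20.0, below the 30fps limit
```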
Therefore, n¯f = 3 is a suitable choice.

D. Impact of image size

We first measure the performance of CamK when adopting the same size for every image. Based on Fig. 13, as the image size increases, the performance of CamK becomes better. When the size is smaller than 480 ∗ 360 pixels, CamK cannot extract the keys correctly and the performance is rather bad. When the image size is 480 ∗ 360 pixels, the performance is good, and increasing the size further brings no obvious improvement. However, increasing the image size increases the processing time and power consumption (measured with a Monsoon power monitor [18]) per image, as shown in Fig. 14. Based on Section IV-C, CamK adaptively changes the sizes of the images. In order to guarantee high accuracy and a low false positive rate while reducing the time latency and power consumption, the size of the large image is set to 480 ∗ 360 pixels.

In Fig. 15, as the size of the small image decreases from 480 ∗ 360 to 120 ∗ 90, CamK keeps high accuracy with a low false positive rate. When the size of the small images decreases further, the accuracy drops a lot and the false positive rate increases a lot. As the image size decreases, the time cost and power consumption for locating a keystroke keep decreasing, as shown in Fig. 16. Combining Fig. 15 and Fig. 16, the size of the small image is set to 120 ∗ 90 pixels.

E. Time latency and power consumption

Based on Fig. 16, the time cost for locating a keystroke is about 200ms, which is comparable to the duration of a keystroke, as shown in Fig. 12. It means that while the user's fingertip stays on the pressed key, CamK can output the text without noticeable time latency. The time latency is within 50ms, or even smaller, which is well below human response time [10].
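The image-size tradeoff above (Figs. 14-16) is intuitive if one assumes per-frame processing cost grows roughly with the pixel count (our simplifying assumption, not a measurement from the paper): shrinking the tracking frames from 480 ∗ 360 to 120 ∗ 90 cuts the pixels per frame by a factor of 16.

```python
large = 480 * 360   # pixels in a localization (large) frame
small = 120 * 90    # pixels in a tracking (small) frame
print(large // small)   # 16: each tracking frame has 16x fewer pixels
```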
In addition, we measure the power consumption of the Samsung GT-I9100 smartphone in the following states: (1) idle with the screen on; (2) writing an email; (3) keeping the camera in preview mode (frame rate 15fps); (4) running CamK (frame rate 15fps) for text entry. The power consumption in these states is 516mW, 1189mW, 1872mW, and 2245mW, respectively. The power consumption of CamK is a little high. Yet as a new
technique, the power consumption is acceptable. In the future, we will try to reduce the energy cost.

[Fig. 9. Confusion matrix of the 59 keys. Fig. 10. Three scenarios. Fig. 11. Accuracy/false positive vs. frames in a keystroke. Fig. 12. Duration for a keystroke. Fig. 13. Accuracy/false positive vs. image sizes. Fig. 14. Processing time/power vs. image sizes. Fig. 15. Accuracy/false positive by changing sizes of small images. Fig. 16. Processing time/power by changing sizes of small images. Fig. 17. Input speed with regular text input. Fig. 18. Error rate with regular text input. Fig. 19. Input speed with random character input. Fig. 20. Error rate with random character input.]

F. User study

In order to evaluate the usability of CamK in practice, we invite 9 users to test CamK in different environments. We use the input speed and the error rate pe = (1 − pa) + pf as metrics. Each user tests CamK by typing regular text sentences and random characters. We compare CamK with the following three input methods: typing on an IBM-style PC keyboard, typing on Google's Android on-screen keyboard, and typing on the Swype keyboard [19], which allows the user to slide a finger across the keys and uses a language model to guess the word. For each input method, the user has ten minutes to get familiar with the keyboard before using it.

1) Regular text input: Fig. 17 shows the input speed of each user for regular text input. Each user achieves the highest input speed with the PC keyboard. This is because the user can locate the keys on a physical keyboard by touch, while the user tends to look at the paper keyboard to find a key. CamK achieves a 1.25X typing speedup compared to the on-screen keyboard; with CamK, the user can type 1.5-2.5 characters per second. Compared with UbiK [10], which requires the user to type with the fingernail (which is not typical), CamK improves the input speed by about 20%. Fig. 18 shows the error rate of each method. Although CamK is relatively more erroneous than the other methods, as a new technique its error rate is comparable and tolerable. Usually, the error rate of CamK is between 5% and 9%, which is comparable to that of UbiK (about 4%-8%).

2) Random character input: Fig. 19 shows the input speed of each user for random character input, which contains a lot of digits and punctuation. The input speed of CamK is comparable to that of a PC keyboard. CamK achieves a 2.5X typing speedup compared to the on-screen keyboard and Swype.
This is because the latter two keyboards need to switch between different screens to find letters, digits, and punctuation. For random character input, UbiK [10] achieves a 2X typing speedup compared to on-screen keyboards; therefore, our solution improves the input speed more than UbiK does. Fig. 20 shows the error rate of each method. Due to the randomness of the characters, the error rate increases, especially for typing with the on-screen keyboard and Swype. The error rate of CamK does not increase much, because the user can input the characters just as he/she does on a PC keyboard. The error rate of CamK (6%-10%) is comparable to that of UbiK [10] (about 4%-10%).
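The error-rate metric used throughout the user study combines the two rates defined in Section V-B. A small sketch of how the three quantities fit together; the sample counts below are illustrative, not measured values from the paper:

```python
def error_rate(n_correct, n_false, n_keystrokes):
    """Error rate pe = (1 - pa) + pf, with accuracy pa = Na/Nk
    and false positive rate pf = min(Nf/Nk, 1)."""
    pa = n_correct / n_keystrokes
    pf = min(n_false / n_keystrokes, 1.0)
    return (1.0 - pa) + pf

# Illustrative numbers: 475 of 500 keystrokes localized correctly and
# 24 false detections give pa = 0.95, pf = 0.048, pe = 0.098.
print(round(error_rate(475, 24, 500), 3))  # 0.098
```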
VI. RELATED WORK

Due to the small sizes of mobile devices, existing research has focused on redesigning visual keyboards for text entry, such as wearable keyboards, modified on-screen keyboards, projection keyboards, camera-based keyboards, and so on.

Wearable keyboards: Among the wearable keyboards, FingerRing [1] puts a ring on each finger and uses an accelerometer to detect the finger's movement and produce a character. Similarly, Samsung's Scurry [20] works with tiny gyroscopes. The Thumbcode method [21] and the finger-joint keypad [22] work with a glove equipped with a pressure sensor for each finger. The Senseboard [2] consists of two rubber pads which slip onto the user's hands; it senses the movements in the palm to get keystrokes.

Modified on-screen keyboards: Among the modified on-screen keyboards, BigKey [3] and ZoomBoard [4] adaptively change the size of keys. ContextType [23] leverages information about a user's hand posture to improve mobile touch screen text entry. Considering the use of multiple fingers, the Sandwich keyboard [24] affords ten-finger touch typing by utilizing a touch sensor on the back side of a device.

Projection keyboards: Considering the advantages of the current QWERTY keyboard layout, projection keyboards have been proposed. However, they either need a visible light projector to cast a keyboard [5], [6], [7], or use an infrared projector to produce a keyboard [8], [9]. They use optical ranging or image recognition methods to identify the keystroke.

Camera-based keyboards: Camera-based visual keyboards do not need additional hardware. In [11], the system gets the input by recognizing the gestures of the user's fingers; it needs users to remember the mapping between the keys and the fingers. In [12], the visual keyboard is printed on a piece of paper; the user can only use one finger and needs to wait for one second before each keystroke. Similarly, the iPhone app Paper Keyboard [25] only allows the user to use one finger of a hand.
In [13], the system detects the keystroke based on shadow analysis, which is easily affected by light conditions. In addition, Wang et al. [10] propose UbiK, which leverages the microphone on a mobile device to localize keystrokes. However, it requires the user to click the key with the fingertip and nail margin, which is not typical.

VII. CONCLUSION

In this paper, we propose CamK for inputting text into small mobile devices. By using image processing techniques, CamK can achieve above 95% accuracy for keystroke localization, with only 4.8% false positive keystrokes. Based on our experimental results, CamK can achieve a 1.25X typing speedup for regular text input and a 2.5X speedup for random character input, when compared to on-screen keyboards.

ACKNOWLEDGMENT

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61472185, 61373129, 61321491, 91218302, 61502224; JiangSu Natural Science Foundation, No. BK20151390; Key Project of Jiangsu Research Program under Grant No. BE2013116; EU FP7 IRSES MobileCloud Project under Grant No. 612212; CCF-Tencent Open Fund; and China Postdoctoral Science Fund under Grant No. 2015M570434. This work is partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization. This work was also supported in part by US National Science Foundation grants CNS-1320453 and CNS-1117412.

REFERENCES

[1] M. Fukumoto and Y. Tonomura, "Body coupled fingerring: wireless wearable keyboard," in Proc. of ACM CHI, 1997.
[2] M. Kolsch and M. Turk, "Keyboards without keyboards: A survey of virtual keyboards," University of California, Santa Barbara, Tech. Rep., 2002.
[3] K. A. Faraj, M. Mojahid, and N. Vigouroux, "Bigkey: A virtual keyboard for mobile devices," Human-Computer Interaction, vol. 5612, pp. 3–10, 2009.
[4] S. Oney, C. Harrison, A. Ogan, and J. Wiese, "Zoomboard: A diminutive qwerty soft keyboard using iterative zooming for ultra-small devices," in Proc. of ACM CHI, 2013.
[5] H. Du, T.
Oggier, F. Lustenberger, and E. Charbon, "A virtual keyboard based on true-3d optical ranging," in Proc. of the British Machine Vision Conference, 2005.
[6] M. Lee and W. Woo, "Arkb: 3d vision-based augmented reality keyboard," in Proc. of ICAT, 2003.
[7] C. Harrison, H. Benko, and A. D. Wilson, "Omnitouch: Wearable multitouch interaction everywhere," in Proc. of ACM UIST, 2011.
[8] J. Mantyjarvi, J. Koivumaki, and P. Vuori, "Keystroke recognition for virtual keyboard," in Proc. of IEEE ICME, 2002.
[9] H. Roeber, J. Bacus, and C. Tomasi, "Typing in thin air: The canesta projection keyboard - a new method of interaction with electronic devices," in Proc. of ACM CHI EA, 2003.
[10] J. Wang, K. Zhao, X. Zhang, and C. Peng, "Ubiquitous keyboard for small mobile devices: Harnessing multipath fading for fine-grained keystroke localization," in Proc. of ACM MobiSys, 2014.
[11] T. Murase, A. Moteki, N. Ozawa, N. Hara, T. Nakai, and K. Fujimoto, "Gesture keyboard requiring only one camera," in Proc. of ACM UIST, 2011.
[12] Z. Zhang, Y. Wu, Y. Shan, and S. Shafer, "Visual panel: Virtual mouse, keyboard and 3d controller with an ordinary piece of paper," in Proc. of ACM PUI, 2001.
[13] Y. Adajania, J. Gosalia, A. Kanade, H. Mehta, and N. Shekokar, "Virtual keyboard using shadow analysis," in Proc. of ICETET, 2010.
[14] R. Biswas and J. Sil, "An improved canny edge detection algorithm based on type-2 fuzzy sets," Procedia Technology, vol. 4, pp. 820–824, 2012.
[15] S. A. Naji, R. Zainuddin, and H. A. Jalab, "Skin segmentation based on multi pixel color clustering models," Digital Signal Processing, vol. 22, no. 6, pp. 933–940, 2012.
[16] "Otsu's method," https://en.wikipedia.org/wiki/Otsu%27s_method, 2015.
[17] "Opencv library," http://opencv.org/, 2015.
[18] "Monsoon power monitor," http://www.msoon.com/, 2015.
[19] "Swype," http://www.swype.com/, 2015.
[20] Y. S. Kim, B. S. Soh, and S.-G.
Lee, "A new wearable input device: Scurry," IEEE Transactions on Industrial Electronics, vol. 52, no. 6, pp. 1490–1499, December 2005.
[21] V. R. Pratt, "Thumbcode: A device-independent digital sign language," http://boole.stanford.edu/thumbcode/, 1998.
[22] M. Goldstein and D. Chincholle, "The finger-joint gesture wearable keypad," in Second Workshop on Human Computer Interaction with Mobile Devices, 1999.
[23] M. Goel, A. Jansen, T. Mandel, S. N. Patel, and J. O. Wobbrock, "Contexttype: Using hand posture information to improve mobile touch screen text entry," in Proc. of ACM CHI, 2013.
[24] O. Schoenleben and A. Oulasvirta, "Sandwich keyboard: Fast ten-finger typing on a mobile device with adaptive touch sensing on the back side," in Proc. of ACM MobileHCI, 2013, pp. 175–178.
[25] "iphone app: Paper keyboard," http://augmentedappstudio.com/support.html, 2015.