Fig. 1 illustrates a typical scenario where a user wears a head-mounted camera device (e.g., smart glasses), while a standard keyboard layout can be printed on a piece of paper or drawn on a desk surface. DynaKey combines the embedded camera and gyroscope to track finger movements and recognize keystrokes in real time. Specifically, while the user types on the virtual keyboard, DynaKey uses the camera to capture image frames continuously, then detects fingertips and locates keystrokes using image processing techniques. During the typing process, when head movement is detected by the gyroscope, DynaKey needs to track the changes of the keyboard coordinates caused by camera movements. This keyboard tracking is crucial because of the natural head movements that occur in real application scenarios.

The design of DynaKey raises three key challenges that we aim to address in this paper.

The first challenge is how to track changes of the keyboard's coordinates accurately so that DynaKey can adapt to dynamic moving scenes. In reality, the camera moves naturally along with the head. Such movements cause dynamic changes of the camera coordinate system. The different camera views and unavoidable image distortion eventually result in changes of the keyboard's coordinates in image frames. An intuitive solution is to re-extract the keyboard layout from each image, but this is costly. In addition, we may not be able to extract the keyboard layout from every image properly due to unavoidable occlusion by the hands. Our idea starts from a fundamental question: can we build a fixed coordinate system no matter how the keyboard's coordinates change? In DynaKey, we propose a Perspective Transformation-based technique that converts any previous coordinates into the current coordinate system. To obtain appropriate feature point pairs for this transformation, we propose a keypoint selection method that dynamically selects suitable cross-point pairs from the keyboard layout while tolerating occlusion of the keyboard.
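To make the idea concrete, the following sketch shows how a perspective transformation could map stored key coordinates between camera views using OpenCV. This is a minimal illustration, not the paper's implementation: the function name, the point formats, and the RANSAC threshold are our own assumptions; we assume at least four keyboard cross points have already been matched between a reference frame and the current frame.

```python
import cv2
import numpy as np

def track_key_coordinates(ref_points, cur_points, key_boxes):
    """Map key coordinates from a reference frame into the current frame.

    ref_points, cur_points: N x 2 arrays (N >= 4) of matched keyboard
    cross points detected in the reference and current frames.
    key_boxes: M x 4 x 2 array of key corner coordinates expressed in
    the reference frame's coordinate system.
    """
    ref = np.asarray(ref_points, dtype=np.float32)
    cur = np.asarray(cur_points, dtype=np.float32)
    # Estimate the 3x3 perspective (homography) matrix. RANSAC discards
    # mismatched point pairs, e.g., cross points disturbed by occlusion.
    H, _mask = cv2.findHomography(ref, cur, cv2.RANSAC, 5.0)
    # Warp every key corner into the current camera view.
    corners = np.asarray(key_boxes, dtype=np.float32).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H)
    return warped.reshape(-1, 4, 2)
```

With this mapping, the keyboard layout only needs to be fully extracted once; afterwards, each new view requires only the matched cross points rather than a complete re-extraction.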
The second challenge is how to detect and locate keystrokes efficiently and accurately from a single camera view. This is a non-trivial task due to the lack of depth information of fingertips in a single camera view. With a head-mounted camera and a keyboard located in front of and below the camera, the view from above and behind can hardly capture the perpendicular distance between a fingertip and the keyboard plane, i.e., it is difficult to determine whether a finger is typing and which finger is typing. To address this challenge, we utilize the variation of a fingertip's coordinates across multiple frames to detect a keystroke, i.e., whether a finger is typing. In addition to the fingertip movement, we further match a key's coordinates with the fingertip's coordinates to locate which finger is typing.
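One plausible reading of this multi-frame test is sketched below: a keystroke appears as a noticeable fingertip displacement followed by a few nearly still frames while the finger rests on the key, after which the fingertip position is matched against key regions. The thresholds, frame counts, and axis-aligned key boxes are placeholder assumptions for illustration only.

```python
import numpy as np

def detect_keystroke(tip_trajectory, move_thresh=4.0, still_frames=3):
    """Decide whether a fingertip trajectory looks like a keystroke.

    tip_trajectory: list of (x, y) image coordinates of one fingertip
    over consecutive frames. We assume a keystroke shows up as motion
    followed by several nearly still frames (the press and dwell).
    """
    pts = np.asarray(tip_trajectory, dtype=np.float32)
    if len(pts) < still_frames + 2:
        return False
    # Per-frame displacement of the fingertip in image coordinates.
    step = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    moved = step[:-still_frames].max() > move_thresh
    settled = step[-still_frames:].max() < move_thresh * 0.25
    return bool(moved and settled)

def locate_key(tip, key_boxes, labels):
    """Return the label of the key whose box contains the fingertip."""
    x, y = tip
    for (x0, y0, x1, y1), label in zip(key_boxes, labels):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return label
    return None
```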
The third challenge is how to trade off between dynamic keyboard tracking and tracking cost on resource-constrained devices. If the camera does not move, or moves only negligibly, tracking the keyboard's coordinates is unnecessary. To achieve the best trade-off for resource-constrained head-mounted devices, we introduce a lightweight gyroscope-based method to detect non-negligible camera movements, including short-time sharp movements and long-time accumulated micro movements. Only detected non-negligible camera movements trigger the keyboard tracking module, ensuring that DynaKey works dynamically in real time.
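A minimal sketch of such a detector, covering the two movement types named above, could look as follows. The trigger conditions (an instantaneous angular rate threshold for sharp movements, and an integrated rotation angle for accumulated micro movements) and all threshold values are our own assumptions, not the paper's design parameters.

```python
class GyroMovementDetector:
    """Flag non-negligible camera movement from gyroscope readings."""

    def __init__(self, rate_thresh=0.5, angle_thresh=0.1):
        self.rate_thresh = rate_thresh    # rad/s, placeholder value
        self.angle_thresh = angle_thresh  # rad, placeholder value
        self.accum = 0.0                  # rotation since last tracking

    def update(self, wx, wy, wz, dt):
        """Feed one gyroscope sample; return True to trigger tracking.

        wx, wy, wz: angular rates (rad/s) around the three axes;
        dt: time since the previous sample (s). The rate magnitude is
        integrated as a conservative proxy for accumulated rotation.
        """
        rate = (wx * wx + wy * wy + wz * wz) ** 0.5
        self.accum += rate * dt
        if rate > self.rate_thresh or self.accum > self.angle_thresh:
            self.accum = 0.0  # keyboard tracking re-anchors coordinates
            return True
        return False
```

Because each update is a few arithmetic operations on sensor samples, this check costs essentially nothing compared to per-frame image processing, which is the point of the trade-off.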
In summary, we make three main contributions in this paper. 1) To the best of our knowledge, this paper appears to be the first work focusing on efficient text input using the built-in camera of a head-mounted device (e.g., smart glasses) in dynamic moving scenes. To adapt to the dynamic camera views, we propose a Perspective Transformation-based technique to track the changes of the keyboard's coordinates. Besides, lacking depth information of fingertips in a single camera view, we utilize the variation of a fingertip's coordinates across multiple frames for keystroke detection. 2) To ensure real-time response, DynaKey adopts a lightweight gyroscope-based design to adaptively detect camera movement and remove unnecessary image processing for keyboard tracking. Besides, we introduce a series of optimizations for image processing, such as keypoint selection, frame skipping, and multi-thread processing. 3) We implement DynaKey on off-the-shelf Android devices and conduct comprehensive experiments to evaluate its performance. Results show that the average tracking deviation of the keyboard layout is less than 3 pixels, and the intersection over union (IoU) [25] of a key in two consecutive images is above 93%. The accuracy of keystroke localization reaches 95.5% on average. The response time is 63 ms, a latency below human response time [23].

II. RELATED WORK

Virtual keyboards have been used as an alternative to on-screen keyboards [1], [2] to support text input for mobile or wearable devices with small or no screens. These virtual keyboards can be mainly classified into five categories, i.e., wearable sensor-based, projection-based, WiFi-based, acoustic-based, and camera-based keyboards.

Wearable sensor-based keyboards: Wearable sensors have been used to capture the movements of fingers for text input. iKey [4] utilizes a wrist-worn piezoelectric ceramic sensor to recognize keystrokes on the back of the hand. DigiTouch [5] introduces a glove-based input device that enables thumb-to-finger touch interaction by sensing touch position and pressure. MagBoard [3] leverages the triaxial magnetometer embedded in mobile phones to locate a magnet on a printed keyboard. FingerSound [6] utilizes a thumb-mounted ring, consisting of a microphone and a gyroscope, to recognize unistroke thumb gestures for text input. All of these approaches introduce additional hardware to capture typing behaviors.

Projection-based keyboards: Projection keyboards [24], [27] have been proposed for mobile devices, adopting a conventional QWERTY keyboard layout. They usually require a light projector to cast a keyboard layout onto a flat surface, and then recognize keystrokes based on light reflection. This approach requires dedicated equipment.
Microsoft HoloLens [13] provides a projection keyboard in front of the user via a pair of mixed-reality smart glasses. During text input, the user needs to move her/his head to pick a key and then make a specific 'tap' gesture to select the character. This tedious process may slow down text input and degrade the user experience.

WiFi-based keyboards: By utilizing the unique pattern of channel state information (CSI) in time series, WiFinger [19]