Multi-Touch in the Air: Device-Free Finger Tracking and Gesture Recognition via COTS RFID

Chuyu Wang†, Jian Liu‡, Yingying Chen‡, Hongbo Liu*, Lei Xie†, Wei Wang†, Bingbing He†, Sanglu Lu†
†State Key Laboratory for Novel Software Technology, Nanjing University, China
Email: {wangcyu217, hebb}@dislab.nju.edu.cn, {lxie, ww, sanglu}@nju.edu.cn
‡WINLAB, Rutgers University, New Brunswick, NJ, USA
Email: jianliu@winlab.rutgers.edu, yingche@scarletmail.rutgers.edu
*Indiana University-Purdue University, Indianapolis, IN, USA
Email: hl45@iupui.edu

Abstract—Recently, gesture recognition has gained considerable attention in emerging applications (e.g., AR/VR systems) to provide a better user experience for human-computer interaction. Existing solutions usually recognize gestures based on wearable sensors or specialized signals (e.g., WiFi, acoustic and visible light), but they either incur high energy consumption or are susceptible to the ambient environment, which prevents them from efficiently sensing fine-grained finger movements. In this paper, we present RF-finger, a device-free system based on Commercial-Off-The-Shelf (COTS) RFID, which leverages a tag array on a letter-size paper to sense the fine-grained finger movements performed in front of the paper. Particularly, we focus on two kinds of sensing modes: finger tracking recovers the moving trace of finger writings; multi-touch gesture recognition identifies the multi-touch gestures involving multiple fingers. Specifically, we build a theoretical model to extract the fine-grained reflection feature from the raw RF-signal, which describes the finger influence on the tag array at cm-level resolution. For finger tracking, we leverage K-Nearest Neighbors (KNN) to pinpoint the finger position relying on the fine-grained reflection features, and obtain a smoothed trace via a Kalman filter. Additionally, we construct the reflection image of each multi-touch gesture from the reflection features by regarding the multiple fingers as a whole. Finally, we use a Convolutional Neural Network (CNN) to identify the multi-touch gestures based on the images. Extensive experiments validate that RF-finger can achieve as high as 88% and 92% accuracy for finger tracking and multi-touch gesture recognition, respectively.

I. INTRODUCTION

With the flourishing of ubiquitous sensing techniques, human-computer interaction is undergoing a reform: natural human gestures, e.g., finger movements in the air, are progressively replacing traditional typing-based input devices such as keyboards to provide a better user experience. Such gesture-based interactions have promoted the development of both Virtual Reality (VR) and Augmented Reality (AR) systems, where users can directly control virtual objects by performing gestures in the air, e.g., writing words, manipulating a tellurion or playing VR games. Furthermore, gesture-based interaction can enable operations on smart devices in Internet-of-Things (IoT) environments, e.g., withdrawing the curtains or controlling smart TVs.

Yingying Chen and Lei Xie are the co-corresponding authors.

Fig. 1. Illustrations of application of RF-finger: tracking the finger on smart devices, recognizing multi-touch gestures, and manipulating in VR gaming via an RFID tag array.
Therefore, accurately recognizing gestures in the air, especially fine-grained finger movements, has a great potential to provide a better user experience in emerging VR applications and IoT manipulations, a market projected to reach USD 48.56 billion by 2024 [2].

Existing gesture recognition solutions can be divided into two categories: (i) Device-based approaches usually require the user to wear sensors, e.g., an RFID tag or a smartwatch, and track the motion of the sensors to recognize the gestures [15, 17]. These studies usually derive the gestures by building theoretical models to depict the signal changes received from the sensors. However, device-based approaches suffer either from an uncomfortable user experience (e.g., attaching an RFID tag to the finger) or from short life cycles due to high energy consumption. (ii) Device-free approaches recognize the gestures from ambient signals through different kinds of techniques without requiring the user to wear any devices. As the most popular solutions, camera-based systems, such as Kinect and LeapMotion, construct the body or finger structure from video streams for accurate gesture recognition. Nevertheless, they usually involve high computation and may raise privacy concerns for users. More recent works try to recognize gestures based on WiFi [16], acoustic signals [18] and visible light [9]. However, these solutions are either easily affected by environmental noise or incapable of sensing fine-grained gestures at the finger level. In this work, we are in search of a new device-free mechanism that can recognize finger-level gestures to facilitate the growing
VR applications and IoT operations.

Fig. 2. Preliminary study of the RF signal reflection: (a) experiment setup; (b) received signal (phase and RSSI) of vertical movement; (c) received signal of horizontal movement.

The recent advances demonstrate that the emerging RFID technology not only can sense the status of objects with device-based solutions [7, 10–12, 20], but also has the potential to provide device-free sensing by leveraging the multi-path effect [4, 21]. In this work, we present RF-finger, a device-free system based on an RFID tag array, to sense fine-grained finger movements. Unlike previous studies, which either locate the human body in a coarse-grained manner [21] or simply detect a single stroke from the hand movement for letter recognition [4], RF-finger focuses on tracking the finger trace and recognizing multi-touch gestures, which involves a smaller tracking subject and more complicated multi-touch gestures than existing problems. As shown in Figure 1, by leveraging the tag array attached on a letter-size paper, RF-finger seeks to support different applications including writing, multi-touch operations, gaming, etc.

Specifically, we deploy only one RFID antenna behind the tag array to continuously measure the signals emitted from the tag array, and recognize the gestures based on the corresponding signal changes. In designing the RF-finger system, we need to solve three main challenging problems. i) How to track the trajectory of the finger writings? Since the finger usually affects several adjacent tags due to the multi-path effect, it is inaccurate to simply take a tag's position as the finger location. In our work, we theoretically model the impact of the moving finger on the tag array to extract the reflection features, and then exploit the reflection features to pinpoint the finger with cm-level resolution. ii) How to recognize the multi-touch gesture? In a multi-touch gesture, the RF-signals reflected from multiple fingers are mixed together at the tag array, making it even more difficult to distinguish these fingers for gesture recognition. To address this problem, we regard the multiple fingers as a whole for recognition and then extract the reflection feature of the multiple fingers as images. We then leverage a Convolutional Neural Network (CNN) to automatically classify the corresponding gestures from the image features. iii) How to obtain stable signal quality from the tag array? In real RFID systems, misreading is a common phenomenon due to the dynamic environments that affect the signal quality, especially when reading multiple tags simultaneously, such as a tag array. To address this problem, we utilize a signal model to depict the mutual interference between tags, which provides recommendations on tag deployment that re-arrange the adjacent tags in a perpendicular way to reduce the interference.

The contributions of RF-finger are summarized as follows: i) We design a new device-free solution based on Commercial-Off-The-Shelf (COTS) RFID for both finger tracking and multi-touch gesture recognition. To the best of our knowledge, we are the first to recognize multi-touch gestures with an RFID system through a device-free approach.
ii) We build a theoretical model to depict the reflection relationship between the tag array and the fingers caused by the multi-path effect. The theoretical model provides guidelines to develop two algorithms to track the finger trajectories and recognize the multi-touch gestures. iii) We experimentally investigate the impact of tag array deployment on the signal quality. We analyze the mutual interference between tags via a signal model and provide recommendations on tag deployment to reduce the interference. iv) We implement a system prototype, RF-finger, for finger tracking and gesture recognition. Experiments show that RF-finger can achieve an average accuracy of 88% and 92% for finger tracking and gesture recognition, respectively.

II. PRELIMINARIES & CHALLENGES

In order to design a system to track fine-grained finger movements, we first conduct several preliminary studies on the impact of finger movement on the RF-signals, and the feasibility of using an RFID tag array for gesture recognition. Based on the observations, we summarize three challenges for designing our system.

A. Preliminaries

Impact of Finger Movement on RF-Signals. RFID techniques have been widely used in localization and sensing systems based on the physical modalities of the RF-signal [20], i.e., phase and Received Signal Strength Indicator (RSSI). Moreover, when a human moves around the tag, both the phase and RSSI change accordingly due to the multi-path environment variance [21]. Therefore, we first investigate the impact of finger movement on RF-signals, since a finger is much smaller than the human body. As shown in Figure 2(a), a typical finger movement can be decomposed into two basic directions: horizontal movement (i.e., swiping in front of the tag) and vertical movement (i.e., approaching/departing the tag). Hence, we conduct two experiments to investigate the influence of these two finger movements. Figure 2(b) presents the signal's phase and RSSI readings when the finger is moving towards the tag (i.e., vertically) from 20cm away. We find that both the phase and RSSI readings change in a wavy pattern, and
the peak-to-peak amplitude [1] increases slowly with the approaching finger. This indicates that the approaching finger leads to a larger reflection effect.

Fig. 3. Preliminary study of the tag array deployment: (a) universal deployment; (b) RSSI distribution (dBm) of the universal deployment.

Additionally, when we swipe the fingers 40cm along the horizontal direction as shown in Figure 2(a), we observe a similar phenomenon in Figure 2(c). The peak-to-peak amplitude first increases and then decreases as the fingers swipe across the tag. The results indicate that the peak-to-peak amplitude correlates with the distance between the finger and the tag, which is analyzed later in Section III. Since the peak-to-peak amplitude indicates the linear distance between finger and tag, we can deploy a tag array to track the moving finger.

Signal Interference within a Tag Array. When we deploy the tag array to capture the finger movement, the density of the array is a fundamental factor in determining the granularity of the sensed gestures. For example, a sparse tag array can only recognize coarse-grained strokes based on the detected tags affected by the whole hand [4]. Therefore, to recognize finger-level gestures, we should exploit a dense tag array deployment to provide better recognition capability.

In this work, we use the small RFID tag AZ-9629, whose size is only 2.25cm × 2.25cm, so that the tags can be arranged tightly. Specifically, we deploy a 5 × 7 tag array in a 15cm × 21cm rectangular space, where each tag only occupies a 3cm × 3cm cell. A simple deployment is to universally deploy all tags with the same orientation as shown in Figure 3(a). Under this deployment, Figure 3(b) shows the RSSI distribution of the 35 tags in the unit of dBm when there is no finger around. We observe that the RSSI readings vary greatly across different tags due to the electromagnetic induction between the dense tags [8]. In particular, larger RSSI values are captured from the marginal tags than from the tags in the center. Therefore, a new deployment is proposed in Section IV-B to provide stable and uniform RF-signals.

B. Challenges

To develop the finger-level gesture tracking system under realistic settings, a number of challenges need to be addressed.

Tracking Fine-grained Finger-writing. Given the 3cm × 3cm area of each tag, detecting the most significantly disturbed tag can only achieve a coarse-grained resolution of the finger moving trace. Moreover, the dense tag deployment may also lead to detection errors due to mutual tag interference, as shown in Figure 3(b). Therefore, we should have an in-depth understanding of the signals from the tag array during the finger movement and then develop the system to track the finger trace in fine granularity.

Fig. 4. Reflection model of a tag: the received signal in the I-Q plane is the sum of the free-space signal from the RFID antenna and the reflection signal from the swiping finger.

Recognizing Multi-touch Gestures. Unlike finger-writing, a multi-touch gesture means that several parts of the tag array are affected by different fingers. However, the distance between adjacent fingers is similar to the size of the tag, and a finger may affect the tags even when it is 10cm away, as shown in Figure 2(b) and Figure 2(c).
Hence, it is difficult to distinguish these fingers from the coarse-grained tag information. To address the challenge, we treat multiple fingers as a whole without distinguishing each finger and design a novel solution to recognize the multi-touch gestures from the combined influence of the multiple fingers.

Reducing the Mutual Interference of the Tag Array. The received signal of an RFID tag can be easily affected by the adjacent tags, as shown in Figure 3(b). Since such interference may lead to large tracking errors, we need to find a way to obtain uniform signals across all tags by reducing the mutual interference within the tag array.

III. MODELING FINGER TRACKING UPON A TAG ARRAY

In this section, we introduce the reflection effect of the RFID tag array with a theoretical wireless model. Particularly, we start from the reflection of a single tag, which explains the experimental results in Section II and shows how to extract the reflection feature in our system. Then, we move forward to the reflection of a tag array, which integrates the reflection features of nearby tags to facilitate the perception of the fine-grained finger movement and the multi-touch gestures.

A. Impact of Finger Movement on a Single Tag

The signal received from the tag is typically represented as a stream of complex numbers. In theory, it can be expressed as:

S = X · S_h, (1)

where X is the stream of binary bits modulated by the tag, and S_h = αe^{jθ} is the channel parameter of the received signal. In an RFID system, we can obtain the channel-related information, including both the RSS in the unit of dBm as R and the phase value as θ, thus the channel parameter S_h can be calculated as:

S_h = √(10^{R/10}/1000) · e^{jθ} = √(10^{R/10−3}) · e^{jθ}. (2)

Figure 4 illustrates the reflections in an RFID system with a simple case, where the finger swipes across a tag. Besides the free-space signal directly sent from the RFID antenna, the tag also receives the signal reflected by the moving finger. In the corresponding I-Q plane, the two received signals can be
represented as S_free and S_reflect, respectively. Therefore, the actual signal received by the reader can be represented as:

S_actual = S_free + S_reflect. (3)

Here, the finger movement affects S_reflect due to the change of the reflection path, thus both the RSS and phase of S_actual vary accordingly. In order to track the finger movements, we need to separate S_reflect from the received signal to roughly describe the distance of the reflection path. Specifically, we can estimate S_reflect by subtracting S_free from S_actual, where S_free can be measured without the reflecting object.

B. Impact of Finger Movement on a Tag Array

The single-tag model depicts the signal change on one tag caused by the finger movement, but the tag array involves multiple tags, meaning the finger affects several adjacent tags at the same time. To better understand the signals reflected from the finger, we derive the theoretical model of the tag array as follows. In Figure 5(a), we use a one-dimensional tag array to illustrate the finger impact on the tag array for simplicity. Specifically, the antenna A interrogates six tags T_1 to T_6, while the finger H hovers above the tag array.

Fig. 5. Reflection model of a tag array: (a) reflections from the antenna A via the finger H to the tags T_1 to T_6; (b) reflection RSS of the tag array.

According to the single-tag model, we can derive the reflection signal S_reflect for each tag. Additionally, S_reflect can be further divided into two parts based on the reflection path in Figure 5(a):

S_reflect = S_{A→H} · S_{H→T_i}, (4)

where S_{A→H} represents the signal from A to H, and S_{H→T_i} represents the signal reflected from H to T_i, which varies based on the tag's position. In an ideal channel model [5], S_{H→T_i} is defined as:

S_{H→T_i} = (1/d²_{HT_i}) · e^{jθ_{HT_i}}, (5)

where d_{HT_i} is the distance between H and T_i, and θ_{HT_i} is the phase shift over the distance d_{HT_i}. Formally, the phase shift can be calculated from the wavelength λ as:

θ_{HT_i} = 2π (d_{HT_i}/λ) mod 2π. (6)

For each tag T_i, we can combine Eq. (4) and Eq. (5) to calculate the power of S_reflect [5] as:

P_reflect = |S_reflect|² = C · (1/d⁴_{HT_i}), (7)

where |·| denotes the modulus of the complex parameter to get the power, and C = |S_{A→H}|² is a constant power. Therefore, the magnitude of P_reflect is determined by d_{HT_i}, meaning the finger induces larger reflection power on the closer tags. Given the position of H, we can calculate the distribution of the reflection power P_reflect in the 2D space from Eq. (7).

Fig. 6. System framework: the signal stream s_i(t) passes through Signal Preprocessing (data calibration, segmentation) and Reflection Feature Extraction, and then feeds Finger Tracking (KNN-based localization, Kalman-based trace smoothing) or Multi-touch Recognition (correlation-based image construction, CNN-based gesture recognition with a CNN-based training model).

Figure 5(b) illustrates the case where the finger is at the (0, 0) coordinate with 3cm height, and a tag is assumed to be deployed in each 1cm × 1cm grid. We set C to 1 for simplicity in this figure. We note that the power highly concentrates at the position of the finger. Therefore, we can use the theoretical power distribution as a pattern to estimate the finger position from the measured power distribution of the whole tag array. By computing the theoretical power distribution in a fine-resolution manner, we are able to refine the recognition resolution of the tag array with the correlation-based interpolation.
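To make the reflection model concrete, the following minimal Python sketch (our illustration, not the authors' released code) converts an (RSSI, phase) reading into the complex channel parameter of Eq. (2), isolates the reflection component as in Eq. (3), and evaluates the theoretical phase shift and reflection power of Eqs. (5)-(7) for a hypothesized finger position; the function names, the UHF wavelength of roughly 0.33m, and C = 1 are our assumptions.

```python
import numpy as np

def channel_parameter(rssi_dbm, phase_rad):
    """Complex channel parameter S_h from an RSSI (dBm) and phase reading, Eq. (2)."""
    amplitude = np.sqrt(10 ** (rssi_dbm / 10) / 1000)  # dBm -> watts -> amplitude
    return amplitude * np.exp(1j * phase_rad)

def reflection_signal(s_actual, s_free):
    """Reflection component S_reflect = S_actual - S_free, Eq. (3).
    s_free is measured beforehand, without the reflecting finger."""
    return s_actual - s_free

def theoretical_reflection(finger_pos, tag_positions, wavelength=0.33, C=1.0):
    """Theoretical phase shift (Eq. 6) and reflection power (Eq. 7) at every
    tag for a hypothesized finger position (3D coordinates in meters)."""
    d = np.linalg.norm(tag_positions - finger_pos, axis=1)  # d_HTi for each tag
    theta = (2 * np.pi * d / wavelength) % (2 * np.pi)      # Eq. (6)
    power = C / d ** 4                                      # Eq. (7)
    return theta, power
```

Evaluating theoretical_reflection over a fine grid of candidate finger positions reproduces the concentrated power pattern of Figure 5(b).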
In Section V-B, we will show the effectiveness of the tag array model by extracting the reflection feature from the reflection power distribution.

IV. SYSTEM OVERVIEW

A. System Architecture

The major objective of our work is to recognize fine-grained finger gestures via a device-free approach. Towards this end, we design an RFID-based system, RF-finger, which captures the signal changes on the tag array for gesture recognition. As shown in Figure 6, RF-finger consists of four main components: two core modules, Signal Pre-processing and Reflection Feature Extraction, followed by two functionality modules, Finger Tracking and Multi-touch Recognition.

Specifically, RF-finger takes as input the time-series signal s_i(t) received from each tag i of the tag array, including both the RSSI and phase information. The Signal Pre-processing module first calibrates the measured signal by interpolating the misread samples and smoothing the signal. Next, we divide the smoothed signals into separate gestures by analyzing the signal variance of all tags, which accurately estimates the starting and ending points of a gesture. Then, the Reflection Feature Extraction module extracts the reflection features of the gesture based on our reflection model in Section III.

After extracting the reflection features from the RF signal, two functionality modules follow for finger tracking and multi-touch gesture recognition. For the finger-writing, the Finger Tracking module locates the finger from the reflection features at each time stamp based on the K-Nearest Neighbors (KNN) algorithm. Locations at consecutive time stamps are connected together and smoothed via a Kalman filter to obtain a fine-grained trace. For the multi-touch gestures,
the Multi-touch Recognition module leverages a Convolutional Neural Network (CNN) to automatically classify each gesture from the visual features. Particularly, it constructs a 3-frame image of the gesture from the reflection features, which describes the influence range of the multiple fingers in the starting/middle/ending periods of the gesture. Then we learn the neural network model from the 3-frame images for gesture classification. Finally, we can recognize the gestures by analyzing the classification scores produced by the CNN.

Fig. 7. Shuffled deployment of the dense tag array: (a) shuffled deployment; (b) RSSI distribution (dBm) of the shuffled deployment.

B. Dense Tag Array Deployment

As illustrated in Section II-A, we observe that the adjacent tags in the dense tag array have great impacts on the signal quality of other tags due to electromagnetic interference [8, 19]. The principle behind such influence is the electromagnetic interference between two tags [8]. As a result, tags deployed in parallel will affect the nearby tags due to the mutual interference. To eliminate such mutual interference, we shuffle the orientations of part of the tags as shown in Figure 7(a), making the nearby tags perpendicular to each other. In this way, we can minimize the interference between nearby tags by making the electromagnetic interference perpendicular to each tag. As a result, we can achieve a stable RSSI measurement across all tags, as shown in Figure 7(b). Therefore, we adopt the perpendicular deployment of the tag array in our system.

V. RF-FINGER SYSTEM DESIGN

In this section, we present the detailed design of the proposed RF-finger system. Specifically, we first preprocess the raw RF-signals and then extract the reflection features to depict the finger influence on the tag array. Finally, we track the finger trace and recognize the multi-touch gestures from these reflection features.

A. Signal Preprocessing

Given the received RF-signals, which involve some inherent measurement defects such as misread tags and noise, the data calibration process is developed to improve the reliability of the RF-signals by interpolating the misread samples and smoothing the signal. In an RFID system, misreadings are usually caused by the highly dynamic environment during the finger movement. Therefore, we can interpolate the misread RF-signals from adjacent sampling rounds based on the continuous movement of the finger. Take a phase stream θ(t) as an example, which is the time series of phase values from one tag. If there is a misread phase θ(t_i), we calculate the interpolated value from the adjacent phase readings as:

θ̂(t_i) = θ(t_{i−1}) + (θ(t_{i+1}) − θ(t_{i−1})) · (t_i − t_{i−1})/(t_{i+1} − t_{i−1}), (8)

where θ(t_{i−1}) and θ(t_{i+1}) are two adjacent phase readings before and after time t_i.

Fig. 8. Illustration of signal preprocessing in RF-finger: (a) signal calibration; (b) gesture segmentation.
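The interpolation of Eq. (8) is plain linear interpolation between the nearest valid readings; as a minimal sketch (the names and array layout are our assumptions), it can be implemented with NumPy as:

```python
import numpy as np

def interpolate_misreadings(t, theta, valid):
    """Fill misread phase samples by linear interpolation, Eq. (8).

    t:     sampling timestamps, shape (n,)
    theta: phase readings, shape (n,)
    valid: boolean mask, False where the tag was misread
    """
    # np.interp applies exactly the rule of Eq. (8): each misread time t_i is
    # interpolated between the nearest valid readings before and after it.
    return np.interp(t, t[valid], theta[valid])
```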
After interpolation, a moving average filter is applied to smooth the signal, which further removes high-frequency noise. Figure 8(a) illustrates the effectiveness of our data calibration by comparing the phase stream before and after calibration. The phase stream shown in the figure is from one tag in the array while the user is performing the right-rotate gesture. From the enlarged figure, we can clearly see that the misreadings are well interpolated. Moreover, after smoothing, the high-frequency serrated waves are removed.

To capture the signal pattern of a specific finger movement, we need to identify its starting and ending points, which correspond to the moments when the user raises the hand up and drops the hand down. Therefore, a segmentation method based on the variance of the calibrated RF-signals is developed to detect the actions of raising/releasing the hand and thereby segment gestures. Intuitively, we observe that the signal should be stable when the hand is dropped down, while the signals of some tags experience distinct variations when the user performs the gestures. Therefore, we further leverage a sliding window to calculate the variance stream of each tag from the calibrated RF-signals, and the starting/ending points should have large variance values. Figure 8(b) illustrates the variance streams of all the 35 tags, which take as input the calibrated phase streams. We find that only part of the tags have large signal variance at the same time, because the finger only affects several tags close to it. Thus, we continuously calculate the maximum variance across tags in each sliding window to form the maximum variance stream. Based on the first and last peaks of the maximum variance stream, we can detect the actions of raising/releasing the hand and then take the signal stream between them as the gesture signal.

B. Reflection Feature Extraction

After signal preprocessing, we have the segmented and denoised signal of each individual gesture, so we first leverage the reflection model in Section III-A to derive the reflection signal S_reflect of each tag. Then we extract the reflection features from S_reflect as the likelihood distribution inside the tag array zone, where the likelihood of each position depicts the probability that the finger is located at that position. Before defining the likelihood, we derive the reflection signal of each tag by removing the free-space signal as S_reflect = S_actual − S_free. Particularly, S_actual is collected during the gesture period and S_free is collected before the gesture.
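A minimal sketch tying these two steps together (the window length and threshold are assumed parameters, not values given in the paper): the variance-based segmentation yields the gesture boundaries, from which S_free and S_reflect are derived for each tag.

```python
import numpy as np

def segment_gesture(phases, win=20, threshold=0.5):
    """Variance-based segmentation: find the start/end of a gesture.

    phases: shape (n_tags, n_samples), calibrated phase stream per tag.
    """
    n_tags, n = phases.shape
    var = np.empty((n_tags, n - win))
    for i in range(n - win):                     # sliding-window variance per tag
        var[:, i] = phases[:, i:i + win].var(axis=1)
    max_var = var.max(axis=0)                    # maximum variance stream over all tags
    active = np.where(max_var > threshold)[0]    # windows with distinct variation
    return active[0], active[-1] + win           # raise-hand / release-hand peaks

def reflection_stream(signal, start, end):
    """S_reflect for one tag: S_free is estimated before the gesture and
    removed from S_actual collected during the gesture."""
    s_free = signal[:start].mean()               # free-space signal (no finger)
    return signal[start:end] - s_free            # S_reflect = S_actual - S_free
```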
[Fig. 9. Illustration of the extracted reflection features: (a) RSSI distribution of the reflection signal; (b) distribution of the reflection feature.]

Therefore, $S_{reflect}$ represents the reflection signal caused by the finger movements. Figure 9(a) illustrates the RSSI distribution of $S_{reflect}$ when the finger is at (10, 10). We find that the finger affects several tags around (10, 10), and adjacent tags may even exhibit the same RSSI value. The reason is that both the finger and the palm reflect the RF signal, so the reflected signals mix together.

We further use the reflection model of the tag array in Section III-B to extract the reflection features from $S_{reflect}$. Specifically, we partition the reflection range of our tag array into cm-level grids. Suppose the finger is directly above grid $(x, y)$; then we can derive the theoretical reflection power $P_i$ for each tag $i$ according to Eq. (7). Given the measured RSSI value $R_i$ for tag $i$, we define the likelihood $I_{x,y}$ of grid $(x, y)$ via the Pearson correlation coefficient [13] as:

$$I_{x,y} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{P_i-\mu_P}{\sigma_P}\right)\left(\frac{R_i-\mu_R}{\sigma_R}\right), \qquad (9)$$

which indicates the probability that the finger is located at $(x, y)$. Here, $N$ is the size of the tag array, and $\mu$ and $\sigma$ are the mean and standard deviation of $P$ and $R$, respectively.

All the probabilities $I_{x,y}$ form a new likelihood matrix, which serves as the reflection feature in our work. Figure 9(b) illustrates the reflection feature extracted from the RSSI distribution of $S_{reflect}$. We can observe a peak in the probability distribution around (10, 10), representing the estimated location range of the finger.
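To make Eq. (9) concrete, the minimal sketch below computes the likelihood matrix from one RSSI snapshot. It assumes the theoretical reflection powers of Eq. (7) have been precomputed into an array `P_grid` holding one power value per tag for every candidate grid; that name and the grid layout are illustrative, not from the paper.

```python
import numpy as np

def reflection_feature(R, P_grid):
    """Likelihood matrix of Eq. (9): for every candidate grid, the Pearson
    correlation between the measured reflection RSSI R (length-N vector,
    one value per tag) and the theoretical reflection power predicted by
    the tag-array model of Eq. (7)."""
    N = R.shape[0]
    R_z = (R - R.mean()) / R.std()          # z-score of the measurements
    H, W = P_grid.shape[:2]                 # candidate grid dimensions
    I = np.zeros((H, W))
    for x in range(H):
        for y in range(W):
            P = P_grid[x, y]                # theoretical power per tag
            P_z = (P - P.mean()) / P.std()  # z-score of the model
            I[x, y] = np.dot(P_z, R_z) / (N - 1)   # Eq. (9)
    return I
```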
C. Finger Trace Tracking

Based on the extracted reflection features, we next demonstrate how to track the finger writing by locating the finger continuously at each sampling round. The basic idea is to use the K-Nearest-Neighbor (KNN) method to track the overall tendency of the finger movement, and to leverage a Kalman filter to smooth the trace for better recognition. The intuition behind the KNN method is that the reflection features concentrate on the finger position, as shown in Figure 9(b), so a grid $I_{x,y}$ with a larger value is closer to the finger. However, noisy reflection features may deviate the localization result away from the groundtruth, because the traditional KNN method simply computes a weighted average of the K grids without considering their positions.

Therefore, we first filter the grids based on the fact that the finger always moves continuously, removing the grids that are far from the finger location estimated in the last sampling round. Then, we estimate the finger location $F(t)$ at time $t$ from the K grids with the largest likelihoods as:

$$F(t) = \frac{\sum_{i=1}^{K} I_i \times (x_i, y_i)}{\sum_{i=1}^{K} I_i}, \qquad (10)$$

where $I_i$ is the $i$-th largest grid likelihood and $(x_i, y_i)$ is the corresponding coordinate. Concatenating the estimated locations $F(t)$ yields the finger trace. Finally, we use a Kalman filter to smooth the finger-writing trace, based on the fact that the finger moves continuously while writing. Due to space limitations, we only present the state transition function, which is based on a velocity model:

$$F(t) = F(t-1) + v(t-1)\,\Delta t, \qquad (11)$$

where $v(t)$ is the moving speed and $\Delta t$ is the sampling gap. With the Kalman filter, we are able to mitigate the errors of the KNN localization and obtain a smooth trace from the velocity model.

[Fig. 10. Illustration of tracking the finger trace from reflection features: (a) filtering the far-away grids for KNN-based localization; (b) tracking the finger writing of letter "e" (KNN-based trace vs. after Kalman filter).]

Figure 10 uses a sample case to illustrate the effectiveness of our tracking method. Figure 10(a) presents the mechanism of filtering the grids for the KNN localization: by removing the grids that are far away from the location estimated in the last round, we reduce the interference of reading errors from individual tags. Figure 10(b) further illustrates the effectiveness of tracking the finger writing of the letter "e" using the KNN method and the Kalman filter.
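The tracking pipeline can then be sketched as below, combining the continuity filter, the top-K weighted average of Eq. (10), and a constant-velocity Kalman filter consistent with Eq. (11). The gating radius `MAX_JUMP` and the noise parameters `q` and `r` are illustrative assumptions rather than values reported in the paper; at each sampling round, `knn_locate` converts the likelihood matrix into a raw fix and `VelocityKalman.step` smooths it.

```python
import numpy as np

K = 5           # number of top grids, the default used in Section VI
MAX_JUMP = 6.0  # cm; gating radius of the continuity filter (assumed value)

def knn_locate(I, coords, last_loc=None):
    """Estimate the finger location (Eq. (10)): drop grids far from the
    previous estimate, then take the likelihood-weighted average of the
    K most likely remaining grids. coords holds the (x, y) of each grid."""
    flat, pts = I.ravel(), coords.reshape(-1, 2)
    if last_loc is not None:                          # continuity filter
        keep = np.linalg.norm(pts - last_loc, axis=1) <= MAX_JUMP
        flat, pts = flat[keep], pts[keep]
    top = np.argsort(flat)[-K:]                       # K largest likelihoods
    w = np.clip(flat[top], 1e-9, None)                # guard non-positive weights
    return (pts[top] * w[:, None]).sum(axis=0) / w.sum()

class VelocityKalman:
    """Constant-velocity Kalman filter over the state [x, y, vx, vy],
    whose prediction step matches Eq. (11); dt defaults to the 13 Hz
    sampling gap, q and r are assumed noise parameters."""
    def __init__(self, dt=1.0 / 13, q=0.1, r=1.0):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt              # x += vx*dt, y += vy*dt
        self.H = np.eye(2, 4)                         # we observe position only
        self.Q, self.R = q * np.eye(4), r * np.eye(2)
        self.x, self.P = None, np.eye(4)

    def step(self, z):
        z = np.asarray(z, dtype=float)
        if self.x is None:                            # initialize on first fix
            self.x = np.array([z[0], z[1], 0.0, 0.0])
            return z
        self.x = self.F @ self.x                      # predict (Eq. (11))
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R       # update with KNN fix z
        G = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + G @ (z - self.H @ self.x)
        self.P = (np.eye(4) - G @ self.H) @ self.P
        return self.x[:2]                             # smoothed position
```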
D. Multi-touch Gesture Recognition

In this work, we consider recognizing 6 multi-touch gestures, as shown in Figure 13(b). When we track the finger trace, the RF signals received from the tag array are affected by only one main moving finger. For the multi-touch gestures, in contrast, the signals affected by different fingers are mixed together, making it hard to distinguish each finger. Intuitively, each multi-touch gesture usually has a unique motion pattern within the tag array zone. To effectively discriminate different multi-touch gestures, we evenly divide the gesture period into 3 frames of equal length, which represent the starting, middle, and ending periods of the gesture, respectively. For each frame, we accumulate the reflection features $I_{x,y}(t)$ over time to obtain the statistical feature $\bar{I}_{x,y}$:

$$\bar{I}_{x,y} = \sum_{t\in T} I_{x,y}(t), \qquad (12)$$

where $T$ is the duration of the frame. The statistical feature $\bar{I}_{x,y}$ thus forms an image of the gesture's unique pattern during this frame, and the resulting 3-frame image serves as the basic feature representation for gesture recognition. Figure 11 illustrates the 3-frame image of the "left rotation" gesture shown in Figure 13(b); the rotation pattern, which reflects the physical movement of the hand, is roughly visible in this 3-frame image.

[Fig. 11. 3-frame images of the "left rotation" gesture, showing the starting, middle, and ending periods as two fingers rotate over the tag array.]

Given the feature representation (i.e., the 3-frame image) of each multi-touch gesture, we leverage a Convolutional Neural Network (CNN) to recognize the target gestures, as CNNs provide strong performance on image classification tasks. Figure 12 presents the structure of our CNN model, which takes the 3-frame image as input and produces the classification scores of each gesture for recognition.

[Fig. 12. Illustration of the CNN structure for gesture recognition: input layer → Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2 → FC layer → output layer.]

Particularly, our CNN model contains five hidden layers: two convolutional (Conv) layers and two pooling layers, followed by a Fully Connected (FC) layer. The Conv layer is the core building block, which leverages a set of learnable filters to extract the local properties of the image. For example, in Figure 11, the two fingers in the starting period are placed horizontally and then rotate to a vertical orientation by the ending period. Based on these learned filters, the CNN can automatically detect such local properties for gesture recognition, even when the gestures are not performed at the same place.

During the training process, we learn the model by collecting the 3-frame images of each gesture with manual labels. The model automatically learns the properties from the 3-frame images, which accurately characterize each gesture from the image perspective. In the validation process, we construct the 3-frame image from the reflection features of the testing gestures and use the trained model to classify them. Finally, we recognize the testing gestures based on the CNN model.
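The paper fixes the layer sequence (two Conv layers, two pooling layers, one FC layer) but not the kernel sizes, channel counts, or input resolution, so the following PyTorch sketch fills those in with assumed values; in particular, the 3-frame image is stacked as three input channels over an assumed 25×18 cm-level likelihood grid.

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Conv1 -> Pool1 -> Conv2 -> Pool2 -> FC, following Fig. 12.
    Channel counts, kernel sizes, and the 3x25x18 input (the three
    frames of Eq. (12) stacked as channels) are assumptions."""
    def __init__(self, n_gestures=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # Conv layer 1
            nn.ReLU(),
            nn.MaxPool2d(2),                              # Pool layer 1
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # Conv layer 2
            nn.ReLU(),
            nn.MaxPool2d(2),                              # Pool layer 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 4, n_gestures),            # FC -> class scores
        )

    def forward(self, x):  # x: (batch, 3, 25, 18)
        return self.classifier(self.features(x))

# e.g., scores = GestureCNN()(torch.randn(8, 3, 25, 18))  # -> (8, 6)
```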
VI. PERFORMANCE EVALUATION

A. Experimental Setup & Metrics

In order to validate the effectiveness of the proposed RF-finger system, we conduct experiments on both finger tracking and multi-touch gesture recognition in realistic settings. The experimental setup of the RF-finger system consists of a 5×7 tag array of AZ-9629 RFID tags and an ImpinJ Speedway R420 RFID reader integrated with an S9028PCL directional antenna, as shown in Figure 13(a). The tag array is deployed using the shuffled deployment shown in Figure 7(a), and the average sampling rate is 13 Hz. The RFID antenna is placed 50 cm behind the tag array to interrogate the tags, while the user performs finger gestures in front of the tag array. A LeapMotion is also deployed under the tag array to collect the video stream for comparison.

[Fig. 13. Evaluation setup & multi-touch gestures: (a) experimental setup with the tag array, RFID antenna (back), RFID reader, and LeapMotion; (b) the six multi-touch gestures: 1) Rotate Left (RL), 2) Zoom Out (ZO), 3) Swipe Left (SL), 4) Rotate Right (RR), 5) Zoom In (ZI), 6) Swipe Right (SR).]

The experiments are carried out in a typical indoor environment involving 10 participants in total (8 males and 2 females). Before performing each gesture, the user is required to drop his/her hand down to collect the free-space signal $S_{free}$, which is used for reflection feature extraction. For the finger tracking, we ask 4 participants to write the 26 letters and 4 shapes (i.e., □, △, ○, ♥) 10 times each. In the KNN method, K is set to 5 by default. For the multi-touch gesture recognition, we ask all 10 participants to perform each of the 6 gestures shown in Figure 13(b) 30 times. Particularly, 80% of the gesture-related RF dataset (i.e., 1,440 gestures) is used to train the CNN model, and the remaining 20% is used to evaluate the trained model. Only one CNN model is trained for all the users.

We define three metrics to evaluate the performance on the finger movements.

Recognition accuracy: For the finger-writing letters, we recover the writing trace and then use LipiTk [3] to recognize the trace, which produces a candidate letter set $C$ with different confidences. Given a test set $T_x$ of the traces for letter $x$, the recognition accuracy of $x$ is defined as $\frac{\sum \|\{x\}\cap C\|}{\|T_x\|}$, where the sum runs over the traces in $T_x$ and $\|\cdot\|$ measures the set size.

Distance error: For the shapes in finger tracking, the distance error is defined as $\frac{DTW(F, F_G)}{\max(L(F), L(F_G))}$, which indicates the average Dynamic Time Warping (DTW) distance between the tracked trace $F$ from RF-finger and the groundtruth shape $F_G$. $L(\cdot)$ counts the number of points in a trace.

Classification accuracy: For the multi-touch gestures, the classification accuracy is defined as $\frac{G_c}{G_a}$, where $G_c$ and $G_a$ are the numbers of correctly classified gestures and performed gestures, respectively. Particularly, we first train a general CNN model and then use the model to classify all the multi-touch gestures.
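As a concrete reading of the distance-error metric, the sketch below implements a plain dynamic-programming DTW over 2-D trace points and normalizes the accumulated cost by the longer trace length; this is one standard DTW formulation, assumed here since the paper does not spell out its exact variant.

```python
import numpy as np

def dtw_distance(F, FG):
    """Accumulated DTW cost between a tracked trace F and a groundtruth
    shape FG, each given as an (n, 2) array of 2-D points."""
    n, m = len(F), len(FG)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(F[i - 1] - FG[j - 1])   # point distance
            D[i, j] = cost + min(D[i - 1, j],             # insertion
                                 D[i, j - 1],             # deletion
                                 D[i - 1, j - 1])         # match
    return D[n, m]

def distance_error(F, FG):
    """Distance error = DTW(F, FG) / max(L(F), L(FG))."""
    return dtw_distance(F, FG) / max(len(F), len(FG))
```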
B. Finger Tracking of Letters

[Fig. 14. Evaluation results: (a) recognition accuracy of letters; (b) recognition accuracy of different users; (c) recognition accuracy comparison with LeapMotion; (d) illustration of finger tracking; (e) distance error of tracking different shapes; (f) confusion matrix of multi-touch gestures; (g) accuracy of multi-touch gestures across users; (h) training accuracy of the CNN.]

We first evaluate the accuracy of recognizing the finger-written letters based on LipiTk. Since LipiTk produces several candidate letters with different confidences, we use the three candidates with the highest confidence as the recognition result. As shown in Figure 14(a), RF-finger achieves an average recognition accuracy of 88%. The recognition accuracies of all the letters are above 80%, and 14 of the 26 letters achieve more than 90% recognition accuracy. In particular, the letters "a", "f", "h", "k" and "y" are correctly recognized with 100% accuracy due to their distinct shapes.

Moreover, we evaluate the robustness of RF-finger by comparing the recognition accuracy across different users. We also vary the size of the candidate set produced by LipiTk for comparison. As shown in Figure 14(b), all the users achieve more than 75% accuracy based on the first candidate alone. As we increase the number of candidates to three, the accuracy rises above 85%, meaning we can correctly recognize the letters from the first three candidates with more than 85% probability. Particularly, user 3 achieves the highest recognition accuracy of 94%, while the lowest accuracy, 84%, belongs to user 4. Therefore, RF-finger is robust in recognizing the letters written by different users.

Additionally, we compare the letter recognition accuracy of RF-finger with that of LeapMotion by varying the number of candidates. As shown in Figure 14(c), it is encouraging to find that the accuracy of RF-finger is only 3% to 6% lower than that of LeapMotion, which validates the accuracy of RF-finger. Particularly, RF-finger achieves about 89% recognition accuracy when we use 3 candidates, while LeapMotion achieves 92%. Therefore, RF-finger achieves accuracy comparable to the video-based technique (i.e., LeapMotion) for recognizing finger writings.

C. Finger Tracking of Shapes

Next, we evaluate the accuracy of finger tracking by comparing the shapes tracked by RF-finger with the groundtruth shapes drawn on the paper. Particularly, we test 4 basic shapes, i.e., rectangle, triangle, circle and heart (□, △, ○, ♥). Figure 14(d) illustrates the traces recovered by RF-finger, which include □, △, ○, ♥ and the letters "a", "k", "m", "s", "z". All the finger traces can be easily recognized with little distortion. Besides, all the traces are written within a 15 cm × 15 cm square, indicating that RF-finger can track the trajectory with fine-grained resolution.

Furthermore, we compare the traces of RF-finger with the groundtruth on the paper. Particularly, we use DTW to map each location in the trace of RF-finger to the groundtruth on the paper, and use the average DTW distance to characterize the tracking accuracy of RF-finger, as shown in Figure 14(e). We find that three of the shapes have an average error as low as 1 cm, while the error for rectangles is about 2.3 cm.
Through in-depth investigation, we find that all the tracked rectangles remain easily recognizable (similar to Figure 14(d)) but are distorted by slight rotations, leading to a somewhat higher tracking error than the other shapes. Overall, RF-finger is able to accurately track the finger trace with a small error.

D. Multi-touch Gesture Recognition

Finally, we evaluate the performance of multi-touch recognition using the CNN-based classification algorithm. Figure 14(f) presents the confusion matrix of classifying the 6 gestures. We find that 5 of the 6 gestures achieve over 90% recognition accuracy. Even though these gestures are not performed at exactly the same position over the tag array, the CNN model can still correctly classify them via the local properties of the images, e.g., the relative positions of the fingers in different periods. The average accuracy over all gestures reaches as high as 92%, indicating that RF-finger can accurately recognize multi-touch gestures.

We also examine the robustness of the CNN model by comparing the recognition accuracy across different users. All 10 users perform the 6 gestures in front of the tag array, with each user randomly choosing the position over the tag array at which to perform them. As shown in Figure 14(g), the proposed method achieves around 90% accuracy for most of the users. Particularly, the lowest accuracy is still as high as 89%, while the highest is 94%. Therefore, RF-finger can accurately classify the multi-touch gestures based on the properties extracted by the CNN model.

Besides, we also present the training curve of our CNN model in Figure 14(h). We randomly choose 1,440 gestures out of the 1,800 in total to train our CNN model. The parameters in each CNN layer are updated automatically in every epoch to improve the recognition accuracy on the training dataset. Particularly, we find that the CNN model
achieves as high as 90% accuracy once training exceeds 800 epochs, while the training accuracy reaches 98% after 2,000 epochs. This result indicates that our CNN model can converge quickly to about 90% accuracy within few epochs and in reasonable time.

VII. RELATED WORK

There have been active research efforts in gesture recognition, which can be broadly divided into two main categories:

Device-based Approaches. Previous research has shown that both the built-in motion sensors of wearable devices and wearable RFID tags attached to the human body can be utilized for gesture recognition [6, 14, 17]. For example, ArmTrack [15] proposes to track the entire arm relying solely on a smartwatch. FitCoach [6] assesses dynamic postures in workouts by recognizing exercise gestures from wearable sensors. However, these methods suffer from short life cycles due to high energy consumption. RF-IDraw [17] and Pantomime [14] track the motion patterns of RFID tags for gesture recognition. These approaches, however, require the tags to be attached to the finger or to a passive object held by the user. Attaching RFID tags to the human body degrades the user experience, especially for manipulation in VR applications. Different from previous studies, we propose a device-free approach with an RFID tag array, which allows the user to perform each gesture naturally without wearing any specialized device.

Device-free Approaches. As an emerging solution for gesture recognition, device-free approaches have gained significant attention in recent years. As a mature technique, camera-based approaches, e.g., Microsoft Kinect and LeapMotion, are able to extract the body or finger structure based on computer vision techniques. However, reconstructing the body or finger structure from video streams usually incurs high computation cost and unexpected privacy leakage. Several recent studies try to recognize gestures leveraging specialized signals, e.g., WiFi [16], acoustic signals [18] and visible light [9]. However, these solutions are either easily affected by ambient noise or incapable of sensing fine-grained gestures. Yang et al. propose to locate the human body with the COTS RFID technique via a device-free approach [21], which shows the potential of device-free sensing in RFID systems. More recently, RF-IPad [4], another device-free approach based on RFID, was proposed to recognize human writing by detecting strokes. In contrast, we focus on tracking the finger trace, which is a finger-level, fine-grained tracking problem. Moreover, we are able to recognize multi-touch gestures with a device-free approach based on RFID, which has remained an open problem so far.

VIII. CONCLUSION

In this paper, we propose RF-finger, a device-free system to track finger writings and recognize multi-touch gestures based on a COTS RFID system. RF-finger provides a practical solution to precisely track the fine-grained finger trace and recognize multi-touch gestures, which facilitates in-the-air operations in many smart applications (e.g., VR/AR and IoT systems). Our key innovations lie in modeling the reflection of the finger on the tag array and extracting the reflection features of the finger based on the model. With the reflection features, we leverage the KNN method to track the finger trace and the CNN model to recognize the multi-touch gestures.
The experimental results confirm the effectiveness of RF-finger on both finger-writing tracking and multi-touch gesture recognition, achieving over 88% and 92% accuracy, respectively.

ACKNOWLEDGMENT

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61472185, 61373129, 61321491, 61502224; the JiangSu Natural Science Foundation, No. BK20151390; the Collaborative Innovation Center of Novel Software Technology and Industrialization; and the Program A for Outstanding PhD Candidates of Nanjing University. This work is also partially supported by the US National Science Foundation Grants CNS-1514436, CNS-1716500, CNS-1717356 and Army Research Office Grant W911NF-17-1-0467.

REFERENCES

[1] Amplitude. https://en.wikipedia.org/wiki/Amplitude.
[2] Gesture recognition market. http://www.transparencymarketresearch.com/gesture-recognition-market.html.
[3] LipiTk. http://lipitk.sourceforge.net/.
[4] H. Ding, C. Qian, J. Han, G. Wang, W. Xi, K. Zhao, and J. Zhao. RFIPad: Enabling cost-efficient and device-free in-air handwriting using passive tags. In Proc. of IEEE ICDCS, 2017.
[5] D. M. Dobkin. The RF in RFID: Passive UHF RFID in Practice. Newnes, 2007.
[6] X. Guo, J. Liu, and Y. Chen. FitCoach: Virtual fitness coach empowered by wearable mobile devices. In Proc. of IEEE INFOCOM, 2017.
[7] J. Han, H. Ding, C. Qian, W. Xi, Z. Wang, Z. Jiang, L. Shangguan, and J. Zhao. A customer behavior identification system using passive tags. IEEE/ACM Transactions on Networking, 2016.
[8] J. Han, C. Qian, X. Wang, D. Ma, J. Zhao, W. Xi, Z. Jiang, and Z. Wang. Twins: Device-free object tracking using passive tags. IEEE/ACM Transactions on Networking, 2016.
[9] T. Li, C. An, Z. Tian, A. T. Campbell, and X. Zhou. Human sensing using visible light communication. In Proc. of ACM MobiCom, 2015.
[10] J. Liu, M. Chen, S. Chen, Q. Pan, and L. Chen. Tag-Compass: Determining the spatial direction of an object with small dimensions. In Proc. of IEEE INFOCOM, 2017.
[11] J. Liu, F. Zhu, Y. Wang, X. Wang, Q. Pan, and L. Chen. RF-Scanner: Shelf scanning with robot-assisted RFID systems. In Proc. of IEEE INFOCOM, 2017.
[12] X. Liu, X. Xie, K. Li, B. Xiao, J. Wu, H. Qi, and D. Lu. Fast tracking the population of key tags in large-scale anonymous RFID systems. IEEE/ACM Transactions on Networking, 2017.
[13] K. Pearson. Notes on regression and inheritance in the case of two parents. In Proc. of the Royal Society of London, 1895.
[14] L. Shangguan, Z. Zhou, and K. Jamieson. Enabling gesture-based interactions with objects. In Proc. of ACM MobiSys, 2017.
[15] S. Shen, H. Wang, and R. R. Choudhury. I am a smartwatch and I can track my user's arm. In Proc. of ACM MobiSys, 2016.
[16] S. Tan and J. Yang. WiFinger: Leveraging commodity WiFi for fine-grained finger gesture recognition. In Proc. of ACM MobiHoc, 2016.
[17] J. Wang, D. Vasisht, and D. Katabi. RF-IDraw: Virtual touch screen in the air using RF signals. In Proc. of ACM SIGCOMM, 2015.
[18] W. Wang, A. X. Liu, and K. Sun. Device-free gesture tracking using acoustic signals. In Proc. of ACM MobiCom, 2016.
[19] R. K. Wangsness. Electromagnetic Fields. New York, NY, USA: Wiley-VCH, 1986.
[20] L. Yang, Y. Chen, X.-Y. Li, C. Xiao, M. Li, and Y. Liu. Tagoram: Real-time tracking of mobile RFID tags to high precision using COTS devices. In Proc. of ACM MobiCom, 2014.
[21] L. Yang, Q. Lin, X. Li, T. Liu, and Y. Liu. See through walls with COTS RFID system! In Proc. of ACM MobiCom, 2015.