Keystroke Recognition Using WiFi Signals Kamran Alit Alex X.Liut+Wei Wang Muhammad Shahzad tDept.of Computer Science and Engineering,Michigan State University,USA #State Key Laboratory for Novel Software Technology,Nanjing University,China tfalikamr3,alexliu,shahzadm@cse.msu.edu,ww@nju.edu.cn ABSTRACT ways to recognize keystrokes,which can be classified into Keystroke privacy is critical for ensuring the security of com- three categories:acoustic emission based approaches,elec- puter systems and the privacy of human users as what being tromagnetic emission based approaches,and vision based typed could be passwords or privacy sensitive information. approaches.Acoustic emission based approaches recognize In this paper,we show for the first time that WiFi signals keystrokes based on either the observation that different keys can also be exploited to recognize keystrokes.The intuition in a keyboard produce different typing sounds 1,2 or the is that while typing a certain key,the hands and fingers of a observation that the acoustic emanations from different keys user move in a unique formation and direction and thus gen- arrive at different surrounding smartphones at different time erate a unique pattern in the time-series of Channel State as the keys are located at different places in a keyboard [3]. Information (CSI)values,which we call CSI-waveform for Electromagnetic emission based approaches recognize key- that key.In this paper,we propose a WiFi signal based strokes based on the observation that the electromagnetic keystroke recognition system called WiKey.WiKey consists emanations from the electrical circuit underneath different of two Commercial Off-The-Shelf (COTS)WiFi devices,a keys in a keyboard are different 4.Vision based approaches sender (such as a router)and a receiver (such as a laptop) recognizes keystrokes using vision technologies 5. The sender continuously emits signals and the receiver con- In this paper,we show for the first time that WiFi signals tinuously receives signals.When a human subject types on can also be exploited to recognize keystrokes.WiFi signals a keyboard,WiKey recognizes the typed keys based on how are pervasive in our daily life at home,offices,and even the CSI values at the WiFi signal receiver end.We imple- shopping centers.The key intuition is that while typing a mented the WiKey system using a TP-Link TL-WR1043ND certain key,the hands and fingers of a user move in a unique WiFi router and a Lenovo X200 laptop.WiKey achieves formation and direction and thus generate a unique pattern more than 97.5%detection rate for detecting the keystroke in the time-series of Channel State Information(CSI)values, and 96.4%recognition accuracy for classifying single keys. which we call CSI-waveform,for that key.The keystrokes In real-world experiments,WiKey can recognize keystrokes of each key introduce relative unique multi-path distortions in a continuously typed sentence with an accuracy of 93.5%. in WiFi signals and this uniqueness can be exploited to re- cognize keystrokes.Due to the high data rates supported by Categories and Subject Descriptors modern WiFi devices,WiFi cards provide enough CSI val- C.2.1 Network Architecturel:Wireless Communica- ues within the duration of a keystroke to construct a high tions;D.4.6 Security and Protectione:Keystroke re- resolution CSI-waveform for each keystroke. covery We propose a WiFi signal based keystroke recognition sys. tem called WiKey.WiKey consists of two Commercial Off- Keywords The-Shelf (COTS)WiFi devices,a sender (such as a router) Gesture recognition;Wireless security:Keystroke recovery: and a receiver(such as a laptop),as shown in Figure 1.The Channel State Information:COTS WiFi devices sender continuously emits signals and the receiver continu- ously receives signals.When a human subject types in a 1.INTRODUCTION keyboard,on the WiFi signal receiver end,WiKey recog- Keystroke privacy is critical for ensuring the security of nizes the typed keys based on how the CSI value changes. computer systems and the privacy of human users as what CSI values quantify the aggregate effect of wireless phenom- being types could be passwords or privacy sensitive in- ena such as fading,multi-paths,and Doppler shift on the formation. The research community has studied various wireless signals in a given environment.When the environ- ment changes,such as a key is being pressed,the impact Permission to make digital or hard copies of all or part of this work for personal or of these wireless phenomena on the wireless signals change classroom use is granted without fee provided that copies are not made or distributed resulting in unique changes in the CSI values.There are for profit or commercial advantage and that copies bear this notice and the full cita- three key technical challenges.The first technical challenge tion on the first page.Copyrights for components of this work owned by others than is to segment the CSI time series to identify the start time ACM must be honored.Abstracting with credit is permitted.To copy otherwise,or re- and end time of each keystroke.We studied the character- publish,to post on servers or to redistribute to lists,requires prior specific permission istics of typical CSI-waveforms of different keystrokes and and/or a fee.Request permissions from Permissions@acm.org. MobiCom'l5.September 7-11,2015,Paris.France. observed that the waveforms of different keys show a similar ©2015ACM.1SBN978-1-4503-3619-2/15/09$15.00 rising and falling trends in the changing rate of CSI values. D0L:http:/x.doi.org/10.1145/2789168.2790109
Keystroke Recognition Using WiFi Signals Kamran Ali† Alex X. Liu†‡ Wei Wang‡ Muhammad Shahzad† †Dept. of Computer Science and Engineering, Michigan State University, USA ‡State Key Laboratory for Novel Software Technology, Nanjing University, China † {alikamr3,alexliu,shahzadm}@cse.msu.edu, ‡ww@nju.edu.cn ABSTRACT Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users as what being typed could be passwords or privacy sensitive information. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. The intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call CSI-waveform for that key. In this paper, we propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop). The sender continuously emits signals and the receiver continuously receives signals. When a human subject types on a keyboard, WiKey recognizes the typed keys based on how the CSI values at the WiFi signal receiver end. We implemented the WiKey system using a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop. WiKey achieves more than 97.5% detection rate for detecting the keystroke and 96.4% recognition accuracy for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%. Categories and Subject Descriptors C.2.1 [Network Architecture]: Wireless Communications; D.4.6 [Security and Protectione]: Keystroke recovery Keywords Gesture recognition; Wireless security; Keystroke recovery; Channel State Information; COTS WiFi devices 1. INTRODUCTION Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users as what being types could be passwords or privacy sensitive information. The research community has studied various Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. MobiCom’15, September 7–11, 2015, Paris, France. c 2015 ACM. ISBN 978-1-4503-3619-2/15/09 ...$15.00. DOI: http://dx.doi.org/10.1145/2789168.2790109. ways to recognize keystrokes, which can be classified into three categories: acoustic emission based approaches, electromagnetic emission based approaches, and vision based approaches. Acoustic emission based approaches recognize keystrokes based on either the observation that different keys in a keyboard produce different typing sounds [1, 2] or the observation that the acoustic emanations from different keys arrive at different surrounding smartphones at different time as the keys are located at different places in a keyboard [3]. Electromagnetic emission based approaches recognize keystrokes based on the observation that the electromagnetic emanations from the electrical circuit underneath different keys in a keyboard are different [4]. Vision based approaches recognizes keystrokes using vision technologies [5]. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. WiFi signals are pervasive in our daily life at home, offices, and even shopping centers. The key intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call CSI-waveform, for that key. The keystrokes of each key introduce relative unique multi-path distortions in WiFi signals and this uniqueness can be exploited to recognize keystrokes. Due to the high data rates supported by modern WiFi devices, WiFi cards provide enough CSI values within the duration of a keystroke to construct a high resolution CSI-waveform for each keystroke. We propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off- The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop), as shown in Figure 1. The sender continuously emits signals and the receiver continuously receives signals. When a human subject types in a keyboard, on the WiFi signal receiver end, WiKey recognizes the typed keys based on how the CSI value changes. CSI values quantify the aggregate effect of wireless phenomena such as fading, multi-paths, and Doppler shift on the wireless signals in a given environment. When the environment changes, such as a key is being pressed, the impact of these wireless phenomena on the wireless signals change, resulting in unique changes in the CSI values. There are three key technical challenges. The first technical challenge is to segment the CSI time series to identify the start time and end time of each keystroke. We studied the characteristics of typical CSI-waveforms of different keystrokes and observed that the waveforms of different keys show a similar rising and falling trends in the changing rate of CSI values
adapted to recognize keystrokes because such coarse grained CSI VALUES information does not capture the minor variations in the CSl DIFFERENT SUMCARRIERS values caused by human micro-movements such as those of hands and fingers while typing.Some recent work,namely VIFI ROUTER WiHear.uses CSI values to extract the micro-movements of mouth to recognize 9 syllables in the spoken words 10 However,WiHear uses special hardware including direc- USER'S KEYOARE tional antennas and stepper motors to direct WiFi beams towards speaker's mouth and extract the micro-movements. Figure 1:WiKey System We implemented the WiKey system using COTS devices,i.e. a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 Based on this observation,we design a keystroke extraction laptop with Intel 5300 WiFi NIC.In the evaluation process algorithm that utilizes CSI streams of all transmit-receive we build a keystroke database of 10 human subjects with antenna (TX-RX)pair pairs to determine the approximate IRB approval.WiKey achieves more than 97.5%detection start and end points of individual keystrokes in a given CSI- rate for detecting the keystroke and 96.4%recognition ac- waveform by continuously matching the trends in CSI time curacy for classifying single keys.In real-world experiments, series with the experimentally observed trends using a slid- WiKey can recognize keystrokes in a continuously typed sen- tence with an accuracy of 93.5% ing window approach. The second technical challenge is to extract distinguishing In this paper,we have shown that fine grained activity features for generating classification models for each of the recognition is possible by using COTS WiFi devices.Thus, 37 keys (10 digits,26 alphabets and 1 space-bar).As the the techniques proposed in this paper can be used for sev- keys on a keyboard are closely placed,conventional features eral HCI applications.Examples include zoom-in,zoom-out, such as maximum peak power,mean amplitude,root mean scrolling,sliding,and rotating gestures for operating per- square deviation of signal amplitude,second/third central sonal computers,gesture recognition for gaming consoles moment,rate of change,signal energy or entropy,and num- in-home gesture recognition for operating various household ber of zero crossings cannot be used because the values of devices,and applications such as writing and drawing in the these features for adjacent keys are almost identical.To air.Other than being a potential attack,our WiKey tech- address this challenge,we use the CSI-waveform shapes of nology can be potentially used to build virtual keyboards each key from each TX-RX antenna pair as features.As the where human users type on a printed keyboard. waveforms for each key contain a large number of samples. we apply the Discrete Wavelet Transform (DWT)technique 2. RELATED WORK on these waveforms to reduce the number of samples while keeping the shape preserving time and frequency domain in- 2.1 Device Free Activity Recognition formation intact.We use the waveforms resulting from the Device-free activity recognition solutions use the vari- DWT of individual keystrokes as their shape features. ations in wireless channel to recognize human activities in a The third technical challenge is to compare shape fea- given environment.Existing solutions can be grouped into tures of any two keystrokes.The midpoints of extracted three categories:(1)Received Signal Strength (RSS)based, CSI-wavforms of different keystrokes rarely align with each (2)CSI based,and (3)Software Defined Radio (SDR)based. other because the start and end points determined by ex- RSS Based:Sigg et al.proposed activity recognition traction algorithm are never exact.Moreover,the lengths schemes that utilize RSS values of WiFi signals to recog- of different keystroke waveforms also differ because the dur- nize four activities including crawling,lying down,standing ation of pressing any key is often different.Consequently up,and walking [11,12].They achieved activity recognition the midpoints and lengths of shape features do not match rates of over 80%for these four activities.To obtain the either.Another issue is that the shape of different keystroke RSS values from WiFi signals,they used USRPs,which are waveforms of the same key are often distorted versions of specialized hardware devices compared to the COTS WiFi each other because of slightly different formation and dir- devices that we used in our work.While RSS values can be ection of motion of hands and fingers while pressing that used for recognizing macro-movements,they are not suit- key.Thus,two shape features cannot be compared using able to recognize the micro-movements such as those of fin- standard measures like correlation coefficient or Euclidean gers and hands in keyboard typing because RSS values only distance.To address this challenge,we use the Dynamic provide coarse-grained information about the channel vari- Time Warping (DTW)technique to quantify the distance ations and do not contain fine-grained information about between the two shape features.DTW can find the min- small scale fading and multi-path effects caused by these imum distance alignment between two waveforms of differ- micro-movements. ent lengths. CSI Based:CSI values obtained from COTS WiFI net- The key novelty of this paper is on proposing the first work interface cards (NICs)(such as Intel 5300 and Ath- WiFi signal based keystroke recognition approach.Some re- eros 9390)have been recently proposed for activity recogni- cent work uses CSI values to recognize various macro aspects tion [6-10,13]and localization [14-16].Han et al.proposed of human movements such as falling down 6,household WiFall that detects fall of a human subject in an indoor activities [7],detection of human presence [8],and estim- environment using CSI values [6].Zhou et al.proposed a ating the number of people in a crowd [9].These schemes passive human detection scheme which exploits multi-path extract coarse grained information from the CSI values to variations for detecting human presence in an indoor envir- recognize the macro-movements such as falling down or re- onment using CSI values [8].Zou et al.proposed Electronic cognizing fullbody/limb gestures.They cannot be directly Frog Eye that counts the number of people in a crowd using
Figure 1: WiKey System Based on this observation, we design a keystroke extraction algorithm that utilizes CSI streams of all transmit-receive antenna (TX-RX) pair pairs to determine the approximate start and end points of individual keystrokes in a given CSIwaveform by continuously matching the trends in CSI time series with the experimentally observed trends using a sliding window approach. The second technical challenge is to extract distinguishing features for generating classification models for each of the 37 keys (10 digits, 26 alphabets and 1 space-bar). As the keys on a keyboard are closely placed, conventional features such as maximum peak power, mean amplitude, root mean square deviation of signal amplitude, second/third central moment, rate of change, signal energy or entropy, and number of zero crossings cannot be used because the values of these features for adjacent keys are almost identical. To address this challenge, we use the CSI-waveform shapes of each key from each TX-RX antenna pair as features. As the waveforms for each key contain a large number of samples, we apply the Discrete Wavelet Transform (DWT) technique on these waveforms to reduce the number of samples while keeping the shape preserving time and frequency domain information intact. We use the waveforms resulting from the DWT of individual keystrokes as their shape features. The third technical challenge is to compare shape features of any two keystrokes. The midpoints of extracted CSI-wavforms of different keystrokes rarely align with each other because the start and end points determined by extraction algorithm are never exact. Moreover, the lengths of different keystroke waveforms also differ because the duration of pressing any key is often different. Consequently, the midpoints and lengths of shape features do not match either. Another issue is that the shape of different keystroke waveforms of the same key are often distorted versions of each other because of slightly different formation and direction of motion of hands and fingers while pressing that key. Thus, two shape features cannot be compared using standard measures like correlation coefficient or Euclidean distance. To address this challenge, we use the Dynamic Time Warping (DTW) technique to quantify the distance between the two shape features. DTW can find the minimum distance alignment between two waveforms of different lengths. The key novelty of this paper is on proposing the first WiFi signal based keystroke recognition approach. Some recent work uses CSI values to recognize various macro aspects of human movements such as falling down [6], household activities [7], detection of human presence [8], and estimating the number of people in a crowd [9]. These schemes extract coarse grained information from the CSI values to recognize the macro-movements such as falling down or recognizing fullbody/limb gestures. They cannot be directly adapted to recognize keystrokes because such coarse grained information does not capture the minor variations in the CSI values caused by human micro-movements such as those of hands and fingers while typing. Some recent work, namely WiHear, uses CSI values to extract the micro-movements of mouth to recognize 9 syllables in the spoken words [10]. However, WiHear uses special hardware including directional antennas and stepper motors to direct WiFi beams towards speaker’s mouth and extract the micro-movements. We implemented the WiKey system using COTS devices, i.e. a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop with Intel 5300 WiFi NIC. In the evaluation process, we build a keystroke database of 10 human subjects with IRB approval. WiKey achieves more than 97.5% detection rate for detecting the keystroke and 96.4% recognition accuracy for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%. In this paper, we have shown that fine grained activity recognition is possible by using COTS WiFi devices. Thus, the techniques proposed in this paper can be used for several HCI applications. Examples include zoom-in, zoom-out, scrolling, sliding, and rotating gestures for operating personal computers, gesture recognition for gaming consoles, in-home gesture recognition for operating various household devices, and applications such as writing and drawing in the air. Other than being a potential attack, our WiKey technology can be potentially used to build virtual keyboards where human users type on a printed keyboard. 2. RELATED WORK 2.1 Device Free Activity Recognition Device-free activity recognition solutions use the variations in wireless channel to recognize human activities in a given environment. Existing solutions can be grouped into three categories: (1) Received Signal Strength (RSS) based, (2) CSI based, and (3) Software Defined Radio (SDR) based. RSS Based: Sigg et al. proposed activity recognition schemes that utilize RSS values of WiFi signals to recognize four activities including crawling, lying down, standing up, and walking [11, 12]. They achieved activity recognition rates of over 80% for these four activities. To obtain the RSS values from WiFi signals, they used USRPs, which are specialized hardware devices compared to the COTS WiFi devices that we used in our work. While RSS values can be used for recognizing macro-movements, they are not suitable to recognize the micro-movements such as those of fingers and hands in keyboard typing because RSS values only provide coarse-grained information about the channel variations and do not contain fine-grained information about small scale fading and multi-path effects caused by these micro-movements. CSI Based: CSI values obtained from COTS WiFI network interface cards (NICs) (such as Intel 5300 and Atheros 9390) have been recently proposed for activity recognition [6–10, 13] and localization [14–16]. Han et al. proposed WiFall that detects fall of a human subject in an indoor environment using CSI values [6]. Zhou et al. proposed a passive human detection scheme which exploits multi-path variations for detecting human presence in an indoor environment using CSI values [8]. Zou et al. proposed Electronic Frog Eye that counts the number of people in a crowd using
CSI values by treating the people reflecting the WiFi signals used cepstrum features [22]instead of FFT as keystroke fea- as "virtual antennas"[9].Wang et al.proposed E-eyes that tures and used unsupervised learning with language model exploits CSI values for recognizing household activities such correction on the collected features before using them for as washing dishes and taking a shower [7.Nandakumar et supervised training and recognition of different keystrokes al.leverage the CSI and RSS information from off-the-shelf Zhu et al.proposed a context-free geometry-based approach WiFi devices to classify four arm gestures-push,pull,lever for recognizing keystrokes that leverage the acoustic eman- and punch [13].The fundamental difference between these ations from keystrokes to first calculate the time difference schemes and our scheme is that these schemes extract coarse of keystroke arrival and then estimate the physical locations grained features from the CSI values provided by the COTS of the keystrokes to identify which keys are pressed [3. WiFi NIC to perform these tasks while our proposed scheme Electromagnetic Emissions Based Vuagnoux et al. refines these CSI to capture fine grained variations in the used a USRP to capture the electromagnetic emanations wireless channel for recognizing keystrokes.Wang et al.pro- while pressing the keys [4.These electromagnetic emana- pose WiHear that uses CSI values recognizes the shape of tions originated from the electrical circuit underneath each mouth while speaking to detect whether a person is utter- key in conventional keyboards.The authors proposed to cap- ing one of a set of nine predefined nine syllables [10].While ture the entire raw electromagnetic spectrum and process it WiHear can capture the micro-movements of lips,it uses to recognize the keystrokes.Unfortunately,this scheme is special purpose directional antennas with stepper motors highly susceptible to background electromagnetic noise that for directing the antenna beams towards a person's mouth exists in almost all environments these days such as due to to obtain a clean signal for recognizing mouth movements microwave ovens,refrigerators,and televisions. In contrast,our proposed scheme does not use any special Video Camera Based Balzarotti et al.proposed purpose equipment and recognizes the micro-movements of ClearShot that processes the video of a person typing to fingers and hands using COTS WiFi NIC reconstruct the sentences (s)he types 5.The authors pro- SDR Based:Researchers have proposed schemes that pose to use context and language sensitive analysis for re- utilize SRDs and special purpose hardware to transmit and constructing the sentences receive custom modulated signals for activity recognition [17-20].Pu et al.proposed WiSee that uses a special pur- 3.CHANNEL STATE INFORMATION pose receiver design on USRPs to extract small Doppler shifts from OFDM WiFi transmissions to recognize human Modern WiFi devices that support IEEE 802.11n/ac gestures [17].Kellogg et al.proposed to use a special pur- standard typically consist of multiple transmit and mul- pose analog envelop detector circuit for recognizing gestures tiple receive antennas and thus support MIMO.Each MIMO within a distance of up to 2.5 feet using backscatter sig- channel between each transmit-receive (TX-RX)antenna nals from RFID or TV transmissions 18.Lyonnet et al. pair of a transmitter and receiver comprises of multiple sub- use micro Doppler signatures to classify gaits of human carriers.These WiFi devices continuously monitor the state subjects into multiple categories using specialized Doppler of the wireless channel to effectively perform transmit power radars [19].Adib et al.proposed WiTrack that uses a spe- allocations and rate adaptations for each individual MIMO stream such that the available capacity of the wireless chan- cially designed frequency modulated carrier wave radio fron- tend to track human movements behind a wall [20].Recently. nel is maximally utilized [23.These devices quantify the state of the channel in terms of CSI values.The CSI val- Chen et al.proposed an SDR based custom receiver design which can be used to track keystrokes using wireless sig- ues essentially characterize the Channel Frequency Response nals [21].In contrast to all these schemes,our scheme does (CFR)for each subcarrier between each transmit-receive not use any specialized hardware or SDRs rather utilizes (TX-RX)antenna pair.As the received signal is the res- COTS WiFi NICs to recognize keystrokes ultant of constructive and destructive interference of several multipath signals scattered from the walls and surrounding objects,the disturbances caused by movement of hands and 2.2 Keystrokes Recognition fingers while typing on a keyboard near the WiFi receiver To the best of our knowledge,there is no prior work on re- not only lead to changes in previously existing multipaths cognizing keystrokes by leveraging variations in wireless sig- but also to the creation of new multipaths.These changes nals using commodity WiFi devices.Other than the SDRs are captured in the CSI values for all subcarriers between based keystroke tracking approach proposed in [21 which every TX-RX antenna pair and can then be used to recog- uses wireless signals to track keystrokes,researchers have nize keystrokes. proposed several keystrokes recognition schemes that are Let Mr denote the number of transmit antennas,MR de- based on other sensing modalities such as acoustics 1-3,22 note the number of receive antennas and Se denote the num- electromagnetic emissions 4,and video cameras 5.Next, ber of OFDM sub-carriers.Let Xi and Y;represent the MT we give a brief overview of the other existing schemes that dimensional transmitted signal vector and MR dimensional utilize these sensing modalities to recognize keystrokes. received signal vector,respectively,for subcarrier i and let Acoustics Based:Asonov et al.proposed a scheme Ni represent an MR dimensional noise vector.An MR x MT to recognize keystrokes by leveraging the observation that MIMO system at any time instant can be represented by the different keys of a given keyboard produce slightly dif- following equation. ferent sounds during regular typing [1].They used back- Yi=H:Xi+Wii∈1,Se (1) propagation neural network for keystroke recognition and fast fourier transform (FFT)of the time window of every In the equation above,the MR x Mr dimensional channel keystroke peak as features for training the classifiers.Zhuang matrix Hi represents the Channel State Information (CSI) et al.proposed another scheme that recognizes keystrokes for the sub-carrier i.Any two communicating WiFi devices based on the sounds generated during key presses [2].They estimate this channel matrix Hi for every subcarrier by reg-
CSI values by treating the people reflecting the WiFi signals as “virtual antennas” [9]. Wang et al. proposed E-eyes that exploits CSI values for recognizing household activities such as washing dishes and taking a shower [7]. Nandakumar et al. leverage the CSI and RSS information from off-the-shelf WiFi devices to classify four arm gestures - push, pull, lever, and punch [13]. The fundamental difference between these schemes and our scheme is that these schemes extract coarse grained features from the CSI values provided by the COTS WiFi NIC to perform these tasks while our proposed scheme refines these CSI to capture fine grained variations in the wireless channel for recognizing keystrokes. Wang et al. propose WiHear that uses CSI values recognizes the shape of mouth while speaking to detect whether a person is uttering one of a set of nine predefined nine syllables [10]. While WiHear can capture the micro-movements of lips, it uses special purpose directional antennas with stepper motors for directing the antenna beams towards a person’s mouth to obtain a clean signal for recognizing mouth movements. In contrast, our proposed scheme does not use any special purpose equipment and recognizes the micro-movements of fingers and hands using COTS WiFi NIC. SDR Based: Researchers have proposed schemes that utilize SRDs and special purpose hardware to transmit and receive custom modulated signals for activity recognition [17–20]. Pu et al. proposed WiSee that uses a special purpose receiver design on USRPs to extract small Doppler shifts from OFDM WiFi transmissions to recognize human gestures [17]. Kellogg et al. proposed to use a special purpose analog envelop detector circuit for recognizing gestures within a distance of up to 2.5 feet using backscatter signals from RFID or TV transmissions [18] . Lyonnet et al. use micro Doppler signatures to classify gaits of human subjects into multiple categories using specialized Doppler radars [19]. Adib et al. proposed WiTrack that uses a specially designed frequency modulated carrier wave radio frontend to track human movements behind a wall [20]. Recently, Chen et al. proposed an SDR based custom receiver design which can be used to track keystrokes using wireless signals [21]. In contrast to all these schemes, our scheme does not use any specialized hardware or SDRs rather utilizes COTS WiFi NICs to recognize keystrokes. 2.2 Keystrokes Recognition To the best of our knowledge, there is no prior work on recognizing keystrokes by leveraging variations in wireless signals using commodity WiFi devices. Other than the SDRs based keystroke tracking approach proposed in [21] which uses wireless signals to track keystrokes, researchers have proposed several keystrokes recognition schemes that are based on other sensing modalities such as acoustics [1–3,22], electromagnetic emissions [4], and video cameras [5]. Next, we give a brief overview of the other existing schemes that utilize these sensing modalities to recognize keystrokes. Acoustics Based: Asonov et al. proposed a scheme to recognize keystrokes by leveraging the observation that different keys of a given keyboard produce slightly different sounds during regular typing [1]. They used backpropagation neural network for keystroke recognition and fast fourier transform (FFT) of the time window of every keystroke peak as features for training the classifiers. Zhuang et al. proposed another scheme that recognizes keystrokes based on the sounds generated during key presses [2]. They used cepstrum features [22] instead of FFT as keystroke features and used unsupervised learning with language model correction on the collected features before using them for supervised training and recognition of different keystrokes. Zhu et al. proposed a context-free geometry-based approach for recognizing keystrokes that leverage the acoustic emanations from keystrokes to first calculate the time difference of keystroke arrival and then estimate the physical locations of the keystrokes to identify which keys are pressed [3]. Electromagnetic Emissions Based Vuagnoux et al. used a USRP to capture the electromagnetic emanations while pressing the keys [4]. These electromagnetic emanations originated from the electrical circuit underneath each key in conventional keyboards. The authors proposed to capture the entire raw electromagnetic spectrum and process it to recognize the keystrokes. Unfortunately, this scheme is highly susceptible to background electromagnetic noise that exists in almost all environments these days such as due to microwave ovens, refrigerators, and televisions. Video Camera Based Balzarotti et al. proposed ClearShot that processes the video of a person typing to reconstruct the sentences (s)he types [5]. The authors propose to use context and language sensitive analysis for reconstructing the sentences. 3. CHANNEL STATE INFORMATION Modern WiFi devices that support IEEE 802.11n/ac standard typically consist of multiple transmit and multiple receive antennas and thus support MIMO. Each MIMO channel between each transmit-receive (TX-RX) antenna pair of a transmitter and receiver comprises of multiple subcarriers. These WiFi devices continuously monitor the state of the wireless channel to effectively perform transmit power allocations and rate adaptations for each individual MIMO stream such that the available capacity of the wireless channel is maximally utilized [23]. These devices quantify the state of the channel in terms of CSI values. The CSI values essentially characterize the Channel Frequency Response (CFR) for each subcarrier between each transmit-receive (TX-RX) antenna pair. As the received signal is the resultant of constructive and destructive interference of several multipath signals scattered from the walls and surrounding objects, the disturbances caused by movement of hands and fingers while typing on a keyboard near the WiFi receiver not only lead to changes in previously existing multipaths but also to the creation of new multipaths. These changes are captured in the CSI values for all subcarriers between every TX-RX antenna pair and can then be used to recognize keystrokes. Let MT denote the number of transmit antennas, MR denote the number of receive antennas and Sc denote the number of OFDM sub-carriers. Let Xi and Yi represent the MT dimensional transmitted signal vector and MR dimensional received signal vector, respectively, for subcarrier i and let Ni represent an MR dimensional noise vector. An MR ×MT MIMO system at any time instant can be represented by the following equation. Yi = HiXi + Ni i ∈ [1, Sc] (1) In the equation above, the MR × MT dimensional channel matrix Hi represents the Channel State Information (CSI) for the sub-carrier i. Any two communicating WiFi devices estimate this channel matrix Hi for every subcarrier by reg-
ularly transmitting a known preamble of OFDM symbols between each other.For each Tx-Rx antenna pair,the driver of our Intel 5300 WiFi NIC reports CSI values for Se=30 OFDM subcarriers of the 20 MHz WiFi Channel [24].This leads to 30 matrices with dimensions MR x Mr per CSI sample. 4.NOISE REMOVAL 10 500 1000 500 The CSI values provided by commodity WiFi NICs are Sample inherently noisy because of the frequent changes in internal (a)Original time series (b)Filtered time series CSI reference levels,transmit power levels,and transmis- Figure 2:Original and filtered CSI time series sion rates.To use CSI values for recognizing keystrokes,such noise must first be removed from the CSI time series.For while a user was repeatedly pressing a key.We observe from this,WiKey first passes the CSI time series from a low- this figure that all subcarriers show correlated variations in pass filter to remove high frequency noises.Unfortunately,a their time series when the user presses the keys.The sub simple low pass filter does not denoise the CSI values very ef- carriers that are closely spaced in frequency show identical ficiently.Although strict low-pass filtering can remove noise variations whereas the subcarriers that farther away in fre- further,it causes loss of useful information from the signal as quency show non-identical changes.Despite non-identical well.To extract useful signal from the noisy CSI time series. changes,a strong correlation still exists even across the sub- WiKey leverages our observation that the variations in the carriers that are far apart in frequency.WiKey leverages this CSI time series of all subcarriers due to the movements of correlation and calculates the principal components from all hands and fingers are correlated.Therefore,it applies Prin- CSI time series.It then chooses those principal components cipal Component Analysis(PCA)on the filtered subcarriers that represent the most common variations among all CSI to extract the signals that only contain variations caused by time series movements of hands.Next,we first describe the process of applying the low-pass filter on the CSI time series and then explain how Wikey extracts hand and finger movement sig- nal using our PCA based approach. 2000 200 000 4.1 Low Pass Filtering The frequency of variations caused due to the movements of hands and fingers lie at the low end of the spectrum while the frequency of the noise lies at the high end of the spectrum.To remove noise in such a situation,Butterworth low-pass filter is a natural choice which does not signific- 2000 4000.300080 antly distort the phase information in the signal and has a maximally flat amplitude response in the passband and thus does not distort the hand and finger movement signal much.WiKey applies the Butterworth filter on the CSI time series of all subcarriers in each TX-RX antenna pair so that every stream experiences similar effects of phase distortion and group delay introduced by the filter.Although this pro- 2000 20o0098.60 8000 cess helps in removing some high frequency noise,the noise (a)#1,2,3,4,5 (b)#5.10,15,20,25 is not completely eliminated because Butterworth filter has slightly slow fall off gain in the stopband. Figure 3:Correlated variations in subcarriers We observed experimentally that the frequencies of the variations in CSI time series due to hand and finger move- There are two main advantages of using PCA.First,PCA ments while typing approximately lie anywhere between 3Hz reduces the dimensionality of the CSI information obtained to 80 Hz.As we sample CSI values at a rate of F=2500 from the 30 subcarriers in each TX-RX stream,which is samples/s,we set the cut-off frequency we of the Butter- useful because using information from all subcarriers for rthfilter at:三学==≈02rad/s.Figure keystroke extraction and recognition significantly increases 2(a)shows the amplitudes of the unfiltered CSI waveform the computational complexity of the scheme.Consequently. of a keystroke and Figure 2(b)shows the resultant from the PCA automatically enables Wikey to obtain the signals that Butterworth filter.We observe that Butterworth filter suc- are representative of hand and finger movements,without cessfully removes most of the bursty noises from the CSI having to devise new techniques and define new parameters waveforms for selecting appropriate subcarriers for further processing. Second.PCA helps in removing noise from the signals by 4.2 PCA Based Filtering taking advantage of correlated varations in CSI time series We observed experimentally that the movements of hands of different subcarriers.It removes the uncorrelated noisy and fingers results in correlated changes in the CSI time components,which can not be removed through traditional series for each subcarrier in every transmit-receive antenna low pass filtering.This PCA based noise reduction is one pair.Figure 3 plots the amplitudes of CSI time series of 10 of the major reasons behind high keystroke extraction and different subcarriers for one transmit-receive antenna pair recognition accuracies of our scheme
ularly transmitting a known preamble of OFDM symbols between each other. For each Tx-Rx antenna pair, the driver of our Intel 5300 WiFi NIC reports CSI values for Sc = 30 OFDM subcarriers of the 20 MHz WiFi Channel [24]. This leads to 30 matrices with dimensions MR × MT per CSI sample. 4. NOISE REMOVAL The CSI values provided by commodity WiFi NICs are inherently noisy because of the frequent changes in internal CSI reference levels, transmit power levels, and transmission rates. To use CSI values for recognizing keystrokes, such noise must first be removed from the CSI time series. For this, WiKey first passes the CSI time series from a lowpass filter to remove high frequency noises. Unfortunately, a simple low pass filter does not denoise the CSI values very ef- ficiently. Although strict low-pass filtering can remove noise further, it causes loss of useful information from the signal as well. To extract useful signal from the noisy CSI time series, WiKey leverages our observation that the variations in the CSI time series of all subcarriers due to the movements of hands and fingers are correlated. Therefore, it applies Principal Component Analysis (PCA) on the filtered subcarriers to extract the signals that only contain variations caused by movements of hands. Next, we first describe the process of applying the low-pass filter on the CSI time series and then explain how WiKey extracts hand and finger movement signal using our PCA based approach. 4.1 Low Pass Filtering The frequency of variations caused due to the movements of hands and fingers lie at the low end of the spectrum while the frequency of the noise lies at the high end of the spectrum. To remove noise in such a situation, Butterworth low-pass filter is a natural choice which does not significantly distort the phase information in the signal and has a maximally flat amplitude response in the passband and thus does not distort the hand and finger movement signal much. WiKey applies the Butterworth filter on the CSI time series of all subcarriers in each TX-RX antenna pair so that every stream experiences similar effects of phase distortion and group delay introduced by the filter. Although this process helps in removing some high frequency noise, the noise is not completely eliminated because Butterworth filter has slightly slow fall off gain in the stopband. We observed experimentally that the frequencies of the variations in CSI time series due to hand and finger movements while typing approximately lie anywhere between 3Hz to 80 Hz. As we sample CSI values at a rate of Fs = 2500 samples/s, we set the cut-off frequency ωc of the Butterworth filter at ωc = 2π∗f Fs = 2π∗80 2500 ≈ 0.2 rad/s. Figure 2(a) shows the amplitudes of the unfiltered CSI waveform of a keystroke and Figure 2(b) shows the resultant from the Butterworth filter. We observe that Butterworth filter successfully removes most of the bursty noises from the CSI waveforms. 4.2 PCA Based Filtering We observed experimentally that the movements of hands and fingers results in correlated changes in the CSI time series for each subcarrier in every transmit-receive antenna pair. Figure 3 plots the amplitudes of CSI time series of 10 different subcarriers for one transmit-receive antenna pair 0 500 1000 1500 10 11 12 13 14 15 16 17 18 19 Sample Amplitude (a) Original time series 0 500 1000 1500 11 12 13 14 15 16 17 Sample Amplitude (b) Filtered time series Figure 2: Original and filtered CSI time series while a user was repeatedly pressing a key. We observe from this figure that all subcarriers show correlated variations in their time series when the user presses the keys. The subcarriers that are closely spaced in frequency show identical variations whereas the subcarriers that farther away in frequency show non-identical changes. Despite non-identical changes, a strong correlation still exists even across the subcarriers that are far apart in frequency. WiKey leverages this correlation and calculates the principal components from all CSI time series. It then chooses those principal components that represent the most common variations among all CSI time series. 2000 4000 6000 8000 1.8 2 2.2 2.4 2.6 2000 4000 6000 8000 3 4 Absolute Value 0 2000 4000 6000 8000 7 8 9 0 2000 4000 6000 8000 9 10 11 12 13 0 2000 4000 6000 8000 12 14 16 Sample (a) # 1,2,3,4,5 2000 4000 6000 8000 12 14 16 2000 4000 6000 8000 12 14 16 0 2000 4000 6000 8000 9 10 0 2000 4000 6000 8000 18 20 22 0 2000 4000 6000 8000 2 2.5 3 Sample (b) # 5,10,15,20,25 Figure 3: Correlated variations in subcarriers There are two main advantages of using PCA. First, PCA reduces the dimensionality of the CSI information obtained from the 30 subcarriers in each TX-RX stream, which is useful because using information from all subcarriers for keystroke extraction and recognition significantly increases the computational complexity of the scheme. Consequently, PCA automatically enables WiKey to obtain the signals that are representative of hand and finger movements, without having to devise new techniques and define new parameters for selecting appropriate subcarriers for further processing. Second, PCA helps in removing noise from the signals by taking advantage of correlated varations in CSI time series of different subcarriers. It removes the uncorrelated noisy components, which can not be removed through traditional low pass filtering. This PCA based noise reduction is one of the major reasons behind high keystroke extraction and recognition accuracies of our scheme
5.KEYSTROKE EXTRACTION --PCA 2 WiKey segments the CSI time series to extract the CSI 6 PCA 3 --PCA4 waveforms for individual keystrokes.For this,WiKey oper- ates on the CSI time series resulting from the butterworth filtering.Let Ht.r(i)be an Sex 1 dimensional vector contain- ing the CSI values of the Sc subcarriers between an arbitrary TX-RX antenna pair t-r for theith CSI sample.Let H. be an N x S.dimensional matrix containing the CSI values of the Se subcarriers between an arbitrary TX-RX antenna pair t-r for N consecutive CSI samples.This matrix is given by the following equation. 00 2000 3000 4000 5000 6000 7000 Sample .=[H:.-(1)IH..-(2)/H..r(3)1...Hr(N)] (2) (a)Top 4 projections The columns of the matrix Ht.r represent the CSI time series -PCA 2 PCA 3 for each OFDM subcarrier.To detect the starting and end- ”6 --PCA4 ing points of any arbitrary key,WiKey first normalizes the 5 H.r matrix such that every CSI stream has zero mean and unit variance.We denote the normalized version of Ht.by Zt.r.WiKey then performs the PCA based dimensionality reduction and denoising(as described in Section 4.2)on Zt. and the resultant waveforms are further processed to detect the starting and ending points of the keystrokes from this particular TX-RX antenna pair.WiKey repeats this pro- cess on the CSI time series for all antenna pairs and obtains 1000200030004000.500060007000 Sample values for starting and ending points for keys based on the CSI time series from each antenna pair one by one.Finally. (b)Projections 2,3 4 WiKey combines the starting and ending points obtained Figure 4:PCA of Z-normalized CSI stream Zr from all TX-RX antenna pairs to calculate a robust estimate of starting and ending points of the time windows contain- component,we essentially remove the most noisy projection ing those keystrokes.Next we explain these steps in more among the all 4 projections of Zt.r. detail. 5.2 Keystroke Detection 5.1 PCA on Normalized Stream Although existing DFAR schemes propose techniques to LetΦ:pl}be an Se×p dimensional matrix that contains automatically detect the start and end of activities,they the top p principal components obtained from PCA on Zt.r. can not be directly adapted for use in detecting the start We remove the first component from those top p principal and end of keystrokes.Existing schemes use simple threshold components based on our observation that the first compon- based algorithms for detecting the start and end of activit- ent captures majority of the noise,while subsequent com- ies.While,threshold based schemes work well for macro- ponents contain information about movements of hands and movements,they are not well suited for micro-movements fingers while typing.This happens because PCA ranks prin- such as those of hands and fingers while typing,where we cipal components in descending order of their variance,due need to precisely segment time series of keystrokes that are to which the noisy components with higher variance gets closely spaced in time.Unlike general purpose threshold ranked among top principal components.Due to correlated based algorithms,we propose a keystroke detection al- nature of variations in multiple CSI time series,the removal gorithm that provides better detection accuracy,since it of this PCA component does not lead to any significant in- is strictly based on the experimentally observed shapes of formation loss as remaining PCA components still contain different keystroke waveforms.The intuition behind our al- enough information required for successfully detecting start- gorithm is that the CSI time series of every keystroke shows ing and ending points of the keystrokes. a typical increasing and decreasing trend in rates of change If we exclude the first component,the projection of the in CSI time series,similar to the one shown in Figure 2.To CSI stream Zt.r of t-r transmit-receive antenna pair onto the detect such increase and decrease in rates of change in CSI remaining principal components 2can then be written time series,our algorithm uses a moving window approach to as: detect the increasing and decreasing trends in rates of change Zp}=Zr×Φp} in all p-1 time series for each transmit-receive antenna pair (3) i.e.,on each column of.Our algorithm detects the start- where Zf2)is an Nx(p-1)dimensional matrix contain- ing and ending points of keystrokes in following six steps. ing the projected CSI streams in its columns.We choose First,the algorithm calculates the mean absolute devi- the p=4 in our implementation based on our observation ation (MAD)for each of the p-1 time series for each win- that only top 4 principal components contained most signi- dow of size W at j-th iteration.This is done primarily to ficant variations in CSI values caused by different keystrokes. detect the extent of variations in the values of a given time Figure 4(a)shows the result of projecting normalized CSI series.The main reason behind choosing MAD instead of time series Zt.r onto its top 4 principal components.We ob- variance is that in calculating,the deviations from the mean serve from Figure 4(b)that by removing the first principle are squared which gives more weight to extreme values.In
5. KEYSTROKE EXTRACTION WiKey segments the CSI time series to extract the CSI waveforms for individual keystrokes. For this, WiKey operates on the CSI time series resulting from the butterworth filtering. Let Ht,r(i) be an Sc×1 dimensional vector containing the CSI values of the Sc subcarriers between an arbitrary TX-RX antenna pair t − r for the i th CSI sample. Let Ht,r be an N × Sc dimensional matrix containing the CSI values of the Sc subcarriers between an arbitrary TX-RX antenna pair t − r for N consecutive CSI samples. This matrix is given by the following equation. Ht,r = [Ht,r(1)|Ht,r(2)|Ht,r(3)|...|Ht,r(N)]T (2) The columns of the matrix Ht,r represent the CSI time series for each OFDM subcarrier. To detect the starting and ending points of any arbitrary key, WiKey first normalizes the Ht,r matrix such that every CSI stream has zero mean and unit variance. We denote the normalized version of Ht,r by Zt,r. WiKey then performs the PCA based dimensionality reduction and denoising (as described in Section 4.2) on Zt,r and the resultant waveforms are further processed to detect the starting and ending points of the keystrokes from this particular TX-RX antenna pair. WiKey repeats this process on the CSI time series for all antenna pairs and obtains values for starting and ending points for keys based on the CSI time series from each antenna pair one by one. Finally, WiKey combines the starting and ending points obtained from all TX-RX antenna pairs to calculate a robust estimate of starting and ending points of the time windows containing those keystrokes. Next we explain these steps in more detail. 5.1 PCA on Normalized Stream Let Φ {1:p} Z be an Sc × p dimensional matrix that contains the top p principal components obtained from PCA on Zt,r. We remove the first component from those top p principal components based on our observation that the first component captures majority of the noise, while subsequent components contain information about movements of hands and fingers while typing. This happens because PCA ranks principal components in descending order of their variance, due to which the noisy components with higher variance gets ranked among top principal components. Due to correlated nature of variations in multiple CSI time series, the removal of this PCA component does not lead to any significant information loss as remaining PCA components still contain enough information required for successfully detecting starting and ending points of the keystrokes. If we exclude the first component, the projection of the CSI stream Zt,r of t-r transmit-receive antenna pair onto the remaining principal components Φ {2:p} Z can then be written as: Z {2:p} t,r = Zt,r × Φ {2:p} Z (3) where Z {2:p} t,r is an N × (p − 1) dimensional matrix containing the projected CSI streams in its columns. We choose the p = 4 in our implementation based on our observation that only top 4 principal components contained most signi- ficant variations in CSI values caused by different keystrokes. Figure 4(a) shows the result of projecting normalized CSI time series Zt,r onto its top 4 principal components. We observe from Figure 4(b) that by removing the first principle 1000 2000 3000 4000 5000 6000 7000 −1 0 1 2 3 4 5 6 7 Sample Projected CSI values PCA 1 PCA 2 PCA 3 PCA 4 (a) Top 4 projections 1000 2000 3000 4000 5000 6000 7000 −1 0 1 2 3 4 5 6 7 Sample Projected CSI values PCA 2 PCA 3 PCA 4 (b) Projections 2, 3 & 4 Figure 4: PCA of Z-normalized CSI stream Zt,r component, we essentially remove the most noisy projection among the all 4 projections of Zt,r. 5.2 Keystroke Detection Although existing DFAR schemes propose techniques to automatically detect the start and end of activities, they can not be directly adapted for use in detecting the start and end of keystrokes. Existing schemes use simple threshold based algorithms for detecting the start and end of activities. While, threshold based schemes work well for macromovements, they are not well suited for micro-movements such as those of hands and fingers while typing, where we need to precisely segment time series of keystrokes that are closely spaced in time. Unlike general purpose threshold based algorithms, we propose a keystroke detection algorithm that provides better detection accuracy, since it is strictly based on the experimentally observed shapes of different keystroke waveforms. The intuition behind our algorithm is that the CSI time series of every keystroke shows a typical increasing and decreasing trend in rates of change in CSI time series, similar to the one shown in Figure 2. To detect such increase and decrease in rates of change in CSI time series, our algorithm uses a moving window approach to detect the increasing and decreasing trends in rates of change in all p−1 time series for each transmit-receive antenna pair i.e., on each column of Z 2:p t,r . Our algorithm detects the starting and ending points of keystrokes in following six steps. First, the algorithm calculates the mean absolute deviation (MAD) for each of the p − 1 time series for each window of size W at j-th iteration. This is done primarily to detect the extent of variations in the values of a given time series. The main reason behind choosing MAD instead of variance is that in calculating, the deviations from the mean are squared which gives more weight to extreme values. In
cases where a time series contains outliers,this results in from our volunteers,we observed that on average the wave undue weight given to those outlying values and that sig- forms of a keystroke spanned tavg650 data points and nificantly corrupts the measure of deviation.The MAD is average number of data points between arrival of two con- calculated using following equation. secutive keystrokes was At 1250 data points at the CSI sampling rate of F.=2500 samples/s.We empirically △m因= ∑法”2同)-zG:j+w1 determined appropriate values for the remaining constants (4 W including W,Du:Iu,a,B,Bieft,Bright,Thresh and Pavg. where(j+W)represents the vector of means of the 5.3 Combining Results from Antenna Pairs kth projected CSI stream in j-th window.It calculates the As mentioned earlier,we obtain the starting points of key- value of Am;for each sample point i and for the principle strokes independently from each TX-RX antenna pair.Let components2≤k≤p. Se.r represent the set containing the starting points of all Second,the algorithm adds the mean absolute deviations keystrokes obtained from the keystroke detection algorithm in each waveform to calculate a combined measure AMj applied on the antenna pair t-r.First,we obtain the set of MAD in all p-1 waveforms,which is calculated in the St.r for each t-r pair.Second,we take the average of all following equation. the starting points that are within Atavg of each other in all sets S.r to obtain a robust estimate of starting points of △M,= △m[ (5) keystrokes.Third,based on experimentally measured aver- k三2 age span tavg of different keystrokes,we calculate the ending Third,the algorithm compares AM;to a heuristically set points of all keystrokes by simply adding tavg to the corres- ponding starting point. threshold Thresh.Let =AMi-Thresh,then 6j>0 shows that the current window i contains significant vari- 5.4 Extracting Keystroke Waveforms ations in CSI amplitudes. Once the algorithm calculates the set of starting and cor- Fourth,the algorithm compares &to its value in last win- responding ending points for keystrokes,we use those points dow 6j-1 to detect increasing or decreasing trend in detec- ted variations.When 6;-6j-1>0,there is an increasing to extract the waveforms from CSI matrix H.r.Let Km.t. represent the CSI waveform of mth keystroke extracted from trend in the rate of change in combined MAD (AM;)of CSI the antenna pair t-r.Let 3m represent the average of the time series and vice versa.These increasing and decreas- starting points for the mthkeystroke from all antenna pairs. ing trends are captured in variables iu and du,respectively We can express Km.t.r in terms of Ht.r follows. The algorithm increments the value of iu by 1 whenever 6j-6-1 0 and du by 1 whenever 6i-6j-1<0.Let Km.t.r Ht,r(3m3m+taug) (8) o represent forgetting factor,which is used to "forget"the After extracting the CSI waveforms Km.t.r from all sub- variations caused by noise to avoid false positives.To forget carriers of the t-r antenna pair,we apply PCA on those CSI such variations,the algorithm decrements both i and d by waveforms to remove the noisy components and obtain the 1 if AMj Thresh for a duration of oW. components that represent the variations caused by move- Fifth,as soon as the values of iu and du exceed empir- ically determined thresholds Iu and Du,respectively,the ments of hands and fingers. Unlike principle components derived from normalized algorithm detects the start of the keystroke.As soon as the streams,it is difficult to decide which PCA component rep- algorithm detects a keystroke,it estimates the starting point resents noise and should be removed from the top p principal sm and ending point em of the keystroke waveforms using following equations. components for the case of Km.t.r.The difficulty arises be- cause Km.t.r contains the set of waveforms for a specific Sm =j-BW-Bleft (6) keystroke instead of the whole CSI stream,due to which the variance of noisy component often becomes small.We em =j-BW +tavg Bright (7) observe that the noisy PCA component keeps changing po- where tavg is the average number of data points spanned by sitions between 1*t and 2nd place among the sorted PCA waveforms of different keystrokes,B is the span factor which components for different extracted keystroke waveforms.In determines the estimated starting point of the keystroke and order to get rid of this problem,we first project Km.t.r onto Bieft and Bright are guard intervals on both sides of the all top g principal components.Let be an Sexq di- estimated keystroke interval.The guard intervals ensure that mensional matrix that represent the top g principal compon- the detected keystroke waveforms are complete ents in Km.t.r obtained after applying PCA and Kg be Last,our algorithm calculates the sum of powers in all an L x q dimensional matrix containing the projected CSI waveforms lying within those starting and ending points streams in its columns,where L is the length of segmented and then compares this combined power with a sum power keystroke waveform.Thus,K is given by the following threshold (Pavg)to confirm the presence of a complete key- equation. stroke within that interval.This ensures that the training (9)】 models are built using only those waveforms which contain K=Kmr×Φq m.t.r complete shapes of the keystrokes.Once keystroke detec- In our implementation,we choose q =4.This choice is again tion is confirmed,the algorithm finally returns the starting based on the observation that the top 4 principal compon- point (sm)of the detected keystroke and jumps Ata data ents contain enough information about keystrokes required points ahead of sm to look for next keystroke,where Atvg to achieve high accuracy during classification. is the average number of data points between arrival of two To detect which waveform inKrepresents the noisy consecutive keystrokes.From the CSI data set we collected projection,we chose the top 2 projected waveforms and di-
cases where a time series contains outliers, this results in undue weight given to those outlying values and that significantly corrupts the measure of deviation. The MAD is calculated using following equation. △mj [k] = Pj+W i=j |Z {k} t,r (i) − Z {k} t,r (j : j + W)| W (4) where Z {k} t,r (j : j + W) represents the vector of means of the kth projected CSI stream in j-th window. It calculates the value of △mj for each sample point j and for the principle components 2 ≤ k ≤ p. Second, the algorithm adds the mean absolute deviations in each waveform to calculate a combined measure △Mj of MAD in all p − 1 waveforms, which is calculated in the following equation. △Mj = Xp k=2 △mj [k] (5) Third, the algorithm compares △Mj to a heuristically set threshold T hresh. Let δj = △Mj − T hresh, then δj > 0 shows that the current window j contains significant variations in CSI amplitudes. Fourth, the algorithm compares δj to its value in last window δj−1 to detect increasing or decreasing trend in detected variations. When δj − δj−1 > 0, there is an increasing trend in the rate of change in combined MAD (△Mj ) of CSI time series and vice versa. These increasing and decreasing trends are captured in variables iu and du, respectively. The algorithm increments the value of iu by 1 whenever δj − δj−1 > 0 and du by 1 whenever δj − δj−1 < 0. Let σ represent forgetting factor, which is used to “forget” the variations caused by noise to avoid false positives. To forget such variations, the algorithm decrements both iu and du by 1 if △Mj < T hresh for a duration of σW. Fifth, as soon as the values of iu and du exceed empirically determined thresholds Iu and Du, respectively, the algorithm detects the start of the keystroke. As soon as the algorithm detects a keystroke, it estimates the starting point sm and ending point em of the keystroke waveforms using following equations. sm = j − βW − Blef t (6) em = j − βW + tavg + Bright (7) where tavg is the average number of data points spanned by waveforms of different keystrokes, β is the span factor which determines the estimated starting point of the keystroke and Blef t and Bright are guard intervals on both sides of the estimated keystroke interval. The guard intervals ensure that the detected keystroke waveforms are complete. Last, our algorithm calculates the sum of powers in all waveforms lying within those starting and ending points and then compares this combined power with a sum power threshold (Pavg) to confirm the presence of a complete keystroke within that interval. This ensures that the training models are built using only those waveforms which contain complete shapes of the keystrokes. Once keystroke detection is confirmed, the algorithm finally returns the starting point (sm) of the detected keystroke and jumps △tavg data points ahead of sm to look for next keystroke, where △tavg is the average number of data points between arrival of two consecutive keystrokes. From the CSI data set we collected from our volunteers, we observed that on average the waveforms of a keystroke spanned tavg ≈ 650 data points and average number of data points between arrival of two consecutive keystrokes was △tavg ≈ 1250 data points at the CSI sampling rate of Fs = 2500 samples/s. We empirically determined appropriate values for the remaining constants including W, Du, Iu, σ, β, Blef t, Bright, T hresh and Pavg. 5.3 Combining Results from Antenna Pairs As mentioned earlier, we obtain the starting points of keystrokes independently from each TX-RX antenna pair. Let St,r represent the set containing the starting points of all keystrokes obtained from the keystroke detection algorithm applied on the antenna pair t − r. First, we obtain the set St,r for each t − r pair. Second, we take the average of all the starting points that are within △tavg of each other in all sets St,r to obtain a robust estimate of starting points of keystrokes. Third, based on experimentally measured average span tavg of different keystrokes, we calculate the ending points of all keystrokes by simply adding tavg to the corresponding starting point. 5.4 Extracting Keystroke Waveforms Once the algorithm calculates the set of starting and corresponding ending points for keystrokes, we use those points to extract the waveforms from CSI matrix Ht,r. Let Km,t,r represent the CSI waveform of mth keystroke extracted from the antenna pair t-r. Let sm represent the average of the starting points for the mth keystroke from all antenna pairs. We can express Km,t,r in terms of Ht,r follows. Km,t,r = Ht,r(sm : sm + tavg) (8) After extracting the CSI waveforms Km,t,r from all subcarriers of the t-r antenna pair, we apply PCA on those CSI waveforms to remove the noisy components and obtain the components that represent the variations caused by movements of hands and fingers. Unlike principle components derived from normalized streams, it is difficult to decide which PCA component represents noise and should be removed from the top p principal components for the case of Km,t,r. The difficulty arises because Km,t,r contains the set of waveforms for a specific keystroke instead of the whole CSI stream, due to which the variance of noisy component often becomes small. We observe that the noisy PCA component keeps changing positions between 1st and 2nd place among the sorted PCA components for different extracted keystroke waveforms. In order to get rid of this problem, we first project Km,t,r onto all top q principal components. Let Φ {1:q} K be an Sc × q dimensional matrix that represent the top q principal components in Km,t,r obtained after applying PCA and K {1:q} m,t,r be an L × q dimensional matrix containing the projected CSI streams in its columns, where L is the length of segmented keystroke waveform. Thus, K {1:q} m,t,r is given by the following equation. K {1:q} m,t,r = Km,t,r × Φ {1:q} K (9) In our implementation, we choose q = 4. This choice is again based on the observation that the top 4 principal components contain enough information about keystrokes required to achieve high accuracy during classification. To detect which waveform in K {1:q} m,t,r represents the noisy projection, we chose the top 2 projected waveforms and di-
vide each of them into R bins and calculate the variances classification process because waveforms contain hundreds in those bins.We then compare the variances calculated for of data points per keystroke.Therefore,we apply Discrete different bins of one waveform with the corresponding bins Wavelet Transform (DWT)to compress the extracted key- of the other waveform.The waveform that has larger num- stroke waveforms while preserving most of the time and fre- ber of higher variance bins is considered to be the noisy quency domain information. projection,which we remove from K to finally get1 The DWT of a discrete signal yln can be written in terms waveforms.Here we leverage the fact that although over- of wavelet basis functions as: all variance of a noisy projection may be smaller than the variance of other waveforms,but if the waveform is divided 1下 into appropriate number of smaller bins then the number of j=jo k bins in which the variance of the noisy projection is higher than the corresponding bins of other waveforms is always where L represents the length of signal yn].The functions larger.This is because the impact of noise is more dominant .k(n)are called scaling functions,where as the correspond- in smaller time windows compared to larger time windows. ing coefficients A(j,k)are known as scaling or approrimation We used R=10 in our implementation of WiKey. coefficients.Similarly,the functions ik(n)are known as PCA can lead to different ordering of principal compon- wavelet functions and the corresponding coefficients y(j,k) ents in waveforms of different keystrokes of the same key,be- are known as wavelet or detail coefficients.To calculate ap- cause the ordering of waveforms done by PCA is based solely proximation and detail coefficients,the scaling and wavelet on the value of their variance,which can change even if a functions are chosen such that they are orthonormal to each key is pressed in a slightly different way.This is problematic other.Thus,the following condition holds because to recognize the keystrokes,we need to compare the projections of an unseen key with the corresponding projec- (vj.k(n),pjo.m(n))=6j.jo6k.m tions of the keys in the training data.In order to minimize Based on the condition above,the approximation and detail the possibility of reordering,we order the projected key- coefficients calculated for j-th scale can then be written as: stroke waveforms in descending order of their peak to peak values before using the waveforms for feature extraction and A(i,)=(m,P+1,k(n)》= classifier training. 6.FEATURE EXTRACTION y(j,k)=(yn,5+1.k(n)》= 元∑响+k To differentiate between keystrokes,we need to extract To achieve desired compression using DWT,we need to features that can uniquely represent those keystrokes.As different keys on a keyboard are closely placed,standard fea- select appropriate wavelet and scaling filters.We tested the accuracy of our classifier using two different wavelet filters: tures such as maximum peak power,mean amplitude,root Daubechies and Symlets.We choose Daubechies D4 (four mean square deviation of signal amplitude,second/third coefficients per filter)wavelet and scaling filters because central moments,rate of change,signal energy or entropy. the models trained with the DWT features extracted us- and number of zero crossings cannot be used because ad- jacent keys give almost the same values for these features. ing these filters achieved higher classification accuracy.For each keystroke,we perform DWT 3 times on each one of Tables 1 and 2 show means and variances of some of these features calculated for 2"4 waveform in the extracted CSI- its (g-1)=3 waveforms,which is achieved by applying DWT on the approximation coefficients obtained from the waveforms for keystrokes of alphabetic keys pressed by a users.It can be observed that the values of these features previous steps.We choose to apply DWT 3 times because for different keys (for example'c'and 'd')come out to be this preserves enough details of those waveforms required for very similar.Looking at the means calculated for features successful classification while achieving maximum compres- like energy and number of zero crossings in Table 1,it seems sion.WiKey uses only the approximation coefficients as key- that they have different values for different keys.But as we stroke features and discards the detail coefficients because observe from Table 2,the variance of those features is high. approximation coefficients alone result in good classification Due to the reasons above,it becomes infeasible to use these accuracy.Therefore,we have 3 x Mr x MR keystroke shapes features for keystroke classification.Frequency analysis is for every keystroke,i.e.the approximation coefficients of all 3 waveforms extracted from the CSI time series in each TX- also not feasible because the frequency components in key- RX antenna pair,where Mr is the number of transmitting strokes of many different keys are similar.Another reason antennas and MR is the number of receiving antennas.Fig- behind inapplicability of frequency domain analysis is that they lead to complete loss of time domain information. ures 5(a)through 5(d)show feature extraction procedure performed on the 2nd keystroke waveforms for keys i'and From our data set,we have observed that although the 'o',extracted from TX-1,RX-1 antenna pair. frequency components in most keys are similar,they occur at different time instants for different keys.Therefore,we use shapes of the extracted keystroke waveforms as their features 7. CLASSIFICATION because the shapes retain both time and frequency domain After obtaining DWT based shape features of keystrokes, information of the waveforms and are thus more suited for WiKey builds training models for classification using them. use in classification.We observed experimentally that the As WiKey needs to compare shape features of different key- shapes of different keystroke waveforms were quite different strokes,we need a comparison metric that provides an effect- from each other,as shown by Figure 5(a)and 5(b). ive measure of similarity between shape features of two key- Directly using the extracted keystroke waveforms as key- strokes.WiKey uses a well-known method called dynamic stroke features leads to high computational costs in the time warping (DTW)that calculates the distance between
vide each of them into R bins and calculate the variances in those bins. We then compare the variances calculated for different bins of one waveform with the corresponding bins of the other waveform. The waveform that has larger number of higher variance bins is considered to be the noisy projection, which we remove from K {1:q} m,t,r to finally get q −1 waveforms. Here we leverage the fact that although overall variance of a noisy projection may be smaller than the variance of other waveforms, but if the waveform is divided into appropriate number of smaller bins then the number of bins in which the variance of the noisy projection is higher than the corresponding bins of other waveforms is always larger. This is because the impact of noise is more dominant in smaller time windows compared to larger time windows. We used R=10 in our implementation of WiKey. PCA can lead to different ordering of principal components in waveforms of different keystrokes of the same key, because the ordering of waveforms done by PCA is based solely on the value of their variance, which can change even if a key is pressed in a slightly different way. This is problematic because to recognize the keystrokes, we need to compare the projections of an unseen key with the corresponding projections of the keys in the training data. In order to minimize the possibility of reordering, we order the projected keystroke waveforms in descending order of their peak to peak values before using the waveforms for feature extraction and classifier training. 6. FEATURE EXTRACTION To differentiate between keystrokes, we need to extract features that can uniquely represent those keystrokes. As different keys on a keyboard are closely placed, standard features such as maximum peak power, mean amplitude, root mean square deviation of signal amplitude, second/third central moments, rate of change, signal energy or entropy, and number of zero crossings cannot be used because adjacent keys give almost the same values for these features. Tables 1 and 2 show means and variances of some of these features calculated for 2nd waveform in the extracted CSIwaveforms for keystrokes of alphabetic keys pressed by a users. It can be observed that the values of these features for different keys (for example ‘c’ and ‘d’) come out to be very similar. Looking at the means calculated for features like energy and number of zero crossings in Table 1, it seems that they have different values for different keys. But as we observe from Table 2, the variance of those features is high. Due to the reasons above, it becomes infeasible to use these features for keystroke classification. Frequency analysis is also not feasible because the frequency components in keystrokes of many different keys are similar. Another reason behind inapplicability of frequency domain analysis is that they lead to complete loss of time domain information. From our data set, we have observed that although the frequency components in most keys are similar, they occur at different time instants for different keys. Therefore, we use shapes of the extracted keystroke waveforms as their features because the shapes retain both time and frequency domain information of the waveforms and are thus more suited for use in classification. We observed experimentally that the shapes of different keystroke waveforms were quite different from each other, as shown by Figure 5(a) and 5(b). Directly using the extracted keystroke waveforms as keystroke features leads to high computational costs in the classification process because waveforms contain hundreds of data points per keystroke. Therefore, we apply Discrete Wavelet Transform (DWT) to compress the extracted keystroke waveforms while preserving most of the time and frequency domain information. The DWT of a discrete signal y[n] can be written in terms of wavelet basis functions as: y[n] = 1 √ L X k λ(j0, k)ϕj0,k(n) + 1 √ L X∞ j=j0 X k γ(j, k)ψj,k(n) where L represents the length of signal y[n]. The functions ϕj,k(n) are called scaling functions, where as the corresponding coefficients λ(j, k) are known as scaling or approximation coefficients. Similarly, the functions ψj,k(n) are known as wavelet functions and the corresponding coefficients γ(j, k) are known as wavelet or detail coefficients. To calculate approximation and detail coefficients, the scaling and wavelet functions are chosen such that they are orthonormal to each other. Thus, the following condition holds. hψj,k(n), ϕj0,m(n)i = δj,j0 δk,m Based on the condition above, the approximation and detail coefficients calculated for j-th scale can then be written as: λ(j, k) = hy[n], ϕj+1,k(n)i = 1 √ L X n y[n]ϕj+1,k(n) γ(j, k) = hy[n], ψj+1,k(n)i = 1 √ L X n y[n]ψj+1,k(n) To achieve desired compression using DWT, we need to select appropriate wavelet and scaling filters. We tested the accuracy of our classifier using two different wavelet filters: Daubechies and Symlets. We choose Daubechies D4 (four coefficients per filter) wavelet and scaling filters because the models trained with the DWT features extracted using these filters achieved higher classification accuracy. For each keystroke, we perform DWT 3 times on each one of its (q − 1) = 3 waveforms, which is achieved by applying DWT on the approximation coefficients obtained from the previous steps. We choose to apply DWT 3 times because this preserves enough details of those waveforms required for successful classification while achieving maximum compression. WiKey uses only the approximation coefficients as keystroke features and discards the detail coefficients because approximation coefficients alone result in good classification accuracy. Therefore, we have 3×MT ×MR keystroke shapes for every keystroke, i.e. the approximation coefficients of all 3 waveforms extracted from the CSI time series in each TXRX antenna pair, where MT is the number of transmitting antennas and MR is the number of receiving antennas. Figures 5(a) through 5(d) show feature extraction procedure performed on the 2nd keystroke waveforms for keys ‘i’ and ‘o’, extracted from TX-1, RX-1 antenna pair. 7. CLASSIFICATION After obtaining DWT based shape features of keystrokes, WiKey builds training models for classification using them. As WiKey needs to compare shape features of different keystrokes, we need a comparison metric that provides an effective measure of similarity between shape features of two keystrokes. WiKey uses a well-known method called dynamic time warping (DTW) that calculates the distance between
20 409 lemporal units (a)Keystroke waveforms for key i (b)Keystroke waveforms for key o (c)DWT features of i (d)DWT features of o from 24 waveforms from 2nd waveforms Figure 5:Feature extraction from 24 keystroke waveforms extracted from TX-1,RX-1 for I and O Table 1:Average values of features extracted from keystrokes of keys a-z collected from user 10 h 00 m 入 Mean amplitu 0024 nd central moment 03000100s302 0. hird central moment 002-00 000096-0010029 0.0609190.05001010.05 0.02 0.010.010.040.0060.0030.0980.101 0.0290.02 00.020.04 MS deviation027080278202s5050.4244079905060204720570.320.30.32☒0.290.4303137Q222Q4720.430.220.3350.3060.503045 Entropy97697629761697629769.76169766972976297@9.769.769.769.7629.769.7697616976297629729.7629.7629.7629.7696976 Zero Crossin869323622544554053420906013.7109.063.82.9855.635216.7515s1436.48101755 waveforms by performing optimal alignment between them. tenna pairs.To classify a detected keystroke,WiKey feeds Using DTW distance as the comparison metric between key- the shape features of that keystroke to their corresponding stroke shape features,WiKey trains an ensemble of k-nearest kNN classifiers and obtains a decision from each classifier in neighbour (kNN)classifiers using those features from all TX- the ensemble.Each kNN classifier searches for the majority RX antenna pairs.WiKey obtains decisions from each clas- class label among k nearest neighbors of the corresponding sifier in the ensemble and uses majority voting to obtain shape feature using DTW distance metric.WiKey calculates final result.Next,we first explain how we apply DTW on the final result through majority voting on the decisions of the keystroke shape features and then explain how we train all kNN classifiers in the ensemble. the ensemble of classifiers. 8.IMPLEMENTATION EVALUATION 7.1 Dynamic Time Warping DTW is a dynamic programming based solution for ob- 8.1 Hardware Setup taining minimum distance alignment between any two wave- We implemented our scheme using off-the-shelf hardware forms.DTW can handle waveforms of different lengths and devices.Specifically,we use a Lenovo X200 laptop with Intel allows a non-linear mapping of one waveform to another by Link 5300 WiFi NIC as the receiver that connects to the minimizing the distance between the two.In contrast to Eu- three antennas of the X200 laptop.The X200 laptop has clidean distance,DTW gives us intuitive distance between 2.26GHz Intel Core 2 Duo processor with 4GB of RAM and two waveforms by determining minimum distance warping Ubuntu 14.04 as its operating system.We used TP-Link TL- path between them even if they are distorted or shifted WR1043ND WiFi router as transmitter operating in 802.11n versions of each other.DTW distance is the Euclidean dis- AP mode at 2.4GHz.We collect the CSI values from the Intel tance of the optimal warping path between two waveforms 5300 NIC using a modified driver developed by Halperin et calculated under boundary conditions and local path con- al.[24.The transmitter has 2 antennas and the receiver has straints [25.In our experiments,DTW distance proves to 3 antennas,.元.e.,Mr=2 and MR=3.This gives3×2×3= be very effective metric for comparing two shape features of 18 classification models for each key in our evaluations. different keystrokes.WiKey uses the open source implement- We place the X200 laptop at a distance of 30 cm from ation of DTW in the Machine Learning Toolbox (MLT)by the keyboard such that the back side of its screen faces the Jang 26.WiKey uses local path constraints of 27,45,and keyboard on which the users type and its screen is within the 63 degrees while determining minimum cost warping path line-of-sight (LOS)of the WiFi router it is connected to.The between two waveforms.For the features extracted for keys distance of WiFi router from the target keyboard is 4 meters i'and 'o'shown in figures 5(a)and 5(b).the DTW distance The CSI values are measured on ICMP ping packets sent among features of key 'i'was 18.79 and the DTW distance from the WiFi router,i.e.,the TP-Link TL-WR1043ND.to among features of key 'o'was 19.44.However,the average the laptop at high data rate of about 2500 packets/s.Setting DTW distance between features of these keys was 44.2. a higher ping frequency leads to higher sampling rate of CSI. 7.2 Classifier Training which ensures that the time resolution of the CSI values is high enough for capturing maximum details of different type To maximize the advantage of having multiple shape fea- of keystrokes tures per keystroke obtained from multiple transmit-receive antenna pairs,we build separate classifiers for each of those 8.2 Data Collection shape features.We build an ensemble of 3 x Mr x MR clas- To evaluate the accuracy of Wikey,we collected train- sifiers using kNN classification scheme.WiKey requires the ing and testing dataset from 10 users.These 10 users were user to provide training data for the keystrokes to be recog- general university students who volunteered for the experi- nized and each classifier is trained using the corresponding ments and only 2 out of them had some know how of wire- features extracted from CSI time series from all TX-RX an- less communication.Users 1-9 first provided 30 samples for
0 200 400 600 800 −2 −1 0 1 2 3 Sample Value 0 200 400 600 800 −2 −1 0 1 2 3 sample Value 1 2 3 1 2 3 (a) Keystroke waveforms for key i 0 200 400 600 800 −2 −1 0 1 2 3 sample Value 0 200 400 600 800 −4 −2 0 2 4 6 Sample Value 1 2 3 1 2 3 (b) Keystroke waveforms for key o 0 50 100 −4 −3 −2 −1 0 1 2 temporal units Approximation Coeefficients (c) DWT features of i from 2nd waveforms 0 50 100 −4 −2 0 2 4 temporal units Approximation Coefficients (d) DWT features of o from 2nd waveforms Figure 5: Feature extraction from 2 nd keystroke waveforms extracted from TX-1, RX-1 for I and O Table 1: Average values of features extracted from keystrokes of keys a-z collected from user 10 Features a b c d e f g h i j k l m n o p q r s t u v w x y z Mean amplitude -0 -0.04 0.0124 -0.03 0.045 -0.043 -0.076 -0.06 0.014 -0.03 0.03 -0.01 -0 0.032 0.02 0.03 -0.012 0.008 0.054 7E-04 -0.013 -0.02 -0 -0.1 -0.02 0.06 Second central moment 0.08 0.133 0.0801 0.083 0.156 0.1818 0.6523 0.263 0.12 0.231 0.33 0.11 0.1 0.108 0.09 0.19 0.1022 0.051 0.245 0.192 0.062 0.12 0.097 0.26 0.09 0.21 Third central moment 0.02 -0.03 0.0036 -0.01 0.029 -0.06 -0.919 -0.05 -0.01 -0.1 0.05 0.02 -0 0.01 0.01 0.04 -0.006 0.003 0.098 -0.101 -0.01 0.029 0.023 -0 -0.02 0.04 RMS deviation 0.27 0.359 0.2782 0.285 0.385 0.4244 0.7899 0.506 0.332 0.472 0.57 0.32 0.3 0.323 0.29 0.43 0.3137 0.222 0.472 0.434 0.242 0.335 0.306 0.5 0.3 0.45 Energy 71.5 116.6 69.788 73.34 137.5 159.43 570.8 232.1 104.8 201.4 288 95.2 83.7 94.98 75.6 167 88.928 44.22 215.5 167.1 54.56 104.4 84.48 227 81.5 182 Entropy 9.76 9.762 9.7616 9.762 9.762 9.7616 9.7616 9.762 9.762 9.762 9.76 9.76 9.76 9.762 9.76 9.76 9.7616 9.762 9.762 9.762 9.762 9.762 9.762 9.76 9.76 9.76 Zero Crossings 11.8 6.913 12.363 6.225 6.4 4.375 4.075 3.4 12.08 9.088 6.05 13.7 10 9.063 13.8 12.9 11.85 15.41 6.35 12.85 16.75 11.88 14.3 6.48 10.1 7.55 waveforms by performing optimal alignment between them. Using DTW distance as the comparison metric between keystroke shape features, WiKey trains an ensemble of k-nearest neighbour (kNN) classifiers using those features from all TXRX antenna pairs. WiKey obtains decisions from each classifier in the ensemble and uses majority voting to obtain final result. Next, we first explain how we apply DTW on the keystroke shape features and then explain how we train the ensemble of classifiers. 7.1 Dynamic Time Warping DTW is a dynamic programming based solution for obtaining minimum distance alignment between any two waveforms. DTW can handle waveforms of different lengths and allows a non-linear mapping of one waveform to another by minimizing the distance between the two. In contrast to Euclidean distance, DTW gives us intuitive distance between two waveforms by determining minimum distance warping path between them even if they are distorted or shifted versions of each other. DTW distance is the Euclidean distance of the optimal warping path between two waveforms calculated under boundary conditions and local path constraints [25]. In our experiments, DTW distance proves to be very effective metric for comparing two shape features of different keystrokes. WiKey uses the open source implementation of DTW in the Machine Learning Toolbox (MLT) by Jang [26]. WiKey uses local path constraints of 27, 45, and 63 degrees while determining minimum cost warping path between two waveforms. For the features extracted for keys ‘i’ and ‘o’ shown in figures 5(a) and 5(b), the DTW distance among features of key ‘i’ was 18.79 and the DTW distance among features of key ‘o’ was 19.44. However, the average DTW distance between features of these keys was 44.2. 7.2 Classifier Training To maximize the advantage of having multiple shape features per keystroke obtained from multiple transmit-receive antenna pairs, we build separate classifiers for each of those shape features. We build an ensemble of 3 × MT × MR classifiers using kNN classification scheme. WiKey requires the user to provide training data for the keystrokes to be recognized and each classifier is trained using the corresponding features extracted from CSI time series from all TX-RX antenna pairs. To classify a detected keystroke, WiKey feeds the shape features of that keystroke to their corresponding kNN classifiers and obtains a decision from each classifier in the ensemble. Each kNN classifier searches for the majority class label among k nearest neighbors of the corresponding shape feature using DTW distance metric. WiKey calculates the final result through majority voting on the decisions of all kNN classifiers in the ensemble. 8. IMPLEMENTATION & EVALUATION 8.1 Hardware Setup We implemented our scheme using off-the-shelf hardware devices. Specifically, we use a Lenovo X200 laptop with Intel Link 5300 WiFi NIC as the receiver that connects to the three antennas of the X200 laptop. The X200 laptop has 2.26GHz Intel Core 2 Duo processor with 4GB of RAM and Ubuntu 14.04 as its operating system. We used TP-Link TLWR1043ND WiFi router as transmitter operating in 802.11n AP mode at 2.4GHz. We collect the CSI values from the Intel 5300 NIC using a modified driver developed by Halperin et al. [24]. The transmitter has 2 antennas and the receiver has 3 antennas, i.e., MT = 2 and MR = 3. This gives 3×2×3 = 18 classification models for each key in our evaluations. We place the X200 laptop at a distance of 30 cm from the keyboard such that the back side of its screen faces the keyboard on which the users type and its screen is within the line-of-sight (LOS) of the WiFi router it is connected to. The distance of WiFi router from the target keyboard is 4 meters. The CSI values are measured on ICMP ping packets sent from the WiFi router, i.e., the TP-Link TL-WR1043ND, to the laptop at high data rate of about 2500 packets/s. Setting a higher ping frequency leads to higher sampling rate of CSI, which ensures that the time resolution of the CSI values is high enough for capturing maximum details of different type of keystrokes. 8.2 Data Collection To evaluate the accuracy of WiKey, we collected training and testing dataset from 10 users. These 10 users were general university students who volunteered for the experiments and only 2 out of them had some know how of wireless communication. Users 1–9 first provided 30 samples for
Table 2:Variance of different features extracted from keystrokes of keys a-z collected from user 10 eatures Mean amplitude 2n4 0018 460440 2 0003000i n0omoi .000四 001g6 4.52319192.40210 93159 28 Zero Crossings 3196 1299433662r393 314 125 1525 92 122 177 each of the 37 keys(26 alphabets,10 digits and 1 space bar) We also observe from this figure that the kevstrokes that are by pressing that key multiple times.After this,these users missed are usually those for which fingers move very little typed the sentence S="the quick brown fox jumped over when typing.For example,in pressing keys 'a','d',f','i',j the lazy dog"two times,without spaces. and 'x'the hands and fingers move very little,and thus the To evaluate how the number of training samples impact variations in the CSI values sometimes go undetected.Fig- the accuracy,we collected 80 samples for each of the 37 keys ure 6(b)shows the keystroke extraction rate for each user from User 10.Afterwards,this user typed each of the follow- averaged over all 37 keys.The experimental results show ing sentences 5 times,without spaces:S ="the quick brown that our keystroke extraction algorithm is robust because it fox jumps over the lazy dog",S2="nobody knew why the consistently achieves high performance over different users candles blew out",S3="the autumn leaves look like golden without requiring any user specific tuning of system para- snow",S4 ="nothing is as profound as the imagination" meters. and Ss ="my small pet mouse escaped from his cage".We asked users to type naturally with multiple fingers but only 8.4 Classification Accuracy press one key at a time while keeping the average keystroke We evaluate the classification accuracy of Wikey through inter-arrival time at 1 second.After recording the CSI time two sets of experiments.In the first set of experiments,we series for each of the above experiments,we first applied our build classifiers for each of the 10 users using 30 samples keystroke extraction algorithm on those recorded CSI time and measure the 10-fold cross validation accuracy of those series to extract the CSI waveforms for individual keys and classifiers.In the second set of experiments,we build clas- then extracted the DWT based shape features from each of sifier for user 10 while increasing the number of samples the extracted keystroke waveforms. from 30 to 80 in order to observe the impact of increase in 8.3 Keystroke Extraction Accuracy the number of training samples on the classification accur- acy.Cross validation automatically picks a part of data for We evaluate the accuracy of our keystroke extraction al- training and remaining for testing and does not use any data gorithm in terms of the detection ratio,which is defined as in testing that was used in training.Recall that the WiKey the total number of correctly detected keystrokes in a CSI uses kNN classifiers for recognizing keys.In all of following time series divided by the total number of actual keystrokes. experiments,we set =15. The detection ratio of our proposed algorithm is more than 97.5%.Figure 6(a)shows the color map showing the percent- 8.4.1 Accuracy with 30 Samples per Key age of the missed keystrokes of all 37 keys for all 10 users. We evaluate the classification accuracy of WiKey in terms of average accuracy per key and average accuracy on all keys of any given user.We also present confusion matrices resulting from our experiments.A confusion matrix tells us which key was recognized by WiKey as which key with what percentage.We calculate the average accuracy per key by taking the average of confusion matrices obtained from all users and average accuracy on all keys of any given user by averaging the accuracy on all keys within the confusion (a)Colormap for missed keys (b)Keystroke extraction rates matrix of that user.For each user,we trained each classifier per user averaged over all keys using features from 30 samples of each key.We conducted Figure 6:Keystroke extraction results our experiments on all 37 keys as well as on only 26 alphabet keys and performed 10-fold cross validation to obtain the The darker areas represent higher rate of missed key- confusion matrices. strokes.We can observe from this figure that the number Wikey achieves an overall keystroke recognition accuracy of missed keystrokes vary for different individuals depend- of 82.87%in case of 37 keys and 83.46%in case of 26 al- ing upon their typing behaviors.For example we observed phabetic keys when averaged over all keys and users.Fig- that the keystrokes of user 4 were missed in higher per- ure 7 shows the recognition accuracy for each key across all centage with average detection ratio of 91.8%whereas the users for the 26 alphabetic keys.Similarly,Figure 8 shows keystrokes of user 10 were not missed at all with average de- the recognition accuracy for each key across all users for all tection ratio of 100%calculated over all 37 keys.The lower 37 keys.Figure 9 shows the average recognition accuracy extraction accuracy for user 4 shows that more keystrokes achieved by each user for both 26 keys and 37 keys.We ob- were missed.which is due to the significant difference in his serve that the recognition accuracy for 26 alphabetic keys is typing behavior compared to other users.The accuracy of on average greater than the recognition accuracy for the all our scheme for such a user can be increased significantly by 37 keys.This is because the keystroke waveforms of the digit tuning the parameters of our algorithm for the given user keys(0-9)often show similarity with keystroke waveforms of
Table 2: Variance of different features extracted from keystrokes of keys a-z collected from user 10 Features a b c d e f g h i j k l m n o p q r s t u v w x y z Mean amplitude 0.00029 4E-04 0.0003 1E-04 4E-04 0.0002 0.0003 8E-04 5E-04 5E-04 5E-04 2E-04 5E-04 3E-04 3E-04 5E-04 0.0003 0.00018 6E-04 4E-04 3E-04 1E-04 3E-04 4E-04 6E-04 4E-04 Second central moment 0.00513 0.003 0.0011 0.001 0.007 0.0028 0.1008 0.012 0.005 0.009 0.016 0.006 0.002 0.003 0.002 0.007 0.0017 0.00041 0.03 0.003 0.001 0.006 0.002 0.007 0.003 0.005 Third central moment 0.00155 9E-04 0.0001 2E-04 0.002 0.0033 0.7021 0.002 0.001 0.007 0.009 0.017 5E-04 3E-04 5E-04 0.003 0.0003 7.70E-05 0.024 0.003 1E-04 0.015 9E-04 0.001 6E-04 0.003 RMS deviation 0.0108 0.006 0.0031 0.004 0.011 0.0038 0.0348 0.011 0.011 0.01 0.012 0.009 0.006 0.005 0.004 0.008 0.0042 0.00196 0.026 0.004 0.004 0.008 0.004 0.007 0.007 0.007 Energy 3874.59 2283 816.91 912.4 5204 2160 76863 9315 3925 6846 12094 4883 1679 2153 1150 5048 1296.2 308.95 23201 2181 886.9 4714 1403 5166 2100 4147 Entropy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Zero Crossings 26.3859 12.36 33.196 12.94 9.433 6.2627 3.9943 3.585 44.91 13.14 12.58 31.14 28.51 15.25 29.24 24.09 21.673 17.3847 12.21 17.7 36.85 21.63 27.71 13.67 31.49 6.529 each of the 37 keys (26 alphabets, 10 digits and 1 space bar) by pressing that key multiple times. After this, these users typed the sentence S1 = “the quick brown fox jumped over the lazy dog” two times, without spaces. To evaluate how the number of training samples impact the accuracy, we collected 80 samples for each of the 37 keys from User 10. Afterwards, this user typed each of the following sentences 5 times, without spaces: S1 =“the quick brown fox jumps over the lazy dog”, S2 = “nobody knew why the candles blew out”, S3 = “the autumn leaves look like golden snow”, S4 = “nothing is as profound as the imagination” and S5 = “my small pet mouse escaped from his cage”. We asked users to type naturally with multiple fingers but only press one key at a time while keeping the average keystroke inter-arrival time at 1 second. After recording the CSI time series for each of the above experiments, we first applied our keystroke extraction algorithm on those recorded CSI time series to extract the CSI waveforms for individual keys and then extracted the DWT based shape features from each of the extracted keystroke waveforms. 8.3 Keystroke Extraction Accuracy We evaluate the accuracy of our keystroke extraction algorithm in terms of the detection ratio, which is defined as the total number of correctly detected keystrokes in a CSI time series divided by the total number of actual keystrokes. The detection ratio of our proposed algorithm is more than 97.5%. Figure 6(a) shows the color map showing the percentage of the missed keystrokes of all 37 keys for all 10 users. Users Keys SPa b c d e f g h i j k l mn o p q r s t u v w x y z 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 10 (a) Colormap for missed keys 1 2 3 4 5 6 7 8 9 10 86 88 90 92 94 96 98 Users Extraction rate (%) (b) Keystroke extraction rates per user averaged over all keys Figure 6: Keystroke extraction results The darker areas represent higher rate of missed keystrokes. We can observe from this figure that the number of missed keystrokes vary for different individuals depending upon their typing behaviors. For example we observed that the keystrokes of user 4 were missed in higher percentage with average detection ratio of 91.8% whereas the keystrokes of user 10 were not missed at all with average detection ratio of 100% calculated over all 37 keys. The lower extraction accuracy for user 4 shows that more keystrokes were missed, which is due to the significant difference in his typing behavior compared to other users. The accuracy of our scheme for such a user can be increased significantly by tuning the parameters of our algorithm for the given user. We also observe from this figure that the keystrokes that are missed are usually those for which fingers move very little when typing. For example, in pressing keys ‘a’, ‘d’, ‘f’, ‘i’, ‘j’ and ‘x’ the hands and fingers move very little, and thus the variations in the CSI values sometimes go undetected. Figure 6(b) shows the keystroke extraction rate for each user averaged over all 37 keys. The experimental results show that our keystroke extraction algorithm is robust because it consistently achieves high performance over different users without requiring any user specific tuning of system parameters. 8.4 Classification Accuracy We evaluate the classification accuracy of WiKey through two sets of experiments. In the first set of experiments, we build classifiers for each of the 10 users using 30 samples and measure the 10-fold cross validation accuracy of those classifiers. In the second set of experiments, we build classifier for user 10 while increasing the number of samples from 30 to 80 in order to observe the impact of increase in the number of training samples on the classification accuracy. Cross validation automatically picks a part of data for training and remaining for testing and does not use any data in testing that was used in training. Recall that the WiKey uses kNN classifiers for recognizing keys. In all of following experiments, we set k = 15. 8.4.1 Accuracy with 30 Samples per Key We evaluate the classification accuracy of WiKey in terms of average accuracy per key and average accuracy on all keys of any given user. We also present confusion matrices resulting from our experiments. A confusion matrix tells us which key was recognized by WiKey as which key with what percentage. We calculate the average accuracy per key by taking the average of confusion matrices obtained from all users and average accuracy on all keys of any given user by averaging the accuracy on all keys within the confusion matrix of that user. For each user, we trained each classifier using features from 30 samples of each key. We conducted our experiments on all 37 keys as well as on only 26 alphabet keys and performed 10-fold cross validation to obtain the confusion matrices. WiKey achieves an overall keystroke recognition accuracy of 82.87% in case of 37 keys and 83.46% in case of 26 alphabetic keys when averaged over all keys and users. Figure 7 shows the recognition accuracy for each key across all users for the 26 alphabetic keys. Similarly, Figure 8 shows the recognition accuracy for each key across all users for all 37 keys. Figure 9 shows the average recognition accuracy achieved by each user for both 26 keys and 37 keys. We observe that the recognition accuracy for 26 alphabetic keys is on average greater than the recognition accuracy for the all 37 keys. This is because the keystroke waveforms of the digit keys (0-9) often show similarity with keystroke waveforms of
alphabet keys in the keyboard row staring with QWE,which 100 leads to slightly greater number of misclassifications 7 n p g r s t u v w xz Keys Figure 10:Accuracy for keys A-Z from user 10 a b c d e f g h i j k I m n o p q r s t u v w x y z Keys Figure 7:Mean accuracy for keys A-Z(Users 1-10) SPa bc d e Keys Figure 11:Accuracy for all 37 keys from user 10 SPabedefgh i j k I mno pq rs tuvwxy z1234567890 Keys 8.4.3 Effects of CSI Sampling Rate and Number of Figure 8:Mean accuracy for all 37 keys(Users 1-10) Training Samples In previous experiments,we used high CSI sampling rate Alphabets,Digits and Space of 2500 samples/s.Furthermore,the 10-fold cross valida- tion automatically chose 10%of the data for testing and remaining 90%for training.Next,we evaluate the effect of changing the CSI sampling rate and the percentage of data used for training on accuracy.To extract keystrokes,we halved the values used for W,Du,Iu,Bieft,and Bright. We performed X-fold cross validation (2<X<10)on the data obtained for alphabetic keys from user 10.Figure 13 60 plots the accuracies for number of folds varying from 2 to 10 Users 10,where each plotted value if the average over all alpha- Figure 9:Per user average classifier accuracies bet keys.We observe from Figure 13 that the accuracies dropped compared to previously achieved accuracy because 8.4.2 Accuracy vs.the Size for Training Set of the drop in resolution of keystroke shapes due to reduced To determine the impact of the number for training sampling rate.We also observe that recognition accuracies samples on the accuracy of WiKey,we again perform two of the keys for which hands and fingers move little were affected the most.When 50%of data was used for train- sets of experiments:one for 26 alphabetic keys and other for all 37 keys. ing,i.e.,for 2-fold cross validation,the accuracies for keys The accuracy of Wikey increases when the number of j','x','v'and 'p'dropped below 60%.However,the average training samples per key are increased from 30 to 80.Figure accuracy remained approximately 80%for all folds. 10 shows the results from 10-fold cross validation for the 26 8.5 Real-world Evaluation on Sentences alphabetic keys when 80 training samples are used per key. We observe from this figure that the recognition accuracy To evaluate WiKey in real world scenarios,we collected increased from 88.3%(as seen in Figure 9)to 96.4%when CSI data for different sentences typed by users 1-10 as men- the number of training samples are increased from 30 to 80 tioned earlier in Section 8.2.For recognition of keystrokes in Figure 11 shows the results from 10-fold cross validation for sentences,we performed training using the dataset of indi- all 37 keys when 80 training samples are used per key.We vidual keystrokes and the keystrokes extracted from datasets again observe that the recognition accuracy increased from obtained from typing the sentences were used as test data. 85.95%(as seen in Figure 9)to 89.7%when the number of Wikey achieves an average keystroke recognition accur- training samples are increased from 30 to 80.The gray-scale acy of 77.43%for typed sentences when 30 training samples maps of the confusion matrix obtained after 10-fold cross- per key were used.For each user,we trained classifiers us- validation on 80 training samples of User 10 is shown in ing 30 samples for each of the 26 alphabetic keys.We then Figure 12. applied our keystroke extraction algorithm to first extract waveforms of individual keys,applied PCA on them to de- noise the waveforms and then extracted the shape features for each extracted key and feeded them to the classifiers to
alphabet keys in the keyboard row staring with QWE, which leads to slightly greater number of misclassifications. a b c d e f g h i j k l m n o p q r s t u v w x y z 65 70 75 80 85 90 Keys Accuracy (%) Figure 7: Mean accuracy for keys A-Z (Users 1-10) SPa b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6 7 8 9 0 65 70 75 80 85 90 Keys Accuracy (%) Figure 8: Mean accuracy for all 37 keys (Users 1-10) Figure 9: Per user average classifier accuracies 8.4.2 Accuracy vs. the Size for Training Set To determine the impact of the number for training samples on the accuracy of WiKey, we again perform two sets of experiments: one for 26 alphabetic keys and other for all 37 keys. The accuracy of WiKey increases when the number of training samples per key are increased from 30 to 80. Figure 10 shows the results from 10-fold cross validation for the 26 alphabetic keys when 80 training samples are used per key. We observe from this figure that the recognition accuracy increased from 88.3% (as seen in Figure 9) to 96.4% when the number of training samples are increased from 30 to 80. Figure 11 shows the results from 10-fold cross validation for all 37 keys when 80 training samples are used per key. We again observe that the recognition accuracy increased from 85.95% (as seen in Figure 9) to 89.7% when the number of training samples are increased from 30 to 80. The gray-scale maps of the confusion matrix obtained after 10-fold crossvalidation on 80 training samples of User 10 is shown in Figure 12. a b c d e f g h i j k l m n o p q r s t u v w x y z 70 75 80 85 90 95 100 Keys Accuracy (%) Figure 10: Accuracy for keys A-Z from user 10 SPa b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6 7 8 9 0 70 75 80 85 90 95 100 Keys Accuracy (%) Figure 11: Accuracy for all 37 keys from user 10 8.4.3 Effects of CSI Sampling Rate and Number of Training Samples In previous experiments, we used high CSI sampling rate of 2500 samples/s. Furthermore, the 10-fold cross validation automatically chose 10% of the data for testing and remaining 90% for training. Next, we evaluate the effect of changing the CSI sampling rate and the percentage of data used for training on accuracy. To extract keystrokes, we halved the values used for W, Du, Iu, Blef t, and Bright. We performed X−fold cross validation (2 ≤ X ≤ 10) on the data obtained for alphabetic keys from user 10. Figure 13 plots the accuracies for number of folds varying from 2 to 10, where each plotted value if the average over all alphabet keys. We observe from Figure 13 that the accuracies dropped compared to previously achieved accuracy because of the drop in resolution of keystroke shapes due to reduced sampling rate. We also observe that recognition accuracies of the keys for which hands and fingers move little were affected the most. When 50% of data was used for training, i.e., for 2-fold cross validation, the accuracies for keys ‘j’,‘x’,‘v’ and ‘p’ dropped below 60%. However, the average accuracy remained approximately 80% for all folds. 8.5 Real-world Evaluation on Sentences To evaluate WiKey in real world scenarios, we collected CSI data for different sentences typed by users 1-10 as mentioned earlier in Section 8.2. For recognition of keystrokes in sentences, we performed training using the dataset of individual keystrokes and the keystrokes extracted from datasets obtained from typing the sentences were used as test data. WiKey achieves an average keystroke recognition accuracy of 77.43% for typed sentences when 30 training samples per key were used. For each user, we trained classifiers using 30 samples for each of the 26 alphabetic keys. We then applied our keystroke extraction algorithm to first extract waveforms of individual keys, applied PCA on them to denoise the waveforms and then extracted the shape features for each extracted key and feeded them to the classifiers to