Table I. Comparison among side-channel-based keystroke inference methods.

| System | Attack Distance | Side-Channel Signal | Passive Listening | Continuous Typing | NLOS |
|---|---|---|---|---|---|
| Owusu et al. [3] | On device | IMU (Smartphone) | Yes | Yes | / |
| Liu et al. [4] | Wearable | IMU (Smartwatch) | Yes | Yes | / |
| Shukla et al. [1] | 5 meters | Video | Yes | Yes | No |
| Sun et al. [2] | 2 meters | Video | Yes | Yes | No |
| Asonov et al. [8] | 1 meter | Acoustic | Yes | Yes | / |
| Zhu et al. [6] | 40 centimeters | Acoustic | Yes | Yes | / |
| Wikey [10] | 30 centimeters | Wi-Fi | No | No | Yes |
| WindTalker [12] | 1.5 meters | Wi-Fi | No | No | Yes |
| SpiderMon | 5–15 meters | LTE | Yes | Yes | Yes |

separately by assuming that the user always returns to a given posture after each keystroke [10]. To handle continuous typing, we model the process as a Hidden Markov Model (HMM) and use the LTE signal to infer the transition between subsequent keystrokes. Third, the LTE signal contains both data transmissions and reference signals, so the raw data rate is 122.88 MBytes per second, which makes real-time data processing and logging a challenge. To enable long-term monitoring, we build a signal processing frontend running on a workstation that compresses the measurements to a rate of 800 kBytes per second, so that the results can be efficiently processed and stored in real time for hours.

Our experimental results show that SpiderMon can detect 95% of keystrokes at a distance of 15 meters. When the victim is behind a wall at a distance of 5 meters, SpiderMon can recover a 6-digit PIN input with a success rate of more than 51% within ten trials, and this accuracy is above 36% at 15 meters with line-of-sight.
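To make the HMM idea for continuous typing concrete, the following is a minimal sketch of Viterbi decoding over a keystroke sequence. It is not the authors' model: the function name `viterbi`, the choice of keys as hidden states, and all probabilities in the toy example are assumptions made purely for illustration. In a real system, the emission scores would come from a classifier applied to the LTE measurements.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely hidden-state (key) sequence of an HMM.

    log_init:  (S,)   log prior over the first key
    log_trans: (S, S) log P(next key j | current key i), indexed [i, j]
    log_emit:  (T, S) log-likelihood of observation t under each key
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]          # best log-score of paths ending in each key
    back = np.zeros((T, S), dtype=int)      # back-pointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: best path ending in i, then i -> j
        back[t] = np.argmax(cand, axis=0)   # best predecessor for each key j
        score = np.max(cand, axis=0) + log_emit[t]
    path = [int(np.argmax(score))]          # start from the best final key
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example with three hypothetical keys and made-up probabilities.
toy_init = np.log(np.full(3, 1 / 3))
toy_trans = np.log(np.full((3, 3), 1 / 3))
toy_emit = np.log(np.array([[0.8, 0.1, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.1, 0.1, 0.8]]))
toy_path = viterbi(toy_init, toy_trans, toy_emit)
print(toy_path)
```

With uniform transitions the decoder simply follows the strongest emission at each step. The benefit of the HMM appears when the transition matrix is informative, because an unlikely key-to-key movement can then override a weak per-keystroke observation.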
In summary, we have made the following contributions:

• To the best of our knowledge, we are the first to show that commercial 4G/5G cellular signals can be used for fine-grained human activity monitoring.

• We build a real-time cellular signal analysis system with Commercial Off-The-Shelf (COTS) USRP devices and workstations. Our system can process commercial LTE signals with a bandwidth of 20 MHz and extract 4,000 × 200 CRS samples per second in real time.

• We propose to leverage the HMM to infer continuous keystroke sequences. Our extensive evaluations on keystroke sequence inference show that this method outperforms the traditional individual keystroke recovery scheme.

II. RELATED WORK

We divide the existing related work into the following four areas: LTE physical layer measurements, Radio Frequency (RF)-based activity monitoring systems, keystroke inference attacks, and protection against RF-based attacks.

LTE Physical Layer Measurements: Existing LTE physical layer measurement tools mainly focus on the networking or ranging problem. LTE physical layer information, such as the Channel Quality Indicator (CQI), can be used in cross-layer design to improve the TCP throughput of the cellular network [18], [19]. The real-time LTE radio resource monitor (RMon) extracts the PHY-layer resource allocation information to help LTE video streaming [20]. LTEye uses a USRP N210 to decode LTE signals with a bandwidth of 10 MHz and to perform user localization [21]. Soft-LTE uses the Sora software radio to implement the LTE uplink with full bandwidth but does not implement the downlink [22]. Marco et al. [23] proposed a method for extracting time-of-arrival (TOA) information from LTE channel impulse response (CIR) signals and achieved an accuracy of 20 meters for vehicular position tracking. However, most of these systems [20], [21], [24] do not support real-time operation on the full 20 MHz LTE bandwidth.
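As a rough sanity check on the 20 MHz figures quoted in this paper (122.88 MBytes per second raw, 4,000 × 200 CRS samples per second, 800 kBytes per second after compression), the arithmetic can be reproduced from standard LTE numerology. This is a sketch under stated assumptions: the 4-byte complex sample format (16-bit I plus 16-bit Q) and the single-antenna-port CRS layout with normal cyclic prefix are assumptions, not details confirmed by the text.

```python
# Raw I/Q stream for a 20 MHz LTE carrier.
SAMPLE_RATE_HZ = 30_720_000      # standard LTE sampling rate at 20 MHz
BYTES_PER_IQ_SAMPLE = 4          # assumption: 16-bit I + 16-bit Q
raw_bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_IQ_SAMPLE

# Cell-specific reference signals (CRS), one antenna port, normal cyclic prefix.
RESOURCE_BLOCKS = 100            # resource blocks in a 20 MHz carrier
CRS_PER_RB_PER_SYMBOL = 2        # CRS resource elements per block per CRS symbol
CRS_SYMBOLS_PER_SLOT = 2         # assumption: OFDM symbols 0 and 4 of each slot
SLOTS_PER_SECOND = 2_000         # 0.5 ms slots

crs_symbols_per_second = CRS_SYMBOLS_PER_SLOT * SLOTS_PER_SECOND
crs_per_symbol = RESOURCE_BLOCKS * CRS_PER_RB_PER_SYMBOL
crs_samples_per_second = crs_symbols_per_second * crs_per_symbol

# Figures quoted in the text.
COMPRESSED_BYTES_PER_SECOND = 800_000
compression_ratio = raw_bytes_per_second / COMPRESSED_BYTES_PER_SECOND
implied_bytes_per_crs_sample = COMPRESSED_BYTES_PER_SECOND / crs_samples_per_second

print(raw_bytes_per_second, crs_symbols_per_second, crs_per_symbol)
print(compression_ratio, implied_bytes_per_crs_sample)
```

Under these assumptions the raw rate matches the quoted 122.88 MBytes per second, the CRS count matches 4,000 × 200, and the quoted compressed rate corresponds to a reduction of roughly 150 times, or about one byte per CRS sample.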
RF-based Activity Monitoring Systems: Different types of RF signals, including Wi-Fi [25]–[28], FMCW radar [29], [30], 60 GHz radar [31], [32], and RFID [33], [34], have been used for human activity monitoring. Most of the above RF-based systems require an active transmitting device to be placed near the victim. There are systems that use signals transmitted by GSM base stations (BSs) to perform through-wall monitoring [35]. However, GSM-based systems only extract coarse-grained Doppler shift data, while LTE-based systems can measure the signal phase with high accuracy.

Keystroke Inference Attacks: Existing keystroke inference attacks use different types of sensors to capture the keystroke signal, including sound [5]–[8], IMU [3], [4], video [1], [36], and RF signals [10], [12], [13]. Asonov et al. [8] first demonstrated that different keys can be distinguished by their unique typing sounds. Zhuang et al. [7] and Berger et al. [37] improved keystroke recognition accuracy by adding a language model. Liu et al. [4] achieved 65% inference accuracy within the top-3 candidates using the IMU on a smartwatch. Sun et al. [36] used video to detect and quantify the subtle motion of the back of the device induced by a user's keystrokes. WiPass [13] and WindTalker [12] further use Wi-Fi CSI to snoop on the unlock patterns and PINs entered on mobile devices. However, each of these methods has its own shortcomings. Sound-based and Wi-Fi-based methods tend to work only over limited distances. IMU-based solutions need to compromise the victim's wearables, while video-based solutions are limited by lighting conditions and obstructions such as ATM keyboard covers.

Protection against RF-based Attacks: Most existing privacy protection systems transmit interfering signals to prevent attackers from measuring key RF parameters that are vital for activity recognition.
PhyCloak [17] leverages an RF signal relay to disturb the amplitude, delay, and Doppler shift of the signal received by the attacker so that the attacker cannot reliably infer the activity of the user. Aegis [38] uses randomized amplifications, fan movements, and antenna rotations to distort the same set of RF signal parameters. However, these protection schemes actively