Thru-the-wall Eavesdropping on Loudspeakers via RFID by Capturing Sub-mm Level Vibration

CHUYU WANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
LEI XIE∗, State Key Laboratory for Novel Software Technology, Nanjing University, China
YUANCAN LIN, State Key Laboratory for Novel Software Technology, Nanjing University, China
WEI WANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
YINGYING CHEN, Electrical and Computer Engineering, Rutgers University, USA
YANLING BU, State Key Laboratory for Novel Software Technology, Nanjing University, China
KAI ZHANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
SANGLU LU, State Key Laboratory for Novel Software Technology, Nanjing University, China

The unprecedented success of speech recognition has stimulated the wide use of intelligent audio systems. This creates new opportunities for attackers to steal user privacy by eavesdropping on loudspeakers. Existing effective eavesdropping methods take one of two approaches. Some employ a high-speed camera and rely on line-of-sight (LOS) to measure object vibrations. Others use a WiFi MIMO antenna array and require a quiet environment. In this paper, we explore the possibility of eavesdropping on loudspeakers with COTS RFID tags, which are deployed in many corners of our daily lives. We propose Tag-Bug, which targets the human voice with its complex frequency bands and performs thru-the-wall eavesdropping on the loudspeaker by capturing sub-mm level vibration. Tag-Bug extracts sound characteristics in two ways: (1) the vibration effect, where the tag itself vibrates due to the sound; (2) the reflection effect, where the tag does not vibrate but senses the signals reflected from nearby vibrating objects. To amplify the influence of vibration on the signals, we design a new signal feature, the Modulated Signal Difference (MSD), to reconstruct the sound from RF signals. To improve the quality of the reconstructed sound for human voice recognition, we apply a Conditional Generative Adversarial Network (CGAN) to recover the full frequency band from the partial frequency band of the reconstructed sound. Extensive experiments on the USRP platform show that Tag-Bug can successfully capture monotone sound when the loudness exceeds 60 dB. Tag-Bug recognizes spoken numbers with 95.3%, 85.3% and 87.5% precision in free-space, thru-the-brick-wall and thru-the-insulating-glass eavesdropping, respectively. Tag-Bug also recognizes letters with 87% precision in free-space eavesdropping.

CCS Concepts: • Networks → Cyber-physical networks; • Security and privacy → Mobile and wireless security.

∗ Lei Xie is the corresponding author.

Authors' addresses: Chuyu Wang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, chuyu@nju.edu.cn; Lei Xie, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, lxie@nju.edu.cn; Yuancan Lin, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, yclin@smail.nju.edu.cn; Wei Wang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, ww@nju.edu.cn; Yingying Chen, Electrical and Computer Engineering, Rutgers University, New Brunswick, USA, yingche@scarletmail.rutgers.edu; Yanling Bu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, yanling@smail.nju.edu.cn; Kai Zhang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, mg1933091@smail.nju.edu.cn; Sanglu Lu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, sanglu@nju.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2021 Association for Computing Machinery.
2474-9567/2021/12-ART182 $15.00
https://doi.org/10.1145/3494975
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 5, No. 4, Article 182. Publication date: December 2021
[Fig. 1. Thru-the-wall eavesdropping via RFID tags. (a) Application scenario. (b) Tag-Bug: acoustic thru-the-wall eavesdropping.]

Additional Key Words and Phrases: Eavesdropping, RFID, Sub-mm Level Vibration

ACM Reference Format:
Chuyu Wang, Lei Xie, Yuancan Lin, Wei Wang, Yingying Chen, Yanling Bu, Kai Zhang, and Sanglu Lu. 2021. Thru-the-wall Eavesdropping on Loudspeakers via RFID by Capturing Sub-mm Level Vibration. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 4, Article 182 (December 2021), 25 pages. https://doi.org/10.1145/3494975

1 INTRODUCTION
Acoustic eavesdropping is one of the most significant security concerns. Voice communication between people is an unencrypted channel, which makes sensitive information easy to obtain. Traditional acoustic eavesdropping methods, which employ hidden or tampered microphones [8, 23], can be prevented with soundproof insulation. Because of such insulation, users may overlook acoustic eavesdropping altogether, and the loudspeaker becomes a potential eavesdropping target. In particular, advances in speech recognition have led to intelligent audio systems being widely integrated into our daily lives. This greatly extends the use of loudspeakers and brings new attack opportunities. For example, when a user activates the 'Remember' function to store private information, Google Home may later replay that information, including passwords. Private information such as daily schedules, passwords and even lifestyle may then be leaked.
Another example is online meetings during COVID-19, which bring great convenience to companies and employees working from home. However, these meetings rely heavily on loudspeakers, which may lead to severe leakage of personal and proprietary corporate information.

Due to these severe consequences, there have been active research efforts on eavesdropping on loudspeakers. Davis et al. leverage a high-speed camera to perceive sound [10]. The camera captures the vibrations that the loudspeaker causes in nearby objects (e.g., a glass of water or a potted plant), so the method requires line of sight. Sensors such as the gyroscopes embedded in smartphones have also been exploited to capture the sound from the loudspeaker [26]. This approach only works when the sensor shares a common medium with the loudspeaker, so it does not work for thru-the-wall eavesdropping. It is also limited by the battery power of mobile devices. ART eavesdropper uses wireless signals to perceive the vibration of the loudspeaker diaphragm with a specific MIMO antenna array [37]. This solution requires relatively expensive hardware (i.e., the MIMO antenna array) and works mostly in quiet environments. Moreover, any nearby vibration, e.g., from a spinning fan, can affect the received signal. Recent work has shown that Ultra High Frequency (UHF) RFID tags can capture tiny vibrations. TagSound [20] perceives monotone sound vibration using harmonic signals, and other work [40, 41] captures ambient vibrations from the phase variation using compressive sensing. However, the harmonic signals are too weak for thru-the-wall eavesdropping. Compressive sensing also cannot extract the human voice, whose frequency bands are not sparse.
In this paper, we explore the possibility of eavesdropping on the human voice played by a loudspeaker using the surrounding COTS RFID tags. These tags can be attached to many everyday objects, as shown in Figure 1(a). On the one hand, many everyday products purchased online come with RFID tags, such as water bottles, delivery packages, hang tags, envelopes and books. This greatly increases the chance of RFID tags appearing in our lives and makes them easy to overlook. On the other hand, the adversary can even intentionally hide battery-free, lightweight RFID tags beside the loudspeaker, e.g., under the table. There they are hard to detect and can eavesdrop over the long term. As shown in Figure 1(b), we develop Tag-Bug, an effective system that performs thru-the-wall eavesdropping on the loudspeaker based on the received physical-layer signals. Similar to previous attacks [6, 17, 26], we consider the loudspeaker as the sound source rather than live human speech; loudspeakers are widely used in voice assistant systems such as Google Home and Amazon Alexa. The reason is that live human speech mainly produces air flow from the mouth with only small vibrations of the vocal cords, whereas the loudspeaker mainly produces diaphragm vibration. Thus, in the extracted sound, live human speech can be drowned out by the vibration caused by the air flow. In particular, Tag-Bug can extract the sound from the loudspeaker in two ways. (1) Vibration effect: the tag itself vibrates due to the sound, e.g., a tag attached to a delivery package vibrates directly as the sound plays. (2) Reflection effect: the tag does not vibrate but senses the signals reflected from nearby objects that vibrate due to the sound, e.g., the tag captures the signal reflected from a cup of water that vibrates as the sound plays.
To extract the tiny vibration caused by the sound, we build a model to decompose the received signals and extract the Modulated Signal Difference (MSD) as the vibration indicator. The RFID tag is more sensitive to low-frequency sound, which carries more energy. We therefore leverage a Conditional Generative Adversarial Network (CGAN) to recover the high-frequency band from the low-frequency band, which improves the quality of the recovered human voice.

There are three main challenges in performing eavesdropping via RFID tags. The first challenge is to detect the sub-mm level vibration caused by the sound. The vibration of a loudspeaker diaphragm is usually smaller than 1 mm [16]. Such a tiny vibration results in a phase change below 0.04 radians, which is close to the noise level [39]. To address this challenge, we build a signal transmission model and extract amplified vibration features from the received signal. Specifically, we extract the Modulated Signal Difference (MSD) as the difference between the signals in the ON and OFF modulation states. The phase change of the MSD indicates the tag displacement due to the vibration. Furthermore, we propose the amplified MSD, obtained by subtracting the average OFF-state signal within a time window. The amplified MSD can extract the sound from either the vibration effect or the reflection effect. In this way, Tag-Bug can extract sub-mm level vibration when either the tag itself or a nearby object vibrates due to the sound wave.

The second challenge is to reduce the interference from the periodic commands sent by the RFID reader. In RFID systems, the periodic reader signals, e.g., the QUERY and ACK commands, are much stronger than the signal backscattered by the tag. The reader signal may not overlap with the tag signal in the time domain. Even so, once received by the antenna, the periodic reader signal introduces strong noise in the frequency band.
To address this challenge, we randomize the tag response mechanism based on the C1G2 protocol. In particular, we randomly set the frame size of each query cycle and let the tag randomly retransmit its EPC. As a result, the noise caused by the periodic commands is significantly reduced.

The third challenge is to refine the human voice recovered from the amplified MSD. The human voice is the main target of eavesdropping. However, the inherent material characteristics of RFID tags make the high-frequency components very weak in the sound extracted from the amplified MSD. As a result, the recovered sound is hard to recognize. To address this challenge, we investigate the correlation among signals at different frequencies and find that high-frequency signals are usually harmonics of low-frequency signals. To efficiently capture the correlation among different frequency bands, we develop a CGAN that recovers the full frequency band by referring to multiple low-frequency bands. In this way, the refined sound covers a more comprehensive frequency band and can be recognized more accurately.
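The 0.04-radian figure in the first challenge follows from the round-trip geometry of backscatter. When the reflector moves by d along the line of sight of a co-located reader antenna, the reader-reflector-reader path changes by 2d, so the phase changes by Δφ = 4πd/λ. The sketch below evaluates this for a few displacements. It assumes a UHF carrier of about 920 MHz, which this section does not state, and the function name is our own. Under that assumption, a 1 mm displacement gives roughly 0.039 rad, consistent with the bound quoted in the first challenge.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def backscatter_phase_change(displacement_m, carrier_hz=920e6):
    """Phase change (rad) of a backscattered signal when the reflector moves.

    Assumes co-located transmit/receive antennas and motion along the line of
    sight: moving the reflector by d lengthens the round trip by 2*d, so the
    phase shifts by 2*pi*(2*d)/wavelength = 4*pi*d/wavelength.
    carrier_hz is an assumed UHF carrier (~920 MHz), not a value from the paper.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return 4 * math.pi * displacement_m / wavelength


for d_mm in (1.0, 0.5, 0.1):
    print(f"{d_mm} mm -> {backscatter_phase_change(d_mm * 1e-3):.4f} rad")
```

Because the relation is linear in d, any sub-mm diaphragm motion maps to a phase swing of only a few hundredths of a radian. This is why a feature that cancels the much larger static components is needed before the phase can be read reliably.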
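The MSD idea from the first challenge can be illustrated with a deliberately simplified signal model, which is our assumption and not the authors' transmission model. The receiver sees a strong static component (reader leakage plus static reflections) and the tag's backscatter. The backscatter amplitude depends on the tag's ON/OFF modulation state, and its phase follows 4πd/λ. Subtracting the mean OFF-state sample in a window around each ON sample cancels the static component, so the phase of the remaining difference tracks the displacement d. The function names, the window length, the 920 MHz carrier and all synthetic signal parameters below are hypothetical.

```python
import numpy as np

# Assumption: ~920 MHz UHF carrier; the section does not state the frequency.
WAVELENGTH_M = 299_792_458.0 / 920e6
K = 4 * np.pi / WAVELENGTH_M  # round-trip phase (rad) per metre of displacement


def amplified_msd(iq, is_on, window=32):
    """Each ON-state sample minus the mean OFF-state sample within +/- window//2.

    iq     -- complex baseband samples at the reader
    is_on  -- boolean mask, True where the tag is in the ON modulation state
    window -- neighbourhood length in samples (hypothetical default)
    Returns the indices of the ON samples and the complex difference at each.
    """
    iq = np.asarray(iq, dtype=complex)
    is_on = np.asarray(is_on, dtype=bool)
    half = window // 2
    on_idx = np.flatnonzero(is_on)
    msd = np.empty(on_idx.size, dtype=complex)
    for k, n in enumerate(on_idx):
        lo, hi = max(0, n - half), min(iq.size, n + half + 1)
        off = iq[lo:hi][~is_on[lo:hi]]
        if off.size == 0:
            raise ValueError(f"no OFF samples near sample {n}")
        # Components present in both states (leakage, static reflections) cancel here.
        msd[k] = iq[n] - off.mean()
    return on_idx, msd


def displacement_from_msd(msd):
    """Displacement (m) relative to the first sample, from the unwrapped MSD phase."""
    phase = np.unwrap(np.angle(msd))
    return (phase - phase[0]) / K


# Hypothetical synthetic capture: a constant leakage roughly 100x stronger than the
# tag signal, plus a tag reflection whose phase follows a 0.5 mm peak-to-peak vibration.
n = np.arange(4000)
d_true = 0.25e-3 * (1 - np.cos(2 * np.pi * n / 400))
tag = 0.01 * np.exp(1j * K * d_true)
is_on = n % 2 == 1                    # the tag toggles OFF/ON every sample
reflectivity = np.where(is_on, 1.0, 0.2)
iq = (1.0 + 0.5j) + reflectivity * tag

on_idx, msd = amplified_msd(iq, is_on)
d_est = displacement_from_msd(msd)
err = np.max(np.abs(d_est - (d_true[on_idx] - d_true[on_idx[0]])))
print(f"max displacement error: {err * 1e3:.6f} mm")
```

On this noise-free synthetic signal, the recovered displacement closely tracks the 0.5 mm peak-to-peak motion even though the static component dominates the tag signal. Real captures add noise, multipath and the reader's own commands, none of which this sketch models.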
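A toy model shows why randomizing the inventory timing in the second challenge reduces interference. The model is our simplification, not the paper's analysis. Each reader command is treated as a unit impulse. We then compare the spectrum of a strictly periodic schedule with a schedule whose intervals are drawn at random, standing in for the random frame sizes and random EPC retransmissions. A periodic schedule concentrates its energy into spectral lines at the command rate and its harmonics, while random intervals spread that energy across the band. The sample count, period, interval range and function name are hypothetical.

```python
import numpy as np


def peak_spectral_line(event_times, n_samples):
    """Largest non-DC magnitude in the spectrum of a unit impulse train."""
    train = np.zeros(n_samples)
    train[event_times] = 1.0
    spectrum = np.abs(np.fft.rfft(train))
    return spectrum[1:].max()  # skip bin 0 (DC)


N = 10_000
rng = np.random.default_rng(0)

# Periodic schedule: one reader command every 100 samples (hypothetical period).
fixed = np.arange(0, N, 100)

# Randomized schedule: intervals drawn uniformly from 50..149 samples, a stand-in
# for randomly chosen frame sizes and random EPC retransmissions.
intervals = rng.integers(50, 150, size=N // 50)
randomized = np.cumsum(intervals)
randomized = randomized[randomized < N]

print("periodic  :", peak_spectral_line(fixed, N))
print("randomized:", peak_spectral_line(randomized, N))
```

With 100 periodic commands, the lines at multiples of the command rate have a height of exactly 100. The randomized train's largest non-DC component is much smaller, because the phases of its impulses no longer add coherently at any single frequency.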
This paper makes three contributions. First, we show that low-cost, easily overlooked RFID tags can effectively perform thru-the-wall eavesdropping, pushing RFID sensing capability to the sub-mm level. In particular, Tag-Bug can extract the sound vibration from either the vibration effect or the reflection effect, which broadens the applicability of our system. Second, we build a signal transmission model that extracts the vibration from the amplified Modulated Signal Difference (MSD) by removing strong interference. We also design a CGAN-based method to improve the quality of the recovered human voice. Third, we implement Tag-Bug on the USRP platform. Real-world experiments show that Tag-Bug can successfully capture monotone sound when the loudness exceeds 60 dB. Tag-Bug recognizes spoken numbers with 95.3%, 85.3% and 87.5% precision in free-space, thru-the-brick-wall and thru-the-insulating-glass eavesdropping, respectively. It also recognizes letters with 87% precision in free-space eavesdropping.

2 PROBLEM FORMULATION
In this paper, we consider the novel problem of launching side-channel eavesdropping on a loudspeaker by leveraging the sound-induced vibration of ambient RFID tags. Our attack focuses on sound played by a loudspeaker rather than live human speech. Live human speech mainly produces air flow rather than the air vibration that carries the sound. As a result, the vibration extracted from the tag signal would reflect the air flow rather than the human voice. We use the USRP platform to extract the sound because it provides convenient access to the physical-layer signal.

2.1 Attack Model
We assume a victim user with a loudspeaker and some surrounding objects that carry passive RFID tags.
Since RFID tags are widely used to identify objects in online shopping and unmanned supermarkets, any tagged object can be a potential threat to user privacy. For example, the labeling tags on delivery packages or the hang tags of clothes bought online may all open up a window of opportunity for eavesdropping. Moreover, the adversary can intentionally hide battery-less and lightweight RFID tags beside the loudspeaker, e.g., under the table, where they are hard to detect and can eavesdrop over the long term. In this paper, we mainly focus on private information made up of numbers or letters, e.g., a social security number, a password, or a credit card number. The adversary uses an RFID system that can interrogate the RFID tags, even in the thru-the-wall scenario, then extracts the sound from the RF-signal and deduces the private information. Once any tag is placed beside the loudspeaker, the RF-signal backscattered by the tag can capture the sound vibration. In particular, the tag can be directly vibrated by the sound (the vibration effect) or affected by a nearby vibrating object (the reflection effect). The adversary continuously collects the RF-signals and extracts the sound information while the loudspeaker is playing audio, e.g., a conversation during an online meeting. By analyzing the energy distribution of the spectrogram, the adversary can extract the sound from the RF-signals and deduce the private information, even from outside the victim's room.

2.2 Eavesdropping Scenarios

The side-channel attack described in this paper can be launched via three different means: medium-based, aerial-based, and reflection-based eavesdropping. Medium-based eavesdropping means that the tag is directly attached to the vibrating medium, e.g., the loudspeaker, so the sound transmission leads to tiny vibrations of both the medium and the tag.
Aerial-based eavesdropping means that the tag is vibrated by the aerial sound played by the loudspeaker; recent work [6, 17, 26] has already shown the feasibility of capturing aerial sound using motion sensors. Both medium-based and aerial-based eavesdropping leverage the vibration effect
to extract the sound information. Reflection-based eavesdropping means that a tag does not vibrate itself but is instead affected by the vibration of a nearby object, e.g., a cup of water.

Fig. 2. Signal analysis of USRP signal for vibration sensing. (a) Experiment setup for empirical study (ImpinJ reader, USRP TX and RX antennas, tag, and loudspeaker); (b) signal in time domain (amplitude and phase of the USRP and COTS signals for 100Hz and 300Hz sounds); (c) signal in frequency domain.

Online meeting. One possible attack scenario is that the victim uses a loudspeaker to take part in an online meeting, which became common during the COVID-19 period. The adversary can leverage the surrounding RFID tags to eavesdrop on the sound played by the loudspeaker. As a result, the sensitive information discussed during the online meeting can be obtained by the adversary, which may threaten the victim's personal life and property.

Voice assistant system. With the success of AI techniques in speech recognition, intelligent voice assistant systems, e.g., Google Home and Amazon Echo Dot, are widely used due to their convenience. These voice assistants may replay messages that include private information; e.g., Google Home can remember passwords or social security numbers with the 'Remember' function and replay them when needed. Such replayed sounds from the loudspeaker open up the possibility for the adversary to eavesdrop on private information.

3 FEASIBILITY STUDY

In this section, we use several experiments to study the feasibility of extracting sound vibrations via RFID tags.
In particular, we focus on mono-tone sound vibrations to study the sensitivity of RFID tags; the findings can be extended to the human voice.

3.1 COTS RFID Reader vs. USRP Reader

We first compare the COTS RFID reader with the USRP reader in terms of sensing the tag vibration. We place the tag in front of the loudspeaker, as shown in Figure 2(a), and play mono-tone sounds with frequencies of 100Hz and 300Hz. By default, the COTS ImpinJ Speedway R420 RFID reader [3] has a sampling rate of 228Hz, while the USRP reader based on the open project [4] has a sampling rate of 2MHz.
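The sampling-rate gap between the two readers can be illustrated with a short Python sketch. It assumes ideal, uniformly spaced samples at the nominal rates quoted above, which the tag reads of a COTS reader only approximate, and the helper names `aliased_frequency` and `satisfies_nyquist` are hypothetical. Under that assumption, the 100Hz tone stays below the 114Hz Nyquist limit of a 228Hz reader, whereas a 300Hz tone folds down to 72Hz and can no longer be identified as 300Hz.

```python
def aliased_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a real tone after ideal uniform sampling.

    The tone is folded into the first Nyquist zone [0, sample_rate_hz / 2].
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def satisfies_nyquist(tone_hz: float, sample_rate_hz: float) -> bool:
    """True when the sampling rate exceeds twice the tone frequency."""
    return sample_rate_hz > 2 * tone_hz


# Nominal rates quoted in the text; uniform sampling is an assumption.
READERS = {"COTS (228 Hz)": 228.0, "USRP (2 MHz)": 2e6}

for tone in (100.0, 300.0):
    for name, rate in READERS.items():
        print(
            f"{tone:.0f} Hz tone, {name}: appears at "
            f"{aliased_frequency(tone, rate):.0f} Hz, "
            f"Nyquist satisfied: {satisfies_nyquist(tone, rate)}"
        )
```

The same helper shows that the 2MHz USRP rate leaves every audible frequency far below its Nyquist limit, which is why the 300Hz tone survives on the USRP but not on the COTS reader.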
Observation 1: The USRP reader, with its higher sampling rate, is more suitable for eavesdropping than the COTS RFID reader. For the COTS RFID reader, we can only detect the 100Hz sound in both the frequency domain and the time domain, i.e., the orange wave in Figure 2(b) and the orange peak in Figure 2(c). According to the Shannon sampling theorem [31], a sampling rate over 600Hz is required to capture the 300Hz sound. Even though compressive reading [40, 41] can capture mechanical vibration, it cannot sense the human voice, which has complicated frequency bands. Therefore, we do not consider compressive sensing and instead use the traditional FFT to measure the frequency bands. For the USRP reader, even though the reader signal is much stronger than the tag signal, which leads to large signal noise, we can still observe the weak 100Hz and 300Hz tag signals in both the time domain and the frequency domain, i.e., the 100Hz red wave and the 300Hz jitters in Figure 2(b), and the corresponding blue peaks in Figure 2(c). Thus, for the human voice with its complicated frequency bands, the USRP platform is more suitable than COTS RFID readers.

Fig. 3. Principle analysis of vibration sensing from IQ plane. (a) USRP signal components (QUERY, ACK, RN16, EPC, and CW segments over time); (b) constellation of tag movement (ON and OFF/CW states); (c) amplitude of USRP signal for 100Hz and 300Hz sounds; (d) frequency analysis of raw signal.

3.2 Tag Movement vs. Tag Vibration

Since the tag vibration can be regarded as a small tag movement, we next investigate how the physical-layer signal changes with tag movement by pushing the tag toward the antenna.
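Section 4.1 derives that the phase of the backscattered signal changes as $2\pi\frac{2d}{\lambda}$, because the RF-signal covers the tag-antenna distance twice. A few lines of Python applying that relation put the 20cm push of this experiment next to a sub-mm vibration. This is a sketch under an assumed 920MHz carrier (a common UHF RFID frequency; the carrier is not stated in this section), and the 0.5mm vibration amplitude is a hypothetical example. With this assumption, the 20cm push corresponds to about 1.23 half wavelengths, in line with the roughly 1.23 rotations reported for Figure 3(b), while a 0.5mm vibration rotates the backscattered signal by only about 1.1 degrees.

```python
import math

C = 299_792_458.0   # speed of light (m/s)
CARRIER_HZ = 920e6  # assumed UHF RFID carrier; not stated in this section


def wavelength(freq_hz: float = CARRIER_HZ) -> float:
    return C / freq_hz


def round_trip_phase(displacement_m: float, freq_hz: float = CARRIER_HZ) -> float:
    """Backscatter phase change (rad) for a change in tag-antenna distance.

    The signal travels to the tag and back, so the phase scales as 2*pi*(2*d)/lambda.
    """
    return 2 * math.pi * 2 * displacement_m / wavelength(freq_hz)


lam = wavelength()
push = 0.20          # the 20 cm push from 1.5 m to 1.3 m
vibration = 0.5e-3   # hypothetical 0.5 mm sound-induced vibration

print(f"wavelength:           {lam:.4f} m")
print(f"20 cm push:           {push / (lam / 2):.3f} half wavelengths")
print(f"  phase change:       {round_trip_phase(push) / math.pi:.3f} pi rad")
print(f"  IQ-plane rotations: {round_trip_phase(push) / (2 * math.pi):.3f}")
print(f"0.5 mm vibration:     {math.degrees(round_trip_phase(vibration)):.2f} deg")
```

Because the phase doubles over the round trip, one full rotation in the IQ plane corresponds to a displacement of half a wavelength, which is why the number of rotations equals the displacement measured in half wavelengths.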
Observation 2: Tag movement leads to a wavy change in the time domain and a rotation of the signal vector in the IQ plane. As shown in Figure 3(a), when we push the tag toward the antennas from 1.5m to 1.3m, the signal amplitude changes as a cosine function. As shown in Figure 3(b), when we push the tag toward the antenna, the signal rotates in the IQ plane, and the rotation center is not at the origin. This means that the received signal does not change linearly with the tag-antenna distance. Moreover, two main circles are formed in this figure. In the enlarged time-domain signal of Figure 3(a), we can clearly see the QUERY and ACK commands from the
reader, as well as the RN16 and EPC responses from the tag. Comparing Figure 3(b) with Figure 3(a), the two circles in Figure 3(b) are caused by the changes of the CW signal and the tag-backscattered signal, which correspond to the OFF and ON states of tag modulation [13]. Note that when we push the tag by about 20cm, which is about 1.23× the half wavelength of the CW signal, the signal rotates about 1.23 circles. Since the tag vibration is a small tag movement, it leads to a small wavy change in the time domain and a small rotation of the signal vector in the IQ plane, which we use to build the model in Section 4.

Fig. 4. Transmission model in RFID system. (a) Signal components in RFID (TX and RX antennas, tag displacement, environment, CW signal, multi-path, backscattered signal, and leakage signal); (b) signal in the IQ plane (signal changes due to movement).

3.3 Tag Vibration vs. Diaphragm Vibration

Since both the tag and the diaphragm may vibrate due to the sound pressure, we conduct experiments to study their different influences. In particular, we remove the tag in front of the loudspeaker shown in Figure 2(a) to capture the diaphragm vibration from the CW signal.

Observation 3: The tag vibration captured by backscattered signals is much larger than the loudspeaker diaphragm vibration captured by CW signals. Comparing Figure 2(b) with Figure 3(c), when we remove the tag from the loudspeaker, the periodic patterns are distinctly reduced. In particular, for the 100Hz sound, we can still observe a weak periodic pattern in Figure 3(c), but its amplitude is much weaker than in Figure 2(b). For the 300Hz sound, no periodic pattern can be found in Figure 3(c) or Figure 3(d). The reason is that the metallic tag backscatters more RF-signals than the papery diaphragm.
Thus, the attached tag can amplify the loudspeaker's interference on the RF-signal through backscattering.

4 SYSTEM DESIGN

In this section, we introduce the principle of Tag-Bug, which extracts the tag vibration based on the signal model. In particular, we propose to extract the sound from either the vibration effect or the reflection effect of the tag. According to the sound extraction model, we design a new tag response mechanism, which can randomize the tag responses and improve the sound quality.

4.1 Transmitting Model

Uplink. In RFID systems, the transmitting antenna TX sends the CW signal to activate the tag, as shown in Figure 4(a). Due to the multi-path effect, the signal reflected from the environment also arrives at the tag together with the CW signal:

$$S_{tag} = S_{TX}\,(h_d + h_E). \tag{1}$$

Here, $S_{tag}$ is the signal received by the tag, $S_{TX}$ is the CW signal sent by the TX antenna, $h_d$ is the signal attenuation due to the transmitting distance, and $h_E$ is the signal attenuation due to the multi-path effect of the environment. In particular, in an ideal channel model [13], $h_d$ can be calculated as $h_d = \frac{1}{d}e^{j\theta_d}$, where $d$ is the distance between the TX antenna and the tag, $j$ is the imaginary unit, and $\theta_d$ is the phase calculated from distance
$d$ and wavelength $\lambda$ as:

$$\theta_d = 2\pi\frac{d}{\lambda} \bmod 2\pi. \tag{2}$$

In principle, $h_E$ is related to the distance $d$ and the transmitting environment.

Downlink. After receiving the signal, the tag backscatters it with FM0 or Miller modulation, which encodes binary bits with ON and OFF states [13]. In the OFF state, the tag backscatters the CW signal with a small amplitude. Therefore, the signal received by the reader is the combination of the backscattered signal from the tag, $S_{tag}(h_{d'} + h_{E'})$, and the leakage signal from the reader, $S_{TX}h_L$:

$$S_{RX,0} = S_{TX}h_L + S_{tag}(h_{d'} + h_{E'}) = S_{TX}\left(h_L + h_d h_{d'} + h_{E,d}\right), \tag{3}$$

where $h_{d'}$ is the signal attenuation due to the downlink transmitting distance $d'$, and $h_{E'}$ indicates the environment influence in the backscattered channel. For simplicity, we use $h_{E,d}$ to represent the overall signal attenuation due to the environment, which is also related to the distance $d$. In the ON state, the tag backscatters a large-amplitude signal by changing the state of its antenna. Thus, the received signal is:

$$S_{RX,1} = S_{TX}h_L + S_{tag}(h_{d'} + h_{E'})h_1 = S_{TX}\left(h_L + h_1 h_d h_{d'} + h'_{E,d}\right), \tag{4}$$

where $h_1$ is the modulation gain of the tag, and $h'_{E,d}$ is the overall signal attenuation due to the environment in the ON state. In RFID systems, the tag changes its antenna capacitance to modulate the CW signal during backscattering, so $h_1$ is usually regarded as a signal enhancement. Because the multi-path effect from the environment is relatively small, we omit the influence of $h_1$ on it and regard $h'_{E,d}$ as approximately equal to $h_{E,d}$. As a result, the signal received by the reader can be divided into three parts: the leakage signal $S_L$, the multi-path signal $S_E$, and the backscattered signal $S_0$ or $S_1$, where

$$\begin{cases} S_L = S_{TX}h_L, \\ S_E = S_{TX}h_{E,d}, \\ S_0 = S_{TX}h_d h_{d'}, \\ S_1 = S_{TX}h_d h_{d'} h_1. \end{cases} \tag{5}$$

When the TX and RX antennas are placed close to each other and the tag is relatively far from both antennas, we regard $d' \approx d$.
Thus, both $S_0$ and $S_1$ are proportional to $h_d h_{d'} = h_d^2 = \frac{1}{d^2}e^{j2\theta_d}$, indicating that the phase change is $2\pi\frac{2d}{\lambda}$. This phase change is consistent with the results in Figure 3(b), where a 20cm movement leads to a phase change of about 2.45π radians.

IQ plane analysis. Figure 4(b) presents the signal model in the IQ plane. The transmitting distance $d$ (and $d'$) changes with the tag movement, leading to changes in both the multi-path signal $S_E$ and the backscattered signal $S_0$/$S_1$. Thus, the phases of $S_E$ and $S_0$/$S_1$ change, resulting in the rotation of the corresponding signals. The phase change of $S_0$/$S_1$ is caused by the signal attenuation $h_d^2$, whose phase change is $2\pi\frac{2d}{\lambda}$. Therefore, both $S_0$ and $S_1$ rotate with the transmitting distance $d$, which leads to two arcs in the IQ plane. Since $S_E$ is usually static, we omit it for simplicity. These results explain the signal change in Figure 3(b).

4.2 Sound Extraction from Vibration Effect

Theoretically, the vibration of the tag due to the sound leads to a variation of the transmitting distance, $d = d_0 + f(t, d_v)$. Here, $d_0$ is the average tag-antenna distance, and $f(t, d_v)$ is the distance variation related to time $t$ and vibration amplitude $d_v$. For a mono-tone sound with frequency $\varphi$, $f(t, d_v) = d_v\cos(2\pi\varphi t)$, which can be extended to any complicated sound with multiple tones. For simplicity, we introduce the algorithm with a mono-tone sound. In an ideal model, such tag vibration can be directly captured by the received signals $S_0$ and $S_1$. However, since the leakage signal $S_L$ is much stronger than the backscattered signals $S_0$ and $S_1$, the small changes of $S_0$ and $S_1$ do not noticeably affect the received signals $S_{RX,0}$ and $S_{RX,1}$. Figure 5(a) plots the vibration-based
signal change by omitting $S_E$. Both $S_{RX,0}$ and $S_{RX,1}$ rotate slightly, and the raw phase change is very small due to the strong leakage signal. Moreover, the sub-mm level vibration of the tag due to the sound can easily be drowned in the ambient noise. Thus, we need to amplify the vibration effect by removing the strong interference.

Fig. 5. Vibration extraction mechanisms. (a) Raw signal vs. centralized signal; (b) signal cancellation from adjacent samples; (c) amplified MSD vs. static MSD; (d) vibration extraction results of different cancellation methods (FFT of the raw phase, adjacent samples, mean of $S_{RX,1}$, and mean of $S_{RX,0}$ over 200 to 1000Hz, with the 300Hz peak marked).

Naïve Normalization. The direct way is to centralize $S_{RX,1}$ by subtracting its average value $\bar{S}_{RX,1}$, as shown in Figure 5(a). The range of the phase variation can then be amplified to $[0, 2\pi]$. However, in the real system, $S_{RX,1}$ contains large ambient noise, and such subtraction can introduce additional noise. Thus, both the vibration effect and the signal noise are amplified.

To efficiently amplify the vibration effect, our basic idea is to extract the backscattered signals, which are related to the tag displacement. If we can obtain the backscattered signal $S_0$ or $S_1$, the corresponding phase change indicates the tag displacement. However, it is difficult to measure the leakage signal $S_L$ and the environment signal $S_E$; thus, we cannot individually obtain either $S_0$ or $S_1$ from $S_{RX,0}$ and $S_{RX,1}$. Fortunately, since both $S_L$
and $S_E$ are static in most scenarios, by regarding $h'_{E,d}$ as approximately equal to $h_{E,d}$, we can remove $S_L$ and $S_E$ from Eq. (4) and Eq. (3) as:

$$\Delta S_{RX} = S_{RX,1} - S_{RX,0} \approx S_{TX}(h_1 - 1)h_d^2. \tag{6}$$

We call it the Modulated Signal Difference (MSD). Here, in principle, only $h_d$ changes with the tag vibration, meaning that the vibration can be extracted from the MSD phase. However, in any snapshot, only one of $S_{RX,0}$ and $S_{RX,1}$ can be received. Therefore, we cannot obtain the MSD $\Delta S_{RX}$ directly in reality. For a static tag, we can use the average values $\bar{S}_{RX,0}$ and $\bar{S}_{RX,1}$ to calculate the MSD $\Delta S_{RX}$, which is called the Static MSD. But for a vibrating tag, both $S_{RX,0}$ and $S_{RX,1}$ change even during one tag response. Therefore, we cannot simply calculate the MSD from the average values. To address this problem, we consider two kinds of cancellation solutions to extract the MSD efficiently.
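The cancellation in Eq. (6) can be checked numerically with the component model of Eq. (5). The Python sketch below is illustrative only: the leakage $S_L$, the environment term $S_E$, the modulation gain $h_1$, and the wavelength are made-up values (not measurements from Tag-Bug), chosen so that the leakage dominates as described above, and $h_d$ follows the ideal channel model $h_d = \frac{1}{d}e^{j\theta_d}$ with $d' \approx d$. Because a simulation can evaluate both tag states at the same distance, it computes the idealized (static) MSD that cannot be measured directly on a vibrating tag.

```python
import cmath
import math

# Illustrative values only (assumptions), chosen so that the leakage dominates.
LAMBDA = 0.3259          # m, wavelength of an assumed ~920 MHz carrier
S_TX = 1.0 + 0j          # transmitted CW signal
S_L = 20.0 + 5.0j        # hypothetical leakage signal S_L = S_TX * h_L
S_E = 0.5 - 0.2j         # hypothetical static multipath S_E = S_TX * h_{E,d}
H1 = 1.5 + 0.5j          # hypothetical modulation gain h_1


def h_d(d):
    """Ideal channel model: h_d = (1/d) * exp(j * theta_d), theta_d = 2*pi*d/lambda."""
    return (1.0 / d) * cmath.exp(1j * 2 * math.pi * d / LAMBDA)


def received(d, state):
    """S_RX,0 (state 0) or S_RX,1 (state 1) from Eqs. (3)-(5) with d' = d."""
    backscatter = S_TX * h_d(d) ** 2
    if state == 1:
        backscatter *= H1
    return S_L + S_E + backscatter


def msd(d):
    """Modulated Signal Difference, Eq. (6): the static S_L and S_E cancel."""
    return received(d, 1) - received(d, 0)


def phase_shift(a, b):
    """Phase of b relative to a, wrapped to (-pi, pi]."""
    return cmath.phase(b / a)


d0, step = 1.5, 1e-3     # the tag moves 1 mm away from 1.5 m
raw_shift = phase_shift(received(d0, 0), received(d0 + step, 0))
msd_shift = phase_shift(msd(d0), msd(d0 + step))
expected = 2 * math.pi * 2 * step / LAMBDA   # 2*pi*2d/lambda from Section 4.1

print(f"raw S_RX,0 phase shift: {raw_shift:.6f} rad")
print(f"MSD phase shift:        {msd_shift:.6f} rad")
print(f"2*pi*2*step/lambda:     {expected:.6f} rad")
```

With a 1mm step, the phase of the raw $S_{RX,0}$ barely moves because the static leakage dominates, while the MSD phase shifts by $2\pi\frac{2\Delta d}{\lambda}$, since $S_L$ and $S_E$ cancel and only the $h_d^2$ term remains.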
182:10 • Wang et al.

Fig. 6. Influence of enhanced multi-path effect. (Setup with an RFID tag, a loudspeaker and a bottle of water; spectrum plotted against Frequency (Hz), 400 to 1000 Hz.)

Instantaneous MSD. The first solution is cancellation based on adjacent samples. As shown in Figure 5(b), since $S_{RX,0}$ and $S_{RX,1}$ cannot be collected in one snapshot, we use adjacent samples to approximate the uncollected samples, which we call the instantaneous MSD. It is similar to standard RFID channel estimation, but traditional estimation targets a relatively stable tag, while we focus on a vibrating tag. As shown in Figure 5(b), due to the large signal noise around the signal edges, adjacent samples should be selected from the stable part of the square wave, and thus there is a longer time interval between adjacent samples. Such an interval may not affect a stable tag, but it can introduce large noise for high-frequency vibration.

Amplified MSD. The second solution is cancellation based on $\bar{S}_{RX,0}$, the average of $S_{RX,0}$ within a small time window. The basic idea is to use $\bar{S}_{RX,0}$ instead of $S_{RX,0}$ for cancellation. Because of the vibration, both $S_{RX,0}$ and $S_{RX,1}$ change over time. Since the tag vibrates around a fixed position during the time window, $\bar{S}_{RX,0}$ can be roughly regarded as $S_{RX,0}$ when the tag is static at its original position. As $\bar{S}_{RX,0}$ is an average value, there is no time interval between $S_{RX,1}$ and $\bar{S}_{RX,0}$, and the vibration feature is extracted as:

$$\Delta S'_{RX} = S_{RX,1} - \bar{S}_{RX,0} \approx S_{TX}\left(\eta_1\,\eta_d^{2} - \overline{\eta_d^{2}}\right). \tag{7}$$

Here, the time variation of $S_{RX,0}$ is removed, while $S_{RX,1}$ still contains the time-varying signal due to the sound. Thus, Eq. (7) can be used to derive the vibration. Compared with the static MSD, Eq. (7) omits the variation of $S_0$ and focuses on the variation of $S_1$ to extract the vibration. It is related to both the transmitting distance $d$ and the modulation attenuation $\eta_1$. Since $\eta_1$ is the modulation factor caused by the impedance change of the tag antenna, the amplitude of $\eta_1$ is greater than 1.
Hence, Eq. (7) can amplify the MSD phase by using $\bar{S}_{RX,0}$, which we call the amplified MSD. Figure 5(c) illustrates the amplification principle. By connecting the ends of $S_L$ and $\bar{S}_{RX,0}$, the exterior angle theorem of a triangle shows that the phase change of the amplified MSD $\theta'$ (the sum of the two exterior angles) is larger than the phase change of the static MSD $\theta_s$ (the sum of the two remote interior angles).

We use a 300Hz mono-tone sound to test the performance of the different solutions, as shown in Figure 5(d). For the raw phase of the received signals, the 300Hz sound is buried by 610Hz noise, which is caused by the measurement noise of the hardware. For the cancellation based on adjacent samples, the noise spreads over the frequency band because of the large interval between adjacent samples. For the cancellation based on $\bar{S}_{RX,1}$, we get a clear peak at 300Hz, but the ambient noise is also amplified, leading to several noise peaks. For the cancellation based on $\bar{S}_{RX,0}$, we detect frequency peaks at 300Hz, 600Hz and 900Hz, where the latter two are harmonics. Thus, the amplified MSD is better for vibration extraction, and it is also suitable for extracting sounds with multiple tones.
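The same kind of toy model can mimic the comparison between the static MSD and the amplified MSD, together with the 300Hz test. In this Python sketch, the constants, the 4 kHz phase sampling rate and the helper names (`eta_d_squared`, `displacement`, `relative_phases`, `dominant_frequency`) are all hypothetical. The static MSD uses idealized simultaneous samples of both states, while the amplified MSD subtracts the window mean of $S_{RX,0}$ as in Eq. (7). For small vibrations, this model predicts that the amplified phase swing is about $\mathrm{Re}\{\eta_1/(\eta_1-1)\}$ times the static one. For the assumed $\eta_1 = 2 + 0.5j$, that factor is 1.8. In this model the factor depends on the phase of $\eta_1$ as well as its magnitude, so the sketch illustrates the effect rather than proving it in general. A plain DFT of the amplified MSD phase then recovers the 300Hz tone. Noise, loudspeaker harmonics and the adjacent-sample method are not modeled.

```python
import cmath
import math

# Illustrative assumptions only (not values from the paper).
WAVELENGTH = 3e8 / 920e6   # ~920 MHz UHF carrier
S_TX = 1.0 + 0.0j          # transmitted signal
S_STATIC = 10.5 + 0.5j     # S_L + S_E, assumed static
ETA_1 = 2.0 + 0.5j         # modulation factor of state 1, |eta_1| > 1
TAG_GAIN = 0.05            # weak backscatter path gain
D0 = 2.0                   # reader-to-tag distance (m)
FS = 4000                  # hypothetical phase sampling rate (Hz)
N = 400                    # 0.1 s window -> 10 Hz per DFT bin


def eta_d_squared(d):
    # Round-trip channel with phase -4*pi*d/lambda (path-loss change ignored).
    return TAG_GAIN * cmath.exp(-1j * 4 * math.pi * d / WAVELENGTH)


def displacement(k, freq=300.0, amp=0.5e-3):
    # Mono-tone vibration of the tag, sampled at FS.
    return amp * math.sin(2 * math.pi * freq * k / FS)


h = [eta_d_squared(D0 + displacement(k)) for k in range(N)]
s_rx0 = [S_STATIC + S_TX * x for x in h]           # state-0 samples
s_rx1 = [S_STATIC + S_TX * ETA_1 * x for x in h]   # state-1 samples

# Static MSD (Eq. 6) with idealised simultaneous samples of both states.
static_msd = [a - b for a, b in zip(s_rx1, s_rx0)]

# Amplified MSD (Eq. 7): subtract the window mean of S_RX,0 instead.
mean_rx0 = sum(s_rx0) / N
amplified_msd = [a - mean_rx0 for a in s_rx1]


def relative_phases(samples):
    # Phase relative to the first sample; swings are tiny, so no wrapping.
    return [cmath.phase(s / samples[0]) for s in samples]


def span(values):
    return max(values) - min(values)


def dominant_frequency(values):
    # Naive O(N^2) DFT magnitude over bins 1..N/2, after removing the mean.
    n = len(values)
    mu = sum(values) / n
    x = [v - mu for v in values]
    best_bin, best_mag = 0, -1.0
    for b in range(1, n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * b * i / n) for i, v in enumerate(x))
        if abs(acc) > best_mag:
            best_bin, best_mag = b, abs(acc)
    return best_bin * FS / n


static_phase = relative_phases(static_msd)
amplified_phase = relative_phases(amplified_msd)
gain = span(amplified_phase) / span(static_phase)
print(f"phase-swing gain of amplified MSD: {gain:.2f}")
print(f"dominant frequency: {dominant_frequency(amplified_phase):.0f} Hz")
```

The DFT is written out by hand to keep the block dependency-free. With NumPy, `numpy.fft.rfft` would compute the same spectrum far faster.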