The main contributions of our work are as follows:
• We propose a new similarity measure that can adapt to the speed variations in gesture signals of different domains.
• We formally prove the properties of the speed-adaptive signal matching scheme and show that the result of DSW is a valid similarity measure.
• Using real-world ultrasound gesture signals, we show that the DSW algorithm can serve as a solution for both one-shot learning in supervised gesture recognition and unsupervised gesture clustering tasks.

II. RELATED WORKS

Existing works that are closely related to our approach can be categorized into three areas: domain-independent feature extraction, cross-domain adaptation, and DTW-based schemes.

Domain-independent feature extraction. Early device-free gesture recognition systems use statistical values (mean, variance, etc.) of the signals [21]–[23] or Doppler speeds [24], [25] as the gesture features. However, it is well known that these features depend on the user, the location of devices, and the multi-path conditions introduced by the environment.
There are two major approaches to extracting domain-independent features from device-free gesture signals. The first approach is to use an adversarial network as a domain discriminator to help the feature-extracting network generate domain-independent features [16]. However, the training process requires huge datasets from multiple domains, which leads to high data collection costs. The second approach is to use geometric models to recombine signals measured through multiple links into a domain-independent body-coordinate velocity profile [19]. However, this domain-adaptation method uses multiple devices and assumes that accurate user locations are known.

Cross-domain adaptation. Instead of using domain-independent features, we can also transfer a domain-specific gesture recognition model into the target domain. One approach is to use transfer learning schemes to retrain the model using a small number of samples in the target domain [26]–[28]. Another way is to use neural networks or geometric models to transfer the samples in the source domain to the target domain in order to boost the number of training samples in the target domain [18], [29]. Compared to these approaches that need samples in the target domain for bootstrapping, the DSW scheme can evaluate the similarity of gestures from unknown target domains.

Dynamic Time Warping schemes. The DTW algorithm was originally designed for matching speech signals that have different lengths in time [20]. As human activities also have variable durations, DTW has been adopted for various types of activity recognition systems [30], [31]. In device-free gesture recognition, DTW has been applied for matching either the raw gesture signals [32], [33] or the extracted features [21], [34]. However, these DTW applications only consider the scaling in time rather than the scaling of speed distribution and the consistency of movement distance.

III. DEVICE-FREE GESTURE MATCHING

In this section, we first summarize the state-of-the-art gesture matching methods and their limitations. We then describe our insight on the characteristics of device-free gesture signals. Finally, we demonstrate the benefits of using such speed-adaptive characteristics for gesture similarity calculation.

A. Gesture Matching Methods

Device-free gesture recognition systems collect radio/sound signals reflected by the hand to perform gesture recognition. We call these radio/sound signals gesture signals. The most widely used gesture signals are complex-valued baseband signals that have Doppler frequencies corresponding to the hand movement speeds [1], [3], [25]. For instance, Figure 1(a) and 1(e) show the ultrasonic baseband signals of two samples of the writing “W” gesture, where the user writes the letter “W” in the air by moving the hand back and forth twice. The I-component and the Q-component are the real and imaginary parts of the gesture signal. Meanwhile, Figure 1(b) and 1(f) show the corresponding spectrograms calculated through the short-time Fourier transform (STFT). We can observe that the gesture signal has a negative Doppler frequency when the hand is moving away and a positive Doppler frequency when the hand is moving back. Thus, the gesture signals show specific patterns that can be matched with gesture movements and actions.

At early stages of device-free gesture recognition, researchers collected statistical parameters from gesture signals as features, such as mean, variance, and standard deviation. Such statistical features cannot adapt to speed variations and often lead to models that are not robust enough for real-world applications. Other unsuccessful efforts indicate that linear transformations are inherently insufficient for handling the complicated nonlinear fluctuations in gesture patterns [25].

The DTW algorithm is a pattern matching algorithm with a nonlinear time-normalization effect.
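To make the nonlinear time-normalization effect concrete, the following is a minimal sketch of the textbook DTW recurrence on real-valued one-dimensional sequences. It is not the DSW scheme proposed in this paper; the function name `dtw_distance`, the absolute-difference local cost, and the toy sequences are assumptions chosen only for illustration.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D real sequences.

    Assumption: the local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy data (hypothetical): the same shape performed at two speeds,
# and a mirrored shape with the same length as the fast one.
fast = [0.0, 1.0, 0.0, -1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, -1.0, -1.0, 0.0, 0.0]
mirrored = [0.0, -1.0, 0.0, 1.0, 0.0]

print(dtw_distance(fast, slow))      # 0.0: time stretching is absorbed
print(dtw_distance(mirrored, slow))  # positive: the shapes differ
```

In this toy case the slow sequence repeats every sample of the fast one, so the warping path absorbs the duration difference entirely. The sketch only normalizes the time axis; it does not account for the amplitude, phase, or frequency changes discussed in the rest of this section.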
While DTW has been applied directly to gesture waveforms in device-free keystroke matching, it only works when gestures start from a fixed point and is not robust enough for daily gesture recognition tasks [32]. As an example, consider the gesture samples in Figure 1(a) and 1(e), which have durations of 1.5 and 3 seconds, respectively. Besides the differences in frequencies caused by different movement speeds, the two waveforms also have different initial phases and different low-frequency components. The initial phase and low-frequency components of the gesture signal depend on the starting position and small body movements [3], which are noise for gesture recognition. However, these noisy factors dominate the similarity calculated by DTW, so DTW cannot find a suitable matching for these two samples in the time domain.

Fortunately, STFT and other time-frequency analysis methods allow us to focus on the frequency domain without being distracted by phases and low-frequency components. The spectrograms in Figure 1(b) and Figure 1(f) clearly show how the distribution of Doppler frequency changes over time, which is directly connected to changes in the movement speed. However, matching spectrograms is challenging since gesture speed changes introduce both frequency shifts and gesture