Dynamic Speed Warping: Similarity-Based One-shot Learning for Device-free Gesture Signals
Xun Wang, Ke Sun, Ting Zhao, Wei Wang, and Qing Gu
State Key Laboratory for Novel Software Technology, Nanjing University
{xunwang,kesun,tingzhao}@smail.nju.edu.cn, kesun@eng.ucsd.edu, {ww,guq}@nju.edu.cn

Abstract—In this paper, we propose a Dynamic Speed Warping (DSW) algorithm to enable one-shot learning for device-free gesture signals performed by different users. The design of DSW is based on the observation that the gesture type is determined by the trajectory of hand components rather than the movement speed. By dynamically scaling the speed distribution and tracking the movement distance along the trajectory, DSW can effectively match gesture signals from different domains that have a ten-fold difference in speeds.
Our experimental results show that DSW can achieve a recognition accuracy of 97% for gestures performed by unknown users, while using only one training sample of each gesture type from four training users.

I. INTRODUCTION

Device-free gesture recognition systems use Radio Frequency (RF) [1]–[9] or sound signals [10]–[15] to detect and recognize human movements. By analyzing the signal reflected by the hand, device-free sensing allows users to interact with their devices freely without wearing any sensor. Such a natural and unconstrained interaction paradigm would become a vital component of next-generation Human-Computer Interaction (HCI) solutions.

One of the key challenges for device-free sensing is to robustly recognize gesture signals across different users and environments. Traditional machine learning methods use large datasets and intensive training processes to extract domain-independent features from gesture signals. For example, one can collect gesture samples in different domains and use Generative Adversarial Networks (GANs) to reduce the impact of domain-specific features [16], [17]. However, because these machine-generated models are not well understood, their performance in a new environment cannot be guaranteed. Fine-tuning the model in a new domain may require the end-user to collect and label a large number of new samples in the new environment. Even if virtual samples can be produced via geometric models from a small number of gestures in the target domain [18], the retraining process still incurs formidable costs for mobile systems.

In this paper, rather than extracting domain-independent features, we use Dynamic Speed Warping (DSW) to derive a similarity measure between device-free gesture signals.
As users may perform the gesture at different speeds and the Doppler shift largely depends on the environment [19], speed variations lead to severe robustness issues in the widely used speed-based gesture features [1]–[3]. By removing the speed variation, the DSW similarity enables domain-independent one-shot learning, which learns information about object categories from one, or only a few, training samples. Thus, it reduces both the data collection and training costs. The design of the DSW algorithm is based on the critical observation that the gesture type is determined by the trajectory of hand components, e.g., the fingers and the palm, rather than by the movement speed. We show that similarity in trajectory leads to similarity in both the total movement distance and the scaled speed distribution. The total movement distances are similar because, for the specific trajectory of a gesture such as a click, the starting and ending postures of the hand remain the same no matter how fast the user performs the gesture. The scaled speed distributions are similar because, when the user changes the movement speed, the speeds of different parts of the hand, such as the fingers and the palm, change proportionally. Therefore, the speed distribution of the different components of a fast gesture movement can be matched to that of a slow movement of the same gesture type when we scale down the speeds of all components by the same factor. Based on this observation, we design a dynamic programming algorithm, inspired by Dynamic Time Warping (DTW) [20], that calculates the similarity of gesture signals in terms of the total movement distance and the scaled speed distribution.

The DSW similarity measure opens new ways to explore the gesture recognition problem. First, the robust gesture matching algorithm can be combined with kNN to serve as a similarity-based one-shot learning scheme that requires only a small number of training samples.
As the DSW algorithm can adapt to different gesture speeds, it dramatically reduces the data collection/labeling cost and can incrementally tune the system without retraining. Second, the DSW similarity measure can serve as the basis for unsupervised or semi-supervised learning systems: the DSW algorithm can automatically derive the gesture types of unlabeled samples by clustering them with the speed-independent measure.

We perform extensive evaluations of DSW using ultrasound-based gesture signals. Our experimental results show that DSW can achieve a recognition accuracy of 97% for gestures performed by unknown users, while using only one training sample of each gesture type from four training users. DSW also outperforms the DTW algorithm on all three external indicators of clustering performance. Therefore, DSW similarity can serve as a powerful tool for both supervised and unsupervised learning tasks.
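As a point of reference for the dynamic-programming idea and the kNN-based one-shot scheme described above, the sketch below implements classic DTW, the algorithm DSW is described as being inspired by, together with a 1-nearest-neighbour (kNN with k = 1) classifier that holds one labeled template per gesture type. This is not DSW: the section does not give DSW's recurrence, which additionally accounts for the total movement distance and the scaled speed distribution. The names `dtw_distance` and `one_nn_classify`, the Euclidean frame cost, and the representation of a gesture as a list of per-frame feature vectors are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a gesture is a sequence of per-frame feature vectors
# (for example, one column of a speed/Doppler profile per time step).
Frame = Sequence[float]
Gesture = Sequence[Frame]


def frame_cost(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Gesture, t: Gesture) -> float:
    """Classic DTW: minimum cumulative frame cost over monotone alignments."""
    n, m = len(s), len(t)
    inf = float("inf")
    # d[i][j] holds the best cost of aligning s[:i] with t[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(s[i - 1], t[j - 1])
            d[i][j] = cost + min(
                d[i - 1][j],      # advance in s only (t frame reused)
                d[i][j - 1],      # advance in t only (s frame reused)
                d[i - 1][j - 1],  # advance in both
            )
    return d[n][m]


def one_nn_classify(query: Gesture, templates: List[Tuple[str, Gesture]]) -> str:
    """kNN with k = 1: return the label of the closest labeled template."""
    label, _ = min(templates, key=lambda item: dtw_distance(query, item[1]))
    return label


# Toy one-shot example: one template per gesture type (values are made up).
templates = [
    ("click", [[0.0], [1.0], [2.0], [1.0], [0.0]]),
    ("swipe", [[0.0], [0.5], [1.0], [1.5], [2.0]]),
]
# The same "click" shape, performed over twice as many frames.
slow_click = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [1.0], [0.0], [0.0]]

print(one_nn_classify(slow_click, templates))  # prints "click"
```

Plain DTW only warps the time axis; it does not rescale the feature values themselves. The toy `slow_click` matches its template at zero cost only because it repeats each template frame unchanged. If a slower execution also changed the speed-based feature values, as the text notes for Doppler-based features, the frame costs would stay nonzero; compensating for that is the role the text assigns to the speed scaling in DSW.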