often confuse these two intentions, due to the overloaded actions on gestures that are similar to each other. With the new types of touch gestures performed on different surfaces of the device, these actions can be assigned to distinct gestures, e.g., selecting an item should be performed on the screen while scrolling or switching should be performed on the back or the side of the device. Third, touch sensing on the side of the phone enables virtual side-buttons that could replace physical buttons and improve the waterproof performance of the device. Compared to in-air gestures that also enrich the gesture semantics, touch gestures provide a better user experience, due to their accurate touch detection (for confirmation) combined with useful haptic feedback.

Fine-grained measurements of gesture movement distance/speed are vital for enabling touch gestures that users are already familiar with, including scrolling and swiping. However, existing accelerometer-based or structural-vibration-based touch sensing schemes only recognize coarse-grained activities, such as tapping events [5, 35]. Extra information on the tapping position or the tapping force level usually requires intensive training and calibration processes [12, 13, 25] or additional hardware, such as a mirror on the back of the smartphone [31].
In this paper, we propose VSkin, a system that supports fine-grained gesture-sensing on the surfaces of mobile devices based on acoustic signals. Similar to a layer of skin on the surfaces of the mobile device, VSkin can sense both finger tapping and finger movement distance/direction on the surface of the device. Without modifying the hardware, VSkin utilizes the built-in speakers and microphones to send and receive sound signals for touch-sensing. More specifically, VSkin captures both the structure-borne sounds, i.e., sounds propagating through the structure of the device, and the air-borne sounds, i.e., sounds propagating through the air. As touching the surface can significantly change the structural vibration pattern of the device, the characteristics of structure-borne sounds are reliable features for touch detection, i.e., whether the finger contacts the surface or not [12, 13, 25]. While it is difficult to use the structure-borne sounds to sense finger movements, air-borne sounds can measure the movement with mm-level accuracy [14, 28, 34]. Therefore, by analyzing both the structure-borne and the air-borne sounds, it is possible to reliably recognize a rich set of touch gestures as if there were another touchscreen on the back of the phone. Moreover, VSkin does not require intensive training, as it uses the physical properties of sound propagation to detect touch and measure finger movements.

The key challenge faced by VSkin is to measure both the structure-borne and the air-borne signals with high fidelity while the hand is very close to the mobile device. Given the small form factor of mobile devices, sounds traveling through different mediums and paths arrive at the microphone within a short time interval of 0.13∼0.34 ms, which is just 6∼16 sample points at a sampling rate of 48 kHz. With the limited inaudible sound bandwidth (around 6 kHz) available on commercial mobile devices, it is challenging to separate these paths. Moreover, to achieve accurate movement measurement and location-independent touch detection, we need to measure both the phase and the magnitude of each path.

To address this challenge, we design a system that uses the Zadoff-Chu (ZC) sequence to measure different sound paths. With the near-optimal auto-correlation function of the ZC sequence, which has a peak width of 6 samples, we can separate the structure-borne and the air-borne signals even when the distance between the speaker and the microphone is just 12 cm. Furthermore, we develop a new algorithm that measures the phase of each sound path at a rate of 3,000 samples per second. Compared to traditional impulsive signal systems that measure sound paths in a frame-by-frame manner (with frame rates <170 Hz [14, 34]), the higher sampling rate helps VSkin capture fast swiping and tapping events.
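To make the path-separation idea above concrete, the following is a minimal, illustrative sketch (not VSkin's actual implementation) of generating a Zadoff-Chu sequence, shifting it into the inaudible band, and correlating at the receiver to resolve closely spaced sound paths. The sequence length (127), root index (1), carrier frequency (20.25 kHz), and zero-order-hold interpolation are assumptions made for illustration, not parameters reported in this paper.

```python
import numpy as np

FS = 48_000    # audio sampling rate on the phone (Hz)
N_ZC = 127     # ZC sequence length (illustrative)
U = 1          # ZC root index, must be coprime with N_ZC (illustrative)
FC = 20_250    # inaudible carrier frequency (Hz, illustrative)
BW = 6_000     # usable inaudible bandwidth on commodity phones (Hz)

def zadoff_chu(u, n_zc):
    """Baseband Zadoff-Chu sequence: constant amplitude, and its cyclic
    auto-correlation is zero at every non-zero lag (CAZAC property)."""
    n = np.arange(n_zc)
    return np.exp(-1j * np.pi * u * n * (n + 1) / n_zc)

def upconvert(zc, fs, fc, bw):
    """Hold each ZC chip for fs/bw samples so the sequence occupies ~bw Hz,
    then shift it onto an inaudible carrier at fc."""
    chip_len = int(fs // bw)                 # 8 samples per chip here
    base = np.repeat(zc, chip_len)           # zero-order-hold interpolation
    t = np.arange(len(base)) / fs
    return np.real(base * np.exp(2j * np.pi * fc * t))

tx = upconvert(zadoff_chu(U, N_ZC), FS, FC, BW)

# Delay spread quoted above: 0.13-0.34 ms between the structure-borne and
# air-borne arrivals, i.e., only about 6-16 samples at 48 kHz.
print(round(0.13e-3 * FS), round(0.34e-3 * FS))    # -> 6 16

# Toy received signal: a copy of tx delayed by 10 samples stands in for one
# propagation path. Correlating against the known sequence turns each path
# into a narrow peak, so arrivals a few samples apart stay distinguishable.
rx = np.roll(tx, 10)
corr = np.abs(np.correlate(rx, tx, mode="full"))
print(np.argmax(corr) - (len(tx) - 1))             # recovered delay: 10
```

Because the correlation peak of the ZC sequence is so narrow, each propagation path shows up as a distinct peak, which is what allows the structure-borne and air-borne arrivals, only 6∼16 samples apart, to be measured separately.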
We implement VSkin on commercial smartphones as real-time Android applications. Experimental results show that VSkin achieves a touch detection accuracy of 99.65% and an accuracy of 3.59 mm for finger movement distances. Our user study shows that VSkin only slightly increases the movement time used for interaction tasks, e.g., scrolling and swiping, by 34% and 10%, respectively, when compared to touchscreens.

We make the following contributions in this work:
• We introduce a new approach for touch-sensing on mobile devices by separating the structure-borne and the air-borne sound signals.
• We design an algorithm that performs the phase and magnitude measurement of multiple sound paths at a high sampling rate of 3 kHz.
• We implement our system on the Android platform and perform real-world user studies to verify our design.

2 RELATED WORK

We categorize research related to VSkin into three classes: Back-of-Device (BoD) interactions, tapping and force sensing, and sound-based gesture sensing.

Back-of-Device Interactions: Back-of-Device interaction is a popular way to extend the user interface of mobile devices [5, 11, 31, 32, 35]. Gestures performed on the back of the device can be detected by the built-in camera [31, 32] or sensors [5, 35] on the mobile device. LensGesture [32] uses the rear camera to detect finger movements that are performed just above the camera. Back-Mirror [31] uses an additional mirror attached to the rear camera to capture BoD gestures in a larger region. However, due to the limited viewing angle of cameras, these approaches either have a limited sensing area or need extra hardware for extending sensing