that the differential IR estimations are based on complex-valued path coefficients. If we ignore the phase and use only the magnitude of the path coefficients, there are some locations where the phase change caused by a touch event incurs little magnitude change, so the touch event cannot be reliably detected. A similar phenomenon appears when the magnitude of WiFi signals is used to detect small movements such as human respiration [27].

7.2 Touch Detection and Localization

We perform joint touch detection and localization using the differential IR estimation around the structure path. Since the structure-borne sound and the air-borne sound are mixed at the bottom microphone, as shown in Section 5.3, we use only the path coefficients of the top microphone to sense touches.

To detect touch events, we first calculate the time difference of the IR estimation in the same way as in Section 6.2. We then identify the delay with the maximum magnitude in the time-differential IR estimation and use that maximum magnitude as the indicator of a touch event. We use a threshold-based scheme to detect touch and release events, i.e., once the magnitude of the differential IR estimation exceeds the threshold, we determine that the user has either touched the surface or released the finger. The detection threshold is dynamically calculated based on the background noise level. Our touch detection scheme keeps track of the touching state and toggles between touch and release based on the detected events. Touch detection also works when the user holds the phone in his/her hand: as long as the pose of the holding hand does not change, we can still reliably detect touches using the differential IR estimation.

To determine the position of the touch, we use the delay (in samples) of the peak in the differential IR estimation. We divide the back surface of the phone into three regions based on their distance to the speaker. The points in different regions are marked with different colors in Figure 11. Using the delay of the peak in the differential IR estimation, we can identify the region that the user touches with an accuracy of 87.8%.

Figure 11: Touching position clustering (labels: rear camera, rear speaker, top mic, bottom mic; touch positions 1 to 11).

8 SYSTEM EVALUATION

8.1 Implementation

We implemented VSkin on the Android platform.
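The touch detection and localization procedure of Section 7.2 can be sketched as below. This is a minimal illustration in Python for readability, not VSkin's C/NDK implementation. The paper states only that the threshold depends on the background noise level, so the rule used here (threshold = `k` times a running average of the peak magnitude over quiet frames), the values of `k`, `alpha`, and the initial noise level, the region delay boundaries, and the names `TouchDetector` and `region_from_delay` are all hypothetical. The demo also shows why the complex-valued difference matters: a pure phase rotation of one path coefficient leaves its magnitude unchanged but produces a clearly nonzero complex difference.

```python
import cmath
from dataclasses import dataclass


@dataclass
class TouchDetector:
    """Hypothetical toggle-state touch detector on the differential IR estimation.

    Assumed threshold rule: k times a running estimate of the background
    noise, updated only on frames that do not cross the threshold.
    """
    noise: float = 0.01      # initial background-noise level (assumed)
    alpha: float = 0.05      # smoothing factor of the noise estimate (assumed)
    k: float = 5.0           # threshold multiplier (assumed)
    touching: bool = False
    was_above: bool = False  # did the previous frame exceed the threshold?

    def update(self, h_prev, h_cur):
        """Process two consecutive IR estimates (lists of complex path coefficients).

        Returns (event, peak_delay): event is "touch", "release", or None, and
        peak_delay is the delay (in samples) of the largest differential tap.
        """
        # Magnitude of the complex time difference at every delay tap.
        diffs = [abs(c - p) for p, c in zip(h_prev, h_cur)]
        peak_delay = max(range(len(diffs)), key=diffs.__getitem__)
        peak = diffs[peak_delay]
        above = peak > self.k * self.noise
        event = None
        if above and not self.was_above:
            # Rising edge: toggle between the touching and released states.
            self.touching = not self.touching
            event = "touch" if self.touching else "release"
        elif not above:
            # Quiet frame: track the background noise level.
            self.noise = (1 - self.alpha) * self.noise + self.alpha * peak
        self.was_above = above
        return event, peak_delay


def region_from_delay(delay, boundaries=(2, 3)):
    """Map the peak delay (in samples) to region 0, 1, or 2; boundaries are hypothetical."""
    for region, bound in enumerate(boundaries):
        if delay < bound:
            return region
    return len(boundaries)


if __name__ == "__main__":
    h0 = [1.0 + 0j, 0.5 + 0j, 0.2 + 0j, 0.1 + 0j]
    h1 = list(h0)
    h1[1] = 0.5 * cmath.exp(0.5j)   # phase-only change at delay tap 1
    print(abs(h1[1]) - abs(h0[1]))  # magnitude change: close to 0
    print(abs(h1[1] - h0[1]))       # complex change: sin(0.25), about 0.247
    det = TouchDetector()
    print(det.update(h0, h1))       # ('touch', 1)
    print(det.update(h1, h1))       # (None, 0)
    print(det.update(h1, h0))       # ('release', 1)
    print(region_from_delay(1))     # 0
```

Toggling only on a rising edge (the first frame above the threshold) keeps a single touch, whose differential IR may stay large for several frames, from flipping the state more than once.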
Our system works as a real-time app that allows users to perform touch gestures, e.g., scrolling, swiping, and tapping, on the surfaces of Android phones. Our implementation and evaluation mainly focus on Back-of-Device operations. To achieve better efficiency, we implement most signal processing algorithms as C functions using the Android NDK, and signal processing is performed on data segments of 1,024 samples, which is identical to the length of the interpolated ZC sequence. We conducted experiments on a Samsung Galaxy S5 using its rear speaker, top microphone, and bottom microphone in typical office and home environments. In the experiments, the users interacted with the phone using their bare hands without wearing any accessories.

8.2 Evaluations on Finger Movements

VSkin achieves an average movement distance error of 3.59 mm when the finger moves 6 cm on the back of the phone. We attached a 6 cm strip of tape to the back of the phone and asked the users to move their fingers up and down along the tape while touching the surface of the phone. We determine the ground truth of the path length change using a ruler; it is 10 cm for the 6 cm movement. Our system measures the movement distance using the bottom microphone and the rear speaker, with a compensation factor of 0.6 to convert the measured path length change into the movement distance. Our simulation results show that the compensation factor ranges from 0.54 to 0.6 for different positions on the back of the phone, so fixing the factor at 0.6 does not significantly affect the accuracy. Figure 12(a) shows the Cumulative Distribution Function (CDF) of the distance measurement error for 400 movements. The average movement distance errors of VSkin, VSkin without delay selection, and VSkin without both delay selection and EKF are 3.59 mm, 4.25 mm, and 7.35 mm, respectively. Delay selection and EKF together reduce the measurement error by about half.
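The conversion from path-length change to movement distance, and the error statistics reported for Figure 12(a), amount to simple arithmetic, sketched below. The function names are hypothetical, and so are the sample measurements in the demo. Taking errors in absolute value, using the sample (n - 1) standard deviation, and using the nearest-rank definition of the 90th percentile are assumptions, since the text does not specify them.

```python
import statistics

COMPENSATION_FACTOR = 0.6  # fixed factor from Section 8.2 (simulated range: 0.54 to 0.6)


def movement_distance(path_length_change, factor=COMPENSATION_FACTOR):
    """Convert a measured path-length change into finger movement distance (same units)."""
    return factor * path_length_change


def error_summary(errors, percentile=90):
    """Return (mean, sample std dev, nearest-rank percentile) of absolute errors.

    Assumptions: errors are taken in absolute value, the standard deviation uses
    the n - 1 denominator, and the percentile uses the nearest-rank definition.
    """
    errs = sorted(abs(e) for e in errors)
    n = len(errs)
    rank = max(1, -(-percentile * n // 100))  # ceil(percentile * n / 100) in integer arithmetic
    return statistics.mean(errs), statistics.stdev(errs), errs[rank - 1]


if __name__ == "__main__":
    # Ground truth from Section 8.2: a 10 cm path-length change for a 6 cm movement.
    print(movement_distance(10.0))  # about 6 cm
    measured_cm = [6.2, 5.7, 6.4, 5.9, 6.1]  # hypothetical measurements, not the paper's data
    errors_mm = [10 * (m - 6.0) for m in measured_cm]
    print(error_summary(errors_mm))
```

Integer arithmetic for the percentile rank avoids floating-point rounding pushing the rank one position too high for sample counts such as 400.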
The standard deviation of the error is 2.66 mm and the 90th-percentile measurement error is 7.70 mm, as shown in Figure 12(a).

VSkin is robust to objects with diameters ranging from 1 cm to 2 cm. Since user fingers have different diameters and introduce different reflection amplitudes in the sound signals, we use pens with three different diameters to measure the robustness of VSkin. Figure 12(b) shows the CDF of the movement distance error over 200 movements of 6 cm. The average distance errors for pens with 1 cm, 1.5 cm, and 2 cm diameters are 6.64 mm, 5.14 mm, and 4.40 mm, respectively. Objects with a small diameter of 1 cm incur only a small increase in the distance error, 2.24 mm more than the 2 cm pen.

VSkin is robust to different holding styles. We evaluated our system in two use cases: users holding the phone in their hands, and the phone placed on a table. We asked the users to use their own holding styles during the experiments. The average distance error across users is 6.64 mm when the phone is placed on the table. Holding the phone in