(a) Audio thread

        Down conversion   PVE       Tapping detection   Total
Time    6.455ms           0.315ms   0.036ms             6.806ms

(b) Video thread

        Hand detection   Fingertip detection   Frame playback   Total
Time    22.931ms         2.540ms               14.593ms         40.064ms

(c) Control thread

        Keystroke localization   Virtual key rendering   Total
Time    0.562ms                  10.322ms                10.884ms

Table 2: Processing time

              CPU             LCD            Audio          Total
Idle          30 ± 0.2mW      /              /              30 ± 0.2mW
Backlight     30 ± 0.2mW      894 ± 2.3mW    /              924 ± 2.0mW
Video-only    140 ± 4.9mW     895 ± 2.2mW    /              1035 ± 4.0mW
Our scheme    252 ± 12.6mW    900 ± 5.7mW    384 ± 2.7mW    1536 ± 11.0mW

Table 3: Power consumption

However, a low resolution of 176 × 144 cannot support accurate keystroke localization, as shown in Figure 13(b): the probability that our system reports a wrong keystroke location rises from nearly zero to 2.5% when we decrease the resolution from 1280 × 720 to 176 × 144. Figure 13(a) shows the overall tapping input FNR, defined as the ratio of missed and wrongly identified keys to the total number of keys pressed. We observe that neither the highest nor the lowest resolution yields a low tapping input error rate: the high resolution of 1280 × 720 has an FNR of 9.1% due to its low video frame rate, which increases the response latency, while the low resolution of 176 × 144 has an FNR of 3.5% due to its higher keystroke localization error rate. Therefore, to strike a balance between latency and keystroke localization error, we use a video resolution of 352 × 288, which gives an input error rate of 1.7%.
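As a concrete reading of this definition, the following minimal sketch computes the tapping input FNR; the function name and the example counts are ours and purely illustrative, not from the paper's implementation.

```python
# Tapping input FNR as defined above: missed plus wrongly identified keys,
# divided by the total number of keys pressed.
def tapping_input_fnr(missed: int, wrong: int, total_pressed: int) -> float:
    return (missed + wrong) / total_pressed

# Illustrative counts only: an FNR of 9.1% could come from, e.g.,
# 78 missed and 13 misidentified keys out of 1000 presses.
print(f"{tapping_input_fnr(78, 13, 1000):.1%}")  # 9.1%
```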
8.4 Latency and Power Consumption

Our system achieves a tapping response latency of 18.08ms on commercial mobile phones. We measured the processing time of our system on a Samsung Galaxy S5 with a Qualcomm Snapdragon 2.5GHz quad-core CPU. Our implementation has three parallel threads: the audio thread, the video thread, and the control thread. The audio thread processes the ultrasound signal in segments of 512 data samples (a duration of 10.7ms at the 48kHz sampling rate). The processing time of each stage of the audio thread for one data segment is summarized in Table 2; the latency for the audio pipeline to detect a finger tapping is just 6.806ms. The video thread performs hand detection, fingertip detection, and video playback. At a resolution of 352 × 288, its processing latency is 40.06ms and our system achieves an average frame rate of 24.96 fps. The control thread performs keystroke localization and renders the updated virtual keyboard, with a latency of 10.88ms. As the three threads run in parallel, the slowest thread (video) is not on the critical path, and the other two threads can use the results of previous frames. Therefore, once the audio thread detects a finger tapping, it can invoke the control thread immediately, so the total latency between the keystroke and the rendering of the virtual keyboard is 6.81ms + 10.88ms = 17.69ms.

We use PowerTutor [47] to measure the power consumption of our system on the Samsung Galaxy S5. To measure the power overhead of the individual components, we measured the average power consumption in four different states, each for 25 minutes split into five 5-minute sessions: 1) idle, with the screen off; 2) backlight, with the screen displaying; 3) video-only, with only the video-based scheme on; and 4) our system, with both the ultrasound and video schemes on. As shown in Table 3, more than 68% of the power consumption comes from the LCD and CPU, which are essential for traditional video-only virtual display applications. Compared to the video-only scheme, the additional power consumption introduced by our scheme on the CPU and Audio components is 112mW and 384mW, respectively, which means that more than 77% of the additional power consumption comes from the speaker hardware. Overall, our scheme incurs a significant power consumption overhead of 48.4% on commercial smartphones. One possible future research direction is to further reduce the power consumption of the audio system.
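The overhead figures above follow directly from Table 3; as a sanity check, here is a minimal sketch of the arithmetic (mean values copied from the table, error margins omitted; variable names are ours):

```python
# Mean power draw in mW, from Table 3.
video_only = {"CPU": 140, "LCD": 895, "Audio": 0}
our_scheme = {"CPU": 252, "LCD": 900, "Audio": 384}

extra = {k: our_scheme[k] - video_only[k] for k in video_only}
total_extra = sum(extra.values())                  # 501 mW added by our scheme
overhead = total_extra / sum(video_only.values())  # 0.484 -> the 48.4% overhead
audio_share = extra["Audio"] / total_extra         # ~0.77: most extra power is audio
print(extra, f"{overhead:.1%}", f"{audio_share:.1%}")
```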
8.5 Case Study

We used our system to develop different applications for AR/VR environments. To further evaluate its performance, we conducted two case studies in real-world settings. As our system uses both visual information and sound reflections to locate the target, just as dolphins do, we name the applications DolphinBoard and DolphinPiano.

8.5.1 DolphinBoard: In-the-air text input. In this case study, the task of DolphinBoard is to enable text input through a tapping-in-the-air mechanism. The study aims to evaluate the tapping detection error rate for different users in different environments, as well as the tapping speed.

User interface: Figure 14(a) shows the user interface of DolphinBoard. Users move their finger in the air to locate the virtual key to be tapped on the virtual display. The QWERTY virtual keyboard is rendered at the top of the screen with a size of 1320 × 528 pixels, and we set the key size to 132 × 132 pixels for most of the experiments, so the keyboard forms a 10 × 4 grid of keys (see the sketch at the end of this section).

Testing Participants: We invited eight graduate student volunteers, denoted User 1∼8, to use our applications. All of them had participated in the 90-minute performance experiments before this case study. The evaluation of DolphinBoard lasted 20 minutes per person, during which users were asked to type a 160-character sentence for the text input speed test. Note that a larger hand may generate a stronger echo of the ultrasound signal; we therefore measured the hand size of each participant, as shown in Table 4.

User     Gender   Age   Hand length   Hand width
User 1   Male     23    19.0cm        10.0cm
User 2   Male     23    17.8cm        9.3cm
User 3   Female   22    14.5cm        7.5cm
User 4   Male     25    17.2cm        9.2cm
User 5   Male     26    17.5cm        9.4cm
User 6   Male     24    18.5cm        10.2cm
User 7   Male     24    17.9cm        9.8cm
User 8   Male     24    18.2cm        9.5cm

Table 4: Participant information

Performance evaluation: DolphinBoard achieves a finger tapping detection error of less than 1.76% under three different use cases. To evaluate the usability of DolphinBoard, we invited eight users to
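To make the keyboard geometry concrete, below is a hypothetical sketch of how a localized fingertip position could be mapped onto the 10 × 4 grid of 132-pixel keys described above. The row contents, control keys, and the key_at function are our assumptions for illustration; the paper specifies only the QWERTY layout and the pixel dimensions.

```python
# Hypothetical key lookup for a 1320 x 528 px keyboard with 132 x 132 px keys,
# i.e., a 10-column x 4-row grid. Row contents below are assumed, not from the paper.
KEY_SIZE = 132
ROWS = [
    list("qwertyuiop"),           # 10 keys fill the 1320 px width
    list("asdfghjkl"),
    list("zxcvbnm"),
    ["shift", "space", "bksp"],   # assumed control row (keys likely wider in practice)
]

def key_at(x: float, y: float, origin=(0, 0)):
    """Map a localized fingertip position (screen pixels) to a key, or None."""
    col = int((x - origin[0]) // KEY_SIZE)
    row = int((y - origin[1]) // KEY_SIZE)
    if 0 <= row < len(ROWS) and 0 <= col < len(ROWS[row]):
        return ROWS[row][col]
    return None

print(key_at(200, 50))  # second column of the top row -> 'w'
```

In practice, rows with fewer than ten letters or wider control keys would need per-key bounding boxes rather than uniform integer division.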