1536-1233 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Nanjing University. Downloaded on July 06,2021 at 04:35:27 UTC from IEEE Xplore. Restrictions apply.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2020.3034354, IEEE Transactions on Mobile Computing.

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. XX, NO. XX, 2020

SpeedTalker: Automobile Speed Estimation via Mobile Phones

Xinran Lu, Lei Xie, Member, IEEE, Yafeng Yin, Member, IEEE, Wei Wang, Member, IEEE, Yanling Bu, Member, IEEE, Qing Guo, and Sanglu Lu, Member, IEEE

Abstract: Among all factors in road accidents, speeding is the deadliest. To reduce speeding, it is essential to devise efficient schemes for ubiquitous speed monitoring. Traditional approaches require either special equipment (e.g., radar speed guns) or special deployment (e.g., position-fixed cameras). In this paper, we propose SpeedTalker, a mobile-phone-based approach to detecting the speed of automobiles. By leveraging the built-in microphones and camera of a mobile phone, SpeedTalker estimates automobile speed by passively sensing acoustic and image signals. We propose an integrated solution to effectively estimate an automobile's speed with COTS devices, and provide a platform for every pedestrian to help report automobile speeding events. Specifically, we use the time difference of arrivals (TDOA) model based on acoustic signals to figure out the candidate trajectories of the automobile, and use the pin-hole model based on image frames to figure out the vertical distance between the user's position and the automobile's trajectory, thereby determining the unique trajectory. Combined with the timestamps of the trajectory, the automobile speed can be estimated.
Besides, we propose a method to effectively mitigate the influence of the mobile phone's movement jitters. We implemented a system prototype of SpeedTalker and estimated automobile speed with high accuracy. Experiment results show that in the single-automobile scenario, SpeedTalker achieves an average estimation error of 6.1% compared with radar speed guns. In the multiple-automobile scenario, SpeedTalker achieves an average estimation error of 9.8%, which is acceptable for practical use.

• Xinran Lu, Lei Xie, Yafeng Yin, Wei Wang, Yanling Bu, Qing Guo and Sanglu Lu are with the State Key Laboratory for Novel Software Technology, Nanjing University, China. E-mail: luxinran@smail.nju.edu.cn, lxie@nju.edu.cn, yafeng@nju.edu.cn, ww@nju.edu.cn, yanling@smail.nju.edu.cn, guoqing@smail.nju.edu.cn, sanglu@nju.edu.cn.
• Lei Xie is the corresponding author.

1 INTRODUCTION

1.1 Motivation

Nowadays, traffic violations occur more and more often as the number of automobiles grows; in 2016, for example, road traffic deaths reached 1.35 million. Among all kinds of traffic violations, speeding is the deadliest factor [1]. Appropriate reductions in speed can lower the risk of fatal and serious crashes and thus prevent death and serious injury [2]. To reduce speeding, it is essential to devise efficient schemes for ubiquitous traffic monitoring. Traditional ways to monitor traffic use speed radars or cameras. However, they are costly and inconvenient, since they require wide deployment of special equipment. As a result, a low-cost and mobile solution to measure speed is needed.

Mobile phones, which embed many kinds of sensors such as cameras and microphones, have become indispensable in daily life. By utilizing these built-in sensors, we can measure automobile speed with mobile phones. Specifically, we can use the microphones and camera to recover the trajectory of an automobile and estimate its speed, while IMU sensors are used to remove jitters and raise the accuracy of the system. In this way, every pedestrian can help monitor traffic conditions with his/her mobile phone. Furthermore, everyone can participate in reporting traffic conditions by fully leveraging crowdsourcing [3].

Fig. 1: Application scenario of SpeedTalker. (a) Illustration of the system. (b) The application of the system.

A typical scenario of SpeedTalker is as follows. In speeding-prone areas, pedestrians who volunteer to monitor the traffic can arrive at the area in advance and continuously record the acoustic and visual signals of the automobiles. The pedestrians only need to use the system for a few minutes to collect traffic speed information for that period. SpeedTalker estimates the speed of the automobiles and collects speeding-related information, which is then uploaded to the server of the relevant department. With the help of volunteers, data from different regions at different times can be analyzed for traffic control: the distribution of traffic police and equipment can be optimized, and drivers and pedestrians can be warned of danger when moving in the area.

1.2 Limitation of Prior Art

There exist two main approaches to measure the speed of automobiles. One approach is to use fixed devices. Cameras and induction coils are traditional fixed devices for speed detection. They monitor whether automobiles are present at two pre-set locations. When an automobile passes both locations, the system records the time interval between the two detections, from which the speed of the automobile can easily be estimated. However, deploying fixed speed-measurement devices widely enough to monitor traffic is prohibitively costly. Besides, drivers can easily figure out whether speed-measurement devices are present, since their positions are fixed. Moreover, each speed-detection camera needs its own parameters to estimate the speed of the automobiles.
The height, orientation, and field of view (FOV) of a camera deployed on a traffic pole determine its detection region. This makes the estimation simple, but it only works for that specific camera.

Another approach to measure the speed of automobiles is to use portable devices, such as radar speed guns [4] or lidar [5]. Radar speed guns use the Doppler effect to measure speed. They send out a radio signal in a narrow beam, then receive the same signal back after it bounces off the target object. If the object is moving, the frequency of the reflected radio waves changes. From the frequency difference between the reflected and transmitted waves, the speed of the object can be calculated. However, these portable devices have limitations. First, special devices are needed to emit directional, modulated electromagnetic waves at a certain frequency. This increases the hardware cost and prevents them from being widely used by ordinary people. Second, the electromagnetic waves emitted by the equipment can easily be detected by radar detectors in automobiles. This often causes the measurement to miss the speeding event, since drivers may intentionally slow down when passing by.

Therefore, to make every pedestrian a potential speeding inspector, it is essential to leverage everyday portable devices, such as mobile phones, and to propose easy-to-use methods to measure the speed of automobiles. In fact, by fully using embedded sensors like microphones and cameras, we can effectively use mobile phones to measure automobile speed.

1.3 Our Approach

In this paper, we propose SpeedTalker, a mobile-phone-based approach to detecting the speed of automobiles. Instead of using special devices, a pedestrian on the sidewalk can use the built-in microphones and camera of a mobile phone to estimate the speed of an automobile. IMU sensors are used to compensate for the jitters caused by the user's hand.
Figure 1 illustrates the application scenario of the system. To perform speed detection, the user holds the mobile phone in landscape orientation as shown in the figure, i.e., with the top and bottom microphones placed left and right. When an automobile passes by, both microphones record its sound, and the camera records its movement. Based on the measurements from these two kinds of sensors, SpeedTalker estimates the speed of the automobile. Specifically, while the automobile is passing by, its sound wave reaches the top and bottom microphones at different times. From the time difference of arrivals (TDOA) derived from the acoustic signals of the two microphones, SpeedTalker estimates the candidate trajectories of the automobile as a set of hyperbolas. From the camera frames, SpeedTalker estimates the vertical (perpendicular) distance between the user's position and the automobile's trajectory by referring to the pin-hole model of the camera. The trajectory of the automobile can then be selected from the candidates using this unique vertical distance. Combined with the temporal information in the acoustic signals, SpeedTalker is able to estimate the speed of the automobile. Besides, since the mobile phone is held in hand, jitters may cause rotation and translation of the phone; IMU sensors can be used to compensate for these translations and rotations and reduce the errors.

1.4 Challenges

There are three main challenges in our work. The first challenge is to propose a passive sensing method to measure the speed of automobiles. Passive sensing means the detection system does not actively transmit any detecting signals, such as ultrasound or flash light. Active sensing has two limitations for speeding detection.
First, active signals, e.g., electromagnetic waves, can easily be detected by radar detectors. Second, an ultrasonic wave or flash light actively generated by a mobile phone is dramatically attenuated when transmitted outdoors. To address this challenge, we propose a passive sensing method to estimate the speed of the automobile using the two microphones and one camera of a mobile phone. Instead of actively transmitting modulated signals and receiving the reflected signals, our solution only collects the acoustic signals and image frames of the automobiles in a passive manner. The trajectory of an automobile can be estimated from the acoustic signals of the two separated microphones and the image frames from the camera. Combined with the timestamps of the trajectory, the speed of the automobile can be estimated.

The second challenge is to derive the automobile speed from complicated acoustic signals. The complication comes from two aspects. On one hand, automobile noise is made up of many components, including tire noise, engine noise, exhaust noise, wind noise, etc. [6]. These noises are mixed not only in the time domain but also in the frequency domain. Therefore, it is hard to separate the different noises with the two built-in microphones of a mobile phone. On the other hand, there may be many kinds of noise in the environment, especially the sound of other automobiles on the road. Such environmental noise is hard to remove, since the sound of other automobiles mainly lies in a frequency band very close to that of the target automobile. To address this challenge, we consider the acoustic signals over the full frequency band as a whole. We use the cross-correlation of the acoustic signals from the top and bottom microphones to estimate the time difference of arrivals (TDOA). As the automobile moves continuously, we obtain a series of time delays through TDOA at different times. The candidate trajectories of the automobile can be estimated as a set of hyperbolas according to the curve of the time delay, from which the automobile speed can be further estimated.

The third challenge is to estimate the speed of multiple automobiles. We cannot separate the sounds of multiple automobiles, so when multiple automobiles pass by the mobile phone, it is challenging to estimate their speeds. To address this challenge, we utilize the multiple peaks in the cross-correlation between the top and bottom microphone signals. We can then recover the delay curve of each automobile and calculate its speed.
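The cross-correlation TDOA step of the second challenge and the multi-peak idea of the third can be illustrated with a short NumPy sketch. It is not SpeedTalker's implementation: the plain (unweighted) cross-correlation, the hypothetical helper names `tdoa_peaks` and `delay_curve`, the 48 kHz sampling rate, the 0.15 m microphone spacing, the 343 m/s speed of sound, and the use of white noise as a stand-in for automobile sound are all assumptions made for illustration. Only lags within ±d/c are searched, since no real source can produce a larger delay between two microphones spaced d apart.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def tdoa_peaks(top, bottom, fs, max_delay_s, num_peaks=1, min_sep=2):
    """Return up to `num_peaks` candidate TDOAs (s) of `bottom` relative to `top`.

    A positive delay means the sound reached the top microphone first.
    Peaks are sorted strongest first, so one window containing several
    automobiles can yield several delays (hypothetical helper).
    """
    top = np.asarray(top, dtype=float)
    bottom = np.asarray(bottom, dtype=float)
    top = top - top.mean()
    bottom = bottom - bottom.mean()
    # Full cross-correlation: corr[i] = sum_n bottom[n + k] * top[n], k = i - (len(top) - 1).
    corr = np.correlate(bottom, top, mode="full")
    zero = len(top) - 1
    max_lag = int(np.ceil(max_delay_s * fs))  # keep only physically possible lags
    lags = np.arange(-max_lag, max_lag + 1)
    window = corr[zero - max_lag: zero + max_lag + 1]

    # Interior local maxima of the correlation inside the allowed lag range.
    inner = window[1:-1]
    is_peak = (inner > window[:-2]) & (inner >= window[2:])
    peak_idx = np.nonzero(is_peak)[0] + 1
    peak_idx = peak_idx[np.argsort(window[peak_idx])[::-1]]  # strongest first

    chosen = []
    for i in peak_idx:
        if all(abs(lags[i] - lags[j]) >= min_sep for j in chosen):
            chosen.append(i)
        if len(chosen) == num_peaks:
            break
    return [float(lags[i]) / fs for i in chosen]


def delay_curve(top, bottom, fs, max_delay_s, win=2048, hop=1024, num_peaks=1):
    """Slide a window over the recording and return (center_time_s, delays) pairs."""
    curve = []
    for start in range(0, len(top) - win + 1, hop):
        seg = slice(start, start + win)
        center = (start + win / 2) / fs
        curve.append((center, tdoa_peaks(top[seg], bottom[seg], fs, max_delay_s, num_peaks)))
    return curve


if __name__ == "__main__":
    fs = 48_000
    d = 0.15  # hypothetical spacing between the top and bottom microphones (m)
    rng = np.random.default_rng(0)
    n = 4096
    car_a = rng.standard_normal(n + 40)  # broadband stand-in for automobile A's sound
    car_b = rng.standard_normal(n + 40)  # broadband stand-in for automobile B's sound
    # A reaches the top mic 5 samples earlier; B reaches the bottom mic 8 samples earlier.
    top = car_a[20:20 + n] + 0.7 * car_b[20:20 + n]
    bottom = car_a[15:15 + n] + 0.7 * car_b[28:28 + n]
    delays = tdoa_peaks(top, bottom, fs, d / SPEED_OF_SOUND, num_peaks=2)
    print([round(t * fs) for t in delays])  # delays in samples, strongest first
```

Each local maximum of the correlation corresponds to one dominant sound source, so calling `delay_curve` with `num_peaks=2` on a recording of two passing automobiles yields two delays per window. Associating those peaks across windows into one delay curve per automobile is a separate tracking step that this sketch does not attempt.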
1.5 Contributions

This paper makes the following contributions. First, this is the first work that estimates automobile speed with mobile phones through passive sensing of acoustic and image signals. We propose an integrated solution that effectively estimates the automobile's speed using commercial off-the-shelf (COTS) devices, and we provide a platform that lets every pedestrian help report speeding automobiles. Second, we use the time difference of arrivals (TDOA) model based on acoustic signals to figure out the candidate trajectories of the automobile. We then use the pin-hole model based on image frames to figure out the vertical distance between the user's position and the automobile's trajectory, which lets us determine the unique trajectory. Combined with the timestamps of the trajectory, the automobile speed can be estimated. Third, we implemented a system prototype of SpeedTalker and estimated the automobile speed with high accuracy. The system works in outdoor environments and effectively mitigates ambient environmental interference. Experiment results show that SpeedTalker achieves an average estimation error of 6.1% in the single-automobile scenario and 9.8% in the multiple-automobile scenario.

2 RELATED WORK

Automobile detection via visual signals: Traditional approaches utilize cameras to calculate the speed of automobiles. Kumar[7] and Czajewski[8] use computer-vision-based techniques to detect automobiles. The cameras are deployed at fixed positions and orientations above the street. As a result, the detection region is known and fixed, so the moving distance of the automobiles can easily be acquired and their speed calculated. However, the scenario of SpeedTalker differs from that of traditional visual approaches: the mobile phones are on the sidewalk, and their positions and orientations are unknown. Therefore, novel approaches that calculate the speed of automobiles with mobile phones are needed.
To get the relative position between automobiles and mobile phones, we need to use the cameras of the mobile phones. This plays the role that known camera positions and orientations play in traditional CV-based approaches. Apart from distance calculation, SpeedTalker utilizes acoustic signals to estimate the candidate trajectories of the automobiles. Acoustic signals have two advantages over visual signals. First, the detection region of acoustic signals is broader than that of visual signals. Cameras in mobile phones usually have a narrow field of view (FOV). For example, the wide-angle camera of the Samsung Galaxy Note 8 has a 77° field of view. In contrast, if we use the microphones of the Samsung Galaxy Note 8 to detect automobiles, the detection field of view is around 160° according to the hyperbola model we propose. Second, the computational complexity of acoustic signal processing is much lower than that of visual signal processing. If visual signals were used for the same task, every frame of the video would have to be processed, and the resulting computational cost would be unacceptable.

Automobile detection via mobile phones: Automobile detection is an important research area, since undetected automobiles are likely to endanger human life. Mobile phones can be used to warn their users of approaching automobiles. There are three approaches to sensing automobiles with mobile phones. The first approach is to install applications on both the automobiles and the mobile phones. Oki Electric Industry Co. Ltd. developed a mobile phone that notifies users of the presence of automobiles using DSRC[9]. Car-2-X uses ad-hoc and cellular networks to inform pedestrians of automobiles in the same way[10]. The second approach is to sense moving automobiles via images. Sivaraman proposed a general active-learning framework for on-road automobile recognition and tracking based on videos[11].
Wang proposed WalkSafe, a mobile phone application that senses automobiles with the back camera[12]. The drawback of these works is that image processing requires heavy computing resources. Moreover, the camera of the mobile phone has to face the road, which makes detection inconvenient. The third approach is to use acoustic signals to sense automobiles. Tsuzuki proposed an automobile sound detection system for mobile phones[13]. Takagi introduced a detection system for hybrid and electric vehicles[14] that focuses on the switching noise of the electric motor, so it fails to detect other types of automobiles. Li proposed Auto++, a system that lets smartphone users detect approaching automobiles of all kinds via their overall acoustic signals[15]. However, all these works can only warn the user that automobiles are approaching; they cannot estimate the speed of the automobiles.

Sensing via acoustic signals with mobile phones: Sensing with everyday devices is a popular research topic. Everyday devices such as mobile phones and smart watches can easily transmit and receive sound waves, and much work based on sound waves has been published. AAMouse measures the Doppler shift of the sound waves transmitted by a mobile phone to track the phone itself with an accuracy of 1.4cm[16]. Wang proposed a device-free gesture tracking method using acoustic signals[17]. It achieves a tracking accuracy of 3.5mm for 1-D hand movement and 4.6mm for 2-D drawing in the air. ApneaApp uses chirp signals to detect the changes in reflected sound caused by human breathing[18]. The system applies an FFT to the acoustic signals to monitor periodic movements with frequencies lower than 1Hz. All these works need to transmit active sound waves to sense objects, and they do not work outdoors over long distances with strong environmental noise. In summary, calculating the speed of an automobile with a mobile phone in an outdoor environment is quite challenging.

Fig. 2: Simple analysis on acoustic signals. (a) Empirical study setup. (b) STFT of acoustic signals when the automobile passes by. (c) S-shaped curve.

Distance perception via cameras: Computer vision often needs distance perception to optimize algorithms and improve performance. The traditional way to estimate the distance between an object and the camera is to use a binocular system to calculate the depth. Hartley gives a detailed treatment of the view geometry used for distance calculation in computer vision[19]. Tram uses two cameras mounted in automobiles to capture LED light and estimate the distance between vehicles[20]. However, this approach is not suitable for our scenario.
Although some mobile phones have multiple cameras on the back, each camera has its own role: some have wide-angle lenses, some have telephoto lenses and some have infrared lenses, and they may not be able to work at the same time. Moreover, some mobile phones have only one camera on the back. Other papers use a single camera to estimate distance. Diaz-Cabrera uses one camera to estimate the distance between the automobile and the traffic light[21]. This requires knowing the height of the traffic light and the camera parameters in advance. Rahman uses one camera to estimate the distance between the user and the camera[22]. This method also requires knowing the distance between the eyes and the camera parameters. Our approach uses similar view geometry to calculate the distance, and we obtain the real diameter of the wheel hub through machine learning approaches.

3 EMPIRICAL STUDY AND MODELING

3.1 Acoustic Signal Study

3.1.1 Measurement of Acoustic Signals via Mobile Phones

To study the relation between the acoustic signals and the speed of the automobile, we need to collect acoustic signals as automobiles pass by. To avoid the influence of jitters from the mobile phone, we mount the mobile phone on a tripod. As shown in figure 2a, the tripod is set at one side of the road with the phone's camera facing the road, and the mobile phone is in landscape orientation. The mobile phone is about 1.5m above the ground and about 8m away from the lane. It records the sound as the automobile passes by. The sampling rate fs of the sound in the empirical study is 44.1kHz.

3.1.2 Doppler Effect

The usual way to estimate the speed of a moving object is to use the Doppler effect.
If we already know the frequency f of the original wave, the frequency f' of the real-time (received) wave is given by:

$$ f' = \frac{C^2 f}{C^2 - v^2}\left(1 - \frac{v^2 t}{\sqrt{C^2 v^2 t^2 + l^2 (C^2 - v^2)}}\right) \tag{1} $$

where C is the velocity of sound, v is the velocity of the automobile, l is the closest distance between the mobile phone and the automobile, and t is the time[14]. The distance between the mobile phone and the automobile is shortest at t = 0. To calculate the speed of the automobile, we need to find the original frequency f and the real-time frequency f' of a specific sound wave.

First, we focus on the original frequency f of the moving automobile. Since active sensing does not work in our scenario, we do not transmit sound waves at a specific frequency. As a result, we have to analyze the sound made by the automobile itself to find the original frequency. Automobile noise includes tyre noise, engine noise, exhaust noise, wind noise and so on. The frequency of tyre noise is widely distributed, and its peak lies between 315Hz and 1000Hz[23]. The engine noise is dominated by the rotation speed of the engine. The frequency of the engine noise is mainly distributed from 1600Hz to 4000Hz, and the peak part concentrates in the range from 100Hz to 400Hz[24]. The frequency of exhaust noise and wind noise is closely related to the speed of the automobile. All these noises vary with the type of automobile, tyres, engine and so on. In other words, automobile noise does not have a specific frequency and varies from one automobile to another, so we cannot find the original frequency f in our scenario.

Next, we focus on the real-time frequency f'. Figure 2b shows the short-time Fourier transform (STFT) of the recording as the automobile passes by. We can see that the power increases across the full frequency band, which makes it hard to pick a specific frequency for calculating the speed. In other words, we can hardly tell why the power at a specific frequency increases.
The increase may be caused by the approach of the automobile, or by a shift of the original high-power frequency.

Fig. 3: Empirical study. (a) Positions A to E of the automobile at different times, relative to the top and bottom microphones. (b) to (f) Signals from the top and bottom microphones at positions A to E (amplitude versus time, 0 to 0.01s). (g) to (k) Cross-correlation between the top and bottom signals at positions A to E versus time delay (samples); the peaks lie at delays of 19, 8, 0, -9 and -15 samples, respectively.

To conclude, if the Doppler effect could be used to solve the problem in our scenario, we should be able to find S-shaped curves[14] in the spectrogram.
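Equation (1) can be evaluated directly to see the S-shaped curve the Doppler effect would produce. The Python sketch below is illustrative only. The function name `doppler_frequency` is hypothetical, and its default arguments are the example values the text uses for figure 2c (f = 4000Hz, l = 10m, v = 20m/s, C = 340m/s).

```python
import math

def doppler_frequency(t, f=4000.0, l=10.0, v=20.0, c=340.0):
    """Received frequency f'(t) from Eq. (1).

    t: time in seconds (closest approach at t = 0)
    f: original source frequency (Hz); l: closest distance (m)
    v: automobile speed (m/s); c: speed of sound (m/s)
    """
    root = math.sqrt(c**2 * v**2 * t**2 + l**2 * (c**2 - v**2))
    return c**2 * f / (c**2 - v**2) * (1.0 - v**2 * t / root)

# Sample the curve at a few instants around the closest approach.
for t in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(f"t = {t:+.1f} s  ->  f' = {doppler_frequency(t):7.1f} Hz")

# Asymptotes: approaching f*c/(c - v), receding f*c/(c + v).
print(4000.0 * 340.0 / 320.0, 4000.0 * 340.0 / 360.0)
```

The received frequency sweeps smoothly from about fC/(C − v) ≈ 4250Hz well before the closest approach to about fC/(C + v) ≈ 3778Hz well after it. A trace of this shape is what one would look for in the spectrogram.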
The S-shaped curves show that some specific frequencies shift to lower frequencies in the spectrogram of the acoustic signals, as described by equation (1). For example, if we let f = 4000Hz, l = 10m, v = 20m/s and C = 340m/s, we get the S-shaped curve shown in figure 2c. However, we cannot find any S-shaped curve in the spectrogram of the recorded acoustic signal. This means that the Doppler effect cannot be used to estimate the speed in our scenario.

3.1.3 Correlation between the Acoustic Signals

Since the frequency domain cannot help us estimate the speed of the automobile, we look for clues in the time domain. To understand how the automobile's speed affects the acoustic signals it produces, we need to extract spatial and temporal information from the received signals. Because the top and bottom microphones record two audio streams at the same time, we can calculate this spatial information.

Figure 3a shows the five positions along the automobile's trace that we chose to study. At each position, we record the sound for 0.01s with both the top and bottom microphones. Figures 3b to 3f show the raw signals at positions A to E. The waveforms of the two acoustic signals differ in detail because the microphones differ, but they have similar shapes, offset by certain time delays. To further study the relation between the two signals, we calculate their cross-correlation[25]. Figures 3g to 3k show the cross-correlation between the signals collected by the top and bottom microphones at the different positions. The signals in figures 3b and 3c are recorded when the automobile is on the top side of the mobile phone, and the signals from the top microphone are ahead of those from the bottom microphone. Figures 3g and 3h show that the time delay can be read off as the lag at which the cross-correlation between the two signals peaks.
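The sign pattern of these delays follows from the difference between the source's distances to the two microphones. The delay is positive on the top side, zero when the automobile is level with the phone, and negative on the bottom side. The Python sketch below models this under explicit assumptions. The microphone half-spacing `L_HALF` = 0.075m is hypothetical, not a value from the paper. The top microphone is placed on the +x side, and the lane is 8m in front of the phone, as in the empirical setup. The values fs = 44.1kHz and C = 340m/s follow the text, and the function names are illustrative.

```python
import math

C = 340.0        # speed of sound (m/s), value used in the text
FS = 44100.0     # sampling rate fs (Hz) of the empirical study
L_HALF = 0.075   # hypothetical half-spacing l between the microphones (m)
TOP = (L_HALF, 0.0)      # assumed top microphone position (x, y)
BOTTOM = (-L_HALF, 0.0)  # assumed bottom microphone position (x, y)

def expected_delay(x, y):
    """Delay of the bottom channel behind the top channel, in samples,
    for a point source at (x, y); positive when the source is nearer the top mic."""
    d_top = math.hypot(x - TOP[0], y - TOP[1])
    d_bottom = math.hypot(x - BOTTOM[0], y - BOTTOM[1])
    return (d_bottom - d_top) / C * FS

def max_delay():
    """Triangle-inequality bound |M1S - M2S| < M1M2, expressed in samples."""
    return 2.0 * L_HALF / C * FS

# An automobile driving past on a straight lane 8 m in front of the phone,
# from the top side (x > 0) to the bottom side (x < 0).
for x in (40.0, 10.0, 0.0, -10.0, -40.0):
    print(f"x = {x:+6.1f} m  ->  delay = {expected_delay(x, 8.0):+6.2f} samples")
print(f"bound = {max_delay():.2f} samples")
```

The last line prints the triangle-inequality bound on the delay, which the text later uses as constraint 1. Its real value depends on the microphone spacing of the particular phone model.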
Similarly, figures 3e and 3f show the signals when the automobile is on the bottom side of the mobile phone; the time delays at positions D and E are -9 and -15 samples. Since the acoustic signals from the top and bottom microphones are temporally related, we can split the signals into small segments to study their detailed relationship. This lets us calculate the speed of the automobile.

3.2 Modeling Automobile Speed via Microphone and Camera

3.2.1 Build the Coordinate System

We use a three-dimensional coordinate system to describe the scenario, as figure 4a shows. The origin is located at the midpoint of M1M2, where M1 and M2 are the points representing the two microphones. The x-axis is horizontal and points to the right, the y-axis points out of the screen face, and the z-axis is vertical and points up.

Fig. 4: The model of the scenario. (a) 3-D coordinate system of the scenario. (b) Simplified 2-D coordinate system.

The coordinates of M1 and M2 are (−l, 0, ∆h) and (l, 0, −∆h). 2l represents the horizontal distance between the two microphones, and 2∆h represents their height difference when the mobile phone is in landscape orientation. S(x, y, h) represents the sound source, where h is the height difference between the xy plane and the sound source. Since ∆h ≪ l and h ≪ x or y, we can simplify the scenario into the 2-D model shown in figure 4b, in which h and ∆h are ignored.

3.2.2 Preprocessing of the Acoustic Signals

In this section, we split the acoustic signals into small segments and calculate the cross-correlation of the corresponding segments to get the time delay. To study the time delay between the two acoustic signals in more detail, we split the signals into time-ordered segments and get one time delay from each pair of corresponding segments. The segment size must be chosen with care. The more sampling points a segment includes, the longer it lasts, and the fewer segment pairs and time delays we get. This causes two problems. First, the automobile changes its position within one segment: if the segment is too large, the automobile travels a long distance within it.
This makes the time delay inaccurate, because the sound source can no longer be regarded as a point. Second, if too few time delays are obtained, the resulting time delay curve is coarse-grained, which degrades the estimation precision. On the other hand, the fewer samples a segment contains, the more easily it is corrupted by environmental noise. We therefore need an appropriate segment size. In our scenario, each segment consists of ns = fs/100 = 441 samples (with fs = 44.1 kHz), i.e., one segment lasts 0.01 s. With this size, the signals from the top and bottom microphones are similar enough to calculate the time delay. Even if the automobile travels at about 50 m/s (180 km/h), it moves only 0.5 m within one segment, so we can approximately consider the position of the sound source to be unchanged within a segment.

Fig. 5: Cross-correlation of the corresponding segments. (a) Time delays at different time (time delay in samples versus time in seconds; constraint 1 marked). (b) Maximum correlation distribution (maximum correlation versus serial number i of the segments; constraint 2 marked).

After segmenting the acoustic signals, we obtain two sequences of segments, S1 = {W11, W12, ..., W1n} and S2 = {W21, W22, ..., W2n}, from the top and bottom microphones respectively. The following equations calculate the cross-correlation Ri and the delay ∆di, where i is the serial number of the segment pair:

$$R_i(n) = \sum_{m=-n_s}^{n_s} W_{1i}(m)\, W_{2i}(m+n), \tag{2}$$

$$\Delta d_i = \arg\max_{t \in \mathbb{Z}} R_i(t). \tag{3}$$

After computing the cross-correlation Ri(n), we find its largest element Ri(t); the lag t that maximizes Ri is taken as the time delay ∆di of the i-th pair of segments. Once we have the time delay ∆d of each pair of corresponding segments, we need to decide whether it is usable by our system, since some of the delays may be erroneous due to various kinds of noise.
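A minimal Python sketch of equations (2) and (3) follows, assuming plain lists of samples for the two microphones. The helper names (`segment`, `xcorr`, `segment_delays`, `synthetic_pair`) are hypothetical, and limiting the lag search to ±`max_lag` samples is our own simplification for speed; in the text, implausible delays are instead discarded afterwards by the validity constraints. The sign convention follows equation (2): a positive delay means the top-microphone segment W1i leads the bottom one W2i.

```python
import random


def segment(signal, ns):
    """Split a signal into consecutive, non-overlapping segments of ns samples."""
    return [signal[k:k + ns] for k in range(0, len(signal) - ns + 1, ns)]


def xcorr(w1, w2, n):
    """Equation (2) at one lag: R(n) = sum_m w1[m] * w2[m + n] over valid indices."""
    lo = max(0, -n)
    hi = min(len(w1), len(w2) - n)
    return sum(w1[m] * w2[m + n] for m in range(lo, hi))


def segment_delays(top, bottom, ns, max_lag):
    """Equation (3) per segment pair: (best lag, peak correlation) for each i."""
    results = []
    for w1, w2 in zip(segment(top, ns), segment(bottom, ns)):
        # Correlate only over lags in [-max_lag, max_lag] (assumed search window).
        corr = {n: xcorr(w1, w2, n) for n in range(-max_lag, max_lag + 1)}
        best = max(corr, key=corr.get)
        results.append((best, corr[best]))
    return results


def synthetic_pair(length, delay, seed=0):
    """Hypothetical test input: white noise that reaches the top microphone
    `delay` samples before the bottom one (delay >= 0)."""
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(length + delay)]
    top, bottom = base[delay:delay + length], base[:length]
    return top, bottom
```

For example, `segment_delays(*synthetic_pair(4 * 441, 7), ns=441, max_lag=20)` splits the input into four 0.01 s segments (at 44.1 kHz) and finds a delay of 7 samples in each. The peak value returned alongside each delay is what a correlation threshold such as Rs would be compared against.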
Time delays that are only weakly affected by noise, and are therefore suitable for further calculation, should satisfy the following constraints:
1) The absolute value of the delay ∆d should be less than the maximum time delay ∆dm, which is determined by the type of the mobile phone.
2) The correlation of the corresponding segments should exceed a preset threshold Rs.
The upper bound of the valid delay in constraint 1 follows from the triangle inequality. As figure 4b shows, |M1S − M2S| < M1M2, where |M1S − M2S| can be calculated from the time delay and M1M2 is the distance between the two microphones. Hence the bound is determined by the distance between the top and bottom microphones. Suppose the sampling rate is fs; the
maximum valid delay ∆dm between the two signals is given by equation (4):

$$\Delta d_m = \frac{2 l f_s}{C}, \tag{4}$$

where 2l is the distance between the two microphones and C is the speed of sound. For example, we set fs = 44.1 kHz and C = 343 m/s. Without loss of generality, we take the Samsung Note 8, on which the experiments in Section 5 are based: 2l = 15 cm, so

$$\Delta d_m = \frac{0.15\,\text{m} \times 44100\,\text{s}^{-1}}{343\,\text{m/s}} \approx 19.3 \approx 20 \text{ samples}.$$

We denote the delay between a segment pair as ∆d. By the triangle inequality, a valid delay obtained from cross-correlation is an integer whose absolute value |∆d| is less than ∆dm, i.e., ∆d ranges from −∆dm to ∆dm, as figure 5a shows. We define the region in which the time delays sweep from ∆dm to −∆dm (or vice versa) as the Major Detection Region. For example, the Major Detection Region in figure 5a starts at 2.5 s and ends at 5 s.

Fig. 6: Generating Time Delay curve. (a) Time delay curve. (b) Smoothed time delay curve (original and smoothed delays, with the instants t1 to t7 marked).

In constraint 2, the threshold Rs has a physical interpretation: the automobile should be close enough to the mobile phone that the signals of the two corresponding segments are sufficiently similar. Cross-correlation measures the similarity of two signals; the larger the correlation, the more similar the two signals. If the automobile is far from the mobile phone, its sound is too weak to dominate the signal, so the signals from the top and bottom microphones are not similar enough and the cross-correlations of these segments are small. Such time delays are not suitable for speed calculation. The threshold varies with the scenario, and it can be determined with the help of constraint 1.
Since we focus on the Major Detection Region, we can take the maximum cross-correlation of the boundary segments of the Major Detection Region as the threshold. In other words, according to constraint 1 there must be a period in which the time delays are around ∆dm, and the threshold can be determined from the correlations of these time delays. For example, in figure 5b the threshold can be set to 0.5. Constraint 2 helps us remove some of the noise that appears in the Major Detection Region. Figure 6a plots the time delay curve, where blue points represent the valid time delays and red points the invalid ones.

Fig. 7: Slope Calculating. (a) Hyperbolas generated by the time delays. (b) A simplified model of the asymptotes and the trace.

3.2.3 Candidate Trajectories Estimation

After obtaining a series of time delays, we need to recover the trace of the automobile. We use the Major Detection Region to estimate the candidate trajectories of the automobile. In most situations the Major Detection Region lasts less than 3 seconds; for example, in figure 6a it lasts 1.5 seconds. Since the duration is short, we can assume that the trace of the automobile is a line. In two dimensions, a linear trace can be represented as

$$y = mx + b, \tag{5}$$

so two parameters determine the line: the slope m and the parameter b, which determines the vertical distance between the automobile and the mobile phone.

First we calculate the parameter m from the time delay curve. If the time delay between the top and bottom microphones is ∆d at time t, the automobile lies at that moment on the hyperbola whose foci are M1(−l, 0) and M2(l, 0) and whose vertices are V1(−∆d/2, 0) and V2(∆d/2, 0). Here ∆d denotes the path-length difference |M1S − M2S|, i.e., the delay in samples multiplied by C/fs. The hyperbola is

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \tag{6}$$

where a = ∆d/2 and b = √(l² − a²) (this b is a semi-axis of the hyperbola, not the parameter b of equation (5)).
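To make equations (4) and (6) concrete, the following sketch turns one sample delay into the hyperbola on which the automobile lies. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, C = 343 m/s is the value used in the text, the delay is converted to a path-length difference as ∆d·C/fs, and we adopt our own sign convention that a positive delay means |PM1| − |PM2| > 0 (the source is closer to M2). Returning `None` mirrors constraint 1: once |∆d| reaches ∆dm, b = √(l² − a²) is no longer real.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def max_delay_samples(mic_distance, fs, c=SPEED_OF_SOUND):
    """Equation (4): largest valid delay, in samples, for microphones 2l apart."""
    return mic_distance * fs / c


def hyperbola_params(delay_samples, mic_distance, fs, c=SPEED_OF_SOUND):
    """Equation (6): semi-axes (a, b) for one delay, or None if constraint 1 fails."""
    l = mic_distance / 2.0                 # foci at (-l, 0) and (l, 0)
    path_diff = delay_samples * c / fs     # samples -> metres (assumed sign convention)
    a = path_diff / 2.0                    # signed: the sign selects the branch
    if abs(a) >= l:                        # |delta d| >= delta d_m: b would not be real
        return None
    return a, math.sqrt(l * l - a * a)


def point_on_branch(a, b, y):
    """x coordinate of the hyperbola point at height y, on the branch given by sign(a)."""
    return a * math.sqrt(1.0 + (y / b) ** 2)
```

With the Samsung Note 8 values from the text, `max_delay_samples(0.15, 44100)` is about 19.29 samples, so `hyperbola_params(20, 0.15, 44100)` returns `None`, while for any valid delay every point produced by `point_on_branch` satisfies |PM1| − |PM2| = ∆d·C/fs.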
Figure 7a shows the hyperbolas generated by different time delays. As the figure shows, the hyperbolas look like straight lines. The reason is that in our scenario x, y ≫ l, where (x, y) is the coordinate of the automobile in figure 4b: l is usually shorter than 10 cm, whereas x and y are usually longer than 5 meters. So we can use the asymptote of
the hyperbola instead of its analytic expression [26] to simplify the calculation. The asymptote is

$$y = \frac{b}{a}x, \tag{7}$$

where a and b are the same as in equation (6). The exact location of the automobile cannot be determined from a single time delay, but the time delay curve gives a series of hyperbolas on which the automobile should be located. We therefore use a series of time delays in the Major Detection Region to estimate the slope of the trace.

First, we select n time delays at equal time intervals. For example, in figure 6b we select n = 7 time delays, from t1 to t7, with an interval of 0.25 s. To illustrate the estimation in detail, we focus on the three adjacent asymptotes in figure 7b. The asymptote at t1 can be written as y = k1x, where

$$k_1 = \frac{b_1}{a_1} = \frac{\sqrt{4l^2 - \Delta d_1^2}}{\Delta d_1}.$$

Similarly, the asymptotes at t2 and t3 are y = k2x and y = k3x. Combining them with equation (5), we obtain the intersection points A, B and C between the trace and the asymptotes at t1, t2 and t3: A = (b/(k1 − m), k1b/(k1 − m)), B = (b/(k2 − m), k2b/(k2 − m)) and C = (b/(k3 − m), k3b/(k3 − m)). The distance l1 between A and B is

$$l_1 = |AB| = \frac{b(k_2 - k_1)\sqrt{m^2 + 1}}{(k_1 - m)(k_2 - m)}, \tag{8}$$

and the distance l2 between B and C is

$$l_2 = |BC| = \frac{b(k_3 - k_2)\sqrt{m^2 + 1}}{(k_2 - m)(k_3 - m)}. \tag{9}$$

The automobile moves from A to C in a short time (0.5 s), so it is reasonable to assume that its speed remains stable during this period, i.e., l1/(t2 − t1) = l2/(t3 − t2). Then l1/l2 = (t2 − t1)/(t3 − t2) = 1, and the ratio l1/l2 is independent of the parameter b. Hence b cannot be determined in this way, whereas m can be calculated.

However, m should not be calculated from only three asymptotes, so we use least squares estimation (LSE) to estimate it. Since the automobile's speed remains unchanged from t1 to t3 and t2 − t1 = t3 − t2, the length difference ∆l12 between l1 and l2 should be minimal. The estimation function of m is

$$E(m) = |l_1 - l_2|^2 = \frac{b^2 (Am + B)^2 (m^2 + 1)}{\left((k_1 - m)(k_2 - m)(k_3 - m)\right)^2}, \tag{10}$$

where A = k1 + k3 − 2k2, B = k1k2 + k2k3 − 2k1k3, and b is treated as a constant (b² only scales E(m), so the minimizing m does not depend on it).

We then extend the estimation function from three asymptotes to all n selected time delays to find the best-fitting m:

$$E(m) = \sum_{i=1}^{n-2} |l_{i+1} - l_i|^2, \tag{11}$$

where each li is calculated as in equation (8). The slope is m = arg min_m E(m), which can be found by solving ∂E(m)/∂m = 0.

Fig. 8: Image processing. (a) The pin-hole model of the camera system. (b) Position estimation.

3.2.4 Estimating Distance through the Image of the Automobile

After calculating the slope m of the trajectory with LSE, we need the parameter b of the trajectory, which cannot be determined from the acoustic signals. Objects look small when far away and large when close; similarly, the farther an object is from the camera, the fewer pixels it occupies in the captured image.
Therefore, we use the camera to estimate the distance between the automobile and the mobile phone. We can continuously record images of the automobile and estimate the vertical distance L between the mobile phone and the automobile. In this section we illustrate how to estimate the vertical distance from one video frame; processing more frames improves the accuracy of the distance estimate but increases the time consumption of the system. A camera can be simplified to a pin-hole model. In figure 8a and figure 7b, L is the vertical distance between the automobile and the mobile phone, H is the real length of the object, and h is the length of the object in the image plane. Since H/h = (L + f)/f, the distance is

$$L = f\left(\frac{H}{h} - 1\right) \approx \frac{fH}{h}, \tag{12}$$
where f is the focal length of the camera. f and h depend on the type of the mobile phone. Ideally, the parameters of mainstream mobile phones are stored in the application; otherwise, the system can obtain them with a simple calibration. We take a picture of an object of height Hc meters, measure the distance Lc between the object and the camera, and record the length hc of the object in the image, keeping the resolution constant. The distance L can then be calculated as L = Lc·(H·hc)/(Hc·h). If the resolution of the image remains unchanged, we can use the number of pixels in place of h, the length of the object in the image plane.

Fig. 9: Jitters removing. (a) Coordinate system of IMU on mobile phones. (b) Distance difference between acoustic signals. (c) Rotating around x-axis.

As a result, we can calculate the distance from the image if we already know the size of some component of the automobile. Wheel hubs follow mature size standards and are easy to extract from the picture, so we extract the wheel hubs in the image to estimate the distance. To avoid the stretch of the wheel hubs caused by the rolling shutter effect, we measure the diameter of the wheel hub in the vertical direction, because the stretch occurs along the automobile's driving direction. Note that the automobile usually does not appear in the middle of the picture, in which case the vertical distance L does not equal the parameter b.
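The pin-hole distance estimate L ≈ fH/h of equation (12) and the calibration L = Lc·(H·hc)/(Hc·h) can be sketched as follows. This is a sketch under our own assumptions: the function names are hypothetical, all image sizes are pixel counts at one fixed resolution (so the calibrated f is expressed in pixels and only the approximate form of equation (12) is used), and the numbers in the usage note are made up for illustration rather than taken from the paper.

```python
def focal_length_pixels(ref_distance, ref_size, ref_image_size):
    """Calibrate f in pixels from a reference object of size Hc (m) at distance Lc (m)
    that spans hc pixels, using the approximation Lc ~= f * Hc / hc."""
    return ref_distance * ref_image_size / ref_size


def pinhole_distance(f_pixels, real_size, image_size):
    """Approximate form of equation (12): L ~= f * H / h, with h in pixels."""
    return f_pixels * real_size / image_size


def calibrated_distance(real_size, image_size, ref_distance, ref_size, ref_image_size):
    """Closed form from the calibration step: L = Lc * (H * hc) / (Hc * h)."""
    return ref_distance * real_size * ref_image_size / (ref_size * image_size)
```

For example, if a hypothetical 1.0 m reference object 5 m away spans 200 pixels, `focal_length_pixels(5.0, 1.0, 200)` gives f = 1000 pixels, and a wheel hub of an assumed 0.4 m diameter spanning 40 pixels is then about 10 m away by either `pinhole_distance(1000.0, 0.4, 40)` or `calibrated_distance(0.4, 40, 5.0, 1.0, 200)`. The two routes agree because the calibration formula is just equation (12) with f eliminated.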
As shown in figure 8b, we can calculate the offset angle φ between the centre of the picture and the automobile from the viewing angle φm and the resolution of the image. First, we locate the automobile and obtain its offset x in pixels from the centre of the image. Then tan φ = x/f and tan φm = N/f, where N is the number of pixels from the centre of the image to its edge. The angle φ is therefore

$$\varphi = \arctan\left(\frac{x \tan\varphi_m}{N}\right), \tag{13}$$

where φm is determined by the type of the camera. After obtaining L and φ, the trajectory of the automobile can be determined.

3.2.5 Analysis of the Jitters

Although in our scenario the mobile phone needs to be held still, jitters are unavoidable because the signal collection lasts for several seconds. Jitters change the coordinate system and increase the speed detection error, so an inertial measurement unit (IMU) is used to compensate for the error. In this part we analyze the influence of the different translations and rotations of the mobile phone caused by jitters. Figure 9a shows the IMU coordinate system, which differs from the coordinate system in Section 3.2.1. If the mobile phone is held in landscape orientation as shown in the figure, the x-axis is vertical and points up, the y-axis is horizontal and points to the left, and the z-axis points outward from the front face of the screen. Jitters can be divided into six categories: rotation around each of the three axes and translation along each of them. As figure 9b shows, when the mobile phone is translated or rotated, the path length difference ∆D = M1A − M2A changes even if the automobile stays at the same position.

Fig. 10: Overall analysis. Difference of D (×10⁻³ m) versus y (m) for a 1 cm translation along the z-axis, a 1 cm translation along the y-axis, a 5° rotation around the x-axis, and a 5° rotation around the z-axis.
In fact, the path length difference ∆D can be calculated as

$$\Delta D = \sqrt{(y + 2l)^2 + L^2} - \sqrt{y^2 + L^2}, \tag{14}$$

where y = |AB| is the displacement between the automobile and the closer microphone projected onto the y-axis, 2l is the distance between the two microphones, and L is the closest distance between the automobile and the mobile phone.

Next, we evaluate the influence of translation and rotation. The translation along the x-axis can be ignored: we have already ignored the height difference when modeling the system, and the translation along the x-axis is much smaller than the height difference between the mobile phone and the sound source. Similarly, the rotation around the y-axis does not change the positions of the microphones on the y-axis, so it can also be ignored. Figure 10 shows the influences of the remaining rotations and translations. Take the translation along the z-axis as an example: it changes the distance L between the mobile phone and the automobile in figure 9b, so the path length difference ∆D becomes

$$\Delta D_z = \sqrt{(y+2l)^2 + (L+\Delta l)^2} - \sqrt{y^2 + (L+\Delta l)^2}, \qquad (15)$$

where ∆l represents the translation along the z-axis. The effects of the other rotations and translations can be calculated in a similar way. As Figure 10 shows, rotation around the x-axis has a much larger influence than any of the others. As a result, we only need to analyze the rotation around the x-axis to remove the jitters.

Fig. 11: Different cases of multiple automobiles. (a) Case 1. (b) Case 2. (c) Case 3. Each panel marks the Major Detection Region.

Fig. 12: Multiple peaks analysis. (a) Illustration of multiple sound sources (S1, S2) and receivers (M1, M2). (b) Original signals from different sources (amplitude vs. sample; marked segments 1-400 and 181-580 of S1, 51-450 and 201-600 of S2). (c) Received signals of different receivers (amplitude vs. sample). (d) Correlation between the received signals vs. time delay in samples, with peaks marked at 180 and −150.

3.2.6 Multiple Automobiles Detection
When multiple automobiles pass by the mobile phone, the model presented so far can estimate the speed of only some of them. The problem is caused by multiple sound sources: the sounds made by different automobiles interfere with each other and have comparable intensity, so the time delays calculated from the cross-correlation between the top and bottom microphones do not always form a complete time delay curve. According to our empirical study, it is hard to separate the sound of a specific automobile using the microphones of a mobile phone. However, it is possible to decide which automobile's sound dominates the acoustic signals collected by the microphones.
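To make the shape of a complete single-automobile time delay curve concrete, the following minimal Python sketch evaluates Eq. (14), and Eq. (15) when a z-axis translation ∆l is supplied, for an automobile passing the phone. It is an illustration under stated assumptions, not the authors' code: it treats y as a signed position along the road so that one expression covers the whole pass, assumes a straight constant-speed trajectory, and uses 343 m/s for the speed of sound, a 44.1 kHz sampling rate, and a 0.15 m microphone spacing (2l) purely as example values. The function and constant names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed example value
SAMPLE_RATE = 44100     # Hz, assumed microphone sampling rate


def path_length_difference(y, l, L, dl=0.0):
    """Eq. (14); passing dl != 0 gives Eq. (15) with z-axis translation dl.

    y : position of the automobile along the y-axis relative to one microphone (m).
        The paper defines y = |AB| >= 0; here y is treated as signed so that a
        single formula covers both sides of the phone (assumption).
    l : half of the microphone spacing (m), so the microphones are 2l apart.
    L : closest distance between the automobile and the phone (m).
    """
    d = L + dl
    return math.sqrt((y + 2 * l) ** 2 + d ** 2) - math.sqrt(y ** 2 + d ** 2)


def delay_in_samples(delta_d, c=SPEED_OF_SOUND, fs=SAMPLE_RATE):
    """Convert a path length difference (m) into a time delay in samples."""
    return delta_d / c * fs


def delay_curve(speed, L, l, t_values, y0=0.0):
    """Hypothetical helper: delays for a straight, constant-speed pass y(t) = y0 + speed * t."""
    return [delay_in_samples(path_length_difference(y0 + speed * t, l, L)) for t in t_values]


if __name__ == "__main__":
    times = [k * 0.5 for k in range(-10, 11)]  # -5 s ... 5 s in 0.5 s steps
    # y0 = -l places the automobile equidistant from both microphones at t = 0.
    curve = delay_curve(speed=15.0, L=5.0, l=0.075, t_values=times, y0=-0.075)
    print([round(v, 2) for v in curve[::5]])
```

With these example values the delay rises monotonically through zero, where the automobile is equidistant from the two microphones, and approaches ±2l·fs/c (about ±19.3 samples) far from the phone. A small ∆l changes ∆D only slightly, which is consistent with the observation above that z-axis translation is not the dominant jitter.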
In this section, we define the automobile whose sound dominates the acoustic signals as the Major Detection Object, and we classify the scenarios of multiple automobiles into three cases, as shown in Figure 11. The Major Detection Region refers to the area between the asymptotes corresponding to the time delays ∆dm and −∆dm, as shown in Figure 11a. The three cases are:
1. Only one automobile is moving in the Major Detection Region.
2. Multiple automobiles become the Major Detection Object in turn.
3. Only one of the automobiles becomes the Major Detection Object when multiple automobiles go through the Major Detection Region.

Figure 13a shows the time delay curves of 12 automobiles in a real environment, obtained with the method of Section 3.2.2. All three cases appear in the figure. The solutions to the cases are as follows.

Case 1: Only one automobile is moving in the Major Detection Region. In this case, no other automobile enters the Major Detection Region before the current automobile finishes moving through it, so that automobile becomes the Major Detection Object. The acoustic signals collected in this case are similar to those of a single automobile. The blue circles in Figure 13a form the delay curves of the corresponding automobiles, whose speeds can be estimated with the algorithm proposed above. According to our experiments, this is the most common case for multiple automobiles outside peak traffic hours.

Case 2: Multiple automobiles become the Major Detection Object in turn. In this case, we need to analyze how multiple automobiles affect the correlation computed from the acoustic signals. Cross-correlation measures the similarity between two signals as a function of the lag between them, so the time delay between the two signals can be obtained from the lag at which the correlation peaks.
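The two-source analysis of Figure 12 can be reproduced numerically. The sketch below is a minimal Python illustration, not the authors' implementation: it assumes random ±1 source waveforms (the paper does not specify them), reuses the sample offsets given in the text for Figure 12, computes the cross-correlation c[k] = Σ R1[i]·R2[i+k] over a bounded lag range, and greedily picks the strongest peaks that are at least a guard window apart. The helpers `make_mixture`, `cross_correlation`, and `top_peaks` and the `guard` parameter are hypothetical names introduced here.

```python
import random


def make_mixture(n=400, seed=0):
    """Build the two received signals of the Fig. 12 example.

    Assumption: sources S1 and S2 are random +/-1 sequences of length n + 200.
    With n = 400 the slices match the text (1-based sample numbers):
    R1 = S1[181-580] + S2[51-450] and R2 = S1[1-400] + S2[201-600].
    """
    rng = random.Random(seed)
    s1 = [rng.choice((-1.0, 1.0)) for _ in range(n + 200)]
    s2 = [rng.choice((-1.0, 1.0)) for _ in range(n + 200)]
    r1 = [s1[i + 180] + s2[i + 50] for i in range(n)]  # received by M1
    r2 = [s1[i] + s2[i + 200] for i in range(n)]       # received by M2
    return r1, r2


def cross_correlation(r1, r2, max_lag):
    """Return {k: sum_i r1[i] * r2[i + k]} for k in [-max_lag, max_lag]."""
    out = {}
    for k in range(-max_lag, max_lag + 1):
        lo = max(0, -k)                   # keep i + k >= 0
        hi = min(len(r1), len(r2) - k)    # keep i < len(r1) and i + k < len(r2)
        out[k] = sum(r1[i] * r2[i + k] for i in range(lo, hi))
    return out


def top_peaks(corr, count, guard):
    """Greedily pick the `count` largest-correlation lags, each more than `guard` lags apart."""
    picked = []
    for lag in sorted(corr, key=corr.get, reverse=True):
        if all(abs(lag - p) > guard for p in picked):
            picked.append(lag)
            if len(picked) == count:
                break
    return picked


if __name__ == "__main__":
    # n = 2000 instead of 400 lengthens the overlaps, so both peaks stand far
    # above the cross-terms between the two sources.
    r1, r2 = make_mixture(n=2000, seed=1)
    corr = cross_correlation(r1, r2, max_lag=400)
    print(sorted(top_peaks(corr, count=2, guard=10)))  # -> [-150, 180]
```

With this sign convention, a positive lag means the source reaches M1 first, so S1 shows up at +180 and S2 at −150, matching the peaks marked in Figure 12d. The greedy guard-window peak picking is only one simple choice; on real traffic audio the number of peaks to keep and the guard width would need tuning.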
First, we analyze mixed signals with different delays. Figure 12a illustrates a scenario with multiple signal sources and multiple receivers, and Figure 12b shows the signal waves of S1 and S2. Since |S1M1| < |S1M2|, the signals from S1 arrive at M1 earlier than at M2. Similarly, the signals from S2 arrive at M1 later than at M2. In Figure 12c, the signal R1 received by M1 is the superposition of samples 181-580 of S1 and samples 51-450 of S2, while the signal R2 received by M2 is the superposition of samples 1-400 of S1 and samples 201-600 of S2. That means the S1 component appears in R1 180 samples earlier than in R2, and the S2 component appears in R1 150 samples later than in R2. The correlation between the received signals, shown in Figure 12d, therefore contains a separate peak for each source (at 180 and −150 samples), which gives us the chance to calculate the time delays of different sources. Take automobiles 7 and 8 in Figure 13a as an example. The time delay curve of automobile 7 is incomplete because automobiles 7 and 8 arrive at the same time and the sound made by automobile 8 is louder than that made by automobile 7. Then we may use multiple peaks