Video Stabilization for Camera Shoot in Mobile Devices via Inertial-Visual State Tracking

Fei Han, Student Member, IEEE, Lei Xie, Member, IEEE, Yafeng Yin, Member, IEEE, Hao Zhang, Student Member, IEEE, Guihai Chen, Member, IEEE, and Sanglu Lu, Member, IEEE

IEEE Transactions on Mobile Computing, 2019. DOI 10.1109/TMC.2019.2961313

• Fei Han, Lei Xie, Yafeng Yin, Hao Zhang, Guihai Chen and Sanglu Lu are with the State Key Laboratory for Novel Software Technology, Nanjing University, China. E-mail: feihan@smail.nju.edu.cn, {lxie,yafeng}@nju.edu.cn, H.Zhang@smail.nju.edu.cn, {gchen,sanglu}@nju.edu.cn.
• Lei Xie is the corresponding author.

Abstract—Due to sudden movement during the camera shoot, the videos captured by hand-held mobile devices often suffer from undesired frame jitters, leading to a loss of video quality. In this paper, we present a video stabilization solution for mobile devices via inertial-visual state tracking. Specifically, during the video shoot, we use the gyroscope to estimate the rotation of the camera, and use structure-from-motion among the image frames to estimate the translation of the camera. We build a camera projection model that accounts for both the rotation and the translation of the camera, and a camera motion model that depicts the relationship between the inertial-visual state and the camera's 3D motion. By fusing the inertial measurement unit (IMU)-based method and the computer vision (CV)-based method, our solution is robust to fast movement and violent jitters; moreover, it greatly reduces the computation overhead of video stabilization. In comparison to the IMU-based solution, our solution estimates the translation more accurately, since we use the feature point pairs in adjacent image frames, rather than the error-prone accelerometer, to estimate the translation. In comparison to the CV-based solution, our solution estimates the translation with fewer feature point pairs, since the number of undetermined degrees of freedom in the 3D motion directly drops from 6 to 3. We implemented a prototype system on smart glasses and smart phones, and evaluated its performance under real scenarios, i.e., human subjects used the mobile devices to shoot videos while walking, climbing, or riding. The experiment results show that our solution achieves 32% better performance than the state-of-the-art solutions in regard to video stabilization. Moreover, the average processing latency is 32.6 ms, which is lower than the conventional inter-frame time interval of 33 ms, and thus meets the real-time requirement for online processing.

Index Terms—Video Stabilization, Mobile Device, 3D Motion Sensing, Inertial-Visual State Tracking

1 INTRODUCTION

Due to the proliferation of mobile devices, nowadays more and more people tend to use their mobile devices, such as smart phones and smart glasses, to take videos. However, due to the sudden movement from the users during the camera shoot, the videos retrieved from such mobile devices often suffer from undesired frame jitters, which usually degrade the video quality. Therefore, a number of video stabilization techniques have been proposed to remove the undesired jitters and obtain stable videos [1], [2], [3], [4], [5], [6], [7]. Recently, the sensors embedded in mobile devices have raised new opportunities for video stabilization. On mobile devices, a conventional video stabilization scheme involves three steps: estimating the motion of the camera, smoothing the camera's motion to remove the undesired jitters, and warping the frames to stabilize the videos.
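As a concrete reference for these three steps, the following is a minimal sketch of such a conventional CV-only pipeline, not the system proposed in this paper: it assumes a simplified 2D similarity motion model (scale ignored when re-warping) and moving-average path smoothing, and the function name `stabilize` and its parameters are illustrative choices using OpenCV.

```python
import cv2
import numpy as np

def stabilize(frames, radius=15):
    """Sketch of a conventional 2D stabilization pipeline:
    1) estimate inter-frame motion, 2) smooth the camera path,
    3) warp each frame by the smoothing correction."""
    # 1) Estimate per-frame 2D motion from tracked feature points.
    transforms = []
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                     qualityLevel=0.01, minDistance=20)
        p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
        m, _ = cv2.estimateAffinePartial2D(p0[st == 1], p1[st == 1])
        dx, dy = m[0, 2], m[1, 2]
        da = np.arctan2(m[1, 0], m[0, 0])   # in-plane rotation angle
        transforms.append([dx, dy, da])
        prev_gray = gray
    # 2) Smooth the accumulated camera trajectory with a moving average
    #    (boundary effects of the convolution are ignored in this sketch).
    trajectory = np.cumsum(transforms, axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.stack([np.convolve(trajectory[:, i], kernel, mode='same')
                         for i in range(3)], axis=1)
    # 3) Warp each frame by the difference between smoothed and raw paths.
    out = [frames[0]]
    h, w = frames[0].shape[:2]
    for i, frame in enumerate(frames[1:]):
        dx, dy, da = transforms[i] + smoothed[i] - trajectory[i]
        m = np.array([[np.cos(da), -np.sin(da), dx],
                      [np.sin(da),  np.cos(da), dy]])
        out.append(cv2.warpAffine(frame, m, (w, h)))
    return out
```

A full 3D structure-from-motion pipeline replaces the 2D similarity estimate above with recovery of the camera's rotation and translation from many feature tracks; that motion-estimation step is exactly where the inertial-visual fusion of this paper cuts the cost.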
Among these procedures, it is especially important to accurately estimate the camera's motion during the camera shoot, since accurate motion estimation is a key precondition for the subsequent jitter removal and frame warping.

Conventionally, the motion estimation of the camera in 3D space is based either on inertial measurement techniques [8], [9], [10] or on computer vision techniques [3], [4]. The inertial measurement-based approaches mainly use the built-in inertial measurement unit (IMU) to continuously track the 3D motion of the mobile device. However, they mainly focus on the rotation while ignoring the translation of the camera. The reason is twofold. First, the gyroscope in the IMU is usually able to accurately track the rotation, whereas the accelerometer in the IMU usually fails to accurately track the translation due to large cumulative tracking errors. The computer vision (CV)-based approaches, in contrast, mainly use structure-from-motion [11] among the image frames to estimate both the rotation and the translation of the camera. Although they achieve sufficient accuracy in camera motion estimation, they require plenty of feature point pairs and long feature point tracks. This requirement of massive feature points for motion estimation increases the computational overhead on resource-constrained mobile devices, making real-time processing impractical. Hence, to achieve a tradeoff between performance and computation overhead, the state-of-the-art solutions consider rotation estimation only. Second, according to our empirical studies, when the target is at a distance greater than 100 cm, the rotation usually causes greater pixel jitters than the translation; hence, most previous works consider that the rotation has a greater impact on performance than the translation. However, when the target is within a close range, e.g., at a distance of less than 100 cm, the translation usually causes greater pixel jitters than the rotation, so translation tracking is also essential for real applications of camera shooting.
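To make the degrees-of-freedom argument concrete, the following is a minimal sketch of this fusion idea, our illustration rather than the paper's exact formulation: it assumes feature points are already matched and expressed in normalized camera coordinates x = K⁻¹(u, v, 1)ᵀ, the inter-frame rotation is obtained by integrating gyroscope samples, and both function names are hypothetical.

```python
import numpy as np

def gyro_rotation(omegas, dts):
    """Integrate gyroscope angular-rate samples (rad/s) captured between
    two frames into one inter-frame rotation matrix (Rodrigues' formula)."""
    R = np.eye(3)
    for w, dt in zip(omegas, dts):
        theta = np.linalg.norm(w) * dt
        if theta < 1e-12:
            continue
        k = np.asarray(w) / np.linalg.norm(w)        # unit rotation axis
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        dR = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
        R = dR @ R                                   # compose increments
    return R

def translation_direction(R, x1, x2):
    """Estimate the translation direction between two frames when the
    rotation R is already known from the gyroscope.

    The epipolar constraint x2^T [t]x R x1 = 0 rewrites, via the scalar
    triple product, as ((R x1) x x2) . t = 0: one linear equation in t
    per correspondence. x1, x2 are (N, 3) arrays of matched points in
    normalized homogeneous coordinates; t is recovered up to scale as
    the null-space direction of the stacked constraint rows."""
    A = np.cross((R @ x1.T).T, x2)   # (N, 3): one constraint row per pair
    _, _, Vt = np.linalg.svd(A)      # null vector = last right singular vector
    t = Vt[-1]
    return t / np.linalg.norm(t)
```

Because R is pinned down by the gyroscope, each feature pair contributes one linear constraint and a handful of pairs suffices for t, whereas estimating all 6 DOF through the essential matrix needs at least five pairs plus a robust-estimation loop over many more, which is what makes the pure CV approach expensive on mobile hardware.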