This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2019.2961313, IEEE Transactions on Mobile Computing.
Fig. 1. Video Stabilization in Mobile Devices (panels: Google Glass and smartphone; original frames vs. stabilized frames). Videos captured with mobile devices often suffer from undesired frame jitters due to sudden movement by the user. We first estimate the original camera path (red) via inertial-visual state tracking, then smooth the original camera path to obtain the smoothed camera path (blue), and finally obtain the stabilized frames by warping the original frames.

For video stabilization in mobile devices, it is essential to fuse the CV-based and IMU-based approaches to accurately estimate the camera's 3D motion, including the rotation and translation.

In this paper, we propose a video stabilization scheme for camera shooting on mobile devices, based on visual and inertial state tracking. Our approach is able to accurately estimate the camera's 3D motion by fully fusing the CV-based and IMU-based methods. Specifically, during video shooting, we use the gyroscope to estimate the rotation of the camera, and use the structure-from-motion among the image frames to estimate the translation of the camera. Different from pure CV-based approaches, which estimate the rotation and translation simultaneously according to the camera projection model, our solution first estimates the rotation based on the gyroscope measurements, plugs the estimated rotation into the camera projection model, and then estimates the translation according to that model. In comparison to the CV-based solutions, our solution can estimate the translation more accurately with fewer feature point pairs, since the number of undetermined degrees of freedom in the 3D motion is directly reduced from 6 to 3. After that, we further smooth the camera's motion to remove the undesired jitters during the moving process.
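The pipeline above — integrate the gyroscope to obtain the rotation, then solve only for the translation — can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's exact implementation: the function names are ours, the rotation is integrated with Rodrigues' formula, and the translation direction is recovered linearly from the epipolar constraint with the rotation held fixed (which is why 3 matched pairs suffice instead of 5~8).

```python
import numpy as np

def rotation_from_gyro(omega_samples, dt):
    """Integrate gyroscope angular-velocity samples (rad/s), taken during one
    frame interval at spacing dt, into a 3x3 rotation matrix."""
    R = np.eye(3)
    for w in omega_samples:
        theta = np.linalg.norm(w) * dt
        if theta < 1e-12:
            continue
        kx, ky, kz = w / np.linalg.norm(w)          # unit rotation axis
        K = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])
        # Rodrigues' formula: R_step = I + sin(theta) K + (1 - cos(theta)) K^2
        R = R @ (np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K))
    return R

def estimate_translation(R, pts1, pts2):
    """Given the gyro-estimated rotation R and >= 3 matched feature points in
    normalized camera coordinates, recover the translation direction from the
    epipolar constraint x2^T [t]_x R x1 = 0, which is linear in t.  The scale
    of t is unobservable from two views, so a unit vector is returned."""
    A = np.array([np.cross(R @ x1, x2) for x1, x2 in zip(pts1, pts2)])
    # t spans the null space of A: take the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]
```

With the rotation known, each matched pair contributes one linear equation in the three unknowns of t, so three pairs determine the translation direction, with the extra equation absorbing the scale ambiguity; a full CV-based solve of all 6 degrees of freedom needs the 5~8 pairs mentioned above.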
As shown in Fig. 1, according to the mapping relationship between the original moving path and the smoothed moving path, we warp each pixel from the original frame into a corresponding pixel in the stabilized frame. In this way, the stabilized video appears to have been captured along the smoothed moving path of the camera. Compared with recent visual-inertial video stabilization methods [12], [13], our solution estimates the translation and rotation more accurately, and meets the real-time requirement for online processing, by directly reducing the number of undetermined degrees of freedom from 6 to 3 for CV-based processing.

There are two key challenges to address in this paper. The first challenge is to accurately estimate and effectively smooth the camera's 3D motion in the situation of fast movement and violent jitters, caused by sudden movement during video shooting. To address this challenge, we first use the gyroscope to perform the rotation estimation and obtain a 3×3 rotation matrix, since the gyroscope can accurately estimate the rotation even when fast movement and violent jitters occur. Then, to smooth the rotation, instead of smoothing the 9 dependent parameters of the matrix separately, we transform the 3×3 rotation matrix into the 1×3 Euler angles, and apply a low-pass filter over the 3 independent Euler angles separately. In this way, we are able to effectively smooth the rotation while maintaining the consistency among multiple parameters. Secondly, we build a camera projection model by considering the rotation and translation of the camera. Then, by substituting the estimated rotation into the camera projection model, we directly estimate the translation according to the matched feature point pairs in adjacent image frames. In the situation of fast movement and violent jitters, it is usually difficult to find enough feature point pairs between adjacent image frames to estimate the camera's 3D motion.
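The rotation-smoothing step can be sketched as follows. This is a minimal illustration under assumptions the text leaves open: we pick the ZYX (yaw-pitch-roll) Euler convention and a first-order exponential moving average as the low-pass filter, and the function names are ours.

```python
import numpy as np

def matrix_to_euler_zyx(R):
    """3x3 rotation matrix -> (yaw, pitch, roll) in the ZYX convention.
    Assumes pitch stays away from +-90 degrees (no gimbal lock)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.array([yaw, pitch, roll])

def euler_zyx_to_matrix(e):
    """(yaw, pitch, roll) -> rotation matrix, R = Rz(yaw) Ry(pitch) Rx(roll)."""
    yaw, pitch, roll = e
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def smooth_rotations(matrices, alpha=0.3):
    """Low-pass each of the 3 independent Euler angles separately (here: a
    first-order exponential moving average), then rebuild rotation matrices.
    This keeps the 9 matrix entries mutually consistent after smoothing."""
    angles = np.array([matrix_to_euler_zyx(R) for R in matrices])
    smoothed = np.empty_like(angles)
    smoothed[0] = angles[0]
    for i in range(1, len(angles)):
        smoothed[i] = alpha * angles[i] + (1 - alpha) * smoothed[i - 1]
    return [euler_zyx_to_matrix(e) for e in smoothed]
```

Filtering the 9 matrix entries independently would generally not yield a valid rotation matrix; filtering the 3 Euler angles and converting back guarantees orthonormality by construction.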
In comparison to traditional CV-based approaches, our solution requires fewer feature point pairs, as we directly reduce the number of undetermined degrees of freedom in the 3D motion from 6 to 3. The second challenge is to sufficiently reduce the computation overhead of video stabilization, so as to make real-time processing practical on resource-constrained mobile devices. Traditional CV-based approaches usually require at least 5~8 pairs of feature points to estimate the rotation and translation. Since they involve 6 degrees of freedom, they usually incur large computation overhead and fail to perform video stabilization in real time. To address this challenge, our solution reduces the computation overhead by directly reducing the undetermined degrees of freedom from 6 to 3. Specifically, we use the inertial measurements to estimate the rotation, so our solution requires only 3 pairs of feature points to estimate the translation, which reduces the burden of the CV-based processing by over 50%. This makes real-time processing possible on mobile devices.

We make three key contributions in this paper. 1) We investigate video stabilization for camera shooting on mobile devices. By fusing the IMU-based method and the CV-based method, our solution is robust to fast movement and violent jitters, and greatly reduces the computation overhead of video stabilization. 2) We conduct empirical studies to investigate the impact of movement jitters and the measurement errors in IMU-based approaches. We build a camera projection model by considering the rotation and translation of the camera. We further build a camera motion model to depict the relationship between the inertial-visual state and the camera's 3D motion.
3) We implemented a prototype system on smart glasses and smartphones, and evaluated its performance under real scenarios, i.e., human subjects used mobile devices to shoot videos while walking, climbing, or riding. The experiment results show that our solution achieves 32% better performance than the state-of-the-art solutions in regard to video stabilization. Moreover, the average processing latency is 32.6 ms, which is lower than the conventional inter-frame time interval, i.e., 33 ms, and thus meets the real-time requirement for online processing.

2 RELATED WORK

CV-based Solution: Traditional CV-based solutions for video stabilization can be roughly divided into 2D stabilization and 3D stabilization. 2D video stabilization solutions use a series of 2D transformations between adjacent frames to represent the camera motion, and smooth these transformations to stabilize the video [1], [2], [14]. However, these methods cannot figure out the camera's 3D motion, thus