This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2019.2961313, IEEE Transactions on Mobile Computing, 2019.

Fig. 1. Video Stabilization in Mobile Devices (figure labels: Google Glass, Smartphone, Original frames, Stabilized frames). Videos captured with mobile devices often suffer from undesired frame jitters caused by sudden movements of the user. We first estimate the original camera path (red) via inertial-visual state tracking, then smooth it to obtain the smoothed camera path (blue), and finally obtain the stabilized frames by warping the original frames.

stabilization in mobile devices, it is essential to fuse the CV-based and IMU-based approaches to accurately estimate the camera's 3D motion, including the rotation and translation.

In this paper, we propose a video stabilization scheme for camera shoot in mobile devices, based on visual and inertial state tracking. Our approach accurately estimates the camera's 3D motion by fusing the CV-based and IMU-based methods. Specifically, while the video is being shot, we use the gyroscope to estimate the rotation of the camera, and use structure-from-motion among the image frames to estimate the translation of the camera. Unlike pure CV-based approaches, which estimate the rotation and translation simultaneously from the camera projection model, our solution first estimates the rotation from the gyroscope measurements, substitutes the estimated rotation into the camera projection model, and then estimates the translation from that model.
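To illustrate why fixing the rotation simplifies translation estimation, the following is a minimal sketch rather than the paper's implementation. It assumes calibrated (normalized) image coordinates, the motion model X2 = R * X1 + t, and the standard epipolar constraint, under which each matched pair yields one linear constraint t . ((R x1) x x2) = 0 once R is known. This formulation recovers only the direction of t, since scale is not observable from two views alone, so the paper's own projection model and its required number of pairs may differ. All function names are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def mat_vec(R, v):
    return tuple(dot(row, v) for row in R)

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate configuration: zero-length vector")
    return tuple(c / n for c in v)

def translation_direction(R, pts1, pts2):
    """Estimate the unit translation direction given a known rotation R.

    pts1, pts2: matched points as normalized homogeneous coordinates
    (x, y, 1) in frame 1 and frame 2. Assumed model: X2 = R * X1 + t.
    """
    # With R known, each pair gives one normal n such that t . n = 0.
    normals = [cross(mat_vec(R, p1), p2) for p1, p2 in zip(pts1, pts2)]

    # Every pair of normals spans the plane orthogonal to t, so their
    # cross product is parallel to t. Sum the sign-aligned candidates.
    acc = (0.0, 0.0, 0.0)
    for i in range(len(normals)):
        for j in range(i + 1, len(normals)):
            c = cross(normals[i], normals[j])
            if dot(c, acc) < 0:
                c = tuple(-x for x in c)
            acc = tuple(a + x for a, x in zip(acc, c))
    t = normalize(acc)

    # Resolve the +/- ambiguity by requiring positive depths in frame 1:
    # depth1 * n = x2 x t, so (x2 x t) . n must be positive.
    score = sum(dot(cross(p2, t), n) for p2, n in zip(pts2, normals))
    if score < 0:
        t = tuple(-x for x in t)
    return t
```

Only three unknowns (here effectively a direction) remain once the gyroscope supplies R, so a handful of matched pairs is enough; with noisy matches, a robust least-squares or RANSAC variant would be a natural replacement for the simple averaging used here.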
In comparison to the CV-based solution, our solution estimates the translation more accurately and with fewer feature point pairs, since the number of undetermined degrees of freedom in the 3D motion drops directly from 6 to 3. After that, we smooth the camera's motion to remove the undesired jitters that occur while the camera is moving. As shown in Fig. 1, according to the mapping relationship between the original moving path and the smoothed moving path, we warp each pixel of the original frame to a corresponding pixel in the stabilized frame. In this way, the stabilized video appears to have been captured along the smoothed moving path of the camera. Compared with recent visual-inertial video stabilization methods [12], [13], our solution estimates the translation and rotation more accurately and meets the real-time requirement for online processing, because it directly reduces the number of undetermined degrees of freedom in the CV-based processing from 6 to 3.

There are two key challenges to address in this paper. The first challenge is to accurately estimate and effectively smooth the camera's 3D motion under fast movement and violent jitters, which are caused by sudden movements during the video shoot. To address this challenge, firstly, we use the gyroscope to estimate the rotation as a 3×3 rotation matrix, since the gyroscope can accurately estimate the rotation even when fast movement and violent jitters occur. Then, to smooth the rotation, instead of smoothing the 9 dependent parameters of the matrix separately, we transform the 3×3 rotation matrix into the 1×3 Euler angles and apply a low-pass filter to each of the 3 independent Euler angles. In this way, we effectively smooth the rotation while maintaining consistency among the parameters. Secondly, we build a camera projection model that accounts for both the rotation and the translation of the camera.
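As an illustration of the rotation smoothing step described above (converting each 3×3 rotation matrix to three Euler angles and low-pass filtering each angle independently), here is a minimal sketch. The ZYX angle convention, the first-order exponential filter, and the parameter `alpha` are assumptions chosen for the example; the text specifies only that a low-pass filter is applied to the Euler angles. The sketch also assumes the angles stay away from gimbal lock (pitch near ±90 degrees) and from the ±π wrap-around. All function names are hypothetical.

```python
import math

def euler_to_matrix(roll, pitch, yaw):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed ZYX convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def matrix_to_euler(R):
    """Inverse of euler_to_matrix, valid away from pitch = +/-90 degrees."""
    pitch = -math.asin(max(-1.0, min(1.0, R[2][0])))
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return roll, pitch, yaw

def low_pass(values, alpha):
    """First-order exponential low-pass filter; smaller alpha smooths more."""
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out

def smooth_rotations(rotations, alpha=0.2):
    """Smooth a sequence of rotation matrices via their Euler angles."""
    if not rotations:
        return []
    angles = [matrix_to_euler(R) for R in rotations]
    # Filter roll, pitch, and yaw as three independent signals.
    channels = [low_pass([a[k] for a in angles], alpha) for k in range(3)]
    # Rebuild a proper rotation matrix from each filtered angle triple.
    return [euler_to_matrix(r, p, y) for r, p, y in zip(*channels)]
```

Because the filtered angles are mapped back through `euler_to_matrix`, every smoothed output is a valid rotation matrix. Filtering the nine matrix entries separately would not guarantee this, which is the consistency benefit the text refers to.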
Then, by substituting the estimated rotation into the camera projection model, we directly estimate the translation from the matched feature point pairs in adjacent image frames. Under fast movement and violent jitters, it is usually difficult to find enough feature point pairs between adjacent image frames to estimate the camera's 3D motion. In comparison to the traditional CV-based approaches, our solution requires fewer feature point pairs, as we directly reduce the number of undetermined degrees of freedom in the 3D motion from 6 to 3.

The second challenge is to sufficiently reduce the computation overhead of video stabilization, so as to make real-time processing practical on resource-constrained mobile devices. Traditional CV-based approaches usually require at least 5 to 8 pairs of feature points to estimate the rotation and translation. Because they involve 6 degrees of freedom, they usually incur a large computation overhead and fail to perform video stabilization in real time. To address this challenge, our solution reduces the computation overhead by directly reducing the undetermined degrees of freedom from 6 to 3. Specifically, we use the inertial measurements to estimate the rotation, so our solution requires only 3 pairs of feature points, at minimum, to estimate the translation. This reduces the burden of the CV-based processing by over 50% and makes real-time processing possible on mobile devices.

We make three key contributions in this paper. 1) We investigate video stabilization for camera shoot in mobile devices. By fusing the IMU-based method and the CV-based method, our solution is robust to fast movement and violent jitters, and greatly reduces the computation overhead of video stabilization. 2) We conduct empirical studies to investigate the impact of movement jitters and the measurement errors in IMU-based approaches.
We build a camera projection model that accounts for the rotation and translation of the camera. We further build a camera motion model to describe the relationship between the inertial-visual state and the camera's 3D motion. 3) We implemented a prototype system on smart glasses and smartphones, and evaluated its performance in real scenarios, i.e., human subjects used the mobile devices to shoot videos while walking, climbing, or riding. The experimental results show that our solution achieves 32% better performance than the state-of-the-art solutions in video stabilization. Moreover, the average processing latency is 32.6 ms, which is lower than the conventional inter-frame interval of 33 ms, and thus meets the real-time requirement for online processing.

2 RELATED WORK

CV-based Solution: Traditional CV-based solutions for video stabilization can be roughly divided into 2D stabilization and 3D stabilization. 2D video stabilization solutions use a series of 2D transformations between adjacent frames to represent the camera motion, and smooth these transformations to stabilize the video [1], [2], [14]. However, these methods cannot figure out the camera's 3D motion, thus
