Fig. 3. Architecture of DynaKey.

Fig. 4. Principle of perspective transformation.

all edges, and then find all possible contours from the detected edges, as shown in Fig. 5(b) and Fig. 5(c), respectively. The largest contour with four corners (i.e., the green contour shown in Fig. 5(c)) corresponds to the keyboard, where the corners are detected based on the angles formed by consecutive contour segments, as the red points shown in Fig. 5(d). Once the keyboard location is fixed, i.e., the four corner points are fixed, as shown in Fig. 5(e), we can detect the keys within the keyboard. Specifically, among the small contours located inside the keyboard (i.e., the red contours shown in Fig. 5(c)), we use the expected area of a key to eliminate pitfall contours and then extract each key from the keyboard, as shown in Fig. 5(f). Finally, we map the extracted keys to characters based on the relative locations among keys, i.e., the known keyboard layout.
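This detection-and-extraction pipeline maps onto standard OpenCV primitives. The following is a minimal sketch of the steps described above, not the authors' implementation; the Canny thresholds, the polygon-approximation tolerance, and the key-area bounds (`min_key_area`, `max_key_area`) are illustrative assumptions.

```python
import cv2

def extract_keyboard_and_keys(frame, min_key_area=300, max_key_area=4000):
    """Sketch: edges -> contours -> largest four-corner contour (keyboard)
    -> small contours filtered by key area (keys)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                  # Fig. 5(b): edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)  # Fig. 5(c): contours

    # Keyboard: the largest contour that approximates to a quadrilateral;
    # its four vertices are the corner points of Fig. 5(d)/(e).
    corners = None
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4:
            corners = approx.reshape(4, 2)
            break

    # Keys: small contours whose area is plausible for a single key; the
    # area test drops pitfall contours (Fig. 5(f)). A point-in-polygon test
    # against `corners` would further restrict them to the keyboard region.
    keys = [c for c in contours
            if min_key_area < cv2.contourArea(c) < max_key_area]
    return corners, keys
```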
2) Coordinate Transformation: Due to head movements, it is essential to track the coordinates of keys across different frames. Besides, changes of the camera view also distort the keyboard in the captured images, as the two quadrilaterals $P_0P_1P_3P_2$ and $Q_0Q_1Q_3Q_2$ shown in Fig. 4. To tolerate camera movement and image distortion, we propose a perspective transformation based method to track the coordinates of keys.

Perspective Transformation: As shown in Fig. 4, for a fixed point $G_i$ in the physical space, once we obtain its projection point $(X_i, Y_i)$ in the $j$th frame, perspective transformation [21] can use a transformation matrix $C = (C_{00}, C_{01}, C_{02}; C_{10}, C_{11}, C_{12}; C_{20}, C_{21}, C_{22})$ to calculate its projection $(U'_i, V'_i)$ in the $k$th frame. Therefore, when the paper keyboard is fixed, we can use the known keyboard/key locations in previous frames to infer those in the following frames, without repeating keyboard detection and key extraction. Specifically, with the known projection point $(X_i, Y_i)$ in the $j$th frame, we first use $C$ to calculate the 3D coordinate $(U_i, V_i, W_i)$ related to $(X_i, Y_i)$ in the physical space, as described in Eq. (1). We then introduce a division operation to obtain the corresponding projection point $(U'_i, V'_i)$ in the $k$th frame, as described in Eq. (2).

$$
\begin{pmatrix} U_i \\ V_i \\ W_i \end{pmatrix}
=
\begin{pmatrix}
C_{00} & C_{01} & C_{02} \\
C_{10} & C_{11} & C_{12} \\
C_{20} & C_{21} & C_{22}
\end{pmatrix}
\cdot
\begin{pmatrix} X_i \\ Y_i \\ 1 \end{pmatrix}
\tag{1}
$$

$$
U'_i = \frac{U_i}{W_i} = \frac{C_{00} \cdot X_i + C_{01} \cdot Y_i + C_{02}}{C_{20} \cdot X_i + C_{21} \cdot Y_i + C_{22}}, \qquad
V'_i = \frac{V_i}{W_i} = \frac{C_{10} \cdot X_i + C_{11} \cdot Y_i + C_{12}}{C_{20} \cdot X_i + C_{21} \cdot Y_i + C_{22}}
\tag{2}
$$

Here, the projection points of the keyboard or keys in the previous frame can be obtained through key extraction, as mentioned in Section IV-B1. Thus the main challenge lies in the calculation of the transformation matrix $C$, which is described below.
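In code, Eqs. (1)-(2) reduce to one matrix-vector product in homogeneous coordinates followed by the division by $W_i$. A minimal NumPy sketch, assuming the matrix `C` has already been estimated:

```python
import numpy as np

def project_point(C, x, y):
    """Eq. (1): lift (x, y) to homogeneous coordinates and multiply by C;
    Eq. (2): divide by W to recover the projection in the k-th frame."""
    U, V, W = C @ np.array([x, y, 1.0])   # Eq. (1)
    return U / W, V / W                   # Eq. (2)

# Track the four keyboard corners from frame j to frame k without
# re-running keyboard detection (`corners_j` is an illustrative input):
# corners_k = [project_point(C, x, y) for (x, y) in corners_j]
```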
Fig. 5. Process of extracting keys: (a) an input frame; (b) edge detection result; (c) all detected contours; (d) corner point detection; (e) keyboard with corner points; (f) key extraction result.

Fig. 6. Feature point selection and time cost of the FLANN based matcher: (a) keypoint selection by the FLANN based matcher; (b) the time cost of two keypoint selection methods.

Keypoint Selection: In the transformation matrix $C$, $C_{22}$ is a scale factor and is usually set to $C_{22} = 1$; thus we only need to calculate the other eight variables, which can be solved by selecting four non-collinear feature point pairs (e.g., $P_i(X_i, Y_i)$ and $Q_i(U'_i, V'_i)$, $i \in [0, 3]$, shown in Fig. 4). The specific formula for calculating $C$ with four feature point pairs is shown in Eq. (3).

$$
\begin{pmatrix}
X_0 & Y_0 & 1 & 0 & 0 & 0 & -X_0 U'_0 & -Y_0 U'_0 \\
X_1 & Y_1 & 1 & 0 & 0 & 0 & -X_1 U'_1 & -Y_1 U'_1 \\
X_2 & Y_2 & 1 & 0 & 0 & 0 & -X_2 U'_2 & -Y_2 U'_2 \\
X_3 & Y_3 & 1 & 0 & 0 & 0 & -X_3 U'_3 & -Y_3 U'_3 \\
0 & 0 & 0 & X_0 & Y_0 & 1 & -X_0 V'_0 & -Y_0 V'_0 \\
0 & 0 & 0 & X_1 & Y_1 & 1 & -X_1 V'_1 & -Y_1 V'_1 \\
0 & 0 & 0 & X_2 & Y_2 & 1 & -X_2 V'_2 & -Y_2 V'_2 \\
0 & 0 & 0 & X_3 & Y_3 & 1 & -X_3 V'_3 & -Y_3 V'_3
\end{pmatrix}
\cdot
\begin{pmatrix}
C_{00} \\ C_{01} \\ C_{02} \\ C_{10} \\ C_{11} \\ C_{12} \\ C_{20} \\ C_{21}
\end{pmatrix}
=
C_{22} \cdot
\begin{pmatrix}
U'_0 \\ U'_1 \\ U'_2 \\ U'_3 \\ V'_0 \\ V'_1 \\ V'_2 \\ V'_3
\end{pmatrix}
\tag{3}
$$

To get the feature point pairs, the FLANN based matcher [22] was often adopted, which finds an approximate (may be not
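For context, a typical FLANN based matching setup in OpenCV is sketched below; the choice of SIFT features, the KD-tree index parameters, and Lowe's ratio test are common defaults assumed here, not details given in the text.

```python
import cv2

def flann_point_pairs(img_p, img_q, ratio=0.7):
    """Match keypoints between two frames with a FLANN based matcher
    and keep distinctive matches via Lowe's ratio test."""
    sift = cv2.SIFT_create()
    kp_p, des_p = sift.detectAndCompute(img_p, None)
    kp_q, des_q = sift.detectAndCompute(img_q, None)
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),  # KD-tree index
                                  dict(checks=50))
    good = [m for m, n in flann.knnMatch(des_p, des_q, k=2)
            if m.distance < ratio * n.distance]
    src = [kp_p[m.queryIdx].pt for m in good]   # points in the p-th frame
    dst = [kp_q[m.trainIdx].pt for m in good]   # points in the q-th frame
    return src, dst
```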
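Given four matched point pairs, Eq. (3) is an 8x8 linear system in the unknown entries of $C$ (with $C_{22} = 1$). A direct sketch follows; the rows are interleaved per point pair here, which yields the same solution as the row order printed in Eq. (3). In practice, `cv2.getPerspectiveTransform` computes the same matrix from four point pairs.

```python
import numpy as np

def solve_transformation(src, dst):
    """Solve Eq. (3) for the eight unknown entries of C, given four
    point pairs (Xi, Yi) -> (U'i, V'i) and the convention C22 = 1."""
    A, b = [], []
    for (X, Y), (U, V) in zip(src, dst):
        A.append([X, Y, 1, 0, 0, 0, -X * U, -Y * U])   # row for U'i
        A.append([0, 0, 0, X, Y, 1, -X * V, -Y * V])   # row for V'i
        b.extend([U, V])
    c = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(c, 1.0).reshape(3, 3)             # C with C22 = 1
```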