Fig. 7. Hands move on the keyboard and lead to different occlusions in the process of pressing 'I' and 'Y'. Panels: (a) pressing 'I'; (b) leaving 'I'; (c) pressing 'Y'; (d) leaving 'Y'.

Fig. 8. Process of selecting keypoints. Panels: (a) an input frame; (b) hand segmentation; (c) line detection; (d) optimized line detection; (e) corner point detection; (f) keypoint selection.
the best) nearest neighbor point in image p for the point in image q, and then pairs the two points. Considering possible wrongly-selected feature point pairs, such as $P_i$ and $P_j$ in Fig. 6(a), the FLANN-based method often needs to detect a large number of feature point pairs and then select the top-$k$ ($k$ is usually larger than 4) pairs to calculate the transformation matrix with the least squares method. However, selecting a larger number of feature points leads to a non-negligible time latency (e.g., 60 ms), which is larger than the inter-frame duration (i.e., 33 ms) and unacceptable in real-time systems, as shown in Fig. 6(b). Therefore, it is necessary to quickly and accurately select an appropriate number of feature point pairs for the transformation matrix calculation.

To achieve the above goal, we introduce keypoint selection to calculate $C$ with only four keypoint pairs, where keypoints are the cross points of lines on the keyboard. As shown in Fig. 7, due to the size difference between the keyboard and the hands, wherever the occlusion is located, the cross points of the keyboard are never all occluded at the same time. In addition, we observe that during a typing process the hand movements between two consecutive frames are not violent, i.e., there often exist several common cross points in the two frames, such as the green points shown in Fig. 7(a) and Fig. 7(b). Therefore, we can detect the common cross points appearing in both of two consecutive frames (i.e., cross point pairs), and select four non-collinear keypoint pairs for the perspective transformation, as shown in Alg. 1.

1) Line Detection: Given an input image as shown in Fig. 8(a) (the same frame as Fig. 7(b)), we first utilize skin segmentation [29] to segment the hand region from the image, shown as the white region in Fig. 8(b). Then, we extract the edges of Fig. 8(b) with the Canny edge detector [10], as shown in Fig. 8(c). After that, to detect the lines of the keyboard and reduce the interference of other edges, we use the Hough transformation [15] to detect the long lines in the image, shown as the red lines in Fig. 8(c). However, there are too many lines, which may confuse the cross point detection. Therefore, we merge the detected lines (see the code sketch below). For convenience, we represent each line in polar coordinates with a vector $(\rho, \theta)$. For lines close to each other, i.e., those satisfying $\Delta\rho < 50$ pixels and $\Delta\theta < 5.7°$, we keep only one of them. The optimized line detection result for Fig. 8(c) is shown in Fig. 8(d).

2) Corner Point Detection: As shown in Fig. 8(d), not all lines of the keyboard (i.e., not all cross points) can be detected, due to the occlusion of the hands. Consequently, the location of a detected cross point cannot be directly inferred. To solve this problem, we introduce the corner points of the keyboard to infer the location of a detected cross point, based on the relative position between a corner point and the other cross points. Specifically, we observe that there usually exist one or more corner points in the captured images during typing, as shown in Fig. 7; in particular, the top left or the top right corner often exists. Therefore, we examine the top, leftmost, rightmost, and bottom detected lines, in that order of priority, to detect possible corner points, until one corner point is detected.
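Before moving on, the line detection and merging step in 1) can be made concrete with a short Python/OpenCV sketch. This is a minimal illustration, not the authors' implementation: the Canny thresholds, the Hough vote threshold, and the masking-out of edges inside the hand region are our assumptions; only the merging thresholds (50 pixels, 5.7°) come from the text.

```python
import cv2
import numpy as np

def detect_keyboard_lines(frame_bgr, hand_mask, d_rho=50.0, d_theta_deg=5.7):
    """Detect long lines and merge near-duplicates, as in step 1).

    frame_bgr: input frame (cf. Fig. 8(a)); hand_mask: binary mask of the
    segmented hand region (cf. Fig. 8(b)). Both argument names and the
    threshold values of Canny/Hough below are illustrative assumptions.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)          # Canny edge detector [10]
    edges[hand_mask > 0] = 0                  # ignore edges inside the hand region
    # Standard Hough transform [15]: each detected line is (rho, theta).
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=150)
    if lines is None:
        return []
    merged = []
    for rho, theta in lines[:, 0]:
        # Two lines are duplicates when BOTH delta_rho < 50 px and
        # delta_theta < 5.7 deg hold; keep only the first of each group.
        if all(abs(rho - r) >= d_rho or abs(theta - t) >= np.deg2rad(d_theta_deg)
               for r, t in merged):
            merged.append((float(rho), float(theta)))
    return merged                             # cf. Fig. 8(d)
```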
Take the top line as an example: we trace the points of the top line from the leftmost point to the right, to detect the top left corner point. As shown in Fig. 8(e), for a point $P'_l(x'_l, y'_l)$ on the top line, we use a square area $S'_l = \{(x_i, y_i) \mid |x_i - x'_l| \le \delta_x, |y_i - y'_l| \le \delta_y\}$ to verify whether $P'_l$ is a corner point. When the ratio of the number of black pixels (i.e., the possible contour of a corner) to the number of all pixels in $S'_l$ is larger than $\Delta\rho_c$, $P'_l$ can be a candidate corner point. After that, we fit one line to the black pixels satisfying $|x_i - x'_l| < \delta_x$ and another to the black pixels satisfying $|y_i - y'_l| < \delta_y$, as shown in Fig. 8(e). If the angle $\gamma$ between the two fitted lines satisfies $|\gamma - 90°| < \Delta$, $P'_l$ is selected as the top left corner (i.e., $P_l$). Based on extensive experiments, we set $\Delta\rho_c = 0.25$, $\Delta = 6°$, and $\delta_x = \delta_y = 5$ by default (a code sketch of this corner test is given at the end of this section). It is worth noting that if no border (i.e., no corner point) of the keyboard is detected, we skip the frame, because there are usually no valid keystrokes when all borders are blocked. Otherwise, if any border of the keyboard is detected, we then detect the corner points for key tracking.

3) Common Cross Point Detection: For the other detected lines, we extend each line to detect the cross points, shown as the green points in Fig. 8(e). To extract the set of common cross points detected in two frames, we first utilize the detected top-left corner point $P_l$ to infer the location of each cross point. Specifically, we represent the location of a cross point $P_i$ with a distance $d_i$ and an angle $\theta_i$. As shown in Fig. 8(f), the distance $d_i$ is measured as the Euclidean distance between $P_i$ and $P_l$, and the angle $\theta_i$ is computed as the angle between $\overrightarrow{P_l P_i}$ and $\overrightarrow{P_l P}$. Here, the point $P$ is a randomly selected point on the top line to the right of $P_l$. By comparing the $d_i$ and $\theta_i$ of each point in the two frames, we pair two keypoints with similar distance and angle, i.e., the distance difference between the two frames satisfies $\delta_d < 20$ pixels while the angle difference satisfies $\delta_\theta < 4°$ (a pairing sketch also follows at the end of this section). In Fig. 8(f), the yellow and green keypoints are selected as common cross points.

4) Keypoint Pair Determination: Finally, we select
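As noted above, the corner test in 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `contour` array, the helper names, and the 3x fitting neighborhood are our own choices, while the ratio threshold (0.25), the angle tolerance (6°), and the half-window size (5 pixels) follow the paper.

```python
import numpy as np

def _fit_direction(xs, ys, near_vertical):
    # Fit a straight line to the pixel set and return its unit direction.
    # For the near-vertical border we fit x = a*y + b to avoid infinite slope.
    if near_vertical:
        a = np.polyfit(ys, xs, 1)[0]
        d = np.array([a, 1.0])
    else:
        a = np.polyfit(xs, ys, 1)[0]
        d = np.array([1.0, a])
    return d / np.linalg.norm(d)

def is_top_left_corner(contour, x, y, dx=5, dy=5, ratio_th=0.25, angle_th=6.0):
    """Corner test of step 2) for a traced point P'_l = (x, y).

    contour: 2D array, nonzero at black (contour) pixels. The 3*dx fitting
    neighborhood is an assumption; the paper does not state its extent.
    """
    h, w = contour.shape
    win = contour[max(y - dy, 0):y + dy + 1, max(x - dx, 0):x + dx + 1]
    if win.size == 0 or np.count_nonzero(win) / win.size <= ratio_th:
        return False                     # ratio test on S'_l fails
    r = 3 * max(dx, dy)
    ys, xs = np.nonzero(contour[max(y - r, 0):min(y + r + 1, h),
                                max(x - r, 0):min(x + r + 1, w)])
    ys = ys + max(y - r, 0)
    xs = xs + max(x - r, 0)
    v = np.abs(xs - x) < dx              # pixels of the near-vertical border
    hz = np.abs(ys - y) < dy             # pixels of the near-horizontal border
    if v.sum() < 2 or hz.sum() < 2:
        return False                     # too few pixels to fit the two lines
    d1 = _fit_direction(xs[v], ys[v], near_vertical=True)
    d2 = _fit_direction(xs[hz], ys[hz], near_vertical=False)
    gamma = np.degrees(np.arccos(np.clip(abs(float(d1 @ d2)), 0.0, 1.0)))
    return abs(gamma - 90.0) < angle_th  # |gamma - 90| < 6 deg
```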
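Likewise, the common cross point pairing in 3) reduces to comparing the $(d_i, \theta_i)$ signatures of two consecutive frames. The sketch below assumes a simple greedy matching order, which the paper does not specify; only the thresholds (20 pixels, 4°) come from the text.

```python
import numpy as np

def polar_signature(points, p_l, p_ref):
    """(d_i, theta_i) of each cross point w.r.t. corner P_l (cf. Fig. 8(f))."""
    p_l = np.asarray(p_l, float)
    ref = np.asarray(p_ref, float) - p_l     # direction of P_l -> P
    ref /= np.linalg.norm(ref)
    out = []
    for p in points:
        v = np.asarray(p, float) - p_l
        d = np.linalg.norm(v)
        if d == 0.0:                         # the corner itself: angle undefined
            out.append((0.0, 0.0))
            continue
        cos = np.clip(float(v @ ref) / d, -1.0, 1.0)
        out.append((d, np.degrees(np.arccos(cos))))
    return out

def pair_common_cross_points(pts_a, corner_a, ref_a,
                             pts_b, corner_b, ref_b,
                             d_th=20.0, th_th=4.0):
    """Greedily pair cross points of two consecutive frames a and b."""
    sig_a = polar_signature(pts_a, corner_a, ref_a)
    sig_b = polar_signature(pts_b, corner_b, ref_b)
    pairs, used = [], set()
    for i, (da, ta) in enumerate(sig_a):
        for j, (db, tb) in enumerate(sig_b):
            # delta_d < 20 px and delta_theta < 4 deg, as in the paper.
            if j not in used and abs(da - db) < d_th and abs(ta - tb) < th_th:
                pairs.append((pts_a[i], pts_b[j]))
                used.add(j)
                break
    return pairs
```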