cases where a time series contains outliers, this results in undue weight given to those outlying values, which significantly corrupts the measure of deviation. The MAD is calculated using the following equation.

$$\Delta m_j[k] = \frac{1}{W} \sum_{i=j}^{j+W} \left| Z^{\{k\}}_{t,r}(i) - \overline{Z}^{\{k\}}_{t,r}(j : j+W) \right| \qquad (4)$$

where $\overline{Z}^{\{k\}}_{t,r}(j : j+W)$ represents the mean of the $k$-th projected CSI stream over the $j$-th window. The algorithm calculates $\Delta m_j[k]$ for each sample point $j$ and for the principal components $2 \le k \le p$.

Second, the algorithm adds the mean absolute deviations of the individual waveforms to obtain a combined measure $\Delta M_j$ of MAD over all $p-1$ waveforms, as given by the following equation.

$$\Delta M_j = \sum_{k=2}^{p} \Delta m_j[k] \qquad (5)$$

Third, the algorithm compares $\Delta M_j$ to a heuristically set threshold $\mathit{Thresh}$. Let $\delta_j = \Delta M_j - \mathit{Thresh}$; then $\delta_j > 0$ indicates that the current window $j$ contains significant variations in CSI amplitudes.

Fourth, the algorithm compares $\delta_j$ to its value in the previous window, $\delta_{j-1}$, to detect an increasing or decreasing trend in the detected variations. When $\delta_j - \delta_{j-1} > 0$, there is an increasing trend in the combined MAD ($\Delta M_j$) of the CSI time series, and vice versa.
These increasing and decreasing trends are captured in the variables $i_u$ and $d_u$, respectively. The algorithm increments $i_u$ by 1 whenever $\delta_j - \delta_{j-1} > 0$ and $d_u$ by 1 whenever $\delta_j - \delta_{j-1} < 0$. Let $\sigma$ represent a forgetting factor, which is used to "forget" the variations caused by noise and thus avoid false positives. To forget such variations, the algorithm decrements both $i_u$ and $d_u$ by 1 if $\Delta M_j < \mathit{Thresh}$ for a duration of $\sigma W$.

Fifth, as soon as the values of $i_u$ and $d_u$ exceed the empirically determined thresholds $I_u$ and $D_u$, respectively, the algorithm detects the start of a keystroke. It then estimates the starting point $s_m$ and the ending point $e_m$ of the keystroke waveforms using the following equations.

$$s_m = j - \beta W - B_{left} \qquad (6)$$

$$e_m = j - \beta W + t_{avg} + B_{right} \qquad (7)$$

where $t_{avg}$ is the average number of data points spanned by the waveforms of different keystrokes, $\beta$ is the span factor, which determines the estimated starting point of the keystroke, and $B_{left}$ and $B_{right}$ are guard intervals on both sides of the estimated keystroke interval. The guard intervals ensure that the detected keystroke waveforms are complete.

Last, the algorithm calculates the sum of powers of all waveforms lying between those starting and ending points and compares this combined power with a sum-power threshold ($P_{avg}$) to confirm the presence of a complete keystroke within that interval. This ensures that the training models are built using only waveforms that contain complete shapes of the keystrokes. Once keystroke detection is confirmed, the algorithm returns the starting point ($s_m$) of the detected keystroke and jumps $\Delta t_{avg}$ data points ahead of $s_m$ to look for the next keystroke, where $\Delta t_{avg}$ is the average number of data points between the arrivals of two consecutive keystrokes.
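Equations (4) and (5) and the detection steps above can be condensed into the following Python/NumPy sketch for a single antenna pair. It is a hedged illustration under stated assumptions, not the authors' implementation: `combined_mad`, `DetectorParams` and `detect_keystrokes` are hypothetical names; the projected streams are assumed to be the columns of an array `z`, with column 0 holding the first principal component (skipped, since Eq. (5) sums over $k = 2, \dots, p$); the forgetting rule is read as "decrement both counters once per $\sigma W$ consecutive samples with $\Delta M_j < \mathit{Thresh}$"; the "sum of powers" is taken to be the sum of squared amplitudes of streams $2, \dots, p$ over $[s_m, e_m)$; and resetting the counters after an unconfirmed detection is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass

import numpy as np


def combined_mad(z: np.ndarray, W: int) -> np.ndarray:
    """Delta M_j (Eq. 5) built from Delta m_j[k] (Eq. 4) for every window start j.

    z has shape (N, p), one projected CSI stream per column; column 0 (k = 1)
    is skipped, so only streams k = 2..p contribute.
    """
    n_windows = z.shape[0] - W           # window j spans samples j..j+W inclusive
    if n_windows <= 0:
        raise ValueError("need more than W samples")
    dM = np.empty(n_windows)
    for j in range(n_windows):
        seg = z[j:j + W + 1, 1:]         # W + 1 samples of streams k = 2..p
        dm = np.abs(seg - seg.mean(axis=0)).sum(axis=0) / W  # Eq. (4), per stream
        dM[j] = dm.sum()                 # Eq. (5)
    return dM


@dataclass
class DetectorParams:
    W: int          # window length in samples
    thresh: float   # Thresh
    Iu: int         # threshold for the increasing-trend counter i_u
    Du: int         # threshold for the decreasing-trend counter d_u
    sigma: float    # forgetting factor
    beta: float     # span factor
    b_left: int     # guard interval B_left
    b_right: int    # guard interval B_right
    t_avg: int      # average keystroke span (about 650 samples in the reported data)
    dt_avg: int     # average gap between keystrokes (about 1250 samples)
    p_avg: float    # sum-power threshold P_avg


def detect_keystrokes(dM: np.ndarray, z: np.ndarray, prm: DetectorParams) -> list:
    """Return estimated starting points s_m found in one antenna pair's streams."""
    starts = []
    iu = du = below = 0
    forget_len = max(1, int(prm.sigma * prm.W))  # "a duration of sigma*W"
    prev_delta = dM[0] - prm.thresh
    j = 1
    while j < len(dM):
        delta = dM[j] - prm.thresh             # delta_j = Delta M_j - Thresh
        if delta > prev_delta:
            iu += 1                             # increasing trend
        elif delta < prev_delta:
            du += 1                             # decreasing trend
        prev_delta = delta
        # Assumed forgetting rule: after sigma*W consecutive samples below
        # Thresh, decrement both counters once and restart the count.
        below = below + 1 if dM[j] < prm.thresh else 0
        if below >= forget_len:
            iu, du, below = max(iu - 1, 0), max(du - 1, 0), 0
        if iu > prm.Iu and du > prm.Du:
            s = max(int(j - prm.beta * prm.W - prm.b_left), 0)       # Eq. (6)
            e = int(j - prm.beta * prm.W + prm.t_avg + prm.b_right)   # Eq. (7)
            power = float(np.sum(z[s:e, 1:] ** 2))  # assumed "sum of powers"
            iu = du = below = 0
            if power > prm.p_avg:               # complete keystroke confirmed
                starts.append(s)
                j = max(s + prm.dt_avg, j + 1)   # jump dt_avg samples past s_m
                if j < len(dM):
                    prev_delta = dM[j - 1] - prm.thresh
                continue
        j += 1
    return starts
```

In use, `combined_mad(z, W)` would be computed for each antenna pair and passed to `detect_keystrokes` together with the same `z`; the scan is deliberately sequential because the counters $i_u$ and $d_u$ carry state from one window to the next.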
From the CSI data set we collected from our volunteers, we observed that on average the waveforms of a keystroke spanned $t_{avg} \approx 650$ data points and the average number of data points between the arrivals of two consecutive keystrokes was $\Delta t_{avg} \approx 1250$ data points at the CSI sampling rate of $F_s = 2500$ samples/s. We empirically determined appropriate values for the remaining constants, including $W$, $D_u$, $I_u$, $\sigma$, $\beta$, $B_{left}$, $B_{right}$, $\mathit{Thresh}$ and $P_{avg}$.

5.3 Combining Results from Antenna Pairs

As mentioned earlier, we obtain the starting points of keystrokes independently from each TX-RX antenna pair. Let $S_{t,r}$ represent the set containing the starting points of all keystrokes obtained by applying the keystroke detection algorithm to the antenna pair $t$-$r$. First, we obtain the set $S_{t,r}$ for each $t$-$r$ pair. Second, we take the average of all starting points that are within $\Delta t_{avg}$ of each other across all sets $S_{t,r}$ to obtain a robust estimate of the starting points of keystrokes. Third, based on the experimentally measured average span $t_{avg}$ of different keystrokes, we calculate the ending point of each keystroke by simply adding $t_{avg}$ to the corresponding starting point.

5.4 Extracting Keystroke Waveforms

Once the algorithm calculates the set of starting and corresponding ending points for keystrokes, we use those points to extract the waveforms from the CSI matrix $H_{t,r}$. Let $K_{m,t,r}$ represent the CSI waveform of the $m$-th keystroke extracted from the antenna pair $t$-$r$, and let $\bar{s}_m$ represent the average of the starting points for the $m$-th keystroke over all antenna pairs. We can express $K_{m,t,r}$ in terms of $H_{t,r}$ as follows.

$$K_{m,t,r} = H_{t,r}(\bar{s}_m : \bar{s}_m + t_{avg}) \qquad (8)$$

After extracting the CSI waveforms $K_{m,t,r}$ from all subcarriers of the $t$-$r$ antenna pair, we apply PCA to those CSI waveforms to remove the noisy components and obtain the components that represent the variations caused by movements of hands and fingers.
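The merging step of Section 5.3, the segmentation in Eq. (8) and the subsequent PCA can be sketched as follows in Python/NumPy. This is a minimal illustration under assumptions, not the authors' implementation: `merge_start_points` and `extract_and_project` are hypothetical names; the starting points are fused by sorting all of them and greedily grouping points that lie within $\Delta t_{avg}$ of a group's first point; $H_{t,r}$ is assumed to be an array with samples along the rows and $S_c$ subcarriers along the columns; and the principal components are obtained from an SVD of the mean-centered segment, onto the top $q$ of which the segment is then projected, with $q = 4$ as the default to match the choice reported for this system.

```python
import numpy as np


def merge_start_points(start_sets, dt_avg):
    """Fuse the per-pair sets S_{t,r} into averaged starting points (Sec. 5.3).

    Assumed grouping rule: sort all points and start a new group whenever a
    point is dt_avg or more away from the first point of the current group.
    """
    pooled = sorted(s for S in start_sets for s in S)
    merged, group = [], []
    for s in pooled:
        if group and s - group[0] >= dt_avg:
            merged.append(sum(group) / len(group))  # average of one group
            group = []
        group.append(s)
    if group:
        merged.append(sum(group) / len(group))
    return merged


def extract_and_project(H, s_bar, t_avg, q=4):
    """Slice one keystroke (Eq. 8) and project it onto its top-q principal components.

    H: CSI amplitudes of one antenna pair, shape (N, S_c), samples x subcarriers.
    Returns an array of shape (L, q) with L = t_avg (end-exclusive slice).
    """
    s = int(round(s_bar))
    K = H[s:s + t_avg, :]                     # K_{m,t,r}, L x S_c
    Kc = K - K.mean(axis=0)                   # center each subcarrier for PCA
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Kc, full_matrices=False)
    Phi = Vt[:q].T                            # S_c x q
    return K @ Phi                            # projected streams, L x q


starts = merge_start_points([[100, 1400], [110, 1390], [105]], dt_avg=1250)
print(starts)  # [105.0, 1395.0]
```

The ending point of each merged keystroke then follows as `start + t_avg`, and calling `extract_and_project` for every antenna pair yields one $L \times q$ matrix per keystroke and pair.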
Unlike principal components derived from normalized streams, for $K_{m,t,r}$ it is difficult to decide which PCA component represents noise and should be removed from the top $p$ principal components. The difficulty arises because $K_{m,t,r}$ contains the set of waveforms for a specific keystroke instead of the whole CSI stream, so the variance of the noisy component often becomes small. We observe that the noisy PCA component keeps switching between the 1st and 2nd positions among the sorted PCA components for different extracted keystroke waveforms. To get around this problem, we first project $K_{m,t,r}$ onto all top $q$ principal components. Let $\Phi^{\{1:q\}}_{K}$ be an $S_c \times q$ matrix that represents the top $q$ principal components of $K_{m,t,r}$ obtained after applying PCA, and let $K^{\{1:q\}}_{m,t,r}$ be an $L \times q$ matrix containing the projected CSI streams in its columns, where $L$ is the length of the segmented keystroke waveform. Thus, $K^{\{1:q\}}_{m,t,r}$ is given by the following equation.

$$K^{\{1:q\}}_{m,t,r} = K_{m,t,r} \times \Phi^{\{1:q\}}_{K} \qquad (9)$$

In our implementation, we choose $q = 4$. This choice is again based on the observation that the top 4 principal components contain enough information about keystrokes to achieve high accuracy during classification.

To detect which waveform in $K^{\{1:q\}}_{m,t,r}$ represents the noisy projection, we chose the top 2 projected waveforms and di-