Fig. 4. AP appearance rate distribution-based staying segment characterization. (a) Appearance rates and significance of the APs in a staying segment. (b) Four kinds of closeness between staying segments A and B.

detect short staying segments even when the user is traveling. We next check whether the segment duration $T_s = t_m - t_1$ is greater than a threshold $\tau$ (e.g., $\tau = 6$ minutes) to confirm valid staying segments and filter out false ones. The user's entrance and departure times and the corresponding staying duration are obtained in the same step.

B. AP Appearance Rate Distribution-based Staying Segment Characterization

We next characterize the visited places by deriving the Wi-Fi AP appearance distribution in the detected staying segments. The discovered AP BSSID list can be used to describe the wireless environment of the user in the staying segment. However, not all APs have the same significance for characterizing the spatial information. Some APs may appear in only a few scans due to weak Wi-Fi signals, while others are more stable and appear in almost every scan. We calculate the appearance rate of each discovered AP to represent its significance, and then classify the APs into different categories based on their significance. In particular, the appearance rate of an AP is defined as $R = N_a / N$, where $N_a$ is the number of scans in which this AP appears and $N$ is the total number of scans in the detected staying segment. The appearance rates together with the BSSIDs of the discovered APs are used to characterize the spatial information of the staying segment, which has the potential both to differentiate places with good resolution and to measure people's physical closeness.

We empirically divide the APs of a staying segment into three layers $l_i$, $i = 1, 2, 3$ (i.e., lists of significant APs, secondary APs and peripheral APs) according to their appearance rate.
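As a rough illustration of the appearance rate $R = N_a / N$ defined above, the following Python sketch computes it for one staying segment. The input format (a list of scans, each a collection of BSSIDs) and the function name `appearance_rates` are assumptions made for this example and are not part of the original system description.

```python
from collections import Counter


def appearance_rates(scans):
    """Return {bssid: R} with R = N_a / N for one staying segment.

    `scans` is a list of Wi-Fi scans; each scan is an iterable of the
    BSSIDs observed in that scan (hypothetical input format).
    """
    n_scans = len(scans)
    if n_scans == 0:
        return {}
    counts = Counter()
    for scan in scans:
        # Count each AP at most once per scan, even if it was reported twice.
        counts.update(set(scan))
    return {bssid: n_a / n_scans for bssid, n_a in counts.items()}


# Toy segment with four scans and three distinct APs (made-up BSSIDs).
scans = [
    {"aa:01", "aa:02"},
    {"aa:01", "aa:03"},
    {"aa:01", "aa:02"},
    {"aa:01"},
]
rates = appearance_rates(scans)
```

In this toy segment the AP `aa:01` appears in every scan, while `aa:03` appears in only one of the four.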
As shown in Figure 4(a), the significant APs are those with an appearance rate larger than 80%, the peripheral APs are those with an appearance rate less than 20%, and the remaining APs are secondary APs. The spatial information of the staying segment can then be characterized by the AP set vector $L = (l_1, l_2, l_3)$, which can tolerate the noise generated by unstable APs, mobile APs or even missing AP scans.

C. Estimating Physical Closeness between Staying Segments

Measuring the physical closeness between different users' staying segments can capture how closely people interact with each other. It can also be used to group the same user's staying segments that are close to each other as one place. In particular, we leverage the AP set vector to measure the physical closeness between staying segments. Given two staying segments A and B and their AP set vectors $L_A$ and $L_B$, we calculate the closeness matrix $M$ as follows:

$$
M = L_A^{-1} L_B =
\begin{pmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{pmatrix},
\tag{1}
$$

where $r_{ij}$ is the overlapping rate between subsets $l_{Ai}$ and $l_{Bj}$ of the AP set vectors $L_A$ and $L_B$, respectively. The overlapping rate $r_{ij}$ can be obtained by

$$
r_{ij} = \frac{\mathrm{OverlapApNum}(l_{Ai}, l_{Bj})}{\min(\mathrm{Num}(l_{Ai}), \mathrm{Num}(l_{Bj}))}, \quad i, j = 1, 2, 3.
\tag{2}
$$

Based on the statistical analysis of 431 staying segments collected from 167 places in 3 cities, we empirically quantify the physical closeness expressed by the closeness matrix $M$ into five levels:

$$
\begin{cases}
C_0 = \{M : \sum_{i,j=1}^{3} r_{ij} = 0\}; & \text{(Completely separated)} \\
C_1 = \{M : r_{33} > 0 \text{ and } \sum_{i,j=1}^{3} r_{ij} - r_{33} = 0\}; & \text{(Same street block)} \\
C_2 = \{M : \sum_{i,j=1}^{3} r_{ij} - r_{33} - r_{11} > 0 \text{ and } r_{11} = 0\}; & \text{(Same building)} \\
C_3 = \{M : 0 < r_{11} < 0.6\}; & \text{(Adjacent rooms)} \\
C_4 = \{M : r_{11} \geq 0.6\}, & \text{(Same room)}
\end{cases}
\tag{3}
$$

where $C_1, C_2, C_3, C_4$ are four mutually exclusive closeness sets with increasing closeness level as shown in Figure 4(b), representing the same street block, the same building, adjacent rooms and the same room, respectively. $C_0 = \overline{C_1 \cup C_2 \cup C_3 \cup C_4}$ means two staying segments are completely separated.
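The layering rule and Eqs. (1)-(3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each layer is modeled as a set of BSSIDs, the overlapping rate of an empty layer is taken to be 0 (the text does not say how empty layers are handled), and the levels are tested from $C_4$ downward, which gives the same result as Eq. (3) because the sets are mutually exclusive.

```python
def split_layers(rates, high=0.8, low=0.2):
    """Split {bssid: appearance_rate} into (l1, l2, l3) using the 80%/20% rule."""
    l1 = {b for b, r in rates.items() if r > high}   # significant APs
    l3 = {b for b, r in rates.items() if r < low}    # peripheral APs
    l2 = set(rates) - l1 - l3                        # secondary APs
    return (l1, l2, l3)


def overlap_rate(layer_a, layer_b):
    """Eq. (2): shared APs divided by the size of the smaller layer."""
    if not layer_a or not layer_b:
        return 0.0  # assumption: an empty layer overlaps with nothing
    return len(layer_a & layer_b) / min(len(layer_a), len(layer_b))


def closeness_matrix(layers_a, layers_b):
    """Eq. (1): 3x3 matrix with entries r_ij = overlap_rate(l_Ai, l_Bj)."""
    return [[overlap_rate(la, lb) for lb in layers_b] for la in layers_a]


def closeness_level(m):
    """Eq. (3): map a closeness matrix to a level 0..4."""
    r11, r33 = m[0][0], m[2][2]
    # Sum of every entry except r11 and r33.
    others = sum(m[i][j] for i in range(3) for j in range(3)
                 if (i, j) not in ((0, 0), (2, 2)))
    if r11 >= 0.6:
        return 4  # same room
    if r11 > 0:
        return 3  # adjacent rooms
    if others > 0:
        return 2  # same building (r11 == 0 here)
    if r33 > 0:
        return 1  # same street block (only peripheral APs overlap)
    return 0      # completely separated


# Toy appearance rates for two staying segments (made-up values).
rates_a = {"ap1": 0.95, "ap2": 0.90, "ap3": 0.50, "ap4": 0.10}
rates_b = {"ap1": 0.85, "ap3": 0.90, "ap5": 0.40, "ap4": 0.05}
m = closeness_matrix(split_layers(rates_a), split_layers(rates_b))
level = closeness_level(m)
```

In this toy case the two segments share one of their two significant APs, so $r_{11} = 0.5$ and the pair falls into $C_3$ (adjacent rooms).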
We use level-$i$ closeness to express closeness in set $C_i$.

D. Physical Closeness-based Staying Segments Grouping

We note that several staying segments of the same user may correspond to the same place, since the user may revisit it multiple times. We thus combine these staying segments by checking whether there is level-4 closeness between them, and we keep all of their time slots. The grouped staying segments represent the non-redundant places visited by the user and contain the user's activities. We can then characterize the user's activities at each unique place.

V. DAILY PLACE AND ACTIVITY INFERENCE

In this section, we explore to what extent we can understand the contextual information of the places visited by people and their activities at those places, which facilitates the inference of social relationships and demographics.

A. Daily Routine-based Place Inference

Compared to the physical information (e.g., longitude and latitude), the contextual information (e.g., name and type) of a place contains more meaningful information related to people's social relationships and demographics. To obtain such information, we exploit the simple signal information of surrounding APs (i.e., BSSIDs and RSSs), which is readily available on most mobile devices, to determine the daily place meanings of staying segments based on people's daily routines.

1) Daily Routine-based Places: Recent reports [25], [26] indicate that people's daily routines mainly consist of three categories of activities: 1) working and work-related activities (working activities); 2) sleeping and household activities (home activities); and 3) leisure activities. Based on the understanding of people's daily routines, we define three categories of daily routine-based places, namely Workplace (e.g., office buildings and universities), Home, and Leisure Place (e.g., stores, restaurants, and churches), to describe contextual