Smartphone Privacy Leakage of Social Relationships and Demographics from Surrounding Access Points

Chen Wang∗, Chuyu Wang∗†, Yingying Chen∗, Lei Xie† and Sanglu Lu†
∗Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA
{cwang42, yingying.chen}@stevens.edu
†State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China
wangcyu217@dislab.nju.edu.cn, {lxie, sanglu}@nju.edu.cn

Abstract—While mobile users enjoy anytime, anywhere Internet access by connecting their mobile devices through Wi-Fi services, the increasing deployment of access points (APs) has raised a number of privacy concerns. This paper explores the potential for smartphone privacy leakage caused by surrounding APs. In particular, we study to what extent users’ personal information, such as social relationships and demographics, could be revealed by leveraging simple signal information from APs without examining the Wi-Fi traffic. Our approach utilizes users’ activities at daily visited places, derived from the surrounding APs, to infer users’ social interactions and individual behaviors. Furthermore, we develop two new mechanisms: the Closeness-based Social Relationships Inference algorithm captures how closely people interact with each other by evaluating their physical closeness and derives fine-grained social relationships, whereas the Behavior-based Demographics Inference method differentiates various individual behaviors via the extracted activity features (e.g., activeness and time slots) at each daily place to reveal users’ demographics. Extensive experiments with 21 participants in their real daily lives, covering 257 different places in three cities over a 6-month period, demonstrate that the simple signal information from surrounding APs has a high potential to reveal people’s social relationships and infer demographics, with over 90% accuracy when using our approach.

I. INTRODUCTION

Wi-Fi networks are becoming increasingly pervasive, to the point where public Wi-Fi access is readily available in numerous cities [1]. The number of public Wi-Fi Access Points (APs) is expected to hit 340 million globally by 2018, resulting in one public Wi-Fi AP for every twenty people worldwide [2]. More commonly, retail stores, offices, universities and homes are Wi-Fi enabled to provide high-bandwidth, cost-effective Internet connectivity for mobile users. While mobile users enjoy anytime, anywhere Internet access by connecting their mobile devices (e.g., smartphones) to Wi-Fi networks, the surrounding APs have raised a number of privacy concerns. For example, mobile users could be located and tracked based on the ubiquitous APs, such as by using the Google location service [3].

In this work, we study the potential for privacy leakage caused by surrounding APs and explore to what extent personal information, in particular users’ social relationships and demographics, could be derived. Prior work on demographics inference based on Wi-Fi networks mainly relies on context information obtained from passively sniffed user Wi-Fi traffic [4], [5]. For example, Cheng et al. examine users’ Internet browsing activities by collecting their in-the-air traffic in public hotspots [4], whereas Huaxin et al. infer user demographic information by passively sniffing Wi-Fi traffic meta-data [5]. These methods need to examine the Wi-Fi traffic and are thus not scalable to a large number of users due to the high deployment overhead involved. Existing work on social relationships inference primarily depends on encounter events detected by Bluetooth [6], Wi-Fi SSID lists [7], or GPS locations [8]. These approaches can only perform coarse-grained social relationships inference by examining whether users interact at all, instead of studying users’ behaviors and how closely they interact with each other.
They can neither provide fine-grained social relationships (such as advisor-student, colleagues, friends, husband-wife, neighbors) nor identify the specific role of the user in the relationship.

It is known that GPS, motion sensors and contact lists on mobile devices can expose private information, but how much of a user’s privacy could be leaked from the ubiquitous access points is unclear. In this work, we demonstrate that by examining the simple signal features of the surrounding APs it is possible to infer users’ fine-grained social relationships and demographics without sniffing any Wi-Fi traffic. Specifically, mobile devices periodically scan for surrounding Wi-Fi APs by default, in order to optimize network service by continuously seeking better Wi-Fi signals and remembered APs [9], [10], and accessing such information only requires a common permission, which is considered low risk [11]. Signal features such as the time-series of BSSIDs (i.e., MAC addresses) and Received Signal Strength (RSS) are then extracted from these scanned APs and analyzed to derive users’ activities at daily visited places. Our system exploits the rich information of users’ daily interactions and
behaviors embedded in these derived activities and discloses fine-grained social relationships (including advisor-student, supervisor-employee, colleagues, friends, husband-wife and neighbors) as well as demographic information (such as occupation, gender, religion and marital status).

Our approach of using simple signal features of APs can be easily applied to a large number of users. For example, advertisers or third-party companies could mine users’ personal information for targeted advertising or service recommendations. However, such an approach could cause significant privacy leakage if it is utilized by advertisers with aggressive business aims, who could simply publish free apps that actively collect users’ surrounding AP information and send it back to a server to derive users’ social relationships and demographics.

In particular, we describe people’s daily places in three dimensions (i.e., temporal, spatial and contextual) to infer people’s activities at each place. For users performing activities at the same place, we calculate the physical closeness of the users (e.g., whether they stay in the same room, in adjacent rooms or inside the same building) and extract users’ activeness (e.g., walking around or sitting) together with other features (e.g., time slots and duration) to characterize their activities at daily places. We then develop the Closeness-based Social Relationships Inference algorithm to capture where, when and how closely people interact in order to derive fine-grained social relationships. We design the Behavior-based Demographics Inference method to capture individual behavior based on users’ various daily activities and reveal demographic information including occupation, gender, religion and marital status.
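To make the notion of physical closeness concrete, the following is a minimal sketch and not the authors' algorithm. It assumes, as the preliminary study later suggests, that closeness can be approximated by how many surrounding APs two users observe in common during the same time window. The function names, the use of a Jaccard ratio, and all three thresholds are hypothetical choices made only for illustration.

```python
from typing import Set


def shared_ap_ratio(aps_u: Set[str], aps_v: Set[str]) -> float:
    """Jaccard ratio of the BSSIDs two users observe in the same time window."""
    union = aps_u | aps_v
    return len(aps_u & aps_v) / len(union) if union else 0.0


def closeness_level(aps_u: Set[str], aps_v: Set[str],
                    room_thr: float = 0.8,
                    adjacent_thr: float = 0.5,
                    building_thr: float = 0.1) -> str:
    """Map the shared-AP ratio to a coarse closeness label.

    The thresholds are hypothetical placeholders; a real system would have
    to calibrate them (and would likely also use RSS, not only AP presence).
    """
    ratio = shared_ap_ratio(aps_u, aps_v)
    if ratio >= room_thr:
        return "same room"
    if ratio >= adjacent_thr:
        return "adjacent rooms"
    if ratio >= building_thr:
        return "same building"
    return "not co-located"


if __name__ == "__main__":
    alice = {"a1", "a2", "a3", "a4", "a5"}
    bob = {"a1", "a2", "a3", "a4", "b1"}
    print(closeness_level(alice, bob))
```

Using AP presence alone keeps the sketch independent of device-specific RSS calibration, at the cost of coarser resolution than a signal-strength comparison would give.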
We conduct extensive experiments with 21 participants carrying their smartphones to collect surrounding Wi-Fi AP information in their real daily lives across three cities over 6 months, and study to what extent we can derive these participants’ social relationships and demographic information.

We summarize our main contributions as follows:

• We demonstrate that simple signal information (e.g., time-series of MAC addresses and RSS) from users’ surrounding Wi-Fi APs can reveal private information including both social relationships and demographics.
• We develop statistical methods to detect and characterize users’ daily visited places based on the AP signal information and further infer the context of daily places by deriving users’ activity features (e.g., activeness, time slots and duration).
• We design a closeness-based social relationships inference algorithm to analyze when, where and how closely users interact with each other and reveal users’ detailed social relationships (e.g., advisor-student, supervisor-employee, colleagues, friends, husband-wife, customer relationship and neighbors).
• We further abstract people’s various behaviors (e.g., home, working and leisure behaviors) to infer their demographic information such as occupation, gender, religion, and marital status.
• We show with an experimental study of 21 participants that, by using our system, one can achieve over 91% accuracy in inferring social relationships and over 90% accuracy in deriving demographic information by examining the simple signal features from surrounding APs.

II. RELATED WORK

In this work, we aim to understand the privacy leakage of smartphone users, in particular discovering users’ social relationships and demographics, by analyzing only the availability of surrounding APs without sniffing any Wi-Fi traffic. Obtaining such information requires only limited permission, rather than turning on GPS or accessing contact lists.
Our work is related to research efforts that use various information collected from Wi-Fi networks and/or smartphones for meaningful places extraction [12]–[15], social relationships inference [6], [7], [16]–[18], and demographics derivation [4], [5], [19].

As contextual location can be used for learning a person’s interests and providing content-aware applications, there have been active studies on extracting the contextual meaning of the locations people visit. For example, Kang et al. design a cluster-based method to extract meaningful places from traces of location coordinates collected from GPS and a Wi-Fi based indoor location system [12]. Kim et al. propose SensLoc, which utilizes a combination of acceleration, Wi-Fi, and GPS sensors to find semantic places, detect user movements, and track travel paths [13]. These existing methods, however, only focus on individual users’ visited locations without analyzing the interactions between users. Besides, the obtained meaningful places may not be sufficient to infer higher-level personal information, such as fine-grained social relationships and demographics, due to the lack of information about the users’ daily behaviors and social interactions.

Information in Wi-Fi networks and smartphones has been used in the literature to infer users’ social relationships. For example, Wiese et al. [16] use the smartphone contact list to mine personal relationships. Moreover, the similarity of smartphones’ SSID lists is used to reveal users’ social relationships [7]. These methods can only derive coarse-grained social relationships without analyzing the behaviors and interactions among people. Vicinity detection via Bluetooth or Wi-Fi signals opens opportunities for social interaction analysis, and the strength of friendship ties can be inferred from such wireless signals [6], [18].
However, these vicinity detection methods only consider the relative interaction between people without the interaction context (e.g., place context and behaviors). They are unable to differentiate the specific type of various social relationships, such as family members and friends. Our previous work focuses on extracting social relationships from information leaked by smartphone Apps, such as GPS location, IMEI and network location [20]. It could only derive the social relationships in a coarse-grained manner. In this paper, we take a closer look and study the privacy leakage from the surrounding APs alone, deriving people’s activities and various closeness levels of social interactions to infer detailed relationships and demographic information.
More recently, Wi-Fi traffic monitoring and smartphone Apps have been used to infer users’ demographic information. For example, Cheng et al. examine the user’s Internet browsing activities (e.g., domain name querying, web browsing) by collecting their Wi-Fi traffic in public hotspots [4]. They are able to reveal the travelers’ identities, locations or social privacy. Huaxin et al. design an approach to infer user demographic information by sniffing the Wi-Fi traffic meta-data [5]. Seneviratne et al. design a system to predict various user traits by analyzing a snapshot of installed Apps [19]. Unlike the above work, we study the capability of the simple signal information of surrounding APs to derive demographic information without sniffing any Wi-Fi traffic or examining the installed Apps.

III. SYSTEM DESIGN

A. Preliminaries

Environment-Behavior research reveals that an individual’s activities, such as work-related, household and leisure activities, are related to the places they visit [21]. Such activities at daily visited places can be analyzed and mined to infer users’ personal information such as social relationships and demographics [22]. Thus, by leveraging the users’ activities at daily places as a bridge, we could start from the non-contextual surrounding AP information to infer users’ social relationships and demographics. This connection is depicted in Figure 1(a). The surrounding Wi-Fi APs reflect users’ surrounding wireless environments, which can be utilized to determine users’ daily visited places and activities. The daily places in our work refer to the abstract locations that users visit in their daily lives, such as home, workplace, restaurants, stores and churches. By analyzing users’ activities at daily places, we could derive the social interactions between users and abstract each individual’s behavior. Such information is then further utilized to mine users’ social relationships and demographics.
Note that contrary to the existing work in social relationships and demographics inference, we only utilize the availability of surrounding APs’ simple signal information without requiring to sniff any Wi-Fi traffic contents. To study how the surrounding APs can be utilized to detect a user’s daily places and activities, we conduct preliminary experiments by recording the APs on the user’s smartphone at the regular rate of one scan per 15 seconds, because a Wi-Fi device usually scans every 5 - 15 seconds for providing the user non-interrupted Wi-Fi connection to cope with the user’s place change [23], [24]. Figure 1(b) shows the recorded timeseries of a user’s surrounding APs (differentiated by BSSIDs) for one day, as well as the groundtruth of visited places. As the AP index is assigned to each unique AP in sequence, the later observed AP has larger index. The observation is that the detected AP lists have large overlaps when the user stays at the same place, while the AP lists are distinct when the user moves to a different daily place. This suggests that we may utilize the changes of the observed AP list to detect the user’s daily visited places as well as the entrance/departure time and the staying duration. Moreover, the user’s activities at daily places (e.g., the user’s mobility at work and during ! ! " # $ $ (a) Connection from surrounding APs to (b) Illustration of observed APs by social relationships & demographics. a user’s smartphone in one day. Fig. 1. Preliminary studies. leisure time) can be derived to reflect individual demographics. Furthermore, we observe that the same place or the places in the neighborhoods may share some APs (e.g., office and restaurant 1). Their physical closeness may be obtained by checking how many surrounding APs they share, which is useful for analyzing social interactions. B. Challenges Robust Daily Places and Activity Detection Using APs. 
Lacking prior knowledge of the AP deployment, accurate and robust detection of daily places and activities from ubiquitous APs is challenging, and the prevalence of unstable and mobile APs adds to the difficulty. Additionally, the daily places need to be abstracted with sufficient spatial resolution (e.g., differentiating rooms and floors) for further deriving users' mobility and their physical closeness during interaction.
Determining the Context of Daily Places. Deriving the context of a user's daily visited places from the non-contextual AP signal information is challenging. Moreover, a place may exhibit different contexts to different users. For example, stores are leisure places to most people but the workplace to the store staff. This requires us to look for the meaning behind the individual's activities at the place instead of relying on the traditional place context based on the place function.
Fine-grained Social Relationships Inference. Fine-grained relationship inference needs information not only on who interacts with whom but also on how closely they interact. Our system needs the capability to define multiple levels of closeness between users. Furthermore, specifying the role of each user in a relationship (e.g., husband or wife) may need assistance from demographic information (e.g., gender).
Demography Inference without Context. Inferring a user's demographics from the non-contextual simple signal information of surrounding APs is challenging. Different from previous work relying on the content obtained from monitoring the Wi-Fi traffic, our system explores the possibility of abstracting users' behaviors based on their various activities at daily places for demographic inference.
C. System Overview
The basic idea of our system is to analyze users' activities at daily routine-based places that are derived from users' surrounding APs for fine-grained social relationships and demographics inference.
The proposed system takes as inputs the information of users’ surrounding APs perceived by their smartphones at each scan, including the list of AP MAC
addresses and RSS, to infer fine-grained social relationships and demographics. Figure 2 presents our system flow.
Fig. 2. Wi-Fi AP distribution-based social relationships and demographics inference framework.
First, the Staying Segment Detection and Grouping component detects and characterizes users' daily visited places in three steps. AP List-based Staying/Traveling Segmentation analyzes the overlap of the AP lists over consecutive scans and divides the time-series into staying and traveling periods. Staying Segment Characterization estimates the significance of each surrounding AP by calculating its appearance rate within the staying segment. It then categorizes the APs by their significance to describe the spatial information of each staying segment. The spatially close-by staying segments are then grouped together as one unique place by using Closeness-based Staying Segment Grouping.
The next component derives the activities at daily places, which is an important building block of social relationships and demographics inference. It is carried out by Daily Place and Activity Inference, which involves Daily Routine-based Staying Segment Group Categorization and Daily Activity Feature Extraction and Fine-grained Place Context Inference. Daily Routine-based Staying Segment Categorization classifies the grouped staying segments (i.e., unique places) into three contextual categories (i.e., home, leisure and workplace) based on people's daily routines. Then, Daily Activity Feature Extraction and Fine-grained Place Context Inference derives people's activity features, including the staying time slots, duration and activeness, and assigns detailed contextual information to these places by leveraging the derived activity features and geo-information, such as restaurants or stores in leisure places, and campus or office buildings in workplaces.
Finally, our system infers users' social relationships and demographics based on the derived activities at daily places. In particular, it first calculates the physical closeness of the interactions between users. It then uses Interaction Segment Characterization and Closeness-based Social Relationships Classification to infer when, where and how closely people interact with each other for inferring their possible relationships such as family, neighbors, colleagues, and friends. To derive a user's demographics, Behavior-based Demographics Inference applies Daily Activity-based Behavior Derivation to abstract people's various behaviors, including working behaviors, home behaviors and leisure behaviors, based on the activities at daily places. It then utilizes the Behavior-based Decision Rule to infer users' demographic information (e.g., occupation, gender, marriage and religion) based on the behavior abstraction. At last, Associate Reasoning can be applied to social relationships and demographics to improve the accuracy of inference results, such as identifying the specific role of the user in a relationship (e.g., husband-wife and advisor-student).
Fig. 3. Staying/traveling segmentation leveraging dynamic searching windows to analyze the overlapped AP lists over consecutive scans.
IV. STAYING SEGMENT GROUP DETECTION AND CHARACTERIZATION
A. AP List-based Staying/Traveling Segmentation
As observed in the preliminary study of Figure 1(b), the discovered AP BSSID lists of consecutive scans have large overlaps when the user stays at the same place, while the similarity of the AP lists rapidly diminishes when the user moves to a different place. We thus take advantage of the AP list similarity (i.e., BSSID list similarity) in consecutive scans to detect the staying and traveling segments.
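The overlap-based segmentation developed in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: each scan is given as a set of BSSIDs taken at a fixed 15-second period, a searching window is closed at the last scan that still shares at least one AP with every earlier scan in the window, and the duration threshold defaults to the 6-minute example value. The function and parameter names are hypothetical.

```python
from typing import List, Set, Tuple


def staying_segments(
    scans: List[Set[str]],
    scan_period_s: float = 15.0,     # assumed fixed scan interval
    min_duration_s: float = 360.0,   # duration threshold (6-minute example value)
) -> List[Tuple[int, int]]:
    """Return (first_scan, last_scan) index pairs of candidate staying segments."""
    segments = []
    start = 0
    n = len(scans)
    while start < n:
        # APs seen in every scan of the current searching window.
        common = set(scans[start])
        end = start
        # Expand the window scan by scan while some AP is still shared.
        while end + 1 < n and common & scans[end + 1]:
            common &= scans[end + 1]
            end += 1
        # Keep the window only if it lasts longer than the threshold;
        # shorter windows are treated as traveling.
        if (end - start) * scan_period_s > min_duration_s:
            segments.append((start, end))
        # Assumption: the next window starts right after the closed one.
        start = end + 1
    return segments
```

Windows that are closed after only a few scans correspond to traveling periods, which is why the duration check is applied before a window is accepted as a staying segment.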
We define a staying segment as the Wi-Fi AP-list time-series segment that captures the temporal and spatial information when the user stays at a location. We analyze the overlap of the AP lists within a dynamic searching window of consecutive scans to perform staying segmentation. In particular, Figure 3 illustrates the proposed AP List-based Staying/Traveling Segmentation in identifying the staying segment $n$. The dynamic searching window starts at $t_1$ and iteratively expands to the next scan. In each iteration, we analyze the overlapped APs of all the scans within the searching window. The number of solid dots at each scanning time $t_i$ ($i = 1, 2, \ldots$) indicates the number of overlapped APs that are found within the window from $t_1$ to $t_i$. When the searching window iteratively expands to the next scan, the number of overlapped APs may decrease. When no overlapped AP is found in the expanded searching window (e.g., the window from $t_1$ to $t_m$), this searching window is identified as one possible staying segment. We note that because it may take several scans to travel out of an AP's range, this approach can
detect short staying segments even when the user is traveling. We next check whether the segment duration $T_s = t_m - t_1$ is greater than a threshold $\tau$ (e.g., $\tau = 6$ minutes) to further confirm valid staying segments and filter out the false staying segments. Meanwhile, the user's entrance/departure time and corresponding staying duration can also be obtained.
Fig. 4. AP appearance rate distribution-based staying segment characterization. (a) Appearance rates and significance of the APs in a staying segment. (b) Four kinds of closeness between staying segments A and B.
B. AP Appearance Rate Distribution-based Staying Segment Characterization
We next characterize the visited places by deriving the Wi-Fi AP appearance distribution in the detected staying segments. The discovered AP BSSID list can be used to describe the wireless environment of the user in the staying segment. However, not all the APs have the same significance for characterizing the spatial information. Some APs may appear only in a few scans due to weak Wi-Fi signals, while others are more stable and appear in almost every scan. We calculate the appearance rate of each discovered AP to represent its significance, and then classify the APs into different categories based on their significance. In particular, the appearance rate of an AP is defined as $R = N_a / N$, where $N_a$ is the number of scans in which this AP appears and $N$ is the total number of scans in the detected staying segment. The appearance rates together with the BSSIDs of the discovered APs are used to characterize the spatial information of the staying segment, which has the potential both to differentiate places with good resolution and to measure people's physical closeness. We empirically divide the APs of a staying segment into three layers $l_i$, $i = 1, 2, 3$ (i.e.
lists of significant APs, secondary APs and peripheral APs) according to their appearance rate. As shown in Figure 4(a), the significant APs are those with an appearance rate larger than 80%, the peripheral APs are the ones with an appearance rate less than 20%, and the rest of the APs are secondary APs. The spatial information of the staying segment can then be characterized by the AP set vector $L = (l_1, l_2, l_3)$, which can tolerate the noise generated by unstable APs, mobile APs or even missing AP scans.
C. Estimating Physical Closeness between Staying Segments
Measuring the physical closeness between different users' staying segments can capture how closely people interact with each other. It can also be used to group the same user's staying segments that are close to each other as one place. In particular, we leverage the AP set vector to measure the physical closeness between staying segments. Given two staying segments A and B and their AP set vectors $L_A$ and $L_B$, we calculate the closeness matrix $M$ as follows:
$$M = L_A^{-1} L_B = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}, \tag{1}$$
where $r_{ij}$ is the overlapping rate between subsets $l_{Ai}$ and $l_{Bj}$ of the AP set vectors $L_A$ and $L_B$, respectively. The overlapping rate $r_{ij}$ can be obtained by
$$r_{ij} = \frac{\mathrm{OverlapApNum}(l_{Ai}, l_{Bj})}{\min(\mathrm{Num}(l_{Ai}), \mathrm{Num}(l_{Bj}))}, \quad i, j = 1, 2, 3. \tag{2}$$
Based on the statistical analysis of 431 staying segments collected from 167 places in 3 cities, we empirically quantify the physical closeness expressed by the closeness matrix $M$ into five levels:
$$\begin{cases} C_0 = \{M : \sum_{i,j=1}^{3} r_{ij} = 0\}; & \text{(Completely separated)} \\ C_1 = \{M : r_{33} > 0 \text{ and } \sum_{i,j=1}^{3} r_{ij} - r_{33} = 0\}; & \text{(Same street block)} \\ C_2 = \{M : \sum_{i,j=1}^{3} r_{ij} - r_{33} - r_{11} > 0 \text{ and } r_{11} = 0\}; & \text{(Same building)} \\ C_3 = \{M : 0 < r_{11} < 0.6\}; & \text{(Adjacent rooms)} \\ C_4 = \{M : r_{11} \geq 0.6\}, & \text{(Same room)} \end{cases} \tag{3}$$
where $C_1, C_2, C_3, C_4$ are four mutually exclusive closeness sets with increasing closeness level as shown in Figure 4(b), representing the same street block, the same building, adjacent rooms and the same room, respectively. $C_0 = \overline{C_1 \cup C_2 \cup C_3 \cup C_4}$ means that two staying segments are completely separated. We use level-$i$ closeness to express closeness in set $C_i$.
D. Physical Closeness-based Staying Segments Grouping
We note that the same user's multiple staying segments may correspond to the same place, as the user may revisit it multiple times. We thus combine these staying segments by checking whether there is level-4 closeness between them, and keep all the time slots. The grouped staying segments represent non-redundant places visited by the user and contain the user's activities. We can then characterize the user's activities at each unique place.
V. DAILY PLACE AND ACTIVITY INFERENCE
In this section, we explore to what extent we can understand the contextual information of the places visited by people and their activities at these places, which facilitates the social relationships and demographics inference.
A. Daily Routine-based Place Inference
Compared to the physical information (e.g., longitude and latitude), the contextual information (e.g., name and type) of a place contains more meaningful information related to people's social relationships and demographics.
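Returning briefly to the closeness levels of Section IV-C, Eqs. (1) to (3) can be sketched in Python as follows. This is an illustrative sketch, not the original implementation: the function names are hypothetical, the closeness matrix is computed entry by entry from the overlapping rate of Eq. (2), and an empty AP layer is assumed to give an overlapping rate of 0 (the text does not specify this case).

```python
from typing import Dict, List, Set, Tuple

Layers = Tuple[Set[str], Set[str], Set[str]]  # (significant, secondary, peripheral)


def layer_aps(appearance_counts: Dict[str, int], total_scans: int) -> Layers:
    """Split the APs of one staying segment into three layers by appearance rate."""
    significant, secondary, peripheral = set(), set(), set()
    for bssid, count in appearance_counts.items():
        rate = count / total_scans          # R = N_a / N
        if rate > 0.8:
            significant.add(bssid)
        elif rate < 0.2:
            peripheral.add(bssid)
        else:
            secondary.add(bssid)
    return significant, secondary, peripheral


def closeness_matrix(la: Layers, lb: Layers) -> List[List[float]]:
    """Overlapping rates r_ij between layer i of A and layer j of B (Eq. 2)."""
    m = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            denom = min(len(la[i]), len(lb[j]))
            if denom:                        # assumption: empty layer -> rate 0
                m[i][j] = len(la[i] & lb[j]) / denom
    return m


def closeness_level(m: List[List[float]]) -> int:
    """Map a closeness matrix to level 0..4 following Eq. (3)."""
    r11, r33 = m[0][0], m[2][2]
    # Sum of all entries except r11 and r33.
    other = sum(m[i][j] for i in range(3) for j in range(3)
                if (i, j) not in ((0, 0), (2, 2)))
    if r11 == 0 and r33 == 0 and other == 0:
        return 0                             # completely separated
    if r11 >= 0.6:
        return 4                             # same room
    if r11 > 0:
        return 3                             # adjacent rooms
    if other > 0:
        return 2                             # same building
    return 1                                 # only peripheral APs overlap
```

Under this reading, grouping a user's revisits to the same place amounts to merging staying segments whose pairwise `closeness_level` is 4.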
To obtain such information, we exploit the simple signal information of surrounding APs (i.e., BSSIDs and RSSs) that is readily available in most mobile devices, to determine the daily place meanings of staying segments based on people’s daily routines. 1) Daily Routine-based Places: Recent reports [25], [26] indicate that people’s daily routines mainly consist of three categories of activities: 1) working and work-related activities (working activities); 2) sleeping and household activities (home activities); and 3) leisure activities. Based on the understanding of people’s daily routines, we define three categories of daily routine-based places, namely Workplace (e.g., office buildings and universities), Home, and Leisure Place (e.g., stores, restaurants, and churches), to describe contextual
information of the places. Different from categorizing daily places based on their generic nature [27], our daily routine-based categorization of daily places reflects the meaning of a place to a person instead of its function. This meaning may vary from person to person, so it better describes the context of a place for every individual. For example, the same restaurant could be a workplace for waiters and waitresses, but it is a leisure place for customers. This advantage enables inferring fine-grained social relationships and demographics.
Fig. 5. Distribution of activeness score computed from each AP during staying segments when people are shopping or dining. (Axes: activeness vs. percentage; series: shopping, dining and other activities.)
2) Staying Segment Categorization based on Daily Routines: Next, we determine the contextual information of a place (i.e., staying segment) by categorizing it into one of the three defined daily routine-based places. The basic idea is to compare the time spans of the staying segments in a day with the common time spans of the daily routines of working and home activities, respectively. Whichever staying segment has the longest overlapped time with the daily routine of working or home activities is labeled as containing the Workplace or Home. The rest of the staying segments are determined as containing Leisure Places. Since people may move between different rooms for work-related activities, after determining the Workplace, we further combine the staying segments that have at least level-1 closeness with the staying segments of the Workplace to represent the whole working area. The common time spans are chosen to correspond to the majority of people's daily routines from the reports [25], [26]: working activities, 8:00 AM to 4:00 PM; home activities, 7:00 PM to 6:00 AM; leisure activities, the remaining free hours of a day.
3) Fine-grained Place Context Inference: Our system is designed to derive more fine-grained place contexts (e.g.
restaurants or stores in the Leisure Places and universities or office buildings in the Workplace) by leveraging geo-information, activity features of the places and the SSID context of the user-associated AP. We find that the APs' BSSIDs (MAC addresses) in a staying segment yield fine-grained place contexts through certain web-based services (e.g., Google Maps Geolocation API [28], Google Places API [29] and Unwired Labs Location API [30]). However, the place contexts obtained from the geo-information are sometimes not unique, especially in a crowded business area. Therefore, to refine the place contexts from the geo-information, we further examine the activity features in the staying segment based on decision rules derived from people's general time use pattern [31] and basic knowledge of activeness at various place contexts. Moreover, if the user is associated with an AP, the semantic meaning of the AP SSID can be utilized as assistance, if available, to identify detailed contexts (e.g., company names) of the place.
Fig. 6. Illustration of social relationships classification derived from temporal and spatial closeness based on one day's data. (a) Spatial closeness difference (neighbor vs. family relationship). (b) Temporal closeness difference (team member vs. collaborator relationship). (Axes: time in a day (hour) vs. physical closeness.)
B. Activity Feature Extraction
We determine three activity features (i.e., activeness, visiting time slots and staying duration) that can capture the users' mobility and the differences between activities at the daily routine-based places. Activeness (i.e., active or static) describes the person's status at a place, e.g., shopping in a store is active while dining in a restaurant is static.
Visiting time slots, i.e., the person's one or more entrance/departure times at a daily routine-based place, capture the person's specific pattern of visiting the place; e.g., faculty members may leave the office several times in one day for teaching, conferences, lunch, etc. Staying duration captures the temporal nature of the activities, such as buying coffee for 10 minutes or getting a haircut for one hour. We note that all the activity features except the activeness can be easily obtained by examining the temporal information of the staying segments. Therefore, we discuss in detail how to derive the activeness for each staying segment.

Activeness Estimation. We devise an activeness estimation approach that determines the activeness of the user at a place by utilizing only the RSS of the APs observed in the staying segment (this is the only place where we use RSS in this paper). The intuition is that changes in the user's position within a place change the distance to every surrounding AP and thus make the RSS from each AP unstable. From the time series of RSS in a staying segment, we derive a time series of RSS stability for the $i$-th AP, denoted as $\Lambda_i = \{\lambda_1, \ldots, \lambda_j, \ldots, \lambda_t\}$, where $\lambda_j$ is the standard deviation of RSS calculated over a sliding time window $W$. We then derive the activeness score of the staying segment with respect to the $i$-th AP as

\[
\psi_i = \frac{\sum_{j=1}^{t-w+1} v_j}{t-w+1}, \qquad
v_j = \begin{cases} 1, & \lambda_j > \lambda_{th} \\ 0, & \text{otherwise,} \end{cases}
\tag{4}
\]

where $\lambda_{th}$ is a threshold on the standard deviation of RSS and $w$ is the length of the sliding window $W$. To ensure robustness, we only consider significant APs (appearance rate of at least 80%) in each staying segment when deriving the activeness score, because significant APs capture the person's activeness over the entire staying segment. Thus, the activeness score is the ratio of the active period to the entire duration at the place.
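As a minimal sketch of Equation (4), the following Python code computes the activeness score of one AP from its RSS time series and keeps only the significant APs. Several details are assumptions made for illustration rather than taken from the text: the window length is counted in scans, the population standard deviation is used for $\lambda_j$, each RSS list holds only the scans in which the AP was observed, and the function names and the numeric values in the usage note are hypothetical.

```python
from statistics import pstdev


def activeness_score(rss, window, std_threshold):
    """Eq. (4): share of sliding windows whose RSS standard deviation
    (lambda_j) exceeds the threshold lambda_th."""
    n_windows = len(rss) - window + 1          # t - w + 1
    if n_windows <= 0:
        raise ValueError("RSS series is shorter than the window")
    active = sum(
        1 for j in range(n_windows)
        if pstdev(rss[j:j + window]) > std_threshold   # v_j = 1
    )
    return active / n_windows


def significant_ap_scores(rss_by_ap, n_scans, window, std_threshold,
                          min_appearance=0.8):
    """Activeness score of every AP whose appearance rate is at least 80%.

    rss_by_ap maps a BSSID to the RSS readings of the scans in which that
    AP was observed (assumed input format); n_scans is the number of scans
    in the staying segment.
    """
    scores = {}
    for bssid, rss in rss_by_ap.items():
        if len(rss) / n_scans >= min_appearance:
            scores[bssid] = activeness_score(rss, window, std_threshold)
    return scores
```

For instance, with a window of 4 scans and a hypothetical threshold of 2 dB, a series that is flat for four scans and then fluctuates by 20 dB has four active windows out of five, so `activeness_score([-60, -60, -60, -60, -50, -70, -50, -70], 4, 2.0)` evaluates to 0.8.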
As an illustration, Figure 5 shows the distribution of the activeness scores of all significant APs in the staying segments when a user is dining at a restaurant (i.e., sitting statically) or shopping in a store (i.e., walking actively), respectively. We observe that more APs have low activeness scores (less than 0.2) during dining than during shopping, indicating that the activeness score can well differentiate
people's static and active status. We empirically set a threshold on the activeness score of each significant AP and then determine the activeness (i.e., active or static) of a staying segment by a majority vote over all significant APs.

Fig. 7. Decision tree of closeness-based social relationships classification. (Layers: classification based on interaction segment duration, short-period or long-period; classification based on the type of daily routine-based place pair; classification based on the duration of level-4 closeness.)

VI. SOCIAL RELATIONSHIPS AND DEMOGRAPHICS INFERENCE
In this section, we present how our system utilizes the activity features provided by staying segments to derive the user's fine-grained social relationships and demographics.

A. Closeness-based Social Relationships Derivation
A social relationship is about how two people interact with each other in their daily lives, including both face-to-face interaction and even hidden interaction without encountering each other. Therefore, to infer social relationships, we need to understand not only a person's activities at a place, but also how the person interacts with other people at different places. Towards this end, we define the interaction segment based on the staying segments of two people to capture the temporal and spatial characteristics of their interactions. The basic idea is that we first extract and characterize the interaction segments between a target user and other people based on their staying segments and the corresponding activity features. Then we utilize the temporal and spatial patterns of the closeness of the interaction segments, as well as the individual daily place contexts, to derive fine-grained relationships.

1) Interaction Segment Characterization: We generate interaction segments based on the staying segments of two people in the same day. Specifically, we first find the temporally overlapped segments between the daily staying segments of the two people. Then we estimate the physical closeness of every pair of overlapped segments by using Equation 1.
Only long overlapped segments (i.e., with a duration longer than 10 min) with at least level-1 closeness are considered valid interaction segments. Each such segment is described by three characteristics: 1) interaction time slot, 2) daily routine-based place pair, based on the two users' same or different personal daily place contexts at the interaction place (e.g., Home-Home or Work-Leisure), and 3) physical closeness, which correspond to when, where and how closely the two people interact, respectively. Finally, the characterized interaction segments represent the users' interactions at the place.

2) Closeness-based Social Relationships Classification: After determining the interaction segments, we classify the user's social relationships by leveraging the temporal and spatial patterns of the physical closeness in the interaction segments. Our approach is based on the intuition that different types of social relationships show different temporal patterns for various levels of physical closeness in the overlapped daily routine-based place, which reveal different degrees of interaction between two people. Figure 6 illustrates this intuition by comparing the interaction segment characteristics for two pairs of social relationships (i.e., neighbor vs. family, and team member vs. collaborator), which can be differentiated by the degree of spatial closeness or by the difference in temporal pattern.

We design a three-layer decision tree for relationship classification based on the characteristics of the interaction segments between two people (i.e., the temporal and spatial patterns of their physical closeness). Figure 7 illustrates the flow of the decision tree. In the first layer, the decision tree takes the detected interaction segment of two people in one day as input and classifies it into one of two classes (i.e., short-period or long-period interaction segment) by examining the duration of the interaction time slot in the interaction segment.
The intuition behind this layer is that people usually spend most of their time at a few places (e.g., homes, offices, or schools) and less time at other places (e.g., diners, grocery stores, and post offices), and so do their interactions at these places. In the second layer, we refine the decisions of the first layer. In particular, we examine the daily routine-based place pair of the interaction segment to further classify the interaction based on the two people's individual daily place contexts. Because a short-period interaction should logically happen at a leisure place of at least one of the two people, the short-period interaction segment leads to three possible branches: workplace-leisure, home-leisure and leisure-leisure. The long-period interaction segment leads to the pairs workplace-workplace and home-home. In the last layer, we further refine the classification by analyzing the physical closeness of the interaction segment to infer fine-grained relationships. Specifically, we examine whether the level-4 closeness of the interaction segment is non-zero, which indicates whether or not the two people have face-to-face interaction at the place. The duration of the face-to-face interaction allows the decision tree to further distinguish the social interaction into 8 categories of fine-grained relationships (Customers, Relatives, Friends, Team members, Collaborators, Same-building Colleagues, Family and Neighbors) and to exclude strangers.

The decision tree infers the possible relationships between two people based on their one-day social interactions. But
making relationship inferences based on a one-day observation may sometimes be opportunistic. For instance, students in the same school may be regarded as strangers or classmates depending on whether a face-to-face interaction is detected in one day. To reduce such opportunistic inferences, we propose to infer the relationships over a relatively long time period (e.g., multiple days, one week or several weeks) and utilize a majority-vote approach to make the final decision.

Fig. 8. Histogram of people's working duration in a week. Panels: Financial Analyst, Researcher, Faculty, Student; x-axis: working hours (0 to 10); y-axis: percentage (0 to 1).

B. Behavior-based Demographics Inference
Next, we discuss how to utilize the activity features to further capture people's behavior characteristics at various daily places and infer people's demographics (e.g., occupation, gender, religion and marital status).

1) Behavior Derivation at Daily Routine-based Places: In this work, we define a behavior as the manner in which an individual acts at a daily routine-based place over a period of time (e.g., several days). A behavior usually consists of a series of activities, and thus can be described by the temporal and spatial statistics of the activity features extracted from the staying segments across different days. In particular, we define three kinds of behaviors based on the three daily routine-based place categories: 1) home behavior, 2) working behavior, and 3) leisure behavior. We utilize the activity features of the same daily routine-based place across multiple days to derive the features that characterize the three behaviors. We note that the leisure behavior can be further specified according to the fine-grained daily routine-based places in Section V-A3.
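Looking back at the closeness-based classification of Section VI-A, the three-layer decision tree and the multi-day majority vote can be sketched as follows. This is an illustration under stated assumptions, not the exact tree of Figure 7: the text specifies the three layers and the set of leaf categories, but the numeric thresholds (`LONG_PERIOD_MIN`, `TEAM_LEVEL4_MIN`) and the precise branch-to-leaf assignment used here are hypothetical readings of the description.

```python
from collections import Counter

# Hypothetical thresholds; the text gives no numeric values.
LONG_PERIOD_MIN = 180   # layer 1: minutes separating short from long period
TEAM_LEVEL4_MIN = 120   # layer 3: face-to-face minutes for team members


def classify_interaction(duration_min, place_pair, level4_min):
    """Classify one day's interaction segment.

    place_pair holds the two users' place contexts ("work", "home" or
    "leisure"); level4_min is the duration of level-4 closeness.
    Returns None for combinations the described tree does not cover.
    """
    pair = tuple(sorted(place_pair))
    face_to_face = level4_min > 0
    if duration_min < LONG_PERIOD_MIN:            # short-period branches
        short_leaves = {                          # assumed leaf assignment
            ("leisure", "work"): "Customers",
            ("home", "leisure"): "Relatives",
            ("leisure", "leisure"): "Friends",
        }
        if pair not in short_leaves:
            return None
        return short_leaves[pair] if face_to_face else "Strangers"
    if pair == ("work", "work"):                  # long-period branches
        if not face_to_face:
            return "Same-building Colleagues"
        if level4_min >= TEAM_LEVEL4_MIN:
            return "Team members"
        return "Collaborators"
    if pair == ("home", "home"):
        return "Family" if face_to_face else "Neighbors"
    return None


def majority_vote(daily_labels):
    """Final relationship from the per-day results of several days."""
    counts = Counter(label for label in daily_labels if label is not None)
    return counts.most_common(1)[0][0] if counts else None
```

Voting over several days is what smooths out a single atypical day: a pair labeled Collaborators on one day and Team members on two other days ends up as Team members. Ties are resolved here by first occurrence, which is also an assumption.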
2) Occupation Inference: Occupation is the job or profession of the user, which is related to the working behavior. The inference approach is based on the fact that people of different occupations have different working time slots and durations at the Workplace (which may include a single place or multiple nearby places), which reveals different working behaviors in the temporal and spatial domains. Figure 8 illustrates the intuition by showing the working duration histograms of 4 users with different occupations over a week. We find that office staff have the most concentrated working duration, followed by Researchers, Faculty and Students, because a company office follows a more regular timetable than a school. Meanwhile, Faculty need to leave the office for teaching and faculty meetings, which leads to a wider working duration distribution compared with Researchers. On the other hand, Students have the most scattered working durations because they have a different number of classes each day and flexible study hours at the library.

Fig. 9. Illustration of behavior-based occupation and gender inference results. (a) Working behavior-based occupation inference results (axes: WH Distribution Range, WH Distribution Kurtosis, Working Time STD; classes: Researcher, Professors, Students, Software Engineer, Financial Analyst). (b) Shopping and home behavior-based gender inference (axes: shopping hours, shopping frequency, hours at home; classes: Female, Male).

We derive three specific working behavior features to differentiate working behaviors over multiple days at the workplace. Working Hour (WH) Distribution Range describes the range of the working duration histogram, which shows the flexibility of working hours. Working Time STD is the average standard deviation of the start and end times of work across multiple days, and WH Distribution Kurtosis is a descriptor of the distribution shape, which represents how concentrated the working duration distribution is.
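The three working behavior features can be sketched in a few lines of Python. The input format (one start and end hour per working day), the function names, the use of population moments, and the choice of excess kurtosis (fourth standardized moment minus 3) are assumptions for illustration; the text does not fix these details.

```python
from statistics import mean, pstdev


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, using population moments."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m4 = mean([(v - m) ** 4 for v in values])
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3.0


def working_behavior_features(days):
    """days: list of (start_hour, end_hour) at the Workplace, one per day."""
    starts = [s for s, _ in days]
    ends = [e for _, e in days]
    durations = [e - s for s, e in days]
    return {
        # WH Distribution Range: spread of the daily working durations
        "wh_range": max(durations) - min(durations),
        # Working Time STD: mean of the start-time and end-time deviations
        "working_time_std": (pstdev(starts) + pstdev(ends)) / 2,
        # WH Distribution Kurtosis: how concentrated the durations are
        "wh_kurtosis": excess_kurtosis(durations),
    }
```

For four days with working times 9 to 17, 9 to 17, 10 to 16 and 8 to 18, the durations are 8, 8, 6 and 10 hours, which gives a range of 4 hours and an excess kurtosis of -1.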
Figure 9(a) illustrates that the three working behavior features can well separate different types of occupations, which suggests that we can utilize a threshold-based approach to determine people's occupations from these features. We note that different occupations, such as financial analyst and software engineer, may have similar working behaviors; in such cases we can further narrow the choices for the occupation inference by leveraging the supplementary place contexts from Geo-information and user-associated AP SSIDs, as in Section V-A3.

3) Gender Inference: The user's gender is more implicit than occupation, because no information from surrounding APs directly links to this biological characteristic. However, we find that males and females usually behave differently in some specific scenarios. For example, females tend to spend more time on housework and in-store shopping, while males tend to work longer hours [32]. According to the survey, this behavioral difference reflects the trend of the majority of people and exists in many countries. Thus our basic idea is to examine a person's behavior characteristics at home and in shops. From the activity features, we derive three behavior features for gender inference: shopping duration, shopping frequency and home duration, which mainly capture the behavior patterns at home and the leisure behavior at shops. Figure 9(b) illustrates that the three devised behavior features can well capture the differences between males and females in their behaviors at home and in shops. Additionally, we check the user's associated AP SSIDs at leisure places, if any, to look for particular leisure places that can differentiate gender, such as nail spas and beauty salons.

4) Religion Inference: We further demonstrate that it is possible to infer people's religion status (i.e., Christian or non-Christian) from surrounding APs.
The intuition is that a Christian usually goes to church every Sunday and thus shows a regular pattern of leisure behavior around the church. Therefore, we extract three religion behavior features, namely church attendance days, church attendance duration and church attendance frequency, and apply a threshold-based method to decide whether the user is Christian. We note that, by including more religious activities, we can also cover other religions or religious sects.
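A threshold-based decision over the three religion behavior features might look like the sketch below. The text names the features but not their exact definitions or thresholds, so everything concrete here is an assumption: attendance days are interpreted as the share of attendance days that fall on a Sunday, duration as the mean visit length, frequency as attendance days per observed week, and all threshold values and function names are hypothetical.

```python
from datetime import date
from statistics import mean


def church_features(visits, observed_days):
    """visits: list of (date, duration_hours) at a place whose inferred
    context is a church; observed_days: length of the observation period."""
    attended = {d for d, _ in visits}
    if not attended:
        return {"sunday_share": 0.0, "mean_duration_h": 0.0,
                "visits_per_week": 0.0}
    return {
        # date.weekday() returns 6 for Sunday
        "sunday_share": sum(d.weekday() == 6 for d in attended) / len(attended),
        "mean_duration_h": mean(h for _, h in visits),
        "visits_per_week": len(attended) / (observed_days / 7),
    }


def looks_like_regular_churchgoer(features, min_sunday_share=0.75,
                                  min_duration_h=0.5,
                                  min_visits_per_week=0.75):
    """All three features must reach their (hypothetical) thresholds."""
    return (features["sunday_share"] >= min_sunday_share
            and features["mean_duration_h"] >= min_duration_h
            and features["visits_per_week"] >= min_visits_per_week)
```

Requiring all three thresholds at once is one possible design; it keeps a single long visit on a weekday, or brief passes by the building, from being counted as regular attendance.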
Fig. 10. Social relationships comparison between inference results and the groundtruth. (a) Social relationships inference. (b) Social relationships groundtruth. Legend in both panels: Family, Neighbor, Team member, Collaborator, Colleagues in the same building, Friend, Person.

5) Relationships and Demographics Refinement: We find that the inferred relationships and demographics can be mutually complementary. We therefore adopt several rules for relationship and demographics refinement. For example, a family relationship between a male and a female is refined as a couple (married) relationship; a collaborator relationship between a faculty member and a student (or between a company supervisor and a software engineer) is refined as an advisor-student (or supervisor-employee) relationship.

VII. PERFORMANCE EVALUATION
A. Experiment Methodology
1) Data Collection: Due to limited manpower, we choose representative occupations, working hours and age groups for the experiments to evaluate the feasibility of our approach. We recruit 21 volunteers (6 females and 15 males) across three cities to collect surrounding AP information in their daily lives for over 6 months. The volunteers are aged from 20 to 40 and are mainly from six occupations: financial analyst, Ph.D. candidate, Master's student, undergraduate, assistant professor, and software engineer. We ask the volunteers to install a data collection tool on their own phones and run it in the background throughout every day during the experiments. The volunteers are asked to fill in a questionnaire to provide the groundtruth. The study was approved by the IRB.

2) Hardware and Software: We include a variety of Android mobile devices in the experiments, including devices from Samsung, Huawei, LG and Xiaomi.
We develop a tool on the Android platform to collect information about surrounding APs at a given frequency, i.e., 4 scans/min, which is the AP scanning frequency of many Android systems [23]. For each scan, our tool collects simple information about the surrounding APs: BSSID, SSID, scan timestamp and RSS.

3) Evaluation Metrics: We use the following two metrics to evaluate the performance of our inference:

Detection Rate. The ratio of correctly identified results to the total number of results in the groundtruth.

Inference Accuracy. The ratio of correct inference results to the total number of inference results.

B. Evaluation of Social Relationships Inference

We first examine the performance of social relationships inference from surrounding Wi-Fi APs. Figure 10 compares the inferred social relationships among the 21 volunteers (Figure 10(a)) with the groundtruth from the questionnaire (Figure 10(b)), both drawn as relationship graphs. Each point in the graph represents a volunteer, and the different types of lines between points represent the different relationships between two volunteers.

[Fig. 11. Social relationships inference results based on different length of observation time. x-axis: observation time (day); y-axis: number of relationships; series: Family, Neighbor, Team member, Collaborators, Colleagues, Relatives, Customers, Friends.]

Compared to the groundtruth, the overall detection rate of social relationships inference is 91%, suggesting that our system can effectively detect various relationships from surrounding AP information. In addition, our system also detects hidden relationships, i.e., potential relationships that are recognizable by our system but unknown to the two volunteers due to the lack of face-to-face interactions. We find that certain relationships (e.g., colleagues and neighbors) may contain such hidden relationships. Table I shows the detailed statistics of our social relationships inference results.
We observe that we achieve a 100% detection rate for Relatives, Family and Neighbor, and detection rates of 83.3%, 94.1%, 89.5% and 87.5% for Friends, Team members, Collaborators and Colleagues, respectively, indicating that our method can accurately detect different relationships based on interaction features characterized from surrounding APs. Among the misclassified relationships, one team-member relationship is classified as collaborators due to irregular working time, and two collaborator relationships are classified as colleagues in the same building due to low interaction frequency. The overall inference accuracy is 95.8% when we compare the detected relationships with the groundtruth. We further detect 10 hidden relationships (9 colleagues and 1 neighbor); these relationships are not known to the volunteers but can be derived from their questionnaires, indicating that our system can accurately detect most relationships in daily life.

Figure 11 shows the relationship inference results under different lengths of observation time. We observe that most regular relationships (i.e., family, neighbor, team member) can be detected on the first day. Other relationships involve interactions that do not occur every day, so more days of observation are needed to make a decision. The relationship inference results become stable after 5 to 7 days, indicating that our system can detect most relationships in people's daily lives based on their social interactions within one week.

TABLE I
SOCIAL RELATIONSHIPS INFERENCE

Relationships   Groundtruth   Inference   Correct   Hidden
Relatives       2             2           2         0
Friends         6             5           5         0
Team members    17            16          16        0
Collaborators   19            18          17        0
Colleagues      24            23          21        9
Family          6             6           6         0
Neighbor        1             1           1         1
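As a worked check of the two metrics defined in Section VII-A, the following Python sketch recomputes the per-relationship detection rates and the overall figures from the counts in Table I. The variable and function names are illustrative and are not part of the original system; the sketch assumes that detection rate is Correct / Groundtruth and inference accuracy is Correct / Inference, following the metric definitions above.

```python
# Counts copied from Table I: (groundtruth, inferred, correct).
# All names below are illustrative; they are not from the original system.
TABLE_I = {
    "Relatives": (2, 2, 2),
    "Friends": (6, 5, 5),
    "Team members": (17, 16, 16),
    "Collaborators": (19, 18, 17),
    "Colleagues": (24, 23, 21),
    "Family": (6, 6, 6),
    "Neighbor": (1, 1, 1),
}


def detection_rate(groundtruth, correct):
    """Correctly identified results over the total number in the groundtruth."""
    return correct / groundtruth


def inference_accuracy(inferred, correct):
    """Correct inference results over the total number of inference results."""
    return correct / inferred


# Per-relationship detection rate.
per_type = {name: detection_rate(g, c) for name, (g, _, c) in TABLE_I.items()}

# Overall figures are computed from the column totals.
total_groundtruth = sum(g for g, _, _ in TABLE_I.values())
total_inferred = sum(i for _, i, _ in TABLE_I.values())
total_correct = sum(c for _, _, c in TABLE_I.values())

overall_detection = detection_rate(total_groundtruth, total_correct)
overall_accuracy = inference_accuracy(total_inferred, total_correct)

for name, rate in per_type.items():
    print(f"{name:14s} detection rate = {rate:.1%}")
print(f"Overall detection rate     = {overall_detection:.1%}")
print(f"Overall inference accuracy = {overall_accuracy:.1%}")
```

Under these assumptions the totals are 68 correct relationships out of 75 in the groundtruth and 71 inferred, which gives roughly 90.7% and 95.8%, in line with the 91% detection rate and 95.8% inference accuracy reported above.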
[Fig. 12. Accuracy of behavior-based demographics inference. (a) Demographics inference results (x-axis: demographic types Occupation, Gender, Marriage, Religion; y-axis: accuracy). (b) Demographics inference with different observation time (x-axis: observation time in days, 1 to 8; y-axis: accuracy; series: Gender, Occupation).]

C. Evaluation of Demographics Inference

1) Accuracy of Demographics Inference: Figure 12(a) shows the overall accuracy of inferring demographics. Our system achieves over 90.5% accuracy for Occupation, Religion and Marriage, and 95.2% accuracy for gender inference across the 21 volunteers, suggesting that it is possible to accurately infer people's demographics from surrounding AP information. We further study the performance of gender and occupation inference under different lengths of observation time, as shown in Figure 12(b). The inference results converge after 5 days, suggesting that behavior features derived over a short period (i.e., one week) can accurately infer demographics.

2) Fine-grained Social Relationships Derived from Demographics: By leveraging the derived demographics information, we further obtain refined relationships. Based on the gender information, we successfully detect both couples among the 21 volunteers. In addition, from the occupation inference, we specify the roles within collaborator relationships, e.g., who is the superior and who is the subordinate. Specifically, we correctly differentiate 4 superior-subordinate pairs from 5 collaborator pairs. These results show that it is possible to accurately infer fine-grained social relationships and demographics from surrounding AP information.

D. Performance of Daily Place Extraction

We randomly select 100 staying segments to examine whether our different levels of physical closeness reflect the true relations between their physical locations.
Figure 13(a) presents the confusion matrix of the inferred closeness levels; the results show that our system achieves over 88% accuracy for most levels of closeness, except for C1, whose inference relies on remote APs or unstable signals. We note that the lowest level C1 does not affect the social relationships and demographics inference, as both mainly rely on C4 and C3.

Finally, we evaluate the accuracy of the contextual meaning inference with 594 detected places. Figure 13(b) shows that we achieve over 90% accuracy for Workplace and Home and over 80% accuracy for detailed Leisure places (e.g., Shop, Diner, Church and Other). The results demonstrate that it is possible to measure the physical closeness between places and infer the complex contextual meaning of daily places from the user's surrounding APs alone.

[Fig. 13. Classification accuracy of physical closeness and daily routine-based places. (a) Classification confusion matrix of 4 kinds of physical closeness. (b) Classification of detailed daily routine-based places (x-axis: Work, Home, Shop, Diner, Church, Other; y-axis: accuracy).]

Confusion matrix values from Fig. 13(a):

Actual \ Inferred   C0     C1     C2     C3     C4
C0                  1.00   0.00   0.00   0.00   0.00
C1                  0.24   0.48   0.28   0.00   0.00
C2                  0.02   0.00   0.88   0.10   0.00
C3                  0.00   0.00   0.12   0.88   0.00
C4                  0.00   0.00   0.00   0.00   1.00

VIII. DISCUSSION

Due to limited manpower and the shortage of publicly available data sources (i.e., datasets containing scanned AP signal information in large-scale areas), we evaluate our system by recruiting 21 volunteers with representative occupations and social relationship types. Furthermore, the study is based on the users' daily life activities across three cities without being restricted to a confined area.
Since the participants' activities at daily places serve as the inference basis in this work, we believe our system is capable of inferring fine-grained social relationships and demographics in larger areas when given the opportunity. We demonstrate that the privacy leakage from the simple signal information of surrounding APs is significant and deserves public attention. In future work, we will continue to enlarge the Wi-Fi AP dataset and investigate further potential privacy leakage from the simple radio signals surrounding our daily lives.

IX. CONCLUSION

In this paper, we show that by analyzing the information from surrounding Wi-Fi Access Points (APs), users' fine-grained social relationships and demographics can be disclosed. We present a scalable inference system that has the potential to derive people's activities at daily visited places from surrounding APs and to use this information to infer fine-grained social relationships and demographics. The implemented system uses only simple signal features of surrounding APs, such as MAC addresses and Received Signal Strength, without needing to obtain context information by sniffing Wi-Fi traffic. In particular, we describe people's daily places in three dimensions (i.e., time, space and context) to infer people's activities and extract their activity features as well as their physical closeness at the same places. Our Closeness-based Social Relationships Inference algorithm further analyzes people's physical closeness to capture when, where and how closely people interact, revealing fine-grained social relationships, while the Behavior-based Demographics Inference method extracts people's individual behaviors from their activity features to infer demographics.
Using the data collected by 21 participants in their daily lives over 6 months, our system confirms that surrounding APs can be used to infer people's social relationships and demographics with over 90% accuracy.

ACKNOWLEDGMENT

This work is supported in part by the NSF grants CNS1514436 and CNS1409767, the NSF of China grant 61472185 and the JiangSu Natural Science Foundation grant BK20151390.