digital traces: modeling urban mobility using wifi probe...

Digital Traces: Modeling Urban Mobility using Wifi Probe DataMartin Traunmueller

Center for Urban Science & ProgressNew York University

1 MetroTech Center, 19th FloorBrooklyn, NY 11201

[email protected]

Nicholas JohnsonCenter for Urban Science & Progress

New York University1 MetroTech Center, 19th Floor

Brooklyn, NY [email protected]

Awais MalikCenter for Urban Science & Progress

Department of Civil & Urban EngineeringNew York University

1 MetroTech Center, 19th FloorBrooklyn, NY [email protected]

Constantine E. KontokostaCenter for Urban Science & Progress

Department of Civil & Urban EngineeringNew York University

1 MetroTech Center, 19th FloorBrooklyn, NY 11201

[email protected]

ABSTRACTCity governments all over the world face challenges understandingmobility pa�erns within dense urban environments at high spatialand temporal resolution. While such measures are important toprovide insights into the functional pa�erns of a city, novel quanti-tative methods, derived from ubiquitous mobile connectivity, areneeded, o�ering decision makers be�er insights to improve urbanmanagement and planning decisions.

In this paper, we propose a model that uses Wi� probe requestdata to model urban mobility in a dense, mixed–use district in NewYork City. We collect probe request data over 29 access points of apublic WiFi network in Lower Manha�an for one day, accountingfor more than 1million observations and over 60,000 unique devices.First, we aggregate unique entries per access point and per hour,showing that our method has the potential to collect the sametype of data as other common methods, by detecting di�erencesin usage activity pa�erns by time of day. We then use a spatialnetwork analysis to identify edge frequencies of journeys betweenthe network nodes, and apply the results to the road and pedestriansidewalk network to identify usage and trajectories at the streetsegment level.

CCS CONCEPTS•Applied computing →Transportation;

KEYWORDSModeling Urban Mobility, Wi� Probe Data, Spatial Network Analy-sis, Big Data

Copyright is held by the author/owner(s). UrbComp’17, August 14,2017, Halifax, Nova Scotia, Canada.

1 INTRODUCTIONWith an annual growth of 60 million new city dwellers everyyear [19], the world is experiencing a rapid population shi� of peo-ple moving from rural areas into urban environments over the lastseveral decades. Driven by technological innovations and increas-ing economic opportunities [5], this situation has led to a steadyincrease in motorized and pedestrian mobility in cities all over theworld [12]. For city governments, this increased demand has leadto challenges in managing city services and infrastructure, and inmaintaining quality–of–life standards for its population, as con-gestion and overcrowding of areas can negatively a�ect the city’seconomy [16], sustainability [22] and its population’s health [8].

To address these challenges, city managers need to understandpa�erns of urban mobility to enable targeted and “smart” interven-tions to limit overcrowding, improve service delivery, and ensuree�ective emergency response. In many cases, methods to mea-sure mobility dynamics focus on reporting tra�c counts at speci�cpoints in the city at discrete times, typically using rather simpletechnology, such as pressure hoses or Piezo–electric sensors [15].Installed at strategic locations of the city, such as at intersections orheavily–used roads, these “counting–gates” are limited in terms ofscalability and real-time feedback, and can be cost–intensive whenapplied to large areas.

With the rise of remote and in–situ sensing technologies, theanalysis of closed–circuit–television (CCTV) footage o�ers a morequantitative approach for computer scientists and urbanists to countnot only motor, but also pedestrian tra�c on a large scale by usingcomputer vision algorithms [15]. As CCTV cameras have become acommon way to observe people and tra�c in cities, these methodso�er a more scalable and less labor–intensive method to counttra�c, being limited only by the number of camera locations.

However, these methods are limited to providing counts at spe-ci�c, �xed locations for a speci�c time period. In doing so, they donot provide data about the routes and trajectories of pedestrians,constraining the potential to understand actual street and sidewalkusage over space and time. Current work in data mining aims to �llthis gap by using mobile phone data to model urban mobility [2, 9].As promising results are in terms of accuracy, they show limitations

in terms of population representation by capturing only mobileusers of a speci�c network provider and typically with low spatialgranularity.

Data that are independent of speci�c network providers are ableto capture a larger sample of the population at any given placeand time. One example of this is smart device probe requests toWi� access points (APs) in public urban space. With an increasingnumber of public Wi� APs and networks in cities, these networkscan provide dense coverage across the cityscape, particularly atthe neighborhood or district scale. Each AP continuously “senses”its surroundings in terms of potential users equipped with a Wi�-enabled mobile device, which are sending probe requests to avail-able networks and proximate APs at a regular frequency.

As mobile devices are con�gured by default to scan the envi-ronment for Wi� APs automatically to minimize network charges,they regularly transmit packets of data to nearby APs. With anincreasing penetration of Wi� connectible mobile devices, such astablets or smartphones (64% of all U.S. citizens owned a smartphonein 2016 and approximately 80% in New York City )1, we hypothesizethat Wi� probe data can be used to model urban mobility and itstraces on a large scale.

In this paper, we use a dataset of Wi� probe requests collectedby 29 Wi� APs over the duration of one day in an area of LowerManha�an in New York City, NY. First, we show how Wi� probedata can be used to report counts at each AP, similar to “countinggates” methods as described above, and used to understand localizedactivities. Second, we conduct network analysis to describe a spatialnetwork that can be applied to street and sidewalk segments. Wedemonstrate how these data can be used to analyze common pathsof travel and trajectories, indicating the intensity street activityover time.

We begin by presenting current literature on measuring urbanmobility, and then present our data and data processing steps. Wethen introduce our methodology and present results for urban mo-bility counts and traces. We conclude with an in–depth discussionof the �ndings, including limitations and applications to city man-agement and planning.

2 LITERATURE�e most commonly used method to capture urban mobility bycity agencies is the installment of “counting–gates” at pre–de�nedplaces, such as intersections or heavily–used main roads. Whiletechnology has improved over the years, the method has remainedrelatively the same by using, for instance, pneumatic road tubes,Piezo–electric sensors or infrared sensors [15] to count primarilymotor tra�c.

While these methods o�er an easy way to quantify the number ofcars on a street, they are rather expensive to run due to installationand service charges, compared to the output they provide. O�eringtra�c aggregations for speci�c geographic locations within a spe-ci�c time frame, such as per hour or per day, also create limitationsin terms of temporal and geographical scalability.

More current work uses advances in computer vision to analyzeclosed–circuit–television (CCTV) feeds to count motorized andpedestrian tra�c at lower costs. In doing so, researchers and city

1h�ps://www.statista.com/statistics/201183/forecast-of-smartphone-penetration-in-the-us/

governments are now able to count tra�c at places with CCTV–coverage, such as on intersections for instance, by applying com-puter vision algorithms, such as blob–detection2. Focusing primar-ily on motor tra�c, this approach has been extended over the yearsto also count pedestrians3.

�e analysis of CCTV footage o�ers e�ective ways to aggregatetra�c quantitatively and is only limited by the number of CCTV–camera locations in a city. As the usage of CCTV cameras in theurban environment is growing due to congestion and security con-cerns, the method becomes increasingly applicable to count tra�con a large scale. However, in focusing on tra�c counts, it does noto�er any insight into the routes people take between their locations,and hence do not provide information about usage frequency onroad level.

�e increasing availability of open data has o�ered researchersnovel opportunities to study tra�c routes, in particular for publictransport, on a large scale following a data mining approach. In do-ing so, metro journeys4, the use of public bike sharing schemes [20]or GPS traces of taxis [6] for instance have been visualized and thetime–dependent frequencies of routes through cities detected.

While the results of such studies can contribute to the e�ciencyof public transport systems, these open data sources do not includeinformation about the population who do not use public transport.As many people in U.S. cities travel by car or increasingly walk [13],using these data sources excludes a large portion of the urbanpopulation and is therefore not representative of the entire urbanpopulation.

A data source that includes these populations are call detailrecords (CDR). With the increased use of mobile phones over thelast decade, CDR data from mobile phone providers have becomea popular source for urban mobility research. For instance, Yuanand Raubal [21] extracted dynamic mobility pa�erns in urban areasusing a ‘Dynamic Time Wrapping’ algorithm, and were able toclassify areas according to the pa�erns.

Calabrese et al. [2] combined mobile phone traces and odometerreadings from annual vehicle safety inspections to map mobilityas averaged individual total trip lengths for the case of Boston. Indoing so, researchers found for instance, that the 2 most importantfactors for regional variations in mobility are accessibility to workand non–work destinations, while population density and mix ofland–use showed less signi�cance.

Other work uses CDR data to model urban �ow instead. Gonzalezet al. [7], for instance, studied 100.000mobile phone user trajectoriesover 6 months and found, that human trajectories show a highdegree of temporal and spatial regularity. Furthermore, �ndingssuggest that humans follow rather simple reproducible mobilitypa�erns.

�ese works shows the opportunities of using CDR data to re-search urban mobility on large scale. However, at the same timetelecommunication data is highly con�dential and o�en di�cultto access for researchers. One possible way to gain access to suchdata is to take part in a data mining challenges, such as the Data

2h�p://www.tra�cvision.com3h�p://www.placemeter.com4h�p://wgallia.com/#!underground

2

https://www.statista.com/statistics/201183/forecast-of-smartphone-penetration-in-the-us/

http://www.trafficvision.com

http://www.placemeter.com

http://wgallia.com/#!underground

for Development challenge5 and the Big Data Challenge6, whereproviders make parts of their data publicly available. However, asavailable data is being pre–processed, its accuracy o�en su�ersdue to unknown data processing steps [17]. Furthermore, as far asdata is provided by various mobile phone providers, it o�ers a lowrepresentation of urban population by keeping out various groupsof people that use other providers, or people that do not have a cellphone contract, such as the elder and poor, or use Pay–As–You–Gooptions.

A data set that includes those groups is Wi� probe data by beingprovider independent. By default, a mobile device is con�gured tosteadily scan its environment for available Wi� APs and hence, con-tinues to transmit data, including time, geolocation and the deviceMAC address. Such data has been used by researchers to predictuser destinations and routes through a probabilistic approach thatuses visited destinations according to Wi� traces [4]. More recentwork [14] uses Wi� data that collected through a large scale study,to capture human mobility for the case of Copenhagen. Based ona previous scienti�c study, the data contains a high level of infor-mation about the users and uses data of logged–in users only. Inreality, Wi� probe data that has been collected ‘in the wild’ is lessdetailed and the number of people logging–in to Wi� networks islimited. �ese circumstances lead to questions about the suitabilityof this approach for quantitative mobility modeling.

In a recent study,7 London’s public transport provider ‘Trans-port for London’ (TfL) uses Wi� probe data collected from APs ofsubway station within its network to model passenger �ow of itscustomers. While TfL’s ‘Oyster’ travel card data allowed the detec-tion of traveler journey’s origin and destination, the data sourcedid not provide information about the route the passenger tookin between them, as for instance, where passengers change trains.Adding Wi� probe data to the analysis shed light into this limita-tion, as researchers were now able to detect the di�erent routes andtrain changes people took to reach their destination. Furthermore,by using Wi� probe data from its network, TfL was also able totrack people’s movement in stations, so to detect busy platformsand dispatch sta� accordingly.

�is study shows the potential of Wi� probe data, collected bypublic transport’s network provider, to model mobility within asubway network and within its stations. In this paper we show howsimilar data, collected by public Wi� APs, can be used to modelurban mobility on large scale in a densely populated area of NewYork City instead.

3 METHOD AND RESULTS3.1 Dataset description�e method we are proposing requires access to a Wi� probe dataset, that contains information about detected client devices andtheir proximity to surrounding Wi� APs at a speci�c time. For thisstudy we use a dataset provided by the ‘Alliance for DowntownNew

5h�p://www.d4d.orange.com/en/Accueil6h�p://www.telecomitalia.com/tit/en/bigdatachallenge.html7h�ps://t�.gov.uk

Figure 1: Location and land–use map of Wi� access points(APs) on Manhattan’s Lower East Side / New York City, asused in this study. Wi� APs are marked as black dots; red ar-eas are commercial use, yellow are residence areas and pur-ple are industrial areas.

York’8, the management for Downtown–Lower Manha�an’s Busi-ness Improvement District, covering an area within Manha�an’sLower East Side in New York City (Figure 1).

With its mix of high density commercial (colored red) and resi-dential (colored yellow) buildings,9 Lower Manha�an representsa very dynamic and diverse environment, which makes it suitablefor our urban mobility testing, as di�erent groups of people areexpected to activate the area at di�erent times throughout the day.

Our dataset covered observations from 29 Wi� AP locationsthroughout the study area along Water and South Street, fromBroad Street in the South–West to Fulton Street in the North–East,and included APs on two of the East River piers, as shown inFigure 1. Data was collected on April, 18th 2016 between 4.00AM and 6.00 PM, and includes a total of 1,012,519 data points.Every data point includes the MAC address of the observing AP(apMac), the client device’s MAC address (clientMac), the device’sreceived signal strength indicator as seen by AP (rssi) and theobservation time (seenTime). Locations for each AP were storedin a separate shape�le by the provider, containing street name andnearest building number to the Wi� AP, which we visualized usingmapPLUTO for NYC.10.

8h�p://www.downtownny.com9h�p://maps.nyc.gov/doi�/nycitymap/10h�ps://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page

3

http://www.d4d.orange.com/en/Accueil

http://www.telecomitalia.com/tit/en/bigdatachallenge.html

https://tfl.gov.uk

http://www.downtownny.com

http://maps.nyc.gov/doitt/nycitymap/

https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page

https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page

Figure 2: Client distribution per AP and per hour between4AM – 6PM. Each colour indicates one AP. To provideanonymity, we removed journeys with less than three usersfrom the graph.

3.2 Data pre–processingData, such as our Wi� probe data set, is very sensitive throughthe provision of MAC addresses of clients. Combined with loggedWi� AP locations, such data can lead to privacy issues as it opensopportunities for tracking individual people. To ensure anonymity,we �rst de–identi�ed our dataset by replacing MAC addresses ofclients with an anonymous identi�er, consisting of a unique in-cremental integer starting from 1. In doing so, the data could notbe traced back to the individual and hence, would eliminate manyunderlying privacy concerns. (�e study received IRB approvalwith IRB No.: #IRB-FY2017-526.)

With the de–identi�ed dataset, we next checked for uniquenessof data points and possible missing input to ensure its complete-ness for our analysis. In doing so, we noticed that 74 data pointswere counted multiple times, and 428 data points were missinggeographical or temporal information. To provide completeness,we removed these data points from the data set.

We also observed that some of the client devices were capturedby multiple Wi� APs at the same time. As this would lead tospatial inaccuracy for our results, we removed multiple entries fromour dataset, ensuring to keep the entry for the client’s closest APonly. �e rssi information provided in the dataset provides a usefulindicator for this purpose: By keeping the entry with the highestrssi–value, we keep the closest Wi� AP location and excluded theothers from the dataset.

Furthermore, we examined the Wi� APs each client was con-nected to and observed that while most clients showed connectionsto various APs, indicating movement between the locations, someclients were only captured by a single AP during the entire studyperiod. As people might pause for a while on the go, we removed

data points that showed no movement for longer than 20 minutes,as suggested by [1]. �ese device connection pa�erns are indicativeof being stationary and hence would skew the mobility model byincreasing an AP’s use frequency. As a ma�er of fact, most of thedeleted clients were detected by the same AP throughout the wholeday. �e removal of detected faulty data – especially through thestationary devices – resulted in a total of 259,444 entries, collectedfrom 60,970 unique clients. Overall, we removed 74.37% of theobservations from our raw data.

A�er cleaning the data, we joined location and observation datasets by assigning AP addresses to each observation, according toAP ID number, and mapped their locations.

3.3 AggregationHaving cleaned the dataset from multiple entries and assigned APlocations, we were now able to build a model of urban mobility.As described in the literature section, the most common way forcity governments to model urban mobility is the installment of“counting–gates”, providing aggregated tra�c counts at speci�clocations for a de�ned temporal unit, such per hour or weekday.By scanning and detecting Wi�–enabled devices in hyper–localenvironments, we show now howWi� APs can be used in a similarway to count the number of pedestrians at a given time. �erefore,as a �rst step, we aggregate the number of entries per AP andper hour to detect di�erences, such as low and peak hours, infrequencies of counts over the day.

Looking at the frequencies of entries for each AP we observe along–tail distribution: while themost–frequented AP shows 137,087entries, the least–frequented AP showed 1601 entries; the overallmean frequency among observed APs was 31,320 (1. �artile =13,520, 3. �artile = 42,810).

In Figure 2 we visualize the entry distribution per hour for ob-served time frame, with the most–frequented AP reaching up toover 17,000 entries in the hour between 3PM-4PM, re�ecting theabove outcome.

While we observe di�erences in frequencies between Wi� APs,as, for instance, some APs are used more frequently than others,we can see a similar temporal pa�ern of frequency rise towardsthe a�ernoon across all APs in the network. Usage is generally lowbefore 6AM, as expected, and then increases a�er 9AM, reaching a�rst peak at noon (12PM-1PM) and the main peak between 3PMand 4PM.

�ese results are rather surprising as we would have expected a�rst peak in Wi� usage in the morning rush–hours (around 8AM),in addition to peaks at noon and in the late a�ernoon. However, weinterpret this outcome as re�ection of the social dynamics of thisarea, resulting from the mix of land–use, which is mostly commer-cial and residential as shown in Figure 1, and the fact that LowerManha�an is very popular among visitors and tourists from all overthe world. Activity in the area begins around 6AM, with commutersstarting to go to work. Activity picks up at 9AM when, besidescommuters, residents and visitors start their routine. Around noon,employees from local businesses have their lunch break, o�en innearby restaurants. �ey go back to work until 3PM-4PM, whenthey commute back home.

4

(a) 5AM-6AM (b) 6AM-7AM (c) 7AM-8AM (d) 8AM-9AM

(e) 9AM-10AM (f) 10AM-11AM (g) 11AM-12PM (h) 12PM-1PM

(i) 1PM-2PM (j) 2PM-3PM (k) 3PM-4PM (l) 4PM-5PM

Figure 3: Visualization for client frequencies per Wi� access point (AP) location and per hour, from at 5AM (top le�) till 5PM(bottom right). Colors represent di�erent APs; the bigger the circle size, the higher the frequency.

To see how this pa�ern is re�ected spatially, we next mappedthese aggregations according to AP locations per hour. In Figure 3we show aggregated frequencies for Wi� APs (indicated by colors)per hour, starting from 5AM (top le� map) until 5PM (bo�om rightmap). We observe geographic di�erences in usage frequencies byAP location depending on time of day. �e general increase in APusage in the a�ernoon, which we have observed in the temporalanalysis, is also visible here and indicated by circle areas for each APlocation – the larger the radius, the higher the number of aggregatedclients.

�e temporal variance between morning (4AM-9AM) and a�er-noon hours (2PM-6PM) was found to be highest among APs that arelocated in the south–western part of the area, around Wall Street,compared to APs in the north–eastern corner, around Fulton Street.We furthermore observe that Wi� APs that are located on the EastRiver piers slowly increase their usage throughout the morningand reach their peaks in the later a�ernoon, a�er 3PM.

Looking at the land–use of the area (Figure 1), this result is notsurprising and re�ects the area’s usage dynamics, as the south–western and north–eastern part of observed area a�ract di�erent

groups of people: while Wall Street and adjacent areas are knownto be prime business locations in New York City, this area is mostlikely to a�ract commuters activating the area during o�ce hours.�e corner of Fulton / Water Street, on the other side, is mostlyoccupied by multi–family residences and hence, is most likely toa�ract residents. While commuters in commercial districts weredetected by APs especially when coming or leaving the area, orduring the lunch hour, the detection of local residents shows a moreeven temporal usage distribution throughout the day.

In summary these results show how Wi� APs can be used tomeasure aggregations representing population activity at givenlocations, o�ering a quantitative, less expensive alternative to com-mon approaches of “counting–gates” as described in the literature.However, by measuring client aggregations at Wi� locations, theoutput o�ers only a limited picture as it does not include any infor-mation about where or how people move between these locations.�erefore this method is limited when it comes to evaluating actualstreet and sidewalk network usage.

5

Figure 4: Network graph, showing connections betweenWi� AP nodes and their edge frequency strength. �e widthof the edges indicates the frequency of journeys between thenodes: the wider an edge, the more journeys were detected.

Next we will show how the same Wi� probe data can be usednot only to generate aggregations at points in the city, but also tocapture movements between those to model urban mobility at boththe street segment level and on larger district scales.

3.4 User traces between Wi� locationsTomodel urbanmobility on road segment level, we �rst run a spatialnetwork analysis to identify client traces between Wi� APs. To doso, we �ltered by client ID number and journey for our 60,912 users.A journey was de�ned as the series of visited Wi� APs by each userthroughout the day. In Figure 4 we show the network graph, asde�ned by all detected journeys, with edge frequencies between theWi� AP nodes. Frequencies showed a long–tail distribution: whilethe most–frequented journey show 2481 trips, the least–frequentedjourneys were single trips between two Wi� APs; the overall meanfrequency among observed journeys was 147.6 (1. �artile = 11.0,3. �artile = 157.8). In the graph, we indicate the frequency ofjourneys between the nodes by the width of the edges – the wideran edge, the more journeys were detected.

We observe a high density of client journeys between the nodeslocated along Water Street, the main thoroughfare in Lower Man-ha�an and important tra�c route for pedestrians, public transportand motor tra�c, connecting the area from east to west.

We also observe a high density of client journeys between thenodes located in the test bed’s south–west corner, compared to thenorth–east. We interpret this as an indication of the di�erences inmovements between the major communities found in these por-tions of the neighborhood. While commuters (mostly found in thecommercial south–western part) might be detected by many Wi�APs while on the go between work, meetings and commuting, localresidents (mostly found in the multi–family residence north–east

Figure 5: Street network showing road usage per segment.�e opacity of red indicates the frequency usage for eachstreet segment: the less opaque, the more journeys happenon the road segment.

part) might be detected by less APs due to being less mobile (dueto staying at home, for instance).

Furthermore, we observe that Wi� APs located at the piers ofthe East River, where ferries and water taxis stop, show strongerconnections to APs in the commercial district. We interpret thisresult as indication that people coming to the area by boat are mostlikely commuters going to and from work.

To see how these movements a�ect the usage of street segments,we next applied our �ndings to the street network. To do so, we usedstreet network data which is open–source and publicly available forNew York City11 and overlay it on our AP locations. As most of thelocations were o�side road vectors, we �rst assigned them to thenearest road segments, using the k–nearest neighbor classi�cationalgorithm [3]. Having assigned APs to road segments enabled usthen to generate the shortest path journeys between each Wi�location as described by the network graph on street segment level.As long as our graph is unweighted, we use the breadth–�rst searchalgorithm [11] to generate the shortest path between the nodes.

In Figure 5 we show the outcome of our model. �e opacity of redindicates the frequency usage for each segment: the less opaque,the more journeys happen on the road segment. Comparing toFigure 4, we now see how people use roads for their journeysbetween AP locations, based on the shortest path algorithm. Weobserve and con�rm, for instance, the importance ofWater Street asindicated by the network graph. We also can see that the increasedactivity in the south–western part of the test bed, as observed in thenetwork graph, is concentrated along Water Street and Pearl Street,connecting most of the analyzed AP locations in the area. We alsocan see that Wall Street and Broad Street show a high frequency ofjourneys, which we interpret as the preferred routes between the

11h�p://www1.nyc.gov/site/planning/data-maps/open-data.page6

http://www1.nyc.gov/site/planning/data-maps/open-data.page

ferry landing points at East River piers to o�ces in the commercialdistrict.

Road usage in the residential dominated areas around FultonStreet of the north–west is found to be lower when compared tothe areas of commercial uses, as expected. Looking at di�erencesin scale of the urban environment, we can observe that in this areawe �nd smaller block size, compared to the south–western part.As smaller blocks, crossed by more roads, o�er more options inroute–�nding, the high frequencies as seen in Figure 4 might beexplained by pedestrians spreading out over various routes.

In summary, our results show the potential of using Wi� probedata to model urban mobility quantitatively, even beyond aggrega-tions of passers–bys at speci�c locations in the urban fabric. Next,we will discuss the results of our approach and its limitations anddiscuss applications and future steps of research.

4 DISCUSSION4.1 SummaryIn this paper, we have presented a novel method to model urbanmobility on large scale, following a data analysis approach. �emethod requires access to Wi� probe data, including informationabout client activities and location information of itsWi�APs. Fromthe dataset, we extracted aggregations about client frequencies perWi� AP location, showing the method’s potential to capture similardata as commonly used “counting–gates” on a quantitative leveland at low cost.

We then used the same Wi� probe data in a network analysis,combined it with open source road network data, and were ableto evaluate journeys and their frequencies on road segment levelbetween Wi� APs based on the shortest routes.

With respect to the area’s land–use pa�ern, both approachesallowed interpretations on how di�erent groups of the urban popu-lation, such as workers or residents, occupy and use urban spaceat di�erent times based on speci�c usage pa�erns. In doing so,we were able to detect temporal di�erences between routes withinprimarily commercial and residential areas and their connections topublic transport, such as water taxi landings. �is pa�ern recogni-tion shows the method’s potential to identify not only how di�erentpopulation types move through localized areas in cities.

As the network of Wi� APs in cities is continuously becomingdenser, the use of Wi� probe data to model urban mobility is be-coming increasingly precise. Supported by the ongoing open datamovement, an increasing amount of road network data for citiesin di�erent parts of the world is freely available and can be usedfor these purposes. Wi� probe data on the other hand is moredi�cult to access, but a variety of data mining challenges, such asAccenture’sWi� Analytics Case Competition12 or the MIT Big DataLiving Labs13 show a clear trend of Wi� providers and researchfacilities making their data available, at least at low spatial resolu-tion, to the public. �is development suggests that the proposedmethodology will become increasingly applicable in the comingyears.

12h�ps://newsroom.accenture.com/subjects/awards-rankings/13h�p://livinglab.mit.edu

4.2 LimitationsWe acknowledge that this exploratory work is constrained by anumber of limitations. First of all, using passively collected Wi�probe data, we have to be aware that our method is highly depen-dent on the overall number and network density of AP locationsdata has been collected from: �e higher the density of APs, themore accurate the model’s performance becomes, we expect. Inour case, we have shown the potential of the approach for citieswith a relatively high penetration of Wi� APs, such as New YorkCity. However, we do not know how the same approach wouldperform in other cities with a lower density of Wi� APs, as forinstance, in less developed countries. As a ma�er of fact, as citiesin these parts of the world show the highest rise in urban popula-tion [18] the method would contribute most to urban planning insuch environments.

Besides AP network density, the method we propose is highlydependent on the accuracy of collected data. Accuracy could su�er,for instance, due to shielding problems (e.g. shielded by buildings)especially in densely built urban areas. �erefore researchers usingthis method need to be aware of what Wi� APs are able to capture,and what is meaningful to the purpose of modeling urban mobility.In our case, we also had to exclude a rather large part - three–quarters - of our data points that were identi�ed as stationarydevices (as for instance, a desktop computer in an o�ce building) toensure the model’s accuracy. By not changing their location, suchdevices are constantly detected by the same surrounding Wi� APs,and hence captured in the dataset throughout the day. Accuratelydetecting stationary devices is thus critical to the accuracy of themodel.

By using Wi� probe data, we need to be aware of what it isable to represent, and what is unobservable. As described in theliterature, we expectWi� probe data to bemore suitable to representurban population compared to CDR data through being network–independent: As far as penetration of Wi�–capable devices in theU.S. is rather high (64% of all U.S. citizens owned a smartphone in2016)14, we can expect a high accuracy in outcome in the case ofNew York City. However, as promising these numbers are, they alsoindicate that our method still misses up to a third of the population,as we are not able to identify mobility of populations not carryinga smart device and are therefore not being captured by Wi� APs.�is fact can lead to bias in the model outputs, particularly as itpertains to understanding the mobility of children, the elderly, andother population sub-groups less likely to possess a Wi�–enableddevice.

In this work, we do not di�erentiate between pedestrian or mo-torist. As some roads are for motor tra�c only, this might lead toinaccuracies in route generation. For instance, we observe in Fig-ure 5 that roads along the East River (South Street and FDR Drive)are being heavily used by clients. As in reality parts of these routesare for motor tra�c only, some of the generated routes might bemisleading for individual cases. Furthermore, here we apply theshortest–route algorithm to generate our result. As people do notalways follow the shortest route and may not always follow thestreet or road, we need to consider the application of other routingalgorithms to be more suitable.

14h�ps://www.statista.com/statistics/201183/forecast-of-smartphone-penetration-in-the-us/7

https://newsroom.accenture.com/subjects/awards-rankings/

http://livinglab.mit.edu

https://www.statista.com/statistics/201183/forecast-of-smartphone-penetration-in-the-us/

4.3 ImplicationsRecognizing these limitations, we can see the potential of thisapproach to bene�t to a number communities to model and researchurban mobility. �ese communities include:

(1) Researchers in urban and computational studies can usethe method to model and analyze people �ow and detectto what degree urban mobility pa�erns impact conditionsin the urban environment, or the other way around. Indoing so, the output can be used, for instance, to describepossible relationships to urban phenomena, such as crime,congestion or property values, by including other urbandata sources, such as crime data, road construction data orhouse price data.

Other research can use the method to further discussthe relationship between urban mobility an land–use, anddiscuss opportunities to describe communities based ontheir movement pa�erns. By describing urban commu-nities based on movement pa�erns, our method opensdoors to investigate novel approaches to describe socio–economic pa�erns of a city. Results of such work wouldsupport recent work [10] suggesting Wi� data as dynamicalternative to census data.

(2) For city governments and decision makers tools canbe built on the top of our method, informing urban plan-ning decisions. Such tools can, for instance, help to detectmobility pa�erns in a city over time, and how they wouldchange in case of emergencies. �e outputs of such toolscan inform road improvement and construction work inorder to avoid unnecessary tra�c congestion. Other toolsbased on this method could inform planning decisions bymodeling urban mobility on di�erent road network vari-ations to see how mobility �ow would change and e�ecturban life. An optimized design solution can be de�nedand inform construction decisions.

(3) At the same time, the method can also support local econ-omy to increase its business activity. In doing so, themodelcould be used, for instance, to suggest locations that areconsidered to increase the revenue by providing a high rateof passing–by potential customers. Furthermore, as themethod is able to describe di�erent types of passers–by,such as residents or workers, it provides detailed insight forbusinesses into who the potential customers are, informinglocation decisions.

(4) For urban population, mobile applications can be builtbased on our method, informing both motorized and pedes-trian urban navigation in a dynamic way. Such applicationcan, for instance, suggest routes with less car–congestionin real–time or suggest walking route preferences to avoidor seek crowds, depending on the user’s mood. In o�eringmood–de�ned routes, such applications have the potentialto support urban walkability, impacting urban quality-of-life and the city’s sustainability.

5 CONCLUSION AND FUTUREWORKIn this paper we have presented a novel methodology in its earlystages to model urban mobility at scale and at low cost by using

Wi� probe request data, and were able to relate detected movementpa�erns to the local street network. As cities all over the worldcontinue to grow dramatically, being able tomeasure urbanmobilityis key to providing improved quality-of-life for city dwellers. Ourapproach will contribute therefore to various communities in andoutside of academia, to a city’s economy, and to the general public.

To do so, we will continue to improve the method by minimiz-ing detected limitations and re�ning the methodology to increaseits accuracy. �erefore, we will in a next step make an e�ort todi�erentiate between pedestrians and motor–tra�c by using thethe temporal information provided in the data as proxy for client’smovement speed. By combining output with additional informationabout the road types (if pedestrian–friendly or not), we will be ableto inform routing procedures of the model accordingly and will pro-vide a more accurate picture of urban mobility. To minimize �awsdue to route generation, we will review and test routing algorithmsother than the shortest–route.

Furthermore, we have shown in presented work how the methodperforms on a data set that captures 14 hours of one day only, fora test bed that is limited to 29 Wi� AP’s in New York City. To seehow the method performs on larger scale, we will next includedata capturing a longer time frame and a larger geographic area.�ese steps will include not only New York City, but also, to detectpossible di�erence depending on built and social environment, inother cities, in other countries and of other cultures.

REFERENCES[1] I.A. Azmi, A.K. Hafazah, and M.A. Mood Zaamreen. 2012. Comparing the

Walking Behaviour between Urban and Rural Residents. Social and BehavioralSciences 68 (2012), 406–416.

[2] F. Calabrese, M. Diao, G. Di Lorenzo, J. Ferreira Jr., and C. Ra�i. 2013. Under-standing individual mobility pa�erns from urban sensing data: A mobile phonetrace example. Transportation Research Part C: Emerging Technologies 26 (2013),301–313.

[3] T. Cover and P. Hart. 1967. Nearest neighbor pa�ern classi�cation. IEEE Trans-actions on Information �eory 13, 1 (1967), 21–27.

[4] A. Danalet, M. Bierlaire, and B Farooq. 2012. Estimating Pedestrian DestinationsUsing Traces from WiFi Infrastructures. Pedestrian and Evacuation Dynamics(2012), 1341–1352.

[5] J. Dargay, D. Gately, and M. Sommer. 2007. Vehicle ownership and incomegrowth, worldwide: 1960-2030. Energy Journal 28, 4 (2007), 143–170.

[6] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva. 2013. Visual Explorationof Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips.Transactions on Visualization and Computer Graphics 19, 12 (2013), 2149–2158.

[7] M. C. Gonzalez, C. A. Hidalgo, and B. Albert-Laszlo. 2008. Understanding indi-vidual human mobility pa�erns. Nature 453 (2008), 779–782.

[8] E. Hansson, K. Ma�isson, J. Bjoerk, P. O. Oestergren, and K. Jakobsson. 2011.Relationship between commuting and health outcomes in a cross-sectional pop-ulation survey in southern Sweden. Public Health 11, 834 (2011).

[9] S. Jiang, Y. Yang, S. Gupta, D. Veneziano, S. Athavale, and M. C. Gonzales. 2016.�e TimeGeo modeling framework for urban mobility without travel surveys.PNAS 113, 45 (2016), E5370–��E5378.

[10] C.E. Kontokosta and N. Johnson. 2017. Urban phenology: Toward a real–timecensus of the city using Wi–Fi data. Computers, Environment and Urban Systems64 (2017), 144��–153.

[11] C. Y. Lee. 1961. An Algorithm for Path Connections and Its Applications. IRETransactions on Electronic Computers 3 (1961), 346–365.

[12] A. Millard-Ball and L. Schipper. 2010. Are We Reaching Peak Travel? Trends inPassenger Transport in Eight Industrialized Countries. Transport Reviews 31, 3(2010).

[13] A. Milne and Melin. 2014. Bicycling and Walking in the United States 2014– Benchmarking Report. Technical Report. Alliance for Biking and Walking,Washington, DC.

[14] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann. 2015. Tracking HumanMobility Using WiFi Signals. PLoS ONE 10, 7 (2015).

[15] B. Slack. 2017. Tra�c Counts and Tra�c Surveys. �e Geography of TransportSystems (2017).

8

[16] M. Sweet. 2013. Tra�c Congestion’s Economic Impacts: Evidence from USMetropolitan Regions. Urban Studies 51, 10 (2013).

[17] M. Traunmueller, G. �a�rone, and L. Capra. 2014. Mining Mobile PhoneData to Investigate Urban Crime �eories at Scale. In Proc., SocInfo 2014: SocialInformatics. Springer, 396–411.

[18] Department of Economic United Nations and Population Division Social A�airs.2015. World Urbanization Prospects: �e 2014 Revision. (2015).

[19] U.H. WHO. 2010. Hidden cities: unmasking and overcoming health inequities inurban se�ings. Technical Report. World Health Organization WHO. LibraryCataloguing–in–Publication Data.

[20] J. Woodcock, M. Tainio, J. Cheshire, and A. Goodman. 2014. Health e�ects of theLondon bicycle sharing system: health impact modelling study. BMJ 348 (2014).

[21] Y. Yuan and M. Raubal. 2012. Extracting Dynamic Urban Mobility Pa�erns fromMobile Phone Data. Geographic Information Science. 7478 (2012), 354–367.

[22] P. Zhao. 2014. Private motorised urban mobility in China’s large cities: �esocial causes of change and an agenda for future research. Journal of TransportGeography 40 (2014).

9

digital traces: modeling urban mobility using wifi probe...

Documents