takeaways in large-scale human mobility data … › papers › lanman_slides.pdf• data...
TRANSCRIPT
![Page 1: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/1.jpg)
Takeaways in Large-scale Human Mobility Data Mining
Guangshuo Chen, Aline Carneiro Viana, and Marco Fiore
![Page 2: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/2.jpg)
Human Mobility Investigation
!2
timeLocations
General Networking
•Prediction•Reconstruction•Characterization
•Paging management•Caching optimization•Resource allocation
![Page 3: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/3.jpg)
Human Mobility Investigation1. Data collection Operator (CDR), App + Volunteers, WiFi, …
2. Data preliminary/processing
!3
3. Data utilization
•Prediction•Reconstruction•Characterization
•Paging•Caching•Resource allocation
![Page 4: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/4.jpg)
How to Obtain 100K Users’ Locations?
!4
WiFi? App + Volunteers? No! Only CDR!
• (Legacy) Call Detail Records• (Now) Charging Data Records• 3GPP TS 32.240• Calls, SMS, data sessions,
mobility updates, on/off, …• Necessary for billing• Large populations
![Page 5: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/5.jpg)
Phone Time Location
User ID 1 2015-01-0106:47:56
(19.028, -98.209)
User ID 1 2015-01-0108:23:08
(18.993, -98.202)
User ID 2 2015-01-0216:59:34
(19.025, -98.217)
Payload of
communication events
CDR Data
![Page 6: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/6.jpg)
• Collaboration- Operators- Data companies- Researchers
• Internet! • Publicly available
datasets• Where to find?
!6
from CDR datasets?
How to Obtain 100K Users’ Locations?
![Page 7: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/7.jpg)
Publicly Available CDR datasets• Data competitions• Orange D2D 2013• TIM Big Data Challenge 2015• flowminder.org, South Africa• dandelion.eu, Milan, Italy• (Chinese) kesci.com Big Data Competitions• 2016-2017, China Unicom, Shanghai,
642K users (2016), 1 weeks• (Chinese) zjdex.com, China Mobile,
Hangzhou, 7K users, 1 month
!7
![Page 8: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/8.jpg)
Data Preliminary Is Always Required
• Imperfectness • temporal heterogeneity• abnormalities
•
!8
0 20 40 60 80 100
120
140
160
180
200
220
240
260
280
300
320
340
day
06
1218
hour
Number of users (⇥104)
510152025
![Page 9: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/9.jpg)
Data Preliminary: Common Practices
• Extract location coordinates • Filter out “bad” users • Reduce resolutions• Segment observing periods• Correlate with mobility loss • Perform controlled experiments• Fill spatiotemporal gaps
!9
![Page 10: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/10.jpg)
Extract Location Coordinates• Locations in CDR
- Geographical coordinates - Cell tower IDs (extraction required) ‣MCC-MNC-LAC-CID
• Reliable cell tower locations- France OpenData, www.data.gouv.fr
• Crowdsensing, third-party services- OpenCelliD, Google Geolocation,
Unwired Labs, OpenSignal, and Mozilla Location Service
!10
![Page 11: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/11.jpg)
Unwired Labs
![Page 12: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/12.jpg)
Filter Out Users• Shrinking user population• 6M -> 100K•Gonzalez et al. "Understanding individual human mobility
patterns,” Nature, 2008
• 10M -> 50K•Song et al. "Limits of predictability in human mobility,” Science,
2010
• 1M -> 700•Hoteit et al. "Estimating human trajectories and hotspots through
mobile phone data." Computer Networks, 2014
• To cut off, to let go, and to move on!12
![Page 13: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/13.jpg)
Collaborate with Mobility Loss
!13
Entropy
Incompleteness (%)
10%
20%
30%
![Page 14: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/14.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage• Repetitiveness• Categories
• Individual• Displacement• Travelled distance• Radius of gyration
!14
![Page 15: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/15.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage • Repetitiveness• Categories
• Individual• Displacement• Travelled distance• Radius of gyration
!15
km0 10 20 30 40
km0
5
10
15
20
25
30
35
40
45
0
1
2
3
4
5
6
7
8
9
10
![Page 16: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/16.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage• Repetitiveness • Categories
• Individual• Displacement• Travelled distance• Radius of gyration
!16
100 101 102
L-th most visited location
10�3
10�2
10�1
100
P(L
)
(L)�1
![Page 17: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/17.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage• Repetitiveness• Categories
• Individual• Displacement• Travelled distance• Radius of gyration
!17
Home Work
![Page 18: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/18.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage• Repetitiveness• Categories
• Individual• Displacement • Travelled distance• Radius of gyration
!18
![Page 19: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/19.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage• Repetitiveness• Categories
• Individual• Displacement• Travelled distance • Radius of gyration
!19
![Page 20: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/20.jpg)
Before Filtering: Mobility Measurement
• Location• Cell coverage• Repetitiveness• Categories
• Individual• Displacement• Travelled distance• Radius of gyration
!20
![Page 21: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/21.jpg)
Case Study: Next-location Prediction
• Dataset: CDR (voice calls), ~1M users, 15 months
• User filtering:• Metropolitan area• Radius of gyration < 32km (urban, peri-urban)• Completeness > 20%• Weekdays only• Three or more unique locations• Data traffic > 1KB/day• Active days > 150 days
• Users of study: 7K• Travel history: >150 days for each user
!21
![Page 22: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/22.jpg)
• Problem:
• Accuracy Percentage of correct predictions • Theoretical predictability
- Entropy + Fano’s inequality => Accuracy upper bound
• Practical predictive models- PPM: Prediction by partial matching- MLP: Multi-layer perceptron
l̂t = arg maxl2Cells
P (Lt = l|lt�1, lt�2, · · · )
![Page 23: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/23.jpg)
0.4 0.5 0.6 0.7 0.8 0.9 1.0Predictability ⇡(L)
0.0
0.2
0.4
0.6
0.8
1.0
F(⇡
)⇡PPM(L)
⇡MLP(L)
⇡MLP-CI(L)
⇡MLP-CI(L)PastV⇧max
u (L) Theoretical predictability
Travel History
Travel History + Time
Travel history +Time+Data Traffic
CDF
![Page 24: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/24.jpg)
How to Improve Next-location Prediction?
• Still from travel history?• (2006) Markov Chain, Text Compression Algorithms
• Song, Libo, et al. "Evaluating next-cell predictors with extensive Wi-Fi mobility data." IEEE Transactions on Mobile Computing 5.12 (2006): 1633-1649.
• (2010) Matrix/Tensor Factorization• Zheng, Vincent Wenchen, et al. "Collaborative Filtering Meets Mobile Recommendation: A
User-Centered Approach." AAAI. Vol. 10. 2010.
• (2012) ARIMA models• Li, Xiaolong, et al. "Prediction of urban human mobility using large-scale taxi traces and its
applications." Frontiers of Computer Science 6.1 (2012): 111-121.
• (2016) Non-parametric Bayesian + MCMC• Jeong et al. “Cluster-aided mobility predictions." INFOCOM 2016, IEEE, 2016.
• (2016) Recurrent Neural Networks• Liu, Qiang, et al. "Predicting the Next Location: A Recurrent Model with Spatial and
Temporal Contexts." AAAI. 2016.
•New techniques are limited
!24
![Page 25: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South](https://reader033.vdocuments.us/reader033/viewer/2022053008/5f0b8e317e708231d4311805/html5/thumbnails/25.jpg)
• Travel history + Context + Deep learning
Travel history
AND
Shared trajectories
(2016) by Jeong et al.
Mobile Traffic
(2018) Call/SMS by González et al.
(2018) Data Sessions by us
?Future
Environment (WiFi/Bluetooth)
POIs
Detailed Traffic