Empirical Approach for Modelling Dynamic Human Contact Networks
Eiko Yoneki
http://www.cl.cam.ac.uk/~ey204
Systems Research Group
University of Cambridge Computer Laboratory

Outline
- New Communication Paradigm
- Opportunistic Networks (proximity-based communication)
- Empirical Approach to Understanding Network Structure
- Human Connectivity Traces
- Characteristics of Networks
- Social Communities
- Social-Based Forwarding Algorithms
- From Communication to Epidemiology
- Towards Modelling Dynamic Human Contact Networks for Epidemiology
Wireless Epidemic
- "The wireless epidemic" (Nature 449, 287-288; 2007) by Jon Kleinberg: ‘Digital traffic flows not only over the wired backbone of the Internet, but also in small leaps through physical space as people pass one another on the street.’

New Communication Paradigm
- Opportunistic Networks: EU FP6 Haggle, EU FP7 SOCIALNETS, EU FP7 RECOGNITION

Pocket Switched Networks (PSN)
- Human-to-Human: Mobile Devices Cover the Globe
Opportunistic Data Dissemination
- Store-carry-forward paradigm
  - The network holds the data
  - Paths exist over time
  - Delay Tolerant Networks (DTN)
- Use of mobility (e.g. message ferry)
- Use of epidemic dissemination: the power of gossip
  - Highly robust against disconnection, mobility, and node failures; simple, decentralised, and fast
  - Controlled flooding (e.g. location-based, count-based, timer-based, history-based)
- Understanding network structure is important
  - Logical connection topology: backbone structure (e.g. social networks: hubs and communities)

Human Contact Data Collection
- Robust data collection from the real world
- Post-facto analysis and modelling yield insight into human interactions
- The data is useful for everything from building communication protocols to understanding disease spread
Modelling Contact Networks: Empirical Approach
Proximity Data Collection
- Sensor board (iMote), mobile phone
- Proximity detection by Bluetooth and/or GPS
- Environmental information (e.g. on a train, on the road)

[Figure: AroundYou, FluPhone, iMote]

Proximity Detection by Bluetooth
- Bluetooth usage (e.g. Bath, UK: 7.5%; San Francisco, USA: 13.5% of all pedestrians in 2007)
- Scanning interval
  - 2 minutes on the iMote (one-week battery life)
  - 5 minutes on the phone (one-day battery life)
  - Or continuous scanning by station nodes
- A Bluetooth inquiry can only happen in 1.28-second intervals; 4 × 1.28 s (5.12 seconds) gives a >90% chance of finding a device
- 5-10 m discovery range
- Phone: Bluetooth built into mobile phones
- Transform the scans into a discrete event trace
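Turning periodic Bluetooth scans into a discrete event trace can be sketched as follows. This is a minimal sketch, assuming each scan yields `(time_in_seconds, observer, observed)` sightings and that two sightings of the same pair belong to one contact when they are at most one scan interval apart; the function name and this gap rule are illustrative assumptions, not the project's actual tooling.

```python
from collections import defaultdict

def sightings_to_contacts(sightings, scan_interval):
    """Merge periodic Bluetooth sightings into contact intervals.

    sightings: iterable of (time_in_seconds, observer_id, observed_id).
    scan_interval: seconds between scans (e.g. 120 for a 2-minute iMote cycle).
    Returns a sorted list of (start, end, node_a, node_b) with node_a < node_b.
    """
    # Collect sighting times per unordered pair, so A-sees-B and B-sees-A merge.
    times = defaultdict(list)
    for t, a, b in sightings:
        if a == b:
            continue  # ignore self-sightings
        pair = (a, b) if a < b else (b, a)
        times[pair].append(t)

    contacts = []
    for (a, b), ts in times.items():
        ts.sort()
        start = prev = ts[0]
        for t in ts[1:]:
            # Assumption: a gap longer than one scan interval ends the contact.
            if t - prev > scan_interval:
                contacts.append((start, prev, a, b))
                start = t
            prev = t
        contacts.append((start, prev, a, b))
    return sorted(contacts)
```

For example, sightings of A and B at 0, 120, 240 and 600 seconds with a 120-second interval give two contacts, (0, 240) and (600, 600). A pair seen in only one scan gets a zero-length contact; some analyses pad such contacts by one scan interval instead.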
Sensor Board or Phone or ...
- iMote needs disposable batteries
  - Expensive
  - Third-world experiments
- Mobile phone
  - Rechargeable
  - Additional functions (messaging, tracing)
  - Smart phone: location-assist applications
- RF tag...
- Special radio-based sensor (e.g. BAS)
- Provide the device or the software
- Combine with online information

Location Data
- Is location data necessary?
  - Ethics approval gets tougher
- Use of WiFi access points or cell towers
- Use of GPS, which does not work inside buildings
- Infer location using various information
  - Online data (social network services, Google)
- Use of limited location information: post-localisation
Target Population
- Provide devices to a limited population, or target the general public
- For an epidemiology study, is ~100% coverage necessary?
- Or target schools as mixing centres
Experiment Parameters vs Data Quality
- Battery life vs granularity of the detection interval
- Duration of experiments: day, week, month, or year?
- Data rate
- Data storage
  - Contact/GPS data: <50 KB per device per day (compressed)
  - Server data storage for receiving data from devices
  - Extend storage with a larger memory card
- Collect data using different parameters, or aggregate it?
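A back-of-envelope sizing of these trade-offs can be sketched as below, using the 2- and 5-minute scan intervals and the <50 KB per device per day figure quoted above. The device count and duration in the example are hypothetical study parameters, and the result is only an upper bound for server storage.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def scans_per_day(scan_interval_s):
    """Number of Bluetooth inquiries per device per day at a fixed interval."""
    return SECONDS_PER_DAY // scan_interval_s

def server_storage_mb(devices, days, kb_per_device_day=50):
    """Upper-bound server storage in MB (1 MB = 1000 KB) for uploaded traces.

    kb_per_device_day defaults to the <50 KB/day compressed contact/GPS figure;
    devices and days are chosen by whoever designs the study.
    """
    return devices * days * kb_per_device_day / 1000
```

A 2-minute iMote cycle means 720 inquiries a day and a 5-minute phone cycle 288, which is the battery-life side of the trade-off. On the storage side, a hypothetical 36-device, 30-day study needs at most 54 MB on the server.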
Data Retrieval Methods
- Retrieving collected data:
  - Tracking station
  - Online (3G, SMS)
  - Uploading via the Web
  - Via memory card
- Incentives for participating in experiments
- Collection cycle: real-time, day, or week?

Data Transformation for Analysis
- Transform into a discrete version of the contact data
- Deal with noise and missing data
  - e.g. transitive closure
- Post-localisation
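Filling missing contacts by transitive closure can be sketched like this. It is a minimal sketch, assuming that within a single time slot, if A-B and B-C are observed, A-C was also in range and was simply missed by Bluetooth discovery. Every connected group of nodes in the slot is treated as mutually in contact, found with a small union-find. The function name is hypothetical.

```python
from itertools import combinations

def transitive_closure(pairs):
    """Fill in missing contacts within one time slot by transitive closure.

    pairs: iterable of (a, b) contacts observed in the same slot.
    Returns the set of unordered pairs (a, b), a < b, implied by treating
    every connected group of nodes as mutually in contact.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)

    closed = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            closed.add((a, b))
    return closed
```

The closure can over-connect long chains (e.g. people spread along a corridor), so it is best applied per short time slot.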
Security and Privacy
- Current method: basic anonymisation of identities (MAC addresses)
- Use of HTTPS for data transmission via 3G
- Is anonymising identities enough?
  - Simple anonymisation does not prevent the social graph from being identified
- Ethics approval is tough!
  - Any collection of medical information makes it complex
  - The study protocol document for the 'behaviour - FluPhone' project ran to 40 pages, and approval took several months

Human Connectivity Traces
- Capture human interactions
- ..thus far not on a large scale
Contact: 025d04b2b3f 4650000025d0 5416492246711621549 5416492246711644527
Location: 0025d0e113da [lon: -3.384610278596745E125; lat: 1.3168305280597862E182] 5066619950170431763
City of Bath: Scanner Location
Analyse Network Structure and Model
- Network structure of social systems to model dynamics
- Parameterise with interaction patterns, modularity, and details of time-dependent activity
  - Weighted networks
  - Modularity
  - Centrality (e.g. degree, betweenness)
  - Community evolution
  - Network measurement metrics
  - Patterns of interaction

Publications at:
http://www.cl.cam.ac.uk/~ey204
http://www.haggleproject.org
http://www.social-nets.eu/
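Modularity, one of the metrics listed above, scores a candidate community partition of a weighted contact graph, where an edge weight could be contact count or total contact duration. The sketch below computes Newman's modularity for an undirected weighted graph. The input format `(u, v, weight)` and the partition dictionary are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: dict node -> community label.
    Q = sum over communities c of [ w_in(c) / m - (s(c) / 2m)^2 ], where m is
    the total edge weight, w_in(c) the weight inside c, and s(c) the summed
    strength (weighted degree) of the nodes in c.
    """
    m = 0.0
    w_in = defaultdict(float)
    strength = defaultdict(float)
    for u, v, w in edges:
        m += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            w_in[community[u]] += w
    if m == 0:
        return 0.0
    return sum(w_in[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)
```

For two triangles joined by one edge (unit weights), splitting them into their own communities gives Q = 5/14 ≈ 0.36. Putting every node in one community gives Q = 0.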
Basic Metrics
Timeline* of Encountering Pairs (Bath): 5 Days
- Regularity of encountering

* Timeline: 6 minutes per unit
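The timeline of encountering pairs can be produced by binning contacts into 6-minute units. This is a minimal sketch, assuming contacts are `(start, end, a, b)` tuples in seconds and that a contact counts in every unit it overlaps. The function name is hypothetical.

```python
from collections import defaultdict

UNIT_SECONDS = 6 * 60  # one timeline unit = 6 minutes

def encountering_pairs_per_unit(contacts, unit=UNIT_SECONDS):
    """Count distinct node pairs in contact during each timeline unit.

    contacts: iterable of (start, end, a, b) with times in seconds, start <= end.
    Returns dict unit_index -> number of distinct pairs seen in that unit.
    """
    pairs = defaultdict(set)
    for start, end, a, b in contacts:
        pair = (a, b) if a < b else (b, a)
        # A contact counts in every unit it overlaps.
        for k in range(int(start // unit), int(end // unit) + 1):
            pairs[k].add(pair)
    return {k: len(v) for k, v in sorted(pairs.items())}
```

Using sets means a pair that meets repeatedly within one unit is counted once. This keeps the curve a measure of how many pairs are active, not how chatty they are.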
Inter-Contact Time of Node Pairs
- Hybrid power-law distribution?

[Figure: log-log histogram of inter-contact times shorter than 12 hours (MIT trace); x-axis: time]
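The inter-contact time distribution can be computed from a contact trace as sketched below. The sketch assumes contacts are `(start, end, a, b)` tuples in seconds and that an inter-contact time is the gap between the end of one contact of a pair and the start of its next one. Power-of-two bins are used so the histogram reads naturally on log-log axes. Both function names are hypothetical.

```python
import math
from collections import defaultdict

def inter_contact_times(contacts):
    """Per-pair gaps between the end of one contact and the start of the next.

    contacts: iterable of (start, end, a, b), times in seconds.
    """
    by_pair = defaultdict(list)
    for start, end, a, b in contacts:
        by_pair[(a, b) if a < b else (b, a)].append((start, end))
    gaps = []
    for intervals in by_pair.values():
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start > prev_end:  # overlapping records produce no gap
                gaps.append(next_start - prev_end)
    return gaps

def log2_histogram(values, max_value=12 * 3600):
    """Counts per logarithmic bin [2**i, 2**(i+1)) for 1 <= value <= max_value.

    The 12-hour default cut-off follows the MIT plot. Returns {lower_edge: count}.
    """
    counts = defaultdict(int)
    for v in values:
        if 1 <= v <= max_value:
            counts[2 ** math.floor(math.log2(v))] += 1
    return dict(sorted(counts.items()))
```

To compare with a power law, divide each count by its bin width (equal to the lower edge) before plotting. Otherwise the widening bins flatten the slope.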
Edge Weight
I. High contact number, long duration: Community
II. High contact number, short duration: Familiar Stranger
III. Low contact number, short duration: Stranger
IV. Low contact number, long duration: Friend

[Figure: quadrants I-IV over number of contacts vs contact duration; 90 seconds marked on the duration axis]
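The four edge-weight categories can be assigned per pair with two thresholds, as sketched below. This is an illustration under stated assumptions. "Duration" is taken as the mean duration per contact. The 90-second duration threshold is borrowed from the value marked on the slide's axis. The contact-number threshold is a free parameter. Neither threshold is confirmed as the one used in the analysis.

```python
def classify_pair(num_contacts, mean_duration_s, contact_threshold,
                  duration_threshold_s=90):
    """Place a node pair in one of the four edge-weight categories.

    Assumptions: 'high' contact number means num_contacts >= contact_threshold,
    and 'long' duration means mean_duration_s >= duration_threshold_s.
    """
    many = num_contacts >= contact_threshold
    long_ = mean_duration_s >= duration_threshold_s
    if many and long_:
        return "I: Community"
    if many:
        return "II: Familiar Stranger"
    if not long_:
        return "III: Stranger"
    return "IV: Friend"
```

For example, with a contact threshold of 10, a pair seen 20 times for 5 minutes on average falls in category I. A pair seen twice for 30 seconds falls in category III.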
Regularity of Network Activity
- 7500 nodes in the Bath data over 5 days

[Figure panels: Tuesday; 5 days]
Time-Dependent Networks
- Data paths may not exist at any single point in time, but do exist over time

[Figure: a path over time from Source A to Destination B through nodes X and Y]
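A path that exists only over time is a time-respecting path: each hop must happen no earlier than the previous one. The sketch below computes the earliest time each node can receive data from a source. It assumes contacts have been reduced to timestamped meeting events `(t, a, b)` and that a transfer during a meeting is instantaneous. Events sharing a timestamp are processed in sorted order, so multi-hop relays within a single instant are not guaranteed. The function name is hypothetical.

```python
def earliest_arrival(events, source, start_time=0):
    """Earliest time each node can be reached from source over time-respecting paths.

    events: iterable of (t, a, b) undirected meeting events.
    Returns dict node -> earliest arrival time; unreachable nodes are absent.
    """
    arrival = {source: start_time}
    for t, a, b in sorted(events):
        if t < start_time:
            continue
        # Events are time-ordered, so any node already reached holds the data now.
        if a in arrival and b not in arrival:
            arrival[b] = t
        elif b in arrival and a not in arrival:
            arrival[a] = t
    return arrival
```

With meetings A-X at t=1, Y-B at t=2, X-Y at t=3 and Y-B at t=4, data from A reaches B at t=4 via X and Y. No single snapshot contains the whole path, and the Y-B meeting at t=2 is useless because Y does not yet hold the data.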
Centrality in Dynamic Networks
- Degree centrality: number of links
- Closeness centrality: shortest paths to all other nodes
- Betweenness centrality: control over information flowing between other nodes
  - A high-betweenness node is important as a relay node
  - Measured over a large number of unlimited floods, as the number of times a node lies on shortest-delay deliveries
  - Analogous to Freeman centrality
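The flooding-based betweenness described above can be sketched as follows. The sketch floods from every node over time-ordered meeting events `(t, a, b)`, records who first handed each node the message (its shortest-delay path), and credits every intermediate node on those paths. It assumes unlimited flooding with no TTL and instantaneous transfers. The function names are hypothetical.

```python
from collections import Counter

def fastest_path_tree(events, source):
    """Flood from source; pred[node] is the node that first handed it the message."""
    pred = {source: None}
    for _, a, b in sorted(events):
        if a in pred and b not in pred:
            pred[b] = a
        elif b in pred and a not in pred:
            pred[a] = b
    return pred

def delivery_betweenness(events, nodes):
    """Count how often each node relays on shortest-delay deliveries.

    Floods from every node in turn and, for each reached destination, walks
    back along its fastest path crediting every intermediate relay.
    """
    counts = Counter({n: 0 for n in nodes})
    for source in nodes:
        pred = fastest_path_tree(events, source)
        for dest in pred:
            hop = pred[dest]
            while hop is not None and hop != source:
                counts[hop] += 1
                hop = pred[hop]
    return counts
```

With the meetings A-X (t=1), Y-B (t=2), X-Y (t=3), Y-B (t=4), node Y scores 3 and X scores 2. Y bridges the two halves of the network in time, which is the relay role the slide attributes to high-betweenness nodes.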
Party and Date Hubs
- Both are high-degree nodes: a party hub connects to the same set of nodes, while a date hub keeps changing its neighbourhood nodes
Neighbourhood Similarity Rate
- Find high-degree hubs locally
- Neighbourhood Similarity Rate (NSR)*, where N is the set of neighbourhood nodes
- Neighbourhood plus Neighbourhood Similarity Rate (NNSR)
- Can NSR and NNSR characterise party and date hubs?

* Time units without connectivity are suppressed in sparse networks

Dynamics of NSR (MIT Trace)
- Node 10: continuously high NSR
- Node 17: changing neighbourhood
- On average: 30% (party hub), 40% (date hub), 30% (combined)
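The exact NSR formula on the slide did not survive extraction. The sketch below therefore uses an assumed stand-in: the Jaccard similarity between a node's neighbour sets in consecutive time units, with empty units skipped as the footnote describes. A party hub would keep values near 1, while a date hub would drop towards 0. It is hypothetical and is not necessarily the NSR definition used in the analysis.

```python
def neighbourhood_similarity(neighbour_sets):
    """Hypothetical NSR series: Jaccard similarity of consecutive neighbour sets.

    neighbour_sets: list of sets, the node's neighbours in each time unit.
    Time units without connectivity (empty sets) are skipped, mirroring the
    note that such units are suppressed in sparse networks.
    """
    active = [s for s in neighbour_sets if s]
    return [len(prev & cur) / len(prev | cur)
            for prev, cur in zip(active, active[1:])]
```

For a node whose neighbours go {a, b}, (none), {a, b}, {c}, the series is [1.0, 0.0]: stable across the gap, then a complete change of neighbourhood.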
Cumulative Infectious Nodes
- Human epidemic: TTL on the order of a day
- Apply the SI model
- 7500 nodes in an urban environment (Bath trace)

[Figure: cumulative infectious nodes for TTL = 1 day, 12 hours, 6 hours]

Three Stages of Epidemic Dynamics
- First, rapid increase: propagation within a cluster
- Second, slow climbing
- Finally, reaching the upper limit of infection

[Figure: MIT trace, 17 days]
Three Stages of Epidemic Dynamics (continued)
- UCSD
- INFC06

[Figure panels: time axes spanning 16 days and 15 hours]
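The SI curves with a TTL on the previous slides can be sketched as a flood over time-ordered meeting events `(t, a, b)`. In this sketch the infection starts at one seed and is passed on at every meeting between an infected and a susceptible node, but only until `start_time + ttl`. Treating the TTL as a hard deadline on transmission is an assumption about how the plotted curves were produced, and the function name is hypothetical.

```python
def si_cumulative(events, seed, ttl, start_time=0):
    """Cumulative infected count under an SI flood limited by a TTL.

    events: iterable of (t, a, b) meetings, t in seconds.
    Returns a list of (t, infected_so_far), one entry per new infection.
    """
    infected = {seed}
    curve = [(start_time, 1)]
    deadline = start_time + ttl
    for t, a, b in sorted(events):
        if t < start_time:
            continue
        if t > deadline:
            break  # the message has expired; no further spreading
        if (a in infected) != (b in infected):  # exactly one side infected
            infected.update((a, b))
            curve.append((t, len(infected)))
    return curve
```

Running it with `ttl` set to 6 * 3600, 12 * 3600 and 24 * 3600 gives the three TTL curves. The three stages show up as a steep rise while the flood stays inside the seed's cluster, then a slow climb as it crosses to other clusters, then a plateau.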
Uncovering Community
- Contact trace in the form of weighted (multi)graphs
  - Contact frequency and duration
- Use community detection algorithms from complex network studies
  - K-clique [Palla04], weighted network analysis [Newman05], betweenness [Newman04], modularity [Newman06], Fiedler clustering, etc.

[Figure: communities found by Fiedler clustering and by K-CLIQUE (K=5)]
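Turning a contact trace into a weighted graph can be sketched as follows. Each pair is aggregated into a contact frequency and a total contact duration. An optional threshold then produces an unweighted graph that clique-based detection can consume. The `(start, end, a, b)` input format and both function names are assumptions.

```python
from collections import defaultdict

def weighted_contact_graph(contacts):
    """Aggregate a contact trace into per-pair edge weights.

    contacts: iterable of (start, end, a, b), times in seconds.
    Returns dict (a, b) with a < b -> {"count": contact frequency,
    "duration": total contact duration in seconds}.
    """
    edges = defaultdict(lambda: {"count": 0, "duration": 0})
    for start, end, a, b in contacts:
        e = edges[(a, b) if a < b else (b, a)]
        e["count"] += 1
        e["duration"] += end - start
    return dict(edges)

def threshold(edges, key, minimum):
    """Keep pairs whose chosen weight ("count" or "duration") reaches minimum."""
    return {pair for pair, w in edges.items() if w[key] >= minimum}
```

Frequency and duration give different graphs. Long but rare contacts (e.g. a single meeting) survive a duration threshold but not a count threshold, which is why both weights are kept.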
K-CLIQUE Detection
- A community is the union of k-cliques reachable through a series of adjacent k-cliques
  - Adjacent k-cliques share k-1 nodes
  - Members of a community are reachable through well-connected subsets
- Examples
  - 2-clique (connected components)
  - 3-clique (overlapping triangles)
- Overlapping feature: a node can belong to more than one community
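The k-clique definition above translates directly into code. In the sketch below, all k-cliques are enumerated and cliques that share k-1 nodes are joined with union-find. Each community is the union of one group of cliques. The enumeration is brute force and only suitable for small graphs; it is an illustrative sketch, not the detection code used in the study.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Clique percolation: communities as unions of adjacent k-cliques.

    edges: iterable of (u, v). Two k-cliques are adjacent when they share
    k-1 nodes. Returns a list of frozensets; a node may be in several.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Union-find over cliques: join cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return [frozenset(g) for g in groups.values()]
```

For two triangles sharing node 3, k=3 yields two communities that overlap at node 3. With k=2, every edge is a clique and the result collapses to the connected components, matching the examples on the slide.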
K-CLIQUE Communities (Conference)
[Figure, K=3: Barcelona Group, Paris Group A, Paris Group B, Lausanne Group]
K-CLIQUE Communities (Conference), K=4
[Figure, K=4: Barcelona Group, Paris Group A, Paris Group B, Lausanne Group]

BUBBLE RAP Forwarding
- Optimisation of epidemic forwarding
  - Epidemic forwarding: highly robust against disconnection, mobility, and node failures; simple, decentralised, and fast
  - Controlled flooding is necessary (e.g. count-based, timer-based, history-based)
- Social hubs (e.g. celebrities and postmen) identified by betweenness centrality, combined with community structure, for improved routing efficiency
  - LABEL: community-based
  - RANK: centrality-based
  - BUBBLE RAP: combines both

[Figure: Global Community, Sub Community, Source, Destination]
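The BUBBLE RAP decision at each encounter can be sketched as a single predicate. This is a simplified reading. The message bubbles up the global centrality ranking until it reaches a member of the destination's community (LABEL). It then bubbles up the local ranking within that community (RANK) and never leaves it. The `Node` record and its fields are hypothetical; community labels and ranks are assumed to be precomputed, e.g. by k-clique detection and centrality measurement.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical per-node state: community label plus two centrality ranks."""
    name: str
    community: str
    global_rank: float  # centrality in the whole network
    local_rank: float   # centrality within its own community

def bubble_rap_forward(carrier, encountered, destination):
    """Should carrier hand a copy of the message to encountered?"""
    if encountered.name == destination.name:
        return True
    carrier_in = carrier.community == destination.community
    encountered_in = encountered.community == destination.community
    if encountered_in and not carrier_in:
        return True  # the message enters the destination's community
    if encountered_in and carrier_in:
        return encountered.local_rank > carrier.local_rank  # bubble up locally
    if carrier_in:
        return False  # never leave the destination's community
    return encountered.global_rank > carrier.global_rank  # bubble up globally
```

Compared with plain epidemic forwarding, each copy is only handed on when the encountered node looks like a better carrier. This is how the scheme limits flooding while still using hubs.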
Communication to Epidemiology
- Building communication protocols based on proximity
  - EU FP6 Haggle Project
- Inferring social interaction and opinion dynamics, and applying the results to networking and computer systems
  - EU FP7 Socialnets, EU FP7 Recognition
- "Bio-Inspired Computing and Communication": edited book, LNCS 5151, Springer, 2008 (2nd edition in progress)
- Understanding behaviour in infectious disease outbreaks: social and economic influences
  - ESRC FluPhone Project with LSHTM
- Network modelling for epidemiology
  - EPSRC Data Driven Network Modelling for Epidemiology

Data Driven Approach
- Threats to public health: e.g. SARS, AIDS
- Current understanding of disease spread dynamics
  - Epidemiology: small-scale empirical work
  - Physics/Math: mostly large-scale abstract/simplified models
- Real-world networks are far more complex
- Advantage of real-world data
  - Emergence of wireless technology for proximity data
- Goal: post-facto analysis and modelling yield insight into human interactions
  - How does community structure affect epidemic spread?
  - How do hubs and weak links influence temporal or spatial effects, and how does this affect the transmission characteristics of disease?
  - How do the community topology of interpersonal connections and its hierarchical nature yield a multi-level structure?
Outcomes: Prediction of Epidemics
- Infectious disease control/prediction systems
  - Provide vaccination strategies
  - Predict potential outbreaks
- Incorporate human connectivity information into epidemic models
  - Mobility, interaction, behavioural assumptions
  - Time-dependent reproduction ratio
- Integrate online and web information
  - Capture the behavioural response of nodes
  - Analyse web search and blog activities
  - Twitter could act as an early warning system

[Figure: Google Flu Trends; Google: reported swine flu symptoms]
Twitter acts as early warning system

Vienna (AFP) April 13, 2010 - The micro-blogging site Twitter could act as an early warning system for epidemics, a team of experts at London's City University found in a new study published on Tuesday. According to a team of interdisciplinary experts, around three million messages -- or so-called "tweets" -- posted in English on Twitter between May and December 2009 contained the word "flu". Their study was presented to the European Congress of Clinical Microbiology and Infectious Diseases (ECCMID) being held in Vienna this week. "The numbers of tweets we collected by searching by keywords such as 'flu' or 'influenza' has been astronomical," one of the study's co-authors, Patty Kostkova, told AFP. "What we're looking at now is, what is the potential of this enormous data set for early warning systems. Because it's a real time media, it can call for an immediate response if required." Among the so-called "tweets", the experts counted 12,954 messages containing the phrase "I have swine flu" and 12,651 saying "I've got flu". They also counted the frequency of other terms, such as "H1N1" and "vaccine"...
Extending Data Collection to OSN
- Online Social Networks (e.g. Facebook, Twitter)
  - Potential to obtain data on dynamic behaviour
  - High volume of data

Does Facebook matter?
- Over 190 M users
- Growth rates for 2008 around the world
  - Italy: 2900%, Argentina: 2000%, Indonesia: 600%
Power Law Degree Distribution
- Crawled the original Stanford (15,043 nodes) and Harvard (18,273 nodes) networks
  - From the era when UIDs were assigned sequentially
- Obtained the friends of each user, and their affiliations
  - 2.1 million links, maximum degree 911
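Checking a crawled friendship graph for a power-law degree distribution can be sketched as below. The sketch computes node degrees and the degree histogram from an undirected edge list without duplicates. It then estimates the exponent with the approximate discrete maximum-likelihood formula of Clauset, Shalizi and Newman (2009), alpha = 1 + n / Σ ln(k / (k_min − 0.5)). That formula is accurate mainly for k_min of about 6 or more. The edge-list format and function names are assumptions.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Degree of each node, and the histogram degree -> number of nodes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree, Counter(degree.values())

def power_law_alpha(degrees, k_min=1):
    """Approximate MLE of alpha for a discrete power law P(k) ~ k**-alpha.

    Uses alpha = 1 + n / sum(ln(k / (k_min - 0.5))) over degrees k >= k_min.
    """
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

In practice k_min is chosen by scanning candidate values and keeping the one whose fitted tail best matches the data. The fit is then compared against alternatives such as a log-normal before concluding the distribution is a power law.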
Cascade Symptom (Use of Geo-coding)
[Figure panels: Texas, Illinois, Florida]
The FluPhone Project
- Understanding behavioural responses to infectious disease outbreaks, with the London School of Hygiene and Tropical Medicine (LSHTM)
- Proximity data collection from the general public in Cambridge using mobile phones

https://www.fluphone.org
FluPhone: Main Screen
FluPhone: Report Symptom
FluPhone: Report Time - Feedback
FluPhone Server: Data Collection
- The FluPhone server collects data via GPRS/3G
- Collection cycle: ~real-time, day, or week?
- Collection methods:
  - Online 3G
  - Uploading via the Web
Study Status
- Pilot study (April 21 ~ May 15): Computer Laboratory
- University-scale study (May 15 ~ June 30)
  - Advertisement (all departments, 35 colleges, student union, industry support club, Twitter, Facebook...)
  - Employees and students of the University of Cambridge, their families, and any residents or people who work in Cambridge
- Issues
  - Only a limited set of phone models is supported
  - Motivation to participate
  - Flu is not a threat at this moment

Encountered Bluetooth Devices
[Figure: April 16, 2010 to May 14, 2010]
- A FluPhone encountering history
- 1495 unique devices per 10 days
- Is he a party animal or a shy wallflower?
Simulation of Disease: SEIR Model
- Four states on each node: SUSCEPTIBLE → EXPOSED → INFECTED → RECOVERED
- Parameters
  - p: exposure probability
  - a: exposed time (incubation period)
  - t: infected time
- Diseases
  - D1 (SARS): p = 0.8, a = 24H, t = 30H
  - D2 (FLU): p = 0.4, a = 48H, t = 60H
  - D3 (COLD): p = 0.2, a = 72H, t = 120H
- Seed nodes
  - Random selection of 20% of nodes (= 7) among 36 nodes

SARS
- Exposure probability = 0.8
- Exposed time = 24H (average)
- Infected time = 30H (average)

[Figure panels: Day 1, Day 11]

Flu
- Exposure probability = 0.4
- Exposed time = 48H (average)
- Infected time = 60H (average)

[Figure panels: Day 1, Day 11]

SEIR: Normalised Form
[Figure: SUSCEPTIBLE, EXPOSED, INFECTED, RECOVERED]
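The SEIR simulation with the D1-D3 parameters can be sketched over a time-ordered meeting trace `(t_hours, a, b)`. Each meeting between an infected and a susceptible node exposes the susceptible one with probability p. Because the slides give only averages, the exposed and infected periods are drawn here from exponential distributions with means a and t; this choice is an assumption. Seeds are a random 20% of nodes infected at time 0. The function name and trace format are hypothetical.

```python
import random
from collections import Counter

# Parameters from the slides: p = exposure probability,
# a = mean exposed (incubation) hours, t = mean infected hours.
DISEASES = {
    "SARS": (0.8, 24, 30),
    "FLU": (0.4, 48, 60),
    "COLD": (0.2, 72, 120),
}

def simulate_seir(events, nodes, disease, end_time, seed_fraction=0.2, rng=None):
    """Stochastic SEIR over time-ordered (t_hours, a, b) meeting events.

    Returns a Counter of node states ("S", "E", "I", "R") at end_time.
    """
    rng = rng or random.Random()
    p, mean_exposed, mean_infected = DISEASES[disease]
    nodes = list(nodes)
    state = {n: "S" for n in nodes}
    until = {}  # node -> time at which its current E or I phase ends

    # Seeds: a random 20% of nodes, infected at time 0 (20% of 36 -> 7).
    for n in rng.sample(nodes, round(seed_fraction * len(nodes))):
        state[n], until[n] = "I", rng.expovariate(1 / mean_infected)

    def advance(node, now):
        # Move node along E -> I -> R for every phase that has ended by `now`.
        while state[node] in ("E", "I") and until[node] <= now:
            if state[node] == "E":
                state[node] = "I"
                until[node] += rng.expovariate(1 / mean_infected)
            else:
                state[node] = "R"

    for t, a, b in sorted(events):
        if t > end_time:
            break
        advance(a, t)
        advance(b, t)
        for src, dst in ((a, b), (b, a)):
            if state[src] == "I" and state[dst] == "S" and rng.random() < p:
                state[dst] = "E"
                until[dst] = t + rng.expovariate(1 / mean_exposed)
    for n in nodes:
        advance(n, end_time)
    return Counter(state.values())
```

Passing a seeded `random.Random` makes runs reproducible. Averaging the state counts over many seeds and dividing by the number of nodes gives the normalised S/E/I/R curves.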
Time to Exposure vs Number of Meetings
- The distribution of time to infection (black line) is strongly influenced by the time-dependent adjacency matrices of meetings

[Figure panels: Day 1, Day 11]

Simple Flood (3 Stages)
- First, rapid increase: propagation within a cluster
- Second, slow climbing
- Finally, reaching the upper limit of infection

[Figure: 5 days]
Virtual Disease Experiment
- Spread a virtual disease via Bluetooth communication within proximity radio range
- Integrate SARS, FLU, and COLD in the SEIR model
- Provide additional information (e.g. infection status, news) to observe behavioural change

Conclusions
- Quantitative contact data from the real world!
- Analyse the network structure of social systems to model dynamics
- Emerging research area
  - Weighted networks
  - Modularity
  - Centrality (e.g. dynamic betweenness centrality)
  - Community evolution and dynamics
  - Network measurement metrics
- Integrate the background of the target population
  - Location-specific
  - Demography-specific...
- Virtual disease experiment
  - Behavioural study
- Applying the methodology to measure contact networks in Malawi, Africa (with a diary-based survey)