Monitoring Communication Channels between Targeted Individuals
Ross Sparks
May 2013
Outline
• Social networks as a source of information
  • Communications volume between persons of interest
  • Business intelligence
  • Twitter messages: syndromic surveillance, disaster management
• Review of spatio-temporal surveillance
  • Similarities with monitoring communication levels between targeted people
  • Differences
• A suggested solution
  • Order statistics and QQ-plots
  • Deciding on the appropriate level of network aggregation
  • Some simulation results
  • Extensions to higher dimensions
Social networks as a source of information
• In Australia, Twitter messages have been successfully used in the real-time management of bushfires
  • Who is affected?
  • How are they affected?
  • Where is the fire spreading and how fast is it moving?
    – e.g., a combination of a tornado and a bushfire: very fast, devastating.
• Social media information is being mined for security purposes
• Facebook is proving useful in criminal investigations
  – Addresses, photos, activities, etc.
  – Conversations and networks.
• Suspected terrorists and their friends are being followed: phone, e-mail and cloud services are all being mined
Social networks: a source of business intelligence
• Companies are monitoring what customers say about them and their competitors.
• Companies are monitoring their employees to better manage their risks
  • What do employees say to each other?
  • What do they say to others outside the company?
• HR departments are looking at applicants’ Facebook pages to better evaluate their suitability for joining the company.
• Hence social network monitoring is likely to increase in the future.
Ethical issues and privacy concerns
• Privacy is clearly an issue.
• Cyber bullying is a concern.
• Cyber crime is on the rise
  • Exploiting children/child pornography
  • Cyber scams
  • Misinformation
• This paper does not deal with the ethical issues relating to social media, but raises them as an important consideration.
Setting the scene for monitoring spatial point processes
[Figure: spatial point process; axes: longitude and latitude]
The time dimension
[Figure: spatio-temporal block; axes: time, longitude and latitude]
The scan statistic counts the number of incidents in a spatio-temporal block and compares it with the expected count.
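A minimal Python sketch of this comparison for a single block. The incident tuples, block boundaries and expected count are made-up illustrative values, and scoring the block as (count − expected)/√expected is an assumption borrowed from the signal-to-noise ratio used later in the talk, not necessarily the exact scan statistic.

```python
import math

def block_count(incidents, t_range, lon_range, lat_range):
    """Count incidents (time, longitude, latitude) inside one spatio-temporal block."""
    return sum(
        t_range[0] <= t < t_range[1]
        and lon_range[0] <= lon < lon_range[1]
        and lat_range[0] <= lat < lat_range[1]
        for t, lon, lat in incidents
    )

def signal_to_noise(count, expected):
    """Assumed score: departure from the expected count in Poisson standard deviations."""
    return (count - expected) / math.sqrt(expected)

# Hypothetical incidents: (day, longitude, latitude).
incidents = [(1, 151.2, -33.9), (2, 151.3, -33.8), (2, 151.2, -33.9), (9, 144.9, -37.8)]

# One block: days 0-6, a small longitude/latitude window; expected count assumed known.
count = block_count(incidents, (0, 7), (151.0, 151.5), (-34.0, -33.5))
score = signal_to_noise(count, expected=1.0)
```

A full scan would repeat this over many candidate blocks of different sizes and positions, which is what makes exhaustive scanning expensive.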
Spatio-temporal applications
• Disease outbreaks which cluster spatially
• Detecting emerging traffic hot-spots
• Pockets of Australia where domestic violence is increasing significantly more than forecast
• Criminal activity that clusters spatially
• Identifying geographical regions of higher sales than expected for specific items
• Identifying geographical regions where more people than expected are cancelling their household insurance policies
• etc.
Social networks
Who are “neighbours” in the social network?
    cA  cB  cC  cD  cE  cF  cG  cH  cI  cJ  cK  cL  cM  cN  cO  cP  cQ  cR  cS  cT
rA   0   6   6   6  10   5   7   7   6   6   1   0   1   1   3   2   0   0   1   2
rB   6   0   6  11   8   8  10   8   9  11   1   2   0   2   0   0   0   1   1   0
rC   4   8   0   6  10   8   5   8   8   6   1   1   0   0   1   2   0   1   3   1
rD   6  11   8   0   7   6   8   7   4   8   1   3   0   1   2   0   0   2   1   3
rE   8  10   9   8   0   6  12   8   6   8   1   0   0   1   0   1   0   0   1   2
rF   6   8   6   9   8   0   6  10  10   8   0   1   2   2   0   0   1   0   0   0
rG   9   8   6   8  12  10   0   8  10  10   3   0   1   0   1   2   1   0   1   1
rH   7   9   8   8  11  10   8   0  10  10   0   0   0   0   0   2   0   1   1   1
rI   8   6   7   4   6   9   8  10   0   8   0   1   1   1   1   1   0   0   1   1
rJ   6  10   6  10   8   6   8   8   6   0   0   2   1   3   1   1   0   3   3   0
rK   1   2   0   0   1   2   2   1   0   3   0   4  15   8   6   6   4  12   8   5
rL   1   2   1   1   0   1   1   1   2   0   2   0   8  10  10  10   6  10   6   9
rM   1   0   2   2   0   3   0   1   0   1  11  10   0   9  10   8  12   8   7   9
rN   1   0   4   0   1   1   0   3   4   0   9  10   9   0   6  11   9  10   6   7
rO   3   0   0   0   2   2   1   0   0   0   8   8  10   6   0   4   8   8  14  14
rP   2   1   1   1   0   4   0   1   1   4   6  10   6   9   6   0  10   8   6  10
rQ   0   0   1   0   0   1   1   0   0   0   4   7  14   9  10   9   0   8  10   8
rR   1   1   3   0   1   0   1   0   1   3  10  10   8  10   8   8  10   0   9   5
rS   1   1   1   0   0   0   0   0   0   0   8   6   7   7  12   6  10   9   0   6
rT   1   1   0   0   0   1   1   2   2   0   8   9   7   6  10   8   9   8   6   0
Number of times A contacts B, etc.
Monitoring people who are a security risk
• Assume that there are 1000 past criminals (out of jail) that you wish to monitor.
• The scan statistic
  – Looking for gangs of 5 in the above network
  – An exhaustive “SCAN” would need to investigate about 8.25 × 10^12 potential gangs (close to 10 billion on the long scale).
• A computationally feasible alternative is needed.
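The size of the exhaustive search can be checked directly. This sketch assumes the scan considers every unordered group of 5 among the 1000 people; `math.comb` needs Python 3.8 or later.

```python
import math

# Number of distinct 5-person groups that can be formed from 1000 people.
n_people, gang_size = 1000, 5
n_gangs = math.comb(n_people, gang_size)

print(f"{n_gangs:,} potential gangs (about {n_gangs:.2e})")
# prints: 8,250,291,250,200 potential gangs (about 8.25e+12)
```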
Dynamic aggregation levels
• In spatio-temporal monitoring we try to decide dynamically on the level and position of spatial aggregation that best detects an outbreak.
• In the social network case, the natural neighbours in the network are potentially dynamic,
  • e.g., social neighbours may differ from neighbours in terms of criminal gangs.
  • As such, the scan statistic is unlikely to work well for monitoring communication levels unless we are lucky and have people in the appropriate order.
• Neighbours are not easy to define.
How to define the best network aggregation?
Order Statistics are often useful in defining anomalous cells
• For each communication cell, calculate its signal-to-noise ratio, which measures how far its count departs from its expected value.
• Rank these from smallest to largest.
• Plot these against their theoretical distribution under the assumption that the network communication level has not changed (in-control).
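The three steps above can be sketched in Python as follows. The counts and expected levels are those of the talk's ten-person example (cells with an expected level of 0 are left out). Everything else is an assumption made for illustration: counts are treated as Poisson, the signal-to-noise ratio is taken as (count − expected)/√expected, and the in-control reference quantiles are estimated by simulation, since the talk does not say how its theoretical distribution is obtained.

```python
import math
import random

# (contacting, contacted): (count, expected weekly level)
CELLS = {
    ("A", "B"): (6, 2.1), ("A", "C"): (5, 1.7), ("B", "A"): (4, 1.9),
    ("B", "C"): (3, 1.3), ("C", "A"): (3, 0.6), ("C", "B"): (3, 0.4),
    ("D", "E"): (5, 4.5), ("D", "F"): (2, 2.1), ("D", "G"): (1, 0.6),
    ("E", "D"): (2, 2.5), ("E", "F"): (2, 2.4), ("E", "G"): (0, 0.4),
    ("F", "D"): (2, 1.5), ("F", "E"): (0, 0.6), ("F", "G"): (1, 0.9),
    ("G", "D"): (1, 0.5), ("G", "E"): (1, 1.0), ("G", "F"): (1, 0.8),
    ("H", "I"): (3, 2.5), ("H", "J"): (2, 2.4), ("I", "H"): (2, 2.1),
    ("I", "J"): (2, 2.0), ("J", "H"): (2, 1.9), ("J", "I"): (2, 1.8),
}

def snr(count, expected):
    """Assumed signal-to-noise ratio for a Poisson count."""
    return (count - expected) / math.sqrt(expected)

def poisson(rng, mean):
    """Draw one Poisson variate (Knuth's method; fine for the small means here)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def reference_quantiles(expected, n_sim=2000, seed=1):
    """Average the sorted in-control ratios over many simulated weeks."""
    rng = random.Random(seed)
    totals = [0.0] * len(expected)
    for _ in range(n_sim):
        sims = sorted(snr(poisson(rng, e), e) for e in expected)
        for i, value in enumerate(sims):
            totals[i] += value
    return [t / n_sim for t in totals]

# Step 1 and 2: compute the ratios and rank them from smallest to largest.
observed = sorted((snr(c, e), cell) for cell, (c, e) in CELLS.items())
# Step 3: pair each ranked ratio with its in-control reference quantile.
reference = reference_quantiles([e for _, e in CELLS.values()])
qq_points = [(ref, obs, cell) for ref, (obs, cell) in zip(reference, observed)]
```

Plotting the second element of each `qq_points` entry against the first gives the QQ-plot; cells whose observed ratio lies well above the reference value are the candidates for anomalous communication.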
Example
Counts (rows: contacting person; columns: contacted person)
     A    B    C    D    E    F    G    H    I    J
A    0    6    5    0    0    0    0    0    0    0
B    4    0    3    0    0    0    0    0    0    0
C    3    3    0    0    0    0    0    0    0    0
D    0    0    0    0    5    2    1    0    0    0
E    0    0    0    2    0    2    0    0    0    0
F    0    0    0    2    0    0    1    0    0    0
G    0    0    0    1    1    1    0    0    0    0
H    0    0    0    0    0    0    0    0    3    2
I    0    0    0    0    0    0    0    2    0    2
J    0    0    0    0    0    0    0    2    2    0

Expected weekly communication levels (rows: contacting person; columns: contacted person)
     A    B    C    D    E    F    G    H    I    J
A    0  2.1  1.7    0    0    0    0    0    0    0
B  1.9    0  1.3    0    0    0    0    0    0    0
C  0.6  0.4    0    0    0    0    0    0    0    0
D    0    0    0    0  4.5  2.1  0.6    0    0    0
E    0    0    0  2.5    0  2.4  0.4    0    0    0
F    0    0    0  1.5  0.6    0  0.9    0    0    0
G    0    0    0  0.5  1.0  0.8    0    0    0    0
H    0    0    0    0    0    0    0    0  2.5  2.4
I    0    0    0    0    0    0    0  2.1    0  2.0
J    0    0    0    0    0    0    0  1.9  1.8    0
QQ-plot
An alternative is to use p-values and a PP-plot of actual versus theoretical values.
Is there another way?
• Group all cells with unusually high signal-to-noise ratios (those lying above their expected quantile in the previous QQ-plot) and sum their counts.
• Calculate the signal-to-noise ratio for this group.
• See if it exceeds a threshold.
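A minimal Python sketch of this aggregation, using the counts and expected levels from the ten-person example. It assumes Poisson counts, so a group's signal-to-noise ratio is (total count − total expected)/√(total expected), and it simply takes the top `m` cells rather than reading them off a QQ-plot. The value of `THRESHOLD` is hypothetical; in practice it would be tuned to give the desired in-control average run length.

```python
import math

# (contacting, contacted): (count, expected weekly level); zero-expected cells omitted.
CELLS = {
    ("A", "B"): (6, 2.1), ("A", "C"): (5, 1.7), ("B", "A"): (4, 1.9),
    ("B", "C"): (3, 1.3), ("C", "A"): (3, 0.6), ("C", "B"): (3, 0.4),
    ("D", "E"): (5, 4.5), ("D", "F"): (2, 2.1), ("D", "G"): (1, 0.6),
    ("E", "D"): (2, 2.5), ("E", "F"): (2, 2.4), ("E", "G"): (0, 0.4),
    ("F", "D"): (2, 1.5), ("F", "E"): (0, 0.6), ("F", "G"): (1, 0.9),
    ("G", "D"): (1, 0.5), ("G", "E"): (1, 1.0), ("G", "F"): (1, 0.8),
    ("H", "I"): (3, 2.5), ("H", "J"): (2, 2.4), ("I", "H"): (2, 2.1),
    ("I", "J"): (2, 2.0), ("J", "H"): (2, 1.9), ("J", "I"): (2, 1.8),
}
THRESHOLD = 3.0  # hypothetical value, not taken from the talk

def snr(count, expected):
    """Assumed signal-to-noise ratio for a Poisson count."""
    return (count - expected) / math.sqrt(expected)

def aggregated_snr(cells, m):
    """Pool the m cells with the highest individual ratios and score the pooled count."""
    ranked = sorted(cells.items(), key=lambda kv: snr(*kv[1]), reverse=True)[:m]
    total = sum(count for _, (count, _) in ranked)
    expected = sum(exp for _, (_, exp) in ranked)
    return [cell for cell, _ in ranked], total, expected, snr(total, expected)

group, total, expected, score = aggregated_snr(CELLS, m=6)
signal = score > THRESHOLD
```

With `m=6` the pooled cells are exactly the six links among A, B and C, so no ordering of people into neighbours was needed to find the group.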
Which cells should be aggregated?
Cells with the highest signal-to-noise ratios: A→B, A→C, B→A, B→C, C→A and C→B
Counts = 6+5+…+3 = 24
Expected = 2.1+1.7+…+0.4 = 8
Signal-to-noise ratio for the aggregated group = (24−8)/√8 = 16/2.828 ≈ 5.66
Advantages of this ad hoc procedure
• No need to order the network into neighbours
• It works well even in the spatio-temporal setting where “neighbours” are well defined – a paper will soon appear in Communications in Statistics.
• It works out who to aggregate over and thus determines the number of cells to aggregate. Thus the approach adapts to the size (and shape/network) of the outbreak.
• The approach is simple, intuitive and easy for non-statisticians to understand.
Some other applications
• Monitoring several hundred symptoms collected from Twitter messages in several countries around the world.
• Supermarket sales of several hundred or thousands of products at thousands of supermarket stores in Australia.
• Monitoring various crimes at several hundred key locations.
• Cancellation of life insurance policies for clients at various geographical locations (SLA) by age group.
• Number of banking transactions by type of transaction and location in Australia.
• Number of people travelling between train stations at the peak times of the day in big cities (e.g., Sydney).
Simulated example
• We monitor a group of 1000 targeted people.
• We assume 100 independent social networks of 10 individuals each.
• The mean daily communication count between two individuals is taken as:
  • uniform on the interval 0.1 to 3 for individuals in the same gang, during periods when no crime is being planned, and
  • 0.0001 for individuals not in the same gang.
• A step change of delta in communications is simulated for all individuals within a few specific gangs; these changes are then hidden.
• We apply the approach to see how early we detect these “unknown” increases.
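The simulation set-up can be sketched in Python as below. This is an illustration under stated assumptions, not the code used for the talk: the function and variable names are made up, only within-gang cells are stored (all other cells share the background mean), and the step change is assumed to add delta to the mean of every directed link among the chosen gang members.

```python
import random

BACKGROUND_MEAN = 0.0001  # mean daily count between people in different gangs

def in_control_means(n_groups=100, group_size=10, low=0.1, high=3.0, seed=0):
    """Mean daily counts for every directed within-gang link, drawn uniformly on [low, high]."""
    rng = random.Random(seed)
    means = {}
    for g in range(n_groups):
        members = range(g * group_size, (g + 1) * group_size)
        for i in members:
            for j in members:
                if i != j:
                    means[(i, j)] = rng.uniform(low, high)
    return means

def cell_mean(means, i, j):
    """Look up a cell mean, falling back to the background rate for between-gang links."""
    return means.get((i, j), BACKGROUND_MEAN)

def apply_step_change(means, gang_members, delta):
    """Return a copy of the means with delta added to every link among gang_members."""
    shifted = dict(means)
    for i in gang_members:
        for j in gang_members:
            if i != j:
                shifted[(i, j)] = cell_mean(means, i, j) + delta
    return shifted

means = in_control_means()
# Hypothetical outbreak: the first gang (people 0-9) starts planning a crime.
out_of_control = apply_step_change(means, gang_members=range(10), delta=1.0)
```

Each gang contributes 10 × 9 = 90 directed links with an average mean of about 1.55, so one gang's total mean count is roughly 140, in line with the scenario totals quoted in the talk.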
Different simulated criminal planning outbreaks
• Scenario 1: One cell of ten individuals.
  • Total communication mean count = 137.
• Scenario 2: Two neighbouring cells of ten individuals.
  • Total communication mean count = 275.
• Scenario 3: Two non-neighbouring cells of ten individuals.
  • Total mean count = 295.
• Scenario 4: Three independent cells involving 7 of the 10 within each cell.
  • Total mean count = 204.
• Scenario 5: Four independent cells involving 6 of the 10 within each cell.
  • Total mean count = 195.
Fixed number of order statistics
• The simulations generate a 1000 by 1000 matrix of counts.
• I tried aggregating over the top 25, 50, 75, 100, 150, 200, 250 and 300 cells with the highest signal-to-noise ratios to see which gave the earliest signals of out-of-control events.
• The in-control average run length (ARL) was taken as 100.
• Daily counts were generated. The first 500 days were used to estimate the in-control cell means. Thereafter, hidden out-of-control communication cells were simulated and the technology was used to find them, recording the run lengths. These were averaged over 100 simulations to give the average run lengths.
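The run-length bookkeeping described above can be sketched in Python as follows. The daily statistics and the threshold are made-up numbers; in the study each value would be the aggregated signal-to-noise ratio for one simulated day, and the threshold would be set so that the in-control average run length is 100.

```python
from statistics import mean

def run_length(daily_statistics, threshold):
    """Return the 1-based day of the first signal, or None if no day signals."""
    for day, value in enumerate(daily_statistics, start=1):
        if value > threshold:
            return day
    return None

def average_run_length(simulated_runs, threshold):
    """Average the run lengths over independent simulated runs."""
    lengths = [run_length(run, threshold) for run in simulated_runs]
    if any(length is None for length in lengths):
        raise ValueError("a run never signalled; simulate more days")
    return mean(lengths)

# Three hypothetical runs of the daily monitoring statistic.
runs = [[0.4, 1.2, 3.5, 0.1], [2.9, 3.1], [3.2]]
arl = average_run_length(runs, threshold=3.0)  # run lengths 3, 2 and 1
```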
Generation of unusual communication “outbreaks”
• It is assumed that, when planning a crime, all participants communicate at the same increased level, i.e., the increase is not proportional to their social communications.
  • This means that those who don’t communicate much socially, but do when planning a crime, will have larger communication cell signal-to-noise ratios.
  • The opposite is true if the increase in calls is proportional to the expected counts of their social calls.
Scenarios
• Scenario 1: One cell of ten individuals. (Total communication mean count=136.61).
• Scenario 2: Two neighbouring cells of ten individuals. (Total communication mean count=275.25).
• Scenario 3: Two non-neighbouring cells of ten individuals. (Total mean count=294.76).
• Scenario 4: Three independent cells involving 7 of the 10 within each cell. (Total mean count=204.2).
• Scenario 5: Four independent cells involving 6 of the 10 within each cell. (Total mean count=194.95).
• Scenario 6: Four independent cells involving 1 of the 10 within each cell. (Total mean count=194.95).
Scenario 1: One cell of ten individuals.
Scenario 2: Two neighbouring cells of ten individuals each.

Average run lengths by delta and number of order statistics (m)
         Scenario 1                     Scenario 2
delta    m=25   m=50   m=75  m=100     m=25   m=50   m=75  m=100
0.0     100.8   99.8  100.3  100.2    101.4  100.2  100.3  100.2
0.5      21.7   22.2   22.6   23.8     15.9   16.0   16.2   16.2
1.0       8.8    9.7   10.5   10.9      7.1    7.5    7.8    8.1
2.0       4.5    5.0    5.5    5.9      3.8    3.9    4.3    4.8
3.0       3.4    3.7    3.9    4.1      2.8    2.9    2.9    3.4
4.0       2.8    2.9    2.9    3.4      1.9    1.9    2.6    2.9
6.0       1.9    2.0    2.0    2.7      1.6    1.6    1.9    2.0
Scenario 3: Two non-neighbouring cells of ten individuals each.
Scenario 4: Three independent non-neighbouring cells involving 7 of the 10 people within each cell.

Average run lengths by delta and number of order statistics (m)
         Scenario 3                     Scenario 4
delta    m=25   m=50   m=75  m=100     m=25   m=50   m=75  m=100
0.0     101.4   99.8  100.3  100.2    101.4  100.2  100.3  100.2
0.5      15.8   15.9   16.3   16.9     15.6   16.4   16.5   16.8
1.0       7.2    7.4    8.0    8.2      7.1    7.5    8.1    8.3
2.0       3.8    3.9    4.1    4.8      3.6    3.9    4.2    4.8
3.0       2.8    2.9    2.9    3.4      2.9    3.0    3.0    3.2
4.0       1.9    1.9    2.5    2.9      2.0    2.0    2.5    2.9
6.0       1.6    1.5    1.9    2.0      1.8    1.8    1.9    2.0
Scenario 5: Four independent cells involving 6 of the 10 within each cell. (Total mean count=194.95).
Scenario 6: Four non-neighbouring cells of ten individuals

Average run lengths by delta and number of order statistics (m)
         Scenario 5                     Scenario 6
delta    m=25   m=50   m=75  m=100     m=25   m=50   m=75  m=100
0.0     101.4   99.8  100.3  100.2    101.4  100.2  100.3  100.2
0.5      14.7   14.8   15.5   15.6     12.0   12.1   12.3   12.6
1.0       6.6    7.3    7.5    8.1      5.4    5.9    6.1    6.4
2.0       3.2    3.9    3.9    4.8      2.9    2.9    3.4    3.8
3.0       2.2    2.9    2.9    3.4      2.2    2.2    2.6    2.9
4.0       1.9    2.0    2.0    2.9      1.9    1.9    2.0    2.0
6.0       1.7    1.6    1.7    2.0
Conclusions
• As long as the increase in calls when planning a crime is at least twice the normal number of calls, it can be flagged within a week.
  – This is probably sufficient to prevent a gang-related crime or a gang-related terrorist activity.
• Simulating large-scale networks is challenging and needs better computing skills than I currently possess.
• The technology can be scaled up to higher dimensions if the simulation process can be improved.
CSIRO Computational Informatics
Ross Sparks
Research scientist
t +61 2 9123 4567
e [email protected]
http://www.csiro.au/
CSIRO COMPUTATIONAL INFORMATICS/DIGITAL PRODUCTIVITY FLAGSHIP
Thank you. Questions?