Using Predictive Analytics for Anticipatory Investigation and Intervention
PREDICTIVE ANALYTICS FOR ANTICIPATORY CRIME INVESTIGATION AND INTERVENTION
2014 CASE STUDY
CRIME
Predictive Crime Fighting with D8A
CONTACT: D8A Group | http://d8a.com | Phone: (520) 301-7906 | Email: [email protected]
TABLE OF CONTENTS
Public Safety in the 21st Century
Case Studies
1. Risk Patterns
2. Remote Investigation
3. Network Analysis
Contextual News Discovery
Momentum
Real-Time Zeitgeist
Filtering by Keyword Exclusion
Keyword and Phrase Networks
Identifying Influencers
Sentiment Analysis
Geography Trends and Locations of Interest
Predictive Analytics
Risk Mitigation and the Timing of Information
Public Safety in the 21st Century
The proliferation and adoption of data, sensors, mobile phones, and social media technology present new ways of capturing conversations surrounding events in real time. There is high demand for products that allow law enforcement, criminal investigators, and others to explore events by monitoring many transmedia sources (social media such as Facebook and Twitter, photos, and news sources) and relating that activity to historical data sets such as neighborhood maps, crime databases, and other digital records.
Using a combination of the data-analysis products available from D8A Group, we’ve been monitoring unfolding events in real time to illustrate how public safety officials can use our technology platforms to make data-informed decisions in real-time crisis scenarios. The solutions used for this analysis include:
• SiftDeck: a product that connects online conversations to the people, places, and things being referenced offline. This helps users manage real-world risk by anticipating and preventing threats to their offline assets (think staff, office locations, or property).
• Themes: a product that allows users to visually sort through large amounts of text or streaming data to surface patterns and trends in the content. It supports visual navigation of real-time data using search, word trees, keyword and phrase network analysis, and various filters.
• Muxboard: a remixable analytic dashboard that allows researchers to apply various algorithms and third-party APIs to real-time, ever-evolving data sets with drag-and-drop ease. Muxboard makes it easy to quickly create dashboards tracking different people or brands, each with intricate, customizable analytics.
The primary purpose of using technologies like the D8A suite of analytic products is to monitor and capture real-time data for analysis and research. This data serves as a baseline against which users can look for “spikes” of irregular activity. These solutions are also predictive, helping to surface trends, patterns, and happenings before one might otherwise find out about them. D8A’s products work across multiple communication channels and with multiple types of public safety data.
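The document does not say how the baseline is computed. A minimal sketch, assuming a trailing-window z-score (a common technique, not necessarily D8A's), flags any period whose count sits more than a chosen number of standard deviations above the recent average. The function name `find_spikes`, the 7-period window, the threshold of 3, and the sample counts are hypothetical.

```python
from statistics import mean, stdev

def find_spikes(counts, window=7, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by more
    than `threshold` sample standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]  # trailing window, excludes the current point
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any increase as irregular.
            if counts[i] > mu:
                spikes.append(i)
        elif (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Hypothetical hourly mention counts with one burst of activity.
hourly_mentions = [12, 15, 11, 14, 13, 12, 16, 14, 58, 15]
print(find_spikes(hourly_mentions))  # prints [8]
```

A trailing window lets the baseline drift with normal rhythms in the data; a fixed historical baseline would be the alternative design.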
When it comes to social data, most users are primarily interested in analyzing Twitter’s real-time global data stream, but our products also work with mobile data streams (text messages), news articles and headlines, blogs, RSS feeds, JSON feeds, and email, and can hook into virtually any available API. This makes our products flexible enough for any type of online activity monitoring.
The added advantage of D8A’s particular set of products is the ability to research, sift through, and sort data streams in real time, allowing organizations to make data-driven decisions while events are still unfolding.
Case Study 1: Risk Patterns
This example illustrates how our suite of products can be used to quickly build risk models using non-traditional data. For instance, the user might upload datasets containing historic crime records for a given area, similar to what is publicly available through CrimeReports.com.
This is what we call our ‘baseline’: historical records showing what has occurred in the past and where. This gives the user data to begin building a sophisticated model that triages real-time data alongside additional historical data.
What type of model? First, let’s say we want to overlay other types of data on the map. This might include demographics, patrol car routes, information on surrounding buildings (e.g., vacant lots), or events.
To do so, we can simply choose from existing data sets containing this information, or upload our own.
Drag ‘Upload Your Data’ onto the canvas.
Choose your file and upload it. The system will automatically begin number crunching to determine the best way to display the file.
You can then drag and drop algorithms to augment the dataset. For instance, you can use our ‘Counts’ module to find patterns such as standard deviations or means in your data.
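The internals of the ‘Counts’ module are not documented here. As a hypothetical stand-in, the sketch below uses Python's standard `statistics` module to count historic incidents per area and compute their mean and population standard deviation; the records, neighborhood names, and `summarize_counts` helper are invented for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical historic incident records: (neighborhood, offense type).
records = [
    ("Downtown", "burglary"), ("Downtown", "theft"), ("Downtown", "theft"),
    ("Downtown", "assault"), ("Eastside", "theft"), ("Eastside", "burglary"),
    ("Westside", "theft"), ("Northgate", "vandalism"),
]

def summarize_counts(records):
    """Count incidents per area; return the counts plus their mean and
    population standard deviation across areas."""
    counts = Counter(area for area, _offense in records)
    values = list(counts.values())
    return counts, mean(values), pstdev(values)

counts, avg, sd = summarize_counts(records)
# Areas more than one standard deviation above the mean incident count.
above_average = sorted(area for area, n in counts.items() if n > avg + sd)
print(above_average)  # prints ['Downtown']
```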
The combined data set can then be used to create other maps or charts. This is an alternative way of doing what some would call ‘data science’: merging, combining, and ultimately displaying disparate information types, all made drag-and-drop simple so that public sector staff need little to no retraining to do it.
Rather than only looking at the past, users can combine historical crime data with real-time happenings to create new visualizations that would ordinarily take a contractor or consultant a lot of time (and usually a lot of money) to produce.
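Merging disparate layers usually comes down to joining them on a shared key. A minimal sketch, assuming each layer has already been reduced to one value per neighborhood (the layers, their values, and the `merge_layers` helper are hypothetical), performs an outer join so that areas missing from one layer still appear, with `None` for that column.

```python
# Hypothetical layers, each keyed by neighborhood name.
incident_counts = {"Downtown": 4, "Eastside": 2, "Westside": 1}
vacant_lots = {"Downtown": 3, "Eastside": 0, "Northgate": 5}
patrol_hours = {"Downtown": 40, "Westside": 12}

def merge_layers(**layers):
    """Outer-join several {area: value} layers into one row per area.
    Areas missing from a layer get None in that layer's column."""
    areas = sorted(set().union(*(layer.keys() for layer in layers.values())))
    return [
        {"area": area, **{name: layer.get(area) for name, layer in layers.items()}}
        for area in areas
    ]

rows = merge_layers(incidents=incident_counts, vacant_lots=vacant_lots,
                    patrol_hours=patrol_hours)
for row in rows:
    print(row)
```

An outer join is chosen over an inner join so that an area with, say, many vacant lots but no recorded incidents is not silently dropped from the map.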
Case Study 2: Remote Investigation and Online Forensics
Another popular use case for our products is remote investigation: using real-time data to create dashboards that can be used to explore an unfolding situation without necessarily being on the scene.
The above map might represent the last known locations of a suspect, or group of suspects, based on data mined from their social media presence. If we know their location, the types of crimes they've committed in the past, and the types of locations or people in the area, we can triangulate potential targets for new criminal activity.
These are called ‘probability zones’: areas where we have concluded something might happen because of some early warning signal. They assist in designing intervention strategies (e.g., “We should send more patrol officers to this area.”).
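The document does not describe how probability zones are computed. One simple approach, assumed here purely for illustration, bins geotagged early-warning signals into a square grid and ranks cells by their share of all signals. `probability_zones`, the 0.01-degree cell size, and the coordinates are hypothetical; production hotspot methods such as kernel density estimation smooth across cell boundaries, which this sketch does not.

```python
import math
from collections import Counter

def probability_zones(points, cell_size=0.01, top_n=3):
    """Bin (lat, lon) signal points into square grid cells of `cell_size`
    degrees and return the top_n cells with the share of signals in each."""
    cells = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in points
    )
    total = sum(cells.values())
    return [(cell, n / total) for cell, n in cells.most_common(top_n)]

# Hypothetical geotagged signals: one cluster of three, one cluster of two.
signals = [
    (32.2215, -110.9262), (32.2218, -110.9265), (32.2211, -110.9268),
    (32.2352, -110.9512), (32.2357, -110.9515),
]
zones = probability_zones(signals)
for cell, share in zones:
    print(cell, f"{share:.0%}")
```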
Case Study 3: Network Mapping
It’s a common tactic in predictive analytics to use the peer networks of criminals to identify others likely to commit criminal activity. Tracing these networks of friends and ‘friends of friends’ is called network analysis. Such networks have shown up in investigations for years in the form of evidence maps, and they can be used to connect people to people, but also people to places and things.
During an investigation, these network maps can become unwieldy.
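Tracing friends and ‘friends of friends’ is a bounded breadth-first search over a relationship graph. A minimal sketch, assuming the network is stored as an adjacency dictionary; the graph and the `within_hops` function are hypothetical, and nodes could just as well be places or things as people.

```python
from collections import deque

def within_hops(graph, start, max_hops=2):
    """Breadth-first search returning {node: hop_distance} for every node
    reachable from `start` in at most `max_hops` edges (start excluded)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen

# Hypothetical undirected relationships stored as adjacency sets.
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
print(within_hops(graph, "A"))  # B and C at 1 hop, D at 2 hops; E is 3 hops away
```

Capping the hop count is one way to keep an evidence map from becoming unwieldy: each extra hop can multiply the number of nodes on the wall.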
With D8A Group’s products, these evidence walls are dynamic and interactive, linking to the gigabytes of data they represent.
Contextual News Discovery
SiftDeck learns to aggregate news headlines based on keywords parsed from aggregated content. This differs from aggregating content based only on the keywords users enter, because it provides a contextual stream of headlines based on the real-time conversation. In other words, SiftDeck recommends potentially related news headlines that a user may not even be aware of, so it serves as a real-time discovery and recommendation engine.
This feature tries to answer the question: “What if I don’t know what I’m looking for?” Rather than requiring the user to program every single detail, our products learn from both the user and the content creators to suggest which news items might be relevant to the investigation underway.
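SiftDeck's learning method is not described. As a much simpler, hypothetical stand-in, the sketch below extracts the most frequent terms from the live conversation and ranks candidate headlines by how many of those terms they share. The stopword list, function names, and sample data are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "for", "at"}

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def top_keywords(messages, n=5):
    """Most frequent non-stopword terms across the live conversation."""
    counts = Counter(w for m in messages for w in tokenize(m))
    return [w for w, _ in counts.most_common(n)]

def recommend_headlines(messages, headlines, n_keywords=5):
    """Rank headlines by how many conversation keywords they contain,
    dropping headlines that share none."""
    keywords = set(top_keywords(messages, n_keywords))
    scored = [(len(keywords & set(tokenize(h))), h) for h in headlines]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [h for score, h in scored if score > 0]

conversation = [
    "Road closure on Main Street after water main break",
    "Water everywhere on Main Street",
    "Main Street flooding, avoid the area",
]
headlines = [
    "City crews repair water main downtown",
    "Council debates budget",
    "Main Street reopens after flooding",
]
print(recommend_headlines(conversation, headlines))
```

The user never typed "water" or "flooding"; those terms come from the conversation itself, which is the discovery behavior the paragraph above describes.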
Momentum
Momentum is the term we use to refer to the qualities of a conversation. Is the conversation activity building or slowing? Are new people joining, or are they leaving? Are the people involved from the beginning conversing more or less than they were at the start? Which keywords, influencers, and communication channels are leading the conversation?
The image above chronicles the drop and eventual rebound of momentum surrounding the various keywords being tracked.
Likewise, looking back over the previous days or weeks shows lulls and bumps in the flow of the conversation over time. These correlate directly with events occurring in the real world and with the virality of news spreading online.
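Several of the momentum questions can be approximated by comparing two consecutive time windows. A minimal sketch, assuming messages arrive as (author, text) pairs already bucketed by window; the `momentum` function, its output fields, and the sample data are hypothetical, not D8A's metrics.

```python
from collections import Counter

def momentum(previous, current):
    """Compare two consecutive windows of (author, text) messages: overall
    volume change, who joined or left, and how returning authors' output changed."""
    prev_counts = Counter(author for author, _text in previous)
    curr_counts = Counter(author for author, _text in current)
    returning = sorted(prev_counts.keys() & curr_counts.keys())
    return {
        # Above 1.0 the conversation is building; below 1.0 it is slowing.
        "volume_ratio": len(current) / len(previous) if previous else None,
        "new_authors": sorted(curr_counts.keys() - prev_counts.keys()),
        "departed_authors": sorted(prev_counts.keys() - curr_counts.keys()),
        # Messages per returning author: (previous window, current window).
        "returning_activity": {a: (prev_counts[a], curr_counts[a]) for a in returning},
    }

previous = [("ana", "roads closed"), ("ben", "any update?"), ("ana", "still closed")]
current = [("ana", "reopening soon"), ("cara", "traffic now"), ("dev", "seen it"),
           ("cara", "moving again"), ("eli", "thanks")]
print(momentum(previous, current))
```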
Real-Time Zeitgeist
What are the recurring themes and phrases in a real-time conversation? The words, phrases, names, and locations that repeat may allow analysts to draw correlations between seemingly unrelated conversations.
Even if one were not looking for a scandal involving Chris Christie, a word cloud that suddenly started surfacing words like ‘scandal’, ‘bridge’, ‘taxes’, and ‘campaign’ (as in the above image) would make it easy to see that a big story might be breaking and to take action. For someone working in public safety, such a dashboard might surface warnings of an unfolding event or situation.
This ability to actively monitor the ‘zeitgeist’, or the thematic relationships between conversations happening across disparate communication channels, often proves powerful for organizations that have to plan interventions or activities in real time.
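One way to approximate this cross-channel view, assumed here for illustration, is to keep only terms that recur in at least two channels and rank them by total frequency. The `cross_channel_terms` helper, the crude minimum-word-length filter (standing in for a proper stopword list), and the sample messages are hypothetical.

```python
import re
from collections import Counter

def cross_channel_terms(channels, min_channels=2, min_len=4):
    """channels: {channel_name: [message, ...]}. Return terms that recur in at
    least `min_channels` channels, ranked by total frequency, then alphabetically."""
    total = Counter()     # occurrences across all channels
    presence = Counter()  # number of channels each term appears in
    for messages in channels.values():
        words = [w for m in messages for w in re.findall(r"[a-z]+", m.lower())
                 if len(w) >= min_len]
        total.update(words)
        presence.update(set(words))
    shared = [w for w in total if presence[w] >= min_channels]
    return sorted(shared, key=lambda w: (-total[w], w))

channels = {
    "twitter": ["Bridge traffic scandal grows", "bridge closures again"],
    "news": ["Governor faces bridge scandal questions"],
    "blogs": ["Campaign taxes debate"],
}
print(cross_channel_terms(channels))  # prints ['bridge', 'scandal']
```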
Filtering by Keyword Exclusion
More importantly, these word clouds make it possible to conditionally filter out conversations that are actually unrelated.
In this case, the recurrence of the word ‘Munich’ in data streams monitoring conversations about Sudan was due to a football match between Sudanese and German teams. After identifying the messages skewing the research, a user of our product Themes can simply click on the word (in this case ‘Munich’) and opt to exclude all data where the non-relevant words appear in the same sentences, while keeping all other data intact.
Organizations using other social media analytics products often overlook that many such tools don’t allow for the selective ‘cleansing’ of datasets to remove misleading or non-relevant content.
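Themes' exact matching rules are not described. A minimal sketch, assuming message-level filtering on whole-word, case-insensitive matches (a simplification of the sentence-level behavior described above), shows how excluding ‘Munich’ removes the football chatter while leaving every other message untouched. `exclude_terms` and the sample messages are hypothetical.

```python
import re

def exclude_terms(messages, excluded):
    """Return only the messages that mention none of the excluded terms
    (case-insensitive, whole-word match); other messages pass unchanged."""
    excluded = {t.lower() for t in excluded}
    return [
        m for m in messages
        if not excluded & set(re.findall(r"[a-z]+", m.lower()))
    ]

messages = [
    "Sudan beat the hosts in Munich",
    "Aid convoy delayed near Juba",
    "Munich crowd was huge",
    "Talks resume in Addis Ababa",
]
print(exclude_terms(messages, {"Munich"}))
```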
Keyword and Phrase Networks
Themes’ network graphs of words and phrases can provide a powerful means of visually controlling the underlying dataset. In this case, clicking on any word in the above graph gives you the option to focus only on content that contains that word, or only on content that doesn’t contain it.
In the above example, a very large dataset was used to show how select keywords appear together with high frequency in the same datasets. By clicking on each word and choosing to focus on or exclude certain ones, the dataset is refined as needed.
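A keyword network like this can be built by counting how often each pair of tracked terms appears in the same message; each pair becomes a weighted edge. A minimal sketch, assuming message-level co-occurrence and a fixed vocabulary (both hypothetical choices, as are the names and sample data). Focusing on or excluding a word then amounts to filtering the messages and rebuilding the edges.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_edges(messages, vocabulary):
    """Weighted edges between vocabulary terms that appear in the same message."""
    vocab = {v.lower() for v in vocabulary}
    edges = Counter()
    for message in messages:
        present = sorted(vocab & set(re.findall(r"[a-z]+", message.lower())))
        # Sorting makes each unordered pair a canonical (a, b) tuple with a < b.
        edges.update(combinations(present, 2))
    return edges

messages = [
    "taxes and scandal again",
    "scandal hits campaign",
    "taxes scandal campaign",
    "weather is nice",
]
edges = cooccurrence_edges(messages, ["taxes", "scandal", "campaign"])
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```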
A researcher might want to view only content where the terms ‘taxes’, ‘liberals’, ‘scandal’, and ‘work’ appear together as they relate to a criminal political scandal. If so, it’s simply a matter of point and click (the collection of keywords in the bottom right), and the data is re-organized to fit those criteria. Terms can just as easily be excluded from the dataset.
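The point-and-click refinement behaves like a Boolean filter: every selected term must be present and no excluded term may appear. A minimal sketch under that assumption; `focus` and the sample messages are hypothetical, and matching is on whole, case-insensitive words.

```python
import re

def focus(messages, require=(), exclude=()):
    """Keep messages containing every term in `require` and none in `exclude`
    (case-insensitive, whole-word match)."""
    req = {t.lower() for t in require}
    exc = {t.lower() for t in exclude}
    kept = []
    for m in messages:
        words = set(re.findall(r"[a-z]+", m.lower()))
        if req <= words and not (exc & words):
            kept.append(m)
    return kept

messages = [
    "Taxes, liberals, scandal: back to work",
    "taxes scandal",
    "scandal work liberals taxes bridge",
    "work taxes liberals",
]
terms = ["taxes", "liberals", "scandal", "work"]
print(focus(messages, require=terms))
print(focus(messages, require=terms, exclude=["bridge"]))
```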
Identifying Influencers
Monitoring digital conversations allows organizations to identify potential ‘thought leaders’, friends of suspects, or other people who may be influential in a given scenario. While it’s usually impossible to verify exactly who these actors are and what their motives are, it’s useful to identify them in order to plan strategies for engagement and outreach.
Having this information allows analysts to follow the public conversations of specific individuals, for instance if any of these (or other) individuals are influential bloggers, employees of other companies, investors, or journalists.
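The document does not define how influence is scored. A minimal sketch, assuming influence is approximated by how many messages from other accounts mention a handle (its in-degree in the mention graph); the `rank_influencers` name, the @handle convention, and the data are hypothetical. Real systems often weigh reach, reposts, or other centrality measures instead.

```python
import re
from collections import Counter

def rank_influencers(messages, top_n=3):
    """messages: [(author, text)]. Score each handle by the number of messages
    from other accounts that mention it; each message counts a handle once."""
    mentions = Counter()
    for author, text in messages:
        for handle in set(re.findall(r"@(\w+)", text)):
            if handle.lower() != author.lower():  # ignore self-mentions
                mentions[handle.lower()] += 1
    return mentions.most_common(top_n)

messages = [
    ("ana", "RT @juba_news: roads closed"),
    ("ben", "@juba_news @ana thanks"),
    ("cara", "agree with @ana"),
    ("ana", "@ana note to self"),
    ("dev", "@juba_news any update?"),
]
print(rank_influencers(messages))  # prints [('juba_news', 3), ('ana', 2)]
```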
Sentiment Analysis
Sentiment analysis is a method of measuring the emotional tone of written text using computer programs. It weighs different words in a body of text against one another to ultimately assign the whole body of text a ‘score’ that is positive, negative, or neutral.
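A minimal lexicon-based sketch of this weighting: each known word carries a weight, a directly preceding negator flips its sign, and the total is mapped to a label. The tiny lexicon, the negator list, and the 0.25 neutral band are hypothetical; D8A's actual model is not described.

```python
import re

# Tiny hypothetical lexicon; real systems use much larger, curated word lists.
LEXICON = {"good": 1.0, "safe": 1.0, "calm": 0.5, "bad": -1.0,
           "angry": -1.0, "fear": -1.0, "attack": -1.5}
NEGATORS = {"not", "no", "never"}

def sentiment(text, threshold=0.25):
    """Sum lexicon weights (flipping the sign of the word right after a
    negator) and map the total to a positive / negative / neutral label."""
    score, flip = 0.0, False
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in NEGATORS:
            flip = True
            continue
        weight = LEXICON.get(word, 0.0)
        score += -weight if flip else weight
        flip = False  # a negator only affects the next word
    if score > threshold:
        label = "positive"
    elif score < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, score

for text in ["Streets are calm and safe tonight", "Not safe downtown", "Meeting at noon"]:
    print(text, "->", sentiment(text))
```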
Why is this useful? Because it allows users to algorithmically determine whether an online conversation is skewing positive or negative in tone.
In the image above, it’s easy to see that of the 6,972 messages analyzed in the first column, 1,679 (about 24%) have been marked as negative in tone, while 700 (about 10%) are positive. If the analyst wants to focus on the dataset that’s been marked negative, they simply click on that area of the graph.
The content and related analysis are then sorted to focus on the ‘negative’ content. To give a use case scenario, this would allow a researcher to view a list of influencers leading the negative tone of a conversation. In the past, this has allowed our users to identify individuals whom they would classify as ‘antagonists’ or ‘instigators’ who might be inciting violence or other unwanted activities. Being able to sort data in this way provides a powerful lens of context and discovery. More importantly, it allows analysts to constantly ask questions of the data itself through our simple drag-and-drop interface.
The above screenshot shows only the analysis of negative-toned content, drawn from a different data set than the previous image. You can see that 379 messages make up the negative content, of which 376 come from Twitter and 3 from Google News, along with a list of potential conversation influencers and how much content each has contributed to the overall conversation. Analysts can now reach out to them directly, or begin monitoring these new sources of interest. Again, all of this is done in real time.
Geography Trends and Locations of Interest
Connecting this type of online research to offline activities and actions is a big part of why people use data products like the ones provided by D8A. We use the social graph and natural language processing to algorithmically map locations of interest for researchers.
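The natural language processing behind this mapping isn't described. As a deliberately simple, hypothetical stand-in for named-entity extraction, the sketch below matches words against a small gazetteer and counts the messages mentioning each place, producing (place, count, coordinates) rows ready to plot. The gazetteer coordinates are approximate and for illustration only, and single-word matching will miss multi-word place names.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name -> approximate (lat, lon).
GAZETTEER = {
    "juba": (4.85, 31.58),
    "bor": (6.21, 31.56),
    "malakal": (9.53, 31.66),
}
PLACES = set(GAZETTEER)

def locations_of_interest(messages):
    """Count messages mentioning each gazetteer place (once per message) and
    attach coordinates, most-mentioned first."""
    counts = Counter()
    for m in messages:
        counts.update(set(re.findall(r"[a-z]+", m.lower())) & PLACES)
    return [(place, n, GAZETTEER[place]) for place, n in counts.most_common()]

messages = [
    "Fighting reported near Bor",
    "Flights from Juba suspended",
    "Juba and Bor roads closed",
    "Calm in Malakal",
    "No news",
]
for row in locations_of_interest(messages):
    print(row)
```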
The power of this information is that, even with minimal knowledge of a situation, the maps and graphs generated tell a story. Knowing the broader context and having professional expertise in the subject matter remain absolutely necessary, but when such knowledge is coupled with these kinds of visual data exploration tools, experts can work faster, more efficiently, and with more nuance.
D8A’s products (SiftDeck, Muxboard/MetaLayer, and Themes) are not meant to replace professional analysts and researchers, but to save them a great deal of time.
Predictive Analytics
When all of our products are combined, it’s possible to anticipate events, demands, or activities that have not happened yet. This type of anticipatory response to data is based on an area of research called predictive analytics.
By combining all of our insights into an informed narrative, researchers may be able to determine the correct actions to take well before they are obvious. As with all systems, these predictions can be wrong, so rather than dictating objectives to researchers, our products serve to provide the appropriate information for informed conversation and action.
When an analyst is viewing multiple dashboards during an unfolding scenario, it’s possible to piece these different insights together to suggest action and give reasons for that action.
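Piecing insights together with reasons attached can be sketched as a small rule set. This is a hypothetical illustration, not D8A's method: the signal names (`volume_z`, `volume_ratio`, `negative_share`), the thresholds, and the two-signal escalation rule are all assumptions.

```python
def suggest_action(signals, spike_z=3.0, momentum_ratio=1.5, negative_share=0.3):
    """Turn dashboard signals into a suggested action plus human-readable reasons.
    Missing or None signals are treated as zero."""
    z = signals.get("volume_z") or 0
    ratio = signals.get("volume_ratio") or 0
    neg = signals.get("negative_share") or 0
    reasons = []
    if z >= spike_z:
        reasons.append(f"volume spike (z = {z:.1f})")
    if ratio >= momentum_ratio:
        reasons.append(f"momentum building ({ratio:.1f}x previous window)")
    if neg >= negative_share:
        reasons.append(f"tone skewing negative ({neg:.0%} of messages)")
    # Require two independent signals before escalating, to limit false alarms.
    action = "escalate for analyst review" if len(reasons) >= 2 else "keep monitoring"
    return action, reasons

action, reasons = suggest_action({"volume_z": 4.2, "volume_ratio": 2.0, "negative_share": 0.24})
print(action)
for reason in reasons:
    print(" -", reason)
```

Returning the reasons alongside the action keeps the final call with the analyst, consistent with the point that predictions can be wrong.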
In one use of our products in South Sudan, well before stories played out in the media, our team identified several influencers in-country and around the world.
We knew that the situation was no longer confined to South Sudan but was now affecting the whole East Africa region; we knew that there appeared to be a rapid build of momentum in the conversations on the evening of January 9th leading into the 10th; and we knew that the thematic tone of conversation was trending towards some sort of conflict. We also had related breaking news stories confirming as much.
Risk Mitigation and the Timing of Information
While it’s possible to come to the same conclusions in a number of other ways, the timing of information often dictates its value, as does the time it takes to aggregate all data sources to predict future outcomes.
For a Wall Street broker, receiving information that the CEO of a major company is about to be fired might indicate that he needs to sell his position in that company’s stock. However, receiving the information after the fact (e.g., “the CEO was fired yesterday”) is an entirely different scenario. The first scenario allows him to mitigate risk in anticipation of a potential disaster. The second allows him to make the same decisions, but the information is less valuable because he has less control over how the news affects things: a portion of the risk has already been realized. For the Wall Street broker, the value of information could be measured in millions or billions of dollars. For humanitarian organizations and journalists, the risk we try to help them mitigate might be measured in loss of life and property, or at the very least in the quality of life of the people affected by these events. For brands or public figures, reputation is directly correlated with their value and ability to derive revenue from customers.
D8A’s products are designed to shift critical analysis of any situation, event, or phenomenon from a retroactive exploration to a real-time one. In the above scenario, the case was made that the value of information is closely tied to its timing.
Thus, even if our products only slightly move the needle on the timing of information, that shift translates directly into the value the analysis provides. Knowing how to potentially affect a situation in real time can be far more valuable than waiting for everything to play out, only to deal with the aftermath.
While such actions need to be tempered with consideration for culture, context, privacy, and law, there is great value in time-shifting the analytics so that organizations can react to events more readily because they were able to anticipate potential risk scenarios.