big data. new physics. - marist college
© 2016 IBM Corporation
Big Data. New Physics. And Geospatial Superfood
Jeff Jonas, IBM Fellow
Chief Scientist, Context Computing
http://www.twitter.com/jeffjonas
www.jeffjonas.typepad.com
Jeff Jonas
IBM Fellow
Chief Scientist, Context Computing
Founded Systems Research & Development (SRD) in 1985
Architected, designed, developed roughly 100 systems over the last three decades
– Defense, intelligence
– Financial services
– Gaming
– Law enforcement
Acquired by IBM in 2005
Currently focused on Context Computing, Sensemaking and Privacy by Design (PbD)
“The data must find the data and the relevance must find you.”
Trend: Organizations Are Getting Dumber
[Chart. Axes: Time, Increasing Compute Power. Labels: Sensemaking Algorithms; Available Observation Space; Context; Enterprise Amnesia]
“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
~ Eric Schmidt, CEO Google
Trend: Organizations Are Getting Dumber
[Chart. Axes: Time, Increasing Compute Power. Labels: Sensemaking Algorithms; Available Observation Space; Context; WHY?]
Algorithms at Dead End.
You Can’t Squeeze Knowledge Out of a Pixel.
Context
“Better understanding something by taking into account the things around it.”
I ducked as the bat flew my way.
Another exciting baseball game.
In Context
Vendor
High Value Customer
Job Applicant
Former Employee
Bad Guy
Context Accumulation
“The incremental process of integrating new observations with previous observations.”
Context Accumulating
Context Accumulation
Contextualized Observations
Observation (Any kind of data from any kind of sensor)
Context Informs Decisioning
Context Accumulation
Contextualized Observations
Observation In Context
Decisioning
Act
Data Finds Data. Relevance Finds You. The Data is the Question.
Observation (Any kind of data from any kind of sensor)
The Puzzle Metaphor
Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes, colors
What it represents is unknown – there is no picture on hand
Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
Some pieces are duplicates, missing, incomplete or have errors
Some pieces may even be professionally fabricated lies
Until you take the pieces to the table, it is nearly impossible to assess the scene
Puzzling Images: Courtesy Ravensburger © 2011
270 pieces (90%)
200 pieces (66%)
150 pieces (50%)
6 pieces (2%)
30 pieces (10%, duplicates)
First Discovery
More Data Finds Data
Duplicates in Front Of Your Eyes
First Duplicate Found Here
Incremental Context – Incremental Discovery
6:40pm START
22min “Hey, this one is a duplicate!”
35min “I think some pieces are missing.”
37min “Looks like a bunch of hillbillies on a porch.”
44min “Hillbillies, playing guitars, sitting on a porch, near a barber sign and a banjo!”
150 pieces (50%)
Incremental Context – Incremental Discovery
47min “We should take the sky and grass off the table.”
2hr “Let’s switch sides, and see if we can make sense of this from different perspectives.”
2hr10m “Wait, there are three … no, four puzzles.”
2hr18m “I think you threw in a few random pieces.”
PUZZLING EXPERIMENT #3
ADULTS AT SIBOS 2011
SIBOS Conference 2011
100 executives, 10 teams
10 puzzles, 10 small tables
Duplicate and missing pieces
Lessons:
1. They learned federated search bites.
2. I watched as an early bias misdirected their attention … but then over time new observations corrected this bias.
How Context Accumulates
With each new observation one asserts: 1) Un-associated; 2) near neighbors; or 3) connected
Must favor the false negative
New observations sometimes reverse earlier assertions
Some observations produce novel discovery
The emerging picture helps focus collection interests
As the working space expands, computational effort increases
Then given sufficient observations there comes a tipping point whereby decision certainty increases while compute effort decreases!
Overstated Population
[Chart. Axes: Observations, Unique Identities. Label: True Population]
Counting Unique Identities is Difficult
File 1: Mark Smith, DOB: 6/12/1978, SSN: 1234
File 2: Mark R Smith, 707.433.0000, DL: 5678
The Rise and Fall of a Population
[Chart. Axes: Observations, Unique Identities. Label: True Population]
Context Accumulation
File 1: Mark Smith, DOB: 6/12/1978, SSN: 1234
File 2: Mark R Smith, 707.433.0000, DL: 5678
New Record: Mark Randy Smith, SSN: 1234, DL: 5678
Data Triangulation
File 1
NAME              DOB        SSN   DL    PHONE
Mark Smith        6/12/1978  1234
Mark R Smith                       5678  707.433.0000
Mark Randy Smith             1234  5678
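The Mark Smith records above can be turned into a small, hedged sketch of context accumulation. This is not the author's engine: the `ContextStore` class, the choice of SSN and DL as the only linking features, and the merge rule are all assumptions made for illustration. It shows how a third observation can collapse two apparent identities into one.

```python
# Features treated as strong identifiers in this sketch (an assumption).
KEY_FEATURES = ("ssn", "dl")


class ContextStore:
    """Toy context accumulator: records sharing a key feature value resolve together."""

    def __init__(self):
        self.entities = {}   # entity id -> list of records
        self.index = {}      # (feature, value) -> entity id
        self._next_id = 1

    def add(self, record):
        # Find every existing entity that shares a key feature value with the record.
        matches = {self.index[(f, record[f])]
                   for f in KEY_FEATURES
                   if f in record and (f, record[f]) in self.index}
        if matches:
            target = min(matches)
            # A new observation can connect entities that were separate before.
            for other in matches - {target}:
                self.entities[target].extend(self.entities.pop(other))
        else:
            target = self._next_id
            self._next_id += 1
            self.entities[target] = []
        self.entities[target].append(record)
        # Re-point the index at the surviving entity.
        for rec in self.entities[target]:
            for f in KEY_FEATURES:
                if f in rec:
                    self.index[(f, rec[f])] = target
        return target


store = ContextStore()
store.add({"name": "Mark Smith", "dob": "6/12/1978", "ssn": "1234"})
store.add({"name": "Mark R Smith", "phone": "707.433.0000", "dl": "5678"})
count_before = len(store.entities)   # nothing links the first two records yet
store.add({"name": "Mark Randy Smith", "ssn": "1234", "dl": "5678"})
count_after = len(store.entities)    # the third record connects both
print(count_before, count_after)
```

A real resolver would also weigh names, dates of birth and conflicting features before merging; this sketch merges on any shared identifier only to keep the idea visible.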
36
© 2016 IBM Corporation
Big Data [in context]. New Physics.
More data: better the predictions– Lower false positives– Lower false negatives
More data: bad data good– Suddenly glad your data is not perfect
More data: less compute
37
© 2016 IBM Corporation
Big Data
Pile of ______
Information In Context
One Essential Form of Context: “Entity Resolution”
Is it 5 people each with 1 account or is it 1 person with 5 accounts?
Is it 20 cases of SARS in 20 cities or one case reported 20 times?
If one cannot count, one cannot estimate vector or velocity (direction, speed).
Without vector and velocity, prediction is nearly impossible.
Who is Fang Wong?
Fang Wong: Top 100 Customer
F A Wong, Seattle, DOB: 6/12/82: Former Customer
@FangWong: 2.5M Followers
[email protected] Subscriber
Fang [email protected] Department’s Prospect List
Resolving the Fang Wong
Fang Wong: Top 100 Customer
F A Wong, Seattle, DOB: 6/12/82: Former Customer
@FangWong: 2.5M Followers
[email protected] Subscriber
Fang [email protected] Department’s Prospect List
Resolving the Fang Wong
Fang Wong
Top 100 Customer
2.5M Followers
Newsletter Subscriber
Graphing the (resolved) Fang Wong
Bill Smith: Member of the Board
Employee
Customer
Customer
Fraudster
Fang Wong: Top 100 Customer, 2.5M Followers, Newsletter Subscriber
Contextualizing Sandy Maden
Entity Analytics + Events, Transactions, Space/Time, Etc.
Bill Smith: Member of the Board
Sandy Maden: New Customer
Employee
Lives With
Co-signer
Former Employee (term no rehire)
Customer
Customer
Customer
Fraudster
Fang Wong: Top 100 Customer, 2.5M Followers, Newsletter Subscriber
“Entities”
Bill Smith: Member of the Board
Sandy Maden: New Customer
Employee
Lives With
Co-signer
Former Employee (term no rehire)
Customer
Customer
Customer
Fraudster
Fang Wong: Top 100 Customer, 2.5M Followers, Newsletter Subscriber
Company
Boat
Plane
Router
Car
Asteroid
New Think About Entity Resolution
Exactly Same: Bob Jones 123455 / Bob Jones 123455
Fuzzy: Bob Jones 123455 / Robert T Jonnes 000123455
Incompatible Features: Bob Jones 123455 / Bob@TheCo
Deceit: Bob Jones 123455 / Ken Wells 550119
Key Features Enable Entity Resolution
People          Cars               Router
Name            License Plate No.  Serial Number
Address         VIN                MAC Address
Date of Birth   Make               IP Address
Phone           Model              Make
Passport        Year               Model
Nationality     Color              Firmware Version
Biometric       Etc.               Etc.
Etc.
Consider Lying Identical Twins
PASSPORT: #123, Sue, 3/3/84, Uberstan, Exp 2011
PASSPORT: #123, Sue, 3/3/84, Uberstan, Exp 2011
Fingerprint
DNA
Most Trusted Authority
“Same person – trust me.”
Most Trusted Authority
The same thing cannot be in two places … at the same time.
Two different things cannot occupy the same space … at the same time.
Space & Time Enables Absolute Disambiguation
People          Cars               Router
Name            License Plate No.  Serial Number
Address         VIN                MAC Address
Date of Birth   Make               IP Address
Phone           Model              Make
Passport        Year               Model
Nationality     Color              Firmware Version
Biometric       Etc.               Etc.
Etc.
When            When               When
Where           Where              Where
“Life Arcs” Are Also Telling
Bill Smith, 4/13/67, Salem, Oregon
Bill Smith, 4/13/67, Seattle, Washington
Address History: Tampa, FL 2008-2015; Biloxi, MS 2005-2008; NY, NY 1996-2005; Tampa, FL 1984-1996
Address History: San Diego, CA 2005-2015; San Fran, CA 2005-2005; Phoenix, AZ 1990-2005; San Jose, CA 1982-1990
OMG
Space-Time-Travel
Cell phones are generating a staggering amount of geo-locational data – 600B transactions per day being created in the US alone
This data is being “de-identified” and shared with third parties – in volume and in real-time
Your movement quickly reveals where you spend your time
Re-identification (figuring out who is who) is somewhat trivial
And, oh so powerful predictions …
The 10 People I Spend the Most Time With (Not at Home and Not at Work)
Michelle Renee Peggy Erin Joshua Ivan Bob Amanda Dane Wesley
He must be following me!
Unfair Advantage?
The Uberstan intelligence service preempts the next mass protest in real-time
A political opponent is crushed and resigns two days after announcing their candidacy
Consequences
Space-time-travel data is the ultimate biometric
It will enable enormous opportunity
It will unravel one’s secrets
It will challenge existing notions of privacy
Adoption is now accelerating at a blistering pace
Introducing
Space-Time-Boxes (STB’s)
Using the Geohash Function
4 byte geohash w21z = +/- 20km
6 byte geohash w21z4y = +/- 610m
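For readers who have not met the geohash function, here is a minimal sketch of the standard public encoding: bits alternate between longitude and latitude, each bit halves the remaining range, and every five bits become one base-32 character. The function name `geohash_encode` is chosen for this example and is not taken from the talk.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# A point in Singapore at two precisions: the shorter hash is a prefix of the longer.
coarse = geohash_encode(1.2843622, 103.86103, 4)
fine = geohash_encode(1.2843622, 103.86103, 6)
print(coarse, fine)  # w21z w21z73
```

Because a shorter geohash is always a prefix of a longer one for the same point, truncating the string is all it takes to move to a larger box.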
4 Byte Geohashes (+/- 20km)
6 Byte Geohashes (+/- 610 meters)
8 Byte Geohashes (+/- 19 meters)
General Purpose Space-Time-Box (STB) Scale
Global Hierarchy          Ever  Yr  Mon  Day  Hr  15min  5min  1min  5sec  500ms  100ms
2 char Geohash (630km)    X     X   X    X    X   X      X     X     X     X      X
3 char Geohash (78km)     X     X   X    X    X   X      X     X     X     X      X
4 char Geohash (20km)     X     X   X    X    X   X      X     X     X     X      X
5 char Geohash (2.4km)    X     X   X    X    X   X      X     X     X     X      X
6 char Geohash (610m)     X     X   X    X    X   X      X     X     X     X      X
7 char Geohash (75m)      X     X   X    X    X   X      X     X     X     X      X
8 char Geohash (19m)      X     X   X    X    X   X      X     X     X     X      X
9 char Geohash (2.83m)    X     X   X    X    X   X      X     X     X     X      X
10 char Geohash (59cm)    X     X   X    X    X   X      X     X     X     X      X
Example: STB’s for Banking Interactions in Singapore
Online banking (+/- 2.4km)
Merchant transactions (+/- 610m)
ATM (+/- 610m)
Mobile services (+/- 610m)
“W21XX|2012” STB(1.192353, 103.421235, 2012-07-08 00:17:37)
“W21ZWQ|2011-12” STB(1.212363, 103.591156, 2011-12-29 10:56:09)
“W21Z73|2012-07” STB(1.2843622, 103.86103, 2012-07-04 15:08:12)
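The keys above suggest that a space-time-box is a geohash truncated to some length, joined by `|` to a timestamp truncated to some granularity. The sketch below builds such a key from an already computed geohash. The helper name `stb_key` and the granularity names are assumptions; only the year and month layouts appear on the slide, and the day and hour layouts are guesses that follow the same pattern.

```python
from datetime import datetime

# strftime patterns per time granularity. "year" and "month" match the slide's
# examples; "day" and "hour" are assumed to follow the same pattern.
TIME_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
}


def stb_key(geohash, when, granularity):
    """Join an upper-cased geohash and a truncated timestamp into one STB key."""
    return f"{geohash.upper()}|{when.strftime(TIME_FORMATS[granularity])}"


month_key = stb_key("w21z73", datetime(2012, 7, 4, 15, 8, 12), "month")
year_key = stb_key("w21z7", datetime(2012, 7, 4, 15, 8, 12), "year")
print(month_key)  # W21Z73|2012-07
print(year_key)   # W21Z7|2012
```

Two observations that produce the same key fell into the same box of space during the same slice of time, which is what makes the key usable as a plain join or grouping column.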
Toying with Publicly Available Cell Phone Data
35,831 Call Data Records (CDRs)
– 6 months: From 08-31-2009 through 02-27-2010
18,391 Total Number of Usable CDRs
– Excluded CDRs with missing latitude, longitude, time, flow, or accuracy > 250 meters
2,444 Hangouts
– Minimum of 2 events, spanning at least 15 minutes, in a 610m STB
The Pattern of Life
– 130 Hangouts total
– 64 Hangouts 3 or more times
Ummm … seems we are living in habitrails!
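The hangout rule above (at least 2 events, spanning at least 15 minutes, in one 610m box) can be sketched as follows. How the events were actually grouped is not stated on the slide, so this version makes an assumption: consecutive events in the same cell form one visit. The cell names and timestamps are invented for the example.

```python
from datetime import datetime, timedelta
from itertools import groupby

MIN_EVENTS = 2
MIN_SPAN = timedelta(minutes=15)


def find_hangouts(events):
    """events: iterable of (timestamp, cell) pairs, where cell is e.g. a 6-char geohash.

    Returns a list of (cell, first_seen, last_seen) for each qualifying visit.
    """
    ordered = sorted(events, key=lambda e: e[0])
    hangouts = []
    # groupby only merges adjacent items, so each group is one uninterrupted visit.
    for cell, run in groupby(ordered, key=lambda e: e[1]):
        times = [t for t, _ in run]
        if len(times) >= MIN_EVENTS and times[-1] - times[0] >= MIN_SPAN:
            hangouts.append((cell, times[0], times[-1]))
    return hangouts


# Hypothetical events; the cell names are made up.
events = [
    (datetime(2009, 9, 1, 9, 0), "cell_a"),
    (datetime(2009, 9, 1, 9, 20), "cell_a"),   # 20 minutes in cell_a: a hangout
    (datetime(2009, 9, 1, 10, 0), "cell_b"),
    (datetime(2009, 9, 1, 10, 5), "cell_b"),   # only 5 minutes: too short
    (datetime(2009, 9, 1, 12, 0), "cell_a"),   # a single event: too few
]
hangouts = find_hangouts(events)
print(hangouts)
```

Counting how often the same cell shows up across the returned visits would give the "3 or more times" figure that defines the pattern of life.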
Malte Spitz’s Hangouts
Getting to Know Malte Spitz
Six months of my life in 35,000 records
http://www.malte-spitz.de/blog/4103927.html
MARITIME DOMAIN AWARENESS
Data Sources
Vessel Registry
Real-time AIS
Other Sources
Historical AIS
Watch List
Sensemaking for Maritime Domain Awareness
Arrival Notifications
Announced Crew
Business Listings
“Space-Time Box” Densities
Global Hierarchy          Ever  Yr  Mon  Day  Hr  15min  5min  1min  5sec  500ms  100ms
2 char Geohash (630km)    X     X   X    X    X   X      X     X     X     X      X
3 char Geohash (78km)     X     X   X    X    X   X      X     X     X     X      X
4 char Geohash (20km)     X     X   X    X    X   X      X     X     X     X      X
5 char Geohash (2.4km)    X     X   X    X    X   X      X     X     X     X      X
6 char Geohash (610m)     X     X   X    X    X   X      X     X     X     X      X
7 char Geohash (75m)      X     X   X    X    X   X      X     X     X     X      X
8 char Geohash (19m)      X     X   X    X    X   X      X     X     X     X      X
9 char Geohash (2.83m)    X     X   X    X    X   X      X     X     X     X      X
10 char Geohash (59cm)    X     X   X    X    X   X      X     X     X     X      X
Vessels Hovering in Space-Time-Boxes
Example Insights
Insight Code      Description                Score
WL_ENT            Watch Listed Entity        +85
WL_RELATIONS      Watch Listed Relationship  +50
IDENTITY_DECEIPT  Identity Deceit            +45
UNFAMILIAR        Unfamiliar Vessel          +30
CONFUSION         Confused Entity            +25
HAZ_CARGO         Hazardous Cargo            +20
POL_CHANGE        Pattern of Life Change     +10
FAMILIAR          Familiar Vessel            -20
TRUSTED_ENT       Trusted Entity             -40
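One way to read the insight codes and scores above is as additive weights. The sketch below assumes, without confirmation from the talk, that a vessel's relevance is the plain sum of the scores of its accumulated insights, and that the analyst sees only the top few vessels. The function names and the vessel names are hypothetical.

```python
# Scores copied from the slide; the codes are kept exactly as spelled there.
INSIGHT_SCORES = {
    "WL_ENT": 85,
    "WL_RELATIONS": 50,
    "IDENTITY_DECEIPT": 45,
    "UNFAMILIAR": 30,
    "CONFUSION": 25,
    "HAZ_CARGO": 20,
    "POL_CHANGE": 10,
    "FAMILIAR": -20,
    "TRUSTED_ENT": -40,
}


def relevance_score(insight_codes):
    """Sum the scores of a vessel's insights (summation is an assumption)."""
    return sum(INSIGHT_SCORES[code] for code in insight_codes)


def top_vessels(vessel_insights, max_items):
    """Rank vessels by relevance, highest first, and keep at most max_items."""
    ranked = sorted(vessel_insights,
                    key=lambda v: relevance_score(vessel_insights[v]),
                    reverse=True)
    return ranked[:max_items]


# Hypothetical vessels and the insights accumulated for each.
vessel_insights = {
    "vessel_a": ["HAZ_CARGO", "POL_CHANGE", "IDENTITY_DECEIPT"],  # 20 + 10 + 45
    "vessel_b": ["FAMILIAR", "TRUSTED_ENT"],                      # -20 - 40
    "vessel_c": ["UNFAMILIAR"],                                   # 30
}
shortlist = top_vessels(vessel_insights, max_items=2)
print(shortlist)  # ['vessel_a', 'vessel_c']
```

Negative scores let familiar and trusted vessels sink out of view, which is how a scheme like this would keep human attention on the few that matter.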
Insights
Developing Actionable Intelligence
Relevant
Actionable
Insights accumulate
Enabling relevance detection
Prioritized with a max number of items
Hazardous Cargo
Pattern of Life Change
Identity Deceit
Sensing and Responding
ASTEROID HUNTING
Single Detection
Image courtesy of: Eva Lilly, Institute of Astronomy, University of Hawaii
From Orphans to Orbits
Backcasting
Single Detections (trash)
Tracklet
Track
Orbit
Forecasting
Named entity: S100ZUtza
Single Detection (orphan)
Anticipation
http://www.space.com/7854-slam-asteroids-suspected-space-collision.html
"We have directly observed a collision between asteroids for the first time, instead of having to infer that they happened from million-year-old remains."
Colin Snodgrass
Planetary Scientist
Max Planck Institute for Solar System Research
Geospatial Context via “Space Time Boxes”
Detecting Colocation
TIME
1 day
1 hour
Determine encounter distance and time
Space Time Boxes
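Colocation detection with space time boxes can be sketched as a bucketing step: every observation is filed under its box key, and any two different entities that land in the same box become a candidate encounter. The function name and the box keys below are invented for illustration, and the sketch ignores pairs that sit just across a box boundary, which a real system would have to handle.

```python
from collections import defaultdict
from itertools import combinations


def candidate_encounters(observations):
    """observations: iterable of (entity_id, stb_key) pairs.

    Returns a set of (entity_a, entity_b, stb_key) for entities sharing a box.
    """
    boxes = defaultdict(set)
    for entity_id, key in observations:
        boxes[key].add(entity_id)
    pairs = set()
    for key, entities in boxes.items():
        # Only entities inside the same box are compared, never all against all.
        for a, b in combinations(sorted(entities), 2):
            pairs.add((a, b, key))
    return pairs


# Hypothetical predicted positions, already reduced to box keys.
observations = [
    ("asteroid_a", "BOX42|2032-05-01"),
    ("asteroid_b", "BOX42|2032-05-01"),
    ("asteroid_c", "BOX17|2032-05-01"),
    ("asteroid_a", "BOX17|2032-05-02"),
]
encounters = candidate_encounters(observations)
print(encounters)
```

Only the pairs that survive this coarse pass need the more expensive step of determining the actual encounter distance and time, for instance by repeating the pass with smaller boxes.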
Asteroid vs. Asteroid Encounters
Encounter      Distance  Asteroid 1  Size       Asteroid 2  Size
May 1, 2032    299km     00A9170     2-4km      0008758     4-9km
Nov 24, 2016   449km     00P5634     1-2km      0055711     2-5km
Jan 11, 2018   449km     K08E88J     530-1200m  00N0062     2-4km
Computing 600k Asteroid Interactions over 25 Years
4-5 orders of magnitude improvement
Method                    Initial Analysis      Adding 1 New Trajectory
Space-Time Box Method     2,880 CPU hours       15 CPU minutes
N-body Simulation Method  10,000,000 CPU hours  4,000 CPU hours
Collaborator Call-out
Mudhakar Srivatsa, IBM Research
Raghu Ganti, IBM Research
Sexy Bald Guy
June 12th, 2015
Hi Jeff & the gang,
I have great news! On Tuesday I happened to observe a close encounter you guys predicted - one 1 km and the other one 2 km in diameter!
To my knowledge this is the first case ever of direct observation of a close encounter in the small main belt asteroids. The closest point of encounter unfortunately happened during bright daylight in Hawaii, so I missed it …
Cheers!
Eva
Image courtesy of Eva Lilly, Institute of Astronomy, University of Hawaii
[Theatrical Pause]
IN CLOSING
Action
Red Analytics
Green Analytics
Blue Analytics
Observation Space
Old School: Isolated Analytics
Observation Space
Action
Information In Context
Next: General Purpose Sensemaking
Data Finds Data. Relevance Finds You.
Sensemaking
Observation Space
Action
Information In Context
Data Finds Data. Relevance Finds You.
Helping Focus Human Attention
Sensemaking
General Purpose
• Threat & Fraud
• Marketing
• Asteroids
Simultaneously!
Sensemaking Architecture
Deep Reflection
Discovered Patterns
Context Accumulation
Contextualized Observations
Observation In Context
Decisioning
Act
Observation (Any kind of data from any kind of sensor)
Data Finds Data. Relevance Finds You.
Data Mining
Machine Learning
Feature Extraction Transformation
Scoring & Predictive Models
Event Processing
Entity Resolution
Relationship Graphing
The most competitive organizations
are going to make sense of what they are observing
fast enough to do something about it
while they are observing it.
Related Blog Posts
www.jeffjonas.typepad.com
Data Finds Data
Puzzling: How Observations Are Accumulated Into Context
Big Data. New Physics.
Fantasy Analytics
G2 is 4
“No one writes bomb on manifest!”
Email: [email protected]
Blog: www.jeffjonas.typepad.com
Twitter: http://www.twitter.com/jeffjonas
Questions?
WIDENING OBSERVATION SPACES
A SNEAK PEEK INTO MY CURRENT WORK
Dealing with Probabilities
Deep Reflection
Discovered Patterns
Context Accumulation
Contextualized Observations
Observation In Context
Decisioning
Act
Observation (Any kind of data from any kind of sensor)
Certainty 6.25%
Dealing with Probabilities
Original Observations
Mark Smith, 123 Main Street, Santa Rosa, CA, DOB: 5/12/1974
Mark Smith, Santa Rosa, CA, 702.433.8871
Additional Data
Confirmed across 3 credit bureaus: Mark Smith, 123 Main Street, Santa Rosa, CA, DOB: 5/12/1974, 702.433.8871
Confirmed across two data aggregators:
Mark Smith, Santa Rosa, 05/12/74
- Only one observed
123 Main Street, Santa Rosa, CA
- No other Marks
- No other Smiths
702.433.8871
- Exclusive to Mark Smith
(*) 16 Mark Smiths live in Santa Rosa, CA [ref: http://www.intelius.com/results.php?trackit=63&ReportType=1&qf=Mark&qmi=&qn=Smith&qs=CA&qc=Santa+Rosa]
Certainty 6.25%*
Decision Certainty
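The 6.25% figure appears to follow from the footnote: with 16 Mark Smiths in Santa Rosa and nothing else to tell them apart, the chance that two records refer to the same one is 1 in 16. The sketch below spells out that arithmetic. The function name is hypothetical, and treating every candidate as equally likely is an assumption.

```python
def identity_certainty(candidate_count):
    """Chance of picking the right person among equally likely candidates."""
    return 1 / candidate_count


baseline = identity_certainty(16)   # 16 Mark Smiths live in Santa Rosa, CA
print(f"{baseline:.2%}")            # 6.25%
```

Additional data that rules candidates out shrinks the denominator, which is why the extra confirmations raise decision certainty.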
Using Curiosity to Increase Decision Certainty
Deep Reflection
Discovered Patterns
Context Accumulation
Contextualized Observations
Observation In Context
Decisioning
Act
Observation (Any kind of data from any kind of sensor)
Selective Curiosity
Is it worth being curious about?
If yes: Figure Out Who to Ask
Make Request(s)
Assembly of Responses into Observations
Certainty 6.25%
Before
Deep Reflection
Discovered Patterns
Context Accumulation
Contextualized Observations
Observation In Context
Decisioning
Act
Observation (Any kind of data from any kind of sensor)
Certainty 6.25%
After
Deep Reflection
Discovered Patterns
Context Accumulation
Contextualized Observations
Observation In Context
Decisioning
Act
Observation (Any kind of data from any kind of sensor)
Decision Certainty
Why Selective Curiosity Matters
Patent US8620927
There are many domains where even 99% accuracy is not good enough, e.g.,
– Healthcare
– Financial Services
– National Security
– Autonomous Vehicles
In the coming era of Internet of Things, robots, and cognitive computing “decision certainty” is going to make or break these advances.
Selective Curiosity will make this possible …
SELECTIVE CURIOSITY IN ACTION
A TRUE STORY
One Day …
A lady we will call “Jane Doe” friends me on Facebook. Not sure I know her, I check her About page and see she works for the US Government. But that’s it.
This piques my curiosity.
Who is this and where does she work?
Next …
I notice we have a friend in common named “Stu Shea.”
I sat on Stu Shea’s Board of Directors at the United States Geospatial Intelligence Foundation (USGIF). He knows interesting people.
I find myself all that much more curious: Who is “Jane Doe?”
So …
I Google [“Jane Doe” and “Stu Shea”].
Ah Ha! She works for the National Geospatial-Intelligence Agency (NGA).
FYI: Job title, email address and more are easily found with one more search!