inome
Jim Adler, VP Data Systems & Chief Privacy Officer, inome (@jim_adler, http://jimadler.me). The Genomics of How We All Fit Together. Overture & 3 Acts: About inome, Strata Redux, Felon Classifier, Closing Arguments.
inome: The Genomics of How We All Fit Together
Jim Adler, VP Data Systems & Chief Privacy Officer, inome
@jim_adler, http://jimadler.me
OVERTURE & 3 ACTS
1. About inome
2. Strata Redux
3. Felon Classifier
4. Closing Arguments
[Venn diagram labels: Intelligence, Social Ineptitude, Obsession; Dork, Dweeb, Nerd, Geek]
I am not an attorney.
ABOUT INOME
Real-time, person-centric data engine
Structured and unstructured data
10 years in the making
Scalable: serves over 1 million visitors a day
APIs support 3rd-party apps: http://developer.inome.com
When towns were small …
[Diagram labels: Interaction, Information, Social, Genomics]
inome is bringing the “local village” back
HOW WE ALL FIT TOGETHER
Billions of Records
Millions of People
Jim Adler, Houston, TX, age 68
Jim Adler, Redmond, WA, age 48
Jim Adler, Denver, CO, age 48
Jim Adler, McKinney, TX, age 57
Jim Adler, Canaan, NH, age 59
Jim Adler, Hastings, NE, age 32
213 records mapped to the correct 37 Jim Adlers
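One way to picture "213 records mapped to 37 people" is record clustering. The sketch below is a minimal illustration, not inome's method: it assumes, hypothetically, that two records with the same name describe the same person when they also share a date of birth or a phone number, and it merges such records with a union-find structure. The records, field names, and `cluster_records` helper are made up for illustration.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over record ids, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_records(records, link_fields=("dob", "phone")):
    """Group records into people. Hypothetical rule: same name plus any shared link field."""
    ds = DisjointSet()
    by_key = defaultdict(list)
    for r in records:
        ds.find(r["id"])  # register every record, even ones that link to nothing
        for f in link_fields:
            if r.get(f):
                by_key[(r["name"], f, r[f])].append(r["id"])
    for ids in by_key.values():
        for other in ids[1:]:
            ds.union(ids[0], other)
    people = defaultdict(list)
    for r in records:
        people[ds.find(r["id"])].append(r["id"])
    return sorted(people.values())


# Made-up records: 4 records that resolve to 2 people.
records = [
    {"id": 1, "name": "jim adler", "dob": "1900-01-01", "phone": "555-0100"},
    {"id": 2, "name": "jim adler", "dob": "1900-01-01", "phone": None},
    {"id": 3, "name": "jim adler", "dob": None, "phone": "555-0100"},
    {"id": 4, "name": "jim adler", "dob": "1900-02-02", "phone": "555-0199"},
]
print(cluster_records(records))  # [[1, 2, 3], [4]]
```

Note the transitive merge: records 2 and 3 share no field directly but end up together through record 1. That is exactly where real entity resolution needs safeguards against over-merging.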
HOW INOME SOLVES THE “BIG DATA” PEOPLE PROBLEM
Philip Collins: 375 people
Jim Adler: 213 records, 37 people
Randolph Hutchins: 5 people
Gwen Fleming: 2 people
Carol Brooks: 9,800 records, 1,250 people
THE INOME ENGINE
[Architecture diagram components]
Data Acquisition: acquire, standardize, validate, extract
Data Exchange
Document Store
Full Text Search Index
Machine Learners
Blocking, Clustering
Features
Names, Places, Phones
Court Records, News/Blogs, Professional
Relatives, Friends, Colleagues
inome Data Model (IDM)
APIs: http://developer.inome.com
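Blocking, listed among the engine's components, is the standard record-linkage technique for avoiding a comparison of every record with every other one: records are bucketed by a cheap key, and only records in the same bucket become candidate pairs for detailed matching. A minimal sketch, assuming a made-up key (normalized last name plus first initial); inome's actual blocking keys are not described in the slides, and `block_key`/`candidate_pairs` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    """Hypothetical blocking key: normalized last name plus first initial."""
    first = record["first"].strip().lower()
    last = record["last"].strip().lower()
    return (last, first[:1])


def candidate_pairs(records):
    """Return index pairs that share a block; only these go on to detailed matching."""
    blocks = defaultdict(list)
    for i, record in enumerate(records):
        blocks[block_key(record)].append(i)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


records = [
    {"first": "Jim", "last": "Adler"},
    {"first": "James", "last": "Adler"},
    {"first": "Carol", "last": "Brooks"},
    {"first": "Jim ", "last": "ADLER"},
    {"first": "Gwen", "last": "Fleming"},
]
pairs = candidate_pairs(records)
print(pairs)  # [(0, 1), (0, 3), (1, 3)]
print(len(pairs), "of", len(records) * (len(records) - 1) // 2)  # 3 of 10
```

The trade-off is recall: two records for the same person that land in different blocks (say, a misspelled last name) are never compared, so production systems typically use several blocking keys.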
ACT 1: Strata Redux
“Watch your thoughts, they become words.
Watch your words, they become actions.
Watch your actions, they become habits.
Watch your habits, they become your character.
Watch your character, it becomes your destiny.”
Lao Tzu
“… the essential crime that contained all others in itself. Thoughtcrime, they called it.”
George Orwell
[Diagram: PRIVACY, framed by PLACES, PLAYERS, and PERILS]
http://jimadler.me/post/14171086020/creepy-is-as-creepy-does
http://jimadler.me/post/18618791545/strata-2012-is-privacy-a-big-data-prison
THE PLACES-PLAYERS-PERILS PRIVACY FRAMEWORK
PLACES-PLAYERS-PERILS CASES
[Chart: axes labeled More Private Places and More Player Power; a region labeled Gap]
ACT 2: Felon Classifier
Contributors: Jeremy Kahn, Senior Scientist; Deepak Konidena, Software Engineer
THE CLASSIFIER’S GOAL
If someone has minor offenses on their criminal record,
do they also have any felonies?
MOTIVATIONS
Ask the hard questions
Convene the suits, wonks, and geeks
Drive responsible innovation
Explore the data & showcase the technology
A FEW DEFINITIONS
Positive: has at least one felony
Negative: has no felonies but does have lesser offenses

Classifier performance
True positive: correctly identifies a felon
True negative: correctly ignores someone who isn't a felon
False positive: incorrectly identifies someone as a felon who isn't one
False negative: incorrectly ignores a felon
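These four outcomes are the cells of a confusion matrix, and the error rates on the performance chart are ratios of them. A minimal sketch, assuming the slides use the standard denominators (FP rate = FP / all non-felons, FN rate = FN / all felons); the ten-person example is invented.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN, where True means 'has at least one felony'."""
    tp = tn = fp = fn = 0
    for is_felon, flagged in zip(actual, predicted):
        if is_felon and flagged:
            tp += 1  # correctly identifies a felon
        elif not is_felon and not flagged:
            tn += 1  # correctly ignores a non-felon
        elif flagged:
            fp += 1  # flags someone who is not a felon
        else:
            fn += 1  # misses a felon
    return tp, tn, fp, fn


def error_rates(tp, tn, fp, fn):
    """FP rate = FP / (FP + TN); FN rate = FN / (FN + TP)."""
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return fp_rate, fn_rate


actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]
counts = confusion_counts(actual, predicted)
print(counts)  # (3, 5, 1, 1)
print(error_rates(*counts))  # (0.16666666666666666, 0.25)
```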
DATA EXTRACTION AND CLEANSING
250 M defendants (Avro files)
INOME ENGINE: Data Acquisition, Data Exchange, Blocking, Linking, Clustering
40 M defendants
State fan-out: Ohio, Alabama, Florida, Kentucky (60 K), Delaware, Texas, Virginia
Noise filter
15 K labels
15 K predictors
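The fan-out and noise-filter steps can be sketched as below. Assumptions, loudly: each defendant record carries a single `state` field (the example data actually lists several states per person), and the filtering rule is invented for illustration, since the slides do not say what the noise filter removes.

```python
from collections import defaultdict


def fan_out_by_state(defendants):
    """Partition defendant records by state so each state can be processed separately."""
    by_state = defaultdict(list)
    for d in defendants:
        by_state[d["state"]].append(d)
    return dict(by_state)


def noise_filter(defendants):
    """Hypothetical rule: keep defendants with at least one offense, all of which carry an OffenseClass."""
    return [
        d for d in defendants
        if d["offenses"] and all(o.get("OffenseClass") for o in d["offenses"])
    ]


defendants = [
    {"state": "DE", "offenses": [{"OffenseClass": "M"}]},
    {"state": "MD", "offenses": [{"OffenseClass": "F"}, {"OffenseClass": "M"}]},
    {"state": "MD", "offenses": [{"OffenseDesc": "UNKNOWN"}]},
    {"state": "MD", "offenses": []},
]
by_state = fan_out_by_state(defendants)
clean = {state: noise_filter(group) for state, group in by_state.items()}
print({state: len(group) for state, group in clean.items()})  # {'DE': 1, 'MD': 1}
```

Fanning out by state first keeps each state's court-record quirks (field formats, offense coding) isolated, so per-state cleaning rules can differ.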
EXAMPLE DATA
Prediction data:
key: e926f511b7f8289c64130a266c66411e
val:
  offenses:
  - {CaseID: MDAOC206059-2, CaseInfo: 'CASE DISPO: TRIAL, CJIS CODE: 3 5010', Disposition: STET, Key: hyg-MDAOC206059, OffenseClass: M, OffenseCount: '2', OffenseDate: '20041205', OffenseDesc: 'THEFT:LESS $500 VALUE'}
  - {CaseID: MDAOC206060-1, CaseInfo: 'CASE DISPO: TRIAL, CJIS CODE: 1 4803', Disposition: GUILTY, Key: hyg-MDAOC206060, OffenseClass: M, OffenseCount: '1', OffenseDate: '20040928', OffenseDesc: FALSE STATEMENT TO OFFICER}
  profile: {BodyMarks: 'TAT L ARM; ,TAT L SHLD: N/A; ,TAT R ARM: N/A; ,TAT R SHLD: N/A; ,TAT RF ARM; ,TAT UL ARM; ,TAT UR AR', DOB: '19711206', DOB.Completeness: '111', EyeColor: HAZEL, Gender: m, HairColor: BROWN, Height: 5'8", SkinColor: FAIR, State: 'DE,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD', Weight: 180 LBS}

Training labels:
key: e926f511b7f8289c64130a266c66411e
val:
  label: true
  offenses:
  - {CaseID: MDAOC206065-4, CaseInfo: 'CASE DISPO: TRIAL, CJIS CODE: 1 6501', Disposition: NOLLE PROSEQUI, Key: hyg-MDAOC206065, OffenseClass: F, OffenseCount: '1', OffenseDesc: ARSON 2ND DEGREE}
Model Operation
INOME Person Profile → Prediction Data (person information, non-felony offense information) → Model → Has any felonies?

Model Training
INOME Person Profile → Prediction Data (profile information, non-felony offense information) + Training Labels (felony offense information) → Features → Learn Model
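The training/operation split above amounts to holding felony offenses out of the model's inputs and using them only as the label. A minimal sketch, assuming `OffenseClass: F` marks a felony (as the ARSON 2ND DEGREE example suggests) and that records are plain dicts shaped like the example data; `split_record` is a hypothetical name.

```python
def split_record(record):
    """Split one defendant record into (prediction data, training label).

    Assumes OffenseClass 'F' marks a felony. Felony offenses are withheld
    from the prediction data so the model never sees the answer it is
    being trained to predict.
    """
    offenses = record.get("offenses", [])
    non_felony = [o for o in offenses if o.get("OffenseClass") != "F"]
    has_felony = any(o.get("OffenseClass") == "F" for o in offenses)
    prediction_data = {"profile": record.get("profile", {}), "offenses": non_felony}
    return prediction_data, has_felony


# Offenses taken from the example data, trimmed to the fields used here.
record = {
    "profile": {"Gender": "m", "EyeColor": "HAZEL"},
    "offenses": [
        {"OffenseClass": "M", "OffenseDesc": "THEFT:LESS $500 VALUE"},
        {"OffenseClass": "M", "OffenseDesc": "FALSE STATEMENT TO OFFICER"},
        {"OffenseClass": "F", "OffenseDesc": "ARSON 2ND DEGREE"},
    ],
}
features_input, label = split_record(record)
print(len(features_input["offenses"]), label)  # 2 True
```

Per the definitions, the classifier only covers people with lesser offenses on record, so a fuller version would also drop records whose `non_felony` list comes back empty.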
MODEL FEATURES
Personal Profile
Person.NumBodyMarks
Person.HasTattoo
Person.IsMale
Person.HairColor
Person.EyeColor
Person.SkinColor
Criminal Profile
Offenses.NumOffenses
Offenses.OnlyTraffic
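Several of the listed features can be computed from a profile like the one in the example data. A minimal sketch: the parsing of the `BodyMarks` string is a guess at its format (entries ending in `N/A` are treated as "no mark recorded"), the helper names are hypothetical, and `Offenses.OnlyTraffic` is left out because the slides do not show how traffic offenses are coded.

```python
def parse_body_marks(raw):
    """Split the raw BodyMarks string into individual marks.

    Hypothetical reading of the format: entries are comma-separated,
    may end in ';', and entries ending in 'N/A' are treated as empty.
    """
    marks = []
    for entry in (raw or "").split(","):
        entry = entry.strip().strip(";").strip()
        if entry and not entry.endswith("N/A"):
            marks.append(entry)
    return marks


def person_features(record):
    """Compute a subset of the listed model features for one record."""
    profile = record.get("profile", {})
    marks = parse_body_marks(profile.get("BodyMarks"))
    offenses = record.get("offenses", [])
    return {
        "Person.NumBodyMarks": len(marks),
        "Person.HasTattoo": any(m.startswith("TAT") for m in marks),
        "Person.IsMale": profile.get("Gender", "").lower() == "m",
        "Offenses.NumOffenses": len(offenses),
    }


# BodyMarks and Gender copied from the example data.
record = {
    "profile": {
        "BodyMarks": "TAT L ARM; ,TAT L SHLD: N/A; ,TAT R ARM: N/A; ,TAT R SHLD: N/A; "
                     ",TAT RF ARM; ,TAT UL ARM; ,TAT UR AR",
        "Gender": "m",
    },
    "offenses": [{"OffenseClass": "M"}, {"OffenseClass": "M"}],
}
print(person_features(record))
# {'Person.NumBodyMarks': 4, 'Person.HasTattoo': True, 'Person.IsMale': True, 'Offenses.NumOffenses': 2}
```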
EXAMPLE FEATURE
# Minimal stand-in so the example runs; Gemini's real Extractor base class is not shown.
class Extractor:
    def extract(self, record):
        raise NotImplementedError


class EyeColor(Extractor):
    # Map abbreviations seen in court records to canonical colors.
    normalizer = {'bro': 'brown', 'blu': 'blue', 'blk': 'black',
                  'hzl': 'hazel', 'haz': 'hazel', 'grn': 'green'}
    schema = {'type': 'enum', 'name': 'EyeColors',
              'symbols': ('black', 'brown', 'hazel', 'blue', 'green', 'other', 'unknown')}

    def extract(self, record):
        recorded = record.get('profile', {}).get('EyeColor')
        if not recorded:
            return 'unknown'
        recorded = recorded.strip().lower()
        recorded = self.normalizer.get(recorded, recorded)
        # Accept values that begin with a known color, e.g. 'brown/green' -> 'brown'.
        for symbol in self.schema['symbols']:
            if recorded.startswith(symbol):
                return symbol
        return 'other'
THE CODE
Gasket: an inome functional toolset for data extraction (Avro, JSON, and YAML)
Gemini: an inome framework for feature extraction and learning (domain-knowledge feature extractors; model construction from features and labels)
Felon detector available now: http://github.com/inome/strataconf-2013-sc
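The slides describe Gemini only as a framework that builds models from domain-knowledge feature extractors and labels. As a rough, hypothetical picture of that flow (not Gemini's actual API), named extractor functions can be applied to each record and the results turned into numeric vectors paired with labels, ready for any learner. `vectorize` and `build_dataset` are illustrative names.

```python
def vectorize(features):
    """Turn a dict of mixed-type feature values into a sparse numeric vector.

    Booleans become 0/1, numbers pass through, and strings (such as an
    enum like EyeColor) are one-hot encoded as 'name=value'.
    """
    vec = {}
    for name, value in features.items():
        if isinstance(value, bool):  # check bool before int: bool is a subclass of int
            vec[name] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            vec[name] = float(value)
        else:
            vec[f"{name}={value}"] = 1.0
    return vec


def build_dataset(records, extractors, label_of):
    """Apply every extractor to every record and pair the result with its label."""
    return [
        (vectorize({name: fn(r) for name, fn in extractors.items()}), label_of(r))
        for r in records
    ]


extractors = {
    "IsMale": lambda r: r["profile"].get("Gender", "").lower() == "m",
    "EyeColor": lambda r: r["profile"].get("EyeColor", "unknown").lower(),
    "NumOffenses": lambda r: len(r["offenses"]),
}
records = [
    {"profile": {"Gender": "m", "EyeColor": "HAZEL"}, "offenses": [{}, {}], "label": True},
    {"profile": {"Gender": "f"}, "offenses": [{}], "label": False},
]
dataset = build_dataset(records, extractors, lambda r: r["label"])
print(dataset[0])  # ({'IsMale': 1.0, 'EyeColor=hazel': 1.0, 'NumOffenses': 2.0}, True)
```

Keeping extractors as small named functions mirrors the "domain knowledge feature extractors" idea: each one encodes a single judgment about messy source data and can be tested on its own.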
FELON CLASSIFIER PERFORMANCE
[Chart: false negative rate (0% to 100%) against false positive rate (0% to 20%); the two ends of the trade-off are labeled "Anarchy" and "Tyranny"]
Threshold 0.66: FP rate 5%, FN rate 22%
Threshold 1.01: FP rate 1%, FN rate 40%
Threshold -1.82: FP rate 19%, FN rate 0%
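Each labeled point on the curve is one decision threshold on the classifier's score: raising the threshold trades false positives for false negatives. A minimal sketch of sweeping thresholds, assuming a person is flagged when their score is at or above the threshold; the scores are invented, and the "lowest FN rate within an FP budget" rule is one possible way to pick an operating point, not necessarily how the 0.66 threshold was chosen.

```python
def rates_at(scores, labels, threshold):
    """FP and FN rates when everyone scoring at or above `threshold` is flagged as a felon."""
    fp = tn = fn = tp = 0
    for score, is_felon in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_felon:
            tp += 1
        elif flagged:
            fp += 1
        elif is_felon:
            fn += 1
        else:
            tn += 1
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return fp_rate, fn_rate


def best_threshold(scores, labels, max_fp_rate):
    """Lowest-FN threshold whose FP rate stays within budget (inf flags no one)."""
    best = None
    for t in sorted(set(scores)) + [float("inf")]:
        fp_rate, fn_rate = rates_at(scores, labels, t)
        if fp_rate <= max_fp_rate and (best is None or fn_rate < best[2]):
            best = (t, fp_rate, fn_rate)
    return best


scores = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0]
labels = [True, True, False, True, False, True, False, False]
print(rates_at(scores, labels, 1.0))         # (0.25, 0.5)
print(best_threshold(scores, labels, 0.25))  # (0.5, 0.25, 0.25)
```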
ALTERNATING DECISION TREE
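An alternating decision tree (the ADTree of Freund and Mason) scores an input by adding up the values of every prediction node it reaches; the sign of the total, or a comparison against a tuned threshold, gives the class. The tree below is a hypothetical illustration with invented weights over features from the feature list, not the model inome trained. The 0.66 comparison reuses a threshold from the performance chart purely as an example, since that threshold applies to the real model's scores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PredictionNode:
    value: float
    children: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    condition: Callable[[dict], bool]
    if_true: PredictionNode
    if_false: PredictionNode


def adtree_score(node, x):
    """Sum the prediction values along every path the input reaches.

    Unlike an ordinary decision tree, every splitter under a reached
    prediction node is evaluated, so several paths contribute at once.
    """
    total = node.value
    for splitter in node.children:
        branch = splitter.if_true if splitter.condition(x) else splitter.if_false
        total += adtree_score(branch, x)
    return total


# Hypothetical tree with made-up weights (not the trained inome model).
tree = PredictionNode(-0.5, [
    Splitter(lambda x: x["NumOffenses"] > 3,
             PredictionNode(1.0, [
                 Splitter(lambda x: x["IsMale"], PredictionNode(0.25), PredictionNode(-0.125)),
             ]),
             PredictionNode(-0.25)),
    Splitter(lambda x: x["HasTattoo"], PredictionNode(0.5), PredictionNode(-0.25)),
])

person = {"NumOffenses": 5, "IsMale": True, "HasTattoo": True}
score = adtree_score(tree, person)
print(score, score >= 0.66)  # 1.25 True
```

Because each input collects contributions from several rules, the score is a weighted vote; the nested `IsMale` splitter only counts for people who already passed the `NumOffenses > 3` test.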
ACT 3: Closing Arguments
[Chart: axes labeled More Private Places and More Player Power; a region labeled Gap]
Public data used by powerful government players, resulting in perilous consequences such as stops, seizures, arrests, and imprisonment
FROM INFERENCES TO ACTIONS
Fourth Amendment checks government abuses
Principles of reasonable suspicion
Geographic profiling
Criminal profiling

References
Predictive Policing. Andrew Guthrie Ferguson, University of the District of Columbia Law. http://ssrn.com/abstract_id=2050001
Rethinking Racial Profiling. Bernard Harcourt, University of Chicago Law. http://www.law.uchicago.edu/files/files/rethinking_racial_profiling.pdf
Looking at Prediction from an Economics Perspective. Yoram Margalioth. http://bernardharcourt.com/documents/margalioth-againstprediction.pdf
REASONABLE SUSPICION
Courts have upheld profiling
Predictive information alone is never enough
1. Reliable
2. Efficient
3. Particularized
4. Detailed
5. Timely
6. Corroborated
GEOGRAPHIC PROFILING
Profile identifies a higher-crime area
Small area (500 sq ft) to avoid profiling whole neighborhoods
Must be corroborated by witnessed criminal activity
What about police “stops” outside the profiled area?
“Very soon, we will be moving to a predictive policing model where, by studying real time crime patterns, we can anticipate where a crime is likely to occur.”
Chief William Bratton, Los Angeles Police Department, testimony to the US House
September 24, 2009
predpol.com
CRIMINAL PROFILING
“Computerized” tips and profiles
Predicting crime for specific individuals
Courts have held that profiling is a reasonable factor
Violates punishment theory of equal chances of getting caught
Ratcheting creates a closed loop of confusion
Self-fulfilling prophecy by controlling profile
SUMMARY
Big data inferences are thought, not crime
Speech and action could be criminal
… So think carefully
Check us out
Classifier available at http://github.com/inome
APIs for exploring people data at http://developer.inome.com
It’s in inome