inome

inome The Genomics of How We All Fit Together Jim Adler VP Data Systems & Chief Privacy Officer inome @jim_adler http://jimadler.me

Post on 10-Feb-2016


TRANSCRIPT

Page 1: inome

inome
The Genomics of How We All Fit Together

Jim Adler
VP Data Systems & Chief Privacy Officer, inome

@jim_adler
http://jimadler.me

Page 2: inome

OVERTURE & 3 ACTS

1. About inome

2. Strata Redux

3. Felon Classifier

4. Closing Arguments

Page 3: inome

I am not an Attorney

[Venn diagram: circles labeled Intelligence, Social Ineptitude, and Obsession; regions labeled Geek, Dork, Dweeb, and Nerd]

Page 4: inome

ABOUT INOME

Real-time, person-centric data engine

Structured and unstructured data

10 years in the making

Scalable – serves over 1 million visitors a day

APIs support 3rd party apps – http://developer.inome.com

Page 5: inome

When towns were small …

Page 6: inome

[Diagram labels: INTERACTION, INFORMATION, SOCIAL, GENOMICS]

Page 7: inome

inome is bringing the “local village” back

Page 8: inome

HOW WE ALL FIT TOGETHER

Page 9: inome

HOW INOME SOLVES THE “BIG DATA” PEOPLE PROBLEM

Billions of Records

Millions of People

Jim Adler, Houston, TX, Age 68
Jim Adler, Redmond, WA, Age 48
Jim Adler, Denver, CO, Age 48
Jim Adler, McKinney, TX, Age 57
Jim Adler, Canaan, NH, Age 59
Jim Adler, Hastings, NE, Age 32

213 records mapped to the correct 37 Jim Adlers

Philip Collins: 375 People
Jim Adler: 213 Records, 37 People
Randolph Hutchins: 5 People
Gwen Fleming: 2 People
Carol Brooks: 9800 Records, 1250 People

Page 10: inome

THE INOME ENGINE

[Architecture diagram; labels recovered from the slide]

Components: Data Acquisition, Data Exchange, Document Store, Full Text Search Index, Features, Machine Learners

Processing: Acquire, Standardize, Validate, Extract; Blocking; Clustering

inome Data Model (IDM)

Names, Places, Phones

Court Records, News/Blogs, Professional

Relatives, Friends, Colleagues

APIs: http://developer.inome.com
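The Blocking and Clustering stages named on this slide can be illustrated with a toy record-linkage sketch. This is not inome's implementation: the record fields, the blocking key (lowercased name), and the match rule (same state and same age) are assumptions chosen only to show how a handful of records can be grouped into distinct people.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    """Hypothetical blocking key: only records sharing a name are compared."""
    return record["name"].lower()


def same_person(a, b):
    """Hypothetical match rule: same state and same age."""
    return a["state"] == b["state"] and a["age"] == b["age"]


def cluster(records):
    """Block records, link matching pairs, and return clusters of record indices."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking: bucket record indices by key so comparisons stay within a bucket.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # Linking: merge every pair inside a block that satisfies the match rule.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if same_person(records[i], records[j]):
                parent[find(i)] = find(j)

    # Clustering: collect indices that share a union-find root.
    people = defaultdict(list)
    for idx in range(len(records)):
        people[find(idx)].append(idx)
    return sorted(people.values())


# Toy records, loosely modeled on the names shown on the previous slide.
records = [
    {"name": "Jim Adler", "state": "WA", "age": 48},
    {"name": "JIM ADLER", "state": "WA", "age": 48},
    {"name": "Jim Adler", "state": "TX", "age": 68},
    {"name": "Carol Brooks", "state": "WA", "age": 48},
]
clusters = cluster(records)
```

Blocking keeps the number of pairwise comparisons manageable: only records in the same bucket are ever compared, which matters when the input is billions of records.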

Page 11: inome

ACT 1

Strata Redux

Page 12: inome
Page 13: inome

“Watch your thoughts, they become words. Watch your words, they become actions. Watch your actions, they become habits. Watch your habits, they become your character. Watch your character, it becomes your destiny.”

Lao Tzu

“… the essential crime that contained all others in itself. Thoughtcrime, they called it.”

George Orwell

Page 14: inome

THE PLACES-PLAYERS-PERILS PRIVACY FRAMEWORK

[Diagram labels: PRIVACY, PLACES, PLAYERS, PERILS]

http://jimadler.me/post/14171086020/creepy-is-as-creepy-does

http://jimadler.me/post/18618791545/strata-2012-is-privacy-a-big-data-prison

Page 15: inome

PLACES-PLAYERS-PERILS CASES

[Chart axes: MORE PRIVATE PLACES; MORE PLAYER POWER GAP]

Page 16: inome

ACT 2

Felon Classifier

Contributors

Jeremy Kahn, Senior Scientist

Deepak Konidena, Software Engineer

Page 17: inome

THE CLASSIFIER’S GOAL

If someone has minor offenses on their criminal record,

do they also have any felonies?

Page 18: inome

MOTIVATIONS

Ask the hard questions

Convene the suits, wonks, and geeks

Drive responsible innovation

Explore the data & showcase the technology

Page 19: inome

A FEW DEFINITIONS

Definition

Positive: Has at least one felony

Negative: Has no felonies but does have lesser offenses

Classifier Performance

True Positive: Correctly identifies a felon

True Negative: Correctly ignores someone who isn’t a felon

False Positive: Incorrectly identifies as a felon someone who isn’t one

False Negative: Incorrectly ignores a felon
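The four outcomes defined above can be counted directly from paired actual and predicted labels. The sketch below uses the standard rate definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)); the function names and the eight sample labels are made up for illustration and are not taken from the inome data.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for parallel lists of booleans (True = felon)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a and p),
        "tn": sum(1 for a, p in pairs if not a and not p),
        "fp": sum(1 for a, p in pairs if not a and p),
        "fn": sum(1 for a, p in pairs if a and not p),
    }


def error_rates(counts):
    """Return (false positive rate, false negative rate)."""
    fp_rate = counts["fp"] / (counts["fp"] + counts["tn"])
    fn_rate = counts["fn"] / (counts["fn"] + counts["tp"])
    return fp_rate, fn_rate


# Illustrative labels only: three felons and five non-felons.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

counts = confusion_counts(actual, predicted)
fp_rate, fn_rate = error_rates(counts)
```

With these sample labels, one of the five non-felons is flagged (a false positive) and one of the three felons is missed (a false negative).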

Page 20: inome

DATA EXTRACTION AND CLEANSING

[Pipeline diagram; labels recovered from the slide]

250 M Defendants (avro files)

INOME ENGINE: Data Acquisition, Data Exchange, Blocking, Linking, Clustering

40 M Defendants

State Fan-Out: Ohio, Alabama, Florida, Kentucky (60 K), Delaware, Texas, Virginia

Noise Filter

15K Labels

15K Predictors

Page 21: inome

EXAMPLE DATA

Prediction Data

key: e926f511b7f8289c64130a266c66411e
val:
  offenses:
  - {CaseID: MDAOC206059-2, CaseInfo: 'CASE DISPO: TRIAL, CJIS CODE: 3 5010',
    Disposition: STET, Key: hyg-MDAOC206059, OffenseClass: M, OffenseCount: '2',
    OffenseDate: '20041205', OffenseDesc: 'THEFT:LESS $500 VALUE'}
  - {CaseID: MDAOC206060-1, CaseInfo: 'CASE DISPO: TRIAL, CJIS CODE: 1 4803',
    Disposition: GUILTY, Key: hyg-MDAOC206060, OffenseClass: M, OffenseCount: '1',
    OffenseDate: '20040928', OffenseDesc: FALSE STATEMENT TO OFFICER}
  profile: {BodyMarks: 'TAT L ARM; ,TAT L SHLD: N/A; ,TAT R ARM: N/A; ,TAT R SHLD: N/A; ,TAT RF ARM; ,TAT UL ARM; ,TAT UR AR',
    DOB: '19711206', DOB.Completeness: '111', EyeColor: HAZEL, Gender: m,
    HairColor: BROWN, Height: 5'8", SkinColor: FAIR,
    State: 'DE,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD', Weight: 180 LBS}

Training Labels

key: e926f511b7f8289c64130a266c66411e
val:
  label: true
  offenses:
  - {CaseID: MDAOC206065-4, CaseInfo: 'CASE DISPO: TRIAL, CJIS CODE: 1 6501',
    Disposition: NOLLE PROSEQUI, Key: hyg-MDAOC206065, OffenseClass: F,
    OffenseCount: '1', OffenseDesc: ARSON 2ND DEGREE}

Page 22: inome

[Diagram: model training and model operation; labels recovered from the slide]

Model Training

INOME Person Profile: Profile Information, Non-Felony Offense Information, Felony Offense Information

Prediction Data

Training Labels

Features

Learn Model

Model Operation

INOME Person Profile: Person Information, Non-Felony Offense Information

Prediction Data

Model

Has any felonies?
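The split between prediction data and training labels can be sketched as a single function over one person record. This is an illustration rather than the inome code: it assumes the record layout from the example data slide and assumes that an OffenseClass of 'F' marks a felony, as the labeled example suggests. The name split_profile is hypothetical.

```python
def split_profile(record):
    """Split one person record into (prediction_data, label).

    Assumption: offenses with OffenseClass 'F' are felonies. They are withheld
    from the prediction data and used only to derive the training label.
    """
    offenses = record.get("offenses", [])
    non_felonies = [o for o in offenses if o.get("OffenseClass") != "F"]
    prediction_data = {
        "profile": record.get("profile", {}),
        "offenses": non_felonies,
    }
    label = any(o.get("OffenseClass") == "F" for o in offenses)
    return prediction_data, label


# Abbreviated record using field names from the example data slide.
record = {
    "profile": {"Gender": "m", "EyeColor": "HAZEL"},
    "offenses": [
        {"OffenseClass": "M", "OffenseDesc": "THEFT:LESS $500 VALUE"},
        {"OffenseClass": "M", "OffenseDesc": "FALSE STATEMENT TO OFFICER"},
        {"OffenseClass": "F", "OffenseDesc": "ARSON 2ND DEGREE"},
    ],
}
prediction_data, label = split_profile(record)
```

Keeping felony records out of the prediction data is what makes the task meaningful: the model sees only the profile and the lesser offenses, both in training and in operation.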

Page 23: inome

MODEL FEATURES

Personal Profile

Person.NumBodyMarks

Person.HasTattoo

Person.IsMale

Person.HairColor

Person.EyeColor

Person.SkinColor

Criminal Profile

Offenses.NumOffenses

Offenses.OnlyTraffic

Page 24: inome

EXAMPLE FEATURE

# Extractor is the framework's base class; it is not shown on the slide.
class EyeColor(Extractor):
    # Map common three-letter abbreviations to full color names.
    normalizer = {'bro': 'brown', 'blu': 'blue', 'blk': 'black',
                  'hzl': 'hazel', 'haz': 'hazel', 'grn': 'green'}
    # Avro-style enum schema listing the allowed output values.
    schema = {'type': 'enum', 'name': 'EyeColors',
              'symbols': ('black', 'brown', 'hazel', 'blue',
                          'green', 'other', 'unknown')}

    def extract(self, record):
        recorded = record['profile'].get('EyeColor', None)
        if recorded is None:
            return 'unknown'
        recorded = recorded.lower()
        if recorded in self.normalizer:
            recorded = self.normalizer[recorded]
        # Collapse values that merely start with a known color to that color.
        for i in self.schema['symbols']:
            if recorded.startswith(i):
                recorded = i
        if recorded in self.schema['symbols']:
            return recorded
        else:
            return 'other'
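Some of the other features on the model features slide could be computed along the same lines. The sketch below is a guess at plausible definitions, written as plain functions so it runs on its own: the 'TAT' abbreviation test, the 'm' gender code, and the function names are assumptions based on the example data, not the inome extractors.

```python
def has_tattoo(record):
    """Person.HasTattoo: assume any 'TAT' entry in BodyMarks indicates a tattoo."""
    return "TAT" in record["profile"].get("BodyMarks", "").upper()


def is_male(record):
    """Person.IsMale: assume Gender is recorded as 'm' for male."""
    return record["profile"].get("Gender", "").lower() == "m"


def num_offenses(record):
    """Offenses.NumOffenses: number of offense entries on the record."""
    return len(record.get("offenses", []))


def extract_features(record):
    """Build a flat feature dict keyed by the names used on the features slide."""
    return {
        "Person.HasTattoo": has_tattoo(record),
        "Person.IsMale": is_male(record),
        "Offenses.NumOffenses": num_offenses(record),
    }


# Abbreviated record in the shape of the example data slide.
record = {
    "profile": {"BodyMarks": "TAT L ARM; ,TAT L SHLD: N/A", "Gender": "m"},
    "offenses": [{"OffenseClass": "M"}, {"OffenseClass": "M"}],
}
features = extract_features(record)
```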

Page 25: inome

THE CODE

Gasket – an inome functional toolset for data extraction

Avro, JSON, and YAML

Gemini – an inome framework for feature extraction and learning

Domain knowledge feature extractors

Model construction from features and labels

Felon detector available now: http://github.com/inome/strataconf-2013-sc

Page 26: inome

FELON CLASSIFIER PERFORMANCE

[Chart: False Negative Rate (0% to 100%) against False Positive Rate (0% to 20%); regions labeled ANARCHY and TYRANNY]

Threshold: 0.66, FP Rate: 5%, FN Rate: 22%

Threshold: 1.01, FP Rate: 1%, FN Rate: 40%

Threshold: -1.82, FP Rate: 19%, FN Rate: 0%
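The trade-off between the two error rates comes from moving a threshold over the classifier's real-valued score. The sketch below shows the mechanism on made-up scores and labels. It assumes a person is flagged as a felon when the score is at or above the threshold, which matches the direction of the numbers on the slide (a higher threshold gives fewer false positives and more false negatives) but is otherwise an assumption.

```python
def rates_at_threshold(scores, labels, threshold):
    """Flag 'felon' when score >= threshold; return (fp_rate, fn_rate)."""
    pairs = list(zip(scores, labels))
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    return fp / negatives, fn / positives


# Illustrative scores and labels only (True = has a felony).
scores = [-2.0, -1.0, -0.5, 0.2, 0.7, 0.9, 1.2, 1.5]
labels = [False, False, True, False, True, False, True, True]

low = rates_at_threshold(scores, labels, -1.5)   # permissive: flags almost everyone
mid = rates_at_threshold(scores, labels, 0.5)    # balanced
high = rates_at_threshold(scores, labels, 1.1)   # strict: flags very few
```

On this toy data the low threshold misses no felons but flags three of four non-felons, while the high threshold flags no non-felons but misses half the felons.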

Page 27: inome

ALTERNATING DECISION TREE
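An alternating decision tree scores an instance by adding up real-valued prediction values: a root value plus one value from every decision node whose precondition is reached. The sum is then compared against a threshold. The sketch below shows only that scoring step. The rules and weights are hypothetical and are not taken from the trained inome model; the feature names are borrowed from the model features slide.

```python
def adt_score(features, root_value, rules):
    """Sum the root value and one prediction value per reachable decision node.

    Each rule is (precondition, condition, value_if_true, value_if_false).
    A rule contributes only when its precondition holds for the instance.
    """
    score = root_value
    for precondition, condition, value_if_true, value_if_false in rules:
        if precondition(features):
            score += value_if_true if condition(features) else value_if_false
    return score


def always(features):
    """Precondition for decision nodes attached directly to the root."""
    return True


# Hypothetical rules and weights, for illustration only.
rules = [
    (always, lambda f: f["Offenses.NumOffenses"] > 3, 0.5, -0.25),
    (always, lambda f: f["Offenses.OnlyTraffic"], -1.0, 0.25),
    # Nested node: evaluated only on the "not only traffic" branch.
    (lambda f: not f["Offenses.OnlyTraffic"],
     lambda f: f["Person.HasTattoo"], 0.25, -0.125),
]

person_a = {"Offenses.NumOffenses": 5, "Offenses.OnlyTraffic": False,
            "Person.HasTattoo": True}
person_b = {"Offenses.NumOffenses": 1, "Offenses.OnlyTraffic": True,
            "Person.HasTattoo": False}

score_a = adt_score(person_a, root_value=-0.5, rules=rules)
score_b = adt_score(person_b, root_value=-0.5, rules=rules)
```

Because the output is a signed sum rather than a hard class, thresholds such as the ones on the performance slide can be placed anywhere along the score range.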

Page 28: inome

ACT 3

Closing Arguments

Page 29: inome

[Chart axes: MORE PRIVATE PLACES; MORE PLAYER POWER GAP]

Public data used by powerful government players resulting in perilous consequences like stop, seizure, arrest, and imprisonment

Page 30: inome

FROM INFERENCES TO ACTIONS

Fourth Amendment checks government abuses

Principles of reasonable suspicion

Geographic Profiling

Criminal Profiling

References

Predictive Policing
Andrew Guthrie Ferguson, U of District of Columbia Law
http://ssrn.com/abstract_id=2050001

Rethinking Racial Profiling
Bernard Harcourt, U Chicago Law
http://www.law.uchicago.edu/files/files/rethinking_racial_profiling.pdf

Looking at Prediction from an Economics Perspective
Yoram Margalioth
http://bernardharcourt.com/documents/margalioth-againstprediction.pdf

Page 31: inome

REASONABLE SUSPICION

Courts have upheld profiling

Predictive information never enough

1. Reliable

2. Efficient

3. Particularized

4. Detailed

5. Timely

6. Corroborated

Page 32: inome

GEOGRAPHIC PROFILING

Profile identifies higher crime area

Small area, 500 sq ft to avoid profiling neighborhoods

Must be corroborated by witnessed criminal activity

What about police “stops” outside the profiled area?

“Very soon, we will be moving to a predictive policing model where, by studying real time crime patterns, we can anticipate where a crime is likely to occur.”

Chief William Bratton, Los Angeles Police
Testimony to US House, September 24, 2009

predpol.com

Page 33: inome

CRIMINAL PROFILING

“Computerized” tips and profiles

Predicting crime for specific individuals

Courts have held that profiling is a reasonable factor

Violates punishment theory of equal chances of getting caught

Ratcheting creates a closed loop of confusion

Self-fulfilling prophecy by controlling profile

Page 34: inome

SUMMARY

Big data inferences are thought, not crime

Speech and action could be criminal… So think carefully

Check us out

Classifier available on http://github.com/inome

APIs for exploring people data at http://developer.inome.com

Page 35: inome

It’s in inome

Jim Adler
VP Data Systems & Chief Privacy Officer, inome

@jim_adler
http://jimadler.me