discovering the hidden treasure of data using graph analytic — ana paula appel (ibm research)...

Ana Paula Appel Data Scientist & Master Inventor Discovering the hidden treasure of data using graph analytic

Upload: papisio

Post on 24-Jan-2018


TRANSCRIPT

Page 1: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Ana Paula Appel

Data Scientist & Master Inventor

Discovering the hidden treasure of data using graph analytic

Page 2: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

© 2015 IBM Corporation

Page 3: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

IBM Research – Brazil view from Rio de Janeiro Lab

Mission: To be known for our science and technology and vital to IBM, Brazil, our clients in the region and worldwide

Page 4: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Healthcare Data

• Medical attention transactional data

• Large healthcare insurance company in Brazil

• Nationwide

• Spanning 1.5 years (2013-2014)

• 0.6 TB (compressed)

Page 5: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Healthcare Data: Stakeholders

• Physicians

• Patients

• Healthcare providers

• Health Services

• Claims

• Health Insurance Company

Page 6: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Healthcare Data: Claims

• Paid Claims

  • Total: 109M

  • Doctors: 220k (almost half of all doctors in Brazil)

  • Patients: 2.2M

  • Unique doctor-patient pairs: 11.6M

• Other support data:

  • Company

  • Providers

  • Authorizations ~3M

  • Claim denials ~13M

  • Geolocation

  • ...

Over 40 tables, hundreds of fields

CLAIM

• Physician ID

• Patient ID

• Timestamp

• Service code

• Disease – ICD9

• (80+ extra rows)

Page 7: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

A Complex Network Perspective

Page 8: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

A Network Approach

PhysID    ICD9   PatientID   DATE
SP45962   -      1001        09/04/13
SP45962   Z017   1001        26/04/13
SP47108   Z017   1001        06/12/13
SP47108   Z017   1001        16/12/13
SP45962   -      1002        11/07/13
SP45962   Z017   1002        12/07/13
SP45962   -      1002        19/08/13
SP59938   Z000   1002        24/10/13
…         …      …           …

Graph views: bipartite graph, weighted graph, directed graph

• Bipartite network of doctors and patients

• |V| = 2.4M, |E| = 11.6M

• Keep only the largest connected component (92%-99% of all links)

• Remove multiple edges and map them to weights
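The preprocessing described on this slide can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the pipeline used in the talk: the function names `build_weighted_edges` and `largest_component` are hypothetical, and the toy rows are the physician and patient IDs from the sample table.

```python
from collections import Counter, defaultdict, deque

# Toy claim rows (physician ID, patient ID) copied from the sample table.
claims = [
    ("SP45962", "1001"), ("SP45962", "1001"),
    ("SP47108", "1001"), ("SP47108", "1001"),
    ("SP45962", "1002"), ("SP45962", "1002"), ("SP45962", "1002"),
    ("SP59938", "1002"),
]

def build_weighted_edges(rows):
    """Collapse repeated (physician, patient) rows into one weighted edge."""
    return Counter(rows)

def largest_component(edges):
    """Return the node set of the largest connected component (BFS)."""
    adjacency = defaultdict(set)
    for phys, pat in edges:
        # Tag each node with its side so physician and patient IDs never collide.
        adjacency[("phys", phys)].add(("pat", pat))
        adjacency[("pat", pat)].add(("phys", phys))
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

edges = build_weighted_edges(claims)   # edge -> number of claims (the weight)
giant = largest_component(edges)       # nodes kept for the analysis
```

On the toy rows, the eight claims collapse into four weighted edges, and all five nodes fall into a single component.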

Page 9: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Patient-Sharing networks

Links represent a shared patient

Network             Nodes   Links
Phys - Patient      402     403
Patient - Patient   377     5488
Phys - Phys         25      30
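A physician-physician patient-sharing network is the one-mode projection of the bipartite graph: two physicians are linked when at least one patient has seen both. The sketch below assumes the link weight is the number of shared patients; the slide does not state how the links are weighted, and `physician_projection` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

# Toy (physician, patient) pairs; IDs are made up for illustration.
visits = [
    ("A", "p1"), ("B", "p1"),
    ("A", "p2"), ("B", "p2"), ("C", "p2"),
    ("C", "p3"),
]

def physician_projection(pairs):
    """Link two physicians once for every patient they have in common."""
    physicians_of = defaultdict(set)
    for phys, pat in pairs:
        physicians_of[pat].add(phys)
    shared = defaultdict(int)
    for doctors in physicians_of.values():
        # Every unordered pair of this patient's doctors shares the patient.
        for a, b in combinations(sorted(doctors), 2):
            shared[(a, b)] += 1
    return dict(shared)

projection = physician_projection(visits)
```

Swapping the two columns of `visits` gives the patient-patient projection, which is typically much denser because a single busy physician links all of their patients to each other.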

Page 10: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Physician and Patient Degree Distributions

Patient Histogram

• One patient with 123 different physicians

• 409k patients with only 1 physician

Physician Histogram

• 26 physicians with more than 5k different patients, 1 with 30k (possibly spurious)
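The two histograms are degree distributions of the bipartite graph: a patient's degree is the number of distinct physicians seen, and a physician's degree is the number of distinct patients. A minimal sketch, with a hypothetical helper `degree_histogram` and made-up IDs:

```python
from collections import Counter

# Toy (physician, patient) claim rows; the repeated row must not inflate degrees.
pairs = [("A", "p1"), ("A", "p1"), ("A", "p2"), ("B", "p2"), ("C", "p3")]

def degree_histogram(rows, side):
    """side=0 gives physician degrees, side=1 gives patient degrees.

    Returns (degree per node, number of nodes per degree value).
    """
    distinct = set(rows)  # count each doctor-patient pair once
    degree = Counter(row[side] for row in distinct)
    return degree, Counter(degree.values())

physician_degree, physician_hist = degree_histogram(pairs, side=0)
patient_degree, patient_hist = degree_histogram(pairs, side=1)
```

Sorting `physician_degree` by value is a quick way to surface outliers such as the physician with 30k patients flagged as possibly spurious.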

Page 11: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Network-Derived Metrics

• Aim: extend the doctors' description with relevant metrics

• Metrics which, in combination with other data, will allow us to:

  • classify

  • filter

  • reduce

[Figure: example per-doctor metric rows (35, 0.1, 3.2, 0, 4%, 7%, ...) and (17, 0.2, 5.1, 1, 9%, 1%, ...), with groups labelled "Compliant doctors" and "Not-compliant doctors"]

Page 12: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Case: Build Metrics to Describe Physicians Using Complex Networks

Mutual Reference, Centrality, Loyalty

Page 13: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Health Insurance: Similarity between Complex Networks

[Figure: a friendship network shown next to the physician network]

Page 14: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Mutual Reference

Page 15: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Mutual Reference

Goal: Identify strong connections between each pair of physicians, in particular, the outliers.

Same patient visits two doctors + Happens in both directions = Reciprocal Link

[Figure: timeline of visits. Patient 1 visits doctor a and then doctor b; patient 2 visits doctor b and then doctor a. The resulting links between doctors a and b are w(ab) = 17 with Δt = 7 days and w(ba) = 8 with Δt = 2 days.]
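The directed weights w(ab) and the time gaps Δt can be derived from the claims by following each patient's visits in time order. The sketch below rests on assumptions that the slides do not confirm: a link a→b is counted whenever a patient's next visit after doctor a is to a different doctor b, and Δt is the mean gap in days. The function names are hypothetical and the dates are made up.

```python
from collections import defaultdict
from datetime import date

# Toy (patient, doctor, date) visits.
visits = [
    ("p1", "a", date(2013, 4, 1)),
    ("p1", "b", date(2013, 4, 8)),
    ("p2", "b", date(2013, 5, 1)),
    ("p2", "a", date(2013, 5, 3)),
    ("p3", "a", date(2013, 6, 1)),
    ("p3", "c", date(2013, 6, 11)),
]

def directed_references(rows):
    """Map (from_doctor, to_doctor) to (link weight, mean gap in days)."""
    by_patient = defaultdict(list)
    for patient, doctor, day in rows:
        by_patient[patient].append((day, doctor))
    gaps = defaultdict(list)
    for history in by_patient.values():
        history.sort()  # chronological order
        for (day0, doc0), (day1, doc1) in zip(history, history[1:]):
            if doc0 != doc1:  # ignore repeat visits to the same doctor
                gaps[(doc0, doc1)].append((day1 - day0).days)
    return {pair: (len(g), sum(g) / len(g)) for pair, g in gaps.items()}

def reciprocal_pairs(links):
    """Unordered doctor pairs that are linked in both directions."""
    return sorted({tuple(sorted(p)) for p in links if (p[1], p[0]) in links})

links = directed_references(visits)
mutual = reciprocal_pairs(links)
```

In the toy data only doctors a and b are linked in both directions, so they form the single reciprocal pair; the a→c link has no counterpart and is dropped.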

Page 16: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

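Mutual Reference

[Figure: Mutual Reference networks for the states BA, DF, SP, PE and RJ, in Top 50 and Top 20 versions, each annotated with its density. Density values as extracted (the panel order is not recoverable): 0.809, 0.447, 0.805, 0.845, 0.913, 0.963, 0.834, 0.568, 0.802, 0.576]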
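Density is the fraction of possible links that actually exist. The slide does not say whether the directed or the undirected definition was used, so the hypothetical helper below supports both; the node and link counts in the usage line are the Phys - Phys figures from the patient-sharing slide, used only as sample input.

```python
def density(num_nodes, num_links, directed=False):
    """Fraction of possible links present in a simple graph."""
    if num_nodes < 2:
        return 0.0
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible /= 2                       # unordered pairs
    return num_links / possible

# 25 nodes and 30 undirected links give 30 / 300.
phys_phys_density = density(25, 30)
```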

Page 17: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Mutual Reference

[Figure: example networks for Allergy and Ophthalmology]

Page 18: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Mutual Reference

Conclusions and Insights

• Claim data is rich enough to identify connections among physicians and how a partnership is formed.

• The Mutual Reference is an indicator of physician relationships and can potentially generate other analyses, especially in a large volume of data.

• The proposed metric makes frequent computational analysis of that relationship possible.

Physician A   Physician B   rm     Rank
MMS028        MMS027        1      1
MSP145        MSP144        0.31   10

• Specialties that appear most:

  • Ophthalmology to ophthalmology

  • Gynecology and obstetrics to gynecology and obstetrics

• DF has most of the consultations with irregular intervals

  • MDF010 and MDF009 with 267 consultations and an average gap of 0 days

• Top pair:

  • 205 from MMS028 to MMS027

  • 196 from MMS027 to MMS028

Page 19: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Patient Loyalty

Page 20: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Patient Loyalty

Goal: Identify (and quantify) doctors that have recurring patients in a systematic way, suggesting 'loyalty'

1. Consider patients with many visits to doctors

2. Compute the relative weight for each doctor visited

3. Count the relative number of 'loyal' patients for that doctor

[Figure: consultations over time]

Page 21: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Patient Loyalty

• Weight w_ij represents the number of visits of patient i to Dr. j

• Strength s: sum of the weights attached to links belonging to a node (i.e., all visits from i)

• Relative weight rw(ij): fraction of weight w_ij over the total strength s, i.e. rw(ij) = w_ij / s_i

[Figure: São Paulo; axes Strength s and Degree k; legend from High rw to Low rw; value 1.00 shown]
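The three loyalty steps and the definitions of w_ij, s and rw(ij) can be sketched as follows. The thresholds `MIN_VISITS` and `LOYAL_RW` and the function `loyalty_fraction` are hypothetical: the slides do not state how many visits count as "many", which rw counts as "loyal", or how the published mf score is normalised, so this sketch simply returns the fraction of a doctor's frequent-visitor patients who are loyal.

```python
from collections import Counter, defaultdict

MIN_VISITS = 4   # assumed cut-off for "patients with many visits"
LOYAL_RW = 0.6   # assumed relative weight above which a patient is "loyal"

# Toy (patient, doctor) visit rows, one row per consultation.
visits = (
    [("p1", "d1")] * 4 + [("p1", "d2")]
    + [("p2", "d1")] * 2 + [("p2", "d2")] * 2
    + [("p3", "d2")]
)

def loyalty_fraction(rows, min_visits=MIN_VISITS, loyal_rw=LOYAL_RW):
    """For each doctor, the share of frequent-visitor patients who are loyal."""
    weight = Counter(rows)          # w_ij: visits of patient i to doctor j
    strength = Counter()            # s_i: all visits made by patient i
    for (patient, _doctor), w in weight.items():
        strength[patient] += w
    loyal, total = defaultdict(int), defaultdict(int)
    for (patient, doctor), w in weight.items():
        if strength[patient] < min_visits:
            continue                # step 1: keep only frequent visitors
        total[doctor] += 1
        rw = w / strength[patient]  # step 2: relative weight rw(ij)
        if rw >= loyal_rw:
            loyal[doctor] += 1      # step 3: count loyal patients
    return {doctor: loyal[doctor] / total[doctor] for doctor in total}

scores = loyalty_fraction(visits)
```

In the toy data, patient p1 gives 4 of 5 visits to d1 (rw = 0.8) and so counts as loyal to d1, patient p2 splits visits evenly and is loyal to nobody, and p3 is dropped for having too few visits.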

Page 22: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Patient Loyalty

• The more patients with high rw and high s, the more likely the doctor is a candidate to have 'loyalty' capacity

• Stability: Many doctors maintain sustained values of the metric across time.

  • A given doctor is in rank 1 or 2 during all 5 quarters.

  • 20% mean turnover across quarters

• Top 5 specialties among physicians with higher loyalty (mf > 0.5)

  • Orthopedics and traumatology (5 in top 10)

  • Ophthalmology (3)

  • Gynecology and obstetrics (2)

  • Pediatrics (1)

[Figure: relative weight versus strength, two panels labelled Cardio]

Loyalty

Physician   mf     RANK
MSP 139     1.54   175
MSP 261     1.18   432

Page 23: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Centrality

Page 24: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Physician Centrality

Goal: Identify each physician's role in the network using their relative importance over other physicians.

• We applied several centrality measures:

  • Eigenvector;

  • Degree;

  • Betweenness;

  • Closeness

• Do the values of these metrics change over time?

• Is it seasonal?

2Q2014

Physician   Eigen   Rank   Degree
MSP 153     1       1      253
MSP 139     0.55    8      335

Centrality: Conclusions and Insights

• Centrality indicates which physicians are important in the physician community

• There is a set of physicians with high scores

• This set of physicians has a higher number of patients in common, building a block

• The relative centrality has a positive correlation among close physicians

• This group of physicians with high scores is stable over time, with few changes in each quarter.
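Eigenvector centrality scores a physician highly when their neighbours in the physician-physician network also score highly, which is why a physician can outrank another who has a larger degree. The sketch below is a plain power iteration and is not necessarily what the talk used. It adds the current score to each update (iterating on A + I) so that it also converges on bipartite-like graphs, and it scales scores so the top physician gets 1, matching the table. The names and the toy star graph are hypothetical.

```python
from collections import defaultdict

# Toy undirected physician-physician links with shared-patient weights.
edges = {("hub", "x"): 1, ("hub", "y"): 1, ("hub", "z"): 1}

def eigenvector_centrality(links, iterations=200, tol=1e-9):
    """Power iteration on A + I, scaled so the largest score equals 1."""
    adjacency = defaultdict(dict)
    for (a, b), w in links.items():
        adjacency[a][b] = w
        adjacency[b][a] = w
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbours' weighted scores.
        new = {
            node: score[node] + sum(w * score[m] for m, w in nbrs.items())
            for node, nbrs in adjacency.items()
        }
        top = max(new.values())
        new = {node: value / top for node, value in new.items()}
        if max(abs(new[n] - score[n]) for n in new) < tol:
            return new
        score = new
    return score

centrality = eigenvector_centrality(edges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

Recomputing the scores per quarter and comparing the top of `ranking` between quarters is one simple way to check the stability over time that the conclusions describe.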

Page 25: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Summary & Take Home Messages

• Networks are all about relationships, as most data is.

• Network-derived insights are usually not reachable from other analyses.

• Complex Networks methods are very valuable to data science.

• Large healthcare claim database from a Brazilian insurance company.

• Applied complex network methods to find how physicians build their network.

• Examples: Temporality, reciprocity and 'loyalty'.

Page 26: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Where to find more information

Introduction, Basic, Advanced

Page 27: Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

GRAPH ANALYTICS

Database, APIs, Visualization