Hans Viehmann

Product Manager EMEA

ORACLE Corporation

Brighton, December 3rd, 2019

@SpatialHannes

Fraud Detection in Financial Services… using Graph Analysis and Machine Learning

Copyright © 2019 Oracle and/or its affiliates

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

Analysis of Relationships, Dependencies and Behavioural Patterns

https://www.information-age.com/gartner-data-and-analytics-technology-trends-123479234/

Graph Data Models

Property Graph Model (shipping for 3+ years)

Industry domains: Financial; Retail, Marketing; Public Safety; Smart Manufacturing

Use cases:

• Path Analytics

• Graph Analytics

• Detect patterns and anomalies

RDF Graph Model (shipping for 12+ years)

Industry domains: Life Sciences; Health Care; Publishing; Finance

Use cases:

• Data federation

• Knowledge representation

• Semantic Web

Graph Data Model

What is a graph?

Data model representing entities as vertices and relationships as edges

Optionally including attributes

Also known as „linked data“

What are typical graphs?

Social Networks

LinkedIn, Facebook, Google+, Twitter, ...

Physical networks, Supplier networks,...

Knowledge Graphs

Apple SIRI, Google Knowledge Graph, ...

[Figure: example graph with vertices A, B, C, D, E and F]
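The definition above (entities as vertices, relationships as edges, optional attributes on both) can be sketched in a few lines of Python. This is only an illustration: the class name `PropertyGraph`, the edge label `linked`, the attributes and the edge endpoints are all hypothetical, since the slide's figure does not survive in the text.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory property graph: vertices and directed edges with attributes."""
    vertices: dict = field(default_factory=dict)   # vertex id -> attribute dict
    edges: list = field(default_factory=list)      # (source, target, label, attributes)

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs

    def add_edge(self, src, dst, label, **attrs):
        # Both endpoints must already exist as vertices.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("unknown vertex")
        self.edges.append((src, dst, label, attrs))

    def neighbors(self, vid):
        """Vertices reachable from vid over one outgoing edge."""
        return [dst for src, dst, _, _ in self.edges if src == vid]


g = PropertyGraph()
for name in "ABCDEF":
    g.add_vertex(name, kind="entity")       # hypothetical attribute
g.add_edge("A", "B", "linked", weight=1)    # hypothetical edges and attributes
g.add_edge("A", "C", "linked", weight=2)
g.add_edge("B", "D", "linked")

print(g.neighbors("A"))
```

Because attributes are plain dictionaries, new properties can be attached to single vertices or edges without changing a schema.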

Why are graphs so popular?

Easy data modeling

„whiteboard friendly“

Flexible data model

No predefined schema, easily extensible

Particularly useful for sparse data

Insight from graphical representation

Intuitive visualization

Enabling new kinds of analysis

Overcoming some limitations in relational technology

Additional perspective for Machine Learning

Romanian Police Force

Creating Knowledge Graphs from all kinds of content

Social media networks, documents, images, audio, video, structured data

Using machine learning (text analysis, classification, entity extraction, face recognition, speech2text, ...)

Enabling relationship analysis and semantic search

bigCONNECT platform built by mWARE

Running on Big Data Appliance, Big Data Cloud Service or commodity Hadoop

BIG DATA since 2012

Examples for Graph Analytics

Community detection and influencer analysis: churn risk analysis/targeted marketing, HR turnover analysis

Product recommendation: collaborative filtering, clustering

Anomaly detection: Social Network Analysis (spam detection), fraud detection in healthcare

Path analysis and reachability: outage analysis in utilities networks, vulnerability analysis in IP networks, „Panama Papers“

Pattern matching: tax fraud detection, data extraction

Graph Analysis in Financial Services

Banco de Galicia

Customer profitability analysis

Part of larger Hadoop/Big Data project

Analysis of banking transactions

Focus on corporate customers

Identification of undesired behavioural patterns, e.g.

Customers using other banks to make large numbers of transactions

Many of which flow back to Banco Galicia

Increase fees, terminate contracts, or move activities to Banco Galicia

Paysafe

Providing online payment solutions

Real-time payments, e-Wallets

1bn revenue/yr

500,000 payments/day

Strong demand for fraud detection

Only feasible with graph data

In real-time, upon money movement

During account creation

In investigation, visualizing payment flows

Analysis of payment flows

Identifying suspicious patterns

Using graph algorithms for initial assessment

Followed by interactive analysis with visualization and PGQL

Moving towards graph analysis with machine learning

Rule Engine: takes the decision to process or fail a payment

Graph Database

Graph query example: Is there a fraudster within a distance of 3 payments?

Graph query example: Is there a customer linked by password within a distance of 3 payments?

Machine Learning

Example: Pass the fraud probability as a fact to the rule engine
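The "fraudster within a distance of 3 payments" question can be read as a bounded breadth-first search. The sketch below assumes the payment graph is undirected and held as a plain adjacency dictionary. The function name `flagged_within`, the account names and the fraud labels are hypothetical, and this is not how the Paysafe system or PGQL evaluates the query.

```python
from collections import deque


def flagged_within(adjacency, start, flagged, max_hops=3):
    """Return the flagged accounts reachable from start in at most max_hops payments."""
    seen = {start}
    frontier = deque([(start, 0)])
    hits = set()
    while frontier:
        account, hops = frontier.popleft()
        if hops == max_hops:
            continue                      # do not expand beyond the hop limit
        for other in adjacency.get(account, ()):
            if other in seen:
                continue
            seen.add(other)
            if other in flagged:
                hits.add(other)
            frontier.append((other, hops + 1))
    return hits


# Hypothetical payment graph, treated as undirected: account -> counterparties.
payments = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol", "erin"],
    "erin": ["dave"],
}
known_fraudsters = {"dave", "erin"}

near = flagged_within(payments, "alice", known_fraudsters, max_hops=3)
print(near)
```

A non-empty result is the kind of fact that could be handed to the rule engine together with a model's fraud probability.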

Anomaly Detection using Graph Analysis

Example: Finding anomalies in healthcare billing data

Medical providers and their operations

Providers of the same specialty are close to each other in the graph

Closely connected by common services

A provider vertex exceptionally close to vertices of a different specialty should be an anomaly

Using closeness as a metric

e.g. hop distance, ...

Graph size: Doctors: 900,000; HCPCS: 6,000; Edges: 9,000,000
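Hop distance as a closeness metric can be sketched on a toy provider-to-service graph. Everything in the example is hypothetical: the provider ids, service names and specialty labels, and the rule "flag a provider whose nearest specialty on average is not its own" is one simple reading of the slide, not the procedure used in the talk.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closest_specialty(adjacency, specialty, provider):
    """Specialty whose other providers are nearest to `provider` on average."""
    dist = hop_distances(adjacency, provider)
    per_specialty = {}
    for other, spec in specialty.items():
        if other != provider and other in dist:
            per_specialty.setdefault(spec, []).append(dist[other])
    return min(per_specialty, key=lambda s: sum(per_specialty[s]) / len(per_specialty[s]))


# Hypothetical billing data: provider -> billed services.
billing = {
    "d1": ["ecg", "stent"],
    "d2": ["ecg", "stent"],
    "d3": ["ecg", "visit"],
    "d4": ["biopsy", "peel", "visit"],
    "d5": ["biopsy", "peel"],
    "dX": ["ecg", "stent"],   # registered as dermatology, bills cardiology services
}
specialty = {"d1": "cardiology", "d2": "cardiology", "d3": "cardiology",
             "d4": "dermatology", "d5": "dermatology", "dX": "dermatology"}

# Build the undirected bipartite graph of providers and services.
adjacency = {}
for provider, services in billing.items():
    for service in services:
        adjacency.setdefault(provider, set()).add(service)
        adjacency.setdefault(service, set()).add(provider)

anomalies = [p for p in specialty if closest_specialty(adjacency, specialty, p) != specialty[p]]
print(anomalies)
```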

Using Personalized PageRank to find outliers and anomalies

Variant of the PageRank algorithm that requires a set of starting vertices

Random walks (with restart) from the starting vertices

Computes a new probability of visiting each vertex in the graph, biased by the vertices in the starting set

Personalized PageRank score → a natural relative distance (or closeness) with respect to the vertices in the starting set

Algorithm generates regular PageRank values when the starting set contains all vertices in the graph

Starting set of vertices
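The random walk with restart can be approximated deterministically by power iteration. The sketch below is a minimal version under stated assumptions: a damping factor of 0.85, a fixed iteration count, and mass from dangling vertices returned to the starting set. The function name and the four-vertex path graph are hypothetical, and this is not Oracle's implementation.

```python
def personalized_pagerank(adjacency, start_set, damping=0.85, iterations=100):
    """Power iteration for PageRank with restarts confined to start_set."""
    vertices = list(adjacency)
    restart = {v: (1.0 / len(start_set) if v in start_set else 0.0) for v in vertices}
    score = dict(restart)
    for _ in range(iterations):
        # Restart mass goes only to the starting vertices.
        nxt = {v: (1.0 - damping) * restart[v] for v in vertices}
        for v in vertices:
            out = adjacency[v]
            if out:
                share = damping * score[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Dangling vertex: send its mass back to the starting set.
                for w in vertices:
                    nxt[w] += damping * score[v] * restart[w]
        score = nxt
    return score


# Hypothetical undirected path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

biased = personalized_pagerank(path, start_set={"a"})
uniform = personalized_pagerank(path, start_set=set(path))
print(sorted(biased, key=biased.get, reverse=True))
```

With the starting set `{"a"}` the far end of the path receives the lowest score, which is the "relative distance" reading of the score. With all vertices in the starting set the restart distribution is uniform and the result is symmetric, as in regular PageRank.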

Anomaly Detection Procedure

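[Figure: doctors linked to HCPCS codes (specialty actions); doctors of the same specialty form the starting set; a doctor of another specialty that lies close to them is marked X as anomalous]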

Combining Graph Analytics and Machine Learning

Graph Analytics → Machine Learning

Compute graph metric(s)

Add to structured data

Build predictive model using graph metric

Machine Learning → Graph Analytics

Build model(s) and score or classify data

Add to graph

Explore graph or compute new metrics using ML result
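The "compute graph metric, add to structured data" step can be illustrated with the simplest possible metric. In this sketch vertex degree stands in for whatever metric is actually computed (PageRank, centrality and so on). The function name, column names and records are hypothetical.

```python
from collections import Counter


def add_degree_feature(rows, edges, key="customer"):
    """Return copies of rows with a 'degree' column counted from the edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [{**row, "degree": degree[row[key]]} for row in rows]


# Hypothetical structured data and relationship edges.
customers = [
    {"customer": "c1", "balance": 1200},
    {"customer": "c2", "balance": 300},
    {"customer": "c3", "balance": 50},
]
transfers = [("c1", "c2"), ("c1", "c3"), ("c2", "c1")]

features = add_degree_feature(customers, transfers)
for row in features:
    print(row)
```

The enriched rows can then be passed to any tabular model alongside the original columns.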

Encoding similarity for use in machine learning

Graph captures fine-grained relationship between data entities

As before, closeness can be defined and measured on the graph

Providing numeric representation of your data that retains the distance information

[Figure: Raw Data → Graph Representation → Numeric Representation (N-dimensional vector) → ML Model]

Encoding similarity for use in machine learning

Different approaches available

e.g. exploiting techniques from modern NLP (natural language processing)

Used Word2Vec in our example

An ML technique that learns closeness between words from a large number of sentences

Perform many random walks on the graph

Apply the Word2Vec technique to the random walk traces, treating vertices as words

KDD'14
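The first half of this recipe, generating random walk traces that play the role of sentences, can be sketched with the standard library alone. The function name, parameters and toy graph are hypothetical. The embedding step itself is left out: the traces would be handed to a Word2Vec implementation of your choice.

```python
import random


def random_walks(adjacency, walks_per_vertex=2, walk_length=5, seed=0):
    """Uniform random walks; each walk is a list of vertex ids (a 'sentence')."""
    rng = random.Random(seed)
    traces = []
    for _ in range(walks_per_vertex):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break                  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            traces.append(walk)
    return traces


# Hypothetical undirected graph as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

traces = random_walks(graph, walks_per_vertex=2, walk_length=5, seed=0)
for walk in traces:
    print(" ".join(walk))
```

Vertices that co-occur often in these traces end up close in the learned vector space, in the same way that words sharing contexts do.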

Using Word2Vec on a large text corpus

Word2Vec – Mikolov et al., 2013, image: Steven Skiena, Stony Brook Univ.

Deepwalk – Translate graphs to a vector space

Practical example – Student classification

Can you predict a student’s major or department just by looking at the classmates in the course that (s)he is taking?

Very similar to customer segmentation problem

Student => Customer

Course taking => Item or service purchase

Department => Segment label

[Figure: bipartite graph of students (with majors such as CS and ME) and courses; node labels 10.003, 10.004, 10.005, 11.103, 11.213, 12.118]
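The question "can the department be predicted from classmates alone?" can be tried with a very simple baseline: a majority vote over the labelled classmates. This is a stand-in for illustration only. It is neither the PPR approach nor the DeepWalk-plus-CNN approach evaluated in the talk, and the course names, student ids and labels are hypothetical.

```python
from collections import Counter


def predict_department(enrolments, departments, student):
    """Majority department among the labelled classmates of `student`."""
    votes = Counter()
    for members in enrolments.values():
        if student in members:
            for mate in members:
                if mate != student and mate in departments:
                    votes[departments[mate]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical enrolment data: course -> students.
enrolments = {
    "algorithms": ["s1", "s2", "s3", "s9"],
    "compilers": ["s1", "s2", "s9"],
    "thermodynamics": ["s3", "s4", "s5"],
}
# Known departments; s9 is the student to classify.
departments = {"s1": "CS", "s2": "CS", "s3": "ME", "s4": "ME", "s5": "ME"}

print(predict_department(enrolments, departments, "s9"))
```

Replacing students with customers, courses with purchased items and departments with segment labels gives the customer segmentation variant mentioned above.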

Evaluation – Comparison

1. CNN trained on “standard” features (e.g., student age, courses taken, …)

2. Use PPR and predict the department of the highest-scoring vertex

3. Train a CNN on vertex embeddings extracted with DeepWalk

4. Add “standard” features beside graph embeddings

Results

(Result #1) Graph-based prediction gives better results than naïve application of ML (e.g. CNN) on basic student features (e.g. age, gender, background, …)

(Result #2) DeepWalk preserves information from the graph representation

(Result #3) DeepWalk allows graph data to be combined with other features

[Chart comparing four approaches: CNN on Original Features; PPR (Graph Algorithm); CNN on Extracted Graph Features (from DeepWalk); CNN on Original + Graph Features]

Enabling Spatial and Graph use cases on every platform

Oracle Database Spatial and Graph Option: Exadata, Non-Engineered Systems

Oracle Big Data Spatial and Graph: Big Data Appliance, Commodity Hadoop, Spark

Cloud Services: Database Cloud Service, Exadata Cloud Service, Graph Cloud Service (planned)

Oracle's Property Graph Technologies – Product Packaging

Oracle Big Data Spatial and Graph

Available for Big Data platform

Hadoop, HBase, Oracle NoSQL

Supported both on BDA and commodity hardware

CDH and Hortonworks

Database connectivity through Big Data Connectors or Big Data SQL

Included in Big Data Cloud Service

Oracle Spatial and Graph (DB option)

Available since Oracle 12.2 (EE)

Using tables for graph persistence

In-database graph analytics

Sparsification, shortest path, page rank, triangle counting, WCC, sub graph generation…

SQL queries possible

Integration with Spatial, Text, Label Security, RDF Views, etc.

PGQL-to-SQL converter

Categories of Graph Analysis

Computational Graph Analytics

Compute values on vertices and edges

Traversing graph or iterating over graph (usually repeatedly)

Procedural logic

Examples: Shortest Path, PageRank, Weakly Connected Components, Centrality, ...

Graph Pattern Matching

Based on description of pattern

Find all matching sub-graphs

[Figure: example property graph]

Vertices:
:Person {100} name = 'Amber', age = 25
:Person {200} name = 'Paul', age = 30
:Person {300} name = 'Heather', age = 27
:Company {777} name = 'Oracle', location = 'Redwood City'

Edges:
:worksAt {1831} startDate = '09/01/2015'
:friendOf {1173}
:knows {2200}
:friendOf {2513} since = '08/01/2014'
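As one instance of the computational category, Weakly Connected Components (listed among the examples above) assigns a value to every vertex by repeatedly merging the endpoints of each edge. The sketch below uses a union-find structure. The function name and the small edge list are hypothetical, and this is not the algorithm implementation shipped by Oracle.

```python
def weakly_connected_components(vertices, edges):
    """Map each vertex to a component representative, ignoring edge direction."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving keeps trees shallow
            v = parent[v]
        return v

    for src, dst in edges:
        parent[find(src)] = find(dst)       # merge the two components
    return {v: find(v) for v in vertices}


# Hypothetical directed edges over five vertices.
vertex_ids = [1, 2, 3, 4, 5]
directed_edges = [(1, 2), (3, 2), (4, 5)]

component = weakly_connected_components(vertex_ids, directed_edges)
print(component)
```

Pattern matching, by contrast, is declarative: the pattern is described and the engine returns every matching sub-graph.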

Oracle Graph Analytics Architecture

[Figure: architecture diagram with the following components]

Scalable and Persistent Storage

Graph Storage Management

Graph Analytics: In-memory Analytic Engine

Blueprints & SolrCloud / Lucene

REST Web Service

Python, Perl, PHP, Ruby, Javascript, …

Java APIs

Java APIs/JDBC/SQL/PLSQL

Visualization

R Integration (OAAgraph)

Spark integration

Analytical vs. Transactional System

Three-tier

Graph analysis and traversal queries in-memory

Graph updated in-memory periodically

[Figure: Client (Shell, Notebook, Application, PGViz) → In-memory Engine (Graph Analysis, Graph Traversal) → Graph Store]

Two-tier

Graph traversal queries in Oracle Database

Graph updates available to queries in real-time

[Figure: Client (Shell, Notebook, Application, PGViz) → Graph Store (Graph Traversal)]

Analyzing the Marvel Graph

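// Load the graph described by the configuration file into the in-memory engine
g = session.readGraphWithProperties("config.json")

// Compute PageRank and betweenness centrality scores for every vertex
analyst.pagerank(g)

analyst.vertexBetweennessCentrality(g)

// Publish the graph with all vertex and edge properties so other sessions can use it
g.publish(VertexProperty.ALL, EdgeProperty.ALL)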

Pattern matching in Property Graphs using PGQL

Finding a given pattern in graph

Fraud detection

Anomaly detection

Subgraph extraction

...

SQL-like syntax but with graph pattern description and property access

Interactive (real-time) analysis

Supporting aggregates, comparison, such as max, min, order by, group by

Proposed for standardization by Oracle

Specification available on-line

Open-sourced front-end (i.e. parser)

https://github.com/oracle/pgql-lang

Basic graph pattern matching

Find all instances of a given pattern/template in the data graph

Query: Find all people who are known by friends of 'Amber'.

SELECT v3.name, v3.age
FROM socialNetworkGraph
MATCH (v1:Person) -[:friendOf]-> (v2:Person) -[:knows]-> (v3:Person)
WHERE v1.name = 'Amber'

Data graph: socialNetworkGraph

Vertices:
100 :Person, name = 'Amber', age = 25
200 :Person, name = 'Paul', age = 30
300 :Person, name = 'Heather', age = 27
777 :Company, name = 'Oracle', location = 'Redwood City'

Edges:
1831 :worksAt, startDate = '09/01/2015'
1173 :friendOf
2200 :knows
2513 :friendOf, since = '08/01/2014'
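To make the semantics of the pattern concrete, the same two-hop match can be spelled out as nested loops in Python. The vertex properties are taken from the slide. The edge endpoints are assumptions made for illustration, because the extracted text preserves only edge labels and ids. A PGQL engine evaluates such patterns with its own query planner, not like this.

```python
from collections import defaultdict

vertices = {
    100: {"label": "Person", "name": "Amber", "age": 25},
    200: {"label": "Person", "name": "Paul", "age": 30},
    300: {"label": "Person", "name": "Heather", "age": 27},
    777: {"label": "Company", "name": "Oracle", "location": "Redwood City"},
}
# Edge endpoints (and which id belongs to which edge) are ASSUMED for illustration.
edges = [
    (100, "friendOf", 200),   # assumed to be edge 1173
    (200, "knows", 300),      # assumed to be edge 2200
    (300, "friendOf", 200),   # assumed to be edge 2513
    (200, "worksAt", 777),    # assumed to be edge 1831
]


def known_by_friends_of(name):
    """(v1:Person {name}) -[:friendOf]-> (v2:Person) -[:knows]-> (v3:Person)"""
    out = defaultdict(list)
    for src, label, dst in edges:
        out[(src, label)].append(dst)

    def is_person(v):
        return vertices[v]["label"] == "Person"

    rows = []
    for v1, props in vertices.items():
        if not is_person(v1) or props["name"] != name:
            continue
        for v2 in out[(v1, "friendOf")]:
            if not is_person(v2):
                continue
            for v3 in out[(v2, "knows")]:
                if is_person(v3):
                    rows.append((vertices[v3]["name"], vertices[v3]["age"]))
    return rows


print(known_by_friends_of("Amber"))
```

Each nesting level corresponds to one vertex variable in the MATCH clause, and the name comparison plays the role of the WHERE filter.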

PGQL Examples

SELECT e

MATCH ()-[e]->()

How to get started quickly

... once we have the Graph Cloud Service

Requirements

Technology to interact with the data

Data modeling tool to convert tabular data

Graph database and analytics environment

Graph Cloud Service (planned)

“One-click” deployment: no installation, zero configuration

Automated failure detection and recovery

Automated graph modeler

Easily convert your relational data into property graphs

Pre-built Algorithms, Flows and SQL-like graph query language

Java, Groovy

REST APIs

Rich User Interface

Low code / zero code features

Notebook support and powerful data visualization features

Converting relational schema to a graph

Interactive analysis with Notebooks

Graph visualization

Graph capabilities in Oracle Products and Cloud Services

Graph databases are powerful tools, complementing relational databases

Especially strong for analysis of graph topology and connectedness

Graph analytics offer new insight

Especially relationships, dependencies and behavioural patterns

Oracle Property Graph technology offers

Comprehensive analytics through various APIs, integration with relational database

Scalable, parallel in-memory processing

Secure and scalable graph storage using Hadoop platform or Oracle Database

Available both on-premises and in the Cloud already today

Key takeaway for today ...

„Whenever you're analyzing relationships, think graphs!“

Resources

Oracle Property Graph Technologies OTN product page: https://www.oracle.com/database/technologies/spatialandgraph/property-graph-features.html

White papers, software downloads, documentation and videos

Oracle Labs Tutorials: https://docs.oracle.com/cd/E56133_01/latest/tutorials/index.html

Blog post series on setting up Graph Analysis on Oracle Cloud: https://blogs.oracle.com/oraclespatial/how-to-enable-oracle-database-cloud-service-with-property-graph-capabilities

Free cloud credits available on http://cloud.oracle.com

Blog – examples, tips & tricks: blogs.oracle.com/bigdataspatialgraph

@OracleBigData, @SpatialHannes, @JeanIhm Oracle Spatial and Graph Group

Where does the community meet?

Thank you

Hans Viehmann

@SpatialHannes
