applying knowledge based ai to modern data … › resources › sfdama › downloads... · “ai...

25
SF DAMA Day 2017 APPLYING KNOWLEDGE BASED AI TO MODERN DATA MANAGEMENT Mani Keeran, CFA Gi Kim, CFA Preeti Sharma

Upload: others

Post on 26-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

APPLYING KNOWLEDGE BASED AI TO MODERN DATA MANAGEMENT

Mani Keeran, CFAGi Kim, CFAPreeti Sharma

Page 2: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

What we are going to discuss…

• During last two decades, majority of information assets have been digitized in the financial sector.

• However, relational databases cannot answer critical business inquiries easily because its model lacks of semantics of business processes while focusing on efficient physical mechanism of data.

• Now new ways of storing data is being matured. RDF/OWL, one of most prominent Semantic Web technologies can be one of options for enabling datastore incorporating human intelligence.

• We will present how experimental Enterprise Knowledge Graph was created out of legacy RDB tables. It will include “inferences” features in querying. We consider this “machine readable” data format as potential backend of AI mechanism in Financial Information Management.

2

Page 3: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

CHALLENGES IN FINANCIAL SECTORSIs my database intelligent enough to answer my questionnaires?

3

Page 4: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Example Questionnaire #1:Am I allowed to make this contract with them?

Is this a valid trade?

Fund A Counterparty X

US Person

Dodd Frank Compliant

Non-US Person

Under EU Regulation

Has not approved by the Committee

yet

?

Trying to make a Swap contract

• Finding a counterparty for a certain trade is restricted by various regulatory requirements and internal rules.

• This process is defined as a multi-step decision making tree, which is conducted by manual workflow or predefined codes.

4

Page 5: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Example Questionnaire #2:What is our exposure to Lehman Brothers?• When Lehman Brother failed, many firms were not able to aggregate exposure because their

systems were not ready for this.

• An human understands “exposure” should include various relationships between two parties, but a machine needs explicit code that implements the logic to integrate various sources.

Company A

Lehman Brothers

Swap Counterparty

Subsidiary 1

Subsidiary 2

Subsidiary 3

Subsidiary 4

Security Issuer

Prime Broker

General Partner

Money Market

5

Page 6: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Relational Database is not Intelligent because it lacks of Semantics in its physical data models

ConceptualDescribes Semantics of a business domainCaptures Business Requirements in terms of Entities and Relationships

LogicalSystem Model captures details of the entities in terms of attributesCommunicates Design Details

PhysicalTechnology Model/Physical DesignImplementation of Specific Use CaseBusiness metadata is separated from physical implementation

6

Semantics Implied in Data Model Semantics Embedded in Application Logic

Page 7: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

KNOWLEDGE BASED AIEnlightening Financial Information System Using Ontology, FIBO and Triple Store

7

Page 8: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Moving to AI-enabled Data

What is AI?“AI is the study of how to make computers do things at which, at the moment, people are better” – Elaine Rich, 1986

What people are better in using data than databases? People understand context of data.

People can associate relevant information together even when they are not obtained at the same time, same place.

People can leverage both of explicit and implicit relationships of information

8

Page 9: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Knowledge Base

Knowledge Based AI

9

Deep Learning

Machine LearningMachine ReadableBusiness Knowledge

Knowledge from Learning

Reasoning

Page 10: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

From Database to Knowledge Base

• Structured collection of records or data. (Wikipedia)

• Organized in such a way that a program can quickly select desired pieces. (webopedia)

• Machine-readable resource for the dissemination of information. A Dynamic Resource that may itself have the capacity to learn, as part of an AI expert system. (Techtarget)

• Stores knowledge in a computer-readable form, usually for the purpose of having automated reasoning applied to them. (Wikipedia)

10

Database Knowledge Base

Page 11: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Ontology

• Data model that captures knowledge of a domain as a set of

concepts & relationships between these concepts.

• Allows relationships to become visible and not dependent on code

that embeds business rules.

• Semantic Industry Standards - Resource Descriptor Framework (RDF)

and Web Ontology Language (OWL) govern the construction of

Ontologies

11

Page 12: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Ontology Model as Knowledge Base

12

Common Stock

Class

Relationship

Brokerage Firm

Issuer

Country

is domiciled in

issues

has an issuer

buy/sell securities

Financial Institution

type

type

type

Legal Entity

Page 13: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Triple Store as Knowledge Base Storage

13

Triple Instance

Issuer Common Stock

issues

Apple Inc. Issues Apple Common Stock (APPL)

Subject Predicate Object

Triples is the simplest instance format of RDF/OWL expression.

A Triplestore is a database for triples. Data is indexed for the graph data structure

APPL

Apple Inc.

US

is domiciled in

issues

Fund A

has position

OWL expression

Page 14: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

OWL – Language of Ontology

• Web Ontology Language from W3C• Computational logic-based, used to express knowledge which can be

understood and processed by machines• Ontologies described using OWL can be processed by a reasoner which

infers knowledge based on the asserted facts in the Ontology• Has various constructs to create Ontologies and support reasoning e.g.

• Class• EquivalentClass• TransitiveProperty• FunctionalProperty• Rdfs:label etc.

14

Page 15: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Financial Industry Business Ontology (FIBO)

What is FIBO?• an industry initiative to define financial industry terms, definitions and synonyms using semantic web principles

such as RDF/OWL and OMG modeling standards such as UML. • a joint effort by the Object Management Group (OMG) and the Enterprise Data Management (EDM) Council. • includes Ontologies for Financial Instruments, Business Entities, Market Data etc.• driven by regulatory and industry requirement for data quality and transparency.

Benefits • provides common reference model to integrate disparate technical systems and message formats within an

organization and across industry.• aims to bring transparency in the financial system.• aims to improve regulatory reporting by providing clear meaning of data.

15

Each industry is coming up with its own standard ontologies:

• CDISC : Clinical Research• HL7 : Health Care • ACORD : Insurance

Page 16: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

FIBO as Knowledge Base Language for Finance

• Any type of Legacy data can be a part- Both of Structured/unstructured data can be interpreted

into Ontology Language using FIBO vocabularies.

• FIBO brings the parts together- FIBO pre-defines implicit/explicit business rules between

objects in standardized manner.

- Undiscovered relationship can be added as combination

of standard vocabularies using query grammar.

• Inference: Make the sum of the parts whole- Business rules are included as a part of the data model.

- A query will leverage business rules defined in FIBO

16

Page 17: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

EKG;ENTERPRISE KNOWLEDGE GRAPHProof of Concept

17

Page 18: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Introducing EKG – Enterprise Knowledge Graph

Enterprise Data Platform

Data Producers, Data Lake

Operational Zone Analytical Zone EKG Zone

Integration Zone

• Normalized Model• Support Operational

Reporting• Data Distribution

• Dimensional Model• Support Analytical

Reporting• Connected to BI Layer

• Graph Data Model• Support Semantic Query• Extended data service

with Inference feature

18

Page 19: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Logical View of PoC

Issuers of Securities

Counterparty

Bloomberg File

Brokers

Agents

Investment Advisors

Legal Entities

EKG(Triplestore)

Legacy Tables Query

• Use legacy tables as sources. • Translate the legacy data into triples and store them.

• Do queries with Business questions.

19

Page 20: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

PoC TechnologyKey Components

Resource Mapper- Developed as an in-house Java code

leveraging external libraries for triple store communication.

Triple Store- Implemented on AllegroGraph Free Edition

(Oracle 12c was also tested)

Mapping Files- CSV format files that contain mapping rules

between tables and triples.

FIBO Extension- Relevant vocabularies picked from published

FIBO version - Additional vocabularies for extension

Linux Server

Results:+1.4 million triples

Resource Mapper

Legacy Database

File feed Data Files

API over HTTP

MappingFiles

Define mappings using Excel

Triple Store*

* AllegroGraph 6.3 or Oracle 12c Spatial and Graph

Import FIBO RDF/OWL files

Input:10 tables w/ +250k rows

20

Page 21: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Rule-based Triple translation

Column Usage Sample value

SUBJECT A column which has the name of Subject resource

<name of a column>

FLAG Defines a type of a Object in a statement. URI (Bare URI), Literal (Value), Resource

PREDICATE A URI of a Predicate http://www.omg.org/spec/EDMC‐FIBO/BE/LegalEntities/LegalPersons/isOrganizedIn

OBJECT A URI, a column which have a value of Literal or a name of Object resource

http://www.omg.org/spec/EDMC‐FIBO/BE/LegalEntities/LegalPersons/LegalEntity

BASE_URI_Subject Base URI for Subject a resource http://franklintempleton.com/fibopoc/SecurityIssuer#

BASE_URI_Object Base URI for Object a resource http://franklintempleton.com/fibopoc/SecurityIssuer#

CONTEXT URI of graph http://franklintempleton.com/fibopoc/legalentity

SUBJECT FLAG PREDICATE OBJECT CONTEXT BASE_URI_S BASE_URI_Orisk_cntry URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://spec.edmcouncil.org/fibo/red/fnd/plachttp://franklintemhttp://franklintempleton.com/fibopoc/Country#risk_cntry URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.w3.org/2002/07/owl#Individual http://franklintemhttp://franklintempleton.com/fibopoc/Country#bbcompid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.semanticweb.org/gkim/ontologiehttp://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbcompid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.w3.org/2002/07/owl#Individual http://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbcompid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://spec.edmcouncil.org/fibo/red/be/corphttp://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbcompid Literal http://www.w3.org/2000/01/rdf‐schema#labcomp_name http://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbcompid Resource http://spec.edmcouncil.org/fibo/red/be/owbbultimate_parentid http://franklintemhttp://franklintempletohttp://franklintempleton.com/fibopoc/Bloobbultimate_parentid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.semanticweb.org/gkim/ontologiehttp://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbultimate_parentid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.w3.org/2002/07/owl#Individual http://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbultimate_parentid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://spec.edmcouncil.org/fibo/red/be/ownhttp://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbparentid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.semanticweb.org/gkim/ontologiehttp://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#bbparentid URI http://www.w3.org/1999/02/22‐rdf‐syntax‐nhttp://www.w3.org/2002/07/owl#Individual http://franklintemhttp://franklintempleton.com/fibopoc/BloombergIdentifier#

21

Page 22: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Technical Environment

1. Triple Store : Allegrograph – an RDF graph database

2. Visual Discovery Tool : Gruff – a triple store browser

3. Query Language: SPARQL – RDF query language

Triple Store

AllegroGraph

Gruff

SELECT ?contract WHERE { ?contract <hasCounterparty> ?Counterparty. ?Counterparty.<isLegalEntity> ?lege }

SPARQL

22

Page 23: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

USE CASE DEMO

23

Page 24: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Use Case #1:Am I allowed to make this contract with them?

Is this a valid trade?

Fund A Counterparty X

US Person

Dodd Frank Compliant

Non-US Person

Under EU Regulation

Has not approved by the Committee yet

?

Trying to make a Swap

contract

24

Page 25: APPLYING KNOWLEDGE BASED AI TO MODERN DATA … › resources › SFDAMA › Downloads... · “AI is the study of how to make computers do things at which, at the moment, people are

SF DAMA Day 2017

Use Case #1:Sample Process

25