the database for ai - developer.ibm.com · example: leonardo di caprio was the actor for the romeo...

Contribute on github.com/graknlabs Follow us on twitter.com/graknlabs Join our community on grakn.ai/slack THE DATABASE FOR AI


Post on 06-Nov-2019


Page 1: THE DATABASE FOR AI - developer.ibm.com · Example: Leonardo di Caprio was the actor for the Romeo character in the film Romeo and Juliet. Virtual Relationships Located-in Kings Cross


THE DATABASE FOR AI

Page 2
Page 3

“For a computer to pass a Turing Test, it needs to possess: Natural Language Processing, Knowledge Representation, Automated Reasoning and Machine Learning”

Peter Norvig and Stuart J. Russell, “Artificial Intelligence: A Modern Approach”, 1994

Page 4

WHAT AN INTELLIGENT AGENT NEEDS TO POSSESS

KNOWLEDGE REPRESENTATION SYSTEM

AUTOMATED REASONING

NATURAL LANGUAGE PROCESSING

BIG DATA MACHINE LEARNING

Page 5

WHAT AN INTELLIGENT AGENT NEEDS TO POSSESS

A KNOWLEDGE BASE FOR STORAGE AND RETRIEVAL OF COMPLEX INFORMATION

NATURAL LANGUAGE PROCESSING

BIG DATA MACHINE LEARNING

Page 6

Meet GRAKN and GRAQL

Grakn is a distributed data platform that uses an ontology to govern the data structure.

Graql is the knowledge graph query language that performs automated reasoning and graph analytics over Grakn.

[Architecture diagram. Labels: GRAKN, GRAQL, GRAKN KNOWLEDGE REPRESENTATION SYSTEM, INFERENCE ENGINE, GRAPH ANALYTICS, APACHE TINKERPOP, APACHE HBASE / CASSANDRA, APACHE KAFKA, APACHE SPARK, APACHE HADOOP]

Page 7

Grakn is a database in the form of a knowledge base that uses an ontology to model complex datasets

&

Graql is a query language that performs automated reasoning to simplify querying of complex datasets

Page 8

Ontology as a knowledge representation model

A modelling tool should be able to model the real world and all the hierarchies and hyper-relationships contained in it.

Page 9

Grakn Knowledge Model

Concept types: Entity, Relation, Role, Resource, Rules

[Diagram labels: Person, Name (has); Employment, Employee, Employer (has-role, has-role, plays-role)]

Page 10

Entities and Resources

[Diagram: Person has name; company has name]

Page 11

Entity Inheritance

[Diagram: Customer sub Person; Startup sub company; Person has name; company has name]

Page 12

Relationship Structure

[Diagram: Customer sub Person; Startup sub company; Person has name; company has name; Employment has-role Employee; Employment has-role Employer; Person plays-role Employee; company plays-role Employer]

Page 13

Implied Relationship Structure

[Diagram: Customer sub Person; Startup sub company; Person has name; company has name; Employment has-role Employee; Employment has-role Employer; Person plays-role Employee; company plays-role Employer]
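
One way to read the implied structure: because Customer is a sub of Person, it can play the Employee role without that link being declared again. The sketch below is plain Python, not Graql or the Grakn API. The EntityType class is hypothetical, and the pairing of Person with Employee and company with Employer is an assumption read off the slide layout.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EntityType:
    """Hypothetical stand-in for an entity type in the ontology."""
    name: str
    parent: Optional["EntityType"] = None          # the 'sub' link
    roles: Set[str] = field(default_factory=set)   # direct 'plays-role' links

    def playable_roles(self) -> Set[str]:
        """Roles declared on this type plus those inherited via 'sub'."""
        inherited = self.parent.playable_roles() if self.parent else set()
        return self.roles | inherited


# Assumed schema, modelled on the slide.
person = EntityType("person", roles={"employee"})
company = EntityType("company", roles={"employer"})
customer = EntityType("customer", parent=person)
startup = EntityType("startup", parent=company)

# The employment relation relates exactly these two roles ('has-role').
employment_roles = {"employee", "employer"}
```

Here customer.playable_roles() includes "employee" even though the role was only declared on person, which is the implied link the slide title refers to.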

Page 14

Non-directional Relationships

Example: Bob is married to Alice, where Bob is the husband and Alice is the wife

[Diagram: marriage has-role husband; marriage has-role wife]

Page 15

Relationships and Resources

Example: Bob married Alice on 14 Jan 2017, where Bob is the husband and Alice is the wife

[Diagram: marriage has-role husband; marriage has-role wife; marriage has date]

Page 16

Relationship in Relationship

[Diagram: marriage has-role husband; marriage has-role wife; marriage has date; located-in has-role located-subject; located-in has-role subject-location; marriage plays-role located-subject]

Example: Bob married Alice on 14 Jan 2017 in Austin, Texas, where Bob is the husband and Alice is the wife
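
The nesting in this example can be sketched in plain Python: the marriage relation carries a date resource and is itself the located-subject of a located-in relation. This is an illustration only, not Graql or the Grakn API. The Relation class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Relation:
    """Hypothetical relation instance: role players may be entities or relations."""
    type_name: str
    role_players: Dict[str, Union[str, "Relation"]]
    resources: Dict[str, str] = field(default_factory=dict)


# The marriage relation, with the date attached as a resource.
marriage = Relation(
    "marriage",
    {"husband": "Bob", "wife": "Alice"},
    resources={"date": "2017-01-14"},
)

# The marriage itself plays the located-subject role in a second relation.
venue = Relation(
    "located-in",
    {"located-subject": marriage, "subject-location": "Austin, Texas"},
)
```

Because the role player is the marriage object rather than Bob or Alice, the location is a fact about the event, not about either person.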

Page 17

N-ary Relations

[Diagram: movie-cast has-role actor; movie-cast has-role character; movie-cast has-role casted-movie]

Example: Leonardo DiCaprio was the actor for the Romeo character in the film Romeo and Juliet
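
A minimal Python sketch of the same fact as a single three-role relation. The CastRelation class is hypothetical and is not the Grakn API; the role names are taken from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CastRelation:
    """Hypothetical n-ary relation: one instance, any number of role players."""
    type_name: str
    role_players: Dict[str, str]

    def arity(self) -> int:
        """Number of roles filled in this relation instance."""
        return len(self.role_players)


cast = CastRelation(
    "movie-cast",
    {
        "actor": "Leonardo DiCaprio",
        "character": "Romeo",
        "casted-movie": "Romeo and Juliet",
    },
)
```

Keeping all three role players in one relation preserves which actor played which character in which film. Three separate binary edges would lose that association once the same actor or character appears in more than one film.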

Page 18

Virtual Relationships

[Diagram: Kings Cross located-in London; London located-in UK; virtual relationship: Kings Cross located-in UK. In each located-in relationship the contained place is the located-subject and the containing place is the subject-location.]
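
The virtual relationship on this slide is the one that follows from the two stored ones by transitivity. A rough Python sketch of that inference, assuming a simple fixed-point loop rather than whatever Grakn's inference engine actually does:

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (located-subject, subject-location)


def infer_located_in(stored: Set[Fact]) -> Set[Fact]:
    """Return the stored facts plus every fact implied by transitivity."""
    facts = set(stored)
    changed = True
    while changed:
        changed = False
        for a, b in list(facts):
            for c, d in list(facts):
                # a located-in b and b located-in d implies a located-in d
                if b == c and (a, d) not in facts:
                    facts.add((a, d))
                    changed = True
    return facts


stored = {("Kings Cross", "London"), ("London", "UK")}
# Facts that were derived rather than stored.
virtual = infer_located_in(stored) - stored
```

Only the two stored facts are written; ("Kings Cross", "UK") exists solely as a derived result.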

Page 19

Dynamic Relations

[Diagram: timeline of Schedule A and Schedule B, in the order A start, B start, A end, B end]
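
The slide gives no definition, so the following is an interpretation: a dynamic relation is one computed from the data rather than stored, such as "Schedule A overlaps Schedule B" when the start and end times interleave as shown. A minimal Python sketch under that assumption, with hypothetical names and time units:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    name: str
    start: int  # hypothetical time units
    end: int


def overlaps(a: Schedule, b: Schedule) -> bool:
    """True when the two intervals share any time."""
    return a.start < b.end and b.start < a.end


# Same ordering as the slide: A start, B start, A end, B end.
schedule_a = Schedule("A", start=1, end=3)
schedule_b = Schedule("B", start=2, end=4)
```

With these values overlaps(schedule_a, schedule_b) holds, and it stops holding as soon as either schedule is moved so that one ends before the other starts, without any stored relation needing an update.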

Page 20

Data Model Constraint

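[Diagram: Customer sub Person; Startup sub company; Person has name; company has name; Employment has-role Employee; Employment has-role Employer; Person plays-role Employee; company plays-role Employer; marriage has-role husband; marriage has-role wife; Person plays-role husband; Person plays-role wife]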

Page 21

Data Model Constraint

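[Diagram: Customer sub Person; Startup sub company; Person has name; company has name; Employment has-role Employee; Employment has-role Employer; Person plays-role Employee; company plays-role Employer; marriage has-role husband; marriage has-role wife; Person plays-role husband; Person plays-role wife]

✓ Write commit success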

Page 22

Data Model Constraint

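[Diagram: Customer sub Person; Startup sub company; Person has name; company has name; Employment has-role Employee; Employment has-role Employer; Person plays-role Employee; company plays-role Employer; marriage has-role husband; marriage has-role wife; Person plays-role husband; Person plays-role wife]

❌ Write commit fails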
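
The transcript does not preserve which write succeeds and which fails. As an illustration only, the Python sketch below assumes the failing write has a company trying to play the husband role, which the schema does not allow. The tables, the commit function and the CommitError class are all hypothetical, not the Grakn API.

```python
from typing import Dict, Set

# Assumed schema: which roles each entity type may play ('plays-role').
PLAYS_ROLE: Dict[str, Set[str]] = {
    "person": {"employee", "husband", "wife"},
    "company": {"employer"},
}
# Assumed schema: which roles each relation type relates ('has-role').
HAS_ROLE: Dict[str, Set[str]] = {
    "employment": {"employee", "employer"},
    "marriage": {"husband", "wife"},
}


class CommitError(Exception):
    """Raised when a write violates the schema."""


def commit(relation: str, players: Dict[str, str]) -> str:
    """Validate role -> entity-type assignments before accepting a write."""
    for role, entity_type in players.items():
        if role not in HAS_ROLE.get(relation, set()):
            raise CommitError(f"{relation} does not have role {role}")
        if role not in PLAYS_ROLE.get(entity_type, set()):
            raise CommitError(f"{entity_type} cannot play role {role}")
    return "commit success"
```

A marriage between two persons passes both checks. Replacing the husband with a company raises CommitError, so the invalid data never reaches storage and the application layer does not have to police it.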

Page 23

A reasoning query language

A query language should not only be able to retrieve explicitly stored data, but also implicitly derived information.

Page 24

SQL vs Gremlin

Get customer A with ID 'ALFKI', get a different customer B who has bought products that A has bought, and get other products from B that A has not bought. Order by name.

Page 25

SQL vs Gremlin vs Graql

Get customer A with ID 'ALFKI', get a different customer B who has bought products that A has bought, and get other products from B that A has not bought. Order by name.

Page 26

Graql Basic

Get customer A with ID 'ALFKI', get a different customer B who has bought products that A has bought, and get other products from B that A has not bought. Order by name.
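
The query text from the slides' code screenshots did not survive in this transcript. To make the intent of the request concrete, here is the same logic as a plain Python sketch over in-memory data. It is not SQL, Gremlin or Graql; the purchase data other than the ID 'ALFKI' is invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical purchase data: customer ID -> set of product names.
purchases: Dict[str, Set[str]] = {
    "ALFKI": {"Chai", "Tofu"},
    "BONAP": {"Chai", "Ikura", "Pavlova"},
    "CHOPS": {"Konbu"},
}


def recommend(customer_id: str, purchases: Dict[str, Set[str]]) -> List[str]:
    """Products bought by customers who share a purchase with customer_id,
    excluding what customer_id already bought, ordered by name."""
    bought_by_a = purchases[customer_id]
    suggestions: Set[str] = set()
    for other_id, bought_by_b in purchases.items():
        # Customer B must differ from A and share at least one product.
        if other_id != customer_id and bought_by_a & bought_by_b:
            suggestions |= bought_by_b - bought_by_a
    return sorted(suggestions)


result = recommend("ALFKI", purchases)  # ['Ikura', 'Pavlova']
```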

Page 27

Graql Basic

Get every actor and their character in the movie Titanic and rank by their billing number

Get every couple that got married in Hawaii

Page 28

Graql reasoning query

Whether you write the query in SQL, NoSQL or a graph query language, it will still be complex and verbose, and may follow a suboptimal path.

[Diagram: Permanent driver, Temporary driver and Contract driver are each a sub of Driver. TR11, truck, postcode destination N3 8AB; CON22, Bus, county destination Kings Cross; BA191, Van, city destination London. Two located-in relationships link the destinations.]

Two problems: need type abstraction; need transitive relations

Page 29

Graql reasoning query

[Diagram: Permanent driver, Temporary driver and Contract driver are each a sub of Driver. TR11, truck, postcode destination N3 8AB; CON22, Bus, county destination Kings Cross; BA191, Van, city destination London. Two located-in relationships link the destinations.]

Two problems: need type abstraction; need transitive relations

5 Graql Lines
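
The Graql query itself is not preserved in this transcript. To show what the two problems mean in practice, the Python sketch below answers "which drivers are heading somewhere in London" by following sub links for type abstraction and located-in links for transitivity. The driver names, the assignments and the direction of the located-in links are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# 'sub' links: subtype -> supertype (driver hierarchy from the slide).
SUB: Dict[str, str] = {
    "permanent-driver": "driver",
    "temporary-driver": "driver",
    "contract-driver": "driver",
}
# 'located-in' links: place -> enclosing place (assumed direction).
LOCATED_IN: Dict[str, str] = {"N3 8AB": "Kings Cross", "Kings Cross": "London"}

# Hypothetical assignments: (person, driver type, destination).
ASSIGNMENTS: List[Tuple[str, str, str]] = [
    ("Ann", "permanent-driver", "N3 8AB"),
    ("Ben", "temporary-driver", "Kings Cross"),
    ("Cat", "contract-driver", "London"),
    ("Dan", "contract-driver", "Leeds"),
]


def is_a(type_name: str, ancestor: str) -> bool:
    """Type abstraction: follow 'sub' links up to the ancestor."""
    while type_name != ancestor and type_name in SUB:
        type_name = SUB[type_name]
    return type_name == ancestor


def within(place: str, region: str) -> bool:
    """Transitive relation: follow 'located-in' links up to the region."""
    while place != region and place in LOCATED_IN:
        place = LOCATED_IN[place]
    return place == region


def drivers_heading_to(region: str) -> List[str]:
    """Every driver, of any subtype, whose destination lies in the region."""
    return sorted(
        name
        for name, type_name, destination in ASSIGNMENTS
        if is_a(type_name, "driver") and within(destination, region)
    )
```

Without the two helper functions, the query would need one branch per driver subtype and one per level of the location hierarchy, which is the verbosity the slide describes.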

Page 30

Graql analytics query

Example: Connected Component (clustering algorithm)

For each vertex V:

Superstep 1:
  V sends its own id via both outgoing and incoming edges.
  V sets its own id as its cluster label.

Superstep n:
  For every received message m of V, compare it to V's current cluster label L: if m > L, set the label to m.
  If the cluster label has not changed in this superstep, vote to halt; else, send the new cluster label via all edges.

Global operation:
  While not every vertex votes to halt, and n < N, do another superstep n + 1.

A VertexProgram class could be around 200 lines of Java code.
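
The supersteps above can be sketched on a single machine in plain Python. This is an illustration of the message-passing logic only, not the TinkerPop VertexProgram API and not distributed; the function name and the max_supersteps parameter (standing in for N) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def connected_components(
    vertices: List[int], edges: List[Tuple[int, int]], max_supersteps: int = 100
) -> Dict[int, int]:
    """Label every vertex with the largest vertex id in its component."""
    # Messages travel along both outgoing and incoming edges.
    neighbours: Dict[int, Set[int]] = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Superstep 1: each vertex takes its own id as label and sends it out.
    label = {v: v for v in vertices}
    inbox: Dict[int, List[int]] = {v: [] for v in vertices}
    for v in vertices:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    # Supersteps 2..N: adopt the largest label received so far.
    for _ in range(max_supersteps - 1):
        outbox: Dict[int, List[int]] = {v: [] for v in vertices}
        any_active = False
        for v in vertices:
            best = max(inbox[v], default=label[v])
            if best > label[v]:
                label[v] = best
                any_active = True
                for n in neighbours[v]:
                    outbox[n].append(best)
            # Otherwise the vertex votes to halt and sends nothing.
        inbox = outbox
        if not any_active:  # every vertex voted to halt
            break
    return label


labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

Vertices that end with the same label belong to the same cluster; in the usage line, vertices 1 to 3 share one label and vertices 4 and 5 share another.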

Page 31

Graql analytics query


Page 32

Graql analytics query

And we’ll continue adding more in each release!

Page 33

WHAT AN INTELLIGENT AGENT NEEDS TO POSSESS

GRAKN

GRAQL

NATURAL LANGUAGE PROCESSING

BIG DATA MACHINE LEARNING

Page 34

Thank you!

Join us at: grakn.ai/community

Page 35

Appendix: Comparison with other databases

Page 36

Graph/NoSQL databases have no schema

No schema = No model constraint

Complex model + no constraint = exponentially many possible paths and mistakes

App layer is responsible for schema and model consistency = too expensive!

Query interpretation has to be managed by user (engineer) in app layer

No knowledge representation and no reasoning of information

Page 37

SQL has a schema, but it is not very expressive

UML modelling still works

Normalised data model constraints managed by database

Higher level abstraction is managed at application layer

Query interpretation has to be managed by user (engineer) in app layer

Bad for querying large numbers of relationships in highly interconnected data

No knowledge representation and no reasoning of information

Page 38

RDF & OWL are not for software/data engineering

RDF data model too low-level to solve the complexity challenge

OWL is "open-world", databases are "closed-world"

OWL not ideal for graph data, better for tree data

High entry threshold for non-logicians

RDF & OWL are for the semantic web (not databases) and logicians (not software engineers)