The Database for AI - developer.ibm.com
Contribute on github.com/graknlabs
Follow us on twitter.com/graknlabs
Join our community on grakn.ai/slack
THE DATABASE FOR AI
“For a computer to pass a Turing Test, it needs to possess: Natural Language Processing,
Knowledge Representation, Automated Reasoning and Machine Learning”
Peter Norvig and Stuart J. Russell, “Artificial Intelligence: A Modern Approach”, 1994
WHAT AN INTELLIGENT AGENT NEEDS TO POSSESS
KNOWLEDGE REPRESENTATION SYSTEM
AUTOMATED REASONING
NATURAL LANGUAGE PROCESSING
BIG DATA MACHINE LEARNING
WHAT AN INTELLIGENT AGENT NEEDS TO POSSESS
A KNOWLEDGE BASE FOR STORAGE AND
RETRIEVAL OF COMPLEX
INFORMATION
NATURAL LANGUAGE PROCESSING
BIG DATA MACHINE LEARNING
Meet GRAKN and GRAQL
APACHE KAFKA
GRAKN
APACHE HBASE / CASSANDRA
GRAQL
APACHE TINKERPOP
GRAKN KNOWLEDGE REPRESENTATION SYSTEM
GRAPH ANALYTICS
Grakn is a distributed data platform that uses an ontology to govern the data structure.
Graql is the knowledge graph query language that performs automated reasoning and graph analytics over Grakn.
APACHE SPARK
APACHE HADOOP
INFERENCE ENGINE
Grakn is a database in the form of a knowledge base that uses an ontology to model complex datasets
&
Graql is a query language that performs automated reasoning to simplify querying of complex datasets
Ontology as a knowledge representation model
A modelling tool should be able to model the real world and all the hierarchies and hyper-relationships contained in it.
Grakn Knowledge Model
Resource Role Entity Relation Rules
Employment
Employee Employer
Person Name has
Has role Has role Plays role
Entities and Resources
Person company
name
has has
Entity Inheritance
Customer Startup
Person company
name
has has
sub sub
Relationship Structure
Customer Startup
Person company
name
has has
Employee Employer
Employment
sub sub
has-role has-role
plays-role plays-role
Implied Relationship Structure
Customer Startup
Person company
name
has has
Employee Employer
Employment
sub sub
has-role has-role
plays-role plays-role
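The structure on the slides above can be sketched in plain Python. This is only an illustration of the idea, not Grakn's API: the `ConceptType` class and its method names are hypothetical, and the assignment of `plays-role` edges to `person` and `company` is one reading of the diagram. The point it shows is the implied structure: a subtype inherits the resources and roles of its supertype.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptType:
    """Hypothetical stand-in for a type in the ontology."""
    name: str
    parent: Optional["ConceptType"] = None        # the `sub` edge
    resources: set = field(default_factory=set)   # the `has` edges
    plays: set = field(default_factory=set)       # the `plays-role` edges
    roles: set = field(default_factory=set)       # the `has-role` edges (relation types)

    def ancestors(self):
        """Yield this type, then every supertype reachable through `sub`."""
        node = self
        while node is not None:
            yield node
            node = node.parent

    def all_resources(self):
        """Resources declared on this type or inherited from a supertype."""
        return set().union(*(t.resources for t in self.ancestors()))

    def all_roles_played(self):
        """Roles declared on this type or inherited from a supertype."""
        return set().union(*(t.plays for t in self.ancestors()))


# Assumed reading of the diagram: person and company carry the edges,
# customer and startup inherit them through `sub`.
person = ConceptType("person", resources={"name"}, plays={"employee"})
company = ConceptType("company", resources={"name"}, plays={"employer"})
customer = ConceptType("customer", parent=person)
startup = ConceptType("startup", parent=company)
employment = ConceptType("employment", roles={"employee", "employer"})

print(customer.all_resources())     # inherited from person
print(startup.all_roles_played())   # inherited from company
```

Nothing is declared on `customer` directly, yet it can have a name and play the employee role, which is what the "implied" slide conveys.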
Non-directional Relationships
Example: Bob is married to Alice, where Bob is the husband and Alice is the wife
husband wife marriage
has-role has-role
Relationships and Resources
Example: Bob married Alice on 14 January 2017, where Bob is the husband and Alice is the wife
husband wife marriage
has-role has-role
date
has
Relationship in Relationship
husband wife marriage
has-role has-role
date
has
located-subject
subject-location
located-in
has-role has-role
plays-role
Example: Bob married Alice on 14 January 2017 in Austin, Texas, where Bob is the husband and Alice is the wife
N-ary Relations
casted-movie
character
movie-cast has-role
has-role
actor
has-role
Example: Leonardo DiCaprio played the character Romeo in the film Romeo and Juliet
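The last three slides (a relation with a resource, a relation inside a relation, and an n-ary relation) share one idea: a relation is a set of role players rather than a single directed edge. A minimal sketch in plain Python, assuming a hypothetical `Relation` class that is not part of Grakn:

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical hyper-relation: any number of roles, plus its own resources."""
    type: str
    role_players: dict                               # role name -> entity name or another Relation
    resources: dict = field(default_factory=dict)    # resource name -> value


# N-ary relation: three role players in one movie-cast relation.
casting = Relation(
    "movie-cast",
    {
        "actor": "Leonardo DiCaprio",
        "character": "Romeo",
        "casted-movie": "Romeo and Juliet",
    },
)

# Relation with a resource: the marriage itself has a date.
marriage = Relation(
    "marriage",
    {"husband": "Bob", "wife": "Alice"},
    {"date": "2017-01-14"},
)

# Relation in relation: the marriage plays the located-subject role.
wedding_place = Relation(
    "located-in",
    {"located-subject": marriage, "subject-location": "Austin, Texas"},
)

print(len(casting.role_players))
print(wedding_place.role_players["located-subject"].resources["date"])
```

Because role players are keyed by role name, no direction has to be chosen between husband and wife, and adding a fourth player to the casting would not change the shape of the data.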
Virtual Relationships
Located-in
London Kings Cross UK
Located-in Located-in
Located-subject
Located-subject
Subject-location
Located-subject
Subject-location
Subject-location
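One way to read this slide: only "Kings Cross is located in London" and "London is located in the UK" are stored, and "Kings Cross is located in the UK" is derived rather than written. The sketch below illustrates that with a naive transitive closure in plain Python. The function name is hypothetical and this is not how Grakn's inference engine is implemented.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs reachable by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    while True:
        derived = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if derived <= closure:      # nothing new was derived: fixed point reached
            return closure
        closure |= derived


# Explicitly stored located-in relations (subject, location).
stored = {("Kings Cross", "London"), ("London", "UK")}

# Relations that exist only by inference.
virtual = transitive_closure(stored) - stored
print(virtual)
```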
Dynamic Relations
Schedule A
Schedule B
A Start B Start A End B end
Data Model Constraint
Customer Startup
Person company name has has
Employee Employer
Employment
sub sub
has-role has-role
plays-role plays-role
husband
wife
marriage
has-role
has-role
plays-role
plays-role
✓ Write commit success
❌ Write commit fails
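The constraint slides can be illustrated with a small validation sketch in plain Python. The tables below are an assumed reading of the diagram (in particular, that `person` plays the husband and wife roles and `company` does not), and `validate` and `commit` are hypothetical helpers, not Grakn functions.

```python
# Assumed ontology, transcribed from the diagram.
SUPERTYPE = {"customer": "person", "startup": "company"}
PLAYS = {
    "person": {"employee", "husband", "wife"},
    "company": {"employer"},
}
HAS_ROLE = {
    "employment": {"employee", "employer"},
    "marriage": {"husband", "wife"},
}


def allowed_roles(type_name):
    """Roles a type may play, including those inherited via `sub`."""
    roles = set()
    while type_name is not None:
        roles |= PLAYS.get(type_name, set())
        type_name = SUPERTYPE.get(type_name)
    return roles


def validate(relation_type, role_players):
    """role_players maps each role to the type of the instance playing it."""
    for role, player_type in role_players.items():
        if role not in HAS_ROLE[relation_type]:
            raise ValueError(f"{relation_type} has no role {role}")
        if role not in allowed_roles(player_type):
            raise ValueError(f"{player_type} cannot play {role}")


def commit(relation_type, role_players):
    """Mimic the two outcomes shown on the slides."""
    try:
        validate(relation_type, role_players)
    except ValueError as error:
        return f"Write commit fails: {error}"
    return "Write commit success"


print(commit("marriage", {"husband": "customer", "wife": "customer"}))
print(commit("marriage", {"husband": "customer", "wife": "startup"}))
```

The first write is accepted because a customer is a person; the second is rejected because nothing in the model lets a company play the wife role.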
A reasoning query language
A query language should not only be able to retrieve explicitly stored data, but also implicitly derived information.
SQL vs Gremlin
Get customer A with ID 'ALFKI', get a different customer B who has bought products that A has bought, and get the other products bought by B that A has not bought. Order by name.
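To make the intent of this query concrete, here is the same logic over an in-memory dictionary in plain Python. The purchase data and the `recommend` function are hypothetical and only illustrate what the query asks for. They do not reproduce the SQL or Gremlin shown on the slide.

```python
# Hypothetical purchase history: customer id -> set of product names.
purchases = {
    "ALFKI": {"Chai", "Tofu"},
    "BONAP": {"Chai", "Ikura"},
    "CHOPS": {"Konbu"},
}


def recommend(customer_id, purchases):
    """Products bought by customers who share a purchase with customer_id."""
    mine = purchases[customer_id]
    recommended = set()
    for other, theirs in purchases.items():
        # A different customer B who bought something A also bought.
        if other != customer_id and mine & theirs:
            # Keep only B's products that A has not bought.
            recommended |= theirs - mine
    return sorted(recommended)    # order by name


print(recommend("ALFKI", purchases))
```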
SQL vs Gremlin vs Graql
Graql Basic
Get every actor and their character in the movie Titanic and rank by their billing number
Get every couple that got married in Hawaii
Graql reasoning query
Whether you write the query in SQL, NoSQL or a graph query language, it will still be complex and verbose, and may follow a suboptimal path
Permanent driver
Temporary driver
TR11 N3 8AB
truck
postcode destination
CON22 Kings Cross
Bus
County destination
BA191 London
Van
City destination
located-in
located-in
Contract driver
Driver
SUB
SUB
SUB
Two Problems: need type abstraction; need transitive relations
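The two problems can be illustrated together in plain Python. The pairing of drivers, vehicles and destinations below is a guess at what the diagram shows, as is the chain of `located-in` edges, and the helper names are hypothetical. The sketch only shows why a query for "any driver heading somewhere in London" needs both the type hierarchy and transitive location.

```python
# Assumed reading of the diagram.
SUB = {
    "permanent driver": "driver",
    "temporary driver": "driver",
    "contract driver": "driver",
}
LOCATED_IN = {"N3 8AB": "Kings Cross", "Kings Cross": "London"}
TRIPS = [
    # (driver type, vehicle id, destination)
    ("permanent driver", "TR11", "N3 8AB"),
    ("temporary driver", "CON22", "Kings Cross"),
    ("contract driver", "BA191", "London"),
]


def is_a(type_name, target):
    """Type abstraction: walk up the SUB hierarchy."""
    while type_name is not None:
        if type_name == target:
            return True
        type_name = SUB.get(type_name)
    return False


def is_within(place, region):
    """Transitive relation: walk up the located-in chain."""
    while place is not None:
        if place == region:
            return True
        place = LOCATED_IN.get(place)
    return False


def vehicles_of(driver_type, region):
    """Vehicles whose driver is of driver_type and whose destination lies in region."""
    return [
        vehicle
        for (kind, vehicle, destination) in TRIPS
        if is_a(kind, driver_type) and is_within(destination, region)
    ]


print(vehicles_of("driver", "London"))
print(vehicles_of("driver", "Kings Cross"))
```

Without the two helpers, the query would need one branch per driver subtype and one per level of the location hierarchy.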
5 Graql Lines
Graql analytics query
Example: Connected Component (clustering algorithm)
For each vertex V,
Superstep 1:
    V sends its own id via both outgoing and incoming edges
    V sets its own id as cluster label
Do Superstep n:
    For every received message m of V, compare it to its current cluster label L:
        If m > L, set the label to m;
    If the cluster label has not changed in this superstep, vote to halt;
    Else, send the new cluster label via all edges;
Global operation:
    While not every vertex votes to halt, and n < N, do another superstep n + 1.
A VertexProgram Class could be around 200 lines of Java code.
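For comparison, the superstep logic above can be simulated on a single machine in a few lines of Python. This is a sketch of the algorithm as described on the slide, not a TinkerPop VertexProgram and not Graql. The function name and the `max_supersteps` parameter (standing in for N) are assumptions.

```python
def connected_components(vertices, edges, max_supersteps=100):
    """Label every vertex with the largest vertex id in its component."""
    # Edges are treated as undirected: messages travel both ways.
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Superstep 1: each vertex labels itself and sends its id along all edges.
    label = {v: v for v in vertices}
    inbox = {v: [label[u] for u in neighbours[v]] for v in vertices}

    # Supersteps 2..N: adopt the largest label received, forward it if changed.
    for _ in range(max_supersteps):
        outbox = {v: [] for v in vertices}
        any_change = False
        for v in vertices:
            best = max(inbox[v], default=label[v])
            if best > label[v]:
                label[v] = best
                any_change = True
                for u in neighbours[v]:
                    outbox[u].append(best)
            # Otherwise the vertex votes to halt by sending nothing.
        if not any_change:          # every vertex voted to halt
            break
        inbox = outbox
    return label


clusters = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(clusters)
```

Vertices 1, 2 and 3 end up with label 3 and vertices 4 and 5 with label 5, so each connected component is identified by its largest vertex id.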
Graql analytics query
And we’ll continue adding more in each release!
WHAT AN INTELLIGENT AGENT NEEDS TO POSSESS
GRAKN
GRAQL
NATURAL LANGUAGE PROCESSING
BIG DATA MACHINE LEARNING
Thank you!
Join us at: grakn.ai/community
Appendix: Comparisons with Other Databases
Graph/NoSQL databases have no schema
No schema = No model constraint
Complex model + no constraint = Exponential possible paths and mistakes
App layer is responsible for schema and model consistency = too expensive!
Query interpretation has to be managed by user (engineer) in app layer
No knowledge representation and no reasoning of information
SQL databases have a schema, but it is not very expressive
UML modelling still works
Normalised data model constraints managed by database
Higher level abstraction is managed at application layer
Query interpretation has to be managed by user (engineer) in app layer
Poor at querying large numbers of relationships in highly interconnected data
No knowledge representation and no reasoning of information
RDF & OWL are not for software/data engineering
The RDF data model is too low-level to solve the complexity challenge
OWL is "open-world"; databases are "closed-world"
OWL is not ideal for graph data; it is better suited to tree data
High entry threshold for non-logicians
RDF & OWL are for the semantic web (not databases) and logicians (not software engineers)