![Page 1: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/1.jpg)
Scalable Smart Data Management in the Cloud
Alex Simov & Yavor Petkov
Cloud2Days, 2015
Scalable Smart Data Management in the Cloud, Cloud2Days, Sofia #1 Nov 2015
![Page 2: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/2.jpg)
• Why we developed the Self-Service Semantic Suite (S4)
• What is S4
• S4 features
• Cloud architecture
• S4 for developers
Presentation outline
![Page 3: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/3.jpg)
About Ontotext
• Provides products & solutions for content enrichment and metadata management
– 70 employees, headquartered in Sofia (Bulgaria)
– Sales presence in London, Washington & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
![Page 4: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/4.jpg)
Some of our clients
![Page 5: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/5.jpg)
Why we developed the Self-Service Semantic Suite (S4)
![Page 6: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/6.jpg)
• How can we unlock more insight from text?
• How can we interlink & search across text and structured data sources?
• How can we improve data & content reuse?
• How can we integrate data sources faster?
• How can we reuse external open data sources?
• How can we discover relations between entities?
Typical challenges for our customers
![Page 7: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/7.jpg)
• Unlock the value of semantic technologies to SMEs
– Most success stories so far come from bigger companies
• Lower the technology adoption barriers and risks
– Challenge: perceived risks associated with new technology adoption
– Challenge: insufficient resources to implement new technologies
– Challenge: procurement & provisioning processes
Why did we create S4?
![Page 8: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/8.jpg)
• Utilise semantic technology for smart data applications
– Extract more value hidden in text
– Interlink structured and unstructured data sources
– Semantic search (instead of keyword-based search)
– Reuse open knowledge graphs
• Low adoption cost and risk
• No need for complex planning & procurement
• Pay only for what you use
S4 benefits
![Page 9: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/9.jpg)
What is S4
![Page 10: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/10.jpg)
• Self-service capabilities for text analytics, content enrichment and metadata management
– Access to large open knowledge graphs
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
• Available anytime, anywhere
– Simple RESTful services
• Simple, pay-per-use pricing
– No upfront commitments
What is S4?
![Page 11: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/11.jpg)
What is S4?
![Page 12: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/12.jpg)
Knowledge Graphs
![Page 13: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/13.jpg)
• SPARQL query endpoint to FactForge knowledge graph
– 500 million entities
– 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
Knowledge graphs with S4
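A minimal sketch of how a client might prepare a query for such a SPARQL endpoint, following the standard SPARQL 1.1 Protocol (query sent as a URL-encoded POST). The endpoint URL is a hypothetical placeholder, and the query is an illustration over DBpedia vocabulary, not the one shown in the talk.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: the real FactForge endpoint URL is not given in the slides.
SPARQL_ENDPOINT = "https://example.org/factforge/sparql"

# Illustrative query: up to ten people whose birthplace is Sofia (DBpedia vocabulary).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE {
  ?person dbo:birthPlace dbr:Sofia .
} LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query request (query via URL-encoded POST)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


request = build_sparql_request(SPARQL_ENDPOINT, QUERY)
# The request is only built here; urllib.request.urlopen(request) would send it.
```

Authentication is left out of this sketch; a real call would also need the credentials described in the API overview.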
![Page 14: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/14.jpg)
Knowledge graph query example
SPARQL query using DBpedia data
![Page 15: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/15.jpg)
Text Analytics
![Page 16: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/16.jpg)
• Text analytics services
– News annotation
– News categorisation
– Biomedical
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
Text analytics with S4
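A rough sketch of the "document in, simple JSON out" flow, keeping only the mentions that were linked to a knowledge graph instance. The field names (`document`, `documentType`, `annotations`, `text`, `uri`) are assumptions made for illustration and should be checked against the S4 documentation; the sample response is hand-written, not real service output.

```python
import json


def build_annotation_request(text: str, document_type: str = "text/plain") -> bytes:
    """Serialize a document for a text analytics call (field names are assumed)."""
    payload = {"document": text, "documentType": document_type}
    return json.dumps(payload).encode("utf-8")


def linked_entities(response_json: str) -> list[tuple[str, str]]:
    """Extract (surface text, linked URI) pairs from an assumed JSON response shape."""
    response = json.loads(response_json)
    pairs = []
    for annotation in response.get("annotations", []):
        uri = annotation.get("uri")
        if uri:  # keep only mentions that carry a link to a knowledge graph instance
            pairs.append((annotation["text"], uri))
    return pairs


# Hand-written sample in the assumed response shape.
sample_response = json.dumps({
    "annotations": [
        {"text": "Sofia", "type": "Location", "uri": "http://dbpedia.org/resource/Sofia"},
        {"text": "yesterday", "type": "Date"},
    ]
})

body = build_annotation_request("The conference took place in Sofia yesterday.")
print(linked_entities(sample_response))
# prints [('Sofia', 'http://dbpedia.org/resource/Sofia')]
```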
![Page 17: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/17.jpg)
News analytics example
S4 result
![Page 18: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/18.jpg)
RDF Data Management
![Page 19: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/19.jpg)
• Self-managed RDF database
– Available from AWS Marketplace
– Variety of hardware configurations
– Manage large data volumes
– Pay-per-hour pricing
– Free trial evaluation (one time)
• Fully-managed RDF DBaaS
– Low-cost DBaaS available 24/7
– Ideal for small & moderate data volumes
– Zero administration: automated operations, maintenance & upgrades
– Users pay only for the actual database utilization
– Free data hosting tier
RDF DBaaS Overview
![Page 20: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/20.jpg)
• Instantly deploy new databases when needed
• Accessible as REST services
• Isolation of the multi-tenant databases
• Fair use of shared resources
• A DBaaS on S4 is…
– A GraphDB instance
– Running within a Docker container
– With a private EBS data volume
Fully-managed RDF DBaaS (cont’d)
![Page 21: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/21.jpg)
DBaaS management console
![Page 22: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/22.jpg)
Amazon Cloud Architecture
![Page 23: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/23.jpg)
• Why AWS?
– Innovation
– Elasticity
– Rich infrastructure and platform services
– Reliability
• S4 builds upon …
– Compute: EC2, EBS
– Storage: S3, Glacier
– Databases: SimpleDB, DynamoDB
– Infrastructure: ELB, ASG, SQS, SNS, SES, …
– Management: CloudWatch
S4 on AWS
![Page 24: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/24.jpg)
S4 Architecture
[Architecture diagram. Components: applications, Web UI, routing nodes, data nodes, coordinator, storage, notifications, Docker repository, account / quota management, monitoring & logging, metadata store, text analytics, document queue, FactForge semantic warehouse]
![Page 25: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/25.jpg)
• Routing nodes
– Forward client requests to the proper data node
– Queue text processing requests
– Access control & quota checks
• Text processing nodes
• Data nodes
– Multiple Docker containers (GraphDB + EBS volume) per node
• Coordinator (single)
– Distribute DB initialisation / creation tasks to data nodes
• Management Console
S4 Architecture
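The routing node's role (access control, quota check, then forwarding to the data node that hosts the database) can be sketched as a toy in-memory model. This is not S4's implementation: the class, the placement table and the quota counters are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when an API key has no requests left in the current period."""


@dataclass
class RoutingNode:
    # database id -> address of the data node hosting it (hypothetical placement table)
    placement: dict[str, str]
    # API key -> remaining requests in the current period (hypothetical quota store)
    quota: dict[str, int]

    def route(self, api_key: str, database_id: str) -> str:
        """Return the data node that should serve the request, or raise."""
        remaining = self.quota.get(api_key)
        if remaining is None:  # access control: the key must be known
            raise PermissionError("unknown API key")
        if remaining <= 0:  # quota check before any work is forwarded
            raise QuotaExceeded(api_key)
        if database_id not in self.placement:
            raise KeyError(database_id)
        self.quota[api_key] = remaining - 1
        return self.placement[database_id]


router = RoutingNode(
    placement={"db-1": "data-node-a", "db-2": "data-node-b"},
    quota={"key-123": 2},
)
target = router.route("key-123", "db-2")
```

Keeping the checks in the routing layer means a rejected request never reaches a data node, which fits the "fair use of shared resources" goal stated for the DBaaS.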
![Page 26: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/26.jpg)
Dealing with Failures
[Architecture diagram. Components: applications, Web UI, routing nodes, data nodes, coordinator, storage, notifications, Docker repository, account / quota management, monitoring & logging, metadata store, text analytics, document queue, FactForge semantic warehouse]
![Page 27: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/27.jpg)
For Developers
![Page 28: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/28.jpg)
| S4 service | GET | POST | PUT | DELETE |
|---|---|---|---|---|
| text analytics | - | submit a document for processing | - | - |
| knowledge graph access | SPARQL query | SPARQL query | - | - |
| self- or fully managed RDF graph databases (OpenRDF REST API) | list repositories; query data; read data; get the database configuration | query data; update data | create repositories; update data (RDF document); update the database configuration | delete repositories; delete data |
S4 APIs overview
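For the RDF database row, a sketch of how some operations map to HTTP method and URL under the OpenRDF (Sesame) REST protocol, which uses `/repositories`, `/repositories/<id>` and `/repositories/<id>/statements`. The base URL and repository name are hypothetical, and repository creation and configuration calls are left out because their exact paths are not given in the slides.

```python
from urllib.parse import quote

# Hypothetical base URL of one RDF database service; not taken from the slides.
BASE_URL = "https://example.org/rdf"


def endpoint(operation: str, base: str, repo: str) -> tuple[str, str]:
    """Return (HTTP method, URL) for an operation, per the OpenRDF REST protocol."""
    repo_url = f"{base}/repositories/{quote(repo, safe='')}"
    table = {
        "list repositories": ("GET", f"{base}/repositories"),
        "query data": ("POST", repo_url),  # SPARQL query sent in the request body
        "read data": ("GET", f"{repo_url}/statements"),
        "add data": ("POST", f"{repo_url}/statements"),
        "replace data": ("PUT", f"{repo_url}/statements"),  # body is an RDF document
        "delete data": ("DELETE", f"{repo_url}/statements"),
        "delete repository": ("DELETE", repo_url),
    }
    return table[operation]


method, url = endpoint("read data", BASE_URL, "demo")
print(method, url)
# prints GET https://example.org/rdf/repositories/demo/statements
```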
![Page 29: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/29.jpg)
• Security & access control
– TLS 1.2 for all HTTP communication
– HTTP Basic Authentication
– API keys – flexible access control mechanism
• Supported data formats
– Text analytics: various textual input / JSON output
– Knowledge graphs and DBaaS: any W3C recommended RDF format
• Free monthly quotas
– Text analytics: 250 MB of data processed
– Knowledge graph access: 5000 requests
– Fully-managed RDF database: 1 million hosted triples
S4 APIs overview (cont’d)
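A minimal sketch of HTTP Basic Authentication with an API key pair, assuming the key is sent as the user name and its secret as the password (the slides do not spell out this mapping). The URL and credentials below are placeholders.

```python
import base64
from urllib.request import Request


def basic_auth_header(api_key: str, key_secret: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authorized_request(url: str, api_key: str, key_secret: str) -> Request:
    """Attach Basic credentials, refusing plain HTTP so they only travel over TLS."""
    if not url.startswith("https://"):
        raise ValueError("credentials should only be sent over an https:// URL")
    return Request(url, headers={"Authorization": basic_auth_header(api_key, key_secret)})


# Placeholder URL and credentials, for illustration only.
request = authorized_request("https://example.org/service", "demo-key", "demo-secret")
```

Basic credentials are only base64-encoded, not encrypted, which is why the sketch rejects non-HTTPS URLs.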
![Page 30: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/30.jpg)
Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
![Page 31: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/31.jpg)
• Java, Python & C# SDKs
• Sample code
– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
– cURL examples for the most impatient
• GATE/UIMA plugins
• Firefox/Chrome plugins
• Online documentation
Supporting materials
![Page 32: Scalable Smart Data Management in the Cloud](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eebd961a28ab3d298b465f/html5/thumbnails/32.jpg)
Thank you!
s4.ontotext.com
Nov 2nd, 2015