Apache Stanbol and the Web of Data - ApacheCon 2011
DESCRIPTION
Presentation on Apache Stanbol (incubating) and related projects given by Olivier Grisel during ApacheCon 2011. More information:
- http://incubator.apache.org/stanbol/
- http://www.iks-project.eu
TRANSCRIPT
![Page 1: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/1.jpg)
11/7/11
Apache Stanbol (Incubating) and the Web of Data
Olivier Grisel, [email protected], 2011-11-11
![Page 2: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/2.jpg)
My Background
Olivier Grisel - R&D Engineer
Nuxeo: Open Source ECM
European project: IKS
Stuff I do:
o Machine Learning
o Natural Language Processing
o All things data
![Page 3: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/3.jpg)
Agenda
The Web of Data: what, why, how?
CMS integration demo
Semantic Components in Stanbol
Building models for Stanbol
![Page 4: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/4.jpg)
The Web of Data
What, Why, How?
![Page 5: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/5.jpg)
![Page 6: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/6.jpg)
“To a computer, then, the web is a flat, boring world devoid of meaning”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
![Page 7: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/7.jpg)
“This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
![Page 8: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/8.jpg)
“The Semantic Web is not a separate Web but an extension of the current one, in which information
is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
![Page 9: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/9.jpg)
“Adding semantics to the web involves two things: allowing documents which have information
in machine-readable forms, and allowing links to be created with relationship values.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
![Page 10: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/10.jpg)
The Web of Data – What?
Shared description of the real world
o Structured with vocabularies
o Decentralized
o Scoped by namespaces
o Linked
![Page 11: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/11.jpg)
The Web of Data – Why?
• Strings are ambiguous
  o New York / The Big Apple / NYC
  o Washington (Person, State, City, Sports Team...)
• Structured context helps humans
  o Who is this guy?
  o Where is this city?
• Conceptual frame helps machines
  o Explicit user intent decoding
  o Smarter indexing / search?
![Page 12: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/12.jpg)
Decoding User Intents
![Page 13: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/13.jpg)
Decoding User Intents
• Next Generation User Interfaces
  o Siri - conversational interface
  o IBM DeepQA: Watson for Health Care
• Tell Google about your stuff
  o Publish structured descriptions of your products
  o "3 bedrooms flat near Montmartre"
• Useful for non-public data as well
  o Intranet query: "ApacheCon slides"
  o Intranet query: "Xerox invoices"
  o Intranet query: "Xerox salesperson email"
![Page 14: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/14.jpg)
The Web of Data - How?
• RDF / Triple Stores / SPARQL
  o Graph stores with dynamic schemas
  o Strong interoperability
• JSON-LD
  o Upgrade your JSON with scoped vocabularies
  o Web / Mobile / JS developer friendly
• RDFa + schema.org & rNews
  o Publish annotations in structured markup
  o Vocabulary understood by Search Engines
![Page 15: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/15.jpg)
HTML example
<p>
My name is Manu Sporny and you can give me a ring via 1-800-555-0155. <img src="http://manu.sporny.org/images/manu.png" /> I have a <a href="http://manu.sporny.org/">blog</a>.
</p>
![Page 16: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/16.jpg)
RDFa example
<p vocab="http://schema.org/" prefix="foaf: http://xmlns.com/foaf/0.1/" about="#manu" typeof="Person">
My name is <span property="name">Manu Sporny</span> and you can give me a ring via <span property="telephone">1-800-555-0155</span>. <img rel="image" src="http://manu.sporny.org/images/manu.png" /> I have a <a rel="foaf:weblog" href="http://manu.sporny.org/">blog</a>.
</p>
![Page 17: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/17.jpg)
JSON-LD example
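The JSON-LD content of this slide did not survive extraction. As a rough stand-in, here is a sketch of how the person from the RDFa example might be written as JSON-LD and serialized from Python. The key choices (`@vocab` for schema.org, `foaf:weblog` for the blog link) mirror the RDFa markup and are assumptions, not the content of the original slide; the syntax follows the current JSON-LD specification.

```python
import json

# Hypothetical JSON-LD rendering of the RDFa example. The original slide's
# JSON-LD is not available, so every key below is an assumption.
person = {
    "@context": {
        "@vocab": "http://schema.org/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "#manu",
    "@type": "Person",
    "name": "Manu Sporny",
    "telephone": "1-800-555-0155",
    "image": {"@id": "http://manu.sporny.org/images/manu.png"},
    "foaf:weblog": {"@id": "http://manu.sporny.org/"},
}


def to_json_ld(doc):
    """Serialize the dictionary as an indented JSON-LD string."""
    return json.dumps(doc, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json_ld(person))
```

The scoped vocabulary lives entirely in `@context`, so the rest of the document stays ordinary JSON that a web or mobile client can read without an RDF toolkit.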
![Page 18: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/18.jpg)
[Figure: series of images labeled 2007, 2008, 2009, 2010]
![Page 19: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/19.jpg)
2011
![Page 20: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/20.jpg)
Bridging the Web of Data and my CMS
![Page 21: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/21.jpg)
![Page 22: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/22.jpg)
Apache Stanbol
• Enhancer
  o Text analysis with Apache OpenNLP / Tika
• EntityHub / ContentHub
  o Linked Data Indexing with Apache Solr
  o Graph Storage with Apache Clerezza / Jena
• Reasoner / Rules
  o Inference with Apache Jena & OWL API
• Components / HTTP Services
  o OSGi with Apache Felix / JAX-RS with Jersey
![Page 23: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/23.jpg)
![Page 24: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/24.jpg)
![Page 25: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/25.jpg)
![Page 26: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/26.jpg)
![Page 27: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/27.jpg)
RESTful is Beautiful
![Page 28: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/28.jpg)
Minimalist HTTP Client
curl -X POST -H "Accept: text/turtle" \
     -H "Content-type: text/plain" \
     --data "John Smith was born in London." \
     http://stanbol.demo.nuxeo.com/engines
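The same request can be sketched with the Python standard library. The URL and headers come from the curl command above; the function names are illustrative, and the demo server may no longer be online.

```python
import urllib.request

# Endpoint taken from the slide; the demo server may no longer be online.
STANBOL_ENGINES_URL = "http://stanbol.demo.nuxeo.com/engines"


def build_enhance_request(text, url=STANBOL_ENGINES_URL):
    """Build (without sending) the POST request issued by the curl command."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "text/turtle", "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text, url=STANBOL_ENGINES_URL, timeout=10):
    """Send the text and return the response body decoded as a string."""
    request = build_enhance_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only inspect the request here; call enhance() against a live server.
    req = build_enhance_request("John Smith was born in London.")
    print(req.get_method(), req.full_url)
```

Calling `enhance("John Smith was born in London.")` against a running Stanbol instance should return the enhancements serialized as Turtle, since that is the format requested in the `Accept` header.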
![Page 29: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/29.jpg)
![Page 30: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/30.jpg)
![Page 31: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/31.jpg)
[Diagram: Local IT infrastructure (LAN) containing Nuxeo DM (with addon) and Apache Stanbol (Engine 1, Engine 2, Engine 3), with numbered steps 1 to 3; data sources shown: DBpedia, Freebase, Geonames, LDAP]
![Page 32: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/32.jpg)
Stanbol Enhancer
Chain of Enhancement Engines
Language Detection (Tika)
Named Entity Detection (OpenNLP)
Linked Data dereferencing (Solr)
Refactoring / Translation (Jena)
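The chaining idea can be illustrated with a toy pipeline in which each engine receives a content item and appends annotations to it. Everything here is a hypothetical stand-in: the function names, the stopword-based language check, the capitalization-based entity detection and the in-memory index do not reflect the actual Stanbol engine API or the Tika, OpenNLP and Solr components.

```python
# Toy illustration of an enhancement chain. These functions are hypothetical
# stand-ins and do not reflect the actual Stanbol engine API.

# Tiny in-memory lookup table standing in for a local Solr index.
LOCAL_INDEX = {"London": "http://dbpedia.org/resource/London"}


def detect_language(item):
    # Naive stopword check standing in for a real language detector.
    words = set(item["text"].lower().split())
    language = "en" if words & {"the", "was", "in"} else "unknown"
    item["annotations"].append(("language", language))
    return item


def detect_entities(item):
    # Runs of capitalized words standing in for a real NER model.
    mention = []
    for token in item["text"].split() + [""]:
        word = token.strip(".,;:!?")
        if word[:1].isupper():
            mention.append(word)
        elif mention:
            item["annotations"].append(("entity", " ".join(mention)))
            mention = []
    return item


def link_entities(item):
    # Attach a URI to every detected entity found in the local index.
    for kind, value in list(item["annotations"]):
        if kind == "entity" and value in LOCAL_INDEX:
            item["annotations"].append(("link", (value, LOCAL_INDEX[value])))
    return item


def run_chain(text, engines):
    """Pass one content item through each engine in order."""
    item = {"text": text, "annotations": []}
    for engine in engines:
        item = engine(item)
    return item


if __name__ == "__main__":
    chain = [detect_language, detect_entities, link_entities]
    for annotation in run_chain("John Smith was born in London.", chain)["annotations"]:
        print(annotation)
```

Because later engines only read the annotations left by earlier ones, engines can be added, removed or reordered without changing the others.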
![Page 33: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/33.jpg)
Stanbol EntityHub
• Referenced Sites
  o DBpedia
  o Geonames
  o (NY Times, MusicBrainz, ProductDB, UniProt...)
• Fast local offline indices (Solr)
  o Batch indexing utilities for RDF dumps
  o Multilingual fulltext search in labels & descriptions
• Vocabulary mapping / merging
![Page 34: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/34.jpg)
Stanbol Reasoner
• RDFS / OWL Lite / OWL 2
• Consistency checks
  o Cardinality checks: each person has 1 birth date
  o Range constraints: birth dates are valid dates
• Materializing types / properties
  o Types from subclass: Musician > Artist > Person
  o Symmetric property: A worked with B
  o Transitive property: A is located in B
• Query-time expansion / inference?
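A minimal sketch of what "materializing" means for the three cases listed above, written in plain Python rather than with Jena or the OWL API. The data structures are assumptions for illustration: a single-parent, acyclic subclass table and relations stored as sets of pairs.

```python
def superclasses(cls, subclass_of):
    """Return every ancestor of cls, assuming an acyclic single-parent table."""
    found = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        found.append(cls)
    return found


def materialize_types(types, subclass_of):
    """Expand each entity's asserted type with all inherited types."""
    return {
        entity: [cls] + superclasses(cls, subclass_of)
        for entity, cls in types.items()
    }


def symmetric_closure(pairs):
    """Add (b, a) for every asserted (a, b)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until stable."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    hierarchy = {"Musician": "Artist", "Artist": "Person"}
    print(materialize_types({"A": "Musician"}, hierarchy))
    print(symmetric_closure({("A", "B")}))
    print(transitive_closure({("Montmartre", "Paris"), ("Paris", "France")}))
```

Materializing at indexing time trades storage for query speed; the alternative raised in the last bullet is to leave the data as asserted and expand the query instead.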
![Page 35: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/35.jpg)
Stanbol Rules
Simple Prolog-like language
uncleRule[
  has(<http://example.org/family.owl#hasParent>, ?x, ?z) .
  has(<http://example.org/family.owl#hasSibling>, ?z, ?y)
  -> has(<http://example.org/family.owl#hasUncle>, ?x, ?y)
]
SPARQL CONSTRUCT or SWRL
PREFIX family: <http://example.org/family.owl#>
CONSTRUCT { ?x family:hasUncle ?y }
WHERE { ?x family:hasParent ?z . ?z family:hasSibling ?y }
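To make the semantics of the rule concrete, here is a sketch that applies the same join to an in-memory set of (subject, predicate, object) triples. It is not how Stanbol Rules executes rules; the function name and the sample individuals are made up for illustration.

```python
FAMILY = "http://example.org/family.owl#"


def apply_uncle_rule(triples):
    """Derive hasUncle triples: x hasParent z and z hasSibling y => x hasUncle y."""
    parents = [(s, o) for s, p, o in triples if p == FAMILY + "hasParent"]
    siblings = [(s, o) for s, p, o in triples if p == FAMILY + "hasSibling"]
    return {
        (x, FAMILY + "hasUncle", y)
        for x, z in parents
        for z2, y in siblings
        if z == z2  # join on the shared ?z variable
    }


if __name__ == "__main__":
    # Hypothetical individuals, used only to exercise the rule.
    data = {
        ("alice", FAMILY + "hasParent", "bob"),
        ("bob", FAMILY + "hasSibling", "carl"),
    }
    print(apply_uncle_rule(data))
```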
![Page 36: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/36.jpg)
Online Demos
Simple analyzer with small index https://stanbol.demo.nuxeo.com
All services deployed http://dev.iks-project.eu:8081
![Page 37: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/37.jpg)
Building Stanbol Enhancer models from Wikipedia
with the Apache data tools
![Page 38: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/38.jpg)
Universal Topic Classification
Use Apache Lucene / Solr MoreLikeThis
to perform a truncated nearest neighbors query
in the TF-IDF vector space of Wikipedia
![Page 39: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/39.jpg)
Universal Topic Classification
• Index text of all articles grouped by topic
• Solr MoreLikeThis query on new document
• DBpedia dumps provide:
  o Text summaries for each article
  o “subject” relationships between articles and topics
  o “broader” / “narrower” SKOS hierarchy between topics
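The approach can be sketched in pure Python as a nearest neighbors search over TF-IDF vectors, one vector per topic. This is a simplified stand-in for a Solr MoreLikeThis query, not its actual scoring formula; the tokenizer, the smoothed IDF and the three-topic corpus are assumptions chosen to keep the example small.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def build_index(topic_texts):
    """topic_texts maps a topic name to the concatenated text of its articles."""
    counts = {topic: Counter(tokenize(text)) for topic, text in topic_texts.items()}
    doc_freq = Counter(term for tf in counts.values() for term in tf)
    n_topics = len(counts)
    # Smoothed inverse document frequency (an assumption, not Lucene's formula).
    idf = {t: math.log((1 + n_topics) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        topic: {term: freq * idf[term] for term, freq in tf.items()}
        for topic, tf in counts.items()
    }
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, idf, vectors, top_k=3):
    """Return the top_k topics whose vectors are closest to the new document."""
    tf = Counter(tokenize(text))
    query = {term: freq * idf[term] for term, freq in tf.items() if term in idf}
    ranked = sorted(vectors, key=lambda topic: cosine(query, vectors[topic]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical miniature corpus standing in for Wikipedia topics.
    corpus = {
        "Astronomy": "planet star telescope orbit moon",
        "Cooking": "recipe oven flour sugar butter",
        "Pensions": "pension retirement worker strike benefit",
    }
    idf, vectors = build_index(corpus)
    print(classify("NASA studies the orbit of the moon", idf, vectors, top_k=1))
```

Truncating the result to `top_k` is what makes the query a "truncated" nearest neighbors search: only the few closest topics are returned as candidate categories.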
![Page 40: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/40.jpg)
About the Data
• 500k purely technical categories
  o “People_with_missing_birth_place”, “Rivers_in_Romania”
• 70k “semantically grounded” categories
• Paths to roots require both “technical” and “grounded” categories
• Scale:
  o 1.2M topic / topic links
  o 30M topic / article links
![Page 41: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/41.jpg)
Some results (Wikinews)
US children who celebrate Independence Day more likely to become Republicans, says Harvard study
o Fireworks
o Voting theory
o Republican Party (United States)
o Statistics
o Electoral systems
![Page 42: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/42.jpg)
Some results (Wikinews)
U.S. space agency NASA sues ex-astronaut
o American astronauts
o Aviation halls of fame
o Edwards Air Force Base
o Apollo program
o Exploration of the Moon
![Page 43: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/43.jpg)
Some results (Wikinews)
Hundreds of thousands of British public sector workers strike over planned pension changes
o Retirement in the United Kingdom
o United Kingdom pensions and benefits
o Pensions in the United Kingdom
o Labor disputes by country
o Labor disputes
![Page 44: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/44.jpg)
Some results (PLoS One)
Metabolic Programming during Lactation Stimulates Renal Na+ Transport in the Adult Offspring Due to an Early Impact on Local Angiotensin II Pathways
o Renal physiology
o Kidney
o Nephrology
o Hypertension
o Membrane biology
![Page 45: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/45.jpg)
Wrap Up
• Web of Data
  o brings Structured Context Frame
  o to decode User Intention
• NLP + Entities & Topics indices
  o to automate Content Enrichment
  o to provide Disambiguation
![Page 46: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/46.jpg)
Resources
Documentation, svn, mailing list: http://incubator.apache.org/stanbol
IKS project blog: http://blog.iks-project.eu
Blog posts about Semantic ECM: http://blogs.nuxeo.com/dev/semantic/
![Page 47: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/47.jpg)
Thank you for your attention!
Olivier Grisel
https://twitter.com/ogrisel
![Page 48: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/48.jpg)
Training models for NER from Wikipedia
Extract sentences with link positions in Wikipedia articles
DBpedia to find the type of the target entity (Person, Location, Organization)
Apache Pig scripts to compute the join + format the result as training files for OpenNLP
Apache OpenNLP to build and evaluate the models
Apache Hadoop / Apache Whirr for distributed processing
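The formatting step of this pipeline can be sketched for a single sentence in Python. On the slide the join is done with Apache Pig over the full dumps; here a dictionary lookup stands in for that join, and the function name, the offset convention and the sample data are assumptions. The output uses the `<START:type> ... <END>` markup of the OpenNLP name finder training format, which expects whitespace-separated tokens.

```python
def to_opennlp_sentence(sentence, links, entity_types):
    """Annotate one sentence in the OpenNLP name finder training format.

    sentence: the raw sentence text
    links: non-overlapping (start, end, target) character offsets of wiki links
    entity_types: maps a link target to its type, e.g. "person"
    """
    parts = []
    cursor = 0
    for start, end, target in sorted(links):
        entity_type = entity_types.get(target)
        if entity_type is None:
            continue  # no known type: the link text stays plain
        parts.append(sentence[cursor:start])
        parts.append(" <START:%s> %s <END> " % (entity_type, sentence[start:end]))
        cursor = end
    parts.append(sentence[cursor:])
    # Collapse whitespace so that tags and tokens are separated by single spaces.
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    # Hypothetical link targets and types standing in for the DBpedia join.
    types = {"John_Smith": "person", "London": "location"}
    sentence = "John Smith was born in London."
    links = [(0, 10, "John_Smith"), (23, 29, "London")]
    print(to_opennlp_sentence(sentence, links, types))
```

A real pipeline would also tokenize the plain text properly before writing the training file; this sketch only separates the annotated spans from the surrounding text.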
![Page 49: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/49.jpg)
![Page 50: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/50.jpg)
![Page 51: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/51.jpg)
![Page 52: Apache Stanbol and the Web of Data - ApacheCon 2011](https://reader035.vdocuments.us/reader035/viewer/2022062511/54c663f24a79594b538b46fb/html5/thumbnails/52.jpg)