Leveraging the growth of the Semantic Web - from Semantic SEO to .....
San Diego Semantic Web Meetup
Oct 25, 2010
Barbara Starr
Email: [email protected]
Twitter: @BarbaraStarr
So … let us begin to take a look at how the Semantic Web is being used and leveraged in the real world of late (feel free to add: …..
And of course, who is using it, how, ........
Semantic Search/SEO
The major Search Engines & Social Networks are currently leveraging Semantic Web Technology
What is Semantic Search
• Semantic Search is basically the notion of improving search by using metadata or searching on that metadata.
• There are several ways that the search engines on the web may use this to enhance search results.
– FIND, rather than SEARCH.
• Searching directly on the metadata can yield specific answers or results, as demonstrated in the following example:
Query “Barack Obama Birthday”
Results on
Google acquires Metaweb
Definitive Answer on Top
Bing
Definitive Answer
Note: Freebase part of Metaweb acquisition by Google
Definitive answer & enhanced display
Bing has leveraged this for quite some time
What is Semantic Search (cont)
• Semantic Search is basically the notion of improving search by using metadata or searching on that metadata.
• There are several ways that the search engines on the web may use this to enhance search results.
– FIND, rather than SEARCH.
• Searching directly on the metadata can yield specific answers or results, as demonstrated in the following example:
• Ran the query “Barack Obama Birthday” on both Google and Bing. Obtained the following:
– Answer engines rather than Search Engines?
What is Semantic Search (cont)
• Semantic Search is basically the notion of improving search by using metadata or searching on that metadata.
• There are several ways that the search engines on the web may use this to enhance search results.
– FIND, rather than SEARCH.
– Another aspect of using metadata, such as embedding metadata or semantic markup in web pages, is demonstrated by enhanced displays in search results (e.g. rich snippets in Google). Both Google and Yahoo support enhanced displays for RDFa markup.
Rich Snippets
• Google now supports Rich snippets for
– People
– Events
– Businesses and organizations
– Reviews
– Recipes
– Products when related to a review
– Breadcrumbs
– Local Search
http://rdf.data-vocabulary.org/#
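To make the idea of embedded markup concrete, here is a minimal Python sketch that renders a review as an HTML fragment annotated with RDFa. The namespace is the one shown on the slide; the property names (itemreviewed, reviewer, rating) are assumed from Google's rich snippets vocabulary of that period, and the function name and sample values are purely illustrative.

```python
from html import escape

# Namespace from the slide. The class and property names used below are
# an assumption based on the 2010-era rich snippets vocabulary.
VOCAB = "http://rdf.data-vocabulary.org/#"


def review_rdfa(item: str, reviewer: str, rating: float) -> str:
    """Render a review as an HTML fragment carrying RDFa attributes."""
    return (
        f'<div xmlns:v="{VOCAB}" typeof="v:Review">\n'
        f'  <span property="v:itemreviewed">{escape(item)}</span>\n'
        f'  reviewed by <span property="v:reviewer">{escape(reviewer)}</span>,\n'
        f'  rating: <span property="v:rating">{rating}</span>\n'
        f'</div>'
    )


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(review_rdfa("Example Restaurant", "A. Reviewer", 4.5))
```

A fragment like this could then be pasted into a page and checked with one of the validators listed later in the talk.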
Events
Recipes
Sept 2, 2010
now see more than twice as many searches with rich snippets in the results in the US, and a four-fold increase globally, compared to one year ago.
Single Events – Sept 2, 2010
Social Networks
• While search engines can benefit from access to social networks, social networks can benefit from semantic metadata in web pages
– An example is Facebook’s Open Graph Protocol (also supports RDFa), which allows users to share & like objects (such as products) as opposed to web pages. It enables “Semantic Profiling” of the users by Facebook. (The Japanese network Mixi is now using it.)
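As a rough illustration of what a page has to publish to become a likeable "object", the Python sketch below emits the four basic Open Graph properties (og:title, og:type, og:url, og:image) as meta tags. The helper name and the example values are hypothetical; only the property names come from the Open Graph protocol.

```python
from html import escape


def open_graph_tags(title: str, og_type: str, url: str, image: str) -> str:
    """Build the <meta> tags for the four basic Open Graph properties."""
    props = {
        "og:title": title,
        "og:type": og_type,
        "og:url": url,
        "og:image": image,
    }
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in props.items()
    )


if __name__ == "__main__":
    # Hypothetical product page, for illustration only.
    print(open_graph_tags(
        "8-piece Comforter Set",
        "product",
        "http://example.com/products/1",
        "http://example.com/products/1.jpg",
    ))
```

The tags go in the page's head element; the page then describes a product rather than just a URL.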
Web Benefits / Uses
• Yahoo stated 15% increase in CTR as a result of enhanced displays, rich snippets in Google
• Definitive answers enabled by understanding and leveraging how search engines are searching directly on metadata
• Semantic Profiling and adoption by social networks
• Embedding semantic markup in web pages and product pages ultimately makes information “findable” by search engines, enabling them to provide improvements such as definitive answers, enhanced displays, etc
RDFa production
• Drupal 7 now produces RDFa (previous meetup)
• Many CMS publishers
Consuming RDFa
• Previously indicated increase of RDFa in general and production of RDFa
• Available consumers/parsers
– Sindice (any23)
– RDFa distiller
Sindice.com
Handy Validators
• RDFA VALIDATORS AND TESTERS
• New RDFa Validator: http://check.rdfa.info/
• Sindice Inspector: http://inspector.sindice.com/
• Yahoo Objectfinder: http://developer.search.yahoo.com/help/objectfinder
• Google rich snippets tester: http://www.google.com/webmasters/tools/richsnippets
Adopters?
• UK Government
• US Government
• BBC (FIFA world cup site dynamically generated using linked data)
• Thomson Reuters
• Freebase
• NY Times
• Best Buy
• Google (More to follow http://rdf.data-vocabulary.org/#)
• Yahoo
• Facebook
• Mixi
• Oracle
• Overstock
• Drug research and discovery companies, Pfizer, ….
• Tons more – Just look at the diversity in the LOD data cloud (getting there)
Spectrum of Applications
• Semantic Wikis (Semantic MediaWiki)
• Semantics as a Service (e.g. SIRI) – interoperability of web services, underlying service Ontologies
• Enterprise data integration (Anzo,
• Semantics in publishing
– Open Calais now has Openpublish
– Zemanta, primal pages
– Drupal and other CMS systems
• Contextual Advertising
• Sentiment Analysis (COGITO)
• Semantic Search (documents & structured data sources)
• Semantic Social Networks
LOD Cloud Evolution
The rate of growth has been remarkable
Source maintained by: Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
Oct 2007
Nov 2007 (1)
Nov 2007 (2)
Feb 2008
Mar 2008
Sept 2008
Mar 2009 (1)
Mar 2009 (2)
March 5, 2009
[Figure: LOD cloud diagram as of March 2009; dataset node labels omitted]
March 27, 2009
[Figure: LOD cloud diagram as of March 2009; dataset node labels omitted]
July 14, 2009
Sept 22, 2010
[Figure: LOD cloud diagram as of September 2010; dataset node labels omitted]
LOD cloud – Sept 22, 2010
[Figure: LOD cloud diagram as of September 2010; dataset node labels omitted. Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content]
latest LOD cloud
Leveraging Linked Datasets - Pharmaceutical example
• There are many ways to leverage existing information and to perform knowledge discovery within it.
• This example makes use of the AllegroGraph platform and query interface supported by Franz Inc., a Web 3.0 database provider.
• AllegroGraph can be downloaded from their website at http://www.franz.com
Leveraging Linked Datasets - Pharmaceutical example
• Facilitates information sharing between knowledge bases and between researchers
• The graphical viewers and browsers provided by Franz enable visualization of relationships between entities (Gruff displays relationships between entities as well as providing a query interface)
Life Sciences Example - AllegroGraph
• Drugs from Drug Bank
• Looked them up in the text of the clinical trials (LinkedCT)
• Looked up all side effects in SIDER and looked them up in the texts of the clinical trials.
• Resulted in about a million new triples.
• Ability to now search for a drug, find all the clinical trials that mention it, and then also find all the side effects mentioned in the same trials.
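The linking step described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the matching here is naive case-insensitive substring search, the identifiers and trial texts are invented, and the talk does not say how the actual matching was implemented.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def link_mentions(trial_texts: Dict[str, str],
                  labels: Dict[str, str],
                  predicate: str) -> List[Triple]:
    """Emit (trial, predicate, entity) for each entity label found in a trial's text."""
    triples: List[Triple] = []
    for trial_id, text in trial_texts.items():
        lowered = text.lower()
        for entity_id, label in labels.items():
            # Naive substring match; an assumption, not the original method.
            if label.lower() in lowered:
                triples.append((trial_id, predicate, entity_id))
    return triples


# Hypothetical sample data standing in for LinkedCT, Drug Bank and SIDER.
trial_texts = {
    "trial:A": "Patients received atorvastatin; new-onset type 2 diabetes was recorded.",
    "trial:B": "Placebo-controlled study of aspirin.",
}
drug_labels = {"drug:atorvastatin": "Atorvastatin", "drug:aspirin": "Aspirin"}
side_effect_labels = {"sider:type-2-diabetes": "Type 2 Diabetes"}

new_triples = (
    link_mentions(trial_texts, drug_labels, "discusses-drug")
    + link_mentions(trial_texts, side_effect_labels, "discusses-side-effect")
)
```

Run over every drug and side-effect label against every trial text, a loop of this shape is how a large number of new trial-to-entity triples can accumulate.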
Life Sciences Example - AllegroGraph
Namely, we took a look at information dealing with:
- drugs
- targets
- diseases
- side-effects
And ran a query to find all clinical trials for Atorvastatin (or Lipitor) where a side effect of the drug is type 2 diabetes
Life Sciences Example - AllegroGraph
SPARQL query:
SELECT ?drug ?sideeffect ?trial WHERE {
  ?drug rdfs:label 'Atorvastatin' .
  ?sideeffect rdfs:label 'Type 2 Diabetes' .
  ?trial franz:discusses-drug ?drug .
  ?trial franz:discusses-side-effect ?sideeffect .
} LIMIT 10
Translated into English, the SPARQL query reads: “find every drug, side effect and clinical trial where the label of the drug is Atorvastatin, the label of the side effect is Type 2 Diabetes, and the trial discusses both; restrict output to 10 results”
Example by: (Jans Aasman – Franz Inc) Web 3.0’s database
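To show how the four triple patterns in the query above combine, here is a small Python sketch that joins the same patterns over an in-memory list of triples. It is a toy matcher, not AllegroGraph's engine or API; the identifiers and sample triples are invented, and the prefixed names are treated as plain strings.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend a binding so that the pattern matches the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(patterns: List[Triple], triples: List[Triple], limit: int) -> List[Binding]:
    """Join all patterns in order and return at most `limit` solutions."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings[:limit]


# Hypothetical data: trial A mentions both the drug and the side effect,
# trial B mentions only the drug.
triples = [
    ("drug:1", "rdfs:label", "Atorvastatin"),
    ("se:1", "rdfs:label", "Type 2 Diabetes"),
    ("trial:A", "franz:discusses-drug", "drug:1"),
    ("trial:A", "franz:discusses-side-effect", "se:1"),
    ("trial:B", "franz:discusses-drug", "drug:1"),
]
patterns = [
    ("?drug", "rdfs:label", "Atorvastatin"),
    ("?sideeffect", "rdfs:label", "Type 2 Diabetes"),
    ("?trial", "franz:discusses-drug", "?drug"),
    ("?trial", "franz:discusses-side-effect", "?sideeffect"),
]
results = select(patterns, triples, limit=10)
```

Because ?trial appears in both of the last two patterns, only trials that mention the drug and the side effect together survive the join.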
Tools for more profitable eCommerce
Online Commerce
• BEST BUY and other retailers are using semantic technologies to improve the visibility of products and services, leveraging:
– GoodRelations Ontology for e-Commerce
– RDFa
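As a hedged sketch of what product data described with GoodRelations can look like once extracted, the Python below serializes one offer as N-Triples. The GoodRelations terms used (gr:Offering, gr:hasPriceSpecification, gr:UnitPriceSpecification, gr:hasCurrency, gr:hasCurrencyValue) are from the published ontology; the function name, the offer URI and the values are made up, and this is not taken from any retailer's actual markup.

```python
GR = "http://purl.org/goodrelations/v1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def _literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def offering_ntriples(offer_uri: str, name: str, price: float, currency: str) -> str:
    """Describe one offer and its unit price as six N-Triples statements."""
    price_node = "_:price"  # blank node for the price specification
    lines = [
        f"<{offer_uri}> <{RDF_TYPE}> <{GR}Offering> .",
        f"<{offer_uri}> <{RDFS_LABEL}> {_literal(name)} .",
        f"<{offer_uri}> <{GR}hasPriceSpecification> {price_node} .",
        f"{price_node} <{RDF_TYPE}> <{GR}UnitPriceSpecification> .",
        f"{price_node} <{GR}hasCurrency> {_literal(currency)} .",
        f'{price_node} <{GR}hasCurrencyValue> "{price:.2f}"^^<{XSD_FLOAT}> .',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical offer, for illustration only.
    print(offering_ntriples("http://example.com/offer/1", "8-piece Comforter Set", 89.99, "USD"))
```

On a product page the same statements would normally be carried as RDFa attributes in the HTML; a parser turns that markup back into triples of this kind.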
Other major online retailers also leveraging the technology
http://www.overstock.com/Home-Garden/Hotel-8-piece-Comforter-Set/367226/product.html
Sindice Inspector - .nt format
Gruff View
Summary
• Significant adoption in many arenas and by many of the “major players”
• Growing number of vendors providing services and tools
• Many open source tools & resources (“RDFizers”, SPARQL endpoints, SINDICE – Semantic Web index)
• Technology mature enough at this point to provide competitive advantage in many arenas.