Leveraging the growth of the Semantic Web - from Semantic SEO to ..... San Diego Semantic Web Meetup, Oct 25, 2010. Barbara Starr. Email: [email protected] Twitter: @BarbaraStarr


Page 1: Sd sem weboct252010

Leveraging the growth of the Semantic Web - from Semantic SEO to .....

San Diego Semantic Web Meetup

Oct 25, 2010

Barbara Starr

Email: [email protected]

Twitter: @BarbaraStarr

Page 2: Sd sem weboct252010

So, let us take a look at how the Semantic Web has recently been used and leveraged in the real world (feel free to add: …..)

And, of course, who is using it, how, ........

Page 3: Sd sem weboct252010

Semantic Search/SEO

The major search engines and social networks are currently leveraging Semantic Web technology

Page 4: Sd sem weboct252010

What is Semantic Search

• Semantic Search is basically the notion of improving search by using metadata, or by searching on that metadata.

• There are several ways that search engines on the web may use this to enhance search results.

– FIND, rather than SEARCH.
• Searching directly on the metadata can yield specific answers or results, as demonstrated in the following example:

Query “Barack Obama Birthday”

Results on

Page 6: Sd sem weboct252010

Definitive Answer on Top

Page 7: Sd sem weboct252010

Bing

Definitive Answer

Note: Freebase was part of Google's acquisition of Metaweb

Page 8: Sd sem weboct252010
Page 9: Sd sem weboct252010

Definitive answer & enhanced display

Bing has leveraged this for quite some time

Page 10: Sd sem weboct252010

What is Semantic Search (cont)

• Semantic Search is basically the notion of improving search by using metadata, or by searching on that metadata.

• There are several ways that search engines on the web may use this to enhance search results.

– FIND, rather than SEARCH.
• Searching directly on the metadata can yield specific answers or results, as demonstrated in the following example:
• Ran the query “Barack Obama Birthday” on both Google and Bing, and obtained the following:

– Answer engines rather than Search Engines?
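The FIND versus SEARCH distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FACTS` table, the `DOCUMENTS` list and the function names are hypothetical stand-ins, not how Google, Bing or Freebase actually store or query their data.

```python
# Hypothetical metadata store: (entity, property) -> value.
# Sample data for illustration only; not pulled from Freebase.
FACTS = {
    ("barack obama", "birthday"): "1961-08-04",
    ("barack obama", "birthplace"): "Honolulu, Hawaii",
}

# Hypothetical document collection used for plain keyword search.
DOCUMENTS = [
    "Barack Obama celebrates his birthday in Chicago",
    "Obama speech on the economy",
]


def find(query):
    """FIND: return a definitive answer when the query names a stored entity and property."""
    words = query.lower().split()
    for (entity, prop), value in FACTS.items():
        if all(w in words for w in entity.split()) and prop in words:
            return value
    return None


def search(query):
    """SEARCH: return every document that contains all of the query words."""
    words = query.lower().split()
    return [d for d in DOCUMENTS if all(w in d.lower().split() for w in words)]


def answer(query):
    """Put the definitive answer (if any) on top of the ordinary result list."""
    return {"answer": find(query), "documents": search(query)}
```

Calling `answer("Barack Obama Birthday")` returns the stored date as the answer plus the matching documents, while a query with no matching metadata falls back to documents only.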

Page 11: Sd sem weboct252010

What is Semantic Search (cont)

• Semantic Search is basically the notion of improving search by using metadata, or by searching on that metadata.

• There are several ways that search engines on the web may use this to enhance search results.

– FIND, rather than SEARCH.
– Another aspect of using metadata, such as embedding metadata or semantic markup in web pages, is demonstrated by enhanced displays in search results (e.g. rich snippets in Google). Both Google and Yahoo support enhanced displays for RDFa markup.

Page 12: Sd sem weboct252010

Rich Snippets

• Google now supports rich snippets for
– People
– Events
– Businesses and organizations
– Reviews
– Recipes
– Products when related to a review
– Breadcrumbs
– Local Search

http://rdf.data-vocabulary.org/#
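As a rough sketch of what such markup looks like, the Python helper below emits an RDFa fragment for a person using the data-vocabulary.org namespace listed above. The helper name `person_rdfa` and the sample values are assumptions made for illustration; check the vocabulary and the search engine's own documentation before relying on specific property names.

```python
from html import escape

# Namespace from the slide above.
V_NS = "http://rdf.data-vocabulary.org/#"


def person_rdfa(name, title, affiliation):
    """Build an RDFa fragment describing a person (hypothetical helper)."""
    return (
        f'<div xmlns:v="{V_NS}" typeof="v:Person">\n'
        f'  <span property="v:name">{escape(name)}</span>,\n'
        f'  <span property="v:title">{escape(title)}</span> at\n'
        f'  <span property="v:affiliation">{escape(affiliation)}</span>\n'
        f'</div>'
    )


# Sample values are made up for the example.
print(person_rdfa("Jane Example", "Editor", "Example Corp"))
```

The visible text stays the same for human readers; the `typeof` and `property` attributes are what a search engine can read as metadata.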

Page 13: Sd sem weboct252010

Events

Page 14: Sd sem weboct252010


Recipes

Page 15: Sd sem weboct252010

Sept 2, 2010

now see more than twice as many searches with rich snippets in the results in the US, and a four-fold increase globally, compared to one year ago.

Page 16: Sd sem weboct252010

Single Events – Sept 2, 2010

Page 17: Sd sem weboct252010
Page 18: Sd sem weboct252010

Social Networks

• While search engines can benefit from access to social networks, social networks can benefit from semantic metadata in web pages.

– An example is Facebook’s Open Graph Protocol (also supports RDFa), which allows users to share and like objects (such as products) as opposed to web pages. It enables “Semantic Profiling” of users by Facebook. (Japan's Mixi is now using it.)
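To make the Open Graph idea concrete, here is a minimal sketch that reads `og:` properties out of a page's `<meta>` tags with Python's standard `html.parser`. The sample page, its URLs and the class name `OpenGraphParser` are hypothetical; a real consumer would handle more cases than this.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


# Hypothetical product page; example.com URLs are placeholders.
PAGE = """<html><head>
<meta property="og:title" content="Example Product" />
<meta property="og:type" content="product" />
<meta property="og:url" content="http://www.example.com/products/1" />
<meta property="og:image" content="http://www.example.com/products/1.jpg" />
<meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>"""


def extract_open_graph(html):
    """Return a dict of the og:* properties found in the page."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties
```

With these four properties the page describes an object (here a product) rather than just a URL, which is what makes it shareable and likeable as an object.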

Page 19: Sd sem weboct252010

Web Benefits / Uses

• Yahoo reported a 15% increase in CTR as a result of enhanced displays (rich snippets in Google)

• Definitive answers are enabled by understanding and leveraging how search engines search directly on metadata

• Semantic Profiling and adoption by social networks

• Embedding semantic markup in web pages and product pages ultimately makes information “findable” by search engines, enabling them to provide improvements such as definitive answers, enhanced displays, etc.

Page 20: Sd sem weboct252010

RDFa production

• Drupal 7 now produces RDFa (previous meetup)

• Many CMS publishers

Page 21: Sd sem weboct252010

Consuming RDFa

• Previously indicated the increase of RDFa in general and of the production of RDFa

• Available consumers/parsers
– Sindice (any23)

– RDFa Distiller

Sindice.com
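For a feel of what a consumer does, the sketch below pulls (type, property, value) statements out of a small RDFa fragment using only Python's standard library. It is a deliberately simplified illustration, not a conforming RDFa processor like the tools listed above: it ignores prefix resolution, `about`, `rel` and nested subjects. The class and function names are assumptions.

```python
from html.parser import HTMLParser


class SimpleRDFaExtractor(HTMLParser):
    """Collects (typeof, property, value) for elements carrying a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.current_type = None
        self.current_property = None
        self.buffer = []
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("typeof"):
            self.current_type = attr["typeof"]
        prop = attr.get("property")
        if prop:
            if attr.get("content") is not None:
                # An explicit content attribute takes the place of the element text.
                self.statements.append((self.current_type, prop, attr["content"]))
            else:
                self.current_property = prop
                self.buffer = []

    def handle_data(self, data):
        if self.current_property is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.current_property is not None:
            text = "".join(self.buffer).strip()
            self.statements.append((self.current_type, self.current_property, text))
            self.current_property = None


SAMPLE = """<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <span property="v:summary">San Diego Semantic Web Meetup</span>
  <span property="v:startDate" content="2010-10-25">Oct 25, 2010</span>
</div>"""


def extract_statements(html):
    """Return the list of statements found in the fragment."""
    parser = SimpleRDFaExtractor()
    parser.feed(html)
    return parser.statements
```

Running `extract_statements(SAMPLE)` yields one statement for the event summary and one for its start date, taken from the `content` attribute rather than the display text.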

Page 22: Sd sem weboct252010

Handy Validators

• RDFa validators and testers
• New RDFa Validator: http://check.rdfa.info/
• Sindice Inspector: http://inspector.sindice.com/
• Yahoo Objectfinder: http://developer.search.yahoo.com/help/objectfinder

• Google rich snippets tester: http://www.google.com/webmasters/tools/richsnippets

Page 23: Sd sem weboct252010

Adopters?
• UK Government
• US Government
• BBC (FIFA World Cup site dynamically generated using linked data)
• Thomson Reuters
• Freebase
• NY Times
• Best Buy
• Google (more to follow: http://rdf.data-vocabulary.org/#)
• Yahoo
• Facebook
• Mixi
• Oracle
• Overstock
• Drug research and discovery companies, Pfizer, ….
• Tons more. Just look at the diversity in the LOD data cloud (getting there)

Page 24: Sd sem weboct252010

Spectrum of Applications
• Semantic wikis (Semantic MediaWiki)
• Semantics as a Service (e.g. SIRI): interoperability of web services, underlying service ontologies
• Enterprise data integration (Anzo,
• Semantics in publishing
– Open Calais now has OpenPublish
– Zemanta, primal pages
– Drupal and other CMS systems
• Contextual Advertising
• Sentiment Analysis (COGITO)
• Semantic Search (documents & structured data sources)
• Semantic Social Networks

Page 25: Sd sem weboct252010

LOD Cloud Evolution

The rate of growth has been remarkable

Source maintained by: Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net

Page 26: Sd sem weboct252010

Oct 2007

Page 27: Sd sem weboct252010

Nov 2007 (1)

Page 28: Sd sem weboct252010

Nov 2007 (2)

Page 29: Sd sem weboct252010

Feb 2008

Page 30: Sd sem weboct252010

Mar 2008

Page 31: Sd sem weboct252010

Sept 2008

Page 32: Sd sem weboct252010

Mar 2009 (1)

Page 33: Sd sem weboct252010

Mar 2009 (2)

Page 34: Sd sem weboct252010

March 5, 2009

[Figure: LOD cloud diagram as of March 2009. Datasets shown include DBpedia, Freebase, GeoNames, MusicBrainz, UniProt, Drug Bank, LinkedCT, DBLP and US Census Data, among others.]

Page 35: Sd sem weboct252010

March 27, 2009

[Figure: LOD cloud diagram as of March 2009. Datasets shown include DBpedia, Freebase, GeoNames, MusicBrainz, UniProt, Drug Bank, LinkedCT, DBLP and US Census Data; this version also shows CORDIS, the ReSIST Project Wiki and the National Science Foundation.]

Page 36: Sd sem weboct252010

July 14, 2009

Page 37: Sd sem weboct252010

Sept 22, 2010

[Figure: LOD cloud diagram as of September 2010. Datasets shown include DBpedia, Freebase, GeoNames, LinkedGeoData, New York Times, BBC Music, several data.gov.uk datasets, DrugBank, SIDER, LinkedCT and UniProt, among many others.]

Page 38: Sd sem weboct252010

LOD cloud – Sept 22, 2010

[Figure: LOD cloud diagram as of September 2010, with datasets colored by category. Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

latest LOD cloud

Page 39: Sd sem weboct252010

Leveraging Linked Datasets: Pharmaceutical example

• There are many ways to leverage existing information and to perform knowledge discovery within it.

• This example makes use of the AllegroGraph platform and query interface supported by Franz Inc, a Web 3.0 database provider.

• AllegroGraph can be downloaded from their website at http://www.franz.com

Page 40: Sd sem weboct252010

Leveraging Linked Datasets: Pharmaceutical example

• Facilitates information sharing between knowledge bases and between researchers

• The graphical viewers and browsers provided by Franz enable visualization of relationships between entities (Gruff displays relationships between entities and also provides a query interface)

Page 41: Sd sem weboct252010

Life Sciences Example - AllegroGraph

• Took drugs from Drug Bank
• Looked them up in the text of the clinical trials in LinkedCT
• Looked up all side effects in SIDER and looked them up in the texts of the clinical trials.
• Resulted in about a million new triples.
• Ability to now search for a drug, find all the clinical trials that mention it, and then also find all the side effects mentioned in the same trials.
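The linking step described above can be sketched roughly as follows: match known labels against trial text and emit a new triple for every mention. Everything here is a toy assumption for illustration, including the identifiers, the sample trial texts and the naive substring matching; the actual AllegroGraph workflow is not shown on the slides.

```python
# Hypothetical stand-ins for Drug Bank labels, SIDER side-effect labels
# and LinkedCT trial texts. The trial texts are invented sample data.
DRUGS = {"drug:atorvastatin": "Atorvastatin"}
SIDE_EFFECTS = {"se:type2diabetes": "Type 2 Diabetes", "se:nausea": "Nausea"}
TRIALS = {
    "trial:001": "Atorvastatin was evaluated; some participants developed type 2 diabetes.",
    "trial:002": "A placebo-controlled study reporting nausea.",
}


def link_mentions(trials, labels, predicate):
    """Emit (trial, predicate, entity) for each label found in a trial's text."""
    triples = []
    for trial_id, text in trials.items():
        lowered = text.lower()
        for entity_id, label in labels.items():
            # Naive case-insensitive substring match; a real pipeline
            # would need proper entity recognition.
            if label.lower() in lowered:
                triples.append((trial_id, predicate, entity_id))
    return triples


new_triples = (
    link_mentions(TRIALS, DRUGS, "discusses-drug")
    + link_mentions(TRIALS, SIDE_EFFECTS, "discusses-side-effect")
)
```

On the sample data this produces three new triples; run over full datasets, the same idea is what yields the large number of new links mentioned above.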

Page 42: Sd sem weboct252010

Life Sciences Example - AllegroGraph

Page 43: Sd sem weboct252010

Life Sciences Example - AllegroGraph

Namely, we took a look at information dealing with:

- drugs
- targets
- diseases
- side-effects

And ran a query to find all clinical trials for Atorvastatin where a side effect of Atorvastatin (or Lipitor) is type 2 diabetes

Page 44: Sd sem weboct252010

Life Sciences Example - AllegroGraph

SPARQL query:

SELECT ?drug ?sideeffect ?trial
WHERE {
  ?drug rdfs:label 'Atorvastatin' .
  ?sideeffect rdfs:label 'Type 2 Diabetes' .
  ?trial franz:discusses-drug ?drug .
  ?trial franz:discusses-side-effect ?sideeffect .
}
LIMIT 10

Translated into English, the SPARQL query reads: “find every drug, side effect and clinical trial where the label of the drug is Atorvastatin, the label of the side effect is Type 2 Diabetes, and the trial discusses both; restrict output to 10 results”

Example by: Jans Aasman (Franz Inc), Web 3.0’s database
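To show what the graph pattern in the query is doing, here is a small Python sketch that evaluates the same pattern over an in-memory list of triples. The triples are invented sample data and the helper names are hypothetical; only the predicate names come from the query on the slide. A real triple store would of course run the SPARQL directly.

```python
# Invented sample triples; identifiers such as drug:1 are placeholders.
TRIPLES = [
    ("drug:1", "rdfs:label", "Atorvastatin"),
    ("se:1", "rdfs:label", "Type 2 Diabetes"),
    ("se:2", "rdfs:label", "Nausea"),
    ("trial:A", "franz:discusses-drug", "drug:1"),
    ("trial:A", "franz:discusses-side-effect", "se:1"),
    ("trial:B", "franz:discusses-drug", "drug:1"),
    ("trial:B", "franz:discusses-side-effect", "se:2"),
]


def subjects(triples, predicate, obj):
    """Return every subject s such that (s, predicate, obj) is in the data."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def trials_for(triples, drug_label, side_effect_label, limit=10):
    """Join the four triple patterns of the query and apply the row limit."""
    rows = []
    for drug in subjects(triples, "rdfs:label", drug_label):
        for effect in subjects(triples, "rdfs:label", side_effect_label):
            drug_trials = subjects(triples, "franz:discusses-drug", drug)
            effect_trials = set(subjects(triples, "franz:discusses-side-effect", effect))
            for trial in drug_trials:
                # Keep only trials that mention both the drug and the side effect.
                if trial in effect_trials:
                    rows.append((drug, effect, trial))
    return rows[:limit]
```

On the sample data, `trials_for(TRIPLES, "Atorvastatin", "Type 2 Diabetes")` keeps trial A (which mentions both) and drops trial B (which mentions the drug but a different side effect).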

Page 45: Sd sem weboct252010

Life Sciences Example - AllegroGraph

Page 46: Sd sem weboct252010

Tools for more profitable eCommerce

Page 47: Sd sem weboct252010

Online Commerce

• Best Buy and other retailers are using semantic technologies to improve the visibility of products and services, leveraging:
– GoodRelations Ontology for e-Commerce

– RDFa
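As a hedged illustration of how the two fit together, the Python helper below emits a product offer marked up in RDFa with GoodRelations terms. The helper name and sample values are hypothetical, and this is not Best Buy's actual markup; consult the GoodRelations documentation for the recommended patterns.

```python
from html import escape

# GoodRelations namespace.
GR_NS = "http://purl.org/goodrelations/v1#"


def offering_rdfa(name, price, currency):
    """Build an RDFa fragment for a simple priced offer (hypothetical helper)."""
    amount = f"{price:.2f}"
    code = escape(currency)
    return (
        f'<div xmlns:gr="{GR_NS}" typeof="gr:Offering">\n'
        f'  <span property="gr:name">{escape(name)}</span>\n'
        f'  <div rel="gr:hasPriceSpecification">\n'
        f'    <div typeof="gr:UnitPriceSpecification">\n'
        f'      <span property="gr:hasCurrency" content="{code}">{code}</span>\n'
        f'      <span property="gr:hasCurrencyValue" content="{amount}">{amount}</span>\n'
        f'    </div>\n'
        f'  </div>\n'
        f'</div>'
    )


# Sample product; name and price are made up.
print(offering_rdfa("Example Camera", 199.0, "USD"))
```

The shopper sees the product name and price as usual, while the attributes expose the same facts to search engines and other consumers as structured data.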

Page 48: Sd sem weboct252010
Page 49: Sd sem weboct252010

Other major online retailers are also leveraging the technology

Page 51: Sd sem weboct252010

Sindice Inspector - .nt format
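The .nt (N-Triples) format writes one triple per line, with IRIs in angle brackets and literals in quotes. The sketch below serializes a few triples that way. It is a simplification under stated assumptions: any term starting with http:// or https:// is treated as an IRI and anything else as a plain literal, and language tags, datatypes and blank nodes are not handled. The function names and the sample triple are illustrative.

```python
def nt_term(term):
    """Render one term: IRIs in angle brackets, everything else as a quoted literal."""
    if term.startswith(("http://", "https://")):
        return f"<{term}>"
    # Escape backslashes, double quotes and newlines inside literals.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


# Hypothetical product triple; the example.com IRI is a placeholder.
print(to_ntriples([
    ("http://www.example.com/products/1",
     "http://purl.org/goodrelations/v1#name",
     "Example Camera"),
]))
```

Because each line is a complete statement, the format is easy to inspect by eye, which is what makes it handy in a tool like an inspector.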

Page 52: Sd sem weboct252010

Gruff View

Page 53: Sd sem weboct252010

Summary

• Significant adoption in many arenas and by many of the “major players”

• Growing number of vendors providing services and tools

• Many open source tools & resources (“RDFizers”, SPARQL endpoints, Sindice, a Semantic Web index)

• The technology is mature enough at this point to provide a competitive advantage in many arenas.