semantic markup using schema.org
A basic intro to microdata and schema.org, along with a new schema.org extension for datasets and data catalogs. "TWed" talk, April 4, 2012.
Wednesday Nights in the Tetherless World (TWed), April 4th, 2012
Joshua Shinavier
Outline
• rich snippets
• microformats
• RDFa
• microdata
• microdata syntax
• schema.org
• deployment
• mappings, tools, extensions
• the Dataset extension
the three syntaxes
• several solutions exist for embedding semantic data in Web pages
• Google supports three syntaxes for its “rich snippets”
- microformats
- RDFa
- HTML microdata
• of the three, microdata is Google's “recommended” syntax
First came microformats
• microformats emerged around 2005
• some key principles
- start by solving simple, specific problems
- design for humans first, machines second
• wide deployment
- used on billions of Web pages
- usage share was 94% vis-à-vis competing formats (before microdata, anyway)
• formats exist for marking up Atom feeds, calendars, addresses and contact info, geolocation, multimedia, news, products, recipes, reviews, resumes, social relationships, etc.
microformats example
<div class="vcard">
  <a class="fn org url" href="http://www.commerce.net/">CommerceNet</a>
  <div class="adr">
    <span class="type">Work</span>:
    <div class="street-address">169 University Avenue</div>
    <span class="locality">Palo Alto</span>,
    <abbr class="region" title="California">CA</abbr>
    <span class="postal-code">94301</span>
    <div class="country-name">USA</div>
  </div>
  <div class="tel">
    <span class="type">Work</span> +1-650-289-4040
  </div>
  <div>Email: <span class="email">[email protected]</span></div>
</div>
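In the hCard markup the class names carry the meaning, so a consumer only has to find elements whose class attribute contains a known property name. A minimal sketch of that idea using Python's standard html.parser; the property subset HCARD_PROPS and the helper extract_hcard are illustrative assumptions, not a microformats library. It applies the microformats convention that an <abbr> carries its value in title, but it ignores most other hCard rules (for example url taking its value from href, or which "type" belongs to adr versus tel) and assumes properly nested markup without void elements.

```python
from html.parser import HTMLParser

# Illustrative subset of hCard class names ("url" is left out because
# hCard takes its value from href, which this sketch does not handle).
HCARD_PROPS = {"fn", "org", "street-address", "locality", "region",
               "postal-code", "country-name", "email"}


class _HCardSketch(HTMLParser):
    """Collects (property, value) pairs; assumes properly nested markup."""

    def __init__(self):
        super().__init__()
        self.open = []   # one [props, title_override, text_parts] per open element
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        props = [c for c in (a.get("class") or "").split() if c in HCARD_PROPS]
        # Microformats convention: an <abbr> carries its value in @title.
        override = a.get("title") if tag == "abbr" else None
        self.open.append([props, override, []])

    def handle_data(self, data):
        for props, _, parts in self.open:
            if props:
                parts.append(data)

    def handle_endtag(self, tag):
        if not self.open:
            return
        props, override, parts = self.open.pop()
        value = override if override is not None else " ".join("".join(parts).split())
        self.pairs.extend((p, value) for p in props)


def extract_hcard(html):
    parser = _HCardSketch()
    parser.feed(html)
    parser.close()
    return parser.pairs


card = ('<div class="vcard"><a class="fn org url" href="http://www.commerce.net/">CommerceNet</a>'
        '<div class="adr"><div class="street-address">169 University Avenue</div>'
        '<span class="locality">Palo Alto</span>, '
        '<abbr class="region" title="California">CA</abbr> '
        '<span class="postal-code">94301</span><div class="country-name">USA</div></div></div>')
print(extract_hcard(card))
```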
then came RDFa
• RDFa aims to bridge the gap between human-oriented HTML and machine-oriented RDF documents
• it provides XHTML attributes to indicate machine-understandable information
• it uses the RDF data model and Semantic Web vocabularies directly
RDFa example
<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <p property="foaf:name">Alice Birpemswick</p>
  <p>Email: <a rel="foaf:mbox" href="mailto:[email protected]">[email protected]</a></p>
  <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>
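RDFa attribute values such as foaf:name are compact URIs (CURIEs): the part before the colon is a prefix bound by the xmlns:foaf declaration, and the full URI is that namespace followed by the local part. A minimal sketch of just this expansion step; expand_curie is a hypothetical helper, and a real RDFa processor also handles full URIs, default vocabularies, and many other rules.

```python
def expand_curie(curie, prefixes):
    """Expand a CURIE like 'foaf:name' using a prefix -> namespace mapping."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no namespace declared for {curie!r}")
    return prefixes[prefix] + reference


# The example declares this prefix with xmlns:foaf="http://xmlns.com/foaf/0.1/".
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

# Attribute values from the example; the <div> has typeof but no about,
# so an RDFa processor would use a blank node as the subject.
for attr, value in [("typeof", "foaf:Person"), ("property", "foaf:name"),
                    ("rel", "foaf:mbox"), ("rel", "foaf:phone")]:
    print(f"{attr:8} {value:12} -> {expand_curie(value, PREFIXES)}")
```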
last but not least, microdata
• microdata syntax is based on nested groups of name-value pairs
• HTML microdata specification includes
- an unambiguous parsing model
- an algorithm to convert microdata to RDF
• compatible with the Semantic Web via mappings
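To make "nested groups of name-value pairs" concrete, here is a rough microdata reader built on Python's html.parser. It is a sketch, not the specification's parsing model: it ignores itemref, takes attribute values for only a few elements (meta, a, img, time), and assumes well-formed markup with explicit end tags. Each itemscope becomes a dict with a type and a properties map from name to a list of values; an itemscope element that also has itemprop becomes a nested value of the enclosing item. The class name MicrodataSketch is hypothetical.

```python
from html.parser import HTMLParser

VOID = {"area", "br", "embed", "hr", "img", "input", "link", "meta", "source", "track"}
# Elements whose value comes from an attribute (a subset of the spec's rules).
VALUE_ATTR = {"meta": "content", "a": "href", "img": "src", "time": "datetime"}


class MicrodataSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []   # top-level items, in document order
        self.scopes = []  # stack of items whose element is still open
        self.open = []    # per open element: (item created here, names, text parts, owner)

    @staticmethod
    def _add(item, names, value):
        for name in names:
            item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        names = (a.get("itemprop") or "").split()
        owner = self.scopes[-1] if self.scopes else None
        created, parts = None, None
        if "itemscope" in a:
            created = {"type": a.get("itemtype"), "properties": {}}
            if names and owner is not None:
                self._add(owner, names, created)   # nested item as a property value
            else:
                self.items.append(created)         # top-level item
        elif names and owner is not None:
            attr = VALUE_ATTR.get(tag)
            if attr and a.get(attr) is not None:
                self._add(owner, names, a[attr])   # value taken from an attribute
            else:
                parts = []                         # value is the element's text
        if tag in VOID:
            return                                 # no content, no end tag
        if created is not None:
            self.scopes.append(created)
        self.open.append((created, names, parts, owner))

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)
        if tag not in VOID:
            self.handle_endtag(tag)

    def handle_data(self, data):
        for _, _, parts, _ in self.open:
            if parts is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        created, names, parts, owner = self.open.pop()
        if created is not None:
            self.scopes.pop()
        if parts is not None:
            self._add(owner, names, " ".join("".join(parts).split()))


doc = """
<div itemscope itemtype="http://schema.org/Dataset">
  <span itemprop="name">Seismic Hazard Zones</span>
  <span itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
    <span itemprop="name">Department of Technology</span>
  </span>
</div>
"""
parser = MicrodataSketch()
parser.feed(doc)
parser.close()
print(parser.items)
```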
microdata properties
• create an item with the “itemscope” attribute, then annotate it with text-valued properties using the “itemprop” attribute
<div itemscope>
  <p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
multiple values are OK
• as in RDF, an item (subject) can have the same property (predicate) more than once, with different values (objects)
<div itemscope>
  <p>Flavors in my favorite ice cream:</p>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>
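Because a property name can repeat, a consumer has to keep every value rather than overwrite earlier ones. A minimal sketch, assuming the name-value pairs have already been extracted in document order; group_properties is a hypothetical helper.

```python
from collections import defaultdict


def group_properties(pairs):
    """Group (name, value) pairs into name -> list of values, keeping document order."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


# Name-value pairs from the ice cream example, in document order.
pairs = [("flavor", "Lemon sorbet"), ("flavor", "Apricot sorbet")]
print(group_properties(pairs))  # {'flavor': ['Lemon sorbet', 'Apricot sorbet']}

# A plain dict would silently keep only the last value:
print(dict(pairs))              # {'flavor': 'Apricot sorbet'}
```

In RDF terms the grouped result corresponds to two triples that share a subject and predicate and differ only in their objects.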
item types
• item types, given by the “itemtype” attribute, correspond to classes in RDF
<section itemscope itemtype="http://example.org/animals#cat">
  <h1 itemprop="name">Hedral</h1>
  <p itemprop="desc">Hedral is a male American domestic shorthair, with fluffy black fur and white paws and belly.</p>
  <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>
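In the cat example the img property is not the element's text: for an <img>, the value is the URL in src, resolved against the page address. The microdata specification picks the value source per element; below is a sketch covering a subset of those rules (meta, URL-bearing elements, data/meter, time). The function name property_value and the base URL are illustrative assumptions.

```python
from urllib.parse import urljoin

# Subset of the microdata rules: elements whose value is a URL attribute.
URL_ATTRS = {"a": "href", "area": "href", "link": "href",
             "audio": "src", "embed": "src", "iframe": "src", "img": "src",
             "source": "src", "track": "src", "video": "src", "object": "data"}


def property_value(tag, attrs, text, base_url=""):
    """Return the value of an itemprop element (simplified)."""
    if tag == "meta":
        return attrs.get("content", "")
    if tag in URL_ATTRS:
        url = attrs.get(URL_ATTRS[tag], "")
        # URL-valued properties are resolved against the document's address.
        return urljoin(base_url, url) if url else ""
    if tag in ("data", "meter"):
        return attrs.get("value", "")
    if tag == "time" and "datetime" in attrs:
        return attrs["datetime"]
    return text  # everything else: the element's text content


page = "http://example.org/cats/"  # hypothetical address of the cat page
print(property_value("h1", {}, "Hedral", page))
print(property_value("img", {"src": "hedral.jpeg", "alt": ""}, "", page))
```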
global IDs
• items may be given global identifiers (the “itemid” attribute), which are URLs
• they may be, but do not need to be, Semantic Web URIs
<dl itemscope itemtype="http://vocab.example.net/book" itemid="urn:isbn:0-330-34032-8">
  <dt>Title
  <dd itemprop="title">The Reality Dysfunction
  <dt>Author
  <dd itemprop="author">Peter F. Hamilton
  <dt>Publication date
  <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
</dl>
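When microdata is converted to RDF, itemid can supply the subject URI; an item without one gets a blank node instead. A minimal sketch of that step for the book example, assuming items are already parsed into dicts. The rule for turning property names into URIs (append the name to the type's namespace, i.e. everything up to the last "/" or "#") is an assumption made for illustration, and nested items are not handled.

```python
import itertools

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_bnode_ids = itertools.count()


def namespace_of(type_url):
    """Assumed rule: everything up to and including the last '#' or '/'."""
    cut = max(type_url.rfind("#"), type_url.rfind("/"))
    return type_url[:cut + 1]


def item_to_triples(item):
    """Turn {'id', 'type', 'properties'} into (subject, predicate, object) triples."""
    subject = item.get("id") or f"_:b{next(_bnode_ids)}"  # itemid, else a blank node
    triples = []
    if item.get("type"):
        triples.append((subject, RDF_TYPE, item["type"]))
        ns = namespace_of(item["type"])
    else:
        ns = ""  # hypothetical: untyped items keep bare property names
    for name, values in item["properties"].items():
        for value in values:
            triples.append((subject, ns + name, value))
    return triples


book = {
    "id": "urn:isbn:0-330-34032-8",
    "type": "http://vocab.example.net/book",
    "properties": {"title": ["The Reality Dysfunction"],
                   "author": ["Peter F. Hamilton"],
                   "pubdate": ["1996-01-26"]},
}
for triple in item_to_triples(book):
    print(triple)
```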
the schema.org vocabulary
• schema.org is one of a number of microdata vocabularies
• it is a shared collection of microdata schemas for use by webmasters
• includes a type hierarchy, like an RDFS schema
- starts with top-level Thing and DataType types
- properties are inherited by descendant types
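Property inheritance in the type hierarchy can be sketched with a parent map: the properties available on a type are its own plus those of every supertype up to Thing. The map below is a tiny hand-picked subset for illustration (placing Dataset under CreativeWork follows today's schema.org); the real vocabulary defines far more types and properties.

```python
# Tiny, hand-picked subset of the schema.org hierarchy (illustrative only).
PARENT = {"CreativeWork": "Thing", "Dataset": "CreativeWork", "Organization": "Thing"}
OWN_PROPERTIES = {
    "Thing": ["name", "description", "url"],
    "CreativeWork": ["author", "publisher", "keywords"],
    "Dataset": ["distribution"],
    "Organization": ["address", "email"],
}


def ancestors(type_name):
    """Return the type followed by its supertypes, up to Thing."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def all_properties(type_name):
    """Own properties plus those inherited from every supertype."""
    props = []
    for t in ancestors(type_name):
        props.extend(OWN_PROPERTIES.get(t, []))
    return props


print(ancestors("Dataset"))       # ['Dataset', 'CreativeWork', 'Thing']
print(all_properties("Dataset"))
```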
Why should you use schema.org?
There are several reasons.
current schema.org types
(there are around 300 of them)
In terms of deployment...
...a few key types stand out.
Top types

| type | occurrences | relative |
|---|---:|---:|
| Product | 5001966 | 0.27689260175 |
| PostalAddress | 1437388 | 0.07956913403 |
| WebPage | 1402426 | 0.07763375119 |
| Offer | 1267545 | 0.07016717684 |
| Book | 1111463 | 0.06152698395 |
| Person | 968737 | 0.05362613587 |
| AggregateRating | 780967 | 0.04323179816 |
| GeoCoordinates | 546586 | 0.03025722678 |
| LocalBusiness | 544662 | 0.03015072039 |
| Article | 525487 | 0.02908925463 |
| Place | 490433 | 0.02714877897 |
| Residence | 451652 | 0.02500198869 |
| ItemPage | 421911 | 0.02335562347 |
| Organization | 405876 | 0.02246797792 |
| Blog | 268582 | 0.01486782772 |
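The table gives no total, but assuming "relative" is each type's share of all type occurrences counted, occurrences divided by relative should give the same denominator for every row, and summing the relative column shows how much of the total these 15 types cover. A quick check in Python using the rows above:

```python
# (type, occurrences, relative) rows copied from the "Top types" table.
ROWS = [
    ("Product", 5001966, 0.27689260175),
    ("PostalAddress", 1437388, 0.07956913403),
    ("WebPage", 1402426, 0.07763375119),
    ("Offer", 1267545, 0.07016717684),
    ("Book", 1111463, 0.06152698395),
    ("Person", 968737, 0.05362613587),
    ("AggregateRating", 780967, 0.04323179816),
    ("GeoCoordinates", 546586, 0.03025722678),
    ("LocalBusiness", 544662, 0.03015072039),
    ("Article", 525487, 0.02908925463),
    ("Place", 490433, 0.02714877897),
    ("Residence", 451652, 0.02500198869),
    ("ItemPage", 421911, 0.02335562347),
    ("Organization", 405876, 0.02246797792),
    ("Blog", 268582, 0.01486782772),
]

# Assumption: relative = occurrences / total, so occurrences / relative
# should give (nearly) the same total for every row.
implied_totals = [occ / rel for _, occ, rel in ROWS]
total = sum(implied_totals) / len(implied_totals)
spread = max(implied_totals) - min(implied_totals)
top15_share = sum(rel for _, _, rel in ROWS)

print(f"implied total of type occurrences: {total:,.0f} (spread {spread:.2f})")
print(f"share covered by these 15 types: {top15_share:.1%}")
```

With these rows every implied total comes out at roughly 18.06 million, and the 15 listed types account for about 86.5% of all occurrences.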
Who’s using it?
Over 1,000 domains found (through Sindice)
Some early adopters

| domain | occurrences | relative |
|---|---:|---:|
| www.couponcabin.com | 3662 | 0.04400596 |
| www.digifotopro.nl | 2852 | 0.034272255 |
| www.weg.de | 2336 | 0.028071525 |
| futpedia.globo.com | 2003 | 0.02406989 |
| www.the-plug.com | 2001 | 0.024045857 |
| www.virtualtourist.com | 1953 | 0.023469044 |
| gdgt.com | 1857 | 0.02231542 |
| www.notasdeprensa.es | 1564 | 0.018794463 |
| www.libreriadelsanto.it | 1294 | 0.015549894 |
| liriklaguindonesia.net | 1274 | 0.015309556 |
| www.direct2florist.com | 1080 | 0.012978273 |
| www.bluefountainmedia.com | 1065 | 0.01279802 |
| www.alphabetsigns.com | 1059 | 0.012725918 |
| www.tasit.com | 1004 | 0.012064988 |
| www.teachstreet.com | 1001 | 0.012028937 |
schema.rdfs.org
• maintains schema.org ↔ RDF mappings
- there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
• also provides examples, tutorials, and data dumps
See: http://schema.rdfs.org/mappings.html
schema.org tools
• Google’s Rich Snippets Testing Tool
• schema.org libraries are available in Java, JavaScript, Perl, PHP, Python, and Ruby
• there are schema.org modules for Drupal, Joomla!, WordPress, and Virtuoso
• online tools include microdata extractors, generators and validators
• sindice.com supports microdata
See: http://schema.rdfs.org/tools.html
schema.org extensions
• there are dozens of schema.org community proposals
- they extend the existing schema.org vocabulary
• several have already been accepted into schema.org, including
- Job Postings
- IPTC/rNews integration
- User Comments
• others: Comics, Learning Resources, TV and Radio, Software Application, etc.
motivation: open government data
the Dataset vocabulary: types
• DataCatalog
- a collection of datasets
- e.g. the International Open Government Data catalog
• Dataset
- an individual, abstract data set
- e.g. a data set about seismic hazard zones near San Francisco
• DataDownload
- a dataset in downloadable form
- e.g. an RDF/XML dump of the seismic hazard zones data set
the Dataset vocabulary: properties
• catalog
- the catalog containing a dataset
• dataset
- a dataset contained in a catalog
• distribution
- a data download for a dataset
• keyword
- the topic of a dataset
• spatial
- the spatial extent of a data set (e.g. United States)
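The catalog and dataset properties describe the same link from opposite ends, so code that models the vocabulary should keep both directions in step. A sketch using Python dataclasses; the class layout, the snake_case field names, and the download URL are hypothetical, with only the type and property names taken from the vocabulary above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataDownload:
    content_url: str  # hypothetical field: where the download lives


@dataclass
class Dataset:
    name: str
    keyword: List[str] = field(default_factory=list)
    spatial: Optional[str] = None
    distribution: List[DataDownload] = field(default_factory=list)
    # Back-reference to the containing catalog; excluded from repr/eq to avoid cycles.
    catalog: Optional["DataCatalog"] = field(default=None, repr=False, compare=False)


@dataclass
class DataCatalog:
    name: str
    dataset: List[Dataset] = field(default_factory=list)

    def add(self, ds: Dataset) -> None:
        """Set both directions: catalog -> dataset and dataset -> catalog."""
        self.dataset.append(ds)
        ds.catalog = self


catalog = DataCatalog("datasf.org")
hazards = Dataset("Seismic Hazard Zones", spatial="United States")
hazards.distribution.append(DataDownload("http://example.org/seismic-hazard-zones.rdf"))  # hypothetical URL
catalog.add(hazards)
print(hazards.catalog.name, [d.name for d in catalog.dataset])
```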
Dataset extension ↔ RDF
• the Dataset extension maps to a subset of the Data Catalog Vocabulary (DCAT)
• many other types and properties are inherited from schema.org
• collectively, they cover
- around 2/3 of DCAT, and
- around half of the Asset Description Metadata Schema (ADMS)
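Reading the two Dataset examples that follow side by side gives a small, concrete piece of that mapping: schema.org Dataset becomes dcat:Dataset, name becomes dcterms:title, and so on. A sketch that applies just those correspondences to an item parsed into a dict; to_dcat, the dict layout, and the "schema:" prefix for unmapped properties are assumptions for illustration.

```python
# Correspondences read off the microdata and RDFa versions of the
# Seismic Hazard Zones example; far from a complete mapping.
TYPE_MAP = {
    "http://schema.org/Dataset": "dcat:Dataset",
    "http://schema.org/Country": "adms:Country",
    "http://schema.org/Organization": "foaf:Organization",
}
PROPERTY_MAP = {
    "name": "dcterms:title",
    "description": "dcterms:description",
    "spatial": "dcterms:spatial",
    "publisher": "dcterms:publisher",
}


def to_dcat(item):
    """Rewrite a parsed microdata item into DCAT/Dublin Core terms.

    Unmapped properties keep their schema.org name under a hypothetical
    'schema:' prefix so nothing is silently dropped."""
    result = {"type": TYPE_MAP.get(item["type"], item["type"]), "properties": {}}
    for name, values in item["properties"].items():
        key = PROPERTY_MAP.get(name, "schema:" + name)
        result["properties"][key] = [to_dcat(v) if isinstance(v, dict) else v
                                     for v in values]
    return result


dataset = {"type": "http://schema.org/Dataset", "properties": {
    "name": ["Seismic Hazard Zones"],
    "url": ["http://www.datasf.org/story.php?title=seismic-hazard-zones-"],
    "publisher": [{"type": "http://schema.org/Organization",
                   "properties": {"name": ["Department of Technology"]}}]}}
print(to_dcat(dataset))
```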
Dataset example (microdata)
<div itemscope="itemscope" itemid="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" itemtype="http://schema.org/Dataset">
  <a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-"><span itemprop="name"> <b>Seismic Hazard Zones</b> </span></a>
  <div>
    <meta itemprop="url" content="http://www.datasf.org/story.php?title=seismic-hazard-zones-"/>
    <span itemprop="description">The dataset represents the Liquefaction and Landslide Zones [...]</span>
  </div>
  <div><i>Country:</i>
    <a href="http://dbpedia.org/resource/United_States">
      <span itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country">
        <span itemprop="name">United States</span>
      </span>
    </a>
  </div>
  <div><i>Publisher:</i>
    <span itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
      <span itemprop="name">Department of Technology</span>
    </span>
  </div>
</div>
Dataset example (RDFa)
<div about="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" typeof="dcat:Dataset">
  <div><b><a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-">
    <span property="dcterms:title">Seismic Hazard Zones</span>
  </a></b></div>
  <div property="dcterms:description">The dataset represents the Liquefaction and Landslide Zones [...]</div>
  <div rel="dcterms:spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
    <a href="http://dbpedia.org/resource/United_States">
      <span about="http://dbpedia.org/resource/United_States" typeof="adms:Country">
        <span property="dcterms:title">United States</span>
      </span>
    </a>
  </div>
  <div rel="dcterms:publisher"><i>Publisher:</i>
    <span typeof="foaf:Organization">
      <span property="dcterms:title">Department of Technology</span>
    </span>
  </div>
</div>
Google extracts this data
Item
  Type: http://schema.org/dataset
  name = Seismic Hazard Zones
  url = http://www.datasf.org/story.php?title=seismic-hazard-zones-
  description = The dataset represents the Liquefaction and Landslide Zones [...]
  spatial = Item( 1 )
  publisher = Item( 2 )
Item 1
  Type: http://schema.org/country
  name = United States
Item 2
  Type: http://schema.org/organization
  name = Department of Technology
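The extractor output lists the top-level item first and replaces each nested item with a numbered reference, Item( 1 ) and Item( 2 ). A sketch of that flattening over items parsed into dicts; the numbering order (depth-first, in document order) and the function names flatten and render are assumptions, and unlike the tool's output the sketch keeps type URLs in their original case.

```python
def flatten(item):
    """List (label, type, [(name, value)]) blocks, replacing nested items
    with 'Item( n )' references numbered in depth-first order."""
    blocks = []

    def visit(it, label):
        index = len(blocks)
        blocks.append(None)  # reserve the slot so a parent precedes its children
        props = []
        for name, values in it["properties"].items():
            for value in values:
                if isinstance(value, dict):
                    n = len(blocks)  # number the nested item is about to receive
                    props.append((name, f"Item( {n} )"))
                    visit(value, f"Item {n}")
                else:
                    props.append((name, value))
        blocks[index] = (label, it["type"], props)

    visit(item, "Item")
    return blocks


def render(blocks):
    lines = []
    for label, type_url, props in blocks:
        lines.append(label)
        lines.append(f"  Type: {type_url}")
        lines.extend(f"  {name} = {value}" for name, value in props)
    return "\n".join(lines)


dataset = {
    "type": "http://schema.org/Dataset",
    "properties": {
        "name": ["Seismic Hazard Zones"],
        "url": ["http://www.datasf.org/story.php?title=seismic-hazard-zones-"],
        "description": ["The dataset represents the Liquefaction and Landslide Zones [...]"],
        "spatial": [{"type": "http://schema.org/Country",
                     "properties": {"name": ["United States"]}}],
        "publisher": [{"type": "http://schema.org/Organization",
                       "properties": {"name": ["Department of Technology"]}}],
    },
}
print(render(flatten(dataset)))
```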
Resources
• HTML microdata
- http://www.w3.org/TR/microdata
• Schema.RDFS.org
- http://schema.rdfs.org
• W3C Web Schemas group (public-vocabs@w3.org)
- http://lists.w3.org/Archives/Public/public-vocabs
• The Dataset proposal
- http://www.w3.org/wiki/WebSchemas/Datasets
• Rich Snippets Testing Tool
- http://google.com/webmasters/tools/richsnippets
Credits
• word clouds by
- http://wordle.net
• deployment statistics discovered using Sindice and Sindice4j
- http://sindice.com
- http://sindice4j.googlecode.com
Thanks!
• Tetherless World Constellation
• http://tw.rpi.edu
• Contact:
• [email protected], @joshsh