what a long, strange trip it’s been r.v.guha google schema.org
TRANSCRIPT
![Page 1: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/1.jpg)
schema.org
What a long, strange trip it’s beenR.V.GuhaGoogle
![Page 2: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/2.jpg)
schema.org
Outline of talk
• The context – How did we end up where we are
• Schema.org– What it is, status of adoption– Schema.org principles, how does it work
• Looking ahead– Next Generation Applications
![Page 3: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/3.jpg)
schema.org
About 18 years ago, …
• People started thinking about structured data on the web – A few people from Netscape, Microsoft and W3C got together @MIT
• Trying to make sense of a flurry of activity/proposals– XML, MCF, CDF, Sitemaps, …
• There were a number of problems – PICS, Meta data, sitemaps, …
• But one unifying idea
![Page 4: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/4.jpg)
schema.org
Context: The Web for humans
Structured Data
Web server
HTML
![Page 5: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/5.jpg)
schema.org
Goal: Web for Machines & Humans
Web server
Structured Data
Apps
![Page 6: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/6.jpg)
schema.org
What does that mean?
Chuck Norris
Ryan, Oklahama
March 10th 1940
birthdate
birthplace
Actor
type
- Notable points - Graph Data Model
- Common Vocabulary
![Page 7: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/7.jpg)
schema.org
How do we get there?
• How does the author give us the graph– Data Model: Graph vs tree vs …– Syntax– Vocabulary– Identifiers for objects
• Why should the author give us the graph?
![Page 8: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/8.jpg)
schema.org
Going depth first
• Many heated battles– Lot of proposals, standards, companies, …
• Data model– Trees vs DLGs vs Vertical specific vs who needs one?
• Syntax– XML vs RDF vs json vs …
• Model theory anyone– We need one vs who cares vs what’s that?
![Page 9: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/9.jpg)
schema.org
Timeline of ‘standards’
• ‘96: Meta Content Framework (MCF) (Apple)• ’97: MCF using XML (Netscape) RDF, CDF• ’99 -- : RDF, RDFS • ’01 -- : DAML, OWL, OWL EL, OWL QL, OWL RL• ’03: Microformats• And many many many more … SPARQL, Turtle, N3, GRDDL,
R2RML, FOAF, SIOC, SKOS, …
• Lots of bells & whistles: model theory, inference, type systems, …
![Page 10: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/10.jpg)
schema.org
But something was missing …
• Fewer than 1000 sites were using these standards
• Something was clearly missing and it wasn’t more language features
• We had forgotten the ‘Why’ part of the problem
• The RSS story
![Page 11: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/11.jpg)
schema.org
’07 - :Rise of the consumers
• Yahoo! Search Monkey, Google Rich Snippets, Facebook Open Graph
• Offer webmasters a simple value proposition
• Search engines to webmasters:– You give us data … we make your results nicer
• Usage begins to take off– 1000x increase in markup’ed up pages in 3 years
![Page 12: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/12.jpg)
schema.org
Yahoo Search Monkey
• Give websites control over snippet presentation• Moderate adoption
– Targeted at high end developers – Too many choices
![Page 13: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/13.jpg)
schema.org
Google Rich Snippets: Reviews
![Page 14: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/14.jpg)
schema.org
Google Rich Snippets: Events
![Page 15: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/15.jpg)
schema.org
Google Rich Snippets
• Multi-syntax• Adhoc vocabulary for each vertical• Very clear carrot • Lots of experimentation on UI• Moderately successful: 10ks of sites• Scaling issues with vocabulary
![Page 16: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/16.jpg)
schema.org
Situation in 2010
• Too many choices/decisions for webmasters– Divergence in vocabularies
• Too much fragmentation • N versions of person, address, …
• A lot of bad/wrong markup– ~25% for micro-formats, ~40% with RDFA– Some spam, mostly unintended mistakes
• Absolute adoption numbers still rather low– Less than 100k sites
![Page 17: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/17.jpg)
schema.org
Schema.org
• Work started in August 2010– Google, Yahoo!, Microsoft & then Yandex
• Goals:– One vocabulary understood by all the search engines– Make it very easy for the webmaster
• It is A vocabulary. Not The vocabulary.– Webmasters can use it together other vocabs– We might not understand the other vocabs. Others might
![Page 18: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/18.jpg)
schema.org
Schema.org: Major sites
• News: Nytimes, guardian.com, bbc.co.uk,• Movies: imdb, rottentomatoes, movies.com• Jobs / careers: careerjet.com, monster.com, indeed.com• People: linkedin.com, • Products: ebay.com, alibaba.com, sears.com, cafepress.com,
sulit.com, fotolia.com• Videos: youtube, dailymotion, frequency.com, vinebox.com• Medical: cvs.com, drugs.com• Local: yelp.com, allmenus.com, urbanspoon.com• Events: wherevent.com, meetup.com, zillow.com, eventful• Music: last.fm, myspace.com, soundcloud.com
![Page 19: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/19.jpg)
schema.org
Schema.org principles: Simplicity
• Simple things should be simple– For webmasters, not necessarily for consumers of markup– Webmasters shouldn’t have to deal with N namespaces
• Complex things should be possible– Advanced webmasters should be able to mix and match
vocabularies
• Syntax– Microdata, usability studies– RDFa, json-ld, …
![Page 20: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/20.jpg)
schema.org
Schema.org principles: Simplicity
• Can’t expect webmasters to understand Knowledge Representation, Semantic Web Query Languages, etc.
• It has to fit in with existing workflows– A posteriori ‘markup tools’ don’t work
• Avoid KR system driven artifacts– Multiple domain / range for attributes– No classes like ‘Agent’ – Categories and attributes should be concrete
![Page 21: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/21.jpg)
schema.org
Schema.org principles: Simplicity
• Copy and edit as the default mode for authors– It is not a linear spec, but a tree of examples
• Vocabularies– Authors only need to have local view– But schema.org tries to have a single global coherent
vocabulary
![Page 22: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/22.jpg)
schema.org
Schema.org principles: Incremental
• Started simple – ~ 100 categories at launch
• Applies to every area– Add complexity after adoption– now ~1200 vocab items– Go back and fill in the blanks
• Move fast, accept mistakes, iterate fast
![Page 23: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/23.jpg)
schema.org
Schema.org Principles: URIs • ~1000s of terms like Actor, birthdate
– ~10s for most sites– Common across sites
• ~10ks of terms like USA– External enumerations
• ~1b-100b terms like Chuck Norris and Ryan, Oklahama– Cannot expect agreement on these– Reference by description– Consumers can reconcile entity references
Chuck Norris
Ryan, Oklahama
March 10th 1940
birthplace
Actor
type
citizenOf
USA
birthdate
![Page 24: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/24.jpg)
schema.org
An Actor named Chuck Norris
March 10th 1940
citizenOf
USA
birthdate
A city named RyanIn the state OK
March 10th 1940
birthplace
birthdate
A Person namedGeena O’Kelley
spouse
An Actor named Chuck Norris+
=
USA
Chuck Norris
Ryan, Oklahama
March 10th 1940
birthplace
Actor
type
citizenOfbirthdate
spouseGeena O’Kelley
![Page 25: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/25.jpg)
schema.org
Schema.org Principles: Collaborations
• Most discussions on public W3C lists
• Work closely with interest communities
• Work with others to incorporate their vocabularies– We give them attribution on schema.org– Webmasters should not have to worry about where each
piece of the vocabulary came from– Webmasters can mix and match vocabs
![Page 26: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/26.jpg)
schema.org
Schema.org Principles: Collaborations
• IPTC /NYTimes / Getty with rNews• Martin Hepp with Good Relations• US Veterans, Whitehouse, Indeed.com with Job Posting• Creative Commons with LRMI• NIH National Library of Medicine for Medical vocab.• Bibextend, Highwire Press for Bibliographic vocabulary• Benetech for Accessibility• BBC, European Broadcasting Union for TV & Radio schema• Stackexchange, SKOS group for message board• Lots and lots and lots of individuals
![Page 27: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/27.jpg)
schema.org
Schema.org Principles: Partners
• Partner with Authoring platforms– Drupal, Wordpress, Blogger, YouTube
• Drupal 8– Schema.org markup for many types
• News articles, comments, users, events, …
– More schema.org types can be created by site author– Markup in HTML5 & RDFa Lite– Will come out early 2015
![Page 28: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/28.jpg)
schema.org
Recent Additions
• From Nouns to Verbs: Actions– Object potential actions– Constraints on actions– E.g., ThorMovie Stream, Buy, …
• Introducing time: Roles– E.g., Joe Montana played for the SF 49ers from 1979 to
1992 in the position QuarterBack
![Page 29: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/29.jpg)
schema.org
Recent Additions
• Scholarly work, Comics, Serials, …• Communications: TV, Radio, Q&A, …• Accessibility• Commerce: Reservations• Sports• Buyer/Seller, etc.• Bibtex
• The ontology is growing … – ~800 properties– ~600 classes
![Page 30: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/30.jpg)
schema.org
Looking forward
• Schema.org is doing better than we expected– Thanks to millions of webmasters!
• But this is not the final goal– Just the means to the next generation of applications
• First generation of applications– Rich presentation of search results
• Many new applications– Related to search and beyond
![Page 31: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/31.jpg)
schema.org
Newer Applications: Knowledge Graph
![Page 32: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/32.jpg)
schema.org
Newer Applications: Knowledge Graph
![Page 33: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/33.jpg)
schema.org
Non search applications: Google Now
User profile (google.com/now/topics)
+ structured data feeds
![Page 34: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/34.jpg)
schema.org
Pinterest: Schema.org for Rich Pins
![Page 35: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/35.jpg)
schema.org
Reservations Personal Assistant
• Open Table website confirmation email Android Reminder
![Page 36: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/36.jpg)
schema.org
Vertical Search
• Structured data in search– Web search: annotate search results
OR– Filtering based on structured data
• Only in specialized corpus• Ecommerce, real estate, etc.
• How about filtering based on structured data across the web?
![Page 37: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/37.jpg)
schema.org
Google Rich Snippets: Recipe View
![Page 38: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/38.jpg)
schema.org
Web scale vertical search
• Searching for Veteran friendly jobs
![Page 39: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/39.jpg)
schema.org
Web Scale custom vertical search
• Build your own custom vertical search engine– Google does the heavy lifting: crawling, indexing, etc.– You specify the schema.org restricts– APIs to help build your own UI
• Searches over all pages on the web with a certain schema.org markup
• Demo
![Page 40: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/40.jpg)
schema.org
Scientific Data Publishing
• US Govt alone spends over $60B/yr on scientific research
• Primary output of most of this research is data– Most of the data is thrown away– All that is published are papers
• We would like the data published in a easily reusable form
![Page 41: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/41.jpg)
schema.org
Case study: Clinical Trials
• Clinical trials• 4000+ clinical trials at any time in the US alone• Almost all the data ‘thrown away’• All that gets published is a textual ‘abstract’
• Many of the trials are redundant• Earlier trials have the data• Assumptions, etc. cannot be re-examined• Longitudinal studies extremely hard, but super important
• Having all the clinical trial data on the web, in a common schema will make this much easier!
![Page 42: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/42.jpg)
schema.org
Case study: SkyServer
• Huge amount of astronomy data
• Jim Gray, NASA and others brought it all together, normalized it and made it available on the web
• Has changed the way astronomy research takes place• Students in Africa getting PhDs without leaving Africa!• Radio/Ultra-violet/Visible light data easily brought together
• Caveats• SQL biased, not distributed, not scalable• All normalization done by hand, once• Small number of data sources• But shows that it can be done …
![Page 43: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/43.jpg)
schema.org
First steps for scientific data publication
• OPTC directive for data from federally funded research to be freely available
• Formation of new ‘Data Science’ institute inside NIH
• Seeing traction in scientific data on the web• Lot of interest in creating schemas• Public repositories for scientific data starting
![Page 44: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/44.jpg)
schema.org
Concluding
• Structured data on the web is now ‘web scale’
• Schema.org has got traction and is evolving
• The most interesting applications are yet to come
![Page 45: What a long, strange trip it’s been R.V.Guha Google schema.org](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e045503460f94aef59f/html5/thumbnails/45.jpg)
schema.org
Questions?