Bernadette Hyland SemTech 2011 West - Linked Data Cookbook


Uploaded by bernadette-hyland on 07-Dec-2014


DESCRIPTION

Linked Data is an evolving set of techniques for publishing and consuming data on the Web. Learn how Linked Data can turn the Web into a distributed database and how you can participate. In this session, Bernadette Hyland takes the mystery out of Linked Data by summarizing seven steps to prepare your data sets as Linked Data and announce them so others will use them.

TRANSCRIPT

Page 1: Bernadette Hyland SemTech 2011 West - Linked Data Cookbook

The Joy of Data: A cookbook for publishing Linked Data on the Web

Bernadette Hyland, CEO, 3 Round Stones, Inc.

[email protected]

A pragmatic approach to publishing & consuming Linked Data

Agenda

• Setting the scene

• Ingredients ... we use a cooking analogy

• Open standards & best practices

• Data modeling without context

• Social contract as a publisher

• Next steps

Setting the scene ... where should we focus?

We’ll review:

• Converting data into RDF

• The social contract publishers make

• The importance of announcing

• Where to turn for guidance

Why should we care?

• We pretend our organizations are hierarchical; they aren’t.

• Information is power.

• Combining information from different sources is very powerful.

• The US data warehouse market was $10B in 2010

• It is expected to grow to $13.5B in 2012

World-changing phenomenon

• Using a Linked Data approach, we can begin to address the non-hierarchical nature of our organizations

• We can combine information sources

• The W3C has defined standards that enable interoperability and allow us to move data freely

We are sowing the seeds of nothing short of a revolution

What does it take?

• The ingredients list ...

• Thinking differently about your data

• Modeling for re-use

• Summary of process in 7 steps

“The change from atoms to bits is irrevocable and unstoppable”

Being Digital, by Nicholas Negroponte

We use URIs to describe both bits & atoms ...

• Information resources are things that computers understand, e.g., Web pages, images, CSS files, etc.

• Non-information resources are atoms, e.g., people, places, events, things, concepts, etc.
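The bits/atoms distinction matters when a server has to answer for a URI: a person cannot be sent over HTTP, but a document about that person can. One convention described in the W3C note "Cool URIs for the Semantic Web" is to answer a request for a thing's URI with 303 See Other, pointing at a document. The sketch below assumes a hypothetical layout where /id/ names things and /doc/ names documents; it is only the decision logic, not a real server.

```python
from typing import Optional, Tuple

# Hypothetical URI layout, assumed for illustration only:
#   /id/<name>   names a real-world thing (a non-information resource)
#   /doc/<name>  names the Web document that describes that thing
ID_PREFIX = "/id/"
DOC_PREFIX = "/doc/"


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header) for a request path."""
    if path.startswith(ID_PREFIX):
        # A thing cannot be transmitted, so redirect to a description of it.
        return 303, DOC_PREFIX + path[len(ID_PREFIX):]
    if path.startswith(DOC_PREFIX):
        # Documents are information resources and are served directly.
        return 200, None
    return 404, None


if __name__ == "__main__":
    print(respond("/id/alice"))   # (303, '/doc/alice')
    print(respond("/doc/alice"))  # (200, None)
```

A hash URI such as http://example.org/about#alice is the other common option: the fragment is never sent to the server, so no redirect is needed, at the cost of fetching everything that shares the same document.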

• A different way of thinking about data

• The Open World Assumption

• Lots of URIs

• To be a citizen of the world (not everyone speaks English)

• To publish useful information & announce it!
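Being a citizen of the world can start with language-tagged labels. A minimal sketch that writes one rdfs:label triple per language in N-Triples; the subject URI is hypothetical, and it assumes RDF 1.1 N-Triples, which is UTF-8, so text such as "München" can be written as-is.

```python
from typing import Dict, List

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def labels(subject: str, by_language: Dict[str, str]) -> List[str]:
    """One rdfs:label triple per language tag, sorted by tag."""
    return [
        f'<{subject}> {RDFS_LABEL} "{escape_literal(text)}"@{lang} .'
        for lang, text in sorted(by_language.items())
    ]


if __name__ == "__main__":
    city = "http://example.org/id/place/munich"  # hypothetical URI
    names = {"en": "Munich", "de": "München", "it": "Monaco di Baviera"}
    for line in labels(city, names):
        print(line)
```

Consumers can then pick the label matching their user's language and fall back to another tag when it is missing.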

Peeling the onion ...

Machine readable

and Human Readable (or edible)

Publish machine- & human-readable content

• Machine-readable format

• Human-readable descriptions of your data set

• Increase visibility with search engines

• Include RDFa or microformats

• Publish a VoID description of your RDF dataset
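A VoID description is itself a small RDF document. The sketch below emits one in Turtle using VoID terms (void:Dataset, void:sparqlEndpoint, void:dataDump, void:exampleResource) plus dcterms:title. All URIs are hypothetical placeholders, the title is assumed to be a single line, and a production system would more likely use an RDF library than string formatting.

```python
from typing import List


def turtle_string(text: str) -> str:
    """Quote a single-line string as a Turtle literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def void_description(dataset: str, title: str, sparql_endpoint: str,
                     data_dump: str, examples: List[str]) -> str:
    """Return a minimal VoID description of one dataset, as Turtle."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f"    dcterms:title {turtle_string(title)} ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:dataDump <{data_dump}> ;",
    ]
    # Example resources give consumers a good entry point into the data.
    lines += [f"    void:exampleResource <{uri}> ;" for uri in examples]
    # Terminate the statement: swap the final " ;" for " .".
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(void_description(
        "http://example.org/dataset/people",
        "People directory",
        "http://example.org/sparql",
        "http://example.org/dumps/people.nt",
        ["http://example.org/id/person/alice"],
    ))
```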

[Chart: Marketers reporting “great” return on investment vs. usage, by channel: paid search, house email, SEO, banners/buttons, text-link ads, affiliate marketing, behavioral targeting, contextual targeting, pop-ups/pop-unders, rich media/video, rented email lists]

Model without context

There is a process: Identify, Model, Name, Describe, Convert, Publish, Maintain

Preparation

1. Leverage what exists

• Request a copy of the logical and physical model of the database(s)

• Obtain data extracts (e.g., databases and/or spreadsheets) or create data in a way that can be replicated.

Modeling the data

2. Model data without context to allow for reuse and easier merging of data sets.

• Traditional DBAs organize data for specific Web services or applications.

• With Linked Data, application logic does not drive the data schema, concepts, etc.

Modeling the data

3. Look for real-world objects of interest (e.g., people, places, things, locations, etc.) and model them.

• Investigate how others are already modeling similar or related data.

• Look for duplication and normalize the data.

• Use common sense to decide whether or not to make a link.

Modeling the data ...

4. Connect data from different sources and authoritative vocabularies (see list of popular vocabularies below).

• Use URIs as names for your objects
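Naming objects with URIs usually means minting them from a stable natural key. A minimal sketch, assuming a hypothetical http://example.org/id/ namespace and a type/key path pattern; normalizing case, spacing and accents also helps with the duplication check in step 3, since two spellings of one name collapse to one URI.

```python
import re
import unicodedata

# Hypothetical namespace: in practice, a domain you control and can keep stable.
BASE = "http://example.org/id/"


def slugify(text: str) -> str:
    """Reduce a label to lowercase ASCII words joined by hyphens."""
    # Decompose accented characters (e.g. "ã" becomes "a" plus a combining
    # tilde), then drop everything that is not plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind: str, key: str) -> str:
    """Build a URI for a real-world object from its type and a natural key."""
    return f"{BASE}{slugify(kind)}/{slugify(key)}"


if __name__ == "__main__":
    # Two spellings of the same name collapse to one URI.
    print(mint_uri("Person", "Bernadette Hyland"))
    print(mint_uri("person", "  bernadette  HYLAND "))
```

One caveat: this slug discards characters outside the Latin script, so a name written only in, say, Chinese would yield an empty key; such records would need a different key, such as an identifier from the source system.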

Modeling the data ...

• Put aside immediate needs of any application

• Don’t think about how an application will use your data

• Do think about how the data will change over time.

Convert, Publish & Maintain

5. Write a script or process that can convert the data set repeatedly

6. Publish to the Web and announce it! (more details shortly)

7. Define a maintenance strategy (more details in the social contract at the end)
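Step 5 asks for a conversion that can be rerun. A minimal sketch, assuming a CSV extract with an id column and a hypothetical http://example.org/ namespace: each non-empty cell becomes one N-Triples statement, duplicate rows are dropped, and output is sorted so two runs over the same extract produce identical files. The `vocab#` properties are invented for illustration; a real conversion would map columns to established vocabularies.

```python
import csv
import io
from typing import Dict, Iterable, List

BASE = "http://example.org/"  # hypothetical namespace for subjects and predicates


def literal(text: str) -> str:
    """Quote a value as an N-Triples plain literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def convert(rows: Iterable[Dict[str, str]], id_column: str) -> List[str]:
    """Turn table rows into sorted, de-duplicated N-Triples lines.

    Assumes the id column and the column names are URI-safe;
    empty cells produce no triple.
    """
    triples = set()  # a set drops rows that appear twice in the extract
    for row in rows:
        subject = f"<{BASE}id/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            triples.add(f"{subject} <{BASE}vocab#{column}> {literal(value)} .")
    # Sorting makes every run identical for the same input, so
    # successive conversions can be diffed.
    return sorted(triples)


if __name__ == "__main__":
    extract = "id,name,city\n1,Alice,Paris\n2,Bob,\n1,Alice,Paris\n"
    for line in convert(csv.DictReader(io.StringIO(extract)), "id"):
        print(line)
```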

Take the plunge ... Be forgiving

• Simplistic data models can still be useful

• It is better to make progress with something than to do nothing because we cannot be comprehensive and complete

Take an iterative approach

1. Review modeling decisions

2. Review the vocabularies chosen and developed

3. Modify/update data conversion scripts

4. Do a maintenance walk-through with real use cases

5. Show how to explore data with SPARQL and visualizations

6. Discuss a persistent identifier strategy (think PURLs)
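To explore the published data with SPARQL, a client sends a query to an endpoint over HTTP. The sketch builds a SPARQL 1.1 Protocol GET request (the query travels in the `query` parameter) and flattens a response in the SPARQL JSON results format. The endpoint URL is hypothetical, and nothing is sent over the network here; `urllib.request.urlopen(req)` would do that.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint


def sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Unbound variables are absent from a binding, so they are simply
    missing from that row's dict.
    """
    data = json.loads(results_json)
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    req = sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
```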

Describe your data

Data stewards should:

• Make data accessible via the Web’s standard access mechanism, specifically HTTP URIs

• Represent data in a common format, such as RDF/XML, Notation3 (N3), Turtle, N-Triples, RDFa, or RDF/JSON

• Provide self-describing data
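Serving several of these formats from one HTTP URI is usually done with content negotiation on the `Accept` header. A simplified sketch, assuming a hypothetical server that can emit Turtle, RDF/XML, N-Triples and HTML: it honors q-values and lets the most specific matching media range win, but ignores other parameters and the finer points of the HTTP specification.

```python
from typing import List, Tuple

# Media types this hypothetical server can produce, in its own order of
# preference (used to break ties between equally acceptable types).
SUPPORTED = ["text/turtle", "application/rdf+xml", "application/n-triples", "text/html"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return prefs


def quality(offer: str, prefs: List[Tuple[str, float]]) -> float:
    """q for one offered type, taken from the most specific matching range."""
    best_specificity, best_q = -1, 0.0
    for media, q in prefs:
        if media == offer:
            specificity = 2
        elif media == "*/*":
            specificity = 0
        elif media.endswith("/*") and offer.startswith(media[:-1]):
            specificity = 1
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept: str, default: str = "text/html") -> str:
    """Pick the representation to serve for a given Accept header."""
    prefs = parse_accept(accept)
    if not prefs:
        return default  # no usable Accept header: serve the default
    best, best_q = default, 0.0
    for offer in SUPPORTED:
        q = quality(offer, prefs)
        if q > best_q:
            best, best_q = offer, q
    # A strict server might answer 406 when nothing is acceptable;
    # this sketch falls back to the default instead.
    return best
```

For example, a browser sending `text/html,application/xhtml+xml,*/*;q=0.8` gets HTML, while a Linked Data client asking for `text/turtle` gets Turtle from the same URI.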

Linked Data Formats

• RDF/XML - RDF for XML pipelines

• Turtle - Human-readable RDF

• XHTML with GRDDL transformation

• XHTML with embedded RDFa

• RDF Schema - Describing structure

In a tart, smoothie or margarita ... berries can be combined in different ways

Merging data

Guidelines for merging

• URIs name the resources we are describing

• Two people using the same URI are describing the same thing

• The same URI in two datasets means the same thing

• Graphs from several different sources can be merged

• Resources with the same URI are considered identical

• There are no limitations on which graphs can be merged
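In the simplest case, merging graphs is just a set union of triples, and statements from different sources about the same URI accumulate on one node. The sketch assumes URIs and literals only: blank nodes would have to be renamed apart before the union, which is not shown. The Paris URI and the region property are hypothetical.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object). URIs are bare strings; literals
# keep their N-Triples-style quoting so they cannot be confused with URIs.
Triple = Tuple[str, str, str]


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Merge graphs as the union of their triples (no blank nodes assumed)."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def about(graph: Set[Triple], subject: str) -> Set[Tuple[str, str]]:
    """All (predicate, object) pairs stated about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}


PARIS = "http://example.org/id/place/paris"  # hypothetical shared URI
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
REGION = "http://example.org/vocab#region"   # hypothetical property

source_a = {(PARIS, LABEL, '"Paris"@fr')}
source_b = {(PARIS, LABEL, '"Paris"@fr'), (PARIS, REGION, '"Île-de-France"')}

if __name__ == "__main__":
    for pair in sorted(about(merge(source_a, source_b), PARIS)):
        print(pair)
```

The repeated label collapses into one triple, and the region from the second source is now attached to the same resource as the label from the first.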

Announcing the finished product!

• Inform the LOD developer community (linkeddata.org, W3C mailing lists)

• Announce to search engines (RDFa hints, register to make accessible)

• Publish human-readable descriptions

• Encourage interlinking

• Publish a VoID description of the dataset

• Include a SPARQL endpoint
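Two small discovery aids fit the announcing step, offered as a sketch: an HTML page can point to its RDF counterpart with a `<link rel="alternate">` element, and the VoID specification describes `/.well-known/void` on a host as a place to find a dataset description. The URLs are hypothetical, and neither helper registers anything with a search engine.

```python
from html import escape
from urllib.parse import urljoin


def alternate_link(rdf_url: str, media_type: str = "text/turtle") -> str:
    """An HTML <link> element advertising an RDF version of the page."""
    return (
        f'<link rel="alternate" type="{escape(media_type)}" '
        f'href="{escape(rdf_url)}">'
    )


def well_known_void(page_url: str) -> str:
    """The host-wide VoID discovery location for any page on that host."""
    return urljoin(page_url, "/.well-known/void")


if __name__ == "__main__":
    print(alternate_link("http://example.org/doc/alice.ttl"))
    print(well_known_void("http://example.org/doc/alice"))
```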

[Chart: Acceptable ROI for IT, by payback period (6 months, 12 months, 18 months, 24 months, more than 24 months); segment values shown: 17%, 49%, 16%, 13%, 4%]

The Social Contract ... The not-so-fine print

• Publishing LOD is a social contract to provide the public with information

• Follow best practices for modeling

• Carefully consider your URI strategy

• Ensure that your LOD remains available where you say it will be

• Publish a VoID description

• For a government agency, a data policy is “a must”; it should specify data quality and retention, treatment of data through secondary sources, restrictions on use, frequency of updates, public participation, and applicability of this data policy

We’ve created something quite beautiful

This work is Copyright © 2011 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License.

Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

• For any reuse or distribution, you must make clear to others the license terms of this work.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Nothing in this license impairs or restricts the author's moral rights.

• Some content in the work may be licensed under different terms; this is noted separately.