Rewire the Net

Mobile, Context Aware Databases and Database Systems 2007/05/30 Rewire the Net Davide Eynard [email protected] Dipartimento di Elettronica e Informazione Politecnico di Milano

DESCRIPTION

A presentation on wrappers and mashup tools, made for the PhD course "Mobile and Context-Aware Database Systems" at Politecnico di Milano, May 2007

TRANSCRIPT

Page 1: Rewire the Net

Mobile, Context Aware Databases and Database Systems

2007/05/30

Rewire the Net

Davide Eynard [email protected]

Dipartimento di Elettronica e Informazione
Politecnico di Milano

Page 2: Rewire the Net

Rewire the Net, p. 2, 2007/05/30

Intro

The problem
Wrapping vs Mashup
Mashup tools and technologies
Open problems
Conclusions

Page 3: Rewire the Net

The problem

STRUCTURED

UNSTRUCTURED

STRUCTURED

Page 4: Rewire the Net

What is a wrapper?

Page 5: Rewire the Net

What is a wrapper?

Content Provider

Desired Interface

Page 6: Rewire the Net

Is a wrapper enough?

A wrapper takes a (usually unstructured) data source and returns information in a desired format
• All the uninteresting stuff is hidden within it
• From outside we see only the desired interface
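The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HTML page, the `GridWrapper` class and the `_CellCollector` helper are hypothetical names invented here, not part of any tool mentioned in the slides. The page markup stays hidden inside the wrapper, and the caller only sees a list of integer rows.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell, ignoring all other markup."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()


class GridWrapper:
    """Hypothetical wrapper: HTML page in, list of integer rows out."""

    def __init__(self, width):
        self.width = width

    def wrap(self, html):
        parser = _CellCollector()
        parser.feed(html)
        # Empty cells become 0; the parsing details never leave this method.
        values = [int(c) if c else 0 for c in parser.cells]
        return [values[i:i + self.width] for i in range(0, len(values), self.width)]


# Made-up page: the interesting table is surrounded by uninteresting markup.
PAGE = """
<html><body><h1>Today's puzzle</h1><div class="ad">Buy now!</div>
<table><tr><td>1</td><td></td></tr><tr><td></td><td>2</td></tr></table>
</body></html>
"""

grid = GridWrapper(width=2).wrap(PAGE)
print(grid)
```

The heading and the advertisement never reach the caller, which is the "desired interface" part of the picture.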

What we want to do is work with this information
• aggregate/filter it
• use it as input for other services
• mash it!

Page 7: Rewire the Net

An example

... and now?

Page 8: Rewire the Net

An example

Convert data structures to LaTeX and generate a Sudoku book in PDF
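As a rough sketch of the conversion step, assuming the puzzle is already available as a list of integer rows with 0 for an empty cell (the function name `grid_to_latex` and the tiny 2x2 grid are illustrative only), a grid can be rendered as a LaTeX `tabular`. Producing the PDF would then need a LaTeX toolchain, which is not shown.

```python
def grid_to_latex(grid):
    """Render a list of rows as a LaTeX tabular; 0 means an empty cell."""
    columns = "|" + "c|" * len(grid[0])
    lines = [r"\begin{tabular}{" + columns + "}", r"\hline"]
    for row in grid:
        cells = [str(v) if v else " " for v in row]
        # Each row ends with a LaTeX line break and a horizontal rule.
        lines.append(" & ".join(cells) + r" \\ \hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


latex = grid_to_latex([[1, 0], [0, 2]])
print(latex)
```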

Page 9: Rewire the Net

An example

Create a Web app which delivers data in a standard format

Create a Java app that runs Sudokus on your mobile

Create another app that solves Sudokus!

Page 10: Rewire the Net

What kind of mashup?

Imagination is your only limit

So, most of the mashups around belong to one of the following families:
• mapping mashups
• video and photo mashups
• search and shopping mashups
• news mashups

• and... uhm, well... ability

Page 11: Rewire the Net

Examples

Page 12: Rewire the Net

Examples

Page 13: Rewire the Net

Examples

Page 14: Rewire the Net

Examples

Page 15: Rewire the Net

Examples

Page 16: Rewire the Net

Examples

Page 17: Rewire the Net

Features

Source: “Five Ways to Mix, Rip, and Mash Your Data”, Nick Gonzalez, March 2 2007

Page 18: Rewire the Net

The architecture

API / Content Provider

API / Content Provider

API / Content Provider

...

MASHUP SITE/SERVICE

INTERFACE

Client
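The layering in this diagram can be sketched as follows. Everything here is an assumption made for illustration: the two provider functions return hard-coded data in place of real APIs, and `MashupService` with its `top_rated` method is a hypothetical interface. The point is only that the client talks to one service, which in turn combines several providers.

```python
import json


def puzzle_provider():
    """Stand-in for one API/content provider (hypothetical data)."""
    return [{"id": 1, "title": "Easy puzzle"}, {"id": 2, "title": "Hard puzzle"}]


def rating_provider():
    """Stand-in for a second provider, keyed by the same ids."""
    return {1: 4.5, 2: 3.0}


class MashupService:
    """Combines several providers and exposes one interface to clients."""

    def __init__(self, puzzles, ratings):
        self._puzzles = puzzles
        self._ratings = ratings

    def top_rated(self, minimum):
        ratings = self._ratings()
        # Aggregate: attach each puzzle's rating from the second provider.
        merged = [dict(p, rating=ratings.get(p["id"])) for p in self._puzzles()]
        # Filter: keep only puzzles rated at or above the threshold.
        kept = [m for m in merged if m["rating"] is not None and m["rating"] >= minimum]
        return json.dumps(sorted(kept, key=lambda m: -m["rating"]))


service = MashupService(puzzle_provider, rating_provider)
response = service.top_rated(minimum=4.0)
print(response)
```

Passing the providers in as callables keeps the service independent of how each source is actually reached (API call, wrapper, feed).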

Page 19: Rewire the Net

The architecture

API / Content Provider

API / Content Provider

API / Content Provider

...

MASHUP SITE/SERVICE

AJAX

Client

Page 20: Rewire the Net

AJAX

Asynchronous JavaScript and XML

It's a Web application model, rather than a specific technology, and comprises several different technologies:
• XHTML and CSS for style presentation
• The DOM API exposed by the browser for dynamic display and interaction
• Asynchronous data exchange (typically XML)
• Browser-side scripting (typically JavaScript)

Page 21: Rewire the Net

Protocols and standards

Web protocols
• SOAP (Simple Object Access Protocol)
  − XML message format
  − Message structure: header and body parts
• REST (Representational State Transfer)
  − Web-based communication using HTTP+XML
  − Few operations (GET, POST, PUT, DELETE) applicable to all pieces of information

Syndication formats
• RSS (v1.0 is RDF-based, while 2.0 is not)
• Atom (more attention to metadata)
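To make the differences concrete, here is a small sketch using only Python's standard library. The operation name `GetPuzzle`, the resource paths and the sample feed are made up for illustration. The mapping of create/update onto POST/PUT is the common convention rather than something the slides state. The namespace is the one used by SOAP 1.1 envelopes.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def soap_envelope(operation, params):
    """Build a SOAP message: an Envelope holding a Header and a Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(action, resource):
    """Map a CRUD-style action onto the HTTP method REST would typically use."""
    methods = {"read": "GET", "create": "POST", "update": "PUT", "delete": "DELETE"}
    return f"{methods[action]} {resource}"


def rss_titles(document):
    """Extract item titles from an RSS 2.0 document (plain XML, no RDF)."""
    return [item.findtext("title") for item in ET.fromstring(document).iter("item")]


# Made-up feed used only to exercise rss_titles.
FEED = """<rss version="2.0"><channel><title>Demo</title>
<item><title>First</title><link>http://example.org/1</link></item>
<item><title>Second</title><link>http://example.org/2</link></item>
</channel></rss>"""

message = soap_envelope("GetPuzzle", {"level": "easy"})
print(message)
print(rest_request("read", "/puzzles/1"))
print(rss_titles(FEED))
```

With SOAP the operation lives inside the message body, while with REST the same small set of HTTP methods is applied to whatever resource the URL names.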

Page 22: Rewire the Net

Wrappers, spiders, scrapers

Wrapper is quite a general term used to describe a particular architecture

Remember this one?

A wrapper needs at least two other components to accomplish its task
• A spider (or crawler), to follow links and download web pages
• A scraper, to extract useful content from pages full of uninteresting data
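The division of labour between the two components can be sketched like this. To keep the example runnable offline, the "web" is a hypothetical in-memory dictionary (`SITE`) instead of real HTTP downloads. The page contents and the `level` class name are invented for illustration.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch over HTTP.
SITE = {
    "/index": '<a href="/p1">one</a> <a href="/p2">two</a> <p>ads, ads, ads</p>',
    "/p1": '<h1>Puzzle 1</h1><span class="level">easy</span><a href="/index">home</a>',
    "/p2": '<h1>Puzzle 2</h1><span class="level">hard</span><a href="/p1">prev</a>',
}


class PageParser(HTMLParser):
    """Extracts outgoing links and the text of <span class="level">."""

    def __init__(self):
        super().__init__()
        self.links, self.levels = [], []
        self._in_level = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        self._in_level = tag == "span" and attrs.get("class") == "level"

    def handle_endtag(self, tag):
        self._in_level = False

    def handle_data(self, data):
        if self._in_level:
            self.levels.append(data.strip())


def crawl(start):
    """Spider: breadth-first walk over links. Scraper: keep only the levels."""
    seen, queue, scraped = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(SITE.get(url, ""))
        if parser.levels:
            scraped[url] = parser.levels[0]
        for link in parser.links:
            if link not in seen:  # avoid revisiting pages (links form cycles)
                seen.add(link)
                queue.append(link)
    return scraped


result = crawl("/index")
print(result)
```

The scraper depends on the exact markup of each page, so a change to the `span` on the provider's side would silently break it.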

Page 23: Rewire the Net

Scrapers

Page 24: Rewire the Net

Scrapers

Page 25: Rewire the Net

Scrapers

However powerful, screen scraping is usually considered an inelegant solution
• Lack of sophisticated, re-usable screen scraping toolkit software (most of the scrapers are created ad hoc). Difficult to program
• Unlike API interfaces, scraping has no explicit contract between content provider and content consumer. Difficult to update/maintain

Page 26: Rewire the Net

Semantic Web and RDF

Content created for human consumption does not make good content for automated machine consumption
• Data becomes information when it conveys meaning

XML in itself is not sufficient (too arbitrary).

RDF is quickly being adopted in a variety of domains.
• possibility to query over it (RDQL, SPARQL)
• possibility to reason over it (Jena, RACER)
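As a toy illustration of what querying RDF-style data means, the sketch below stores statements as (subject, predicate, object) triples and matches a single pattern with variables. It is a deliberately simplified stand-in written in plain Python, not SPARQL or RDQL, and the `ex:` names and the `match` function are invented for this example.

```python
# Triples are (subject, predicate, object); names below are made-up examples.
TRIPLES = [
    ("ex:puzzle1", "ex:level", "easy"),
    ("ex:puzzle2", "ex:level", "hard"),
    ("ex:puzzle1", "ex:author", "ex:davide"),
]


def match(pattern, triples=TRIPLES):
    """Return variable bindings for one triple pattern; '?x' marks a variable."""
    results = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results


easy = match(("?p", "ex:level", "easy"))
about_puzzle1 = match(("ex:puzzle1", "?k", "?v"))
print(easy)
print(about_puzzle1)
```

Because every statement has the same three-part shape, one generic matcher works for any vocabulary, which is the contrast with ad hoc XML formats drawn above.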

Hey, that's my job!

Page 27: Rewire the Net

Challenges

Technical:
• data integration (what if the mapping is incomplete?)
• data that need to be fixed/cleaned/converted
• robust standards, protocols, models and toolkits (... and try to avoid scrapers)

Social:
• encouraging user contributions
• data pollution (lack of precision, gaming)
• tradeoff between the protection of intellectual property and consumer privacy versus fair use and free flow of information

Page 28: Rewire the Net

Conclusions

Considering information as freely flowing on the Internet, and creating “pipes” to redirect, aggregate and reuse it, is a great and powerful idea

We're still at the very beginning

User participation might offer new chances for improvement

... and create new problems, of course!

Page 29: Rewire the Net

Webography

Duane Merrill: “Mashups: The new breed of Web app”

Tim O'Reilly: “Pipes and filters for the Internet”

Nick Gonzalez: “Five Ways to Mix, Rip, and Mash Your Data”

Davide Eynard: “PowerBrowsing Projects”, “SukaSudoku”

www.webmashup.com

Page 30: Rewire the Net

That's All, Folks

Thank you!

Questions are welcome