Rewire the Net
Posted on 19-Oct-2014
DESCRIPTION
A presentation on wrappers and mashup tools, made for the PhD course "Mobile and Context-Aware Database Systems" at Politecnico di Milano, May 2007
TRANSCRIPT
Mobile, Context Aware Databases and Database Systems
2007/05/30
Rewire the Net
Davide
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Rewire the Net, p. 2, 2007/05/30
Intro
• The problem
• Wrapping vs Mashup
• Mashup tools and technologies
• Open problems
• Conclusions
The problem
(Diagram labels: STRUCTURED, UNSTRUCTURED, STRUCTURED)
What is a wrapper?
(Diagram: a wrapper between the Content Provider and the Desired Interface)
Is a wrapper enough?
A wrapper takes a (usually unstructured) data source and returns information in a desired format:
• all the uninteresting stuff is hidden within it
• from outside, we see only the desired interface
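As a minimal Python sketch of this idea, assume a hypothetical content provider that publishes its items as an HTML table. The class `TableWrapper`, its `rows()` method and the sample page are illustrative assumptions, not part of any real service; only the standard `html.parser` module is used. All the markup handling stays inside the class, and the caller only sees a list of dicts.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Hypothetical wrapper: hides raw HTML behind a list-of-dicts interface."""

    def __init__(self, html: str, columns: list[str]):
        super().__init__()
        self._columns = columns
        self._rows: list[list[str]] = []
        self._current: list[str] | None = None
        self._in_cell = False
        self.feed(html)  # parse once; callers never see the markup
        self.close()

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = []
        elif tag == "td" and self._current is not None:
            self._in_cell = True
            self._current.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._current:  # skip header rows (no <td> cells)
            self._rows.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._in_cell:
            self._current[-1] += data.strip()

    def rows(self) -> list[dict[str, str]]:
        """The 'desired interface': one dict per data row."""
        return [dict(zip(self._columns, row)) for row in self._rows]


page = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Sudoku book</td><td>9.90</td></tr>
  <tr><td>Puzzle pack</td><td>4.50</td></tr>
</table>
"""
items = TableWrapper(page, ["title", "price"]).rows()
print(items)
# [{'title': 'Sudoku book', 'price': '9.90'}, {'title': 'Puzzle pack', 'price': '4.50'}]
```

If the provider changes its markup, only the wrapper needs updating; code that consumes `rows()` stays the same.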
What we want to do is work with this information:
• aggregate/filter it
• use it as input for other services
• mash it!
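Once sources sit behind a uniform interface, aggregating, filtering and passing data on to another service become plain function composition, in a pipes-and-filters style. The sketch below assumes two hypothetical wrapped sources that already return dicts with `title` and `price` keys; the function names and data are made up for illustration.

```python
from typing import Iterable, Iterator

Record = dict[str, object]


def aggregate(*sources: Iterable[Record]) -> Iterator[Record]:
    """Merge several wrapped sources into a single stream."""
    for source in sources:
        yield from source


def cheaper_than(limit: float, records: Iterable[Record]) -> Iterator[Record]:
    """Filter step: keep only records under a price limit."""
    return (r for r in records if float(r["price"]) < limit)


def to_lines(records: Iterable[Record]) -> list[str]:
    """'Other service' step: render records for some consumer."""
    return [f"{r['title']} ({r['price']})" for r in records]


# Hypothetical outputs of two wrappers over different content providers.
shop_a = [{"title": "Sudoku book", "price": 9.90}, {"title": "Chess set", "price": 25.0}]
shop_b = [{"title": "Puzzle pack", "price": 4.50}]

mashed = to_lines(cheaper_than(10.0, aggregate(shop_a, shop_b)))
print(mashed)  # ['Sudoku book (9.9)', 'Puzzle pack (4.5)']
```

Because each step takes and returns an iterable, steps can be reordered or swapped without touching the others.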
An example
... and now?
An example
Convert data structures to LaTeX and generate a Sudoku book in PDF
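The conversion step might look like the following Python sketch. It assumes a puzzle is stored as a 9×9 list of integers with 0 for empty cells; the function name and the `tabular` layout are illustrative choices, not the format used by the original project. Wrapping each table in a LaTeX document and compiling it (for example with pdflatex) would then give the PDF.

```python
def sudoku_to_latex(grid: list[list[int]]) -> str:
    """Render a 9x9 grid (0 = empty cell) as a LaTeX tabular environment."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        raise ValueError("expected a 9x9 grid")
    # Vertical rules separate the three columns of 3x3 boxes.
    lines = [r"\begin{tabular}{|ccc|ccc|ccc|}", r"\hline"]
    for r, row in enumerate(grid):
        cells = [str(v) if v else "" for v in row]  # empty cells stay blank
        lines.append(" & ".join(cells) + r" \\")
        if r % 3 == 2:  # horizontal rule after every third row
            lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


puzzle = [[0] * 9 for _ in range(9)]
puzzle[0][0] = 5
tex = sudoku_to_latex(puzzle)
print(tex)
```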
An example
Create a Web app which delivers data in a standard format
Create a Java app that runs Sudokus on your mobile
Create another app that solves Sudokus!
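The last idea only needs the same grid structure. One possible approach, a plain backtracking solver in Python, is sketched below; it is offered as an illustration and is not the solver used by the original project. The sample puzzle is a well-known example grid.

```python
def candidates(grid, r, c):
    """Digits that can legally go in cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill grid in place by backtracking; return True if a solution exists."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                grid[r][c] = 0  # no digit fits here: undo and backtrack
                return False
    return True  # no empty cells left


puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
original = [row[:] for row in puzzle]
solved = solve(puzzle)
print(solved)  # True
```

Backtracking is simple and fast enough for ordinary puzzles; heuristics such as filling the most constrained cell first would only matter for very hard grids.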
What kind of mashup?
Imagination is your only limit
• and... uhm, well... ability
So, most of the mashups around belong to one of the following families:
• mapping mashups
• video and photo mashups
• search and shopping mashups
• news mashups
Examples
Features
Source: “Five Ways to Mix, Rip, and Mash Your Data”, Nick Gonzalez, March 2, 2007
Rewire the Netp. 18 2007/05/30
The architecture
(Diagram: API/Content Provider, API/Content Provider, API/Content Provider, ... → Mashup Site/Service → Interface → Client)
The architecture
(Diagram: API/Content Provider, API/Content Provider, API/Content Provider, ... → Mashup Site/Service → AJAX → Client)
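In code, the mashup site/service layer of this architecture is essentially a function that queries several providers and merges their answers into one view for the client. The sketch below replaces real APIs with two hypothetical provider functions (a geocoder and a photo service, in the spirit of a mapping mashup) and joins their results on the place name; all names and sample data are invented for illustration.

```python
from typing import Callable


# Hypothetical content providers: in a real mashup these would be remote API calls.
def geocoder(places: list[str]) -> dict[str, tuple[float, float]]:
    coords = {"Milano": (45.46, 9.19), "Como": (45.81, 9.09)}  # sample data
    return {p: coords[p] for p in places if p in coords}


def photo_service(places: list[str]) -> dict[str, list[str]]:
    photos = {"Milano": ["duomo.jpg"], "Como": ["lake.jpg", "villa.jpg"]}  # sample data
    return {p: photos.get(p, []) for p in places}


def mashup(places: list[str],
           locate: Callable[[list[str]], dict],
           pictures: Callable[[list[str]], dict]) -> list[dict]:
    """Mashup site/service: call each provider, then join on place name."""
    where = locate(places)
    what = pictures(places)
    # Keep only places the geocoder knows, so every entry can go on a map.
    return [{"place": p, "lat": where[p][0], "lon": where[p][1], "photos": what.get(p, [])}
            for p in places if p in where]


view = mashup(["Milano", "Como", "Atlantis"], geocoder, photo_service)
print([v["place"] for v in view])  # ['Milano', 'Como']
```

Passing the providers in as functions keeps the mashup logic independent of where the data comes from, mirroring the separation between providers and mashup site in the diagrams.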
AJAX
Asynchronous JavaScript and XML
It's a Web application model, rather than a specific technology, and comprises several different technologies:
• XHTML and CSS for presentation
• the DOM API exposed by the browser, for dynamic display and interaction
• asynchronous data exchange (typically XML)
• browser-side scripting (typically JavaScript)
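AJAX itself runs in the browser, in JavaScript, but its central idea, issuing requests asynchronously so the page keeps working while data arrives, can be illustrated with Python's `asyncio` as a stand-in. Here `fetch` is a hypothetical function that only simulates network latency with `asyncio.sleep`; it is not an AJAX or browser API.

```python
import asyncio


async def fetch(source: str, delay: float) -> str:
    """Hypothetical request: simulates network latency, then returns an XML payload."""
    await asyncio.sleep(delay)
    return f"<data source='{source}'/>"


async def page() -> list[str]:
    log: list[str] = []
    # Start both requests without waiting for either to finish.
    tasks = [asyncio.create_task(fetch("weather", 0.2)),
             asyncio.create_task(fetch("news", 0.1))]
    log.append("page still responsive")  # runs before any response arrives
    for done in asyncio.as_completed(tasks):
        log.append(await done)  # update the page as each response comes in
    return log


events = asyncio.run(page())
print(events)
# ['page still responsive', "<data source='news'/>", "<data source='weather'/>"]
```

The faster response is handled first even though it was requested second, which is the behavior that lets an AJAX page update parts of itself independently.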
Protocols and standards
Web protocols
• SOAP (originally Simple Object Access Protocol)
  − XML message format
  − message structure: header and body parts
• REST (Representational State Transfer)
  − Web-based communication using HTTP+XML
  − few operations (GET, POST, PUT, DELETE), applicable to all pieces of information
Syndication formats
• RSS (v1.0 is RDF-based, while 2.0 is not)
• Atom (more attention to metadata)
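Syndication feeds are often the first input a mashup consumes. The minimal sketch below uses Python's standard `xml.etree.ElementTree` to read both an RSS 2.0 feed and an Atom feed into the same list of (title, link) pairs; the function name and sample feeds are made up, and real-world feeds (including RSS 1.0, which is RDF-based) would need more careful handling.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this namespace


def read_feed(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":  # RSS 2.0: <rss><channel><item>...
        return [(item.findtext("title", ""), item.findtext("link", ""))
                for item in root.iter("item")]
    if root.tag == ATOM + "feed":  # Atom: namespaced <feed><entry>...
        entries = []
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            href = link.get("href", "") if link is not None else ""
            entries.append((entry.findtext(ATOM + "title", ""), href))
        return entries
    raise ValueError(f"unsupported feed root: {root.tag}")


rss = """<rss version="2.0"><channel><title>A</title>
<item><title>First post</title><link>http://example.org/1</link></item>
</channel></rss>"""

atom = """<feed xmlns="http://www.w3.org/2005/Atom"><title>B</title>
<entry><title>Second post</title><link href="http://example.org/2"/></entry>
</feed>"""

print(read_feed(rss) + read_feed(atom))
# [('First post', 'http://example.org/1'), ('Second post', 'http://example.org/2')]
```

Note that Atom places the link in an `href` attribute (which can carry further metadata such as `rel`) and uses an XML namespace, while RSS 2.0 uses plain child elements.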
Wrappers, spiders, scrapers
“Wrapper” is quite a general term, used to describe a particular architecture
Remember this one?
A wrapper needs at least two other components to accomplish its task:
• a spider (or crawler), to follow links and download Web pages
• a scraper, to extract useful content from pages full of uninteresting data
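How a spider and a scraper divide the work can be sketched as follows. To keep the example self-contained, a hypothetical in-memory dict stands in for the Web (URL to HTML); a real spider would download pages with an HTTP client and respect robots.txt. The spider follows `<a href>` links breadth-first, and the scraper keeps only each page's `<h1>` text; all names and pages are illustrative.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical "Web": URL -> HTML page.
WEB = {
    "/": '<h1>Home</h1><a href="/a">A</a><a href="/b">B</a>',
    "/a": '<h1>Page A</h1><p>ads, menus...</p><a href="/">home</a>',
    "/b": '<h1>Page B</h1><a href="/a">A again</a>',
}


class PageParser(HTMLParser):
    """Collects outgoing links (for the spider) and <h1> text (for the scraper)."""

    def __init__(self):
        super().__init__()
        self.links, self.headings, self._in_h1 = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings.append(data.strip())


def crawl_and_scrape(start: str) -> dict[str, list[str]]:
    seen, queue, results = {start}, deque([start]), {}
    while queue:  # spider: breadth-first over links
        url = queue.popleft()
        parser = PageParser()
        parser.feed(WEB.get(url, ""))  # "download" the page
        parser.close()
        results[url] = parser.headings  # scraper: keep only the useful content
        for link in parser.links:
            if link not in seen:  # visit each page once, even with cycles
                seen.add(link)
                queue.append(link)
    return results


print(crawl_and_scrape("/"))
# {'/': ['Home'], '/a': ['Page A'], '/b': ['Page B']}
```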
Scrapers
However powerful, screen scraping is usually considered an inelegant solution:
• lack of sophisticated, reusable screen-scraping toolkits (most scrapers are created ad hoc): difficult to program
• unlike API interfaces, scraping has no explicit contract between content provider and content consumer: difficult to update/maintain
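The missing contract is easy to demonstrate. The hypothetical scraper below extracts a price with a regular expression tied to the provider's current markup; when the provider renames a CSS class, nothing signals the change and the scraper silently stops finding data. Markup, class names and the function name are invented for illustration.

```python
import re
from typing import Optional

# An assumption about the provider's markup, not a published contract.
PRICE_PATTERN = re.compile(r'<span class="price">([\d.]+)</span>')


def scrape_price(html: str) -> Optional[float]:
    """Return the first price found in the page, or None if the pattern no longer matches."""
    match = PRICE_PATTERN.search(html)
    return float(match.group(1)) if match else None


old_page = '<div><span class="price">9.90</span></div>'
new_page = '<div><span class="cost">9.90</span></div>'  # provider redesign

print(scrape_price(old_page))  # 9.9
print(scrape_price(new_page))  # None
```

An API, by contrast, usually documents its fields and versions changes, so consumers are told explicitly when something breaks.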
Semantic Web and RDF
Content created for human consumption does not make good content for automated machine consumption
• data becomes information when it conveys meaning
XML in itself is not sufficient (too arbitrary)
RDF is quickly being adopted in a variety of domains:
• possibility to query over it (RDQL, SPARQL)
• possibility to reason over it (Jena, RACER)
Hey, that's my job!
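Part of what makes RDF queryable is that everything is reduced to (subject, predicate, object) triples. The toy Python sketch below matches a single triple pattern containing variables, roughly the basic building block of a SPARQL query; it is not RDQL or SPARQL, the `ex:` names are invented, and real applications would use an RDF framework such as Jena.

```python
Triple = tuple[str, str, str]

# Invented sample graph.
GRAPH: set[Triple] = {
    ("ex:SudokuBook", "ex:type", "ex:Mashup"),
    ("ex:SudokuBook", "ex:format", "ex:PDF"),
    ("ex:MapSite", "ex:type", "ex:Mashup"),
    ("ex:MapSite", "ex:uses", "ex:GeocodingAPI"),
}


def match(pattern: Triple, graph: set[Triple]) -> list[dict[str, str]]:
    """Match one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in sorted(graph):  # sorted only to make the output order stable
        binding: dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would be bound to two different values
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?s WHERE { ?s ex:type ex:Mashup }
print(match(("?s", "ex:type", "ex:Mashup"), GRAPH))
# [{'?s': 'ex:MapSite'}, {'?s': 'ex:SudokuBook'}]
```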
Challenges
Technical:
• data integration (what if the mapping is not a complete one?)
• data that need to be fixed/cleaned/converted
• robust standards, protocols, models and toolkits (... and try to avoid scrapers)
Social:
• encouraging user contributions
• data pollution (lack of precision, gaming)
• trade-off between the protection of intellectual property and consumer privacy versus fair use and free flow of information
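The first two technical challenges can be made concrete with a small schema-mapping step. Assuming a hypothetical target schema and a source whose field names only partly correspond to it, the sketch maps what it can, converts a value that needs cleaning (a price written with a decimal comma and a currency suffix), and reports what the mapping could not cover instead of silently dropping it. All field names are illustrative.

```python
TARGET_FIELDS = ("title", "price", "date")

# Partial (hypothetical) mapping: the source has no field corresponding to "date".
MAPPING = {"name": "title", "cost": "price"}


def clean_price(raw: str) -> float:
    """Convert strings like ' 9,90 EUR' to 9.9."""
    return float(raw.replace("EUR", "").strip().replace(",", "."))


def integrate(record: dict[str, str]) -> tuple[dict[str, object], list[str], list[str]]:
    """Return (mapped record, target fields left empty, source fields left unmapped)."""
    out: dict[str, object] = {}
    for src, value in record.items():
        if src in MAPPING:
            out[MAPPING[src]] = value
    if "price" in out:
        out["price"] = clean_price(str(out["price"]))
    missing = [f for f in TARGET_FIELDS if f not in out]
    unmapped = [k for k in record if k not in MAPPING]
    return out, missing, unmapped


row, missing, unmapped = integrate({"name": "Sudoku book", "cost": " 9,90 EUR", "shop": "A"})
print(row, missing, unmapped)
# {'title': 'Sudoku book', 'price': 9.9} ['date'] ['shop']
```

Returning the gaps explicitly lets the mashup decide whether to fill them from another source, show them as unknown, or discard the record.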
Conclusions
Considering information as freely flowing on the Internet, and creating “pipes” to redirect, aggregate, and reuse it, is a great and powerful idea
We're still at the very beginning
User participation might offer new chances for improvement
... and create new problems, of course!
Webography
Duane Merrill: “Mashups: The new breed of Web app”
Tim O'Reilly: “Pipes and filters for the Internet”
Nick Gonzalez: “Five Ways to Mix, Rip, and Mash Your Data”
Davide Eynard: “PowerBrowsing Projects”, “SukaSudoku”
www.webmashup.com
That's All, Folks
Thank you!
Questions are welcome