Querying the Web

Querying the Web SlipstreamUSA :: April 2, 2008

Upload: georgejames

Posted on 14-Jan-2015


DESCRIPTION

A discussion of the various ways that data on the web can be published and queried, and why SQL is not the right tool for this.

TRANSCRIPT

Page 1: Querying the Web

Querying the Web

SlipstreamUSA :: April 2, 2008

Page 2: Querying the Web

Querying the Web

“Information wants to be free” (Stewart Brand, Whole Earth Review, May 1985)

“Data is the Next Intel Inside” (Tim O’Reilly, September 2005)

“The internet is my hard drive” (Bruce Schneier, February 2008)

Page 3: Querying the Web

Freebase

Page 4: Querying the Web

Freebase

Page 5: Querying the Web

Freebase

Page 6: Querying the Web

Freebase

Page 7: Querying the Web

Freebase

Metaweb Query Language (MQL)

Request:

    { "type": "/medicine/physician",
      "name": "Michael Maher" }

Response (JSON):

    { "code": "/api/status/ok",
      "result": { "type": "/medicine/physician",
                  "name": "Michael Maher",
                  "gender": "Male",
                  "education": "Leeds University" } }
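As a rough illustration of the request/response exchange on this slide, the sketch below serializes the query as JSON and unpacks the response envelope. It works purely on the text shown on the slide and makes no network call; the helper names `encode_query` and `extract_result` are hypothetical and are not part of any Metaweb library.

```python
import json

# Query from the slide: the filled-in fields constrain the match.
query = {"type": "/medicine/physician", "name": "Michael Maher"}

# Response text as shown on the slide.
response_text = """
{ "code": "/api/status/ok",
  "result": { "type": "/medicine/physician",
              "name": "Michael Maher",
              "gender": "Male",
              "education": "Leeds University" } }
"""


def encode_query(q):
    """Serialize a query dictionary to JSON text (hypothetical helper)."""
    return json.dumps(q, sort_keys=True)


def extract_result(text):
    """Parse a response envelope and return its "result" member.

    Raises ValueError when the status code is not "/api/status/ok".
    """
    envelope = json.loads(text)
    if envelope.get("code") != "/api/status/ok":
        raise ValueError("unexpected status: %r" % envelope.get("code"))
    return envelope["result"]


physician = extract_result(response_text)
print(encode_query(query))
print(physician["education"])
```

The point of the sketch is that both the question and the answer are plain JSON structures, so no SQL-style parsing or result-set handling is involved.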

Page 8: Querying the Web

REST

REpresentational State Transfer

Less rigorous equivalent of SOAP
Data are considered to be resources
Every resource has a unique address
Layered over HTTP:
    Client/server separation
    Stateless
    Cacheable

Request:

    GET http://rest.georgejames.com/product/Serenji/

Response:

    Name=Serenji
    Price=195.00
    OrderCode=H1001
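A minimal sketch of how a client might handle this exchange: it builds the GET request for the resource address and parses a Name=Value body like the one on the slide. The request is only constructed, not sent, and the sample body is copied from the slide; `product_request` and `parse_resource` are hypothetical helper names, and the line-per-field format is assumed from the slide rather than from any published specification.

```python
from urllib.request import Request


def product_request(base_url, product):
    """Build (but do not send) a GET request for one product resource."""
    return Request(f"{base_url}/product/{product}/", method="GET")


def parse_resource(body):
    """Parse a Name=Value-per-line body into a dictionary of strings."""
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between fields
        name, _, value = line.partition("=")
        fields[name] = value
    return fields


request = product_request("http://rest.georgejames.com", "Serenji")
sample_body = "Name=Serenji\nPrice=195.00\nOrderCode=H1001\n"
product = parse_resource(sample_body)

print(request.get_method(), request.full_url)
print(product)
```

Because every resource has its own address, the only thing the client needs to know in order to ask for a different product is a different URL.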

Page 9: Querying the Web

Amazon S3

S3 :: Simple Storage Service

Online storage space
$0.15 per GB per month for storage
Approximately $0.20 per GB for data transfer

Storage request:

    PUT http://s3.amazonaws.com/[bucket-name]/[key-name]

Retrieval request:

    GET http://s3.amazonaws.com/[bucket-name]/[key-name]
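The sketch below shows the shape of the two requests and a back-of-envelope cost estimate using the prices quoted on the slide. It only constructs request objects and sends nothing; real S3 requests also need authentication headers, which are omitted here. The bucket name, key name and helper functions are hypothetical placeholders.

```python
from urllib.request import Request

S3_ENDPOINT = "http://s3.amazonaws.com"  # endpoint as written on the slide


def object_url(bucket, key):
    """Address of one stored object: endpoint / bucket / key."""
    return f"{S3_ENDPOINT}/{bucket}/{key}"


def storage_request(bucket, key, payload):
    """Build (but do not send) a PUT request that would store payload."""
    return Request(object_url(bucket, key), data=payload, method="PUT")


def retrieval_request(bucket, key):
    """Build (but do not send) a GET request for the same object."""
    return Request(object_url(bucket, key), method="GET")


def monthly_cost_cents(gb_stored, gb_transferred):
    """Estimate one month's bill in US cents at the slide's 2008 prices:
    15 cents per GB stored plus roughly 20 cents per GB transferred."""
    return 15 * gb_stored + 20 * gb_transferred


put = storage_request("example-bucket", "notes.txt", b"hello")
get = retrieval_request("example-bucket", "notes.txt")

print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
print(monthly_cost_cents(10, 4))  # 10 GB stored, 4 GB transferred
```

Storing and retrieving use the same address; only the HTTP verb changes.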

Page 10: Querying the Web

Amazon SimpleDB

Storage request:

    https://sdb.amazonaws.com/?Action=PutAttributes
    &Attribute.0.Name=Color&Attribute.0.Value=Blue
    &Attribute.1.Name=Size&Attribute.1.Value=Med
    &Attribute.2.Name=Price&Attribute.2.Value=14.99
    &AWSAccessKeyId=[valid access key id]
    &DomainName=MyDomain
    &ItemName=Item123

Retrieval request:

    https://sdb.amazonaws.com/?Action=GetAttributes
    &AWSAccessKeyId=[valid access key id]
    &DomainName=MyDomain
    &ItemName=Item123

Retrieval response:

    <GetAttributesResult>
      <Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
      <Attribute><Name>Size</Name><Value>Med</Value></Attribute>
      <Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
    </GetAttributesResult>
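To make the structure of these requests concrete, here is a sketch that assembles the PutAttributes query string from a dictionary and parses the GetAttributes response shown on the slide. It is an illustration under assumptions: `EXAMPLE_KEY_ID` is a placeholder, the helper names are hypothetical, the signature and timestamp parameters a real SimpleDB call requires are left out, and the XML is parsed without a namespace because the slide shows none.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ENDPOINT = "https://sdb.amazonaws.com/"  # endpoint as written on the slide


def put_attributes_url(access_key_id, domain, item, attributes):
    """Build a PutAttributes URL with numbered Attribute.N.Name/Value pairs.

    Authentication parameters are deliberately omitted in this sketch.
    """
    params = {"Action": "PutAttributes"}
    for index, (name, value) in enumerate(attributes.items()):
        params[f"Attribute.{index}.Name"] = name
        params[f"Attribute.{index}.Value"] = value
    params["AWSAccessKeyId"] = access_key_id
    params["DomainName"] = domain
    params["ItemName"] = item
    return ENDPOINT + "?" + urlencode(params)


def parse_get_attributes(xml_text):
    """Turn a GetAttributesResult document into a name -> value dictionary."""
    root = ET.fromstring(xml_text)
    return {
        attribute.findtext("Name"): attribute.findtext("Value")
        for attribute in root.iter("Attribute")
    }


url = put_attributes_url(
    "EXAMPLE_KEY_ID",  # placeholder, not a real access key
    "MyDomain",
    "Item123",
    {"Color": "Blue", "Size": "Med", "Price": "14.99"},
)

# Response body as shown on the slide.
sample_response = (
    "<GetAttributesResult>"
    "<Attribute><Name>Color</Name><Value>Blue</Value></Attribute>"
    "<Attribute><Name>Size</Name><Value>Med</Value></Attribute>"
    "<Attribute><Name>Price</Name><Value>14.99</Value></Attribute>"
    "</GetAttributesResult>"
)
item = parse_get_attributes(sample_response)

print(url)
print(item)
```

Note that every value comes back as a string and that an item can carry any set of attribute names, which is the schema-free, weakly typed model the talk returns to later.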

Page 11: Querying the Web

Astoria

Page 12: Querying the Web

Astoria in action

Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Categories

Response:

Page 13: Querying the Web

Astoria in action

Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers

Response:

Page 14: Querying the Web

Astoria in action

Request: /Customers[FRANK]

Response:

Page 15: Querying the Web

Astoria in action

Request: /Customers[FRANK]/Orders

Response:

Page 16: Querying the Web

Astoria in action

A variety of response formats:

POX (Plain Old XML)
Web3S (Web, Structured, Schema’d and Searchable)
Atom
JSON

JSON request:

    /Customers[FRANK]?$format=json

Response:
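The addressing scheme used across the Astoria slides (an entity set, an optional key in square brackets, a navigation step, and an optional `$format` option) can be sketched as simple URL construction. The base address and the bracketed key syntax are taken from the slides; the helper names `entity_set` and `resource_url` are hypothetical, and the sketch only builds strings without contacting the sandbox service.

```python
BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"  # from the slides


def entity_set(name, key=None):
    """Return a path segment such as "Customers" or "Customers[FRANK]"."""
    return f"{name}[{key}]" if key is not None else name


def resource_url(base, *segments, response_format=None):
    """Join path segments onto the base and optionally append ?$format=..."""
    url = base.rstrip("/") + "/" + "/".join(segments)
    if response_format is not None:
        url += f"?$format={response_format}"
    return url


all_categories = resource_url(BASE, entity_set("Categories"))
one_customer = resource_url(BASE, entity_set("Customers", "FRANK"))
customer_orders = resource_url(BASE, entity_set("Customers", "FRANK"), "Orders")
customer_json = resource_url(
    BASE, entity_set("Customers", "FRANK"), response_format="json"
)

for url in (all_categories, one_customer, customer_orders, customer_json):
    print(url)
```

Each step that would be a WHERE clause or a JOIN in SQL becomes one more segment of the address.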

Page 17: Querying the Web

Where is all this information going to come from?

Page 18: Querying the Web

Crowdsourcing

Jeff Howe, Wired Magazine, June 2006

Delegating an activity to a large number of unidentified individuals
Small finite tasks
Quantity more important than quality
The sum is greater than the parts

Examples:
    Wikipedia

Page 19: Querying the Web

Crowdsourcing

Page 20: Querying the Web

Crowdsourcing

Page 21: Querying the Web

Google Maps

Page 22: Querying the Web

Google Maps

Page 23: Querying the Web

Crowdsourcing

Jeff Howe, Wired Magazine, June 2006

Delegating an activity to a large number of unidentified individuals
Small finite tasks
Quantity more important than quality
The sum is greater than the parts

Examples:
    Wikipedia
    Galaxy Zoo
    Amazon Mechanical Turk
    Google route planner

Consequences:
    Drives down the cost of data
    Owners may not be the traditional incumbents
    Client / user needs to discriminate

Page 24: Querying the Web

What does this mean for you?

Data Provider
    Publish data via simple APIs
    Your data may have unexpected value
    Innovative usage
    Usage can enhance the quality of your data

Data Consumer
    Many potential data sources
    Explosive growth in available data
    Quality of the data is potentially lower
    …but is outweighed by quantity and richness

Technical
    Caché database is an ideal container
    Dynamic / extensible data structure
    Weak data typing
    High performance and scalability

Page 25: Querying the Web

The Internet is the Database

Page 26: Querying the Web

Thank you

Questions?