querying the web

Post on 14-Jan-2015

DESCRIPTION

A discussion of the various ways that data on the web can be published and queried. Why SQL is not the right tool for this.

TRANSCRIPT

Querying the Web

SlipstreamUSA :: April 2, 2008

“Information wants to be free” (Stewart Brand, Whole Earth Review, May 1985)

“Data is the Next Intel Inside” (Tim O’Reilly, September 2005)

“The internet is my hard drive” (Bruce Schneier, February 2008)

Freebase

Metaweb Query Language

Request:

{ "type": "/medicine/physician",
  "name": "Michael Maher" }

Response:

{ "code": "/api/status/ok",
  "result": { "type": "/medicine/physician",
              "name": "Michael Maher",
              "gender": "Male",
              "education": "Leeds University" } }

Format: JSON
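The query-by-example idea behind this request can be sketched in Python. This is only an illustration under stated assumptions: `RECORDS` and `query_by_example` are hypothetical names, the data lives in a local list rather than in Freebase, and the empty-result envelope is a guess rather than documented service behaviour.

```python
import json

# Hypothetical in-memory stand-in for the data store; Freebase is not contacted.
RECORDS = [
    {"type": "/medicine/physician", "name": "Michael Maher",
     "gender": "Male", "education": "Leeds University"},
    {"type": "/medicine/physician", "name": "Example Person",
     "gender": "Female", "education": "Example University"},
]


def query_by_example(template, records):
    """Return the first record whose fields equal every field of the template."""
    for record in records:
        if all(record.get(key) == value for key, value in template.items()):
            return {"code": "/api/status/ok", "result": record}
    # Assumption: an unmatched query yields an empty result, not an error.
    return {"code": "/api/status/ok", "result": None}


# The request from the slide, written as well-formed JSON.
request = json.loads('{"type": "/medicine/physician", "name": "Michael Maher"}')
response = query_by_example(request, RECORDS)
print(json.dumps(response, indent=2))
```

The query has the same shape as the answer: the client supplies the fields it knows, and the matching record supplies the rest.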

REST

REpresentational State Transfer
Less rigorous equivalent of SOAP
Data are considered to be resources
Every resource has a unique address
Layered over HTTP:
  Client/Server separation
  Stateless
  Cacheable

Request: GET http://rest.georgejames.com/product/Serenji/

Response:
Name=Serenji
Price=195.00
OrderCode=H1001
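A client for this kind of resource might look like the following Python sketch. It builds the GET request but does not send it, and it parses the key=value body shown on the slide from a local string. `parse_properties` and `SAMPLE_RESPONSE` are illustrative names, and the line-per-property format is assumed from the slide alone.

```python
import urllib.request

# Resource address taken from the slide; the request is built but never sent.
PRODUCT_URL = "http://rest.georgejames.com/product/Serenji/"
request = urllib.request.Request(PRODUCT_URL, method="GET")


def parse_properties(body):
    """Parse a body of 'Key=Value' lines into a dictionary of strings."""
    properties = {}
    for line in body.splitlines():
        key, separator, value = line.strip().partition("=")
        if separator:  # ignore blank or malformed lines
            properties[key.strip()] = value.strip()
    return properties


# Response body as shown on the slide.
SAMPLE_RESPONSE = "Name=Serenji\nPrice=195.00\nOrderCode=H1001\n"
product = parse_properties(SAMPLE_RESPONSE)
print(request.get_method(), request.full_url)
print(product)
```

Because the request is stateless and the address identifies the resource completely, the same URL can be cached or replayed without any session context.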

Amazon S3

S3 :: Simple Storage Service
Online storage space
$0.15 per Gbyte per month for storage
~ $0.20 per Gbyte data transfer

Storage request: PUT http://s3.amazonaws.com/[bucket-name]/[key-name]

Retrieval request: GET http://s3.amazonaws.com/[bucket-name]/[key-name]
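The addressing scheme and the quoted prices can be worked through in a few lines of Python. This is a rough sketch: `object_url` and `monthly_cost` are hypothetical helpers, the prices are the 2008 figures from the slide (the transfer price is approximate there), and the authentication a real request needs is left out, as it is on the slide.

```python
from decimal import Decimal
from urllib.parse import quote

# Prices as quoted on the slide (2008); the transfer figure is approximate.
STORAGE_USD_PER_GB_MONTH = Decimal("0.15")
TRANSFER_USD_PER_GB = Decimal("0.20")


def object_url(bucket, key):
    """Build the address used for both PUT (store) and GET (retrieve)."""
    return "http://s3.amazonaws.com/{}/{}".format(quote(bucket, safe=""), quote(key))


def monthly_cost(stored_gb, transferred_gb):
    """Estimate one month's bill in US dollars from the slide's prices."""
    storage = STORAGE_USD_PER_GB_MONTH * Decimal(str(stored_gb))
    transfer = TRANSFER_USD_PER_GB * Decimal(str(transferred_gb))
    return storage + transfer


url = object_url("my-bucket", "reports/2008/q1.csv")
cost = monthly_cost(100, 50)
print(url)
print(cost)
```

The same URL serves for storage and retrieval; only the HTTP verb changes.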

Amazon SimpleDB

Storage request:
https://sdb.amazonaws.com/?Action=PutAttributes
  &Attribute.0.Name=Color&Attribute.0.Value=Blue
  &Attribute.1.Name=Size&Attribute.1.Value=Med
  &Attribute.2.Name=Price&Attribute.2.Value=14.99
  &AWSAccessKeyId=[valid access key id]
  &DomainName=MyDomain
  &ItemName=Item123

Retrieval request:
https://sdb.amazonaws.com/?Action=GetAttributes
  &AWSAccessKeyId=[valid access key id]
  &DomainName=MyDomain
  &ItemName=Item123

Retrieval response:
<GetAttributesResult>
  <Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
  <Attribute><Name>Size</Name><Value>Med</Value></Attribute>
  <Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
</GetAttributesResult>
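The request and response above could be handled along these lines in Python. The sketch only assembles the query string and parses the XML from a local string; `put_attributes_url`, `parse_get_attributes` and the `EXAMPLE_KEY_ID` value are hypothetical, and the signing and timestamp parameters that a real call requires are omitted, as they are on the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ENDPOINT = "https://sdb.amazonaws.com/"


def put_attributes_url(access_key_id, domain, item, attributes):
    """Build the PutAttributes URL with numbered Attribute.N.Name/Value pairs."""
    params = [("Action", "PutAttributes")]
    for index, (name, value) in enumerate(attributes.items()):
        params.append(("Attribute.{}.Name".format(index), name))
        params.append(("Attribute.{}.Value".format(index), str(value)))
    params += [("AWSAccessKeyId", access_key_id),
               ("DomainName", domain),
               ("ItemName", item)]
    return ENDPOINT + "?" + urlencode(params)


def parse_get_attributes(xml_text):
    """Turn a GetAttributesResult document into a name -> value dictionary."""
    root = ET.fromstring(xml_text)
    return {attr.findtext("Name"): attr.findtext("Value")
            for attr in root.iter("Attribute")}


# Response body as shown on the slide.
SAMPLE_XML = (
    "<GetAttributesResult>"
    "<Attribute><Name>Color</Name><Value>Blue</Value></Attribute>"
    "<Attribute><Name>Size</Name><Value>Med</Value></Attribute>"
    "<Attribute><Name>Price</Name><Value>14.99</Value></Attribute>"
    "</GetAttributesResult>"
)

put_url = put_attributes_url("EXAMPLE_KEY_ID", "MyDomain", "Item123",
                             {"Color": "Blue", "Size": "Med", "Price": "14.99"})
item = parse_get_attributes(SAMPLE_XML)
print(put_url)
print(item)
```

Every value comes back as a string: the item is a loose bag of name/value pairs with no declared schema or types.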

Astoria

Astoria in action

Request:http://astoria.sandbox.live.com/northwind/northwind.rse/Categories

Response:

Astoria in action

Request:http://astoria.sandbox.live.com/northwind/northwind.rse/Customers

Response:

Astoria in action

Request:/Customers[FRANK]

Response:

Astoria in action

Request:/Customers[FRANK]/Orders

Response:

Astoria in action

A variety of response formats:
  POX
  Web3S (Web, Structured, Schema’d and Searchable)
  ATOM
  JSON

JSON request:/Customers[FRANK]?$format=json

Response:
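The Astoria requests above follow a regular pattern: an entity set, an optional key in square brackets, an optional navigation step, and an optional format switch. A small Python helper can make that pattern explicit. `resource_url` is a hypothetical name, the URL syntax is copied from these slides only, and the relative paths are assumed to resolve against the Northwind sandbox address shown earlier.

```python
# Base address taken from the first Astoria request on the slides.
BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


def resource_url(entity_set, key=None, navigation=None, fmt=None, base=BASE):
    """Compose a resource address in the bracketed-key style shown on the slides."""
    url = "{}/{}".format(base, entity_set)
    if key is not None:
        url += "[{}]".format(key)          # select one entity by key
    if navigation is not None:
        url += "/{}".format(navigation)    # follow a relationship
    if fmt is not None:
        url += "?$format={}".format(fmt)   # choose the response format
    return url


print(resource_url("Categories"))
print(resource_url("Customers", key="FRANK"))
print(resource_url("Customers", key="FRANK", navigation="Orders"))
print(resource_url("Customers", key="FRANK", fmt="json"))
```

Each step of a query becomes one more path segment, so the URL itself reads as the query.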

Where is all this information going to come from?

Crowdsourcing

Jeff Howe, Wired Magazine, June 2006
Delegating an activity to a large number of unidentified individuals
Small finite tasks
Quantity more important than quality
The sum is greater than the parts

Examples:
  Wikipedia

Google Maps

Crowdsourcing

Examples:
  Wikipedia
  Galaxy Zoo
  Amazon Mechanical Turk
  Google route planner

Consequences:
  Drives down the cost of data
  Owners of the data may not be the traditional incumbents
  Client / user needs to discriminate

What does this mean for you?

Data Provider
  Publish data via simple APIs
  Your data may have unexpected value
  Innovative usage
  Usage can enhance the quality of your data

Data Consumer
  Many potential data sources
  Explosive growth in available data
  Quality of the data is potentially lower
  …but is outweighed by quantity and richness

Technical
  Cache database is an ideal container
  Dynamic / extensible data structure
  Weak data typing
  High performance and scalability

The Internet is the Database

Thank you

Questions?
