hacku iit kgp 2013 boss + ca


Upload: souridatta

Post on 22-Nov-2014

DESCRIPTION

Presentation talks about BOSS and Content Analysis along with Dapper.

TRANSCRIPT

Page 1: HackU IIT Kgp 2013 BOSS + CA

BOSS around the web

Saurabh Sahni YDN Developer, Hacker, Evangelist

Souri Datta, Structured Data Extraction Team

http://www.flickr.com/photos/sumrow/1267682594/sizes/l/

Page 2: HackU IIT Kgp 2013 BOSS + CA

BOSS is Build your Own Search Service

http://developer.yahoo.com/search/boss/

Page 3: HackU IIT Kgp 2013 BOSS + CA

Provides APIs to our search database

Page 4: HackU IIT Kgp 2013 BOSS + CA

To build your own powerful search applications

Page 5: HackU IIT Kgp 2013 BOSS + CA

BOSS allows you to search over

Web, images, news & Blogs

Page 6: HackU IIT Kgp 2013 BOSS + CA

What can be done on top of BOSS?

• Blend and re-rank search results

• Your own look and feel

• Mix it with other APIs
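As a rough illustration of "blend and re-rank", the sketch below interleaves two ranked result lists, drops duplicate URLs, and then moves results from a preferred domain to the front. The result dictionaries and their `url`/`title` fields are hypothetical stand-ins, not the actual BOSS response format.

```python
from itertools import zip_longest


def blend(*result_lists):
    """Interleave several ranked result lists, skipping URLs already seen."""
    seen = set()
    blended = []
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is None or item["url"] in seen:
                continue
            seen.add(item["url"])
            blended.append(item)
    return blended


def rerank(results, boost_domain):
    """Stable re-rank: results whose URL contains boost_domain come first."""
    return sorted(results, key=lambda r: boost_domain not in r["url"])


# Hypothetical results from two different searches.
web = [
    {"url": "http://imdb.com/a", "title": "A"},
    {"url": "http://example.com/b", "title": "B"},
]
news = [
    {"url": "http://example.com/b", "title": "B (news)"},
    {"url": "http://news.example.org/c", "title": "C"},
]

merged = blend(web, news)
boosted = rerank(merged, "example.org")
```

Because `sorted` is stable, results outside the boosted domain keep their blended order.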

Page 7: HackU IIT Kgp 2013 BOSS + CA

BOSS Pricing

Page 8: HackU IIT Kgp 2013 BOSS + CA

Free for building your hacks!!

Page 9: HackU IIT Kgp 2013 BOSS + CA

BOSS uses OAuth for security

Code: https://github.com/sourind/hacku/
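The slide only says that BOSS uses OAuth. Assuming this means OAuth 1.0 with HMAC-SHA1 signing, the signature step could look roughly like the sketch below. The URL, key, secret, nonce and timestamp are placeholders, not real BOSS values; the linked repository may do this differently.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(value):
    # OAuth 1.0 percent-encoding: only unreserved characters stay literal.
    return quote(str(value), safe="-._~")


def signature_base_string(method, url, params):
    """Build METHOD&encoded-url&encoded-sorted-parameters."""
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), _enc(url), _enc(normalized)])


def sign_hmac_sha1(base_string, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature of the base string."""
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}".encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder request: every value here is made up for illustration.
params = {
    "oauth_consumer_key": "my-key",
    "oauth_nonce": "abc123",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1350000000",
    "oauth_version": "1.0",
    "q": "dark knight",
}
base = signature_base_string("GET", "http://example.com/search", params)
signature = sign_hmac_sha1(base, "my-secret")
```

The empty token secret corresponds to signing with only a consumer key and secret, which is the pair the next slide asks you to obtain.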

Page 10: HackU IIT Kgp 2013 BOSS + CA

Get a FREE consumer key and secret

http://hackyourworld.org/hacku/

Page 11: HackU IIT Kgp 2013 BOSS + CA

http://developer.yahoo.com/yql/console/

Page 12: HackU IIT Kgp 2013 BOSS + CA
Page 13: HackU IIT Kgp 2013 BOSS + CA

1. Select YQL query

2. Select output format

3. Copy this URL
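The URL copied from the console is essentially the query plus the output format packed into a query string. A minimal sketch of building such a URL by hand follows; the endpoint constant is an assumption (the public YQL endpoint as it was commonly documented), not something shown on the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; not taken from the slides.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def yql_url(query, fmt="json"):
    """Return a REST URL carrying the YQL query and the output format."""
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


query = 'select * from contentanalysis.analyze where url="http://www.cnn.com/"'
url = yql_url(query, fmt="json")

# Round-trip check: the query survives URL encoding unchanged.
decoded = parse_qs(urlsplit(url).query)
```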

Page 14: HackU IIT Kgp 2013 BOSS + CA
Page 15: HackU IIT Kgp 2013 BOSS + CA

Finding images of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."

Page 16: HackU IIT Kgp 2013 BOSS + CA

Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com

select * from boss.search where q="The Dark Knight Rises" and sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."

Page 17: HackU IIT Kgp 2013 BOSS + CA

Spell Check and Correction

select * from boss.search where q="The Dirk Knight Rises" and service="spelling" and ck="..." and secret="..."

Page 18: HackU IIT Kgp 2013 BOSS + CA

Finding news on “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."
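The queries on these slides all follow the same pattern: a search term, optional filters such as `service` or `sites`, and the key pair. A small helper like the hypothetical `boss_query` below can assemble them; it only builds the query string and leaves the `ck` and `secret` placeholders as they appear on the slides.

```python
def boss_query(q, ck="...", secret="...", **filters):
    """Assemble a boss.search YQL query in the shape used on the slides."""
    values = [q, ck, secret, *map(str, filters.values())]
    if any('"' in v for v in values):
        # This sketch does no escaping, so refuse embedded double quotes.
        raise ValueError("values must not contain double quotes")
    clauses = [f'q="{q}"']
    clauses += [f'{name}="{value}"' for name, value in filters.items()]
    clauses += [f'ck="{ck}"', f'secret="{secret}"']
    return "select * from boss.search where " + " and ".join(clauses)


images = boss_query("The Dark Knight Rises", service="images")
movie_sites = boss_query("The Dark Knight Rises", sites="imdb.com,movies.yahoo.com")
spelling = boss_query("The Dirk Knight Rises", service="spelling")
```

Filters are emitted in the order they are passed, so the output matches the slide queries word for word.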

Page 19: HackU IIT Kgp 2013 BOSS + CA

Finding interesting objects: Content Analysis

select * from contentanalysis.analyze where text="Sachin Tendulkar is batting very well"

Page 20: HackU IIT Kgp 2013 BOSS + CA

Content Analysis from a URL

select * from contentanalysis.analyze where url="http://www.cnn.com/"
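The two Content Analysis queries differ only in whether they pass `text` or `url`. The helper below (a hypothetical name, not part of any Yahoo library) makes that choice explicit and produces the same query strings as the slides.

```python
def content_analysis_query(text=None, url=None):
    """Build a contentanalysis.analyze query from either text or a URL."""
    if (text is None) == (url is None):
        raise ValueError("pass exactly one of text or url")
    field, value = ("text", text) if text is not None else ("url", url)
    if '"' in value:
        # No escaping in this sketch, so reject embedded double quotes.
        raise ValueError("value must not contain double quotes")
    return f'select * from contentanalysis.analyze where {field}="{value}"'


from_text = content_analysis_query(text="Sachin Tendulkar is batting very well")
from_url = content_analysis_query(url="http://www.cnn.com/")
```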

Page 21: HackU IIT Kgp 2013 BOSS + CA

Let's see it in action!

Page 22: HackU IIT Kgp 2013 BOSS + CA

Query Cheatsheet

• Find images of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."

• Find reviews of “The Dark Knight Rises”

select * from boss.search where q="reviews intitle:The Dark Knight Rises" and service="web" and ck="..." and secret="..."

• Search for Avatar but not the movie

select * from boss.search where q="Avatar -movie" and ck="..." and secret="..."

• Search PDFs of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and type="pdf" and ck="..." and secret="..."

Page 23: HackU IIT Kgp 2013 BOSS + CA

Query Cheatsheet

• Find all the news on “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."

• Get long abstracts in the results

select * from boss.search where q="The Dark Knight Rises" and abstract="long" and ck="..." and secret="..."

• Retrieve results 51-100 of the query

select * from boss.search where q="The Dark Knight Rises" and start=51 and ck="..." and secret="..."

Page 24: HackU IIT Kgp 2013 BOSS + CA

EXAMPLES

Page 25: HackU IIT Kgp 2013 BOSS + CA

duckduckgo.com

Page 26: HackU IIT Kgp 2013 BOSS + CA
Page 27: HackU IIT Kgp 2013 BOSS + CA

Data Extraction

Page 28: HackU IIT Kgp 2013 BOSS + CA

Why is extraction difficult?

• The Internet has a lot of information

• Not all of it can be processed by machines

– Unstructured data

– E.g. DiscountedPrice and ReducedPrice of a product (both mean the same)

• The ultimate aim is to publish data in a structured format

• Simplest way: XML, JSON
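To make the DiscountedPrice / ReducedPrice point concrete, here is a small sketch that maps differently named fields from two sites onto one canonical key and publishes the result as JSON. The mapping table and the sample records are made up for illustration.

```python
import json

# Hypothetical synonym table: different sites, same meaning.
CANONICAL = {
    "discountedprice": "price",
    "reducedprice": "price",
}


def normalize(record):
    """Lower-case field names and fold known synonyms into one key."""
    out = {}
    for key, value in record.items():
        name = key.lower()
        out[CANONICAL.get(name, name)] = value
    return out


site_a = {"Name": "Phone", "DiscountedPrice": "199"}
site_b = {"Name": "Phone", "ReducedPrice": "199"}

# Both records end up with the same structure once normalized.
as_json = json.dumps(normalize(site_a), sort_keys=True)
```

Once both sites share one schema, a machine can compare or merge their records without knowing each site's vocabulary.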

Page 29: HackU IIT Kgp 2013 BOSS + CA

Web Scraping

• Demo: Dapper

Page 30: HackU IIT Kgp 2013 BOSS + CA

More Resources

• Yahoo! BOSS: http://developer.yahoo.com/boss

• BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/

• Content Analysis: http://developer.yahoo.com/contentanalysis/

• OAuth sample code: https://github.com/sourind/hacku/

Page 31: HackU IIT Kgp 2013 BOSS + CA

Questions??

http://www.flickr.com/photos/reem_unique/4119729692/

Page 32: HackU IIT Kgp 2013 BOSS + CA

• http://slideshare.net/souridatta

• https://github.com/sourind/

Thanks!!