HackU IIT KGP 2013: BOSS + CA

Post on 22-Nov-2014

Category:

Technology

DESCRIPTION

The presentation covers BOSS and Content Analysis, along with Dapper.

TRANSCRIPT

BOSS around the web

Saurabh Sahni, YDN Developer, Hacker, Evangelist

Souri Datta, Structured Data Extraction Team

http://www.flickr.com/photos/sumrow/1267682594/sizes/l/

BOSS is Build your Own Search Service

http://developer.yahoo.com/search/boss/

Provides APIs to our search database to build your own powerful search applications

BOSS allows you to search over web, images, news & blogs

What can be done on top of BOSS?

• Blend and re-rank search results

• Your own look and feel

• Mix it with other APIs

BOSS Pricing

Free for building your hacks!!

BOSS uses OAuth for security

Code: https://github.com/sourind/hacku/

Get a FREE consumer key and secret

http://hackyourworld.org/hacku/
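
As a rough illustration of what "OAuth for security" involves, the sketch below computes a generic OAuth 1.0 HMAC-SHA1 signature from a consumer key and secret. It is not the code from the repository linked above. The request URL is a placeholder, the nonce and timestamp are fixed illustrative values, and the helper names (pct, signature_base_string, sign_hmac_sha1) are made up for this example.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    # Percent-encode everything except unreserved characters (letters, digits, - . _ ~).
    return quote(str(value), safe="~")


def signature_base_string(method, url, params):
    # Assumes `url` carries no query string; all parameters are passed in `params`.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def sign_hmac_sha1(method, url, params, consumer_secret, token_secret=""):
    # With no access token, the token secret is simply left empty.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode("ascii")
    base = signature_base_string(method, url, params).encode("ascii")
    digest = hmac.new(key, base, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values throughout: "..." stands for your own key and secret.
params = {
    "q": "The Dark Knight Rises",
    "oauth_consumer_key": "...",
    "oauth_nonce": "abc123",            # illustrative; should be random per request
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1356998400",    # illustrative; should be the current time
    "oauth_version": "1.0",
}
base_string = signature_base_string("GET", "http://example.com/search", params)
signature = sign_hmac_sha1("GET", "http://example.com/search", params, consumer_secret="...")
print(signature)
```

The resulting value would be sent as the oauth_signature parameter alongside the others. In practice an OAuth library usually handles this step.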

http://developer.yahoo.com/yql/console/

1. Select YQL query

2. Select output format

3. Copy this URL
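
The URL that the console produces is essentially the query plus the output format, URL-encoded. A minimal sketch of building such a URL by hand follows. The endpoint shown is an assumption (the historical public YQL endpoint, which may no longer be reachable), and build_yql_url is a hypothetical helper name. Nothing here makes a network request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; replace it with whatever URL the console actually shows.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, output_format="json", endpoint=YQL_ENDPOINT):
    # Mirrors the console steps: pick a query, pick an output format, copy the URL.
    if output_format not in ("json", "xml"):
        raise ValueError("output_format must be 'json' or 'xml'")
    return endpoint + "?" + urlencode({"q": query, "format": output_format})


query = 'select * from contentanalysis.analyze where text="Sachin Tendulkar is batting very well"'
url = build_yql_url(query)
print(url)
```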

Finding images of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."

Finding “The Dark Knight Rises” on imdb.com and movies.yahoo.com

select * from boss.search where q="The Dark Knight Rises" and sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."

Spell Check and Correction

select * from boss.search where q="The Dirk Knight Rises" and service="spelling" and ck="..." and secret="..."

Finding news on “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."

Finding interesting objects: Content Analysis

select * from contentanalysis.analyze where text="Sachin Tendulkar is batting very well"

Content Analysis from a URL

select * from contentanalysis.analyze where url="http://www.cnn.com/"

Let's see it in action!

Query Cheatsheet

• Find images of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."

• Find reviews of “The Dark Knight Rises”

select * from boss.search where q="reviews intitle:The Dark Knight Rises" and service="web" and ck="..." and secret="..."

• Search for Avatar but not the movie

select * from boss.search where q="Avatar -movie" and ck="..." and secret="..."

• Search PDFs of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and type="pdf" and ck="..." and secret="..."

Query Cheatsheet

• Find all the news on “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."

• Get long abstracts in the results

select * from boss.search where q="The Dark Knight Rises" and abstract="long" and ck="..." and secret="..."

• Retrieve results 51-100 of the query

select * from boss.search where q="The Dark Knight Rises" and start=51 and ck="..." and secret="..."
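
The cheatsheet queries all share one shape, so they can be generated from a small helper. The sketch below is one possible way to do that. boss_query is a hypothetical name, the option names are simply the ones used in the queries above, and values containing double quotes are rejected because YQL's escaping rules are not covered here.

```python
def boss_query(q, ck="...", secret="...", **options):
    # Build a boss.search statement in the same shape as the cheatsheet queries.
    # "..." stays as a placeholder for your own consumer key and secret.
    clauses = [("q", q)] + list(options.items()) + [("ck", ck), ("secret", secret)]
    parts = []
    for name, value in clauses:
        if isinstance(value, int):
            parts.append(f"{name}={value}")        # numeric values are left unquoted
        else:
            if '"' in str(value):
                raise ValueError("double quotes in values are not handled by this sketch")
            parts.append(f'{name}="{value}"')
    return "select * from boss.search where " + " and ".join(parts)


print(boss_query("The Dark Knight Rises", service="images"))
print(boss_query("The Dark Knight Rises", start=51))
print(boss_query("The Dark Knight Rises", sites="imdb.com,movies.yahoo.com"))
```

Keyword arguments keep their order, so each generated statement reads the same way as the hand-written one.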

EXAMPLES

duckduckgo.com

Data Extraction

Why is extraction difficult?

• The Internet has a lot of information

• Not all of it can be processed by machines

– Unstructured data

– E.g. DiscountedPrice and ReducedPrice of a product (both mean the same)

• The ultimate aim is to publish data in a structured format

• Simplest way: XML, JSON
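
To make the DiscountedPrice / ReducedPrice point concrete, here is a small sketch that maps differently named fields onto one canonical key and publishes the result as JSON. The alias table, the canonical name sale_price, and the sample records are all invented for illustration.

```python
import json

# Hypothetical alias table: different sites label the same field differently.
FIELD_ALIASES = {
    "discountedprice": "sale_price",
    "reducedprice": "sale_price",
}


def normalize_record(raw):
    # Lower-case each field name and map known aliases to one canonical key.
    normalized = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(key.lower(), key.lower())
        normalized[canonical] = value
    return normalized


# Invented sample data standing in for records scraped from two sites.
records = [
    {"Name": "Phone A", "DiscountedPrice": 199},
    {"Name": "Phone B", "ReducedPrice": 249},
]
structured = [normalize_record(record) for record in records]
print(json.dumps(structured, indent=2))
```

Once both records use the same key, a machine can compare them without knowing which site each came from.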

Web Scraping

• Demo: Dapper

More Resources

• Yahoo! BOSS: http://developer.yahoo.com/boss

• BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/

• Content Analysis: http://developer.yahoo.com/contentanalysis/

• OAuth sample code: https://github.com/sourind/hacku/

Questions??

http://www.flickr.com/photos/reem_unique/4119729692/

• http://slideshare.net/souridatta

• https://github.com/sourind/

Thanks!!
