HackU IIT Kgp 2013 BOSS + CA
BOSS around the web
Saurabh Sahni, YDN Developer, Hacker, Evangelist
Souri Datta, Structured Data Extraction Team
http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
BOSS is Build your Own Search Service
http://developer.yahoo.com/search/boss/
Provides APIs to our search database
to build your own powerful search applications
BOSS allows you to search over
Web, images, news & Blogs
What can be done on top of BOSS?
• Blend and re-rank search results
• Your own look and feel
• Mix it with other APIs
BOSS Pricing
Free for building your hacks!!
BOSS uses OAuth for security
Code: https://github.com/sourind/hacku/
Get a FREE consumer key and secret
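A consumer key and secret are typically used to sign each request. The following is a minimal sketch of an OAuth 1.0 HMAC-SHA1 signature, assuming two-legged signing (no user token) as described in RFC 5849. The function names, the endpoint URL, and the key values are illustrative assumptions, not taken from the sample code linked above.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # RFC 5849 encoding: only unreserved characters stay unescaped.
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_key, consumer_secret,
                 nonce, timestamp):
    """Return params plus the OAuth 1.0 fields and an HMAC-SHA1 signature."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp),
        "oauth_version": "1.0",
    }
    all_params = {**params, **oauth}
    # Normalize: encode names and values, sort them, join as name=value pairs.
    pairs = sorted((percent_encode(k), percent_encode(v))
                   for k, v in all_params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), percent_encode(url),
                            percent_encode(normalized)])
    # Two-legged signing: there is no token secret, so the key ends with "&".
    key = percent_encode(consumer_secret) + "&"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    all_params["oauth_signature"] = base64.b64encode(digest).decode()
    return all_params


# Hypothetical endpoint and credentials, for illustration only.
signed = sign_request("GET", "http://yboss.yahooapis.com/ysearch/web",
                      {"q": "The Dark Knight Rises"},
                      "my-consumer-key", "my-consumer-secret",
                      nonce="abc123", timestamp=1357000000)
```

In real use the nonce should be random and the timestamp current; they are fixed here only so the sketch is reproducible.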
http://hackyourworld.org/hacku/
http://developer.yahoo.com/yql/console/
1. Select YQL query
2. Select output format
3. Copy this URL
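The URL copied from the console can also be assembled in code. A minimal sketch, assuming the public YQL REST endpoint took a `q` and a `format` parameter; the endpoint constant and the `build_yql_url` helper are illustrative names, so compare the result with what the console actually produces.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed public YQL endpoint; verify against the URL the console gives you.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, output_format="json"):
    """Build a request URL for a YQL statement in the chosen output format."""
    if output_format not in ("json", "xml"):
        raise ValueError("output_format must be 'json' or 'xml'")
    # urlencode percent-encodes quotes, equals signs and spaces in the query.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": output_format})


url = build_yql_url('select * from contentanalysis.analyze '
                    'where url="http://www.cnn.com/"')
print(url)
```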
Finding images of “The Dark Knight Rises”
select * from boss.search where q="The Dark Knight Rises" and service="images"
and ck="..." and secret="..."
Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com
select * from boss.search where q="The Dark Knight Rises" and
sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."
Spell Check and Correction
select * from boss.search where q="The Dirk Knight Rises" and service="spelling" and
ck="..." and secret="..."
Finding news on “The Dark Knight Rises”
select * from boss.search where q="The Dark Knight Rises" and service="news" and
ck="..." and secret="..."
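The queries above differ only in a few conditions, so they can be generated from one helper. A minimal sketch; `boss_query` is a hypothetical helper, it quotes every value as a string, and it refuses values containing a double quote instead of guessing at YQL escaping rules.

```python
def boss_query(q, ck, secret, service=None, sites=None, **extra):
    """Compose a boss.search YQL statement from keyword conditions."""
    conditions = [("q", q)]
    if service:
        conditions.append(("service", service))
    if sites:
        # Several sites are passed as one comma-separated value.
        conditions.append(("sites", ",".join(sites)))
    conditions.extend(sorted(extra.items()))
    conditions += [("ck", ck), ("secret", secret)]
    for name, value in conditions:
        if '"' in str(value):
            raise ValueError(f"double quote not allowed in {name}")
    where = " and ".join(f'{name}="{value}"' for name, value in conditions)
    return f"select * from boss.search where {where}"


# Placeholders "..." stand for your own consumer key and secret.
print(boss_query("The Dark Knight Rises", "...", "...", service="news"))
print(boss_query("The Dark Knight Rises", "...", "...",
                 sites=["imdb.com", "movies.yahoo.com"]))
```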
Finding interesting objects: Content Analysis
select * from contentanalysis.analyze where text="Sachin Tendulkar is batting very well"
Content Analysis from a URL
select * from contentanalysis.analyze where url="http://www.cnn.com/"
Let's see it in action!
Query Cheatsheet
• Find images of “The Dark Knight Rises”
  select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."
• Find reviews of “The Dark Knight Rises”
  select * from boss.search where q="reviews intitle:The Dark Knight Rises" and service="web" and ck="..." and secret="..."
• Search for Avatar but not the movie
  select * from boss.search where q="Avatar -movie" and ck="..." and secret="..."
• Search PDFs of “The Dark Knight Rises”
  select * from boss.search where q="The Dark Knight Rises" and type="pdf" and ck="..." and secret="..."
Query Cheatsheet
• Find all the news on “The Dark Knight Rises”
  select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."
• Get long abstracts in the results
  select * from boss.search where q="The Dark Knight Rises" and abstract="long" and ck="..." and secret="..."
• Retrieve results 51-100 of the query
  select * from boss.search where q="The Dark Knight Rises" and start=51 and ck="..." and secret="..."
EXAMPLES
duckduckgo.com
Data Extraction
Why is extraction difficult?
• The Internet has a lot of information
• Not all of it can be processed by machines
  – Unstructured data
  – E.g. DiscountedPrice and ReducedPrice of a product (both mean the same)
• The ultimate aim is to publish data in a structured format
• The simplest way: XML, JSON
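The DiscountedPrice / ReducedPrice problem can be illustrated with a small normalization step. A minimal sketch, assuming two made-up product records from different sites; the alias table, the records, and the target field name `price` are all hypothetical.

```python
import json

# Hypothetical synonyms that different sites use for the same concept.
FIELD_ALIASES = {
    "DiscountedPrice": "price",
    "ReducedPrice": "price",
}


def normalize(record):
    """Rename known synonym fields so every record shares one schema."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


# Made-up records, as two different sites might expose them.
records = [
    {"name": "Phone A", "DiscountedPrice": 199},
    {"name": "Phone B", "ReducedPrice": 249},
]

structured = [normalize(r) for r in records]
# Publish the unified records in a structured format (JSON here).
print(json.dumps(structured, indent=2))
```

Once both records carry the same `price` field, a machine can compare them without knowing how each site labeled it.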
Web Scraping
• Demo: Dapper
More Resources
• Yahoo! BOSS: http://developer.yahoo.com/boss
• BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/
• Content Analysis: http://developer.yahoo.com/contentanalysis/
• OAuth sample code: https://github.com/sourind/hacku/
Questions??
http://www.flickr.com/photos/reem_unique/4119729692/
• http://slideshare.net/souridatta
• https://github.com/sourind/
Thanks!!