hacking rss: filtering & processing obscene amounts of information (short version)
DESCRIPTION
The 15-minute version of the longer talk that I delivered at SXSW in March. More details: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
TRANSCRIPT
Hacking RSS: Filtering & Processing Obscene Amounts of Information
#hackingRSS
Dawn Foster
Intel Community Manager
Information Overload
CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
Who Cares?
● Most of it is …
– complete crap
– out of date / obsolete
– not interesting to you
– irrelevant for you
Junk Pile: http://www.flickr.com/photos/zen/4013525/
You Want to Find the Needle
Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
RSS Alone is a Start
● Sources you care about delivered right to you. But …
– Do you care about everything in each feed?
– What about the feeds you aren't subscribed to?
– Can you keep up with what you have?
Prioritize Your Reader
● Put things you care about at the top
● Categorize
● Don't try to read everything
The Real Magic is in Filtering RSS
● In my Google Reader right now:
– Analyst research blogs mentioning Online Community
– Analyst research blogs mentioning MeeGo
– Searches across social sites mentioning me, my projects, my websites, etc., filtering out things I don't care about
– My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions
[Diagram: Complete Crap / Interesting / Maybe Relevant / Yay!]
RSS Filtering Tools
● Yahoo Pipes (my favorite)
– More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …)
– Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it.
● Other Options
– FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out.
– RSS readers with filtering / alerts (FeedDemon)
– Code: write your own filters
– Note: many free RSS filtering services have gone out of business; they can be bandwidth intensive & costly to host.
Yahoo Pipes Filtering Example
● Input:
– WebWorkerDaily
– ReadWriteWeb
● Filter by content:
– Collaborate
– Collaboration
– Collaborative
● Output:
– 1 RSS Feed
– Matching 3 keywords
2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
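If you'd rather write your own filter than use Pipes, the same idea fits in a few lines of Python. This is only a rough sketch: the two inline feeds are made-up stand-ins for the real WebWorkerDaily and ReadWriteWeb feeds, and the function name is mine, not anything from Pipes.

```python
import xml.etree.ElementTree as ET

# Tiny made-up feeds; a real script would download the RSS XML instead.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Tools that help remote teams collaborate</title><link>http://example.com/a1</link><description>Chat and docs.</description></item>
<item><title>New phone review</title><link>http://example.com/a2</link><description>Battery life.</description></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Startup funding news</title><link>http://example.com/b1</link><description>A collaborative editing startup raised money.</description></item>
</channel></rss>"""

KEYWORDS = ("collaborate", "collaboration", "collaborative")


def matching_items(feed_xml, keywords=KEYWORDS):
    """Return (title, link) pairs for items whose title or description mentions a keyword."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = (title + " " + description).lower()
        if any(word in text for word in keywords):
            matches.append((title, item.findtext("link", default="")))
    return matches


# Merge the matches from every input feed into one combined list.
filtered = [pair for feed in (FEED_A, FEED_B) for pair in matching_items(feed)]
for title, link in filtered:
    print(title, link)
```

The match is a plain case-insensitive substring test on title plus description, which is roughly what a "contains" filter rule does; a stricter filter could use whole-word matching.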
PostRank
● Best posts in a feed
● Ranked on engagement (links, sharing, comments)
● Can get output as RSS feed
● Feed includes PostRank number as a field
What's In a Feed? PostRank (Yahoo Pipes View)
● Content in feeds varies wildly depending on site.
● Common: title, author, pubDate, link, content, description
● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these)
● API: usually has additional data & can output RSS
● If it's in the feed, you can use it!
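A quick way to see what a feed really carries, including the fields your reader hides, is to dump every field of one item. A minimal sketch: the sample item and its "ex" namespace are invented for illustration, so the extra field names in a real feed will differ.

```python
import xml.etree.ElementTree as ET

# Made-up item: a few common fields plus two site-specific extras.
# The "ex" namespace and its field names are hypothetical.
SAMPLE = """<rss version="2.0" xmlns:ex="http://example.com/ns">
<channel><item>
<title>Tablet roundup</title>
<link>http://example.com/post</link>
<pubDate>Tue, 01 Mar 2011 10:00:00 GMT</pubDate>
<ex:postrank>8.7</ex:postrank>
<ex:username>someuser</ex:username>
</item></channel></rss>"""


def item_fields(feed_xml):
    """Map each child tag of the first item to its text, dropping the namespace URI."""
    item = ET.fromstring(feed_xml).find("./channel/item")
    fields = {}
    for child in item:
        name = child.tag.split("}")[-1]  # "{uri}postrank" -> "postrank"
        fields[name] = (child.text or "").strip()
    return fields


fields = item_fields(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```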
Reformatting / Modifying RSS Feeds
Don't be satisfied with default RSS feed formats!
Twitter Search
Twitter RSS Feed
Modify & more quickly scan key data
Yahoo Pipes: Reformat Twitter Feed
● Input:
– Twitter Search feed
● Loop String Build:
– Author
– : (spacing)
– Title
● Loop Assign:
– Store result back into title
● Output:
– 1 RSS feed
– Efficient format
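Outside of Pipes, the string-build-then-assign loop looks roughly like this in Python. It's a sketch under assumptions: the sample feed is invented, and I'm assuming each item has plain title and author elements, which a real search feed may not.

```python
import xml.etree.ElementTree as ET

# Made-up search feed; assumes each item has plain <title> and <author> elements.
SAMPLE = """<rss version="2.0"><channel><title>Search results</title>
<item><title>Trying out feed filters today</title><author>alice</author></item>
<item><title>RSS is not dead</title><author>bob</author></item>
</channel></rss>"""


def prefix_titles_with_author(feed_xml):
    """Rewrite every item title as "author: title" and return the new feed XML."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.find("title")
        if title is None:
            continue  # nothing to rewrite on this item
        author = item.findtext("author", default="unknown")
        # String build: author + ": " + title, then assign it back into title.
        title.text = f"{author}: {title.text}"
    return ET.tostring(root, encoding="unicode")


reformatted = prefix_titles_with_author(SAMPLE)
titles = [item.findtext("title") for item in ET.fromstring(reformatted).iter("item")]
print(titles)
```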
BackTweets (BackType API)
● Data about links on Twitter
● Finds links regardless of shortening service
● No RSS Feeds
● But … You can use API + Pipes to build one!
BackType + Twitter API + Pipes Output
● Data from BackType + Twitter
● Built an RSS feed using Yahoo Pipes
● Included the information relevant for me
● Could have included or filtered on: name, listed count, location, profile image, user URL, ...
Admit it, we ALL do vanity searches
● You can enter your search queries in Google, Twitter, Flickr …
– Add a new project & have to update all of them
– Can be hard to filter out some results
– May have duplicates from multiple searches
● Yahoo Pipes
– Update keywords in a CSV file
– Use CSV file as input into a bunch of searches (RSS or API inputs)
– Filter out what you don't want
– Get 1 filtered RSS feed as output
2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
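The keyword-CSV idea also works in plain code. A minimal sketch, with loud assumptions: the CSV text stands in for a real file, the two search endpoints are hypothetical placeholders, and the sample results are invented. It builds one search URL per keyword per service, then shows the dedupe and filter step on some fake results.

```python
import csv
import io
from urllib.parse import urlencode

# Stand-in for the keyword CSV file; a real script would open a file instead.
KEYWORDS_CSV = "keyword\nDawn Foster\nfastwonderblog\nMeeGo community\n"

# Hypothetical search endpoints that return RSS; swap in whatever you use.
SEARCH_TEMPLATES = [
    "http://search.example.com/rss?{query}",
    "http://photos.example.org/feed?{query}",
]


def load_keywords(csv_text):
    """Read the 'keyword' column, skipping blank entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["keyword"].strip() for row in reader if row["keyword"].strip()]


def build_search_urls(keywords, templates=SEARCH_TEMPLATES):
    """One URL per (keyword, search service) pair, with the keyword URL-encoded."""
    return [
        template.format(query=urlencode({"q": kw}))
        for kw in keywords
        for template in templates
    ]


def dedupe_and_filter(results, blocked_words=()):
    """Drop repeated links and any result whose title contains a blocked word."""
    seen, kept = set(), []
    for title, link in results:
        if link in seen or any(w.lower() in title.lower() for w in blocked_words):
            continue
        seen.add(link)
        kept.append((title, link))
    return kept


urls = build_search_urls(load_keywords(KEYWORDS_CSV))
print(len(urls), urls[0])

# Invented results, as if several searches had already been fetched and parsed.
sample_results = [
    ("Dawn Foster on RSS", "http://example.com/1"),
    ("Dawn Foster on RSS (repost)", "http://example.com/1"),
    ("Win a free phone, Dawn Foster", "http://example.com/2"),
]
kept = dedupe_and_filter(sample_results, blocked_words=["free phone"])
print(kept)
```

Adding a new project then means adding one line to the CSV, and every search picks it up.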
How Should / Shouldn't You Use All of This?
● Do:
– Use this for personal productivity
– Play around, create prototypes and understand the possibilities
● Don't:
– Don't violate licenses on content or republish w/o permission
– Don't use in critical or production environments
● For production use or putting data on websites:
– Re-write in a real programming language with cached results and error checking
XKCD Comic: http://xkcd.com/327/
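To give a feel for what "cached results and error checking" can mean, here is one possible shape for it in Python. Treat it as a sketch, not a production design: the class name, the 15-minute default, and the serve-stale-on-error policy are all my assumptions, and the usage below swaps in a fake fetch function so it runs without a network.

```python
import time
import urllib.request


class CachedFetcher:
    """Fetch URLs, reusing a recent copy and falling back to a stale one on errors."""

    def __init__(self, max_age_seconds=900, fetch=None):
        self.max_age = max_age_seconds
        self.fetch = fetch or self._http_get
        self.cache = {}  # url -> (fetched_at, body)

    @staticmethod
    def _http_get(url):
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()

    def get(self, url):
        now = time.time()
        cached = self.cache.get(url)
        if cached and now - cached[0] < self.max_age:
            return cached[1]  # still fresh, skip the network
        try:
            body = self.fetch(url)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses
            if cached:
                return cached[1]  # serve stale data rather than fail
            raise RuntimeError(f"could not fetch {url}: {error}") from error
        self.cache[url] = (now, body)
        return body


# Usage with a fake fetch function so the sketch runs offline.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<rss/>"


fetcher = CachedFetcher(max_age_seconds=60, fetch=fake_fetch)
first = fetcher.get("http://example.com/feed")
second = fetcher.get("http://example.com/feed")  # served from the cache
print(len(calls))
```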
Learn More
About Dawn:
● Intel Community Manager for MeeGo
● Author of Companies and Communities
● More Info: http://fastwonderblog.com
● [email protected]
● @geekygirldawn on Twitter
Additional Reading & audio from 1 hour version of this talk:
● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
Backup
Outsource / Crowdsource New Sources
Yahoo Pipes: Reformat PostRank Feed
● Input:
– 3 PostRank feeds
● Loop String Build:
– PostRank
– : (spacing)
– Title
● Loop Assign:
– Store result back into title
● Output:
– 1 RSS feed
– Efficient format
Yahoo Pipes PostRank Example
● Input PostRank Feeds:
– Engadget
– CrunchGear
– Boy Genius
● Filter by content
– Tablet
● Sort:
– PostRank
● Output
– 1 RSS feed
– Best tablet posts
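The filter-then-sort steps can be sketched in Python like this. The sample feed, the "pr" namespace URI, and the postrank element name are stand-ins I made up; check a real feed to see how the score field is actually named.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the score field; a real feed will use its own.
NS = {"pr": "http://example.com/postrank"}

SAMPLE = """<rss version="2.0" xmlns:pr="http://example.com/postrank"><channel>
<item><title>Tablet A hands-on</title><pr:postrank>6.2</pr:postrank></item>
<item><title>Laptop deals</title><pr:postrank>9.9</pr:postrank></item>
<item><title>Why every tablet needs a stand</title><pr:postrank>8.4</pr:postrank></item>
</channel></rss>"""


def best_posts(feed_xmls, keyword):
    """Keep items whose title mentions the keyword, highest score first."""
    scored = []
    for feed_xml in feed_xmls:
        for item in ET.fromstring(feed_xml).iter("item"):
            title = item.findtext("title", default="")
            if keyword.lower() not in title.lower():
                continue  # filter by content
            score = float(item.findtext("pr:postrank", default="0", namespaces=NS))
            scored.append((score, title))
    return sorted(scored, reverse=True)  # sort by score, descending


ranked = best_posts([SAMPLE], "tablet")
for score, title in ranked:
    print(f"{score}: {title}")
```

Printing the score in front of the title gives the same "PostRank: Title" layout as the reformatting pipe.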
Using Web APIs 101
● Many API calls are basically URLs
● Constructing URLs
– Use API documentation/examples to format the URL
– http://api.twitter.com/1/statuses/show/ID.xml
● Version 1 of the API: show status for ID, in .format
● API keys
– Tells API who you are (password)
● Rate limiting
– Only get so much & you're cut off
– Limited by IP or API key
– Chill out for a while & come back
XKCD Comic: http://xkcd.com/844/
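Both ideas, building the URL from its parts and backing off when you get rate limited, can be sketched in a few lines of Python. The helper names are mine, the URL is only the shape shown on the slide rather than a live call, and treating HTTP 429 as the "rate limited" signal is an assumption, since APIs differ on this.

```python
import time
from urllib.parse import urlencode


def build_api_url(base, path, fmt, **params):
    """Assemble base + path + .format + ?query from its parts."""
    url = f"{base.rstrip('/')}/{path.strip('/')}.{fmt}"
    if params:
        url += "?" + urlencode(params)
    return url


def call_with_backoff(request, max_attempts=4, first_wait=1.0, sleep=time.sleep):
    """Call request(); when it reports rate limiting, wait and retry with longer pauses.

    `request` is any function returning (status_code, body). Treating 429 as
    "rate limited" is an assumption; some APIs use other codes.
    """
    wait = first_wait
    for attempt in range(max_attempts):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(wait)  # chill out for a while & come back
            wait *= 2
    return status, body


status_url = build_api_url("http://api.twitter.com/1", "statuses/show/12345", "xml")
keyed_url = build_api_url("http://api.example.com", "search", "xml", q="rss", key="KEY")
print(status_url)
print(keyed_url)

# Simulated responses: rate limited twice, then success. Waits are recorded, not slept.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```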
Backtweets API + Twitter API + Yahoo Pipes
● What we want to do:
– Start with a set of URLs (blog posts in a feed)
– Find any tweet mentioning those URLs
– Return the tweet and data about the person who posted it
● Mission: Build feed using only data from these 2 APIs
● BackType API provides Tweet ID (not humanly useful)
– http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
– List of Twitter Status IDs for Tweets linking to URL
– Note: I think this feature may be deprecated
● Twitter API uses Tweet ID to get everything else
– http://api.twitter.com/1/statuses/show/ID.xml
– Returns a single status with all relevant data for ID
BackTweets API: Get Tweet ID
● Take WebWorkerDaily Author Feed
● Use WWD URLs to build URLs for BackType API call
● Fetch data from BackType URLs to get Tweet ID
Twitter API: Get Data Based on Tweet ID
● Use BackType tweet ID to build URL for Twitter API
● Fetch data about Tweet & User from Twitter API
● Re-Build title to show “user (followers): tweet”
Add Filters to BackType + Twitter Example
● Show only tweets from people with 1000+ followers
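The whole two-step chain plus the follower filter can be sketched offline in Python. Everything below is illustrative: the dictionaries stand in for the two API responses, and their field names are invented rather than taken from the real BackType or Twitter schemas.

```python
# Stand-ins for the two API responses. Field names are illustrative only,
# not the real BackType or Twitter response schema.
TWEET_IDS_BY_URL = {
    "http://example.com/post-1": ["101", "102"],
    "http://example.com/post-2": ["103"],
}
TWEETS_BY_ID = {
    "101": {"user": "alice", "followers": 2500, "text": "Great post http://example.com/post-1"},
    "102": {"user": "bob", "followers": 40, "text": "Reading http://example.com/post-1"},
    "103": {"user": "carol", "followers": 1000, "text": "Worth a look http://example.com/post-2"},
}


def lookup_tweet_ids(url):
    """Step 1 (link search): which tweet IDs mention this URL?"""
    return TWEET_IDS_BY_URL.get(url, [])


def lookup_tweet(tweet_id):
    """Step 2 (status lookup): tweet text plus data about its author."""
    return TWEETS_BY_ID[tweet_id]


def build_feed_titles(post_urls, min_followers=1000):
    """Chain both lookups and build "user (followers): tweet" titles."""
    titles = []
    for url in post_urls:
        for tweet_id in lookup_tweet_ids(url):
            tweet = lookup_tweet(tweet_id)
            if tweet["followers"] < min_followers:
                continue  # the follower filter
            titles.append(f'{tweet["user"]} ({tweet["followers"]}): {tweet["text"]}')
    return titles


titles = build_feed_titles(["http://example.com/post-1", "http://example.com/post-2"])
for title in titles:
    print(title)
```

In a real version the two lookup functions would make the API calls, and the same loop could filter on any other field the API returns.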