feedparser
http://www.feedparser.org/
Because RSS is Hairy
Lindsey Smith (@turbodog)
feedparser: because RSS is hairy
RSS formats bundle HTML
User input via HTML is hairy
There are several syndication formats and versions (RSS, Atom, etc.)
RSS
HTML
Micro-format
feedparser: because RSS is hairy
Download and parse just about any feed type, including:
Various flavors of Atom and RSS
Format extensions (iTunes)
Micro-formats (GeoRSS, hCard)
Ensures that you can treat all feeds the same way, regardless of format or version
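To make "treat all feeds the same way" concrete, here is a minimal standard-library sketch that reads the feed title from either an RSS 2.0 or an Atom 1.0 document through one function. It only illustrates the idea and is not how feedparser is implemented; the name `feed_title` and both sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 elements.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_title(xml_text):
    """Return the feed title from an RSS 2.0 or Atom 1.0 document, or None."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM_NS + "feed":
        node = root.find(ATOM_NS + "title")   # Atom: <feed><title>
    elif root.tag == "rss":
        node = root.find("channel/title")     # RSS: <rss><channel><title>
    else:
        node = None                           # unknown format
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Hypothetical sample documents, one per format.
RSS_SAMPLE = '<rss version="2.0"><channel><title>Sample Feed</title></channel></rss>'
ATOM_SAMPLE = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Sample Feed</title></feed>'

print(feed_title(RSS_SAMPLE))
print(feed_title(ATOM_SAMPLE))
```

Even this toy version needs a branch per format for a single field, which is the bookkeeping a feed parsing library takes off your hands.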
feedparser: because RSS is hairy
Digests whatever crap you throw at it
Sanitizes HTML
Normalizes dates
Resolves relative links
Detects feed type, version and encoding
Bozo detection: flags non-well-formed feeds without blowing up
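The sketch below illustrates three of these chores using only the Python 3 standard library: normalizing the two common date styles to UTC, resolving a relative link against a base URL, and flagging malformed XML instead of raising. It is a simplified illustration, not feedparser's implementation; the function names are hypothetical, and real feeds use many more date formats than the two handled here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


def normalize_date(text):
    """Parse an ISO 8601 (Atom) or RFC 822 (RSS) date into an aware UTC datetime."""
    iso = text.strip()
    if iso.endswith("Z"):
        iso = iso[:-1] + "+00:00"   # older Pythons reject a trailing "Z"
    try:
        parsed = datetime.fromisoformat(iso)
    except ValueError:
        try:
            parsed = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            return None             # unrecognized format
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def resolve_link(base, href):
    """Resolve a possibly relative link against the feed's base URL."""
    return urljoin(base, href)


def check_well_formed(xml_text):
    """Return (bozo, exception): bozo is 1 for malformed XML, 0 otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return 1, exc
    return 0, None


print(normalize_date("Wed, 09 Nov 2005 11:56:34 GMT"))
print(normalize_date("2005-11-09T12:56:34+01:00"))
print(resolve_link("http://example.org/feed/", "/entry/3"))
print(check_well_formed("<rss><channel></rss>")[0])
```

Returning a flag together with the exception, rather than raising, mirrors the "without blowing up" point: the caller decides how strict to be.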
feedparser: because RSS is hairy
Parse URL, local file or string data
Handles the 304 Not Modified HTTP status code
HTTP basic auth
Custom request headers
Custom handlers
Captures response headers
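The 304 support is what makes polite polling possible: pass back the validators from the previous fetch and skip the work when nothing changed. The sketch below assumes feedparser's `etag` and `modified` keyword arguments and the `status`, `etag` and `modified` attributes on the result; the helper name `fetch_if_changed`, the plain-dict cache and the injectable `parse` argument are hypothetical, added so the logic can be tried without a network.

```python
def fetch_if_changed(url, cache, parse=None):
    """Fetch url, returning None when the server answers 304 Not Modified.

    cache is a plain dict (hypothetical) mapping url -> validators from the
    last successful fetch. parse defaults to feedparser.parse and can be
    swapped for a stand-in when experimenting offline.
    """
    if parse is None:
        import feedparser  # third-party package, imported lazily
        parse = feedparser.parse

    previous = cache.get(url, {})
    result = parse(url,
                   etag=previous.get("etag"),
                   modified=previous.get("modified"))

    # status is only present when the feed was fetched over HTTP.
    if getattr(result, "status", None) == 304:
        return None  # unchanged since the last fetch

    # Remember the validators for the next request.
    cache[url] = {
        "etag": getattr(result, "etag", None),
        "modified": getattr(result, "modified", None),
    }
    return result
```

A caller would keep one `cache` dict per process (or persist it) and treat a `None` return as "nothing new".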
feedparser: the good ol’ days
Created circa 2002 by Mark Pilgrim of Dive Into Python fame
Powers feedvalidator.org
v4.1 released in 2007
Open source
Well-documented
3000 unit tests
Available in popular Linux distros
feedparser: the lean years
Development slows to a trickle
No official releases
Atom & RSS continue to evolve
iTunes enclosures
v4.1 released in 2007
Still available in popular Linux distros
feedparser 5.0: a new hope
Small group of developers start working on feedparser
v5.0 released January 2011
Supports Python 3
Micro-formats
CSS & HTML5 sanitization
Bug fixes, bug fixes, bug fixes
>>> import feedparser
>>> d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> d['feed']['title'] # feed data is a dictionary
u'Sample Feed'
>>> d.feed.title # get values attr-style or dict-style
u'Sample Feed'
>>> d.channel.title # use RSS or Atom terminology anywhere
u'Sample Feed'
>>> d.feed.link # resolves relative links
u'http://example.org/'
>>> d.feed.subtitle # parses escaped HTML
u'For documentation <em>only</em>'
>>> len(d['entries']) # entries are a list
1
>>> d['entries'][0]['title'] # each entry is a dictionary
u'First entry title'
>>> d.entries[0].title # attr-style works here too
u'First entry title'
>>> d['items'][0].title # RSS terminology works here too
u'First entry title'
>>> e = d.entries[0]
>>> e.link # easy access to alternate link
u'http://example.org/entry/3'
>>> e.links[1].rel # full access to all Atom links
u'related'
>>> e.links[0].href # resolves relative links here too
u'http://example.org/entry/3'
>>> e.updated_parsed # parses all date formats
time.struct_time(tm_year=2005, tm_mon=11, tm_mday=9, tm_hour=11, tm_min=56, tm_sec=34, tm_wday=2, tm_yday=313, tm_isdst=0)
>>> e.content[0].value # sanitizes dangerous HTML
u'<div>Watch out for <em>nasty tricks</em></div>'
>>> d.version # reports feed type and version
u'atom10'
>>> d.encoding # auto-detects character encoding
u'utf-8'
>>> d.headers.get('Content-type') # full access to all HTTP headers
u'application/xml'
>>> d.bozo # well-formed?
0
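The session above uses a clean sample feed where every field is present. Real feeds often omit elements, so a defensive access pattern helps. The sketch below relies only on dict-style `.get()`, which the result shown above supports alongside attribute access; the helper name `entry_titles` and the `"(untitled)"` placeholder are hypothetical.

```python
def entry_titles(parsed):
    """Return (well_formed, titles) from a feedparser-style result.

    Missing keys are tolerated: a feed without entries yields an empty list,
    and an entry without a title yields a placeholder.
    """
    well_formed = not parsed.get("bozo", 0)
    titles = [entry.get("title", "(untitled)")
              for entry in parsed.get("entries", [])]
    return well_formed, titles


# Plain dicts stand in for a parsed feed here.
sample = {"bozo": 0, "entries": [{"title": "First entry title"}, {}]}
print(entry_titles(sample))
```

With a real feed, `entry_titles(feedparser.parse(url))` would be the equivalent call, assuming feedparser is installed.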
feedparser: caveats
Fairly slow and CPU intensive
FriendFeed rolled their own parser and fell back on feedparser
Team is looking at ways to speed it up
feedparser: the project details
Home page: http://www.feedparser.org
Discussion: http://code.google.com/p/feedparser