PyCon 2011 Scaling Disqus
DESCRIPTION
Disqus talks about how they scale their Python web application to over 500 million visitors a month. Video is available here: http://pycon.blip.tv/file/4880330/
TRANSCRIPT
![Page 1: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/1.jpg)
DISQUS
Jason Yan (@jasonyan)
David Cramer (@zeeg)
Python at ~~400~~ 500 million visitors
Got feedback? Use hashtag #sckrw
Sunday, March 13, 2011
![Page 2: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/2.jpg)
Agenda
• What is DISQUS?
• An Overview of the Infrastructure
• Iterative Development and Deployment
• Why We Love Python
![Page 3: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/3.jpg)
What is DISQUS?
dis·cuss • dĭ-skŭs'
We are a comment system with an emphasis on connecting communities
http://disqus.com/about/
![Page 4: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/4.jpg)
Embeddable Comments
![Page 5: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/5.jpg)
A Brief History
![Page 6: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/6.jpg)
Startup-ish
• Founded just about 4 years ago
• 16 employees, 8 engineers
• Traffic increasing 15-20% a month
• Flat organizational structure, every engineer is a product manager
• Fast turnaround, new feature launches every week (sometimes daily)
![Page 7: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/7.jpg)
Traffic
[Chart: number of visitors, March 2008 through March 2011; vertical axis from 0M to 500M]
![Page 8: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/8.jpg)
DjangoCon 2010
• 17,000 requests/second peak
• 450,000 websites
• 15 million profiles
• 75 million comments
• 250 million visitors
![Page 9: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/9.jpg)
Six Months Later
• 25,000 requests/second peak
• 700,000 websites
• 30 million profiles
• 170 million comments
• 500 million visitors
![Page 10: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/10.jpg)
Six Months Later
• September 2010: 250 million uniques
• March 2011: 500 million uniques
• Handling over 2x the traffic
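As a back-of-the-envelope check, the two unique-visitor figures above can be turned into an implied compound monthly growth rate and compared with the "15-20% a month" traffic figure from the Startup-ish slide. This is only an arithmetic sketch: it assumes smooth compounding over six months, and the two slides may be measuring different things (unique visitors versus traffic).

```python
# Figures from the slides: 250 million uniques (September 2010),
# 500 million uniques (March 2011), six months apart.
start_uniques = 250_000_000
end_uniques = 500_000_000
months = 6

growth_factor = end_uniques / start_uniques
# Compound monthly rate r such that (1 + r) ** months == growth_factor.
implied_monthly = growth_factor ** (1 / months) - 1

print(f"implied monthly growth in uniques: {implied_monthly:.1%}")

# For comparison: what steady 15% or 20% monthly growth compounds to.
for rate in (0.15, 0.20):
    factor = (1 + rate) ** months
    print(f"{rate:.0%} a month for {months} months -> {factor:.2f}x")
```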
![Page 11: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/11.jpg)
Six Months Later
• September 2010: ~100 servers
• March 2011: ~100 servers
• Scale diagonally
![Page 12: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/12.jpg)
Scaling Diagonally
• We still rent hardware, so there is no “commodity hardware”
• Cheaper to upgrade
• Everything is redundant
• Partition data where you need to, scale partitions vertically
• Upgrade hardware (more RAM, more drives, more cores)
• Python apps tend to be CPU bound
![Page 13: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/13.jpg)
Infrastructure
• 35% Web Servers (Apache + mod_wsgi)
• 15% Utility Servers (Python scripts, background workers)
• 20% Databases (PostgreSQL, Redis, Membase)
• 20% Load Balancing / High Availability (HAProxy + Heartbeat)
• 10% Caching servers (Memcached, Varnish)
• Half of our servers run Python
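To make the percentages concrete, here is a small sketch that converts them into approximate machine counts. It assumes the "~100 servers" figure from the Six Months Later slide applies to this breakdown (the slide itself gives only percentages), and the role names are shorthand labels chosen for this example.

```python
# Approximate server counts per role, assuming a fleet of about 100 machines.
# FLEET_SIZE is an assumption based on the "~100 servers" slide.
FLEET_SIZE = 100

# Percentage shares as listed on the Infrastructure slide.
SHARES = {
    "web": 35,
    "utility": 15,
    "databases": 20,
    "load_balancing_ha": 20,
    "caching": 10,
}

def servers_per_role(fleet_size, shares):
    """Convert percentage shares into approximate server counts."""
    return {role: fleet_size * pct // 100 for role, pct in shares.items()}

counts = servers_per_role(FLEET_SIZE, SHARES)

# Web (mod_wsgi) and utility (Python scripts) are presumably the servers
# meant by "half of our servers run Python".
python_share = SHARES["web"] + SHARES["utility"]

print(counts)
print(f"Python share: {python_share}%")
```

Under that assumption the web tier comes to about 35 machines, which lines up with the "Deploy to 35 servers" figure later in the talk.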
![Page 14: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/14.jpg)
Python Web Servers
• Use what you’re comfortable with
• Apache + mod_wsgi vs nginx + uWSGI
• Bottleneck is in the application
[Charts: mod_wsgi vs uWSGI, requests per second (min, avg, max) and memory usage]
![Page 15: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/15.jpg)
Background Workers
• Lots of tasks that don’t need to be done in the web application process:
• Crawling URLs
• Updating avatars
• Email notifications
• Analytics
• Counters
![Page 16: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/16.jpg)
Background Workers (cont’d)
• Most jobs are I/O bound
• Slow external calls
• Twitter is slow
• Facebook is slow
• Could parallelize with multiple processes, but...
![Page 17: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/17.jpg)
Background Workers (cont’d)
• Waste of memory
• Use non-blocking I/O
• Celery 2.2 adds support for gevent/eventlet
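The idea of overlapping slow external calls inside one process, rather than paying the memory cost of many processes, can be sketched with the standard library. Note the swap: this uses `asyncio` as a stand-in for gevent/eventlet, and it is not Celery code. The job names and delays are hypothetical, and `asyncio.sleep` simulates a slow remote service.

```python
import asyncio
import time

async def slow_external_call(name, delay):
    """Stand-in for a slow call to an external service (hypothetical)."""
    await asyncio.sleep(delay)  # the event loop runs other jobs while this waits
    return f"{name}: done"

async def run_jobs(jobs):
    """Run all I/O-bound jobs concurrently in a single process."""
    return await asyncio.gather(*(slow_external_call(n, d) for n, d in jobs))

# Three simulated jobs that each wait 0.2 seconds on "the network".
jobs = [("crawl_url", 0.2), ("update_avatar", 0.2), ("send_email", 0.2)]

start = time.monotonic()
results = asyncio.run(run_jobs(jobs))
elapsed = time.monotonic() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (the delays sum to 0.6s)")
```

Because the waits overlap, total wall time is close to the longest single delay rather than the sum of all delays, which is the property that makes non-blocking I/O attractive for I/O-bound workers.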
![Page 18: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/18.jpg)
Monitoring
• Application side: Graphite
• Real-time(ish) graphing
• Django front-end, Python backend
• Etsy’s StatsD proxy to Graphite
• UDP (fire and forget)
• Batches updates
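"Fire and forget" over UDP means the application sends a small datagram and never waits for a reply. A minimal sketch of that client side follows, using StatsD's plain-text counter format (`name:value|c`). The metric name is hypothetical, and a local UDP socket plays the role of the StatsD daemon so the example is self-contained; this is not Disqus's client code.

```python
import socket

def send_counter(sock, address, name, value=1):
    """Send one StatsD counter increment as a UDP datagram ("name:value|c")."""
    payload = f"{name}:{value}|c".encode("ascii")
    try:
        sock.sendto(payload, address)
    except OSError:
        pass  # fire and forget: a metrics failure must never break the request
    return payload

# Demo receiver standing in for the StatsD daemon.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
receiver.settimeout(1.0)
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_counter(sender, address, "comments.created")

received, _ = receiver.recvfrom(1024)
print(received)

sender.close()
receiver.close()
```

The batching mentioned on the slide happens on the StatsD side, which aggregates incoming datagrams and forwards them to Graphite at intervals; the client stays this simple.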
![Page 19: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/19.jpg)
Monitoring
• Track application metrics
• Errors, exceptions
• New comments, users, sites, etc.
• Anything
![Page 20: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/20.jpg)
Monitoring
• Check out Etsy’s posts:
• Measure Anything, Measure Everything http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
• Tracking Every Release http://codeascraft.etsy.com/2010/12/08/track-every-release/
![Page 21: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/21.jpg)
What about the code?
![Page 22: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/22.jpg)
Powered By Django
![Page 23: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/23.jpg)
Which means...
• Largest Django-powered web application
• We fork, and sometimes even monkey patch, Django to make it scale to our needs
• Fortunately, we don’t have to do too much (Yay, Django!)
• Unfortunately, we can’t use all of Django’s internal components (and when we do, we use them in atypical ways)
![Page 24: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/24.jpg)
Iterative Development
Release Early, Release Often
![Page 25: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/25.jpg)
Iterating Quickly
• Abstracting our application environment
• Fewer dependencies locally
• Rely on CI for dependency coverage
• Heavy use of open source packages
• No NIH syndrome
• Deploy frequently, 3-7 times a day
• Lots of branches, but master is “stable”
• Realtime reporting on exceptions, metrics
• Our test suite is the main blocker (slow)
![Page 26: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/26.jpg)
Dealing with Deploys
![Page 27: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/27.jpg)
Gargoyle
Being users of our product, we actively use early versions of features before public release
Deploy features to portions of a user base at a time to ensure smooth, measurable releases
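Releasing a feature to a portion of the user base is commonly done by hashing each user into a stable bucket and comparing it with a rollout percentage. The sketch below illustrates that general technique only. It is not Gargoyle's API, and the function name, feature name, and bucket scheme are hypothetical.

```python
import zlib

def is_active(feature, user_id, percent):
    """Return True if the feature is switched on for this user.

    Hashing the feature name together with the user id gives each user a
    stable bucket in 0..99, so a user who has the feature at 10% keeps it
    when the rollout grows to 50%.
    """
    bucket = zlib.crc32(f"{feature}:{user_id}".encode("utf-8")) % 100
    return bucket < percent

users = range(1000)
enabled_10 = {u for u in users if is_active("new_reply_box", u, 10)}
enabled_50 = {u for u in users if is_active("new_reply_box", u, 50)}

print(len(enabled_10), len(enabled_50))
```

Including the feature name in the hash means different features reach different slices of users, so the same early group is not exposed to every experiment.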
![Page 28: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/28.jpg)
The Deployment Problem
• Make some changes locally
• Run a subset of the test suite
• Push your commits
• CI server begins running tests
• ....
![Page 29: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/29.jpg)
Waiting on the test suite...
![Page 30: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/30.jpg)
Rinse and Repeat
• 30 minutes later tests fail, start over
• Finally, deploy to a subset of servers
• Open Sentry (our exception logger)
• Monitor Graphite
• Deploy to 35 servers (~8 minutes)
• Full rollback in < 30 seconds
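The flow above (deploy to a subset, watch the error and metric dashboards, then either continue or roll back) can be modeled in a few lines. This is a simulation under stated assumptions, not the team's deploy tooling: servers are a plain dictionary, and the `is_healthy` callback stands in for a person checking Sentry and Graphite. All names here are hypothetical.

```python
def staged_deploy(servers, new_version, canary_count, is_healthy):
    """Deploy to a canary subset first; roll back if it looks unhealthy.

    `servers` maps server name -> currently deployed version.
    `is_healthy` is a callable standing in for checking Sentry/Graphite.
    Returns True if the new version reached every server.
    """
    previous = dict(servers)  # snapshot of the old state for rollback
    names = sorted(servers)
    canaries, rest = names[:canary_count], names[canary_count:]

    for name in canaries:
        servers[name] = new_version

    if not is_healthy(canaries):
        servers.update(previous)  # rollback: restore the snapshot
        return False

    for name in rest:
        servers[name] = new_version
    return True

# A healthy release reaches all 35 simulated web servers.
fleet = {f"web{i:02d}": "v1" for i in range(35)}
ok = staged_deploy(fleet, "v2", canary_count=5, is_healthy=lambda hosts: True)
print(ok, set(fleet.values()))

# An unhealthy canary leaves every server on the old version.
fleet_bad = {f"web{i:02d}": "v1" for i in range(35)}
rolled_back = staged_deploy(fleet_bad, "v2", canary_count=5,
                            is_healthy=lambda hosts: False)
print(rolled_back, set(fleet_bad.values()))
```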
![Page 31: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/31.jpg)
Wait, Sentry?
![Page 32: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/32.jpg)
Testing
![Page 33: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/33.jpg)
Testing Code
• Test suite takes around 25 minutes usually
• “Stuck” with Hudson (or Jenkins)
• Most tightly integrated plugins are geared towards Java developers
• Which framework do we use?
• unittest(2), nose, doctests, LETTUCE?
• We use unittest and nose
• Need to report code coverage, speed of tests, pylint (or pyflakes)
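One way to report the speed of individual tests with plain `unittest` is a result class that times each test. The sketch below is an illustration under assumptions, not the setup described in the talk: `TimedResult`, `slugify`, and `SlugifyTest` are hypothetical names, and nose is not involved.

```python
import time
import unittest

class TimedResult(unittest.TextTestResult):
    """Test result that records how long each test took."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}  # test id -> seconds

    def startTest(self, test):
        self._started = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        self.timings[test.id()] = time.perf_counter() - self._started
        super().stopTest(test)

def slugify(title):
    """Hypothetical function under test."""
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Scaling Disqus"), "scaling-disqus")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
runner = unittest.TextTestRunner(resultclass=TimedResult, verbosity=0)
result = runner.run(suite)

# Slowest tests first: the ones worth looking at when the suite is the bottleneck.
for test_id, seconds in sorted(result.timings.items(), key=lambda kv: -kv[1]):
    print(f"{seconds:.4f}s  {test_id}")
```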
![Page 34: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/34.jpg)
We Love Python
![Page 35: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/35.jpg)
Love-ish
• Many of us started with PHP or Rails
• Clean syntax, clear standards
• All languages need PEP8.py and PyFlakes
• Interpreted, fast... enough
• Very easy to learn
• We all started by learning Django first, then Python
![Page 36: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/36.jpg)
Haters Gonna Hate
If you could choose one thing in Python to hate on...
![Page 37: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/37.jpg)
Better package management
![Page 38: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/38.jpg)
What can we do?
• Too many forks, too many frameworks
• We need fewer clones, and more combined effort
• Improving existing Python solutions
• More Python solutions for existing products
![Page 39: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/39.jpg)
Python Rocks!
![Page 41: PyCon 2011 Scaling Disqus](https://reader034.vdocuments.us/reader034/viewer/2022042515/540e31868d7f72747e8b4c9c/html5/thumbnails/41.jpg)
References
• Sentry (our exception tracking tool): http://github.com/dcramer/django-sentry
• Gargoyle (feature switches): https://github.com/disqus/gargoyle
• Django DB Utils (collection of db helpers for Django): https://github.com/disqus/django-db-utils
• Jenkins CI: http://jenkins-ci.org/
code.disqus.com