“Startup DevOps” Tuesday, 12th August 2014
Jon Milsom, Co-founder & CTO
Hiring
• Roles
  – Full-time Junior Dev (Frontend & some PHP)
  – Freelance SysAdmins / Ops for ad-hoc projects
• Opportunity
  – Make a big impact in a small team
WHAT IS PITCHERO?
Websites for amateur sports teams
Online management platform for amateur sports teams/clubs/leagues/counties/governing bodies
Market?
Not this…
Source: http://images.football365.com/13/03/800x600/David-Beckham-St-Etienne-v-PSG-2013_2917145.jpg
This…
Source: http://sanford-soccer-net.blogspot.co.uk/2011/03/massive-attack-loving-tribute-to-fat.html
Traffic
iPhone
Jan 2013
User generated content
Tech Team
THE “HYBRID” CLOUD
Hybrid Cloud?
Image from: http://www.rackspace.com/cloud/hybrid/
Why Hybrid?
• Reliable (keep some dedicated boxes)
• Easy to scale
Moving from dedicated boxes…
• Moved session management into Redis
• Don’t rely on file system
• Don’t rely on any single node
• Rely on config files
• Yada yada… Loads of other stuff that I can’t recall
Cool!
Source: http://www.ktvu.com/news/entertainment/funk-legend-headlines-sf/nD63D/
Not cool :(
Source: http://www.mixcrate.com/oneworld/phenoms-of-funk-george-clintonparliamentfunkadelicbootsy-collins-162038
In practice
Previously internal traffic now goes through firewall & back out
Issues - Latency
Approx. 40ms spent on memcache: ~20% of total!
Issues - Throughput
• Exceeded internal throughput
  – Increased latency
  – Dropped packets
• Had to upgrade firewall
  – Unanticipated cost
MESSAGE QUEUES
Newsletter System
• 1M emails / month
• Up to 200k/day
• Generated dynamically
Newsletter System
• Via Cron
  – Synchronous process
  – Sent via 5 minute cron
  – 500 in a batch
Problem?
144,000 emails per day
200,000 subscriptions
200,000 > 144,000
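The arithmetic behind this, as a quick sketch. The batch size, cron interval and subscription count are taken from the slides; the variable names are only illustrative.

```python
# Throughput ceiling of the cron-based sender (figures from the slides).
BATCH_SIZE = 500            # emails sent per cron run
CRON_INTERVAL_MIN = 5       # minutes between cron runs
SUBSCRIPTIONS = 200_000     # emails needed on a peak day

runs_per_day = 24 * 60 // CRON_INTERVAL_MIN      # 288 runs
daily_capacity = runs_per_day * BATCH_SIZE       # 144,000 emails
shortfall = SUBSCRIPTIONS - daily_capacity       # 56,000 emails never sent that day

print(f"capacity: {daily_capacity:,}/day, shortfall: {shortfall:,}")
```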
Solution…
High-Scale Async Processing
The Message Queue For the Cloud
Newsletter System
• Via MQ
  – Processed Async
  – Sent quickly
• 200k+ within 1 hour
  – X concurrent workers
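A minimal sketch of the "queue plus X concurrent workers" idea. It uses Python's in-process `queue.Queue` and threads as a stand-in for a hosted message queue, so it shows the shape of the approach rather than the actual system; `send_newsletter` and `run_workers` are hypothetical names.

```python
import queue
import threading

def send_newsletter(subscription_id):
    """Placeholder for generating and sending one email (hypothetical)."""
    return f"sent:{subscription_id}"

def run_workers(subscription_ids, num_workers=4):
    """Drain a queue of subscription ids using num_workers concurrent workers."""
    jobs = queue.Queue()
    for sid in subscription_ids:
        jobs.put(sid)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                sid = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            outcome = send_newsletter(sid)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

sent = run_workers(range(1000), num_workers=8)
print(len(sent), "emails processed")
```

Throughput now scales with the number of workers instead of being capped by the cron interval and batch size.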
Newsletter System
• Sessions
• Average peak before: 5,000
• Average peak after: 10,000
“Business” results…
CACHING
Simple Model
Optimal Caching
• Every client wants the same content
• The content doesn’t change often
Pitchero Model
Option #1 – Increase DB layer
• Pros
  – Simplify development
• Cons
  – Operations headache
  – Expensive
  – Performance hit
Option #2 – Intelligent caching
• Pros
  – Cheap (spare RAM?)
  – Fast
  – Low operations overhead
• Cons
  – Dev work
  – Not relational
  – Can’t rely on a key existing
“There are only two hard things in Computer Science: cache invalidation and naming things.” - Phil Karlton
Properties of our platform
• Natural Groups
  – Platforms
• Club
  – Discrete club_ids: x,y,z
• League
  – Discrete league_ids: i,j,k
• County
  – Discrete county_ids: p,q,r
Solution #1 “The Invalidation Manager”
• Application
  – “I have changed A”
• Invalidation Manager
  – “I will invalidate A1, A2, A3”
Feeling pretty smug…
Source: http://www.showbiz411.com/tag/ricky-gervais
What really happened…
• Application
  – “I have changed A”
• Invalidation Manager
  – “I will invalidate A1, A2, A3, B2, C7, D4, F8, G8, A3 (twice!), B6, H28, M14, J139”
“The Invalidation Manager” Unmanageable!
Problems
• Cross-domain dependencies A, B, C etc.
• Reliance on developers
• Large number of invalidation events
• Users’ content disappeared
• Users could not delete content
Solution #2 – “Platform Cache”
• Example cache keys
  – “club1234_news56789”
  – “club1234_report4321”
  – “club1234_video1357”
Solution #2 – “Platform Cache”
• Example cache keys
  – “{club}_news56789”
  – “{club}_report4321”
  – “{club}_video1357”
• {club} is a variable, stored in memcache
• {club} = hash_fn(club, club_id, time())
Just added support for dynamically namespacing memcache keys…
Source: http://www.troll.me/meme/fist-baby
Solution #2 – “Platform Cache”
• Effectively namespaced cache keys
• Can now invalidate* an entire platform’s (e.g. club_id:1234) data in one command
• No required knowledge of the rest of the key
• Don’t need to add keys to invalidation manager each time we add new functionality
*No actual invalidation happens: we just change the names of the keys to something that doesn’t exist. Memcache will evict the orphaned entries based on LRU.
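A minimal sketch of the namespacing trick, assuming a memcache-like store with get/set semantics. A plain dict stands in for memcache here, and the class name, method names and `ns_` key prefix are all hypothetical. The namespace hash follows the slide's hash_fn(club, club_id, time()), with a random component added (an assumption of this sketch) so that two rotations within the same clock tick cannot collide.

```python
import hashlib
import time
import uuid

class PlatformCache:
    """Namespaced cache keys on top of a memcache-like get/set store."""

    def __init__(self, store=None):
        # A plain dict stands in for memcache (assumption for this sketch).
        self.store = store if store is not None else {}

    def _new_namespace(self, platform, platform_id):
        # Mirrors hash_fn(club, club_id, time()); uuid4 is an added safeguard.
        raw = f"{platform}:{platform_id}:{time.time()}:{uuid.uuid4().hex}"
        return hashlib.md5(raw.encode("utf-8")).hexdigest()[:12]

    def _namespace(self, platform, platform_id):
        ns_key = f"ns_{platform}{platform_id}"
        ns = self.store.get(ns_key)      # the extra lookup on every request
        if ns is None:
            ns = self._new_namespace(platform, platform_id)
            self.store[ns_key] = ns
        return ns

    def _key(self, platform, platform_id, suffix):
        return f"{self._namespace(platform, platform_id)}_{suffix}"

    def get(self, platform, platform_id, suffix):
        return self.store.get(self._key(platform, platform_id, suffix))

    def set(self, platform, platform_id, suffix, value):
        self.store[self._key(platform, platform_id, suffix)] = value

    def invalidate(self, platform, platform_id):
        # Nothing is deleted: rotating the namespace orphans every old key.
        ns_key = f"ns_{platform}{platform_id}"
        self.store[ns_key] = self._new_namespace(platform, platform_id)

cache = PlatformCache()
cache.set("club", 1234, "news56789", "match report")
cache.invalidate("club", 1234)
print(cache.get("club", 1234, "news56789"))  # the old entry is no longer reachable
```

With a real memcache the orphaned entries would stay in memory until evicted; in this dict-backed sketch they simply remain in the dict.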
Solution #2 – “Platform Cache”
• Results
  – Reduced load on database
  – Low ops overhead
  – Little input required from developers
Solution #2 – “Platform Cache”
• Not perfect
  – Requires an extra memcache lookup for the namespace key
ON-DEMAND IMAGE PROCESSING
Any image
Any size
As the page loads
Problems
• 10M images
• Currently in 3 sizes (30M objects)
• Up to 50 images on some pages
  – High traffic => very high request rate
Old Model
3 set sizes
/{user_id}/{image_id}.jpg - 600x450
/{user_id}/sm_{image_id}.jpg - 120x90
/{user_id}/tiny_{image_id}.jpg - 50x37
New Model
Any image, any size
/?url={s3_image_url}&w={width}&h={height}&q={quality}
Hash parameters and store on disk once generated
Check out: http://images.weserv.nl/
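A rough sketch of "hash parameters and store on disk once generated". The function names, the SHA-1 choice and the two-level directory fan-out are assumptions made for illustration; `generate` stands in for whatever fetches the original from S3 and resizes it.

```python
import hashlib
import os
import tempfile

def cache_path(cache_dir, url, w, h, q):
    """Map one (url, width, height, quality) request to a stable file path."""
    params = f"url={url}&w={w}&h={h}&q={q}"
    digest = hashlib.sha1(params.encode("utf-8")).hexdigest()
    # Two-level fan-out keeps directories small (assumption, not from the talk).
    return os.path.join(cache_dir, digest[:2], digest[2:4], digest + ".jpg")

def get_image(cache_dir, url, w, h, q, generate):
    """Serve from disk if present; otherwise generate once and store."""
    path = cache_path(cache_dir, url, w, h, q)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = generate(url, w, h, q)   # e.g. fetch the S3 original and resize it
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data

# Usage with a fake generator that counts how often it is called.
calls = []
def fake_generate(url, w, h, q):
    calls.append((url, w, h, q))
    return f"{url}@{w}x{h}q{q}".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    get_image(tmp, "s3://bucket/a.jpg", 120, 90, 80, fake_generate)
    get_image(tmp, "s3://bucket/a.jpg", 120, 90, 80, fake_generate)
    print(len(calls), "generation for two identical requests")
```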
New Model
Super fast… Average server response 15ms (CloudFront faster?)
>> CloudFront
>> Server (from disk)
>> Server (generate from S3 original)
WHAT ABOUT “DEVOPS”?
DevOps Status
• Dev working practices
  – Version control
  – One command deployment
  – (Some) automated testing
• Ops working practices in infancy
  – Setup lists & bash scripts
  – Not taking advantage of the cloud
The Future
• Infrastructure as code
• More automated testing
• Better automated testing
• Cloud? Dedicated?