gevent at TellApart


Uploaded by tellapart on 20-Aug-2015


Page 1: gevent at TellApart

Kevin Ballard, kevin(at)tellapart(dot)com

Image ©2003-2012 `DivineError

Page 2: gevent at TellApart

TellApart’s Infrastructure Overview

•  Millions of daily active users

•  Page views across multiple sites

•  Real-Time Bidding integration
  - Very high volume, low latency
  - Response time: 50th percentile: 17 ms, 95th percentile: 50 ms

•  All requests require user data

•  Runs entirely on Amazon Web Services (AWS), in 2 parallel regions

Page 3: gevent at TellApart

What is gevent?

gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.

•  Essentially, it lets code written in an ordinary synchronous style run asynchronously

Page 4: gevent at TellApart

What is gevent?

lib·e·vent (ˈlib-i-ˈvent): efficient cross-platform library for executing callbacks when specific events occur or a timeout has been reached. Includes several networking libraries (e.g. DNS, HTTP).

green·let (ˈgrēn-lət): lightweight co-routines for in-process concurrent programming. Ported from Stackless Python as a library for the CPython interpreter.
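To make “lightweight co-routines” concrete, here is a minimal sketch using the `greenlet` package directly (it is installed as a gevent dependency). A greenlet runs until it explicitly calls `switch()`; gevent's hub automates exactly these switches, so everyday gevent code never calls `switch()` itself. The function names are illustrative only.

```python
from greenlet import greenlet

trace = []


def ping():
    trace.append("ping 1")
    pong_g.switch()         # explicitly hand control to the other greenlet
    trace.append("ping 2")  # resumes here when someone switches back


def pong():
    trace.append("pong 1")
    ping_g.switch()         # jump back into ping(), right after its switch()
    trace.append("pong 2")  # never reached: nothing switches back to pong


ping_g = greenlet(ping)
pong_g = greenlet(pong)
ping_g.switch()             # start ping(); control returns here when ping() ends
print(trace)                # ['ping 1', 'pong 1', 'ping 2']
```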

     

Page 5: gevent at TellApart

How does gevent work?

•  One gevent “hub” per process

•  Monkey-patch blocking libraries
  - socket, thread, select, etc.

•  Use greenlets like threads

•  Blocking calls switch to another (ready) greenlet
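A minimal sketch of these four points, assuming gevent is installed: `monkey.patch_all()` runs first, so the standard `time.sleep` becomes cooperative, and three greenlets that each "block" for 0.2 s finish in roughly 0.2 s in total rather than 0.6 s. `fake_db_lookup` is a hypothetical stand-in for real network IO.

```python
# Monkey-patch first, before other modules grab references to socket/time/thread.
from gevent import monkey
monkey.patch_all()

import time

import gevent


def fake_db_lookup(key):
    # time.sleep is now gevent's sleep, so this yields to the hub
    # instead of blocking the whole process.
    time.sleep(0.2)
    return key.upper()


start = time.time()
greenlets = [gevent.spawn(fake_db_lookup, k) for k in ("a", "b", "c")]
gevent.joinall(greenlets)  # the three "blocking" calls overlap
elapsed = time.time() - start

results = [g.value for g in greenlets]
print(results, round(elapsed, 2))
```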

Page 6: gevent at TellApart

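Example Server

[Slide image: the same server shown side by side, labeled “mod_wsgi:” and “gevent:”; the code itself is not preserved in this transcript.]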

Page 7: gevent at TellApart

Example Server

•  Server implementation is the same

•  DB lookup blocks on network IO

•  With gevent, the greenlet gets swapped out so that another request can be served

•  When the DB request finishes, the greenlet continues where it left off
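The slide's side-by-side listings did not survive extraction, so the following is only a hypothetical sketch of the idea, not TellApart's server: an ordinary WSGI function served by gevent's `pywsgi.WSGIServer`. `lookup_user` is an invented stand-in for the database call, simulated with `gevent.sleep`.

```python
import gevent
from gevent.pywsgi import WSGIServer


def lookup_user(user_id):
    # Hypothetical stand-in for a database query. A real driver would do
    # socket IO (made cooperative by monkey-patching); gevent.sleep plays
    # that role here so the example has no external dependencies.
    gevent.sleep(0.05)
    return {"id": user_id, "name": "user-%s" % user_id}


def application(environ, start_response):
    # Plain, synchronous-looking WSGI code: the same function could be
    # deployed under mod_wsgi unchanged.
    user_id = environ.get("QUERY_STRING") or "anonymous"
    user = lookup_user(user_id)  # this greenlet parks here; others keep running
    body = ("hello, %s\n" % user["name"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    # pywsgi handles each incoming connection in its own greenlet.
    WSGIServer(("127.0.0.1", port), application).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/?42` should return `hello, user-42`; while one request waits inside `lookup_user`, the hub is free to start handling others.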

 

Page 8: gevent at TellApart

Advantages

•  Write code as though it were synchronous (mostly)
  - No ‘callback spaghetti’ like with a callback framework
  - Exact same code can run synchronously (e.g. unit tests)

•  Greenlets are very lightweight
  - 100s or 1000s can run concurrently
  - No OS-level context switch
    o  Switching costs the same order of magnitude as a function call
  - No GIL contention (all greenlets share one OS thread)

•  Co-operative concurrency makes synchronization easy
  - Greenlets cannot be preempted
  - No need for in-process atomic locks
  - Often eliminates the need for synchronization
    o  As long as there are no blocking calls in the critical section
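A small sketch of the synchronization point, assuming only gevent: ten greenlets bump a shared counter without any lock. When the read-modify-write contains no yield, every update survives; moving a `gevent.sleep(0)` between the read and the write (a blocking call in the critical section) lets updates be lost. The counter and worker names are hypothetical.

```python
import gevent

counter = {"value": 0}


def safe_increment(n):
    for _ in range(n):
        # Read-modify-write with no blocking call in between: no other
        # greenlet can run here, so no lock is needed.
        counter["value"] = counter["value"] + 1
        gevent.sleep(0)  # yield point is outside the critical section


def racy_increment(n):
    for _ in range(n):
        current = counter["value"]
        gevent.sleep(0)  # yielding inside the critical section...
        counter["value"] = current + 1  # ...so other greenlets' updates are lost


def run(worker, greenlets=10, n=1000):
    counter["value"] = 0
    gevent.joinall([gevent.spawn(worker, n) for _ in range(greenlets)])
    return counter["value"]


print(run(safe_increment))  # 10000
print(run(racy_increment))  # far fewer than 10000
```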

Page 9: gevent at TellApart

Advantages (continued)

•  gevent is fast
  - Very thorough set of benchmarks by Nicholas Piël: http://nichol.as/benchmark-of-python-web-servers

“And then there is Gevent […] if you want to dive into high performance websockets with lots of concurrent connections you really have to go with an asynchronous framework. Gevent seems like the perfect companion for that, at least that is what we are going to use.”

Page 10: gevent at TellApart

Problems

•  Monkey-patching
  - Doesn’t play well with C extensions
    o  Blocking code in C libraries will cause the whole process to block
  - Can confuse some libraries
    o  e.g. thread-local storage

•  Breaks analysis tools
  - cProfile produces garbage
  - Alternative tools available
    o  gevent-profiler (Meebo)
    o  gevent_request_profiler (TellApart)

•  Co-operative scheduling
  - Rogue greenlets can tie up the entire process
    o  e.g. CPU-bound background worker
  - Long-running tasks have to periodically yield
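To illustrate the thread-local caveat, here is a minimal sketch using `gevent.local.local`, gevent's greenlet-local storage. As far as we understand gevent's thread patching, `monkey.patch_all()` substitutes this class for `threading.local`, so state a library assumed was per-thread becomes per-greenlet. The `handle` function and `request_state` object are hypothetical.

```python
import gevent
from gevent.local import local

# Greenlet-local storage: each greenlet sees its own attributes on this object.
request_state = local()


def handle(request_id):
    request_state.request_id = request_id
    gevent.sleep(0.01)  # other greenlets run and set their own request_id
    return request_state.request_id  # still this greenlet's own value


jobs = [gevent.spawn(handle, i) for i in range(5)]
gevent.joinall(jobs)
print([job.value for job in jobs])  # [0, 1, 2, 3, 4]
```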

Page 11: gevent at TellApart

Problems

•  Same server as before

•  Processing in the loop can take a long time

•  Can hurt the latency of other requests

•  Add ‘gevent.sleep(0)’ to the loop

•  Allows other greenlets to run
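A sketch of the fix, assuming gevent: `process` is a hypothetical CPU-bound loop, and a second "bystander" greenlet stands in for other requests. Without `gevent.sleep(0)` the bystander barely runs until the loop finishes; yielding every 1,000 items gives it roughly one turn per yield. The yield interval is an arbitrary choice that trades a little throughput for lower latency elsewhere.

```python
import gevent


def process(items, yield_every=None):
    # Hypothetical stand-in for the CPU-heavy loop from the slide.
    total = 0
    for i, item in enumerate(items):
        total += item * item
        if yield_every and i % yield_every == 0:
            gevent.sleep(0)  # let other ready greenlets (other requests) run
    return total


def count_bystander_turns(yield_every):
    """How many times another greenlet got to run while process() was busy."""
    turns = []

    def other_request():
        while True:
            turns.append(1)
            gevent.sleep(0)

    bystander = gevent.spawn(other_request)
    worker = gevent.spawn(process, range(100000), yield_every)
    worker.join()
    bystander.kill()
    return len(turns)


print(count_bystander_turns(None))  # a handful: starved until the loop ends
print(count_bystander_turns(1000))  # roughly one turn per yield
```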

Page 12: gevent at TellApart

Uses

•  We use gevent everywhere we use Python

•  TellApart Front End (TAFE)
  - gevent WSGI server with a micro-framework
  - One process per core
  - Nginx reverse proxy in front

•  Database Proxy (moxie)
  - Thrift service
  - Connection pooling across clients
  - Minimal additional latency (~2 ms)
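moxie's code is not shown in the talk; the following is only a hypothetical sketch of connection pooling across clients in gevent. A fixed set of connections lives in a `gevent.queue.Queue`, and each client greenlet borrows one, waiting cooperatively when none is idle. `ConnectionPool` and the fake connection dicts are invented for illustration.

```python
import gevent
from gevent.queue import Queue


class ConnectionPool(object):
    """Hypothetical pool sharing `size` connections among many client greenlets."""

    def __init__(self, factory, size):
        self._idle = Queue()
        for _ in range(size):
            self._idle.put(factory())

    def run(self, fn):
        conn = self._idle.get()  # if none are idle, this greenlet waits cooperatively
        try:
            return fn(conn)
        finally:
            self._idle.put(conn)  # hand the connection to the next waiter


# Usage: 20 concurrent "clients" share 3 fake connections.
stats = {"in_use": 0, "peak": 0}


def query(conn):
    stats["in_use"] += 1
    stats["peak"] = max(stats["peak"], stats["in_use"])
    gevent.sleep(0.01)  # simulated round trip on this connection
    stats["in_use"] -= 1
    return conn["id"]


ids = iter(range(3))
pool = ConnectionPool(lambda: {"id": next(ids)}, size=3)
clients = [gevent.spawn(pool.run, query) for _ in range(20)]
gevent.joinall(clients)
print(stats["peak"], sorted({c.value for c in clients}))  # 3 [0, 1, 2]
```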

Page 13: gevent at TellApart

Case Study – Taba

•  Taba is a distributed Event Aggregation Service

•  Provides near real-time metrics from across a cluster

•  At TellApart:
  - 10,000 individual Tabs
  - 100s of event source clients
  - 20,000,000 events / minute
  - 25 seconds of latency behind real time

Page 14: gevent at TellApart

Case Study – Taba

•  Implement timeouts very easily

•  Function doesn’t need to know it’s being timed
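A minimal sketch of the timeout pattern with `gevent.Timeout`, assuming gevent: `slow_lookup` is a hypothetical function with no timeout logic of its own, and `call_with_timeout` is an invented helper that arms a timer around it. gevent also offers `with Timeout(seconds):` and `gevent.with_timeout(...)` for the same purpose.

```python
import gevent
from gevent import Timeout


def slow_lookup(delay):
    # Ordinary function: it has no idea it is being timed.
    gevent.sleep(delay)
    return "value"


def call_with_timeout(seconds, fn, *args, default=None):
    timeout = Timeout(seconds)
    timeout.start()
    try:
        return fn(*args)
    except Timeout as t:
        if t is not timeout:
            raise  # someone else's timeout; not ours to swallow
        return default
    finally:
        timeout.cancel()


print(call_with_timeout(0.5, slow_lookup, 0.01))  # value
print(call_with_timeout(0.05, slow_lookup, 1.0))  # None
```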

Page 15: gevent at TellApart

Case Study – Taba

•  Perform simultaneous lookups to a sharded database

•  No thread pools

•  No need for locking
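A sketch of simultaneous shard lookups, assuming gevent: keys are grouped by a hypothetical `key % NUM_SHARDS` sharding rule, one greenlet per shard fetches its keys, and the results are merged afterwards in the calling greenlet, so no lock is needed. `FAKE_SHARDS` and `fetch_from_shard` are invented stand-ins for the real database client; the simulated round trips overlap, so three shards cost about one round trip of wall time.

```python
import gevent

NUM_SHARDS = 3
# Hypothetical in-memory shards: key k lives on shard k % NUM_SHARDS.
FAKE_SHARDS = {
    s: {k: "v%d" % k for k in range(30) if k % NUM_SHARDS == s}
    for s in range(NUM_SHARDS)
}


def fetch_from_shard(shard_id, keys):
    gevent.sleep(0.1)  # simulated network round trip to this shard
    data = FAKE_SHARDS[shard_id]
    return {k: data.get(k) for k in keys}


def multi_get(keys):
    by_shard = {}
    for k in keys:
        by_shard.setdefault(k % NUM_SHARDS, []).append(k)
    # One greenlet per shard: the lookups run concurrently, no thread pool.
    jobs = [gevent.spawn(fetch_from_shard, s, ks) for s, ks in by_shard.items()]
    gevent.joinall(jobs, raise_error=True)
    results = {}
    for job in jobs:
        results.update(job.value)  # merged in one greenlet, so no locking
    return results


print(multi_get([1, 2, 3, 4]))
```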

Page 16: gevent at TellApart

Case Study – Taba

•  Streaming from DB in batches

•  No thread pool

•  Trivial synchronization

•  Process data while the next batch is retrieved
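A sketch of batch streaming with prefetch, assuming gevent: a producer greenlet fetches the next batch while the consumer processes the current one, and a `gevent.queue.Queue(maxsize=1)` is the only synchronization. `fetch_batch` and the sleep-based timings are hypothetical stand-ins for the database and the real processing.

```python
import gevent
from gevent.queue import Queue

_DONE = object()  # sentinel marking the end of the stream


def fetch_batch(offset, size, total):
    gevent.sleep(0.05)  # simulated DB round trip
    return list(range(offset, min(offset + size, total)))


def producer(q, batch_size, total):
    offset = 0
    while offset < total:
        # put() waits cooperatively while a batch is already queued.
        q.put(fetch_batch(offset, batch_size, total))
        offset += batch_size
    q.put(_DONE)


def consume(batch_size=100, total=1000):
    q = Queue(maxsize=1)  # at most one batch prefetched ahead of the consumer
    gevent.spawn(producer, q, batch_size, total)
    result = 0
    while True:
        batch = q.get()
        if batch is _DONE:
            break
        result += sum(batch)
        gevent.sleep(0.05)  # simulated processing; the producer fetches meanwhile
    return result


print(consume())  # 499500
```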

Page 17: gevent at TellApart

Thank you!

Kevin Ballard, kevin(at)tellapart(dot)com