MongoDB at MapMyFitness

9/14/12

Upload: mapmyfitness

Post on 30-Jun-2015

DESCRIPTION

MongoDB is one of our primary data stores, and we use it heavily. Early last year our DevOps lead, Chris Merz, submitted some of our use cases to 10gen (http://www.10gen.com/events) as fodder for a presentation at the MongoDB conference in Boulder. The presentation went well enough that 10gen asked him to give it again in San Francisco, in Seattle, and once more in Boulder. Hopefully there are some nuggets in this deck that can help you in your quest to dominate MongoDB.

TRANSCRIPT

Page 1: MongoDB at MapMyFitness

9/14/12  

Page 2: MongoDB at MapMyFitness

Route & Elevation data example (Lost on the way to MongoSeattle)

Page 3: MongoDB at MapMyFitness

Implementation Patterns

• Standard Datastore - 3 member replica set (small to medium implementations)

• Big Data implementation - sharded cluster (TB+)

• Buffering Layer - high memory (load all data and index files into RAM)

• Write Heavy - utilize sharding to optimize for writes

• Read Heavy - 3+n replica set configuration for rapid read scaling (up to 12 nodes)

Page 4: MongoDB at MapMyFitness

Implementation Patterns

• In the cloud, tune the instance type to the mongo implementation

• On iron, plan carefully and dedicate servers completely to mongo to avoid memory map contention

• For DR, spin up a delayed, hidden replica node (preferably in a different datacenter)

• Aggregation framework can be used in myriad ways, including bridging the gap to SQL data warehousing via ETL

• Automate install patterns for rapid development, prototyping, and infrastructure scaling
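The delayed, hidden DR node mentioned above comes down to a handful of fields on one replica set member. The sketch below only builds that member document as a plain dictionary. The host name, member id, and one-hour delay are made-up values, and `slaveDelay` is the field name used by the 2.x releases this deck covers (MongoDB 5.0 and later call it `secondaryDelaySecs`).

```python
def delayed_hidden_member(member_id, host, delay_seconds=3600):
    """Build a replica set member document for a delayed, hidden DR node."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,                # a delayed member must never become primary
        "hidden": True,               # keeps application reads away from stale data
        "slaveDelay": delay_seconds,  # 2.x-era field name (assumption: adjust for your version)
    }


# Hypothetical host in a second datacenter.
member = delayed_hidden_member(5, "dr-db01.example.net:27017")
print(member)
```

The document would be appended to the `members` array of the replica set configuration before a reconfig; how far to delay depends on how quickly an operator mistake would be noticed.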

Page 5: MongoDB at MapMyFitness

Operational Automation ( example of automated mongodb install via puppet )

Page 6: MongoDB at MapMyFitness

Replica Set Expansion

• MongoDB is "replication made elegant"

• Ridiculously simple to add additional members

• Be sure to run InitialSync from a secondary!
  rs.add( { "host" : "livetrack_db09", "initialSync" : { "state" : 2 } } )

• Both rs.add() and rs.remove() can be scripted and connected to monitoring systems for autoscaling
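As a rough illustration of scripting member additions and removals, the sketch below edits a replica set configuration document the way a monitoring hook might before sending it back with the `replSetReconfig` admin command. It works on plain dictionaries so it can be tried without a cluster. The set name and the first two hosts are invented; only `livetrack_db09` comes from the slide.

```python
import copy


def add_member(config, host):
    """Return a copy of `config` with `host` appended and the version bumped."""
    new_config = copy.deepcopy(config)
    if any(m["host"] == host for m in new_config["members"]):
        raise ValueError(f"{host} is already a member")
    next_id = max((m["_id"] for m in new_config["members"]), default=-1) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] += 1  # a reconfig needs a higher version number
    return new_config


def remove_member(config, host):
    """Return a copy of `config` without `host` and with the version bumped."""
    new_config = copy.deepcopy(config)
    remaining = [m for m in new_config["members"] if m["host"] != host]
    if len(remaining) == len(new_config["members"]):
        raise ValueError(f"{host} is not a member")
    new_config["members"] = remaining
    new_config["version"] += 1
    return new_config


# Hypothetical current configuration.
current = {
    "_id": "livetrack",
    "version": 3,
    "members": [
        {"_id": 0, "host": "livetrack_db07"},
        {"_id": 1, "host": "livetrack_db08"},
    ],
}
expanded = add_member(current, "livetrack_db09")
print(expanded)
# With pymongo, the result could be applied on the primary with something like:
#   client.admin.command("replSetReconfig", expanded)
```

Keeping the edit separate from the network call makes the autoscaling logic easy to test and review before it is allowed to touch a live set.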

Page 7: MongoDB at MapMyFitness

Monitoring and Introspection

• MMS, 10gen's cloud-based monitoring service (best available)

• Supported by Zabbix, Nagios, Munin, Server Density, etc.

• mongostat, mongotop, REST interface, database profiler

• Monitoring system triggers can initiate node additions, removals, service restarts, etc.

• In addition to service-level monitoring, use more advanced tests to check for and alert on query latency spikes
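One way to picture the "more advanced tests" for latency spikes: time a representative query on a schedule, keep a short history, and alert when the newest sample is far above the recent median. The sketch below is an assumption about how such a check might look, not the check used in the deck. The threshold factor, the floor, and the `probe` callable are all hypothetical.

```python
import time
from statistics import median


def measure_ms(probe):
    """Run `probe` (any zero-argument callable, e.g. a test query) and return elapsed milliseconds."""
    start = time.perf_counter()
    probe()
    return (time.perf_counter() - start) * 1000.0


def is_latency_spike(history_ms, latest_ms, factor=3.0, floor_ms=50.0):
    """Flag `latest_ms` when it exceeds both an absolute floor and `factor` times the recent median."""
    if not history_ms:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(history_ms)
    return latest_ms > floor_ms and latest_ms > factor * baseline


recent = [10.0, 12.0, 11.0, 9.0, 13.0]  # made-up samples in milliseconds
print(is_latency_spike(recent, 120.0))
print(is_latency_spike(recent, 20.0))
```

The absolute floor keeps a very fast query from paging someone when it goes from 1 ms to 4 ms.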

Page 8: MongoDB at MapMyFitness

10gen's MMS (the one-stop shop for mongodb metrics)

Page 9: MongoDB at MapMyFitness

Mongo in Zabbix ( Mikoomi Plugins: http://code.google.com/p/mikoomi )

Page 10: MongoDB at MapMyFitness

mongostat ( Very useful for real-time troubleshooting )

Page 11: MongoDB at MapMyFitness

Operational Automation ( example of automated mongodb restart action )

Page 12: MongoDB at MapMyFitness

Security Considerations

• MongoDB provides authentication support and basic permissions

• Auth is turned off by default to allow for optimal performance

• Always run databases in a trusted network environment

• Lock down host-based firewalls to limit access to required clients

• Automate iptables with puppet or chef; in EC2 use security groups

Page 13: MongoDB at MapMyFitness

Network Security Automation

## Puppet Pattern for Mongodb network security
class iptables::public {
  iptables::add_rule { '001 MongoDB established':
    rule => '-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT'
  }
  iptables::add_rule { '002 MongoDB':
    rule => '-A RH-Firewall-1-INPUT -i eth1 -p tcp -m tcp --dport 27017 -j ACCEPT'
  }
  iptables::add_rule { '003 MongoDB MMF Phase II Network':
    rule => '-A RH-Firewall-1-INPUT -i eth0 -s 172.16.16.0/20 -p tcp -m tcp --dport 27017 -j ACCEPT'
  }
  iptables::add_rule { '004 MongoDB MMF Cloud Network':
    rule => '-A RH-Firewall-1-INPUT -i eth0 -s 10.178.52.0/24 -p tcp -m tcp --dport 27017 -j ACCEPT'
  }
}
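When the list of client networks changes often, the rule strings in the Puppet class above can be generated rather than typed. The following is a small sketch of that idea in Python, assuming the same chain name and port as the slide. The helper name is hypothetical, and the output is only the rule text, which would still be handed to Puppet, Chef, or iptables separately.

```python
import ipaddress


def mongo_accept_rule(interface, source=None, port=27017, chain="RH-Firewall-1-INPUT"):
    """Render one iptables ACCEPT rule for MongoDB traffic on `interface`."""
    parts = ["-A", chain, "-i", interface]
    if source is not None:
        # ip_network raises ValueError for a malformed CIDR or one with host bits set.
        network = ipaddress.ip_network(source)
        parts += ["-s", str(network)]
    parts += ["-p", "tcp", "-m", "tcp", "--dport", str(port), "-j", "ACCEPT"]
    return " ".join(parts)


# The two source networks shown on the slide.
for cidr in ("172.16.16.0/20", "10.178.52.0/24"):
    print(mongo_accept_rule("eth0", cidr))
```

Validating each CIDR before it reaches the firewall catches typos that would otherwise surface as a failed rule load on every node.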

Page 14: MongoDB at MapMyFitness

Security Considerations

• Use the rule of least privilege to allow access to environments

• Data sensitivity should determine the extent of security measures

• For non-sensitive data, good network security can be sufficient

• In open environments, be sure experience matches access level

• Lack of granular perms allows for full admin access; use discretion

Page 15: MongoDB at MapMyFitness

Maintenance

• Far less maintenance required than traditional RDBMS systems

• Regularly perform query profile analysis and index auditing

• Rebuild databases to reclaim space lost due to fragmentation

• Automate checks of log files for known red flags

• Regularly review data throughput rate, storage growth rate, and overall business growth graphs to inform capacity planning

• For HA testing, periodically step down the primary to force failover
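The automated log check can be as simple as a set of named regular expressions run over each new batch of log lines. The sketch below shows the shape of such a scanner. The patterns and the sample lines are invented for illustration and are not real mongod log formats, so the pattern table would need to be filled in from red flags actually seen in your own logs.

```python
import re

# Hypothetical red-flag patterns; replace with strings observed in your own logs.
RED_FLAGS = {
    "assertion": re.compile(r"assert", re.IGNORECASE),
    "replication_error": re.compile(r"replSet.*error", re.IGNORECASE),
}


def scan_log(lines, patterns=RED_FLAGS):
    """Return {flag name: [1-based line numbers]} for every pattern that matched."""
    hits = {name: [] for name in patterns}
    for number, line in enumerate(lines, start=1):
        for name, pattern in patterns.items():
            if pattern.search(line):
                hits[name].append(number)
    return hits


# Made-up sample lines, not actual mongod output.
sample = [
    "sample line: connection accepted",
    "sample line: Assertion failure in storage",
    "sample line: replSet error during sync",
]
print(scan_log(sample))
```

Returning line numbers rather than a yes/no answer lets the alert include the context an on-call engineer needs.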

Page 16: MongoDB at MapMyFitness

Indexing Patterns or “Know Your App”

• Proper indexing is critical to performance at scale (monitor slow queries to catch non-performant requests)

• MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself, choose wisely)

• Avoid un-indexed queries at all costs (it's the quickest way to crater your app... consider --notablescan)

• Onus is on DevOps to match application to indexes (know your query profile, never assume)

• Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
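To make the covered-query idea concrete: a query can be answered from an index alone only when every field it filters on and every field it returns is part of that index, which usually means excluding `_id` in the projection. The helper below is a deliberately simplified sketch of that rule for auditing query shapes offline. `is_covered` is a hypothetical function, not a MongoDB API, and it ignores cases such as multikey indexes and embedded documents. The authoritative answer for a real query comes from its explain() output.

```python
def is_covered(index_fields, query_fields, projection):
    """Simplified test of whether a query shape could be covered by one index."""
    if not projection:
        return False  # without a projection, whole documents are returned
    if any(not include for field, include in projection.items() if field != "_id"):
        return False  # an exclusion projection returns every other field
    returned = {field for field, include in projection.items() if include}
    if projection.get("_id", 1):
        returned.add("_id")  # _id comes back unless it is explicitly excluded
    if not returned:
        return False  # {"_id": 0} alone still returns all other fields
    indexed = set(index_fields)
    return set(query_fields) <= indexed and returned <= indexed


# Hypothetical index on (user_id, created).
index = ["user_id", "created"]
print(is_covered(index, ["user_id"], {"_id": 0, "created": 1}))
print(is_covered(index, ["user_id"], {"created": 1}))
```

The second call differs only in leaving `_id` in the result, which is the most common reason an otherwise well-indexed query is not covered.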

Page 17: MongoDB at MapMyFitness

Capped Collections

• Use standard capped collections for retaining a fixed amount of data. Uses a FIFO strategy for pruning. (based on data size, not number of rows)

• TTL Collections (2.2) age out data based on a retention time configuration. (great for data retention requirements of all types)

Gotcha! Explicitly create the capped collection before any data is put into the system, to avoid auto-creation of an ordinary (uncapped) collection
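The two retention styles above map onto two small pieces of configuration: a `create` command with `capped` and `size` for a capped collection, and an index carrying `expireAfterSeconds` for a TTL collection. The sketch below only builds those documents. The collection name, field name, sizes, and index name are placeholders, and how the documents are sent to the server depends on the driver and server version in use.

```python
def capped_collection_command(name, size_bytes, max_documents=None):
    """Build the `create` command document for a capped collection."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    command = {"create": name, "capped": True, "size": size_bytes}
    if max_documents is not None:
        command["max"] = max_documents  # optional cap on document count, in addition to size
    return command


def ttl_index_spec(field, retention_seconds):
    """Build an index specification that expires documents `retention_seconds` after `field`."""
    return {
        "key": {field: 1},
        "name": f"{field}_ttl",  # hypothetical index name
        "expireAfterSeconds": retention_seconds,
    }


# Placeholder names and sizes.
print(capped_collection_command("recent_events", 512 * 1024 * 1024))
print(ttl_index_spec("created_at", 30 * 24 * 3600))
# With pymongo, roughly equivalent calls would be:
#   db.create_collection("recent_events", capped=True, size=512 * 1024 * 1024)
#   db.sessions.create_index("created_at", expireAfterSeconds=30 * 24 * 3600)
```

Running the capped-collection creation as an explicit deployment step, before the application writes anything, is what avoids the gotcha described above.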

Page 18: MongoDB at MapMyFitness

Lessons Learned

• A Mongo 2.2 upgrade of a database containing a capped collection created in 1.8.4 severely impacted replication (RC: no "_id" index, FIX: add "_id" index)

• Never start mongo when a mount point is missing or incorrectly configured. Mongo may decide to take matters into its own hands and resync itself with the replica set. Make sure your devops and your hosting provider admins are aware of this

• Some drivers that use connection pooling can freak the freaky freak when the primary member changes (older pymongo). Kicking the application can fix it; also: upgrade drivers

• High locked % is a big red flag, and can be caused by a large number of simultaneous DML actions (high insert rate, high update rate). Consider this in the design phase.

• Be wary of automation that can change the state of a node during maintenance mode. Disable automation agents for reduced risk during critical administrative operations (filesystem maintenance, etc.)
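The mount-point lesson lends itself to a guard in whatever script starts the database: refuse to launch unless the data volume is really mounted and the data directory sits on it. The sketch below shows one such pre-flight check using only the standard library. The paths are placeholders and the function is an illustration, not part of any MongoDB tooling.

```python
import os


def safe_to_start(dbpath, mount_point):
    """Return True only when `mount_point` is mounted and `dbpath` is a directory beneath it."""
    mount_point = os.path.realpath(mount_point)
    dbpath = os.path.realpath(dbpath)
    if not os.path.ismount(mount_point):
        return False  # volume missing: starting now could trigger a full resync
    if not os.path.isdir(dbpath):
        return False
    return os.path.commonpath([dbpath, mount_point]) == mount_point


# Placeholder paths; a wrapper script would exit non-zero instead of starting mongod.
if not safe_to_start("/data/mongodb/db", "/data/mongodb"):
    print("data volume not mounted; refusing to start")
```

The same check is worth wiring into configuration-management agents so they do not restart the service while the filesystem is offline for maintenance.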

Page 19: MongoDB at MapMyFitness

9/14/12