Netflix Global Cloud Architecture

Globally Distributed Cloud Applications at Netflix, October 2012, Adrian Cockcroft, @adrianco, #netflixcloud, http://www.linkedin.com/in/adriancockcroft

Upload: adrian-cockcroft

Post on 15-Jan-2015


DESCRIPTION

Latest version of Netflix Architecture presentation, variants presented several times during October 2012

TRANSCRIPT

Page 1: Netflix Global Cloud Architecture

Globally Distributed Cloud Applications at Netflix

October 2012
Adrian Cockcroft
@adrianco #netflixcloud

http://www.linkedin.com/in/adriancockcroft

Page 2: Netflix Global Cloud Architecture

Adrian Cockcroft

• Director, Architecture for Cloud Systems, Netflix Inc.
  – Previously Director for Personalization Platform
• Distinguished Availability Engineer, eBay Inc. 2004-7
  – Founding member of eBay Research Labs
• Distinguished Engineer, Sun Microsystems Inc. 1988-2004
  – 2003-4 Chief Architect High Performance Technical Computing
  – 2001 Author: Capacity Planning for Web Services
  – 1999 Author: Resource Management
  – 1995 & 1998 Author: Sun Performance and Tuning
  – 1996 Japanese Edition of Sun Performance and Tuning
    • SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ) [SPARC & Solaris Performance Tuning (SunSoft Press Series)]
• More
  – Twitter @adrianco
  – Blog http://perfcap.blogspot.com
  – Presentations at http://www.slideshare.net/adrianco

Page 3: Netflix Global Cloud Architecture

The Netflix Streaming Service

Now in USA, Canada, Latin America, UK, Ireland, Sweden, Denmark, Norway and Finland

Page 4: Netflix Global Cloud Architecture

US Non-Member Web Site
Advertising and Marketing Driven

Page 5: Netflix Global Cloud Architecture

Member Web Site
Personalization Driven

Page 6: Netflix Global Cloud Architecture

Streaming Device API

Netflix Ready Devices
From: May 2008
To: May 2010

Page 7: Netflix Global Cloud Architecture

Content Delivery Service
Distributed storage nodes controlled by Netflix cloud services

Page 8: Netflix Global Cloud Architecture

Abstract  

• Netflix on Cloud – What, Why and When

•  Globally  Distributed  Architecture  

•  Open  Source  Components  

Page 9: Netflix Global Cloud Architecture

Why  Use  Cloud?      

Page 10: Netflix Global Cloud Architecture

Things  we  don’t  do  

Page 11: Netflix Global Cloud Architecture

What Netflix Did

• Moved to SaaS
  – Corporate IT – OneLogin, Workday, Box, Evernote…
  – Tools – Pagerduty, AppDynamics, EMR (Hadoop)
• Built our own PaaS
  – Customized to make our developers productive
  – Large scale, global, highly available, leveraging AWS
• Moved incremental capacity to IaaS
  – No new datacenter space since 2008 as we grew
  – Moved our streaming apps to the cloud

Page 12: Netflix Global Cloud Architecture

Keeping up with Developer Trends

Trend                                        In production at Netflix
• Big Data/Hadoop                            2009
• AWS Cloud                                  2009
• Application Performance Management         2010
• Integrated DevOps Practices                2010
• Continuous Integration/Delivery            2010
• NoSQL                                      2010
• Platform as a Service; Fine grain SOA      2010
• Social coding, open development/github     2011

Page 13: Netflix Global Cloud Architecture

AWS  specific  feature  dependence….      

Page 14: Netflix Global Cloud Architecture

Portability vs. Functionality

• Portability – the Operations focus
  – Avoid vendor lock-in
  – Support datacenter based use cases
  – Possible operations cost savings
• Functionality – the Developer focus
  – Less complex test and debug, one mature supplier
  – Faster time to market for your products
  – Possible developer time/cost savings

Page 15: Netflix Global Cloud Architecture

Functional PaaS

• IaaS base - all the features of AWS
  – Very large scale, mature, global, evolving rapidly
  – ELB, Autoscale, VPC, SQS, EIP, EMR, etc.
  – E.g. Large files (TB) and multipart writes in S3
• Functional PaaS – Netflix added features
  – Continuous build/deploy, SOA, HA patterns
  – Asgard console, Monkeys, Big data tools
  – Cassandra/Zookeeper data store automation

Page 16: Netflix Global Cloud Architecture

How Netflix Works

[Diagram labels: Consumer Electronics: Customer Device (PC, PS3, TV…). AWS Cloud Services: Web Site or Discovery API, User Data, Personalization, Streaming API, DRM, QoS Logging, CDN Management and Steering, Content Encoding. CDN Edge Locations: OpenConnect CDN Boxes.]

Page 17: Netflix Global Cloud Architecture

Component  Services  (Simplified  view  using  AppDynamics)  

Page 18: Netflix Global Cloud Architecture

Web Server Dependencies Flow
(Home page business transaction as seen by AppDynamics)

[Diagram legend: Start Here, memcached, Cassandra, Web service, S3 bucket]

Page 19: Netflix Global Cloud Architecture

One  Request  Snapshot  (captured  because  it  was  unusually  slow)  

Page 20: Netflix Global Cloud Architecture

Current Architectural Patterns for Availability

• Isolated Services
  – Resilient Business logic
• Three Balanced Availability Zones
  – Resilient to Infrastructure outage
• Triple Replicated Persistence
  – Durable distributed Storage
• Isolated Regions
  – US and EU don’t take each other down

Page 21: Netflix Global Cloud Architecture

Isolated  Services  Test  With  Chaos  Monkey,  Latency  Monkey  

Page 22: Netflix Global Cloud Architecture

Three Balanced Availability Zones
Test with Chaos Gorilla

[Diagram: Load Balancers; Zone A, Zone B and Zone C, each with Cassandra and Evcache Replicas]

Page 23: Netflix Global Cloud Architecture

Triple Replicated Persistence
Cassandra maintenance affects individual replicas

[Diagram: Load Balancers; Zone A, Zone B and Zone C, each with Cassandra and Evcache Replicas]

Page 24: Netflix Global Cloud Architecture

Isolated Regions

[Diagram: US-East Load Balancers with Cassandra Replicas in Zone A, Zone B and Zone C; EU-West Load Balancers with Cassandra Replicas in Zone A, Zone B and Zone C]

Page 25: Netflix Global Cloud Architecture

Failure Modes and Effects

Failure Mode          Probability   Mitigation Plan
Application Failure   High          Automatic degraded response
AWS Region Failure    Low           Wait for region to recover
AWS Zone Failure      Medium        Continue to run on 2 out of 3 zones
Datacenter Failure    Medium        Migrate more functions to cloud
Data store failure    Low           Restore from S3 backups
S3 failure            Low           Restore from remote archive

Page 26: Netflix Global Cloud Architecture

Netflix Deployed on AWS

[Diagram columns, with the year shown under each:]
• Content (2009): Content Management, EC2 Encoding, S3 Petabytes
• Logs (2009): S3 Terabytes, EMR, Hive & Pig, Business Intelligence
• Play (2010): DRM, CDN routing, Bookmarks, Logging
• WWW (2010): Sign-Up, Search Solr, Movie Choosing, Ratings
• API (2010): Metadata, Device Config, TV Movie Choosing, Social Facebook
• CS (2011): International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics

[Other diagram labels: CDNs, ISPs, Terabits, Customers]

Page 27: Netflix Global Cloud Architecture

Cloud Architecture Patterns

Where  do  we  start?  

Page 28: Netflix Global Cloud Architecture

Datacenter to Cloud Transition Goals

• Faster
  – Lower latency than the equivalent datacenter web pages and API calls
  – Measured as mean and 99th percentile
  – For both first hit (e.g. home page) and in-session hits for the same user
• Scalable
  – Avoid needing any more datacenter capacity as subscriber count increases
  – No central vertically scaled databases
  – Leverage AWS elastic capacity effectively
• Available
  – Substantially higher robustness and availability than datacenter services
  – Leverage multiple AWS availability zones
  – No scheduled down time, no central database schema to change
• Productive
  – Optimize agility of a large development team with automation and tools
  – Leave behind complex tangled datacenter code base (~8 year old architecture)
  – Enforce clean layered interfaces and re-usable components
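
The "Faster" goal is measured as mean and 99th percentile latency. The sketch below shows why both are tracked: a couple of slow requests barely move the mean but dominate the 99th percentile. The sample latencies and the nearest-rank percentile method are assumptions chosen for illustration, not Netflix's measurement tooling.

```python
import math
import statistics


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer pct) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow outliers.
latencies_ms = [20] * 98 + [400, 900]

mean_ms = statistics.mean(latencies_ms)
p99_ms = percentile_nearest_rank(latencies_ms, 99)
print(f"mean={mean_ms:.1f} ms, p99={p99_ms} ms")
```

With these made-up numbers the mean stays close to the typical request while the 99th percentile exposes the slow tail.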

Page 29: Netflix Global Cloud Architecture

Netflix Datacenter vs. Cloud Arch

Datacenter                   Cloud
Central SQL Database         Distributed Key/Value NoSQL
Sticky In-Memory Session     Shared Memcached Session
Chatty Protocols             Latency Tolerant Protocols
Tangled Service Interfaces   Layered Service Interfaces
Instrumented Code            Instrumented Service Patterns
Fat Complex Objects          Lightweight Serializable Objects
Components as Jar Files      Components as Services

Page 30: Netflix Global Cloud Architecture

Cassandra  on  AWS  

A highly available and durable deployment pattern

Page 31: Netflix Global Cloud Architecture

Cassandra Service Pattern

[Diagram labels: Service REST Clients; Data Access REST Service with Astyanax Cassandra Client; Cassandra Cluster Managed by Priam, Between 6 and 72 nodes; Datacenter Update Flow; AppDynamics Service Flow Visualization]

Page 32: Netflix Global Cloud Architecture

Production Deployment
Totally Denormalized Data Model

• Over 50 Cassandra Clusters
• Over 500 nodes
• Over 30TB of daily backups
• Biggest cluster 72 nodes
• 1 cluster over 250K writes/s

Page 33: Netflix Global Cloud Architecture

Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware

1. Client writes to local coordinator
2. Coordinator writes to other zones
3. Nodes return ack
4. Data written to internal commit log disks (no more than 10 seconds later)

If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.

[Diagram: Token Aware Clients writing to six Cassandra nodes with disks, two each in Zone A, Zone B and Zone C; arrows are numbered with the steps above]
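
The choice between waiting for one node, a quorum, or all nodes can be sketched as a simple ack count. This is a minimal illustration, not the Astyanax or Cassandra API: the function names and the level names "ONE", "QUORUM" and "ALL" are assumptions used for the example, with one replica per zone as on the slide.

```python
def required_acks(consistency, replication_factor):
    """Number of replica acks a write waits for at each consistency level."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency}")


def write_succeeds(consistency, replica_up):
    """replica_up holds one boolean per replica (True means the node acks)."""
    acks = sum(replica_up)
    return acks >= required_acks(consistency, len(replica_up))


# One replica per zone (A, B, C); the node in zone C is offline.
replicas = [True, True, False]
for level in ("ONE", "QUORUM", "ALL"):
    print(level, write_succeeds(level, replicas))
```

With replication factor 3, a quorum write still succeeds when one zone is down; the slide's hinted handoff is what later brings the missing replica up to date.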

Page 34: Netflix Global Cloud Architecture

Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum

1. Client writes to local replicas
2. Local write acks returned to Client which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)

If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.

[Diagram: US Clients and EU Clients, each writing to six Cassandra nodes with disks, two each in Zone A, Zone B and Zone C of their own region; arrows are numbered with the steps above; a link labeled 100+ms latency]
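
The effect of Local Quorum on client latency can be sketched with a few numbers: the client continues once 2 of 3 local nodes have acked, while the remote copies arrive later. The ack times below are invented for illustration (only the "100+ms" figure comes from the slide), and the function is a hypothetical model, not Cassandra code.

```python
def local_quorum_wait_ms(local_ack_ms, remote_ack_ms):
    """Return (time the client waits, time until every replica has acked)."""
    quorum = len(local_ack_ms) // 2 + 1
    client_wait = sorted(local_ack_ms)[quorum - 1]  # quorum-th fastest local ack
    fully_replicated = max(local_ack_ms + remote_ack_ms)
    return client_wait, fully_replicated


# Assumed ack times in ms: local zones answer quickly, remote-region
# replicas answer only after crossing the 100+ ms inter-region link.
local = [2, 3, 9]
remote = [105, 107, 110]
client_wait, replicated = local_quorum_wait_ms(local, remote)
print(f"client continues after {client_wait} ms; "
      f"all replicas acked after {replicated} ms")
```

The client is never blocked on the inter-region link, which is the point of using a local quorum for multi-region writes.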

Page 35: Netflix Global Cloud Architecture

ETL  for  Cassandra  

• Data is de-normalized over many clusters!
• Too many to restore from backups for ETL
• Solution – read backup files using Hadoop
• Aegisthus
  – http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
  – High throughput raw SSTable processing
  – Re-normalizes many clusters to a consistent view
  – Extract, Transform, then Load into Teradata

Page 36: Netflix Global Cloud Architecture

Benchmarks  and  Scalability  

Page 37: Netflix Global Cloud Architecture

Cloud Deployment Scalability
New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s

Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)

Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
41.0    104.2    149.0   171.8   215.8    562.0
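
The "7m40s" figure can be checked from the two timestamps, and an average launch rate follows from it. This is plain arithmetic on the numbers quoted on the slide; the average rate is a derived value, not something the slide states.

```python
from datetime import datetime

# Timestamps quoted on the slide for scaling from zero to 500 instances.
start = datetime.strptime("21:38:52", "%H:%M:%S")
end = datetime.strptime("21:46:32", "%H:%M:%S")
elapsed_s = (end - start).total_seconds()

instances = 500
rate_per_min = instances / elapsed_s * 60

minutes, seconds = divmod(int(elapsed_s), 60)
print(f"elapsed: {minutes}m{seconds}s ({elapsed_s:.0f} s)")
print(f"average launch rate: {rate_per_min:.1f} instances/minute")
```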

Page 38: Netflix Global Cloud Architecture

Scalability from 48 to 288 nodes on AWS
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

[Chart: Client Writes/s by node count – Replication Factor = 3. Data points: 174373, 366828, 537172, 1099837 writes/s. X axis: node count, 0 to 350. Y axis: writes/s, 0 to 1200000.]

Used 288 of m1.xlarge: 4 CPU, 15 GB RAM, 8 ECU
Cassandra 0.8.6
Benchmark config only existed for about 1hr
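
How close the benchmark comes to linear scaling can be checked from the four data points. The slide only states the endpoints (48 and 288 nodes); the intermediate node counts of 96 and 144 are an assumption taken from the cited blog post, so treat the middle rows as illustrative.

```python
# (node count, client writes/s). 48 and 288 are stated on the slide;
# 96 and 144 are assumed from the cited benchmarking blog post.
results = [(48, 174373), (96, 366828), (144, 537172), (288, 1099837)]

base_nodes, base_rate = results[0]
speedups = []
for nodes, rate in results:
    per_node = rate / nodes
    speedup = rate / base_rate   # measured throughput gain over 48 nodes
    ideal = nodes / base_nodes   # gain expected from perfectly linear scaling
    speedups.append(speedup)
    print(f"{nodes:>3} nodes: {per_node:6.0f} writes/s per node, "
          f"speedup {speedup:.2f}x (ideal {ideal:.2f}x)")
```

Under these assumptions, per-node throughput stays roughly constant as the cluster grows, which is what linear scalability means here.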

Page 39: Netflix Global Cloud Architecture

Cassandra  on  AWS  

The Past
• Instance: m2.4xlarge
• Storage: 2 drives, 1.7TB
• CPU: 8 Cores, 26 ECU
• RAM: 68GB
• Network: 1Gbit
• IOPS: ~500
• Throughput: ~100Mbyte/s
• Cost: $1.80/hr

The Future
• Instance: hi1.4xlarge
• Storage: 2 SSD volumes, 2TB
• CPU: 8 HT cores, 35 ECU
• RAM: 64GB
• Network: 10Gbit
• IOPS: ~100,000
• Throughput: ~1Gbyte/s
• Cost: $3.10/hr
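
A quick ratio makes the comparison between the two instance types concrete. The IOPS and hourly cost figures are the approximate values from the slide; "dollars per 1000 IOPS" is a derived metric introduced here for illustration only.

```python
# Approximate figures from the slide for the two instance types.
instances = {
    "m2.4xlarge": {"iops": 500, "cost_per_hr": 1.80},
    "hi1.4xlarge": {"iops": 100_000, "cost_per_hr": 3.10},
}


def dollars_per_1000_iops(spec):
    """Hourly cost of 1000 IOPS of capacity on this instance type."""
    return spec["cost_per_hr"] / spec["iops"] * 1000


for name, spec in instances.items():
    print(f"{name}: ${dollars_per_1000_iops(spec):.3f} per hour per 1000 IOPS")

old, new = instances["m2.4xlarge"], instances["hi1.4xlarge"]
iops_ratio = new["iops"] / old["iops"]
cost_ratio = new["cost_per_hr"] / old["cost_per_hr"]
print(f"{iops_ratio:.0f}x the IOPS for {cost_ratio:.2f}x the hourly cost")
```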

Page 40: Netflix Global Cloud Architecture

Cassandra Disk vs. SSD Benchmark
Same Throughput, Lower Latency, Half Cost

Page 41: Netflix Global Cloud Architecture

Availability  and  Resilience  

Page 42: Netflix Global Cloud Architecture

Chaos Monkey
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html

• Computers (Datacenter or AWS) randomly die
  – Fact of life, but too infrequent to test resiliency
• Test to make sure systems are resilient
  – Allow any instance to fail without customer impact
• Chaos Monkey hours
  – Monday-Friday 9am-3pm random instance kill
• Application configuration option
  – Apps now have to opt-out from Chaos Monkey
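
The rules above (weekday working hours only, opt-out per application, random victim) can be sketched in a few lines. This is not the open-source Chaos Monkey code: the function names, the fleet layout and the instance IDs are hypothetical, and the sketch only selects a victim rather than terminating anything.

```python
import random
from datetime import datetime


def in_chaos_hours(now):
    """True on Monday-Friday between 9am and 3pm."""
    return now.weekday() < 5 and 9 <= now.hour < 15


def pick_victim(instances_by_app, opted_out, now, rng=random):
    """Pick one random instance to kill, or None outside chaos hours."""
    if not in_chaos_hours(now):
        return None
    candidates = [inst
                  for app, app_instances in instances_by_app.items()
                  if app not in opted_out  # apps are included unless they opt out
                  for inst in app_instances]
    return rng.choice(candidates) if candidates else None


# Hypothetical fleet: "billing" has opted out, "api" has not.
fleet = {"api": ["i-api-1", "i-api-2"], "billing": ["i-bill-1"]}
victim = pick_victim(fleet, opted_out={"billing"},
                     now=datetime(2012, 10, 3, 10, 30),  # a Wednesday morning
                     rng=random.Random(42))
print("terminate:", victim)
```

Restricting kills to working hours means engineers are at their desks when a weakness is exposed.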

Page 43: Netflix Global Cloud Architecture

Responsibility  and  Experience  

• Make developers responsible for failures
  – Then they learn and write code that doesn’t fail
• Use Incident Reviews to find gaps to fix
  – Make sure it’s not about finding “who to blame”
• Keep timeouts short, fail fast
  – Don’t let cascading timeouts stack up
• Make configuration options dynamic
  – You don’t want to push code to tweak an option
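
A dynamic configuration option is one that running code re-reads on every use, so an operator can change it without a deployment. The sketch below illustrates the idea with an in-memory store; the class names and the property name are hypothetical and this is not the API of any Netflix library.

```python
import threading


class PropertyStore:
    """In-memory stand-in for a central configuration service."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def set(self, name, value):
        with self._lock:
            self._values[name] = value

    def lookup(self, name, default):
        with self._lock:
            return self._values.get(name, default)


class DynamicProperty:
    """A named setting that is looked up again on every read."""

    def __init__(self, store, name, default):
        self._store, self._name, self._default = store, name, default

    def get(self):
        return self._store.lookup(self._name, self._default)


store = PropertyStore()
timeout_ms = DynamicProperty(store, "ratings.client.timeout_ms", 500)
before = timeout_ms.get()  # default until someone overrides it
store.set("ratings.client.timeout_ms", 200)
after = timeout_ms.get()   # new value is visible without a code push
print(before, after)
```

Because callers hold the property object rather than a copied value, a shortened timeout takes effect on the next request.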

Page 44: Netflix Global Cloud Architecture

Resilient Design – Circuit Breakers
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
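
A circuit breaker stops calling a failing dependency for a while and returns a fallback instead, which keeps timeouts from stacking up. The sketch below is a generic, minimal version of the pattern and not the implementation described in the linked post; the thresholds, names and fallback text are assumptions.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated errors, then retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after_s=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: skip the dependency entirely
            self.opened_at = None  # cool-down over: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


calls = []


def flaky():
    calls.append(1)
    raise TimeoutError("dependency too slow")


breaker = CircuitBreaker(max_failures=3, reset_after_s=60.0)
responses = [breaker.call(flaky, lambda: "default recommendations")
             for _ in range(5)]
print(len(calls), responses[-1])
```

Here the dependency is attempted three times; the last two requests get the fallback immediately, which is the automatic degraded response listed in the failure modes table.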

Page 45: Netflix Global Cloud Architecture

Distributed Operational Model

• Developers
  – Provision and run their own code in production
  – Take turns to be on call if it breaks (pagerduty)
  – Configure autoscalers to handle capacity needs
• DevOps and PaaS (aka NoOps)
  – DevOps is used to build and run the PaaS
  – PaaS constrains Dev to use automation instead
  – PaaS puts more responsibility on Dev, with tools

Page 46: Netflix Global Cloud Architecture

Culture  

Page 47: Netflix Global Cloud Architecture

Unconventional Culture
See culture deck at http://jobs.netflix.com

• Brave/Aggressive from the top down
• Focus on talent density above everything
• Reduce process, remove complexity
• Freedom and Responsibility
• One product focus for the whole company
• (almost) full information sharing across co.
• Simplified managers role

Page 48: Netflix Global Cloud Architecture

Managers  Role  

• Hiring, Architecture, Project Management
• No vacation policy to track
• (Almost) no remote employees or contractors
• No bonuses to allocate
• No expenses to approve
• Pay mark to market handled at VP level

Page 49: Netflix Global Cloud Architecture

Netflix Organization
DevOps Org Reporting into Product Group, not ITops

CEO – Reed Hastings
  CPO – Chief Product Officer – Neil Hunt
    VP - Cloud and Platform Engineering - Yury

[Org chart: teams under the VP, with the labels shown in each box]
• Architecture: Future planning, Security Arch, Efficiency / AWS VPC, Hyperguard / Powerpoint :-)
• Platform and Persistence Engineering: Base Platform, Zookeeper / Cassandra Ops / AWS Instances
• Cloud Solutions: Monitoring, Monkeys, Build Tools / AWS Instances, AWS API
• Cloud Ops Reliability Engineering: Alert Routing, Incident Lifecycle / PagerDuty
• Personalization Platform and Performance Eng: Metadata, Benchmarking, Memcached / AWS Instances
• Membership and Billing: Data sources, Vault processing / Cassandra
• Data Science Platform: Business Intelligence / Hadoop on EMR

Page 50: Netflix Global Cloud Architecture

Build  Your  Own  PaaS  

Page 51: Netflix Global Cloud Architecture

Components  

• Continuous build framework turns code into AMIs
• AWS accounts for test, production, etc.
• Cloud access gateway
• Service registry
• Configuration properties service
• Persistence services
• Monitoring, alert forwarding
• Backups, archives

Page 52: Netflix Global Cloud Architecture

Netflix Open Source Strategy

• Release PaaS Components git-by-git
  – Source at github.com/netflix – we build from it…
  – Intros and techniques at techblog.netflix.com
  – Blog post or new code every few weeks
• Motivations
  – Give back to Apache licensed OSS community
  – Motivate, retain, hire top engineers
  – “Peer pressure” code cleanup, external contributions

Page 53: Netflix Global Cloud Architecture

Instance creation

[Diagram labels: Bakery & Build tools, Base AMI, Application Code, Image baked; Asgard, Autoscaling scripts, Odin, ASG / Instance started; Instance, Instance Running]

Page 54: Netflix Global Cloud Architecture

Application Launch

[Diagram labels: Eureka, Entrypoints, Archaius, Registering, configuration; Governator (Guice), Async logging, Servo, Application initializing]

Page 55: Netflix Global Cloud Architecture

Runtime

[Diagram labels: Managing service: Priam, Exhibitor, Explorers. Calling other services: NIWS LB, Astyanax, Curator, Dependency Command, REST client. Resiliency aids: Chaos Monkey, Latency Monkey, Janitor Monkey, Cass JMeter.]

Page 56: Netflix Global Cloud Architecture

Open Source Projects

Legend: Github / Techblog, Apache Contributions, Techblog Post, Coming Soon

• Priam – Cassandra as a Service
• Astyanax – Cassandra client for Java
• CassJMeter – Cassandra test suite
• Cassandra Multi-region EC2 datastore support
• Aegisthus – Hadoop ETL for Cassandra
• Explorers
• Governator - Library lifecycle and dependency injection
• Odin – Workflow orchestration
• Async logging
• Exhibitor – Zookeeper as a Service
• Curator – Zookeeper Patterns
• EVCache – Memcached as a Service
• Eureka / Discovery – Service Directory
• Archaius – Dynamics Properties Service
• EntryPoints
• Server-side latency/error injection
• REST Client + mid-tier LB
• Configuration REST endpoints
• Servo and Autoscaling Scripts
• Honu – Log4j streaming to Hadoop
• Circuit Breaker – Robust service pattern
• Asgard - AutoScaleGroup based AWS console
• Chaos Monkey – Robustness verification
• Latency Monkey
• Janitor Monkey
• Bakeries and AMI
• Build dynaslaves

Page 57: Netflix Global Cloud Architecture

Roadmap  for  2012  

• More resiliency and improved availability
• More automation, orchestration
• “Hardening” the platform, code clean-up
• Lower latency for web services and devices
• IPv6 – now running in prod, rollout in process
• More open sourced components
• See you at AWS Re:Invent in November…

Page 58: Netflix Global Cloud Architecture

Takeaway    

Netflix has built and deployed a scalable global Platform as a Service.

Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.

http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix

http://www.linkedin.com/in/adriancockcroft

@adrianco #netflixcloud

Page 59: Netflix Global Cloud Architecture

Amazon Cloud Terminology Reference
See http://aws.amazon.com/
This is not a full list of Amazon Web Service features

• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
  – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
  – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
  – Reserved Instances – pre-paid to reduce cost for long term usage
  – Availability Zone – datacenter with own power and cooling hosting cloud instances
  – Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
• RDS – Relational Database Service (managed MySQL master and slaves)
• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
• DirectConnect – secure pipe from AWS VPC to external datacenter
• IAM – Identity and Access Management (fine grain role based security keys)