Streaming Movies brings you Streamlined Applications -- How Adopting Netflix Libraries can Improve Your Application or Service!

March, 2014

Andrew Spyker, STSM (@aspyker)
Michael Elder, STSM (@mdelder)

In this presentation, Andrew Spyker and I present our experience with adopting Netflix OSS, both from a deep runtime perspective for various applications and services and from the perspective of managing deployed services for scalability and failover.



Historical Context

Timeline: 2012, 2013, 2014, Beyond

• AcmeAir Cloud/Mobile Sample/Benchmark born
• Sample application cloud prize work
• AcmeAir Run On IBM Cloud at “Web Scale”
• Portability cloud prize work
• Scalable Services Fabric internally for IBM Services
• Landscaper & JazzHub adoption of NOSS libraries
• SoftLayer and BlueMix services

• “Technical indigestion as a service” (Adrian Cockcroft)
• 40+ OSS projects, expanding every day
• Focusing more on interactive mid-tier server technology today …

What is service operational excellence?

• Operating a 24/7 production public cloud service with paying customers means …
• High availability
  - No SPOFs, all components at least triple clustered
• Automatic Recovery
  - Partial failure should be recovered by the system
• Continuous Delivery
  - Changes delivered frequently with zero downtime
• Elastic Scalability
  - Solution can scale out easily
• Operational Visibility
  - Operators have a live view of the contextualized state of the system

[Diagram: Micro service Implementation. Labels: Web App Front End; Call “Auth Service”; Ribbon REST client with Eureka; App Service (auth-service), REST services; Execute auth-service; Fallback Implementation; Karyon & Archaius; Eureka Server(s), shown three times.]

Highly Available Service Runtime Recipe

Implementation Detail / Benefits

Decompose into micro services
  • Key user path always available
  • Failure does not propagate across service boundaries

Karyon w/ automatic Eureka registration using Archaius for property management
  • New instances are quickly found, failing individual instances disappear
  • Per deployment environment composite property sources provide dynamic behavior at runtime

Ribbon client with Eureka awareness
  • Load balances & retries across instances with “smarts”
  • Handles temporal instance failure

Hystrix as dependency circuit breaker
  • Allows for fast failure
  • Provides graceful cross service degradation/recovery

IaaS High Availability

[Diagram: Global Load Balancers; Region (Dallas); Datacenter (DAL06), DAL05; Local LBs; Web App, Auth Service, Booking Service; Cluster Auto Recovery and Scaling Services.]

Rule / Why?

Always > 2 of everything
  • 1 is a SPOF; 2 doesn’t scale to web scale and makes DR recovery slow

Including IaaS and cloud services
  • You’re only as strong as your weakest dependency

Use auto scaler/recovery monitoring
  • Clusters guarantee availability and service latency

Use application level health checks
  • Instance on the network != healthy

Our goals for this talk

• Change your perspective: as we move into hosted offerings, scalability and availability are critical to our business
• We’ll look at Netflix Runtime Libraries that you can adopt in your products today, regardless of whether you’re building a hosted service
• The Libraries promote good development practices and architectural patterns

Pre-req: Inversion of Control Pattern

• Inversion of Control or Dependency Injection is a software design pattern that defines a set of callback interfaces which are provided with their data or references when needed
• The client “inverts control” back to some container to tell it what references to use during execution
• Spring, Java Persistence API, Servlet, etc. are all forms of this pattern
• Netflix leverages Google Guice (https://…-guice/) and makes heavy use of injection for their lifecycle management library (Karyon)
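To make the pattern concrete, here is a minimal hand-wired sketch in plain Java. It deliberately does not use Guice or Karyon; the names `Notifier`, `RecordingNotifier`, and `BookingService` are hypothetical and exist only for illustration. The point is that the service receives its collaborator from the outside instead of constructing it, which is the job a container such as Guice would otherwise do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is hand-wired injection, not Guice.
interface Notifier {
    void send(String message);
}

// A test double that records messages instead of sending them.
final class RecordingNotifier implements Notifier {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String message) {
        sent.add(message);
    }
}

// The service never constructs its own Notifier; the caller supplies one.
final class BookingService {
    private final Notifier notifier;

    BookingService(Notifier notifier) {
        this.notifier = notifier;
    }

    void book(String flight) {
        notifier.send("Booked " + flight);
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        RecordingNotifier notifier = new RecordingNotifier();
        new BookingService(notifier).book("AA100");
        System.out.println(notifier.sent);
    }
}
```

Because the dependency arrives through the constructor, a test can hand in a recording or mock implementation without touching the service code.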

Lifecycle Management: Karyon

• To support microservices, it’s important to componentize your architecture into separate microservices
• But within a microservice, there are similar logically independent segments: EmailNotification, Database API, Storage API, etc.
• If you have a big ServletContextListener today which does a bunch of initialization, pay attention here: this will help you refactor that code into a more manageable approach
• Karyon defines @Application and @Component annotations
• Well defined, loosely coupled classes (e.g. Components) form the larger application
• Allowed us to reduce the monolithic ConfigurationService class from 1500 lines originally to about 65 lines
• Any new parts of the architecture should be defined as Components
• Also defines a “HealthCheck” type to be subclassed for each Application

    package com.urbancode.landscape.web.core;
    …
    @Application
    @Singleton
    public class LandscaperApplication {
        private static final Logger logger =
                LoggerFactory.getLogger(LandscaperApplication.class);

        // Components are injected by the container before initialize() runs
        @Inject private ServletContext context;
        @Inject private LocalStorage storage;
        @Inject private PersistenceConfig persistenceConfig;
        @Inject private ServiceBusClient serviceClient;
        @Inject private NotificationConfig notificationConfig;

        @PostConstruct
        public void initialize() { .. }
    }

Karyon: @Component

• @Components break up logical parts of the larger application (as expected)
• Components can declare other Components in their Injections (e.g. PersistenceConfig -> LocalStorage)
• Components are initialized **BEFORE** the Application (hence cannot depend on the Application)
• Components may choose to be @Singleton
• Components may declare @PreDestroy methods

    package com.urbancode.landscape;
    …
    @Component
    @Singleton
    public class LocalStorage {
        @Inject private ServletContext context;
        ..
        @PostConstruct
        public void initialize() { .. }
        …
    }

Karyon Admin Console

• Available at http://localhost:8077/ (just by launching Tomcat)
• Provides detailed information about the process including classpath, environment variables, properties, and a web-based JMX console

Configuration Management: Archaius

• Archaius defines a library for managing application properties
• Exposed automatically through the Karyon admin console
• Allows you to define “composite” sources:
  - pull from a properties file,
  - then a database,
  - then an environment-specific properties file
• Makes overriding properties much easier
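The idea of a composite source can be sketched in a few lines of plain Java. This is not the Archaius API; `CompositeProperties` is a hypothetical class, and the precedence order (the environment-specific source wins over the base file) is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of layered configuration; not the Archaius API.
final class CompositeProperties {
    // Sources are ordered from highest to lowest precedence (an assumption).
    private final List<Map<String, String>> sources;

    @SafeVarargs
    CompositeProperties(Map<String, String>... sources) {
        this.sources = Arrays.asList(sources);
    }

    // Returns the value from the first source that defines the key,
    // or the default when no source defines it.
    String get(String key, String defaultValue) {
        for (Map<String, String> source : sources) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> baseFile = new HashMap<>();
        baseFile.put("timeout", "1000");
        Map<String, String> envFile = new HashMap<>();
        envFile.put("timeout", "20000");

        CompositeProperties props = new CompositeProperties(envFile, baseFile);
        System.out.println(props.get("timeout", "0"));
    }
}
```

Overriding a property then means adding it to a higher-precedence source; the lower layers stay untouched.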

Dynamic Properties with Archaius

• Archaius also defines dynamic properties
• Dynamic property API updates properties for each request
• Allows you to update configuration without forcing a reboot
• Properties can be manipulated by JMX at runtime

    // Create the dynamic property
    DynamicStringProperty novaEndpoint =
            DynamicPropertyFactory.getInstance()
                    .getStringProperty("key", "default_value");

    // each new request will get the latest known value
    novaEndpoint.get();

Latency & Fault Tolerance: Hystrix

• Failure happens. Daily. Hourly. This minute even.
• Failure cannot be prevented, it can only be protected against through isolation
• When one dependency fails, it can cause cascading failures or chain reactions which have a much broader impact
• Release It! describes many patterns and antipatterns around stability and availability motivated by exactly this kind of use case

Bulkhead Pattern

• When a cascading failure or chain reaction occurs, the entire user experience can be destroyed by one bad actor
• In this case, or in similar examples where one or more dependencies all fail due to an upstream service, you want to isolate that failure so that it doesn’t impact the rest of the architecture
• We call this protection the “Bulkhead” pattern

Circuit Breaker Pattern

• When a failure occurs over and over again, a backend system may be down or experiencing too much load
• In the case of too much load, you can “shed load” to prevent further degradation
• We can “trip the circuit”, meaning that we avoid sending new requests after some failure threshold (defaults to 50%)
• We call this “Circuit Breaker”
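The mechanics can be sketched in plain Java. This is a simplified, hypothetical `SimpleCircuitBreaker`, not Hystrix's implementation: it trips once the failure ratio reaches the threshold (0.5 mirrors the 50% default mentioned above), rejects requests during a sleep window, then resets completely. A half-open state that lets a single trial request through is left out to keep the sketch short. Time is passed in explicitly so the behavior is easy to follow.

```java
// Hypothetical, simplified circuit breaker; not Hystrix's implementation.
final class SimpleCircuitBreaker {
    private final int minimumRequests;     // do not trip on tiny samples
    private final double failureThreshold; // e.g. 0.5 for 50%
    private final long sleepWindowMillis;  // how long the circuit stays open
    private int successes;
    private int failures;
    private long openedAt = -1;            // -1 means the circuit is closed

    SimpleCircuitBreaker(int minimumRequests, double failureThreshold,
                         long sleepWindowMillis) {
        this.minimumRequests = minimumRequests;
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // True if a request may be sent to the backend at time nowMillis.
    synchronized boolean allowRequest(long nowMillis) {
        if (openedAt < 0) {
            return true;
        }
        if (nowMillis - openedAt >= sleepWindowMillis) {
            // Sleep window elapsed: close the circuit and count afresh.
            openedAt = -1;
            successes = 0;
            failures = 0;
            return true;
        }
        return false; // circuit is open: shed load
    }

    synchronized void recordSuccess() {
        successes++;
    }

    synchronized void recordFailure(long nowMillis) {
        failures++;
        int total = successes + failures;
        if (total >= minimumRequests
                && (double) failures / total >= failureThreshold) {
            openedAt = nowMillis; // trip the circuit
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 0.5, 1000);
        breaker.recordFailure(0);
        breaker.recordFailure(1);
        System.out.println(breaker.allowRequest(2));
    }
}
```

A caller checks `allowRequest` before each backend call and reports the outcome afterwards; while the circuit is open the backend gets time to recover.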

Fail Fast Pattern

• When a circuit is “open” (e.g. the backend service is failing consistently), any new requests are immediately rejected or the fallback mechanism is invoked
• We call this failing fast.
• Enables you to respond to failure in a predictable way by implementing fallbacks in each command

Encapsulating Service Dependencies: HystrixCommand<R>

• Each command extends a common type
• The constructor configures Group ID, Command ID, and other settings
• A run method implements the behavior

    public class OSGetAccessTokenCommand extends HystrixCommand<Access> {
        private static final HystrixCommandKey GET_ACCESS_TOKEN_CMD_KEY =
                HystrixCommandKey.Factory.asKey("GetAccessToken");

        public OSGetAccessTokenCommand(…) {
            // Group key selects the resource pool; command key names this command
            super(Setter.withGroupKey(
                        HystrixCommandGroupKey.Factory.asKey(KEYSTONE_GROUP_ID))
                    .andCommandKey(GET_ACCESS_TOKEN_CMD_KEY)
                    .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()));
            ...
        }
        ...
        @Override
        protected Access run() throws Exception { ... }
    }

More than your money’s worth

• Commands with the same Group ID share a dedicated resource pool (more on that in a bit)
• Commands have configurable timeouts which are automatically enforced
• Commands can define fallbacks in case of timeout or dependency failure
• Supports optimizations such as Request Collapsing and Request Caching

    // queue() runs the command asynchronously and returns a Future
    Future<Access> request = new OSGetAccessTokenCommand(
            identityEndpoint.get(),
            identityUsername.get(),
            identityPassword.get())
        .queue();
    Access credentialToken = request.get();
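What an enforced timeout with a fallback amounts to can be sketched with nothing but `java.util.concurrent`. This is a hypothetical helper (`TimedCall.callWithFallback`), not how Hystrix is implemented internally: the task runs on a pool, the caller waits a bounded time, and a fallback value is returned on timeout or failure.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; a sketch of timeout plus fallback, not the Hystrix API.
final class TimedCall {
    // Runs the task on the pool and returns the fallback if the task
    // throws or does not finish within timeoutMillis.
    static <T> T callWithFallback(ExecutorService pool, Callable<T> task,
                                  long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true); // stop a task that is still running
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String token = callWithFallback(pool, () -> {
                Thread.sleep(5000); // simulate a slow dependency
                return "real-token";
            }, 100, "fallback-token");
            System.out.println(token);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller's latency is bounded by the timeout no matter how slow the dependency becomes.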

In case of emergency …

• Implement getFallback() to handle failures when they occur
• Options might include returning mock data or non-personalized data
• Optionally check for the failure reason with getFailedExecutionException() (timeout, exception, etc.)
• When using Jersey for a REST API, use Response as your response type, and decide the proper Response in getFallback()

    public class UCDGetComponentsCommand extends UCDAbstractCommand<JSONArray> {
        @Override
        protected JSONArray getFallback() {
            JSONArray componentsArray = new JSONArray();
            // Return mock data in case of failure
            for (String name : new String[] {
                    "JKE Web", "JKE Database", "Mortgage App", "JPetStore Web" }) {
                try {
                    MockData.getInstance()
                            .createComponentResource(componentsArray, name);
                } catch (JSONException e) { … }
            }
            return componentsArray;
        }
    }

Methods of Isolation for Commands

• Hystrix supports two forms of Isolation: Thread Pool and Semaphore
• Thread Pool is the easiest to understand: each Group has its own dedicated Thread Pool, and when requests come in while the thread pool is full, they fail fast
• Semaphore is useful if you want Isolation but don’t want to break your existing Threading model, for example session filters which configure ThreadLocals for Servlets or Jersey Resources

    public abstract class UCLAbstractDBCommand<R> extends UCLAbstractCommand<R> {

        public UCLAbstractDBCommand(HystrixCommand.Setter setter) {
            super(setter.andCommandPropertiesDefaults(
                    HystrixCommandProperties.Setter()
                            .withExecutionIsolationStrategy(
                                    ExecutionIsolationStrategy.SEMAPHORE)));
        }

        @Override
        protected final R run() throws Exception {
            return doRun();
        }

        // Runs in the Jersey Resource’s thread
        // Leverages ThreadLocals from Hibernate Session Filter
        protected abstract R doRun() throws Exception;
    }
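The essence of semaphore isolation can be shown in plain Java. This is a hypothetical `SemaphoreIsolation` class, not Hystrix code: the work runs on the caller's own thread (so ThreadLocals remain visible), and a non-blocking `tryAcquire` rejects work immediately once the concurrency limit is reached.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of semaphore isolation; not Hystrix's implementation.
final class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task on the caller's thread, so ThreadLocals stay visible.
    // Returns the fallback immediately when no permit is free (fail fast),
    // and also when the task throws.
    <T> T execute(Callable<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return task.call();
        } catch (Exception e) {
            return fallback;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        ThreadLocal<String> session = new ThreadLocal<>();
        session.set("hibernate-session");
        SemaphoreIsolation isolation = new SemaphoreIsolation(10);
        System.out.println(isolation.execute(session::get, "no-session"));
    }
}
```

The trade-off against a thread pool is that a semaphore cannot enforce a timeout on the running task, because there is no separate thread to abandon.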

When to throw in the towel … (configuring timeouts, etc)

• Virtually all of the HystrixCommand’s options can be configured at runtime through Archaius
• Notable examples include the Thread Pool limits and timeouts
• See the Hystrix wiki for a full accounting of what you can do
• Set these properties dynamically through Karyon’s JMX console and watch command behavior change on the fly

    # or landscaper-[env-name].properties
    # configure default timeout milliseconds
    hystrix.command.GetAccessToken.execution.isolation.thread.timeoutInMilliseconds=20000
    hystrix.command.OSGetImages.execution.isolation.thread.timeoutInMilliseconds=8000
    …
    # let no more than 20 KeystoneGroup commands wait in the thread pool queue
    hystrix.threadpool.KeystoneGroup.maxQueueSize=20
    # wait 15 seconds if we “trip” the circuit for a command
    hystrix.command.GetAccessToken.circuitBreaker.sleepWindowInMilliseconds=15000

Embedding Unit Tests

• An approach promoted by Netflix: it reduces friction and adds few bytes relative to third-party libraries
• Makes it easy to write commands: the UnitTest becomes the test harness and verification
• Always write testSuccess() and testFailure() cases to ensure expected behavior

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotNull;

    public class OSDeployEnvironmentCommand
            extends OSAbstractOrchestrationCommand<Response> {

        // The test lives inside the command class it verifies
        public static class UnitTest {
            @Test
            public void testSuccess() throws InterruptedException,
                    ExecutionException, JSONException {
                …
                Future<Response> request =
                        new OSDeployEnvironmentCommand(…).queue();
                Response response = request.get();
                assertNotNull(response);
                assertEquals(HttpStatus.SC_OK, response.getStatus());
            }
        }
    }

Bootstrapping Karyon – Advanced Topic

• If your commands leverage Karyon, it’s possible to bootstrap Karyon as part of a UnitTest
• Requires a little more setup, but enables you to have control over the dependency injection from the Google Guice container
• Allows you to create Mock classes for things like ServletContext or other classes, if your Jersey Resource or Hystrix Command needs them

    protected static KaryonServer karyonServer;
    protected static Injector injector;

    protected static void configureKaryon(List<String> packages)
            throws Exception {
        // Bootstrap Karyon only once for the whole test run
        if (karyonServer == null) {
            ConfigurationManager.loadPropertiesFromResources("");
            …
            karyonServer = new KaryonServer();
            injector = karyonServer.initialize();
            karyonServer.start();
            …

REST API with Ribbon

• A microservice architecture is generally built around REST-based interfaces
• Ribbon provides an HTTP client for calling service dependencies
• Provides more visibility into behaviors like connection timeouts, auto-retry and number of retries through Archaius properties
• Also supports client-side load balancing through Eureka

    ConfigurationManager.loadPropertiesFromResources("");
    // Look up the named client configured in the loaded properties
    RestClient heatAPIClient =
            (RestClient) ClientFactory.getNamedClient("heat-api-client");

    URI uri = new URI("/v1/" + token.getToken().getTenant().getId() + "/stacks");
    HttpRequest request = HttpRequest.newBuilder().uri(uri)
            .verb(Verb.POST)
            .entity(getEntity().toString().getBytes())
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .header("User-Agent", "python-heatclient")
            .build();
    HttpResponse response = heatAPIClient.executeWithLoadBalancer(request);
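What client-side load balancing with retry means can be sketched in plain Java. This is not the Ribbon API; `RoundRobinClient` is a hypothetical class that rotates through a fixed server list and moves on to the next server when a call throws. The server names in the usage example are made up, and Ribbon's real rules and Eureka-driven server lists are considerably richer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical round-robin balancer with retry; not the Ribbon API.
final class RoundRobinClient {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinClient(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        this.servers = servers;
    }

    // Picks servers in rotation; floorMod keeps the index valid
    // even after the counter overflows.
    String chooseServer() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Tries the call on successive servers, up to maxAttempts in total.
    <T> T execute(Function<String, T> call, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.apply(chooseServer());
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next server
            }
        }
        throw new IllegalStateException("all attempts failed", last);
    }

    public static void main(String[] args) {
        RoundRobinClient client =
                new RoundRobinClient(Arrays.asList("host-a:8004", "host-b:8004"));
        System.out.println(client.execute(server -> "called " + server, 2));
    }
}
```

The caller supplies only a relative request; which server receives it is decided at call time, which is what lets failing instances drop out without the caller noticing.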

Tying it all into a bow …

• Ribbon is generally used with HystrixCommands to execute requests
• Ribbon will automatically discover available servers and load balance between them
• Here the example has an explicit list of servers, but we can change the settings to use Eureka for service discovery

    # configuration settings
    # Interval to refresh the server list from the source
    heat-api-client.ribbon.ServerListRefreshInterval=2000
    # Connect timeout used by Apache HttpClient
    heat-api-client.ribbon.ConnectTimeout=3000
    heat-api-client.ribbon.listOfServers=localhost:8004,

    // No need to reference an explicit server, Ribbon finds one
    URI uri = new URI("/v1/" + token.getToken().getTenant().getId() + "/stacks");
    HttpRequest request = HttpRequest.newBuilder().uri(uri)
            .verb(Verb.POST)
            ...
    HttpResponse response = heatAPIClient.executeWithLoadBalancer(request);

Leveraging Hystrix’s Built-In Event Stream

AIM Scalable Services Fabric NetflixOSS Port to SoftLayer

Netflix OSS / IBM port/enablement

Netflix “Zen” of Cloud
  • Worked with initial services to enable cloud native arch
  • Worked with initial services to enable NetflixOSS usages
  • Created scorecard and tests for “cloud native” readiness

Highly Available IaaS and Cloud Services
  • Deployment across multiple IBM SoftLayer IaaS datacenters and global and local load balancers
  • Complete automation via IBM SoftLayer IaaS APIs
  • Ensured facilities for automatic failure recovery

Micro-service Runtimes (Karyon, Eureka Client, Ribbon, Hystrix, Archaius)
  • Ported to work with IBM SoftLayer IaaS and on the WebSphere Liberty Profile application server
  • Created “eureka-sidecar” for non-Java runtimes and ElasticSearch discovery

Netflix OSS Servers (Asgard, Eureka Server, Turbine)
  • Ported to work with IBM SoftLayer IaaS + RightScale
  • Operationalized HA and secure deployments for multiple service tenants

Adopted Chaos Testing
  • Ported Chaos Monkey to IBM SoftLayer IaaS
  • Performed manual Chaos Gorilla validation on services

Worked through devops tool chain
  • Worked with initial services to enable continuous delivery with devops (and image baking via an Aminator-like tool)
  • Working through integration with Urban Code Deploy and other IBM continuous delivery tools

Demo of NetflixOSS and Netflix services on SoftLayer

[Diagram: SoftLayer GLB/LLB and Datacenters. Labels: Global Load Balancers; Region (Dallas); Datacenter (DAL06), DAL05; Local LBs; Web App, Auth Service, Booking Service; Cluster Auto Recovery and Scaling Services.]



About Your Architecture

• Architecture should support DevOps principles such as staged rollout, operational insights, and scriptability
• Each resource provides some very practical advice for building systems which are focused on reliability and feedback loops

Release It!: Design and Deploy Production-Ready


