designing application for scale and high availability

Date: September 13, 2016

Designing applications for scale and high availability

By Venkateswarlu Yerramalli (Architect, Intuit India Development Centre)

Agenda

✤ Business needs in modern world

✤ Characteristics of legacy products

✤ How to scale?

✤ Applying the principles

✤ Challenges in decomposing applications

✤ Techniques to overcome the challenges

✤ Greenfield projects

✤ Closing notes

✤ Q&A

Business needs in modern world

✤ Speed to market

✤ Global customer base

✤ Rapid experimentation

✤ Insights from analytical data

✤ Optimise IT costs

Characteristics of legacy products

✤ Built as monolithic applications

✤ Spaghetti code, lack of clean boundaries

✤ Monolithic persistence

✤ Bad technology choices

How to scale?

Applying the principles

Define functional abstractions

Force cross functional interactions through the new abstractions

Decompose code under the abstractions into independent deployable units

Decompose data store based on the functional boundaries

Scale horizontally

Scale by sharding

Monolith

Repeat

Challenges in decomposing applications

✤ Identifying the correct functional abstractions

✤ Applications depending on shared state

✤ Lack of unit tests to support refactoring

✤ Bloated transactions

✤ Decrease in uptime

✤ Increased response times

✤ Complexity in monitoring and debugging

✤ Complexity in functional testing

Techniques to overcome the challenges

Identifying the correct functional abstractions

✤ Model the domain concepts (e.g., UserAccount, Wallet, etc.)

✤ Apply SOLID (Single responsibility, Open/Closed, Liskov substitution, Interface segregation, Dependency inversion) principles to refine the abstractions

✤ Hide the implementation details from the abstractions

✤ Litmus test for your abstractions
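As an illustration of hiding implementation details behind a domain abstraction, here is a minimal sketch built on the Wallet concept mentioned above. The method names and the in-memory implementation are assumptions made for the example, not part of the original talk. One possible litmus test, again an assumption, is whether the implementation can be swapped (for instance, for a remote service client) without any caller changing.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: callers see wallet operations only, never storage details.
interface Wallet {
    BigDecimal balance(String userId);

    void credit(String userId, BigDecimal amount);
}

// One possible implementation; it could later be replaced by a client for a
// separately deployed wallet service without touching the callers.
final class InMemoryWallet implements Wallet {
    private final Map<String, BigDecimal> balances = new HashMap<>();

    @Override
    public BigDecimal balance(String userId) {
        // Unknown users are treated as having a zero balance.
        return balances.getOrDefault(userId, BigDecimal.ZERO);
    }

    @Override
    public void credit(String userId, BigDecimal amount) {
        if (amount.signum() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balances.merge(userId, amount, BigDecimal::add);
    }
}
```

Code that depends only on `Wallet` does not need to know whether balances live in a map, a database table, or another service.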

Applications depending on shared state

✤ Examples of shared state

✤ Session management classes, utilities

✤ Commonly cached information (e.g., User info and context, static information)

✤ Eliminate shared state as soon as possible

✤ Implement bridge classes until the application is completely decomposed into services
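A bridge class can be sketched roughly as follows. All names here (`LegacySessionStore`, `UserContext`, `LegacyUserContextBridge`) are hypothetical and only illustrate the idea: new code depends on an abstraction, while the bridge keeps reading the legacy shared state until a real service replaces it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical legacy shared state: a static session map referenced across the monolith.
final class LegacySessionStore {
    static final Map<String, String> SESSIONS = new ConcurrentHashMap<>();

    private LegacySessionStore() {}
}

// The abstraction that newly written code depends on.
interface UserContext {
    String userIdFor(String sessionToken);
}

// Bridge: implements the abstraction on top of the legacy shared state.
// Once a user/session service exists, a service-backed implementation can
// replace this class without changing any caller.
final class LegacyUserContextBridge implements UserContext {
    @Override
    public String userIdFor(String sessionToken) {
        return LegacySessionStore.SESSIONS.get(sessionToken);
    }
}
```

The direct references to the shared state are thereby confined to one class, which makes the eventual removal a local change.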

Lack of unit tests

✤ Add unit tests for the new functional abstractions

✤ Focus on the core capabilities of the abstractions

✤ Consider adding integration tests along with unit tests

Bloated Transactions

saveCheckoutDetails(String userid, Address address, List<Items> items) {
    // acquire handle to persistence
    // Begin transaction
    // Insert into addresses (user_id, address_details) values (userid, address);
    // Iterate over items, check availability and insert into orders table
    // Insert into audittable add event ("user modified", "added shipping address");
    // Insert into audittable add event ("order created", "List of items");
    // End transaction
}

saveCheckoutDetails(String userid, Address address, List<Items> items) {
    // acquire handle to persistence
    // Begin transaction
    // call UserManagement.addAddress(userid, address)
    // call InventoryManagement.checkAvailability(items)
    // call OrderManagement.createOrder(userid, items)
    // End transaction
}

Improving system uptime

✤ Have loosely coupled systems (evaluate integration models: synchronous calls vs. event-driven); the choice has an impact on the user experience

✤ Be resilient to dependent system failures

✤ Implement circuit breaker and bulkhead patterns (use Hystrix)

✤ Fall back to secondary instances or cached values

✤ Consider having hot standbys (consider the additional cost)

✤ Consider graceful degradation over total unavailability
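To make the circuit breaker and fallback ideas concrete, here is a deliberately small, hand-rolled sketch. It is not the Hystrix API; the class name, the threshold, and the open-window parameters are assumptions chosen for illustration. A production system would more likely rely on a library such as Hystrix, as the slide suggests.

```java
import java.util.function.Supplier;

// Minimal illustrative circuit breaker (not the Hystrix API).
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long the circuit stays open
    private int consecutiveFailures = 0;
    private long openedAt = 0L;
    private boolean open = false;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // The circuit counts as open only while the open window has not elapsed;
    // afterwards the next call is let through as a trial.
    synchronized boolean isOpen() {
        return open && System.currentTimeMillis() - openedAt < openMillis;
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // fail fast: do not touch the unhealthy dependency
        }
        try {
            T result = primary.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get(); // degrade gracefully, e.g. serve a cached value
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        open = false;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = System.currentTimeMillis();
        }
    }
}
```

A caller might wrap a remote lookup as `breaker.call(() -> remoteCall(), () -> cachedValue)`, where both suppliers are placeholders for application code. Once the threshold is reached, later calls skip the remote dependency until the open window elapses.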

Reducing response times

✤ Reduce chatty services; consider using coarse-grained services

✤ Cache frequently used data to avoid network calls

✤ Minimise hops across datacenters to serve a single request

✤ Implement timeouts to guard against rogue services

✤ Deploy closer to where your customers are located

✤ Use CDNs for static content (e.g., Akamai, Amazon CloudFront, etc.)

✤ Identify and scale out services that are underperforming
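The timeout guideline above can be sketched with the standard `java.util.concurrent` API. The helper name `callWithTimeout` and the choice to return a fallback value (rather than propagate an error) are assumptions made for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutGuard {
    private TimeoutGuard() {}

    // Runs the remote call on the given pool and waits at most timeoutMillis.
    // On timeout or failure the fallback (e.g. a cached value) is returned.
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> remoteCall,
                                 long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call so it does not hold a thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return fallback;
        }
    }
}
```

The caller's wait is bounded by the timeout, so a slow downstream service cannot stall the user-facing request indefinitely.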

Improving monitoring and debugging

✤ Tag all service calls for a single user request with a unique identifier

✤ Monitor each service separately; monitoring only the UI availability is not sufficient

✤ Have a health check dashboard

✤ Establish SLAs and thresholds (e.g., SLA = 1500 concurrent users per sec; threshold = 750 concurrent users (50% of SLA)) for each service and raise alerts when the threshold is reached

✤ Use a single APM tool such as New Relic or AppDynamics across the system; it helps with aggregation and an end-to-end view
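Tagging every service call of one user request with a unique identifier could look roughly like the sketch below. The header name `X-Request-Id` and the `RequestTrace` helper are assumptions for illustration; the talk does not prescribe a specific mechanism.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class RequestTrace {
    static final String HEADER = "X-Request-Id"; // assumed header name
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private RequestTrace() {}

    // Called when a request enters a service: reuse the caller's ID if one
    // was sent, otherwise create a new one at the edge.
    static String begin(Map<String, String> inboundHeaders) {
        String id = inboundHeaders.get(HEADER);
        if (id == null || id.isEmpty()) {
            id = UUID.randomUUID().toString();
        }
        CURRENT.set(id);
        return id;
    }

    // Headers to attach to every downstream call made while serving this request.
    static Map<String, String> outboundHeaders() {
        Map<String, String> headers = new HashMap<>();
        String id = CURRENT.get();
        if (id != null) {
            headers.put(HEADER, id);
        }
        return headers;
    }

    // Called when the request completes, so pooled threads do not leak IDs.
    static void end() {
        CURRENT.remove();
    }
}
```

If each service also writes this identifier into its log lines, the logs of all services involved in one user request can be joined on it during debugging. Note that a `ThreadLocal` only covers work done on the request thread; asynchronous hand-offs would need the ID passed explicitly.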

Testing distributed systems

✤ Unit test the functional abstractions

✤ Use mocks and simulators to test services (use tools like WireMock, or custom-built simulators for advanced use cases)

✤ Focus on testing resiliency and graceful degradation (use tools like Netflix Chaos Monkey and other tools from the Netflix Simian Army)

✤ Automate services testing
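A custom-built simulator for testing graceful degradation might be as simple as the following sketch. `InventoryService`, `InventorySimulator`, and `ProductPage` are hypothetical names, and the "OOS-" prefix convention is invented for the example. The simulator can be switched into a failing mode so the caller's degraded behaviour can be checked without a real outage.

```java
// Hypothetical dependency of the code under test.
interface InventoryService {
    boolean isAvailable(String sku);
}

// Simulator with a switch to inject failures on demand.
final class InventorySimulator implements InventoryService {
    private boolean failing = false;

    void setFailing(boolean failing) {
        this.failing = failing;
    }

    @Override
    public boolean isAvailable(String sku) {
        if (failing) {
            throw new IllegalStateException("simulated outage");
        }
        // Invented convention: SKUs starting with "OOS-" are out of stock.
        return !sku.startsWith("OOS-");
    }
}

// Code under test: degrades to a neutral label when inventory is unreachable.
final class ProductPage {
    private final InventoryService inventory;

    ProductPage(InventoryService inventory) {
        this.inventory = inventory;
    }

    String availabilityLabel(String sku) {
        try {
            return inventory.isAvailable(sku) ? "In stock" : "Out of stock";
        } catch (RuntimeException e) {
            return "Availability unknown"; // page still renders during the outage
        }
    }
}
```

A test can then call `setFailing(true)` on the simulator and assert that the page still returns a usable label.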

Greenfield Projects

✤ Identify areas/modules which require rapid change

✤ Design to scale out and not scale up, across every layer

✤ Identify data sharding strategies from the beginning
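One of the simplest sharding strategies is to route each record by a hash of its key. The sketch below assumes user IDs as the shard key and a fixed list of shard names; both are illustrative choices, not something the talk specifies.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards); // defensive copy
    }

    // The same user ID always maps to the same shard.
    // Math.floorMod keeps the index non-negative even for negative hash codes.
    String shardFor(String userId) {
        int index = Math.floorMod(userId.hashCode(), shards.size());
        return shards.get(index);
    }
}
```

A known limitation of plain modulo hashing is that changing the number of shards remaps most keys. Alternatives such as consistent hashing or a lookup directory reduce that cost, which is one reason to settle the strategy early.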

Closing notes

✤ Begin with the end in mind

✤ Decompose with line of sight to the end state

✤ It's an iterative process and not a Big-Bang change

✤ Tune into customer feedback (both internal and external) and adapt as necessary

Contact Info:

venkateswarlu_yerramalli@intuit.com

https://www.linkedin.com/in/venkatyerramalli
