node.js and containers: dispatches from the frontier
Application architecture, circa 2000
• The late 1990s saw the rise of three-tier architectures consisting of presentation, application logic and data tiers
• Many names for roughly the same notion: “Service-oriented architecture”, “Model/View/Controller”, etc.
• The AJAX+REST revolution of the mid-2000s gave rise to true web applications in which application logic could live on the edge
• Led to some broader architectural questioning...
Post-AJAX questions
• Why should HTTP be restricted to the web?
• Why should REST be restricted to web apps?
• Instead of having one monolithic architecture, why not have a series of (smaller) services that merely did one thing well?
• In case this sounds vaguely familiar...
The Unix Philosophy
• The Unix philosophy, as articulated by Doug McIlroy:
• Write programs that do one thing and do it well
• Write programs to work together
• Write programs that handle text streams, because that is a universal interface
• The single most important revolution in software systems thinking
• Applying it to HTTP-based services...
Microservices
• Microservices do one thing, and strive to do it well
• Replace a small number of monoliths with many services that have well-documented, small HTTP-based APIs
• Larger systems can be composed of these smaller services
• While the trend it describes is real, the term “microservices” isn’t without its controversy...
Developing microservices
• node.js is a perfect fit for microservices:
• Light memory footprint
• Purely asynchronous
• Expresses Unix programming model with respect to the operating system (processes, pipes, files, sockets, etc.)
• Module approach encourages libraries over frameworks
• node.js is itself an expression of the Unix philosophy
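As a rough illustration of the points above, a single-purpose HTTP service can be sketched with nothing but node's built-in http module. The /upcase route and the names handleRequest and createService are hypothetical, made up for this sketch; they are not taken from any Joyent service.

```javascript
'use strict';
const http = require('http');

// Hypothetical service that does one thing: upper-case the request body.
function handleRequest(req, res) {
  if (req.method !== 'POST' || req.url !== '/upcase') {
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'not found' }));
    return;
  }
  // Collect the body asynchronously; nothing here blocks the event loop.
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    const body = Buffer.concat(chunks).toString('utf8');
    res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
    res.end(body.toUpperCase());
  });
}

// Returns an http.Server; the caller decides where it listens.
function createService() {
  return http.createServer(handleRequest);
}
```

Calling createService().listen(8080) would start it on an arbitrarily chosen port; keeping the handler separate from the listener makes it easy to exercise without a socket.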
Deploying microservices
• Microservices are tautologically small
• One physical machine per service is clearly uneconomical…
• ...but deploying many orthogonal services on a single machine is a well-known operational nightmare (e.g. conflicting dependencies, shared fault domain)
• The key is to virtualize — but at what layer of the stack?
• Virtualization has ramifications with respect to performance and density — which is to say, economics
Hardware-level virtualization?
• The historical answer to virtualization — since the 1960s — has been to virtualize the hardware:
• A virtual machine is presented upon which each tenant runs an operating system that they choose (and must manage)
• There are as many operating systems on a machine as tenants!
• Can run entire legacy stacks unmodified...
• ...but operating systems are heavy and don’t play well with others with respect to resources like DRAM, CPU, I/O devices, etc.
• With microservices, overhead dominates!
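To make "overhead dominates" concrete, here is a back-of-the-envelope sketch that looks at memory only. Every figure in it (host size, service size, guest OS and container footprints) is a hypothetical number picked to show the shape of the arithmetic, not a measurement.

```javascript
'use strict';

// How many instances of a service fit in a host's memory when every
// instance also pays a fixed per-tenant overhead (e.g. a guest OS)?
function tenantsPerHost(hostMiB, serviceMiB, overheadMiB) {
  return Math.floor(hostMiB / (serviceMiB + overheadMiB));
}

// Fraction of each tenant's memory that is overhead rather than service.
function overheadFraction(serviceMiB, overheadMiB) {
  return overheadMiB / (serviceMiB + overheadMiB);
}

// All figures below are assumptions for illustration only.
const HOST_MIB = 256 * 1024; // assumed host DRAM
const SERVICE_MIB = 64;      // assumed footprint of one small service
const GUEST_OS_MIB = 512;    // assumed per-VM guest OS footprint
const CONTAINER_MIB = 8;     // assumed per-container footprint

console.log('VM tenants:', tenantsPerHost(HOST_MIB, SERVICE_MIB, GUEST_OS_MIB));
console.log('container tenants:', tenantsPerHost(HOST_MIB, SERVICE_MIB, CONTAINER_MIB));
console.log('VM overhead share:', overheadFraction(SERVICE_MIB, GUEST_OS_MIB));
```

The smaller the service gets, the larger the share of each tenant that is pure overhead, which is the economic argument the slide is making.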
Platform-level virtualization?
• Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization, and presents a much more nimble (& developer friendly!) abstraction...
• ...but at the cost of dictating abstraction to the developer
• This is the “Google App Engine” problem: developers are in a straitjacket where toy programs are easy, but sophisticated applications are impossible
• Virtualizing at the application platform layer poses many other challenges with respect to security, containment, etc.
OS-level virtualization!
• Virtualizing at the operating system hits a sweet spot:
• A single operating system (i.e. a single kernel) allows for efficient use of hardware resources, maximizing tenancy and performance
• Disjoint instances are securely compartmentalized by the operating system
• Gives tenants what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software: PaaS ease with IaaS generality
• Also: boots like a bandit!
• Model was pioneered by FreeBSD jails and taken to its logical extreme by Solaris zones, and then aped by Linux containers
OS-level virtualization at Joyent
• Joyent runs OS containers in the cloud via SmartOS — and we have run containers in multi-tenant production since ~2006
• Adding support for hardware-based virtualization circa 2011 strengthened our resolve with respect to OS-based virtualization
• This is especially true for microservices: as services get small, overhead and latency become increasingly important — and OS containers become a bigger and bigger win
• Our belief in containers for microservices comes from our own experience in developing cloud management software...
SmartDataCenter as microservices microcosm
• SmartDataCenter is our cloud orchestration system
• Reflects its times: born in ~2006 as a Rails app; by late 2011, consisted of some node.js microservices + a Ruby monstrosity
• In 2012/2013, we rewrote SmartDataCenter to be entirely node.js-based, microservices-based and container-based
• Open sourced in November 2014: https://github.com/joyent/sdc
• Learned many things along the way — some more surprising than others...
Microservices: State management
• While decentralization is an important tenet of microservices, be careful about applying this to canonical state
• State should be offered through its own microservice that can be rigorously developed, tightly controlled, closely managed, etc.
• To this end, we developed Moray, a node.js-based key/value store backed by replicated Postgres
• Moray’s Postgres replication is managed (and automated) with Manatee, a node.js + ZooKeeper-based system
• Essentially all of our services are backed by Moray + Manatee
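One way to picture "canonical state behind its own service" is a store whose writes are conditional, so that independent services cannot silently overwrite each other. The sketch below is an in-memory stand-in with invented names (StateStore, expectedVersion, VERSION_CONFLICT); it is not Moray's API and makes no claim about how Moray or Manatee work internally.

```javascript
'use strict';

// In-memory stand-in for a state service. A real one would persist to a
// replicated database; this sketch only shows conditional writes.
class StateStore {
  constructor() {
    this.entries = new Map(); // key -> { value, version }
  }

  // Returns { value, version } or null when the key is absent.
  get(key) {
    const entry = this.entries.get(key);
    return entry ? { value: entry.value, version: entry.version } : null;
  }

  // Write `value` under `key`. If `expectedVersion` is given, the write
  // succeeds only when it matches the stored version (0 means "absent").
  put(key, value, expectedVersion) {
    const current = this.entries.get(key);
    const currentVersion = current ? current.version : 0;
    if (expectedVersion !== undefined && expectedVersion !== currentVersion) {
      const err = new Error(`version conflict on "${key}"`);
      err.code = 'VERSION_CONFLICT';
      throw err;
    }
    const version = currentVersion + 1;
    this.entries.set(key, { value, version });
    return version;
  }
}
```

A caller reads a record, modifies it, and writes it back with the version it read; a conflict means another service got there first and the caller should re-read and retry.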
Microservices: Deployment + config
• With their larger numbers, microservices will exacerbate any deployment, configuration or image management pain
• To alleviate this, we developed our own services API (SAPI) — but the results were still dissatisfying...
• A little-known PaaS called dotCloud saw the same problem and developed a container-based solution for image management...
• Docker allows developers to encode deployment procedures via an image that can in turn be deployed as a container
• Docker will do to apt what apt did to tar
Microservices: CAP omnipresence
• Some discontent with microservices may stem from the breathlessness of some of their proponents, especially with respect to resilience and reliability
• Microservices make your system more distributed — and therefore more vulnerable to its CAP tradeoffs
• More monolithic systems are able to hide from CAP — or are deliberately C+A systems
• It’s not just CAP: performance pathologies can be much more difficult to understand in a distributed system
Microservices: System topology
• Our system is historically an AMQP/HTTP hybrid
• The principles of AMQP are very attractive…
• ...but in practice, implementation and operational issues have made message brokers a single point of failure
• We still use AMQP for some kinds of broadcast traffic, but we try to go point-to-point and HTTP as quickly as we can
Microservices: Service discovery
• Moving from monolithic to microservices means moving from a tightly coupled system to a loosely federated one, which necessitates service discovery
• We developed Binder, a node.js-based DNS + ZooKeeper system
• While Binder has been sufficient for our needs, service discovery remains broadly a thorny problem — especially around rollbacks, service maintenance, etc.
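From a client's point of view, DNS-based discovery can be as small as resolving a service name and picking one of the returned addresses. The sketch below uses node's built-in dns module; the function names and the host name in the usage note are hypothetical, and this shows only the client side, not how Binder itself is implemented.

```javascript
'use strict';
const dns = require('dns');

// Pick one address at random so callers spread load across instances.
// `random` is injectable to make the choice deterministic in tests.
function chooseAddress(addresses, random = Math.random) {
  if (addresses.length === 0) {
    throw new Error('no instances registered');
  }
  return addresses[Math.floor(random() * addresses.length)];
}

// Resolve a service name to the IPv4 address of one instance.
// `resolver` defaults to dns.promises and can be replaced with a stub.
async function discover(serviceName, resolver = dns.promises) {
  const addresses = await resolver.resolve4(serviceName);
  return chooseAddress(addresses);
}
```

A caller might use discover('moray.example.internal').then(connect), where both the name and connect are placeholders. Note that this does nothing about the hard parts the slide mentions, such as rollbacks or draining an instance for maintenance.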
Microservices: Interface primacy
• Microservices have many more interfaces — so interface-based pathologies will naturally become more acute
• ...and JSON-based systems can exhibit interface-based pathologies not seen in more rigid systems
• We use JSON Schema (v3) to add rigor without sacrificing agility
• We implement HTTP routes with Restify, a node.js module that includes support for interface versioning (+ DTrace support!)
• Postel’s Law (“...an implementation should be conservative in its sending behavior, and liberal in its receiving behavior”) remains helpful, but apply with moderation!
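A minimal sketch of "Postel's Law with moderation" for a JSON interface: unknown fields are tolerated, but missing or mistyped fields are rejected, and only known fields are passed on. This is a hand-rolled check written for illustration, not JSON Schema and not Restify; the createVmSchema shape and its field names are assumptions, not an SDC interface.

```javascript
'use strict';

// Hypothetical description of a request body; not a real SDC schema.
const createVmSchema = {
  required: { alias: 'string', ram: 'number' },
  optional: { tags: 'object' }
};

// typeof, except that null and arrays are not reported as 'object'.
function typeOf(v) {
  if (v === null) return 'null';
  return Array.isArray(v) ? 'array' : typeof v;
}

function validate(schema, body) {
  if (typeOf(body) !== 'object') {
    return { ok: false, errors: ['body must be a JSON object'], value: null };
  }
  const errors = [];
  const value = {};
  const check = (name, type, isRequired) => {
    if (!(name in body)) {
      if (isRequired) errors.push(`missing required field "${name}"`);
    } else if (typeOf(body[name]) !== type) {
      errors.push(`field "${name}" must be of type ${type}`);
    } else {
      value[name] = body[name];
    }
  };
  for (const [n, t] of Object.entries(schema.required)) check(n, t, true);
  for (const [n, t] of Object.entries(schema.optional)) check(n, t, false);
  // Unknown fields are ignored rather than rejected (liberal in receiving)
  // and are not copied into `value` (conservative in what is passed on).
  const ok = errors.length === 0;
  return { ok, errors, value: ok ? value : null };
}
```

Tolerating unknown fields lets clients and servers upgrade independently; rejecting mistyped known fields keeps that tolerance from turning into silent data corruption.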
Microservices: Pathological behavior
• Systems will misbehave — and distributed systems have much more surface area; hope is not a strategy!
• We have invested heavily in node.js-based infrastructure to allow us to quickly diagnose aberrant behavior in production:
• Bunyan is a logging facility for node.js (+ DTrace support!)
• SmartOS DTrace support for node.js profiling
• SmartOS support for JavaScript heap analysis from core files
• See @dapsays’ “Industrial-grade node.js” talk!
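The idea behind structured logging, one machine-parseable JSON record per line with context carried by child loggers, can be sketched in a few lines. This is a simplified illustration written for this transcript, not Bunyan's implementation; the field names and the createLogger signature here are assumptions.

```javascript
'use strict';

// Returns a logger that writes one JSON object per line.
// `write` defaults to stdout and can be swapped out for tests.
function createLogger(baseFields, write = (line) => process.stdout.write(line)) {
  function emit(level, fields, msg) {
    const record = Object.assign({}, baseFields, fields, {
      level,
      msg,
      time: new Date().toISOString()
    });
    write(JSON.stringify(record) + '\n');
  }
  return {
    info: (fields, msg) => emit('info', fields, msg),
    error: (fields, msg) => emit('error', fields, msg),
    // A child logger carries extra context (e.g. a request id) into
    // every record it emits, without the caller repeating it.
    child: (extra) => createLogger(Object.assign({}, baseFields, extra), write)
  };
}
```

Because every record is JSON, a request id attached by a child logger can be followed across services with ordinary text tools, which matters when one user action fans out into many microservice calls.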
Microservices: Containers
• OS containers have made our microservices approach possible
• They have proven essential for every aspect of our system: speed of deployment, robustness, latency, debuggability, etc.
• Historically, our approach has been limited to SmartOS...
• While SmartOS is Unix, it’s not Linux — and we understand that many (most?) have an established Linux binary footprint...
• We have implemented support for LX-branded zones in SmartOS that allow for Linux binaries (and distributions!) to run out of the box in a secure OS container
Microservices, containers and node.js
• Microservices represent a real and important trend in systems
• We have found node.js to be the perfect fit for implementing these systems — especially given its production debugging support
• Containers have been an essential ingredient for our own deployment of microservices-based architectures
• LX-branded zones will allow our microservices approach to be applicable to a much broader audience!
Thank you!
• @mcavage, @pfmooney, @dapsays, @yunongx, @tumederanges for Moray/Manatee (https://github.com/joyent/moray)
• @dapsays and @tjfontaine for bringing humanity debugging support for node.js in production
• @trentmick for Bunyan (https://github.com/trentm/node-bunyan)
• @trevero, @notmatt for Binder (https://github.com/joyent/binder)
• @joshwilsdon for much of SDC — and for being right about AMQP
• Jerry Jelinek, @pfmooney, @jmclulow and @jperkin for their work on LX-branded zones