acug datafiniti pellon_sept2013
![Page 1: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/1.jpg)
Building Resource Efficient Distributed Systems At Scale
Michael Pellon (@p3ll0n), Operations Engineer
![Page 2: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/2.jpg)
In the ideal world . . .
. . . we want to be here
[Figure: chart with axes labeled cost and work]
![Page 3: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/3.jpg)
But in the “real” world . . .
. . . we usually find ourselves here
[Figure: chart with axes labeled cost and work]
![Page 4: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/4.jpg)
Big “jumps” are possible in a relatively short timeframe!
[Figure: chart with axes labeled requests per second and joules, comparing ~ 2009 - 2012 with ~ 2013 - ???]
RPS/dollar: 4.1x
RPS/joule: 4.3x
RPS/rack: 10.4x
![Page 5: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/5.jpg)
Avoid “density without value”!
![Page 6: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/6.jpg)
“Respect the problem.”
- Theo Schlossnagle, OmniTI
![Page 7: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/7.jpg)
There is no free lunch.
![Page 8: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/8.jpg)
Tradeoffs cannot be solved by marketing.
![Page 9: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/9.jpg)
How to play with the “big boys” when you are not as “big” as them ...
![Page 10: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/10.jpg)
Lesson #1
Understand deeply the relationship between latency, bandwidth and capacity
across all levels of your infrastructure.
![Page 11: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/11.jpg)
< disk seeks = higher performance
![Page 12: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/12.jpg)
> caching = higher performance
![Page 13: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/13.jpg)
We end up with an ever-increasing amount of our cheap DRAM being used to hide the terrible latency of our cheap storage.
![Page 14: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/14.jpg)
This growing split between the bandwidth and latency of our storage systems only becomes apparent at large scale.
![Page 15: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/15.jpg)
| | CPU | DRAM | LAN | Disk |
|---|---|---|---|---|
| Bandwidth | 1.50 | 1.27 | 1.39 | 1.28 |
| Latency | 1.17 | 1.07 | 1.12 | 1.09 |

Annual Bandwidth and Latency Improvements (Patterson, 2004)
* Values are the multiplicative performance increase per year, extracted from leading commodity components over the last 25 years.
➔ CPU is the fastest to change and DRAM is the slowest.
![Page 16: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/16.jpg)
➔ Latency is driven by physical limits whereas bandwidth can be addressed through parallelism.
![Page 17: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/17.jpg)
➔ Bountiful bandwidth with lagging latency!
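To make the compounding concrete, here is a minimal sketch that takes the annual factors from the table above and computes how far bandwidth pulls ahead of latency over a number of years. The function names are illustrative, not from the talk.

```python
# Annual improvement factors from the table above (Patterson, 2004).
BANDWIDTH = {"CPU": 1.50, "DRAM": 1.27, "LAN": 1.39, "Disk": 1.28}
LATENCY = {"CPU": 1.17, "DRAM": 1.07, "LAN": 1.12, "Disk": 1.09}


def compound(rate, years):
    """Total improvement after compounding an annual factor for `years` years."""
    return rate ** years


def gap_after(years):
    """How many times further bandwidth has improved than latency, per component."""
    return {
        part: compound(BANDWIDTH[part], years) / compound(LATENCY[part], years)
        for part in BANDWIDTH
    }


if __name__ == "__main__":
    for part, gap in gap_after(10).items():
        print(f"{part}: bandwidth pulls ahead of latency by {gap:.1f}x in 10 years")
```

Small annual differences compound quickly, which is why the gap only becomes obvious when you look across a decade or across a large fleet.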
![Page 18: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/18.jpg)
| | CPU | DRAM | LAN | Disk |
|---|---|---|---|---|
| Bandwidth | 1.50 | 1.27 | 1.39 | 1.28 |
| Capacity | -- | 1.52 | -- | 1.48 |

Annual Bandwidth and Capacity Improvements (Patterson, 2004)
* Values are the multiplicative performance increase per year, extracted from leading commodity components over the last 25 years.
➔ Widening gap between bandwidth and capacity.
![Page 19: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/19.jpg)
➔ Time to read a complete disk with random IO is increasing 22x / decade or 36% / year.
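One way to arrive at the 36% / year figure is sketched below. It assumes that the time to read a whole disk with random IO grows with capacity and shrinks with per-access latency, and it uses the disk latency factor (1.09) from the earlier latency table. The slides do not state this derivation, so treat it as a plausible reading rather than the author's method.

```python
DISK_CAPACITY_GROWTH = 1.48  # per year, from the capacity table
DISK_LATENCY_GROWTH = 1.09   # per year, from the latency table


def random_read_time_growth(capacity_growth, latency_growth):
    """Annual growth in full-disk random-read time.

    Assumption: read time is proportional to capacity / access rate,
    and access rate improves at the latency improvement factor.
    """
    return capacity_growth / latency_growth


if __name__ == "__main__":
    per_year = random_read_time_growth(DISK_CAPACITY_GROWTH, DISK_LATENCY_GROWTH)
    per_decade = per_year ** 10
    print(f"{(per_year - 1) * 100:.0f}% per year, {per_decade:.1f}x per decade")
```

Under these assumptions the annual figure rounds to 36%, and compounding it over ten years lands close to the 22x per decade quoted on the slide.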
![Page 20: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/20.jpg)
➔ Now our applications cannot afford to have a cache miss!
![Page 21: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/21.jpg)
Solutions
Caching, prediction and replication.
![Page 22: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/22.jpg)
Solutions
Caching, prediction and replication.
![Page 23: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/23.jpg)
Tape is dead. Disk is tape. Flash is disk.
RAM locality is king.
- Jim Gray, Microsoft (2006)
![Page 24: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/24.jpg)
Requires very careful attention to durability.
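As an illustration of what that attention can look like, here is a minimal sketch of a write-through cache: reads are served from RAM, but every write is appended to a log and flushed to disk before it is acknowledged. The `DurableCache` class and its log format are hypothetical, and the sketch ignores details such as torn writes and log compaction.

```python
import json
import os
import tempfile


class DurableCache:
    """Hypothetical write-through cache: reads come from RAM, writes hit disk first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # Persist first, then update RAM, so an acknowledged write survives a crash.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key):
        # Served from RAM; no disk seek on the read path.
        return self.data.get(key)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "cache.log")
    cache = DurableCache(path)
    cache.put("user:1", {"name": "example"})
    cache.close()

    restarted = DurableCache(path)  # simulates a process restart
    print(restarted.get("user:1"))
    restarted.close()
```

The trade-off is that every write pays for an `fsync`, which is exactly the storage latency the cache hides on the read path.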
![Page 25: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/25.jpg)
Solutions
Caching, prediction and replication.
![Page 26: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/26.jpg)
Expend bandwidth to reduce apparent latency.
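A minimal sketch of the idea, assuming a sequential access pattern: on a miss, fetch the requested block plus the next few in the same round trip. `SlowStore` and `ReadAhead` are made-up names, and the store counts round trips instead of actually sleeping.

```python
class SlowStore:
    """Stand-in for a high-latency backend; counts round trips instead of sleeping."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.round_trips = 0

    def read(self, start, count):
        """Return up to `count` consecutive blocks in a single round trip."""
        self.round_trips += 1
        end = min(start + count, len(self.blocks))
        return {i: self.blocks[i] for i in range(start, end)}


class ReadAhead:
    """On a miss, fetch the requested block plus the next `window - 1` blocks."""

    def __init__(self, store, window=8):
        self.store = store
        self.window = window
        self.cache = {}  # unbounded here; a real cache would evict

    def get(self, index):
        if index not in self.cache:
            self.cache.update(self.store.read(index, self.window))
        return self.cache[index]


if __name__ == "__main__":
    store = SlowStore([f"block-{i}" for i in range(64)])
    reader = ReadAhead(store, window=8)
    for i in range(64):
        reader.get(i)
    print(f"{store.round_trips} round trips for 64 sequential reads")
```

Each miss moves eight blocks instead of one, so more bandwidth is spent per request, but only one read in eight waits on the backend. If the prediction is wrong (random access), the extra bandwidth is wasted.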
![Page 27: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/27.jpg)
Solutions
Caching, prediction and replication.
![Page 28: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/28.jpg)
Expend capacity to reduce apparent latency.
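A minimal sketch of the replication side: keep extra copies of the data and serve each read from whichever copy is closest. The region names and round-trip times below are hypothetical and exist only for illustration.

```python
def nearest_replica(replicas, client_region, rtt_ms):
    """Pick the replica with the lowest round-trip time from the client's region."""
    return min(replicas, key=lambda region: rtt_ms[(client_region, region)])


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds, for illustration only.
    rtt_ms = {
        ("eu", "us-east"): 90,
        ("eu", "eu-west"): 15,
        ("eu", "ap-south"): 140,
    }

    one_copy = nearest_replica(["us-east"], "eu", rtt_ms)
    three_copies = nearest_replica(["us-east", "eu-west", "ap-south"], "eu", rtt_ms)

    print(one_copy, rtt_ms[("eu", one_copy)], "ms")
    print(three_copies, rtt_ms[("eu", three_copies)], "ms")
```

Storing three copies costs three times the capacity, but the client now reads from the nearby copy. Keeping the copies consistent is the part this sketch leaves out.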
![Page 29: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/29.jpg)
Avoid the problem entirely by using more servers with cheaper, lower-powered processors that more closely
match the capabilities of the memory subsystem.
![Page 30: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/30.jpg)
➔ Leverages the massive volume economics of the smart device (e.g., cell phones and tablets) market.
![Page 31: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/31.jpg)
➔ Most workloads are not pushing CPU limits but are IO (disk, network or memory) bound so spending more on a faster CPU will not deliver results.
![Page 32: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/32.jpg)
➔ Price/performance in the device market is far better than for current-generation server CPUs. Because there is far less competition in server processors, prices tend to be higher and price/performance relatively low.
![Page 33: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/33.jpg)
➔ Server CPU = ~$300 - ~$1000
![Page 34: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/34.jpg)
➔ ARM CPU = ~$15 / Intel Atom S1200 = ~$65
![Page 35: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/35.jpg)
➔ ARM CPU = ~$15 / Intel Atom S1200 = ~$65
➔ ~25% the processing rate @ ~10% the cost!
![Page 36: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/36.jpg)
➔ Volume of the device ecosystem fuels innovation so the performance gap shrinks each generation!
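The "~25% the processing rate @ ~10% the cost" figure can be turned into a quick price/performance comparison. The sketch below is a back-of-the-envelope calculation that assumes the workload scales out perfectly across chips and ignores every other cost in the server (board, RAM, network); the function names are illustrative.

```python
import math


def price_performance(relative_rate, relative_cost):
    """Work per dollar relative to the baseline server CPU (baseline = 1.0)."""
    return relative_rate / relative_cost


def chips_to_match(relative_rate):
    """Low-power chips needed to match one server CPU, assuming perfect scale-out."""
    return math.ceil(1 / relative_rate)


if __name__ == "__main__":
    rate, cost = 0.25, 0.10  # figures quoted on the slide
    chips = chips_to_match(rate)
    print(f"price/performance vs server CPU: {price_performance(rate, cost):.1f}x")
    print(f"chips to match one server CPU: {chips}")
    print(f"CPU spend to match one server CPU: {chips * cost:.0%} of the server CPU price")
```

Whether the advantage survives in practice depends on how well the workload parallelizes and on the per-node overhead, which this calculation deliberately leaves out.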
![Page 37: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/37.jpg)
➔ These machines also help with one of the biggest and most certainly the fastest growing cost of any data center -- power!
![Page 38: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/38.jpg)
➔ Your typical 8-core server uses ~200W idle and above 600W TDP (full tilt boogie)!
![Page 39: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/39.jpg)
➔ Bringing 30A @ 208V to each rack gives you a 6.2 kW rack (and I know of folks provisioning 12 - 14 kW racks just to fill them up 50%!)
![Page 40: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/40.jpg)
➔ If you can save a lot on op-ex by spending a little more on cap-ex it’s a great bargain! (ask your CFO!)
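The rack arithmetic above can be sketched as follows. It assumes a simple single-phase amps times volts budget with no derating or headroom, which a real deployment would need to account for, and it uses the per-server wattages quoted on the slides.

```python
def rack_budget_watts(amps, volts):
    """Power available to a rack, assuming a simple amps x volts budget."""
    return amps * volts


def servers_per_rack(budget_watts, watts_per_server):
    """Whole servers that fit inside the power budget."""
    return int(budget_watts // watts_per_server)


if __name__ == "__main__":
    budget = rack_budget_watts(30, 208)
    print(f"rack budget: {budget / 1000:.1f} kW")
    print(f"servers at 600 W (full load): {servers_per_rack(budget, 600)}")
    print(f"servers at 200 W (idle): {servers_per_rack(budget, 200)}")
```

Provisioning for the full-load figure is what leaves racks physically half empty: power runs out long before rack units do.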
![Page 41: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/41.jpg)
➔ People costs dominate the enterprise players' data centers, but it is very easy and cheap to not let them dominate your costs.
![Page 42: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/42.jpg)
➔ The barrier to entry into automation tools (Puppet, Chef, etc) has never been lower and their penetration into existing systems (networking devices, etc) has never been higher.
![Page 43: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/43.jpg)
Lesson #2
Understand that distributed systems are fundamentally about dealing with
distance and having more than one thing.
![Page 44: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/44.jpg)
Currently, writing distributed applications is usually very different from writing non-distributed applications.
![Page 45: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/45.jpg)
Despite the non-zero probability of failure within nearly every aspect of modern computers,
developers of non-distributed applications do not routinely maintain a concept of failing hardware.
![Page 46: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/46.jpg)
complexity
![Page 47: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/47.jpg)
instructions
behaviors
![Page 48: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/48.jpg)
instructions
behaviors
programming language
hardware limitations
![Page 49: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/49.jpg)
The difference between an entire data center and a single computer should only be quantitative, not qualitative.
![Page 50: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/50.jpg)
Since software development is an entirely quantitative pursuit we should be able to conceal the
entire complexity of the Internet within software.
![Page 51: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/51.jpg)
A clear trajectory in the same direction …
➔ Erlang OTP (Ericsson) and GoCircuit (Tumblr).
![Page 52: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/52.jpg)
➔ General-purpose distributed file systems (and protocols) spanning multiple globally distributed data centers.
![Page 53: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/53.jpg)
➔ Datacenter-scale job schedulers also abound (Google’s Borg/Omega, Apache Mesos, Airbnb’s Chronos, etc.)
![Page 54: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/54.jpg)
➔ nanomsg scalability protocols (M. Sustrik).
![Page 55: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/55.jpg)
➔ Not only possible but the clear “silent” choice of the majority!
![Page 56: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/56.jpg)
So how to play “big” when you’re “small”?
➔ You need to understand your technical substrate both broadly and deeply so you know where to focus all your resources most effectively.
![Page 57: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/57.jpg)
➔ That understanding will allow you to operate at economies of scale that free up your most important resource -- people.
![Page 58: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/58.jpg)
➔ But remember: where we focus our resources is not necessarily where you should focus yours, and neither is where anyone else focuses theirs.
![Page 59: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/59.jpg)
So how to play “big” when you’re “small”?
➔ Look for areas where a qualitative difference could easily become merely a quantitative difference.
![Page 60: Acug datafiniti pellon_sept2013](https://reader033.vdocuments.us/reader033/viewer/2022060200/55987db11a28ab057e8b460b/html5/thumbnails/60.jpg)
➔ Quantitative problems are easy to solve through technology; qualitative problems, however, are largely intractable through technology alone.