What is the Cloud? (english)

What is the Cloud? Kristian Köhntopp Old Fart SysEleven © 2015 Kristian Köhntopp

Upload: kristian-koehntopp

Post on 13-Feb-2017

TRANSCRIPT

Page 1: What is the Cloud? (english)

What is the Cloud?
Kristian Köhntopp, Old Fart, SysEleven

© 2015 Kristian Köhntopp

Page 2: What is the Cloud? (english)

Chapter 1: Hardware for Hipsters

Page 3: What is the Cloud? (english)

Help, too much compute!

http://hpserver.by/images/detailed/1/hp_dl380p_gen8_inside_in_t7e8-xt.jpg © 2014 HP Press Material

Page 4: What is the Cloud? (english)

http://hpserver.by/images/detailed/1/hp_dl380p_gen8_inside_in_t7e8-xt.jpg © 2014 HP Press Material

Java Appserver: CPU 8 used, 40 unused; RAM 16 used, 240 unused

PHP Appserver: CPU 4 used, 44 unused; RAM 8 used, 248 unused

Page 5: What is the Cloud? (english)

"Solution": virtual machines

[Diagram: Hardware Node with vSwitch, vRouter, and five VMs]

Page 6: What is the Cloud? (english)

"Solution": virtual machines

• 4 Cores, 8 GB RAM, 50 GB Ephemeral Disk
• 8 Cores, 32 GB RAM, 50 GB Ephemeral Disk, 2 TB Persistent Volume
• 2 Cores, 4 GB RAM, 50 GB Ephemeral Disk, 2 Network Interfaces

Page 7: What is the Cloud? (english)

Booting an Instance, parts needed.

• Turn Boot Image into Ephemeral Disk
• Attach Volume
• Attach Network
• Start VM
• DHCP
• Config: Hostname, Start Script

VM: 8 Cores, 32 GB RAM, 50 GB Ephemeral Disk, 2 TB Persistent Volume

Components: Glance, Cinder, Neutron, Nova, cloud-init
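The boot steps above can be written down as an ordered plan. This is only a sketch: the slide lists the steps and the components side by side, and the pairing of each step with a component below is an assumption made for illustration. No real OpenStack API is called.

```python
# Hypothetical pairing of the slide's boot steps with OpenStack components.
# The pairing is an assumption; the slide only lists both side by side.
BOOT_STEPS = [
    ("Turn boot image into ephemeral disk", "Glance"),
    ("Attach volume", "Cinder"),
    ("Attach network", "Neutron"),
    ("Start VM", "Nova"),
    ("DHCP", "Neutron"),
    ("Config: hostname, start script", "cloud-init"),
]


def boot_plan(with_volume=True):
    """Return the ordered (step, component) pairs for one instance."""
    return [(step, comp) for step, comp in BOOT_STEPS
            if with_volume or comp != "Cinder"]


def components_needed(plan):
    """Components in order of first use, without duplicates."""
    seen = []
    for _, comp in plan:
        if comp not in seen:
            seen.append(comp)
    return seen


if __name__ == "__main__":
    for number, (step, comp) in enumerate(boot_plan(), start=1):
        print(f"{number}. {step:38s} -> {comp}")
```

An instance without a persistent volume simply skips the Cinder step; everything else stays the same.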

Page 8: What is the Cloud? (english)

Handling hypervisor failure

• Essential: Persistent data from volume.

• Everything else is faster to recreate than to recover.

• Iff: Setup is completely automated.

VM: 8 Cores, 32 GB RAM, 50 GB Ephemeral Disk, 2 TB Persistent Volume

Puppet, Ansible, Salt, Chef

Page 9: What is the Cloud? (english)

More than a single machine…

CPU, RAM, Storage, Network

Overlay, Underlay

Page 10: What is the Cloud? (english)

[x] It's complicated…

• Underlay:
  • Multiple Hosts (how many?), shared Storage, sufficient amount of network
• Overlay:
  • freely definable Networks, freely definable Storage, definable Guests, definable Firewall and Load Balancer Rulesets

Page 11: What is the Cloud? (english)

Infrastructure as Code…

Page 12: What is the Cloud? (english)

Merkel: "The Internet is new territory for all of us" (»Das Internet ist für uns alle #neuland«)

https://en.wikipedia.org/wiki/File:Angela_Merkel_Juli_2010_-_3zu4.jpg, Armin Linnartz

Page 13: What is the Cloud? (english)

Problem 1: Storage

• Filer?
  • Pro: proven technology, sufficient bandwidth, storage network separated.
  • Contra: How to scale this in size and financially? Storage network separated.
• Alternatives?

Page 14: What is the Cloud? (english)

I can haz live migration, plz?

• Yes, but for a price.
• Live migration is good for rebooting Underlay nodes, fixing scheduling problems, data recovery
• requires: Shared Storage

Page 15: What is the Cloud? (english)

Problem 2: Network-Capacity

• Convergence: Use disks in computes for storage.
• Hyperconvergence: Fold storage network into production network.
• Examples: HDFS, Ceph, Quobyte, …
• 3 Copies, one Off-Rack
• Latency? IOPS? Bandwidth?
• How much network per compute node?

Mercury Redstone Connector MR-1 (1960) https://www.flickr.com/photos/jurvetson/5691350527 Steve Jurvetson (CC-BY)
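The "3 Copies, one Off-Rack" rule can be sketched as a small placement function. This is not how HDFS, Ceph or Quobyte implement placement; the function name, the inventory layout and the node names are hypothetical. It only shows why every write produces traffic that crosses the rack boundary.

```python
import random


def place_replicas(nodes_by_rack, primary_rack, copies=3, rng=None):
    """Pick `copies` distinct nodes, exactly one of them outside `primary_rack`
    whenever the primary rack has enough nodes for the rest.

    nodes_by_rack: dict mapping rack name -> list of unique node names
    (a hypothetical inventory format).
    """
    if rng is None:
        rng = random.Random()
    local = list(nodes_by_rack[primary_rack])
    remote = [node for rack, nodes in nodes_by_rack.items()
              if rack != primary_rack for node in nodes]
    if not remote or len(local) + len(remote) < copies:
        raise ValueError("not enough nodes or racks for this policy")

    off_rack = rng.choice(remote)          # the mandatory off-rack copy
    rng.shuffle(local)
    others = [node for node in remote if node != off_rack]
    rng.shuffle(others)
    # Prefer the local rack for the remaining copies, fall back to remote nodes.
    return (local + others)[:copies - 1] + [off_rack]


if __name__ == "__main__":
    racks = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5"]}
    print(place_replicas(racks, "rack1", rng=random.Random(42)))
```

With three copies, each written block is sent over the network at least twice, and one of those transfers leaves the rack. That is the link between the storage design and the question of how much network a compute node needs.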

Page 16: What is the Cloud? (english)

2005: 50 DL360 = 50 Cores, 50 GBit/s Net, ~ 2 Racks

Page 17: What is the Cloud? (english)

2015: 2 HU, 48 Cores, 2x 10 GBit/s Net = ~40% Net
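The "~40% Net" figure follows from comparing network bandwidth per core between the 2005 and 2015 setups. A minimal sketch of that arithmetic, using only the totals given on the slides (the function name is made up for this example):

```python
def gbit_per_core(total_gbit, cores):
    """Network bandwidth available per core, in GBit/s."""
    return total_gbit / cores


net_2005 = gbit_per_core(total_gbit=50, cores=50)       # 50 DL360, 50 GBit/s
net_2015 = gbit_per_core(total_gbit=2 * 10, cores=48)   # one box, 2x 10 GBit/s
ratio = net_2015 / net_2005

if __name__ == "__main__":
    print(f"2005: {net_2005:.2f} GBit/s per core")
    print(f"2015: {net_2015:.2f} GBit/s per core ({ratio:.0%} of 2005)")
    # The 2x 25 GBit/s option would bring the ratio back to about 1.
    print(f"2x 25 GBit/s: {gbit_per_core(2 * 25, 48):.2f} GBit/s per core")
```

The same compute that once filled two racks now sits in one box, but with less than half the network per core.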

Page 18: What is the Cloud? (english)

Ohai, can I haz 2x 25 Gbit/s, plz?

Page 19: What is the Cloud? (english)

„Be careful what you wish for!“


Page 20: What is the Cloud? (english)

16 DL380 with 2x 25 GBit/s per Rack, Ceph (Dramatization, Do Not Attempt)

Top of Rack Switch

Page 21: What is the Cloud? (english)

Capacity problem? What capacity problem?

[Diagram: three racks. Per box: 2x 10 GBit/s (2400 MB/sec) or 2x 25 GBit/s (6000 MB/sec). 20 boxes, 40 interfaces, ~1 TBit/s aggregated. Storage traffic (east-west traffic), Internet (north-south traffic), VM with volume.]
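The numbers in the diagram can be checked with a few lines of arithmetic. This sketch uses raw line rate in decimal units; the slide's 2400 and 6000 MB/sec are somewhat lower, presumably leaving room for protocol overhead (that reading is an assumption).

```python
def gbit_to_mbyte_per_sec(gbit):
    """Raw line rate in MB/s for a link speed in GBit/s (decimal units)."""
    return gbit * 1000 / 8


per_box_10g = gbit_to_mbyte_per_sec(2 * 10)   # two 10 GBit/s interfaces
per_box_25g = gbit_to_mbyte_per_sec(2 * 25)   # two 25 GBit/s interfaces

boxes = 20
interfaces = boxes * 2
aggregate_tbit = interfaces * 25 / 1000       # all interfaces at 25 GBit/s

if __name__ == "__main__":
    print(f"2x 10 GBit/s: {per_box_10g:.0f} MB/s raw")
    print(f"2x 25 GBit/s: {per_box_25g:.0f} MB/s raw")
    print(f"{interfaces} interfaces: {aggregate_tbit:.1f} TBit/s aggregated")
```

One rack of such boxes can offer about 1 TBit/s to the top-of-rack switch, which is the capacity problem the slide title is joking about.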

Page 22: What is the Cloud? (english)

Terasort to watch the world burn

http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort by http://www.slideshare.net/pramodbiligiri/presentations

Page 23: What is the Cloud? (english)

Meanwhile, at the Chocolate Factory…

Google “Jupiter” Superblock, “1 Petabit/sec of total bisection bandwidth” © 2015 Google Press Release

Page 24: What is the Cloud? (english)

Build principle: Leaf and Spine

http://bradhedlund.com/2012/01/25/construct-a-leaf-spine-design-with-40g-or-10g-an-observation-in-scaling-the-fabric/

Page 25: What is the Cloud? (english)

Net >> Storage

• Usable Storage needs a lot of network
• “Leaf and Spine” needs a central flow controller
• Several vendors grokked that.
• But there are no large scale functional deployments.

Page 26: What is the Cloud? (english)

Contrail

Page 27: What is the Cloud? (english)

Midonet

Page 28: What is the Cloud? (english)

Side note: Power, Cooling

Page 29: What is the Cloud? (english)

“With great power comes a great electricity bill…”

web.de Amalienbadstrasse, Karlsruhe, (C) 2004 Kristian Köhntopp

Page 30: What is the Cloud? (english)

High Density

• 6 blade centers or 16 2HU servers ~ 20kW per rack
• Air cooled:
  • “specific heat capacity” (Heat 1kg by 1K)
  • hot aisle/cold aisle

web.de Amalienbadstrasse, Karlsruhe, (C) 2004 Kristian Köhntopp
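The specific heat capacity mentioned above determines how much air has to move through a 20 kW rack. A rough worked example, assuming approximate textbook values for air and a 10 K temperature rise between cold and hot aisle (the 10 K is an assumption, not a number from the slides):

```python
AIR_CP = 1005.0      # J/(kg*K), approximate specific heat capacity of air
AIR_DENSITY = 1.2    # kg/m^3, approximate value near room temperature


def airflow_for(power_w, delta_t_k):
    """Air needed to carry away `power_w` watts at a temperature rise of
    `delta_t_k` kelvin, from P = m_dot * c_p * delta_T.

    Returns (mass flow in kg/s, volume flow in m^3/s).
    """
    mass_flow = power_w / (AIR_CP * delta_t_k)
    volume_flow = mass_flow / AIR_DENSITY
    return mass_flow, volume_flow


mass, volume = airflow_for(power_w=20_000, delta_t_k=10)

if __name__ == "__main__":
    print(f"{mass:.2f} kg/s of air, about {volume:.2f} m^3/s")
```

Under these assumptions a single rack needs on the order of two kilograms of air per second, which is why hot aisle/cold aisle separation matters at this density.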

Page 31: What is the Cloud? (english)

Chapter 2: Overlay and Underlay

Page 32: What is the Cloud? (english)

Scheduler, Spread Strategy

[Diagram: VM, Host]

Page 33: What is the Cloud? (english)

http://hpserver.by/images/detailed/1/hp_dl380p_gen8_inside_in_t7e8-xt.jpg © 2014 HP Press Material

Java Appserver: CPU 8 used, 40 unused; RAM 16 used, 240 unused

PHP Appserver: CPU 4 used, 44 unused; RAM 8 used, 248 unused

IMBA: Uneven Resource Usage
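The imbalance can be made concrete by asking how many copies of each workload fit into one box and which resource runs out first. This sketch assumes the chart values are cores and GB of RAM (the totals 48 and 256 match the box described later in the deck) and that the first chart pair belongs to the Java Appserver; both are readings of the figure, not statements from it.

```python
BOX = {"cores": 48, "ram_gb": 256}

# Assumed reading of the charts: first pair = Java, second pair = PHP.
WORKLOADS = {
    "Java Appserver": {"cores": 8, "ram_gb": 16},
    "PHP Appserver": {"cores": 4, "ram_gb": 8},
}


def fit(box, demand):
    """Return (instances that fit, bottleneck resource, stranded resources)."""
    per_resource = {res: box[res] // demand[res] for res in demand}
    bottleneck = min(per_resource, key=per_resource.get)
    count = per_resource[bottleneck]
    stranded = {res: box[res] - count * demand[res] for res in demand}
    return count, bottleneck, stranded


if __name__ == "__main__":
    for name, demand in WORKLOADS.items():
        count, bottleneck, stranded = fit(BOX, demand)
        print(f"{name}: {count}x per box, limited by {bottleneck}, "
              f"stranded: {stranded}")
```

Both workloads run out of cores long before they run out of RAM, so a box filled only with them leaves 160 GB of memory unused.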

Page 34: What is the Cloud? (english)

Which resource is needed most?

Page 35: What is the Cloud? (english)

Resources

• 48 Cores:
  • 256 GB RAM, 2x 10 GBit/s
  • 12x 3TB Disk (200 IOPS ea) or 4x 2TB SSD (20k IOPS ea)
• per Core (“Compute Unit”)
  • 5 GB RAM, 400 MBit/s, 50 IOPS Disk, 1500 IOPS SSD
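The per-core numbers are the box totals divided by 48. A short sketch of that division, using only figures from the slide (the dictionary keys are names chosen for this example):

```python
BOX = {
    "cores": 48,
    "ram_gb": 256,
    "net_mbit": 2 * 10_000,      # 2x 10 GBit/s
    "disk_iops": 12 * 200,       # 12 disks, 200 IOPS each
    "ssd_iops": 4 * 20_000,      # 4 SSDs, 20k IOPS each
}


def per_core(box):
    """Share of every resource that belongs to one core."""
    return {name: value / box["cores"]
            for name, value in box.items() if name != "cores"}


unit = per_core(BOX)

if __name__ == "__main__":
    for name, value in unit.items():
        print(f"{name}: {value:.1f}")
```

The exact results are about 5.3 GB RAM, 417 MBit/s, 50 disk IOPS and 1667 SSD IOPS; the slide rounds these down to 5 GB, 400 MBit/s, 50 and 1500.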

Page 36: What is the Cloud? (english)

Flavors

• “Compute Unit”: “1/48 of a box”
  • 5 GB RAM, 400 MBit/s, 50 IOPS Disk, 1500 IOPS SSD
• Flavor:
  • x Compute Units
  • Flavor i = 2 * Flavor (i-1)
  • no clipping waste
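A sketch of why doubling flavors avoid clipping waste: any number of free compute units can be covered exactly by flavors whose sizes are powers of two. The flavor names (`cu1`, `cu2`, …) and the greedy fill below are illustrative assumptions, not the scheduler of any particular cloud.

```python
from dataclasses import dataclass

BOX_UNITS = 48   # one box = 48 compute units


@dataclass(frozen=True)
class Flavor:
    units: int

    @property
    def name(self):
        return f"cu{self.units}"      # hypothetical naming scheme

    @property
    def ram_gb(self):
        return 5 * self.units         # 5 GB RAM per compute unit

    @property
    def net_mbit(self):
        return 400 * self.units       # 400 MBit/s per compute unit


def make_flavors(box_units=BOX_UNITS):
    """Flavor i = 2 * Flavor (i-1), up to the size of one box."""
    flavors, units = [], 1
    while units <= box_units:
        flavors.append(Flavor(units))
        units *= 2
    return flavors


def fill(free_units, flavors):
    """Greedily cover `free_units` with the largest flavors first.
    Returns (list of flavor sizes used, units left over)."""
    chosen = []
    for flavor in sorted(flavors, key=lambda f: f.units, reverse=True):
        while flavor.units <= free_units:
            chosen.append(flavor.units)
            free_units -= flavor.units
    return chosen, free_units


if __name__ == "__main__":
    flavors = make_flavors()
    print([f.name for f in flavors])
    print(fill(48, flavors))   # an empty box
    print(fill(40, flavors))   # after one cu8 has been placed
```

Because the smallest flavor is one compute unit and each larger one is exactly two of the previous size, the leftover is always zero.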

Page 37: What is the Cloud? (english)

Isolation: Quota on everything

• CPU Cores
• RAM
• Disk I/O (IOPS, MB/s)
• Network I/O (Bit/s)

[Diagram: VM, Host]

Page 38: What is the Cloud? (english)

Quota with Token Bucket

[Diagram: token bucket. Arrival Rate, Volume = Elasticity, Consumption]
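A minimal token bucket sketch matching the three labels in the diagram: tokens arrive at a fixed rate, the bucket volume bounds how large a burst can be (the elasticity), and consumption drains it. The class and parameter names are chosen for this example, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Quota with a token bucket.

    rate:     tokens added per second (arrival rate)
    capacity: bucket volume, the largest possible burst (elasticity)
    """

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity        # start with a full bucket
        self.last = now

    def _refill(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def consume(self, amount, now):
        """Try to take `amount` tokens at time `now`; True if allowed."""
        self._refill(now)
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


if __name__ == "__main__":
    # Example: 100 IOPS sustained, bursts of up to 500 I/Os.
    bucket = TokenBucket(rate=100, capacity=500)
    print(bucket.consume(500, now=0.0))   # burst is allowed
    print(bucket.consume(1, now=0.0))     # bucket is empty now
    print(bucket.consume(100, now=1.0))   # one second of refill
```

A guest that stays below the arrival rate keeps a full bucket and can burst; a guest that consumes constantly is held to the sustained rate.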

Page 39: What is the Cloud? (english)

One Image, many instances

[Diagram: the Hardware Node downloads the Ubuntu 14.04 LTS image from Glance; Appserver 1, Appserver 2 and Database Master are copy-on-write instances of that image.]

QCOW2: Turn linear I/O into random I/O

More SSD for everybody!
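A toy model of copy-on-write shows both effects named on the slide: many instances share one read-only base image, and data that is linear from the guest's point of view ends up in write order inside the overlay. This is a simplified illustration, not the actual QCOW2 on-disk format; the class and its methods are made up for this sketch.

```python
class CowOverlay:
    """Toy copy-on-write overlay on top of a shared, read-only base image."""

    def __init__(self, base):
        self.base = base     # list of cluster contents, shared by all overlays
        self.map = {}        # virtual cluster index -> position in self.data
        self.data = []       # overlay clusters, appended on first write

    def write(self, index, value):
        if index not in self.map:
            # First write to this cluster: allocate at the end of the overlay.
            self.map[index] = len(self.data)
            self.data.append(value)
        else:
            self.data[self.map[index]] = value

    def read(self, index):
        if index in self.map:
            return self.data[self.map[index]]
        return self.base[index]          # untouched clusters come from the base

    def physical_layout(self):
        """Overlay positions visited by a linear read of the written clusters."""
        return [self.map[index] for index in sorted(self.map)]


if __name__ == "__main__":
    image = ["b0", "b1", "b2", "b3"]
    appserver = CowOverlay(image)
    database = CowOverlay(image)
    for index, value in [(3, "x"), (0, "y"), (2, "z")]:
        appserver.write(index, value)
    print(appserver.read(0), appserver.read(1), database.read(3))
    print(appserver.physical_layout())
```

Reading clusters 0, 2, 3 in order touches overlay positions 1, 2, 0, so a linear read becomes a scattered one. That access pattern is cheap on SSD and expensive on spinning disks.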

Page 40: What is the Cloud? (english)

Ephemeral vs. Persistent Volume

MySQL DB Master
• /dev/vda: 50 GB, tied to VM, lifetime defined by VM
• /dev/vdb: configurable size, detachable/attachable, lifetime variable (billed)

MySQL DB Master (second VM)
• /dev/vda

Page 41: What is the Cloud? (english)

Floating IP

[Diagram: the two MySQL DB Master VMs from the previous slide (/dev/vda 50 GB, tied to the VM; /dev/vdb detachable/attachable), with Internal IP 1, Internal IP 2 and one Floating IP.]

Page 42: What is the Cloud? (english)

Distributed Anything

Page 43: What is the Cloud? (english)

Distributed Anything

ZK ZK ZK

Page 44: What is the Cloud? (english)

Distributed Anything

ZK ZK ZK

Page 45: What is the Cloud? (english)

Distributed Anything

ZK ZK ZK ZK ZK ZK

Page 46: What is the Cloud? (english)

Distributed Anything

• “Cluster membership” using MySQL, Redis, MongoDB, … does not work.
• “Paxos”, “Raft” and other provable consensus algorithms do work (ZK, etcd, consul)
  • when applied correctly
  • “Kyle Kingsbury Proof” (http://aphyr.com/tags/jepsen)
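Consensus systems of this kind rely on majorities, which is why ensembles usually have an odd number of members. A short sketch of the majority arithmetic; it does not implement Paxos or Raft, and the function names are made up for this example.

```python
def quorum(n):
    """Smallest majority of an ensemble with n members."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many members can fail while a majority is still reachable."""
    return n - quorum(n)


def sides_with_quorum(n, partition_sizes):
    """Indexes of network partitions that still hold a majority of n members."""
    return [i for i, size in enumerate(partition_sizes) if size >= quorum(n)]


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} members: quorum {quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failures")
    print(sides_with_quorum(3, [2, 1]))   # the side with two members continues
    print(sides_with_quorum(6, [3, 3]))   # an even split stops everyone
```

Going from three to four members, or from five to six, adds a machine without tolerating an additional failure, and an even split leaves no side able to make progress.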

Page 47: What is the Cloud? (english)

Control Systems

• "should" (desired) vs. "is" (actual) state
• within the same Paxos-Domain
• State Transition
• Check, Update of "is"-state by measurement
• self regulating
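The control loop described above can be sketched as a reconcile function: measure the "is" state, compare it to the "should" state, act on the difference, and measure again. The function signature and the callbacks are assumptions for this example; keeping both states in one consensus store, as the slide asks, is not modelled here.

```python
def reconcile(desired, measure, act, max_rounds=10):
    """Drive the measured state towards `desired`.

    desired: set of things that should exist
    measure: callable returning the set of things that currently exist
    act:     callable(action, item) with action "create" or "delete"
    Returns the number of rounds that had to take action.
    """
    for round_no in range(max_rounds):
        actual = measure()                 # update "is" state by measurement
        missing = desired - actual
        surplus = actual - desired
        if not missing and not surplus:
            return round_no                # nothing left to do
        for item in sorted(missing):
            act("create", item)
        for item in sorted(surplus):
            act("delete", item)
    raise RuntimeError("state did not converge")


if __name__ == "__main__":
    world = {"web1", "old-db"}             # stand-in for the real infrastructure

    def act(action, item):
        if action == "create":
            world.add(item)
        else:
            world.discard(item)

    rounds = reconcile({"web1", "web2", "db1"}, lambda: set(world), act)
    print(rounds, sorted(world))
```

Running the same loop again after it has converged does nothing, which is what makes the system self regulating: any drift is simply picked up by the next measurement.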

Page 48: What is the Cloud? (english)

Distributed Anything vs. Performance

• “Microservices”
• “Pile of network traversals”
• Disconnect, Partition
• Throughput, Jitter
• Asynchronous Calls? Straggler Handling? Total Jitter!
• non-linear performance, non-linear operational complexity
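One way to see the non-linear effect of stragglers: if a request fans out to several backend calls and has to wait for all of them, the chance of hitting at least one slow call grows quickly with the number of calls. The sketch assumes independent calls and a 1% chance that any single call is slow; both are assumptions for illustration, not measurements.

```python
def p_any_slow(n_calls, p_slow):
    """Probability that at least one of `n_calls` independent calls is slow."""
    return 1 - (1 - p_slow) ** n_calls


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:3d} calls: {p_any_slow(n, 0.01):.1%} of requests "
              f"see at least one slow call")
```

A latency that only one call in a hundred experiences becomes the typical latency of a request once that request depends on a hundred calls.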

Page 49: What is the Cloud? (english)
Page 50: What is the Cloud? (english)

• Virtualization is High Density Computing.
• That's not cheaper, only different.
• In particular, a new network design is needed.
• The SDN issue is mostly open, and a much harder nut to crack than all other topics.

Page 51: What is the Cloud? (english)

• “Infrastructure as Code” is cool.
• “Automated Provisioning”.
• Network insufficiency is visible to upper layers, as fsync/Commit insufficiency.
• “Microservices”, “distributed anything” - avoid, if at all possible. If not, do it properly. Overhead!

Page 52: What is the Cloud? (english)


?