The impact of cloud, NSBCon NY, by Yves Goeleven

Uploaded by: particular-software

Posted on 24-Jun-2015

TRANSCRIPT

Page 1: The impact of cloud NSBCon NY by Yves Goeleven

The impact of cloud

Yves Goeleven

The cloudy Belgian

• Founder of MessageHandler.net

• Developer on NServiceBus

• Microsoft Azure MVP

• @YvesGoeleven

Agenda

The impact of cloud

• Understanding cloud

• Failure is normal

• Size matters

• ‘At your service’

• How to thrive

Understanding

Why are people interested?

Various reasons:

• Automation

• Scalability (scale out)

• Elasticity (scale back in)

• Cost

• Global availability

What is Azure?

A global network of huge datacenters operated by Microsoft

200 services running on top

Diagram: service categories including Storage, Big data, Caching, CDN, Database, Identity, Media, Networking, Traffic, Messaging, Cloud Services, Web Sites, Connectivity, Mobile and Virtual Machines.

Datacenter Network Architecture

Quantum10v2 Architecture (Gen 3)

Diagram: top-of-rack (TOR) switches, spine switches, a DC spine set (DS) and DC routers (DCR); 100K servers, 50,000 Gbps.

Older Architectures

DLA Architecture (Gen 1) and Quantum10 Architecture (Gen 2)

Diagram, Quantum10 (Gen 2): TOR switches, spine switches, BL nodes and DC routers (DCR); 30K servers, 30,000 Gbps.

Diagram, DLA (Gen 1): racks of 40 nodes, each rack with a TOR switch plus Digi and APC units; groups of 20 racks behind aggregation (AGG) and load balancer (LB) pairs, connected through access routers to a DC router; 10K servers, 120 Gbps.

Datacenter Clusters

Datacenters are divided into “clusters”

• Approximately 1000 rack-mounted servers (called “nodes”)

• Each cluster provides a unit of fault isolation

• Each cluster is managed by a Fabric Controller (FC)

• The FC is responsible for:
  • Blade provisioning
  • Blade management
  • Service deployment and lifecycle

Diagram: clusters 1 to n, each managed by its own FC, connected by the datacenter network.

Fabric Controller

The “kernel” of the cloud operating system

• Manages datacenter hardware

• Manages Windows Azure services

• Four main responsibilities:
  • Datacenter resource allocation
  • Datacenter resource provisioning
  • Service lifecycle management
  • Service health management

• Inputs:
  • Description of the hardware and network resources it will control
  • Service model and binaries for cloud applications

Diagram: an analogy. The Windows kernel manages a server and runs processes such as Word and SQL; the Fabric Controller manages a datacenter and runs services such as Exchange Online and SQL Azure.

Deployment

Diagram: a service package.

Service Resource Allocation

Complicated stuff

• Goal: allocate service components to available resources while satisfying all hard constraints:
  • HW requirements: CPU, memory, storage, network
  • Fault domains
  • Update domains

• Secondary goal: satisfy soft constraints:
  • Prefer allocations that simplify servicing the host OS/hypervisor
  • Optimize network proximity: pack nodes
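The fault domain and update domain constraints can be illustrated with a toy allocator. This is a minimal Python sketch, not the Fabric Controller's actual algorithm: it assumes fixed numbers of fault domains and update domains and simply spreads role instances round-robin across both, so that losing one fault domain or upgrading one update domain never takes down every instance. `Placement`, `place_instances` and `survives_single_domain_loss` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    instance: int
    fault_domain: int   # e.g. a rack: machines sharing a single point of failure
    update_domain: int  # instances in the same update domain are upgraded together


def place_instances(count: int, fault_domains: int, update_domains: int) -> list[Placement]:
    """Spread `count` role instances round-robin over fault and update domains.

    Toy model only: a real allocator also checks CPU, memory, storage and network
    requirements and weighs soft constraints such as network proximity.
    """
    if fault_domains < 1 or update_domains < 1:
        raise ValueError("need at least one fault domain and one update domain")
    return [Placement(i, i % fault_domains, i % update_domains) for i in range(count)]


def survives_single_domain_loss(placements: list[Placement]) -> bool:
    """True if losing any one fault domain, or upgrading any one update domain, leaves an instance running."""
    fault = {p.fault_domain for p in placements}
    update = {p.update_domain for p in placements}
    return len(fault) >= 2 and len(update) >= 2


if __name__ == "__main__":
    for p in place_instances(3, fault_domains=2, update_domains=5):
        print(p)
    # A single instance is always a single point of failure.
    print(survives_single_domain_loss(place_instances(1, 2, 5)))
```

For example, `place_instances(3, 2, 5)` puts instances 0 and 2 in fault domain 0 and instance 1 in fault domain 1, each in its own update domain. With only one instance, no placement can survive losing its domain, which is the same reason the deck later asks for at least 2 instances of each role.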

Diagram: the service package is deployed to virtual machines spread across Server Rack 1 and Server Rack 2, in three steps: provision role instances, deploy app code, configure network.

Service Deployment

Provisioning a Node

• Power on the node

• PXE-boot the Maintenance OS

• The agent formats the disk and downloads the Host OS via Windows Deployment Services (WDS)

• The Host OS boots, runs Sysprep /specialize, and reboots

• The FC connects with the “Host Agent”

Diagram: the Fabric Controller, an image repository (Maintenance OS, parent OS and role images), a PXE server and a Windows Deployment Server provision a node, which ends up running the Windows Azure OS, the FC Host Agent and the Windows Azure Hypervisor.

Diagram: a service package deployed into a Windows Azure datacenter; steps: provision role instances, deploy app code, configure network.

Diagram: the deployed service in the Azure datacenter behind a network load balancer, which is configured for traffic in the final step (configure network).

Failure is normal

Diagram: the service behind the network load balancer in the Azure datacenter.

Implications

Of commodity hardware with self-healing

• Machine failure is normal

• Machines are small, with low specs

• Little to no redundancy

• Always in a partially broken state

• The FC provisions ‘clean’ machines

• This can occur at any time:
  • On failure
  • On host upgrades
  • On moves

How to handle

Small machines & continuous failure

• Distribute & duplicate the application across multiple machines
  • At least 2 of each (3 or 5 is better)

• Accept that the target machine may be down
  • Ensure temporal decoupling

• Do not design ‘RPC-style’; use queueing instead

• Do not put anything on disk
  • You will lose data!
  • Except for Virtual Machines with persisted data disks
  • Use Azure Storage services instead*
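Temporal decoupling can be sketched in a few lines. This minimal Python example assumes an in-memory queue as a stand-in for a durable queue service such as Azure Storage Queues; `InMemoryQueue` and `Receiver` are hypothetical names. Because the sender only ever talks to the queue, a receiver that is down merely delays processing instead of failing the caller, which is what an RPC-style call would do.

```python
from collections import deque
from typing import Optional


class InMemoryQueue:
    """Hypothetical stand-in for a durable queue service; it only lives in memory here."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class Receiver:
    """A consumer that may be offline (failed, upgrading or being moved) at any time."""

    def __init__(self, queue: InMemoryQueue) -> None:
        self.queue = queue
        self.online = False
        self.processed: list[str] = []

    def poll(self) -> None:
        if not self.online:
            return  # a down machine does not poll; messages simply wait in the queue
        while (body := self.queue.receive()) is not None:
            self.processed.append(body)


if __name__ == "__main__":
    queue = InMemoryQueue()
    receiver = Receiver(queue)

    # The sender only talks to the queue, so it succeeds while the receiver is down.
    queue.send("PlaceOrder 1")
    queue.send("PlaceOrder 2")
    receiver.poll()  # nothing happens: the receiver is offline

    receiver.online = True  # e.g. the FC has provisioned a clean replacement machine
    receiver.poll()
    print(receiver.processed)  # ['PlaceOrder 1', 'PlaceOrder 2']
```

A real queue would also have to survive the receiver crashing mid-message, which is what the peek-lock receive mode discussed later in the deck addresses.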

Size matters

Some numbers

Just to illustrate how huge Azure is

• 13 regions

• 321 IP ranges

• 250,000+ customers

• 2,000,000+ VMs

• 25+ trillion objects stored

Implications

Of such a huge network

• Latency is a given

• Network IO is typically the bottleneck

• Network partitioning is normal

• Distributed transactions are flaky or not supported

How to handle

Latency & lack of DTC

• Most operations in the cloud will be IO/network bound:
  • Use multithreaded processing
  • Process messages (that is, wait on IO) in parallel
  • But don’t overdo it (12-24 threads per core)

• Lack of DTC (Distributed Transaction Coordinator):
  • Keep operations atomic
  • Use compensation logic
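Parallel processing of IO-bound work can be sketched with a bounded thread pool. In this minimal Python example, `handle` is a hypothetical handler that simulates a network wait with `time.sleep`, and using 12 threads per core (the low end of the slide's range) is an assumption. Because the threads spend their time waiting rather than computing, many more threads than cores still pay off.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: the low end of the 12-24 per core guideline from the slide.
THREADS_PER_CORE = 12


def handle(message: str) -> str:
    """Hypothetical handler whose time is dominated by waiting on the network."""
    time.sleep(0.05)  # simulated IO wait, e.g. a storage or queue round trip
    return message.upper()


def process_in_parallel(messages: list[str]) -> list[str]:
    # Threads are cheap to park on IO, so use more threads than cores, but cap them.
    workers = THREADS_PER_CORE * (os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, messages))  # results keep input order


if __name__ == "__main__":
    batch = [f"msg-{i}" for i in range(48)]
    start = time.perf_counter()
    results = process_in_parallel(batch)
    elapsed = time.perf_counter() - start
    # Sequentially this would take about 48 * 0.05 s = 2.4 s.
    print(len(results), f"{elapsed:.2f}s")
```

The cap reflects the slide's advice not to overdo it: past some point, extra threads only add context switching and more concurrent requests for the backing service to throttle.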

‘At your service’

As a Service

Understanding

• Same capabilities as a product, but it’s not a product

• Operated by the vendor

• Multitenant, aka shared hosting

• Low profit margins

• ‘Capacity’ vs. ‘provisioned’

Implications

Microsoft doesn’t want you to be in control!

• Individual resources are limited

• Throttling

• Your resources are moved around: unpredictable resource performance

• Transient errors

• No locks, or very short locks

• No local transactions!
  • One exception: SQL, as transactions are built into the protocol

How to handle

Throttling & lack of transactions

• Retry, retry, retry:
  • On transient errors and throttling
  • With backoff algorithms

• Lack of local transactions:
  • Keep operations atomic, with retries
  • Use compensation logic
  • Take care of idempotency
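The retry-with-backoff advice can be sketched generically. This is a minimal Python example, not an Azure SDK or NServiceBus API: `TransientError` and `retry_with_backoff` are hypothetical, and the attempt count and delays are arbitrary assumptions. Capped exponential backoff with random "full jitter" is one common backoff algorithm; the slide does not prescribe a specific one.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures, such as throttling or a timeout."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Upper bound doubles each attempt (0.1, 0.2, 0.4, ... s), capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time up to the ceiling so throttled clients spread out.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    calls = 0

    def flaky_storage_call() -> str:
        global calls
        calls += 1
        if calls < 3:
            raise TransientError("server busy (simulated)")
        return "ok"

    print(retry_with_backoff(flaky_storage_call), "after", calls, "attempts")
```

Passing `sleep` in as a parameter keeps the sketch testable; only errors classified as transient are retried, while anything else fails immediately.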

Thrive

How to thrive in the cloud

Use NServiceBus to deal with the shortcomings

• Messaging provides distribution & temporal decoupling

• Multithreading model built in
  • Ideal for network-bound operations

• Retry, retry, retry
  • Azure transports use retries instead of relying on transactions
  • First Level Retries
  • Second Level Retries
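The idea behind two retry levels can be modelled as follows: immediate retries absorb short glitches, and delayed rounds give throttling, or a machine being reprovisioned, time to pass before the message is parked in an error queue. This Python sketch is hypothetical and is not the NServiceBus implementation or its configuration API; the counts (5 immediate retries, 3 delayed rounds, 10-second steps) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    succeeded: bool = False
    attempts: int = 0
    delays: list[float] = field(default_factory=list)  # waits before each delayed round
    sent_to_error_queue: bool = False


def process_with_retries(
    handler: Callable[[str], None],
    message: str,
    immediate_retries: int = 5,
    delayed_rounds: int = 3,
    delay_step: float = 10.0,
    wait: Callable[[float], None] = lambda seconds: None,
) -> Outcome:
    """Two-level retry model: immediate retries, then delayed rounds, then an error queue.

    `wait` is injectable so the sketch runs instantly; a real system would defer
    the message for later redelivery instead of blocking a thread.
    """
    outcome = Outcome()
    for round_number in range(delayed_rounds + 1):
        if round_number > 0:
            # Second level: back off for longer each round (10 s, 20 s, 30 s, ...).
            delay = delay_step * round_number
            outcome.delays.append(delay)
            wait(delay)
        # First level: the attempt plus immediate retries, for short glitches.
        for _ in range(immediate_retries + 1):
            outcome.attempts += 1
            try:
                handler(message)
            except Exception:
                continue
            outcome.succeeded = True
            return outcome
    # Still failing: move the message aside for inspection instead of losing it.
    outcome.sent_to_error_queue = True
    return outcome
```

With these assumed settings, a handler that always fails is tried (5 + 1) × (3 + 1) = 24 times before the message goes to the error queue.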

Choosing the right transport

Both support retries and are built for reliability

• Azure Service Bus

• Azure Storage

Azure Storage Queues

Queue construct in Azure Storage Services

• Extremely reliable

• Very cheap

• 200TB/500TB capacity limit

• HTTP(S) based

• Queue peek-lock for retries

• Maximum TTL of 7 days!
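Peek-lock style receiving is what makes retries possible without transactions: a received message is hidden for a visibility timeout rather than deleted, and it reappears if the consumer dies before deleting it. The Python model below is a hypothetical in-memory sketch, not the Azure Storage SDK. It also drops messages older than the time-to-live, which is why the 7-day limit matters for messages that may wait a long time in a backed-up queue.

```python
from dataclasses import dataclass
from typing import Optional

SEVEN_DAYS = 7 * 24 * 3600  # the slide's maximum TTL, in seconds


@dataclass
class Message:
    id: int
    body: str
    inserted_at: float
    visible_at: float
    dequeue_count: int = 0


class PeekLockQueue:
    """In-memory model of peek-lock receiving (hypothetical, not the Azure SDK)."""

    def __init__(self, ttl: float = SEVEN_DAYS) -> None:
        self.ttl = ttl
        self.messages: dict[int, Message] = {}
        self.next_id = 0

    def send(self, body: str, now: float) -> None:
        self.messages[self.next_id] = Message(self.next_id, body, now, now)
        self.next_id += 1

    def receive(self, now: float, visibility_timeout: float) -> Optional[Message]:
        # Expired messages are gone for good once the TTL has passed.
        expired = [m.id for m in self.messages.values() if now - m.inserted_at >= self.ttl]
        for message_id in expired:
            del self.messages[message_id]
        for m in self.messages.values():
            if m.visible_at <= now:
                m.visible_at = now + visibility_timeout  # hide it instead of removing it
                m.dequeue_count += 1  # lets a consumer spot repeated failures
                return m
        return None

    def complete(self, message_id: int) -> None:
        # Only an explicit delete after successful processing removes the message.
        self.messages.pop(message_id, None)
```

If a consumer receives a message with a 30-second timeout and crashes, a receive after those 30 seconds returns the same message with `dequeue_count == 2`; a retry therefore happens without any transaction.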

Azure Service Bus

Broker service in Azure

• Highly reliable

• Supports queues, topics & subscriptions

• 5GB capacity limit

• No limit on TTL

• TCP based, lower latency

• Queue peek-lock for retries

• Emulates local transactions

• Loads of additional features

• Relatively expensive*

Azure Service Bus

Additional features & applicability

• Applicable:
  • Duplicate detection: within a time window
  • Partitioning: a bundle of queues/topics
  • Message ordering
  • Dead-lettering
  • Batched operations

• Not applicable:
  • Sessions: instance affinity for a set of messages, used for large messages; use the databus instead
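Duplicate detection within a time window can be modelled as the broker remembering message IDs for a fixed period and dropping any repeat inside it. This Python sketch is a hypothetical in-memory model, not the Service Bus API, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import OrderedDict


class DuplicateDetector:
    """Remember message IDs for `window` seconds and reject repeats inside that window."""

    def __init__(self, window: float) -> None:
        self.window = window
        # message id -> time first seen; insertion order keeps the oldest entry first
        self.seen: "OrderedDict[str, float]" = OrderedDict()

    def accept(self, message_id: str, now: float) -> bool:
        # Forget IDs that have fallen out of the window.
        while self.seen and now - next(iter(self.seen.values())) >= self.window:
            self.seen.popitem(last=False)
        if message_id in self.seen:
            return False  # duplicate within the window: drop it
        self.seen[message_id] = now
        return True


if __name__ == "__main__":
    detector = DuplicateDetector(window=600)
    print(detector.accept("order-1", now=0))    # True: first time seen
    print(detector.accept("order-1", now=100))  # False: resend inside the window
    print(detector.accept("order-1", now=600))  # True: the window has elapsed
```

A resend that arrives after the window is accepted again, which is one reason handlers still need to take care of idempotency themselves.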

How to thrive in the cloud

Deal with the cost model

• A worker role translates to at least 2 VMs

• An endpoint per handler
  • Gets expensive very fast

• Shared endpoint hosting is provided

How to thrive in the cloud

Do not trust your disk!

• Do not put anything on disk!
  • The machine will fail, and the disk will be gone!
  • Has anyone noticed there is no SLA for individual VMs?

• Put your data in Azure Storage services
  • 99.99% SLA
  • Locally redundant & geo-redundant

How to thrive in the cloud

NServiceBus helps a lot, but you need to code for it as well

• You need to take care of idempotency
  • Atomic message handler implementations
  • Sagas too! Update saga state & nothing else!
  • Use sagas to coordinate compensation logic
  • Check for retries
  • Check side effects

See http://docs.particular.net/nservicebus/understanding-transactions-in-windows-azure for more options.
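Idempotency can be sketched as recording which message IDs have already been applied, in the same atomic write as the state change. In this hypothetical Python model, `Account` stands in for a single storage entity and `handle_deposit` for a message handler; neither is an NServiceBus API. In a real system the balance and the processed-ID marker would have to be persisted in one atomic operation; the in-memory version only shows the logic.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical state that is stored and updated as one atomic unit."""
    balance: int = 0
    processed_message_ids: set[str] = field(default_factory=set)


def handle_deposit(account: Account, message_id: str, amount: int) -> None:
    # Check for retries: a message that was already applied is acknowledged and skipped,
    # so redelivery after a crash or a peek-lock timeout has no extra side effect.
    if message_id in account.processed_message_ids:
        return
    # Change the state and record the message ID together (one write in a real store).
    account.balance += amount
    account.processed_message_ids.add(message_id)


if __name__ == "__main__":
    account = Account()
    handle_deposit(account, "msg-1", 100)
    handle_deposit(account, "msg-1", 100)  # redelivered duplicate: ignored
    handle_deposit(account, "msg-2", 50)
    print(account.balance)  # 150
```

Keeping the handler's only write to this one entity is what "atomic message handler implementations" buys: there is no second resource that could be left half-updated when a retry happens.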

Wrap-up

Want to know more?

• Overview: http://docs.particular.net/nservicebus/windows-azure-transport

• Hosting: http://docs.particular.net/nservicebus/hosting-nservicebus-in-windows-azure

• Cloud services: http://docs.particular.net/nservicebus/hosting-nservicebus-in-windows-azure-cloud-services

• Shared host: http://docs.particular.net/nservicebus/shared-hosting-nservicebus-in-windows-azure-cloud-services

• Azure Service Bus: http://docs.particular.net/nservicebus/using-azure-servicebus-as-transport-in-nservicebus

• Azure Storage Queues: http://docs.particular.net/nservicebus/using-azure-storage-queues-as-transport-in-nservicebus

• Storage persistence: http://docs.particular.net/nservicebus/using-azure-storage-persistence-in-nservicebus

• Transactions: http://docs.particular.net/nservicebus/understanding-transactions-in-windows-azure

Resources

Or get your hands dirty?

• Samples: https://github.com/particular/nservicebus.azure.samples

Thanks