Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)


Post on 10-Jul-2015


Windows Azure Platform Technical Deep Dive

Chris J.T. Auld
Director
Intergen Ltd
twitter.com/cauld

Notes Before We Begin

I don’t work for Microsoft
• Will call a ‘spade a spade’ today

Azure is a young technology
• Best practices are still emerging

We focus on architecture in this session
• A few demos only

No prior experience assumed

We have lots of attendees and just a little time
• Questions at the end please

TicketDirect: An example application

TicketDirect is a ticketing company in Australia and New Zealand

Ticketing is uniquely suited to the cloud

Will use TicketDirect as real world example today

TicketDirect Architecture

[Architecture diagram; labels listed by area]

SQL Azure
• Castellan application DB
• Castellan Venue DBs: Venue 1 partition(s), Venue 2 partition(s), ... Venue N partition(s)
• One application DB, many venue DBs, each partitioned into many parts (40+)

Azure Roles
• http://TicketDirect.*
• Dynamic Worker (tasks uploaded as blobs)
• Partitioner Worker
• Distributed Cache Worker

Azure Storage
• Queues for communication between clients and roles
• Tables to record server & partition information
• Blobs to store web and worker role resources

Client Applications
• Castellan.old (VB6)
• Castellan.Azure: Box Office sales, Ticket Printing, System Administration, Venue/Event Management, Partitioning

Connectivity
• .NET Service Bus
• WCF

On Premise
• SQL Server: Castellan Venue

A Global Hardware Platform

Global Foundation Services
http://www.globalfoundatonservices.com

6 Azure Data Centers
• Europe - West/North
• Asia - East/Southeast
• USA - South Central/North Central

New containerised data centers

Approaching PUE of 1.2

Thousands of computation units

The High Scale Application Archetype

[Diagram labels]
• Intelligent Network Load Balancer
• Network Activation
• Stateless Web and/or Application Servers
• Async Activation
• Queues
• Stateless ‘Worker’ Machines
• State Tier: Shared Filesystem, Partitioned RDBMS, ‘NoSQL’ Datastores

Windows Azure provides a ‘pay-as-you-go’ scale out application platform

Azure Service Architecture

[Diagram: Windows Azure Data Center]
• The Internet, via TCP or HTTP
• LB (load balancers)
• Web Role: Web Site (ASPX, ASMX, WCF), IIS as host
• Worker Role: Worker Service, managed interface call
• Storage: Tables, Blobs, Queues

Cloud != Cloud

Windows Azure

[Diagram labels: Fabric Controller, Web Portal (API), load balancers (LB), DNS, Your Service]

Service Deployment

[Diagram labels: Web Portal (API), Fabric Controller, Service Model, config, Service instances, load balancers (LB), DNS, Your Service]

Hello Windows Azure

Service Update

[Diagram labels: Web Portal (API), Fabric Controller, Service Model, config, Service instances in production and staging, load balancers (LB), DNS, Your Service]

Upgrading Your Application

Two Models: VIP Swap and In-Place Upgrade

VIP Swap:
• Uses Staging and Production environments.
• Allows the environments to be swapped quickly.
• Before the swap, Production runs v1 and Staging runs v2; after the swap, Production runs v2 and Staging runs v1.

In-Place Upgrade:
• Performs a rolling upgrade on the live service.
• Entire service or a single role
• Manual or Automatic across update domains
• Cannot change Service Model

Fault and Upgrade Domains

[Diagram: six service instances]

Configuration

Service Configuration
• ServiceDefinition.csdef – Service Model
• ServiceConfiguration.cscfg – instance data

RoleEnvironment.GetConfigurationSettingValue()

Don’t use web.config for values you wish to change at runtime
• A web.config change requires a re-deploy

Service Scaling

[Diagram labels: Web Portal (API), Fabric Controller, Service Model, multiple Service instances, load balancers (LB), DNS, Your Service]

Rule Based Auto-Scaling

Use Service Management API

Predictable or Periodic Demand
• Time based rules

Unpredictable demand
• Monitor metrics and react accordingly

Monitor metrics

Primary metrics (actual work done)
• Requests per Second
• Queue messages processed / interval

Secondary metrics
• CPU Utilization
• Queue length
• Response time

Derivative metrics
• Rate of change of queue length
• Use ‘historical’ data to help predict requirements

Evaluating Business Rules

Are requests taking too long?

Do I have too many jobs in my queue?

How much money have I spent this month?

Could write these into code.

Could build some sort of rules engine.

Could use WF rules engine.

Take Action

Add/Remove Instances
• Use Service Management API
• Don’t forget billing window is 1hr

Change role size
• Requires change to *.csdef
• Most suited to Worker Roles

Send notifications
• Email
• IM

Manage momentum
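The monitor, evaluate, act loop from the auto-scaling slides can be sketched as a single rules function. This is only an illustration: the `Metrics` fields, every threshold and the name `decide_instance_count` are assumptions made for the example, and a real implementation would apply the result through the Service Management API rather than just returning a number.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Snapshot of monitored values (field names are hypothetical)."""
    queue_length: int         # secondary metric
    queue_length_delta: int   # derivative metric: change since the last sample
    avg_response_ms: float    # secondary metric
    spend_this_month: float   # business rule input, in dollars


def decide_instance_count(current: int, m: Metrics,
                          min_instances: int = 2, max_instances: int = 10,
                          budget: float = 1000.0) -> int:
    """Return the desired instance count for one evaluation pass."""
    desired = current
    # Scale out when work is backing up or requests are slow.
    if m.queue_length > 100 or m.avg_response_ms > 2000:
        desired = current + 1
    # Scale out faster when the backlog is also growing quickly.
    if m.queue_length > 100 and m.queue_length_delta > 50:
        desired = current + 2
    # Scale in only when the system is clearly idle and the queue is not growing.
    if m.queue_length < 10 and m.avg_response_ms < 500 and m.queue_length_delta <= 0:
        desired = current - 1
    # Business rule: never scale out once the monthly budget is spent.
    if m.spend_this_month >= budget:
        desired = min(desired, current)
    return max(min_instances, min(max_instances, desired))
```

Because the billing window is one hour, a caller would probably evaluate these rules infrequently and avoid removing instances it has only just added, which is one way to read the ‘manage momentum’ point.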

Service Monitoring & Recovery

[Diagram labels: Web Portal (API), Fabric Controller, Service Model, Service instances, load balancers (LB), DNS, Your Service]

Monitoring

No Debugging in Cloud

Instrument your application using Trace, Debug

Use Diagnostics API to Configure and Collect
• Event Logs
• Performance Counters
• Trace/Debug information (logging)
• IIS Logs, Failed Request Logs

Request data on demand or scheduled
• Transferred into your table and/or blob storage

Everything is remotely configurable

Hello Windows Azure v2

Storage

Scalable storage in Azure Datacenter
• 100 TB per storage account

Accessible via RESTful Web Service API
• Access from Azure Compute
• Access from anywhere via internet
• Supporting .NET Client Library

Various storage types
• Table
• Queue
• Blob
• Drives

Storage

Tables
• Table = group of entities
• Entity = name/value pairs
• Partitioned by key
• Scale out to billions of entities
• Not an RDBMS

Blobs
• Large binary storage
• Stored in container
• Unlimited containers
• CDN Deliverable

Queues
• Simple message queue
• Not transactional
• Read at least once
• Delete to remove message, otherwise it is returned to the queue
• Partitioned by Queue Name

Using Queues for Async Processing

[Diagram: Web Roles and Worker Roles behind load balancers (LB); Storage with a Queue, a Blob Container and a Table; a 30 MB JPEG]

1. User uploads large image file
2. Image inserted into blob storage
3. Message placed on queue incl. blob URI and metadata
4. Worker role is polling queue. Reads message from queue
5. Worker role processes message, reads from blob storage, generates thumbnail
6. Thumbnail and metadata stored in Table storage
7. Message deleted from queue
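The seven-step upload flow can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: the dictionaries stand in for blob and table storage, Python's `queue.Queue` stands in for a storage queue, and `make_thumbnail` is a placeholder rather than real image processing.

```python
import queue

blobs = {}                  # stand-in for a blob container: URI -> bytes
thumbnails = {}             # stand-in for table storage: URI -> metadata + thumbnail
work_queue = queue.Queue()  # stand-in for a storage queue


def upload_image(uri: str, data: bytes, user: str) -> None:
    """Web role: store the blob, then enqueue a small message pointing at it."""
    blobs[uri] = data
    work_queue.put({"blob_uri": uri, "user": user})


def make_thumbnail(data: bytes) -> bytes:
    """Placeholder for real image resizing: keeps the first 16 bytes."""
    return data[:16]


def process_one() -> bool:
    """Worker role: handle one message; return False if the queue is empty."""
    try:
        msg = work_queue.get_nowait()
    except queue.Empty:
        return False
    data = blobs[msg["blob_uri"]]
    thumbnails[msg["blob_uri"]] = {"user": msg["user"],
                                   "thumb": make_thumbnail(data)}
    # With a real storage queue the message would be deleted explicitly here,
    # only after the work has succeeded; queue.Queue removes it on get instead.
    work_queue.task_done()
    return True


upload_image("images/photo1.jpg", b"x" * 1000, "alice")
while process_one():
    pass
```

The message carries only the blob URI and metadata, so the queue stays small even when the images are large.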

Idempotency

f(x) = f(f(x))

Queues are NOT transactional

First Step For Software Architects

If you expect to write documents mentioning idempotency:
1. Open Word
2. Type idempotency
3. Right click
4. Choose add to dictionary

WARNING: Failure to follow these steps will surely see you sending an important architecture and design document to a client with the ‘corrected’ spelling of the word... impotency

Messages Process At Least Once

[Diagram: Web Roles and Worker Roles behind load balancers (LB), with a Storage Queue]

1. Debit bank account $100 message
2. Worker role reads message
3. Balance debited $100
4. Worker role is torn down before message can be deleted
5. 3 minutes later, message re-appears on queue
6. Worker role reads message
7. Balance debited $100
8. Message deleted from queue
9. Chaos ensues.....
10. Customer calls bank.....

Balance = $1000, then $900, then $800

Solving The Idempotency Problem

[Diagram: Web Roles and Worker Roles behind load balancers (LB), with a Storage Queue, a Table that is queried for the transaction ID, and a second Queue]

1. Debit bank account $100 message with transaction ID
2. Worker role reads message. Checks transaction ID not present.
3. Writes transaction ID with state ‘Started’ to ‘Replay Log’
4. Balance debited $100
5. Worker role is torn down before message can be deleted
6. 3 minutes later, message re-appears on queue
7. Worker role reads message. Checks transaction ID. It is present in state ‘Started’.
8. Compensating message written to another queue
9. Message deleted from queue
10. Compensating message processed.

Balance = $1000, then $900
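The replay-log steps can be sketched as follows. This is a minimal in-memory illustration, not the speaker's implementation: the dictionary stands in for the table, the list for the second queue, and the `DONE` state and the `crash_before_delete` flag are assumptions added so the example can simulate a worker being torn down.

```python
from enum import Enum


class State(Enum):
    STARTED = "Started"
    DONE = "Done"          # assumed extra state; the slide only names 'Started'


replay_log = {}            # transaction ID -> State (stand-in for a storage table)
compensation_queue = []    # stand-in for the second queue
balance = {"acct-1": 1000}


def handle_debit(msg: dict, crash_before_delete: bool = False) -> bool:
    """Process one debit message. Returns True if the message may be deleted."""
    tx_id = msg["tx_id"]
    state = replay_log.get(tx_id)
    if state is State.DONE:
        return True  # already fully processed; the duplicate can be deleted
    if state is State.STARTED:
        # An earlier attempt started but never finished. We cannot tell whether
        # the debit happened, so hand the message to a compensating process.
        compensation_queue.append(msg)
        return True
    replay_log[tx_id] = State.STARTED
    balance[msg["account"]] -= msg["amount"]
    if crash_before_delete:
        return False  # simulate the worker being torn down at this point
    replay_log[tx_id] = State.DONE
    return True


msg = {"tx_id": "tx-42", "account": "acct-1", "amount": 100}
handle_debit(msg, crash_before_delete=True)  # first delivery, worker dies
handle_debit(msg)                            # message re-appears and is re-read
```

After both deliveries the balance has been debited once, and the ambiguous attempt is left for the compensating process to resolve.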

Azure Tables

Not an RDBMS

Azure Table Storage – Key Points

Partition Key is the killer feature
• Partitions are Auto-Balanced
• No need to partition into equal bins

Hot partitions may be scaled up
• Azure fabric may dedicate more resources to partitions with high Tx load

Partition Key AND Row Key = Primary Key
• Must include PartitionKey for Create, Update, Delete

Select queries across partitions are parallelized, resource intensive and potentially more expensive!

Azure Table Storage – Key Points

Continuation Tokens May Be Returned from Cross Partition Queries
• Any query not including the PartitionKey needs to handle Continuation tokens
• http://tinyurl.com/ContToken

Key Columns Up to 1KB in size
• Should aim to keep to 260 char URI limit

Be aggressive, e.g. only ever query by an ID?
• RowKey = PartitionKey

All queries should include partition key
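Handling continuation tokens amounts to looping until the service stops returning one. The sketch below shows only that loop: `fetch_page` is a hypothetical callable standing in for whatever client call issues the query, and the fake pages exist purely so the example runs.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (entities, continuation token or None)


def query_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every entity, following continuation tokens until none is returned."""
    token = None
    while True:
        entities, token = fetch_page(token)
        yield from entities
        if token is None:
            break


# Fake service for illustration: three pages keyed by the token that requests them.
_pages = {
    None: ([{"id": 1}, {"id": 2}], "t1"),
    "t1": ([], "t2"),             # assume a page may be empty yet carry a token
    "t2": ([{"id": 3}], None),
}

results = list(query_all(lambda token: _pages[token]))
```

The loop deliberately tests the token rather than whether the page was empty, so an empty page with a token does not end the query early.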

NoSQL/Non-Relational Data Modelling

Azure Tables != RDBMS

Storage is cheap

Cross partition queries are resource intensive

De-normalization and massive duplication are often the name of the game

E.g. Tweet Storage

Tweet: TweetID, UserID, DateTimeStamp, Message

E.g. Tweet Storage (relational model)

Tweet: ..., Message
TweetWord: TweetID, WordID
Word: WordID, Word (IX)

E.g. Tweet Storage (Azure Tables, indexed by word)

Tweet: TweetID (RK), UserID (PK), DateTimeStamp, Message
TweetIndex: TweetID (RK), Word (PK), UserID, DateTimeStamp, Message

E.g. Tweet Storage (Azure Tables, indexed by mention)

Tweet: TweetID (RK), UserID (PK), DateTimeStamp, Message
MentionIndex: TweetID (RK), UserID (PK), UserID, DateTimeStamp, Message
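The de-normalized tweet tables can be sketched as a function that produces every row to write for one tweet. The table and column names come from the slides; the word splitting, the `@name` mention convention and the function name `tweet_entities` are assumptions made for the example.

```python
def tweet_entities(tweet_id: str, user_id: str, timestamp: str, message: str) -> dict:
    """Build the rows to write for one tweet, grouped by table name."""
    base = {"TweetID": tweet_id, "UserID": user_id,
            "DateTimeStamp": timestamp, "Message": message}
    rows = {"Tweet": [], "TweetIndex": [], "MentionIndex": []}
    # Tweet table: all of a user's tweets share one partition.
    rows["Tweet"].append({"PartitionKey": user_id, "RowKey": tweet_id, **base})
    words = {w.strip(".,!?").lower() for w in message.split()}
    for word in sorted(w for w in words if w):
        if word.startswith("@") and len(word) > 1:
            # MentionIndex: partitioned by the mentioned user.
            rows["MentionIndex"].append(
                {"PartitionKey": word[1:], "RowKey": tweet_id, **base})
        else:
            # TweetIndex: partitioned by word, with the full tweet duplicated.
            rows["TweetIndex"].append(
                {"PartitionKey": word, "RowKey": tweet_id, **base})
    return rows


rows = tweet_entities("t1", "chris", "2010-06-01T09:00:00Z",
                      "Hello @bob, hello Azure!")
```

Each index row repeats the whole tweet, so a search by word or by mentioned user reads a single partition and needs no join.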

Pricing

Compute
• $0.12 / CPU hour (or part thereof)
• ~ 1.7 GHz, 2 GB RAM, Single Core
• $2.88 / Day
• $86.40 / 30 days (billing period)
• 2 instances = $172.80 / month

Storage
• $0.15 / GB / Month
• $0.01 / 10,000 calls to storage web service

Bandwidth
• $0.30 / GB inbound to Asian datacenters
• $0.45 / GB outbound from Asian datacenters
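The compute figures follow directly from the hourly rate, as this short calculation shows. The rates are the ones quoted on the slide; `storage_cost` is a hypothetical helper, and it assumes storage calls are charged pro rata rather than in whole blocks of 10,000.

```python
from decimal import Decimal

RATE_PER_HOUR = Decimal("0.12")        # compute rate from the slide

per_day = RATE_PER_HOUR * 24           # one instance running all day
per_30_days = per_day * 30             # one instance for a billing period
two_instances = per_30_days * 2


def storage_cost(gb_stored: Decimal, calls: int) -> Decimal:
    """Monthly storage cost: $0.15 per GB plus $0.01 per 10,000 calls."""
    return gb_stored * Decimal("0.15") + Decimal(calls) / 10000 * Decimal("0.01")
```

For example, `storage_cost(Decimal(50), 1_000_000)` covers 50 GB stored and a million calls in a month.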


Design Considerations

Scale and availability are the design points

Storage isn’t a relational database

Stateless
• Stateless front ends, store state in storage

Use queues to decouple components

Instrument your application (Trace)

Once you are on - stay on

Think about patching & updates

SQL Azure

Initial Services
• Database: Core SQL Server database capabilities in cloud optimized topology
• Highly compatible with on premise SQL Server

Future Services
• Data Sync – Enables the sync framework
• Additional SQL Server capabilities available as a service: Business Intelligence and Reporting

SQL Azure Details

SQL Azure provides logical SQL Server
• Gateway server that understands TDS protocol
• Looks like SQL Server to TDS Client
• Actual data stored on multiple backend data nodes

Logical optimisations supported
• Indexes, Query plans etc.

Physical optimisations not supported
• File Groups, Partitions etc.

SQL Azure transparently manages physical storage

SQL Azure Deployment

[Diagram labels: DB Script, Web Portal (API), SQL Azure, TDS]

SQL Azure: Accessing databases

[Diagram labels: Your App, Web Portal (API), SQL Azure, TDS]

Change Connection String

SQL Azure

Database Replicas

[Diagram: DB with Replica 1, Replica 2 and Replica 3]

Shared Environment

[Diagram: databases A, B, C and D each appear three times, spread across four hardware boundaries]

SQL Azure Database Monitoring & Recovery

[Diagram labels: Your App, Web Portal (API), SQL Azure, TDS, with a failure marked (!)]

Design Considerations

1 x 10GB database
• 1 Instance

10 x 1GB databases
• 10 Instances

Partition for
• Data volume
• Query load

SQL Azure – Key Points

Partition for Data volume > 10GB

Transaction throttle (non deterministic)
• Always code for retry

All partition logic up to the developer
• Algorithmic
• Lookup based

Partitions are not Auto-Balanced
• Need to aim for ‘equal’ partitions
• ‘Equal’ not necessarily the same size
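‘Always code for retry’ can be sketched as a small wrapper with exponential backoff. This is an assumption-laden illustration: `TransientError` is a hypothetical stand-in for whatever exception a real database client raises when throttled, and the attempt count and delays are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a throttling or dropped-connection error (hypothetical)."""


def with_retry(operation, attempts: int = 5, base_delay: float = 0.1,
               sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Back off 0.1s, 0.2s, 0.4s, ... plus jitter so that many clients
            # do not all retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Only the transient error type is retried; any other exception propagates immediately. The wrapped operation should be safe to run more than once, which ties back to the idempotency discussion earlier in the deck.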

Choosing a Partition Key

Natural Keys
• Country
• First letter, last name
• Date

Mathematical
• Hash functions
• Modulo operator

Lookup Based
• Lookup table to resolve value to partitions

Using Modulo

The remainder of a division

Nice properties for partitioning:
• Given two positive integers M and N
• M mod N will return a number between 0 and N-1

Want equi-sized partitions?
• Given an appropriate distribution of M we will get N ‘equally full’ buckets.
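A minimal sketch of modulo partitioning, assuming sequential integer IDs as the key (the function name and the choice of four partitions are illustrative only):

```python
from collections import Counter


def partition_for(key: int, partitions: int) -> int:
    """Map a positive integer key to a partition number in 0..partitions-1."""
    return key % partitions


# Sequential IDs are evenly distributed, so the buckets fill evenly.
counts = Counter(partition_for(order_id, 4) for order_id in range(1, 1001))
```

The ‘appropriate distribution’ caveat matters: if every key were even, for instance, only partitions 0 and 2 of the four would ever be used.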


Using Hash Values

Using A Hash Function Projects One Distribution into Another

Use a hash function that projects a random distribution

Do NOT use a cryptographic hash function

Plenty of choice on the web
• http://tinyurl.com/part-hash

Be careful if using Object.GetHashCode()
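As one example of a simple non-cryptographic hash, the sketch below uses 32-bit FNV-1a and then takes the result modulo the partition count. The choice of FNV-1a is an assumption made for illustration, not something the slides recommend, and the key `"venue-17"` is made up.

```python
def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of text."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by FNV prime, keep 32 bits
    return h


def partition_for(key: str, partitions: int) -> int:
    """Map a string key to a partition number in 0..partitions-1."""
    return fnv1a_32(key) % partitions


partition = partition_for("venue-17", 40)
```

Writing the hash out explicitly keeps the key-to-partition mapping stable across processes and releases. Built-in hash codes do not always guarantee that: Python's own `hash()` for strings is randomised per process by default, which is the same kind of trap as the `Object.GetHashCode()` warning.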

SQL Azure Partitioning

Just in time Partitioning

In SQL Azure Partitions Cost Money

In highly elastic scenarios partitions may be needed for just a few hours or days

If load is predictable
• Partition before load commences
• De-partition after load has subsided

Pricing

Web Edition
• 1 GB Database
• $9.99 / month
• Bandwidth: $0.10 / GB inbound, $0.15 / GB outbound

Business Edition
• 10 GB Database
• $99.99 / month
• Bandwidth: $0.10 / GB inbound, $0.15 / GB outbound

• Pro rated by the day or part thereof
• Can move up and down between sizes
• SQL Azure has no query charge
• Excessively long transactions or high query load may result in throttling
• 50GB database size in Beta

Windows Azure Platform AppFabric

Service Bus: General purpose application messaging bus

Access Control: Rules-driven, claims-based access control

Extending .NET to the cloud with Internet Scale Utility Services

Simplified, Secure Connectivity for the Cloud

Service Bus and Access Control in Windows Azure platform AppFabric are powerful building blocks.

AppFabric SERVICE BUS: Connect apps & services

AppFabric ACCESS CONTROL: Control & secure access

Secure Connectivity
• Bridge cloud services, on-premises apps, and hosted assets
• Build distributed apps for your business or to collaborate with partners

Across boundaries
• Navigate network and security boundaries, securely and simply
• Federate identity and access across organizations and ID providers
• Simplify claims-based authorization for distributed apps and web services

At Cloud Scale
• Scale up and down as your business requires
• Automated service mgmt. and dynamic scale
• Interoperate with a variety of languages and industry standards

AppFabric Service Bus Connectivity

[Diagram labels: Application #1, Application #2, Firewall, Send, Receive]
• Exchange messages between loosely coupled, composite applications.
• Payload types: Text, XML, Graphics, Binary Data, Streaming
• Direct connection facilitated by Service Bus if that is the best connection mechanism.

Service Bus

Architecture of AppFabric Access Control

[Diagram participants: Your Access Control Project, Your App (Relying Party), User (Application)]

0. Trust exchanged; secrets, certs
1. Define access control rules
2. Send token (initial claims; e.g. identity)
3. Map input claims to output claims based on access control rules
4. Return token (output claims from 3)
5. Send token with request
6. Check for claims
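The claims-mapping steps can be sketched as a rule table plus two small functions. This is a conceptual illustration only: the rule contents, claim types and function names are made up, and token issuing, signing and validation (steps 0, 2, 4 and 5) are left out entirely.

```python
from typing import Dict, List, Tuple

Claim = Tuple[str, str]  # (claim type, claim value)

# Step 1: rules defined in the access control project (contents are made up).
RULES: Dict[Claim, List[Claim]] = {
    ("email", "alice@example.com"): [("role", "admin")],
    ("group", "box-office"): [("action", "sell-tickets"),
                              ("action", "print-tickets")],
}


def map_claims(input_claims: List[Claim]) -> List[Claim]:
    """Step 3: map input claims to output claims using the rules."""
    output: List[Claim] = []
    for claim in input_claims:
        for out in RULES.get(claim, []):
            if out not in output:
                output.append(out)
    return output


def is_allowed(token_claims: List[Claim], required: Claim) -> bool:
    """Step 6: the relying party only checks for the claim it needs."""
    return required in token_claims


token = map_claims([("group", "box-office")])             # output claims
allowed = is_allowed(token, ("action", "sell-tickets"))   # check at the app
```

Because the application checks only output claims, the mapping rules can change without touching application code, which matches the later point about less brittle apps through factoring out rules.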

Pricing & SLA

$1.99 / 100k ACS transactions

Connections
• $3.99 / connection / month
• Packages available

Bandwidth
• $0.10 / GB inbound
• $0.15 / GB outbound

TicketDirect: An example application

TicketDirect is a ticketing company in Australia and New Zealand

Ticketing is uniquely suited to the cloud

Will use TicketDirect as real world example today

TicketDirect Architecture

[Diagram: the same TicketDirect architecture shown at the start of the session]

Windows Azure Platform Benefits

The Cloud

New Economic Model
• Low Capex
• Pay as you Go

Elastic Scale
• Only solvable via Cloud

Global Distribution
• Global data centers

Windows Azure

High Level of Abstraction
• Hardware
• Server OS
• Network Infrastructure
• Web Server

Availability
• Automated Service Management
• Azure CDN

Scalability
• Instance & Partitions

Developer Experience
• Familiar Developer Tools

Windows Azure Platform Benefits

AppFabric

High Performance Messaging
• Massively scalable
• HTTP and Raw TCP

Access Control
• Less brittle apps due to factoring out rules

Developer Experience
• Familiar Developer Tools
• WCF bindings

SQL Azure

Higher Level of Abstraction
• Hardware
• Server OS
• Network Infrastructure
• Database Server

Availability
• Automated Database Management & Replication

Scalability
• Database Partitioning

Developer Experience
• Familiar SQL Environment