A DEEP LOOK INSIDE WINDOWS AZURE AND ITS VIRTUAL MACHINE

Wely Lau
Microsoft MVP, Windows Azure
Solutions Architect, NCS Pte Ltd
Blog: http://wely-lau.net


Post on 20-Aug-2015


AGENDA

• Introduction (10 mins)
• Windows Azure Service Model (10 mins)
• Fabric Controller Internals (10 mins)
• Deploying a Service (15 mins)
• Service Allocation and Service Healing (10 mins)
• Inside Windows Azure Virtual Machine (15 mins)
• Q & A (5 mins)

INTRODUCTION

WHAT IS A “CLOUD”?

• Cloud: on-demand, scalable, multi-tenant, self-service compute and storage resources

TYPES OF CLOUD

• Infrastructure as a Service (IaaS): basic compute and storage resources
  • On-demand servers
  • E.g. Amazon EC2, VMware vCloud, Rackspace
• Platform as a Service (PaaS): cloud application infrastructure
  • On-demand application-hosting environment
  • E.g. Google AppEngine, Salesforce.com, Windows Azure
• Software as a Service (SaaS): cloud applications
  • On-demand applications
  • E.g. Office 365, GMail, Microsoft Office Web Companions

CLOUD: EFFICIENCY VERSUS CONTROL

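[Figure: the stack layers Applications, Runtimes, Database, Operating System, Virtualization, Server, Storage, and Networking, compared across Standalone Servers, IaaS, PaaS, and SaaS, with shading marking the layers that are managed for you. Windows Azure is marked on the chart. Axis labels: Efficiency; Control + Cost.]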

WINDOWS AZURE

• Windows Azure is an OS for the data center
  • Model: treat the data center as a machine
  • Handles resource management, provisioning, and monitoring
  • Manages application lifecycle
  • Allows developers to concentrate on business logic
• Provides common building blocks for distributed applications
  • Reliable queuing, simple structured storage, SQL storage
  • Application services like access control and connectivity

WINDOWS AZURE PLATFORM BUILDING BLOCKS

[Figure: platform building blocks, listed below.]

• Service Bus, Access Control
• Caching
• Data Sync, Database, Reporting
• Storage: Tables, Blobs, Queues
• Compute: Web Role, Worker Role, VM Role
• Virtual Network: Connect, Traffic Manager
• Fabric Controller

WINDOWS AZURE SERVICE MODEL

MULTI-TIER CLOUD APPLICATIONS

• A cloud application is typically made up of different components
  • Front end: e.g. load-balanced stateless web servers
  • Middle worker tier: e.g. order processing, encoding
  • Backend storage: e.g. SQL tables or files
  • Multiple instances of each for scalability and availability

[Figure: "My Cloud Application". HTTP/HTTPS traffic reaches a load balancer in front of several Front-End instances, which work with several Middle-Tier instances; the backend is Windows Azure Storage and SQL Azure.]

THE WINDOWS AZURE SERVICE MODEL

• A Windows Azure application is called a “service”
  • Definition information (role name, role type, VM size, etc.)
  • Configuration information (# of instances, # of update domains, etc.)
  • At least one “role”
  • Your code
• Roles are like DLLs in the service “process”
  • Collection of code with an entry point that runs in its own virtual machine
• There are currently three role types:
  • Web Role: IIS7 and ASP.NET in Windows Azure-supplied OS
  • Worker Role: arbitrary code in Windows Azure-supplied OS
  • VM Role: uploaded VHD with customer-supplied OS

[Figure: example service "My Service" with two roles.]

• Role: Front-End
  • Definition: Type: Web; VM Size: Small; Endpoints: External-1
  • Configuration: Instances: 2; Update Domains: 2; Fault Domains: 2
• Role: Middle-Tier
  • Definition: Type: Worker; VM Size: Large; Endpoints: Internal-1
  • Configuration: Instances: 3; Update Domains: 2; Fault Domains: 2
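To make the split between definition and configuration information concrete, the example service can be written down as plain data. This is only an illustrative sketch in Python: the class and field names are hypothetical and are not part of any Windows Azure SDK; the values are the ones shown for "My Service".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleDefinition:
    """Definition information: fixed properties of a role (hypothetical model)."""
    name: str
    role_type: str                      # "Web", "Worker", or "VM"
    vm_size: str
    endpoints: List[str] = field(default_factory=list)


@dataclass
class RoleConfiguration:
    """Configuration information: per-deployment settings (hypothetical model)."""
    instances: int
    update_domains: int
    fault_domains: int


@dataclass
class Role:
    definition: RoleDefinition
    configuration: RoleConfiguration


@dataclass
class Service:
    name: str
    roles: List[Role]

    def __post_init__(self):
        # The service model requires at least one role.
        if not self.roles:
            raise ValueError("a service needs at least one role")

    def total_instances(self) -> int:
        """Number of role instances (virtual machines) the service asks for."""
        return sum(role.configuration.instances for role in self.roles)


my_service = Service(
    name="My Service",
    roles=[
        Role(RoleDefinition("Front-End", "Web", "Small", ["External-1"]),
             RoleConfiguration(instances=2, update_domains=2, fault_domains=2)),
        Role(RoleDefinition("Middle-Tier", "Worker", "Large", ["Internal-1"]),
             RoleConfiguration(instances=3, update_domains=2, fault_domains=2)),
    ],
)

print(my_service.total_instances())  # 5
```

Keeping the two kinds of information in separate types mirrors the idea that instance counts can change per deployment while the role type and VM size describe the role itself.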

SERVICE MODEL FILES

• Service definition is in ServiceDefinition.csdef

• Service configuration is in ServiceConfiguration.cscfg

• CSPack program zips service binaries and definition into the service package file (service.cspkg)

AVAILABILITY: UPDATE DOMAINS

• Purpose: ensure the service stays up during service upgrades and Windows Azure OS updates
• System considers update domains when upgrading a service
  • 1 / (number of update domains) = fraction of the service that will be offline
  • Default is 5 and max is 20, but you can override with the upgradeDomainCount service definition property
• The Windows Azure SLA is based on at least two update domains and two role instances in each role

[Figure: instances Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2, and Middle Tier-3 spread across Update Domain 1, Update Domain 2, and Update Domain 3.]
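The update-domain arithmetic can be illustrated with a small sketch. The round-robin placement below is an assumption made for illustration, not a statement of how the Fabric Controller actually assigns instances; the function names are hypothetical.

```python
def assign_update_domains(instance_names, update_domain_count):
    """Spread instances round-robin over update domains numbered from 1.

    Round-robin is an illustrative assumption, not the documented algorithm.
    """
    if update_domain_count < 1:
        raise ValueError("need at least one update domain")
    domains = {d: [] for d in range(1, update_domain_count + 1)}
    for index, name in enumerate(instance_names):
        domains[index % update_domain_count + 1].append(name)
    return domains


def offline_fraction(update_domain_count):
    """Fraction of the service taken offline while one update domain is upgraded."""
    return 1 / update_domain_count


def meets_sla_precondition(instance_count, update_domain_count):
    """At least two update domains and at least two instances of the role."""
    return instance_count >= 2 and update_domain_count >= 2


middle_tier = ["Middle Tier-1", "Middle Tier-2", "Middle Tier-3"]
print(assign_update_domains(middle_tier, 3))
# {1: ['Middle Tier-1'], 2: ['Middle Tier-2'], 3: ['Middle Tier-3']}
print(offline_fraction(5))           # 0.2, i.e. 20% with the default of 5 domains
print(meets_sla_precondition(1, 5))  # False: a single instance is not enough
```

The 1/N figure is exact only when instances are spread evenly; with 3 instances in 2 domains, upgrading the fuller domain takes 2 of 3 instances offline.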

AVAILABILITY: FAULT DOMAINS

• Purpose: avoid single points of failure
  • Similar concept to update domains
  • But you don’t control the updates
• Unit of failure based on data center topology
  • E.g. top-of-rack switch on a rack of machines
• Windows Azure considers fault domains when allocating service roles
  • 2 fault domains per service
  • Will try to spread roles out across more
  • E.g. don’t put all roles in same rack

[Figure: instances Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2, and Middle Tier-3 spread across Fault Domain 1, Fault Domain 2, and Fault Domain 3.]

FABRIC CONTROLLER INTERNALS

“SKETCH” OF DATACENTER ARCHITECTURE

[Figure: datacenter routers at the top connect to aggregation routers (Agg) and load balancers (LB), which connect to racks. Each rack has a top-of-rack (TOR) switch, a power distribution unit (PDU), and a set of nodes.]

WINDOWS AZURE DATACENTERS

DATACENTER CLUSTERS

• Datacenters are divided into “clusters”
  • Approximately 1000 rack-mounted servers (we call them “nodes”)
• Each cluster is managed by a Fabric Controller (FC)
• FC is responsible for:
  • Blade provisioning
  • Blade management
  • Service deployment and lifecycle

[Figure: Cluster 1, Cluster 2, through Cluster n, each with its own FC, connected by the datacenter network.]

INSIDE A CLUSTER

• FC is a distributed, stateful application running on nodes (servers) spread across fault domains
  • Top blades are reserved for FC
  • Installed by “Utility Fabric Controller”
  • One FC instance is the primary and all others keep view of world in sync
• Supports rolling upgrade, and services continue to run even if FC fails entirely

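[Figure: a cluster of racks, each with a TOR switch and nodes. FC instances FC1 through FC5 sit in different racks; the racks connect upward through an aggregation router (AGG) and load balancers (LB).]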

THE FABRIC CONTROLLER (FC)

• The “kernel” of the cloud operating system
  • Manages datacenter hardware
  • Manages Windows Azure services
• Four main responsibilities:
  • Datacenter resource allocation
  • Datacenter resource provisioning
  • Service lifecycle management
  • Service health management
• Inputs:
  • Description of the hardware and network resources it will control
  • Service model and binaries for cloud applications

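[Figure: analogy between a server and a datacenter. Server / Kernel / Process corresponds to Datacenter / Fabric Controller / Service. Example on the server side: the Windows kernel running Word and SQL Server. Example on the datacenter side: the Fabric Controller running Exchange Online and SQL Azure. The figure also carries the label "(DataCenter.xml)".]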

INSIDE A NODE

[Figure: a physical node with a host partition running the FC host agent and several guest partitions, each running a guest agent and a role instance. A trust boundary is marked in the diagram. Outside the node: the primary Fabric Controller with its replicas, and the image repository (OS VHDs, role ZIP files).]

FABRIC VIEWER

• Used by the Windows Azure operations team to view the fabric inside the datacenter

[Figure: Fabric Viewer screenshot, with clusters and racks labeled.]

DEPLOYING A SERVICE

DEPLOYING A SERVICE TO THE CLOUD: THE 10,000 FOOT VIEW

• Service package uploaded to portal
  • Windows Azure Portal Service passes service package to “Red Dog Front End” (RDFE) Azure service
• RDFE converts service package to native “RD” version
• RDFE sends service to Fabric Controller (FC) based on target region
• FC stores image in repository and deploys and activates service

[Figure: Portal Service, RDFE Service, and the FC in the US-North Central datacenter, where the service is deployed.]

DEPLOYING A SERVICE TO THE CLOUD: A DEEP LOOK

SERVICE ALLOCATION AND SERVICE HEALING

SERVICE RESOURCE ALLOCATION

• Goal: allocate service components to available resources while satisfying all hard constraints
  • HW requirements: CPU, memory, storage, network
  • Fault domains
• Secondary goal: satisfy soft constraints
  • Prefer allocations which will simplify servicing the host OS/hypervisor
  • Optimize network proximity: pack nodes
• Service allocation produces the goal state for the resources assigned to the service components
  • Node and VM configuration (OS, hosting environment)
  • Images and configuration files to deploy
  • Processes to start
  • Assign and configure network resources such as LB and VIPs
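A toy version of constraint-based allocation might look like the following. It is a greedy sketch under stated assumptions, not the Fabric Controller's real algorithm: hard constraints are modeled as free cores and memory plus a requirement to span at least two fault domains, and the only soft constraint modeled is packing (preferring nodes with less free capacity). All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    fault_domain: int
    free_cores: int
    free_memory_gb: float
    placed: List[str] = field(default_factory=list)


def allocate(role_name, count, cores, memory_gb, nodes):
    """Greedily place `count` instances of a role; returns {instance: node name}.

    Hard constraints: enough free cores and memory on the chosen node, and
    (for two or more instances) at least two fault domains used.
    Soft constraint: among equally good fault domains, pack the fuller node.
    Note: nodes are mutated as instances are placed.
    """
    per_domain: Dict[int, int] = {}
    placement: Dict[str, str] = {}
    for i in range(1, count + 1):
        candidates = [n for n in nodes
                      if n.free_cores >= cores and n.free_memory_gb >= memory_gb]
        if not candidates:
            raise RuntimeError(f"no node can host {role_name}-{i}")
        # Spread first (fewest instances of this role in the fault domain),
        # then pack (least free cores).
        best = min(candidates,
                   key=lambda n: (per_domain.get(n.fault_domain, 0), n.free_cores))
        best.free_cores -= cores
        best.free_memory_gb -= memory_gb
        instance = f"{role_name}-{i}"
        best.placed.append(instance)
        per_domain[best.fault_domain] = per_domain.get(best.fault_domain, 0) + 1
        placement[instance] = best.name
    if count >= 2 and len(per_domain) < 2:
        raise RuntimeError(f"{role_name} does not span two fault domains")
    return placement


nodes = [Node("node-1", 1, 8, 14.0), Node("node-2", 2, 8, 14.0), Node("node-3", 3, 8, 14.0)]
role_a = allocate("RoleA", 3, cores=4, memory_gb=7.0, nodes=nodes)   # Large instances
role_b = allocate("RoleB", 2, cores=2, memory_gb=3.5, nodes=nodes)   # Medium instances
print(role_a)  # {'RoleA-1': 'node-1', 'RoleA-2': 'node-2', 'RoleA-3': 'node-3'}
print(role_b)  # {'RoleB-1': 'node-1', 'RoleB-2': 'node-2'}
```

The returned placement plays the part of a very small "goal state": which instance should run on which node. The real goal state also covers OS images, configuration files, processes, and network resources.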

SERVICE ALLOCATION EXAMPLE

[Figure: allocation example for www.mycloudapp.net. Role A (Count: 3, Update Domains: 3, Size: Large) and Role B (Count: 2, Update Domains: 2, Size: Medium) are placed across Fault Domain 1, Fault Domain 2, and Fault Domain 3 behind a load balancer; instance addresses shown are 10.100.0.36, 10.100.0.122, and 10.100.0.185.]

NODE AND ROLE HEALTH MAINTENANCE

• FC maintains service availability by monitoring the software and hardware health
  • Based primarily on heartbeats
  • Automatically “heals” affected roles

Problem | How Detected | Fabric Response
Role instance crashes | FC guest agent monitors role termination | FC restarts role
Guest VM or agent crashes | FC host agent notices missing guest agent heartbeats | FC restarts VM and hosted role
Host OS or agent crashes | FC notices missing host agent heartbeat | Tries to recover node; FC reallocates roles to other nodes
Detected node hardware issue | Host agent informs FC | FC migrates roles to other nodes; marks node “out for repair”
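Heartbeat-based detection can be sketched in a few lines. The timeout value, class, and function names below are hypothetical and chosen for illustration; only the problem-to-response mapping comes from the table above.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per source and reports sources that went quiet."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        # timeout_s is an illustrative value, not a documented Azure setting.
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = {}

    def beat(self, source):
        """Record a heartbeat from an agent (e.g. a guest agent or host agent)."""
        self._last_seen[source] = self._clock()

    def missing(self):
        """Sources whose last heartbeat is older than the timeout, sorted by name."""
        now = self._clock()
        return sorted(source for source, seen in self._last_seen.items()
                      if now - seen > self._timeout_s)


# Responses taken from the table above; the dictionary keys are made-up labels.
FABRIC_RESPONSE = {
    "role_instance_crash": "restart role",
    "guest_vm_or_agent_crash": "restart VM and hosted role",
    "host_os_or_agent_crash": "try to recover node, then reallocate roles to other nodes",
    "node_hardware_issue": "migrate roles to other nodes and mark node out for repair",
}


# Demo with a fake clock so the example is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=10.0, clock=lambda: fake_now[0])
monitor.beat("guest-agent-1")
monitor.beat("guest-agent-2")
fake_now[0] = 8.0
monitor.beat("guest-agent-1")      # guest-agent-2 stays silent
fake_now[0] = 15.0
for source in monitor.missing():
    print(source, "->", FABRIC_RESPONSE["guest_vm_or_agent_crash"])
# guest-agent-2 -> restart VM and hosted role
```

Injecting the clock keeps the monitor testable without sleeping; in a real system the heartbeats would arrive over the network from the agents.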

SERVICE HEALING

[Figure: service healing example for www.mycloudapp.net. Role A, V2, VM Role (Front End), Count: 3, Update Domains: 3, Size: Large; Role B, Worker Role, Count: 2, Update Domains: 2, Size: Medium. Instances sit across Fault Domain 1, Fault Domain 2, and Fault Domain 3 behind a load balancer; addresses shown are 10.100.0.36, 10.100.0.122, 10.100.0.185, and 10.100.0.191.]

INSIDE WINDOWS AZURE VM

WINDOWS AZURE VM SIZES

• Each Windows Azure compute instance represents a virtual server.

• Although many resources are dedicated to a particular instance, some resources associated with I/O performance (network bandwidth and disk subsystem) are shared among the compute instances on the same physical host.

• The different instance types will provide different minimum performance from the shared resources depending on their size.

VM Size | CPU | Memory | Instance Storage | I/O Performance | Cost Per Hour
Extra Small | 1.0 GHz | 768 MB | 20 GB | Low | $0.05
Small | 1.6 GHz | 1.75 GB | 225 GB | Moderate | $0.12
Medium | 2 x 1.6 GHz | 3.5 GB | 490 GB | High | $0.24
Large | 4 x 1.6 GHz | 7 GB | 1,000 GB | High | $0.48
Extra Large | 8 x 1.6 GHz | 14 GB | 2,040 GB | High | $0.96
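As a worked example of the hourly pricing, the sketch below estimates the compute cost of a deployment. The rates are copied from the table above and reflect prices at the time of the talk, not current pricing; the function name and the 720-hour month are assumptions for illustration.

```python
# Hourly rates from the table above, stored in cents to keep the arithmetic exact.
HOURLY_RATE_CENTS = {
    "Extra Small": 5,
    "Small": 12,
    "Medium": 24,
    "Large": 48,
    "Extra Large": 96,
}


def compute_cost_cents(vm_size, instances, hours):
    """Compute cost in cents for `instances` VMs of one size running for `hours`."""
    if instances < 0 or hours < 0:
        raise ValueError("instances and hours must be non-negative")
    return HOURLY_RATE_CENTS[vm_size] * instances * hours


HOURS_PER_MONTH = 720  # assumption: a 30-day month

front_end = compute_cost_cents("Small", 2, HOURS_PER_MONTH)    # 17280 cents
middle_tier = compute_cost_cents("Large", 3, HOURS_PER_MONTH)  # 103680 cents
total = front_end + middle_tier
print(f"${total / 100:.2f}")  # $1209.60
```

Each size doubles the price of the one below it from Small upward, so the cost per core stays the same across sizes from Small to Extra Large.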

LOCAL DRIVES

• C: = Resource local drive (transient storage for VM)
• D: = OS drive
• E: = Application’s code (size of the package)

[Figure: the three volumes (Resource Volume, OS Volume, Role Volume), with the labels Guest Agent, Role Host, and Role Entry Point.]

RUNTIME INSTALLED

• .NET 3.5 SP1
• .NET 4 (RTM)
• VC80 CRT (8.0.50727)
• VC90 CRT (9.0.30729)
• URL Rewrite Module 2.0
• VC10 CRT (e.g. MSVCR100.DLL) is not fusion-ized and can be packaged together with the application
• Others?
  • Java runtime (planned in future)
  • PHP: PHP SDK for Windows Azure (“Web Platform Installer”)
• Else?
  • Start-up Task is your friend

OS VERSION

• Two OS currently managed by Windows Azure
  • Guest OS 1.x: WS08 64-bit compatible
  • Guest OS 2.x: WS08 R2 64-bit compatible
• Windows Azure Guest OS Releases and SDK Compatibility Matrix
  • http://msdn.microsoft.com/en-us/library/ee924680.aspx

PROCESSES IN WINDOWS AZURE VM

WebRole process | WorkerRole process | Description
clouddrivesvc | clouddrivesvc | enables Windows Azure Drives
csrss (3) | csrss (3) | client/server runtime subsystem
explorer | explorer | Windows explorer
IISConfigurator | - | a WCF named pipes service managing IIS configuration of the web role
LogonUI | LogonUI | login UI and screen switching
lsass | lsass | local security authority subsystem management
lsm | lsm | local session management
MonAgentHost | MonAgentHost | DiagnosticMonitor agent supporting Azure diagnostics
msdtc | msdtc | Distributed Transaction Coordinator console
osdiag | osdiag | Remote Desktop Performance Agent
rdpclip | rdpclip | remote copy/paste support for Remote Desktop Services
RemoteAccessAgent | RemoteAccessAgent | Azure agent supporting Remote Desktop Services (via port 3389)
RemoteForwarderAgent | RemoteForwarderAgent | Azure agent to which all Remote Desktop traffic is routed from the load balancer; this process then routes the traffic to the specific targeted role instance

PROCESSES IN WINDOWS AZURE VM

WebRole process | WorkerRole process | Description
services | services | management of OS services
SLsvc | SLsvc | Windows software licensing service
smss | smss | Windows OS session manager subsystem
svchost (17) | svchost (15) | host application for various Windows services
System | System | kernel
vds | vds | Windows server virtual disk service
vmicsvc (2) | vmicsvc (2) | Hyper-V guest VM integration services
w3wp | - | IIS worker process hosting ASP.NET site
WaAppAgent | WaAppAgent | Windows Azure guest agent
WaHostBootstrapper | WaHostBootstrapper | bootstrapper run by WaAppAgent
WaIISHost | - | web role host run by bootstrapper
- | WaWorkerHost | worker role host run by bootstrapper
wininit | wininit | core OS process (starts all services)
winlogon (2) | winlogon (2) | OS login management

NETWORK BANDWIDTH

CONCLUSION

• The Cloud enables pay-as-you-go self-service provisioning of application resources

• Platform as a Service is all about reducing management and operations overhead

• The Windows Azure Fabric Controller is the foundation for Windows Azure’s PaaS
  • Provisions machines
  • Deploys services
  • Configures hardware for services
  • Monitors service and hardware health
• The Fabric Controller continues to evolve and improve
• VMs in Windows Azure are provisioned, optimally configured VMs running on the Windows Azure Hypervisor

REFERENCES

• Inside Windows Azure
  • http://channel9.msdn.com/Events/PDC/PDC10/CS08
• Inside Windows Azure Virtual Machines
  • http://channel9.msdn.com/Events/PDC/PDC10/CS63
• Inside Windows Azure: The Cloud Operating Systems
  • http://channel9.msdn.com/Events/BUILD/BUILD2011/SAC-853T
• Inside The Web and Worker Role VMs
  • http://blogs.msdn.com/b/jimoneil/archive/2011/01/03/azure-home-part-14-inside-the-webrole-and-workerrole-vms.aspx
• Windows Azure Role Architecture
  • http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx

QUESTIONS?

Wely Lau
Microsoft MVP, Windows Azure
Solutions Architect, NCS Pte Ltd
Blog: http://wely-lau.net