Inside Windows Azure Storage

Name
Title
Microsoft Corporation

Agenda

Windows Azure Storage Today

What’s new? Blobs, Tables and Queues features

Storage Analytics

Geo-Replication

Windows Azure Storage Internals

Windows Azure Storage Today

Geographically distributed across 3 regions
Thousands of services/applications
Anywhere, anytime access to your data
Durability and scalability

Data center locations: North Central US, South Central US, Northern Europe, Western Europe, East Asia, South East Asia

70 petabytes of raw storage today
Grows to >200 petabytes by start of 2012

Running on Windows Azure Storage
Telemetry for Kinect
Game Saves in Cloud
Microsoft Zune Media Storage and Delivery
Facebook and Twitter Near Real-Time Search

Running on Windows Azure Storage
Bing real-time Facebook/Twitter search ingestion engine

[Diagram: user postings and status updates flow into the Bing Ingestion Engine, an Azure service running on several VMs, which uses Windows Azure Queues and Windows Azure Tables.]

Index Facebook/Twitter data within 15 seconds of update
Facebook/Twitter data stored into blobs
Ingestion engine processes blobs
  Annotates with auth/spam/adult scores, content classification, expands links, etc.
  Uses Tables heavily for indexing
Queues to manage work flow
Results stored back into blobs
Bing takes resulting blobs and folds them into the search index

Peak of 40,000 requests/sec
2~3 billion requests per day

Took 1 dev 2 months to design, build and release to production

What’s new for Blobs, Tables and Queues

Windows Azure Storage

Abstractions
Blobs: File system in the cloud
Tables: Massively scalable structured storage
Queues: Reliable storage and delivery of messages
Drives: Durable NTFS volumes for Windows Azure applications

Easy client access
Easy to use REST APIs and Client Libraries
Existing NTFS APIs for Windows Azure Drives

Windows Azure Storage Account

User creates a globally unique storage account name
Choose the primary location to host the storage account: North Central US, South Central US, Northern Europe, Western Europe, East Asia, South East Asia

Windows Azure Data Storage Concepts

Account > Container > Blobs: https://<account>.blob.core.windows.net/<container>
Account > Table > Entities: https://<account>.table.core.windows.net/<table>
Account > Queue > Messages: https://<account>.queue.core.windows.net/<queue>

Windows Azure Blobs
A highly scalable and durable file system in the cloud
Store files as blobs and associate metadata with them
Blobs can be up to 200 GB in size

Upload/Download Blobs
  Provides continuation for large uploads
  Provides range reads
Strong Consistency and Optimistic Concurrency
  Conditional operations: If-Match, If-Unmodified-Since, etc.
Snapshot Blob: create versions/backups of your blobs
Lease Blob: exclusive write lease

Windows Azure Blobs – What is new?
Efficient resume for browsers and streaming media players requires:
Range requests of the form “Range: bytes=100-”
Return of the “Accept-Ranges” response header
ETags to be quoted
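
To illustrate what an open-ended range request means for a resumed download, here is a minimal Python sketch. The function name resolve_range and the blob size are hypothetical; this only shows how "bytes=100-" maps to byte offsets and is not the storage service's implementation.

```python
def resolve_range(header: str, blob_size: int) -> tuple:
    """Resolve a single 'bytes=start-[end]' Range value to inclusive offsets."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are handled in this sketch")
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    # An empty end means "to the end of the blob", which is the resume case.
    end = int(end_text) if end_text else blob_size - 1
    end = min(end, blob_size - 1)
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end

print(resolve_range("bytes=100-", 1000))     # (100, 999)
print(resolve_range("bytes=100-199", 1000))  # (100, 199)
```

A client that already holds the first 100 bytes would send the open-ended form and append the response to what it has.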

Windows Azure Tables

Scalable Structured Storage
Store Tables with billions of entities and TBs of data
Provides flexible schema (NoSQL)

Data Model
A table is a set of entities (rows)
An entity is a set of properties (columns)

Familiar and Easy to use API
OData Protocol
WCF Data Services - .NET classes and LINQ

Windows Azure Tables – What is new?

Existing operations
Insert entity
Update entity (Merge, Replace)
Delete entity
Query entity
Entity Group Transactions

New operations
Query Projection ($select): project only selected columns
Upsert Entity: InsertOrReplace, InsertOrMerge

Windows Azure Tables - Projection

public class Customer
{
    public string PartitionKey { get; set; }    // Customer Name
    public string RowKey { get; set; }          // Customer Phone Number
    public DateTime CustomerSince { get; set; }
    public double TotalPurchase { get; set; }
    public string State { get; set; }
    // 100 more properties including profile picture etc.…
}

// Partial entity defined here
public class CustomerDiscount
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public double TotalPurchase { get; set; }
}

Windows Azure Tables - Projection

// Select partial entities by choosing properties to be projected
var query = (from entity in context.CreateQuery<CustomerDiscount>("Customers" /*Table Name*/)
             select new CustomerDiscount
             {
                 PartitionKey = entity.PartitionKey,
                 RowKey = entity.RowKey,
                 TotalPurchase = entity.TotalPurchase,
             }).AsTableServiceQuery<CustomerDiscount>();

foreach (CustomerDiscount customer in query)
{
    // Calculate the discount to be given based on total purchases made
}

Windows Azure Tables - Upsert

// When user logs in from mobile device, it will register the user using upsert
Customer customer = new Customer("Thomas Anderson", "555-555-0100");
customer.Address = "4567 Main St. Redmond 48188";
customer.State = "Washington";

// Note: AttachTo method is called without an Etag which indicates
// that this is an Upsert Command
context.AttachTo("Customers" /*Table Name*/, customer);
context.UpdateObject(customer);

// No SaveChangesOptions indicates that a MERGE verb will be used
// to get InsertOrMerge semantics
context.SaveChanges();

// Use SaveChangesOptions.ReplaceOnUpdate for InsertOrReplace semantics.
// But InsertOrReplace will overwrite TotalPurchase if it existed
// context.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);
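
The difference between the two upsert flavors can be shown with plain dictionaries. This is a minimal Python sketch, not the Table service API: the function names are hypothetical, and the stored values ("Oregon", 250.0) are made up for illustration.

```python
def insert_or_merge(table, key, props):
    # Merge: set only the supplied properties; existing ones are preserved.
    table.setdefault(key, {}).update(props)

def insert_or_replace(table, key, props):
    # Replace: the stored entity becomes exactly the supplied properties.
    table[key] = dict(props)

key = ("Thomas Anderson", "555-555-0100")  # (PartitionKey, RowKey)
update = {"Address": "4567 Main St. Redmond 48188", "State": "Washington"}

merged = {key: {"State": "Oregon", "TotalPurchase": 250.0}}    # made-up entity
replaced = {key: {"State": "Oregon", "TotalPurchase": 250.0}}  # same starting point

insert_or_merge(merged, key, update)
insert_or_replace(replaced, key, update)

print(merged[key])    # TotalPurchase survives the merge
print(replaced[key])  # TotalPurchase is gone after the replace
```

Both calls insert the entity when the key is absent, which is what lets the login path skip a read-before-write.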

Windows Azure Queues

Provides reliable message delivery

Programming semantics: ensures that a message can be processed at least once

Put message into the queue

Get message makes the message invisible in queue for a specified invisibility timeout

Delete message once done processing to remove message from queue

If worker crashes, message becomes visible for another worker to process
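
The put/get/delete cycle above can be simulated in a few lines. This is a minimal in-memory Python sketch with a hypothetical SimQueue class and an explicit clock value passed in by the caller; it is not the Queue service API.

```python
class SimQueue:
    """In-memory stand-in for a queue with invisibility timeouts."""

    def __init__(self):
        self._messages = []

    def put(self, body, now):
        self._messages.append({"body": body, "visible_at": now})

    def get(self, now, visibility_timeout):
        # Return the first visible message and hide it for the timeout.
        for message in self._messages:
            if message["visible_at"] <= now:
                message["visible_at"] = now + visibility_timeout
                return message
        return None

    def delete(self, message):
        self._messages = [m for m in self._messages if m is not message]

    def __len__(self):
        return len(self._messages)

queue = SimQueue()
queue.put("resize lake.jpg", now=0)

first = queue.get(now=0, visibility_timeout=30)    # worker A takes it, then crashes
hidden = queue.get(now=10, visibility_timeout=30)  # worker B sees nothing yet
retry = queue.get(now=30, visibility_timeout=30)   # timeout expired: visible again
queue.delete(retry)                                # worker B finishes and deletes

print(hidden, retry["body"], len(queue))
```

Because the message reappears after a crash, the same work item may be processed more than once, so handlers should be idempotent.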

Windows Azure Queues – What is new?

Allow larger messages to be stored in queue
  Message size has been increased to 64 KB
Allow worker to treat the invisibility timeout as a lease
  Lease can be renewed on a queue message
Allow worker to update contents of queue message
  Enable efficient continuation on worker failure
Schedule work at a future time
  “PUT Message” takes invisibility timeout

Windows Azure Queue – Update Message Example

[Diagram: Web Roles put work items into an Azure Queue; Worker Roles take them out. A "current time" clock steps through 7:00, 7:04, 7:07 and 7:09.]

7:00  Get Message with 5 minutes visibility timeout (expires @ 7:05 AM)
7:04  Extend visibility timeout with another 5 minutes (expires @ 7:09 AM)
      Periodically store progress information in message content
7:07
7:09  Retrieve progress from queue message and resume
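
The timeline above can be replayed as a small Python sketch. It assumes the first worker stops responding some time after 7:04 (the slide does not say so explicitly); the class and function names and the progress string are hypothetical, and times are minutes since midnight.

```python
class Message:
    def __init__(self, content):
        self.content = content
        self.visible_at = 0  # minutes since midnight

def get_message(msg, now, timeout):
    """Take the message if it is visible; hide it for `timeout` minutes."""
    if now < msg.visible_at:
        return False
    msg.visible_at = now + timeout
    return True

def update_message(msg, now, timeout, content):
    """Renew the lease and store progress in the message content."""
    msg.visible_at = now + timeout
    msg.content = content

def at(hour, minute):
    return hour * 60 + minute

msg = Message("job=42;step=0")
assert get_message(msg, at(7, 0), 5)            # worker 1, expires 7:05
update_message(msg, at(7, 4), 5, "job=42;step=3")  # renewed, expires 7:09

too_early = get_message(msg, at(7, 7), 5)   # worker 2: still invisible
resumed = get_message(msg, at(7, 9), 5)     # worker 2: visible again
print(too_early, resumed, msg.content)
```

The second worker starts from step 3 instead of step 0, which is the "efficient continuation" the new Update Message operation enables.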

Windows Azure Storage Analytics

Storage Analytics

Goals and capabilities

Logs: enable customers to understand and debug their usage of storage

Metrics: enable customers to get an hourly summary of key statistics about the traffic to their Blobs, Tables and Queues

Storage Analytics – Why turn on logging?

Provides ability to answer commonly asked questions:

Did a specific request make it to the storage service and how long did it take?

What client IP issued a “Delete container” request and when?

How many requests were issued by a specific client or to a specific set of objects?

The list goes on and on…

Storage Analytics Logs

Log records for requests are stored in Windows Azure Blobs

The log blobs are text files with one log entry per line
Each blob can contain one to many request records

A request typically appears in the log within 15 minutes after it completes execution

Configure the logging levels separately for
  Blobs, Tables and Queues
  read (GET), write (PUT/POST/MERGE), delete (DELETE) requests, or any combination

Best effort logging

Storage Analytics Data Fields Logged

The following are some of the fields logged for each record:

Log Version, Accessing Account, Owner Account, Service Type, Request URL, Object Key, Request ID

Operation Number, Request Version, Operation Type, Start Time, Application End to End Latency, Storage Server Latency, Authentication Type

Request Status, HTTP Status Code, Client IP, User Agent, Referrer, Client Request ID, ETag, LMT

Request Packet Size, Request Header Size, Response Packet Size, Response Header Size, Request MD5, Server MD5, Conditions Used

Log Entry Example

Log Version: 1.0

Start Time: 2011-07-28T18:02:40.6271789Z

Operation Type: PutBlob

Status: Success

HTTP Status Code: 201

Application E2E Latency (milliseconds): 28

Storage Server Latency (milliseconds): 21

Accessing Account: sally

Owner Account: sally

Service Type: blob

Request URL: PUT http://sally.blob.core.windows.net/thumbnails/lake.jpg

Object Key: /sally/thumbnails/lake.jpg

Request ID: fb658ee6-6123-41f5-81e2-4bfdc178fea3

Operation Number: 0

Request Version: 2009-09-19

Client IP: 201.9.10.20

Client Request ID: req12345

Log Entry in Blob:

1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;sally;sally;blob;"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;;;"req12345"
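
Since each log entry is one semicolon-delimited line with some quoted fields, it can be split with a standard CSV reader. The Python sketch below names only the leading columns that the example slide maps to values; the dictionary keys are my own labels, and the trailing columns are kept unnamed rather than guessed.

```python
import csv

# Labels for the leading columns, in the order shown by the example above.
LEADING_FIELDS = [
    "log_version", "start_time", "operation_type", "request_status",
    "http_status", "e2e_latency_ms", "server_latency_ms", "auth_type",
    "accessing_account", "owner_account", "service_type", "request_url",
    "object_key", "request_id", "operation_number", "client_ip",
    "request_version",
]

LINE = ('1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;'
        'authenticated;sally;sally;blob;'
        '"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";'
        '"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;'
        '201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";'
        '"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;;;"req12345"')

def parse_log_line(line):
    """Split one log line and label the leading fields."""
    fields = next(csv.reader([line], delimiter=";", quotechar='"'))
    record = dict(zip(LEADING_FIELDS, fields))
    record["remaining_fields"] = fields[len(LEADING_FIELDS):]
    return record

record = parse_log_line(LINE)
print(record["operation_type"], record["http_status"], record["client_ip"])
print(int(record["e2e_latency_ms"]) - int(record["server_latency_ms"]), "ms outside the server")
```

Filtering parsed records by client_ip and operation_type is enough to answer questions such as which client issued a delete request and when.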

Storage Analytics – Why turn on Metrics?

Provides ability to answer commonly asked questions:

How many transactions did my service issue per hour over the past week?

How many anonymous Get Blob requests were issued to my storage account?

My application is not performing as expected; what is the availability and performance of storage for a given time period?

The list goes on and on…

Storage Analytics – Metrics

Transaction metrics are provided for every 1 hour time interval and stored into Windows Azure Tables
Example metrics
  Total Transactions
  Availability
  % Success, % Network Errors, % Timeout, % Throttled, etc.
  Average Latency (Application E2E and Storage Server latency)
  Total Ingress
  Total Egress
Blob, Table or Queue summary and per REST API metrics

Capacity metrics are provided only for Blobs at this time
  Updated once a day
  Capacity and # of objects

Storage Analytics – Example using Metrics

Client application running in the cloud starts experiencing slow table access

Compare Application E2E latency with Storage Server latency

[Diagram: timeline of one request.
  1. Time for input to be transferred to storage service
  2. Request arrives at storage service; time for storage service to process request and compute result
  3. Time taken for application to retrieve the result; done
Storage Server Latency covers step 2. Application E2E Latency covers steps 1 to 3.]
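
The comparison boils down to a subtraction: whatever part of the end-to-end latency is not storage server latency was spent transferring the request or retrieving the result. A minimal Python sketch, with a hypothetical function name; the first call uses the 28 ms and 21 ms values from the log entry example, the second uses made-up numbers for a slow client.

```python
def latency_breakdown(e2e_ms, server_ms):
    """Split end-to-end latency into server time and everything else."""
    outside_ms = e2e_ms - server_ms
    return {
        "server_ms": server_ms,
        "outside_server_ms": outside_ms,
        "outside_share": outside_ms / e2e_ms,
    }

healthy = latency_breakdown(e2e_ms=28, server_ms=21)
slow_client = latency_breakdown(e2e_ms=1200, server_ms=60)  # made-up values

print(healthy)
print(slow_client)
```

When most of the latency falls outside the server, as in the second call, the place to look is the network or the client application rather than the storage service.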

Compare Application E2E Latency to Storage Server Latency

[Chart 1: y-axis "Average Table Latency (ms)", 0 to 1400; series "Avg. Application E2E Latency (ms)" and "Avg. Storage Server Latency (ms)"; x-axis from 8/23/2011 10:00 to 8/26/2011 6:00 in 2-hour steps.]

[Chart 2: "Compare Application E2E Latency to Storage Server Latency (ms)"; y-axis "Average Latency", 0 to 1600; series "Avg. Application E2E Latency (ms)"; same x-axis.]

[Chart 3: y-axis "Total Transactions", 0 to 8,000,000; series "Total Table Transactions"; same x-axis.]

Root Causing the Issue

They then looked at their application performance counters and profiling to find
  High CPU utilization
  High memory usage
  Frequent garbage collection cycles

Reason for the difference between E2E latency and Server latency
  Took a long time for the application to retrieve the results of a query

Their resolution was to:
  Increase number of VM instances
  Move to larger VM instances
  Move to Server GC

Storage Analytics Summary

Separate Namespace
  Logs: stored as blobs in a separate Blob Container in the storage account being monitored
    http://account.blob.core.windows.net/$logs/
  Metrics: stored as entities in separate metrics Azure Tables in the storage account being monitored
    http://account.table.core.windows.net/$Metrics*

Isolation
  $logs and $Metrics have separate resource limits and throttling from the rest of the storage account traffic

Cost
  Capacity to keep the data
  Transactions for generating & accessing analytics data
  Can use retention policy on both logs and metrics in terms of days
  Deleting data via retention policy does not incur transaction cost

Geo-Replication

[Map: East Asia, South East Asia, West Europe, North Europe, South Central US, North Central US]

Geo-replication

Data geo-replicated across data centers hundreds of miles apart
  Turned on right now for Blob and Table data (Queues will be in CY12)
Provides data durability in the face of major data center disasters
Data is only geo-replicated within regions
User chooses primary location during account creation
  The other location in the region is the secondary location
Asynchronous geo-replication
  Off the critical path of live requests


Geo-replication

Is there a cost for geo-replication?
  Geo-replication is included in the current price of Storage
Geo-replication is on by default for all storage accounts
  Can be turned off for the whole storage account
  Though there are no price savings if you turn it off
To disable (turn off) geo-replication, contact Microsoft Windows Azure Support
But note, if you turn geo-replication off and then back on
  Data transfer egress rates apply to re-bootstrap the data from the primary to the secondary data center. No additional charge after the re-bootstrap is done.

Geo-Failover

[Diagram: Azure DNS maps the hostname account.blob.core.windows.net to an IP address in North Central US. A DNS lookup for http://account.blob.core.windows.net/ is followed by data access to North Central US, which geo-replicates to South Central US. On failover, DNS is updated so that the same hostname points to South Central US.]

Existing URL works after failover
Failover trigger: failover would only be used if the primary could not be recovered
Asynchronous geo-replication: may lose recent updates during failover
Typically geo-replicate data within minutes, though no SLA guarantee
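
Why asynchronous replication can lose recent updates on failover is easiest to see in a toy model. The Python sketch below is only an illustration under simplified assumptions (one key-value store per data center, a FIFO replication queue); the GeoAccount class and its method names are hypothetical.

```python
from collections import deque

class GeoAccount:
    """Toy model: writes are acknowledged before they are geo-replicated."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = deque()       # updates not yet shipped to the secondary
        self.dns_target = "primary"  # where the account hostname resolves

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # replicated later, off the critical path

    def replicate(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def failover(self):
        # Primary is unrecoverable: repoint DNS; unshipped updates are lost.
        self.dns_target = "secondary"
        self.pending.clear()

    def read(self, key):
        store = self.primary if self.dns_target == "primary" else self.secondary
        return store.get(key)

account = GeoAccount()
account.write("a", 1)
account.replicate()
account.write("b", 2)   # acknowledged, but not yet replicated
account.failover()
print(account.read("a"), account.read("b"))
```

The caller keeps using the same read path (the same hostname) after failover; only the target behind it changes.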

Windows Azure Storage Internals

Design Goals

Highly Available Storage with Strong Consistency
  Provide access to data in the face of hardware failures
Durability
  Replicate data several times within and across data centers
Scalability
  Need to scale to exabytes and beyond
  Automatically load balance data to meet peak traffic demands
  Provide a global namespace to access data around the world

Additional details can be found in this white paper: “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011

http://go.microsoft.com/fwlink/?LinkID=234565

Windows Azure Storage Stamps

Access blob storage via the URL: http://<account>.blob.core.windows.net/

[Diagram: a Storage Location Service and two Storage Stamps. Each stamp sits behind a load balancer (LB) and consists of Front-Ends, a Partition Layer and a DFS Layer. Data access enters through the LB. Intra-stamp replication is shown at the DFS Layer; inter-stamp (geo) replication is shown between the Partition Layers of the two stamps.]

Storage Stamp Architecture – DFS Layer

[Diagram: Distributed File System (DFS) Layer with DFS Servers and three nodes labeled M coordinated via Paxos.]

All data from the Partition Layer is stored into files (extents) in the DFS layer
An extent is replicated 3 times across different fault and upgrade domains
Checksum all stored data
  Verified on every client read
  Scrubbed every few days
Re-replicate on disk/node/rack failure or checksum mismatch
Load balancing
  3 replicas are randomly allocated across a candidate set of servers based on available resources
  Any of the 3 replicas can be read from and read load balancing is used
Use a journal drive to keep the write latencies low
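
Checksum verification on read, with fallback to another replica, can be sketched as follows. This Python example uses CRC32 purely for illustration (the slide does not say which checksum is used), and the function name and sample data are hypothetical.

```python
import zlib

def read_extent(replicas):
    """Return data from the first replica whose checksum verifies.

    `replicas` is a list of (data, stored_checksum) pairs. Also returns the
    indexes of replicas that failed verification, which are the candidates
    for re-replication.
    """
    corrupt = []
    for index, (data, stored_checksum) in enumerate(replicas):
        if zlib.crc32(data) == stored_checksum:
            return data, corrupt
        corrupt.append(index)
    raise IOError("no replica passed checksum verification")

payload = b"extent contents"
checksum = zlib.crc32(payload)
replicas = [
    (b"extent c0ntents", checksum),  # simulated bit rot on the first replica
    (payload, checksum),
    (payload, checksum),
]

data, corrupt = read_extent(replicas)
print(data, corrupt)
```

In this run the read succeeds from the second replica and the first is flagged, which corresponds to the "re-replicate on checksum mismatch" bullet.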

Storage Stamp Architecture – Partition Layer

[Diagram: the Partition Layer (Partition Master, Partition Servers, Lock Service) sits on top of the DFS Layer (DFS Servers and three nodes labeled M coordinated via Paxos).]

Provides transaction semantics and strong consistency for high level data abstractions
Stores and reads the objects to/from extents in the DFS layer
Provides inter-stamp (geo) replication by shipping logs to other stamps
Scalable object index via partitioning

Storage Stamp Architecture – Front End Layer

[Diagram: the Front End Layer (FE servers) sits on top of the Partition Layer (Partition Master, Partition Servers, Lock Service), which sits on top of the DFS Layer (DFS Servers and three nodes labeled M coordinated via Paxos).]

Stateless servers
Authentication + authorization
Request routing

Storage Stamp Architecture

[Diagram: the same three layers as the previous slide. An incoming write request enters at a Front End (FE), passes through the Partition Layer to the DFS Layer, and an Ack is returned.]

Partition Layer – Scalable Object Index

100s of billions of blobs, entities and messages across all accounts can be stored in a stamp
  Need to efficiently enumerate, query, get, and update them
Traffic pattern can be highly dynamic
  Hot objects, peak load, traffic bursts, etc.
Need a scalable index for the objects that can:
  Spread the index across 100s of servers
  Dynamically load balance
  Dynamically change which servers are serving each part of the index based on load

Scalable Object Index via Partitioning

Partition Layer maintains an internal Object Index Table for each data abstraction
  Blob Index: contains all blob objects for all accounts in a stamp
  Entity Index: contains all entities for all accounts in a stamp
  Message Index: contains all messages for all accounts in a stamp

Scalability is provided for each Object Index
  Monitor load to each part of the index to determine hot spots
  Index is dynamically split into thousands of Index RangePartitions based on load
  Index RangePartitions are automatically load balanced across servers to quickly adapt to changes in load

Account Name | Container Name | Blob Name
aaaa         | aaaa           | aaaaa
……..         | ……..           | ……..
zzzz         | zzzz           | zzzzz

Split index into Range Partitions based on load
Can only split at PartitionKey boundaries
PartitionMap tracks Index RangePartition assignment to partition servers
Front-End caches the PartitionMap to route user requests
Each part of the index is assigned to only one Partition Server at a time

Partition Layer – Index Range Partitioning

[Diagram: within a Storage Stamp, the Blob Index is split into three RangePartitions, each served by one Partition Server (PS 1, PS 2, PS 3). The Partition Master holds the Partition Map and the Front-End Server holds a copy of it.]

Partition Map
A-H: PS1
H’-R: PS2
R’-Z: PS3

RangePartition | First key (Account Name, Container Name, Blob Name) | Last key
A-H (PS 1)     | aaaa, aaaa, aaaaa                                   | harry, pictures, sunrise
H’-R (PS 2)    | harry, pictures, sunset                             | richard, videos, soccer
R’-Z (PS 3)    | richard, videos, tennis                             | zzzz, zzzz, zzzzz
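
Routing a request through a cached Partition Map amounts to a sorted-boundary lookup. The Python sketch below uses the keys from the slide as the inclusive upper bounds of the first two RangePartitions; the route function and the binary-search representation are my own illustration, not the service's data structure.

```python
import bisect

# Inclusive upper bound of each RangePartition except the last,
# as (account name, container name, blob name) tuples from the slide.
UPPER_BOUNDS = [
    ("harry", "pictures", "sunrise"),   # end of A-H, served by PS1
    ("richard", "videos", "soccer"),    # end of H'-R, served by PS2
]
SERVERS = ["PS1", "PS2", "PS3"]         # PS3 serves everything after R'

def route(account, container, blob):
    """Return the Partition Server responsible for the given blob key."""
    key = (account, container, blob)
    # bisect_left finds the first range whose upper bound is >= key.
    return SERVERS[bisect.bisect_left(UPPER_BOUNDS, key)]

print(route("harry", "pictures", "sunrise"))   # PS1
print(route("harry", "pictures", "sunset"))    # PS2
print(route("richard", "videos", "tennis"))    # PS3
```

Because keys compare lexicographically as tuples, blobs of one account stay adjacent in the index, and a split simply adds one more boundary to the list.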

Partition Layer – Automatic RangePartition Load Balancing

[Diagram: front ends FE 1 to FE 3 behind a VIP, the Partition Master (PM) with the Master System, and Partition Servers 1 to 4 above the DFS Layer. Legend: RangePartition, Server Load. Steps shown: unassign RangePartition, reassign RangePartition.]

Load balancing is triggered based on hot RangePartitions or Partition Servers
No data is moved on disk for the reassignment
Only the index assignment for the Partition Servers changes
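
A single load-balancing step can be sketched as a change to the assignment map only, with no data copied. In this Python example the policy (move the hottest range of the most loaded server to the least loaded server), the range names and the load figures are all assumptions made for illustration; the slide does not describe the actual policy.

```python
def rebalance_once(assignment, range_load):
    """Return a new assignment after moving one RangePartition.

    assignment: RangePartition name -> Partition Server
    range_load: RangePartition name -> requests per second
    """
    server_load = {}
    for rng, server in assignment.items():
        server_load[server] = server_load.get(server, 0) + range_load[rng]

    hot = max(server_load, key=server_load.get)
    cold = min(server_load, key=server_load.get)
    hot_ranges = [r for r, s in assignment.items() if s == hot]
    if hot == cold or len(hot_ranges) < 2:
        return dict(assignment)  # nothing useful to move

    moved = max(hot_ranges, key=range_load.get)
    new_assignment = dict(assignment)
    new_assignment[moved] = cold  # only the map changes; extents stay in the DFS layer
    return new_assignment

assignment = {"A-D": "PS1", "D'-H": "PS1", "H'-R": "PS2", "R'-Z": "PS3"}
range_load = {"A-D": 900, "D'-H": 800, "H'-R": 100, "R'-Z": 200}

print(rebalance_once(assignment, range_load))
```

Here PS1 drops from 1,700 to 800 requests per second while PS2 rises from 100 to 1,000. Reassignment is cheap because every Partition Server reads the same extents from the shared DFS layer.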

Scalability of Data Abstractions

Namespace for accessing storage
  http://<accountName>.<type>.core.windows.net/partitionName

How to scale out storage for your service
Understand the scalability targets at 2 levels:
1. Scalability targets of a single storage account
2. Scalability targets for Blobs, Table Entities and Queues within a storage account

Account Scalability Targets
  Capacity: up to 100 TB
  Transactions: up to 5,000 entities per second
  Bandwidth: up to 3 gigabits per second
Partition data across storage accounts to go beyond these targets

Scalability of Objects within an Account
  Single Blob: up to 60 MB per second
  Single PartitionKey in a Table: up to 500 entities per second
  Single Queue: up to 500 messages per second
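
A quick way to apply the account-level targets is to compute how many storage accounts a workload needs on each dimension and take the largest. This Python sketch uses the three targets quoted above; the function name and the sample workloads are hypothetical.

```python
import math

# Per-account targets quoted on the slide.
ACCOUNT_CAPACITY_TB = 100
ACCOUNT_TRANSACTIONS_PER_SEC = 5000
ACCOUNT_BANDWIDTH_GBPS = 3

def accounts_needed(capacity_tb, transactions_per_sec, bandwidth_gbps):
    """Smallest number of accounts that keeps every dimension within target."""
    return max(
        1,
        math.ceil(capacity_tb / ACCOUNT_CAPACITY_TB),
        math.ceil(transactions_per_sec / ACCOUNT_TRANSACTIONS_PER_SEC),
        math.ceil(bandwidth_gbps / ACCOUNT_BANDWIDTH_GBPS),
    )

print(accounts_needed(250, 12000, 4))   # 3: capacity and transactions both need 3
print(accounts_needed(50, 40000, 1))    # 8: driven by the transaction rate
```

The same arithmetic applies one level down: a table expected to exceed 500 entities per second on one PartitionKey needs its load spread over more PartitionKey values.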

Windows Azure Storage Summary

Provides scalable, durable and available data abstractions to build your applications
New features for Blobs, Tables, and Queues:
  Queue Message Leases/Update
  Table Projection and Upsert
  Improved Blob data streaming
  Storage Analytics
  Geo-replication

Overview of Windows Azure Storage Internals, with details in: “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
http://go.microsoft.com/fwlink/?LinkID=234565

More info about the above on the Windows Azure Storage blog: http://blogs.msdn.com/windowsazurestorage/

Feedback

Feedback and questions http://forums.dev.windows.com

Session feedback http://bldw.in/SessionFeedback

Thank You

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
