windows azure storage: what’s coming, best practices and internals

55

Upload: wattan

Post on 24-Feb-2016

45 views

Category:

Documents


0 download

DESCRIPTION

Windows Azure Storage: What’s Coming, Best Practices and Internals. Brad Calder and Jai Haridas Windows Azure Storage Microsoft 3-541. Agenda. Introduction to Storage Internals What’s coming Best Practices. Introduction to Storage. Windows Azure Storage. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals
Page 2: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Windows Azure Storage: What’s Coming, Best Practices and InternalsBrad Calder and Jai HaridasWindows Azure StorageMicrosoft3-541

Page 3: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

AgendaIntroduction to StorageInternalsWhat’s comingBest Practices

Page 4: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Introduction to Storage

Page 5: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Windows Azure StorageCloud Storage - Anywhere and anytime access• Blobs, Disks, Tables and Queues

Highly Durable, Available and Massively Scalable • Easily build “internet scale” applications• 8.5 trillion stored objects• 900K request/sec on average (2.3+ trillion per month)

Pay for what you useExposed via easy and open REST APIs Client libraries in .NET, Java, Node.js, Python, PHP, Ruby

Page 6: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Abstractions – Blobs and DisksBlobs – Simple interface to store and retrieve files in cloud• Data sharing – share documents, pictures, video, music, etc.• Big Data – store raw data/logs and compute/map reduce over data• Backups – data and device backups

Disks – Network mounted durable disks for VMs in Azure • Mounted disks are VHDs stored in Azure Blobs• Move on-premise applications to cloud

Page 7: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Abstractions – Tables and QueuesTables – Massively scalable and extremely easy to use NoSQL system that auto scales• Key-value lookups at scale• Store user information, device information, any type of metadata for

your service

Queues – Reliable messaging system • Decouple components/roles– Web role to worker role communication– Allows roles to scale independently• Implement scheduling of asynchronous tasks• Building process/work flows

Page 8: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

North America Europe Asia Pacific

S. Central – U.S. Region

W. Europe Region

N. Central – U.S. Region

N. Europe Region

S.E. AsiaRegion

E. AsiaRegion

Data centers

Windows Azure Storage

East – U.S. Region

West – U.S. Region

Page 9: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Windows Azure Data Storage Concepts

Account

Container Blobs

Table Entities

Queue Messages

https://<account>.blob.core.windows.net/<container>

https://<account>.table.core.windows.net/<table>

https://<account>.queue.core.windows.net/<queue>

Page 10: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

How is Azure Storage used by Microsoft?Xbox: Uses Blobs, Tables & Queues for Cloud Game Saves, Halo 4, XBox Music, XBox Live, etc.Skype: Uses Blobs, Tables and Queues for Skype video messages and to keep metadata to allow Skype clients to connect with each other Bing: Uses Blobs, Tables and Queues to provide a near real-time ingestion engine that consumes Twitter and Facebook feeds, indexes them, which is then folded into Bing searchSkyDrive: Uses Blobs to store pictures, documents, videos, files, etc.

Page 11: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Page 12: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Internals

Page 13: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Design GoalsHighly Available with Strong Consistency• Provide access to data in face of failures/partitioning

Durability• Replicate data several times within and across regions

Scalability• Need to scale to zettabytes• Provide a global namespace to access data around the world• Automatically scale out and load balance data to meet peak traffic demands

Additional details can be found in the SOSP paper:“Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating System Principals (SOSP), Oct. 2011

Page 14: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Windows Azure Storage Stamps

Storage Stamp

LB

StorageLocation Service

Access blob storage via the URL: http://<account>.blob.core.windows.net/

Data access

Partition Layer

Front-Ends

DFS Layer

Intra-stamp replicationStorage Stamp

LB

Partition Layer

Front-Ends

DFS Layer

Intra-stamp replication

Inter-stamp (Geo) replication

Page 15: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Index

Architecture Layers inside Stamps

Front-End Layer

Front-end Layer• REST front-end (blob, table, queue)• Authentication/authorization• Metrics/logging

Partition Layer• Understands our data abstractions,

and provides optimistic concurrency• Massively scalable index• Log Structured Merge Tree– Each log (stream) is a linked list of extents

Distributed File System Layer• Data persistence and replication

(JBOD)• Data is stored into a file called extent,

which is replicated 3 times across different nodes (UDs/FDs)

• Append-only file system

Distributed File System

Partition LayerM

Page 16: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

All writes are appends to the end of a log, which is an append to the last extent in the log

Write Consistency across all replicas for an extent: • Appends are ordered the same across all 3

replicas for an extent (file)• Only return success if all 3 replica appends are

committed to storage• When extent gets to a certain size or on write

failure/LB, seal the extent’s replica set and never append anymore data to it

Write Availability: To handle failures during write• Seal extent’s replica set• Append immediately to a new extent (replica

set) on 3 other available nodes • Add this new extent to the end of the partition’s

log (stream)

Availability with Consistency for Writing

Front-End Layer

Distributed File System

Partition LayerM

Page 17: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Read Consistency: Can read from any replica, since data in each replica for an extent is bit-wise identical

Read Availability: Send out parallel read requests if first read is taking higher than 95% latency

Availability with Consistency for Reading

Front-End Layer

Distributed File System

Partition LayerM

Page 18: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Spreads index/transaction processing across partition servers• Master monitors traffic load/resource

utilization on partition servers• Dynamically load balance partitions across

servers to achieve better performance/availability

Does not move data around, only reassigns what part of the index a partition server is responsible for

Dynamic Load Balancing – Partition Layer

Front-End Layer

Distributed File System

Partition LayerM

Index

Page 19: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

DFS Read load balancing across replicas• Monitor latency/load on each node/replica;

dynamically select what replica to read from and start additional reads in parallel based on 95% latency

DFS write load balancing• Monitor latency/load on each node; seal the

replica set with an overloaded node, and switch to a new extent on another set of nodes to append to

DFS capacity load balancing• Lazily move replicas around to ensure the disks

and nodes have equal amount of data on them• Important for avoiding hot nodes/disks

Dynamic Load Balancing – DFS LayerFront-End Layer

Distributed File System

Partition LayerM

Page 20: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Architecture SummaryDurability: All data stored with at least 3 replicasConsistency: All committed data across all 3 replicas are identicalAvailability: Can read from any 3 replicas; If any issues writing seal extent and continue appending to new extentPerformance/Scale: Retry based on 95% latencies; Auto scale out and load balance based on load/capacity

Additional details can be found in the SOSP paper:“Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating System Principals (SOSP), Oct. 2011

Page 21: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

What’s Coming

Page 22: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

What’s Coming by end of 2013Geo-Replication• Queue Geo-Replication• Secondary Read-Only Access

Windows Azure Import/ExportReal-Time Metrics for Blobs, Tables and QueuesCORS for Azure Blobs, Tables and QueuesJSON for Azure TablesNew .NET 2.1 Library

Page 23: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Two Types of Durability OfferedLocal Redundant Storage Accounts• Maintain 3 copies of data within a given region• ~ 2/3 price of Geo Redundant Storage

Geo Redundant Storage Accounts• Maintain 6 copies of data spread over 2 regions at least 400 miles apart

from each other (3 copies are kept at each region)

Page 24: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Geo Redundant Storage

Data geo-replicated across regions 400+ miles apart• Provide data durability in face of potential major regional disasters• Provided for Blob, Tables and Queues (NEW)

User chooses primary region during account creation• Each primary region has a predefined secondary region

Asynchronous geo-replication• Off critical path of live requests

Europe West

North Europe

Geo-replication

South Central

US

North Central

US

Geo-replication

East Asia

South East Asia

Geo-replicationWest US East US

Geo-replication

Page 25: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

East USWest US

AzureDNS

http://account.blob.core.windows.net/

DNS lookup

Data access

Hostname IP Addressaccount.blob.core.windows.net

West US

Failover

Update DNS

East US

Geo-Rep & Geo-Failover

• Existing URL works after failover• Failover Trigger – failover would only be used by MS if primary could not be recovered• Asynchronous Geo-replication – may lose recent updates during failover• Typically geo-replicate data within minutes, though no SLA for how long it will take

Geo-replication

Page 26: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Geo Redundant Storage RoadmapCustomer Controlled Failover (Future)• Provide APIs to allow clients to switch the primary and secondary

regions for a storage account

Queue Geo-Replication (Done)Secondary Read Only Access (by end of CY13)

Page 27: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Secondary Read-Only Access – ScenariosRead-only access to data even if primary is unavailable• Access to an eventually consistent copy of the data in the other region

Provides another read source for geographically distributed applications/customers• Allows lower latency access to data in secondary region• Have compute at both primary and secondary region and use the storage stored in

that region

For these, the application semantics need to allow for eventually consistent reads

Page 28: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Secondary RO Access – How it WorksCustomers using Geo Redundant Storage can opt to have read-only access to the eventually consistent copy of the data on Secondary tenant• Customer choose primary region, and the secondary region is fixed

Get two endpoints for accessing your storage account• Primary endpoint– accountname.<service>.core.windows.net

• Secondary endpoint– accountname-secondary.<service>.core.windows.net

Same storage keys work for both endpoints

Consistency• All Writes go to the Primary• Reads to Primary are Strongly Consistent • Reads to Secondary are Eventually Consistent

Page 29: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Secondary RO Access – CapabilitiesApplication will be able to control which location they read data from• Use one of the two endpoints– Primary: accountname.<service>.core.windows.net– Secondary: accountname-secondary.<service>.core.windows.net

• Our client library SDK will provide features for reading from the secondary– PrimaryOnly, SecondaryOnly, PrimaryThenSecondary, etc

Application will be able to query the current max geo-replication delay for each service (blob, table, queue) for a storage account

There will be separate storage account metrics for primary and secondary locations

Page 30: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Windows Azure Import/ExportMove TBs of data into and out of Windows Azure Blobs by shipping disks

Windows Azure

Storage

Regional Data CenterCustomer

Hard drives

Courier Service

Hard drives

Page 31: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Import Process1. Customer downloads the

tool & prepare drives (must BitLocker all drives)

https://manage.windowsazure.com

2. Customer creates an import job & provides the BitLocker Key’s & obtain mailing address

Data Center

3. Customer ships the encrypted drives

Support Staff

4. Receives & Insert Drives

Windows Azure Blob Storage

5. Data Transfer

6. Removes and ships the drives back

Page 32: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Export Process

https://manage.windowsazure.com

1. Customer creates an export job & obtain # of disks to ship and mailing address

Data Center

2. Customer ships empty drives

Support Staff

3. Receives & Insert Drives

Windows Azure Blob Storage

4. Data Transfer

5. Removes and ships the drives back

6. Customer retrieves BitLocker key

Page 33: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

www.buildwindows.com

Import/Export FeaturesAccessible via REST with Portal integrationEach Job imports/exports data for a single storage account• Each Job can be up to 10 disks

Support 3.5” SATA HDDsAll Disks must be encrypted with BitLocker

Page 34: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Realtime MetricsWhat?• Provide per minute aggregates at API and service level with up to 5

minute delay• Available for Blobs, Tables and Queues• Enable via SetServiceProperties• This data is stored in tables under your storage account– $MetricsRealtimeTransactionsBlob, $MetricsRealtimeTransactionsTable and

$MetricsRealtimeTransactionsQueue• Utilize this to monitor– Availability, Transaction rate, Ingress/Egress, Failure rate per category etc.

Why?• Empower you to monitor/investigate problems in application in real-

time• Hourly granularity smoothens spikes that may exist in traffic patterns

Page 35: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Hourly metrics - TPS

6/24/2

013

6/24/2

013 2

:00

6/24/2

013 4

:00

6/24/2

013 6

:00

6/24/2

013 8

:00

6/24/2

013 1

0:00

6/24/2

013 1

2:00

6/24/2

013 1

4:00

6/24/2

013 1

6:00

6/24/2

013 1

8:00

6/24/2

013 2

0:00

6/24/2

013 2

2:00

6/25/2

013

660000

665000

670000

675000

680000

685000

690000

695000

700000

180182184186188190192194196198200

Average of TransactionCountAverage of TPS

Page 36: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Realtime Metrics - TPS

6/24/2

013

6/24/2

013 0

:04

6/24/2

013 0

:08

6/24/2

013 0

:12

6/24/2

013 0

:16

6/24/2

013 0

:20

6/24/2

013 0

:24

6/24/2

013 0

:28

6/24/2

013 0

:32

6/24/2

013 0

:36

6/24/2

013 0

:40

6/24/2

013 0

:44

6/24/2

013 0

:48

6/24/2

013 0

:52

6/24/2

013 0

:56

6/24/2

013 1

:000

2000400060008000

100001200014000160001800020000

0

50

100

150

200

250

300

350

Average of TransactionCountAverage of TPS

Page 37: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Realtime Metrics Settings <RealtimeMetrics> <Version>1.0</Version> <Enabled>true</Enabled> <IncludeAPIs>true</IncludeAPIs> <RetentionPolicy> <Enabled>true</Enabled> <Days>7</Days> </RetentionPolicy> </RealtimeMetrics>

Page 38: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

CORS (Cross Origin Resource Sharing)What?• Browser by default prevents scripts from accessing resources from

different origin• CORS is a mechanism that enables cross origin access for scripts• Set CORS rules via SetServiceProperties for Blobs, Tables and Queues– Can control the origins that can access resources– Can control the headers that can be accessed

Why?• Do not require running a proxy service for web apps to access storage

service

Page 39: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

CORS Settings <Cors> <CorsRule>

<AllowedMethods>GET,PUT</AllowedMethods> <AllowedOrigins>*</AllowedOrigins> <AllowedHeaders>*</AllowedHeaders> <ExposedHeaders>*</ExposedHeaders> <MaxAgeInSeconds>180</MaxAgeInSeconds> </CorsRule> </Cors>

Page 40: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

JSON (JavaScript Object Notation)What?• A popular concise format for REST protocols• OData supports two formats– ATOMPub: We currently support this but is too verbose– JSON: OData has released multiple flavors of JSON

Why?• Improves COGS for applications– Lower bandwidth consumption (approx. 70% savings), lower cpu utilization

and hence better responsiveness • Many applications use JSON to represent object model – Efficient object data model to wire protocol

Page 41: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

2.1 .NET LibraryNew Features• Async Task methods with support for cancellation• Byte Array, Text, File upload / download APIs for blobs• IQueryable provider for Tables• Compiled Expressions for Table Entities

Performance Improvements• Buffer Pooling• Multi-Buffer Memory Stream for consistent performance when buffering

unknown length data• .NET MD5 now default (~20% faster than invoking native one)

Available Soon @ http://www.nuget.org/packages/WindowsAzure.Storage

Page 42: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Demo – CORS, JSON and Realtime Metrics

Page 43: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Best Practices – Account, Blobs, Tables and

Queues

Page 44: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

General .NET Best Practices For AzureDisable Nagle for small messages (< 1400 b)• ServicePointManager.UseNagleAlgorithm = false;

Disable Expect 100-Continue* • ServicePointManager.Expect100Continue = false;

Increase default connection limit• ServicePointManager.DefaultConnectionLimit = 100; (Or More)

Take advantage of .Net 4.5 GC• GC performance is greatly improved• Background GC:

http://msdn.microsoft.com/en-us/magazine/hh882452.aspx

Page 45: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

General Best PracticesLocate Storage accounts close to compute/usersUnderstand Account Scalability targets• Use multiple storage accounts to get more• Distribute your storage accounts across regions

Cache critical data sets • As a Backup data set to fall back on• To get more request/sec than the account/partition targets

Distribute load over many partitions and avoid spikes

Page 46: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

General Best Practices (cont.)Use HTTPSOptimize what you send & receive• Blobs: Range reads, Metadata, Head Requests• Tables: Upsert, Merge, Projection, Point Queries• Queues: Update Message, Batch size

Control Parallelism at the application layer• Unbounded Parallelism can lead to slow latencies and throttling

Page 47: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

General Best Practices (cont.)Enable Logging & Metrics on each storage service• Can be done via REST, Client API, or Portal• Enables clients to self diagnose issues, including performance related

ones• Data can be automatically GC’d according to a user specified retention

interval– For example, have longer retention for hourly metrics and shorter retention for

realtime metrics

Page 48: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Blob Best PracticeTry to match your read size with your write size• Avoid reading small ranges on blobs with large blocks• CloudBlockBlob.StreamMinimumReadSizeInBytes/ StreamWriteSizeInBytes

How do I upload a folder the fastest?• Upload multiple blobs simultaneously

How do I upload a blob the fastest?• Use parallel block upload

Concurrency (C)- Multiple workers upload different blobsParallelism (P) – Multiple workers upload different blocks for same blob

Page 49: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Concurrency Vs. Blob ParallelismXL VM Uploading 512, 256MB Blobs (Total upload size = 128GB)• C=1, P=1 => Averaged ~ 13. 2 MB/s• C=1, P=30 => Averaged ~ 50.72 MB/s• C=30, P=1 => Averaged ~ 96.64 MB/s

• Single TCP connection is bound by TCP rate control & RTT• P=30 vs. C=30: Test completed almost twice as fast!• Single Blob is bound by the limits of a single partition • Accessing multiple blobs concurrently scales

0

2000

4000

6000

8000

10000

P=1, C=1

P=30, C =1P=1, C=30

Tim

e (s

)

Page 50: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Blob DownloadXL VM Downloading 50, 256MB Blobs (Total download size = 12.5GB)• C=1, P=1 => Averaged ~ 96 MB/s• C=30, P=1 => Averaged ~ 130 MB/s

C=1, P=1 C=30, P=10

20

40

60

80

100

120

140

Tim

e (s

)

Page 51: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Table Best PracticeCritical Queries: Select PartitionKey, RowKey to avoid hotspots• Table Scans are expensive – avoid them at all costs for latency sensitive

scenariosBatch: Same PartitionKey for entities that need to be updated togetherSchema-less: Store multiple types in same tableSingle Index – {PartitionKey, RowKey}: If needed, concatenate columns to form composite keysEntity Locality: {PartitionKey, RowKey} determines sort order• Store related entites together to reduce IO and improve performance

Table Service Client Layer in 2.1: Dramatic performance improvements and better NoSQL interface

Page 52: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Queue Best PracticeMake message processing idempotent: Messages become visible if client worker fails to delete messageBenefit from Update Message: Extend visibility time based on message or save intermittent stateMessage Count: Use this to scale workersDequeue Count: Use it to identify poison messages or validity of invisibility time usedBlobs to store large messages: Increase throughput by having larger batchesMultiple Queues: To get more than a single queue (partition) target

Page 54: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

Evaluate this session

Scan this QR code to evaluate this session and be automatically entered in a drawing to win a prize!

Page 55: Windows Azure Storage:  What’s  Coming,  Best Practices and Internals

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.