Inside Windows Azure Storage
Microsoft Corporation
TRANSCRIPT
Agenda
Windows Azure Storage Today
What’s new? Blobs, Tables and Queues features
Storage Analytics
Geo-Replication
Windows Azure Storage Internals
Windows Azure Storage Today
Geographically Distributed across 3 Regions
Thousands of services/applications
Anywhere at Anytime Access to your data
Durability and Scalability
North Central US
South Central US
Northern Europe
Western Europe
East Asia
South East Asia
70 Petabytes raw storage today
Grows to >200 Petabytes by start of 2012
Running on Windows Azure Storage
Telemetry for Kinect
Game Saves in Cloud
Microsoft Zune Media Storage and Delivery
Facebook and Twitter Near Real-Time Search
Running on Windows Azure Storage
Bing real-time Facebook/Twitter search ingestion engine
[Diagram: Bing Ingestion Engine (an Azure service running on several VMs) using Windows Azure Queues and Windows Azure Tables]
Index Facebook/Twitter data within 15 seconds of update
Facebook/Twitter data (user postings, status updates) is stored into blobs
Ingestion engine processes the blobs
Annotates with auth/spam/adult scores, content classification, expands links, etc.
Uses Tables heavily for indexing
Queues manage the workflow
Results are stored back into blobs
Bing takes the resulting blobs and folds them into the search index
Peak of 40,000 requests/sec
2 to 3 billion requests per day
Took 1 dev 2 months to design, build and release to production
Windows Azure Storage
Abstractions
Blobs: File system in the cloud
Tables: Massively scalable structured storage
Queues: Reliable storage and delivery of messages
Drives: Durable NTFS volumes for Windows Azure applications
Easy client access
Easy to use REST APIs and Client Libraries
Existing NTFS APIs for Windows Azure Drives
Windows Azure Storage Account
User creates a globally unique storage account name
Choose the primary location to host storage account
North Central US
South Central US
Northern Europe
Western Europe
East Asia
South East Asia
Windows Azure Data Storage Concepts
Account > Container > Blobs
Account > Table > Entities
Account > Queue > Messages
https://<account>.blob.core.windows.net/<container>
https://<account>.table.core.windows.net/<table>
https://<account>.queue.core.windows.net/<queue>
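The three URL patterns above can be sketched as a small helper. This is only an illustrative Python sketch: the function name `storage_url` is hypothetical, and the account and container names are sample values.

```python
# Hypothetical helper that builds the endpoint URLs shown above.
SERVICES = {"blob", "table", "queue"}


def storage_url(account: str, service: str, resource: str) -> str:
    """Return https://<account>.<service>.core.windows.net/<resource>."""
    if service not in SERVICES:
        raise ValueError(f"unknown service type: {service}")
    return f"https://{account}.{service}.core.windows.net/{resource}"


print(storage_url("sally", "blob", "thumbnails"))
```

The account name is part of the host name, which is why it has to be globally unique.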
Windows Azure Blobs
A highly scalable and durable file system in the cloud
Store files as blobs and associate metadata with them
Blobs can be up to 200 GB in size
Upload/Download Blobs
Provides continuation for large uploads
Provides range reads
Strong Consistency and Optimistic Concurrency
Conditional operations – If-Match, If-Unmodified-Since, etc.
Snapshot Blob
Create versions/backups of your blobs
Lease Blob
Exclusive write lease
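Optimistic concurrency with conditional operations can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the service's implementation: `BlobStore`, `PreconditionFailed` and the ETag format are all hypothetical, and the exception stands in for an HTTP 412 response.

```python
import itertools


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 (Precondition Failed) response."""


class BlobStore:
    """In-memory stand-in for a blob container (not the real service)."""

    def __init__(self):
        self._blobs = {}                    # name -> (etag, data)
        self._versions = itertools.count(1)

    def get(self, name):
        return self._blobs[name]            # returns (etag, data)

    def put(self, name, data, if_match=None):
        current = self._blobs.get(name)
        # Conditional write: succeed only if the caller saw the latest version.
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(name)
        etag = f'"v{next(self._versions)}"'
        self._blobs[name] = (etag, data)
        return etag


store = BlobStore()
first_etag = store.put("lake.jpg", b"original")
store.put("lake.jpg", b"edit by writer A", if_match=first_etag)
try:
    # Writer B still holds the old ETag, so its update is rejected.
    store.put("lake.jpg", b"edit by writer B", if_match=first_etag)
except PreconditionFailed:
    print("writer B must re-read the blob and retry")
```

Neither writer takes a lock; the loser of the race simply re-reads and retries.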
Windows Azure Blobs – What is new?
Efficient resume for browsers and streaming media players requires:
Range requests of the form “Range: bytes=100-”
Return “Accept-Ranges” response header
ETags to be quoted
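An open-ended range (written in HTTP syntax as `bytes=100-`) asks for everything from byte 100 to the end of the blob, which is what a player needs to resume a download. A minimal Python sketch of how a server might resolve such a header follows; `resolve_range` is a hypothetical name and only the single-range form is handled.

```python
import re

_RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def resolve_range(header: str, blob_size: int):
    """Return inclusive (first, last) byte offsets for a single-range header."""
    match = _RANGE.match(header.strip())
    if not match:
        raise ValueError(f"unsupported Range header: {header!r}")
    first = int(match.group(1))
    # An empty end offset means "to the end of the blob".
    last = int(match.group(2)) if match.group(2) else blob_size - 1
    last = min(last, blob_size - 1)
    if first > last:
        raise ValueError("range not satisfiable")
    return first, last


print(resolve_range("bytes=100-", blob_size=1000))
```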
Windows Azure Tables
Scalable Structured Storage
Store Tables with billions of entities and TBs of data
Provides flexible schema (NoSQL)
Data Model
A table is a set of entities (rows)
An entity is a set of properties (columns)
Familiar and Easy to use API
OData Protocol
WCF Data Services - .NET classes and LINQ
Windows Azure Tables – What is new?
Insert entity
Update entity
Merge
Replace
Delete entity
Query entity
Entity Group Transactions
Query Projection ($select)
Project only selected columns
Upsert Entity
InsertOrReplace
InsertOrMerge
Windows Azure Tables - Projection
public class Customer
{
    public string PartitionKey { get; set; } // Customer Name
    public string RowKey { get; set; }       // Customer Phone Number
    public DateTime CustomerSince { get; set; }
    public double TotalPurchase { get; set; }
    public string State { get; set; }
    // 100 more properties including profile picture etc.…
}

// Partial entity defined here
public class CustomerDiscount
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public double TotalPurchase { get; set; }
}
Windows Azure Tables - Projection
// Select partial entities by choosing properties to be projected
var query = (from entity in context.CreateQuery<CustomerDiscount>("Customers" /*Table Name*/)
             select new CustomerDiscount
             {
                 PartitionKey = entity.PartitionKey,
                 RowKey = entity.RowKey,
                 TotalPurchase = entity.TotalPurchase,
             }).AsTableServiceQuery<CustomerDiscount>();

foreach (CustomerDiscount customer in query)
{
    // Calculate the discount to be given based on total purchases made
}
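At the REST level, a projection like the one above is expressed with the OData `$select` query option. The Python sketch below only builds the query URL; the helper name `projection_url` and the account name `myaccount` are hypothetical, and a real request would also need authentication and version headers, which are omitted here.

```python
from urllib.parse import urlencode


def projection_url(account, table, properties):
    """Build a Table query URL that asks only for the listed properties."""
    # Keep "$" and "," readable instead of percent-encoding them.
    query = urlencode({"$select": ",".join(properties)}, safe="$,")
    return f"https://{account}.table.core.windows.net/{table}()?{query}"


print(projection_url("myaccount", "Customers",
                     ["PartitionKey", "RowKey", "TotalPurchase"]))
```

Only the three named properties travel over the wire, rather than the 100+ properties of the full entity.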
Windows Azure Tables - Upsert
// When user logs in from mobile device, it will register the user using upsert
Customer customer = new Customer("Thomas Anderson", "555-555-0100");
customer.Address = "4567 Main St. Redmond 48188";
customer.State = "Washington";

// Note: AttachTo method is called without an Etag which indicates
// that this is an Upsert Command
context.AttachTo("Customers" /*Table Name*/, customer);
context.UpdateObject(customer);

// No SaveChangesOptions indicates that a MERGE verb will be used
// to get InsertOrMerge semantics
context.SaveChanges();

// Use SaveChangesOptions.ReplaceOnUpdate for InsertOrReplace semantics.
// But InsertOrReplace will overwrite TotalPurchase if it existed
// context.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);
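The difference between the two upsert flavours can be shown with plain dictionaries. This Python sketch is a simulation, not the Table service API: the function names mirror the operation names, and the stored `TotalPurchase` value of 120.0 is made up for illustration.

```python
def insert_or_merge(table, key, properties):
    """Insert the entity, or merge the given properties into the stored one."""
    table.setdefault(key, {}).update(properties)


def insert_or_replace(table, key, properties):
    """Insert the entity, or replace the stored one entirely."""
    table[key] = dict(properties)


key = ("Thomas Anderson", "555-555-0100")   # (PartitionKey, RowKey)
login = {"State": "Washington"}             # properties sent by the mobile login

merged = {key: {"TotalPurchase": 120.0}}
insert_or_merge(merged, key, login)         # TotalPurchase survives

replaced = {key: {"TotalPurchase": 120.0}}
insert_or_replace(replaced, key, login)     # TotalPurchase is gone

print(merged[key])
print(replaced[key])
```

InsertOrMerge is the safer choice when the caller only knows some of the entity's properties.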
Windows Azure Queues
Provides reliable message delivery
Programming semantics – Ensures that a message can be processed at least once
Put message into the queue
Get message makes the message invisible in queue for a specified invisibility timeout
Delete message once done processing to remove message from queue
If worker crashes, message becomes visible for another worker to process
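The put/get/delete cycle above can be simulated in a few lines. This Python sketch is an in-memory model with hypothetical names (`SimQueue`, explicit `now` timestamps in seconds); the real service additionally identifies a retrieved message by a pop receipt, which is left out here.

```python
import itertools


class SimQueue:
    """In-memory stand-in for a queue; time is passed in explicitly."""

    def __init__(self):
        self._messages = {}                 # id -> [body, visible_at]
        self._ids = itertools.count(1)

    def put(self, body, now):
        self._messages[next(self._ids)] = [body, now]

    def get(self, now, visibility_timeout):
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + visibility_timeout   # hide from other workers
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


queue = SimQueue()
queue.put("resize lake.jpg", now=0)
first = queue.get(now=0, visibility_timeout=30)     # worker A takes the message
hidden = queue.get(now=10, visibility_timeout=30)   # worker B sees nothing
# Worker A crashes and never deletes the message.
second = queue.get(now=31, visibility_timeout=30)   # message is visible again
queue.delete(second[0])                             # worker B finishes the job
```

Because a message can reappear after a crash, processing has to be idempotent: the same message may be handled more than once.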
Windows Azure Queues – What is new?
Allow larger messages to be stored in queue
Message size has been increased to 64 KB
Allow worker to treat the invisibility timeout as a lease
Lease can be renewed on a queue message
Allow worker to update contents of queue message
Enable efficient continuation on worker failure
Schedule work at a future time
“PUT Message” takes invisibility timeout
Windows Azure Queue Update Message Example
[Diagram: Web Roles add work items to an Azure Queue; Worker Roles process them. Timeline by current time:]
7:00 Get Message with 5 minutes visibility timeout (expires @ 7:05 AM)
7:04 Extend visibility timeout with another 5 minutes (expires @ 7:09 AM)
Periodically store progress information in message content
7:07
7:09 Retrieve progress from queue message and resume
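The timeline can be replayed with a toy model of a single message. This Python sketch assumes the first worker stops some time after 7:04 (the slide text does not say why the message is picked up again); `LeasedMessage`, `hhmm` and the message body format are hypothetical, and pop receipts are omitted.

```python
class LeasedMessage:
    """One queue message whose invisibility timeout acts as a lease."""

    def __init__(self, body):
        self.body = body
        self.visible_at = 0                 # minutes since midnight

    def get(self, now, timeout):
        if now < self.visible_at:
            return None                     # still leased by another worker
        self.visible_at = now + timeout
        return self.body

    def update(self, now, timeout, body):
        """Renew the lease and save progress in the message content."""
        self.visible_at = now + timeout
        self.body = body


def hhmm(hours, minutes):
    return hours * 60 + minutes


msg = LeasedMessage("video.avi;progress=0%")
msg.get(hhmm(7, 0), timeout=5)                               # expires at 7:05
msg.update(hhmm(7, 4), timeout=5, body="video.avi;progress=60%")  # expires at 7:09
at_707 = msg.get(hhmm(7, 7), timeout=5)                      # still invisible
at_709 = msg.get(hhmm(7, 9), timeout=5)                      # second worker resumes
print(at_707, at_709)
```

The second worker starts from the saved progress instead of redoing the whole work item.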
Storage Analytics
Goal
Enable customers to understand and debug their usage of storage
Capabilities
Logs: Enable customers to understand and debug their usage of storage
Metrics: Enable customers to get an hourly summary of key statistics about the traffic to their Blobs, Tables and Queues
Storage Analytics – Why turn on logging?
Provides ability to answer commonly asked questions:
Did a specific request make it to the storage service and how long did it take?
What client IP issued a “Delete container” request and when?
How many requests were issued by a specific client or to a specific set of objects?
The list goes on and on…
Storage Analytics Logs
Log records for requests are stored in Windows Azure Blobs
The Log blobs are text files with one log entry per line
Each blob can contain one to many request records
A request typically appears in the log within 15 minutes after it completes execution
Configure the logging levels separately for:
Blob, Table and Queues
read (GET), write (PUT/POST/MERGE), delete (DELETE) requests or any combination
Best effort logging
Storage Analytics Data Fields Logged
The following are some of the fields logged for each record:
Log Version, Accessing Account, Owner Account, Service Type, Request URL, Object Key, Request ID
Operation Number, Request Version, Operation Type, Start Time, Application End to End Latency, Storage Server Latency, Authentication Type
Request Status, HTTP Status Code, Client IP, User Agent, Referrer, Client Request ID, ETag, LMT
Request Packet Size, Request Header Size, Response Packet Size, Response Header Size, Request MD5, Server MD5, Conditions Used
Log Entry Example
Log Version: 1.0
Start Time: 2011-07-28T18:02:40.6271789Z
Operation Type: PutBlob
Status: Success
HTTP Status Code: 201
Application E2E Latency (milliseconds): 28
Storage Server Latency (milliseconds): 21
Accessing Account: sally
Owner Account: sally
Service Type: blob
Request URL: PUT http://sally.blob.core.windows.net/thumbnails/lake.jpg
Object Key: /sally/thumbnails/lake.jpg
Request ID: fb658ee6-6123-41f5-81e2-4bfdc178fea3
Operation Number: 0
Request Version: 2009-09-19
Client IP: 201.9.10.20
Client Request ID: req12345
Log Entry in Blob:
1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;sally;sally;blob;"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;;;"req12345"
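Because each log entry is one semicolon-delimited line with optional double quotes, it can be read with a standard CSV parser. The Python sketch below names only the leading fields that the example above labels; the dictionary keys are my own naming, and the trailing fields are kept unnamed because the slide does not label them individually.

```python
import csv

LOG_LINE = (
    '1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;'
    'sally;sally;blob;'
    '"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";'
    '"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;'
    '201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";'
    '"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;;;"req12345"'
)

# Names for the leading fields, in the order they appear in the example entry.
LEADING_FIELDS = [
    "log_version", "start_time", "operation_type", "request_status",
    "http_status_code", "e2e_latency_ms", "server_latency_ms",
    "authentication_type", "accessing_account", "owner_account",
    "service_type", "request_url", "object_key", "request_id",
    "operation_number", "client_ip", "request_version",
]


def parse_log_line(line):
    """Split one log entry and label its leading fields."""
    row = next(csv.reader([line], delimiter=";", quotechar='"'))
    record = dict(zip(LEADING_FIELDS, row))
    record["remaining"] = row[len(LEADING_FIELDS):]
    return record


record = parse_log_line(LOG_LINE)
gap_ms = int(record["e2e_latency_ms"]) - int(record["server_latency_ms"])
print(record["operation_type"], record["client_ip"], gap_ms)
```

The quoting matters: the request URL and object key are quoted so that any `;` inside them does not split the record.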
Storage Analytics – Why turn on Metrics?
Provides ability to answer commonly asked questions:
How many transactions did my service issue per hour over the past week?
How many anonymous Get Blob requests were issued to my storage account?
My application is not performing as expected; what is the availability and performance of storage for a given time period?
The list goes on and on…
Storage Analytics – Metrics
Transaction metrics are provided for every 1 hour time interval and stored into Windows Azure Tables
Example Metrics
Total Transactions
Availability
% Success, % Network Errors, % Timeout, % Throttled, etc.
Average Latency (Application E2E and Storage Server latency)
Total Ingress
Total Egress
Blob, Table or Queue Summary and per REST API metrics
Capacity metrics provided for only Blobs at this time
Updated once a day
Capacity and # of objects
Storage Analytics – Example using Metrics
Client application running in the cloud starts experiencing slow table access
Compare Application E2E latency with Storage Server latency
[Diagram: request timeline]
Time for input to be transferred to storage service
Request arrives at storage service
Time for storage service to process request and compute result
Time taken for application to retrieve the result
Done
Storage Server Latency covers the processing step; Application E2E Latency covers the whole span
[Chart: Compare Application E2E Latency to Storage Server Latency. X axis: time, 8/23/2011 10:00 to 8/26/2011 6:00. Y axis: Average Table Latency (ms), 0 to 1400. Series: Avg. Application E2E Latency (ms), Avg. Storage Server Latency (ms).]
[Chart: Compare Application E2E Latency to Storage Server Latency (ms). X axis: time, 8/23/2011 10:00 to 8/26/2011 6:00. Y axis: Average Latency, 0 to 1600. Series: Avg. Application E2E Latency (ms).]
[Chart: Total Table Transactions. X axis: time, 8/23/2011 10:00 to 8/26/2011 6:00. Y axis: Total Transactions, 0 to 8000000.]
Root Causing the Issue
They then looked at their application performance counters and profiling to find:
High CPU utilization
High memory usage
Frequent garbage collection cycles
Reason for the difference between E2E latency and Server latency
Took a long time for application to retrieve the results of a query
Their resolution was to:
Increase number of VM instances
Move to larger VM instances
Move to Server GC
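The comparison used in this investigation can be automated: flag the hours where Application E2E latency is far above Storage Server latency, since that gap points at the network or the client rather than the storage service. The Python sketch below uses made-up sample rows and a hypothetical function name and threshold; it does not read the real metrics tables.

```python
def client_side_gap(rows, threshold_ms):
    """Return the hours where E2E latency exceeds server latency by more than threshold_ms."""
    return [
        hour for hour, e2e_ms, server_ms in rows
        if e2e_ms - server_ms > threshold_ms
    ]


# Made-up hourly averages: (hour, avg E2E latency ms, avg server latency ms)
sample = [
    ("8/24/2011 10:00", 60, 45),
    ("8/24/2011 12:00", 900, 50),
    ("8/24/2011 14:00", 1300, 55),
]
flagged = client_side_gap(sample, threshold_ms=200)
print(flagged)
```

If both latencies rise together, the storage service is the place to look; if only E2E rises, check the application hosts first.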
Storage Analytics Summary
Separate Namespace
Logs
Stored as blobs in a separate Blob Container in the storage account being monitored
http://account.blob.core.windows.net/$logs/
Metrics
Stored as entities in separate metrics Azure Tables in the storage account being monitored
http://account.table.core.windows.net/$Metrics*
Isolation
$logs and $Metrics have separate resource limits and throttling from the rest of the storage account traffic
Cost
Capacity to keep the data
Transactions for generating & accessing analytics data
Can use retention policy on both logs and metrics in terms of days
Deleting data via retention policy does not incur transaction cost
Geo-replication
[Map: East Asia, South East Asia, West Europe, North Europe, South Central US, North Central US]
Data geo-replicated across data centers 100s of miles apart
Turned on right now for Blob and Table data (Queues will be in CY12)
Provide data durability in face of major data center disasters
Data only geo-replicated within regions
User chooses primary location during account creation
The other location in region is the secondary location
Asynchronous geo-replication
Off critical path of live requests
Geo-replication
Is there a cost for geo-replication?
Geo-replication included in current price of Storage
Geo-replication is on by default for all storage accounts
Can turn off for whole storage account
Though no price savings if you turn it off
To disable (turn off) geo-replication contact Microsoft Windows Azure Support
But note, if you turn geo-rep off and then back on
Data transfer egress rates apply to re-bootstrap the data from primary to secondary data center. No additional charge after the re-bootstrap is done.
Geo-Failover
[Diagram: Azure DNS maps the hostname account.blob.core.windows.net to an IP address in North Central US. A client does a DNS lookup for http://account.blob.core.windows.net/ and then accesses data there. Data is geo-replicated to South Central US. On failover, DNS is updated to point to South Central US.]
Existing URL works after failover
Failover Trigger – failover would only be used if primary could not be recovered
Asynchronous Geo-replication – may lose recent updates during failover
Typically geo-replicate data within minutes, though no SLA guarantee
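Two consequences of this design (the URL keeps working, and recent updates can be lost) can be seen in a toy model. This Python sketch is purely illustrative: `GeoReplicatedAccount` and its methods are hypothetical, and replication is triggered by hand rather than running in the background.

```python
class GeoReplicatedAccount:
    """Toy model: writes commit on the primary and replicate later."""

    def __init__(self, hostname, primary, secondary):
        self.hostname = hostname
        self.dns = {hostname: primary}            # hostname -> active location
        self.secondary = secondary
        self.data = {primary: {}, secondary: {}}
        self.pending = []                         # committed, not yet replicated

    def write(self, key, value):
        # The write is acknowledged by the active location right away.
        self.data[self.dns[self.hostname]][key] = value
        self.pending.append((key, value))

    def replicate(self):
        for key, value in self.pending:
            self.data[self.secondary][key] = value
        self.pending.clear()

    def failover(self):
        self.dns[self.hostname] = self.secondary  # same URL, new location
        self.pending.clear()                      # unreplicated updates are lost

    def read(self, key):
        return self.data[self.dns[self.hostname]].get(key)


account = GeoReplicatedAccount(
    "account.blob.core.windows.net", "North Central US", "South Central US")
account.write("a", 1)
account.replicate()
account.write("b", 2)       # not yet replicated when the disaster happens
account.failover()
print(account.read("a"), account.read("b"))
```

Keeping replication off the critical path is what makes writes fast, and it is also why the most recent writes may be missing after a failover.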
Design Goals
Highly Available Storage with Strong Consistency
Provide access to data in face of hardware failures
Durability
Replicate data several times within and across data centers
Scalability
Need to scale to exabytes and beyond
Automatically load balance data to meet peak traffic demands
Provide a global namespace to access data around the world
Additional details can be found in this white paper:
“Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
http://go.microsoft.com/fwlink/?LinkID=234565
Windows Azure Storage Stamps
[Diagram: a Storage Location Service and two Storage Stamps]
Access blob storage via the URL: http://<account>.blob.core.windows.net/
Data access goes through a load balancer (LB) to a Storage Stamp
Each Storage Stamp has three layers: Front-Ends, Partition Layer, DFS Layer
Intra-stamp replication (DFS Layer)
Inter-stamp (geo) replication between stamps (Partition Layer)
Storage Stamp Architecture – DFS Layer
[Diagram: Distributed File System (DFS) Layer with masters (M) coordinated via Paxos, and DFS Servers]
All data from the Partition Layer is stored into files (extents) in the DFS layer
An extent is replicated 3 times across different fault and upgrade domains
Checksum all stored data
Verified on every client read
Scrubbed every few days
Re-replicate on disk/node/rack failure or checksum mismatch
Load balancing
3 replicas are randomly allocated across a candidate set of servers based on available resources
Any of the 3 replicas can be read from and read load balancing is used
Use a journal drive to keep the write latencies low
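Two of the ideas above, spreading replicas across fault domains and verifying checksums on read, can be sketched in Python. This is a simplified illustration and not the DFS layer's actual algorithm: it ignores upgrade domains and available resources, uses CRC32 as a stand-in checksum, and all function names are hypothetical.

```python
import random
import zlib


def place_replicas(servers, rng, copies=3):
    """Pick `copies` servers at random, each from a different fault domain.

    servers: dict of server name -> fault domain.
    """
    by_domain = {}
    for name, domain in servers.items():
        by_domain.setdefault(domain, []).append(name)
    if len(by_domain) < copies:
        raise ValueError("not enough fault domains")
    domains = rng.sample(sorted(by_domain), copies)
    return [rng.choice(by_domain[domain]) for domain in domains]


def store(data):
    """Return the bytes together with their checksum."""
    return data, zlib.crc32(data)


def verified_read(data, checksum):
    """Re-check the checksum on read; a mismatch would trigger re-replication."""
    if zlib.crc32(data) != checksum:
        raise ValueError("checksum mismatch")
    return data


servers = {"s1": "rack1", "s2": "rack1", "s3": "rack2", "s4": "rack3", "s5": "rack4"}
picked = place_replicas(servers, random.Random(0))
extent, checksum = store(b"extent")
print(picked, verified_read(extent, checksum))
```

With one replica per fault domain, losing a single rack leaves two intact copies to re-replicate from.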
Storage Stamp Architecture – Partition Layer
[Diagram: Partition Layer (Partition Master, Lock Service, Partition Servers) on top of the DFS Layer (masters M coordinated via Paxos, DFS Servers)]
Provide transaction semantics and strong consistency for high level data abstractions
Stores and reads the objects to/from extents in the DFS layer
Provides inter-stamp (geo) replication by shipping logs to other stamps
Scalable object index via partitioning
Storage Stamp Architecture
[Diagram: Front End Layer (FE servers) on top of the Partition Layer (Partition Master, Lock Service, Partition Servers) and the DFS Layer (masters M coordinated via Paxos, DFS Servers)]
Front End Layer
Stateless Servers
Authentication + authorization
Request routing
Storage Stamp Architecture
[Diagram: the same three layers (Front End Layer, Partition Layer, DFS Layer), showing an Incoming Write Request entering at the Front End Layer and an Ack being returned]
Partition Layer – Scalable Object Index
100s of Billions of blobs, entities, messages across all accounts can be stored in a stamp
Need to efficiently enumerate, query, get, and update them
Traffic pattern can be highly dynamic
Hot objects, peak load, traffic bursts, etc.
Need a scalable index for the objects that can:
Spread the index across 100s of servers
Dynamically load balance
Dynamically change what servers are serving each part of the index based on load
Scalable Object Index via Partitioning
Partition Layer maintains an internal Object Index Table for each data abstraction
Blob Index: contains all blob objects for all accounts in a stamp
Entity Index: contains all entities for all accounts in a stamp
Message Index: contains all messages for all accounts in a stamp
Scalability is provided for each Object Index
Monitor load to each part of the index to determine hot spots
Index is dynamically split into thousands of Index RangePartitions based on load
Index RangePartitions are automatically load balanced across servers to quickly adapt to changes in load
Account Name    Container Name    Blob Name
aaaa            aaaa              aaaaa
……..            ……..              ……..
zzzz            zzzz              zzzzz
Split index into Range Partitions based on load
Can only split at PartitionKey boundaries
PartitionMap tracks Index RangePartition assignment to partition servers
Front-End caches the PartitionMap to route user requests
Each part of the index is assigned to only one Partition Server at a time
[Diagram: a Storage Stamp with a Partition Master and Partition Servers, each serving one Index RangePartition, for example:]
Account Name    Container Name    Blob Name
harry           pictures          sunset
………             ………               ………
richard         videos            soccer

Account Name    Container Name    Blob Name
richard         videos            tennis
………             ………               ………
zzzz            zzzz              zzzzz
Partition Layer – Index Range Partitioning
[Diagram: the Partition Master holds the Partition Map (A-H: PS1, H’-R: PS2, R’-Z: PS3) and the Front-End Server caches a copy of it. The Blob Index (Account Name, Container Name, Blob Name) is split into RangePartitions A-H (aaaa/aaaa/aaaaa through harry/pictures/sunrise), H’-R and R’-Z, served by PS 1, PS 2 and PS 3 on top of the DFS Layer.]
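Routing a request through a cached Partition Map amounts to a binary search over the sorted range boundaries. The Python sketch below uses the boundary keys from the example tables; `UPPER_BOUNDS`, `SERVERS` and `route` are hypothetical names, and the real map format is not described on the slides.

```python
import bisect

# Last key (inclusive) of every RangePartition except the final one, in sorted order.
UPPER_BOUNDS = [
    ("harry", "pictures", "sunrise"),   # A-H   ends here  -> PS1
    ("richard", "videos", "soccer"),    # H'-R  ends here  -> PS2
]
SERVERS = ["PS1", "PS2", "PS3"]         # one more server than boundaries


def route(account, container, blob):
    """Return the partition server that serves this blob's index entry."""
    key = (account, container, blob)
    # Tuples compare field by field, matching the sorted index order.
    return SERVERS[bisect.bisect_left(UPPER_BOUNDS, key)]


print(route("harry", "pictures", "sunset"))
```

Each key maps to exactly one server, which matches the rule that each part of the index is assigned to only one Partition Server at a time.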
Partition Layer – Automatic RangePartition Load Balancing
[Diagram: a Master System (Partition Master, PM) and Front Ends FE 1 to FE 3 behind a VIP; Partition Servers 1 to 4 on top of the DFS Layer. Legend: RangePartition, Server Load. The Partition Master unassigns a RangePartition from one server and reassigns it to another.]
Load balancing is triggered based on hot RangePartitions or Partition Servers
No data is moved on disk for the reassignment
Only changing the index assignment for the Partition Servers
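A single load-balancing step can be sketched as a change to the assignment map only. This Python example uses a deliberately naive policy (move the hottest RangePartition from the most loaded server to the least loaded one); the policy, the function name and the load numbers are assumptions for illustration, not the Partition Master's real algorithm.

```python
def rebalance_once(assignment, load, servers):
    """Move one RangePartition from the most to the least loaded server.

    assignment: RangePartition -> server; load: RangePartition -> requests/sec.
    Returns (partition, old_server, new_server), or None if nothing moved.
    """
    server_load = {server: 0 for server in servers}
    for partition, server in assignment.items():
        server_load[server] += load[partition]
    hot = max(servers, key=server_load.get)
    cold = min(servers, key=server_load.get)
    candidates = [p for p, s in assignment.items() if s == hot]
    if hot == cold or not candidates:
        return None
    partition = max(candidates, key=load.get)
    assignment[partition] = cold    # only the map entry changes; no data is copied
    return partition, hot, cold


assignment = {"A-H": "PS1", "H'-R": "PS1", "R'-Z": "PS2"}
load = {"A-H": 900, "H'-R": 300, "R'-Z": 100}       # made-up requests/sec
moved = rebalance_once(assignment, load, ["PS1", "PS2", "PS3"])
print(moved, assignment)
```

The reassignment is cheap because the data itself stays in the DFS layer, which every Partition Server can reach.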
Scalability of Data Abstractions
Namespace for accessing storage
http://<accountName>.<type>.core.windows.net/partitionName
How to scale out storage for your service
Understand the scalability targets at 2 levels:
1. Scalability targets of a single storage account
2. Scalability targets for Blobs, Table Entities and Queues within a storage account
Account Scalability Targets (scalability targets of a single storage account)
Capacity – Up to 100 TBs
Transactions – Up to 5000 entities per second
Bandwidth – Up to 3 gigabits per second
Partition data across storage accounts to go beyond these targets
Scalability of Objects within an Account
Scalability targets for Blobs, Table Entities and Queues within a storage account
Single Blob – up to 60 MBytes per second
Single PartitionKey in a Table – up to 500 entities per second
Single Queue – up to 500 messages per second
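The targets on these slides translate directly into sizing arithmetic. The Python sketch below hard-codes the quoted numbers (100 TB, 5000 entities/sec and 3 gigabits/sec per account; 500 entities/sec per PartitionKey); the function names and the sample workload are hypothetical.

```python
import math

ACCOUNT_CAPACITY_TB = 100
ACCOUNT_TRANSACTIONS_PER_SEC = 5000
ACCOUNT_BANDWIDTH_GBPS = 3
PARTITION_ENTITIES_PER_SEC = 500


def accounts_needed(capacity_tb, transactions_per_sec, bandwidth_gbps):
    """Smallest number of storage accounts that stays within every account target."""
    return max(
        math.ceil(capacity_tb / ACCOUNT_CAPACITY_TB),
        math.ceil(transactions_per_sec / ACCOUNT_TRANSACTIONS_PER_SEC),
        math.ceil(bandwidth_gbps / ACCOUNT_BANDWIDTH_GBPS),
        1,
    )


def partition_keys_needed(entities_per_sec):
    """Smallest number of PartitionKeys to spread a table workload over."""
    return max(math.ceil(entities_per_sec / PARTITION_ENTITIES_PER_SEC), 1)


# Sample workload: 250 TB, 12,000 entities/sec, 4 gigabits/sec.
print(accounts_needed(250, 12000, 4))
print(partition_keys_needed(2000))
```

The tightest limit decides the answer, so a workload can need extra accounts for throughput even when its capacity fits in one.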
Windows Azure Storage Summary
Provides scalable, durable and available data abstractions to build your applications
New features for Blobs, Tables, and Queues:
Queue Message Leases/Update
Table Projection and Upsert
Improve Blob data streaming
Storage Analytics
Geo-replication
Overview of Windows Azure Storage Internals, with details in:
“Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
http://go.microsoft.com/fwlink/?LinkID=234565
More info about the above on Windows Azure Storage blog:
http://blogs.msdn.com/windowsazurestorage/
Feedback
Feedback and questions http://forums.dev.windows.com
Session feedback http://bldw.in/SessionFeedback
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.