Post on 05-Jan-2016
Windows Azure Tables and Queues Deep Dive
Jai Haridas, Software Design Engineer, Microsoft Corporation
SVC09
Agenda
1. Overview of Windows Azure Tables
2. Patterns and Practices for Windows Azure Tables
3. Overview of Windows Azure Queues
4. Patterns and Practices for Windows Azure Queues
5. Q&A
Fundamental Storage Abstractions
> Tables – Provide structured storage. A Table is a set of entities, which contain a set of properties
> Queues – Provide reliable storage and delivery of messages for an application
> Blobs – Provide a simple interface for storing named files along with metadata for the file
> Drives – Provide durable NTFS volumes for Windows Azure applications to use (new)
Windows Azure Tables
> Provides Structured Storage
> Massively Scalable Tables
  > Billions of entities (rows) and TBs of data
  > Can use thousands of servers as traffic grows
> Highly Available & Durable
  > Data is replicated several times
> Familiar and Easy to use API
  > ADO.NET Data Services – .NET 3.5 SP1
  > .NET classes and LINQ
  > REST – with any platform or language
Table Storage Concepts
[Diagram: Accounts contain Tables, which contain Entities. The account moviesonline holds the tables Users and Movies. Each Users entity has Email = … and Name = …; each Movies entity has Genre = … and Title = ….]
Table Data Model
> Table
  > A storage account can create many tables
  > Table name is scoped by account
  > Set of entities (i.e. rows)
> Entity
  > Set of properties (columns)
  > Required properties
    > PartitionKey, RowKey and Timestamp
Required Entity Properties
> PartitionKey & RowKey
  > Uniquely identifies an entity
  > Defines the sort order
  > Use them to scale your application
> Timestamp
  > Read only
  > Optimistic Concurrency
PartitionKey And Partitions
> PartitionKey
  > Used to group entities in the table into partitions
> A table partition
  > All entities with same partition key value
  > Unit of scale
  > Control entity locality
  > Row key provides uniqueness within a partition
Partitions and Partition Ranges

PartitionKey (Category) | RowKey (Title) | Timestamp | ReleaseDate
Action | Fast & Furious | … | 2009
Action | The Bourne Ultimatum | … | 2007
… | … | … | …
Animation | Open Season 2 | … | 2009
Animation | The Ant Bully | … | 2006
… | … | … | …
Comedy | Office Space | … | 1999
… | … | … | …
SciFi | X-Men Origins: Wolverine | … | 2009
… | … | … | …
War | Defiance | … | 2008

[Diagram: the single Movies table is first served entirely by Server A, then split into two partition ranges:]
> Server A: Table = Movies, range [Action - Comedy)
> Server B: Table = Movies, range [Comedy - Western)
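The partition ranges above can be pictured as a sorted lookup from PartitionKey to server. The following is only an illustrative sketch, not how the storage service is implemented: the names `RANGE_STARTS`, `RANGE_END`, `SERVERS` and `server_for` are hypothetical, and the two ranges are copied from the slide.

```python
from bisect import bisect_right

# Hypothetical partition map mirroring the slide: each server owns a
# half-open range [low, high) of PartitionKey values.
RANGE_STARTS = ["Action", "Comedy"]   # inclusive lower bounds, kept sorted
RANGE_END = "Western"                 # exclusive upper bound of the last range
SERVERS = ["Server A", "Server B"]


def server_for(partition_key):
    """Return the server whose range contains partition_key, or None."""
    if partition_key < RANGE_STARTS[0] or partition_key >= RANGE_END:
        return None
    # bisect_right finds the first range start strictly greater than the key;
    # the range just before that position is the one containing the key.
    return SERVERS[bisect_right(RANGE_STARTS, partition_key) - 1]
```

Because keys are compared as sorted strings, every entity with the same PartitionKey lands on the same server, which is what makes a partition the unit of scale.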
Table Operations
> Table
  > Create
  > Query
  > Delete
> Entities
  > Insert
  > Update
    > Merge – Partial Update
    > Replace – Update entire entity
  > Delete
  > Query
  > Entity Group Transaction (new)
Table Schema
Define the schema as a .NET class

    using System;
    using System.Data.Services.Common;

    [DataServiceKey("PartitionKey", "RowKey")]
    public class Movie
    {
        /// <summary>
        /// Category is the partition key
        /// </summary>
        public string PartitionKey { get; set; }

        /// <summary>
        /// Title is the row key
        /// </summary>
        public string RowKey { get; set; }

        public DateTime Timestamp { get; set; }

        public int ReleaseYear { get; set; }
        public string Language { get; set; }
        public string Cast { get; set; }
    }
Table SDK Sample Code

    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // "myKey" is a placeholder for the account's base64 access key.
    StorageCredentialsAccountAndKey credentials =
        new StorageCredentialsAccountAndKey("myaccount", "myKey");
    string baseUri = "http://myaccount.table.core.windows.net";

    CloudTableClient tableClient = new CloudTableClient(baseUri, credentials);
    tableClient.CreateTable("Movies");

    TableServiceContext context = tableClient.GetDataServiceContext();
    CloudTableQuery<Movie> q =
        (from movie in context.CreateQuery<Movie>("Movies")
         where movie.PartitionKey == "Action"
               && movie.RowKey == "The Bourne Ultimatum"
         select movie).AsTableServiceQuery<Movie>();
    Movie movieToUpdate = q.FirstOrDefault();

    // Update movie (change properties on movieToUpdate before this call)
    context.UpdateObject(movieToUpdate);
    context.SaveChangesWithRetries();

    // Add movie. AddObject takes the table name as its first argument.
    // movieToAdd is assumed to be a string holding the new movie's title.
    context.AddObject("Movies",
        new Movie { PartitionKey = "Action", RowKey = movieToAdd });
    context.SaveChangesWithRetries();
Key Selection: Things to Consider
> Scalability
  > Distribute load as much as possible
  > Hot partitions can be load balanced
  > PartitionKey is critical for scalability
> Query Efficiency & Speed
  > Avoid frequent large scans
  > Parallelize queries
> Entity group transactions (new)
  > Transactions across a single partition
  > Transaction semantics & Reduce round trips
Key Selection: Case Study 1
> Table for listing all movies
> Home page lists movies based on chosen category
Movie Listing – Solution 1
> Why do I need multiple PartitionKeys?
  > Account name as Partition Key
  > Movie title as RowKey since movie names need to be sorted
  > Category as a separate property
> Does this scale?
Movie Listing – Solution 1
> Single partition - Entire table served by one server
> All requests served by that single server
> Does not scale

PartitionKey (Account name) | RowKey (Title) | Category | …
moviesonline | 12 Rounds | Action | …
moviesonline | A Bug's Life | Animation | …
… 100,000,000 more rows …
moviesonline | Office Space | Comedy | …
moviesonline | Platoon | War | …
… 50,000,000 more rows …
moviesonline | WALL-E | Animation | …

[Diagram: every client request goes to Server A, which serves the whole table.]
Movie Listing – Solution 2
> All movies partitioned by category
> Allows system to load balance hot partitions
> Load distributed
> Better than single partition

PartitionKey (Category) | RowKey (Title)
Action | Fast & Furious
… 10000 more Action movies
Action | The Bourne Ultimatum
… 100000 more Action & Animation movies
Animation | Open Season 2
… 100000 more Animation movies
Animation | The Ant Bully
Comedy | Office Space
… 1000000 more Comedy & SciFi movies
SciFi | Star Trek
… 100000 more SciFi & War movies
… 100000 more War movies
War | Defiance

[Diagram: client requests are spread across Server A and Server B, each serving a range of categories.]
Key Selection: Case Study 2
> Log every transaction into a table for diagnostics
> Scale Write Intensive Scenario
> Logs can be retrieved for a given time range
Logging - Solution 1
> Timestamp as Partition Key
  > Looks like an obvious choice
  > It is not a single partition as time moves forward
> Append only
  > Requests to single partition range
  > Load balancing does not help
  > Server may throttle

PartitionKey (Timestamp) | Properties
2009-11-15 02:00:01 | …
2009-11-15 02:00:11 | …
… 100000 more rows …
2009-11-17 05:40:01 | …
2009-11-17 05:40:01 | …
… 80000 more rows …
2009-11-17 12:30:00 | …
2009-11-17 12:30:01 | …

[Diagram: the table is spread over Server A and Server B, but every new request from the applications (2009-11-17 12:30:01, 12:30:02, 12:30:03) appends to the same, latest partition range.]
Logging Solution 2 - Distribute "Append Only"
> Prefix timestamp such that load is distributed
  > Id of the node logging
  > Hash into N buckets
> Write load is now distributed
  > Better throughput
> To query logs in time range
  > Parallelize it across prefix values

PartitionKey (ID_Timestamp) | Properties
01_2009-10-12 05:10:00 | …
… | …
… 100000 more rows …
09_2009-11-15 12:31:00 | …
… | …
… 20000000 more rows …
10_2009-10-05 05:10:10 | …
… 5000000 more rows …
… | …
… 900000 more rows …
19_2009-11-17 12:20:02 | …

[Diagram: application requests with keys 15_2009-11-17 12:30:01, 09_2009-11-17 12:30:22, 19_2009-11-17 12:30:10 and 01_2009-11-17 12:30:01 are spread across Server A and Server B.]
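The prefixing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: the bucket count of 20, the CRC32 hash and the helper names `log_partition_key` and `time_range_queries` are all hypothetical choices, since the slide only says to use the logging node's id or to hash into N buckets.

```python
import zlib

NUM_BUCKETS = 20  # assumed bucket count; the slide only says "N buckets"


def log_partition_key(node_id, timestamp):
    """Prefix the timestamp with a two-digit bucket derived from the node id."""
    bucket = zlib.crc32(node_id.encode("utf-8")) % NUM_BUCKETS
    return f"{bucket:02d}_{timestamp}"


def time_range_queries(start, end):
    """Return one (low, high) PartitionKey range per bucket.

    A time-range read has to visit every prefix, so these ranges are
    meant to be queried in parallel and merged by the caller.
    """
    return [(f"{b:02d}_{start}", f"{b:02d}_{end}") for b in range(NUM_BUCKETS)]
```

The trade-off is visible in the second function: writes spread over N partition ranges, but a read for a time window becomes N range queries instead of one.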
Key Selection: Query Efficiency & Speed
> Select keys that allow fast retrieval
> Reduce scan range
> Reduce scan frequency
Single Entity Query
> Where PartitionKey='SciFi' and RowKey = 'Star Trek'
> Efficient processing
> No continuation tokens
[Diagram: the Movies table (PartitionKey = Category, RowKey = Title) is spread across Server A and Server B. The client sends one request to the server holding the SciFi partition and receives a single result.]
Table Scan Query
> Select * from Movies where Rating > 4
> Returns Continuation token
  > 1000 movies in result set
  > Partition range boundary
> Serial Processing: Wait for continuation token before proceeding

PartitionKey (Category) | RowKey (Title) | Rating
Action | Fast & Furious | 5
… 999 more movies rated > 4
… Action and Anim. movies here with rating < 4
Animation | A Bug's life | 2
… 100 more movies < 4 here
Animation | The Ant Bully | 3
Comedy | Are we there yet? | 2
… More movies here
Comedy | Office Space | 5
… 800000 more movies here
Drama | A Beautiful Mind | 5
… 1200000 more movies here
War | Defiance | 4

[Diagram: the client sends a request and gets back a continuation token each time the scan on Server A or Server B returns 1000 movies or hits a partition range boundary; the client then resends the request with that token.]
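The serial request/continuation loop described above can be sketched as follows. This is an in-memory illustration only: `fetch_page` and `make_fake_server` are hypothetical stand-ins for the table service, and the integer token is an assumption (real continuation tokens are opaque).

```python
def query_all(fetch_page, query):
    """Collect every result by following continuation tokens serially.

    fetch_page(query, token) is a hypothetical callable returning
    (rows, next_token); next_token is None once the scan is complete.
    """
    results, token = [], None
    while True:
        rows, token = fetch_page(query, token)
        results.extend(rows)  # a page may be empty and still carry a token
        if token is None:
            return results


def make_fake_server(rows, page_size=1000):
    """In-memory stand-in: scans at most page_size rows per request."""
    def fetch_page(predicate, token):
        start = token or 0
        scanned = rows[start:start + page_size]
        next_start = start + page_size
        next_token = next_start if next_start < len(rows) else None
        return [r for r in scanned if predicate(r)], next_token
    return fetch_page
```

Note that the loop stops on a missing token, not on an empty page: a response can contain zero matching rows and still require a follow-up request.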
Make Scans Faster
> Split "Select * from Movies where Rating > 4" into
  > Where PartitionKey >= "A" and PartitionKey < "D" and Rating > 4
  > Where PartitionKey >= "D" and PartitionKey < "I" and Rating > 4
  > Etc.
> Execute in parallel
> Each query handles continuation

PartitionKey (Category) | RowKey (Title) | Rating
Action | Fast & Furious | 5
… More movies here
Comedy | Office Space | 5
… More movies here
Documentary | Planet Earth | 4
… More movies here
Drama | Seven Pounds | 4
Horror | Saw 5 | 3
… More movies here
Music | 8 Mile | 2
… More movies here
SciFi | Star Trek | 5
… More movies here

[Diagram: the client sends several requests at once to Server A and Server B; each request follows its own continuation tokens.]
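Splitting a scan by PartitionKey range and running the pieces concurrently might look like the sketch below. It assumes a hypothetical `run_range_query(low, high)` callable that returns all matching rows for one range and deals with its own continuation tokens; the thread pool size is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(boundaries):
    """Turn sorted boundaries like ["A", "D", "I"] into [low, high) pairs."""
    return list(zip(boundaries, boundaries[1:]))


def parallel_scan(run_range_query, boundaries, max_workers=4):
    """Run one query per PartitionKey range in parallel and merge results.

    run_range_query(low, high) is a hypothetical callable returning all
    rows with low <= PartitionKey < high that match the filter.
    """
    ranges = split_ranges(boundaries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(lambda r: run_range_query(*r), ranges)
        # map() yields results in range order, so output stays key-sorted.
        return [row for page in pages for row in page]
```

The same shape applies to "OR" predicates on keys: issue one query per alternative and merge, instead of sending a single query that degrades into a scan.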
Query Speed
1. Fast
  > Single PartitionKey and RowKey with equality
2. Medium
  > Single partition but a small range for RowKey
  > Entire partition or table that is small
3. Slow
  > Large single scan
  > Large table scan
  > "OR" predicates on keys => no query optimization => results in scan
> Expect continuation token for all except in 1
Make Queries Faster
> Large Scans
  > Split the range and parallelize queries
  > Create and maintain own views that help queries
> "Or" Predicates
  > Execute individual query in parallel instead of using "OR"
> User Interactive
  > Cache the result to reduce scan frequency
Expect Continuation Tokens – Seriously!
> Maximum of 1000 rows in a response
> At the end of partition range boundary
> Maximum of 5 seconds to execute the query
Entity Group Transactions (EGT) (new)
> Atomically perform multiple insert/update/delete over entities in same partition in a single transaction
> Maximum of 100 commands in a single transaction and payload < 4 MB
> ADO.NET Data Service
  > Use SaveChangesOptions.Batch
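The same-partition and 100-command limits suggest grouping pending operations before submitting them. The sketch below is a hypothetical client-side planner, not part of any SDK: `plan_batches` and the `(partition_key, command)` tuple shape are assumptions, and the 4 MB payload limit is deliberately left unchecked.

```python
from collections import defaultdict

MAX_COMMANDS = 100  # per-transaction limit stated on the slide


def plan_batches(operations):
    """Group (partition_key, command) pairs into per-partition batches.

    Each batch holds commands for one partition only and at most
    MAX_COMMANDS of them. The 4 MB payload limit is not checked here.
    """
    by_partition = defaultdict(list)
    for partition_key, command in operations:
        by_partition[partition_key].append(command)

    batches = []
    for partition_key, commands in by_partition.items():
        for i in range(0, len(commands), MAX_COMMANDS):
            batches.append((partition_key, commands[i:i + MAX_COMMANDS]))
    return batches
```

Atomicity holds only within one batch: if a partition's commands are split into several batches, each commits or fails on its own.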
Key Selection: Entity Group Transaction
> Case Study
  > Maintain user account information
    > Account ID, User Name, Address, Number of rentals
  > Maintain information of checked out rentals
    > Account ID, Movie Title, Check out date, Due date
> Solution 1 – Maintain two tables – Users & Rentals
  > Handle Cross table consistency
    > Insert into Rentals table succeeds
    > Update to Users table fails
    > Queue to maintain consistency
Solution 2
> Store Account Information and Rental details in same table
> Maintain same PartitionKey to enforce transactions
  > Account ID as PartitionKey
  > Update total count and Insert new rentals using Entity Group Transaction
> Prefix RowKey with "Kind" code: A = Account, R = Rental
  > Row key for account info: [Kind Code]_[AccountId]
  > Row Key for rental info: [Kind Code]_[Title]
> Rental Properties not set for Account row and vice versa

PartitionKey (AccountID) | RowKey (Kind_*) | Kind | TotalRentals | Name | Address | CheckOutOn | Title | DueOn
… | … | … | … | … | … | … | … | …
Sally | A_Sally | Account | 8 | Sally Field | Ann Arbor, MI | | |
Sally | R_Jaws | Rental | | | | 2009/11/16 | Jaws | 2009/11/20
Sally | R_Taxi | Rental | | | | 2009/11/16 | Taxi | 2009/11/20
… | … | … | … | … | … | … | … | …
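To make the key layout concrete, here is a small sketch of the two commands a checkout would place in one Entity Group Transaction. The function names and the `("update" | "insert", entity)` tuple shape are hypothetical; only the key scheme and property names come from the slide.

```python
def account_row_key(account_id):
    """Row key for the account row: [Kind Code]_[AccountId]."""
    return f"A_{account_id}"


def rental_row_key(title):
    """Row key for a rental row: [Kind Code]_[Title]."""
    return f"R_{title}"


def checkout_transaction(account_id, total_rentals, title, checked_out, due):
    """Return the two commands that must commit together for a checkout."""
    return [
        ("update", {
            "PartitionKey": account_id,
            "RowKey": account_row_key(account_id),
            "Kind": "Account",
            "TotalRentals": total_rentals + 1,
        }),
        ("insert", {
            "PartitionKey": account_id,   # same partition as the account row
            "RowKey": rental_row_key(title),
            "Kind": "Rental",
            "Title": title,
            "CheckOutOn": checked_out,
            "DueOn": due,
        }),
    ]
```

Both entities share the account ID as PartitionKey, which is what allows the count update and the rental insert to succeed or fail as a unit.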
Best Practices & Summary
> Select PartitionKey and RowKey that help scale
  > Efficient for frequently used queries
  > Supports batch transactions
  > Distributes load
> Distribute "Append only" patterns using prefix to PartitionKey
> Always Handle continuation tokens
> Client can maintain their own cache/views instead of frequent scans
  > Future Feature - Secondary Index
> Execute parallel queries instead of "OR" predicates
> Implement back-off strategy for retries
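One common way to implement the back-off recommendation is exponential delay with jitter. The sketch below is an assumption-laden illustration: the attempt count, delays and jitter range are arbitrary, `with_backoff` is a hypothetical helper, and a real client would retry only on throttling or timeout errors rather than on every exception.

```python
import random
import time


def with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 sleep=time.sleep):
    """Call operation(); on failure wait exponentially longer, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter keeps many clients from retrying at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use it defaults to `time.sleep`.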
Windows Azure Queues
> Queues are performance efficient, highly available and provide reliable message delivery
  > Simple, asynchronous work dispatch
  > Programming semantics ensure that a message can be processed at least once
> Access is provided via REST
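Queue Storage Concepts
[Diagram: Accounts contain Queues, which contain Messages. The account sally holds the queues thumbnailjobs and traverselinks. thumbnailjobs messages: 128 x 128 http://... and 256 x 256 http://...; traverselinks messages: http://... and http://...]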
Account, Queues and Messages
> An account can create many queues
  > Queue Name is scoped by the account
> A Queue contains messages
  > No limit on number of messages stored in a queue
  > Set a limit for message expiration
> Messages
  > Message size <= 8 KB
  > To store larger data, store data in blob/entity storage, and the blob/entity name in the message
  > Message now has dequeue count
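The "store the data elsewhere and enqueue its name" pattern can be sketched with in-memory stand-ins. Everything here is hypothetical: a plain list plays the queue, a dict plays blob storage, and the size check is applied to the raw payload for simplicity.

```python
import uuid

MAX_MESSAGE_BYTES = 8 * 1024  # queue message limit stated on the slide


def enqueue(queue, blobs, payload):
    """Put payload on the queue, spilling to blob storage when too large."""
    if len(payload) <= MAX_MESSAGE_BYTES:
        queue.append(("inline", payload))
    else:
        blob_name = str(uuid.uuid4())
        blobs[blob_name] = payload          # write the blob first ...
        queue.append(("blob", blob_name))   # ... then enqueue its name


def resolve(blobs, message):
    """Return the payload for a dequeued message."""
    kind, value = message
    return value if kind == "inline" else blobs[value]
```

Writing the blob before the message means a consumer never sees a reference to data that does not exist yet; the cost is that a crash between the two steps leaves an orphaned blob to clean up later.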
Queue Operations
> Queue
  > Create Queue
  > Delete Queue
  > List Queues
  > Get/Set Queue Metadata
> Messages
  > Add Message (i.e. Enqueue Message)
  > Get Message(s) (i.e. Dequeue Message)
  > Peek Message(s)
  > Delete Message
Queue Programming API

    // baseUri here is assumed to be the account's queue endpoint.
    CloudQueueClient queueClient = new CloudQueueClient(baseUri, credentials);
    CloudQueue queue = queueClient.GetQueueReference("test1");

    queue.CreateIfNotExist();

    // MessageCount is populated via FetchAttributes
    queue.FetchAttributes();

    CloudQueueMessage message = new CloudQueueMessage("Some content");
    queue.AddMessage(message);

    message = queue.GetMessage(TimeSpan.FromMinutes(10) /*visibility timeout*/);

    // GetMessage returns null when no message is available.
    if (message != null)
    {
        // Process the message here …

        queue.DeleteMessage(message);
    }
Removing Poison Messages
[Diagram, three animation frames: producers P1 and P2 add messages to queue Q; consumers C1 and C2 dequeue them with a 30 second visibility timeout. The numbered callouts from the frames are listed in order below.]
1. C1: Dequeue(Q, 30 sec) returns msg 1
2. C2: Dequeue(Q, 30 sec) returns msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 visible 30 s after Dequeue
7. C2: Dequeue(Q, 30 sec) returns msg 1
8. C2 crashed
9. msg 1 visible 30 s after Dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 sec) returns msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)
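The sequence above boils down to a threshold check on the dequeue count before processing. The sketch below simulates it in memory; `Message`, `handle` and the list-based queue are hypothetical stand-ins, and leaving a message in the list models it becoming visible again after the timeout.

```python
MAX_DEQUEUE_COUNT = 2  # threshold from the slide: delete once DequeueCount > 2


class Message:
    def __init__(self, body):
        self.body = body
        self.dequeue_count = 0


def handle(queue, message, process):
    """Process one dequeued message; returns "done", "poison" or "retry".

    queue is an in-memory list standing in for the service. A message that
    is not deleted stays in the list, which models it reappearing after
    the visibility timeout.
    """
    message.dequeue_count += 1       # the service does this on each dequeue
    if message.dequeue_count > MAX_DEQUEUE_COUNT:
        queue.remove(message)        # give up on the poison message
        return "poison"
    try:
        process(message.body)
    except Exception:
        return "retry"               # not deleted: visible again later
    queue.remove(message)            # delete only after successful processing
    return "done"
```

A real worker would usually copy the poison message somewhere for inspection before deleting it; that step is omitted here.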
Best Practices & Summary
> Make message processing idempotent
  > No need to deal with failures
> Do not rely on order
  > Invisible messages result in out of order
> Use Dequeue count to remove poison messages
  > Enforce threshold on message's dequeue count
> Use message count to dynamically increase/reduce workers
> Use blob to store message data with reference in message
  > Messages > 8KB
  > Batch messages
  > Garbage collect orphaned blobs
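Sizing the worker pool from the queue's message count can be as simple as a clamped division. This is a rough sketch with made-up numbers: `desired_workers`, the 100 messages per worker target and the 1 to 20 worker bounds are all assumptions to be tuned per application.

```python
import math


def desired_workers(message_count, per_worker=100, min_workers=1,
                    max_workers=20):
    """Scale workers with the backlog, clamped to [min_workers, max_workers]."""
    wanted = math.ceil(message_count / per_worker)
    return max(min_workers, min(max_workers, wanted))
```

Since the service reports an approximate count, it is reasonable to re-evaluate this periodically rather than react to every change.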
Future Features
> Allow workers to extend invisibility time
  > Time to process message unknown at dequeue time
  > Worker can extend the time as needed
> Allow longer invisibility time
  > Long running work items may need more than 2 hours
> Allow messages to not expire
  > Large backlogs will not cause messages to expire
Takeaways
> Table
  > Scalable & Reliable Structured Storage System
  > Partitioning is critical to scalability
  > Entity Group Transactions (new)
> Queue
  > Scalable & Reliable Messaging System
  > Dequeue count returned with message (new)
> Use back-off strategy on retries
> Official Storage Client Library (new)
Windows Azure Session Alerts!!
> Storing and Manipulating Blobs and Files with Windows Azure Storage – 11/18 (4:30 PM)
> Patterns for building Reliable & Scalable Applications with Windows Azure – 11/19 (8:30 AM)
> Automating the Application Lifecycle with Windows Azure – 11/19 (10:00 AM)
Q&A
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.