NoSQL in Windows Azure Table Storage - Vítor Tomaz

NoSQL in Windows Azure Table Storage, Vítor Tomaz, http://netponto.org, 37th in-person meeting @ Lisboa - 23/03/2013

Uploaded by comunidade-netponto on 06-May-2015

DESCRIPTION

In this session we will analyze the characteristics of this service and give a brief introduction to the architecture that supports it. We will look at the considerations to keep in mind when creating and using this type of storage, analyzing the impact that design decisions have on performance and scalability targets. We will also show some usage examples in different scenarios, including some optimizations that can be made to improve performance. Comunidade NetPonto, the .NET community in Portugal! http://netponto.org

TRANSCRIPT

Vítor Tomaz
ISEL – LEIC
SAFIRA

NetPonto, AzurePT, Revista Programar, Portugal@Programar, SQLPort, MSDN

Agenda

• Characteristics & Concepts
• Service Architecture
• Scalability Targets
• Non-Relational Data Modeling
• Best Practices

Windows Azure Storage Characteristics

• A “pay for what you use” cloud storage system
• Durable: stores multiple replicas of your data
  – Local replication: synchronous replication before returning success
  – Geo replication: replicated to a data center at least 400 miles away, asynchronously, after returning success to the user
• Available: multiple replicas are placed to provide fault tolerance
• Scalable: automatically partitions data across servers to meet traffic demands
• Strong consistency: by default, reads are consistent once data is committed

Windows Azure Storage Abstractions

• Tables: structured storage. A table is a set of entities; an entity is a set of properties.
• Queues: reliable storage and delivery of messages for an application.
• Blobs: simple named files along with metadata for the file.
• Drives: durable NTFS volumes for Windows Azure applications to use, based on Blobs.

Storage Libraries in Many Languages

Windows Azure Storage Account

• User-specified, globally unique account name
• You can choose the geo-location that hosts the storage account:
  – US: North Central US, South Central US, East US, West US
  – Europe: Northern Europe, Western Europe
  – Asia: East Asia, South East Asia

Table Storage Concepts: Account → Table → Entity

Account: contoso
• Table: customers
  – Entity: Name = …, Email = …
  – Entity: Name = …, EMailAdd = …
• Table: photos
  – Entity: Photo ID = …, Date = …
  – Entity: Photo ID = …, Date = …

No Fixed Schema

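FIRST    LAST     BIRTHDATE
Wade     Wegner   2/2/1981
Nathan   Totten   3/15/1965
Nick     Harris   May 1, 1976

One of these entities also carries a FAV SPORT property (Canoeing) that the others lack, and the BIRTHDATE values do not even share a format: entities in the same table need not have the same set of properties.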

Table Details

Table
• Create, Query, Delete
• Tables can have metadata
• Not an RDBMS!

Entities
• Insert
• Update
  – Merge: partial update
  – Replace: update the entire entity
• Upsert
• Delete
• Query
• Entity Group Transactions: multiple CUD (Create/Update/Delete) operations in a single atomic transaction
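To make the difference between the two update flavours concrete, here is a minimal in-memory sketch in Python that models an entity as a dict of properties. It is only an analogy for the service semantics: Merge keeps properties that are not mentioned in the update, Replace discards them, and Upsert inserts when the entity does not exist (the REST API offers both Insert Or Merge and Insert Or Replace). The `InMemoryTable` class and its method names are hypothetical, not part of any Azure SDK.

```python
from typing import Any, Dict, Tuple

Key = Tuple[str, str]  # (PartitionKey, RowKey)


class InMemoryTable:
    """Hypothetical in-memory table: entities keyed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._entities: Dict[Key, Dict[str, Any]] = {}

    def insert(self, pk: str, rk: str, props: Dict[str, Any]) -> None:
        if (pk, rk) in self._entities:
            raise KeyError("entity already exists")  # Insert fails on a conflict
        self._entities[(pk, rk)] = dict(props)

    def merge(self, pk: str, rk: str, props: Dict[str, Any]) -> None:
        # Merge: partial update; properties not mentioned in props are kept.
        # Raises KeyError if the entity does not exist.
        self._entities[(pk, rk)].update(props)

    def replace(self, pk: str, rk: str, props: Dict[str, Any]) -> None:
        # Replace: the stored entity becomes exactly props
        if (pk, rk) not in self._entities:
            raise KeyError("entity does not exist")
        self._entities[(pk, rk)] = dict(props)

    def upsert(self, pk: str, rk: str, props: Dict[str, Any], merge: bool = True) -> None:
        # Upsert: insert when missing, otherwise merge (or replace when merge=False)
        if (pk, rk) not in self._entities:
            self._entities[(pk, rk)] = dict(props)
        elif merge:
            self.merge(pk, rk, props)
        else:
            self.replace(pk, rk, props)

    def get(self, pk: str, rk: str) -> Dict[str, Any]:
        return dict(self._entities[(pk, rk)])


t = InMemoryTable()
t.insert("customers", "42", {"Name": "Ana", "Email": "ana@example.com"})
t.merge("customers", "42", {"Phone": "555-0100"})  # Name and Email are kept
t.replace("customers", "42", {"Name": "Ana"})      # Email and Phone are dropped
```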

Entity Properties
• An entity can have up to 255 properties
• Up to 1 MB per entity

Mandatory properties for every entity
• PartitionKey & RowKey (the only indexed properties)
  – Together they uniquely identify an entity
  – They define the sort order
• Timestamp
  – Used for optimistic concurrency
  – Exposed as an HTTP ETag

No fixed schema for other properties
• Each property is stored as a <name, typed value> pair
• No schema is stored for a table
• Properties can be of the standard .NET types: String, binary, bool, DateTime, GUID, int, int64, and double
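The Timestamp/ETag mechanism is classic optimistic concurrency: a client reads an entity together with its ETag and sends that ETag back with its update; if someone else has written the entity in the meantime, the ETags differ and the update is rejected (the service answers HTTP 412 Precondition Failed). The Python sketch below models only that check. `VersionedStore`, `PreconditionFailed` and the use of a random UUID as the ETag are illustrative assumptions, not how the service actually computes ETags.

```python
import uuid
from typing import Any, Dict, Tuple


class PreconditionFailed(Exception):
    """Raised when the caller's ETag is stale (HTTP 412 in the real service)."""


class VersionedStore:
    """Hypothetical store where every write assigns a fresh ETag."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Tuple[Dict[str, Any], str]] = {}

    def insert(self, pk: str, rk: str, props: Dict[str, Any]) -> str:
        etag = uuid.uuid4().hex
        self._data[(pk, rk)] = (dict(props), etag)
        return etag

    def read(self, pk: str, rk: str) -> Tuple[Dict[str, Any], str]:
        props, etag = self._data[(pk, rk)]
        return dict(props), etag

    def replace(self, pk: str, rk: str, props: Dict[str, Any], if_match: str) -> str:
        _, current = self._data[(pk, rk)]
        # "*" means an unconditional update, like sending If-Match: *
        if if_match != "*" and if_match != current:
            raise PreconditionFailed(f"ETag {if_match} is stale")
        new_etag = uuid.uuid4().hex
        self._data[(pk, rk)] = (dict(props), new_etag)
        return new_etag


store = VersionedStore()
store.insert("pt", "1", {"Balance": 10})
_, etag_a = store.read("pt", "1")  # client A reads
_, etag_b = store.read("pt", "1")  # client B reads the same version
store.replace("pt", "1", {"Balance": 15}, if_match=etag_a)  # A's write succeeds
try:
    store.replace("pt", "1", {"Balance": 7}, if_match=etag_b)  # B's ETag is stale
except PreconditionFailed:
    print("B must re-read the entity and retry")
```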

Scalability

• Partition: the range of entities with the same partition key value
• Partitions are fanned out across servers based on load
• They can be condensed again when load decreases
• Reads are load balanced across three replicas

[Diagram: partitions P1, P2, …, Pn distributed across Server 1, Server 2 and Server 3]

Service Architecture

Storage Stamp Architecture

[Diagram: an incoming write request arrives at the Front End layer (FE nodes); the Partition layer holds several Partition Servers plus a Partition Master and a Lock Service; the Stream layer consists of Extent Nodes (EN); an Ack is returned for the write]

Windows Azure Storage - Architecture

• PartitionKey: unique identifier for the partition within a given table
• RowKey: unique identifier for an entity within a given partition
• Both keys matter!
  – Together they define the primary key
  – They form a single clustered index

Scalability, by how much of the key a query specifies:
• Slowest: no PartitionKey, no RowKey (full table scan)
• Slower: PartitionKey only, no RowKey (scan of one partition)
• Very fast: PartitionKey + RowKey (single-entity lookup)
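A rough Python analogy for why the three cases differ: if entities are kept sorted by (PartitionKey, RowKey), a PartitionKey + RowKey query is one binary search, a PartitionKey-only query reads one contiguous range, and a query with neither key must examine every entity. This models a single sorted index in memory with `bisect`; the `SortedTable` class is hypothetical and says nothing about the service's internals.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, str, Dict[str, Any]]  # (PartitionKey, RowKey, properties)


class SortedTable:
    """Entities kept in (PartitionKey, RowKey) order, like a single clustered index."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = sorted(rows, key=lambda r: (r[0], r[1]))
        self._keys = [(r[0], r[1]) for r in self._rows]

    def point_query(self, pk: str, rk: str) -> Optional[Dict[str, Any]]:
        # Very fast: binary search straight to the one entity
        i = bisect.bisect_left(self._keys, (pk, rk))
        if i < len(self._keys) and self._keys[i] == (pk, rk):
            return self._rows[i][2]
        return None

    def partition_query(self, pk: str) -> List[Row]:
        # Slower: find the start of the partition, then scan its contiguous range
        lo = bisect.bisect_left(self._keys, (pk, ""))
        out = []
        for row in self._rows[lo:]:
            if row[0] != pk:
                break
            out.append(row)
        return out

    def table_scan(self, prop: str, value: Any) -> List[Row]:
        # Slowest: no key at all, so every entity is examined
        return [r for r in self._rows if r[2].get(prop) == value]
```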

Table Storage – Key Points

• 1,000 entities: a single query response returns at most 1,000 entities, so any query that does not specify both the PartitionKey and the RowKey (and only those) must handle continuation tokens (http://tinyurl.com/ContToken)
• Continuation tokens:
  – Next Table
  – Next PartitionKey
  – Next RowKey
• Transient fault handling:
  – Network
  – Hardware
  – Data center
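Handling continuation tokens boils down to a loop: issue the query, consume the page, and if the response carried a continuation (next PartitionKey/RowKey, or next table name when listing tables), reissue the query from that point until no token comes back. The sketch below assumes a hypothetical `fetch_page(token)` callable returning `(entities, next_token)`, plus a fake in-memory backend; neither is the signature of an Azure SDK, whose clients can also run this loop for you.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Token = Optional[Tuple[str, str]]  # (NextPartitionKey, NextRowKey), or None when done
Page = Tuple[List[dict], Token]


def query_all(fetch_page: Callable[[Token], Page]) -> Iterator[dict]:
    """Yield every entity, following continuation tokens until none is returned."""
    token: Token = None
    while True:
        entities, token = fetch_page(token)
        yield from entities  # a page may legitimately be empty
        if token is None:    # no continuation: the result set is complete
            break


def make_fake_backend(rows: List[dict], page_size: int = 1000) -> Callable[[Token], Page]:
    """Hypothetical backend returning at most page_size entities per call."""
    ordered = sorted(rows, key=lambda e: (e["PartitionKey"], e["RowKey"]))

    def fetch_page(token: Token) -> Page:
        start = 0
        if token is not None:
            # resume at the first entity whose key is >= the continuation key
            start = next(i for i, e in enumerate(ordered)
                         if (e["PartitionKey"], e["RowKey"]) >= token)
        page = ordered[start:start + page_size]
        rest = ordered[start + page_size:]
        next_token = (rest[0]["PartitionKey"], rest[0]["RowKey"]) if rest else None
        return page, next_token

    return fetch_page
```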

Scalability Targets

Scalability Targets – Storage Account

Storage account level targets by end of 2012, applying to accounts created after June 7th 2012:
• Capacity: up to 200 TB
• Transactions: up to 20,000 entities per second
• Bandwidth for a geo-redundant storage account
  – Ingress: up to 5 Gibps
  – Egress: up to 10 Gibps
• Bandwidth for a locally redundant storage account
  – Ingress: up to 10 Gibps
  – Egress: up to 15 Gibps

Scalability Targets – Partition

Partition level targets by end of 2012, applying to accounts created after June 7th 2012:
• Single table partition: identified by Account Name + Table Name + PartitionKey value
• Up to 2,000 entities per second
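The partition and account targets combine into a quick back-of-the-envelope check: one partition tops out at about 2,000 entities per second, so sustaining the 20,000 entities per second account target needs traffic spread over at least 20,000 / 2,000 = 10 partitions, and anything above 20,000 entities per second needs more than one storage account. The helper below only does that arithmetic, using the 2012 figures quoted above as defaults; its name is illustrative, and it assumes perfectly even load, so the results are lower bounds.

```python
import math
from typing import Tuple


def min_partitions_and_accounts(entities_per_sec: float,
                                partition_target: float = 2_000,
                                account_target: float = 20_000) -> Tuple[int, int]:
    """Lower bounds on partitions and storage accounts for an evenly spread load."""
    partitions = math.ceil(entities_per_sec / partition_target)
    accounts = math.ceil(entities_per_sec / account_target)
    return partitions, accounts


print(min_partitions_and_accounts(20_000))  # (10, 1)
print(min_partitions_and_accounts(50_000))  # (25, 3)
```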

Non-Relational Data Modeling

Why Partition

• Data volume (too many bytes)
• Workload (too many transactions per second)
• Cost (using storage with different costs)
• Elasticity (just-in-time partitioning for high-load periods)

Choosing a Partition Key

• Natural keys
  – Country
  – First letter of last name
  – Date
• Mathematical
  – Hash functions
  – Modulo operator
• Lookup based
  – Lookup table to resolve values to partitions
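The natural-key and lookup-based options fit in a few lines of Python. The first derives the PartitionKey from the data itself (here the first letter of the last name); the second resolves a value through an explicit lookup table, which lets you move a heavy value (for example a large country) to its own partition without touching the others. The function names, the table contents and the `p-other` fallback are hypothetical.

```python
from typing import Dict


def pk_from_last_name(last_name: str) -> str:
    """Natural key: first letter of the last name, upper-cased."""
    return last_name.strip()[0].upper()


# Lookup based: explicit mapping from a value (country) to a partition name.
# Large countries get a partition of their own; small ones share one.
COUNTRY_TO_PARTITION: Dict[str, str] = {
    "US": "p-us",
    "BR": "p-br",
    "PT": "p-europe-small",
    "IE": "p-europe-small",
}


def pk_from_lookup(country: str, default: str = "p-other") -> str:
    """Lookup based: resolve the value through the mapping table."""
    return COUNTRY_TO_PARTITION.get(country, default)
```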

Using Modulo

• The remainder of a division
• Nice properties for partitioning:
  – Given two positive integers M and N, M mod N returns a number between 0 and N-1
• Want equal-sized partitions?
  – Given an appropriate distribution of M, we get N ‘equally full’ buckets
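A quick Python illustration of the modulo property: with evenly distributed integer IDs (here simply consecutive ones), `id % n` always lands in 0..n-1 and fills the n buckets equally. Zero-padding the bucket keeps the keys in numeric order when compared as strings, which matters because PartitionKey values are strings; the padding width and the `P` prefix are illustrative choices.

```python
from collections import Counter


def modulo_partition_key(entity_id: int, n: int) -> str:
    """Map a non-negative integer ID to one of n partitions, e.g. 'P007'."""
    bucket = entity_id % n  # always in 0 .. n-1
    return f"P{bucket:03d}"


counts = Counter(modulo_partition_key(i, 10) for i in range(10_000))
print(len(counts), set(counts.values()))  # 10 {1000}
```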

Using Hash Values

• A hash function projects one distribution into another
• Use a hash function that projects a random distribution
• Do NOT use a cryptographic hash function
• Be careful if using Object.GetHashCode()
  – Boxed types may return a different value from their un-boxed equivalent
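In Python, a similar pitfall takes a different shape: the built-in `hash()` of a `str` is salted per process (see `PYTHONHASHSEED`), so it must not be used to compute a PartitionKey that has to be recomputed later. A minimal sketch, assuming CRC-32 from the standard library (`zlib.crc32`) as the cheap, non-cryptographic and stable hash; any other stable non-cryptographic hash would serve equally well.

```python
import zlib


def hash_partition_key(row_id: str, partitions: int) -> str:
    """Fold a stable, non-cryptographic hash of the ID into `partitions` buckets."""
    h = zlib.crc32(row_id.encode("utf-8"))  # same value in every process and run
    return f"{h % partitions:04d}"


keys = {hash_partition_key(f"user-{i}", 16) for i in range(1000)}
```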

Partition Stability Over Time

You may need to change the partitioning scheme. Two options:
1. Re-partition all data
2. Version the partitioning scheme, e.g. <Version><PartitionKey>:
   <v1><A3E567D7D8C68789>
   <v2><A8B978C8B6D77836>
   where v1 = GUID mod 4 and v2 = GUID mod 10
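Versioning the scheme means each PartitionKey records the rule that produced it, so entities written under the old and new schemes can coexist while data is migrated gradually. A minimal Python sketch, assuming the GUID is taken as a 128-bit integer (`uuid.UUID.int`), 4 buckets for v1 and, as an assumption, 10 buckets for v2; the `SCHEMES` registry and the `v1_2` key layout are hypothetical choices.

```python
import uuid

# Hypothetical registry of partitioning schemes: version -> number of buckets
SCHEMES = {"v1": 4, "v2": 10}


def versioned_partition_key(entity_id: uuid.UUID, version: str = "v2") -> str:
    """Prefix the bucket with the scheme version that produced it."""
    bucket = entity_id.int % SCHEMES[version]
    return f"{version}_{bucket}"


def parse_version(partition_key: str) -> str:
    """Readers can tell which scheme an existing key was built with."""
    return partition_key.split("_", 1)[0]


guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
old_key = versioned_partition_key(guid, "v1")  # key written under the old scheme
new_key = versioned_partition_key(guid, "v2")  # key for the same entity under v2
```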

E.g. Tweet Storage

Tweet table columns: TweetID, UserID, DateTimeStamp, Message

With an RDBMS you’d probably start with something like this:
SELECT * FROM Tweet WHERE Message LIKE '%SearchTerm%'

E.g. Tweet Storage

You’d soon realize that LIKE isn’t so wonderful, so you’d do a little normalization.

[Diagram: normalized word-index tables built from TweetID, WordID and Word columns, with an index (IX) on Word, alongside the Message data]

E.g. Tweet Storage

With Tables we go the whole way:
• Tweets keyed by user: UserID (PK), TweetID (RK), DateTimeStamp, Message
• Tweets keyed by word: Word (PK), TweetID (RK), UserID, DateTimeStamp, Message
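Going "the whole way" means writing the same tweet several times: once under its author's partition and once under the partition of every word it contains, so a word search becomes a single-partition query. The Python sketch below models both tables as dicts keyed by (PartitionKey, RowKey); the table names, the naive lower-case whitespace tokenizer and the full duplication of properties are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, str]
Table = Dict[Tuple[str, str], Entity]  # (PartitionKey, RowKey) -> properties

tweets_by_user: Table = {}
tweets_by_word: Table = {}


def save_tweet(tweet_id: str, user_id: str, timestamp: str, message: str) -> None:
    data = {"UserID": user_id, "TweetID": tweet_id,
            "DateTimeStamp": timestamp, "Message": message}
    # Primary copy: PK = UserID, RK = TweetID
    tweets_by_user[(user_id, tweet_id)] = dict(data)
    # Index copies: PK = Word, RK = TweetID, with the full data duplicated in each
    for word in set(message.lower().split()):
        tweets_by_word[(word, tweet_id)] = dict(data)


def search(word: str) -> List[Entity]:
    # Stands in for a single-partition query on the word index, in RowKey order
    return [e for (pk, _), e in sorted(tweets_by_word.items()) if pk == word.lower()]
```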

E.g. Tweet Storage

We may create multiple indexes.

[Diagram: two entity layouts, each holding TweetID (RK), UserID, DateTimeStamp and Message, both partitioned by a UserID (PK)]

Entity Group Transactions
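An entity group transaction is all-or-nothing and is restricted to entities that share the same PartitionKey in the same table, with at most 100 operations per batch and each entity appearing only once. The Python sketch below mimics those rules over an in-memory dict: it validates the batch, applies every operation to a copy of the partition, and swaps the copy in only if all of them succeed. The `apply_batch` function and its `(op, rk, props)` tuples are hypothetical, not an SDK API; taking a single `pk` argument is how the sketch enforces the same-partition rule.

```python
from typing import Any, Dict, List, Tuple

MAX_BATCH = 100  # per-batch operation limit

Partition = Dict[str, Dict[str, Any]]  # RowKey -> properties (one PartitionKey)
Op = Tuple[str, str, Dict[str, Any]]   # (operation, RowKey, properties)


def apply_batch(table: Dict[str, Partition], pk: str, ops: List[Op]) -> None:
    """Apply insert/merge/replace/delete ops atomically within one partition."""
    if not ops or len(ops) > MAX_BATCH:
        raise ValueError("batch must contain 1..100 operations")
    row_keys = [rk for _, rk, _ in ops]
    if len(set(row_keys)) != len(row_keys):
        raise ValueError("an entity may appear only once per batch")
    staged = {rk: dict(p) for rk, p in table.get(pk, {}).items()}  # work on a copy
    for op, rk, props in ops:
        if op == "insert":
            if rk in staged:
                raise KeyError(f"{rk} already exists")  # the whole batch fails
            staged[rk] = dict(props)
        elif op == "merge":
            staged[rk].update(props)  # KeyError if the entity is missing
        elif op == "replace":
            if rk not in staged:
                raise KeyError(rk)
            staged[rk] = dict(props)
        elif op == "delete":
            del staged[rk]
        else:
            raise ValueError(f"unknown operation {op}")
    table[pk] = staged  # commit: only reached if every operation succeeded
```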

Modeling In Tables

• Currently no secondary indexes (coming)
  – Be careful to minimize cross-partition queries
• Build indexes yourself
  – Concentrate on useful partition keys
• If associated data is small enough
  – Duplicate data with each index
  – This saves additional queries

Best Practices

Common Design & Scalability

Common settings:
• Turn off Nagling & Expect: 100-continue (.NET: ServicePointManager.UseNagleAlgorithm and ServicePointManager.Expect100Continue)
• Set the connection limit (.NET: ServicePointManager.DefaultConnectionLimit)
• Turn off proxy detection when running in the cloud (.NET config: autodetect setting in the proxy element)

• Design your application so that requests are distributed across your range of partition keys, to avoid hotspots
• Avoid the append/prepend pattern, i.e. an access pattern that follows the lexical sort order of Partition Key values
• Perform one-time operations at startup rather than on every request:
  – Creating containers/tables/queues which should always exist
  – Setting required constant ACLs on containers/tables/queues
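The append pattern typically appears when the PartitionKey is a timestamp or an increasing counter: every new write lands in the lexically last partition, so one server takes all the load. One common remedy, sketched below in Python, is to prefix the time-based key with a small hash bucket so concurrent writes fan out over several partitions. The bucket count, the CRC-32 hash and the key format are assumptions; the trade-off is that a time-range query now has to read every bucket.

```python
import zlib
from datetime import datetime, timezone

BUCKETS = 8  # assumed number of partitions to spread writes over


def append_pattern_key(ts: datetime) -> str:
    # Anti-pattern: every write in the same minute lands in the newest partition
    return ts.strftime("%Y%m%d%H%M")


def bucketed_key(ts: datetime, entity_id: str) -> str:
    # Prefix with a stable hash bucket so concurrent writes fan out
    bucket = zlib.crc32(entity_id.encode("utf-8")) % BUCKETS
    return f"{bucket:02d}_{ts.strftime('%Y%m%d%H%M')}"


now = datetime(2013, 3, 23, 10, 30, tzinfo=timezone.utc)
hot = {append_pattern_key(now) for _ in range(100)}
spread = {bucketed_key(now, f"order-{i}") for i in range(100)}
```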

Page 38: NoSQL em Windows Azure Table Storage - Vitor Tomaz

Common Design & Scalability

Turn on analytics (logging and metrics) & take control of your investigations:
• Who deleted my container? Look at the client IP for delete container requests
• Why has my request latency increased? Compare E2E vs. server latency
• What are my user demographics? Use the client request ID to trace requests, and the client IP
• How can I tune my service usage? Use metrics to analyze API usage & peak traffic stats
• And many more…

Use an appropriate retry policy for intermittent errors; the storage client uses exponential retry by default
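An exponential retry policy waits roughly twice as long after each failed attempt, up to a cap, and gives up after a fixed number of attempts. The Python sketch below shows the idea. `TransientError`, the default delays and the injectable `sleep` parameter are assumptions made for illustration; real policies, including the storage client's, usually add random jitter and decide per error whether it is worth retrying.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable errors (timeouts, server busy, ...)."""


def with_exponential_retry(op: Callable[[], T], max_attempts: int = 5,
                           base_delay: float = 0.5, max_delay: float = 30.0,
                           sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying on TransientError after base, 2*base, 4*base, ... seconds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")


delays: List[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("server busy")
    return "ok"


print(with_exponential_retry(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0]
```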

Storage Accounts

• Collocate storage accounts with your compute roles, as egress is free within the same region
• Use multiple storage accounts to:
  – achieve targets that exceed a single storage account
  – achieve client proximity
• Map multiple clients to the same storage account
  – Use different containers/tables/queues instead of an account for each customer

Storage Accounts

Design to add more accounts as needed

Use a different account for Windows Azure Diagnostics

Choose locally redundant storage if:
• data can be restored after major disasters, or
• there are geographical boundary constraints on where data can be stored

WA Table Client - Service Layer

• Option 1 – WCF Data Services
  – Good for fixed schemas used like relational tables
  – When you do not require control over serialization/deserialization
• Option 2 – Table Service Layer’s DynamicTableEntity
  – Entity containing a dictionary of key-value properties
  – Used when the schema is not known, e.g. storage explorers
  – Performance!
• Option 3 – Table Service Layer’s POCO
  – POCO derives from ITableEntity or TableEntity
  – Control over serialization and deserialization: make your data dance to your tune!
  – ETag maintained with entities, so updates are easy
  – Performance!
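The difference between options 2 and 3 is essentially a property bag versus a typed class that controls its own (de)serialization. The Python analogy below uses a plain dict for the dynamic entity and a dataclass with hand-written `to_properties`/`from_properties` methods for the POCO-style entity; these names, and the choice of last name as PartitionKey and first name as RowKey, are hypothetical and do not reproduce the .NET `DynamicTableEntity` or `TableEntity` APIs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict

# Option 2 style: a bag of properties, nothing known in advance (e.g. an explorer)
dynamic_entity: Dict[str, Any] = {
    "PartitionKey": "Wegner", "RowKey": "Wade", "BIRTHDATE": "2/2/1981",
}


@dataclass
class Person:
    """Option 3 style: a typed entity that controls its own (de)serialization."""
    last: str
    first: str
    birthdate: date
    etag: str = "*"  # kept with the entity so an update can send it as If-Match

    def to_properties(self) -> Dict[str, Any]:
        # We decide the wire format: ISO dates instead of free-form strings
        return {"PartitionKey": self.last, "RowKey": self.first,
                "BIRTHDATE": self.birthdate.isoformat()}

    @classmethod
    def from_properties(cls, props: Dict[str, Any], etag: str = "*") -> "Person":
        return cls(last=props["PartitionKey"], first=props["RowKey"],
                   birthdate=date.fromisoformat(props["BIRTHDATE"]), etag=etag)
```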

Performance - Storage Client Library 2.0

[Chart: Batch Stress Scenario, per-entity latencies for Delete, Query and Insert (time in ms) plus processor time and test duration (s), comparing Storage Client 1.7, Storage Client 2.0: DataServices, Storage Client 2.0: Reflection and Storage Client 2.0: No Reflection]

Faster NoSQL table access:
• Up to 72.06% reduction in execution time
• Up to 31.92% reduction in processor time
• Up to 69-90% reduction in latency

Performance - Storage Client Library 2.0

[Charts: Large Blob Scenario (256 MB), Storage Client 1.7 vs. Storage Client 2.0: resource utilization (total test time and total processor time, in s) and latencies (upload and download, in s)]

Faster uploads and downloads:
• 31.46% reduction in processor time
• Up to 22.07% reduction in latency

Take Away

Partitioning data is key to cloud-scale apps

Horizontally Partition for Scale Out

Choose appropriate partition keys

Table storage requires different approach to data modeling

Don’t be afraid to aggressively de-normalize and duplicate data

Questions?

Upcoming in-person meetings

• 23/03/2013 – March (Lisboa)
• 20/04/2013 – April (Lisboa)
• 22/06/2013 – June (Lisboa)
• ??/??/2013 – ? (Porto)
• ??/??/2013 – ? (Coimbra)

Save these dates in your calendar! :)

“GOLD” Sponsor

Twitter: @PTMicrosoft http://www.microsoft.com/portugal

“Bronze” Sponsors

Thank you!
Vítor Tomaz
vitorbstomaz AT gmail.com
http://twitter.com/vitortomaz