Choosing right data store & processing


Upload: govind-kanshi

Post on 26-Jan-2015

DESCRIPTION

Data is essential to an application. Today Azure provides multiple options for storing it. There are various ways to skin the cat, depending on whether you use a hosted service or host your own, and on how you work through performance, availability, scale, licensing, etc.

TRANSCRIPT

Page 1: Choosing right data store & processing

Windows Azure Conference 2014

Data Storage options on Windows Azure
Govind Kanshi, MTC

Page 2: Choosing right data store & processing

Ways to skin the cat (store)

• Hosting options
• What you need to worry about
  – Availability
  – Performance
  – Scale...
• Where do I store data

Page 3: Choosing right data store & processing

Hosting option

• Hosted
• Host your own
• What you need to worry about
  – Availability
  – Performance (more compute/bandwidth/better storage)
  – Scale (throughput/latency/storage)
  – Management/Monitoring
  – Cost

Page 4: Choosing right data store & processing

Hosting option Path

• Hosted (the "not my headache" option)
  – No admin (the majority of it: setup/maintenance)
  – Availability: better and cheaper
  – Very little planning/spend on machine size and resources
  – Focus on the application, not on admin/mgmt. issues

Page 5: Choosing right data store & processing

Hosting Options Path

• Hosted
  – No admin (the majority of it: setup/maintenance)
  – Availability: better and cheaper
  – Very little planning/spend on machine size and resources
  – Focus on the application, not on admin/mgmt. issues
• Host your own
  – Flexibility (use jobs, use replication, use broker)
  – Roll your own availability, performance, upgrades, patching
  – Plan your scale and spend
  – Plan for admin: have in-house expertise

Page 6: Choosing right data store & processing

Offerings

• Relational
  – Hosted
    • SQL Azure
  – Host your own
    • SQL Server, Oracle, MySQL, Postgres
• Non-relational
  – Hosted
    • Table Storage (key/value), Blob/Page store
    • Mongo
  – Host your own
    • Cassandra, Mongo, Redis

Page 7: Choosing right data store & processing

Availability

• Hosted
  – SQL Azure
    • Local transparent failover; no direct access to replicas
    • Replicas, remote? (ship logs and fail over via Traffic Manager), take backups
    • Replicas, read-only? In future (local vs. across DC)
  – Azure Storage
    • Local transparent failover; no direct access to replicas
    • Remote replication (no guaranteed SLA, but usually within minutes)
• Host your own
  – Availability sets
    • Need to set up a Virtual Network
    • Need to create a sync mechanism
    • Need to set up a failover mechanism
  – AlwaysOn for SQL Server; other databases need to get it right the way SQL Server does
  – Use Azure Storage: push backups

Page 8: Choosing right data store & processing

Performance

• Hosted
  – Azure provides various options
    • SQL Azure Premium vs. regular (removes the noisy-neighbor issue)
    • Pretty soon other services will distinguish themselves by performance (think H)
  – SQL Azure Premium provides reserved IOPS
• Host your own
  – Choose better compute
  – Choose better storage
    • Soon good news on more options
  – At the end of the day, you need to set up monitoring, fix issues, and do the planning

Page 9: Choosing right data store & processing

Scale (Up/Out)

• Hosted
  – SQL Azure
    • Web/Business: storage vs. SQL Azure Premium: isolated perf
  – HDInsight
    • Scale-out vs. scale-up of nodes (disruptive)
  – Table Storage/Azure Blob/Queues, Service Bus (a little different)
    • Unlimited storage (overall 200 TB); no explicit limit (no scale-up SKU)
• Host your own
  – Need to plan provisioning of storage/compute based on the offering (Redis vs. Cassandra vs. HBase). Monitoring, handling failover, etc. are extra effort.

Page 10: Choosing right data store & processing

Management/Monitoring

• Hosted
  – API or dashboard (mostly)
  – Everything abstracted: cost/operations matter rather than OS/memory, etc.
  – Mostly auto-managed/healed, with the overall backend taking care of many things
  – No worries about patch mgmt, backup schedules, etc.
• Host your own
  – Roll out your own (time vs. what to expose/use/act upon). Cloud-aware software needed. System Center can do x things
  – Backend can take care of, say, compute failover or storage, but the rest needs to be built on top.

Page 11: Choosing right data store & processing

Cost

• Hosted
  – Generally easy (volume stored, units processed/sent)
  – For ISVs, billing is still an exercise; should become better
• Host your own
  – Roll your own: basically, what you use is what you pay for.
  – Plus licensing blues
  – Plus dedicated people (sometimes a hierarchy: one to do day-to-day jobs, another to help business/dev)

Page 12: Choosing right data store & processing

What to check for in Host your own

• License portability
• Certification
• Support
• Preferred usage
  – Dev/Test vs. Production

Page 13: Choosing right data store & processing

Why different kinds of store

• Data is complex: struct of struct of maps
• Data is changing shape
• A lot of data is collected: scale of storage
  – Time series
    • Sensors
    • Audit events
  – Data is schema?
    • Easy to add new fields, and even completely change the structure of a model.
    • Need a query model over shape rather than just key/value or pseudo-mapping to the relational world
• Low latency, high volume
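To make "query over shape" concrete, here is a minimal Python sketch. It assumes documents are plain nested dicts held in memory; `get_path` and `find` are hypothetical helpers written for this illustration, not the API of any store named in these slides.

```python
from typing import Any, Callable

# Two documents of the same logical type whose shape differs:
# the second gained new fields without any schema migration.
readings = [
    {"sensor": {"id": "s1", "location": {"city": "Pune"}},
     "values": {"temp": 31.5}},
    {"sensor": {"id": "s2", "location": {"city": "Delhi"}, "firmware": "1.2"},
     "values": {"temp": 28.0, "humidity": 40}},
]

_MISSING = object()  # sentinel: distinguishes "field absent" from a stored None


def get_path(doc: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'sensor.location.city' through nested dicts."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def find(docs: list, path: str, predicate: Callable[[Any], bool]) -> list:
    """Return documents that have a value at `path` satisfying `predicate`."""
    matches = []
    for doc in docs:
        value = get_path(doc, path, _MISSING)
        if value is not _MISSING and predicate(value):
            matches.append(doc)
    return matches


hot = find(readings, "values.temp", lambda t: t > 30)
with_humidity = find(readings, "values.humidity", lambda h: True)
```

The query names a path inside the document rather than a column, so documents that lack the field are simply skipped instead of forcing a schema change.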

Page 14: Choosing right data store & processing

What kind of data

• What is my scenario
  – Caching: Velocity, Memcached, Redis, Riak
  – Counters/speed/write: Velocity, Redis, Cassandra
  – Transactions: database, SQL Azure (federation)
  – Documents/JSON-ified class/shape: MongoDB, RavenDB, Riak *
  – Write large amounts of data with throughput: Cassandra, Azure Storage
  – Full-text search: Solr/Elasticsearch, Sphinx
  – Store data for scale-out compute: Hadoop
  – Store data on a specialized appliance: PDW

* Wished we could query shape data rather than fitting it into the relational world of columns/rows

Page 15: Choosing right data store & processing

Where do I store my data - Location

Location                              | Data                       | Store
Low-latency local memory              | Ref data                   | In-node cache
Low-latency shared memory             | Session data               | Azure Cache
Dedicated machine                     | Tx data                    | Relational DB
Shared high-throughput storage        | Tx data                    | SQL Azure relational DB
Shared entity storage                 | Entity data                | Azure Table
Shared raw, batch, long-term storage  | Data lake/store everything | HDInsight
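The first few tiers above can be read as a lookup order: local memory first, then a shared cache, then the durable store. Below is a minimal read-through sketch in Python, assuming plain dicts stand in for the shared cache and the database; `TieredReader` and its parameters are hypothetical names for this illustration, and cache invalidation and expiry are deliberately left out.

```python
from typing import Callable, Optional, Tuple


class TieredReader:
    """Read-through lookup: in-process dict, then shared cache, then durable store."""

    def __init__(self, shared_cache: dict,
                 load_from_store: Callable[[str], Optional[str]]):
        self.local = {}                          # low-latency local memory (per node)
        self.shared = shared_cache               # stands in for a shared cache service
        self.load_from_store = load_from_store   # stands in for a database query

    def get(self, key: str) -> Tuple[Optional[str], str]:
        """Return (value, tier that answered)."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.shared:
            self.local[key] = self.shared[key]   # promote to the faster tier
            return self.local[key], "shared"
        value = self.load_from_store(key)
        if value is not None:
            self.shared[key] = value             # populate both caches on a miss
            self.local[key] = value
        return value, "store"


store = {"country:IN": "India"}                  # pretend reference data in a database
reader = TieredReader(shared_cache={}, load_from_store=store.get)
first = reader.get("country:IN")                 # answered by the store, then cached
second = reader.get("country:IN")                # answered from local memory
```

Reference data suits this pattern because it changes rarely; session or transaction data would need an expiry or invalidation policy on top.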

Page 16: Choosing right data store & processing

Or another way to think

• Will I write a lot of data and need to store & query it?
• Will I need very low latency?
• Can I compromise on consistency?
• What are my business needs (how fast are we growing)? Can I afford to take a break and get/roll in a new store?

Page 17: Choosing right data store & processing

How will we get/store the data

• Query
  – SQL, LINQ, ORMed (challenge: mapping to every language) or REST
  – Custom (query format, compression, serialization)
• Tunable consistency
  – Out of 5 nodes, only when 3 respond "yay" consider it written
  – Out of 5 nodes, when 2 respond "yay" take that value
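The tunable-consistency numbers above can be sketched as quorum checks. This is a minimal Python illustration, assuming each replica returns a `(version, value)` pair and that the highest version wins; the function names are hypothetical, and real stores resolve conflicts in their own ways (timestamps, vector clocks, and so on).

```python
from typing import List, Optional, Tuple


def write_acknowledged(acks: int, write_quorum: int) -> bool:
    """A write counts as done once at least `write_quorum` replicas acknowledge it."""
    return acks >= write_quorum


def read_value(responses: List[Tuple[int, str]], read_quorum: int) -> Optional[str]:
    """Return the newest value once `read_quorum` replicas have answered, else None.

    Each response is (version, value); the highest version wins (an assumption).
    """
    if len(responses) < read_quorum:
        return None
    return max(responses, key=lambda r: r[0])[1]


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum must include at least one replica from the
    latest write quorum (R + W > N)."""
    return r + w > n


# The slide's numbers: 5 replicas, write quorum 3, read quorum 2.
N, W, R = 5, 3, 2
written = write_acknowledged(acks=3, write_quorum=W)
value = read_value([(1, "old"), (2, "new")], read_quorum=R)
overlap = quorums_overlap(N, W, R)
```

With N=5, W=3, R=2 the sum R + W equals N, so a read can land entirely on the two replicas that missed the latest write and return a stale value; raising R to 3 closes that gap at the cost of read latency. That is the "can I compromise on consistency" trade-off.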

Page 18: Choosing right data store & processing

Guidance

Stores              | Hosted: Microsoft                  | Hosted: Non-Microsoft/Partner | Host your own: Microsoft/Partner | Host your own: Non-Microsoft
Relational          | SQL Azure                          | -                             | SQL Server, Access               | Oracle, SAP, My
Caching             | Azure Cache                        | Memcache                      | -                                | Redis, Memcache
K-V/Column store    | Azure Table                        | -                             | -                                | Cassandra, Riak, HBase
Document store      | Azure Table?                       | Mongo                         | -                                | MongoDB
Graph store         | -                                  | -                             | -                                | Neo4j
VL-Scaleout         | HDInsight                          | -                             | Hortonworks HDP                  | Cloudera?
In-memory DS        | Azure Cache                        | -                             | -                                | Redis
Streaming/Queue/EAI | Azure Queue, Notification, BizTalk | -                             | StreamInsight, MSMQ, BizTalk     | Storm, Kafka
Long term           | Azure Storage                      | -                             | -                                | Build your own
Text                | Azure Table                        | Solr                          | SQL Server                       | Solr, Elasticsearch

Page 19: Choosing right data store & processing

End

Page 20: Choosing right data store & processing

Compare them: summary (evolving)

                       | Key Value   | Document      | Column     | Graph
Persistence-JSON       | *           | *             | *          |
ACID                   | #           | #             | #          |
Query mode             | API/REST    | API/REST      | API        | SPARQL/REST/Java
Scale                  | Horizontal  | Horizontal    | Horizontal | Vertical scale
Replication            | Async       | Async/tunable | Tunable    | NA
Schema free            | *           | *             | +          | *
MapReduce              | #           | #             | #          | NA
Node addition/deletion | + Manual    | #             | *          | NA
Indexing               | Primary key | Attributes    | #          | *

* : most of them support; # : specific product support; + : partial support