Charles Loboz, Slawek Smyl, Suman Nath, Microsoft Corporation

Post on 26-Mar-2015


Monitoring Large DataCenters

Management tasks: monitoring, planning, historical analysis

Performance data: CPU, memory, disk utilization, …; response time, queue length, …

Monitoring Data Management

100K servers = 1TB data per day!

Storage challenge
• Store data over many months, years
• Petabytes of data

Query challenge
• Hours to run simple queries

DataGarage

Performance data: CPU, memory, disk utilization, …; response time, queue length, …

Storage, query processing: efficient, scalable, cheap

Outline
• Context
• Performance data characteristics
• Design goals
• DataGarage design
• Query Processing
• Evaluation
• Conclusion

Performance Data Collection

Monitoring process

Time   CPU  Mem  Jobs  Disk  …
10:00  48   37   3     134   …
10:01  52   39   3     342   …
10:02  58   45   2     324   …
…      …    …    …     …     …

Our Deployment
• Sampling period 15 seconds
• 100-1000 counters/server
• 5-100 MB/server/day
• 0.01% CPU time

CPU utilization, memory usage, disk space, SQL queue length, app response time, cache hit rate, network bandwidth, …
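The deployment figures quoted so far can be cross-checked with a little arithmetic. The sketch below uses only numbers from the slides and assumes decimal units (1 TB = 10^12 bytes), which the slides do not state; the variable names are illustrative.

```python
# Back-of-envelope check of the figures quoted on the slides.
# Assumption (not from the slides): decimal units, 1 TB = 10**12 bytes.
MB = 10**6
TB = 10**12
PB = 10**15

servers = 100_000              # "100K servers"
fleet_bytes_per_day = 1 * TB   # "1TB data per day"
sampling_period_s = 15         # "Sampling period 15 seconds"

# Average volume per server per day implied by the fleet-wide figure.
per_server_mb = fleet_bytes_per_day / servers / MB

# Samples collected per counter per day at the stated sampling period.
samples_per_counter_per_day = 24 * 60 * 60 // sampling_period_s

# Days of retention before the fleet accumulates one petabyte.
days_to_petabyte = PB // fleet_bytes_per_day

print(per_server_mb, samples_per_counter_per_day, days_to_petabyte)
```

Under these assumptions the average works out to 10 MB per server per day, inside the quoted 5-100 MB range, and a petabyte accumulates in 1000 days, which is consistent with multi-year retention reaching petabytes.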

Performance Data Characteristics
• Heterogeneous counter sets
  – 30K different counters, 100-1000 per server
• Numeric, read-only, possibly-dirty
  – Dirty data retained, may be ignored for query
• Hierarchical queries
  – Selection, projection, aggregation, data mining
    • Fraction of hotmail.com servers in a given rack with CPU utilization > 50%
    • Average memory utilization trend of hotmail servers

DataGarage Design Goals
• Small storage footprint
  – Reduces storage and communication cost
  – Small pay-as-you-go cost for Cloud systems
• Cheap
  – Commodity hardware and off-the-shelf software
• Fast and robust query processing
  – Allows fast decisions
  – Tolerates faulty and slow hardware
• Simple and flexible query interface (SQL + UDF)
  – Fast query writing

Options
• TableStore: Relational table
  – DB engine: single-node DBMS, parallel DBMS
  – MapReduce: HadoopDB [Abouzeid et al., VLDB’09]
• FileStore: Files
  – MapReduce: Hadoop, Dryad [Isard et al., EuroSys’07]

Trade-offs

[Figure: design options compared along four axes: performance, fault-tolerance, cost, storage footprint]
• TableStore + Parallel DB Engine (DBMS-X)
• TableStore + MR + single node DB (HadoopDB)
• FileStore + MapReduce (Hadoop, Dryad)
• TableStore in files + MapReduce (DataGarage)

Storage Inefficiency: TableStore

Wide table
Columns: Machine id | Timestamps | Counter 1 | Counter 2 | … | Counter n (all possible counters)
• Too many columns
• >95% sparse

Narrow table (key-value store)
Columns: Machine id | Timestamps | Counter id | Value
• Redundant keys (4x more expensive than raw data)
• Expensive joins needed

Key problem: heterogeneous counter sets
Total 30,000 unique counters, <1000/server
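Both inefficiencies can be reproduced with simple arithmetic. The sketch below takes the counter totals from the slide; the field widths used for the narrow table (4 bytes each for machine id, timestamp, counter id, and value) are an assumption chosen for illustration, not something the slides specify.

```python
# Counter totals quoted on the slide.
TOTAL_COUNTERS = 30_000
MAX_COUNTERS_PER_SERVER = 1_000

def wide_table_sparsity(counters_on_server, total_counters=TOTAL_COUNTERS):
    """Fraction of empty cells in one row of a single global wide table."""
    return 1 - counters_on_server / total_counters

def narrow_table_overhead(value_bytes=4, key_bytes=(4, 4, 4)):
    """Bytes stored per value in a narrow table, relative to the raw value.

    key_bytes holds ASSUMED widths for machine id, timestamp, counter id.
    """
    return (sum(key_bytes) + value_bytes) / value_bytes

print(wide_table_sparsity(MAX_COUNTERS_PER_SERVER))
print(narrow_table_overhead())
```

With at most 1,000 of 30,000 columns populated, a row is at least about 96.7% empty, in line with the ">95% sparse" figure. With equal-width keys and values, each stored value costs four times its raw size, which matches the "4x" figure under that assumption.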

Storage Inefficiency: FileStore
• Heterogeneous counter sets
  – Files need to maintain schema for each server
• No structure in data
  – Compression cannot exploit data correlation

Our Solution
• One wide-table per server
  – Benefits of TableStore, without sparseness/redundancy
• Each wide-table in an embedded database file
  – Benefits of FileStore
  – SQLite, MS SQL Server Compact Edition

[Figure: per-server wide tables with different counter columns (c1 c2 c3; c1 c4 c6 c7 c8; c2 c4 c5 c8), each stored as an .sdf file through the Microsoft SQL Server Compact Edition library]
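A minimal sketch of the one-wide-table-per-server idea follows. It uses Python's built-in sqlite3 module as a stand-in for the SQL Server Compact Edition files mentioned on the slide; the table name `perf`, the counter names, and the `.db` file names are hypothetical.

```python
import os
import sqlite3
import tempfile

def write_server_file(path, counters, samples):
    """Create one embedded database file holding a single wide table.

    counters: column names for this server only (hypothetical names).
    samples:  list of (timestamp, value_1, ..., value_n) tuples.
    """
    columns = ", ".join(f'"{name}" REAL' for name in counters)
    marks = ", ".join("?" for _ in range(len(counters) + 1))
    conn = sqlite3.connect(path)
    try:
        conn.execute(f"CREATE TABLE perf (ts TEXT, {columns})")
        conn.executemany(f"INSERT INTO perf VALUES ({marks})", samples)
        conn.commit()
    finally:
        conn.close()

def column_names(path):
    """Return the column names of the perf table in one file."""
    conn = sqlite3.connect(path)
    try:
        return [row[1] for row in conn.execute("PRAGMA table_info(perf)")]
    finally:
        conn.close()

workdir = tempfile.mkdtemp()
server_a = os.path.join(workdir, "server_a.db")
server_b = os.path.join(workdir, "server_b.db")

# Each server gets only the columns it actually reports, so no NULL padding.
write_server_file(server_a, ["cpu", "mem"],
                  [("10:00", 48, 37), ("10:01", 52, 39)])
write_server_file(server_b, ["cpu", "sql_queue_len", "cache_hit_rate"],
                  [("10:00", 12, 3, 0.97)])

print(column_names(server_a))
print(column_names(server_b))
```

Because every file carries its own schema, two servers with different counter sets never share a sparse table, and each file can be copied or moved like any other file.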

DataGarage Architecture

[Figure: architecture diagram. Labels: data collectors, embedded database, distributed file system, controller (query dissemination), query, summary database, data analysis tools]

Data Compression
• Zipping files with PKZip is not effective
• Compress one column at a time
  – Exploit strong correlation
  – RLE, delta encoding not very effective
• Our idea: Bit-truncation + Byte-interleaving

[Figure: byte-interleaving example. Bytes of consecutive column values are regrouped by byte position: 42424242 AEAEAEAE 91832B39 A0E438C4. If lossy, the last group is omitted: 42424242 AEAEAEAE 91832B39. Label on the figure: <1%]
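The byte groups in the figure can be reproduced with a short sketch. The four 32-bit input values are inferred from the byte groups shown on the slide; the function names, the big-endian layout, and the choice of truncating exactly 8 bits are assumptions made for illustration and may differ from the authors' implementation.

```python
def byte_interleave(values, width=4):
    """Regroup fixed-width big-endian integers by byte position."""
    raw = [v.to_bytes(width, "big") for v in values]
    return b"".join(bytes(r[i] for r in raw) for i in range(width))

def byte_deinterleave(data, width=4):
    """Inverse of byte_interleave."""
    n = len(data) // width
    return [
        int.from_bytes(bytes(data[i * n + j] for i in range(width)), "big")
        for j in range(n)
    ]

def truncate_low_bits(values, bits):
    """Lossy step: zero the lowest `bits` bits of each non-negative value."""
    mask = ~((1 << bits) - 1)
    return [v & mask for v in values]

# Column values inferred from the byte groups shown on the slide.
column = [0x42AE91A0, 0x42AE83E4, 0x42AE2B38, 0x42AE39C4]

interleaved = byte_interleave(column)
print(interleaved.hex().upper())

# Lossy variant: drop the low byte of every value before interleaving.
lossy = byte_interleave(truncate_low_bits(column, 8))
print(lossy.hex().upper())
```

After interleaving, the highly repetitive high-order bytes sit next to each other, so a general-purpose compressor applied afterwards sees long runs instead of scattered repeats.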

Storage Efficiency

DataGarage Query
• DataGarage query: Three components
  – On: filesystem path: /hotmail/dc1/*.10-.-2009.sdf
  – Apply: a SQL query run on individual database files
  – Combine: a SQL query to compute final result
• Enables map-reduce style execution
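The three components (On, Apply, Combine) can be sketched in a single process as follows. This is an illustration under stated assumptions, not the DataGarage implementation: SQLite stands in for the embedded .sdf files, and `run_query`, `make_file`, the `perf` and `apply_results` table names, and the directory layout are all hypothetical.

```python
import glob
import os
import sqlite3
import tempfile

def run_query(on_pattern, apply_sql, combine_sql):
    """On: pick files by path pattern. Apply: run apply_sql in each file.
    Combine: run combine_sql over the collected Apply rows."""
    combined = sqlite3.connect(":memory:")
    try:
        table_ready = False
        for path in sorted(glob.glob(on_pattern)):
            db = sqlite3.connect(path)
            try:
                cursor = db.execute(apply_sql)
                names = [d[0] for d in cursor.description]
                rows = cursor.fetchall()
            finally:
                db.close()
            if not table_ready:
                columns = ", ".join(f'"{n}"' for n in names)
                combined.execute(f"CREATE TABLE apply_results ({columns})")
                table_ready = True
            marks = ", ".join("?" for _ in names)
            combined.executemany(
                f"INSERT INTO apply_results VALUES ({marks})", rows)
        if not table_ready:
            return []
        return combined.execute(combine_sql).fetchall()
    finally:
        combined.close()

def make_file(path, cpu_samples):
    """Write a tiny per-server file with a hypothetical perf(ts, cpu) table."""
    db = sqlite3.connect(path)
    try:
        db.execute("CREATE TABLE perf (ts TEXT, cpu REAL)")
        db.executemany("INSERT INTO perf VALUES (?, ?)", cpu_samples)
        db.commit()
    finally:
        db.close()

root = tempfile.mkdtemp()
dc1 = os.path.join(root, "hotmail", "dc1")
os.makedirs(dc1)
make_file(os.path.join(dc1, "server1.db"), [("10:00", 48), ("10:01", 52)])
make_file(os.path.join(dc1, "server2.db"), [("10:00", 58)])

# Average CPU across all servers under hotmail/dc1.
result = run_query(
    on_pattern=os.path.join(dc1, "*.db"),
    apply_sql="SELECT SUM(cpu) AS cpu_sum, COUNT(cpu) AS n FROM perf",
    combine_sql="SELECT SUM(cpu_sum) / SUM(n) FROM apply_results",
)
print(result)
```

The Apply step returns partial sums and counts rather than per-file averages, so the Combine step stays correct when servers contribute different numbers of samples. In a distributed setting the loop over files is what would be spread across execution nodes.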

Query Execution

[Figure: query execution diagram. Labels: controller node, dissemination, On, Apply, execution nodes, distributed file system, temporary, Combine, result]

Query Execution Time

Fault Tolerance
• DataGarage key technology:
  – Decoupling of execution and storage
  – Fine-grained data partitioning
• Data is replicated by the file system
• Slow execution nodes
  – Assigned smaller jobs
  – Faster nodes take additional load after finished
• Execution node failures
  – New nodes work on remaining job of failed nodes
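The effect of fine-grained partitioning on slow and failed nodes can be illustrated with a toy, round-based simulation. This is not the authors' scheduler: the node names, per-round speeds, and failure schedule are hypothetical, and real execution would be concurrent rather than round-based.

```python
from collections import deque

def simulate(files, speeds, failures=frozenset()):
    """Toy round-based scheduler over per-file tasks.

    files:    task identifiers, one per database file.
    speeds:   node name -> files it can process per round.
    failures: set of (node, round) pairs; the node dies in that round
              before finishing its batch.
    Returns a dict mapping each file to the node that completed it.
    """
    pending = deque(files)
    alive = dict(speeds)
    completed = {}
    round_no = 0
    while pending:
        if not alive:
            raise RuntimeError("no execution nodes left")
        for node in list(alive):
            take = min(alive[node], len(pending))
            batch = [pending.popleft() for _ in range(take)]
            if (node, round_no) in failures:
                # Unfinished work goes back to the queue for other nodes.
                pending.extend(batch)
                del alive[node]
                continue
            for name in batch:
                completed[name] = node
        round_no += 1
    return completed

files = [f"server{i}.sdf" for i in range(10)]
done = simulate(files, {"fast": 3, "slow": 1}, failures={("slow", 1)})
print(done)
```

Because each task is a single small file, the slow node naturally ends up with less work, and the one file it held when it failed is simply picked up by the surviving node.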

Goals Revisited
• High performance: queries are pushed inside embedded database
• Storage efficient: compression
• Fault tolerant: fine partitioning of data and query processing, aggressive restarting, speculative execution
• Hierarchical queries: file system paths
• Simple interface: SQL queries
• Cheap: off-the-shelf tools, commodity machines

Outline
• Context
• Performance data characteristics
• Design goals
• DataGarage design
• Query Processing
• Experience
• Conclusion

Operational Experience
• Has been in operation for more than 1 year
  – Warehousing data from Microsoft data centers
• Partitioning with fine granularity + compression is the key to storing massive data
  – Previous implementation with narrow table
    • 30K server-days in 1TB disk
    • Slow queries
  – Current implementation:
    • 1-3 million server-days/TB
    • Orders of magnitude faster queries
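The storage figures for the two implementations translate into per-server-day costs as follows. The sketch assumes decimal units (1 TB = 10^12 bytes), which the slides do not state; the function and variable names are illustrative.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6

def mb_per_server_day(server_days_per_tb):
    """Average storage consumed by one server-day, in megabytes."""
    return TB / server_days_per_tb / MB

previous = mb_per_server_day(30_000)          # narrow-table implementation
current_worst = mb_per_server_day(1_000_000)  # low end of "1-3 million"
current_best = mb_per_server_day(3_000_000)   # high end of "1-3 million"

# Improvement in storage density, low and high end.
density_gain = (1_000_000 / 30_000, 3_000_000 / 30_000)

print(previous, current_worst, current_best, density_gain)
```

Under this assumption the narrow-table design used roughly 33 MB per server-day, against roughly 0.3 to 1 MB for the current design, a density gain of about 33x to 100x.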

Operational Experience
• Embedded database files give flexibility
  – Placement, backup simplified
  – Scavenge available storage on the fly
• Simple design helps
  – Several thousand lines of C# code to glue together existing tools (FS, Embedded DB, R, …)
• Defer features until necessary: Parallel Combine
• Good fit with Cloud computing model
  – Data and/or computation can be on the Cloud
  – Cheap: only file storage needed, small footprint

Conclusion
• Existing solutions are not efficient for warehousing performance data
• DataGarage: performance data warehouse
• Cheap, scalable, fault tolerant
  – Combines benefits of DB, MapReduce, file systems
• Operational experience shows the benefits

Questions?

Compression Overhead

Related Work
• HadoopDB
  – DataGarage has finer data partitioning
    • Improves fault tolerance and storage efficiency
  – DataGarage uses embedded databases
    • Cheap, enables using hierarchical file system
  – DataGarage uses data compression

Query Processing

[Figure: query processing diagram. Labels: controller (query dissemination), <target>, <apply_script>, embedded database, distributed file system, temporary table, <combine_script>, result]