RAMCloud: Concept and Challenges
John Ousterhout
Stanford University
April 10, 2009 RAMCloud Slide 2
Introduction
● RAMCloud: massive datacenter storage entirely in DRAM
● New research project:
  Kozyrakis, Mazieres, McKeown, Mitra, Ousterhout, Parulkar, Prabhakar, Rosenblum
● Goal for today: feedback
  What are we missing?
  Related work
  Pitfalls
Outline
● Overview of RAMCloud
● Motivation
● Challenges in making RAMCloud a reality
RAMCloud Overview
● Storage for datacenters
● 1000-10000 commodity servers
● 64 GB DRAM/server
● All data always in RAM
● Low-latency access: 5-10µs RPC
● High throughput: 1,000,000 ops/sec/server
● High level of durability & availability
[Figure: application servers and storage servers within a datacenter]
Example Configurations
                    Today     5-10 years
# servers           1000      1000
GB/server           64 GB     1024 GB
Total capacity      64 TB     1 PB
Total server cost   $4M       $4M
$/GB                $60       $4
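The derived rows of the table follow from the first two rows and the total cost. A minimal sketch of that arithmetic is below; the function and variable names are illustrative, and decimal units (1 TB = 1000 GB) are assumed. The unrounded figures come out to about $62.50/GB and $3.91/GB, which the table rounds to $60 and $4.

```python
# Recompute "Total capacity" and "$/GB" from the table's inputs.
# Names are illustrative; all input numbers are taken from the table.
TOTAL_SERVER_COST = 4_000_000  # dollars, same in both columns


def configuration(servers: int, gb_per_server: int, total_cost: int):
    """Return (total capacity in GB, cost in dollars per GB)."""
    total_gb = servers * gb_per_server
    return total_gb, total_cost / total_gb


today = configuration(1000, 64, TOTAL_SERVER_COST)
future = configuration(1000, 1024, TOTAL_SERVER_COST)

print(f"Today:      {today[0] / 1000:.0f} TB, ${today[1]:.2f}/GB")
print(f"5-10 years: {future[0] / 1000:.0f} TB, ${future[1]:.2f}/GB")
```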
RAMCloud Motivation
● Relational databases don’t scale
● Every large-scale Web application has problems:
  Facebook: 4000 MySQL instances + 2000 memcached servers
● New forms of storage starting to appear:
  Google BigTable
● Many apps don’t need all RDBMS features, can’t afford them
RAMCloud Motivation, cont’d
Disk access rate not keeping up with capacity:
● Disks must become more archival
● Can’t afford small random accesses
● RAM cost today = disk cost 10 years ago
                                    Mid-1980’s   Today      Change
Disk capacity                       30 MB        200 GB     6667x
Max. transfer rate                  2 MB/s       100 MB/s   50x
Latency (seek & rotate)             20 ms        10 ms      2x
Capacity/bandwidth (large blocks)   15 s         2000 s     133x
Capacity/bandwidth (200B blocks)    3000 s       115 days   3333x
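The two capacity/bandwidth rows are the time needed to read the whole disk once. A short sketch of how they can be derived from the first three rows follows. It assumes decimal units and that each 200-byte random read costs one full seek-and-rotate latency, with the transfer time of so small a block ignored; the names are illustrative.

```python
MB = 10**6
GB = 10**9
SECONDS_PER_DAY = 86_400


def sequential_scan_seconds(capacity_bytes, transfer_bytes_per_s):
    """Time to read the whole disk in large blocks at the max transfer rate."""
    return capacity_bytes / transfer_bytes_per_s


def random_scan_seconds(capacity_bytes, block_bytes, latency_s):
    """Time to read the whole disk in small blocks, one seek+rotate per block."""
    return (capacity_bytes // block_bytes) * latency_s


# Inputs taken from the table above.
mid_1980s = {"capacity": 30 * MB, "rate": 2 * MB, "latency": 0.020}
today = {"capacity": 200 * GB, "rate": 100 * MB, "latency": 0.010}

for label, disk in (("Mid-1980's", mid_1980s), ("Today", today)):
    large = sequential_scan_seconds(disk["capacity"], disk["rate"])
    small = random_scan_seconds(disk["capacity"], 200, disk["latency"])
    print(f"{label}: large blocks {large:.0f} s, "
          f"200B blocks {small:.0f} s ({small / SECONDS_PER_DAY:.1f} days)")
```

Under these assumptions the small-block scan of a current disk takes roughly 10 million seconds, which is the 115 days in the table.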
Can We Make RAMCloud Work?
● Is capacity sufficient?
● Data durability/availability
● Low-latency RPCs
● Data model
● Concurrency model
● Data distribution, scaling
● Multi-tenancy
● Functional distribution
● Automated management
Is RAMCloud Capacity Sufficient?
● Target applications: online data, not media
● Facebook: 200 TB of (non-image) data today
● Amazon:
  Revenues/year: $16B
  Orders/year: 400M? ($40/order?)
  Bytes/order: 10000?
  Order data/year: 4 TB?
● United Airlines:
  Total flights/day: 4000? (30,000 for all airlines in U.S.)
  Passenger flights/year: 200M?
  Bytes/passenger-flight: 10000?
  Order data/year: 2 TB?
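The back-of-envelope estimates above multiply a yearly record count by a guessed record size. The sketch below repeats that arithmetic with the slide's own numbers, all of which the slide itself marks as guesses. The names are illustrative, decimal terabytes are assumed, and the 64 TB figure is the "Today" configuration from the Example Configurations slide.

```python
TB = 10**12  # bytes in a decimal terabyte


def yearly_data_tb(records_per_year: int, bytes_per_record: int) -> float:
    """Yearly data volume in TB for a given record count and record size."""
    return records_per_year * bytes_per_record / TB


# Amazon: $16B revenue per year at a guessed $40 per order.
amazon_orders = 16_000_000_000 // 40
amazon_tb = yearly_data_tb(amazon_orders, 10_000)

# United Airlines: guessed 200M passenger flights per year.
united_tb = yearly_data_tb(200_000_000, 10_000)

RAMCLOUD_TODAY_TB = 64  # 1000 servers x 64 GB

for name, tb in (("Amazon", amazon_tb), ("United", united_tb)):
    share = tb / RAMCLOUD_TODAY_TB
    print(f"{name}: {tb:.0f} TB/year, {share:.1%} of a 64 TB RAMCloud")
```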
Data Durability/Availability
● Data must be durable when write RPC returns.
● Non-starters:
  Synchronous disk write (100-1000x too slow)
  Replicate in other memories (too expensive)
● One possibility:
  Log operations in RAM of other server(s)
  Log to disk in batches
  Occasional disk checkpoints to truncate logs
● Additional issues:
  Fast recovery
  Role of flash memory
  Datacenter disaster recovery
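The "one possibility" above can be sketched in a few lines. This is a toy, single-process model and not the project's design: `Master`, `Backup`, and `batch_size` are hypothetical names, the "disk" is a Python list, the batch flush runs inline rather than in the background, and checkpoints are not modeled. It only illustrates the ordering: a write is logged in the RAM of other servers before it is acknowledged, and disk writes happen once per batch rather than once per operation.

```python
class Backup:
    """Hypothetical backup server: buffers log entries in RAM, flushes in batches."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.ram_buffer = []   # log entries not yet on disk
        self.disk = []         # stand-in for an on-disk log
        self.disk_writes = 0   # number of batched disk writes performed

    def append(self, entry):
        self.ram_buffer.append(entry)
        if len(self.ram_buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.ram_buffer:
            self.disk.extend(self.ram_buffer)  # one large sequential write
            self.ram_buffer.clear()
            self.disk_writes += 1


class Master:
    """Hypothetical storage server holding the primary copy in RAM."""

    def __init__(self, backups):
        self.data = {}
        self.backups = backups

    def write(self, key, value):
        # Log the operation in the RAM of the other servers first ...
        for backup in self.backups:
            backup.append((key, value))
        # ... then apply it locally; returning here acknowledges the write.
        self.data[key] = value


def recover(backup: Backup) -> dict:
    """Rebuild the master's contents by replaying one backup's log in order."""
    data = {}
    for key, value in backup.disk + backup.ram_buffer:
        data[key] = value
    return data


backups = [Backup(batch_size=100) for _ in range(2)]
master = Master(backups)
for i in range(250):
    master.write(f"k{i % 50}", i)

print(backups[0].disk_writes, "disk writes for 250 operations")
print("recovered state matches:", recover(backups[0]) == master.data)
```

Note that entries still in a backup's RAM buffer survive a master crash but not a simultaneous backup power loss, which is one reason the slide lists flash memory and disaster recovery as open issues.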
Low-Latency RPCs
Achieving 5-10µs will impact every layer of the system:
● Network: cut-through routing, flow control
● OS:
  Dedicated cores
  No interrupts?
  No virtual memory?
● Protocol stack:
  TCP too slow (especially with packet loss)
  Must avoid copies
● Storage service
Could we someday get to 1µs??
Data Model
● Probably not relations:
  Too rigid
  Too much fragmentation
  Better to collect larger objects?
● Unstructured blobs?
● Hierarchical hash tables (e.g. JSON)?
● What is the role of indexing?
Concurrency Model
● Locking
● Transactions?
● What level of consistency for operations on distributed data?
Distribution and Scaling
● How to distribute data among storage servers?
  Concentrate to minimize RPCs, simplify consistency
  Distribute to reduce hot spots
● How to achieve locality of access? (Is there any locality for Web apps?)
● Smooth scaling:
  Automatic redistribution of data
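The slide leaves the distribution scheme open. As one illustration of what "smooth scaling" could mean, the sketch below uses consistent hashing, a well-known technique that the slides do not name; it is shown here only as an assumption, and `HashRing`, `vnodes`, and the server and key names are all hypothetical. The property of interest is that adding a server moves only a fraction of the keys, and every moved key lands on the new server.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, server name)
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_point(f"{server}#{i}"), server))

    def owner(self, key: str) -> str:
        """Return the server owning the first ring position at or after the key."""
        idx = bisect.bisect_right(self._points, (_point(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"object-{i}" for i in range(10_000)]
ring = HashRing([f"server-{i}" for i in range(10)])
before = {k: ring.owner(k) for k in keys}

ring.add_server("server-10")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved after adding one server")
```

Hashing spreads load and reduces hot spots, but it deliberately destroys locality, so it sits at the "distribute" end of the trade-off in the first bullet.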
Multi-Tenancy
● Design for cloud computing environments:
  Multiple users/applications sharing a RAMCloud
● Security/access control issues
● Scale down as well as up
  Small apps should get RAMCloud benefits at proportional cost
Functional Distribution
● RAMCloud will include software on both application servers and storage servers
● What is the right distribution?
● Application servers:
  Offload storage system, increase scalability, flexibility
  Example: aggregation operators
● Storage servers:
  More trusted
  May benefit from knowledge of internals
[Figure: RAMCloud software spanning application servers and storage servers]
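The placement question can be made concrete with the slide's aggregation example. The sketch below computes the same sum two ways and counts RPCs: on the application server, which fetches every object, and on the storage server, which runs the operator next to the data. `StorageServer`, `read`, and `sum_field` are hypothetical names for illustration, not a RAMCloud API, and each method call stands in for one RPC.

```python
class StorageServer:
    """Hypothetical storage server; each method call models one RPC."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.rpc_count = 0

    def read(self, key):
        self.rpc_count += 1
        return self.objects[key]

    def sum_field(self, keys, field):
        # Aggregation executed on the storage server: one RPC in total.
        self.rpc_count += 1
        return sum(self.objects[k][field] for k in keys)


def sum_on_application_server(server, keys, field):
    """Aggregation executed on the application server: one read RPC per object."""
    return sum(server.read(k)[field] for k in keys)


objects = {f"order-{i}": {"total": i} for i in range(1000)}
keys = list(objects)

app_side = StorageServer(objects)
app_total = sum_on_application_server(app_side, keys, "total")

storage_side = StorageServer(objects)
storage_total = storage_side.sum_field(keys, "total")

print(f"application-side: {app_side.rpc_count} RPCs")
print(f"storage-side:     {storage_side.rpc_count} RPC")
```

Running the operator on the application server keeps the storage server simple and offloads its CPU; running it on the storage server saves round trips but requires the storage server to execute application-specific logic.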
Automated Management
● Manual management a non-starter: system too complex
● Things that must be automatic:
  Recovery from failures
  Distribution of data among servers
  Reorganization as system scales
Discussion
● Why hasn’t this been done before? Or has it?
● If this is a good idea, what happened to main-memory databases?
● What are the biggest issues that might keep RAMCloud from working?
● What’s the most important database work we should be aware of?
● Would any of you like to participate in RAMCloud?