CS 744: DATACENTER AS A COMPUTER
Shivaram Venkataraman Fall 2019
With slides from Mosharaf Chowdhury and Ion Stoica
ANNOUNCEMENTS
- Assignments
  - Assignment zero is due!
  - Form groups for Assignment 1 on Piazza
- Class format
  - Lecture
  - Review
  - Discussion
Scalable Storage Systems
Datacenter Architecture
Resource Management
Computational Engines
Machine Learning SQL Streaming Graph
Applications
OUTLINE
- Hardware Trends
- Datacenter design
- WSC workloads
- Discussion
Why is One Machine Not Enough?
What’s in a Machine?
Interconnected compute and storage
Newer Hardware
- GPUs, FPGAs
- RDMA, NVLink
Diagram labels: Memory Bus, Ethernet, SATA, PCIe v4
Scale Up: Make More Powerful Machines
Moore’s law
- Stated 52 years ago by Intel co-founder Gordon Moore
- Number of transistors on a microchip doubles every 2 years
- Today “closer to 2.5 years” (Intel CEO Brian Krzanich)
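To make the difference between the two doubling periods concrete, here is a minimal sketch that compares them over a fixed horizon. The function name and the 10-year horizon are illustrative choices, not part of the slides.

```python
def transistor_growth(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth in transistor count after `years`,
    assuming a fixed doubling period (idealized Moore's-law model)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Compare the classic 2-year period with the quoted "closer to 2.5 years".
    for period in (2.0, 2.5):
        factor = transistor_growth(10, period)
        print(f"doubling every {period} years -> {factor:.0f}x in 10 years")
```

Under these assumptions, slowing the doubling period from 2 to 2.5 years halves the growth achieved in a decade (32x versus 16x).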
Dennard Scaling is the Problem
Suggested that power requirements are proportional to the area for transistors
- Both voltage and current being proportional to length
- Stated in 1974 by Robert H. Dennard (DRAM inventor)
Broken since 2005
“Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al.
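The reasoning behind Dennard scaling can be sketched as a small calculation. This is an idealized model built only from the two statements above (voltage and current both proportional to transistor length); the function name and the 0.7 shrink factor are illustrative assumptions.

```python
def dennard_scale(k: float) -> dict:
    """Idealized Dennard scaling when transistor length shrinks by factor k.

    Assumes voltage and current both scale linearly with length,
    so power (V * I) scales with k^2, the same as area.
    """
    voltage = k          # proportional to length
    current = k          # proportional to length
    power = voltage * current
    area = k * k         # area is length squared
    return {
        "power": power,
        "area": area,
        "power_density": power / area,  # stays constant in the ideal model
    }


if __name__ == "__main__":
    # Hypothetical shrink: each dimension scaled to 70% of its previous size.
    print(dennard_scale(0.7))
```

In this model power per unit area stays constant, which is why smaller transistors could historically be packed more densely without a higher power budget.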
Dennard Scaling is the Problem
- Performance per core is stalled
- Number of cores is increasing
“Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al
MEMORY TRENDS
MEMORY TAKEAWAY
- Growing +15% per year
- Data access from memory is getting more expensive!
HDD CAPACITY
HDD BANDWIDTH
Disk bandwidth is not growing
SSDs
Performance:
- Reads: 25 us latency
- Write: 200 us latency
- Erase: 1.5 ms
Steady state, when SSD full
- One erase every 64 or 128 reads (depending on page size)
Lifetime: 100,000 to 1 million writes per page
SSD VS HDD COST
Amazon EC2 (2014)
Machine | Memory (GB) | Compute Units (ECU) | Local Storage (GB) | Cost / hour
t1.micro | 0.615 | 1 | 0 | $0.02
m1.xlarge | 15 | 8 | 1680 | $0.48
cc2.8xlarge | 60.5 | 88 (Xeon 2670) | 3360 | $2.40
1 ECU = CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
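One way to read the 2014 table is to normalize price by memory. The sketch below uses only the memory and cost figures from that table; the dictionary layout and function name are illustrative.

```python
# (memory in GB, cost in USD per hour), copied from the 2014 table.
INSTANCES_2014 = {
    "t1.micro": (0.615, 0.02),
    "m1.xlarge": (15.0, 0.48),
    "cc2.8xlarge": (60.5, 2.40),
}


def cost_per_gb_hour(memory_gb: float, cost_per_hour: float) -> float:
    """Hourly price divided by memory size, in USD per GB-hour."""
    return cost_per_hour / memory_gb


if __name__ == "__main__":
    for name, (memory_gb, cost) in INSTANCES_2014.items():
        print(f"{name:12s} ${cost_per_gb_hour(memory_gb, cost):.4f} per GB-hour")
```

By this measure the three instance sizes cost roughly the same per GB of memory, which is one input to the scale-up versus scale-out discussion.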
Amazon EC2 (2018)
Machine | Memory (GB) | Compute Units (ECU) | Local Storage (GB) | Cost / hour
t2.nano | 0.5 | 1 | 0 | $0.0058
r5d.24xlarge | 768 | 96 | 4x900 NVMe | $6.912
x1.32xlarge | 2 TB | 4 * Xeon E7 | 3.4 TB (SSD) | $13.338
p3.16xlarge | 488 GB | 8 Nvidia Tesla V100 GPUs | 0 | $24.48
Amazon EC2 (2019)
Machine | Memory (GB) | Compute Units (ECU) | Local Storage (GB) | Cost / hour
t2.nano | 0.5 | 1 | 0 | $0.0058
r5d.24xlarge | 768 | 96 | 4x900 NVMe | $6.912
x1e.32xlarge | 4 TB | 4 * Xeon E7 | 3.4 TB (SSD) | $26.68
p3dn.24xlarge | 768 GB | 8 Nvidia Tesla V100 GPUs | 0 | $31.21
Ethernet Bandwidth
Plot labels: 1995, 1998, 2002, 2017
Growing 33-40% per year!
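Annual growth rates are easier to compare as doubling times. A minimal sketch, using the +15% per year figure from the memory takeaway and the 33-40% range quoted for Ethernet (the function name is an illustrative choice):

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a compound annual growth rate.

    `annual_growth` is a fraction, e.g. 0.15 for +15% per year.
    """
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for label, rate in [("+15%/year", 0.15), ("+33%/year", 0.33), ("+40%/year", 0.40)]:
        print(f"{label}: doubles every {doubling_time_years(rate):.1f} years")
```

At 15% per year a quantity takes about five years to double, while at 33-40% per year it doubles in roughly two to two and a half years.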
TRENDS SUMMARY
- CPU speed per core is flat
- Memory bandwidth growing slower than capacity
- SSD, NVMe replacing HDDs
- Ethernet bandwidth growing
- Scale up vs Scale out? (Discussion)
DATACENTER ARCHITECTURE
Diagram labels: Memory Bus, Ethernet, SATA, PCIe (shown for two connected servers)
Datacenter Networks
Traditional hierarchical topology
- Expensive
- Difficult to scale
- High oversubscription
- Smaller path diversity
- …
Tiers: Core, Aggregation (Agg.), Edge
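"High oversubscription" can be quantified as the ratio of downstream to upstream capacity at a switch. The sketch below is illustrative only: the function name and the example numbers (40 hosts at 10 Gbps, 4 uplinks at 40 Gbps) are hypothetical and do not come from the slides.

```python
def oversubscription_ratio(
    num_hosts: int,
    host_link_gbps: float,
    num_uplinks: int,
    uplink_gbps: float,
) -> float:
    """Ratio of total host-facing bandwidth to total uplink bandwidth.

    A ratio of 1.0 means hosts can all send at line rate through the uplinks;
    larger values mean the uplinks are a potential bottleneck.
    """
    downstream = num_hosts * host_link_gbps
    upstream = num_uplinks * uplink_gbps
    return downstream / upstream


if __name__ == "__main__":
    # Hypothetical edge switch: 40 hosts x 10 Gbps, 4 uplinks x 40 Gbps.
    ratio = oversubscription_ratio(40, 10.0, 4, 40.0)
    print(f"oversubscription {ratio}:1")
```

In a multi-tier hierarchy the ratios at each tier multiply, so traffic that must cross the core sees the largest effective oversubscription.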
STORAGE HIERARCHY (v2)
Scale Out: Warehouse-Scale Computers
- Single organization
- Homogeneity (to some extent)
- Cost efficiency at scale
  - Multiplexing across applications and services
  - Rent it out!
- Many concerns
  - Infrastructure
  - Networking
  - Storage
  - Software
  - Power/Energy
  - Failure/Recovery
  - …
MAIN COMPONENTS OF WSC
SOFTWARE IMPLICATIONS
Workload Diversity
Reliability
Single organization
Storage Hierarchy
Three Categories of Software
1. Platform-level – Software and firmware present on every machine
2. Cluster-level – Distributed systems to enable everything
3. Application-level – User-facing applications built on top
BigData
WORKLOAD: Partition-Aggregate
Top-level Aggregator
Mid-level Aggregators
Workers
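The three tiers can be sketched as nested fan-out and merge steps. This is a toy, single-process illustration: the function names, the substring-count relevance score, and the use of a thread pool in place of remote workers are all assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def worker(shard, query, k):
    """Score each document in one shard; return the local top-k (score, doc).

    Toy relevance score: number of occurrences of the query string.
    """
    scored = [(doc.count(query), doc) for doc in shard]
    return heapq.nlargest(k, [hit for hit in scored if hit[0] > 0])


def mid_level_aggregator(shards, query, k):
    """Fan the query out to workers in parallel, then merge their top-k lists."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda shard: worker(shard, query, k), shards))
    return heapq.nlargest(k, [hit for part in partials for hit in part])


def top_level_aggregator(groups, query, k):
    """Send the query to every mid-level aggregator and merge the results."""
    partials = [mid_level_aggregator(shards, query, k) for shards in groups]
    return heapq.nlargest(k, [hit for part in partials for hit in part])


if __name__ == "__main__":
    # Two mid-level aggregators; each owns a list of worker shards.
    groups = [
        [["a a b", "b"], ["a"]],
        [["c a a a"]],
    ]
    print(top_level_aggregator(groups, "a", k=2))
```

Because each tier returns only its top-k hits, the data sent upward stays small even as the number of workers grows. A real deployment would typically also enforce a response deadline at each aggregator.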
WORKLOAD: Map-Reduce
Map Stage, Reduce Stage
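A minimal single-process sketch of the two stages, using word count as the example job. The function names and the explicit shuffle step are illustrative; a real MapReduce system distributes each stage across many machines.

```python
from collections import defaultdict


def map_stage(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key so each reducer sees all values for its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(key, values):
    """Reduce: combine all counts for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_stage(doc)]
    return dict(reduce_stage(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Unlike partition-aggregate, every reduce task may need data from every map task, so the shuffle between stages creates all-to-all traffic.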
VIDEO ENCODING
MACHINE LEARNING
WORKLOAD PATTERNS
DATACENTER VS DESKTOP
- Parallelism Available
- Workload churn
- Platform homogeneity
- Fault-free operation
DISCUSSION
- Form groups of 4 students
- Pick up a discussion form per group
- Fill out responses at https://forms.gle/hhuKktMb5pKkotwc9
Discussion
Scale-up vs Scale-out
DISCUSSION
Differences between web-search and MapReduce
DISCUSSION
Microsoft Word vs. online document editor like Google Docs
DISCUSSION
NEXT STEPS
- 9/12 class on Storage Systems
- Assignment 1 out Thursday