Computer Architecture 101 · 2012-09-02 · Flash-based SSD architecture
Posted on 03-Apr-2020
TRANSCRIPT
Computer Architecture 101
SDBS
What does a computer look like?
[Figure: several CPUs, RAM, secondary storage and network devices, with the devices attached through controllers and drivers]
What does a CPU do?
What is a hardware interrupt?
A. A signal from an external device to the CPU
B. A signal from the CPU to an external device
C. Signals exchanged between CPUs and external devices
D. A program call between CPUs and external devices
What does an instruction look like?
• Data handling and memory
  – Set (register to constant), move (between register and RAM), read/write (to/from device)
• Arithmetic and logic
  – +, -, *, /
  – Bitwise operations (and, or, not, xor)
  – Compare (register values)
• Control flow
  – Branch, i.e., change the instruction pointer (conditional, indirect)
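To make the three categories concrete, here is a sketch that packs an instruction into a single machine word and unpacks it again. The 32-bit layout (an 8-bit opcode followed by three 8-bit operand fields) and the opcode values are hypothetical, invented for illustration; real ISAs such as x86-64 or ARM use their own, more complex encodings.

```python
# Hypothetical 32-bit instruction format (not a real ISA):
# bits 31..24 opcode | 23..16 field A | 15..8 field B | 7..0 field C
OPCODES = {
    "SET": 0x01, "MOV": 0x02,                            # data handling
    "ADD": 0x10, "SUB": 0x11, "AND": 0x20, "XOR": 0x23,  # arithmetic and logic
    "CMP": 0x30,
    "JNZ": 0x40,                                         # control flow
}
NAMES = {code: name for name, code in OPCODES.items()}


def encode(op: str, a: int = 0, b: int = 0, c: int = 0) -> int:
    """Pack an opcode and three 8-bit operand fields into one 32-bit word."""
    for field in (a, b, c):
        if not 0 <= field <= 0xFF:
            raise ValueError("operand fields are 8 bits wide")
    return (OPCODES[op] << 24) | (a << 16) | (b << 8) | c


def decode(word: int) -> tuple[str, int, int, int]:
    """Split a 32-bit word back into (mnemonic, field A, field B, field C)."""
    return (NAMES[(word >> 24) & 0xFF], (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)


word = encode("ADD", 3, 1, 2)   # r3 <- r1 + r2
print(hex(word))                # 0x10030102
print(decode(word))             # ('ADD', 3, 1, 2)
```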
What does a CPU look like?
[Figure: CPU components: instruction fetcher, instruction decoder, memory interface, registers, ALU]
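These components cooperate in a fetch, decode, execute loop. The toy interpreter below is a sketch under simplifying assumptions: the program is a Python list standing in for instruction memory (so the memory interface is not modeled), instructions are tuples rather than encoded words, and the four opcodes are hypothetical.

```python
def run(program, num_registers=4, max_steps=1000):
    """Toy CPU: fetch an instruction, decode its opcode, execute it on the ALU or branch."""
    regs = [0] * num_registers    # register file
    pc = 0                        # instruction pointer (program counter)
    steps = 0
    while pc < len(program):
        if steps == max_steps:
            raise RuntimeError("step limit reached")
        op, *args = program[pc]   # fetch + decode
        pc += 1                   # default: fall through to the next instruction
        if op == "SET":           # data handling: register <- constant
            regs[args[0]] = args[1]
        elif op == "ADD":         # ALU: dest <- a + b
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "SUB":         # ALU: dest <- a - b
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "JNZ":         # control flow: jump to target if register != 0
            if regs[args[0]] != 0:
                pc = args[1]
        else:
            raise ValueError(f"unknown opcode {op}")
        steps += 1
    return regs


# Multiply 6 * 7 by repeated addition: r0 = accumulator, r1 = 6, r2 = counter, r3 = 1
program = [
    ("SET", 0, 0),
    ("SET", 1, 6),
    ("SET", 2, 7),
    ("SET", 3, 1),
    ("ADD", 0, 0, 1),   # index 4: r0 += r1
    ("SUB", 2, 2, 3),   # r2 -= 1
    ("JNZ", 2, 4),      # loop while r2 != 0
]
print(run(program))     # [42, 6, 0, 1]
```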
What is a 64-bit CPU?
A. CPU registers are 64 bits wide
B. The ALU operates on 64-bit operands
C. A memory address is 64 bits long
D. All of the above
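A 64-bit ALU computes on 64-bit registers modulo 2^64, so a carry out of the top bit is lost. With 64-bit addresses, memory could in principle span 2^64 bytes (16 EiB), against 4 GiB for 32-bit addresses. The sketch below models this arithmetic with Python's unbounded integers and a bit mask. Note that real 64-bit processors usually implement fewer address bits in hardware; x86-64, for example, commonly uses 48-bit virtual addresses.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1   # 0xFFFF_FFFF_FFFF_FFFF


def add64(a: int, b: int) -> int:
    """Add two unsigned 64-bit operands as a 64-bit ALU would: the carry out of bit 63 is dropped."""
    return (a + b) & MASK


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    return x - (1 << WORD_BITS) if x & (1 << (WORD_BITS - 1)) else x


print(add64(MASK, 1))        # 0: the sum wraps around
print(to_signed64(MASK))     # -1
print(2**64 // 2**60, "EiB addressable with 64-bit addresses")   # 16
print(2**32 // 2**30, "GiB addressable with 32-bit addresses")   # 4
```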
What is Moore’s Law?
A. The number of components on an integrated circuit will double every two years
B. The speed of CPUs will increase every two years
C. CPU performance will double every 18 months
D. CPU performance will increase quadratically
Moore’s Law
http://download.intel.com/museum/Moores_Law/Articles-Press_releases/Gordon_Moore_1965_Article.pdf
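An exponential law of this kind can be written as N(t) = N0 · 2^(t / T), where N0 is the starting component count and T is the doubling period. The sketch below evaluates it; the function name and parameters are illustrative. It shows how quickly the doubling period compounds: over a decade, doubling every 2 years gives x32, while doubling every 18 months gives roughly x100.

```python
def moore_projection(n0: float, years: float, doubling_period: float = 2.0) -> float:
    """Component count after `years`, if it doubles every `doubling_period` years."""
    return n0 * 2 ** (years / doubling_period)


print(moore_projection(1, 10))        # 32.0: one doubling every 2 years
print(moore_projection(1, 10, 1.5))   # about 101.6: one doubling every 18 months
```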
The end of Moore’s law
Performance Trends
Diagram courtesy of A. Ailamaki (EPFL)
CPU Parallelism
A. Single instruction, single data (SISD)
B. Single instruction, multiple data (SIMD)
C. Multiple instruction, single data (MISD)
D. Multiple instruction, multiple data (MIMD)
Cache Hierarchy
http://lwn.net/Articles/252125/
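Caches pay off because programs reuse nearby data. A direct-mapped cache, the simplest organization, shows the effect. The geometry below (64 lines of 64 bytes, 4 KB in total) is a hypothetical choice; real CPU caches are usually set-associative and organized in several levels. A sequential walk misses once per line, whereas addresses that collide on the same line miss every time.

```python
def simulate_direct_mapped(addresses, num_lines=64, line_size=64):
    """Count (hits, misses) of a direct-mapped cache with a hypothetical geometry."""
    tags = [None] * num_lines       # tag held by each cache line, None = empty
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block that contains this byte
        index = block % num_lines   # the only line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag       # evict whatever the line held before
    return hits, misses


sequential = range(0, 64 * 1024, 8)             # walk 64 KB with 8-byte loads
strided = [i * 4096 for i in range(8)] * 1000   # 8 addresses that map to the same line
print(simulate_direct_mapped(sequential))       # (7168, 1024)
print(simulate_direct_mapped(strided))          # (0, 8000)
```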
Intel Core 2
Figure courtesy of Appaloosa http://www.hotchips.org/wp-content/uploads/hc_archives/hc18/3_Tues/HC18.S9/HC18.S9T4.pdf
Motherboard
What is an IO (in terms of hardware architecture)?
A. An access to memory
B. An access to secondary storage
C. An access to a device connected to the I/O bus
What is the throughput of a modern hard disk (random IOs per second)?
A. 10 IOPS
B. 100 IOPS
C. 1000 IOPS
D. 10000 IOPS
E. 100000 IOPS
How much faster are sequential IOs compared to random IOs on disk?
A. The same
B. 2x faster
C. 10x faster
D. 100x faster
[Figure: hard disk anatomy: controller, read/write head, disk arm, tracks, platter, spindle, actuator, disk interface]
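The mechanical parts in the figure bound random IO performance. Each random IO must move the arm to the right track (seek) and then wait, on average, half a rotation for the sector to pass under the head. The sketch below turns that into an IOPS estimate and compares it with sequential transfer. The 4 ms seek time, 15,000 RPM spindle, 150 MB/s sequential rate and 4 KB IO size are assumed values for an enterprise-class drive, not figures taken from the slides.

```python
def hdd_random_iops(avg_seek_ms: float, rpm: float, transfer_ms: float = 0.0) -> float:
    """Estimate random IOs per second: each IO pays a seek plus, on average, half a rotation."""
    half_rotation_ms = 0.5 * 60_000 / rpm   # average rotational latency
    return 1000 / (avg_seek_ms + half_rotation_ms + transfer_ms)


def sequential_vs_random(seq_mb_per_s: float, iops: float, io_kb: float = 4) -> float:
    """How many times more MB/s sequential IO delivers than random IOs of io_kb each."""
    random_mb_per_s = iops * io_kb / 1000
    return seq_mb_per_s / random_mb_per_s


iops = hdd_random_iops(avg_seek_ms=4.0, rpm=15_000)   # assumed enterprise-class drive
print(round(iops))                                    # 167
print(round(sequential_vs_random(150, iops)))         # 225
```

With small IOs, the gap between sequential and random access is about two orders of magnitude. Larger random IOs narrow the gap, because the fixed seek and rotation cost is amortized over more bytes.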
Some Trends
                      2000            2010                           Growth
HDD capacity          200 GB          2 TB                           x10
HDD IOPS              200             200                            x1
HDD GB/$              0.05            30                             x600
Flash SSD capacity    14 GB (2001)    256 GB                         x20
SSD IOPS              10^3 (SCSI)     10^6+ (PCIe), 5x10^3+ (SATA)   x1000
SSD GB/$              3x10^-4         0.5                            x1000
PCM capacity: 2x10^5 cells, 4 bits/cell
PCM IOPS: 10^6+ (1 chip)
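For disks, capacity grew x10 while IOPS stayed flat. A concrete consequence is that the time needed to touch a whole device with small random IOs keeps growing. The sketch below computes that time from the trend values, assuming decimal units (1 GB = 10^9 bytes) and 4 KB IOs; the function name is illustrative.

```python
def full_random_scan_days(capacity_bytes: float, iops: float, io_bytes: int = 4096) -> float:
    """Days needed to touch every block of a device once with random IOs of io_bytes."""
    return capacity_bytes / io_bytes / iops / 86_400


print(round(full_random_scan_days(200e9, 200), 1))    # 2000 HDD: 2.8 days
print(round(full_random_scan_days(2e12, 200), 1))     # 2010 HDD: 28.3 days
print(round(full_random_scan_days(256e9, 5e3), 2))    # 2010 SATA SSD: 0.14 days
```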
The Good
The hardware!
• A single flash chip offers great performance
  – e.g., 40 MB/s read, 10 MB/s program
  – Random access is as fast as sequential access
  – Low energy consumption
• A flash device contains many (e.g., 32, 64) flash chips and provides inter-chip parallelism
• Flash devices may include some (power-failure resistant) SRAM
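One way to exploit inter-chip parallelism is to stripe consecutive logical pages across chips so that they can be read or programmed concurrently. The sketch below shows a plain round-robin placement and the resulting upper bound on bandwidth for 32 chips at the per-chip rates quoted above. Round-robin striping is an assumption here, since actual SSD controllers use their own, generally undocumented placement. The bound also ignores channel, controller and host-interface limits.

```python
def chip_of(logical_page: int, num_chips: int) -> tuple[int, int]:
    """Round-robin striping: return (chip, page within chip) for a logical page."""
    return logical_page % num_chips, logical_page // num_chips


def aggregate_mb_per_s(per_chip_mb_per_s: float, num_chips: int) -> float:
    """Upper bound on device bandwidth if every chip works in parallel."""
    return per_chip_mb_per_s * num_chips


print([chip_of(p, 4) for p in range(6)])   # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
print(aggregate_mb_per_s(40, 32))          # 1280 MB/s read
print(aggregate_mb_per_s(10, 32))          # 320 MB/s program
```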
The Bad
The severe constraints of flash chips!
• C1: Program granularity: a program operation must be performed at flash page granularity
• C2: A block must be erased before any of its pages can be updated
• C3: Pages must be programmed sequentially within a block
• C4: Limited lifetime (from 10^4 up to 10^6 erase operations)
[Figure: program granularity is a page (32 KB); pages must be programmed sequentially within the block (256 pages); erase granularity is a block (1 MB)]
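The sketch below models a single flash block that rejects any operation violating C1 to C4. Page size, pages per block and the erase limit are tiny hypothetical values chosen so the behavior is easy to follow.

```python
class FlashBlock:
    """Toy flash block that rejects operations violating C1-C4.
    Geometry and erase limit are hypothetical, kept tiny for readability."""

    def __init__(self, pages_per_block: int = 4, page_size: int = 4, max_erases: int = 3):
        self.page_size = page_size
        self.pages = [None] * pages_per_block   # None = erased, i.e. programmable
        self.next_page = 0                      # C3: lowest page not yet programmed
        self.erase_count = 0
        self.max_erases = max_erases

    def read(self, page_no: int):
        return self.pages[page_no]

    def program(self, page_no: int, data: bytes) -> None:
        if not 0 <= page_no < len(self.pages):
            raise IndexError("no such page")
        if len(data) != self.page_size:
            raise ValueError("C1: a program writes exactly one full page")
        if self.pages[page_no] is not None:
            raise ValueError("C2: erase the block before updating a page")
        if page_no != self.next_page:
            raise ValueError(f"C3: pages are programmed in order, next is {self.next_page}")
        self.pages[page_no] = data
        self.next_page += 1

    def erase(self) -> None:
        if self.erase_count >= self.max_erases:
            raise RuntimeError("C4: erase limit reached, block is worn out")
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1


block = FlashBlock()
block.program(0, b"AAAA")
for violation in (lambda: block.program(0, b"BBBB"),   # update in place
                  lambda: block.program(2, b"CCCC"),   # skips page 1
                  lambda: block.program(1, b"DD")):    # partial page
    try:
        violation()
    except ValueError as err:
        print(err)   # prints the C2, C3 and C1 messages, in that order
```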
The software! The Flash Translation Layer (FTL)
– emulates a classical block device and handles the flash constraints
And The FTL
[Figure: the FTL inside an SSD. Host side: read sector, write sector, with no constraint. Flash chip side: read page, program page, erase block, subject to (C1) program granularity, (C2) erase before program, (C3) sequential programming within a block, (C4) limited lifetime. The FTL bridges the two with mapping, garbage collection and wear leveling.]
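The three FTL functions can be combined in a small page-mapped FTL. This is a sketch under assumptions, not the design of any particular SSD. The mapping is page-level (logical page to physical page). Every write goes out of place to the next free page of an active block, which respects C2 and C3. Garbage collection greedily picks the block with the fewest valid pages. Wear leveling is reduced to allocating the least-erased free block. Two blocks' worth of physical space is held back from the logical address space so that garbage collection always finds a block with at least one invalid page.

```python
import random


class PageMappedFTL:
    """Toy page-mapped FTL (hypothetical geometry and policies, for illustration only)."""

    def __init__(self, num_blocks: int = 4, pages_per_block: int = 4):
        self.ppb = pages_per_block
        # Hold two blocks back so GC always finds a victim with an invalid page.
        self.logical_pages = (num_blocks - 2) * pages_per_block
        self.data = [[None] * pages_per_block for _ in range(num_blocks)]
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]  # lpn of each valid page
        self.erase_count = [0] * num_blocks
        self.mapping = {}                      # logical page -> (block, page)
        self.free = list(range(num_blocks))
        self.active = self._take_free_block()
        self.write_ptr = 0                     # next page to program in the active block (C3)
        self.gc_copies = 0

    def _take_free_block(self) -> int:
        # Wear leveling in its simplest form: reuse the least-erased free block.
        blk = min(self.free, key=lambda b: self.erase_count[b])
        self.free.remove(blk)
        return blk

    def read(self, lpn: int):
        loc = self.mapping.get(lpn)
        return None if loc is None else self.data[loc[0]][loc[1]]

    def write(self, lpn: int, value) -> None:
        if not 0 <= lpn < self.logical_pages:
            raise IndexError("logical page out of range")
        old = self.mapping.pop(lpn, None)
        if old is not None:                    # C2: no in-place update, old copy becomes garbage
            self.owner[old[0]][old[1]] = None
        self._make_room()
        self._append(lpn, value)

    def _append(self, lpn: int, value) -> None:
        blk, page = self.active, self.write_ptr
        self.data[blk][page] = value
        self.owner[blk][page] = lpn
        self.mapping[lpn] = (blk, page)
        self.write_ptr += 1

    def _make_room(self) -> None:
        if self.write_ptr < self.ppb:
            return                             # active block still has free pages
        if len(self.free) > 1:
            self.active, self.write_ptr = self._take_free_block(), 0
            return
        # Garbage collection: greedy victim = used block with the fewest valid pages.
        used = [b for b in range(len(self.data)) if b not in self.free]
        victim = min(used, key=lambda b: sum(o is not None for o in self.owner[b]))
        self.active, self.write_ptr = self._take_free_block(), 0
        for page in range(self.ppb):           # copy the victim's valid pages out
            lpn = self.owner[victim][page]
            if lpn is not None:
                self._append(lpn, self.data[victim][page])
                self.gc_copies += 1
        self.data[victim] = [None] * self.ppb  # erase the whole block
        self.owner[victim] = [None] * self.ppb
        self.erase_count[victim] += 1
        self.free.append(victim)


random.seed(1)
ftl = PageMappedFTL()
latest = {}
for i in range(200):                           # random overwrites force GC copies
    lpn = random.randrange(ftl.logical_pages)
    ftl.write(lpn, i)
    latest[lpn] = i
assert all(ftl.read(lpn) == value for lpn, value in latest.items())
print("GC copies:", ftl.gc_copies, "erase counts:", ftl.erase_count)
```

The final assertion checks the property the FTL must preserve. Whatever sequence of overwrites and garbage collections occurs, every logical page still reads back its latest value, just as on a classical block device.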
Flash-Based SSD Architecture
[Figure: host commands (Read, Write, Trim) operate on the logical address space; inside the SSD, scheduling & mapping, wear leveling and garbage collection share internal data structures and issue Read, Program and Erase operations on the physical address space, a flash memory array of chips]
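Of the three host commands in the figure, Trim has no counterpart on a classical block device. It tells the FTL that some logical pages no longer hold useful data, for example after a file deletion. The sketch below uses a hypothetical mapping of four logical pages stored in block 7 to show the effect: trimmed pages stop counting as valid, so garbage collection no longer has to copy them.

```python
# Hypothetical FTL state: logical page -> (block, page); block 7 holds logical pages 10..13.
mapping = {10: (7, 0), 11: (7, 1), 12: (7, 2), 13: (7, 3)}


def valid_pages(block: int, mapping: dict) -> int:
    """Pages of `block` that garbage collection would have to copy before erasing it."""
    return sum(1 for blk, _ in mapping.values() if blk == block)


def trim(lpns, mapping: dict) -> None:
    """The host declares these logical pages unused: drop their mappings so the
    physical pages become garbage that GC can discard instead of copying."""
    for lpn in lpns:
        mapping.pop(lpn, None)


print(valid_pages(7, mapping))   # 4: without Trim, deleted data still looks valid
trim([11, 12], mapping)          # e.g. the file system freed logical pages 11 and 12
print(valid_pages(7, mapping))   # 2
```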
Methodology: Device state
Random Writes – Samsung SSD Out of the box
⇒ Enforce a well-defined device state
– perform random write IOs of random size on the whole device
– the alternative, sequential IOs, yields a less stable state, which is more difficult to enforce
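Here is a sketch of a preconditioning workload generator along these lines. It only produces the list of (offset, size) writes; issuing them against the raw device is left out. The 4 KB alignment, the maximum IO size of 128 units (512 KB), the two passes over the capacity and the fixed seed are assumptions, not parameters taken from the slides.

```python
import random

IO_UNIT = 4096   # assumed IO alignment and size granularity (4 KB)


def random_fill_workload(device_bytes: int, passes: int = 2, max_units: int = 128, seed: int = 0):
    """Random writes of random size (1 to max_units * 4 KB) at random aligned offsets,
    generated until `passes` times the device capacity has been written."""
    rng = random.Random(seed)
    written, ios = 0, []
    while written < passes * device_bytes:
        size = min(rng.randint(1, max_units), device_bytes // IO_UNIT) * IO_UNIT
        offset = rng.randrange(0, device_bytes - size + 1, IO_UNIT)
        ios.append((offset, size))
        written += size
    return ios


ios = random_fill_workload(device_bytes=64 * 2**20)   # a hypothetical 64 MiB device
print(len(ios), "writes,", sum(size for _, size in ios) // 2**20, "MiB in total")
```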
Random Writes – Samsung SSD After filling the device
Methodology: Startup and running phases
• When do we reach a steady state? How long should each test run?
Startup and running phases for the Mtron SSD (RW)
Running phase for the Kingston DTI flash Drive (SW)
⇒ Startup and running phases: run experiments to define
§ IOIgnore: number of IOs ignored when computing statistics
§ IOCount: number of measurements needed for those statistics to converge
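Once IOIgnore and IOCount are fixed, statistics are computed only over the running phase. Below is a minimal sketch using Python's `statistics` module. The function name is illustrative, and the response-time trace is made up to show the shape of the computation; it is not a measurement.

```python
import statistics


def running_phase_stats(response_times_us, io_ignore: int, io_count: int):
    """Mean and sample standard deviation over the running phase: skip the first
    io_ignore measurements (startup phase), then use the next io_count ones."""
    window = response_times_us[io_ignore:io_ignore + io_count]
    if len(window) < io_count:
        raise ValueError("experiment too short for the chosen IOIgnore/IOCount")
    return statistics.mean(window), statistics.stdev(window)


# Made-up trace: a slow startup phase followed by a stable running phase.
trace = [900, 850, 700, 120, 110, 130, 120, 110, 130, 120]
print(running_phase_stats(trace, io_ignore=3, io_count=6))   # mean 120, sample stdev about 8.94
```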
Methodology: Interferences
⇒ Interferences: introduce a pause between experiments
[Figure: a run of sequential reads, then random writes, a pause, and sequential reads again; x axis from 0 to 1500, log-scale y axis from 0.1 to 10]
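In a benchmark harness, this rule amounts to idling between experiments, so that work left over from one run does not distort the next. That leftover work is presumably deferred internal activity, such as garbage collection after random writes. Below is a minimal sketch. The function name and the 60-second default pause are assumptions; the appropriate pause length has to be determined experimentally.

```python
import time


def run_with_pauses(experiments, pause_s: float = 60.0):
    """Run benchmark callables in order, sleeping pause_s seconds between consecutive ones."""
    results = []
    for i, experiment in enumerate(experiments):
        if i > 0:
            time.sleep(pause_s)   # let the device settle before the next experiment
        results.append(experiment())
    return results


print(run_with_pauses([lambda: "SR", lambda: "RW", lambda: "SR"], pause_s=0.0))   # ['SR', 'RW', 'SR']
```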
Results: Samsung, Memoright, Mtron
Locality for the Samsung, Memoright and Mtron SSDs
• When limited to a focused area, RW performs very well
• For SR, SW and RR:
  – response time grows linearly with IO size, with almost no fixed latency
  – good throughput with large IO sizes
• For RW, ≈5 ms per IO for IO sizes from 16 KB to 128 KB
Granularity for the Memoright SSD
Results: Intel X25-E
RW (16 KB) response time varies from 100 μs to 100 ms (x1000)!
SR, SW and RW have similar performance.
RRs are more costly!
[Figure: response time (μs) as a function of IO size (KB)]
Results: Fusion IO
• Capacity vs performance tradeoff (80 GB → 22 GB!)
• Sensitivity to device state
[Figure: response time (μs, 0 to 250) of SR, RR, SW and RW for the MaxCap and MaxWrite configurations, IO size = 4 KB; panels include "Low level formatted" and "Fully written"]
Phase-Change Memory (PCM) http://cseweb.ucsd.edu/users/swanson/papers/HotStorage2011-Onyx.pdf
http://www.micron.com/products/phase-change-memory
• Byte addressable
• In-place update (no erase)
• 10^6 write cycles per cell
• 2012 PCM chip characteristics:
  – 128 MB
  – 50 MB/s (random read, 16 B/IO)
  – 0.5 MB/s (random write, 64 B/IO)
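Converting the 2012 chip figures into operations per second shows how lopsided PCM is. Reads reach millions of small IOs per second, consistent with the PCM IOPS figure in the trends above, while write bandwidth is two orders of magnitude lower. The sketch assumes decimal megabytes. The lifetime estimate further assumes perfect wear leveling and continuous writes at full speed, so it is a best case.

```python
MB = 10**6               # assumption: decimal megabytes
CHIP_MB = 128
WRITE_CYCLES = 10**6     # per cell, from the slide


def ops_per_second(bandwidth_mb_per_s: float, io_bytes: int) -> float:
    """Small random IOs per second sustained at the given bandwidth."""
    return bandwidth_mb_per_s * MB / io_bytes


read_iops = ops_per_second(50, 16)      # 16 B random reads
write_iops = ops_per_second(0.5, 64)    # 64 B random writes
full_rewrite_s = CHIP_MB / 0.5          # seconds to write every byte of the chip once
lifetime_years = WRITE_CYCLES * full_rewrite_s / (365 * 86_400)
print(read_iops, write_iops, full_rewrite_s, round(lifetime_years, 1))
# 3125000.0 7812.5 256.0 8.1
```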
Modern Computer Architecture
http://hpts.ws/session2/mohan.pdf