Computer Architecture 101 · 2012-09-02 · Flash-Based SSD Architecture


Computer Architecture 101

SDBS


What does a computer look like?

[Figure: four machines, labelled A to D, each with a CPU, RAM, 2nd (secondary) storage and a network connection, together with drivers and a controller.]


What does a CPU do?


What is a hardware interrupt?

A. A signal from an external device to the CPU
B. A signal from the CPU to an external device
C. Signals exchanged between CPUs and external devices
D. A program call between CPUs and external devices


What does an instruction look like?

• Data handling and memory
  – Set (register to constant), move (between register and RAM), read/write (to/from device)
• Arithmetic and logic
  – +, −, *, /
  – Bitwise operations (and, or, not, xor)
  – Compare (register values)
• Control flow
  – Branch, i.e., modify the instruction pointer (conditional, indirect)


What does a CPU look like?

[Figure: CPU components: instruction fetcher, instruction decoder, memory interface, registers and ALU.]
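To make the fetch, decode and execute cycle concrete, here is a minimal sketch of a toy CPU. The instruction set (`SET`, `ADD`, `CMP`, `JLT`, `HALT`), the tuple encoding and the function name `run` are hypothetical, chosen only to cover the data-handling, arithmetic and control-flow instruction classes; real ISAs encode instructions as binary words and have many more opcodes.

```python
# Hypothetical toy ISA: each instruction is a tuple (opcode, operand, ...).
# Registers are numbered 0..num_regs-1; branch targets are instruction indexes.

def run(program, num_regs=4):
    regs = [0] * num_regs      # register file
    pc = 0                     # program counter (instruction pointer)
    less_than = False          # flag set by the last CMP
    while True:
        op, *args = program[pc]    # fetch the instruction, then decode it
        pc += 1                    # by default, continue with the next one
        if op == "SET":            # data handling: register <- constant
            reg, value = args
            regs[reg] = value
        elif op == "ADD":          # arithmetic: dst <- dst + src
            dst, src = args
            regs[dst] += regs[src]
        elif op == "CMP":          # compare two registers
            a, b = args
            less_than = regs[a] < regs[b]
        elif op == "JLT":          # control flow: branch if less than
            (target,) = args
            if less_than:
                pc = target
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")


# Sum 1..5 into register 0.
PROGRAM = [
    ("SET", 0, 0),   # r0 = 0 (sum)
    ("SET", 1, 1),   # r1 = 1 (i)
    ("SET", 2, 1),   # r2 = 1 (increment)
    ("SET", 3, 6),   # r3 = 6 (loop bound)
    ("ADD", 0, 1),   # sum += i
    ("ADD", 1, 2),   # i += 1
    ("CMP", 1, 3),   # i < 6 ?
    ("JLT", 4),      # if so, jump back to instruction 4
    ("HALT",),
]

print(run(PROGRAM)[0])  # 15
```

The branch works exactly as the control-flow bullet describes: `JLT` simply overwrites the program counter, so the next fetch happens elsewhere.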


What is a 64-bit CPU?

A. CPU registers are 64 bits
B. The ALU operates on 64-bit operands
C. A memory address is 64 bits long
D. All of the above
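Wide registers have a visible consequence for software: integer arithmetic on a 64-bit register wraps around modulo 2^64, and a 64-bit address can name 2^64 bytes. Python integers are unbounded, so this minimal sketch emulates a 64-bit register by masking; `MASK64`, `add64` and `to_signed64` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1  # a 64-bit register holds values 0 .. 2**64 - 1


def add64(a, b):
    """Unsigned 64-bit addition: any carry out of bit 63 is discarded."""
    return (a + b) & MASK64


def to_signed64(x):
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


print(add64(MASK64, 1))       # 0: the addition wraps around
print(to_signed64(MASK64))    # -1: all bits set
print(2 ** 64)                # bytes addressable with 64-bit addresses
```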


What is Moore’s Law?

A. The number of components on an integrated circuit will double every two years
B. The speed of CPUs will increase every two years
C. CPU performance will double every 18 months
D. CPU performance will increase quadratically
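Taking formulation A at face value, the component count after t years is N(t) = N0 · 2^(t/2), i.e. exponential rather than quadratic growth. The sketch below evaluates that formula; `projected_components` is a hypothetical name and the starting count is an arbitrary placeholder, not a historical figure.

```python
def projected_components(n0, years, doubling_period_years=2.0):
    """Component count after `years`, assuming it doubles every period."""
    return n0 * 2 ** (years / doubling_period_years)


# Over a decade, a two-year doubling period gives 2**5 = 32x.
print(projected_components(1, 10))   # 32.0
```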


Moore’s Law

http://download.intel.com/museum/Moores_Law/Articles-Press_releases/Gordon_Moore_1965_Article.pdf


The end of Moore’s law


Performance Trends

Diagram courtesy of A. Ailamaki (EPFL)


CPU Parallelism

A. Single instruction, single data (SISD)
B. Single instruction, multiple data (SIMD)
C. Multiple instruction, single data (MISD)
D. Multiple instruction, multiple data (MIMD)


Cache Hierarchy

http://lwn.net/Articles/252125/
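Whether a cache hierarchy pays off depends on locality. The sketch below simulates a single direct-mapped cache with a hypothetical configuration (64 lines of 64 bytes) and compares a sequential scan of 8-byte words with a 4 KiB-stride scan of the same length. Real CPU caches are set-associative and multi-level, so the numbers are illustrative only; `hit_rate` is a hypothetical name.

```python
def hit_rate(addresses, num_lines=64, line_size=64):
    """Fraction of accesses that hit in a direct-mapped cache."""
    tags = [None] * num_lines          # tag currently stored in each line
    hits = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing addr
        index = block % num_lines      # the only line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: load the block, evicting the old one
    return hits / len(addresses)


N = 4096
sequential = [8 * i for i in range(N)]   # consecutive 8-byte words
strided = [4096 * i for i in range(N)]   # one access every 4 KiB

print(hit_rate(sequential))  # 0.875: 1 miss, then 7 hits per 64-byte line
print(hit_rate(strided))     # 0.0: every access maps to line 0 with a new tag
```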


Intel Core 2

Figure courtesy of Appaloosa http://www.hotchips.org/wp-content/uploads/hc_archives/hc18/3_Tues/HC18.S9/HC18.S9T4.pdf


Motherboard  


What is an IO (in terms of hardware architecture)?

A. An access to memory
B. An access to secondary storage
C. An access to a device connected on the I/O bus


What is the throughput of a modern hard disk (random IOs per second)?

A. 10 IOPS
B. 100 IOPS
C. 1000 IOPS
D. 10000 IOPS
E. 100000 IOPS
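A back-of-the-envelope estimate helps here: each random IO pays an average seek plus, on average, half a rotation. The drive parameters below (7200 RPM, 8.5 ms average seek, transfer time neglected) are assumptions typical of a desktop drive, not figures from the slides; `random_iops` is a hypothetical name.

```python
def random_iops(rpm, avg_seek_ms, transfer_ms=0.0):
    """Estimated random IOs per second for a single hard disk."""
    rotation_ms = 60_000 / rpm                 # one full revolution, in ms
    rotational_latency_ms = rotation_ms / 2    # on average, wait half a turn
    service_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
    return 1000 / service_ms


estimate = random_iops(rpm=7200, avg_seek_ms=8.5)
print(f"{estimate:.0f} IOPS")
```

With these assumptions the estimate is about 79 IOPS, i.e. on the order of 10^2; mechanical positioning, not the interface, is the bottleneck.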


How much faster are sequential IOs than random IOs on disk?

A. The same
B. 2x faster
C. 10x faster
D. 100x faster

[Figure: hard disk anatomy: controller, disk interface, platters with tracks mounted on a spindle, read/write head on a disk arm moved by an actuator.]
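The gap depends heavily on IO size: sequential transfers stream at the media rate, while each random IO first pays a seek and rotational delay. The sketch below compares effective throughput for a few IO sizes; the 150 MB/s media rate and 12.7 ms positioning time are assumptions, not values from the slides, and `throughput_mb_s` is a hypothetical name.

```python
def throughput_mb_s(io_size_kb, media_rate_mb_s, positioning_ms):
    """Effective throughput when each IO waits `positioning_ms` before transferring."""
    io_size_mb = io_size_kb / 1000                       # decimal units throughout
    transfer_ms = io_size_mb / media_rate_mb_s * 1000
    return io_size_mb / ((positioning_ms + transfer_ms) / 1000)


MEDIA_RATE_MB_S = 150.0   # assumed sequential media rate
POSITIONING_MS = 12.7     # assumed average seek + half a rotation

for size_kb in (4, 64, 1024):
    sequential = throughput_mb_s(size_kb, MEDIA_RATE_MB_S, 0.0)  # no seek between IOs
    random_io = throughput_mb_s(size_kb, MEDIA_RATE_MB_S, POSITIONING_MS)
    print(f"{size_kb:5d} KB IOs: sequential is {sequential / random_io:.0f}x faster")
```

Under these assumptions the ratio is roughly 477x for 4 KB IOs, 31x for 64 KB and 3x for 1024 KB: large IOs amortize the positioning cost.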


Some Trends

                      2000            2010                              Change
HDD capacity          200 GB          2 TB                              x10
HDD IOPS              200             200                               x1
HDD GB/$              0.05            30                                x600
Flash SSD capacity    14 GB (2001)    256 GB                            x20
SSD IOPS              10^3 (SCSI)     10^6+ (PCIe), 5x10^3+ (SATA)      x1000
SSD GB/$              3x10^-4         0.5                               x1000
PCM capacity                          2x10^5 cells, 4 bits/cell
PCM IOPS                              10^6+ (1 chip)
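The ten-year factors in the change column can be turned into compound annual growth rates, which makes the rows easier to compare. The sketch hard-codes those factors as read from the table (the flash capacity row starts in 2001, so its rate is slightly understated); `FACTORS` and `annual_growth` are hypothetical names.

```python
# Ten-year improvement factors, as read from the trends table.
FACTORS = {
    "HDD capacity": 10,
    "HDD IOPS": 1,
    "HDD GB/$": 600,
    "SSD capacity": 20,
    "SSD IOPS": 1000,
    "SSD GB/$": 1000,
}


def annual_growth(factor, years=10):
    """Compound annual growth rate implied by `factor` over `years`."""
    return factor ** (1 / years) - 1


for metric, factor in FACTORS.items():
    print(f"{metric:13s} {annual_growth(factor):6.1%} per year")
```

SSD IOPS and SSD GB/$ come out at roughly 100% per year (a doubling every year), while HDD IOPS stays flat.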


The Good

The hardware!
• A single flash chip offers great performance
  – e.g., 40 MB/s read, 10 MB/s program
  – Random access is as fast as sequential access
  – Low energy consumption
• A flash device contains many (e.g., 32, 64) flash chips and provides inter-chip parallelism
• Flash devices may include some (power-failure resistant) SRAM
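As a rough upper bound, inter-chip parallelism multiplies the per-chip figures by the number of chips. This sketch assumes perfect striping and ignores controller, channel and host-interface limits, so it overestimates what a real device delivers; `aggregate_mb_s` is a hypothetical name.

```python
def aggregate_mb_s(per_chip_mb_s, num_chips):
    """Ideal device bandwidth if all chips work in parallel."""
    return per_chip_mb_s * num_chips


for chips in (32, 64):
    read = aggregate_mb_s(40, chips)      # 40 MB/s read per chip (slide example)
    program = aggregate_mb_s(10, chips)   # 10 MB/s program per chip
    print(f"{chips} chips: {read} MB/s read, {program} MB/s program (ideal)")
```

With 32 chips this gives 1280 MB/s read and 320 MB/s program in theory.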


The Bad

The severe constraints of flash chips!
• C1: Program granularity
  – Program must be performed at flash page granularity
• C2: Must erase a block before updating a page
• C3: Pages must be programmed sequentially within a block
• C4: Limited lifetime (from 10^4 up to 10^6 erase operations)

Program granularity: a page (32 KB). Pages must be programmed sequentially within the block (256 pages).

Erase granularity: a block (1 MB)
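The four constraints can be captured in a tiny model of one flash block. The class below is a hypothetical sketch, not any vendor's interface: the geometry defaults (256 pages of 32 KB) follow the slide, and the endurance default of 10^4 erases is the low end of the quoted range.

```python
class FlashBlock:
    """Hypothetical model of one flash block enforcing constraints C1 to C4."""

    def __init__(self, pages_per_block=256, page_size=32 * 1024, max_erases=10**4):
        self.pages_per_block = pages_per_block
        self.page_size = page_size
        self.max_erases = max_erases
        self.erase_count = 0
        self.next_page = 0                      # C3: next page that may be programmed
        self.pages = [None] * pages_per_block   # None stands for an erased page

    def program(self, page_no, data):
        if len(data) != self.page_size:         # C1: whole pages only
            raise ValueError("program must cover exactly one page")
        if self.pages[page_no] is not None:     # C2: no in-place update
            raise ValueError("page already programmed; erase the block first")
        if page_no != self.next_page:           # C3: sequential within the block
            raise ValueError("pages must be programmed sequentially")
        self.pages[page_no] = bytes(data)
        self.next_page += 1

    def read(self, page_no):
        return self.pages[page_no]

    def erase(self):
        if self.erase_count >= self.max_erases:  # C4: limited lifetime
            raise RuntimeError("block worn out")
        self.pages = [None] * self.pages_per_block
        self.next_page = 0
        self.erase_count += 1
```

For example, with `FlashBlock(pages_per_block=4, page_size=8)`, programming page 0 and then page 2 fails (C3), and reprogramming page 0 fails until `erase()` is called (C2).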


And the FTL

The software! The Flash Translation Layer (FTL) emulates a classical block device and handles the flash constraints.

[Figure: the SSD exposes Write sector and Read sector with no constraint; the FTL (mapping, garbage collection, wear leveling) turns them into Read page, Program page and Erase block on the flash chips, which are subject to constraints (C1) program granularity, (C2) erase before program, (C3) sequential program within a block and (C4) limited lifetime.]
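The FTL's mapping and garbage-collection roles can be made concrete with a toy page-level mapping. The class below is a hypothetical sketch, not the FTL of any real SSD: one logical sector per flash page, out-of-place updates (C2), sequential filling of the active block (C3), greedy garbage collection that picks the block with the fewest valid pages, and per-block erase counters (C4) that a wear-leveling policy could use but that this sketch does not act on. `trim` shows how a Trim request lets the FTL drop a mapping without writing anything.

```python
class PageMappingFTL:
    """Hypothetical page-level FTL: out-of-place writes, greedy garbage collection.

    One erased block is always kept in reserve so that garbage collection
    has somewhere to copy valid pages (requires num_blocks >= 2).
    """

    def __init__(self, num_blocks=8, pages_per_block=4):
        self.num_blocks = num_blocks
        self.ppb = pages_per_block
        self.l2p = {}                    # logical sector -> (block, page)
        self.p2l = {}                    # (block, page) -> logical sector, valid pages only
        self.data = {}                   # (block, page) -> payload of valid pages
        self.erases = [0] * num_blocks   # erase count per block (C4)
        self.free = list(range(1, num_blocks))  # erased blocks
        self.active, self.next_page = 0, 0      # block filled sequentially (C3)

    def read(self, lsn):
        loc = self.l2p.get(lsn)
        return None if loc is None else self.data[loc]

    def write(self, lsn, payload):
        if self.next_page == self.ppb:   # active block is full
            self._open_block()
        self.trim(lsn)                   # the old copy, if any, becomes invalid (C2)
        self._place(lsn, payload)

    def trim(self, lsn):
        loc = self.l2p.pop(lsn, None)
        if loc is not None:
            del self.p2l[loc]
            del self.data[loc]

    def _place(self, lsn, payload):
        loc = (self.active, self.next_page)   # next erased page of the active block
        self.next_page += 1
        self.l2p[lsn], self.p2l[loc], self.data[loc] = loc, lsn, payload

    def _open_block(self):
        if len(self.free) > 1:           # keep the last free block for GC
            self.active, self.next_page = self.free.pop(0), 0
        else:
            self._garbage_collect()

    def _garbage_collect(self):
        used = [b for b in range(self.num_blocks) if b not in self.free]
        valid = {b: 0 for b in used}
        for blk, _page in self.p2l:
            valid[blk] += 1
        victim = min(used, key=valid.get)        # greedy: fewest valid pages
        if valid[victim] == self.ppb:
            raise RuntimeError("device full: no invalid page to reclaim")
        self.active, self.next_page = self.free.pop(0), 0
        for page in range(self.ppb):             # copy valid pages out of the victim
            lsn = self.p2l.get((victim, page))
            if lsn is not None:
                payload = self.data[(victim, page)]
                self.trim(lsn)
                self._place(lsn, payload)
        self.erases[victim] += 1                 # erase the whole victim block
        self.free.append(victim)
```

For example, on a 3-block device with 2 pages per block, writing sectors 0, 1 and 2 and then overwriting sector 0 three times forces two garbage collections, each of which copies one still-valid page before erasing its victim.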


Flash-Based SSD Architecture

[Figure: Read, Write and Trim requests address the logical address space; scheduling & mapping, wear leveling and garbage collection, sharing internal data structures, map them to Read, Program and Erase operations on the physical address space of a flash memory array made of rows of chips.]


Methodology: Device state

[Figure: Random writes, Samsung SSD, out of the box.]

⇒ Enforce a well-defined device state
  – by performing random write IOs of random size over the whole device
  – the alternative, sequential IOs, yields a less stable state that is thus more difficult to enforce

[Figure: Random writes, Samsung SSD, after filling the device.]
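The preconditioning step can be sketched as a generator of write requests. This is a minimal sketch under stated assumptions: 512-byte sectors, IO sizes drawn uniformly from 1 to 256 sectors, and a hypothetical `coverage` factor that stops the workload once the total volume written reaches that multiple of the device capacity. Random offsets do not guarantee that every sector is hit, and issuing the requests to a real device is left out; `random_write_plan` and `SECTOR` are hypothetical names.

```python
import random

SECTOR = 512  # assumed sector size in bytes


def random_write_plan(capacity, max_sectors=256, coverage=1.0, seed=0):
    """Yield (offset, length) pairs for random writes of random size."""
    rng = random.Random(seed)                 # seeded for reproducible runs
    total_sectors = capacity // SECTOR
    written = 0
    while written < coverage * capacity:
        n = rng.randint(1, min(max_sectors, total_sectors))   # IO size in sectors
        start = rng.randrange(0, total_sectors - n + 1)       # stays inside the device
        yield start * SECTOR, n * SECTOR
        written += n * SECTOR
```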


Methodology: Startup and running phases

• When do we reach a steady state? How long should each test run?

[Figure: Startup and running phases for the Mtron SSD (random writes, RW).]

[Figure: Running phase for the Kingston DTI flash drive (sequential writes, SW).]

⇒ Startup and running phases: run experiments to define
  – IOIgnore: the number of IOs ignored when computing statistics
  – IOCount: the number of measurements needed for those statistics to converge
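Once IOIgnore and IOCount are fixed, computing the statistics of a run is straightforward. The helper below is a hypothetical sketch (its parameter names mirror the slide's, and the choice of mean and standard deviation as the statistics is an assumption): it discards the first `io_ignore` response times of the startup phase and summarizes the next `io_count`.

```python
from statistics import mean, stdev


def running_phase_stats(response_times_us, io_ignore, io_count):
    """Mean and standard deviation over the running phase only."""
    window = response_times_us[io_ignore:io_ignore + io_count]
    if io_count < 1 or len(window) < io_count:
        raise ValueError("run too short for the requested IOIgnore + IOCount")
    spread = stdev(window) if len(window) > 1 else 0.0
    return mean(window), spread
```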


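Methodology: Interferences

⇒ Interferences: introduce a pause between experiments, so that one experiment does not affect the measurements of the next.

[Figure: sequential reads, then random writes, a pause, and sequential reads again (x axis from 0 to 1500, y axis from 0.1 to 10).]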


Results: Samsung, Memoright, Mtron

[Figure: Locality for the Samsung, Memoright and Mtron SSDs.]

• When limited to a focused area, RW performs very well
• For sequential reads (SR), SW and random reads (RR):
  – linear behavior, almost no latency
  – good throughput with large IO sizes
• For RW, ≈5 ms for a 16 KB to 128 KB IO

[Figure: Granularity for the Memoright SSD.]


Results: Intel X25-E

RW (16 KB) performance varies from 100 μs to 100 ms! (x1000)

SR, SW and RW have similar performance.

RRs are more costly!

[Figure: response time (μs) vs. IO size (KB).]


Results: Fusion IO

• Capacity vs. performance tradeoff (80 GB → 22 GB!)
• Sensitivity to device state

[Figure: response time (μs, 0 to 250) of SR, RR, SW and RW with 4 KB IOs, under the MaxCap and MaxWrite settings; panels: low-level formatted, fully written.]


Phase-Change Memory (PCM)

http://cseweb.ucsd.edu/users/swanson/papers/HotStorage2011-Onyx.pdf

http://www.micron.com/products/phase-change-memory

• Byte addressable
• In-place update (no erase)
• 10^6 write cycles per cell
• 2012 PCM chip characteristics:
  – 128 MB
  – 50 MB/s (random read, 16 B/IO)
  – 0.5 MB/s (random write, 64 B/IO)
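Converting the quoted 2012 chip bandwidths into operations per second shows how asymmetric reads and writes are. The sketch divides each bandwidth by its IO size and assumes decimal megabytes (1 MB = 10^6 bytes); with binary megabytes the results are about 5% higher. `ops_per_second` is a hypothetical name.

```python
def ops_per_second(mb_per_s, bytes_per_io, bytes_per_mb=10**6):
    """IOs per second sustained at `mb_per_s` with `bytes_per_io`-byte IOs."""
    return mb_per_s * bytes_per_mb / bytes_per_io


read_iops = ops_per_second(50, 16)     # random reads, 16 B per IO
write_iops = ops_per_second(0.5, 64)   # random writes, 64 B per IO
print(read_iops, write_iops)           # 3125000.0 7812.5
```

That is about 3.1 million reads per second against about 7,800 writes per second, a 400x gap per operation.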

 


Modern Computer Architecture

http://hpts.ws/session2/mohan.pdf