3.1 Memory

Upload: kaylaroberts
Posted on 03-Apr-2018

TRANSCRIPT

  • 7/28/2019 3.1Memory

    1/62

Chapter 5. The Memory System

• 2/62

    Overview

    Basic memory circuits

    Organization of the main memory

Cache memory concept

Virtual memory mechanism

    Secondary storage

• 3/62

    Some Basic Concepts

• 4/62

    Basic Concepts

The maximum size of the memory that can be used in any computer is determined by the addressing scheme: 16-bit addresses give 2^16 = 64K memory locations (32-bit: 4 GB, 40-bit: 1 TB).

Most modern computers are byte-addressable.

In a 32-bit memory address, the higher-order 30 bits determine the word to access.

[Figure: byte and word addresses in a byte-addressable 32-bit organization. Word addresses are 0, 4, 8, ..., 2^k - 4. (a) Big-endian assignment: within word 0 the byte addresses run 0, 1, 2, 3 starting at the most significant byte. (b) Little-endian assignment: within word 0 the byte addresses run 3, 2, 1, 0, so byte address 0 names the least significant byte.]
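The two byte-address assignments can be checked with a short Python sketch (not from the slides; the word value 0x12345678 is just an illustrative example):

```python
# How one 32-bit word, 0x12345678, is laid out at byte addresses 0..3
# under the big-endian and little-endian assignments shown above.
word = 0x12345678

big = word.to_bytes(4, "big")        # big-endian: MSB at the lowest address
little = word.to_bytes(4, "little")  # little-endian: LSB at the lowest address

for addr in range(4):
    print(f"byte address {addr}: big-endian 0x{big[addr]:02X}, "
          f"little-endian 0x{little[addr]:02X}")
```

The same word therefore reads back differently if it is written byte-by-byte on a machine of the other endianness.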

• 5/62

    Traditional Architecture

Figure 5.1. Connection of the memory to the processor: the processor's MAR drives a k-bit address bus and its MDR an n-bit data bus; control lines (R/W, MFC, etc.) complete the connection to a memory of up to 2^k addressable locations, with word length n bits.

Note: k can be smaller or larger than n. With k = 64, the address space is 2^64, roughly 1.8 x 10^19 locations (tera -> peta -> exa).

• 6/62

    Basic Concepts

Block transfer - bulk data transfer

Memory access time - the time between Read and MFC

Memory cycle time - the time between two successive R/W operations

RAM - any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address

    Cache memory

• 7/62

    Semiconductor RAM

    Memories

• 8/62

Internal Organization of Memory Chips

[Figure 5.2. Organization of bit cells in a memory chip: a 4-bit address (A0-A3) feeds an address decoder that selects one of 16 word lines (W0-W15); each row of memory cells (FF) connects through bit lines b7 ... b1, b0 to Sense/Write circuits, which drive the data input/output lines; CS and R/W control the chip.]

16 words of 8 bits each: a 16x8 memory organization. It has 16 external connections: address 4, data 8, control 2, power/ground 2.

1K memory cells organized as 128x8: external connections? 19 (7 + 8 + 2 + 2)

Organized as 1Kx1? 15 (10 + 1 + 2 + 2)
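The pin-count arithmetic above can be sketched as a small Python function (an illustration, not from the slides; it assumes the same 2 control and 2 power/ground pins as the examples):

```python
import math

def external_connections(words, bits_per_word):
    """Pin count for a words x bits_per_word memory chip:
    address lines + data lines + 2 control + 2 power/ground."""
    address_lines = math.ceil(math.log2(words))
    return address_lines + bits_per_word + 2 + 2

print(external_connections(16, 8))    # 16x8:  4 + 8 + 2 + 2 = 16
print(external_connections(128, 8))   # 128x8: 7 + 8 + 2 + 2 = 19
print(external_connections(1024, 1))  # 1Kx1: 10 + 1 + 2 + 2 = 15
```

For a fixed number of cells, a narrower organization trades data pins for address pins.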

• 9/62

    A Memory Chip

[Figure 5.3. Organization of a 1K x 1 memory chip: the 10-bit address splits into a 5-bit row address, decoded (5-bit decoder) to select one of word lines W0-W31 in a 32x32 memory cell array, and a 5-bit column address driving a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry, CS, and R/W complete the chip, with a single data input/output line.]

• 10/62

    Static Memories

The circuits are capable of retaining their state as long as power is applied.

[Figure 5.4. A static RAM cell: a latch of cross-coupled inverters is connected through transistors T1 and T2 to the bit lines b and b'; the word line (gated by the X and Y selects) turns T1 and T2 on for an access.]

• 11/62

    Figure A.15. CMOS realization of a NOT gate.

    (b) Truth table and transistor states

• 12/62

    Static Memories

    CMOS cell: low power consumption

• 13/62

Asynchronous DRAMs

Static RAMs are fast, but they occupy more area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely - they need to be periodically refreshed.

[Figure 5.6. A single-transistor dynamic memory cell: a capacitor C stores the bit; transistor T connects it to the bit line when the word line is asserted.]

• 14/62

    A Dynamic Memory Chip

[Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip: a 4096 x (512 x 8) cell array; the multiplexed address lines A20-9 / A8-0 feed a row address latch and row decoder (strobed by RAS) and a column address latch and column decoder (strobed by CAS); Sense/Write circuits, CS, and R/W connect the selected byte to data lines D7-D0.]

• 15/62

    Fast Page Mode

When the DRAM on the previous slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0.

Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row.

    A latch is added at the output of the senseamplifier in each column.

    Good for bulk transfer.

• 16/62

    Synchronous DRAMs

The operations of an SDRAM are controlled by a clock signal.

[Figure 5.8. Synchronous DRAM: the Row/Column address feeds a row address latch and row decoder and a column address counter and column decoder; a refresh counter, mode register, and timing control are driven by Clock, RAS, CAS, R/W, and CS; Read/Write circuits and latches connect the cell array to data input and data output registers on the Data lines.]

• 17/62

    Synchronous DRAMs

[Figure 5.9. Burst read of length 4 in an SDRAM: the Row address is presented with RAS and the Column address with CAS, R/W is held for a read, and the four data words D0, D1, D2, D3 appear on successive clock cycles.]

• 18/62

    Synchronous DRAMs

No CAS pulses are needed in burst operation.

Refresh circuits are included (refresh every 64 ms).

Clock frequency > 100 MHz (Intel PC100 and PC133)

DDR - double data rate

• 19/62

    Latency and Bandwidth

The speed and efficiency of data transfers among memory, processor, and disk have a large impact on the performance of a computer system.

Memory latency - the amount of time it takes to transfer a word of data to or from the memory.

Memory bandwidth - the number of bits or bytes that can be transferred in one second. It is used to measure how much time is needed to transfer an entire block of data.

• 20/62

    DDR SDRAM Double-Data-Rate SDRAM

Standard SDRAM performs all actions on the rising edge of the clock signal.

DDR SDRAM accesses the cell array in the same way, but transfers data on both edges of the clock.

The cell array is organized in two banks. Each can be accessed separately.

DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.

SDRAM: 10 ns, 100 MHz. DDR: 7.5/6/5 ns, 133/166/200 MHz. DDR2/3: 4 words or 8 words per read.
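The "both edges" point can be put in numbers with a small sketch (an illustration, not from the slides; it computes peak transfer rates only and ignores row-activation latency):

```python
def peak_transfers_per_second(clock_mhz, transfers_per_cycle):
    # Peak data-transfer rate: clock frequency times transfers per cycle.
    return clock_mhz * 1e6 * transfers_per_cycle

sdram = peak_transfers_per_second(100, 1)  # standard SDRAM: rising edge only
ddr = peak_transfers_per_second(133, 2)    # DDR-133: both clock edges

print(f"SDRAM-100: {sdram / 1e6:.0f} M transfers/s")
print(f"DDR-133:   {ddr / 1e6:.0f} M transfers/s")
```

Doubling transfers per cycle doubles peak bandwidth without raising the cell-array access rate.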

• 21/62

    Structures of Larger Memories

[Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips: the 21-bit address (A0-A20) splits into a 19-bit internal chip address and 2 bits driving a 2-bit decoder that generates the chip selects; the four 512K x 8 chips in each selected row supply the 8-bit slices D31-24, D23-16, D15-8, D7-0 of the 32-bit word; 2M words, 32 bits each.]

• 22/62

Memory System Considerations

The choice of a RAM chip for a given application depends on several factors:

Cost, speed, power, size

SRAMs are faster, more expensive, smaller.

DRAMs are slower, cheaper, larger.

Which one for cache and main memory, respectively?

Refresh overhead - suppose an SDRAM whose cells are organized in 8K rows; 4 clock cycles are needed to access each row; then it takes 8192 x 4 = 32,768 cycles to refresh all rows; at a clock rate of 133 MHz this takes 32,768 / (133 x 10^6) = 246 x 10^-6 seconds; with a typical refresh period of 64 ms, the refresh overhead is 0.246/64, roughly 0.4%.
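The refresh-overhead calculation above can be reproduced directly (same numbers as the slide):

```python
rows = 8 * 1024            # 8K rows
cycles_per_row = 4         # clock cycles to access one row
clock_hz = 133e6           # 133 MHz clock
refresh_period_s = 64e-3   # refresh every 64 ms

refresh_cycles = rows * cycles_per_row        # 32,768 cycles
refresh_time_s = refresh_cycles / clock_hz    # ~246 microseconds
overhead = refresh_time_s / refresh_period_s  # ~0.4%

print(f"refresh time: {refresh_time_s * 1e6:.0f} us, "
      f"overhead: {overhead:.2%}")
```

So refresh steals well under one percent of the memory's time at this clock rate.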

• 23/62

    Memory Controller

[Figure 5.11. Use of a memory controller: the processor sends Address, R/W, Request, and Clock to the memory controller, which generates the multiplexed Row/Column address, RAS, CAS, R/W, CS, and Clock for the memory; Data passes directly between processor and memory.]

• 24/62

    Read-Only Memories

• 25/62

Read-Only Memory

Volatile / non-volatile memory

ROM, PROM, EPROM, EEPROM

[Figure 5.12. A ROM cell: transistor T connects the bit line to ground through point P when the word line is activated; the connection at P is present to store a 0 and absent to store a 1.]

• 26/62

    Flash Memory

    Similar to EEPROM

Difference: it is only possible to write an entire block of cells, not a single cell

    Low power

Used in portable equipment. Implementations of such modules:

    Flash cards

    Flash drives

• 27/62

    Speed, Size, and Cost

[Figure 5.13. Memory hierarchy: registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; moving down the hierarchy, size increases while speed and cost per bit decrease.]

• 28/62

    Cache Memories

• 29/62

    Cache

What is a cache?

Why do we need it?

Locality of reference (very important)

- temporal
- spatial

Cache block - cache line

• 30/62

    Cache

Replacement algorithm

Hit / miss

Write-through / write-back

Load-through

[Figure 5.14. Use of a cache memory: Processor - Cache - Main memory.]

• 31/62

    Direct Mapping

[Figure 5.15. Direct-mapped cache: the cache holds 128 blocks of 16 words each (2K words); addresses are 16 bits; main memory holds 64K words (4K blocks of 16 words each); main-memory block j maps to cache block j mod 128; the memory address splits into a 5-bit tag, a 7-bit block field, and a 4-bit word field.]
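The 5/7/4 address split of this direct-mapped cache can be sketched in Python (an illustration, not from the slides; the address 0x1234 is just an example):

```python
def direct_mapped_fields(addr):
    """Split a 16-bit word address into the (tag, block, word) fields
    of a direct-mapped cache with 128 blocks of 16 words."""
    word = addr & 0xF           # low 4 bits: word within the 16-word block
    block = (addr >> 4) & 0x7F  # next 7 bits: cache block, i.e. j mod 128
    tag = addr >> 11            # high 5 bits: tag, i.e. j div 128
    return tag, block, word

# Example: address 0x1234 lies in main-memory block 0x123 = 291,
# and 291 mod 128 = 35, with tag 291 div 128 = 2.
print(direct_mapped_fields(0x1234))
```

The block field is exactly j mod 128 because the low 7 bits of the block number select the cache line.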

• 32/62

    Associative Mapping

[Figure 5.16. Associative-mapped cache: a main-memory block can be placed anywhere in the cache, so the memory address splits into a 12-bit tag and a 4-bit word field; on an access the entire cache is searched, comparing the tag against every block.]

• 33/62

    Set-Associative Mapping

[Figure 5.17. Set-associative-mapped cache with two blocks per set: the 128 cache blocks form 64 sets (Set 0 ... Set 63) of two blocks each; main-memory block j maps to set j mod 64; the memory address splits into a 6-bit tag, a 6-bit set field, and a 4-bit word field.]

Two or more blocks make a set.

A block may be placed in any one of the blocks of its set. Contention is minimized - there is a chance for either of the two blocks.

The search space is reduced (search 64 sets rather than 128 blocks).

• 34/62

    Cache Validity

Valid bit - indicates whether the data in a block is valid

Set to 0 when the system is powered on

Set to 0 when the block is in the cache but a DMA transfer into memory bypasses the cache

Set to 1 when the block is loaded with valid data

A similar issue arises for a DMA write from memory to disk: the memory may hold stale data (the cache has newer values)

Flush the cache before the DMA write

• 35/62

    Replacement Algorithms

Difficult to determine which blocks to kick out

Least Recently Used (LRU) block

The cache controller tracks references to all blocks as computation proceeds.

Increment / clear the tracking counters when a hit / miss occurs

    FIFO

• 36/62

    Performance Considerations

• 37/62

    Overview

Two key factors: performance and cost

Price/performance ratio

Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.

For a memory hierarchy, it is beneficial if transfers to and from the slower units can be done at a rate equal to that of the faster unit.

This is not possible if both the slow and the fast units are accessed in the same manner.

However, it can be achieved when parallelism is used in the organization of the slower unit.

• 38/62

    Interleaving

If the main memory is structured as a collection of physically separated modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.

[Figure 5.25. Addressing multiple-module memory systems. (a) Consecutive words in a module: the high-order k bits of the memory address select the module and the remaining m bits give the address within the module. (b) Consecutive words in consecutive modules: the low-order k bits select the module (0 ... 2^k - 1) and the high-order m bits give the address within the module.]

• 39/62

Hit Rate and Miss Penalty

The success rate in accessing information at various levels of the memory hierarchy - hit rate / miss rate.

Ideally, the entire memory hierarchy would appear to the processor as a single memory unit that has the access time of a cache on the processor chip and the size of a magnetic disk - this depends on the hit rate (>> 0.9).

A miss causes extra time to be needed to bring the desired information into the cache.

Example 5.2, page 332.

• 40/62

    Example

h - hit rate

M - miss penalty, the time to access main memory

C - time to access the cache

tavg = hC + (1 - h)M

No cache: time to get one word = 1 + 8 + 1 cycles.

Cache with 8-word blocks: time to get one block = 1 + 8 + 4 + 4 cycles (send the address, access the first word in all 4 interleaved modules simultaneously, transfer 4 words while the next 4 are accessed, then 4 cycles for the next 4 words).

1 cycle - send the address to memory

8 cycles - first word access

4 cycles - subsequent words

1 cycle - send a word to the cache
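The cycle counts and the tavg formula above can be combined in a short sketch (the 95% hit rate and 1-cycle cache time are assumed illustrative values, not from the slide):

```python
def t_avg(h, C, M):
    # Average access time: t_avg = h*C + (1 - h)*M
    return h * C + (1 - h) * M

# Cycle counts from the slide, with 4 interleaved memory modules:
no_cache_word = 1 + 8 + 1     # send address + access + send word = 10
block_miss = 1 + 8 + 4 + 4    # send address + first access (all 4 modules
                              # in parallel) + 4 words + next 4 words = 17

print(no_cache_word, block_miss)
print(t_avg(0.95, 1, block_miss))  # assumed h = 0.95, C = 1 cycle
```

Even with a 17-cycle miss penalty, a high hit rate keeps the average access close to the cache's own speed.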

• 41/62

    How to Improve Hit Rate?

Use a larger cache

Increase the block size while keeping the total cache size constant.

However, if the block size is too large, some items may not be referenced before the block is replaced - the miss penalty increases.

Load-through approach


• 42/62

Caches on the Processor Chip

On chip vs. off chip

Two separate caches for instructions and data, respectively, or a single cache for both

Which one has the better hit rate?

What is the benefit of separate caches?

Level 1 and Level 2 caches

L1 cache - faster and smaller. Access more than one word simultaneously and let the processor use them one at a time.

L2 cache - slower and larger.

How about the average access time?

Average access time:

tave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M

where h1 and h2 are the hit rates of the L1 and L2 caches, C1 and C2 their access times, and M the miss penalty to main memory.
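The two-level formula can be tried with assumed illustrative numbers (1-cycle L1, 10-cycle L2, 100-cycle main memory; none of these come from the slides):

```python
def t_ave(h1, h2, C1, C2, M):
    # Two-level average access time:
    # t_ave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

print(t_ave(h1=0.95, h2=0.90, C1=1, C2=10, M=100))
```

The L2 cache turns most L1 misses into 10-cycle events instead of 100-cycle trips to memory.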

• 43/62

    Other Enhancements

Write buffer - the processor doesn't need to wait for the memory write to be completed

Prefetching - prefetch data into the cache before it is needed

Lockup-free cache - the processor is able to access the cache while a miss is being serviced

• 44/62

    Virtual Memories

• 45/62

Overview

Physical main memory is not as large as the address space spanned by an address issued by the processor.

2^32 = 4 GB, 2^64 = 16 EB

When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices.

Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques.

Virtual addresses will be translated into physical addresses.


• 46/62

    Overview

• 47/62

    Address Translation

All programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory.

A page cannot be too small or too large.

The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage - similar to a cache.

• 48/62

    Address Translation

[Figure 5.27. Virtual-memory address translation: the virtual address from the processor splits into a virtual page number and an offset; the page table base register plus the virtual page number locate the page table entry, which holds control bits (valid bit, modified bit, read-only bit, ...) and the page frame; the page frame concatenated with the offset gives the physical address in main memory.]

• 49/62

    Address Translation

The page table information is used by the MMU for every access, so it is supposed to be kept with the MMU.

However, since the MMU is on the processor chip and the page table is rather large, only a small portion of it - the page table entries that correspond to the most recently accessed pages - can be accommodated within the MMU.

Translation Lookaside Buffer (TLB)

• 50/62

    TLB

[Figure 5.28. Use of an associative-mapped TLB: the virtual address from the processor splits into a virtual page number and an offset; the virtual page number is compared (=?) against the entries in the TLB; on a hit, the matching entry's control bits and page frame are read out and the page frame plus the offset form the physical address in main memory; on a miss, the page table in memory must be consulted.]

• 51/62

TLB

The contents of the TLB must be coherent with the contents of the page tables in memory.

Translation procedure

Page fault

Page replacement

Write-through is not suitable for virtual memory.

Locality of reference in virtual memory

• 52/62

Memory Management Requirements

    Multiple programs

    System space / user space

    Protection (supervisor / user state,privileged instructions)

    Shared pages

• 53/62

    Secondary Storage

• 54/62

    Magnetic Hard Disks

    Disk

    Disk drive

    Disk controller

• 55/62

    Organization of Data on a Disk

[Figure 5.30. Organization of one surface of a disk: concentric tracks divided into sectors, e.g. sector 0, track 0; sector 3, track n; sector 0, track 1.]

• 56/62

    Access Data on a Disk

Sector header

Following the data, there is an error-correction code (ECC).

Formatting process

Difference between inner tracks and outer tracks

Access time = seek time + rotational delay (latency time)

Data buffer / cache
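The access-time sum above can be sketched numerically (an illustration, not from the slides; the 9 ms seek time and 7200 RPM are assumed example figures, and the average rotational delay is taken as half a revolution):

```python
def avg_access_time_ms(seek_ms, rpm):
    # Access time = seek time + rotational delay; on average the head
    # waits half a revolution for the target sector.
    half_rev_ms = 0.5 * 60_000 / rpm  # 60,000 ms per minute
    return seek_ms + half_rev_ms

# Assumed example: 9 ms average seek on a 7200 RPM drive.
print(f"{avg_access_time_ms(9, 7200):.1f} ms")
```

Data transfer time would be added on top once the sector reaches the head.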

• 57/62

    Disk Controller

[Figure 5.31. Disks connected to the system bus: the processor and main memory attach to the system bus, as does a disk controller serving two disk drives.]

• 58/62

    Disk Controller

    Seek

    Read

    Write

    Error checking

• 59/62

    RAID Disk Arrays

Redundant Array of Inexpensive Disks

Using multiple disks makes huge storage cheaper and also makes it possible to improve the reliability of the overall system.

RAID 0 - data striping

RAID 1

RAID 2, 3, 4

• 60/62

Optical Disks

[Figure 5.32. Optical disk. (a) Cross-section: label, acrylic, aluminum, and polycarbonate plastic layers; a source/detector pair reads the surface - a pit gives no reflection, a land gives reflection. (b) A transition from pit to land (or land to pit) represents a 1. (c) Stored binary pattern: 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0.]

• 61/62

    Optical Disks

    CD-ROM

    CD-Recordable (CD-R)

    CD-ReWritable (CD-RW)

    DVD

    DVD-RAM

• 62/62

    Magnetic Tape Systems

[Figure 5.33. Organization of data on magnetic tape: 7 or 9 bits are recorded across the tape; records are separated by record gaps; files are separated by file gaps and marked by file marks.]