Computer Architecture 2010 – PC Structure and Peripherals
Computer Architecture
PC Structure and Peripherals
Dr. Lihu Rappoport
Memory
Technology Trends
         Capacity         Speed
Logic    2× in 3 years    2× in 3 years
DRAM     4× in 3 years    1.4× in 10 years
Disk     2× in 3 years    1.4× in 10 years
[Figure: CPU-DRAM Memory Gap (latency). Performance (log scale, 1 to 1000) vs. Time (1980 to 2000), one curve for CPU and one for DRAM]
SRAM vs. DRAM
Random Access: access time is the same for all locations
                 DRAM – Dynamic RAM               SRAM – Static RAM
Refresh          Regular refresh (~1% of time)    No refresh needed
Address          Address muxed: row + column      Address not multiplexed
Access           Not true "Random Access"         True "Random Access"
Density          High (1 transistor/bit)          Low (6 transistors/bit)
Power            Low                              High
Speed            Slow                             Fast
Price/bit        Low                              High
Typical usage    Main memory                      Cache
Basic DRAM chip
Addressing sequence
  Row address and then RAS# asserted
  RAS# to CAS# delay
  Column address and then CAS# asserted
  DATA transfer
[Figure: DRAM chip block diagram. Labels: Memory address bus (Addr), Row latch, Row address decoder, RAS#, Column latch, Column addr decoder, CAS#, Memory array, Data]
Addressing sequence
Access sequence
  Put row address on the address bus and assert RAS#
  Wait for RAS# to CAS# delay (tRCD)
  Put column address on the address bus and assert CAS#
  DATA transfer
  Precharge
[Figure: timing diagram of RAS#, CAS#, A[0:7] (Row i, Col n, Row j) and Data (Data n), marking tRAC (access time), RAS/CAS delay, CL (CAS latency) and Precharge delay]
DRAM Timing
CAS Latency: #clock cycles to access a specific column of data
  #clock cycles from the moment the memory controller issues a column address in the current row until the data is read out from memory
RAS to CAS Delay: #clock cycles between row and column access
Row Pre-charge time: #clock cycles to close an open row, so that the next row can be opened
Active to Precharge Delay: minimum #clock cycles between activating (opening) a specific row and issuing the pre-charge command
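
The timing parameters above can be combined into a rough access-latency estimate. The following is a minimal sketch, not a model of any specific device: the three cases (row already open, no row open, a different row open) and the names `DramTimings` and `access_cycles` are assumptions introduced for illustration, and the 2-3-2 values are example numbers only.

```python
from dataclasses import dataclass


@dataclass
class DramTimings:
    cl: float    # CAS latency, in clock cycles
    t_rcd: int   # RAS to CAS delay, in clock cycles
    t_rp: int    # row pre-charge time, in clock cycles


def access_cycles(t: DramTimings, case: str) -> float:
    """Cycles from the first command until data appears (illustrative cases)."""
    if case == "row_open":       # requested row already open: column access only
        return t.cl
    if case == "row_closed":     # no row open: open the row, then column access
        return t.t_rcd + t.cl
    if case == "row_conflict":   # another row open: pre-charge, open, column access
        return t.t_rp + t.t_rcd + t.cl
    raise ValueError(f"unknown case: {case}")


timings = DramTimings(cl=2, t_rcd=3, t_rp=2)  # example values only
for case in ("row_open", "row_closed", "row_conflict"):
    print(case, access_cycles(timings, case))
```

Under these assumptions an access to an already open row costs only the CAS latency, which is why keeping rows open for nearby accesses pays off.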
Basic SDRAM controller
[Figure: DRAM controller block diagram. Labels: Memory address bus, A[0:9], A[10:19], A[20:23], address mux (Select), address decoder (Chip select), Time delay gen., RAS#, CAS#, R/W#, D[0:7], DRAM]
DRAM data must be periodically refreshed
  Needed to keep data correct
  DRAM controller performs DRAM refresh, using a refresh counter
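
The address fields labeled in the controller diagram (A[0:9], A[10:19], A[20:23]) suggest how a 24-bit address is split before it reaches the DRAM. The sketch below assumes that A[0:9] is the column address, A[10:19] is the row address, and A[20:23] is decoded into the chip select; the diagram labels alone do not confirm which of the two 10-bit fields is the row, and the function name `split_address` is hypothetical.

```python
def split_address(addr: int) -> dict:
    """Split a 24-bit address into the three fields labeled in the diagram."""
    if not 0 <= addr < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    return {
        "column": addr & 0x3FF,             # A[0:9]   (assumed column address)
        "row": (addr >> 10) & 0x3FF,        # A[10:19] (assumed row address)
        "chip_select": (addr >> 20) & 0xF,  # A[20:23] (decoded into chip select)
    }


print(split_address(0x5ABCDE))
```

The address mux would then drive the row field and the column field onto the same DRAM address pins in turn, under control of the Select signal.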
Improved DRAM Schemes
Paged Mode DRAM
  – Multiple accesses to different columns from same row
  – Saves RAS and RAS to CAS delay
[Figure: paged mode timing diagram of RAS#, CAS#, A[0:7] (Row, Col n, Col n+1, Col n+2) and Data (D n, D n+1, D n+2)]
Extended Data Output RAM (EDO RAM)
  – A data output latch allows the next column address to overlap with the current column data
[Figure: EDO timing diagram of RAS#, CAS#, A[0:7] (Row, Col n, Col n+1, Col n+2) and Data (Data n, Data n+1, Data n+2)]
Improved DRAM Schemes (cont)
Burst DRAM
  – Generates consecutive column addresses by itself
[Figure: burst timing diagram of RAS#, CAS#, A[0:7] (Row, Col n) and Data (Data n, Data n+1, Data n+2)]
Synchronous DRAM – SDRAM
All signals are referenced to an external clock (100MHz-200MHz)
  Makes timing more precise with other system devices
Multiple Banks
  Multiple pages open simultaneously (one per bank)
Command driven functionality instead of signal driven
  ACTIVE: selects both the bank and the row to be activated
    • ACTIVE to a new bank can be issued while accessing current bank
  READ/WRITE: select column
Read and write accesses to the SDRAM are burst oriented
  Successive column locations accessed in the given row
  Burst length is programmable: 1, 2, 4, 8, and full-page
    • May end full-page burst by BURST TERMINATE to get arbitrary burst length
A user programmable Mode Register
  CAS latency, burst length, burst type
Auto pre-charge: may close row at last read/write in burst
Auto refresh: internal counters generate refresh address
SDRAM Timing
tRCD: ACTIVE to READ/WRITE gap, in clock cycles = tRCD(MIN) / clock period, rounded up
tRC: gap between successive ACTIVE commands to different rows in the same bank
tRRD: gap between successive ACTIVE commands to different banks
[Figure: SDRAM timing diagram (CL=2, BL=1) with rows clock, cmd, Bank, Addr and Data. Commands shown, separated by NOPs: ACT Bank 0 Row i; RD Bank 0 Col j (tRCD > 20ns); RD+PC Bank 0 Col k; ACT Bank 0 Row l (tRC > 70ns); ACT Bank 1 Row m (tRRD > 20ns); RD Bank 1 Col n; RD Bank 0 Col q. Data out: Data j, Data k, Data n, Data q]
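
The nanosecond limits in the timing diagram (tRCD > 20ns, tRC > 70ns, tRRD > 20ns) have to be turned into whole clock cycles by the controller. A minimal sketch, assuming those figures are minimum gaps and that the gap is rounded up to the next full cycle; the helper name `ns_to_cycles` and the two clock rates are chosen for illustration.

```python
def ns_to_cycles(t_min_ns: int, clock_mhz: int) -> int:
    """Smallest whole number of clock cycles that covers t_min_ns."""
    # t_min_ns / period_ns equals t_min_ns * clock_mhz / 1000.
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-t_min_ns * clock_mhz // 1000)


limits_ns = {"tRCD": 20, "tRC": 70, "tRRD": 20}
for clock_mhz in (100, 133):
    cycles = {name: ns_to_cycles(t, clock_mhz) for name, t in limits_ns.items()}
    print(clock_mhz, "MHz:", cycles)
```

The same minimum time costs more cycles at a faster clock, so raising the clock does not shorten these gaps.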
DDR-SDRAM
2n-prefetch architecture
  The DRAM cells are clocked at the same speed as SDR SDRAM
  Internal data bus is twice the width of the external data bus
  Data capture occurs twice per clock cycle
    • Lower half of the bus sampled at clock rise
    • Upper half of the bus sampled at clock fall
Uses 2.5V (vs. 3.3V in SDRAM)
  Reduced power consumption
[Figure: SDRAM Array with a 2n-bit internal bus (0:2n-1), split into halves 0:n-1 and n:2n-1, feeding an n-bit external bus (0:n-1) on a 200MHz clock]
DDR SDRAM Timing
[Figure: DDR SDRAM timing diagram (133MHz clock, CL=2) with rows clock, cmd, Bank, Addr and Data. Commands shown, separated by NOPs: ACT Bank 0 Row i; RD Bank 0 Col j (tRCD > 20ns); ACT Bank 0 Row l (tRC > 70ns); ACT Bank 1 Row m (tRRD > 20ns); RD Bank 1 Col n. Data out: j, j+1, j+2, j+3, n, n+1, n+2, n+3]
DIMMs
DIMM: Dual In-line Memory Module
  A small circuit board that holds memory chips
64-bit wide data path (72 bit with parity)
  Single sided: 9 chips, each with 8 bit data bus
    • 512 Mbit / chip × 8 chips = 512 Mbyte per DIMM
  Dual sided: 18 chips, each with 4 bit data bus
    • 256 Mbit / chip × 16 chips = 512 Mbyte per DIMM
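
The DIMM arithmetic above can be checked with a few lines. This is a sketch under the assumption that the capacity counts only the data chips (8 of 9, or 16 of 18), with the remaining chips holding parity; the function names are illustrative.

```python
def dimm_capacity_mbyte(chip_mbit: int, data_chips: int) -> int:
    """Capacity in MByte from per-chip density (Mbit) and number of data chips."""
    return chip_mbit * data_chips // 8  # 8 bits per byte


def data_path_bits(data_chips: int, bits_per_chip: int) -> int:
    """Width of the data path formed by the data chips side by side."""
    return data_chips * bits_per_chip


print(dimm_capacity_mbyte(512, 8), data_path_bits(8, 8))    # single sided
print(dimm_capacity_mbyte(256, 16), data_path_bits(16, 4))  # dual sided
```

Both configurations reach the same capacity and the same 64-bit data path, using either fewer wide chips or more narrow ones.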
DRAM Standards
SDR SDRAM: PC66, PC100 and PC133
DDR SDRAM
  Total BW for DDR400: 3200 MByte/sec = 64 bit × 2 × 200MHz / 8 (bit/byte)
Dual channel DDR SDRAM
  Uses 2 64-bit DIMM modules in parallel to get a 128-bit data bus
  Total BW for DDR400 dual channel: 6400 MByte/sec = 128 bit × 2 × 200MHz / 8
                                DDR200   DDR266   DDR333   DDR400   DDR533
Bus freq (MHz)                  100      133      167      200      266
Bit/pin (Mbps)                  200      266      333      400      533
Total bandwidth (MByte/sec)     1600     2133     2666     3200     4264
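
The bandwidth figures follow directly from bus width, bus frequency and the two transfers per clock of DDR. A minimal sketch of that formula; the function name and parameter defaults are assumptions for illustration.

```python
def peak_bandwidth_mbyte_s(bus_mhz: int, bus_bits: int = 64,
                           transfers_per_clock: int = 2, channels: int = 1) -> int:
    """Peak bandwidth in MByte/sec: width × transfers per clock × frequency / 8."""
    return bus_mhz * transfers_per_clock * bus_bits * channels // 8


print(peak_bandwidth_mbyte_s(200))              # DDR400, single channel
print(peak_bandwidth_mbyte_s(200, channels=2))  # DDR400, dual channel
print(peak_bandwidth_mbyte_s(100))              # DDR200
```

Entries such as DDR266 come out slightly lower with an integer 133 MHz input than in the table, since the nominal bus frequency is 133.33 MHz.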
DDR2
DDR2 achieves high speed using a 4-bit prefetch architecture
  SDRAM cells read/write 4× the amount of data as the external bus
  DDR2-533 cell works at the same frequency as a DDR266 SDRAM or a PC133 SDRAM cell
This method comes at a price of increased latency
  DDR2-based systems may perform worse than DDR1-based systems
DDR2 – Other Features
Shortened page size for reduced activation power
  When ACTIVATE command is given, all bits in the page are read
    • A major contributor to the active power
  A device with shorter page size has significantly lower power
  512Mb DDR2 page size is 1KByte vs. 2KB for 512Mb DDR1
Eight banks in 1Gb densities and above
  Increases flexibility in DRAM accesses (also increases the power)
                     DDR1                    DDR2
DRAM Frequency       100/133/166/200 MHz     100/133/166/200 MHz
Bus Frequency        100/133/166/200 MHz     200/266/333/400 MHz
Data Rate            200/266/333/400 Mbps    400/533/667/800 Mbps
Operation Voltage    2.5V                    1.8V
CAS Latency          2, 2.5, 3               3, 4, 5
Data Bandwidth       3.2 GB/s                6.4 GB/s
Power Consumption    399mW                   217mW
DDR2 Latency
DRAM timing, measured in I/O bus cycles, specifies 3 numbers
  CAS Latency - RAS to CAS Delay - RAS Precharge Time
DRAM latency is the time for accessing data in an open page
  E.g., latency for DDR400 2-3-2: 1/200MHz × 2 = 10ns
  DDR2-533 4-4-4 latency is 1.5× that of DDR400 2–3–2
    • 30% bandwidth growth does not compensate for access time worsening
  DDR2-533 3-3-3 latency is only 12% worse than DDR400 2-3-2
Memory      Mem Clk   Bus Clk   Timings    Latency    Dual-channel BW
DDR400      200MHz    200MHz    2.5–3–3    12.5 ns    6.4 GB/sec
DDR400      200MHz    200MHz    2–3–2      10 ns      6.4 GB/sec
DDR533      266MHz    266MHz    3–4–4      11.2 ns    8.5 GB/sec
DDR533      266MHz    266MHz    2.5–3–3    9.4 ns     8.5 GB/sec
DDR2-533    133MHz    266MHz    4–4–4      15 ns      8.5 GB/sec
DDR2-533    133MHz    266MHz    3–3–3      11.2 ns    8.5 GB/sec
DDR2-600    150MHz    300MHz    5–5–5      16.6 ns    9.6 GB/sec
DDR2-600    150MHz    300MHz    4–4–4      13.3 ns    9.6 GB/sec
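
The Latency column can be reproduced from the first timing number (CAS latency) and the bus clock. A small sketch; the function name is illustrative, and the results use a nominal 266 MHz rather than 266.67 MHz, so they may differ from the table in the last digit.

```python
def cas_latency_ns(cas_cycles: float, bus_clk_mhz: float) -> float:
    """Open-page latency in ns: CAS latency (bus cycles) × bus cycle time."""
    return cas_cycles / bus_clk_mhz * 1000.0


rows = [
    ("DDR400 2-3-2", 2, 200),
    ("DDR400 2.5-3-3", 2.5, 200),
    ("DDR2-533 4-4-4", 4, 266),
    ("DDR2-533 3-3-3", 3, 266),
]
for name, cl, clk in rows:
    print(f"{name}: {cas_latency_ns(cl, clk):.1f} ns")
```

Dividing the DDR2-533 4-4-4 result by the DDR400 2-3-2 result gives the 1.5× ratio quoted for this slide.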
DDR2 Latency (cont.)
Performance tests
  DDR2-533 with 4-4-4 timings: worse than DDR400 2–3–2
  DDR2-533 with 3-3-3 timings: better than DDR400 2–3–2
DDR2-533 modules with 3-3-3 timings
  Supported by i925/i915
  Best choice for enthusiast users
  Significant improvement
Over-clocked motherboards clock DDR2-533 at 600MHz
  Realized through undocumented memory frequency ratios available in i925/i915
The performance of DDR2-based systems is more sensitive to a lower latency than to a higher frequency
  We get practically nothing from using DDR2-600 SDRAM with i925/i915
DDR2 Standards
Standard name   Memory clock   Cycle time   I/O Bus clock   Data transfers per second   Module name   Peak transfer rate   Timings
DDR2-400        100 MHz        10 ns        200 MHz         400 Million                 PC2-3200      3200 MB/s            3-3-3, 4-4-4
DDR2-533        133 MHz        7.5 ns       266 MHz         533 Million                 PC2-4200      4266 MB/s            3-3-3, 4-4-4
DDR2-667        166 MHz        6 ns         333 MHz         667 Million                 PC2-5300      5333 MB/s            4-4-4, 5-5-5
DDR2-800        200 MHz        5 ns         400 MHz         800 Million                 PC2-6400      6400 MB/s            4-4-4, 5-5-5, 6-6-6
DDR2-1066       266 MHz        3.75 ns      533 MHz         1066 Million                PC2-8500      8533 MB/s            6-6-6, 7-7-7
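
Each row of the table can be derived from the memory clock alone: the I/O bus runs at twice the memory clock, data moves on both clock edges, and the module is 64 bits (8 bytes) wide. A sketch of that chain, with a hypothetical helper name; rows with fractional nominal clocks (133.33, 166.67, 266.67 MHz) will come out slightly lower than the table when fed rounded integers.

```python
def ddr2_figures(memory_clock_mhz: int) -> dict:
    """Derive the DDR2 table columns from the memory (cell) clock."""
    io_clock = 2 * memory_clock_mhz  # I/O bus runs at 2× the memory clock
    transfers = 2 * io_clock         # data on both rising and falling edges
    return {
        "cycle_time_ns": 1000 / memory_clock_mhz,
        "io_bus_clock_mhz": io_clock,
        "mega_transfers_per_s": transfers,
        "peak_mbyte_s": transfers * 8,  # 64-bit module = 8 bytes per transfer
    }


print(ddr2_figures(100))  # DDR2-400
print(ddr2_figures(200))  # DDR2-800
```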
DDR3
30% power consumption reduction compared to DDR2
  1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V
  90 nanometer fabrication technology
Higher bandwidth
  8 bit deep prefetch buffer (vs. 4 bit in DDR2 and 2 bit in DDR)
Transfer data rate
  Effective clock rate of 800–1600 MHz using both rising and falling edges of a 400–800 MHz I/O clock
  DDR2: 400–800 MHz using a 200–400 MHz I/O clock
  DDR: 200–400 MHz based on a 100–200 MHz I/O clock
DDR3 DIMMs
  240 pins, the same number as DDR2, and are the same size
  Electrically incompatible, and have a different key notch location
DDR2 vs. DDR3 Performance
The high latency of DDR3 SDRAM has a negative effect on streaming operations
Source: xbitlabs
SRAM – Static RAM
True random access
High speed, low density, high power
No refresh
Address not multiplexed
DDR SRAM
  2 READs or 2 WRITEs per clock
  Common or Separate I/O
  DDRII: 200MHz to 333MHz Operation; Density: 18/36/72Mb+
QDR SRAM
  Two separate DDR ports: one read and one write
  One DDR address bus: alternating between the read address and the write address
  QDRII: 250MHz to 333MHz Operation; Density: 18/36/72Mb+
Read Only Memory (ROM)
Random Access
Non volatile
ROM Types
  PROM – Programmable ROM
    • Burnt once using special equipment
  EPROM – Erasable PROM
    • Can be erased by exposure to UV, and then reprogrammed
  E2PROM – Electrically Erasable PROM
    • Can be erased and reprogrammed on board
    • Write time (programming) much longer than RAM
    • Limited number of writes (thousands)
Flash Memory
Non-volatile, rewritable memory
  Limited lifespan of around 100,000 write cycles
Flash drives compared to HD drives:
  Smaller size, faster, lighter, noiseless, consume less energy
  Withstand shocks up to 2000 Gs
    • Equivalent to a 10 foot drop onto concrete - without losing data
  Lower capacity (8GB), but going up
  Much more expensive (cost/byte): currently ~20$/1GB
NOR Flash
  Supports per-byte addressing
  Suitable for storing code (e.g. BIOS, cell phone SW)
NAND Flash
  Supports page-mode addressing (e.g., 1KB blocks)
  Suitable for storing large data (e.g. pictures, songs)
The Motherboard
Computer System Structure
[Figure: system block diagram. Components shown: CPU, Cache, CPU BUS, North Bridge (Memory controller, On-board Graphics), Mem BUS, DDRII Channel 1, DDRII Channel 2, PCI express ×16, External Graphics Card, South Bridge, PCI, Sound Card, speakers, Lan Adap, LAN, PCI express ×1, USB controller, SATA controller, Hard Disk, IDE controller, Old DVD Drive, IO Controller, Parallel Port, Serial Port, Floppy Drive, keybrd, mouse]
Computer System Structure – New
[Figure: system block diagram. Components shown: CPU, Cache, Memory controller, Mem BUS, DDRIII Channel 1, DDRIII Channel 2, CPU BUS, North Bridge, On-board Graphics, PCI express ×16, External Graphics Card, South Bridge, PCI, Sound Card, speakers, Lan Adap, LAN, PCI express ×1, USB controller, SATA controller, Hard Disk, IDE controller, DVD Drive, IO Controller, Parallel Port, Serial Port, Floppy Drive, keybrd, mouse]
The Motherboard
[Figure: annotated motherboard photo. Labeled components:]
  IEEE-1394a header
  Audio header
  PCI add-in card connector
  PCI express x1 connector
  PCI express x16 connector
  Back panel connectors
  Processor core power connector
  Rear chassis fan header
  LGA775 processor socket
  GMCH: North Bridge + integ GFX
  Processor fan header
  DIMM Channel A sockets
  Serial port header
  DIMM Channel B sockets
  Diskette drive connector
  Main Power connector
  Battery
  ICH: South Bridge + integ Audio
  4 × SATA connectors
  Front panel USB header
  Speaker
  Parallel ATA IDE connector
  High Def. Audio header
  PCI add-in card connector
How to get the most of Memory?
Single Channel DDR
Dual channel DDR
  Each DIMM pair must be the same
Balance FSB and memory bandwidth
  800MHz FSB provides 800MHz × 64bit / 8 = 6.4 GByte/sec
  Dual Channel DDR400 SDRAM also provides 6.4 GByte/sec
[Figures: single-channel and dual-channel configurations. Labels: CPU, L2 Cache, FSB – Front Side Bus, North Bridge, DRAM Ctrlr, Memory Bus, CH A, CH B, DDR DIMMs]
How to get the most of Memory?
Each DIMM supports 4 open pages simultaneously
  The more open pages, the more random access
  It is better to have more DIMMs
    • n DIMMs: 4n open pages
DIMMs can be single sided or dual sided
  Dual sided DIMMs may have a separate CS for each side
    • In this case the number of open pages is doubled (goes up to 8)
    • This is not a must – dual sided DIMMs may also have a common CS for both sides, in which case there are only 4 open pages, as with a single sided DIMM
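
The open-page count described above is a simple product. A minimal sketch, assuming all DIMMs in the system are of the same kind; the function and parameter names are illustrative.

```python
def open_pages(num_dimms: int, dual_sided_separate_cs: bool = False,
               pages_per_side: int = 4) -> int:
    """Number of pages that can be open simultaneously across all DIMMs."""
    # A dual sided DIMM with a separate CS per side behaves like two sides of 4 pages.
    sides = 2 if dual_sided_separate_cs else 1
    return num_dimms * pages_per_side * sides


print(open_pages(2))                               # two single sided DIMMs
print(open_pages(2, dual_sided_separate_cs=True))  # two dual sided DIMMs, separate CS
```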
Hard Disks
Hard Disk Structure
Direct access
Nonvolatile, large, inexpensive, and slow
  Lowest level in the memory hierarchy
Technology
  Rotating platters coated with a magnetic surface
  Use a moveable read/write head to access the disk
  Each platter is divided into tracks: concentric circles
  Each track is divided into sectors
    • Smallest unit that can be read or written
  Disk outer parts have more space for sectors than the inner parts
    • Constant bit density: record more sectors on the outer tracks
    • Speed varies with track location
Buffer Cache
  A temporary data storage area used to enhance drive performance
[Figure: disk structure showing Platters, a Track and a Sector]
The IBM Ultrastar 36ZX
Top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk
  10 stacked platters
Disk Access
Read/write data is a three-stage process
Seek time: position the arm over the proper track
  Average: sum of the time for all possible seeks / total # of possible seeks
  Due to locality of disk reference, actual average seek is shorter: 4 to 12 ms
Rotational latency: wait for desired sector to rotate under head
  The faster the drive spins, the shorter the rotational latency time
  Most disks rotate at 5,400 to 15,000 RPM
    • At 7200 RPM: about 8.3 ms per revolution
  An average latency to the desired information is halfway around the disk
    • At 7200 RPM: about 4.2 ms
Transfer block: read/write the data
  Transfer Time is a function of:
    • Sector size
    • Rotation speed
    • Recording density: bits per inch on a track
  Typical values: 100 MB / sec
Disk Access Time = Seek time + Rotational Latency + Transfer time + Controller Time + Queuing Delay
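
The access-time formula can be evaluated numerically. The sketch below assumes the average rotational latency is half a revolution and that 1 MB = 10^6 bytes; the 8 ms seek, 4096-byte block and 100 MB/sec rate are example inputs, and all function names are illustrative.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in ms."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(size_bytes: int, rate_mb_s: float) -> float:
    """Time to transfer size_bytes at rate_mb_s (1 MB = 10**6 bytes), in ms."""
    return size_bytes / (rate_mb_s * 1e6) * 1000


def disk_access_time_ms(seek_ms: float, rpm: float, size_bytes: int,
                        rate_mb_s: float, controller_ms: float = 0.0,
                        queuing_ms: float = 0.0) -> float:
    """Seek + rotational latency + transfer + controller time + queuing delay."""
    return (seek_ms + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(size_bytes, rate_mb_s)
            + controller_ms + queuing_ms)


print(f"{avg_rotational_latency_ms(7200):.2f} ms average rotational latency")
print(f"{disk_access_time_ms(8.0, 7200, 4096, 100):.2f} ms total access time")
```

With these example inputs the seek and the rotational latency dominate; the transfer of a small block contributes only a small fraction of a millisecond.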
The Disk Interface – EIDE
EIDE, ATA, UltraATA, ATA 100, ATAPI: all the same interface
  Used for connecting hard disk drives and CD-ROM drives
  80-conductor cable, 40-pin dual header connector
  100 MB/s (ATA66 is only 66MB/s)
  EIDE controller integrated with the motherboard (in the ICH)
EIDE controller has two channels: a primary and a secondary
  Work independently
  Two devices per channel: master and slave, but equal
    • The 2 devices have to take turns controlling the bus
    • A total of four devices per controller
  If there are two devices on the system (e.g., a hard disk and a CD-ROM)
    • It is better to put them on different channels
  Avoid mixing slower (CD) and faster devices (HDD) on the same channel
  If doing a lot of copying from a CD-ROM drive to a CD-RW drive
    • Better performance by separating the devices to separate channels
The Disk Interface – Serial ATA (SATA)
Point-to-point connection
  Ensures dedicated 150 MB/s per device (no sharing)
  Dual controllers allow independent operation of each device
Thinner (7 wires), flexible, longer cables
  Easier routing and improved airflow
  4 wires for signaling + 3 ground wires to minimize impedance and crosstalk
New 7-pin connector design
  For easier installation and better device reliability
  Takes 1/6 the area on the system board
CRC error checking on all data and control information
Increased BW supports data intensive applications such as digital video production, digital audio storage and recording, high-speed file sharing
No configuration needed when adding a 2nd SATA drive
  One cable for each drive eliminates the need for jumpers
  No more figuring out which device is the master or slave
Today's hard drives are clearly below 100 MB/s
  Do not benefit from UltraATA / SATA
The BIOS
System Start-up
Upon computer turn-on several events occur:
1. The CPU "wakes up" and sends a message to activate the BIOS
2. BIOS runs the Power On Self Test (POST): make sure system devices are working ok
  Initialize system hardware and chipset registers
  Initialize power management
  Test RAM
  Enable the keyboard
  Test serial and parallel ports
  Initialize floppy disk drives and hard disk drive controllers
  Display system summary information
System Start-up (cont.)
3. During POST, the BIOS compares the system configuration data obtained from POST with the system information stored on a memory chip located on the MB
  A CMOS chip, which is updated whenever new system components are added
  Contains the latest information about system components
4. After the POST tasks are completed
  The BIOS looks for the boot program responsible for loading the operating system
  Usually, the BIOS looks on the floppy disk drive A: followed by drive C:
5. After the boot program is loaded into memory
  It loads the system configuration information contained in the registry in a Windows® environment, and device drivers
6. Finally, the operating system is loaded