Post on 30-Dec-2015
Large Computer Systems
CE 140 A1/A2, 27 August 2003
Rationale
- Although computers are getting faster, the demands on them are increasing at least as fast
- High-performance applications: simulations and modeling
- Circuit speed cannot be increased indefinitely; eventually, physical limits will be reached and quantum mechanical effects will become a problem
- To handle larger problems, parallel computers are used
- Machine-level parallelism: replicates entire CPUs or portions of them
Design Issues
What are the nature, size, and number of the processing elements?
What are the nature, size, and number of the memory modules?
How are the processing and memory elements interconnected?
What applications are to be run in parallel?
Grain Size
- Coarse-grained parallelism: the unit of parallelism is larger; large pieces of software run in parallel with little or no communication between the pieces. Example: large time-sharing systems
- Fine-grained parallelism: parallel programs whose pieces communicate with each other heavily
Tightly Coupled versus Loosely Coupled
- Loosely coupled: a small number of large, independent CPUs with relatively low-speed connections to each other
- Tightly coupled: smaller processing units that work closely together over high-bandwidth connections
Design Issues
In most cases:
- Coarse-grained parallelism is well suited to loosely coupled systems
- Fine-grained parallelism is well suited to tightly coupled systems
Communication Models
In a parallel computer system, CPUs communicate with each other to exchange information
Two general types: multiprocessors and multicomputers
Multiprocessors
- Shared memory system: all processors may share a single virtual address space
- An easy model for programmers
- Global memory: any processor can access any memory module without intervention by another processor
Uniform Memory Access (UMA) Multiprocessor
[Figure: processors P1, P2, …, Pn connected through an interconnection network to shared memory modules M1, M2, …, Mk]
Non-Uniform Memory Access (NUMA) Multiprocessor
[Figure: processors P1, P2, …, Pn, each with a local memory module M1, M2, …, Mn, connected through an interconnection network]
Multicomputers
- Distributed memory system: each CPU has its own private memory
- Local/private memory: a processor cannot access a remote memory without the cooperation of the remote processor
- Cooperation takes place in the form of a message-passing protocol
- Programming a multicomputer is much more difficult than programming a multiprocessor
Distributed Memory System
[Figure: processors P1, P2, …, Pn, each with a private memory module M1, M2, …, Mn, exchanging messages over an interconnection network]
Multiprocessors versus Multicomputers
- Multiprocessors are easier to program
- But multicomputers are much simpler and cheaper to build
- Goal: large computer systems that combine the best of both worlds
Taxonomy of Large Computer Systems
Instruction Streams | Data Streams | Name | Examples
1                   | 1            | SISD | Classical von Neumann machine
1                   | Multiple     | SIMD | Vector supercomputer, array processor
Multiple            | 1            | MISD | None
Multiple            | Multiple     | MIMD | Multiprocessor, multicomputer
Symmetric MultiProcessors (SMP)
- Multiprocessor architecture where all processors can access all memory locations uniformly
- Processors also share I/O
- SMP is classified as a UMA architecture
- SMP is the simplest multiprocessor system
- Any processor can execute either the OS kernel or user programs
SMP
- Performance improves if programs can be run in parallel
- Increased availability: if one processor breaks down, the system does not stop running
- Performance can also be improved incrementally by adding processors
- Does not scale well beyond 16 processors
Clusters
A group of whole computers connected together to function as a parallel computer
Popular implementation: Linux computers using Beowulf clustering software
- High availability: redundant resources
- Scalability
- Affordable: built from off-the-shelf parts
[Figure: the Cyborg cluster at Drexel University, 32 nodes with dual Pentium III processors per node]
Memory Organization
- Shared memory system (multiprocessors)
  - Each processor may also have a cache
  - Convenient to have a global address space
  - For NUMA, accesses to remote memory may be slower than accesses to local memory
- Distributed memory system (multicomputers)
  - Private address space for each processor
  - Easiest way to connect computers into a large system
  - Data sharing is implemented through message passing
Issues
- When processors share data, all processors must see the same value for a given data item
- When a processor updates its cache, it must also update the caches of the other processors, or invalidate the other processors' copies
- Shared data must be kept coherent
Cache Coherence
All cached copies of shared data must have the same value at all times
Snooping Caches
So-called because individual caches “snoop” on the bus
Write-Through Protocol
- Write-through with update (write update): update the cache and memory, and update the caches of the other processors
- Write-through without update (write invalidate): update the cache and memory, and invalidate the copies in the other processors' caches
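A write-invalidate snooping protocol can be sketched as a toy simulation (illustrative only; the class and names are hypothetical, and the "bus" is just a list of caches that see every write):

```python
# Toy write-through-with-invalidate simulation: every cache "snoops"
# writes on the shared bus and drops its own copy of the written address.
class SnoopingCache:
    def __init__(self, bus, memory):
        self.lines = {}                       # address -> cached value
        self.memory = memory
        bus.append(self)                      # attach to the bus to snoop

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, bus):
        self.memory[addr] = value             # write-through to memory
        self.lines[addr] = value
        for cache in bus:                     # broadcast on the bus
            if cache is not self:
                cache.lines.pop(addr, None)   # invalidate other copies

memory = {0x10: 1}
bus = []
c1, c2 = SnoopingCache(bus, memory), SnoopingCache(bus, memory)
c1.read(0x10); c2.read(0x10)      # both caches now hold address 0x10
c1.write(0x10, 42, bus)           # c2's stale copy is invalidated
print(c2.read(0x10))              # 42: the re-read misses and sees memory
```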
Write-Back Protocol
- When a processor wants to write to a block, it must acquire exclusive control/ownership of the block
- All other copies are invalidated
- The block's contents may then be changed at any time
- When another processor requests to read the block, the owner sends the block to the requesting processor and returns control of the block to the memory module, which updates the block to contain the latest value
MESI Protocol
- A popular write-back cache coherence protocol, named after the initials of the four possible states of each cache line:
  - Modified: the entry is valid; memory is invalid; no other copies exist
  - Exclusive: no other cache holds the line; memory is up to date
  - Shared: multiple caches may hold the line; memory is up to date
  - Invalid: the cache entry does not contain valid data
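The four states can be sketched as a state machine (a simplified, illustrative sketch for a single cache line and four local/snooped events; real MESI has more transitions, e.g. a read miss entering Shared when another cache already holds the line):

```python
# Toy MESI transition function for one cache line in one cache.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def next_state(state, event):
    """Events: 'read_miss'    - this cache fetches a line it does not hold
               'local_write'  - this cache writes the line
               'remote_read'  - another cache reads the line (snooped)
               'remote_write' - another cache writes the line (snooped)"""
    if event == "read_miss":
        return EXCLUSIVE          # simplification: assume no other holder
    if event == "local_write":
        return MODIFIED           # line now differs from memory
    if event == "remote_read":
        return SHARED if state in (MODIFIED, EXCLUSIVE) else state
    if event == "remote_write":
        return INVALID            # our copy is now stale
    return state

s = next_state(INVALID, "read_miss")   # I -> E
s = next_state(s, "local_write")       # E -> M
s = next_state(s, "remote_read")       # M -> S (memory gets the new value)
print(s)                               # S
```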
Snoopy Cache Issues
Snoopy caches require broadcasting information over the bus, which increases bus traffic as the system grows in size
Directory Protocols
- Uses a directory that keeps track of the locations where copies of a given data item are present
- Eliminates the need for broadcasts
- If the directory is centralized, it will become a bottleneck
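The directory idea can be sketched as follows (illustrative names and structure; a real directory lives alongside the memory modules and tracks state per block):

```python
# Toy directory: records which caches hold a copy of each block, so an
# invalidation on a write goes point-to-point to known sharers only,
# instead of being broadcast on a bus.
class Directory:
    def __init__(self):
        self.sharers = {}                     # block -> set of cache ids

    def record_read(self, block, cache_id):
        self.sharers.setdefault(block, set()).add(cache_id)

    def record_write(self, block, writer_id):
        # Invalidate only the caches the directory knows about.
        to_invalidate = self.sharers.get(block, set()) - {writer_id}
        self.sharers[block] = {writer_id}     # writer is the sole holder now
        return to_invalidate                  # messages go only to these

d = Directory()
d.record_read("B0", cache_id=1)
d.record_read("B0", cache_id=2)
print(sorted(d.record_write("B0", writer_id=1)))   # [2]
```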
Performance
According to Amdahl’s law, introducing machine parallelism will not have a significant effect on performance if the program cannot take advantage of the parallel architecture
Not all programs parallelize well
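Amdahl's law can be stated directly: if a fraction f of a program is inherently serial, the speedup on n processors is bounded by 1 / (f + (1 - f) / n). A one-line sketch:

```python
# Amdahl's law: speedup on n processors when a fraction
# `serial_fraction` of the work cannot be parallelized.
def amdahl_speedup(serial_fraction, n_processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with 16 processors, a 10% serial fraction caps the speedup:
print(round(amdahl_speedup(0.10, 16), 2))   # 6.4
```

This is why adding processors to an SMP or cluster helps little unless the program parallelizes well.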
Scalability Issues
- Bandwidth
- Latency
- Both depend on the interconnection topology