Parallel Computers
Prof. Sin-Min Lee, Department of Computer Science


TRANSCRIPT

Page 1: Parallel Computers

Prof. Sin-Min Lee, Department of Computer Science

Page 2: Uniprocessor Systems

One way to improve performance is to allow multiple, simultaneous memory accesses:

- This requires multiple address, data, and control buses (one set for each simultaneous memory access)
- The memory chip must be able to handle multiple transfers simultaneously

Page 3: Uniprocessor Systems

Multiport Memory:

- Has two sets of address, data, and control pins to allow simultaneous data transfers to occur
- The CPU and a DMA controller can transfer data concurrently
- A system with more than one CPU could handle simultaneous requests from two different processors

Page 4: Uniprocessor Systems

Multiport Memory (cont.):

Can:
- Handle two requests to read data from the same location at the same time

Cannot:
- Process two simultaneous requests to write data to the same memory location
- Process simultaneous requests to read from and write to the same memory location
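This read/write constraint is analogous to a readers-writer lock in software. Below is a minimal C sketch of that analogy (illustrative only, not a model of the actual hardware), using POSIX rwlocks: any number of readers may hold the lock together, but a writer needs exclusive access.

#include <pthread.h>
#include <stdio.h>

static int cell;  /* stands in for one shared memory location */
static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;

static void *reader(void *arg) {
    pthread_rwlock_rdlock(&rw);    /* many readers may enter together */
    printf("read %d\n", cell);
    pthread_rwlock_unlock(&rw);
    return NULL;
}

static void *writer(void *arg) {
    pthread_rwlock_wrlock(&rw);    /* a writer gets exclusive access */
    cell = 7;
    pthread_rwlock_unlock(&rw);
    return NULL;
}

int main(void) {
    pthread_t t[3];
    pthread_create(&t[0], NULL, reader, NULL);
    pthread_create(&t[1], NULL, reader, NULL);
    pthread_create(&t[2], NULL, writer, NULL);
    for (int i = 0; i < 3; i++) pthread_join(t[i], NULL);
    return 0;
}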

Page 5: Multiprocessors

[Figure: a multiprocessor block diagram: several CPUs and memory on a shared bus, with an I/O port connecting a device controller and devices]

Page 6: Multiprocessors

- Systems designed to have 2 to 8 CPUs
- The CPUs all share the other parts of the computer: memory, disk, system bus, etc.
- CPUs communicate via memory and the system bus

Page 7: Multiprocessors

- Each CPU shares memory, disks, etc.
- Cheaper than clusters
- Not as good performance as clusters
- Often used for small servers and high-end workstations

Page 8: Multiprocessors

- The OS automatically shares work among available CPUs
- On a workstation, one CPU can be running an engineering design program while another CPU does complex graphics formatting

Page 9: Applications of Parallel Computers

- Traditionally: government labs running numerically intensive applications; research institutions
- Recent growth in industrial applications: 236 of the top 500 machines
- Financial analysis, drug design and analysis, oil exploration, aerospace and automotive

Page 10: Multiprocessor Systems: Flynn's Classification

Single instruction, multiple data (SIMD):

[Figure: SIMD organization: a single control unit and main memory drive several processors, each with its own memory, joined by a communications network]

- Executes a single instruction on multiple data values simultaneously, using many processors
- Since only one instruction is processed at any given time, it is not necessary for each processor to fetch and decode the instruction
- This task is handled by a single control unit that sends the control signals to each processor
- Example: array processor
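As a concrete taste of the SIMD idea on commodity hardware, here is a minimal C sketch using x86 SSE intrinsics (an assumption: an SSE-capable x86 machine and a compiler such as gcc or clang); a single _mm_add_ps instruction performs four additions at once.

#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load four floats at once */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* ONE instruction, FOUR additions */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%g\n", c[i]);
    return 0;
}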

Page 11: Why Multiprocessors?

1. Microprocessors are the fastest CPUs
   - Collecting several is much easier than redesigning one
2. Complexity of current microprocessors
   - Do we have enough ideas to sustain 1.5x/year improvement?
   - Can we deliver such complexity on schedule?
3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
4. Emergence of embedded and server markets driving microprocessors in addition to desktops
   - Embedded functional parallelism, producer/consumer model
   - Server figure of merit is tasks per hour vs. latency

Page 12: Parallel Processing Intro

- Long-term goal of the field: scale the number of processors to the size of the budget and the desired performance
- Machines today: Sun Enterprise 10000 (8/00)
  - 64 400 MHz UltraSPARC II CPUs, 64 GB SDRAM memory, 868 18 GB disks, tape
  - $4,720,800 total: 64 CPUs 15%, 64 GB DRAM 11%, disks 55%, cabinet 16% ($10,800 per processor, or ~0.2% per processor)
  - Minimal E10K: 1 CPU, 1 GB DRAM, 0 disks, tape, ~$286,700; $10,800 (4%) per CPU, plus $39,600 board/4 CPUs (~8%/CPU)
- Machines today: Dell Workstation 220 (2/01)
  - 866 MHz Intel Pentium III (in minitower), 0.125 GB RDRAM memory, 1 10 GB disk, 12X CD, 17" monitor, nVIDIA GeForce2 GTS 32 MB DDR graphics card, 1-year service
  - $1,600; for an extra processor, add $350 (~20%)

Page 13: Major MIMD Styles

1. Centralized shared memory ("Uniform Memory Access" time, or "Shared Memory Processor")
2. Decentralized memory (memory module with CPU)
   - Advantage: more memory bandwidth, lower memory latency
   - Drawback: longer communication latency
   - Drawback: software model is more complex

Page 14: Organization of Multiprocessor Systems

Three different ways to organize/classify systems:

• Flynn’s Classification

• System Topologies

• MIMD System Architectures

Page 15: Multiprocessor Systems: Flynn's Classification

Flynn's Classification is based on the flow of instructions and data processing.

A computer is classified by:
- whether it processes a single instruction at a time or multiple instructions simultaneously
- whether it operates on one or multiple data sets

Page 16: Multiprocessor Systems: Flynn's Classification

Four categories of Flynn's Classification:

- SISD: Single instruction, single data
- SIMD: Single instruction, multiple data
- MISD: Multiple instruction, single data **
- MIMD: Multiple instruction, multiple data

** The MISD classification is not practical to implement. In fact, no significant MISD computers have ever been built. It is included only for completeness.

Page 17: Types of Parallel Processing

From the beginning of time, computer scientists have been challenging computers with larger and larger problems. Eventually, computer processors were combined in parallel to work on the same task together. This is parallel processing.

- SISD: Single Instruction stream, Single Data stream
- MISD: Multiple Instruction stream, Single Data stream
- SIMD: Single Instruction stream, Multiple Data stream
- MIMD: Multiple Instruction stream, Multiple Data stream

Page 18: SISD

One piece of data is sent to one processor.

Ex: To multiply one hundred numbers by the number three, the numbers would be sent and multiplied one at a time until all one hundred results were calculated.

[Figure: a single data stream feeding one CPU executing a multiply]
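In code, the SISD case is just a sequential loop; a minimal C sketch of the example above:

#include <stdio.h>

int main(void) {
    double data[100], result[100];
    for (int i = 0; i < 100; i++)
        data[i] = i + 1.0;               /* toy input values */

    /* One CPU, one instruction stream: the multiplications
       happen one at a time until all 100 results exist. */
    for (int i = 0; i < 100; i++)
        result[i] = data[i] * 3.0;

    printf("first = %g, last = %g\n", result[0], result[99]);
    return 0;
}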

Page 19: MISD

One piece of data is broken up and sent to many processors.

Ex: A database is broken up into sections of records and sent to several different processors, each of which searches its section for a specific key.

[Figure: one data stream split across four CPUs, each executing a search]
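A minimal C sketch of the database-search example using POSIX threads (illustrative assumptions: 4 threads stand in for the processors, and the records are toy data):

#include <pthread.h>
#include <stdio.h>

#define NRECORDS 1000
#define NTHREADS 4

static int records[NRECORDS];
static int key = 4214;        /* the key every thread searches for */
static int found_index = -1;  /* -1 means "not found yet" */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each "processor" searches its own section of the database. */
static void *search_section(void *arg) {
    long id = (long)arg;
    int per = NRECORDS / NTHREADS;
    int lo = id * per, hi = lo + per;
    for (int i = lo; i < hi; i++) {
        if (records[i] == key) {
            pthread_mutex_lock(&lock);
            found_index = i;
            pthread_mutex_unlock(&lock);
        }
    }
    return NULL;
}

int main(void) {
    for (int i = 0; i < NRECORDS; i++)
        records[i] = i * 7;   /* toy data; key 4214 sits at index 602 */
    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, search_section, (void *)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("key %d found at index %d\n", key, found_index);
    return 0;
}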

Page 20: SIMD

Multiple processors execute the same instruction on separate data.

Ex: A SIMD machine with 100 processors could multiply 100 numbers, each by the number three, at the same time.

[Figure: a single multiply instruction broadcast to four CPUs, each with its own data]

Page 21: MIMD

Multiple processors execute different instructions on separate data.

This is the most complex form of parallel processing. It is used on complex simulations like modeling the growth of cities.

[Figure: four CPUs executing different instructions (multiply, search, add, subtract), each on its own data]

Page 22: The Granddaddy of Parallel Processing: MIMD

Page 23

MIMD computers usually have a different program running on every processor. This makes for a very complex programming environment.

What processor? Doing which task? At what time?

What’s doing what when?

Page 24: Memory Latency

The time between issuing a memory fetch and receiving the response.

Simply put, if execution proceeds before the memory request responds, unexpected results will occur.

What values are being used? Not the ones requested!

Page 25: Synchronization

A similar problem can occur with instruction executions themselves.

The need to enforce the ordering of instruction executions according to their data dependencies.

Instruction b must occur before instruction a.
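A minimal POSIX-threads sketch in C of enforcing that ordering with a semaphore (the names run_a and run_b are illustrative; assumes a Linux-style system with unnamed semaphores): run_a blocks until run_b has produced the value it depends on.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static int x;         /* shared value: "b" writes it, "a" reads it */
static sem_t b_done;  /* signaled once instruction "b" has executed */

static void *run_b(void *arg) {
    x = 42;              /* instruction b: produce the value */
    sem_post(&b_done);   /* announce that b has finished */
    return NULL;
}

static void *run_a(void *arg) {
    sem_wait(&b_done);   /* block until b's write has happened */
    printf("a reads x = %d\n", x);  /* instruction a: consume it */
    return NULL;
}

int main(void) {
    sem_init(&b_done, 0, 0);
    pthread_t ta, tb;
    pthread_create(&ta, NULL, run_a, NULL);
    pthread_create(&tb, NULL, run_b, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}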

Page 26: MIMD Successes

Despite potential problems, MIMD can prove larger than life.

IBM Deep Blue – Computer beats professional chess player.

Some may not consider this to be a fair example, because Deep Blue was built to beat Kasparov alone. It "knew" his play style so it could counter his projected moves. Still, Deep Blue's win marked a major victory for computing.

Page 27

IBM’s latest, a supercomputer that models nuclear explosions.

IBM Poughkeepsie built the world's fastest supercomputer for the U.S. Department of Energy. Its job was to model nuclear explosions.

Page 28

MIMD – it’s the most complex, fastest, flexible parallel paradigm. It’s beat a world class chess player at his own game. It models things that few people understand. It is parallel processing at its finest.

Page 29: Multiprocessor Systems: Flynn's Classification

Single instruction, single data (SISD):

Consists of a single CPU executing individual instructions on individual data values

Page 30: Multiprocessor Systems: Flynn's Classification

Multiple instruction, multiple data (MIMD):
- Executes different instructions simultaneously
- Each processor must include its own control unit
- The processors can be assigned to parts of the same task or to completely separate tasks
- Examples: multiprocessors, multicomputers

Page 31: Popular Flynn Categories

- SISD (Single Instruction, Single Data)
  - Uniprocessors
- MISD (Multiple Instruction, Single Data)
  - ???; multiple processors on a single data stream
- SIMD (Single Instruction, Multiple Data)
  - Examples: Illiac-IV, CM-2
  - Simple programming model, low overhead, flexibility, all custom integrated circuits
  - (Phrase reused by Intel marketing for media instructions ~ vector)
- MIMD (Multiple Instruction, Multiple Data)
  - Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
  - Flexible; uses off-the-shelf micros
- MIMD is the current winner: major design emphasis is on <= 128-processor MIMD machines

Page 32: Multiprocessor Systems: System Topologies

The topology of a multiprocessor system refers to the pattern of connections between its processors. It is quantified by standard metrics:

- Diameter: the maximum distance between two processors in the computer system
- Bandwidth: the capacity of a communications link multiplied by the number of such links in the system (best case)
- Bisection bandwidth: the total bandwidth of the links connecting the two halves of the processor set, split so that the number of links between the two halves is minimized (worst case)
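To make the diameter metric concrete, here is a small C sketch (not from the slides; the 6-processor ring used to exercise it is an assumption) that computes a topology's diameter from its adjacency matrix by breadth-first search from every node:

#include <stdio.h>
#include <string.h>

#define N 6  /* number of processors; a ring, for illustration */

/* BFS distance from src to the farthest reachable node. */
static int eccentricity(int adj[N][N], int src) {
    int dist[N], queue[N], head = 0, tail = 0, far = 0;
    for (int i = 0; i < N; i++) dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < N; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > far) far = dist[v];
                queue[tail++] = v;
            }
    }
    return far;
}

int main(void) {
    int adj[N][N];
    memset(adj, 0, sizeof adj);
    for (int i = 0; i < N; i++) {          /* ring: i <-> (i+1) mod N */
        adj[i][(i + 1) % N] = 1;
        adj[(i + 1) % N][i] = 1;
    }
    int diameter = 0;
    for (int s = 0; s < N; s++) {
        int e = eccentricity(adj, s);
        if (e > diameter) diameter = e;
    }
    printf("diameter = %d\n", diameter);   /* prints 3 for a 6-node ring */
    return 0;
}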

Page 33: Multiprocessor Systems: System Topologies

Six Categories of System Topologies:

• Shared bus

• Ring

• Tree

• Mesh

• Hypercube

• Completely Connected


Page 35: Multiprocessor Systems: System Topologies

Shared bus:
- The simplest topology
- Processors communicate with each other exclusively via this bus
- Can handle only one data transmission at a time
- Can be easily expanded by connecting additional processors to the shared bus, along with the necessary bus arbitration circuitry

[Figure: processors (P) with local memories (M) and a global memory attached to a shared bus]


Page 37: Multiprocessor Systems: System Topologies

Ring:
- Uses direct dedicated connections between processors
- Allows all communication links to be active simultaneously
- A piece of data may have to travel through several processors to reach its final destination
- All processors must have two communication links

[Figure: six processors (P) connected in a ring]

Page 38: Multiprocessor Systems: System Topologies

Tree topology:
- Uses direct connections between processors
- Each processor has three connections
- Its primary advantage is its relatively low diameter
- Example: DADO computer

[Figure: seven processors (P) connected in a tree]


Page 42: Multiprocessor Systems: System Topologies

Mesh topology:
- Every processor connects to the processors above, below, left, and right
- Left-to-right and top-to-bottom wraparound connections may or may not be present

[Figure: a 3x3 mesh of processors (P)]


Page 45: Multiprocessor Systems: System Topologies

Hypercube:
- A multidimensional mesh
- Has n processors, each with log n connections (see the sketch below)
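A small C sketch of that structure (illustrative): in a d-dimensional hypercube with n = 2^d processors, node i's log n neighbors are found by flipping each bit of its label.

#include <stdio.h>

int main(void) {
    int d = 3;            /* dimensions, so n = 2^d = 8 processors */
    int n = 1 << d;
    for (int node = 0; node < n; node++) {
        printf("node %d neighbors:", node);
        for (int bit = 0; bit < d; bit++)
            printf(" %d", node ^ (1 << bit));  /* flip one bit per link */
        printf("\n");
    }
    return 0;
}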


Page 48: Multiprocessor Systems: System Topologies

Completely Connected:

- Every processor has n-1 connections, one to each of the other processors
- The complexity of the processors increases as the system grows
- Offers maximum communication capabilities

Page 49: Architecture Details

[Figure: from computers to MPPs: a processor and memory (P M) form the world's simplest computer; adding cache and disk (P M C D) gives a standard computer; many such nodes joined by a network form an MPP]

Page 50: A Supercomputer at $5.2 Million

Virginia Tech's 1,100-node Mac G5 supercomputer

Page 51

The Virginia Polytechnic Institute and State University has built a supercomputer comprising a cluster of 1,100 dual-processor Macintosh G5 computers. Based on preliminary benchmarks, Big Mac is capable of 8.1 teraflops. The Mac supercomputer is still being fine-tuned, and the full extent of its computing power will not be known until November. But the 8.1-teraflops figure would make Big Mac the world's fourth-fastest supercomputer.

Page 52

Big Mac's cost relative to similar machines is as noteworthy as its performance. The Apple supercomputer was constructed for just over US$5 million, and the cluster was assembled in about four weeks.

In contrast, the world's leading supercomputers cost well over $100 million to build and require several years to construct. The Earth Simulator, which clocked in at 38.5 teraflops in 2002, reportedly cost up to $250 million.

Page 53

Srinidhi Varadarajan, Ph.D.

Dr. Srinidhi Varadarajan is an Assistant Professor of Computer Science at Virginia Tech. He was honored with the NSF Career Award in 2002 for "Weaving a Code Tapestry: A Compiler Directed Framework for Scalable Network Emulation." He has focused his research on building a distributed network emulation system that can scale to emulate hundreds of thousands of virtual nodes.

October 28, 2003, 7:30pm - 9:00pm, Santa Clara Ballroom

Page 54: Parallel Computers

Two common types:
- Cluster
- Multi-Processor

Page 55: Cluster Computers

Page 56: Clusters on the Rise

Using clusters of small machines to build a supercomputer is not a new concept.

Another of the world's top machines, housed at the Lawrence Livermore National Laboratory, was constructed from 2,304 Xeon processors. The machine was built by Utah-based Linux Networx.

Clustering technology has meant that traditional big-iron leaders like Cray (Nasdaq: CRAY) and IBM have new competition from makers of smaller machines. Dell (Nasdaq: DELL) , among other companies, has sold high-powered computing clusters to research institutions.

Page 57: Cluster Computers

- Each computer in a cluster is a complete computer by itself: CPU, memory, disk, etc.
- Computers communicate with each other via some interconnection bus

Page 58: Cluster Computers

- Typically used where one computer does not have enough capacity to do the expected work (e.g., large servers)
- Cheaper than building one GIANT computer

Page 59

Although not new, supercomputing clustering technology is still impressive. It works by farming out chunks of data to individual machines, though clustering works better for some types of computing problems than others.

For example, a cluster would not be ideal to compete against IBM's Deep Blue supercomputer in a chess match; in this case, all the data must be available to one processor at the same moment -- the machine operates much in the same way as the human brain handles tasks.

However, a cluster would be ideal for the processing of seismic data for oil exploration, because that computing job can be divided into many smaller tasks.

Page 60: Cluster Computers

- Need to break up work among the computers in the cluster
- Example: the Microsoft.com search engine
  - 6 computers running SQL Server, each with a copy of the MS Knowledge Base
  - Search requests come to one computer, which sends each request to one of the 6 and attempts to keep all 6 busy (see the sketch below)
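A toy C sketch of that dispatch idea (illustrative only, not Microsoft's actual code): the front-end machine hands incoming requests to the back-end servers round-robin, which tends to keep all of them busy.

#include <stdio.h>

#define NSERVERS 6

int main(void) {
    int next = 0;               /* index of the server to use next */
    int load[NSERVERS] = {0};   /* requests handled by each server */

    for (int request = 0; request < 20; request++) {
        printf("request %2d -> server %d\n", request, next);
        load[next]++;
        next = (next + 1) % NSERVERS;   /* round-robin rotation */
    }
    for (int s = 0; s < NSERVERS; s++)
        printf("server %d handled %d requests\n", s, load[s]);
    return 0;
}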

Page 61

The Virginia Tech Mac supercomputer should be fully functional and in use by January 2004. It will be used for research into nanoscale electronics, quantum chemistry, computational chemistry, aerodynamics, molecular statics, computational acoustics and the molecular modeling of proteins.

Page 62: Specialized Processors

- Vector Processors
- Massively Parallel Computers

Page 63: Vector Processors

for (i = 0; i < n; i++) {
    array1[i] = array2[i] + array3[i];
}

This is an array (vector) operation

Page 64: Vector Processors

- Special instructions to operate on vectors (arrays)
- A vector instruction specifies:
  - the starting addresses of all 3 arrays
  - the loop count
- Saves for-loop overhead
- Can access memory more efficiently
- Also known as SIMD computers (Single Instruction, Multiple Data)

Page 65: Vector Processors

- Until the 1990s, the world's fastest supercomputers were implemented as vector processors
- Now, vector processors are typically special peripheral devices that can be installed on a "regular" computer

Page 66: Massively Parallel Computers

IBM ASCI Purple:
- Cluster of 196 computers
- Each computer has 64 CPUs, 256 gigabytes of RAM, and 10,000 GB of disk

Page 67: Massively Parallel Computers

How will ASCI Purple be used?
- Simulation of molecular dynamics
- Research into repairing damaged DNA
- Analysis of seismic waves; earthquake research
- Simulation of star evolution
- Simulation of weapons of mass destruction

Page 68

According to the article, the supercomputer, powered by 2,200 IBM G5 processors, has been initially rated at computing 7.41 trillion operations per second. The final number could be much higher, according to school officials, but if not, it would rank as the #4 fastest supercomputing cluster in the world.

For comparison:
- Japan's US$250M Earth Simulator, currently the world's fastest computer
- Lawrence Livermore's US$10-15M cluster system, made up of 2,304 Intel Xeon processors; IBM recently installed "Pacific Blue" at the Lawrence Livermore laboratories for $94 million

Page 69

"We are demonstrating that you can build a very high performance machine for a fifth to a tenth of the cost of what supercomputers now cost," said Hassan Aref, the dean of the School of Engineering at Virginia Tech in Blacksburg

In 1998, a group called distributed.net linked thousands of computers of all kinds around the world via the Internet and cracked a 56-bit DES-II code in 40 days. It had previously been thought that such heavyweight ciphers would take hundreds of years to crack even on fast computers. One version of the distributed.net program ran as a screen saver that kicked in, and began cracking code, whenever the machine was idle for more than a few minutes. Distributed.net bills itself as the "Fastest Computer on Earth," even though its hardware bill is effectively zero.

Page 70

The idea is straightforward. You set up an arbitrary number of PCs, network them, typically using fast Ethernet, and then send them problems that can be divided up among the machines' processors. One machine acts as a server that syncs up all the rest, called clients.

Beowulf specifies software, like the Message Passing Interface (MPI) running under the Linux operating system, that allows the machines to communicate while working on the problem.

And since Linux, the brainchild of computer science student Linus Torvalds, is free, it keeps the cost down.
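A minimal C sketch of the Beowulf server/client idea using MPI (assumptions: an MPI implementation such as MPICH or Open MPI, compiled with mpicc and launched with mpirun): every process sums its own chunk of a range, and MPI_Reduce combines the partial results on rank 0.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */

    /* Divide 1..1000000 among the processes; each sums its chunk. */
    long n = 1000000, chunk = n / size;
    long lo = rank * chunk + 1;
    long hi = (rank == size - 1) ? n : lo + chunk - 1;
    long local = 0, total = 0;
    for (long i = lo; i <= hi; i++) local += i;

    /* Combine the partial sums on the process acting as server. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %ld\n", total);

    MPI_Finalize();
    return 0;
}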

Page 71

Modeling the trajectories of tens of millions of charged particles, each interacting with the others through electro-magnetic forces, requires heavy-duty number crunching. To harness supercomputing power at a desktop price, UCLA’s Dr. Viktor K. Decyk and his colleagues have created their own super-fast, parallel processing “supercomputer” using a cluster of Power Macintosh computers.

Page 72

Apple's G4 Cubes used for cell mutation detection and genotyping analysis

SYDNEY - 22 January 2001

Page 73

World's fastest" Macintosh cluster Tuesday, May 15, 2001 @ 8:45am

Researchers at the Grupo de Lasers e Plasmas (GoLP) in Portugal have created what they bill as the world's fastest Macintosh-based cluster. Consisting of 16 dual-processor Power Mac G4/450s, the cluster delivers more than 50 GigaFlops of peak power and took just one day to set up.

Page 74

Apple Computer purchased a big Cray supercomputer in the mid-1980s. In fact, Steve Jobs was Cray's first and only walk-in customer. He arrived unannounced (so the story goes) at Cray headquarters in Mendota Heights, Minnesota and asked to speak to someone about buying a Cray. They nearly threw him out. It's only slightly less eccentric than someone walking into NASA Johnson Space Center and inquiring how to purchase a shuttle orbiter.

Later, Cray president John Rollwagen phoned Seymour and told him that Apple had just purchased a Cray that would be used in designing the next Macintosh. Seymour thought for a bit, and replied that that seemed reasonable, since he was using a Macintosh to design the next Cray!

Page 75: Parallel Computer Architectures

- MPP: Massively Parallel Processors
  - The top of the top500 list consists mostly of MPPs, but clusters are "rising"
- Clusters are there!
  - Earth Simulator: old-old style making news again
  - ASCI machines: big companies, special purpose
  - Beowulf clusters: popping up everywhere
- Software
  - Embarrassingly parallel, or sacrifice a grad student
  - MATLAB*p (our little homegrown project)


Page 77: Performance Trends

Page 78: Extrapolations

Page 79: Beowulf Clusters

Page 80: Current Beowulfs (2)