CS160 – Spring 2000
Prof. Fran Berman (CSE) and Dr. Philip Papadopoulos


Two Instructors/One Class

• We are team-teaching the class
• Lectures will be split about 50-50 along topic lines. (We’ll keep you guessing as to who will show up next lecture.)

• TA is Derrick Kondo. He is responsible for grading homework and programs

• Exams will be graded by Papadopoulos/Berman

Prerequisites

• Know how to program in C

• CSE 100 (Data Structures)

• CSE 141 (Computer Architecture) would be helpful but is not required.

Grading

• 25% Homework

• 25% Programming assignments

• 25% Midterm

• 25% Final

Homework and programming assignments are due at the beginning of section.

Policies

• Exams are closed book, closed notes
• No late homework
• No late programs
• No makeup exams

• All assignments are to be your own original work.
• Cheating/copying from anyone/anyplace will be dealt with severely.

Office Hours (Papadopoulos)

• My office is SSB 251 (Next to SDSC)

• Hours will be TuTh 2:30 – 3:30 or by appointment.

• My email is phil@sdsc.edu

• My campus phone is 822-3628

Course Materials

• Book: Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers, by B. Wilkinson and Michael Allen.

• Web site: Will try to make lecture notes available before class

• Handouts: As needed.

Computers/Programming

• Please see the TA about getting an account for the undergrad APE lab.

• We will use PVM for programming on workstation clusters; a minimal PVM example is sketched after this list.

• A word of advice: with the web, you can probably find nearly complete source code somewhere. Don’t do this. Write the code yourself; you’ll learn more. See the policy on copying.
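To give a feel for what PVM programs look like, here is a minimal master/worker sketch in C. It assumes the standard PVM 3 C API (pvm3.h, libpvm3); the file names, the worker executable name hello_other, the message tag, and the greeting string are illustrative choices, not anything specified by the course.

/* hello.c – master side of a minimal PVM 3 sketch.  The worker executable
 * name "hello_other" and message tag 1 are arbitrary illustrative choices. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int mytid = pvm_mytid();            /* enroll this task in PVM */
    int child;
    char buf[100];

    /* spawn one copy of the worker anywhere in the virtual machine */
    if (pvm_spawn("hello_other", (char **)0, PvmTaskDefault, "", 1, &child) == 1) {
        pvm_recv(-1, 1);                /* block for a message with tag 1 from any task */
        pvm_upkstr(buf);                /* unpack the greeting string */
        printf("master t%x received: %s\n", mytid, buf);
    } else {
        fprintf(stderr, "could not spawn hello_other\n");
    }
    pvm_exit();                         /* leave PVM before exiting */
    return 0;
}

/* hello_other.c – the spawned worker */
#include "pvm3.h"

int main(void)
{
    int parent = pvm_parent();          /* task id of the master that spawned us */
    pvm_initsend(PvmDataDefault);       /* start a new send buffer */
    pvm_pkstr("hello from a PVM worker");
    pvm_send(parent, 1);                /* tag 1 matches the master's recv */
    pvm_exit();
    return 0;
}

Each file is compiled separately and linked with -lpvm3, and the PVM daemon must already be running on the participating workstations before the master is started.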

Any other Administrative Questions?

Introduction to Parallel Computing

• Topics to be covered (see the online syllabus for full details)
  – Machine architecture and history
  – Parallel machine organization
  – Parallel algorithm paradigms
  – Parallel programming environments and tools
  – Heterogeneous computing
  – Evaluating performance
  – Grid computing

• Parallel programming and project assignments

What IS Parallel Computing?

• Applying multiple processors to solve a single problem

• Why?
  – Increased performance for rapid turnaround time (wall-clock time)
  – More available memory on multiple machines
  – Natural progression of the standard von Neumann architecture

World’s 10th Fastest Machine (as of November 1999) @ SDSC

1152 Processors

Are There Really Problems that Need O(1000) processors?

• Grand Challenge Codes
  – First Principles Materials Science
  – Climate modeling (ocean, atmosphere)
  – Soil Contamination Remediation
  – Protein Folding (gene sequencing)

• Hydrocodes – Simulated nuclear device detonation

• Code breaking (No Such Agency)

There must be problems with the approach

• Scaling with efficiency (speedup)
• Unparallelizable portions of code (Amdahl’s law; see the short worked example after this list)
• Reliability
• Programmability
• Algorithms
• Monitoring
• Debugging
• I/O
• …

– These and more keep the field interesting
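To make the Amdahl’s law item above concrete: if a fraction p of a program’s work can run in parallel and the rest is serial, the best possible speedup on N processors is

S(N) = 1 / ((1 - p) + p/N), which approaches 1 / (1 - p) as N grows.

For example (the numbers are illustrative, not from the course): with p = 0.95, even N = 1000 processors gives S ≈ 1 / (0.05 + 0.00095) ≈ 19.6, so the 5% serial fraction caps the speedup at about 20 no matter how many processors are added.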

Basic Measurement Yardsticks

• Peak Performance (AKA guaranteed never to exceed) = nprocs × FLOPS/proc (see the worked example after this list)

• NAS Parallel Benchmarks

• Linpack Benchmark for the TOP 500

• Later in the course, we will explore how to Fool the Masses and valid ways to measure performance
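As a quick worked example of the peak-performance yardstick above (the per-processor rate here is hypothetical, not the specification of any machine on these slides): a system with 1152 processors, each rated at 1 GFLOPS peak, has

Peak = 1152 procs × 1 GFLOPS/proc ≈ 1.15 TFLOPS.

No real application reaches this number, which is why it is nicknamed “guaranteed never to exceed” and why benchmarks such as the NAS suite and Linpack are used to measure delivered performance instead.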

Illiac IV (1966 – 1970)

• $100 Million of 1990 Dollars

• Single instruction multiple data (SIMD)

• 32 - 64 Processing elements

• 15 Megaflops

• Ahead of its time

ICL DAP (1979)

• Distributed Array Processor (also SIMD)

• 1K – 4K bit-serial processors

• Connected in a mesh

• Required an ICL mainframe to front-end the main processor array

• Never caught on in the US

Goodyear MPP (late 1970s)

• 16K bit-serial processors (SIMD)

• Goddard Space Flight Center – NASA

• Only a few sold. Similar to the ICL DAP

• About 100 Mflops (roughly a 100 MHz Pentium)

Cray-1 (1976)

• Seymour Cray, Designer

• NOT a parallel machine

• Single processor machine with vector registers

• Largely regarded as starting the modern supercomputer revolution

• 80 MHz Processor (80 MFlops)

Denelcor HEP (Heterogeneous Element Processor, early 80’s)

• Burton Smith, Designer
• Multiple Instruction, Multiple Data (MIMD)
• Fine-grain (instruction-level) and large-grain parallelism (16 processors)
  – Instructions from different programs ran in per-processor hardware queues (128 threads/proc)
• Precursor to the Tera MTA (Multithreaded Architecture)
• Full-empty bit for every memory location allowed fast synchronization
• Important research machine

Caltech Cosmic Cube - 1983

• Chuck Seitz (Founded Myricom) and Geoffrey Fox (Lattice gauge theory)

• First Hypercube interconnection network
• 8086/8087-based machine with Eugene Brooks’ Crystalline Operating System (CrOS)
• 64 Processors by 1983
• About 15x cheaper than a VAX 11/780
• Begat nCUBE, Floating Point Systems, Ametek, Intel Supercomputers (all dead companies)
• 1987 – Vector coprocessor system achieved 500 MFlops

Cray X-MP (1983) and Cray-2 (1985)

• Up to 4-Way shared memory machines

• This was the first supercomputer at SDSC
  – Best Performance (600 Mflop Peak)
  – Best Price/Performance of the time

Late 1980’s

• Proliferation of (now dead) parallel computers
• CM-2 (SIMD) (Danny Hillis)
  – 64K bit-serial processors, 2048 Vector Coprocessors
  – Achieved 5.2 Gflops on Linpack (LU Factorization)
• Intel iPSC/860 (MIMD - MPP)
  – 128 Processors
  – 1.92 Gigaflops (Linpack)
• Cray Y-MP (Vector Super)
  – 8 processors (333 Mflops/proc peak)
  – Achieved 2.1 Gigaflops (Linpack)

• BBN Butterfly (Shared memory)

• Many others (long since forgotten)

Early 90’s

• Intel Touchstone Delta and Paragon (MPP)
  – Follow-on to the iPSC/860
  – 13.2 Gflops on 512 Processors
  – 1024 Nodes delivered to ORNL in 1993 (150 GFLOPS Peak)
• Cray C-90 (Vector Super)
  – 16-Processor update of the Y-MP
  – Extremely popular, efficient, and expensive
• Thinking Machines CM-5 (MPP)
  – Up to 16K Processors
  – 1024-Node System at Los Alamos National Lab

More 90’s

• Distributed Shared Memory
  – KSR-1 (Kendall Square Research)
    • COMA (Cache-Only Memory Architecture)
  – University Projects
    • Stanford DASH multiprocessor (Hennessy)
    • MIT Alewife (Agarwal)
• Cray T3D/T3E – Fast processor mesh with up to 512 Alpha CPUs

What Can You Buy Today? (not an exhaustive list)

• IBM SP
  – Large MPP or Cluster
• SGI Origin 2000
  – Large Distributed Shared Memory Machine
• Sun HPC 10000
  – 64-Processor True Shared Memory
• Compaq Alpha Cluster
• Tera MTA
  – Multithreaded architecture (one in existence)
• Cray SV-1 Vector Processor
• Fujitsu and Hitachi Vector Supers

Clusters

• Poor man’s Supercomputer?

• A pile-of-PCs

• Ethernet or a high-speed (e.g., Myrinet) network

• Likely to be the dominant high-end architecture.

• Essentially a build-it-yourself MPP.

Next Time …

• Flynn’s Taxonomy

• Bit-Serial, Vector, Pipelined Processors

• Interconnection Networks
  – Routing Techniques
  – Embedding
  – Cluster interconnects

• Network Bisection
