course description: parallel computer architecture · 9/12/2004 \course\eleg652-04f\topic0a.ppt 12...

9/12/2004 \course\eleg652-04F\Topic0a.ppt 1 Course Description: Parallel Computer Architecture

Post on 15-Oct-2020

9/12/2004 \course\eleg652-04F\Topic0a.ppt 1

Course Description:

Parallel Computer Architecture

Reading List

Slides: Topic1x

Henn&Patt: Chapter 1

CullerSingh98: Chapter 1

Other assigned readings from homework and classes

Why Study Parallel Architecture?

Role of a computer architect:

To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.

Parallelism:

• Provides alternative to faster clock for performance

• Applies at all levels of system design

• Is a fascinating perspective from which to view architecture

• Is increasingly central in information processing

Inevitability of Parallel Computing

Application demands

Technology Trends

Architecture Trends

Economics

Application Trends

Demand for cycles fuels advances in hardware, and vice versa

Range of performance demands

Goal of applications in using parallel machines: Speedup

Productivity requirement
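
The speedup goal named on this slide is usually quantified as the ratio of execution time on one processor to execution time on p processors. The sketch below assumes that standard definition, which the slide itself does not spell out. The function names and the timings are hypothetical, chosen only for illustration.

```python
def speedup(time_serial: float, time_parallel: float) -> float:
    """Speedup = time on one processor / time on p processors."""
    if time_serial <= 0 or time_parallel <= 0:
        raise ValueError("times must be positive")
    return time_serial / time_parallel


def efficiency(time_serial: float, time_parallel: float, processors: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(time_serial, time_parallel) / processors


if __name__ == "__main__":
    # Hypothetical measurement: 120 s on one processor, 20 s on 8 processors.
    print(speedup(120.0, 20.0))        # 6.0
    print(efficiency(120.0, 20.0, 8))  # 0.75
```

An efficiency below 1.0, as in this made-up case, is one way to express how far a run falls short of linear speedup.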

Summary of Application Trends

Transition to parallel computing has occurred for scientific and engineering computing

Transition is in rapid progress in commercial computing

Desktop also uses multithreaded programs, which are a lot like parallel programs

Demand for improving throughput on sequential workloads

Demand on productivity

Technology: A Closer Look

Basic advance is decreasing feature size (λ)

• Clock rate improves roughly proportional to improvement in λ

• Number of transistors improves like λ² (or faster)

Performance > 100x per decade; clock rate 10x, rest transistor count

How to use more transistors?

• Parallelism in processing

• Locality in data access

• Both need resources, so tradeoff

[Figure: chip with Proc, $ (cache), and Interconnect blocks]
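
The scaling rule of thumb on this slide can be tried out numerically. The sketch below takes the slide's rule at face value (clock rate scales with the improvement in λ, transistor count with its square) and ignores every other effect. The function name and the example feature sizes are hypothetical.

```python
def scaling_gains(old_feature_size: float, new_feature_size: float) -> dict:
    """Idealized gains from shrinking the feature size (lambda).

    Assumes clock rate scales with the shrink factor and transistor
    count with its square, following the slide's rule of thumb.
    """
    if old_feature_size <= 0 or new_feature_size <= 0:
        raise ValueError("feature sizes must be positive")
    shrink = old_feature_size / new_feature_size
    return {
        "shrink": shrink,
        "clock_gain": shrink,
        "transistor_gain": shrink ** 2,
    }


if __name__ == "__main__":
    # Hypothetical shrink from 1.0 um to 0.5 um.
    print(scaling_gains(1.0, 0.5))

    # Decade figures quoted on the slide: >100x performance, 10x from clock.
    performance_gain = 100.0
    clock_gain = 10.0
    # What remains is attributed to using the extra transistors.
    print(performance_gain / clock_gain)  # 10.0
```

Under these assumptions, halving λ doubles the clock rate and quadruples the transistor budget, which is why the slide asks how the additional transistors should be spent.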

Clock Frequency Growth Rate

• 30% per year

[Figure: clock frequency in MHz (log scale, 10^-1 to 10^4) versus year (1970 to 2005). Points: 4004, 8008, 8080, 8086, 286, 386, Pentium, Pentium 4, Itanium 2 (2003).]

Transistor Count Growth Rate

• 1 billion transistors on chip in early 2000s

• Transistor count grows much faster than clock rate: 40% per year, an order of magnitude more contribution in 2 decades

[Figure: transistor count in millions (log scale, 10^-3 to 10^3) versus year (1970 to 2005). Points: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, Pentium Pro, Pentium 4, Itanium 2 (2002), Itanium 2 (2003).]
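
The two growth rates quoted on these slides (30% per year for clock frequency, 40% per year for transistor count) can be compounded over two decades to see how far they diverge. This is a minimal sketch that takes the rounded percentages at face value. The plotted data may well differ from what the rounded rates give, so the result is only indicative. The function name is hypothetical.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Total growth factor after compounding `annual_rate` for `years` years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    years = 20
    clock = compound_growth(0.30, years)        # rate from the clock slide
    transistors = compound_growth(0.40, years)  # rate from the transistor slide
    print(f"clock frequency:  {clock:.0f}x")
    print(f"transistor count: {transistors:.0f}x")
    print(f"ratio:            {transistors / clock:.1f}x")
```

Even a ten-point difference in annual rate opens a several-fold gap after 20 years, which is the effect the slide points to.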

Similar Story for Storage

Divergence between memory capacity and speed more pronounced

Larger memories are slower

• Need deeper cache hierarchies

Parallelism and locality within memory systems

Disks too: Parallel disks plus caching

Moore’s Law and Headcount

Along with the number of transistors, the effort and headcount required to design a microprocessor has grown exponentially

Architectural Trends

Architecture: performance and capability

Tradeoff between parallelism and locality

• Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect

Understanding microprocessor architectural trends

Four generations of architectural history: tube, transistor, IC, VLSI

Technology Progress Overview

Processor speed improvement: 2x per year (since 85). 100x in last decade.

DRAM Memory Capacity: 2x in 2 years (since 96). 64x in last decade.

DISK capacity: 2x per year (since 97). 250x in last decade.
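
Each line of this slide pairs a recent doubling rate (the "since" figure) with a total factor over the last decade. One way to compare the two is to convert each decade factor into the average doubling time it implies. The sketch below does only that arithmetic. The function name is hypothetical and the decade factors are copied from the slide.

```python
import math


def implied_doubling_time(factor: float, years: float) -> float:
    """Average years per doubling implied by growing `factor`-fold in `years`."""
    if factor <= 1.0 or years <= 0:
        raise ValueError("factor must exceed 1 and years must be positive")
    return years * math.log(2) / math.log(factor)


if __name__ == "__main__":
    decade_factors = {
        "processor speed": 100.0,
        "DRAM capacity": 64.0,
        "disk capacity": 250.0,
    }
    for name, factor in decade_factors.items():
        t = implied_doubling_time(factor, 10.0)
        print(f"{name}: doubles about every {t:.2f} years on average")
```

For comparison, a steady 2x per year sustained for a full decade would give 2^10 = 1024x, so the decade factors describe a slower average pace than the recent rates quoted alongside them.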

[Figures: Motorola’s PowerPC 604; Pentium]

Summary: Parallel Architecture?

Increasingly attractive

• Economics, technology, architecture, application

Parallelism exploited at many levels

Same story from memory system perspective

A wide range of parallel architectures makes sense

The Earth Simulator Machine in Japan

Earth Simulator (2002)

• Max 40 TFLOPS

• No. 1 in TOP500 list

• General purpose

• Parallel vector processors

• 400 M$ (development)

HPC Architecture

Vector Processor (from 1976): CRAY-1

Parallel Processors (from 1985): CM-1

MPU Cluster/Grid (from 1997): ASCI-RED

Massively PP (2008 to 2010): DARPA-HPCS machines, GRAPE-DR, BlueGene/L, BG/C64

Cluster computer of commodity MPU (from 1997)

ASCI Project

• ASCI-Q 20 TFLOPS (2003), 8,192 CPUs

• ASCI-Purple 100 TFLOPS (2005), 12,544 CPUs

• OLNL project (2004)

Limitation of current cluster

• Low utilization of CPU due to high latency in interconnection

• No automatic parallelization

Limitation by size and power

• ASCI-Purple (12,544 CPUs): [value illegible] MW

[Figure: ASCI-Q, 20 TFLOPS]
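
The first cluster limitation above, low CPU utilization caused by interconnect latency, can be illustrated with a toy model. The sketch assumes the processor stalls for the full latency of every remote access, with no overlap of communication and computation. That is a deliberate simplification and not a description of any ASCI machine. The function name and all numbers are hypothetical.

```python
def cpu_utilization(compute_time: float, remote_accesses: int, latency: float) -> float:
    """Fraction of time spent computing when every remote access stalls the CPU.

    Toy model: total time = compute_time + remote_accesses * latency.
    All times must use the same unit.
    """
    if compute_time <= 0 or remote_accesses < 0 or latency < 0:
        raise ValueError("invalid model parameters")
    stall_time = remote_accesses * latency
    return compute_time / (compute_time + stall_time)


if __name__ == "__main__":
    # Hypothetical workload: 1000 us of computation and 100 remote accesses.
    for latency_us in (1.0, 5.0, 50.0):
        u = cpu_utilization(1000.0, 100, latency_us)
        print(f"latency {latency_us:5.1f} us -> utilization {u:.2f}")
```

In this model utilization falls quickly as latency grows, which is why latency-tolerance techniques that overlap communication with computation matter for clusters.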

New generation parallel systems (from 2008)

• IBM BlueGene/L Project (360 TFLOPS, 2005)

High density parallel processor (65,536 CPU chips in 64 racks; 131,072 processors)

• IBM BlueGene/C64 Project (1.1 PFlops, 2007 ?)

HPCS Project

• IBM PERCS

• Cray Cascade

• SUN Hero project

[Figure: IBM Blue Gene/L]
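
Dividing the quoted system performance by the quoted processor count gives a rough per-processor figure for the machines named on these slides. The sketch assumes the TFLOPS numbers are peak values and that "CPUs" and "processors" are comparable units, neither of which the slides confirm. The function name is hypothetical.

```python
# (quoted TFLOPS, quoted processor count), copied from the slides
SYSTEMS = {
    "ASCI-Q": (20.0, 8_192),
    "ASCI-Purple": (100.0, 12_544),
    "BlueGene/L": (360.0, 131_072),
}


def gflops_per_processor(tflops: float, processors: int) -> float:
    """Quoted system TFLOPS divided evenly across processors, in GFLOPS."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return tflops * 1000.0 / processors


if __name__ == "__main__":
    for name, (tflops, processors) in SYSTEMS.items():
        per_cpu = gflops_per_processor(tflops, processors)
        print(f"{name}: {per_cpu:.2f} GFLOPS per processor")
```

On these figures BlueGene/L reaches its total with many more processors of modest individual performance, which matches its description as a high density parallel processor.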

Landscape of Microprocessor Families

[Figure: SPECint2000/MHz (0 to 1) versus frequency (0 to 2000 MHz), with SPECint2000 levels labeled from 25 to 800. Families: Intel-x86, AMD-x86, Alpha, PowerPC, Sparc, IPF. Labeled points: PIII-Xeon, P4, Athlon, 264C, Sparc-III, PIII, 264A, 604e, Itanium.]