M U N - February 17, 2005 - Phil Bording

Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration

R. Phillip Bording
February 17, 2004
Husky Energy Chair in Oil and Gas Research
Memorial University of Newfoundland



Session 1
History of Design

Tycho Brahe
Napier
Charles Babbage – mechanical design
John Atanasoff – Storage – spinning capacitor
Konrad Zuse – Floating Point
Mauchly and Eckert
von Neumann
Harvard – separate memories for code and data
Princeton – one memory for code and data

Session 2
Current Design Issues

Scaling laws
Moore’s Law
Transistors – VLSI
Memory – Technology
Division of Design
The Memory Challenge
The Processor Challenge
The ILLIAC – PEPE
IBM 7094
IBM 360/44
IBM 360/95
Array Processors
The software of array processor calls
Programming Models
  vectors
  shared memory
  distributed memory


Lambda Rules


Division of design

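[Diagram: two ALU and Memory pairs. In one, One Company builds both the ALU and the Memory. In the other, Company A and Company B build the two parts separately, and the connection between them is the Weak Link.]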


Moore’s Laws

Every 18 months the density of transistors on a VLSI chip doubles

The investment in dollars doubles with every new VLSI plant
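The doubling rule compounds quickly. As a rough sketch (the function name and the choice of time spans are illustrative, not from the talk), the growth factor after a given number of years follows from one doubling every 18 months:

```python
def growth_factor(years: float, doubling_period_months: float = 18.0) -> float:
    """Density growth after `years`, assuming one doubling per period."""
    doublings = years * 12.0 / doubling_period_months
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (1.5, 3, 5, 10):
        print(f"{years:>4} years -> {growth_factor(years):7.1f}x")
```

Ten years is 120 / 18, or about 6.7 doublings, which works out to roughly a hundredfold increase.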


Illiac

8 X 8 Processors
Nearest Neighbor Connections
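A minimal sketch of nearest neighbor addressing on an 8 X 8 array. It assumes the processors are numbered 0 to 63 and that the links wrap around at the ends (processor i talks to i+1, i-1, i+8 and i-8, modulo 64). The slide only states the array size and the nearest neighbor connections, so the numbering and the wraparound are assumptions, and the function name is illustrative.

```python
N = 8  # 8 x 8 array, 64 processing elements


def neighbors(pe: int, n: int = N) -> dict:
    """Return the four neighbors of processing element `pe`.

    Assumption: elements are numbered row by row and links wrap
    around modulo n*n.
    """
    total = n * n
    return {
        "east": (pe + 1) % total,
        "west": (pe - 1) % total,
        "south": (pe + n) % total,
        "north": (pe - n) % total,
    }


if __name__ == "__main__":
    print(neighbors(0))
    print(neighbors(63))
```

This four-neighbor pattern matches the data access of a simple finite difference stencil, where each grid point needs only the values next to it.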


Parallel Element Processing Ensemble - PEPE

Radar Processing Computer
Associative Computing

[Diagram: Data Inputs -> processing elements P0 . . . . Pn-3, Pn-2, Pn-1, Pn -> Data Outputs]


IBM Machines

Early 1960’s
7094, 36 bit arithmetic
1600 and 1400 processors completely different

Middle 1960’s
New Machine – IBM 360
32 bit words, with memory parity added
8 bit byte + 1 bit parity (36 bits stored per word)
Uniform business machine architectures
32 and 64 bit floating point

No industry standard for the format of floating point


Array Processors

IBM and CDC designed DMA processors – Direct Memory Access
Frees the main processor to compute
Allows separate simple processors to do the i/o

The idea translated into attached processors for arithmetic processing


Array Processors

Arrays of data are moved to a local very high speed memory – fast registers

Arithmetic is performed by special instructions passed to array processor

[Diagram: CPU connected to an attached Array Processor]


Software Design Issues

Vector Programming
Cache Programming
Message Passing Programming
NUMA Programming
Grid Programming

ALL of these memory operations have a Fixed Cost

Code Performance Improvements are dominated by fixed costs
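One way to see why fixed costs dominate is a simple cost model: run time is a fixed cost plus a per-item cost. The sketch below is an illustration under that assumption. The function names and the numbers are hypothetical, not measurements from the talk.

```python
def run_time(n_items: int, fixed_cost: float, cost_per_item: float) -> float:
    """Total time: a fixed cost paid once plus a cost for every item."""
    return fixed_cost + n_items * cost_per_item


def speedup(n_items: int, fixed_cost: float, cost_per_item: float,
            factor: float) -> float:
    """Overall speedup when only the per-item cost improves by `factor`."""
    before = run_time(n_items, fixed_cost, cost_per_item)
    after = run_time(n_items, fixed_cost, cost_per_item / factor)
    return before / after


if __name__ == "__main__":
    # Hypothetical numbers: 1000 items, fixed cost 100 units, 1 unit per item.
    for factor in (2, 10, 100, 1000):
        print(f"per-item {factor:>4}x faster -> overall "
              f"{speedup(1000, 100.0, 1.0, factor):5.2f}x")
```

With these numbers, making the arithmetic 100 times faster gives only a 10 times overall speedup, and no improvement in the per-item cost can push the speedup past 11 times, because the fixed cost of 100 units remains.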


Hardware Design Issues

10 Years equals 100 Fold Speedup

Memory Latency – cost of getting the first word is a constant

Wires have failed to scale

Bigger cache memories are slower

Code Performance Improvements are dominated by fixed costs


Linear Address Space

[Diagram: an Address Pointer into a linear address space running from 0 to Max Address]

Latency is the time to access the first word

Bandwidth is the rate of accessing successive words
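The two definitions combine into a simple transfer time model: the first word costs the latency, and each word is then delivered at the bandwidth rate. The sketch below assumes that model. The units (nanoseconds, words per nanosecond) and the example values are hypothetical.

```python
def transfer_time(n_words: int, latency: float, bandwidth: float) -> float:
    """Time to fetch n_words: latency once, then n_words / bandwidth."""
    return latency + n_words / bandwidth


def effective_bandwidth(n_words: int, latency: float, bandwidth: float) -> float:
    """Words per unit time actually achieved, including the latency."""
    return n_words / transfer_time(n_words, latency, bandwidth)


if __name__ == "__main__":
    # Hypothetical memory: 100 ns latency, 1 word per ns peak bandwidth.
    for n in (1, 10, 100, 1000, 10000):
        print(f"{n:>6} words: {transfer_time(n, 100.0, 1.0):8.1f} ns, "
              f"{effective_bandwidth(n, 100.0, 1.0):.3f} words/ns")
```

Short transfers are dominated by latency. Only long runs of successive words come close to the peak bandwidth.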


von Neumann Architecture
Princeton

[Diagram: an Arithmetic Logic Unit (ALU) and a Program Counter (Pc = Pc + 1) drive an Address Pointer into a single Memory that holds both Data and Instructions]

Featuring Deterministic Execution
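A toy fetch and execute loop can make the picture concrete. The sketch below is not a model of any real machine: the four-instruction set (LOAD, ADD, STORE, HALT) and the single accumulator are invented for illustration. It keeps instructions and data in one memory and advances the program counter by one after every fetch.

```python
def run(memory: list) -> list:
    """Execute a program stored in the same memory as its data."""
    pc = 0    # program counter
    acc = 0   # accumulator inside the ALU
    while True:
        op, addr = memory[pc]  # fetch the instruction the pointer selects
        pc = pc + 1            # Pc = Pc + 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")


# Addresses 0-3 hold instructions, addresses 4-6 hold data.
program = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,   # address 4: first operand
    3,   # address 5: second operand
    0,   # address 6: result
]

if __name__ == "__main__":
    print(run(list(program))[6])
```

Every run performs the same sequence of memory accesses in the same order, which is the sense in which execution is deterministic.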


Cache Memory Architecture

[Diagram: the Address Pointer reaches Memory through a Cache Memory managed by CACHE CONTROL]

Main Memory is large and slow.

Cache is much smaller and much faster.

Cache control logic keeps the main memory coherent.

Featuring Non-Deterministic Execution
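The cost of a memory access now depends on what was accessed before it. A small direct-mapped cache simulation shows the effect. The cache geometry (4 lines of 4 words) and the two access patterns are hypothetical, chosen only to make the contrast visible.

```python
def count_hits(addresses, num_lines: int = 4, line_words: int = 4) -> int:
    """Count cache hits for a sequence of word addresses.

    Direct-mapped cache: each memory block can live in exactly one line.
    """
    resident = [None] * num_lines   # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_words
        line = block % num_lines
        if resident[line] == block:
            hits += 1
        else:
            resident[line] = block  # miss: fetch the block from main memory
    return hits


sequential = list(range(32))            # successive words
strided = [i * 16 for i in range(32)]   # every access maps to the same line

if __name__ == "__main__":
    print("sequential hits:", count_hits(sequential), "of", len(sequential))
    print("strided hits:   ", count_hits(strided), "of", len(strided))
```

Both patterns issue 32 loads, but the sequential pattern hits 24 times and the strided pattern never hits, so the same instruction count gives very different run times.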


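Cache Memory - Three Levels Architecture

[Diagram: the Address Pointer reaches Memory through L1, L2 and L3 Cache Memory, managed by Cache Control Logic]

L1 Cache Memory: 32 Kilobytes
L2 Cache Memory: 128 Kilobytes
L3 Cache Memory: 16 Megabytes
Memory: Multi-Gigabytes, Large and Slow

Labels on the diagram: 2X, 8X, 16X (at the L3 cache), 160X (at main memory)

2 Gigahertz Clock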

Featuring Really Non-Deterministic Execution
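A rough sketch of how the average access cost depends on where each access is served. It assumes the 2X, 8X, 16X and 160X labels are the relative costs of an access served by L1, L2, L3 and main memory respectively. That reading of the diagram is an assumption, and the hit rates are invented for illustration.

```python
def average_access_cost(levels, memory_cost: float) -> float:
    """Expected cost of one access.

    levels: list of (cost, hit_rate) pairs ordered from L1 outward.
    An access is served by the first level that hits; if every cache
    misses it is served by main memory at memory_cost.
    """
    expected = 0.0
    reach = 1.0   # fraction of accesses that get as far as this level
    for cost, hit_rate in levels:
        expected += reach * hit_rate * cost
        reach *= 1.0 - hit_rate
    return expected + reach * memory_cost


# Hypothetical hit rates for L1, L2 and L3.
levels = [(2, 0.90), (8, 0.80), (16, 0.50)]

if __name__ == "__main__":
    print(f"average cost: {average_access_cost(levels, 160):.2f}X")
```

With these hit rates only 1 percent of accesses reach main memory, yet that 1 percent contributes 1.6X of the 4.2X average. A small change in hit rate moves the run time noticeably.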


Programming Models for Parallel Computing


Distributed Computing
Message Passing Interface

Multiple Address Pointers
Program Address Spaces

[Diagram: four separate address spaces, each running from 0 to Max, each with its own address pointer]


Distributed Computing with Message Passing

Multiple Address Pointers

Program Address Spaces

Messages Left and Right
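The left and right messages can be sketched without a real message passing library. In the simulation below each "rank" is a separate Python list standing in for a separate address space, with one ghost cell at each end. This is not MPI code: the function names and the ghost cell layout are assumptions made for illustration.

```python
def split_with_ghosts(data, n_ranks: int):
    """Give each rank its own slice plus one ghost cell at each end.

    Assumes len(data) is divisible by n_ranks.
    """
    size = len(data) // n_ranks
    return [[0.0] + data[r * size:(r + 1) * size] + [0.0]
            for r in range(n_ranks)]


def exchange_halos(ranks) -> None:
    """Each rank sends its edge values to its left and right neighbors."""
    # Gather all outgoing messages first, as if every rank sent at once.
    to_right = [local[-2] for local in ranks]   # last owned value
    to_left = [local[1] for local in ranks]     # first owned value
    for r, local in enumerate(ranks):
        if r > 0:
            local[0] = to_right[r - 1]          # message from the left
        if r < len(ranks) - 1:
            local[-1] = to_left[r + 1]          # message from the right


if __name__ == "__main__":
    ranks = split_with_ghosts([float(i) for i in range(8)], 4)
    exchange_halos(ranks)
    for r, local in enumerate(ranks):
        print(r, local)
```

After the exchange each rank holds copies of its neighbors' edge values and can update its own points without touching another address space.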


Multiple Address Pointers
Global Program Address Space

Local: 0 to n-1
Local: n to 2n-1
Local: 2n to 3n-1
Local: 3n to 4n-1

Address and Cache Bus with Conflict Resolution

Multi-Threading
OpenMP Programming Model
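A minimal sketch of the shared address space model, using Python threads in place of OpenMP. Each thread works on its own range of one global array (0 to n-1, n to 2n-1, and so on). The function names are illustrative, and the example shows the programming model only, not the performance.

```python
import threading


def scale_slice(shared, start: int, stop: int, factor) -> None:
    """Work on one thread's local range of the global array."""
    for i in range(start, stop):
        shared[i] *= factor


def parallel_scale(shared, n_threads: int, factor) -> None:
    """Split the global array into equal ranges, one per thread.

    Assumes len(shared) is divisible by n_threads.
    """
    n = len(shared) // n_threads
    threads = [
        threading.Thread(target=scale_slice,
                         args=(shared, t * n, (t + 1) * n, factor))
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()   # wait for every thread before using the result


if __name__ == "__main__":
    data = list(range(16))
    parallel_scale(data, 4, 2)
    print(data)
```

Because the ranges do not overlap, no two threads store to the same location and no locking is needed.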


Uniqueness of Store Multi-Threading

Multiple Address Pointers

Program Address Space

Duplicate Pointers to the same Location – Conflict on storing a result

So who is managing the multiple pointers?
It is the programmer's responsibility.
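One common way for the programmer to take that responsibility is to guard the shared location with a lock, so that only one thread stores at a time. The sketch below uses Python's `threading.Lock`. The function name and the partial sum workload are illustrative.

```python
import threading


def add_partial_sums(values, n_threads: int) -> int:
    """Sum `values` with several threads storing into one shared total.

    Assumes len(values) is divisible by n_threads.
    """
    total = [0]                  # the single shared location
    lock = threading.Lock()
    chunk = len(values) // n_threads

    def worker(t: int) -> None:
        partial = sum(values[t * chunk:(t + 1) * chunk])  # private work
        with lock:               # one thread at a time updates the total
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return total[0]


if __name__ == "__main__":
    print(add_partial_sums(list(range(100)), 4))
```

Each thread does its arithmetic on private data and holds the lock only for the final store, which keeps the serialized section short.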


Multiple Bank Memory Systems

Memory Banks: Bank 0, 1, 2, 3

Unit stride: Starting Address, +1, +2, +3
Stride N: Starting Address, +N, +2N, +3N
Bank selected by address Mod 4

Vector Programming Model
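A short sketch of the bank mapping, assuming four banks and bank = address mod 4 as on the slide. The function names and the strides tried are illustrative.

```python
from collections import Counter


def bank_of(address: int, n_banks: int = 4) -> int:
    """Bank that holds a given word address."""
    return address % n_banks


def bank_usage(start: int, stride: int, count: int, n_banks: int = 4) -> Counter:
    """How many of `count` strided accesses land in each bank."""
    return Counter(bank_of(start + k * stride, n_banks) for k in range(count))


if __name__ == "__main__":
    for stride in (1, 2, 4):
        print(f"stride {stride}: {dict(bank_usage(0, stride, 8))}")
```

With unit stride the eight accesses spread evenly over all four banks. With a stride of 4 every access lands in bank 0, so the banks cannot overlap their work and the vector load runs at the speed of a single bank.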