1 Qualifying Exam Wei Chen Unified Parallel C (UPC) and the Berkeley UPC Compiler Wei Chen the Berkeley UPC Group 3/11/07


Qualifying Exam

Unified Parallel C (UPC) and the Berkeley UPC Compiler

Wei Chen, the Berkeley UPC Group

3/11/07


Parallel Programming

• Most parallel programs are written using either:
  • Message passing with an SPMD model
    • Usually for scientific applications with C++/Fortran
    • Scales easily: user controlled data layout
    • Hard to use: send/receive matching, message packing/unpacking
  • Shared memory with OpenMP/pthreads/Java
    • Usually for non-scientific applications
    • Easier to program: direct reads and writes to shared data
    • Hard to scale: (mostly) limited to SMPs, no concept of locality
• PGAS: an alternative hybrid model


Partitioned Global Address Space

• PGAS model uses global address space abstraction
  • Shared memory is partitioned by processors
  • User controlled data layout (global pointers and distributed arrays)
• One-sided communication:
  • Use RDMA support for reads/writes of shared variables
  • Much faster than message passing for small/medium size messages
• Hybrid model works for both SMPs and clusters
• Languages: Titanium, Co-Array Fortran, UPC

[Figure: global address space, divided into a Shared region holding X[0], X[1], ..., X[P] and a Private region holding a ptr on each thread.]
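The global pointers and one-sided reads/writes described on this slide can be illustrated with a small single-process model in C. This is only a sketch under stated assumptions: the `global_ptr` struct, the `gp_get`/`gp_put`/`gp_is_local` helpers, and the sizes `NUM_THREADS` and `PART_SIZE` are hypothetical names invented for illustration, not the representation used by any UPC implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 4   /* hypothetical thread count */
#define PART_SIZE   8   /* hypothetical number of ints per shared partition */

/* One shared partition per thread; together they model the global address space. */
static int shared_mem[NUM_THREADS][PART_SIZE];

/* Hypothetical global pointer: owning thread plus offset inside its partition. */
typedef struct {
    int    thread;
    size_t offset;
} global_ptr;

/* One-sided read: the owning thread takes no part in the operation. */
int gp_get(global_ptr p) {
    assert(p.thread >= 0 && p.thread < NUM_THREADS && p.offset < PART_SIZE);
    return shared_mem[p.thread][p.offset];
}

/* One-sided write into the owner's partition. */
void gp_put(global_ptr p, int value) {
    assert(p.thread >= 0 && p.thread < NUM_THREADS && p.offset < PART_SIZE);
    shared_mem[p.thread][p.offset] = value;
}

/* Locality check: does the pointer refer to the caller's own partition? */
int gp_is_local(global_ptr p, int my_thread) {
    return p.thread == my_thread;
}
```

In this model, any thread can call `gp_put` or `gp_get` on any partition, while `gp_is_local` exposes the locality information that a plain shared-memory model hides.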


Unified Parallel C

• An SPMD parallel extension of C
• PGAS: add shared qualifier to type system
• Several kinds of shared array distributions
• Fine-grained and bulk communication
• Commercial compilers from Cray/HP/IBM
• Open source compiler: Berkeley UPC

Vector Addition in UPC

#define N (100*THREADS)

shared int v1[N], v2[N], sum[N];  // cyclic layout

int main(void) {
    for (int i = 0; i < N; i++)
        if (MYTHREAD == i % THREADS)  // SPMD: each thread adds only the elements it owns
            sum[i] = v1[i] + v2[i];
    return 0;
}
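To make the ownership test in the loop concrete, the following plain C sketch simulates the SPMD execution in a single process. It relies on the UPC layout rule that element `i` of an array with block size `B` has affinity to thread `(i / B) % THREADS`, with `B = 1` for the default cyclic layout. The function names `owner_of` and `spmd_vector_add` are hypothetical, and the outer loop over `me` merely stands in for the threads that would run concurrently.

```c
#include <assert.h>

/* Thread with affinity to element i of a shared array with block size
   `block`; block == 1 is the cyclic layout. */
int owner_of(int i, int block, int threads) {
    assert(i >= 0 && block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Run the loop once per simulated thread, as each SPMD thread would.
   Returns the number of writes; every element is written by its owner only. */
int spmd_vector_add(const int *v1, const int *v2, int *sum, int n, int threads) {
    int writes = 0;
    for (int me = 0; me < threads; me++)          /* stands in for MYTHREAD */
        for (int i = 0; i < n; i++)
            if (me == owner_of(i, 1, threads)) {  /* MYTHREAD == i % THREADS */
                sum[i] = v1[i] + v2[i];
                writes++;
            }
    return writes;
}
```

With 8 elements and 4 simulated threads, the function performs exactly 8 writes, one per element, which is why the ownership test lets all threads run the same loop without duplicating work.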


Overview of the Berkeley UPC Compiler

• Two goals: portability and high performance
• Layers, from top to bottom:
  • UPC Code
  • Translator: lowers UPC code into ISO C code
  • Translator Generated C Code
  • Berkeley UPC Runtime System: shared memory management and pointer operations
  • GASNet Communication System: uniform get/put interface for underlying networks
  • Network Hardware
• Properties labeled on the layers in the diagram: platform-independent, network-independent, compiler-independent, language-independent


UPC to C Translator

[Figure: translator pipeline. Intermediate forms: Preprocessed UPC Source, WHIRL with shared types, Optimized WHIRL, WHIRL with runtime calls, ISO C code. Phases: Parsing, Optimizer, Lowering, WHIRL2C, Backend C compiler.]

• Based on Open64
  • Extend with shared type
  • Reuse analysis framework
  • Add UPC specific optimizations
• Portable translation
  • High level IR
  • Config file for platform dependent information
• Reinclude library headers
• Convert shared memory operations into runtime calls
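As a rough illustration of converting shared memory operations into runtime calls, the C sketch below shows what a statement such as `x[j] = x[i] + 1` on a cyclic shared array might be lowered to. Everything here is an assumption made for illustration: `shared_ref`, `locate_cyclic`, `rt_get`, `rt_put`, and `lowered_increment` are hypothetical names, not the Berkeley UPC runtime interface, and the "remote" partitions are ordinary arrays in one process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_THREADS      4    /* hypothetical thread count */
#define ELEMS_PER_THREAD 16   /* hypothetical partition size */

static int partition[NUM_THREADS][ELEMS_PER_THREAD];
static int runtime_calls;     /* counts calls into the mock runtime */

/* Hypothetical shared address: owning thread and index within its partition. */
typedef struct { int thread; size_t index; } shared_ref;

/* Map a global index of a cyclic array to its owner and local index. */
shared_ref locate_cyclic(size_t i) {
    assert(i < (size_t)NUM_THREADS * ELEMS_PER_THREAD);
    shared_ref r = { (int)(i % NUM_THREADS), i / NUM_THREADS };
    return r;
}

/* Mock runtime get: copy nbytes from shared memory into a private buffer. */
void rt_get(void *dst, shared_ref src, size_t nbytes) {
    memcpy(dst, &partition[src.thread][src.index], nbytes);
    runtime_calls++;
}

/* Mock runtime put: copy nbytes from a private buffer into shared memory. */
void rt_put(shared_ref dst, const void *src, size_t nbytes) {
    memcpy(&partition[dst.thread][dst.index], src, nbytes);
    runtime_calls++;
}

/* Possible lowered form of the UPC statement  x[j] = x[i] + 1; */
void lowered_increment(size_t i, size_t j) {
    int tmp;
    rt_get(&tmp, locate_cyclic(i), sizeof tmp);   /* shared read  -> get */
    tmp = tmp + 1;                                /* private computation */
    rt_put(locate_cyclic(j), &tmp, sizeof tmp);   /* shared write -> put */
}
```

The point of the sketch is that the emitted code is ordinary ISO C: all knowledge of the network lives behind the get/put calls, which is what lets the generated code stay portable.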


Optimization framework

• Combination of language/compiler/runtime support
• Transparent to the user
• Performance portable
  • Short term goal: effective on different cluster networks
  • Long term goal: code designed for SMPs gets good performance on clusters

• Optimize regular array accesses, e.g. A[i][j][k]
  • Loop framework for message vectorization, strip mining
• Optimize irregular pointer accesses, e.g. p->x->y
  • PRE framework with split-phase access and coalescing
• Nonblocking bulk communication, e.g. upc_memget(dst, src, size)
  • Runtime framework for communication overlap
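Message vectorization with strip mining can be sketched in plain C by counting messages. The example below is a single-process model, not the compiler's actual transformation: `mock_get` is a hypothetical stand-in for a bulk get such as `upc_memget`, `remote_array` plays the role of data owned by another thread, and `MAX_STRIP` is an arbitrary buffer limit chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REMOTE_LEN 1024
#define MAX_STRIP  64

static double remote_array[REMOTE_LEN];  /* stands in for another thread's data */
static long messages_sent;               /* one increment per simulated message */

/* Hypothetical bulk get: copies `count` elements and costs one message. */
void mock_get(double *dst, size_t start, size_t count) {
    assert(start + count <= REMOTE_LEN);
    memcpy(dst, &remote_array[start], count * sizeof(double));
    messages_sent++;
}

/* Unoptimized loop: one message per remote element. */
double sum_fine_grained(size_t n) {
    double total = 0.0, x;
    for (size_t i = 0; i < n; i++) {
        mock_get(&x, i, 1);
        total += x;
    }
    return total;
}

/* Vectorized and strip-mined loop: one message per strip of elements. */
double sum_vectorized(size_t n, size_t strip) {
    double buf[MAX_STRIP];
    double total = 0.0;
    assert(strip > 0 && strip <= MAX_STRIP);
    for (size_t start = 0; start < n; start += strip) {
        size_t len = (n - start < strip) ? n - start : strip;
        mock_get(buf, start, len);             /* fetch a whole strip at once */
        for (size_t k = 0; k < len; k++)
            total += buf[k];
    }
    return total;
}
```

Summing 100 elements costs 100 messages in the fine-grained version and 4 messages with a strip of 32, while the bounded strip keeps the private buffer small.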


Application Performance – LU Decomposition

[Figure: LU performance comparison, MPI/HPL vs. UPC/LU. Axes: system (proc count) and % peak performance, 10% to 90%. Systems: Cray X1 (64), Cray X1 (128/124), SGI Altix (32), Opteron 2.2 GHz (64).]

• UPC performance comparable to MPI/HPL(Linpack) with < ½ the code size

• Uses light-weight multi-threading atop SPMD to tolerate latency

• Highly adaptable to different problem and machine sizes


Application Performance – 3D FFT

• One-sided UPC approach sends more, smaller messages
  • Same total volume of data, but send earlier and more often
  • Aggressively overlaps the transpose with the 2nd 1-D FFT
• Same approach is less effective in MPI due to higher per-message cost
• Consistently outperforms MPI-based implementations, by as much as 2X

[Figure: 3D FFT performance in MFLOPS / Proc; up is good.]
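The trade-off between sending early and paying a per-message cost can be explored with a toy timing model in C. This is a deliberately simplified sketch, not a model taken from the talk: it assumes `k` equal pieces, a fixed `compute` time per piece, a fixed per-message `overhead`, a fixed `transfer` time per piece, messages that are serialized on the network, and computation that never stalls. All names and numbers are hypothetical.

```c
#include <assert.h>

/* Bulk strategy: finish all k compute steps, then send everything in one message. */
double time_bulk(int k, double compute, double overhead, double transfer) {
    assert(k > 0);
    return k * compute + overhead + k * transfer;
}

/* Pipelined strategy: send each piece as soon as it has been computed.
   Messages queue on the network while computation continues. */
double time_pipelined(int k, double compute, double overhead, double transfer) {
    assert(k > 0);
    double network_free = 0.0;                  /* when the network is next idle */
    for (int i = 0; i < k; i++) {
        double ready = (i + 1) * compute;       /* piece i finishes computing */
        double start = ready > network_free ? ready : network_free;
        network_free = start + overhead + transfer;
    }
    return network_free;                        /* last message delivered */
}
```

Under this model, with 4 pieces, compute time 10, and transfer time 2, a per-message overhead of 1 gives 43 time units pipelined versus 49 in bulk, whereas an overhead of 20 gives 98 pipelined versus 68 in bulk. That is consistent with the slide's observation that many small messages pay off only when the per-message cost is low.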


Current Status

• Public release v2.4 in November 2006
  • Fully compliant with UPC 1.2 specification
  • Communication optimizations
  • Extensions for performance and programmability
• Support from laptops to supercomputers
  • OS: UNIX (Linux, BSD, AIX, Solaris, etc.), Mac, Cygwin
  • Arch: x86, Itanium, Opteron, Alpha, PPC, SPARC, Cray X1, NEC SX-6, Blue Gene, etc.
  • Network: SMP, Myrinet, Quadrics, Infiniband, IBM LAPI, MPI, Ethernet, SHMEM, etc.

• Give us a try at http://upc.lbl.gov


Summary

• UPC designed to be consistent with C
  • Expose memory layout
  • Flexible communication with pointers and arrays

• Give users more control to achieve high performance

• Berkeley UPC compiler provides an open-source and portable implementation

• Hand optimized UPC programs match and often beat MPI’s performance

• Research goal: productive user + efficient compiler