

Mark Silberstein - UT Austin 1

GPUfs: Integrating a file system with GPUs

Mark Silberstein (UT Austin/Technion)

Bryan Ford (Yale), Idit Keidar (Technion), Emmett Witchel (UT Austin)

Traditional System Architecture

Applications

OS

CPU

Modern System Architecture

Manycore processors

FPGA

Hybrid CPU-GPU

GPUs

CPU

Accelerated applications

OS

Software-hardware gap is widening

Manycore processors

FPGA

Hybrid CPU-GPU

GPUs

CPU

Accelerated applications

OS

Software-hardware gap is widening

Manycore processors

FPGA

Hybrid CPU-GPU

GPUs

CPU

Accelerated applications

OS

Ad-hoc abstractions and management mechanisms

On-accelerator OS support closes the programmability gap

Manycore processors

FPGA

Hybrid CPU-GPU

GPUs

CPU

Accelerated applications

OS

On-accelerator OS support

Native accelerator applications

Coordination

● GPUfs: File I/O support for GPUs
● Motivation
● Goals
● Understanding the hardware
● Design
● Implementation
● Evaluation

Building systems with GPUs is hard. Why?

Goal of GPU programming frameworks

Data transfers

GPU invocation

Memory management

GPU

Parallel Algorithm

CPU

Headache for GPU programmers

Parallel Algorithm

GPU

Data transfers

Invocation

Memory management

CPU

Half of the CUDA SDK 4.1 samples: at least 9 CPU LOC per 1 GPU LOC

GPU kernels are isolated

Parallel Algorithm

GPU

Data transfers

Invocation

Memory management

CPU

Example: accelerating photo collage

http://www.codeproject.com/Articles/36347/Face-Collage

While(Unhappy()){
    Read_next_image_file()
    Decide_placement()
    Remove_outliers()
}

CPU Implementation

CPU CPU CPU

Application

While(Unhappy()){
    Read_next_image_file()
    Decide_placement()
    Remove_outliers()
}

Offloading computations to GPU

CPU CPU CPU

Application

While(Unhappy()){
    Read_next_image_file()
    Decide_placement()
    Remove_outliers()
}

Move to GPU

Offloading computations to GPU

GPU

CPU

Kernel start

Data transfer

Kernel termination

Co-processor programming model

Kernel start/stop overheads

CPU

GPU

copy to GPU

copy to CPU

invoke

Invocation latency

Synchronization

Cache flush

Hiding the overheads

CPU

GPU

copy to GPU

copy to CPU

invoke

copy to GPU

Manual data reuse management

Asynchronous invocation

Double buffering

Implementation complexity

CPU

GPU

copy to GPU

copy to CPU

invoke

copy to GPU

Manual data reuse management

Asynchronous invocation

Double buffering

Management overhead

Implementation complexity

CPU

GPU

copy to GPU

copy to CPU

invoke

copy to GPU

Manual data reuse management

Asynchronous invocation

Double buffering

Management overhead

Why do we need to deal with low-level system details?

The reason is....

GPUs are peer-processors

They need I/O OS services

GPUfs: application view

CPUs GPU1 GPU2 GPU3

open("shared_file")

mmap()

open("shared_file")

write()

Host File System

GPUfs

GPUfs: application view

CPUs GPU1 GPU2 GPU3

open("shared_file")

mmap()

open("shared_file")

write()

Host File System

GPUfs

System-wide shared namespace

Persistent storage

POSIX (CPU)-like API

Accelerating collage app with GPUfs

CPU CPU CPU

GPUfs GPUfs

open/read from GPU

GPU

No CPU management code

Accelerating collage app with GPUfs

CPU CPU CPU

GPUfs buffer cache

GPUfs

GPU

GPUfs

Overlapping computations and transfers

Read-ahead

Accelerating collage app with GPUfs

CPU CPU CPU

GPUfs

GPU

Data reuse

Random data access

Challenge

GPU ≠ CPU

Massive parallelism

NVIDIA Fermi*: 23,000 active threads

AMD HD5870*: 31,000 active threads

* From M. Houston/A. Lefohn/K. Fatahalian: A trip through the architecture of modern GPUs

Parallelism is essential for performance in deeply multi-threaded wide-vector hardware

Heterogeneous memory

CPU memory: 10-32 GB/s

CPU-GPU transfer: 6-16 GB/s

GPU memory: 288-360 GB/s

~20x

GPUs inherently impose high bandwidth demands on memory

How to build an FS layer on this hardware?

GPUfs: principled redesign of the whole file system stack

● Relaxed FS API semantics for parallelism

● Relaxed FS consistency for heterogeneous memory

● GPU-specific implementation of synchronization primitives, lock-free data structures, memory allocation, ….

GPUfs high-level design

Unchanged applications using OS File API

GPU application using GPUfs File API

OS File System Interface

GPUfs hooks

GPUfs GPU File I/O library

GPUfs Distributed Buffer Cache

CPU Memory

GPU Memory (Page cache)

OS

CPU GPU

Host File System

Disk

Massive parallelism

Heterogeneous memory

GPUfs high-level design

Unchanged applications using OS File API

GPU application using GPUfs File API

OS File System Interface

GPUfs hooks

GPUfs GPU File I/O library

GPUfs Distributed Buffer Cache

CPU Memory

GPU Memory (Page cache)

OS

CPU GPU

Host File System

Disk

Buffer cache semantics

Local or distributed file system data consistency?

GPUfs buffer cache: weak data consistency model

● close(sync)-to-open semantics (AFS)

GPU1: write(1), fsync(), write(2)

GPU2: open(), read(1)

Not visible to CPU

Remote-to-local memory performance ratio is similar to a distributed system
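The close(sync)-to-open timeline above can be mimicked with a small host-side model. This is only a sketch under stated assumptions: the names `host_file`, `gpu_file`, and the `model_*` functions are hypothetical and are not the GPUfs API, and a single integer stands in for the file contents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one shared value stands in for the host file contents. */
typedef struct { int value; } host_file;

/* Per-GPU cached copy of the file, as a local buffer cache would hold it. */
typedef struct { host_file *host; int cached; int dirty; } gpu_file;

/* open(): refresh the local copy from the host version. */
void model_open(gpu_file *g, host_file *h) {
    assert(g != NULL && h != NULL);
    g->host = h;
    g->cached = h->value;
    g->dirty = 0;
}

/* write(): update only the local copy; nothing is published yet. */
void model_write(gpu_file *g, int v) { g->cached = v; g->dirty = 1; }

/* read(): always served from the local copy. */
int model_read(const gpu_file *g) { return g->cached; }

/* fsync(): publish the local copy to the host if it was modified. */
void model_fsync(gpu_file *g) {
    if (g->dirty) { g->host->value = g->cached; g->dirty = 0; }
}
```

In this model, if GPU1 performs write(1), fsync(), write(2) and GPU2 then opens the file, GPU2 reads 1: the second write stays in GPU1's local copy until the next sync.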

On-GPU File I/O API

CPU file API      GPUfs API
open/close        gopen/gclose
read/write        gread/gwrite
mmap/munmap       gmmap/gmunmap
fsync/msync       gfsync/gmsync
ftrunc            gftrunc

In the paper

Changes in the semantics are crucial

Implementation bits

● Paging support
● Dynamic data structures and memory allocators
● Lock-free radix tree
● Inter-processor communications (IPC)
● Hybrid H/W-S/W barriers
● Consistency module in the OS kernel

In the paper

~1.5K GPU LOC, ~600 CPU LOC
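The slide only names a lock-free radix tree, so the following is a minimal sketch of the general idea rather than the GPUfs implementation. It assumes a two-level, insert-only tree indexed by page number, with C11 atomics on the CPU standing in for GPU atomics. The fanout, the type names, and the functions `radix_init`, `radix_lookup`, and `radix_insert` are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define FANOUT 256  /* assumed: 8 bits of the page index per level */

typedef struct { _Atomic(void *) page[FANOUT]; } leaf_node;
typedef struct { _Atomic(leaf_node *) child[FANOUT]; } root_node;

static leaf_node *leaf_new(void) {
    leaf_node *l = malloc(sizeof *l);
    if (l) for (int i = 0; i < FANOUT; i++) atomic_init(&l->page[i], NULL);
    return l;
}

void radix_init(root_node *r) {
    assert(r != NULL);
    for (int i = 0; i < FANOUT; i++) atomic_init(&r->child[i], NULL);
}

/* Returns the page stored for a 16-bit index, or NULL if absent. */
void *radix_lookup(root_node *r, unsigned index) {
    leaf_node *l = atomic_load(&r->child[(index >> 8) & 0xFF]);
    return l ? atomic_load(&l->page[index & 0xFF]) : NULL;
}

/* Inserts page unless the slot is already taken; returns the page that
 * ends up stored, so a racing caller learns who won. No locks are held:
 * both levels are published with compare-and-swap. */
void *radix_insert(root_node *r, unsigned index, void *page) {
    _Atomic(leaf_node *) *slot = &r->child[(index >> 8) & 0xFF];
    leaf_node *l = atomic_load(slot);
    if (!l) {
        leaf_node *fresh = leaf_new();
        if (!fresh) return NULL;
        leaf_node *expected = NULL;
        if (atomic_compare_exchange_strong(slot, &expected, fresh)) {
            l = fresh;
        } else {          /* another inserter published a leaf first */
            free(fresh);
            l = expected;
        }
    }
    void *none = NULL;
    if (atomic_compare_exchange_strong(&l->page[index & 0xFF], &none, page))
        return page;
    return none;          /* slot was occupied: return the existing page */
}
```

Removal and memory reclamation are deliberately left out; they are the hard part of a real lock-free structure and the slides do not describe how GPUfs handles them.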

Evaluation

All benchmarks are written as a GPU kernel: no CPU-side development

Matrix-vector product (inputs/outputs in files)

Vector 1x128K elements, page size = 2MB, GPU = Tesla C2075

[Figure: throughput (MB/s, axis 0 to 3500) vs. input matrix size (280 to 11200 MB), comparing CUDA pipelined, CUDA optimized, and GPU file I/O]

Word frequency count in text

● Count frequency of modern English words in the works of Shakespeare, and in the Linux kernel source tree

English dictionary: 58,000 words

Challenges
● Dynamic working set
● Small files
● Lots of file I/O (33,000 files, 1-5KB each)
● Unpredictable output size

Results

                                     8 CPUs   GPU-vanilla   GPU-GPUfs
Linux source (33,000 files, 524MB)   6h       50m (7.2X)    53m (6.8X)
Shakespeare (1 file, 6MB)            292s     40s (7.3X)    40s (7.3X)

Results

                                     8 CPUs   GPU-vanilla   GPU-GPUfs
Linux source (33,000 files, 524MB)   6h       50m (7.2X)    53m (6.8X)
Shakespeare (1 file, 6MB)            292s     40s (7.3X)    40s (7.3X)

Unbounded input/output size support

8% overhead

GPUfs

CPU

GPU

GPUfs is the first system to provide native access to host OS services from GPU programs

Code is available for download at: https://sites.google.com/site/silbersteinmark/Home/gpufs

http://goo.gl/ofJ6J

Our life would have been easier with

● PCI atomics
● Preemptive background daemons
● GPU-CPU signaling support
● In-GPU exceptions
● GPU virtual memory API (host-based or device)
● Compiler optimizations for register-heavy libraries
● Seems like accomplished in 5.0

Sequential access to file: 3 versions

CUDA whole file transfer (CPU): Read file, Transfer to GPU

CUDA pipelined transfer (CPU): Read chunk, Transfer to GPU, Read chunk, Transfer to GPU

GPU file I/O (GPU): gmmap()

Sequential read: throughput vs. page size

[Figure: throughput (MB/s, axis 0 to 4000) vs. page size (16K, 64K, 256K, 512K, 1M, 2M), comparing GPU File I/O, CUDA whole file, and CUDA pipeline]

Sequential read: throughput vs. page size

[Figure: throughput (MB/s, axis 0 to 4000) vs. page size (16K, 64K, 256K, 512K, 1M, 2M), comparing GPU File I/O, CUDA whole file, and CUDA pipeline]

Benefit: Decouple performance constraints from application logic

Yesterday: accelerators as co-processors

Tomorrow: accelerators as peers

On-accelerator OS support

Yesterday: accelerators as co-processors

Tomorrow: accelerators as peers

CPU GPU

What about software?

Set GPUs free!

Parallel square root on GPU

gpu_thread(thread_id i){
    float buffer;                               // one element per thread
    int fd=gopen(filename,O_GRDWR);             // open the shared file from the GPU
    size_t offset=sizeof(float)*i;              // byte offset of element i
    gread(fd,sizeof(float),&buffer,offset);     // read element i
    buffer=sqrt(buffer);
    gwrite(fd,sizeof(float),&buffer,offset);    // write the result back in place
    gclose(fd);
}

The same code runs in each of the thousands of GPU threads
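For comparison, the same per-element pattern can be written on the CPU with the POSIX positional calls `pread` and `pwrite`. This is a rough analogue, not GPUfs code: a sequential loop stands in for the GPU threads, the function names are hypothetical, and a small Newton iteration replaces `sqrt` only so that the sketch builds without linking the math library.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Newton iteration; used here only to avoid a libm dependency. */
static float approx_sqrt(float x) {
    if (x <= 0.0f) return 0.0f;
    float r = x > 1.0f ? x : 1.0f;
    for (int k = 0; k < 40; k++) r = 0.5f * (r + x / r);
    return r;
}

/* Work of one logical thread i: read, transform, and write back element i. */
static int sqrt_element(int fd, size_t i) {
    float buffer;
    off_t offset = (off_t)(sizeof(float) * i);
    if (pread(fd, &buffer, sizeof buffer, offset) != (ssize_t)sizeof buffer)
        return -1;
    buffer = approx_sqrt(buffer);
    if (pwrite(fd, &buffer, sizeof buffer, offset) != (ssize_t)sizeof buffer)
        return -1;
    return 0;
}

/* Sequential stand-in for launching n GPU threads over one file. */
int sqrt_file(const char *path, size_t n) {
    assert(path != NULL);
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++) rc = sqrt_element(fd, i);
    close(fd);
    return rc;
}
```

Unlike the slide, this version opens the file once for all elements. In the GPU code every thread calls gopen itself.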

GPUfs impact on GPU programs

● Memory overhead
● Register pressure
● Very little CPU coding
● Makes exitless GPU kernels possible

Pay-as-you-go design

Preserve CPU semantics?

GPU threads are different from CPU threads

SIMD vector: Thread Thread Thread Thread

SIMD vector: Thread Thread Thread Thread

What does it mean to open/read/write/close/mmap a file in thousands of threads?

Preserve CPU semantics?

GPU threads are different from CPU threads

SIMD vector: Thread Thread Thread Thread

SIMD vector: Thread Thread Thread Thread

GPU kernel is a single data-parallel application

What does it mean to open/read/write/close/mmap a file in thousands of threads?

GPUfs semantics (see more discussion in the paper)

int fd=gopen("filename",O_GRDWR);

One file descriptor per file: open()/close() cached on a GPU

One call per SIMD vector: bulk-synchronous cooperative execution

SIMD vector: Thread Thread Thread Thread

SIMD vector: Thread Thread Thread Thread
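One way to read "one file descriptor per file, open()/close() cached on a GPU" is as a reference-counted open-file table, where only the first open of a file goes to the host. The sketch below illustrates that reading under assumptions of its own: `open_table`, `cached_open`, `cached_close`, and the `host_requests` counter are hypothetical names, the table is single-threaded, and a real GPU-side table would need synchronization that is omitted here.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPEN 8   /* assumed table capacity */
#define MAX_NAME 64  /* assumed maximum file name length, including NUL */

typedef struct { char name[MAX_NAME]; int refs; } open_slot;
typedef struct { open_slot slot[MAX_OPEN]; int host_requests; } open_table;

void table_init(open_table *t) {
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/* Returns the descriptor shared by all openers of name, or -1 on failure.
 * Only the first open of a file counts as a (simulated) host request. */
int cached_open(open_table *t, const char *name) {
    if (strlen(name) >= MAX_NAME) return -1;
    int free_idx = -1;
    for (int i = 0; i < MAX_OPEN; i++) {
        if (t->slot[i].refs > 0 && strcmp(t->slot[i].name, name) == 0) {
            t->slot[i].refs++;        /* already open: reuse the descriptor */
            return i;
        }
        if (t->slot[i].refs == 0 && free_idx < 0) free_idx = i;
    }
    if (free_idx < 0) return -1;      /* table full */
    strcpy(t->slot[free_idx].name, name);
    t->slot[free_idx].refs = 1;
    t->host_requests++;               /* first open goes to the host */
    return free_idx;
}

/* Drops one reference; the slot becomes reusable when the count hits zero. */
int cached_close(open_table *t, int fd) {
    if (fd < 0 || fd >= MAX_OPEN || t->slot[fd].refs == 0) return -1;
    t->slot[fd].refs--;
    return 0;
}
```

With this table, thousands of callers opening the same name all receive the same small integer, and the host sees a single request.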

GPU hardware characteristics

Parallelism

Heterogeneous memory

API semantics

int fd=gopen("filename",O_GRDWR);

API semantics

int fd=gopen("filename",O_GRDWR);

CPU

int fd=gopen("filename",O_GRDWR);

This code runs in 100,000 GPU threads

int fd=gopen("filename",O_GRDWR);

CPU≠GPU

int fd=gopen("filename",O_GRDWR);