Programming for Mobile Gadgets - The 5th …prism.sejong.ac.kr/download/p04_calin_cascaval.pdf ·...

Qualcomm Confidential and Proprietary. Restricted Distribution: DO NOT COPY. MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION.

Programming for Mobile Gadgets
Calin Cascaval
cascaval@qti.qualcomm.com

Upload: vandien

Post on 24-Mar-2018


TRANSCRIPT



Mobile Computing Landscape

users: 10^9
devices: 10^11
developers: 10^6


Anatomy of a mobile SoC (MSM8960)

Source: http://www.qualcomm.com/documents/files/snapdragon-s4-processors-system-on-chip-solutions-for-a-new-mobile-age-white-paper.pdf


Parallel Computing - Use Cases in Mobile

•  Computational Photography
   –  HDR, camera arrays
•  Multimedia Processing
   –  Photo and video editors
•  Computer Vision Libraries & Applications
   –  OpenCV, FastCV
•  Gaming & Graphics
   –  Unity, Bullet, Box2D
•  Web Applications & Browser Technologies
   –  Qualcomm Zoomm, Mozilla Servo, Intel Rivertrail
•  User Interfaces
   –  Samsung Touchwiz, HTC Sense, Windows Metro
•  Security


Programming for mobile computing

Developers
•  Think in terms of services and “features”
•  Portable apps, performance not the main concern
•  Use JavaScript frameworks

[Figure: block diagrams of the SoC compute units, labeled DSP, GPU, Hardware Accel., and CPU(s) (four CPUs with VeNum units and a shared L2), with the labels "High Power Efficiency" and "Low Power Efficiency".]


Pain Points for Parallel Computing

•  Portability and performance portability across platforms
•  What to parallelize? (profiling)
•  How to parallelize? (programming models)
•  Debugging & Testing (correctness)
•  Achieving maximum power & performance efficiency (tuning)


Issues with current models

1  Concurrency Model
2  Heterogeneity Support
3  Power Efficiency


Concurrency model issues

•  Most models are designed with data parallelism in mind
   –  CUDA, RenderScript, OpenMP, C++ AMP, ABB, …
•  Most task-parallel models are not expressive enough for complex software
   –  Apple’s GCD uses queues to set dependences between tasks
   –  Intel’s TBB task model has cumbersome support for complex dependences between tasks
   –  OpenMP focuses on fork-and-join parallelism
•  Irregular applications, such as browsers, do not fit loop-based parallelism
   –  Need to combine different models and concurrency constructs: task and data parallelism, callbacks from events, blocking and network timeouts
   –  Unpredictable amounts of computation
   –  Fast response and unexpected events
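The mix of constructs that an irregular application needs can be sketched in a few lines. The example below is only an illustration using Python's standard `concurrent.futures` module, not any of the models named on the slide; `decode_tile`, `fetch_resource`, and `run_page_load` are hypothetical names, and the sleep stands in for a network request. It combines a data-parallel map, an independent task, a completion callback, and a timeout.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time


def decode_tile(tile):
    # Data-parallel work: the same operation applied to every tile.
    return [255 - pixel for pixel in tile]


def fetch_resource(delay_s):
    # Stand-in for a network request of unpredictable duration (assumption).
    time.sleep(delay_s)
    return "payload"


def run_page_load(tiles, fetch_delay_s, timeout_s):
    events = []

    def on_fetch_done(_future):
        # Callback triggered by an event (completion of the fetch task).
        events.append("fetch-done")

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Task parallelism: the fetch runs concurrently with the decoding.
        fetch = pool.submit(fetch_resource, fetch_delay_s)
        fetch.add_done_callback(on_fetch_done)
        # Data parallelism: map one function over all tiles.
        decoded = list(pool.map(decode_tile, tiles))
        # Blocking with a timeout: give up on the payload if it is late.
        try:
            payload = fetch.result(timeout=timeout_s)
        except TimeoutError:
            payload = None
    # Leaving the "with" block waits for all workers, so the callback has run.
    return decoded, payload, events
```

Even this small case needs four different constructs, which is the difficulty the slide points at: a purely loop-based or fork-and-join model covers only the `pool.map` line.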


Heterogeneity issues

•  Heterogeneity exploitation is hard
   –  Requires explicit programming and detailed knowledge of the device, resulting in non-portable code
      •  OpenCL is a first attempt to change this for programmable GPUs
   –  Many potentially programmable devices are hidden behind fixed-function drivers
      •  FastCV is an example that attempts to break that barrier
•  Only OpenCL was designed from the ground up to be heterogeneous
   –  HSA: a heterogeneous system architecture definition
   –  Renderscript?
•  Code running in these models targets the lowest common denominator
   –  Restrictions for specialized cores percolate to general-purpose cores
   –  GPUs are great for data-parallel computation, terrible for task parallelism
   –  Impose limited languages, often derived from ISO C99
   –  Programmers end up using two concurrency models in their apps, e.g., OpenCL and TBB
   –  No common interface for tasks running on the main CPU and on accelerators


Power efficiency issues

•  Mobile hardware provides aggressive power management support:
   –  Clock gating
   –  Voltage and frequency scaling
   –  Energy-efficient instructions
      •  Narrow ISAs, e.g., Thumb
      •  Vector instructions
   –  Low-power states
   –  Specialized, power-efficient units: DSP, H.264 and JPEG decoders
•  OSes provide some degree of power management
   –  Scheduling
   –  Managing power as a resource: Cinder, etc.
•  No current model provides abstractions to think about power
   –  Almost no programmer ever thinks about power
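To make the voltage and frequency scaling item concrete, the sketch below applies the standard first-order model of dynamic CMOS power, P = a·C·V²·f, to a fixed amount of work. It ignores static (leakage) power, and the capacitance, operating points, and cycle count used in the note after the code are hypothetical values chosen for the arithmetic, not MSM8960 data.

```python
def dynamic_power_w(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    # First-order dynamic power model: P = a * C * V^2 * f.
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def energy_for_work_j(cycles, capacitance_f, voltage_v, frequency_hz, activity=1.0):
    # A fixed number of cycles takes cycles / f seconds, so the energy is
    # a * C * V^2 * cycles: it depends on the voltage, not on the frequency.
    runtime_s = cycles / frequency_hz
    return dynamic_power_w(capacitance_f, voltage_v, frequency_hz, activity) * runtime_s
```

With an assumed switched capacitance of 1 nF and 3·10^9 cycles of work, running at 1.0 V and 1.5 GHz costs about 3.0 J in 2 s, while running at 0.8 V and 1.0 GHz costs about 1.92 J in 3 s. Under this model the saving comes from the lower voltage; lowering the frequency alone only stretches the runtime.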


Heterogeneous System Architecture (HSA)

•  HSA is a system specification that aims to:
   –  Make the GPU easily accessible
      •  GPU as a first-class computation engine
   –  Make GPU compute offload more efficient
      •  Direct offload, without going through drivers
      •  Eliminate memory copies by supporting shared virtual memory
   –  Provide a virtual ISA, HSAIL, to allow compatibility across devices
      •  HSAIL relies on a Finalizer to generate code for the GPU
   –  Define a memory model across CPU and GPU, compatible with C++11


OpenCL

[Figure panels: Platform Model, Memory Model]

Execution Model
A host program orchestrates the execution of kernels running on devices
–  Kernels are enqueued and communicate through events or GPU global memory
–  The host is responsible for managing memory
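The execution model described here can be mimicked with a toy simulation. The sketch below is not the OpenCL API: `Device`, `enqueue_write`, `enqueue_kernel`, `enqueue_read`, and both kernels are hypothetical names, a single-worker thread pool stands in for an in-order command queue, and futures stand in for events. It only shows the division of labour: the host allocates and copies memory, enqueues kernels, and orders them through events, while kernels communicate through device global memory.

```python
from concurrent.futures import ThreadPoolExecutor


class Device:
    """Toy device: private 'global memory' plus an in-order command queue."""

    def __init__(self):
        self.global_memory = {}
        # One worker thread means commands run in the order they were enqueued.
        self._queue = ThreadPoolExecutor(max_workers=1)

    def enqueue_write(self, name, host_data):
        # Host-to-device copy; the returned future plays the role of an event.
        return self._queue.submit(self.global_memory.__setitem__, name, list(host_data))

    def enqueue_kernel(self, kernel, global_size, buffer_names, wait_for=()):
        def run():
            for event in wait_for:
                event.result()  # block until the prerequisite command is done
            buffers = [self.global_memory[n] for n in buffer_names]
            for gid in range(global_size):  # one "work-item" per index
                kernel(gid, *buffers)
        return self._queue.submit(run)

    def enqueue_read(self, name, wait_for=()):
        def run():
            for event in wait_for:
                event.result()
            return list(self.global_memory[name])  # device-to-host copy
        return self._queue.submit(run)

    def shutdown(self):
        self._queue.shutdown()


def scale(gid, src, dst):
    dst[gid] = 2 * src[gid]


def add_one(gid, buf):
    buf[gid] += 1


def host_program(data):
    dev = Device()
    n = len(data)
    w1 = dev.enqueue_write("src", data)      # the host manages memory
    w2 = dev.enqueue_write("dst", [0] * n)
    k1 = dev.enqueue_kernel(scale, n, ["src", "dst"], wait_for=[w1, w2])
    k2 = dev.enqueue_kernel(add_one, n, ["dst"], wait_for=[k1])  # ordered by an event
    result = dev.enqueue_read("dst", wait_for=[k2]).result()
    dev.shutdown()
    return result
```

Note how much of `host_program` is bookkeeping (buffers, copies, event lists) rather than computation; the two kernels themselves are one line each.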


What does the programmer need to think about?

•  Abstractions that express concurrency and hide specific hardware implementations:
   –  Computation: Tasks
   –  Memory locality: Sharing, ownership and placement
   –  Synchronization: Concurrent access to memory, dependencies between tasks, termination
   –  Communication: Values flowing between tasks, copies to memory closer to computation units
•  Parallel programming patterns
•  Concurrent, distributed data structures


Multicore vs. Heterogeneous Parallelism

•  Work decomposition
   –  Multicore: flexible, fine-grain tasks
   –  Heterogeneous: dependent on latency, currently coarse grain
•  Data decomposition
   –  Multicore: flexible, shared memory
   –  Heterogeneous: requires data copy to device
•  Synchronization
   –  Multicore: locks, atomics
   –  Heterogeneous: atomics within the device, events across devices
•  Communication
   –  Multicore: coherent caches
   –  Heterogeneous: through memory or message passing
•  Coding
   –  Multicore: high-level languages + parallel (||) extensions and libraries
   –  Heterogeneous: restricted languages or assembly; no parallel (||) extensions besides GPU (OpenCL)
•  Scheduling
   –  Multicore: uniform capabilities, availability based
   –  Heterogeneous: complex optimization space because of different capabilities, function constraints, and availability


What Is MARE?

•  MARE is a programming model and a runtime system that provide simple yet powerful abstractions for parallel, heterogeneous, and power-efficient software
•  MARE’s abstractions reduce the effort to scale concurrency to whatever hardware is available
•  MARE key benefits extend to heterogeneous execution


How is MARE solving these issues?

•  Task parallel: parallel computation is a partially-ordered set of sequential tasks (task graph)
•  Partitioned global address space: tasks can access any memory on the SoC
•  Task properties: provide mapping and scheduling guidance to the runtime system
•  Support for heterogeneous execution
•  Explicit support for parallel programming patterns, design geared toward ease of programming
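The idea of a computation as a partially ordered set of sequential tasks can be sketched with the Python standard library. This is not the MARE API, which the slides do not show; `run_task_graph` and the task names are hypothetical, and `graphlib.TopologicalSorter` (Python 3.9+) is used only to track which tasks have all their predecessors finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, max_workers=4):
    """Run a task graph and return each task's result by name.

    tasks: name -> callable taking a dict {predecessor name: its result}
    deps:  name -> set of predecessor names (missing means no predecessors)
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch every task whose predecessors have all completed.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Wait for at least one running task, then unlock its successors.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)
    return results


# A diamond-shaped graph: "double" and "square" are unordered with respect
# to each other, so they may run concurrently; "merge" waits for both.
example_tasks = {
    "load": lambda inputs: [1, 2, 3, 4],
    "double": lambda inputs: [2 * x for x in inputs["load"]],
    "square": lambda inputs: [x * x for x in inputs["load"]],
    "merge": lambda inputs: [a + b for a, b in zip(inputs["double"], inputs["square"])],
}
example_deps = {"double": {"load"}, "square": {"load"}, "merge": {"double", "square"}}
```

Calling `run_task_graph(example_tasks, example_deps)["merge"]` yields `[3, 8, 15, 24]`. Each task body is ordinary sequential code; only the dependency map expresses the ordering, which is what lets a runtime decide where and when to run each task.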


MARE programming

[Figure: development flow for a MARE App, with stages labeled Algorithm, Partition Work/Data, Setup task dependences, Parallel Algorithm Skeleton, Implement tasks, and Map resources, ending in Tasks mapped to Devices.]


MARE Workflow

•  Programmer:
   –  Partitions the application into small units of work (tasks)
   –  Sets up dependencies between tasks
   –  Asks MARE to execute them on as many cores as possible (launch)
•  The goal of the MARE API is to simplify well-established parallel programming patterns

Advantages:
•  Simpler than managing threads manually
•  Task scheduler can make more informed decisions using optional annotations supplied by the programmer or inferred by the runtime
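The three programmer steps (partition, set up dependencies, launch) can be illustrated with a small histogram computation. The sketch uses Python's `concurrent.futures` rather than MARE, and `partition`, `partial_histogram`, and `parallel_histogram` are hypothetical names. Note that CPython threads do not execute bytecode in parallel, so this shows the structure of the workflow, not a speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partition(data, chunk_size):
    # Step 1: split the work into small, independent units (tasks).
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def partial_histogram(chunk, bins):
    counts = [0] * bins
    for value in chunk:
        counts[value % bins] += 1
    return counts


def parallel_histogram(data, bins, chunk_size=1024):
    chunks = partition(data, chunk_size)
    workers = os.cpu_count() or 1  # use as many workers as there are cores
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 3: launch one task per chunk.
        futures = [pool.submit(partial_histogram, chunk, bins) for chunk in chunks]
        # Step 2: the merge depends on every per-chunk task, expressed here
        # by collecting all results before combining them.
        partials = [future.result() for future in futures]
    total = [0] * bins
    for counts in partials:
        for i, count in enumerate(counts):
            total[i] += count
    return total
```

The caller never creates or joins a thread; choosing `chunk_size` is the only tuning decision left, which is the kind of hint a task scheduler could also infer on its own.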


BarnesHut - A Galois benchmark

•  n-body simulation
•  Galois: a runtime system for graph-based applications

[Figure: execution time (s), on a scale from 0 to 4, versus number of cores (1, 2, 4, 6) for MARE, Pthreads, and Galois. Machine: 6-core x86.]


MARE users

•  System developers
   –  Use MARE APIs to develop low-level libraries
   –  ZOOMM engine, FastCV, gaming engines …
•  Advanced app developers
   –  Use MARE APIs and domain-specific libraries to develop high-performance applications
   –  Games, video and photo editors…
•  Regular and Web app developers
   –  Use MARE APIs through domain-specific libraries
   –  Parallelism is hidden from the programmer
   –  Banking apps, restaurant reviews…

[Figure: software stack with boxes labeled Native Apps, Web Apps, JS frameworks, ZOOMM Engine, Device API, Parallel Libraries, and MARE.]


Development tools

[Figure: tiers of development approaches, with axes labeled "Number of expert developers" and "Performance Gains".]

•  Manually Optimized (Ninja)
•  Programming Models & Abstractions (Expert Developers)
•  Optimized Domain Specific Libraries (Developers needing performance but lacking parallelization expertise)
•  Auto Vectorization & Auto Parallelization Tools (Minimal Developer effort)


Conclusions

•  Are we reinventing the wheel, and just calling it mobile and embedded?
•  The issues facing the mobile industry are similar to those of parallel and heterogeneous programming for large-scale machines and desktops
•  However:
   –  Mobile devices are mainly “single task”
   –  Mobile devices are power and energy constrained
   –  Mobile SoCs already have a variety of specialized cores
•  Therefore:
   –  Heterogeneous programming must contribute to improving power efficiency
