2012 in4392 lecture-5 cloudprogrammingmodels

Upload: akbisoi1

Post on 03-Jun-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    1/95

    IN4392 Cloud Computing

    Cloud Programming Models

    2012-2013

    1

    Cloud Computing (IN4392)

    D..!. "pema and #. Iosup. $it% &ontri'utions rom Claudio Martella and ogdan *%it.

    Parallel and Distri'uted *roup+, Delt

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    2/95

    ,erms or ,oda/s Dis&ussion

    Programming model= language + libraries + runtime system that create

    a model of computation (an abstract machine)

    Examples message!passing "s shared memory# data! "stask!parallelism# $

    #'stra&tion leel

    = distance from physical machine

    % What is the best abstraction le"el&

    '1'!'1 '

    Examples Assembly lo*!le"el "s Java is high le"elany design trade!offs performance# ease!of!use#common!task optimi,ation# programming paradigm# $

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    3/95

    C%ara&teristi&s o a Cloud Programming Model

    1- .ost model (Efficiency) = cost/performance# o"erheads# $

    - ca a y- ault!tolerance

    2- 0upport for specific ser"ices

    3- .ontrol model# e-g-# fine!grained many!task scheduling4- 5ata model# including partitioning and placement# out!of!

    6- 0ynchroni,ation model

    '1'!'1

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    4/95

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    5/95

    ,oda/s C%allenges

    e0cience

    :he ourth 8aradigm :he 5ata 5eluge and 9ig 5ata

    8ossibly others

    '1'!'1 3

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    6/95

    e&ien&e ,%e $%

    0cience experiments already cost '3;3< budget $ and perhaps incur 63< of the delays

    illions of lines of code *ith similar functionality

    e co e reuse across pro ec s an app ca on oma ns $ but last t*o decades? science is "ery similar in structure

    ost results difficult to share and reuse .ase!in!point 0loan 5igital 0ky 0ur"ey

    digital map of '3< of the sky x spectra2:9+ sky sur"ey data

    '+ astro!ob>ects (images)1+ ob>ects *ith spectrum (spectra)@o* to make it *ork for this andthe next generation of scientists&

    '1'!'1 4

    0ource Aim Bray and Clex 0,alay#e0cience !! C :ransformed 0cientific ethod#http//research-microsoft-com/en!us/um/people/gray/talks/D.!.0:9Fe0cience-ppt

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    7/95

    e&ien&e (!o%n ,alor &i.,e&%. 1999)

    # ne5 s&ientii& met%od .ombine science *ith 7:

    ull scientific process control scientific instrument or produce data

    # #results# "isuali,e results

    ostly compute!intensi"e# e-g-# simulation of complex phenomena

    I, support 7nfrastructure @. Brid# Gpen 0cience Brid# 5C0# DorduBrid# $

    "6amples

    H physics# 9ioinformatics# aterial science# Engineering# Comp&i'1'!'1 6

    % Why is .omp0ci an example here&

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    8/95

    ,%e 7ourt% Paradigm ,%e $% (#n #ne&dotal "6ample)

    ,%e 8er5%elming *ro5t% o no5ledge

    When 1' men founded theoyal 0ociety in 144# it *aspossible for an educated

    1II6'1

    1II1II6

    Dumber of8ublications

    scientific kno*ledge- J$K 7nthe last 3 years# such hasbeen the pace of scientificad"ance that e"en the bestscientists cannot keep up

    outside their o*n field-:ony 9lair#8 0peech# ay '' 5ata Ling#:he scientific impact of nations#Dature?2-

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    9/95

    ,%e 7ourt% Paradigm ,%e $%at

    :housand years agoscience *as empiri&al describing natural phenomena

    ast fe* hundred years 22

    .

    4 cGa=

    7rom pot%esis to Data

    eore &a ranc using models# generali,ations

    ast fe* decadesa &omputational branch simulating complex phenomena

    :oday (t%e 7ourt% Paradigm)data e6ploration

    aa

    %1 What is the ourth8aradigm&

    # #

    5ata captured by instruments or generated by simulator 8rocessed by soft*are 7nformation/Lno*ledge stored in computer 0cientist analy,es results using data management and statistics

    '1'!'1 I

    0ource Aim Bray and :he ourth 8aradigm#http//research-microsoft-com/en!us/collaboration/fourthparadigm/

    %' What are the dangersof the ourth 8aradigm&

    % l :

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    10/95

    ,%e Data Deluge:,%e $%

    ME"ery*here you look# the Nuantityof information in the *orld issoaring- Cccording to one

    # rexabytes (billion gigabytes) ofdata in '3- :his year# it *illcreate 1#' exabytes- erely

    keeping up *ith this flood# andstoring the bits that might beuseful# is difficult enough-

    Cnalysing it# to spot patterns and

    1

    extract use u n ormat on# sharder still-:he 5ata 5eluge# :he Economist#'3 ebruary '1

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    11/95

    Data Deluge: ,%e Personal Meme6 "6ample

    Oanne"ar 9ush in the 1I2s record your life

    7: edia aboratory :he @uman 0peechome8ro>ect/:otalecall# data mining/analysis/"isio

    5eb oy and upal 8atel record practically e"ery

    *aking moment of their son?s first three years('< pri"acy time$7s this e"en legal&P 0hould it be&P)

    !

    11

    C75 + tapes# 1 computersR 'k hours audio!"ideo 5ata si,e 'B9/day# 1-389 total

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    12/95

    Data Deluge: ,%e *aming #nalti&s "6ample

    E% 77 ':9/year all logs

    1'

    a o - ser"e s a s cson player logs

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    13/95

    Data Deluge: Datasets in Comp.&i.

    5ataset0i,e

    http//g*a-e*i-tudelft-nlhttp//g*a-e*i-tudelft-nl

    8eer!to!8eer :race Crchi"e 1B9

    1B9

    1B9

    1:9

    1:9/yr

    8'8:C

    Bam:C:he ailure:race

    Crchi"e

    http//fta-inria-frhttp//fta-inria-fr

    1

    $ 8WC# 7:C# .CW5C5# $

    1#s of scientists rom theory to practice

    1

    SearTI T1 T11T4

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    14/95

    Data Deluge:

    ,%e Proessional $orld *ets Conne&ted

    '1'!'1 12

    eb '1'

    0ource Oincen,o .osen,a# :he 0tate of inked7n#http//"incos-it/the!state!of!linkedin/

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    15/95

    $%at is ig Data:;

    Oery large# distributed aggregations of loosely structureddata# often incomplete and inaccessible

    Easily exceeds the processing capacity of con"entionaldatabase systems

    8rinciple of 9ig 5ata When you can# keep e"erythingP

    :oo big# too fast# and doesn?t comply *ith the traditionaldatabase architectures

    '11!'1' 13

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    16/95

    ,%e ,%ree

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    17/95

    Data $are%ouse s. ig Data

    '11!'1' 16

    0ource http//*ikibon-org/

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    18/95

    #genda

    1- 7ntroduction

    -

    3. Programming Models or Compute-Intensie$or=loads

    1. ags o ,as=s'- Workflo*s

    - 8arallel 8rogramming odels

    2- 8rogramming odels for 9ig 5ata3- 0ummary

    '1'!'1 1Q

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    19/95

    $%at is a ag o ,as=s (o,); # stem obs sent by a user$

    :ime JunitsK Why 9ag of :asks& rom the perspecti"e

    of the user# obs in set are ust tasks of a lar er ob

    $that start at most Us afterthe first >ob % What is the user?s "ie*&

    C single useful result from the complete 9o: esult can be combination of all tasks# or a selection

    of the results of most or e"en a single task'1'!'1 1I

    Iosup et al., The Characteristics andPerformance of Groups of Jobs in Grids,Euro-Par, LNCS, ol.!"!#, pp. $%&-$'$, '6.

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    20/95

    #ppli&ations o t%e o, Programming Model

    8arameter s*eeps #

    Oery useful in engineering and simulation!based science

    onte .arlo simulations 0imulation *ith random elements fixed time yet limited inaccuracy Oery useful in engineering and simulation!based science

    any other types of batch processing 8eriodic computation# .ycle sca"enging

    Oery useful to automate operations and reduce *aste

    '1'!'1 '

    , t% D i t P i

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    21/95

    o,s e&ame t%e Dominant ProgrammingModel or *rid Computing

    (V0) Brid

    (.C) [email protected]:

    (EV) EBEE

    (V0) .ondor V-Wisc-

    (V0) :eraBrid!' D.0C

    ' 2 4 Q 1

    (D) 5C0!'

    () Brid3

    (DG#0E) DorduBrid

    (VL) C

    (V0) BGW

    7rom >o's ?@A

    (V0) Brid

    (.C) [email protected]:

    (EV) EBEE

    (V0) .ondor V-Wisc-(V0) :eraBrid!' D.0C

    '1'!'1 '1

    ' 2 4 Q 1

    (D) 5C0!'

    () Brid3

    (DG#0E) DorduBrid(VL) C

    (V0) BGW

    7rom CP,ime ?@A

    Iosup and Epema( Grid Computin) *or+loads.IEEE Internet Computin) #&( #'-&" &/##

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    22/95

    Pra&ti&al #ppli&ations o t%e o, Programming Model

    Parameter 5eeps in Condor ?1+4A

    0ue the scientist *ants toind the "alue of x# #, for

    1 "alues for x and y# and 4 "alues for ,

    olution un a parameter s5eep# *ith

    1 x 1 x 4 = 4 parameter "alues 8roblem of the solution

    !

    machine- 7t takes 4 hours- :hat?s 1B0 das uninterrupted computation on 0ue?s machineP

    '1'!'1 ''

    0ource .ondor :eam# .ondor Vser?s :utorial-http//cs-u*isc-edu/condor

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    23/95

    Pra&ti&al #ppli&ations o t%e o, Programming Model

    Parameter 5eeps in Condor ?2+4AUniverse = vanilla

    Executable = sim.exe

    Input = input.txt

    u pu = ou pu . x

    Error = error.txt

    Log = sim.log

    Requirements = OpSys == WINNT61 &&Arch == INTEL &&

    (Disk >= DiskUsage) && ((Memory * 1024)>=ImageSize)

    .omplex 0Cs canbe specified easily

    InitialDir = run_$(Process)

    Queue 600

    '1'!'1 '

    0ource .ondor :eam# .ondor Vser?s :utorial-http//cs-u*isc-edu/condor

    Clso passed asparameter to sim.exe

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    24/95

    Pra&ti&al #ppli&ations o t%e o, Programming Model

    Parameter 5eeps in Condor ?3+4A

    % condor_submit sim.submit

    Submitting job(s)...............................................................................................................................................................................................................................................................

    Logging submit event(s)................................................................................................................................................................................................

    ...............................................................

    600 job(s) submitted to cluster 3.

    '1'!'1 '2

    0ource .ondor :eam# .ondor Vser?s :utorial-http//cs-u*isc-edu/condor

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    25/95

    Pra&ti&al #ppli&ations o t%e o, Programming Model

    Parameter 5eeps in Condor ?4+4A

    % condor_q

    -- Submitter: x.cs.wisc.edu : :x.cs.wisc.edu

    ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD3.0 frieda 4/20 12:08 0+00:00:05 R 0 9.8 sim.exe

    3.1 frieda 4/20 12:08 0+00:00:03 I 0 9.8 sim.exe

    3.2 frieda 4/20 12:08 0+00:00:01 I 0 9.8 sim.exe

    3.3 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe...

    3.598 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe

    3.599 frieda 4 20 12:08 0+00:00:00 I 0 9.8 sim.exe

    600 jobs; 599 idle, 1 running, 0 held

    '1'!'1 '3

    0ource .ondor :eam# .ondor Vser?s :utorial-http//cs-u*isc-edu/condor

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    26/95

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    27/95

    $%at is a $o=lo5;

    W = set of >obs *ith

    (think 5irect Ccyclic Braph)

    '1'!'1 '6

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    28/95

    #ppli&ations o t%e $or=lo5 ProgrammingModel

    .omplex applications .om lex filterin of data

    .omplex analysis of instrument measurements

    Cpplications created by non!.0 scientistsH

    Workflo*s ha"e a natural correspondence in the real!*orld#as descriptions of a scientific procedure

    Oisual model of a graph sometimes easier to program

    8recursor of the apeduce 8rogramming odel(next slides)

    '1'!'1 'Q

    HCdapted from .arole Boble and 5a"id de oure# .hapter in :he ourth8aradigm# http//research-microsoft-com/en!us/collaboration/fourthparadigm/

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    29/95

    $or=lo5s "6isted in *rids 'ut Did Note&ome a Dominant Programming Model

    :races

    0elected indings

    Braph *ith !2 le"els C"erage W si,e is /22 >obs 63obs or less# I3< are si,ed ' >obs or less

    '1'!'1 'I

    0stermann et al., 0n the Characteristics of Grid*or+flo1s, CoreG2I3 Inte)rated 2esearch in GridComputin) CGI*, 'Q.

    P ti l # li ti t% $7 P i M d l

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    30/95

    Pra&ti&al #ppli&ations o t%e $7 Programming Model

    ioinormati&s in ,aerna

    '1'!'1

    0ource .arole Boble and 5a"id de oure# .hapter in :he ourth 8aradigm#http//research-microsoft-com/en!us/collaboration/fourthparadigm/

    # d

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    31/95

    #genda

    1- 7ntroduction

    -

    3. Programming Models or Compute-Intensie$or=loads

    1- 9ags of :asks

    '- Workflo*s

    3. Parallel Programming Models

    2- 8rogramming odels for 9ig 5ata3- 0ummary

    '1'!'1 1

    P ll l P i M d l

    ,D &ourse in4049,D &ourse in4049

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    32/95

    Parallel Programming Models

    Cbstract machines (5istributed) shared memory

    ,as=,as= (groups o B B minutes)(groups o B B minutes)dis&uss arallel ro rammin in &loudsdis&uss arallel ro rammin in &louds

    ,D &ourse in4049,D &ourse in4049Introdu&tion to PCIntrodu&tion to PC

    str ute memory

    .onceptual programming models aster/*orker

    5i"ide and conNuer

    5ata / :ask parallelism

    ,as=,as=(inter(inter--group dis&ussion)group dis&ussion)dis&uss.dis&uss.

    0ystem!le"el programming models :hreads on B8Vs and other multi!cores

    '1'!'1 '

    4arbanescu et al.( To1ards an Effectie 5nifiedPro)rammin) 6odel for 6an7-Cores. IP3PS *S &/#&

    # d

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    33/95

    #genda

    1- 7ntroduction

    -

    - 8rogramming odels for .ompute!7ntensi"e Workloads

    4. Programming Models or ig Data

    3- 0ummary

    '1'!'1

    "&osstems o ig-Data Programming Models

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    34/95

    &osste s o g ata og a g ode s

    0% @i"e 8igAC% 5ryad7D%0cope C%

    @igh!e"el anguage

    9ig%uerylume 0a*,alleteor

    % Where does !on!demand fit&% Where does 8regel!on!B8Vs fit&

    5remel

    0er"ice:ree

    Mapedu&e Model ClgebrixP#C,

    87/

    Erlang

    Nep%ele @yracks5ryadadoop+

    #N

    @aloop

    Pregel

    C,ure

    Engine

    :era

    5ataEngine

    Execution Engine

    lume

    Engine

    5ataflo*

    *irap%

    Csterix9!tree

    '1'!'1 2

    0

    D7 .osmos0

    Cdapted from 5agstuhl 0eminar on 7nformation anagement in the .loud#http//***-dagstuhl-de/program/calendar/partlist/&semnr=11'1X0VGB

    C,ure5ata0tore

    :era5ata0tore

    0torage Engine

    OoldemortB00

    H 8lus Yookeeper# .5D# etc-

    #genda

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    35/95

    #genda

    1- 7ntroduction

    -

    - 8rogramming odels for .ompute!7ntensi"e Workloads

    4. Programming Models or ig Data

    1. Mapedu&e'- Braph 8rocessing

    - Gther 9i 5ata 8ro rammin odels

    3- 0ummary

    '1'!'1 3

    Mapedu&e

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    36/95

    Mapedu&e

    odel for processing and generating large data sets

    Enables a functional!like ro rammin model

    0plits computations into independent parallel tasks

    akes efficient use of large commodity clusters

    @ides the details of paralleli,ation# fault!tolerance#data distribution# monitoring and load balancing

    '11!'1' 4

    Mapedu&e ,%e Programming Model

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    37/95

    Mapedu&e ,%e Programming Model

    C programmig model# not a programming languageP1- 7nput/Gutput

    '- ap 8hase 8rocesses input key/"alue pair

    8roduces set of intermediate pairs

    map (in_key, in_value) -> list(out_key, interm_value)

    3. Reduce Phase:

    v u v y

    Produces a set of merged output values

    reduce(out_key, list(interm_value)) -> list(out_value)

    '11!'1' 6

    Mapedu&e in t%e Cloud

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    38/95

    acebook use case 0ocial!net*orking ser"ices

    Cnaly,e connections in the graph of friendships to recommend ne*

    connections Boogle use case

    Web!base email ser"ices# Boogle5ocs

    Cnaly,e messages and user beha"ior to optimi,e ad selection andplacement

    Soutube use case Oideo!sharing sites

    Cnaly,e user preferences to gi"e better stream suggestions

    '11!'1' Q

    $ord&ount "6ample

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    39/95

    $ord&ount "6ample

    ile 1 :he big data is big- ile ' apeduce tames big data-

    Map 8utput apper!1 (:he# 1)# (big# 1)# (data# 1)# (is# 1)# (big# 1)

    apper!' (apeduce# 1)# (tames# 1)# (big# 1)# (data# 1)

    edu&e Input educer!1 (:he# 1)

    educer!' (big# 1)# (big# 1)# (big# 1)

    edu&e 8utput educer!1 (:he# 1)

    educer!' (big# )

    '11!'1' I

    educer! (data# 1)# (data# 1)

    educer!2 (is# 1)

    educer!3 (apeduce# 1)# (apeduce# 1)

    educer!4 (tames# 1)

    educer! (data# ')

    educer!2 (is# 1)

    educer!3 (apeduce# ')

    educer!4 (tames# 1)

    Colored Euare Counter

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    40/95

    Colored Euare Counter

    '11!'1' 2

    Mapedu&e F *oogle "6ample 1

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    41/95

    Mapedu&e F *oogle "6ample 1

    Benerating language model statistics oun o mes e"ery !*or seNuence occurs n arge corpus

    of documents (and keep all those *here count [= 2)

    apeduce solution ap extract 3!*ord seNuences =[ count from document

    educe combine counts# and keep if count large enough

    '11!'1' 21

    http//***-slideshare-net/>hammerb/mapreduce!pact4!keynote

    Mapedu&e F *oogle "6ample 2

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    42/95

    Mapedu&e F *oogle "6ample 2

    Aoining *ith other data

    Benerate per!doc summary# but include per!host summary(e-g-# Z of pages on host# important terms on host)

    apeduce solution ap extract host name from V# lookup per!host info#

    combine *ith per!doc data and emit

    e uce en y unc on em ey "a ue rec y

    '11!'1' 2'

    http//***-slideshare-net/>hammerb/mapreduce!pact4!keynote

    More "6amples o Mapedu&e #ppli&ations

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    43/95

    More "6amples o Mapedu&e #ppli&ations

    5istributed Brep

    .ount of V Cccess reNuency

    !

    :erm!Oector per @ost

    5istributed 0ort

    7n"erted indices 8age ranking

    '11!'1' 2

    $

    "6e&ution 8erie5 (1+2)

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    44/95

    ( + )

    % What is the performance problem raised by this step&

    '11!'1' 22

    "6e&ution 8erie5 (2+2)

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    45/95

    "6e&ution 8erie5 (2+2)

    Gne master# many *orkers example 7nput data split into map tasks (42 / 1'Q 9) '# maps

    educe phase partitioned into reduce tasks 2# reduces

    5ynamically assign tasks to *orkers '# *orkers

    aster assigns each map / reduce task to an idle *orker

    ap *orker 5ata locality a*areness

    ead input and generate local files *ith key/"alue pairs

    educe *orker

    ead intermediate output from mappers

    0ort and reduce to produce the output

    '11!'1' 23

    7ailures and a&=-up ,as=s

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    46/95

    p

    '11!'1' 24

    $%at is adoop; # Mapedu&e "6e&. "ngine

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    47/95

    p p g

    7nspired by Boogle# supported by SahooP

    5esigned to perform fast and reliable analysis of the big data

    arge expansion in many domains such as inance# technology# telecom# media# entertainment# go"ernment#

    research institutions

    '11!'1' 26

    adoop F a%oo

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    48/95

    When you "isityahoo# you areinteracting *ithdata rocessed*ith @adoopP

    '11!'1' 2Q

    https//open&irrus-org/system/files/8penCirrusadoop'I-ppt

    adoop se Cases

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    49/95

    p

    1- 0earch Sahoo# Cma,on# Y"ents

    '- o rocessin

    acebook# Sahoo# Aoost# ast-fm- ecommendation 0ystems

    acebook

    2- 5ata Warehouse

    acebook# CG

    3- Oideo and 7mage Cnalysis De* Sork :imes# Eyealike

    '11!'1' 2I

    http//cloud-berkeley-edu/data/hdfs-pdf

    adoop Distri'uted 7ile stem

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    50/95

    Cssumptions and goals 5ata distributed across hundreds or thousands of machines

    5etection of faults and automatic reco"ery 5esigned for batch processing "s- interacti"e use

    @igh throughput of data access "s- lo* latency of data access

    @igh aggregate band*idth and high scalability

    Write!once!read!many file access model

    o"ing computations is cheaper than mo"ing data minimi,e net*ork

    '11!'1' 3

    D7 #r&%ite&ture

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    51/95

    aster/sla"e architecture

    DameDode DD anages the file system namespace egulates access to files by clients

    Gpen/close/rename files or directories

    apping of blocks to 5ataDodes 5ataDode (5D)

    Gne er node in the cluster

    anages local storage of the node 9lock creation/deletion/replication initiated by DD

    0er"e read/*rite reNuests from clients

    '11!'1' 31

    D7 Internals

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    52/95

    eplica 8lacement 8olicy irst replica on one node in the local rack

    0econd replica on a different node in the local rack

    r rep ca on a eren no e n a eren rac

    impro"ed *rite performance ('/ are on the local rack)

    preser"es data reliability and read performance

    .ommunication 8rotocols ayered on top of :.8/78

    r \

    5ataDode 8rotocol 5ataDodes \ DameDode DameDode responds to 8. reNuests issued by 5ataDodes / clients

    '11!'1' 3'

    adoop &%eduler

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    53/95

    Aob di"ided into se"eral independent tasks executed in parallel :he input file is split into chunks of 42 / 1'Q 9 Each chunk is assigned to a map task

    e uce as aggrega e e ou pu o e map as s

    :he master assigns tasks to the *orkers in 7G order Aob:racker maintains a Nueue of running >obs# the states of the

    :ask:rackers# the tasks assignments

    :ask:rackers report their state "ia a heartbeat mechanism

    5ata ocality execute tasks close to their data 0peculati"e execution re!launch slo* tasks

    '11!'1' 3

    Mapedu&e "olution ?1+BA

    adoop is Maturing Important Contri'utors

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    54/95

    adoop is Maturing Important Contri'utors

    ches

    Dum

    berofpat

    '1'!'1 32

    0ource http//***-theregister-co-uk/'1'/Q/16/communityFhadoop/

    Mapedu&e "olution ?2+BA

    Gate &%eduler

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    55/95

    Gate &%eduler

    C:E 0cheduler ongest Cpproximate :ime to End

    0peculati"ely execute the task that *ill finish farthest into the future

    timeFleft = (1!progress0core) / progressate :asks make progress at a roughly constant rate

    obust to node heterogeneity

    E.' 0ort running times C:E "s- @adoop "s- Do spec-

    '11!'1' 33

    8aharia et al.( Improin) 6ap2educe performance inhetero)eneous enironments. 0S3I &//%.

    Mapedu&e "olution ?3+BA

    7#I &%eduling

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    56/95

    7#I &%eduling

    7solation and statistical multiplexing :*o!le"el architecture

    0econd# each pool allocates its slots among multiple >obs 9ased on a max!min fairness policy

    '11!'1' 34

    8aharia et al.( 3ela7 schedulin)( a simple techni9ue forachiein) localit7 and fairness in clusterschedulin). EuroS7s &/#/. :lso T2 EECS-&//'-

    Mapedu&e "olution ?4+BA

    Dela &%eduling

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    57/95

    Dela &%eduling

    5ata locality issues @ead!of!line scheduling 7G# C7

    o* probability for small >obs to achie"e data locality

    3Q< of >obs ] C.E9GGL ha"e ^ '3 maps Gnly 3< achie"e node locality "s- 3I< rack locality

    '11!'1' 36

    8aharia et al.( 3ela7 schedulin)( a simple techni9ue forachiein) localit7 and fairness in clusterschedulin). EuroS7s &/#/. :lso T2 EECS-&//'-

    Mapedu&e "olution ?B+BA

    Dela &%eduling

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    58/95

    Dela &%eduling

    0olution 0kip the head!of!line >ob until you find a >ob that can launch a local task

    Wait :' seconds before launching tasks off!rack

    :1 = :' = 13 seconds =[ Q< data locality

    '11!'1' 3Q

    8aharia et al.( 3ela7 schedulin)( a simple techni9ue forachiein) localit7 and fairness in clusterschedulin). EuroS7s &/#/. :lso T2 EECS-&//'-

    8#G# and Mapedu&e ?1+9A

    LGCC unner

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    59/95

    LGCC 8lacement X allocation .entral for all clusters

    aintains cluster metadata

    !unner .onfiguration X deployment cluster monitoring

    Bro*/0hrink mechanism

    ! 8erformance isolation

    5ata isolation

    ailure isolation

    Oersion isolation

    '11!'1' 3I

    oala and Mapedu&e ?2+9A

    esiHing Me&%anism

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    60/95

    :*o types of nodes Core nodes fully!functional nodes# *ith :ask:racker and 5ataDode (local disk access)

    ,ransient nodes compute nodes# *ith :ask:racker

    8arameters 7 = Dumber of running tasks per number of a"ailable slots

    8redefined 7min and 7ma6 thresholds

    8redefined constant step gro5tep/ s%rin=tep

    , = time elapsed bet*een t*o successi"e resource offers

    ree po c es *ro5-%rin= Poli& (*P) gro*!shrink but maintain bet*een 7min and 7ma6 *reed-*ro5 Poli& (**P) gro*# shrink *hen *orkload done

    *reed-*ro5-5it%-Data Poli& (**DP) BB8 *ith core nodes (local disk access)

    '11!'1' 4

    oala and Mapedu&e ?3+9A

    $or=loads

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    61/95

    IQ< of >obs ] acebook process 4-I 9 and take less than a minute J1K

    Boogle reported in '2 computations *ith :9 of data on 1s of machines J'K

    - # - # - # - # ! #pp- 2\34# '1'-

    J'K A- 5ean and 0- Bhema*at# apreduce 0implified 5ata 8rocessing on arge .lusters# .omm- of the C.# Ool- 31# no- 1# pp- 16\11# 'Q-

    '11!'1' 41

    oala and Mapedu&e ?4+9A

    $ord&ount (,pe 0 ull sstem)

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    62/95

    .8V 5isk

    1 B9 input data

    Q map tasks executed in 6 *a"es Wordcount is .8V!bound in the map phase

    '11!'1' 4'

    oala and Mapedu&e ?B+9A

    ort (,pe 0 ull sstem)

    .8V 5isk

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    63/95

    .8V 5isk

    8erformed by the frame*ork during the shuffling phase

    0hort map phase *ith 2

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    64/95

    Gptimum

    a) Wordcount b) 0ort(:ype 1) (:ype ')

    0peedup relati"e to an cluster *ith 1 core nodes

    Close to linear speedup on &ore nodes

    '11!'1' 42

    oala and Mapedu&e ?J+9A

    "6e&ution ,ime s. ,ransient Nodes

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    65/95

    7nput data of 2 B9

    Wordcount output data = ' L9 0ort output data = 2 B9

    Wordcount scales better than 0ort on transient nodes'11!'1' 43

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    66/95

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    67/95

    #genda

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    68/95

    1- 7ntroduction

    -

    - 8rogramming odels for .ompute!7ntensi"e Workloads

    4. Programming Models or ig Data

    1- apeduce

    2. *rap% Pro&essing

    - Gther 9i 5ata 8ro rammin odels

    3- 0ummary

    '1'!'1 4Q

    *rap% Pro&essing "6ampleingle-our&e %ortest Pat% (P)

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    69/95

    5i>kstra?s algorithm

    Vpdate neighbors G(_E_ + _O_`log_O_) *ith ibo@eap

    7nitial datasetC ^# (9# 3)# (5# )[

    . ^inf# (# 3)[5 ^inf# (9# 1)# (.# )# (E# 2)# (# 2)[

    ---'1'!'1 4I

    0ource .laudio artella# 8resentation on Biraph at :V 5elft# Cpr '1'-

    *rap% Pro&essing "6ampleP in Mapedu&e

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    70/95

    apper output distances

    % What is the performance problem&

    ^ # [# # [# # n [# ---

    9ut also graph structure

    ^C# ^# (9# 3)# (5# )[ ---

    educer input distances9 ^inf# 3[# 5 ^inf# [ ---

    9ut also graph structure9 ^inf# (E# 1)[ ---

    '1'!'1 6

    0ource .laudio artella# 8resentation on Biraph at :V 5elft# Cpr '1'-

    D>obs# *here D is thegraph diameter

    ,%e PregelProgramming Model or *rap% Pro&essing

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    71/95

    g g p g

    9atch!oriented processing

    uns in!memory

    Oertex!centric C87

    ault!tolerant

    !

    GL# the actual model follo*s in the next slides

    '1'!'1 61

    0ource .laudio artella# 8resentation on Biraph at :V 5elft# Cpr '1'-

    ,%e PregelProgramming Model or *rap% Pro&essing

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    72/95

    g g p g

    9ased on Oaliant?s

    8rocessorsocal .omputation

    9ulk 0ynchronous 8arallel (908) D processing units *ith fast local memory

    0hared communication medium

    0eries of upersteps

    Blobal 0ynchroni,ation 9arrier

    Ends *hen all "ote:o@alt

    .ommunication

    8regel executes initiali,ation# oneor se"eral supersteps# shutdo*n

    '1'!'1 6'

    9arrier0ynchroni,ation

    0ource .laudio artella# 8resentation on Biraph at :V 5elft# Cpr '1'-

    Pregel,%e uperstep

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    73/95

    p p

    Each Oertex (execution in parallel) ecei"e messages from other "ertices

    8erform o*n computation (user!defined function)

    odify o*n state or state of outgoing messages

    utate topology of the graph

    0end messages to other "ertices

    :ermination condition Cll "ertices inacti"e

    Cll messages ha"e been transmitted

    '1'!'1 6

    0ource 8regel article-

    Pregel ,%e

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    74/95

    7nputmessage

    7mplementsprocessingalgorithm

    Gutput

    '1'!'1 62

    message

    0ource 8regel article-

    ,%e Pregel #r&%ite&tureMaster-$or=er

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    75/95

    aster assigns "ertices to Workers Braph partitioning

    aster

    r r u r

    aster coordinates .heckpoints

    Workers execute "ertices compute()

    Workers exchange messages directly

    Worker1

    Workern

    1 '3

    '1'!'1 63

    2

    6k

    0ource 8regel article-

    Pregel Perorman&eP on 1 illion-

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    76/95

    '1'!'1 64

    0ource 8regel article-

    Pregel Perorman&eP on andom *rap%s

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    77/95

    '1'!'1 66

    0ource 8regel article-

    #pa&%e *irap%#n 8pen-our&e Implementation o Pregel

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    78/95

    :asktracker :asktracker:asktracker :asktracker

    oose implementation of 8regel

    DDDD X A:X A:YookeeperYookeeper

    0lot0lot 0lot0lot 0lot0lot 0lot0lot0lot0lot 0lot0lot 0lot0lot 0lot0lot

    8ersistentcomputation state

    aster

    0trong community (acebook# :*itter# inked7n)

    uns 1< on existing @adoop clusters

    0ingle ap!only >ob'1'!'1 6Q

    0ource .laudio artella# 8resentation on Biraph at :V 5elft# Cpr '1'-%ttp++in&u'ator.apa&%e.org+girap%+%ttp++in&u'ator.apa&%e.org+girap%+

    Page ran= 'en&%mar=s

    :iberium :an

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    79/95

    :iberium :an Clmost 2 nodes# shared among numerous groups in SahooP

    @adoop -'-'2 (secure @adoop)

    - # #

    org-apache-giraph-benchmark-8ageank9enchmark Benerates data# number of edges# number of "ertices# Z of

    supersteps

    1 master/YooLeeper

    ' supersteps

    Do checkpoints

    1 random edge per "ertex

    6I

    0ource C"ery .hing presentation at @ortonWorks-http//***-slideshare-net/a"eryching/'11112horton*orks/

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    80/95

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    81/95

    '

    3

    1

    13

    ,

    otalse&onds

    Q1

    1 ' 2 3 4

    o erti&es (in 100s o millions)

    0ource C"ery .hing presentation at @ortonWorks-http//***-slideshare-net/a"eryching/'11112horton*orks/

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    82/95

    3'

    illions)

    1

    '

    2

    3

    1

    13

    erti&es(100Nso

    ,

    otalse&onds

    Q'

    3 1 13 ' '3 3

    M

    o 5or=ers

    0ource C"ery .hing presentation at @ortonWorks-http//***-slideshare-net/a"eryching/'11112horton*orks/

    #genda

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    83/95

    1- 7ntroduction

    -

    - 8rogramming odels for .ompute!7ntensi"e Workloads4. Programming Models or ig Data

    1- apeduce

    '- Braph 8rocessing

    3. 8t%er i Data Pro rammin Models

    3- 0ummary

    '1'!'1 Q

    tratosp%ere

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    84/95

    eteor Nuery language# 0upremo operator frame*ork

    r r r r r Extended set of 'nd order functions ("s apeduce) 5eclarati"e definition of data parallelism

    Dephele execution engine 0chedules multiple dataflo*s simultaneously

    #

    @50 storage engine

    '1'!'1 Q2

    tratosp%ereProgramming Contra&ts (P#C,s) ?1+2A

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    85/95

    'nd!order function

    '1'!'1 Q3

    0ource 8C.: o"er"ie*#https//stratosphere-eu/*iki/doku-php/*ikipactpm

    Clso in apeduce

    tratosp%ereProgramming Contra&ts (P#C,s) ?2+2A

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    86/95

    % @o* can 8C.:soptimi,e data

    processing&

    '1'!'1 Q4

    0ource 8C.: o"er"ie*#https//stratosphere-eu/*iki/doku-php/*ikipactpm

    tratosp%ereNep%ele

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    87/95

    '1'!'1 Q6

    tratosp%ere s Mapedu&e

    8C.: extends apeduce

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    88/95

    9oth propose 'nd!order functions (3 8C.:s "s ap X educe)

    9oth reNuire from user 1st!order functions (*hat?s inside the ap)

    o can ene rom g er! e"e anguages

    8C.: ecosystem has 7aa0 support

    Ley!"alue

    data model

    '1'!'1 QQ

    0ource abian @ueske# arge 0cale 5ata Cnalysis 9eyond apeduce# @adoop Bet:ogether# eb '1'-

    tratosp%ere s Mapedu&ePair5ise %ortest Pat%s ?1+3A

    l d W h ll Cl ith

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    89/95

    loyd!Warshall Clgorithm For k from 1 to n

    For j from 1 to n

    Di,j min( Di,j , Di,k + Dk,j )

    0ource 0tratosphere example#https//stratosphere-eu/*iki/lib/exe/detail-php/*ikiall'allFspFtaskdescription-png&id=*iki

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    90/95

    7nputundirected graph

    '- 0ort bysmaller "ertex 75

    !

    Gutputtriangles

    2- :riads to triangles1- ead input graph

    '1'!'1 I

    0ource abian @ueske# arge 0cale 5ata Cnalysis 9eyond apeduce# @adoop Bet:ogether# eb '1' and 0tratosphere example-

    tratosp%ere s Mapedu&e,riangle "numeration ?2+3A

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    91/95

    apeduce

    '1'!'1 I1

    0ource abian @ueske# arge 0cale 5ata Cnalysis 9eyond apeduce# @adoop Bet:ogether# eb '1' and 0tratosphere example-

    tratosp%ere s Mapedu&e,riangle "numeration ?3+3A

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    92/95

    0tratosphere

    '1'!'1 I'

    0ource abian @ueske# arge 0cale 5ata Cnalysis 9eyond apeduce# @adoop Bet:ogether# eb '1' and 0tratosphere example-

    #genda

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    93/95

    1- 7ntroduction

    -

    - 8rogramming odels for .ompute!7ntensi"e Workloads2- 8rogramming odels for 9ig 5ata

    B. ummar

    '1'!'1 I

    Con&lusion ,a=eCon&lusion ,a=e--ome Messageome Message

    Programming model L &omputer sstem a'stra&tion

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    94/95

    Programming Models or Compute-Intensie $or=loads

    any trade!offs# fe* dominant programming models

    odels bags of tasks# *orkflo*s# master/*orker# 908# $

    Programming Models or ig Data

    ig data programming models %ae e&osstems

    any trade!offs# many programming models

    odels apeduce# 8regel# 8C.:# 5ryad# $

    Gctober 1# '1' I2

    Execution engines @adoop# Loala+# Biraph# 8C.:/Dephele# 5ryad# $

    ealit &%e&= &loud programming is maturing

    http//***-flickr-com/photos/dimitrisotiropoulos/2'264421Q/

    eading Material (eall #&tie 7ield)

    $or=loads

    Clexandru 7osup# 5ick @- A- Epema Brid .omputing Workloads- 7EEE 7nternet .omputing 13(') 1I!'4 ('11)

    ,%e 7ourt% Paradigm

  • 8/12/2019 2012 IN4392 Lecture-5 CloudProgrammingModels

    95/95

    g

    :he ourth 8aradigm# http//research-microsoft-com/en!us/collaboration/fourthparadigm/

    Programming Models or Compute-Intensie $or=loads

    Clexandru 7osup# athieu Aan# Gmer G,an 0onme,# 5ick @- A- Epema :he .haracteristics and 8erformance of Broups of Aobs in Brids-Euro!8ar '6 Q'!I

    Gstermann et al-# Gn the .haracteristics of Brid Workflo*s# .oreB75 7ntegrated esearch in Brid .omputing (.B7W)# 'Q-http//***-pds-e*i-tudelft-nl/iosup/*ftraces6charsFcamera-pdf

    .orina 0tratan# Clexandru 7osup# 5ick @- A- Epema C performance study of grid *orkflo* engines- B75 'Q '3!'

    Programming Models or ig Data

    Aeffrey 5ean# 0an>ay Bhema*at apeduce 0implified 5ata 8rocessing on arge .lusters- G057 '2 16!13

    Aeffrey 5ean# 0an>ay Bhema*at apeduce a flexible data processing tool- .ommun- C. 3(1) 6'!66 ('1) atei Yaharia# Cndy Lon*inski# Cnthony 5- Aoseph# andy Lat,# and 7on 0toica- 'Q- 7mpro"ing apeduce performance in

    heterogeneous en"ironments- 7n 8roceedings of the Qth V0ED7 conference on Gperating systems design and implementation (G057Q)-V0ED7 Cssociation# 9erkeley# .C# V0C# 'I!2'-

    # # # # #for achie"ing locality and fairness in cluster scheduling- Euro0ys '1 '43!'6Q

    :yson .ondie# Deil .on*ay# 8eter Cl"aro# Aoseph - @ellerstein# Lhaled Elmeleegy# ussell 0ears apeduce Gnline- D057 '1 1!'Q

    Br,egor, ale*ic,# atthe* @- Custern# Cart A- .- 9ik# Aames .- 5ehnert# 7lan @orn# Daty eiser# Br,egor, .,a>ko*ski 8regel a systemfor large!scale graph processing- 07BG5 .onference '1 13!124

    5ominic 9attr# 0tephan E*en# abian @ueske# Gde> Lao# Oolker arkl# 5aniel Warneke Dephele/8C.:s a programming model andexecution frame*ork for *eb!scale analytical processing- 0o.. '1 11I!1

    '1'!'1 I3