Parallelization with the Matlab® Distributed Computing Server (MDCS) @ CBI cluster
Overview
• Parallelization with Matlab using the Parallel Computing Toolbox (PCT)
• Matlab Distributed Computing Server Introduction
• Benefits of using the MDCS
• Hardware/Software/Utilization @ CBI
• MDCS Usage Scenarios
• Hands-on Training
Parallelization with Matlab PCT
• The Matlab Parallel Computing Toolbox provides access to multi-core, multi-system (MDCS), and GPU parallelism.
• Many built-in Matlab functions (e.g. FFT) support parallelism transparently.
• Parallel constructs make it possible to go from for loops to parfor loops.
• Allows handling of many different types of parallel software development challenges.
• MDCS allows scaling of locally developed, parallel-enabled Matlab applications.
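The move from a for loop to a parfor loop can be sketched as follows. This is a minimal illustration, not code from the slides: the variable names and the per-iteration work (a sum of squares) are made up, and it assumes the Parallel Computing Toolbox is installed. If no worker pool is open, the parfor loop may simply run on the client, depending on the Matlab version and preferences.

```matlab
% Hypothetical workload: each iteration is independent of the others.
n = 8;

% Standard for loop.
serialResult = zeros(1, n);
for i = 1:n
    serialResult(i) = sum((1:i).^2);
end

% Same loop as parfor: iterations may run on different workers.
% parallelResult is a "sliced" output variable, indexed only by i.
parallelResult = zeros(1, n);
parfor i = 1:n
    parallelResult(i) = sum((1:i).^2);
end

% Both loops should produce identical results.
assert(isequal(serialResult, parallelResult));
```

The conversion is only valid because no iteration reads a value written by another iteration; loops with such dependencies need a different construct.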
Parallelization with Matlab PCT
• Distributed / parallel algorithm characteristics
– Memory usage & CPU usage
• Load a 4 gigabyte file into memory → calculate averages
– Communication / data I/O patterns
• Read file 1 (10 gigabytes) → run a function
• Worker B sends data to worker A → worker A runs a function → data is returned to worker B
– Dependencies
• Function 1 → Function 2 → Function 3
• Hardware resource contention (e.g. 16 cores each trying to read/write a set of files, bandwidth limitations on RAM)
• Managing large numbers of small files → filesystem contention
Parallelization with Matlab PCT
Applications have layers of parallelism:
• GPU cards / external accelerator cards
• CPUs, multi-cores
• Clusters
For an optimal solution, one must look at the application as a whole.
Scalability: use as many workers as possible in an efficient manner.
The Matlab PCT + MDCS framework automates much of the complexity in developing parallel & distributed apps.
Parallelization with Matlab PCT & MDCS
Distributed loops: parfor
Interactive development mode (matlabpool/pmode)
Distributed arrays (spmd)
CPUs, multi-cores, MDCS cluster
Scale out with the MDCS Cluster in Batch Job Submission Mode
MDCS Benefits
MDCS worker processes (a.k.a. “Labs”)
– The workers never request regular Matlab or toolbox licenses.
– The only license an MDCS worker ever uses is an MDCS worker license (of which we have up to 64).
– Toolboxes are unlocked to an MDCS worker based on the licenses owned by the client during the job submission process.
– Wonderful parallel algorithm development environment with the superior visualization & profiling capabilities of the Matlab environment.
– Many built-in functions are parallel enabled: fft, lu, svd…
– Distributed arrays allow development of data-parallel algorithms.
– Enables the scaling of codes that cannot be compiled using the Matlab Compiler.
– Allows you to go from development on a laptop directly to running on up to 64 MDCS Labs. (Some simulations can go from years of runtime to days of runtime on 64 MDCS Labs.)
MDCS Structure
Hardware/Software/Utilization @ CBI
• MDCS worker processes run on 4 physical servers (Dell PowerEdge M910): four 16-core systems, 4 x 64 GB RAM, 2 x Intel Xeon 2.26 GHz per system with 8 cores per processor.
• Total of 64 cores, with 256 GB total RAM distributed among the systems.
• Max 64 MDCS worker licenses available.
• Subsets of MDCS workers can be created based on project needs.
Usage scenarios
• Local system: interactive use (matlabpool / spmd / pmode / mpiprofile)
– Local system (e.g. one of the workstations @ CBI) as part of initial algorithm development.
• MDCS: non-interactive use, job & task based
– 2 main types: independent vs. communicating jobs
• Both types can be used with either the local (on a non-cluster workstation) or the MDCS profile.
MDCS Workloads
2 main types of workloads can be implemented with the MDCS:
– A job is logically decomposed into a set of tasks. The job may have 1 or more tasks, and each task may or may not have additional parallelism within it.
CASE 1: Independent
Within a job the parallelism is fully independent, so we have the opportunity to use MDCS workers to offload some of the independent work units. The code will not make use of parallel language features such as parfor or spmd. Note: in many cases, parfor can be transformed into a set of tasks.
– createJob() + createTask(), createTask(), … createTask()
CASE 2: Communicating
Within a single job the parallelism is more complex: the workers need to communicate, or parfor, spmd, or codistributed arrays (language features from the Parallel Computing Toolbox) are used.
– createCommunicatingJob(), createTask()
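The two cases can be sketched against the local profile as below. This is an illustrative sketch under stated assumptions, not the code used in the hands-on session: it assumes the job interface introduced in R2012a (parcluster, fetchOutputs), a profile named 'local' (newer releases may name it differently), and at least 2 local workers. The task functions and inputs are made up for the example.

```matlab
% Assumption: a cluster profile called 'local' exists on this machine.
c = parcluster('local');

% CASE 1: independent job. Each loop iteration that could have been a
% parfor iteration becomes one task; tasks never talk to each other.
indepJob = createJob(c);
for k = 1:3
    createTask(indepJob, @sum, 1, {1:k});   % task k computes sum(1:k)
end
submit(indepJob);
wait(indepJob);
indepOut = fetchOutputs(indepJob);          % one row of outputs per task
delete(indepJob);

% CASE 2: communicating job. A single task is defined and replicated on
% every worker; the workers cooperate through a global reduction.
commJob = createCommunicatingJob(c, 'Type', 'SPMD');
commJob.NumWorkersRange = [2 2];            % request exactly 2 workers
createTask(commJob, @() gplus(labindex), 1, {});
submit(commJob);
wait(commJob);
commOut = fetchOutputs(commJob);            % one row of outputs per worker
delete(commJob);
```

Pointing parcluster at an MDCS profile instead of the local one is, in principle, the only change needed to move the same jobs to the cluster. Recent Matlab releases mark labindex and gplus as not recommended in favor of newer names, so the function names may need adjusting there.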
MDCS Working Environment
MDCS Working Environment
Interactive Mode Sample (parfor)
For workloads that map well, parfor can yield exceptional performance improvements.
From years to days, or days to hours, for certain workloads: the ideal case is long-running jobs with little or no inter-job communication.
[Figure panels: parfor enabled on the MDCS; standard for loop]
MDCS Scaling ( Batch Mode )
MDCS Scaling ( Batch Mode )
MDCS Scaling ( Batch mode )
Summary
• Applied examples of using MDCS in batch mode are available as part of the hands-on section; more in-depth MDCS usage information is available via a consulting appointment.
• We can allocate a subset of MDCS workers on a per project basis.
Summary
• Wonderful parallel algorithm design & development environment
• Scale out codes up to 64 Matlab MDCS workers
– Both distributed compute & memory
• Standard Matlab+Toolbox license usage minimization
• Many options to approach parallelization of computational workloads.
Acknowledgements
• This project received computational, research & development, software design/development support from the Computational System Biology Core/Computational Biology Initiative, funded by the National Institute on Minority Health and Health Disparities (G12MD007591) from the National Institutes of Health. URL: http://www.cbi.utsa.edu
Appendix A
Local Mode: Matlab Worker Process/Thread Structure
Parallel Computing Toolbox constructs can be tested in local mode. The “lab” abstraction allows the actual process used for a lab to reside either locally or on a distributed server node.
MPI is used for inter-process communication between “Labs” (Matlab worker processes).
Local Mode Scaling Sample(parfor)
Interactive Mode Sample (pmode/spmd)
Each lab handles a piece of the data.
Results are gathered on lab 1.
The client session requests the complete data set to be sent to it using lab2client.
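The same pattern (each lab works on its piece, results are collected on lab 1, the client then pulls the result) can be sketched with spmd instead of pmode. This is an illustrative assumption-laden sketch, not the code shown on the slide: the data and variable names are made up, and the client reads the value by indexing the Composite rather than by calling lab2client, which is a pmode command. Without an open pool, spmd may run on a single lab, and the result is the same.

```matlab
% Hypothetical data set, available to every lab inside the spmd block.
data = 1:100;

spmd
    % Each lab takes every numlabs-th element, starting at its own index.
    idx = labindex:numlabs:numel(data);
    partialSum = sum(data(idx));

    % Global sum of the partial results, delivered to lab 1 only;
    % the other labs receive an empty array.
    total = gplus(partialSum, 1);
end

% On the client, total is a Composite; index 1 fetches the value from lab 1.
result = total{1};
```

Recent Matlab releases mark labindex, numlabs and gplus as not recommended in favor of newer names, so the identifiers may need adjusting there.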
Local vs. MDCS Mode Compare (parfor)
Appendix B: MDCS Access
• Access to MDCS is provided via the Cheetah Cluster.
– On Linux: ssh -Y [email protected]
– qlogin
– matlab &
Appendix B: MDCS Access
• Access to MDCS is provided via the Cheetah Cluster.
– On Windows: using PuTTY + Xming with X11 forwarding
– qlogin
– matlab &