high performance computing and big data analytics: an ...high performance computing and big data...
TRANSCRIPT
![Page 1: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/1.jpg)
High Performance Computing and BigData Analytics: An Introduction
Matthew J. DennyUniversity of Massachusetts Amherst
8/4/2014
https://polsci.umass.edu/profiles/
denny matthew j/workshop-materials
![Page 2: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/2.jpg)
Overview
What we will go over tonight:
1. What is High Performance Computing
(HPC)/Big Data Analytics?
2. Strategy - work smart, not hard.
3. Software and programming choices.
4. Hardware.
5. Resources.
![Page 3: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/3.jpg)
Warning:
You will not receive direct,
career-advancing professional
compensation for developing HPC
and Big Data skills.
You still have to work on somethingimportant/interesting.
![Page 4: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/4.jpg)
1. What is it?
![Page 6: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/6.jpg)
What it is
I An approach more than a specific set
of tools
I Effort to scale up analysis.
I Determining the most efficient way tocomplete your analysis task.
I Time
I Resources
![Page 7: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/7.jpg)
High Performance Computing
I Run analysis faster.
I Run analysis at larger scale.
I Work on more complex problems.
![Page 8: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/8.jpg)
How?
I Make use of low overhead, high speed
programming languages (C, C++,
Fortran, Java, etc.)
I Parallelization.
I Efficient implementation.
I Good scheduling.
![Page 9: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/9.jpg)
Big Data Analytics
I Work with larger datasets.
I Efficiently analyze large datasets.
I Leverage large amounts of data.
![Page 10: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/10.jpg)
How?
I Use memory efficient data structures and
programming languages.
I More RAM.
I Databases.
I Efficient inference procedures.
I Good scheduling.
![Page 11: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/11.jpg)
How they fit together
High Performance
Computing
![Page 12: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/12.jpg)
2. Strategy
![Page 13: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/13.jpg)
Checklist
1. Understand your challenge.
2. Determine which approach is
appropriate.
3. Exercise Constraint.
4. Work smarter, not harder.
![Page 14: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/14.jpg)
Hardware constraints
I RAM = computer working memory –determines size of datasets you can work on.
I CPU = processor, determines speed ofanalysis and degree of parallelization.
![Page 15: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/15.jpg)
Look at your activity monitor!
![Page 16: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/16.jpg)
Other factors to consider
I Different analysis packages are designed
for different scales.
I Know your data.
I When does the project have to be
complete?
![Page 17: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/17.jpg)
Understand your challenge
I Large dataset.
I Analysis takes a long time to run.
I Analysis requires many replications.
![Page 18: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/18.jpg)
Large dataset – panel data, event historydata
I Determine memory requirements.
I Use memory efficient software.
I Find a high RAM computer.
I Break problem up.
![Page 19: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/19.jpg)
Long run time – MCMC, optimization
I Determine approximate run time.
I Less than a month? – Just let it run.
I Reliable power, turn off automatic
updates, save periodically if possible.
I Implement in a faster language (C++
gives 1,000+ times speedup over R)
![Page 20: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/20.jpg)
Many replications – cross validation,bootstrapping
I Write code to run one instance.
I Use looping, try subset first.
I Parallelize.
I Use several computers/cluster.
![Page 21: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/21.jpg)
2.1 Parallelization
![Page 22: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/22.jpg)
Homogeneous parallelization
A B FEDC
A
A B FEDC
B C
Homogeneous Task
Results
Multiple Processors
A B C D E F
A B C D FE
Results
Heterogeneous Task
![Page 23: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/23.jpg)
Heterogeneous parallelization
A B FEDC
A
A B FEDC
B C
Homogeneous Task
Results
Multiple Processors
A B C D E F
A B C D FE
Results
Heterogeneous Task
![Page 24: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/24.jpg)
Map-Reduce
A B D E F
A B C D FE
Reduce
C
A B C D FE
A B D E F
A B C D FE
C
A B C D FE
Update
Map
Map
Reduce
![Page 25: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/25.jpg)
2.2 General Advice
![Page 26: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/26.jpg)
It takes time...
I Implementing HPC can take
more time than you save.
I Reuse somebody else’s code where
possible – GOOGLE IT!
I You will not be professionally
rewarded for HPC skills.
I Get help, find a collaborator.
![Page 27: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/27.jpg)
Exercise restraint
I Weight the costs of an HPC project
before pursuing it – get advice.
I HPC resources are expensive, be careful
with your money.
I Invest in resources/languages that will
be transferable to other projects.
![Page 28: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/28.jpg)
Know the benefits
I Developing HPC skillset can make you a
valuable collaborator.
I HPC skills most valuable when they let
you work on a problem you could
not otherwise.
I Industry loves HPC, can get internships
in data science.
![Page 29: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/29.jpg)
3. Software andProgramming Choices
![Page 30: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/30.jpg)
Important considerations
1. Software platform can make a big
difference in speed/efficiency.
2. Programming speed and readability vs.
run time.
3. Repetitive tasks can be automated.
4. Remote access is valuable.
![Page 31: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/31.jpg)
Software choices
I Stata, SAS, SPSS, Matlab
I Readable syntax, fast programming,
less control.
I R, Python
I Flexible, more control, harder to code.
I C++, C, Fortran, Java, CUDA
I Most control, fastest, hardest to code.
![Page 32: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/32.jpg)
Stata
I Memory efficient
I Reasonably fast.
I Not flexible.
I Have to pay for multithreaded version.
![Page 33: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/33.jpg)
Python
I Great for file I/O.
I Easy to read
I Incredibly flexible.
I Can interface with other languages.
![Page 34: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/34.jpg)
R
I Access to many statistical packages.
I Existing code base for HPC.
I Not memory efficient, or fast.
![Page 35: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/35.jpg)
C++
I Very fast.
I Interface with R.
I Difficult to program.
![Page 36: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/36.jpg)
3.1 Software Packages
![Page 37: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/37.jpg)
R packages for HPC
I snowfall – cluster paralellization
I sfClusterApplyLB( )
I parallel – included in base R
I mclapply( ) or foreach
I biglm – high memory regression
I bigglm( )
![Page 38: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/38.jpg)
C++ in R
I Rcpp – allows for the integration of C++
code in R.
I RcppArmadillo – Gives access to
Armadillo libraries.
I RStudio – IDE of R with built in C++
debugging
![Page 39: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/39.jpg)
3.2 ProgrammingChoices
![Page 40: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/40.jpg)
Efficient R programming
I Loops are slow in R, but faster than
doing something by hand
I Built-in functions are mostly written in
C – much faster!
I Subset data before processing when
possible.
I Test with system.time({ code })
![Page 41: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/41.jpg)
Loops are “slow” in R
system.time({
vect <- c(1:10000000)
total <- 0
#check using a loop
for(i in 1:length(as.numeric(vect))){
total <- total + vect[i]
}
print(total)
})
[1] 5e+13
user system elapsed
7.641 0.062 7.701
![Page 42: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/42.jpg)
And fast in C
system.time({
vect <- c(1:10000000)
#use the builtin R function
total <- sum(as.numeric(vect))
print(total)
})
[1] 5e+13
user system elapsed
0.108 0.028 0.136
![Page 43: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/43.jpg)
Summing over a sparse dataset
#number of observations
numobs <- 100000000
#observations we want to check
vec <- rep(0,numobs)
#only select 100 to check
vec[sample(1:numobs,100)] <- 1
#combine data
data <- cbind(c(1:numobs),vec)
![Page 44: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/44.jpg)
Conditional checking
system.time({
total <- 0
for(i in 1:numobs){
if(data[i,2] == 1)
total <- total + data[i,1]
}
print(total)
})
[1] 5385484508
user system elapsed
199.917 0.289 200.350
![Page 45: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/45.jpg)
Subsetting
system.time({
dat <- subset(data, data[,2] ==1)
total <- sum(dat[,1])
print(total)
})
[1] 5385484508
user system elapsed
5.474 1.497 8.245
![Page 46: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/46.jpg)
3.3 Remote Access
![Page 47: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/47.jpg)
Overview
I Connect from your laptop to an HPC
resource.
I Secure shell (SSH), graphical interface
(RDP/VNC) or web interface (RStudio
Server)
I Cluster job scheduling.
![Page 48: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/48.jpg)
SSH
![Page 49: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/49.jpg)
Connecting to a remotedesktop/Workstation
![Page 50: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/50.jpg)
RStudio Server over Internet (Linux only)
![Page 51: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/51.jpg)
How a cluster works
![Page 52: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/52.jpg)
Connecting to a cluster
![Page 53: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/53.jpg)
Job scheduling on a cluster
I System to share resources.
I Job Schedulers (Moab ,Grid Engine,
LoadLeveler, SLURM, LSF).
I SSH −→ FTP data to local directory
−→ submit ”job”
bsub -n 4 -R "rusage[mem=2048]"
-W 0:10 -q long example.sh
![Page 54: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/54.jpg)
example.sh
The file has now been created and saved:
![Page 55: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/55.jpg)
Connecting to your owndesktop/Workstation
I You will need an IP address:
Example – 35.2.23.132
I Static vs. Dynamic
I Your university should be able to provide
a static IP for free.
I Dyn DNS – $25 per year
http://dyn.com/remote-access/
![Page 56: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/56.jpg)
Security concerns
I If you get a static IP, people will try to
hack into your computer.
I Set firewall (see course handout)
I Use a Virtual Private Network (VPN).
![Page 57: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/57.jpg)
4. Hardware
![Page 58: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/58.jpg)
Classes of hardware
1. Supercomputers
2. Mainframes
3. Cluster Computing Resources
4. Servers
5. HPC Workstations
6. Consumer Desktops
7. GPGPU
![Page 59: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/59.jpg)
Supercomputers
I Used when all computing resources are
needed to solve one problem.I Physics, engineering, materials science
![Page 60: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/60.jpg)
Mainframes
I Used for large database applications.
I Business analytics, healthcare.
![Page 61: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/61.jpg)
Cluster Computing Resources
I Flexible, used for parallel and high
memory tasks.
I General purpose academic computing
infrastructure.
![Page 62: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/62.jpg)
Servers
I Most often used for hosting websites.
I Can be useful for long jobs, high
memory.
![Page 63: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/63.jpg)
HPC Workstations
I Personal mid-size high memory and
parallel computing.
I For people who moderate resources
constantly.
![Page 64: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/64.jpg)
Desktop
I Everything, depending on how long you
are willing to wait.I Will run 95% of what you want to do.
![Page 65: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/65.jpg)
General Purpose GPU Computing
I Problems that break down to small,
interdependent parts.I Bootstrapping, complex looping,
optimization.
![Page 66: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/66.jpg)
Pricing tiers
I Cluster Access : Usually free through
your institution but often requires
application/faculty sponsorship.
I HPC Workstation: $8,000-$15,000 –
Not a good investment for most.
I Desktop: $700-$2000 – Often a very
good investment.
![Page 67: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/67.jpg)
My suggestion
I Ask a faculty member for access. TRY
THIS FIRST.
I Investing in a desktop with 4C/8T (Intel
i7) and 16GB of ram is often a smart
idea if it will not get in the way of
conference attendance.
I Access a university cluster only once you
are confident a desktop can no longer
meet your needs.
![Page 68: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/68.jpg)
Upgrades
I Old computers are good for HPC tasks
that simply take a while to run.
I Locate computer in academic office for
free electricity/internet/easier remote
access.
I Relatively cheap upgrades can
dramatically improve performance.
I BENCHMARK
![Page 69: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/69.jpg)
Know your motherboard
I How many RAM Slots?
I Peripherals, CPU, GPU slots.
![Page 70: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/70.jpg)
RAM
I Work with larger datasets.
I 16GB Kits – ($100-150)
I 32GB Kits – ($200-300)
![Page 71: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/71.jpg)
Solid state hard drive
I General system performance, data I/O.
I $0.40-$0.80 per GB
I Leave 15-20% free.
![Page 72: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/72.jpg)
Check review sites
![Page 73: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/73.jpg)
Things to remember
I Get an Uninterrupted Power Supply
(UPS) for stable power.
I Put a sign on your computer that says
don’t touch.
![Page 74: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/74.jpg)
Summary
I Don’t buy it unless you absolutely
need it.
I Most resources can be borrowed/had for
free.
I More powerful resources require more
time to learn.
![Page 75: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/75.jpg)
5. Resources
![Page 76: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/76.jpg)
R HPC resources
I Tim Churches – parallelization tutorial
https://github.com/timchurches/smaRts/blob/master/
parallel-package/R-parallel-package-example.md
I Introduction to Scientific Programming and
Simulation Using R
![Page 77: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/77.jpg)
Rcpp/C++ resources
I Dirk Eddelbuettel – Rcpp
http://www.rcpp.org/
I Hadley Wickham – Advanced R
http://adv-r.had.co.nz/
I Armadillo library API documentation
http://arma.sourceforge.net/docs.html
![Page 78: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/78.jpg)
Before tomorrow:
I Download RStudio.
I Get snowfall, biglm, Rcpp and
RcppArmadillo packages.
I Get Cygwin and PuTTY if you have a
Windows machine.
![Page 79: High Performance Computing and Big Data Analytics: An ...High Performance Computing and Big Data Analytics: An Introduction Matthew J. Denny University of Massachusetts Amherst mdenny@polsci.umass.edu](https://reader033.vdocuments.us/reader033/viewer/2022041612/5e383ef3b4bb956e9779f7bf/html5/thumbnails/79.jpg)
Course Materials Link:
https://polsci.umass.edu/profiles/
denny matthew j/workshop-materials