statistical cinderella: parallel computation for the rest of...
TRANSCRIPT
![Page 1: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/1.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Statistical Cinderella: Parallel Computation forthe Rest of Us
Norm MatloffUniversity of California at Davis
R/Finance 2018Chicago, IL, USA, 1 June, 2018
These slides will be available athttp://heather.cs.ucdavis.edu/rifinance2018.pdf
![Page 2: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/2.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Disclaimer
• “Everyone has opinions.”
• I’ll present mine.
• Dissent is encouraged. :-)
![Page 3: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/3.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Disclaimer
• “Everyone has opinions.”
• I’ll present mine.
• Dissent is encouraged. :-)
![Page 4: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/4.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 5: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/5.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 6: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/6.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive
• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 7: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/7.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs
• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as afew hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 8: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/8.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 9: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/9.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 10: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/10.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 11: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/11.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 12: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/12.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 13: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/13.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Drivers and Their Result
• Parallel hardware for the masses:
• 4 cores standard, 16 not too expensive• GPUs• Intel Xeon Phi, ≈ 60 cores (!), coprocessor, as low as a
few hundred dollars
• Big Data
• Whatever that is.
Result: Users believe,
“I’ve got the hardware and I’ve got the data need —so I should be all set to do parallel computation in Ron the data.”
But this “rule” is “honored in the breach,” as the Brits say.
![Page 14: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/14.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 15: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/15.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 16: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/16.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining.
Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 17: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/17.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 18: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/18.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 19: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/19.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.
• Bandwidth limits — CPU/memory, CPU/network,CPU/GPU.
• Cache coherency problems (inconsistent caches inmulticore systems).
• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 20: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/20.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.
• Cache coherency problems (inconsistent caches inmulticore systems).
• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 21: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/21.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).
• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 22: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/22.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.
• OS/R limits on number of sockets (network connections).• Serialization.
![Page 23: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/23.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).
• Serialization.
![Page 24: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/24.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Not So Simple
• “Embarrassingly parallel” (EP) vs. non-EP algorithms.
• EP: Problem can be easily broken down in independenttasks, with easy combining. Embarrassing is good — butnot common enough.
• Overhead issues:
• Contention for memory/network.• Bandwidth limits — CPU/memory, CPU/network,
CPU/GPU.• Cache coherency problems (inconsistent caches in
multicore systems).• Contention for I/O ports.• OS/R limits on number of sockets (network connections).• Serialization.
![Page 25: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/25.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Wish List
• Ability to run on various types of hardware — from R.
• Ease of use for the non-cognoscenti.
• Parameters to tweak for the experts or the daring.
![Page 26: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/26.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Wish List
• Ability to run on various types of hardware — from R.
• Ease of use for the non-cognoscenti.
• Parameters to tweak for the experts or the daring.
![Page 27: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/27.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Wish List
• Ability to run on various types of hardware — from R.
• Ease of use for the non-cognoscenti.
• Parameters to tweak for the experts or the daring.
![Page 28: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/28.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Wish List
• Ability to run on various types of hardware — from R.
• Ease of use for the non-cognoscenti.
• Parameters to tweak for the experts or the daring.
![Page 29: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/29.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 30: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/30.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 31: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/31.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.
• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 32: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/32.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 33: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/33.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 34: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/34.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 35: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/35.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.
• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 36: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/36.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 37: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/37.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Outline of My Remarks
• Overview of existing parallel computation options for Rusers.
• Level in terms of abstraction, i.e. high-level constructs.• Level in terms of tech sophistication needed.
“Help, I’m in over my head here!” – a prominentR developer, entering the parallel comp. world.
• “Cinderellas”: Many users are being overlooked.
• Not enough automatic, tranparent parallelism.• Not enough for quants, e.g. for time series methods.
• Well then, what can be done?
![Page 38: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/38.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Available Software
software abstr. level handle non-EP sophis. level
C++/OpenMP low very good very high
C++/GPU low poor super high
RcppPar. pkg low good very high
parallel pkg medium medium medium
Rdsm pkg low good high
Spark/R pkgs medium poor high
Rmpi pkg low good high
foreach pkg medium poor medium
partools pkg medium good high medium
future pkg medium medium high medium
(OpenMP: standard library for parallelizing on multicore)
![Page 39: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/39.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Available Software
software abstr. level handle non-EP sophis. level
C++/OpenMP low very good very high
C++/GPU low poor super high
RcppPar. pkg low good very high
parallel pkg medium medium medium
Rdsm pkg low good high
Spark/R pkgs medium poor high
Rmpi pkg low good high
foreach pkg medium poor medium
partools pkg medium good high medium
future pkg medium medium high medium
(OpenMP: standard library for parallelizing on multicore)
![Page 40: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/40.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Takeaways
• R has an impressive array of parallel software toolsavailable. Better than Python!
• However, all of those tools require a fair amount ofprogramming sophistication to use.
![Page 41: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/41.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Takeaways
• R has an impressive array of parallel software toolsavailable.
Better than Python!
• However, all of those tools require a fair amount ofprogramming sophistication to use.
![Page 42: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/42.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Takeaways
• R has an impressive array of parallel software toolsavailable. Better than Python!
• However, all of those tools require a fair amount ofprogramming sophistication to use.
![Page 43: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/43.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
The Takeaways
• R has an impressive array of parallel software toolsavailable. Better than Python!
• However, all of those tools require a fair amount ofprogramming sophistication to use.
![Page 44: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/44.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Speciality Packages withTransparent Parallelism
• Using OpenMP, e.g. xgboost, recosystem.
• Using GPU, e.g. gmatrix (not active?).
• Even though transparent to the user in principle, may stillneed expertise in hardware/systems to make it run well.E.g. choice of number of threads, memory capacity issues.
![Page 45: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/45.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Speciality Packages withTransparent Parallelism
• Using OpenMP, e.g. xgboost, recosystem.
• Using GPU, e.g. gmatrix (not active?).
• Even though transparent to the user in principle, may stillneed expertise in hardware/systems to make it run well.E.g. choice of number of threads, memory capacity issues.
![Page 46: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/46.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Speciality Packages withTransparent Parallelism
• Using OpenMP, e.g. xgboost, recosystem.
• Using GPU, e.g. gmatrix (not active?).
• Even though transparent to the user in principle, may stillneed expertise in hardware/systems to make it run well.E.g. choice of number of threads, memory capacity issues.
![Page 47: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/47.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Speciality Packages withTransparent Parallelism
• Using OpenMP, e.g. xgboost, recosystem.
• Using GPU, e.g. gmatrix (not active?).
• Even though transparent to the user in principle, may stillneed expertise in hardware/systems to make it run well.
E.g. choice of number of threads, memory capacity issues.
![Page 48: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/48.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Speciality Packages withTransparent Parallelism
• Using OpenMP, e.g. xgboost, recosystem.
• Using GPU, e.g. gmatrix (not active?).
• Even though transparent to the user in principle, may stillneed expertise in hardware/systems to make it run well.E.g. choice of number of threads, memory capacity issues.
![Page 49: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/49.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 50: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/50.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 51: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/51.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 52: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/52.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 53: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/53.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 54: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/54.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 55: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/55.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
In Other Words...
• We need to FACE FACTS.
• The days in which data scientists could rely on “blackboxes” are GONE.
• One needs to have at least some knowledge of the innards:
• Machine Learning tuning parameters — defaults underfit,naive grid search selection overfits.
• For effective parallel computation, one must be adept atcoding and at software “tuning parameters,” e.g. numberof threads.
• Little or no hope for good automatic parallelism.
![Page 56: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/56.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series. (Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 57: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/57.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series.
(Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 58: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/58.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series. (Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 59: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/59.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series. (Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 60: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/60.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series. (Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 61: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/61.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series. (Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 62: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/62.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Computation of TimeSeries Analyses
Now let’s turn to time series. (Disclaimer: I am not an expertin time series.)
• Some parallel methods have been developed.
• E.g., if large matrices are involved (say models with longmemory), one can use OpenMP to parallelize matrixcomputations
• In some cases one can find a clever way to parallelize aspecific algorithm (F. Belletti, arXiv, 2015).
• But it’s much harder than for i.i.d. models.
![Page 63: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/63.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
• Matrix addition/multiplication is EP, but inversion is not.
• Parallelization based on math structures difficult to showasymptotic validity.
• Breaking t.s. data into chunks might not be EP, due toboundary effects. E.g. computing number of consecutiveperiods in which value is above a threshold — could spantwo chunks, or even more.
• Hyndman’s Rule: Any time series model eventually startsto go bad after very long lengths.
![Page 64: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/64.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
• Matrix addition/multiplication is EP, but inversion is not.
• Parallelization based on math structures difficult to showasymptotic validity.
• Breaking t.s. data into chunks might not be EP, due toboundary effects. E.g. computing number of consecutiveperiods in which value is above a threshold — could spantwo chunks, or even more.
• Hyndman’s Rule: Any time series model eventually startsto go bad after very long lengths.
![Page 65: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/65.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
• Matrix addition/multiplication is EP, but inversion is not.
• Parallelization based on math structures difficult to showasymptotic validity.
• Breaking t.s. data into chunks might not be EP, due toboundary effects. E.g. computing number of consecutiveperiods in which value is above a threshold — could spantwo chunks, or even more.
• Hyndman’s Rule: Any time series model eventually startsto go bad after very long lengths.
![Page 66: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/66.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
• Matrix addition/multiplication is EP, but inversion is not.
• Parallelization based on math structures difficult to showasymptotic validity.
• Breaking t.s. data into chunks might not be EP, due toboundary effects.
E.g. computing number of consecutiveperiods in which value is above a threshold — could spantwo chunks, or even more.
• Hyndman’s Rule: Any time series model eventually startsto go bad after very long lengths.
![Page 67: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/67.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
• Matrix addition/multiplication is EP, but inversion is not.
• Parallelization based on math structures difficult to showasymptotic validity.
• Breaking t.s. data into chunks might not be EP, due toboundary effects. E.g. computing number of consecutiveperiods in which value is above a threshold — could spantwo chunks, or even more.
• Hyndman’s Rule: Any time series model eventually startsto go bad after very long lengths.
![Page 68: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/68.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
• Matrix addition/multiplication is EP, but inversion is not.
• Parallelization based on math structures difficult to showasymptotic validity.
• Breaking t.s. data into chunks might not be EP, due toboundary effects. E.g. computing number of consecutiveperiods in which value is above a threshold — could spantwo chunks, or even more.
• Hyndman’s Rule: Any time series model eventually startsto go bad after very long lengths.
![Page 69: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/69.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Our partools Package
• On CRAN, but go to github.com/matloff for the latestversion.
• Large variety (78+) of functions for parallel datamanipulation and computation.
• Some functions do a lot, some just a little. The latter canbe combined into powerful tools, as with Unix/Linux/Macscripting.
• Built on top of parallel pkg., plus our own MPI-likeinternal system.
![Page 70: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/70.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Our partools Package
• On CRAN, but go to github.com/matloff for the latestversion.
• Large variety (78+) of functions for parallel datamanipulation and computation.
• Some functions do a lot, some just a little. The latter canbe combined into powerful tools, as with Unix/Linux/Macscripting.
• Built on top of parallel pkg., plus our own MPI-likeinternal system.
![Page 71: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/71.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Our partools Package
• On CRAN, but go to github.com/matloff for the latestversion.
• Large variety (78+) of functions for parallel datamanipulation and computation.
• Some functions do a lot, some just a little. The latter canbe combined into powerful tools, as with Unix/Linux/Macscripting.
• Built on top of parallel pkg., plus our own MPI-likeinternal system.
![Page 72: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/72.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Our partools Package
• On CRAN, but go to github.com/matloff for the latestversion.
• Large variety (78+) of functions for parallel datamanipulation and computation.
• Some functions do a lot, some just a little. The latter canbe combined into powerful tools, as with Unix/Linux/Macscripting.
• Built on top of parallel pkg., plus our own MPI-likeinternal system.
![Page 73: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/73.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Our partools Package
• On CRAN, but go to github.com/matloff for the latestversion.
• Large variety (78+) of functions for parallel datamanipulation and computation.
• Some functions do a lot, some just a little. The latter canbe combined into powerful tools, as with Unix/Linux/Macscripting.
• Built on top of parallel pkg., plus our own MPI-likeinternal system.
![Page 74: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/74.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Themes
• “Leave It There” (LIT) theme: Keep data distributed aslong as possible throughout an analysis session, to avoidcostly communications delays. Borrows distrib. objectapproach from Hadoop/Spark but much more flexible.
• “Software Alchemy” — convert non-EP to stat. equivalentEP, thus easy parallelization.
![Page 75: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/75.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Themes
• “Leave It There” (LIT) theme: Keep data distributed aslong as possible throughout an analysis session, to avoidcostly communications delays.
Borrows distrib. objectapproach from Hadoop/Spark but much more flexible.
• “Software Alchemy” — convert non-EP to stat. equivalentEP, thus easy parallelization.
![Page 76: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/76.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Themes
• “Leave It There” (LIT) theme: Keep data distributed aslong as possible throughout an analysis session, to avoidcostly communications delays. Borrows distrib. objectapproach from Hadoop/Spark but much more flexible.
• “Software Alchemy” — convert non-EP to stat. equivalentEP, thus easy parallelization.
![Page 77: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/77.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Themes
• “Leave It There” (LIT) theme: Keep data distributed aslong as possible throughout an analysis session, to avoidcostly communications delays. Borrows distrib. objectapproach from Hadoop/Spark but much more flexible.
• “Software Alchemy” — convert non-EP to stat. equivalentEP, thus easy parallelization.
![Page 78: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/78.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
Can we extend partools to time series applications? Mustovercome:
• Boundary effects problems.
• SA predicated on i.i.d.
![Page 79: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/79.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
Can we extend partools to time series applications?
Mustovercome:
• Boundary effects problems.
• SA predicated on i.i.d.
![Page 80: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/80.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
Can we extend partools to time series applications? Mustovercome:
• Boundary effects problems.
• SA predicated on i.i.d.
![Page 81: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/81.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Possible Obstacles
Can we extend partools to time series applications? Mustovercome:
• Boundary effects problems.
• SA predicated on i.i.d.
![Page 82: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/82.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Sample partools Session
• Wikipedia page-access data, Kaggle, 145063 time series oflength 550.
• Say we wish to run arma() for each page. Each is quick,but 145K of them takes some time. Say we are interestedonly in ar1.
• Afterward, we will perform various other operations.
• By LIT Principle, first distribute the data to the workers,then avoid collecting it back to the manager node ifpossible.
![Page 83: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/83.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Sample partools Session
• Wikipedia page-access data, Kaggle, 145063 time series oflength 550.
• Say we wish to run arma() for each page. Each is quick,but 145K of them takes some time. Say we are interestedonly in ar1.
• Afterward, we will perform various other operations.
• By LIT Principle, first distribute the data to the workers,then avoid collecting it back to the manager node ifpossible.
![Page 84: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/84.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Sample partools Session
• Wikipedia page-access data, Kaggle, 145063 time series oflength 550.
• Say we wish to run arma() for each page. Each is quick,but 145K of them takes some time. Say we are interestedonly in ar1.
• Afterward, we will perform various other operations.
• By LIT Principle, first distribute the data to the workers,then avoid collecting it back to the manager node ifpossible.
![Page 85: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/85.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Sample partools Session
• Wikipedia page-access data, Kaggle, 145063 time series oflength 550.
• Say we wish to run arma() for each page. Each is quick,but 145K of them takes some time. Say we are interestedonly in ar1.
• Afterward, we will perform various other operations.
• By LIT Principle, first distribute the data to the workers,then avoid collecting it back to the manager node ifpossible.
![Page 86: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/86.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Sample partools Session
• Wikipedia page-access data, Kaggle, 145063 time series oflength 550.
• Say we wish to run arma() for each page. Each is quick,but 145K of them takes some time. Say we are interestedonly in ar1.
• Afterward, we will perform various other operations.
• By LIT Principle, first distribute the data to the workers,then avoid collecting it back to the manager node ifpossible.
![Page 87: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/87.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Serial Version
wd ← as . matr ix ( read . csv ( ’ t r a i n 1 . c sv ’ ) )wdc ← wd [ complete . c a s e s (wd ) , ]armac ← f u n c t i o n ( x ){ z ← NA; t r y ( z ← arma ( x )$ coef [ 1 ] } ; z )
system . time ( z ← apply (wdc , 1 , armac ) )# 624 . 4 5 2 0 . 1 6 4 6 2 4 . 6 4 8
# f i n d the one s w i th weak c o r r e l a t i o n
# f o r f u r t h e r a n a l y s i s
wdlt05 ← wdc [ z < 0 . 5 , ]# v a r i o u s f u r t h e r ops ( not shown )
![Page 88: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/88.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Serial Version
wd ← as . matr ix ( read . csv ( ’ t r a i n 1 . c sv ’ ) )wdc ← wd [ complete . c a s e s (wd ) , ]armac ← f u n c t i o n ( x ){ z ← NA; t r y ( z ← arma ( x )$ coef [ 1 ] } ; z )
system . time ( z ← apply (wdc , 1 , armac ) )# 624 . 4 5 2 0 . 1 6 4 6 2 4 . 6 4 8
# f i n d th e one s w i th weak c o r r e l a t i o n
# f o r f u r t h e r a n a l y s i s
wdlt05 ← wdc [ z < 0 . 5 , ]# v a r i o u s f u r t h e r ops ( not shown )
![Page 89: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/89.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Same But with partools
The plan:
• Distribute the data and LIT. Work on it solely indistributed form as much as possible.
• Distrib. by calling partools:::distribsplit(), then later saveusing partools:::filesave().
• The chunks all have the same name, in this case wdc.The manager then issues commands via clusterEvalQ(),the same command to each worker.
• At end of session, save to partools distributed file, sodon’t need to redistribute next time.
![Page 90: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/90.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Same But with partools
The plan:
• Distribute the data and LIT. Work on it solely indistributed form as much as possible.
• Distrib. by calling partools:::distribsplit(), then later saveusing partools:::filesave().
• The chunks all have the same name, in this case wdc.The manager then issues commands via clusterEvalQ(),the same command to each worker.
• At end of session, save to partools distributed file, sodon’t need to redistribute next time.
![Page 91: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/91.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Same But with partools
The plan:
• Distribute the data and LIT. Work on it solely indistributed form as much as possible.
• Distrib. by calling partools:::distribsplit(), then later saveusing partools:::filesave().
• The chunks all have the same name, in this case wdc.The manager then issues commands via clusterEvalQ(),the same command to each worker.
• At end of session, save to partools distributed file, sodon’t need to redistribute next time.
![Page 92: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/92.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Same But with partools
The plan:
• Distribute the data and LIT. Work on it solely indistributed form as much as possible.
• Distrib. by calling partools:::distribsplit(), then later saveusing partools:::filesave().
• The chunks all have the same name, in this case wdc.The manager then issues commands via clusterEvalQ(),the same command to each worker.
• At end of session, save to partools distributed file, sodon’t need to redistribute next time.
![Page 93: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/93.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Same But with partools
The plan:
• Distribute the data and LIT. Work on it solely indistributed form as much as possible.
• Distrib. by calling partools:::distribsplit(), then later saveusing partools:::filesave().
• The chunks all have the same name, in this case wdc.The manager then issues commands via clusterEvalQ(),the same command to each worker.
• At end of session, save to partools distributed file, sodon’t need to redistribute next time.
![Page 94: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/94.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Same But with partools
The plan:
• Distribute the data and LIT. Work on it solely indistributed form as much as possible.
• Distrib. by calling partools:::distribsplit(), then later saveusing partools:::filesave().
• The chunks all have the same name, in this case wdc.The manager then issues commands via clusterEvalQ(),the same command to each worker.
• At end of session, save to partools distributed file, sodon’t need to redistribute next time.
![Page 95: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/95.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Version
c l s ← makeC lus te r (4 ) # ’ p a r a l l e l ’ c l u s t e r
s e t c l s i n f o ( c l s ) # i n i t ’ p a r t o o l s ’
d i s t r i b s p l i t ( c l s , ’wdc ’ ) # d i s t r i b . t o wo r k e r s
c l u s t e r E v a lQ ( c l s , l i b r a r y ( t s e r i e s ) )c l u s t e r E x p o r t ( c l s , ’ armac ’ )system . time ( c l u s t e r E v a lQ ( c l s ,
a r1 ← apply (wdc , 1 , armac ) ) )# 0 . 0 2 4 0 . 0 0 0 1 8 0 . 6 5 3
c l u s t e r E v a lQ ( c l s , wd l t05 ← wdc [ a r1 < 0 . 5 , ] ) # LIT !# v a r i o u s f u r t h e r ops ( not shown )
# now save , i n wdc . 1 , wdc . 2 , . . .
f i l e s a v e ( c l s , ’wdc ’ , ’wdc ’ , 1 , ’ , ’ )
The one-time overhead of distributing the data will continueto pay off in further analyses.
![Page 96: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/96.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Version
c l s ← makeC lus te r (4 ) # ’ p a r a l l e l ’ c l u s t e r
s e t c l s i n f o ( c l s ) # i n i t ’ p a r t o o l s ’
d i s t r i b s p l i t ( c l s , ’wdc ’ ) # d i s t r i b . t o wo r k e r s
c l u s t e r E v a lQ ( c l s , l i b r a r y ( t s e r i e s ) )c l u s t e r E x p o r t ( c l s , ’ armac ’ )system . time ( c l u s t e r E v a lQ ( c l s ,
a r1 ← apply (wdc , 1 , armac ) ) )# 0 . 0 2 4 0 . 0 0 0 1 8 0 . 6 5 3
c l u s t e r E v a lQ ( c l s , wd l t05 ← wdc [ a r1 < 0 . 5 , ] ) # LIT !# v a r i o u s f u r t h e r ops ( not shown )
# now save , i n wdc . 1 , wdc . 2 , . . .
f i l e s a v e ( c l s , ’wdc ’ , ’wdc ’ , 1 , ’ , ’ )
The one-time overhead of distributing the data will continueto pay off in further analyses.
![Page 97: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/97.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Parallel Version
c l s ← makeC lus te r (4 ) # ’ p a r a l l e l ’ c l u s t e r
s e t c l s i n f o ( c l s ) # i n i t ’ p a r t o o l s ’
d i s t r i b s p l i t ( c l s , ’wdc ’ ) # d i s t r i b . t o wo r k e r s
c l u s t e r E v a lQ ( c l s , l i b r a r y ( t s e r i e s ) )c l u s t e r E x p o r t ( c l s , ’ armac ’ )system . time ( c l u s t e r E v a lQ ( c l s ,
a r1 ← apply (wdc , 1 , armac ) ) )# 0 . 0 2 4 0 . 0 0 0 1 8 0 . 6 5 3
c l u s t e r E v a lQ ( c l s , wd l t05 ← wdc [ a r1 < 0 . 5 , ] ) # LIT !# v a r i o u s f u r t h e r ops ( not shown )
# now save , i n wdc . 1 , wdc . 2 , . . .
f i l e s a v e ( c l s , ’wdc ’ , ’wdc ’ , 1 , ’ , ’ )
The one-time overhead of distributing the data will continueto pay off in further analyses.
![Page 98: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/98.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
More LIT Helpers
• Can distribute the file directly, using partools:::filesplit().
• The functions fileread() and filesave() automatically adda suffix to the name for chunk number, e.g. wdc.1.
• If do need to “undistribute,” distribcat() will do so,adding the proper header.
• Functions such as dwhich.min() treat a distributed dataframe as a virtual single d.f., returning row number withinchunk number.
• Etc.
![Page 99: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/99.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
More LIT Helpers
• Can distribute the file directly, using partools:::filesplit().
• The functions fileread() and filesave() automatically adda suffix to the name for chunk number, e.g. wdc.1.
• If do need to “undistribute,” distribcat() will do so,adding the proper header.
• Functions such as dwhich.min() treat a distributed dataframe as a virtual single d.f., returning row number withinchunk number.
• Etc.
![Page 100: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/100.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 101: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/101.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 102: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/102.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 103: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/103.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.
• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 104: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/104.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 105: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/105.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP.
Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 106: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/106.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 107: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/107.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Software Alchemy: ParallelComputation for the Masses
• I call this approach Software Alchemy (SA) (Matloff, JSS,2016). Method independently proposed by several authors.
• Very simple idea:
• Break the data into disjoint chunks.• Apply the estimator to each chunk, getting θ̂i for chunk i .• Average the θ̂i to get overall θ̂.• For ML classification algs, “vote” among chunks.
• Converts non-EP to stat. equivalent EP. Thus easyparallelization, possibly even superlinear speedup.
• The partools package has a number of SA ops available.
![Page 108: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/108.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
SA for Time Series
• Not i.i.d. but stationarity and finite memory should beenough to prove that it works.
• Should work for ARMA, ARIMA, GARCH, etc.
• All this should be considered preliminary.
![Page 109: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/109.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
SA for Time Series
• Not i.i.d. but stationarity and finite memory should beenough to prove that it works.
• Should work for ARMA, ARIMA, GARCH, etc.
• All this should be considered preliminary.
![Page 110: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/110.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
SA for Time Series
• Not i.i.d. but stationarity and finite memory should beenough to prove that it works.
• Should work for ARMA, ARIMA, GARCH, etc.
• All this should be considered preliminary.
![Page 111: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/111.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
SA for Time Series
• Not i.i.d. but stationarity and finite memory should beenough to prove that it works.
• Should work for ARMA, ARIMA, GARCH, etc.
• All this should be considered preliminary.
![Page 112: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/112.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Example
l i b r a r y (TSA)z ← garch . s im ( a lpha=c ( . 0 1 , 0 . 9 ) , n=5000000)system . time ( p r i n t ( garch ( z ) ) )# C o e f f i c i e n t ( s ) :
# a0 a1 b1
# 1 . 0 0 1 e−02 8 . 9 8 0 e−01 3 . 7 7 5 e−12
# 1 3 . 0 8 8 0 . 1 4 0 1 3 . 2 2 8
c l s ← makeC lus te r (2 )s e t c l s i n f o ( c l s )d i s t r i b s p l i t ( c l s , ’ z ’ )system . time ( zc2 ← c l u s t e r E v a lQ ( c l s , ga rch ( z )$ coef ) )# 0 . 0 0 0 0 . 0 0 0 5 . 9 2 5
Reduce ( ’+’ , zc2 ) / 2# a0 a1 b1
# 1 . 0 0 4 2 9 3 e−02 8 . 9 1 0 9 6 4 e−01 3 . 1 2 0 7 2 3 e−05
SA version pretty good, 2X speed with coeffs close. But a4-worker run gave 0.83 for a1, a bit further off.More study needed!
![Page 113: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/113.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Example
l i b r a r y (TSA)z ← garch . s im ( a lpha=c ( . 0 1 , 0 . 9 ) , n=5000000)system . time ( p r i n t ( garch ( z ) ) )# C o e f f i c i e n t ( s ) :
# a0 a1 b1
# 1 . 0 0 1 e−02 8 . 9 8 0 e−01 3 . 7 7 5 e−12
# 1 3 . 0 8 8 0 . 1 4 0 1 3 . 2 2 8
c l s ← makeC lus te r (2 )s e t c l s i n f o ( c l s )d i s t r i b s p l i t ( c l s , ’ z ’ )system . time ( zc2 ← c l u s t e r E v a lQ ( c l s , ga rch ( z )$ coef ) )# 0 . 0 0 0 0 . 0 0 0 5 . 9 2 5
Reduce ( ’+’ , zc2 ) / 2# a0 a1 b1
# 1 . 0 0 4 2 9 3 e−02 8 . 9 1 0 9 6 4 e−01 3 . 1 2 0 7 2 3 e−05
SA version pretty good, 2X speed with coeffs close. But a4-worker run gave 0.83 for a1, a bit further off.More study needed!
![Page 114: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/114.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Example
l i b r a r y (TSA)z ← garch . s im ( a lpha=c ( . 0 1 , 0 . 9 ) , n=5000000)system . time ( p r i n t ( garch ( z ) ) )# C o e f f i c i e n t ( s ) :
# a0 a1 b1
# 1 . 0 0 1 e−02 8 . 9 8 0 e−01 3 . 7 7 5 e−12
# 1 3 . 0 8 8 0 . 1 4 0 1 3 . 2 2 8
c l s ← makeC lus te r (2 )s e t c l s i n f o ( c l s )d i s t r i b s p l i t ( c l s , ’ z ’ )system . time ( zc2 ← c l u s t e r E v a lQ ( c l s , ga rch ( z )$ coef ) )# 0 . 0 0 0 0 . 0 0 0 5 . 9 2 5
Reduce ( ’+’ , zc2 ) / 2# a0 a1 b1
# 1 . 0 0 4 2 9 3 e−02 8 . 9 1 0 9 6 4 e−01 3 . 1 2 0 7 2 3 e−05
SA version pretty good, 2X speed with coeffs close.
But a4-worker run gave 0.83 for a1, a bit further off.More study needed!
![Page 115: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/115.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Example
l i b r a r y (TSA)z ← garch . s im ( a lpha=c ( . 0 1 , 0 . 9 ) , n=5000000)system . time ( p r i n t ( garch ( z ) ) )# C o e f f i c i e n t ( s ) :
# a0 a1 b1
# 1 . 0 0 1 e−02 8 . 9 8 0 e−01 3 . 7 7 5 e−12
# 1 3 . 0 8 8 0 . 1 4 0 1 3 . 2 2 8
c l s ← makeC lus te r (2 )s e t c l s i n f o ( c l s )d i s t r i b s p l i t ( c l s , ’ z ’ )system . time ( zc2 ← c l u s t e r E v a lQ ( c l s , ga rch ( z )$ coef ) )# 0 . 0 0 0 0 . 0 0 0 5 . 9 2 5
Reduce ( ’+’ , zc2 ) / 2# a0 a1 b1
# 1 . 0 0 4 2 9 3 e−02 8 . 9 1 0 9 6 4 e−01 3 . 1 2 0 7 2 3 e−05
SA version pretty good, 2X speed with coeffs close. But a4-worker run gave 0.83 for a1, a bit further off.
More study needed!
![Page 116: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/116.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Example
l i b r a r y (TSA)z ← garch . s im ( a lpha=c ( . 0 1 , 0 . 9 ) , n=5000000)system . time ( p r i n t ( garch ( z ) ) )# C o e f f i c i e n t ( s ) :
# a0 a1 b1
# 1 . 0 0 1 e−02 8 . 9 8 0 e−01 3 . 7 7 5 e−12
# 1 3 . 0 8 8 0 . 1 4 0 1 3 . 2 2 8
c l s ← makeC lus te r (2 )s e t c l s i n f o ( c l s )d i s t r i b s p l i t ( c l s , ’ z ’ )system . time ( zc2 ← c l u s t e r E v a lQ ( c l s , ga rch ( z )$ coef ) )# 0 . 0 0 0 0 . 0 0 0 5 . 9 2 5
Reduce ( ’+’ , zc2 ) / 2# a0 a1 b1
# 1 . 0 0 4 2 9 3 e−02 8 . 9 1 0 9 6 4 e−01 3 . 1 2 0 7 2 3 e−05
SA version pretty good, 2X speed with coeffs close. But a4-worker run gave 0.83 for a1, a bit further off.More study needed!
![Page 117: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/117.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 118: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/118.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.”
But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 119: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/119.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 120: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/120.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 121: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/121.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 122: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/122.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 123: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/123.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)
![Page 124: Statistical Cinderella: Parallel Computation for the Rest of Usheather.cs.ucdavis.edu/rifinance2018.pdf · Statistical Cinderella: Parallel Computation for the Rest of Us Norm Matlo](https://reader036.vdocuments.us/reader036/viewer/2022071219/6057d188d28beb4cba092222/html5/thumbnails/124.jpg)
StatisticalCinderella:
ParallelComputation
for the Rest ofUs
Norm MatloffUniversity ofCalifornia at
Davis
Conclusions
No “silver bullet.” But the following should go a long waytoward your need for parallel computation.
• “Leave it there” and distributed objects/files.
• SA, previously shown to work well on i.i.d. shows promisetime series.
• The partools package adds a lot of convenience.
Ready for the dissent. :-)
And sorry if I have omitted your favorite software. Just let meknow. :-)