presentation 2 spring 2016 final fat cut (1)

Post on 14-Apr-2017


Design, Implementation, and Characterization of a Raspberry Pi Cluster for High-Performance Computing

Michael Vistine, Katy Rodriguez, Ralph Walker II

1/38

For our senior design project, we are testing high-performance computing on the Raspberry Pi 2. The Raspberry Pi 2 has a 900 MHz quad-core ARM CPU, which we push to its limit with tests comparing wired vs. wireless networking, number of cores vs. execution time, and temperature vs. clock speed. The wired design consists of one master Pi communicating with three slave nodes through a router that we use as a switch. The master Pi launches the test program and connects over SSH to the slave Pis, which supply most of the computing power while the program runs through Open MPI.

TEAM 5 MEMBERS

2/38

Michael Vistine: Software Engineer
Katy Rodriguez: Integration Engineer
Ralph Walker: Hardware Engineer

OVERVIEW

• Motivation
• Hardware/Design Description
• Software
• Data
• Timeline/Current Status
• Conclusion & Questions

3/38

MOTIVATIONS

4/38

Design
• Cluster computing
• Compact
• Active cooling

Raspberry Pi
• Low-cost multicore processor
• Open-source code

Characterization of the Design
• Nodes vs. performance
• Wireless vs. wired performance
• Passive vs. active cooling

Photo courtesy of azchipka.thechipkahouse.com

HARDWARE COMPARISON

5/38

Photo courtesy of pcworld.com

              Pi 1 B+         Pi 2 B          BeagleBone                 Pi 3
Processor     700 MHz         900-1000 MHz    1 GHz                      1.2 GHz
Cores         1               4               1                          4
RAM           512 MB          1 GB            512 MB                     1 GB
Peripherals   4 USB ports     4 USB ports     2 USB ports                4 USB ports
Power Draw    0.31 A          0.42 A          0.46 A                     0.58 A
Memory        Micro SD slot   Micro SD slot   2 GB on board & Micro SD   Micro SD slot
Price         ~$30            ~$35            ~$55                       ~$35

Photo courtesy of ti.com

Photo courtesy of adafruit.com

Photo courtesy of hifiberry.com

Note (Ralph Walker): The Raspberry Pi 3 includes an integrated 802.11n Wi-Fi adapter and Bluetooth 4.1.

HARDWARE

6/38

Photo courtesy of Amazon

Anker 60W 6-Port USB Charger PowerPort
• 2.4 amps per port
• Multi-device charging
• Surge protection

Photo courtesy of Amazon

Wireless Router TP-Link TL-WR841N
• 300 Mbps wireless connection
• Adjustable DHCP settings
• Wireless on/off switch
• 4 LAN ports

Note (Ralph Walker): 1.2 A is the recommended power requirement of the Raspberry Pi 2.

DESIGN DESCRIPTION

7/38

[Block diagram; labels: Power, Router, RPI0 (Master Node), RPI1, RPI2, RPI3, Open MPI, Test.c]

DESIGN DESCRIPTION

8/38

Final Design
• Custom-made 3D-printed enclosure designed using PTC Creo Elements
• Laser-cut plexiglass
• Wired/wireless router
• Heat sinks and PC fan
• Power hub

Photo courtesy Katy Rodriguez

OPERATING SYSTEM – RASPBIAN JESSIE
◦ Based on Debian Linux
◦ Lightweight OS
◦ Open source
◦ Bash terminal interface
◦ Kernel version 4.1
◦ Pre-installed with educational programming languages

SOFTWARE

9/38

Photo courtesy raspberrypi.org

Bash terminal, used to:
◦ Edit and create configuration files

Style of syntax used to operate in the terminal:
◦ $ sudo apt-get install <package> – used to install packages

Open MPI:
◦ An implementation of the Message Passing Interface (MPI), used for parallel computing
◦ Takes the data, breaks it into smaller chunks, and distributes them to the nodes to run simultaneously

SOFTWARE

10/38

First, all packages were updated
The configuration was finalized using sudo raspi-config
Settings for the master were the same as for the slave nodes:
◦ Set the hostname to rpi0
◦ Enable SSH
◦ Set the memory split to 16

SETTING UP THE MASTER

12/38

Install the same packages as on the master node

Run sudo raspi-config to set the same system preferences as on the master node

SETTING UP SECOND PI

13/38

Photo courtesy of www.raspberrypi.org

CALL_PROCS (TEST PROGRAM 1)

14/38

#include <stdio.h>  // Standard input/output library
#include <mpi.h>

int main(int argc, char** argv)
{
    // MPI variables
    int num_processes;
    int curr_rank;
    char proc_name[MPI_MAX_PROCESSOR_NAME];
    int proc_name_len;

    // Initialize MPI
    MPI_Init(&argc, &argv);

    // Get the number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);

    // Get the rank of the current process
    MPI_Comm_rank(MPI_COMM_WORLD, &curr_rank);

    // Get the processor name for the current process
    MPI_Get_processor_name(proc_name, &proc_name_len);

    // Check that we're running this process
    printf("Calling process %d out of %d on %s\r\n", curr_rank, num_processes, proc_name);

    // Shut down MPI; every process must call this before exiting
    MPI_Finalize();

    return 0;
}

• Creates user-specified dummy processes of equal size

• Allocates the processes dynamically to each node

• Displays the process number upon completion

#include <stdio.h>
#include <math.h>
#include <mpi.h>
#define TOTAL_ITERATIONS 10000

int main(int argc, char *argv[])
{
    // MPI variables
    …
    sum = 0.0;
    // Determine step size
    h = 1.0 / (double) total_iter;
    // The current process will perform operations on its rank
    // added by multiples of the total number of threads
    // rank = 3,
    for (step_iter = curr_rank + 1; step_iter <= total_iter; step_iter += num_processes)
    …
    // Resolve the sum into calculated value of pi
    curr_pi = h * sum;
    // Reduce all processes' pi values to one value
    MPI_Reduce(&curr_pi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }
    // Print out the final value and error
    printf("calculated Pi = %.16f\r\n", pi);
    printf("Relative Error = %.16f\r\n", fabs(pi - M_PI));
    // Wrap up MPI
    MPI_Finalize();

CALC_PI (MAIN PROGRAM)

15/38

This program approximates the value of pi using 10,000 iterations (TOTAL_ITERATIONS), which the loop stride divides among the processes

SSH keys were generated; a passphrase is recommended
◦ A bitmap of random characters was then generated as the key

Next, the key was copied to the slave nodes

KEY GENERATION

16/38

Photo courtesy visualgdb.com

Set all node IP addresses as static in:
◦ sudo nano /etc/network/interfaces (edit on all nodes)
Map all hostnames to the new static IPs in:
◦ sudo nano /etc/hosts (edit on all nodes)
We could configure only wired or wireless static IPs at any one time, to prevent conflicts with the mounts

SETTING UP THE NETWORK

17/38

Setting up the wireless connection was essentially the same as setting up the wired connection

/etc/hosts was edited, and the new IP addresses and hostnames were added

SETTING UP WIRELESS CONNECTION

18/38

/ETC/HOSTS

Photo courtesy of Mike Vistine

19/38

This figure shows the wireless setup of /etc/network/interfaces

SETTING UP THE WIRELESS CONNECTION

Photo courtesy of Mike Vistine

20/38

Next, a common user was created on all nodes to allow the nodes to communicate without the need for repeated password entry

Next, the nodes were mounted onto the master node

COMMON USER AND NFS

21/38

sudo nano /etc/exports
◦ Line added at bottom of file:
◦ /mirror 192.168.0.0/24(rw,sync) [for wired]
◦ /mirror 192.168.1.0/24(rw,sync) [for wireless]

These steps were repeated for all slave nodes

COMMON USER AND NFS

22/38

AUTOMOUNT SCRIPT
• For each node, /etc/rc.local was edited
• A few lines were added at the end of the file to print “mounting network drives”
• This script was supposed to automatically mount the drives on boot
• The automount function was incredibly slow

Photo courtesy of Mike Vistine

23/38

Log in as mpiu on the master node using su - mpiu
Switch to the /mirror/code/ directory
mpicc calc_pi.c -o calc_pi
time mpiexec -n 4 -H RRPI0-3 calc_pi

RUNNING OPENMPI

24/38

The screenshot shows the .c files and the executables in the directory

The execution of the program call_procs with mpiexec

OPENMPI

25/38

Photo courtesy of Mike Vistine

CALC_PI TEST

Here you can see an example of the format while running the calc_pi test

Each core and the number of threads are designated in the MPI command

Photo courtesy of Mike Vistine

26/38

For wireless MPI to work, the mounts had to be set manually

The NFS kernel server had to be restarted each time the Pis were powered off or rebooted

WIRELESS CALC_PI TESTS

27/38

Wired vs. wireless performance
◦ Test the processing performance of the cluster when:
  • Hard-wired to the router
  • Using dongles for each node to communicate wirelessly
Computational benchmark tests
◦ Using benchmark software to observe total processing power across all Pis
◦ Using a complicated program as test material to solve with the cluster
Graphical performance info
Implementation of practical applications
Active cooling of the Pis
◦ Fans implemented in final case design

DESIGN CHARACTERISTICS

28/38

Wired vs Wireless performance

29/38

Wired performance did prove to be more efficient

The wireless values were inconsistent

Each recorded value per core was an average of three runs

Temperature vs Clock Speed

30/38

Temperatures under passive cooling were higher both before and after running the wireless data test.

Active cooling significantly improved temperature regulation of each Pi.

Active cooling vs passive cooling

31/38

Passive cooling results were very erratic.

Active cooling results were consistent and had better test times.

TIMELINE

32/38

Aug. 28 – Sept. 27

Sept. 23 – Dec. 10

Oct. 11 – March 23

Jan. 4 – April 5

Feb. 9 – April 15

BUDGET

33/38

Budget from Scratch

Total Project Budget

All project tests are complete

Data has been collected for analysis

Case is 98% complete

CURRENT STATUS

34/38

Add finishing details to documentation and case design

Make senior design day poster

Prepare for senior design day

IMMEDIATE TASKS

35/38

Experiment completed

Wired proved to be faster and more reliable than wireless

Active cooling made a significant difference in performance and temperature regulation

CONCLUSIONS

36/38

http://www.python.org/doc/current/tut/tut.html

http://likemagicappears.com/projects/raspberry-pi-cluster/

http://www.zdnet.com/article/build-your-own-supercomputer-out-of-raspberry-pi-boards/

https://Youtu.be/R0Uglgcb5g

http://www.newegg.com/

http://www.amazon.com

http://anllyquinte.blogspot.com/

http://www.slideshare.net/calcpage2011/mpi4pypdf

SOURCES

37/38

QUESTIONS??

38/38
