
Post on 20-Dec-2015



Communication Performance Measurement and Analysis on Commodity Clusters

Name Nor Asilah Wati Abdul Hamid

Supervisor Dr. Paul Coddington

Dr. Francis Vaughan

Research Proposal


Table of Contents

Introduction
Message-Passing Multicomputers
Previous Research to Improve Communication Over Ethernet
Communication Performance Measurement
Previous Benchmark Software
Performance Analysis for MPIBench
Motivation
Methodology
Value of the Research


Introduction

The proposed research is on parallel computing and focuses on message-passing parallel computers.

This research will study communications benchmark software and performance measurement and analysis for message-passing parallel computers.

The proposed research aims to develop a clearer understanding of communications performance problems and how they can be improved, particularly for commodity clusters using Linux PCs and Ethernet networks.


Message-Passing Parallel Computers

There are various types of message-passing parallel computers, from the high end to the low end.

Beowulf clusters are high-performance computers built from off-the-shelf commodity components: PCs running Linux, connected by a Fast Ethernet network.

However, some clusters use high-end Unix workstations (such as Compaq Alpha or Sun UltraSPARC machines) and/or high-end gigabit networks (such as Myrinet or QsNet).

[Figures: Hydra; APAC NF]


Low-end commodity clusters consist of PCs running Linux connected by a Fast Ethernet network, e.g. Perseus.

They use MPI message-passing libraries, e.g. MPICH and LAM MPI.

MPI – a standard library specification for message passing.

MPICH – a freely available implementation of MPI.

The proposed research is mainly focussed on low end commodity clusters.


[Figure: Perseus]


Beowulf clusters have become very popular over the past couple of years, due to rapid improvements in the performance of commodity processors and networking infrastructure, and the development of Linux for PCs.

For most applications, Beowulf clusters offer much better price/performance than standard supercomputers.

Beowulf clusters commonly use an Ethernet network with TCP/IP for communication and MPICH as the MPI library.

Ethernet is much cheaper than high-speed networks.

However, there are several inadequacies in Ethernet-based communication due to the TCP/IP and MPI implementations.



Network Cost Comparison (Clustervision.com)

Interconnect        Bandwidth (Mbytes/s)   Latency (µs)   Cost/port (Euro)
QsNet (Quadrics)    360                    5              4770
Myrinet (Myricom)   245                    10             2050
Gigabit Ethernet    90                     100            200
Megabit Ethernet    12                     100            28
Infiniband          560 - 610              13 - 17        2000
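One way to read the table is as cost per unit of bandwidth. The sketch below divides cost per port by bandwidth for the rows that give a single figure; Infiniband is left out because the table lists a range. The function and variable names are illustrative only, and the metric ignores latency, which also differs greatly between these networks.

```python
# Cost per port (Euro) and bandwidth (Mbytes/s) copied from the table above.
# Infiniband is omitted because its bandwidth is given as a range.
INTERCONNECTS = {
    "QsNet (Quadrics)": (360, 4770),
    "Myrinet (Myricom)": (245, 2050),
    "Gigabit Ethernet": (90, 200),
    "Megabit Ethernet": (12, 28),
}


def cost_per_mbyte_per_s(bandwidth_mbytes_s, cost_euro):
    """Euro per port for each Mbyte/s of bandwidth."""
    return cost_euro / bandwidth_mbytes_s


def rank_by_value(table):
    """Return interconnect names, cheapest per Mbyte/s first."""
    return sorted(table, key=lambda name: cost_per_mbyte_per_s(*table[name]))


if __name__ == "__main__":
    for name in rank_by_value(INTERCONNECTS):
        ratio = cost_per_mbyte_per_s(*INTERCONNECTS[name])
        print(f"{name:20s} {ratio:6.2f} Euro per Mbyte/s")
```

On this measure alone the two Ethernet options come out several times cheaper per Mbyte/s than Myrinet or QsNet, which is the price/performance argument made for commodity clusters.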


Ethernet Problems

TCP/IP was designed specifically for Internet use, so there are several problems in using it for parallel computing.

Examples: the mechanisms for handling packet loss, congestion control and timeouts.

Problems in MPI implementations occur because TCP/IP detects errors and data loss and retransmits until the data is received correctly, BUT MPI implementations assume a network with reliable data transfer.

Much research has tried to improve the performance of TCP/IP, but it has mostly focussed on optimizing performance for the Internet and local-area networks.
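A rough model shows why retransmission timeouts matter so much on a cluster. The sketch below assumes each packet is lost independently with a fixed probability and that every loss costs one full retransmission timeout; it ignores exponential backoff, fast retransmit and congestion-window effects, so it is an illustration rather than a description of any real TCP stack. All parameter values in the usage example are assumed, not measured.

```python
def expected_transfer_time(base_s, packets, loss_prob, rto_s):
    """Expected time in seconds to deliver a message under a simplified loss model.

    base_s    -- transfer time with no loss
    packets   -- number of packets in the message
    loss_prob -- independent loss probability per transmission (0 <= p < 1)
    rto_s     -- retransmission timeout paid for every lost transmission
    """
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    # Failures before the first success follow a geometric distribution
    # with mean p / (1 - p) per packet.
    expected_retransmissions = packets * loss_prob / (1.0 - loss_prob)
    return base_s + expected_retransmissions * rto_s


if __name__ == "__main__":
    # Assumed values: 100 microsecond base time, 10 packets, 0.2 s timeout.
    for p in (0.0, 0.001, 0.01):
        t = expected_transfer_time(100e-6, 10, p, 0.2)
        print(f"loss={p:<6} expected time = {t * 1e6:10.1f} microseconds")
```

Because a timeout tuned for wide-area links is orders of magnitude longer than a cluster message time, even rare losses dominate the average in this model.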


Previous Research to Improve Communication Over Ethernet

Active Messages – aims to reduce communication overhead and to allow communication and computation to overlap.

GAMMA – an extension to the communication layer of Linux for clusters of PCs.

BIP – Basic Interface for Parallelism, an interface for network communication for message-passing parallel computing.

VIA – a standard communication infrastructure for System Area Networks (SANs) that provides protected, zero-copy user-space inter-process communication.

MVICH – an MPICH-based implementation of MPI for the Virtual Interface Architecture (VIA).


Protocol Comparison (Ping-Pong Application)

Platform                   Latency (µs)   Bandwidth (Mbytes/s)
BIP - Myrinet              5.0            108.0
TCP - Myrinet              103.0          42.0
GAMMA - Gigabit Ethernet   9.6            90.0
TCP - Gigabit Ethernet     103.0          62.0
GAMMA - Fast Ethernet      12.7           12.2
VIA - Fast Ethernet        27.0           -
TCP - Fast Ethernet        105.0          10.0
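The latency and bandwidth figures can be combined in the usual linear cost model, time = latency + size / bandwidth, to estimate the time for a message of a given size. The sketch below applies that model to two rows of the table. It assumes 1 Mbyte = 10^6 bytes (so 1 Mbyte/s equals 1 byte/µs) and that the model holds across message sizes, which real networks only approximate.

```python
def message_time_us(latency_us, bandwidth_mbytes_s, size_bytes):
    """Estimated one-way message time in microseconds (linear model).

    With 1 Mbyte = 10**6 bytes, a bandwidth of B Mbytes/s moves B bytes
    per microsecond, so the transfer term is size_bytes / B.
    """
    return latency_us + size_bytes / bandwidth_mbytes_s


if __name__ == "__main__":
    # (latency in microseconds, bandwidth in Mbytes/s) from the table above.
    platforms = {
        "TCP - Fast Ethernet": (105.0, 10.0),
        "GAMMA - Fast Ethernet": (12.7, 12.2),
    }
    for size in (64, 1000, 100000):
        for name, (latency, bandwidth) in platforms.items():
            t = message_time_us(latency, bandwidth, size)
            print(f"{size:7d} bytes  {name:22s} {t:10.1f} microseconds")
```

For small messages the latency term dominates, which is where the table shows TCP at its greatest disadvantage; for large messages the two Fast Ethernet rows converge because their bandwidths are similar.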


Previous research has focused more on developing new designs to replace the TCP/IP protocol.

However, a new protocol will require new software (e.g. drivers) for all Ethernet hardware.

The MPI implementation also needs to be ported to the new protocol, e.g. MVICH.

TCP/IP and MPICH are widely used in existing Beowulf clusters, so a more flexible TCP/IP and a better MPICH would be preferable to a new protocol.

Research by Pope et al. is an example of work aiming to design a more flexible TCP/IP using a compliant systems approach.

They argue for the separation of policy and mechanism and examine which policies are suitable for TCP/IP stacks, depending on the type of communication in use.



Communication Performance Measurement

Communication performance measurement is important for several reasons, for example:

To improve the performance of the machine and the MPI implementation

Needed as input to performance modeling tools for parallel programs

To compare the performance of the machine, in order to find the fastest machine.

Benchmark software, e.g. SKaMPI, MPBench, Mpptest, Pallas MPI Benchmark, and the recently developed MPIBench.


Previous Benchmark Software

SKaMPI, MPBench, Pallas MPI Benchmark, Mpptest.

Existing benchmark software has several weaknesses, which can result in inaccurate time measurements.

They use relatively coarse-grained clocks for timing, which forces a benchmark to average results over a large number of test repetitions.

They rely on MPI_Wtime for timing and use a ping-pong test that measures the total round-trip time, not the time of a single communication.

None of the communication patterns used in existing benchmarks considers clusters of SMP nodes.
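The ping-pong pattern described above can be sketched without MPI. In the example below a local socket pair and an echo thread stand in for two MPI processes, and `time.perf_counter` stands in for MPI_Wtime; both substitutions are assumptions made so the example runs on a single machine, and none of this is code from the benchmarks named above. The point it illustrates is that the timer only ever sees the round trip, so the one-way time is an estimate (half the round trip), not a measurement.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return fewer)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def ping_pong(nbytes=64, repetitions=1000):
    """Return (mean half round trip, list of half round trips) in seconds."""
    a, b = socket.socketpair()
    payload = b"x" * nbytes

    def echo():
        # The "pong" side: send every message straight back.
        for _ in range(repetitions):
            b.sendall(recv_exact(b, nbytes))

    worker = threading.Thread(target=echo)
    worker.start()
    halves = []
    for _ in range(repetitions):
        start = time.perf_counter()
        a.sendall(payload)
        recv_exact(a, nbytes)
        # Only the round trip is observable; halving it assumes symmetry.
        halves.append((time.perf_counter() - start) / 2)
    worker.join()
    a.close()
    b.close()
    return sum(halves) / len(halves), halves


if __name__ == "__main__":
    mean, halves = ping_pong()
    print(f"mean half round trip: {mean * 1e6:.1f} microseconds")
    print(f"slowest repetition:   {max(halves) * 1e6:.1f} microseconds")
```

Reporting only the mean, as a benchmark with a coarse clock must, hides the gap between the typical and the slowest repetition.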


MPIBench

MPIBench was developed by Duncan Grove as part of his PhD research.

The extra functionality in MPIBench:

Topology-aware, specifically designed to ensure meaningful results on clusters of SMP nodes.

Uses an accurate globally synchronized clock to measure the performance of all the processes involved.

Can measure times of single communications, not just averages.

Can generate histograms (distributions) of communication times.

The proposed research will use MPIBench for performance measurement and will also improve MPIBench.
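To illustrate why a histogram of individual communication times says more than an average, the sketch below bins a list of times and compares the result with the mean. The sample values are invented for the example, and the function is a generic binning routine, not MPIBench's own code.

```python
def histogram(times_us, bin_width_us):
    """Count samples per bin; keys are the lower bin edges in microseconds."""
    counts = {}
    for t in times_us:
        edge = int(t // bin_width_us) * bin_width_us
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Invented sample: four typical times and one very slow communication.
    samples = [101, 103, 108, 112, 2050]
    mean = sum(samples) / len(samples)
    print(f"mean = {mean:.1f} microseconds")
    for edge, count in histogram(samples, 10).items():
        print(f"{edge:5d}-{edge + 10:5d} microseconds: {'#' * count}")
```

The mean of this sample is 494.8 µs, a value that no single communication came close to; the histogram shows the cluster near 100 µs and the isolated slow result separately.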


Performance Analysis with MPIBench

Comparison of communication performance of different networks.

Beowulf-type cluster of PCs connected by Fast Ethernet (Perseus and Bunyip).

Perseus vs Bunyip – to analyse effects of different communication topology.

Sun Technical Compute Farm connected with Myrinet (Orion).

Compaq AlphaServer SC connected with QsNet (APAC NF).



Performance analysis with MPIBench revealed several inadequacies, for example:

Problems caused by TCP/IP timeouts and congestion control.

Problems with MPI implementations.

Problems caused by network congestion.

Distributions with long tails, including ‘outliers’ with very long communication times, due to:

Spurious interference from unrelated operating system services.

Cluster management system daemons.

Outlier – an extreme point that is much longer than the average value of the distribution.
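The definition above does not say how much longer than average a time must be to count as an outlier. The sketch below picks one possible rule, a time more than ten times the median of all samples, and reports the percentage of processes that saw at least one such time. Both the factor of ten and the use of the median (which is less affected by the outliers themselves than the mean) are assumptions for illustration, not the criterion used in the original analysis.

```python
from statistics import median


def outlier_percentage(times_by_process, factor=10.0):
    """Percentage of processes with at least one time above factor * median.

    times_by_process maps a process rank to its list of measured times.
    """
    all_times = [t for times in times_by_process.values() for t in times]
    threshold = factor * median(all_times)
    affected = sum(
        1 for times in times_by_process.values()
        if any(t > threshold for t in times)
    )
    return 100.0 * affected / len(times_by_process)


if __name__ == "__main__":
    # Invented times in microseconds for four processes.
    sample = {
        0: [100, 105, 110],
        1: [102, 104, 5000],
        2: [99, 101, 103],
        3: [100, 100, 100],
    }
    print(f"{outlier_percentage(sample):.1f}% of processes saw an outlier")
```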


[Figure: Perseus, average time for MPI_Bcast]


[Figure: Perseus, percentage of processes experiencing outliers during MPI_Bcast]


[Figure: Distribution of times for MPI_Bcast]


[Figure: Perseus, average times for MPI_Alltoall]


[Figure: Perseus, percentage of processes experiencing outliers during MPI_Alltoall]


Motivation 1

MPIBench is new communication benchmark software with capabilities that existing benchmark software lacks.

HOWEVER, there has been no detailed comparison of MPIBench with the existing MPI benchmarks. Such a comparison is also needed to identify any inadequacies in MPIBench so that it can be improved.

Research Aims

1. To compare MPIBench with other existing benchmark software, including the scalability, functionality and usability of MPIBench relative to that software.

2. To improve and modify MPIBench based on the comparison results.


Methodology

1. Comparison of different benchmark software for message-passing parallel computers.

The comparison is divided into a theoretical part and an experimental part.

The theoretical part will involve a study based on conference and journal papers and on the documentation of the benchmark software.

The experimental part will involve installing the benchmark software on the Hydra cluster and testing its functionality.

Then, a standard procedure covering test particulars such as data size, MPI routine and number of iterations will be identified to standardize the experiments. All data obtained from the experiments will be recorded and compared.
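One way to make the standard procedure explicit is to enumerate every combination of test parameters once and run each benchmark against the same list. The sketch below builds such a plan; the parameter values and the dictionary layout are hypothetical, and the number of processes is included as an additional parameter alongside those named above.

```python
from itertools import product


def experiment_plan(routines, message_sizes, process_counts, repetitions):
    """Return one test description per (routine, size, processes) combination."""
    return [
        {
            "routine": routine,
            "message_bytes": size,
            "processes": processes,
            "repetitions": repetitions,
        }
        for routine, size, processes in product(
            routines, message_sizes, process_counts
        )
    ]


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for the example.
    plan = experiment_plan(
        routines=["MPI_Bcast", "MPI_Alltoall"],
        message_sizes=[64, 1024],
        process_counts=[2, 4, 8],
        repetitions=100,
    )
    print(f"{len(plan)} test configurations")
    print(plan[0])
```

Running every benchmark package against the same plan keeps the recorded results directly comparable.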


2. Improvement to MPIBench

The second step will require a detailed understanding of the MPIBench code.

The required changes will then be identified and made to the code.

Testing MPIBench after the changes is crucial. The tests should be the same as those used in the first step, to ensure the correctness of the program.



Motivation 2

Previously, Grove used MPIBench to compare two clusters, Perseus and Bunyip, which have similar commodity components but different topologies.

HOWEVER, no experimental work has been done with MPIBench on a machine with the same components and topology that differs only in network type.

Research Aims

3. To analyze the performance of Myrinet and Ethernet networks on a large Linux PC cluster (Hydra). The results will be analyzed and may provide ideas on how to improve communication performance over Ethernet in Beowulf clusters.


Methodology

3. Performance Analysis and Investigation of Communication Performance on Different Networks

Design a method for selecting whether the program runs over the Ethernet or the Myrinet network.

A set of procedures and parameters is required to standardize the experiments, for example number of iterations, MPI routine, number of processors and data size.

The performance analysis results will be recorded and analysed.

The results will then be used to investigate the problems in the Ethernet network.

The investigation will involve study, analysis and discussion of the comparison results for communication performance on Myrinet and Ethernet.

This stage is expected to yield insight into the problems that occur on Ethernet, particularly in the TCP/IP and MPI implementations.


Motivation 3

There have been several previous research efforts to overcome the communication performance problems of Ethernet networks in Beowulf clusters.

However, previous research has focused more on designing new protocols. A new protocol will require new software (e.g. drivers) for all Ethernet hardware, and the MPI implementation must be ported to it. It would be more valuable if the problems of TCP/IP and MPICH themselves could be fixed.

Research Aims

4. To propose or develop solutions to communication problems in Beowulf clusters using Ethernet networks, particularly for the TCP/IP and MPI implementations.


Methodology

4. Propose or Develop Solutions for the Ethernet Network Problems in Beowulf Cluster Computers

This will involve study, analysis, comparison of results and experiments.

Based on the study done so far, several problems are expected in TCP/IP, for example packet loss and congestion.

Possible suggestions for TCP/IP are to decrease the timeout or to improve the algorithm for the resend mechanism.

Problems that occur in MPICH include poor performance and an unusual distribution of times for MPI_Alltoall.

Suggest or develop optimised code for some MPI routines that is suited to TCP/IP and Ethernet.

Re-run the experiments to test changes to the MPICH code or TCP, in order to check for performance improvement.


Motivation 4

Previously, Grove used MPIBench to benchmark several machines, and his analysis recorded “outlier” results showing very long communication times.

The main causes of outliers are:

Spurious interference from unrelated operating system services.

Cluster management system daemons.

However, there has been no further work to investigate solutions to these problems.

Research Aims

5. To find solutions for the loss of performance in Beowulf clusters of Linux PCs.

6. Possibly to develop a customized installation of Linux.


Methodology

5. Investigation of the Outliers Problem

Repeat the experiments previously run with MPIBench on Perseus.

Based on the expected main causes of the outliers, the experiments will involve:

Removing operating system and cluster management system processes.

Reducing the frequency of interference from process execution.

Try to identify the cause of the outliers and propose solutions.


Value of the Research

The proposed research will provide:

1. An improved MPIBench that can be used to analyze communication networks and MPI implementations.

2. Results that can be used in future studies of PEVPM, a new performance modelling technique.

3. Improved communication performance for Beowulf clusters using Ethernet networks, supporting cheap high-performance computing.


END.