Lecture 8: Design of Parallel Programs Part III Lecturer: Simon Winberg Digital Systems EEE4084F Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)


Page 1:

Lecture 8: Design of Parallel Programs Part III

Lecturer: Simon Winberg

Digital Systems EEE4084F

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Page 2:

Lecture Overview

Step 4: Communications (cont.)
Cloud computing
Virtualization

Page 3:

Steps in designing parallel programs

The hardware may come first or later. The main steps:
1. Understand the problem
2. Partitioning (separation into main tasks)
3. Decomposition & Granularity
4. Communications
5. Identify data dependencies
6. Synchronization
7. Load balancing
8. Performance analysis and tuning

Page 4:

Step 4: Communications (CONT)

EEE4084F

Page 5:

Latency vs. Bandwidth

Communication latency = the time it takes to send a minimal-length (e.g., 0 byte) message from one task to another. Usually expressed in microseconds.

Bandwidth = the amount of data that can be sent per unit of time. Usually expressed in megabytes/sec or gigabytes/sec.

Page 6:

Latency vs. Bandwidth

Many small messages can result in latency dominating communication overheads.

If many small messages are needed, it can be more efficient to package them into larger ones to increase the effective bandwidth of communications, as the sketch below illustrates.
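As an illustration only (the slides do not prescribe a library; MPI is assumed here, and send_batched and N_RECORDS are hypothetical names), one batched send pays the per-message latency cost once rather than once per small record:

    /* Illustrative sketch: batching many small values into one MPI message
     * so the per-message latency overhead is incurred only once. */
    #include <mpi.h>

    #define N_RECORDS 1000   /* hypothetical number of small records to send */

    void send_batched(const double *records, int dest)
    {
        /* One large send instead of N_RECORDS individual MPI_Send calls. */
        MPI_Send(records, N_RECORDS, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }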

Page 7:

Baud rate vs. Bandwidth

Avoid misusing these terms interchangeably. The general public seem to think they refer to the same thing, and in some cases, from a user perspective, this may be OK.

A: I say, Beatrice, how is the baud rate on your end?

B: My dear Alice, it is much the same as your end. Limited to about 200 twangs per second.

Nevertheless an engineer should know that baud specifies the symbol rate (or signal changes) per unit of time, and this could be an entirely analogue measure.

Bandwidth is different depending on whether you are considering computer networks/telecoms or signals more generally (e.g. in an RF system). For computing, bandwidth is a measure of data rate in bits per second. More generally in electrical engineering, bandwidth refers to the frequency range within a particular band (typically a band that is used for transmitting some sort of signal).

Page 8:


Baud rate – an issue of standards (from a computing/telecoms perspective)

Baud rate is not always interpreted the same way for all applications when it comes to deciding the number of bytes that the link provides. This is due to other aspects of the protocol used, e.g. if each byte is surrounded by error detection bits.

Remember a byte is always understood as 8 bits – but the byte is just the ‘useful data’ that is being transferred.

If the connection is specified as a speed in bits per second, then it is 8 bits per byte. When you specify the speed as a baud rate, you are specifying the symbol rate (see previous slide), in which case you might be using 10 symbols to a byte when using start-data-stop or 8b/10b encoding schemes. So, in other words, you may quite likely find yourself in a situation where you have a bandwidth of, say, 1 Mbps but a baud rate of 1.25 mega symbols per second.

There is even an official unit symbol for baud: writing it as "Bd" specifies the symbol rate. This shouldn't be confused with the more commonly known bps (bits per second) or Bps (bytes per second). So you should now know that x MBd is not always = x Mbps.

Example: If you have an 8b/10b encoded line running at 10 MBd, you can automatically say that it has a bandwidth of 8 Mb/s, or 1 MB/s (excluding overhead such as packet headers, of course).
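A minimal sketch of the same conversion (illustrative only; the 10 MBd figure and 8b/10b ratio come from the example above):

    /* Converting a symbol rate (baud) to a useful data rate for an
     * 8b/10b encoded link: 8 data bits are carried per 10 line symbols. */
    #include <stdio.h>

    int main(void)
    {
        double baud_rate    = 10e6;        /* 10 MBd: 10 million symbols per second */
        double code_ratio   = 8.0 / 10.0;  /* 8b/10b encoding efficiency */

        double bit_rate_bps  = baud_rate * code_ratio;  /* 8 Mb/s of useful data */
        double byte_rate_Bps = bit_rate_bps / 8.0;      /* 1 MB/s, ignoring protocol overhead */

        printf("Data rate: %.0f Mb/s = %.0f MB/s\n",
               bit_rate_bps / 1e6, byte_rate_Bps / 1e6);
        return 0;
    }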

Further reading: http://en.wikipedia.org/wiki/Baud

Page 9:

Suggested further reading on baud rate and bandwidth

If you're one of those super keen students, you might like to look at these nuggets of knowledge and web pages for further reading on this topic:

• PCIe 1.0 and 2.0 (including Ethernet and USB) use 8b/10b encoding. PCIe 3.0 and 4.0 use 128b/130b encoding. Read more at: http://en.wikipedia.org/wiki/PCI_Express#History_and_revisions

• Historically, Hamming code refers to the Hamming (7, 4) code, which uses 7 symbols for every 4 data bits. There are many other variations on this which you can explore at http://en.wikipedia.org/wiki/Hamming_code

• DVD encoding uses Reed-Solomon error correction in two stages (note, in case you were wondering, this encoding has nothing to do with grass or mines). Read more at http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Data_storage.

• There are even more interesting cases where run-length limited codes are used with error correction and (for argument's sake) 16 QAM signal encoding. 1 MBd 16 QAM has a bandwidth of 4 Mb/s (a rather interesting case because the baud rate here is a quarter of the bit rate! You also get 32, 64, 128 and 256 QAM systems....)

Acknowledgement: thanks to John-Philip Taylor for these interesting reading suggestions.

(these readings are not examined)

Page 10:

Page 11:

Effective bandwidth

Total latency = Sending overhead + Transmission time + time of flight + Receiver overhead

Effective bandwidth = Message size / total latency

[Timeline diagram: total latency broken down into sending overhead, transmission time, time of flight (transport latency) and receive overhead]

Time of flight is also referred to as 'propagation delay'. It may depend on how many channels are used; e.g. a two-channel path will give an effectively lower propagation delay. With switching circuitry, the propagation delay can increase significantly.

Page 12:

Effective bandwidth calc.

Example: Distance 100 m, raw bandwidth 10 Mbit/s, message 10,000 bytes, sending overhead 200 μs, receiving overhead 300 μs.

Solution:

Transmission time = 80,000 bits / 10 Mbit/s = 80,000 bits / (10 bits/μs) = 8,000 μs

Time of flight = 100 m / (3 x 10^8 m/s) = 0.33 μs

Total latency = Sending overhead + Transmission time + Time of flight + Receiving overhead

Total latency = 200 μs + 8,000 μs + 0.33 μs + 300 μs = 8,500.33 μs

Effective bandwidth = Message size / Total latency

Effective bandwidth = 80,000 bits / 8,500.33 μs = 9.41 Mbit/s
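The same calculation as a short C sketch (illustrative only; the values are taken from the worked example above):

    /* Effective bandwidth of a 10,000-byte message over a 10 Mbit/s, 100 m link. */
    #include <stdio.h>

    int main(void)
    {
        double message_bits = 10000.0 * 8.0;  /* 80,000 bits */
        double raw_bw_bps   = 10e6;           /* raw bandwidth: 10 Mbit/s */
        double distance_m   = 100.0;
        double send_ovh_us  = 200.0;          /* sending overhead */
        double recv_ovh_us  = 300.0;          /* receiving overhead */

        double transmission_us = message_bits / raw_bw_bps * 1e6;  /* 8,000 us */
        double flight_us       = distance_m / 3e8 * 1e6;           /* ~0.33 us */

        double total_latency_us  = send_ovh_us + transmission_us + flight_us + recv_ovh_us;
        double effective_bw_mbps = message_bits / total_latency_us; /* bits per us = Mbit/s */

        printf("Total latency: %.2f us, effective bandwidth: %.2f Mbit/s\n",
               total_latency_us, effective_bw_mbps);
        return 0;
    }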

Page 13:

Visibility of Communications

Communications is usually both explicit and highly visible when using the message passing programming model.

Communications may be poorly visible when using the data parallel programming model.

For a data parallel design on a distributed system, communications may be entirely invisible, in that the programmer may have no understanding of (and no easily obtainable means to accurately determine) what inter-task communication is happening.

Page 14:

Synchronous vs. asynchronous

Synchronous communications:

Require some kind of handshaking between tasks that share data / results.

May be explicitly structured in the code, under control of the programmer – or it may happen at a lower level, not under control of the programmer.

Synchronous communications are also referred to as blocking communications because other work must wait until the communications has finished.
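A minimal sketch of blocking point-to-point communication, using MPI as an example message-passing library (the slides do not prescribe MPI; this is only to illustrate the blocking behaviour, and the program assumes at least two ranks):

    /* Blocking send/receive: each call returns only once it is safe to
     * reuse the buffer (send) or the data has arrived (receive). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* blocks until buffer is reusable */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                          /* blocks until the data arrives */
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }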

Page 15:

Synchronous vs. asynchronous

Asynchronous communications:

Allow tasks to transfer data between one another independently. E.g.: task A sends a message to task B, and task A immediately continues with other work. The point when task B actually receives, and starts working on, the sent data doesn't matter.

Asynchronous communications are often referred to as non-blocking communications.

Allow for interleaving of computation and communication, potentially providing less overhead compared to the synchronous case.
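A corresponding non-blocking sketch (again using MPI purely for illustration; overlap_example and do_other_work are hypothetical names for the overlapped computation):

    /* Non-blocking send: MPI_Isend returns immediately, the transfer proceeds
     * in the background, and the sender waits only when completion matters. */
    #include <mpi.h>

    extern void do_other_work(void);   /* hypothetical computation to overlap */

    void overlap_example(const double *data, int n, int dest)
    {
        MPI_Request req;

        MPI_Isend(data, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req); /* returns at once */

        do_other_work();               /* computation overlapped with communication */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* now ensure the send has completed */
    }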

Page 16:

Scope of communications

Scope of communications: knowing which tasks must communicate with each other. This can be crucial to an effective design of a parallel program.

Two general types of scope:

Point-to-point (P2P)
Collective / broadcasting

Page 17:

Scope of communications

Point-to-point (P2P): involves only two tasks; one task is the sender/producer of data, and the other acts as the receiver/consumer.

Collective: data sharing between more than two tasks (sometimes specified as a common group or collective).

Both P2P and collective communications can be synchronous or asynchronous.

Page 18:

Collective communications

Typical techniques used for collective communications:

BROADCAST: the initiator sends the same message to all tasks.

SCATTER: the initiator sends a different message to each task.

GATHER: messages from the tasks are combined together at one task.

REDUCING: only parts, or a reduced form, of the messages are worked on.
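For illustration, these four patterns map naturally onto MPI collective calls (a sketch only; MPI is not prescribed by the slides, and collectives_demo and its buffers are hypothetical):

    /* The four collective patterns above as MPI calls, with rank 0 as the root/initiator. */
    #include <mpi.h>
    #include <stdlib.h>

    void collectives_demo(MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* BROADCAST: the same value is sent from rank 0 to every task. */
        int value = (rank == 0) ? 42 : 0;
        MPI_Bcast(&value, 1, MPI_INT, 0, comm);

        /* SCATTER: a different element of sendbuf goes to each task. */
        int *sendbuf = NULL, recv_elem = 0;
        if (rank == 0) {
            sendbuf = malloc(size * sizeof(int));
            for (int i = 0; i < size; i++) sendbuf[i] = i * 10;
        }
        MPI_Scatter(sendbuf, 1, MPI_INT, &recv_elem, 1, MPI_INT, 0, comm);

        /* GATHER: one element from each task is collected at rank 0. */
        int *gatherbuf = (rank == 0) ? malloc(size * sizeof(int)) : NULL;
        MPI_Gather(&recv_elem, 1, MPI_INT, gatherbuf, 1, MPI_INT, 0, comm);

        /* REDUCE: values from all tasks are combined (here, summed) at rank 0. */
        int local = rank, sum = 0;
        MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, comm);

        free(sendbuf);
        free(gatherbuf);
    }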

Page 19:

Efficiency of communications

There may be a choice of different communication techniques:

In terms of hardware (e.g., fibre optics, wireless, bus systems), and
In terms of the software / protocol used.

The programmer may need to use a combination of techniques and technologies to establish the most efficient choice (in terms of speed, power, size, etc.).

Page 20:

Cloud Computing

EEE4084F

Page 21:

Cloud Computing: a short but informative marketing clip…

“Salesforce: what is cloud computing”

Page 22:

Checking your understanding…

Which one(s) of these are not official Cloud Computing services as mentioned in the video?
1. DaaS
2. IaaS
3. PaaS
4. SaaS
5. YaaS

… answer: ??

Page 23:

Cloud Computing

Cloud computing is a style of computing in which dynamically scalable and usually virtualized computing resources are provided as a service over the internet.

Cloud computing use:
Request resources or services over the internet (or intranet)
Provides the scalability and reliability of a data center

Page 24:

Characteristics

On demand scalability:
Add or subtract processors, memory, or network bandwidth to your cluster
(Be billed for the QoS / resource usage)

Virtualization and virtual systems & services:
Request operating systems, storage, databases, and other services

Page 25:

Key Technology for Cloud Computing: Virtualization

[Diagram] Traditional computing stack: Hardware, Operating System, Apps on top.

[Diagram] Virtualized computing stack: Hardware, Hypervisor, multiple OSes, Apps on top of each OS.

Page 26:

Cloud Computing Models

Utility computing: rent cycles. Examples: Amazon's EC2, GoGrid, AppNexus

Platform as a Service (PaaS): provides a user-friendly API to aid implementation. Example: Google App Engine

Software as a Service (SaaS): just run it for me. Example: Gmail

Driving philosophy: why buy the equipment and do the configuration and maintenance yourself, if you can contract it out? It can work out much more cost effective.

Page 27:

Amazon Web Services

Elastic Compute Cloud (EC2):
Rent computing resources by the hour
Basic unit of accounting = instance-hour
Additional cost for bandwidth usage

Simple Storage Service (S3):
Persistent storage
Charged by the GB/month
Additional costs for bandwidth

(Planning a guest lecture / seminar by engineer(s) from an Amazon data center for both the ECE & CS Hons classes to happen soon)

Page 28:

Virtualization

EEE4084F

Page 29:

What is virtualization?

The concept of “cloud computing” is often used as a generalized term to refer to remotely accessing application software stored on centralized servers or a cluster where you don’t know which PC may be hosting the app.

The term “virtualization” is sometimes confused with “cloud computing”, typically used incorrectly in such situations (to refer to a virtual computer).

While these technologies are similar the terms are not interchangeable…

Page 30:

Definition of Cloud Computing: Cloud computing is a computing term or metaphor that evolved in the late 2000s, based on utility and consumption of computing resources. Cloud computing involves deploying groups of remote servers and software networks that allow centralized data storage and online access to computer services or resources. [source: www.en.wikipedia.org/wiki/Cloud_computing]

Definition of Virtualization: Virtualization (in computing) refers to the act of creating a virtual (rather than actual) version of something, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or computer network resources. [www.en.wikipedia.org/wiki/Virtualization]

It is essentially software that separates physical computing infrastructures to create a variety of dedicated computing resources. It is the fundamental technology behind cloud computing, particularly as it allows a consistent software layer for the same programs to run on different hardware. (see next slide for elaboration…)

Page 31:

Virtualization

Virtualization software allows multiple operating systems and multiple applications all to run at the same time on the same server.

“It enables businesses to reduce IT costs while increasing the efficiency, utilization and flexibility of their existing computer hardware.” -- Quote by Mike Adams (VMware director of product marketing)

Page 32:

Difference between virtualization and cloud computing

The main fundamental difference between these: virtualization is software that manipulates hardware and provides a layer that presents a consistent interface, whereas cloud computing is a service / collection of services that runs on top of that layer (i.e., on the virtual machine).

You can also argue that cloud computing is the delivery of services, using shared computing resources, software or data, on demand via the Internet, whereas virtualization is more of a lower-level interface unseen by the users.

A source of confusion is "virtualization as a service" – i.e. where the cloud basically provides virtual machines and operating systems of the particular type requested, which the users then write code for.

Page 33:

You don’t have to have a cloud for the virtualized environment

A major advantage of virtualization is allowing for sustainability (increased product lifespan) of crucial software products, such as bespoke applications developed in-house which are very important for the business (e.g. an expert system, estimation engine, or simulation model).

[Diagram: a crucial software product and other apps each running in a virtualized environment, with or without cloud services behind them]

Page 34:

Example Virtualization Products

EEE4084F

Page 35:

Common Virtualization Products

Name | Creator | Host CPU | Guest CPU | Host OS | Guest OS | License

bhyve | FreeBSD | x86-64 | x86-64 | FreeBSD | FreeBSD, OpenBSD, GNU/Linux … | BSD

DOSBox | Peter Veenstra, Sjoerd, community help | Any | x86, ARM | Linux, Windows, Mac OS X, BeOS, FreeBSD, Solaris, QNX, IRIX, AmigaOS, Symbian, … | Internally emulated DOS shell; classic PC booter games, … | GPL

Hyper-V (2012) | Microsoft | x86-64 + hardware-assisted virtualization (Intel VT-x or AMD-V) … | x86-64 (up to 64 physical CPUs) | Windows 8/8.1 & Windows Server 2012 (R2) w/ Hyper-V role, Microsoft Hyper-V Server | Supported drivers for Windows NT, FreeBSD, Linux (SUSE 10, RHEL 6, CentOS 6) … | Proprietary

VirtualBox | Innotek, acquired by Oracle Corporation | x86, x86-64, Intel VT-x, AMD-V | x86, x86-64 | Windows, Linux, Mac OS X x86, Solaris, FreeBSD, eComStation | DOS, Linux, Mac OS X Server, FreeBSD, Haiku, OS/2, Solaris, Syllable, Windows, … | GPL version 2; full version with extra enterprise features is proprietary

Windows Virtual PC | Connectix & Microsoft | x86, x86-64 with Intel VT-x or AMD-V | x86 | Windows 7 | Windows XP, Windows Vista, Windows 7, … | Proprietary (generally free for non-commercial use)

VMware Server | VMware | x86, x86-64 | x86, x86-64 | Windows, Linux | Same as VMware ESX Server | Proprietary

Adapted from: http://en.wikipedia.org/wiki/Comparison_of_platform_virtualization_software

Page 36:

Next lecture

Next lecture:
GPUs and OpenMP
Data dependencies (Step 5)

Later lectures:
Synchronization (Step 6)
Load balancing (Step 7)
Performance analysis (Step 8)

End of the "design of parallel programs" series of lectures

Page 37:

Disclaimers and copyright/licensing details

I have tried to follow the correct practices concerning copyright and licensing of material, particularly image sources that have been used in this presentation. I have put much effort into trying to make this material open access so that it can be of benefit to others in their teaching and learning practice. Any mistakes or omissions with regard to these issues I will correct when notified. To the best of my understanding the material in these slides can be shared according to the Creative Commons "Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)" license, and that is why I selected that license to apply to this presentation (it's not because I particularly want my slides referenced, but more to acknowledge the sources and generosity of others who have provided free material such as the images I have used).

Image sources:
Wikipedia (open commons)
www.flickr.com (public domain images)
Picture of runner going up stairs – Wikimedia open commons
http://pixabay.com open commons / public domain images