
CSE 531
Parallel Processors and Processing

Dr. Mahmut Kandemir

Topic Overview

Course Administration
Motivating Parallelism
Scope of Parallel Computing Applications
Organization and Contents of the Course

CSE 531, Fall 2005

This is CSE 531: "Parallel Processors and Processing"

Topics include the understanding, design, and implementation of parallel systems and algorithms. We will study essential concepts and structures found in modern parallel computing, and compare different paradigms.

Important facts
– Instructor: Mahmut Kandemir ([email protected])
– Office: IST 354C; Office Hours: T-Th 10 AM to 11 AM

Teaching Assistant
– No such luck!

Basis for Grades (tentative)
– Mid-term: 30%
– Final: 40%
– Homeworks and Programming Assignments: 30%

Homeworks and Exams

Exams (closed notes, closed book)
– Mid-term & comprehensive final

Homeworks
– Several homework assignments
– Cover mundane issues and provide drill
– I will prepare and grade them

Programming Assignments
– Certain homeworks will include programming assignments
– Thread, MPI, and OpenMP programming (a small MPI sketch follows below)
– Will cover several aspects of parallel computing & algorithms
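For a feel for what such assignments look like, here is a minimal MPI example in C. This is an illustrative sketch, not an actual course assignment; it uses only the standard MPI_Init, MPI_Comm_rank, MPI_Comm_size, and MPI_Finalize calls.

    /* Minimal MPI example: each process reports its rank.
       Illustrative sketch only -- not a course assignment. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();                         /* shut down cleanly */
        return 0;
    }

A program like this is typically compiled with mpicc and launched with, e.g., mpirun -np 4 ./hello, so each of the four processes prints its own rank.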

Class-Taking Strategy for CSE 531

I will use a "slide show"
– I need to moderate my speed (and it is really difficult)
– You need to learn to say STOP and REPEAT

You need to read the book and attend the class
– There is close correspondence between the two
– Some material in the book will not appear in lecture
– You are responsible for material from class and assigned parts of the book (reading assignments)
– Coming to class regularly is an excellent strategy

I will record attendance!

I'm terrible with names
– Forgive me (in advance) for forgetting
– Help me out by reminding me of your names
– Feel free to send e-mail to discuss/remind something or to arrange a meeting outside office hours

About the Book

Introduction to Parallel Computing
– A. Grama, A. Gupta, G. Karypis, V. Kumar
– Second Edition, Addison Wesley

Book presents modern material
– Addresses current techniques/issues
– Talks about both parallel architectures and algorithms

Other relevant textbooks will be on reserve in the library

Homeworks

No late assignment will be accepted
– Exceptions only under the most dire of circumstances
– Turn in what you have; I am generous with partial credit

Solutions to most assignments will be made available on-line or discussed in class after the due date

Collaboration

Collaboration is encouraged
– But you have to work through everything yourself – share ideas, but not code or write-ups
– I have no qualms about giving everybody (who survives) a high grade if they deserve it, so you don't have to compete
– In fact, if you co-operate, you will learn more

Any apparent cases of collaboration on exams, or of unreported collaboration on assignments, will be treated as academic dishonesty

About the Instructor

My own research
– Compiling for advanced microprocessor systems with deep memory hierarchies
– Optimization for embedded systems (space, power, speed, reliability)
– Energy-conscious hardware and software design
– Just-in-Time (JIT) compilation and dynamic code generation for Java
– Large-scale input/output systems

Thus, my interests lie in
– Quality of generated code
– Interplay between compilers, architecture, and programming languages
– Static and dynamic analysis to understand program behavior
– Custom compilation techniques and data management

Visit: http://www.cse.psu.edu/~kandemir/

Motivating Parallelism

The role of parallelism in accelerating computing speeds has been recognized for several decades.

Its role in providing multiplicity of datapaths and increased access to storage elements has been significant in commercial applications.

The scalable performance and lower cost of parallel platforms are reflected in the wide variety of applications.

Motivating Parallelism

Developing parallel hardware and software has traditionally been time and effort intensive.

If one views this in the context of rapidly improving uniprocessor speeds, one is tempted to question the need for parallel computing.

There are some unmistakable trends in hardware design which indicate that uniprocessor (or implicitly parallel) architectures may not be able to sustain the rate of realizable performance increments in the future.

This is the result of a number of fundamental physical and computational limitations.

The emergence of standardized parallel programming environments, libraries, and hardware has significantly reduced time to (parallel) solution.

The Computational Power Argument

Moore's law states [1965]:

    "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000."
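As a quick sanity check of the 65,000 figure (my arithmetic, not part of the slide, and assuming the roughly 64-component minimum-cost chip of 1965 that Moore's paper plots):

    64 x 2^10 = 2^16 = 65,536, i.e. about 65,000 components by 1975.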

The Computational Power Argument

Moore attributed this doubling rate to exponential behavior of die sizes, finer minimum dimensions, and "circuit and device cleverness".

In 1975, he revised this law as follows:

    "There is no room left to squeeze anything out by being clever. Going forward from here we have to depend on the two size factors - bigger dies and finer dimensions."

He revised his rate of circuit complexity doubling to 18 months and projected from 1975 onwards at this reduced rate.

The Computational Power Argument

If one is to buy into Moore's law, the question still remains: how does one translate transistors into useful OPS (operations per second)?

The logical recourse is to rely on parallelism, both implicit and explicit.

Most serial (or seemingly serial) processors rely extensively on implicit parallelism.

We focus in this class, for the most part, on explicit parallelism (a small illustration follows).
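To make the implicit/explicit distinction concrete, here is a small C sketch (my illustration, not from the slides). The plain loop leaves any parallelism to the hardware and compiler (pipelining, superscalar issue, vectorization), while the OpenMP pragma states the parallelism explicitly.

    /* Implicit vs. explicit parallelism (illustrative sketch). */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000
    static double a[N], b[N], c[N];

    int main(void) {
        /* Implicit: a sequential loop; any parallelism is
           extracted by the hardware and compiler without the
           programmer's involvement. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        /* Explicit: the programmer asserts that iterations are
           independent and asks for them to run on many threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("done with %d elements\n", N);
        return 0;
    }

The pragma only asserts independence; whether it actually pays off on a given platform is exactly the kind of question this course studies.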

The Memory/Disk Speed Argument

While clock rates of high-end processors have increased at roughly 40% per year over the past decade, DRAM access times have only improved at roughly 10% per year over this interval (see the arithmetic after this list).

This mismatch in speeds causes significant performance bottlenecks – this is a very serious issue!

Parallel platforms provide increased bandwidth to the memory system.

Parallel platforms also provide higher aggregate caches.

Principles of locality of data reference and bulk access, which guide parallel algorithm design, also apply to memory optimization.

Some of the fastest growing applications of parallel computing exploit not their raw computational speed, but rather their ability to pump data to memory and disk faster.
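Reading those two growth rates together (a back-of-the-envelope estimate, not a figure from the slide): the processor-memory speed gap widens by a factor of 1.40/1.10, or about 1.27x, each year, so over a decade

    (1.40 / 1.10)^10 is roughly 11x,

which is why the added memory bandwidth and aggregate cache of parallel platforms matter so much.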

The Data Communication Argument

As the network evolves, the vision of the Internet as one large computing platform has emerged.

This view is exploited by applications such as SETI@home and Folding@home.

In many other applications (typically databases and data mining) the volume of data is such that it cannot be moved – the computing is inherently distributed.

Any analyses on this data must be performed over the network using parallel techniques.

Scope of Parallel Computing Applications

Parallelism finds applications in very diverse application domains, for different motivating reasons.

These range from improved application performance to cost considerations.

Applications in Engineering and Design

Design of airfoils (optimizing lift, drag, stability), internal combustion engines (optimizing charge distribution, burn), high-speed circuits (layouts for delays and capacitive and inductive effects), and structures (optimizing structural integrity, design parameters, cost, etc.).

Design and simulation of micro- and nano-scale systems (MEMS, NEMS, etc.).

Process optimization, operations research.

Scientific Applications

Functional and structural characterization of genes and proteins.

Advances in computational physics and chemistry have yielded new materials, improved understanding of chemical pathways, and more efficient processes.

Applications in astrophysics have explored the evolution of galaxies, thermonuclear processes, and the analysis of extremely large datasets from telescopes.

Weather modeling, mineral prospecting, flood prediction, etc., are other important applications.

Bioinformatics and astrophysics also present some of the most challenging problems with respect to analyzing extremely large datasets.

Commercial Applications

Some of the largest parallel computers power Wall Street!

Data mining and analysis for optimizing business and marketing decisions.

Large-scale servers (mail and web servers) are often implemented using parallel platforms.

Applications such as information retrieval and search are typically powered by large clusters.

Applications in Computer Systems

Network intrusion detection, cryptography, and multiparty computation are some of the core users of parallel computing techniques.

Embedded systems increasingly rely on distributed control algorithms.

A modern automobile consists of tens of processors communicating to perform complex tasks for optimizing handling and performance.

Conventional structured peer-to-peer networks impose overlay networks and utilize algorithms directly from parallel computing.

Organization/Contents of this Course

Fundamentals: This part of the class covers basic parallel platforms, principles of algorithm design, group communication primitives, and analytical modeling techniques.

Parallel Programming: This part of the class deals with programming using message passing libraries and threads.

Parallel Algorithms: This part of the class covers basic algorithms for matrix computations, graphs, sorting, discrete optimization, and dynamic programming.