06 April 2006
Parallel Performance Wizard:
Introduction
Professor Alan D. George, Principal Investigator
Mr. Hung-Hsun Su, Sr. Research Assistant
Mr. Adam Leko, Sr. Research Assistant
Mr. Bryan Golden, Research Assistant
Mr. Hans Sherburne, Research Assistant
Mr. Max Billingsley, Research Assistant
Mr. Josh Hartman, Undergraduate Volunteer
HCS Research Laboratory, University of Florida
Outline
Motivations and Objectives
Background
Framework & Key Features
Phase II Tasks & Schedules
Today’s Schedule
Motivations and Objectives
Motivations
A UPC/SHMEM program does not yield the expected performance. Why?
Due to the complexity of parallel computing, this is difficult to determine without tools for performance analysis and optimization
Discouraging for users, new and old; few options for shared-memory computing in the UPC and SHMEM communities
Objectives
Research topics relating to performance analysis
Develop a framework for a performance analysis tool
Design with both performance and user productivity in mind
Develop a performance analysis tool for UPC and SHMEM
Need for Performance Analysis
Performance analysis of sequential applications can be challenging
Performance analysis of explicitly communicating parallel applications is significantly more difficult, mainly due to the increase in the number of processing nodes
Performance analysis of implicitly communicating parallel applications is even more difficult: non-blocking, one-sided communication is tricky to track and analyze accurately
[Diagram: programming models compared — sequential programming (one processing node); explicit parallel programming (e.g., MPI) with multiple processing nodes; implicit parallel programming (e.g., UPC, SHMEM) with multiple processing nodes]
Background - SHMEM
SHared MEMory library
Based on SPMD model
Available for C / Fortran
Available for servers and clusters
Easier to program than MPI
Hybrid programming model
Traits of message passing: explicit communication, replication, and synchronization; need to give remote data location (processing element ID)
Traits of shared memory: provides logically shared memory system view; non-blocking, one-sided communication
Lower latency, higher bandwidth communication
PSHMEM available for some implementations
[Diagram: two processing elements, P.E. X and P.E. Y, each with a private memory region and a remotely accessible memory region; variables int x and float y reside at the same address in each P.E.'s remotely accessible memory]
Background - UPC
Unified Parallel C (UPC)
Partitioned global-address-space (PGAS) parallel programming language
Common and familiar syntax and semantics for parallel C, with simple extensions to ANSI C
Many implementations
Open source: Berkeley UPC, Michigan UPC, GCC-UPC
Proprietary: HP-UPC, Cray-UPC
Easier to program than MPI; software more scalable
With hand-tuning, UPC performance compares favorably with MPI
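The PGAS style above can be sketched with a shared array. This is a minimal illustrative sketch only; it requires a UPC compiler (e.g., Berkeley UPC), not plain C, and the array name a and size N are arbitrary.

```c
/* Illustrative UPC sketch (needs a UPC compiler such as Berkeley UPC). */
#include <upc.h>
#include <stdio.h>

#define N 100
shared int a[N];   /* cyclically distributed across THREADS by default */

int main(void)
{
    int i;
    /* Each thread initializes the elements with affinity to it. */
    upc_forall (i = 0; i < N; i++; &a[i])
        a[i] = i;
    upc_barrier;

    if (MYTHREAD == 0) {
        long sum = 0;
        for (i = 0; i < N; i++)
            sum += a[i];   /* remote reads look like ordinary indexing */
        printf("sum = %ld\n", sum);
    }
    return 0;
}
```

Note that the remote reads in the final loop are syntactically invisible — ordinary array indexing — which is why implicit communication is hard to spot without tool support.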
Background – Performance Analysis
Three general performance analysis approaches
Analytical modeling
Mostly predictive methods
Could also be used in conjunction with experimental performance measurement
Pros: easy to use, fast, can be performed without running the program
Cons: usually not very accurate
Simulation
Pros: allows performance estimation of a program on various system architectures
Cons: slow, not generally applicable for regular UPC/SHMEM users
Experimental performance measurement
Strategy used by most modern performance analysis tools (PATs)
Uses actual event measurement to perform analysis
Pros: most accurate
Cons: can be time-consuming
PAT = Performance Analysis Tool
Background - Experimental Performance Measurement Stages
Instrumentation – user-assisted or automatic insertion of instrumentation code
Measurement – actual measuring stage
Analysis – data analysis toward bottleneck detection & resolution
Presentation – display of analyzed data to the user; deals directly with the user
Optimization – process of finding and resolving bottlenecks
[Diagram: stage flow — Instrumentation → Measurement → Analysis → Presentation → Optimization, with analytical modeling and simulation shown alongside the analysis path]
Framework
[Diagram: framework block diagram; component labels grouped by module below]
Instrumentation module: input interface; source instrumentation manager; binary instrumentation manager
Measurement module: measurement units; event file manager; low-level event read/write; high-level event read/write
Analysis module: analysis/data manager; analysis units (bottleneck detection); simulator
Storage: basic event data; composite event data; analysis data
Presentation module: visualization manager
Optimization module: performance prediction; performance bottleneck resolution; code transformation
Legend: partially done with project; completely done with project; not part of project
Key Features Semi-automatic source-level instrumentation as default
Only P module and part of I module are visible to user
PAPI will be used
Tracing mode as default with profiling support
Post-mortem data processing and analysis
Analyses: load balancing, scalability, memory system
Visualizations: timeline display, speedup chart, call-tree graph, communication volume graph, memory access graph, profiling table
Tasks & Schedule
[Gantt chart spanning Oct 2005 – Mar 2007; task bars not reproduced in this transcript]
1. Develop detailed design
2. Develop detailed design for I, M, and A modules
3. Design and evaluate high-level user interface (P module)
4. Develop standard set of test cases
5. Construct, test, and integrate I&M modules
6. Construct and test I module for open platforms
7. Construct and test M module for open platforms
8. Integrate I with M modules and test for open platforms
9. Construct and test I module for proprietary platforms
10. Construct and test M unit for proprietary platforms
11. Integrate I with M units and test for proprietary platforms
12. Construct, test, and integrate A module
13. Construct and test A module for open platforms
14. Integrate A with IM modules and test for open platforms
15. Construct and test A module for proprietary platforms
16. Integrate A with IM modules and test for proprietary platforms
17. Construct, test, and integrate P module
18. Code and test P module
19. Integrate P with IMA modules and test
20. Conduct beta tests and refine tool design and implementation
21. Beta test #1 – functionality + usability
22. Beta test #2 – functionality + usability + productivity with controlled case studies
23. Beta test #3 – functionality + usability + productivity with uncontrolled case studies
24. Design and implement advanced features
25. Preliminary research in performance optimization via prediction
Discussion Topic: Target Platforms
Our current platform list; changes needed?
Open:
Quadrics SHMEM on Opterons + RHEL4 (qsnet)
Berkeley UPC on Opterons + RHEL4 (iba)
Proprietary:
Cray UPC on X1E (src. inst.)
Cray SHMEM on X1E
Today’s Schedule
09:00 – 09:30  Project overview
09:30 – 10:15  Instrumentation (I) module presentation
10:15 – 10:30  BREAK
10:30 – 11:15  Measurement (M) module presentation
11:15 – 11:45  I&M-modules demo
11:45 – 13:00  LUNCH
13:00 – 13:45  Analysis (A) module presentation
13:45 – 14:00  A-module demo
14:00 – 14:45  Presentation (P) module presentation
14:45 – 15:00  P-module demo
15:00 – 15:15  BREAK
15:15 – 16:00  Wrap-up & planning discussion