Srinidhi Varadarajan, Director
We need a paradigm shift to make supercomputers more usable for mainstream computational scientists.
◦ A similar shift occurred in computing in the 1970s, when the advent of inexpensive minicomputers in academia spurred a large body of computing research.
◦ Results from this research went back to industry, creating a growth cycle that led to computing becoming a commodity.
This requires a comprehensive "rethink" of programming languages, runtime systems, operating systems, scheduling, reliability, and operations and management.
◦ Moving to petascale-class systems significantly complicates this challenge.
We need a computing environment that can efficiently and usably span the scales from department-sized systems to national resources.
Most of the “big iron” today is concentrated in DoD, DOE, NASA and NSF supercomputing centers.
Their mandate is national interest – their goal is to provide stable production cycles to computational scientists.
The future of supercomputing – research into supercomputing itself – is necessarily different from providing stable production cycles.
Our goal is to build a world-class research group focused on high-end systems research.
◦ This involves research in architectures, networks, power optimization, operating systems, compilers and programming models, algorithms, scheduling, and reliability.
◦ Our faculty hiring in systems is targeted to cover the breadth of these research areas.
The center is involved in research and development work, including design and prototyping of systems and development of production-quality systems software.
◦ The goal is to design and build the software infrastructure that makes HPC systems usable by the broad computational science and engineering community.
We provide support to high-performance computing users on campus. This involves the center in supporting actual applications, which are then profiled to gauge the performance impact of its research.
CHECS was set up in the College of Engineering in Sep. 2005.
◦ Funded by the College of Engineering
The Center consists of several core research labs with affiliated faculty.
Has affiliated faculty within and outside of CS with domain expertise.
Complemented by an industry affiliates program.
Computing Systems Research Lab (CSRL)
Distributed Systems and Storage Lab (DSSL)
Laboratory for Advanced Scientific Computing and Applications (LASCA)
Parallel Emerging Architectures Research Lab (PEARL)
Scalable Performance Laboratory (SCAPE)
Systems, Networking and Renaissance Grokking Lab (SyNeRGY)
Ph.D. Students: 36
MS Students: 24
Godmar Back (04)
Ali Butt (06)
Kirk Cameron (05)
Wu Feng (05)
Dennis Kafura (82)
Dimitris Nikolopoulos (06)
Cal Ribbens (87)
Adrian Sandu (03)
Eli Tilevich (06)
Srinidhi Varadarajan (99)
Layne Watson (78)
Flows: a thread-based distributed shared memory programming model
MPI On-Ramp: Removing the difficulties of mapping communication design abstractions to MPI code through visual tools and code generation
Operation stacking framework: algorithms and tools for improving the performance of large-scale ensemble computations
ReSHAPE: improving utilization and throughput on clusters via dynamically re-sizable parallel computations
Code Generation on Steroids: enhancing the functionality of automatically generated code through Generative Aspect Oriented Programming
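Tools like MPI On-Ramp map high-level communication abstractions to generated MPI code. The sketch below illustrates that idea with an entirely hypothetical "channel" abstraction emitting C/MPI snippets; the function and parameter names are illustrative only and do not reflect the actual tool's interface.

```python
# Hypothetical sketch of mapping a high-level "channel" abstraction to
# MPI point-to-point calls, in the spirit of MPI On-Ramp-style code
# generation. All names here are illustrative assumptions.
import zlib

def generate_send_recv(channel, src, dst, dtype="MPI_DOUBLE", count=1):
    """Emit matching C/MPI send and recv snippets for one channel."""
    # Derive a stable message tag from the channel name so both
    # endpoints agree without manual tag bookkeeping.
    tag = zlib.crc32(channel.encode()) % 32768
    send = (f"/* {channel}: rank {src} -> rank {dst} */\n"
            f"MPI_Send(buf, {count}, {dtype}, {dst}, {tag}, MPI_COMM_WORLD);")
    recv = (f"MPI_Recv(buf, {count}, {dtype}, {src}, {tag}, "
            f"MPI_COMM_WORLD, MPI_STATUS_IGNORE);")
    return send, recv

send_code, recv_code = generate_send_recv("halo_exchange", src=0, dst=1)
print(send_code)
print(recv_code)
```

Generating both endpoints from one declaration removes a common class of errors: mismatched tags, counts, or datatypes between a send and its matching receive.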
FlexiCache: Improving OS file system performance by developing an interface to support a repertoire of (pluggable) cache replacement policies in the kernel
Cadus: Co-Scheduling of real-time threads and garbage collection
Practical Fair-Sharing scheduling: finding and automatically adopting policies for stock kernels
‘MAGNETizing’ SystemTap: Enabling dynamic, on-the-fly probing and export of kernel information
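The core idea behind a pluggable replacement interface like FlexiCache's is separating the cache container from the victim-selection policy. A minimal user-level Python analogy (the kernel work itself is in C) might look like this, with an LRU policy plugged into a generic cache:

```python
# Minimal user-level analogy of a pluggable cache-replacement interface,
# in the spirit of FlexiCache's kernel-level pluggable policies.
from collections import OrderedDict

class LRUPolicy:
    """Least-recently-used eviction: victim is the oldest entry."""
    def __init__(self):
        self.order = OrderedDict()
    def touch(self, key):
        self.order.pop(key, None)   # move key to most-recent position
        self.order[key] = True
    def evict(self):
        victim, _ = self.order.popitem(last=False)  # pop oldest
        return victim

class Cache:
    """Fixed-size cache that delegates victim selection to a policy."""
    def __init__(self, capacity, policy):
        self.capacity, self.policy, self.data = capacity, policy, {}
    def get(self, key):
        if key in self.data:
            self.policy.touch(key)
            return self.data[key]
        return None
    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            del self.data[self.policy.evict()]
        self.data[key] = value
        self.policy.touch(key)

cache = Cache(2, LRUPolicy())
cache.put("a", 1); cache.put("b", 2)
cache.get("a")             # "a" becomes most recently used
cache.put("c", 3)          # evicts "b", the LRU entry
print(sorted(cache.data))  # -> ['a', 'c']
```

Swapping in a different policy (FIFO, LFU, workload-specific) requires only another class with the same `touch`/`evict` interface, which is the point of making replacement pluggable.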
High-performance, power-aware computing: frameworks for power, energy, and thermal measurement, analysis, and optimization
Frameworks: PowerPack, MISER
Supercomputing in small spaces: low-power & power-aware supercomputing
Programming Layered Multiprocessors: a unified programming approach for layered shared-memory multiprocessors, with multithreaded or multicore execution components
MELISSES: Continuous hardware monitors for power-performance adaptation schemes on layered parallel architectures
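Power-aware work like the projects above (and the Green500 list mentioned later) centers on an efficiency metric: sustained performance per watt. A small sketch with made-up numbers shows how systems would be ranked by MFLOPS/W:

```python
# Illustrative sketch: ranking systems by energy efficiency (MFLOPS/W),
# the metric popularized by the Green500 list. All figures are invented
# for illustration, not measurements of any real system.
systems = [
    {"name": "ClusterA", "gflops": 12000.0, "watts": 240000.0},
    {"name": "ClusterB", "gflops": 5500.0,  "watts": 80000.0},
    {"name": "ClusterC", "gflops": 20300.0, "watts": 500000.0},
]

def mflops_per_watt(s):
    # 1 GFLOPS = 1000 MFLOPS; efficiency = sustained MFLOPS / total watts
    return s["gflops"] * 1000.0 / s["watts"]

ranked = sorted(systems, key=mflops_per_watt, reverse=True)
for s in ranked:
    print(f"{s['name']}: {mflops_per_watt(s):.1f} MFLOPS/W")
```

Note that the fastest system (ClusterC here) need not be the most efficient; decoupling the two is exactly why a separate efficiency ranking is useful.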
Top: A framework for flexible, high-level instrumentation of binaries
DyniX: A framework for combined static/dynamic analysis of Java code
déjà vu: Transparent checkpointing and recovery for parallel applications
Weaves: Runtime system for adaptive compositional codes
High-performance networking: architecture, protocols, performance (modeling, evaluation, auto-tuning) in system-area & wide-area networks
Open Network Emulator: Integrated environment for simulation and direct-code execution of network protocols
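Transparent checkpointers such as déjà vu capture full process state without application changes. The toy below shows only the underlying idea at application level, assuming a solver whose state is an explicit dictionary: save the state mid-run, then restart from the saved point instead of from scratch.

```python
# Minimal application-level checkpoint/restart sketch. Real transparent
# checkpointing (as in deja vu) captures full process state; this analogy
# saves and restores only an explicit, hypothetical solver state.
import io
import pickle

def run(steps, state=None, checkpoint_at=None, store=None):
    """Iterate; optionally dump state into `store` at `checkpoint_at`."""
    state = state or {"i": 0, "acc": 0.0}
    while state["i"] < steps:
        state["acc"] += state["i"]      # stand-in for real work
        state["i"] += 1
        if checkpoint_at is not None and state["i"] == checkpoint_at:
            store.write(pickle.dumps(state))  # take a checkpoint
    return state

store = io.BytesIO()
full = run(10, checkpoint_at=5, store=store)

# Simulated failure after step 5: restart from the checkpoint, not step 0.
restored = pickle.loads(store.getvalue())
resumed = run(10, state=restored)
print(full == resumed)  # -> True: restart reproduces the full run
```

The recovery run repeats none of the first five steps, which is the payoff for long-running parallel jobs where a node failure would otherwise discard days of computation.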
Surrogate approximation: mathematical construction of functional approximations using sparse data in high dimensions, with ultimate application to multidisciplinary design optimization (MDO)
Robust design optimization: solving optimization problems with stochastic variables and constraints.
pDIRECT: massively parallel direct search algorithms for global optimization
Mathematical software for terascale machines: scalable algorithms for polynomial systems of equations, global optimization, MDO, and interpolatory approximation.
mpiBLAST: high-performance bioinformatics
Stochastic modeling: parameter estimation for stochastic cell cycle models
Remote sensing: parallel algorithms for remote sensing applications
WBCSim: a problem solving environment for wood based composites manufacturing processes.
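The DIRECT family of algorithms that pDIRECT parallelizes performs derivative-free global search by subdividing promising regions of the domain. The following is a much-simplified one-dimensional toy in that spirit (real DIRECT uses Lipschitz-based "potentially optimal" interval selection, not a plain sort):

```python
# Much-simplified derivative-free global search on [lo, hi]: repeatedly
# trisect the interval with the best midpoint value. A 1-D toy loosely
# in the spirit of DIRECT, not the actual pDIRECT algorithm.
def direct_1d(f, lo, hi, iterations=40):
    mid = (lo + hi) / 2.0
    intervals = [(f(mid), lo, hi)]          # (f(midpoint), lo, hi)
    for _ in range(iterations):
        intervals.sort()                    # most promising first
        fm, a, b = intervals.pop(0)
        third = (b - a) / 3.0
        for (x, y) in ((a, a + third), (a + third, b - third), (b - third, b)):
            m = (x + y) / 2.0
            intervals.append((f(m), x, y))  # refine into three children
    intervals.sort()
    fm, a, b = intervals[0]
    return (a + b) / 2.0, fm

# Minimize (x - 2)^2 + 1 over [-10, 10]; the search homes in on x = 2.
x, fx = direct_1d(lambda x: (x - 2.0) ** 2 + 1.0, -10.0, 10.0)
print(x, fx)
```

Because each iteration is an independent batch of function evaluations, schemes like this parallelize naturally, which is what makes a massively parallel variant such as pDIRECT attractive for expensive objective functions.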
System X: 2200 processor PowerPC cluster with Infiniband interconnect
Anantham: 400 processor Opteron Cluster with Myrinet interconnect
Several 8-32 processor research clusters
12-processor SGI Altix shared-memory system
8-processor AMD Opteron shared-memory system
16-core AMD Opteron shared-memory system
16-node PlayStation 3 cluster
Currently building a 2400-core x86 cluster for research in power-aware computing, programming models, and fault tolerance.
[Chart: annual values, 2005-2008, scale 0-70]
[Chart: annual values, 2006-2008, scale 0-4,500,000]
CHECS Outreach Training
◦ Summer FDI on parallel computation: 2005, 2006, 2007. Average attendance of 15-20 faculty plus graduate students.
◦ Offered 6-hour short courses on MPI and OpenMP parallel programming to graduate students each semester; average attendance of 15.
Anantham: approximately 500,000 jobs were run in 2007 alone; the vast majority by CoE users, including students working with Andrew Duggleby (ME), Walt O'Brien (ME), and David Cox (ChemE).
Visitors: Reed, Fowler, Munoz, Dongarra
Chair of System X allocation committee
Developing a senior-level course in Computational Science & Engineering, to be cross-listed with ESM
CHECS HPC Consulting 2006-07
Don Leo (ME), Mark Stremler (ESM), Walt O'Brien (ME), Ron Kriz (ESM), Diana Farkas (MSE), Madhav Marathe (CS), Alexey Onufriev (CS), Amadeu Sum (ChemE)
Jack Lesko (CEE), Linsey Marr (CEE), Romesh Batra (ESM), Chris Wyatt (ECE), Andrew Duggleby (ME), Jimmy Martin (CEE), Yili Liu (ECE), Yu Wang (MSE)
Chris Roy (AOE), Joe Wang (AOE), M. Von Spakofsky (ME), Andrew Kurdila (ME), Danesh Tafti (ME), A. Bouguettaya (CS), David Cox (ChemE), Ishwar Puri (ESM)
Industrial Impact
Developed an industrial affiliates program modeled on MPRG.
◦ In negotiations with Merrill Lynch to become the first affiliate.
Two venture-funded startups originated from CHECS.
Created the Green500 list, which ranks the most energy-efficient supercomputers.
5 NSF CAREER awards
2 DOE CAREER awards
3 IBM Faculty awards
Dean's Award for Excellence in Research
2 VT Faculty Fellows
Best Paper Award at PPoPP
Won Storage Challenge at Supercomputing 2007
2 faculty in HPCWire list of People to Watch in Supercomputing
1 faculty in MIT TR100 list of young researchers