Srinidhi Varadarajan, Director
We need a paradigm shift to make supercomputers more usable for mainstream computational scientists.
◦ A similar shift occurred in computing in the 1970s, when the advent of inexpensive minicomputers in academia spurred a large body of computing research.
◦ Results from this research went back to industry, creating a growth cycle that led to computing becoming a commodity.
This requires a comprehensive "rethink" of programming languages, runtime systems, operating systems, scheduling, reliability, and operations and management.
◦ Moving to petascale-class systems significantly complicates this challenge.
We need a computing environment that can efficiently and usably span the scales from department-sized systems to national resources.
Most of the “big iron” today is concentrated in DoD, DOE, NASA and NSF supercomputing centers.
Their mandate is national interest – their goal is to provide stable production cycles to computational scientists.
The future of supercomputing – research into supercomputing itself – is necessarily different from providing stable production cycles.
Our goal is to build a world-class research group focused on high-end systems research.
◦ This involves research in architectures, networks, power optimization, operating systems, compilers and programming models, algorithms, scheduling, and reliability.
◦ Our faculty hiring in systems is targeted to cover the breadth of these research areas.
The center is involved in research and development work, including design and prototyping of systems and development of production-quality systems software.
◦ The goal is to design and build the software infrastructure that makes HPC systems usable by the broad computational science and engineering community.
We provide support to high-performance computing users on campus. This involves the center in supporting actual applications, which are then profiled to gauge the performance impact of its research.
CHECS was set up in the College of Engineering in Sep. 2005.
◦ Funded by the College of Engineering
The Center consists of several core research labs with affiliated faculty.
Has affiliated faculty within and outside of CS with domain expertise.
Complemented by an industry affiliates program.
Computing Systems Research Lab (CSRL)
Distributed Systems and Storage Lab (DSSL)
Laboratory for Advanced Scientific Computing and Applications (LASCA)
Parallel Emerging Architectures Research Lab (PEARL)
Scalable Performance Laboratory (SCAPE)
Systems, Networking and Renaissance Grokking Lab (SyNeRGY)
Ph.D. Students: 36
MS Students: 24
Godmar Back (04)
Ali Butt (06)
Kirk Cameron (05)
Wu Feng (05)
Dennis Kafura (82)
Dimitris Nikolopoulos (06)
Cal Ribbens (87)
Adrian Sandu (03)
Eli Tilevich (06)
Srinidhi Varadarajan (99)
Layne Watson (78)
Flows: a thread-based distributed shared memory programming model
MPI On-Ramp: Removing the difficulties of mapping communication design abstractions to MPI code through visual tools and code generation
Operation stacking framework: algorithms and tools for improving the performance of large-scale ensemble computations
ReSHAPE: improving utilization and throughput on clusters via dynamically re-sizable parallel computations
Code Generation on Steroids: enhancing the functionality of automatically generated code through Generative Aspect Oriented Programming
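Tools like MPI On-Ramp map high-level communication abstractions to generated MPI code. The sketch below illustrates that idea with an entirely hypothetical "channel" abstraction emitting C/MPI snippets; the function and parameter names are illustrative only and do not reflect the actual tool's interface.

```python
# Hypothetical sketch of mapping a high-level "channel" abstraction to
# MPI point-to-point calls, in the spirit of MPI On-Ramp-style code
# generation. All names here are illustrative assumptions.
import zlib

def generate_send_recv(channel, src, dst, dtype="MPI_DOUBLE", count=1):
    """Emit matching C/MPI send and recv snippets for one channel."""
    # Derive a stable message tag from the channel name so both
    # endpoints agree without manual tag bookkeeping.
    tag = zlib.crc32(channel.encode()) % 32768
    send = (f"/* {channel}: rank {src} -> rank {dst} */\n"
            f"MPI_Send(buf, {count}, {dtype}, {dst}, {tag}, MPI_COMM_WORLD);")
    recv = (f"MPI_Recv(buf, {count}, {dtype}, {src}, {tag}, "
            f"MPI_COMM_WORLD, MPI_STATUS_IGNORE);")
    return send, recv

send_code, recv_code = generate_send_recv("halo_exchange", src=0, dst=1)
print(send_code)
print(recv_code)
```

Generating both endpoints from one declaration removes a common class of errors: mismatched tags, counts, or datatypes between a send and its matching receive.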
FlexiCache: Improving OS file system performance by developing an interface to support a repertoire of (pluggable) cache replacement policies in the kernel
Cadus: Co-Scheduling of real-time threads and garbage collection
Practical Fair-Sharing scheduling: finding and automatically adopting policies for stock kernels
‘MAGNETizing’ SystemTap: Enabling dynamic, on-the-fly probing and export of kernel information
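The core idea behind a pluggable replacement interface like FlexiCache's is separating the cache container from the victim-selection policy. A minimal user-level Python analogy (the kernel work itself is in C) might look like this, with an LRU policy plugged into a generic cache:

```python
# Minimal user-level analogy of a pluggable cache-replacement interface,
# in the spirit of FlexiCache's kernel-level pluggable policies.
from collections import OrderedDict

class LRUPolicy:
    """Least-recently-used eviction: victim is the oldest entry."""
    def __init__(self):
        self.order = OrderedDict()
    def touch(self, key):
        self.order.pop(key, None)   # move key to most-recent position
        self.order[key] = True
    def evict(self):
        victim, _ = self.order.popitem(last=False)  # pop oldest
        return victim

class Cache:
    """Fixed-size cache that delegates victim selection to a policy."""
    def __init__(self, capacity, policy):
        self.capacity, self.policy, self.data = capacity, policy, {}
    def get(self, key):
        if key in self.data:
            self.policy.touch(key)
            return self.data[key]
        return None
    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            del self.data[self.policy.evict()]
        self.data[key] = value
        self.policy.touch(key)

cache = Cache(2, LRUPolicy())
cache.put("a", 1); cache.put("b", 2)
cache.get("a")             # "a" becomes most recently used
cache.put("c", 3)          # evicts "b", the LRU entry
print(sorted(cache.data))  # -> ['a', 'c']
```

Swapping in a different policy (FIFO, LFU, workload-specific) requires only another class with the same `touch`/`evict` interface, which is the point of making replacement pluggable.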
High-performance, power-aware computing: frameworks for power, energy, and thermal measurement, analysis, and optimization
Frameworks: PowerPack, MISER
Supercomputing in small spaces: low-power & power-aware supercomputing
Programming Layered Multiprocessors: a unified programming approach for layered shared-memory multiprocessors, with multithreaded or multicore execution components
MELISSES: Continuous hardware monitors for power-performance adaptation schemes on layered parallel architectures
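Power-aware work like the projects above (and the Green500 list mentioned later) centers on an efficiency metric: sustained performance per watt. A small sketch with made-up numbers shows how systems would be ranked by MFLOPS/W:

```python
# Illustrative sketch: ranking systems by energy efficiency (MFLOPS/W),
# the metric popularized by the Green500 list. All figures are invented
# for illustration, not measurements of any real system.
systems = [
    {"name": "ClusterA", "gflops": 12000.0, "watts": 240000.0},
    {"name": "ClusterB", "gflops": 5500.0,  "watts": 80000.0},
    {"name": "ClusterC", "gflops": 20300.0, "watts": 500000.0},
]

def mflops_per_watt(s):
    # 1 GFLOPS = 1000 MFLOPS; efficiency = sustained MFLOPS / total watts
    return s["gflops"] * 1000.0 / s["watts"]

ranked = sorted(systems, key=mflops_per_watt, reverse=True)
for s in ranked:
    print(f"{s['name']}: {mflops_per_watt(s):.1f} MFLOPS/W")
```

Note that the fastest system (ClusterC here) need not be the most efficient; decoupling the two is exactly why a separate efficiency ranking is useful.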
Top: A framework for flexible, high-level instrumentation of binaries
DyniX: A framework for combined static/dynamic analysis of Java code
déjà vu: Transparent checkpointing and recovery for parallel applications
Weaves: Runtime system for adaptive compositional codes
High-performance networking: architecture, protocols, performance (modeling, evaluation, auto-tuning) in system-area & wide-area networks
Open Network Emulator: Integrated environment for simulation and direct-code execution of network protocols
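Transparent checkpointers such as déjà vu capture full process state without application changes. The toy below shows only the underlying idea at application level, assuming a solver whose state is an explicit dictionary: save the state mid-run, then restart from the saved point instead of from scratch.

```python
# Minimal application-level checkpoint/restart sketch. Real transparent
# checkpointing (as in deja vu) captures full process state; this analogy
# saves and restores only an explicit, hypothetical solver state.
import io
import pickle

def run(steps, state=None, checkpoint_at=None, store=None):
    """Iterate; optionally dump state into `store` at `checkpoint_at`."""
    state = state or {"i": 0, "acc": 0.0}
    while state["i"] < steps:
        state["acc"] += state["i"]      # stand-in for real work
        state["i"] += 1
        if checkpoint_at is not None and state["i"] == checkpoint_at:
            store.write(pickle.dumps(state))  # take a checkpoint
    return state

store = io.BytesIO()
full = run(10, checkpoint_at=5, store=store)

# Simulated failure after step 5: restart from the checkpoint, not step 0.
restored = pickle.loads(store.getvalue())
resumed = run(10, state=restored)
print(full == resumed)  # -> True: restart reproduces the full run
```

The recovery run repeats none of the first five steps, which is the payoff for long-running parallel jobs where a node failure would otherwise discard days of computation.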
Surrogate approximation: mathematical construction of functional approximations using sparse data in high dimensions, with ultimate application to multidisciplinary design optimization (MDO)
Robust design optimization: solving optimization problems with stochastic variables and constraints.
pDIRECT: massively parallel direct search algorithms for global optimization
Mathematical software for terascale machines: scalable algorithms for polynomial systems of equations, global optimization, MDO, and interpolatory approximation.
mpiBLAST: high-performance bioinformatics
Stochastic modeling: parameter estimation for stochastic cell cycle models
Remote sensing: parallel algorithms for remote sensing applications
WBCSim: a problem solving environment for wood based composites manufacturing processes.
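The DIRECT family of algorithms that pDIRECT parallelizes performs derivative-free global search by subdividing promising regions of the domain. The following is a much-simplified one-dimensional toy in that spirit (real DIRECT uses Lipschitz-based "potentially optimal" interval selection, not a plain sort):

```python
# Much-simplified derivative-free global search on [lo, hi]: repeatedly
# trisect the interval with the best midpoint value. A 1-D toy loosely
# in the spirit of DIRECT, not the actual pDIRECT algorithm.
def direct_1d(f, lo, hi, iterations=40):
    mid = (lo + hi) / 2.0
    intervals = [(f(mid), lo, hi)]          # (f(midpoint), lo, hi)
    for _ in range(iterations):
        intervals.sort()                    # most promising first
        fm, a, b = intervals.pop(0)
        third = (b - a) / 3.0
        for (x, y) in ((a, a + third), (a + third, b - third), (b - third, b)):
            m = (x + y) / 2.0
            intervals.append((f(m), x, y))  # refine into three children
    intervals.sort()
    fm, a, b = intervals[0]
    return (a + b) / 2.0, fm

# Minimize (x - 2)^2 + 1 over [-10, 10]; the search homes in on x = 2.
x, fx = direct_1d(lambda x: (x - 2.0) ** 2 + 1.0, -10.0, 10.0)
print(x, fx)
```

Because each iteration is an independent batch of function evaluations, schemes like this parallelize naturally, which is what makes a massively parallel variant such as pDIRECT attractive for expensive objective functions.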
System X: 2200 processor PowerPC cluster with Infiniband interconnect
Anantham: 400 processor Opteron Cluster with Myrinet interconnect
Several 8-32 processor research clusters
12-processor SGI Altix shared-memory system
8-processor AMD Opteron shared-memory system
16-core AMD Opteron shared-memory system
16-node PlayStation 3 cluster
Currently building a 2400-core x86 cluster for research in power-aware computing, programming models, and fault tolerance.
[Chart: annual values, 2005-2008, scale 0-70]
[Chart: annual values, 2006-2008, scale 0-4,500,000]
CHECS Outreach Training
◦ Summer FDI on parallel computation: 2005, 2006, 2007. Average attendance of 15-20 faculty plus graduate students.
◦ Offered 6-hour short courses on MPI and OpenMP parallel programming to graduate students each semester; average attendance of 15.
Anantham: approximately 500,000 jobs were run in 2007 alone; the vast majority by CoE users, including students working with Andrew Duggleby (ME), Walt O'Brien (ME), and David Cox (ChemE).
Visitors: Reed, Fowler, Munoz, Dongarra
Chair of System X allocation committee
Developing a senior-level course in Computational Science & Engineering, to be cross-listed with ESM
CHECS HPC Consulting 2006-07
Don Leo (ME), Mark Stremler (ESM), Walt O'Brien (ME), Ron Kriz (ESM), Diana Farkas (MSE), Madhav Marathe (CS), Alexey Onufriev (CS), Amadeu Sum (ChemE)
Jack Lesko (CEE), Linsey Marr (CEE), Romesh Batra (ESM), Chris Wyatt (ECE), Andrew Duggleby (ME), Jimmy Martin (CEE), Yili Liu (ECE), Yu Wang (MSE)
Chris Roy (AOE), Joe Wang (AOE), M. Von Spakofsky (ME), Andrew Kurdila (ME), Danesh Tafti (ME), A. Bouguettaya (CS), David Cox (ChemE), Ishwar Puri (ESM)
Industrial Impact
Developed an industrial affiliates program modeled on MPRG.
◦ In negotiations with Merrill Lynch to become the first affiliate.
Two venture-funded startups originated from CHECS.
Created the Green500 list, which ranks the most energy-efficient supercomputers.
5 NSF CAREER awards
2 DOE CAREER awards
3 IBM Faculty awards
Dean's Award for Excellence in Research
2 VT Faculty Fellows
Best Paper Award at PPoPP
Won Storage Challenge at Supercomputing 2007
2 faculty in HPCWire list of People to Watch in Supercomputing
1 faculty in MIT TR100 list of young researchers