Tom Furlani, PhD, Center for Computational Research, University at Buffalo, SUNY

Solving the “last mile of computing problem” – developing portals to enable simulation-based science and engineering. The Role of High Performance Computation in Economic Development, Rensselaer Polytechnic Institute, October 22 - 24, 2008

TRANSCRIPT

Page 1: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Tom Furlani, PhD
Center for Computational Research
University at Buffalo, SUNY

Solving the “last mile of computing problem” – developing portals to enable simulation-based science and engineering

The Role of High Performance Computation in Economic Development

Rensselaer Polytechnic Institute, October 22 - 24, 2008

Page 2: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Outline

How Did Computation Become So Important
Bringing HPC to the Researcher’s Desktop
  Portals
  Grid Computing
  Example Portals
Research
  Center for Computational Research
  • Overview
  Understanding Protein Chemistry
  • Photoactive Yellow Protein
  Toward Petascale level calculations

Page 3: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

How did computation become critical?

Revolution in computing, storage, and networking/communication

Timeline: 1940’s, 1980’s, today

1 TB - $120.

Page 4: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Computing Revolution

Microprocessor Revolution (slide courtesy of Dan Reed, RENCI)

1890-1945: mechanical, relay; 7 year doubling
1945-1985: tube, transistor; 2.3 year doubling
1985-2005: microprocessor; 1 – 1.5 year doubling

Exponentials
  Transistor density: 2X in ~18 months (Moore’s Law)
  Graphics: 100X in 3 years
  WAN bandwidth: 64X in 2 years
  Storage: 7X in 2 years

How long would 1 hr calc today take on a PC from 1984? 24 Years!
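The growth figures on this slide can be converted into doubling times with a few lines of arithmetic. The sketch below assumes that the "24 Years!" callout is the answer to the question about the 1984 PC, and that "today" means 2008, the year of the talk. The function name and the use of a 365.25-day year are illustrative choices, not part of the original slide.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # average year length in hours (assumption)


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Doubling time implied by growing `growth_factor`-fold in `period_years`."""
    return period_years / math.log2(growth_factor)


# Assumption: a 1-hour calculation in 2008 takes 24 years on a 1984 PC,
# so the speedup is simply the ratio of the two run times.
speedup = 24 * HOURS_PER_YEAR
pc_doubling = doubling_time_years(speedup, 2008 - 1984)

# Growth rates as quoted on the slide: (factor, years).
rates = {
    "Transistor density (Moore's Law)": (2, 1.5),
    "Graphics": (100, 3),
    "WAN bandwidth": (64, 2),
    "Storage": (7, 2),
}

if __name__ == "__main__":
    print(f"Implied speedup: {speedup:,.0f}x, doubling every {pc_doubling:.2f} years")
    for name, (factor, years) in rates.items():
        print(f"{name}: doubling every {doubling_time_years(factor, years):.2f} years")
```

Under these assumptions the implied PC doubling time falls inside the 1 – 1.5 year range quoted for the microprocessor era.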

Page 5: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

The Storage Revolution

Megabyte
  5 MB: complete works of Shakespeare
Terabyte: 1,000,000 MB – ~$120 today
  The text in 1 million books
  Entire U.S. Library of Congress is 10 TB of text
  50,000 trees made into paper and printed
  Large Hadron Collider Experiment – 15 TB/day
Petabyte: 1000 terabytes
  20 million four-drawer filing cabinets full of text
The Data Tsunami - Many sources
  Agricultural, Medical, Environmental, Engineering, Financial
Why so much data?
  More sensors – higher resolution
  Faster/cheaper storage capability
  Faster processors – generate more data!
The challenge: extracting insight!
  Without being overwhelmed

Page 6: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Advanced Networking

Networks are the 21st century interstate highway system
  Expertise and information - the real product
  Removes the barriers of time and space

Figure: Eisenhower Interstate System and National Lambda Rail Network

Page 7: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Enabling SBES for Non-Experts

Bringing HPC to the desktop
  Analogous to impact of Windows vs DOS for PCs
  • Brought computing/internet to the home
Many users need periodic, but infrequent access
  Experiment driven
Ease of use is key
  Shouldn’t need to know about OS, compilers, queuing system, etc.
  GUI, web-based, access anywhere
How do we get there?
  Focus on development of portals, custom software and tools, data models, GUIs, etc.
  Provide training on the use of these tools
  Ex: nanoHUB – one stop resource for nanotechnology

Page 8: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

“Old School” Computing

Requires: input file, VPN software, secure shell software, Unix commands, PBS format and syntax, PBS commands

Use VPN to access network
Secure login to front-end machine
Create subdirectory
Upload input data file (secure file transfer)
Identify keywords for model
Add keywords to input file (edit input file)
Create PBS script file (edit file)
  Application command line
  Set number of processors
  Set path and variables
  Set runtime and queue
Submit job to queue
Monitor job
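To make the "create PBS script file" step concrete, here is a minimal sketch of the kind of file a user would otherwise write by hand and that a portal could generate automatically. The function name, its parameters, the queue name `batch`, and the application command `my_app` are hypothetical. The `#PBS` directives shown (`-N`, `-l nodes=...:ppn=...`, `-l walltime=...`, `-q`) follow common PBS/Torque usage, and a given site's scheduler may expect a different resource syntax.

```python
def render_pbs_script(job_name, command, nodes=1, procs_per_node=8,
                      walltime="03:00:00", queue="batch", env=None):
    """Return the text of a PBS batch script for one job (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                          # job name
        f"#PBS -l nodes={nodes}:ppn={procs_per_node}",  # number of processors
        f"#PBS -l walltime={walltime}",                 # run time limit
        f"#PBS -q {queue}",                             # target queue (hypothetical name)
        "",
        "cd $PBS_O_WORKDIR",                            # start in the submission directory
    ]
    # Set path and variables needed by the application.
    for name, value in (env or {}).items():
        lines.append(f"export {name}={value}")
    # Application command line.
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_pbs_script("pyp_snapshot_001", "my_app input.dat",
                            env={"SCRATCH": "/tmp/scratch"}))
```

The resulting text would be saved to a file and handed to the scheduler, typically with `qsub`. A portal hides exactly this step behind a "run job" button.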

Page 9: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Portal Driven Computing

Requires: input file

Open browser
Secure login to web portal
Upload input data file
Select model and run job
Monitor job
View output in browser

Page 10: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

What is an Application Portal?

No consistent definition
Web-based
  On-line simulation from your browser
  Simulation typically doesn’t run on your PC
Doesn’t have to be grid enabled
WebMO
  Computational Chemistry Portal
nanoHUB
  Web-based resource for research, education and collaboration in nanotechnology
  Includes application portals (tools)

Page 11: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Portal Basics

Remote access to simulations and compute power

Diagram labels: Remote Desktop, Internet, Authentication, Application Server (ccr.buffalo.edu), Run Simulation, Export Display

Page 12: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Application Portals

Benefits
  Scientists able to focus on research rather than details of computing environment
  Underlying infrastructure complexities are hidden
  Transparently integrate compute and data resources
  Moving application to a web-based interface provides ubiquitous access
  Single sign-on – Don’t have to maintain accounts on many machines
Challenges
  Requires close collaboration between domain experts and developers
  Developers must be aware of and hide underlying complexity
  Must be easy to use (web-based, GUI)
  Must provide full application functionality

Page 13: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Grid Enabling Applications

Why Needed
  Scientists require an ever growing amount of compute and storage resources
  Experiments may have requirements beyond the capabilities of a single data center
  Datasets are growing at a tremendous rate
Grid Computing
  Provides infrastructure for data and job management
  Handles authentication of users across administrative and political domains
  Provides monitoring of resources and user jobs
  Allows researchers to harness the power of multiple datacenters for large experiments
  Provides reusable interface to commonly used functions: job status, job submission, file management
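A reusable interface to job status, job submission, and file management could look roughly like the sketch below. Every class and method name here is hypothetical and is not taken from any grid toolkit. The in-memory backend only exists to exercise the interface and does not run anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from itertools import count


class JobService(ABC):
    """Hypothetical interface a portal could code against, whatever the backend."""

    @abstractmethod
    def upload(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def submit(self, application: str, input_file: str) -> int: ...

    @abstractmethod
    def status(self, job_id: int) -> str: ...


@dataclass
class InMemoryJobService(JobService):
    """Toy backend: stores files and job states in dictionaries."""
    files: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def upload(self, name, data):
        self.files[name] = data

    def submit(self, application, input_file):
        # Refuse jobs whose input was never uploaded.
        if input_file not in self.files:
            raise FileNotFoundError(input_file)
        job_id = next(self._ids)
        self.jobs[job_id] = "QUEUED"
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]


if __name__ == "__main__":
    service = InMemoryJobService()
    service.upload("molecule.inp", b"...")
    job = service.submit("chemistry-app", "molecule.inp")
    print(job, service.status(job))
```

The point of the design is that a portal written against `JobService` would not change if the backend were swapped from a single cluster queue to a multi-site grid.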

Page 14: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Example Portals

WebMO – Computational Chemistry
REDfly – Bioinformatics
iNquiry: Common web interface to many command-line tools
GenePattern: Scientific workflow and genomic analysis tools

Page 15: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

CCR Computational Chemistry Portal

Based on WebMO: www.webmo.net
CCR portal: webmo.ccr.buffalo.edu
Extensive QC Support
  Gaussian, GAMESS, NWChem, Q-Chem, Mopac, Molpro, Tinker
Interfaces with batch queues on U2 and several faculty clusters

Screenshot caption: CCR iNquiry Bioinformatics Portal, Glimmer page

Page 16: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Computational Chemistry Portal

Browser based login
Menu driven

Page 17: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Computational Chemistry Portal

Choose level of theory

Page 18: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Computational Chemistry Portal

View output

Page 19: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Computational Chemistry Portal

…including vibrational modes

Page 20: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Database/Portal Development

REDfly (Regulatory Element Database for Fly)

Database of transcriptional regulatory elements

Aggregates data from multiple offline & online sources

Over 2100 entries

Most comprehensive resource of curated animal regulatory elements

Fully searchable, includes DNA sequence, gene expression data, link-outs to other databases

Extensive collaboration with other online data sources using web services

Page 21: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

CCR Bioinformatics Portal

Based on iNquiry: www.bioteam.net

Web portal: inquiry.ccr.buffalo.edu

Extensive Application Support

Includes popular open-source bioinformatics packages

EMBOSS, *PHYLIP, HMMer, BLAST, MPI-BLAST, NCBI Toolkit, Glimmer, Wise2, *ClustalW, *BLAT, *FASTA

Extensible for customized application interfaces

Uses U2 Compute Cluster as Computational Engine

Page 22: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

TITAN - Modeling Geohazards

Modeling of volcanic flows, mud flows (flash flooding), and avalanches
Benefits for developers
  Developers spend too much time supporting user installations
  Support single web-based portal
  CCR supports back-end infrastructure
  Frees developers to focus on improving the models, science
Integrate information from several sources
  Simulation results
  Remote sensing
  GIS data
Web enable for remote access

Page 23: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Metrics on Demand Portal

UBMoD: Web-based Interface for On-demand Metrics
CPU cycles delivered, Storage, Queue Statistics, etc.
Role based interface (User, Faculty, Staff, Admin)
Available in open source :

Page 24: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Center for Computational Research

Under NYS Center for Excellence in Bioinformatics & Life Sciences
Moved to New Buffalo Life Sciences Complex Building
Leading Academic Supercomputing Site
Mission: “Enabling and facilitating research within the University community”
Enable Research by Providing
  High-end computing and visualization resources
  Software engineering
  Scientific computing/modeling
  Bioinformatics/computational biology
  Scientific and urban visualization
  Advanced computing systems
Industrial Outreach/Technology Transfer to WNY
Education, Outreach and Training in WNY

Page 25: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

2007 Highlights

Computational Cycles Delivered in 2007:
  224 different users submitted jobs (88 research groups)
  354,447 jobs run (almost 1000 per day)
  700,000 CPU days delivered
  200 new user accounts created
CIT/CCR Collaboration to Improve Research Computing
  Condor deployment
Portal/Tool Development
  Make machines easier to use
  • WebMO (Chemistry)
  • iNquiry (Bioinformatics)
  • UBMoD (Metrics on Demand)
Accountability
  On-line real-time metrics
UB 2020 Campus Master Planning
  3D models of all 3 campuses
NYSGrid

Page 26: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

CCR Research & Projects

Urban Simulation and Visualization
Accident Reconstruction
Risk Mitigation (GIS)
Medical Imaging
High School Workshops
Cluster Computing
Data Fusion
Groundwater Flow Modeling
Turbulence and Combustion Modeling
Molecular Structure Determination
Protein Folding Prediction
Data Mining – Digital Gov, Library
Grid Computing
Computational Chemistry
Biomedical Engineering
Bioinformatics

Page 27: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Photoactive Yellow Protein

Simple prototype of Rhodopsin family of proteins

Chromophore is located completely inside the protein pocket

Protein environment causes absorption shift from 2.70 eV (gas phase) to 2.78 eV (protein) yielding the yellow color

Page 28: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Chromophore Spectra Measured

Experimental spectra of the protein active site in vacuum, in a protein and in water solution

Provides insight into environmental effects on electronic spectra, large shift of absorption maximum

Can gauge accuracy of theory

Page 29: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Modeling the System

Combined Quantum Mechanical / Molecular Mechanical Method

System is divided into a QM part and a MM part

QM used to model “important” part of system; MM used to model remainder

The QM part includes the active site of the protein

The MM part includes the rest of the protein, as well as surrounding water molecules

Page 30: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

QM versus MM based Methods

QM Calculations

Advantages: Very accurate, based on first principles (ab initio, DFT; no empirical parameters involved), can treat bond breaking and formation

Disadvantages: Time consuming, limited to small molecular systems (~100 atoms)

MM Calculations

Advantages: Very fast, capable of treating entire proteins or solutions (~100,000 atoms)

Disadvantages: Less accurate, based on empirical parameters, not capable of modeling chemical reactions (electrons are not treated explicitly)

QM/MM

Page 31: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Why use the QM/MM Method?

Improved accuracy (QM) and faster (MM)
Model active site of proteins
  Drug-receptor binding
  Electrostatic effects
  Steric effects
Interpretation of experimental data
  Vibrational spectra
  Electronic spectra
Mechanism of enzymatic activity
  Reaction profiles
  Thermal motion effects on reactivity

Page 32: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Modeling Protein Dynamics

1. Run MM based Molecular Dynamics simulation
2. From MD simulation, randomly select protein conformations (snapshots)
3. Run QM/MM simulation for each snapshot
4. Generate results based on averages taken from snapshots

Figure axis labels: protein dynamics, time

Goal: Understand how protein thermal dynamics affects function
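Steps 2 to 4 above amount to random sampling followed by averaging. The sketch below shows that pattern only. The function `toy_property` is a placeholder that returns synthetic numbers in arbitrary units; it does not perform a QM/MM calculation, and all names and parameters are illustrative assumptions.

```python
import random
import statistics


def toy_property(snapshot_id: int) -> float:
    """Placeholder for one QM/MM result; synthetic value in arbitrary units."""
    rng = random.Random(snapshot_id)
    return rng.gauss(1.0, 0.1)


def average_over_snapshots(n_frames: int, n_snapshots: int, seed: int = 0):
    """Pick random MD frames, evaluate each, return (mean, standard deviation)."""
    rng = random.Random(seed)
    # Step 2: randomly select distinct frames from the MD trajectory.
    chosen = rng.sample(range(n_frames), n_snapshots)
    # Step 3: run the (placeholder) calculation on each snapshot.
    values = [toy_property(i) for i in chosen]
    # Step 4: report averages taken over the snapshots.
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    mean, spread = average_over_snapshots(n_frames=10_000, n_snapshots=1000)
    print(f"mean = {mean:.3f}, standard deviation = {spread:.3f}")
```

Fixing the seed makes the snapshot selection reproducible, which helps when a calculation has to be rerun or extended.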

Page 33: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Getting Results Faster

Carry out QM/MM calcs simultaneously for many snapshots (protein conformations)
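Because each snapshot is independent of the others, the calculations can be dispatched all at once. The sketch below illustrates that dispatch pattern with Python threads and a stand-in function. On a real cluster each snapshot would more likely be a separate batch job, so treat the names and the thread pool as assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def run_snapshot(snapshot_id: int) -> float:
    """Stand-in for one independent QM/MM job (hypothetical)."""
    return math.sqrt(snapshot_id)


def run_all(snapshot_ids, max_workers: int = 8):
    """Run every snapshot concurrently and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_snapshot, snapshot_ids))


if __name__ == "__main__":
    print(run_all(range(10)))
```

Since no snapshot waits on another, elapsed time is limited mainly by how many processors are available at once.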

Page 34: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

QM/MM Calc for Each Snapshot

After MD, protein snapshots are randomly selected (1000)

Full geometry optimization of the ligand inside the fixed protein matrix (Q-Chem)
  QM: DFT/B3LYP/6-31+G* (ligand)
  MM: AMBER (protein + water)

Electronic excitations (Q-Chem):
  QM: TDDFT/B3LYP/aug-cc-pVTZ (ligand)
  MM: AMBER (protein + water)
  • 4500 water molecules

Page 35: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

CPU Demand - Current Calculation

MD Simulation: 1600 CPU hours
Select 1000 Snapshots
Each Snapshot (54 CPU Hours)
  Combined QM/MM Geometry Optimization
  • 24 CPU hours (3 hours on 8 processors)
  Electronic Excitation Calc
  • 30 CPU Hours
Total for all 1000 snapshots + MD Simulation
  55,600 CPU Hours (2300 CPU Days)
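The total on this slide can be reproduced directly from the per-step figures. The variable names below are illustrative; the numbers are those quoted on the slide.

```python
MD_CPU_HOURS = 1600
N_SNAPSHOTS = 1000
GEOM_OPT_CPU_HOURS = 3 * 8      # 3 wall-clock hours on 8 processors
EXCITATION_CPU_HOURS = 30

# Cost of one snapshot: geometry optimization plus excitation calculation.
per_snapshot = GEOM_OPT_CPU_HOURS + EXCITATION_CPU_HOURS

# MD simulation plus all snapshots.
total_cpu_hours = MD_CPU_HOURS + N_SNAPSHOTS * per_snapshot
total_cpu_days = total_cpu_hours / 24

if __name__ == "__main__":
    print(per_snapshot, total_cpu_hours, round(total_cpu_days))
```

The exact figure is about 2317 CPU days, which the slide rounds to 2300.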

Page 36: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Results

Electronic Excitation   Gas-Phase (eV)   Protein (eV)            Solution (eV)
Calculated              3.07             3.31 (0.06), Δ = 0.24   3.52 (0.04), Δ = 0.45
Experiment              2.70             2.78, Δ = 0.08          3.10, Δ = 0.40

( ) - standard deviation; Δ - change relative to the gas phase

Electronic excitations of the chromophore

Page 37: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Toward Petascale Level Calc

More accurate MD simulation
  Larger water sphere (50 Å radius)
  • ~12,000 water molecules
  500 hours on 32 processors - 16,000 CPU hours
More accurate QM/MM simulations
  Larger basis set
  350 hours on 16 processors - 5600 CPU hours
Better statistics
  100,000 MD snapshots (560,000,000 CPU hours)
  2 MD simulations - 1,120,000,000 CPU hours!
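The petascale estimate follows from multiplying wall-clock hours by processor count and then by the number of snapshots. The helper name below is an illustrative choice; the inputs are the figures on the slide.

```python
def cpu_hours(wall_hours: float, processors: int) -> float:
    """CPU hours consumed by a job that holds `processors` for `wall_hours`."""
    return wall_hours * processors


md_cpu_hours = cpu_hours(500, 32)      # one larger MD simulation
qmmm_cpu_hours = cpu_hours(350, 16)    # one larger QM/MM snapshot calculation

n_snapshots = 100_000
per_md_run = n_snapshots * qmmm_cpu_hours   # QM/MM cost for one MD trajectory
two_md_runs = 2 * per_md_run                # two independent MD trajectories

if __name__ == "__main__":
    print(f"{md_cpu_hours:,.0f} {qmmm_cpu_hours:,.0f} {per_md_run:,.0f} {two_md_runs:,.0f}")
```

The slide's totals count only the QM/MM snapshots. The 16,000 CPU hours per MD run are negligible next to 560,000,000.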

Page 38: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Power of Parallel Processing

Assume a modest 4X increase in processor performance/computational efficiency over the next few years
  Reduce requirement to about 10,000,000 CPU days
  Translates to about 100 days of run time on 100,000 cores
Combined QM/MM simulations of this scale possible on petascale level hardware
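The scaling argument can be checked with a short calculation. The sketch assumes perfect parallel scaling across all cores, which is an idealization, and takes the 1,120,000,000 CPU hour total from the preceding slide. The function name is illustrative.

```python
TOTAL_CPU_HOURS = 1_120_000_000   # estimate from the preceding slide
SPEEDUP = 4                       # assumed gain in performance/efficiency
CORES = 100_000


def wall_clock_days(cpu_hours: float, speedup: float, cores: int) -> float:
    """Elapsed days, assuming the work divides evenly across `cores`."""
    cpu_days = cpu_hours / speedup / 24
    return cpu_days / cores


cpu_days_after_speedup = TOTAL_CPU_HOURS / SPEEDUP / 24
days_on_100k_cores = wall_clock_days(TOTAL_CPU_HOURS, SPEEDUP, CORES)

if __name__ == "__main__":
    print(f"{cpu_days_after_speedup:,.0f} CPU days, {days_on_100k_cores:.0f} days elapsed")
```

The unrounded values are roughly 11.7 million CPU days and 117 elapsed days, consistent with the slide's round figures of 10,000,000 and 100.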

Page 39: Tom Furlani, PhD Center for Computational Research University at Buffalo, SUNY

Acknowledgements

Portal Development
  Steve Gallo, Dr. Matt Jones, Jon Bednasz, Rob Leach
Combined QM/MM Calculations
  Dr. Marek Friendorf
Funding
  NIH