BioPerf is a suite of representative applications that we have assembled from the computational biology community, with codes carefully selected to span a breadth of algorithms and performance characteristics. We have analyzed the complexity of these codes at the instruction and memory level, using “live” and aggregate data, on a contemporary high-performance architecture (Apple G5 with the IBM PowerPC 970) and on Mambo, the IBM cycle-accurate simulator that was previously used to design supercomputers such as the IBM p-Series and BlueGene and is currently being used to model future systems. Our work is novel in that it is one of the first efforts to incorporate life science application performance into the optimization of high-end computer system architectures. Through dual-platform performance analysis, we offer system design parameters for machine configurations that may improve the performance of these codes. We intend this suite to help both biologists and computer scientists evaluate systems running bioinformatics applications.

Incorporating Life Science Applications into the Architectural Optimizations of Next-Generation Petaflops Systems

Introduction

Performance Analysis

BioPerf

David A. Bader, Georgia Institute of Technology

Vipin Sachdeva, University of New Mexico

The following codes are included in BioPerf:

Package     Executable
BLAST       blastn, blastp
FASTA       fasta34_t, ssearch34_t
CLUSTALW    clustalw, clustalw_smp
HMMER       hmmsearch, hmmpfam
T-COFFEE    tcoffee
GLIMMER     glimmer, glimmer-package
PHYLIP      dnapenny, promlk
GRAPPA      grappa
CE          ce
PREDATOR    predator

[Figure: BLAST, hmmpfam, and tcoffee performance graphs (from left to right). Each panel plots instructions completed per cycle together with L1 data cache misses or miss rate.]

Instructions completed per cycle increase over the same intervals in which the L1 data miss rate decreases. We can thus correlate the varying performance of the application with the system metrics that affect it.
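As a minimal sketch of how such a relationship could be quantified, the Python snippet below computes the Pearson correlation between two sampled series. The sample values and the names `ipc` and `l1d_miss_rate` are made up for illustration; they are not measurements from BioPerf, and the poster does not state that this statistic was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-interval samples (illustrative only, not BioPerf data).
ipc = [0.8, 0.9, 1.4, 1.5, 0.7, 1.3]
l1d_miss_rate = [0.06, 0.05, 0.02, 0.02, 0.07, 0.03]

r = pearson(ipc, l1d_miss_rate)
print(f"correlation between IPC and L1D miss rate: {r:.3f}")
```

A coefficient close to -1 would indicate that intervals with a higher miss rate tend to be intervals with lower instruction throughput.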

Methodology

Performance Analysis through live data

Cumulative Metrics

Separated Regions of Performance

[Figure: ClustalW region I (top), II (bottom left) and III (bottom right), showing differences in algorithmic complexity and memory access pattern. Plotted series: INST_TYPE_ARITH and L1D_CACHE_MISS_USER.]

Clustalw’s performance can be roughly divided into three regions:
• Pairwise comparison: every sequence is compared against every other sequence using Smith-Waterman, a dynamic programming algorithm with quadratic time complexity.
• Guide tree construction: the neighbor-joining method uses the comparison scores to build a guide tree with the sequences at its leaves.
• Progressive alignment: the sequences are combined into a multiple sequence alignment according to the guide tree.
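The first region can be illustrated with a short sketch of Smith-Waterman local alignment scoring applied to every pair of sequences. This is not ClustalW's own code: the function names and the scoring parameters (`match`, `mismatch`, and a linear `gap` penalty) are assumptions chosen for illustration. The nested loops make the quadratic cost per pair visible.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the table.
    Scoring values are illustrative assumptions, not ClustalW defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_pairs_scores(seqs):
    """Score every unordered pair of sequences, as in the first region."""
    return {
        (i, j): smith_waterman_score(seqs[i], seqs[j])
        for i in range(len(seqs))
        for j in range(i + 1, len(seqs))
    }


if __name__ == "__main__":
    # Hypothetical toy sequences.
    for pair, score in all_pairs_scores(["ACGT", "TACGTA", "GGGG"]).items():
        print(pair, score)
```

For n sequences this performs n(n-1)/2 pairwise comparisons, each quadratic in sequence length, which is consistent with the first region being dominated by arithmetic on a dynamic programming table.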

Instruction Profiling

[Figure: Clustalw’s live graphs with L1 data miss rate and branch mispredicts (top left and right). Each panel plots instructions completed per cycle against one of the two metrics.]

The performance of the last phase of clustalw is more closely related to branch mispredicts than to the L1 data miss rate.

www.bioperf.org
