Transcript
Page 1: SimTech Cluster of Excellence - uni-stuttgart.de

Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs

Cooperation with M. Daub • G. Schneider

Extending the Scope of Approximate Computing to Scientific Computing and Simulation Technology

Dipl.-Inf. Alexander Schöll, Prof. Dr. Hans-Joachim Wunderlich


Institute of Computer Architecture and Computer Engineering

Motivation

Heterogeneous computer architectures

Goal: Efficient and fault-tolerant execution of simulation applications

Simulation on Reconfigurable Heterogeneous Architectures

Challenges

Reliability

• Simulation applications: often executed for days and months

• Modern CMOS devices: increasingly vulnerable to reliability threats

• Required: Fault-tolerant simulation algorithms

Achieving optimal performance

• Performance depends on the combination of implementation and utilized architecture


www.simtech.uni-stuttgart.de

Approximate Computing

• Trade off precision for efficiency

• Often limited to applications with inherent error tolerance

Applying approximate computing to simulation technology

• Tight accuracy constraints

• Often low error resilience

Acceleration of Markov-Chain Monte-Carlo Molecular Simulations

Cooperation with J. Castillo • J. Groß

Markov-Chain Monte-Carlo (MCMC)

• Core of many tasks in thermodynamics

• Mapping to GPU: exploiting parallel energy calculations and speculative evaluation of Monte-Carlo moves

• Heterogeneous mapping to CPU and GPU results in significant speedups
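The speculative evaluation of Monte-Carlo moves can be pictured with a minimal Python sketch. It assumes a plain Metropolis acceptance rule and a toy one-dimensional harmonic energy; neither is taken from the poster, and the names `energy`, `speculative_step` and `sample` are hypothetical. Several trial moves are proposed from the same state, their energies are evaluated independently (the part a GPU could compute in parallel), and the first accepted move is committed.

```python
import math
import random


def energy(x):
    # Toy harmonic potential (assumption; stands in for a molecular energy).
    return 0.5 * x * x


def speculative_step(x, rng, n_spec=4, step=1.0, beta=1.0):
    """Propose n_spec moves from state x and commit the first accepted one.

    Returns the chain segment this corresponds to in a sequential run:
    one copy of x per rejected move, followed by the accepted state.
    """
    # Draw all proposals and acceptance thresholds up front, so that the
    # energy evaluations do not depend on each other.
    trials = [x + rng.uniform(-step, step) for _ in range(n_spec)]
    thresholds = [rng.random() for _ in range(n_spec)]
    e_old = energy(x)
    e_new = [energy(t) for t in trials]  # independent, parallelizable work
    for i, (t, e, u) in enumerate(zip(trials, e_new, thresholds)):
        # Metropolis criterion: always accept downhill moves.
        if e <= e_old or u < math.exp(-beta * (e - e_old)):
            return [x] * i + [t]  # later speculative results are discarded
    return [x] * n_spec  # every speculative move was rejected


def sample(n_steps, seed=0):
    """Build a Markov chain of n_steps states starting at x = 0."""
    rng = random.Random(seed)
    chain = [0.0]
    while len(chain) < n_steps:
        chain.extend(speculative_step(chain[-1], rng))
    return chain[:n_steps]
```

Because all proposals start from the same state, the moves rejected before the first acceptance are exactly the ones a sequential run would have rejected, so the resulting chain follows the same statistics. Work spent on trials after the accepted one is wasted, which is the price of the speculation.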

Collaborations in SimTech

Current work

Molecular Configuration

Motivation

• Deeper understanding of the activation of apoptosis

Simulation: Dominated by extensive computing times

Goals

• Reduction of computation time

• … to obtain extensive and detailed conclusions about the clustering behavior

Computational Performance Results

• Adaptive discretization of time and heterogeneous mapping to CPU and GPU result in significant speedups
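The idea of adaptive time discretization can be sketched as follows. This is a minimal illustration that assumes explicit Euler integration with a step-doubling error estimate. The poster does not state which integrator or error control is used, and `adaptive_euler` is a hypothetical name.

```python
def adaptive_euler(f, y0, t_end, h0=0.1, tol=1e-4):
    """Integrate dy/dt = f(t, y) from t = 0 to t_end with step-size control.

    Returns the final value and the number of accepted steps.
    """
    t, y, h, steps = 0.0, y0, h0, 0
    while t_end - t > 1e-12:
        h = min(h, t_end - t)  # do not step past the end time
        # One Euler step of size h versus two steps of size h / 2.
        full = y + h * f(t, y)
        half = y + 0.5 * h * f(t, y)
        two_half = half + 0.5 * h * f(t + 0.5 * h, half)
        err = abs(two_half - full)  # local error estimate
        if err <= tol:
            # Accept the more accurate result and advance in time.
            t, y, steps = t + h, two_half, steps + 1
            if err < 0.25 * tol:
                h *= 2.0  # solution is smooth here: use a larger step
        else:
            h *= 0.5  # too inaccurate: retry with a smaller step
    return y, steps
```

For example, `adaptive_euler(lambda t, y: -y, 1.0, 1.0)` integrates an exponential decay and enlarges the step once the solution has flattened. Small steps are spent only where the solution changes quickly, which is where the savings in computation time come from.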

Biological Evaluation

Evolution of ligand-receptor clusters in less than 0.5 s

Preconditioned Conjugate Gradient (PCG): Important sparse linear system solver

• Iterative solving method
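For orientation, a textbook PCG iteration is sketched below in plain Python. The Jacobi (diagonal) preconditioner, the sparse row format and the helper names `laplacian_1d`, `spmv`, `dot` and `pcg` are assumptions made for this illustration. The poster does not specify them.

```python
def laplacian_1d(n):
    """Tridiagonal test matrix (2 on the diagonal, -1 beside it) as sparse rows."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pcg(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    # Jacobi preconditioner: inverse of the matrix diagonal.
    inv_diag = [1.0 / next(v for j, v in rows[i] if j == i) for i in range(n)]
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:  # converged: residual norm small enough
            return x, k
        ap = spmv(rows, p)
        alpha = rz / dot(p, ap)  # step length along search direction p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]  # apply preconditioner
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A small usage example: with `A = laplacian_1d(5)` and `b = spmv(A, [1.0] * 5)`, the call `pcg(A, b)` recovers a vector close to all ones. Each iteration is dominated by one sparse matrix-vector product.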

Goal: PCG on approximate hardware with guaranteed result accuracy

Challenges:

• Error resilience changes over time

• Overhead from additional operations to monitor error resilience

Solution:

• Use efficient fault tolerance to monitor and adapt approximation at runtime
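One way to picture low-overhead runtime monitoring is a checksum test on a sparse matrix-vector product. The sketch below illustrates classical algorithm-based fault tolerance under loud assumptions and is not presented as the authors' scheme. Approximate hardware is imitated by rounding results to single precision, the threshold is chosen by the caller, and all function names are hypothetical.

```python
import struct


def to_float32(v):
    # Round to single precision (assumption: stands in for approximate hardware).
    return struct.unpack("f", struct.pack("f", v))[0]


def spmv(rows, x, approximate=False):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    y = [sum(v * x[j] for j, v in row) for row in rows]
    return [to_float32(v) for v in y] if approximate else y


def column_sums(rows, n):
    """Checksum vector c with c[j] = sum of column j, computed once per matrix."""
    c = [0.0] * n
    for row in rows:
        for j, v in row:
            c[j] += v
    return c


def checked_spmv(rows, x, col_sums, approximate, threshold):
    """Compute y = A x and test the checksum identity sum(y) == c . x.

    Returns (y, kept), where kept tells whether the approximate result
    was accepted. On a mismatch above the threshold the product is
    recomputed precisely.
    """
    y = spmv(rows, x, approximate)
    expected = sum(c * xi for c, xi in zip(col_sums, x))
    mismatch = abs(sum(y) - expected)
    if approximate and mismatch > threshold:
        return spmv(rows, x, approximate=False), False  # adapt: go precise
    return y, approximate
```

The check costs one extra dot product per matrix-vector product. A loose threshold tolerates the approximation error, while a tight one forces precise execution, so adjusting the threshold over time is one conceivable way to follow a changing error resilience.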

Experimental Results

• Hardware utilization and iteration count compared to execution on precise hardware (chart: vertical axis from 50% to 130%; series: hardware utilization, iteration count)

[1] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations", in Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN'16, Toulouse, France, 28. June-1. July, 2016, pp. 251-262.

[2] A. Schöll, C. Braun, and H.-J. Wunderlich, "Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware", in Proceedings of the 29th Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'16, University of Connecticut, USA, 19.-20. September, 2016, pp. 21-26. DFTS Best Paper Award 2016.

[3] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver", in Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'15, Amherst, MA, USA, 12.-14. October, 2015, pp. 60-66.

[4] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, "Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures", in Proceedings of the 30th IEEE International Conference on Computer Design, ICCD'12, Montreal, Canada, 30. September-3. October, 2012, pp. 207-212.

[5] A. Schöll, C. Braun, M. Daub, G. Schneider, and H.-J. Wunderlich, "Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs", in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2014, Belfast, UK, 2.-5. November, 2014, pp. 424-431. SimTech Best Paper Award 2014.

Figure: Heterogeneous computer architectures combining Central Processing Units (CPU), Graphics Processing Units (GPU) and Field Programmable Gate Arrays (FPGA). Labels shown: x86-64, ARM, SPARC, Intel MIC, AMD Excavator, Intel Skylake, Nvidia Pascal, Xilinx Zynq, Xilinx Virtex, Altera Stratix.

Emerging Approximate Computing (AC) Paradigm

Trade off precision for a gain in efficiency

Required: Exploit inherent error tolerance of applications

Approximate Computing in image processing
