
Page 1: Neue Möglichkeiten schaffen Europäische Intel Forschung ...invasic.informatik.uni-erlangen.de/publications/Hoppe2012.pdf · IVCI Projekte – Markerless Performance Capture Prof

Creating New Opportunities: European Intel Research for Visual Computing, Exascale and Parallel Computing

Hans-Christian Hoppe, Principal Engineer

Director, ExaCluster Lab Jülich

Intel Datacenter and Connected Systems Group

Page 2

Legal Disclaimer

Today’s presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially.

NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2012, Intel Corporation. All rights reserved.

Page 3

Outline

• Intel research in Europe

– Intel Labs Europe

• Visual Computing

– Intel Visual Computing Institute, Saarbrücken

• Parallel computing

– Concurrent Collections and parallel JavaScript

• HPC and Exascale

– The Exascale challenge

– Intel European Exascale Labs

Page 4

INTEL RESEARCH IN EUROPE

Page 5

Intel Corporation: The World’s Largest Semiconductor Manufacturer

• Leading Manufacturer of Computer, Networking & Communications Products

• 168 Sites and 578 Buildings in 63 Countries

• $54B in Annual Revenues from Customers in Over 120 Countries

• 25 Consecutive Years of Positive Net Income

• Over 100,000 Employees

• 80,000 technical roles; 10,400 MSc, 5,200 PhD and 4,000 MBA holders

• One of the Top Ten Most Valuable Brands in the World for 11 Consecutive Years

• Ranked #46 on Fortune’s 100 Best Companies to Work For List

• Invests $100 Million Each Year in Education Across More than 70 Countries

• The Single-Largest Voluntary Purchaser of Green Power in the United States

• More than One Million Hours of Volunteer Service in Our Communities in 2011

Page 6

Intel Labs Delivering Breakthrough Technologies to Fuel Intel’s Growth

Strong Research Partnerships: Universities, Government, Industry

World Class Research:

• Si Photonics & Wireless

• User Experience & Interaction

• Processing & Programming

• Energy & Sustainability

• Security & Virtualization

• … and much more!

Page 7

Intel Labs Europe Innovation and Research

Page 8

ILE Network

ExaCluster Lab Jülich

Germany Microprocessor Lab

Intel Visual Computing Institute

Page 9

VISUAL COMPUTING

Page 10

Intel Visual Computing Institute

• Cooperation between Intel Labs, Saarland University, DFKI, the Max Planck Institute for Informatics and the Max Planck Institute for Software Systems

• Co-funding of research projects

• Currently 50 researchers, 19 projects

• Next RfP planned for mid-2012

• Open to all research partners in Europe

http://www.intel-vci.uni-saarland.de/

Page 11

IVCI Projects – Interactive 3D Web Content Prof. Philipp Slusallek, DFKI

• XML3D Scene Description – XHTML5 extension

– Built into the browser

– Supports Web APIs (DOM*, CSS, JS**)

• XFLOW Workflow Description – Parallel data processing

– Fully programmable pipeline

– Accessible from DOM

• Results – Firefox and Chromium demonstrators

– W3C standardization

*Document Object Model **JavaScript

www.xml3d.org

Page 12

IVCI Projects – Markerless Performance Capture Prof. Christian Theobalt, MPI Informatik

• Reconstruction of detailed human animation models

– In: Multi-view video without markers

– Out: Detailed motion, shape and appearance

– General clothing, modifiable performances, new animations

• Real-time performance capture – Fast full-body motion estimation from depth cameras

– Interaction

– Virtual mirror

Page 13

Embree Photo-Realistic Ray Tracing Kernels Dr. Manfred Ernst, Intel

What is Embree?

– The fastest ray tracing kernels for Intel® CPUs

– A photo-realistic renderer for demo purposes

Integrating Embree

– Replaces a small component of ISV code

– Has a large performance impact

Software stack (top to bottom):

– Professional graphics application (ISV code): CAD, digital content creation, visualization, movie production

– Rendering engine (ISV code): distributed ray tracing, path tracing, photon mapping, …

– Embree ray tracing kernels (Embree): fast acceleration structure build and traversal
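Embree's kernels are Intel's own code, and what follows is not Embree's API. As a hedged illustration of the kind of innermost operation such kernels accelerate, here is a minimal sketch of a ray/triangle intersection test using the well-known Möller–Trumbore method; the names Vec3 and intersect_triangle are hypothetical. A production kernel would run such tests only on candidates found by traversing an acceleration structure, and would vectorize them.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Möller–Trumbore ray/triangle test: returns the ray parameter t of the hit
// (hit point = orig + t * dir), or std::nullopt if the ray misses.
std::optional<double> intersect_triangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
  assert(dot(dir, dir) > 0.0 && "ray direction must be non-zero");
  const double eps = 1e-12;
  const Vec3 e1 = sub(v1, v0);
  const Vec3 e2 = sub(v2, v0);
  const Vec3 p = cross(dir, e2);
  const double det = dot(e1, p);
  if (std::fabs(det) < eps) return std::nullopt;  // ray parallel to the triangle plane
  const double inv_det = 1.0 / det;
  const Vec3 s = sub(orig, v0);
  const double u = dot(s, p) * inv_det;            // first barycentric coordinate
  if (u < 0.0 || u > 1.0) return std::nullopt;
  const Vec3 q = cross(s, e1);
  const double v = dot(dir, q) * inv_det;          // second barycentric coordinate
  if (v < 0.0 || u + v > 1.0) return std::nullopt;
  const double t = dot(e2, q) * inv_det;           // ray parameter of the hit point
  if (t <= eps) return std::nullopt;               // intersection behind the ray origin
  return t;
}
```

For a ray starting at (0.25, 0.25, -1) pointing along +z, the unit triangle in the z = 0 plane is hit at t = 1; the same ray shifted to (2, 2, -1) misses.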

Page 15

PARALLEL COMPUTING

Page 16

Intel Software Development Tools

Choice of parallel programming models

• OpenMP* – Pragma-based approach to threading, geared towards array-dominated processing

• Intel® Cilk™ Plus – C/C++ language extensions that enable simple, task-based programming

– New: comprehensive notation for array operations simplifies SIMD programming

• Intel® Threading Building Blocks – Generic implementations of parallel performance patterns, concurrent data structures, sync primitives and scalable memory allocators

• Intel® MPI Library – Highly scalable message-passing library for distributed memory systems, with excellent performance and adaptability

Coming up

• OpenCL* for explicit offload programming

• Offload pragmas for MIC

http://software.intel.com/en-us/intel-sdp-home/

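// OpenMP: distribute the loop iterations across threads
#pragma omp parallel for
for (i=0; i<N; i++)
    Foo(array[i]);

// Cilk Plus: sort the lower part in a spawned task, the upper part here, then join
cilk_spawn qsort(begin, middle);
qsort(middle+1, end);
cilk_sync;

// TBB: concurrent container, scalable allocator, parallel algorithm
tbb::concurrent_queue<int> q;
MyType<char, tbb::tbb_allocator<char>> data;
tbb::parallel_sort(data.begin(), data.end());

// MPI: point-to-point send and a sum reduction (MPI_Reduce takes a root rank; 0 assumed here)
MPI_Send(&a[0], count, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
MPI_Reduce(&b, &sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

// Cilk Plus array notation: element-wise operations; the assignment to a[:] is masked by mask[:]
if (mask[:])
    a[:] = b[:] + m[:];
m[:] = foo(a[:]);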
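The cilk_spawn/cilk_sync lines above express fork-join parallelism. As a hedged sketch, the same pattern can be written with only the C++ standard library (C++14 or later): std::async plays the role of cilk_spawn, and waiting on the returned future plays the role of cilk_sync. The function name parallel_qsort and the depth cutoff, which bounds how many tasks are spawned, are illustrative assumptions; this is not how the Cilk Plus runtime schedules work.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Fork-join quicksort over random-access iterators.
template <typename It>
void parallel_qsort(It begin, It end, int depth = 4) {
  if (end - begin < 2) return;
  auto pivot = *(begin + (end - begin) / 2);  // copy of the middle element
  // Three-way partition: [begin, mid1) < pivot, [mid1, mid2) == pivot, [mid2, end) > pivot.
  It mid1 = std::partition(begin, end, [&](const auto& x) { return x < pivot; });
  It mid2 = std::partition(mid1, end, [&](const auto& x) { return !(pivot < x); });
  if (depth > 0) {
    // "cilk_spawn": sort the lower part in another task.
    auto lower = std::async(std::launch::async, [=] { parallel_qsort(begin, mid1, depth - 1); });
    parallel_qsort(mid2, end, depth - 1);  // upper part in the current thread
    lower.get();                           // "cilk_sync": wait for the spawned task
  } else {
    parallel_qsort(begin, mid1, 0);        // below the cutoff, recurse sequentially
    parallel_qsort(mid2, end, 0);
  }
}
```

The two recursive calls work on disjoint ranges, so no locking is needed; the cutoff keeps the number of concurrent tasks at most 2^depth - 1.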

Page 17

Concurrent Collections Frank Schlimbach, Intel

Conventional (distributed memory) models require

• Specification of which operations must run in parallel

• Mapping of these to hardware resources

For CnC, all that’s necessary is

• Specification of the semantic ordering constraints

[Diagram: the two kinds of CnC relations. Producer – Consumer: compute step step1 produces a data item that is consumed by compute step step2. Controller – Controllee: compute step step1 produces a control tag that controls compute step step2.]
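The producer/consumer and controller/controllee relations can be mimicked in a few lines of standard C++. This is a minimal, sequential sketch under stated assumptions, not the Intel CnC API: ItemCollection, Scheduler and NotReady are hypothetical names. Items are single-assignment, prescribing a step stands in for putting a control tag, and a step whose input item does not exist yet is re-queued, so only the data dependencies, not the order of prescription, determine execution order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Thrown by get() when the requested item has not been produced yet.
struct NotReady {};

// Single-assignment item collection: each key may be put at most once.
template <typename K, typename V>
class ItemCollection {
  std::map<K, V> items_;

 public:
  void put(const K& key, const V& value) {
    bool inserted = items_.emplace(key, value).second;
    assert(inserted && "CnC-style items are single-assignment");
    (void)inserted;
  }
  const V& get(const K& key) const {
    auto it = items_.find(key);
    if (it == items_.end()) throw NotReady{};
    return it->second;
  }
};

// Sequential scheduler. A step that hits a missing item is re-queued and retried later.
// Assumption: every step performs all of its gets before any put, so a retry is safe.
class Scheduler {
  std::deque<std::function<void()>> queue_;

 public:
  // Prescribing a step instance stands in for putting a control tag.
  void prescribe(std::function<void()> step) { queue_.push_back(std::move(step)); }

  void run() {
    std::size_t consecutive_failures = 0;
    while (!queue_.empty()) {
      std::function<void()> step = std::move(queue_.front());
      queue_.pop_front();
      try {
        step();
        consecutive_failures = 0;
      } catch (const NotReady&) {
        queue_.push_back(std::move(step));
        // Every pending step failed since the last success: nothing can make progress.
        if (++consecutive_failures >= queue_.size())
          throw std::runtime_error("no step can make progress");
      }
    }
  }
};
```

In a usage where the consumer steps (step2, reading squares[i-1] and squares[i]) are prescribed before the producer steps (step1, writing squares[i]), the consumers are simply retried until their inputs exist, and the results are the same as with the opposite order.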

Page 18

Concurrent Collections Frank Schlimbach, Intel

CnC is a coordination language

• Works together with a compute language (C++, Java, Python, Scala, …)

Facilitates separation of concerns

• Domain expert: knows the application domain (physics, finance, gaming) and provides the CnC domain spec of the application problem

• Tuning expert: knows the parallel programming techniques and the platform (locality, load balancing, …) and maps the CnC spec to the platform

Page 19

Concurrent Collections Frank Schlimbach, Intel

(Surprisingly) good performance

Support for distributed memory

– Data distribution orthogonal to everything else

– Can tune distributions without disturbing the rest of the code

http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc/ http://habanero.rice.edu/cnc

[Chart: execution time [sec] (0 to 6) of the SPIKE solver for problem size 1048576 with 128 super/sub diagonals and 32 partitions, comparing MKL, MKL+OMP, HTA+TBB, CnC, HTA+MPI and DistCnC]

SPIKE

• Parallel solver for banded linear systems

• Easily maps onto CnC

• Same code for shared and distributed memory
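The claim that data distribution is orthogonal to everything else can be illustrated with a hypothetical sketch: the per-partition work is written once, and only a distribution function mapping a partition tag to a process rank changes. The names block_distribution, cyclic_distribution and tags_owned_by are illustrative assumptions, not part of Intel CnC; the 32 partitions merely echo the SPIKE run above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A distribution maps a tag (e.g. a partition index) to the owning process rank.
using Distribution = std::function<int(int tag)>;

// Contiguous blocks of ceil(num_tags / num_procs) tags per process.
Distribution block_distribution(int num_tags, int num_procs) {
  assert(num_tags > 0 && num_procs > 0);
  int block = (num_tags + num_procs - 1) / num_procs;
  return [block](int tag) { return tag / block; };
}

// Round-robin assignment of tags to processes.
Distribution cyclic_distribution(int num_procs) {
  assert(num_procs > 0);
  return [num_procs](int tag) { return tag % num_procs; };
}

// "Step" code written once: collect the tags this rank works on under any distribution.
std::vector<int> tags_owned_by(int rank, int num_tags, const Distribution& owner) {
  std::vector<int> mine;
  for (int t = 0; t < num_tags; ++t)
    if (owner(t) == rank) mine.push_back(t);
  return mine;
}
```

With 32 partitions on 4 processes, rank 1 owns partitions 8 to 15 under the block distribution and 1, 5, 9, … under the cyclic one; switching between them touches only the argument passed in.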

Page 20

River Trail: Parallel JavaScript Tatiana Shpeisman, Intel

• JavaScript executes sequentially and takes no advantage of multicore hardware

• Web Workers implement a low-level, thread-based model

– Parallel programming the hard way …

• River Trail extends the JavaScript language with data parallelism

– Concurrent, independent operations on elements of parallel arrays

– Runtime system maps to parallel HW and devises schedules

– Deterministic execution, no race conditions, no deadlock

Page 21

River Trail: Parallel JavaScript Tatiana Shpeisman, Intel

• New ParallelArray data structure

– Immutable, dense, homogeneous

• Six methods provide basic skeletons for parallel computing

– Map, combine, reduce, scan, filter, scatter

• Side-effect-free elemental functions

– Compute one element of the array

https://github.com/RiverTrail/RiverTrail
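River Trail itself is a JavaScript extension. To stay with the C++ used elsewhere in this deck, the following is a hypothetical C++ analogue of an immutable, dense, homogeneous ParallelArray offering the six skeletons listed above. The method names follow the slide, but the signatures (for example, combine receiving an index and the source array, and scatter taking explicit target indices, an output size and a fill value) are assumptions for illustration, not River Trail's API. The sketch evaluates elemental functions sequentially; because they are side-effect free, a runtime would be free to evaluate them in parallel and in any order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Immutable, dense, homogeneous array; every method returns a new array.
template <typename T>
class ParallelArray {
  std::vector<T> data_;

 public:
  explicit ParallelArray(std::vector<T> d) : data_(std::move(d)) {}
  std::size_t length() const { return data_.size(); }
  const T& operator[](std::size_t i) const { return data_[i]; }

  // map: apply an elemental function to every element.
  template <typename F>
  auto map(F f) const {
    using U = std::invoke_result_t<F, const T&>;
    std::vector<U> out;
    out.reserve(data_.size());
    for (const T& x : data_) out.push_back(f(x));
    return ParallelArray<U>(std::move(out));
  }

  // combine: like map, but the elemental function receives the index and the source array.
  template <typename F>
  auto combine(F f) const {
    using U = std::invoke_result_t<F, std::size_t, const ParallelArray&>;
    std::vector<U> out;
    out.reserve(data_.size());
    for (std::size_t i = 0; i < data_.size(); ++i) out.push_back(f(i, *this));
    return ParallelArray<U>(std::move(out));
  }

  // reduce: fold all elements with an associative operator (array must be non-empty).
  template <typename Op>
  T reduce(Op op) const {
    assert(!data_.empty());
    return std::accumulate(std::next(data_.begin()), data_.end(), data_.front(), op);
  }

  // scan: inclusive prefix results under an associative operator.
  template <typename Op>
  ParallelArray scan(Op op) const {
    std::vector<T> out(data_.size());
    std::partial_sum(data_.begin(), data_.end(), out.begin(), op);
    return ParallelArray(std::move(out));
  }

  // filter: keep the elements satisfying pred, in their original order.
  template <typename P>
  ParallelArray filter(P pred) const {
    std::vector<T> out;
    std::copy_if(data_.begin(), data_.end(), std::back_inserter(out), pred);
    return ParallelArray(std::move(out));
  }

  // scatter: element i goes to position targets[i] of a new array of the given size;
  // positions that receive no element hold fill.
  ParallelArray scatter(const std::vector<std::size_t>& targets, std::size_t size, T fill) const {
    assert(targets.size() == data_.size());
    std::vector<T> out(size, fill);
    for (std::size_t i = 0; i < data_.size(); ++i) out.at(targets[i]) = data_[i];
    return ParallelArray(std::move(out));
  }
};
```

In this sketch, scatter lets the last write win when two elements target the same slot; how to resolve such conflicts is a design choice a real runtime has to make explicit.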

Page 22

HPC AND EXASCALE

Page 23

Still An Insatiable Need For Computing

[Chart: computing performance over time, 1993 to 2029, on a log scale from 100 MFlops to 1 ZFlops, with a forecast for the coming years; application examples: weather prediction, medical imaging, genomics research]

Page 24

The Challenge to Exascale Systems Starts with Power

9.89MW 8.773PF 1.1GW 151MW* $1M

* Assuming 18% Power/Perf CAGR

Source: http://www.top500.org/lists/2011/06

Approximate energy per operation today:

– Instruction execution: 5-10 pJ

– FP operation: 200 pJ

– Byte read from cache: 10-20 pJ

– Byte read from DRAM: 1.5 nJ

– Byte over IC fabric: 5 pJ/hop to 250 pJ+
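The 1.1GW figure follows from scaling the listed system linearly: 9.89MW for 8.773PF, extrapolated to 1 EFlops at unchanged energy efficiency (an assumption made here for the arithmetic). The same two numbers also give the energy the system spends per floating-point operation:

```latex
P_{1\,\mathrm{EF}} \approx 9.89\,\mathrm{MW}\times\frac{1000\,\mathrm{PFlops}}{8.773\,\mathrm{PFlops}}
  \approx 9.89\,\mathrm{MW}\times 114.0 \approx 1127\,\mathrm{MW} \approx 1.1\,\mathrm{GW}

E_{\mathrm{flop}} \approx \frac{9.89\times 10^{6}\,\mathrm{W}}{8.773\times 10^{15}\,\mathrm{Flops/s}}
  \approx 1.13\times 10^{-9}\,\mathrm{J} = 1.13\,\mathrm{nJ\ per\ Flop}
```

The 151MW* figure instead assumes the 18% power/performance CAGR stated on the slide.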

Page 25

[Figure: three panels over 1986 to 2016, marking the Giga-, Tera- and Petascale eras (and Exa in the first panel). Left: transistor performance and concurrency, log scale 1 to 1E+08, annotated 36X, 4,000X and 2.5M X. Middle: relative energy per operation, log scale 0.001 to 1, showing Vcc scaling from 5V. Right: relative transistor performance, log scale 1 to 1000, annotated 30X and 250X.]

Technologies and Solutions That Got Us to Petascale…

…Will Not Get Us To Exascale

Page 26

The Reliability and Concurrency Challenge to Exascale

[Figure, left: "Top System Concurrency Trend" from '93 to '09 on a log scale from 1E+02 to 1E+07, labelled "Extreme Parallelism". Source: Exascale Computing Study: Technology Challenges in Achieving Exascale Systems (2008).]

[Figure, right: "MTTI Measured in Minutes": count (log scale, 1E+04 to 1E+08) and MTTI in hours for 2004 to 2016, assuming 0.1 failures per socket per year, together with the time to save a global checkpoint and a marked crossover point.]
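The "MTTI measured in minutes" point follows from the slide's assumption of 0.1 failures per socket per year. A minimal sketch, additionally assuming independent failures at a constant rate (so the system failure rate is the socket count times the per-socket rate) and a hypothetical checkpoint duration for the crossover check; the function names are illustrative:

```cpp
#include <cassert>

// Hours in a 365-day year.
constexpr double kHoursPerYear = 365.0 * 24.0;

// Mean time to interrupt of the whole system, in hours. Assumes independent
// socket failures at a constant rate, so the system failure rate is
// sockets * failures_per_socket_per_year.
double mtti_hours(double sockets, double failures_per_socket_per_year) {
  assert(sockets > 0.0 && failures_per_socket_per_year > 0.0);
  return kHoursPerYear / (sockets * failures_per_socket_per_year);
}

// Past the crossover, saving one global checkpoint takes at least as long as the
// expected time between interrupts, so global checkpoint/restart alone stops being viable.
bool past_crossover(double checkpoint_hours, double system_mtti_hours) {
  return checkpoint_hours >= system_mtti_hours;
}
```

With 0.1 failures per socket per year, 10^4 sockets give an MTTI of 8.76 hours and 10^6 sockets about 5.3 minutes, so a global checkpoint that takes, say, 15 minutes (a hypothetical value) no longer fits between interrupts.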

Page 27

A Paradigm Shift Is Needed

Past Priorities

• Single-thread performance through frequency

• Programming productivity

• Architecture features for productivity

• Constraints: (1) Cost, (2) Reasonable power/energy

Future Priorities

• Throughput performance

• Parallelism

• Power/energy

• Architecture features for energy

• Simplicity

• Constraints: (1) Programming productivity, (2) Cost/reliability

Page 28

Extreme Voltage Scaling

Source: Intel

[Figure: measurements for 65nm CMOS at 50 °C versus supply voltage (0.2 to 1.4 V). Left: energy efficiency (GOPS/Watt) and average leakage power (mW), annotated 9.6x and 320mV, with the subthreshold region marked. Right: maximum frequency (MHz) and total power (mW), with 320mV marked.]

Page 29

Re-think System Level Memory Architecture

• New levels of memory hierarchy

• Emerging memory technologies

• Minimize data movement across hierarchy

• Innovative packaging and IO solutions

Page 30

Dealing with Parallelism and Locality Challenges

Exascale Software Study: Software Challenges in Extreme Scale Systems

• Programming: today’s mindset needs to change

• Software: limitations across the entire SW stack

• Architecture: better mapping of code onto architecture

Page 31

Intel European Exascale Labs

Strong Commitment To Advance Computing Leading Edge: Intel collaborating with HPC community & European researchers

4 labs in Europe - Exascale computing is the central topic


• ExaScale Computing Research Lab, Paris: performance and scalability of Exascale applications; tools for performance characterization

• ExaScience Lab, Leuven: space weather prediction; architectural simulation; scalable kernels and RT

• Intel and BSC Exascale Lab, Barcelona: scalable RTS and tools; new algorithms

• ExaCluster Lab, Jülich: Exascale cluster scalability and reliability

http://www.exascale-labs.eu

Page 32

Intel European Exascale Labs

Role

• Understand requirements for Exascale apps

• Provide feedback to Intel HW architects

• Provide guidance to application developers

• Build Exascale HW and SW prototypes

• Contribute to European and national projects

Status

• Started 2010/2011 as co-design centers

• With leading European HPC R&D organizations

• In total ~70 researchers

• Joint R&D program with partners

• Part of Intel Labs Europe network with >1,500 R&D professionals

Page 33

Jülich ExaCluster Laboratory

SW Scalability and Resilience

Exascale Cluster Architecture

Exascale Simulation and Tools

The DEEP Architecture

Page 34

Belgium: Flanders ExaScience Lab

• Application Frameworks

• Architectural Simulations

• Visualization

• Exascale Methodologies

• Space-Weather Prediction

Partners: Katholieke Universiteit Leuven, Universiteit Gent, Vrije Universiteit Brussel, Universiteit Antwerpen, Universiteit Hasselt


Page 36

Barcelona: Intel and BSC Exascale Lab

• Scalable Run-time System

• New Algorithms

• Scalable Performance Tools
