What’s New in Visual C++
DESCRIPTION
Demand for C++ developers and the language's popularity remain high, as recent media coverage shows, and there is a reason for that: of the many languages and platforms available to developers today, C++ offers unmatched power and performance, enabling innovation in areas such as action games, natural user interfaces, and augmented reality. In this talk you’ll see the new features and technologies coming with Visual C++ vNext, which help you build compelling applications with a renewed developer experience. Don’t miss it!
TRANSCRIPT
What’s new in Visual C++ 11
Jim Hogg
Program Manager, Visual C++, Microsoft
Agenda
• Why C++?
• Performance: CPUs and GPUs
• Baseline: Single-CPU / Multi-CPU Demo
• Vector CPU Demo
• GPU: C++ AMP Demo
• ISO C++ 11
• ALM (Application Lifecycle Management)
Why C++? : Power & Performance
“The going word at Facebook is that ‘reasonably written C++ code just runs fast,’ which underscores the enormous effort spent at optimizing PHP and Java code. Paradoxically, C++ code is more difficult to write than in other languages, but efficient code is a lot easier.” – Andrei Alexandrescu
• Power (Perf/W): driver at all scales – on-die, mobile, desktop, datacenter
• Size (Perf/T): limits on processor resources – desktop, mobile
• Experiences (Perf/C): bigger experiences on smaller hardware; pushing the envelope means every cycle matters
CPU vs. GPU today
CPU
• Low memory bandwidth
• Higher power consumption
• Medium level of parallelism
• Deep execution pipelines
• Random accesses
• Supports general code
• Mainstream programming
GPU
• High memory bandwidth
• Lower power consumption
• High level of parallelism
• Shallow execution pipelines
• Sequential accesses
• Supports data-parallel code
• Niche programming
images source: AMD
NBody Simulation, CPU (novec)
Vector Processors (CPU)
[Diagram: a CPU core with a scalar unit (SCLR UNIT) and a vector unit (VECT UNIT)]
Vector Processors – How they work
SCALAR
ADD RAX, RBX
    RAX:  1.10
    RBX:  1.20
    RAX:  2.30   (result)
for (int i = 0; i < 1000; ++i) a[i] += b[i];

VECTOR
ADDPS XMM1, XMM2
    XMM1: 1.10 2.10 3.10 4.10
    XMM2: 1.20 2.20 3.20 4.20
    XMM1: 2.30 4.30 6.30 8.30   (result)
for (int i = 0; i < 1000; i += 4) a[i : i+3] += b[i : i+3];
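The a[i : i+3] notation above is pseudocode. As a rough sketch of what the vector form can look like when written by hand, the following uses the standard SSE intrinsics from <xmmintrin.h> (x86 only). The function name add_arrays_sse is hypothetical, and this is one possible hand-written equivalent, not the code the compiler generates.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Adds b into a, four floats at a time, then finishes any leftover
// elements with ordinary scalar adds. add_arrays_sse is a hypothetical name.
void add_arrays_sse(float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           // load a[i..i+3] (unaligned load)
        __m128 vb = _mm_loadu_ps(b + i);           // load b[i..i+3]
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));  // one ADDPS adds four lanes
    }
    for (; i < n; ++i) a[i] += b[i];               // scalar remainder
}
```

The scalar tail loop handles lengths that are not a multiple of four, which the pseudocode above glosses over.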
Vector Processors (CPU)
[Diagram: four scalar units (SCLR UNIT) alongside four vector units (VECT UNIT)]
Compiler Enhancements
• Auto-vectorizer
  – Automatically vectorizes loops
  – Uses SIMD instructions
  – ON by default
    for (i = 0; i < 1024; i++) a[i] = b[i] * c[i];
    for (i = 0; i < 1024; i += 4) a[i:i+3] = b[i:i+3] * c[i:i+3];
• Auto-parallelization
  – Reorganizes the loop to run on multiple threads
  – /Qpar
  – Optional #pragma loop
    #pragma loop(hint_parallel(N))
    for (i = 0; i < 1024; i++) a[i] = b[i] * c[i];
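A minimal sketch of how such a loop might sit in a complete function. The name multiply_arrays and the thread count 4 are assumptions for illustration; the pragma is guarded so that other compilers simply skip it.

```cpp
#include <cassert>
#include <vector>

// Element-wise product a[i] = b[i] * c[i]. multiply_arrays is a hypothetical name.
// The iterations are independent of each other, which is what lets a compiler
// vectorize or parallelize the loop.
std::vector<float> multiply_arrays(const std::vector<float>& b,
                                   const std::vector<float>& c) {
    assert(b.size() == c.size());
    std::vector<float> a(b.size());
    const int n = static_cast<int>(b.size());
#if defined(_MSC_VER)
    // MSVC-only hint, used when compiling with /Qpar; 4 is an assumed thread count.
#pragma loop(hint_parallel(4))
#endif
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] * c[i];
    }
    return a;
}
```

Whether the loop is actually parallelized remains the compiler's decision; the pragma is only a hint.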
Multi-Core Machines (w/ Vectorization)
[Diagram: four cores, each with a scalar unit (SCLR UNIT) and a vector unit (VECT UNIT)]
NBody Simulation, CPU (Auto Vectorize + Parallelize)
The Big Picture – Vectorization
Source Code
int A[20000];
int B[20000];
int C[20000];
for (i=0; i<20000; i++) {
    A[i] = B[i] + C[i];
}

Assembly of Body
$LL3@foo:
    mov     ecx, DWORD PTR ?C@@3PAHA[eax*4]
    mov     edx, DWORD PTR ?B@@3PAHA[eax*4]
    add     ecx, edx
    mov     DWORD PTR ?A@@3PAHA[eax*4], ecx
    inc     eax
    cmp     eax, esi
    jl      SHORT $LL3@foo

Transformation
int A[20000];
int B[20000];
int C[20000];
for (i=0; i<20000; i+=4) {
    A[i:i+3] = B[i:i+3] + C[i:i+3];
}

Assembly of Body
$LL3@foo:
    movdqu  xmm1, XMMWORD PTR ?C@@3PAHA[eax*4]
    movdqu  xmm0, XMMWORD PTR ?B@@3PAHA[eax*4]
    paddd   xmm1, xmm0
    movdqu  XMMWORD PTR ?A@@3PAHA[eax*4], xmm1
    add     eax, 4
    cmp     eax, ecx
    jl      SHORT $LL3@foo

Dev11 /O2: 400% Speedup!
Not Your Grandfather’s Vectorizer
for (k = 1; k <= M; k++) {
    mc[k] = mpp[k-1] + tpmm[k-1];
    if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
    if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
    if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
    mc[k] += ms[k];
    if (mc[k] < -INFTY) mc[k] = -INFTY;

    dc[k] = dc[k-1] + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
    if (dc[k] < -INFTY) dc[k] = -INFTY;

    if (k < M) {
        ic[k] = mpp[k] + tpmi[k];
        if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
        ic[k] += is[k];
        if (ic[k] < -INFTY) ic[k] = -INFTY;
    }
}

for (k = 1; k <= M; k++) {
    dc[k] = dc[k-1] + tpdd[k-1];
    if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
    if (dc[k] < -INFTY) dc[k] = -INFTY;
}

for (k = 1; k <= M; k++) {
    if (k < M) {
        ic[k] = mpp[k] + tpmi[k];
        if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
        ic[k] += is[k];
        if (ic[k] < -INFTY) ic[k] = -INFTY;
    }
}

for (k = 1; k < M; k++) {
    ic[k] = mpp[k] + tpmi[k];
    if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
    ic[k] += is[k];
    if (ic[k] < -INFTY) ic[k] = -INFTY;
}
N-Body Simulation (GPU)
The Power of Heterogeneous Computing
• 146X: Interactive visualization of volumetric white matter connectivity
• 36X: Ionic placement for molecular dynamics simulation on GPU
• 19X: Transcoding HD video stream to H.264
• 17X: Simulation in Matlab using .mex file CUDA function
• 100X: Astrophysics N-body simulation
• 149X: Financial simulation of LIBOR model with swaptions
• 47X: GLAME@lab: An M-script API for linear algebra operations on GPU
• 20X: Ultrasound medical imaging for cancer diagnostics
• 24X: Highly optimized object-oriented molecular dynamics
• 30X: Cmatch exact string matching to find similar proteins and gene sequences
C++ AMP
• Part of Visual C++
• Visual Studio integration
• STL-like library for multidimensional data
• Builds on Direct3D
performance, portability, productivity
Hello World: Array Addition
void AddArrays(int n, int * pA, int * pB, int * pC)
{
    for (int i=0; i<n; i++)
    {
        pC[i] = pA[i] + pB[i];
    }
}

#include <amp.h>
using namespace concurrency;

void AddArrays(int* a, int* b, int* c, int N)
{
    array_view<int,1> va(N, a);
    array_view<int,1> vb(N, b);
    array_view<int,1> vc(N, c);
    parallel_for_each(
        va.grid,
        [=](index<1> i) restrict(direct3d)
        {
            va[i] = vb[i] + vc[i];
        }
    );
}

void AddArrays(int* a, int* b, int* c, int N)
{
    for (int i = 0; i < N; ++i)
    {
        a[i] = b[i] + c[i];
    }
}
Basic Elements of C++ AMP coding
void AddArrays(int* a, int* b, int* c, int N)
{
    array_view<int,1> va(N, a);
    array_view<int,1> vb(N, b);
    array_view<int,1> vc(N, c);
    parallel_for_each(
        va.grid,
        [=](index<1> i) restrict(direct3d)
        {
            va[i] = vb[i] + vc[i];
        }
    );
}
• array_view: wraps the data to operate on the accelerator
• array_view variables are captured and the associated data is copied to the accelerator (on demand)
• parallel_for_each: executes the lambda on the accelerator once per thread
• grid: the number and shape of threads to execute the lambda
• index: the thread ID that is running the lambda, used to index into data
• restrict(direct3d): tells the compiler to check that this code can execute on Direct3D hardware (aka accelerator)
Achieving maximum performance gains
• Schedule threads in tiles
• Avoid thread index remapping
• Gain ability to use tile static memory
[Diagram: two grids of threads with rows 0–7 and columns 0–5, one partitioned with g.tile<2,2>() and one with g.tile<4,3>()]
array_view<int,2> data(8, 6, p_my_data);
parallel_for_each(
    data.grid.tile<2,2>(),
    [=] (tiled_index<2,2> t_idx) … { … });
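To make the tiling idea concrete without GPU hardware, here is a plain C++ sketch of the arithmetic behind a tiled index: a global thread position is split into a tile coordinate and a position inside that tile. TiledPosition and decompose are hypothetical names for illustration and are not part of the C++ AMP API.

```cpp
#include <cassert>

// Hypothetical plain-C++ stand-in for the information a tiled index carries;
// these names are illustrative, not the C++ AMP API.
struct TiledPosition {
    int tile_row, tile_col;    // which tile the thread belongs to
    int local_row, local_col;  // position of the thread inside its tile
};

// Splits a global (row, col) thread position into tile and local coordinates
// for tiles of TileRows x TileCols threads.
template <int TileRows, int TileCols>
TiledPosition decompose(int row, int col) {
    static_assert(TileRows > 0 && TileCols > 0, "tile dimensions must be positive");
    assert(row >= 0 && col >= 0);
    return TiledPosition{row / TileRows, col / TileCols,
                         row % TileRows, col % TileCols};
}
```

For example, decompose<2, 2>(5, 3) places the thread at row 5, column 3 in tile (2, 1) at local position (1, 1), while decompose<4, 3>(5, 3) places the same thread in tile (1, 1) at local position (1, 0).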
C++ AMP at a Glance
• restrict(direct3d, cpu)
• parallel_for_each
• class array<T,N>
• class array_view<T,N>
• class index<N>
• class extent<N>, grid<N>
• class accelerator
• class accelerator_view
• tile_static storage class
• class tiled_grid< , , >
• class tiled_index< , , >
• class tile_barrier
Visual Studio/C++ AMP
• Organize
• Edit
• Design
• Build
• Browse
• Debug
• Profile
C++ AMP Parallel Debugger
• Well known Visual Studio debugging features
  – Launch, Attach, Break, Stepping, Breakpoints, DataTips
  – Tool windows: Processes, Debug Output, Modules, Disassembly, Call Stack, Memory, Registers, Locals, Watch, Quick Watch
• New features (for both CPU and GPU)
  – Parallel Stacks window, Parallel Watch window, Barrier
• New GPU-specific
  – Emulator, GPU Threads window, race detection
Summary
• Democratization of parallel hardware programmability
• Performance for the mainstream
• High-level abstractions in C++ (not C)
• State-of-the-art Visual Studio IDE
• Hardware abstraction platform
• C++ AMP now published as open specification
  – http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf
Modern C++: Clean, Safe and Fast
circle* p = new circle( 42 );
vector<shape*> v = load_shapes();
for( vector<shape*>::iterator i = v.begin(); i != v.end(); ++i ) {
    if( *i && **i == *p )
        cout << **i << " is a match\n";
}
for( vector<shape*>::iterator i = v.begin(); i != v.end(); ++i ) {
    delete *i;
}
delete p;

auto p = make_shared<circle>( 42 );
vector<shared_ptr<shape>> vw = load_shapes();
for_each( begin(vw), end(vw), [&]( shared_ptr<shape>& s ) {
    if( s && *s == *p )
        cout << *s << " is a match\n";
} );

Then
• T*
• new
• for/while/do
• not exception-safe: missing try/catch, __try/__finally
Now
• shared_ptr<T>
• make_shared
• no need for “delete”: automatic lifetime management, exception-safe
• std:: algorithms
• [&] lambda functions
• auto type deduction
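The "Now" style can be sketched as a small self-contained program fragment. The types shape and circle, the body of load_shapes, and the helper count_matches are all hypothetical stand-ins, since the slide does not define them.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical minimal types; the slide does not show shape, circle, or load_shapes.
struct shape {
    explicit shape(int size) : size(size) {}
    virtual ~shape() = default;
    int size;
};
bool operator==(const shape& a, const shape& b) { return a.size == b.size; }

struct circle : shape {
    explicit circle(int radius) : shape(radius) {}
};

// Stand-in for load_shapes(): returns owned shapes with no raw new/delete.
std::vector<std::shared_ptr<shape>> load_shapes() {
    return { std::make_shared<circle>(7), std::make_shared<circle>(42), nullptr };
}

// Counts the non-null shapes equal to *target using an algorithm plus a lambda.
int count_matches(const std::vector<std::shared_ptr<shape>>& shapes,
                  const std::shared_ptr<shape>& target) {
    assert(target != nullptr);
    return static_cast<int>(std::count_if(
        shapes.begin(), shapes.end(),
        [&](const std::shared_ptr<shape>& s) { return s && *s == *target; }));
}
```

No delete appears anywhere: every shape is released when the last shared_ptr referring to it goes away, including when an exception unwinds the stack.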
C++ 11 Language Features in Visual Studio
C++11 Core Language Features          VC10      VC11
rvalue references                     v2.0      v2.1*
auto                                  v1.0      v1.0
decltype                              v1.0      v1.1**
static_assert                         Yes       Yes
trailing return types                 Yes       Yes
lambdas                               v1.0      v1.1
nullptr                               Yes       Yes
strongly typed enums                  Partial   Yes
forward declared enums                No        Yes
standard-layout and trivial types     No        Yes
atomics                               No        Yes
strong compare and exchange           No        Yes
bidirectional fences                  No        Yes
data-dependency ordering              No        Yes
rvalue refs
struct Car {
    string make;   // eg "Volvo"
    int when;      // last-serviced, eg 201103 => March 2011
};
void workOnClone(Car c);      // work on a clone of my car; not returned
void inspect(const Car& c);   // inspect, but don't alter, my car
void fix(Car& c);             // fix and return my car
void replace(Car&& c);        // take my car and cannibalize it; I won't be using it again
// note that && is not a ref-to-ref (unlike **)
// enables "move semantics" and "perfect forwarding"
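As a sketch of what "take my car and cannibalize it" can mean in practice, the following moves a Car into a container instead of copying it. Scrapyard is a hypothetical type introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Car {
    std::string make;  // e.g. "Volvo"
    int when;          // last serviced, e.g. 201103 for March 2011
};

// Hypothetical scrapyard that takes ownership of cars handed to it.
struct Scrapyard {
    std::vector<Car> cars;

    // Binds only to rvalues: the caller promises not to use the car again,
    // so its contents can be moved rather than copied.
    void replace(Car&& c) {
        assert(!c.make.empty());
        cars.push_back(std::move(c));
    }
};
```

A caller hands over a named car with yard.replace(std::move(mine)), or passes a temporary directly; passing a named Car without std::move does not compile, which is the point of the && parameter.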
auto
for (std::map<string, vector<double>>::const_iterator iter = m.cbegin(); iter != m.cend(); ++iter)
for (auto iter = m.cbegin(); iter != m.cend(); ++iter)
const auto * p = new MyClass;   // "add back" qualifiers to auto's inferred type
const auto & r = s;             // "add back" qualifiers to auto's inferred type
auto a1 = new auto(42);         // infers int*
auto * a2 = new auto(42);       // beware: also infers int*
Notes:
• static type inference!
• like C# “var”
• may break old code: old auto specifies allocation within current stack frame
int n = 42;
double pi = 3.14159;
auto x = n * pi;                // will infer type of x is double
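Because auto is resolved at compile time, the deduced types can be checked with static_assert. The sketch below does that for the examples above; sum_all and check_deduction are hypothetical helper names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Returns the sum of all values in the map; auto replaces the long iterator type.
double sum_all(const std::map<std::string, std::vector<double>>& m) {
    double total = 0.0;
    for (auto iter = m.cbegin(); iter != m.cend(); ++iter) {
        for (auto value : iter->second) total += value;
    }
    return total;
}

// Deduction happens at compile time, so it can be checked with static_assert.
void check_deduction() {
    int n = 42;
    double pi = 3.14159;
    auto x = n * pi;  // int * double -> double
    static_assert(std::is_same<decltype(x), double>::value, "x should be double");
    const auto& r = n;  // qualifiers added back around the deduced int
    static_assert(std::is_same<decltype(r), const int&>::value, "r should be const int&");
    assert(x > 131.0 && x < 132.0);  // the product is computed in double
    assert(r == 42);
}
```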
decltype
decltype(new C) c = new C;   // c is a C*
// Note: first "new C" is not executed
std::vector<int>::const_iterator iter1; // a long type name
decltype(iter1) iter2; // iter2 has same type as iter1
static_assert
pre-processor-time
#if VERSION < 8
#error "Need version 8 or higher"
#endif

compile-time
static_assert (FeetPerMile > 5200 && FeetPerMile < 6100, "FeetPerMile is wrong");
template<class T> struct S {
    static_assert(sizeof(T) < sizeof(int), "T is too big");
    static_assert(std::is_unsigned<T>::value, "S needs an unsigned type");
    …
};

run-time
bool done(float g1, float g2, float tol) {
    assert (tol < 1.0e-3);
    …
}
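A compilable sketch of the compile-time and run-time checks side by side. The value 5280, the member value, and the body of done are assumptions added so the fragment is complete; the slide elides them.

```cpp
#include <cassert>
#include <type_traits>

const int FeetPerMile = 5280;  // assumed value; the slide only shows the check
static_assert(FeetPerMile > 5200 && FeetPerMile < 6100, "FeetPerMile is wrong");

// Accepts only unsigned types narrower than int; anything else fails to compile.
template <class T>
struct S {
    static_assert(sizeof(T) < sizeof(int), "T is too big");
    static_assert(std::is_unsigned<T>::value, "S needs an unsigned type");
    T value;  // assumed member, added so S can be instantiated and used
};

// Run-time counterpart: the condition depends on a value known only when called.
// The convergence test itself is an assumption; the slide elides the body.
bool done(float g1, float g2, float tol) {
    assert(tol < 1.0e-3f);
    float diff = g1 - g2;
    return (diff < 0 ? -diff : diff) < tol;
}
```

S<unsigned char> compiles, while S<int> or S<signed char> would stop the build with the corresponding message.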
Trailing-Return-Type
template<class A, class B> ??? adder(A &a, B &b) { return a + b; } // no!
template<class A, class B> decltype(a + b) adder(A &a, B &b) { return a + b; }// no!
template<class A, class B> auto adder(A &a, B &b) -> decltype(a + b) { return a + b; } // yes!
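The third form works because the parameters a and b are already in scope when decltype(a + b) is evaluated. A small sketch, using const references (an assumption, so that temporaries and literals can be passed):

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The return type is written after the parameter list, where a and b are in scope.
template <class A, class B>
auto adder(const A& a, const B& b) -> decltype(a + b) {
    return a + b;
}

// int + double is deduced as double at compile time.
static_assert(std::is_same<decltype(adder(1, 2.5)), double>::value,
              "int + double should yield double");
```

The same template also covers non-arithmetic types: adder(std::string("a"), std::string("b")) returns a std::string.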
lambdas – functions with no name
[ ] ( ) -> int { return 42; } ;                 // no arguments
[ ] (int n) -> int { return n * n; } ;          // one argument
[ ] (int a, int b) -> int { return a + b; } ;   // two arguments
for_each(v.begin(), v.end(), [ ] (int n) { cout << n << " "; });   // one-liner
float f1 = integrate ( golden, 0.0, 1.0 );
float f2 = integrate ( [ ] (float x ) { return x * x + x - 1; }, 0.0, 1.0 );
[ ] { cout << "hi"; }   // can omit ( ) if no parameters
                        // can omit -> return-type if inferable
[ capture-clause ] ( parameter-list ) -> return-type { body }   // grammar
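The slide calls integrate and golden without defining them. The sketch below supplies hypothetical definitions (a midpoint-rule integrator and a quadratic named golden) to show that a named function and a lambda are passed in exactly the same way.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical integrate(): midpoint-rule approximation of the integral of f
// over [lo, hi]. The slide calls integrate() but does not show its definition.
template <class F>
double integrate(F f, double lo, double hi, int steps = 1000) {
    assert(steps > 0);
    const double width = (hi - lo) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        sum += f(lo + (i + 0.5) * width);  // sample the middle of each slice
    }
    return sum * width;
}

// Hypothetical named function, passed the same way as a lambda.
double golden(double x) { return x * x - x - 1; }
```

With these definitions, integrate([](double x) { return x * x + x - 1; }, 0.0, 1.0) approximates -1/6, the exact integral of that polynomial over [0, 1].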
Strongly-Typed Enums
enum Heights {SHORT, TALL};             // ok
enum Widths {BYTE, SHORT, INT, LONG};   // clash
Illegal: members must be globally unique

Use enum class
enum class Heights {SHORT, TALL};
enum class Widths {BYTE, SHORT, INT, LONG};   // eg: Widths::SHORT

enum Colors {RED, GREEN, BLUE};
if (GREEN == 1) cout << "GREEN == 1";           // yes!
enum Parts {ENGINE, BRAKE, CLUTCH};
if (GREEN == BRAKE) cout << "GREEN == BRAKE";   // yes!
enum members are just integers
Forward-Declared Enum Classes
enum class Colors : unsigned char; // forward declaration (underlying type must match the definition)
void fun(Colors c); // use
. . .
enum class Colors : unsigned char {RED = 3, GREEN, BLUE = 7};
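A compilable sketch of a forward-declared enum class used before its enumerators are known. The helper as_number and the second enum Parts are illustrative additions.

```cpp
#include <cassert>

enum class Colors : unsigned char;  // forward declaration; underlying type must match

// Usable before the enumerators are known, because the size is already fixed.
int as_number(Colors c) { return static_cast<int>(c); }

enum class Colors : unsigned char { RED = 3, GREEN, BLUE = 7 };
enum class Parts { ENGINE, BRAKE, CLUTCH };

static_assert(sizeof(Colors) == sizeof(unsigned char), "Colors is stored in one byte");
// Colors::GREEN == Parts::BRAKE would not compile: scoped enums do not
// convert implicitly to int or to each other.
```

Converting to an integer now takes an explicit static_cast, so accidental comparisons such as GREEN == BRAKE are rejected by the compiler.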
nullptr
// the NULL hack:
int* p1 = 0;    // value of 0 is 'special'
int* p2 = 42;   // illegal

void f (int* p) { cout << p; }
f(0);           // works

void f (int n) { cout << n; }
f(0);           // works

void f (int n) { cout << n; }
void f (int* p) { cout << p; }
f(0);           // which one?
f(nullptr);     // calls f(int*)

decltype(nullptr) == nullptr_t
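The "which one?" question can be answered by a small experiment. In this sketch the two overloads return a label instead of printing, which is a change from the slide made only so the result can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Two overloads that report which one was chosen.
std::string f(int)  { return "f(int)"; }
std::string f(int*) { return "f(int*)"; }

// nullptr has its own type, which converts to pointer types but not to int.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");
```

Here f(0) selects f(int), because the literal 0 is an int first and a null pointer constant second, while f(nullptr) selects f(int*).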
Memory Model – Scary Terminology
• Dekker’s algorithm
• Double-checked locking
• Weak memory consistency
• Atomics
• Memory fences/barriers
• Volatile
• Sequential consistency
• Acquire/Release semantics
• Axiomatic definition & litmus tests
Dekker’s Algorithm
Thread 0:
    flag[0] := true
    while flag[1] = true {
        if turn ≠ 0 {
            flag[0] := false
            while turn ≠ 0 { }
            flag[0] := true
        }
    }
    // critical section
    turn := 1
    flag[0] := false

Thread 1:
    flag[1] := true
    while flag[0] = true {
        if turn ≠ 1 {
            flag[1] := false
            while turn ≠ 1 { }
            flag[1] := true
        }
    }
    // critical section
    turn := 0
    flag[1] := false
[Diagram: two processors (Proc), each with its own store buffer, sharing Memory and a global Lock]
http://www.cl.cam.ac.uk/~pes20/weakmemory/x86tso-paper.tphols.pdf
• Each proc has a FIFO store buffer (SB)
• Reads read from the local SB
• Read bypassing
• MFENCE flushes the SB
• A LOCK’d instruction acquires the Lock (eg: XCHG)
• A write in the SB may reach memory at any time the Lock is not held
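Store buffers matter for Dekker-style algorithms because each thread writes its own flag and then reads the other's, and both reads may complete before either write leaves its buffer. The sketch below is the classic store-buffering litmus test written with std::atomic; the function names are hypothetical. With sequentially consistent operations, the outcome in which both threads read 0 is forbidden by the C++11 memory model.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test: each thread writes its own flag, then reads the
// other's. Returns true if both threads read 0, the outcome that store buffers
// allow for plain or relaxed accesses and that breaks Dekker-style algorithms.
bool both_read_zero_once() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test; with seq_cst operations the count must stay at zero.
int count_both_zero(int runs) {
    assert(runs >= 0);
    int count = 0;
    for (int i = 0; i < runs; ++i) {
        if (both_read_zero_once()) ++count;
    }
    return count;
}
```

Weakening both operations to std::memory_order_relaxed would permit the both-zero outcome, although whether it is actually observed depends on the hardware and on timing.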
C++ Libraries (VS)
• STL
  – C++ 11 conformant
  – Support for new headers in VS vNext
  – <atomic>, <filesystem>, <thread> (others)
• PPL
  – Parallel Algorithms
  – Task-based programming model
  – Agents and Messaging: express dataflow pipelines
  – Concurrency-safe containers
ALM (Application Lifecycle Management)
• 2010 features Updated
• Architecture Tools
• Dependency Diagrams
• Architecture Explorer
• Unit Testing
• Native Unit Test Framework
• Manage and Run tests in VS and Test Manager
• Lightweight Requirements
• Agile Planning Tools
• Stakeholder Feedback
• Context Switching
• Code Review
• Exploratory Testing
• Additional new C++ features
• New ALM features in vNext
Code Understanding
demo
Q&A
Participate in C++ development user research
Microsoft Developer Division Design Research
Sign up online at http://bit.ly/cppdeveloper