overview of parallel development - ericnel
DESCRIPTION
Overview of the new parallel capabilities in .NET 4.0 plus a swift look at Axum (a new language) and Concurrent Basic (a research project)
TRANSCRIPT
Overview of Parallel Development
Visual Studio 2010 + a little on Axum and Concurrent Basic
Eric [email protected]
http://geekswithblogs.net/iupdateable
http://blogs.msdn.com/goto100
http://twitter.com/ericnel
http://msdn.microsoft.com/uk/flash
Editor of the UK MSDN Flash
MSDN Flash eBook: 13 of the “Best Technical Articles of 2008”
http://bit.ly/flashebook1
MSDN Flash Podcast Pilot
For feedback: http://bit.ly/flashpod1
Technical Authors wanted for the Flash – 400 to 500 words. Fancy it?
Microsoft UK MSDN Flash Newsletter
Every two weeks, pure joy enters your Inbox
Poll
Tip
Technical Article
Editorial
Fresh Discoveries
UK Events
Agenda
Overview of what we are up to
Drill down into parallel programming for managed developers
If we have time, “heads up” on Axum and CB
Things I learnt...
We have a very large investment in parallel computing
We have “something for everyone”
It is not all synced, it is sometimes overlapping
It is a big topic
Managed vs native vs client vs server vs task vs data...
Even with the investment, design/code/test for parallel is far harder
Locking, Deadlocks, Livelocks
It is about getting ready for the future
Code today – run better tomorrow?
VS2010 CTP – not a great place for parallel
Single core in guest
Unsupported route to use Hyper-V
Easiest route to dabble – Microsoft Parallel Extensions June CTP for VS2008
Buying a new Processor
£100 - £300
2-3GHz
2 cores or 4
64-bit
Buying a new Processor
£200 - £500
2-3GHz
4 cores with HT
64-bit
QuickPath Interconnect
Memory Controller
Where will it all end?
Was it a wise purchase?
Windows OS
App 1 App 2 ...
App 1
.NET CLR
.NET Framework
My Code
Was it a wise purchase?
Some environments scale to take advantage of additional CPU cores (mostly server-side)
A lot of code does not (mostly client-side)
This code will see little benefit from future hardware advances
ASP.NET Web Forms/Services WCF Services WF Engine ...
.NET ThreadPool or Custom Threading Strategy
What happened to “The Free Lunch”?
Bad sequential code will run faster on a faster processor
[Chart: Speedup (0 to 3) at 1, 2, 4, 8, 16 and 32 on the horizontal axis]
Just using parallel code is not enough
Bad parallel code WILL NOT run faster on more cores
[Chart: Parallel Speedup (0 to 64) against Cores (0 to 64) for the workloads listed below]
Production Fluid
Production Face
Production Cloth
Game Fluid
Game Rigid Body
Game Cloth
Marching Cubes
Sports Video Analysis
Video Cast Indexing
Home Video Editing
Text Indexing
Ray Tracing
Foreground Estimation
Human Body Tracker
Portfolio Management
Geometric Mean
Graphics Rendering – Physical Simulation -- Vision – Data Mining -- Analytics
Applications Can Scale Well
Multithreaded programming is “hard” today
Doable by only a subgroup of senior specialists
Parallel patterns are not prevalent, well known, nor easy to implement
So many potential problems
Races, deadlocks, livelocks, lock convoys, cache coherency overheads, lost event notifications, broken serializability, priority inversion, and so on…
Businesses have little desire to “go deep”
Best developers should focus on business value, not concurrency
Need simple ways to allow all developers to write concurrent code
What's The Problem?
Example: Matrix Multiplication
void MatrixMult(int size, double** m1, double** m2, double** result)
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            result[i][j] = 0;
            for (int k = 0; k < size; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}
Manual Parallel Solution
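// Assumes <windows.h> and <thread> are included and NUMPROCS is defined elsewhere.
void MatrixMult(int size, double** m1, double** m2, double** result)
{
    int N = size;
    int P = 2 * NUMPROCS;
    int Chunk = N / P;
    HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    volatile long counter = P;  // number of threads still running
    for (int c = 0; c < P; c++) {
        std::thread t([&, c] {
            // The last chunk also takes any leftover rows.
            for (int i = c * Chunk; i < (c + 1 == P ? N : (c + 1) * Chunk); i++) {
                for (int j = 0; j < size; j++) {
                    result[i][j] = 0;
                    for (int k = 0; k < size; k++) {
                        result[i][j] += m1[i][k] * m2[k][j];
                    }
                }
            }
            // The last thread to finish signals the event.
            if (InterlockedDecrement(&counter) == 0)
                SetEvent(hEvent);
        });
        t.detach();  // completion is tracked through hEvent, not join()
    }
    WaitForSingleObject(hEvent, INFINITE);
    CloseHandle(hEvent);
}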
Synchronization Knowledge
Error prone
Heavy synchronization
Static partitioning
Lack of thread reuse
Tricks
Lots of boilerplate
Microsoft Parallel Computing Technologies
• Robotics-based manufacturing assembly line
• Silverlight Olympics viewer
• Enterprise search, OLTP, collab
• Animation / CGI rendering
• Weather forecasting
• Seismic monitoring
• Oil exploration
• Automotive control system
• Internet-based photo services
• Ultrasound imaging equipment
• Media encode/decode
• Image processing / enhancement
• Data visualization
Task Concurrency
Data Parallelism
Distributed/Cloud Computing
Local Computing
CCR
Maestro aka Axum
TPL / PPL
Cluster TPL
Cluster PLINQ
MPI / MPI.Net
WCF
Cluster SOA
WF
PLINQ
TPL / PPL
CDS
OpenMP
WF
Compute Shader
Visual Studio 2010
Tools / Programming Models / Runtimes
Parallel Pattern Library
Resource Manager
Task Scheduler
Task Parallel Library
PLINQ
Key: Managed Library, Native Library
Operating System
Threads
Concurrency Runtime
Programming Models
Agents Library
ThreadPool
Task Scheduler
Resource Manager
Data Structures
Data Structures
Integrated Tooling
Tools
Parallel Debugger Tool
Profiler Concurrency Analysis
Programming Models
Concurrency Runtime
Explicit Tasking Support
.NET 4.0 Task Parallel Library
Task, TaskFactory
Parallel.For
Parallel.ForEach
Parallel.Invoke
Concurrent data structures
Visual Studio 2010 C++
Parallel Pattern Library
task, task_group
parallel_for
parallel_for_each
parallel_invoke
Concurrent data structures
Primitives for message passing
User-mode locks
Task Parallel Library ( TPL )
Task
No Threading to Threading to Tasks
demo
Program Thread
CLR Thread Pool
User Mode Scheduler
Global Queue
Worker Thread 1
Worker Thread p
…
CLR Thread Pool: Work-Stealing
Worker Thread 1
Worker Thread p
…
Program Thread
User Mode Scheduler For Tasks
Global Queue
Local Queue
Local Queue
…
Task 1 Task 2 Task 3
Task 5 Task 4 Task 6
Tasks revisited
More on Tasks
demo
Debugger Support
Support both managed and native
1. Parallel Tasks
2. Parallel Stacks
Higher Level Constructs
Even with Task there are common patterns that build into higher level abstractions
The Parallel class
Invoke, For, For<T>, ForEach
Care needs to be taken with state, ordering
“This is not your Father’s for loop”
Parallel
Parallel.ForEach
Parallel.Invoke
demo
Declarative Data Parallelism
Parallel LINQ-to-Objects (PLINQ)
Enables LINQ devs to leverage multiple cores
Fully supports all .NET standard query operators
Minimal impact to existing LINQ model

var q = from p in people.AsParallel()   // .AsParallel() opts the query in to PLINQ
        where p.Name == queryInfo.Name &&
              p.State == queryInfo.State &&
              p.Year >= yearStart &&
              p.Year <= yearEnd
        orderby p.Year ascending
        select p;
Parallel LINQ
demo
Example: "Baby Names"
IEnumerable<BabyInfo> babies = ...;
var results = new List<BabyInfo>();
foreach (var baby in babies)
{
    if (baby.Name == queryName &&
        baby.State == queryState &&
        baby.Year >= yearStart &&
        baby.Year <= yearEnd)
    {
        results.Add(baby);
    }
}
results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
Manual Naïve Parallel Solution
IEnumerable<BabyInfo> babies = …;
var results = new List<BabyInfo>();
int partitionsCount = Environment.ProcessorCount;
int remainingCount = partitionsCount;
var enumerator = babies.GetEnumerator();
try
{
    using (var done = new ManualResetEvent(false))
    {
        for (int i = 0; i < partitionsCount; i++)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                while (true)
                {
                    BabyInfo baby;
                    lock (enumerator)
                    {
                        if (!enumerator.MoveNext()) break;
                        baby = enumerator.Current;
                    }
                    if (baby.Name == queryName &&
                        baby.State == queryState &&
                        baby.Year >= yearStart &&
                        baby.Year <= yearEnd)
                    {
                        lock (results) results.Add(baby);
                    }
                }
                if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
            });
        }
        done.WaitOne();
        results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
    }
}
finally
{
    if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose();
}
LINQ Solution
var results = from baby in babies.AsParallel()   // .AsParallel() is the only change from plain LINQ
              where baby.Name == queryName &&
                    baby.State == queryState &&
                    baby.Year >= yearStart &&
                    baby.Year <= yearEnd
              orderby baby.Year ascending
              select baby;
Coordination Data Structures
Thread-safe collections: ConcurrentStack<T> ...
Locks: SpinLock, SpinWait, SemaphoreSlim ...
Work Exchange: BlockingCollection<T> ...
Phased Operation: CountdownEvent ...
Coordination Data Structures
demo
What Next?
http://geekswithblogs.net/iupdateable (slides and links)
http://blogs.msdn.com/pfxteam/
http://msdn.com/concurrency
Wait for the Beta of Visual Studio 2010
OR, for the most impatient:
Download VS 2010 CTP
Remember to set the clock back
Or
Download Parallel Extensions June 2008 CTP for VS2008
Appendix
Heads up: Axum
Previously called Maestro
Incubation project!
New programming language
Lets you take advantage of parallelism without “thinking about it”
Agent based programming vs Object based programming
Model agents and their interactions via messages
No public methods, fields
Axum “Hello World”
using System;

agent Program : Microsoft.Axum.ConsoleApplication
{
    override int Run(String[] args)
    {
        Console.WriteLine("Hello, World!");
    }
}
Channels and Agents
using System;
using System.Concurrency;
using Microsoft.Axum;

channel Adder
{
    input int Num1;
    input int Num2;
    output int Sum;
}

agent AdderAgent : channel Adder
{
    public AdderAgent()
    {
        int result = receive(PrimaryChannel::Num1) + receive(PrimaryChannel::Num2);
        PrimaryChannel::Sum <-- result;
    }
}

agent MainAgent : channel Microsoft.Axum.Application
{
    public MainAgent()
    {
        var adder = AdderAgent.CreateInNewDomain();
        adder::Num1 <-- 10;
        adder::Num2 <-- 20;
        // do something useful ...
        var sum = receive(adder::Sum);
        Console.WriteLine(sum);
        PrimaryChannel::ExitCode <-- 0;
    }
}
Heads up: Concurrent Basic
Research Project
http://channel9.msdn.com/shows/Going+Deep/Claudio-Russo-and-Lucian-Wischik-Inside-Concurrent-Basic/
Added message passing primitives – channels
Module Buffer
    Public Asynchronous Put(ByVal s As String)
    Public Synchronous Take() As String
    Private Function CaseTakeAndPut(ByVal s As String) As String When Take, Put
        Return s
    End Function
End Module

Thread1: Put("Hello")
Thread2: result = Take()