high-performance multithreaded producer-consumer designs – from theory to practice
DESCRIPTION
High-performance Multithreaded Producer-consumer Designs – from Theory to Practice. Bill Scherer (University of Rochester) Doug Lea (SUNY Oswego) Rochester Java Users’ Group April 11, 2006. java.util.concurrent. General purpose toolkit for developing concurrent applications - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/1.jpg)
High-performance Multithreaded Producer-
consumer Designs – from Theory to Practice
Bill Scherer (University of Rochester)
Doug Lea (SUNY Oswego)
Rochester Java Users’ Group
April 11, 2006
![Page 2: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/2.jpg)
April 11, 2006 Scherer & Lea 2
java.util.concurrent
• General purpose toolkit for developing concurrent applications– No more “reinventing the wheel”!
• Goals: “Something for Everyone!”– Make some problems trivial to solve by everyone
– Develop thread-safe classes, such as servlets, built on concurrent building blocks like ConcurrentHashMap
– Make some problems easier to solve by concurrent programmers
– Develop concurrent applications using thread pools, barriers, latches, and blocking queues
– Make some problems possible to solve by concurrency experts
– Develop custom locking classes, lock-free algorithms
![Page 3: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/3.jpg)
Overview of j.u.c• Executors
– Executor
– ExecutorService
– ScheduledExecutorService
– Callable
– Future
– ScheduledFuture
– Delayed
– CompletionService
– ThreadPoolExecutor
– ScheduledThreadPoolExecutor
– AbstractExecutorService
– Executors
– FutureTask
– ExecutorCompletionService
• Queues– BlockingQueue– ConcurrentLinkedQueue– LinkedBlockingQueue– ArrayBlockingQueue– SynchronousQueue– PriorityBlockingQueue– DelayQueue
• Concurrent Collections– ConcurrentMap
– ConcurrentHashMap
– CopyOnWriteArray{List,Set}
• Synchronizers– CountDownLatch
– Semaphore
– Exchanger
– CyclicBarrier
• Locks: java.util.concurrent.locks– Lock
– Condition
– ReadWriteLock
– AbstractQueuedSynchronizer
– LockSupport
– ReentrantLock
– ReentrantReadWriteLock
• Atomics: java.util.concurrent.atomic– Atomic[Type]
– Atomic[Type]Array
– Atomic[Type]FieldUpdater
– Atomic{Markable,Stampable}Reference
![Page 4: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/4.jpg)
April 11, 2006 Scherer & Lea 4
Key Functional Groups
• Executors, Thread pools and Futures– Execution frameworks for asynchronous tasking
• Concurrent Collections: – Queues, blocking queues, concurrent hash map, …– Data structures designed for concurrent
environments• Locks and Conditions
– More flexible synchronization control– Read/write locks
• Synchronizers: Semaphore, Latch, Barrier– Ready made tools for thread coordination
• Atomic variables– The key to writing lock-free algorithms
![Page 5: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/5.jpg)
April 11, 2006 Scherer & Lea 5
Part I: Theory
![Page 6: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/6.jpg)
April 11, 2006 Scherer & Lea 6
Synchronous Queues
• Synchronized communication channels• Producer awaits explicit ACK from
consumer• Theory and practice of concurrency
– Implementation of language synch. primitives (CSP handoff, Ada rendezvous)
– Message passing software– Java.util.concurrent.ThreadPoolExecutor
![Page 7: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/7.jpg)
April 11, 2006 Scherer & Lea 7
Hanson’s Synch. Queue
datum item;Semaphore sync(0), send(1), recv(0);
datum take() { void put(datum d) { recv.acquire(); send.acquire(); datum d = item; item = d; sync.release(); recv.release(); send.release(); sync.acquire(); return d; }}
![Page 8: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/8.jpg)
April 11, 2006 Scherer & Lea 8
Hanson’s Synch. Queue
datum item;Semaphore sync(0), send(1), recv(0);
datum take() { void put(datum d) { recv.acquire(); send.acquire(); datum d = item; item = d; sync.release(); recv.release(); send.release(); sync.acquire(); return d; }}
![Page 9: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/9.jpg)
April 11, 2006 Scherer & Lea 9
Hanson’s Queue: Limitations
• High overhead– 3 semaphore operations for put and take– Interleaved handshaking – likely to block
• No obvious path to timeout support– Needed e.g. for j.u.c.ThreadPoolExecutor
adaptive thread pool– Producer adds a worker or runs task itself– Consumer terminates if work unavailable
![Page 10: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/10.jpg)
April 11, 2006 Scherer & Lea 10
Java 5 Version
• Fastest known previous implementation• Optional FIFO fairness
– Unfair mode stack-based better locality– Big performance penalty for fair mode
• Global lock covers two queues – (stacks for unfair mode)– One each for awaiting consumers, producers– At least one always empty
![Page 11: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/11.jpg)
April 11, 2006 Scherer & Lea 11
Remainder of Part I
• Introduction
• Nonblocking Synchronization– Why use?– Nonblocking partial methods
• Synchronous Queue Design
• Conclusions
![Page 12: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/12.jpg)
April 11, 2006 Scherer & Lea 12
Nonblocking Synchronization
• Resilient to failure or delay of any thread• Optimistic update pattern:
1) Set-up operation (invisible to other threads)2) Effect all at once (atomic)3) Clean-up if needed (can be done by any thread)
• Atomic compare-and-swap (CAS)
bool CAS(word *ptr, word e, word n) { if (*ptr != e) return false; *ptr = n; return true; }
![Page 13: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/13.jpg)
April 11, 2006 Scherer & Lea 13
Why Use Nonblocking Synch?
• Locks – Performance (convoying, intolerance of
page faults and preemption)– Semantic (deadlock, priority inversion)– Conceptual (scalability vs. complexity)
• Transactional memory– Needs to support the general case– High overheads (currently)
![Page 14: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/14.jpg)
April 11, 2006 Scherer & Lea 14
Pro
gra
mm
er E
ffo
rt
System Performance
CoarseLocks
CannedNBS
FineLocks
Ad HocNBS
HW TM
Software TM (STM)
![Page 15: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/15.jpg)
April 11, 2006 Scherer & Lea 15
Linearizability [HW90]
• Gold standard for correctness
• Linearization Point where operations take place
Time flows left to right
T1: Enqueue (a)
T2: Enqueue (b)
T3: Dequeue (a)
T4: Dequeue (b)
![Page 16: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/16.jpg)
April 11, 2006 Scherer & Lea 16
Linearizability [HW90]
• Gold standard for correctness
• Linearization Point where operations take place
Time flows left to right
T1: Enqueue (a)
T2: Enqueue (b)
T3: Dequeue (b!)
T4: Dequeue (a!)
![Page 17: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/17.jpg)
April 11, 2006 Scherer & Lea 17
Partial Operations• Totalized approach: return failure• Repeat until data retrieved (“try-in-a-loop”)
– Heavy contention on data structures– Output depends on which thread retries first
T1: Dequeue (b!)
T2: Dequeue (a!)
T3: Enqueue (a)T4: Enqueue (b)
![Page 18: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/18.jpg)
April 11, 2006 Scherer & Lea 18
Dual Linearizability
T1: Dequeue (a)
T2: Dequeue (b)
T3: Enqueue (a)T4: Enqueue (b)
Break partial methods into two first-class
halves: pre-blocking reservation, post-
blocking follow-up
![Page 19: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/19.jpg)
April 11, 2006 Scherer & Lea 19
Next Up: Synchronous Queues
• Introduction
• Nonblocking Synchronization
• Synchronous Queue Design– Implementation– Performance
• Conclusions
![Page 20: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/20.jpg)
April 11, 2006 Scherer & Lea 20
Algorithmic Genealogy
Treiber’sStack
DualStack
UnfairSQ
M&SQueue
DualQueue
FairSQ
SourceAlgorithm
ConsumerBlocking
ProducerBlocking,Timeout,Cleanup
Fair mode Unfair mode
![Page 21: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/21.jpg)
April 11, 2006 Scherer & Lea 21
Algorithmic Genealogy
Treiber’sStack
DualStack
UnfairSQ
M&SQueue
DualQueue
FairSQ
SourceAlgorithm
ConsumerBlocking
ProducerBlocking,Timeout,Cleanup
Fair mode Unfair mode
![Page 22: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/22.jpg)
April 11, 2006 Scherer & Lea 22
M&S Queue: Enqueue
Queue
Dummy Data DataData Data Data
E1
E2Queue
DataData Data Data
Head Tail
TailHead
Dummy Data
![Page 23: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/23.jpg)
April 11, 2006 Scherer & Lea 23
M&S Queue: Dequeue
Queue
Dummy Data Data Data Data
Queue
OldDummy
NewDummy Data Data
D1
D2
Head
Head
Tail
Data
Tail
![Page 24: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/24.jpg)
April 11, 2006 Scherer & Lea 24
The Dual Queue
• Separate data, request nodes (flag bit)– queue always data or requests
• Same behavior as M&S queue for data• Reservations are antisymmetric to data
– dequeue enqueues a reservation node– enqueue satisfies oldest reservation
• Tricky consistency checks needed• Dummy node can be datum or reservation
– Extra state to watch out for (more corner cases)
![Page 25: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/25.jpg)
April 11, 2006 Scherer & Lea 25
Dual Queue: Enq. (Requests)
Queue
Dummy Res. Res. Res. Res.
Head Tail
E1
E2
E3
Read dummy’s next ptr
CAS reservation’s data ptr from nil to satisfying data
Update head ptr
E1
E2
E3
![Page 26: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/26.jpg)
April 11, 2006 Scherer & Lea 26
Dual Queue: Enq. (Requests)
Queue
Dummy Res. Res. Res. Res.
Head Tail
E1
E2
E3
Read dummy’s next ptr
CAS reservation’s data ptr from nil to satisfying data
Update head ptr
E3
ItemE2
![Page 27: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/27.jpg)
April 11, 2006 Scherer & Lea 27
Dual Queue: Enq. (Requests)
Queue
Res. Res. Res.
Tail
E1
E2
E3
Read dummy’s next ptr
CAS reservation’s data ptr from nil to satisfying data
Update head ptr
E3
Item
OldDummy
NewDummy
Head
![Page 28: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/28.jpg)
April 11, 2006 Scherer & Lea 28
Synchronous Queue
• Implementation extends dual queue• Consumers already block for producers
– add blocking for the “other direction”
• Add item ptr to data nodes– Consumers CAS from nil to “satisfying request”– Once non-nil, any thread can update head ptr– Timeout support
• Producer CAS from nil back to self • Node reclaimed when it reaches head of queue: seen as
fulfilled node
![Page 29: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/29.jpg)
April 11, 2006 Scherer & Lea 29
The Test Environments
• SunFire 6800– 16 UltraSparc III processors @ 1.2 GHz
• SunFire V40z– 4 AMD Opteron processors @ 2.4 GHz
• Java SE 5.0 HotSpot JVM
• Microbenchmark performance tests
![Page 30: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/30.jpg)
April 11, 2006 Scherer & Lea 30
Producer-Consumer Handoff
0
10000
20000
30000
40000
50000
60000
1 2 3 4 6 8 12 16 24 32 48 64
Pairs
ns/
tran
sfer
SynchronousQueue SynchronousQueue(fair) SynchronousQueue1.6
SynchronousQueue1.6(fair) HansonSQ
Synchronous Queue Performance
16processorSunFire
680014X difference
![Page 31: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/31.jpg)
April 11, 2006 Scherer & Lea 31
ThreadPoolExecutor Impact
16processorSunFire
6800
ThreadPoolExecutor [SPARC]
0
10000
20000
30000
40000
50000
60000
1 2 3 4 6 8 12 16 24 32 48 64
threads
ns/
task
SynchronousQueue SynchronousQueue(fair)
SynchronousQueue1.6 SynchronousQueue1.6(fair)
10X difference
![Page 32: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/32.jpg)
April 11, 2006 Scherer & Lea 32
Next Up: Conclusions
• Introduction
• Nonblocking Synchronization
• Synchronous Queue Design
• Conclusions
![Page 33: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/33.jpg)
April 11, 2006 Scherer & Lea 33
Conclusions
• Low-overhead synchronous queues
• Optional FIFO fairness– Fair mode extends dual queue– Unfair mode extends dual stack– No performance penalty
• Up to 14x performance gain in SQ– Translates to 10x gain for TPE
![Page 34: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/34.jpg)
April 11, 2006 Scherer & Lea 34
Future Work: Types of Scalability
A. Constant overhead for operations, irrespective of the number of threads
– “Low-level” – doesn’t hurt scalability of apps– Spin locks (e.g. MCS), SQ
B. Overall throughput proportional to the number of concurrent threads
– “High-level” – data structure itself– Can be obtained via elimination [ST95]– Stacks [HSY04]; queues [MNSS05]; exchangers
![Page 35: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/35.jpg)
April 11, 2006 Scherer & Lea 35
Part II: Practice
• Thread Creation Patterns
– Loops, oneway messages, workers & pools
• Executor framework
• Advanced Topics
– AbstractQueuedSynchronizer
![Page 36: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/36.jpg)
April 11, 2006 Scherer & Lea 36
Autonomous Loops
• Simple non-reactive active objects contain a run loop of form:• public void run() {
while (!Thread.interrupted()) doSomething();}
• Normally established with a constructor containing:• new Thread(this).start();
– Or by a specific start method– Perhaps also setting priority and daemon status
• Normally also support other methods called from other threads– Requires standard safety measures
• Common Applications– Animations, Simulations, Message buffer Consumers, Polling
daemons that periodically sense state of world
![Page 37: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/37.jpg)
April 11, 2006 Scherer & Lea 37
Thread Patterns for Oneway Messages
![Page 38: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/38.jpg)
April 11, 2006 Scherer & Lea 38
Thread-Per-Message Web Server class UnstableWebServer {
public static void main(String[] args) { ServerSocket socket = new ServerSocket(80); while (true) { final Socket connection = socket.accept(); Runnable r = new Runnable() { public void run() { handleRequest(connection); } }; new Thread(r).start(); } }}
• Potential resource exhaustion unless connection rate is limited– Threads aren’t free!– Don’t do this!
![Page 39: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/39.jpg)
April 11, 2006 Scherer & Lea 39
Thread-Per-Object via Worker Threads
• Establish a producer-consumer chain– Producer
Reactive method just places message in a channel Channel might be a buffer, queue, stream, etc Message might be a Runnable command, event, etc
– ConsumerHost contains an autonomous loop thread of form:while (!Thread.interrupted()) { m = channel.take(); process(m); }
• Common variants– Pools
Use more than one worker thread– Listeners
Separate producer and consumer in different objects
![Page 40: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/40.jpg)
April 11, 2006 Scherer & Lea 40
Web Server Using Worker Threadclass WebServer { BlockingQueue<Socket> queue = new LinkedBlockingQueue<Socket>();
class Worker extends Thread { public void run() { while(!Thread.interrupted()) { Socket s = queue.take(); handleRequest(s); } } }
public void start() { new Worker().start(); ServerSocket socket = new ServerSocket(80); while (true) { Socket connection = socket.accept(); queue.put(connection); } }
public static void main(String[] args) { new WebServer().start();
}}
![Page 41: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/41.jpg)
April 11, 2006 Scherer & Lea 41
Channel Options
• Unbounded queues– Can exhaust resources if clients faster than handlers
• Bounded buffers– Can cause clients to block when full
• Synchronous channels– Force client to wait for handler to complete previous task
• Leaky bounded buffers– For example, drop oldest if full
• Priority queues– Run more important tasks first
• Streams or sockets– Enable persistence, remote execution
• Non-blocking channels– Must take evasive action if put or take fail or time out
![Page 42: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/42.jpg)
April 11, 2006 Scherer & Lea 42
Thread Pools
• Use a collection of worker threads, not just one– Can limit maximum number and priorities of threads– Dynamic worker thread management
Sophisticated policy controls– Often faster than thread-per-message for I/O bound actions
![Page 43: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/43.jpg)
April 11, 2006 Scherer & Lea 43
Web Server Using Executor Thread Pool
• Executor implementations internalize the channel
class PooledWebServer { Executor pool =
Executors.newFixedThreadPool(7);
public void start() { ServerSocket socket = new ServerSocket(80); while (!Thread.interrupted()) { final Socket connection = socket.accept(); Runnable r = new Runnable() { public void run() { handleRequest(connection); } }; pool.execute(r); } }
public static void main(String[] args) { new PooledWebServer().start();
}}
![Page 44: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/44.jpg)
April 11, 2006 Scherer & Lea 44
Policies and Parameters for Thread Pools
• The kind of channel used as task queue
– Unbounded queue, bounded queue, synchronous hand-off, priority queue, ordering by task dependencies, stream, socket
• Bounding resources
– Maximum number of threads
– Minimum number of threads
– “Warm” versus on-demand threads
– Keepalive interval until idle threads dieLater replaced by new threads if necessary
• Saturation policy
– Block, drop, producer-runs, etc
• These policies and parameters can interact in subtle ways!
![Page 45: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/45.jpg)
April 11, 2006 Scherer & Lea 45
Pools in Connection-Based Designs
• For systems with many open connections (sockets), but relatively few active at any given time
• Multiplex the delegations to worker threads via polling– Requires underlying support for select/poll and nonblocking I/O– Supported in JDK1.4 java.nio
![Page 46: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/46.jpg)
April 11, 2006 Scherer & Lea 46
The Executor Framework
• Framework for asynchronous task execution• Standardize asynchronous invocation
– Framework to execute Runnable and Callable tasks– Runnable: void run()– Callable<V>: V call() throws Exception
• Separate submission from execution policy– Use anExecutor.execute(aRunnable)– Not new Thread(aRunnable).start()
• Cancellation and shutdown support• Usually created via Executors factory class
– Configures flexible ThreadPoolExecutor– Customize shutdown methods, before/after hooks, saturation
policies, queuing
![Page 47: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/47.jpg)
April 11, 2006 Scherer & Lea 47
Executor
• Decouple submission policy from task execution• public interface Executor {
void execute(Runnable command);}
• Code which submits a task doesn't have to know in what thread the task will run– Could run in the calling thread, in a thread pool, in a single
background thread (or even in another JVM!)– Executor implementation determines execution policy
– Execution policy controls resource utilization, overload behavior, thread usage, logging, security, etc
– Calling code need not know the execution policy
![Page 48: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/48.jpg)
April 11, 2006 Scherer & Lea 48
ExecutorService
• Adds lifecycle management• ExecutorService supports both graceful and immediate shutdown public interface ExecutorService extends Executor {
void shutdown(); List<Runnable> shutdownNow(); boolean isShutdown(); boolean isTerminated(); boolean awaitTermination(long timeout, TimeUnit
unit);
// … }
• Useful utility methods too– <T> T invokeAny(Collection<Callable<T>> tasks)
– Executes the given tasks returning the result of one that completed successfully (if any)
– Others involving Future objects—covered later
![Page 49: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/49.jpg)
April 11, 2006 Scherer & Lea 49
Creating Executors
• Sample Executor implementations from Executors• newSingleThreadExecutor
– A pool of one, working from an unbounded queue• newFixedThreadPool(int N)
– A fixed pool of N, working from an unbounded queue• newCachedThreadPool
– A variable size pool that grows as needed and shrinks when idle
• newScheduledThreadPool– Pool for executing tasks after a given delay, or
periodically
![Page 50: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/50.jpg)
April 11, 2006 Scherer & Lea 50
ThreadPoolExecutor
• Numerous tuning parameters– Core and maximum pool size
– New thread created on task submission until core size reached
– New thread then created when queue full until maximum size reached
– Note: unbounded queue means the pool won’t grow above core size
– Maximum can be unbounded
– Keep-alive time– Threads above the core size terminate if idle for more than
the keep-alive time
– Pre-starting of core threads, or else on demand
![Page 51: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/51.jpg)
April 11, 2006 Scherer & Lea 51
Customizing ThreadPoolExecutor
• ThreadFactory used to create new threads– Default: Executors.defaultThreadFactory
• Queuing strategies: must be a BlockingQueue<Runnable>– Direct hand-off using SynchronousQueue: no internal
capacity; hands-off to waiting thread, else creates new one if allowed, else task is rejected
– Bounded queue: enforces resource constraints, when full permits pool to grow to maximum, then tasks rejected
– Unbounded queue: potential for resource exhaustion but otherwise never rejects tasks
– Note: Queue is used internally—you cannot directly place tasks in the queue
• Subclass customization through beforeExecute and afterExecute hooks
![Page 52: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/52.jpg)
April 11, 2006 Scherer & Lea 52
Rejected Task Handling
interface RejectExecutionHandler {void rejectedExecution(Runnable r, ThreadPoolExecutor e);
}• Tasks are rejected by a pool
– When it saturates with a bounded queue and maximum pool size– After it has been shutdown
• Four pre-defined policies—nested classes in ThreadPoolExecutor– AbortPolicy: execute throws RejectedExecutionException
– CallerRunsPolicy: execute invokes Runnable.run directly
– Discards this task if the pool has been shutdown– DiscardOldestPolicy: discards the oldest waiting task and
tries execute again– Discards this task if the pool has been shutdown
– DiscardPolicy: silently discard the rejected task
![Page 53: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/53.jpg)
April 11, 2006 Scherer & Lea 53
Case Study: Puzzle Solver
interface Puzzle<P, M> { // P: Position, M: Move P initialPosition(); boolean isGoal(P position); Iterable<M> moves(P position); P applyMove(P position, M move);}
• A general framework for searching the space of positions (states) linked by moves (transitions)– Applies to, for example, sliding block puzzles
– With discrete choices of move for each block configuration• Tools:
– ConcurrentHashMap – ThreadPoolExecutor– AtomicReference– CountDownLatch
![Page 54: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/54.jpg)
April 11, 2006 Scherer & Lea 54
Puzzle Solver: Node Class• Represents a chain of moves from an initial position
– Contains a back pointer so we can reconstruct these moves
class Node<P, M> { final P pos; final M move; final Node<P, M> pre;
Node(P p, M m, Node<P, M> n) { pos = p; move = m; pre = n; }
List<M> asMoveList() { List<M> s = new LinkedList<M>(); for (Node<P,M> n=this; n.move != null; n=n.pre) s.add(0, n.move); return s; }}
![Page 55: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/55.jpg)
April 11, 2006 Scherer & Lea 55
Puzzle Solver: Recursive (sequential) Soln.
public List<M> solve() { P pos = puzzle.initialPosition(); return search(new Node<P, M>(pos, null, null));}
List<M> search(Node<P, M> n) { if (!seen.contains(n.pos)) { seen.add(n.pos); if (puzzle.isGoal(n.pos)) return n.asMoveList(); for (M move : puzzle.moves(n.pos)) { List<M> result = search(new Node<P,M>( puzzle.applyMove(n.pos, move), move, n)); if (result != null) return result; } } return null;}private final Set<P> seen = new HashSet<P>();
![Page 56: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/56.jpg)
April 11, 2006 Scherer & Lea 56
Puzzle Solver: Concurrent Solution // Inner class of concurrent Solver class class TaskNode extends Node<P,M> implements Runnable {
TaskNode(P p, M m, Node<P,M> n) { super(p, m, n); }
public void run() { if (isSolved() || seen.putIfAbsent(pos, true) != null) return; if (puzzle.isGoal(pos)) setSolution(this); else for (M move : puzzle.moves(pos)) { P newPos = puzzle.applyMove(pos, move); exec.execute( new TaskNode(newPos, move, this)); } }}
![Page 57: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/57.jpg)
April 11, 2006 Scherer & Lea 57
Puzzle Solver: Concurrent Solution (2)
class Solver<P, M> {private final Puzzle<P,M> puzzle;private final ExecutorService exec = ...private final ConcurrentMap<P, Boolean> seen = new ConcurrentHashMap<P, Boolean>();
Solver(Puzzle<P,M> p) { this.puzzle = p; }
public List<M> solve() throws InterruptedException { exec.execute(new TaskNode( puzzle.initialPosition(), null, null)); return getSolution(); // block until solved}
boolean isSolved() { ... }
void setSolution(Node<P,M> n) { ... }
List<M> getSolution()throws InterruptedException { ... }
}
![Page 58: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/58.jpg)
April 11, 2006 Scherer & Lea 58
Puzzle Solver: Thread Pool Configuration• Entire framework only applies if either:
– Significant, independent work done in concrete isGoal, moves, and applyMoves methods, or
– Many processors to keep busy• Pool must not saturate; either
– Use unbounded queue—trading memory for time– But note that sequential version may use a lot of memory too
– Use unbounded pool size– More risky! Threads are much bigger than queue nodes
• Use AbortPolicy to make pool shutdown simple– Use of ExecutorService also allows for cancellation
• Suggest:
exec = new ThreadPoolExecutor( NCPUS, NCPUS, Long.MAX_VALUE, TimeUnit.NANOSECONDS, new LinkedBlockingQueue<Runnable>(), new AbortPolicy() );
![Page 59: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/59.jpg)
April 11, 2006 Scherer & Lea 59
Puzzle Solver: Tracking the Solution
• Requirements:– Set solution at most once– getSolution must block until solution available– Release resources once solution found
• Options– Use synchronized with wait/notifyAll– Use Lock and Condition await/signalAll– Use a custom FutureTask– Use atomic variable and a synchronizer: CountDownLatch
![Page 60: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/60.jpg)
April 11, 2006 Scherer & Lea 60
Interactive Messages
• Synopsis– Client activates Server with a oneway message
– Server later invokes a callback method on client
– Callback can be either oneway or procedural– Callback can instead be sent to a helper object of client
– Degenerate case: inform only of task completion• Applications
– Completion indications from file and network I/O– Threads performing computations that yield results
![Page 61: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/61.jpg)
April 11, 2006 Scherer & Lea 61
Completion Callbacks
• The async messages are service activations
• The callbacks are continuation calls that transmit results– May contain a message ID or
completion token to tell client which task completed
• Typically two kinds of callbacks– Success—analog of return– Failure—analog of throw
• Client readiness to accept callbacks may be state-dependent– For example, if client can only
process callbacks in a certain order
![Page 62: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/62.jpg)
April 11, 2006 Scherer & Lea 62
Completion Callback Example
• Callback interfaceinterface FileReaderClient {
void readCompleted(String filename);void readFailed(String filename,IOException ex);
}
• Sample Clientclass FileReaderApp implements FileReaderClient {
private byte[] data;void readCompleted(String filename) { // ... use data ...}void readFailed(String filename, IOException e){ // ... deal with failure ...}void doRead() { new Thread(new FileReader(“file”,data,this)).start();}
}
![Page 63: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/63.jpg)
April 11, 2006 Scherer & Lea 63
Completion Callbacks continued• Sample Serverclass FileReader implements Runnable {
final String name;final byte[] data;final FileReaderClient client; // allow nullpublic FileReader(String name, byte[] data, FileReaderClient c) { this.name = name; this.data = data; this.client = c;}void run() { try { // ... read... if (client != null) client.readCompleted(name); } catch (IOException ex) { if (client != null) client.readFailed(name, ex); }}
}
![Page 64: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/64.jpg)
April 11, 2006 Scherer & Lea 64
Locks and Synchronizers
• java.util.concurrent provides generally useful implementations– ReentrantLock, ReentrantReadWriteLock– Semaphore, CountDownLatch, Barrier, Exchanger– Should meet the needs of most users in most situations
– Some customization possible in some cases by subclassing
• Otherwise AbstractQueuedSynchronizer can be used to build custom locks and synchronizers– Within limitations: int state and FIFO queuing
• Otherwise build from scratch– Atomics– Queues– LockSupport for thread parking/unparking
![Page 65: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/65.jpg)
April 11, 2006 Scherer & Lea 65
Synchronization Infrastructure
• Many locks and synchronizers have similar properties– “acquire” operation that potentially blocks– “release” operation that potentially unblocks other– Examples:
– ReentrantLock, ReentrantReadWriteLock, Semaphore, CountDownLatch
• Implementations have to deal with the same issues:– Atomic state queries and updates– Queue management– Blocking, interruption handling, timeouts
• Nature of state and queuing policies can vary wildly• For specific state and queuing policies a common infrastructure
can be factored out
![Page 66: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/66.jpg)
April 11, 2006 Scherer & Lea 66
AbstractQueuedSynchronizer
• Common synchronization infrastructure for locks/synchronizers– State can be represented as an int– Queuing order is basically FIFO
• Supports notion of both exclusive and shared acquisition semantics– Exclusive acquire only allowed when not held either
exclusively or shared– Shared acquire only allowed when not held exclusively
• BUT AQS knows nothing about this– YOU define when the synchronizer can and can’t be acquired– YOU define all the usage rules
– Reentrant acquisition, only owner can release, …– Barging is/is-not permitted
![Page 67: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/67.jpg)
April 11, 2006 Scherer & Lea 67
Using AbstractQueuedSynchronizer
• Public method are implemented in terms of protected methods that your subclass provides– acquire calls tryAcquire– acquireShared calls tryAcquireShared– release calls tryRelease, etc …
• You implement whichever of the following suit your needs– tryAcquire(int) , tryAcquireShared(int) – tryRelease(int) , tryReleaseShared(int) – isHeldExclusively()
– Used if you want to provide Condition objects – Default implementations throw UnsupportedOperationException
• You use the available state related methods to do the implementation– int getState(), void setState(int), boolean compareAndSetState(int, int)
![Page 68: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/68.jpg)
April 11, 2006 Scherer & Lea 68
Example: Mutex• Mutex: a non-reentrant mutual-exclusion lock
– Only need to implement exclusive mode methods• State semantics:
– State == 0 means lock is free– State == 1 means lock is owned– Owner field identifies current owning thread
– Only owner can release, or use associated Condition• Class outline
• class Mutex implements Lock { Thread owner = null; class Sync extends AbstractQueuedSynchronizer { // AQS method implementations ... } Sync sync = new Sync(); // implement Lock methods in terms of sync ...}
![Page 69: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/69.jpg)
April 11, 2006 Scherer & Lea 69
Example: Mutex (2)• boolean tryAcquire(int acquires) – Purpose:
– Acquire in exclusive mode if possible and return true;– Else return false– acquires argument semantics determined by the application
• Mutex.Sync implementation• public boolean tryAcquire(int unused) {
if (compareAndSetState(0, 1)) { owner = Thread.currentThread(); return true; } return false;}
– compareAndSetState(int expected, int newState)– Atomically set the state to the value newState if it currently has the
value expected. Return true on success, else false
![Page 70: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/70.jpg)
April 11, 2006 Scherer & Lea 70
Example: Mutex (3)
• boolean tryRelease(int releases) – Purpose: – Set state to reflect the release of exclusive mode
– May fail by throwing IllegalMonitorStateException if current thread is not the holder, and the holder is tracked
– Return true if this release allows waiting threads to proceed; else return false
– releases argument semantics determined by the application• Mutex.Sync implementation
• public boolean tryRelease(int unused) { if (owner != Thread.currentThread()) throw new IllegalMonitorStateException(); owner = null; setState(0); return true; // wake up any waiters}
![Page 71: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/71.jpg)
April 11, 2006 Scherer & Lea 71
Example: Mutex (4)
• Condition support– AQS provides inner ConditionObject class– AQS subclass must:
– Support exclusive acquisition by the current thread
– Report exclusive ownership via isHeldExclusively– Ensure release(int) fully releases and a subsequent acquire(int) fully restores the exclusive mode state
• Mutex.Sync implementation• protected boolean isExclusivelyHeld() {
return owner == Thread.currentThread();}protected Condition newCondition() { return new ConditionObject();}
![Page 72: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/72.jpg)
Example: Mutex (5)• Mutex.Sync implementation of Lock methods: public void lock() { sync.acquire(1); }
public boolean tryLock() { return sync.tryAcquire(1); } public void unlock() { sync.release(1); }public Condition newCondition() { return sync.newCondition(); }public boolean isLocked(){ return sync.isHeldExclusively(); }public boolean hasQueuedThreads() { return sync.hasQueuedThreads(); }public void lockInterruptibly()throws InterruptedException { sync.acquireInterruptibly(1);}public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException { return sync.tryAcquireNanos(1, unit.toNanos(timeout));}
![Page 73: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/73.jpg)
April 11, 2006 Scherer & Lea 73
AQS Queue Management• Basic queuing order is FIFO• Single queue for both shared and exclusive acquisitions
– Practical trade-off:– Two queues more expressive; but
– Atomically working with two queues much harder
• Queue query methods can aid with anti-barging semantics– boolean hasQueuedThreads()
– Returns true if any thread is queued
– Thread getFirstQueuedThread()– Returns the longest waiting thread if any
• Additional monitoring/management methods (slow!):– Collection<Thread> getQueuedThreads()– Collection<Thread> getExclusiveQueuedThreads()
– Collection<Thread> getSharedQueuedThreads()
![Page 74: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/74.jpg)
April 11, 2006 Scherer & Lea 74
![Page 75: High-performance Multithreaded Producer-consumer Designs – from Theory to Practice](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681593e550346895dc67dbc/html5/thumbnails/75.jpg)
April 11, 2006 Scherer & Lea 75
Dual Stack: Implementation
TOS
Data Request
(1) (2a) (2b) (3b)
Fulfiller
Request
TOS TOS TOS
Rest
Rest Rest
Rest