CONCURRENCY IN PYTHON
Mosky
MULTITHREADING & MULTIPROCESSING IN PYTHON
OUTLINE
• Introduction
• Producer-Consumer Pattern
• Python’s Flavor
• Misc. Techniques
INTRODUCTION
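MULTITHREADING
• GIL (Global Interpreter Lock)
• Only one thread executes Python bytecode at any given time.
• It can still improve I/O-bound problems, since a thread releases the GIL while it waits on I/O.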
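A minimal sketch of the I/O-bound case, assuming `time.sleep()` is a fair stand-in for blocking I/O (both release the GIL while waiting). `fake_io` and `run_threaded` are hypothetical names for illustration.

```python
import threading
import time


def fake_io(results, index):
    # Hypothetical stand-in for blocking I/O: time.sleep() releases the GIL while waiting.
    time.sleep(0.2)
    # Each thread writes only its own slot, so no lock is needed here.
    results[index] = index * index


def run_threaded(n=5):
    """Start n waiting threads at once and return (results, elapsed seconds)."""
    results = [None] * n
    threads = [threading.Thread(target=fake_io, args=(results, i)) for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threaded()
    print(results, f"{elapsed:.2f}s")
```

The waits overlap, so five 0.2 s waits should take roughly 0.2 s in total rather than 1 s. A CPU-bound body would not overlap like this under the GIL.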
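MULTIPROCESSING
• It uses fork (on Unix; Python 3.4+ also offers the spawn and forkserver start methods).
• Processes can really run at the same time: each has its own interpreter and GIL.
• They use more memory.
• Note the initial cost of starting each process.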
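A sketch of CPU-bound work spread over processes with `multiprocessing.Pool`. `count_primes` and `parallel_count` are hypothetical helpers; the workload size is arbitrary, and for tasks this small the process start-up cost can eat most of the gain.

```python
import multiprocessing
import time


def count_primes(limit):
    """Hypothetical CPU-bound task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_count(limits, processes=4):
    # Each task runs in a separate worker process with its own interpreter and GIL,
    # so the CPU-bound calls can run in parallel on different cores.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method the children re-import this module.
    start = time.perf_counter()
    print(parallel_count([50_000] * 4), f"{time.perf_counter() - start:.2f}s")
```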
IS IT HARD?
• Avoid shared resources.
• e.g., variables or shared memory, files, connections, …
• Understand Python’s flavor.
• Then it will be easy.
SHARED RESOURCE
• Race condition: T1 does RW and T2 does RW, but interleaved they run as RRWW, so one write overwrites the other.
• Use a lock → thread-safe: T1+T2: (RW) (RW)
• But locks cost performance and can cause deadlocks.
• That is the hard part.
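A sketch of the RRWW race and its lock-based fix. `Counter` and `hammer` are hypothetical names; the thread and iteration counts are arbitrary.

```python
import threading


class Counter:
    """Hypothetical shared resource: a counter touched by many threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, then write: another thread may run in between (the RRWW case),
        # and its update is lost when we write back our stale value + 1.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # Holding the lock turns the read-write pair into one unit: (RW)(RW).
        with self._lock:
            self.value += 1


def hammer(increment, n_threads=8, n_times=10_000):
    """Call increment() n_times from each of n_threads threads."""
    def loop():
        for _ in range(n_times):
            increment()

    threads = [threading.Thread(target=loop) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    unsafe, safe = Counter(), Counter()
    hammer(unsafe.unsafe_increment)
    hammer(safe.safe_increment)
    print("unsafe:", unsafe.value, "safe:", safe.value, "expected:", 8 * 10_000)
```

How often the unsafe counter loses updates depends on the interpreter version and on timing; it may even come out right on a given run, which is exactly what makes races hard to catch.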
DIAGNOSE PROBLEM
• Where is the bottleneck: CPU or I/O?
• Divide your problem into independent tasks.
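One rough way to locate the bottleneck is to compare CPU time with wall-clock time for a call. `classify` is a hypothetical helper and the 0.5 threshold is an arbitrary assumption, not a standard rule.

```python
import time


def classify(func, *args):
    """Hypothetical heuristic: compare CPU time with wall-clock time for one call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()    # CPU time of this process (all threads)
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Mostly computing -> CPU share near 1; mostly waiting on I/O -> near 0.
    # The 0.5 cut-off is an assumption chosen for illustration.
    kind = "cpu-bound" if wall > 0 and cpu / wall > 0.5 else "io-bound"
    return kind, round(wall, 3), round(cpu, 3)


if __name__ == "__main__":
    print(classify(time.sleep, 0.2))
    print(classify(lambda: sum(i * i for i in range(2_000_000))))
```

For per-function detail, the standard library's `cProfile` module is the next step.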
PRODUCER-CONSUMER PATTERN
PRODUCER-CONSUMER PATTERN
• A queue
• Producers → the queue
• The queue → consumers
• Python has the built-in Queue module for it (renamed queue in Python 3).
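A minimal producer-consumer sketch with `queue.Queue`. The `None` sentinel used to stop consumers is an assumption (one common convention, not part of the queue API), and `consumer` / `run_pipeline` are hypothetical names.

```python
import queue
import threading

STOP = None  # hypothetical sentinel: tells a consumer there is no more work


def consumer(tasks, results):
    while True:
        item = tasks.get()
        if item is STOP:
            tasks.task_done()
            break
        results.put(item * item)   # queue.Queue is thread-safe, so no extra lock
        tasks.task_done()


def run_pipeline(items, n_consumers=3):
    items = list(items)
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=consumer, args=(tasks, results))
               for _ in range(n_consumers)]
    for w in workers:
        w.start()
    for item in items:           # the main thread is the producer here
        tasks.put(item)
    for _ in workers:            # one sentinel per consumer
        tasks.put(STOP)
    tasks.join()                 # wait until every put() has a matching task_done()
    for w in workers:
        w.join()
    return sorted(results.get() for _ in items)


if __name__ == "__main__":
    print(run_pipeline(range(10)))
```

No consumer touches shared state directly; everything flows through the two queues, which is how the pattern sidesteps explicit locks.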
EXAMPLES
• https://docs.python.org/2/library/queue.html#queue-objects
• https://github.com/moskytw/mrbus/blob/master/mrbus/base/pool.py
WHY .TASK_DONE?
• It’s for .join.
• Each .put increments a counter of unfinished tasks; each .task_done decrements it.
• When the counter reaches zero, it notifies the threads waiting in .join.
• It’s implemented with threading.Condition.
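A simplified sketch of that counter, loosely modeled on CPython's `queue.Queue` (which keeps an `unfinished_tasks` count guarded by an `all_tasks_done` Condition). `TaskCounter` is a hypothetical class, not the library's actual code.

```python
import threading


class TaskCounter:
    """Hypothetical, simplified version of the unfinished-task counter behind Queue.join()."""

    def __init__(self):
        self.unfinished = 0
        self._all_done = threading.Condition()

    def add(self):
        # Queue.put() does the equivalent of this for every item.
        with self._all_done:
            self.unfinished += 1

    def task_done(self):
        with self._all_done:
            if self.unfinished <= 0:
                raise ValueError("task_done() called too many times")
            self.unfinished -= 1
            if self.unfinished == 0:
                self._all_done.notify_all()   # wake every thread blocked in join()

    def join(self):
        with self._all_done:
            while self.unfinished:            # loop guards against spurious wake-ups
                self._all_done.wait()
```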
THE THREADING MODULE
• Lock: primitive lock, .acquire / .release
• RLock: the owner can re-enter (acquire it again)
• Semaphore: .acquire blocks when the counter reaches zero
• Condition: .wait for .notify / .notify_all
• Event: .wait for .set; a simplified Condition
• with lock: … (acquire and release automatically)
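A small sketch of two of these primitives: an RLock re-entered by its owner, and a Semaphore capping how many threads use a resource at once. `outer`, `inner` and `peak_concurrency` are hypothetical names.

```python
import threading
import time

rlock = threading.RLock()


def outer():
    with rlock:          # first acquire by this thread
        return inner()


def inner():
    with rlock:          # same thread acquires again: fine for RLock, a deadlock with Lock
        return "reentered"


def peak_concurrency(n_threads=6, limit=2):
    """Hypothetical helper: run workers behind Semaphore(limit), return the most seen at once."""
    slots = threading.Semaphore(limit)
    state_lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker():
        with slots:                          # blocks while `limit` workers are inside
            with state_lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.05)                 # pretend to use the limited resource
            with state_lock:
                state["running"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print(outer(), peak_concurrency())
```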
THE MULTIPROCESSING MODULE
• .Process
• .JoinableQueue
• .Pool
• …
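The same producer-consumer shape with `multiprocessing.Process` and `multiprocessing.JoinableQueue`, sketched under the assumption that a `None` sentinel ends each worker. `square_worker` and `run` are hypothetical names.

```python
import multiprocessing


def square_worker(tasks, results):
    """Hypothetical worker: consume numbers until a None sentinel, task_done() for each."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        results.put(item * item)
        tasks.task_done()


def run(items, n_workers=2):
    items = list(items)
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=square_worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)             # one sentinel per worker process
    tasks.join()                    # same .join / .task_done contract as queue.Queue
    out = sorted(results.get() for _ in items)   # drain results before joining workers
    for w in workers:
        w.join()
    return out


if __name__ == "__main__":
    print(run(range(10)))
```

Results are drained before the processes are joined because the multiprocessing docs warn that joining a process which still has items buffered for a queue can deadlock.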
PYTHON’S FLAVOR
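DAEMONIC THREAD
• It’s not a “daemon” in the Unix sense.
• It is just killed when Python shuts down.
• Abruptly: its resources (open files, database transactions) may not be released properly.
• Non-daemonic threads keep running until they return, and Python waits for them before exiting.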
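A sketch that observes the shutdown behavior from outside, by running a tiny child interpreter. `CHILD` and `run_child` are hypothetical names; the child program is an illustration only.

```python
import subprocess
import sys
import textwrap

# Hypothetical child program: a daemonic thread that never returns, plus a main thread that ends.
CHILD = textwrap.dedent("""
    import threading, time

    def forever():
        while True:
            time.sleep(1)

    threading.Thread(target=forever, daemon=True).start()
    print("main thread done")
""")


def run_child(timeout=10):
    """Run CHILD in a fresh interpreter; return (exit code, stdout)."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    # Exits right away: Python does not wait for daemonic threads at shutdown.
    # With daemon=False the child would never exit and subprocess.run would time out.
    print(run_child())
```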
SO, HOW TO STOP?
• Set daemon and let Python clean it up.
• Or let it return.
BUT, THE THREAD IS BLOCKING
• Set a timeout on the blocking call, e.g. Queue.get(timeout=1), so the thread wakes up and can decide to return.
HOW ABOUT CTRL+C?
• Only the main thread receives it: Python always runs signal handlers in the main thread.
• BSD-style.
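A quick check of the main-thread rule: installing a handler from a worker thread fails with `ValueError`. `install_handler_from_thread` is a hypothetical name.

```python
import signal
import threading


def install_handler_from_thread():
    """Hypothetical check: try to install a SIGINT handler from a worker thread."""
    errors = []

    def worker():
        try:
            signal.signal(signal.SIGINT, signal.default_int_handler)
        except ValueError as exc:   # only the main thread may install signal handlers
            errors.append(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return errors


if __name__ == "__main__":
    print(install_handler_from_thread())
```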
BROADCAST SIGNAL TO SUB-THREAD
• Set a global flag when you get a signal.
• Let each thread read it before each task.
• No, you can’t kill a non-daemonic thread.
• There is just no API for it in the threading module.
• It’s Python.
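A sketch of the flag approach, combined with the timeout trick for blocking calls. Using a `threading.Event` as the "global flag" is an assumption (any thread-safe boolean would do), and `signal.raise_signal` (Python 3.8+) only simulates Ctrl+C; `worker` and `run_until_signalled` are hypothetical names.

```python
import queue
import signal
import threading
import time

stop_requested = threading.Event()   # the "global flag"; an Event is a thread-safe boolean


def _on_signal(signum, frame):
    # Python runs this in the main thread; it only flips the flag.
    stop_requested.set()


def worker(tasks, done):
    while not stop_requested.is_set():       # read the flag before each task
        try:
            item = tasks.get(timeout=0.1)    # timeout: wake up to re-check the flag
        except queue.Empty:
            continue
        done.append(item)


def run_until_signalled(items):
    previous = signal.signal(signal.SIGINT, _on_signal)   # main thread only
    try:
        tasks = queue.Queue()
        for item in items:
            tasks.put(item)
        done = []
        t = threading.Thread(target=worker, args=(tasks, done))   # non-daemonic
        t.start()
        time.sleep(0.3)                      # let it finish the work and idle on get()
        signal.raise_signal(signal.SIGINT)   # simulate Ctrl+C
        deadline = time.monotonic() + 2
        while t.is_alive() and time.monotonic() < deadline:
            t.join(0.05)                     # short joins keep the main thread responsive
        return done, t.is_alive()
    finally:
        if previous is not None:
            signal.signal(signal.SIGINT, previous)


if __name__ == "__main__":
    print(run_until_signalled([1, 2, 3]))
```

The thread is never killed; it notices the flag and returns on its own, which is the only clean way out for a non-daemonic thread.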
BROADCAST SIGNAL TO SUB-PROCESS
• Just broadcast the signal to the sub-processes.
• Start by registering a signal handler: signal(SIGINT, _handle_to_term_signal)
• Work out the process context if needed:
    pid = getpid()
    pgid = getpgid(0)
    proc_is_parent = (pid == pgid)
• Turn the handler off first, since killpg also signals the sender itself: signal(signum, SIG_IGN)
• Broadcast: killpg(pgid, signum)
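A POSIX-only sketch of these steps. The body of `_handle_to_term_signal` is an assumption (the slide gives only its name), and `demo` / `run_in_new_group` are hypothetical helpers; the latter forks into a fresh session so that `killpg` cannot reach anything outside the demo.

```python
import os
import signal
import time


def _handle_to_term_signal(signum, frame):
    # Assumed body. Work out the process context; the parent is assumed to lead the group.
    pid = os.getpid()
    pgid = os.getpgid(0)
    proc_is_parent = (pid == pgid)
    if proc_is_parent:
        # Turn the handler off first: killpg() below also signals this very process.
        signal.signal(signum, signal.SIG_IGN)
        os.killpg(pgid, signum)          # broadcast to the whole group
    else:
        os._exit(0)                      # a child just stops; real code would clean up


def demo(n_children=2, timeout=5.0):
    """Hypothetical demo: fork children, signal only the parent, return children's exit codes."""
    if os.getpid() != os.getpgid(0):
        raise RuntimeError("run demo() as a process-group leader, e.g. via run_in_new_group()")
    signal.signal(signal.SIGINT, _handle_to_term_signal)
    children = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:                      # child: inherits the handler, idles until signalled
            while True:
                time.sleep(0.1)
        children.append(pid)
    time.sleep(0.3)                       # give the children time to start
    os.kill(os.getpid(), signal.SIGINT)   # like Ctrl+C reaching only the parent
    codes, deadline = {}, time.monotonic() + timeout
    while len(codes) < len(children) and time.monotonic() < deadline:
        for pid in children:
            if pid not in codes:
                done, status = os.waitpid(pid, os.WNOHANG)
                if done:
                    codes[pid] = os.waitstatus_to_exitcode(status)
        time.sleep(0.05)                  # loop so the pending handler gets a chance to run
    for pid in children:                  # safety net: never leave children behind
        if pid not in codes:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
    return [codes.get(pid) for pid in children]


def run_in_new_group():
    """Hypothetical helper: run demo() in a forked child that starts its own session."""
    pid = os.fork()
    if pid == 0:
        try:
            os.setsid()
            os._exit(0 if demo() == [0, 0] else 1)
        except BaseException:
            os._exit(2)
    return os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1])


if __name__ == "__main__":
    print("exit code:", run_in_new_group())   # 0 means both children received the broadcast
```

The `pid == pgid` test only identifies the parent when it really leads the process group (as a shell job does), which is why the sketch refuses to run otherwise.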
MISC. TECHNIQUES
JUST THREAD IT OUT
• Or process it out.
• Let the main thread exit earlier. (Looks faster!)
• Let the main thread keep dispatching tasks.
• “Async”
• And work around some stupid behavior (I mean atexit with multiprocessing.Pool).
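A sketch of handing slow work to a background thread so the caller returns at once. `thread_it_out` and `slow_save` are hypothetical names; the sleep stands in for a slow write.

```python
import threading
import time


def thread_it_out(func, *args, **kwargs):
    """Hypothetical helper: run func in a background thread and return immediately."""
    t = threading.Thread(target=func, args=args, kwargs=kwargs)  # non-daemonic
    t.start()
    return t


def slow_save(record, store):
    time.sleep(0.3)          # stands in for a slow write, e.g. to disk or a database
    store.append(record)


if __name__ == "__main__":
    store = []
    start = time.perf_counter()
    thread_it_out(slow_save, {"id": 1}, store)
    print(f"main thread free after {time.perf_counter() - start:.3f}s")
    # No join: because the thread is non-daemonic, Python still waits for it
    # before the interpreter exits, so the save is not lost.
```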
COLLECT RESULT SMARTER
• Put results into a thread-safe queue.
• Use a thread per instance.
• Learn to “let it go”.
EXAMPLES
• https://github.com/moskytw/mrbus/blob/master/mrbus/base/pool.py#L45
• https://github.com/moskytw/mrbus/blob/master/mrbus/model/core.py#L30
MONITOR THEM
• No one is a master at first.
• Don’t guess.
• Just use a function to print logs.
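A minimal logging function of that kind, tagging each line with the process id and thread name. `log` is a hypothetical helper.

```python
import os
import threading
import time

_print_lock = threading.Lock()


def log(message):
    """Hypothetical helper: print when, in which process and thread, something happened."""
    line = (f"{time.strftime('%H:%M:%S')} pid={os.getpid()} "
            f"thread={threading.current_thread().name} {message}")
    with _print_lock:          # one line at a time, even with many threads printing
        print(line, flush=True)
    return line


if __name__ == "__main__":
    workers = [threading.Thread(target=log, args=(f"task {i} done",), name=f"worker-{i}")
               for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

The standard `logging` module can produce the same information through its `%(processName)s` and `%(threadName)s` format fields.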
BENCHMARK THEM
• No one is a master at first.
• Don’t guess.
• Just prove it.
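A sketch of proving a speed-up instead of guessing: time the serial and threaded versions and keep the best of a few runs. `fake_io`, `best_of`, `serial` and `threaded` are hypothetical names, and the sleep-based task is an assumed stand-in for a network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(_):
    time.sleep(0.1)   # hypothetical stand-in for a network call


def best_of(func, repeat=3):
    """Return the fastest of `repeat` wall-clock runs of func()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def serial():
    for i in range(4):
        fake_io(i)


def threaded():
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(fake_io, range(4)))


if __name__ == "__main__":
    print(f"serial   {best_of(serial):.3f}s")
    print(f"threaded {best_of(threaded):.3f}s")
```

Swap the sleep for a CPU-bound loop and, on a standard CPython build with the GIL, the threaded version should stop winning; the measurement, not intuition, is what tells you.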
CONCLUSION
• Avoid shared resources, or just use the producer-consumer pattern.
• Signals only go to the main thread.
• Just thread it out.
• Collect your results smarter.
• Monitor and benchmark your code.