Concurrency in Python

Upload: mosky-liu

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Concurrency in Python

CONCURRENCY IN PYTHON

MOSKY

1

Page 2: Concurrency in Python

MULTITHREADING & MULTIPROCESSING IN PYTHON

MOSKY

2

Page 3: Concurrency in Python

MOSKY

PYTHON CHARMER @ PINKOI

MOSKY.TW

3

Page 8: Concurrency in Python

OUTLINE

• Introduction

• Producer-Consumer Pattern

• Python’s Flavor

• Misc. Techniques

4

Page 9: Concurrency in Python

INTRODUCTION

5

Page 13: Concurrency in Python

MULTITHREADING

• GIL

• Only one thread executes Python bytecode at any given time.

• It can still improve IO-bound problems.

6
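
A minimal sketch of the IO-bound point above. It assumes time.sleep is a fair stand-in for a blocking I/O call (the GIL is released while a thread sleeps); the function names are made up for illustration.

```python
import threading
import time


def fake_io(seconds):
    # Stand-in for a blocking I/O call; the GIL is released while sleeping.
    time.sleep(seconds)


def run_sequential(n, seconds):
    # Run the n fake I/O calls one after another and return the elapsed time.
    start = time.perf_counter()
    for _ in range(n):
        fake_io(seconds)
    return time.perf_counter() - start


def run_threaded(n, seconds):
    # Run the n fake I/O calls in n threads and return the elapsed time.
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2fs" % run_sequential(4, 0.2))
    print("threaded:   %.2fs" % run_threaded(4, 0.2))
```

The threaded run should take roughly as long as one call, because the waits overlap even though only one thread runs Python code at a time.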

Page 18: Concurrency in Python

MULTIPROCESSING

• It uses fork.

• Processes can run at the same time.

• It uses more memory.

• Note the start-up cost of each process.

7
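
A small sketch of running CPU-bound work in separate processes with multiprocessing.Pool. The function names and the pool size are assumptions for illustration; the start-up cost mentioned above is paid when the pool is created.

```python
import multiprocessing


def sum_of_squares(n):
    # CPU-bound work: pure computation, no I/O.
    return sum(i * i for i in range(n))


def run_in_pool(inputs, processes=2):
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so the calls can really run at the same time on different cores.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import.
    print(run_in_pool([10, 100, 1000]))
```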

Page 23: Concurrency in Python

IS IT HARD?

• Avoid shared resources.

• e.g., vars or shared memory, files, connections, …

• Understand Python’s flavor.

• Then it will be easy.

8

Page 28: Concurrency in Python

SHARED RESOURCE

• Race condition:

    T1: RW
    T2: RW
    T1+T2: RRWW

• Use a lock → thread-safe:

    T1+T2: (RW) (RW)

• But locks hurt performance and can cause deadlock.

• Which is the hard part.

9
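
The (RW) (RW) idea can be sketched with a shared counter. The Counter class and the run helper are hypothetical names; the lock makes each read-modify-write one indivisible step.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the old value (RR)
        # and then both write old + 1 (WW), losing one increment.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


def run(n_threads=4, times=10000):
    # Start n_threads threads that each increment the same counter.
    counter = Counter()
    threads = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, run(4, 10000) always ends at 40000; the price is that the threads take turns at every increment.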

Page 31: Concurrency in Python

DIAGNOSE PROBLEM

• Where is the bottleneck?

• Divide your problem.

10

Page 32: Concurrency in Python

PRODUCER-CONSUMER PATTERN

11

Page 37: Concurrency in Python

PRODUCER-CONSUMER PATTERN

• A queue

• Producers → A queue

• A queue → Consumers

• Python has a built-in Queue module for it (renamed queue in Python 3).

12
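
A minimal producer-consumer sketch on Python 3's queue module. The _STOP sentinel, the doubling "work", and the function names are assumptions made for the example, not part of the talk.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel telling a consumer to quit


def producer(tasks, items):
    # Producers only touch the queue, never the consumers directly.
    for item in items:
        tasks.put(item)


def consumer(tasks, results):
    while True:
        item = tasks.get()
        if item is _STOP:
            break
        results.put(item * 2)  # the "work" here is just doubling


def run(items, n_consumers=2):
    tasks, results = queue.Queue(), queue.Queue()
    consumers = [
        threading.Thread(target=consumer, args=(tasks, results))
        for _ in range(n_consumers)
    ]
    for c in consumers:
        c.start()
    producer(tasks, items)
    for _ in consumers:
        tasks.put(_STOP)  # one sentinel per consumer
    for c in consumers:
        c.join()
    return sorted(results.get() for _ in range(len(items)))
```

The only shared objects are the two queues, which do their own locking, so the worker code needs no explicit lock.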

Page 38: Concurrency in Python

EXAMPLES

• https://docs.python.org/2/library/queue.html#queue-objects

• https://github.com/moskytw/mrbus/blob/master/mrbus/base/pool.py

13

Page 42: Concurrency in Python

WHY .TASK_DONE?

• It’s for .join.

• When the counter of unfinished tasks reaches zero, it notifies the threads that are waiting in .join.

• It’s implemented with threading.Condition.

14
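
To make the counter idea concrete, here is a toy model of it built on threading.Condition. It is a simplified sketch, not the actual queue.Queue source; the TaskCounter class and the task_added method are invented for illustration.

```python
import threading


class TaskCounter:
    """Toy model of the unfinished-task counter behind .task_done / .join."""

    def __init__(self):
        self._cond = threading.Condition()
        self._unfinished = 0

    def task_added(self):
        # In a real queue this happens inside .put.
        with self._cond:
            self._unfinished += 1

    def task_done(self):
        with self._cond:
            if self._unfinished <= 0:
                raise ValueError("task_done() called too many times")
            self._unfinished -= 1
            if self._unfinished == 0:
                # Wake every thread blocked in join().
                self._cond.notify_all()

    def join(self):
        with self._cond:
            while self._unfinished:
                self._cond.wait()
```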

Page 46: Concurrency in Python

THE THREADING MODULE

• Lock: primitive lock, .acquire / .release

• RLock: owner can reenter

• Semaphore: lock when counter goes zero

15
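
A short sketch of how RLock and Semaphore differ from a plain Lock. The names outer, inner, slots, and try_three_acquires are hypothetical.

```python
import threading

rlock = threading.RLock()


def inner():
    with rlock:
        return "reentered"


def outer():
    # The owning thread acquires the RLock a second time inside inner().
    # With a plain threading.Lock this nested acquire would deadlock.
    with rlock:
        return inner()


# A semaphore with two slots: the third non-blocking acquire fails
# because the internal counter has reached zero.
slots = threading.Semaphore(2)


def try_three_acquires():
    got = [slots.acquire(blocking=False) for _ in range(3)]
    for ok in got:
        if ok:
            slots.release()  # give back only the slots we actually took
    return got
```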

Page 50: Concurrency in Python

• Condition — .wait for .notify / .notify_all

• Event: .wait for .set; a simplified Condition

• with lock: …

16
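
A small sketch of Event and of the with-statement form of a lock. The demo and guarded functions are made-up names for the example.

```python
import threading

lock = threading.Lock()


def guarded():
    # "with lock:" is shorthand for lock.acquire() followed by a
    # try/finally that calls lock.release().
    with lock:
        return lock.locked()


def demo():
    ready = threading.Event()
    log = []

    def waiter():
        ready.wait()  # blocks until another thread calls ready.set()
        log.append("go")

    t = threading.Thread(target=waiter)
    t.start()
    before = list(log)  # still empty: the waiter cannot pass wait() yet
    ready.set()
    t.join()
    return before, log
```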

Page 55: Concurrency in Python

THE MULTIPROCESSING MODULE

• .Process

• .JoinableQueue

• .Pool

• …

17

Page 56: Concurrency in Python

PYTHON’S FLAVOR

18

Page 61: Concurrency in Python

DAEMONIC THREAD

• It’s not a “daemon” in the Unix sense.

• It just gets killed when Python shuts down.

• Immediately.

• Other threads keep running until they return.

19
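
A minimal sketch of a daemonic thread, assuming Python 3.3+ for the daemon keyword argument. The loop body and function names are placeholders.

```python
import threading
import time


def background_loop():
    # Never returns on its own; fine for a daemon, fatal for a normal thread,
    # because the interpreter would wait for it forever at shutdown.
    while True:
        time.sleep(0.1)


def start_daemon():
    t = threading.Thread(target=background_loop, daemon=True)
    t.start()
    return t


if __name__ == "__main__":
    start_daemon()
    time.sleep(0.3)
    # The main thread returns here and the daemon is dropped without cleanup.
```

Because the thread is cut off wherever it happens to be, this only suits work that is safe to abandon halfway.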

Page 64: Concurrency in Python

SO, HOW TO STOP?

• Set daemon and let Python clean it up.

• Let it return.

20

Page 66: Concurrency in Python

BUT, THE THREAD IS BLOCKING

• Set a timeout.

21
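
One way the timeout advice could look in practice: the worker blocks on the queue for at most 0.1 s, then re-checks a stop flag. The 0.1 s value, the stop Event, and the function names are assumptions for the sketch.

```python
import queue
import threading


def worker(tasks, stop, done):
    while not stop.is_set():
        try:
            item = tasks.get(timeout=0.1)  # never block forever
        except queue.Empty:
            continue  # nothing arrived; loop back and re-check the stop flag
        done.append(item)
        tasks.task_done()


def demo():
    tasks, stop, done = queue.Queue(), threading.Event(), []
    t = threading.Thread(target=worker, args=(tasks, stop, done))
    t.start()
    tasks.put("job")
    tasks.join()       # wait until the job has been processed
    stop.set()         # ask the worker to return
    t.join(timeout=2)
    return done, t.is_alive()
```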

Page 69: Concurrency in Python

HOW ABOUT CTRL+C?

• Only the main thread can receive that.

• BSD-style.

22

Page 75: Concurrency in Python

BROADCAST SIGNAL TO SUB-THREAD

• Set a global flag when the signal arrives.

• Let each thread read it before each task.

• No, you can’t kill a non-daemonic thread.

• Just can’t do so.

• It’s Python.

23

Page 78: Concurrency in Python

BROADCAST SIGNAL TO SUB-PROCESS

• Just broadcast the signal to sub-processes.

• Start by registering a signal handler: signal(SIGINT, _handle_to_term_signal)

24

Page 82: Concurrency in Python

• Determine the process context if needed:

    pid = getpid()
    pgid = getpgid(0)
    proc_is_parent = (pid == pgid)

• Turn off the handler: signal(signum, SIG_IGN)

• Broadcast: killpg(pgid, signum)

25
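
The steps above could be put together roughly as follows. This is a Unix-only sketch and a guess at the overall shape, not the author's actual handler: it assumes the parent is the process-group leader, and the final sys.exit and the install function are additions made for the example.

```python
import os
import signal
import sys


def proc_is_parent():
    # Assumption: the parent is the process-group leader,
    # as it is when the program is started from a shell.
    return os.getpid() == os.getpgid(0)


def _handle_to_term_signal(signum, frame):
    if proc_is_parent():
        # Ignore the signal in this process first, so the broadcast
        # below does not re-enter this handler.
        signal.signal(signum, signal.SIG_IGN)
        # Send the same signal to every process in the group.
        os.killpg(os.getpgid(0), signum)
    # Assumption: every process simply exits after handling the signal.
    sys.exit(1)


def install():
    # Call from the main thread, before forking the sub-processes,
    # so the children inherit the handler.
    signal.signal(signal.SIGINT, _handle_to_term_signal)
```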

Page 83: Concurrency in Python

MISC. TECHNIQUES

26

Page 89: Concurrency in Python

JUST THREAD IT OUT

• Or process it out.

• Let the main thread exit earlier. (Looks faster!)

• Let the main thread keep dispatching tasks.

• “Async”

• And fix some stupid behavior. (I meant atexit with multiprocessing.Pool.)

27

Page 93: Concurrency in Python

COLLECT RESULT SMARTER

• Put results into a thread-safe queue.

• Use a thread per instance.

• Learn “let it go”.

28

Page 94: Concurrency in Python

EXAMPLES

• https://github.com/moskytw/mrbus/blob/master/mrbus/base/pool.py#L45

• https://github.com/moskytw/mrbus/blob/master/mrbus/model/core.py#L30

29

Page 98: Concurrency in Python

MONITOR THEM

• No one is a master at first.

• Don’t guess.

• Just use a function to print logs.

30
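
A log function of the kind suggested above might look like this. The line format and the names format_log and log are assumptions; the print lock just keeps lines from different threads from interleaving.

```python
import threading
import time

_print_lock = threading.Lock()


def format_log(message):
    # Timestamp and thread name show who did what, and when.
    return "%.3f [%s] %s" % (
        time.time(),
        threading.current_thread().name,
        message,
    )


def log(message):
    with _print_lock:
        print(format_log(message))
```

Calling log("got task") inside each worker makes the real order of events visible instead of guessed.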

Page 102: Concurrency in Python

BENCHMARK THEM

• No one is a master at first.

• Don’t guess.

• Just prove it.

31
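
For the "prove it" step, the standard timeit module is one option. The two string-building functions are stand-ins for whatever alternatives are being compared; the benchmark helper and its default arguments are assumptions.

```python
import timeit


def join_strings(n):
    return "".join(str(i) for i in range(n))


def concat_strings(n):
    s = ""
    for i in range(n):
        s += str(i)
    return s


def benchmark(func, n=1000, repeat=3, number=100):
    # Take the best of several runs to reduce noise from other processes.
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


if __name__ == "__main__":
    print("join:   %.4fs" % benchmark(join_strings))
    print("concat: %.4fs" % benchmark(concat_strings))
```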

Page 108: Concurrency in Python

CONCLUSION

• Avoid shared resources, or just use the producer-consumer pattern.

• Signals only go to the main thread.

• Just thread it out.

• Collect your result smarter.

• Monitor and benchmark your code.

32