![Page 1: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/1.jpg)
Parallel worlds of CRuby's GC Powered by Rabbit 0.9.3
Parallel worlds of CRuby's GC
nari/Narihiro Nakamura/@nari_en
Network Applied Communication Laboratory Ltd.
![Page 2: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/2.jpg)
I'm very happy now.
![Page 3: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/3.jpg)
Today is my first presentation in English.
![Page 4: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/4.jpg)
My English is not good.
![Page 5: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/5.jpg)
But I'll do my best. Please bear with me :)
![Page 6: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/6.jpg)
Self introduction
![Page 7: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/7.jpg)
![Page 8: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/8.jpg)
![Page 9: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/9.jpg)
Ice-cream factory
- I worked on an assembly line.
- For example, I made many cardboard boxes.
- I was a professional cardboard box maker :)
![Page 10: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/10.jpg)
Ice-cream factory
- I made 150 boxes per hour (ZOMG)
![Page 11: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/11.jpg)
http://www.flickr.com/photos/kevincollins123/5887984753/
I was like a machine!!
![Page 12: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/12.jpg)
![Page 13: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/13.jpg)
![Page 14: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/14.jpg)
Working with Java
- I worked at a big company.
- This work was similar to assembly line work.
- I made one part of a product. I didn't understand the whole product.
![Page 15: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/15.jpg)
http://www.flickr.com/photos/kevincollins123/5887984753/
I was still like a machine!!
![Page 16: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/16.jpg)
![Page 17: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/17.jpg)
![Page 18: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/18.jpg)
My current work
- Currently, I work at NaCl.
- matz, shyouhei, and takaokouji are my co-workers.
- shugo is my boss.
- They are CRuby committers.
![Page 19: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/19.jpg)
When I started Ruby programming
- I felt free.
- This work wasn't similar to assembly line work.
- I could make the whole product.
![Page 20: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/20.jpg)
http://www.flickr.com/photos/danzden/121379782/
I was no longer a machine!!
![Page 21: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/21.jpg)
![Page 22: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/22.jpg)
Garbage Collection for me
- GC technology is very interesting to me.
- GC is a garbage collecting machine.
- I've been creating it since then. It's a lot of fun!!
![Page 23: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/23.jpg)
I'm making a machine!!
![Page 24: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/24.jpg)
My relationship to GC
![Page 25: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/25.jpg)
I'm a CRuby Committer
- I work on GC.
![Page 26: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/26.jpg)
And I wrote a book about GC.
![Page 27: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/27.jpg)
But, it's only in Japanese :(
![Page 28: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/28.jpg)
And, I've been creating GC with RDD.
![Page 29: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/29.jpg)
What is RDD?
![Page 30: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/30.jpg)
RDD = RubyKaigi Driven Development
![Page 31: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/31.jpg)
My RDD history
- LazySweepGC - RubyKaigi2008
- LonglifeGC - 2009
- LazySweepGC - 2010
- ParallelMarkingGC - 2011
![Page 32: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/32.jpg)
![Page 33: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/33.jpg)
LonglifeGC
- It treats long-life objects as a special case.
  - Similar to Generational GC.
- LonglifeGC was rejected from CRuby 1.9.2 for some reason.
  - :'(
![Page 34: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/34.jpg)
http://www.flickr.com/photos/conifer/2389654222/
But LonglifeGC has been used in Kiji :-)
![Page 35: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/35.jpg)
Kiji
- Kiji is an optimized version of REE by Twitter developers.
- The Twitter team substantially extended LonglifeGC.
  - It's cool!!
![Page 36: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/36.jpg)
But, Kiji will be rejected also... :'(
![Page 37: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/37.jpg)
![Page 38: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/38.jpg)
LazySweepGC
- Traditional M&S GC executes mark and sweep atomically.
  - The Ruby application stops during GC (stop-the-world).
- In lazy sweeping, the sweep is deferred and performed a little at a time.
![Page 39: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/39.jpg)
LazySweepGC
- Each object allocation sweeps Ruby's heap only until it finds an appropriate free object.
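A minimal sketch of the allocation-time sweeping described above, written in Python for illustration and not taken from CRuby's C sources. The class name `LazySweepHeap`, the `marked` flag list, and the `sweep_pos` cursor are all assumptions made for this example; a real collector works on heap pages and object headers instead of a list of booleans.

```python
class LazySweepHeap:
    """Toy heap: slot i is live if marked[i] is True after the mark phase."""

    def __init__(self, marked_flags):
        self.marked = list(marked_flags)
        self.sweep_pos = 0  # where the previous allocation stopped sweeping

    def allocate(self):
        """Sweep forward only until one dead slot is found; return its index."""
        while self.sweep_pos < len(self.marked):
            i = self.sweep_pos
            self.sweep_pos += 1
            if self.marked[i]:
                # Live object: clear its mark so the next GC cycle starts clean.
                self.marked[i] = False
            else:
                # Dead object: its slot is reused for the new object.
                return i
        # Nothing left to sweep; a real collector would start marking here.
        return None


heap = LazySweepHeap([True, False, True, True, False])
print(heap.allocate())  # 1
print(heap.allocate())  # 4
print(heap.allocate())  # None
```

Each call pays only for the slots it walks past, which is why the pause per allocation is shorter than one sweep of the whole heap.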
![Page 40: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/40.jpg)
Improvements
- This improves the response time of GC.
- I.e., the worst-case pause time of GC decreases.
![Page 41: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/41.jpg)
LazySweepGC
- LazySweepGC is available as of Ruby 1.9.3.
![Page 42: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/42.jpg)
![Page 43: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/43.jpg)
Today's topics
![Page 44: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/44.jpg)
Today's topics
- Why do we need Parallel Marking?
- What to consider?
- How to implement?
- How much did performance improve?
![Page 45: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/45.jpg)
![Page 46: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/46.jpg)
Why do we need Parallel Marking?
![Page 47: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/47.jpg)
This is CRuby's current GC.
![Page 48: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/48.jpg)
Current CRuby's GC
- GC operates on only 1 core.
- In a multi-core environment, the other cores don't help GC.
![Page 49: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/49.jpg)
http://www.flickr.com/photos/hortont/2698261070/
GC:"I'm alone, it's so hard."
![Page 50: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/50.jpg)
http://www.flickr.com/photos/knallaerbse/2863161933/
We should run GC in parallel!!
![Page 51: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/51.jpg)
First, let me explain a few GC-related concepts.
![Page 52: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/52.jpg)
What is GC?
- GC collects all dead objects.
![Page 53: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/53.jpg)
What is a dead object?
- A dead object is an object that the program can no longer reference.
- In GC terms, we say that a dead object is unreachable from Roots.
![Page 54: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/54.jpg)
What are Roots?
- Roots are the set of pointers that directly reference objects in the program.
  - e.g. Ruby's local variables, etc.
![Page 55: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/55.jpg)
For example
![Page 56: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/56.jpg)
Please remember that
- GC collects objects that are unreachable from Roots.
![Page 57: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/57.jpg)
Next, let me explain the current CRuby GC algorithm.
![Page 58: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/58.jpg)
CRuby's GC algorithm summary
- CRuby adopts the Mark & Sweep algorithm.
- The collector works in separate Mark and Sweep phases.
![Page 59: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/59.jpg)
In the Mark phase
- The collector marks live objects that are reachable from Roots.
![Page 60: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/60.jpg)
For example
![Page 61: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/61.jpg)
Mark phase with GC.start
![Page 62: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/62.jpg)
Ruby Heap after marking
![Page 63: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/63.jpg)
In the Sweep phase
- The collector sweeps "dead" objects.
  - "dead" means unmarked
  - "dead" means unreachable from Roots
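The two phases described above can be sketched as follows. This is an illustrative Python model, not CRuby's implementation: the heap is assumed to be a dictionary mapping each object name to the names it references, and `mark`, `sweep`, `heap`, and `roots` are hypothetical names chosen for this example.

```python
def mark(heap, roots):
    """Mark phase: collect every object reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])  # follow the object's outgoing references
    return marked


def sweep(heap, marked):
    """Sweep phase: free every object that was left unmarked."""
    dead = [obj for obj in heap if obj not in marked]
    for obj in dead:
        del heap[obj]
    return dead


# "c" and "d" reference each other but nothing reachable from the roots
# references them, so both are dead.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": []}
roots = ["a", "e"]

live = mark(heap, roots)
freed = sweep(heap, live)
print(sorted(live))   # ['a', 'b', 'e']
print(sorted(freed))  # ['c', 'd']
```

Note that the cycle between "c" and "d" is collected: reachability from Roots is what counts, not whether some other object still points at it.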
![Page 64: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/64.jpg)
Sweep phase
![Page 65: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/65.jpg)
Characteristics of CRuby's GC
![Page 66: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/66.jpg)
Characteristics
- The stop-the-world algorithm
- Single thread execution
![Page 67: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/67.jpg)
These days, PCs have multi-core processors. But,
- GC executes on a single thread.
- Other cores don't work during GC.
- What a waste!!
![Page 68: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/68.jpg)
How can we fix this?
![Page 69: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/69.jpg)
Use Parallel Marking, Luke
![Page 70: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/70.jpg)
What is Parallel Marking?
![Page 71: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/71.jpg)
What is Parallel Marking?
- The collector runs several marking processes in parallel by using native threads.
- We will be happy on a multi-core machine.
![Page 72: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/72.jpg)
Flow diagram for Parallel Marking
![Page 73: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/73.jpg)
BTW: Why not perform sweeping in parallel?
![Page 74: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/74.jpg)
Why not perform sweeping in parallel
- Sweeping is much faster than marking.
  - You can see ko1's research
    <URL:http://www.atdot.net/~ko1/diary/201011.html#d4>
![Page 75: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/75.jpg)
Why not perform sweeping in parallel
- So, Mark phase improvement = GC improvement.
- And we already have lazy sweeping.
![Page 76: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/76.jpg)
![Page 77: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/77.jpg)
What to consider when implementing Parallel Marking?
![Page 78: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/78.jpg)
We should consider two problems
- Workload balancing
- Wait-free algorithm
![Page 79: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/79.jpg)
Workload balancing
![Page 80: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/80.jpg)
How can we divide the marking task into sub-tasks?
![Page 81: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/81.jpg)
I tried to think about a simple approach.
![Page 82: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/82.jpg)
1 branch of Roots is marked by 1 thread.
![Page 83: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/83.jpg)
![Page 84: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/84.jpg)
![Page 85: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/85.jpg)
This means..
- Tasks are distributed to multiple threads.
- The task of marking the entire heap is divided into several tasks, each marking a single branch.
![Page 86: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/86.jpg)
This seems to be no problem.
![Page 87: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/87.jpg)
But actually, this solution suffers from a workload balancing problem.
![Page 88: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/88.jpg)
Each thread doesn't know what the other threads are doing.
![Page 89: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/89.jpg)
For instance, if A and B finish their work early,
![Page 90: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/90.jpg)
then, they will stop doing anything :(
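To make the imbalance concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that each thread marks exactly one branch and that marking one object costs one time unit; the function name `static_partition_stats` and the branch sizes are made up for this example.

```python
import math


def static_partition_stats(branch_sizes):
    """One thread per branch, no stealing: how long does marking take?"""
    # Everyone waits for the thread that owns the largest branch.
    finish_time = max(branch_sizes)
    # Time the other threads spend idle after finishing their own branch.
    idle_time = sum(finish_time - size for size in branch_sizes)
    # Lower bound if the same work could be split perfectly evenly.
    ideal_time = math.ceil(sum(branch_sizes) / len(branch_sizes))
    return finish_time, idle_time, ideal_time


# Threads A and B get small branches (2 and 3 objects), C gets 15 objects.
finish, idle, ideal = static_partition_stats([2, 3, 15])
print(finish, idle, ideal)  # 15 25 7
```

With these numbers, A and B together sit idle for 25 time units while C works alone, and the marking takes 15 units instead of roughly 7.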
![Page 91: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/91.jpg)
I think "machines should work forever" :D
![Page 92: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/92.jpg)
So, I think A and B should ...
![Page 93: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/93.jpg)
http://www.flickr.com/photos/ryanr/157458385/
![Page 94: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/94.jpg)
Parallel Marking with Task Stealing.
![Page 95: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/95.jpg)
If A and B finish their work early,
![Page 96: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/96.jpg)
![Page 97: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/97.jpg)
This is called "Task Stealing"
![Page 98: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/98.jpg)
![Page 99: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/99.jpg)
Wait-free algorithm
![Page 100: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/100.jpg)
What does "wait-free" mean?
- A wait-free program executes without blocking.
- It guarantees per-thread progress.
![Page 101: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/101.jpg)
Why is wait-free important?
![Page 102: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/102.jpg)
Amdahl's law
![Page 103: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/103.jpg)
Amdahl's law
is used to find the maximum expected improvement to an overall system when only part of the system is improved.
[cited from `Amdahl's law - Wikipedia']
![Page 104: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/104.jpg)
Amdahl's law is used in parallel computing
- If the parallel portion of the system is X%
- And the number of processors is Y,
- How much speedup can we expect?
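The question above has a closed-form answer. With a parallel fraction P (X% written as a value between 0 and 1) and N processors, Amdahl's law gives a speedup of 1 / ((1 - P) + P / N). The helper below is a small sketch of that formula; the function name `amdahl_speedup` and the sample values are chosen for this example only.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Expected speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Sample values: how the speedup grows with the number of processors.
for fraction in (0.5, 0.9, 1.0):
    row = [round(amdahl_speedup(fraction, n), 2) for n in (2, 4, 8)]
    print(fraction, row)
```

However many processors are added, the speedup never exceeds 1 / (1 - P), so even a small serial portion caps the benefit of extra cores.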
![Page 105: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/105.jpg)
![Page 106: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/106.jpg)
![Page 107: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/107.jpg)
![Page 108: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/108.jpg)
It's worse than expected, right?
![Page 109: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/109.jpg)
The conclusion so far
![Page 110: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/110.jpg)
The conclusion so far
- We should consider how we can efficiently balance workloads.
  - So, we use Task Stealing.
- We should eliminate non-parallel parts by using a wait-free algorithm.
![Page 111: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/111.jpg)
![Page 112: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/112.jpg)
How to implement Parallel Marking?
![Page 113: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/113.jpg)
Task Stealing
- In Task Stealing, threads steal tasks from each other.
- Task Stealing is achieved with Arora's Deque.
![Page 114: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/114.jpg)
Arora's Deque
- Deque stands for Double-Ended Queue.
- In Arora's Deque, the deque contains tasks as elements.
- It's a wait-free data structure.
![Page 115: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/115.jpg)
Arora's Deque has only three operations.
![Page 116: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/116.jpg)
![Page 117: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/117.jpg)
![Page 118: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/118.jpg)
![Page 119: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/119.jpg)
Each mark worker has a single deque.
![Page 120: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/120.jpg)
Only the owner can call pop() and push().
![Page 121: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/121.jpg)
A worker can call shift() to steal from another worker's deque.
![Page 122: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/122.jpg)
"Hey wait a minute, doesn't shift() have contention problems?"
![Page 123: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/123.jpg)
In what ways could shift() cause contention problems?
e.g...
- Multiple threads (workers) may call shift() on the same deque at the same time.
![Page 124: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/124.jpg)
In what ways could shift() cause contention problems?
e.g...
- shift() and pop() could be called at the same time when the deque has only one element.
![Page 125: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/125.jpg)
But, Arora's Deque avoids these contention problems.
![Page 126: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/126.jpg)
Serialization
- shift() is serialized by using CAS.
  - CAS = Compare And Swap
- And this serialization doesn't use a lock.
  - It's wait-free!!
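The idea of serializing shift() with CAS can be sketched as below. This is a deliberately reduced illustration and not the full Arora's Deque: it covers only concurrent stealing, not the owner's push() and pop(). Python has no user-level CAS instruction, so the hypothetical `AtomicInt` class emulates one with an internal lock; on real hardware that step is a single atomic instruction, and the stealing code itself never holds a lock. `StealOnlyDeque`, `EMPTY`, `ABORT`, and `run_thieves` are also names invented for this example.

```python
import threading


class AtomicInt:
    """Stand-in for a machine word with CAS; the lock only emulates atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


EMPTY = object()  # the deque has no tasks left
ABORT = object()  # another thief won the race; the caller may retry


class StealOnlyDeque:
    def __init__(self, tasks):
        self._tasks = list(tasks)
        self._top = AtomicInt(0)  # index of the next task to steal

    def shift(self):
        top = self._top.load()
        if top >= len(self._tasks):
            return EMPTY
        task = self._tasks[top]
        # Only one thief can move top from `top` to `top + 1`.
        if self._top.compare_and_swap(top, top + 1):
            return task
        return ABORT


def run_thieves(n_tasks=1000, n_thieves=4):
    deque = StealOnlyDeque(range(n_tasks))
    results = [[] for _ in range(n_thieves)]

    def steal_all(out):
        while True:
            task = deque.shift()
            if task is EMPTY:
                return
            if task is not ABORT:
                out.append(task)

    threads = [threading.Thread(target=steal_all, args=(r,)) for r in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


stolen = sorted(task for r in run_thieves() for task in r)
print(stolen == list(range(1000)))  # True: every task is stolen exactly once
```

A thief that loses the race gets `ABORT` back immediately instead of waiting for the winner, which is what lets every thread keep making progress.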
![Page 127: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/127.jpg)
I omit details of the implementation of the serialization.
![Page 128: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/128.jpg)
For the sake of this presentation, let's assume that Arora's Deque avoids contention problems.
![Page 129: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/129.jpg)
Summary for Arora's Deque
- A simple data structure for Task Stealing.
- Each worker has a single deque.
- Stealing (shift operation) is wait-free!
![Page 130: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/130.jpg)
How to use Arora's Deque in Parallel Marking?
![Page 131: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/131.jpg)
First try: A task is an object.
![Page 132: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/132.jpg)
Let's say that worker A has a branch that is composed of 4 objects.
![Page 133: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/133.jpg)
We start by marking A and pushing it to the deque.
![Page 134: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/134.jpg)
pop A, mark B and C, push B and C.
![Page 135: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/135.jpg)
pop C, mark D, push D
![Page 136: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/136.jpg)
pop D, pop B
![Page 137: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/137.jpg)
This is how a branch is marked.
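The walkthrough above can be sketched in a few lines. This is only an illustration: the object graph (A points to B and C, C points to D) is read off the slides, and Python's `collections.deque` stands in for Arora's Deque with no concurrency involved.

```python
from collections import deque

# Hypothetical object graph matching the slides.
children = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}


def mark_branch(root):
    marked = set()
    stack = deque()        # the worker's own deque, used as a marking stack
    pop_order = []

    marked.add(root)
    stack.append(root)     # push()
    while stack:
        obj = stack.pop()  # pop() takes the most recently pushed task
        pop_order.append(obj)
        for child in children[obj]:
            if child not in marked:
                marked.add(child)
                stack.append(child)
    return marked, pop_order


marked, pop_order = mark_branch("A")
```

Every object costs one push() and one pop(), which is what makes this granularity so fine.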
![Page 138: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/138.jpg)
How do you steal?
![Page 139: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/139.jpg)
Suppose that worker1 has tasks B and C, and worker2 has no tasks.
![Page 140: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/140.jpg)
Worker2 steals task B from worker1 by using shift().
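A minimal sketch of this steal, again with `collections.deque` as a single-threaded stand-in for Arora's Deque. The helper name `steal` is an assumption made for illustration.

```python
from collections import deque

worker1 = deque(["B", "C"])  # B was pushed first, C last
worker2 = deque()            # worker2 has no tasks


def steal(victim):
    # shift(): take from the end opposite to the owner's push()/pop() end.
    return victim.popleft() if victim else None


stolen = steal(worker1)
if stolen is not None:
    worker2.append(stolen)
```

The owner keeps working on C at one end while the thief takes B from the other end, so they only collide when a single task is left.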
![Page 141: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/141.jpg)
Summary
The marker uses Arora's Deque as a marking stack.
A "task" means an object. The granularity of the task is very fine.
This is a naive implementation.
![Page 142: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/142.jpg)
I implemented this approach.
![Page 143: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/143.jpg)
But..
![Page 144: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/144.jpg)
It's slower than the original GC.
![Page 145: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/145.jpg)
http://www.flickr.com/photos/emariephotos/4958245676/
OMG...
![Page 146: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/146.jpg)
I fell into the Pitfalls of Parallel Processing (PPP!!!)
![Page 147: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/147.jpg)
Why slow?
![Page 148: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/148.jpg)
Why slow?
pop(), push(), and shift() are called frequently, because the deque holds fine-grained tasks.
Their overhead is too big.
![Page 149: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/149.jpg)
How to fix this?
![Page 150: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/150.jpg)
We can make the tasks less fine-grained.
![Page 151: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/151.jpg)
A task is a branch
![Page 152: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/152.jpg)
All branches in Roots are divided roughly among the deques.
![Page 153: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/153.jpg)
Each Worker marks a branch in its deque.
![Page 154: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/154.jpg)
When the deque is empty, the worker steals a branch from another worker.
![Page 155: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/155.jpg)
like this!!
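The branch-per-task scheme can be sketched as follows. This is a turn-taking simulation, not the real threaded marker: the heap, the root names, and the deliberately uneven split of roots are all hypothetical and chosen so that one steal happens.

```python
from collections import deque

# Hypothetical heap: r1..r4 are roots, each starting one branch.
children = {
    "r1": ["a"], "a": ["b"], "b": [],
    "r2": [], "r3": ["c"], "c": [], "r4": [],
}


def mark_branch(root, marked):
    """Mark one whole branch using a private stack; no deque calls inside."""
    stack = [root]
    marked.add(root)
    while stack:
        obj = stack.pop()
        for child in children[obj]:
            if child not in marked:
                marked.add(child)
                stack.append(child)


def run_workers(deques):
    marked, removals, steals = set(), 0, 0
    # Workers take turns here; real markers would run on separate threads.
    while any(deques):
        for me, own in enumerate(deques):
            if own:
                branch = own.pop()                 # take from own deque
            else:
                victims = [d for i, d in enumerate(deques) if i != me and d]
                if not victims:
                    continue
                branch = victims[0].popleft()      # steal with shift()
                steals += 1
            removals += 1
            mark_branch(branch, marked)
    return marked, removals, steals


# A deliberately rough split of the roots: worker 0 gets three, worker 1 one.
deques = [deque(["r1", "r2", "r3"]), deque(["r4"])]
marked, removals, steals = run_workers(deques)
```

Seven objects are marked with only four removals from the shared deques, because the objects inside a branch never pass through a deque.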
![Page 156: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/156.jpg)
Good point & Bad point
The number of calls to the deque's operations is reduced.
The marking speed of each worker is improved.
However, coarse-grained tasks decrease parallelism.
![Page 157: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/157.jpg)
Why do coarse-grained tasks decrease parallelism?
![Page 158: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/158.jpg)
Tasks may involve a large branch.
![Page 159: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/159.jpg)
If an object in B's branch has many child objects..
![Page 160: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/160.jpg)
.. then A can't steal it while B is marking the large branch.
![Page 161: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/161.jpg)
So, the worker needs to treat large branches as special cases.
![Page 162: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/162.jpg)
Almost all large branches hold large Array objects and/or large Hash objects.
![Page 163: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/163.jpg)
Treatment for large Array objects and Hash objects
Each marker has a special deque to manage them.
A marker divides them into fixed-size tasks.
e.g. elements 0-9 of an Array, elements 10-19 of the Array...
![Page 164: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/164.jpg)
Treatment for Large Array and Hash
By doing this, other workers can steal the divided tasks.
This improves parallelism.
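Dividing a large Array into fixed-size tasks could look roughly like this. The function name `split_into_tasks` is hypothetical, and the chunk size of 10 is taken from the slide's example, not from the actual implementation.

```python
def split_into_tasks(length, chunk_size=10):
    """Return (start, end) index ranges, end exclusive, covering the array."""
    return [(start, min(start + chunk_size, length))
            for start in range(0, length, chunk_size)]


# A 25-element Array becomes three independent tasks.
tasks = split_into_tasks(25)
```

Each range is a separate task in the special deque, so an idle worker can steal one range while the owner is still marking another.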
![Page 165: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/165.jpg)
Summary
The naive implementation was slow.
The grain of the task was too fine.
A "task" means a branch in Roots. The grain of the task is coarse.
It's faster!!
![Page 166: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/166.jpg)
Today's topics
Why do we need Parallel Marking?
What to consider?
How to implement?
How much did performance improve?
![Page 167: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/167.jpg)
How much did performance improve?
![Page 168: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/168.jpg)
These are my machine specs
My machine has only 2 cores
Memory: 8GB
OS: Linux
![Page 169: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/169.jpg)
Parallel marking uses 4 marking threads.
![Page 170: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/170.jpg)
First benchmark program is
make benchmark
This is the benchmark used in CRuby development.
![Page 171: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/171.jpg)
![Page 172: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/172.jpg)
Why does this seem so slow?
I think it's affected by Parallel Marking's preparation.
e.g. creating marking threads, allocating deques.
![Page 173: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/173.jpg)
Why does this seem so slow?
In most of the benchmarks, there are few objects to mark.
In this case, Parallel Marking is relatively expensive.
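A toy cost model may help to see why a fixed preparation cost hurts when there are few objects to mark. All numbers below are hypothetical units chosen for illustration, not measurements, and the model assumes perfect load balance with no stealing overhead.

```python
def sequential_time(n_objects, cost_per_object=1.0):
    return n_objects * cost_per_object


def parallel_time(n_objects, n_workers, setup_cost, cost_per_object=1.0):
    # setup_cost stands for creating marking threads and allocating deques.
    return setup_cost + n_objects * cost_per_object / n_workers


# Few mark targets: the setup cost dominates.
small = (sequential_time(500), parallel_time(500, 2, setup_cost=1000.0))

# Many mark targets: the setup cost is amortized.
large = (sequential_time(100_000), parallel_time(100_000, 2, setup_cost=1000.0))
```

In this model the parallel version only wins once the marking work is large enough to pay back the setup cost.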
![Page 174: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/174.jpg)
Next benchmark program is
make rdoc
make rdoc generates the Ruby documentation.
This benchmark measures the total execution time and the GC execution time of make rdoc.
![Page 175: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/175.jpg)
make rdoc
It takes about 80 seconds on my machine.
In fact, 30% of that time is spent on GC!!
How much did performance improve?
![Page 176: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/176.jpg)
![Page 177: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/177.jpg)
Total GC time is improved by 40%!
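As a rough back-of-the-envelope estimate, the figures from the slides can be combined like this. It assumes that "improved by 40%" means the GC time shrank by 40% and that the rest of the run is unchanged; the slides do not state the resulting total, so the values here are derived, not measured.

```python
total_before = 80.0                        # seconds, "about 80 seconds"
gc_before = total_before * 30 / 100        # 30% of the run is GC
gc_after = gc_before * (100 - 40) / 100    # assumed reading of "improved by 40%"
total_after = total_before - gc_before + gc_after
overall_saving_percent = (total_before - total_after) / total_before * 100
```

Under these assumptions GC drops from about 24 seconds to about 14.4 seconds, so the whole run takes about 70.4 seconds, roughly a 12% saving overall.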
![Page 178: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/178.jpg)
So fast!!
![Page 179: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/179.jpg)
In a many-core environment
I expect we get a large improvement.
e.g. 8 cores, 16 cores...
But my machine has just 2 cores. I can't see it :(
![Page 180: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/180.jpg)
Best case for Parallel GC
If there are many objects. In this case, there are also many mark targets.
If the objects are long-lived. Server-side applications?
![Page 181: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/181.jpg)
Demo
![Page 182: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/182.jpg)
Demonstration
I want to show the performance improvement with Parallel GC.
This demonstration is in the style of a video game.
![Page 183: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/183.jpg)
Let me explain this game.
![Page 184: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/184.jpg)
The character has HP.
![Page 185: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/185.jpg)
When GC runs,
![Page 186: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/186.jpg)
the character loses HP while waiting for the GC to finish.
![Page 187: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/187.jpg)
We must reach the goal before the HP runs out.
![Page 188: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/188.jpg)
Other characteristics of SUPER NARIO GC
GC runs at fixed intervals.
A lot of objects are generated to increase the GC's burden.
Burden = Game Level
![Page 189: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/189.jpg)
Try to compare Original GC and Parallel GC
Original GC pause time is long. This game will be difficult.
Parallel GC pause time is short. This game will be easy.
![Page 190: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/190.jpg)
OK, Let's try!
![Page 191: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/191.jpg)
DEMO: Original GC version
![Page 192: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/192.jpg)
Oops.. so difficult!!!
![Page 193: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/193.jpg)
DEMO: Parallel GC version
![Page 194: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/194.jpg)
Wow!! Easy!!!!
![Page 195: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/195.jpg)
Let's compare the average GC times.
![Page 196: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/196.jpg)
![Page 197: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/197.jpg)
Fast!!
![Page 198: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/198.jpg)
Remaining Problems
![Page 199: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/199.jpg)
Windows OS is not supported
Mark Worker uses pthread as its native thread.
It also uses some gcc built-in functions.
But I'll support Windows eventually.
![Page 200: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/200.jpg)
Increased memory usage.
The size of one Deque is roughly 32KB.
But multi-core machines generally have plenty of memory.
So, I think it's OK :P
![Page 201: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/201.jpg)
Conclusion
![Page 202: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/202.jpg)
Conclusion
I implemented Parallel Marking GC.
GC was improved! I'll report to ruby-core soon.
![Page 203: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/203.jpg)
Conclusion
But Parallel Marking has some problems.
I'll fix these.
![Page 204: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/204.jpg)
source code
Parallel Marking GC <URL:https://github.com/authorNari/ruby/tree/pmark_div_root2>
SUPER NARIO GC <URL:https://github.com/authorNari/nario/>
![Page 205: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/205.jpg)
Acknowledgments
The following people helped me make this presentation!!
Tor-san!!
matz, shugo, yhara, sada, takaokouji, other co-workers!!
![Page 206: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/206.jpg)
Thank you!!!
![Page 207: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/207.jpg)
Do you have any questions?
Short and simple questions, please :)
![Page 208: Parallel worlds of CRuby's GC](https://reader034.vdocuments.us/reader034/viewer/2022051611/54b794704a7959db528b4b3e/html5/thumbnails/208.jpg)
Sorry
It's too difficult for me to understand/answer the question.
Could you send the question on Twitter (@nari_en)?