Scanning Strings at Supersonic Speed (EuRuKo 2011)


Upload: kornelius-kalnbach

Post on 07-Apr-2018


Page 1: Scanning Strings at Supersonic Speed (EuRuKo 2011)

8/6/2019 Scanning Strings at Supersonic Speed (EuRuKo 2011)

http://slidepdf.com/reader/full/scanning-strings-at-supersonic-speed-euruko-2011 1/46


Scanning Strings at Supersonic Speed


murphy (Kornelius Kalnbach)

[email protected]

@murphy_karasu

murfy

Everything that can go wrong will go wrong

sofatutor.com


Scanning

"Lorem ipsum dolor sit amet, consectetur adielit, sed do eiusmod tempor incididunt ut la

dolore magna aliqua. Ut enim ad minim venianostrud exercitation ullamco laboris nisi ut al

ea commodo consequat. Duis aute irure doreprehenderit in voluptate velit esse cillum d

fugiat nulla pariatur. Excepteur sint occaecat cnon proident, sunt in culpa qui officia deseru

anim id est lakorum."


Strings

"Lorem ipsum dolor sit amet, consectetur adielit, sed do eiusmod tempor incididunt ut la

dolore magna aliqua. Ut enim ad minim venianostrud exercitation ullamco laboris nisi ut al

ea commodo consequat. Duis aute irure doreprehenderit in voluptate velit esse cillum d

fugiat nulla pariatur. Excepteur sint occaecat cnon proident, sunt in culpa qui officia deserun

anim id est lakorum."


Strings

"<!DOCTYPE html><html lang="en"><head><charset="utf-8"><title>Agenda –EuRuKo 20title><link rel="stylesheet" href="/styleshe

screen.css"><link rel="stylesheet" href="fanc

jquery.fancybox-1.3.4.css"><linkrel="altertype="application/atom+xml" title="ATO

feed"href="http://euruko2011.org/feed.atomrel="alternate"type="application/atom+x

title="github feed"…".scan


Strings

"class Song
  def initialize name, author
    @name = name
    @author = author
  end
  def to_s
    "this is :#{@name}----#{@author
  end
end".scan


Supersonic

3KB

Mach 1.8 = 1900 km/h

2514 pages per second

7.5 MB/s
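
The figures on this slide fit together as simple arithmetic. The sketch below assumes (the slide does not say so) that "3KB" is the size of one page and that KB and MB are decimal units.

```ruby
# Assumption: one "page" is 3 KB, and 1 MB = 1000 KB.
page_size_kb     = 3
pages_per_second = 2514

throughput_mb_per_s = pages_per_second * page_size_kb / 1000.0
puts throughput_mb_per_s.round(1)  # prints 7.5
```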


Speed

Why?

• Syntax Highlighting

• Parsing (e.g. HAML)

• Rite


Shootout

12.283

3.565

11.571


Target

write a supersonic scanner with pure Ruby code


Resources

a fast machine


Resources

big examples

5+ seconds

here: 160 MB = 9 times ruby-head


Resources

rvm

rvm.beginr


Resources

← rvm.beginrescueend.com

Time

Endurance

Craziness


General Ideas

C

• avoid convenient APIs

• write everything yourself 

• write for the CPU


General Ideas

Ruby

• embrace the core libraries

• write less code

• write for the interpreter



String

s = "<head><title>EuRuKo 2011</title>"


String + RegExp

s = "<head><title>EuRuKo 2011</title>"

puts s.scan(/<(\w+)/)

head
title


String + RegExp

s = "<head><title>EuRuKo 2011</title>"

s.scan(/<(\w+)/) do |tag|
  puts tag
end

head
title
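
A small sketch of what String#scan hands back when the pattern contains a capture group, which may help when reading the two slides above: each match arrives as an array of its captures, and puts prints the elements of such an array one per line.

```ruby
s = "<head><title>EuRuKo 2011</title>"

# With one capture group, String#scan returns an array of one-element arrays.
nested = s.scan(/<(\w+)/)
p nested  # [["head"], ["title"]]

# flatten turns that into a plain list of tag names.
tags = nested.flatten
p tags    # ["head", "title"]
```

The closing tag is not reported because "</title>" has a slash, not a word character, right after the "<".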


StringScanner

Why?

• avoid big RegExp

• control the scan process

• use patterns depending on state

• create patterns on the fly
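
A minimal sketch of "patterns depending on state", using StringScanner from Ruby's standard library. The tokenize helper, the two states and the token format are invented for illustration and are not taken from the talk.

```ruby
require 'strscan'

# Hypothetical mini-tokenizer: which patterns are tried depends on the state.
def tokenize(html)
  scanner = StringScanner.new(html)
  state   = :text
  tokens  = []

  until scanner.eos?
    case state
    when :text
      if scanner.scan(/</)
        state = :tag                      # a tag starts here
      elsif (text = scanner.scan(/[^<]+/))
        tokens << [:text, text]
      end
    when :tag
      if (name = scanner.scan(%r{/?\w+}))
        tokens << [:tag, name]
      elsif scanner.scan(/>/)
        state = :text                     # tag finished, back to text
      else
        scanner.getch                     # skip anything unexpected inside a tag
      end
    end
  end

  tokens
end

p tokenize("<head><title>EuRuKo 2011</title>")
```

Each branch only tries the patterns that make sense in the current state, so no single large regular expression is needed.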


Benchmark

s = "<head><title>EuRuKo 2011</title>"

s *= 5_000_000

s.scan(/<(\w+)/) do |tag|
  tag
end


StringScanner

require 'strscan'

scanner = StringScanner.new(s)
until scanner.eos?
  if scanner.scan(/<(\w+)/)
    tag = scanner[1]
  else
    scanner.getch
  end
end


StringScanner on MacRuby

http://www.macruby.org/trac/ticket/938


StringScanner


sonic™   21.2
1.8.7    13.6
1.9.2     5.4
jruby     4.7
rbx      29.1
mac      34.7
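
A sketch of how such timings could be reproduced with Benchmark from the standard library. The slide gives no unit for the figures. Since 160 MB at 7.5 MB/s takes about 21.2 s, which matches the "sonic™" row, they are presumably seconds, but that is an inference. The repeat count is reduced here so the example finishes quickly.

```ruby
require 'benchmark'
require 'strscan'

s = "<head><title>EuRuKo 2011</title>"
s *= 50_000  # the slides use 5_000_000 (160 MB); smaller here on purpose

string_scan_time = Benchmark.realtime do
  s.scan(/<(\w+)/) { |tag| tag }
end

scanner_time = Benchmark.realtime do
  scanner = StringScanner.new(s)
  until scanner.eos?
    if scanner.scan(/<(\w+)/)
      tag = scanner[1]
    else
      scanner.getch
    end
  end
end

mb = s.bytesize / 1_000_000.0
printf "String#scan:   %.3f s (%.1f MB/s)\n", string_scan_time, mb / string_scan_time
printf "StringScanner: %.3f s (%.1f MB/s)\n", scanner_time, mb / scanner_time
```

Absolute numbers depend on the machine and the Ruby implementation, so only comparisons made on the same setup are meaningful.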


General Ideas

Ruby

• embrace the core libraries

• write less code

• write for the interpreter


Less Code

class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end

  def scan
    until @scanner.eos?
      if @scanner.scan(/<(\w+)/)
        yield @scanner[1]
      ...
  end
end


Less Code

class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end

  def scan
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
  end
end


Less Code

class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end
  delegate :eos?, :scan, :[]
  def scan
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
  end
end


Less Code

class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end
  delegate :eos?, :scan, :[]
  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
  end
end
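
The bare delegate line on the slide is not core Ruby (it resembles ActiveSupport's delegate helper, which also needs a to: target). A runnable sketch of the same step using Forwardable from the standard library follows. The else branch that skips one character is an assumption standing in for the elided "..." part, borrowed from the earlier StringScanner loop.

```ruby
require 'forwardable'
require 'strscan'

class HTML
  extend Forwardable

  # Forward the scanner methods used below to @scanner.
  def_delegators :@scanner, :eos?, :scan, :[], :getch

  def initialize(html)
    @scanner = StringScanner.new(html)
  end

  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      else
        getch  # assumption: skip one character when no tag starts here
      end
    end
  end
end

HTML.new("<head><title>EuRuKo 2011</title>").tokenize { |tag| puts tag }
```

Renaming the method to tokenize matters once scan is delegated: a method of the class's own called scan would shadow the delegated one and call itself.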


Less Code

class HTML < StringScanner
  def initialize html
    super
  end

  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
  end
end


Less Code

class HTML < StringScanner
  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
  end
end
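
A runnable sketch of this final form. The class is named HTMLTokenizer here (a hypothetical name, chosen to avoid clashing with the HTML class above), and the else branch again fills the elided part with an assumed "skip one character" step.

```ruby
require 'strscan'

# Hypothetical name; the slides call this class HTML.
class HTMLTokenizer < StringScanner
  def tokenize
    return enum_for(:tokenize) unless block_given?

    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      else
        getch  # assumption: the elided branch skips one character
      end
    end
  end
end

p HTMLTokenizer.new("<head><title>EuRuKo 2011</title>").tokenize.to_a
# ["head", "title"]
```

Because the class inherits from StringScanner, eos?, scan, [] and getch are available directly, with no constructor and no delegation.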


General Ideas

Ruby

• embrace the core libraries

• write less code

• write for the interpreter


Single Core

out = Encoder.new.encode Scanner.new(in)

• Scanner: simple, 9 rules

• Encoder: does nothing
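
A toy sketch of the single-core pipeline. Scanner and Encoder below are hypothetical stand-ins: the talk's Scanner has 9 rules and its code is not shown, and the each_token method is an invented interface. The variable is called input because in is a reserved word in Ruby.

```ruby
require 'strscan'

# Hypothetical stand-in: two rules instead of the talk's nine.
class Scanner < StringScanner
  def each_token
    until eos?
      if scan(/<(\w+)/)
        yield [:tag, self[1]]
      else
        yield [:other, getch]
      end
    end
  end
end

# "Does nothing": consumes every token and returns an empty string.
class Encoder
  def encode(scanner)
    scanner.each_token { |_kind, _text| }
    ''
  end
end

input = "<head><title>EuRuKo 2011</title>"
out = Encoder.new.encode Scanner.new(input)
p out  # ""
```

With an encoder that discards its input, the measured time is essentially the cost of scanning alone.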


threads = []
in.lines.each_slice 300_000 do |lines|
  threads << Thread.new do
    chunk = lines.join
    Thread.current[:out] = Encoder.new.encode Scanner…
  end
end
threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join


threads = []
chunk_offsets = [0]
in.lines.each_slice slice_size do |lines|
  chunk_offsets << chunk_offsets.last + lines.join.by…
end
chunk_offsets.each_cons(2) do |this_chunk, next_chunk|
  threads << Thread.new do
    chunk = code[this_chunk...next_chunk]
    Thread.current[:out] = Encoder.new.encode Scanner…
  end
end
threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join
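
A runnable sketch of the offset-based chunking, under stated assumptions: the clipped call is taken to be bytesize, chunks are cut with byteslice so that byte offsets stay valid for multibyte input, and a hypothetical process helper (upcasing the chunk) stands in for the Encoder and Scanner call so the result is easy to check.

```ruby
# Hypothetical stand-in for the per-chunk Encoder/Scanner work.
def process(chunk)
  chunk.upcase
end

code       = "line one\nline two\nline three\nline four\nline five\n"
slice_size = 2  # lines per chunk; the slides use far larger slices

# Byte offsets where each chunk starts, plus the end of the input.
chunk_offsets = [0]
code.lines.each_slice(slice_size) do |lines|
  chunk_offsets << chunk_offsets.last + lines.join.bytesize
end

# One thread per pair of neighbouring offsets.
threads = chunk_offsets.each_cons(2).map do |this_chunk, next_chunk|
  Thread.new do
    chunk = code.byteslice(this_chunk...next_chunk)
    Thread.current[:out] = process(chunk)
  end
end

threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join
puts out == code.upcase  # true
```

Chunks end on line boundaries, so no token that fits on one line is split between threads. On MRI the global interpreter lock limits how much CPU-bound work like this can run in parallel, so any speedup depends on the Ruby implementation.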



Questions?

not allowed:

• When will CodeRay 1.0 be released?


Thank you!

• @euruko

• @yukihiro_matz

• @bovensiepen

• @heinz_gies

• my girlfriend