0.5 mln packets per second with Erlang
LINCX is an OpenFlow switch written in Erlang and running on LING (Erlang on Xen). It shows remarkable performance. The presentation discusses various speed-related optimizations.
Nov 22, 2014
Maxim Kharchenko
CTO/Cloudozer LLP
The road map
• Erlang on Xen intro
• LINCX project overview
• Speed-related notes
– Arguments are registers
– ETS tables are (mostly) ok
– Do not overuse records
– GC is key to speed
– gen_server vs. barebone process
– NIFs: more pain than gain
– Fast counters
– Static compiler?
• Q&A
Erlang on Xen a.k.a. LING
• A new Erlang platform that runs without OS
• Conceived in 2009
• Highly compatible with Erlang/OTP
• Built from scratch, not a “port”
• Optimized for low startup latency
• Open sourced in 2014 (github.com/cloudozer/ling)
• Local and remote builds
Go to erlangonxen.org
Zerg demo: zerg.erlangonxen.org
LINCX: project overview
• Started in December, 2013
• Initial scope = porting LINC-Switch to LING
• High degree of compatibility demonstrated for LING
• Extended scope = fix LINC-Switch fast path
• Beta version of LINCX open sourced on March 3, 2014
• LINCX runs 100x faster than the old code
LINCX repository: github.com/FlowForwarding/lincx
Raw network interfaces in Erlang
• LING adds raw network interfaces:
Port = net_vif:open("eth1", []),
port_command(Port, <<1,2,3>>),
receive
    {Port, {data, Frame}} ->
        ...
• Raw interface receives whole Ethernet frames
• LINCX uses standard gen_tcp for the control connection and net_vif for data ports
• Raw interfaces support the mailbox_limit option: packets get dropped if the mailbox of the receiving process overflows:
Port = net_vif:open("eth1", [{mailbox_limit, 16384}]),
...
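net_vif and mailbox_limit exist only in LING. On standard Erlang/OTP, a rough sender-side approximation of the same drop policy can be built from erlang:process_info/2. The sketch below illustrates the idea under that assumption; it is not how LING implements the option, and the module name bounded_send is hypothetical.

```erlang
-module(bounded_send).
-export([send/3]).

%% Deliver Msg to Pid only while Pid's mailbox holds fewer than Limit
%% messages; otherwise drop it. A rough, sender-side stand-in for LING's
%% mailbox_limit option (hypothetical helper, not part of OTP).
send(Pid, Msg, Limit) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len < Limit ->
            Pid ! Msg,
            ok;
        _FullOrDead ->
            %% Mailbox at the limit, or the process is gone (undefined).
            dropped
    end.
```

With several concurrent senders the check-then-send is racy, so the limit is approximate, and process_info/2 on another process has its own cost. A limit enforced inside the runtime, as LING does, avoids both problems.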
Testbed configuration
* Test traffic goes between vm1 and vm2
* LINCX runs as a separate Xen domain
* Virtual interfaces are bridged in Dom0
IXIA confirms 460kpps peak rate
• 1GbE hw NICs/128 byte packets
• IXIA packet generator/analyzer
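To put 460kpps in context, the arithmetic below compares it with the theoretical 1GbE line rate for 128-byte frames. It assumes the 128 bytes cover the whole Ethernet frame including FCS, and it adds the standard 20 bytes per frame of preamble, start-of-frame delimiter and inter-frame gap. The module name line_rate is hypothetical.

```erlang
-module(line_rate).
-export([bits_on_wire/2, max_pps/2]).

%% Per-frame wire overhead on Ethernet: 7-byte preamble + 1-byte SFD
%% + 12-byte inter-frame gap = 20 bytes.
-define(WIRE_OVERHEAD, 20).

%% Bits per second consumed on the wire by Pps frames of FrameBytes each.
bits_on_wire(Pps, FrameBytes) ->
    Pps * (FrameBytes + ?WIRE_OVERHEAD) * 8.

%% Theoretical maximum whole frames per second on a link of LinkBps.
max_pps(LinkBps, FrameBytes) ->
    LinkBps div ((FrameBytes + ?WIRE_OVERHEAD) * 8).
```

Under these assumptions, line_rate:bits_on_wire(460000, 128) gives 544640000 bits/s and line_rate:max_pps(1000000000, 128) gives 844594 frames/s. The measured peak is therefore roughly 54% of line rate.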
Processing delay and low-level stats
• LING can measure a processing delay for a packet:
1> ling:experimental(processing_delay, []).
Processing delay statistics:
Packets: 2000 Delay: 1.342us ± 0.143 (95%)
• LING can collect low-level stats for a network interface:
1> ling:experimental(llstat, 1). %% stop/display
Duration: 4868.6ms
RX: interrupts: 69170 (0 kicks 0.0%) (freq 14207.4/s period 70.4us)
RX: reqs per int: 0/0.0/0
RX: tx buf freed per int: 0/8.5/234
TX: outputs: 1479707 (112263 kicks 7.6) (freq 303928.8/s period 3.3us)
TX: tx buf freed per int: 0/0.6/113
TX: rates: 303.9kpps 3622.66Mbps avg pkt size 1489.9B
TX: drops: 12392 (freq 2545.3/s period 392.9us)
TX: drop rates: 2.5kpps 30.26Mbps avg pkt size 1486.0B
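The derived llstat fields follow from the raw counters and the duration. The freq is count divided by duration, the period is its reciprocal, and Mbps is the packet rate times the average size in bits. This is inferred from the numbers in the output above, not from LING's source. The helper below reproduces them; the module name llstat_math is hypothetical.

```erlang
-module(llstat_math).
-export([freq/2, period_us/1, mbps/2]).

%% Events per second, given an event count and a duration in milliseconds.
freq(Count, DurationMs) ->
    Count / (DurationMs / 1000).

%% Average gap between events, in microseconds.
period_us(Freq) ->
    1.0e6 / Freq.

%% Throughput in Mbit/s from a packet rate and an average packet size in bytes.
mbps(Pps, AvgBytes) ->
    Pps * AvgBytes * 8 / 1.0e6.
```

For example, llstat_math:freq(1479707, 4868.6) is about 303928.8 outputs/s, matching the TX line, and period_us/1 of that is about 3.29us.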
Arguments are registers
• Many arguments do not make a function any slower:
animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
    feed(Cat, Dog, Horse, Pig, Cow, State);
animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
    pet(Cat, Dog, Horse, Pig, Cow, State);
...
• But do not reshuffle arguments:
%% SLOW: every argument moves to a different register
animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
    feed(goat, Cat, Dog, Horse, Pig, Cow, State);
...
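A small experiment can be used to check the slide's claim on a given VM. In same_order/3 each tail call keeps N, Acc and Step in the same argument positions. In ping/3 and pong/3, N and Acc swap positions on every call, which typically forces extra register moves on BEAM before the jump. The module and function names are hypothetical, and the timing gap may be small or noisy; the point is the shape of the code, not a benchmark result.

```erlang
-module(arg_order).
-export([same_order/3, ping/3, bench/1]).

%% Arguments stay in the same positions across the tail call.
same_order(0, Acc, _Step) -> Acc;
same_order(N, Acc, Step) -> same_order(N - 1, Acc + Step, Step).

%% ping and pong take N and Acc in swapped positions, so every call
%% has to shuffle values between argument registers.
ping(0, Acc, _Step) -> Acc;
ping(N, Acc, Step) -> pong(Acc + Step, N - 1, Step).

pong(Acc, 0, _Step) -> Acc;
pong(Acc, N, Step) -> ping(N - 1, Acc + Step, Step).

%% Time both loops (microseconds); the match on R checks they agree.
bench(N) ->
    {T1, R} = timer:tc(fun() -> same_order(N, 0, 1) end),
    {T2, R} = timer:tc(fun() -> ping(N, 0, 1) end),
    {T1, T2}.
```

Both loops return N for a start of (N, 0, 1). Compiling with erlc -S and comparing the generated move instructions is a more direct way to see the difference than timing.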
ETS tables are (mostly) ok
• A small ETS table lookup = 10x function activations
• Do not use ets:tab2list() inside tight loops
• Treat ETS as a database; not a pool of global variables
• 1-2 ETS lookups on the fast path are ok
• Beware that ets:lookup() and similar functions copy the data onto the heap of the caller, as message passing does
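A sketch of the "database, not global variables" advice for a flow table. It does one ets:lookup/2 per packet on the fast path and contrasts it with an anti-pattern that copies the whole table with ets:tab2list/1 for every packet. The table layout ({Match, Action} pairs), the module name and the function names are hypothetical.

```erlang
-module(flow_lookup).
-export([new/0, add_flow/3, forward/2, forward_all/2, forward_slow/2]).

%% A public set table of {Match, Action} entries, tuned for concurrent reads.
new() ->
    ets:new(flows, [set, public, {read_concurrency, true}]).

add_flow(Tab, Match, Action) ->
    true = ets:insert(Tab, {Match, Action}),
    ok.

%% Fast path: a single lookup; only the matching entry is copied to our heap.
forward(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{_, Action}] -> Action;
        [] -> drop
    end.

forward_all(Tab, Keys) ->
    [forward(Tab, K) || K <- Keys].

%% Anti-pattern: copies the entire table onto the caller's heap per packet.
forward_slow(Tab, Key) ->
    case lists:keyfind(Key, 1, ets:tab2list(Tab)) of
        {_, Action} -> Action;
        false -> drop
    end.
```

Both forward/2 and forward_slow/2 return the same answer. The slow one does work and heap allocation proportional to the table size, which then shows up as extra GC.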
Do not overuse records
• setelement() creates a copy of the tuple
• State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?)
copies of the tuple
• Use tuples explicitly in performance-critical sections to control
the heap footprint of the code:
%% from 9p.erl
mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
mixer({rwrite,_,_}, _, initial) -> start_attaching;
mixer({rerror,_,_}, _, initial) -> auth_failed;
mixer({rlerror,_,_}, _, initial) -> auth_failed;
mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) ->
    {attach_more,Fid,AName,qid_type(Qid)};
mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
...
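The contrast between a record update and an explicit tuple can be seen side by side. Both functions below compute the same new state. How many intermediate copies the record update produces depends on the compiler version (hence the "(?)" above). The explicit tuple version allocates exactly one new 4-tuple. The record name follows the slide; the module and function names are hypothetical.

```erlang
-module(state_update).
-export([with_record/1, with_tuple/1]).

-record(state, {foo = 0, bar = 0, baz = 0}).

%% Multi-field record update: compiled into setelement-style operations.
with_record(S = #state{foo = F, bar = B, baz = Z}) ->
    S#state{foo = F + 1, bar = B + 1, baz = Z + 1}.

%% Same update written as an explicit tuple: one new tuple, built once.
with_tuple({state, F, B, Z}) ->
    {state, F + 1, B + 1, Z + 1}.
```

Because a record is just a tagged tuple, both functions accept {state, 0, 0, 0} and return {state, 1, 1, 1}.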
Garbage collection is key to speed
• Heap is a list of chunks
• 'new heap' is close to its head, 'old heap' close to its tail
• A GC run takes 10 μs on average
• GC may run thousands of times per second
How to tackle GC-related issues
• (Priority 1) Call erlang:garbage_collect() at strategic points
• (Priority 2) For the fastest code, avoid GC completely: restart the fast process regularly:
spawn(F, [{suppress_gc,true}]), %% LING-only
• (Priority 3) Use fullsweep_after option
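In standard Erlang/OTP, where suppress_gc is not available, priorities 1 and 3 can be combined roughly as follows. The worker is spawned with erlang:spawn_opt/2 using the fullsweep_after and min_heap_size options, and it forces a collection after every batch of packets. The module name, the batch size of 1000 and the option values are illustrative assumptions, not tuned numbers.

```erlang
-module(gc_worker).
-export([start/1]).

%% Hypothetical batch size: force a GC after this many packets.
-define(GC_EVERY, 1000).

%% Frequent fullsweeps and a larger initial heap (in words) are set per process.
start(Parent) ->
    spawn_opt(fun() -> loop(Parent, 0) end,
              [{fullsweep_after, 10}, {min_heap_size, 4096}]).

loop(Parent, Seen) ->
    receive
        {packet, Frame} when is_binary(Frame) ->
            Parent ! {done, byte_size(Frame)},
            case Seen + 1 of
                ?GC_EVERY ->
                    %% Collect at a point of our choosing, between packets.
                    erlang:garbage_collect(),
                    loop(Parent, 0);
                Next ->
                    loop(Parent, Next)
            end;
        stop ->
            ok
    end.
```

The "restart the process" variant of priority 2 would replace the explicit collection with handing over to a freshly spawned worker, so that the old heap is freed wholesale when the old process exits.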
gen_server vs barebone process
• Message passing using gen_server:call() is 2x slower than Pid ! Msg
• For speedy code prefer barebone processes to gen_servers
• Design Principles are about high availability, not high performance
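The two styles can be compared on the same echo request. bare_call/2 is a plain Pid ! Msg round trip with a reference tag. gs_call/2 goes through gen_server:call/2, which additionally sets up a monitor and a timeout; that extra machinery is where much of the cost goes. The module and function names are hypothetical.

```erlang
-module(echo).
-behaviour(gen_server).
-export([start_bare/0, bare_call/2, start_gs/0, gs_call/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Barebone process: a receive loop that echoes the payload back.
start_bare() ->
    spawn(fun bare_loop/0).

bare_loop() ->
    receive
        {From, Ref, Msg} ->
            From ! {Ref, Msg},
            bare_loop()
    end.

%% No monitor and no timeout: faster, but waits forever if the server dies.
bare_call(Pid, Msg) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Msg},
    receive
        {Ref, Reply} -> Reply
    end.

%% The same echo service as a gen_server.
start_gs() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    Pid.

gs_call(Pid, Msg) ->
    gen_server:call(Pid, Msg).

init([]) -> {ok, nostate}.
handle_call(Msg, _From, State) -> {reply, Msg, State}.
handle_cast(_Msg, State) -> {noreply, State}.
```

The trade-off matches the slide. The barebone version drops the failure handling that gen_server provides, which is acceptable on a fast path that is supervised some other way.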
NIFs: more pain than gain
• A new principle of Erlang development: do not use NIFs
• For a small performance boost, NIFs undermine key properties of
Erlang: reliability and soft-realtime guarantees
• Most of the time Erlang code can be made as fast as C
• Most performance problems of Erlang are traceable to NIFs or to external C libraries, which are similar
• Erlang on Xen does not have NIFs and we do not plan to add them
Fast counters
• 32-bit or 64-bit unsigned integer counters with overflow: trivial in C,
not easy in Erlang
• FIXNUMs are signed 29-bit integers, BIGNUMs consume heap and are
10-100x slower
• Use two variables for a counter?
foo(C1, 16#ffffff, ...) -> foo(C1+1, 0, ...);
foo(C1, C2, ...) -> foo(C1, C2+1, ...);
...
• LING has a new experimental feature – fast counters:
erlang:new_counter(Bits) -> Ref
erlang:increment_counter(Ref, Incr)
erlang:read_counter(Ref)
erlang:release_counter(Ref)
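The two-variable trick can be written as an ordinary tail-recursive loop, with the low part wrapping at 16#ffffff as on the slide. Both halves stay small fixnums and are passed as arguments, so no tuple is allocated per increment. For comparison, standard Erlang/OTP (since 21.2) ships a counters module whose 64-bit signed counters wrap around on overflow. It is a separate facility, not LING's erlang:new_counter/1. Module and function names below are hypothetical.

```erlang
-module(split_counter).
-export([count/3, value/2, shared_count/1]).

%% Low part wraps here, as in the slide's foo/N example.
-define(LO_MAX, 16#ffffff).

%% Count list elements using two small integers instead of one big one.
count([], Hi, Lo) -> {Hi, Lo};
count([_ | T], Hi, ?LO_MAX) -> count(T, Hi + 1, 0);
count([_ | T], Hi, Lo) -> count(T, Hi, Lo + 1).

%% Combine the halves only when the value is actually read.
value(Hi, Lo) ->
    Hi * (?LO_MAX + 1) + Lo.

%% Standard OTP alternative: a shared counter from the counters module.
shared_count(N) ->
    C = counters:new(1, [write_concurrency]),
    lists:foreach(fun(_) -> counters:add(C, 1, 1) end, lists:seq(1, N)),
    counters:get(C, 1).
```

Starting from {0, 16#ffffff}, two more increments give {1, 1}, and value(1, 1) is 16777217.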
Future: static compiler for Erlang
• Scalars and algebraic types
• Structural types only – no nominal types
• Target: compiler efficiency, not static type checking
• A middle ground between:
• “Type is a first class citizen” (Haskell)
• “A single type is good enough” (Python, Erlang)
Future: static compiler for Erlang - 2
• Challenges:
• Pattern matching compilation
• Type inference for recursive types, e.g.:
y = nil | {x, y}
y = {(unit | y), x, (unit | y)}
• Work started in 2013
• Currently the compiler is at the proof-of-concept stage
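The two recursive types from the slide can be written as ordinary Erlang type specs: a list y = nil | {x, y} and a binary tree y = {(unit | y), x, (unit | y)}. The sketch below does that and adds a few functions over each type. It only illustrates the shapes the static compiler would have to infer and says nothing about how that compiler works. All names are hypothetical.

```erlang
-module(rec_types).
-export([from_list/1, to_list/1, tree_insert/2, tree_to_list/1]).

%% y = nil | {x, y}
-type cons(X) :: nil | {X, cons(X)}.
%% y = {(unit | y), x, (unit | y)}
-type tree(X) :: {unit | tree(X), X, unit | tree(X)}.

-spec from_list([X]) -> cons(X).
from_list([]) -> nil;
from_list([H | T]) -> {H, from_list(T)}.

-spec to_list(cons(X)) -> [X].
to_list(nil) -> [];
to_list({H, T}) -> [H | to_list(T)].

%% Unbalanced binary search tree insert; unit marks an empty subtree.
-spec tree_insert(X, unit | tree(X)) -> tree(X).
tree_insert(V, unit) -> {unit, V, unit};
tree_insert(V, {L, X, R}) when V < X -> {tree_insert(V, L), X, R};
tree_insert(V, {L, X, R}) -> {L, X, tree_insert(V, R)}.

%% In-order traversal.
-spec tree_to_list(unit | tree(X)) -> [X].
tree_to_list(unit) -> [];
tree_to_list({L, X, R}) -> tree_to_list(L) ++ [X | tree_to_list(R)].
```

Folding tree_insert/2 over [3, 1, 5, 2] starting from unit and then calling tree_to_list/1 returns [1, 2, 3, 5].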