qconsp.com · Patterns for High Performance C# · Federico Lois
TRANSCRIPT
Patterns for High Performance C#
Federico Lois
Twitter: @federicolois
Github: redknightlois
Repo: performance-course
corvalius.com
“The best programs are written so that
computing machines can perform them
quickly and so that human beings can understand them clearly.”
Donald Knuth
Asymptotic Notation
“Big-O notation is a mathematical notation that
describes the limiting behavior of a function when the
argument tends towards a particular value or infinity.”
Bachmann–Landau notation
Big-O notation
• Instruction Counting. (Turing model)
• Simple, effective for most problems.
• Cache-Oblivious (based on RAM model)
• Incorporates a simple cache to the model.
• Doesn’t explicitly model its size.
• It can include the tall-cache assumption.
• Cache-Aware (based on RAM model)
• It explicitly models size, structure, and eviction policy.
• Theoretical analysis is pretty complex.
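The difference between the instruction-counting and cache-based models is easiest to see in code: the two loops below execute the same number of instructions and have the same Big-O, yet the access pattern decides the runtime. A minimal sketch (names and timings are illustrative; the actual numbers are machine-dependent):

```csharp
using System;
using System.Diagnostics;

class CacheDemo
{
    const int N = 2048;

    // Walks memory sequentially: every cache line that is fetched
    // is fully consumed before the next one is needed.
    public static long SumRowMajor(int[,] m)
    {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i, j];
        return sum;
    }

    // Strides N ints between accesses: most of each fetched cache
    // line is evicted before it is touched again.
    public static long SumColumnMajor(int[,] m)
    {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i, j];
        return sum;
    }

    static void Main()
    {
        var m = new int[N, N];

        var sw = Stopwatch.StartNew();
        long a = SumRowMajor(m);
        long rowMs = sw.ElapsedMilliseconds;

        sw.Restart();
        long b = SumColumnMajor(m);
        long colMs = sw.ElapsedMilliseconds;

        // Same instruction count, same Big-O; only the access pattern differs.
        Console.WriteLine($"row-major: {rowMs} ms, column-major: {colMs} ms, equal sums: {a == b}");
    }
}
```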
Big O in practice
• Useful to evaluate general behavior.
• Not necessarily a deal-breaker
• Guides your hypothesis.
• Usually will not represent the behavior
• Until sizes are big enough to dominate
• Which may never happen
• Simple models add uncertainty.
• Our job is to adjust those variables.
Performance Bounds
• Compute Bound
• Memory Bound
• Input/Output Bound
Pareto Rule (80-20)
20% of the code consumes 80% of the resources
…especially bad when they are in the critical path
Pareto²
20% of the 20% of the code consumes 64% of the resources
…around 4% of the code.
Pareto³
20% of the 20% of the 20% of the code consumes 51% of the resources
…roughly 0.8% of the code.
Pareto
Architecture / Network / Algorithm Optimization Land
• Choosing the wrong algorithm/data structure.
• Systems outgrowing design parameters.
• Chatty network interfaces: nano-services
• Physical (and not so physical) distance.
• CPU is doing nothing, nichts, nada!
Pareto²
Algorithm-Time Optimization Land
• Doing things more than once.
• CPU is doing stuff, just nothing useful (for you)!
• Memory pressure on GC or allocators
• Thread state hand-off
• Using data structures wrong
• I’m looking at you, int.GetHashCode() & long.GetHashCode()
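The jab at int.GetHashCode() and long.GetHashCode() refers to how trivial the built-in .NET implementations are: int hashes to itself, and long just XORs its two 32-bit halves, so structured keys can collide badly in hash-based containers. A quick check:

```csharp
using System;

class HashDemo
{
    static void Main()
    {
        // int.GetHashCode() is the identity function: the value itself.
        int i = 42;
        Console.WriteLine(i.GetHashCode() == i); // True

        // long.GetHashCode() XORs the low and high 32 bits, so any value
        // whose two halves are equal hashes to 0.
        long a = 0x00000001_00000001L;
        long b = 0x0000ABCD_0000ABCDL;
        Console.WriteLine(a.GetHashCode()); // 0
        Console.WriteLine(b.GetHashCode()); // 0
    }
}
```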
Pareto³
Micro Optimization Land
Voodoo Land
… function calls will hurt you
… code alignment will hurt you
… useless instructions will hurt you
… false sharing will hurt you
… cache line pollution will hurt you
… memory layout will hurt you
… loop size in bytes will hurt you
You get the idea
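One of these, false sharing, is simple to demonstrate: two threads incrementing counters that sit on the same cache line keep invalidating each other's copy of that line. A minimal sketch (my own illustration; it assumes a 64-byte cache line, and timings vary by machine):

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class FalseSharingDemo
{
    public const long Iterations = 10_000_000;

    // Each thread increments its own slot; only the distance between the
    // slots changes between runs.
    public static long TimeIncrements(long[] counters, int indexA, int indexB)
    {
        var sw = Stopwatch.StartNew();
        var t1 = Task.Run(() => { for (long i = 0; i < Iterations; i++) counters[indexA]++; });
        var t2 = Task.Run(() => { for (long i = 0; i < Iterations; i++) counters[indexB]++; });
        Task.WaitAll(t1, t2);
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var counters = new long[32];
        // Indices 0 and 1 are 8 bytes apart: same 64-byte cache line, so
        // every write by one thread invalidates the line for the other.
        long sharedMs = TimeIncrements(counters, 0, 1);
        // Indices 0 and 16 are 128 bytes apart: separate cache lines.
        long paddedMs = TimeIncrements(counters, 0, 16);
        Console.WriteLine($"same line: {sharedMs} ms, padded: {paddedMs} ms");
    }
}
```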
Secret Sauce for High Performance (???)
• Adopt laziness as a way of life.
• Why do things twice when you can do them once?
• Choose the right data structures/algorithms
• Avoid being chatty over the network (aka IO)
• Design for no less than 20x your expected requirements
• Diminish allocations (like the plague)
Measure, measure, and when you are sure, measure again!!
(just in case, you know!)
The End!
…of not talking about C#
High Performance C#
(even though most of it would apply to other platforms / frameworks / languages out there…)
IF-Switch
• If you know the statistical distribution
• IF tends to be more efficient, except when:
• You face a uniform distribution.
• You face a non-tail distribution.
• Switch builds a perfect hash
• Unless values are consecutive (then it emits a jump table).
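A sketch of the two shapes these bullets describe (the methods, frequencies, and return values are hypothetical, made up for illustration):

```csharp
using System;

class BranchDemo
{
    // With a skewed distribution, an if-chain ordered by frequency answers
    // most calls on the first comparison: a cheap, well-predicted branch.
    public static int ClassifySkewed(int opcode)
    {
        if (opcode == 7) return 0;  // assume ~90% of the traffic: checked first
        if (opcode == 3) return 1;  // assume ~9%
        return 2;                   // everything else
    }

    // With consecutive case values the compiler can emit a jump table,
    // which costs the same no matter which case is taken: a better fit
    // for a uniform distribution.
    public static int ClassifyUniform(int opcode)
    {
        switch (opcode)
        {
            case 0: return 10;
            case 1: return 11;
            case 2: return 12;
            case 3: return 13;
            default: return -1;
        }
    }

    static void Main()
    {
        Console.WriteLine(ClassifySkewed(7));  // 0
        Console.WriteLine(ClassifyUniform(2)); // 12
    }
}
```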
Try-Catch
[code slide: a method wrapped in a try-catch ("CanThrow")]
[code slide: the same method without the try-catch]
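The code from these slides isn't in the transcript; a minimal sketch of the usual contrast, with hypothetical method names. A try-catch in a method body has historically blocked JIT inlining and constrained the optimizer, so the non-throwing variant keeps the handler out of the hot path entirely:

```csharp
using System;

class TryCatchDemo
{
    // The exception handler lives in the method body, which historically
    // prevents the JIT from inlining this method into its callers.
    public static int ParseWithHandler(string s)
    {
        try { return int.Parse(s); }
        catch (FormatException) { return -1; }
    }

    // The non-throwing path needs no handler at all.
    public static int ParseFast(string s)
        => int.TryParse(s, out int value) ? value : -1;

    static void Main()
    {
        Console.WriteLine(ParseWithHandler("123")); // 123
        Console.WriteLine(ParseFast("oops"));       // -1
    }
}
```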
Interfaces vs Class vs Struct
• Stack Allocation vs Heap Allocation
• At least in C#, etc.
• Accessing a struct via an interface will allocate on the Heap.
• Aka Boxing
• Structs are subject to special optimizations.
• You can abuse the Dead Code Elimination mechanism to do simple metaprogramming.
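The boxing point in code, with made-up types: passing a struct through an interface-typed parameter boxes it onto the heap, while a generic constrained to the same interface keeps it on the stack and lets the JIT specialize (and devirtualize) per struct type:

```csharp
using System;

interface IMeasurable { int Size { get; } }

struct Packet : IMeasurable
{
    public int Size => 64;
}

class BoxingDemo
{
    // Taking the interface boxes the struct: one heap allocation per call.
    public static int ViaInterface(IMeasurable m) => m.Size;

    // A constrained generic keeps the struct unboxed; the JIT generates a
    // specialized body for Packet and can devirtualize the Size call.
    public static int ViaConstrainedGeneric<T>(T m) where T : IMeasurable => m.Size;

    static void Main()
    {
        var p = new Packet();
        Console.WriteLine(ViaInterface(p));          // 64 (boxes)
        Console.WriteLine(ViaConstrainedGeneric(p)); // 64 (no allocation)
    }
}
```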
Allocations
• Pooling
• Generalized
• Contextual
• Per operation
• Stack Allocations
• Fixed
• Structs
• Ref/Out Trick
• Ref Return (C# 7)
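Two of the techniques above in a short sketch (my own illustration): generalized pooling via ArrayPool, and a stack allocation via stackalloc and Span, both of which keep per-operation buffers off the GC heap:

```csharp
using System;
using System.Buffers;

class AllocationDemo
{
    // Generalized pooling: rent from the shared pool instead of `new`,
    // and always return the buffer, even on the exceptional path.
    public static int SumPooled(int count)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(count);
        try
        {
            for (int i = 0; i < count; i++) buffer[i] = i;
            int sum = 0;
            for (int i = 0; i < count; i++) sum += buffer[i];
            return sum;
        }
        finally
        {
            ArrayPool<int>.Shared.Return(buffer);
        }
    }

    // Stack allocation: zero GC pressure for small, scoped buffers.
    public static int SumOnStack(int count)
    {
        Span<int> buffer = stackalloc int[count];
        for (int i = 0; i < count; i++) buffer[i] = i;
        int sum = 0;
        foreach (int v in buffer) sum += v;
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumPooled(100));  // 4950
        Console.WriteLine(SumOnStack(100)); // 4950
    }
}
```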
Inlining
• Compilers do it, and allow you to suggest targets to them
• Avoiding the call helps you because it diminishes:
• Instruction Cache Misses
• Push/pop on the stack
• Number of retired instructions
• Call context changes at the processor
• Avoiding the call increases:
• Caller size in bytes
• Locality of reference
• Collateral effects (among others):
• Dead code elimination
• Constant propagation
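In C# the suggestion mechanism is MethodImplOptions. A sketch with hypothetical helpers: AggressiveInlining for a tiny hot method, and NoInlining to keep cold code (like a throw helper) from bloating the caller:

```csharp
using System;
using System.Runtime.CompilerServices;

class InliningDemo
{
    // Suggest a target to the JIT: worthwhile for tiny, hot helpers.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int Square(int x) => x * x;

    // The opposite hint keeps cold code out of the caller,
    // shrinking the caller's size in bytes.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void ThrowOutOfRange() => throw new ArgumentOutOfRangeException();

    public static int Checked(int x)
    {
        if (x < 0) ThrowOutOfRange();
        return Square(x); // after inlining, no call overhead remains here
    }

    static void Main() => Console.WriteLine(Checked(12)); // 144
}
```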
Inlining – Call Cost
Inlining - Virtual Calls
• They can’t be removed without devirtualization.
• The cost of a virtual call is higher than that of a static one.
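One way to help the JIT devirtualize, sketched with made-up types: sealing a class tells the JIT no further override can exist, so a call through the concrete type can become a direct (and inlinable) call instead of vtable dispatch:

```csharp
using System;

abstract class Shape { public abstract double Area(); }

// `sealed` guarantees no subclass overrides Area, so a call through a
// Circle-typed reference can be devirtualized and then inlined.
sealed class Circle : Shape
{
    public double Radius;
    public override double Area() => Math.PI * Radius * Radius;
}

class DevirtDemo
{
    public static double AreaVirtual(Shape s) => s.Area();  // vtable dispatch
    public static double AreaSealed(Circle c) => c.Area();  // direct call possible

    static void Main()
    {
        var c = new Circle { Radius = 1 };
        Console.WriteLine(AreaVirtual(c) == AreaSealed(c)); // True
    }
}
```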
Constant Propagation
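The original code slides aren't captured in the transcript; a small sketch of the idea with hypothetical names. Once a small method is inlined at a call site with a constant argument, the JIT sees only constants and can fold the whole computation away:

```csharp
using System;

class ConstantPropagationDemo
{
    const int BlockSize = 4096;

    // Small enough to inline; once inlined with a constant argument,
    // the loop operates on known values only.
    public static int Log2(int value)
    {
        int bits = 0;
        while (value > 1) { value >>= 1; bits++; }
        return bits;
    }

    static void Main()
    {
        // BlockSize is a compile-time constant, so after inlining the JIT
        // can propagate it through Log2 and emit the literal result.
        Console.WriteLine(Log2(BlockSize)); // 12
    }
}
```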
Simple Metaprogramming
Comparers
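The comparer slides aren't in the transcript; a sketch of the pattern they typically illustrate (types and names are mine), combining struct comparers with the dead-code-elimination metaprogramming mentioned earlier. The JIT specializes generic code per struct type argument, so a branch on a struct "flag" type resolves at JIT time and the dead arm disappears:

```csharp
using System;
using System.Collections.Generic;

// Struct "flag" types: their property values are JIT-time constants
// for each generic instantiation.
interface ICaseOption { bool IgnoreCase { get; } }
struct CaseSensitive : ICaseOption { public bool IgnoreCase => false; }
struct CaseInsensitive : ICaseOption { public bool IgnoreCase => true; }

// A struct comparer avoids interface dispatch when used as a generic
// argument (e.g. in a Dictionary<string, T>).
struct NameComparer<TCase> : IEqualityComparer<string>
    where TCase : struct, ICaseOption
{
    public bool Equals(string x, string y)
    {
        // `default(TCase).IgnoreCase` folds to a constant per
        // instantiation, so one branch is eliminated as dead code.
        if (default(TCase).IgnoreCase)
            return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
        => default(TCase).IgnoreCase
            ? StringComparer.OrdinalIgnoreCase.GetHashCode(s)
            : StringComparer.Ordinal.GetHashCode(s);
}

class ComparerDemo
{
    static void Main()
    {
        var strict = new NameComparer<CaseSensitive>();
        var loose  = new NameComparer<CaseInsensitive>();
        Console.WriteLine(strict.Equals("Knuth", "knuth")); // False
        Console.WriteLine(loose.Equals("Knuth", "knuth"));  // True
    }
}
```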
Code Flow
But is all this work really worth the trouble?
Thanks for coming!