devel::nytprof v5 at yapc::na 201406

72
Devel::NYTProf Perl Source Code Profiler Tim Bunce - YAPC::NA - 2014

Upload: tim-bunce

Post on 27-Aug-2014

437 views

Category:

Software


2 download

DESCRIPTION

Slides of my talk on Devel::NYTProf and optimizing perl code at YAPC::NA in June 2014. It covers use of NYTProf and outlines a multi-phase approach to optimizing your perl code. A video of the talk and questions is available at https://www.youtube.com/watch?v=T7EK6RZAnEA&list=UU7y4qaRSb5w2O8cCHOsKZDw

TRANSCRIPT

Page 1: Devel::NYTProf v5 at YAPC::NA 201406

Devel::NYTProfPerl Source Code Profiler

Tim Bunce - YAPC::NA - 2014

Page 2: Devel::NYTProf v5 at YAPC::NA 201406

Devel::DProf Is Broken$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

$ dprofpp -rTotal Elapsed Time = 0.108 Seconds Real Time = 0.108 SecondsExclusive Times%Time ExclSec CumulS #Calls sec/call Csec/c Name 9.26 0.010 0.010 1 0.0100 0.0100 main::s76 9.26 0.010 0.010 1 0.0100 0.0100 main::s323 9.26 0.010 0.010 1 0.0100 0.0100 main::s626 9.26 0.010 0.010 1 0.0100 0.0100 main::s936 0.00 - -0.000 1 - - main::s77 0.00 - -0.000 1 - - main::s82

Page 3: Devel::NYTProf v5 at YAPC::NA 201406

Profiling 101The Basics

Page 4: Devel::NYTProf v5 at YAPC::NA 201406

CPU Time Real Time

Subroutines

Statements

? ?? ?

What To Measure?

Page 5: Devel::NYTProf v5 at YAPC::NA 201406

Subroutine vs Statement• Subroutine Profiling- Measures time between subroutine entry and exit

- That’s the Inclusive time. Exclusive by subtraction.- Reasonably fast, reasonably small data files

• Problems- Can be confused by funky control flow (goto &sub)- No insight into where time spent within large subs

- Doesn’t measure code outside of a sub

Page 6: Devel::NYTProf v5 at YAPC::NA 201406

Subroutine vs Statement

• Line/Statement profiling- Measure time from start of one statement to the start

of the next statement, whereever that might be- Fine grained detail

• Problems- Very expensive in CPU & I/O- Assigns too much time in some cases- Too much detail for large subs- Hard to get overall subroutine times

Page 7: Devel::NYTProf v5 at YAPC::NA 201406

CPU Time vs Real Time• CPU Time- Measures time the CPU sent executing your code- Not (much) affected by other load on system- Doesn’t include time spent waiting for i/o etc.

• Real Time- Measures the elapsed time-of-day- Your time is affected by other load on system- Includes time spent waiting for i/o etc.

Page 8: Devel::NYTProf v5 at YAPC::NA 201406

Devel::NYTProf

Page 9: Devel::NYTProf v5 at YAPC::NA 201406

Public Service Announcement!

The NYTProf name is an accident of history I do not work for the New York TimesI have never worked for the New York TimesI have no affiliation with the New York TimesThe New York Times last contributed in 2008

Page 10: Devel::NYTProf v5 at YAPC::NA 201406

Running NYTProfperl -d:NYTProf ...

perl -MDevel::NYTProf ...

Configure profiler via the NYTPROF env varperldoc Devel::NYTProf for the details

To profile code that’s invoked elsewhere:PERL5OPT=-d:NYTProf

NYTPROF=file=/tmp/nytprof.out:addpid=1:...

Page 11: Devel::NYTProf v5 at YAPC::NA 201406

Reporting: KCachegrind

• KCachegrind call graph - new and cool- contributed by C. L. Kao.

- requires KCachegrind

$ nytprofcg # generates nytprof.callgraph

$ kcachegrind # load the file via the gui

Page 12: Devel::NYTProf v5 at YAPC::NA 201406

KCachegrind

Page 13: Devel::NYTProf v5 at YAPC::NA 201406

Reporting: HTML• HTML report

- page per source file, annotated with times and links

- subroutine index table with sortable columns- interactive Treemap of subroutine times

- generates Graphviz dot file of call graph- -m (--minimal) faster generation but less detailed

$ nytprofhtml # writes HTML report in ./nytprof/...

$ nytprofhtml --file=/tmp/nytprof.out.793 --open

Page 14: Devel::NYTProf v5 at YAPC::NA 201406
Page 15: Devel::NYTProf v5 at YAPC::NA 201406

Summary

Links to annotatedsource code

Timings for perl builtins

Link to sortable tableof all subs

Page 16: Devel::NYTProf v5 at YAPC::NA 201406

Inclusive vs Exclusive Time

Inclusivesub foo

Exclusive

sub bar

bar() bar()

foo()

Inclusive

Page 17: Devel::NYTProf v5 at YAPC::NA 201406

Inclusive vs. Exclusive

• Inclusive Time is best for Top Down- Overview of time spent “in and below this sub”

- Useful to prioritize structural optimizations

• Exclusive Time is best for Bottom Up- Detail of time spent “in the code of this sub”- Where the time actually gets spent

- Useful for localized (peephole) optimization

Page 18: Devel::NYTProf v5 at YAPC::NA 201406

Annotated Source View

Page 19: Devel::NYTProf v5 at YAPC::NA 201406

Overall time spent in and below this sub

(in + below)

Color coding based onMedian Average Deviationrelative to rest of this file

Timings for each location that calls this subroutine

Time between starting this perl statement and starting the next.So includes overhead of calls to

perl subs.Timings for each subroutine

called by each line

Page 20: Devel::NYTProf v5 at YAPC::NA 201406
Page 21: Devel::NYTProf v5 at YAPC::NA 201406

Boxes represent subroutinesColors only used to show

packages (and aren’t pretty yet)

Hover over box to see detailsClick to drill-down one level

in package hierarchy

Treemap showing relative proportions of exclusive time

Page 22: Devel::NYTProf v5 at YAPC::NA 201406

Calls between packages

Page 23: Devel::NYTProf v5 at YAPC::NA 201406

Calls to/from/within package

Page 24: Devel::NYTProf v5 at YAPC::NA 201406

Let’s take a look...

Page 25: Devel::NYTProf v5 at YAPC::NA 201406

DEMO

Page 26: Devel::NYTProf v5 at YAPC::NA 201406

OptimizingHints & Tips

Page 27: Devel::NYTProf v5 at YAPC::NA 201406

Do your own testingWith your own perl binary

On your own hardware

Beware My Examples!

Page 28: Devel::NYTProf v5 at YAPC::NA 201406

Take care comparing code fragments!Edge-effects at loop and scope boundaries.Statement time includes time getting to the next perl statement, wherever that may be.

Beware 2!

Page 29: Devel::NYTProf v5 at YAPC::NA 201406

Consider effect of CPU-level data and code cachingTends to make second case look faster!Swap the order to double-check alternatives

Beware Your Examples!

Page 30: Devel::NYTProf v5 at YAPC::NA 201406

Phase 0 Before you start

Page 31: Devel::NYTProf v5 at YAPC::NA 201406

DON’TDO IT!

Page 32: Devel::NYTProf v5 at YAPC::NA 201406

“The First Rule of Program Optimization: Don't do it.

The Second Rule of Program Optimization (for experts only!): Don't do it yet.”

- Michael A. Jackson

Page 33: Devel::NYTProf v5 at YAPC::NA 201406

Why not?

Page 34: Devel::NYTProf v5 at YAPC::NA 201406

“More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.”

- W.A. Wulf

Page 35: Devel::NYTProf v5 at YAPC::NA 201406

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

- Donald Knuth

Page 36: Devel::NYTProf v5 at YAPC::NA 201406

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.Yet we should not pass up our opportunities in that critical 3%.”

- Donald Knuth

Page 37: Devel::NYTProf v5 at YAPC::NA 201406

How?

Page 38: Devel::NYTProf v5 at YAPC::NA 201406

“Throw hardware at it!”

Hardware == CheapProgrammers == Expensive (& error prone)

Hardware upgrades are usually much less risky than software optimizations.

Page 39: Devel::NYTProf v5 at YAPC::NA 201406

“Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.”

- Rob Pike

Page 40: Devel::NYTProf v5 at YAPC::NA 201406

“Measure twice, cut once.”

- Old Carpenter’s Maxim

Page 41: Devel::NYTProf v5 at YAPC::NA 201406

Phase 1Low Hanging Fruit

Page 42: Devel::NYTProf v5 at YAPC::NA 201406

Low Hanging Fruit1. Profile code running representative workload.2. Look at Exclusive Time of subroutines.3. Do they look reasonable?4. Examine worst offenders.5. Fix only simple local problems.6. Profile again.7. Fast enough? Then STOP!8. Rinse and repeat once or twice, then move on.

Page 43: Devel::NYTProf v5 at YAPC::NA 201406

“Simple Local Fixes”

Changes unlikely to introduce bugs

Page 44: Devel::NYTProf v5 at YAPC::NA 201406

Move invariant expressionsout of loops

Page 45: Devel::NYTProf v5 at YAPC::NA 201406

Avoid->repeated->chains->of->accessors(...);

Avoid->repeated->chains->of->accessors(...);

Use a temporary variable

Page 46: Devel::NYTProf v5 at YAPC::NA 201406

Use faster accessorsClass::Accessor-> Class::Accessor::Fast--> Class::Accessor::Faster---> Class::Accessor::Fast::XS----> Class::XSAccessor

These aren’t all compatible so consider your actual usage.

(The list above is out of date.)

Page 47: Devel::NYTProf v5 at YAPC::NA 201406

Avoid calling subs that don’t do anything!my $unused_variable = $self->get_foo;

my $is_logging = $log->info(...);while (...) { $log->info(...) if $is_logging; ...}

Page 48: Devel::NYTProf v5 at YAPC::NA 201406

Exit subs and loops earlyDelay initializations

return if not ...a cheap test...;return if not ...a more expensive test...;my $foo = ...initializations...;...body of subroutine...

Page 49: Devel::NYTProf v5 at YAPC::NA 201406

Fix silly code

- return exists $nav_type{$country}{$key}- ? $nav_type{$country}{$key}- : undef;

+ return $nav_type{$country}{$key};

Page 50: Devel::NYTProf v5 at YAPC::NA 201406

Beware pathological regular expressions

Devel::NYTProf shows regular expression opcodes.Consider using no feature 'unicode_strings';

Page 51: Devel::NYTProf v5 at YAPC::NA 201406

Avoid unpacking argsin very hot subs

sub foo { shift->delegate(@_) }

sub bar { return shift->{bar} unless @_; return $_[0]->{bar} = $_[1]; }

Page 52: Devel::NYTProf v5 at YAPC::NA 201406

Avoid unnecessary(capturing parens)

in regex

Page 53: Devel::NYTProf v5 at YAPC::NA 201406

Retest.

Fast enough?

STOP!Put the profiler down and walk away

Page 54: Devel::NYTProf v5 at YAPC::NA 201406

Phase 2Deeper Changes

Page 55: Devel::NYTProf v5 at YAPC::NA 201406

Profile with aknown workload

E.g., 1000 identical requests

Page 56: Devel::NYTProf v5 at YAPC::NA 201406

Check subroutinecall counts

Reasonablefor the workload?

Page 57: Devel::NYTProf v5 at YAPC::NA 201406

Check Inclusive Times(especially top-level subs)

Reasonable percentagefor the workload?

Page 58: Devel::NYTProf v5 at YAPC::NA 201406

Add cachingif appropriate

to reduce calls

Remember cache invalidation!

Page 59: Devel::NYTProf v5 at YAPC::NA 201406

Walk up call chainto find good spots

for caching

Remember cache invalidation!

Page 60: Devel::NYTProf v5 at YAPC::NA 201406

Creating many objectsthat don’t get used?

Try a lightweight proxye.g. DateTime::Tiny, DateTimeX::Lite, DateTime::LazyInit

Page 61: Devel::NYTProf v5 at YAPC::NA 201406

Reconfigure your Perlcan yield useful gains with little effort

thread support costs ~2..30%debugging support costs ~15%

Also consider: usemymalloc, use64bitint, use64bitall, uselongdouble, optimize, disable taint mode.

Consider using a different compiler.

Page 62: Devel::NYTProf v5 at YAPC::NA 201406

Upgrade your Perl

Newer versions often faster at some things(though occasionally slower at others)

Sometimes have specific micro-optimizations

Many memory usage and performanceimprovements from 5.8 thru 5.20

Page 63: Devel::NYTProf v5 at YAPC::NA 201406

Retest.

Fast enough?

STOP!Put the profiler down and walk away.

Page 64: Devel::NYTProf v5 at YAPC::NA 201406

Phase 3Structural Changes

Page 65: Devel::NYTProf v5 at YAPC::NA 201406

Push loops down

- $object->walk($_) for @dogs;

+ $object->walk_these(\@dogs);

Page 66: Devel::NYTProf v5 at YAPC::NA 201406

Use faster modulessort ! Sort::Key

Storable ! Sereal

LWP ! HTTP::Tiny ! HTTP::Lite ! *::Curl ! Hijk

These aren’t all compatible or full-featured or ‘better’Consider your actual needsSee http://neilb.org/reviews/

Page 67: Devel::NYTProf v5 at YAPC::NA 201406

Change the data structure

hashes <–> arrays

Page 68: Devel::NYTProf v5 at YAPC::NA 201406

Change the algorithm

What’s the “Big O”?O(n2) or O(logn) or ...

Page 69: Devel::NYTProf v5 at YAPC::NA 201406

Rewrite hot-spotsin XS / C

Consider Inline::Cbut beware of deployment issues

Page 70: Devel::NYTProf v5 at YAPC::NA 201406

Small changes add up!

“I achieved my fast times by multitudes of 1% reductions”

- Bill Raymond

Page 71: Devel::NYTProf v5 at YAPC::NA 201406

See also “Top 10 Perl Performance Tips”

• A presentation by Perrin Harkins• Covers higher-level issues, including- Good DBI usage- Fastest modules for serialization, caching,

templating, HTTP requests etc.• http://docs.google.com/present/view?id=dhjsvwmm_26dk9btn3g

Page 72: Devel::NYTProf v5 at YAPC::NA 201406

Questions?

[email protected]://blog.timbunce.org

@timbunce on twitter