Performance Optimization in Apache 2.0 Development:
How we made Apache faster, and what we learned from the experience
O’Reilly Open Source Convention, San Diego, CA, July 24, 2002
Agenda
• Introductions
• Performance optimization approach
  – Specific optimizations in Apache 2.0
  – General strategy for open-source software performance improvement
• Results and Next Steps
Goals for Apache 2.0 Performance
• Make the httpd faster
• But what does that mean?
  – How will we measure speed?
  – What are we willing to sacrifice for speed?
  – And why does performance matter?
Optimization Strategy: Part 1
Know your project’s priorities:
• Metrics that matter
• Rules of the game
Performance Guidelines
• Metrics that matter for Apache:
  – Throughput
    • HTTP requests per second
  – Resource utilization
    • CPU, memory
• Rules of the game for Apache:
  – Keep the server portable, reliable, configurable, maintainable, and compatible
Making Strategic Tradeoffs
• Use these metrics and rules to make effective tradeoffs
• Example: Table data structures
  – Slow, O(n)-time lookups; a significant bottleneck
  – But 3rd party code depended upon the array-based implementation (wasn’t well abstracted)
  – Solution: keep the O(n) design, but optimize it heavily (improve the throughput metric, but maintain compatibility)
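
The slides do not say which optimizations were applied to the table code. As an illustration only, here is one plausible way to speed up an O(n) scan without changing the array layout: store a cheap case-insensitive checksum next to each key, so most entries are rejected with an integer compare rather than a string compare. The names `key_checksum` and `ArrayTable` are hypothetical.

```python
def key_checksum(key):
    """Pack the first four characters, upper-cased, into one integer."""
    checksum = 0
    for i in range(4):
        code = ord(key[i].upper()) if i < len(key) else 0
        checksum = (checksum << 8) | (code & 0xFF)
    return checksum


class ArrayTable:
    """Array-backed, case-insensitive key/value table (illustrative sketch)."""

    def __init__(self):
        # Entries stay in insertion order, as in an array-based table.
        self.entries = []  # list of (checksum, key, value)

    def add(self, key, value):
        self.entries.append((key_checksum(key), key, value))

    def get(self, key):
        want = key_checksum(key)
        folded = key.lower()
        # Still a linear scan, but the string compare only runs
        # when the integer checksum already matches.
        for checksum, stored_key, value in self.entries:
            if checksum == want and stored_key.lower() == folded:
                return value
        return None
```

The lookup remains O(n), so code that walks the array directly keeps working; only the per-entry cost drops.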
Optimization Strategy: Part 2
Profile early, profile often
Profiling Tools
• We used traditional code profiling tools to find the slow functions and basic blocks
  – gprof
  – Quantify
  – OProfile
• Plus tracing tools to profile system calls
  – truss
  – strace
• And occasional custom instrumentation
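
The slides give no detail on the custom instrumentation. A minimal sketch of the general idea, assuming all that is needed is accumulated wall-clock time per labelled code region (the names `timed` and `report` are hypothetical):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # label -> accumulated seconds
counts = defaultdict(int)    # label -> number of timed entries


@contextmanager
def timed(label):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        counts[label] += 1


def report():
    """Return (label, calls, percent of measured time), slowest first."""
    grand_total = sum(totals.values()) or 1.0
    rows = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(label, counts[label], 100.0 * secs / grand_total)
            for label, secs in rows]
```

Wrapping a suspect region in `with timed("parse"):` and printing `report()` after a test run gives a ranking similar in spirit to a profiler's flat profile, restricted to the regions that were instrumented.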
Profile-Driven Optimization
• Profiling helps to create an informal roadmap:
  – Small problems: fix the code now
  – Medium problems: phase in API changes & faster code
  – Large problems: rearchitect
Profile-Driven Optimization
Apache 2.0 optimizations due to profiling, throughout the entire request processing flow:

Request processing flow:
• Accept Connection
• Read Request
• Create Request Data Structures
• Map URL to File
• Determine Content-Type
• Open File
• Stream Output Through Filters
• Send Response To Client
• Log Request

Optimizations:
• Faster accept(2) serialization
• Less buffer copying
• More scalable, multi-threaded memory allocator
• Faster MIME-type mapper and config merge
• Less string manipulation
• Complete rewrite of server-side-include parser
• Platform-specific socket I/O speedups
• Timestamp caching in access logger
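
Timestamp caching, from the list above, can be sketched as follows. The slide does not describe the actual implementation; this version assumes log timestamps have one-second resolution, so the formatted string can be reused for every request that arrives within the same second. The class name and the format string are illustrative.

```python
import time


class CachedTimestampFormatter:
    """Reformat the log timestamp only when the second changes."""

    def __init__(self, fmt="%d/%b/%Y:%H:%M:%S +0000"):
        self.fmt = fmt
        self._second = None
        self._text = ""
        self.format_calls = 0  # how often the slow path actually ran

    def format(self, now):
        second = int(now)
        if second != self._second:
            # Slow path: at most once per second, however many
            # requests are logged in that second.
            self._text = time.strftime(self.fmt, time.gmtime(second))
            self._second = second
            self.format_calls += 1
        return self._text
```

A server logging hundreds of requests per second would take the slow path once per second and return the cached string for every other request.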
Optimization Strategy: Part 3
Take advantage of improvements in the platform
Platform Optimizations
• 2.0 uses fast platform features if available:
  – sendfile(2)
  – unserialized or pthread-mutex-serialized accept(2)
  – Atomic operations
Platform Optimizations
• Apache Portable Runtime (APR) library abstracts the OS specifics
  – “Greatest common denominator” approach
  – Write your application code to use efficient OS features
  – On platforms where those features are not available, APR will emulate them
• In 2.0, the concurrency model is a plug-in
  – We can add better threading models for specific platforms
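
The "use the fast feature, emulate it elsewhere" pattern can be sketched in a few lines. This is not APR code: it is a Python illustration using `os.sendfile`, which exists only on platforms that provide the system call, with a read/write loop as the emulation. The function name `send_file` and the `force_fallback` parameter are hypothetical.

```python
import os
import socket
import tempfile


def send_file(sock, path, chunk_size=65536, force_fallback=False):
    """Send a whole file over a connected socket; return the bytes sent."""
    sent_total = 0
    with open(path, "rb") as f:
        if hasattr(os, "sendfile") and not force_fallback:
            # Fast path: the kernel copies file data straight to the socket.
            size = os.fstat(f.fileno()).st_size
            while sent_total < size:
                sent = os.sendfile(sock.fileno(), f.fileno(),
                                   sent_total, size - sent_total)
                if sent == 0:
                    break
                sent_total += sent
        else:
            # Emulation: copy through a user-space buffer.
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                sock.sendall(chunk)
                sent_total += len(chunk)
    return sent_total
```

Callers use one function and get the efficient path wherever it exists. Python's own `socket.sendfile()` method follows the same pattern.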
Optimization Strategy: Part 4
Use the power of distributed development
Distributed Development
• Just like open source debugging, open-source performance tuning scales well as more people work on a problem
• “Redundant” coding has worked well:
  – Multiple people implementing different approaches to the same problem
  – Share ideas, compare results, pick the best implementation
Distributed Optimization Example: SSI Parser

From: Brian Pane
Date: 2001-09-05 3:00:35
Subject: remaining CPU bottlenecks in 2.0

…Here are the top 30 functions, ranked according to their CPU utilization:

                     CPU time
function             (% of total)
--------             ------------
find_start_sequence  23.9
…
* find_start_sequence() is the main scanning function within mod_include. …
Distributed Optimization Example: SSI Parser

From: Justin Erenkrantz
Date: 2001-09-05 8:42:46
Subject: [PATCH] Potential replacement for find_start_sequence

…Basically, replace the inner search with a Rabin-Karp search…

From: Sander Striker
Date: 2001-09-05 8:47:59
Subject: Re: [PATCH] Potential replacement for find_start_sequence

…Rabin-Karp introduces a lot of * and %. I'll try Boyer-Moore with precalced tables for '<!--#' and '--->'…

From: Sascha Schumann
Date: 2001-09-05 10:51:53
Subject: Re: [PATCH] Potential replacement for find_start_sequence

…I'd suggest looking at BNDM which combines the advantages of bit-parallelism (shift-and/-or algorithms) and suffix automata…

From: Ian Holsman
Date: 2001-09-05 16:18:11
Subject: [PATCH] Potential replacement for find_start_sequence..--skip5

…I can post my code to the skip5 implementation. It isn't optimized yet, but in my tests I see a lower CPU utilization than the standard mod-includes parser…
Distributed Optimization Example: SSI Parser

From: Justin Erenkrantz
Date: 2001-09-05 19:08:31
Subject: [PATCH] Round 2 of mod_include/find_start_sequence...

…Replaced Rabin-Karp with the bndm algorithm as implemented by Sascha. Seems to work. Can people please test/review?…

• SSI parser performance improvement:
  – Before: 23.9% of total usr CPU time
  – After: 4.8%
• Greater than 4x improvement in 48 hours
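
For readers unfamiliar with BNDM (Backward Nondeterministic DAWG Matching), the following is a textbook-style sketch of the algorithm, not the mod_include code from the thread. It scans each window of the text backwards, keeps the set of still-possible pattern factors in a bit mask, and uses the longest pattern prefix seen in the window to decide how far to shift. The function name `bndm_find` is hypothetical.

```python
def bndm_find(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # masks[ch] has bit (m - 1 - i) set for every i with pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - i))
    high_bit = 1 << (m - 1)
    all_ones = (1 << m) - 1

    pos = 0
    while pos <= n - m:
        j = m          # characters of the window still unread
        shift = m      # default: skip the whole window
        state = all_ones
        while state:
            # Read the window right to left, narrowing the factor set.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high_bit:
                if j > 0:
                    # The suffix read so far is a pattern prefix:
                    # the next window can start here at the earliest.
                    shift = j
                else:
                    return pos  # the whole window matched
            state = (state << 1) & all_ones
        pos += shift
    return -1
```

For a short pattern such as `<!--#`, most windows are rejected after reading one or two characters, which is why this family of algorithms suits SSI scanning.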
Results
Results
Performance on a simple file delivery test:

Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections requesting 10KB non-parsed file over 100Mb/s switched network

| httpd  | Requests/sec | CPU Utilization |
|--------|--------------|-----------------|
| 1.3.24 | 777          | 61%             |
| 2.0.36 | 912          | 77%             |
Results
Performance on a server-parsed (.shtml) file test:

Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections over 100Mb/s switched network
  – .shtml file with virtual includes of five 2KB files

| httpd  | Requests/sec | CPU Utilization |
|--------|--------------|-----------------|
| 1.3.24 | 389          | 94%             |
| 2.0.37 | 712          | 93%             |
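
The two result tables can be reduced to relative figures with a little arithmetic. The sketch below takes the numbers from the slides and, as a simplifying assumption, treats CPU utilization divided by requests per second as a rough index of CPU cost per request. The function name `compare` is hypothetical.

```python
def compare(old, new):
    """Compare two (requests_per_sec, cpu_percent) measurements."""
    old_rps, old_cpu = old
    new_rps, new_cpu = new
    old_cost = old_cpu / old_rps  # CPU percentage points per request/sec
    new_cost = new_cpu / new_rps
    return {
        "throughput_ratio": new_rps / old_rps,
        "cpu_cost_ratio": new_cost / old_cost,
    }


static_file = compare(old=(777, 61), new=(912, 77))     # 1.3.24 vs 2.0.36
server_parsed = compare(old=(389, 94), new=(712, 93))   # 1.3.24 vs 2.0.37

for name, result in (("static file", static_file),
                     ("server-parsed", server_parsed)):
    print(f"{name}: throughput x{result['throughput_ratio']:.2f}, "
          f"CPU per request x{result['cpu_cost_ratio']:.2f}")
```

On these figures, 2.0 delivers about 17% more static-file requests per second at roughly 8% more CPU per request, and about 83% more server-parsed requests per second at roughly 46% less CPU per request.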
Conclusion
Next steps for Apache:
• Continue incremental performance improvements
• Explore highly scalable concurrency models (multiple connections per thread)
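
"Multiple connections per thread" usually means an event-driven loop that multiplexes many sockets in one thread. The slides do not describe a design, so the following is only a generic sketch using Python's `selectors` module; `serve_once` and its one-response-per-connection behaviour are illustrative assumptions.

```python
import selectors
import socket


def serve_once(connections, handler, timeout=1.0):
    """Answer one request on each connection from a single thread.

    Returns the number of connections that were answered.
    """
    sel = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    answered = 0
    pending = len(connections)
    while pending:
        events = sel.select(timeout=timeout)
        if not events:
            break  # nothing became readable in time; give up
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(handler(data))
                answered += 1
            sel.unregister(conn)
            pending -= 1
    sel.close()
    return answered
```

The thread blocks only in `select()`, so an idle connection costs a registered descriptor rather than a dedicated thread or process.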
Conclusion
Recommendations for other projects:
1. Know your project’s priorities:
   • Metrics that matter
   • Rules of the game
2. Profile early, profile often
3. Take advantage of platform improvements
4. Use the power of distributed development