apache-cloudandfosslessonslearned-combined
DESCRIPTION
http://posscon.org/assets/Uploads/Apache-CloudandFOSSlessonslearned-combined.pdfTRANSCRIPT
Apache httpd 2.4Currently in beta release
Expected GA: This May!
Significant Improvements
high-performance
cloud suitability
Apache httpd 2.4
Support for async I/O w/o dropping support for older systems
Larger selection of usable MPMs: added Event, Simple, etc...
Leverages higher-performant versions of APR
Apache httpd 2.4
Bandwidth control now standard
Finer control of timeouts, esp. during requests
Controllable buffering of I/O
Support for Lua
Apache httpd 2.4Reverse Proxy Improvements
Supports FastCGI, SCGI
Additional load balancing mechanisms
Runtime changing of clusters w/o restarts
Support for dynamic configuration
mod_proxy
• An Apache module• Implements core proxy capability• Both forward and reverse proxy• In general, most people use it for reverse proxy
(gateway) functionality
How did we get here?• A stroll down mod_proxy lane– First available in Apache 1.1• “Experimental Caching Proxy Server”
– In Apache 1.2, pretty stable, but just HTTP/1.0– In Apache 1.3, much improved with added support
for HTTP/1.1– In Apache 2.0, break out cache and proxy– In Apache 2.2, lay framework
Proxy Improvements• Becoming a robust but generic proxy implementation• Support various protocols– HTTP, HTTPS, CONNECT, FTP– AJP, FastCGI, SCGI, WSGI (soon)– Load balancing
• Clustering, failover
AJP? Really?
• Yep, Apache can now talk AJP with Tomcat directly• mod_proxy_ajp is the magic mojo• Other proxy improvements make this even more
exciting• mod_jk alternative
But I like mod_jk• That’s fine, but...– Now the config is much easier and more
consistent ProxyPass /servlets ajp://tc.example.com:
8089
– Easier when Apache needs to proxy both HTTP and AJP– Leverage improvements in proxy module
Reverse Proxy
Internet
Firewall Firewall
Cloud
Reverse Proxy Server
Transactional Servers
Browser
• Operated at the server end of the transaction
• Completely transparent to the Web Browser – thinks the Reverse Proxy Server is the real server
Features of Reverse Proxy • Security
– Uniform security policy can be administered
– The real transactional servers are behind the firewall
• Delegation, Specialization, Load Balancing
Configuring Reverse Proxy
• Set ProxyRequests Off
• Apply ProxyPass, ProxyPassReverse and possibly RewriteRule directives
Reverse Proxy Directives:• Allows remote server to be mapped into the space of
the local (Reverse Proxy) server
• Example:
– ProxyPass /secure/ http://secureserver/
– Presumably “secureserver” is inaccessible directly from the internet
Reverse Proxy Directives:• Used to specify that redirects issued by the remote
server are to be translated to use the proxy before being returned to the client.
• Syntax is identical to ProxyPass; used in conjunction with it
• Example:– ProxyPass /secure/ http://secureserver/
Simple Rev Proxy• All requests for /images to a backend server
• ProxyPass /images http://images.example.com/
• ProxyPass <path> <scheme>://<full url>
• Useful, but limited
• What if:
– images.example.com dies?
– traffic for /images increases
Baby got back• We need more backend servers
• And balance the load between them
• Before 2.2, mod_rewrite was your only option
• Some people would prefer spending an evening with an Life Insurance salesman rather than deal with mod_rewrite
Load Balancer
• mod_proxy_balancer.so
• mod_proxy can do native load balancing
– weight by actual requests
– weight by traffic
– weight by busyness
– lbfactors
Load Balancer
• LB algorithms are implemented as providers
– easy to add
– no core code changes required
– growing list of methods
Load Balancer• Backend connection pooling
• Available for named workers:
– eg: ProxyPass /foo http://bar.example.com
• Reusable connection to origin
– For threaded MPMs, can adjust size of pool (min, max, smax)
– For prefork: singleton
• Shared data held in shared memory
Pooling example<Proxy balancer://foo>
BalancerMember http://www1.example.com:80/ loadfactor=1
BalancerMember http://www2.example.com:80/ loadfactor=1
BalancerMember http://www3.example.com:80/ loadfactor=4 status=+h
ProxySet lbmethod=bytraffic
</Proxy>
Load Balancer• Sticky session support
– aka “session affinity”
• Cookie based
– stickysession=PHPSESSID
– stickysession=JSESSIONID
• Natively easy with Tomcat
• May require more setup for “simple” HTTP proxying
Load Balancer• Cluster set with failover
• Group backend servers as numbered sets
– balancer will try lower-valued sets first
– If no workers are available, will try next set
• Hot standby
Example<Proxy balancer://foo>
BalancerMember http://php1:8080/ loadfactor=1
BalancerMember http://php2:8080/ loadfactor=4
BalancerMember http://phpbkup:8080/ loadfactor=4 status=+h
BalancerMember http://offsite1:8080/ lbset=1
BalancerMember http://offsite2:8080/ lbset=1
ProxySet lbmethod=bytraffic
</Proxy>
ProxyPass /apps/ balancer://foo/
Embedded Admin• Allows for real-time
–Monitoring of stats for each worker
– Adjustment of worker params• lbset
• load factor
• route
• enabled / disabled
• ...
Embedded Admin• Allows for real-time
• Addition of new workers/nodes
• Change of LB methods
• Can be persistent
• More RESTful
• Can be CLI-driven
Easy setup
<Location /balancer-manager>
SetHandler balancer-manager
Order Deny,Allow
Deny from all
Allow from 192.168.2.22
</Location>
Some tuning params• For workers:
– loadfactor• normalized load for worker [1]
– lbset• worker cluster number [0]
– retry• retry timeout, in seconds, for failed workers [60]
Some tuning params• For workers - connection pool:
–min• Initial number of connections [0]
–max• Hard maximum number of connections [1|TPC]
– smax:• soft max - keep this number available [max]
• time to live for connections above smax
Some tuning params• For workers - connection pool:
– disablereuse:
• bypass the connection pool
– ttl• time to live for connections above smax
Some tuning paramsFor workers (cont):– connectiontimeout/timout• Connection timeouts on backend [ProxyTimeout]
– flushpackets *• Does proxy need to flush data with each chunk of
data?– on : Yes | off : No | auto : wait and see
– flushwait *• ms to wait for data before flushing
Some tuning params
For workers (cont):– status (+/-)• D : disabled• S : Stopped• I : Ignore errors• H : Hot standby• E : Error
Some tuning paramsFor balancers:– lbmethod• load balancing algo to use [byrequests]
– stickysession• sticky session name (eg: PHPSESSIONID)
–maxattempts• failover tries before we bail
– Nofailover• Back-ends don't support failover so don't send session when failing over
Recent improvements• ProxyPassMatch– ProxyPass can now take regex’s instead of just
“paths”• ProxyPassMatch ^(/.*\.gif)$ http://backend.example.com$1
– JkMount migration• Or
– ProxyPass ~ ^(/.*\.gif)$ http://backend.example.com$1 • mod_rewrite is balancer aware
Recent improvements• ProxyPassReverse is NOW balancer aware!• The below will work:
<Proxy balancer://foo>
BalancerMember http://php1:8080/ loadfactor=1
BalancerMember http://php2:8080/ loadfactor=4
</Proxy>
ProxyPass /apps/ balancer://foo/
ProxyPassReverse /apps balancer://foo/
Useful Envars
• BALANCER_SESSION_STICKY– This is assigned the stickysession value used in the current request. It is
the cookie or parameter name used for sticky sessions
• BALANCER_SESSION_ROUTE– This is assigned the route parsed from the current request.
• BALANCER_NAME– This is assigned the name of the balancer used for the current request. The
value is something like balancer://foo.
Useful Envars• BALANCER_WORKER_NAME
– This is assigned the name of the worker used for the current request. The value is something like http://hostA:1234.
• BALANCER_WORKER_ROUTE– This is assigned the route of the worker that will be used for the current request.
• BALANCER_ROUTE_CHANGED– This is set to 1 if the session route does not match the worker route
(BALANCER_SESSION_ROUTE != BALANCER_WORKER_ROUTE) or the session does not yet have an established route. This can be used to determine when/if the client needs to be sent an updated route when sticky sessions are used.
Putting it all together<Proxy balancer://foo>
BalancerMember http://php1:8080/ loadfactor=1
BalancerMember http://php2:8080/ loadfactor=4
BalancerMember http://phpbkup:8080/ loadfactor=4 status=+h
BalancerMember http://phpexp:8080/ lbset=1
ProxySet lbmethod=bytraffic
</Proxy>
<Proxy balancer://javaapps>
BalancerMember ajp://tc1:8089/ loadfactor=1
BalancerMember ajp://tc2:8089/ loadfactor=4
ProxySet lbmethod=byrequests
</Proxy>
Putting it all together
ProxyPass /apps/ balancer://foo/
ProxyPass /serv/ balancer://javaapps/
ProxyPass /images/ http://images:8080/
Manipulating HTTP Headers:
• Modify HTTP request and response headers
– Can be used in Main server, Vhost, Directory, Location, Files sections
– Headers can be merged, replaced or removed
– Pass on client-specific data to the backend server
• IP Address, Request scheme (HTTP, HTTPS), UserAgent, SSL connection info, etc.
Manipulating HTTP Headers:
• Shield backend server’s info from the clients
– Strip out Server name
– Server IP address
– etc.
Header examples• Copy all request headers that begin with “TS” to response headers
– Header echo ^TS
• Say hello to Joe
– Header add JoeHeader “Hello Joe!”
• If header “MyRequestHeader: value” is present, response will contain “MyHeader” header:
– SetEnvIf MyRequestHeader value HAVE_MyRequestHeader
– Header add MyHeader “%D %t mytext” env=HAVE_MyRequestHeader
Header examples• Remember, sequence is important! Following will
result in “MHeader” to be stipped from the response:
– RequestHeader append MyHeader “value1”
– RequestHeader append MyHeader “value2”
– RequestHeader unset MyHeader
Example:• Pass additional info about Client Browsers to the App Server:
ProxyPass / http://backend.covalent.net
ProxyPassReverse / http://backend.covalent.net
RequestHeader set X-Forwarded-IP %{REMOTE_ADDR}e
RequestHeader set X-Request-Scheme %{REQUEST_SCHEME}e
• App Server receives the following HTTP headers:
– X-Forwarded-IP: 10.0.0.3
– X-Request-Scheme: https
Using mod-rewrite example# mod_proxy lb example using request parameter
RewriteEngine On
# Use mod_rewrite to insert a node name into the url
RewriteCond %{QUERY_STRING} accountId=.*([0-2])\b
RewriteRule ^/sampleApp/(.*) balancer://tc1/$1 [P]
RewriteCond %{QUERY_STRING} accountId=.*([3-6])\b
RewriteRule ^/sampleApp/(.*) balancer://tc2/$1 [P]
RewriteCond %{QUERY_STRING} accountId=.*([7-9])\b
RewriteRule ^/sampleApp/(.*) balancer://tc3/$1 [P]
# No ID - round robin to all nodes
ProxyPass /sampleApp/ balancer://all/
Using mod-rewrite example
<Proxy balancer://tc1>
# Default worker for this balancer
BalancerMember http://linux6401.dev.local:8080/sampleApp lbset=1
# Backup balancers for node failure - used in round robin
# no stickyness
BalancerMember http://linux6402.dev.local:8081/sampleApp lbset=1 status=H
BalancerMember http://linux6403.dev.local:8081/sampleApp lbset=1 status=H
# Maintenance balancer used to re-route traffic for upgrades etc
BalancerMember http://linux6404.dev.local:8080/sampleApp status=D
</Proxy>
Using mod-rewrite example<Proxy balancer://tc2>
BalancerMember http://linux6402.dev.local:8080/sampleApp lbset=1
# Backup balancers for node failure - used in round robin
# no stickyness
BalancerMember http://linux6401.dev.local:8081/sampleApp lbset=1 status=H
BalancerMember http://linux6403.dev.local:8081/sampleApp lbset=1 status=H
# Maintenance balancer used to re-route traffic for upgrades etc
BalancerMember http://linux6404.dev.local:8080/sampleApp status=D
</Proxy>
Using mod-rewrite example<Proxy balancer://tc3>
BalancerMember http://linux6403.dev.local:8080/sampleApp lbset=1
# Backup balancers for node failure - used in round robin
# no stickyness
BalancerMember http://linux6401.dev.local:8081/sampleApp lbset=1 status=H
BalancerMember http://linux6402.dev.local:8081/sampleApp lbset=1 status=H
# Maintenance balancer used to re-route traffic for upgrades etc
BalancerMember http://linux6404.dev.local:8080/sampleApp status=D
</Proxy>
Using mod-rewrite example<Proxy balancer://all>
BalancerMember http://linux6401:8080/sampleApp
BalancerMember http://linux6402:8080/sampleApp
BalancerMember http://linux6403:8080/sampleApp
</Proxy>
<Location /balancer-manager>
SetHandler balancer-manager
Order deny,allow
Deny from all
Allow from .dev.local
</Location>
What’s on the horizon?
• Improving AJP
• Adding additional protocols
• mass_vhost like clusters/proxies
• More dynamic configuration
Introduction
Jim Jagielski
Longest still-active developer/contributor
Co-founder of the ASF
Member, Director and President
The ASF
ASF == The Apache Software Foundation
Before the ASF there was “The Apache Group”
The ASF was incorporated in 1999
The ASFNon-profit corporation founded in 1999
501( c )3 charity
Volunteer organization
Virtual world-wide organization
Exists to provide the organizational, legal, and financial support for various OSS projects
The ASF’s MissionProvide open source software to the public free of charge
Provide a foundation for open, collaborative software development projects by supplying hardware, communication, and business infrastructure
Create an independent legal entity to which companies and individuals can donate resources and be assured that those resources will be used for the public benefit
The ASF’s MissionProvide a means for individual volunteers to be sheltered from legal suits directed at the Foundation’s projects
Protect the ‘Apache’ brand, as applied to its software products, from being abused by other organizations
Provide legal and technical infrastructure for open source software development and to perform appropriate oversight of
How We WorkThe Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are characterized by a collaborative, consensus based development process, an open and pragmatic software license, and a desire to create high quality software that leads the way in its field. We consider ourselves not simply a group of projects sharing a server, but rather a community of developers and users.
Structure of the ASF - dev
Volunteer Driven Organization
Software Projects are managed by Project Management Committees (PMCs)
PMCs vote in new PMC members and committers
At the end of the day: People / Individual focused
Structure of the ASF - legalMember-based corporation - individuals only
Members nominate and elect new members
Members elect a board - 9 seats
Semi-annual meetings via IRC
Each PMC has a Chair - eyes and ears of the board (oversight only)
ASF “Org Chart”Development Administrative
Users
Patchers/Buggers
Contributors
Committers
PMC Members
Members
Officers
Board
Issues with Dual Stacks
Despite clear differentiation, sometimes there are leaks
eg: PMC chair seen as “lead” developer
Sometimes officers are assumed to have too much power if they venture into development issues
“hats”
Why Open Source?Access to the source code
Avoid vendor lock-in (or worse!)
Much better software
Better security record (more eyes)
Much more nimble development - frequent releases
Direct user input
The draw of Open SourceHaving a real impact in the development and direction of IT
Personal satisfaction: I wrote that!
Sense of membership in a community
Sense of accomplishment - very quick turnaround times
Developers and engineers love to tinker - huge opportunity to do so
Open Source FUD
No quality or quality control
Prevents or slows development
Have to “give it away for free”
No real innovation
What is Open Source?Open Source Licensing
OSI Approved
Free Software
As in Free Speech, not Free Beer
What is Open Source?Basically, it’s a “new” way to develop, license and distribute code
Actually, there was “open source” even before it was called that
The key technologies behind the Internet and the Web are all Open Source based
True Open Source
For software to be Open Source, it must be under an OSI approved Open Source License
At last count, over 70 exist
Open Source LicensesGive Me Credit
AL, BSD, MIT
Give Me Fixes
(L)GPL, EPL, MPL
Give Me Everything
GPL
- Dave Johnson http://rollerweblogger.org/page/roller?entry=gimme_credit_gimme_fixes_gimmem
The Apache License (AL)A liberal open source software license - BSD-like
Business friendly
Requires attribution
Includes Patent Grant
Easily reused by other projects & organizations
Give Me Fixes
MPL / EPL / (L)GPL
Used mostly with platforms or libraries
Protects the licensed code, but allows larger derivative works with different licensing
Still very business friendly
Give Me EverythingGPL (copyleft)
Derivative works also under GPL
Linked works could also be under GPL
Viral nature may likely limit adoption
GPL trumps all others or else incompatible
License Differences
Mainly involve the licensing of derivative works
Only really applies during (re)distribution of work
Where the “freedom” should be mostly focused: the user or the code itself
One True License
There is no such thing
Licensing is selected to address what you are trying to do
In general, Open Standards do better with AL-like license
The Apache Way
Although the term is deprecated, “The Apache Way” relates to how the ASF (and its projects) work and operate
Basically, the least common denominators on how PMCs operate
Basic MemesMeritocracy
Peer-based
Consensus decision making
Collaborative development
Responsible oversight
Meritocracy
“Govern by Merit”
Merit is based on what you do
Merit never expires
Those with merit, get more responsibility
Peer-basedDevelopers represent themselves - individuals
Mutual trust and respect
All votes hold the same weight
Community created code
Healthy communities create healthy code
Poisonous communities don’t
Look Familiar?
These concepts are not new or unique
Best practices regarding how the Scientific and Health community works
Publish or Perish
In Open Source, frequent releases indicate healthy activity
What is collaborative s/w development other than peer review?
Think how restrictive research would be w/o open communication
Why Community -> CodeSince we are all volunteers, people’s time and interests change
A healthy community is “warm and inviting” and encourages a continued influx of developers
Poisonous people/communities turn people off, and the project will die
End result - better code, long-term code
Consensus decision makingKey is the idea of voting
+1 - yes
+0 - no real comment
-1 - veto
Sometimes you’ll also see stuff like -0, -0.5, etc...
Voting
The main intent is to gauge developer acceptance
Vetos must be justifiable and have sound technical merit
If valid, Vetos cannot be overruled
Vetos are very rare
Commit ProcessReview Then Commit (RTC)
A patch is submitted to the project for inclusion
If at least 3 +1s and no -1s, code is committed
Good for stable branches
Ensures enough “eyes on the code” on a direct-to-release path
Commit ProcessCommit Then Review (CTR)
A patch is committed directly to the code
Review Process happens post commit
Good for development branches
Depends on people doing reviews after the fact
Allows very fast development
Commit ProcessLazy Consensus
variant of RTC
“I plan on committing this in 3 days”
Provides opportunity for oversight, but with known “deadline”
As always, can be vetoed after the fact
Collaborative Development
Code is developed by the community
Voting ensures at least 3 active developers
Development done online and on-list
If it didn’t happen on-list, it didn’t happen
Collaborative Development
Mailing lists are the preferred method
Archived
Asynchronous
Available to anyone - public list
Collaborative DevelopmentOther methods are OK, if not primary
Wikis
IRC
F2F
Always bring back to the list
The Apache Incubator
Entry point for all new projects and codebases
Indoctrinates the Apache Way to the podling
Ensures and tracks IP
Contributor License Agreement
aka: iCLA (for individual)
Required of all committers
Guarantees:
The person has the authority to commit the code
That the ASF can relicense the code
Does NOT assign copyright
Success Stories - HTTPDApache HTTP Server (“Apache”)
Reference implementation of HTTP
Most popular web server in existance
Found in numerous commercial web servers
Oracle, IBM,...
Influenced countless more
Success Stories - HTTPDBy having a “free” and open source reference implementation, the drive to create a separate proprietary version was reduced.
“Why spend time and money, when we can use this”
This allowed HTTP (and the Web) to grow and STAY usable (compare to the old browser wars)
Success Stories - TomcatApache Tomcat (Servlet Container)
The default standard servlet container
Each version maps to a specific spec.
Bundled with numerous Java apps out there
Likely a major influence on the diminishing relevance of JEE
Success Stories - OthersApache Geronimo (Server Framework - JEE )
Not “just” a JEE server
Apache Maven (Java project build/management tool)
Another industry standard
Apache Logging, Axis, Struts, ...
Internal AdvantagesMuch tighter community
better communication (esp. with “users”)
tighter feedback loop
Ready made audience
Don’t need to “sell” the aspects of Open Source which scare/confuse IT people
Governance Gotchas
Trust and Merit are earned, which implies time
The available pool might be relatively small
Some Concluding ThoughtsTrust your developers AND your users
Communication is key
Open Source is NOT the Good Housekeeping Seal Of Approval
But don’t believe in all the FUD either
Success is not measured in market share, but in adoption
Some Concluding Thoughts
Open Source should have a viable business or emotional reason
Give some thought to licensing
Make it easier for developers and users to “join”
Give them a reason to
Helpful links
The Apache Software Foundation
www.apache.org
Want to help a great organization?
www.marylandstateboychoir.org