
Post on 19-Jan-2016








Web Servers

Guntis Bārzdiņš, Artūrs Lavrenovs, Normunds Grūzītis

What a basic web server does


● Implements the HTTP protocol

● Listens for HTTP requests from clients (e.g. browsers)

● Tries to fulfill them with static content from the file system

● A web server itself serves only static files

● Receives content from clients (e.g. via HTML forms, incl. uploading of files)

● Forwards dynamic content requests for external execution

● Does other useful tasks via extension modules
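The static-content part of this list can be sketched in a few lines. This is a minimal illustration, not how any particular server implements it: the function name `serve_static`, the `index.html` default document and the exact status bodies are assumptions made for the example.

```python
import os


def serve_static(doc_root, url_path):
    """Map a request path to a file under doc_root; return (status, body)."""
    # A static file server ignores the query string.
    path = url_path.split("?", 1)[0]
    # Resolve the path and refuse anything that escapes the document root.
    root = os.path.realpath(doc_root)
    full = os.path.realpath(os.path.join(root, path.lstrip("/")))
    if os.path.commonpath([root, full]) != root:
        return 403, b"Forbidden"
    # Assumed default document for directory requests.
    if os.path.isdir(full):
        full = os.path.join(full, "index.html")
    if not os.path.isfile(full):
        return 404, b"Not Found"
    with open(full, "rb") as f:
        return 200, f.read()
```

Everything beyond this lookup (forms, uploads, dynamic content) is either forwarded elsewhere or handled by modules, as the list says.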

Web server market share

F: Apache 1.1, modules supported
H: Apache supports HTTP/1.1 virtual hosting
I: Microsoft IIS/4.0 and Active Server Pages
M: Apache 2.0
Q: Microsoft .NET framework
N, O, R: Code Red worm, Nimda worm, SQL Slammer worm
V: Google App Engine
W: Microsoft Hyper-V

Jun 2015


Apache HTTP Server

Has consistently been the most popular server

Highly configurable and extensible (compiled modules)

Runs on many operating systems (primarily on Unix)

SSL / TLS support

Supports various authentication schemes

Flexible URL rewriting and aliasing

Virtual Hosts

Custom log files, etc.

Apache modules

mod_access Access control based on client hostname or IP address

mod_alias Mapping different parts of the host filesystem in the document tree, and URL redirection

mod_auth_xxx Various user authentication approaches (file, dbm, form, etc.)

mod_autoindex Automatic directory listings

mod_cgi Execution of CGI scripts

Apache modules

mod_include Server-parsed documents (SSI)

mod_mime Determining document types using file extensions

mod_proxy Caching proxy abilities

mod_rewrite Powerful URI-to-filename mapping using regular expressions

mod_usertrack User tracking using Cookies

Apache modules

mod_ssl Provides strong cryptography via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols with the help of the open-source SSL/TLS toolkit OpenSSL

Since Apache 1.3+ (1998)

Latest version: Apache 2.4 (since 2012)

Private and public keys; certificates from authorities such as Thawte (thawte.com), Verisign (verisign.com)

Apache modules

Third-party modules for server-side scripting:

mod_php Executes PHP within Apache

mod_python Executes Python within Apache

mod_ruby Executes Ruby within Apache

mod_jk Connects Tomcat with Apache


Compiling and installing Apache

./configure --enable-layout=Debian

Use the Debian-style directory layout

--enable-suexec Allows you to set the uid and gid for spawned processes (CGI, SSI)

--enable-MODULE=shared Compiles, installs and adds the module as a shared object (.so)

--disable-MODULE Some modules are compiled by default (e.g. autoindex, cgi) and have to be disabled explicitly

Alternative to compiling: install prebuilt packages, e.g. apt-get install <module>

Apache directory layout

(Figure: directory listing annotated with the following labels)

● Apache control script
● Apache configuration files
● Default Document Root
● Default directory for scripts
● Log files (access.log, error.log)
● htpasswd, htdigest, htdbm
● Apache modules
● CGI wrapper

Apache access log

LogFormat "%v %h %l %u %t \"%r\" %>s %b" common
CustomLog /usr/local/apache/logs/access_log common

%v – virtual host
%h – remote host
%l – remote logname
%u – user
%t – time
%r – HTTP request
%>s – status code
%b – size

www.atlants.lv - - [21/Nov/2004:17:23:36 +0200] "GET /index.php?m=5 HTTP/1.1" 200 32257

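A log in this format is easy to process by machine. The sketch below turns the `LogFormat` string into a regular expression; the field names and the sample line are assumptions made for illustration (192.0.2.10 is a documentation-only address, not taken from a real log).

```python
import re

# One named group per directive in: %v %h %l %u %t "%r" %>s %b
LOG_PATTERN = re.compile(
    r'(?P<vhost>\S+) (?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # %b logs "-" instead of 0 when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Hypothetical log line used only to exercise the parser.
sample = ('www.example.com 192.0.2.10 - - [21/Nov/2004:17:23:36 +0200] '
          '"GET /index.php?m=5 HTTP/1.1" 200 32257')
entry = parse_access_line(sample)
print(entry["vhost"], entry["request"], entry["status"], entry["size"])
```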
Apache error log

ErrorLog /usr/local/apache/logs/error_log
LogLevel warn

[Sun Nov 21 09:13:42 2004] [error] PHP Fatal error: Call to undefined function PN_DBMsgError() in /home/msaule/public_html/referer.php on line 85

[Sun Nov 21 12:41:09 2004] [error] [client] File does not exist: /home/sms/public_html/favicon.ico

[Sun Nov 21 13:02:50 2004] [error] [client] File does not exist: /home/code/public_html/robots.txt

[Sun Nov 21 13:08:26 2004] [error] [client] File does not exist: /home/refuser2/public_html/_vti_bin/owssvr.dll

[Sun Nov 21 13:08:26 2004] [error] [client] File does not exist: /home/refuser2/public_html/MSOffice/cltreq.asp

Configuring Apache

Edit httpd.conf

Check configuration: apachectl configtest

Restart Apache

Test changes


Virtual hosts

<VirtualHost *>
ServerName www.jrt.lv
ServerAlias www.jrt.com
CustomLog /usr/local/apache/logs/jrt_access_log common
ErrorLog /usr/local/apache/logs/jrt_error_log
DocumentRoot /home/jrt/public_html
</VirtualHost>
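
Name-based virtual hosting comes down to comparing the request's `Host` header with each host's `ServerName` and `ServerAlias`. The following is a rough sketch of that lookup, not Apache's actual code; the dictionary layout, the second host and the fall-back to the first configured host are assumptions of the example (IPv6 literals in the header are ignored for brevity).

```python
def select_vhost(host_header, vhosts):
    """Return the virtual host whose name or alias matches the Host header."""
    # Drop an optional ":port" suffix; host names compare case-insensitively.
    hostname = host_header.split(":", 1)[0].strip().lower()
    for vhost in vhosts:
        names = [vhost["server_name"]] + vhost.get("server_aliases", [])
        if hostname in (name.lower() for name in names):
            return vhost
    # Assumption of this sketch: unknown names go to the first configured host.
    return vhosts[0]


vhosts = [
    {"server_name": "www.jrt.lv", "server_aliases": ["www.jrt.com"],
     "document_root": "/home/jrt/public_html"},
    # Hypothetical second host, added only for the example.
    {"server_name": "www.example.org",
     "document_root": "/home/example/public_html"},
]
print(select_vhost("www.jrt.com:80", vhosts)["document_root"])
```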


Configuring Apache

.htaccess (directory-level, read on every request)

AuthType Basic

AuthUserFile /home/someuser/passwd

AuthName "Admin"

require valid-user


htpasswd -c <password file> <username>
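
With `AuthType Basic` the browser sends the user name and password in an `Authorization` header that is only base64-encoded, not encrypted, which is why Basic authentication should be combined with SSL/TLS. A minimal sketch of the encoding follows; the function names are made up for the example.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header a client would send."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header):
    """Recover (username, password) the way a server has to before checking them."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(parse_basic_auth(header))
```

Anyone who can read the header can decode it just as easily as `parse_basic_auth` does.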



Dynamic content

(Figure: Browser, Web Server, Script Engine (PHP, Python, ...), Database Server (MySQL, ...))


LAMP

● Linux - Apache - MySQL - PHP
● The most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security
● apt-get install apache2 mysql-server php5 libapache2-mod-php5


MySQL

● Unix distributions are moving towards MariaDB after the acquisition of MySQL by Oracle
– A MySQL fork led by the original developers of MySQL
● Fast relational DB implementation
● Fairly easy to use (for app developers)
● Different storage engines
– With or without transactions, memory-based, etc.
● Query caching
● User quotas


PHP

● One of the most popular programming languages for web applications
● Easy to learn (though bad coding practices are common)
● Interpreted language
● Functions from Unix libraries and tools
● Huge number of ready-made applications, libraries and ...


● Create a database
● Using the MySQL command prompt accessed by
– $ mysql -u root -p
– > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
– > CREATE TABLE `posts` (...)
– > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '...'
– > GRANT ... ON `example`.* TO 'example'@'localhost';
– > INSERT INTO `posts` (`title`,`info`) VALUES

Simple web app


● Or be lazy and use a web interface like phpMyAdmin or Adminer

– Download the single file adminer.php

– Drop it into /var/www/

– Navigate your browser to http://localhost/adminer.php

– Do all the tasks in browser without really knowing SQL

Simple web app

● Create file example.php in /var/www/
● Write your HTML with PHP code inside
– Connect to database
– Select data
– Show data
● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result

Simple web app


● From http://localhost/example.php

Dynamic content

Web servers cannot create dynamic content by themselves

Two options for serving dynamic content:

● [Apache] modules
● CGI

Potentially many programming languages: PHP, Perl, Python, Java, ..., C, C++, shell scripts, ...

CGI - Common Gateway Interface

● A standard environment for web servers to interface with external executable programs
– Any script or binary executable
● For each request, the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the external program in the prepared environment
– No additional libraries required
● Sends the POST request body as standard input (GET parameters are passed in QUERY_STRING)
● Waits for standard output from the executed program, and returns it to the client
– With additional HTTP headers

CGI environment variables

● REQUEST_METHOD: name of HTTP method

● PATH_INFO: path suffix, if appended to URL after program name and a slash

● PATH_TRANSLATED: corresponding full path as supposed by server, if PATH_INFO is present

● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi

● QUERY_STRING: the part of URL after the ? character (GET)

● REMOTE_HOST: host name of the client

● REMOTE_ADDR: IP address of the client (dot-decimal)

● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain values of corresponding HTTP headers

● And a few more

CGI example


#!/bin/sh
# Header first, then an empty line, then the response body
echo "Content-type: text/plain"
echo ""
echo "Hello world!"
echo "Today is:" `date`
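
The same contract can be shown with a script that actually reads its input. In this sketch the request parameters come from `QUERY_STRING` for GET and from standard input for POST; `CONTENT_LENGTH` is one of the standard CGI variables not listed on the slide, and the `name` parameter and the `handle_cgi` function are invented for the example.

```python
import os
import sys
from urllib.parse import parse_qs


def handle_cgi(environ, stdin, stdout):
    """Read the request from the CGI environment and write a plain-text reply."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # The server passes the request body on standard input.
        # (Simplification: treats the length as characters, fine for ASCII.)
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET, parameters arrive in the environment.
        raw = environ.get("QUERY_STRING", "")
    params = parse_qs(raw)
    name = params.get("name", ["world"])[0]  # hypothetical parameter
    # Headers, an empty line, then the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"Hello {name}!\n")


if __name__ == "__main__":
    handle_cgi(os.environ, sys.stdin, sys.stdout)
```

Passing the environment and streams as arguments keeps the handler testable without a web server.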

SSI – Server Side Includes

• Directives in HTML pages that are evaluated by the server while the pages are being served

• Without having to serve the entire page via a CGI program

• Configure httpd.conf or .htaccess: Options +Includes

• Two ways to tell Apache which files should be parsed:

• Parse any file with a particular file extension:

• AddType text/html .shtml

• AddOutputFilter INCLUDES .shtml

• Parse files if they have the execute bit set:

• XBitHack on

• For existing files: chmod instead of changing the file name

SSI – Server Side Includes

• <!--#echo var="DATE_LOCAL" -->

• <!--#flastmod file="index.html" -->

• <!--#include virtual="/footer.html" -->

• <!--#include virtual="/cgi-bin/counter.pl" -->

• <!--#exec cmd="ls" -->

• Setting variables

• Conditional expressions

• A simple but Turing complete programming language

• Loops can be implemented via recursive redirects
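
To make the idea of server-parsed directives concrete, here is a toy expander for two of the directives above. It is only a sketch under simplifying assumptions: variables and included documents come from plain dictionaries, the `(none)` placeholder for unknown variables is a choice of the example, and there is no protection against a fragment that includes itself.

```python
import re

# Matches directives of the form <!--#element attribute="value" -->
DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def expand_ssi(page, variables, fragments):
    """Replace echo and include directives; leave everything else untouched."""
    def replace(match):
        element, attribute, value = match.groups()
        if element == "echo" and attribute == "var":
            return variables.get(value, "(none)")
        if element == "include" and attribute == "virtual":
            # Included documents may contain directives themselves.
            return expand_ssi(fragments.get(value, ""), variables, fragments)
        return match.group(0)  # unsupported directive: keep as is
    return DIRECTIVE.sub(replace, page)


page = '<p><!--#echo var="DATE_LOCAL" --></p><!--#include virtual="/footer.html" -->'
footer = '<hr><!--#echo var="DATE_LOCAL" -->'
print(expand_ssi(page, {"DATE_LOCAL": "Sunday"}, {"/footer.html": footer}))
```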

CGI issues

● Each request forks a new process: a big overhead for process creation and destruction
● All scripts must be interpreted on each request: another overhead
– May be reduced by using compiled CGI programs
● Not scalable
● Not suitable for the needs of modern web servers
● Still widely used in embedded systems (e.g. WiFi router web management consoles) that serve only occasional requests


FastCGI

● One or more persistent processes started (pre-forked)
● Web server communicates with them over Unix sockets or TCP
● Each process serves many requests
● Performance comparable to modules
● Facilitates reuse of resources (DB connections, in-memory caching, etc.)
● Separation of web server and dynamic content system
● Scalability: deploy processes across a server farm
● apt-get install libapache2-mod-fastcgi php5-fpm

Other communication methods

● Integrate the dynamic content generation system with the web server process (Apache modules)

● CGI derivatives
– Simple Common Gateway Interface (SCGI): similar to FastCGI but designed to be easier to implement

● *SGI (web server gateway interfaces) implement a programming-language-specific method of communication between web server and applications
– WSGI (Python), PSGI (Perl), Rack (Ruby)

● Proxy requests to applications that implement communication via HTTP

C10K problem

● Dan Kegel, 1999
● Web servers should handle 10,000 clients simultaneously (not the same as 10K requests)
● Operating system kernel limitations
● Functionality provided by the operating system
● Web server design flaws

C10K – OS kernel

● The open-source nature of Unix kernels made it possible to quickly identify C10K bottlenecks and fix them

● Networking-related algorithms and data structures in Unix kernels were originally implemented with complexities O(n|n^2|...), which were improved to O(1|n)

● As a result, the networking capabilities of Unix kernels are virtually limitless (limited only by hardware resources)

C10K – OS functionality

● Implemented new scalable I/O event notification mechanisms (epoll on Linux, kqueue on *BSD)

– Better performance than traditional poll/select, e.g. on a large number of file descriptors

– Can receive all pending events using one system call

● AIO – the POSIX asynchronous I/O (AIO) interface – allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background)

● The application can select to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all

C10K – web server design

● Non-blocking I/O for networking and disk
– Don't block waiting on action completion; serve other requests and wait for notifications about I/O completion

● Many threads
– Use all available CPU cores to achieve maximum concurrency; avoid locking data structures

● Each thread serves many requests
– Don't create a thread per request; reuse threads, and while some non-blocking action completes, process other requests
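
The design points above can be illustrated with Python's standard `selectors` module, which picks epoll or kqueue where available. This is a toy sketch, not a production server: one thread multiplexes all connections, the "response" is just the upper-cased request, and `sendall` on a non-blocking socket is a simplification that only works for small replies. The function names and port 8080 are assumptions of the example.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 = pick a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock


def run_once(sel, listener, timeout=1.0):
    """Handle every event that is ready now; return the number of events."""
    events = sel.select(timeout)  # one call reports all ready sockets
    for key, _mask in events:
        if key.fileobj is listener:
            # New connection: register it instead of spawning a thread.
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data.upper())  # toy response
            else:
                # Empty read: the client closed the connection.
                sel.unregister(conn)
                conn.close()
    return len(events)


def serve_forever(host="127.0.0.1", port=8080):
    """Event loop: a single thread serves every connection."""
    sel = selectors.DefaultSelector()
    listener = make_listener(host, port)
    sel.register(listener, selectors.EVENT_READ)
    while True:
        run_once(sel, listener)
```

A real server would run one such loop per CPU core and would also buffer partial writes.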

C10M problem

● 10 million concurrent connections per server
● Doubling the CPU speed does not double the number of open connections
● Current Unix kernels can't handle that
– Application thread locks in kernel
– Hardware drivers (NIC)
– Memory management

● Solution: a new generation of high-load Unix kernels
– 1 main application per server
– Minimize the number of system calls
– Minimize kernel work


nginx

● A C10K web server
● Apache implements a thread-per-connection model

● nginx does not create a new process/thread per connection (does not use the thread scheduler as a packet scheduler)
– Typically, one single-threaded worker process per CPU

● Each worker can asynchronously handle thousands of concurrent connections (handles the scheduling itself)

• Event-driven: an event is, for example, a new connection

• Asynchronous: handles interaction for more than one connection at a time

• Non-blocking: does not stall while waiting for I/O; works on other events until the I/O completes


● Efficient CPU usage
– Fewer cores needed

● Small memory footprint per request
● High performance
– Thousands of connections/requests per second

● Often used as a front end to high-load websites
– Load balancing (reverse proxy), caching, etc.

High-load web systems

● Busy dynamic web sites cannot reside on one server
● Need a strategy for splitting the load across multiple web servers
● One possible strategy

– One entry point, the front end, receives all requests and splits the load (e.g. nginx, Varnish)

– Back ends process the requests forwarded from the front end (e.g. nginx, Apache)


Varnish

● Proxy server
– Reverse
– Caching
– Programmable

● Load balancer
● Dynamic content generator
● Tools: logging, debugging, monitoring
● Users: Facebook, Twitter, WikiLeaks, ThePirateBay

● Developed in Norway

● Fantastic performance even on low-end servers: 1,000 to 10,000 requests per second per server is the norm

● C + good C programmers

● Takes advantage of the Unix architecture

● After tuning, tens of thousands of requests per second; more than 100k/s reached in testing

● Request-oriented, domain-specific configuration/programming language: VCL


● Generating any dynamic web page is very slow: depending on the environment, hundreds or thousands of times slower than returning static content

● A low-end server can generate a couple of hundred such dynamic pages per second

● Any development framework makes dynamic page generation tens or hundreds of times slower still

● That leaves only a few tens of requests per second

● Rough math: 100 x 100 = 10,000 times slower than a static page


● Ideally, dynamic content would be returned with performance similar to static pages

● Content that does not change significantly within a given time interval can be stored temporarily and reused

● Hard disks are slow; good practice is to keep all cached content only in RAM or on a server SSD

● A caching strategy has to be devised for each specific case, and it can be very subjective


Varnish caching

● The request address (in full, or matched by a regular expression) determines which requests to cache, and for how long a given element is cached or whether it is cached at all

● Advertised as speeding up page delivery by hundreds to thousands of times, i.e. only about 10 times slower than static content
● Fast compared to other caching approaches
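
The principle of choosing a cache lifetime by URL pattern can be sketched independently of Varnish. The class below is an illustration only: the name `TtlCache`, the example rules and the `backend` callable are assumptions, and a real cache would also bound its memory use.

```python
import re
import time


class TtlCache:
    """Cache backend responses per URL, with a lifetime chosen by URL pattern."""

    def __init__(self, rules, clock=time.monotonic):
        self.rules = rules   # list of (compiled regex, ttl in seconds)
        self.clock = clock   # injectable clock, handy for testing
        self.store = {}      # url -> (expires_at, body)

    def ttl_for(self, url):
        # First matching rule wins; no match means "do not cache".
        for pattern, ttl in self.rules:
            if pattern.search(url):
                return ttl
        return 0

    def fetch(self, url, backend):
        now = self.clock()
        hit = self.store.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: the backend is not touched
        body = backend(url)      # slow path: generate the page
        ttl = self.ttl_for(url)
        if ttl > 0:
            self.store[url] = (now + ttl, body)
        return body


# Hypothetical rules: scripts for a minute, news pages for ten seconds.
RULES = [(re.compile(r"\.js$"), 60), (re.compile(r"^/news/"), 10)]
```

With these rules, repeated requests for `/app.js` within a minute reach the backend once, while `/index.php` is generated on every request.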

DSL VCL

● Simple syntax (similar to C), which is translated to C and then compiled to machine code
● =, ==, !=, ~, !~, !, &&, ||, +, "string"
● if () {} else {}, set, unset, return

● 9 subroutines, corresponding to the different stages of processing each request, in which behaviour can be influenced

● Only predefined objects: client, server, req, bereq, beresp, obj, resp

sub vcl_recv {
if (req.request == "GET" && req.url ~ "\.js$") {
return (lookup); }
}


VCL processing architecture

Integration

● A fixed caching time may not be optimal

● Content may change more often than the configured time: users get stale information

● Or less often: servers do unnecessary work

● Solution: the server must be told that the content needs refreshing

acl purge { ""/24; }

sub vcl_recv {
if (req.request == "PURGE") {
if (!client.ip ~ purge) { error 405 "Not allowed."; }
return (lookup);
}
}

sub vcl_hit {
if (req.request == "PURGE") {

error 200 "Purged.";
}
}

Dynamic content generation with ESI

● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (for example, "Hello, [Jānis Bērziņš], you have [0] new messages")

● We can load it after the page has loaded, using JSON, or generate the content with Varnish

<TABLE><TR><esi:include src="sveiks.html"/></TR>

<TR><TD><esi:include src="index.html"/></TD>

<TD><esi:include src="article.html"/></TD></TR>

</TABLE>

● Varnish parses the <esi> tags and assembles the elements; all elements are configured and cached independently
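
The assembly step can be pictured as a substitution over the page template. The sketch below is not how Varnish implements ESI; `assemble` and `fetch_fragment` are hypothetical names, and in practice each fragment would be fetched through the cache with its own lifetime.

```python
import re

# Matches self-closing tags such as <esi:include src="index.html"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fetch_fragment):
    """Replace every esi:include tag with the fragment its src points to."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), page)


# Stand-in fragments; a real setup would fetch these from back ends.
fragments = {
    "sveiks.html": "<TD>Hello!</TD>",
    "index.html": "menu",
    "article.html": "article text",
}
template = ('<TABLE><TR><esi:include src="sveiks.html"/></TR>'
            '<TR><TD><esi:include src="index.html"/></TD>'
            '<TD><esi:include src="article.html"/></TD></TR></TABLE>')
print(assemble(template, fragments.__getitem__))
```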

Load balancing

● One address can be served by several back ends
● Different URLs can be served by different back ends
● Monitoring

● Taking dead servers out of rotation (restart, upgrade, repair)
● Bringing revived servers back in (new ones as well)

● In effect, a pile of CHEAP desktop-grade hardware can be used for dynamic content generation

● Adding a second front end gives high but cheap fault tolerance

● If we use NoSQL or otherwise obtain a replicated database, expensive servers are not needed at all
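
A round-robin director with health tracking captures the first few points of this list. This is a simplified sketch under assumptions: the class and method names are invented, back ends are plain strings, and the health information would in practice come from periodic probes rather than manual calls.

```python
class RoundRobinBalancer:
    """Hand out back ends in turn, skipping the ones marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        # Dead server (restart, upgrade, repair): take it out of rotation.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Revived or brand-new server: put it (back) into rotation.
        if backend not in self.backends:
            self.backends.append(backend)
        self.healthy.add(backend)

    def pick(self):
        # Try each back end at most once per call.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
print([balancer.pick() for _ in range(4)])
```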

Varnish usage in Latvia

$ curl -I www.tvnet.lv
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
Expires: Wed, 07 Nov 2012 20:10:08 GMT
Cache-Control: max-age=60
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Content-Length: 185924
Date: Wed, 07 Nov 2012 20:10:15 GMT
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
Connection: keep-alive

$ curl -I www.delfi.lv
HTTP/1.1 200 OK
X-Fe-Node: nuffy
Content-type: text/html; charset=utf-8
Server: lighttpd/1.4.31 (PLD Linux)
Content-Length: 159097
Date: Wed, 07 Nov 2012 20:20:58 GMT
X-Varnish: 734492112 734450241
Age: 58
Via: 1.1 varnish
Connection: keep-alive

Current situation

● The standard web development solution is an HTTP server plus a classic dynamic content generation system (PHP, ASP, Python, etc.); it has problems with:
– Long-running requests and persistent connections
– The number of clients that can be served simultaneously
– Compatibility with other technologies
– Prospects for future development

Event-driven programming frameworks

● Neither the idea nor its implementations are new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)

● Not widely used in web solutions
● Solve the problems of the standard technologies
● Reactor pattern, C10K problem
● Let web programmers build network solutions
