
SIP Server Scalability
IRT Internal Seminar
Kundan Singh, Henning Schulzrinne and Jonathan Lennox
May 10, 2005

2

Agenda

- Why do we need scalability?
- Scaling the server: SIP Express Router (iptel.org), sipd (Columbia University); threads/processes/events
- Scaling using load sharing: DNS-based, identifier-based, two-stage architecture
- Conclusions

(27 slides)

3

Internet Telephony (SIP: Session Initiation Protocol)

[Figure: SIP call setup — alice@yahoo.com calls bob@example.com; bob REGISTERs with the example.com server (backed by a DB); the INVITE is routed via the yahoo.com and example.com servers using DNS and reaches bob's device (129.1.2.3 / 192.1.2.4)]

4

Scalability Requirements — Depends on role in the network architecture

[Figure: deployment roles — a carrier network with SIP/MGC controllers and media gateways (MG, GW) bridging the PSTN and the IP network; edge ISPs; a cybercafe; an enterprise with a PBX, IP phones, PSTN phones and a T1 PRI/BRI PSTN gateway]

- Edge ISP server: 10,000 customers
- Carrier (3G): 10 million customers
- Enterprise server: 1,000 customers

5

Scalability Requirements — Depends on traffic type

- Registration (uniform): authentication, mobile users
- Call routing (Poisson): stateful vs stateless proxy, redirect, programmable scripts
- Beyond telephony (don't know): instant message, presence (including sensors), device control
- Stateful calls (Poisson arrival, exponential call duration): firewall, conference, voicemail
- Transport type: UDP/TCP/TLS (cost of security)

6

SIPstone — SIP server performance metrics

- Steady-state rate for successful registration, forwarding, and unsuccessful call attempts, measured using 15-minute test runs
- Measure: #requests/s under a given delay constraint
- Performance = f(#users, #DNS, UDP/TCP, g(request), L), where g = request type and arrival pdf (#requests/s), L = logging on/off
- Tests: register, outbound proxy, redirect, proxy480, proxy200
- Parameters: measurement interval, transaction response time, RPS (registers/s), CPS (calls/s), transaction failure probability < 5%
- Delay budget: R1 < 500 ms, R2 < 2000 ms
- Shortcomings: does not consider forking, scripting, Via headers, packet size, different call rates, SSL; is there a linear combination of results?
- Whitebox measurements: turnaround time; extend to SIMPLEstone

[Figure: loader/handler test setup — the loader sends REGISTER and INVITE through the server (backed by a SQL database) to the handler; 100 Trying, 180 Ringing, 200 OK, ACK and BYE flow between them; R1 and R2 mark the measured response times]

7

SIP Server — What happens inside a proxy?

[Flowchart: recvfrom or accept/recv → parse → for a request: match transaction (critical section, r/w lock); REGISTER → update DB (critical section, lock); other → lookup DB (blocking I/O), then redirect/reject (build response) or proxy (modify request, DNS lookup); for a response: match transaction, modify response; stateless branches bypass transaction matching; finally sendto, send or sendmsg]

8

Lessons Learnt (sipd) — In-memory database

- Call routing involves (≥ 1) contact lookups, approx 10 ms per SQL query
- Cache (FastSQL): loading the entire database is easy; periodic refresh; < 1 ms per lookup
- Potentially useful for DNS lookups too

[Figure: sipd caches the SQL database in memory with periodic refresh; a web interface configures the database. [2002: Narayanan] Single-CPU Sun Ultra 10, turnaround time vs RPS]

9

Lessons Learnt (sipd) — Thread-per-request does not scale

- One thread per message doesn't scale: too many threads over a short timescale
- Stateless: 2-4 threads per transaction; stateful: 30 s holding time
- Thread pool + queue: less thread overhead, more useful processing; pre-fork processes for SIP-CGI
- Overload management: graceful failure; under overload, drop requests rather than responses
- Not enough if the holding time is high: each request holds (blocks) a thread

[Figure: throughput vs load for incoming requests R1-R4 — thread-per-request throughput collapses under overload, while a fixed number of threads with overload control sustains throughput]
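The thread-pool-plus-queue idea above can be sketched as follows. This is an illustrative minimal implementation, not sipd's actual code: a fixed set of workers drains a bounded queue, and when the queue is full the submit fails so the server can shed requests instead of spawning unbounded threads.

```c
#include <pthread.h>
#include <stddef.h>

#define QCAP 64        /* bounded queue: full queue => drop request */
#define NWORKERS 4     /* fixed pool size (illustrative) */

struct pool {
    void (*jobs[QCAP])(void *);
    void *args[QCAP];
    int head, tail, count, shutdown;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    pthread_t threads[NWORKERS];
};

static void *worker(void *p_)
{
    struct pool *p = p_;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->shutdown)
            pthread_cond_wait(&p->nonempty, &p->lock);
        if (p->count == 0) {              /* shutdown and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        void (*job)(void *) = p->jobs[p->head];
        void *arg = p->args[p->head];
        p->head = (p->head + 1) % QCAP;
        p->count--;
        pthread_mutex_unlock(&p->lock);
        job(arg);                          /* useful work outside the lock */
    }
}

void pool_init(struct pool *p)
{
    p->head = p->tail = p->count = p->shutdown = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->nonempty, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&p->threads[i], NULL, worker, p);
}

/* 0 on success, -1 if the queue is full: the caller drops the request. */
int pool_submit(struct pool *p, void (*job)(void *), void *arg)
{
    int rc = -1;
    pthread_mutex_lock(&p->lock);
    if (p->count < QCAP) {
        p->jobs[p->tail] = job;
        p->args[p->tail] = arg;
        p->tail = (p->tail + 1) % QCAP;
        p->count++;
        pthread_cond_signal(&p->nonempty);
        rc = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return rc;
}

void pool_shutdown(struct pool *p)        /* drains the queue, then joins */
{
    pthread_mutex_lock(&p->lock);
    p->shutdown = 1;
    pthread_cond_broadcast(&p->nonempty);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(p->threads[i], NULL);
}

/* Demo: submit njobs counting jobs; returns jobs run (should equal accepted). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_count;
static void demo_job(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    demo_count++;
    pthread_mutex_unlock(&demo_lock);
}
int pool_demo(int njobs)
{
    struct pool p;
    int accepted = 0;
    demo_count = 0;
    pool_init(&p);
    for (int i = 0; i < njobs; i++)
        if (pool_submit(&p, demo_job, NULL) == 0)
            accepted++;
    pool_shutdown(&p);
    return demo_count == accepted ? demo_count : -1;
}
```

The key overload property is in `pool_submit`: rejecting at the queue, before any thread is consumed, gives the graceful-failure behaviour the slide describes.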

10

What is the best architecture?

- Event-based: reactive system
- Process pool: each pool process receives a message and processes it to the end (SER)
- Thread pool:
  1. Receive and hand over to a pool thread (sipd)
  2. Each pool thread receives and processes to the end
  3. Staged event-driven: each stage has a thread pool

[Proxy-internals flowchart repeated from slide 7]

11

Stateless proxy — UDP, no DNS, six messages per call

[Proxy-internals flowchart repeated from slide 7]

12

Stateless proxy — UDP, no DNS, six messages per call

Hardware: (a) 1x PentiumIV 3 GHz, 1 GB, Linux 2.4.20; (b) 4x Pentium 450 MHz, 512 MB, Linux 2.4.20; (c) 1x UltraSPARC-IIi 300 MHz, 64 MB, Solaris; (d) 2x UltraSPARC-II 300 MHz, 256 MB, Solaris. All numbers in CPS.

Architecture   (a)    (b)       (c)   (d)
Event-based    1650   370       150   190
Thread/msg     1400   TBD       100   TBD
Thread-pool1   1450   600 (?)   110   220 (?)
Thread-pool2   1600   1150 (?)  152   TBD
Process-pool   1700   1400      160   350

[Chart: the same numbers as a bar graph per hardware configuration]

13

Stateful proxy — UDP, no DNS, eight messages per call

- Event-based: single thread acts as socket listener + scheduler/timer
- Thread-per-message: pool_schedule => pthread_create
- Thread-pool1 (sipd)
- Thread-pool2: N event-based threads; each handles a specific subset of requests (hash(Call-ID)); receive and hand over to the correct thread; poll in multiple threads => bad on multi-CPU
- Process pool: not finished yet
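The thread-pool2 dispatch step can be sketched as below: hash the Call-ID so that every message of a transaction lands on the same event-based worker thread, avoiding cross-thread locking on transaction state. The hash function (FNV-1a) and the worker count are illustrative choices, not taken from sipd.

```c
#include <stdint.h>

/* FNV-1a 64-bit string hash (illustrative; any stable hash works). */
static uint64_t hash_str(const char *s)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Pick the worker thread responsible for this Call-ID.  All messages
 * with the same Call-ID map to the same worker, so the transaction
 * state it owns never needs a lock. */
int worker_for_callid(const char *call_id, int nworkers)
{
    return (int)(hash_str(call_id) % (uint64_t)nworkers);
}
```

In the receive loop, the listening thread would call `worker_for_callid()` on each parsed message and enqueue it to that worker's private queue.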

14

Stateful proxy — UDP, no DNS, eight messages per call

Hardware: (a) 1x PentiumIV 3 GHz, 1 GB, Linux 2.4.20; (b) 4x Pentium 450 MHz, 512 MB, Linux 2.4.20; (c) 1x UltraSPARC-IIi 360 MHz, 256 MB, Solaris 5.9; (d) 2x UltraSPARC-II 300 MHz, 256 MB, Solaris 5.8. All numbers in CPS.

Architecture   (a)    (b)         (c)   (d)
Event-based    1200   300         160   160
Thread/msg     650    175         90    120
Thread-pool1   950    340 (p=4)   120   120 (p=4)
Thread-pool2   1100   500 (p=4)   155   200 (p=4)
Process-pool   -      -           -     -

[Chart: the same numbers as a bar graph per hardware configuration]

15

Lessons Learnt — What is the best architecture?

Stateless:
- CPU is the bottleneck; memory use is constant
- Process pool is the best
- Event-based is not good for multi-CPU
- Thread/msg and thread-pool are similar; thread-pool2 is close to process-pool

Stateful:
- Memory can become the bottleneck
- Thread-pool2 is good, but does not give an N-fold speedup on N CPUs
- Process pool may be better (?)

16

Lessons Learnt (sipd) — Avoid blocking function calls

- DNS: 10-25 ms (29 queries); caching raised throughput from 110 to 900 CPS; internal vs external, non-blocking
- Logger: lazy logger as a separate thread
- Date formatter: strftime() took 10% of REGISTER processing; update the date variable once per second instead
- random32(): cache the gethostid() result (37 s)

Logger thread: while (1) { lock; writeall; unlock; sleep; }
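The logger loop on this slide can be fleshed out into a minimal sketch (hypothetical code, not sipd's): callers append to a shared in-memory buffer cheaply, and a separate thread periodically takes the lock, writes everything out, and sleeps. The flush is factored into its own function so the idea is testable; the logger thread would call it inside the `while (1) { lock; writeall; unlock; sleep; }` loop.

```c
#include <pthread.h>
#include <string.h>
#include <stdio.h>

#define LOGBUF 4096

static char logbuf[LOGBUF];
static size_t loglen;
static pthread_mutex_t loglock = PTHREAD_MUTEX_INITIALIZER;

/* Fast path: copy into the buffer; no disk I/O on the caller's thread. */
void log_msg(const char *msg)
{
    pthread_mutex_lock(&loglock);
    size_t n = strlen(msg);
    if (loglen + n + 1 < LOGBUF) {       /* newline terminator included */
        memcpy(logbuf + loglen, msg, n);
        loglen += n;
        logbuf[loglen++] = '\n';
    }                                    /* else: drop under overload */
    pthread_mutex_unlock(&loglock);
}

/* Called by the logger thread each period: write and reset the buffer.
 * Returns the number of bytes flushed; out may be NULL to discard. */
size_t log_flush(FILE *out)
{
    pthread_mutex_lock(&loglock);
    size_t n = loglen;
    if (n > 0 && out != NULL)
        fwrite(logbuf, 1, n, out);
    loglen = 0;
    pthread_mutex_unlock(&loglock);
    return n;
}

/* Demo for testing: log two messages, flush, return bytes written. */
size_t logger_demo(void)
{
    log_msg("INVITE");
    log_msg("BYE");
    return log_flush(NULL);
}
```

The point of the design is that `log_msg` never blocks on I/O: the request-processing threads only pay for a memcpy under a short-held lock.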

17

Lessons Learnt (sipd) — Resource management

- Socket management problems: OS limit (1024 descriptors), "liveness" detection, retransmission; one socket per transaction does not scale
- Global socket if the downstream server is alive; soft state, works for UDP; hard for TCP/TLS, so apply connection reuse
- Socket buffer size: 64 KB to 128 KB; trade-off: memory per socket vs number of sockets
- Memory management problems: too many malloc/free calls, leaks
- Memory pool: transaction-specific memory, freed once; also less memcpy; about 30% performance gain
  - Stateful: 650 to 800 CPS; stateless: 900 to 1200 CPS

Stateless processing time (µs) per message:

Message          INV   180   200   ACK   BYE   200   REG   200
W/o mempool      155   67    67    95    139   62    237   70
W/ mempool       111   49    48    64    106   41    202   48
Improvement (%)  28    27    28    33    24    34    15    31
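The transaction-scoped memory pool described above can be sketched as an arena allocator. This is an illustrative implementation, not sipd's code: the many small allocations made while processing a transaction are served by bumping a pointer within one block, and the whole transaction's memory is released with a single free, eliminating per-object malloc/free overhead and leaks.

```c
#include <stdlib.h>
#include <stddef.h>

struct arena {
    char *base;
    size_t used, cap;
};

/* One malloc per transaction instead of one per header/string. */
int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->base ? 0 : -1;
}

/* Bump allocation, 8-byte aligned; no per-object free exists. */
void *arena_alloc(struct arena *a, size_t n)
{
    size_t off = (a->used + 7) & ~(size_t)7;
    if (off + n > a->cap)
        return NULL;            /* transaction exceeded its budget */
    a->used = off + n;
    return a->base + off;
}

/* One free when the transaction completes. */
void arena_free(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}

/* Demo for testing: two 10-byte allocations; the second lands at the
 * next 8-byte boundary, so the pointers are 16 bytes apart. */
int arena_demo(void)
{
    struct arena a;
    if (arena_init(&a, 1024) != 0)
        return -1;
    char *p = arena_alloc(&a, 10);
    char *q = arena_alloc(&a, 10);
    int d = (p && q) ? (int)(q - p) : -1;
    arena_free(&a);
    return d;
}
```

Freeing once per transaction is what produces the roughly 30% gain the slide reports: the allocator's bookkeeping disappears from the per-message path.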

18

Lessons Learnt (SER) — Optimizations

- Reduce copying and string operations: data lumps, counted strings (+5-10%)
- Reduce URI comparison to local: user part as a keyword, use r2 parameters
- Parser: lazy parsing (2-6x), incremental parsing; 32-bit header parser (2-3.5x); use padding to align; fast for the general (canonicalized) case
- Case comparison: hash table, sixth bit
- Database: cache is divided into domains for locking

[2003: Jan Janak] SIP proxy server effectiveness, Master's thesis, Czech Technical University

19

Lessons Learnt (SER) — Protocol bottlenecks and other scalability concerns

Protocol bottlenecks:
- Parsing: order of headers, host names vs IP addresses, line folding, scattered headers (Via, Route)
- Authentication: reuse credentials in subsequent requests
- TCP: message length unknown until Content-Length

Other scalability concerns:
- Configuration: broken digest clients, wrong passwords, wrong expires
- Overuse of features: use stateless instead of stateful if possible; record-route only when needed; avoid an outbound proxy if possible

20

Load Sharing — Distribute load among multiple servers

- Single-server scalability: there is a maximum capacity limit
- Multiple servers: DNS-based, identifier-based, network address translation, same IP address

21

Load Sharing (DNS-based) — Redundant proxies and databases

- REGISTER: write to both D1 and D2
- INVITE: read from D1 or D2
- Database write/synchronization traffic becomes the bottleneck

[Figure: proxies P1-P3 in front of replicated databases D1 and D2; REGISTERs are written to both databases, INVITEs read from either]

22

Load Sharing (Identifier-based) — Divide the user space

- Proxy and database on the same host
- First-stage proxy may get overloaded: use many
- Hashing: static vs dynamic

[Figure: three proxy/database pairs P1/D1, P2/D2, P3/D3 owning the user-name partitions a-h, i-q and r-z]
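The static partitioning in the figure (a-h, i-q, r-z) can be written down directly. A toy sketch, mirroring the slide rather than any real deployment; a production system would hash the full user name (static vs dynamic hashing, as the slide notes):

```c
#include <ctype.h>

/* Returns 0, 1 or 2: which proxy/database pair owns this user,
 * following the slide's a-h / i-q / r-z split on the first letter. */
int server_for_user(const char *user)
{
    int c = tolower((unsigned char)user[0]);
    if (c >= 'a' && c <= 'h') return 0;
    if (c >= 'i' && c <= 'q') return 1;
    return 2;                 /* r-z, and everything else by default */
}
```

A first-letter split like this illustrates the idea but also its weakness: real user names are not uniform over the alphabet, which is exactly the "bad hash function" non-uniformity problem raised later in the deck.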

23

Load Sharing — Comparison of the two designs

Total time per database server:

- Replicated (DNS-based): ((t*r/D) + 1)*T*N = A/D + B
- Partitioned (identifier-based): ((t*r + 1)/D)*T*N = A/D + B/D

where D = number of database servers, N = number of writes (REGISTER), r = #reads/#writes = (INV+REG)/REG, T = write latency, t = read latency / write latency, A = t*r*T*N, and B = T*N.

The partitioned design trades lower reliability for higher scalability.

[Figure: replicated design (proxies P1-P3 over databases D1, D2) vs partitioned design (P1-P3 over D1-D3 with user partitions a-h, i-q, r-z)]
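To make the comparison concrete, here is a worked instance; the values of $D$, $r$ and $t$ are assumed purely for illustration:

$$
\Big(\frac{tr}{D}+1\Big)TN = \frac{A}{D}+B,
\qquad
\frac{tr+1}{D}\,TN = \frac{A}{D}+\frac{B}{D},
\qquad A = trTN,\; B = TN.
$$

For $D=3$, $r=2$, $t=0.5$ (so $tr=1$): the replicated design costs $(1/3+1)\,TN \approx 1.33\,TN$ per database, while the partitioned design costs $(2/3)\,TN \approx 0.67\,TN$. The difference is the write term $B$: partitioning divides it by $D$ as well, whereas replication pays the full write load on every database.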

24

Scalability (and Reliability) — Two-stage architecture for CINEMA

[Figure: stateless first-stage servers s1-s3 (with backup ex) route to second-stage master/slave clusters a1/a2 and b1/b2; a*@example.com maps to the a cluster, b*@example.com to the b cluster; sip:bob@example.com is rewritten to sip:bob@b.example.com]

_sip._udp.example.com    SRV 0 40 s1.example.com
                         SRV 0 40 s2.example.com
                         SRV 0 20 s3.example.com
                         SRV 1 0  ex.backup.com
_sip._udp.a.example.com  SRV 0 0  a1.example.com
                         SRV 1 0  a2.example.com
_sip._udp.b.example.com  SRV 0 0  b1.example.com
                         SRV 1 0  b2.example.com

- Request rate = f(#stateless, #groups)
- Bottleneck: CPU, memory, bandwidth?

25

Load Sharing — Result (UDP, stateless, no DNS, no mempool)

S   P   CPS
3   3   2800
2   3   2100
2   2   1800
1   2   1050
0   1   900

26

Lessons Learnt — Load sharing

- Non-uniform distribution: identifier distribution (bad hash function); call distribution => dynamically adjust
- Stateless proxy: S = 1050, P = 900 CPS; S3P3 => 10 million BHCA (busy-hour call attempts)
- Stateful proxy: S = 800, P = 650 CPS
- Registration (no auth): S = 2500, P = 2400 RPS; S3P3 => 10 million subscribers (1-hour refresh)
- Memory pool and thread-pool2/event-based further increase capacity (approx 1.8x)

27

Conclusions and future work

- Server scalability: non-blocking, processes/events/threads, resource management, optimizations
- Load sharing: DNS, identifier, two-stage
- Current and future work:
  - Measure process-pool performance for stateful
  - Optimize sipd: use thread-pool2/event-based (?); memory: use counted strings, clean after 200 (?); CPU: use hash tables
  - Presence, call-stateful and TLS performance (Vishal and Eilon)

Backup slides

29

Telephone scalability (PSTN: Public Switched Telephone Network)

[Figure: SS7 signaling network with telephone switches (SSP), signaling routers (STP) and databases (SCP, for freephone, calling card, ...) over a separate "bearer" network]

- Local telephone switch (class 5 switch): 10,000 customers, 20,000 calls/hour
- Regional telephone switch (class 4 switch): 100,000 customers, 150,000 calls/hour
- Signaling router (STP): 1 million customers, 1.5 million calls/hour
- Database (SCP): 10 million customers, 2 million lookups/hour

30

SIP server — Comparison with HTTP server

- Signaling-bound (vs data-bound); no file I/O (exceptions: scripts, logging); no caching: DB read and write frequencies are comparable
- Transactions: stateful wait for the response
- Depends on external entities: DNS, SQL database
- Transport: UDP in addition to TCP/TLS
- Goals: carrier-class scaling using commodity hardware; try not to customize/recompile the OS or implement (parts of) the server in the kernel (khttpd, AFPA)

31

Related work — Scalability for (web) servers

- Existing work: connection dispatcher; content/session-based redirection; DNS-based load sharing
- HTTP vs SIP: UDP+TCP; signaling is not bandwidth-intensive; no caching of responses; read/write ratio is comparable for the DB
- SIP scalability bottlenecks: signaling (chapter 4), real-time media data, gateway
- Mitigations: 302 redirect to a less loaded server; REFER the session to another location; signal upstream to reduce load

32

Related work — 3GPP (release 5)'s IP Multimedia core network Subsystem uses SIP

- Proxy-CSCF (call session control function): first contact in the visited network; 911 lookup; dialplan
- Interrogating-CSCF: first contact in the operator's network; locates the S-CSCF for REGISTER
- Serving-CSCF: user policy and privileges, session control service, registrar
- Connection to PSTN: MGCF and MGW

33

Server-based vs peer-to-peer

Reliability, failover latency:
- Server-based: DNS-based; depends on client retry timeout, DB replication latency, registration refresh interval
- P2P: DHT self-organization and periodic registration refresh; depends on client timeout, registration refresh interval

Scalability, number of users:
- Server-based: depends on the number of servers in the two stages
- P2P: depends on refresh rate, join/leave rate, uptime

Call setup latency:
- Server-based: one or two steps
- P2P: O(log(N)) steps

Security:
- Server-based: TLS, digest authentication, S/MIME
- P2P: additionally needs a reputation system and working around spy nodes

Maintenance, configuration:
- Server-based: administrator: DNS, database, middle-box
- P2P: automatic: one-time bootstrap node addresses

PSTN interoperability:
- Server-based: gateways, TRIP, ENUM
- P2P: interact with server-based infrastructure, or co-locate a peer node with the gateway

34

Comparison of sipd and SER

sipd:
- Thread pool; events (reactive system); memory pool
- PentiumIV 3 GHz, 1 GB => 1200 CPS, 2400 RPS (no auth)

SER:
- Process pool; custom memory management
- PentiumIII 850 MHz, 512 MB => 2000 CPS, 1800 RPS
