SIP Server Scalability
IRT Internal Seminar
Kundan Singh, Henning Schulzrinne and Jonathan Lennox
May 10, 2005
Agenda
- Why do we need scalability?
- Scaling the server: SIP Express Router (iptel.org); sipd (Columbia University); threads/processes/events
- Scaling using load sharing: DNS-based, identifier-based, two-stage architecture
- Conclusions
(27 slides)
Internet telephony (SIP: Session Initiation Protocol)
[Diagram: SIP call setup between bob@example.com and alice@yahoo.com. REGISTER updates the location database (DB) at the home proxy; an INVITE is routed via DNS and the yahoo.com/example.com proxies to the callee's current IP address (129.1.2.3 → 192.1.2.4).]
Scalability Requirements
Depends on role in the network architecture.
[Diagram: carrier network with SIP/MGC media gateway controllers and media gateways (MG) bridging the IP network and the PSTN; edge ISPs, a cybercafe, and an enterprise with a PBX, IP phones, and PSTN phones behind a PSTN gateway on T1 PRI/BRI trunks.]
- Edge ISP server: 10,000 customers
- Carrier (3G): 10 million customers
- Enterprise server: 1,000 customers
Scalability Requirements
Depends on traffic type.
- Registration (uniform arrival): authentication, mobile users
- Call routing (Poisson arrival): stateful vs. stateless proxy, redirect, programmable scripts
- Beyond telephony (arrival pattern unknown): instant messaging, presence (including sensors), device control
- Stateful calls (Poisson arrival, exponential call duration): firewall, conference, voicemail
- Transport type: UDP/TCP/TLS (cost of security)
SIPstone
SIP server performance metrics.
- Steady-state rate for successful registrations, call forwarding, and unsuccessful call attempts, measured using 15-minute test runs.
- Measure: number of requests/s sustained under a given delay constraint.
- Performance = f(#users, #DNS, UDP/TCP, g(request), L), where g = request type and arrival pdf (#requests/s) and L = whether logging is enabled.
- Test cases: register, outbound proxy, redirect, proxy480, proxy200.
- Parameters: measurement interval, transaction response time, RPS (registrations/s), CPS (calls/s), transaction failure probability < 5%.
- Delay budget: R1 < 500 ms, R2 < 2000 ms.
- Shortcomings: does not consider forking, scripting, Via headers, packet size, different call rates, or SSL. Is there a linear combination of results?
- Whitebox measurements: turnaround time. Extend to SIMPLEstone.
[Message flow diagram: Loader → Server (backed by a SQL database) → Handler. REGISTER and INVITE requests; 100 Trying, 180 Ringing, and 200 OK responses; ACK and BYE complete the call. R1 and R2 mark the measured response times.]
SIP server
What happens inside a proxy?
[Diagram: recvfrom or accept/recv → parse. For a response: match transaction, modify the response. For a request: match transaction (stateful path); REGISTER updates the DB and builds a response; other requests look up the DB and, if found, either redirect/reject or proxy (modify request, DNS lookup). All paths end in sendto, send, or sendmsg. Annotations mark (blocking) I/O and critical sections (lock, r/w lock); the stateless proxy path skips transaction matching.]
Lessons Learnt (sipd)
In-memory database.
- Call routing involves (≥ 1) contact lookups, roughly 10 ms per SQL query.
- Cache (FastSQL): loading the entire database is easy; periodic refresh; lookups drop to < 1 ms.
- Potentially useful for DNS lookups as well.
[Diagram: SQL database (with web-based configuration) → periodic refresh → in-memory cache answering lookups in < 1 ms.]
[2002: Narayanan] Single-CPU Sun Ultra 10; plot of turnaround time vs. RPS.
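The periodic-refresh idea above can be sketched as follows. This is a minimal illustration, not FastSQL's actual implementation; the class and parameter names are invented, and `load_all` stands in for the slow SQL query.

```python
import threading
import time

class RefreshingCache:
    """In-memory copy of a small database table, reloaded periodically in
    the background so that lookups on the fast path never touch SQL."""

    def __init__(self, load_all, refresh_interval=60.0):
        self._load_all = load_all        # callable returning {key: value}
        self._data = load_all()          # initial full load
        self._lock = threading.Lock()
        self._interval = refresh_interval
        threading.Thread(target=self._refresher, daemon=True).start()

    def _refresher(self):
        while True:
            time.sleep(self._interval)
            fresh = self._load_all()     # slow query, off the request path
            with self._lock:
                self._data = fresh       # swap in the new snapshot

    def lookup(self, key):
        with self._lock:                 # sub-millisecond, no I/O
            return self._data.get(key)
```

The tradeoff matches the slide: reads are served from memory at < 1 ms, at the cost of staleness bounded by the refresh interval.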
Lessons Learnt (sipd)
Thread-per-request does not scale.
- One thread per message does not scale: too many threads over a short timescale.
  - Stateless: 2-4 threads per transaction; stateful: 30 s holding time.
- Thread pool + queue: less thread overhead, more useful processing; pre-fork processes for SIP-CGI.
- Overload management: fail gracefully; drop requests in preference to responses.
  - Not enough if the holding time is high, since each request holds (blocks) a thread.
[Graph: throughput vs. load for incoming requests R1-R4 under thread-per-request, a fixed number of threads, and a thread pool with overload control; only the last sustains throughput past the overload point.]
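The thread-pool-plus-queue design with request dropping can be sketched as below. This is an illustrative simplification of the sipd approach, not its code; the class name and parameters are invented.

```python
import queue
import threading

class OverloadAwarePool:
    """Fixed-size worker pool fed by a bounded queue: under overload, new
    requests are rejected immediately instead of spawning more threads or
    letting the backlog grow without bound."""

    def __init__(self, workers=4, depth=100):
        self.q = queue.Queue(maxsize=depth)
        self.dropped = 0
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            task = self.q.get()
            try:
                task()                  # do the useful processing
            finally:
                self.q.task_done()

    def submit(self, task):
        try:
            self.q.put_nowait(task)     # never block the receive loop
            return True
        except queue.Full:
            self.dropped += 1           # graceful failure: drop the request
            return False
```

Dropping at admission keeps the receive loop non-blocking, which is why throughput flattens rather than collapses past overload.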
What is the best architecture?
- Event-based: reactive system.
- Process pool: each pool process receives a message and processes it to the end (SER).
- Thread pool:
  1. Receive and hand over to a pool thread (sipd).
  2. Each pool thread receives and processes to the end.
  3. Staged event-driven: each stage has a thread pool.
[Same proxy-internals diagram as on the "What happens inside a proxy?" slide.]
Stateless proxy
UDP, no DNS, six messages per call.
[Same proxy-internals diagram as above.]
Stateless proxy
UDP, no DNS, six messages per call.

Architecture  | 1×PentiumIV 3 GHz, 1 GB, Linux 2.4.20 (CPS) | 4×Pentium 450 MHz, 512 MB, Linux 2.4.20 (CPS) | 1×UltraSparc-IIi 300 MHz, 64 MB, Solaris (CPS) | 2×UltraSparc-II 300 MHz, 256 MB, Solaris (CPS)
Event-based   | 1650 | 370      | 150 | 190
Thread/msg    | 1400 | TBD      | 100 | TBD
Thread-pool1  | 1450 | 600 (?)  | 110 | 220 (?)
Thread-pool2  | 1600 | 1150 (?) | 152 | TBD
Process-pool  | 1700 | 1400     | 160 | 350

[Bar chart of the same data for the four hardware platforms.]
Stateful proxy
UDP, no DNS, eight messages per call.
- Event-based: a single thread acts as socket listener plus scheduler/timer.
- Thread-per-message: pool_schedule => pthread_create.
- Thread-pool1 (sipd): receive and hand over to a pool thread.
- Thread-pool2: N event-based threads, each handling a specific subset of requests (hash(Call-ID)); receive and hand over to the correct thread. Polling in multiple threads => bad on a multi-CPU machine.
- Process pool: not finished yet.
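The thread-pool2 dispatch step can be sketched as below. This is an illustration of the hash(Call-ID) idea, not sipd's code; the function name and hash choice are assumptions.

```python
import hashlib

def dispatch_worker(call_id: str, n_workers: int) -> int:
    """Pick the worker thread that owns all messages for this call.
    Hashing the Call-ID keeps every message of one dialog on the same
    thread, so its transaction state needs no cross-thread locking."""
    digest = hashlib.md5(call_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_workers
```

Because retransmissions and responses carry the same Call-ID, they always land on the thread that already holds the transaction state.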
Stateful proxy
UDP, no DNS, eight messages per call.

Architecture  | 1×PentiumIV 3 GHz, 1 GB, Linux 2.4.20 (CPS) | 4×Pentium 450 MHz, 512 MB, Linux 2.4.20 (CPS) | 1×UltraSparc-IIi 360 MHz, 256 MB, Solaris 5.9 (CPS) | 2×UltraSparc-II 300 MHz, 256 MB, Solaris 5.8 (CPS)
Event-based   | 1200 | 300       | 160 | 160
Thread/msg    | 650  | 175       | 90  | 120
Thread-pool1  | 950  | 340 (p=4) | 120 | 120 (p=4)
Thread-pool2  | 1100 | 500 (p=4) | 155 | 200 (p=4)
Process-pool  | -    | -         | -   | -

[Bar chart of the same data for the four hardware platforms.]
Lessons Learnt
What is the best architecture?
- Stateless: CPU is the bottleneck; memory use is constant. Process pool is the best. Event-based is not good for multi-CPU; thread/msg and thread-pool perform similarly; thread-pool2 comes close to process-pool.
- Stateful: memory can become the bottleneck. Thread-pool2 is good, but does not scale to N × CPU and is not good if the pool size exceeds the number of CPUs. Process pool may be better (?).
Lessons Learnt (sipd)
Avoid blocking function calls.
- DNS: 10-25 ms (29 queries). Caching raised throughput from 110 to 900 CPS. Internal vs. external non-blocking resolver.
- Logger: lazy logger as a separate thread:
  while (1) { lock; writeall; unlock; sleep; }
- Date formatter: strftime() was 10% of REGISTER processing; update the date variable once per second instead.
- random32(): cache the gethostid() result (37 s).
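The once-per-second date formatter can be sketched as below. This is an illustration of the technique, not sipd's code; the class name and header format are assumptions (RFC-style date shown for concreteness).

```python
import threading
import time

class CachedDate:
    """SIP Date headers only need one-second resolution, so format the
    timestamp at most once per second instead of on every message."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sec = None
        self._text = ""

    def now(self) -> str:
        sec = int(time.time())
        with self._lock:
            if sec != self._sec:        # at most one strftime() per second
                self._sec = sec
                self._text = time.strftime(
                    "%a, %d %b %Y %H:%M:%S GMT", time.gmtime(sec))
            return self._text
```

At thousands of messages per second this turns thousands of strftime() calls into one, which is where the claimed 10% of REGISTER processing went.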
Lessons Learnt (sipd)
Resource management.
- Socket management. Problems: OS descriptor limit (1024), "liveness" detection, retransmission. One socket per transaction does not scale. A global socket with soft state ("is the downstream server alive?") works for UDP; hard for TCP/TLS, so apply connection reuse there. Socket buffer size: 64 KB to 128 KB; tradeoff between memory per socket and number of sockets.
- Memory management. Problems: too many malloc/free calls, leaks. A memory pool holds transaction-specific memory that is freed once; also less memcpy. About 30% performance gain: stateful 650 → 800 CPS, stateless 900 → 1200 CPS.

Stateless processing time (µs):
Message         | INV | 180 | 200 | ACK | BYE | 200 | REG | 200
W/o mempool     | 155 |  67 |  67 |  95 | 139 |  62 | 237 |  70
W/ mempool      | 111 |  49 |  48 |  64 | 106 |  41 | 202 |  48
Improvement (%) |  28 |  27 |  28 |  33 |  24 |  34 |  15 |  31
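The transaction memory pool can be sketched as a bump allocator that is reset once per transaction. This is a minimal illustration of the "free once" idea, not sipd's allocator; the class name and chunk size are invented.

```python
class TransactionPool:
    """Per-transaction bump allocator: every allocation for one
    transaction comes out of a single chunk, and everything is released
    together with one reset(), replacing many malloc/free pairs."""

    def __init__(self, size=8192):
        self.buf = bytearray(size)
        self.used = 0

    def alloc(self, n: int) -> memoryview:
        if self.used + n > len(self.buf):
            raise MemoryError("transaction exceeded its pool")
        view = memoryview(self.buf)[self.used:self.used + n]
        self.used += n                  # bump the high-water mark
        return view

    def reset(self):
        self.used = 0                   # "free once" when the transaction ends
```

Besides fewer allocator calls, handing out views into one chunk is what enables the "less memcpy" point: parsed header fields can reference the buffer in place.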
Lessons Learnt (SER)
Optimizations.
- Reduce copying and string operations: data lumps, counted strings (+5-10%).
- Reduce URI comparison to local: user part as a keyword, use r2 parameters.
- Parser: lazy parsing (2-6x), incremental parsing; 32-bit header-name parser (2-3.5x); use padding to align; fast path for the general (canonicalized) case.
- Case-insensitive compare: hash table, sixth-bit trick.
- Database: cache is divided into domains for locking.
[2003: Jan Janak, "SIP proxy server effectiveness", Master's thesis, Czech Technical University]
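The sixth-bit trick behind SER's 32-bit header parser can be sketched as below. This is an illustration of the idea, not SER's C code (which compiles the comparisons into precomputed 32-bit constants); the function name is invented.

```python
def token_eq_nocase(x: bytes, y: bytes) -> bool:
    """Compare two header names case-insensitively, four bytes at a time.
    For ASCII letters, lower and upper case differ only in the sixth bit
    (0x20), so OR-ing both words with 0x20202020 folds the case before a
    single 32-bit compare. Valid only for alphanumeric/'-' token bytes."""
    if len(x) != len(y):
        return False
    pad = (-len(x)) % 4                 # pad to a 4-byte boundary
    x += b"\x00" * pad
    y += b"\x00" * pad
    for i in range(0, len(x), 4):
        wa = int.from_bytes(x[i:i + 4], "little") | 0x20202020
        wb = int.from_bytes(y[i:i + 4], "little") | 0x20202020
        if wa != wb:
            return False
    return True
```

The padding requirement is why the slide mentions "use padding to align": with names padded to word boundaries, the whole comparison is a handful of word loads instead of a byte loop.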
Lessons Learnt (SER)
Protocol bottlenecks and other scalability concerns.
- Protocol bottlenecks:
  - Parsing: order of headers, host names vs. IP addresses, line folding, scattered headers (Via, Route).
  - Authentication: reuse credentials in subsequent requests.
  - TCP: message length unknown until Content-Length is parsed.
- Other scalability concerns:
  - Configuration: broken digest clients, wrong passwords, wrong expires values.
  - Overuse of features: use stateless instead of stateful where possible; record-route only when needed; avoid an outbound proxy if possible.
Load Sharing
Distribute load among multiple servers.
- Single-server scalability has a maximum capacity limit.
- Multiple servers: DNS-based, identifier-based, network address translation, same IP address.
Load Sharing (DNS-based)
Redundant proxies and databases.
- REGISTER: write to both D1 and D2.
- INVITE: read from D1 or D2.
- Database write/synchronization traffic becomes the bottleneck.
[Diagram: proxies P1, P2, P3 in front of replicated databases D1 and D2; REGISTER and INVITE requests arrive at any proxy via DNS.]
Load Sharing (Identifier-based)
Divide the user space.
- Proxy and database run on the same host.
- A first-stage proxy may get overloaded: use many of them.
- Hashing: static vs. dynamic.
[Diagram: users a-h go to P1/D1, i-q to P2/D2, r-z to P3/D3.]
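The static identifier split can be sketched as below. This mirrors the a-h / i-q / r-z partitioning on the slide; the function name and range encoding are invented for illustration.

```python
def pick_partition(user: str, ranges=("a-h", "i-q", "r-z")) -> int:
    """Static identifier-based partitioning: route a user to the
    proxy/database pair owning the first letter of the user part."""
    first = user[0].lower()
    for idx, r in enumerate(ranges):
        lo, hi = r.split("-")
        if lo <= first <= hi:
            return idx
    raise ValueError(f"no partition owns {user!r}")
```

A split by first letter is easy to configure but is exactly the "bad hash function" risk noted later: real user names are not uniformly distributed over the alphabet, which is one argument for dynamic hashing.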
Load Sharing
Comparison of the two designs.
- Definitions: D = number of database servers; N = number of writes (REGISTER); r = #reads/#writes = (INV+REG)/REG; T = write latency; t = read latency / write latency.
- DNS-based (replicated): every write goes to all D databases and reads are spread across them, so the total time per DB is ((t·r/D) + 1)·T·N = A/D + B, with A = t·r·T·N and B = T·N.
- Identifier-based (partitioned): both reads and writes are divided across the D databases, so the total time per DB is ((t·r + 1)/D)·T·N = A/D + B/D.
- Tradeoff: partitioning gives lower reliability but higher scale, since the write term also shrinks with D.
[Diagram: replicated D1/D2 behind proxies P1-P3 vs. databases partitioned a-h / i-q / r-z.]
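Plugging illustrative numbers into the two expressions shows why partitioning scales further. The functions transcribe the slide's formulas; the sample values are hypothetical.

```python
def db_time_replicated(D, N, r, T, t):
    """Total time per database, DNS-based replication:
    all N writes hit every DB; the r*N reads are split D ways."""
    return ((t * r / D) + 1) * T * N

def db_time_partitioned(D, N, r, T, t):
    """Total time per database, identifier-based partitioning:
    reads *and* writes are split D ways."""
    return ((t * r + 1) / D) * T * N
```

With D = 1 the two are identical; as D grows, the replicated design keeps the constant write term B = T·N per database, which is the write/synchronization bottleneck from the DNS-based slide.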
Scalability (and Reliability)
Two-stage architecture for CINEMA.
- First stage: stateless servers s1-s3 (with backup ex), selected via DNS SRV; they rewrite sip:bob@example.com to sip:bob@b.example.com and forward to the group owning that user.
- Second stage: master/slave clusters a1/a2 (a*@example.com) and b1/b2 (b*@example.com).

example.com    _sip._udp  SRV 0 40 s1.example.com
                          SRV 0 40 s2.example.com
                          SRV 0 20 s3.example.com
                          SRV 1  0 ex.backup.com
a.example.com  _sip._udp  SRV 0  0 a1.example.com
                          SRV 1  0 a2.example.com
b.example.com  _sip._udp  SRV 0  0 b1.example.com
                          SRV 1  0 b2.example.com

- Request-rate = f(#stateless, #groups).
- Bottleneck: CPU, memory, or bandwidth?
Load Sharing
Result (UDP, stateless, no DNS, no mempool).

S | P | CPS
3 | 3 | 2800
2 | 3 | 2100
2 | 2 | 1800
1 | 2 | 1050
0 | 1 | 900
Lessons Learnt
Load sharing.
- Non-uniform distribution: identifier distribution (bad hash function); call distribution => adjust dynamically.
- Stateless proxy: S=1050, P=900 CPS; S3P3 => 10 million BHCA (busy-hour call attempts).
- Stateful proxy: S=800, P=650 CPS.
- Registration (no auth): S=2500, P=2400 RPS; S3P3 => 10 million subscribers (1-hour refresh).
- Memory pool and thread-pool2/event-based further increase capacity (approx. 1.8x).
Conclusions and future work
- Server scalability: non-blocking calls, processes/events/threads, resource management, optimizations.
- Load sharing: DNS-based, identifier-based, two-stage.
- Current and future work: measure process-pool performance for stateful; optimize sipd (use thread-pool2/event-based (?); memory: use counted strings, clean up after 200 (?); CPU: use hash tables); presence, call-stateful, and TLS performance (Vishal and Eilon).
Backup slides
Telephone scalability (PSTN: Public Switched Telephone Network)
[Diagram: "bearer" network with telephone switches (SSP), signaling network (SS7) with signaling routers (STP), and databases (SCP) for freephone, calling card, etc.]
- Local telephone switch (class 5): 10,000 customers, 20,000 calls/hour
- Regional telephone switch (class 4): 100,000 customers, 150,000 calls/hour
- Signaling router (STP): 1 million customers, 1.5 million calls/hour
- Database (SCP): 10 million customers, 2 million lookups/hour
SIP server
Comparison with an HTTP server.
- Signaling-bound (vs. data-bound): no file I/O (exceptions: scripts, logging); no caching, since DB read and write frequencies are comparable.
- Transactions: stateful wait for responses; depends on external entities (DNS, SQL database).
- Transport: UDP in addition to TCP/TLS.
- Goals: carrier-class scaling on commodity hardware; try not to customize/recompile the OS or implement (parts of) the server in the kernel (khttpd, AFPA).
Related work
Scalability for (web) servers.
- Existing work: connection dispatchers, content/session-based redirection, DNS-based load sharing.
- HTTP vs. SIP: UDP+TCP; signaling is not bandwidth-intensive; no caching of responses; read/write ratio is comparable for the DB.
- SIP scalability bottlenecks: signaling (chapter 4), real-time media data, gateways. Mitigations: 302 redirect to a less-loaded server, REFER the session to another location, signal upstream to reduce load.
Related work
3GPP (Release 5)'s IP Multimedia core network Subsystem uses SIP.
- Proxy-CSCF (call session control function): first contact in the visited network; 911 lookup; dialplan.
- Interrogating-CSCF: first contact in the operator's network; locates the S-CSCF on REGISTER.
- Serving-CSCF: user policy and privileges, session control services, registrar.
- Connection to the PSTN: MGCF and MGW.
Server-based vs. peer-to-peer
- Reliability, failover latency: server-based is DNS-based and depends on client retry timeout, DB replication latency, and registration refresh interval; a DHT self-organizes with periodic registration refresh, and failover depends on client timeout and registration refresh interval.
- Scalability (number of users): server-based depends on the number of servers in the two stages; P2P depends on refresh rate, join/leave rate, and uptime.
- Call setup latency: one or two steps vs. O(log N) steps.
- Security: TLS, digest authentication, S/MIME; P2P additionally needs a reputation system and a way to work around spy nodes.
- Maintenance, configuration: administrator-managed (DNS, database, middleboxes) vs. automatic (one-time bootstrap node addresses).
- PSTN interoperability: gateways, TRIP, ENUM; P2P must interact with server-based infrastructure or co-locate a peer node with the gateway.
Comparison of sipd and SER
- sipd: thread pool, events (reactive system), memory pool; PentiumIV 3 GHz, 1 GB => 1200 CPS, 2400 RPS (no auth).
- SER: process pool, custom memory management; PentiumIII 850 MHz, 512 MB => 2000 CPS, 1800 RPS.