
Extension to PerfCenter: A Modeling and Simulation Tool for Datacenter Application

Nikhil R. Ramteke
Advisor: Prof. Varsha Apte
Department of CSA, IISc

27th May 2011

Multi-tiered Networked Applications

Fig: A multi-tiered networked application (Web server, Auth server, DB server)

Important Performance Metrics:
• Response time
• Utilization
• Throughput
• Waiting time
• Queue length
• Arrival rate
• Blocking probability
• Average service time

Flow of a request through such a system

Fig: Flow of a "login" request through the Web Server, Auth Server, and DB Server instances deployed on Machines 1-9

PerfCenter

Fig: PerfCenter tool structure

• Performance modeling tool: it builds and solves a model of the system
• Takes the system details as input and builds the system model
• The system model is built as a network of queues
• The built model is solved either by simulation or by analytical methods
• Open source, available at: http://www.cse.iitb.ac.in/perfnet/softperf/cape/home/wosp2008page

PerfCenter (Input Language)

Host Specification:

host machine1[2]
ram 1000
cpu count 1
cpu buffer 99999
cpu schedP fcfs
cpu speedup 1
disk count 1
disk buffer 99999
disk schedP fcfs
disk speedup 1
...

end

Server Specification:

server web
thread count 1
thread buffer 9999
thread schedP fcfs
thread size 0.610
staticsize 100
requestsize 0.5
task node2
task node5
task node9
...

end

Feature Enhancements to PerfCenter (Problem Definition)

Among the various enhancements possible, our contributions are the following:

Memory model: Memory can be a bottleneck when deploying servers on a host.

Individual server utilization on a device:
• PerfCenter can predict the device utilization of a host,
• but it cannot estimate how much each server contributes to that utilization,
• This feature enables the user to find the bottleneck server quickly.

Timeouts and retries: Aimed at capturing user behavior such as "stop-reload".

Memory Usage Modeling

PerfCenter system model for memory usage:

Servers:
• Static size of the server
• Per-thread memory usage
• Per-request memory usage (increases with queue length)

Host:
• RAM size of each host

Input language specification:  

server web
staticsize 80
thread size 2
requestsize 2
end

host host1
ram 2000
end

Per-server RAM util = (Static size + Thread size × total threads + Request size × Avg. queue length of the request queue) / RAM size

Metrics:
• util(host_name:ram)              // overall RAM utilization
• util(host_name:server_name:ram)  // RAM utilization by one server
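For illustration only, a minimal Python sketch (not part of PerfCenter) of the per-server RAM utilization formula above, using the sample "web"/"host1" values; the average queue length is a placeholder that would normally come from the simulation output, and the RAM units are whatever the input file uses.

def ram_utilization(static_size, thread_size, n_threads,
                    request_size, avg_queue_length, ram_size):
    # util(host:server:ram) = (static + per-thread + queued-request memory) / RAM size
    used = (static_size
            + thread_size * n_threads
            + request_size * avg_queue_length)
    return used / ram_size

# Sample values from the slides; avg_queue_length = 5.0 is assumed for illustration.
print(ram_utilization(static_size=80, thread_size=2, n_threads=1,
                      request_size=2, avg_queue_length=5.0, ram_size=2000))
# -> 0.046, i.e. the web server accounts for about 4.6% of host1's RAM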

Software Design Changes Required for Memory model and Individual Server Utilization

Memory model:
• Added members static size, thread size, and request size to the software server class,
• Added member ram size to the host class,
• No change required to the dynamic statistics calculation in the simulation,
• Uses the average queue length calculated at the end of the simulation.

Individual server utilization of host devices:
• Must keep track of which server is issuing each request to the device (see the sketch below),
• Class member update: total busy time and utilization variables added to the software queue class,
• Some additional bookkeeping during simulation (per-server statistics).

Fig: Device queue in which each request (R) is tagged with the software server (S1, S2, S3) that issued it
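A minimal Python sketch of the bookkeeping just described (PerfCenter itself is written in Java; the class and method names here are hypothetical): whenever a device serves a request, the busy time is charged both to the device total and to the software server that issued the request.

from collections import defaultdict

class DeviceQueue:
    # Toy device (e.g. CPU or disk) that tracks busy time per issuing server.
    def __init__(self):
        self.total_busy_time = 0.0
        self.busy_time_by_server = defaultdict(float)

    def record_service(self, issuing_server, service_time):
        # Charge the service time to the device and to the issuing server.
        self.total_busy_time += service_time
        self.busy_time_by_server[issuing_server] += service_time

    def utilization(self, sim_duration):
        return self.total_busy_time / sim_duration

    def server_utilization(self, server, sim_duration):
        return self.busy_time_by_server[server] / sim_duration

# Usage: after a simulation of length T, utilization(T) gives the device's overall
# utilization and server_utilization("web", T) gives the web server's share of it.
cpu = DeviceQueue()
cpu.record_service("web", 0.6)
cpu.record_service("auth", 0.2)
print(cpu.utilization(10.0), cpu.server_utilization("web", 10.0))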

Timeouts and Retries:

• Characteristics of real users of server systems:
• Impatience: users abandon a request if the response is not received within their expected time,
• Retries: users often retry just after abandoning a request (e.g. "stop-reload" behavior in a Web browser).

This behavior is common in client-server applications.

Timeouts may affect system performance in the following ways:
• Reduction in throughput,
• Completed requests may have already timed out, so successful requests must be counted separately,
• Utilization may decrease due to lower throughput,
• Average response time may decrease due to the increase in request timeouts.

Timeouts and Retries:

When a request is submitted to an application, one of the following things can happen:

Fig: Possible outcomes for an arriving request at a server:
• Drop [drop rate (D)],
• Timeout in buffer [timeout-in-buffer rate (Tb)]: the request does not leave the queue immediately; when it is picked up by the software server, it is counted as failed,
• Timeout during service [badput (B)]: processing is not aborted immediately; it goes to completion, but the request is counted as failed,
• Successfully completed [goodput (G)].

A request that times out may be retried.

Timeouts and Retries: (PerfCenter system model)

The mean timeout value and its distribution are taken as input; the timeout value of each request is drawn from this distribution.

Input language:
loadparams
timeout distribution_name(distribution_parameters)
...
end

E.g.:
loadparams
timeout exp(0.5)
end
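As a small illustration (not taken from PerfCenter's code or documentation), this Python snippet shows how a loadparams line such as "timeout exp(0.5)" could translate into per-request timeout values; whether 0.5 is the rate or the mean of the exponential is an assumption here.

import random

def sample_timeout(param=0.5, param_is_rate=True):
    # Draw one request's timeout from an exponential distribution.
    rate = param if param_is_rate else 1.0 / param
    return random.expovariate(rate)

# Each arriving request gets its own timeout value:
print([round(sample_timeout(0.5), 3) for _ in range(5)])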

Timeouts and Retries: (PerfCenter system model)

Overall G, B, D, and Tb can now be estimated with PerfCenter as follows.

Output Language:

- gput()        // overall goodput
- bput()        // overall badput
- buffTimeout() // overall timeout-in-buffer rate
- droprate()    // overall drop rate
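To make these four metrics concrete, here is a self-contained toy simulation in Python (a sketch, not PerfCenter's simulation engine) of a single FCFS server with a finite buffer, Poisson arrivals, exponential service, and exponential per-request timeouts; it classifies every request in the same spirit as gput(), bput(), buffTimeout(), and droprate(). All parameter names and values are illustrative, and the results are returned as fractions of arrivals rather than rates.

import random
from collections import deque

def simulate(arrival_rate, service_rate, mean_timeout,
             buffer_size, n_requests, seed=42):
    # Classify requests of a single FCFS queue as goodput (G), badput (B),
    # timeout in buffer (Tb), or drop (D).
    random.seed(seed)
    g = b = tb = d = 0
    now = 0.0             # arrival clock
    free_at = 0.0         # time at which the server finishes its current job
    pick_times = deque()  # pick-up times of admitted requests still in the buffer

    for _ in range(n_requests):
        now += random.expovariate(arrival_rate)
        # Requests whose pick-up time has passed have left the buffer.
        while pick_times and pick_times[0] <= now:
            pick_times.popleft()
        if len(pick_times) >= buffer_size:
            d += 1                              # drop: buffer full on arrival
            continue
        deadline = now + random.expovariate(1.0 / mean_timeout)
        pick = max(now, free_at)                # FCFS pick-up time
        pick_times.append(pick)
        if pick >= deadline:
            tb += 1                             # timed out while still waiting
            continue                            # discarded: server time unchanged
        finish = pick + random.expovariate(service_rate)
        free_at = finish
        if finish <= deadline:
            g += 1                              # goodput: completed in time
        else:
            b += 1                              # badput: served, but too late
    total = float(n_requests)
    return {"gput": g / total, "bput": b / total,
            "buffTimeout": tb / total, "droprate": d / total}

# Example: open system, service rate 100, arrival rate 80, mean timeout 0.1.
print(simulate(arrival_rate=80, service_rate=100,
               mean_timeout=0.1, buffer_size=50, n_requests=100000))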

Timeouts and Retries: Software Design Changes:

• Added members timeout flag and mean timeout to the Request class,

• Added number of requests processed, number of requests timed out in buffer, number of requests timed out in service, goodput, badput, drop rate, and timeout-in-buffer rate to the scenario simulation class.

• No extra events are added.

Validation:

Validation was done using sanity checks: results should follow the expected rules and trends.

Scenario used for validation:

Input File

Type of system: Open
Service rate: 100
Arrival rate: varied from 10 to 100
Timeout rate: 10
Timeout distribution: Exponential
Requests simulated: 1,000,000
Number of repetitions: 20

Results

Fig: RAM utilization vs. arrival rate

Results

Fig: G, B, Tb, D vs. arrival rate

More requests time out in the buffer, so goodput decreases.

Results

Fig: Utilization and throughput vs. arrival rate

The utilization curve follows the throughput (G + B), which starts decreasing because more requests are timing out in the buffer.

Results

Fig: Individual server utilization vs. arrival rate

Utilization decreases due to more request timeouts.

Results

Fig: Average response time vs. arrival rate

Average response time decreases due to timeouts.

Summary of Work Done

Before midterm: background study
• Queuing theory,
• Simulation modeling,
• Performance issues of multi-tiered systems,
• PerfCenter.

After midterm: developed an abstraction and an input language, and updated the PerfCenter simulation engine for
• adding the memory model,
• updating the utilization model to report individual server utilization on a device,
• adding the timeouts and retries model.

Conclusion:

PerfCenter is a performance modeling tool, and with a few more useful features added, the most important being timeouts and retries, it can now offer performance analysts richer models. We validated our models using a test experiment. The illustrative results show how PerfCenter can be used to estimate application performance in the presence of the following features:
• Memory model,
• Individual server utilization,
• Timeouts and retries model.

As the results show, these features can change data center sizing plans.

Future work:
• Predicting G, B, Tb, and D for individual queuing systems,
• More validation to increase confidence in the tool,
• More features to increase the power of the tool.

References:

1. R. P. Verlekar, V. Apte, P. Goyal, and B. Aggarwal. PerfCenter: A methodology and tool for performance analysis of application hosting centers. MASCOTS '07: Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2007, pages 201-208.

2. Supriya Marathe, Varsha Apte, and Akhila Deshpande. PerfCenter: A performance modeling tool for application hosting centers. WOSP '08: Proceedings of the 7th International Workshop on Software and Performance, 2008.

3. Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications. PHI Learning Private Limited, Eastern Economy Edition, 2009.

References:

4. Averill M. Law and W. David Kelton. Simulation Modeling and Analysis. Tata McGraw-Hill, 2000.

5. Daniel A. Menasce and Virgilio A. F. Almeida. Scaling for E-Business: Technologies, Models, Performance and Capacity Planning. Prentice Hall PTR, 2000.

6. Supriya Marathe. Performance Modeling for Distributed Systems. Master's thesis, IIT Bombay, Mumbai, India, June 2008.

7. Puram Niranjan Kumar. Validation, Defect Resolution and Feature Enhancements of PerfCenter. Master's thesis, IIT Bombay, Mumbai, India, June 2008.