TRANSCRIPT
NetSoft 2015
Dependability Evaluation and Benchmarking of Network Function Virtualization Infrastructures
1st IEEE Conference on Network Softwarization (NetSoft) April 13-17, 2015, UCL, London
Best Conference Paper Award
D. Cotroneo, L. De Simone, A.K. Iannillo, A. Lanzaro, R. Natella Critiware s.r.l. and Federico II University of Naples, Italy
http://dx.doi.org/10.1109/NETSOFT.2015.7116123
Network Function Virtualization: A new paradigm
Source: “Network Functions Virtualisation – Introductory White Paper”, Issue 1
- Reduced costs, improved manageability, faster innovation
- Comparable performance and reliability?
Why is engineering reliable NFV challenging?
- Complex stack of off-the-shelf hardware and software components
- Exposure to several sources of hardware and software faults
- Lack of tools and methodologies for testing fault tolerance
[Figure: three virtualization stacks, each built from off-the-shelf hardware, a hypervisor/virtualization layer, VMs, and guest OSes, with question marks at every layer]

As a result, it is hard to trust the reliability of NFV services.
In this presentation:
- An experimental methodology for dependability benchmarking of NFV, based on fault injection
- A case study on a virtual IP Multimedia Subsystem (IMS), analyzing:
  - The impact of faults on performance and availability
  - The sensitivity to different types of faults
  - The pitfalls in the design of NFVIs
What is a dependability benchmark?
- A dependability benchmark evaluates a system in the presence of (deliberately injected) faults
  - Are NFV services still available and high-performing even when a fault is injected?
- The dependability benchmark includes:
  - measures (KPIs) for characterizing performance and availability
  - procedures, tools, and conditions under which the measures are obtained
Overview of the benchmarking process
1. Definition of workload, faultload, and measures
2. Fault injection experiments, iterated over several different faults:
   - Deployment of VNFs over the NFVI
   - Workload and VNF execution
   - Injection of the i-th fault
   - Data collection
   - Testbed clean-up
3. Computation of measures and reporting
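The experiment loop above can be sketched in code. This is a hypothetical outline, not an API from the paper; the deploy/inject/collect helpers are placeholder callables for real testbed operations:

```python
# Hypothetical sketch of the benchmarking loop; the helper callables are
# placeholders for real testbed operations, not an API from the paper.

def run_benchmark(faults, deploy, run_workload, inject, collect, cleanup):
    """Run one fault-injection experiment per fault and gather raw data."""
    raw = {}
    for fault in faults:          # iterated over several different faults
        deploy()                  # deployment of VNFs over the NFVI
        run_workload()            # workload and VNF execution
        inject(fault)             # injection of the i-th fault
        raw[fault] = collect()    # data collection
        cleanup()                 # testbed clean-up
    return raw                    # measures are then computed and reported

# Usage with no-op stubs:
noop = lambda *args: None
data = run_benchmark(["net_drop", "cpu_hog"],
                     noop, noop, noop, lambda: {"latency_ms": []}, noop)
```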
Benchmark measures
- The dependability benchmark measures the quality of service as perceived by NFV users:
  1. VNF latency
  2. VNF throughput
  3. VNF experimental availability
  4. Risk Score
- We compare fault-injected experiments against the QoS objectives and the fault-free experiment (the benchmark baseline)
VNF Latency and Throughput
[Figure: end points exchanging traffic with VNFs running on the virtualization layer over off-the-shelf hardware and software, under fault injection; latency measured from t_request to t_response]

VNF Latency: the time required to process a unit of traffic (such as a packet or a service request)

VNF Throughput: the rate of processed traffic (packets or service requests) per second
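As a minimal illustration of the two definitions (the timestamps and counts below are synthetic, not measurements from the paper):

```python
# Minimal illustration of the two KPI definitions; numbers are synthetic.

def vnf_latency_ms(t_request, t_response):
    """Time required to process one traffic unit, in milliseconds."""
    return (t_response - t_request) * 1000.0

def vnf_throughput(processed_units, duration_s):
    """Rate of processed traffic units per second."""
    return processed_units / duration_s

print(vnf_latency_ms(10.000, 10.120))   # roughly 120 ms for one request
print(vnf_throughput(3000, 60.0))       # 50 requests/s
```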
Characterization of VNF latency

[Figure: cumulative distribution of response latency (ms), comparing the fault-free run against fault-injected runs with good and bad performance; 50th and 90th percentiles marked, with the gap from QoS objectives highlighted]

Percentiles of the distribution are compared against QoS objectives, e.g.:
- 50th percentile ≤ 150 ms
- 90th percentile ≤ 250 ms
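This percentile check can be sketched with Python's statistics module; the latency samples below are made up for illustration:

```python
import statistics

# Sketch of the percentile-vs-QoS check; the latency samples are made up.

def percentile(samples, p):
    """p-th percentile (linear interpolation over the sorted sample)."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[p - 1]

def meets_qos(latencies_ms, p50_max=150.0, p90_max=250.0):
    return (percentile(latencies_ms, 50) <= p50_max and
            percentile(latencies_ms, 90) <= p90_max)

latencies = [80, 95, 110, 120, 140, 160, 180, 210, 240, 300]  # ms
print(meets_qos(latencies))  # this sample satisfies both objectives
```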
VNF Experimental Availability
[Figure: end points sending traffic through VNFs on the virtualization layer over off-the-shelf hardware and software, with fault injection applied]

Experimental availability: the percentage of traffic units that are successfully processed
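The definition translates directly into code (the counts below are illustrative):

```python
# Experimental availability: percentage of traffic units (e.g., SIP
# requests) successfully processed during the experiment.
# The counts below are illustrative.

def experimental_availability(successful_units, total_units):
    if total_units == 0:
        return 0.0
    return 100.0 * successful_units / total_units

print(experimental_availability(930, 1000))  # 93.0 (%)
```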
Risk Score
- The Risk Score is a single summary measure of the risk of experiencing service unavailability and/or performance failures
RS = weighted average, over all injected faults, of (% performance failures + % availability failures)
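One way to read the formula in code; the weights and failure percentages below are purely illustrative:

```python
# Risk Score as a weighted average, over all injected faults, of the
# percentage of performance failures plus availability failures.
# Weights and percentages below are purely illustrative.

def risk_score(experiments):
    """experiments: list of (weight, perf_fail_pct, avail_fail_pct)."""
    total_weight = sum(w for w, _, _ in experiments)
    return sum(w * (perf + avail)
               for w, perf, avail in experiments) / total_weight

rs = risk_score([
    (1.0, 10.0, 20.0),   # fault A: 10% perf failures, 20% avail failures
    (3.0, 0.0, 40.0),    # fault B, weighted 3x
])
print(rs)  # 37.5
```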
Benchmark faultload
I/O faults (injected at Host or VM level):
- Network frame receive/transmit: corruption, drop, delay
- Storage block read/write: corruption, drop, delay

Compute faults (injected at Host or VM level):
- CPU and memory: hogs, termination, code corruption, data corruption

- Faults in virtualized environments include disruptions in network and storage I/O traffic, in CPUs, and in memory
- A fault injector has been implemented as a set of kernel modules for VMware ESXi and Linux
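As a toy illustration of one compute fault type, a "CPU hog" can be emulated in user space. This is not the paper's injector (which consists of kernel modules for VMware ESXi and Linux), and Python's GIL limits a single process to roughly one core; the sketch only shows the idea of saturating CPU for a fixed interval:

```python
import threading
import time

# Toy user-space emulation of a "CPU hog" compute fault; the paper's
# actual injector is implemented as kernel modules. This sketch just
# burns CPU cycles for a fixed interval, then stops on its own.

def _burn(stop_at):
    while time.monotonic() < stop_at:
        pass  # busy-loop consuming CPU

def inject_cpu_hog(duration_s, workers=2):
    """Start `workers` busy-looping threads for `duration_s` seconds."""
    stop_at = time.monotonic() + duration_s
    threads = [threading.Thread(target=_burn, args=(stop_at,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    return threads

hogs = inject_cpu_hog(0.1)
for t in hogs:
    t.join()  # experiment window over; the hog terminates itself
```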
Benchmark workload
- The VNFs should be exercised using a representative workload
- Our dependability benchmarking methodology is not tied to a specific choice of workload
- Realistic workloads can be generated using load-testing and performance-benchmarking tools (e.g., Netperf)
Case study: Clearwater IMS
- Clearwater: an open-source, NFV-oriented implementation of the IP Multimedia Subsystem (IMS)
- In a first round of experiments, we test a replicated, load-balanced deployment over several VMs
- In a second round of experiments, we add automated recovery of VMs (a VMware HA cluster) to the setup
- We use SIPp to generate SIP call set-up requests
[Figure: replicated Clearwater servers deployed over VMware ESXi, with fault injection applied]
Fault injection test plan
- We inject faults in one of the physical host machines and in a subset of the VMs (Sprout and Homestead)
- We inject both I/O faults (network, storage) and compute faults (memory, CPU), both intermittent and permanent
- Each fault injection experiment has been repeated three times
- In total, 93 fault injection experiments have been performed
Experimental availability
- We computed performance and availability KPIs from the logs of the SIPp workload generator
- Faults have a strong impact on availability
- Compute faults and Sprout-VM faults have the strongest impact
VNF latency (by fault type)
[Figure: cumulative distribution of latency (ms, logarithmic scale) under I/O faults, compute faults, and fault-free conditions, with the QoS thresholds T50 = 150 ms and T90 = 250 ms marked]

More than 10% of requests exhibit a latency much higher than 250 ms.
Risk Score and problem determination
- The overall Risk Score (55%) is quite high and reflects the strong impact of faults
- The infrastructure was affected by a capacity problem: once a VM or host fails, the remaining replicas are not able to handle the SIP traffic
NFVI design choices have a big impact on reliability, e.g., the placement of VMs across hosts, the topology of virtual networks and storage, the allocation of CPUs and memory for VMs, etc.
Evaluating automated recovery mechanisms
[Figure: network throughput at the Sprout VM over time (0-300 s); the fault is injected and the VM recovers about one minute later; curves compare the fault-free run, a faulty run with load balancing only, and a faulty run with load balancing plus automated recovery]

Fault tolerance mechanisms require careful tuning, based on experimentation: in our experiments, automated VM recovery was too slow, and availability remained low.
Conclusion
- Performance and availability are critical concerns for NFV
- NFVIs are very complex, and making design choices is difficult
- We proposed a dependability benchmark that points out dependability issues and guides designers
- Future work will extend the evaluation to alternative virtualization technologies