Towards Practical Multi-layer Performance
Management in Cloud Networking and NFV
Infrastructures
COST 1304 MC meeting, Lovran, Croatia, Oct. 2015.
Kurt Tutschku
Blekinge Institute of Technology (BTH),
Faculty of Computing
Department of Communication Systems (DIKO)
Future Internet: Applications, Architecture
and Performance
Social Networks with Big Data
Internet of Things
Vehicular Networks
Many (Virtual) Networks! Everywhere!
Huge data centers for Clouds
Are these networks/clouds sustainable? What about:
• Greenness: Energy? Mineral resources?
• Economic durability: Lifespan? Investments?
• How can we operate these networks?
Virtualization is very much about
efficient use and sharing of
resources
Performance management is key
[Figure: Cloud Networking scenario. Labels: customers; IPv4-based Internet; enterprise network of CN operator; load manager; front end, business logic, back end; storage nodes; DPI or VPN middlebox; allow failover; advanced network controller (e.g. Floodlight OpenFlow controller); cloud orchestrator (e.g. OpenStack); possible coordination between cloud and network infrastructure.]
What are the Issues?
Multiple Slices / Tenants
• Cloud as a Service
The Infrastructure project of the European Public-Private-Partnership on Future Internet
Yes, BTH is here!
What are the Issues?
What is the E2E service experience?
Dimensions of Resource Management in Cloud Networking
• Vertical integration: layering concept (even recursive)
• Horizontal integration: end-to-end (multiple hops)
[Figure: layered resource stack. Compute infrastructure: application; cloud XaaS services (MS Azure, Amazon AWS, Oracle, …); operating system; virtualization; server hardware. Network infrastructure. Resource management dimensions: virtualization type (Type 1/2); hypervisor, scheduling; virtual machine resources; cloud / federation. Host detail: hypervisor (e.g. KVM); host kernel; host OS; CPU, RAM, I/O. Roles: user, admin.]
What are daily tasks in performance management in virtualized environments?
• The relationship between “inside (virtual appliance/user)” and “outside (infrastructure/admin)” is not obvious.
• It might become even less predictable when an “end-to-end view” or “service chaining” is considered!
[Figure labels: virtual appliances; infrastructure]
How can we establish relationships?
• What can we do with onboard instruments / “standard packages”? (Additional instrumentation is often (still) impossible!)
• How do these tools behave and where are their limits?
• Measurements of tests in an experiment:
– Hardware: E3-1230, 3.30 GHz, 8 cores, 8 GB RAM
– Host OS: Ubuntu Desktop 14.04 LTS, CentOS 6.6
– Virtualization: KVM
– Guest OS: Ubuntu 14.04 server
– VM configuration: 2 vCPUs
– Load monitoring: uptime, top, mpstat
– Load generators: stress/stress-ng
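On Linux, tools such as mpstat and top derive CPU utilization from kernel counters like those exposed in /proc/stat. The following is a minimal sketch, not the measurement code used in the experiment, of how a utilization percentage can be computed from two samples of the aggregate "cpu" line. The function names and the sample values in the usage note are illustrative assumptions.

```python
from typing import Dict

# Field order of a "cpu" line in Linux /proc/stat (see proc(5)).
# The trailing guest/guest_nice fields are skipped on purpose: the kernel
# already accounts guest time inside user/nice, so adding them would
# count that time twice.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse one 'cpu ...' line from /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer fields; pad the missing ones with zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def utilization_percent(before: str, after: str) -> float:
    """Return the non-idle share of CPU time between two samples, in percent."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: second[name] - first[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle = delta["idle"] + delta["iowait"]  # iowait is treated as idle time here
    return 100.0 * (total - idle) / total
```

For example, with the made-up samples `"cpu 100 0 50 800 50 0 0 0 0 0"` and `"cpu 200 0 100 1600 100 0 0 0 0 0"`, the function reports 15% utilization. Running the same calculation inside the guest and on the host gives the two values that the result slides compare.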
Some Results: mpstat
• Utilization when stressing one vCPU
[Figure: two panels (CentOS host, Ubuntu host). x-axis: % CPU utilization in guest; y-axis: % CPU utilization in host; legend: 10%, 20%, 30%, 40%, 50%, 100%.]
• Difficult to compare load across host operating systems!
• Large variations between multiple tests
• Variation might increase at lower load!?!
Some Results: mpstat
• Utilization when stressing two vCPUs
[Figure: two panels (CentOS host, Ubuntu host). x-axis: % CPU utilization in guest; y-axis: % CPU utilization in host; legend: 10%, 20%, 30%, 40%, 50%, 100%.]
• Tool may show even higher variation at full load under CentOS!
• Is Ubuntu more “predictable”?
More Results: uptime
• CPU load when stressing one vCPU
[Figure: two panels (Host: CentOS, Host: Ubuntu). x-axis: CPU load average in guest; y-axis: CPU load average in host; legend: 10%, 20%, 30%, 40%, 50%, 100%.]
• Very high variations at full load under CentOS!
• Ubuntu: more “direct” variation
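To relate the load average seen in the guest to the one seen on the host, the output of uptime has to be parsed on both sides. The sketch below assumes the usual Linux (procps) output format with a dot as decimal separator; the function names and the sample lines in the usage note are hypothetical.

```python
import re
from typing import Tuple

# Matches the tail of a Linux `uptime` line, e.g.
# " 10:15:01 up 3 days,  2:04,  2 users,  load average: 0.12, 0.34, 0.56"
_LOAD_RE = re.compile(
    r"load averages?:\s*(\d+\.\d+),?\s+(\d+\.\d+),?\s+(\d+\.\d+)"
)


def parse_uptime(line: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from an uptime line."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    one, five, fifteen = (float(group) for group in match.groups())
    return one, five, fifteen


def host_to_guest_ratio(host_line: str, guest_line: str) -> float:
    """Ratio of the 1-minute load average on the host to that in the guest."""
    host_load = parse_uptime(host_line)[0]
    guest_load = parse_uptime(guest_line)[0]
    if guest_load == 0.0:
        raise ValueError("guest load average is zero; ratio is undefined")
    return host_load / guest_load
```

Called with a host line ending in `load average: 0.50, 0.40, 0.30` and a guest line ending in `load average: 1.00, 0.80, 0.60`, `host_to_guest_ratio` returns 0.5. A ratio that changes with the load level or with the host OS is one way to express the differences noted above.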
More Results: top
• CPU load when stressing one vCPU
[Figure: two panels (Host: CentOS, Host: Ubuntu). x-axis: load average in guest; y-axis: load average in host; legend: 10%, 20%, 30%, 40%, 50%, 100%.]
• top might be more “predictable”
• Ubuntu shows some kind of “exponential limit” behavior
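The variation between repeated tests noted on the result slides can be put on a comparable scale with the coefficient of variation (sample standard deviation divided by the mean) per load level. This is a sketch of one possible way to quantify it, not the analysis behind the plots; the function names are illustrative, and any input data would have to come from actual measurement runs.

```python
from statistics import mean, stdev
from typing import Dict, List


def coefficient_of_variation(samples: List[float]) -> float:
    """Sample standard deviation relative to the mean (dimensionless)."""
    if len(samples) < 2:
        raise ValueError("need at least two repeated measurements")
    average = mean(samples)
    if average == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(samples) / average


def variation_by_level(runs: Dict[int, List[float]]) -> Dict[int, float]:
    """Map each guest load level (in percent) to the variation of the
    host-side readings observed over repeated tests at that level."""
    return {
        level: coefficient_of_variation(values)
        for level, values in sorted(runs.items())
    }
```

Comparing the resulting values across load levels, and between the CentOS and Ubuntu hosts, would show whether variation really grows at lower load and whether one host OS is more “predictable” than the other.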
A Framework for Performance Data Integration – Overall Concept
• Cloud Networking Control Application Layer (establishes a global Cloud Networking view)
– CN (Umbrella) Controller
– Cloud Orchestrator (e.g. OpenStack)
– Network Controller (e.g. OpenFlow Controller)
– CN Profiler (provides vertical and horizontal integration)
• Data Integration Layer (part of a future Cloud Networking Operating System)
– Data Representation and Storage Service (coherent data structure and semantics)
– Data Translation Service (provides interoperability by translation)
– Data Collection Service (reliable gathering of data)
• Cloud Networking Component Layer
– SDN-based OpenFlow switch
– IPv4/v6 router
– (Virtualized) Network Function Server
– Storage Server
– Cloud Computing Server
– Application Specific Virtual Machine
A Framework for Performance Data Integration – Data Integration Layer
• Data Representation and Storage Service
• Data Translation Service
– Translation Module A: QoE for VNF
– Translation Module B: Host Load on Compute Server using OS XYZ
– Translation Module C: Host Load on Compute Server using OS XYZ
• Data Collection Service
– Data Object A: QoE for VNF (from raw data)
– Data Object B: Host Load (OS XYZ) (from raw data)
– Data Object C: VM Load (Guest OS XYZ) (from raw data)
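The three services of the data integration layer can be sketched as a small pipeline: collected raw records are passed to a translation module selected by record type, and the resulting objects are stored in one common representation. All class names, field names and the normalization to a value between 0 and 1 are assumptions made for illustration; the slides do not specify a data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RawRecord:
    """Output of the data collection service (hypothetical structure)."""
    source: str                # e.g. a host name or VM identifier
    kind: str                  # selects the translation module
    payload: Dict[str, float]  # tool- and OS-specific raw values


@dataclass(frozen=True)
class DataObject:
    """Common representation kept by the storage service (assumed)."""
    source: str
    metric: str
    value: float               # load normalized to the range [0, 1]


Translator = Callable[[RawRecord], DataObject]


def translate_host_load(raw: RawRecord) -> DataObject:
    # Assumption: host load is the 1-minute load average per core, capped at 1.
    value = raw.payload["load1"] / raw.payload["cores"]
    return DataObject(raw.source, "host_load", min(value, 1.0))


def translate_vm_load(raw: RawRecord) -> DataObject:
    # Assumption: VM load is reported as a CPU utilization percentage.
    return DataObject(raw.source, "vm_load", raw.payload["cpu_percent"] / 100.0)


class DataTranslationService:
    """Dispatches each raw record to the module registered for its kind."""

    def __init__(self) -> None:
        self._modules: Dict[str, Translator] = {}

    def register(self, kind: str, module: Translator) -> None:
        self._modules[kind] = module

    def translate(self, raw: RawRecord) -> DataObject:
        if raw.kind not in self._modules:
            raise ValueError("no translation module for kind %r" % raw.kind)
        return self._modules[raw.kind](raw)


class DataStore:
    """Minimal in-memory stand-in for the representation and storage service."""

    def __init__(self) -> None:
        self._objects: List[DataObject] = []

    def put(self, obj: DataObject) -> None:
        self._objects.append(obj)

    def by_metric(self, metric: str) -> List[DataObject]:
        return [obj for obj in self._objects if obj.metric == metric]


def integrate(records: List[RawRecord],
              translator: DataTranslationService,
              store: DataStore) -> None:
    """Translate collected raw records and store the comparable objects."""
    for raw in records:
        store.put(translator.translate(raw))
```

Keeping one translation module per tool and operating system confines OS-specific interpretation to a single place, so that load descriptions from different hosts and guests end up on a comparable scale in the store.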
Summary
• Performance management in virtualized systems is of increased complexity due to additional dimensions: virtualization techniques and scope.
• Interpretation of simple observations/measurements is still complex or in its infancy. Reliable and reasonable tools are urgently needed to describe load.
• Significant differences for similar guest load among different operating systems.
• The term “network load” in virtual infrastructures might comprise computational as well as network load in physical infrastructures (vertical integration).
• Horizontal integration is not yet addressed.
• A data integration layer might enable comparability of load descriptions.
Thank you very much!
Questions?