SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Department of Measurement and Information Systems, Budapest University of Technology and Economics, Hungary. Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems. Imre Kocsis, ikocsis@mit.bme.hu. SERENE’14 Autumn School, 2014.10.14.

Upload: sereneworkshop

Post on 19-Jul-2015


Page 1: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Department of Measurement and Information Systems

Budapest University of Technology and Economics, Hungary

Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Imre Kocsis

ikocsis@mit.bme.hu

SERENE’14 Autumn School

2014.10.14.

Page 2: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

A View of Cyber-Physical Systems

Page 3: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Cyber-Physical Systems (CPSs)

Ubiquitous embedded and networked systems that can monitor and control the physical world with a high level of intelligence and dependability

Networked embedded systems everywhere

Clouds, „infusable” analytics, Big Data

Page 4: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

From embedded to CPS


Direct manual control, „closed world” engineering

Page 5: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

From embedded to CPS


Direct manual control, „closed world” engineering

Highly autonomous, „cyber” backend, environment, swarms, …


Page 7: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Cyber-Physical Systems

Different flavors

o NSF, EU, academia, industry…

Still: it is here

o From smart cities & IoT to self-driving cars

o Scalable, reconfigurable backend is a must

Health Care

Transportation

Energy

Page 8: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

„Classical” case for cloud computing: a brain for a CPS

Video surveillance

Citizen devices

Env. sensors …

Traffic control, situational awareness, deep analytics

Normal day

Disaster

See: Naphade et al. (IBM), „Smarter Cities and Their Innovation Challenges”, Computer, 2011

Elastic, reconfigurable computing

Reconfiguration

Page 9: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Converging domains

CPS

Cloud computing

Big Data

Page 10: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Detour 1: Cloud Computing

Page 11: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Cloud computing: leased resources

Source: http://cloud.dzone.com/articles/introduction-cloud-computing

Page 12: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Definition?

NIST 800-145

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Page 13: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Properties

On-demand self-service

Broad network access

Resource pooling

Rapid elasticity

Measured service


Page 14: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

On the provider side…

~?

Page 15: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Why is it good for the provider?

(Without CLT)

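$X_i$: independent random variables, each with mean $\mu$ and variance $\sigma^2$

Coefficient of variation: $CV(X_i) = \sigma / \mu$

Exp. value of sum: sum of exp. values

Variance of sum: sum of variances

$$CV(X_{sum}) = \frac{\sqrt{n\sigma^2}}{n\mu} = \frac{1}{\sqrt{n}} \cdot \frac{\sigma}{\mu} = \frac{1}{\sqrt{n}} \, CV(X_i)$$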
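The claim that the coefficient of variation of aggregate demand shrinks as $1/\sqrt{n}$ can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the Gaussian demand model and the values `mu = 10`, `sigma = 5` are illustrative assumptions.

```python
import math
import random
import statistics


def cv_of_sum_analytic(n: int, mu: float, sigma: float) -> float:
    """CV of the sum of n independent variables with mean mu and std dev sigma."""
    # Variance of the sum is n * sigma^2, mean of the sum is n * mu.
    return math.sqrt(n * sigma ** 2) / (n * mu)


def cv_of_sum_empirical(n: int, mu: float, sigma: float,
                        trials: int = 5000, seed: int = 1) -> float:
    """Estimate the same CV by simulating the aggregate demand of n tenants.

    Assumption: each tenant's demand is Gaussian; the result only relies on
    independence and a common mean and variance.
    """
    rng = random.Random(seed)
    sums = [sum(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]
    return statistics.pstdev(sums) / statistics.fmean(sums)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n,
              round(cv_of_sum_analytic(n, 10.0, 5.0), 4),
              round(cv_of_sum_empirical(n, 10.0, 5.0), 4))
```

Quadrupling the number of independent tenants halves the relative variability that the provider has to plan for.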

Page 16: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

„Statistical multiplexing”

Variance w.r.t. mean gets smaller

$1/\sqrt{n}$ falls quickly: smaller private clouds already benefit

Reality is a bit different

Source: http://en.wikipedia.org/wiki/Central_limit_theorem

Page 17: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Gartner, 2013

„For larger businesses with existing internal data centers, well-managed virtualized infrastructure and efficient IT operations teams, IaaS for steady-state workloads is often no less expensive, and may be more expensive, than an internal private cloud.”

„I need it now, and need it fast…”?

Page 18: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Parallelizable loads

More and more embarrassingly parallel, „scale-out” application categories exist

NYT TimesMachine: public domain archive

o Conversion to web-friendly format: Apache Hadoop, a few hundred VMs, 36 hours

In the cloud: costs the same as with one VM

Practically: „speedup for free”
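The „speedup for free” argument is simple arithmetic under per-VM-hour billing. The sketch below assumes a perfectly parallel job, billing proportional to VM-hours, and a hypothetical price; the figures are illustrative and are not the actual TimesMachine numbers.

```python
from typing import Tuple


def batch_cost(total_vm_hours: float, n_vms: int,
               price_per_vm_hour: float) -> Tuple[float, float]:
    """Return (wall_clock_hours, cost) for a perfectly parallel batch job.

    Assumptions: no coordination overhead, and billing is proportional to
    VM-hours consumed (no minimum charge, no rounding).
    """
    wall_clock_hours = total_vm_hours / n_vms
    cost = n_vms * wall_clock_hours * price_per_vm_hour
    return wall_clock_hours, cost


if __name__ == "__main__":
    # Hypothetical job: 3600 VM-hours of work at 0.10 currency units per VM-hour.
    for n in (1, 10, 100):
        hours, cost = batch_cost(3600.0, n, 0.10)
        print(f"{n:>3} VMs: {hours:7.1f} h wall clock, cost {cost:.2f}")
```

The cost column stays constant while the wall-clock time drops by the number of VMs, which is the point of the slide.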

Page 19: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Scaling resources

„Scale up”

„Scale out”

o Algorithmics?

o „webscale” technologies

Page 20: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Detour 2: Big Data

Page 21: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

1.) Big Data at Rest

Distributed storage

„Computation to data”

„At rest Big Data”

o No update

o No sampling

„Not true, but a very, very good lie!” (T. Pratchett, Night Watch)

Page 22: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

MapReduce (Apache Hadoop)

[Figure: MapReduce data flow. [key, value] pairs are read from the Distributed File System and processed by Map; the SHUFFLE step groups them into [key, [value, value, value]] lists; Reduce turns each group into an output [key, value] pair.]
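The Map, SHUFFLE and Reduce stages can be illustrated in a few lines. This is a single-process sketch of the programming model only, not the Apache Hadoop API; the helper names and the word-count example are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, object]


def map_phase(records: Iterable[object],
              mapper: Callable[[object], Iterable[Pair]]) -> List[Pair]:
    """Apply the mapper to every input record, collecting [key, value] pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle(pairs: Iterable[Pair]) -> Dict[Hashable, List[object]]:
    """Group values by key: [key, value] pairs become [key, [values]] lists."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[Hashable, List[object]],
                 reducer: Callable[[Hashable, List[object]], object]
                 ) -> Dict[Hashable, object]:
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[Hashable, object]:
    """Classic word count expressed as map, shuffle, reduce."""
    pairs = map_phase(lines, lambda line: [(word, 1) for word in str(line).split()])
    return reduce_phase(shuffle(pairs), lambda _key, values: sum(values))


if __name__ == "__main__":
    print(word_count(["smart city", "smart grid"]))
```

In a real deployment the map and reduce calls run on the nodes that hold the data blocks („computation to data”), and the shuffle is the only stage that moves data across the network.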

Page 23: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

2.) „Big Data in Motion”

Stream processing

Inherently scalable the same way

Page 24: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Streaming data

Sensor data

o From smart grid to turbine testing

Images

o Satellites: n TB/day

Web services

Network traffic

Trading

Page 25: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

The stream processor model

Source: Rajaraman, A., & Ullman, J. D. (2011). Mining of Massive Datasets. Cambridge: Cambridge University Press. p130

Page 26: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Design & composition

Source: International Technical Support Organization. IBM InfoSphere Streams: Harnessing Data in Motion. September 2010, p76

Page 27: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

When we have a WCET constraint…

Emphasis in „plain” Big Data: keeping step with ingress

o But largely the same for direct timeliness

No (direct) disk access

Memory: bounded

Per-tuple processing: bounded

Algorithmic patterns:

o Per-tuple processing

o Sliding window storage and processing

o Specialized sampling

• Gets ugly fast

o Various heuristics
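The sliding-window pattern shows how the „memory: bounded” and „per-tuple processing: bounded” constraints can be met together. The class below is a minimal sketch with hypothetical names; it keeps a fixed-size window and updates the mean in constant time per tuple.

```python
from collections import deque


class SlidingWindowMean:
    """Mean over the last `size` values with O(size) memory and O(1) work per tuple."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("window size must be positive")
        self._window = deque(maxlen=size)
        self._total = 0.0

    def update(self, value: float) -> float:
        """Consume one tuple and return the mean of the current window."""
        if len(self._window) == self._window.maxlen:
            # The oldest value is about to be evicted by append().
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self._total / len(self._window)


if __name__ == "__main__":
    window = SlidingWindowMean(size=3)
    for reading in [1.0, 2.0, 3.0, 4.0, 5.0]:
        print(reading, window.update(reading))
```

No disk access is needed and the cost per tuple does not depend on how much data has already passed through, which is what makes a per-tuple execution time bound plausible.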

Page 28: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Application classes

Source: International Technical Support Organization. IBM InfoSphere Streams: Harnessing Data in Motion. September 2010, p80

Page 29: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Takes on cyber-physical clouds: Cloud-in-CPS…

Page 30: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Converging domains

CPS

Cloud computing

Big Data

standard link

Intelligence Reconfigurability

Page 31: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Clouds in CPS – reality, not promise


SENSORS ACTUATORS

Page 32: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Architectural landscape


Page 33: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Takes on cyber-physical clouds: …CPS-in-cloud

Page 34: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Extending Apache VCL for CPS


Apache VCL

Virtualized Data Center

...

Virtual machines

Internet/CAN/LAN

Remote client

Reservation

Establishing connection

Remote desktop or terminal access

Page 35: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Proof of Concept


Time-shareable arrangements

Cloud-on-Cloud

Apache VCL

VCL management network

VCL public network

Cloud instance

Network-attached phys. devices

Experiment video stream

Page 36: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

„Cloud on Cloud” capability


Apache VCL

VCL management network

VCL public network

Apache VCL/OpenStack/...

CoC virtual networks

Page 37: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

„Cloud on Cloud” capability


Apache VCL

VCL management network

VCL public network

Apache VCL/OpenStack/...

CoC virtual networks

Bootstrap & capture XaaS

Hypervisors

Page 38: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

„Cloud on Cloud” (CoC)

With nested virtualization

We have…

o virtual ESXi

o VCL over VCL on that

Some restrictions apply; in VCL, no…

o storage virtualization

o network virtualization

o dynamic reservations

Page 39: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Integrating a field device: Raspberry Pi


Surprisingly popular

o In the target demographic

Almost a lab PC: rpi VCL module

Linux

o gentler learning curve

o In reservation: SSH access

Useful set of interfaces

ASM C scripting Java Wolfram

Page 40: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Integrating field devices?

Other device types: adapter computer needed

o E.g. a Raspberry Pi for an Arduino

o Scopes/spectrometers/…: already there

o Autonomous cameras/mesh GWs/…: already inside

Lab.pm: starting point, needs rework

o Field devices: „sanitization” is stronger concept

o Harder work - Pi: reset + read-only SD netboot


Page 41: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Container/VM

Future: field devices as true cloud hosts

Real-time/embedded virtualization is maturing

o Check out: Siemens Jailhouse

o Xen for ARM

o …

Also see: carrier clouds

Raspberry Pi already has containers!


Page 42: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Educational prototype


Page 43: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Immediate applications: cloud engineering

CoC: teaching virt. & cloud

o E.g. we use it for an ESXi lab;

o support for local VCL devel in progress

Real-life: faults, errors, failures

o CPS: performance!

Virtualization in the loop

o There are existing SWIFI tools…

o … and VCL can be a harness

Page 44: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Immediate applications: people & labs


Internet/CAN/LAN

Remote client

We have EE/CE in view; chemistry, biology, physics, …?

Page 45: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Trusting your cloud with deadlines - is it a good idea?

Page 46: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Clouds for demanding applications?

Standard infrastructure vs demanding application?

Page 47: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Clouds for demanding applications?

Virtual Desktop Infrastructure

Telecommunications

Extra-functional reqs: throughput, timeliness, availability

„Small problems” have high impact (soft real time)

Page 48: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Experimental setup

[Figure: lab testbed driven by test automation; a hypervisor hosting the workload together with an interference load; OS and hypervisor metrics collected; load levels labelled LO and HI]

Page 49: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Short transient faults – long recovery

8 sec platform overload

30 sec service outage

120 sec SLA violation

As if you unplug your desktop for a second...

Page 50: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Deterministic (?!) run-time in the public cloud...

Variance tolerable by overcapacity

Performance outage

intolerable by overcapacity

Page 51: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

The noisy neighbour problem

Hypervisor

Tenant Neighbor

Page 52: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Tenant-side measurability and observability

Hypervisor

Tenant Neighbor

Page 53: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Characterizing IaaS performance

Page 54: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

IaaS performance

HW not necessarily known

Unknown / uncontrollable deployment

Unknown / uncontrollable scheduling

„Noisy neighbors”

Also: management action performance?

Page 55: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

IaaS performance

Deployment decisions

o Should I use this cloud?

Capacity planning

o Type and amount of res.

Perf. prediction

o QoS to be expected

o And its deviations

Benchmarking!

Page 56: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Benchmarking (a pragmatic take on)

(De-facto) standard applications with well-defined execution metrics that may exercise specific subsystems to compare IT systems via said metrics.

Popular benchmarks: e.g. Phoronix Test Suite

Benchmarking as a Service: cloudharmony.com

Page 57: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Why traditional benchmarking is not enough

Stability

Homogeneity

Rare events

Repeatability?

o Provider/tenant

Micro/component benchmarks?

o Application sensitivity?

o Cloud functions (scale in and out)?
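Stability and homogeneity can be given simple numeric proxies when a benchmark is repeated on several instances of the same type. The sketch below is one possible way to do that, not the method of the talk: it uses the coefficient of variation over time per instance and across instance means, and the instance names and scores are made-up examples.

```python
from statistics import fmean, pstdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = fmean(values)
    return pstdev(values) / mean if mean else float("nan")


def variation_over_time(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Stability proxy: CV of repeated runs on each instance (0 = perfectly stable)."""
    return {vm: coefficient_of_variation(runs) for vm, runs in scores.items()}


def variation_across_instances(scores: Dict[str, List[float]]) -> float:
    """Homogeneity proxy: CV of the per-instance mean scores (0 = homogeneous)."""
    return coefficient_of_variation([fmean(runs) for runs in scores.values()])


if __name__ == "__main__":
    # Hypothetical benchmark scores for two instances of the same type.
    scores = {"vm-a": [100.0, 100.0, 100.0], "vm-b": [50.0, 150.0, 100.0]}
    print(variation_over_time(scores))
    print(variation_across_instances(scores))
```

In this example both instances have the same mean score, so a single averaged benchmark figure would hide the fact that one of them is far less stable. Rare events need separate treatment, since averages and standard deviations say little about the tail.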

Page 58: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Towards Measurement-Driven Resilience Design for Clouds

Page 59: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

A performance feature model + exp. behavior, homogeneity, stability

Li, Z., O'Brien, L., Cai, R., & Zhang, H. (2012). Towards a Taxonomy of Performance Evaluation of Commercial Cloud Services. In 2012 IEEE Fifth International Conference on Cloud Computing (pp. 344–351). IEEE. doi:10.1109/CLOUD.2012.74

Page 60: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Modeling IaaS performance experiments

Li, Z., O'Brien, L., Cai, R., & Zhang, H. (2012). Towards a Taxonomy of Performance Evaluation of Commercial Cloud Services. In 2012 IEEE Fifth International Conference on Cloud Computing (pp. 344–351). IEEE. doi:10.1109/CLOUD.2012.74

Page 61: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

„Cloud metrology” and its application

Full stack instrumentation

Full adaptive data acquisition

Fine-grained storage

Exploratory Data Analysis

Confirmatory Data Analysis

Mystery shoppers and routine exercises

Application sensitivity model

(Platform) fault model

Performance/capacity model

Structural defenses

Dynamic defenses

MONITORING

BENCHMARKING

Page 62: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Example: characterizing VDI „CPU Ready Time”

„Ready”: VM ready to run, but not scheduled

o VDI: „stutter”

Rare events

o Sampling

Needs fine granularity! + at least a few months

Very „wide” data

Result: ~QoE capacity + load

Big Data tooling
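Why rare events need fine granularity can be shown with a toy trace. The sketch below uses an entirely synthetic series (a flat baseline with one short burst, values in arbitrary milliseconds) and a hypothetical helper `downsample_mean`; it is not measured data.

```python
from statistics import fmean
from typing import List


def downsample_mean(samples: List[float], window: int) -> List[float]:
    """Replace each block of `window` consecutive samples with its mean."""
    return [fmean(samples[i:i + window]) for i in range(0, len(samples), window)]


def build_synthetic_trace() -> List[float]:
    """300 one-second samples: baseline of 2 ms with a 5-second burst of 400 ms."""
    trace = [2.0] * 300
    trace[120:125] = [400.0] * 5
    return trace


if __name__ == "__main__":
    trace = build_synthetic_trace()
    coarse = downsample_mean(trace, window=60)
    print("peak at 1 s granularity:", max(trace))
    print("peak at 60 s granularity:", round(max(coarse), 2))
```

The burst that a user would perceive as „stutter” is more than ten times smaller in the one-minute averages, so coarse monitoring data can miss exactly the events of interest. Fine-grained collection over months is what makes the data set large enough to call for Big Data tooling.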

Page 63: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

EDA: hypotheses from „visual tours” of the data

Cloud response time ~ nw delay

client ID ~ loc

Client locations

Does not scale for Big Data (yet)

Page 64: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Workflow? (As of now)

Classical tools

Slow EDA on Big Data

Interactive EDA on samples

statistics on samples

Big Data statistics

Hadoop, Storm, Cassandra, …

Page 65: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

The effect of CPS cloud backend instability

Page 66: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Experimental environment

[Figure: Host1 and Host2, each with a Workstation; components shown: OS_contr, OS_compute, OS_network, nimbus, superv1, superv2, CollectD, replay, Application]

Page 67: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Application topology

Redis spout

Gatherer1

Gatherer2

Aggregator

Timer spout

Sweeper

<ts, city, delay>

<city, delay>

Page 68: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Workload

Baseline workload

Start of stress End of stress

Page 69: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

CPU utilization

Page 70: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Process latency

Relationship with guest resource usage?

Page 71: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Correlation: 0.890
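A correlation figure like the one above relates two time-aligned series, here process latency and a guest resource usage metric. The slide does not state which coefficient was used; the sketch below assumes Pearson's r and uses toy numbers, not the measured data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy, made-up samples: guest CPU usage (%) and process latency (ms).
    cpu_usage = [20.0, 35.0, 50.0, 80.0, 95.0]
    latency_ms = [11.0, 14.0, 19.0, 30.0, 41.0]
    print(round(pearson_r(cpu_usage, latency_ms), 3))
```

A high coefficient only indicates a linear association between the two series; it does not by itself show that guest resource usage causes the latency.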

Page 72: SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems

Acknowledgements

Special thanks for the experimental environment and data go to our OpenStack Measurement „taskforce”:

Ágnes Salánki, Dávid Zilahi, Tamás Nádudvari, György Nádudvari, Gábor Kiss (BME) and

Gábor Urbanics (Quanopt Ltd, our spinoff)
