Is It Time to Go Global with Cloud Performance Management?


Upload: marat-zhanikeev

Post on 12-May-2015


Category:

Technology



DESCRIPTION

The topic of federated clouds has been under discussion for several years, yet in practice there is very little federation across large infrastructure providers today. One of the biggest causes of this delay is an insufficient understanding of how to share responsibility across data centers, providers, and so on. This study argues that understanding cloud performance at such a large scale is a crucial part of information support in federated clouds. It also discusses cloud performance measurement and modeling, as well as several practical ongoing projects and works in progress.

TRANSCRIPT

Page 1: Is It Time to Go Global with Cloud Performance Management?
Page 2: Is It Time to Go Global with Cloud Performance Management?


Mission Statement

1. federated clouds = diversification
2. many DCs and/or cloud providers
3. we care mostly about performance
4. practical solutions are needed

Marat Zhanikeev -- [email protected] Is It Time to Go Global with Cloud Performance Management? -- http://tinyurl.com/marat140328 2/30...


Page 3: Is It Time to Go Global with Cloud Performance Management?

Example: BizStore

Page 4: Is It Time to Go Global with Cloud Performance Management?

BizStore: One DC is Not Enough

• remember June 2013?
• most services today use vertical integration -- no diversity
  ◦ Hitachi does not share DCs with NEC
• regional diversity of one provider is bad
  ◦ how many Amazon DCs in Japan?

(The only possible) solution is to sign contracts with multiple DCs and manage them on the client side
  ◦ to be officially presented/released in April [01]

[01] myself+0 "High Availability Cloud Storage ... Social Graph ... Smart Distribution" NS研 (April 2014)

Page 5: Is It Time to Go Global with Cloud Performance Management?

BizStore: One DC is Not Enough

[Figure: the proposed software. Locations (Kansai, Kyushu, Okinawa) host data centers (DC1, DC2); the Osaka office, the Naha office, and an employee on a business trip each see the DCs at a different network distance. Employee A reaches a high availability data store (content / social metadata) through store APIs, and the store spans DC1, DC2, ... over a storage network.]

Page 6: Is It Time to Go Global with Cloud Performance Management?

BizStore: Store Diversification

• in software: not a priority list -- optimization engine!
• realtime performance monitoring, read/write optimization, etc.
• sub-file data unit -- chunks

[Figure: a user reaches storage (SSD, HDD) in DC1, DC2, ... over the network, at growing network distance.]
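A minimal sketch of what separates an optimization engine from a priority list, assuming each DC reports a recently measured throughput. The names `DataCenter`, `measured_mbps`, and `assign_chunks` are hypothetical and not part of BizStore; the greedy rule (each chunk goes to the DC that would finish it first) is only one possible optimization.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    measured_mbps: float      # hypothetical: latest monitored throughput
    queued_mb: float = 0.0    # data already assigned in this round


def assign_chunks(chunk_sizes_mb, dcs):
    """Greedy assignment: each chunk goes to the DC that would finish it first."""
    plan = []
    for size in chunk_sizes_mb:
        best = min(dcs, key=lambda dc: (dc.queued_mb + size) / dc.measured_mbps)
        best.queued_mb += size
        plan.append(best.name)
    return plan


dcs = [DataCenter("DC1", measured_mbps=80.0), DataCenter("DC2", measured_mbps=20.0)]
plan = assign_chunks([4.0] * 10, dcs)
```

A priority list would send every chunk to DC1; here the slower DC2 still receives a share once DC1's queue grows, so both links stay busy.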

Page 7: Is It Time to Go Global with Cloud Performance Management?

BizStore: Socially Aware Store

• content relevance based on social graph
• relevance is a distribution
• individual redundancy based on distribution
• other link types: same time, location, filetype, ...
• link strength != 1

[Figure: content in descending order of relevance. Labels: relevance distribution, redundancy (user setting), physical limit of redundancy, end of content.]

[Table labels: "There is a link between ... when a file is ..." with columns Created, Viewed, Edited, Deleted.]
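One way to turn a relevance distribution into per-item redundancy, as a sketch only. The linear scaling, the floor of one copy, and the names `replicas_per_item`, `user_redundancy`, and `num_dcs` are assumptions for illustration; the slides do not give the actual mapping.

```python
def replicas_per_item(relevance, user_redundancy, num_dcs):
    """Map relevance scores to replica counts.

    Assumptions (not from the slides): replicas scale linearly with
    relevance, every item keeps at least one copy, and the physical
    limit of redundancy is the number of available DCs.
    """
    limit = min(user_redundancy, num_dcs)      # physical limit of redundancy
    top = max(relevance.values()) or 1.0       # avoid division by zero
    ranked = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    return {
        item: max(1, min(limit, round(limit * score / top)))
        for item, score in ranked
    }


# Link strengths are fractional (link strength != 1), so scores are floats.
relevance = {"report.doc": 0.9, "photo.jpg": 0.6, "old.log": 0.05}
replicas = replicas_per_item(relevance, user_redundancy=4, num_dcs=3)
```

The user asks for 4 copies, but only 3 DCs exist, so the most relevant item gets 3 copies and the least relevant one keeps a single copy.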

Page 8: Is It Time to Go Global with Cloud Performance Management?

Example: Cloud Streaming

Page 9: Is It Time to Go Global with Cloud Performance Management?

Cloud Streaming: Fixing Problems

[Figure: traditional streaming, P2P streaming, adaptive streaming, and cloud streaming, each marked with the problems it fixes. Problems shown: congestion (flash crowds), unreliable throughput, unreliable sources.]

Page 10: Is It Time to Go Global with Cloud Performance Management?

Cloud Streaming: Design

[Figure: two designs side by side. P2P streaming: service provider (SP), tracker, parent peers, client. Cloud streaming: service provider (SP), VM population, current sources, client.]

[02] myself+0 "Multi-Source Stream Aggregation in the Cloud" Wiley Book on ACDN, Chapter 10 (2014)

Page 11: Is It Time to Go Global with Cloud Performance Management?

Practical Solutions for Federated Clouds

Page 12: Is It Time to Go Global with Cloud Performance Management?

A Shortlist of (S)olutions

1. S1: Nextgen traffic processors at DCs
2. S2: QoS Context and Performance Visualization at DCs
3. S3: Performance Modeling for Federated Clouds
4. S4: Client Side Traffic Boosting
5. .... definitely not a complete list

Page 13: Is It Time to Go Global with Cloud Performance Management?

Solution (S) 1: Nextgen Traffic Processors at DCs (work in progress)

Page 14: Is It Time to Go Global with Cloud Performance Management?

S1: Multicore Packet Capture

[Figure: a gateway switch between global networks and data center internals mirrors traffic to a capture manager, which spreads it over many CPU cores and storage.]

• multicore is the key
• multicore != traditional parallel processing [03]
• on-demand capture, DPI, heterogeneous tasks [04]

[03] myself+0 "...Multicore Capture in Data Center Forensics" ACM AISACCS-SFCS (June 2014)
[04] myself+0 "A Lock-Free Shared Memory Design for ... Multicore Packet Traffic Capture" IJNM (in print)

Page 15: Is It Time to Go Global with Cloud Performance Management?

S1: Multicore Hates Memory Locks

• lockfree design [04]: no messages, no memory locks

[Figure: a manager and a time manager share memory with capture processes pinned to Core 1, Core 2, Core 3, ..., Core X, each reading packets from PF_RING. The manager runs as one thread. Labels along its timeline: create, fork, assign, lifespan, stale check, process/wrap, wrap wait. The shared memory holds a double-linked list (DLL).]

[04] myself+0 "A Lock-Free Shared Memory Design for ... Multicore Packet Traffic Capture" IJNM (in print)
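To illustrate the "no messages, no memory locks" idea, here is a single-producer, single-consumer ring buffer. This is a generic lock-free pattern swapped in for illustration and is not confirmed to be the design in [04]. A real capture path would implement it in C over shared memory with memory barriers; the Python version only shows the ownership rule that makes locks unnecessary.

```python
class SpscRing:
    """Single-producer, single-consumer ring: no locks, no messages.

    The producer writes only `tail`, the consumer writes only `head`,
    so neither index ever has two writers.
    """

    def __init__(self, capacity):
        self.slots = [None] * (capacity + 1)   # one slot always stays empty
        self.head = 0   # next slot to read  (consumer-owned)
        self.tail = 0   # next slot to write (producer-owned)

    def push(self, packet):
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False            # full: drop instead of blocking
        self.slots[self.tail] = packet
        self.tail = nxt             # publish only after the slot is filled
        return True

    def pop(self):
        if self.head == self.tail:
            return None             # empty
        packet = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return packet


ring = SpscRing(capacity=2)
results = [ring.push(b"p1"), ring.push(b"p2"), ring.push(b"p3")]
drained = [ring.pop(), ring.pop(), ring.pop()]
```

When the ring is full the capture side drops the packet rather than waiting, so a slow consumer never stalls a capture core.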

Page 16: Is It Time to Go Global with Cloud Performance Management?

Solution (S) 2: DC Performance APIs

Page 17: Is It Time to Go Global with Cloud Performance Management?

S2: E2E QoS, M2M Patterns

[Figure: a probe (meter) observes users and clients and sends per-flow statistics over UDP to an analysis machine (merger, analyzer with history and state, profiler), which feeds a web application.]

• clean slate: capture QoS context [05]
• visualize user communities
• export via APIs to users and/or service providers

[05] myself+0 "A holistic community-based architecture for measuring E2E QoS at data centres" IJCSE (in print)

Page 18: Is It Time to Go Global with Cloud Performance Management?

S2: Has to be a Clean Slate

• has to be a clean slate!
• cisco, ntop, sflow are not feasible
• QoS context is something new
• (figure is vector, so, zoom in!)

[Figure: processing pipeline from probe to analyzer. The labels are summarized below.]

Probe (at the router, inside the data center infrastructure):
• captured fields: timestamp, source IP, source port, dest IP, dest port, protocol, packet size
• CRC24 maps each packet into a packet hash table with keys 0 ... 2^24; entries (#01, #02, ...) are linked in a DLL
• export over UDP, or export via a file

Export record (rows of 32 bits, byte offsets 0, 4, 8, 12, 16, 20, 24, ...):
• source port, dest port
• source IP
• destination IP
• start time (s), start time (us)
• data units: * | psize | pspace, repeated (the figure marks bit widths 1 and 11)
• psize: packet size; pspace: packet space (us)

Analysis machine:
• UDP RX with a 5 s buffer
• same record layout, with D (direction, 0 or 1) in place of *
• merger: find the flow from the opposite direction
• analyzer: reads and updates history and state
• ring buffer of data units per IP on internal networks

Statistics, per source-dest pair:

Statistic   Meaning
MinOWD      Global minimum OWD
MaxBatch    Max byte count of a packet burst
Bulks       Throughputs in flows
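The hash-table step and the merger's reverse lookup can be sketched as follows. Several details are assumptions: the CRC-24 variant (the OpenPGP parameters from RFC 4880 are used here; the slide only says CRC24), the choice of the 5-tuple as the hashed fields, and the byte layout in `flow_key`. The function names are hypothetical.

```python
import socket
import struct

TABLE_SLOTS = 2 ** 24  # the slide shows hash table keys up to 2^24


def crc24(data: bytes) -> int:
    """CRC-24 with the OpenPGP parameters (RFC 4880); the variant is an assumption."""
    crc = 0xB704CE
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


def flow_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """Pack the flow fields into 13 bytes (hypothetical layout, network byte order)."""
    return struct.pack(
        "!4sH4sHB",
        socket.inet_aton(src_ip), src_port,
        socket.inet_aton(dst_ip), dst_port,
        protocol,
    )


def reverse_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """Key of the same flow seen from the opposite direction (for the merger)."""
    return flow_key(dst_ip, dst_port, src_ip, src_port, protocol)


forward = ("10.0.0.1", 40000, "192.0.2.7", 443, 6)
backward = ("192.0.2.7", 443, "10.0.0.1", 40000, 6)
slot = crc24(flow_key(*forward))   # index into the 2^24-slot table
```

Because a 24-bit CRC always falls inside the table, no modulo step is needed; collisions would be resolved by walking the linked entries.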

Page 19: Is It Time to Go Global with Cloud Performance Management?

S2: But Payoff is Great!

[Figure: plot of OWD (ms) + TX time (x 0.1 ms), 0 to 4000, against batch size (bytes), 0 to 19200.]
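The batch size on the x-axis can be derived from psize/pspace data units. A rough sketch, assuming a burst is a run of packets whose spacing stays under a threshold; the threshold `gap_us` and the function name `max_batch` are assumptions, not values from the slides.

```python
def max_batch(units, gap_us=100):
    """Largest byte count of a packet burst (the MaxBatch statistic).

    `units` is a list of (psize, pspace) pairs: packet size in bytes and
    the space in microseconds since the previous packet. A burst is a run
    of packets whose spacing stays below `gap_us` (assumed threshold).
    """
    best = current = 0
    for psize, pspace in units:
        if pspace >= gap_us:
            current = 0          # a long gap starts a new burst
        current += psize
        best = max(best, current)
    return best


units = [(1500, 0), (1500, 12), (1500, 9), (400, 5000), (1500, 20)]
largest = max_batch(units)
```

The first three packets arrive back to back and form one 4500-byte burst; the 5000 us gap before the fourth packet starts a new, smaller one.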

Page 20: Is It Time to Go Global with Cloud Performance Management?

Solution (S) 3: Cloud Weather System (work in progress)

Page 21: Is It Time to Go Global with Cloud Performance Management?

S3: Cloud Weather System

[Figure: weather map with (high/low) pressure fronts, a typhoon, a drought, good weather, and bad weather.]

• continents: users, services [07]
• water: network
• weather, clouds, etc.: changes in performance
• droughts: insufficiency of infrastructure, users do not get enough capacity
• typhoons: basically, Flash Crowds in services, going viral, ...
• forecasting: possible with enough performance monitoring, similar to stock market

[07] myself+0 "Cloud Weather System as a Futuristic Performance Model" IEICE総合大会 (March 2013)
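As a toy illustration of forecasting from monitoring data, the sketch below uses an exponentially weighted moving average and flags a "typhoon" when load far exceeds the forecast. Both the smoothing method and the threshold rule are assumptions chosen for brevity; the slides do not specify a forecasting model.

```python
def forecast(samples, alpha=0.3):
    """Exponentially weighted moving average used as a one-step forecast."""
    estimate = samples[0]
    for value in samples[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def is_typhoon(history, latest, factor=3.0):
    """Flag a flash crowd when load exceeds the forecast by `factor` (assumed rule)."""
    return latest > factor * forecast(history)


requests_per_s = [100, 110, 95, 105, 100]   # made-up monitoring samples
calm = is_typhoon(requests_per_s, 120)
storm = is_typhoon(requests_per_s, 900)
```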

Page 22: Is It Time to Go Global with Cloud Performance Management?

Solution (S) 4: Mobile Throughput Boosters

Page 23: Is It Time to Go Global with Cloud Performance Management?

S4: Mobile Throughput Booster

• so far, only possible in wireless -- WiFi Direct

                        Single Connection                   Multipath
Singular Connectivity   Traditional Applications            Traditional Multipath
Multiple Connectivity   No known cases (wasted potential)   Group Communication, 3G/LTE/* + WiFi Direct (THIS PROPOSAL)

Page 24: Is It Time to Go Global with Cloud Performance Management?

S4: Group Resource Pooling

[Figure: a main client and two delegated clients each reach the content provider over their own 3G/LTE/* access (remote connectivity) and are linked to each other by local connectivity.]
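A minimal sketch of how a main client might divide one download among group members, assuming each member fetches a byte range over its own 3G/LTE link and hands it back over the local link. The proportional split, the rate figures, and the name `split_ranges` are assumptions for illustration.

```python
def split_ranges(total_bytes, rates):
    """Split [0, total_bytes) into one byte range per client.

    `rates` maps a client name to its assumed 3G/LTE throughput; range
    sizes are proportional to it. The last client absorbs rounding.
    """
    total_rate = sum(rates.values())
    names = list(rates)
    ranges, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = total_bytes
        else:
            end = start + int(total_bytes * rates[name] / total_rate)
        ranges[name] = (start, end)   # half-open range [start, end)
        start = end
    return ranges


ranges = split_ranges(1_000_000, {"main": 5.0, "delegate1": 3.0, "delegate2": 2.0})
```

The ranges are contiguous and cover the whole object, so the main client can reassemble the content by concatenating the parts in order.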

Page 25: Is It Time to Go Global with Cloud Performance Management?

S4: Converged Wireless Campus

[Figure: numbered workflow among the university, a student, and another student on campus. Labels: (1) develop, make secure (APP + CODE); (2) distribute (APP + CODE, API tokens); (3) meet and delegate (pass API tokens at delegation); (4) marked at the university.]

Page 26: Is It Time to Go Global with Cloud Performance Management?

Solution (S) 5: Over-the-Network Indexing

Page 27: Is It Time to Go Global with Cloud Performance Management?

S5: Indexing in Clouds

[Figure: two designs. Traditional panel labels: data, indexer, index, network, client. Stringex panel labels: data, indexer, index (read, write), Stringex client.]

Page 28: Is It Time to Go Global with Cloud Performance Management?

S5: Over-the-Network Optimization

• in short: throughput-centric network storage optimization [08]

[Figure: the Stringex client with a sync engine, an optimization step, and a local cache in front of the Stringex index. Numbered steps: (1) check, (2) use.]

[08] myself+0 "A New Practical Design for Browsable Over-the-Network Indexing" ISEEE (April 2014)

Page 29: Is It Time to Go Global with Cloud Performance Management?

S5: Performance

[Figure: throughput (log of bytes/doc), 2.55 to 3.25, against index size (log), 3.15 to 6.65, for Lucene and Stringex.]

Page 30: Is It Time to Go Global with Cloud Performance Management?

That’s all, thank you ...

Page 31: Is It Time to Go Global with Cloud Performance Management?

[01] myself+0 "High Availability Cloud Storage ... Social Graph ... Smart Distribution" NS研 (April 2014)
[02] myself+0 "Multi-Source Stream Aggregation in the Cloud" Wiley Book on ACDN, Chapter 10 (2014)
[03] myself+0 "...Multicore Capture in Data Center Forensics" ACM AISACCS-SFCS (June 2014)
[04] myself+0 "A Lock-Free Shared Memory Design for ... Multicore Packet Traffic Capture" IJNM (in print)
[05] myself+0 "A holistic community-based architecture for measuring E2E QoS at data centres" IJCSE (in print)
[06] myself+0 "Towards a Practical Method for Interactive Traffic Visualizations in Data Centers" SC研 (May 2014)
[07] myself+0 "Cloud Weather System as a Futuristic Performance Model" IEICE総合大会 (March 2013)
[08] myself+0 "A New Practical Design for Browsable Over-the-Network Indexing" ISEEE (April 2014)