
1

VMTorrent: Scalable P2P Virtual Machine Streaming

Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra, Jason Nieh, and Dan Rubenstein

2

VM Basics

• VM: software implementation of a computer
• Implementation stored in a VM image
• VM runs on a VMM
  – Virtualizes HW
  – Accesses image

[Figure: VM running on a VMM, which accesses the VM image]

3

Where is Image Stored?

[Figure: VM on a VMM; the location of the VM image is the question]

4

Traditionally: Local Storage

[Figure: VMM and VM with the image on local storage]

5

IaaS Cloud: on Network Storage

[Figure: VMM and VM with the VM image on network storage]

6

Can Be Primary

[Figure: VM image accessed in place over NFS/iSCSI from network storage]

e.g., OpenStack Glance, Amazon EC2/S3, vSphere network storage

7

Or Secondary

[Figure: VM image copied from network storage to local storage]

e.g., Amazon EC2/EBS, vSphere local storage

8

Either Way, No Problem Here

[Figure: a single VMM/VM served by network storage]

9

Here?

[Figure: many VMs launching from the same network storage]

Bottleneck!

10

Lots of Unique VM Images

54,784 unique images on EC2 alone*

*http://thecloudmarket.com/stats#/totals , 06 Dec 2012

11

Unpredictable Demand

• Lots of customers
• Spot-pricing
• Cloud-bursting

12

Don’t Just Take My Word

• “The challenge for IT teams will be finding ways to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1
• “scale limits are due to simultaneous loading rather than total number of nodes” 2
• Developer proposals to replace or supplement the VM launch architecture for greater scalability 3

1. http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops-7000008229/?s_cid=e539
2. http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup-129
3. https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images

13

Challenge: VM Launch in IaaS

• Minimize delay in VM execution
• Starting from the time the launch request arrives
• For lots of instances (scale!)

14

Naive Scaling Approaches

• Multicast
  – Setup, configuration, maintenance, etc. 1
  – ACK implosion
  – “multicast traffic saturated the CPU on [Etsy] core switches causing all of Etsy to be unreachable” 2

1. [El-Sayed et al., 2003; Hosseini et al., 2007]
2. http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication

15

Naive Scaling Approaches

• P2P bulk data download (e.g., Bit-Torrent)– Files are big (waste bandwidth)

–Must wait until whole file available (waste time)

– Network primary? Must store GB image in RAM!

16

Both Miss Big Opportunity

VM image access is
• Sparse
• Gradual

• Most of the image doesn’t need to be transferred
• Can start w/ just a couple of blocks

17

VMTorrent Contributions

• Architecture
  – Make (scalable) streaming possible: decouple data delivery from presentation
  – Make scalable streaming effective: profile-based image streaming techniques
• Understanding / Validation
  – Modeling for VM image streaming
  – Prototype & evaluation (not highly optimized)

18

Talk

• Make (scalable) streaming possible: decouple data delivery from presentation
• Make scalable streaming effective: profile-based image streaming techniques
• VMTorrent prototype & evaluation

(Modeling along the way)

19

Decoupling Data Delivery from Presentation

(Making Streaming Possible)

20

Generic Virtualization Architecture

• Virtual Machine Monitor virtualizes hardware
• Conducts I/O to the image through the file system

[Figure: VM on VMM on host hardware; the VMM reaches the VM image through the FS]

21

Cloud Virtualization Architecture

• Network backend used
  – Either to download the image
  – Or to access it via a remote FS

[Figure: the same stack, with a network backend between the FS and the VM image]

22

VMTorrent Virtualization Architecture

• Introduce custom file system
• Divide image into pieces
• But provide the appearance of a complete image to the VMM

[Figure: custom FS between the VMM and the network backend, presenting a complete image built from pieces]

23

Decoupling Delivery from Presentation

• VMM attempts to read piece 1
• Piece 1 is present, read completes

[Figure: custom FS holding pieces 0–8; piece 1 already local]

24

Decoupling Delivery from Presentation

• VMM attempts to read piece 0
• Piece 0 isn’t local, read stalls
• VMM waits for I/O to complete
• VM stalls

25

Decoupling Delivery from Presentation

• FS requests the piece from the backend
• Backend requests it from the network

26

Decoupling Delivery from Presentation

• Later, the network delivers piece 0
• Custom FS receives and updates the piece
• Read completes
• VMM resumes the VM’s execution

(This stall-and-resume read path is sketched below.)
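The read path of the last four slides can be captured in a few lines. This is a minimal illustration assuming a threaded piece store; the names PieceStore and StubBackend are hypothetical, and the actual prototype is a FUSE-based C file system (see the prototype slide):

```python
import threading

class PieceStore:
    """Presents a complete image while pieces arrive asynchronously."""
    def __init__(self, n_pieces, backend):
        self.backend = backend                        # fetches pieces from the network
        self.data = [None] * n_pieces
        self.ready = [threading.Event() for _ in range(n_pieces)]

    def deliver(self, idx, piece):
        # Called by the network backend when piece `idx` arrives.
        self.data[idx] = piece
        self.ready[idx].set()

    def read(self, idx):
        # Called on a VMM read. A present piece returns immediately; a
        # missing piece triggers a high-priority request and stalls only
        # this read until delivery.
        if not self.ready[idx].is_set():
            self.backend.request(idx, priority="high")
        self.ready[idx].wait()
        return self.data[idx]

class StubBackend:
    """Stand-in backend that 'delivers' a piece shortly after a request."""
    def __init__(self):
        self.store = None
    def request(self, idx, priority):
        threading.Timer(0.01, self.store.deliver,
                        args=(idx, b"piece-%d" % idx)).start()

backend = StubBackend()
store = PieceStore(9, backend)
backend.store = store
print(store.read(0))   # stalls ~10 ms, then prints b'piece-0'
```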

27

Decoupling Improves Performance

• Primary storage: no waiting for the image download to complete

28

Decoupling Improves Performance

• Secondary storage: no more writes or re-reads over the network, as w/ a remote FS

29

But Doesn’t Scale

Assuming a single server, the time to download a single piece is

t = W + S / (r_net / n)

• W : wait time for first bit
• S : piece size
• r_net : network speed
• n : # of clients

Transfer time: each client gets r_net / n of the server’s bandwidth

30

Read Time Grows Linearly w/ n

Assuming a single server, the time to download a single piece is

t = W + n * S / r_net

• W : wait time for first bit
• S : piece size
• r_net : network speed
• n : # of clients

Transfer time is linear in n (see the sketch below)
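To make the linear growth concrete, here is a minimal sketch of the single-server model; the wait time, piece size, and link rate below are illustrative assumptions, not numbers from the talk:

```python
# Single-server model from the slide: t = W + n * S / r_net.
# All parameter values below are made-up illustrations.

def piece_time_client_server(n, wait_s=0.05, piece_mb=0.25, r_net_mbs=12.5):
    """Each of n clients gets r_net / n of the server's bandwidth."""
    return wait_s + n * piece_mb / r_net_mbs

for n in (1, 10, 100):
    print(f"n={n:4d}  t={piece_time_client_server(n):.2f}s")
# n=1 -> 0.07s, n=10 -> 0.25s, n=100 -> 2.05s: the transfer term grows linearly.
```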

31

This Scenario

[Figure: a VM fetching pieces on demand from network storage; this configuration is labeled “csd”]

32

Decoupling Enables P2P Backend

• Alleviate the network storage bottleneck
• Exchange pieces w/ swarm
• P2P copy must remain pristine

[Figure: a P2P manager, holding its own copy of the pieces, sits behind the custom FS and exchanges pieces with the swarm]

33

Space Efficient

• FS uses pointers to the P2P image
• FS does copy-on-write (sketched below)

[Figure: FS pieces point into the P2P manager’s image; written pieces (6, 7) are private copies]
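A minimal sketch of this copy-on-write arrangement, assuming an in-memory piece store; CowImage is a hypothetical name, not the VMTorrent implementation:

```python
# The pristine P2P image stays shared with the swarm; the FS keeps
# private copies only for pieces the VM has written.

class CowImage:
    def __init__(self, p2p_pieces):
        self.p2p = p2p_pieces      # pristine pieces, still safe to serve to peers
        self.private = {}          # piece index -> locally modified copy

    def read(self, idx):
        # Prefer the private copy; otherwise "point" at the pristine piece.
        return self.private.get(idx, self.p2p[idx])

    def write(self, idx, data):
        # Never touch self.p2p: the P2P copy must remain pristine.
        self.private[idx] = data

pieces = [bytes([i]) * 4 for i in range(9)]
img = CowImage(pieces)
img.write(6, b"xxxx")
assert img.read(6) == b"xxxx" and img.read(5) == pieces[5]
```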

34

Minimizing Stall Time

• Non-local piece accesses trigger high-priority requests

[Figure: a demand miss on piece 4 escalates from the FS (“4?”) through the P2P manager (“4?”) to the swarm, which answers (“4!”)]

35

P2P Helps

Now, the time to download a single piece is

t = W(d) + S / r_net

• W(d) : wait time for first bit, as a function of d, the piece diversity
• S : piece size
• r_net : network speed
• n : # of peers

Wait is a function of diversity; transfer time is independent of n (see the sketch below)
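A companion sketch to the client-server model earlier; the talk leaves W(d) abstract, so the decreasing wait function below is a made-up placeholder used only to contrast the two regimes:

```python
# P2P model from the slide: t = W(d) + S / r_net.
# wait_of_d is a hypothetical stand-in for W(d); only its shape
# (shorter waits at higher diversity) matters here.

def piece_time_p2p(d, piece_mb=0.25, r_net_mbs=12.5):
    wait_of_d = 0.05 / max(d, 1e-3)           # assumed decreasing in diversity d
    return wait_of_d + piece_mb / r_net_mbs   # transfer term independent of n

for d in (1.0, 0.1, 0.01):
    print(f"diversity={d:5.2f}  t={piece_time_p2p(d):.2f}s")
# High diversity -> short wait; low diversity -> long wait (next slides).
```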

36

High Diversity → Swarm Efficiency

37

Low Diversity → Little Benefit

Nothing to share

38

P2P Helps, But Not Enough

All peers request the same pieces at the same time

t = W(d) + S / r_net

Low piece diversity → long wait (gets worse as n grows) → long download times

39

This Scenario

[Figure: demand-only fetch through the P2P manager; this configuration is labeled “p2pd”]

40

Profile-based Image Streaming Techniques

(Making Streaming Effective)

41

How to Increase Diversity?

Need to fetch pieces that are

• Rare: not yet demanded by many peers

• Useful: likely to be used by some peer

42

Profiling

• Need useful pieces
• But only a small % of the VM image is accessed
• We need to know which pieces are accessed
• Also, when (needed later for piece selection)

43

Build Profile

• One profile for each VM/workload
• Run one or more times (even online)
• Use FS to track
  – Which pieces are accessed
  – When pieces are accessed
• Entries w/ average appearance time, piece index, and frequency (sketched below)
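A hypothetical profile builder in this spirit, aggregating FS access traces from one or more runs into (average appearance time, piece index, frequency) entries; the trace format is an assumption:

```python
from collections import defaultdict

def build_profile(runs):
    """runs: one trace per run; each trace lists (piece_index, first_access_time)."""
    times = defaultdict(list)
    for trace in runs:
        for piece, t in trace:
            times[piece].append(t)
    # frequency = fraction of runs in which the piece was accessed
    profile = [(sum(ts) / len(ts), piece, len(ts) / len(runs))
               for piece, ts in times.items()]
    profile.sort()    # order entries by average appearance time
    return profile    # [(avg_time, piece_index, frequency), ...]

run1 = [(0, 0.0), (3, 0.4), (7, 1.2)]
run2 = [(0, 0.1), (3, 0.5)]
print(build_profile([run1, run2]))
# [(0.05, 0, 1.0), (0.45, 3, 1.0), (1.2, 7, 0.5)]
```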

44

Piece Selection

• Want pieces not yet demanded by many
• Don’t know the piece distribution in the swarm
• Guess others like self
• Profile gives an estimate of when pieces are likely needed

45

Piece Selection Heuristic

• Randomly (rarest-first) pick one of the first k pieces in the predicted playback window
• Fetch w/ medium priority (demand wins); see the sketch below
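A sketch of the heuristic under stated assumptions: profile is the sorted entry list from the previous sketch, and a plain random choice stands in for rarest-first (which would need swarm state):

```python
import random

def select_prefetch_piece(profile, have, now, k=8):
    """profile: [(avg_time, piece_index, frequency), ...] sorted by time;
    have: pieces already local; now: current position in the profile."""
    window = [piece for avg_time, piece, _freq in profile
              if avg_time >= now and piece not in have]
    if not window:
        return None
    return random.choice(window[:k])   # random pick stands in for rarest-first

profile = [(0.05, 0, 1.0), (0.45, 3, 1.0), (1.2, 7, 0.5)]
print(select_prefetch_piece(profile, have={0}, now=0.2))   # 3 or 7
# The chosen piece is fetched at medium priority, so demand misses still win.
```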

46

Profile-based Prefetching

• Increases diversity
• Helps even w/ no peers (when the ideal access rate exceeds the network rate)

47

Obtain Full P2P Benefit

Profile-based window-randomized prefetch:

t = W(d) + S / r_net

High piece diversity → short wait (shouldn’t grow much w/ n) → quick piece download

48

Full VMTorrent Architecture

[Figure: custom FS plus P2P manager driven by a profile; this configuration is labeled “p2pp”]

49

Prototype

50

VMTorrent Prototype

• Custom FS: custom C, using FUSE
• P2P manager: custom C++ & libtorrent

[Figure: prototype components attached to a BitTorrent swarm]

51

Evaluation Setup

52

Testbeds

• Emulab [White et al., 2002]
  – Instances on 100 dedicated hardware nodes
  – 100 Mbps LAN
• VICCI [Peterson et al., 2011]
  – Instances on 64 vserver hardware node slices
  – 1 Gbps LAN

53

VMs

54

Workloads

• Short VDI-like tasks
• Some CPU-intensive, some I/O-intensive

55

Assessment

• Measured total runtime
  – Launch through shutdown
  – (Easy to measure)
• Normalized against memory-cached execution (example below)
  – Ideal runtime for that set of hardware
  – Allows easy cross-comparison
    • Different VM/workload combinations
    • Different hardware platforms
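Concretely, the normalization is a simple ratio; the numbers below are made up for illustration:

```python
# Normalized runtime: measured total runtime divided by the runtime of a
# memory-cached execution on the same hardware (values are illustrative).
total_runtime_s = 140.0     # launch through shutdown, streaming the image
cached_runtime_s = 100.0    # same VM/workload with the image fully cached
print(f"normalized runtime: {total_runtime_s / cached_runtime_s:.2f}x")  # 1.40x
```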

56

Evaluation

57

100 Mbps Scaling

[Figure: normalized runtime vs. number of instances; runtime starting to increase at larger scales]

58

Due to Decreased Diversity

# peers increases → more demand requests to the seed → less opportunity to build diversity → longer to reach max swarming efficiency + lower max

We optimized too much for single instance!

(choosing to let demand requests take precedence)

61

(Some) Future Work

• Piece selection for better diversity
• Improved profiling
• DC-specific optimizations

Current work is already orders of magnitude better than the state of the art

62

Demo

(video omitted for space)

63

See Paper for More Details

• Modeling
  – Playback process dynamics
  – Buffering (for prefetch)
  – Full characterization of r, incorporating the impact of centralized and distributed models on W
  – Other elided details
• Plus
  – More architectural discussion!
  – Lots more experimental results!

64

Summary

• Scalable VM launching is needed
• VMTorrent addresses this by
  – Decoupling data presentation from streaming
  – Profile-based VM image streaming
• Straightforward techniques and implementation, no special optimizations for the DC
• Performance much better than the state of the art
  – Hardware evaluation on multiple testbeds
  – As predicted by modeling
