
1

VMTorrent: Scalable P2P Virtual Machine Streaming

Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra, Jason Nieh, and Dan Rubenstein

2

VM Basics

• VM: software implementation of a computer
• Implementation stored in a VM image
• VM runs on a VMM
  – Virtualizes HW
  – Accesses image

[Figure: VM running on a VMM, which accesses the VM image]

3

Where is Image Stored?

[Figure: VM on a VMM; the location of the VM image is the question]

4

Traditionally: Local Storage

[Figure: VMM and VM with the image on local storage]

5

IaaS Cloud: on Network Storage

[Figure: VMM and VM with the VM image on network storage]

6

Can Be Primary

[Figure: VM image accessed in place over NFS/iSCSI from network storage]

e.g., OpenStack Glance, Amazon EC2/S3, vSphere network storage

7

Or Secondary

[Figure: VM image copied from network storage to local storage]

e.g., Amazon EC2/EBS, vSphere local storage

8

Either Way, No Problem Here

[Figure: a single VMM/VM served by network storage]

9

Here?

[Figure: many VMs launching from the same network storage]

Bottleneck!

10

Lots of Unique VM Images

54,784 unique images on EC2 alone*

*http://thecloudmarket.com/stats#/totals , 06 Dec 2012

11

Unpredictable Demand

• Lots of customers
• Spot-pricing
• Cloud-bursting

12

Don’t Just Take My Word

• “The challenge for IT teams will be finding ways to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1
• “scale limits are due to simultaneous loading rather than total number of nodes” 2
• Developer proposals to replace or supplement the VM launch architecture for greater scalability 3

1. http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops-7000008229/?s_cid=e539
2. http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup-129
3. https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images

13

Challenge: VM Launch in IaaS

• Minimize delay in VM execution
• Starting from the time the launch request arrives
• For lots of instances (scale!)

14

Naive Scaling Approaches

• Multicast
  – Setup, configuration, maintenance, etc. 1
  – ACK implosion
  – “multicast traffic saturated the CPU on [Etsy] core switches causing all of Etsy to be unreachable” 2

1. [El-Sayed et al., 2003; Hosseini et al., 2007]
2. http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication

15

Naive Scaling Approaches

• P2P bulk data download (e.g., Bit-Torrent)– Files are big (waste bandwidth)

–Must wait until whole file available (waste time)

– Network primary? Must store GB image in RAM!

16

Both Miss Big Opportunity

VM image access is
• Sparse
• Gradual

• Most of the image doesn’t need to be transferred
• Can start w/ just a couple of blocks

17

VMTorrent Contributions

• Architecture
  – Make (scalable) streaming possible: decouple data delivery from presentation
  – Make scalable streaming effective: profile-based image streaming techniques
• Understanding / Validation
  – Modeling for VM image streaming
  – Prototype & evaluation (not highly optimized)

18

Talk

• Make (scalable) streaming possible: decouple data delivery from presentation
• Make scalable streaming effective: profile-based image streaming techniques
• VMTorrent prototype & evaluation

(Modeling along the way)

19

Decoupling Data Delivery from Presentation

(Making Streaming Possible)

20

Generic Virtualization Architecture

• Virtual Machine Monitor virtualizes hardware
• Conducts I/O to the image through the file system

[Figure: VM on VMM on host hardware; the VMM reaches the VM image through the FS]

21

Cloud Virtualization Architecture

• Network backend used
  – Either to download the image
  – Or to access it via a remote FS

[Figure: the same stack, with a network backend between the FS and the VM image]

22

VMTorrent Virtualization Architecture

• Introduce custom file system
• Divide image into pieces
• But provide the appearance of a complete image to the VMM

[Figure: custom FS between the VMM and the network backend, presenting a complete image built from pieces]

23

Decoupling Delivery from Presentation

• VMM attempts to read piece 1
• Piece 1 is present, read completes

[Figure: custom FS holding pieces 0–8; piece 1 already local]

24

Decoupling Delivery from Presentation

• VMM attempts to read piece 0
• Piece 0 isn’t local, read stalls
• VMM waits for I/O to complete
• VM stalls

25

Decoupling Delivery from Presentation

• FS requests the piece from the backend
• Backend requests it from the network

26

Decoupling Delivery from Presentation

• Later, the network delivers piece 0
• Custom FS receives and updates the piece
• Read completes
• VMM resumes the VM’s execution

(This stall-and-resume read path is sketched below.)
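The read path of the last four slides can be captured in a few lines. This is a minimal illustration assuming a threaded piece store; the names PieceStore and StubBackend are hypothetical, and the actual prototype is a FUSE-based C file system (see the prototype slide):

```python
import threading

class PieceStore:
    """Presents a complete image while pieces arrive asynchronously."""
    def __init__(self, n_pieces, backend):
        self.backend = backend                        # fetches pieces from the network
        self.data = [None] * n_pieces
        self.ready = [threading.Event() for _ in range(n_pieces)]

    def deliver(self, idx, piece):
        # Called by the network backend when piece `idx` arrives.
        self.data[idx] = piece
        self.ready[idx].set()

    def read(self, idx):
        # Called on a VMM read. A present piece returns immediately; a
        # missing piece triggers a high-priority request and stalls only
        # this read until delivery.
        if not self.ready[idx].is_set():
            self.backend.request(idx, priority="high")
        self.ready[idx].wait()
        return self.data[idx]

class StubBackend:
    """Stand-in backend that 'delivers' a piece shortly after a request."""
    def __init__(self):
        self.store = None
    def request(self, idx, priority):
        threading.Timer(0.01, self.store.deliver,
                        args=(idx, b"piece-%d" % idx)).start()

backend = StubBackend()
store = PieceStore(9, backend)
backend.store = store
print(store.read(0))   # stalls ~10 ms, then prints b'piece-0'
```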

27

Decoupling Improves Performance

• Primary storage: no waiting for the image download to complete

28

Decoupling Improves Performance

• Secondary storage: no more writes or re-reads over the network, as w/ a remote FS

29

But Doesn’t Scale

Assuming a single server, the time to download a single piece is

t = W + S / (r_net / n)

• W : wait time for first bit
• S : piece size
• r_net : network speed
• n : # of clients

Transfer time: each client gets r_net / n of the server’s bandwidth

30

Read Time Grows Linearly w/ n

Assuming a single server, the time to download a single piece is

t = W + n * S / r_net

• W : wait time for first bit
• S : piece size
• r_net : network speed
• n : # of clients

Transfer time is linear in n (see the sketch below)
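To make the linear growth concrete, here is a minimal sketch of the single-server model; the wait time, piece size, and link rate below are illustrative assumptions, not numbers from the talk:

```python
# Single-server model from the slide: t = W + n * S / r_net.
# All parameter values below are made-up illustrations.

def piece_time_client_server(n, wait_s=0.05, piece_mb=0.25, r_net_mbs=12.5):
    """Each of n clients gets r_net / n of the server's bandwidth."""
    return wait_s + n * piece_mb / r_net_mbs

for n in (1, 10, 100):
    print(f"n={n:4d}  t={piece_time_client_server(n):.2f}s")
# n=1 -> 0.07s, n=10 -> 0.25s, n=100 -> 2.05s: the transfer term grows linearly.
```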

31

This Scenario

[Figure: a VM fetching pieces on demand from network storage; this configuration is labeled “csd”]

32

Decoupling Enables P2P Backend

• Alleviate the network storage bottleneck
• Exchange pieces w/ swarm
• P2P copy must remain pristine

[Figure: a P2P manager, holding its own copy of the pieces, sits behind the custom FS and exchanges pieces with the swarm]

33

Space Efficient

• FS uses pointers to the P2P image
• FS does copy-on-write (sketched below)

[Figure: FS pieces point into the P2P manager’s image; written pieces (6, 7) are private copies]
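A minimal sketch of this copy-on-write arrangement, assuming an in-memory piece store; CowImage is a hypothetical name, not the VMTorrent implementation:

```python
# The pristine P2P image stays shared with the swarm; the FS keeps
# private copies only for pieces the VM has written.

class CowImage:
    def __init__(self, p2p_pieces):
        self.p2p = p2p_pieces      # pristine pieces, still safe to serve to peers
        self.private = {}          # piece index -> locally modified copy

    def read(self, idx):
        # Prefer the private copy; otherwise "point" at the pristine piece.
        return self.private.get(idx, self.p2p[idx])

    def write(self, idx, data):
        # Never touch self.p2p: the P2P copy must remain pristine.
        self.private[idx] = data

pieces = [bytes([i]) * 4 for i in range(9)]
img = CowImage(pieces)
img.write(6, b"xxxx")
assert img.read(6) == b"xxxx" and img.read(5) == pieces[5]
```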

34

Minimizing Stall Time

• Non-local piece accesses trigger high-priority requests

[Figure: a demand miss on piece 4 escalates from the FS (“4?”) through the P2P manager (“4?”) to the swarm, which answers (“4!”)]

35

P2P Helps

Now, the time to download a single piece is

t = W(d) + S / r_net

• W(d) : wait time for first bit, as a function of d, the piece diversity
• S : piece size
• r_net : network speed
• n : # of peers

Wait is a function of diversity; transfer time is independent of n (see the sketch below)
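A companion sketch to the client-server model earlier; the talk leaves W(d) abstract, so the decreasing wait function below is a made-up placeholder used only to contrast the two regimes:

```python
# P2P model from the slide: t = W(d) + S / r_net.
# wait_of_d is a hypothetical stand-in for W(d); only its shape
# (shorter waits at higher diversity) matters here.

def piece_time_p2p(d, piece_mb=0.25, r_net_mbs=12.5):
    wait_of_d = 0.05 / max(d, 1e-3)           # assumed decreasing in diversity d
    return wait_of_d + piece_mb / r_net_mbs   # transfer term independent of n

for d in (1.0, 0.1, 0.01):
    print(f"diversity={d:5.2f}  t={piece_time_p2p(d):.2f}s")
# High diversity -> short wait; low diversity -> long wait (next slides).
```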

36

High Diversity → Swarm Efficiency

37

Low Diversity → Little Benefit

Nothing to share

38

P2P Helps, But Not Enough

All peers request the same pieces at the same time

t = W(d) + S / r_net

Low piece diversity → long wait (gets worse as n grows) → long download times

39

This Scenario

[Figure: demand-only fetch through the P2P manager; this configuration is labeled “p2pd”]

40

Profile-based Image Streaming Techniques

(Making Streaming Effective)

41

How to Increase Diversity?

Need to fetch pieces that are

• Rare: not yet demanded by many peers

• Useful: likely to be used by some peer

42

Profiling

• Need useful pieces
• But only a small % of the VM image is accessed
• We need to know which pieces are accessed
• Also, when (needed later for piece selection)

43

Build Profile

• One profile for each VM/workload
• Run one or more times (even online)
• Use FS to track
  – Which pieces are accessed
  – When pieces are accessed
• Entries w/ average appearance time, piece index, and frequency (sketched below)
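A hypothetical profile builder in this spirit, aggregating FS access traces from one or more runs into (average appearance time, piece index, frequency) entries; the trace format is an assumption:

```python
from collections import defaultdict

def build_profile(runs):
    """runs: one trace per run; each trace lists (piece_index, first_access_time)."""
    times = defaultdict(list)
    for trace in runs:
        for piece, t in trace:
            times[piece].append(t)
    # frequency = fraction of runs in which the piece was accessed
    profile = [(sum(ts) / len(ts), piece, len(ts) / len(runs))
               for piece, ts in times.items()]
    profile.sort()    # order entries by average appearance time
    return profile    # [(avg_time, piece_index, frequency), ...]

run1 = [(0, 0.0), (3, 0.4), (7, 1.2)]
run2 = [(0, 0.1), (3, 0.5)]
print(build_profile([run1, run2]))
# [(0.05, 0, 1.0), (0.45, 3, 1.0), (1.2, 7, 0.5)]
```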

44

Piece Selection

• Want pieces not yet demanded by many
• Don’t know the piece distribution in the swarm
• Guess others like self
• Profile gives an estimate of when pieces are likely needed

45

Piece Selection Heuristic

• Randomly (rarest-first) pick one of the first k pieces in the predicted playback window
• Fetch w/ medium priority (demand wins); see the sketch below
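A sketch of the heuristic under stated assumptions: profile is the sorted entry list from the previous sketch, and a plain random choice stands in for rarest-first (which would need swarm state):

```python
import random

def select_prefetch_piece(profile, have, now, k=8):
    """profile: [(avg_time, piece_index, frequency), ...] sorted by time;
    have: pieces already local; now: current position in the profile."""
    window = [piece for avg_time, piece, _freq in profile
              if avg_time >= now and piece not in have]
    if not window:
        return None
    return random.choice(window[:k])   # random pick stands in for rarest-first

profile = [(0.05, 0, 1.0), (0.45, 3, 1.0), (1.2, 7, 0.5)]
print(select_prefetch_piece(profile, have={0}, now=0.2))   # 3 or 7
# The chosen piece is fetched at medium priority, so demand misses still win.
```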

46

Profile-based Prefetching

• Increases diversity
• Helps even w/ no peers (when the ideal access rate exceeds the network rate)

47

Obtain Full P2P Benefit

Profile-based window-randomized prefetch:

t = W(d) + S / r_net

High piece diversity → short wait (shouldn’t grow much w/ n) → quick piece download

48

Full VMTorrent Architecture

[Figure: custom FS plus P2P manager driven by a profile; this configuration is labeled “p2pp”]

49

Prototype

50

VMTorrent Prototype

• Custom FS: custom C, using FUSE
• P2P manager: custom C++ & libtorrent

[Figure: prototype components attached to a BitTorrent swarm]

51

Evaluation Setup

52

Testbeds

• Emulab [White et al., 2002]
  – Instances on 100 dedicated hardware nodes
  – 100 Mbps LAN
• VICCI [Peterson et al., 2011]
  – Instances on 64 vserver hardware node slices
  – 1 Gbps LAN

53

VMs

54

Workloads

• Short VDI-like tasks
• Some CPU-intensive, some I/O-intensive

55

Assessment

• Measured total runtime
  – Launch through shutdown
  – (Easy to measure)
• Normalized against memory-cached execution (example below)
  – Ideal runtime for that set of hardware
  – Allows easy cross-comparison
    • Different VM/workload combinations
    • Different hardware platforms
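Concretely, the normalization is a simple ratio; the numbers below are made up for illustration:

```python
# Normalized runtime: measured total runtime divided by the runtime of a
# memory-cached execution on the same hardware (values are illustrative).
total_runtime_s = 140.0     # launch through shutdown, streaming the image
cached_runtime_s = 100.0    # same VM/workload with the image fully cached
print(f"normalized runtime: {total_runtime_s / cached_runtime_s:.2f}x")  # 1.40x
```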

56

Evaluation

57

100 Mbps Scaling

[Figure: normalized runtime vs. number of instances; runtime starting to increase at larger scales]

58

Due to Decreased Diversity

# peers increases → more demand requests to the seed → less opportunity to build diversity → longer to reach max swarming efficiency + lower max

We optimized too much for single instance!

(choosing to let demand requests take precedence)

61

(Some) Future Work

• Piece selection for better diversity
• Improved profiling
• DC-specific optimizations

Current work is already orders of magnitude better than the state of the art

62

Demo

(video omitted for space)

63

See Paper for More Details

• Modeling
  – Playback process dynamics
  – Buffering (for prefetch)
  – Full characterization of r, incorporating the impact of centralized and distributed models on W
  – Other elided details
• Plus
  – More architectural discussion!
  – Lots more experimental results!

64

Summary

• Scalable VM launching is needed
• VMTorrent addresses this by
  – Decoupling data presentation from streaming
  – Profile-based VM image streaming
• Straightforward techniques and implementation, no special optimizations for the DC
• Performance much better than the state of the art
  – Hardware evaluation on multiple testbeds
  – As predicted by modeling
