TRANSCRIPT
1
Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra,
Jason Nieh, and Dan Rubenstein
VMTorrent: Scalable P2P Virtual Machine Streaming
2
VM Basics
• VM: software implementation of computer
• Implementation stored in VM image
• VM runs on VMM
  – Virtualizes HW
  – Accesses image
[Figure: VM running on a VMM, which accesses the VM image]
3
Where is Image Stored?
[Figure: VM running on a VMM, accessing the VM image]
4
Traditionally: Local Storage
[Figure: VM image on local storage, accessed by the VMM]
5
IaaS Cloud: on Network Storage
[Figure: VM image on network storage, accessed by the VMM]
6
Can Be Primary
[Figure: VMM accesses the VM image directly on network storage via NFS/iSCSI]
e.g., OpenStack Glance, Amazon EC2/S3, vSphere network storage
7
Or Secondary
[Figure: VM image downloaded from network storage to local storage, then accessed by the VMM]
e.g., Amazon EC2/EBS, vSphere local storage
8
Either Way, No Problem Here
[Figure: a single VM image served from network storage]
9
Here?
[Figure: many VMs pulling images from network storage]
Bottleneck!
10
Lots of Unique VM Images
[Figure: network storage holding many distinct VM images]
54,784 unique images on EC2 alone*
*http://thecloudmarket.com/stats#/totals, 06 Dec 2012
11
Unpredictable Demand
• Lots of customers
• Spot-pricing
• Cloud-bursting
12
Don’t Just Take My Word
• “The challenge for IT teams will be finding way to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1
• “scale limits are due to simultaneous loading rather than total number of nodes” 2
• Developer proposals to replace or supplement the VM launch architecture for greater scalability 3
1. http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops-7000008229/?s_cid=e539
2. http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup-129
3. https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images
13
Challenge: VM Launch in IaaS
• Minimize delay in VM execution
  – Starting from the time the launch request arrives
• For lots of instances (scale!)
14
Naive Scaling Approaches
• Multicast
  – Setup, configuration, maintenance, etc. 1
  – ACK implosion
  – “multicast traffic saturated the CPU on [Etsy] core switches causing all of Etsy to be unreachable“ 2
1. [El-Sayed et al., 2003; Hosseini et al., 2007]
2. http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication
15
Naive Scaling Approaches
• P2P bulk data download (e.g., BitTorrent)
  – Files are big (wastes bandwidth)
  – Must wait until the whole file is available (wastes time)
  – Network storage as primary? Must store a multi-GB image in RAM!
16
Both Miss Big Opportunity
VM image access is
• Sparse: most of the image doesn’t need to be transferred
• Gradual: can start w/ just a couple of blocks
17
VMTorrent Contributions
• Architecture
  – Make (scalable) streaming possible: decouple data delivery from presentation
  – Make scalable streaming effective: profile-based image streaming techniques
• Understanding / Validation
  – Modeling for VM image streaming
  – Prototype & evaluation (not highly optimized)
18
Talk
• Make (scalable) streaming possible: Decouple data delivery from presentation
• Make scalable streaming effective: Profile-based image streaming techniques
• VMTorrent Prototype & Evaluation
(Modeling along the way)
19
Decoupling Data Delivery from Presentation
(Making Streaming Possible)
20
Generic Virtualization Architecture
[Figure: VM on VMM on host hardware; the VMM reaches the VM image through the FS]
• Virtual Machine Monitor virtualizes hardware
• Conducts I/O to image through file system
21
Cloud Virtualization Architecture
[Figure: VM / VMM / hardware stack with a network backend between the FS and network storage]
Network backend used
• Either to download the image
• Or to access it via a remote FS
22
VMTorrent Virtualization Architecture
[Figure: a custom FS sits between the VMM and the network backend]
• Introduce custom file system
• Divide image into pieces
• But provide appearance of complete image to VMM
23
Decoupling Delivery from Presentation
[Figure: custom FS holding pieces 0-8 of the image]
VMM attempts to read piece 1. Piece 1 is present, so the read completes.
24
Decoupling Delivery from Presentation
[Figure: piece 0 missing from the custom FS]
VMM attempts to read piece 0. Piece 0 isn’t local, so the read stalls; the VMM waits for the I/O to complete and the VM stalls.
25
Decoupling Delivery from Presentation
[Figure: request path from the custom FS to the network backend]
The FS requests the piece from the backend; the backend requests it from the network.
26
Decoupling Delivery from Presentation
[Figure: piece 0 delivered and filled into the custom FS]
Later, the network delivers piece 0. The custom FS receives it and updates the piece; the read completes and the VMM resumes the VM’s execution (sketched below).
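A minimal sketch of this stall-and-resume read path in C++ (the prototype’s FS is actually C on FUSE; the PieceStore name, piece bitmap, and locking scheme are illustrative assumptions, not the paper’s code):

```cpp
// Sketch: presentation (VMM reads) decoupled from delivery (pieces
// arriving from the network backend). All names are illustrative.
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <vector>

class PieceStore {
public:
    // Assumes image_size is a multiple of piece_size (last piece padded).
    PieceStore(size_t image_size, size_t piece_size)
        : piece_size_(piece_size),
          data_(image_size),
          present_((image_size + piece_size - 1) / piece_size, false) {}

    // VMM read path: blocks only if a needed piece is not yet local.
    void read(size_t offset, size_t len, char* out) {
        size_t first = offset / piece_size_;
        size_t last = (offset + len - 1) / piece_size_;
        std::unique_lock<std::mutex> lock(m_);
        for (size_t p = first; p <= last; ++p) {
            if (!present_[p])
                request_from_backend(p);  // demand request, high priority
            cv_.wait(lock, [&] { return bool(present_[p]); });  // VM stalls here
        }
        std::memcpy(out, data_.data() + offset, len);  // read completes
    }

    // Called by the network backend when a piece arrives.
    void deliver(size_t piece, const char* bytes) {
        std::lock_guard<std::mutex> lock(m_);
        std::memcpy(data_.data() + piece * piece_size_, bytes, piece_size_);
        present_[piece] = true;
        cv_.notify_all();  // wake stalled reads; the VMM resumes the VM
    }

private:
    void request_from_backend(size_t piece);  // hands off to backend thread

    size_t piece_size_;
    std::vector<char> data_;
    std::vector<bool> present_;  // which pieces are local
    std::mutex m_;
    std::condition_variable cv_;
};
```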
27
Decoupling Improves Performance
[Figure: VMTorrent stack serving as primary storage]
Primary storage: no waiting for the image download to complete.
28
Decoupling Improves Performance
[Figure: writes and re-reads (marked X) no longer cross the network]
Secondary storage: no more writes or re-reads over the network, as with a remote FS.
29
But Doesn’t Scale
Assuming a single server, the time to download a single piece is

t = W + S / (r_net / n)

• W: wait time for first bit
• S: piece size
• r_net: network speed
• n: # of clients

Transfer time: each client gets r_net / n of the server’s bandwidth.
30
Read Time Grows Linearly w/ n
Assuming a single server, the time to download a single piece is

t = W + n * S / r_net

• W: wait time for first bit
• S: piece size
• r_net: network speed
• n: # of clients

Transfer time is linear in n.
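A quick worked example (the piece size and client count are illustrative assumptions; 100 Mbps matches the Emulab testbed used later): the transfer term alone grows from roughly 20 ms for one client to roughly 2 s for a hundred.

```latex
% Illustrative numbers, not from the paper: S = 256 KB (about 2.1 Mbit), r_net = 100 Mbps.
t - W = \frac{n \cdot S}{r_\mathrm{net}}
      \approx \frac{1 \times 2.1\,\mathrm{Mbit}}{100\,\mathrm{Mbps}} \approx 21\,\mathrm{ms} \quad (n = 1)
\qquad\text{vs.}\qquad
\frac{100 \times 2.1\,\mathrm{Mbit}}{100\,\mathrm{Mbps}} \approx 2.1\,\mathrm{s} \quad (n = 100).
```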
31
This Scenario
[Figure: the client-server demand-download scenario analyzed above, labeled “csd”]
32
Decoupling Enables P2P Backend
Alleviates the network storage bottleneck.
[Figure: the network backend is replaced by a P2P manager that holds its own copy of pieces 0-8]
• Exchange pieces w/ swarm
• P2P copy must remain pristine
33
Space Efficient
[Figure: the custom FS points into the P2P manager’s image; written pieces 6 and 7 are private copies]
• FS uses pointers to the P2P image
• FS does copy-on-write (sketched below)
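A sketch of that copy-on-write indirection (illustrative C++, not the prototype’s code): reads follow a per-piece pointer into the pristine P2P copy until the first write, which allocates a private copy so the swarm’s data is never modified.

```cpp
// Copy-on-write view over the pristine P2P image. Illustrative only.
#include <cstring>
#include <memory>
#include <vector>

class CowImage {
public:
    CowImage(const char* p2p_image, size_t n_pieces, size_t piece_size)
        : p2p_(p2p_image), piece_size_(piece_size), priv_(n_pieces) {}

    // Read one whole piece: the private copy if it exists, else the P2P copy.
    const char* read_piece(size_t p) const {
        return priv_[p] ? priv_[p].get() : p2p_ + p * piece_size_;
    }

    // First write to a piece copies it out of the P2P image (copy-on-write),
    // so the pristine copy stays valid for serving other peers.
    void write(size_t p, size_t off, const char* buf, size_t len) {
        if (!priv_[p]) {
            priv_[p] = std::make_unique<char[]>(piece_size_);
            std::memcpy(priv_[p].get(), p2p_ + p * piece_size_, piece_size_);
        }
        std::memcpy(priv_[p].get() + off, buf, len);
    }

private:
    const char* p2p_;   // pristine image owned by the P2P manager
    size_t piece_size_;
    std::vector<std::unique_ptr<char[]>> priv_;  // copy-on-write pieces
};
```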
34
Minimizing Stall Time
[Figure: a demand request for piece 4 (“4?”) propagates from the custom FS through the P2P manager to the swarm, and the piece (“4!”) flows back]
Non-local piece accesses trigger high-priority requests (sketched below).
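How a demand miss might be escalated, sketched against a hypothetical P2P-manager interface (the prototype uses libtorrent, whose real API differs; the enum and interface here are made up):

```cpp
// Hypothetical priority levels for piece requests. High is a demand miss,
// Medium is profile-driven prefetch (slide 45); Low is an assumed default.
enum class Priority { Low = 0, Medium = 1, High = 2 };

struct P2PManager {
    // Ask the swarm for a piece at the given priority; High preempts
    // Medium, which preempts Low.
    virtual void request(size_t piece, Priority prio) = 0;
    virtual ~P2PManager() = default;
};

// Called from the FS read path when a piece is not local: a demand
// access always outranks speculative prefetching.
void on_demand_miss(P2PManager& p2p, size_t piece) {
    p2p.request(piece, Priority::High);
}
```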
35
P2P Helps
Now, the time to download a single piece is

t = W(d) + S / r_net

• W(d): wait time for first bit, as a function of piece diversity d
• S: piece size
• r_net: network speed
• n: # of peers

The wait is a function of diversity; the transfer time is independent of n.
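With the same illustrative numbers as the earlier example (S = 256 KB, r_net = 100 Mbps), the transfer term is now fixed at about 21 ms regardless of n; all of the scaling pressure moves into the wait term W(d).

```latex
% Same assumed numbers as the earlier client-server example.
\frac{S}{r_\mathrm{net}} \approx \frac{2.1\,\mathrm{Mbit}}{100\,\mathrm{Mbps}} \approx 21\,\mathrm{ms}
\quad \text{for any number of peers } n.
```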
36
High Diversity → Swarm Efficiency
37
Low Diversity → Little Benefit
Nothing to share
38
P2P Helps, But Not Enough
All peers request the same pieces at the same time.

t = W(d) + S / r_net

Low piece diversity → long wait (gets worse as n grows) → long download times
39
This Scenario
[Figure: the profile-less, demand-driven P2P scenario analyzed above, labeled “p2pd”]
40
Profile-based Image Streaming Techniques
(Making Streaming Effective)
41
How to Increase Diversity?
Need to fetch pieces that are
• Rare: not yet demanded by many peers
• Useful: likely to be used by some peer
Profiling
• Need useful pieces
• But only a small % of the VM image is accessed
• Need to know which pieces are accessed
• Also when (needed later for piece selection)
42
43
Build Profile
• One profile for each VM/workload
• Run one or more times (even online)
• Use FS to track
  – Which pieces are accessed
  – When pieces are accessed
• Entries w/ average appearance time, piece index, and frequency (see the sketch below)
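One plausible shape for a profile entry and its construction, averaging first-access times across training runs (illustrative C++; the field names and in-memory format are assumptions, not the paper’s):

```cpp
// A profile is a list of (piece, average appearance time, frequency)
// entries built from one or more recorded runs. Names are illustrative.
#include <algorithm>
#include <map>
#include <vector>

struct ProfileEntry {
    size_t piece;       // piece index
    double avg_time_s;  // average time of first access across runs
    size_t freq;        // # of runs in which the piece was accessed
};

std::vector<ProfileEntry> build_profile(
    // Each run maps piece index -> time (seconds) of its first access.
    const std::vector<std::map<size_t, double>>& runs) {
    std::map<size_t, ProfileEntry> acc;
    for (const auto& run : runs)
        for (const auto& [piece, t] : run) {
            auto& e = acc[piece];
            e.piece = piece;
            e.avg_time_s += t;  // sum now, divide by freq below
            e.freq += 1;
        }
    std::vector<ProfileEntry> profile;
    for (auto& [unused, e] : acc) {
        e.avg_time_s /= e.freq;
        profile.push_back(e);
    }
    // Sorted by expected appearance time, ready for window-based selection.
    std::sort(profile.begin(), profile.end(),
              [](const ProfileEntry& a, const ProfileEntry& b) {
                  return a.avg_time_s < b.avg_time_s;
              });
    return profile;
}
```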
44
Piece Selection
• Want pieces not yet demanded by many
• Don’t know piece distribution in swarm
• Guess: others behave like self
• Profile gives an estimate of when pieces are likely needed
45
Piece Selection Heuristic
• Randomly (rarest-first) pick one of the first k pieces in the predicted playback window
• Fetch w/ medium priority (demand requests win); sketched below
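A sketch of that heuristic (illustrative C++; k, the window logic, and helper names are assumptions): take the not-yet-local pieces whose profiled times fall at the front of the predicted playback window, and pick one uniformly at random as a stand-in for rarest-first, since the true swarm distribution is unknown.

```cpp
// Window-randomized piece selection over a time-sorted profile.
// ProfileEntry is the illustrative struct from build_profile() above.
#include <cstdint>
#include <random>
#include <vector>

size_t select_piece(const std::vector<ProfileEntry>& profile,  // time-sorted
                    const std::vector<bool>& present,          // local pieces
                    double playhead_s,  // current position in the workload
                    size_t k,           // size of the window head
                    std::mt19937& rng) {
    // Gather the first k pieces predicted to be needed from here on
    // that we don't already have.
    std::vector<size_t> window;
    for (const auto& e : profile) {
        if (e.avg_time_s >= playhead_s && !present[e.piece]) {
            window.push_back(e.piece);
            if (window.size() == k) break;
        }
    }
    if (window.empty()) return SIZE_MAX;  // nothing left to prefetch
    // A random choice among the k candidates approximates rarest-first
    // without knowing the swarm's actual piece distribution.
    std::uniform_int_distribution<size_t> pick(0, window.size() - 1);
    return window[pick(rng)];  // fetched at Medium priority; demand wins
}
```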
46
Profile-based Prefetching
• Increases diversity
• Helps even w/ no peers (when the ideal access rate exceeds the network rate)
47
Obtain Full P2P Benefit
Profile-based, window-randomized prefetch:

t = W(d) + S / r_net

High piece diversity → short wait (shouldn’t grow much w/ n) → quick piece download
48
Full VMTorrent Architecture
[Figure: custom FS (with copy-on-write pieces) backed by a P2P manager that prefetches according to the profile and exchanges pieces with the swarm; scenario labeled “p2pp”]
49
Prototype
50
VMTorrent Prototype
[Figure: prototype instantiation of the full architecture, exchanging pieces with a BitTorrent swarm]
• Custom FS: custom C, using FUSE
• P2P manager: custom C++ & libtorrent
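The FUSE half of the slide could look roughly like this (a minimal sketch against the FUSE 2.x C API, written as C++ for consistency with the other sketches; the image name, size handling, and PieceStore hookup are assumptions, and error handling is elided):

```cpp
// Minimal FUSE shim exposing the streamed image as one ordinary file,
// so an unmodified VMM just open()s and read()s it. Sketch only.
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <cerrno>
#include <cstring>
#include <sys/stat.h>

static const char* kImageName = "/image.raw";  // assumed file name
static size_t g_image_size;                    // set at mount time

static int vmt_getattr(const char* path, struct stat* st) {
    std::memset(st, 0, sizeof(*st));
    if (std::strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (std::strcmp(path, kImageName) == 0) {
        st->st_mode = S_IFREG | 0644;
        st->st_nlink = 1;
        st->st_size = g_image_size;  // full size, even before delivery
        return 0;
    }
    return -ENOENT;
}

static int vmt_read(const char* path, char* buf, size_t size,
                    off_t off, struct fuse_file_info*) {
    (void)path;
    // Delegate to the stall-and-resume read path sketched earlier; the
    // call blocks until the needed pieces have been delivered:
    // piece_store().read(static_cast<size_t>(off), size, buf);  // hypothetical hookup
    std::memset(buf, 0, size);  // placeholder so the sketch is self-contained
    return static_cast<int>(size);
}

int main(int argc, char* argv[]) {
    struct fuse_operations ops = {};
    ops.getattr = vmt_getattr;
    ops.read = vmt_read;
    return fuse_main(argc, argv, &ops, nullptr);
}
```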
51
Evaluation Setup
52
Testbeds
• Emulab [White et al., 2002]
  – Instances on 100 dedicated hardware nodes
  – 100 Mbps LAN
• VICCI [Peterson et al., 2011]
  – Instances on 64 vserver hardware-node slices
  – 1 Gbps LAN
53
VMs
54
Workloads
• Short VDI-like tasks
• Some CPU-intensive, some I/O-intensive
55
Assessment
• Measured total runtime
  – Launch through shutdown
  – (Easy to measure)
• Normalized against memory-cached execution
  – Ideal runtime for that set of hardware
  – Allows easy cross-comparison
    • Different VM/workload combinations
    • Different hardware platforms
56
Evaluation
57
100 Mbps Scaling
[Plot: 100 Mbps scaling results; annotation: “Starting to increase”]
58
Due to Decreased Diversity
# of peers increases → more demand requests to the seed → less opportunity to build diversity → longer to reach max swarming efficiency + lower max
We optimized too much for the single-instance case!
(letting demand requests take precedence)
61
(Some) Future Work
• Piece selection for better diversity
• Improved profiling
• DC-specific optimizations
Current work is already orders of magnitude better than the state of the art.
62
Demo (video omitted for space)
63
See Paper for More Details
• Modeling
  – Playback process dynamics
  – Buffering (for prefetch)
  – Full characterization of r, incorporating the impact of centralized and distributed models on W
  – Other elided details
• Plus
  – More architectural discussion!
  – Lots more experimental results!
64
Summary
• Scalable VM launching is needed
• VMTorrent addresses this by
  – Decoupling data delivery from presentation
  – Profile-based VM image streaming
• Straightforward techniques and implementation; no special optimizations for the DC
• Performance much better than the state of the art
  – Hardware evaluation on multiple testbeds
  – As predicted by modeling