adaylateandadollar short …wisdom.cs.wisc.edu/workshops/spring-14/talks/jellinek.pdf ·...
A Day Late and a Dollar Short: The Case for Research on Cloud Billing Systems
Robert Jellinek, Yan Zhai, Thomas Ristenpart, Michael Swift
Outline
• Motivation: Why study cloud billing?
• Our contributions
• Our study
• Results
• Conclusions and future work
Motivation: Cloud Billing
• There are many performance, reliability, and cost-efficiency studies of the cloud.
• Little attention has been paid to cloud providers' billing systems.
• The pay-as-you-go pricing model relies upon complex, large-scale billing systems.
Motivation: Cloud Billing
• Resource accounting is an interesting challenge.
• How to track all compute resource usage
  • in real time,
  • at fine granularity,
  • maintaining accuracy,
  • and not hurting performance?
Study of cloud billing mechanisms.
• We were able to:
  • Disambiguate billing by reverse engineering
  • Uncover bugs:
    • Race conditions in EC2
    • Inconsistencies across billing interfaces in EC2
    • Rackspace bug causing overcharges
  • Detect systematic undercharging from caching/aggregation
  • Characterize performance of billing latency.
Study Overview
• Guiding question: How accurate, timely, and predictable are customer-facing billing interfaces?
• Measured billing for:
  • compute time
  • storage (IOPS and capacity)
  • network usage
• Experimented on AWS, GCE, and Rackspace
• Calculated the latency of the billing interfaces.
Methodology
• Instrument providers’ API calls to launch/terminate instances and create/delete storage volumes, in order to capture fine-grained timing data about their usage.
• Launch an instance and execute one of several workloads (network tests; I/O tests; or timed idle to test instance-hour thresholds) to measure resource usage.
• Fetch OS-based resource-usage data from procfs and Netfilter/iptables in order to compare with the amount ultimately billed.
• Terminate instance after workload completion. In cases where we measure instance-hour thresholds, terminate at some fixed number of seconds after various instance-lifetime events, in order to isolate the interval that the provider uses to calculate billing.
• Poll for billing updates over all measured resources.
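The OS-side measurement step above could look roughly like the following. This is a minimal sketch, not the authors' tooling: it parses text in the Linux `/proc/diskstats` format and computes how many read and write operations completed between two snapshots. The function names `parse_diskstats` and `io_ops_delta`, the device name, and the sample lines are all hypothetical.

```python
from typing import Dict


def parse_diskstats(text: str, device: str) -> Dict[str, int]:
    """Return I/O counters for `device` from /proc/diskstats-formatted text.

    Field layout per line (Linux iostats documentation): major, minor, name,
    reads completed, reads merged, sectors read, ms reading,
    writes completed, writes merged, sectors written, ms writing, ...
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 11 and fields[2] == device:
            return {
                "reads_completed": int(fields[3]),
                "reads_merged": int(fields[4]),
                "sectors_read": int(fields[5]),
                "writes_completed": int(fields[7]),
                "writes_merged": int(fields[8]),
                "sectors_written": int(fields[9]),
            }
    raise KeyError(f"device {device!r} not found")


def io_ops_delta(before: Dict[str, int], after: Dict[str, int]) -> int:
    """Completed read + write operations between two snapshots."""
    return (
        (after["reads_completed"] - before["reads_completed"])
        + (after["writes_completed"] - before["writes_completed"])
    )


# Made-up sample lines for illustration only (not measured data).
SAMPLE_BEFORE = "202 0 xvda 100 5 800 40 200 10 1600 90 0 120 130"
SAMPLE_AFTER = "202 0 xvda 150 5 1200 60 260 25 2080 110 0 150 170"
```

On a live instance the text would come from reading `/proc/diskstats` before and after the workload. The resulting delta is the OS-side figure to compare against the provider's billed I/O count.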
Billing Interfaces
• EC2:
  • Web-based GUI management console
  • Programmatically accessible CSVs:
    • Hourly
    • Monthly (to date)
    • Cost-allocation (allows user to tag resources and filter costs by tag)
• GCE:
  • Web-based GUI interface
• Rackspace:
  • Web-based GUI interface
Billing Interfaces
Billing Latency: EC2
• Web console: average latency 6:41 hours, standard deviation 4:10 hours
• CSV: average latency 8:15 hours, standard deviation 3 hours
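The kind of summary reported on this slide can be sketched as follows. This is an illustrative sketch only: it assumes latency means the time between the end of a usage event and the first moment the charge becomes visible in a billing interface, and it uses the sample standard deviation. The slides do not say which estimator was used. All names and the timestamps in the usage note are hypothetical.

```python
from datetime import datetime
from statistics import mean, stdev
from typing import List, Tuple


def latencies_seconds(events: List[Tuple[datetime, datetime]]) -> List[float]:
    """Latency per event: (first seen in bill) - (usage ended), in seconds."""
    return [(seen - ended).total_seconds() for ended, seen in events]


def format_hmm(seconds: float) -> str:
    """Render a duration as H:MM, rounded to the nearest minute."""
    total_minutes = int(round(seconds / 60))
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


def summarize(events: List[Tuple[datetime, datetime]]) -> Tuple[str, str]:
    """Return (average latency, sample standard deviation) as H:MM strings."""
    lat = latencies_seconds(events)
    return format_hmm(mean(lat)), format_hmm(stdev(lat))
```

For example, three made-up events with latencies of 5, 7, and 9 hours would summarize to an average of `7:00` and a standard deviation of `2:00`.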
GCE/Rackspace Billing Latency
• GCE: lower bounds on billing latency for 13 instances, in days. Error bars indicate upper bounds.
• Rackspace: billing latency for 21 instances, in hours, +/- 10 minutes. All billing updates occurred between 9 and 10am UTC.
EC2: Why such latency?
We deliberately staggered the start times of instances. The billing update schedule suggests periodic batch processing.
EC2: Staggered Launch Times
What is “Compute Time”?
Major events in an instance lifetime
Compute Time
• Many events in an instance lifetime.
• Providers: “Pricing is per instance-hour consumed for each instance, from the time an instance is launched until it is terminated or stopped.”[1]
• This is ambiguous. We tried to reverse engineer exactly when the “start” and “stop” timestamps occur.
[1] https://aws.amazon.com/ec2/pricing/
Compute Time
• Billing “start” and “stop” could correspond to various events in an instance lifetime, for example:
• Start:
  • When user launches instance.
  • When launch request is serviced (could be queued).
  • When instance boot is complete.
  • When /proc/uptime is zero.
• Stop:
  • When user initiates shutdown from within instance.
  • When user initiates terminate from management console.
  • When termination is complete (opaque to user).
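One way to picture the reverse-engineering idea is to test every candidate (start, stop) pair against the number of instance-hours actually billed. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes billing rounds each partial hour up to a full instance-hour. The event names and timestamps are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple


def billed_hours(start: float, stop: float) -> int:
    """Instance-hours for the interval, rounding partial hours up (assumed)."""
    return math.ceil((stop - start) / 3600)


def consistent_pairs(
    starts: Dict[str, float],
    stops: Dict[str, float],
    observed_hours: int,
) -> List[Tuple[str, str]]:
    """Candidate (start event, stop event) pairs that explain the bill."""
    return [
        (s_name, e_name)
        for (s_name, s), (e_name, e) in product(starts.items(), stops.items())
        if billed_hours(s, e) == observed_hours
    ]


# Hypothetical event times in seconds since the launch request.
STARTS = {"launch_request": 0.0, "boot_complete": 45.0}
STOPS = {"terminate_request": 3620.0, "termination_complete": 3650.0}
```

With these made-up timestamps, a bill of one instance-hour is consistent only with the pair (`boot_complete`, `terminate_request`). Terminating at different offsets around the hour boundary narrows the candidates in the same way.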
Compute Time
• We determined the most probable timestamps for each service, but there was still jitter.
• Suggests variance outside what we are able to measure by polling the providers’ API.
EC2 Compute Time Results
Compute Time Anomalies
Other anomalies in EC2 compute billing
Compute Time
• We created a special “fast boot” kernel that booted and immediately sent heartbeat messages to our control server.
• Terminated instances Δ seconds after launch.
• A race condition causes some instances not to be billed, yielding roughly 2 minutes of free uptime.
Storage Billing
• Provider charges based on its view of storage ops
• Storage:
  • In Rackspace, deleting a volume before detaching it from an instance caused it to hang and accrue charges.
• I/O charges in EC2 lower than /proc/diskstats would suggest; caching or aggregation?
• Storage example:
  • write(4 KB); write(4 KB) -> 1 storage op
  • write(4 KB); seek(1 million); write(4 KB) -> 2 storage ops
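The storage example can be reproduced with a toy aggregation model. This is a hypothetical sketch: the slides only raise caching or aggregation as a possible explanation, so the rule used here (a write that begins exactly where the previous one ended is folded into the same operation) is an assumption, not the provider's documented behavior.

```python
from typing import Iterable, Optional, Tuple

Write = Tuple[int, int]  # (byte offset, length in bytes)


def count_storage_ops(writes: Iterable[Write]) -> int:
    """Count ops when contiguous sequential writes are merged (assumed rule)."""
    ops = 0
    next_offset: Optional[int] = None  # where a contiguous write would begin
    for offset, length in writes:
        if offset != next_offset:
            ops += 1  # not contiguous with the previous write: new op
        next_offset = offset + length
    return ops


KB4 = 4096
sequential = [(0, KB4), (KB4, KB4)]            # write; write
with_seek = [(0, KB4), (1_000_000, KB4)]       # write; seek(1 million); write
```

Under this model `count_storage_ops(sequential)` is 1 and `count_storage_ops(with_seek)` is 2, matching the two cases on the slide. The OS-level counters would report two completed writes in both cases unless the kernel itself merged them.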
IOPS Aggregation/Caching?
Network Billing
• In EC2, Internet-outbound traffic was underbilled by 5.6% relative to Netfilter measurements.
• Rackspace underbilled 1 GB of Internet-outbound traffic for 2 of 11 instances by 35 MB and 125 MB.
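The relative shortfall behind figures like these is a simple ratio of billed to measured bytes. The sketch below is illustrative: the function name is hypothetical, and it assumes decimal units (1 GB = 1000 MB) and that the 1 GB figure is the OS-measured amount. The slides specify neither.

```python
def underbilled_percent(measured_bytes: int, billed_bytes: int) -> float:
    """Percentage of measured traffic that did not appear on the bill."""
    if measured_bytes <= 0:
        raise ValueError("measured_bytes must be positive")
    return 100.0 * (measured_bytes - billed_bytes) / measured_bytes


MB = 10**6            # decimal megabyte (assumption)
MEASURED = 1000 * MB  # 1 GB sent, as measured on the instance (assumption)
```

Under these assumptions, the two Rackspace shortfalls of 35 MB and 125 MB correspond to roughly 3.5% and 12.5% of the measured traffic.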
Conclusions
• Future research should investigate the tradeoffs between performance on the one hand, and accurate, timely, transparent resource accounting and billing on the other.
• This will likely necessitate collaboration with industry.
• It seems that today, it should be feasible for providers to expose a billing API, to enable programmatic queries of billing information.
Thank you.