The Cloud Workloads Archive: A Status Report

DESCRIPTION

The Cloud Workloads Archive: A Status Report. Special thanks to Ion for this opportunity! Alexandru Iosup. Rean Griffith, Andrew Konwinski, Matei Zaharia, Ali Ghodsi, Ion Stoica. Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands. RADLab, ...

TRANSCRIPT

  • The Cloud Workloads Archive: A Status Report
    Berkeley, CA, USA
    - Alexandru Iosup, Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands
    - Rean Griffith, Andrew Konwinski, Matei Zaharia, Ali Ghodsi, Ion Stoica, RADLab, University of California, Berkeley, USA
    Special thanks to Ion for this opportunity!

  • About the Team
    Recent Work in Performance
    - The Grid Workloads Archive (Nov 2006)
    - The Failure Trace Archive (Nov 2009)
    - Analysis of Facebook, Yahoo, and Google data center workloads (2009-2010)
    - The Peer-to-Peer Trace Archive (Apr 2010)
    - Tools: GrenchMark workload-based grid benchmarking, RAIN
    Speaker: Alexandru Iosup
    - Systems work: Tribler (P2P file sharing), Koala (grid scheduling), POGGI and CAMEO (massively multiplayer online gaming)
    - Performance evaluation of clouds for sci.comp.: EC2 & three others
    - Team of 15+ active collaborators in NL, AT, RO, US
    - Happy to be in Berkeley until September

  • Traces: Sine Qua Non in Comp.Sys.Res.
    - "My system/method/algorithm is better than yours (on my carefully crafted workload)"
      - Unrealistic (trivial): Prove that "prioritize jobs from users whose name starts with A" is a good scheduling policy
      - Realistic? "85% jobs are short, 15% are long"
      - Major problem in Computer Systems research
    - Workload Trace = recording of real activity from a (real) system, often as a sequence of jobs / requests submitted by users for execution
      - Main use: compare and cross-validate new job and resource management techniques and algorithms
      - Major problem: obtaining and using real workload traces

  • Previous Data Sharing Efforts
    Critical datasets in computer science
    - Grid Workloads Archive
    - Failure Trace Archive
    - Peer-to-Peer Trace Archive
    - Game Trace Archive (soon)
    - PWA, ITA, CRAWDAD, ...
    - 1,000s of scientists
    - From theory to practice

    Research Question: Are data center workloads unique? (vs GWA, PWA, ...)

  • Agenda
    - Introduction & Motivation
    - The Cloud Workloads Archive: What's in a Name?
    - Format and Tools
    - Contents
    - Analysis & Modeling
    - Applications
    - Take Home Message

  • The Cloud Workloads Archive (CWA): What's in a Name?
    CWA = Public collection of cloud/data center workload traces and of tools to process these traces; allows us to:
    - Compare and cross-validate new job and resource management techniques and algorithms, across various workload traces
    - Determine which (part of a) trace is most interesting for a specific job and resource management technique or algorithm
    - Design a general model for data center workloads, and validate it with various real workload traces
    - Evaluate the generality of a particular workload trace, to determine if results are biased towards a particular trace
    - Analyze the evolution of workload characteristics across long timescales, both intra- and inter-trace

  • One Format Fits Them All
    - Flat format
    - Job and Tasks
    - Summary (20 unique data fields) and Detail (60 fields)
    - Categories of information
      - Shared with GWA, PWA: Time, Disk, Memory, Net
      - Jobs/Tasks that change resource consumption profile
      - MapReduce-specific (two-thirds data fields)
    Figure labels: CWJ, CWJD, CWT, CWTD
    A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10

  • CWA Contents: Large-Scale Workloads
    Tools
    - Convert to CWA format
    - Analyze and model automatically
    - Report

  • Types of Analysis
    Analysis Type
    - Basic statistics
    - Evolution over time
    - Correlations
    Data Break-down
    - Overall
    - By Task Type (M/R)
    - By App. Type (ID)
    - By User (ID)
    - By Duration (Short)
    Analysis Focus
    - Time-related
      - Run, Wait, Resp.Time
      - Bounded Slowdown
    - Structure-related
      - Number of tasks
    - IO-related
      - IO sizes and ratios
    - Status-related
    - Sys. Utilization-related
      - Counts/Ratios
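
The bounded slowdown listed under the time-related focus can be sketched as follows. The slides do not define the metric, so this uses the definition common in the job scheduling literature, max(1, (wait + run) / max(run, threshold)); the 10-second threshold and the function name are assumptions made for illustration, not values taken from the CWA tools.

```python
def bounded_slowdown(wait_time, run_time, threshold=10.0):
    """Bounded slowdown of one job; all times in seconds.

    The threshold (an assumed 10 s here) keeps very short jobs from
    producing huge slowdown values; the result is never below 1.
    """
    response_time = wait_time + run_time
    return max(1.0, response_time / max(run_time, threshold))


# Hypothetical (wait, run) pairs in seconds, not taken from any real trace.
jobs = [(0.0, 2.0), (30.0, 5.0), (120.0, 600.0)]
for wait, run in jobs:
    print(f"wait={wait:6.1f}s run={run:6.1f}s "
          f"bounded slowdown={bounded_slowdown(wait, run):.2f}")
```

Short jobs are the reason for the bound: a 2-second job with no wait would otherwise get a slowdown computed against its own tiny runtime, which is why the threshold matters for workloads dominated by short tasks.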

  • Types of Analysis
    - Sys.U., Over Time, By RunTime
      - Also 1h, 10mins, counting intervals
      - Study Short-/Long-Range Dependence (self-similarity)
    - Also Job count, Running/Waiting counts, ...
      - Study system utilization behavior
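
A minimal sketch of system utilization over time with fixed counting intervals (for example 10 minutes or 1 hour). The task tuple layout `(start, end, slots)`, the function name, and the sample values are hypothetical and chosen only for illustration; they are not the CWA data format.

```python
import math


def utilization_per_interval(tasks, total_slots, interval, horizon):
    """Return the fraction of slot-time in use for each fixed-width interval.

    tasks: iterable of (start, end, slots) tuples, times in seconds
           (a hypothetical layout, not the CWA format).
    total_slots: number of slots (e.g. cores) available in the system.
    interval: width of one counting interval in seconds.
    horizon: end of the observation period in seconds.
    """
    n_intervals = math.ceil(horizon / interval)
    busy = [0.0] * n_intervals
    for start, end, slots in tasks:
        for i in range(n_intervals):
            lo, hi = i * interval, (i + 1) * interval
            # Time this task spends inside interval i (zero if disjoint).
            overlap = max(0.0, min(end, hi) - max(start, lo))
            busy[i] += overlap * slots
    capacity = interval * total_slots
    return [b / capacity for b in busy]


# Two made-up tasks on a 4-slot system, observed for 120 s in 60 s intervals.
tasks = [(0, 30, 2), (20, 70, 1)]
series = utilization_per_interval(tasks, total_slots=4, interval=60, horizon=120)
for i, u in enumerate(series):
    print(f"interval {i}: utilization {u:.3f}")
```

The resulting series is the kind of input a short-/long-range dependence study would start from; the same loop can count running or waiting jobs per interval instead of slot-time.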

  • Modeling Process
    - Well-known prob. distrib.
      - Normal, Exp, LogNormal, Gamma, Weibull, Gen-Pareto, ...
    - MLE to fit
      - Fit known distribution to empirical distribution (yields parameters)
    - Goodness-of-Fit
      - Assess how good the fit is; select best-fitting distribution
      - Kolmogorov-Smirnov: sensitive to body of distribution + D stat
      - Anderson-Darling: sensitive to tails of distribution
      - Hybrid method*: works for very large populations
    * Kondo et al., Failure Trace Archive, CCGrid10, Best Paper Award.
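
The fit-then-assess loop above can be sketched with two of the listed distributions, using only the Python standard library. This is an illustration under stated assumptions, not the CWA tooling: the function names are hypothetical, the sample data is synthetic, only Exp and LogNormal are covered because their maximum-likelihood estimates have closed forms, and candidates are ranked by the Kolmogorov-Smirnov D statistic alone (no p-values, no Anderson-Darling, no hybrid method).

```python
import math
import random


def fit_exponential(samples):
    """MLE for the exponential distribution: rate = 1 / sample mean."""
    return len(samples) / sum(samples)


def fit_lognormal(samples):
    """MLE for the log-normal: mean and (biased) std. dev. of log-values."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(samples, cdf):
    """Kolmogorov-Smirnov D: largest gap between empirical and model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def best_fit(samples):
    """Fit each candidate by MLE and return (name, D) with the smallest D."""
    rate = fit_exponential(samples)
    mu, sigma = fit_lognormal(samples)
    scores = {
        "exponential": ks_statistic(samples, lambda x: exponential_cdf(x, rate)),
        "lognormal": ks_statistic(samples, lambda x: lognormal_cdf(x, mu, sigma)),
    }
    name = min(scores, key=scores.get)
    return name, scores[name]


# Synthetic "task runtimes" drawn from a log-normal; not real trace data.
random.seed(42)
runtimes = [random.lognormvariate(3.0, 0.5) for _ in range(2000)]
name, d_stat = best_fit(runtimes)
print(f"best-fitting distribution: {name} (D = {d_stat:.4f})")
```

Ranking by D alone is a simplification: because the parameters are estimated from the same data, the usual KS significance levels do not apply directly, and D is dominated by the body of the distribution, which is the limitation the slide attributes to Kolmogorov-Smirnov.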

  • Main Results: Basic Stats
    - MapReduce vs Grid workloads [vs Parallel Prod. Env.]
    - Massive short tasks vs Many long tasks vs Few very long tasks
    - Fewer users for MapReduce environments?
    - TODO: Analyse amounts per core

  • Applications
    Mesos running mixtures of workloads
    - Workloads: MPI, MapReduce, grid, ...
    - Find bottlenecks
    - Find workloads that are particularly difficult to run
    - Improve the system!
    - Status: in progress, using cluster in Finland (Petri Savolainen)

    All the apps typical to trace-based work: design, validation, and comparison of algorithms, methods, and systems.

  • Take Home Message
    - Cloud Workloads Archive
      - Datasets
      - Tools to convert, analyze, and model the datasets
      - Need your help to collect more traces
    - Converted and analyzed three MapReduce workloads
      - Different from grid and parallel production environment workloads (ask about additional proof and let me show a couple more slides)
      - Invariants?
    - Applications
      - 1: Model of Cloud/MapReduce workloads
      - 2: Test and improve Mesos

  • Continuing Our Collaboration
    - Scheduling mixtures of grid/HPC/cloud workloads
      - Scheduling and resource management in practice
      - Modeling aspects of cloud infrastructure and workloads
      - Condor on top of Mesos

    - Massively Social Gaming and Mesos
      - Step 1: Game analytics and social network analysis in Mesos

  • Alex Iosup, Rean Griffith, Andrew Konwinski, Matei Zaharia, Ali Ghodsi, Ion Stoica
    Thank you! Questions? Observations?
    More Information:
    - The Grid Workloads Archive: gwa.ewi.tudelft.nl
    - The Failure Trace Archive: fta.inria.fr
    - The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
    - Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html
    - see PDS publication database at: www.pds.twi.tudelft.nl/
    - email: [email protected]
    Thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, ...
    Thanks for all: AliG, Andrew, AndyK, Ari, Beth, Blaine, David, Ion, Justin, Lucian, Matei, Petri, Rean, Tim, ...

  • Additional Slides

  • Main Results: Basic Stats
    - MapReduce vs Grid workloads
    - IO-intensive vs Compute-intensive
    - Constant Wr[%] ~ 40% IO for MapReduce traces?
    - TODO: More MapReduce traces to validate findings

  • Main Results
    - Two-mode trace: do NOT analyze as whole
