MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)

Myria: Scalable Analytics as a Service Bill Howe, PhD University of Washington with Dan Suciu, Magda Balazinska, Dan Halperin, and many students MMDS 2014, Berkeley CA

Upload: bill-howe

Posted on 10-May-2015


DESCRIPTION

A talk I gave at the MMDS workshop in June 2014 on the Myria system, as well as some of Seung-Hee Bae's work on scalable graph clustering. https://mmds-data.org/

TRANSCRIPT

1. Myria: Scalable Analytics as a Service Bill Howe, PhD University of Washington with Dan Suciu, Magda Balazinska, Dan Halperin, and many students MMDS 2014, Berkeley CA

2. Today: Three observations about Big Data; Myria: Scalable Analytics as a Service; Parallel Flow-based Graph Clustering (if time, but there won't be). 7/10/2014 Bill Howe, UW 2/57

3. How can we deliver 1000 little SDSSs to anyone who wants one?

4. How much time do you spend handling data as opposed to doing science? Mode answer: 90%.

5. [Chart: Benchmark 1 and Benchmark 2, y-axis from 0 to 120, bars for Old system, Your system, and Our system] A typical Computer Science paper. (Slide source: Dan Halperin)

6. [Chart: the same benchmarks with the y-axis extended to 12,500 and a bar added for What people use] The reality of the situation. (Slide source: Dan Halperin)

7. "[This was hard] due to the large amount of data (e.g. data indexes for data retrieval, dissection into data blocks and processing steps, order in which steps are performed to match memory/time requirements, file formats required by software used). In addition we actually spend quite some time in iterations fixing problems with certain features (e.g. capping ENCODE data), testing features and feature products to include, identifying useful test data sets, adjusting the training data (e.g. 1000G vs human-derived variants). So roughly 50% of the project was testing and improving the model, 30% figuring out how to do things (engineering) and 20% getting files and getting them into the right format. I guess in total [I spent] 6 months [on this project]. At least 3 months on issues of scale, file handling, and feature engineering." (Martin Kircher, Genome Sciences)
Why? 3k NSF postdocs in 2010, $50k per postdoc, at least 50% overhead: maybe $75M annually at NSF alone?

8. Data Science Workflow: 1) Preparing to run a model; 2) Running the model; 3) Interpreting the results. Gathering, cleaning, integrating, restructuring, transforming, loading, filtering, deleting, combining, merging, verifying, extracting, shaping, massaging: "80% of the work" (Aaron Kimball). The other 80% of the work.

9. Observation 1: Your cool algorithmic problem is not the bottleneck.

10. Symbolic Reasoning and Algebraic Optimization. N = ((z*2)+((z*3)+0))/1. Algebraic laws: (1) (+) identity: x+0 = x; (2) (/) identity: x/1 = x; (3) (*) distributes: (n*x+n*y) = n*(x+y); (4) (*) commutes: x*y = y*x. Applying rules 1, 3, 4, 2 gives N = (2+3)*z: two operations instead of five, and no division operator. Every database does this kind of optimization every time you issue a query.

11. SELECT x.strain, x.chr, x.region as snp_region, x.start_bp as snp_start_bp, x.end_bp as snp_end_bp, w.start_bp as nc_start_bp, w.end_bp as nc_end_bp, w.category as nc_category, CASE WHEN (x.start_bp >= w.start_bp AND x.end_bp
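Slide 7's back-of-envelope estimate of what data handling costs NSF can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "overhead" means the fraction of each postdoc's $50k that goes to handling data rather than doing science (the slide does not spell this out); the constant and function names are illustrative only.

```python
# Back-of-envelope check of the slide 7 estimate.
# Assumption: "overhead" is the fraction of each postdoc's funded cost
# spent handling data rather than doing science.
NSF_POSTDOCS_2010 = 3_000   # "3k NSF postdocs in 2010"
COST_PER_POSTDOC = 50_000   # "$50k / postdoc" (USD per year)
DATA_OVERHEAD = 0.50        # "at least 50% overhead"


def annual_overhead_cost(postdocs: int, cost_each: float, overhead: float) -> float:
    """Dollars per year spent on data handling rather than science."""
    return postdocs * cost_each * overhead


cost = annual_overhead_cost(NSF_POSTDOCS_2010, COST_PER_POSTDOC, DATA_OVERHEAD)
print(f"${cost / 1e6:.0f}M per year")  # prints "$75M per year"
```

Running it prints "$75M per year", matching the slide's figure; since the slide says "at least 50%", this is a lower bound under the stated assumption.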

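The rewrite sequence on slide 10 can be replayed mechanically. The sketch below is a toy term rewriter, not how Myria or any particular database optimizer is implemented: expressions are nested Python tuples, each algebraic law is a function that rewrites a single node, and the laws are applied bottom-up in the slide's order (1, 3, 4, 2). Two simplifications are assumptions of this sketch: the distributive law only matches a common left factor (n*x + n*y), and commutativity, which would loop forever if applied freely, is oriented so that it only moves a constant subexpression to the left.

```python
from typing import Callable, Optional, Union

# An expression is an int constant, a variable name, or a tuple (op, left, right).
Expr = Union[int, str, tuple]


def is_const(e: Expr) -> bool:
    """True if e contains no variables."""
    if isinstance(e, tuple):
        return is_const(e[1]) and is_const(e[2])
    return isinstance(e, int)


# Each rule rewrites one node, or returns None if it does not match.
def plus_identity(e: Expr) -> Optional[Expr]:   # rule 1: x + 0 = x
    if isinstance(e, tuple) and e[0] == "+" and e[2] == 0:
        return e[1]
    return None


def div_identity(e: Expr) -> Optional[Expr]:    # rule 2: x / 1 = x
    if isinstance(e, tuple) and e[0] == "/" and e[2] == 1:
        return e[1]
    return None


def distribute(e: Expr) -> Optional[Expr]:      # rule 3: n*x + n*y = n*(x+y)
    if (isinstance(e, tuple) and e[0] == "+"
            and isinstance(e[1], tuple) and isinstance(e[2], tuple)
            and e[1][0] == e[2][0] == "*" and e[1][1] == e[2][1]):
        return ("*", e[1][1], ("+", e[1][2], e[2][2]))
    return None


def commute(e: Expr) -> Optional[Expr]:         # rule 4: x*y = y*x, constants moved left
    if isinstance(e, tuple) and e[0] == "*" and is_const(e[2]) and not is_const(e[1]):
        return ("*", e[2], e[1])
    return None


def rewrite(e: Expr, rule: Callable[[Expr], Optional[Expr]]) -> Expr:
    """Apply one rule bottom-up until it no longer matches anywhere."""
    if isinstance(e, tuple):
        e = (e[0], rewrite(e[1], rule), rewrite(e[2], rule))
    out = rule(e)
    return e if out is None else rewrite(out, rule)


def count_ops(e: Expr) -> int:
    """Number of operators (internal nodes) in the expression."""
    return 1 + count_ops(e[1]) + count_ops(e[2]) if isinstance(e, tuple) else 0


# N = ((z*2)+((z*3)+0))/1
original = ("/", ("+", ("*", "z", 2), ("+", ("*", "z", 3), 0)), 1)

N = original
for rule in (plus_identity, distribute, commute, div_identity):  # slide order: 1, 3, 4, 2
    N = rewrite(N, rule)

print(N)                                         # ('*', ('+', 2, 3), 'z')
print(count_ops(original), "->", count_ops(N))   # 5 -> 2
```

The result is (2+3)*z with two operators instead of five, as on the slide. Constant folding (2+3 to 5) is deliberately left out because the slide stops at (2+3)*z.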