Beyond Map/Reduce: Getting Creative with Parallel Processing

DESCRIPTION

While Map/Reduce is an excellent environment for some parallel computing tasks, there are many ways to use a cluster beyond Map/Reduce. Within the last year, YARN and NextGen Map/Reduce have been contributed to the Hadoop trunk, Mesos has been released as an open source project, and a variety of new parallel programming environments have emerged, such as Spark, Giraph, GoldenOrb, Accumulo, and others. We will discuss the features of YARN and Mesos, and talk about obvious yet relatively unexplored uses of these cluster schedulers as simple work queues. Examples will be provided in the context of machine learning. Next, we will give an overview of the Bulk Synchronous Parallel model of computation, and compare and contrast the implementations that have emerged over the last year. We will also discuss two other alternative environments: Spark, an in-memory version of Map/Reduce which features a Scala-based interpreter; and Accumulo, a BigTable-style database that implements a novel model for parallel computation and was recently released by the NSA.

TRANSCRIPT

1. Beyond Map/Reduce: Getting Creative with Parallel Processing
   Ed Kohlwey (@ekohlwey, [email protected])

2. Overview
   - Within the last year:
     - Two cluster schedulers have been released
     - Two BSP frameworks have been released
     - An in-memory Map/Reduce has been released
     - Accumulo has been released
   - More importantly: we have been given the tools to program in something besides Map/Reduce and MPI

3. What About...
   - This talk covers a few specific frameworks
   - There's lots more out there

4. Motivations for Schedulers
   - The cornerstone of new cluster computing environments

5. Different Tasks Have Different Needs
   [Diagram: Tasks A, B, and C each need a different mix of CPU, RAM, and hosts (Host 1 through Host 7)]

6. Clusters Often Don't Accommodate This
   [Chart: with a single host type in the cluster, the percentage of cluster load and the expense of hosts required to execute Tasks A, B, and C are poorly matched]

7. This is How It Should Look
   [Chart: with two host types in the cluster, load and host expense are matched to what Tasks A, B, and C actually need]

8. Economic Reasons
   [Chart: power consumption vs. load]

9. Simple Example: a Work Queue
   - Data scientists execute serial implementations of machine learning algorithms
   - Some are expensive, some are not
   - Scientists aren't running analyses all the time
   - Solution 1: give every analyst a big workstation
   - Solution 2: give the analysts thin clients and let them share a cluster

10. Advantages of Moving to a Thin Client/Cluster Model
   - Scalability: all analysts' capabilities can be enhanced by adding one host
   - Increased resource utilization: workstations are expensive and will be highly under-utilized
   - Increased availability: a distributed file system stores the data

11. Desirable Scheduler Features (Y = yes, N = no, P = partial)

   Feature                                                     YARN   Mesos
   Operate on heterogeneous clusters                            Y      Y
   Highly available                                             Y      Y
   Pluggable scheduling policies                                Y      Y
   Authentication                                               Y      N
   Task artifact distribution                                   Y      P
   Scheduling policy based on multiple resources (RAM, CPU)     N      Y
   Multiple queues                                              Y      N
   Fast accept/reject model                                     N      P
   Reusable method of describing resource requirements          Y      N
   Pluggable isolation                                          N      Y
   Compute units                                                N      N

12. New Compute Environments
   - BSP, in-memory Map/Reduce, and streaming processing

13. (Hadoop) Map/Reduce Pros & Cons
   - Map/Reduce implements partitioned, parallel sorting
   - Many (relational) algorithms express well
   - Creates O(n lg(n)) runtime constraints for some problems that wouldn't otherwise have them
   - Hadoop M/R is good for bulk jobs

14. In-Memory Map/Reduce
   - Memory is fast
   - Often, after the map phase, a whole data set can fit in the memory of the cluster
   - Spark provides this, as well as a very succinct programming environment courtesy of Scala and its closures

15. In-Memory Performance
   [Chart: "Logistic Regression Performance Comparison", time (s) vs. number of iterations for Hadoop and Spark; numbers taken from http://spark-project.org]

16. Spark Wordcount

   val file = spark.textFile("hdfs://...")
   file.flatMap(line => line.split(" "))
       .map(word => (word, 1))
       .reduceByKey(_ + _)
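To make the in-memory advantage in slides 14 and 15 concrete, here is a minimal sketch of an iterative job written in the same early-Spark style as the word count above. The SparkContext named spark, the "label feature" input format, and the single scalar weight are illustrative assumptions rather than code from the deck; the point is that cache() keeps the parsed data in cluster memory, so each pass over the data avoids re-reading HDFS.

   import scala.math.exp

   // One labeled example with a single numeric feature (kept 1-D for brevity).
   case class Point(x: Double, y: Double)

   val points = spark.textFile("hdfs://...")
     .map { line =>
       val Array(y, x) = line.split(" ").map(_.toDouble)
       Point(x, y)
     }
     .cache()                 // parsed data stays in cluster memory across iterations

   var w = 0.0                // model weight, updated on the driver
   for (i <- 1 to 20) {       // each iteration reads memory, not HDFS
     val gradient = points
       .map(p => (1.0 / (1.0 + exp(-p.y * w * p.x)) - 1.0) * p.y * p.x)
       .reduce(_ + _)
     w -= gradient
   }
   println("final weight: " + w)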
17. Hadoop Wordcount

   import java.io.IOException;
   import java.util.StringTokenizer;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.io.IntWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapreduce.Job;
   import org.apache.hadoop.mapreduce.Mapper;
   import org.apache.hadoop.mapreduce.Reducer;
   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
   import org.apache.hadoop.util.GenericOptionsParser;

   public class WordCount {

     public static class TokenizerMapper
         extends Mapper<Object, Text, Text, IntWritable> {

       private final static IntWritable one = new IntWritable(1);
       private Text word = new Text();

       public void map(Object key, Text value, Context context)
           throws IOException, InterruptedException {
         StringTokenizer itr = new StringTokenizer(value.toString());
         while (itr.hasMoreTokens()) {
           word.set(itr.nextToken());
           context.write(word, one);
         }
       }
     }

     public static class IntSumReducer
         extends Reducer<Text, IntWritable, Text, IntWritable> {

       private IntWritable result = new IntWritable();

       public void reduce(Text key, Iterable<IntWritable> values, Context context)
           throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable val : values) {
           sum += val.get();
         }
         result.set(sum);
         context.write(key, result);
       }
     }

     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
       if (otherArgs.length != 2) {
         System.err.println("Usage: wordcount <in> <out>");
         System.exit(2);
       }
       Job job = new Job(conf, "word count");
       job.setJarByClass(WordCount.class);
       job.setMapperClass(TokenizerMapper.class);
       job.setCombinerClass(IntSumReducer.class);
       job.setReducerClass(IntSumReducer.class);
       job.setOutputKeyClass(Text.class);
       job.setOutputValueClass(IntWritable.class);
       FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
       FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
       System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
   }

18. Streaming Processing: Accumulo
   - Accumulo is a BigTable implementation
   - Idea: accumulate values into a column map during the ETL process, then summarize the values (stored in sorted order) at read time with a reduce process
   - No control over partitioning outside a row
     - Accumulo doesn't suffer from the column family problem that HBase has, so this is ok
   - Less consistent than Map/Reduce, because race conditions can occur with respect to the scan cursor
   - The iterator programming environment allows you to compose reduce operations
   - Implementing streaming Map/Reduce over a BigTable implementation is a hybrid of in-memory and disk-based approaches
   - Allows revision of figures due to data provenance issues

19. BSP
   - Generalizing Map/Reduce for graph processing

20. BSP
   - First proposed by Valiant in 1990
   - Good at expressing iterative computation
   - Good at expressing graph algorithms
   - Concerned with passing messages between virtual processors
   - Perhaps the most famous implementation is Pregel

21.-28. MR Graph Traversal
   [Diagram sequence across slides 21-28: vertices A, B, and C and their adjacency data flow through Map, Sort + Shuffle, and Reduce stages; A wants to send a message to C ("I want to send a message to C!"), so the message rides through the full sort before C receives it ("I got it!")]
   - The sort makes each traversal step O((n + m) lg(n + m)); this can be optimized to O(m)

29.-31. The BSP Version
   [Diagram across slides 29-31: the same traversal expressed as Compute, Exchange Messages, and Synchronize phases; A sends its message m directly to C]
   - Notice that A and C's message exchange isn't closely coupled, providing better I/O utilization
   - Also notice that we don't necessarily have to copy the entire graph state; we just send whatever messages need to be sent
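The compute / exchange / synchronize cycle in slides 29-31 can be sketched without committing to any particular framework. The Vertex and Message classes and the single-process driver loop below are illustrative assumptions (they are not the Giraph or GoldenOrb APIs); they simply show vertex A handing the message m directly to vertex C at a superstep barrier, with no sort in between.

   // Each message is addressed to a vertex id; delivery happens at the barrier.
   case class Message(to: String, body: String)

   case class Vertex(id: String, neighbors: Seq[String]) {
     // Compute phase: consume the messages delivered at the last barrier and
     // return whatever messages this vertex wants to send in this superstep.
     def compute(incoming: Seq[Message], superstep: Int): Seq[Message] =
       if (superstep == 0 && id == "A")
         Seq(Message("C", "m"))                  // A addresses C directly -- no shuffle
       else {
         incoming.foreach(m => println(id + " got: " + m.body))
         Seq.empty                               // nothing left to send
       }
   }

   object BspSketch extends App {
     val graph = Seq(Vertex("A", Seq("B", "C")), Vertex("B", Seq("C")), Vertex("C", Nil))
     var mailbox = Map.empty[String, Seq[Message]]  // messages delivered at the barrier
     var superstep = 0
     var active = true

     while (active) {
       // Compute + exchange: every vertex runs and emits its outgoing messages.
       val outgoing = graph.flatMap(v => v.compute(mailbox.getOrElse(v.id, Nil), superstep))
       // Synchronize: the barrier groups messages by destination for the next superstep.
       mailbox = outgoing.groupBy(_.to)
       active = outgoing.nonEmpty
       superstep += 1
     }
   }

A real BSP framework distributes the vertices across workers and runs the barrier over the network; the shape of the loop, not the single-process driver, is the point here.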
32. BSP Implementations
   - Giraph
     - Currently an Apache Incubator project
     - Has a growing community
     - Runs during the Hadoop map phase
   - GoldenOrb
     - Not actively maintained since the summer
   - Both implementations are in-memory and modeled after Pregel

33. Contact Info
   Ed Kohlwey
   Booz | Allen | Hamilton
   @ekohlwey
   [email protected]