sd cadd meeting 2016-08-30: intro to spark
TRANSCRIPT
Apache Spark is a fast and general engine for large-scale data processing • In-memory processing
Successor of Hadoop (MapReduce) • File-based processing
http://spark.apache.org/
Apache Spark works in parallel on • Multicore laptop, desktop • Single server • Cluster (need cluster manager)
Map-Reduce Example
[Diagram: RDD<String> → RDD<String> → PairRDD<String,Integer> → PairRDD<String,Integer>; step labels: one to many, one to one]
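
The Map-Reduce example above moves from strings to (String, Integer) pairs, which matches the shape of a word count. The sketch below is an assumption about what the slide illustrates, not the presenter's code. It uses plain Java streams as a stand-in for Spark's RDD operations so that it runs without a Spark installation. The class name WordCountSketch and the method countWords are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration: plain Java streams, not the Spark API.
class WordCountSketch {

    // Counts how often each whitespace-separated word occurs across all lines.
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // one to many: each line yields zero or more words
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // one to one: each word becomes a (word, 1) pair
                .map(word -> new SimpleEntry<String, Integer>(word, 1))
                // reduce: sum the values of all pairs that share a key
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        Integer::sum,
                        TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do");
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(countWords(lines));
    }
}
```

In Spark's Java API the same three steps are usually written with flatMap, mapToPair and reduceByKey on a JavaRDD<String>. The stream version only mirrors that shape on a single machine. The TreeMap is used here just to make the printed order predictable.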