Scala - THE language for Big Data
private static class Person {
  String firstName;
  String lastName;
}

private List<Person> firstNFamilies(int n, List<Person> persons) {
  final List<String> familiesSoFar = new LinkedList<>();
  final List<Person> result = new LinkedList<>();
  for (Person p : persons) {
    if (familiesSoFar.contains(p.lastName)) {
      result.add(p);
    } else if (familiesSoFar.size() < n) {
      familiesSoFar.add(p.lastName);
      result.add(p);
    }
  }
  return result;
}
case class Person(firstName: String, lastName: String)
def firstNFamilies(n: Int, persons: List[Person]): List[Person] = {
  val firstFamilies = persons.map(p => p.lastName).distinct.take(n)
  persons.filter(p => firstFamilies.contains(p.lastName))
}
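A quick usage sketch of the Scala version, with hypothetical sample data, to show what it returns:

```scala
object FirstNFamiliesDemo {
  case class Person(firstName: String, lastName: String)

  def firstNFamilies(n: Int, persons: List[Person]): List[Person] = {
    val firstFamilies = persons.map(p => p.lastName).distinct.take(n)
    persons.filter(p => firstFamilies.contains(p.lastName))
  }

  def main(args: Array[String]): Unit = {
    val people = List(
      Person("Ann", "Smith"),
      Person("Bob", "Jones"),
      Person("Cara", "Lee"),
      Person("Dan", "Smith")
    )
    // The first 2 distinct family names encountered are Smith and Jones,
    // so Cara Lee is filtered out while Dan Smith is kept.
    println(firstNFamilies(2, people).map(p => p.firstName))
    // prints List(Ann, Bob, Dan)
  }
}
```

Same behavior as the Java loop, but the intent - "take the first n family names, keep everyone in them" - reads directly off the two lines.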
class DirectParquetOutputCommitter(outputPath: Path, context: TaskAttemptContext) extends ParquetOutputCommitter(outputPath, context) { … }
Java class from org.apache.parquet:parquet-hadoop
Scala class from org.apache.spark:spark-core_2.10
Nonsense!
No Way!
RAGE!!11
http://vmturbo.com/wp-content/uploads/2015/05/ScaleUpScaleOut_sm-min.jpg
val numbers = 1 to 100000
val result = numbers.map(slowF)
val numbers = 1 to 100000
val result = numbers.par.map(slowF)
Parallelizes subsequent operations across the available CPU cores
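A runnable sketch of the `.par` idea, with `slowF` as a stand-in for expensive work (note: up to Scala 2.12, `.par` is in the standard library; since 2.13 it lives in the separate scala-parallel-collections module):

```scala
object ParDemo {
  def slowF(x: Int): Int = { Thread.sleep(1); x * 2 } // stand-in for expensive work

  def main(args: Array[String]): Unit = {
    val numbers = 1 to 1000
    // .par hands the map off to a thread pool sized to the CPU cores;
    // the result keeps the original element order.
    val result = numbers.par.map(slowF)
    println(result.sum) // prints 1001000
  }
}
```

Nothing else in the pipeline changes - the same `map(slowF)` call runs sequentially or in parallel depending only on whether `.par` was applied first.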
val numbers = 1 to 100000
val result = sparkContext.parallelize(numbers).map(slowF)
Parallelizes subsequent operations across a scalable cluster by creating a Spark RDD - a Resilient Distributed Dataset
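For completeness, a minimal self-contained sketch of the snippet above, assuming spark-core is on the classpath; `local[*]` and the app name are arbitrary choices for running inside a single JVM, and `slowF` remains a hypothetical expensive function:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def slowF(x: Int): Int = x * 2 // stand-in for expensive work

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, one worker thread per core;
    // the same code, pointed at a cluster master, distributes across machines.
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("demo"))
    try {
      val numbers = 1 to 100000
      val result = sc.parallelize(numbers).map(slowF) // an RDD; evaluated lazily
      println(result.sum()) // an action: triggers the distributed computation
    } finally sc.stop()
  }
}
```

The `map` is lazy - Spark only runs the computation when an action such as `sum()` asks for a result, which is what lets it plan and retry work across the cluster.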
photo: http://www.swissict-award.ch/fileadmin/award/Pressebilder/Martin_Odersky_Scala.jpg
[diagram: Map tasks running in parallel; a failed Map task is retried]