Scala - THE language for Big Data
TRANSCRIPT
![Page 1: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/1.jpg)
![Page 4: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/4.jpg)
![Page 5: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/5.jpg)
![Page 6: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/6.jpg)
![Page 7: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/7.jpg)
![Page 8: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/8.jpg)
![Page 9: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/9.jpg)
![Page 10: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/10.jpg)

    private static class Person {
        String firstName;
        String lastName;
    }

    private List<Person> firstNFamilies(int n, List<Person> persons) {
        final List<String> familiesSoFar = new LinkedList<>();
        final List<Person> result = new LinkedList<>();
        for (Person p : persons) {
            if (familiesSoFar.contains(p.lastName)) {
                result.add(p);
            } else if (familiesSoFar.size() < n) {
                familiesSoFar.add(p.lastName);
                result.add(p);
            }
        }
        return result;
    }


    case class Person(firstName: String, lastName: String)

    def firstNFamilies(n: Int, persons: List[Person]): List[Person] = {
      val firstFamilies = persons.map(p => p.lastName).distinct.take(n)
      persons.filter(p => firstFamilies.contains(p.lastName))
    }

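To make the behaviour of the Scala version concrete, here is a small usage sketch. The sample names and the wrapping object `FirstNFamiliesDemo` are made up for illustration and are not from the slides; the function body is the one shown above.

```scala
object FirstNFamiliesDemo {
  case class Person(firstName: String, lastName: String)

  // Same logic as on the slide: take the first n distinct last names,
  // then keep every person whose last name is among them.
  def firstNFamilies(n: Int, persons: List[Person]): List[Person] = {
    val firstFamilies = persons.map(_.lastName).distinct.take(n)
    persons.filter(p => firstFamilies.contains(p.lastName))
  }

  // Hypothetical sample data, chosen only for this example.
  val people: List[Person] = List(
    Person("Ada", "Lovelace"),
    Person("Alan", "Turing"),
    Person("Annabella", "Lovelace"),
    Person("Grace", "Hopper"),
    Person("John", "Turing")
  )

  // The first two families in order of appearance are Lovelace and Turing.
  val firstTwo: List[Person] = firstNFamilies(2, people)

  def main(args: Array[String]): Unit =
    firstTwo.foreach(p => println(s"${p.firstName} ${p.lastName}"))
}
```

With this input, `firstTwo` keeps Ada and Annabella Lovelace plus Alan and John Turing in their original order, and drops Grace Hopper because Hopper is the third family to appear.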
![Page 11: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/11.jpg)
![Page 12: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/12.jpg)
![Page 13: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/13.jpg)
![Page 14: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/14.jpg)
![Page 15: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/15.jpg)

    class DirectParquetOutputCommitter(outputPath: Path, context: TaskAttemptContext)
      extends ParquetOutputCommitter(outputPath, context) { … }

ParquetOutputCommitter: Java class from org.apache.parquet:parquet-hadoop
DirectParquetOutputCommitter: Scala class from org.apache.spark:spark-core_2.10
![Page 16: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/16.jpg)
![Page 17: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/17.jpg)
![Page 18: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/18.jpg)
Nonsense!
No Way!
RAGE!!11
![Page 19: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/19.jpg)
![Page 20: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/20.jpg)
http://vmturbo.com/wp-content/uploads/2015/05/ScaleUpScaleOut_sm-min.jpg
![Page 21: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/21.jpg)
![Page 22: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/22.jpg)
![Page 23: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/23.jpg)
![Page 24: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/24.jpg)
![Page 25: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/25.jpg)
![Page 26: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/26.jpg)

    val numbers = 1 to 100000
    val result = numbers.map(slowF)

![Page 27: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/27.jpg)

    val numbers = 1 to 100000
    val result = numbers.par.map(slowF)

`.par` parallelizes the subsequent operations across the available CPU cores.
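The `.par` call was part of the standard library up to Scala 2.12; since Scala 2.13 parallel collections live in the separate scala-parallel-collections module. To keep the example runnable with the standard library alone, the sketch below swaps in a different technique, one `Future` per chunk of the range, to show the same idea of spreading a `map` over the available cores. The body of `slowF`, the range size and the chunk size are assumptions made for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelMapSketch {
  // Hypothetical stand-in for the slide's slowF: sleeps briefly, then doubles.
  def slowF(x: Int): Int = { Thread.sleep(1); x * 2 }

  val numbers = 1 to 200

  // Sequential baseline, as on the previous slide.
  def sequential: Vector[Int] = numbers.map(slowF).toVector

  // Split the range into chunks, map each chunk in its own Future on the
  // global thread pool, then concatenate the results in the original order.
  def parallel: Vector[Int] = {
    val chunks = numbers.grouped(25).toVector
    val futures = chunks.map(chunk => Future(chunk.map(slowF).toVector))
    Await.result(Future.sequence(futures), 1.minute).flatten
  }

  def main(args: Array[String]): Unit =
    println(sequential == parallel)
}
```

Both variants produce the same elements in the same order; only the scheduling differs. This is not how `.par` is implemented internally, just a way to see the effect without extra dependencies.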
![Page 28: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/28.jpg)

    val numbers = 1 to 100000
    val result = sparkContext.parallelize(numbers).map(slowF)

`sparkContext.parallelize` creates a Spark RDD (Resilient Distributed Dataset), so the subsequent operations are parallelized across a scalable cluster.
![Page 29: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/29.jpg)
photo: http://www.swissict-award.ch/fileadmin/award/Pressebilder/Martin_Odersky_Scala.jpg
![Page 30: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/30.jpg)
[Diagram: several Map tasks, one of them labelled "Map (retry)"]
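One way to read this diagram is that a failed map task is simply run again on its own partition, without redoing the others. The following is a minimal plain-Scala sketch of that idea under stated assumptions: it runs on a single machine and sequentially, the names `runWithRetry` and `mapPartitions` are hypothetical, and the failure is simulated. It is not Spark's actual scheduling or recovery code.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Run a task, passing it the attempt number, and retry on failure
  // until it succeeds or maxAttempts is reached.
  def runWithRetry[A](maxAttempts: Int)(task: Int => A): A = {
    @annotation.tailrec
    def loop(attempt: Int): A =
      Try(task(attempt)) match {
        case Success(value)                       => value
        case Failure(_) if attempt < maxAttempts  => loop(attempt + 1)
        case Failure(error)                       => throw error
      }
    loop(1)
  }

  // Split the data into partitions and map each one independently.
  // Only the partition that fails is retried; the others run once.
  def mapPartitions(data: Seq[Int], partitionSize: Int)(f: Int => Int): Vector[Int] =
    data.grouped(partitionSize).zipWithIndex.flatMap { case (partition, index) =>
      runWithRetry(maxAttempts = 3) { attempt =>
        // Simulated failure: the second partition is "lost" on its first attempt.
        if (index == 1 && attempt == 1)
          throw new RuntimeException("simulated lost task")
        partition.map(f)
      }
    }.toVector

  def main(args: Array[String]): Unit =
    println(mapPartitions(1 to 6, 2)(_ * 10))
}
```

Because `f` is a pure function of its input, re-running one partition yields the same result as if it had never failed, which is what makes this kind of retry safe.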
![Page 31: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/31.jpg)
![Page 32: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/32.jpg)
![Page 33: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/33.jpg)
![Page 34: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/34.jpg)
![Page 35: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/35.jpg)
![Page 36: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/36.jpg)
![Page 37: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/37.jpg)
![Page 38: Scala - THE language for Big Data](https://reader033.vdocuments.us/reader033/viewer/2022050614/587d23521a28ab1c2f8b5fb9/html5/thumbnails/38.jpg)