Installation and Setup of Spark
DIPENDRA KUSI https://www.linkedin.com/in/er-dipendra-kusi-b3674193
2/11/17 SPARK SETUP
Installation and Setup of Spark
Step 1: First, set up Cloudera.
Step 2: Open a terminal in Cloudera and start the Spark shell:
/usr/bin/spark-shell
Step 3: After Spark starts, we can write Scala commands that execute on Spark through the Spark context.
Now read the file from HDFS. Here the input file already exists in HDFS:
val dt = sc.textFile("/user/cloudera/project_data/input")
We can place a file in HDFS using:
hadoop fs -put file0 /user/cloudera/project_data/input
Step 4: Now split the text content on whitespace and count the words:
val wordcount = dt.flatMap(x => x.split(" ")).map(x => (x, 1)).reduceByKey((a, b) => a + b)
Step 5: Now print the result:
for (value <- wordcount) { println(value) }
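The same flatMap/map/reduceByKey pipeline can be sanity-checked on plain Scala collections, without a cluster. This is only an illustrative sketch (the object and method names here are not part of the tutorial); `groupBy` plus a sum plays the role of `reduceByKey`:

```scala
object LocalWordCount {
  // Mirrors the RDD pipeline on ordinary collections:
  // flatMap -> map to (word, 1) pairs -> aggregate counts per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                          // split each line into words
      .map(w => (w, 1))                               // pair each word with 1
      .groupBy(_._1)                                  // group pairs by word (reduceByKey analogue)
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum the 1s per word

  def main(args: Array[String]): Unit =
    wordCount(Seq("spark setup", "spark")).foreach(println)
}
```

Running it locally lets you confirm the counting logic before submitting anything to Spark.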
Integrating Spark with Eclipse:
Step 1: First, set up the Scala plugin in Eclipse.
Go to Help -> Eclipse Marketplace
Step 2: Now search for the Scala plugin and install it.
Click Install.
Click Confirm.
Then accept the license and install.
Step 3: Now check whether the Scala plugin is installed in Eclipse:
Go to New -> Other -> type "scala"
If a "Scala App" entry appears, the Scala plugin is installed.
Step 4: Now create a Maven project:
Go to New -> Other -> type "maven project" -> Next -> Next -> Next
Step 5: Now enter the project coordinates:
Group Id: edu.sparkproject
Artifact Id: WordCount
Click Finish
Step 6: Now go to the pom.xml file and add the Spark dependency.
Step 7: Now copy the code below and paste it into pom.xml.
Link: http://pastebin.com/V5n0hM5P
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.scalaproject</groupId>
  <artifactId>scalaproject</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <repositories>
    <repository>
      <id>pele.farmbio.uu.se</id>
      <url>http://pele.farmbio.uu.se/artifactory/libs-snapshot</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- mixed scala/java compile -->
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <id>compile</id>
            <goals>
              <goal>compile</goal>
            </goals>
            <phase>compile</phase>
          </execution>
          <execution>
            <id>test-compile</id>
            <goals>
              <goal>testCompile</goal>
            </goals>
            <phase>test-compile</phase>
          </execution>
          <execution>
            <phase>process-resources</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <!-- for fatjar -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>assemble-all</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>fully.qualified.MainClass</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
    <pluginManagement>
      <plugins>
        <!-- This plugin's configuration is used to store Eclipse m2e settings only.
             It has no influence on the Maven build itself. -->
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <versionRange>[2.15.2,)</versionRange>
                    <goals>
                      <goal>compile</goal>
                      <goal>testCompile</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <execute></execute>
                  </action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
Now save it. Maven will download all the dependencies.
Step 8: Now convert the project into a Scala project.
First, delete the src/test/java folder.
Then fix the remaining error by clicking Quick Fix and OK.
The error will disappear.
Step 9: Now convert the project to Scala Nature.
Step 10: Right-click on the project -> Properties
Step 11: Now go to Scala Compiler -> tick Use Project Settings -> select Fixed Scala Installation: 2.10.6 -> Apply -> OK.
(This version of Spark is built against Scala 2.10, so we need to match the Scala version running on Spark.)
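A quick way to confirm which Scala version your environment actually uses is to print it from the standard library (a small sketch; the object name is illustrative):

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints the library version, e.g. "2.10.6"; this should match
    // the Scala version the Spark artifact was built against (spark-core_2.10).
    println(scala.util.Properties.versionNumberString)
  }
}
```

If this prints a 2.11 or 2.12 version while the project depends on spark-core_2.10, you will hit binary-incompatibility errors at runtime.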
Step 12: Then go to Java Build Path -> remove the Scala Library Container.
(spark-core already pulls in the Scala library, so there is no need to have it here as well.)
Now rename the package to scala.
Step 13: Now add a Scala object file.
Give the Scala object a name -> Count
Step 14: Now copy the code from the link below and paste it into the Word.scala file.
Link: http://pastebin.com/XNpbcJ2z
package com.scalaproject.scalaproject

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import java.nio.file.{Paths, Files}
import java.io._
import org.apache.commons.io.FileUtils
import org.apache.commons.io.filefilter.WildcardFileFilter
import scala.collection.immutable

object WordCount {
  def main(args: Array[String]) = {
    // Start the Spark context
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("local")
    val sc = new SparkContext(conf)

    val test = sc.textFile("input.txt")
    test.flatMap(x => x.split("\\s+"))
      .map(x => (x, 1))
      .reduceByKey((a, b) => a + b)
      .saveAsTextFile("output")

    // Stop the Spark context
    sc.stop
  }

  def splitting(v: String): Array[String] = {
    v.split(" ")
  }
}
Step 15: Now add an input.txt file as the input to be processed.
Add some text to the input.txt file so that we can process it.
Step 16: Now run the code.
Step 17: Refresh the project.
You will see an output folder in the project; inside it there is a part-00000 file that contains the output.
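Because saveAsTextFile writes each (word, count) tuple with its default toString, the part-00000 lines look like "(spark,3)". A small sketch for reading those lines back into pairs (the object and method names are illustrative, and it assumes words contain no commas or parentheses, which holds for whitespace-split words):

```scala
object OutputParser {
  // Parses a line like "(spark,3)" back into a (word, count) pair.
  def parseLine(line: String): (String, Int) = {
    val inner = line.stripPrefix("(").stripSuffix(")")
    val idx = inner.lastIndexOf(',')           // split on the last comma: count follows it
    (inner.substring(0, idx), inner.substring(idx + 1).toInt)
  }

  def main(args: Array[String]): Unit =
    println(parseLine("(spark,3)"))
}
```

This is handy when a later step needs the counts as data rather than raw text.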