hadoop map reduce

WDABT 2016 – BHARATHIAR UNIVERSITY K.Santhiya , Ph.d Research Scholar , Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Upload: karthika-karthi

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Hadoop map reduce

WDABT 2016 – BHARATHIAR UNIVERSITY

K. Santhiya, Ph.D. Research Scholar, Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Comp. Appl., Bharathiar University, WDABT 2016

Page 2: Hadoop map reduce

Take a Closer Look at

Presented By
K.SANTHIYA
Ph.D. Research Scholar

Under the Guidance of
Dr.V.BHUVANESWARI
Assistant Professor
Department of Computer Applications
Bharathiar University

Page 3: Hadoop map reduce

AGENDA

• MAPREDUCE
• ANALOGY
• EXECUTION
• HADOOP INTERACTION
• BUILD MAPREDUCE PROGRAM IN ECLIPSE

YARN
• YARN DEFINITION
• YARN REAL LIFE CONNECT
• YARN INFRASTRUCTURE

Page 4: Hadoop map reduce

WHY MAPREDUCE

Page 5: Hadoop map reduce

MAP REDUCE

Page 6: Hadoop map reduce

REAL TIME USES OF MAP REDUCE

Page 7: Hadoop map reduce

MR REAL – LIFE CONNECT

Page 8: Hadoop map reduce

MAP REDUCE - ANALOGY

Page 9: Hadoop map reduce

MAP REDUCE – ANALOGY CONTD.,

Page 10: Hadoop map reduce

MAP REDUCE EXAMPLE

Page 11: Hadoop map reduce

MAP EXECUTION

Page 12: Hadoop map reduce

MAP EXECUTION – DISTRIBUTED TWO NODE ENVIRONMENT

Page 13: Hadoop map reduce

MAPREDUCE JOBS

Page 14: Hadoop map reduce

HADOOP JOB WORK INTERACTION

Page 15: Hadoop map reduce

CHARACTERISTICS OF MR

• MapReduce is designed to handle very large scale data in the range of petabytes and exabytes.

• It works well on write once and read many data, also known as WORM data.

• MapReduce allows parallelism without mutexes.

• The Map and Reduce operations are typically performed by the same processor.

• Operations are provisioned near the data, as data locality is preferred.

• Commodity hardware and storage are leveraged in MapReduce.

• The runtime takes care of splitting and moving data for operations.
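As a rough illustration of "parallelism without mutexes", the plain-Java sketch below processes each input split independently and only combines the per-split results at the end, so no shared state needs locking. It does not use Hadoop; the class name `NoMutexSketch`, its methods, and the sample splits are hypothetical and chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

public class NoMutexSketch {

    // "Map" side: counts the words in one split without touching any shared state.
    static long countWords(String split) {
        String trimmed = split.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // "Reduce" side: adds up the independent per-split counts.
    static long totalWords(List<String> splits) {
        return splits.parallelStream()
                .mapToLong(NoMutexSketch::countWords)
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical splits standing in for blocks of a large input file.
        List<String> splits = Arrays.asList("write once read many", "map and reduce", "");
        System.out.println("Total words: " + totalWords(splits));
    }
}
```

Because every split is handled by a pure function, the work can be spread over threads (or, in Hadoop, over nodes) without locks.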

Page 16: Hadoop map reduce

BUSINESS SCENARIO

Page 17: Hadoop map reduce

SET UP ENVIRONMENT

Page 18: Hadoop map reduce

SMALL DATA AND BIG DATA

Page 19: Hadoop map reduce

UPLOADING SMALL & BIG DATA

Page 20: Hadoop map reduce

BUILD MAPREDUCE PROGRAM

Page 21: Hadoop map reduce

MAPREDUCE DEMO

• We will run an example that computes the value of ‘pi’, which is a computation-intensive program. The first argument indicates how many maps to create; here, we use 10 mappers. The second argument indicates how many samples are generated per map; here, we take 100 random samples. So this program uses 10 multiplied by 100, that is, 1000 random points to estimate pi. We could increase the samples per map from 100 to 10 million to improve accuracy.
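The arithmetic of the demo (maps multiplied by samples per map) can be illustrated with a minimal plain-Java sketch. It does not use Hadoop and is not the code of the bundled example: each "map" is simulated by an ordinary loop with its own pseudo-random generator, and the class name `PiEstimateSketch`, its methods, and the seed are assumptions made for illustration only.

```java
import java.util.Random;

public class PiEstimateSketch {

    // "Map" step: counts how many random points in the unit square fall inside the quarter circle.
    static long countInside(int samples, Random rng) {
        long inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // "Reduce" step: sums the per-map counts and scales the ratio by 4 to estimate pi.
    static double estimatePi(int maps, int samplesPerMap, long seed) {
        long totalInside = 0;
        for (int m = 0; m < maps; m++) {
            // Each simulated map gets its own generator (seed choice is arbitrary).
            totalInside += countInside(samplesPerMap, new Random(seed + m));
        }
        long totalPoints = (long) maps * samplesPerMap;
        return 4.0 * totalInside / totalPoints;
    }

    public static void main(String[] args) {
        System.out.println("10 maps x 100 samples:    " + estimatePi(10, 100, 42L));
        System.out.println("10 maps x 100000 samples: " + estimatePi(10, 100_000, 42L));
    }
}
```

With only 1000 points the estimate is coarse; raising the samples per map tightens it, which is the accuracy trade-off described above.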

Page 22: Hadoop map reduce

HADOOP MR REQUIREMENTS

Page 23: Hadoop map reduce

Create a New Project : Step 1

Page 24: Hadoop map reduce

Create a New Project : Step 2

Page 25: Hadoop map reduce

Create a New Project : Step 3

Page 26: Hadoop map reduce

Create a New Project : Step 4

Page 27: Hadoop map reduce

Create a New Project : Step 5

Page 28: Hadoop map reduce

CHECKING HADOOP ENVIRONMENT FOR MAPREDUCE

Page 29: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster

Let’s build a MapReduce Java program in Eclipse and then run it on our Hadoop cluster. In this demo, Eclipse runs on a Windows development machine and the Hadoop cluster runs on Ubuntu.

1. First, let’s launch Eclipse.
2. Enter the workspace location.
3. Click OK.
4. The Eclipse window will open.
5. Close the welcome screen of Eclipse.
6. Select the New menu item.
7. Select Java Project.
8. The New Java Project window opens.
9. We will build a WordCount program here to count the number of times each word occurs in a particular file.

Page 30: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster Contd.,

10. Enter the name of the project as ‘WordCount’ and click Finish.
11. Right-click the WordCount project in the panel on the left.
12. Select New and then Class.
13. The New Java Class window opens.
14. Enter the name of the class as ‘WordCount’.
15. Click Finish.
16. Now, let’s copy the WordCount program from the MapReduce tutorial on Hadoop’s website. You may go to Hadoop’s documentation or go directly to the link being shown.
17. Copy the source code for the WordCount program.
18. You will notice a lot of compilation errors. Let’s fix the build path now. Select the project WordCount.

Page 31: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster Contd.,

19. Select the Project menu item.
20. Click Properties.
21. In Libraries, add external JARs.
22. Browse to the unpacked Hadoop directory and go to the share/hadoop/mapreduce directory.
23. Select the Hadoop MapReduce client core and Hadoop MapReduce client common JAR files.
24. Now, go to the share/hadoop/common directory.
25. Select the Hadoop common JAR file.
26. The compilation errors should be gone by now.
27. Let’s now look at the various portions of this program.
28. The usual Java imports are at the top of the program.
29. Further, there are Hadoop and MapReduce related import statements. Select the Description column header.

Page 32: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster Contd.,

30. In the main method, we begin by setting configuration of the MapReduce job.
31. We set the name of the Mapper class.
32. We set the name of Combiner class.
33. Similarly, there is a Reducer class.
34. We can set the output key class.
35. We can also set the output value class.
36. Also, set the input data path for the source dataset.
37. Set the output path to a location where the results are desired.
38. Our Mapper class extends Mapper.
39. It has a map method which takes key and value as arguments and uses context.
40. In the WordCount logic, we just tokenize each line by space character and extract individual words.
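The tokenizing logic of steps 39 and 40 can be sketched in plain Java without any Hadoop classes. This is an illustrative stand-in, not the tutorial's Mapper: the class name `MapStepSketch` is hypothetical, and a returned list of (word, 1) pairs takes the place of writing to Hadoop's context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapStepSketch {

    // Stand-in for the map method: emits a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        // StringTokenizer splits on whitespace (space, tab, newline) by default.
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            emitted.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical input line; prints one tab-separated pair per word.
        for (Map.Entry<String, Integer> pair : map("hello hadoop hello")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that the map step does no counting itself: repeated words simply produce repeated pairs, and the summing is left to the combiner and reducer.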

Page 33: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster Contd.,

41. Our Reducer class similarly extends Reducer.
42. The Reduce method takes a key and an iterable list of values as arguments.
43. The final output is again written as key value pairs.
44. Select the New menu item.
45. Let’s now build and export a JAR file to run this program on a Hadoop cluster. Click File menu and then Export.
46. The Export window opens.
47. Expand Java.
48. Select JAR file.
49. Click the Next button.
50. Enter the path and name of JAR. In this case, let’s name it ‘WordCount.jar’.
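The reduce logic of steps 41 to 43 can likewise be sketched in plain Java. This is a stand-in that assumes the values have already been grouped by key, as the framework's shuffle would do; the class name `ReduceStepSketch`, the helper `reduceAll`, and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceStepSketch {

    // Stand-in for the reduce method: sums all the counts received for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Applies reduce to every key of already-grouped input and collects (word, total) pairs.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical grouped map output for the line "hello hadoop hello".
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("hadoop", Arrays.asList(1));
        grouped.put("hello", Arrays.asList(1, 1));
        reduceAll(grouped).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Because summing is associative, the same logic can also serve as the combiner mentioned in step 32, pre-aggregating counts on the map side before they reach the reducer.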

Page 34: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster Contd.,

51. Make sure that you select the project.
52. Now, let’s transfer this JAR to the Hadoop cluster. If you are using Windows, you can use any SCP or FTP client such as WinSCP. Log in to WinSCP using the IP address of the Hadoop Ubuntu cluster.
53. Enter the username of the Hadoop machine.
54. Enter the password.
55. Select the WordCount.jar file from the local Windows machine.
56. Using WinSCP, you can drag and drop to the Ubuntu machine in the panel on the right.
57. The Copy window opens.
58. Click the Copy button.

Page 35: Hadoop map reduce

Build a MR Application Using Eclipse and Run in Hadoop Cluster Contd.,

59. Now, run the WordCount program in the Hadoop cluster using the hadoop jar command. Specify the input file name on which WordCount is to be applied and also the output result path.
60. View the results in the output directory.
61. You will notice a file with a name similar to part-r-00000.
62. View the contents of this output file using the hadoop fs -cat command.
63. The output will have a count of each word’s occurrence in the input dataset.

Page 36: Hadoop map reduce

WHY YARN ?

YARN : Yet Another Resource Negotiator

Page 37: Hadoop map reduce

WHAT IS YARN ?

YARN is a resource manager. It was created by separating the processing engine and the management function of MapReduce. It monitors and manages workloads, maintains a multi-tenant environment, manages the high availability features of Hadoop, and implements security controls.

Page 38: Hadoop map reduce

YARN – REAL LIFE CONNECT

• Limitations of MapReduce
• Architected by Yahoo
• Hadoop 2.0 provides a broader ecosystem with
  – Spark for Iterative processing
  – Storm for Stream processing
  – Hadoop for Batch processing

Page 39: Hadoop map reduce

YARN INFRASTRUCTURE

Page 40: Hadoop map reduce

REFERENCES

• Carl W. Olofson, Dan Vesset, "Worldwide Hadoop – MapReduce Ecosystem Software 2012-2016 Forecast", 2012. [Online]. Available: http://www.idc.com/getdoc.jsp?containerId=234294

• Philip Russom, "Big Data Analytics", presented by TDWI, 2011.

• K. Cukier, "Data, data everywhere", Economist, vol. 394, no. 8671, pp. 3-16, 2010.
