EMR AWS demo
Post on 28-Jan-2018
DEMO OUTLINE
Simple Storage Service: S3
Job
Jar: MapReduce code
Input: input data files
Output: output data files
All data must be on S3, including the jar and the input data
Create Hadoop Cluster
Size: number of workers
Hardware configuration
Start the job
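
A minimal sketch of how the outline's pieces (jar, input, output, cluster size, hardware) could be gathered into one request. It only builds a dictionary and calls no AWS service. The field names follow the parameters of boto3's EMR `run_job_flow` call. The cluster name, release label, instance type and default role names are assumptions, not values from the demo.

```python
def build_emr_request(jar_uri, input_uri, output_uri, workers,
                      instance_type="m4.large", main_class=None):
    """Assemble a cluster-plus-job request as a plain dictionary."""
    # The outline requires the jar and all data to live on S3.
    for uri in (jar_uri, input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 location: {uri}")

    # Job: the jar plus its input and output locations.
    jar_step = {"Jar": jar_uri, "Args": [input_uri, output_uri]}
    if main_class:
        jar_step["MainClass"] = main_class

    return {
        "Name": "emr-demo",                  # hypothetical cluster name
        "ReleaseLabel": "emr-5.11.0",        # assumed EMR release
        "Instances": {
            # Hardware configuration (assumed instance type).
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # Size: the requested workers plus one master node.
            "InstanceCount": workers + 1,
            # Shut the cluster down once the job has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "mapreduce-job",         # hypothetical step name
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": jar_step,
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("emr").run_job_flow(**request)`, which creates the cluster and starts the job in one call.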
FIND OUT
S3
How to upload data from a terminal to S3
Scenario where the data is somewhere on the internet
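
One possible answer to both S3 questions, sketched as terminal commands built in Python. It assumes the AWS CLI and `curl` are installed and that credentials are configured. The bucket name, file name and URL in the usage note are hypothetical. `aws s3 cp` accepts `-` as the source to read from standard input, so remote data can be streamed into S3 without first being saved locally.

```python
import shlex


def upload_command(local_path, s3_uri):
    """Command that copies a local file to S3 with the AWS CLI."""
    return shlex.join(["aws", "s3", "cp", local_path, s3_uri])


def stream_command(url, s3_uri):
    """Command that streams a remote file straight into S3."""
    # curl writes the download to stdout (-s: quiet, -L: follow redirects)
    download = shlex.join(["curl", "-sL", url])
    # "-" tells aws s3 cp to read the object body from stdin
    upload = shlex.join(["aws", "s3", "cp", "-", s3_uri])
    return f"{download} | {upload}"
```

For example, `upload_command("data.txt", "s3://my-bucket/input/data.txt")` gives `aws s3 cp data.txt s3://my-bucket/input/data.txt`, which can be pasted into a terminal or run with `subprocess.run(cmd, shell=True, check=True)`.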
Hadoop Master
Compile the job on the master
Submit the job from a terminal on the master
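
A sketch of the commands that compiling and submitting on the master might involve, assuming an SSH session on the master node with `javac`, `jar` and `hadoop` on the PATH. The source file, class name, jar name and S3 paths in the usage note are hypothetical. `hadoop classpath` prints the classpath that the compiler needs to find the Hadoop libraries.

```python
import shlex


def build_and_submit_commands(source, main_class, jar_name,
                              input_uri, output_uri):
    """Return the shell commands to compile, package and run a job."""
    q = shlex.quote
    return [
        # Directory for the compiled classes.
        "mkdir -p classes",
        # Compile against the Hadoop libraries installed on the master.
        f'javac -classpath "$(hadoop classpath)" -d classes {q(source)}',
        # Package the compiled classes into a jar.
        f"jar cf {q(jar_name)} -C classes .",
        # Submit the job with its input and output locations.
        f"hadoop jar {q(jar_name)} {q(main_class)} "
        f"{q(input_uri)} {q(output_uri)}",
    ]
```

Called as `build_and_submit_commands("WordCount.java", "WordCount", "wordcount.jar", "s3://my-bucket/input", "s3://my-bucket/output")`, the last command is `hadoop jar wordcount.jar WordCount s3://my-bucket/input s3://my-bucket/output`.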
Performance Tuning
Hadoop cluster configuration
RAM allocated to each Mapper and Reducer
Data Compression
Code
Input Split Size
Adjust the number of reducers
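
The tuning knobs listed above correspond to standard Hadoop 2 job properties. The sketch below turns them into `-D` options for `hadoop jar`. It assumes the job's driver uses `ToolRunner`, so that generic options are parsed. The 80% heap ratio and the Snappy codec are illustrative choices rather than recommendations from the demo, and the map-task estimate is a rough one that assumes a single splittable input file.

```python
def tuning_options(map_mb, reduce_mb, split_mb, reducers):
    """Build -D options for memory, compression, splits and reducers."""
    props = {
        # RAM allocated to each Mapper and Reducer container.
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.reduce.memory.mb": reduce_mb,
        # JVM heap kept below the container size (assumed 80% ratio).
        "mapreduce.map.java.opts": f"-Xmx{int(map_mb * 0.8)}m",
        "mapreduce.reduce.java.opts": f"-Xmx{int(reduce_mb * 0.8)}m",
        # Data compression of the intermediate map output.
        "mapreduce.map.output.compress": "true",
        "mapreduce.map.output.compress.codec":
            "org.apache.hadoop.io.compress.SnappyCodec",
        # Input split size, given in bytes.
        "mapreduce.input.fileinputformat.split.maxsize":
            split_mb * 1024 * 1024,
        # Number of reducers.
        "mapreduce.job.reduces": reducers,
    }
    return [f"-D{name}={value}" for name, value in props.items()]


def estimated_map_tasks(input_bytes, split_bytes):
    """Rough number of map tasks: one per input split, rounded up."""
    return -(-input_bytes // split_bytes)
```

The options go between the main class and the job arguments, as in `hadoop jar job.jar MainClass -Dmapreduce.job.reduces=8 <input> <output>`. A smaller split size yields more, shorter map tasks. A larger one yields fewer, longer ones.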