![Page 1: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/1.jpg)
Sky Agile Horizons
Hadoop at Sky
![Page 2: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/2.jpg)
• What is Hadoop?
- Reliable, Scalable, Distributed
• Where did it come from?
- Community + Yahoo!
• Where is it now?
- Apache Software Foundation
• Why is it called “Hadoop”?
1.01
Hadoop at Sky
Overview
![Page 3: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/3.jpg)
To name just a few…
1.02
Who is using it?
![Page 4: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/4.jpg)
This screengrab is from one of the Hadoop clusters at Facebook (May 2010)
1.03
Is it “production” ready?
![Page 5: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/5.jpg)
1.04
So, what does it give you?
![Page 6: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/6.jpg)
• Distributed Filesystem (HDFS)
- Name Node
- Data Node(s)
• Distributed Processing Infrastructure
- Job Tracker
- Task Tracker(s)
1.05
Just two things...
![Page 7: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/7.jpg)
• Blocks
- 64MB chunks (configurable)
• WORM (Write once, read many)
- NO EDITS
- NO APPENDS
• Replication
- 3 copies
- direct
1.06
HDFS - Overview
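The block and replication figures above can be turned into a small worked example. This is only a sketch: the 200MB file size, the node names `dn1` to `dn4` and the round-robin placement are assumptions made for illustration, and real HDFS chooses replica locations with a more involved, rack-aware policy.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64MB, the (configurable) block size from the slide
REPLICATION = 3                # 3 copies of every block, as on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block that a file of file_size occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes.

    Simple round-robin, chosen for illustration only (an assumption,
    not how HDFS actually places replicas).
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200 * 1024 * 1024)           # hypothetical 200MB file
layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
raw_bytes = sum(blocks) * REPLICATION                   # disk used across the cluster
```

Under these assumptions the file occupies three full blocks plus one 8MB block, and storing it costs three times its size in raw disk.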
![Page 8: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/8.jpg)
1.07
HDFS - Read
[Diagram: a Client, the Name Node and the Data Nodes, which hold three copies each of blocks 1 to 4. Steps: 1. Get Metadata; 2. Fetch Blocks. A further label reads: Control / Monitoring.]
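The two read steps can be sketched as a toy model. The classes and method names below (`NameNode.get_metadata`, `DataNode.fetch`, `read_file`) are hypothetical stand-ins, not Hadoop APIs; the point is only that the Name Node answers with block locations while the block data itself comes from the Data Nodes, and that a spare replica can be used when one node is down.

```python
# Toy model of the read path; every name here is illustrative, not a Hadoop API.

class NameNode:
    """Holds metadata only: which data nodes store each block of a file."""

    def __init__(self, block_locations):
        self.block_locations = block_locations  # {path: [(block_id, [node names])]}

    def get_metadata(self, path):
        return self.block_locations[path]


class DataNode:
    """Holds block contents; can be marked as down to simulate a failure."""

    def __init__(self, blocks, alive=True):
        self.blocks = blocks  # {block_id: bytes}
        self.alive = alive

    def fetch(self, block_id):
        if not self.alive:
            raise ConnectionError("data node unavailable")
        return self.blocks[block_id]


def read_file(path, name_node, data_nodes):
    data = b""
    for block_id, locations in name_node.get_metadata(path):   # 1. Get Metadata
        for node_name in locations:                             # 2. Fetch Blocks
            try:
                data += data_nodes[node_name].fetch(block_id)
                break
            except ConnectionError:
                continue  # try the next replica of this block
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data


replicas = {1: b"Hello, ", 2: b"Sky"}
data_nodes = {
    "dn1": DataNode(replicas, alive=False),   # simulate a failed node
    "dn2": DataNode(replicas),
    "dn3": DataNode(replicas),
}
name_node = NameNode({"/demo.txt": [(1, ["dn1", "dn2", "dn3"]),
                                    (2, ["dn2", "dn3", "dn1"])]})
content = read_file("/demo.txt", name_node, data_nodes)
```

Even with `dn1` marked as down, `content` is the complete file, because block 1 is served by the next replica in its list.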
![Page 9: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/9.jpg)
1.08
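HDFS - Write
[Diagram: a Client holding blocks 1 to 3, the Name Node and the Data Nodes, which receive copies of each block. Steps: 1. Create Metadata; 2. Put Blocks. A further label reads: Control / Monitoring.]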
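A matching toy model of the write steps is sketched below. As before, the names (`NameNode.create`, `write_file`, the `storage` dictionary) are hypothetical, the 4-byte block size is chosen only to keep the example small, and the client here stores every copy itself, which is a simplification of how HDFS actually distributes replicas. The write-once rule is modelled by refusing to create a path that already exists.

```python
# Toy model of the write path; class and function names are illustrative only.

class NameNode:
    def __init__(self, node_names, replication=3):
        self.node_names = node_names
        self.replication = replication
        self.files = {}  # {path: [(block_id, [node names])]}

    def create(self, path, num_blocks):
        """1. Create Metadata: pick target nodes for every block of a new file."""
        if path in self.files:
            # Write once, read many: no edits and no appends to an existing file.
            raise FileExistsError(path)
        n = len(self.node_names)
        self.files[path] = [
            (b, [self.node_names[(b + i) % n] for i in range(self.replication)])
            for b in range(num_blocks)
        ]
        return self.files[path]


def write_file(path, data, block_size, name_node, storage):
    """Split data into blocks and store each block on its assigned nodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    plan = name_node.create(path, len(chunks))
    for (block_id, targets), chunk in zip(plan, chunks):       # 2. Put Blocks
        for node in targets:
            storage[node][(path, block_id)] = chunk
    return plan


nodes = ["dn1", "dn2", "dn3", "dn4"]
storage = {name: {} for name in nodes}    # stand-in for each data node's disk
name_node = NameNode(nodes)
plan = write_file("/demo.txt", b"abcdefghij", 4, name_node, storage)
```

Here the ten bytes become three blocks with three copies each, and a second `write_file` call for the same path raises `FileExistsError`.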
![Page 10: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/10.jpg)
• Slots
- X mapper slots, Y reducer slots (per node)
• Jobs
- Queued
- Prioritised
• Tasks
- Data-aware
1.09
Distributed Processing
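One way to picture slots and data-aware tasks is a greedy assignment that prefers a node already holding the block. The function below is a simplified sketch under that assumption; `assign_map_tasks`, the node names and the slot counts are all hypothetical, and the real Job Tracker scheduling is more elaborate than this.

```python
def assign_map_tasks(block_locations, map_slots):
    """Greedy, data-aware assignment of one map task per block.

    block_locations: {block_id: [nodes holding a replica]}
    map_slots:       {node: number of free mapper slots}
    Returns ({block_id: (node, is_local)}, [blocks left queued]).
    """
    free = dict(map_slots)
    assignment, queued = {}, []
    for block, replicas in sorted(block_locations.items()):
        # Prefer a node that already holds the block (no network copy needed).
        local = [n for n in replicas if free.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            anywhere = [n for n, slots in sorted(free.items()) if slots > 0]
            if not anywhere:
                queued.append(block)   # no free slot: the task waits in the queue
                continue
            node, is_local = anywhere[0], False
        free[node] -= 1
        assignment[block] = (node, is_local)
    return assignment, queued


locations = {1: ["dn1", "dn2"], 2: ["dn1", "dn2"], 3: ["dn1"], 4: ["dn1"], 5: ["dn2"]}
slots = {"dn1": 2, "dn2": 1, "dn3": 1}
assignment, queued = assign_map_tasks(locations, slots)
```

With four mapper slots and five blocks, two tasks run next to their data, two run on other nodes, and one stays queued until a slot frees up.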
![Page 11: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/11.jpg)
1.10
Distributed Processing
[Diagram: a Client, the Job Tracker and five Task Trackers, each shown with four M and two R slots. Step: 1. Setup Job. A further label reads: Control / Monitoring.]
![Page 12: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/12.jpg)
• Two modes of operation
1.11
Implementation
[Diagram: the two modes. Standalone: Name Node, Data Node, Job Tracker and Task Tracker together. Master: Name Node and Job Tracker. Slaves: six nodes, each with a Data Node and a Task Tracker.]
![Page 13: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/13.jpg)
1.12
Building upon the basics
![Page 14: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/14.jpg)
• Map/Reduce – divide & conquer
• Pig – SQL-like “Pig Latin”
• HBase – column-based database
• Hive – data-warehousing (SQL-like queries)
• Mahout – distributed algorithms
1.13
Sub-projects
![Page 15: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/15.jpg)
• Java-based
- Key,Value input, Key,Value output(s)
• Intended for low-level / bespoke work
1.14
Map/Reduce
[Diagram: Start, a group of M (map) tasks, a group of R (reduce) tasks, End.]
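The Key,Value in and Key,Value out shape can be sketched in plain Python with a word count. Hadoop's own API is Java-based, so this is not Hadoop code; the function names and the sample lines are made up for illustration, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply mapper to every (key, value) input record, collecting its outputs."""
    for key, value in records:
        yield from mapper(key, value)


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Apply reducer to every (key, [values]) group."""
    for key, values in groups:
        yield from reducer(key, values)


def word_mapper(offset, line):
    # Input key is the line's offset (unused here); emit (word, 1) per word.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word, counts):
    yield word, sum(counts)


lines = [(0, "sky hadoop sky"), (15, "hadoop at sky")]
result = dict(reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer))
```

Each mapper call sees one record and each reducer call sees one key with all its values, which is what lets the framework run many M and R tasks in parallel.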
![Page 16: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/16.jpg)
• SQL-like syntax, Map/Reduce under the hood
• Client-only software
1.15
Hive
[Diagram: Query, four M R (map/reduce) pairs, Results.]
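To illustrate how a SQL-like query can turn into more than one Map/Reduce stage, the sketch below runs a made-up query as two chained stages. The table `viewing`, its `channel` column and the helper `run_stage` are hypothetical, and the plan shown is a hand-written approximation, not the plan Hive itself would generate.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """One Map/Reduce stage: map, group by key (in sorted key order), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


# Hypothetical query, for illustration only:
#   SELECT channel, COUNT(*) AS views FROM viewing
#   GROUP BY channel ORDER BY views DESC
viewing = [{"channel": "news"}, {"channel": "sport"}, {"channel": "news"},
           {"channel": "movies"}, {"channel": "news"}, {"channel": "sport"}]

# Stage 1 (GROUP BY + COUNT): key on channel, count the rows per key.
counts = run_stage(
    viewing,
    mapper=lambda row: [(row["channel"], 1)],
    reducer=lambda channel, ones: [(channel, sum(ones))],
)

# Stage 2 (ORDER BY views DESC): key on the negated count so that sorting
# by key gives descending order; the reducer just emits the rows again.
ordered = run_stage(
    counts,
    mapper=lambda row: [(-row[1], row[0])],
    reducer=lambda neg, channels: [(c, -neg) for c in sorted(channels)],
)
```

The output of the first stage is the input of the second, which is the chaining of M R pairs between Query and Results.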
![Page 17: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/17.jpg)
1.16
Live Demo
![Page 18: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/18.jpg)
• It’s not a magic bullet…
• If the tools you need don’t exist…
• Approach is everything…
• Hadoop is *just* the framework
1.17
Lastly, a word of warning...
![Page 19: Sky Agile Horizons Hadoop at Sky. What is Hadoop? - Reliable, Scalable, Distributed Where did it come from? - Community + Yahoo! Where is it now? - Apache](https://reader036.vdocuments.us/reader036/viewer/2022081515/56649e8f5503460f94b94012/html5/thumbnails/19.jpg)
1.18
Thank you!
Questions?
http://cotdp.com/hadoop.html
- Soft-copy of this presentation
- VM image available to download
- Example code is on GitHub