big data: introduction to hadoop

Hadoop 101 Big Data Technology

Upload: tokopedia

Post on 13-Apr-2017





TRANSCRIPT

Page 1

Hadoop 101 Big Data Technology

Page 2

What is Big Data?

Page 3

Big Data is ...

- A technology capable of handling:
  - massive and complex data (petabytes and beyond)
  - streams of data in (near) real time
  - extremely large infrastructure

Page 4

$ whoami

Firman Gautama
Software Engineer at ADSKOM Indonesia


Page 5

What is Hadoop?

Hadoop is:

- scalable.
- a “framework”.
- not a drop-in replacement for an RDBMS.
- great for pipelining massive amounts of data to achieve the end result.
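A minimal local sketch of what "pipelining" looks like in practice: the classic word-count job, written in the style of Hadoop Streaming, where a mapper emits tab-separated `key<TAB>value` lines, the framework sorts them by key, and a reducer aggregates each run of equal keys. The function names and the `run_pipeline` helper are hypothetical, and the in-memory `sorted()` call only stands in for Hadoop's distributed shuffle/sort phase.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one "word<TAB>1" record for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce step: records arrive sorted by key, so equal words are adjacent."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_pipeline(lines: Iterable[str]) -> list[str]:
    """Local stand-in for `mapper | sort | reducer` (hypothetical helper)."""
    mapped = mapper(lines)
    # Hadoop's shuffle/sort phase delivers records to reducers ordered by key.
    shuffled = sorted(mapped, key=lambda record: record.split("\t", 1)[0])
    return list(reducer(shuffled))
```

Keeping the map and reduce steps as pure functions over lines makes the logic easy to test locally before submitting it to a cluster as a Streaming job.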

Page 6

Hadoop Fun Facts

- Hadoop was created by Doug Cutting and Mike Cafarella. Cutting, who was working at Yahoo! at the time, named it after his son’s toy elephant.

- According to the Apache Hadoop website, Yahoo! has the single largest Hadoop cluster in the world (4,500 nodes).

- Yes, there is a Hadoop GPU framework available!

Page 7

Hadoop Core Components

Page 8

Hadoop Core Components (Details)

Hadoop 1.x
- HDFS (storage)
  - NameNode
  - DataNode
  - Secondary NameNode*
- MapReduce (processing)
  - JobTracker
  - TaskTrackers
  - JobHistoryServer

Hadoop 2.x
- HDFS (storage)
  - NameNode
  - DataNode
  - Secondary NameNode*
- YARN (processing)
  - ResourceManager
  - ApplicationMaster
  - NodeManager
  - JobHistoryServer
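A back-of-the-envelope sketch of how HDFS stores a file: the file is cut into fixed-size blocks, the NameNode records which blocks make up the file, and DataNodes store several replicas of each block. The 128 MB block size and replication factor of 3 assumed here are the Hadoop 2.x defaults (`dfs.blocksize`, `dfs.replication`); Hadoop 1.x defaulted to 64 MB blocks, and clusters often override both. The `hdfs_footprint` helper is hypothetical.

```python
# Assumed Hadoop 2.x defaults (dfs.blocksize, dfs.replication); real clusters often override them.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> dict[str, int]:
    """Estimate how one file is laid out in HDFS (hypothetical helper)."""
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file_size must be >= 0; block_size and replication must be > 0")
    # Ceiling division: a partial final block still counts as one block.
    blocks = (file_size + block_size - 1) // block_size
    return {
        "blocks": blocks,                        # entries the NameNode tracks
        "block_replicas": blocks * replication,  # copies spread across DataNodes
        # The final block only occupies the bytes it actually holds, so raw
        # usage is file size times replication, not blocks * block_size.
        "raw_bytes": file_size * replication,
    }
```

For example, a 1 GB file becomes 8 blocks and 24 block replicas, consuming about 3 GB of raw disk across the cluster.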

Page 9

Hadoop Compatible Components (1)

- Manipulating/Querying Data:
  - Apache Hive (SQL-like queries)
  - Cloudera Impala (SQL-like queries)
  - Apache Pig (script-based queries)
  - MapReduce (library)

- Key-Value Storage:
  - HBase
  - Cassandra
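HBase keeps rows sorted by row key, which is why a well-designed key (for example, a shared prefix such as `user#`) makes range scans cheap. The toy in-memory class below illustrates only that idea: `SortedKVStore`, its methods, and the `family:qualifier` column strings are hypothetical and are not the HBase client API, and it ignores regions, column families, versions, and persistence. As in an HBase scan, the start key is inclusive and the stop key is exclusive.

```python
from bisect import bisect_left, insort
from typing import Iterator


class SortedKVStore:
    """Toy model of a store that keeps rows sorted by row key (hypothetical)."""

    def __init__(self) -> None:
        self._keys: list[str] = []                 # row keys in lexicographic order
        self._rows: dict[str, dict[str, str]] = {} # row key -> {column: value}

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self._rows:
            insort(self._keys, row_key)  # insert while keeping keys sorted
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> dict[str, str]:
        """Point lookup; returns a copy, or an empty dict if the row is absent."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start: str, stop: str) -> Iterator[tuple[str, dict[str, str]]]:
        """Yield rows with start <= key < stop, in key order."""
        i = bisect_left(self._keys, start)  # jump straight to the first candidate
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1
```

Scanning from `user#` to `user$` returns every `user#...` row in key order, because `$` sorts immediately after `#` in ASCII.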

Page 10

Hadoop Compatible Components (2)

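- Message Queueing:
  - Kafka (publish-subscribe messaging, similar to RabbitMQ and other pub-sub systems)

- Advanced Processing:
  - Spark (up to 100x faster than MapReduce for in-memory workloads, according to the Spark project)

- Scheduler/Workflow:
  - Oozie (workflow scheduler, similar to crontab)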
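The key difference between Kafka and a traditional work queue is that Kafka stores messages in an append-only log and each consumer tracks its own read position (offset), so several subscribers can read the same data independently and consuming a message does not delete it. The sketch below models a single partition in memory; `Topic`, `Consumer`, `publish`, and `poll` are illustrative names, this is not the Kafka client API, and partitions, retention, and consumer groups are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy single-partition topic: an append-only log addressed by offset."""
    name: str
    log: list[str] = field(default_factory=list)

    def publish(self, message: str) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset of the newly appended message

    def read(self, offset: int, max_messages: int = 10) -> list[str]:
        return self.log[offset:offset + max_messages]


@dataclass
class Consumer:
    """Each consumer keeps its own offset; reading never removes messages."""
    topic: Topic
    offset: int = 0

    def poll(self, max_messages: int = 10) -> list[str]:
        batch = self.topic.read(self.offset, max_messages)
        self.offset += len(batch)  # advance only this consumer's position
        return batch
```

Because the log is shared and offsets are per consumer, a slow consumer can fall behind without affecting a fast one.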

Page 11

Hadoop Compatible Components (3)

- Data Export/Import:
  - Flume (streams text files/logs into HDFS)
  - Sqoop (transfers data from an RDBMS to HDFS, or vice versa)

...and many more. :)
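Sqoop imports a table in parallel by splitting it on a column (by default the primary key, or the column given with `--split-by`) and giving each mapper one range of values; the number of mappers defaults to 4 and is set with `-m`/`--num-mappers`. The sketch below shows a simplified version of that range-splitting idea for an integer column whose minimum and maximum are already known; `split_ranges` and `to_where_clauses` are hypothetical helpers, and Sqoop's own splitter differs in details such as boundary handling.

```python
def split_ranges(min_id: int, max_id: int, num_mappers: int = 4) -> list[tuple[int, int]]:
    """Divide [min_id, max_id] into contiguous, non-overlapping inclusive ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if max_id < min_id:
        return []  # empty table: nothing to split
    total = max_id - min_id + 1
    n = min(num_mappers, total)  # never create empty ranges
    base, extra = divmod(total, n)
    ranges = []
    lo = min_id
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def to_where_clauses(column: str, ranges: list[tuple[int, int]]) -> list[str]:
    """One WHERE condition per mapper (simplified; hypothetical helper)."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]
```

With `split_ranges(1, 100)`, four mappers would each import 25 ids: `1-25`, `26-50`, `51-75`, and `76-100`.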

Page 12

Most Popular Hadoop Distributions

Source: datanami.com

Page 13

Real Example of Using Hadoop* (1)

Page 14

Real Example of Using Hadoop* (2)

Page 15

Real Example of Using Hadoop* (3)

(near) Real Time Analytics

Page 16

Q&A Session

Join our LinkedIn Group

Big Data Indonesia: https://www.linkedin.com/grp/home?gid=6970225

Page 17

Hadoop 101

Thank You

# EOF

Unless stated otherwise, all images used in these slides belong to their respective owners.