understanding hadoop clusters and the network

25
Understanding Hadoop Clusters and the Network Part 1. Introduction and Overview BRAD HEDLUND .com Brad Hedlund http://bradhedlund.com http://www.linkedin.com/in/bra dhedlund @bradhedlund

Upload: bradhedlund

Post on 24-Dec-2014

1.199 views

Category:

Technology


2 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Understanding Hadoop Clusters and the Network

Understanding Hadoop Clusters and the Network

Part 1. Introduction and Overview

BRAD HEDLUND .com

Brad Hedlundhttp://bradhedlund.comhttp://www.linkedin.com/in/bradhedlund@bradhedlund

Page 2: Understanding Hadoop Clusters and the Network

Hadoop Server Roles

Data Node &Task Tracker

Data Node &Task Tracker

Data Node &Task Tracker

Data Node &Task Tracker

Data Node &Task Tracker

Data Node &Task Tracker

slaves

masters

Clients

Name NodeJob TrackerSecondary

Name Node

Map Reduce HDFSDistributed Data Analytics Distributed Data Storage

BRAD HEDLUND .com

Page 3: Understanding Hadoop Clusters and the Network

Hadoop Cluster

Rack 1

DN + TT

DN + TT

DN + TT

DN + TT

Name Node

Rack 2

DN + TT

DN + TT

DN + TT

DN + TT

Job Tracker

Rack 3

DN + TT

DN + TT

DN + TT

DN + TT

Secondary NN

Rack 4

DN + TT

DN + TT

DN + TT

DN + TT

Client

Rack N

DN + TT

DN + TT

DN + TT

DN + TT

DN + TT

DN + TT

World

switch switch switch switch switch

switch switch

BRAD HEDLUND .com

Page 4: Understanding Hadoop Clusters and the Network

Typical Workflow

• Load data into the cluster (HDFS writes)• Analyze the data (Map Reduce)• Store results in the cluster (HDFS writes)• Read the results from the cluster (HDFS reads)

How many times did our customers type the word “Fraud” into emails sent to customer service?

Sample Scenario:

File.txtHuge file containing all emails sent to customer service

BRAD HEDLUND .com

Page 5: Understanding Hadoop Clusters and the Network

Writing files to HDFS

• Client consults Name Node• Client writes block directly to one Data Node• Data Nodes replicates block• Cycle repeats for next block

Name Node

Data Node 1 Data Node 5 Data Node 6 Data Node N

Client

I want to write Blocks A,B,C of

File.txt

OK. Write to Data Nodes

1,5,6

Blk A Blk B Blk C

File.txt

Blk A Blk B Blk C

BRAD HEDLUND .com

Page 6: Understanding Hadoop Clusters and the Network

Hadoop Rack Awareness – Why?Name Node

File.txt=Blk A:DN1, DN5, DN6

Blk B:DN7, DN1, DN2

Blk C:DN5, DN8,DN9

metadata

Rack 1

Data Node 1

Data Node 2

Data Node 3

Data Node 5

switch

A

Rack 5

Data Node 5

Data Node 6

Data Node 7

Data Node 8

switch

Rack 9

Data Node 9

Data Node 10

Data Node 11

Data Node 12

switch

• Never loose all data if entire rack fails• Keep bulky flows in-rack when possible• Assumption that in-rack is higher bandwidth,

lower latency

A

A

BB

B

C C

C

switch

Rack 1:Data Node 1Data Node 2Data Node 3

Rack 5:Data Node 5Data Node 6Data Node 7

Rack aware

BRAD HEDLUND .com

Page 7: Understanding Hadoop Clusters and the Network

switch

switchswitch

Preparing HDFS writes

Name Node

Data Node 1 Data Node 5

Data Node 6

Client

I want to write File.txtBlock A

OK. Write to Data Nodes

1,5,6

Blk A Blk B Blk C

File.txt

Ready Data Nodes

5,6

Read

y!

Ready?

Rack 1 Rack 5

Rack 1:Data Node 1

Rack 5:Data Node 5Data Node 6

ReadyData Node 6

Rack aware

Ready! • Name Node picks two nodes in the same rack, one node in a different rack

• Data protection• Locality for M/R

Ready!

BRAD HEDLUND .com

Page 8: Understanding Hadoop Clusters and the Network

switch

switchswitch

Preparing HDFS writes

Name Node

Data Node 1 Data Node 5

Data Node 6

Client

I want to write File.txtBlock A

OK. Write to Data Nodes

1,5,6

Blk A Blk B Blk C

File.txt

Ready Data Nodes

5,6

Read

y!

Ready?

Rack 1 Rack 5

Rack 1:Data Node 1

Rack 5:Data Node 5Data Node 6

ReadyData Node 6

Rack aware

Ready! • Name Node picks two nodes in the same rack, one node in a different rack

• Data protection• Locality for M/R

Ready!

BRAD HEDLUND .com

Page 9: Understanding Hadoop Clusters and the Network

switch

switchswitch

Pipelined Write

Name Node

Data Node 1 Data Node 5

Data Node 6

ClientBlk A Blk B Blk C

File.txt

Rack 1 Rack 5

Rack 1:Data Node 1

Rack 5:Data Node 5Data Node 6

A A

A

• Data Nodes 1 & 2 pass data along as its received

• TCP 50010

Rack aware

BRAD HEDLUND .com

Page 10: Understanding Hadoop Clusters and the Network

switch

switchswitch

Pipelined Write

Name Node

Data Node 1 Data Node 5

Data Node 6

ClientBlk A Blk B Blk C

File.txt

Rack 1 Rack 5

Rack 1:Data Node 1

Rack 5:Data Node 5Data Node 6

A A

A

• Data Nodes 1 & 2 pass data along as its received

• TCP 50010

Rack aware

BRAD HEDLUND .com

Page 11: Understanding Hadoop Clusters and the Network

switch

switchswitch

Pipelined Write

Name Node

Data Node 1 Data Node 2

Data Node 3

ClientBlk A Blk B Blk C

File.txt

Rack 1 Rack 5

Rack 1:Data Node 1

Rack 5:Data Node 2Data Node 3

A A

A

Block received

Success

Success

File.txtBlk A:DN1, DN2, DN3

BRAD HEDLUND .com

Page 12: Understanding Hadoop Clusters and the Network

Multi-block Replication Pipeline

Data Node 1 Data Node X

Data Node 3

Rack 1

switch

switch

Client

switch

Blk A Blk A

Blk A

Rack 4

Data Node 2

Rack 5

switch

Data Node Y

Data Node Z

Blk B

Blk B

Blk B

Data Node WBlk C

Blk C

Blk C

Blk A Blk B Blk C

File.txt 1TB File =3TB storage3TB network traffic

BRAD HEDLUND .com

Page 13: Understanding Hadoop Clusters and the Network

Client writes Span the HDFS ClusterClient

Rack N

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

File.txt

Rack 4

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 3

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 2

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 1

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

• Block size• File Size

Factors:

More blocks = Wider spread

BRAD HEDLUND .com

Page 14: Understanding Hadoop Clusters and the Network

Data Node writes span itself, and other racks

Rack N

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 4

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 3

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 2

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

C

C

B

B

A

A

Rack 1

Data Node

Data Node

Data Node

Data Node

Data Node

switch

CBA

Blk A Blk B Blk C

Results.txt

BRAD HEDLUND .com

Page 15: Understanding Hadoop Clusters and the Network

Name Node

• Data Node sends Heartbeats• Every 10th heartbeat is a Block report• Name Node builds metadata from Block reports• TCP – every 3 seconds• If Name Node is down, HDFS is down

Name Node

Data Node 1 Data Node 2 Data Node 3 Data Node N

A AA CC

DN1: A,CDN2: A,CDN3: A,C

metadata

File.txt = A,C

C

File systemAwesome!

Thanks.

I’m alive!

I have blocks:

A, C

BRAD HEDLUND .com

Page 16: Understanding Hadoop Clusters and the Network

Re-replicating missing replicas

Name Node

Data Node 1 Data Node 2 Data Node 3 Data Node 8

A AA CC

DN1: A,CDN2: B,CDN3: A, B

metadataRack1: DN1, DN2Rack5: DN3, Rack9: DN8

C

Rack AwarenessUh Oh!Missing replicas

Copy blocks A,C to Node 8

A C

• Missing Heartbeats signify lost Nodes• Name Node consults metadata, finds affected data• Name Node consults Rack Awareness script• Name Node tells a Data Node to re-replicate

BRAD HEDLUND .com

Page 17: Understanding Hadoop Clusters and the Network

Secondary Name Node

• Not a hot standby for the Name Node• Connects to Name Node every hour*• Housekeeping, backup of Name Node metadata• Saved metadata can rebuild a failed Name Node

Name Node

metadata

File.txt = A,C

File system

Secondary Name Node

Its been an hour, give me your

metadata

BRAD HEDLUND .com

Page 18: Understanding Hadoop Clusters and the Network

Client reading files from HDFS

Name NodeClient

Tell me the block locations of Results.txt

Blk A = 1,5,6Blk B = 8,1,2Blk C = 5,8,9

Results.txt=Blk A:DN1, DN5, DN6

Blk B:DN7, DN1, DN2

Blk C:DN5, DN8,DN9

metadata

Rack 1

Data Node 1

Data Node 2

Data Node

Data Node

switch

A

Rack 5

Data Node 5

Data Node 6

Data Node

Data Node

switch

Rack 9

Data Node 8

Data Node 9

Data Node

Data Node

switch

• Client receives Data Node list for each block• Client picks first Data Node for each block• Client reads blocks sequentially

A

A

BB

B

C C

C

BRAD HEDLUND .com

Page 19: Understanding Hadoop Clusters and the Network

Data Node reading files from HDFS

Name Node

Block A = 1,5,6

File.txt=Blk A:DN1, DN5, DN6

Blk B:DN7, DN1, DN2

Blk C:DN5, DN8,DN9

metadata

Rack 1

Data Node 1

Data Node 2

Data Node 3

Data Node

switch

A

Rack 5

Data Node 5

Data Node 6

Data Node

Data Node

switch

Rack 9

Data Node 8

Data Node 9

Data Node

Data Node

switch

• Name Node provides rack local Nodes first• Leverage in-rack bandwidth, single hop

A

A

BB

B

C C

C

Tell me the locations of Block A of

File.txt switch

Rack 1:Data Node 1Data Node 2Data Node 3

Rack 5:Data Node 5

Rack aware

BRAD HEDLUND .com

Page 20: Understanding Hadoop Clusters and the Network

Data Processing: Map

• Map: “Run this computation on your local data”• Job Tracker delivers Java code to Nodes with local data

Map Task Map Task Map Task

A B C

Client

How many times does

“Fraud” appear in File.txt?

Count “Fraud”

in Block C

File.txt

Fraud = 3 Fraud = 0 Fraud = 11

Job TrackerName Node

Data Node 1 Data Node 5 Data Node 9

BRAD HEDLUND .com

Page 21: Understanding Hadoop Clusters and the Network

switchswitchswitch

What if data isn’t local?

• Job Tracker tries to select Node in same rack as data• Name Node rack awareness

“I need block A”Map Task Map Task

B C

Client

How many times does

“Fraud” appear in File.txt?

Count “Fraud”

in Block C

Fraud = 0 Fraud = 11

Job TrackerName Node

Data Node 1Data Node 5 Data Node 9

“no Map tasks left”A

Data Node 2

Rack 1 Rack 5 Rack 9

BRAD HEDLUND .com

Page 22: Understanding Hadoop Clusters and the Network

Data Processing: Reduce

• Reduce: “Run this computation across Map results”• Map Tasks deliver output data over the network • Reduce Task data output written to and read from HDFS

Fraud = 3Fraud = 0

Fraud = 11

Job Tracker

Reduce Task

Sum “Fraud”

Results.txtFraud = 14

Map Task Map Task Map Task

A B C

Client

HDFS

X Y Z

Data Node 1 Data Node 5 Data Node 9

Data Node 3

BRAD HEDLUND .com

Page 23: Understanding Hadoop Clusters and the Network

Unbalanced Cluster

New Rack

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 2

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 1

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch NEW

switch

New Rack

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch NEW

File.txt

• Hadoop prefers local processing if possible• New servers underutilized for Map Reduce, HDFS*• Might see more network bandwidth, slower job times**

**I was assigned a Map Task but don’t have the block. Guess I need to get it.

*I’m bored!

BRAD HEDLUND .com

Page 24: Understanding Hadoop Clusters and the Network

Cluster Balancing

New Rack

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 2

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch

Rack 1

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch NEW

switch

New Rack

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

switch NEW

File.txt

• Balancer utility (if used) runs in the background• Does not interfere with Map Reduce or HDFS• Default speed limit 1 MB/s

brad@cloudera-1:~$hadoop balancer

BRAD HEDLUND .com

Page 25: Understanding Hadoop Clusters and the Network

BRAD HEDLUND .com

Thanks!

Narrated at:http://bradhedlund.com/?p=3108