apache hadoop for system administrators

53
Apache Hadoop for System Administrators Allen Wittenauer

Upload: allen-wittenauer

Post on 26-Jan-2015

126 views

Category:

Technology


8 download

DESCRIPTION

The talk I gave at LISA 2013!

TRANSCRIPT

Page 1: Apache Hadoop for System Administrators

Apache Hadoop for System Administrators Allen Wittenauer

Page 3: Apache Hadoop for System Administrators

Hadoop Deployed Now?

Page 4: Apache Hadoop for System Administrators

Planning Hadoop Deployment?

Page 5: Apache Hadoop for System Administrators

Needed some place to sit before lunch?

Page 6: Apache Hadoop for System Administrators

An Extremely Quick & Incomplete Intro to Hadoop

Page 7: Apache Hadoop for System Administrators
Page 8: Apache Hadoop for System Administrators

Map (“transform”)– Perl:

– Python:

items = [1,2,3,4,5]def sqr(x) : return x**2

print list(map(sqr,items))[1, 4, 9, 16, 25]

@items=(1,2,3,4,5);sub sqr {return $_**2);print join(‘,’,map(sqr,@items));1,4,9,16,25

Page 9: Apache Hadoop for System Administrators

Reduce (“compress” or “fold”)– Perl

– Python

from functools import reduceitems = [1,4,9,16,25]print reduce ((lambda x,y: x if

(x>y) else y), items)25

use List::Util qw/reduce/;@items=(1,4,9,16,25);print reduce {$a>$b ? $a:$b} @items;25

Page 10: Apache Hadoop for System Administrators

NEVER GONNA

GIVE YOU UP

Page 11: Apache Hadoop for System Administrators

Hadoop(‘common’ or ‘core’)

MapReduce HDFS

Page 12: Apache Hadoop for System Administrators

Hadoop(‘common’ or ‘core’)

MapReduce S3

Page 13: Apache Hadoop for System Administrators

Hadoop(‘common’ or ‘core’)

MapReduce Gluster

Page 14: Apache Hadoop for System Administrators

Hadoop(‘common’ or ‘core’)

HBase HDFS

Page 15: Apache Hadoop for System Administrators

NameNode

DataNode DataNodeDataNode

ext4D

ext4D

ext4D

ext4D

ext4D

ext4D

Page 16: Apache Hadoop for System Administrators

JobTracker

M

TaskTracker

M R R M

TaskTracker

M R R M

TaskTracker

M R R

Page 17: Apache Hadoop for System Administrators

JobTracker

M

TaskTracker

M R R M

TaskTracker

M R R M

TaskTracker

M R R

Page 18: Apache Hadoop for System Administrators

NameNode

JobTracker

DN

TT

D D D DM M R R

DN

TT

D D D DM M R R

DN

TT

D D D DM M R R

DN

TT

D D D DM M R R

DN

TT

D D D DM M R R

Page 19: Apache Hadoop for System Administrators

Hadoop isn’t designed for system administrators and/or support staff.

Page 20: Apache Hadoop for System Administrators

“Hadoop is not a developer problem; it’s an operations problem.”

-- Hadoop vendor ex-employee

Page 21: Apache Hadoop for System Administrators
Page 22: Apache Hadoop for System Administrators

Don’t Make Assumptions

Page 23: Apache Hadoop for System Administrators

tail’ing the logs won’t tell you the whole story.

Page 24: Apache Hadoop for System Administrators

%

Page 25: Apache Hadoop for System Administrators

Monitor the masters!

Page 26: Apache Hadoop for System Administrators
Page 27: Apache Hadoop for System Administrators
Page 28: Apache Hadoop for System Administrators
Page 29: Apache Hadoop for System Administrators
Page 30: Apache Hadoop for System Administrators

LinkedIn’s Configuration– 30+ Health Checks per Grid Masters, canary report, daily fsck, etc

– 10+ Health Checks per DC LDAP, Kerberos, etc ...

– Cross-DC Nagios Server Checks

Warn: 5% down nodes Panic: 30% down HDFS: 20% Free Space Gateway home dir: 10% free

space ...

ComputeNodes

Nagios

NN JT AZ GWZK VD

Page 31: Apache Hadoop for System Administrators

Health Check Script– “OK” - good status– “ERROR (message)” - bad status

mapred.healthChecker.script.path

Consider checking ...– critical software– ownership & permissions– network connection speed– drive count

– file system space– RO file systems– IO errors– missing memory

Page 32: Apache Hadoop for System Administrators
Page 33: Apache Hadoop for System Administrators

Use the tools most of your user’s code is written in!

Pig– testfile:

– Code:

– Output:

A = load 'testfile' using PigStorage(',') as (i: int);

B = foreach C generate i;C = distinct B;dump C;

100

(100)

Page 34: Apache Hadoop for System Administrators

Reactive

Page 35: Apache Hadoop for System Administrators

Proactive

Page 36: Apache Hadoop for System Administrators

Resource Controls

Page 37: Apache Hadoop for System Administrators

JobTracker Memory Resource Controls– Limit jobs stored in JT heap: mapred.jobtracker.completeuserjobs.maximum

– Limit total # of job tasks: mapred.jobtracker.maxtasks.per.job

Job Memory Resource Controls– Scheduler-level: mapred.cluster.*.memory.mb– TT-level: auto-calculated based upon MR slot counts & scheduler level settings– MR Job-level: mapred.job.*.memory.mb– Linux only: /proc memory calculator and task killer

Page 38: Apache Hadoop for System Administrators

“I set the heap to 1G but my process ran out of memory?”

Page 39: Apache Hadoop for System Administrators

Treat HDFS like any other multi-tenant FS

Page 40: Apache Hadoop for System Administrators

Quota everything– Yes, including /tmp– No “show me all quotas” functionality

Be consistent:– /user/* all get same quota

Be flexible:– Make another dir for user’s to store big projects (e.g., /project)

Be smart:– Have a policy that content in /tmp gets deleted after X days. Automate this!– Build reporting that shows files that are replicated less than 3 times

dfsadmin -setQuotadfsadmin -setSpaceQuota

Page 41: Apache Hadoop for System Administrators

Compute Node Disk Partitioning as Protective Measure

Page 42: Apache Hadoop for System Administrators

root partitioning

non-root partinioning

20 GB /, ... 200 GB task space (rest) HDFS

5 GB swap 200 GB task space (rest) HDFS

Page 43: Apache Hadoop for System Administrators

Security!

Page 44: Apache Hadoop for System Administrators

Queue Level ACLs– users– groups– netgroups

Service Level ACLs– hosts– users– groups– netgroups

– Limitation: Web services are all or nothing! :(– Be aware: Hadoop uses ephemeral ports all over the place! :(

hadoop-policy.xml

mapred-queue-acls.xml

Page 45: Apache Hadoop for System Administrators

Kerberos!

Page 46: Apache Hadoop for System Administrators
Page 47: Apache Hadoop for System Administrators

Corp ITActive Directory

@CORP

Client Node

Grid Realm@GRID

krbtgt/GRID@CORP

Hadoop Services

krbtgt/user@CORPkrbtgt/GRID@CORP

krbtgt/host@GRIDkrbtgt/service@GRID

Password

Page 48: Apache Hadoop for System Administrators
Page 49: Apache Hadoop for System Administrators
Page 50: Apache Hadoop for System Administrators
Page 52: Apache Hadoop for System Administrators

Fonzi: http://www.flickr.com/photos/elzey/7224689810 Captain Obvious by artist Stuart McGhee. http://stuartmcghee.com/ Ant on flower: http://www.flickr.com/photos/bolonski/6116358907 Ant Colony: http://www.flickr.com/photos/klearchos/2821230516 Ant Queen: http://commons.wikimedia.org/wiki/

File:Camponotus_crispulus_queen_ant.jpg Canary: http://www.flickr.com/photos/nathan_and_jenny/2454127424 Mona Lisa: Leonardo Da Vinci White Elephant: http://data.linkedin.com/opensource/white-elephant Ecce Homo:

– Elías García Martínez (original) – Cecilia Giménez (restored)

Page 53: Apache Hadoop for System Administrators

Contact: Twitter: @_a__w_ Email: [email protected]

More info: Quora: www.quora.com/user/allenwittenauer SlideShare: www.slideshare.net/allenwittenauer

Thanks!