hadoop operations-2014-strata-new-york-v5

Hadoop Operations – Best Practices from the Field
October 17, 2014
Chris Nauroth (twitter: @cnauroth)
Suresh Srinivas (twitter: @suresh_m_s)

DESCRIPTION

You’ve successfully deployed Hadoop, but are you taking advantage of all of Hadoop’s features to operate a stable and effective cluster? In the first part of the talk, we will cover issues that have been seen over the last two years on hundreds of production clusters with detailed breakdown covering the number of occurrences, severity, and root cause. We will cover best practices and many new tools and features in Hadoop added over the last year to help system administrators monitor, diagnose and address such incidents. The second part of our talk discusses new features for making daily operations easier. This includes features such as ACLs for simplified permission control, snapshots for data protection and more. We will also cover tuning configuration and features that improve cluster utilization, such as short-circuit reads and datanode caching.

TRANSCRIPT

Page 1: Hadoop operations-2014-strata-new-york-v5

Hadoop Operations – Best Practices from the Field

October 17, 2014

Chris Nauroth
twitter: @cnauroth

Suresh Srinivas
twitter: @suresh_m_s

Page 2: Hadoop operations-2014-strata-new-york-v5

© Hortonworks Inc. 2011

About Us

Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop committer and PMC member
– Major contributor to HDFS ACLs, Windows compatibility, and operability improvements
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters

Suresh Srinivas
• Architect & Founder at Hortonworks
– Long-time Apache Hadoop committer and PMC member
– Designed and developed many key Hadoop features
• Experience from supporting many clusters
– Including some of the world’s largest Hadoop clusters

Page 2Architecting the Future of Big Data

Page 3: Hadoop operations-2014-strata-new-york-v5

Agenda

• Analysis of Hadoop Support Cases
– Support case trends
– Configuration
– Documentation
– Software Improvements
• Key Learnings and Best Practices
– HDFS ACLs
– HDFS Snapshots
– YARN Application Timeline Server

Page 4: Hadoop operations-2014-strata-new-york-v5

Support Cases: Setting the Context

• Hortonworks Support
– Multiple tiers of support contacts
– Support engineers trained and knowledgeable across the entire Hadoop ecosystem
– Cases may escalate to subject matter experts for depth in one particular area
– Challenging cases may escalate to Apache committers at Hortonworks if additional expertise is required
• Apache Community Support
– Apache Hadoop user mailing list for user questions and support
– https://issues.apache.org/jira for reporting confirmed bugs
– Apache Hadoop users, contributors, committers and PMC members all participate actively in these forums to help resolve issues

Page 5: Hadoop operations-2014-strata-new-york-v5

Support Case Analysis Methodology

• Inspected over 2 years of support case history across hundreds of customers
• Broad inclusion of 29 Hadoop ecosystem and related projects
• Multiple versions of Hadoop in deployments
– 2 major versions: Hadoop 1.x and 2.x
– ~3 minor versions within each major version
– ~3 patch releases per minor version
– ~15 total releases and updates
• Distinct deployment environments
– Cluster sizes ranging from 10s to 1000s of nodes
– Different management environments and operational practices
– Various deployment techniques: Ambari, Chef, RPMs, etc.

Page 6: Hadoop operations-2014-strata-new-york-v5

Support Case Trends – Cases per Month

[Chart: support cases per month for months 1 to 28; y-axis from 0 to 140 cases; series: HDFS, Map Reduce, YARN]

Page 7: Hadoop operations-2014-strata-new-york-v5

Support Case Trends – Cases per Month

• What is the spike in May 2014?
– More users
  – More total users means more total support cases
– More features
  – Many upgrades of existing clusters from Hadoop 1 to Hadoop 2
  – Many conversions to HA deployments
  – Many conversions to secured deployments
– More integration
  – Many sites running separate Hadoop 1 and Hadoop 2 clusters simultaneously
  – Questions around migrating data between clusters at 2 different versions (DistCp)

Page 8: Hadoop operations-2014-strata-new-york-v5

Support Case Trends – Proportional Cases per Month

[Chart: proportion of support cases per month for months 1 to 31; y-axis from 0 to 1; series: HDFS, Map Reduce, YARN, Other (26 components)]

Page 9: Hadoop operations-2014-strata-new-york-v5

Support Case Trends – Root Cause

[Chart: support cases by root cause; y-axis from 0 to 450 cases; categories: Customer Environment (Non HDP), Documentation Defect, Documentation Gap, Documentation Not Utilized, Education - Configuration, Needs Training, Product Defect; series: YARN, Map Reduce, HDFS]

Page 10: Hadoop operations-2014-strata-new-york-v5

Support Case Trends

• Highlights
– Core Hadoop components (HDFS, YARN and MapReduce) are used across all deployments, and therefore receive proportionally more support cases than other ecosystem components.
– Misconfiguration is the dominant root cause.
– Documentation is a close second.
– We are constantly improving the code to eliminate operational issues, help with diagnosis and provide increased visibility.

Page 11: Hadoop operations-2014-strata-new-york-v5

Configuration

Page 12: Hadoop operations-2014-strata-new-york-v5

Hardware and Cluster Sizing

• Considerations
– Larger clusters heal faster after a node or disk failure
– Machines with huge storage take longer to recover
– More racks give more failure domains
• Recommendations
– Get good-quality commodity hardware
– Buy the sweet spot in pricing: 3TB disk, 96GB, 8-12 cores
  – More memory is better – real time is memory hungry!
– Before considering fatter machines (1U 6 disks vs. 2U 12 disks)
  – Get to 30-40 machines or 3-4 racks
– Use a pilot cluster to learn about load patterns
  – Balanced hardware for I/O, compute or memory bound
  – More details - http://tinyurl.com/hwx-hadoop-hw
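The first two considerations can be illustrated with a back-of-envelope estimate. The sketch below is not from the talk: it assumes the failed node was full, that every surviving node contributes the same re-replication bandwidth, and that this bandwidth is the only bottleneck. The function name and the 50 MB/s figure are hypothetical.

```python
def rereplication_hours(node_storage_tb, cluster_nodes, per_node_mb_per_s):
    """Rough hours needed to re-create the replicas held by one failed node."""
    if cluster_nodes < 2:
        raise ValueError("need at least one surviving node")
    surviving_nodes = cluster_nodes - 1
    data_mb = node_storage_tb * 1_000_000  # decimal TB -> MB
    aggregate_mb_per_s = surviving_nodes * per_node_mb_per_s
    return data_mb / aggregate_mb_per_s / 3600.0


# Hypothetical inputs: 12 x 3TB disks per node, 50 MB/s of spare bandwidth per node.
small_cluster = rereplication_hours(36, 10, 50)   # 10-node cluster
large_cluster = rereplication_hours(36, 40, 50)   # 40-node cluster
dense_nodes = rereplication_hours(72, 40, 50)     # same cluster, twice the storage per node
```

Under these assumptions the larger cluster recovers sooner because more nodes share the copy work, and doubling the storage per node doubles the recovery time.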

Page 13: Hadoop operations-2014-strata-new-york-v5

Configuration

• Avoid JVM issues
– Use 64 bit JVM for all daemons
  – Compressed OOPs enabled by default (Java 6u23 and later)
– Java heap size
  – Set same max and starting heap size, Xmx == Xms
  – Avoid Java defaults – configure NewSize and MaxNewSize
    – Use 1/8 to 1/6 of max size for JVMs larger than 4G
  – Configure -XX:PermSize=128m, -XX:MaxPermSize=256m
– Use low-latency GC collector
  – -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N>
  – High <N> on Namenode and JobTracker or ResourceManager
– Important JVM configs to help debugging
  – -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails
  – -XX:ErrorFile=<file>
  – -XX:+HeapDumpOnOutOfMemoryError
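The flags above can be assembled mechanically. The following is a minimal sketch, not a Hortonworks tool: the function name, its parameters and the example log path are hypothetical, and it picks 1/8 of the heap for the new generation, the low end of the range on the slide.

```python
def jvm_opts(heap_gb, gc_threads, gc_log, new_fraction=8):
    """Build a JVM option string following the slide's recommendations."""
    heap_mb = heap_gb * 1024
    # Same starting and maximum heap size (Xms == Xmx).
    opts = ["-Xms%dm" % heap_mb, "-Xmx%dm" % heap_mb]
    if heap_gb > 4:
        # Explicit new generation size for heaps larger than 4G.
        new_mb = heap_mb // new_fraction
        opts += ["-XX:NewSize=%dm" % new_mb, "-XX:MaxNewSize=%dm" % new_mb]
    opts += [
        # PermGen flags apply to Java 7 and earlier; PermGen was removed in Java 8.
        "-XX:PermSize=128m", "-XX:MaxPermSize=256m",
        "-XX:+UseConcMarkSweepGC", "-XX:ParallelGCThreads=%d" % gc_threads,
        "-verbose:gc", "-Xloggc:%s" % gc_log, "-XX:+PrintGCDetails",
        "-XX:+HeapDumpOnOutOfMemoryError",
    ]
    return " ".join(opts)


# Hypothetical NameNode with a 16G heap and 8 parallel GC threads.
namenode_opts = jvm_opts(16, 8, "/var/log/hadoop/namenode-gc.log")
small_daemon_opts = jvm_opts(2, 2, "/var/log/hadoop/daemon-gc.log")
```

The resulting string would typically be placed in the daemon's options variable in hadoop-env.sh; which variable applies depends on the daemon and the Hadoop version.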

Page 14: Hadoop operations-2014-strata-new-york-v5

Configuration

• Multiple redundant dirs for namenode metadata
– One of dfs.namenode.name.dir should be on NFS
– NFS softmount - tcp,soft,intr,timeo=20,retrans=5
• Configure open fd ulimit
– Default 1024 is too low
– 16K for datanodes, 64K for Master nodes
• Use version control for configuration!
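A quick way to check the open file descriptor limit against these numbers is sketched below. It is only an illustration: the role names and the function are hypothetical, "16K" and "64K" are read as 16 x 1024 and 64 x 1024, and the check reflects the limit of the process running the script, which may differ from the limit the Hadoop daemons were started with.

```python
import resource

# Thresholds from the slide, interpreted as multiples of 1024 (an assumption).
RECOMMENDED_NOFILE = {"datanode": 16 * 1024, "master": 64 * 1024}


def fd_limit_ok(role, soft_limit=None):
    """Return True if the soft open-file limit meets the recommendation for role."""
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= RECOMMENDED_NOFILE[role]
```

Calling fd_limit_ok("datanode") with no second argument inspects the current process; passing a number makes the check usable against a value collected from another host.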

Page 15: Hadoop operations-2014-strata-new-york-v5

Configuration

• Use disk fail in place for datanodes: dfs.datanode.failed.volumes.tolerated
– Disk failure is no longer datanode failure
– Especially important for large density nodes
• Set dfs.namenode.name.dir.restore to true
– Restores NN storage directory during checkpointing
• Take periodic backups of namenode metadata
– Make copies of the entire storage directory
• Set aside a lot of disk space for NN logs
– It is verbose – set aside multiple GBs
– Many installs configure this too small
  – NN logs roll within minutes – hard to debug issues
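The two properties named above live in hdfs-site.xml. As a rough sketch, the fragment can be generated as follows; the helper name is hypothetical and the value 2 for tolerated failed volumes is only an example, since the right number depends on how many disks each datanode has.

```python
import xml.etree.ElementTree as ET


def hdfs_site_fragment(settings):
    """Render a dict of property names and values as a Hadoop configuration document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


fragment = hdfs_site_fragment({
    "dfs.datanode.failed.volumes.tolerated": 2,  # assumed example value
    "dfs.namenode.name.dir.restore": "true",
})
```

Generating the file from a script also makes it easy to keep the settings under version control, as recommended on the previous slide.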

Page 16: Hadoop operations-2014-strata-new-york-v5

Monitor Usage

• Cluster storage, nodes, files and blocks grow
– Update NN heap, handler count, number of DN xceivers
– Tweak other related config periodically
• Monitor the hardware usage for your workload
– Disk I/O, network I/O, CPU and memory usage
– Use this information when expanding cluster capacity
• Monitor the usage with Hadoop metrics
– JVM metrics – GC times, Memory used, Thread Status
– RPC metrics – especially latency to track slowdowns
– HDFS metrics
  – Used storage, # of files and blocks, total load on the cluster
  – File System operations
– MapReduce Metrics
  – Slot utilization and Job status
• Tweak configurations during upgrades/maintenance on an ongoing basis
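One way to collect the HDFS numbers listed above is to read the JSON that the NameNode web server publishes under /jmx. The sketch below only parses such a payload; the bean name and attribute names are assumptions about that output and should be checked against your Hadoop version, and the sample values are made up.

```python
import json

# Assumed bean name for NameNode file system metrics.
FS_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def storage_summary(jmx_text):
    """Extract file, block and capacity figures from a JMX JSON document."""
    beans = json.loads(jmx_text)["beans"]
    fs = next(b for b in beans if b.get("name") == FS_BEAN)
    return {
        "files": fs["FilesTotal"],
        "blocks": fs["BlocksTotal"],
        "used_fraction": fs["CapacityUsed"] / fs["CapacityTotal"],
    }


# Made-up payload standing in for a real /jmx response.
sample = json.dumps({"beans": [{
    "name": FS_BEAN,
    "FilesTotal": 1200,
    "BlocksTotal": 1500,
    "CapacityUsed": 250,
    "CapacityTotal": 1000,
}]})
summary = storage_summary(sample)
```

Recording these values over time gives the growth trend needed to decide when to revisit NameNode heap and handler counts.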

Page 17: Hadoop operations-2014-strata-new-york-v5

Documentation

Page 18: Hadoop operations-2014-strata-new-york-v5

Documentation

• Continual Investment in Documentation
– Hortonworks Data Platform Documentation
  – http://docs.hortonworks.com/
– Apache Hadoop Documentation
  – http://hadoop.apache.org/docs/current/
• Apache Hadoop Documentation
– We welcome your requests in Apache jira for documentation improvements.
  – Create issues with the “documentation” label.
  – Getting the end user perspective is extremely valuable.
– We would be grateful to receive documentation patches.
  – It’s a great way to get started in the Apache Hadoop open source process.
  – Search for unresolved issues with the “documentation” label.
  – https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20documentation

Page 19: Hadoop operations-2014-strata-new-york-v5

Software Improvements
Real Incidents and Software Improvements to Address Them

Page 20: Hadoop operations-2014-strata-new-york-v5

Don’t edit the metadata files!

• Editing can corrupt the cluster state
– Might result in loss of data
• Real incident
– NN misconfigured to point to another NN’s metadata
– DNs can’t register due to namespace ID mismatch
  – System detected the problem correctly
  – Safety net ignored by the admin!
– Admin edits the namenode VERSION file to match ids

What happens next?

Page 21: Hadoop operations-2014-strata-new-york-v5

Improvement

• Pause deletion of blocks when the namenode starts up
– https://issues.apache.org/jira/browse/HDFS-6186
– Supports configurable delay of block deletions after NameNode startup
– Gives an admin extra time to diagnose before deletions begin
• Show when block deletion will start after NameNode startup in WebUI
– https://issues.apache.org/jira/browse/HDFS-6385
– The web UI already displays the number of pending block deletions
– This will enhance the display to indicate when actual deletion will begin

Page 22: Hadoop operations-2014-strata-new-york-v5

Guard Against Accidental Deletion

• rm -r deletes the data at the speed of Hadoop!
– ctrl-c of the command does not stop deletion!
– Undeleting files on datanodes is hard & time consuming
  – Immediately shut down the NN, unmount disks on datanodes
  – Recover deleted files
  – Start namenode without the delete operation in edits
• Enable Trash
• Real Incident
– Customer is running a distro of Hadoop with trash not enabled
– Deletes a large dir (100 TB) and shuts down NN immediately
– Support person asks for the NN to be restarted to see if trash is enabled!

What happens next?

• Now HDFS has Snapshots!

Page 23: Hadoop operations-2014-strata-new-york-v5

Improvement

• HDFS Snapshots
– https://issues.apache.org/jira/browse/HDFS-2802
– A snapshot is a read-only point-in-time image of part of the file system
– A snapshot created before a deletion can be used to restore deleted data
– More coverage of snapshots later in the presentation
• HDFS ACLs
– https://issues.apache.org/jira/browse/HDFS-4685
– Finer-grained control of file permissions can help prevent an accidental deletion
– More coverage of ACLs later in the presentation

Page 24: Hadoop operations-2014-strata-new-york-v5

Unexpected error during HA HDFS upgrade

• Background: HDFS HA Architecture
– http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
• Real Incident
– During upgrade, NameNode calls every JournalNode to request backup of metadata directory, which renames “current” directory to “previous.tmp”.
– Permissions incorrect on metadata directory for 1 out of 3 JournalNodes.
– The hdfs user is not authorized to rename. Backup fails for that JournalNode, so upgrade process aborts with error.

What happens next?

Page 25: Hadoop operations-2014-strata-new-york-v5

Improvement

• Improve diagnostics on storage directory rename operations by using native code.
– https://issues.apache.org/jira/browse/HDFS-7118
– Logs additional root cause information for rename failure. For example, EACCES
• Split error checks into separate conditions to improve diagnostics.
– https://issues.apache.org/jira/browse/HDFS-7119
– Splits a log message about failure to delete or rename into separate log messages to clarify which specific action failed
• When aborting NameNode or JournalNode, write the contents of the metadata directories and permissions to logs.
– https://issues.apache.org/jira/browse/HDFS-7120
– Usually the first information asked of the user, so we can automate this
• For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed.
– https://issues.apache.org/jira/browse/HDFS-7121
– Prevents need for manual cleanup on 2 out of 3 JournalNodes where backup succeeded

Page 26: Hadoop operations-2014-strata-new-york-v5

Support Case Trends

• Highlights Revisited
– Core Hadoop components (HDFS, YARN and MapReduce) are used across almost all deployments, and therefore receive proportionally more support cases than other ecosystem components.
  – Action: Focus efforts on core Hadoop first to improve operability of the platform.
– Misconfiguration is the dominant root cause.
  – Action: Publish configuration best practices and advise on the need for ongoing review of configuration as cluster usage patterns change over time.
– Documentation is a close second.
  – Action: Contribute frequently to product documentation, both in open source Apache Hadoop and in the distro. End user documentation is a gating factor for launching new features. We welcome your requests in Apache jira for documentation improvements, and we welcome your patches!
– Code changes often can be implemented to eliminate an operational issue, help with diagnosis or provide increased visibility.
  – Action: After resolution of each support case, consider potential product improvements. For example, can logging be improved? Small code changes can have a big impact.

Page 27: Hadoop operations-2014-strata-new-york-v5

Key Learnings and Best Practices
Features that Help Improve Production Operations

Page 28: Hadoop operations-2014-strata-new-york-v5

HDFS ACLs

• Existing HDFS POSIX permissions good, but not flexible enough
– Permission requirements may differ from the natural organizational hierarchy of users and groups.
• HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX ACL model.
– An ACL (Access Control List) provides a way to set different permissions for specific named users or named groups, not only the file’s owner and file’s group.

Page 29: Hadoop operations-2014-strata-new-york-v5

HDFS File Permissions Example

• Authorization requirements:
– In a sales department, they would like a single user Maya (Department Manager) to control all modifications to sales data
– Other members of sales department need to view the data, but can’t modify it.
– Everyone else in the company must not be allowed to view the data.
• Can be implemented via the following:

[Diagram: file with sales data; User: read/write permission for user maya; Group: read permission for group sales]
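The three requirements map onto one owner, one group and a mode. The sketch below, with a hypothetical helper name, converts the permission triplets into the octal mode and builds the corresponding commands as strings; the path /sales-data is taken from the later slides and the commands are not executed here.

```python
def mode_from_triplets(user, group, other):
    """Convert three rwx-style triplets into an octal mode string."""
    def bits(triplet):
        value = 0
        for ch, weight in zip(triplet, (4, 2, 1)):
            if ch != "-":
                value += weight
        return value
    return "%d%d%d" % (bits(user), bits(group), bits(other))


# maya may read and write, group sales may read, everyone else gets nothing.
mode = mode_from_triplets("rw-", "r--", "---")
commands = [
    "hdfs dfs -chown maya:sales /sales-data",
    "hdfs dfs -chmod %s /sales-data" % mode,
]
```

This covers the original requirements with traditional permissions alone, which is why the next slide needs ACLs only once a second writer and a second reader group appear.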

Page 30: Hadoop operations-2014-strata-new-york-v5

HDFS ACLs

• Problem
– No longer feasible for Maya to control all modifications to the file
  – New Requirement: Maya, Diane and Clark are allowed to make modifications
  – New Requirement: New group called executives should be able to read the sales data
– Current permissions model only allows permissions at 1 group and 1 user
• Solution: HDFS ACLs
– Now assign different permissions to different users and groups

[Diagram: an HDFS directory with rwx permission entries for Owner, Group and Others, plus additional ACL entries for Group D, Group F and User Y]

Page 31: Hadoop operations-2014-strata-new-york-v5

HDFS ACLs

New Tools for ACL Management (setfacl, getfacl)

– hdfs dfs -setfacl -m group:execs:r-- /sales-data
– hdfs dfs -getfacl /sales-data
    # file: /sales-data
    # owner: maya
    # group: sales
    user::rw-
    group::r--
    group:execs:r--
    mask::r--
    other::---

– How do you know if a directory has ACLs set? Look for the “+” after the permission string:
– hdfs dfs -ls /sales-data
    Found 1 items
    -rw-r-----+  3 maya sales          0 2014-03-04 16:31 /sales-data
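For scripted audits it can help to parse the getfacl output shown on this slide. The sketch below handles only access entries of the form kind:name:perms (not the default: entries shown on the next slide). The function names are hypothetical. The effective() helper applies the POSIX ACL rule that the mask limits named users, named groups and the owning group.

```python
SAMPLE = """\
# file: /sales-data
# owner: maya
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
"""


def parse_getfacl(text):
    """Split getfacl output into header fields and (kind, name, perms) entries."""
    header, entries = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            header[key.strip()] = value.strip()
        else:
            kind, name, perms = line.split(":")
            entries.append((kind, name, perms))
    return header, entries


def effective(perms, mask):
    """Permissions that remain after the mask entry is applied."""
    return "".join(p if m != "-" else "-" for p, m in zip(perms, mask))


header, entries = parse_getfacl(SAMPLE)
```

An entry with an empty name, such as ("user", "", "rw-"), refers to the file's owner or owning group rather than a named user or group.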

Page 32: Hadoop operations-2014-strata-new-york-v5

HDFS ACLs

Default ACLs

– hdfs dfs -setfacl -m default:group:execs:r-x /monthly-sales-data
– hdfs dfs -mkdir /monthly-sales-data/JAN
– hdfs dfs -getfacl /monthly-sales-data/JAN
    # file: /monthly-sales-data/JAN
    # owner: maya
    # group: sales
    user::rwx
    group::r-x
    group:execs:r-x
    mask::r-x
    other::---
    default:user::rwx
    default:group::r-x
    default:group:execs:r-x
    default:mask::r-x
    default:other::---
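The output above reflects how a child picks up its parent's default ACL. The following is a simplified model, not the NameNode's actual logic: it ignores the permission mode requested at creation time, which can further restrict the inherited entries. The function name is hypothetical.

```python
def inherit_default_acl(parent_default_entries, is_directory=True):
    """Model the ACL of a new child created under a parent with a default ACL."""
    prefix = "default:"
    # The parent's default entries become the child's access entries.
    access = [entry[len(prefix):] for entry in parent_default_entries]
    # Only directories carry the default ACL forward to their own children.
    default = list(parent_default_entries) if is_directory else []
    return access + default


parent_default = [
    "default:user::rwx",
    "default:group::r-x",
    "default:group:execs:r-x",
    "default:mask::r-x",
    "default:other::---",
]
child_dir = inherit_default_acl(parent_default)
child_file = inherit_default_acl(parent_default, is_directory=False)
```

In this model a new subdirectory such as JAN ends up with ten entries, matching the listing on the slide, while a new file gets only the five access entries.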

Page 33: Hadoop operations-2014-strata-new-york-v5

HDFS ACLs Best Practices

• Start with traditional HDFS permissions to implement most permission requirements.

• Define a smaller number of ACLs to handle exceptional cases.

• A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that has only traditional permissions.

Page 34: Hadoop operations-2014-strata-new-york-v5

HDFS Snapshots

• HDFS Snapshots
– A snapshot is a read-only point-in-time image of part of the file system
– Performance: snapshot creation is instantaneous, regardless of data size or subtree depth
– Reliability: snapshot creation is atomic
– Scalability: snapshots do not create extra copies of data blocks
– Useful for protecting against accidental deletion of data
• Example: Daily Feeds

hdfs dfs -ls /daily-feeds
    Found 5 items
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-16
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17

Page 35: Hadoop operations-2014-strata-new-york-v5

HDFS Snapshots

• Create a snapshot after each daily load

hdfs dfsadmin -allowSnapshot /daily-feeds
    Allowing snaphot on /daily-feeds succeeded

hdfs dfs -createSnapshot /daily-feeds snapshot-to-2014-10-17
    Created snapshot /daily-feeds/.snapshot/snapshot-to-2014-10-17

• User accidentally deletes data for 2014-10-16

hdfs dfs -ls /daily-feeds
    Found 4 items
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17

Page 36: Hadoop operations-2014-strata-new-york-v5

HDFS Snapshots

• Snapshots to the rescue: the data is still in the snapshot

hdfs dfs -ls /daily-feeds/.snapshot/snapshot-to-2014-10-17
    Found 5 items
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-13
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-14
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-15
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16
    drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-17

• Restore data from 2014-10-16

hdfs dfs -cp /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 /daily-feeds
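When many entries are involved, the comparison between the live directory and the snapshot can be scripted. The sketch below, with a hypothetical function name, works on lists of paths (for example, taken from the two listings in this walkthrough) and only builds the copy commands as strings; it does not run them.

```python
import posixpath


def restore_commands(live_paths, snapshot_paths, snapshot_root, live_root):
    """Build hdfs dfs -cp commands for entries present in the snapshot but missing live."""
    live_names = {posixpath.basename(p) for p in live_paths}
    commands = []
    for path in sorted(snapshot_paths):
        name = posixpath.basename(path)
        if name not in live_names:
            commands.append("hdfs dfs -cp %s/%s %s" % (snapshot_root, name, live_root))
    return commands


SNAPSHOT_ROOT = "/daily-feeds/.snapshot/snapshot-to-2014-10-17"
days = ["2014-10-13", "2014-10-14", "2014-10-15", "2014-10-16", "2014-10-17"]
snapshot_listing = [SNAPSHOT_ROOT + "/" + day for day in days]
live_listing = ["/daily-feeds/" + day for day in days if day != "2014-10-16"]

to_run = restore_commands(live_listing, snapshot_listing, SNAPSHOT_ROOT, "/daily-feeds")
```

The comparison is by name only, so it finds deleted entries but not entries whose contents changed after the snapshot was taken.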

Page 37: Hadoop operations-2014-strata-new-york-v5

YARN Application Timeline Server

• Stores data about YARN application execution
– Generic data
  – YARN container utilization
  – Metrics related to containers
– Application-specific data
  – MapReduce jobs and their tasks
  – Tez DAG execution
• Provides CLI for accessing data
– Useful for ad-hoc queries or scripted analysis
• Provides REST API for accessing data
– Consumed by UI front-ends such as Apache Ambari

Page 38: Hadoop operations-2014-strata-new-york-v5

Querying a Map Reduce Job Entity

curl http://127.0.0.1:8188/ws/v1/timeline/MAPREDUCE_JOB/job_1413405332088_0001

{
  "entity": "job_1413405332088_0001",
  "entitytype": "MAPREDUCE_JOB",
  "events": [
    {
      "eventinfo": {
        "FINISHED_MAPS": 2,
        "FINISHED_REDUCES": 1,
        "FINISH_TIME": 1413405349192,
        "JOB_STATUS": "SUCCEEDED"
      },
      "eventtype": "JOB_FINISHED",
      "timestamp": 1413405349194
    }
  ],
  "relatedentities": {
    "MAPREDUCE_TASK": [ "task_1413405332088_0001_m_000000" ]
  },
  "starttime": 1413405339442
}
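A response like the one on this slide is easy to post-process for scripted analysis. The sketch below embeds that response as a string and derives a small summary; the function name and the choice of fields are illustrative, and elapsed time is taken as FINISH_TIME minus the entity's starttime.

```python
import json

RESPONSE = """
{ "entity": "job_1413405332088_0001", "entitytype": "MAPREDUCE_JOB",
  "events": [ { "eventinfo": { "FINISHED_MAPS": 2, "FINISHED_REDUCES": 1,
                               "FINISH_TIME": 1413405349192, "JOB_STATUS": "SUCCEEDED" },
                "eventtype": "JOB_FINISHED", "timestamp": 1413405349194 } ],
  "relatedentities": { "MAPREDUCE_TASK": [ "task_1413405332088_0001_m_000000" ] },
  "starttime": 1413405339442 }
"""


def job_summary(entity):
    """Summarize a MAPREDUCE_JOB timeline entity that has a JOB_FINISHED event."""
    finished = next(e for e in entity["events"] if e["eventtype"] == "JOB_FINISHED")
    info = finished["eventinfo"]
    return {
        "status": info["JOB_STATUS"],
        "maps": info["FINISHED_MAPS"],
        "reduces": info["FINISHED_REDUCES"],
        "elapsed_ms": info["FINISH_TIME"] - entity["starttime"],
        "tasks": entity["relatedentities"]["MAPREDUCE_TASK"],
    }


summary = job_summary(json.loads(RESPONSE))
```

For this job the summary reports a SUCCEEDED status and an elapsed time of 9750 ms; the task IDs in "tasks" can be used to query each task entity in turn.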

Page 39: Hadoop operations-2014-strata-new-york-v5

Querying a Map Task Entity

curl http://127.0.0.1:8188/ws/v1/timeline/MAPREDUCE_TASK/task_1413405332088_0001_m_000000

{
  "entity": "task_1413405332088_0001_m_000000",
  "entitytype": "MAPREDUCE_TASK",
  "events": [
    {
      "eventtype": "TASK_FINISHED",
      "timestamp": 1413405345253
    },
    {
      "eventinfo": {
        "SPLIT_LOCATIONS": "localhost",
        "START_TIME": 1413405340255,
        "TASK_TYPE": "MAP"
      },
      "eventtype": "TASK_STARTED",
      "timestamp": 1413405340258
    }
  ]
}
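Task entities carry their timing in the event list rather than in a single field. As a small sketch, the elapsed time of the task on this slide can be derived from the two event timestamps; the function name is hypothetical, and the embedded response omits the stray trailing comma from the slide so that it parses as JSON.

```python
import json

TASK_RESPONSE = """
{ "entity": "task_1413405332088_0001_m_000000", "entitytype": "MAPREDUCE_TASK",
  "events": [ { "eventtype": "TASK_FINISHED", "timestamp": 1413405345253 },
              { "eventinfo": { "SPLIT_LOCATIONS": "localhost",
                               "START_TIME": 1413405340255, "TASK_TYPE": "MAP" },
                "eventtype": "TASK_STARTED", "timestamp": 1413405340258 } ] }
"""


def task_elapsed_ms(entity):
    """Milliseconds between the TASK_STARTED and TASK_FINISHED event timestamps."""
    stamps = {e["eventtype"]: e["timestamp"] for e in entity["events"]}
    return stamps["TASK_FINISHED"] - stamps["TASK_STARTED"]


elapsed = task_elapsed_ms(json.loads(TASK_RESPONSE))
```

Looking events up by type avoids relying on their order in the response, where the finish event is listed before the start event.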

Page 40: Hadoop operations-2014-strata-new-york-v5

Summary

• Configuration
– Prevent garbage collection issues
– Configure for redundancy
– Retune configuration in response to metrics
• Documentation
– End user perspective is crucial
– Please consider contributing to Apache Hadoop documentation
• HDFS ACLs
– Implement fine-grained authorization rules on files
– Can protect against accidental file manipulations
• HDFS Snapshots
– Point-in-time image of part of the filesystem
– Useful for restoring to a prior state after accidental file manipulation
• YARN Application Timeline Server
– Provides generic and application-specific data about YARN application execution
– Useful for analyzing cluster usage patterns

Page 41: Hadoop operations-2014-strata-new-york-v5

Thank you, Q&A

Learn more

Resource: Location
• Hardware Recommendations for Apache Hadoop: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
• Hadoop Documentation Issues: https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20documentation
• HDFS operational and debuggability improvements: https://issues.apache.org/jira/browse/HDFS-6185
• HDFS ACLs Blog Post: http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
• HDFS Snapshots Blog Post: http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/
• YARN Timeline Server Documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html