TRANSCRIPT
Building the Right Platform Architecture for Hadoop
ROMMEL GARCIA HORTONWORKS
#whoami
▸ Sr. Solutions Engineer & Platform Architect @hortonworks
▸ Global Security SME Lead @hortonworks
▸ Author of “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture”
▸ Runs Atlanta Hadoop User Group
Overview
The Elephant keeps getting bigger
THE HADOOP ECOSYSTEM
THE TWO FACES OF HADOOP
Hadoop Cluster
Master Nodes: manage cluster state, manage data & job state
Worker Nodes: manage CPU, RAM, I/O, and network resources
RUN W/ YARN
PLATFORM FOCUS
▸ Performance (SLA)
▸ Scale (Bursts)
▸ Speed (RT)
▸ Throughput (data/time, compute/time)
▸ Resiliency (HA)
▸ Graceful Degradation (Throttling/Failure Mgt.)
Network
It’s not Pyramidal
NETWORK IS EVERYTHING!
Workloads
Your workload no longer affects me
HADOOP AND THE WORLD OF WORKLOADS
Workloads (YARN): OLAP | OLTP | DATA SCIENCE | STREAMING
Data (HDFS): STRUCTURED | UNSTRUCTURED
Hadoop Cluster
Deployment
HADOOP WORKLOAD MANAGEMENT
Hadoop Cluster
YARN
PHYSICAL MEMORY/CPU
queues
containers (C1, C2, C3 … CN within each queue)
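The queue-and-container split shown here is typically wired up through YARN's Capacity Scheduler. A hypothetical `capacity-scheduler.xml` fragment (the queue names and percentages are illustrative, not from the talk):

```shell
# Illustrative capacity-scheduler.xml fragment: one YARN queue per workload,
# so each workload's containers draw from a fixed share of cluster memory/CPU.
cat >> capacity-scheduler.xml <<'XML'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>olap,datascience,streaming</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.olap.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.datascience.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.streaming.capacity</name>
  <value>20</value>
</property>
XML
```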
DEFAULT GRID SETTINGS FOR ALL WORKLOADS
▸ ext4 or XFS for Worker Nodes ENFORCED
▸ Transparent Huge Pages (THP) Compaction OFF
▸ Masters’ Swappiness = 1, Workers’ Swap turned OFF
▸ Jumbo Frames ENABLED
▸ IO Scheduling “deadline” ENABLED
▸ Limiting “processes” and “files” ENFORCED
▸ Name Service Cache ENABLED
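On a RHEL/CentOS-style host, the defaults above amount to something like the following sketch (run as root; paths, device names, and limits values are illustrative and vary by distro and hardware):

```shell
# Host-prep sketch for the grid defaults above. bond0, sdb, and the
# limits values are assumptions, not values from the talk.
echo never > /sys/kernel/mm/transparent_hugepage/defrag   # THP compaction OFF
sysctl -w vm.swappiness=1       # masters: minimal swap pressure
swapoff -a                      # workers: swap OFF entirely
ip link set dev bond0 mtu 9000  # jumbo frames (switch ports must match)
echo deadline > /sys/block/sdb/queue/scheduler            # "deadline" I/O scheduler per data disk
cat >> /etc/security/limits.conf <<'EOF'                  # raise process/file limits
hdfs  -  nofile  128000
hdfs  -  nproc   65536
EOF
systemctl enable --now nscd     # name service caching
```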
OLAP
Give back my precious SQL!
HIVE/SPARK SQL WORKLOAD BEHAVIOR
▸ Hive/Spark SQL workloads depend on YARN
▸ Near-realtime to Batch SLA
▸ Large data sets; consumes lots of memory, moderate CPU usage
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ Typically Memory bound first, then I/O
HIVE/SPARK SQL WORKLOAD PARALLELISM
▸ Hive
▸ Hive auto-parallelism ENABLED
▸ Hive on Tez ENABLED
▸ Reuse Tez Session ENABLED
▸ ORC for Hive ENFORCED
▸ Spark
▸ Repartition Spark SQL RDD ENFORCED
▸ 2GB to 4GB YARN container size
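The Hive bullets above translate to session (or hive-site) properties; a hypothetical settings file, with illustrative values:

```shell
# Sketch of Hive-on-Tez session settings matching the bullets above.
# Thread count and container size are example values.
cat > hive-tuning.hql <<'SQL'
SET hive.execution.engine=tez;           -- Hive on Tez
SET hive.exec.parallel=true;             -- auto-parallelism across independent stages
SET hive.exec.parallel.thread.number=8;
SET hive.prewarm.enabled=true;           -- keep Tez containers warm for session reuse
SET hive.default.fileformat=ORC;         -- ORC enforced as the default format
SET hive.tez.container.size=4096;        -- 2-4 GB YARN containers (value in MB)
SQL
```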
HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Batch - S3/Blob/ADLS
▸ Interactive - EBS/Premium
▸ Near-realtime - Local/Local
▸ vm types (AWS/Azure)
▸ >=m4.4xlarge (EBS) / >= A9 (Blob/Premium)
▸ >=i2.4xlarge (Local) / >= DS14 (Premium/Local)
▸ >=d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)
Virtualized On-prem
More Master Nodes than Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node
▸ 6 vCPU Cores
▸ 32 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
▸ Storage (data)
▸ SAN/NAS
▸ Appliance
▸ NetApp
▸ Isilon
▸ vBlock
OLTP
Come on, OLTP is fun!
HBASE/SOLR WORKLOAD BEHAVIOR
▸ HBase & Solr workloads
▸ Realtime, in sub-seconds SLA
▸ Large data set
▸ Millions to Billions of hits per day
▸ HBase
▸ HBase for GET/PUT/SCAN
▸ Typically IO bound first, then memory for HBase
▸ Solr
▸ Solr for Search
▸ Typically CPU bound first, then memory for Solr
HBASE WORKLOAD PARALLELISM
▸ Use HBase Java API/Phoenix, not ThriftServer
▸ HBase “short circuit” reads ENABLED
▸ HBase BucketCache ENABLED
▸ Manage HBase Compaction for block locality
▸ HBase Read HA ENABLED
▸ Properly design “rowkey” and ColumnFamily
▸ Properly design RPC Handlers
▸ Minimize GC pauses
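Most of these HBase tunings land in `hbase-site.xml` (short-circuit reads also need the setting on the HDFS side). A hypothetical fragment; cache and handler sizes are illustrative:

```shell
# Illustrative hbase-site.xml fragment for the tunings above.
cat >> hbase-site.xml <<'XML'
<property><name>dfs.client.read.shortcircuit</name><value>true</value></property>    <!-- short-circuit reads -->
<property><name>hbase.bucketcache.ioengine</name><value>offheap</value></property>   <!-- BucketCache off-heap -->
<property><name>hbase.bucketcache.size</name><value>16384</value></property>         <!-- example size, MB -->
<property><name>hbase.regionserver.handler.count</name><value>60</value></property>  <!-- RPC handlers -->
<property><name>hbase.region.replica.replication.enabled</name><value>true</value></property> <!-- read HA -->
XML
```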
HBASE WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK, HBase Master)
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes than Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
SOLR WORKLOAD PARALLELISM
▸ # of Shards ∝ I/O operations ∝ Speed of disk
▸ # of threads ∝ Indexing Speed
▸ Properly manage RAM Buffer Size
▸ Merge Factor = 25 (indexing), 2 (search)
▸ Use large disk cache
▸ Minimize GC pauses
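The RAM buffer and merge-factor bullets map onto the `<indexConfig>` section of `solrconfig.xml`; a hypothetical fragment, shown with indexing-time values:

```shell
# Illustrative solrconfig.xml fragment; the buffer size is an example value.
cat >> solrconfig.xml <<'XML'
<indexConfig>
  <ramBufferSizeMB>512</ramBufferSizeMB>  <!-- absorb indexing bursts before flushing -->
  <mergeFactor>25</mergeFactor>           <!-- 25 during bulk indexing; drop to 2 for search-heavy use -->
</indexConfig>
XML
```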
SOLR WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK)
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes than Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Data Science
But Captain, one does not simply warp into Mordor
DATA SCIENCE WORKLOAD BEHAVIOR
▸ Spark ML workloads depend on YARN
▸ Near-realtime to Batch SLA
▸ Runs “Monte Carlo” type of processing
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ CPU bound first, then memory
SPARK ML WORKLOAD PARALLELISM
▸ YARN ENABLED
▸ Cache data
▸ Serialized/Raw for fast access/processing
▸ Off-heap, slower processing
▸ Use Checkpointing
▸ Use Broadcast Variables to minimize network traffic
▸ Minimize GC by limiting object creation, use Builders
▸ executor-memory = between 8GB and 64GB
▸ executor-cores = between 2 and 4
▸ num-executors = w/ caching, datasize * 2 as total app memory
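The executor-sizing rule above can be sketched as arithmetic; the 500 GB dataset and 16 GB executors below are assumed values, not figures from the talk:

```shell
# Sizing sketch: with caching, budget ~2x the dataset size as total executor
# memory, then divide by per-executor memory (rounding up).
DATASET_GB=500                # assumed size of the cached dataset
EXECUTOR_MEM_GB=16            # within the 8-64 GB guidance above
TOTAL_MEM_GB=$(( DATASET_GB * 2 ))
NUM_EXECUTORS=$(( (TOTAL_MEM_GB + EXECUTOR_MEM_GB - 1) / EXECUTOR_MEM_GB ))
echo "spark-submit --executor-memory ${EXECUTOR_MEM_GB}G --executor-cores 3 --num-executors ${NUM_EXECUTORS} ..."
```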
SPARK ML WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 64 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 - 512 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes than Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Streaming
Stream me in, Scotty!
NIFI/STORM WORKLOAD BEHAVIOR
▸ NiFi & Storm Streaming workloads
▸ Always “ON” data ingest and processing
▸ Guarantees data delivery
▸ Highly distributed
▸ simple event processing -> complex event processing
NIFI WORKLOAD PARALLELISM
▸ Network bound first, then memory or cpu
▸ Go granular on NiFi processors
▸ nifi.bored.yield.duration=10
▸ RAID 10 for repositories: Content, FlowFile and Provenance
▸ nifi.queue.swap.threshold=20000
▸ No sharing of disk across repositories, use RAID 10
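These settings live in `nifi.properties`; a hypothetical fragment where each repository gets its own RAID 10 volume (the `/dataN` mount points are illustrative; note stock NiFi expects a time unit on the yield duration, e.g. `10 millis`):

```shell
# Illustrative nifi.properties fragment matching the bullets above.
cat >> nifi.properties <<'PROPS'
nifi.bored.yield.duration=10 millis
nifi.queue.swap.threshold=20000
nifi.content.repository.directory.default=/data1/content_repository
nifi.flowfile.repository.directory=/data2/flowfile_repository
nifi.provenance.repository.directory.default=/data3/provenance_repository
PROPS
```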
NIFI WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ vm types (AWS/Azure)
▸ >= m4.4xlarge / >= A9
Virtualized On-prem
More Master Nodes than Bare Metal
▸ 2 x 4 vCPU Cores
▸ 16 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
STORM WORKLOAD PARALLELISM
▸ CPU bound first, then memory
▸ Keep Tuple processing light, quick execution of execute()
▸ Use Google’s Guava for bolt-local caches (mem <1GB)
▸ Shared caches across bolts: HBase, Phoenix, Redis or MemcacheD
▸ # of workers ∝ memory
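On the supervisor side, workers-per-node and per-worker heap are what tie the worker count to memory; a hypothetical `storm.yaml` fragment (slot count and heap size are illustrative):

```shell
# Illustrative storm.yaml fragment: keep (workers per node x heap) within
# the node's RAM budget, and heaps small enough to limit GC pauses.
cat >> storm.yaml <<'YAML'
supervisor.slots.ports: [6700, 6701, 6702, 6703]   # 4 workers per supervisor node
worker.childopts: "-Xmx2048m"                       # 2 GB heap per worker
YAML
```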
STORM WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ vm types (AWS/Azure)
▸ >= r3.4xlarge / >= A9
Virtualized On-prem
More Master Nodes than Bare Metal
▸ 2 x 4 vCPU Cores
▸ 24 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
http://hortonworks.com/products/sandbox/
?
THANK YOU!
@rommelgarcia /in/rommelgarcia
#hortonworks