
March 2013

[email protected] Exar Corporation 48720 Kato Road Fremont, CA 510-668-7000 www.exar.com

Exar Optimizing Hadoop – Is Bigger Better??

• Section I: Exar Introduction – Exar Corporate Overview

• Section II: Big Data Pain-Points – Debunking Top 5 Hadoop Myths – 3 Main System Constraints

• Section III: Hadoop Optimization Solution – Exar Hadoop Acceleration Solutions

• Section IV: Benchmarking Results – OEM 1 Results – OEM 2 Results – OEM 3 Results

• Section V: Summary

Exar At-A-Glance: Global Leader in Data Management Solutions and Mixed Signal Components

• Well Established Fabless IC Company
  – 42 years of history in Silicon Valley
  – ~300 Employees Worldwide
  – Healthy balance sheet – $229M in assets

• Broad-based Component and Solution Supplier
  – Specialty SoCs, FPGA/ASIC Boards and Software
    • DCS (Data Compression & Security)
  – Analog Mixed Signal Components
    • Interface
    • Power

• Section I: Exar Introduction – Exar Corporate Overview

• Section II: Big Data Pain-Points – Debunking Top 5 Hadoop Myths – 3 Main System Constraints

• Section III: Hadoop Optimization Solution – Exar Hadoop Acceleration Solutions

• Section IV: Benchmarking Results – OEM 1 Results – OEM 2 Results – OEM 3 Results

• Section V: Summary

Is Bigger Always Better??

It is not about the size of the Big Data deployment; return on investment is defined by optimal utilization of resources

Debunking the Top 5 Hadoop Myths 1. More CPUs or More Storage does not mean better Analytics

Increasing the number of Jobs per Node, or improving Job processing time, requires more powerful Nodes…

No!!!

Rack Density maximization and effective resource utilization (CPU, Storage and Memory) is the solution

Debunking the Top 5 Hadoop Myths 2. Operational Expenditure is a significant component of the 3-5 Year TCO

Capital expenditure is the primary contributor to the 3- or 5-Year TCO

No!!!

Operational expenditure is a significant contributor to the TCO

Debunking the Top 5 Hadoop Myths 3. Storage scaling is significantly constrained by Size and Space

Storage can Scale Easily

No!!!

Size, Space and Connectivity constrain scaling capacity

Debunking the Top 5 Hadoop Myths 4. Data Node costs are driven by Storage rather than CPUs

Compute defines the Data node cost

No!!!!

Storage defines the node cost, and the ratio is often as high as 10:1 (Storage to CPU)

Debunking the Top 5 Hadoop Myths 5. For larger Hadoop Clusters, Network (Shuffle) traffic reduction is key

Network Traffic Reduction is not relevant in Hadoop TCO

No!!!

10G WAN Links are expensive. It is preferable to optimize traffic on 1G WAN Links, and avoid/minimize 10G Links

Summary of Hadoop Cluster Constraints: Hadoop Clusters can be Optimized for Storage, Network Bandwidth & Compute Resources

Storage Capacity – Server OEMs are struggling to provide enough capacity to keep up with ever-growing data needs. E.g. – a leading server OEM's latest configuration supports 30 Disks/Server!!!

Disk IOPs Bottleneck – The biggest bottleneck for data analytics is the disk IOPs limitation. E.g. – even the most optimally configured Hadoop system struggles to get better than 80% CPU utilization, as disk IO bandwidth is not able to keep up, especially for high CPU-core-to-HDD ratios.

Network Bandwidth – Data is often replicated 3 times, and large clusters are distributed globally. Minimizing bandwidth (across the WAN) and minimizing switch/HW cost (across the LAN) is key. E.g. – a leading eCommerce company has 6 clusters distributed globally, with each cluster having 2,000-3,000 Data Nodes.

Can Hadoop Cluster TCO be reduced without impacting job execution time??

Exar Hadoop Acceleration Solutions can lower Cluster TCO by 20-40%!!

Exar Hadoop Optimization Solutions: By optimizing CPU, Storage, Memory & Network Bandwidth, TCO can be reduced by up to 40%

• Section I: Exar Introduction – Exar Corporate Overview

• Section II: Big Data Pain-Points – Debunking Top 5 Hadoop Myths – 3 Main System Constraints

• Section III: Hadoop Optimization Solution – Exar Hadoop Acceleration Solutions

• Section IV: Benchmarking Results – OEM 1 Results – OEM 2 Results – OEM 3 Results

• Section V: Summary

Exar Hadoop Acceleration Solution Overview Exar Solution optimizes all the Hadoop Cluster Constraints mentioned earlier

Exar Hadoop Acceleration Solution Highlights:

Storage Optimization – Exar Solution uses Advanced Data Compression technique to Compress Input and Output Data, which drastically reduces Storage requirement in each Data Node

CPU Optimization – Data Compression/Decompression is Offloaded from CPU, which releases additional CPU Cycles for Enhanced Data Analytics

Memory Management – Exar Solution uses advanced Memory Management, which optimizes the System Memory Usage

Network Bandwidth Optimization – Exar Solution Compresses Intermediate (Shuffle) traffic, which optimizes Network Bandwidth (see the configuration sketch below)
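For context, this is roughly what shuffle and output compression look like at the job-configuration level in stock Hadoop MapReduce. It is a minimal sketch, assuming Hadoop 2.x-era property names (CDH3-era names differ) and the built-in DefaultCodec as a stand-in: the deck does not name Exar's codec class, and its filter-layer approach is described as needing no such job changes at all.

```java
// Hedged sketch: enabling intermediate (shuffle) and final-output compression
// in a vanilla Hadoop MapReduce job. The codec class below is a placeholder,
// not Exar's codec, which the deck does not name.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CompressionConfigSketch {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();

        // Compress intermediate (shuffle) data moved between map and reduce tasks.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapreduce.map.output.compress.codec",
                 "org.apache.hadoop.io.compress.DefaultCodec"); // placeholder codec

        // Compress the final job output written to HDFS.
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);
        conf.set("mapreduce.output.fileoutputformat.compress.codec",
                 "org.apache.hadoop.io.compress.DefaultCodec"); // placeholder codec

        return Job.getInstance(conf, "compression-config-sketch");
    }
}
```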

Exar Hadoop Acceleration Solution Overview Exar offers a Certified Plug N Play Hadoop Acceleration solution

Plug N Play Solution:

No Code Change – Filter Layer SW sits below the HDFS. No APIs required. SW installs in minutes!

Standard HW – Offload card supports PCIe Gen 1 and Gen 2

Linux OS Compatible – Solution supports Linux 6.X, and works across RHEL, Ubuntu and SUSE

Certified by Cloudera:

Solution Certified on both CDH3 and CDH4

OEM Tested:

Solutions evaluated and benchmarked on leading OEM HW, including IBM, HP, Dell, SuperMicro, etc.

Big Data (Hadoop) Optimization Solution Exar Solutions Reduce Storage Requirement & Optimize System Resource Utilization

A Hadoop Cluster Accelerated with AltraSTAR consists of:

CeDeFS Filter Layer SW

Exar Hardware Accelerator

CeDeFS is a transparent Filter Layer SW and sits below HDFS. No code changes are required and workflow remains the same

The Exar Accelerator is an FPGA-based PCIe HW Accelerator:

– 3x-6x increase in storage capacity in each node
– Enhanced CPU utilization and reduced runtime through I/O reduction and optimization
– Significantly benefits I/O-bound tasks
– Increased data density; reduces the shuffle traffic
– Reduction in Power – Per Node, Per Cluster

[Block diagram: Hadoop Map/Reduce → Hadoop FS → CeDeFS + CeDeFN filter layer → Exar Driver → Exar Offload Card → Storage Volume, all running on the Linux system]
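To make the 3x-6x capacity figure above concrete, here is a minimal arithmetic sketch; the disk count and per-disk size are assumed example values, not node specifications from the deck.

```java
// Minimal sketch of the capacity arithmetic behind the "3x-6x increase in
// storage capacity per node" claim. Disk count and disk size are hypothetical
// example values, not figures from the deck.
public class EffectiveCapacitySketch {
    public static void main(String[] args) {
        int disksPerNode = 12;         // assumed example
        double diskCapacityTb = 2.0;   // assumed example, TB per disk
        double rawTb = disksPerNode * diskCapacityTb;   // 24 TB of raw HDD space

        for (double compressionRatio : new double[] {3.0, 6.0}) {
            double effectiveTb = rawTb * compressionRatio;
            System.out.printf("Raw %.0f TB at %.0fx compression -> ~%.0f TB effective%n",
                              rawTb, compressionRatio, effectiveTb);
        }
    }
}
```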

• Section I: Exar Introduction – Exar Corporate Overview

• Section II: Big Data Pain-Points – Debunking Top 5 Hadoop Myths – 3 Main System Constraints

• Section III: Hadoop Optimization Solution – Exar Hadoop Acceleration Solutions

• Section IV: Benchmarking Results – OEM 1 Results – OEM 2 Results – OEM 3 Results

• Section V: Summary

Test Procedure: Validate Exar Acceleration Solutions on Typical Hadoop Clusters

1. Configure System to Default Hadoop Setting
2. Establish Benchmark for Native Config (with LZO)
3. Rerun Tests with Exar Acceleration Solution – Disk Reduction, Network Link Optimization, Large File Optimization
4. Quantify Results; Calculate ROI (see the Terasort sketch below)
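As a rough illustration of steps 2-3, a Terasort run can be driven with the stock TeraGen/TeraSort tools that ship in the hadoop-mapreduce-examples artifact. The HDFS paths and row count below are assumptions, and the deck does not state how its runs were launched (most clusters invoke the same tools through the equivalent `hadoop jar` commands).

```java
// Hedged sketch only: drives a Terasort-style benchmark with the standard
// example tools. Paths and row count are illustrative placeholders; the job
// needs a cluster configuration on the classpath to run for real.
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

public class TerasortBenchmarkSketch {
    public static void main(String[] args) throws Exception {
        String input  = "/bench/terasort-in";   // hypothetical HDFS path
        String output = "/bench/terasort-out";  // hypothetical HDFS path

        // 1 TB of input = 10 billion 100-byte rows (hypothetical job size).
        ToolRunner.run(new TeraGen(),  new String[] {"10000000000", input});

        long start = System.currentTimeMillis();
        ToolRunner.run(new TeraSort(), new String[] {input, output});
        long elapsed = System.currentTimeMillis() - start;

        System.out.printf("TeraSort wall-clock time: %.1f minutes%n", elapsed / 60000.0);
    }
}
```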

Exar Hadoop Acceleration – OEM 1 Results: Exar's GX1745-based Acceleration Test Results

[Charts: Cluster Configuration; Job Execution & Resource Requirements]

With a 300 TB Exar Hadoop Accelerated Solution, end users could reduce their Capital Expenditure by up to 40%!!!

Exar Hadoop Acceleration – OEM 2 Results: OEM Sorted 1 TB in an industry-leading time; Exar reduced the cost by 30%

Cluster Configuration:
– Native: Servers = 10, Expansion Units = 10
– Exar Solution: Servers = 10, Expansion Units = 5

Job Execution & Resource Requirements – Terasort Test on AppSystem Cluster:

|                  | Single Job (512GB) | Single Job (1TB) | Multiple Jobs (512GB) | Multiple Jobs – Job 2 |
| Native LZO       | 14m 15s            | 33m 36s          | 33m 32s               | 21m 34s               |
| AltraSTAR + LZO  | 8m 9s              | 16m 0s           | 19m 3s                | 12m 07s               |
| Performance Gain | 70%                | 101%             | 76%                   | 77%                   |

Capacity Gain: 12 Disks (Native) vs. 6 Disks (with AltraSTAR)

Reduce cost and improve performance through:
1. Improved performance
2. Removing disks or using lower-capacity disks
3. Increased capacity
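For clarity, the "Performance Gain" percentages in the table above follow from comparing the two wall-clock runtimes. A minimal sketch, using assumed (not measured) runtimes:

```java
// Minimal sketch of how a "Performance Gain" percentage is computed from two
// runtimes. The runtimes here are hypothetical examples, not the measured
// figures from the deck.
public class PerformanceGainSketch {
    static double gainPercent(double baselineSeconds, double acceleratedSeconds) {
        // Gain = extra throughput per unit time, e.g. 2x faster -> 100% gain.
        return (baselineSeconds / acceleratedSeconds - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 30 * 60;     // hypothetical: 30 minutes with native LZO
        double accelerated = 15 * 60;  // hypothetical: 15 minutes with offload
        System.out.printf("Performance gain: %.0f%%%n", gainPercent(baseline, accelerated));
    }
}
```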

Exar Hadoop Acceleration – OEM 3 Results Solution gave the flexibility to increase Storage/CPU density per Rack

Exar Hadoop Acceleration – OEM 3 Results: Exar Solution improved Analytics by up to 70%, or reduced Storage Cost by up to 50%

[Charts: Performance-Maximized Configuration; Cost-Minimized Configuration]

Exar Hadoop Accelerated Solutions Outperformed CPU-based Solutions: Implied or Calculated Results Shed Light on 4 of the 5 Hadoop Implementation Myths

| Efficiency Parameter | Parameter Definition | Acceleration Benchmarks – No EXAR Acceleration | Acceleration Benchmarks – With Exar Acceleration | AltraSTAR Accel Gain |
| System Resource Optimization | Ratio of CPU Cores to Hard Disks | 1:2 | 1:1 | 100% |
| Cap-Ex Efficiency | $$ Cap Investment per 1 GB Sort | N/A | N/A | 27% |
| Op-Ex Efficiency | KWh Consumed per 1 GB Sort | N/A | N/A | 20% |
| Storage Density | Effective Storage per 40U Rack | 261 | 430 | 61% |
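As a hedged, literal reading of the Cap-Ex and Op-Ex parameter definitions in the table above, the sketch below shows how such per-GB figures could be computed. Every input value is a hypothetical placeholder, not a number from the benchmarks.

```java
// Hedged sketch of the per-GB efficiency-parameter arithmetic. All inputs are
// hypothetical placeholders; the deck reports only the resulting gains.
public class EfficiencyMetricsSketch {
    public static void main(String[] args) {
        double clusterCapexDollars = 500_000;  // hypothetical cluster cost
        double clusterPowerKw      = 40;       // hypothetical average power draw
        double sortedGb            = 1024;     // data sorted in the run (1 TB)
        double runtimeHours        = 0.5;      // hypothetical job runtime

        // Cap-Ex Efficiency: capital investment attributed to each GB sorted.
        double dollarsPerGbSort = clusterCapexDollars / sortedGb;

        // Op-Ex Efficiency: energy consumed per GB sorted.
        double kwhPerGbSort = clusterPowerKw * runtimeHours / sortedGb;

        System.out.printf("Cap-Ex: $%.2f per GB sorted; Op-Ex: %.4f kWh per GB sorted%n",
                          dollarsPerGbSort, kwhPerGbSort);
    }
}
```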

• Section I: Exar Introduction – Exar Corporate Overview

• Section II: Big Data Pain-Points – Debunking Top 5 Hadoop Myths – 3 Main System Constraints

• Section III: Hadoop Optimization Solution – Exar Hadoop Acceleration Solutions

• Section IV: Benchmarking Results – OEM 1 Results – OEM 2 Results – OEM 3 Results

• Section V: Summary

Exar Hadoop Acceleration Solution: Exar Acceleration Solution optimizes all of the Hadoop Constraints

Significant ROI:
– Highest Rack Density
– Lowest $$/GB Sort
– Most Power Efficient
– Optimized Network Bandwidth

Flexibility: Offers flexibility to cater to both Disk IO-Bound and CPU-Bound Solutions

Certified: Certified on all Cloudera Releases, and tested on most of the major OEM HW

Conclusion

• Hardware accelerated compression provides meaningful acceleration as well as added capacity

• Acceleration plus added capacity means bigger jobs executed in less time

• Very significant savings in both CAPEX and OPEX

Ramana Jampala

Vice-President – Business Development [email protected]

(732) 440-1280 x238

www.exar.com