mambo - school of computingxinglin/papers/mambo-usenix-atc15.pdf · mambo running analytics on...

Mambo: Running Analytics on Enterprise Storage

Jingxin Feng, Xing Lin¹, Gokul Soundararajan

Advanced Technology Group

¹ University of Utah

Motivation

No easy way to analyze data stored in enterprise storage (NFS)

© 2015 NetApp, Inc. All rights reserved. 2

[Figure: a production system and an analytics system, each holding its own data (examples: Bank, AutoSupport); data is copied from production to analytics, creating data silos]

•  Separate infrastructures for production systems and analytics systems

•  Problems
   •  Copying PBs of data is time-consuming
   •  3× storage overhead in HDFS
   •  Periodic re-synchronization later on
   •  Legal constraints can prevent data copying

Mambo

An NFS connector, enabling direct analytics for data on enterprise storage (NFS)

[Figure: the production system and the analytics system work on the same data through the connector]

•  Remove data copying
•  Remove storage overhead (single copy)
•  Remove data re-synchronization
•  No legal issue

Copying is not required; you can do analytics in-place

Journey: From Research to Product

Project History

2011
•  Talked with customers
•  Developed initial prototype

2012
•  Madalin Mihailescu refined prototype
•  Added a distributed cache
•  Obtained traces from UC Berkeley
•  Published in FAST'13

2013 ~ now
•  Xing Lin refactored code for Hadoop 2.0
•  Optimized for 10 Gb networks
•  Obtained legal approval for open-source
•  Posted to GitHub
•  Customer Proof-of-Concepts (PoCs)
•  Pushing to merge into Hadoop

Use Cases

How many ways can you use Mambo?

Analyze Enterprise Data In-place

[Figure: the Hadoop stack before and after the swap]

Job: User jobs
Compute layer: MapReduce
Generic file system layer: File System
Resource management layer: YARN
Storage layer: HDFS, replaced by NFS

•  HDFS gets swapped out with NFS*
•  Apache Hadoop does not get modified.
•  User programs (jobs) are not modified.

Easily Launch Test Environments

•  Use NetApp FlexClones for creating test environments quickly
•  Use a copy of production data for realistic Test/QA environments (e.g., AutoSupport)

[Figure: on-premises NetApp storage serving Production, with a FlexClone serving Test/QA]

Use Cloud to Analyze Data

[Figure: private storage on-premises (onsite) is mirrored with SnapMirror to private storage at a colocated facility (offsite), which reaches the public cloud over an express interconnection]

Secondary private storage at a colocation facility (e.g., Equinix), for backup and fast restoration with cloud

Launch Hadoop in the cloud and use data on private storage

Design and Implementation

Architecture Overview

Mambo: an NFS client in Java, implementing the Hadoop generic file system API

[Figure: file systems available behind the Hadoop generic file system API: HDFS, Amazon S3, GlusterFS, Azure, and now NFS]

•  No changes to Hadoop framework
•  No changes to user programs

Filled the missing piece

Copying is not required
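To illustrate why implementing a generic file system API leaves user programs untouched, here is a minimal Python sketch of scheme-based dispatch. It is not Hadoop code and not Mambo's implementation; the class, registry, and function names are all hypothetical, and the in-memory backend only stands in for real storage.

```python
from urllib.parse import urlparse


class InMemoryFileSystem:
    """Toy stand-in for a storage backend; a real one would talk to HDFS or NFS."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


# Hypothetical registry: URI scheme -> file system instance.
REGISTRY = {
    "hdfs": InMemoryFileSystem("hdfs"),
    "nfs": InMemoryFileSystem("nfs"),
}


def get_file_system(uri):
    """Pick the backend from the URI scheme, as a generic file system layer would."""
    scheme = urlparse(uri).scheme
    if scheme not in REGISTRY:
        raise ValueError(f"no file system registered for scheme {scheme!r}")
    return REGISTRY[scheme]


def word_count_job(default_fs_uri, path):
    """The 'user job': it never names a concrete backend."""
    fs = get_file_system(default_fs_uri)
    return len(fs.read(path).split())


# The same job runs against either default file system.
for uri in ("hdfs://namenode:54310/", "nfs://nfsserver:2049/"):
    get_file_system(uri).write("/logs/a.txt", "one two three")
    print(uri, word_count_job(uri, "/logs/a.txt"))
```

Only the URI passed to the job changes between the two runs, which mirrors the slide's point that switching storage is a configuration matter.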

Tight integration with Hadoop/MapReduce

•  Optimized for large sequential I/O (e.g., 1 MB I/O)
•  Commit data to disk only when a task succeeds
•  Intelligent prefetching for streaming reads; aware of task sizes
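One way to picture prefetching for streaming reads is simple read-ahead: while the task consumes one chunk, the next request is already in flight. The Python sketch below shows that idea under stated assumptions. It is not the connector's algorithm, `read_at` stands in for a remote NFS read, and bounding read-ahead by the split length is only one possible reading of "aware of task sizes".

```python
import io
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # 1 MB requests, matching the I/O size mentioned on the slide


def read_at(source, offset, length):
    """Stand-in for one remote read request; here it reads a local file-like object."""
    source.seek(offset)
    return source.read(length)


def stream_split(source, split_start, split_len, chunk=CHUNK):
    """Yield one split chunk by chunk, keeping a single read in flight ahead.

    Read-ahead never goes past the end of the split, so no bytes are fetched
    that this task will not consume.
    """
    end = split_start + split_len
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = split_start
        pending = pool.submit(read_at, source, offset, min(chunk, end - offset))
        while offset < end:
            data = pending.result()
            if not data:
                break  # source ended before the split did
            offset += len(data)
            if offset < end:
                # Start fetching the next chunk before handing this one to the caller.
                pending = pool.submit(read_at, source, offset, min(chunk, end - offset))
            yield data


blob = io.BytesIO(bytes(3 * CHUNK + 10))
chunks = list(stream_split(blob, split_start=CHUNK, split_len=2 * CHUNK + 5))
print([len(c) for c in chunks])
```

Because the next request is submitted before the current chunk is yielded, the consumer's processing time overlaps with the following read.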

Implementation

[Figure: layered stack]

Computation frameworks: MapReduce, …
Resource management layer: YARN
Hadoop generic filesystem API: File System

NFS File System (standard NFSv3 protocol)
✓  FS metadata OPs
✓  NFS client protocol
✓  File reads
✓  File writes

Source files: NFSv3FileSystem.java, NFSv3FileSystemStore.java, NFSv3InputStream.java, NFSv3OutputStream.java

How to Use It?

•  Source code (jar library file)
   •  Get code from GitHub
   •  Compile the code
   •  Install the jar file
   •  Copy the jar file to the library directory for Hadoop installation

•  Only need to modify two configuration files
   •  core-site.xml (Hadoop core configuration file)
   •  nfs-mapping.json (NFS configuration file)

•  Or just try the Amazon CloudFormation template with everything configured

Configure core-site.xml

HDFS

Property                          Value
fs.defaultFS                      hdfs://namenode:54310/

NFS

Property                          Value
fs.defaultFS                      nfs://nfsserver:2049/
fs.nfs.configuration              <path-to-configuration-file>
fs.nfs.impl                       org.apache.hadoop.fs.nfs.NFSv3FileSystem
fs.AbstractFileSystem.nfs.impl    org.apache.hadoop.fs.nfs.NFSv3AbstractFileSystem
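The NFS property table above maps directly onto Hadoop's `<configuration>`/`<property>` XML layout. As a sketch, the Python snippet below renders those four properties into that layout. The property names and values come from the slide; the function name and the path passed for `fs.nfs.configuration` are hypothetical placeholders, since the slide leaves that path unspecified.

```python
import xml.etree.ElementTree as ET


def build_core_site(nfs_config_path):
    """Render the slide's NFS properties in Hadoop's core-site.xml layout."""
    properties = {
        "fs.defaultFS": "nfs://nfsserver:2049/",
        "fs.nfs.configuration": nfs_config_path,
        "fs.nfs.impl": "org.apache.hadoop.fs.nfs.NFSv3FileSystem",
        "fs.AbstractFileSystem.nfs.impl": "org.apache.hadoop.fs.nfs.NFSv3AbstractFileSystem",
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical location of the NFS configuration file; substitute your own.
core_site_xml = build_core_site("/path/to/nfs-mapping.json")
print(core_site_xml)
```

Switching back to HDFS would mean restoring the single `fs.defaultFS` entry from the HDFS table.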

Configure nfs-mapping.json

•  Configurable properties
   •  Export path
   •  Read/write sizes
   •  Split size (Hadoop task granularity)
   •  Authentication method (supporting AUTH_NONE or AUTH_UNIX)
   •  …

•  Supports multiple controllers (for NetApp clustered ONTAP)
   •  Aggregated bandwidth
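The slide names the kinds of settings the file holds but not its schema. The Python sketch below builds an illustrative JSON document covering those settings. Every key name, path, and host name in it is invented for illustration and should not be taken as the connector's actual format; the project's own documentation is the authority on the real keys.

```python
import json

# Every key name below is hypothetical; only the concepts come from the slide.
nfs_mapping = {
    "export_path": "/vol/analytics",            # hypothetical export path
    "read_size_bytes": 1024 * 1024,
    "write_size_bytes": 1024 * 1024,
    "split_size_bytes": 128 * 1024 * 1024,      # Hadoop task granularity
    "auth": "AUTH_UNIX",                        # the slide lists AUTH_NONE or AUTH_UNIX
    "controllers": [                            # several controllers to aggregate bandwidth
        "nfs://controller-a:2049/",
        "nfs://controller-b:2049/",
    ],
}


def validate(cfg):
    """Basic sanity checks for the illustrative configuration."""
    if cfg["auth"] not in ("AUTH_NONE", "AUTH_UNIX"):
        raise ValueError("unsupported authentication method")
    if not cfg["controllers"]:
        raise ValueError("at least one controller is required")
    for key in ("read_size_bytes", "write_size_bytes", "split_size_bytes"):
        if cfg[key] <= 0:
            raise ValueError(f"{key} must be positive")
    return cfg


print(json.dumps(validate(nfs_mapping), indent=2))
```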

Performance Evaluation

Highlights from MixApart¹

•  MixApart: NFS connector + data prefetcher + local disk as cache

•  Better performance with NFS connector than Hadoop with ingest (18% to 26% reduction in job duration)

•  Overlaps data ingest with task computation

•  Matches ideal Hadoop (data ingested into HDFS beforehand), with moderate/high data reuse across jobs

¹ MixApart: De-coupled Analytics for Shared Storage Servers. Madalin Mihailescu, Gokul Soundararajan, and Cristiana Amza. In FAST '13.

Scaling Experiments

How does the NFS connector scale with more storage and compute?

Cluster of NFS servers: 8 nodes (FAS 8080) with 48 HDDs each and 8 10Gb links each

Compute: 28 nodes (UCS B230M2) with 20 CPU cores and 256 GB RAM
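For a sense of scale, the storage-side totals follow from the per-node figures by multiplication. The short Python calculation below works them out, assuming "10Gb" means a nominal 10 Gb/s per link. The bandwidth figure is therefore a line-rate ceiling, not a measured throughput.

```python
STORAGE_NODES = 8
HDDS_PER_NODE = 48
LINKS_PER_NODE = 8
LINK_GBIT_PER_S = 10  # assumption: "10Gb" is the nominal rate of one link in Gb/s

total_hdds = STORAGE_NODES * HDDS_PER_NODE             # drives across the cluster
total_links = STORAGE_NODES * LINKS_PER_NODE           # network links across the cluster
nominal_gbit_per_s = total_links * LINK_GBIT_PER_S     # aggregate line rate in Gb/s
nominal_gbyte_per_s = nominal_gbit_per_s / 8           # same figure in GB/s

print(total_hdds, total_links, nominal_gbit_per_s, nominal_gbyte_per_s)
```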

Scaling TeraGen

[Figure: normalized running time (y-axis, 0.00 to 4.00) versus data size in GB (x-axis: 279, 373, 466, 559, 652, 745, 838, 931, 1024), with two series labeled "1 HDD" and "8 HDD"; the chart carries a "5×" annotation]

NFS connector scales well for large datasets.

Scaling TeraSort

[Figure: normalized running time (y-axis, 0.00 to 4.00) versus data size in GB (x-axis: 279, 373, 466, 559, 652, 745, 838, 931, 1024), with two series labeled "1 HDD" and "8 HDD"]

Overcome NFS Server Bottleneck

Optimize with Caching

•  Real workloads are cacheable¹

[Figure: NFS servers and compute nodes, with three caching options added in turn: a flash cache tier (FlashCache), a distributed in-memory cache tier, and local disk on the compute nodes]

¹ MixApart: De-coupled Analytics for Shared Storage Servers. Madalin Mihailescu, Gokul Soundararajan, and Cristiana Amza. In FAST '13.

Use local disk as cache
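Caching pays off when jobs re-read the same data, because repeated reads are served near the compute node instead of by the NFS server. The Python sketch below shows the effect with a small least-recently-used block cache. The slides do not say which eviction policy any of these cache tiers uses, so LRU is an assumption here, and all names are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU cache keyed by (path, block index); capacity is counted in blocks."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # called on a miss, e.g. a read from the NFS server
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path, index):
        key = (path, index)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)       # mark as most recently used
            return self.blocks[key]
        self.misses += 1
        data = self.fetch(path, index)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data


def fake_nfs_read(path, index):
    """Stand-in for a remote read from the NFS server."""
    return f"{path}#{index}".encode()


cache = BlockCache(capacity=2, fetch=fake_nfs_read)
for job in range(3):                  # three jobs re-reading the same two blocks
    for index in (0, 1):
        cache.read("/data/input", index)
print(cache.hits, cache.misses)
```

Only the first job's reads reach the stand-in server; the later jobs are served from the cache, which is the data-reuse effect the MixApart results rely on.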

Next Steps

We need your help.

Future Work

•  Productization within NetApp
   •  Support pNFS protocol
   •  Security (Kerberos authentication)

•  Integration tests with other frameworks
   •  Tachyon, HBase, Spark, etc.

•  Production system integration
   •  NetApp AutoSupport (ASUP) team
   •  Customer systems

We Need Your Help

•  Anyone interested
   •  Try it out and tell us how it works
   •  File bugs

•  Hadoop committers
   •  Help push the NFS connector into mainstream Hadoop

•  Help with integration tests with other frameworks (Tachyon, HBase, etc.)

•  Help improve the code on GitHub!

References

•  Connector information
   •  http://www.netapp.com/us/solutions/big-data/nfs-connector-hadoop.aspx

•  Public on GitHub
   •  https://github.com/NetApp/NetApp-Hadoop-NFS-Connector

•  Technical report
   •  http://www.netapp.com/us/media/tr-4382.pdf

•  Paper at FAST'13
   •  MixApart: De-Coupled Analytics for Shared Storage Servers

•  If you have any questions, please contact
   •  [email protected], [email protected], [email protected]

Summary

•  NetApp NFS connector for Hadoop
   •  Allows analytics to use any NFS
   •  An open implementation (no proprietary code); contribute back to Hadoop
   •  Works with Apache Hadoop, Apache Spark, Tachyon, and Apache HBase
   •  In many cases, only a configuration file change is needed (no source code changes)

•  NetApp NFS connector for Hadoop is being deployed
   •  Internal testing with other teams
   •  Testing with select customers

Acknowledgements

§  Madalin Mihailescu for starting down this path

§  Kaladhar Voruganti, Scott Dawkins, Jeff Heller, AJ Mahajan, and Siva Jayasenan for supporting this effort

§  Karthikeyan Nagalingam for validation and customer PoCs

§  NetApp AutoSupport team for testing it in production

§  NetApp NFS team for continuing the effort


Thank you


Mambo: analyze enterprise data in-place