ceph days 2014 paul evans slide deck


DESCRIPTION

Ceph Days held in October 2014 at Brocade headquarters in Silicon Valley.

TRANSCRIPT

BUILDING A CEPH-POWERED DATA LAKE (OR) DATA GRID

Paul Evans principal architect

daystrom technology group paul@daystrom.com

san jose 2014

ceph days

Why build a data grid (or data lake) ?

…because we have a data FLOOD in process

indeed, we love data…

we’re good at generating more and more, but…

( we never seem to throw any of it out )

too FAST

too many VARIANTS

too MUCH

IS THE ANSWER TO ALL OF THIS…. “ WE NEED LESS DATA! ”

are you crazy? we live to store things!

we just need better tools… (and more storage)

DATA AUTOMATION STACK

Workflow Automation

Wildly-Scalable Storage

Data Lake / Data Grid

DATA LAKE

“a storage repository that holds a vast amount of raw data in its native format until it is needed”

DATA LAKE - ORIGINS

First use credited to James Dixon, CTO at Pentaho, circa 2010

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state…”

“The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

DATA LAKE - EXPLAINED

While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
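The flat layout described on this slide can be sketched in a few lines of Python. This is only an illustration, not anything from the talk: the `FlatLake` class, its `put`/`query`/`get` methods, and the tag names are all hypothetical, and the store is an in-memory dictionary rather than real storage.

```python
import uuid


class FlatLake:
    """Toy in-memory lake (hypothetical): no folders, just IDs and metadata tags."""

    def __init__(self):
        self._objects = {}  # object_id -> (raw_bytes, tags)

    def put(self, raw, **tags):
        """Store raw bytes untouched and return a unique identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (raw, dict(tags))
        return object_id

    def query(self, **wanted):
        """Return the IDs whose tags match every requested key/value pair."""
        return [
            object_id
            for object_id, (_, tags) in self._objects.items()
            if all(tags.get(key) == value for key, value in wanted.items())
        ]

    def get(self, object_id):
        """Fetch the raw bytes for one identifier."""
        return self._objects[object_id][0]


lake = FlatLake()
log_id = lake.put(b"GET /index.html 200", source="web", kind="log", year=2014)
csv_id = lake.put(b"id,total\n1,9.50", source="billing", kind="csv", year=2014)
lake.put(b"GET /old 404", source="web", kind="log", year=2013)

# A "business question" narrows the lake to a smaller set before analysis.
web_2014 = lake.query(source="web", year=2014)
```

The query returns identifiers only; the smaller set is then fetched and analyzed separately, which mirrors the two-step flow on the slide.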

DATA LAKE - WHY ???


DATA LAKE CHARACTER

Unwashed Data: schema-on-read from RAW source

Flexible Processing: batch, interactive, online, search

MetaData Dependent: tag it or lose it

Common Access: hdfs-centric toolset
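As a rough illustration of "schema-on-read", the sketch below keeps records in their raw form and applies a schema only when a reader asks for them. The record contents, the `read_with_schema` helper, and the field names are assumptions made up for this example.

```python
import csv
import io
import json

# Raw records land in the lake as-is, each tagged only with its format.
RAW_RECORDS = [
    ("json", '{"user": "ann", "bytes": "512"}'),
    ("csv", "user,bytes\nbob,2048"),
]


def read_with_schema(fmt, raw, schema):
    """Parse raw text at read time, casting fields per the reader's schema."""
    if fmt == "json":
        rows = [json.loads(raw)]
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(raw)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in rows]


# The schema belongs to the reader, not to the stored data.
schema = {"user": str, "bytes": int}
rows = [row for fmt, raw in RAW_RECORDS for row in read_with_schema(fmt, raw, schema)]
total_bytes = sum(row["bytes"] for row in rows)
```

A different reader could apply a different schema to the same raw records without rewriting anything in the lake.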

…in other words: this is not a glass-house Data Mart

A REFERENCE ‘LAKE’ ARCHITECTURE

OPERATIONS | SECURITY | DATA ACCESS | GOVERNANCE | INTEGRATION

DATA MANAGEMENT

A CEPHALOPOD IN THE LAKE?

If this is import…        Use this…
Hadoop-native             HDFS
Locality-aware            HDFS
Distributed Name Svc      Ceph
Native Erasure Coding     Ceph
20% Faster *              Ceph

* on Terasort benchmark over IB, Mar 2014

(LAKE) DREDGERS

DATA GRID

“the unifying layer to how content and data are stored, protected, located and accessed”

DATA GRID - ORIGINS

The need for data grids was first recognized by the scientific community concerning climate modeling, where exchanging PB-size data sets became commonplace. Recently, large-scale instruments such as the Large Hadron Collider (LHC) at CERN are driving grid innovation.

DATA GRID - EXPLAINED

Data Grids present consistent access controls, governance, and metadata extensions to diverse storage media using a common, global interface for access and transport.

Additionally, they offer a ‘micro-service’ architecture for the creation of standard tasks & policies, which are enforced by a distributed “grid control-plane.”
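The two ideas above (one global interface over diverse storage, plus small policy tasks enforced by the grid) can be sketched as follows. This is a single-process toy, not a distributed control-plane, and every name in it (`Grid`, `archive_cold_data`, the `temperature` tag, the backend names) is hypothetical rather than taken from any real grid product.

```python
class Grid:
    """Toy grid (hypothetical): one logical namespace over several backends."""

    def __init__(self, backends):
        self.backends = {name: {} for name in backends}  # backend -> {path: data}
        self.catalog = {}   # logical path -> {"backend": ..., "meta": {...}}
        self.policies = []  # small rule functions run after every put

    def add_policy(self, rule):
        self.policies.append(rule)

    def put(self, path, data, backend, **meta):
        """Store data, record its location and metadata, then apply policies."""
        self.backends[backend][path] = data
        self.catalog[path] = {"backend": backend, "meta": dict(meta)}
        for rule in self.policies:
            rule(self, path)

    def move(self, path, target):
        """Relocate data between backends; the logical path never changes."""
        entry = self.catalog[path]
        data = self.backends[entry["backend"]].pop(path)
        self.backends[target][path] = data
        entry["backend"] = target

    def get(self, path):
        """Read by logical path, wherever the data currently lives."""
        entry = self.catalog[path]
        return self.backends[entry["backend"]][path]


def archive_cold_data(grid, path):
    """Policy: anything tagged cold is moved to the archive backend."""
    entry = grid.catalog[path]
    if entry["meta"].get("temperature") == "cold" and entry["backend"] != "archive":
        grid.move(path, "archive")


grid = Grid(["ceph", "archive"])
grid.add_policy(archive_cold_data)
grid.put("/lab/run-001.dat", b"raw", backend="ceph", temperature="cold")
grid.put("/lab/run-002.dat", b"live", backend="ceph", temperature="hot")
```

Callers keep using the same logical path after the policy has moved the data, which is the point of the common interface.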

DATA GRID - WHY ???

DATA GRID - ATTRIBUTES

Data Virtualization: common presentation of all content

Universe-size Namespace: for files, objects & metadata

Automation of Data Operations: distributed, scalable

Policy Mgmt/Reporting: data valuation & action triggers

CEPH MEETS GRID

implemented:

[Diagram labels: DATA GRID unified namespace; HiSpeed Tier; CephFS & RBD; Ceph libRADOS; Remote Cloud; Cold Storage; Archive; Direct Link; LIBRADOS + Ceph; RBD]

GRID IRON ALL-STARS

(Dan Bedard: danb@renci.org)

TIME 2 SUMMARIZE…

We are in the midst of a Data Explosion

We need robust, expandable, yet simple solutions to store data

We also need effective, de-centralized ways to care for the data

DATA AUTOMATION STACK

Workflow Automation

Wildly-Scalable Storage

Ceph+

the SMART approach

Data Lake / Data Grid

thank you!

Paul Evans principal architect

paul@daystrom.com

san jose ceph days
