From Archive to Insight: Debunking Myths of Analytics on Object Stores

Upload: dean-hildebrand

Post on 05-Aug-2015

Category: Technology

Page 1: From archive to insight  debunking myths of analytics on object stores

From Archive to Insight: Debunking Myths of Analytics on Object Stores

Page 2

Dean Hildebrand, IBM Research
Bill Owen, IBM
Simon Lorenz, IBM
Rui Zhang, IBM Research
Luis Pabon, Red Hat Storage

Page 3

Myth #1
“Data Must Migrate from Swift to HDFS (and back again)”

Page 4

Myth #2
“Swift should only be used with in-memory analytics (Spark)”

Page 5

Myth #3
“Swift cannot efficiently support frameworks such as Hive and HBase that require appending to a file”

Page 6

Myth #4
“Object Stores are slow for analytics”

● Load Imbalance
● Unnecessary Data Movement
● HTTP vs. RPC
● Writing Through Proxy
● Authentication

Page 7

These may be true for Swift... but Swift-on-File debunks the myths

Page 8

Demo Now

Page 9

So What Happened There...

Page 10

Typical Hadoop+HDFS Setup

All Standard Apache Open-Source Components:
MapReduce, Spark, HBase, ZooKeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive

Hadoop FS API

HDFS

Create ⟹ Copy In ⟹ Analyze ⟹ Copy Out

4 Steps: Move data twice

Page 11

Now replace HDFS with Scale-out FS

All Standard Apache Open-Source Components:
MapReduce, Spark, HBase, ZooKeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive

Hadoop FS API

IBM Spectrum Scale or GlusterFS Hadoop Connector

Scale-out File System

Page 12

Now add in Swift access

All Standard Apache Open-Source Components:
MapReduce, Spark, HBase, ZooKeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive

Hadoop FS API

IBM Spectrum Scale or GlusterFS Hadoop Connector

Swift (Data Ingest and Results Distribution): Proxy, Object (DiskFile)

Scale-out File System

Page 13

Now configure Swift-on-File

All Standard Apache Open-Source Components:
MapReduce, Spark, HBase, ZooKeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive

Hadoop FS API

IBM Spectrum Scale or GlusterFS Hadoop Connector

Swift (Data Ingest and Results Distribution): Proxy, Object (SwiftOnFile DiskFile)

Swift-on-File Policy

Scale-out File System

Create ⟹ Analyze

2 Steps: Never move data

Page 14

Swift-on-File

● Swift Storage Policy
● Stores objects on any scale-out file system
● Allows objects created using the Swift API to be accessed as files
● Maps URL to file path

Page 15

Swift-on-File Storage Policy

Page 16

This object:
http://swift.example.com/v1/acct/cont/obj

is located here:
/mnt/swift/z1device7/objects/63773/ba2/f91d1e7550cd32822a17b00fa86d9ba2/1414045361.93852.data
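
The hashed path on this slide follows Swift's default on-disk layout. The sketch below rebuilds that path from the object's hash. The function name is hypothetical, and the partition power of 16 is an assumption inferred from the partition number 63773 in the path. The hash is taken as an input because it depends on a per-cluster secret that the slide does not show.

```python
def hashed_object_path(device_root, obj_hash, timestamp, part_power=16):
    """Rebuild a default Swift object path from an already-computed hash."""
    # The hash is an MD5 of the /account/container/object name salted with
    # a per-cluster secret, so it cannot be recomputed here.
    # Partition: the top `part_power` bits of the first 4 bytes of the hash.
    # part_power=16 is an assumption inferred from the slide's path.
    partition = int(obj_hash[:8], 16) >> (32 - part_power)
    # Suffix directory: the last three hex characters of the hash.
    suffix = obj_hash[-3:]
    return f"{device_root}/objects/{partition}/{suffix}/{obj_hash}/{timestamp}.data"


path = hashed_object_path(
    "/mnt/swift/z1device7",
    "f91d1e7550cd32822a17b00fa86d9ba2",
    "1414045361.93852",
)
print(path)
```

Nothing in this path reveals the account, container, or object name, so a file-based application has no practical way to locate the data.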

Page 17

Now, this object:
http://swift.example.com/v1/acct/cont/obj

is located here:
/mnt/scaleout_fs/acct/cont/obj
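
The URL-to-path mapping on this slide can be sketched in a few lines. This is only an illustration of the idea, not Swift-on-File's actual code: the function name is hypothetical, the mount point comes from the slide, and the sketch does no path sanitization.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def swift_url_to_file_path(url, mount_point="/mnt/scaleout_fs"):
    """Map /v1/<account>/<container>/<object> to a path under mount_point."""
    # Split into at most four parts so slashes inside the object name survive.
    parts = unquote(urlsplit(url).path).lstrip("/").split("/", 3)
    if len(parts) != 4 or parts[0] != "v1" or not all(parts):
        raise ValueError(f"not a Swift object URL: {url}")
    _version, account, container, obj = parts
    return posixpath.join(mount_point, account, container, obj)


print(swift_url_to_file_path("http://swift.example.com/v1/acct/cont/obj"))
```

Because the account, container, and object names appear directly in the path, the same data is reachable through the Swift API and through ordinary file access.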

Page 18

Myth: Data Must Migrate from Object Store to HDFS
BUSTED
Reality: Analyze in Place!

Page 19

Myth: Object Stores should only be used with in-memory analytics (Spark)
BUSTED
Reality: Support the entire Apache analytics ecosystem with high performance

Page 20

Myth: Object Stores cannot efficiently support frameworks such as Hive and HBase that require appending to a file
BUSTED
Reality: Support all POSIX operations, including append
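
As a rough illustration of what append support means in practice: once an object is exposed as an ordinary file, a file-based framework can append to it with standard POSIX calls instead of rewriting the whole object. The sketch uses a temporary directory as a stand-in for the scale-out file system mount. The acct/cont/obj layout follows the earlier slide, and the helper name is hypothetical.

```python
import os
import tempfile


def append_record(path, record):
    """Append bytes to the end of a file, creating the file if needed."""
    # Mode "ab" opens the file in append mode, so each write lands at the end.
    with open(path, "ab") as f:
        f.write(record)


# A temporary directory stands in for the file system mount (assumption).
with tempfile.TemporaryDirectory() as mount_point:
    obj_dir = os.path.join(mount_point, "acct", "cont")
    os.makedirs(obj_dir)
    obj_path = os.path.join(obj_dir, "obj")
    append_record(obj_path, b"row-1\n")
    append_record(obj_path, b"row-2\n")
    with open(obj_path, "rb") as f:
        contents = f.read()

print(contents)
```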

Page 21

Myth: Object Stores are slow for analytics
BUSTED
Reality: Scale-out File System can match HDFS features and performance

Page 22

© 2015 IBM Corporation

Swift-on-File Additional Use Cases: Scientific Analysis and Collaboration

Scale-out File System, accessed through Swift, NAS, and POSIX

● Generate scientific datasets through the file system
● Share data with the global scientific community

Page 23

Swift-on-File Additional Use Cases: Support General File-based Applications

Scale-out File System, accessed through Swift, NAS, and POSIX

● Global ingest and access
● In-place editing

Page 24

Future Plans

- Single Swift Proxy/Object process optimization
- Eliminating data movement in Scale-out File Systems
- Auditor support
- Multi-region support
- Load balancing of auxiliary Swift services across the scale-out file system

Page 25

Summary

➔ Gain insights faster
➔ Stop copying data
➔ High-performance analysis
➔ Leverage entire Apache analytics ecosystem

Page 26

Q&A

Page 27

Credits

Special thanks to all the people who made and released these awesome resources for free:
▷ Presentation template by SlidesCarnival
▷ Photographs by Unsplash

Page 28

● Load imbalance due to lack of auto-segmentation
● Rename causes data movement in Swift
● Use of HTTP vs. RPCs in HDFS
● New objects must write through Proxy servers
● Authentication overhead
