From Archive to Insight: Debunking Myths of Analytics on Object Stores
Dean Hildebrand, IBM Research
Bill Owen, IBM
Simon Lorenz, IBM
Rui Zhang, IBM Research
Luis Pabon, Red Hat Storage
Myth #1: “Data Must Migrate from Swift to HDFS (and back again)”
Myth #2: “Swift should only be used with in-memory analytics (Spark)”
Myth #3: “Swift cannot efficiently support frameworks such as Hive and HBase that require appending to a file”
Myth #4: “Object Stores are slow for analytics”
Load Imbalance
Unnecessary Data Movement
HTTP vs. RPC
Writing Through Proxy
Authentication
These may be true for Swift... but Swift-on-File debunks the myths
Demo Now
So What Happened There...
Typical Hadoop+HDFS Setup
All Standard Apache Open-Source Components: MapReduce, Spark, HBase, Zookeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive
Hadoop FS API
HDFS
Create ⟹ Copy in ⟹ Analyze ⟹ Copy out
4 steps: move data twice
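The four-step workflow can be sketched with local directories standing in for the object store and HDFS. This is a hypothetical illustration, not the authors' setup: the directory names and the word-count "analysis" are assumptions. It only makes the cost visible: the dataset is copied in and the results are copied out, so data moves twice.

```python
import shutil
import tempfile
from collections import Counter
from pathlib import Path


def copy_tree(src: Path, dst: Path) -> int:
    """Copy every file under src into dst; return the number of bytes copied."""
    copied = 0
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(f, target)
            copied += f.stat().st_size
    return copied


def word_count(root: Path) -> Counter:
    """Toy analysis step: count words across all files under root."""
    counts: Counter = Counter()
    for f in root.rglob("*"):
        if f.is_file():
            counts.update(f.read_text().split())
    return counts


def copy_based_pipeline(object_store: Path, hdfs: Path) -> int:
    """Copy in -> Analyze -> Copy out; return the total bytes moved between stores."""
    moved = copy_tree(object_store / "input", hdfs / "input")     # copy in
    counts = word_count(hdfs / "input")                            # analyze
    out = hdfs / "output" / "part-00000"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(f"{w}\t{n}\n" for w, n in sorted(counts.items())))
    moved += copy_tree(hdfs / "output", object_store / "output")  # copy out
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store, hdfs = Path(tmp, "swift"), Path(tmp, "hdfs")
        (store / "input").mkdir(parents=True)
        (store / "input" / "a.txt").write_text("to be or not to be\n")  # "Create"
        print("bytes moved:", copy_based_pipeline(store, hdfs))
```

The returned byte count grows with both the input and the output size; that copying is exactly what the Swift-on-File setup later removes.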
Now replace HDFS with Scale-out FS
All Standard Apache Open-Source Components: MapReduce, Spark, HBase, Zookeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive
Hadoop FS API
IBM Spectrum Scale or GlusterFS Hadoop Connector
Scale-out File System
Now add in Swift access
All Standard Apache Open-Source Components: MapReduce, Spark, HBase, Zookeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive
Hadoop FS API
IBM Spectrum Scale or GlusterFS Hadoop Connector
Scale-out File System
Swift (Proxy; Object server with DiskFile) for data ingest and results distribution
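Ingest in this setup goes through the Swift proxy using the ordinary Swift object API (an HTTP PUT to /v1/<account>/<container>/<object>). A minimal standard-library sketch, assuming you already hold an auth token; the endpoint, account, container and token values are placeholders, and the helper name build_object_put is hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_object_put(endpoint: str, account: str, container: str, obj: str,
                     data: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a Swift API PUT that creates or overwrites an object."""
    # Path segments are URL-quoted; '/' inside the object name is kept as-is.
    url = f"{endpoint}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"X-Auth-Token": token,
                 "Content-Type": "application/octet-stream"},
    )
```

Passing the request to urllib.request.urlopen() would send it to the proxy; it is built separately here so the request can be inspected without a live cluster.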
Now configure Swift-on-File
All Standard Apache Open-Source Components: MapReduce, Spark, HBase, Zookeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive
Hadoop FS API
IBM Spectrum Scale or GlusterFS Hadoop Connector
Scale-out File System
Swift (Proxy; Object server with SwiftOnFile DiskFile under the Swift-on-File policy) for data ingest and results distribution
Create ⟹ Analyze
2 steps: never move data
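With Swift-on-File the workflow shrinks to Create ⟹ Analyze: the analysis reads the objects directly from the scale-out file system path they were written to. A hypothetical sketch, where a temporary directory stands in for a container directory such as /mnt/scaleout_fs/acct/cont and word counting stands in for a real Hadoop or Spark job.

```python
from collections import Counter
from pathlib import Path


def analyze_in_place(container_dir: Path) -> Counter:
    """Word-count every object file where Swift-on-File stored it; nothing is copied."""
    counts: Counter = Counter()
    for f in sorted(container_dir.rglob("*")):
        if f.is_file():
            counts.update(f.read_text().split())
    return counts
```

Because the job opens the same files the Swift API wrote, there is no copy-in or copy-out step and no second copy of the data to keep in sync.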
Swift-on-File
● A Swift storage policy
● Stores objects on any scale-out file system
● Allows objects created using the Swift API to be accessed as files
● Maps the object URL to a file path
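The URL-to-path mapping can be sketched as a pure function. This assumes the <mount_point>/<account>/<container>/<object> layout used in this deck's example (/mnt/scaleout_fs); the function name is hypothetical and this is not Swift-on-File's actual code.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def swift_on_file_path(url: str, mount_point: str) -> str:
    """Map a Swift object URL to its file path under a Swift-on-File mount.

    Assumed layout: <mount_point>/<account>/<container>/<object>.
    Object names containing '/' become subdirectories. This sketch does not
    guard against '..' segments in the object name.
    """
    path = unquote(urlsplit(url).path)
    parts = path.split("/", 4)  # ['', 'v1', account, container, object]
    if len(parts) != 5 or parts[0] or parts[1] != "v1" or not all(parts[2:]):
        raise ValueError(f"not a Swift object URL: {url}")
    _, _, account, container, obj = parts
    return posixpath.join(mount_point, account, container, obj)
```

For example, swift_on_file_path("http://swift.example.com/v1/acct/cont/obj", "/mnt/scaleout_fs") returns "/mnt/scaleout_fs/acct/cont/obj".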
Swift-on-File Storage Policy
With a standard Swift policy, this object: http://swift.example.com/v1/acct/cont/obj
is located here: /mnt/swift/z1device7/objects/63773/ba2/f91d1e7550cd32822a17b00fa86d9ba2/1414045361.93852.data
With Swift-on-File, this object: http://swift.example.com/v1/acct/cont/obj
is located here: /mnt/scaleout_fs/acct/cont/obj
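For contrast, the hashed path above follows Swift's default on-disk layout: objects/<partition>/<last 3 hex chars of the name hash>/<name hash>/<timestamp>.data, where the hash is an MD5 of "/account/container/object" wrapped in the cluster's hash path prefix and suffix from swift.conf, and the partition comes from the top bits of that digest. A simplified sketch; the prefix, suffix and partition power below are placeholders, so it will not reproduce the exact path on the slide.

```python
import hashlib
import posixpath


def swift_object_path(device_root: str, account: str, container: str, obj: str,
                      timestamp: str, part_power: int,
                      hash_prefix: bytes = b"", hash_suffix: bytes = b"") -> str:
    """Simplified sketch of Swift's default (hashed) location for an object."""
    name = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(hash_prefix + name + hash_suffix)
    name_hash = digest.hexdigest()
    # The ring partition is the top `part_power` bits of the first 4 digest bytes.
    partition = int.from_bytes(digest.digest()[:4], "big") >> (32 - part_power)
    return posixpath.join(device_root, "objects", str(partition),
                          name_hash[-3:], name_hash, f"{timestamp}.data")
```

The hashed path spreads objects evenly across devices but is opaque to file-based tools, which is why Swift-on-File replaces it with the readable account/container/object path.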
Myth: Data Must Migrate from Object Store to HDFS
BUSTED
Reality: Analyze in Place!
Myth: Object Stores should only be used with in-memory analytics (Spark)
BUSTED
Reality: Support the entire Apache analytics ecosystem with high performance
Myth: Object Stores cannot efficiently support frameworks such as Hive and HBase that require appending to a file
BUSTED
Reality: Support all POSIX operations, including append
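Why append matters: through the plain Swift object API there is no in-place append, since an object is replaced by a new PUT. On a POSIX scale-out file system the same data is a regular file that can be appended to in place, which is what log-structured writers such as HBase rely on. A minimal sketch, assuming the object's file is reachable at a mounted path; the helper name append_record is hypothetical.

```python
import os


def append_record(path: str, record: bytes) -> None:
    """Append one record to a file in place (creating it if needed)."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

Only the new bytes are written; with the object API the client would instead have to upload the whole object again.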
Myth: Object Stores are slow for analytics
BUSTED
Reality: A scale-out file system can match HDFS features and performance
© 2015 IBM Corporation
Swift-on-File Additional Use Cases: Scientific Analysis and Collaboration
Scale-out File System, Swift, NAS
● Generate scientific datasets through the file system (POSIX)
● Share data with the global scientific community
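For this use case the mapping runs in the other direction: a dataset written through the file system is shared by its Swift URL. A hypothetical sketch, assuming files written through POSIX are exposed under the same <mount_point>/<account>/<container>/<object> layout; the mount point, endpoint and function name are placeholders.

```python
import posixpath
from urllib.parse import quote


def file_path_to_swift_url(path: str, mount_point: str, endpoint: str) -> str:
    """Return the Swift URL of a file stored under a Swift-on-File mount.

    Assumed layout: <mount_point>/<account>/<container>/<object>.
    """
    rel = posixpath.relpath(posixpath.normpath(path), mount_point)
    parts = rel.split("/", 2)
    if rel == ".." or rel.startswith("../") or len(parts) != 3:
        raise ValueError(f"{path} is not an object under {mount_point}")
    account, container, obj = parts
    return f"{endpoint}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"
```

A file written at /mnt/scaleout_fs/acct/cont/run1/data.h5 would then be shared as http://swift.example.com/v1/acct/cont/run1/data.h5.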
Swift-on-File Additional Use Cases: Support General File-based Applications
Scale-out File System, Swift, NAS
● Global ingest and access
● In-place editing (POSIX)
Future Plans
- Single Swift Proxy/Object process optimization
- Eliminating data movement in Scale-out File Systems
- Auditor support
- Multi-region support
- Load balancing of auxiliary Swift services across the scale-out file system
Summary
➔ Gain insights faster
➔ Stop copying data
➔ High-performance analysis
➔ Leverage entire Apache analytics ecosystem
Q&A
Credits
Special thanks to all the people who made and released these awesome resources for free:
▷ Presentation template by SlidesCarnival
▷ Photographs by Unsplash
● Load imbalance due to lack of auto-segmentation
● Rename causes data movement in Swift
● Swift's use of HTTP vs. HDFS's use of RPCs
● New objects must be written through proxy servers
● Authentication overhead
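The rename point comes from the object API having no rename operation: a client emulates it with a server-side copy (a PUT carrying an X-Copy-From header) followed by a DELETE of the original, so the object's bytes are copied. On a POSIX file system, rename only updates metadata. A sketch with placeholder endpoint, account and token values; the function names are hypothetical and the requests are built but not sent.

```python
import os
import urllib.request
from urllib.parse import quote


def swift_rename_requests(endpoint: str, account: str, container: str,
                          src: str, dst: str, token: str):
    """Build the two Swift API calls that emulate a rename: copy, then delete."""
    base = f"{endpoint}/v1/{quote(account)}/{quote(container)}"
    headers = {"X-Auth-Token": token}
    copy = urllib.request.Request(  # server-side copy: data is rewritten
        f"{base}/{quote(dst)}", data=b"", method="PUT",
        headers={**headers, "X-Copy-From": f"/{quote(container)}/{quote(src)}"})
    delete = urllib.request.Request(f"{base}/{quote(src)}", method="DELETE",
                                    headers=headers)
    return copy, delete


def posix_rename(src_path: str, dst_path: str) -> None:
    """On a POSIX file system, rename is a metadata operation; no data is copied."""
    os.rename(src_path, dst_path)
```

Frameworks that commit output by renaming temporary files benefit directly from the POSIX path.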