EMC Hadoop Starter Kit - ViPR Edition
DESCRIPTION
Are you deploying Hadoop and want enterprise infrastructure manageability, reliability, and availability? The new EMC Hadoop Starter Kit shows you how to do this without building HDFS data silos.
TRANSCRIPT
© Copyright 2014 EMC Corporation. All rights reserved.
EMC Hadoop Starter Kit – ViPR Edition
EMC Open Innovation Lab
The Digital Universe
Less than 1% of the world’s data is analyzed
By 2020, the Internet will connect 7.6B people and 200B things (sensors, machines, cars, appliances…)
Data Volumes
2000: 2 exabytes a year
2011: 2 exabytes a day
Location & Types of Big (& Fast!) Data
[Figure: structured and unstructured data from enterprise, partner, and public sources, including forecast, location, credit, shipping, social/video, and telemetry data]
Hadoop Challenges
Depends on HDFS as its data repository
– Legacy data must be made accessible through HDFS
Hadoop HDFS inefficiencies:
– Three full copies of the data for protection
– No advanced data-efficiency features such as de-duplication or thin provisioning
– Security limitations
Integrating with robust, traditional data center products: compute virtualization, enterprise storage
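The first inefficiency is easy to quantify. A minimal sketch, assuming HDFS's default replication factor of 3 (the `dfs.replication` setting) and ignoring metadata and temporary-space overhead; the 100 TB figure is purely illustrative:

```python
# Sketch: raw capacity HDFS consumes when every block is stored as full replicas.
# Assumes the default replication factor of 3 (dfs.replication); adjust as needed.


def raw_capacity_tb(usable_tb: float, replication: int = 3) -> float:
    """Raw disk (TB) consumed to store `usable_tb` of unique data with N full replicas."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return usable_tb * replication


def storage_efficiency(replication: int = 3) -> float:
    """Fraction of raw capacity that holds unique data (1 / replication)."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return 1.0 / replication


if __name__ == "__main__":
    usable = 100.0  # TB of unique data (illustrative figure, not from the slides)
    print(f"{usable} TB usable -> {raw_capacity_tb(usable)} TB raw")
    print(f"efficiency: {storage_efficiency():.0%}")
```

With three replicas, only about a third of the raw disk holds unique data, which is the overhead the slide refers to.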
Hadoop Storage Options
Hadoop HDFS
• Leverages the Hadoop distribution's own HDFS data services
• Compute and data converged on a cluster of servers
Storage Array
• NameNode and DataNode services provided by a storage array (e.g., EMC Isilon)
Storage OS
• NameNode and DataNode services provided by a storage OS (e.g., EMC ViPR)
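In the storage-array and storage-OS options, the NameNode service no longer runs on the compute cluster, so Hadoop clients must be told where the file system lives. In stock Hadoop this is the `fs.defaultFS` property in `core-site.xml`. A minimal sketch that renders just that one property, assuming a plain `hdfs://` endpoint; the hostname is a hypothetical placeholder and 8020 is the commonly used NameNode RPC port. For ViPR, the exact URI and any client-side libraries should be taken from its deployment guide; they are not assumed here.

```python
# Sketch: render a minimal Hadoop core-site.xml whose fs.defaultFS points at an
# external HDFS endpoint (for example, one served by a storage array).
# The hostname and port used below are hypothetical placeholders.
import xml.etree.ElementTree as ET


def core_site_xml(default_fs: str) -> str:
    """Return core-site.xml text that sets only the fs.defaultFS property."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical endpoint: a storage-array HDFS service reachable by DNS name.
    print(core_site_xml("hdfs://hdfs.storage.example.com:8020"))
```

A real `core-site.xml` usually carries more properties; the point is that moving between the three storage options is, on the client side, largely a matter of where this setting points.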
ViPR HDFS
HDFS is becoming the de facto file system for distributed applications
ViPR is a great platform for HDFS
– Addresses limitations of off-the-shelf HDFS
– Brings HDFS to existing storage hardware
– Enables HDFS, object, and file scenarios
– Flexible software model allows colocation
Support Mixed Workloads
Object, file, and HDFS operations on the same data
[Diagram: a ViPR virtual array spanning Isilon, VNX5500, and third-party storage]
ViPR Data Services offer three bucket options:
– Object
– HDFS
– Object and HDFS
The Object and HDFS option lets users access the same data through either S3 or HDFS
– Full compatibility with existing object-based APIs
▪ Amazon S3, OpenStack Swift, EMC Atmos
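The Object and HDFS option can be pictured as two views over one set of objects: an S3 key such as `logs/day1.txt` and an HDFS path such as `/logs/day1.txt` name the same bytes. The class below is a hypothetical, in-memory toy that illustrates only that idea. Its names (`DualAccessBucket`, `read_path`, `list_dir`) are invented for this sketch, `put_object`/`get_object` merely echo S3 operation names, and the key-to-path mapping (strip the leading slash) is an assumption, not ViPR's actual behavior.

```python
# Toy model (hypothetical, not the ViPR, S3, or Hadoop API): one bucket whose
# objects are written through an S3-style key interface and read back as
# HDFS-style paths. Requires Python 3.9+ for the built-in generic annotations.


class DualAccessBucket:
    """In-memory stand-in for an "Object and HDFS" bucket."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}  # object key -> object bytes

    # Object (S3-style) view
    def put_object(self, key: str, data: bytes) -> None:
        self._objects[key.lstrip("/")] = data

    def get_object(self, key: str) -> bytes:
        return self._objects[key.lstrip("/")]

    # File system (HDFS-style) view over the same stored objects
    def read_path(self, path: str) -> bytes:
        """Treat '/logs/day1.txt' as object key 'logs/day1.txt'."""
        return self._objects[path.lstrip("/")]

    def list_dir(self, path: str) -> list[str]:
        """Return immediate children (files or subdirectories) of a directory path."""
        prefix = path.strip("/")
        prefix = prefix + "/" if prefix else ""
        children = set()
        for key in self._objects:
            if key.startswith(prefix):
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)


if __name__ == "__main__":
    bucket = DualAccessBucket("analytics")
    bucket.put_object("logs/day1.txt", b"hello hadoop")  # written via the "S3" view
    print(bucket.read_path("/logs/day1.txt"))             # read via the "HDFS" view
    print(bucket.list_dir("/logs"))
```

The design point being illustrated is that no copy or export step sits between the two interfaces, which is what avoids a separate HDFS data silo.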
Simple, Easy, Cost-Effective EMC Starter Kit for Hadoop – ViPR Edition
Deployment guides for major Hadoop distributions:
– Pivotal, Cloudera, and Hortonworks
Four-step deployment:
– Deploy your preferred Hadoop distribution
– Deploy EMC ViPR with Object and HDFS data services
– Configure the Hadoop distribution to use ViPR HDFS as its target
– Validate the deployment:
▪ Load a data file through the S3 interface
▪ Run a test MapReduce job
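The slides do not say which MapReduce job the validation step runs. Word count is a common smoke test, so here is a minimal sketch, assuming that choice, of the map, shuffle, and reduce phases in plain Python. It mirrors the data flow of such a job but does not talk to Hadoop, S3, or ViPR; the sample lines stand in for the uploaded data file.

```python
# Sketch of a word-count smoke test expressed as map, shuffle, and reduce
# phases in plain Python. It only mimics the shape of a MapReduce job; a real
# validation run would submit a job to the Hadoop cluster instead.
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated, lower-cased word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the counts for each word, returning words in sorted order."""
    return {word: sum(counts) for word, counts in sorted(groups.items())}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Stand-in for the data file loaded through the S3 interface.
    sample = ["Hadoop on ViPR", "ViPR HDFS and S3", "hadoop"]
    print(word_count(sample))
```

On an actual cluster, the equivalent check would be to run the word-count program from the examples JAR that ships with the Hadoop distribution, reading the file uploaded through S3 via its ViPR HDFS path, and confirm the output looks as expected.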