EMC Big Data Solutions Overview
DESCRIPTION
Overview of emerging EMC Big Data solutions using Hadoop and Splunk
TRANSCRIPT
![Page 1: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/1.jpg)
© Copyright 2014 EMC Corporation. All rights reserved.
EMC Big Data Solutions Overview
![Page 2: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/2.jpg)
Big Data - Why do I care?
Digital universe is expanding rapidly
– 44x to 50x data expansion this decade
– By 2020: 40 ZB (40 trillion GB)
▪ 1.7 MB of new information will be created for each and every human being on the planet, every second of every day
41% growth of IoT, M2M data
– % of data generated about us exploding
– % of data tagged and analyzed exploding
Emerging Markets: +62% of data
– 22% from China alone
IT challenges:
– Servers will increase 10x
– Information directly managed by enterprises will grow 14%
– Data under security governance will grow 40%
– Number of IT professionals is expected to grow by only a factor of 1.5x by 2020
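The unit arithmetic behind the slide's figures can be checked with a short sketch. It assumes decimal (SI) units, which is what makes 40 ZB equal to 40 trillion GB; the function names are illustrative and not from the deck.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: the slide
# uses decimal rather than binary units).
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

SECONDS_PER_DAY = 24 * 60 * 60


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


def per_person_gb_per_day(mb_per_second: float) -> float:
    """Daily volume in GB for one person at a constant rate in MB/s."""
    bytes_per_day = mb_per_second * BYTES_PER_MB * SECONDS_PER_DAY
    return bytes_per_day / BYTES_PER_GB


if __name__ == "__main__":
    # 40 ZB expressed in GB: 40 * 10**12, i.e. 40 trillion GB.
    print(zettabytes_to_gigabytes(40))
    # 1.7 MB per second sustained for a day, in GB (roughly 146.88).
    print(per_person_gb_per_day(1.7))
```

At the quoted rate of 1.7 MB per second, one person accounts for roughly 147 GB per day, which gives a sense of scale for the per-second figure.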
![Page 3: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/3.jpg)
Big Data Challenges for IT
Complexity
– Multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal)
Costs
– Acquisition & Operations
Security & Governance
– Finance SEC 17a-4, HIPAA
– ISO
– Audit
Big Data is more than Hadoop
– Use familiar analytics tools
![Page 4: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/4.jpg)
EMC Hadoop Starter Kit
![Page 5: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/5.jpg)
Simple, Easy, Cost Effective
EMC Starter Kit for Hadoop
Create simplified process to get started with Hadoop:
– 4-8 node cluster
– Automated, repeatable deployment
– Leverage existing infrastructure investment
Success Criteria:
– Low or no new cost
– 2 hour customer deployment
– Make it easy to leverage familiar, robust enterprise infrastructure
![Page 6: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/6.jpg)
EMC Hadoop Starter Kit
EMC-VMware Deployment Guide
– Enable HDFS on Isilon cluster
– Deploy Cloudera compute cluster
– Deploy Hortonworks compute cluster
– Deploy PivotalHD compute cluster
– Deploy Apache compute cluster
– Test data set – Ulysses with MapReduce process
– Collateral available through ECN, blogs, and Twitter
Running deployment in OIL for demos and pilots
EMC vLab created – PivotalHD with VMware, EMC Isilon
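The deployment guide's test step runs a MapReduce process over the text of Ulysses. The deck does not say which job is used; the sketch below assumes the classic word count and imitates its map, shuffle, and reduce phases in plain Python on a single machine. It is not the Hadoop job from the guide, and all function names are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word on one input line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Two made-up sample lines stand in for the real input file.
    sample = ["Yes I said yes", "I will Yes"]
    print(word_count(sample))
```

On a real cluster the same logic would be split across mapper and reducer tasks reading from HDFS; here the three phases run in one process so the data flow is easy to follow.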
![Page 7: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/7.jpg)
EMC Hadoop Starter Kit
How do I get free access to the Hadoop Starter Kit?
• Type “EMC hadoop Starter kit” into Google
• https://community.emc.com/community/connect/everything_big_data
• https://community.emc.com/docs/DOC-26892
• http://theruddyduck.typepad.com/
• https://www.youtube.com/watch?feature=player_embedded&v=MtBRbTeJbZM
• https://www.youtube.com/watch?feature=player_embedded&v=1Lch5e3wGtA
Key Data Sets:
• Close to 4300 views!
• HSK Downloads:
  • Pivotal – 410
  • Cloudera – 261
  • Hortonworks – 275
  • Apache – 310
• Over 150 Isilon HDFS licenses deployed worldwide!
![Page 8: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/8.jpg)
EMC ViPR with HDFS
![Page 9: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/9.jpg)
VCE Vblock™
Turnkey Solution for Big Data and Analytics
SERVER: Cisco Unified Computing System (UCS) servers
NETWORK: Cisco Data Center and Cloud Networking (DCN) portfolio
STORAGE: EMC Symmetrix VMAX, VNX and Isilon
VIRTUALIZATION: VMware vSphere including Big Data Extensions (BDE)
PROTECTION: EMC Avamar, Data Domain, VPLEX, RecoverPoint
![Page 10: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/10.jpg)
Converged Platform for Big Data and Analytics
VCE Vblock™
![Page 11: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/11.jpg)
Big Data Challenges for IT
Complexity
– Multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal)
Costs
– Acquisition & Operations
Security & Governance
– Finance SEC 17a-4, HIPAA
– ISO
– Audit
Big Data is more than Hadoop
– Use familiar analytics tools
![Page 12: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/12.jpg)
Industry’s Most Efficient & Secure Big Data Management Solution
Jyothi Swaroop
Director, Product Marketing & Alliances
![Page 13: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/13.jpg)
RainStor & EMC Isilon Solution & Use-case
Enterprise Data
Analytical Archive: Enterprise Data Warehouse Offload
Compliance Archive: Tape Avoidance/Replacement
First SQL Compatible, Enterprise-grade Database to run on Isilon Scale-out NAS (with Hadoop or not).
![Page 14: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/14.jpg)
RainStor Architecture
![Page 15: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/15.jpg)
Hadoop Data Security
• Authentication – RBAC
• Authorization – ACLs by user
• Encryption – Data at rest
• Audit Trail – logs data access by user for audit
• Immutability – data can never be changed
![Page 16: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/16.jpg)
Big Data Challenges for IT
Complexity
– Multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal)
Costs
– Acquisition & Operations
Security & Governance
– Finance SEC 17a-4, HIPAA
– ISO
– Audit
Big Data is more than Hadoop
– Use familiar analytics tools
![Page 17: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/17.jpg)
Big Data with Splunk
![Page 18: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/18.jpg)
Splunk Company Highlights
Company (SPLK: >100% IPO)
• Founded 2004
• First SW in 2006
• HQ: San Francisco, CA
• AP HQ: Hong Kong
• EMEA HQ: London
• 850+ employees
• 8+ offices WW
Products/Business Model
• On Premise, SaaS or In the Cloud: Licensed by Daily Index Volume
• Free Download 500MB Trial: Same bits scale 500MB > 100s TBs/day
Business Highlights
• 6000+ Customers
• 60+ Fortune 100
• 90+ Countries
![Page 19: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/19.jpg)
Industry Leading Platform for Machine Data
Any Machine Data:
• Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
Operational Intelligence:
• Search and Investigation
• Proactive Monitoring
• Operational Visibility
• Real-time Business Insights
Commodity Servers
EMC Storage
![Page 20: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/20.jpg)
Industry Leading Platform for Machine Data
Any Machine Data:
• Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
Operational Intelligence:
• Search and Investigation
• Proactive Monitoring
• Operational Visibility
• Real-time Business Insights
Commodity Servers
HA Indexes and Storage
• Any amount, any location, any source
• Schema-on-the-fly
• Universal forwarding
• No back-end RDBMS
• No need to filter data
![Page 21: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/21.jpg)
EMC Starter Kit for Splunk
• Splunk is easy to set up and deploy
• Infrastructure for Splunk should be easy and inexpensive
• Use familiar, robust IT infrastructure
• Leverage existing IT investment
• Provide reliable, repeatable, tested solution
How do I get free access to the EMC-Splunk Starter Kit?
• Type “EMC reference architecture for splunk” into Google
• https://community.emc.com/docs/DOC-27406
• Over 1000 views!
![Page 22: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/22.jpg)
Splunk Performance with Shared Storage & Compute
[Charts comparing Isilon, DAS (RAID 10, 6x15k RPM) and EC2. Panels: Time to search (s) and Time to 1st event (s) for a single search; Average EPS (1000s) and Average KBPS (1000s) for a single index. Data labels as extracted: 18.07, 2.499, 3.02, 26.50, 2.48, 20.18; 79,057, 10,944, 37,574, 10,649, 22,400, 38,730.]
![Page 23: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/23.jpg)
Partners Big Data on Vblock
EMC Solutions for Hadoop
Many joint Pivotal on EMC customers
Formal collaboration established
Officially support Isilon
Co-branded HSK for Cloudera
Many joint customers
Enabling Service Providers
HDaaS
Several key wins
Co-branded HSK for Splunk
Many joint customers
Joint support
Jointly architected Vblock for Hadoop with VMware, Cisco, EMC
Several Customer Pilots
Hadoop Wins
Many installed wins with all of the major distributions
Two new case studies:
![Page 24: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/24.jpg)
![Page 25: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/25.jpg)
Why Use Shared Infrastructure for Hadoop?
![Page 26: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/26.jpg)
Hadoop Deployment Models
Combined Storage/Compute (Hadoop in VM)
• VM lifecycle determined by Datanode
• Limited elasticity
• Limited to Hadoop multi-tenancy
Separate Storage
• Separate compute from data
• Elastic compute
• Enable shared workloads
• Raise utilization
Separate Compute Tenants
• Separate virtual clusters per tenant
• Stronger VM-grade security and resource isolation
• Enable deployment of multiple Hadoop runtime versions
![Page 27: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/27.jpg)
Why HDFS on EMC (Isilon) shared storage
• No Ingest necessary
• Eliminate NameNode SPOF
• Eliminate 3x mirroring
• Enterprise feature set
• Multi-protocol access
• Simultaneous Multi-distribution support
• Better cost!
• Smart-Dedupe for Hadoop
• SEC 17a-4 Compliant WORM
• Kerberos Authentication
• Hadoop Multi-tenancy
• Simultaneous Distribution Version Support
• Great performance!
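The "Eliminate 3x mirroring" point can be made concrete with some capacity arithmetic. A minimal sketch: HDFS defaults to a replication factor of 3, so usable capacity is about one third of raw capacity. The 20% protection overhead used for the shared-storage case is a purely illustrative assumption, not a figure from the deck or from Isilon documentation, and the function names are hypothetical.

```python
def usable_with_replication(raw_tb: float, replication_factor: int = 3) -> float:
    """Usable capacity when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


def usable_with_overhead(raw_tb: float, protection_overhead: float) -> float:
    """Usable capacity when a fraction of raw capacity goes to protection.

    protection_overhead is a fraction in [0, 1); the value passed in the
    example below is an illustrative assumption only.
    """
    if not 0 <= protection_overhead < 1:
        raise ValueError("protection_overhead must be in [0, 1)")
    return raw_tb * (1 - protection_overhead)


if __name__ == "__main__":
    raw = 300.0  # TB of raw disk, an arbitrary example size
    print(usable_with_replication(raw))    # 3x mirroring
    print(usable_with_overhead(raw, 0.2))  # assumed 20% protection overhead
```

With these example numbers, 300 TB of raw disk yields 100 TB usable under 3x replication and about 240 TB under the assumed 20% overhead; the real ratio depends on the protection level configured on the storage system.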
![Page 28: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/28.jpg)
Why Virtualize Hadoop?
Operational Simplicity with Performance
• Rapid deployment
• Self-service tools
• Automated resource rebalancing
• Performance
Maximize Resource Utilization on New or Existing Hardware
• True multi-tenancy
• Elastic scaling
• Avoid dedicated hardware
• VM-based isolation
• Increase resource utilization
Architect Scalable and Flexible Big Data Platform
• Choice of distributions and storage
• Maintain management flexibility at scale
• Leverage vSphere features
![Page 29: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/29.jpg)
Performance: Native vs. Virtual, 32 hosts, 16 disks/host
Source: http://www.vmware.com/resources/techresources/10360
![Page 30: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/30.jpg)
© Copyright 2013 Pivotal. All rights reserved.
Pivotal-Isilon Alliance
Federation Plan & Field Momentum
Q4 2013
![Page 31: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/31.jpg)
Pivotal Overview
Data Science Team
▶ Developer-friendly.
▶ Industry leading application framework and runtimes.
▶ Complete & disruptive set of data products.
▶ Services that accelerate productivity.
▶ Multi-cloud deployment.
▶ Commitment to open source & open standards.
One
![Page 32: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/32.jpg)
Revised Color Palette For 2014
White: R 255, G 255, B 255
Black: R 0, G 0, B 0
EMC Blue: R 44, G 149, B 221
Green: R 73, G 169, B 66
VMware Gray: R 113, G 112, B 116
EMC Gray: R 186, G 188, B 190
Red: R 206, G 49, B 49
Pivotal Green: R 0, G 125, B 104
Lt. Blue: R 147, G 197, B 255
![Page 33: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/33.jpg)
![Page 34: EMC Big Data Solutions Overview](https://reader038.vdocuments.us/reader038/viewer/2022110303/54c6381c4a79594e588b45af/html5/thumbnails/34.jpg)