5 Myths about Spark and Big Data by Nik Rouda
TRANSCRIPT
5 Myths about Spark & Big Data (and where it goes next)
Nik Rouda, Senior Analyst, ESG
About ESG
• ESG is an IT analyst, research, validation, and strategy company
• Established in 1999
• 52 employees / 29 analysts, researchers, and consultants
• ESG conducts research with and for IT vendors, IT professionals, business professionals, and channel partners
• Ongoing analyst coverage in Cloud Computing, Cybersecurity, Data Management & Analytics, Data Protection, Storage, Networking, Application Development & Delivery, Enterprise Mobility, and Channels
• Key capabilities include: research & advisory services, custom market research, strategic consulting, technical and economic validation, and custom content
© 2016 by The Enterprise Strategy Group, Inc.
Thanks to our host: Databricks
Winner in the Hadoop & Spark category
http://blog.esg-global.com/the-delta-v-awards
Data-Driven: Enterprise Big Data, BI, and Analytics Trends Redux
Project Overview
• 475 completed online surveys with IT and business professionals who are familiar with their organization’s current big data, database, data warehouse, business intelligence (BI), and/or analytics solutions, as well as forward-looking strategies
• Midmarket organizations (defined as organizations with 100 to 999 employees) and enterprise organizations (defined as organizations with 1,000 or more employees) in North America
• 58% midmarket vs. 42% enterprise
• Multiple industry verticals including financial, business services, manufacturing, information technology, and retail
Respondents by Current Area of Responsibility
Database administrator, 26%
Manager of development or developer of business intelligence/analytics solutions, 23%
Data engineer, 13%
Data analyst, 12%
Data warehouse/business intelligence/analytics manager, 9%
Enterprise or data architect, 7%
Business analyst, 6%
Data scientist, 5%
Which of the following best describes your primary area of responsibility? (Percent of respondents, N=475)
Respondents by Industry
Manufacturing, 15%
Business Services (accounting, consulting, legal, etc.), 15%
Financial (banking, financial, insurance), 15%
Retail/Wholesale, 11%
Information Technology, 10%
Health Care, 7%
Communications & Media, 6%
Government (Federal/National, State/Province/Local), 4%
Other, 19%
What is your organization’s primary industry? (Percent of respondents, N=475)
1. Who is responsible for your big data strategy?
Myth
Big data is…
Just marketing hype regurgitated as “strategic initiatives” by clueless business executives.
Warner Bros., 1984
The Bigger Truth
Big data is…
An opportunity for IT to capture enthusiasm for data-based insights and reshape business activities.
20th Century Fox, 1984
Who is defining projects?
IT applications team, 45%
IT infrastructure and operations team, 41%
Senior IT executives (e.g., CIO, CTO, etc.), 40%
Business analyst/data scientist team, 30%
Senior business executives (e.g., CEO, CFO, etc.), 29%
Operations management, 28%
Information security teams, 26%
Sales management, 20%
Marketing management, 19%
Legal/risk/compliance, 18%
Which of the following groups are initiating new projects in the area of big data and analytics? (Percent of respondents, N=475, multiple responses accepted)
Who is doing the work?
IT infrastructure and operations team, 53%
IT applications team, 52%
Business analyst/data scientist team, 32%
Systems integrator (SI), 29%
Management consultancy, 27%
Business application vendor, 25%
Value-added reseller (VAR), 23%
Service provider, 23%
Which of the following groups provides the skills and manpower to implement and manage the technologies supporting initiatives in the area of big data and analytics? (Percent of respondents, N=475, multiple responses accepted)
What other stakeholders should be engaged?
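Security/risk/governance team: Crucial 45%, Important 40%, Nice-to-have but not required 10%, Completely unnecessary 3%, Don’t know/no opinion 1%
Database/BI/analytics team: Crucial 43%, Important 42%, Nice-to-have but not required 11%, Completely unnecessary 3%, Don’t know/no opinion 1%
Server/virtualization team: Crucial 35%, Important 48%, Nice-to-have but not required 13%, Completely unnecessary 2%, Don’t know/no opinion 2%
Infrastructure/cloud architects: Crucial 33%, Important 46%, Nice-to-have but not required 15%, Completely unnecessary 5%, Don’t know/no opinion 1%
Networking team: Crucial 32%, Important 47%, Nice-to-have but not required 13%, Completely unnecessary 6%, Don’t know/no opinion 1%
Applications team: Crucial 29%, Important 53%, Nice-to-have but not required 12%, Completely unnecessary 4%, Don’t know/no opinion 1%
Storage team: Crucial 29%, Important 48%, Nice-to-have but not required 18%, Completely unnecessary 3%, Don’t know/no opinion 2%
How important is the involvement of the following IT disciplines for new initiatives and projects in the area of big data and analytics to be successful? (Percent of respondents, N=475)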
2. How long will it take for Spark to add value?
Myth
Big data is…
A science project doomed to waste time and money and won’t make a difference
Universal Pictures, 1986
The Bigger Truth
Big data is…
Going to deliver real business value (if not always overnight)
Tristar, 1985
How important are these initiatives?
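Importance of big data analytics projects and initiatives relative to all business priorities:
Our most important priority, 48%
One of our top 5 priorities, 32%
One of our top 10 priorities, 14%
One of our top 20 priorities, 3%
Not among our top 20 priorities, 2%
Don’t know/no opinion, 1%
Importance of big data analytics projects and initiatives relative to all IT priorities:
Our most important priority, 20%
One of our top 5 priorities, 40%
One of our top 10 priorities, 21%
One of our top 20 priorities, 13%
Not among our top 20 priorities, 4%
Don’t know/no opinion, 1%
Relative to all of your organization’s business and IT priorities over the next 12-18 months, how would you rate the importance of its big data analytics projects and initiatives? (Percent of respondents, N=475)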
When will we see business value?
We will see value immediately, 7%
1 month to 6 months, 15%
7 months to 12 months, 35%
13 months to 24 months, 31%
25 months to 36 months, 11%
More than 36 months, 0%
For new initiatives in the area of big data and analytics, how long do you think it will take for your organization to start seeing significant business value? (Percent of respondents, N=475)
3. What matters most in evaluating offerings?
Myth
Big data is…
Defined by 5 Vs and synonymous with Hadoop as a big data lake
20th Century Fox, 1988
The Bigger Truth
Big data is…
Evaluated against traditional enterprise IT operational requirements
Warner Bros., 1983
What matters in a big data solution?
Security, 37%
Performance, 34%
Reliability, 34%
Cost, ROI and/or TCO, 21%
Availability, 19%
Ease of integration with other applications, APIs, systems, etc., 16%
Recoverability, 16%
Scalability, 15%
Agility, 14%
Public cloud hosting options, 13%
Reporting and/or visualization, 12%
Ease of administration, 12%
Open source-based, 11%
Open standards-based, 9%
Which of the following attributes are most important to your organization when considering technology solutions in the area of big data and analytics? (Percent of respondents, N=475, three responses accepted)
Spark Implementation Plans
Already using Spark, 16%
Very interested in Spark, 47%
Somewhat interested in Spark, 27%
Not at all interested in Spark, 4%
Not familiar with Spark, 6%
Don’t know, 1%
How would you rate your organization’s interest in implementing Spark? (Percent of respondents, N=475)
Factors Driving Interest in Spark
Easier to program (in Java, Scala, Python, or R) than using Hadoop, 37%
Streaming analytics, 35%
Ability to combine (with) HDFS, Cassandra, HBase, Hive, Tachyon, and Amazon S3 data, 31%
Deployment flexibility (i.e., can run standalone, in cloud, on Hadoop, Mesos, etc.), 31%
Graph analytics, 30%
SQL querying, 29%
Machine learning, 25%
Faster than MapReduce, 22%
What are the top reasons your organization is interested in Spark? (Percent of respondents, N=426, three responses accepted)
Hadoop Implementation Plans
Already using Hadoop, 20%
Very interested in Hadoop, 37%
Somewhat interested in Hadoop, 27%
Not at all interested in Hadoop, 5%
Not familiar with Hadoop technology, 8%
Don’t know, 3%
How would you rate your organization’s interest in implementing Hadoop? (Percent of respondents, N=475)
Most Important Features for SQL on Hadoop
High performance/low latency, 45%
High concurrency/parallelized, 35%
Reliability with full ACID compliance, 28%
Rich security controls, 28%
Breadth of file types supported (e.g., Parquet, JSON, text, AVRO, Hive, etc.), 26%
Support for complex or nested data types, 24%
Complete ANSI SQL support, 24%
Schema-less or schema-on-read, 24%
Manageable workloads with YARN, 13%
When considering solutions to use SQL on Hadoop, what are the most important features? (Percent of respondents, N=94, three responses accepted)
Disrupting Traditional Data Warehouses
Hadoop will largely replace our existing data warehouse, 26%
Hadoop will offload/optimize our existing data warehouse, 36%
Hadoop will be used only for limited data warehouse-like functions, 28%
No plans to use Hadoop for any data warehouse-like functions, 11%
How do you anticipate Hadoop will fit against your organization’s traditional data warehouse approach? (Percent of respondents, N=94)
4. Why does open source matter to Spark?
Myth
Universal, 1985
Open source is…
An essential element of collaboration, innovation, and customer confidence
The Bigger Truth
Open source is…
An essential element of collaboration, innovation, and customer confidence
Columbia Pictures, 1986
The Importance of Open Source
Critical, 54%
Very important, 41%
Somewhat important, 2%
Not a consideration, 2%
How important is open source support/participation to your choice of Hadoop distribution(s)? (Percent of respondents, N=94)
Will Open Data Platform Add Value?
Yes, 85%
No, 9%
Don't know, 6%
Do you believe the Open Data Platform (a common Hadoop platform partnership between vendors such as Hortonworks, IBM, Pivotal, and others) will add value for Hadoop? (Percent of respondents, N=94)
Type of Distribution Adopters are Evaluating
Pure open source distribution (i.e., Apache Hadoop), 24%
Commercial distribution (e.g., Cloudera, MapR, Hortonworks, etc.), 35%
Hybrid approach (i.e., some open source combined with some commercial distributions), 40%
Don’t know, 1%
Which of the following describes the type of Hadoop distribution(s) your organization is currently evaluating? (Percent of respondents, N=300)
5. Where and how should big data be deployed?
Myth
Big data is…
Best done on-premises with commodity servers
Universal, 1989
The Bigger Truth
Big data is…
Continuing to evolve in many directions, including cloud
Warner Bros., 1984
Primary Deployment Strategy for New Projects
Dedicated on-premises infrastructure (i.e., support a single analytics workload with a single platform), 33%
Shared on-premises infrastructure (i.e., supports mixed analytics workloads on a single platform), 25%
Pre-configured on-premises appliances/systems purpose-built to support analytics workloads (i.e., converged infrastructure), 23%
Public cloud or hosted managed services (i.e., SaaS, IaaS, PaaS), 12%
Hybrid cloud (i.e., on-premises infrastructure and public cloud services), 6%
Don’t know, 1%
In terms of net-new big data and analytics deployments, which of the following best describes the primary deployment strategy your organization will likely use going forward? (Percent of respondents, N=475)
Public Cloud Services in Consideration
Spark, 43%
Databases, 43%
Data warehouses, 41%
Analytics, 39%
Business intelligence, 38%
Hadoop, 34%
None of the above, 4%
For which of the following are you considering public cloud services? (Percent of respondents, N=475, multiple responses accepted)
Advantages for Cloud-based Big Data
Better security than you could guarantee on-premises, 32%
Faster time to value for new projects, 27%
Data sources and applications already cloud-based, 27%
Better availability than you could guarantee on-premises, 27%
Faster time to deploy for new projects, 26%
More frequent feature/functionality updates, 25%
More elasticity (i.e., scaling up and down resources), 24%
Geographic coverage, 21%
Avoid systems integration effort and risk of building…, 20%
Pay-as-you-go (i.e., OpEx vs. CapEx), 16%
Which of the following do you consider to be advantages for cloud-based big data and analytics solutions? (Percent of respondents, N=452, three responses accepted)
Disadvantages for Cloud-based Big Data
Concerns about security, privacy, or governance, 28%
Processing costs/economics, 27%
Storage costs/economics, 26%
Availability/outages, 25%
Vendor lock-in, 23%
Concerns about integration and/or APIs, 23%
Reduced functionality, 21%
Difficulty of data migration and consistency, or ETL to/from cloud, 20%
Unpredictable upgrades to platform, 19%
Proprietary nature of platforms, 19%
Which of the following do you consider to be disadvantages/barriers to adoption for cloud-based big data and analytics solutions? (Percent of respondents, N=452, three responses accepted)
Preferred Model of Cloud-based Services
Infrastructure-level (i.e., IaaS offering servers and storage), 32%
Platform-level (i.e., PaaS with application development tools), 32%
Application-level (i.e., SaaS with built-in analytics, visualization, BI, etc.), 25%
Fully hosted/actively managed environments (i.e., advising, monitoring, troubleshooting), 10%
None of the above, 1%
Which is your organization’s preferred model of cloud-based services? (Percent of respondents, N=452)
Thank You
Enterprise Strategy Group | Getting to the bigger truth.™
http://www.twitter.com/esg-global
http://www.facebook.com/ESGglobal
https://www.linkedin.com/groups?gid=1295607&trk=myg_ugrp_ovr
http://www.youtube.com/user/ESGglobal
FOLLOW ESG
Nik Rouda, [email protected]
THANK YOU. Nik Rouda, Senior Analyst
@nrouda