5 Myths about Spark & Big Data (and where it goes next) Nik Rouda Senior Analyst, ESG

Upload: spark-summit

Post on 08-Jan-2017


TRANSCRIPT

Page 1: 5 Myths about Spark and Big Data by Nik Rouda

5 Myths about Spark & Big Data (and where it goes next)

Nik Rouda, Senior Analyst, ESG

Page 2: 5 Myths about Spark and Big Data by Nik Rouda

About ESG

• ESG is an IT analyst, research, validation, and strategy company

• Established in 1999

• 52 employees / 29 analysts, researchers, and consultants

• ESG conducts research with and for IT vendors, IT professionals, business professionals, and channel partners

• Ongoing analyst coverage in Cloud Computing, Cybersecurity, Data Management & Analytics, Data Protection, Storage, Networking, Application Development & Delivery, Enterprise Mobility, and Channels

• Key capabilities include: research & advisory services, custom market research, strategic consulting, technical and economic validation, and custom content

Page 3: 5 Myths about Spark and Big Data by Nik Rouda

© 2016 by The Enterprise Strategy Group, Inc.

Thanks to our host: Databricks

Winner in the Hadoop & Spark category

http://blog.esg-global.com/the-delta-v-awards

Page 4: 5 Myths about Spark and Big Data by Nik Rouda


Data-Driven: Enterprise Big Data, BI, and Analytics Trends Redux

Page 5: 5 Myths about Spark and Big Data by Nik Rouda


Project Overview

• 475 completed online surveys with IT and business professionals who are familiar with their organization’s current big data, database, data warehouse, business intelligence (BI), and/or analytics solutions, as well as forward-looking strategies

• Midmarket organizations (defined as organizations with 100 to 999 employees) and enterprise organizations (defined as organizations with 1,000 or more employees) in North America

• 58% midmarket vs. 42% enterprise

• Multiple industry verticals including financial, business services, manufacturing, information technology, and retail

Page 6: 5 Myths about Spark and Big Data by Nik Rouda

Respondents by Current Area of Responsibility

• Database administrator, 26%
• Manager of development or developer of business intelligence/analytics solutions, 23%
• Data engineer, 13%
• Data analyst, 12%
• Data warehouse/business intelligence/analytics manager, 9%
• Enterprise or data architect, 7%
• Business analyst, 6%
• Data scientist, 5%

Which of the following best describes your primary area of responsibility? (Percent of respondents, N=475)

Page 7: 5 Myths about Spark and Big Data by Nik Rouda

Respondents by Industry

• Manufacturing, 15%
• Business Services (accounting, consulting, legal, etc.), 15%
• Financial (banking, financial, insurance), 15%
• Retail/Wholesale, 11%
• Information Technology, 10%
• Health Care, 7%
• Communications & Media, 6%
• Government (Federal/National, State/Province/Local), 4%
• Other, 19%

What is your organization’s primary industry? (Percent of respondents, N=475)

Page 8: 5 Myths about Spark and Big Data by Nik Rouda


1. Who is responsible for your big data strategy?

Page 9: 5 Myths about Spark and Big Data by Nik Rouda


Myth

Big data is…

Just marketing hype regurgitated as “strategic initiatives” by clueless business executives.

Warner Bros., 1984

Page 10: 5 Myths about Spark and Big Data by Nik Rouda

The Bigger Truth

Big data is…

An opportunity for IT to capture enthusiasm for data-based insights and reshape business activities.

20th Century Fox, 1984

Page 11: 5 Myths about Spark and Big Data by Nik Rouda

Who is defining projects?

• Legal/risk/compliance, 18%
• Marketing management, 19%
• Sales management, 20%
• Information security teams, 26%
• Operations management, 28%
• Senior business executives (e.g., CEO, CFO, etc.), 29%
• Business analyst/data scientist team, 30%
• Senior IT executives (e.g., CIO, CTO, etc.), 40%
• IT infrastructure and operations team, 41%
• IT applications team, 45%

Which of the following groups are initiating new projects in the area of big data and analytics? (Percent of respondents, N=475, multiple responses accepted)

Page 12: 5 Myths about Spark and Big Data by Nik Rouda

Who is doing the work?

• Service provider, 23%
• Value-added reseller (VAR), 23%
• Business application vendor, 25%
• Management consultancy, 27%
• Systems integrator (SI), 29%
• Business analyst/data scientist team, 32%
• IT applications team, 52%
• IT infrastructure and operations team, 53%

Which of the following groups provides the skills and manpower to implement and manage the technologies supporting initiatives in the area of big data and analytics? (Percent of respondents, N=475, multiple responses accepted)

Page 13: 5 Myths about Spark and Big Data by Nik Rouda

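What other stakeholders should be engaged?

• Storage team: Crucial 29%, Important 48%, Nice-to-have, but not required 18%, Completely unnecessary 3%, Don’t know/no opinion 2%
• Applications team: Crucial 29%, Important 53%, Nice-to-have, but not required 12%, Completely unnecessary 4%, Don’t know/no opinion 1%
• Networking team: Crucial 32%, Important 47%, Nice-to-have, but not required 13%, Completely unnecessary 6%, Don’t know/no opinion 1%
• Infrastructure/cloud architects: Crucial 33%, Important 46%, Nice-to-have, but not required 15%, Completely unnecessary 5%, Don’t know/no opinion 1%
• Server/virtualization team: Crucial 35%, Important 48%, Nice-to-have, but not required 13%, Completely unnecessary 2%, Don’t know/no opinion 2%
• Database/BI/analytics team: Crucial 43%, Important 42%, Nice-to-have, but not required 11%, Completely unnecessary 3%, Don’t know/no opinion 1%
• Security/risk/governance team: Crucial 45%, Important 40%, Nice-to-have, but not required 10%, Completely unnecessary 3%, Don’t know/no opinion 1%

How important is the involvement of the following IT disciplines for new initiatives and projects in the area of big data and analytics to be successful? (Percent of respondents, N=475)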

Page 14: 5 Myths about Spark and Big Data by Nik Rouda


2. How long will it take for Spark to add value?

Page 15: 5 Myths about Spark and Big Data by Nik Rouda

Myth

Big data is…

A science project doomed to waste time and money and won’t make a difference

Universal Pictures, 1986

Page 16: 5 Myths about Spark and Big Data by Nik Rouda

The Bigger Truth

Big data is…

Going to deliver real business value (if not always overnight)

Tristar, 1985

Page 17: 5 Myths about Spark and Big Data by Nik Rouda

How important are these initiatives?

Response options, in order: Our most important priority; One of our top 5 priorities; One of our top 10 priorities; One of our top 20 priorities; Not among our top 20 priorities; Don’t know/no opinion

First chart: 48%, 32%, 14%, 3%, 2%, 1%

Second chart: 20%, 40%, 21%, 13%, 4%, 1%

Chart titles: Importance of big data analytics projects and initiatives relative to all business priorities; Importance of big data analytics projects and initiatives relative to all IT priorities

Relative to all of your organization’s business and IT priorities over the next 12-18 months, how would you rate the importance of its big data analytics projects and initiatives? (Percent of respondents, N=475)

Page 18: 5 Myths about Spark and Big Data by Nik Rouda

When will we see business value?

• We will see value immediately, 7%
• 1 month to 6 months, 15%
• 7 months to 12 months, 35%
• 13 months to 24 months, 31%
• 25 months to 36 months, 11%
• More than 36 months, 0%

For new initiatives in the area of big data and analytics, how long do you think it will take for your organization to start seeing significant business value? (Percent of respondents, N=475)

Page 19: 5 Myths about Spark and Big Data by Nik Rouda


3. What matters most in evaluating offerings?

Page 20: 5 Myths about Spark and Big Data by Nik Rouda

Myth

Big data is…

Defined by 5 Vs and synonymous with Hadoop as a big data lake

20th Century Fox, 1988

Page 21: 5 Myths about Spark and Big Data by Nik Rouda

The Bigger Truth

Big data is…

Evaluated against traditional enterprise IT operational requirements

Warner Bros., 1983

Page 22: 5 Myths about Spark and Big Data by Nik Rouda

What matters in a big data solution?

• Open standards-based, 9%
• Open source-based, 11%
• Ease of administration, 12%
• Reporting and/or visualization, 12%
• Public cloud hosting options, 13%
• Agility, 14%
• Scalability, 15%
• Recoverability, 16%
• Ease of integration with other applications, APIs, systems, etc., 16%
• Availability, 19%
• Cost, ROI and/or TCO, 21%
• Reliability, 34%
• Performance, 34%
• Security, 37%

Which of the following attributes are most important to your organization when considering technology solutions in the area of big data and analytics? (Percent of respondents, N=475, three responses accepted)

Page 23: 5 Myths about Spark and Big Data by Nik Rouda

Spark Implementation Plans

• Already using Spark, 16%
• Very interested in Spark, 47%
• Somewhat interested in Spark, 27%
• Not at all interested in Spark, 4%
• Not familiar with Spark, 6%
• Don’t know, 1%

How would you rate your organization’s interest in implementing Spark? (Percent of respondents, N=475)

Page 24: 5 Myths about Spark and Big Data by Nik Rouda

Factors Driving Interest in Spark

• Faster than MapReduce, 22%
• Machine learning, 25%
• SQL querying, 29%
• Graph analytics, 30%
• Deployment flexibility (i.e., can run standalone, in cloud, on Hadoop, Mesos, etc.), 31%
• Ability to combine (with) HDFS, Cassandra, HBase, Hive, Tachyon, and Amazon S3 data, 31%
• Streaming analytics, 35%
• Easier to program (in Java, Scala, Python, or R) than using Hadoop, 37%

What are the top reasons your organization is interested in Spark? (Percent of respondents, N=426, three responses accepted)

Page 25: 5 Myths about Spark and Big Data by Nik Rouda

Hadoop Implementation Plans

• Already using Hadoop, 20%
• Very interested in Hadoop, 37%
• Somewhat interested in Hadoop, 27%
• Not at all interested in Hadoop, 5%
• Not familiar with Hadoop technology, 8%
• Don’t know, 3%

How would you rate your organization’s interest in implementing Hadoop? (Percent of respondents, N=475)

Page 26: 5 Myths about Spark and Big Data by Nik Rouda

Most Important Features for SQL on Hadoop

• Manageable workloads with YARN, 13%
• Schema-less or schema-on-read, 24%
• Complete ANSI SQL support, 24%
• Support for complex or nested data types, 24%
• Breadth of file types supported (e.g., Parquet, JSON, text, AVRO, Hive, etc.), 26%
• Rich security controls, 28%
• Reliability with full ACID compliance, 28%
• High concurrency/parallelized, 35%
• High performance/low latency, 45%

When considering solutions to use SQL on Hadoop, what are the most important features? (Percent of respondents, N=94, three responses accepted)

Page 27: 5 Myths about Spark and Big Data by Nik Rouda

Disrupting Traditional Data Warehouses

• Hadoop will largely replace our existing data warehouse, 26%
• Hadoop will offload/optimize our existing data warehouse, 36%
• Hadoop will be used only for limited data warehouse-like functions, 28%
• No plans to use Hadoop for any data warehouse-like functions, 11%

How do you anticipate Hadoop will fit against your organization’s traditional data warehouse approach? (Percent of respondents, N=94)

Page 28: 5 Myths about Spark and Big Data by Nik Rouda


4. Why does open source matter to Spark?

Page 29: 5 Myths about Spark and Big Data by Nik Rouda

Myth

Universal, 1985

Open source is…

An essential element of collaboration, innovation, and customer confidence

Page 30: 5 Myths about Spark and Big Data by Nik Rouda

The Bigger Truth

Open source is…

An essential element of collaboration, innovation, and customer confidence

Columbia Pictures, 1986

Page 31: 5 Myths about Spark and Big Data by Nik Rouda

The Importance of Open Source

• Critical, 54%
• Very important, 41%
• Somewhat important, 2%
• Not a consideration, 2%

How important is open source support/participation to your choice of Hadoop distribution(s)? (Percent of respondents, N=94)

Page 32: 5 Myths about Spark and Big Data by Nik Rouda

Will Open Data Platform Add Value?

• Yes, 85%
• No, 9%
• Don’t know, 6%

Do you believe the Open Data Platform (a common Hadoop platform partnership between vendors such as Hortonworks, IBM, Pivotal, and others) will add value for Hadoop? (Percent of respondents, N=94)

Page 33: 5 Myths about Spark and Big Data by Nik Rouda

Type of Distribution Adopters are Evaluating

• Pure open source distribution (i.e., Apache Hadoop), 24%
• Commercial distribution (e.g., Cloudera, MapR, Hortonworks, etc.), 35%
• Hybrid approach (i.e., some open source combined with some commercial distributions), 40%
• Don’t know, 1%

Which of the following describes the type of Hadoop distribution(s) your organization is currently evaluating? (Percent of respondents, N=300)

Page 34: 5 Myths about Spark and Big Data by Nik Rouda


5. Where and how should big data be deployed?

Page 35: 5 Myths about Spark and Big Data by Nik Rouda

Myth

Big data is…

Best done on-premises with commodity servers

Universal, 1989

Page 36: 5 Myths about Spark and Big Data by Nik Rouda

The Bigger Truth

Big data is…

Continuing to evolve in many directions, including cloud

Warner Bros., 1984

Page 37: 5 Myths about Spark and Big Data by Nik Rouda

Primary Deployment Strategy for New Projects

• Dedicated on-premises infrastructure (i.e., support a single analytics workload with a single platform), 33%
• Shared on-premises infrastructure (i.e., supports mixed analytics workloads on single platform), 25%
• Pre-configured on-premises appliances/systems purpose-built to support analytics workloads (i.e., converged infrastructure), 23%
• Public cloud or hosted managed services (i.e., SaaS, IaaS, PaaS), 12%
• Hybrid cloud (i.e., on-premises infrastructure and public cloud services), 6%
• Don’t know, 1%

In terms of net-new big data and analytics deployments, which of the following best describes the primary deployment strategy your organization will likely use going forward? (Percent of respondents, N=475)

Page 38: 5 Myths about Spark and Big Data by Nik Rouda

Public Cloud Services in Consideration

• None of the above, 4%
• Hadoop, 34%
• Business intelligence, 38%
• Analytics, 39%
• Data warehouses, 41%
• Databases, 43%
• Spark, 43%

For which of the following are you considering public cloud services? (Percent of respondents, N=475, multiple responses accepted)

Page 39: 5 Myths about Spark and Big Data by Nik Rouda

Advantages for Cloud-based Big Data

• Pay-as-you-go (i.e., OpEx vs. CapEx), 16%
• Avoid systems integration effort and risk of building…, 20%
• Geographic coverage, 21%
• More elasticity (i.e., scaling up and down resources), 24%
• More frequent feature/functionality updates, 25%
• Faster time to deploy for new projects, 26%
• Better availability than you could guarantee on-premises, 27%
• Data sources and applications already cloud-based, 27%
• Faster time to value for new projects, 27%
• Better security than you could guarantee on-premises, 32%

Which of the following do you consider to be advantages for cloud-based big data and analytics solutions? (Percent of respondents, N=452, three responses accepted)

Page 40: 5 Myths about Spark and Big Data by Nik Rouda

Disadvantages for Cloud-based Big Data

• Proprietary nature of platforms, 19%
• Unpredictable upgrades to platform, 19%
• Difficulty of data migration and consistency, or ETL to/from cloud, 20%
• Reduced functionality, 21%
• Concerns about integration and/or APIs, 23%
• Vendor lock-in, 23%
• Availability/outages, 25%
• Storage costs/economics, 26%
• Processing costs/economics, 27%
• Concerns about security, privacy, or governance, 28%

Which of the following do you consider to be disadvantages/barriers to adoption for cloud-based big data and analytics solutions? (Percent of respondents, N=452, three responses accepted)

Page 41: 5 Myths about Spark and Big Data by Nik Rouda

Preferred Model of Cloud-based Services

• Infrastructure-level (i.e., IaaS offering servers and storage), 32%
• Platform-level (i.e., PaaS with application development tools), 32%
• Application-level (i.e., SaaS with built-in analytics, visualization, BI, etc.), 25%
• Fully hosted/actively managed environments (i.e., advising, monitoring, troubleshooting), 10%
• None of the above, 1%

Which is your organization’s preferred model of cloud-based services? (Percent of respondents, N=452)

Page 42: 5 Myths about Spark and Big Data by Nik Rouda

Thank You

Enterprise Strategy Group | Getting to the bigger truth.™

http://www.twitter.com/esg-global

http://www.facebook.com/ESGglobal

https://www.linkedin.com/groups?gid=1295607&trk=myg_ugrp_ovr

http://www.youtube.com/user/ESGglobal

FOLLOW ESG

Nik Rouda, [email protected]

Page 43: 5 Myths about Spark and Big Data by Nik Rouda

THANK YOU. Nik Rouda, Senior Analyst

@nrouda