The Economic Advantage of Google BigQuery On-Demand Serverless Analytics

ESG Economic Validation White Paper

By Aviv Kaufmann, Senior ESG Validation Analyst; and Nik Rouda, Senior Analyst
April 2017

This ESG White Paper was commissioned by Google and is distributed under license from ESG.

© 2017 by The Enterprise Strategy Group, Inc. All Rights Reserved.

Enterprise Strategy Group | Getting to the bigger truth.




Contents

The Challenge: Economical Insight
The Solution: Google BigQuery
Google BigQuery versus Alternative Solutions
ESG’s Economic Value Audit Process
Economic Benefits of Google BigQuery
ESG Economic Validation
    Modeled Scenario #1: Small Organization
    Modeled Scenario #2: Medium Organization
    Modeled Scenario #3: Large Organization
    Model Considerations: Pricing Options
The Bigger Truth


The Challenge: Economical Insight

When it comes to gaining insight from your data, traditional business intelligence and data warehouse solutions have leveraged years of innovation to arrive at intelligent solutions and methodologies to gain valuable insight from structured sets of data. The success of these solutions has conditioned organizations to the value of collecting and analyzing data, resulting in initiatives to collect and process even more. Today, the sheer volume and rate of data generated dwarfs that of days past and the potential value of this data leaves little incentive to throttle back. Organizations are now faced with the reality of staying ahead of the ever-growing scale and velocity of the data they are generating. To better do so, many have turned to big data solutions powered by Hadoop and data lakes for most or even all of their data discovery, organization, analytics, and reporting initiatives.

Big data solutions do not come cheaply, simply, or quickly. Not only is a large upfront monetary investment in hardware and supported software required, but also an upfront investment in time to plan, purchase, install, configure, and test the solution is needed before delivering any value to the business. Expert administrators and operators are required to administer the system. Storage capacity requirements grow rapidly and massive amounts of compute power are essential to processing data quickly. It is nearly impossible to decouple storage capacity and compute power to scale independently. Systems must be greatly overprovisioned for redundancy and future growth, and run the risk of being obsoleted quickly. Quite simply: Purchasing or building an on-premises big data solution comes at a big cost with a big risk of impacting time to insight.

Time to insight is an important metric in today’s rapidly evolving, knowledge-powered industries. ESG’s annual IT spending survey reveals that nearly four out of ten organizations prioritizing big data initiatives in 2017 expect to allocate funding to enhancing their business intelligence capabilities and customer insights, which remains a priority as businesses seek to differentiate from their competition by enabling a smarter workforce (see Figure 1).¹

Figure 1. 2017 Data Analytics Spending Priorities

In which of the following areas will your organization make the most significant data analytics investments over the next 12-18 months? (Percent of respondents, N=337, five responses accepted)

Business intelligence                       39%
Cloud-based analytics                       35%
Data integration                            34%
SQL databases                               33%
Data warehouse                              23%
Data preparation                            21%
Hadoop platform(s)                          16%
Stream processing or streaming analytics    15%
NoSQL databases                             15%
Machine learning                            14%
Spark platform(s)                           10%
Don't know/too soon to tell                 10%

Source: Enterprise Strategy Group, 2017

¹ Source: ESG Brief, 2017 Data Analytics Spending Trends, January 2017.


It is important to remember that most organizations are not in the business of building and operating big data solutions, but rather they are in the business of generating data and extracting valuable insight from this data. Similar to what IaaS did for physical infrastructure, cloud-based big data solutions can offer a cost-effective and rapidly scalable alternative to DIY or integrated on-premises solutions. To best meet the ever-growing requirements of their big data initiatives, organizations must keep a close eye on and truly understand the costs and benefits involved with the on-premises and cloud service solutions available to store their data, run queries, and extract insight.

The Solution: Google BigQuery

Google BigQuery is a cloud-based, fully managed, serverless enterprise data warehouse that supports analytics over petabyte-scale data. It delivers high-speed analysis of large data sets without requiring investments in onsite infrastructure or database administrators. BigQuery scales its use of hardware up or down to maximize performance of each query, adding and removing compute and storage resources as required.

Google BigQuery, part of the Google Cloud Platform, is designed to streamline big data analysis and storage, while removing the overhead and complexity of maintaining onsite hardware and administration resources. Some of the specific advantages of Google BigQuery for businesses that work with big data include:

• Time to Value – Users can get their data warehouse environment online quickly and easily, without expert-level system and database administration skills, because BigQuery eliminates the infrastructure and reduces the management (an approach known as “No-Ops” or “Zero-Ops”).

• Simplicity – Complete all major tasks related to data warehouse analytics through an intuitive interface without the hassle of managing the infrastructure.

• Scalability – Scale up to petabytes or down to kilobytes depending on your size, performance, and cost requirements.

• Speed – Ingest, query, and export PB-sized data sets with impressive speeds using the Google Cloud Platform as the underlying cloud infrastructure.

• Reliability – Ensure always-on availability and constant uptime running on the Google Cloud Platform with geo-replication across Google data centers.

• Security – Protect and control access to encrypted projects and data sets through Google’s cloud-wide identity and access management (IAM).

• Cost Optimization – Predict costs with transparent flat rate and/or pay-as-you-go pricing, and contain costs through the use of project and user resource quotas.

Google BigQuery is self-scaling; it identifies resource requirements for each query to finish quickly and efficiently, and provides those resources to meet the demand. Once the workload has completed, BigQuery reallocates those resources to other projects and other users. Both in transferring data in, and in processing that data for results, BigQuery delivers tremendous speeds even at petabyte scales. For enhanced data durability, BigQuery provides high availability and reliability through geographic replication that is completely transparent to its users, and without the requirement to obtain the physical resources and space to house it all.

Ultimately, Google BigQuery enables organizations to address the cost and complexity challenges associated with building and maintaining a fast, scalable, and resilient big data infrastructure. By leveraging Google BigQuery’s cloud-based approach, the time and cost traditionally dedicated to protecting data and guaranteeing uptime are nearly eliminated. With Google handling scalability, replication, protection, and recovery, organizations can focus more on gaining valuable insights, as opposed to infrastructure management.


Google BigQuery versus Alternative Solutions

The purpose of this paper is to help organizations understand and compare the direct and indirect costs that should be considered when choosing a solution to store their big data and perform queries against it. This paper compares a do-it-yourself (DIY) on-premises Hadoop cluster deployment with instance-based cloud services from AWS (Redshift with Kinesis) as well as the on-demand cloud service from Google (BigQuery).

With an on-premises Hadoop cluster, the organization must plan, deploy, maintain, and configure the physical hardware and software required to store the data and power the queries. Hadoop nodes are commodity servers populated with large NL-SAS drives that are used to store and protect the data. Substantial work must be done to administer, configure, and optimize both the hardware solution and the Hadoop/Hive software. AWS Redshift helps to greatly simplify the management and eliminate the maintenance and the need to physically administer the hardware. Like a Hadoop cluster, the AWS solution is based on the concept of nodes (albeit virtual nodes). To scale the deployment, similar nodes of a fixed compute and storage capacity are added simultaneously, sometimes resulting in provisioning more compute or storage capabilities in order to meet the requirements of the other.

Google’s BigQuery solution is completely serverless from the customer perspective. There are no nodes to plan, configure, or scale. The complexity of sizing, managing, and maintaining the physical infrastructure is handled behind the scenes by Google, so the burden is removed from the end-user. Figure 2 depicts the three solutions compared in this analysis and how they each implement compute, storage, administration, and query management.

Figure 2. Hadoop, AWS, and Google Implementations

[Diagram with three panels: Hadoop On-premises, AWS Redshift, Google BigQuery]

Source: Enterprise Strategy Group, 2017


ESG’s Economic Value Audit Process

ESG’s Economic Value Audit (EVA) process is a proven method for understanding, validating, quantifying, and modeling the economic value propositions of a product or solution. The process leverages ESG’s core competencies in market and industry analysis, forward-looking research, and technical/economic validation. The EVA audit process leverages interviews with real-world customers who have had experience with both Google BigQuery and alternative big data solutions to help qualitatively and quantitatively validate the benefits that Google BigQuery has brought to their operations. This information is then applied to help estimate the costs and benefits in ESG’s modeled scenarios depicted in this paper.

Economic Benefits of Google BigQuery

Google BigQuery’s serverless, on-demand query service was designed and priced to provide customers with insight quickly and economically. ESG’s Economic Value Audit process revealed that BigQuery can provide significant cost savings and economic benefit opportunities. ESG found that BigQuery customers have enjoyed significant economic and operational savings when compared with both on-premises Hadoop-based deployment and AWS’s Redshift cloud-based big data solution. These benefits fall mainly into three categories: elimination of upfront capital investment; operational and administrative savings; and lower cost of cloud services.

Upfront Capital Investment Savings

• An on-premises Hadoop deployment requires a very large capital investment to purchase nodes, networking infrastructure, software, and licenses.

• An on-premises Hadoop deployment requires a significant amount of planning, purchasing, configuration, and testing prior to providing any big data benefit to the organization. This results in additional upfront operational expenses and potentially less revenue due to a much longer time to value when compared with BigQuery.

• For both an on-premises Hadoop deployment (hardware nodes) and an AWS Redshift deployment (AWS virtual instances), the storage capacity is directly tied to the compute power and memory, and none can be scaled independently of the others. This can potentially result in overprovisioning of compute or storage resources in an effort to scale the other, due to the inflexibility of pre-defined machines.

• For both an on-premises Hadoop deployment and an AWS Redshift deployment, the organization must spend time to plan and size the deployment, often overprovisioning to accommodate the worst-case scenario.

• AWS Redshift customers often choose to pay upfront, reserved instance pricing and benefit from significant discounts versus running Redshift instances on demand. Customers can save up to 75% over the on-demand pricing by choosing 3-Year All-Upfront reserved instance pricing.
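The coupling of storage and compute described in the list above can be made concrete with a short sketch. It assumes a hypothetical node type with a fixed bundle of storage and compute; the `storage_tb` and `compute_units` values and the workload are illustrative only and are not taken from this paper or from any vendor specification.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """A fixed bundle of storage and compute (hypothetical values)."""
    storage_tb: int
    compute_units: int


def size_cluster(node: NodeType, need_storage_tb: int, need_compute: int) -> dict:
    """Return the node count and the idle capacity that coupling forces."""
    nodes_for_storage = math.ceil(need_storage_tb / node.storage_tb)
    nodes_for_compute = math.ceil(need_compute / node.compute_units)
    # Storage and compute only come bundled, so the larger requirement wins.
    nodes = max(nodes_for_storage, nodes_for_compute)
    return {
        "nodes": nodes,
        "idle_storage_tb": nodes * node.storage_tb - need_storage_tb,
        "idle_compute": nodes * node.compute_units - need_compute,
    }


# Illustrative workload: storage-heavy, light on compute.
node = NodeType(storage_tb=16, compute_units=32)
plan = size_cluster(node, need_storage_tb=120, need_compute=64)
print(plan)
```

In this made-up case the storage requirement drives the node count, so most of the purchased compute sits idle; a compute-heavy workload would strand storage instead.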

BigQuery is serverless and payments are often made completely on-demand, based only on the amount of data processed and stored per month. This means upfront payments, planning, purchasing, installation, configuration, node management, and testing are not required. BigQuery can provide value in the form of insight as soon as the data is made available on GCP storage. Multiple customers with whom ESG spoke reported that they were up and running queries on BigQuery in a matter of hours rather than days, weeks, or even months. Other customers who had experience with multiple solutions stated:

“We spent over $1M on a Hive-based warehouse that was always running out of memory and required constant tuning and maintenance. Upgrades created instability and completely took the system down.”

“…of course we chose to pay the upfront reserved instance price. It doesn’t make much sense not to.”

Removing the financial barrier of a large upfront investment in terms of time and money makes BigQuery an attractive platform for organizations looking to simply try out the service, or quickly scale their analytics capabilities.


Operational and Administrative Savings

• An on-premises Hadoop deployment requires more people to administer the solution, often including a hardware and software administrator, and one or more Hadoop administrators and/or operators.

• With an on-premises Hadoop solution, analysts usually must work with the Hadoop operators to translate and execute queries rather than executing the queries themselves.

• An on-premises deployment must be powered and cooled, and requires floor space, increasing the operational cost of the deployment.

• On-premises deployments often require the services of consultants to help tune the solution for optimal performance because those skills go beyond those of the typical Hadoop administrator.

• On-premises deployments must be maintained, including updating firmware, the OS, security patches, and Hadoop releases, as well as troubleshooting and resolving issues with the hardware and software.

• AWS Redshift is similarly based on the concept of nodes containing a fixed set of resources. This may make planning, deployment, and growth more complex than with BigQuery’s serverless service.

• AWS Redshift requires the user to log in to a “leader node” in order to run requests against the pool of Redshift instances. This added complexity means that less technical analysts may continue to leverage an operator to handle queries rather than performing the queries themselves.

BigQuery does not require dedicated administrators, and customers reported that it is simple enough for analysts to run queries on their own by cutting and pasting queries into a web-based self-service portal. In comparison, to use Redshift, analysts who are comfortable using VPN can log in to a proxy server running in AWS to connect to the Redshift deployment. This proxy server can make it harder for multiple groups on different ACLs to share the same Redshift deployment, resulting in the need to deploy and manage separate instances for each department and further increasing the management complexity of the solution. When asked about operational costs, the customers with whom ESG spoke agreed that Redshift was far easier to manage, operate, and maintain compared with an on-premises Hadoop deployment, and also agreed that BigQuery was simpler to operate and manage than their Redshift deployment had been.

“The kind of people that can debug Hadoop exceptions are typically not close to the business. The error messages that come out of Hadoop are not well written and are difficult to troubleshoot. This led to retention issues: we lost good people because they were sitting around dealing with issues.”

“The web-client is a huge bonus. Analysts do not have to use an ODBC/JDBC, they just need a URL.”

The simplicity of BigQuery enables analysts and operators within the company to become citizen data scientists, empowering them to take control of their own queries, removing their dependence on others, and ultimately producing higher quality insight for the organization in a shorter amount of time.

Cloud Cost Savings

• Both Google BigQuery and AWS Redshift provide very well documented price lists and web-based pricing calculators to determine estimated cloud costs.

• Google BigQuery pricing is simple: Pay for your storage capacity and pay for the TBs processed each month. There is no complexity of figuring out sizing or risk of overprovisioning.

• Redshift On-Demand pricing is based on deploying virtualized instances (nodes) with fixed compute and storage requirements, meaning they cannot be scaled independently.

• Powering down Redshift nodes requires use of snapshots or migration of data. Only the smallest deployments generally benefit from paying for less than 24x7 operation.

• Redshift On-Demand pricing is generally quite high compared with BigQuery and also when compared with AWS Reserved Instance pricing options.

• Reserved Instance pricing on AWS provides a particular instance type, making it difficult to take advantage of the latest CPU or storage offerings over the length of your contract. BigQuery always makes use of the latest available technologies automatically and transparently.

BigQuery pricing is designed with simplicity and economics in mind. There are absolutely no upfront costs, and you simply pay for what you store and what you query. This is true “on-demand” pricing, without the need to plan, configure, tune, or update nodes. Customers may be wary of paying an on-demand per-TB processing fee without knowing exactly how much data they will process in a given month. ESG validated with customers and with internal data collected by GCP that the amount of data processed on average is very often equivalent to the amount of data stored. In many cases, this results in a very favorable cost advantage over Redshift (in which you must pay for enough instances to cover the storage requirements).
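The pay-for-what-you-store-and-query model described above reduces to a two-term sum. The sketch below is illustrative only: the function name and both rates are placeholders chosen for round numbers, not figures from this paper or from Google's price list, and the default of processing as much data as is stored reflects the 1:1 ratio that ESG reports as typical.

```python
from typing import Optional

# Placeholder rates in USD; assumptions for illustration, not published prices.
STORAGE_RATE_PER_TB_MONTH = 20.0
QUERY_RATE_PER_TB = 5.0


def bigquery_monthly_cost(
    stored_tb: float,
    processed_tb: Optional[float] = None,
    storage_rate: float = STORAGE_RATE_PER_TB_MONTH,
    query_rate: float = QUERY_RATE_PER_TB,
) -> float:
    """Estimate one month of on-demand cost: storage plus data processed."""
    if processed_tb is None:
        # Assume the 1:1 stored-to-processed ratio ESG describes as typical.
        processed_tb = stored_tb
    return stored_tb * storage_rate + processed_tb * query_rate


print(bigquery_monthly_cost(100))      # processed volume defaults to stored volume
print(bigquery_monthly_cost(100, 50))  # half of the stored data queried
```

No node count appears anywhere in the formula, so nothing has to be sized in advance; the only inputs are the two volumes and the two rates.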

With BigQuery, pricing is simplified and planning is not required because it is not based on the concept of deploying preconfigured instances as big data nodes. BigQuery users do not have to estimate the hardware requirements of their configurations, and never pay for what is not required. They do not have to choose between paying very high monthly on-demand costs or locking themselves into a one- or three-year contract with a large upfront investment. The customers with whom ESG spoke shared their concerns with reserved instance pricing:

“…we made the switch to BigQuery and today we still have 9 months left in reserved instances on our Redshift nodes.”

ESG’s analysis in the following sections of this paper further illustrates the operational and pricing advantages that BigQuery holds over an AWS Redshift deployment.

ESG Economic Validation

To validate the economic benefits of Google BigQuery when compared with both an on-prem Hadoop deployment and AWS Redshift, ESG created requirements for three modeled organizations that are representative of typical small, medium, and large organizations performing queries against their data. The requirements of these organizations are shown in Table 1.

Table 1. Modeled Scenario Requirements

                          Small                      Medium                     Large
Number of Data Citizens   5 Analysts                 50 Analysts                100 Analysts
Amount of Data Stored     100 TB                     500 TB                     1 PB
TBs Queried per Month     50 TB (0.5x Data Stored)   500 TB (1x Data Stored)    4 PB (4x Data Stored)
Streaming Applications    No                         Yes (2% of Queries)        Yes (5% of Queries)
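For readers who want to rework the comparison, Table 1 can be captured as data. This is a minimal sketch: the class and field names are hypothetical, 1 PB is taken as 1,000 TB, and the streaming share is stored as given because the paper does not say how it was converted into volumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    analysts: int
    stored_tb: int
    query_ratio: float      # TB queried per month divided by TB stored
    streaming_share: float  # share of queries that are streaming (0 = none)

    @property
    def queried_tb_per_month(self) -> float:
        return self.stored_tb * self.query_ratio


SCENARIOS = [
    Scenario("Small", analysts=5, stored_tb=100, query_ratio=0.5, streaming_share=0.0),
    Scenario("Medium", analysts=50, stored_tb=500, query_ratio=1.0, streaming_share=0.02),
    Scenario("Large", analysts=100, stored_tb=1000, query_ratio=4.0, streaming_share=0.05),
]

for s in SCENARIOS:
    print(f"{s.name}: {s.stored_tb} TB stored, {s.queried_tb_per_month:.0f} TB queried per month")
```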

ESG modeled and compared the costs that the small, medium, and large organizations might expect to pay over a three-year period to plan, purchase, deploy, operate, administer, and maintain a solution to perform their queries. The assumptions and costs used in the scenarios were based on the results of detailed interviews with BigQuery customers who have experience with two or more of the solutions compared and could help to quantify the costs and relative differences between their deployments.

As will be described in more detail later in this paper, there are many factors to consider when it comes to cloud services pricing. Because AWS’s three-year reserved instance (RI) pricing provided the lowest total costs over three years when compared with other AWS pricing options, ESG based the TCO analysis on this upfront pricing model. ESG also sized and priced on-premises Hadoop nodes and AWS Redshift instances that would be expected to provide similar performance results when compared with the BigQuery solution.


ESG leveraged knowledge of markets, the industry, and vendor solutions, as well as detailed interviews with Google and its customers to model and predict the costs to deploy, administer, manage, maintain, and operate each of the solutions. These costs were generally based on the number of employees and contracted services that were required for different sized deployments, as well as for the individual solutions. Wherever possible, direct comparisons between the solutions were used to gauge relative differences in man-hour requirements. ESG’s modeled three-year TCO analysis considered the high-level cost categories depicted in Table 2 and created detailed models to estimate the costs of the small, medium, and large scenarios for each of the three big data solutions.

Table 2. Modeled Scenario Requirements

Upfront Costs: Payment made prior to realizing value from the solution
    Hadoop On-premises
        • Cost of hardware, software, and licenses for Hadoop nodes
        • Cost of networking
    AWS Redshift
        • 3-year reserved instance pricing paid in full
        • Adjustment for time-value of money at 8% WACC
    Google BigQuery
        • No upfront costs

Three-year On-demand Cloud Costs: Expected monthly cloud services costs (sum of 36 monthly payments)
    Hadoop On-premises
        • No monthly cloud service costs
    AWS Redshift
        • No monthly cloud service costs for instances after paying upfront RI
        • Cost of streaming inserts paid to AWS Kinesis service (medium and large scenarios only)
    Google BigQuery
        • Cost of GCP storage
        • Cost of total monthly TBs processed
        • Cost of streaming inserts (medium and large scenarios only)

Operational Costs: Resources (money and manpower) used to get the system functioning and keep the solution operating (not including administration and queries)
    Hadoop On-premises
        • Cost of power, cooling, and floor space
        • Man-hour costs to plan and purchase Hadoop cluster
        • Man-hour costs to install, test, troubleshoot, and perform POC on cluster
    AWS Redshift
        • Man-hour costs to plan deployment size and purchase Redshift instances
        • Man-hour costs to migrate data, configure, test, and perform POC
    Google BigQuery
        • Man-hour costs to migrate data, configure, test, and perform POC

Administrative Costs: Expected costs to administer the solution on a daily basis
    Hadoop On-premises
        • Cost of query orchestration
        • Cost of Hadoop administration
        • Cost of database administration
        • Cost of consultants
    AWS Redshift
        • Cost of query orchestration
    Google BigQuery
        • Cost of query orchestration

Maintenance Costs: Cost of maintenance and support contracts
    Hadoop On-premises
        • Cost of maintenance and support contracts for Hadoop nodes and networking
    AWS Redshift
        • No cost to maintain the solution
    Google BigQuery
        • No cost to maintain the solution


The results of the comparison showed that by leveraging the BigQuery service for their needs, the modeled organizations could save a total of between $532K and $2.7M over a three-year period, with the added benefits of a solution that integrates better into their existing environment, requires no upfront costs, and empowers citizen data scientists within the organization by removing the reliance upon an administrator to prepare, schedule, and execute queries.

Modeled Scenario #1: Small Organization

For the first scenario, ESG started by calculating the requirements to store a 100 TB data set and perform a mix of batch and ad-hoc queries that on average processed 50 TB of data per month (50% of the stored data capacity) for each of the three solutions (on-prem Hadoop, AWS Redshift, and Google BigQuery). ESG figured that the on-premises Hadoop infrastructure would require the deployment of a single Hadoop NameNode, and eight DataNodes. ESG assumed that Hadoop would leverage the standard triple redundancy and made sure that there was 20% extra storage capacity available for data growth (for both the on-premises and AWS instances). While this may sound unfair, it is in fact a common and necessary practice for both solutions as users would deploy more nodes to avoid filling 100% of the available capacity. In contrast, it is unnecessary to pay for storage growth in advance with BigQuery’s on-demand storage. The Hadoop upfront costs also include software licenses for the operating system, Hadoop, and Hive distributions. ESG calculated the expected maintenance and support contracts on the hardware, as well as the expected power, cooling, and floor space requirements, accounted for as operational costs.

ESG estimated that due to the capacity requirements, a Redshift customer would need to deploy eight ds2.8xlarge compute nodes as well as a single leader node for VPN access to the cluster and to generate queries. Because the three-year reserved instance pricing is paid upfront, ESG chose to account for this cost (as well as an additional 8% APR cost of capital adjustment) as an upfront cost, rather than an on-demand cloud cost. ESG calculated that the small organization could alternatively pay monthly on-demand cloud costs instead of paying all costs upfront. However, over a three-year period, the organization’s cumulative cost would be roughly 75% higher for the AWS solution.
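The node counts in the scenarios are consistent with a simple storage-driven sizing rule, sketched below. This is a reconstruction under stated assumptions, not ESG's documented method: the 16 TB per ds2.8xlarge node and the 48 TB of raw disk per DataNode are values assumed here, while the 20% headroom, the triple redundancy, and the single leader node or NameNode come from the text. The function and constant names are hypothetical.

```python
import math

HEADROOM_PCT = 20  # from the paper: 20% extra capacity for data growth


def nodes_required(stored_tb: int, node_capacity_tb: int,
                   replication: int = 1, coordinators: int = 1) -> int:
    """Storage-driven node count, plus a leader node or NameNode."""
    # Integer arithmetic up to the final division keeps the ceiling exact.
    raw_tb_x100 = stored_tb * replication * (100 + HEADROOM_PCT)
    storage_nodes = math.ceil(raw_tb_x100 / (100 * node_capacity_tb))
    return storage_nodes + coordinators


REDSHIFT_NODE_TB = 16  # assumed ds2.8xlarge capacity; not stated in the paper
HADOOP_NODE_TB = 48    # hypothetical raw capacity per DataNode

for stored_tb in (100, 500, 1000):  # 1 PB taken as 1,000 TB
    redshift = nodes_required(stored_tb, REDSHIFT_NODE_TB)
    hadoop = nodes_required(stored_tb, HADOOP_NODE_TB, replication=3)
    print(stored_tb, redshift, hadoop)
```

With these assumptions the rule gives eight storage nodes plus one coordinator for 100 TB, and totals of 39 and 76 nodes for 500 TB and 1 PB, which matches the counts quoted in the three scenarios.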

Because there is no hardware to install, configure, maintain, power, or cool, once deployed, the operational costs for both the Redshift and BigQuery solutions would be minimal. ESG estimated the one-time man-hours to migrate the data and test both solutions before putting them into production. Beyond this, the BigQuery solution is completely free of additional maintenance and operational costs. The AWS Redshift solution, because it is based on the concept of deploying virtual nodes, would also be expected to require an upfront investment in man-hours to research, plan, and size the deployment to ensure it meets the requirements.

Administration costs for both Redshift and BigQuery were estimated to be far lower than those for the on-premises Hadoop deployment. With BigQuery, administration would be minimal: a single administrator would be expected to manage the account, manage user access, investigate advanced issues, and help guide other analysts to run their own queries via a web interface. ESG estimates that this would require far less than a full-time hire for smaller deployments, and could be easily accomplished by allocating responsibility to an existing analyst. The Redshift solution would be simpler to manage than the on-premises solution; however, access to the query engine is more complex than for BigQuery, placing more of a burden on the administrator to assist others, and making self-service more difficult for non-technical analysts.

ESG’s model estimated that Google’s BigQuery solution could save a small organization $881K over a three-year period when compared with investing in an on-premises Hadoop cluster, and over $532K when compared with using AWS Redshift. The results of ESG’s three-year TCO analysis for the 100 TB “small” modeled organization for each of the three solutions are shown in Figure 3.


Figure 3. Three-year Total Cost of Solution Summary for 100 TB “Small” Modeled Scenario

Source: Enterprise Strategy Group, 2017

Modeled Scenario #2: Medium Organization

The medium-sized organization required enough storage to store and protect a 500 TB data set, and performed a mix of batch and ad-hoc queries that on average processed 500 TB of data per month (100% of the stored data capacity). It should be noted that this 1:1 ratio of stored to processed data is a solid standard for most organizations to plan on, and was validated as the typical case across all BigQuery deployments. The medium-sized deployment also made use of streaming data services, which could be representative of data generated through web or mobile applications, or IoT sensors for example.

ESG estimated that the on-premises Hadoop cluster and the AWS Redshift deployment would each require a total of 39 nodes (including the NameNode and leader node). Upfront, maintenance, and operational costs were modeled for the three systems in the same manner described in the small scenario, scaled of course to accommodate the larger deployment size. Administration of the medium-sized deployment required two full-time administrators for the on-premises Hadoop deployment to oversee operations, assist analysts with queries, and administer the Hadoop deployment, database, and hardware. ESG assumed that a single query manager could manage the Redshift and BigQuery deployment in roughly the same number of man-hours as the small deployment. However, ESG assumed a 44% higher hourly rate for the manager based on the scale and complexity of the organization.

In order to accommodate the streaming service, the AWS Redshift solution made use of the AWS Kinesis service to continuously load data onto the Redshift nodes for querying. The Kinesis service offers on-demand pricing only, which can be complex to calculate. To predict pricing, one must consider throughput requirements, payload size, and data retention (number of shard-hours required, number of PUT payload units per million, and optional requirement to keep data longer than 24 hours). In contrast, BigQuery streaming inserts are simply billed per GB inserted.
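The difference in the number of inputs can be seen by writing both calculations out. The sketch below is a hedged illustration, not a pricing tool: every rate is a made-up placeholder, and the unit definitions (25 KB per PUT payload unit, 1 MB/s and 1,000 records/s of ingest per shard, a 730-hour month) are assumptions that should be checked against the vendor's current documentation. The two example calls are not equivalent workloads and are not meant to be compared.

```python
import math

HOURS_PER_MONTH = 730         # average month, assumed for illustration
PUT_UNIT_KB = 25              # assumed size of one PUT payload unit
SHARD_MB_PER_SEC = 1          # assumed ingest limit per shard
SHARD_RECORDS_PER_SEC = 1000  # assumed record limit per shard


def kinesis_monthly_cost(records_per_sec: int, record_kb: float,
                         shard_hour_rate: float, put_rate_per_million: float,
                         retention_rate: float = 0.0) -> float:
    """Shard-hours plus PUT payload units plus optional extended retention."""
    mb_per_sec = records_per_sec * record_kb / 1024
    shards = max(1,
                 math.ceil(mb_per_sec / SHARD_MB_PER_SEC),
                 math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC))
    shard_hours = shards * HOURS_PER_MONTH
    units_per_record = math.ceil(record_kb / PUT_UNIT_KB)
    put_units = records_per_sec * units_per_record * HOURS_PER_MONTH * 3600
    return (shard_hours * shard_hour_rate
            + put_units / 1_000_000 * put_rate_per_million
            + shard_hours * retention_rate)


def bigquery_streaming_monthly_cost(gb_inserted: float, rate_per_gb: float) -> float:
    """A single term: volume inserted times a per-GB rate."""
    return gb_inserted * rate_per_gb


# Hypothetical workloads and rates, chosen only to exercise the formulas.
kinesis = kinesis_monthly_cost(2000, 10, shard_hour_rate=0.02, put_rate_per_million=0.01)
bigquery = bigquery_streaming_monthly_cost(1000, rate_per_gb=0.05)
print(round(kinesis, 2), round(bigquery, 2))
```

The first function needs record rate, record size, shard limits, and up to three rates before it can return a number; the second needs one volume and one rate.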

ESG’s model estimated that Google’s BigQuery solution could save a medium-sized organization more than $2M over a three-year period when compared with investing in an on-premises Hadoop cluster, and over $1.7M when compared with using AWS Redshift. The results of ESG’s three-year TCO analysis for the 500 TB “medium” modeled organization for each of the three solutions are shown in Figure 4.


Figure 4. Three-year Total Cost of Solution Summary for 500 TB “Medium” Modeled Scenario

Source: Enterprise Strategy Group, 2017

Modeled Scenario #3: Large Organization

The large-sized organization required enough storage to store and protect a 1 PB data set, and performed a mix of batch and ad-hoc queries that on average processed an enormous 4 PB of data per month (400% of the stored data capacity). The large-sized deployment also made use of a larger percentage of streaming data services (5% of queries processed).

ESG estimated that the on-premises Hadoop cluster and the AWS Redshift deployment would each require a total of 76 nodes (including the NameNode and leader node). Upfront, maintenance, and operational costs were modeled for the three systems in the same manner described in the small scenario, scaled to accommodate the larger deployment size. Administration of the large-sized deployment required three full-time administrators for the on-premises Hadoop deployment to oversee operations, assist analysts with queries, and administer the Hadoop deployment, database, and hardware. ESG assumed an 11% higher hourly rate (compared with the medium-sized scenario) for the administration of the BigQuery and Redshift solutions based on the large scale and complexity of the organization.

ESG’s model predicted that Google’s BigQuery solution could save a large organization more than $2.7M over a three-year period when compared with investing in an on-premises Hadoop cluster, and over $2.3M when compared with using AWS Redshift. The results of ESG’s three-year TCO analysis for the 1 PB “large” modeled organization for each of the three solutions are shown in Figure 5.


Figure 5. Three-year Total Cost of Solution Summary for 1 PB “Large” Modeled Scenario

Source: Enterprise Strategy Group, 2017

What the Numbers Mean

ESG’s models show that Google’s BigQuery service provides a simple and economical solution for organizations of all sizes. ESG’s models predict that organizations can expect to generate insight at a cost that is 60% to 88% lower than deploying and managing an on-premises Hadoop deployment, and 56% to 82% lower than utilizing the AWS Redshift service over a three-year period.

The big win, however, may be some of the “softer” benefits that BigQuery provides. BigQuery does not require an upfront investment, provides a faster time to value, is easier to manage for departmental use, and allows for more analysts to perform their own queries. The result is that organizations can spend less time managing software, hardware, and queries, and more time generating valuable insight.
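The "percent lower" ranges quoted above can be read as a relative difference between two three-year totals. The following is a minimal sketch of that arithmetic only; the function name and the dollar inputs are hypothetical placeholders and are not figures from ESG's model.

```python
def percent_savings(alternative_cost: float, bigquery_cost: float) -> float:
    """Return how much lower bigquery_cost is than alternative_cost, in percent."""
    if alternative_cost <= 0:
        raise ValueError("alternative_cost must be positive")
    return 100.0 * (alternative_cost - bigquery_cost) / alternative_cost


# Hypothetical three-year totals in USD (placeholders, not ESG results).
hypothetical_alternative_total = 1_000_000.0
hypothetical_bigquery_total = 400_000.0

print(percent_savings(hypothetical_alternative_total, hypothetical_bigquery_total))  # 60.0
```

With real modeled totals substituted for the placeholders, the same calculation would express an absolute dollar saving as a percentage of the alternative's cost.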

Model Considerations: Pricing Options

On-demand cloud services often make sense for young and smaller organizations that are looking to get started in analytics without making a large upfront investment. Larger and more established organizations, however, often struggle to make sense of the numerous and complex pricing schemes that some cloud providers offer. When analyzing the investment options between an on-premises and AWS Redshift deployment, over 20 options may have to be modeled and considered (on-demand versus upfront pricing; one-year or three-year term; and nothing, partial, or all upfront, across four instance types, plus the Kinesis service). In contrast, modeling BigQuery pricing is simple.
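To illustrate how quickly such a pricing matrix grows, the sketch below enumerates the dimensions listed in the parenthetical. It assumes, purely for illustration, that every term and payment combination is offered for every instance type, which may not hold for the actual AWS catalog. The instance-type labels are placeholders (the paper does not name them), and the Kinesis service is left out of the count.

```python
from itertools import product

# Placeholder labels: the paper says "four instance types" without naming them.
INSTANCE_TYPES = ["type_a", "type_b", "type_c", "type_d"]
TERMS = ["one-year", "three-year"]
PAYMENTS = ["no upfront", "partial upfront", "all upfront"]


def redshift_pricing_options():
    """List (instance, term, payment) combinations under the stated assumption."""
    options = []
    for instance in INSTANCE_TYPES:
        # On-demand has no term or upfront payment attached.
        options.append((instance, "on-demand", None))
        # Reserved pricing: every term paired with every payment choice.
        for term, payment in product(TERMS, PAYMENTS):
            options.append((instance, term, payment))
    return options


options = redshift_pricing_options()
print(len(options))  # 28
```

Under these assumptions the count lands above the "over 20" threshold cited in the text before streaming services are even considered; a real catalog that omits some combinations would yield a somewhat smaller number.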

ESG’s models found that when looking to minimize costs over a three-year period, it almost always makes the best financial sense to pay the three-year term, all-upfront pricing option for AWS Redshift. The savings of up to 75% nearly force organizations to opt to pay long in advance of value realization. Figure 6 shows a sample analysis based on the “medium” scenario, comparing the highest and lowest cost AWS Redshift pricing options to BigQuery’s on-demand pricing.

Figure 6. Upfront versus Monthly Cost Comparison for “Medium” Scenario (Cumulative over Time)

Source: Enterprise Strategy Group, 2017

The differences in these costs are more apparent when they’re broken out into one-time “upfront” costs that are paid before any value is realized from the solution, and recurring “monthly” costs, which are paid as the value is realized from the solution. The differences between the solutions are shown in Figure 7.

Figure 7. Upfront versus Monthly Cost Comparison for “Medium” Scenario

Source: Enterprise Strategy Group, 2017
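The upfront-versus-monthly split can be sketched as a cumulative cost curve: a one-time payment at month zero plus a recurring charge for each month of service. The example below is a minimal illustration of that idea; the function names and all dollar amounts are hypothetical and do not come from ESG's model or from any vendor price list.

```python
from typing import List, Optional


def cumulative_cost(upfront: float, monthly: float, months: int) -> List[float]:
    """Cumulative spend at the end of month 0..months (month 0 is upfront only)."""
    return [upfront + monthly * m for m in range(months + 1)]


def break_even_month(prepaid: List[float], pay_as_you_go: List[float]) -> Optional[int]:
    """First month at which the prepaid plan's cumulative cost is no higher."""
    for month, (a, b) in enumerate(zip(prepaid, pay_as_you_go)):
        if a <= b:
            return month
    return None


# Hypothetical figures in USD, chosen only for illustration.
prepaid = cumulative_cost(upfront=120_000, monthly=2_000, months=36)
on_demand = cumulative_cost(upfront=0, monthly=6_000, months=36)

print(break_even_month(prepaid, on_demand))  # 30
print(prepaid[36], on_demand[36])  # 192000 216000
```

In this made-up case the prepaid plan ends the three years cheaper, but only catches up at month 30; until then the organization has paid more than it would have on a pay-as-you-go basis, which is the "paying long in advance of value realization" effect described above.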

Why This Matters

If a cloud service is not easy to plan, purchase, and deploy, organizations will have a harder time justifying the investment to decision makers. While some organizations may enjoy the option of paying upfront, or find value in consistent, predictable pricing, the decision should be straightforward and should not result in being locked into a technology or vendor. Lock-in defeats the purpose of a service.

Google understood this when it created a simple, serverless on-demand pricing model for its BigQuery service. Organizations that demand dedicated resources or price predictability can decide to pay the monthly flat-rate fee, or increase their compute requirements if needed. However, the majority of customers will benefit from the simple on-demand pricing and advances in server technology, with the freedom to change their mind at any time.

The Bigger Truth

Even though the smoke-filled days of the industrial age have long passed, the lessons learned remain. The challenges of industry once included how to quickly and cost-effectively mine, transport, and convert raw materials into usable products; businesses today are similarly tasked with collecting, manipulating, and converting raw data to yield actionable intelligence and valuable insight. Time has not transformed the technique for success: keep costs low and predictable, while simplifying operations, avoiding bottlenecks, and delivering maximum value in the minimal amount of time.

As in the industrial age, the primary methodology employed by today’s organizations is to make a large upfront investment in physical infrastructure (a large factory), sized to meet the expected demand for the foreseeable future at great financial risk of over- or under-provisioning equipment (servers, network, storage, and software) and under- or over-staffing human resources (hardware and software administrators and experts). As the industrialists eventually found out, products can be produced with greater agility and less risk by leveraging massive global operations that specialize in production, while the company spends more cycles maximizing the monetary value of the end product.

Today’s analytics challenges are no different. Many organizations are simply not large enough to justify spending valuable time, resources, and money building, managing, and maintaining an on-premises Hadoop infrastructure when Google’s BigQuery on-demand, serverless technology can provide insight more cost-effectively, in a greatly simplified manner, and with less financial risk to the organization.

ESG’s models, built on the results of validation with BigQuery customers, show that organizations can expect to save between $881K and $2.7M over a three-year period by leveraging BigQuery instead of planning, deploying, testing, managing, and maintaining an on-premises Hadoop cluster. The models also show that BigQuery’s serverless design and simple pricing can provide a solution that is simpler to manage at a total cost that is between 56% and 82% lower than using AWS Redshift to store data and perform queries. Perhaps more importantly, the simplicity of the BigQuery solution accommodates economies of scale across departments and enables analysts to execute their own queries, allowing for greater organizational efficiency and quicker insight.

If your organization is looking to spend more time generating more value from your data, and less time managing the solution, while keeping costs to a minimum, then ESG suggests you consider letting Google handle the infrastructure while empowering your organization with the ability to focus on the insight.

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.

www.esg-global.com [email protected] P.508.482.0188

Enterprise Strategy Group is an IT analyst, research, validation, and strategy firm that provides actionable insight and intelligence to the global IT community.
