© 2017 by The Enterprise Strategy Group, Inc. All Rights Reserved.
By Aviv Kaufmann, Senior ESG Validation Analyst; and Nik Rouda, Senior Analyst
April 2017
This ESG White Paper was commissioned by Google and is distributed under license from ESG.
Enterprise Strategy Group | Getting to the bigger truth.™
The Economic Advantage of Google BigQuery On-Demand Serverless Analytics
ESG Economic Validation White Paper
Contents
The Challenge: Economical Insight
The Solution: Google BigQuery
Google BigQuery versus Alternative Solutions
ESG’s Economic Value Audit Process
Economic Benefits of Google BigQuery
ESG Economic Validation
  Modeled Scenario #1: Small Organization
  Modeled Scenario #2: Medium Organization
  Modeled Scenario #3: Large Organization
  Model Considerations: Pricing Options
The Bigger Truth
The Challenge: Economical Insight
When it comes to gaining insight from your data, traditional business intelligence and data warehouse solutions have leveraged years of innovation to arrive at intelligent solutions and methodologies to gain valuable insight from structured sets of data. The success of these solutions has conditioned organizations to the value of collecting and analyzing data, resulting in initiatives to collect and process even more. Today, the sheer volume and rate of data generated dwarfs that of days past and the potential value of this data leaves little incentive to throttle back. Organizations are now faced with the reality of staying ahead of the ever-growing scale and velocity of the data they are generating. To better do so, many have turned to big data solutions powered by Hadoop and data lakes for most or even all of their data discovery, organization, analytics, and reporting initiatives.
Big data solutions do not come cheaply, simply, or quickly. Not only is a large upfront monetary investment in hardware and supported software required, but also an upfront investment in time to plan, purchase, install, configure, and test the solution is needed before delivering any value to the business. Expert administrators and operators are required to administer the system. Storage capacity requirements grow rapidly and massive amounts of compute power are essential to processing data quickly. It is nearly impossible to decouple storage capacity and compute power to scale independently. Systems must be greatly overprovisioned for redundancy and future growth, and run the risk of being obsoleted quickly. Quite simply: Purchasing or building an on-premises big data solution comes at a big cost with a big risk of impacting time to insight.
Time to insight is an important metric in today’s rapidly evolving, knowledge-powered industries. ESG’s annual IT spending survey reveals that nearly four out of ten organizations prioritizing big data initiatives in 2017 expect to allocate funding to enhancing their business intelligence capabilities and customer insights, which remains a priority as businesses seek to differentiate from their competition by enabling a smarter workforce (see Figure 1).1
Figure 1. 2017 Data Analytics Spending Priorities
In which of the following areas will your organization make the most significant data analytics investments over the next 12-18 months? (Percent of respondents, N=337, five responses accepted)
• Business intelligence: 39%
• Cloud-based analytics: 35%
• Data integration: 34%
• SQL databases: 33%
• Data warehouse: 23%
• Data preparation: 21%
• Hadoop platform(s): 16%
• Stream processing or streaming analytics: 15%
• NoSQL databases: 15%
• Machine learning: 14%
• Spark platform(s): 10%
• Don't know/too soon to tell: 10%
Source: Enterprise Strategy Group, 2017
1 Source: ESG Brief, 2017 Data Analytics Spending Trends, January 2017.
It is important to remember that most organizations are not in the business of building and operating big data solutions, but rather they are in the business of generating data and extracting valuable insight from this data. Similar to what IaaS did for physical infrastructure, cloud-based big data solutions can offer a cost-effective and rapidly scalable alternative to DIY or integrated on-premises solutions. To best meet the ever-growing requirements of their big data initiatives, organizations must keep a close eye on and truly understand the costs and benefits involved with the on-premises and cloud service solutions available to store their data, run queries, and extract insight.
The Solution: Google BigQuery
Google BigQuery is a cloud-based, fully managed, serverless enterprise data warehouse that supports analytics over petabyte-scale data. It delivers high-speed analysis of large data sets without requiring investments in onsite infrastructure or database administrators. BigQuery scales its use of hardware up or down to maximize performance of each query, adding and removing compute and storage resources as required.
Google BigQuery, part of the Google Cloud Platform, is designed to streamline big data analysis and storage, while removing the overhead and complexity of maintaining onsite hardware and administration resources. Some of the specific advantages of Google BigQuery for businesses that work with big data include:
• Time to Value – Users can get their data warehouse environment online quickly and easily, without requiring expert-level system and database administration skills by eliminating the infrastructure and reducing the management (known as “No-Ops” or “Zero-Ops”).
• Simplicity – Complete all major tasks related to data warehouse analytics through an intuitive interface without the hassle of managing the infrastructure.
• Scalability – Scale up to petabytes or down to kilobytes depending on your size, performance, and cost requirements.
• Speed – Ingest, query, and export PB-sized data sets with impressive speeds using the Google Cloud Platform as the underlying cloud infrastructure.
• Reliability – Ensure always-on availability and constant uptime running on the Google Cloud Platform with geo-replication across Google data centers.
• Security – Protect and control access to encrypted projects and data sets through Google’s cloud-wide identity and access management (IAM).
• Cost Optimization – Predict costs with transparent flat rate and/or pay-as-you-go pricing, and contain costs through the use of project and user resource quotas.
Google BigQuery is self-scaling; it identifies resource requirements for each query to finish quickly and efficiently, and provides those resources to meet the demand. Once the workload has completed, BigQuery reallocates those resources to other projects and other users. Both in transferring data in, and in processing that data for results, BigQuery delivers tremendous speeds even at petabyte scales. For enhanced data durability, BigQuery provides high availability and reliability through geographic replication that is completely transparent to its users, and without the requirement to obtain the physical resources and space to house it all.
Ultimately, Google BigQuery enables organizations to address the cost and complexity challenges associated with building and maintaining a fast, scalable, and resilient big data infrastructure. By leveraging Google BigQuery’s cloud-based approach, the time and cost traditionally dedicated to protecting data and guaranteeing uptime is nearly eliminated. With Google handling scalability, replication, protection, and recovery, organizations can focus more on gaining valuable insights, as opposed to infrastructure management.
Google BigQuery versus Alternative Solutions
The purpose of this paper is to help organizations understand and compare the direct and indirect costs that should be considered when choosing a solution to store their big data and perform queries against it. This paper compares a do-it-yourself (DIY) on-premises Hadoop cluster deployment with instance-based cloud services from AWS (Redshift with Kinesis) as well as the on-demand cloud service from Google (BigQuery).
With an on-premises Hadoop cluster, the organization must plan, deploy, maintain, and configure the physical hardware and software required to store the data and power the queries. Hadoop nodes are comprised of commodity servers populated with large NL-SAS drives that are used to store and protect the data. Substantial work must be done to administer, configure, and optimize both the hardware solution and the Hadoop/Hive software. AWS Redshift helps to greatly simplify the management and eliminate the maintenance and the need to physically administer the hardware. Like a Hadoop cluster, the AWS solution is based on the concept of nodes (albeit virtual nodes). To scale the deployment, similar nodes of a fixed compute and storage capacity are added simultaneously, sometimes resulting in provisioning more compute or storage capabilities in order to meet the requirements of the other.
Google’s BigQuery solution is completely serverless from the customer perspective. There are no nodes to plan, configure, or scale. The complexity of sizing, managing, and maintaining the physical infrastructure is handled behind the scenes by Google, so the burden is removed from the end-user. Figure 2 depicts the three solutions compared in this analysis and how they each implement compute, storage, administration, and query management.
Figure 2. Hadoop, AWS, and Google Implementations
[Diagram: side-by-side view of the Hadoop On-premises, AWS Redshift, and Google BigQuery implementations. Labels include analysts, data scientists, citizen data scientists, technical analysts, and Hadoop, database, and hardware administrators; the Hadoop cluster (NameNode/DataNodes) on a redundant network infrastructure; the Redshift leader node and compute nodes with S3 storage (ingestion) and AWS Kinesis (streaming); SQL queries; and the compute, storage, and software components of each solution.]
Source: Enterprise Strategy Group, 2017
ESG’s Economic Value Audit Process
ESG’s Economic Value Audit (EVA) process is a proven method for understanding, validating, quantifying, and modeling the economic value propositions of a product or solution. The process leverages ESG’s core competencies in market and industry analysis, forward-looking research, and technical/economic validation. The EVA audit process leverages interviews with real-world customers who have had experience with both Google BigQuery and alternative big data solutions to help qualitatively and quantitatively validate the benefits that Google BigQuery has brought to their operations. This information is then applied to help estimate the costs and benefits in ESG’s modeled scenarios depicted in this paper.
Economic Benefits of Google BigQuery
Google BigQuery’s serverless, on-demand query service was designed and priced to provide customers with insight quickly and economically. ESG’s Economic Value Audit process revealed that BigQuery can provide significant cost savings and economic benefit opportunities. ESG found that BigQuery customers have enjoyed significant economic and operational savings when compared with both on-premises Hadoop-based deployment and AWS’s Redshift cloud-based big data solution. These benefits fall mainly into three categories: elimination of upfront capital investment; operational and administrative savings; and lower cost of cloud services.
Upfront Capital Investment Savings
✓ An on-premises Hadoop deployment requires a very large capital investment to purchase nodes, networking infrastructure, software, and licenses.
✓ An on-premises Hadoop deployment requires a significant amount of planning, purchasing, configuration, and testing prior to providing any big data benefit to the organization. This results in additional upfront operational expenses and potentially less revenue due to a much longer time to value when compared with BigQuery.
✓ For both an on-premises Hadoop deployment (hardware nodes) and an AWS Redshift deployment (AWS virtual instances), the storage capacity is directly tied to the compute power and memory, and none can be scaled independently of the other. This can potentially result in overprovisioning of compute or storage resources in an effort to scale the other, due to the inflexibility of pre-defined machines.
✓ For both an on-premises Hadoop deployment and an AWS Redshift deployment, the organization must spend time to plan and size the deployment, often overprovisioning to accommodate the worst-case scenario.
✓ AWS Redshift customers often choose to pay upfront, reserved instance pricing and benefit from significant discounts versus running Redshift instances on demand. Customers can save up to 75% over the on-demand pricing by choosing 3-Year All-Upfront reserved instance pricing.
BigQuery is serverless and payments are often made completely on-demand, based only on the amount of data processed and stored per month. This means upfront payments, planning, purchasing, installation, configuration, node management, and testing are not required. BigQuery can provide value in the form of insight as soon as the data is made available on GCP storage. Multiple customers with whom ESG spoke reported that they were up and running queries on BigQuery in a matter of hours rather than days, weeks, or even months. Other customers who had experience with multiple solutions stated:
“We spent over $1M on a Hive-based warehouse that was always running out of memory and required constant tuning and maintenance. Upgrades created instability and completely took the system down.”
“…of course we chose to pay the upfront reserved instance price. It doesn’t make much sense not to.”
Removing the financial barrier of a large upfront investment in terms of time and money makes BigQuery an attractive platform for organizations looking to simply try out the service, or quickly scale their analytics capabilities.
Operational and Administrative Savings
✓ An on-premises Hadoop deployment requires more people to administer the solution, often including a hardware and software administrator, and one or more Hadoop administrators and/or operators.
✓ With an on-premises Hadoop solution, analysts usually must work with the Hadoop operators to translate and execute queries rather than executing the queries themselves.
✓ An on-premises deployment must be powered and cooled, and requires floor space, increasing the operational cost of the deployment.
✓ On-premises deployments often require the services of consultants to help tune the solution for optimal performance because those skills go beyond those of the typical Hadoop administrator.
✓ On-premises deployments must be maintained, including updating firmware, the OS, security patches, and Hadoop releases, as well as troubleshooting and resolving issues with the hardware and software.
✓ AWS Redshift is similarly based on the concept of nodes containing a fixed set of resources. This may make planning, deployment, and growth more complex than with BigQuery’s serverless service.
✓ AWS Redshift requires the user to log in to a “leader node” in order to run requests against the pool of Redshift instances. This added complexity means that less technical analysts may continue to leverage an operator to handle queries rather than performing the queries themselves.
BigQuery does not require dedicated administrators, and customers reported that it is simple enough for analysts to run queries on their own by cutting and pasting queries into a web-based self-service portal. In comparison, to use Redshift, analysts who are comfortable using a VPN can log in to a proxy server running in AWS to connect to the Redshift deployment. This proxy server can make it more prohibitive for multiple groups on different ACLs to share the same Redshift deployment, resulting in the need to deploy and manage separate instances for each department and further increasing the management complexity of the solution. When asked about operational costs, the customers with whom ESG spoke agreed that Redshift was far easier to manage, operate, and maintain compared with an on-premises Hadoop deployment, and also agreed that BigQuery was simpler to operate and manage than their Redshift deployment had been.
“The kind of people that can debug Hadoop exceptions are typically not close to the business. The error messages that come out of Hadoop are not well written and are difficult to troubleshoot. This led to retention issues: we lost good people because they were sitting around dealing with issues.”
“The web-client is a huge bonus. Analysts do not have to use an ODBC/JDBC, they just need a URL.”
The simplicity of BigQuery enables analysts and operators within the company to become citizen data scientists, empowering them to take control of their own queries, removing their dependence on others, and ultimately producing higher quality insight for the organization in a shorter amount of time.
Cloud Cost Savings
✓ Both Google BigQuery and AWS Redshift provide very well documented price lists and web-based pricing calculators to determine estimated cloud costs.
✓ Google BigQuery pricing is simple: Pay for your storage capacity and pay for the TBs processed each month. There is no complexity of figuring out sizing or risk of overprovisioning.
✓ Redshift On-Demand pricing is based on deploying virtualized instances (nodes) with fixed compute and storage requirements, meaning they cannot be scaled independently.
✓ Powering down Redshift nodes requires use of snapshots or migration of data. Only the smallest deployments generally benefit from paying for less than 24x7 operation.
✓ Redshift On-Demand pricing is generally quite high compared with BigQuery and also when compared with AWS Reserved Instance pricing options.
✓ Reserved Instance pricing on AWS provides a particular instance type, making it difficult to take advantage of the latest CPU or storage offerings over the length of your contract. BigQuery always makes use of the latest available technologies automatically and transparently.
BigQuery pricing is designed with simplicity and economics in mind. There are absolutely no upfront costs, and you simply pay for what you store and what you query. This is true “on-demand” pricing, without the need to plan, configure, tune, or update nodes. Customers may be wary of paying an on-demand per-TB processing fee without knowing exactly how much data they will process in a given month. ESG validated with customers and with internal data collected by GCP that the amount of data processed on average is very often equivalent to the amount of data stored. In many cases, this results in a very favorable cost advantage over Redshift (in which you must pay for enough instances to cover the storage requirements).
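The pay-for-what-you-store-and-query model described above can be sketched as a two-term formula. This is a minimal illustration only: the `OnDemandRates` type, the function names, and the rate values in the usage note are hypothetical placeholders, not figures taken from this paper or from a current price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDemandRates:
    """Hypothetical per-unit rates; substitute values from the provider's price list."""
    storage_per_tb_month: float  # cost to keep 1 TB stored for one month
    query_per_tb: float          # cost to process 1 TB of data in queries


def monthly_on_demand_cost(stored_tb: float, queried_tb: float, rates: OnDemandRates) -> float:
    """Monthly bill = storage charge + query-processing charge."""
    return stored_tb * rates.storage_per_tb_month + queried_tb * rates.query_per_tb


def three_year_on_demand_cost(stored_tb: float, queried_tb: float, rates: OnDemandRates) -> float:
    """Sum of 36 equal monthly payments (assumes flat usage, no data growth)."""
    return 36 * monthly_on_demand_cost(stored_tb, queried_tb, rates)
```

With placeholder rates of 20.0 per TB stored and 5.0 per TB queried, an organization that stores 500 TB and processes 500 TB per month would see `monthly_on_demand_cost(500, 500, OnDemandRates(20.0, 5.0))` evaluate to 12500.0. When processed volume tracks stored volume one-to-one, the bill scales linearly with data size.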
With BigQuery, pricing is simplified and planning is not required because it is not based on the concept of deploying preconfigured instances as big data nodes. BigQuery users do not have to estimate the hardware requirements of their configurations, and never pay for what is not required. They do not have to choose between paying very high monthly on-demand costs or locking themselves into a one- or three-year contract with a large upfront investment. The customers with whom ESG spoke shared their concerns with reserved instance pricing:
“…we made the switch to BigQuery and today we still have 9 months left in reserved instances on our Redshift nodes.”
ESG’s analysis in the following sections of this paper further illustrates the operational and pricing advantages that BigQuery holds over an AWS Redshift deployment.
ESG Economic Validation
To validate the economic benefits of Google BigQuery when compared with both an on-prem Hadoop deployment and AWS Redshift, ESG created requirements for three modeled organizations that are representative of typical small, medium, and large organizations performing queries against their data. The requirements of these organizations are shown in Table 1.
Table 1. Modeled Scenario Requirements
                          Small                      Medium                     Large
Number of Data Citizens   5 Analysts                 50 Analysts                100 Analysts
Amount of Data Stored     100 TB                     500 TB                     1 PB
TBs Queried per Month     50 TB (0.5x Data Stored)   500 TB (1x Data Stored)    4 PB (4x Data Stored)
Streaming Applications    No                         Yes (2% of Queries)        Yes (5% of Queries)
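The three modeled organizations in Table 1 can be captured as plain data, which makes the queried-to-stored ratios easy to check. This is a sketch only: the `Scenario` type and its field names are hypothetical, and 1 PB is assumed to equal 1,000 TB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    analysts: int
    stored_tb: int              # amount of data stored (assumes 1 PB = 1,000 TB)
    queried_tb_per_month: int   # TBs queried per month
    streaming_share: float      # fraction of queries that come from streaming


# Values transcribed from Table 1.
SCENARIOS = [
    Scenario("small", 5, 100, 50, 0.0),
    Scenario("medium", 50, 500, 500, 0.02),
    Scenario("large", 100, 1000, 4000, 0.05),
]


def query_to_storage_ratio(s: Scenario) -> float:
    """Monthly TBs queried divided by TBs stored (0.5x, 1x, 4x in Table 1)."""
    return s.queried_tb_per_month / s.stored_tb


for s in SCENARIOS:
    print(f"{s.name}: {query_to_storage_ratio(s)}x data stored")
```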
ESG modeled and compared the costs that the small, medium, and large organization might expect to pay over a three-year period to plan, purchase, deploy, operate, administer, and maintain a solution to perform their queries. The assumptions and costs used in the scenarios were based on the results of detailed interviews with BigQuery customers who have experience with two or more of the solutions compared and could help to quantify the costs and relative differences between their deployments.
As will be described in more detail later in this paper, there are many factors to consider when it comes to cloud services pricing. Because AWS’s three-year reserved instance (RI) pricing provided the lowest total costs over three years when compared with other AWS pricing options, ESG based the TCO analysis on this upfront pricing model. ESG also sized and priced on-premises Hadoop nodes and AWS Redshift instances that would be expected to provide similar performance results when compared with the BigQuery solution.
ESG leveraged knowledge of markets, the industry, and vendor solutions, as well as detailed interviews with Google and its customers to model and predict the costs to deploy, administer, manage, maintain, and operate each of the solutions. These costs were generally based on the number of employees and contracted services that were required for different sized deployments, as well as for the individual solutions. Wherever possible, direct comparisons between the solutions were used to gauge relative differences in man-hour requirements. ESG’s modeled three-year TCO analysis considered the high-level cost categories depicted in Table 2 and created detailed models to estimate the costs of the small, medium, and large scenarios for each of the three big data solutions.
Table 2. Modeled Scenario Requirements

Upfront Costs: Payment made prior to realizing value from the solution
  Hadoop On-premises
  • Cost of hardware, software, and licenses for Hadoop nodes
  • Cost of networking
  AWS Redshift
  • 3-year reserved instance pricing paid in full
  • Adjustment for time-value of money at 8% WACC
  Google BigQuery
  • No upfront costs

Three-year On-demand Cloud Costs: Expected monthly cloud services costs (sum of 36 monthly payments)
  Hadoop On-premises
  • No monthly cloud service costs
  AWS Redshift
  • No monthly cloud service costs for instances after paying upfront RI
  • Cost of streaming inserts paid to AWS Kinesis service (medium and large scenarios only)
  Google BigQuery
  • Cost of GCP storage
  • Cost of total monthly TBs processed
  • Cost of streaming inserts (medium and large scenarios only)

Operational Costs: Resources (money and manpower) used to get the system functioning and keep the solution operating (not including administration and queries)
  Hadoop On-premises
  • Cost of power, cooling, and floor space
  • Man-hour costs to plan and purchase Hadoop cluster
  • Man-hour costs to install, test, troubleshoot, and perform POC on cluster
  AWS Redshift
  • Man-hour costs to plan deployment size and purchase Redshift instances
  • Man-hour costs to migrate data, configure, test, and perform POC
  Google BigQuery
  • Man-hour costs to migrate data, configure, test, and perform POC

Administrative Costs: Expected costs to administer the solution on a daily basis
  Hadoop On-premises
  • Cost of query orchestration
  • Cost of Hadoop administration
  • Cost of database administration
  • Cost of consultants
  AWS Redshift
  • Cost of query orchestration
  Google BigQuery
  • Cost of query orchestration

Maintenance Costs: Cost of maintenance and support contracts
  Hadoop On-premises
  • Cost of maintenance and support contracts for Hadoop nodes and networking
  AWS Redshift
  • No cost to maintain the solution
  Google BigQuery
  • No cost to maintain the solution
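The table lists an adjustment for the time value of money at 8% WACC on the upfront reserved instance payment, but the paper does not spell out the formula. One plausible reading, shown here purely as an assumption, is the opportunity cost of tying up the upfront payment for the full term, compounded annually. The function name and the compounding convention are hypothetical.

```python
def capital_cost_adjustment(upfront: float, annual_rate: float = 0.08, years: int = 3) -> float:
    """Opportunity cost of paying `upfront` today instead of keeping it invested.

    Assumption: annual compounding at the weighted average cost of capital.
    The paper does not publish its own formula; this is one common convention.
    """
    return upfront * ((1 + annual_rate) ** years - 1)


def adjusted_upfront_cost(upfront: float, annual_rate: float = 0.08, years: int = 3) -> float:
    """Upfront payment plus the cost-of-capital adjustment."""
    return upfront + capital_cost_adjustment(upfront, annual_rate, years)
```

Under this convention, every 1,000 paid upfront carries roughly 260 of additional capital cost over three years at 8%. A pay-as-you-go service incurs no such adjustment because nothing is paid before value is realized.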
The results of the comparison showed that by leveraging the BigQuery service for their needs, the modeled organizations could save a total of between $532K and $2.7M over a three-year period, with the added benefits of utilizing a solution that integrates better into their existing environment, no upfront costs, and empowering citizen data scientists within the organization by removing the reliance upon an administrator to prepare, schedule, and execute queries.
Modeled Scenario #1: Small Organization
For the first scenario, ESG started by calculating the requirements to store a 100 TB data set and perform a mix of batch and ad-hoc queries that on average processed 50 TB of data per month (50% of the stored data capacity) for each of the three solutions (on-prem Hadoop, AWS Redshift, and Google BigQuery). ESG figured that the on-premises Hadoop infrastructure would require the deployment of a single Hadoop NameNode, and eight DataNodes. ESG assumed that Hadoop would leverage the standard triple redundancy and made sure that there was 20% extra storage capacity available for data growth (for both the on-premises and AWS instances). While this may sound unfair, it is in fact a common and necessary practice for both solutions as users would deploy more nodes to avoid filling 100% of the available capacity. In contrast, it is unnecessary to pay for storage growth in advance with BigQuery’s on-demand storage. The Hadoop upfront costs also include software licenses for the operating system, Hadoop, and Hive distributions. ESG calculated the expected maintenance and support contracts on the hardware, as well as the expected power, cooling, and floor space requirements, accounted for as operational costs.
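The Hadoop storage assumptions above (triple redundancy plus 20% growth headroom) imply a raw capacity requirement that can be sketched as follows. The function name is hypothetical, and the paper does not state the per-node drive capacity, so the sketch stops at total raw terabytes rather than deriving a node count.

```python
def hadoop_raw_capacity_tb(dataset_tb: int, replication: int = 3, headroom_pct: int = 20) -> int:
    """Raw disk (in TB) needed to hold `dataset_tb` with replication and growth headroom.

    Integer arithmetic is used so that the result rounds up cleanly.
    """
    needed_x100 = dataset_tb * replication * (100 + headroom_pct)
    return -(-needed_x100 // 100)  # ceiling division


# Small scenario: 100 TB data set, three copies, 20% headroom.
print(hadoop_raw_capacity_tb(100))
```

For the 100 TB small scenario this gives 360 TB of raw disk to purchase, power, and maintain before the first query runs, which illustrates why on-premises upfront costs grow much faster than the logical data size.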
ESG estimated that due to the capacity requirements, a Redshift customer would need to deploy eight ds2.8xlarge compute nodes as well as a single leader node for VPN access to the cluster and to generate queries. Because the three-year reserved instance pricing is paid upfront, ESG chose to account for this cost (as well as an additional 8% APR cost of capital adjustment) as an upfront cost, rather than an on-demand cloud cost. ESG calculated that the small organization could alternatively pay monthly on-demand cloud costs instead of paying all costs upfront. However, over a three-year period, the organization’s cumulative cost would be roughly 75% higher for the AWS solution.
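The capacity-driven node count can be sketched as a simple sizing rule. The sketch assumes 16 TB of storage per ds2.8xlarge node (AWS's published figure for that instance type, not stated in this paper), the 20% growth headroom described for the small scenario, 1 PB = 1,000 TB, and one leader node. The function name is hypothetical, and ESG's actual sizing method may have differed.

```python
def redshift_node_count(dataset_tb: int, node_tb: int = 16,
                        headroom_pct: int = 20, leader_nodes: int = 1) -> int:
    """Total nodes (compute + leader) sized purely on storage capacity.

    Assumptions: `node_tb` is usable storage per compute node and the data set
    needs `headroom_pct` percent of spare capacity. Integer math avoids
    floating-point rounding at exact multiples.
    """
    needed_x100 = dataset_tb * (100 + headroom_pct)
    compute_nodes = -(-needed_x100 // (100 * node_tb))  # ceiling division
    return compute_nodes + leader_nodes


for stored_tb in (100, 500, 1000):
    print(stored_tb, "TB ->", redshift_node_count(stored_tb), "nodes")
```

Under these assumptions the rule yields 9, 39, and 76 total nodes for the 100 TB, 500 TB, and 1 PB data sets, which lines up with the eight compute nodes plus leader node described here and the node totals reported for the medium and large scenarios. Because compute comes bundled with each storage-driven node, the customer pays for that compute whether or not the query workload needs it.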
Because there is no hardware to install, configure, maintain, power, and cool, once deployed, the operational costs for both the Redshift and BigQuery solutions would be minimal. ESG estimated the one-time man-hours to migrate the data and test both solutions before putting them into production. Beyond this, the BigQuery solution is completely free of additional maintenance and operational costs. The AWS Redshift solution, because it is based on the concept of deploying virtual nodes, would also be expected to require an upfront investment in man-hours to research, plan, and size the deployment to ensure it meets the requirements.
Administration costs for both Redshift and BigQuery were estimated to be far lower than those for the on-premises Hadoop deployment. With BigQuery, administration would be minimal: a single administrator would be expected to manage the account, manage user access, investigate advanced issues, and help guide other analysts to run their own queries via a web interface. ESG estimates that this would require far less than a full-time hire for smaller deployments, and could be easily accomplished by allocating responsibility to an existing analyst. The Redshift solution would be simpler to manage than the on-premises solution; however, access to the query engine is more complex than for BigQuery, placing more of a burden on the administrator to assist others, and making self-service more difficult for non-technical analysts.
ESG’s model estimated that Google’s BigQuery solution could save a small organization $881K over a three-year period when compared with investing in an on-premises Hadoop cluster, and over $532K when compared with using AWS Redshift. The results of ESG’s three-year TCO analysis for the 100 TB “small” modeled organization for each of the three solutions are shown in Figure 3.
Figure 3. Three-year Total Cost of Solution Summary for 100 TB “Small” Modeled Scenario
Source: Enterprise Strategy Group, 2017
Modeled Scenario #2: Medium Organization
The medium-sized organization required enough storage to store and protect a 500 TB data set, and performed a mix of batch and ad-hoc queries that on average processed 500 TB of data per month (100% of the stored data capacity). It should be noted that this 1:1 ratio of stored to processed data is a solid standard for most organizations to plan on, and was validated as the typical case across all BigQuery deployments. The medium-sized deployment also made use of streaming data services, which could be representative of data generated through web or mobile applications, or IoT sensors, for example.
ESG estimated that both the on-premises Hadoop cluster and AWS Redshift deployment would require a total of 39 nodes (including the NameNode and leader node). Upfront, maintenance, and operational costs were modeled for the three systems in the same manner described in the small scenario, scaled of course to accommodate the larger deployment size. Administration of the medium-sized deployment required two full-time administrators for the on-premises Hadoop deployment to oversee operations, assist analysts with queries, and administer the Hadoop deployment, database, and hardware. ESG assumed that a single query manager could manage the Redshift and BigQuery deployment in roughly the same number of man-hours as the small deployment. However, ESG assumed a 44% higher hourly rate for the manager based on the scale and complexity of the organization.
In order to accommodate the streaming service, the AWS Redshift solution made use of the AWS Kinesis service to continuously load data onto the Redshift nodes for querying. The Kinesis service offers on-demand pricing only, which can be complex to calculate. To predict pricing, one must consider throughput requirements, payload size, and data retention (number of shard-hours required, number of PUT payload units per million, and optional requirement to keep data longer than 24 hours). In contrast, BigQuery streaming inserts are simply billed per GB inserted.
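The difference in pricing structure can be sketched side by side. All rates below are required arguments rather than real prices, the function names are hypothetical, and the 25 KB PUT payload unit size is an assumption taken from AWS's published Kinesis pricing model rather than from this paper.

```python
import math

PUT_PAYLOAD_UNIT_KB = 25  # assumption: Kinesis counts each 25 KB chunk of a record as one unit


def put_payload_units(records: int, record_kb: float) -> int:
    """Each record consumes one unit per started 25 KB chunk."""
    return records * math.ceil(record_kb / PUT_PAYLOAD_UNIT_KB)


def kinesis_monthly_cost(shards: int, hours: int, records: int, record_kb: float,
                         shard_hour_rate: float, put_rate_per_million: float,
                         extended_retention_rate: float = 0.0) -> float:
    """Three pricing dimensions: shard-hours, PUT payload units, optional retention."""
    shard_hours = shards * hours
    cost = shard_hours * shard_hour_rate
    cost += put_payload_units(records, record_kb) / 1_000_000 * put_rate_per_million
    cost += shard_hours * extended_retention_rate  # only if data is kept past 24 hours
    return cost


def bigquery_streaming_monthly_cost(gb_inserted: float, rate_per_gb: float) -> float:
    """One pricing dimension: volume inserted."""
    return gb_inserted * rate_per_gb
```

Estimating the Kinesis bill requires knowing shard count, record count, and record size in advance, while the per-GB model needs only a volume forecast.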
ESG’s model estimated that Google’s BigQuery solution could save a medium-sized organization more than $2M over a three-year period when compared with investing in an on-premises Hadoop cluster, and over $1.7M when compared with using AWS Redshift. The results of ESG’s three-year TCO analysis for the 500 TB “medium” modeled organization for each of the three solutions are shown in Figure 4.
Figure 4. Three-year Total Cost of Solution Summary for 500 TB “Medium” Modeled Scenario
Source: Enterprise Strategy Group, 2017
Modeled Scenario #3: Large Organization
The large-sized organization required enough storage to store and protect a 1 PB data set, and performed a mix of batch and ad-hoc queries that on average processed an enormous 4 PB of data per month (400% of the stored data capacity). The large-sized deployment also made use of a larger percentage of streaming data services (5% of queries processed).
ESG estimated that both the on-premises Hadoop cluster and AWS Redshift deployment would require a total of 76 nodes (including the NameNode and leader node). Upfront, maintenance, and operational costs were modeled for the three systems in the same manner described in the small scenario, scaled to accommodate the larger deployment size. Administration of the large-sized deployment required three full-time administrators for the on-premises Hadoop deployment to oversee operations, assist analysts with queries, and administer the Hadoop deployment, database, and hardware. ESG assumed an 11% higher hourly rate (compared with the medium-sized scenario) for the administration of the BigQuery and Redshift solutions based on the large scale and complexity of the organization.
ESG’s model predicted that Google’s BigQuery solution could save a large organization more than $2.7M over a three-year period when compared with investing in an on-premises Hadoop cluster, and over $2.3M when compared with using AWS Redshift. The results of ESG’s three-year TCO analysis for the 1 PB “large” modeled organization for each of the three solutions are shown in Figure 5.
Figure 5. Three-year Total Cost of Solution Summary for 1 PB “Large” Modeled Scenario
Source: Enterprise Strategy Group, 2017
What the Numbers Mean
ESG’s models show that Google’s BigQuery service provides a simple and economical solution for organizations of all sizes. ESG’s models predict that organizations can expect to generate insight at a cost that is 60% to 88% lower than deploying and managing an on-premises Hadoop deployment, and 56% to 82% lower than utilizing the AWS Redshift service over a three-year period.
The big win, however, may be some of the “softer” benefits that BigQuery provides. BigQuery does not require an upfront investment, provides a faster time to value, is easier to manage for departmental use, and allows for more analysts to perform their own queries. The result is that organizations can spend less time managing software, hardware, and queries, and more time generating valuable insight.
Model Considerations: Pricing Options
On-demand cloud services often make sense for young and smaller organizations that are looking to get started in analytics without making a large upfront investment. Larger and more established organizations, however, often struggle to make sense of the numerous and complex pricing schemes that some cloud providers offer. When analyzing the investment options between an on-premises and AWS Redshift deployment, over 20 options may have to be modeled and considered (on-demand versus upfront pricing; one-year or three-year term; and nothing, partial, or all upfront, across four instance types, plus the Kinesis service). In contrast, modeling BigQuery pricing is simple.
ESG’s models found that when looking to minimize costs over a three-year period, it almost always makes the best financial sense to pay the three-year term, all upfront pricing option for AWS Redshift. The savings of up to 75% nearly force organizations to opt to pay long in advance of value realization. Figure 6 shows a sample analysis based on the “medium” scenario, comparing the highest and lowest cost AWS Redshift pricing options to BigQuery’s on-demand pricing.
Figure 6. Upfront versus Monthly Cost Comparison for “Medium” Scenario (Cumulative over Time)
Source: Enterprise Strategy Group, 2017
The differences in these costs are more apparent when they are broken out into one-time “upfront” costs, which are paid before any value is realized from the solution, and recurring “monthly” costs, which are paid as the value is realized from the solution. The differences between the solutions are shown in Figure 7.
Figure 7. Upfront versus Monthly Cost Comparison for “Medium” Scenario
Source: Enterprise Strategy Group, 2017
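The split between one-time upfront costs and recurring monthly costs can be illustrated with a minimal sketch. The function names and the dollar figures below are hypothetical and chosen only for illustration; they are not taken from ESG’s model or from any vendor price list.

```python
def cumulative_costs(upfront, monthly, months=36):
    """Cumulative spend at the end of each month; index 0 is the upfront payment alone."""
    return [upfront + monthly * m for m in range(months + 1)]


def break_even_month(series_a, series_b):
    """First month in which series_a's cumulative cost is no higher than series_b's, else None."""
    for month, (cost_a, cost_b) in enumerate(zip(series_a, series_b)):
        if cost_a <= cost_b:
            return month
    return None


# Hypothetical inputs (assumptions, not ESG data): one option is paid entirely
# upfront, the other is paid monthly as the service is used.
all_upfront = cumulative_costs(upfront=300_000, monthly=0)
pay_as_you_go = cumulative_costs(upfront=0, monthly=10_000)

print("Three-year totals:", all_upfront[36], pay_as_you_go[36])
print("Break-even month:", break_even_month(all_upfront, pay_as_you_go))
```

With inputs like these, the all-upfront option carries its full cost from the first day, long before its cumulative total drops below the monthly option. That is the timing difference a cumulative view is meant to expose, independent of which option is cheaper over the full term.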
Why This Matters
If a cloud service is not easy to plan, purchase, and deploy, organizations will have a harder time justifying the investment to decision makers. While some organizations may enjoy the option of paying upfront, or find value in consistent predictable pricing, the decision should be straightforward and should not result in being locked into a technology or vendor. This defeats the purpose of a service.
Google understood this when it created a simple, serverless on-demand pricing model for its BigQuery service. Organizations that demand dedicated resources or price predictability can decide to pay the monthly flat rate fee, or increase their compute requirements if needed. However, the majority of customers will benefit from the simple on-demand pricing and advances in server technology, with the freedom to change their mind at any time.
The Bigger Truth
Even though the smoke-filled days of the industrial age have long passed, the lessons learned remain. The challenges of industry once included how to quickly and cost-effectively mine, transport, and convert raw materials into usable products; businesses today are similarly tasked with collecting, manipulating, and converting raw data to yield actionable intelligence and valuable insight. Time has not transformed the technique for success: keep costs low and predictable, while simplifying operations, avoiding bottlenecks, and delivering maximum value in the minimal amount of time.
Like our industrial-age predecessors, the primary methodology employed by today’s organizations is to make a large upfront investment in physical infrastructure (a large factory), sized to meet the expected demand for the foreseeable future at great financial risk of over- or under-provisioning equipment (servers, network, storage, and software) and under- or over-staffing human resources (hardware and software administrators and experts). As the industrialists eventually found out, products can be produced with greater agility and less risk by leveraging massive global operations that specialize in production, while the company spends more cycles maximizing the monetary value of the end product.
Today’s analytics challenges are no different. Many organizations are simply not large enough to justify spending valuable time, resources, and money building, managing, and maintaining an on-premises Hadoop infrastructure when Google’s BigQuery on-demand, serverless technology can provide insight more cost-effectively, in a greatly simplified manner, and with less financial risk to the organization.
ESG’s models, built on the results of validation with BigQuery customers, show that organizations can expect to save between $881K and $2.7M over a three-year period by leveraging BigQuery instead of planning, deploying, testing, managing, and maintaining an on-premises Hadoop cluster. The models also show that BigQuery’s serverless design and simple pricing can provide a solution that is simpler to manage at a total cost that is between 56% and 82% less expensive than using AWS Redshift to store data and perform queries. Perhaps more importantly, the simplicity of the BigQuery solution accommodates economies of scale across departments and enables analysts to execute their own queries, allowing for greater organizational efficiency and quicker insight.
If your organization is looking to spend more time generating more value from your data, and less time managing the solution, while keeping costs to a minimum, then ESG suggests you consider letting Google handle the infrastructure while empowering your organization with the ability to focus on the insight.
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
www.esg-global.com [email protected] P.508.482.0188
Enterprise Strategy Group is an IT analyst, research, validation, and strategy firm that provides actionable insight and intelligence to the global IT community.