The Next Generation of Data Storage


www.apeirondata.com © 2017 Apeiron Data Systems. All rights reserved. Version 1.0

Executive Summary
1 Introduction
    1.1 The Storage Revolution Continues
    1.2 The Emergence of Real-time Applications
    1.3 Complications of Scale-out Applications
    1.4 Emergence of NVMe over Ethernet
    1.5 Apeiron Storage Technology Advantages
    1.6 Apeiron Storage Performance Advantages
2 Apeiron Technology
    2.1 Storage Solution Hardware Components
    2.2 How Apeiron VLUNs Work
    2.3 Other Apeiron Innovations
    2.4 Storage Management Interfaces
3 Applications for NVMe over Ethernet
    3.1 Splunk Enterprise
    3.2 High Performance Computing (HPC)
    3.3 Hadoop
    3.4 FinTech
4 Conclusions


Executive Summary

Storage technology has evolved over time to greater levels of performance, capacity scalability, and cost effectiveness. Future demands on storage will require advances of an order of magnitude to address the emergence of a new generation of applications driven by Big Data analytics. Going forward, market success will be achieved by storage vendors that can deliver a quantum leap in throughput with higher IOPS, full bandwidth, and new levels of low latency for business-critical application workloads. This new performance paradigm is required to meet rising customer service-level expectations and to make faster business decisions. Storage will be the key focal point as organizations of all sizes and vertical markets strive for a competitive advantage.

The emergence of a new technology called NVMe (Non-Volatile Memory Express) is a game-changing enabler of accelerated storage speed, but not all NVMe storage solutions are the same. In this document, we will explore the answers to the following questions:

• How did we get to where we are today with storage architectures?

• What has changed with new applications and customer workloads?

• What has Apeiron done to eliminate traditional storage bottlenecks, and what are the Apeiron architectural innovations that can achieve full bandwidth and greater throughput versus any competitive alternative?

• What are the specific use cases that require new storage architectures like Apeiron?

The future of storage infrastructure will be driven by the next wave of killer apps, which require a new storage architecture that can leverage the potential of NVMe technology in a way that eliminates I/O bottlenecks and delivers on the promise of Big Data decision support solutions.


1 Introduction

1.1 The Storage Revolution Continues

It has generally gone unnoticed that a storage revolution has been under way since the early 2000s. Enterprise storage today is a $60B market providing customers with storage-centric designs based on NAS or SAN architectures. Beginning in the late 1990s, Big Data and social media giants like Google, Facebook, and Yahoo began taking parallel processing techniques used for decades in high-performance computing (HPC) and applying them to clusters of consumer-grade x86 white-box servers with internal hard disk drives, later progressing to internal solid-state drives.

They developed software to manage petabytes of data spread across hundreds or even thousands of these low-end servers, replacing or eliminating many of the basic NAS/SAN features in the industry, including:

Capacity Optimization    Data Protection
Deduplication
Compression
Data Management
Replication
Snapshots
Clones

Table 1.1-1 Typical NAS and SAN Features Often Provided by Applications

By keeping disparate copies of data, this application software was able to build levels of redundancy and high availability much more cost-effectively and, in turn, end the need for SAN or NAS solutions in nascent Big Data markets.

1.2 The Emergence of Real-time Applications

While most of these applications were initially focused on analytics, by 2010 they began to find their way into real-time customer-facing applications, starting in the AdTech and web personalization areas. With the release of open source code from Google, Facebook, and Amazon, these applications quickly expanded across the Fortune 500, enabling corporations to extract value from the ever-growing amounts of machine and network data. This has resulted in differences of opinion regarding the distinction between Big Data and Fast Data, so Apeiron proposes the following descriptions for clarification.


Big Data     Data at quantities (volume) beyond the capability of common IT infrastructure to retain, manage, and process.
Fast Data    Data at speeds (velocity) beyond the capability of common IT infrastructure to capture, manage, and process.

Table 1.2-1 Descriptions of Big Data and Fast Data

Applications or analytic frameworks like Hadoop, Splunk, Spark, SAP HANA, and Security Onion quickly moved into critical business areas ranging from cybersecurity to transaction systems to FinTech. IDC reports that by 2020, over 70% of all Fortune 2000 corporations will have in place at least one real-time customer-facing Big Data application that is business critical. These clusters will require this new type of storage.

             Value    Variety    Velocity    Volume
Big Data     High     High       Low         High
Fast Data    High     High       High        Low

Table 1.2-2 Comparison of Big Data and Fast Data Characteristics

The key differences between Big Data and Fast Data are not the usefulness of their data (value) or diversity of data types (variety), but the rate at which data is created (velocity) and the amount of data retained (volume). Capturing current Fast Data, often in real time, for future Big Data analytics is an emerging performance (velocity) challenge for storage. Fast Data becomes Big Data, so storage capacity requirements for Big Data exceed those for Fast Data. Apeiron suggests an increasing number of Big Data applications will become provisioned by Fast Data applications.

1.3 Complications of Scale-out Applications

The rapid growth of these “scale-out” applications quickly became an issue for the enterprise storage providers, as their focus had been reliability and richness of features, as opposed to performance and total cost of ownership (TCO). IT managers deploying scale-out applications begin by installing small clusters of x86 servers with internal drives where the storage is managed by the application, not by expensive storage head controllers. Many of the same features that allow companies like Dell/EMC, HPE, and NetApp to charge higher prices for their SAN/NAS solutions now look out of place and overly expensive in Big Data environments.

Recently, new suppliers have also shown up with advanced versions of the same tired SAN architectures, but they have had limited success since they have ignored the new tenets of Big Data scale-out storage:


• IT Architectural Advances: The application software on the server now manages storage

• Storage Management Cost Effectiveness: Customers will no longer pay for enterprise storage functionality

• Greater Service Level Expectations: Users require the same performance from any external storage as they get with internal storage

Scale-out storage using internal SSDs scales poorly. IT managers must add servers today when additional storage is required, even when additional processing horsepower is not. These problems are exacerbated with the deployment of NVMe flash drives with new NAND and 3D XPoint™ media, where external controllers actively limit the drives’ performance. These issues have directly affected the market growth of applications like Hadoop and Splunk. IT managers are hesitant to grow their clusters past a certain size due to the following challenges, with infrastructure cost the number one complaint voiced by IT managers of large scale-out solutions.

Figure 1.3-1 Infrastructure Cost
Figure 1.3-2 Performance Drop at Scale
Figure 1.3-3 Lack of Expertise
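The coupling between capacity and server count described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the workload figures (10 compute servers, 400 TB of data, 16 TB of internal SSD per server) and the function names are hypothetical and do not come from this paper.

```python
import math

def servers_coupled(compute_servers: int, storage_tb: float, tb_per_server: float) -> int:
    """Servers needed when all storage lives inside the servers."""
    storage_servers = math.ceil(storage_tb / tb_per_server)
    # Whichever need is larger dictates how many servers must be bought.
    return max(compute_servers, storage_servers)

def servers_decoupled(compute_servers: int) -> int:
    """Servers needed when storage is pooled externally and scales on its own."""
    return compute_servers

# Hypothetical workload: compute needs 10 servers, data grows to 400 TB,
# and each server holds 16 TB of internal SSD.
coupled = servers_coupled(10, 400, 16)
decoupled = servers_decoupled(10)
print(f"internal storage: {coupled} servers, pooled storage: {decoupled} servers")
```

Under these assumed numbers, 15 of the 25 servers in the coupled design exist only to hold drives, which is the infrastructure cost complaint in a nutshell.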

1.4 Emergence of NVMe over Ethernet

This was the reason behind Apeiron’s development of NVMe over Ethernet. We wanted to give IT managers the performance they saw with drives installed in servers, with the benefits of external pooled storage. This enables data centers to scale processing and storage resources independently. While providing the elastic management of a large pool of NVMe SSDs under software tools like OpenStack, Docker, or Hadoop, NVMe over Ethernet can also provide performance to servers that is often better than that seen with SSDs installed directly, a fact that often surprises IT professionals. Now they can have the best of both worlds.

We began this design by building a lossless Ethernet architecture that can scale to thousands of external NVMe drives with latency overhead of less than 2 µs, built up of multi-port server HBAs and 2U NVMe shelves. Clusters scale with linear performance. We took a fresh approach to the design, implementing our data path using high-speed FPGAs, selecting Layer 2 Ethernet as a transport protocol, and passing NVMe commands natively across the fabric as shown in Figure 1.4-1. The native transport is critical to making sure no performance is lost in the pooled environment.


Figure 1.4-1 NVMe over Ethernet Native NVMe Protocol
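This paper does not publish the wire format, so the following is only a rough sketch of what "passing NVMe commands natively" over Layer 2 could look like: a standard 64-byte NVMe submission queue entry carried unmodified as the payload of an Ethernet frame. The EtherType (0x88B5, one of the IEEE local experimental values), the frame layout, the MAC addresses, and the function names are all assumptions made for illustration, not Apeiron's actual protocol.

```python
import struct

# Assumption: an IEEE "local experimental" EtherType stands in for whatever
# value the real fabric uses, which the paper does not state.
ETHERTYPE_LOCAL_EXPERIMENTAL = 0x88B5
NVME_OPC_READ = 0x02  # Read opcode in the NVM command set

def build_nvme_read_sqe(cid: int, nsid: int, slba: int, nblocks: int, prp1: int) -> bytes:
    """Pack a 64-byte NVMe submission queue entry for a Read command."""
    cdw10 = slba & 0xFFFFFFFF          # starting LBA, low 32 bits
    cdw11 = (slba >> 32) & 0xFFFFFFFF  # starting LBA, high 32 bits
    cdw12 = (nblocks - 1) & 0xFFFF     # number of logical blocks, zero-based
    return struct.pack(
        "<BBHIQQQQIIIIII",
        NVME_OPC_READ, 0, cid, nsid,   # opcode, flags, command ID, namespace ID
        0,                             # reserved (dwords 2-3)
        0,                             # metadata pointer, unused here
        prp1, 0,                       # PRP entry 1 and PRP entry 2
        cdw10, cdw11, cdw12, 0, 0, 0,  # command dwords 10-15
    )

def build_frame(dst_mac: bytes, src_mac: bytes, sqe: bytes) -> bytes:
    """Prepend a plain Layer 2 header; the NVMe command rides as the payload."""
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_LOCAL_EXPERIMENTAL)
    return header + sqe

sqe = build_nvme_read_sqe(cid=1, nsid=1, slba=0x1_0000_0000, nblocks=8, prp1=0x1000)
frame = build_frame(bytes.fromhex("020000000001"), bytes.fromhex("020000000002"), sqe)
print(len(sqe), len(frame))
```

The point of the sketch is that nothing is translated on the way: the command a local NVMe driver would place in a submission queue is the same 64 bytes that crosses the wire, addressed by MAC rather than by a connection table.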

1.5 Apeiron Storage Technology Advantages

Legacy solutions require the transport of data along with connection IDs. They build up tables of information on both the server and the storage unit about the connections. Unfortunately, this limits the ability of a system to take advantage of the knowledge about the storage information. The ADS1000 can accelerate the transfer of large I/O blocks or strings of data in server memory (i.e., multiple Physical Region Pages, or PRPs) by pulling I/Os ahead of normal NVMe completion queues. In legacy solutions, at the external storage unit, these commands must be rebuilt by processors in the data path to provide to the drives and must be processed serially. This takes more and more time as these connection tables grow, adds expense, and requires additional power and equipment space to accomplish.

The effect of this additional complexity can be seen when comparing the ADS1000 with all-flash arrays serving enterprise storage. On top of often being two or three times the size, they are three times the cost, with performance that is often a third of what internal SSDs provide (see the Apeiron Storage Comparison document).

This means customers must purchase additional expensive all-flash arrays to satisfy cluster performance needs, compared to those served by NVMe over Ethernet. In applications like Splunk, query performance can be 90x slower when using traditional NAS and SAN, as detailed in the Apeiron Reference Architecture for Splunk document. The performance differences have a dramatic effect on the cost of deployment and TCO, not just application performance.

Keeping total cost of ownership in mind, we have also embedded Ethernet switches in our equipment so that no additional equipment is needed, reducing infrastructure by up to 30% when compared to networks using InfiniBand or Fibre Channel. This collapses data center infrastructure by eliminating multiple transport protocols to manage and redundant external switches, and by enabling multi-path drive access for high availability. This allows Apeiron to deliver multiple petabytes of flash or server-class memory in one rack.


1.6 Apeiron Storage Performance Advantages

The same SSDs can deliver faster performance and more responsiveness when installed in ADS1000 systems than in a server’s internal storage bays. One reason this can occur is that server-installed NVMe SSDs connect to PCIe x4 buses, but ADS1000 systems support NVMe over Ethernet with Apeiron host adapters using PCIe x8 connections to servers. Figure 1.6-1 shows how the performance of different SSDs can improve by up to 1.5x when the SSDs are installed in an ADS1000 system instead of inside the server. Similarly, Figure 1.6-2 shows how responsiveness improves by up to 2x when the SSDs are installed in an ADS1000 system instead of inside the server.

Figure 1.6-1 SSD Performance in ADS1000 Versus Internal Server Storage Bay (DAS)

Figure 1.6-2 SSD Responsiveness in ADS1000 Versus Internal Server Storage Bay (DAS)
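The x4 versus x8 point can be checked with back-of-the-envelope link arithmetic. The sketch below assumes PCIe Gen3 signaling (8 GT/s per lane with 128b/130b encoding) on both sides; the paper states Gen3 only for the HBA slot, so applying it to the internal SSD slots is an assumption, and the figures ignore packet and protocol overhead.

```python
GT_PER_S = 8.0        # PCIe Gen3 raw signaling rate per lane, gigatransfers per second
ENCODING = 128 / 130  # PCIe Gen3 128b/130b line-encoding efficiency

def pcie_gen3_gbytes_per_s(lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    return lanes * GT_PER_S * ENCODING / 8

x4 = pcie_gen3_gbytes_per_s(4)  # ceiling for a single internal NVMe SSD slot
x8 = pcie_gen3_gbytes_per_s(8)  # ceiling for the host adapter slot
dual_40g = 2 * 40 / 8           # raw line rate of two 40G ports, in GB/s

print(f"x4: {x4:.2f} GB/s, x8: {x8:.2f} GB/s, dual 40G: {dual_40g:.1f} GB/s")
```

The x8 link has twice the ceiling of an x4 slot, so a server reading through the adapter is not bound by the link limit of any one drive bay. How much of that headroom shows up in practice depends on the drives and workload, which is what the figures above report.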


2 Apeiron Technology

2.1 Storage Solution Hardware Components

Along with these issues, Apeiron resolved multiple common scale-out storage issues affecting the industry. We began by addressing storage metadata issues that limit many storage solutions as they grow. Our endpoint design allows true SSD virtualization and provides better fault isolation than normally seen with other technologies.

For HPC, we also provide a native server-to-server messaging technology that enables users to eliminate expensive, sprawling InfiniBand infrastructure, creating a converged network fabric. Having a central storage management agent allows IT managers to quickly integrate Apeiron systems into common data center management systems.

Figure 2.1-1 Apeiron Storage System
Figure 2.1-2 Apeiron Storage Server

Figures 2.1-1 and 2.1-2 show the basic Apeiron solution components. We offer a dual 40G host bus adapter (ADS-40G HBA) that is installed in a standard half-height, half-length server PCIe Gen3 x8 slot. This presents itself as an SSD, or endpoint, to the server, breaking the path to the external storage and enabling true NVMe virtualization.

The Apeiron driver allows the HBA to build virtual LUNs (VLUNs) for the client applications, built up of portions of remote NVMe SSDs, or entire SSDs, which are then presented as block storage drive entries in the server /dev directory (see Figure 2.1-2). These VLUNs can be expanded on the fly through the Apeiron Storage Manager (ASM). Data stored on SSDs can be set up to be spread across drives to improve performance or mirrored for high availability.


Figure 2.1-3 ADS1000 VLUNs in a Hadoop Application

2.2 How Apeiron VLUNs Work

As these VLUNs grow, the metadata stored on each server is simply a unique MAC address assigned to each remote SSD along with a logical block range on the device. This allows the connection information to be limited to a handful of entries and allows cluster performance to scale without limitation. VLUNs can be expanded, or new VLUNs can be added, as applications require. In NoSQL applications, which prefer to start more processor threads as workloads grow, adding more individual LUNs that can be pinned is ideal, whereas with applications storing large volumes of data, like Hadoop, growing large individual SSDs can be helpful.
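To illustrate how small this per-server metadata can be, here is a minimal sketch of a VLUN as an ordered list of (MAC address, logical block range) entries. The class names, field names, MAC addresses, and the choice to concatenate ranges are assumptions for illustration; the paper does not describe the driver's actual data structures, and striping or mirroring would need a different mapping.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Extent:
    mac: str         # MAC address assigned to the remote SSD
    start_lba: int   # first logical block of the range on that SSD
    num_blocks: int  # length of the range in logical blocks

@dataclass
class Vlun:
    extents: list[Extent] = field(default_factory=list)

    def expand(self, extent: Extent) -> None:
        """Grow the VLUN on the fly by appending another block range."""
        self.extents.append(extent)

    def resolve(self, vlun_lba: int) -> tuple[str, int]:
        """Map a VLUN-relative LBA to (SSD MAC address, LBA on that SSD)."""
        if vlun_lba < 0:
            raise IndexError("LBA must be non-negative")
        offset = vlun_lba
        for ext in self.extents:
            if offset < ext.num_blocks:
                return ext.mac, ext.start_lba + offset
            offset -= ext.num_blocks
        raise IndexError("LBA beyond the end of the VLUN")

vlun = Vlun()
vlun.expand(Extent("02:00:00:00:00:01", start_lba=0, num_blocks=1000))
vlun.expand(Extent("02:00:00:00:00:02", start_lba=5000, num_blocks=500))
print(vlun.resolve(999), vlun.resolve(1000))
```

Each lookup touches only a handful of entries no matter how many servers or drives the cluster holds, which is the property the section attributes to the MAC-plus-range scheme.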

In some applications like shared cache (e.g., Memcached), having multiple servers access a single drive can also be required. The ADS1000 provides permissions, set up by the ASM, to enable this ability; however, in cases of multiple-server write access, locking and unlocking must be managed by the application. This flexible metadata management offers both performance at scale and the application flexibility needed by scale-out applications. Connection data management in large clusters has troubled engineers for decades.

SSDs mounted in the ADS1000 are tracked by both their MAC addresses and unique manufacturing serial numbers. SSDs can be hot plugged, removed, moved to other shelves, and restarted without affecting applications. Hot plugging is a tremendous problem for networks based on other fabric technologies like PCIe because they force an application-stopping re-enumeration cycle. Additional “hot” SSDs can be added to the ADS1000 and assigned to servers by the ASM as needed, or through set rules (e.g., drives in use reach set storage limits).

2.3 Other Apeiron Innovations

On the storage shelf (ADS1000) there are a number of innovations. The first improvement is that each ADS1000 provides 32 individual 40G connections. Today’s storage units normally offer a handful of connections, which throttles the SSD performance passed to servers. Intel 3D XPoint™ SSDs can routinely present over 2 GB/s of both read and write bandwidth. Samsung NVMe SSDs can provide close to 3 GB/s of sustained read bandwidth. With 24 individual SSDs, seeing combined bandwidth over 50 GB/s is not unusual.
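The 50 GB/s figure follows from simple multiplication, and the same arithmetic shows why the port count matters. The sketch below uses only the numbers quoted above (24 SSDs, 2 to 3 GB/s each, 32 ports of 40G); the constant and function names are illustrative, and line rates are raw, before Ethernet framing overhead.

```python
SSDS_PER_SHELF = 24
PORTS_PER_SHELF = 32
PORT_GBITS = 40

def aggregate_gbytes(per_ssd_gbytes: float, ssds: int = SSDS_PER_SHELF) -> float:
    """Combined drive bandwidth in GB/s if every SSD runs at the given rate."""
    return per_ssd_gbytes * ssds

low = aggregate_gbytes(2.0)                # each SSD at 2 GB/s
high = aggregate_gbytes(3.0)               # each SSD at 3 GB/s
fabric = PORTS_PER_SHELF * PORT_GBITS / 8  # raw line rate of all ports, in GB/s

print(f"drives: {low:.0f} to {high:.0f} GB/s, ports: {fabric:.0f} GB/s")
```

With drives that each exceed 2 GB/s, the shelf total lands above 48 GB/s, and the 32 ports offer more raw line rate than the drives can fill. A unit with only a handful of 40G ports would cap throughput well below what the SSDs can deliver.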

The ADS1000 was designed to pass all of this performance to application servers, which is a basic requirement for scale-out storage. In addition, these units can be daisy-chained together with these connections to provide larger amounts of storage and to provide high availability. Creating a single rack that can provide several petabytes of SSD storage along with tens of client servers is routine. Apeiron’s innovation is built on a software-defined storage strategy, as the ADS1000 can support all commercially available NVMe SSDs.

Internal to the ADS1000 there are Ethernet switches, dual I/O modules (IOMs), and dual fans and power supplies. These are all field-replaceable units. The IOMs present a server root complex to the 24 internal NVMe SSDs so they think they are installed in a server. The switches are dual purpose. They eliminate the need for external switches, limiting data center sprawl and cost. They also allow for high availability by providing multiple paths to each drive.

Figure 2.3-1 Apeiron NVMe over Ethernet (Native Internal Switches)

Figure 2.3-2 Traditional Ethernet NAS (Requires External Switches)

Figure 2.3-3 Traditional Fibre Channel SAN (Requires External Switches)

Providing multiple paths through the internal switches eliminates the need for expensive multi-port SSDs. Scale-out applications typically make use of data replicas to provide redundancy. In small clusters, these replicas can be placed across the two IOMs in a single ADS1000, and as the number of ADS1000s grows the replicas can be spread across units or racks. The ADS1000 also provides multiple 40G paths internally between IOMs, allowing alternate connections to drives for redundancy.

2.4 Storage Management Interfaces

Along with the innovations in architecture and hardware, Apeiron spent a good deal of time developing a comprehensive solution to manage distributed storage. Because the clusters now incorporate the application servers along with storage elements, an Apeiron Storage Manager (ASM) was required that can reside on the network to monitor the state of all components: application servers, HBAs, ADS1000 units, internal switches, and SSDs.

Figure 2.4-1 Apeiron Storage User Interface

However, when configuring storage, the Apeiron CLI agent, which may be installed anywhere in the network, has access to the REST API, or the AUI. Examples of the Apeiron Storage User Interface and the Apeiron CLI are shown in Figures 2.4-1 and 2.4-2, respectively.

Figure 2.4-2 Apeiron CLI

user@mars1> asmctl -e ADS1000v1-12006 show drives
Slot: 0
Enclosure Name: ADS1000v1-12006
Size: 7.64TB
SN: SN35248965245
Drive ID: DRVab4de569b3ca45
Updated: 3 seconds ago
Slot: 1
Enclosure Name: ADS1000v1-12006
Size: 7.64TB
SN: SN35248965395
Drive ID: DRVab4de569b3ca99
Updated: 3 seconds ago
Slot: 15
Enclosure Name: ADS1000v1-12006
Size: 7.64TB
SN: SN35248965355
Drive ID: DRVab4de569b3ca59
Updated: 3 seconds ago
Slot: 16
Enclosure Name: ADS1000v1-12006
Size: 7.64TB
SN: SN35248965356
Drive ID: DRVab4de569b3ca61
Updated: 3 seconds ago
user@mars1>
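Output like the listing above is easy to fold into scripts. The following is a minimal sketch of a parser for the `show drives` text, assuming the six field names seen in the listing and one field per line; the exact output layout of the real CLI is not documented here, and `parse_drives` is a hypothetical helper, not part of any Apeiron tool.

```python
import re

# Field names as they appear in the sample listing.
FIELDS = ("Slot", "Enclosure Name", "Size", "SN", "Drive ID", "Updated")
FIELD_RE = re.compile(r"\b(" + "|".join(map(re.escape, FIELDS)) + r"):\s*")

def parse_drives(output: str) -> list[dict[str, str]]:
    """Turn `show drives` text into one dict per drive, keyed by field name."""
    drives: list[dict[str, str]] = []
    # re.split with a capturing group yields [preamble, key, value, key, value, ...]
    parts = FIELD_RE.split(output)
    for key, raw in zip(parts[1::2], parts[2::2]):
        if key == "Slot":  # each record starts with its slot number
            drives.append({})
        if drives:
            lines = raw.strip().splitlines()
            # Keep only the first line so a trailing shell prompt is dropped.
            drives[-1][key] = lines[0].strip() if lines else ""
    return drives

SAMPLE = """user@mars1> asmctl -e ADS1000v1-12006 show drives
Slot: 0
Enclosure Name: ADS1000v1-12006
Size: 7.64TB
SN: SN35248965245
Drive ID: DRVab4de569b3ca45
Updated: 3 seconds ago
Slot: 1
Enclosure Name: ADS1000v1-12006
Size: 7.64TB
SN: SN35248965395
Drive ID: DRVab4de569b3ca99
Updated: 3 seconds ago
user@mars1>"""

drives = parse_drives(SAMPLE)
for drive in drives:
    print(drive["Slot"], drive["SN"], drive["Size"])
```

For anything beyond quick inspection, the REST API mentioned above would likely be a sturdier integration point than scraping CLI text, since text layouts tend to change between releases.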


3 Applications for NVMe over Ethernet

3.1 Splunk Enterprise

Apeiron’s NVMe over Ethernet architecture is ideal for a variety of scale-out, high performance computing, and parallel storage applications. Applications like the core Splunk Enterprise, and extended products including Splunk Enterprise Security (ES), Splunk IT Service Intelligence (ITSI), and Splunk User Behavior Analytics (UBA), are designed to ingest and allow queries on petabytes of machine and network data. Due to I/O performance limitations, searchable data is often constrained and passed on to alternate offline storage solutions as it ages. This dramatically limits the value of the data. Slow I/O performance means data must be spread across additional servers to provide query SLA performance.

Figure 3.1-1 Splunk Enterprise

Critical applications like the Enterprise Security module in Splunk must be turned off or run fewer times per day. This creates more corporate exposure to attacks and means intrusions take much longer to analyze. These issues also extend the time required to stand up new Splunk installations, as Splunk-certified engineers report that 80% of their time is spent today on performance tuning due to storage equipment performance inadequacies.

Splunk Search Type    Splunk Events Found    Splunk Reference Search Time    Apeiron ASA Search Time    Apeiron ASA Advantage
Rare                  23                     11.2                            2.1                        5.3x
Rare                  115                    11.2                            6.1                        1.8x
Super Sparse          26,802                 1,112.2                         12.6                       88x
Super Sparse          180,850                1,112.2                         19.2                       58x
Sparse                155,459,317            31,091.9                        2,842.2                    10.9x
Sparse                1,126,745,647          225,349.1                       14,985.3                   15.0x

Table 3.1-1 Splunk Search Performance with Apeiron
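The advantage column in Table 3.1-1 is the reference search time divided by the Apeiron search time. The short sketch below recomputes that ratio from the table's own numbers; the variable and function names are illustrative, and the table does not state the time unit, so none is assumed.

```python
# (search type, events found, reference time, Apeiron time, quoted advantage)
ROWS = [
    ("Rare", 23, 11.2, 2.1, 5.3),
    ("Rare", 115, 11.2, 6.1, 1.8),
    ("Super Sparse", 26_802, 1_112.2, 12.6, 88),
    ("Super Sparse", 180_850, 1_112.2, 19.2, 58),
    ("Sparse", 155_459_317, 31_091.9, 2_842.2, 10.9),
    ("Sparse", 1_126_745_647, 225_349.1, 14_985.3, 15.0),
]

def advantage(reference_time: float, apeiron_time: float) -> float:
    """Speedup factor: how many times faster the Apeiron search completed."""
    return reference_time / apeiron_time

for kind, events, ref, ads, quoted in ROWS:
    computed = advantage(ref, ads)
    print(f"{kind:<13}{events:>15,}  computed {computed:6.1f}x  quoted {quoted}x")
```

The recomputed ratios match the quoted column after rounding, with the largest gains in the Super Sparse searches.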


Over the last several years, non-volatile memory pricing has gone down by 4x while SSD sizes have grown by more than 10x. This enables Splunk environments to keep months, or even years, of data active in a cost-effective manner. With the entire drive performance presented to application servers, we have demonstrated query improvements of up to 90x while accessing multiple years of data, and while cutting the number of servers used by up to 80%. See the Apeiron Reference Architecture for Splunk document for details.

3.2 High Performance Computing (HPC)

For many years, HPC was characterized by large single-purpose clusters with expensive, proprietary hardware tuned to specific applications. Over the last decade, a good deal of work has been done to create more general-purpose multi-tenant clusters that can support multiple concurrent users and different applications. The ADS1000 was designed to support this movement. It works hand in hand with parallel file systems like Lustre in order to quickly warm up client data sets like those used by the world’s most powerful supercomputers as ranked by the TOP500 list (www.top500.org).

Figure 3.2-1 TOP500

Apeiron supports high-speed server-to-server messaging using the Apeiron storage fabric so that no external switching, interface cards, or alternative protocols are required. The messaging uses direct memory access with the same queues and buffers designed for NVMe over Ethernet storage in order to get single-digit microsecond transfers using a standard OpenFabrics Alliance (OFA) software stack (see www.openfabrics.org). This makes the ADS1000 particularly useful in FinTech applications using TIX messaging protocols or for scientific analysis, as in genomic sequencing.

3.3 Hadoop

Key challenges of Hadoop cluster implementations are how to accelerate performance to make faster business decisions without breaking the bank, and how to deal effectively with data center sprawl. When faced with I/O storage infrastructure limitations, the answer is generally to increase the number of servers. However, Hadoop management solutions from Cloudera or Hortonworks charge on a per-server basis. When you are scaling the clusters only to get additional storage, this becomes an expensive proposition.


Figure 3.3-1 Apache Hadoop

This has limited Hadoop to applications where performance is not critical, like large data lakes built on HDDs or overnight routine analytics. However, Hadoop with building blocks like Spark is ideally suited to real-time pipelined processes for Deep Learning, or real-time analytics on petabyte-scale data sets.

Using Hadoop with Apeiron external NVMe SSDs versus internal HDDs increases Hadoop read performance by 49.5x and write performance by 11.6x while reducing the number of data nodes required by 50%. Even when performance is compared with internal SSDs, Apeiron accelerates Hadoop read performance by 8.7x and write performance by 2.6x with the same data node reduction. Finally, this level of Apeiron performance is achieved with 40% fewer Hadoop servers.

                SATA HDDs    SATA SSDs    Apeiron NVMe SSDs    Apeiron Advantage
Servers         7            7            4                    40% fewer servers required
Data nodes      6            6            3                    50% fewer data nodes required
Disks           12           12           12                   Flexible external vs. internal storage
Read (MB/s)     722          4,087        35,764               49.5x or 8.7x faster performance
Write (MB/s)    218          952          2,526                11.6x or 2.6x faster performance

Table 3.3-2 Comparison of Hadoop Deployments with HDD, SSD, and Apeiron Storage
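The multipliers in Table 3.3-2 can be recomputed from the throughput and node counts in the same table. The sketch below does that arithmetic; the dictionary and function names are illustrative only.

```python
APEIRON = "Apeiron NVMe SSDs"

READ_MB_S = {"SATA HDDs": 722, "SATA SSDs": 4_087, APEIRON: 35_764}
WRITE_MB_S = {"SATA HDDs": 218, "SATA SSDs": 952, APEIRON: 2_526}
SERVERS = {"SATA HDDs": 7, "SATA SSDs": 7, APEIRON: 4}
DATA_NODES = {"SATA HDDs": 6, "SATA SSDs": 6, APEIRON: 3}

def speedup(metric: dict[str, int], baseline: str) -> float:
    """Apeiron throughput divided by the baseline configuration's throughput."""
    return metric[APEIRON] / metric[baseline]

def reduction_pct(counts: dict[str, int], baseline: str) -> float:
    """Percentage fewer units needed with Apeiron than with the baseline."""
    return 100 * (counts[baseline] - counts[APEIRON]) / counts[baseline]

for baseline in ("SATA HDDs", "SATA SSDs"):
    print(
        f"vs {baseline}: read {speedup(READ_MB_S, baseline):.2f}x, "
        f"write {speedup(WRITE_MB_S, baseline):.2f}x, "
        f"servers -{reduction_pct(SERVERS, baseline):.0f}%, "
        f"data nodes -{reduction_pct(DATA_NODES, baseline):.0f}%"
    )
```

The recomputed speedups agree with the quoted multipliers to within about 0.1, and the data node reduction is exactly 50%. Going from 7 servers to 4 works out to roughly 43%, so the quoted 40% reads as a rounded-down figure.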

3.4 FinTech

A compelling Big Data application in the FinTech area that is ideal for Apeiron is pre- and post-trade analytics. NVMe over Ethernet is ideally suited to these applications since its I/O performance eliminates large numbers of unneeded servers and provides for dramatic query performance improvements. By providing different types of SSD media all in one storage environment, high-performance media like Intel’s Optane™ SSDs can be used for block cache or high-performance storage pools of hot data, while standard NAND SSDs can be used to store warm data, all managed by resource managers like YARN or Mesos. This allows applications to tailor their use of storage media to the data needs. For more information, please read the Hadoop Reference Architecture document.


4 Conclusions

While the corporate world has been rapidly deploying real-time scale-out applications, current storage products quickly become the limiting barrier as data sets grow. Simple clusters of x86 servers, which work well when data sets are small, become bottlenecks as storage needs outpace processor needs. With many software business models tied to the number of servers, this problem is magnified with escalating costs. IT managers are faced with sprawling hardware put in place only to get more storage capacity and then, in turn, are billed for servers that are only managing storage.

The ADS1000 was architected from the ground up to address the unique problems faced with scale-out applications.

• Server Consolidation: Dramatic levels of performance minimize servers.

• Total Cost of Ownership (TCO) Advantage: Storage and processing can be scaled independently.

• Business Critical Decision Making Capability: Response times seen by applications are dramatically improved and the amount of active data no longer has to be managed, allowing much richer query responses.

• Application Efficiencies: In applications like Splunk you no longer have to slow or turn off tools like Enterprise Security, and your queries can provide human-time answers.

• Performance Acceleration: Data can be placed on SSD storage types that make sense for the type of data being used, as opposed to the limited SSDs a vendor offers.

Scale-out deep learning and analytic applications offer businesses the ability to extract value from massive amounts of data, but as these data sets grow, a new type of scale-out storage is needed. Both enterprise SAN/NAS products and arrays of x86 servers with internal storage fall short. The Apeiron ADS1000 storage has been architected to address the specific needs of these strategic Big Data scale-out applications.