DriveScale-HDP Reference Architecture
Contents
1. Executive Summary
2. Audience and Scope
3. Glossary of Terms
4. DriveScale | Hortonworks Data Platform - Apache Hadoop Solution Overview
5. DriveScale Components Overview
5.1 Hardware: DriveScale Adapter Chassis with DriveScale Controllers
5.2 Software
5.3 Conceptual Diagram of the DriveScale Solution
6. Benefits of the DriveScale Solution
7. Reference Architecture Details
7.1 Physical Cluster Components and Configuration List
7.2 Logical Cluster Topology
7.3 Physical Cluster Topology
7.4 Cluster Management
7.4.1 Setting Up the DriveScale Cluster
7.4.2 Setting Up the Hortonworks Data Platform Cluster
7.5 Disk and Filesystem Layout
7.6 OS Supportability/Compatibility Matrix
8. Rack Scalability
9. References
10. Bill of Materials
11. Conclusion
1. Executive Summary
This document is a high-level design reference architecture guide for implementing Hortonworks Data Platform (HDP) on a DriveScale solution with industry-standard servers and JBODs. The reference architecture introduces all the high-level components, hardware, and software included in the stack, and then describes each component individually. It does not describe the Hortonworks Data Platform components or their applications.

DriveScale Technology Overview

DriveScale is leading the charge in bringing hyperscale computing capabilities to mainstream enterprises. Its composable data center architecture transforms rigid data centers into flexible and responsive scale-out deployments. Using DriveScale, data center administrators can deploy independent pools of commodity compute and storage resources, automatically discover available assets, and combine and recombine these resources as needed. The solution is provided through a set of on-premises and SaaS tools that coordinate between multiple levels of infrastructure. With DriveScale, Hadoop architects can more easily support Hadoop deployments of any size as well as other modern application workloads.

DriveScale provides hardware and software technology that allows compute and storage to be deployed separately, using commodity diskless servers and JBODs (Just a Bunch of Disks), with flexible binding of storage to compute resources in whatever ratio an application requires. As needs change, these bindings can be dissolved and reconfigured on demand, all under software control.

DriveScale technology acquires a deep understanding of the physical infrastructure and dynamics of a data center. It uses this understanding to provide an integrated set of intelligence and automation tools for scale-out data center infrastructure, greatly simplifying and optimizing the data center's operations.

Hortonworks Data Platform Technology Overview

HDP is the industry's only truly secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). YARN (Yet Another Resource Negotiator) allocates resources among the applications running on a cluster, enabling enterprises to ingest and analyze data for diverse use cases. YARN also coordinates cluster-wide services for operations, data governance and security. HDP is interoperable with a broad ecosystem of data center and cloud providers, and provides centralized management and monitoring of clusters. With HDP, security and data governance are built into the platform. HDP addresses the complete needs of data at rest, powers real-time customer applications and delivers robust analytics that accelerate decision making and innovation.
2. Audience and Scope
This reference architecture guide is for Hadoop and IT architects who are responsible for the design and deployment of Apache Hadoop solutions on premises. It is also for Apache Hadoop administrators and architects, and for data center architects/engineers who collaborate with specialists in that space.
3. Glossary of Terms
DataNode: Worker nodes of the cluster to which the HDFS data is written.
HBA: Host Bus Adapter. An I/O controller used to interface a host with storage devices.
HDD: Hard Disk Drive.
HDFS: Apache™ Hadoop® Distributed File System.
High Availability (HA): Configuration that addresses availability issues in a cluster. In a standard configuration, the NameNode is a single point of failure (SPOF). Each cluster has a single NameNode, and if that machine or process becomes unavailable, the cluster as a whole is unavailable until the NameNode is either restarted or brought up on a new host. The Secondary NameNode does not provide failover capability. High Availability enables running two NameNodes in the same cluster: an active NameNode and a standby NameNode. The standby NameNode allows a fast failover in case of a machine crash or planned maintenance.
JBOD: Just a Bunch of Disks. JBOD is an alternative to a RAID configuration. Rather than configuring the drives to use a RAID level, the disks within the array are either spanned or treated as independent disks.
Job History Server: Process that archives job metrics and metadata. One per cluster.
NameNode: The metadata master of HDFS, essential for the integrity and proper functioning of the distributed file system.
NodeManager: The process that starts application processes and manages resources on the DataNodes.
NIC: Network Interface Card.
HDP™: Hortonworks® Data Platform (includes the Hadoop Distributed File System, HDFS).
PDU: Power Distribution Unit.
NTP: Network Time Protocol.
YARN: Yet Another Resource Negotiator. A rewrite that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. Included in HDP.
OS: Operating System.
RM: ResourceManager. The resource management component of YARN. It initiates application startup and controls scheduling on the DataNodes of the cluster (one instance per cluster).
ToR: Top of Rack.
ZK: ZooKeeper. A centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services.
DSC: DriveScale Central. A web-based user interface to the DriveScale cloud that performs DriveScale account management. DSC is where you find and download the keys that enable installation of the DriveScale software, and then set up your DriveScale Management Domain(s) (DMDs).
DMD: DriveScale Management Domain. This is where you create your domain, select and configure the DMS nodes for the domain, and select a chassis (with its associated DriveScale Adapters, DSAs) for the domain.
DMS: DriveScale Management Server. The server that runs the software bundle (service) that manages a set of physical resources to enable the DriveScale services. DriveScale Manager is the web-based user interface to the DMS.
DSA chassis: DriveScale Adapter chassis. A 1RU chassis that hosts four Ethernet-to-SAS controllers. These controllers bridge the 10 Gbps Ethernet network that connects compute resources to JBODs full of commodity disks.
DSA controller: DriveScale Adapter controller. An Ethernet-to-SAS controller that bridges the 10 Gbps Ethernet network connecting compute resources to JBODs full of commodity disks.
MLAG: Multi-chassis Link Aggregation. The ability of two or more switches to act as a single switch when forming link bundles.
4. DriveScale | Hortonworks Data Platform - Apache Hadoop Solution Overview
The DriveScale | Hortonworks Data Platform (HDP) solution addresses customers' changing requirements for a more flexible and dynamic hardware infrastructure that provides significant cost and operational benefits. It is designed with composability as the primary goal: saving money, improving utilization and greatly simplifying the deployment of Hadoop clusters. Hadoop is an Apache project developed in the Java programming language by a global community of contributors. Yahoo! has been the largest contributor to this project and uses Apache Hadoop extensively across its businesses. Core committers on the Hadoop project include employees from Cloudera, eBay, Facebook, Getopt, Hortonworks, Huawei, IBM, InMobi, INRIA, LinkedIn, MapR, Microsoft, Pivotal, Twitter, UC Berkeley, VMware, WANdisco, and Yahoo!, with contributions from many more individuals and organizations.
Although Hadoop is popular and widely used, installing, configuring, and running a production Hadoop cluster involves many considerations, including:
• Choosing the appropriate Hadoop software distribution and extensions
• Installing monitoring and management software
• Allocation of Hadoop services to physical nodes
• Selection of appropriate server hardware
• Right-sizing the storage configuration
• Implementing data locality
• Design of the network fabric
• Sizing and system scalability
• Overall performance
This is complicated by the need to understand the workloads that will run on the cluster, by the fast pace of the core Hadoop project, and by the challenges of managing a system designed to scale to thousands of nodes in a single instance.
Together, the DriveScale and Hortonworks Data Platform solutions embody all the hardware, software, resources and services needed to run Hadoop in a production environment. This end-to-end approach means that you can be in production with Hadoop sooner than is typically possible with homegrown solutions. To deliver the compute and storage power a data center needs, this solution is based on four elements. These are HDP, a secure, enterprise-ready open source Apache Hadoop distribution built on a centralized YARN architecture; DriveScale hardware and software; industry-standard servers and network switches; and JBODs built from commodity disk drives. The solution includes components that span the entire solution stack:
• Reference architecture and best practices
• Optimized storage configurations
• Optimized network infrastructure
• HDP
It is designed to address the vast majority of Apache Hadoop use cases, including but not limited to:
• Big data analytics
• ETL offload
• Data warehouse optimization
• Batch processing of unstructured data
• Big data visualization
• Search and predictive analysis
5. DriveScale Components Overview
The DriveScale system is composed of one hardware component and four software components:
5.1 Hardware: DriveScale Adapter Chassis with DriveScale Controllers
This is a 1U appliance with adapters that connect to servers via 10 Gb Ethernet interfaces and to JBODs via SAS interfaces.
Figure 1: DriveScale Adapter
5.2 Software
There are four principal components of the DriveScale software:
a) DriveScale Management Server (DMS)
• The server running the DMS software bundle is called the DMS node.
• A typical deployment consists of three DMS systems clustered for high availability (HA).
• The software manages and configures resources, and contains the inventory/configuration information repository and database:
o Inventory: DMSs, DriveScale Adapters, switches, JBOD chassis, disks, server nodes
o Configuration: node templates, cluster templates, configured clusters
o DMS database: used as a message bus to communicate with the endpoints
b) DriveScale Server Agent
• The DriveScale Server Agent's discovery action provides an inventory of hardware and servers, and creates mappings between server nodes and the disks they consume.
c) DriveScale Central (DSC)
• Cloud-based software management portal that acts as the:
o software distribution repository for subscribers
o DriveScale keys repository
o centralized log file repository
o user documentation repository
o license manager
d) DriveScale Adapter Firmware
• The firmware runs on the adapter processors at the heart of the cluster managed by the DMS. It enables the JBOD drives to be mapped to the servers and used as local drives.
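To make the relationship between DMS node templates, the inventory, and the Server Agent's server-to-disk mappings more concrete, here is a minimal Python sketch of storage-to-compute binding. It is a hypothetical data model only. `Disk`, `NodeTemplate`, `bind_cluster` and `release` are illustrative names, not part of any DriveScale API. The sketch also ignores much of what the real DMS handles, such as multiple JBODs, SAS paths through the adapters, and persistence in the DMS database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical model: none of these names come from the DriveScale product.


@dataclass
class Disk:
    jbod: str                        # JBOD chassis the disk sits in
    slot: int                        # drive bay within that JBOD
    size_tb: float                   # raw capacity in TB
    bound_to: Optional[str] = None   # server currently consuming the disk, if any


@dataclass
class NodeTemplate:
    role: str        # e.g. "namenode" or "datanode"
    disk_count: int  # JBOD disks to bind to each server of this role


def bind_cluster(servers: Dict[str, NodeTemplate],
                 pool: List[Disk]) -> Dict[str, List[Disk]]:
    """Bind free disks from the pool to each server, as its node template requires."""
    free = [d for d in pool if d.bound_to is None]
    needed = sum(t.disk_count for t in servers.values())
    if needed > len(free):
        raise ValueError(f"need {needed} free disks, only {len(free)} available")
    bindings: Dict[str, List[Disk]] = {}
    for server, template in servers.items():
        chosen, free = free[:template.disk_count], free[template.disk_count:]
        for disk in chosen:
            disk.bound_to = server
        bindings[server] = chosen
    return bindings


def release(server: str, pool: List[Disk]) -> int:
    """Dissolve a server's bindings, returning its disks to the free pool."""
    released = 0
    for disk in pool:
        if disk.bound_to == server:
            disk.bound_to = None
            released += 1
    return released


pool = [Disk("jbod1", slot, 2) for slot in range(60)]
servers = {"nn1": NodeTemplate("namenode", 4)}
servers.update({f"dn{i}": NodeTemplate("datanode", 8) for i in range(1, 5)})
bindings = bind_cluster(servers, pool)
print(sum(d.bound_to is None for d in pool))  # 24 disks left unbound
```

Take one 60-disk JBOD. Binding 4 disks to a NameNode and 8 to each of four DataNodes leaves 24 disks free. Calling `release("dn1", pool)` then returns that DataNode's 8 disks to the pool. This mirrors the "dissolved and reconfigured on demand" behavior the guide describes.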
6. Benefits of the DriveScale Solution
The DriveScale solution for next-generation scale-out architecture disaggregates the servers in clusters into pools of independent compute and storage resources. DriveScale separates storage from compute nodes at the rack level. This allows data centers to buy commodity diskless server nodes and JBODs from their preferred vendors and install them in racks. DriveScale's software orchestrates servers and clusters from the pools of drives and compute nodes through a GUI built on a RESTful API. Administrators can provision, decommission, and re-provision servers and clusters dynamically, as needed. With DriveScale, the minimum cluster size is simply the minimum size of a Hadoop cluster.
• Software-Defined or Elastic Infrastructure for Hadoop Clusters: With the DriveScale solution, all the compute and storage resources are shared and can be deployed at will. Using DriveScale's infrastructure provisioning and management tool, administrators can provision clusters in minutes instead of days and get significantly better utilization of their hardware. Administrators can build a single resource pool or multiple small resource pools for Hadoop applications. These pools can be modified on demand to respond quickly to changing workload needs.
• Integration: DriveScale's solution works seamlessly with the data center's existing best-in-class or commodity server and JBOD technology.
• Scalability: With the DriveScale solution, the data center can start with a small cluster of a few compute nodes and a single JBOD. As the cluster starts to run out of resources, storage and compute can be scaled separately without causing any disruption.
• Improved Utilization: With the DriveScale solution, a data center can replace servers and drives separately, thereby maximizing the lifetime of the hardware infrastructure.
• Simplify Everything: Customers can keep their existing best-in-class equipment or add commodity hardware to build a composable Hadoop infrastructure that can easily be modified. Adding or removing compute or storage capacity takes just a few clicks.
7. Reference Architecture Details
7.1 Physical Cluster Components and Configuration List
The following table lists the physical components for the cluster.
Component | Configuration | Description | Quantity
DriveScale Adapter chassis | DHCP, jumbo frames enabled | 1U appliance with adapters that connect to servers via Ethernet and to JBODs via SAS | 1
DriveScale Adapter controller | DHCP, jumbo frames enabled | Provides the data network | 4 per chassis
DriveScale Management Server (DMS) | DMS running on a VM | Manages and configures the nodes and the DriveScale cluster, and stores the inventory/configuration repository for all hardware in the cluster | Minimum 1; for HA, 3 DMSs configured as master and slaves
Servers | 2-socket; CPU and memory sized to the individual Hadoop cluster requirements | Commodity x86 servers that house the NodeManager and compute instances and the DriveScale agents | Minimum 1 NameNode + 3 DataNodes
HDDs for servers | 2 drives configured in RAID 1 | Internal drives used for the OS install | 2 per server
NICs | Dual-port 10 Gbps Ethernet NICs; the connector type depends on the network design (SFP+ or Twinax) | Provides the data network | Minimum 1 per server
JBOD | Default configuration | Houses the drives, with dual I/O controllers | Minimum 1
HDDs for JBOD | Default configuration | Drives that hold the data for the cluster | Depends on the cluster requirements
ToR 10G switch | LLDP, MLAG, 9K jumbo frames configured | Provides data network connectivity | 2 per rack
ToR 1G switch | Default configuration | Provides management network connectivity | 1 per rack
Ambari Server | Ambari Server running on a VM | Manages, configures and monitors the HDP Hadoop cluster | 1 per environment
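The table enables jumbo frames on the DSA controllers and configures 9K jumbo frames on the 10G ToR switches. The guide does not say so explicitly, but the server data-network interfaces would presumably need a matching MTU end to end; treat that as an assumption. The following minimal Python sketch assumes Linux hosts and interprets "9K" as an MTU of 9000 bytes. It reads interface MTUs from sysfs and reports data-network interfaces below that value. `read_mtus`, `below_jumbo` and the interface names in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: "9K jumbo frames" is interpreted as an interface MTU of 9000 bytes.
JUMBO_MTU = 9000


def read_mtus(sysfs_net: str = "/sys/class/net") -> Dict[str, int]:
    """Return {interface: MTU} as reported by Linux sysfs."""
    mtus = {}
    for iface in Path(sysfs_net).iterdir():
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def below_jumbo(mtus: Dict[str, int], data_ifaces: Iterable[str],
                required: int = JUMBO_MTU) -> List[str]:
    """List data-network interfaces that are missing or have an MTU below `required`."""
    return sorted(i for i in data_ifaces if mtus.get(i, 0) < required)
```

For example, `below_jumbo(read_mtus(), ["ens1f0", "ens1f1"])` returns any of the listed ports that are absent or still at a standard 1500-byte MTU. The check is separated from the sysfs read so it can be reused with MTUs gathered some other way.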
7.2 Logical Cluster Topology
The minimum requirements to build out the cluster are:
● 1 NameNode
● 4 DataNodes
● 1 DriveScale Adapter chassis
● 1 DriveScale Management Server (DMS)
● 2 ToR 10G switches
● 1 ToR 1G switch
● 1 JBOD chassis with drives
● 1 Ambari server
This reference architecture is built on 1 NameNode and 4 DataNodes with 1 JBOD holding 60 HDDs of 1, 2 or 3 TB. The following table lists the server configurations and the number of drives used.
Component | Configuration | Description | Quantity
NameNode | 2-socket 20-core CPU, 256 GB RAM, 10 GbE Intel NIC, 2 internal HDDs for the OS, and 4 high-capacity HDDs mounted from the JBOD | Hosts the HDP name services and DriveScale agents | 1
DataNodes | 2-socket 16-core CPU, 256 GB RAM, 10 GbE Intel NIC, 2 internal HDDs for the OS, and 8 high-capacity HDDs mounted from the JBOD | Host the HDFS DataNodes and YARN NodeManagers, any additional required services, and DriveScale agents | 4
Notes:
- Customers with higher (or lower) compute needs can acquire bigger (or smaller) DataNodes configured with CPU and memory that fit the specific requirements of their applications.
- Similarly, depending on the data requirements, customers can add or remove disk drives to match the specific needs of their applications.
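The drive counts above determine how much of the 60-drive JBOD this configuration consumes. The following Python sketch works through that arithmetic. It assumes that only the DataNode drives hold HDFS blocks and that HDFS runs with its default replication factor of 3 (`dfs.replication`). It ignores filesystem overhead, reserved non-DFS space and YARN intermediate data, so the usable figure is an upper bound, not a sizing recommendation. `capacity_summary` is a hypothetical helper.

```python
from typing import Dict


def capacity_summary(drive_tb: float,
                     namenode_drives: int = 4,
                     datanodes: int = 4,
                     drives_per_datanode: int = 8,
                     jbod_slots: int = 60,
                     replication: int = 3) -> Dict[str, float]:
    """Drive usage and HDFS capacity for the reference configuration."""
    bound = namenode_drives + datanodes * drives_per_datanode
    if bound > jbod_slots:
        raise ValueError(f"{bound} drives requested but the JBOD has {jbod_slots} slots")
    hdfs_raw_tb = datanodes * drives_per_datanode * drive_tb
    return {
        "bound_drives": bound,
        "spare_slots": jbod_slots - bound,
        "hdfs_raw_tb": hdfs_raw_tb,
        # Each block is stored `replication` times, so usable space is raw / replication.
        "hdfs_usable_tb": hdfs_raw_tb / replication,
    }


for size in (1, 2, 3):
    s = capacity_summary(size)
    print(f"{size} TB drives: {s['bound_drives']} bound, {s['spare_slots']} spare, "
          f"{s['hdfs_raw_tb']:.0f} TB raw, ~{s['hdfs_usable_tb']:.1f} TB usable")
```

With 2 TB drives, for example, the configuration binds 36 of the 60 drives and leaves 24 spare. It provides 64 TB of raw DataNode capacity, or roughly 21.3 TB of usable HDFS space at 3x replication.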
7.3 Physical Cluster Topology
Figure 3: DriveScale lab architecture with 1x DSA chassis (4x adapters in use), 1x JBOD, 1 NameNode and 4 DataNodes
7.4 Cluster Management
This section details the steps for setting up a DriveScale-enabled Hadoop cluster using Ambari Server.
7.4.1 Setting Up the DriveScale Cluster
Before installing Ambari Server, or using an existing Ambari Server installation, you must complete the following tasks to set up the DriveScale solution:
1. Rack and install the DriveScale Adapter chassis and controllers (DSAs) using the documentation provided by DriveScale.
2. Rack and install the JBOD using the documentation provided by the vendor.
3. Rack and install the servers using the documentation provided by the vendor.
4. Create a RAID 1 configuration for the internal HDDs on each server, and install the OS on all the servers.
5. Install and configure the DriveScale Management Server (DMS), either as a VM or on a standalone server.
6. Set up the DSA configuration from the DMS.
7. Install and configure the DriveScale agents on the master and data nodes.
8. Using the DMS, create the master/data node templates and a cluster template with the required drives.
9. Create the cluster from the template using the DMS.
10. Ensure that the DriveScale cluster is up and running before proceeding.
Figure 4: Physical components overview from the DMS UI
Figure 5: Logical cluster status from the DMS UI
7.4.2 Setting Up the Hortonworks Data Platform Cluster
1. After successfully completing the steps above, install Ambari Server by following the HDP "Getting Ready" and installation guides (see References).
2. The following services were set up for this reference architecture:
Figure 8: Installed Hadoop services details from the Ambari UI Dashboard
3. Ensure that the NameNode and DataNodes are up and running with the correct roles and storage assigned.
Figure 9: Hosts and roles overview from the Ambari UI Hosts section
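Step 3 can be made repeatable by comparing the roles each host reports against the counts this reference architecture expects. The minimal Python sketch below assumes the host-to-component mapping has already been collected, for example from the Ambari UI Hosts section or through Ambari's REST API. The collection step is not shown. Component names follow Ambari's naming (`NAMENODE`, `DATANODE`, `NODEMANAGER`). `role_mismatches` and the host names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Expected components for this reference architecture: 1 NameNode, 4 DataNodes,
# and a YARN NodeManager alongside each DataNode.
EXPECTED = {"NAMENODE": 1, "DATANODE": 4, "NODEMANAGER": 4}


def role_mismatches(host_components: Dict[str, Iterable[str]],
                    expected: Dict[str, int] = EXPECTED) -> Dict[str, Tuple[int, int]]:
    """Return {component: (found, expected)} for every component whose count is wrong."""
    found = Counter(c for comps in host_components.values() for c in set(comps))
    return {c: (found[c], want) for c, want in expected.items() if found[c] != want}


hosts = {
    "nn1": {"NAMENODE"},
    "dn1": {"DATANODE", "NODEMANAGER"},
    "dn2": {"DATANODE", "NODEMANAGER"},
    "dn3": {"DATANODE", "NODEMANAGER"},
    "dn4": {"DATANODE", "NODEMANAGER"},
}
print(role_mismatches(hosts))  # {} when every role is present in the expected number
```

Components not listed in `EXPECTED`, such as ZooKeeper servers, are ignored. If one DataNode host is missing, the result reports `{"DATANODE": (3, 4), "NODEMANAGER": (3, 4)}`.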
7.5 Disk and Filesystem Layout
Node/Role | Disk and Filesystem Layout | Description
Management/Master | ext4 | 1/2/3 TB drives mounted from the JBOD
YARN NodeManager nodes | ext4 | 1/2/3 TB drives mounted from the JBOD
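Each drive mounted from the JBOD is formatted as ext4. On the NodeManager nodes, these drives would typically be listed in the HDFS `dfs.datanode.data.dir` property, which takes a comma-separated list of directories. The following Python sketch generates /etc/fstab entries and that property value from a list of block devices. Several values here are assumptions for illustration, not settings from this reference architecture: the `/grid/N` mount points, the `noatime` option, the `hadoop/hdfs/data` subdirectory and the device names. In practice, mounting by filesystem UUID is more robust, because device names such as /dev/sdc are not guaranteed to be stable across reboots.

```python
from typing import List, Sequence

# Assumptions: one mount point per JBOD drive under /grid, ext4 with noatime,
# and HDFS block data kept in a hadoop/hdfs/data subdirectory of each mount.
MOUNT_BASE = "/grid"
MOUNT_OPTIONS = "defaults,noatime"
HDFS_SUBDIR = "hadoop/hdfs/data"


def mount_points(count: int, base: str = MOUNT_BASE) -> List[str]:
    return [f"{base}/{i}" for i in range(count)]


def fstab_entries(devices: Sequence[str], base: str = MOUNT_BASE) -> List[str]:
    """One ext4 fstab line per device: <device> <mount point> <type> <options> <dump> <pass>."""
    return [f"{dev} {mp} ext4 {MOUNT_OPTIONS} 0 2"
            for dev, mp in zip(devices, mount_points(len(devices), base))]


def datanode_data_dir(mounts: Sequence[str], subdir: str = HDFS_SUBDIR) -> str:
    """Value for dfs.datanode.data.dir: comma-separated data directories."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


# Hypothetical device names for the 8 JBOD drives bound to one DataNode.
devices = [f"/dev/sd{letter}" for letter in "cdefghij"]
for line in fstab_entries(devices):
    print(line)
print("dfs.datanode.data.dir =", datanode_data_dir(mount_points(len(devices))))
```

For a DataNode with 8 JBOD drives, this produces 8 fstab lines (`/grid/0` through `/grid/7`) and a matching 8-entry `dfs.datanode.data.dir` value. In an Ambari-managed cluster, that value would normally be entered through the HDFS configuration in Ambari, not by editing files by hand.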
7.6 OS Supportability/Compatibility Matrix
Operating System | DMS | Server Nodes
CentOS/RHEL 6.x | X | X
CentOS/RHEL 7.x | X | X
Ubuntu 14.04 | X | X
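Before installing the DMS or the DriveScale agents, a host's OS can be checked against this matrix. The Python sketch below is one way to do it. It reads `ID` and `VERSION_ID` from the contents of /etc/os-release. For CentOS/RHEL 6, which typically ship /etc/redhat-release but no /etc/os-release, it falls back to parsing that file instead. The function names are hypothetical. For CentOS/RHEL the sketch checks only the major release, matching the 6.x/7.x granularity of the table.

```python
import re
from typing import Dict, Optional, Tuple

# (OS id, release) pairs from the matrix; the same list applies to DMS and server nodes.
SUPPORTED = {("centos", "6"), ("centos", "7"), ("rhel", "6"), ("rhel", "7"),
             ("ubuntu", "14.04")}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def platform_from_os_release(text: str) -> Tuple[str, str]:
    info = parse_os_release(text)
    os_id = info.get("ID", "").lower()
    version = info.get("VERSION_ID", "")
    # Ubuntu is matched on its full release (14.04); CentOS/RHEL on the major version only.
    return os_id, (version if os_id == "ubuntu" else version.split(".")[0])


def platform_from_redhat_release(text: str) -> Optional[Tuple[str, str]]:
    """Fallback for CentOS/RHEL 6, which have /etc/redhat-release but no /etc/os-release."""
    match = re.search(r"^(CentOS|Red Hat Enterprise Linux)\b.*?release (\d+)", text.strip())
    if match is None:
        return None
    return ("centos" if match.group(1) == "CentOS" else "rhel"), match.group(2)


def is_supported(platform: Optional[Tuple[str, str]]) -> bool:
    return platform in SUPPORTED
```

For example, a CentOS 6.7 host whose /etc/redhat-release reads `CentOS release 6.7 (Final)` maps to `("centos", "6")`, which the matrix supports. An Ubuntu 16.04 host does not pass.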
8. Rack Scalability
Customers can scale beyond one rack in a straightforward manner, expanding their compute and storage resources as application needs grow. They can either change or maintain the compute-to-storage ratio for new racks or for an existing rack. For every JBOD added, a new DriveScale Adapter chassis with four controllers must be added as well. Because drives are assigned only to servers in the same rack, scaling is achieved simply by adding more racks with servers, DriveScale Adapters, switches and JBODs.
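The scaling rule above lends itself to a simple per-rack calculation. The following Python sketch totals the components for a multi-rack layout using the ratios stated in this guide. These are one DriveScale Adapter chassis with four controllers per JBOD, plus two 10G and one 1G ToR switches per rack. The sketch also checks that each rack's drive requests fit within that rack's own JBODs, since drives are only assigned to servers in the same rack. The 60 slots per JBOD comes from the JBOD used in this reference architecture. `Rack` and `plan_racks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

SLOTS_PER_JBOD = 60        # JBOD used in this reference architecture
CONTROLLERS_PER_DSA = 4    # each DSA chassis hosts four controllers
TOR_10G_PER_RACK = 2
TOR_1G_PER_RACK = 1


@dataclass
class Rack:
    servers: int
    jbods: int
    drives_requested: int = 0   # JBOD drives to bind to this rack's servers


def plan_racks(racks: List[Rack]) -> Dict[str, int]:
    """Total the components for a set of racks, enforcing in-rack drive assignment."""
    totals = dict.fromkeys(
        ["servers", "jbods", "dsa_chassis", "dsa_controllers", "tor_10g", "tor_1g"], 0)
    for n, rack in enumerate(racks, start=1):
        if rack.drives_requested > rack.jbods * SLOTS_PER_JBOD:
            raise ValueError(f"rack {n}: {rack.drives_requested} drives requested, "
                             f"only {rack.jbods * SLOTS_PER_JBOD} slots in its JBODs")
        totals["servers"] += rack.servers
        totals["jbods"] += rack.jbods
        totals["dsa_chassis"] += rack.jbods                        # one DSA chassis per JBOD
        totals["dsa_controllers"] += rack.jbods * CONTROLLERS_PER_DSA
        totals["tor_10g"] += TOR_10G_PER_RACK
        totals["tor_1g"] += TOR_1G_PER_RACK
    return totals
```

The single-rack configuration in this guide corresponds to `Rack(servers=5, jbods=1, drives_requested=36)`. That yields 1 DSA chassis, 4 controllers, 2 ToR 10G switches and 1 ToR 1G switch. Adding a second rack with two JBODs adds 2 more chassis and 8 more controllers.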
Figure 10: Hosts and roles overview from the Ambari UI Hosts section
9. References
1. HDP Installation Prerequisites: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ch_getting_ready_chapter.html
2. Automated Install with Ambari: https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_Installing_HDP_AMB/content/index.html
3. DriveScale racking and installation documentation, provided by DriveScale
4. YARN definition: http://searchdatamanagement.techtarget.com
10. Bill of Materials
Server Components | Quantity
Intel Xeon E5-2665, 2.40 GHz, 16C CPU | 2
16 GB DIMMs | 16
Intel 10 GbE SFP+ NIC | 1

JBOD Components | Quantity
Quanta JB4602 JBOD | 1
I/O controllers | 2
Seagate HDDs | 60

Switches | Quantity
Cisco Nexus 5548 10 GbE switch | 2
D-Link DGS-1518-28 1 GbE switch | 1

DriveScale Components | Quantity
DriveScale Adapter chassis | 1
DriveScale Adapter | 4

Software | Version
CentOS | 6.7
DriveScale Adapter | 1.2.0.1
HDP | 2.3
11. Conclusion
This DriveScale-HDP reference architecture guide provides an overview of the combined solution and the key components employed. It also outlines the advantages of compute and storage disaggregation with the DriveScale-HDP solution.

About DriveScale

DriveScale is leading the charge in bringing hyperscale computing capabilities to mainstream enterprises. Its composable data center architecture transforms rigid data centers into flexible and responsive scale-out deployments. Using DriveScale, data center administrators can deploy independent pools of commodity compute and storage resources, automatically discover available assets, and combine and recombine these resources as needed. The solution is provided via a set of on-premises and SaaS tools that coordinate between multiple levels of infrastructure. With DriveScale, companies can more easily support Hadoop deployments of any size as well as other modern application workloads. DriveScale was founded in 2013 by a team with deep roots in IT architecture that has built enterprise-class systems such as Cisco UCS and Sun UltraSPARC. The company is based in Sunnyvale, California. Investors include Pelion Venture Partners, Nautilus Venture Partners and Ingrasys, a wholly owned subsidiary of Foxconn. For more information, visit www.drivescale.com or follow us on Twitter at @DriveScale_Inc.
About Hortonworks

Hortonworks is a leading innovator in the industry, creating, distributing and supporting enterprise-ready open data platforms and modern data applications. Our mission is to manage the world's data. We have a single-minded focus on driving innovation in open source communities such as Apache Hadoop, NiFi, and Spark. We, along with our 1600+ partners, provide the expertise, training and services that allow our customers to unlock transformational value for their organizations across any line of business. Our connected data platforms power modern data applications that deliver actionable intelligence from all data: data-in-motion and data-at-rest. Visit us at hortonworks.com. We are Powering the Future of Data™.