Paper SAS1117-2017

Introduction to Configuration and Management for SAS® Grid Manager for Hadoop

Mark Lochbihler, Hortonworks Inc., Santa Clara, CA

ABSTRACT

"How can we run traditional SAS® jobs, including SAS workspace servers, on Hadoop worker nodes?" The answer is SAS® Grid Manager for Hadoop. It has been integrated with the Hadoop ecosystem to provide resource management, high availability, and enterprise scheduling for SAS customers. This paper provides an introduction to the architecture, configuration, and management of SAS Grid Manager for Hadoop. Anyone involved with SAS and Hadoop should find the information in this paper useful. The first area to be covered is a breakdown of each required SAS and Hadoop component. From the Hadoop ecosystem, we will define the role of YARN Compute, HDFS Storage, and Hadoop Client services. We will review SAS metadata definitions for SAS Grid Manager, the Object Spawner, and Grid Workspace Servers. We will cover required Kerberos security, as well as SAS Enterprise Guide and the SASGSUB utility. YARN queues and the SAS Grid Policy file for optimizing job scheduling will be reviewed. Finally, we will discuss traditional SAS math running on a Hadoop Worker node, and how it can take advantage of SAS High Performance math to accelerate job execution. By leveraging SAS Grid Manager for Hadoop, sites are moving SAS jobs inside a Hadoop cluster. This ultimately cuts down on data movement and provides more consistent job execution. Although this paper is written for SAS and Hadoop administrators, SAS users will also benefit from this session.

WHAT IS SAS GRID COMPUTING?

SAS Grid Computing has been offering SAS shops a lower cost, shared, multi-tenant, high performing computing environment to meet their advanced analytic and modeling needs. By implementing a SAS Grid, SAS administrators can centralize individual and/or departmental SAS computing environments onto a SAS Compute Grid and better utilize IT resources, provide high availability, and accelerate processing. A SAS Grid runs on two or more SAS Grid Compute Nodes. Each SAS Grid Compute Node is a candidate to execute SAS jobs submitted into a Grid queue by SAS user groups at a site.

ENTER HADOOP AND YARN

The benefits of SAS Grid Computing are not new to the SAS User Community. It has been successfully implemented and running in production at thousands of customers’ sites around the world for well over a decade and provides significant benefits. What’s new with SAS Grid Manager for Hadoop is that it offers YARN as an orchestration option in addition to the existing and well-proven use of the Platform Suite for SAS, which includes LSF. SAS Grid Manager for Hadoop was designed to enable customers to co-locate their SAS Grid and associated SAS workloads on their new or existing SAS Hadoop clusters.

WHY MOVE SAS JOBS INSIDE HADOOP?

A decision to develop this type of solution typically starts by identifying business needs and the associated use cases that support them. Any customer interested in lowering overall SAS storage and server costs, while at the same time consolidating and centralizing departmental SAS data sets, is a candidate. An initial step can be to migrate selected SAS Workload data, including raw inbound and outbound files, SAS libraries, and other RDBMS SAS data sources, to Hadoop Storage. Just by taking the first step to move SAS storage to Hadoop, organizations can save over 50% in annual SAS storage costs. For new model development or existing model optimization efforts that plan on using IoT Hadoop data sources (including Sensor, ClickStream, WebLog, Machine and IoT device), moving existing SAS data sets to Apache Hive and Hadoop Storage (HDFS) could yield significant cost savings. To access these migrated SAS data sets, users can turn to SAS® Access to Hadoop. It offers a SAS libname engine for HDFS, as well as Hive, and a filename statement to HDFS. Once the data is migrated, carefully selected SAS Workloads can be moved onto Hadoop Worker nodes, and SAS users can transparently leverage SAS Grid Manager for Hadoop to run their daily SAS jobs.

Once the decision has been made to move new and/or existing SAS data and workloads to Hadoop, and to leverage SAS Grid Manager, it is highly recommended that your site invest up front in SAS and Hadoop administrative training and professional services. It is also critical that you have a detailed understanding of how YARN configuration and scheduling works on Hadoop. In addition, involving SAS and Hortonworks Professional Services during the project startup phase is important.

HOW IT WORKS

YARN 101

If you are new to Hadoop and YARN, let me provide a bit of background. YARN is the orchestration engine for Hadoop 2.x. Figure 1 below is a high-level view of an Enterprise Data Lake. With YARN as the central orchestrator and operating system for Hadoop, sites can run multiple Batch, Interactive and Realtime compute engines within the same Hadoop cluster (including SAS Grid Manager as indicated by the arrow). Hadoop has been offering a low cost, massive scale storage and compute architecture for over a decade. With SAS Grid Manager for Hadoop, SAS users can run traditional SAS jobs on Hadoop Worker nodes, inside the cluster.

Figure 1: A Hadoop Cluster running Batch, Interactive and Real-Time Engines, including SAS Grid Manager.

[Figure 1 diagram labels: Sources (Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured); Existing Systems (ERP, CRM, SCM); EDW; Analytics (Data Marts, Business Analytics, Visualization & Dashboards, Applications); HDFS (Hadoop Distributed File System); YARN: Data Operating System; engines: Batch, Interactive, Real-Time, MPP, SAS Grid Manager.]

With SAS Grid Manager for Hadoop, a community of SAS users can transparently leverage SAS clients to submit interactive and batch SAS jobs to the SAS Grid Computing infrastructure on Hadoop. These jobs are scheduled by YARN, based on queues and site policies, to run on an optimal SAS Grid Compute Node (Hadoop Worker Node). Below is a conceptual view of the architecture:

Figure 2: SAS Grid Manager for Hadoop Conceptual Architecture (Reference: SAS Grid Manager for Hadoop)

BREAKING DOWN THIS SAS ARCHITECTURE ON HADOOP

With SAS Grid Manager for Hadoop, SAS jobs are scheduled by YARN’s ResourceManager into a cluster running inside of the Hadoop firewall. SAS jobs can request additional Hadoop resources after they have been initially launched. This new architecture can reduce the complexity of configuration by simplifying port mapping between SAS jobs and the services they will need to complete. In this deployment model, SAS math is running closer to the Hadoop data and no longer requires negotiation with the Hadoop firewall to access, if requested, additional Hadoop and SAS High Performance service ports.

Figure 3: SAS Grid Manager for Hadoop w YARN Architecture Overview

[Figure 3 diagram labels: SAS Client (SASGSUB, SAS Batch, SAS EG); YARN ResourceManager with YARN Capacity Scheduler; SAS Grid Control Server; SAS Object Spawner; SAS Metadata Server; twelve YARN NodeManagers hosting Job 1 AM 1, Job 1 Containers 1.1, 1.2 and 1.3, SAS AM 2, and SAS Container 2.1.]

The YARN ResourceManager’s role in Figure 3 above is to determine the optimal Hadoop Worker Nodes to run a traditional SAS job or Workspace Server in the Hadoop cluster. In this illustration, you can see two YARN jobs already running on the cluster. Each of these Hadoop jobs has a single YARN Application Master (AM) Container assigned. Job 1 is not a SAS job. It consists of AM 1 along with three additional Job Task Containers (C1.1, C1.2, and C1.3). There is also a SAS job running on the same cluster. This SAS job has its own Application Master (SAS AM 2) and one SAS Job Task Container (SAS C2.1). From an architecture and administrative perspective, let’s take a look at the primary SAS and Hadoop services that will be configured.

KEY SAS GRID ARCHITECTURE COMPONENTS

SAS Metadata Server – A SAS service supporting, among other objects, the logical to physical mapping of SAS Logical Servers to YARN. In our examples, we will be using the local SAS Server, SASGrid.

SAS Grid Control Server – A SAS service running on the YARN ResourceManager node. It is called by SAS clients to communicate with the YARN ResourceManager to negotiate resources for SAS jobs.

SAS Object Spawner – A SAS service running on the YARN ResourceManager node. It is used to launch SAS containers with YARN.

Figure 4: View of SAS Management Console, with expanded SASGrid Logical Server.

SAS Clients – Includes SAS batch jobs, SASGSUB (a batch grid utility), and interactive clients like SAS Enterprise Guide. SAS Clients will “Connect” and “Disconnect” from a SAS Logical Server, like SASGrid, defined in SAS Metadata and shown above in Figure 4.

KEY HADOOP INTEGRATION POINTS FOR SAS GRID

YARN ResourceManager – A Hadoop YARN Master Service responsible for controlling global Hadoop cluster resource usage. ResourceManager enables multi-tenancy and SLAs. It is also responsible for monitoring NodeManager state, submitting Application Master requests, verifying container launch, and monitoring Application Master state.

YARN NodeManager – This Hadoop YARN Worker Node Service manages local resources on behalf of the requesting service. It also tracks node health and communicates status to the ResourceManager.

YARN Capacity Scheduler – A Hadoop YARN service, which can be configured to provide Job Scheduling policies for SLAs, Users, Groups, and Resources.

Hadoop Data Nodes – Hadoop Distributed File System (HDFS) storage nodes.

Kerberos Service – The Hadoop cluster must be Kerberized.

HADOOP MASTER NODE DECISIONS

A SAS Grid Control Server and SAS Object Spawner must be deployed on the same Hadoop Master Node as the YARN ResourceManager.

HADOOP WORKER NODE DECISIONS

SASHOME and SASCONFIG

Each Hadoop Worker Node which is a candidate to run SAS jobs must be configured so that SASHOME and SASCONFIG are available.

SASWORK and SASUTIL

It is critical that each Hadoop Worker node which is a candidate to run SAS jobs is configured correctly. A large part of the I/O required when running SAS analytics is to the scratch or temporary locations of SASWORK and SASUTIL. The I/O throughput SAS requires for these file systems, to provide the necessary performance on a heavily loaded system, is 100 MB/sec/core. Adequate sizing for SASWORK is also necessary.

Traditional Storage and Compute versus Compute Only Worker Nodes

Within Hadoop, it is a common practice to have dual purpose worker nodes which run math or programs on or near the same nodes where the Hadoop data resides. With Hadoop 2.x, the concept of dedicated Compute Only Hadoop Worker Nodes is an option. For SAS Grid Manager for Hadoop, both are options. For Compute Only, these Hadoop Worker Nodes will no longer host the required services and data for HDFS, giving more computing resources dedicated to the programs running on these nodes. The tradeoff for Compute Only Hadoop Nodes is the loss of HDFS data locality. Your site’s SAS workload requirements will determine which type of Worker Nodes to deploy for SAS Grid Manager for Hadoop.
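Returning to the SASWORK and SASUTIL guideline of 100 MB/sec/core, the aggregate throughput a worker node's scratch file systems should sustain scales with its core count. The short sketch below only illustrates that multiplication; the core counts are hypothetical, since the paper does not state how many cores each worker node has.

```python
def required_scratch_throughput_mb_s(cores_per_node: int,
                                     mb_per_sec_per_core: int = 100) -> int:
    """Aggregate SASWORK/SASUTIL throughput (MB/sec) for one worker node.

    The 100 MB/sec/core default is the guideline quoted in the text.
    """
    return cores_per_node * mb_per_sec_per_core


# Hypothetical core counts, chosen only for illustration.
for cores in (8, 16, 32):
    print(f"{cores} cores -> {required_scratch_throughput_mb_s(cores)} MB/sec")
```

A node that cannot sustain the resulting figure on its scratch storage is likely to become I/O bound before it runs out of CPU.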

REAL WORLD CONFIGURATION EXAMPLE

In this section, we will walk through a SAS Grid Manager for Hadoop configuration exercise. For this deployment, our Hadoop Cluster currently has twenty-eight Hadoop Worker Nodes. Each of these nodes has 256 GB of RAM. After subtracting the RAM needed to run the OS and additional key services, there is 192 GB of RAM left for Hadoop containers. This means that the total cluster wide RAM capacity for Hadoop Containers is 5.376 TB. We have decided to allocate 50% of this RAM, or 2.688 TB, to SAS jobs.
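The capacity figures above follow from simple multiplication. The sketch below reproduces them, assuming decimal units (1 TB = 1000 GB), which is what the quoted totals imply; the variable names are illustrative.

```python
# Inputs taken from the configuration exercise in the text.
CONTAINER_RAM_PER_NODE_GB = 192   # RAM left per node after OS and key services
WORKER_NODES = 28
SAS_QUEUE_SHARE = 0.50            # fraction of container RAM given to SAS jobs

total_container_ram_gb = CONTAINER_RAM_PER_NODE_GB * WORKER_NODES
sas_queue_ram_gb = total_container_ram_gb * SAS_QUEUE_SHARE

# Assumes 1 TB = 1000 GB, matching the totals quoted in the text.
print(f"Total container RAM: {total_container_ram_gb / 1000:.3f} TB")
print(f"SAS queue RAM:       {sas_queue_ram_gb / 1000:.3f} TB")
```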

Total RAM per     Available container  # Worker nodes  Total container  Cluster RAM allocated  Total container RAM
cluster node      RAM per node         in cluster      RAM available    to SAS queue           for SAS queue
256 GB            192 GB               28              5.376 TB         50%                    2.688 TB

Table 1: Total Cluster YARN Container RAM Available for SAS Users

For our exercise, we had previously determined that the average number of SAS batch or interactive sessions for each active, concurrent user at our site is two. From a Hadoop perspective, for each of these SAS jobs, we know that YARN will spawn at least two containers (an Application Master and a Job Task Container). And once SAS jobs are running in Hadoop, they can call additional Hadoop (e.g., Pig, Hive, MapReduce) and SAS HP (e.g., proc hpsummary, hpmeans, hpfreq) services. The average number of additional Hadoop Containers spawned from each SAS job running on Hadoop is four. This means we can anticipate each active, concurrent SAS user to be allocating, on average, eight YARN Containers.

Avg # of batch jobs or    # Containers per    Avg # of additional Hadoop     Avg total # of
interactive sessions      job or session      containers spawned from        containers per
per SAS user                                  initial SAS job container      SAS user
2                         2                   4                              8

Table 2: Average Number of Containers per SAS User

Now that we know that, on average, each active SAS user will allocate eight containers, and that we have 2.688 TB of total cluster RAM available, we need to determine the total number of concurrent SAS users who can run at any one time on our Hadoop cluster. One additional factor to consider is that not all SAS users need the same amount of computing resources. In our exercise, we have three SAS user resource categories (Low, Medium, and High). Each user falling into the Low category will, by default, allocate 2 GB YARN Containers for their jobs. The SAS Medium category will default to a YARN Container size of 4 GB. And the High category will default to an 8 GB Container size. With these numbers in mind, we will be able to have 134 concurrent SAS users on our system, as indicated below.

SAS App Type            Container  Anticipated % of     Available cluster    Max # of    Avg # containers  Total # of
(User Type)             size       user type on server  memory for SAS jobs  containers  per SAS user      SAS users
Low (General/Analyst)   2 GB       70%                  1.881 TB             940         8                 117
Medium                  4 GB       20%                  537 GB               134         8                 16
High                    8 GB       10%                  268 GB               33          8                 4
Totals                                                  2.688 TB             1107                          134

Table 3: Breakdown of SAS Application Types to be configured for SAS Users
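The per-category rows of Table 3 can be reproduced with a short calculation. The sketch below is one reading of the tables, not the author's worksheet: it assumes decimal units (2.688 TB = 2688 GB), takes the eight containers per user as 2 sessions x 2 containers plus 4 additional containers (Table 2), and rounds down at each step. Computed this way, the per-category user counts come to 117, 16 and 4, which sum to 137, slightly above the 134 total quoted in the text.

```python
SAS_QUEUE_RAM_GB = 2688           # 2.688 TB, assuming 1 TB = 1000 GB
CONTAINERS_PER_USER = 2 * 2 + 4   # Table 2, as read here: 8 containers per user

# (category, container size in GB, anticipated share of users in percent)
CATEGORIES = [("Low", 2, 70), ("Medium", 4, 20), ("High", 8, 10)]


def size_category(container_gb: int, share_pct: int):
    """Return (RAM in GB, max containers, concurrent users) for one category."""
    ram_gb = SAS_QUEUE_RAM_GB * share_pct / 100
    max_containers = int(ram_gb // container_gb)      # round down
    users = max_containers // CONTAINERS_PER_USER     # round down
    return ram_gb, max_containers, users


rows = {name: size_category(gb, pct) for name, gb, pct in CATEGORIES}
total_containers = sum(r[1] for r in rows.values())
total_users = sum(r[2] for r in rows.values())

for name, (ram_gb, containers, users) in rows.items():
    print(f"{name:<7} {ram_gb:8.1f} GB  {containers:5d} containers  {users:4d} users")
print(f"Totals  {total_containers} containers, {total_users} users")
```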

YARN Capacity Scheduler

In this exercise, we are allocating 50% of the Hadoop Cluster’s RAM to SAS Users. We will use the YARN Capacity Scheduler to set up this policy. We are able to set up Hierarchical YARN Queues, including sas94_queue, as a first cluster wide configuration step.

Figure 5: YARN Capacity Scheduler Logical View - SAS Queue - 50% Hadoop Cluster RAM

[Figure 5 diagram labels: YARN Capacity Scheduler Example: 50% of Cluster RAM allocated to SAS Queue. ResourceManager Scheduler with Capacity Scheduler hierarchical queues under root: Adhoc 30%, SAS 50%, Mrkting 20%; lower-level queue labels: Dev 10%, Reserved 20%, Prod 70%, Prod 80%, Dev 20%, P0 70%, P1 30%.]

In the view of the YARN Capacity Scheduler UI below, we can see that sas94_queue has 50% of cluster RAM allocated. SAS users’ Linux ids should also be mapped to this queue.

Figure 6: YARN Capacity Scheduler Admin View - SAS Queue - 50% Hadoop Cluster RAM
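As a rough illustration of the queue setup described above, the sketch below builds the top-level Capacity Scheduler properties for a 50% SAS queue and maps Linux users to it. The property names (yarn.scheduler.capacity.root.queues, the per-queue capacity property, and yarn.scheduler.capacity.queue-mappings) are standard Capacity Scheduler settings. The queue names adhoc and mrkting, their 30% and 20% shares (read from the Figure 5 labels), and the user names are assumptions made for this example only; child queues are not modeled.

```python
def capacity_scheduler_properties(queues: dict, user_mappings: dict) -> dict:
    """Build top-level capacity-scheduler.xml properties as a flat dict.

    queues: queue name -> capacity in percent of the parent (root).
    user_mappings: Linux user id -> queue name.
    """
    total = sum(queues.values())
    if total != 100:
        raise ValueError(f"root queue capacities must sum to 100, got {total}")
    unknown = set(user_mappings.values()) - set(queues)
    if unknown:
        raise ValueError(f"users mapped to undefined queues: {sorted(unknown)}")

    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, pct in queues.items():
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(pct)
    # Mapping syntax is u:<user>:<queue>, comma separated.
    props["yarn.scheduler.capacity.queue-mappings"] = ",".join(
        f"u:{user}:{queue}" for user, queue in user_mappings.items()
    )
    return props


# Hypothetical queue layout and user ids; only sas94_queue comes from the text.
props = capacity_scheduler_properties(
    {"adhoc": 30, "sas94_queue": 50, "mrkting": 20},
    {"sasdemo": "sas94_queue", "sasanalyst1": "sas94_queue"},
)
for key, value in props.items():
    print(f"{key} = {value}")
```

Validating that sibling capacities sum to 100 before writing the file mirrors a check the scheduler itself enforces when the configuration is loaded.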

Our Hadoop cluster has the YARN Minimum Container Size set cluster wide to 2 GB. We are going to need to implement a mechanism to adjust Container size based on SAS User resource needs. We will use the sasgrid-policy.xml configuration file and SAS Metadata User and Group mappings to accomplish this task. First, we will review a sasgrid-policy.xml file which will ensure that SAS users obtain the right amount of memory for their jobs. Within this SAS Grid configuration file, you will see three Grid Application Types: Low, Medium, and High.

SAS Grid Policy File Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<GridPolicy defaultAppType="low">
    <GridApplicationType name="normal">
        <jobname>NormalJob</jobname>
        <priority>20</priority>
        <nice>10</nice>
        <memory>2048</memory>
        <vcores>1</vcores>
        <runlimit>480</runlimit>
        <queue>default</queue>
        <hosts>
            <hostGroup>development</hostGroup>
        </hosts>
    </GridApplicationType>
    <GridApplicationType name="low">
        <jobname>SASLow</jobname>
        <priority>10</priority>
        <nice>0</nice>
        <memory>2048</memory>
        <vcores>1</vcores>
        <runlimit>480</runlimit>
        <queue>sas94_queue</queue>
        <hosts>
            <hostGroup>sas94_work</hostGroup>
        </hosts>
    </GridApplicationType>
    <GridApplicationType name="medium">
        <jobname>SASMedium</jobname>
        <priority>10</priority>
        <nice>0</nice>
        <memory>4096</memory>
        <vcores>1</vcores>
        <runlimit>480</runlimit>
        <queue>sas94_queue</queue>
        <hosts>
            <hostGroup>sas94_work</hostGroup>
        </hosts>
    </GridApplicationType>
    <GridApplicationType name="high">
        <jobname>SASHigh</jobname>
        <priority>10</priority>
        <nice>0</nice>
        <memory>8192</memory>
        <vcores>1</vcores>
        <runlimit>480</runlimit>
        <queue>sas94_queue</queue>
        <hosts>
            <hostGroup>sas94_work</hostGroup>
        </hosts>
    </GridApplicationType>
    <HostGroup name="development">
        <host>partnerd1worker-1.field.hortonworks.com</host>
        <host>partnerd1worker-2.field.hortonworks.com</host>
        <host>partnerd1worker-3.field.hortonworks.com</host>
        <host>partnerd1worker-4.field.hortonworks.com</host>
        <host>partnerd1worker-5.field.hortonworks.com</host>
    </HostGroup>
    <HostGroup name="sas94_work">
        <host>partnerd1worker-4.field.hortonworks.com</host>
        <host>partnerd1worker-5.field.hortonworks.com</host>
    </HostGroup>
</GridPolicy>

The above sasgrid-policy.xml file is a sample. It illustrates how we can set up Grid Application Types for different SAS user groups. Note that we have specified a YARN Container memory and the SAS94 Queue for each App Type.

Now that we have configured the appropriate Grid Application Types, we need a way to map SAS Applications (SAS Grid Manager Client Utility, SAS Enterprise Guide, others) to these SAS Grid Application Types. This can be done using the Grid Options Set Mapping Wizard within the SAS Management Console. This requires each SAS Application to be registered to the SAS Metadata Server and configured for “Is Grid Enabled”. Once this is done, use the following SAS Grid Server Properties Options Tab for your Logical Grid Server (SASGrid in our case) and configure SAS users and groups to their appropriate default Grid App Type found in sasgrid-policy.xml above.

Figure 7: Configuring SAS Metadata Groups to SAS Grid Application Types.

SAS Client Idle and Runtime Configurations

It is recommended that you set the following thresholds for SAS users as a default and adjust according to your site’s workload needs. This will free up idle resources for other Hadoop users. It is recommended to be very conservative in these settings so that you do not inadvertently disrupt SAS user productivity.

- SAS Client Idle Timeout – in our example, we set this to 120 minutes.
    - You can control the amount of time a client such as SAS Enterprise Guide can stay connected but not active using the Inactive Client timeout setting.
- SAS Run Limit Timeout – in our example, we set this to 480 minutes.
    - Within the sasgrid-policy.xml file, you can control this using the “runlimit=xx” Grid Application Type parameter.
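The per-type settings in sasgrid-policy.xml (memory, queue, run limit, and host group) can be sanity-checked with a short script before the file is deployed. The sketch below uses Python's standard xml.etree.ElementTree on an abbreviated copy of the sample policy. The two checks it applies (memory is a multiple of the 2 GB YARN minimum container size, and every referenced host group is defined) are choices made for this example, not documented SAS requirements.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample policy; only fields used below are kept.
POLICY_XML = """\
<GridPolicy defaultAppType="low">
  <GridApplicationType name="low">
    <memory>2048</memory><runlimit>480</runlimit>
    <queue>sas94_queue</queue>
    <hosts><hostGroup>sas94_work</hostGroup></hosts>
  </GridApplicationType>
  <GridApplicationType name="medium">
    <memory>4096</memory><runlimit>480</runlimit>
    <queue>sas94_queue</queue>
    <hosts><hostGroup>sas94_work</hostGroup></hosts>
  </GridApplicationType>
  <GridApplicationType name="high">
    <memory>8192</memory><runlimit>480</runlimit>
    <queue>sas94_queue</queue>
    <hosts><hostGroup>sas94_work</hostGroup></hosts>
  </GridApplicationType>
  <HostGroup name="sas94_work">
    <host>partnerd1worker-4.field.hortonworks.com</host>
    <host>partnerd1worker-5.field.hortonworks.com</host>
  </HostGroup>
</GridPolicy>
"""

YARN_MIN_CONTAINER_MB = 2048  # cluster-wide minimum container size from the text


def summarize_policy(xml_text: str):
    """Return (default app type, {app type: (memory MB, queue, runlimit min)})."""
    root = ET.fromstring(xml_text)
    host_groups = {g.get("name") for g in root.findall("HostGroup")}
    summary = {}
    for app in root.findall("GridApplicationType"):
        name = app.get("name")
        memory_mb = int(app.findtext("memory"))
        group = app.findtext("hosts/hostGroup")
        if memory_mb % YARN_MIN_CONTAINER_MB != 0:
            raise ValueError(f"{name}: memory is not a multiple of the minimum")
        if group not in host_groups:
            raise ValueError(f"{name}: host group {group!r} is not defined")
        summary[name] = (memory_mb, app.findtext("queue"),
                         int(app.findtext("runlimit")))
    return root.get("defaultAppType"), summary


default_type, summary = summarize_policy(POLICY_XML)
print("default:", default_type)
for name, (memory_mb, queue, runlimit) in summary.items():
    print(f"{name:<7} {memory_mb} MB  queue={queue}  runlimit={runlimit} min")
```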

Also, a couple of other Hadoop parameters that we should pay close attention to are the MapReduce Minimum Container Size, as well as the engine you are running for Hive. We recommend using Hive’s Tez Engine in most cases, as it provides in-memory query execution that is considerably faster than the MapReduce engine. We are currently investigating YARN Node Labels (Policy Scheduling) and the Hive LLAP engine (SQL Query Acceleration) for future configuration considerations.

SAS WORKLOAD MIGRATION CONSIDERATIONS

SAS Grid Manager for Hadoop should be leveraged as a complement to your existing SAS infrastructure and not a replacement. While certain traditional storage architectures provide extremely fast I/O for SAS workloads, there is a significant cost to these infrastructures. Selecting the right SAS workloads and data to move to Hadoop is a critically important step when moving SAS jobs to Hadoop. A key business driver is reducing overall storage costs. Another key driver is accelerating the performance of SAS jobs that require access to large data sets in Hadoop. The most expensive operation for any big data job is moving large amounts of data from storage to a compute tier outside of Hadoop. With Hadoop, a primary design goal for the last decade has been to move a program’s compute or math to where the data resides, in Hadoop. This avoids costly data movement over the network.

With our business drivers in mind, SAS shops should initially identify a handful of SAS jobs running outside of Hadoop which have significantly large ephemeral (SASWORK) and permanent storage (SAS data sets, RDBMS, other inbound and outbound data sets) needs. If the SAS job data is currently stored on expensive SAN storage, cost savings will be significant. Any SAS jobs using Base SAS procedures like Proc Summary, Means and Freq, which support SAS In-Database Pushdown, should be considered. Any SAS jobs planning to use Hadoop data sets should be on the list.

When moving existing SAS Workload data into Hadoop, there are several storage options. Existing SAS data can be moved into the Hadoop Distributed File System (HDFS) and then, within Hive, a schema on read can be established. If the existing SAS data sets and database tables are stored in Hive, as mentioned, SAS In-Database capabilities can be leveraged. This gives the SAS engine an opportunity to implicitly translate Base SAS Stat procedures to complex HiveQL, ultimately moving most of the math to the data. SAS Access to Hadoop is an add-on SAS product that plays a critical role in this migration. SAS users will need to have some basic training on how to most effectively leverage the Hive and HDFS libname engines from their SAS jobs, as well as the SAS filename statement to HDFS. This basic syntax change in their programs will be required to take direct advantage of SAS data now residing in Hadoop.

- Libname to Hive
    - Uses a JDBC connection to Hive
    - Provides an opportunity for distributed parallel compute on Hive
        - Using SAS In-Database Pushdown capabilities
- Libname to HDFS
    - HDFS APIs are leveraged for direct, parallel read and write to Hadoop.
- Filename to HDFS
    - Supports inbound and outbound read and write of raw data to Hadoop.

SAS also includes support for Hadoop Clients and RESTful APIs. SAS Access to Hadoop can also allow a SAS programmer to run explicit HDFS, Pig, Hive, and MapReduce code in-line within their SAS job. In addition, SAS has an SQL Engine which supports both implicit and explicit Pass-Thru HiveQL execution directly to the Hadoop cluster.

A DAY IN THE LIFE OF A SAS USER LEVERAGING SAS GRID

When using SAS clients, it should be transparent to SAS users, from a functionality perspective, that they are interacting with a SAS Grid. Here is a detailed walkthrough of a SAS user leveraging SAS Enterprise Guide to run SAS Process Flows and Tasks inside of Hadoop. Below, a SAS user logs into SAS Enterprise Guide’s UI and opens an existing project. To execute work from this project, the user must launch a SAS Workspace Server. In this case, if the user expands +SASGrid on the left hand side of the screen, SAS Grid Manager for Hadoop will request YARN’s ResourceManager to launch a SAS Workspace Server (WSS) inside the Hadoop Cluster.

Figure 8: Click on +SASGrid to launch SAS Workspace Server on Hadoop Worker Node

The green checkmark next to SASGrid below indicates that the SAS EG user can run any Process Flow or Task in Hadoop, because YARN has successfully launched the SAS WSS on a Hadoop Worker Node.

Figure 9: The green arrow (SASGrid) indicates SAS Workspace Server is established in Hadoop

If we look at the YARN UI below, typically reserved for Hadoop Administrators, we can see a SAS Enterprise Guide – Workspace Server in a RUNNING state. This indicates to the Hadoop Administrator that a SAS job is running, has reserved two YARN containers, and is currently active on the Hadoop cluster:

Figure 10: YARN UI - SAS EG Workspace Server (WSS) running in sas94_queue on Hadoop.

Switching to the “Nodes” link in the YARN UI (this is a smaller sample cluster to illustrate our example only), you can see the two YARN containers associated with the SAS WSS, running in 1 GB YARN Containers.

Figure 11: YARN UI – “Nodes” link showing two 1 GB containers supporting the SAS WSS.
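The same information shown in the YARN UI can also be read from the ResourceManager REST API, whose Cluster Applications endpoint (/ws/v1/cluster/apps) accepts states and queue filters. The sketch below only builds the request URL and summarizes a response; it does not make the HTTP call, because on a Kerberized cluster that call would need SPNEGO authentication. The host name rm.example.com and the sample payload are invented for illustration, shaped after the two 1 GB containers described above.

```python
import json
from urllib.parse import urlencode


def running_apps_url(rm_base_url: str, queue: str) -> str:
    """URL listing RUNNING applications in one queue."""
    query = urlencode({"states": "RUNNING", "queue": queue})
    return f"{rm_base_url}/ws/v1/cluster/apps?{query}"


def summarize_apps(payload: dict) -> list:
    """Return (name, state, running containers, allocated MB) per application.

    The ResourceManager returns {"apps": null} when nothing matches.
    """
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["name"], a["state"], a["runningContainers"], a["allocatedMB"])
            for a in apps]


# Hypothetical host; 8088 is the default ResourceManager web port.
print(running_apps_url("http://rm.example.com:8088", "sas94_queue"))

# Invented sample response for illustration only.
SAMPLE = json.loads("""
{"apps": {"app": [
  {"name": "SAS Enterprise Guide - Workspace Server",
   "state": "RUNNING", "queue": "sas94_queue",
   "runningContainers": 2, "allocatedMB": 2048}
]}}
""")
for row in summarize_apps(SAMPLE):
    print(row)
```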

And, switching to the YARN UI – “Scheduler” link, we can see Cluster Metrics below.

Figure 12: YARN UI – “Scheduler” link Hadoop Cluster Metrics.

Switching back to the SAS User EG UI, the SAS user can see that a SAS Library has been assigned pointing to Hive. We can also see, in the SAS EG Explorer window on the bottom left hand side of the screen, all the Hive Tables available to the SAS user which are associated with SAS libref HIVE_TPC. At this point, the SAS EG user can run SAS analytic tasks directly against Hive tables using the Hive database schema tpcds_bin_partitioned_orc_10.

Figure 13: SAS Enterprise Guide – Library HIVE_TPC has been assigned to Hive.

At this point, the SAS EG user can run any traditional SAS job or code. The SAS user can also call any HDP service (e.g., HDFS, Pig, Hive, MapReduce) in-line within this SAS workspace server. If installed, the SAS user can also call SAS High Performance Analytics, which can run on YARN in the cluster as well. When the SAS EG user issues a “Disconnect” (see below), this will initiate a request to the SAS Workspace Server to shut down. YARN will release the container that was used for running the SAS Workspace Server and reclaim the resources associated with that container.

Figure 14: SAS Enterprise Guide – “Disconnect” from Hadoop.

Once the “Disconnect” is complete in EG, the Hadoop Administrator will see that the SAS Enterprise Guide Workspace Server is now “FINISHED” (see below). A disconnect will also occur automatically if a SAS EG user shuts down the UI.

Figure 15: YARN UI - SAS EG Workspace Server (WSS) FINISHED status on Hadoop.

For sites interested in co-locating traditional SAS workloads on Hadoop, SAS Grid Manager for Hadoop is an ideal solution to meet this need. In this paper, we have discussed the benefits of moving to SAS Grid Manager on Hadoop, the required architecture, and recommended configurations. We have also shared a seamless SAS user experience with you. If you would like to learn more about SAS Grid Manager for Hadoop, please see the references below.

REFERENCES

“SAS Analytics on Your Hadoop Cluster Managed by YARN.” July 14th, 2015. Available at http://support.sas.com/rnd/scalability/grid/hadoop/SASAnalyticsOnYARN.pdf.

SAS Institute Inc. “Introduction to Grid Computing.” Available at http://support.sas.com/rnd/scalability/grid/.

“Hortonworks Apache Hadoop YARN Overview Web Page.” Available at https://hortonworks.com/apache/yarn/.

“Apache Hadoop Yarn.” The Apache Software Foundation. June 29, 2015. Available at https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.

“Introducing - SAS® Grid Manager for Hadoop.” SAS Global Forum Paper SAS6281-2016. Available at http://support.sas.com/resources/papers/proceedings16/SAS6281-2016.pdf

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Mark Lochbihler
Principal Architect, Hortonworks Inc.
mark.lochbihler@hortonworks.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
