
Table of Contents

Introduction
Key Points for Review
Characteristics of a Supported Pivotal Hardware Platform
Pivotal Approved Recommended Architecture
Pivotal Cluster Examples
Example Rack Layout
Using gpcheckperf to Validate Disk and Network Performance
Pivotal Greenplum Segment Instances per Server
Pivotal Greenplum on Virtualized Systems
Additional Helpful Tools

Introduction

The EMC Data Computing Appliance provides a ready-made platform that strives to accommodate the majority of customer workloads. One of Pivotal Greenplum's strongest value propositions is its ability to run on practically any modern-day hardware platform. More and more, Pivotal Engineering is seeing cases where customers elect to build a cluster that satisfies a specific requirement or purpose.

Pivotal Platform Engineering publishes this framework as a resource for assisting customers in this effort.

Objectives

This guide can be used for:

- A clear understanding of what characterizes a recommended platform for running Pivotal Greenplum Database
- A review of the two most common topologies with supporting recommended architecture diagrams
- Pivotal recommended reference architecture that includes hardware recommendations, configuration, hard disk guidelines, network layout, installation, data loading, and verification
- Extra guidance with real-world Greenplum cluster examples (see Pivotal Cluster Examples)

This document does:

- provide recommendations for building a well-performing Pivotal cluster using the hardware guidelines presented
- provide general concepts without specific tuning suggestions

This document does not:

- promise Pivotal support for the use of third-party hardware
- assume that the information herein applies to every site; it is subject to modification depending on a customer's specific local requirements
- provide all-inclusive procedures for configuring Pivotal Greenplum. A subset of information is included as it pertains to deploying a Pivotal cluster.

Greenplum Terms to Know

master
A server that provides entry to the Greenplum Database system, accepts client connections and SQL queries, and distributes work to the segment instances.

segment instances
Independent PostgreSQL databases that each store a portion of the data and perform the majority of query processing.

segment host
A server that typically executes multiple Greenplum segment instances.

interconnect
Networking layer of the Greenplum Database architecture that facilitates inter-process communication between segments.

Feedback and Updates

Please send feedback and/or updates to this document to [email protected].


Key Points for Review

What is Pivotal Engineering Recommended Architecture?

This Pivotal Recommended Architecture comprises generic recommendations for third-party hardware for use with Pivotal software products. Pivotal maintains examples of various implementations internally to aid in assisting customers with cluster diagnostics and configuration. Pivotal does not perform hardware replacement, nor is Pivotal a substitute for OEM vendor support for these configurations.

Why Install on an OEM Vendor Platform?

The EMC DCA strives to achieve the best balance between performance and cost while meeting a broad range of customer needs. There are, however, some very valid reasons customers may opt to design their own clusters.

Some possibilities are:

- Varying workload profiles that may require more memory or higher processor capacity
- Specific functional needs like public/private clouds, increased density, or disaster recovery (DR)
- Support for radically different network topologies
- Deeper, more direct access for hardware and OS management
- Existing relationships with OEM hardware partners

If customers opt out of using the appliance, Pivotal Engineering highly recommends following the Pivotal architecture guidelines and discussing the implementation with a Pivotal Engineer. Customers achieve much greater reliability when following these recommendations.


Characteristics of a Supported Pivotal Hardware Platform

Commodity Hardware

Pivotal believes that customers should take advantage of the inexpensive yet powerful commodity hardware available: x86_64 commodity servers, storage, and Ethernet switches.

Pivotal recommends:

- Chipsets or hardware used across many platforms:
  - NIC chipsets (like some of the Intel series)
  - RAID controllers (like LSI or StorageWorks)
- Reference motherboards/designs. Machines that use reference motherboard implementations are preferred. Although DIMM count is important, if a manufacturer integrates more DIMM slots than the CPU manufacturer specifies, more risk is placed on the platform.
- Ethernet-based interconnects (10Gb), which are highly preferred to proprietary interconnects and to storage fabrics.

Manageability

Pivotal recommends:

- Remote, out-of-band management capability with support for ssh connectivity as well as web-based console access and virtual media
- Diagnostic LEDs that convey failure information. Amber lights are a minimum, but an LED that displays the exact failure is more useful
- Tool-free maintenance (the cover can be opened without tools, parts are hot-swappable without tools, etc.)
- Labeling: components such as DIMMs are labeled so it is easy to determine which part needs to be replaced
- Command-line, script-based interfaces for configuring the server BIOS and options like RAID cards and NICs

Redundancy

Pivotal recommends:

- Redundant hot-swappable power supplies
- Redundant hot-swappable fans
- Redundant network connectivity
- Hot-swappable drives
- Hot-spare drives when immediate replacement of failed hardware is unavailable

Determining the Best Topology

Traditional Topology

This configuration requires the least specialized networking skill and is the simplest possible configuration. In a traditional network topology, every server in the cluster is directly connected to every switch in the cluster, typically over 10Gb Ethernet. This topology limits the cluster size to the number of ports on the selected interconnect switches. The 10Gb ports on the servers are bonded into an active/active pair and routed directly to a set of switches configured using MLAG (or a comparable technology) to provide a redundant high-speed network fabric.


Figure: Recommended Architecture Example 1 (Typical Topology)

Scalable Topology

Scalable networks implement a network core that allows the cluster to grow beyond the number of ports in the interconnect switches. Care must be taken to ensure that the number of links from the in-rack switches is adequate to service the core.

How to Determine the Maximum Number of Servers

For example, suppose each rack can hold 16 servers and the core switches each have 48 ports. Of these ports, 4 are used to create the MLAG between the two core switches. Of the remaining 44 ports, networking from a single set of interconnect switches in a rack uses 4 links per core switch, 2 from each interconnect switch to each of the core switches. The maximum number of servers is determined by the following formula:

    max-nodes = nodes-per-rack * ((core-switch-port-count - MLAG-port-utilization) / rack-to-rack-link-port-count)

    176 = 16 * ((48 - 4) / 4)
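The same arithmetic can be checked in the shell; a trivial sketch using the example's values:

    # max-nodes for 16 servers/rack, 48-port cores, 4 MLAG ports, 4 links per rack
    nodes_per_rack=16; core_ports=48; mlag_ports=4; links_per_rack=4
    echo $(( nodes_per_rack * ( (core_ports - mlag_ports) / links_per_rack ) ))   # prints 176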

Figure: Recommended Architecture Example 2 (Scalable Topology)


Pivotal Approved Recommended Architecture

Minimum Server Guidelines

Table 1 lists minimum requirements for a good cluster. Use gpcheckperf to generate these metrics. See Appendix C: Using gpcheckperf to Validate Disk and Network Performance for example gpcheckperf output.

Table 1. Baseline Numbers for a Pivotal Cluster

Master Nodes (mdw & smdw)
  Role: Users and applications connect to masters to submit queries and return results. Typically, monitoring and managing the cluster and the database is performed through the master nodes.
  CPU: 8+ physical cores at greater than 2GHz clock speed
  RAM: >256GB
  Disk read: >600MB/s
  Disk write: >500MB/s
  Interconnect: 2x 10Gb NICs
  Other networking: Multiple NICs
  Form factor: 1U

Segment Nodes (sdw)
  Role: Segment nodes store data and execute queries. They are generally not public facing. Multiple segment instances run on one segment node.
  CPU: 8+ physical cores at greater than 2GHz clock speed
  RAM: >256GB
  Disk read: >2000MB/s
  Disk write: >2000MB/s
  Interconnect: 2x 10Gb NICs
  Other networking: Multiple NICs
  Form factor: 2U

ETL/Backup Nodes (etl)
  Role: Generally identical to segment nodes. These are used as staging areas for loading data or as destinations for backup data.
  CPU: 8+ physical cores at greater than 2GHz clock speed
  RAM: >64GB
  Disk read: >2000MB/s
  Disk write: >2000MB/s
  Interconnect: 2x 10Gb NICs
  Other networking: Multiple NICs
  Form factor: 2U

Network Guidelines

Table 2. Administration and Interconnect Switches

Administration Network (48x 1Gb ports)
  Purpose: Administration networks are used to tie together the lights-out management interfaces in the cluster and provide a management route into the server and OS.
  Switch: A layer-2/layer-3 managed switch per rack, with no specific bandwidth or blocking requirements.

Interconnect Network (48x 10Gb ports)
  Switch: Two layer-2/layer-3 managed switches per rack. All ports must have full bandwidth, be able to operate at line rate, and be non-blocking.

Table 3. Racking, Power, and Density

Racking: Generally, a 40U or larger rack that is 1200mm deep is required. Built-in cable management is preferred, as are ESM protective doors.

Power: The typical input power for a Pivotal Greenplum rack is 4x 208/220V, 30 amp, single-phase circuits in the US. Internationally, 4x 230V, 32 amp, single-phase circuits are generally used. This affords a power budget of ~9600VA of fully redundant power. Other power configurations are acceptable so long as enough energy is delivered to the rack to accommodate its contents in a fully redundant manner.

Node Guidelines

OS Levels

At a minimum, the following operating systems (OS) are supported:

- Red Hat/CentOS Linux 5*
- Red Hat/CentOS Linux 6
- Red Hat/CentOS Linux 7**
- SUSE Enterprise Linux 10.2 or 10.3
- SUSE Enterprise Linux 11

* RHEL/CentOS 5 will be unsupported in the next major release
** Support for RHEL/CentOS 7 is near completion, pending kernel bug fixes

For the latest information on supported OS versions, refer to the Greenplum Database Installation Guide.

Setting OS Parameters for Greenplum Database

Careful consideration must be given when setting OS parameters for Greenplum Database hosts. Refer to the latest version of the Greenplum Database Installation Guide for these settings.

Greenplum Database Server Guidelines

Greenplum Database integrates three kinds of servers: master servers, segment hosts, and ETL servers. Greenplum Database servers must meet the following criteria.

Master Servers

- 1U or 2U server. With less of a need for drives, rack space can be saved by going with a 1U form factor. However, a 2U form factor consistent with the segment hosts may increase supportability.
- Same processors, RAM, RAID card, and interconnect NICs as the segment hosts.
- Six to ten disks (eight is most common) organized into a single RAID5 group with one hot spare configured.
- SAS 15k or SSD disks are preferred, with 10k disks a close second.
- SATA drives are acceptable in solutions oriented towards archival space over query performance.
- All disks must be the same size and type.
- Should be capable of read rates in gpcheckperf of 500MB/s or higher. (The faster the master scans, the faster it can generate query plans, which improves overall performance.)
- Should be capable of write rates in gpcheckperf of 500MB/s or higher.
- Should have sufficient additional network interfaces to connect to the customer network directly in the manner desired by the customer.

Segment Hosts

- Typically a 2U server.
- The fastest available processors.
- 256GB RAM or more.
- One or two RAID cards with maximum cache and cache protection (flash or capacitors preferred over battery). RAID cards should be able to support the full read/write capacity of the drives.
- 2x 10Gb NICs.
- 12 to 24 disks organized into two or four RAID5 groups. Hot spares should be configured unless there are disks on hand for quick replacement.
- SAS 15k disks are preferred, with 10k disks a close second. SATA disks are preferred over nearline SAS if SAS 15k or SAS 10k cannot be used. All disks must be the same size and type.
- A minimum read rate in gpcheckperf of 300MB/s per segment or higher. (2000MB/s per server is typical.)
- A minimum write rate in gpcheckperf of 300MB/s per segment or higher. (2000MB/s per server is typical.)

Additional Tips for Segment Host Configuration

The number of segment instances run per segment host is configurable, and each segment instance is itself a database running on the server. A baseline recommendation on current hardware, such as the hardware described in Appendix A, is 8 primary segment instances per physical server.

A set of memory parameters that depends on the amount of RAM allotted to each segment instance is determined when installing the database software. While these are not platform parameters, it is the platform that determines how much memory is available and how the memory parameters should be set in the software. Refer to the online calculator (http://greenplum.org/calc/) to determine these settings; a sketch of the arithmetic appears below.
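For illustration, a minimal sketch of the calculation the Greenplum Database documentation has published for these parameters, assuming a host with 256GB RAM, 64GB swap, and 8 acting primary segments (the online calculator remains the authoritative source):

    # Hypothetical host values; adjust to the actual platform.
    RAM_GB=256; SWAP_GB=64; ACTING_PRIMARIES=8
    # gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7
    GP_VMEM=$(echo "(($SWAP_GB + $RAM_GB) - (7.5 + 0.05 * $RAM_GB)) / 1.7" | bc -l)
    # Per-segment limit (in MB) for gp_vmem_protect_limit:
    echo "$GP_VMEM * 1024 / $ACTING_PRIMARIES" | bc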

Refer to Appendix D for further reading on segment instance configuration.

ETL Servers

- Typically a 2U server.
- The same processors, RAM, and interconnect NICs as the segment servers.
- One or two RAID cards with maximum cache and cache protection (flash or capacitors preferred over battery).
- 12 to 24 disks organized into RAID5 groups of six to eight disks with no hot spares configured (unless there are available disks after the RAID groups are constructed).
- SATA disks are a good choice for ETL, as performance is typically less of a concern than storage for these systems.
- Should be capable of read rates in gpcheckperf of 100MB/s or higher. (The faster the ETL servers scan, the faster query data can be loaded.)
- Should be capable of write rates in gpcheckperf of 500MB/s or higher. (The faster ETL servers write, the faster data can be staged for loading.)

Additional Tips for Selecting ETL Servers

ETL nodes can be any server that offers enough storage and performance to accomplish the tasks required. Typically, between 4 and 8 ETL servers are required per cluster. The maximum number depends on the desired load performance and the size of the Greenplum Database cluster.

For example, the larger the Greenplum Database cluster, the faster the loads can be. The more ETL servers, the faster data can be served. Having more ETL bandwidth than the cluster can receive is pointless. Having much less ETL bandwidth than the cluster can receive makes for slower loading than the maximum possible.

Hard Disk Configuration Guidelines

A generic server with 24 hot-swappable disks can have several potential disk configurations. Testing by Pivotal Platform and Systems Engineering shows that the best-performing storage for Pivotal software is:

- four RAID5 groups of six disks each (used as four file systems), or
- combined into one or two file systems using logical volume manager.

The following instructions describe how to build the recommended RAID groups and virtual disks for both master and segment nodes. How these ultimately translate into file systems is covered in the relevant operating system's installation guide.

LUN Configuration

The RAID controller settings and disk configuration are based on synthetic load testing performed on several RAID configurations. The settings that produced the best read rates did not have the highest write rates, and the settings with the best write rates did not have the highest read rates. The prescribed settings offer a compromise: write rates lower than the best measured write rate but higher than the write rates associated with the settings for the highest read rate, and likewise for reads. This is intended to ensure that both input and output are the best they can be while affecting the other as little as possible.

LUNs for the system should be partitioned and mounted as /data1 for the first LUN, with additional LUNs following the same naming convention while incrementing the number (/data1, /data2, /data3 ... /dataN). All file systems should be formatted as xfs and follow the recommendations set forth in the Pivotal Greenplum Database Installation Guide; a hedged example follows.
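As one illustration, a minimal sketch of formatting and mounting the first LUN, assuming the device appears as /dev/sdb and using xfs mount options the Installation Guide has recommended (verify against the current guide):

    # /dev/sdb is an assumed device name; it varies by controller and slot.
    mkfs.xfs -f /dev/sdb
    mkdir -p /data1
    mount -t xfs -o rw,nodev,noatime,nobarrier,inode64 /dev/sdb /data1
    # Persist the mount across reboots:
    echo '/dev/sdb /data1 xfs rw,nodev,noatime,nobarrier,inode64 0 0' >> /etc/fstab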

Master Server

Master servers (primary and secondary) have eight hot-swappable disks. Configure all eight disks into a single RAID5 stripe set. Each of the virtual disks carved from this disk group should have the following properties:

- 256k stripe width
- No read-ahead
- Disk cache disabled
- Direct I/O

Virtual disks are configured in the RAID card's option ROM. Each virtual disk defined in the RAID card appears to the operating system as a disk with a /dev/sd? device file name.
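For illustration only, a sketch of applying these properties on an LSI controller managed with MegaCli; the enclosure and slot IDs are hypothetical, and exact flag syntax varies by controller and firmware:

    # One RAID5 virtual disk across eight drives: write-back (protected cache),
    # no read-ahead, direct I/O, 256KB stripe. Enclosure 252 is assumed.
    MegaCli -CfgLdAdd -r5 '[252:0,252:1,252:2,252:3,252:4,252:5,252:6,252:7]' \
            WB NORA Direct -strpsz256 -a0
    # Disable the drives' own cache, per the guidelines above:
    MegaCli -LDSetProp -DisDskCache -LAll -a0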

Segment and ETL Servers

Segment servers have 24 hot-swappable disks. These can be configured in a number of ways, but Pivotal recommends four RAID5 groups of six disks each (RAID5, 5+1). Each of the virtual disks carved from these disk groups should have the following properties:

- 256k stripe width
- No read-ahead
- Disk cache disabled
- Direct I/O

Virtual disks are configured in the RAID card's option ROM. Each virtual disk defined in the RAID card appears to the operating system as a disk with a /dev/sd? device file name.

SSD Storage

Flash storage has been gaining in popularity. Pivotal has not had the opportunity to do enough testing with SSD drives to make a recommendation. When considering SSD drives, it is important to validate the sustained sequential read and write rates of the drive. Many drives have impressive burst rates but are unable to sustain those rates for long periods of time. Additionally, the choice of RAID card needs to be evaluated to ensure it can handle the bandwidth of the SSD drives.

SAN/JBOD Storage

In some configurations it may be a requirement to use an external storage array due to the database size or server type being used by the customer. With this in mind, it is important to understand that, based on testing by Pivotal Platform and Systems Engineering, SAN and JBOD storage will not perform as well as local, internal server storage.

Some considerations to take into account when installing or sizing such a configuration (independent of the vendor of choice):

- Know the database size and the estimated growth over time
- Know the customer's read/write ratio
- Large-block I/O is the predominant workload (512KB)
- Disk type and preferred RAID type based on the vendor of choice
- Expected disk throughput for reads and writes
- Response time of the disks/JBOD controller
- The preferred option is to have BBU capability on either the RAID card or the controller
- Redundancy in switch zoning, preferably with a fan in:out ratio of 2:1
- At least 8Gb Fibre Channel (FC) connectivity
- Ensure that the server supports the use of FC, FCoE, or external RAID cards

In all instances where an external storage source is being utilized, the vendor of the disk array/JBOD should be consulted to obtain specific recommendations based on a sequential workload. This may also require the customer to obtain additional licenses from the pertinent vendors.

Network Layout Guidelines

All the systems in the Greenplum cluster need to be tied together in some form of dedicated, high-speed data interconnect. This network is used for loading data and for passing data between systems during query processing. It should be as high-speed and low-latency as possible, and it should not be used for any other purpose (i.e., it should not be part of the general LAN).

A rule of thumb for network utilization in a Greenplum cluster is to plan for up to twenty percent of each server's maximum I/O read bandwidth as network traffic. This means a server with a 2000MB/s read bandwidth (as measured by gpcheckperf) might be expected to transmit 400MB/s. Greenplum also compresses some data on disk but uncompresses it before transmitting to other systems in the cluster, so a 2000MB/s read rate with a 4x compression ratio results in an 8000MB/s effective read rate. Twenty percent of 8000MB/s is 1600MB/s, which is more than even a single 10Gb interface can carry.

To accommodate this traffic, 10Gb networking is recommended for the interconnect. Current best practice suggests two 10Gb interfaces for the cluster interconnect. This ensures that there is bandwidth to grow into and reduces cabling in the racks. It is recommended to configure the two 10Gb interfaces with NIC bonding to create a load-balanced, fault-tolerant interconnect, as in the sketch below.
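A minimal sketch of such a bond on RHEL/CentOS 6, assuming interfaces eth2 and eth3, the rack-1 interconnect addressing used later in this document, and 802.3ad (LACP) mode paired with MLAG-capable switches; the file paths, interface names, and mode are assumptions:

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical address)
    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    IPADDR=172.1.1.1
    NETMASK=255.255.0.0
    BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"

    # /etc/sysconfig/network-scripts/ifcfg-eth2 (repeat for eth3)
    DEVICE=eth2
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes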

Cisco, Brocade, and Arista switches are good choices, as these brands include the ability to tie switches together in fabrics. Together with NIC bonding on the servers, this approach eliminates single points of failure in the interconnect network. Intel, QLogic, or Emulex network interfaces tend to work best. Layer-3 capability is recommended since it integrates many features that are useful in a Greenplum Database environment.

Note: The vendor hardware referenced above is strictly an example. Pivotal Platform and Systems Engineering does not specify which products to use in the network.

FCoE switch support is also required if SAN storage is used, as well as support for FIP (FCoE Initialization Protocol) snooping.

A Greenplum Database cluster uses three kinds of network connections:

- Admin networks
- Interconnect networks
- External networks

Admin Networks


An Admin network ties together all the management interfaces for the devices in a configuration. It is generally used to provide monitoring and out-of-band console access for each connected device. The Admin network is typically a 1Gb network, physically and logically distinct from the other networks in the cluster.

Servers are typically configured such that the out-of-band or lights-out management interfaces share the first network interface on each server. In this way, the same physical network provides access to lights-out management and an operating-system-level connection useful for network OS installation, patch distribution, monitoring, and emergency access.

Switch Types

Typically one 24- or 48-port 1Gb switch per rack, and one additional 48-port switch per cluster as a core. Any 1Gb switch can be used for the Admin network. Careful planning is required to design a network topology that provides enough connections and the features the site requires.

Cables

Use either cat5e or cat6 cabling for the Admin network. Cable the lights-out or management interface from each cluster device to the Admin network. Place an Admin switch in each rack and cross-connect the switches rather than attempting to run cables from a central switch to all racks.

Note: Pivotal recommends using a different color cable for the Admin network.

Interconnect Networks

The interconnect network ties the servers in the cluster together and forms a high-speed, low-contention data connection between the servers. It should not be implemented on the general data center network, as Greenplum Database interconnect traffic tends to overwhelm networks from time to time. Low latency is needed to ensure proper functioning of the Greenplum Database cluster; sharing the interconnect with a general network tends to introduce instability into the cluster.

Typically two switches are required per rack, plus two more to act as a core. Use two 10Gb cables per server and eight per rack to connect the rack to the core.

Interconnect networks are often connected to general networks in limited ways to facilitate data loading. In these cases, it is important to shield both the interconnect network and the general network from the Greenplum Database traffic and vice versa. Use a router or an appropriate VLAN configuration to accomplish this.

External Network Connections

The master nodes are connected to the general customer network to allow users and applications to submit queries. Typically, this is done with a small number of 1Gb connections attached to the master nodes. Any method that affords network connectivity from the users and applications needing access to the master nodes is acceptable.

Installation Guidelines

Each configuration requires a specific rack plan. There are single- and multi-rack configurations, determined by the number of servers present in the configuration. A single-rack configuration is one where all the planned equipment fits into one rack. Multi-rack configurations require two or more racks to accommodate all the planned equipment.

Racking Guidelines for a 42U Rack

Consider the following when installing the cluster in a 42U rack:

- Prior to racking any hardware, perform a site survey to determine which power option is desired, whether power cables will enter at the top or bottom of the rack, and whether network switches and patch panels will be at the top or bottom of the rack.
- Install the KMM tray into rack unit 19.
- Install the interconnect switches into rack units 21 and 22, leaving a one-unit gap above the KMM tray.
- Rack segment nodes up from the first available rack unit at the bottom of the rack (see the multi-rack rules for variations using low rack units).
- Install no more than sixteen 2U servers (excludes master, but includes segment and ETL nodes).
- Install the master node into rack unit 17. Install the standby master into rack unit 18.
- Admin switches can be racked anywhere in the rack, though the top is typically the best and simplest location.


- All computers, switches, arrays, and racks should be labeled on both the front and back, as described in the section on labels later in this document.
- All installed devices should be connected to two or more power distribution units (PDUs) in the rack where the device is installed.

When installing a multi-rack cluster:

- Install the interconnect core switches in the top two rack units if the cables come in from the top, or in the bottom two rack units if the cables come in from the bottom.
- Do not install core switches in the master rack.

Cabling

The number of cables required varies according to the options selected. In general, each server and switch installed uses one cable for the Admin network. Run cables according to established cabling standards. Eliminate tight bends or crimps. Clearly label all cables at each end. The label on each end of a cable must trace the path the cable follows between server and switch. This includes:

- Switch name and port
- Patch panel name and port, if applicable
- Server name and port

Switch Configuration Guidelines

Typically, the factory default configuration is sufficient.

IP Addressing Guidelines

IP Addressing Scheme for the Admin Network

An Admin network should be created so that system maintenance and access work can be done on a network separate from the cluster traffic between the nodes.

Note: Pivotal's recommended IP addressing for servers on the Admin network uses a standard internal address space and is extensible to over 1,000 nodes.

All Admin network switches present should be cross-connected, and all NICs attached to these switches participate in the 172.254.0.0/16 network.

Table 4. IP Addresses for Servers and CIMC

Host Type                                        Network Interface   IP Address
Primary Master Node                              CIMC                172.254.1.252/16
                                                 Eth0                172.254.1.250/16
Secondary Master Node                            CIMC                172.254.1.253/16
                                                 Eth0                172.254.1.251/16
Non-master Segment Nodes in rack 1 (master rack) CIMC                172.254.1.101/16 through 172.254.1.116/16
                                                 Eth0                172.254.1.1/16 through 172.254.1.16/16
Non-master Segment Nodes in rack 2               CIMC                172.254.2.101/16 through 172.254.2.116/16
                                                 Eth0                172.254.2.1/16 through 172.254.2.16/16
Non-master Segment Nodes in rack #               CIMC                172.254.#.101/16 through 172.254.#.116/16
                                                 Eth0                172.254.#.1/16 through 172.254.#.16/16

Note: # is the rack number.

The fourth octet is counted from the bottom up. For example, the bottom server in the first rack is 172.254.1.1 and the top, excluding masters, is 172.254.1.16.


The bottom server in the second rack is 172.254.2.1 and the top is 172.254.2.16. This continues for each rack in the cluster, regardless of individual server purpose.
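A small sketch that prints /etc/hosts entries following this scheme for one rack; the sdwN hostname convention comes from the examples later in this document, and the numbering across racks is an assumption:

    # Admin-network host entries for rack 1 (set RACK=2 for the second rack).
    RACK=1
    for i in $(seq 1 16); do
        printf '172.254.%d.%d\tsdw%d\n' "$RACK" "$i" $(( (RACK - 1) * 16 + i ))
    done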

IP Addressing for Non-server Devices

The following table lists the correct IP addressing for each non-server device.

Table 5. Non-server IP Addresses

Device                               IP Address
First Interconnect Switch in Rack    172.254.#.201/16 *
Second Interconnect Switch in Rack   172.254.#.202/16 *

* Where # is the rack number

IP Addressing for Interconnects Using 10Gb NICs

The interconnect is where data is routed at high speed between the nodes.

Table 6. Interconnect IP Addressing for 10Gb NICs

Host Type          Physical RJ-45 Port      IP Address
Primary Master     1st port on PCIe card    172.1.1.250/16
                   2nd port on PCIe card    172.2.1.250/16
Secondary Master   1st port on PCIe card    172.1.1.251/16
                   2nd port on PCIe card    172.2.1.251/16
Non-Master Nodes   1st port on PCIe card    172.1.#.1/16 through 172.1.#.16/16
                   2nd port on PCIe card    172.2.#.1/16 through 172.2.#.16/16

Note: # is the rack number.

- The fourth octet is counted from the bottom up. For example, the bottom server in the first rack uses 172.1.1.1 and 172.2.1.1.
- The top server in the first rack, excluding masters, uses 172.1.1.16 and 172.2.1.16.
- Each NIC on the interconnect uses a different subnet, and each server has a NIC on each subnet.

IP Addressing for Fault-Tolerant Interconnects

The following table lists correct IP addresses for fault-tolerant (bonded) interconnects, regardless of bandwidth.

Table 7. Fault-Tolerant (Bonded) Interconnects

Host Type          IP Address
Primary Master     172.1.1.250/16
Secondary Master   172.1.1.251/16
Non-Master Nodes   172.1.#.1/16 through 172.1.#.16/16

Note: # is the rack number.

- The fourth octet is counted from the bottom up. For example, the bottom server in the first rack uses 172.1.1.1.
- The top server in the first rack, excluding masters, uses 172.1.1.16.


Data Loading Connectivity Guidelines

High-speed data loading requires direct access to the segment nodes, bypassing the masters. There are three ways to connect a Pivotal cluster to external data sources or backup targets:

- VLAN Overlay: the recommended best practice is to use virtual LANs (VLANs) to open up specific hosts in the customer network and the Greenplum Database cluster to each other.
- Direct Connect to Customer Network: only use if there is a specific customer requirement.
- Routing: only use if there is a specific customer requirement.

VLAN Overlay

VLAN overlay is the most commonly used method to provide access to external data without introducing network problems. The VLAN overlay imposes an additional VLAN on the connections of a subset of the cluster servers.

How the VLAN Overlay Method Works

Using the VLAN overlay method, traffic passes between the cluster servers on the internal VLAN but cannot pass out of the internal switch fabric, because the external-facing ports are assigned only to the overlay VLAN. Traffic on the overlay VLAN (traffic to or from IP addresses assigned to the relevant servers' virtual network interfaces) can pass in and out of the cluster.

This VLAN configuration allows multiple clusters to co-exist without requiring any change to their internal IP addresses, and gives customers more control over which elements of the clusters are exposed to the general customer network. The overlay VLAN can be a dedicated VLAN that includes only those servers that need to talk to each other, or it can be the customer's full network.

Figure: Basic VLAN Overlay Example

This figure shows a cluster with 3 segment hosts, a master, a standby master, and an ETL host. In this case, only the ETL host is part of the overlay. It is not a requirement to have the ETL node use the overlay, though this is common in many configurations to allow data to be staged within a cluster. Any of the servers in this rack, or any rack of any other configuration, may participate in the overlay if desired. The type of configuration will depend upon security requirements and whether functions within the cluster need to reach any outside data sources.

Configuring the Overlay VLAN: An Overview

Configuring the VLAN involves three steps:

1. Virtual interface tags packets with the overlay VLAN
2. Configure the switch in the cluster with the overlay VLAN
3. Configure the ports on the switch connecting to the customer network


Step 1: Virtual interface tags packets with the overlay VLAN

Each server that is in both the base VLAN and the overlay VLAN has a virtual interface created that tags packets sent from the interface with the overlay VLAN. For example, suppose eth2 is the physical interface on an ETL server that is connected to the first interconnect network. To include this server in an overlay VLAN, the interface eth2.1000 is created using the same physical port but defining a second interface for the port. The physical port does not tag its packets, but any packet sent using the virtual port is tagged with the VLAN.
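A minimal sketch of creating that tagged interface with iproute2 (the overlay address is hypothetical):

    # Define a virtual interface on eth2 that tags packets with VLAN 1000:
    ip link add link eth2 name eth2.1000 type vlan id 1000
    ip addr add 192.168.100.21/24 dev eth2.1000   # hypothetical overlay address
    ip link set dev eth2.1000 up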

Step 2: Configure the switch in the cluster with the overlay VLAN

The switch in the cluster that connects to the servers and the customer network is configured with the overlay VLAN. All of the ports connected to servers that will participate in the overlay are changed to switchport mode converged and added to both the internal VLAN (199) and the overlay VLAN (1000).

Step 3: Configure the switch ports connected to the customer network

The ports on the switch connecting to the customer network are configured as either access- or trunk-mode switchports (depending on customer preference) and added only to the overlay VLAN.

Direct Connect to the Customer's Network

Each node in the Greenplum Database cluster can simply be cabled directly to the network where the data sources exist, or to a network that can communicate with the source network. This is a brute-force approach that works very well. Depending on what network features are desired (redundancy, high bandwidth, etc.), this method can be very expensive in terms of cabling and switch gear, as well as space for running large numbers of cables.

Figure: Data Loading - Direct Connect to Customer Network

Routing

Any of the standard networking methods used to link two different networks together can be deployed to tie the interconnect network(s) to the data source network(s). Which method is used will depend on the circumstances and the goals for the connection. A router is installed that advertises the external networks to the servers in the Greenplum cluster. This method could potentially have performance and configuration implications for the customer's network.

Validation Guidelines

Most of the validation effort is performed after the OS is installed, when a variety of OS-level tools are available. A checklist covering the issues raised in this section is included in the relevant OS installation guide; it should be separately printed and signed for delivery.

Examine and verify the following items:

- All cables labeled according to the standards in this document


- All racks labeled according to the standards in this document
- All devices power on
- All hot-swappable devices are properly seated
- No devices show any warning or fault lights
- All network management ports are accessible via the administration LAN
- All cables are neatly dressed into the racks and have no sharp bends or crimps
- All rack doors and covers are installed and close properly
- All servers extend and retract without pinching or stretching cables

Labels

Racks

Each rack in a Recommended Architecture is labeled at the top of the rack, on both the front and back. Racks are named Master Rack or Segment Rack #, where # is a sequential number starting at 1.

Servers

Each server is labeled on both the front and back of the server. The label should be the hostname of the server. In other words, if a segment node is known as sdw15, the label on that server would be sdw15.

Switches

Switches are labeled according to their purpose. Interconnect switches are i-sw, administration switches are a-sw, and ETL switches are e-sw. Each switch is assigned a number starting at 1. Switches are labeled on the front only, since the back is generally not visible when racked.

Certification Guidelines

Network Performance Test

gpcheckperf verifies the line rate on both 10Gb NICs. Run gpcheckperf on the disks and network connections within the cluster. As each certification varies with the number of disks, nodes, and network bandwidth available, the commands to run the tests will differ.

See Using gpcheckperf to Validate Disk and Network Performance for more information on the gpcheckperf command.

Hardware Monitoring and Failure Analysis Guidelines

To support monitoring of a running cluster, the following items should be in place and capable of being monitored, with the gathered information available via interfaces such as SNMP or IPMI.


Fans/Temp

- Fan status/presence
- Fan speed
- Chassis temp
- CPU temp
- IOH temp

Memory

- DIMM temp
- DIMM status (populated, online)
- DIMM single-bit errors
- DIMM double-bit errors
- ECC warnings (corrections exceeding threshold)
- ECC correctable errors
- ECC uncorrectable errors
- Memory CRC errors

System Errors

- POST errors
- PCIe fatal errors
- PCIe non-fatal errors
- CPU machine check exceptions
- Intrusion detection
- Chipset errors

Power

- Power supply presence
- Power supply failures
- Power supply input voltage
- Power supply amperage
- Motherboard voltage sensors
- System power consumption


Pivotal Cluster Examples

The following table lists good choices for cluster hardware based on Intel Xeon E5 v3 processor-based servers and Cisco and Arista switches.

Table 1. Hardware Components

Master Node (two per cluster)
  1U server (similar to the Dell R630):
  - 2x E5-2680 v3 processors (2.5GHz, 12 cores, 120W)
  - 256GB RAM (16x 16GB)
  - 1x RAID card with 1GB protected cache
  - 8x SAS, 10k, 6G disks (typically 8x 600GB, 2.5"), organized into a single RAID5 disk group with a hot spare. Logical devices defined per the OS needs (boot, root, swap, etc.) with the remainder in a single large file system for data
  - 2x 10Gb Intel, QLogic, or Emulex based NICs
  - Lights-out management (IPMI-based BMC)
  - 2x 650W or higher high-efficiency power supplies

Segment Node & ETL Node (up to 16 per rack; no maximum total count)
  2U server (similar to the Dell R730xd):
  - 2x E5-2680 v3 processors (2.5GHz, 12 cores, 120W)
  - 256GB RAM (16x 16GB)
  - 1x RAID card with 1GB protected cache
  - 12 to 24x SAS, 10k, 6G disks (typically 12x 600GB, 3.5" or 24x 1.8TB, 2.5"), organized into two to four RAID5 groups. Used either as two to four data file systems (with logical devices skimmed off for boot, root, swap, etc.) or as one large device bound with Logical Volume Manager
  - 2x 10Gb Intel, QLogic, or Emulex based NICs
  - Lights-out management (IPMI-based BMC)
  - 2x 650W or higher high-efficiency power supplies

Admin Switch: Cisco Catalyst 2960 Series
  A simple 48-port 1Gb switch with features that allow it to be easily combined with other switches to expand the network. The least expensive managed switch with good reliability is appropriate for this role. There will be at least one per rack.

Interconnect Switch: Arista 7050-52
  The Arista switch line allows for multi-switch link aggregation groups (called MLAG), easy expansion, and a reliable body of switch hardware and operating system.


Example Rack Layout

The following figure is an example rack layout with proper switch and server placement.

Figure: 42U Rack Diagram


Using gpcheckperf to Validate Disk and Network Performance

The following examples illustrate how gpcheckperf is used to validate disk and network performance in a cluster.

Checking Disk Performance: gpcheckperf Output

    [gpadmin@mdw ~]$ gpcheckperf -f hosts -r d -D -d /data1/primary -d /data2/primary -S 80G
    /usr/local/greenplum-db/./bin/gpcheckperf -f hosts -r d -D -d /data1/primary -d /data2/primary -S 80G

    --------------------
      DISK WRITE TEST
    --------------------
    --------------------
      DISK READ TEST
    --------------------

    ====================
    ==  RESULT
    ====================

    disk write avg time (sec): 71.33
    disk write tot bytes: 343597383680
    disk write tot bandwidth (MB/s): 4608.23
    disk write min bandwidth (MB/s): 1047.17 [sdw2]
    disk write max bandwidth (MB/s): 1201.70 [sdw1]
    -- per host bandwidth --
    disk write bandwidth (MB/s): 1200.82 [sdw4]
    disk write bandwidth (MB/s): 1201.70 [sdw1]
    disk write bandwidth (MB/s): 1047.17 [sdw2]
    disk write bandwidth (MB/s): 1158.53 [sdw3]

    disk read avg time (sec): 103.17
    disk read tot bytes: 343597383680
    disk read tot bandwidth (MB/s): 5053.03
    disk read min bandwidth (MB/s): 318.88 [sdw2]
    disk read max bandwidth (MB/s): 1611.01 [sdw1]
    -- per host bandwidth --
    disk read bandwidth (MB/s): 1611.01 [sdw1]
    disk read bandwidth (MB/s): 318.88 [sdw2]
    disk read bandwidth (MB/s): 1560.38 [sdw3]

Checking Network Performance: gpcheckperf Output

    [gpadmin@mdw ~]$ gpcheckperf -f network1 -r N -d /tmp
    /usr/local/greenplum-db/./bin/gpcheckperf -f network1 -r N -d /tmp

    -------------------
    --  NETPERF TEST
    -------------------

    ====================
    ==  RESULT
    ====================

    Netperf bisection bandwidth test
    sdw1 -> sdw2 = 1074.010000
    sdw3 -> sdw4 = 1076.250000
    sdw2 -> sdw1 = 1094.880000
    sdw4 -> sdw3 = 1104.080000

    Summary:
    sum = 4349.22 MB/sec
    min = 1074.01 MB/sec
    max = 1104.08 MB/sec
    avg = 1087.31 MB/sec
    median = 1094.88 MB/sec

Page 22: Table of Contents - Greenplum...Chipsets or hardware used across many platforms NIC chipsets (like some of the Intel series) RAID controllers (like LSI or StorageWorks) Reference motherboards/designs

Pivotal Greenplum Segment Instances per Server

Understanding Greenplum Segments

Greenplum segment instances are essentially individual databases. In a Greenplum cluster there is a Greenplum master server that dispatches work to multiple segment instances, each of which resides on a segment host. Data for a table is distributed across all of the segment instances, and when a query requests data it is dispatched to all of them to execute in parallel. The instances that actively process queries are referred to as primary instances. A Greenplum cluster additionally runs mirror instances, one paired to each primary. The mirrors do not participate in answering queries; they only perform data replication, so that if a primary fails its mirror can take over processing in its place.

When planning a cluster, it is important to understand that all of these instances are going to accept a query in parallel and act upon it. There must therefore be enough resources on a server for all of these processes to run and communicate with each other at once.

Segments Resources Rule of Thumb

A general rule of thumb is that for every segment instance (primary or mirror) you will want to provide at least:

- 1 core
- 200MB/s I/O read
- 200MB/s I/O write
- 8GB RAM
- 1Gb network throughput

A segment host with 8 primary and 8 mirror instances would therefore have:

- 16 cores
- 3200MB/s I/O read
- 3200MB/s I/O write
- 128GB RAM
- 20Gb network throughput

These numbers have proven to provide a reliable platform for a variety of use cases and give a good baseline for the number of instances to run on a single server. Pivotal recommends a maximum of 8 primary and 8 mirror instances on a server, even if the resources provided are sufficient for more. A sketch of the arithmetic appears below.
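A trivial sketch of that baseline arithmetic, using the example's instance counts:

    # Resources implied by the per-instance rule of thumb above.
    PRIMARIES=8; MIRRORS=8
    INSTANCES=$(( PRIMARIES + MIRRORS ))
    echo "cores:          $(( INSTANCES * 1 ))"
    echo "I/O read MB/s:  $(( INSTANCES * 200 ))"
    echo "I/O write MB/s: $(( INSTANCES * 200 ))"
    echo "RAM GB:         $(( INSTANCES * 8 ))"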

Pivotal has found that allocating a ratio of 1 to 2 physical CPUs per primary segment works well for most use cases; it is not recommended to drop below 1 CPU per primary segment. Ideal architectures will additionally align the NUMA architecture with the number of segments.

Reasons to Reduce the Number of Segment Instances per Server

A database schema that uses partitioned columnar tables has the potential to generate a large number of files. For example, a table that is partitioned daily for a year will have over 300 files, one for each day. If that table additionally has columnar orientation with 300 columns, it will have well over 90,000 files representing its data on one segment instance. A server running 8 primary instances with this table would have to open 720,000 files if a full-table-scan query were issued against it. Systems that make use of partitioned columnar tables may benefit from fewer segment instances per server if data is being used in a way that requires many open files.

Systems that span large numbers of nodes create more work for the master to plan queries and coordinate all of the segments. In systems spanning two or more racks, consider reducing the number of segment instances per server.

When queries require large amounts of memory, reducing the number of segments per server increases the amount of memory available to any one segment.

If the amount of concurrent query processing causes resources to run low on the system, reducing the amount of parallelism on the platform itself will allow for more parallelism in query execution.

Reasons to Increase the Number of Segment Instances per Server

In low-concurrency systems, increasing the segment instance count allows each query to utilize more resources in parallel when system utilization is low.

Systems with large amounts of free RAM that the OS can use for file buffers may benefit from increasing the number of segment instances per server.


Pivotal Greenplum on Virtualized Systems

General Understanding of Pivotal Greenplum and Virtualization

Greenplum Database is parallel processing software: the Pivotal Greenplum software often runs the same process at the same time across a cluster of nodes. Virtualization is frequently used to centralize systems so they can share resources, taking advantage of the fact that software often utilizes resources sporadically, which allows those resources to be over-subscribed. Greenplum Database will not function well in an oversubscribed environment, because all segments become active at once during query processing. In that type of environment, the system is prone to bottlenecks and the unpredictable behavior that results from being unable to access resources the system believes it has been allocated.

With this in mind, as long as the system meets the requirements set forth in the installation guide, Greenplum is supported on virtual infrastructure.

Choosing the Number of Segment Instances to Run per VM

The recommended hardware specifications are quite large and may be hard to achieve in a virtual environment. In these cases, each VM should have no more than 1 primary and 1 mirror segment for every 2 CPUs, 32GB of RAM, and 300MB/s of sequential read and write bandwidth. Thus a VM with 4 CPUs, 64GB RAM, and 1GB/s sequential read and write would be able to host 2 primary segment instances and 2 mirror segment instances, as in the sketch below.
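A minimal sketch of applying that sizing rule to the example VM (the input values are hypothetical):

    # Primary+mirror pairs a VM can host: the most constrained of
    # 2 CPUs, 32GB RAM, and 300MB/s sequential I/O per pair.
    CPUS=4; RAM_GB=64; SEQ_MBPS=1024
    by_cpu=$(( CPUS / 2 )); by_ram=$(( RAM_GB / 32 )); by_io=$(( SEQ_MBPS / 300 ))
    printf '%s\n' "$by_cpu" "$by_ram" "$by_io" | sort -n | head -1   # prints 2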

While it is possible to create segment-host VMs that host only a single primary segment instance, it is preferred to have at least two primary segment instances per VM. Certain queries that perform tasks such as looking for uniqueness can cause some segment instances to perform more work, and require more resources, than other instances. Grouping multiple segment instances together on one server can mitigate some of these increased resource needs by allowing a segment instance to utilize the resources allocated to the other segment instances.

VM Environment Settings

VMs hosting Greenplum Database should not have any auto-migration features turned on. The segment instances are expected to run in parallel, and if one of them is paused to coalesce memory or state for migration, the system can see it as a failure or outage. It is better to take the system down, remove it from the active cluster, and then introduce it back into the cluster once it has been moved.

Special care should be given to understanding the topology of primary and mirror segment instances. No set of VMs containing a primary and its mirror should run on the same host system. If a host containing both the primary and mirror for a segment fails, the Greenplum cluster will be offline until at least one of them is restored to complete the database content.


Additional Helpful Tools

Yum Repository

Configuring a YUM repository on the master servers can make management of the software across the cluster more efficient, particularly in cases where the segment nodes do not have external internet access. More than one repository can make management easier; for example, one repository for OS files and another for all other packages. Configure the repositories on both the master and standby master servers, as in the sketch below.
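A minimal sketch of serving a local repository from the master over HTTP; the paths, repo ID, and web root are assumptions:

    # On the master: build repository metadata for a directory of RPMs.
    mkdir -p /var/www/html/repos/os
    createrepo /var/www/html/repos/os

    # /etc/yum.repos.d/local-os.repo on each node (repo ID and URL assumed):
    [local-os]
    name=Local OS packages served from mdw
    baseurl=http://mdw/repos/os
    enabled=1
    gpgcheck=0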

Kickstart Images

Kickstart images for the master servers and segment hosts can speed up implementation of new servers and recovery of failed nodes. In most cases where there is a node failure but the disks are good, reimaging is not necessary, because the disks in the failed server can be transferred to the replacement node.

© Copyright Pivotal Software Inc, 2013-2016