Introducing Gardner - Center for Research Informatics
Center for Research Informatics

• Established in 2011 to support BSD research
• Mission:
  – To provide informatics resources and services to the BSD, to participate in clinical and biomedical research of the highest scientific merit, and to support and promote research and education in the field of informatics
Resources and Services

• Clinical data for research
• Bioinformatics data analysis
• Computing infrastructure
  – Storage
  – HPC
  – Virtual servers
• Research data management tools
• Custom-built applications
• Educational opportunities
http://cri.uchicago.edu
CRI Infrastructure Team

• Director
  – Thorbjorn Axelsson
• High Performance Computing
  – Mike Jarsulic
  – Tony Aburaad
• Virtual Servers
  – Andy Brook
  – Sneha Jha
• Storage
  – Olumide Kehinde
• Utility Infielder
  – Dan Sullivan
About Me

• Lived in Pittsburgh for about 32 years
• Attended the University of Pittsburgh (at Johnstown)
• Bettis Atomic Power Laboratory (2004-2012)
  – Scientific Programmer (Thermal/Hydraulic Design)
  – Analyst - USS Gerald R. Ford
  – High Performance Computing
• University of Chicago (2012 - present)
About Tony

• Masters student in computer science at UChicago
  – Completing coursework in machine learning, distributed computing, and iOS
• Spent last summer at the Computation Institute working on a caching tool for the Open Science Grid
• Has been helping with Gardner at the CRI since November
• Dislikes mimes
CRI HPC Clusters - September 2012

• Prudential Data Center
  – BRDF CLUSTER
  – IBI CLUSTER
  – IBIBMEM
• Kenwood Data Center
  – BIO Cluster
Tarbell

• Purchased for the CRI in 2012 by the previous staff
• Dell cluster utilizing AMD Bulldozer processors
• InfiniBand QDR
• 110 TB scratch space
• Why named Tarbell?
Who was Harlan Tarbell?

• Born in Delavan, IL
• Grew up in Groveland, IL
• Magician
• Doctor of Naprapathy
• Futurist

Themes:
• Beginner mistakes
• Predicting the future
• Quackery
Beginner Mistakes

• Scratch space
  – Set up poorly, to the point that the system would become unstable
  – Utilized only 60 TB of space initially
  – Hardware had low RAM (24 GB per node)
• Login node
  – Only one (fixed)
Predicting the Future

• Compute nodes
  – Only one tier of memory (fixed)
• InfiniBand
  – Expecting QDR to stick around forever
  – A poor strategy for future clusters
Tarbell Metrics

• Since December 2013
  – 234 users
  – Total user jobs: 4.6 million
  – Total CPU hours: 18.29 million
  – Average queue time: 2.94 hours
  – Average job efficiency: 65%
  – Average wall clock accuracy: 11%
Who was Martin Gardner?

• Graduate of the University of Chicago
• Yeoman on the USS Pope during WWII
• Amateur magician
• Mathematical Games
• Skepticism
• Literature
• Art
Mathematical Games

• Flexagons
• Polyominoes
• Game of Life
• Newcomb's Paradox
• Mandelbrot's fractals
• Penrose tiling
• Public key cryptography
• Best bet for simpletons paradox
Skepticism

• One of the original founders of CSICOP
• Critic of:
  – Lysenkoism
  – Homeopathy
  – Chiropractic
  – Naturopathy
  – Orgone chambers
  – Dianetics
Node Count Comparison

Node Type                 Tarbell   Gardner
Standard Compute Nodes    34        88
Mid-Tier Compute Nodes    0         28
Large Memory Nodes        2         4
GPU Nodes                 0         5
Xeon Phi Nodes            0         1
Interactive Nodes         2         2 (eventually 4)
Remote Viz Nodes          0         Possibly 2
Core Count Comparison

Node Type                 Tarbell   Gardner
Standard Compute Nodes    2176      2464
Mid-Tier Compute Nodes    0         784
Large Memory Nodes        80        112
GPU Nodes                 0         140
Xeon Phi Nodes            0         28
TOTAL                     2256      3528
Standard Node Comparison

Attribute                Tarbell            Gardner
Processor                AMD Opteron 6274   Intel Haswell E5-2683 v3
Clock Speed              2.2 GHz            2.0 GHz
Processors per Node      4                  2
Cores per Processor      16                 14
Instructions per Cycle   8 (or 4)           16
RAM per Core             4 GB               4.5 GB
Mid-Tier Compute Nodes

Attribute                Gardner
Processor                Intel Haswell E5-2683 v3
Clock Speed              2.0 GHz
Processors per Node      2
Cores per Processor      14
Instructions per Cycle   16
RAM per Core             16 GB
Large Memory Node Comparison

Attribute                Tarbell                  Gardner
Processor                Intel Westmere E7-4860   Intel Haswell E5-2683 v3
Clock Speed              2.27 GHz                 2.0 GHz
Processors per Node      4                        2
Cores per Processor      10                       14
Instructions per Cycle   8                        16
RAM per Core             25.6 GB                  45.7 GB
GPGPU Nodes

CPU Attribute            Gardner
Processor                Intel Haswell E5-2683 v3
Clock Speed              2.0 GHz
Processors per Node      2
Cores per Processor      14
Instructions per Cycle   16
RAM per Core             8 GB

Accelerator              Nvidia Tesla K80
GPU                      Tesla GK210 (x2)
Cores per GPU            2496
RAM per Accelerator      24 GB
Xeon Phi Nodes

CPU Attribute            Gardner
Processor                Intel Haswell E5-2683 v3
Clock Speed              2.0 GHz
Processors per Node      2
Cores per Processor      14
Instructions per Cycle   16
RAM per Core             8 GB

Accelerator              Intel Xeon Phi 5110P (x2)
Cores per Accelerator    60
RAM per Accelerator      8 GB
Scratch Space Comparison

Attribute                Tarbell                Gardner
Processor                Intel Westmere E5620   Intel Haswell E5-2623 v3
Clock Speed              2.4 GHz                3.0 GHz
Processors per Node      2                      2
Cores per Processor      4                      4
Instructions per Cycle   8                      16
RAM per Node             24 GB                  64 GB
Cache Pool               N/A                    200 GB
Usable Space             110 TB                 350 TB
Interconnect Bandwidth   40 Gb/s                56 Gb/s
Benchmarking

Attribute                          Tarbell        Gardner
Theoretical Performance            44.2 TFLOPs    112.8 TFLOPs
Actual Performance                 21.2 TFLOPs    97 TFLOPs
GPU Theoretical Performance        N/A            14.5 TFLOPs
GPU Actual Performance             N/A            11.4 TFLOPs
Xeon Phi Theoretical Performance   N/A            2 TFLOPs
Xeon Phi Actual Performance        N/A            1.7 TFLOPs
FLOPs = Nodes * Cores per Node * Frequency * Operations per Cycle
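For example, plugging Gardner's CPU totals into this formula (3528 total cores from the core count table, 2.0 GHz, 16 operations per cycle): 3528 * 2.0x10^9 * 16 ≈ 112.9 TFLOPs, which lines up with the 112.8 TFLOPs theoretical figure above.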
Software

• Compilers
  – Intel
  – PGI
  – GNU
  – Java 7 and 8
  – DLang
• MPI
  – OpenMPI
  – MPICH
  – Intel MPI
• Software Environment
  – Lmod
• Scheduler
  – Moab 9.1
• Resource Manager
  – Torque 6.1
What is Going to Happen To?

• Tarbell
  – Decommissioned: 3/31/17
• LMEM-CRI
  – Decommissioned
• Stats
  – Repurposed
  – X-enabled login nodes for the cluster
  – Commercial software: SAS, Stata, MATLAB, etc.
• Galaxy
  – Decommissioned with Tarbell
Obtaining an Account

• Prerequisite: BSD account
• Sign up for an account
  – http://cri.uchicago.edu
  – Early Access
• Email address for job output
• Emergency phone number
• Software requests
• Level of experience
  – Collaborator accounts
Being a Good HPC Citizen

1. Do not run analysis on the login nodes!
2. Cite the cluster and the software used in your publications.
3. Try to be accurate with your resource requests.
4. Allow the CRI to install open source software for you.
5. If you are going to run an analysis that is much larger than normal, let us know in advance.
Being a Good HPC Citizen

6. Provide feedback.
7. Clean up your scratch storage.
8. If using a script to submit, sleep for a few seconds in between each submission (see the sketch after this list).
9. Be sure to release memory in your scripts.
10. If you have a question, don't hesitate to ask us.
11. If you notice a problem, report it.
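A minimal sketch of guideline 8 in bash, assuming job scripts named job_*.pbs (the file pattern and sleep length are illustrative):

  # Submit each script, pausing between submissions so the
  # scheduler is not flooded with requests
  for script in job_*.pbs; do
      qsub "$script"
      sleep 5
  done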
Citations

• The continued growth and support of the CRI's HPC program is dependent on demonstrable value.
• Citing the cluster allows us to justify purchasing faster clusters with more capacity in the future.
• Sample citation:
  – This work utilized the computational resources of the Center for Research Informatics' Gardner HPC cluster at the University of Chicago (http://cri.uchicago.edu).
• Make sure you cite the software used as well!
Software Installation

• Software requests can be submitted via the Resource Request forms at http://cri.uchicago.edu
• Advantages to allowing the CRI to install open source software:
  – Other users can utilize it
  – Avoids support nightmares
  – Portability
• Disadvantages
  – It may take a few days (let us know the priority)
How to Get Support

• Call the CRI Help Desk
  – 773-834-8475
• Email [email protected] to submit a ticket, or use the Request Forms on the CRI website
• Meet with Mike at our Peck office (N161)
  – Tuesday and Thursday afternoons
  – Schedule an appointment
• User group meetings
  – Once a month at Peck
Examples

• Get an account
  – Resource Request Form
• Have software installed
  – Resource Request Form
• Job extension
  – Email [email protected]
  – CC: Mike ([email protected])
• Major problem on the cluster
  – Call the Help Desk
  – Email [email protected]
Logging In

• On campus
  – ssh to gardner.cri.uchicago.edu
• Off campus
  – VPN
    • CVPN (CNET account required)
    • BSD VPN
  – ssh to gardner.cri.uchicago.edu
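For example, assuming a BSD username of jdoe (substitute your own):

  # From on campus, or from off campus once the VPN is connected
  ssh [email protected]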
Storage

• Home directories (/home/<userid>)
  – Permanent, private, quota'd, not backed up
  – 1 Gb/s
• Lab shares (/group/<lab_name>)
  – Permanent, shared, quota'd, backed up
  – 1 Gb/s
• Scratch space (/scratch/<userid>)
  – Purged, private, not quota'd, not backed up
  – 56 Gb/s
  – Purged every 6 months (to start)
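Given these characteristics, a common pattern is to stage inputs onto the fast scratch file system, run the job there, and copy results back to backed-up storage. A sketch with illustrative lab and dataset names:

  # Stage inputs to scratch (fast, but purged and not backed up)
  cp -r /group/mylab/dataset /scratch/$USER/dataset
  # ... run the analysis against /scratch/$USER/dataset ...
  # Copy results back to the backed-up lab share
  cp -r /scratch/$USER/dataset/results /group/mylab/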
Software Environment

• Tarbell -> Environment Modules
  – Flat module system
  – Modules written in Tcl
  – Last update: December 2012
• Gardner -> Lmod
  – Hierarchical module system
  – Modules written in Lua
  – Last update: August 2016
Lmod Basics

• See which modules are available to be loaded
  – module avail
• Load packages
  – module load <package1> <package2>
• See which packages are loaded
  – module list
• Unload a package
  – module unload <package>
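A short example session (gcc and openmpi are illustrative names; run module avail to see what Gardner actually provides):

  module avail              # list available modules
  module load gcc openmpi   # load a compiler and an MPI stack
  module list               # confirm what is loaded
  module unload openmpi     # drop just the MPI module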
Scheduling Jobs (Defaults)

• Maximum walltime
  – 14 days
• Maximum number of processors
  – 500 concurrent
• Maximum number of jobs
  – 500 concurrent
• Maximum memory
  – 2 TB
Job Scheduling (Queues)

• Route
  – Default queue (non-executable)
• Express
  – 1 node; 1 proc; <= 4 GB RAM; <= 6 hours
• Standard
  – Multi-node; multi-proc; <= 8 GB RAM
Job Scheduling (Queues)

• Mid
  – Multi-node; multi-proc; > 8 GB RAM; <= 24 GB RAM
• High
  – Multi-node; multi-proc; > 24 GB RAM
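Jobs submitted to the default route queue are placed into an execution queue based on their resource requests, but a queue can also be requested explicitly with qsub's -q option (the script name here is illustrative):

  qsub -q express quick_check.pbs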
Torque Client Commands

• Submit a job
  – qsub <scriptname>
• Delete a job
  – qdel <jobid>
• Job status
  – qstat
• Extended job status
  – qstat -f
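A typical round trip with these commands (the script name and job ID are illustrative; qsub prints the actual ID at submission):

  qsub run_analysis.pbs   # prints a job ID such as 12345.gardner
  qstat 12345             # brief status for that job
  qstat -f 12345          # full details: resources, nodes, state
  qdel 12345              # remove the job from the queue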
Torque Directives

• Specify a job name
  – #PBS -N <JobName>
• Specify nodes and cores
  – #PBS -l nodes=x:ppn=y
• Specify wall clock time limit
  – #PBS -l walltime=[dd:[hh:[mm:]]]ss
• Specify the memory limit
  – #PBS -l mem=<x>gb
Torque Directives

• Specify the shell to execute the script
  – #PBS -S <path_to_shell>
• Specify the STDOUT location
  – #PBS -o <path>
• Specify the STDERR location
  – #PBS -e <path>
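Putting these directives together, a minimal job script sketch (the job name, resource numbers, and output paths are illustrative):

  #!/bin/bash
  #PBS -N example_job
  #PBS -l nodes=1:ppn=4
  #PBS -l walltime=02:00:00
  #PBS -l mem=8gb
  #PBS -S /bin/bash
  #PBS -o example_job.out
  #PBS -e example_job.err

  # Torque starts jobs in the home directory; move to the
  # directory the job was submitted from
  cd $PBS_O_WORKDIR
  echo "Running on $(hostname) with $(wc -l < $PBS_NODEFILE) cores"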
qsub Arguments

• Run an interactive job
  – qsub -I
• Submit a job and immediately hold it
  – qsub -h <jobscript>