Going to Light Speed with DataWarp!
An Administrator's Perspective
Tina Declerck and Dave Paul
CUG 2016 – May 10, 2016
Hardware Description
• Hardware
  – 144 nodes
  – 2 SSDs per node
    • 4 devices (NVMe)
    • Intel P3608
  – Ability to increase endurance (DWPD)
    • Decreases available space
    • NERSC configured with 10 DWPD (default is 3 DWPD)
DataWarp configuration
• Uses DVS to project to compute nodes
  – Each DW node is a DVS server
  – Limits access to GPFS in CLE 5.2
• DW scheduler daemon
  – Runs on sdb
• RESTful API (gunicorn)
  – Runs on mom/login node
  – Uses nginx as the HTTP server
User access
• Assigned 2 ways
  – Per job
  – Persistent
• #DW directives in job scripts
  – Private mode
  – Striped
  – Type: currently only scratch supported
  – How much space needed
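The directive options listed above can be sketched as a small helper that assembles one job-script line. The `jobdw` keyword and the `capacity`, `access_mode`, and `type` spellings are recalled from Cray's DataWarp user documentation rather than taken from these slides, so treat them as assumptions and check them against the installed release. The function name `jobdw_directive` is hypothetical.

```python
# Minimal sketch: build a per-job "#DW" directive line.
# Keyword spellings (jobdw, capacity, access_mode, type) are assumptions
# recalled from Cray DataWarp documentation, not from the slides.

ACCESS_MODES = ("striped", "private")
TYPES = ("scratch",)  # the slides state that only scratch is currently supported


def jobdw_directive(capacity: str, access_mode: str = "striped",
                    dw_type: str = "scratch") -> str:
    """Return a single #DW line requesting per-job DataWarp space."""
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unsupported access mode: {access_mode!r}")
    if dw_type not in TYPES:
        raise ValueError(f"unsupported type: {dw_type!r}")
    return f"#DW jobdw capacity={capacity} access_mode={access_mode} type={dw_type}"


if __name__ == "__main__":
    print(jobdw_directive("1TiB"))
    print(jobdw_directive("200GiB", access_mode="private"))
```

The helper only formats text; the workload manager is what interprets the line when the job script is submitted.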
Pools & Granularity
• Pools define a set of DataWarp nodes with a specific configuration
• DataWarp supports multiple pools
  – Native SLURM does NOT
• Granularity is configured at the node and pool levels
  – Pool granularity defines the smallest unit that can be allocated per node
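As a rough illustration of pool granularity, the sketch below rounds a request up to a whole number of granules. It assumes that allocations are always whole multiples of the pool granularity and uses the `Granularity=218016M` value from the `scontrol show burst` listing in this deck. The function name `allocated_mib` is hypothetical.

```python
# Minimal sketch: round a capacity request up to whole pool granules.
# Assumption: allocations are whole multiples of the pool granularity.

GRANULARITY_MIB = 218016  # Granularity=218016M in the scontrol listing


def allocated_mib(requested_mib: int, granularity_mib: int = GRANULARITY_MIB) -> int:
    """Return the space actually allocated for a request, in MiB."""
    if requested_mib <= 0:
        raise ValueError("request must be positive")
    granules = -(-requested_mib // granularity_mib)  # integer ceiling division
    return granules * granularity_mib


if __name__ == "__main__":
    one_tib = 1024 * 1024  # 1 TiB expressed in MiB
    print(allocated_mib(one_tib))        # 5 granules
    print(allocated_mib(30 * one_tib))   # 145 granules
```

Under this assumption a 1 TiB request becomes 1090080 MiB and a 30 TiB request becomes 31612320 MiB, which are the sizes shown for `u1_bb1` and `u5_30TB` in the scontrol listing. What those users actually requested is not stated in the slides.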
Sessions, Instances, and Fragments
• Session
  – Equates to a job ID
• Instance
  – DataWarp space allocated to a job, or persistent over many jobs
• Fragment
  – Portions of the instance on each node allocated to it
But wait, there’s more…
• Configuration
  – Defines how a DW instance is used
• Namespace
  – A configuration can have 0 or more namespaces
  – Basically a directory or folder in a scratch configuration
We’re not done yet…
• Registration
  – Binds a session with a configuration
  – Maintains information for stage-in/stage-out
• Activation
  – An available instance configuration on a set of nodes
Putting it all together
[Figure: one POOL serving three allocations, labeled "96 node job, DW striped, Type=scratch", "8 node job, Type=private", and "Type=scratch, persistent, striped".]
General problem solving - dwstat

 sess state token    creator owner created             expiration nodes
 2520 CA--- myBBname CLI     33333 2016-02-19T13:45:33 never          0
      CA--- u1_bb1   CLI     11111 2016-03-02T15:01:01 never          0
 6185 CA--- 2128492  SLURM   55555 2016-05-09T07:13:58 never         96

 inst state sess     bytes nodes created             expiration intact label    public confs
 2234 CA--- 2520 212.91GiB     1 2016-02-19T13:45:33 never      true   myBBname true       1
      CA---        1.04TiB     5 2016-03-02T15:01:02 never      true   u1_bb1   true       1
 5534 CA--- 6185   1.87TiB     9 2016-05-09T07:13:58 never      true   I6185-0  false      1

 conf state inst type    access_type activs
 2505 CA--- 2234 scratch stripe           0
      CA---      scratch stripe           0
 5811 CA--- 5534 scratch stripe           1

  reg state sess conf wait
 5877 CA--- 6114 5764 true
 5890 CA--- 6131 5773 true
 5943 CA--- 6185 5811 true

 activ state sess conf nodes mount
  5732 CA--- 6185 5811    96 /var/opt/cray/dws/mounts/batch/2128492/ss

   frag state inst  capacity gran node
  61382 CA--  2234 212.91GiB 4MiB nid00457
        CA--       212.91GiB 4MiB nid02249
  73697 CA--       212.91GiB 4MiB nid00205
  73698 CA--       212.91GiB 4MiB nid01801
  73699 CA--       212.91GiB 4MiB nid00014
  73700 CA--       212.91GiB 4MiB nid01169
 165142 CA--  5487 425.81GiB 4MiB nid01418

    ns state conf   frag span
 49200 CA--  2505  61382    1
 52607 CA--                 5
 59484 CA--  5764 165142  129

• States
  – Goal: C = create, D = destroy
  – Setup: A = actualized, - = non-actualized
  – Condition: F = fuse blown, - = fuse intact
  – Status: T = transitioning, - = stable or blocked
  – Spectrum: M = mixed, - = not delayed
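The state legend above can be turned into a small decoder for the five-character state strings such as `CA---`. This is a sketch that assumes the characters appear in the order Goal, Setup, Condition, Status, Spectrum, as listed on the slide. The four-character states shown for fragments and namespaces are deliberately rejected because the slides do not say which position is absent. The names `STATE_FIELDS` and `decode_state` are hypothetical.

```python
# Minimal sketch: decode a five-character dwstat state string.
# Assumption: positions follow the slide order
# Goal, Setup, Condition, Status, Spectrum.

STATE_FIELDS = [
    ("goal", {"C": "create", "D": "destroy"}),
    ("setup", {"A": "actualized", "-": "non-actualized"}),
    ("condition", {"F": "fuse blown", "-": "fuse intact"}),
    ("status", {"T": "transitioning", "-": "stable or blocked"}),
    ("spectrum", {"M": "mixed", "-": "not delayed"}),
]


def decode_state(state: str) -> dict:
    """Map each character of a state string to its meaning from the legend."""
    if len(state) != len(STATE_FIELDS):
        raise ValueError(f"expected {len(STATE_FIELDS)} characters, got {state!r}")
    decoded = {}
    for char, (name, meanings) in zip(state, STATE_FIELDS):
        if char not in meanings:
            raise ValueError(f"unexpected {char!r} in {name} position")
        decoded[name] = meanings[char]
    return decoded


if __name__ == "__main__":
    for field, meaning in decode_state("CA---").items():
        print(f"{field}: {meaning}")
```

A healthy object reads as create, actualized, fuse intact, stable or blocked, not delayed. Anything with `D`, `F`, `T`, or `M` is worth a closer look during troubleshooting.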
scontrol show burst

Name=cray DefaultPool=wlm_pool Granularity=218016M TotalSpace=872936064M UsedSpace=234803232M
  StageInTimeout=86400 StageOutTimeout=86400 Flags=EnablePersistent,TeardownFailure
  GetSysState=/opt/cray/dw_wlm/default/bin/dw_wlm_cli
  Allocated Buffers:
    Name=u1_bb1 CreateTime=2016-03-02T15:01:01 Size=1090080M State=allocated UserID=user1(11111)
    Name=u2_space CreateTime=2016-05-09T11:00:43 Size=1090080M State=allocated UserID=user2(22222)
    Name=myBBname CreateTime=2016-02-19T13:45:33 Size=218016M State=allocated UserID=user3(33333)
    Name=u4_Test2 CreateTime=2016-05-05T18:31:36 Size=654048M State=allocated UserID=user4(44444)
    Name=u4_Test CreateTime=2016-05-05T16:01:02 Size=654048M State=allocated UserID=user4(44444)
    Name=u5_30TB CreateTime=2016-05-05T14:31:08 Size=31612320M State=allocated UserID=user5(55555)
  Per User Buffer Use:
    UserID=user1(11111) Used=1090080M
    UserID=user2(22222) Used=1090080M
    UserID=user3(33333) Used=218016M
    UserID=user4(44444) Used=1962144M
    UserID=user5(55555) Used=31612320M
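Output in this `Key=Value` form is easy to post-process. The sketch below parses a few of the buffer lines above and totals the named buffers per user. It assumes one buffer per line and that the `M` suffix means MiB (218016 MiB is about 212.91 GiB, which matches the fragment capacity in the dwstat listing). The names `parse_buffers` and `usage_by_user` are hypothetical, and the sample holds only three of the listed buffers.

```python
# Minimal sketch: parse "Key=Value" buffer lines from scontrol show burst.
# Assumptions: one buffer per line, sizes carry an "M" (MiB) suffix.
import re
from collections import defaultdict

GRANULARITY_MIB = 218016  # Granularity=218016M in the listing

SAMPLE = """\
Name=u1_bb1 CreateTime=2016-03-02T15:01:01 Size=1090080M State=allocated UserID=user1(11111)
Name=u4_Test2 CreateTime=2016-05-05T18:31:36 Size=654048M State=allocated UserID=user4(44444)
Name=u4_Test CreateTime=2016-05-05T16:01:02 Size=654048M State=allocated UserID=user4(44444)
"""


def parse_buffers(text: str) -> list:
    """Return one dict per buffer line, with the size as an int in MiB."""
    buffers = []
    for line in text.splitlines():
        fields = dict(re.findall(r"(\w+)=(\S+)", line))
        if {"Name", "Size", "UserID"} <= fields.keys():
            fields["SizeMiB"] = int(fields["Size"].rstrip("M"))
            buffers.append(fields)
    return buffers


def usage_by_user(buffers: list) -> dict:
    """Sum the size of the named buffers for each user, in MiB."""
    totals = defaultdict(int)
    for buf in buffers:
        totals[buf["UserID"]] += buf["SizeMiB"]
    return dict(totals)


if __name__ == "__main__":
    parsed = parse_buffers(SAMPLE)
    for buf in parsed:
        print(buf["Name"], buf["SizeMiB"] // GRANULARITY_MIB, "granules")
    print(usage_by_user(parsed))
```

Dividing each size by the granularity shows how many granules a buffer occupies, which is a quick consistency check against the node counts reported by dwstat.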
Job hung with processes in ‘D’ state
• Node stuck completing (most likely admindown if using ALPS)
  – With SLURM, log in to the node to see what the problem is
  – Process hung in ‘D’ state on a DW instance
  – Get the job information and look at:
    • ‘dwstat sessions’ to find the session id
    • ‘dwstat instance’ to find the instance id
    • ‘dwstat fragments | grep <instance id>’
  – Find the MDS node
    • Drain the node and reboot to clear the issue
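The lookup chain above (session, then instance, then fragments) can be sketched over rows copied from the dwstat listing earlier in the deck. The sketch works on already-parsed rows and does not call dwstat itself. It assumes the session token carries the job ID or persistent name, as in the sample output. It only narrows the search to the DataWarp nodes holding fragments; the columns shown do not say which of them is the MDS node. The function name `nodes_for_token` is hypothetical.

```python
# Minimal sketch: follow token -> session -> instance -> fragment nodes.
# Rows are copied from the dwstat listing in this deck (complete rows only).

SESSIONS = [
    {"sess": 2520, "token": "myBBname"},
    {"sess": 6185, "token": "2128492"},
]
INSTANCES = [
    {"inst": 2234, "sess": 2520},
    {"inst": 5534, "sess": 6185},
]
FRAGMENTS = [
    {"frag": 61382, "inst": 2234, "node": "nid00457"},
    {"frag": 165142, "inst": 5487, "node": "nid01418"},
]


def nodes_for_token(token, sessions, instances, fragments):
    """Return the sorted DataWarp nodes holding fragments for a session token."""
    sess_ids = {row["sess"] for row in sessions if row["token"] == token}
    inst_ids = {row["inst"] for row in instances if row["sess"] in sess_ids}
    return sorted(row["node"] for row in fragments if row["inst"] in inst_ids)


if __name__ == "__main__":
    print(nodes_for_token("myBBname", SESSIONS, INSTANCES, FRAGMENTS))
```

For job token `2128492` the result is empty here only because the fragment excerpt in the slides lists no rows for instance 5534.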
DW server crash
• dwstat shows a ‘D’estroy indicator that doesn’t clear
• “scontrol show burst” (SLURM) where “allocation” size=0 or state=teardown
• Once the DW server is rebooted, most recovery issues are handled by the DWS software without need for further intervention
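The two scontrol symptoms above can be checked mechanically. The sketch below flags buffers whose size is zero or whose state is `teardown` in `Key=Value` lines like those printed by `scontrol show burst`. The two flagged sample lines (`hypo_a`, `hypo_b`) are invented for illustration and are not real output, and the function name `suspect_buffers` is hypothetical.

```python
# Minimal sketch: flag buffers with Size=0 or State=teardown.
# The hypo_a and hypo_b lines are made-up examples, not real scontrol output.
import re

SAMPLE = """\
Name=u1_bb1 CreateTime=2016-03-02T15:01:01 Size=1090080M State=allocated UserID=user1(11111)
Name=hypo_a CreateTime=2016-05-09T11:00:43 Size=0 State=allocated UserID=user2(22222)
Name=hypo_b CreateTime=2016-05-05T18:31:36 Size=654048M State=teardown UserID=user4(44444)
"""


def suspect_buffers(text: str) -> list:
    """Return the names of buffers that match either crash symptom."""
    suspects = []
    for line in text.splitlines():
        fields = dict(re.findall(r"(\w+)=(\S+)", line))
        if not {"Name", "Size", "State"} <= fields.keys():
            continue  # not a buffer line
        size = fields["Size"].rstrip("M")
        is_zero = size.isdigit() and int(size) == 0
        if is_zero or fields["State"] == "teardown":
            suspects.append(fields["Name"])
    return suspects


if __name__ == "__main__":
    print(suspect_buffers(SAMPLE))
```

A check like this could run from cron on a login node, but any match still needs the manual follow-up described on these slides.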
Problem w/ size=0
• Silent problem
• Registration stuck in ‘D’ state and either T or M
• dwcli rm activation, then wait to see if that clears the issue
• dwcli update registration --id <num> --no-wait
  – Can cause data loss if all data isn’t staged out
Log Files
• SMW: log per DW node + log for sdb
  – /var/opt/cray/log/p0-current/dws
  – /var/opt/cray/log/p0-current/console & message
    • Grep for dw and xfs to see information
• On mom/login nodes
  – /var/log/nginx
Important Notes
• To create or destroy a persistent instance, a compute node must be allocated
• Existing issues
  – Symbolic links don’t work
  – If there is an empty directory in the stage-in directory, the stage-in will fail
• If max writes/day is reached, the node will be set to read-only (ro)
• Check status of an SSD with xtcheckssd
This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under contract No. DE-AC02-05CH11231.
National Energy Research Scientific Computing Center