Interactive Runs for Tests (BG/Q and Theta)
TRANSCRIPT
INTERACTIVE RUNS FOR TESTS (BG/Q AND THETA)
§ Submit an interactive job to the queue, e.g.
  – qsub -I -t 30 -n 512
§ When job "runs", the nodes are allocated, and you get a (new) shell prompt.
§ This shell behaves like the one in a Cobalt script job
  – BG/Q: Just one difference: do "wait-boot" before proceeding
  – Start your compute node run just like in a Cobalt script job.
    • BG/Q: runjob --block $COBALT_PARTNAME --np 512 -p 16 : myprogram.exe
    • Theta: aprun -N 64 -d 1 -j 1 -cc depth myprogram.exe
§ When you exit the shell, the Cobalt job will end
§ Note: When the Cobalt job runs out of time, there is no message.
  – runjob or aprun will fail.
  – Check your job status with "qstat $COBALT_JOBID"
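Because an expired job gives no message, it can help to wrap the launcher so that a failure is followed by a job-status check. A minimal sketch, assuming COBALT_JOBID is set in the interactive shell as above; the helper name run_and_check is hypothetical and not part of Cobalt:

```shell
# run_and_check CMD [ARGS...]
# Run the launcher command; if it fails, print the Cobalt job status so an
# expired allocation is easy to spot. Returns the launcher's exit status.
run_and_check() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "launcher exited with status $status; Cobalt job state:" >&2
        # qstat output is informational only, so its own exit code is ignored
        qstat "$COBALT_JOBID" >&2 || true
    fi
    return "$status"
}
```

Possible use on Theta: run_and_check aprun -N 64 -d 1 -j 1 -cc depth myprogram.exe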
BG/Q LIGHTWEIGHT CORE FILES
§ When run fails, look for core files
  – core.0, core.1, etc.
§ Lightweight core files
  – One for each rank that failed before job teardown
  – Contain stack backtrace in address form
  – Decode to symbolic (useful!) form
§ Environment settings to control core files
  – http://www.alcf.anl.gov/user-guides/core-file-settings
BG/Q LIGHTWEIGHT CORE FILE EXAMPLE
+++PARALLEL TOOLS CONSORTIUM LIGHTWEIGHT COREFILE FORMAT version 1.0
+++LCB 1.0
Program: /gpfs/vesta-home/rloy/src/test/idie
[...]
+++ID Rank: 0, TGID: 1, Core: 0, HWTID:0 TID: 1 State: RUN
***FAULT Encountered unhandled signal 0x00000006 (6) (SIGABRT)
[…]
+++STACK
Frame Address     Saved Link Reg
0000001nffn700    0000000001001848
0000001nffn8c0    00000000010003e8
0000001nffn960    0000000001000438
[...]
---STACK
[…]
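The stack section of a lightweight core file is plain text, so the return addresses can be pulled out with standard tools. A minimal sketch, assuming each stack row holds two whitespace-separated hex fields as in the example above; the function name extract_link_regs is hypothetical:

```shell
# extract_link_regs COREFILE
# Print the "Saved Link Reg" column of every +++STACK ... ---STACK section.
# The column-header row has more than two fields, so it is skipped.
extract_link_regs() {
    awk '
        index($0, "+++STACK") == 1 { in_stack = 1; next }
        index($0, "---STACK") == 1 { in_stack = 0 }
        in_stack && NF == 2        { print $2 }
    ' "$1"
}
```

The resulting addresses could then be piped to an addr2line that understands the BG/Q executable (for example: extract_link_regs core.0 | addr2line -f -e myprogram.exe).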
BG/Q: DECODING LIGHTWEIGHT CORE FILES
§ bgq_stack [optional_exename] [corefile]
+++ID Rank: 0, TGID: 1, Core: 0, HWTID:0 TID: 1 State: RUN
0000000001001848
abort
/bgsys/drivers/V1R2M2/ppc64/toolchain/gnu/glibc-2.12.2/stdlib/abort.c:77
00000000010003e8
barfunc
/gpfs/vesta-home/rloy/src/test/idie.c:6
0000000001000438
foofunc
/gpfs/vesta-home/rloy/src/test/idie.c:12
0000000001000498
main
/gpfs/vesta-home/rloy/src/test/idie.c:19
[...]
BG/Q: COREPROCESSOR
§ Useful when you have a large set of core files
  – Shows symbolic backtrace
  – Groups ranks that aborted in the same location together
  – Can also attach to a running job to take snapshot
§ Location
  – coreprocessor.pl is in your default PATH
    • Attaching to running job does not require administrator
    • coreprocessor -nogui -snapshot=<filename> -j=<jobid>
      – Use the back-end (ibm.runjob) jobid from the .error file, not the Cobalt jobid
§ Scalability limit
  – Absolute maximum 32K ranks. Practical limit lower.
§ Instructions:
  – BG/Q Application Developer Redbook
    • http://www.redbooks.ibm.com/redpieces/abstracts/sg247948.html
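The grouping idea can be illustrated without the tool itself. The sketch below is not how coreprocessor is implemented; it only assumes the lightweight core file layout shown earlier (two-field rows between +++STACK and ---STACK), and the function names are hypothetical. It builds one signature per core file from the saved link registers and counts identical signatures:

```shell
# stack_signature COREFILE
# Join all saved link registers of a core file into one ">"-separated string.
stack_signature() {
    awk '
        index($0, "+++STACK") == 1 { in_stack = 1; next }
        index($0, "---STACK") == 1 { in_stack = 0 }
        in_stack && NF == 2        { sig = sig (sig == "" ? "" : ">") $2 }
        END                        { print sig }
    ' "$1"
}

# group_cores DIR
# Print "count signature" lines for DIR/core.*, most common signature first.
group_cores() {
    for f in "$1"/core.*; do
        [ -f "$f" ] || continue      # skip the unexpanded glob if no files
        stack_signature "$f"
    done | sort | uniq -c | sort -rn
}
```

Running group_cores . in the run directory gives a quick view of how many ranks failed with the same address-level backtrace; the addresses still need decoding to be readable.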
COREPROCESSOR WINDOW
BG/Q: GDB
§ A single gdb client can connect to a single rank of your job
§ BG/Q Limitations
  – Each instance of gdb client counts as a “debug tool”
  – Only 4 tools may be connected to a job
    • At most 4 ranks can be examined
§ Start a debug session using qsub -I (interactive job)
  – qsub -I -q default -t 30 -n 64
  – See Redbook for more info on starting gdb with runjob
§ gdb can also load a compute-node binary core file
  – Use extreme caution when generating binary core files
§ Generally a parallel debugger (e.g. DDT) will be more useful
THETA
§ Will come back to DDT on BG/Q later
THETA: ATP
§ ATP = Abnormal Termination Processing
  – generates a STAT format merged stack backtrace (file atpMergedBT.dot)
  – view the backtrace file with stat-view
§ Link your app with ATP
  – Before linking, make sure the "atp" module is loaded (check using module list)
  – Cray compiler will link in ATP automatically
  – Intel compiler needs a work-around for now:
    • -Wl,-T/opt/cray/pe/cce/8.5.2/craylibs/x86-64/2.23.1.cce.ld
STAT-VIEW
THETA: STAT
§ While program is running (e.g. deadlocked), you can generate a merged backtrace snapshot showing where your program is.
§ module load stat
§ On the MOM node, invoke "stat-cl pid" where pid is the aprun pid
§ Method 1:
  – In your job script, run "hostname" to output the MOM node's hostname
  – During the run, read the MOM hostname from your output file, ssh to that hostname, use ps to determine the pid of your aprun, then invoke stat-cl on that pid
§ Method 2:
  – Use the example job script in /soft/debuggers/stat/job-stat.sh
    • Modify the aprun command to run what you need
  – When the job is running, run the command "touch STAT_NOW". The script will check for this file's existence every 60 seconds. If it sees the file, it will generate a STAT snapshot. You can create multiple snapshots.
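The trigger-file mechanism in Method 2 can be sketched as a small polling loop. This is not the contents of /soft/debuggers/stat/job-stat.sh, only an illustration of the idea; the function name watch_trigger and its argument order are hypothetical:

```shell
# watch_trigger PID TRIGGER INTERVAL CMD [ARGS...]
# While process PID is alive, look for file TRIGGER every INTERVAL seconds.
# When the file appears, remove it and run: CMD [ARGS...] PID
# The body runs in a subshell so its variables do not leak into the caller.
watch_trigger() (
    pid=$1
    trigger=$2
    interval=$3
    shift 3
    while kill -0 "$pid" 2>/dev/null; do
        if [ -e "$trigger" ]; then
            rm -f "$trigger"         # allow further snapshots later
            "$@" "$pid"
        fi
        sleep "$interval"
    done
)

# Hypothetical use inside a Cobalt job script (runs on the MOM node):
#   module load stat
#   aprun ... myprogram.exe &
#   watch_trigger $! STAT_NOW 60 stat-cl
#   wait
```

Removing the trigger file after each snapshot is what lets a later "touch STAT_NOW" request another one.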
LGDB
§ lgdb connects a gdb to each rank and provides a text interface
§ module load cray-lgdb
§ Modify your script job.sh to mark your aprun:
  #cray_debug_start
  aprun -n 1 -N 1 -d 1 -j 1 a.out
  #cray_debug_end
§ lgdb
  – launch $a(8) --qsub=job.sh a.out
    • Submits job.sh to run 8 ranks, your executable is a.out
§ Useful commands
  – backtrace (bt), continue (cont), break, print
  – See "man lgdb"
ALLINEA DDT
§ BG/Q, Theta, Cooley
  – MAP available on Theta (not supported on BG/Q)
§ Environment
  – BG/Q: softenv key “+ddt”
  – Theta: module load forge/7.0 (/soft/environment/modules/modulefiles)
§ Compiling your code
  – Compile -g -O0
  – Note: XL compiler option -qsmp=omp also turns on optimization within OMP constructs. To override, use "noopt", e.g.
    • -qsmp=omp:noauto:noopt
§ More details:
  – http://www.alcf.anl.gov/user-guides/allinea-ddt
ALLINEA DDT STARTUP (BG AND THETA)
§ Run using remote client (RECOMMENDED)
  – Download and install Mac or Windows "Remote client" from http://www.allinea.com/products/download-allinea-ddt-and-allinea-map
  – Optional: use ssh master mode so you only need login once per session
    • Note: supported on Mac OS/X; not supported in Windows <= XP (? for > XP)
    • ~/.ssh/config
      – ControlMaster auto
      – ControlPath ~/.ssh/master-%r@%h:%p
§ Run from login node
  – Need X11 server on your laptop and ssh -X forwarding
  – Run ddt and let it submit job through GUI
DDT REMOTE CLIENT (0) GUI LOOKS JUST LIKE THE X11 CLIENT
DDT REMOTE CLIENT (1) SELECT "CONFIGURE" TO ADD A NEW REMOTE HOST
DDT REMOTE CLIENT (2) NOTE: THIS REMOTE INSTALLATION DIRECTORY IS THE DEFAULT VERSION OF DDT, CORRESPONDING TO +DDT OR MODULE CLICK "TEST REMOTE LAUNCH" TO VERIFY
DDT REMOTE CLIENT (3) NOW THAT IT IS DEFINED, SELECT REMOTE MACHINE
DDT (4) CONNECTED (NOTE LICENSE INFO IN LOWER LEFT CORNER) FROM THIS POINT, REMOTE GUI WORKS SAME AS LOCAL
DDT STARTUP – REVERSE CONNECT (BG, THETA)
§ Start remote client and connect to login node (or start X11 client on login node)
§ In an ssh session to the login node
  – Run an interactive job (qsub -I)
    • BG/Q: Instead of runjob
      – ddt --connect --mpiargs="--block $COBALT_PARTNAME" --processes=8 --procs-per-node=16 myprog.exe
    • Theta: Instead of aprun ... myprog.exe
      – /soft/debuggers/forge/bin/ddt --connect aprun ... myprog.exe
§ Likewise with Allinea MAP
  – Theta: /soft/debuggers/forge/bin/map --connect aprun ... myprog.exe
  – BG/Q: MAP is not supported on BG (but other perf tools available)
DDT (5) – BG/Q DIRECT JOB SUBMIT CLICK "RUN" TO START A DEBUGGING SESSION
DDT (6) – BG/Q DIRECT JOB SUBMIT REMEMBER TO SET WORKING DIRECTORY IMPORTANT! ENABLE THE CHECKBOX "SUBMIT TO QUEUE" - CLICK "CONFIGURE" AND "PARAMETERS" FOR ADDITIONAL SETTINGS
DDT (6.1) – BG/Q DIRECT JOB SUBMIT JOB SUBMISSION TAB
USE SUBMISSION TEMPLATE: /SOFT/DEBUGGERS/DDT/TEMPLATES/ALCF-BGQ.QTF
DDT (6.2) – BG/Q DIRECT JOB SUBMIT REMEMBER TO SET YOUR PROJECT
DDT (7) – BG/Q DIRECT JOB SUBMIT JOB MUST GO THROUGH QUEUE
DDT (8) – REVERSE CONNECT OR DIRECT SUBMIT WHEN JOB STARTS RUNNING, CONNECTION STATUS WILL SHOW
DDT (9) READY TO DEBUG!
QUESTIONS
§ See also
  – http://www.alcf.anl.gov/user-guides/mira-cetus-vesta
    • Theta docs coming soon. For now, see Confluence (wiki)