Daily ETL Checklist
October 8, 2017
1 Main checklist
These are the things I do daily when watching the TCCON station at ETL. Please note that all commands on the herc are case sensitive. Also remember that the herc connection is slow: type slowly and wait for the herc to respond before executing commands!
• Log in: $ ssh -p4222 etl.ftpaccess.cc
• Run IFScheck. See §2.
• Flip through the pages in IFSloop to check up on everything. See §3.
• Check the hourly images to ensure that the system is taking data when it should be (e.g., if it's sunny but the Zeno says it's raining, something is amiss). The hourly images are available here: http://www.atmosp.physics.utoronto.ca/~dwunch/etl_webcam/index.html. Live images are available here: http://etl.ftpaccess.cc:8080/.
• Read the nightly report to ensure that there is sufficient disk space remaining, that I2S is running correctly (i.e., spectra are being written with sensible MXY and MNY values), and that there are no errors being reported. See §4.
You can find ideas for what to do when things go wrong in §5.
As always, your best resource is... the TCCON wiki: https://tccon-wiki.caltech.edu. There is a general section on the herc: https://tccon-wiki.caltech.edu/Caltech-built_Containers/Hercules_Real-Time_Computing_System/QNX6_Hercules_Commands, and one on the hardware: https://tccon-wiki.caltech.edu/Caltech-built_Containers/Hardware.
Notes should be written on the Instrument History page: https://tccon-wiki.caltech.edu/Sites/East_Trout_Lake/Instrument_History. There you'll also find commands I've used in the past to recover from problems that have occurred. Technical information specific to ETL can be found here: https://tccon-wiki.caltech.edu/Sites/East_Trout_Lake/Technical_Information.
2 IFScheck
The output of IFScheck should look like this:
--- NOW (UT) ---
18:13:29 10-Aug-2017 Thu
--- WHODUNIT ---
18:13:07 Clt: Startup (dwunch)
18:13:19 Clt: Shutdown (dwunch)
--- RECENT LOG ---
18:02:02 Srvr: TMA: IFS Solar
18:02:03 ST: currentmode: Tracking Mode: Cloud detected
18:04:56 Srvr: TMA: IFS Solar
18:07:37 Srvr: TMA: IFS Solar
18:10:20 Srvr: TMA: IFS Solar
18:13:01 Srvr: TMA: IFS Solar
18:13:07 DC: Startup
18:13:07 Clt: Startup (dwunch)
18:13:19 Clt: Shutdown (dwunch)
18:13:19 Srvr: --: getcon end session 147341366
--- LATEST MIN/MAX ---
18:10:22 MXY1: 0.109 MNY1: 0.107 MXY2: 0.336 MNY2: 0.330
18:10:22 MXY1: 0.110 MNY1: 0.109 MXY2: 0.338 MNY2: 0.335
18:11:40 MXY1: 0.111 MNY1: 0.108 MXY2: 0.340 MNY2: 0.334
18:11:40 MXY1: 0.113 MNY1: 0.111 MXY2: 0.346 MNY2: 0.340
18:11:40 MXY1: 0.113 MNY1: 0.112 MXY2: 0.347 MNY2: 0.344
18:11:40 MXY1: 0.115 MNY1: 0.113 MXY2: 0.351 MNY2: 0.346
18:11:40 MXY1: 0.116 MNY1: 0.115 MXY2: 0.354 MNY2: 0.351
18:11:40 MXY1: 0.214 MNY1: 0.020 MXY2: 0.583 MNY2: 0.067
--- LATEST STATE CHANGE ---
11:45:19 Ready_To_Play
Below is a detailed description of the sections and pointers on what to look for.
• The NOW (UT) section shows the current time and date in UTC. This is useful for figuring out when the last scans were recorded and when the logs were last written.
• The WHODUNIT section shows the last person who started IFSloop and whether they have since exited. In the example above, I started IFSloop at 18:13:07 and exited at 18:13:19. This is useful for figuring out whether anyone else is logged in. If they are, do not do anything without coordinating with them first.
• The RECENT LOG section shows the last 10 lines of the /data/citco2/citco2.log file, which contains a running log of everything that happens during the automated process. This will give you a sense of what has happened most recently. If you need to look farther back in time, look at the entire log for that day using less: $ less /data/citco2/citco2.log.
• The LATEST MIN/MAX section shows the maximum (MXY) and minimum (MNY) interferogram values for the latest scans. This is where knowing the NOW time in UTC is useful, because you can check that the most recent scans were, in fact, recent. InSb is displayed first (MXY1/MNY1) and InGaAs is displayed second (MXY2/MNY2).
• The LATEST STATE CHANGE section shows the last state the system entered and when it entered that state. In the example above, we're in Ready To Play, which means that the system is up and running and will take measurements when it's sunny. Other examples are Failure mode, Nap state (usually due to bad weather), Wake Up Sequence (while pumping, filling the Norhof, etc.), Sleep (night time), and Bedtime Story (running lamp measurements).
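Two of the checks above can be scripted if you copy the IFScheck output from your terminal into a text file: whether someone may still be in IFSloop, and how old the newest LATEST MIN/MAX line is relative to NOW. The sketch below uses POSIX sh and awk. It assumes the section headers look exactly like the example (`--- NAME ---`) and that timestamps are HH:MM:SS. The function names and the file name `ifscheck.txt` are hypothetical, and none of this is part of the herc software.

```sh
# Sketch only (hypothetical helpers, not part of the herc software).
# Both functions read saved IFScheck output on stdin.

# Print minutes elapsed between NOW (UT) and the newest LATEST MIN/MAX line.
scan_age_minutes() {
    awk '
    function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /^--- / { section = $0; next }
    $1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
    section ~ /NOW/ && now == "" { now = secs($1) }
    section ~ /MIN\/MAX/ { last = secs($1) }
    END {
        if (now == "" || last == "") { print "unknown"; exit 1 }
        d = now - last
        if (d < 0) d += 86400          # newest scan was before 00:00 UT
        print int(d / 60)
    }'
}

# Print the user from the last WHODUNIT "Clt:" line if that line is a
# Startup (no Shutdown after it), i.e. someone may still be in IFSloop.
ifsloop_user() {
    awk '
    /^--- / { section = $0; next }
    section ~ /WHODUNIT/ && /Clt: / { last = $0 }
    END {
        if (last ~ /Clt: Startup/) {
            sub(/.*\(/, "", last)      # drop everything up to the "("
            sub(/\).*/, "", last)      # drop the ")" and anything after it
            print last
        }
    }'
}
```

For the example output above, `scan_age_minutes < ifscheck.txt` prints `1` (18:13:29 minus 18:11:40). `ifsloop_user < ifscheck.txt` prints nothing, because the session ended with a Shutdown. Any "recent enough" threshold you compare against is your own choice.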
3 IFSloop
In IFSloop, I check through the following items:
• All temperatures, especially the Zeno T, which is the back room temperature, and the Air T, which is the main room air temperature. Ensure that nothing is getting too hot (i.e., much warmer than 30°C), except for the PC CtrT and the outdoor “Temp” field in the Zeno section. You can find these on pages 1 and 2 of IFSloop (Figs. 1–2).
• IFS pressure (IFS P). It should be less than 1 hPa throughout the day. This is on page 1 (Fig. 1).
• Status of IFS125 (Solar/SolarInGaAs/Cell/Idle). If it's sunny out, it should be in Solar or SolarInGaAs. If not, Idle is appropriate. After sundown on Sundays and Mondays, it takes Cell measurements. I've also seen “Wait”, but that should be short-lived and rare. This is also on page 1 (Fig. 1).
• LN2 tank depth and temperature. If it's getting low (less than a few cm), it's time to fire up the generator. This is also on page 1 (Fig. 1).
• O2 stats. Ensure that the oxygen levels are near 20.9%. This is on page 2 of IFSloop (Fig. 2).
• Check the voltages on page 2 (Fig. 2) to ensure that they haven't changed. The Dome 15V is always 0.0 V on this system.
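To apply the limits in the list above consistently, you can put them in a small helper and feed it values read off IFSloop by hand. The sketch below uses POSIX sh and awk. The labels "IFS P", "Zeno T", "Air T", "PC CtrT" and "Temp" come from this checklist. The labels "O2" and "LN2 depth" are hypothetical. The ±0.5% O2 tolerance and the 3 cm LN2 floor are assumed stand-ins for "near 20.9%" and "a few cm", so adjust them to taste.

```sh
# Sketch only (hypothetical helper): check one value read off IFSloop
# against the limits in this checklist.
# Usage: check_reading "<label>" <value>
# Labels "O2" and "LN2 depth" are hypothetical; thresholds marked as assumed.
check_reading() {
    awk -v name="$1" -v val="$2" 'BEGIN {
        v = val + 0
        if (name == "IFS P")
            ok = (v < 1.0)                     # hPa, all day
        else if (name == "O2")
            ok = (v >= 20.4 && v <= 21.4)      # percent; 0.5 tolerance assumed
        else if (name == "LN2 depth")
            ok = (v >= 3)                      # cm; "a few cm" taken as 3
        else if (name == "PC CtrT" || name == "Temp")
            ok = 1                             # exempt from the 30 C limit
        else
            ok = (v <= 30)                     # any other label: a temperature in C
        printf "%s %s=%s\n", (ok ? "OK" : "CHECK"), name, val
        exit (ok ? 0 : 1)
    }'
}
```

For example, `check_reading "Zeno T" 35` prints `CHECK Zeno T=35` and returns a non-zero status. This lets you chain several readings with `&&` and stop at the first one that needs a closer look.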
To flip from one page of IFSloop to the next, type CTRL-A spacebar. To go back to the previous page, type CTRL-A p. You can cycle back to the first page by continuing to type CTRL-A spacebar through all the pages. Screenshots of all pages are shown in Figs. 1–5. To exit from IFSloop, simply type “exit”. Table 1 contains a list of common IFSloop commands. Note that IFSloop auto-completes commands as you type them. This is very handy, but don't type too quickly, or you may execute a command you do not wish to execute!
Table 1: Common IFSloop commands
Topic                                            Command
To see the console when the software is running  $ IFSloop
After looking at the console, exit using         exit
To flip between pages                            Ctrl-a-n or Ctrl-a-spacebar (next)
                                                 Ctrl-a-p (previous)
To shut down the software                        SW Status Shutdown Quickly
                                                 SW Status Shutdown Completely
                                                 SW Status Shutdown Instantly
To restart the software after shutting down      $ IFSloop start
To put the container into hold                   SW Status Hold
To take the container out of hold                SW Status Reinit
                                                 (SW Status Time Warp)
To take the container out of Failure mode        SW Status Reinit
To command the enclosure                         Enclosure Open
                                                 Enclosure Close
To pump down the IFS125HR                        SW Status Pumpdown
To abort a pump down of the IFS125HR             SW Status Pumpdown Abort
To vent the IFS125HR                             SW Status Hold
                                                 SW Status Clear to Vent
                                                 IFS Direct VAC=2
                                                 (wait until IFS P > 1000 hPa)
                                                 IFS Direct VAC=0
Figure 1: This is a screenshot of the first page you see when you launch IFSloop. It has information about the IFS125, IFS Diagnostics, IFS Laser amplitudes and offsets, IFS temperatures and pressure, Sun Tracker status, Zeno weather station information, Enclosure (STEnc), Computer memory and disk space, and the LN2 system status.
Figure 2: This is a screenshot of the second page of IFSloop. It contains information about all the IFS temperatures and other item temperatures, including the suntracker (ST), PC, vacuum pump, Zeno T (which is the back pump room temperature), and Air T (which is the main room temperature). The Power section includes all voltages and currents for the various subsystems. The Control section shows the status of the subsystems. The Dome is always “off” on this system. This page also has the Oxygen sensor information.
Figure 3: This is a screenshot of the third page of IFSloop. Not much to see.
Figure 4: This is a screenshot of the fourth page of IFSloop. Not much to see.
Figure 5: This is a screenshot of the fifth page of IFSloop. It contains the last several lines of the /data/citco2/citco2.log file. There are more lines on this page than are displayed by IFScheck.
4 Nightly Report
An excerpt from a nightly report looks like this:
Status of NTP sync:
remote refid st t poll reach delay offset jitter
*10.10.0.6 .GPS. 1 u 1024 377 0.977 -1.750 1.471
-zero.gotroot.ca 30.114.5.31 2 u 1024 377 760.216 29.868 140.809
-ntp1.torix.ca .PPS. 1 u 1024 377 718.225 28.782 14.866
+206.108.0.134 .PPS. 1 u 1024 335 698.140 21.024 17.629
-S0106c04a00f34a 128.233.154.245 2 u 1024 377 757.851 16.421 8.185
-69.10.161.7 144.111.222.81 3 u 1024 377 804.424 36.594 125.768
-voipmonitor.wci 66.220.9.122 2 u 1024 377 1023.57 2.307 16.690
hadb1.smatwebde 216.229.0.179 2 u 1024 377 1997.39 -619.04 172.385
+tick.no-such-ag 98.210.143.81 2 u 1024 377 758.549 12.814 28.018
Status of disk space:
Filesystem 1K-blocks Used Free Use% Mounted on
/dev/hd1t79 244196001 109134710 135061290 45% /
/dev/hd0t77 244196001 109835712 134360289 45% /removable/
Executing bin2csv on raw/flight/170809.1
Memo Starting: Thu Aug 10 02:37:40 2017
02:37:40 TMbfr: Startup
02:37:40 ccengext: Startup
20:37:40 extract: ccengext (143331383) initialized
20:37:40 extract: rdr -AqP /data/citco2/raw/flight/170809.1
02:37:40 rdr: Startup
02:37:40 [WARNING] ccengext: Column ’LN2Depth’ reported at least one non-numeric value: ’****’
02:37:40 [WARNING] ccengext: Column ’LN2P’ reported at least one non-numeric value: ’****’
02:37:40 [WARNING] ccengext: Column ’Pump_P’ reported at least one non-numeric value: ’******’
02:38:05 rdr: Quit event
02:38:05 rdr: Shutdown
02:38:05 ccengext: Shutdown
02:38:05 TMbfr: Shutdown
extract: Moving products to subdirectory anal/170809.1
mv: Moving anal/Ext.143228967/cceng_1.csv to anal/170809.1/cceng_1.csv
mv: Moving anal/Ext.143228967/cceng_1_2.csv to anal/170809.1/cceng_1_2.csv
mv: Moving anal/Ext.143228967/cceng_1_8.csv to anal/170809.1/cceng_1_8.csv
mv: Moving anal/Ext.143228967/extract.log to anal/170809.1/extract.log
Memo Terminating
extract: Extraction Complete
Calculating dircksum on raw/flight/170809.1
Copying data to /removable/citco2/raw/flight/170809.1
Calculating dircksum on /removable/citco2/raw/flight/170809.1
Checksums agree with archived /removable/citco2/raw/flight/170809.1/.MD5SUM
Invoking spectral analysis script
The input of Exam.ksh is raw/flight/170809.1
Processing Solar scans
slice-ipp version 1.1.0pre18 5-Jul-2012 jfb
raw/flight/170809.1/scan/b489062.0 2017-08-09 13:06:06.894
raw/flight/170809.1/scan/b489068.0 2017-08-09 13:07:24.901 78.007
XSM assumed on because ssm=2. Setting ssp=2.
Run 1 found, starting at slice 489062
Run 2 found, starting at slice 489068
Run 1 auxiliary info: raw/flight/170809.1/scan/b489062.0.info
Reject: run 1 of 20170809 has solar intensity STD of 188.3 >= 99.9
Run 2 auxiliary info: raw/flight/170809.1/scan/b489062.1.info
Reject: run 2 of 20170809 has solar intensity STD of 96.8 >= 72.4
Average time drift -1.1
raw/flight/170809.1/scan/b489075.0 2017-08-09 13:08:46.601
raw/flight/170809.1/scan/b489081.0 2017-08-09 13:10:04.608 78.007
XSM assumed on because ssm=2. Setting ssp=2.
Run 3 found, starting at slice 489075
Run 4 found, starting at slice 489081
Run 3 auxiliary info: raw/flight/170809.1/scan/b489075.0.info
Reject: run 3 of 20170809 has solar intensity STD of 84.9 >= 53.4
Run 4 auxiliary info: raw/flight/170809.1/scan/b489075.1.info
Average time drift -1.1
Extremum value of .0542501397 at point 1421731
Best ZPD at point 71179.4125
Writing file: /removable/citco2/spectra.tmp/et20170809shffaa.004
Extremum value of .0304322243 at point 1421733
Best ZPD at point 71178.2722
Writing file: /removable/citco2/spectra.tmp/et20170809shffac.004
The top section shows the status of the NTP sync. It should always show that we're using the GPS as our time server: the GPS entry should be at the top of the list with a star next to it.
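If you save the nightly report as a text file, you can automate this check. The sketch below uses POSIX sh and awk and assumes the layout of the excerpt above: the line `Status of NTP sync:`, a column-header line starting with `remote`, then one line per peer. It also skips a separator row of `=` signs, in case your copy of the report has one. The function name and the file name `nightly_report.txt` are hypothetical.

```sh
# Sketch only (hypothetical helper): succeed (status 0) if the first peer
# listed under "Status of NTP sync:" is selected (*) and uses .GPS. as refid.
ntp_uses_gps() {
    awk '
    /^Status of NTP sync:/ { in_ntp = 1; next }
    !in_ntp { next }
    NF == 0 || $1 == "remote" || /^=+$/ { next }   # blank, header, separator
    { first = $1 " " $2; exit }                    # first peer line only
    END { exit !(first ~ /^\*/ && first ~ /\.GPS\./) }'
}
```

Usage: `ntp_uses_gps < nightly_report.txt && echo "NTP OK"`. The check requires both conditions from the text: the GPS peer is listed first, and it carries the star.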
The Status of disk space section lists the internal drive first, followed by the removable drive. We record about 8 GB of data per sunny day in the summer, and significantly less in the winter. I normally swap disks when there is about 55 GB left on the disk. This gives me about 11 days to swap disks, return the disk to Toronto, analyse the data, and then remove the copied and archived data from the herc disks.
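The arithmetic behind the swap decision is free space divided by the daily data volume. The sketch below uses POSIX sh and awk with a hypothetical function name. It reads the "Status of disk space" block of a saved nightly report and converts the Free column (1K blocks) to gigabytes. It then divides by a daily rate you supply (default 8 GB/day, the summer figure above) and flags any filesystem below the 55 GB swap threshold. Treating "GB" as 10^9 bytes is an assumption.

```sh
# Sketch only (hypothetical helper): days of data left per filesystem in the
# "Status of disk space" block. Usage: days_remaining [GB_per_day] < report
# GB is taken as 1e9 bytes (assumption); 55 GB is the swap threshold above.
days_remaining() {
    awk -v rate="${1:-8}" '
    /^Status of disk space:/ { in_df = 1; next }
    !in_df { next }
    $1 == "Filesystem" { next }                 # column header
    $1 ~ /^\// && NF >= 6 {
        free_gb = $4 * 1024 / 1e9               # 1K blocks -> decimal GB
        flag = (free_gb < 55) ? " (swap disks)" : ""
        printf "%s %.1f GB free, ~%.0f days%s\n", $6, free_gb, free_gb / rate, flag
        next
    }
    { exit }                                    # first other line ends the block
    '
}
```

For the excerpt above, `days_remaining 8` prints `/ 138.3 GB free, ~17 days` and `/removable/ 137.6 GB free, ~17 days`. In winter, pass a smaller rate.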
The Memo Starting section is a log of the steps taken to generate savesets.
The Processing Solar scans section shows the output of I2S (formerly slice-ipp). The messages in this excerpt are normal. I usually look at the Extremum values and check that they are all positive and usually around 0.5 or 0.6. The values here are a bit low, probably because these are early morning scans. There is nothing to worry about unless the whole day looks like this and you know it was very sunny.
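To see all of a day's Extremum values at once instead of scrolling through the report, you can summarize them from a saved copy. The sketch below uses POSIX sh and awk with a hypothetical function name. It reports the count, the smallest and largest value, and how many values are not positive. It does not encode a "typical" range, because, as noted above, that depends on the time of day and the weather.

```sh
# Sketch only (hypothetical helper): summarize the "Extremum value of X at
# point N" lines that I2S writes to the nightly report.
# Returns non-zero if any value is <= 0 or if no such lines are found.
extremum_summary() {
    awk '
    $1 == "Extremum" && $2 == "value" && $3 == "of" {
        v = $4 + 0                  # values may be written like .0542501397
        n++
        if (n == 1 || v < lo) lo = v
        if (n == 1 || v > hi) hi = v
        if (v <= 0) bad++
    }
    END {
        if (n == 0) { print "no Extremum lines found"; exit 1 }
        printf "n=%d min=%.4f max=%.4f nonpositive=%d\n", n, lo, hi, bad + 0
        exit (bad > 0)
    }'
}
```

For the two Extremum lines in the excerpt above, it prints `n=2 min=0.0304 max=0.0543 nonpositive=0`.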
5 When things go wrong
When things go wrong (e.g., scanner errors, laser ready errors, etc.), it is often the case that simply reinitializing the software will solve the problem. You do this from IFSloop (see §3, especially Table 1):
$ IFSloop
sw status reinit
Wait to ensure that the error is cleared and you go back into the Play state.
exit
Sometimes the weather is poor, or you're waiting for some assistance, and you do not want to run the instrument but you do want to be able to monitor IFSloop. In this case, you can put the system into “Hold”:
$ IFSloop
sw status hold
exit
To take the system out of hold, use the sw status reinit command described above. Be sure to check the wiki if you need any advice.