D. Saltzberg, 7 Dec 01, L2 Review
CDF
Level-2 Interface Board Status
David Saltzberg for L2 Group
Level-Two Trigger Review
December 7, 2001
CDF Overview
Phase 1: L1 interface, Clist, XTRPlist, SVTlist
Phase 2: ISOlist, RECES
Phase 3: Muon board (not in this talk)
(Phase 1 and Phase 2 have been done in parallel)
CDF Responsible Physicists
L1 interface: Greg Feild*
Clist: Monica Tecchio, Heather Ray*
XTRPlist, SVTlist: Matt Worcester*, Jane Nachtman, D.S.
RECES: Masa Tanaka*, Karen Byrum*
ISOlist: Steve Kuhlmann*, Bob Blair*
* = lives within 50 miles of Fermilab
ANL engineers (L1, RECES, ISOlist): John Dawson, Bill Haberichter
Special operatives: Stephen Miller, Ted Liu, Peter Wittich
CDF Theory of Operation - I
Input data from “Clients”
L1 interface, RECES:
- one word/event, no handshake
Clist, XTRPlist, SVTlist, ISOlist:
- variable-length data, buffered by FIFOs
- terminated by EE word
- some info transferred about BC, L2B, or event count for sync checking
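A minimal sketch of how a receiver could split a FIFO word stream into events using a terminator word. The marker value `END_MARKER` and the function name `split_events` are assumptions made for illustration; the talk says only that the variable-length data is terminated by an EE word.

```python
from collections import deque

# Hypothetical terminator value; the real EE-word encoding is not given in the talk.
END_MARKER = 0xEE00


def split_events(fifo, end_marker=END_MARKER):
    """Drain complete events from a FIFO of words.

    Returns a list of events (each a list of data words, terminator excluded).
    Words of an event whose terminator has not arrived yet stay in the FIFO.
    """
    events = []
    pending = []
    while fifo:
        word = fifo.popleft()
        if word == end_marker:
            events.append(pending)
            pending = []
        else:
            pending.append(word)
    # Put back the words of an incomplete event, preserving their order.
    fifo.extendleft(reversed(pending))
    return events


fifo = deque([0x0011, 0x0022, END_MARKER, END_MARKER, 0x0033])
events = split_events(fifo)
```

Here the second event is empty (terminator only), and the trailing word stays buffered until its terminator shows up. A missing terminator therefore never produces a short event; it shows up as a stall, which is the case the L2P timeout has to catch.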
CDF Theory of Operation - II
Output: control and data signals via Magic Bus
Master mode (currently all boards except RECES)
- L2P issues “STARTLOAD”
- When ready, Interface board requests “Boss”
- Board is granted Boss from upstream
- Board drives block mode data-transfer on Bus
- Boss is released by interface board, and MOD_DONE asserted
- When all MOD_DONE bits set, L2P begins processing
Slave mode
- Board is addressed over Magic Bus and read in single-word transfers
Alternate output (TRKlist boards): VME readout
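The master-mode sequence can be sketched as a toy simulation. Everything here is illustrative: the class `InterfaceBoard`, the `ready_at` tick, the one-board-per-tick timing, and the rule that the most upstream requester wins Boss are assumptions, not a description of the real Magic Bus arbitration.

```python
from dataclasses import dataclass


@dataclass
class InterfaceBoard:
    name: str
    data: list              # words buffered for this event
    ready_at: int           # tick at which the board has its data (hypothetical)
    mod_done: bool = False


def run_startload(boards):
    """Simulate one STARTLOAD cycle; return (bus_log, ticks_used).

    Boards are listed in upstream-to-downstream order. When several boards
    request Boss in the same tick, the most upstream requester is granted it
    (an assumed arbitration rule).
    """
    bus_log = []
    tick = 0
    for b in boards:
        b.mod_done = False                      # STARTLOAD starts a new event
    while not all(b.mod_done for b in boards):
        requesters = [b for b in boards if not b.mod_done and b.ready_at <= tick]
        if requesters:
            boss = requesters[0]                # Boss granted
            bus_log.extend((boss.name, w) for w in boss.data)  # block transfer
            boss.mod_done = True                # Boss released, MOD_DONE asserted
        tick += 1
    return bus_log, tick                        # all MOD_DONE set: L2P may process


boards = [
    InterfaceBoard("Clist", [1, 2], ready_at=2),
    InterfaceBoard("XTRPlist", [3], ready_at=0),
    InterfaceBoard("SVTlist", [4, 5], ready_at=0),
]
bus_log, ticks = run_startload(boards)
```

In this example XTRPlist and SVTlist are ready first and transfer in turn, Clist follows when its data arrives, and processing can start after three ticks.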
CDF General Error Detection & Handling
In L2P (every event, 10’s of kHz)
- L2P has 600 µsec timeout for all MOD_DONE signals
- BC, L2B, or counters checked where possible, event by event
- Checks for exactly 1 Magic Bus word from L1 board
- If error, pull CDF_ERROR (or equivalent) and ask for automatic Halt-Recover-Run to resynch FIFOs
In TrigMon (~2 Hz)
- Check number of words transferred for each board
- Check BC across system
- Exact bit-for-bit comparison of data vs. emulation and/or alternate source
Offline
- Run select parts of TrigMon & Monica’s validation code on look area, stream-g, stream-b, l2-torture runs (~1M events lately)
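The per-event L2P checks can be sketched as a single function. The function name, the argument layout, and the error strings are hypothetical, and the timeout of 600 is read here as microseconds, which is an assumption.

```python
def check_event(mod_done_times_us, bunch_counters, l1_words, timeout_us=600):
    """Return a list of error strings for one event (empty list means OK).

    mod_done_times_us: board name -> time MOD_DONE was seen, or None if never
    bunch_counters:    board name -> bunch-counter value reported by that board
    l1_words:          list of Magic Bus words received from the L1 board
    """
    errors = []
    # 1. Every board must assert MOD_DONE within the timeout.
    for board, t in sorted(mod_done_times_us.items()):
        if t is None or t > timeout_us:
            errors.append(f"timeout: {board}")
    # 2. All boards must report the same bunch counter.
    if len(set(bunch_counters.values())) > 1:
        errors.append("BC mismatch")
    # 3. The L1 board must send exactly one word.
    if len(l1_words) != 1:
        errors.append(f"L1 word count {len(l1_words)} != 1")
    return errors


good = check_event({"Clist": 40, "SVTlist": 55}, {"Clist": 7, "SVTlist": 7}, [0xABCD])
bad = check_event({"Clist": 40, "SVTlist": None}, {"Clist": 7, "SVTlist": 8}, [])
```

Any non-empty result would correspond to pulling the error line and requesting Halt-Recover-Run.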
CDF Testing Performance of system
Without beam
- “L2 torture” nominally runs at ~20 kHz
- Occasionally have run system at ~40 kHz
- Runs system with high L2B occupancy
- Test patterns in for COT tracks, SVT tracks, emulate clusters
- 9 interface boards & up to 3 alphas connected
With beam
- Same config.
- Get real XFT tracks but often have to run SVT test patterns (no SVX)
- Have found other problems (sometimes systemwide) that tests w/o beam do not show. (Real-world stuff that no teststand will anticipate.)
- Extensive tests before Oct. shutdown, preliminary Dec. tests.
CDF Current Boss Arb. Kludge
Glitch on BOSSGROUT (PECL) when taking BOSS can lead to two boards taking Boss.
Since in hardware (not firmware), cannot make simple glitch protection.
Solution:
- Reduce collision rate by putting different delays in boards’ receiving of STARTLOAD (limits deadtimeless L1A rate at 20 kHz--we should have such problems.)
- Handle remaining collisions with L2P error handling
New backplane
- In a pinch, could it be fixed with TTL?
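One way to picture the staggering trade-off is to treat each board as occupying the bus for a window after STARTLOAD and to check the windows for overlap. All delays and transfer times below are invented numbers in microseconds, and equating "overlapping windows" with "possible Boss collision" is a simplifying assumption.

```python
def bus_windows(delays_us, transfer_us):
    """Map board -> (start, end) of its bus occupancy after STARTLOAD."""
    return {b: (delays_us[b], delays_us[b] + transfer_us[b]) for b in delays_us}


def overlapping_pairs(windows):
    """Return board pairs whose bus windows overlap (potential Boss collisions)."""
    names = sorted(windows)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (a0, a1), (b0, b1) = windows[a], windows[b]
            if a0 < b1 and b0 < a1:
                pairs.append((a, b))
    return pairs


def max_rate_khz(windows):
    """Highest event rate at which all transfers finish before the next STARTLOAD."""
    last_end_us = max(end for _, end in windows.values())
    return 1000.0 / last_end_us


# Hypothetical numbers, chosen only to illustrate the arithmetic.
transfer = {"Clist": 10, "XTRPlist": 10, "SVTlist": 20}
staggered = bus_windows({"Clist": 0, "XTRPlist": 15, "SVTlist": 30}, transfer)
unstaggered = bus_windows({"Clist": 0, "XTRPlist": 0, "SVTlist": 0}, transfer)
```

With these made-up values the staggered configuration has no overlapping windows but the last transfer ends at 50 µs, so the sustainable rate is 1000 / 50 = 20 kHz; with no delays every pair of boards can collide.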
CDF Overview Plot of L2 crate
CDF Board-by-Board Status (follows...)
Status of “best” board
- Highest rate tested & error rate
- Limit on (or measurement of) bit error rate
- Cooperation with other boards
- Plans for further work
Status of spares
- Number and status of spares
- Known problems?
Status of documentation
Debugging tools, here and elsewhere
Plans
Other comments
CDF L1 Interface Board
L2 torture tests
- tested at 20-40 kHz, no problems
- tested ~1M events, no errors; tested offline
- no collisions with other boards (by construction)
Known problems
- noisier than others, but protected in time
- still have to connect ground shield & see
- Solving noise here may solve it elsewhere
CDF L1 Interface Board Plots
No errors in bit-for-bit comparison
CDF L1 Interface Spares & Debugging tools
Spares
- S/N 1: OK
- S/N 2: OK (in crate)
- S/N 3: 3/4 stuffed
Debugging tools
- Bit-for-bit check available offline
- If more or less than one word is sent, L2P pulls error (pretty simple board, no need for complex diagnostics)
- Teststand: can set bit patterns, check in realtime or later
  - data source: FRED
  - data sink: MB to emulator board
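A bit-for-bit check of data against emulation can be sketched as an XOR with a per-bit tally, which also makes a stuck bit stand out. The function name `bit_diff` and the 16-bit word width are assumptions for illustration only.

```python
def bit_diff(data_words, emulated_words, width=16):
    """Compare two word lists bit-for-bit.

    Returns (n_bad_words, bit_error_counts), where bit_error_counts[i] is the
    number of words in which bit i differs. A length mismatch raises
    ValueError, since word counts are checked separately.
    """
    if len(data_words) != len(emulated_words):
        raise ValueError("word count differs")
    mask = (1 << width) - 1
    counts = [0] * width
    n_bad = 0
    for d, e in zip(data_words, emulated_words):
        x = (d ^ e) & mask              # set bits mark disagreements
        if x:
            n_bad += 1
        for i in range(width):
            counts[i] += (x >> i) & 1
    return n_bad, counts


# Bit 2 reads low in the data whenever the emulation expects it high.
data = [0b1010, 0b0000, 0b0001]
emulated = [0b1010, 0b0100, 0b0101]
n_bad, counts = bit_diff(data, emulated)
```

Two of the three words disagree, and all disagreements fall on bit 2, the signature of a single stuck line rather than random errors.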
CDF L1 Interface Documentation/Plans
Docs
- CDFNOTE 4971
- Webpage: http://hepwww.physics.yale.edu/www_info/yale_cdf/l1crate.html
- Schematics have control room hardcopy
- PDF files recently sent to Greg--will put on web and in trigger room
Plans
- Keep running
- Finish stuffing board #3 (2nd spare) and test
- Look into noise problem, not urgent. Wait until after new MB installed
CDF CList Board
Responsibles: Monica Tecchio, Heather Ray
Gets data by fiber from each Locos board
L2 torture tests
- works at 20-40 kHz, no errors
- no errors found in ~1M events offline
Known problems
- crate 04: bit 02 is stuck low (probably trivial)
CDF Clist board plots
No errors in bit-for-bit comparisons
CDF L2 cutting on Jets
CDF Clist Debugging tools
Bit-for-bit comparisons done in online/offline monitoring
If L2 buffer number disagrees, L2P pulls error
Clusters can be set
- pulling cable in DCAS crate makes a known cluster
- in principle software exists to make arbitrary cluster pattern at B0 (need to verify)
Michigan teststand capabilities:
- Standalone board tests using VME
- Data source: Locos
- Data sink: MB & L2P
- Test full clustering chain DCAS ---> L2P via MB w/ tracer generating multiple L1A’s
CDF Clist Spares/Documentation/Plans
Spares
- S/N 1: OK (in system)
- S/N 2: flaky VME, otherwise works
- S/N 3: being stuffed
Documentation
- webpage for aces, experts & non-experts
  - http://www-cdf.fnal.gov/internal/cdfoperations/trigger/level2/my.html
  - will become general L2 webpage (need more disk space)
- schematics online in Michigan
- hardcopies in trigger room
Plans
- Keep running stably with board #1, monitor robustness
- Fix flaky VME on board #2
- Make board #3 a second “hot spare”
CDF SVTlist Board Tests
Responsibles: Jane Nachtman, Matt Worcester, D. Saltzberg
L2 torture testing:
- 20-40 kHz L1A, no errors (SVX off, running SVT test pattern)
- Tested with ~1M events, no bit errors
- Special run with checks inside alpha: BER < 10^-6
- No collisions with other boards
Problems
- Gets confused if no EE word from SVT; L2P pulls error.
  - Due to SVX not sending info to SVT
  - Known problems in SVX have been fixed, others?
  - Bill A. thinking about an SVT timeout to pull error
  - Only happens with beam. Checked (painfully) before shutdown & it worked (could even have taken special Oct. SVT runs with it.)
No firmware changes to TRACKlist boards in last 2 months!
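As a rough guide to what an error-free test sample implies, a zero-error count over N independent trials gives a Poisson upper limit of -ln(1 - CL) / N on the error rate (about 3/N at 95% CL). The sketch below applies that standard formula; the function name and the 32 bits per event are hypothetical, and this is not necessarily how the limits quoted in the talk were derived.

```python
import math


def zero_error_upper_limit(n_events, bits_per_event=1, cl=0.95):
    """Upper limit on an error rate when no errors are observed.

    With zero errors in N trials, the Poisson upper limit on the expected
    count is -ln(1 - CL), so the rate is below -ln(1 - CL) / N.
    bits_per_event=1 gives a per-event limit; a larger value gives a per-bit limit.
    """
    n_trials = n_events * bits_per_event
    return -math.log(1.0 - cl) / n_trials


per_event = zero_error_upper_limit(1_000_000)
per_bit = zero_error_upper_limit(1_000_000, bits_per_event=32)  # 32 is hypothetical
```

For ~1M clean events the per-event limit is about 3 x 10^-6 at 95% CL; a per-bit limit below 10^-6 follows as soon as several bits per event are checked.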
CDF Some SVTList Plots
No errors in bit-for-bit comparisons
CDF L2 SVT Cutting (before shutdown)
CDF XTRPlist Board Tests
Responsibles: Jane Nachtman, Matt Worcester, D. Saltzberg
L2 torture testing:
- 20-40 kHz L1A, no errors
- Tested with 1M events, no detectable errors
  - XTRD bank has known errors that cause Ntracks mismatch
  - Correct at L2, wrong in readout
  - No errors when cut on Ntrack agreement
  - Handscan of other events looks okay
- No collisions with other boards
Problems
- Illinois to fix XTRD bank filling errors
- One bad pT bit from one XTRP board
CDF XTRPlist plots
No errors in bit-for-bit comparisons when number of tracks agrees.
CDF Spares for TRACKlist
SVTlist & XTRPlist are both instances of one board: TRACKlist
- CPLD change with JTAG connector
- one jumper change
Six production TRACKlist boards
- Currently 2 in L2P crate--permanent
- Currently 2 in SVT crate--1 or both temporary?
  - one makes nominal SVTD bank. Convenient for booking SVT crate for test runs
  - having separate boards effectively makes a cable check
  - another board in SVT crate makes XTRP list---could be removed soon?
Six production boards, at least 2 required in system, maybe 3. Right now using 4.
CDF TRACKlist spares
S/N 1 & 2: (Prototypes, no longer used.)
S/N 3: XTRPlist, OK (in L2P crate)
S/N 4: SVTlist, OK -- used for SVTD bank
S/N 5: XTRPlist, OK -- “hot spare”
S/N 6: SVTlist, MB not working, bad connection
S/N 7: SVTlist, stuck chisq bit for MB -- used for SVTD bank
S/N 8: SVTlist, OK (in L2P crate)
All boards work for VME readout
CDF TRACKlist debugging tools
Can send arbitrary pattern from SVT easily
Can send arbitrary pattern from XTRP (more difficult)
Bit-by-bit checking in TrigMon
Can test BC from XTRP & SVT on every event
UCLA teststand:
- data source: merger board
- data sink: MB and emulator board and/or VME
CDF TRACKlist plans
Keep running stably
Fix one SVT spare (bad connection makes MB error)
Fix one bad bit on another SVT spare
Wean SVT off of second SVT board
Make sure all six boards are “hot spares”
Print hardcopies of schematics & firmware
CDF TRACKList Documentation
Web pages:
- Specs: http://buggs.physics.ucla.edu/~nachtman/board/specifications_v1.ps
- TIB instructions: http://www-b0.fnal.gov:8000/level2/tib/tib_main.html
- TIB database: http://www-b0.fnal.gov:8000/level2/tib/tib_status.html
- TIB schematics etc.: http://buggs.physics.ucla.edu/~nachtman/tib.html
Schematics on web in .eps format
Need updated hardcopies printed out
CDF ISOlist status
Responsibles: Steve Kuhlmann, Bob Blair
Calculates 5 isolation sums
- DCAS -> ISOpick -> ISOlist
- Clique -> ISOclique -> ISOlist
L2 torture tests (or cosmics)
- need to require eta-phi match (~1-3% failure)
- perfect at 20-40 kHz in all 5 sums
Problems
- with collisions see eta-phi match (still 1-3% failure), but L2P can check and pass the event
- In 0.5% of events also scatter of expected vs. seen in all 5 sums (less than analog jitter in Run 1)
- N.B. the whole scatter comes from crate 1, eta=17.
CDF ISOlist plots
CDF ISOlist spares
In DCAS crates
- Need 1 ISOclique (have 2)
- Need 6 ISOpicks (have 8, 1 with stuck bit)
In L2P crate
- Need 1 ISOlist (have 2)
All spares are “hot spares” except for 1 ISOpick with stuck bit.
CDF ISOlist Debugging Tools
Standard running
- ISOpick times out if DCAS does not send data
Standalone code:
- writes a seed to ISOclique (only board with VME)
- tells it to read out fixed values to isolation system
- can load different values for different buffer numbers
- with a switch, can read energies from DCAS. Essentially this “factors” the problem.
TrigMon & offline code
- Incorporated isolation variables into Monica’s code
- Need to debug some boundary values against the hardware
Teststand at ANL
- data source: ISOpick
- data sink: MB to emulator board
CDF ISOlist Documentation/Plans
Docs
- CDF note 5788
- Schematics in hardcopy in binders at ANL but will come to trigger room
- PDF files of schematics (firmware & hardware) are available, will be placed on web by Heather
Plans
- Continue running & monitor robustness
- Go after eta/phi mismatch (needs coordination between ANL and Michigan)
- Find & fix flaky bit in DCAS crate
CDF RECES status
Responsibles: Masa Tanaka, Karen Byrum
Four boards in L2P crate receive information from SMXR by fiber
During L2 torture tests (36 kHz)
- In crate, on backplane, but not used by default table
- No negative interactions
Special L2 executable (TEST_RECES table)
- L1 input is crossing trigger and 4 GeV elec, 8 GeV photon
- runs at 20 kHz L1 input, 100 Hz L2A
- Maybe small bit errors -- few thousand events
- All SMXR to RECES is okay (at end of shutdown)
Problems
- Accidental collisions on Alpha readout
  - Sol’ns: Arnd’s special retry readout code. Stephen will modify FPGA
- possible bit errors (10^-3)
CDF Reces Plots
CDF RECES Spares/Docs/Plans
Need 4 RECES boards in system
- 4 in top crate OK
- 2 spare boards OK
Docs
- CDF 5132
- Need to put schematics on web & hardcopies in trigger room.
Plans
- Keep RECES on backplane during default running
- Fix readout problem
- Search for BER < 10^-4 in standard datataking & fix
CDF Reces Debugging tools
Special standalone code
- VME based.
- Set trigger threshold, load SMXR’s
- Send bit patterns to RECES board, Alpha reads through VME
- Check bit-for-bit (checks all bits)
- 10 Hz (tens of thousands of events OK)
ANL teststand
- Not needed any more
TrigMon plots
- temperature plots
- checks bit-for-bit errors
CDF Interface Board status by run (documented for collaboration)
CDF Interface Boards: The Bottom Line
L2 crate with Clist, XTRPlist, SVTlist, L1 interface, ISOlist all work at up to full speed 20 kHz as-is.
Their bit-error rates are measured < 10^-6 (RECES not tested to this level yet.)
Essentially all documentation exists. Some tweaks in progress.
There is at least one working spare for every board.
Every board has a real expert living close by.
Work in progress fixing up extra boards’ bad bits etc.
In current configuration we can fulfill the charge of running jets, electrons and SVT at 5e31 right now, as-is (assuming all clients are working)---“backups” will only distract.
CDF Goals of Sept. workshop (for interface boards)
sync errors < 10^-6: DONE
cut on jets / “reliable Clist”: DONE
“reliable L1 board”: DONE
automated HRR: DONE
“solve XTRP problem”: DONE (don’t remember what it was, but it works)
reliable SVTlist: DONE
SVT kludge path: DONE
alpha code for cutting on SVT: simple code DONE, complete cdf4718-lite underway
Solve Clist eta/phi errors for electrons: DONE for electrons (iso needs work)
alpha electron code: Debugging
prepare firmware without delays for MB testing: DONE
test boards on new MB: NOT DONE
test ISOlist and RECES: DONE
“improve documentation”: DONE -- more to do, as always
CDF Suggestions-I
Spares should not be kept in lower crate unless being used. Otherwise water leak (it has happened before!) will destroy all boards.
- Currently squatting on other spare space...could use space allocated specifically for L2 spares
Need more disk space for L2 webpages on B0 machine.
SVT group should use XTRP list in TL2D and free up spare TRACKlist board
“Clients” should be kept in stable configuration
D-sized plotter in B0 for printing updated firmware schematics (.eps or .pdf)
CDF Suggestions-II
Need more of the “good” jumpers (white)
Make MagicBus document a CDFNOTE
File cabinet for all L2 docs. Can be different sized schematics and also text documents so folders would work better than one binder.
Web “clearing house” for all L2 web documents. Good documentation exists for all boards, just need a list of links (Heather is working on this.) I think we should not over-structure this at this point...leave the microstructure to the individual groups
When given choice of testing kludge path vs. real path, try real first
CDF Suggestions-III
In next 3-6 months, experts (and their supervisors) should think about training their successors.
Need to implement bit-for-bit emulation SIXD --> TL2D into TrigMon
Need someone to write/implement XFLD --> XTRD emulation
A MB “display” module would be a critical debugging tool (LED’s on each line), much like the old Fastbus display module