Role                Name         Designation                           Affiliation   Date         Signature
Additional Authors
Submitted by        B. Carlson                                         DRAO          2011‐03‐26
Approved by         W. Turner    Signal Processing Domain Specialist   SPDO          2011‐03‐29
GIANT SYSTOLIC ARRAY (GSA) CORRELATOR CONCEPT DESCRIPTION
Document number: WP2‐040.050.010‐TD‐001
Revision: 1
Author: Brent Carlson
Date: 2011‐03‐29
Status: Approved for release
WP2‐040.050.010‐TD‐001 Revision : 1
2011‐03‐11 Page 2 of 59
DOCUMENT HISTORY
Revision        Date of Issue   Engineering Change Number   Comments
A‐Preliminary   Feb. 21, 2011   ‐                           First draft release for internal review
A               Feb. 28, 2011                               Initial Release
B               Mar. 11, 2011                               Add items to address CoDR doc requirements
1               Mar. 29, 2011                               First Issue
DOCUMENT SOFTWARE
                 Package   Version     Filename
Wordprocessor    MsWord    Word 2007   03b‐wp2‐040.050.010‐td‐001‐1‐gsaconcept‐description
Block diagrams
Other
ORGANISATION DETAILS
Name: National Research Council Canada
Physical/Postal Address:
Herzberg Institute of Astrophysics
Dominion Radio Astrophysical Observatory
P.O. Box 248
717 White Lake Rd
Penticton, BC, Canada, V2A 6J9
Tel: 250‐497‐2300
Fax: 250‐497‐2355
Website: http://www.nrc‐cnrc.gc.ca/eng/ibp/hia.html
TABLE OF CONTENTS
1 INTRODUCTION
1.1 Purpose of the document
1.2 Simplified Overview Description
2 REFERENCES
3 HIERARCHY
3.1 Hierarchical Lifecycle
4 ELEMENT LEVEL: SIGNAL PROCESSING
4.1 F‐part
4.2 X‐part
4.2.1 Baseline GSA Correlator Board (Dishes)
4.2.2 Board‐to‐Board Interconnections (Dishes)
4.2.3 SAA and DAA Board Interconnections
4.2.4 Discussion—System Effects of ASIC Capacity
4.3 Correlator to Image Processor Data Transport—Visibility‐Based Addressing
4.3.1 Discussion
4.4 Central Beam‐former Data Access
4.5 Phase‐I SKA
4.5.1 Phase‐I F‐Part
4.5.2 Phase‐I X‐Part: Dishes
4.5.3 Phase‐I X‐Part: SAAs
4.5.4 Phase‐I Central Beam‐forming
4.6 Monitor and Control
4.7 Upgrade Growth Paths
5 RISKS
6 REQUIREMENTS
6.1 Item Definition
6.1.1 General Description
6.1.2 External Interfaces
6.1.3 Internal Interfaces
6.1.4 Modes
7 CHARACTERISTICS
7.1 Performance Characteristics
7.2 Physical Characteristics
7.2.1 X‐part Correlator Boards
7.2.2 X‐part Racks
7.2.3 F‐part Boards and Racks
7.2.4 X‐part Synchronization
7.2.5 Rack Power Delivery and Control
7.2.6 F‐part Data Insertion into the Correlator
7.2.7 M&C Network
7.2.8 Output Data Network
7.2.9 Thermal Considerations
7.3 Electrical Characteristics
7.4 Reliability
7.5 Maintainability
7.6 Availability
7.7 Additional Quality Factors
7.8 Environmental Conditions
7.9 Transportability
7.10 Flexibility and Expandability
7.11 Portability
8 DESIGN AND PRODUCTION
8.1 Components, Materials and Process
8.2 Development and Production Plan
8.2.1 Development and Production Schedule
8.3 Electromagnetic Radiation
8.4 Manufacturer Nameplate and Product Marking
8.5 Industrial Standardisation
8.6 Interchangeability
8.7 Safety
8.8 Ergonomics
8.9 Confidentiality and Protection
8.10 Supplies from the Contracting Authority
8.11 Resource Reserve Capabilities
8.12 Documentation
8.13 Logistics
8.14 Personnel and Training
8.14.1 Personnel
8.14.2 Training
8.15 Characteristics of Secondary Items
8.16 Priority
9 COST AND POWER ESTIMATES
9.1 Cost Table
9.2 Power Table
9.3 Operating Costs
9.3.1 Power
9.3.2 Staffing
9.3.3 Maintenance
9.3.4 Spares & Replacements
10 QUALITY ASSURANCE PROVISIONS
10.1 General
10.1.1 Qualification
10.1.2 Acceptance
10.2 Requirements Conformance Verification
10.2.1 Verification Methods
10.2.2 Verification Matrix
11 DELIVERY PROVISIONS
12 NOTES
13 APPENDICES
13.1 X421 ASIC Data Sheet
LIST OF FIGURES
Figure 4‐1: Simplified block diagram of the ‘R’ antenna‐beam element F‐part of the correlator.
Figure 4‐2 Cross‐correlation systolic array.
Figure 4‐3 GSA board arrangement with short, nearest‐neighbour connections both horizontally
between racks, and vertically within the same rack.
Figure 4‐4 8x8 array of X421 correlator ASICs on a 19” rack‐mount board. 27 x 27 mm F672 devices
are shown, with 5 mm spacing between chips. The 10G signal “Repeaters” are likely FPGAs, or
hardcopy FPGAs with enough MGTs to satisfy the requirements. These devices can also be
used to stimulate and standalone test the board.
Figure 4‐5 Information brief on 0XT connectors. (a) is a picture of the right‐angle PCB‐mount
connector. (b) shows the surface‐mount solder tails on signal lines, which micro‐via down to
the next layer if necessary. (c) is a 10G eye diagram of a signal going across the connector. (d)
is the mating connector, which would be installed in the interconnect board; the mid‐plane
connector consists of two of these, back‐to‐back.
Figure 4‐6 Rear view of rack showing mid‐planes, horizontal, and vertical “printed wiring” patch
boards, eliminating interconnect cabling.
Figure 4‐7 9 column x 8 row chip layout on the board to allow the board to be split into two
independent correlation triangles.
Figure 4‐8 Visibility capture regions for two output grid points.
Figure 4‐9 Visibility capture region for a CPU producing a square area of grid points.
Figure 4‐10 12x12 correlator matrix showing VBA routing paths from source correlator boards
(YELLOW) to a destination correlator board/gridding CPU (MAGENTA).
Figure 4‐11 Possible overlay of gridding CPUs to grid points, although CPUs 79, 80, 81 do not exist.
Figure 4‐12 Correlator board output data path routing to nearest neighbour boards, and to gridding
CPUs. The nearest‐neighbour network has 2X the bandwidth of native output data
bandwidth, a somewhat arbitrary choice at this point.
Figure 4‐13 Possible front‐panel of board, with connectors for nearest‐neighbour VBA network
connections.
Figure 4‐14 Section of two racks, showing inexpensive VBA nearest‐neighbour wideband network
connections, using identical patch boards as used at the rear of the board/rack. Quad 100G
fibers from each board then route packets to gridding CPUs in a point‐to‐point fashion.
Figure 4‐15 Front region of correlator board, and front panel, with gridding CPUs subsumed into the
correlator in the form of “Field Replaceable Processor Packs”.
Figure 4‐16 Data re‐arrangement to produce ~1 kHz spectral resolution across ~74 MHz/polarization
of bandwidth, using the X421 chip. Accumulator double‐buffering in the chip allows one
band/slice to be correlating/integrating, while the other band/slice results are transmitted on
the 4 x 10G outputs.
Figure 7‐1 Possible full‐scale SKA X‐part correlator layout, using the baseline X421 ASIC, measuring
~34 m W x 22 m D. Dish‐WBSPF: 1.05 GHz/polarization, 3072 elements. Dish‐PAF: 750
MHz/polarization, 2048 elements, 30 beams. SAA: 300 MHz/polarization, 256 elements, 1000
beams. DAA: 300 MHz/polarization, 256 elements, 1000 beams. Enough ‐48 VDC power plant
capacity is shown for ~3.2 MW (as shown in the table on page 53, more like 27 x 200 kW
power plants are required). This layout assumes air cooling as shown in Figure 7‐2. Not
shown are the F‐parts of the correlator, or any COTS computers or network equipment for
M&C, image processing, or non‐visibility processing.
Figure 7‐2 Centralized air‐blower cooling arrangement.
Figure 8‐1 Rough Gantt chart of X‐part correlator development, including ASIC development with a
fabrication start time at the beginning of 2016. Some of the activities could have earlier start
times, in particular preliminary X‐board feasibility study/development, software development,
rack and sub‐rack design, and FPGA design. ASIC RTL development and test is shown
beginning of 2012, which may be a bit premature, depending on how accurate future
technology forecasts are.
LIST OF TABLES
Table 5‐1 Possible risks and risk mitigation strategy.
Table 9‐1 Cost Table.
Table 9‐2 Power Table.
LIST OF ABBREVIATIONS
ASIC     Application-Specific Integrated Circuit
BGA      Ball Grid Array
CNN      CPU Node Number
CoDR     Conceptual Design Review
COTS     Commercial Off-The-Shelf
DAA      Dense Aperture Array
DDR      Double Data Rate
eDRAM    Embedded DRAM
EVLA     Expanded Very Large Array
FFT      Fast Fourier Transform
FPGA     Field Programmable Gate Array
FRPP     Field-Replaceable Processor Pack
FX       Fourier transform, followed by spectral multiplication
GSA      Giant Systolic Array
I/O      Input/Output
LTA      Long-Term Accumulator
MAC      Multiplier-Accumulator
MGT      Multi-Gigabit Transceiver
PAF      Phased-Array Feed
RAM      Random Access Memory
SAA      Sparse Aperture Array
SDRAM    Synchronous Dynamic RAM
SFP      Small Form-factor Pluggable
SKA      Square Kilometre Array
SM       Surface Mount
VBA      Visibility-Based Addressing
Void     Section intentionally blank, to be completed at a subsequent issue
WBSPF    Wide-Band Single Pixel Feed
X421     Baseline spectral correlator chip, used in this concept. 4 bits, 2k channels, 1k baselines.
1 Introduction
1.1 Purpose of the document
The purpose of this document is to provide a complete concept‐level description of the GSA (Giant
Systolic Array) correlator [1] for the signal processing Conceptual Design Review to be held in April
2011 in Manchester.
The GSA concept has been refined since SKA Memo 127 [1] was released. A detailed correlator
ASIC pseudo‐data sheet/specification has been produced that takes into account all SKA element
correlation requirements, and the problem of routing correlator data products to the image
processing system has been considered. Both additions are included in this document,
along with a standalone concept description of the GSA itself.
1.2 Simplified Overview Description
The baseline GSA concept provides a straw‐man physical/electrical mechanism for arranging ‘X’
correlator boards (i.e. the ‘X’ part of an ‘FX’ correlator) so that the system can grow to handle
very large interferometer arrays of thousands of elements, without the need for a large monolithic
“corner turner”. Instead, a distributed partial corner turner of ‘R’ elements may be used. There are
advantages in making ‘R’ larger (section 4.2.4), and if ‘R’ equals the number of elements in the array,
the concept essentially morphs into forms similar to those which assume a complete corner
turner [2]. There is likely a trade‐off between the size of the corner turner and the size of the
correlator. However, the GSA concept does not require a large corner turner, so it is easy to envision
an actual implementation in concrete terms without relying on any particularly difficult technology
or wiring. A hybrid approach that incorporates multiple concepts may also be possible.
Each correlator ‘slice’ processes a given bandwidth, with a given number of spectral channels. A
slice can be one part of a much larger bandwidth within the same interferometer beam, or a slice
of the bandwidth of any particular interferometer beam. The concept implementation, and
in particular the granularity of the circuit board and the ‘X’ processing chip, is chosen to allow
for processing of a few hundred up to a few thousand elements. It is thus applicable to all
element‐type telescopes envisioned for the SKA, namely SAAs, DAAs, dish‐PAFs, and dish‐WBSPFs.
The heart of the GSA is an ‘X’ correlator ASIC, the capability of which largely determines the cost,
physical scale, and power dissipation of the ensuing system(s). A baseline ASIC pseudo‐data/spec
sheet—the ‘X421’—has been developed, the first 2 pages of which are included in the appendix of
this document. Currently a cost/power study is being performed on this chip, and the results of this
study will allow reasonably accurate system power dissipation forecasting, as well as acting as a
baseline for predicting the cost and power of devices that might be more ambitious or use more
advanced technology. This chip is specified for reliability and manufacturability; it has minimal
physical I/O signals for maximum manufacturing yield and solder‐joint reliability, and all storage
RAM is on chip. The chip, in conjunction with the overall ‘F’ and X‐part design, takes a holistic
approach in that it contains facilities for transporting complete data products to the image
processing system while minimizing data output rates, with the goal of minimizing the magnitude of
the image processing problem. Indeed, an option is presented which might allow the image
processing problem to be subsumed into the correlator (Figure 4‐15).
This concept description also briefly touches on how and where the central beamformer will
tap into the data paths; however, details of the central beamformer concept are contained
in a separate CoDR document.
2 References
[1] Carlson, B., “The Giant Systolic Array (GSA): Straw‐man Proposal for a Multi‐Mega Baseline
Correlator for the SKA”, SKA Memo 127, August 2010.
(http://www.skatelescope.org/PDF/memos/127_Memo_Carlson.pdf)
[2] D’Addario, L., “Low‐Power Correlator Architecture For the Mid‐Frequency SKA”, SKA Memo xxx,
January 2011.
[3] Crochiere, R.E., “A Weighted Overlap‐Add Method of Fourier Analysis/Synthesis”, IEEE Trans.
Acoust. Speech Signal Process., Vol. ASSP‐28, No. 1, pp. 99‐102, February 1980.
[4] Kung, H.T., Leiserson, C.E., “Systolic Arrays (for VLSI)”, Sparse Matrix Proceedings 1978, Society
for Industrial and Applied Mathematics, 1979, pp. 256‐282, ISBN: 0‐89871‐160‐6.
[5] http://www.erni.com/DB/PDF/ERmetzeroXT/ERNI‐DesignCon2005‐Paper.pdf
[6] http://www.erni.com/ermetzeroxtfront.htd
[7] Dewdney, P., et al., “SKA Phase 1: Preliminary System Description”, SKA Memo 130, November
2010.
[8] http://www.critical‐embedded‐systems.com/meecc/2005/presentations/Pautsch‐CoolCON.pdf
[9] Reliability at 50 C junction temperature reference. (TBD)
[10] Rashdan, M., Yousif, A., Haslett, J., Maundy, B., “A new time‐based architecture for serial
communication link”, 16th IEEE International Conference on Electronics, Circuits, and Systems,
pp. 531‐534, 2009.
[11] Rashdan, M., Yousif, A., Haslett, J., Maundy, B., “Data link design using a time‐based approach”,
Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 3977‐3980,
2010.
3 Hierarchy
3.1 Hierarchical Lifecycle
Void
4 Element Level: Signal Processing
The GSA correlator is an ‘FX’‐style correlator. Given the concession of re‐quantizing spectral
channels after the FFT to 4 bits (it is assumed that the Re and Im parts require 4 bits each), and the
proliferation of technology which allows very high data rates on a single wire pair¹, it is the
only truly viable option. Consequently, in order to provide sufficient numbers of spectral channels,
the correlator is essentially “distributed RAM‐dominated”, rather than compute‐ or “copper”‐dominated.
The ‘F‐part’ of the correlator is not covered in significant detail, but is described briefly to show
that it is a reasonably straightforward problem to solve. The F‐to‐X data transmission
is all point‐to‐point and so requires nothing more than commercial short‐haul (order of 10‐100
m) copper or, more likely, fibre technology. The data distribution within the X‐part of the
correlator is performed entirely without cabling, and is flexible enough to correlate hundreds to
thousands of elements.
4.1 F‐part
The F‐part of the correlator consists of a geometric delay and (poly‐phase) FFT [3], followed by a
short channel‐burst buffer, followed by packet builders/mergers. These arrange the data so that each
data packet destined for the X‐part of the correlator contains ‘R’ antennas, each with 1 spectral
channel, for a ‘J’ channel‐burst length², for a particular band/slice. ‘K’ such packets then make up
an entire “Packet (channel) Group”, which comprises all of the channels that are to be correlated by
the particular band/slice correlator. Included within each Packet Group, for each antenna‐beam, is an
“INFO Packet”, which provides information to facilitate X‐part integration/data compression, and
labels the data to facilitate transmission to, and processing by, the downstream image processing
system. Information in the INFO Packet is calculated and inserted into the packet stream in real time
by a CPU, from information supplied by a centralized “meta‐data” source.
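The packet arrangement above can be illustrated with a rough sizing sketch. It is not part of the concept specification: the class name, the inclusion of both polarizations in one packet, the interpretation of ‘J’ as samples per channel per packet, and the choice of ‘K’ equal to the 2048 channels of the X421 are all assumptions made for illustration, and packet headers and INFO Packets are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketGroupSpec:
    """Hypothetical sizing parameters for one band/slice packet stream."""
    r_antennas: int               # 'R': antennas merged into each packet
    j_burst: int                  # 'J': channel-burst length (assumed: samples per packet)
    k_packets: int                # 'K': packets (one spectral channel each) per Packet Group
    bits_per_component: int = 4   # Re and Im are each re-quantized to 4 bits
    polarizations: int = 2        # assumption: both polarizations share a packet

    def payload_bytes(self) -> int:
        """Payload of one packet, excluding any header (header size is not specified)."""
        bits = (self.r_antennas * self.j_burst * self.polarizations
                * 2 * self.bits_per_component)   # factor 2: Re + Im
        return bits // 8

    def group_payload_bytes(self) -> int:
        """Payload of a whole Packet Group, excluding INFO Packets."""
        return self.k_packets * self.payload_bytes()


# Illustrative values: R ~ 8, minimum J = 64, K assumed equal to 2048 channels.
spec = PacketGroupSpec(r_antennas=8, j_burst=64, k_packets=2048)
print(spec.payload_bytes())        # 1024 bytes per packet
print(spec.group_payload_bytes())  # 2097152 bytes per Packet Group
```

Under these assumptions a packet carries about 1 KiB of sample data, which suggests that per-packet overhead would be small, although the actual packet format is not defined here.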
A simplified diagram of an ‘R’ antenna‐beam element F‐part of the correlator is shown in Figure 4‐1.
No detailed design of this part of the correlator is provided in this concept description, but it is
reasonable to assume that for small ‘R’ (~8), the processing shown can easily fit within several
devices on one circuit board. Possibly, although not necessarily, multiple such ‘R’‐element groups
could fit on one circuit board. Thus, there is simple point‐to‐point connectivity from this
part to each correlator band/slice.
¹ It would seem that the cost/complexity of bandwidth connecting boards/chips has out-paced the rate of packing silicon on a chip (or more importantly, the cost and power of doing so), although no formal data is presented to support this statement.
² In the X421 chip, the minimum required channel-burst length ‘J’ is 64.
Figure 4‐1: Simplified block diagram of the ‘R’ antenna‐beam element F‐part of the correlator.
The F‐part of the correlator is the same for every SKA element in concept (i.e. element as far as the
correlator is concerned), whether it be SAA, DAA, dish‐PAF, or dish‐WBSPF. From the X‐part of the
correlator’s perspective, there is no difference other than the number of elements (‘Ants’) that must
be cross‐correlated.
Additionally, the signal entering the correlator is likely already coarsely channelized, or possibly
even completely channelized. The actual location of the FFT filterbank is therefore not yet
finalized, and does not need to be for the basic concept to remain sound.
In the GSA concept, each “band/slice” correlator for a particular SKA element type is identical.
4.2 X‐part
The X‐part of the correlator must calculate, for every spectral channel, the cross‐correlation
coefficient for every baseline or antenna‐antenna pair. For N antennas there are therefore
approximately N²/2 baselines (exactly N(N+1)/2 when auto‐correlations are included) that must be
correlated. A simple systolic‐array [4] arrangement which might accomplish this processing, showing
the correlation elements and data paths, is shown in Figure 4‐2.
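As a small worked example of the baseline count, the sketch below computes the exact number of antenna pairs and lists the cells of a triangular correlation array. The function names are illustrative only, and the 3072‐element value is the dish‐WBSPF element count quoted in the caption of Figure 7‐1.

```python
def baseline_count(n_antennas: int, include_autos: bool = True) -> int:
    """Number of antenna pairs to correlate for n_antennas elements."""
    if n_antennas < 0:
        raise ValueError("n_antennas must be non-negative")
    n = n_antennas
    return n * (n + 1) // 2 if include_autos else n * (n - 1) // 2


def triangle_cells(n_antennas: int) -> list[tuple[int, int]]:
    """(x, y) antenna index pairs with x <= y: one cell per baseline, autos on the diagonal."""
    return [(x, y) for y in range(n_antennas) for x in range(y + 1)]


n = 3072
print(baseline_count(n))                       # 4720128, autos included
print(baseline_count(n, include_autos=False))  # 4717056, cross-correlations only
print(n * n // 2)                              # 4718592, the N^2/2 approximation
print(triangle_cells(3))  # [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2)]
```

For arrays of thousands of elements the N²/2 approximation differs from the exact count by well under 0.1%, so it is adequate for sizing purposes.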
[Figure 4‐1 block labels: for each antenna‐beam (Ant‐Beam‐1 to Ant‐Beam‐R), a Station‐beam F‐Processing Element containing Delay (BW‐MHz), Poly‐phase FFT Filterbank, and Channel‐Burst Buffer (BK x J array), with K outputs and INFO Packets; Packet Builder/Mergers for band/slice 1, band/slice 2, ... band/slice B, each feeding the corresponding correlator band/slice.]
Figure 4‐2 Cross‐correlation systolic array.
The required number of systolic array nodes is determined by the number of antennas that are to be
processed, and the number of baselines that can be correlated in each node. Thus, for a very large
number of antennas, processing as many baselines as possible in each node is advantageous. Also,
each array node can be hierarchical in nature; in the baseline GSA concept a top-level node is a
circuit board, and it consists of an array of sub-nodes, each of which is implemented in an ASIC.
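The node count described above can be sketched numerically. The following is a minimal illustration only, using the figures quoted in section 4.2.1 (3072 dishes, 32 x 32 antennas per ASIC, an 8x8 array of ASICs per board); the function names are hypothetical and the triangle count assumes the diagonal nodes are included.

```python
import math
from typing import Tuple

def baselines(n_ant: int) -> int:
    """Antenna pairs including auto-correlations: N(N+1)/2."""
    return n_ant * (n_ant + 1) // 2

def triangle_nodes(n_ant: int, ants_per_node_side: int) -> Tuple[int, int]:
    """Side length of the node triangle, and nodes in it (diagonal included)."""
    side = math.ceil(n_ant / ants_per_node_side)
    return side, side * (side + 1) // 2

n = 3072                          # SKA dishes, as quoted in section 4.2.1
print(baselines(n))               # total baselines to correlate
print(triangle_nodes(n, 32))      # ASIC-level nodes (32 x 32 antennas each)
print(triangle_nodes(n, 8 * 32))  # board-level nodes (8 x 8 ASICs each)
```

The board-level result reproduces the 12 x 12 triangle quoted later for the baseline X421 design.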
Assuming that the entire matrix does not fit on a single convenient entity (circuit board), one way of
connecting adjacent circuit boards together to form the entire matrix is shown in Figure 4‐3. In this
scheme, the boards are mounted horizontally in adjacent racks, facilitating horizontal and vertical
nearest‐neighbour connections. The bold BLUE and RED lines in the figure show data flow for one
particular set of antenna data through the array. The end of this data flow (i.e. racks on the ends of
the matrix), is where F‐part data is accessed for central beam‐forming. Central beam‐forming
hardware can then reside in equipment in additional racks on either end.
The number of band/slice correlation matrices/triangles which can fit in a rack is determined by the
vertical occupancy of the board, likely 1U height, and by the number of racks that can be stacked
side‐by‐side, fundamentally only limited by the mechanical design of the installation. Typically, 48U
of vertical space could be made available, allowing for a 48 x 48 matrix, or many smaller such
band/slice matrices within a smaller adjacent rack footprint.
Figure 4‐3 GSA board arrangement with short, nearest‐neighbour connections both horizontally between
racks, and vertically within the same rack.
4.2.1 Baseline GSA Correlator Board (Dishes)
The baseline X421 ASIC design provides for 1024 baselines (32 x 32 antennas), full polarization, 2048
channels per product, at up to 75 MHz per polarization. There are 4 x 10G (see footnote 3) ‘X’ inputs
and 4 x 10G ‘Y’ inputs, each input containing real-time spectral channel data from 8 antennas in the
manner outlined in section 4.1. The device has very few I/Os (~72 signal contacts plus power and GND),
requires no external chips other than filtering capacitors and power supplies, and will likely fit
into a 27x27 mm F672 BGA package (see footnote 4). It is no stretch to consider that an 8x8 array of
such chips could easily fit on a board which can mount horizontally in a sub-rack (crate, shelf),
which then mounts in a 19” rack. (Alternatively, it may be more cost-effective to have 4X capacity on
a chip, and a 4 x 4 array of chips on a board.) A preliminary layout of such a board, which might be
used as one element in a 12x12 systolic array (see Figure 4-2), allowing for cross-correlation of 3072
SKA dishes, is shown in Figure 4-4. The layout allocates room for a 5 mm gap between chips on the
board, needed for SM-rework.
With 64 such chips on a board, there are a total of 64 x ~72 ≈ 4600 signal solder contacts; with a
raw manufacturing solder-joint defect rate of 50 DPMO (Defects Per Million Opportunities; see
footnote 5), there are ~0.2 defects per board. Normally though, these defects are not uniformly
distributed, so in practice few boards should actually experience defects, and those that do tend to
have multiple defects. By comparison, the EVLA Baseline Board had ~100k solder points (total; ~50k
signal solder points), and had a raw manufacturing yield of about 75%. With ~1/10th as many solder
joints, the raw yield should therefore be quite high, and the aggregate solder-joint failure rate
correspondingly low.
3 ‘10G’ is 10 Gigabits per second clear channel on a single differential pair, likely 64/66B encoding for an actual line rate of 10.3125 Gbps.
4 Up to a 35x35 mm package could be accommodated within the 19” rack-mount board width.
5 As per the EVLA experience, 50 DPMO is considered “standard” in the industry for a well-controlled solder process.
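The ~0.2 defects-per-board estimate can be reproduced as follows. This is a sketch only: the defect-free fraction assumes independent (Poisson) defects, which is an assumption of this example rather than of the document. If defects cluster as noted above, the defect-free fraction would be expected to be higher than this figure.

```python
import math

SIGNAL_CONTACTS_PER_CHIP = 72   # "~72 signal contacts" per X421
CHIPS_PER_BOARD = 64            # 8 x 8 array
DPMO = 50                       # defects per million opportunities

def expected_defects(contacts: int, dpmo: float) -> float:
    """Mean number of solder defects for a given number of contacts."""
    return contacts * dpmo / 1e6

def defect_free_fraction(mean_defects: float) -> float:
    """Poisson P(0 defects); ASSUMES defects are independent."""
    return math.exp(-mean_defects)

contacts = SIGNAL_CONTACTS_PER_CHIP * CHIPS_PER_BOARD
mean = expected_defects(contacts, DPMO)
print(contacts, round(mean, 2), round(defect_free_fraction(mean), 2))
```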
32 x 10G pairs enter, are repeated, and exit the board via 40‐pair 0XT (“Zero X Tee”) connectors in
the horizontal and vertical directions in a nearest neighbour scheme (as shown in Figure 4‐3). 0XT
connectors are similar to HM‐Zd connectors, except they achieve higher performance with surface‐
mount signal contacts and press fit ground contacts, rather than press‐fit signal contacts (and the
resulting short transmission‐line stubs which affect signal quality). An information brief on the 0XT
connector is shown in Figure 4‐5. Further information on the 0XT connector can be found in [5] and
[6].
Figure 4‐4 8x8 array of X421 correlator ASICs on a 19” rack‐mount board. 27 x 27 mm F672 devices are
shown, with 5 mm spacing between chips. The 10G signal “Repeaters” are likely FPGAs, or hardcopy FPGAs
with enough MGTs to satisfy the requirements. These devices can also be used to stimulate and standalone
test the board.
[Figure 4-4 labels: board approximately 16.5" x 15"; 8 x 8 array of X421 ASICs with links labelled
4 between chips; Repeaters in the horizontal and vertical directions, each with 40-pair 0XT (IN) and
40-pair 0XT (OUT) connectors (32 pairs) and a -48 PWR + M&C connection; Power Supply blocks; M&C FPGA
with M&C SFP; LEDs; JTAG; Data Output Transport; Data Output Connectors.]
There is a single M&C (Monitor and Control) FPGA on the board, which communicates with the
outside world via (for example, but not restricted to) a 1G Ethernet SFP. For wiring simplicity, all
communications between the M&C FPGA and the X421 chips are via 8 I2C busses.
Note that in memo 127 [1] there is a low‐latency 10G repeater and globally distributed reference
clock. It is believed that with a small matrix (i.e. ≤ 12 x 12) this specialized repeater is not necessary
and standard asynchronous rate‐matching methods can be used, eliminating the need for a globally‐
distributed reference clock. Also, the fault tolerant passive‐bypass method in memo 127 is likely
problematic from a signal integrity standpoint, and unnecessary with so few signal repeats.
Figure 4‐5 Information brief on 0XT connectors. (a) is a picture of the right‐angle PCB‐mount connector. (b)
shows the surface‐mount solder tails on signal lines, which micro‐via down to the next layer if necessary. (c)
is a 10G eye diagram of a signal going across the connector. (d) is the mating connector, which would be
installed in the interconnect board; the mid‐plane connector consists of two of these, back‐to‐back.
The “Data Output Transport” and “Data Output Connectors” in Figure 4-4 are not defined in the
figure, but some ideas on what form these might take are given in section 4.3. Note that due to
the large memory capacity of the X421 chip, each chip has 4 x 10G data output lanes designed to be
operated in a daisy-chain fashion with other chips in a column. If these lanes are daisy-chained
amongst all 8 chips in a column, then the smallest integration time that can be accommodated (on
every product) is ~170 msec, requiring 32 x 10G outputs (see footnote 6). If 85 msec integration
times are required (daisy-chaining 2 groups of 4 chips in a column), there will be 64 x 10G outputs,
matching the data input rate into the board(!). A more dynamic scheme could achieve higher dump rate
performance by adjusting the output word width based on integration time. This parameter space
remains to be explored, and is not part of the baseline X421 specification. Nevertheless, even with
a 16-bit visibility word size, the output data rates are still enormous.
6 For 32-bit (64-bit complex) spectral visibilities, with a data valid count per spectral visibility to allow for per-channel temporal RFI flagging.
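A rough check of the dump-rate figures is sketched below. Several inputs are assumptions of this example, not of the document: “full polarization” is taken to mean 4 polarization products, the data valid count is taken to be 32 bits wide, and link and packet overheads are ignored. Under those assumptions the result lands near, but not exactly on, the ~170 msec and ~85 msec values quoted above.

```python
def min_integration_s(chips_in_chain: int,
                      baselines: int = 1024,      # per X421
                      pol_products: int = 4,      # ASSUMED meaning of "full polarization"
                      channels: int = 2048,       # per product
                      bits_per_vis: int = 64,     # 64-bit complex visibility
                      bits_valid_count: int = 32, # ASSUMED width of data valid count
                      lanes: int = 4,
                      lane_bps: float = 10e9) -> float:
    """Time to drain one full set of products from a daisy chain of chips."""
    words = chips_in_chain * baselines * pol_products * channels
    return words * (bits_per_vis + bits_valid_count) / (lanes * lane_bps)

def board_outputs(chips_in_chain: int, columns: int = 8,
                  rows: int = 8, lanes: int = 4) -> int:
    """Number of 10G output lanes leaving the board."""
    return columns * (rows // chips_in_chain) * lanes

for chain in (8, 4):
    print(chain, board_outputs(chain), round(min_integration_s(chain) * 1e3, 1), "ms")
```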
4.2.2 Board‐to‐Board Interconnections (Dishes)
As indicated in Figure 4‐3, with horizontal board mounting, adjacent boards can be connected to
each other in a nearest-neighbour fashion. One way of achieving this is with short cables, but these
are quite expensive because of the performance required and the number of connections and
wire-to-connector contacts that must be manufactured. Perhaps a better way is to use “printed wiring connections” in
the form of small “patch boards” (passive PCBs with press‐fit or SM connectors), which plug into the
rear of mid‐planes, into which the correlator PCB is itself plugged in via the front. Thus, all boards
are connected together without the need for expensive cables. (A similar patch board scheme was
used for interconnecting adjacent boards in the EVLA correlator; the board was larger and contained
more connections than required here, and cost ~$75 ea. in small quantities, ~10X savings in cost
over cable.)
Figure 4‐6 shows one small section of the rear of the racks indicating the basic layout of the mid‐
planes and interconnect patch boards.
Figure 4‐6 Rear view of rack showing mid‐planes, horizontal, and vertical “printed wiring” patch boards,
eliminating interconnect cabling.
This interconnect scheme requires higher-precision mechanics between adjacent racks than is normally
available in COTS 19” rack-mount systems, likely requiring custom-engineered racks and sub-racks.
It is also possible to provide a range of mid-plane and patch board granularities to simplify and
speed up installation. The population of each rack in terms of correlator boards and mid-planes is
identical; which part of the correlation triangle the rack occupies is set by the specific population of
the interconnection patch boards.
[Figure 4-6 labels: rack support columns; HM-Zd mid-planes with paired OUT and IN connectors;
horizontal patch boards; vertical patch boards; direction-turner patch boards; input from F-part;
vertical stiffening bar; 48 V power studs; remote power M&C studs.]
For cross-correlation of 3072 elements, a 12 x 12 correlation triangle of boards is required (for the
baseline X421 chip design, and the board layout shown in Figure 4-4). Dual correlation systolic array
matrices (triangles) can be fitted back-to-back, and so in a 13U x 12-rack space, 3072 antennas can
be correlated at 150 MHz per polarization and 4096 channels per polarization product. If another
column of chips is added to the board so that the board itself may be split into two correlation
triangles for boards along the diagonal (one triangle for one band/slice and the other triangle for
the other band/slice), then a 12U x 12-rack space is required, and 12 boards are saved, although
likely at the expense of having to use interconnect cables rather than patch boards for the shared
boards along the diagonal of the triangle. In such an arrangement, with 48U of vertical rack space
available, 4 such dual-triangle arrays can be accommodated, for a total bandwidth of 600 MHz and
16,384 channels per polarization product.
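The rack-space arithmetic above can be sketched as follows. The function names are hypothetical, and the sketch assumes 1U per board and one board per rack per U, as suggested earlier in section 4.2.

```python
def triangle_boards(side: int) -> int:
    """Boards in one correlation triangle, diagonal included."""
    return side * (side + 1) // 2

def dual_triangle(side: int, share_diagonal: bool) -> tuple:
    """(boards, rack units) for two band/slice triangles fitted back-to-back."""
    boards = 2 * triangle_boards(side)
    if share_diagonal:
        boards -= side           # each diagonal board serves both triangles
    return boards, boards // side    # ASSUMES 1U boards spread over `side` racks

print(dual_triangle(12, share_diagonal=False))   # separate diagonals
boards, units = dual_triangle(12, share_diagonal=True)
arrays = 48 // units                             # dual-triangle arrays in 48U
print(boards, units, arrays, arrays * 150, arrays * 4096)
```

With the shared diagonal this gives 4 arrays in 48U, and hence the 600 MHz and 16,384 channels quoted above.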
This split also optimizes the board for cross‐correlation of up to 256 antennas, allowing the board to
be used for SAA and DAA correlation. The basic chip layout of the board, not showing the board
outline, signal repeaters, or output connectors is shown in Figure 4‐7.
Figure 4‐7 9 column x 8 row chip layout on the board to allow the board to be split into two independent
correlation triangles.
Note that the baseline X421 chip design supports the necessary data path switching as indicated in
Figure 4‐7.
[Figure 4-7 labels: 40-pair 0XT connectors; 32 x 10G and 4 x 10G links; “256 antennas, 2 bands of
50 MHz/pol’n per board”.]
Although the same board design could be used for dishes, SAAs, and DAAs, in cases where the left‐
most column of Figure 4‐7 is unused, those chips are wasted. It might therefore be more prudent to
have two board designs, one with an 8x8 array of chips and one with a 9x8 array of chips.
4.2.3 SAA and DAA Board Interconnections
For the SAA and DAA application where there are ≤256 elements to cross‐correlate, there is no need
for any board‐to‐board interconnection. Each board (Figure 4‐7) contains two independent
band/slice 256‐element cross‐correlators, each of which (with the baseline X421 chip) is capable of
75 MHz per polarization, 2048 channels per polarization product. Thus, boards can be populated in
a rack in either a vertical or horizontal orientation, with point‐to‐point connections from the F‐part
to correlator boards in the X‐part.
The total number of boards required is nbeams x (bandwidth per beam-polarization) / 150 MHz. For
example, for 1000 beams and 300 MHz/polarization, 2000 boards are required. Assuming the
horizontal mounting strategy and 48 boards per rack, 42 racks are required. If not enough spectral
channels are available in “normal” operating mode, the method described in section 4.5.3 could be
used to obtain more channels, but with longer integration times as the output data volume per
integration increases.
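The board and rack counts in the example above follow directly from the formula. A minimal sketch (function names are illustrative only):

```python
import math

def saa_daa_boards(n_beams: int, bw_mhz_per_pol: float,
                   board_bw_mhz: float = 150.0) -> int:
    """Boards needed: nbeams x bandwidth per beam-polarization / 150 MHz."""
    return math.ceil(n_beams * bw_mhz_per_pol / board_bw_mhz)

def racks_needed(boards: int, boards_per_rack: int = 48) -> int:
    """Racks needed with horizontal mounting, rounded up to whole racks."""
    return math.ceil(boards / boards_per_rack)

boards = saa_daa_boards(1000, 300.0)
print(boards, racks_needed(boards))
```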
4.2.4 Discussion—System Effects of ASIC Capacity
If the ASIC capacity is instead such that it processes 4k baselines, 1024 spectral channels per
product, with 16 antennas multiplexed onto one 10G data stream (an ‘X414’ design), and ½ the
bandwidth (37.5 MHz/polarization) then the correlation triangle for 3072 antennas is reduced to 6 x
6, or 18 boards, which might fit into a larger‐than‐19” sub‐rack, eliminating the need for horizontal
board mounting and requiring only a monolithic backplane per sub‐rack to distribute real‐time data.
Or, the horizontal mounting scheme could still be used, and two such triangles (75 MHz per
polarization) would fit in a 6 x 6 arrangement of 36 boards. Therefore, 150 MHz per polarization
for 3072 antennas would fit into 72 boards, or ½ the 12 x 12 board requirement for the X421 design, a
saving of a factor of 2. The total memory capacity, for the same spectral resolution, however,
remains the same. This assumes that that much memory (2X the X421) can fit on the chip, or that
the channel‐burst factor ‘J’ is significantly increased and off‐chip RAM is used, complicating the
board design and impacting board yield and reliability.
In this case, the ASIC memory capacity is double the X421 design, and all of the multiply/accumulate
(MAC) logic runs at 156.25 MHz. This is very slow; running the logic at 625 MHz (the X421 runs at
312.5 MHz, requiring very little if any pipelining, itself a savings in power and area) would allow a
time‐multiplexing of baselines by a factor of 4, reducing the MAC logic by a factor of 4 (so that it is
the same as the X421 design). This is a more complicated design, and would require that a factor of
4X the number of correlated data products exit each board. As more antennas are multiplexed into
a 10G link, the architecture essentially morphs into that proposed in [2].
For the SAA and DAA, an ‘X414’ design would then allow for 4X capacity on each board, at ½ the
bandwidth for similar overall board count savings.
4.3 Correlator to Image Processor Data Transport—Visibility‐Based Addressing
The X421 ASIC baseline design contains capabilities to help facilitate real-time image processing.
In particular, it calculates and includes in the output data frame u, v, and w for the time and
channel centroid of a correlated data packet, and allows for epoch-less baseline-based integration
times, as well as baseline-based channel integration. These facilities are based on the notion that
shorter baselines can allow for longer integration times and fewer spectral channels (for continuum
observations), with linearly or non-linearly decreasing integration times and increasing numbers of
spectral channels on longer baselines. RAM capacity in the chip is sufficient to allow for 20 second
on-chip accumulation, eliminating the need for a separate LTA (see footnote 7). The goal is to
include in the output data packet as much complete information as possible for the image processing
computers.
To deal specifically with correlator to image processor data transport, an internal correlator network
is envisioned, using “visibility‐based addressing” (VBA). The goal of VBA is to transport correlated
data packets to only those image processing computers that need those products for gridding. This
method assumes that a gridding CPU does not need all of the visibilities all of the time (and indeed
needs all of the visibilities only a small fraction of the time), and that it can highly compress the data
rates as it grids‐and‐integrates, passing data at relatively low data rates on to 2D‐FFT, self‐cal, CLEAN
etc.
In the proposed VBA scheme, the correlator handles correlated data distribution, rather than
handling it in a generic COTS network. For the simple case where a gridding CPU produces exactly 1
gridded output point, there is some radius (or region) of “visibility capture” for the CPU; all raw
visibilities within that capture region must get transported to that CPU. The next CPU, producing the
next gridded point, has a capture region which might overlap with the first CPU. Therefore, there
can be multiple gridding CPU destinations for a particular correlator data product. This concept is
shown in Figure 4‐8.
7 Although perhaps a separate on-board LTA is more efficient. This parameter space remains to be explored.
Figure 4‐8 Visibility capture regions for two output grid points.
The visibility capture radius shown in Figure 4‐8 is meant to be general, and not indicate any
quantitative requirement or limitation. (The importance of “w” is not being ignored; it is assumed
that any effect of “w” on packet routing can be incorporated by appropriately adjusting the u,v‐
based capture region.) Of course, the larger the visibility capture region for a grid point, the larger
the effective correlator output data rate, as identical output packets must travel to multiple
destinations.
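The capture-region idea can be illustrated with a short sketch. It assumes circular capture regions and ignores “w”; the class and function names are hypothetical and the coordinates are arbitrary illustrative values.

```python
from dataclasses import dataclass
import math
from typing import List, Sequence

@dataclass(frozen=True)
class CaptureRegion:
    cpu_id: int   # gridding CPU that owns this grid point
    u: float      # grid point centre
    v: float
    r: float      # visibility capture radius (ASSUMED circular region)

def destinations(u: float, v: float,
                 regions: Sequence[CaptureRegion]) -> List[int]:
    """CPUs whose capture region contains the raw visibility at (u, v)."""
    return [reg.cpu_id for reg in regions
            if math.hypot(u - reg.u, v - reg.v) <= reg.r]

regions = [CaptureRegion(1, 0.0, 0.0, 1.5), CaptureRegion(2, 2.0, 0.0, 1.5)]
print(destinations(1.0, 0.0, regions))    # inside both overlapping regions
print(destinations(-1.0, 0.0, regions))   # inside the first region only
print(destinations(5.0, 5.0, regions))    # outside every region
```

A visibility falling in the overlap is sent to both CPUs, which is why larger capture regions raise the effective output data rate.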
If a gridding CPU handles/generates more than one grid point, say a square region of grid points, the
visibility capture region for the CPU is also roughly a square, as indicated in Figure 4‐9 below.
Figure 4‐9 Visibility capture region for a CPU producing a square area of grid points.
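[Figure 4-8 labels: grid points (U1,V1) and (U2,V2) with capture radii r1 and r2; raw visibilities
from the correlator shown against time; grid point U1,V1 raw visibility capture region; grid point
U2,V2 raw visibility capture region. Figure 4-9 labels: corner points (U1,V1), (U2,V2), (U3,V3),
(U4,V4).]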
The assignment of grid points to CPUs does not have to be uniform. It can be based on the expected
number of raw visibilities within those grid points’ capture region, and the chosen baseline‐based
integration time, so that gridding CPU load balancing can be achieved.
To actually implement VBA, consider the following simple configuration. Each correlator board has a
single high‐speed network connection to a CPU. “The CPU” could be a single core processor, a multi‐
core processor with shared memory, or multiple independent processors with a gateway node
directing packets to multiple destination processors. Within the correlator, there are full‐duplex
nearest neighbour 2D connections between correlator boards. One or more controllers on each
correlator board follow a simple algorithm to route internally‐correlated packets, based on u and v,
directly to the “native” gridding CPU, or to a nearest neighbour board. Also, incoming “non‐native”
packets get routed to the gridding CPU, or to another nearest neighbour board on their way to a
destination gridding CPU.
Figure 4‐10 is a simple example showing routing paths from source correlator boards, to one
destination correlator board’s gridding CPU, for the 12x12 correlator matrix used with the X421 chip
for dish correlation up to 3072 elements. In reality, it is likely the case that each correlator board
generates data packets destined for all other boards’ CPUs.
Figure 4‐10 12x12 correlator matrix showing VBA routing paths from source correlator boards (YELLOW) to
a destination correlator board/gridding CPU (MAGENTA).
The overlay of CPUs to grid points might be something like Figure 4‐11 below, although there isn’t a
perfect mapping of correlator boards to a square grid of CPUs (but nor does there need to be).
[Figure 4-10 labels: CPU/node numbers 1 to 78, one per board of the 12x12 correlation triangle.]
Figure 4-11 Possible overlay of gridding CPUs to grid points, although CPUs 79, 80, 81 do not exist.
[Figure 4-11 labels: a 9 x 9 grid of CPU numbers, 1 to 81, numbered row by row.]
Each correlator board has a finite bandwidth to its nearest neighbours, and so it will likely
sometimes be the case that, at a particular instant, transmission to the actual nearest neighbour is
not possible because the link is in use. In this case, the packet is transmitted to the next-best
nearest neighbour (i.e. re-routed), from where it will eventually find its way to the final
destination node. Clearly, there must be a “time to live” on the packet so that it doesn’t stay
cycling in the network for an indeterminate period of time, although there is no particular rush or
order in which packets must show up at gridding CPUs. Also, a particular packet can have multiple
destination gridding CPUs, and so the packet must, at the start of its journey, be tagged with a list
of all possible destinations; as the packet gets delivered, these destination tags are removed until
all tags are gone, at which point the packet dies.
The basic VBA algorithm is therefore as follows:
• Each node (correlator board packet router/controller) that connects to a CPU is assigned a
  CPU/node number (CNN), such that numbers are sequential in the manner shown in Figure 4-10 and
  Figure 4-11. This allows the packet to find its destination via the nearest-neighbour network in a
  straightforward manner.
• Each CNN has a (u,v,r2) (or (u1,v1; u2,v2; u3,v3; u4,v4)) tag indicating its u,v capture region.
• Each node keeps track of every CNN, and its associated u,v capture region. Call this the
  “CNN-TABLE”. This information is provided to nodes via CPUs broadcasting information packets to all
  nodes every so often.
• When a correlator node generates a (correlated) data packet, it looks in the CNN-TABLE and makes a
  list of every node (and ultimately gridding CPU) the packet must “visit”; call this the “CNN-LIST”.
  It appends (or pre-pends) this list of tags to the data packet, and sets a “time to live” on the
  packet, which is the maximum number of node transmissions that will be allowed until the packet is
  killed.
• The node then transmits the packet out to a nearest neighbour node which has an available channel
  and is closest to the closest node in the CNN-LIST.
• Once a packet is transmitted to a gridding CPU in the CNN-LIST, the entry (tag) in the CNN-LIST is
  deleted.
• Continue on until there are no CPUs left in the CNN-LIST, or until the “time to live” has expired,
  at which point the packet dies.
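The algorithm above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the proposed implementation: nodes are laid out on a plain rectangular grid numbered row by row (9 per row, 78 nodes), distance is counted in hops, and a busy link is modelled by a caller-supplied `link_free` predicate. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

WIDTH = 9      # nodes per row (ASSUMED, after the 9 x 9 numbering of Figure 4-11)
N_NODES = 78   # nodes 79, 80 and 81 do not exist

def coords(cnn: int) -> Tuple[int, int]:
    """(row, column) of a CPU/node number in the assumed numbering."""
    return divmod(cnn - 1, WIDTH)

def distance(a: int, b: int) -> int:
    """Hop count between two nodes on the nearest-neighbour grid."""
    (ra, ca), (rb, cb) = coords(a), coords(b)
    return abs(ra - rb) + abs(ca - cb)

def neighbours(cnn: int) -> List[int]:
    """Existing nodes directly above, below, left and right of `cnn`."""
    row, col = coords(cnn)
    found = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if r >= 0 and 0 <= c < WIDTH and r * WIDTH + c + 1 <= N_NODES:
            found.append(r * WIDTH + c + 1)
    return found

def route(source: int, cnn_list: Sequence[int], ttl: int,
          link_free: Callable[[int, int], bool] = lambda a, b: True):
    """Return (path taken, CPUs delivered to) for one correlated data packet."""
    pending = set(cnn_list)
    delivered = []
    node, path = source, [source]
    while True:
        if node in pending:            # deliver to this node's gridding CPU
            pending.remove(node)       # ...and delete its tag from the CNN-LIST
            delivered.append(node)
        if not pending or ttl == 0:    # all tags gone, or time to live expired
            return path, delivered
        options = [n for n in neighbours(node) if link_free(node, n)]
        if not options:                # no usable link: the packet dies here
            return path, delivered
        target = min(sorted(pending), key=lambda d: distance(node, d))
        node = min(options, key=lambda n: distance(n, target))
        path.append(node)
        ttl -= 1                       # one node transmission used up

print(route(source=1, cnn_list=[3, 21], ttl=10))
print(route(source=1, cnn_list=[21], ttl=2))    # expires before delivery
```

Passing a `link_free` function that returns False for some links gives the re-routing to the next-best neighbour described above.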
Possible data paths on the board, to nearest‐neighbour boards, and to gridding CPUs are shown in
Figure 4‐12 below. Only correlated packet data paths are shown in the figure. 4 x 10G connections
from each column of X421 ASICs are good enough for 170 msec integration times, with possible
improvement (decrease) by adjusting word sizes based on integration time, as previously
mentioned.
The nearest‐neighbour network bandwidth in the figure is double the native output bandwidth. This
is a first guess as to the actual requirement, which depends on the u,v capture region and the
distribution of short (long integration) and long (short integration) baselines to correlator nodes.
This is something of a difficult problem to analyse; network traffic simulations will likely be required
to study how it behaves, and what nearest‐neighbour bandwidth is actually required.
Figure 4‐12 Correlator board output data path routing to nearest neighbour boards, and to gridding CPUs.
The nearest‐neighbour network has 2X the bandwidth of native output data bandwidth, a somewhat
arbitrary choice at this point.
[Figure 4-12 labels: columns of X421 ASICs; two Router FPGAs or ASICs; links labelled 2, 4 and 8;
10G and 100G link labels; “To CPU” outputs; ports labelled T, B, L and R.]
The implementation of the (front‐panel) nearest neighbour network will greatly impact its cost. One
way of doing it is to use a similar approach to data distribution as is used for real‐time data at the
rear of the board. HM‐Zd or 0XT connectors poke out through the front panel, and, once boards are
in place, connect to each other using identical patch boards as used at the rear, except without the
need for mid‐planes. The front panel of the board might look like that shown in Figure 4‐13.
Figure 4‐13 Possible front‐panel of board, with connectors for nearest‐neighbour VBA network connections.
A slice of the correlator showing patch-board nearest neighbour connections is shown in Figure 4-14.
The dark dividing line demarcates the interface between the two band/slice correlators in the rack.
There are no network connections between different band/slice correlators (see footnote 8).
Figure 4‐14 Section of two racks, showing inexpensive VBA nearest‐neighbour wideband network
connections, using identical patch boards as used at the rear of the board/rack. Quad 100G fibers from each
board then route packets to gridding CPUs in a point‐to‐point fashion.
VBA addressing has nicely collated the data packets so that there are point‐to‐point connections
from the correlator boards to the gridding CPUs. However, there are still a huge number of 100G
fiber connections required to the CPUs!
The natural progression, it seems, is to subsume the gridding CPUs into the correlator. One way this
might be done is shown in Figure 4‐15.
8 Unless boards along the diagonal are a 9x8 array of chips to optimally use board resources.
[Figures 4-13 and 4-14 front-panel labels, repeated for each board: “GSA X Board (64k baselines)
Ver. 2.1”; 1G M&C; connectors T, B, L and R; 32 10G pairs (twice); 16 10G pairs (twice); 2 x 100G
fiber to CPU (twice).]
Figure 4‐15 Front region of correlator board, and front panel, with gridding CPUs subsumed into the
correlator in the form of “Field Replaceable Processor Packs”.
Dual FRPP (“Field Replaceable Processor Packs”) are installed into slots in the front of each board,
and it is in these CPUs where gridding occurs. Final integrated grid points are then transmitted to
final image processing CPUs via the 1G (or 10G) M&C network (assuming the data rate is vastly
reduced), so that there is only one external network connection to each board. The final installed
rack front view will be similar to Figure 4‐14, except there is only one network cable travelling to
each board, rather than several. Thermal issues are also a concern with this approach, and will likely
be the governing factor as to whether subsuming the gridding CPUs into the correlator is feasible or
not.
A survey of existing CPU mezzanine card standards and connectors did not reveal anything suitable
in terms of assumed CPU horsepower requirements, form-factor, and bandwidth of the mezzanine
connector to the motherboard. Thus, a custom form-factor might be required, or time might have to
elapse until a standard suitably matched to the task becomes available. A 40-pair 0XT or HM-Zd
connector would fit the bill as far as bandwidth goes (400 Gbps). The CPU pack should be field
replaceable so that it can be easily upgraded without having to re-design the board, remove the
board from the rack, or power it down.
[Figure 4-15 labels: front region of a “GSA X Board (64k baselines) Ver. 2.1” with two Field
Replaceable Processor Packs, each with Power and 40-pair 0XT connectors and a front-panel status
display (“FRPP STATUS: ACTIVE, S/W V2.53”); two Router FPGAs or ASICs; M&C FPGA with RJ-45 1/10G
Ethernet; front-panel connectors T, B, L and R with 32 10G pairs and 16 10G pairs; X421 ASICs.]
4.3.1 Discussion
The proposed VBA scheme is likely not required for the SAA or DAA correlator, as each board
generates all of the cross-correlation products for a band/slice-beam, and so all of the data is
available in one cluster of output spigots. At most, SAA and DAA might require distribution nodes to
route correlated data packets to multiple CPUs for gridding, assuming one CPU (and by “CPU” this
could mean multi-core CPU) can’t handle the full load for that band/slice-beam, and there is no
overlap (correlated data product sharing requirement) between band/slice-beams.
For dish correlation, as the capacity of the ‘X’-ASIC increases as discussed in section 4.2.4, the
number of boards required to perform cross-correlations decreases, and output “bandwidth
density” increases. This could change the morphology of the VBA, and render the currently
proposed scheme obsolete. Nevertheless, it seems likely that there will always need to be some way
of splitting up the gridding task across multiple processors, either in u-v space or in time.
Clearly, due to instrumental and atmospheric effects, it is not possible to grid data points blindly
in open‐loop fashion. It is assumed that on some regular basis and for some short period of time, all of the
visibilities must get to a single CPU where iterative gridding and image processing are performed to
determine calibration coefficients that can then be applied, for some longer period of time, to
open‐loop‐grid the data. In this case, the VBA network does not have to change. The algorithm presented
on page 24 is just augmented with the ability to tell each correlator board/packet‐router that for a
specific period of time, all packets are routed to a particular gridding CPU (or CPU hanging off a
particular gridding CPU). In this way the iterative image processing for calibration coefficient
generation can be time‐multiplexed across all available CPUs. Calibration coefficients could then
be distributed to all gridding CPUs using the VBA network, or a separate network. Part and parcel
with this scheme is likely the need for each gridding CPU to buffer large amounts of incoming data
until calibration coefficients become available. To minimize the duty cycle and frequency with which
calibration coefficients are calculated, it is prudent to try to ensure that all SKA elements are
designed for the longest‐term stability possible.
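The time‐multiplexed calibration routing described above can be illustrated with a minimal sketch. It is not the algorithm of page 24; the function name, the period and window lengths, and the round‐robin choice of calibration CPU are all hypothetical, chosen only to show the idea that during a short window every packet goes to one CPU and that this CPU changes from one calibration period to the next.

```python
def route_target(t_sec, normal_cpu, n_cpus,
                 cal_period_sec=600.0, cal_window_sec=10.0):
    """Return the gridding CPU index for a packet stamped t_sec.

    All parameters are illustrative assumptions: cal_period_sec is how
    often a calibration window opens, cal_window_sec is how long it lasts.
    """
    slot, offset = divmod(t_sec, cal_period_sec)
    if offset < cal_window_sec:
        # Calibration window: all boards send all packets to one CPU,
        # and the chosen CPU rotates from one period to the next.
        return int(slot) % n_cpus
    # Outside the window the normal (e.g. u-v based) destination is kept.
    return normal_cpu


if __name__ == "__main__":
    for t in (5.0, 300.0, 605.0, 1205.0):
        print(t, route_target(t, normal_cpu=3, n_cpus=8))
```

Rotating the calibration CPU spreads the iterative processing load over all gridding CPUs, at the cost of each CPU buffering its normal traffic until new coefficients arrive.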
4.4 Central Beam‐former Data Access
Details of the central beam‐former concept are contained in a separate document. However, some
points about how to access the data are as follows:
For dishes, each band/slice F‐part output data is accessed at either edge of the matrix
shown in Figure 4‐3.
For dishes, racks at either end of the band/slice correlation matrix perform beam‐forming
operations in a hierarchical manner.
What comes out of these racks are spigots of band/slice beams, which can be merged with
other band/slice beam outputs (i.e. same beam, different spectral channels) in centralized
equipment contained in separate racks, with final output from each beam, all spectral
channels, routed to non‐visibility processing equipment.
For SAAs and DAAs, all signals necessary to beam‐form a band/slice are available on one
board, and so central beam‐former signal processing is most likely contained on the same
circuit board as the correlator. Band/slice spigots are then merged in centralized equipment
contained in separate racks as for dishes, with final output from each beam, all spectral
channels, routed to non‐visibility processing equipment.
4.5 Phase‐I SKA
The concept design presented in the preceding sections was developed for the full‐scale SKA
consisting of several thousand dishes (WBSPF or PAFs), and up to 256 SAA and DAA elements. This
section briefly describes how the design can be down‐scaled to the Phase‐I SKA requirements, as
defined in memo 130 [7]. However, it is believed that the most cost‐effective Phase‐I
implementation is with FPGA boards using COTS infrastructure, as the scale of Phase‐I pales in
comparison to Phase‐II.
4.5.1 Phase‐I F‐Part
The F‐part of the correlator, whether for SAAs, DAAs, or dishes, is no different for Phase‐I than it is
for the full‐scale SKA, as the design requires only a partial distributed corner‐turner, rather than a
monolithic corner‐turner. Phase‐I F‐part is 1/10th of the full‐scale problem, and so it is likely the case
that FPGAs or “hard‐copy” FPGAs are used in the implementation.
4.5.2 Phase‐I X‐Part: Dishes
The full‐scale SKA dishes X‐part is ~100X the size (processing, data handling) of the Phase‐I SKA
dishes X‐part. For ≤256 dishes, with 1 GHz/polarization (memo 130, Table 1), each X421 correlator
board (Figure 4‐7) can correlate 150 MHz/polarization, and so only 7 boards are required. However,
with the X421 correlator ASIC, these 7 boards produce only 28,672 channels per polarization product
across that bandwidth, whereas the memo 130 specification calls for 67,000 channels per
polarization product. Thus, between 2 and 3 times as many channels (67,000/28,672 ≈ 2.3) are required to fulfil the
requirement. In this case an X421 ASIC, processing a single band/slice of 25 MHz/polarization,
operates at lower bandwidth and at a lower (33%) incoming packet duty cycle. By buffering
and re‐arranging the data, and eliminating baseline‐based time and channel integration⁹, it is
possible to use the same number of boards and chips to produce the required numbers of spectral
channels, as will be shown in the next section.
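The board count and channel shortfall quoted above follow from simple arithmetic, sketched here with the figures given in the text. The variable names are illustrative only.

```python
import math

total_bw_mhz = 1000.0         # 1 GHz/polarization (memo 130, Table 1)
board_bw_mhz = 150.0          # bandwidth correlated by one X421 board
channels_available = 28_672   # channels/polarization product from 7 boards
channels_required = 67_000    # memo 130 specification

# Boards needed to cover the bandwidth, rounded up to a whole board.
boards = math.ceil(total_bw_mhz / board_bw_mhz)

# Factor by which the standard channel count falls short.
shortfall = channels_required / channels_available

print(f"boards required: {boards}")
print(f"channel shortfall factor: {shortfall:.2f}")
```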
4.5.3 Phase‐I X‐Part: SAAs
For SAAs, Phase‐I calls for 50 elements, 480 beams, 380 MHz/polarization/beam, and 380k
channels/polarization/beam. Ignoring for a moment the 50 element requirement, and the mismatch
with the 32x32 element processing capacity of the X421 ASIC, the main challenge here is to optimally
produce the very large number of spectral channels, but hopefully within the internal RAM
capabilities of the chip.
The large number of spectral channels can be produced by the chip by breaking up the 75
MHz/polarization bandwidth capability into smaller ~2 MHz band/slices, each one with 2048
channels, and re‐arranging the data packets entering the chip to produce ~1 kHz channel resolution
across ~74 MHz. This re‐arrangement is shown in the following Figure 4‐16:
9 Already a feature of the chip design.
Figure 4‐16 Data re‐arrangement to produce ~1 kHz spectral resolution across ~74 MHz/polarization of
bandwidth, using the X421 chip. Accumulator double‐buffering in the chip allows one band/slice to be
correlating/integrating, while the other band/slice results are transmitted on the 4 x 10G outputs.
In this scheme each band/slice “Packet Group”, assuming a channel‐burst length of 64, 2048
channels, 8‐bit samples (4b‐I, 4b‐Q), and dual polarization, requires 2 Mbytes (64 ch_burst x 8 ants x
2 pol’n x 2048 channels x 1 byte/sample). For each band/slice, 18 such Packet Groups must be buffered, and there
are 37 band/slices, for a total of 2 M x 18 x 37 = 1.332 Gbytes, which is easily handled in the F‐part with
external DDR3 SDRAM (which must have 20 Gbps I/O; 64‐bit wide DDR3 SDRAM has ~50 Gbps I/O).
Every ~30 msec, 2048 channels/polarization product, for all baselines, for 2 MHz of bandwidth are
produced by the chip on the 4 x 10G outputs (recall that 8 chips in a column are capable of 170
msec, therefore 1 chip is capable of 170 msec/8 = 21.25 msec). This represents an (original) “sample
time” integration time of ~1.152 seconds, meeting the Phase‐I minimum integration time
requirement in SKA memo 130, Table 3, of 1.2 sec. For longer integration times, an off‐chip LTA can
be used.
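The buffer size and timing figures in this paragraph can be reproduced with the short calculation below. It is a sketch using the values quoted in the text and in the Figure 4‐16 labels; the 1 kHz channel width is the text's rounded value (2 MHz/2048 is closer to 977 Hz, which would give a slightly longer sample‐time integration of about 1.18 sec, still under 1.2 sec).

```python
# Values quoted in the text and in the Figure 4-16 labels.
CH_BURST = 64                   # samples per channel in one burst
ANTENNAS = 8                    # antennas multiplexed per Packet Group
POLARIZATIONS = 2
CHANNELS = 2048                 # channels per ~2 MHz band/slice
BYTES_PER_SAMPLE = 1            # 4b-I + 4b-Q
GROUPS_PER_SLICE = 18           # Packet Groups buffered per band/slice
BAND_SLICES = 37
REAL_TIME_PER_GROUP_MS = 1.68   # from the Figure 4-16 labels
CHANNEL_WIDTH_HZ = 1000.0       # rounded "~1 kHz" (assumption)

group_bytes = (CH_BURST * ANTENNAS * POLARIZATIONS
               * CHANNELS * BYTES_PER_SAMPLE)
buffer_bytes = group_bytes * GROUPS_PER_SLICE * BAND_SLICES
total_channels = CHANNELS * BAND_SLICES

# One burst of CH_BURST samples per channel spans this much sample time.
burst_sample_time_s = CH_BURST / CHANNEL_WIDTH_HZ
integration_sample_time_s = burst_sample_time_s * GROUPS_PER_SLICE
dump_interval_ms = REAL_TIME_PER_GROUP_MS * GROUPS_PER_SLICE

print(f"Packet Group size: {group_bytes / 2**20:.0f} Mbytes")
print(f"Total buffer: {buffer_bytes / 2**20:.0f} Mbytes")
print(f"Channels across all band/slices: {total_channels}")
print(f"Integration (sample time): {integration_sample_time_s:.3f} s")
print(f"Dump interval (real time): {dump_interval_ms:.2f} ms")
```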
When operating in this way, the X421 chip’s ability to perform baseline‐based integration is disabled;
however, the ability to label the output data frames with the u, v, and w centroid remains intact.
Baseline‐based integration could still be performed in an off‐chip LTA.
As the X421 chip array‐based design is a mismatch with the 50 element (1250 baselines) correlation
requirement, it might be possible to incorporate in the X421 design data paths with the capability of
performing an optimal 50‐element cross‐correlation. Perhaps there could be 5 x 10G inputs, each
with 10 antennas multiplexed instead of 8, for a bandwidth of 60 MHz/polarization instead, while
still processing either 1024 baselines in “array mode” or 1250 baselines in “cross‐correlation matrix
mode”. Or, a separate ASIC is developed just for the Phase‐I SAA application.
[Figure 4‐16 labels: Packet Groups of 8 antennas, 2 MHz BW, 2 pol’n, 4+4 bit samples, 2048 channels, ch_burst = 64. Each Packet Group spans ch_burst x 1/ch_width = 64 msec of sample time and 1.68 msec of real time; 18 Packet Groups per band/slice give 1.68 x 18 = 30.24 msec (real time) and 64 x 18 = 1.152 sec (sample time) between start and stop of integration. The buffer sends 2048 channels across 2 MHz of each of band/slices 1 to 37 in turn to the X421 ASIC, giving 37 x 2 MHz = 74 MHz BW/pol’n and 75,776 channels across 74 MHz.]
Assuming that the chip can be designed for 50 elements, 60 MHz/polarization, and operate as
described above to produce the required numbers of spectral channels, covering 380 MHz of
bandwidth then requires 380/60 ≈ 6.3, rounded up to 7 chips per beam. Seven chips, along with one or more external
LTAs (made up of FPGAs with external DDR3 SDRAM), could easily fit on one board (and possibly
even double the number of chips could fit on a board). It would therefore require 480 boards
(possibly 240 boards) to process all 480 beams. At 1U height each, and 48 boards in a rack,
approximately ten 19” racks (possibly 5 racks) would be required for the X‐part of the correlator for
Phase‐I SAAs.
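The chip, board and rack counts for the Phase‐I SAA X‐part can be checked with the following sketch. It uses only the numbers in the text; the function name and the `beams_per_board` parameter are illustrative, with 2 beams per board standing for the "double the number of chips" case.

```python
import math

BEAM_BW_MHZ = 380        # MHz/polarization/beam (Phase-I SAA)
CHIP_BW_MHZ = 60         # assumed 50-element chip bandwidth
BEAMS = 480
BOARDS_PER_RACK = 48     # 1U boards in a 48U rack

chips_per_beam = math.ceil(BEAM_BW_MHZ / CHIP_BW_MHZ)


def boards_and_racks(beams_per_board):
    """Return (boards, racks) for a given number of beams per board."""
    boards = math.ceil(BEAMS / beams_per_board)
    racks = math.ceil(boards / BOARDS_PER_RACK)
    return boards, racks


print("chips per beam:", chips_per_beam)
print("one beam per board:", boards_and_racks(1))
print("two beams per board:", boards_and_racks(2))
```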
4.5.4 Phase‐I Central Beam‐forming
As all of the band/slice data required for beam‐forming is present on each board, central‐beam‐
forming will likely occur on each correlator board, with merging of band/slice packets occurring in
separate equipment before final output to non‐visibility processing.
4.6 Monitor and Control
Referring to Figure 4‐4, each X‐part correlator board has an M&C 1G (or 10G) network connection to
a higher‐level centralized M&C computer via COTS network switches. Each board could have its own
unique “rack‐slot” ID, determined by hard settings in the mid‐plane slot, communicated to the on‐
board M&C FPGA, and forming part of the board’s IP address¹⁰.
The X421 chip is built to be largely autonomous in its operation, with all required commanding
coming from the F‐part of the correlator, via the content and arrival time of data packets and INFO
packets. M&C for the chip is required only for determining transceiver status, eDRAM status, PLL
status, frame detect/overrun status, etc. M&C for the board would include the ability to monitor
voltages and temperatures, and a method for remotely commanding power‐ON and power‐OFF of
the board¹¹.
10 A similar scheme is successfully used in the EVLA correlator; boards are largely known by their location, rather than serial number.
11 For the EVLA correlator, this was done by a separate pair of wires to each board running to a central computer. Another layer of hierarchy here might be more appropriate.
The F‐part of the correlator, of course, needs to synchronize FFT and packet build/buffer epochs
across the system (for each particular SKA element type). The F‐part inserts INFO packets within
each Packet Group, which are transmitted to the X‐part and include calculated array‐phase‐center‐based u,
v, and w, observation information, as well as integration time information and timestamp
information. The F‐part calculates u, v, w, from the delay model, the center frequency of the center
spectral channel, and the antenna coordinate w.r.t. the array phase center. The X421 chip is able to
operate in “epoch‐less” integration mode (a necessary condition if baseline‐based integration is
performed), and so start and stop integrations don’t have to be communicated across the F‐part of
the correlator. (If integrations must be synchronized across all baselines, then provision is made for
start/stop integrations in the X421 chip design.) Time must also be distributed to all F‐part hardware
for timestamping of INFO frames and synchronization.
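As an illustration of the antenna‐based u, v, w calculation mentioned above, the sketch below uses the standard textbook transformation from antenna coordinates (relative to the array phase center, in an equatorial X, Y, Z frame) to u, v, w in wavelengths. This is an assumption about the form of the calculation, not the F‐part design itself; the function name and argument conventions are hypothetical. In this convention w equals the geometric delay multiplied by the frequency, which is how the delay model enters.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def antenna_uvw(x_m, y_m, z_m, hour_angle_rad, dec_rad, freq_hz):
    """Antenna-based (u, v, w) in wavelengths at freq_hz.

    x_m, y_m, z_m: antenna position w.r.t. the array phase center in an
    equatorial frame (assumed convention). hour_angle_rad and dec_rad
    give the phase-center direction. freq_hz would be the center
    frequency of the center spectral channel.
    """
    wavelength = C_M_PER_S / freq_hz
    sh, ch = math.sin(hour_angle_rad), math.cos(hour_angle_rad)
    sd, cd = math.sin(dec_rad), math.cos(dec_rad)
    u = (sh * x_m + ch * y_m) / wavelength
    v = (-sd * ch * x_m + sd * sh * y_m + cd * z_m) / wavelength
    w = (cd * ch * x_m - cd * sh * y_m + sd * z_m) / wavelength
    return u, v, w


def geometric_delay_s(w_wavelengths, freq_hz):
    """Geometric delay implied by w (in wavelengths) at freq_hz."""
    return w_wavelengths / freq_hz


if __name__ == "__main__":
    u, v, w = antenna_uvw(100.0, 50.0, 10.0,
                          math.radians(15.0), math.radians(-30.0), 1.0e9)
    print(u, v, w, geometric_delay_s(w, 1.0e9))
```

A baseline's u, v, w is then the difference of the two antenna‐based values, which is one reason for carrying antenna‐based quantities in the INFO packets.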
The F‐part of the correlator therefore has much more hierarchical real‐time control than the X‐part,
but is also considerably smaller (less hardware and rack space). It is expected that on each F‐part
board, a CPU core running on an FPGA, with a 1G network connection, could accomplish the
necessary M&C functions and communications.
4.7 Upgrade Growth Paths
As discussed in [1], the GSA concept allows technology upgrades whilst using the existing system
infrastructure. As technology improves (i.e. more logic packed into an ASIC), the size of the
correlation matrix shrinks. This allows more band/slice correlators to be packed into a smaller
space, allowing for more band/slices (i.e. bandwidth or beams) to be correlated for the same power
and space available. The only thing that needs to change is the population of patch boards, to
accommodate a different band/slice correlator size.
At some point the horizontal board mounting scheme may become obsolete, allowing for vertical
board mounting wherein all baselines for a band/slice are correlated in a crate of boards. In this
case, the same racks and cooling infrastructure may be used, but with airflow running vertically
through the racks rather than that shown in Figure 7‐2.
5 Risks
This section contains a table of possible risks, and risk mitigation strategy, as applied to full‐scale SKA
correlator systems, based on the concept presented in the previous section.
Risk | Effect/Mitigation
No acceptable air cooling strategy found. | Use liquid cooling. Custom solution likely required, possibly using COTS components. Web survey indicates there are several companies capable of developing robust liquid cooling solutions. Probably requires more expensive maintenance and therefore higher operating cost.
Correlator (X421) ASIC dissipates too much power¹². | Higher board power dissipation, higher than desired operating costs. More expensive and complicated cooling strategies, process narrower frequency slice per board, or de‐scope bandwidth and processing requirements.
Proposed VBA (visibility‐based addressing) scheme is untenable from a fundamental image processing perspective. | Develop a different scheme, likely a time‐slice addressing approach but possibly still using the embedded correlator network for routing/distributing correlated data packets.
Horizontal or vertical mechanical tolerance problems eliminate the possibility of using printed wiring connections. | Improve mechanical tolerance specs of sub‐racks, racks, and floor mounting hardware to meet requirements. If necessary, use short (but more expensive) cables instead of patch boards.
4‐bit(I) and 4‐bit(Q) re‐quantization and correlation is not sufficient number of bits. | Change re‐quantization to 8‐bit(I) and 8‐bit(Q)¹³. Each chip now processes ½ the bandwidth.
Table 5‐1 Possible risks and risk mitigation strategy.
6 Requirements
Void
12 There is some indication that X421 chip area and power will not be unreasonable if implemented in <=~30 nm technology, but final assessment of feasibility will be available once a formal design study by an ASIC vendor is complete.
13 5-7 bits/sample don’t fit nicely into a 32-bit or 64-bit packet word, but the implications of using this range of bits remain to be explored.
6.1 Item Definition
Void
6.1.1 General Description
Void
6.1.2 External Interfaces
Void
6.1.3 Internal Interfaces
Void
6.1.4 Modes
Void
7 Characteristics
7.1 Performance Characteristics
Void
7.2 Physical Characteristics
7.2.1 X‐part Correlator Boards
Correlator boards for dishes might optimally be a different design (Figure 4‐4, Figure 4‐7) but use the
same ASIC and basic structure as for SAAs and DAAs. Each board may be contained within its own
independent mechanical cage, built to easily slide in and out of a rack slot. Boards may be hot
swapped.
Boards for dishes, SAAs, and DAAs, are approximately 16.5” W x 16‐18” D, depending on whether or
not FRPPs are installed for gridding operations. Each board likely contains a monolithic heatsink,
mechanically and thermally attached to the array of correlator chips and FPGAs for cooling. If liquid
cooling is used, the cooling lines are integrated with a hollow plate, similarly attached to chips on the
board. Cooling lines may enter/exit the board either by the front (if blind‐mate connectors are not
available), or by the rear.
7.2.2 X‐part Racks
Racks for dishes are precision engineered and installed to meet mechanical tolerance requirements
necessary to allow for establishment of printed wiring connections. Sub‐racks are installed in racks,
and are engineered to the same standards. Depending on ASIC capacity, and corner‐turner
granularity, it may not be necessary for rack‐to‐rack printed wiring connections, in which case COTS
racks and equipment may be used.
For dish‐WBSPFs of up to 3072 dishes and 1.05 GHz/polarization, seven 12x12 board arrays are required
to produce all polarization products using the baseline X421 ASIC design. Thus, there are two bays
of twelve 19” racks, with 48U available vertical space, to house the WBSPF correlator, occupying two
floor areas of approximately 7 m W x 1 m D (assuming a rack depth of 1 m). These racks do not need
to be located together or particularly near the F‐part of the correlator. Minimum clearance at the
front and back of each bay is 1 m¹⁴, and racks or other equipment may be placed on either side¹⁵.
Total floor area is therefore ~7 m x 4 m. Total board count here is 1008 (7 x 144). Standard spectral‐channel
capacity in this configuration is 28,672 channels/baseline/polarization product (or total 114,688
channels/baseline); more spectral channels, but without on‐chip baseline‐based integration, can
likely be obtained using the method described in section 4.5.3.
14 Although, in the EVLA system, 2.5’ (76 cm) was found to be sufficient.
15 As previously noted, racks containing central beamformer equipment could be located on each side.
For dish‐PAFs of up to 2048 dishes, 750 MHz/polarization, and 30 beams, 150 8x8 board arrays are
required to produce all polarization products using the baseline X421 ASIC design. Two of these bays
could be shared with the WBSPF correlator, but the extra complexity might not be worth the effort.
Six 8x8 board arrays can fit into an 8‐rack bay, and so 25 8‐rack bays are required for the correlator.
Each bay occupies a floor area of approximately 4.5 m W x 1 m D. These racks do not need to be
located together or particularly near the F‐part of the correlator. Minimum clearance at the front
and back of each bay is 1 m, and racks or other equipment may be placed on either side. Total floor
area, placing two bays side‐by‐side, is therefore ~9 m W x 14 m D. Total board count here is 9600.
Standard spectral‐channel capacity in this configuration is 20,480 channels/baseline/polarization
product (81,920 channels/baseline); more spectral channels, but without on‐chip baseline‐based
integration, can likely be obtained using the method described in section 4.5.3.
Racks and sub‐racks for DAAs and SAAs can be standard COTS equipment as there is no need for
rack‐to‐rack printed‐wiring connections. For 250 elements, 1000 beams, and 300 MHz/polarization,
two boards are required for each beam, requiring 2000 boards for each telescope, or a total of 4000
boards. Assuming a similar packing of 48 boards per 19” 48U rack, a total of 84 racks (4000/48, rounded up) are
required. If arranged in 10‐rack bays, total floor area is ~6 m W x 11 m D. Standard spectral‐channel
capacity in this configuration is 8192 channels/polarization product/baseline/beam, with more
channels possible as described in section 4.5.3.
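The board, bay and rack counts quoted in this section can be cross‐checked with the sketch below. All inputs are taken from the text; the factor of four polarization products per baseline is an assumption inferred from the quoted channel totals (114,688/28,672 and 81,920/20,480).

```python
import math

BOARDS_PER_RACK = 48          # 1U boards in a 19", 48U rack

# Dish-WBSPF: seven 12x12 board arrays.
wbspf_boards = 7 * 12 * 12

# Dish-PAF: 150 8x8 board arrays, six arrays per 8-rack bay.
paf_arrays = 150
paf_boards = paf_arrays * 8 * 8
paf_bays = math.ceil(paf_arrays / 6)

# SAA and DAA: 1000 beams, two boards per beam, two telescopes.
aa_boards = 1000 * 2 * 2
aa_racks = math.ceil(aa_boards / BOARDS_PER_RACK)

# Assumption: four polarization products per baseline.
POL_PRODUCTS = 4
wbspf_channels_per_baseline = 28_672 * POL_PRODUCTS
paf_channels_per_baseline = 20_480 * POL_PRODUCTS

print("WBSPF boards:", wbspf_boards)
print("PAF boards:", paf_boards, "in", paf_bays, "bays")
print("SAA+DAA boards:", aa_boards, "in", aa_racks, "racks")
print("Channels/baseline:", wbspf_channels_per_baseline,
      paf_channels_per_baseline)
```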
A possible floor layout of the X‐part correlator racks, for the full set of SKA elements is shown in
Figure 7‐1 below:
Figure 7‐1 Possible full‐scale SKA X‐part correlator layout, using the baseline X421 ASIC, measuring ~34 m W
x 22 m D. Dish‐WBSPF: 1.05 GHz/polarization, 3072 elements. Dish‐PAF: 750 MHz/polarization, 2048
elements, 30 beams. SAA: 300 MHz/polarization, 256 elements, 1000 beams. DAA: 300 MHz/polarization,
256 elements, 1000 beams. Enough ‐48 VDC power plant capacity is shown for ~3.2 MW (as shown in the
table on page 53, more like 27 x 200 kW power plants are required). This layout assumes air cooling as
shown in Figure 7‐2. Not shown are the F‐parts of the correlator, or any COTS computers or network
equipment for M&C, image processing, or non‐visibility processing.
[Figure 7‐1 labels: 16 blocks marked “48 VDC Power Plant + Batteries”; 12 HVAC units; rack areas marked Dish‐WBSPF, Dish‐PAF, SAA, and DAA.]
7.2.3 F‐part Boards and Racks
Standard COTS board sizes, sub‐rack, and rack equipment may be used. The F‐parts of the
correlator(s) may be located in the same or a separate room, a separate building, or even at the
antennas. There are point‐to‐point connections between the F‐parts and X‐parts of the correlator,
requiring short‐haul copper, or more likely fiber, capable of 100 m or so range.
7.2.4 X‐part Synchronization
It is believed that it is not necessary for there to be any globally‐distributed clock or synchronization
signal within the X‐part of the correlator. Each board should be able to operate using its own local
reference clocks, using standard MGT rate‐matching techniques used in telecommunications. Thus,
there is no need for any connections or wiring to handle global clock distribution.
7.2.5 Rack Power Delivery and Control
Each rack will likely have a mains ‐48 VDC power entry and breaker panel, fed by overhead room
bus‐bars, similar to that found in major telecommunications central offices. Wires coming off each
breaker individually route to each board within each rack. A per‐rack, remote power controller will
be used to allow the remote power control of each board in each rack, and this will tie into the rack
network switch (in that case, the remote power controller cannot control power to the switch itself,
which is likely not required anyway). In Figure 7‐1 a number of distributed COTS ‐48 VDC power plants are
used to supply power to overhead bus bars.
7.2.6 F‐part Data Insertion into the Correlator
The 0XT or HM‐Zd connectors on the correlator board facilitate inexpensive board‐to‐board
connections for distribution of X‐part data. However, for the first insertion of F‐part data (Figure
4‐3), data from 32, 10G sources, each sourcing from 8 SKA elements (antennas) needs to be
converted to appropriate electrical form. It may be possible for a small plug‐in device—with a multi‐
(32)‐fiber connector on one end, and a 40‐pair 0XT connector on the other end—to be designed to
easily perform the conversion. The motherboard could supply power to this converter, via spare
pins on the 0XT connector. Further design study is required to determine exactly how the F‐part of
the correlator connects to the X‐part.
7.2.7 M&C Network
Each rack will likely have its own 48+2 port 1G (or 10G) COTS network switch, powered off ‐48 VDC,
with point‐to‐point network connections to each board in the rack, and network connection to a
central control switch. All network cabling will be routed in overhead cable trays. If a 1G network is
used, cat6e UTP cable, using RJ‐45 connectors is likely sufficient for the job.
7.2.8 Output Data Network
This network depends on whether the image processing CPUs are subsumed into the correlator or
not. If so, then if the data rate for final gridded points is low enough, no separate network is
required, and low‐rate gridded points can possibly be transmitted to the final image processors
using the M&C network. If high‐capacity network connections are required, then all of these
presumably fibre cables, from the front panels of all of the boards, will be routed in overhead cable
trays to destination computers contained in the same or different room or building.
7.2.9 Thermal Considerations
The horizontal mounting of circuit boards in racks is not particularly conducive to a simple rack or
crate‐based air‐flow cooling arrangement. Front‐to‐back airflow, with individual fans on each
board¹⁶, might be possible but then mechanical (fan) failures are distributed to all boards, making
repair/replacement less than desirable (although, large data centers with 19” rack‐mount pizza box
CPUs and switches do just this). There is also the 1U height restriction which will limit the amount of
air that can flow, and the power dissipation that can be handled.
Another possibility is liquid cooling where there is a monolithic heat spreader/exchanger mounted
to all of the chips on the board, with cooling lines exiting the front or rear of the board. There are
companies which engineer such things for specialized applications [8].
Yet another possibility is centralized air blowers, pressurizing the floor and the space between racks
with air flowing side‐to‐side (transverse) across board components. This possibility is shown in
Figure 7‐2. This scheme might have problems with maintaining uniform air flow across all the boards
in the rack and might need a graduated airflow restriction vent to achieve reasonable uniformity.
Nevertheless, it does have the advantage of centralizing blowers and, in principle, should be possible
provided the power dissipation on each board is not unreasonable.
Figure 7‐2 Centralized air‐blower cooling arrangement.
16 Or, with larger rear-rack blowers/suckers.
[Figure 7‐2 labels: cold air in from the pressurized floor cavity; air flows across the boards in each rack; warm air out.]
7.3 Electrical Characteristics
The boards within a band/slice bay of racks are tightly electrically connected. Good common‐mode
conducted noise filtering on power supplies to meet FCC Part 15, Sub‐part J, Class B conducted EMI
requirements¹⁷, good grounding/bonding of racks together and to system ground, and the use of
differential signalling help to ensure reliable data transport between racks. In the EVLA system such
an approach was used and, with much longer cables connecting racks together, reliable data
transport on ~12,000 1 Gbps differential pairs was achieved. Short nearest‐neighbour connections
within the GSA should have equally good data transport reliability, at 10X the bandwidth per pair of
the EVLA.
COTS fiber and M&C network connections use methods and follow standards to ensure electrical
isolation between the two ends. Therefore, M&C COTS network equipment, the F‐part of the
correlator, and other equipment such as image processing computers etc. are properly isolated, and
therefore do not have to be tightly physically or electrically coupled to the main X‐part correlator
system.
All boards in the correlator are supplied mains ‐48 VDC power. COTS power systems used in
telecommunications are readily available, complete with battery backup at a capital cost of about
$1/W. ‐48 VDC hot and return route to each board, and are completely isolated from signal, chassis,
and earth ground, thereby establishing known current paths. The return signal of each power plant
is only connected to earth ground at the power plant distribution panel. The use of ‐48 VDC power
eliminates the requirements for high‐voltage safety engineering and agency approval in non‐COTS
equipment. COTS power plants are built for reliability with N+1 redundancy, hot swap capability,
built‐in alarms, and hot‐standby battery backup. Such a system is used in the EVLA correlator with
very good availability/reliability and performance.
7.4 Reliability
The X421 ASIC is designed for board and system reliability, as are the nearest‐neighbour printed wiring
connections. As discussed in section 4.2.1, the ASIC minimizes the number of solder points on the
board, and this, as well as the resulting simpler board design, improves long‐term board and system
reliability. At a nominal 50 FITS¹⁸ (failures in 10⁹ hours) per chip, the total X‐correlator system MTBF
as far as ASICs go is ~22 hours (930k chips). Also, an ASIC‐dominated system, because the ASICs
are hardwired, is not subject to soft configuration SRAM failures as are FPGAs, something that has
been seen on a number of occasions in the EVLA correlator with ~16k FPGAs.
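The ~22 hour figure follows directly from the FIT rate and the chip count, as the short calculation below shows. It assumes independent, constant‐rate failures, so that the per‐chip rates simply add; that assumption is made here for illustration.

```python
FIT_PER_CHIP = 50        # failures per 1e9 device-hours (nominal)
N_CHIPS = 930_000        # X421 ASICs in the full X-correlator

# Independent constant-rate failures add, so the system FIT rate is
# the per-chip rate times the number of chips.
system_fit = FIT_PER_CHIP * N_CHIPS
mtbf_hours = 1e9 / system_fit
failures_per_day = 24.0 / mtbf_hours

print(f"System MTBF (ASICs only): {mtbf_hours:.1f} hours")
print(f"Expected ASIC failures per day: {failures_per_day:.2f}")
```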
To maximize reliability, the recommendation with the EVLA ASIC was to keep junction temperature
at or below 50 °C. At this temperature, every 10 °C rise or fall in junction temperature is a ~2X change
in reliability [9]. Unless ASIC fabricators indicate otherwise, to achieve good reliability, 50 °C should
be the maximum junction temperature target for thermal design of the system.
17 Most importantly to deal with ground-loop noise infecting differential signals.
18 No reference given, but chips of this sort of complexity (EVLA ASIC, Altera FPGAs; Altera quarterly reliability report) historically test with failure rates of ~30-50 FITS. However, no experience with very small feature sizes <~30 nm required to implement the X421 ASIC is available.
Further study, using Telcordia SR‐332 (telecom standard) and/or MIL‐HDBK‐217F (military standard),
once a more detailed design is complete, is required to further quantify reliability.
7.5 Maintainability
As long as centralized air movement is possible, there are no moving parts in any of the correlator
boards to fail or require maintenance. With a ‐48 VDC mains supply, there are no AC‐DC power
converters distributed amongst the boards, and so failure of high‐voltage semi‐conductor devices,
and AC‐smoothing electrolytic capacitors in boards is not an issue. Thus, maintenance requirements
of the boards should be minimal. Some issues that could arise and need to be considered are:
Hot removal of a board, if the VBA network is in place, requires removal of front‐panel
printed wiring boards. These should therefore be designed to make such removal easy,
possibly by integrating handles or ejectors in their design.
Hot removal of a board will interrupt signal flow to other boards in the correlator matrix.
The board and the X421 ASIC must therefore be designed to tolerate and automatically
recover from such interruption. Note that the VBA network does not suffer this same fate
as it is possible for packets to be routed around the missing board.
Other than VBA network connections, the only cable that needs to be removed to remove a
board is the front‐panel network cable, possibly an RJ‐45 UTP or a fiber.
To reduce temperature cycling of boards, and possible long‐term solder‐joint stress issues,
it is a good idea in principle to maintain constant board temperature even through loss of
signals, power failures, HVAC failures etc. Unfortunately, this is in conflict with the desire
for centralized cooling; it may be that some boards go down or off‐line, while others remain
hot‐running, and so it will likely be difficult to individually temperature control each board.
Mitigating this effect is the use of a bare minimum number of solder joints in the board
design.
If the image‐processing/gridding computers are subsumed into the correlator (i.e. using
FRPPs), they should not contain moving parts such as hard drives¹⁹ or local cooling fans to
minimize the potential for distributed failures; high‐capacity solid‐state NV memories are
likely capable of meeting storage requirements on the time scale required by the SKA.
They, and the motherboard, should be designed so they are easily hot‐swap replaceable
without having to remove the motherboard or any network connections, or power‐down
the motherboard, or otherwise interrupt data flow on the motherboard.
The correlator room should be kept clean and general access restricted to avoid particle
contamination and ESD hazards. For the EVLA, the room is strictly‐enforced ESD safe,
restricted personnel access, and was cleaned and has filters in place to meet ISO Class 8,
with an additional MERV 13 filter in the HVAC systems.
19 Although this assumption may be incorrect, given the proliferation of very small, high-capacity hard drives in laptops.
7.6 Availability
Based on the sheer number of ASICs in the system, the simple reliability calculation of section 7.4
indicates that, on average, about 1 failure per day can be expected in the system once the initial
infant mortality failures have worked themselves out. Up until that point, higher failure rates are
to be expected.
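As a rough cross-check of the order of magnitude, the sketch below converts "about 1 failure per day" into an implied per-ASIC failure rate. It is not the calculation of section 7.4: the board counts are taken from Table 9‐1, every board is assumed to carry 72 ASICs, and failures are assumed to be independent, constant-rate, and due to ASICs only. The function and variable names are illustrative.

```python
# Rough cross-check of "about 1 failure per day".
# ASSUMPTIONS (not from section 7.4): board counts as in Table 9-1, 72 ASICs on
# every board, independent failures at a constant rate, ASIC failures only.

BOARD_COUNTS = {"Dish-WBSPF": 1008, "Dish-PAF": 9600, "SAA": 2000, "DAA": 2000}
ASICS_PER_BOARD = 72


def implied_fit(failures_per_day: float, n_devices: int) -> float:
    """Per-device failure rate in FIT (failures per 1e9 device-hours)."""
    device_hours_per_day = n_devices * 24.0
    return failures_per_day / device_hours_per_day * 1e9


n_asics = sum(BOARD_COUNTS.values()) * ASICS_PER_BOARD
fit_per_asic = implied_fit(1.0, n_asics)
print(f"{n_asics:,} ASICs -> about {fit_per_asic:.0f} FIT per ASIC")
```

Under these assumptions, roughly one million ASICs and one failure per day correspond to a per-device rate of roughly 40 FIT.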
Thus, it should be expected that some correlator data products will not be available at any given
time. With centralized DC mains power, and N+1 fault‐tolerance in those systems, the vast majority
of the correlator should, however, be available a high percentage of the time. Experience with the
EVLA has shown that it takes substantial time to fully shake‐down infant mortality failures and
hardware and software bugs, and so for the first ~1‐3 years availability will likely be sporadic, but
increasing with time. There will also likely be times and conditions when very strange and unlikely
behaviour and interaction between sub‐systems occurs, seemingly defying explanation. Only time
and persistent testing and debugging will shake these things out of such a large system.
For maximum availability and reliability, the importance of rigorous testing, strict quality control,
ESD‐controlled handling etc. through the entire life‐cycle of the correlator cannot be overemphasized.
Once fully operational, stable, and within the long‐term low failure‐rate region of the failure‐rate vs
time “bathtub curve”, the following exceptions, which reduce availability, can be expected:
• Regular system protection‐level tests. Examples are AC‐fail tests, fire‐alarm/fire‐suppression tests, HVAC failure tests.
• Software/firmware upgrades.
• HVAC failures. HVAC servicing.
• COTS network and computing equipment failures.
• AC Mains power failures. Reliable battery backup is very expensive; count on ~5 minutes of full‐power backup, and ~15 minutes of ~75%‐reduced power backup. Backup beyond these requirements most efficiently comes from external sources (e.g. diesel generators).
• ‐48 VDC power‐plant failures. These systems are telecom‐quality and designed for high‐reliability, but they, too, are made of and by “mere mortals” and have their own issues.
• Connector/contact failures. Given the sheer number of connections, there are bound to be failures here. High‐reliability contacts and manufacturing processes must be used to minimize these effects.
• Internal ASIC failures.
• Internal FPGA failures; SRAM configuration “soft” failures.
• Board solder‐point failures.
• Residual bug‐induced failures. These can be very intermittent, and take substantial time to shake‐down.
• Human‐induced failures. Examples are incorrectly installed cables, debris left from installation getting where it shouldn’t, ESD handling faults, misreading of handling/installation/repair procedures etc.
• Unknown source failures.
Although 100% availability is the ideal goal, in reality, with all of the above issues factored in, on
average ~90% availability, with 10% loss coming in major “planned” timeslots, should be the realistic
goal.
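To put the ~90% goal in concrete terms, the following minimal sketch converts an availability fraction into an annual downtime budget. The 8760-hour year and the function name are illustrative choices, not part of the concept description.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years


def downtime_hours(availability: float) -> float:
    """Expected unavailable hours per year for a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


hours = downtime_hours(0.90)
print(f"{hours:.0f} h/year unavailable (about {hours / 24:.1f} days)")
```

At 90% availability this is about 876 hours per year, most of which the text above expects to fall in planned timeslots.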
7.7 Additional Quality Factors
7.8 Environmental Conditions
It is assumed that the correlator is contained in one or more RFI‐shielded, clean, ESD‐restricted
rooms, contained in one or more buildings. Nominal ambient temperature is 20 °C; however, 15 °C, if
available, would improve reliability by lowering semi‐conductor junction temperature.
7.9 Transportability
The correlator, as a whole, is not transportable once fully installed. Nor does it have to be. All racks,
power plants, cables, and boards are shipped to the correlator site, and installed there.
There are no pieces of the correlator which have extreme “monolithic” transport requirements. All
pieces can ship using standard methods in fork‐lift‐size containers.
7.10 Flexibility and Expandability
Provided room space is allocated, the correlator can be expanded in terms of bandwidth (by
adding band/slice correlators), numbers of antennas, and numbers of spectral channels (without
adding X‐part hardware, but with increased integration time) as shown in section 4.5.3. The limit
on the number of antennas is governed by the total delay of the signal as it traverses the
cross‐correlation daisy‐chain, and by the input buffering provided in the X421 ASIC.
For SAAs and DAAs, the “soft” limit is 256 elements, above which not all correlations fit on a
single board. Beyond 256 elements, boards can be re‐deployed and connected as is done for dishes
to process more elements with a 256‐element granularity.
As mentioned in section 4.5.3, the X421 ASIC design is not particularly suited for Phase‐I SAAs. A
different or modified ASIC design is required for optimal usage of silicon.
The basic rack and cooling infrastructure can be installed for Phase‐I, and expanded/grown to
encompass the full‐scale SKA.
7.11 Portability
The system, once installed, is not portable.
8 Design and Production
8.1 Components, Materials and Process
To achieve high reliability and availability, engineering starts at the component level and builds from
there. It is highly recommended that the following methods are in place for engineering and
construction of the full SKA:
• Establish a separate, appropriately‐sized, components reliability group, whose goal is to perform due‐diligence and reliability analysis of every single component (which has potential for failure) used in every part of the system. A component cannot be used in any part of the design if it is not approved by this group.
• Establish a module‐level reliability testing group and laboratory for the SKA. All modules must meet and pass all reliability requirements and testing required by this group. The laboratory would be equipped with HALT/HASS (Highly Accelerated Life Test, Highly Accelerated Stress Screening) equipment, and with associated expertise, to ensure effectiveness. It might be cost‐effective to contract some or all of this function to industry.
• Establish a quality‐control group, whose sole responsibility is to establish and ensure that all modules and equipment delivered to the correlator site meet strict quality requirements. Some of this functionality might be contracted to specialized industry (e.g. SEM‐analysis of PCBs, semi‐conductors, solder processes).
• Establish a module‐level test group, whose sole responsibility is to develop test setups to rigorously stand‐alone‐test any modules which are to be delivered to the correlator site. For example, for an ASIC development, there should be a small test group that works in conjunction with the ASIC designer(s) to develop test suites, test beds, etc. throughout the full life‐cycle of the ASIC. As another example, when a board is to be developed, the test group works in parallel with the board design group to develop test fixtures, stimuli, methods, and procedures.
• Establish a system integration and test group, whose job is to perform engineering‐site and correlator‐site integration and test, and feedback information to designers and other test groups to find and fix bugs and faults.
• It is expected that all correlator circuit board assemblies will be delivered from contract manufacturers in a full turn‐key manner. This requires that representatives from most of the above test groups will spend time helping to setup the contract manufacturer’s processes, monitoring processes, setting up final contract manufacturer Q/A etc. so that products that meet all reliability and test requirements are delivered to the correlator site.
The above recommendations will help to ensure that, once delivered, installation, and system
integration and test proceeds expeditiously. If these recommendations are not followed, then it can
be expected that final system integration and test, and full operational availability will be delayed
accordingly.
Of course, design groups are also required; these groups may or may not be in separate locations,
and likely include:
• ASIC design group. Carry out the design of one or more ASICs, working in close conjunction with industrial fabricators, and the ASIC test group.
• FPGA/hardcopy design group.
• Board design group.
• Software design group.
• Mechanical design group.
• System design group. Working closely with all of the above groups to address thermal design, system design, and deployment and installation. Develop and arrange for fabrication and installation of all systems‐level components.
8.2 Development and Production Plan
This plan is roughly based on the EVLA correlator experience, but necessarily ramped‐up in rigour to
address higher volume requirements, and more rigorous engineering to address higher system
complexity and size. This plan addresses development and production of the full‐scale SKA (i.e.
Phase‐II), without regard to what might already be in place for Phase‐I, as it is believed that Phase‐I,
as far as the correlator is concerned, might be throw‐away technology that does not naturally grow to
Phase‐II, at least within the context of the GSA concept.
It is envisioned that with any particular circuit board, there will be 4 build stages. These kinds of
stages can also apply to custom‐designed mechanics or other systems, as deemed appropriate, but
the focus here is on circuit board assemblies. The development of the GSA X‐board is the model for
this plan. Of course, prior to these build stages there is significant ASIC and FPGA code development
and test, as well as board design and modelling. The development and production model is such
that design and development occurs “in‐house” by SKA‐consortium institutes, but that production is
through one or more industrial contract manufacturers.
• Stage 1 – Alpha prototype. This is the first full‐size engineering prototype of any particular circuit board assembly. There may or may not have been smaller‐scale “proof‐of‐concept” prototypes prior to this prototype. Tests of the board at this stage will lead to changes required for the next‐stage build. It should at least be possible to get the board working at some level in this stage, unless a major oversight in the design has occurred. The build of this board includes the prototype ASIC. Normally the build quantity here is 1.
• Stage 2 – Beta prototypes. These prototypes incorporate all changes from Stage 1, and are used to provide more robust prototypes for more detailed and exhaustive testing, as well as developing full turn‐key manufacturing processes. This build will also include ASIC prototypes, to allow for more exhaustive ASIC prototype testing. The build quantity here could be ~10‐30 pieces (or more or less, depending on the confidence established in Stage 1), so that a wider range of process variability is tested. Some of these prototypes will be used as UUTs for reliability testing, quality analysis, test beds for software testing, and initial system integration and test. Changes required to the design detected here will lead into the Stage 3 build.
• Stage 3 – Pre‐production prototypes. These prototypes are built in enough volume so as to flush out any potential large‐volume problems. This build is essentially a “full dress rehearsal” for the full‐scale (Stage 4) build. Testing is as rigorous as in Stage 2, but on a larger quantity. Final tweaking of the full turn‐key production process is done here. Given the full‐volume quantities in the final system, the build quantity here could well be in the 100‐1000 range, and at a minimum will be a full rack‐bay worth of boards for full power delivery and thermal tests. These prototypes will use production ASICs.
• Stage 4 – Full‐scale production. Full turn‐key production, with units delivered directly to the final installation site, tested and ready for operations. Boards are installed in racks in units of band/slices, and connected to F‐part and Central Beamformer accordingly.
8.2.1 Development and Production Schedule
This schedule outlines in broad‐brush terms how development and production of a full‐scale SKA
correlator, based on the GSA concept, might proceed. These estimates are roughly based on the
EVLA experience, but modified assuming design and test groups are available as defined in previous
sections. Many tasks can proceed in parallel, but the impact of ASIC capacity can have far reaching
effects and so a design study establishing a good baseline on forecasted ASIC capacity, cost, and
power, as well as functional and performance specifications, is assumed to be available prior to the
development schedule outlined here. There are many important peripheral activities (e.g. reliability
testing, quality analysis) as well, but the concentration here is on main design group tasks; it is
assumed that peripheral support activities occur in step.
• X‐chip (e.g. X421) ASIC RTL development and test. Estimate ~2‐3 years20, assuming one primary designer, 1 or more associate designers (as believed to be optimal), and associated test group working in parallel. There are many complex features in the ASIC to enable capabilities as outlined in previous sections, and all of these must be exhaustively tested. This design process includes physical synthesis and timing analysis, so once it is completed, i.e. the RTL code is “qualified”, an ASIC fabrication vendor can take the results and fabricate the chip.
• ASIC prototype fabrication, 6 months to 2 years.
• X‐board FPGA design, development, test21, 2 years.
• X‐board design, development, and Stage 1 fabrication. ~1‐1.5 years.
• Custom rack and sub‐rack design, thermal modelling, development, and Stage 1 fabrication. ~1‐2 years.
• X‐board software development and test. Starts with ASIC specification, and proceeds through entire duration of project.
• Patch board development and Stage 1 prototype fabrication, ~6 months.
• X‐board and prototype ASIC Stage 1 testing. ~6 months to 1 year.
• Stage 2 X‐board development and fabrication. ~3‐8 months.
• Stage 2 testing, initial system integration and testing, ~6 months.
• Full production ASIC fabrication, ~6 months to 1 year.
• Stage 3 X‐board development and fabrication. ~5 months.
• Stage 3 X‐board testing, system integration and testing, final full‐bay thermal testing ~6 months.
• Stage 4 full turn‐key production and delivery of all correlator modules and COTS systems, ~1‐2 years.
20 The EVLA ASIC initial RTL design and incremental test took ~4 months, but the X421 design is quite a bit more complex. “Qualification” of the RTL (subjecting it to independent verification testing) took ~1 year for the EVLA ASIC. Factoring out lost time due to confusion from unqualified vendors, and mistakes we made in test fixtures, the entire EVLA ASIC process, from start of RTL to production chips took ~3 years.
A rough Gantt chart schedule of all of these activities is shown in Figure 8‐1. The schedule assumes
that full production ASIC fabrication will see a ~continuous roll‐out of devices, rather than one
lumped delivery at the end of the allocated time. In all cases, the worst‐case times from the above
bullets are used.
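To illustrate how worst-case durations combine when tasks run in parallel, the sketch below computes the earliest finish time over a small dependency graph. The durations are the worst-case values from the bullets above; the dependency links are assumptions made for illustration only and are not read from Figure 8‐1, so the result is not a reproduction of that chart. Task names are hypothetical.

```python
from functools import lru_cache

# Worst-case durations in months, taken from the bullets above.
# The dependency lists are ASSUMED for illustration; they are not from Figure 8-1.
TASKS = {
    "asic_rtl": (36, []),
    "asic_prototype_fab": (24, ["asic_rtl"]),
    "fpga_design": (24, []),
    "xboard_stage1": (18, []),
    "rack_design": (24, []),
    "patch_board": (6, []),
    "stage1_test": (12, ["asic_prototype_fab", "fpga_design",
                         "xboard_stage1", "patch_board"]),
    "stage2_build": (8, ["stage1_test"]),
    "stage2_test": (6, ["stage2_build"]),
    "asic_production_fab": (12, ["stage1_test"]),
    "stage3_build": (5, ["stage2_test", "asic_production_fab"]),
    "stage3_test": (6, ["stage3_build", "rack_design"]),
    "stage4_production": (24, ["stage3_test"]),
}


@lru_cache(maxsize=None)
def earliest_finish(task: str) -> int:
    """Earliest finish (months from project start) if every task starts
    as soon as all of its assumed prerequisites are complete."""
    duration, prerequisites = TASKS[task]
    start = max((earliest_finish(p) for p in prerequisites), default=0)
    return start + duration


total_months = earliest_finish("stage4_production")
print(f"worst case: {total_months} months (~{total_months / 12:.1f} years)")
```

With these assumed links the ASIC chain (RTL, prototype fabrication, Stage 1 testing) dominates the path; changing the links or start dates changes the total, which is why the ASIC forecast is treated as a prerequisite study.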
21 All on the desktop, using FPGA-vendor tools and simulations.
Figure 8‐1 Rough Gantt chart of X‐part correlator development, including ASIC development with a
fabrication start time at the beginning of 2016. Some of the activities could have earlier start times, in
particular preliminary X‐board feasibility study/development, software development, rack and sub‐rack
design, and FPGA design. ASIC RTL development and test is shown beginning of 2012, which may be a bit
premature, depending on how accurate future technology forecasts are.
8.3 Electromagnetic Radiation
High‐quality signal integrity board design, using differential pairs for high‐speed signals, helps to
ensure low RFI emissions. However, digital boards will still likely generate RFI at levels unacceptable
to SKA receiving elements, as the correlator is likely to be centrally located and near most elements.
Therefore, it is assumed and required that the correlator be housed in one or more RFI‐shielded
rooms, with appropriate power‐line and signal filtering.
8.4 Manufacturer Nameplate and Product Marking
8.5 Industrial Standardisation
The elements of the correlator described in this concept description do not particularly meet, nor do
they seem required to meet, any particular industry‐standard form factors. If possible, though,
FRPPs, should they be used as described here, should meet industry standards, if such standards
exist to meet performance requirements. Where possible, industry‐standard protocols and methods
should be used throughout, so as to minimize development effort.
8.6 Interchangeability
It is likely desirable for the X‐part correlator board to be interchangeable amongst different
correlator types (e.g. SAA, dishes etc.), and populated in different parts of the correlator. However,
this may not be optimal from a cost point of view, as boards for SAAs, DAAs, and in the diagonal
section of dishes contain 9x8 ASICs, whereas the bulk of boards used for dishes need only 8x8 ASICs.
Therefore, these two boards may not be interchangeable.
8.7 Safety
‐48 VDC mains systems have the added benefit that they run at human‐safe voltages. Nevertheless,
precautions must be taken to avoid short‐circuit of the hot and return rails, as significant current
delivery is possible, and can quickly melt and/or vaporize metal, and cause severe burns. COTS ‐48
VDC power plants and batteries are engineered and installed for human‐safe environments, and so
this should not be an issue. Earthquake zone ratings for the installation must be considered so that
racks and equipment are appropriately fastened so as to avoid crushing safety issues should such an
event occur.
8.8 Ergonomics
The correlator room(s), assuming centralized air cooling, is not an office‐like work environment
conducive to long‐term human exposure. Access to the room is restricted, and only those personnel
performing directed functions should be allowed in. For the EVLA, only a limited and qualified
set of NRAO personnel are allowed in the room and the room is kept “spartan” so as to minimize the
possibility for collection of people, junk, and dust.
8.9 Confidentiality and Protection
The concepts described in this document are the property of the National Research Council of
Canada and must not be divulged to anyone outside the SKA Project Development Office without
permission.
8.10 Supplies from the Contracting Authority
Void
8.11 Resource Reserve Capabilities
Void
8.12 Documentation
Void
8.13 Logistics
Void
8.14 Personnel and Training
This section describes personnel and training required for the final on‐site system.
8.14.1 Personnel
The following personnel are expected to be required:
• Power electrical maintenance (1 person). Responsible for maintenance and monitoring of mains AC, mains DC, and HVAC systems.
• Network administrators (2 persons). Responsible for troubleshooting, maintaining, and upgrading network equipment.
• Correlator maintenance engineers (2 persons). Responsible for troubleshooting, maintaining, and repair/replacement of correlator systems equipment.
• Software maintenance engineers (2 persons). Responsible for troubleshooting, maintaining, and upgrading correlator systems software.
• Site/building maintenance engineer (1 person). Responsible for maintaining the correlator building(s), and site services.
• Operations personnel, operations scientists (4 persons). Several operations personnel (operators) and operations scientists will be needed for observation scheduling, exceptions handling etc.
8.14.2 Training
Void
8.15 Characteristics of Secondary Items
Void
8.16 Priority
Void
9 Cost and Power Estimates
This section includes 2 tables estimating cost and power of the entire X‐part correlator installation
shown in Figure 7‐1, as well as operating costs. It does not include cost and power of the F‐part, or
data transport from the F to the X‐part, but does include subsumed gridding CPUs (FRPPs) and
associated VBA network. Cost estimates are based on recent experience with the EVLA,
incorporating rough estimates for volume pricing, and technology improvements.
9.1 Cost Table
Description Qty Cost ea Cost total
GSA X Board
X421 ASIC (<=~30 nm) (9 column x 8‐row board) 72 $50 $3600
Repeater Hard Copy FPGA 2 $300 $600
350 W of power supplies (~$1/W) 1 $300 $300
VBA data switching hardcopy FPGAs 2 $400 $800
Multi‐core Field‐Replaceable Processor Pack 2 $500 $1000
M&C FPGA 1 $100 $100
M&C and Gridded data output SFP Module 1 $25 $25
Miscl power, temp, monitor semis 1 $20 $20
Decoupling caps, resistors, various passives, LEDs 1 $200 $20
0XT 40‐pair ERNI connector 8 $10 $80
18” x 16.5” multi‐layer (est. 14 layers) 1 $200 $200
Cooling plate (liquid or air heatsink) 1 $100 $100
Miscl hardware 1 $100 $100
Sub‐total $6935
Turn‐key mfg costs (incl. part procure., test) 25% $1734
TOTAL per GSA board (includes Gridding CPUs) $8669
Per‐GSA Board Connectivity
Rear mid‐plane (turn‐key cost) 1 $100 $100
Horizontal patch board (turn‐key cost) 1 $50 $50
Vertical patch board (turn‐key cost) 1 $30 $30
Telescope GSA+Connectivity+Rack costs
Dish‐WBSPF (1 GHz/pol’n, 3072 antennas)
GSA X Boards 1008 $8669 $8.74M
Rear connectivity22 (1 mid‐plane, 1 h‐patch, 1 v‐patch) 1008 $180 $0.18M
Front connectivity (1 h‐patch, 1 v‐patch) 1008 $80 $0.08M
48U, 19”, precision rack 24 $1000 $0.024M
Per‐rack, network switch, breaker panel, power cables 24 $5000 $0.12M
Dish‐WBSPF X‐correlator TOTAL $9.1M (A)
22 Assumes cables are not required to connect boards along diagonal.
Dish‐PAF (750 MHz/pol’n, 2048 antennas), 30 beams
GSA X Boards (5 band slices x 8x8 array, x 30 beams) 9600 $8669 $83.2M
Rear connectivity (1 mid‐plane, 1 h‐patch, 1 v‐patch) 9600 $180 $1.73M
Front connectivity (1 h‐patch, 1 v‐patch) 9600 $80 $0.77M
48U, 19”, precision rack 184 $1000 $0.184M
Per‐rack, network switch, breaker panel, power cables 184 $5000 $0.92M
30 Beam, 750 MHz/pol Dish‐PAF X‐correlator TOTAL $86.8M (B)
SAA (300 MHz/pol’n, 256 elements), 1000 beams
GSA X Boards (2 band slices(boards) x 1000 beams) 2000 $8669 $17.3M
48U, 19”, precision rack 42 $1000 $0.042M
Per‐rack, network switch, breaker panel, power cables 42 $5000 $0.21M
1000 Beam, 300 MHz/pol SAA X‐correlator TOTAL $17.6M (C)
DAA (300 MHz/pol’n, 256 elements), 1000 beams
GSA X Boards (2 band slices(boards) x 1000 beams) 2000 $8669 $17.3M
48U, 19”, precision rack 42 $1000 $0.042M
Per‐rack, network switch, breaker panel, power cables 42 $5000 $0.21M
1000 Beam, 300 MHz/pol DAA X‐correlator TOTAL $17.6M (D)
System Infrastructure
System 200 kW 48VDC battery‐backed power plant 27 $200k $5.4M
Shielded room, cooling, and power distribution infrastructure 1 $3M $3M
System infrastructure TOTAL $8.4M (E)
Industry NREs (not incl. institute design costs)
X421 ASIC, <=~30 nm 1 $6.5M23 $6.5M
GSA and Patch Board design and fabrication NRE 1 $500k $0.5M
System infrastructure engineering 1 $1M $1M
NRE TOTAL $8M (F)
System TOTAL w/o 35% contingency (A‐F) $147M
TOTAL c/w 35% contingency $199M
Table 9‐1 Cost Table.
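The telescope-level rows of Table 9‐1 can be re-derived from the per-board and per-rack unit costs. The sketch below does this as a check on the arithmetic; it takes the per-board turn-key cost ($8669) as given, computes the infrastructure item (E) as the sum of its two line items, and applies the 35% contingency. Function and variable names are illustrative.

```python
BOARD_COST = 8669            # $ per GSA board, turn-key (Table 9-1)
RACK_COST = 1000 + 5000      # $ per rack: rack, plus switch/breaker panel/cables
CONNECTIVITY = 180 + 80      # $ per board: rear plus front connectivity (dishes)
CONTINGENCY = 0.35


def telescope_cost(boards: int, racks: int, connectivity: int = 0) -> int:
    """Boards (with optional per-board connectivity) plus racks, in dollars."""
    return boards * (BOARD_COST + connectivity) + racks * RACK_COST


costs = {
    "A Dish-WBSPF": telescope_cost(1008, 24, CONNECTIVITY),
    "B Dish-PAF": telescope_cost(9600, 184, CONNECTIVITY),
    "C SAA": telescope_cost(2000, 42),
    "D DAA": telescope_cost(2000, 42),
    "E Infrastructure": 27 * 200_000 + 3_000_000,
    "F NRE": 6_500_000 + 500_000 + 1_000_000,
}

subtotal = sum(costs.values())
total = subtotal * (1 + CONTINGENCY)

for name, value in costs.items():
    print(f"{name:18s} ${value / 1e6:6.1f}M")
print(f"{'Subtotal':18s} ${subtotal / 1e6:6.1f}M")
print(f"{'With contingency':18s} ${total / 1e6:6.1f}M")
```

The board term dominates every telescope row, so the totals scale almost linearly with the per-board cost and therefore with the ASIC unit price.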
23 Possibly less than this; likely the “gold-plated” cost.
9.2 Power Table
Power estimates are very rough and are heavily leveraged by the X‐board power, itself leveraged by
ASIC power. Further refinements and optimizations based on detailed design studies are required.
Description Qty P each P total
GSA X Board
X421 ASIC (9 column x 8‐row board) 72 3 W24 216 W
Repeater Hard Copy FPGA (170 mW per 48 MGT+some) 2 15 W 30 W
VBA data switching hardcopy FPGAs (170 mW per 40 MGT+some) 2 15 W 30 W
Multi‐core Field‐Replaceable Processor Pack25 2 20 W 40 W
M&C FPGA + SFP 1 5 W 5 W
Subtotal 311 W
Power supplies, 90% efficient26 31.1 W
GSA Board TOTAL Power 355 W
Telescope GSA+Rack Power
Dish‐WBSPF (1 GHz/pol’n, 3072 antennas)
GSA X Boards 1008 355 W 357.8 kW
Per‐rack network switch 24 200 W 4.8 kW
Per‐rack, breaker panel, power cables (2% losses) 1 7.1 kW 7.3 kW
Dish‐WBSPF TOTAL 370 kW
Dish‐PAF (750 MHz/pol’n, 2048 antennas, 30 beams)
GSA X Boards 9600 355 W 3.41 MW
Per‐rack network switch 184 200 W 36.8 kW
Per‐rack, breaker panel, power cables (2% losses) 1 7.1 kW 70 kW
Dish‐PAF TOTAL 3.52 MW
SAA (300 MHz/pol’n, 256 elements, 1000 beams)
GSA X Boards 2000 355 W 710 kW
Per‐rack network switch 42 200 W 8.4 kW
Per‐rack, breaker panel, power cables (2% losses) 1 14.4 kW 14.4 kW
SAA TOTAL 732.8 kW
24 With 12 10G MGTs this will be challenging to meet. A new ultra-low-power transceiver technology may be required, perhaps short-distance pulse-phase modulation transceivers [10] [11].
25 There is wide-scale intensive development of low-power high-performance computing. This will likely not be a COTS module, but rather a module optimized for image processing, using a COTS CPU core. By the time of the full-scale SKA, 20 W should buy significant processing capability.
26 Based on current-generation 48 VDC to LVDS converters (Artesyn). A two-stage conversion might be necessary for best efficiency for voltages <1.0 V.
DAA (300 MHz/pol’n, 256 elements, 1000 beams)
GSA X Boards 2000 355 W 710 kW
Per‐rack network switch 42 200 W 8.4 kW
Per‐rack, breaker panel, power cables (2% losses) 1 14.4 kW 14.4 kW
DAA TOTAL 732.8 kW
TOTAL Correlator Rack Power 5.36 MW
‐48 VDC power plant efficiency (90%)27 1 536 kW 536 kW
Subtotal 5.96 MW
HVAC cooling power (30%)28 1 1.8 MW
TOTAL SYSTEM POWER 7.76 MW
Electricity operating cost, per year, assuming $0.10 per kW‐hr 68 M kW‐hrs $0.10 $6.8 M/yr
Table 9‐2 Power Table.
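The per-telescope rack power rows of Table 9‐2 follow from the board count, the rack count, and the 2% distribution loss. The sketch below re-derives them from the 355 W board figure and the 200 W per-rack switch; it agrees with the table to within rounding. Names are illustrative, and the ‐48 VDC plant and HVAC overheads are deliberately left out.

```python
BOARD_W = 355.0            # GSA board total power, W (Table 9-2)
SWITCH_W = 200.0           # per-rack network switch, W
DISTRIBUTION_LOSS = 0.02   # breaker panel and power cable losses


def rack_power_kw(boards: int, racks: int) -> float:
    """Rack-level power in kW, including the 2% distribution loss."""
    load_w = boards * BOARD_W + racks * SWITCH_W
    return load_w * (1.0 + DISTRIBUTION_LOSS) / 1e3


power_kw = {
    "Dish-WBSPF": rack_power_kw(1008, 24),
    "Dish-PAF": rack_power_kw(9600, 184),
    "SAA": rack_power_kw(2000, 42),
    "DAA": rack_power_kw(2000, 42),
}
total_mw = sum(power_kw.values()) / 1e3

for name, kw in power_kw.items():
    print(f"{name:12s} {kw:8.1f} kW")
print(f"{'TOTAL':12s} {total_mw:8.2f} MW")
```

Because board power is nearly all of each row, the total is effectively proportional to ASIC power, which is the sensitivity noted at the start of this section.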
9.3 Operating Costs
9.3.1 Power
The X‐part correlator power operating cost is according to TOTAL SYSTEM POWER in Table 9‐2, and
is roughly $6.8M/year, assuming $0.10 per kW‐hr.
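The annual figure is simply continuous power times hours per year times the tariff. A minimal sketch, assuming continuous operation at the tabulated TOTAL SYSTEM POWER for a full 8760-hour year (the function name is illustrative):

```python
def annual_energy_cost(power_mw: float, rate_per_kwh: float = 0.10,
                       hours_per_year: int = 8760) -> tuple[float, float]:
    """Return (energy in kWh, cost in dollars) for continuous operation."""
    energy_kwh = power_mw * 1e3 * hours_per_year
    return energy_kwh, energy_kwh * rate_per_kwh


energy_kwh, cost = annual_energy_cost(7.76)
print(f"{energy_kwh / 1e6:.0f} M kWh/year, ${cost / 1e6:.1f}M/year")
```

Each additional megawatt of system power adds about $0.88M per year at this tariff.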
9.3.2 Staffing
With the staffing levels outlined in section 8.14.1 (12 persons in total), and at $150k/year/person,
the staffing operating cost is roughly $1.8M/year.
9.3.3 Maintenance
Staffing for maintenance also includes various contractors such as electrical contractors, HVAC service
contractors etc. Estimate $500k/year for these kinds of additional maintenance contractors.
9.3.4 Spares & Replacements
For the EVLA, 5% spare modules were produced, with additional spare components, and COTS sub‐
modules at varying spares levels depending on the perceived (and analysed) failure rate,
availability/lifetime, and criticality. Overall, count on an additional ~7% cost for spares (7% of
$199M—Table 9‐1). Cost is ~$14M.
27 As per the EVLA COTS -48 VDC power plant, Emerson Power, Model LPS48E1.
28 Private communication, Bob Broilo NRAO EVLA-site EE w.r.t. EVLA correlator.
The largest on‐going replacement items will be anything with moving components, and this primarily
means central HVAC systems, if the cooling scheme of Figure 7‐2 is employed.
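Collecting the figures of sections 9.3.1 to 9.3.4 gives a rough annual operating total and a one-time spares cost. The sketch below only sums values already stated in this document; treating spares as a one-time cost separate from the annual total is an assumption of the sketch, and the names are illustrative.

```python
# Head counts from section 8.14.1.
STAFF = {
    "power electrical maintenance": 1,
    "network administrators": 2,
    "correlator maintenance engineers": 2,
    "software maintenance engineers": 2,
    "site/building maintenance engineer": 1,
    "operations personnel and scientists": 4,
}
COST_PER_PERSON = 150_000  # $/year/person (section 9.3.2)

staffing = sum(STAFF.values()) * COST_PER_PERSON
annual = {
    "power": 6_800_000,                  # section 9.3.1
    "staffing": staffing,                # section 9.3.2
    "maintenance contractors": 500_000,  # section 9.3.3
}
annual_total = sum(annual.values())

# ASSUMPTION: spares are a one-time cost, 7% of the Table 9-1 total.
spares = 0.07 * 199_000_000

print(f"annual operating cost: ${annual_total / 1e6:.1f}M/year")
print(f"spares (one-time):     ${spares / 1e6:.1f}M")
```

Power is about three quarters of the annual total under these figures, so reductions in ASIC power have the largest effect on operating cost.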
10 Quality Assurance Provisions
10.1 General
Void
10.1.1 Qualification
Void
10.1.2 Acceptance
Void
10.2 Requirements Conformance Verification
Void
10.2.1 Verification Methods
Void
10.2.2 Verification Matrix
Void
11 Delivery Provisions
Void
12 Notes
Void
13 Appendices
13.1 X421 ASIC Data Sheet
This appendix contains the first two pages of the X421 data sheet.
(B. Carlson) CA-X421-150 57
NRC Confidential—Preliminary Data Sheet—December 16, 2010
Features:
• 1024 baselines, 75 MHz/polarization, 2048 channels per product, all 4 polarization products.
• 15-level, 4-bit complex MAC, with data valid counting for each channel allowing for temporal/spectral RFI flagging/excision.
• Complete on-chip memory, with seamless no dead-time integration/dumping/readout.
• Y-input selection of X-input data for maximum packing in medium array size applications.
• Quad-cell operation for ½ antennas, 2X BW operation.
• Antenna-stream-controlled, baseline-based time + channel integration to minimize output data volumes. Selectable epoch-less integration.
• Baseline-based, u,v,w calculation/generation.
• Dynamic/flexible channel-burst lengths to minimize upstream packet buffering. On-chip triple buffering for max 256-burst lengths.
• All hi-speed I/Os are 10 Gbps clear channel SERDES 64/66B differential pairs (10.3125 Gbps line rate).
• High-performance 4x10G-lane daisy-chained data in/out with buffering for blind nearest neighbour central controller-less output.
• All IEEE 802.3 packet-based processing.
• I2C bus for monitor and control; 3 address bits.
• Single F672 flip-chip package; minimal I/O count for high yield, low-power, cost-sensitive applications.
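The quoted 10.3125 Gbps line rate follows from the 64/66B encoding overhead on a 10 Gbps payload: every 64 data bits are sent as a 66-bit block. A minimal check, using exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def line_rate_gbps(payload_gbps: int, data_bits: int = 64,
                   block_bits: int = 66) -> Fraction:
    """Serial line rate needed to carry a payload rate with block coding."""
    return payload_gbps * Fraction(block_bits, data_bits)


rate = line_rate_gbps(10)
print(f"{float(rate)} Gbps line rate for a 10 Gbps payload")
```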
Functional Block Diagram
1024 baseline, 4-bit, 2048 channel, 150 MHz, X-CMAC Array Correlator Chip
CA-X421-150
Description:
The CA-X421-150 (the “X421”) is a 1024 baseline, full-polarization, 4-bit complex spectral channel cross-correlator. The device has enough on-chip memory for accumulation, eliminating the need for external DRAM. This reduces power and minimizes I/O pin count, which eases board routing and maximizes board production yield, in turn minimizing cost and maximizing reliability. It supports a bandwidth up to 75 MHz per polarization (less bandwidth can be processed, in which case the chip operates at a lower duty cycle), and provides 2048 complex channels per correlation product, with enough memory for 20 second on-chip accumulation (i.e. autocorrelation; longer for cross-correlations, inversely proportional to the correlation coefficient). Four lanes of 10G each with serial “frame in” and “frame out”, provide for nearest-neighbour daisy-chained routing of output data, in a blind, controller-less manner, simplifying board routing and eliminating any requirement for centralized real-time device control or scheduling. Each lane can be statically enabled or disabled to support varying board configurations. Enough output bandwidth is provided to allow for 170 msec integration on all baselines, with 8 chips stacked in a daisy-chain.
The X421 works on Ethernet IEEE 802.3 packets, either UDP/IP, or raw transport 802.3, allowing signals to be delivered to the chip over commercial equipment/networks. Two packet formats are supported, one which allows 8 antennas, 75 MHz/polarization, with 64 to 256 channel-burst-length, to be multiplexed into a single packet in a single serial stream. The other allows for ½ antenna, 4-band (effective 2X bandwidth) operation, allowing for 4 antennas, 4 bands at 37.5 MHz/polarization per band to be multiplexed into a single packet. Incoming X and Y packets must be reasonably synchronized in time; intelligent synchronization operating on packet sequence numbers, and on-chip triple packet buffering allows for up to 3.2 µsec skew between arriving packets.
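The two packet formats trade antennas for bandwidth while keeping the aggregate carried by one serial stream the same. The sketch below checks this from the numbers quoted above; the dictionary layout and function names are illustrative only and say nothing about the actual packet structure.

```python
# Per-stream packet formats as described above (values per polarization).
FORMATS = {
    "single-band": {"antennas": 8, "bands": 1, "mhz_per_band": 75.0},
    "quad-band": {"antennas": 4, "bands": 4, "mhz_per_band": 37.5},
}


def aggregate_mhz(fmt: dict) -> float:
    """Total bandwidth carried per stream: antennas x bands x MHz per band."""
    return fmt["antennas"] * fmt["bands"] * fmt["mhz_per_band"]


def per_antenna_mhz(fmt: dict) -> float:
    """Bandwidth processed per antenna: bands x MHz per band."""
    return fmt["bands"] * fmt["mhz_per_band"]


for name, fmt in FORMATS.items():
    print(f"{name:12s} aggregate {aggregate_mhz(fmt):5.0f} MHz, "
          f"per antenna {per_antenna_mhz(fmt):5.0f} MHz")
```

Both formats carry 600 MHz per polarization per stream; the quad-band format halves the antennas and doubles the per-antenna bandwidth, which is the "½ antennas, 2X BW" operation listed in the features.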
Each full polarization frequency channel accumulator contains a data valid accumulator to allow for temporal channel-based flagging of data. The “-8” (1000b) state of the 4-bit 2’s complement word is used to indicate data invalid; the X421 decodes this state, and if either the R-pol or L-pol, Re or Im 4-bit word contains this value, a “0” is added to each polarization product accumulator, and the valid count accumulator is not incremented. The output data frame contains this data valid count for each complex frequency channel, allowing for precision normalization by downstream floating-point processors.
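The flagging rule can be sketched in software as follows. This is a behavioural illustration only, not the chip's implementation: it assumes the invalid check covers all eight 4-bit words of both the X and Y inputs, uses the X times conjugate(Y) convention for each product, and uses Python complex numbers in place of hardware integer accumulators. The class and method names are hypothetical.

```python
INVALID = -8  # 1000b in 4-bit two's complement marks an invalid sample


class ChannelAccumulator:
    """One frequency channel of one baseline: 4 pol products + valid count."""

    def __init__(self) -> None:
        self.products = {"RR": 0j, "RL": 0j, "LR": 0j, "LL": 0j}
        self.valid_count = 0

    def accumulate(self, x: dict, y: dict) -> None:
        """x and y map 'R' and 'L' to (re, im) pairs of 4-bit signed ints."""
        words = [w for sample in (x, y) for pol in "RL" for w in sample[pol]]
        if INVALID in words:
            return  # equivalent to adding 0 to every product; no count
        for px in "RL":
            for py in "RL":
                # ASSUMED convention: X times complex conjugate of Y.
                self.products[px + py] += (
                    complex(*x[px]) * complex(*y[py]).conjugate()
                )
        self.valid_count += 1

    def normalized(self) -> dict:
        """Downstream normalization by the number of valid samples."""
        n = self.valid_count
        return {k: (v / n if n else 0j) for k, v in self.products.items()}


acc = ChannelAccumulator()
acc.accumulate({"R": (3, -2), "L": (1, 1)}, {"R": (3, -2), "L": (0, 5)})
acc.accumulate({"R": (-8, 0), "L": (1, 1)}, {"R": (2, 2), "L": (0, 5)})  # flagged
print(acc.valid_count, acc.normalized()["RR"])
```

Carrying the valid count with each channel lets the downstream processor divide by the number of samples actually accumulated rather than by the nominal integration length.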
The device contains additional features to allow for intelligent minimization of output data rates, and the possibility of downstream u,v,w-based packet destination routing to facilitate on-the-fly, distributed image processing. An INFOrmation packet is defined that contains everything needed for the internal integrators to perform baseline u,v-based time integration and/or baseline u,v-based channel integration, with integration-epoch-less intelligent on-chip handling of different integration times on different products, in units of Packet Groups29. The INFO packet contains antenna-based 48-bit u, v, and w, and the chip calculates baseline-based u, v, w, allowing for high spatial frequency resolution over wide observing frequencies and any possible earth baselines. If desired, these features can be by-passed and epoch-aligned; baseline-independent integration can then be performed using the same protocol.
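Since u, v, w are linear in antenna position, baseline-based coordinates can be formed as the difference of the two antenna-based coordinates. The sketch below shows that step only; the sign convention (X minus Y), the treatment of the 48-bit values as signed integers, and all names are assumptions, and the units and fixed-point scaling are not specified here.

```python
from typing import NamedTuple


class UVW(NamedTuple):
    u: int
    v: int
    w: int


def baseline_uvw(ant_x: UVW, ant_y: UVW) -> UVW:
    """Baseline coordinates as the difference of two antenna-based values.
    ASSUMED sign convention: X antenna minus Y antenna."""
    return UVW(ant_x.u - ant_y.u, ant_x.v - ant_y.v, ant_x.w - ant_y.w)


def fits_signed(value: int, bits: int = 48) -> bool:
    """True if value is representable as a signed two's complement word."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


b = baseline_uvw(UVW(100, -50, 7), UVW(40, 10, 2))
print(b, all(fits_signed(c) for c in b))
```

Sending antenna-based values and differencing on chip means the INFO packet grows with the number of antennas rather than with the number of baselines.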
An I2C bus allows for low data rate monitor and control of operation mode (single band, 1024 baseline, or quad band, 256 baselines), receiver/CDR and PLL status, as well as internal integration status such as integrator overrun, received packet errors etc. Three address bits allow 8 devices to be hung off the same I2C bus. Only a bare minimum of monitor and control functions are provided, to minimize monitor and control complexity in large distributed systems.
29 A “Packet Group” is a complete channel group of channel-burst packets, forming one complete sub-integration.