Name          Designation                           Affiliation   Date         Signature

Additional Authors

Submitted by:

B. Carlson                                          DRAO          2011‐03‐26

Approved by:

W. Turner     Signal Processing Domain Specialist   SPDO          2011‐03‐29

GIANT SYSTOLIC ARRAY (GSA) CORRELATOR CONCEPT DESCRIPTION

Document number: WP2‐040.050.010‐TD‐001
Revision: 1
Author: Brent Carlson
Date: 2011‐03‐29
Status: Approved for release

WP2‐040.050.010‐TD‐001 Revision : 1

2011‐03‐11 of 59

DOCUMENT HISTORY

Revision        Date Of Issue   Engineering Change Number   Comments
A‐Preliminary   Feb. 21, 2011   ‐                           First draft release for internal review
A               Feb. 28, 2011                               Initial Release
B               Mar. 11, 2011                               Add items to address CoDR doc requirements
1               Mar. 29, 2011                               First Issue

DOCUMENT SOFTWARE

                 Package   Version     Filename
Wordprocessor    MsWord    Word 2007   03b‐wp2‐040.050.010‐td‐001‐1‐gsaconcept‐description
Block diagrams
Other

ORGANISATION DETAILS

Name                      National Research Council Canada
Physical/Postal Address   Herzberg Institute of Astrophysics
                          Dominion Radio Astrophysical Observatory
                          P.O. Box 248
                          717 White Lake Rd
                          Penticton, BC, Canada, V2A 6J9
Tel                       250‐497‐2300
Fax                       250‐497‐2355
Website                   http://www.nrc-cnrc.gc.ca/eng/ibp/hia.html

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 Purpose of the document
  1.2 Simplified Overview Description
2 REFERENCES
3 HIERARCHY
  3.1 Hierarchical Lifecycle
4 ELEMENT LEVEL: SIGNAL PROCESSING
  4.1 F‐part
  4.2 X‐part
    4.2.1 Baseline GSA Correlator Board (Dishes)
    4.2.2 Board‐to‐Board Interconnections (Dishes)
    4.2.3 SAA and DAA Board Interconnections
    4.2.4 Discussion—System Effects of ASIC Capacity
  4.3 Correlator to Image Processor Data Transport—Visibility‐Based Addressing
    4.3.1 Discussion
  4.4 Central Beam‐former Data Access
  4.5 Phase‐I SKA
    4.5.1 Phase‐I F‐Part
    4.5.2 Phase‐I X‐Part: Dishes
    4.5.3 Phase‐I X‐Part: SAAs
    4.5.4 Phase‐I Central Beam‐forming
  4.6 Monitor and Control
  4.7 Upgrade Growth Paths
5 RISKS
6 REQUIREMENTS
  6.1 Item Definition
    6.1.1 General Description
    6.1.2 External Interfaces
    6.1.3 Internal Interfaces
    6.1.4 Modes
7 CHARACTERISTICS
  7.1 Performance Characteristics
  7.2 Physical Characteristics
    7.2.1 X‐part Correlator Boards
    7.2.2 X‐part Racks
    7.2.3 F‐part Boards and Racks
    7.2.4 X‐part Synchronization
    7.2.5 Rack Power Delivery and Control
    7.2.6 F‐part Data Insertion into the Correlator
    7.2.7 M&C Network
    7.2.8 Output Data Network
    7.2.9 Thermal Considerations
  7.3 Electrical Characteristics
  7.4 Reliability
  7.5 Maintainability
  7.6 Availability
  7.7 Additional Quality Factors
  7.8 Environmental Conditions
  7.9 Transportability
  7.10 Flexibility and Expandability
  7.11 Portability
8 DESIGN AND PRODUCTION
  8.1 Components, Materials and Process
  8.2 Development and Production Plan
    8.2.1 Development and Production Schedule
  8.3 Electromagnetic Radiation
  8.4 Manufacturer Nameplate and Product Marking
  8.5 Industrial Standardisation
  8.6 Interchangeability
  8.7 Safety
  8.8 Ergonomics
  8.9 Confidentiality and Protection
  8.10 Supplies from the Contracting Authority
  8.11 Resource Reserve Capabilities
  8.12 Documentation
  8.13 Logistics
  8.14 Personnel and Training
    8.14.1 Personnel
    8.14.2 Training
  8.15 Characteristics of Secondary Items
  8.16 Priority
9 COST AND POWER ESTIMATES
  9.1 Cost Table
  9.2 Power Table
  9.3 Operating Costs
    9.3.1 Power
    9.3.2 Staffing
    9.3.3 Maintenance
    9.3.4 Spares & Replacements
10 QUALITY ASSURANCE PROVISIONS
  10.1 General
    10.1.1 Qualification
    10.1.2 Acceptance
  10.2 Requirements Conformance Verification
    10.2.1 Verification Methods
    10.2.2 Verification Matrix
11 DELIVERY PROVISIONS
12 NOTES
13 APPENDICES
  13.1 X421 ASIC Data Sheet

LIST OF FIGURES

Figure 4‐1: Simplified block diagram of the ‘R’ antenna‐beam element F‐part of the correlator.

Figure 4‐2 Cross‐correlation systolic array.

Figure 4‐3 GSA board arrangement with short, nearest‐neighbour connections both horizontally between racks, and vertically within the same rack.

Figure 4‐4 8x8 array of X421 correlator ASICs on a 19” rack‐mount board. 27 x 27 mm F672 devices are shown, with 5 mm spacing between chips. The 10G signal “Repeaters” are likely FPGAs, or hardcopy FPGAs with enough MGTs to satisfy the requirements. These devices can also be used to stimulate and standalone test the board.

Figure 4‐5 Information brief on 0XT connectors. (a) is a picture of the right‐angle PCB‐mount connector. (b) shows the surface‐mount solder tails on signal lines, which micro‐via down to the next layer if necessary. (c) is a 10G eye diagram of a signal going across the connector. (d) is the mating connector, which would be installed in the interconnect board; the mid‐plane connector consists of two of these, back‐to‐back.

Figure 4‐6 Rear view of rack showing mid‐planes, horizontal, and vertical “printed wiring” patch boards, eliminating interconnect cabling.

Figure 4‐7 9 column x 8 row chip layout on the board to allow the board to be split into two independent correlation triangles.

Figure 4‐8 Visibility capture regions for two output grid points.

Figure 4‐9 Visibility capture region for a CPU producing a square area of grid points.

Figure 4‐10 12x12 correlator matrix showing VBA routing paths from source correlator boards (YELLOW) to a destination correlator board/gridding CPU (MAGENTA).

Figure 4‐11 Possible overlay of gridding CPUs to grid points, although CPUs 79, 80, 81 do not exist.

Figure 4‐12 Correlator board output data path routing to nearest neighbour boards, and to gridding CPUs. The nearest‐neighbour network has 2X the bandwidth of native output data bandwidth, a somewhat arbitrary choice at this point.

Figure 4‐13 Possible front‐panel of board, with connectors for nearest‐neighbour VBA network connections.

Figure 4‐14 Section of two racks, showing inexpensive VBA nearest‐neighbour wideband network connections, using identical patch boards as used at the rear of the board/rack. Quad 100G fibers from each board then route packets to gridding CPUs in a point‐to‐point fashion.

Figure 4‐15 Front region of correlator board, and front panel, with gridding CPUs subsumed into the correlator in the form of “Field Replaceable Processor Packs”.

Figure 4‐16 Data re‐arrangement to produce ~1 kHz spectral resolution across ~74 MHz/polarization of bandwidth, using the X421 chip. Accumulator double‐buffering in the chip allows one band/slice to be correlating/integrating, while the other band/slice results are transmitted on the 4 x 10G outputs.

Figure 7‐1 Possible full‐scale SKA X‐part correlator layout, using the baseline X421 ASIC, measuring ~34 m W x 22 m D. Dish‐WBSPF: 1.05 GHz/polarization, 3072 elements. Dish‐PAF: 750 MHz/polarization, 2048 elements, 30 beams. SAA: 300 MHz/polarization, 256 elements, 1000 beams. DAA: 300 MHz/polarization, 256 elements, 1000 beams. Enough ‐48 VDC power plant capacity is shown for ~3.2 MW (as shown in the table on page 53, more like 27 x 200 kW power plants are required). This layout assumes air cooling as shown in Figure 7‐2. Not shown are the F‐parts of the correlator, or any COTS computers or network equipment for M&C, image processing, or non‐visibility processing.

Figure 7‐2 Centralized air‐blower cooling arrangement.

Figure 8‐1 Rough Gantt chart of X‐part correlator development, including ASIC development with a fabrication start time at the beginning of 2016. Some of the activities could have earlier start times, in particular preliminary X‐board feasibility study/development, software development, rack and sub‐rack design, and FPGA design. ASIC RTL development and test is shown beginning of 2012, which may be a bit premature, depending on how accurate future technology forecasts are.

LIST OF TABLES

Table 5‐1 Possible risks and risk mitigation strategy.

Table 9‐1 Cost Table.

Table 9‐2 Power Table.

LIST OF ABBREVIATIONS

ASIC     Application-Specific Integrated Circuit
BGA      Ball Grid Array
CNN      CPU Node Number
CoDR     Conceptual Design Review
COTS     Commercial Off-The-Shelf
DAA      Dense Aperture Array
DDR      Double Data Rate
eDRAM    Embedded DRAM
EVLA     Expanded Very Large Array
FFT      Fast Fourier Transform
FPGA     Field Programmable Gate Array
FRPP     Field-Replaceable Processor Pack
FX       Fourier transform, followed by spectral multiplication
GSA      Giant Systolic Array
I/O      Input/Output
LTA      Long-Term Accumulator
MAC      Multiplier-Accumulator
MGT      Multi-Gigabit Transceiver
PAF      Phased-Array Feed
RAM      Random Access Memory
SAA      Sparse Aperture Array
SDRAM    Synchronous Dynamic RAM
SFP      Small Form-factor Pluggable
SKA      Square Kilometre Array
SM       Surface Mount
VBA      Visibility-Based Addressing
Void     Section Intentionally blank to be completed at a subsequent issue
WBSPF    Wide-Band Single Pixel Feed
X421     Baseline spectral correlator chip, used in this concept. 4 bits, 2k channels, 1k baselines.

1 Introduction

1.1 Purpose of the document

The purpose of this document is to provide a complete concept‐level description of the GSA (Giant

Systolic Array) correlator [1] for the signal processing Conceptual Design Review to be held in April

2011 in Manchester.

The GSA concept has been refined since SKA Memo 127 [1] was released: a detailed correlator ASIC pseudo‐data sheet/specification has been produced, bearing in mind all SKA element correlation requirements, and the problem of routing correlator data products to the image processing system has been considered. Both of these additions are included in this document, along with a standalone concept description of the GSA itself.

1.2 Simplified Overview Description

The baseline GSA concept provides a straw‐man physical/electrical mechanism for arranging ‘X’ correlator boards (i.e. the ‘X’ part of an ‘FX’ correlator) so that the correlator can grow to handle very large interferometer arrays of thousands of elements, without the need for a large monolithic “corner turner”. Instead, a distributed partial corner turner of ‘R’ elements may be used. There are advantages in making ‘R’ larger (section 4.2.4), and if ‘R’ equals the number of elements in the array, the concept essentially morphs into forms similar to those which assume a complete corner turner [2]. There is likely a trade‐off between the size of the corner turner and the size of the correlator. However, the GSA concept does not require a large corner turner, so it is easy to envision an actual implementation in concrete terms without relying on any particularly difficult technology or wiring. A hybrid approach, which incorporates multiple concepts into one, may also be possible.

Each correlator ‘slice’ processes a given bandwidth, with a given number of spectral channels. This

slice can be part of a much larger bandwidth, part of the same interferometer beam, or can be some

slice of some bandwidth of any particular interferometer beam. The concept implementation, and

in particular the granularity of the circuit board and the ‘X’ processing chip, is chosen so as to allow

for processing of a few hundred elements up to a few thousand elements. It is thus applicable to all

element‐type telescopes envisioned for the SKA, namely SAAs, DAAs, dish‐PAFs, and dish‐WBSPFs.

The heart of the GSA is an ‘X’ correlator ASIC, the capability of which largely determines the cost,

physical scale, and power dissipation of the ensuing system(s). A baseline ASIC pseudo‐data/spec

sheet—the ‘X421’—has been developed, the first 2 pages of which are included in the appendix of

this document. Currently a cost/power study is being performed on this chip, and the results of this

study will allow reasonably accurate system power dissipation forecasting, as well as acting as a

baseline for predicting the cost and power of devices that might be more ambitious or use more

advanced technology. This chip is specified for reliability and manufacturability; it has minimal

physical I/O signals for maximum manufacturing yield and solder‐joint reliability, and all storage

RAM is on chip. The chip, in conjunction with the overall ‘F’ and X‐part design, takes a holistic

approach in that it contains facilities for transporting complete data products to the image

processing system while minimizing data output rates, with the goal of minimizing the magnitude of

the image processing problem. Indeed, an option is presented which might allow the image

processing problem to be subsumed into the correlator (Figure 4‐15).

This document’s concept description also briefly touches on how and where the central beamformer will tap into the data paths; however, the details of the central beamformer concept are contained in a separate CoDR document.

2 References

[1] Carlson, B., “The Giant Systolic Array (GSA): Straw‐man Proposal for a Multi‐Mega Baseline Correlator for the SKA”, SKA Memo 127, August 2010. (http://www.skatelescope.org/PDF/memos/127_Memo_Carlson.pdf)

[2] D’Addario, L., “Low‐Power Correlator Architecture For the Mid‐Frequency SKA”, SKA Memo xxx, January 2011.

[3] Crochiere, R.E., “A Weighted Overlap‐Add Method of Fourier Analysis/Synthesis”, IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP‐28, No. 1, pp. 99‐102, February 1980.

[4] Kung, H.T., Leiserson, C.E., “Systolic Arrays (for VLSI)”, Sparse Matrix Proceedings 1978, Society for Industrial and Applied Mathematics, 1979, pp. 256‐282, ISBN: 0‐89871‐160‐6.

[5] http://www.erni.com/DB/PDF/ERmetzeroXT/ERNI-DesignCon2005-Paper.pdf

[6] http://www.erni.com/ermetzeroxtfront.htd

[7] Dewdney, P., et al., “SKA Phase 1: Preliminary System Description”, SKA Memo 130, November 2010.

[8] http://www.critical-embedded-systems.com/meecc/2005/presentations/Pautsch-CoolCON.pdf

[9] Reliability at 50 C junction temperature reference. (TBD)

[10] Rashdan, M., Yousif, A., Haslett, J., Maundy, B., “A new time‐based architecture for serial communication link”, 16th IEEE International Conference on Electronics, Circuits, and Systems, pp. 531‐534, 2009.

[11] Rashdan, M., Yousif, A., Haslett, J., Maundy, B., “Data link design using a time‐based approach”, Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 3977‐3980, 2010.

3 Hierarchy

3.1 Hierarchical Lifecycle

Void

4 Element Level: Signal Processing

The GSA correlator is an ‘FX’‐style correlator. Given the concession of re‐quantizing spectral channels after the FFT to 4 bits (it is assumed that the Re and Im parts require 4 bits each), and the proliferation of technology which allows very high data rates on a single wire pair (see footnote 1), FX is the only really viable option. To provide sufficient numbers of spectral channels, the correlator is therefore essentially “distributed RAM‐dominated”, rather than compute or “copper” dominated.
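As a rough illustration of the 4‐bit re‐quantization mentioned above, the following Python sketch packs the Re and Im parts of one spectral sample into a single byte. The scale factor, the symmetric clipping range of -7 to +7, the two's‐complement encoding, and the nibble order are all assumptions made for illustration; this document does not specify them, and the function names are hypothetical.

```python
def to_nibble(value, scale):
    """Round to nearest and clip to the symmetric 4-bit range [-7, 7] (assumed range)."""
    q = int(round(value / scale))
    return max(-7, min(7, q))


def pack_sample(sample, scale=1.0):
    """Pack one complex spectral sample into a byte.

    Assumed layout: Re in the high nibble, Im in the low nibble,
    each stored as a 4-bit two's-complement value.
    """
    re = to_nibble(sample.real, scale) & 0x0F
    im = to_nibble(sample.imag, scale) & 0x0F
    return (re << 4) | im


def unpack_sample(byte):
    """Recover the quantized complex value from a packed byte."""
    def signed(nibble):
        return nibble - 16 if nibble > 7 else nibble
    return complex(signed(byte >> 4), signed(byte & 0x0F))


packed = pack_sample(3.2 - 1.6j)
print(hex(packed), unpack_sample(packed))
```

With 4 bits each for Re and Im, every spectral sample occupies one byte, which is the figure used when sizing RAM and link bandwidth in sketches like this one.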

The ‘F‐part’ of the correlator is not covered in significant detail, but is described briefly to show that it is a reasonably straightforward problem to solve. The F‐to‐X data transmission is all point‐to‐point, and so requires nothing other than commercial short‐haul (order of 10‐100 m) copper or, more likely, fibre technology. The data distribution within the X‐part of the correlator is performed entirely without cabling, and is flexible enough to correlate hundreds to thousands of elements.

4.1 F‐part

The F‐part of the correlator consists of a geometric delay and a (poly‐phase) FFT [3], followed by a short channel‐burst buffer and then packet builders/mergers. Each data packet destined for the X‐part of the correlator carries ‘R’ antennas, each with 1 spectral channel, for a channel‐burst length of ‘J’ (see footnote 2), for a particular band/slice. ‘K’ such packets make up an entire “Packet (channel) Group”, which contains all of the channels that are to be correlated by the particular band/slice correlator. Each Packet Group also includes, for each antenna‐beam, an “INFO Packet”, which provides information to facilitate X‐part integration/data compression and labels the data for transmission to, and processing by, the downstream image processing system. The information in the INFO Packet is calculated and inserted into the packet stream in real time by a CPU, from information supplied by a centralized “meta‐data” source.
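The packet structure described above can be sketched as a small sizing helper in Python. This is only an illustration under stated assumptions: the class and field names are hypothetical, each sample is assumed to occupy one byte (4‐bit Re plus 4‐bit Im), the burst length is assumed to count samples per antenna, and headers and INFO Packets are left out because their sizes are not given here. The example values (R = 8, J = 64, K = 2048) are illustrative choices, not a specified configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketGroupLayout:
    """Payload sizes of one Packet Group for a single band/slice (illustrative only)."""
    r_antennas: int            # 'R': antennas carried in each data packet
    j_burst: int               # 'J': channel-burst length (assumed samples per antenna)
    k_packets: int             # 'K': data packets (one spectral channel each) per group
    bytes_per_sample: int = 1  # assumption: 4-bit Re + 4-bit Im packed into one byte

    def packet_payload_bytes(self) -> int:
        """Payload of one data packet: R antennas x J samples of one channel."""
        return self.r_antennas * self.j_burst * self.bytes_per_sample

    def group_payload_bytes(self) -> int:
        """Payload of all K data packets; INFO Packets and headers are excluded."""
        return self.k_packets * self.packet_payload_bytes()


layout = PacketGroupLayout(r_antennas=8, j_burst=64, k_packets=2048)
print(layout.packet_payload_bytes(), layout.group_payload_bytes())
```

Keeping ‘R’, ‘J’, and ‘K’ as separate fields mirrors the text: ‘R’ sets the partial corner‐turn width, ‘J’ the burst length, and ‘K’ the number of channels per band/slice.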

A simplified diagram of an ‘R’ antenna‐beam element F‐part of the correlator is shown in Figure 4‐1. No detailed design of this part of the correlator is provided in this concept description, but it is reasonable to assume that for small ‘R’ (~8) the processing shown can easily fit within several devices on one circuit board, and that possibly, although not necessarily, multiple such ‘R’‐element F‐parts could fit on one circuit board. Thus, there is simple point‐to‐point connectivity from this part to each correlator band/slice.

Footnote 1: It would seem that the cost/complexity of bandwidth connecting boards/chips has out-paced the rate of packing silicon on a chip (or more importantly, the cost and power of doing so), although no formal data is presented to support this statement.

Footnote 2: In the X421 chip, the minimum required channel-burst length ‘J’ is 64.

Figure 4‐1: Simplified block diagram of the ‘R’ antenna‐beam element F‐part of the correlator.

The F‐part of the correlator is the same for every SKA element in concept (i.e. element as far as the

correlator is concerned), whether it be SAA, DAA, dish‐PAF, or dish‐WBSPF. From the X‐part of the

correlator’s perspective, there is no difference other than the number of elements (‘Ants’) that must

be cross‐correlated.

Additionally, it is likely the case that the signal entering the correlator is already coarsely

channelized, or possibly even completely channelized, and so the actual location of the FFT

filterbank is not finalized yet and does not yet need to be finalized for the basic concept to remain

sound.

In the GSA concept, each “band/slice” correlator for a particular SKA element type is identical.

4.2 X‐part

The X‐part of the correlator must calculate, for every spectral channel, the cross‐correlation coefficient for every baseline, i.e. every antenna‐antenna pair. For N antennas there are therefore approximately N²/2 baselines (N(N+1)/2 exactly, including auto‐correlations) that must be correlated. A simple systolic‐array [4] arrangement that might accomplish this processing, showing the correlation elements and data paths, is shown in Figure 4‐2.
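To illustrate the arithmetic that the X‐part performs, the following minimal Python sketch accumulates the cross‐products for a single spectral channel and counts the baselines. It is a plain nested loop rather than a systolic array, it ignores polarization, and the function names are hypothetical; it is meant only to show what each baseline accumulates.

```python
def correlate_channel(samples):
    """Accumulate cross-products for one spectral channel.

    samples: list with one entry per antenna, each a list of complex samples.
    Returns {(i, j): sum over t of x_i[t] * conj(x_j[t])} for all i <= j,
    so auto-correlations sit on the i == j entries.
    """
    n_ant = len(samples)
    vis = {}
    for i in range(n_ant):
        for j in range(i, n_ant):
            vis[(i, j)] = sum(a * b.conjugate()
                              for a, b in zip(samples[i], samples[j]))
    return vis


def n_baselines(n_ant):
    """Exact baseline count including auto-correlations: N(N+1)/2."""
    return n_ant * (n_ant + 1) // 2


samples = [[1 + 1j, 2 + 0j], [0 + 1j, 1 - 1j], [1 + 0j, 1 + 0j]]
vis = correlate_channel(samples)
print(len(vis), n_baselines(3072))
```

In a systolic arrangement the same products are formed, but each node sees only the antenna streams passing through its row and column instead of the whole sample set.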

[Figure 4‐1 block diagram (labels recovered from the drawing): for each of Ant‐Beam‐1 to Ant‐Beam‐R, a Station‐beam F‐Processing Element consisting of Delay, Poly‐phase FFT Filterbank (BW‐MHz), and Channel‐Burst Buffer (BK x J array), together with INFO Packets, feeding Packet Builder/Mergers for band/slices 1, 2, ..., B, whose outputs go to correlator band/slices 1, 2, ..., B.]

Figure 4‐2 Cross‐correlation systolic array.

The required number of systolic array nodes is determined by the number of antennas that are to be processed, and the number of baselines that can be correlated in each node. Thus, for a very large number of antennas, processing as many baselines as possible in each node is advantageous. Also, each array node can be hierarchical in nature; in the baseline GSA concept a top‐level node is a circuit board, which consists of an array of sub‐nodes, each of which is implemented in an ASIC.

Assuming that the entire matrix does not fit on a single convenient entity (circuit board), one way of

connecting adjacent circuit boards together to form the entire matrix is shown in Figure 4‐3. In this

scheme, the boards are mounted horizontally in adjacent racks, facilitating horizontal and vertical

nearest‐neighbour connections. The bold BLUE and RED lines in the figure show data flow for one

particular set of antenna data through the array. The end of this data flow (i.e. racks on the ends of

the matrix), is where F‐part data is accessed for central beam‐forming. Central beam‐forming

hardware can then reside in equipment in additional racks on either end.

The number of band/slice correlation matrices/triangles which can fit in a rack is determined by the

vertical occupancy of the board, likely 1U height, and by the number of racks that can be stacked

side‐by‐side, fundamentally only limited by the mechanical design of the installation. Typically, 48U

of vertical space could be made available, allowing for a 48 x 48 matrix, or many smaller such

band/slice matrices within a smaller adjacent rack footprint.


Figure 4‐3 GSA board arrangement with short, nearest‐neighbour connections both horizontally between

racks, and vertically within the same rack.

4.2.1 Baseline GSA Correlator Board (Dishes)

The baseline X421 ASIC design provides for 1024 baselines (32 x 32 antennas), full polarization, 2048 channels per product, at up to 75 MHz per polarization. There are 4, 10G (see footnote 3) ‘X’ inputs and 4, 10G ‘Y’ inputs, each input containing real‐time spectral channel data from 8 antennas in the manner outlined in section 4.1. The device has very few I/Os (~72 signal contacts plus power and GND), requires no external chips other than filtering capacitors and power supplies, and will likely fit into a 27x27 mm F672 BGA package (see footnote 4). It is no stretch to consider that an 8x8 array of such chips could easily fit on a board which can mount horizontally in a sub‐rack (crate, shelf), which then mounts in a 19” rack. (Alternatively, it may be more cost‐effective to have 4X capacity on a chip, and a 4 x 4 array of chips on a board.) A preliminary layout of such a board, which might be used as one element in a 12x12 systolic array (see Figure 4‐2), allowing for cross‐correlation of 3072 SKA dishes, is shown in Figure 4‐4. The layout allocates room for a 5 mm gap between chips on the board, needed for SM‐rework.
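The board-level numbers quoted above follow directly from the chip parameters. The sketch below reproduces that arithmetic in Python; the constant and function names are illustrative and are not part of the X421 specification.

```python
ANTENNAS_PER_10G_INPUT = 8   # antennas multiplexed onto each 10G input
INPUTS_PER_CHIP_SIDE = 4     # 4 'X' inputs and 4 'Y' inputs per X421
CHIPS_PER_BOARD_SIDE = 8     # 8x8 array of chips on a board

antennas_per_chip_side = ANTENNAS_PER_10G_INPUT * INPUTS_PER_CHIP_SIDE    # 32
baselines_per_chip = antennas_per_chip_side ** 2                          # 32 x 32
antennas_per_board_side = antennas_per_chip_side * CHIPS_PER_BOARD_SIDE   # per 'X' or 'Y' edge
baselines_per_board = baselines_per_chip * CHIPS_PER_BOARD_SIDE ** 2
lanes_per_board_side = INPUTS_PER_CHIP_SIDE * CHIPS_PER_BOARD_SIDE        # 10G pairs per edge


def boards_per_matrix_side(n_antennas: int) -> int:
    """Boards needed along one side of the systolic array (ceiling division)."""
    return -(-n_antennas // antennas_per_board_side)


print(baselines_per_chip, antennas_per_board_side, boards_per_matrix_side(3072))
```

With these values each board edge carries 256 antennas on 32 x 10G pairs, so 3072 dishes need 12 boards along each side of the array.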

With 64 such chips on a board, there are a total of 64 x ~72 ~= 4600 signal solder contacts; with a raw manufacturing solder‐joint defect rate of 50 DPMO (Defects Per Million Opportunities; see footnote 5), there are ~0.2 defects per board. Normally though, these defects are not uniformly distributed, so in practice few boards should actually experience defects, and those that do tend to have multiple defects. By comparison, the EVLA Baseline Board had ~100k solder points (total; ~50k signal solder points), and had a raw manufacturing yield of about 75%. With ~1/10th as many solder joints, the raw yield should therefore be quite high, and the aggregate solder‐joint failure rate correspondingly low.

3 ‘10G’ is 10 Gigabits per second clear channel on a single differential pair, likely 64B/66B encoding for an actual line rate of 10.3125 Gbps.

4 Up to a 35x35 mm package could be accommodated within the 19” rack-mount board width.

5 As per the EVLA experience, 50 DPMO is considered “standard” in the industry for a well-controlled solder process.
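The defect estimate can be reproduced with a few lines of Python. The Poisson step at the end is an added assumption (independent defects) that the text explicitly says does not hold in practice, so it is only a rough reference figure; clustering of defects would likely leave a larger fraction of boards defect-free.

```python
import math

CHIPS_PER_BOARD = 64
SIGNAL_CONTACTS_PER_CHIP = 72   # "~72" in the text
DPMO = 50                       # defects per million opportunities


def expected_defects(n_joints: int, dpmo: float = DPMO) -> float:
    """Mean number of solder-joint defects for a board with n_joints joints."""
    return n_joints * dpmo / 1e6


gsa_joints = CHIPS_PER_BOARD * SIGNAL_CONTACTS_PER_CHIP   # signal contacts per board
gsa_defects = expected_defects(gsa_joints)                # about 0.2 per board

# Assumption, not from the text: independent defects, so the count per board
# is Poisson distributed and P(no defects) = exp(-mean).
p_defect_free = math.exp(-gsa_defects)

print(gsa_joints, round(gsa_defects, 2), round(p_defect_free, 2))
```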

32 x 10G pairs enter, are repeated, and exit the board via 40‐pair 0XT (“Zero X Tee”) connectors in

the horizontal and vertical directions in a nearest neighbour scheme (as shown in Figure 4‐3). 0XT

connectors are similar to HM‐Zd connectors, except they achieve higher performance with surface‐

mount signal contacts and press fit ground contacts, rather than press‐fit signal contacts (and the

resulting short transmission‐line stubs which affect signal quality). An information brief on the 0XT

connector is shown in Figure 4‐5. Further information on the 0XT connector can be found in [5] and

[6].

Figure 4‐4 8x8 array of X421 correlator ASICs on a 19” rack‐mount board. 27 x 27 mm F672 devices are

shown, with 5 mm spacing between chips. The 10G signal “Repeaters” are likely FPGAs, or hardcopy FPGAs

with enough MGTs to satisfy the requirements. These devices can also be used to stimulate and standalone

test the board.

[Figure 4‐4 labels: board dimensions ~16.5" x ~15"; 8 x 8 array of X421 devices; Repeater (horizontal direction and vertical direction); 40‐pair 0XT (IN) and 40‐pair 0XT (OUT) connectors; lane counts 4 and 32; Power Supply blocks; ‐48 PWR + M&C; M&C SFP; M&C FPGA; LEDs; JTAG; Data Output Transport; Data Output Connectors.]

There is a single M&C (Monitor and Control) FPGA on the board, which communicates with the

outside world via (for example, but not restricted to) a 1G Ethernet SFP. For wiring simplicity, all

communications between the M&C FPGA and the X421 chips are via 8 I2C busses.

Note that in memo 127 [1] there is a low‐latency 10G repeater and globally distributed reference

clock. It is believed that with a small matrix (i.e. ≤ 12 x 12) this specialized repeater is not necessary

and standard asynchronous rate‐matching methods can be used, eliminating the need for a globally‐

distributed reference clock. Also, the fault tolerant passive‐bypass method in memo 127 is likely

problematic from a signal integrity standpoint, and unnecessary with so few signal repeats.

Figure 4‐5 Information brief on 0XT connectors. (a) is a picture of the right‐angle PCB‐mount connector. (b)

shows the surface‐mount solder tails on signal lines, which micro‐via down to the next layer if necessary. (c)

is a 10G eye diagram of a signal going across the connector. (d) is the mating connector, which would be

installed in the interconnect board; the mid‐plane connector consists of two of these, back‐to‐back.

The “Data Output Transport” and “Data Output Connectors” in Figure 4‐4 are not defined in the figure, but some ideas on what form these might take are described in section 4.3. Note that due to the large memory capacity of the X421 chip, each chip has 4 x 10G data output lanes designed to be operated in a daisy‐chain fashion with other chips in a column. If these lanes are daisy‐chained amongst all 8 chips in a column, then the smallest integration time that can be accommodated (on every product) is ~170 msec, requiring 32, 10G outputs (see footnote 6). If 85 msec integration times are required (daisy‐chaining 2 groups of 4 chips in a column), there will be 64, 10G outputs, matching the data input rate into the board(!). A more dynamic scheme could achieve higher dump rate performance by adjusting the output word width based on integration time. This parameter space remains to be explored, and is not part of the baseline X421 specification. Nevertheless, even with a 16‐bit visibility word size, the output data rates are still enormous.

6 For 32-bit (64-bit complex) spectral visibilities, with a data valid count per spectral visibility to allow for per-channel temporal RFI flagging.
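A rough estimate of the minimum integration time can be sketched as below. Two inputs are assumptions rather than stated facts: the data valid count is taken to be 32 bits wide, and each 10G lane is taken to carry 10 Gbps of payload with no packet overhead. Under those assumptions the estimate comes out at roughly 160 msec for 8 chips per chain and 80 msec for 4, in the neighbourhood of the ~170 msec and 85 msec quoted above; the remaining difference is presumably overhead that is not modelled here.

```python
BASELINES_PER_CHIP = 1024
POL_PRODUCTS = 4               # full polarization
CHANNELS_PER_PRODUCT = 2048
LANES_PER_CHAIN = 4            # 4 x 10G output lanes, daisy-chained down a column
LANE_RATE_BPS = 10e9           # assumed payload rate of one 10G lane

# Assumption: 64-bit complex visibility plus a 32-bit data valid count.
BITS_PER_VISIBILITY = 64 + 32


def min_integration_time(chips_per_chain: int, bits: int = BITS_PER_VISIBILITY) -> float:
    """Seconds needed to dump every product of every chip in one daisy chain."""
    vis_per_chip = BASELINES_PER_CHIP * POL_PRODUCTS * CHANNELS_PER_PRODUCT
    total_bits = chips_per_chain * vis_per_chip * bits
    return total_bits / (LANES_PER_CHAIN * LANE_RATE_BPS)


def board_outputs(chips_per_chain: int, chips_per_column: int = 8, columns: int = 8) -> int:
    """Number of 10G outputs leaving the board for a given chain length."""
    chains = (chips_per_column // chips_per_chain) * columns
    return chains * LANES_PER_CHAIN


for chain in (8, 4):
    print(chain, board_outputs(chain), round(min_integration_time(chain), 3))
```

Passing `bits=32 + 32` (a 16-bit visibility word with the same assumed count width) shows how the output word width trades against dump rate.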

4.2.2 Board‐to‐Board Interconnections (Dishes)

As indicated in Figure 4‐3, with horizontal board mounting, adjacent boards can be connected to

each other in a nearest‐neighbour fashion. One way of achieving this is with short cables, but these

are quite expensive because of performance and the number of connections and wire to connector

contacts that must be manufactured. Perhaps a better way is to use “printed wiring connections” in

the form of small “patch boards” (passive PCBs with press‐fit or SM connectors), which plug into the

rear of mid‐planes, into which the correlator PCB is itself plugged in via the front. Thus, all boards

are connected together without the need for expensive cables. (A similar patch board scheme was

used for interconnecting adjacent boards in the EVLA correlator; the board was larger and contained

more connections than required here, and cost ~$75 ea. in small quantities, ~10X savings in cost

over cable.)

Figure 4‐6 shows one small section of the rear of the racks indicating the basic layout of the mid‐

planes and interconnect patch boards.

Figure 4‐6 Rear view of rack showing mid‐planes, horizontal, and vertical “printed wiring” patch boards,

eliminating interconnect cabling.

This interconnect scheme requires higher precision adjacent rack mechanics than normally is

available in COTS 19” rack‐mount systems, likely requiring custom‐engineered racks and sub‐racks.

It is also possible to provide a range of mid‐plane and patch board granularities to simplify and speed

up installation time. The population of each rack in terms of correlator boards and mid‐planes is

identical; which part of the correlation triangle the rack occupies is set by the specific population of

the interconnection patch boards.

[Figure 4‐6 labels: IN and OUT connectors; Rack Support Columns; Direction turner patch board; Vertical patch board; Horizontal patch board; HMZd Midplane; Input from F‐part; Vertical stiffening bar; 48 V power studs; Remote power M&C studs.]

For cross‐correlation of 3072 elements, a 12 x 12 correlation triangle of boards is required (for the baseline X421 chip design, and the board layout shown in Figure 4‐4). Dual correlation systolic array matrices (triangles) can be fitted back‐to‐back, and so in a 13U x 12‐rack space, 3072 antennas can be correlated, 150 MHz per polarization, 4096 channels per polarization product. If another column of chips is added to the board so that the board itself may be split into two correlation triangles for boards along the diagonal (one triangle for one band/slice and the other triangle for the other band/slice), then a 12U x 12‐rack space is required, and 12 boards are saved, although likely at the expense of having to use interconnect cables rather than patch boards for the shared boards along the diagonal of the triangle. In such an arrangement, with 48U of vertical rack space available, 4 such dual‐triangle arrays can be accommodated, for a total bandwidth of 600 MHz and 16,384 channels per polarization product.
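The board and rack-space counts in this paragraph can be checked with the short Python sketch below. It assumes one board per 1U of rack height, as suggested earlier in the section; the names are illustrative.

```python
def triangle_boards(n_side: int) -> int:
    """Boards in one correlation triangle, diagonal boards included."""
    return n_side * (n_side + 1) // 2


N_SIDE = 12                       # 12 x 12 triangle for 3072 antennas
BW_PER_TRIANGLE_MHZ = 75          # per polarization, baseline X421
CHANNELS_PER_TRIANGLE = 2048      # per polarization product
RACK_HEIGHT_U = 48                # assumes each board occupies 1U

single = triangle_boards(N_SIDE)
dual_back_to_back = 2 * single                       # fills 13U x 12 racks
dual_shared_diagonal = dual_back_to_back - N_SIDE    # fills 12U x 12 racks

arrays_per_rack_height = RACK_HEIGHT_U // 12         # dual-triangle arrays stacked vertically
total_bw_mhz = arrays_per_rack_height * 2 * BW_PER_TRIANGLE_MHZ
total_channels = arrays_per_rack_height * 2 * CHANNELS_PER_TRIANGLE

print(single, dual_back_to_back, dual_shared_diagonal, total_bw_mhz, total_channels)
```

The 12 boards saved by sharing the diagonal are exactly the difference between the 13U and 12U arrangements.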

This split also optimizes the board for cross‐correlation of up to 256 antennas, allowing the board to

be used for SAA and DAA correlation. The basic chip layout of the board, not showing the board

outline, signal repeaters, or output connectors is shown in Figure 4‐7.

Figure 4‐7 9 column x 8 row chip layout on the board to allow the board to be split into two independent

correlation triangles.

Note that the baseline X421 chip design supports the necessary data path switching as indicated in

Figure 4‐7.

[Figure 4‐7 labels: 40‐pair 0XT connectors; 4 x 10G; 32 x 10G; “256 antennas, 2 bands of 50 MHz/pol’n per board”.]

Although the same board design could be used for dishes, SAAs, and DAAs, in cases where the left‐

most column of Figure 4‐7 is unused, those chips are wasted. It might therefore be more prudent to

have two board designs, one with an 8x8 array of chips and one with a 9x8 array of chips.

4.2.3 SAA and DAA Board Interconnections

For the SAA and DAA application where there are ≤256 elements to cross‐correlate, there is no need

for any board‐to‐board interconnection. Each board (Figure 4‐7) contains two independent

band/slice 256‐element cross‐correlators, each of which (with the baseline X421 chip) is capable of

75 MHz per polarization, 2048 channels per polarization product. Thus, boards can be populated in

a rack in either a vertical or horizontal orientation, with point‐to‐point connections from the F‐part

to correlator boards in the X‐part.

The total number of boards required is n_beams x (bandwidth per beam‐polarization) / 150 MHz, since each board processes two 75 MHz band/slices. For example, for 1000 beams and 300 MHz/polarization, 2000 boards are required. Assuming the horizontal mounting strategy and 48 boards per rack, 42 racks are required. If not enough spectral channels are available in “normal” operating mode, the method described in section 4.5.3 could be used to obtain more channels, but with longer integration times, as the output data volume per integration increases.
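The sizing rule above is simple enough to capture in a small helper. This is a sketch only; the function names are not taken from the document.

```python
import math

BOARD_BW_MHZ = 150      # two 75 MHz band/slices per board (baseline X421)
BOARDS_PER_RACK = 48    # horizontal mounting, one board per 1U


def saa_daa_boards(n_beams: int, bw_per_beam_pol_mhz: float) -> int:
    """Boards needed to correlate n_beams at the given bandwidth per polarization."""
    return math.ceil(n_beams * bw_per_beam_pol_mhz / BOARD_BW_MHZ)


def racks_needed(n_boards: int, boards_per_rack: int = BOARDS_PER_RACK) -> int:
    """Racks needed to hold n_boards, rounding up to whole racks."""
    return math.ceil(n_boards / boards_per_rack)


boards = saa_daa_boards(1000, 300)
print(boards, racks_needed(boards))
```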

4.2.4 Discussion—System Effects of ASIC Capacity

If the ASIC capacity is instead such that it processes 4k baselines, 1024 spectral channels per

product, with 16 antennas multiplexed onto one 10G data stream (an ‘X414’ design), and ½ the

bandwidth (37.5 MHz/polarization) then the correlation triangle for 3072 antennas is reduced to 6 x

6, or 18 boards, which might fit into a larger‐than‐19” sub‐rack, eliminating the need for horizontal

board mounting and requiring only a monolithic backplane per sub‐rack to distribute real‐time data.

Or, the horizontal mounting scheme could still be used, and two such triangles (75 MHz per

polarization) would fit in a 6 x 6 arrangement of 36 boards. Therefore, 150 MHz per polarization

3072 antennas would fit into 72 boards, or ½ the 12 x 12 board requirement for the X421 design, a

savings of a factor of 2. The total memory capacity, for the same spectral resolution, however,

remains the same. This assumes that that much memory (2X the X421) can fit on the chip, or that

the channel‐burst factor ‘J’ is significantly increased and off‐chip RAM is used, complicating the

board design and impacting board yield and reliability.

In this case, the ASIC memory capacity is double the X421 design, and all of the multiply/accumulate

(MAC) logic runs at 156.25 MHz. This is very slow; running the logic at 625 MHz (the X421 runs at

312.5 MHz, requiring very little if any pipelining, itself a savings in power and area) would allow a

time‐multiplexing of baselines by a factor of 4, reducing the MAC logic by a factor of 4 (so that it is

the same as the X421 design). This is a more complicated design, and would require that a factor of

4X the number of correlated data products exit each board. As more antennas are multiplexed into

a 10G link, the architecture essentially morphs into that proposed in [2].

For the SAA and DAA, an ‘X414’ design would then allow for 4X capacity on each board, at ½ the

bandwidth for similar overall board count savings.


4.3 Correlator to Image Processor Data Transport—Visibility‐Based Addressing

The X421 ASIC baseline design contains capabilities to help to facilitate real‐time image processing.

In particular, it calculates and includes in the output data frame, u, v, and w for the time and channel

centroid of a correlated data packet, and allows for epoch‐less baseline‐based integration times, as

well as baseline‐based channel integration. These facilities are based on the notion that shorter

baselines can allow for longer integration times and fewer spectral channels (for continuum

observations), with linearly or non‐linearly decreasing integration times and numbers of spectral

channels on longer baselines. RAM capacity in the chip is sufficient to allow for 20 second on‐chip accumulation, eliminating the need for a separate LTA (see footnote 7). The goal is to include in the output data packet as much complete information as possible for the image processing computers.

To deal specifically with correlator to image processor data transport, an internal correlator network

is envisioned, using “visibility‐based addressing” (VBA). The goal of VBA is to transport correlated

data packets to only those image processing computers that need those products for gridding. This

method assumes that a gridding CPU does not need all of the visibilities all of the time (and indeed

needs all of the visibilities only a small fraction of the time), and that it can highly compress the data

rates as it grids‐and‐integrates, passing data at relatively low data rates on to 2D‐FFT, self‐cal, CLEAN

etc.

In the proposed VBA scheme, the correlator handles correlated data distribution, rather than

handling it in a generic COTS network. For the simple case where a gridding CPU produces exactly 1

gridded output point, there is some radius (or region) of “visibility capture” for the CPU; all raw

visibilities within that capture region must get transported to that CPU. The next CPU, producing the

next gridded point, has a capture region which might overlap with the first CPU. Therefore, there

can be multiple gridding CPU destinations for a particular correlator data product. This concept is

shown in Figure 4‐8.

7 Although perhaps a separate on-board LTA is more efficient. This parameter space remains to be explored.


Figure 4‐8 Visibility capture regions for two output grid points.

The visibility capture radius shown in Figure 4‐8 is meant to be general, and not indicate any

quantitative requirement or limitation. (The importance of “w” is not being ignored; it is assumed

that any effect of “w” on packet routing can be incorporated by appropriately adjusting the u,v‐

based capture region.) Of course, the larger the visibility capture region for a grid point, the larger

the effective correlator output data rate, as identical output packets must travel to multiple

destinations.
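To illustrate how overlapping capture regions lead to multiple destinations for one visibility, the following Python sketch tests a (u,v) point against a set of circular capture regions. The `CaptureRegion` type, the CPU identifiers and the numeric values are all hypothetical, and the effect of w is ignored as in the discussion above.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class CaptureRegion:
    cpu: int     # hypothetical gridding-CPU identifier
    u: float     # centre of the capture region
    v: float
    r: float     # capture radius, same units as u and v


def destinations(u: float, v: float, regions: list) -> list:
    """CPUs whose capture region contains the visibility at (u, v)."""
    return [reg.cpu for reg in regions if hypot(u - reg.u, v - reg.v) <= reg.r]


# Two grid points whose capture regions overlap, as in Figure 4-8.
regions = [CaptureRegion(1, 0.0, 0.0, 1.5), CaptureRegion(2, 2.0, 0.0, 1.5)]

in_overlap = destinations(1.0, 0.0, regions)     # needed by both CPUs
only_first = destinations(-1.0, 0.0, regions)    # needed by CPU 1 only
nowhere = destinations(5.0, 5.0, regions)        # outside every region
print(in_overlap, only_first, nowhere)
```

The average length of the returned list over all visibilities is the factor by which the effective correlator output data rate grows as capture regions are enlarged.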

If a gridding CPU handles/generates more than one grid point, say a square region of grid points, the

visibility capture region for the CPU is also roughly a square, as indicated in Figure 4‐9 below.

Figure 4‐9 Visibility capture region for a CPU producing a square area of grid points.

[Figure 4‐8 labels: grid points U1,V1 and U2,V2 with capture radii r1 and r2; time; Grid point U1,V1 raw visibility capture region; Grid point U2,V2 raw visibility capture region; Raw visibilities from correlator. Figure 4‐9 labels: grid points U1,V1; U2,V2; U3,V3; U4,V4.]

The assignment of grid points to CPUs does not have to be uniform. It can be based on the expected

number of raw visibilities within those grid points’ capture region, and the chosen baseline‐based

integration time, so that gridding CPU load balancing can be achieved.
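One simple way such a non-uniform assignment might be computed is a greedy heuristic: take grid points in order of decreasing expected load and give each to the currently least-loaded CPU. This is a generic load-balancing technique, not a method specified in this document, and it ignores the likely requirement that each CPU own a contiguous block of grid points. The load values below are made up for illustration.

```python
import heapq


def assign_grid_points(expected_load: dict, n_cpus: int) -> dict:
    """Greedy assignment: heaviest grid point first, to the least-loaded CPU."""
    heap = [(0.0, cpu) for cpu in range(n_cpus)]   # (total load so far, cpu index)
    heapq.heapify(heap)
    assignment = {}
    by_load = sorted(expected_load.items(), key=lambda kv: kv[1], reverse=True)
    for point, load in by_load:
        total, cpu = heapq.heappop(heap)           # least-loaded CPU
        assignment[point] = cpu
        heapq.heappush(heap, (total + load, cpu))
    return assignment


# Hypothetical expected raw-visibility counts per grid point (arbitrary units).
loads = {"a": 8.0, "b": 5.0, "c": 4.0, "d": 3.0}
assignment = assign_grid_points(loads, n_cpus=2)
print(assignment)
```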

To actually implement VBA, consider the following simple configuration. Each correlator board has a

single high‐speed network connection to a CPU. “The CPU” could be a single core processor, a multi‐

core processor with shared memory, or multiple independent processors with a gateway node

directing packets to multiple destination processors. Within the correlator, there are full‐duplex

nearest neighbour 2D connections between correlator boards. One or more controllers on each

correlator board follow a simple algorithm to route internally‐correlated packets, based on u and v,

directly to the “native” gridding CPU, or to a nearest neighbour board. Also, incoming “non‐native”

packets get routed to the gridding CPU, or to another nearest neighbour board on their way to a

destination gridding CPU.

Figure 4‐10 is a simple example showing routing paths from source correlator boards, to one

destination correlator board’s gridding CPU, for the 12x12 correlator matrix used with the X421 chip

for dish correlation up to 3072 elements. In reality, it is likely the case that each correlator board

generates data packets destined for all other boards’ CPUs.

Figure 4‐10 12x12 correlator matrix showing VBA routing paths from source correlator boards (YELLOW) to

a destination correlator board/gridding CPU (MAGENTA).

The overlay of CPUs to grid points might be something like Figure 4‐11 below, although there isn’t a

perfect mapping of correlator boards to a square grid of CPUs (but nor does there need to be).

[Figure 4‐10 labels: correlator boards numbered 1 to 78.]

Figure 4‐11 Possible overlay of gridding CPUs to grid points, although CPUs 79, 80, 81 do not exist.

 1  2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81

Each correlator board has a finite bandwidth to its nearest neighbours, and so it will likely sometimes be the case that, at a particular instant, transmission to the actual nearest neighbour is not possible because the link is in use. In this case, the packet is transmitted to the next‐best nearest neighbour (i.e. re‐routed), from where it will eventually find its way to the final destination node. Clearly, there must be a “time to live” on the packet so that it doesn’t stay cycling in the network for an indeterminate period of time, although there is no particular rush or order in which packets must show up at gridding CPUs. Also, a particular packet can have multiple destination gridding CPUs, and so the packet must, at the start of its journey, be tagged with a list of all possible destinations; as the packet gets delivered, these destination tags are removed until all tags are gone, at which point the packet dies.

The basic VBA algorithm is therefore as follows:

• Each node (correlator board packet router/controller) that connects to a CPU is assigned a CPU/node number (CNN), such that numbers are sequential in the manner shown in Figure 4‐10 and Figure 4‐11. This allows the packet to find its destination via the nearest‐neighbour network in a straightforward manner.

• Each CNN has a (u,v,r2) (or (u1,v1; u2,v2; u3,v3; u4,v4)) tag indicating its u,v capture region.

• Each node keeps track of every CNN, and its associated u,v capture region. Call this the “CNN‐TABLE”. This information is provided to nodes via CPUs broadcasting information packets to all nodes every so often.

• When a correlator node generates a (correlated) data packet, it looks in the CNN‐TABLE and makes a list of every node (and ultimately gridding CPU) the packet must “visit”. Call this list the “CNN‐LIST”. The node appends (or pre‐pends) this list of tags to the data packet, and sets a “time to live” on the packet, which is the maximum number of node transmissions that will be allowed until the packet is killed.

• The node then transmits the packet out to a nearest neighbour node, which has an available channel and is closest to the closest node in the CNN‐LIST.

• Once a packet is transmitted to a gridding CPU in the CNN‐LIST, the entry (tag) in the CNN‐LIST is deleted.

• Continue on until there are no CPUs left in the CNN‐LIST, or until the “time to live” has expired, at which point the packet dies.
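The steps above can be sketched as a small Python simulation. Several simplifications are assumptions made for illustration: nodes sit on a rectangular grid numbered row by row as in Figure 4-11 (the real board topology is the correlation triangle of Figure 4-10), distance is measured in hops, and link contention is represented only by a static set of busy links. The names `Packet`, `route` and `busy_links` are hypothetical.

```python
from dataclasses import dataclass, field

COLS = 9  # assumed grid width; CNNs are numbered row by row starting at 1


def position(cnn: int) -> tuple:
    """(row, column) of a CNN on the assumed rectangular grid."""
    return divmod(cnn - 1, COLS)


def distance(a: int, b: int) -> int:
    """Hop count between two CNNs (Manhattan distance on the grid)."""
    (ra, ca), (rb, cb) = position(a), position(b)
    return abs(ra - rb) + abs(ca - cb)


def neighbours(cnn: int, n_nodes: int) -> list:
    """Existing nearest neighbours (up, down, left, right) of a CNN."""
    row, col = position(cnn)
    result = []
    for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
        n = r * COLS + c + 1
        if r >= 0 and 0 <= c < COLS and 1 <= n <= n_nodes:
            result.append(n)
    return result


@dataclass
class Packet:
    cnn_list: set                       # destination CNNs still to be visited
    ttl: int                            # node transmissions remaining
    delivered: list = field(default_factory=list)


def route(packet: Packet, source: int, n_nodes: int, busy_links=frozenset()) -> list:
    """Forward a packet until its CNN-LIST is empty or its time to live expires."""
    node, path = source, [source]
    while True:
        if node in packet.cnn_list:             # hand a copy to the local gridding CPU
            packet.cnn_list.discard(node)       # and delete that tag
            packet.delivered.append(node)
        if not packet.cnn_list or packet.ttl == 0:
            return path                         # the packet dies
        target = min(packet.cnn_list, key=lambda d: (distance(node, d), d))
        options = [n for n in neighbours(node, n_nodes) if (node, n) not in busy_links]
        if not options:
            return path                         # no free link; dropped in this sketch
        node = min(options, key=lambda n: (distance(n, target), n))
        packet.ttl -= 1
        path.append(node)


packet = Packet(cnn_list={12, 30}, ttl=20)
path = route(packet, source=1, n_nodes=78)      # CPUs 79, 80, 81 do not exist
print(path, packet.delivered, packet.ttl)
```

Passing a non-empty `busy_links` set, for example `{(1, 2)}`, forces the first hop onto the next-best neighbour, which is the re-routing behaviour described in the text.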

Possible data paths on the board, to nearest‐neighbour boards, and to gridding CPUs are shown in

Figure 4‐12 below. Only correlated packet data paths are shown in the figure. 4 x 10G connections

from each column of X421 ASICs are good enough for 170 msec integration times, with possible

improvement (decrease) by adjusting word sizes based on integration time, as previously

mentioned.

The nearest‐neighbour network bandwidth in the figure is double the native output bandwidth. This

is a first guess as to the actual requirement, which depends on the u,v capture region and the

distribution of short (long integration) and long (short integration) baselines to correlator nodes.

This is something of a difficult problem to analyse; network traffic simulations will likely be required

to study how it behaves, and what nearest‐neighbour bandwidth is actually required.

Figure 4‐12 Correlator board output data path routing to nearest neighbour boards, and to gridding CPUs.

The nearest‐neighbour network has 2X the bandwidth of native output data bandwidth, a somewhat

arbitrary choice at this point.

[Figure 4‐12 labels: Router FPGA or ASIC (x2); To CPU; nearest‐neighbour ports L, T, B, R; lane counts 2, 4 and 8; 10G; 100G; X421 array.]

The implementation of the (front‐panel) nearest neighbour network will greatly impact its cost. One

way of doing it is to use a similar approach to data distribution as is used for real‐time data at the

rear of the board. HM‐Zd or 0XT connectors poke out through the front panel, and, once boards are

in place, connect to each other using identical patch boards as used at the rear, except without the

need for mid‐planes. The front panel of the board might look like that shown in Figure 4‐13.

Figure 4‐13 Possible front‐panel of board, with connectors for nearest‐neighbour VBA network connections.

A slice of the correlator showing patch‐board nearest neighbour connections is shown in Figure 4‐14. The dark dividing line demarcates the interface between the two band/slice correlators in the rack. There are no network connections between different band/slice correlators (see footnote 8).

Figure 4‐14 Section of two racks, showing inexpensive VBA nearest‐neighbour wideband network

connections, using identical patch boards as used at the rear of the board/rack. Quad 100G fibers from each

board then route packets to gridding CPUs in a point‐to‐point fashion.

VBA addressing has nicely collated the data packets so that there are point‐to‐point connections

from the correlator boards to the gridding CPUs. However, there are still a huge number of 100G

fiber connections required to the CPUs!

The natural progression, it seems, is to subsume the gridding CPUs into the correlator. One way this

might be done is shown in Figure 4‐15.

8 Unless boards along the diagonal are a 9x8 array of chips to optimally use board resources.

[Figure 4‐13 and Figure 4‐14 front‐panel labels: T, B, L, R nearest‐neighbour connectors; 1G M&C; 32 10G pairs; 16 10G pairs; 2 x 100G fiber to CPU; “GSA X Board (64k baselines) Ver. 2.1”.]

Figure 4‐15 Front region of correlator board, and front panel, with gridding CPUs subsumed into the

correlator in the form of “Field Replaceable Processor Packs”.

Dual FRPP (“Field Replaceable Processor Packs”) are installed into slots in the front of each board,

and it is in these CPUs where gridding occurs. Final integrated grid points are then transmitted to

final image processing CPUs via the 1G (or 10G) M&C network (assuming the data rate is vastly

reduced), so that there is only one external network connection to each board. The final installed

rack front view will be similar to Figure 4‐14, except there is only one network cable travelling to

each board, rather than several. Thermal issues are also a concern with this approach, and will likely

be the governing factor as to whether subsuming the gridding CPUs into the correlator is feasible or

not.

A survey of existing CPU mezzanine card standards and connectors did not reveal anything suitable in terms of assumed CPU horsepower requirements, form‐factor, and bandwidth of the mezzanine connector to the motherboard. Thus, a custom form‐factor might be required, or time might have to elapse until a standard suitably matched to the task becomes available. A 40‐pair 0XT or HM‐Zd connector would fit the bill as far as bandwidth goes (400 Gbps). The CPU pack should be field replaceable so that it can be easily upgraded without having to re‐design the board, remove the board from the rack, or power it down.

4.3.1 Discussion

The proposed VBA scheme is likely not required for the SAA or DAA correlator, as each board generates all of the cross‐correlation products for a band/slice‐beam, and so all of the data is available in one cluster of output spigots. At most, SAA and DAA might require distribution nodes to route correlated data packets to multiple CPUs for gridding, assuming one CPU (and by “CPU” this could mean multi‐core CPU) can’t handle the full load for that band/slice‐beam, and there is no overlap (correlated data product sharing requirement) between band/slice‐beams.

[Figure 4‐15 labels: 40‐pair 0XT connectors; RJ‐45; Field Replaceable Processor Pack (x2) with Power; Router FPGA or ASIC (x2); M&C FPGA; T, B, L, R; 1/10G Eth; 32 10G pairs; “FRPP STATUS: ACTIVE S/W V2.53”; X421 array.]

For dish correlation, as the capacity of the ‘X’‐ASIC increases as discussed in section 4.2.4, the

number of boards required to perform cross‐correlations decreases, and output “bandwidth

density” increases. This could change the morphology of the VBA, and render the currently

proposed scheme obsolete. Nevertheless, it would seem that it is always likely the case that there

needs to be some way of splitting up the gridding task to multiple processors, either in u‐v space, or

in time.

Clearly, due to instrumental and atmospheric effects, it is not possible to grid data points blindly in an open‐loop fashion. It is assumed that on some regular basis and for some short period of time, all of the visibilities must get to a single CPU where iterative gridding and image processing are performed to determine calibration coefficients that can then be applied, for some longer period of time, to open‐loop‐grid the data. In this case, the VBA network does not have to change. The VBA algorithm presented in section 4.3 is just augmented with the ability to tell each correlator board/packet‐router that for a specific period of time, all packets are routed to a particular gridding CPU (or CPU hanging off a particular gridding CPU). In this way it is possible to time‐multiplex iterative image processing for calibration coefficient generation across all available CPUs. Calibration coefficients could then

be distributed to all gridding CPUs using the VBA network, or a separate network. Part and parcel

with this scheme is likely the need for each gridding CPU to buffer large amounts of incoming data

until calibration coefficients become available. To minimize the duty cycle and frequency with which

calibration coefficients are calculated, it is prudent to try to ensure that all SKA elements are

designed for the longest‐term stability possible.
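The time‐multiplexing of calibration processing described above can be sketched as a simple round‐robin schedule. This is a minimal illustration only: the function name, the CPU identifiers, and the window lengths are hypothetical and are not taken from the VBA design, which does not specify how windows are assigned.

```python
from typing import List, Tuple


def calibration_schedule(cpu_ids: List[str],
                         n_cycles: int,
                         calib_window_s: float,
                         open_loop_period_s: float) -> List[Tuple[float, float, str]]:
    """Return (start_s, stop_s, cpu_id) windows during which every correlator
    board/packet-router sends all of its packets to one gridding CPU.

    Assumption (not from the source): one calibration window per cycle,
    assigned round-robin, followed by an open-loop gridding period in which
    the normal visibility-based routing applies.
    """
    if not cpu_ids:
        raise ValueError("at least one gridding CPU is required")
    cycle_s = calib_window_s + open_loop_period_s
    windows = []
    for cycle in range(n_cycles):
        start = cycle * cycle_s
        cpu = cpu_ids[cycle % len(cpu_ids)]  # rotate the calibration load
        windows.append((start, start + calib_window_s, cpu))
    return windows


# Hypothetical numbers: 10 s of calibration data every 600 s, three CPUs.
schedule = calibration_schedule(["cpu-0", "cpu-1", "cpu-2"], n_cycles=4,
                                calib_window_s=10.0, open_loop_period_s=590.0)
for start, stop, cpu in schedule:
    print(f"{start:7.1f} s to {stop:7.1f} s: route all packets to {cpu}")
```

Rotating the target CPU spreads the iterative image‐processing load, at the cost of each gridding CPU having to buffer data until the resulting coefficients arrive.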

4.4 Central Beam‐former Data Access

Details of the central beam‐former concept are contained in a separate document. However, some

points about how to access the data are as follows:

• For dishes, each band/slice F‐part output data is accessed at either edge of the matrix shown in Figure 4‐3.

• For dishes, racks at either end of the band/slice correlation matrix perform beam‐forming operations in a hierarchical manner.

• What comes out of these racks are spigots of band/slice beams, which can be merged with other band/slice beam outputs (i.e. same beam, different spectral channels) in centralized equipment contained in separate racks, with final output from each beam, all spectral channels, routed to non‐visibility processing equipment.

• For SAAs and DAAs, all signals necessary to beam‐form a band/slice are available on one board, and so central beam‐former signal processing is most likely contained on the same circuit board as the correlator. Band/slice spigots are then merged in centralized equipment contained in separate racks as for dishes, with final output from each beam, all spectral channels, routed to non‐visibility processing equipment.


4.5 Phase‐I SKA

The concept design presented in the preceding sections was developed for the full‐scale SKA

consisting of several thousand dishes (WBSPF or PAFs), and up to 256 SAA and DAA elements. This

section briefly describes how the design can be down‐scaled to the Phase‐I SKA requirements, as

defined in memo 130 [7]. However, it is believed that the most cost‐effective Phase‐I

implementation is with FPGA boards using COTS infrastructure, as the scale of Phase‐I pales in

comparison to Phase‐II.

4.5.1 Phase‐I F‐Part

The F‐part of the correlator, whether for SAAs, DAAs, or dishes, is no different for Phase‐I than it is

for the full‐scale SKA, as the design requires only a partial distributed corner‐turner, rather than a

monolithic corner‐turner. Phase‐I F‐part is 1/10th of the full‐scale problem, and so it is likely the case

that FPGAs or “hard‐copy” FPGAs are used in the implementation.

4.5.2 Phase‐I X‐Part: Dishes

The full‐scale SKA dishes X‐part is ~100X the size (processing, data handling) of the Phase‐I SKA dishes X‐part. For ≤256 dishes, with 1 GHz/polarization (memo 130, Table 1), each X421 correlator board (Figure 4‐7) can correlate 150 MHz/polarization, and so only 7 boards are required. However, with the X421 correlator ASIC, these 7 boards produce only 28,672 channels per polarization product across that bandwidth, whereas the memo 130 specification calls for 67,000 channels per polarization product. Thus, a factor of between 2 and 3 (~2.3) more channels is required to fulfil the requirement. In this case an X421 ASIC, processing a single band/slice of 25 MHz/polarization, operates at lower bandwidth and at a lower (33%) incoming packet duty cycle. By buffering and re‐arranging the data, and eliminating baseline‐based time and channel integration (footnote 9), it is possible to use the same number of boards and chips to produce the required numbers of spectral channels, as will be shown in the next section.
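As a quick check of the arithmetic in this paragraph, the following sketch recomputes the board count and the channel shortfall from the figures quoted in the text. The variable names are illustrative only.

```python
import math

TOTAL_BW_MHZ = 1000.0        # 1 GHz/polarization (memo 130, Table 1)
BOARD_BW_MHZ = 150.0         # bandwidth correlated per X421 correlator board
STANDARD_CHANNELS = 28_672   # channels per polarization product from 7 boards
REQUIRED_CHANNELS = 67_000   # memo 130 specification

# A fractional board is not possible, so round up.
boards = math.ceil(TOTAL_BW_MHZ / BOARD_BW_MHZ)

# Factor by which the standard channel count falls short of the requirement.
shortfall = REQUIRED_CHANNELS / STANDARD_CHANNELS

print(f"boards required: {boards}")
print(f"channel shortfall factor: {shortfall:.2f}")
```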

4.5.3 Phase‐I X‐Part: SAAs

For SAAs, Phase‐I calls for 50 elements, 480 beams, 380 MHz/polarization/beam, and 380k

channels/polarization/beam. Ignoring for a moment the 50 element requirement, and the mismatch

with the 32x32 element processing capacity of the X421 ASIC, the main challenge here is to optimally

produce the very large number of spectral channels, but hopefully within the internal RAM

capabilities of the chip.

The large number of spectral channels can be produced by the chip by breaking up the 75

MHz/polarization bandwidth capability into smaller ~2 MHz band/slices, each one with 2048

channels, and re‐arranging the data packets entering the chip to produce ~1 kHz channel resolution

across ~74 MHz. This re‐arrangement is shown in the following Figure 4‐16:

9 Already a feature of the chip design.


Figure 4‐16 Data re‐arrangement to produce ~1 kHz spectral resolution across ~74 MHz/polarization of

bandwidth, using the X421 chip. Accumulator double‐buffering in the chip allows one band/slice to be

correlating/integrating, while the other band/slice results are transmitted on the 4 x 10G outputs.

In this scheme each band/slice “Packet Group”, assuming a channel‐burst length of 64, 2048 channels, 8‐bit samples (4b‐I, 4b‐Q), and dual polarization, requires 2 Mbytes (64 ch_burst x 8 ants x 2 pol’n x 2048 channels x 1 byte/sample). For each band/slice, 18 such Packet Groups must be buffered, and there are 37 band/slices, for a total of 2 M x 18 x 37 = 1.332 Gbytes. This is easily handled in the F‐part with external DDR3 SDRAM, which must have 20 Gbps I/O (64‐bit wide DDR3 SDRAM has ~50 Gbps I/O).
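The buffer sizing above can be reproduced with a few lines of arithmetic. This sketch uses only the parameters quoted in the text; the one‐byte sample size follows from the 4‐bit I plus 4‐bit Q format, and the text's "1.332 Gbytes" corresponds to 1332 binary Mbytes.

```python
CH_BURST = 64          # channel-burst length
ANTENNAS = 8           # antennas per Packet Group
POLARIZATIONS = 2
CHANNELS = 2048        # spectral channels per band/slice
BYTES_PER_SAMPLE = 1   # 4-bit I + 4-bit Q

PACKET_GROUPS = 18     # Packet Groups buffered per band/slice
BAND_SLICES = 37       # ~2 MHz band/slices across ~74 MHz

MBYTE = 1024 * 1024

packet_group_bytes = (CH_BURST * ANTENNAS * POLARIZATIONS
                      * CHANNELS * BYTES_PER_SAMPLE)
total_bytes = packet_group_bytes * PACKET_GROUPS * BAND_SLICES

print(f"one Packet Group: {packet_group_bytes / MBYTE:.0f} Mbytes")
print(f"total buffer:     {total_bytes / MBYTE:.0f} Mbytes")
```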

Every ~30 msec, 2048 channels/polarization product, for all baselines, for 2 MHz of bandwidth are

produced by the chip on the 4 x 10G outputs (recall that 8 chips in a column are capable of 170

msec, therefore 1 chip is capable of 170 msec/8 = 21.25 msec). This represents an (original) “sample

time” integration time of ~1.152 seconds, meeting the Phase‐I minimum integration time

requirement in SKA memo 130, Table 3, of 1.2 sec. For longer integration times, an off‐chip LTA can

be used.
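The timing figures in this paragraph and in Figure 4‐16 can be cross‐checked as follows. The sketch takes the per‐Packet‐Group times from the figure as given and simply compares the resulting output interval with the quoted single‐chip figure of 170 msec/8; the names are illustrative.

```python
PACKET_GROUPS = 18
SAMPLE_TIME_PER_GROUP_MS = 64.0   # ch_burst x 1/ch_width, from Figure 4-16
REAL_TIME_PER_GROUP_MS = 1.68     # from Figure 4-16
COLUMN_TIME_MS = 170.0            # quoted for 8 chips in a column
CHIPS_PER_COLUMN = 8
INTEGRATION_REQ_S = 1.2           # memo 130, Table 3

# Real time between successive band/slice results appearing on the outputs.
real_time_ms = REAL_TIME_PER_GROUP_MS * PACKET_GROUPS

# Original "sample time" covered by one integration.
sample_time_s = SAMPLE_TIME_PER_GROUP_MS * PACKET_GROUPS / 1000.0

# Quoted capability of a single chip.
single_chip_ms = COLUMN_TIME_MS / CHIPS_PER_COLUMN

print(f"output interval (real time): {real_time_ms:.2f} msec")
print(f"integration (sample time):   {sample_time_s:.3f} sec")
print(f"single-chip capability:      {single_chip_ms:.2f} msec")
print("output keeps up:", single_chip_ms <= real_time_ms)
```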

When operating in this way, the X421 chip’s ability to perform baseline‐based integration is disabled,

however, the ability to label the output data frames with the u, v, and w centroid remains intact.

Baseline‐based integration could still be performed in an off‐chip LTA.

As the X421 chip array‐based design is a mismatch with the 50 element (1250 baselines) correlation requirement, it might be possible to incorporate in the X421 design data paths with the capability of performing an optimal 50‐element cross‐correlation. Perhaps there could be 5 x 10G inputs, each with 10 antennas multiplexed instead of 8, for a bandwidth of 60 MHz/polarization instead, while still processing either 1024 baselines in “array mode” or 1250 baselines in “cross‐correlation matrix mode”. Alternatively, a separate ASIC could be developed just for the Phase‐I SAA application.

[Figure 4‐16 diagram labels: Packet Groups of 8 antennas, 2 MHz BW, 2 pol’n, 4+4 bit samples, 2048 channels, ch_burst = 64. Each Packet Group spans ch_burst x 1/ch_width = 64 msec of sample time and 1.68 msec of real time. For each of 37 band/slices (37 x 2 MHz = 74 MHz BW/pol’n), 18 Packet Groups are buffered and sent to the X421 ASIC between start and stop of integration: 1.68 x 18 = 30.24 msec (real time), 64 x 18 = 1.152 sec (sample time). Output is 2048 channels across 2 MHz for each band/slice, 75,776 channels across 74 MHz in total.]

Assuming that the chip can be designed for 50 elements, 60 MHz/polarization, and operate as described above to produce the required numbers of spectral channels, covering 380 MHz of bandwidth then requires 380/60 ≈ 6.3, rounded up to 7 chips per beam. Seven chips, along with one or more external LTAs (made up of FPGAs with external DDR3 SDRAM), could easily fit on one board (and possibly even double the number of chips could fit on a board). It would therefore require 480 boards (possibly 240 boards) to process all 480 beams. At 1U height each, and 48 boards in a rack, approximately ten 19” racks (possibly 5 racks) would be required for the X‐part of the correlator for Phase‐I SAAs.
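The chip, board, and rack counts for the Phase‐I SAA X‐part follow directly from the numbers in this paragraph, as the short sketch below shows. The helper name `racks_needed` is illustrative, and the two‐beams‐per‐board case corresponds to the "double the number of chips" option.

```python
import math

BEAM_BW_MHZ = 380.0    # MHz/polarization/beam (Phase-I SAA)
CHIP_BW_MHZ = 60.0     # assumed 50-element chip bandwidth
BEAMS = 480
BOARDS_PER_RACK = 48   # 1U boards in a 19-inch rack

chips_per_beam = math.ceil(BEAM_BW_MHZ / CHIP_BW_MHZ)


def racks_needed(beams_per_board: int) -> tuple:
    """Return (boards, racks) for a given number of beams handled per board."""
    boards = math.ceil(BEAMS / beams_per_board)
    return boards, math.ceil(boards / BOARDS_PER_RACK)


print("chips per beam:", chips_per_beam)
print("one beam per board:   boards, racks =", racks_needed(1))
print("two beams per board:  boards, racks =", racks_needed(2))
```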

4.5.4 Phase‐I Central Beam‐forming

As all of the band/slice data required for beam‐forming is present on each board, central‐beam‐

forming will likely occur on each correlator board, with merging of band/slice packets occurring in

separate equipment before final output to non‐visibility processing.

4.6 Monitor and Control

Referring to Figure 4‐4, each X‐part correlator board has an M&C 1G (or 10G) network connection to a higher‐level centralized M&C computer via COTS network switches. Each board could have its own unique “rack‐slot” ID, determined by hard settings in the mid‐plane slot, communicated to the on‐board M&C FPGA, and forming part of the board’s IP address (footnote 10).
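One way a rack‐slot ID could form part of a board's IP address is sketched below. The addressing plan (a private 10.0.0.0/8 network with the rack number in the third octet and the slot number in the fourth) is purely an assumption for illustration; the source does not define an address layout.

```python
import ipaddress

# Hypothetical M&C network; not specified in the source document.
MC_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def board_ip(rack: int, slot: int) -> ipaddress.IPv4Address:
    """Map a (rack, slot) location to an IPv4 address as 10.0.<rack>.<slot>.

    Assumed limits: rack 0..255, slot 1..254 (0 and 255 are avoided in the
    last octet by convention).
    """
    if not 0 <= rack <= 255:
        raise ValueError("rack must be in 0..255")
    if not 1 <= slot <= 254:
        raise ValueError("slot must be in 1..254")
    return MC_NETWORK.network_address + (rack << 8) + slot


print(board_ip(12, 7))
```

With such a mapping a replacement board inherits the address of the slot it is plugged into, so M&C software can refer to boards by location rather than by serial number.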

The X421 chip is built to be largely autonomous in its operation, with all required commanding coming from the F‐part of the correlator, via the content and arrival time of data packets and INFO packets. M&C for the chip is required only for determining transceiver status, eDRAM status, PLL status, frame detect/overrun status, etc. M&C for the board would include the ability to monitor voltages and temperatures, and a method for remotely commanding power‐ON and power‐OFF of the board (footnote 11).

10 A similar scheme is successfully used in the EVLA correlator; boards are largely known by their location, rather than serial number.

11 For the EVLA correlator, this was done by a separate pair of wires to each board running to a central computer. Another layer of hierarchy here might be more appropriate.

The F‐part of the correlator, of course, needs to synchronize FFT and packet build/buffer epochs across the system (for each particular SKA element type). The F‐part inserts INFO packets within each Packet Group, which are transmitted to the X‐part and include calculated array‐phase‐center‐based u, v, and w, observation information, as well as integration time information and timestamp information. The F‐part calculates u, v, and w from the delay model, the center frequency of the center spectral channel, and the antenna coordinate w.r.t. the array phase center. The X421 chip is able to operate in “epoch‐less” integration mode (a necessary condition if baseline‐based integration is performed), and so start and stop integrations don’t have to be communicated across the F‐part of the correlator. (If integrations must be synchronized across all baselines, then provision is made for start/stop integrations in the X421 chip design.) Time must also be distributed to all F‐part hardware for timestamping of INFO frames and synchronization.
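The per‐antenna u, v, w calculation mentioned above can be illustrated with the standard interferometry coordinate rotation. This is a sketch under stated assumptions, not the F‐part's actual algorithm: it assumes antenna positions given in an equatorial X, Y, Z frame relative to the array phase center, and a source direction given by hour angle and declination. In this convention w, expressed in wavelengths, equals the geometric delay multiplied by the observing frequency, which is the link to the delay model.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def antenna_uvw(xyz_m, hour_angle_rad, dec_rad, freq_hz):
    """Return (u, v, w) in wavelengths for one antenna.

    xyz_m: antenna position (X, Y, Z) in metres relative to the array phase
    center, in the assumed equatorial frame.
    freq_hz: center frequency of the center spectral channel.
    """
    x, y, z = xyz_m
    sin_h, cos_h = math.sin(hour_angle_rad), math.cos(hour_angle_rad)
    sin_d, cos_d = math.sin(dec_rad), math.cos(dec_rad)
    per_metre = freq_hz / C_M_PER_S  # wavelengths per metre
    u = per_metre * (sin_h * x + cos_h * y)
    v = per_metre * (-sin_d * cos_h * x + sin_d * sin_h * y + cos_d * z)
    w = per_metre * (cos_d * cos_h * x - cos_d * sin_h * y + sin_d * z)
    return u, v, w


# Example: frequency chosen so that one wavelength is exactly 1 m.
u, v, w = antenna_uvw((3.0, 4.0, 12.0), 0.7, -0.5, C_M_PER_S)
print(u, v, w)
```

Baseline u, v, w values are then the differences of the two antennas' values, which is consistent with labelling output data frames per baseline.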

The F‐part of the correlator therefore has much more hierarchical real‐time control than the X‐part,

but is also considerably smaller (less hardware and rack space). It is expected that on each F‐part

board, a CPU core running on an FPGA, with a 1G network connection, could accomplish the

necessary M&C functions and communications.

4.7 Upgrade Growth Paths

As discussed in [1], the GSA concept allows technology upgrades whilst using the existing system infrastructure. As technology improves (i.e. more logic packed into an ASIC), the size of the correlation matrix shrinks. This allows more band/slice correlators to be packed into a smaller space, allowing for more band/slices (i.e. bandwidth or beams) to be correlated for the same power and space available. The only change required is to the population of patch boards, to accommodate a different band/slice correlator size.

At some point the horizontal board mounting scheme may become obsolete, allowing for vertical

board mounting wherein all baselines for a band/slice are correlated in a crate of boards. In this

case, the same racks and cooling infrastructure may be used, but with airflow running vertically

through the racks rather than that shown in Figure 7‐2.


5 Risks

This section contains a table of possible risks, and risk mitigation strategy, as applied to full‐scale SKA

correlator systems, based on the concept presented in the previous section.

Risk: No acceptable air cooling strategy found.
Effect/Mitigation: Use liquid cooling. Custom solution likely required, possibly using COTS components. Web survey indicates there are several companies capable of developing robust liquid cooling solutions. Probably requires more expensive maintenance and therefore higher operating cost.

Risk: Correlator (X421) ASIC dissipates too much power (footnote 12).
Effect/Mitigation: Higher board power dissipation, higher than desired operating costs. More expensive and complicated cooling strategies, process narrower frequency slice per board, or de‐scope bandwidth and processing requirements.

Risk: Proposed VBA (visibility‐based addressing) scheme is untenable from a fundamental image processing perspective.
Effect/Mitigation: Develop a different scheme, likely a time‐slice addressing approach but possibly still using the embedded correlator network for routing/distributing correlated data packets.

Risk: Horizontal or vertical mechanical tolerance problems eliminate the possibility of using printed wiring connections.
Effect/Mitigation: Improve mechanical tolerance specs of sub‐racks, racks, and floor mounting hardware to meet requirements. If necessary, use short (but more expensive) cables instead of patch boards.

Risk: 4‐bit(I) and 4‐bit(Q) re‐quantization and correlation is not sufficient number of bits.
Effect/Mitigation: Change re‐quantization to 8‐bit(I) and 8‐bit(Q) (footnote 13). Each chip now processes ½ the bandwidth.

Table 5‐1 Possible risks and risk mitigation strategy.

6 Requirements

Void

12 There is some indication that X421 chip area and power will not be unreasonable if implemented in <=~30 nm technology, but final assessment of feasibility will be available once a formal design study by an ASIC vendor is complete.

13 5-7 bits/sample don’t fit nicely into a 32-bit or 64-bit packet word, but the implications of using this range of bits remain to be explored.


6.1 Item Definition

Void

6.1.1 General Description

Void

6.1.2 External Interfaces

Void

6.1.3 Internal Interfaces

Void

6.1.4 Modes

Void


7 Characteristics

7.1 Performance Characteristics

Void

7.2 Physical Characteristics

7.2.1 X‐part Correlator Boards

Correlator boards for dishes might optimally be a different design (Figure 4‐4, Figure 4‐7) but use the

same ASIC and basic structure as for SAAs and DAAs. Each board may be contained within its own

independent mechanical cage, built to easily slide in and out of a rack slot. Boards may be hot

swapped.

Boards for dishes, SAAs, and DAAs, are approximately 16.5” W x 16‐18” D, depending on whether or

not FRPPs are installed for gridding operations. Each board likely contains a monolithic heatsink,

mechanically and thermally attached to the array of correlator chips and FPGAs for cooling. If liquid

cooling is used, the cooling lines are integrated with a hollow plate, similarly attached to chips on the

board. Cooling lines may enter/exit the board either by the front (if blind‐mate connectors are not

available), or by the rear.

7.2.2 X‐part Racks

Racks for dishes are precision engineered and installed to meet mechanical tolerance requirements

necessary to allow for establishment of printed wiring connections. Sub‐racks are installed in racks,

and are engineered to the same standards. Depending on ASIC capacity, and corner‐turner

granularity, it may not be necessary for rack‐to‐rack printed wiring connections, in which case COTS

racks and equipment may be used.

For dish‐WBSPFs of up to 3072 dishes and 1.05 GHz/polarization, seven 12x12 board arrays are required to produce all polarization products using the baseline X421 ASIC design. Thus, there are two bays of twelve 19” racks, with 48U available vertical space, to house the WBSPF correlator, occupying two floor areas of approximately 7 m W x 1 m D (assuming a rack depth of 1 m). These racks do not need to be located together or particularly near the F‐part of the correlator. Minimum clearance at the front and back of each bay is 1 m (footnote 14), and racks or other equipment may be placed on either side (footnote 15). Total floor area is therefore ~7 m x 4 m. Total board count here is 1008. Standard spectral‐channel capacity in this configuration is 28,672 channels/baseline/polarization product (or total 114,688 channels/baseline); more spectral channels, but without on‐chip baseline‐based integration, can likely be obtained using the method described in section 4.5.3.

For dish‐PAFs of up to 2048 dishes, 750 MHz/polarization, and 30 beams, 150, 8x8 board arrays are

required to produce all polarization products using the baseline X421 ASIC design. Two of these bays

could be shared with the WBSPF correlator, but the extra complexity might not be worth the effort.

14 Although, in the EVLA system, 2.5’ (76 cm) was found to be sufficient.

15 As previously noted, racks containing central beamformer equipment could be located on each side.


Six 8x8 board arrays can fit into an 8‐rack bay, and so 25, 8‐rack bays are required for the correlator.

Each bay occupies a floor area of approximately 4.5 m W x 1 m D. These racks do not need to be

located together or particularly near the F‐part of the correlator. Minimum clearance at the front

and back of each bay is 1 m, and racks or other equipment may be placed on either side. Total floor

area, placing two bays side‐by‐side is therefore ~9 m W x 14 m D. Total board count here is 9600.

Standard spectral‐channel capacity in this configuration is 20,480 channels/baseline/polarization

product (81,920 channels/baseline); more spectral channels, but without on‐chip baseline‐based

integration can likely be obtained using the method described in section 4.5.3.

Racks and sub‐racks for DAAs and SAAs can be standard COTS equipment as there is no need for

rack‐to‐rack printed‐wiring connections. For 250 elements, 1000 beams, and 300 MHz/polarization,

two boards are required for each beam, requiring 2000 boards for each telescope, or a total of 4000

boards. Assuming a similar packing of 48 boards per 19” 48U rack, a total of 84 19” racks are

required. If arranged in 10‐rack bays, total floor area is ~6 m W x 11 m D. Standard spectral‐channel

capacity in this configuration is 8192 channels/polarization product/baseline/beam, with more

channels possible as described in section 4.5.3.
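The board, bay, and rack totals quoted in this section can be reproduced from the array sizes given in the text. In the sketch below the factor of 4 polarization products is inferred from the ratio of the quoted per‐baseline and per‐polarization‐product channel counts; the variable names are illustrative.

```python
import math

BOARDS_PER_RACK = 48
POL_PRODUCTS = 4   # inferred: 114,688 / 28,672 = 81,920 / 20,480 = 4

# Dish-WBSPF: seven 12x12 board arrays.
wbspf_boards = 7 * 12 * 12
wbspf_channels_per_baseline = 28_672 * POL_PRODUCTS

# Dish-PAF: 150 8x8 board arrays, six arrays per 8-rack bay.
paf_boards = 150 * 8 * 8
paf_bays = math.ceil(150 / 6)
paf_channels_per_baseline = 20_480 * POL_PRODUCTS

# SAA + DAA: 1000 beams, two boards per beam, two telescopes.
aa_boards = 1000 * 2 * 2
aa_racks = math.ceil(aa_boards / BOARDS_PER_RACK)

print("WBSPF boards:", wbspf_boards, "channels/baseline:", wbspf_channels_per_baseline)
print("PAF boards:  ", paf_boards, "bays:", paf_bays,
      "channels/baseline:", paf_channels_per_baseline)
print("SAA+DAA boards:", aa_boards, "racks:", aa_racks)
```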

A possible floor layout of the X‐part correlator racks, for the full set of SKA elements is shown in

Figure 7‐1 below:

Figure 7‐1 Possible full‐scale SKA X‐part correlator layout, using the baseline X421 ASIC, measuring ~34 m W

x 22 m D. Dish‐WBSPF: 1.05 GHz/polarization, 3072 elements. Dish‐PAF: 750 MHz/polarization, 2048

elements, 30 beams. SAA: 300 MHz/polarization, 256 elements, 1000 beams. DAA: 300 MHz/polarization,

256 elements, 1000 beams. Enough ‐48 VDC power plant capacity is shown for ~3.2 MW (as shown in the

table on page 53, more like 27 x 200 kW power plants are required). This layout assumes air cooling as

shown in Figure 7‐2. Not shown are the F‐parts of the correlator, or any COTS computers or network

equipment for M&C, image processing, or non‐visibility processing.

[Figure 7‐1 diagram labels: 16 x “48 VDC Power Plant + Batteries”; 12 x HVAC; rack areas labelled Dish‐WBSPF, Dish‐PAF, SAA, and DAA.]


7.2.3 F‐part Boards and Racks

Standard COTS board sizes, sub‐rack, and rack equipment may be used. The F‐parts of the

correlator(s) may be located in the same or a separate room, a separate building, or even at the

antennas. There are point‐to‐point connections between the F‐parts and X‐parts of the correlator,

requiring short‐haul copper, or more likely fiber, capable of 100 m or so range.

7.2.4 X‐part Synchronization

It is believed that it is not necessary for there to be any globally‐distributed clock or synchronization

signal within the X‐part of the correlator. Each board should be able to operate using its own local

reference clocks, using standard MGT rate‐matching techniques used in telecommunications. Thus,

there is no need for any connections or wiring to handle global clock distribution.

7.2.5 Rack Power Delivery and Control

Each rack will likely have a mains ‐48 VDC power entry and breaker panel, fed by overhead room bus‐bars, similar to that found in major telecommunications central offices. Wires coming off each breaker individually route to each board within each rack. A per‐rack remote power controller will be used to allow remote power control of each board in the rack, and this will tie into the rack network switch (in which case the remote power controller cannot control power to the switch itself, which is likely not required anyway). In Figure 7‐1 a number of distributed COTS ‐48 VDC power plants are used to supply power to overhead bus bars.

7.2.6 F‐part Data Insertion into the Correlator

The 0XT or HM‐Zd connectors on the correlator board facilitate inexpensive board‐to‐board

connections for distribution of X‐part data. However, for the first insertion of F‐part data (Figure

4‐3), data from 32, 10G sources, each sourcing from 8 SKA elements (antennas) needs to be

converted to appropriate electrical form. It may be possible for a small plug‐in device—with a multi‐

(32)‐fiber connector on one end, and a 40‐pair 0XT connector on the other end—to be designed to

easily perform the conversion. The motherboard could supply power to this converter, via spare

pins on the 0XT connector. Further design study is required to determine exactly how the F‐part of

the correlator connects to the X‐part.

7.2.7 M&C Network

Each rack will likely have its own 48+2 port 1G (or 10G) COTS network switch, powered off ‐48 VDC,

with point‐to‐point network connections to each board in the rack, and network connection to a

central control switch. All network cabling will be routed in overhead cable trays. If a 1G network is

used, cat6e UTP cable, using RJ‐45 connectors is likely sufficient for the job.

7.2.8 Output Data Network

This network depends on whether the image processing CPUs are subsumed into the correlator or not. If so, then if the data rate for final gridded points is low enough, no separate network is required, and low‐rate gridded points can possibly be transmitted to the final image processors using the M&C network. If high‐capacity network connections are required, then all of these cables, presumably fibre, will be routed from the front panels of all of the boards in overhead cable trays to destination computers contained in the same or a different room or building.

7.2.9 Thermal Considerations

The horizontal mounting of circuit boards in racks is not particularly conducive to a simple rack or crate‐based air‐flow cooling arrangement. Front‐to‐back airflow, with individual fans on each board (footnote 16), might be possible, but then mechanical (fan) failures are distributed to all boards, making repair/replacement less than desirable (although large data centers with 19” rack‐mount pizza‐box CPUs and switches do just this). There is also the 1U height restriction, which will limit the amount of air that can flow and the power dissipation that can be handled.

Another possibility is liquid cooling where there is a monolithic heat spreader/exchanger mounted

to all of the chips on the board, with cooling lines exiting the front or rear of the board. There are

companies which engineer such things for specialized applications [8].

Yet another possibility is centralized air blowers, pressurizing the floor and the space between racks

with air flowing side‐to‐side (transverse) across board components. This possibility is shown in

Figure 7‐2. This scheme might have problems with maintaining uniform air flow across all the boards

in the rack and might need a graduated airflow restriction vent to achieve reasonable uniformity.

Nevertheless, it does have the advantage of centralizing blowers and, in principle, should be possible

provided the power dissipation on each board is not unreasonable.

Figure 7‐2 Centralized air‐blower cooling arrangement.

16 Or, with larger rear-rack blowers/suckers.

[Figure 7‐2 diagram labels: cold air in; pressurized floor cavity; boards; warm air out.]


7.3 Electrical Characteristics

The boards within a band/slice bay of racks are tightly electrically connected. Good common‐mode conducted noise filtering on power supplies to meet FCC Part 15, Sub‐part J, Class B conducted EMI requirements (footnote 17), good grounding/bonding of racks together and to system ground, and the use of differential signalling help to ensure reliable data transport between racks. In the EVLA system such an approach was used and, with much longer cables connecting racks together, reliable data transport on ~12,000 1 Gbps differential pairs was achieved. Short nearest‐neighbour connections within the GSA should have equally good data transport reliability, at 10X the bandwidth per pair of the EVLA.

COTS fiber and M&C network connections use methods and follow standards to ensure electrical

isolation between the two ends. Therefore, M&C COTS network equipment, the F‐part of the

correlator, and other equipment such as image processing computers etc. are properly isolated, and

therefore do not have to be tightly physically or electrically coupled to the main X‐part correlator

system.

All boards in the correlator are supplied mains ‐48 VDC power. COTS power systems used in

telecommunications are readily available, complete with battery backup at a capital cost of about

$1/W. ‐48 VDC hot and return route to each board, and are completely isolated from signal, chassis,

and earth ground, thereby establishing known current paths. The return signal of each power plant

is only connected to earth ground at the power plant distribution panel. The use of ‐48 VDC power

eliminates the requirements for high‐voltage safety engineering and agency approval in non‐COTS

equipment. COTS power plants are built for reliability with N+1 redundancy, hot swap capability,

built‐in alarms, and hot‐standby battery backup. Such a system is used in the EVLA correlator with

very good availability/reliability and performance.

7.4 Reliability

The X421 ASIC is designed for board and system reliability, as are the nearest‐neighbour printed wiring connections. As discussed in section 4.2.1, the ASIC minimizes the number of solder points on the board, and this, as well as the resulting simpler board design, improves long‐term board and system reliability. At a nominal 50 FITs (failures in 10^9 hours; footnote 18) per chip, the total X‐correlator system MTBF as far as ASICs go is ~22 hours (930k chips). Also, an ASIC‐dominated system, because the ASICs are hardwired, is not subject to soft configuration SRAM failures as are FPGAs, something that has been seen on a number of occasions in the EVLA correlator with ~16k FPGAs.
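The ~22 hour figure follows from summing the per‐chip failure rates, as the sketch below shows. It assumes constant, independent failure rates with any single chip failure counted as a system failure, which is the usual simplification behind a FIT‐based estimate; the variable names are illustrative.

```python
FIT_PER_CHIP = 50            # failures per 10^9 device-hours (nominal)
N_CHIPS = 930_000            # X421 ASICs in the full X-correlator
HOURS_PER_BILLION = 1e9

# Failure rates add for a series system of independent parts.
system_fit = FIT_PER_CHIP * N_CHIPS

mtbf_hours = HOURS_PER_BILLION / system_fit
failures_per_day = 24.0 / mtbf_hours

print(f"system MTBF (ASICs only): {mtbf_hours:.1f} hours")
print(f"expected ASIC failures per day: {failures_per_day:.2f}")
```

The result, roughly one ASIC failure per day, is the basis for the availability expectations in section 7.6.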

To maximize reliability, the recommendation with the EVLA ASIC was to keep junction temperature at or below 50 °C. Around this temperature, every 10 °C rise or fall in junction temperature produces a ~2X change in reliability [9]. Unless ASIC fabricators indicate otherwise, to achieve good reliability, 50 °C should be the maximum junction temperature target for thermal design of the system.
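The rule of thumb above can be written as a simple scaling factor. The sketch assumes that "~2X change in reliability" means the failure rate doubles for each 10 °C rise and halves for each 10 °C fall, relative to the 50 °C reference; this interpretation and the function name are assumptions, and the rule is only approximate near the reference temperature.

```python
def relative_failure_rate(junction_c: float,
                          reference_c: float = 50.0,
                          doubling_step_c: float = 10.0) -> float:
    """Failure rate relative to the reference junction temperature,
    assuming a factor of 2 per doubling_step_c degrees."""
    return 2.0 ** ((junction_c - reference_c) / doubling_step_c)


for temp_c in (40.0, 50.0, 60.0, 70.0):
    print(f"{temp_c:.0f} C: {relative_failure_rate(temp_c):.2f}x failure rate")
```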

17 Most importantly to deal with ground-loop noise infecting differential signals.

18 No reference given, but chips of this sort of complexity (EVLA ASIC, Altera FPGAs; Altera quarterly reliability report) historically test with failure rates of ~30-50 FITs. However, no experience with the very small feature sizes (<~30 nm) required to implement the X421 ASIC is available.


Further study, using Telcordia SR‐332 (telecom standard) and/or MIL‐HDBK‐217F (military standard),

once a more detailed design is complete, is required to further quantify reliability.

7.5 Maintainability

As long as centralized air movement is possible, there are no moving parts in any of the correlator

boards to fail or require maintenance. With a ‐48 VDC mains supply, there are no AC‐DC power

converters distributed amongst the boards, and so failure of high‐voltage semi‐conductor devices,

and AC‐smoothing electrolytic capacitors in boards is not an issue. Thus, maintenance requirements

of the boards should be minimal. Some issues that could arise and need to be considered are:

• Hot removal of a board, if the VBA network is in place, requires removal of front‐panel printed wiring boards. These should therefore be designed to make such removal easy, possibly by integrating handles or ejectors in their design.

• Hot removal of a board will interrupt signal flow to other boards in the correlator matrix. The board and the X421 ASIC must therefore be designed to tolerate and automatically recover from such interruption. Note that the VBA network does not suffer this same fate, as it is possible for packets to be routed around the missing board.

• Other than VBA network connections, the only cable that needs to be removed to remove a board is the front‐panel network cable, possibly an RJ‐45 UTP or a fiber.

• To reduce temperature cycling of boards, and possible long‐term solder‐joint stress issues, it is a good idea in principle to maintain constant board temperature even through loss of signals, power failures, HVAC failures etc. Unfortunately, this is in conflict with the desire for centralized cooling; it may be that some boards go down or off‐line, while others remain hot‐running, and so it will likely be difficult to individually temperature control each board. Mitigating this effect is the use of a bare minimum number of solder joints in the board design.

• If the image‐processing/gridding computers are subsumed into the correlator (i.e. using FRPPs), they should not contain moving parts such as hard drives (footnote 19) or local cooling fans, to minimize the potential for distributed failures; high‐capacity solid‐state NV memories are likely capable of meeting storage requirements on the time scale required by the SKA. They, and the motherboard, should be designed so they are easily hot‐swap replaceable without having to remove the motherboard or any network connections, or power‐down the motherboard, or otherwise interrupt data flow on the motherboard.

• The correlator room should be kept clean and general access restricted to avoid particle contamination and ESD hazards. For the EVLA, the room is strictly enforced as ESD‐safe with restricted personnel access, and was cleaned and fitted with filters to meet ISO Class 8, with an additional MERV 13 filter in the HVAC systems.

19 Although this assumption may be incorrect, given the proliferation of very small, high-capacity hard drives in laptops.


7.6 Availability

Based on the sheer numbers of ASICs in the system, the simple reliability calculation of section 7.4 indicates that, on average, it would not be unexpected to see 1 failure in the system every day, once the initial infant mortality failures have worked themselves out of the system. Up until that point, higher failure rates are to be expected.

Thus, it should be expected that some correlator data products will not be available at any given

time. With centralized DC mains power, and N+1 fault‐tolerance in those systems, the vast majority

of the correlator should, however, be available a high percentage of the time. Experience with the

EVLA has shown that it takes substantial time to fully shake‐down infant mortality failures and

hardware and software bugs, and so for the first ~1‐3 years availability will likely be sporadic, but

increasing with time. There will also likely be times and conditions when very strange and unlikely

behaviour and interaction between sub‐systems occurs, seemingly defying explanation. Only time

and persistent testing and debugging will shake these things out of such a large system.

For maximum availability and reliability, the importance of rigorous testing, strict quality control,

ESD‐controlled handling etc. through the entire life‐cycle of the correlator can’t be over‐emphasized.

Once fully operational, stable, and within the long‐term low failure‐rate region of the failure‐rate vs

time “bathtub curve”, the following exceptions, which reduce availability, can be expected:

Regular system protection‐level tests. Examples are AC‐fail tests, fire‐alarm/fire‐suppression

tests, HVAC failure tests.

Software/firmware upgrades.

HVAC failures. HVAC servicing.

COTS network and computing equipment failures.

AC Mains power failures. Reliable battery backup is very expensive; count on ~5 minutes of

full‐power backup, and ~15 minutes of ~75%‐reduced power backup. Backup beyond these

requirements most efficiently comes from external sources (e.g. diesel generators).

‐48 VDC power‐plant failures. These systems are telecom‐quality and designed for high‐

reliability, but they, too, are made of and by “mere mortals” and have their own issues.

Connector/contact failures. Given the sheer number of connections, there are bound to be

failures here. High‐reliability contacts and manufacturing processes must be used to

minimize these effects.

Internal ASIC failures.

Internal FPGA failures; SRAM configuration “soft” failures.

Board solder‐point failures.


Residual bug‐induced failures. These can be very intermittent, and take substantial time to

shake‐down.

Human‐induced failures. Examples are incorrectly installed cables, debris left from

installation getting where it shouldn’t, ESD handling faults, misreading of

handling/installation/repair procedures etc.

Unknown source failures.

Although 100% availability is the ideal goal, in reality, with all of the above issues factored in, on

average ~90% availability, with 10% loss coming in major “planned” timeslots, should be the realistic

goal.
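To put the ~90% figure in concrete terms, the short calculation below converts an availability fraction into hours of downtime per year. It is only an illustrative sketch: the function name and the 8760-hour year are assumptions introduced here, not part of the concept description.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h; leap years ignored (assumption)


def downtime_hours(availability: float, hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours per year during which the system is unavailable."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * hours


# ~90% availability leaves roughly 876 hours (about 36.5 days) per year
# for planned tests, upgrades, servicing, and unplanned failures.
print(f"{downtime_hours(0.90):.0f} h/yr")
```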

7.7 Additional Quality Factors

7.8 Environmental Conditions

It is assumed that the correlator is contained in one or more RFI‐shielded, clean, ESD‐restricted

rooms, contained in one or more buildings. Nominal ambient temperature is 20 °C; however, 15 °C, if

available, would improve reliability by lowering semiconductor junction temperature.

7.9 Transportability

The correlator, as a whole, is not transportable once fully installed. Nor does it have to be. All racks,

power plants, cables, and boards are shipped to the correlator site, and installed there.

There are no pieces of the correlator which have extreme “monolithic” transport requirements. All

pieces can ship using standard methods in fork‐lift‐size containers.

7.10 Flexibility and Expandability

Provided room space is allocated, the correlator can be expanded in terms of bandwidth (by

adding band/slice correlators), in terms of numbers of antennas, and in terms of numbers of spectral

channels (without adding X‐part hardware, but with increased integration time) as shown in section

4.5.3. The maximum number of antennas is governed by the total delay of the signal as it traverses

the cross‐correlation daisy‐chain, and by the input buffering provided in the X421 ASIC.

For SAAs and DAAs, the “soft” limit is 256 elements, above which not all correlations fit on a

single board. Beyond 256 elements, boards can be re‐deployed and connected, as is done for dishes,

to process more elements with a 256‐element granularity.

As mentioned in section 4.5.3, the X421 ASIC design is not particularly suited for Phase‐I SAAs. A

different or modified ASIC design is required for optimal usage of silicon.

The basic rack and cooling infrastructure can be installed for Phase‐I, and expanded/grown to

encompass the full‐scale SKA.


7.11 Portability

The system, once installed, is not portable.

8 Design and Production

8.1 Components, Materials and Process

To achieve high reliability and availability, engineering starts at the component level and builds from

there. It is highly recommended that the following methods are in place for engineering and

construction of the full SKA:

Establish a separate, appropriately‐sized, components reliability group, whose goal is to

perform due‐diligence and reliability analysis of every single component (which has potential

for failure) used in every part of the system. A component cannot be used in any part of the

design if it is not approved by this group.

Establish a module‐level reliability testing group and laboratory for the SKA. All modules

must meet and pass all reliability requirements and testing required by this group. The

laboratory would be equipped with HALT/HASS (Highly Accelerated Life Test, Highly

Accelerated Stress Screening) equipment, and with associated expertise, to ensure

effectiveness. It might be cost‐effective to contract some or all of this function to industry.

Establish a quality‐control group, whose sole responsibility is to establish and ensure that all

modules and equipment delivered to the correlator site meet strict quality requirements.

Some of this functionality might be contracted to specialized industry (e.g. SEM‐analysis of

PCBs, semi‐conductors, solder processes).

Establish a module‐level test group, whose sole responsibility is to develop test setups to

rigorously stand‐alone‐test any modules which are to be delivered to the correlator site. For

example, for an ASIC development, there should be a small test group that works in

conjunction with the ASIC designer(s) to develop test suites, test beds, etc. throughout the

full life‐cycle of the ASIC. As another example, when a board is to be developed, the test

group works in parallel with the board design group to develop test fixtures, stimuli,

methods, and procedures.

Establish a system integration and test group, whose job is to perform engineering‐site and

correlator‐site integration and test, and feed back information to designers and other test

groups to find and fix bugs and faults.

It is expected that all correlator circuit board assemblies will be delivered from contract

manufacturers in a full turn‐key manner. This requires that representatives from most of

the above test groups will spend time helping to set up the contract manufacturer’s

processes, monitoring processes, setting up final contract manufacturer Q/A etc. so that

products that meet all reliability and test requirements are delivered to the correlator site.

The above recommendations will help to ensure that, once modules are delivered, installation and system

integration and test proceed expeditiously. If these recommendations are not followed, then it can

be expected that final system integration and test, and full operational availability, will be delayed

accordingly.

Of course, design groups are also required; these groups may or may not be in separate locations,

and likely include:

ASIC design group. Carry out the design of one or more ASICs, working in close conjunction

with industrial fabricators, and the ASIC test group.

FPGA/hardcopy design group.

Board design group.

Software design group.

Mechanical design group.

System design group. Working closely with all of the above groups to address thermal

design, system design, and deployment and installation. Develop and arrange for fabrication

and installation of all systems‐level components.

8.2 Development and Production Plan

This plan is roughly based on the EVLA correlator experience, but necessarily ramped‐up in rigour to

address higher volume requirements, and more rigorous engineering to address higher system

complexity and size. This plan addresses development and production of the full‐scale SKA (i.e.

Phase‐II), without regard to what might already be in place for Phase‐I, as it is believed that Phase‐I,

as far as the correlator is concerned, might be throw‐away technology, and not naturally grow to

Phase‐II, at least within the context of the GSA concept.

It is envisioned that with any particular circuit board, there will be 4 build stages. These kinds of

stages can also apply to custom‐designed mechanics or other systems, as deemed appropriate, but

the focus here is on circuit board assemblies. The development of the GSA X‐board is the model for

this plan. Of course, prior to these build stages there is significant ASIC and FPGA code development

and test, as well as board design and modelling. The development and production model is such

that design and development occurs “in‐house” by SKA‐consortium institutes, but that production is

through one or more industrial contract manufacturers.

Stage 1 – Alpha prototype. This is the first full‐size engineering prototype of any particular

circuit board assembly. There may or may not have been smaller‐scale “proof‐of‐concept”

prototypes prior to this prototype. Tests of the board at this stage will lead to changes

required for the next‐stage build. It should at least be possible to get the board working at

some level in this stage, unless a major oversight in the design has occurred. The build of

this board includes the prototype ASIC. Normally the build quantity here is 1.

Stage 2 – Beta prototypes. These prototypes incorporate all changes from Stage 1, and are

used to provide more robust prototypes for more detailed and exhaustive testing, as well as

developing full turn‐key manufacturing processes. This build will also include ASIC


prototypes, to allow for more exhaustive ASIC prototype testing. The build quantity here

could be ~10‐30 pieces (or more or less, depending on the confidence established in Stage

1), so that more range of variability in process is tested. Some of these prototypes will be

used as UUTs for reliability testing, quality analysis, test beds for software testing, and initial

system integration and test. Changes required to the design detected here will lead into the

Stage 3 build.

Stage 3 – Pre‐production prototypes. These prototypes are built in enough volume so as to

flush out any potential large‐volume problems. This build is essentially a “full dress

rehearsal” for the full‐scale (Stage 4) build. Similar rigour of testing as Stage 2 is employed,

except with more quantity. Final tweaking of the full turn‐key production process is done

here. Given the full‐scale full‐volume quantities in the final system, it is not unlikely that the

build quantities here could be in the 100‐1000 quantity range, and at a minimum will be a

full rack‐bay worth of boards for full power delivery and thermal tests. These prototypes

will use production ASICs.

Stage 4 – Full‐scale production. Full turn‐key production, with units delivered directly to the

final installation site, tested and ready for operations. Boards are installed in racks in units

of band/slices, and connected to F‐part and Central Beamformer accordingly.

8.2.1 Development and Production Schedule

This schedule outlines in broad‐brush terms how development and production of a full‐scale SKA

correlator, based on the GSA concept, might proceed. These estimates are roughly based on the

EVLA experience, but modified assuming design and test groups are available as defined in previous

sections. Many tasks can proceed in parallel, but the impact of ASIC capacity can have far‐reaching

effects, and so a design study establishing a good baseline on forecasted ASIC capacity, cost, and

power, as well as functional and performance specifications, is assumed to be available prior to the

development schedule outlined here. There are many important peripheral activities (e.g. reliability

testing, quality analysis) as well, but the concentration here is on main design group tasks; it is

assumed that peripheral support activities occur in step.

X‐chip (e.g. X421) ASIC RTL development and test. Estimate ~2‐3 years20, assuming one

primary designer, 1 or more associate designers (as believed to be optimal), and associated

test group working in parallel. There are many complex features in the ASIC to enable

capabilities as outlined in previous sections, and all of these must be exhaustively tested.

This design process includes physical synthesis and timing analysis, so once it is completed,

i.e. the RTL code is “qualified”, an ASIC fabrication vendor can take the results and fabricate

the chip.

ASIC prototype fabrication, 6 months to 2 years.

20 The EVLA ASIC initial RTL design and incremental test took ~4 months, but the X421 design is quite a bit more complex. “Qualification” of the RTL—subjecting it to independent verification testing—took ~1 year for the EVLA ASIC. Factoring out lost time due to confusion from unqualified vendors, and mistakes we made in test fixtures, the entire EVLA ASIC process, from start of RTL to production chips took ~3 years.


X‐board FPGA design, development, test21, 2 years.

X‐board design, development, and Stage 1 fabrication. ~1‐1.5 years.

Custom rack and sub‐rack design, thermal modelling, development, and Stage 1 fabrication.

~1‐2 years.

X‐board software development and test. Starts with ASIC specification, and proceeds

through entire duration of project.

Patch board development and Stage 1 prototype fabrication, ~6 months.

X‐board and prototype ASIC Stage 1 testing. ~6 months to 1 year.

Stage 2 X‐board development and fabrication. ~3‐8 months.

Stage 2 testing, initial system integration and testing, ~6 months.

Full production ASIC fabrication, ~6 months to 1 year.

Stage 3 X‐board development and fabrication. ~5 months.

Stage 3 X‐board testing, system integration and testing, final full‐bay thermal testing ~6

months.

Stage 4 full turn‐key production and delivery of all correlator modules and COTS systems, ~1‐

2 years.

A rough Gantt chart schedule of all of these activities is shown in Figure 8‐1. The schedule assumes

that full production ASIC fabrication will see a ~continuous roll‐out of devices, rather than one

lumped delivery at the end of the allocated time. In all cases, the worst‐case times from the above

bullets are used.
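As a rough illustration of what using the worst‐case times implies, the sketch below adds up the worst‐case durations of the ASIC‐ and X‐board‐related bullets above. The strictly serial ordering is an assumption made here purely for illustration; Figure 8‐1 is not reproduced in this text, many tasks are stated to run in parallel, and the dictionary name is hypothetical. The result is therefore an upper bound on one possible path, not the schedule of Figure 8‐1.

```python
# Worst-case durations in months, transcribed from the bullets above.
# Tasks with no stated end (e.g. software development) are omitted.
WORST_CASE_MONTHS = {
    "ASIC RTL development and test": 36,
    "ASIC prototype fabrication": 24,
    "X-board and prototype ASIC Stage 1 testing": 12,
    "Stage 2 X-board development and fabrication": 8,
    "Stage 2 testing and initial system integration": 6,
    "Full production ASIC fabrication": 12,
    "Stage 3 X-board development and fabrication": 5,
    "Stage 3 testing and full-bay thermal testing": 6,
    "Stage 4 full turn-key production and delivery": 24,
}

# ASSUMPTION: every task above waits for the previous one to finish.
# Real overlap (e.g. continuous roll-out of production ASICs) shortens this.
total_months = sum(WORST_CASE_MONTHS.values())
print(f"{total_months} months (~{total_months / 12:.1f} years)")
```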

21 All on the desktop, using FPGA-vendor tools and simulations.


Figure 8‐1 Rough Gantt chart of X‐part correlator development, including ASIC development with a

fabrication start time at the beginning of 2016. Some of the activities could have earlier start times, in

particular preliminary X‐board feasibility study/development, software development, rack and sub‐rack

design, and FPGA design. ASIC RTL development and test is shown beginning in 2012, which may be a bit

premature, depending on how accurate future technology forecasts are.

8.3 Electromagnetic Radiation

High‐quality signal integrity board design, using differential pairs for high‐speed signals, helps to

ensure low RFI emissions. However, digital boards will still likely generate RFI at levels unacceptable

to SKA receiving elements, as the correlator is likely to be centrally located and near most elements.

Therefore, it is assumed and required that the correlator be housed in one or more RFI‐shielded

rooms, with appropriate power‐line and signal filtering.

8.4 Manufacturer Nameplate and Product Marking

8.5 Industrial Standardisation

The elements of the correlator described in this concept description do not particularly meet, nor do

they seem required to meet, any particular industry‐standard form factors. If possible, though,

FRPPs, should they be used as described here, should meet industry standards, if such standards

exist to meet performance requirements. Where possible, industry‐standard protocols and methods

should be used throughout, so as to minimize development effort.

8.6 Interchangeability

It is likely desirable for the X‐part correlator board to be interchangeable amongst different

correlator types (e.g. SAA, dishes etc.), and populated in different parts of the correlator. However,

this may not be optimal from a cost point of view, as boards for SAAs, DAAs, and in the diagonal

section of dishes contain 9x8 ASICs, whereas the bulk of boards used for dishes need only 8x8 ASICs.

Therefore, these two boards may not be interchangeable.


8.7 Safety

‐48 VDC mains systems have the added benefit that they run at human‐safe voltages. Nevertheless,

precautions must be taken to avoid short‐circuit of the hot and return rails, as significant current

delivery is possible, and can quickly melt and/or vaporize metal, and cause severe burns. COTS ‐48

VDC power plants and batteries are engineered and installed for human‐safe environments, and so

this should not be an issue. Earthquake zone ratings for the installation must be considered so that

racks and equipment are appropriately fastened so as to avoid crushing safety issues should such an

event occur.

8.8 Ergonomics

The correlator room(s), assuming centralized air cooling, is not an office‐like work environment

conducive to long‐term human exposure. The room is restricted access, and only those personnel

performing directed functions should be allowed access. For the EVLA, only a limited and qualified

set of NRAO personnel are allowed in the room and the room is kept “spartan” so as to minimize the

possibility for collection of people, junk, and dust.

8.9 Confidentiality and Protection

The concepts described in this document are the property of the National Research Council of

Canada and must not be divulged to anyone outside the SKA Project Development Office without

permission.

8.10 Supplies from the Contracting Authority

Void

8.11 Resource Reserve Capabilities

Void

8.12 Documentation

Void

8.13 Logistics

Void

8.14 Personnel and Training

This section describes personnel and training required for the final on‐site system.

8.14.1 Personnel

The following personnel are expected to be required:


Power electrical maintenance (1 person). Responsible for maintenance and monitoring of

mains AC, mains DC, and HVAC systems.

Network administrators (2 persons). Responsible for troubleshooting, maintaining, and

upgrading network equipment.

Correlator maintenance engineers (2 persons). Responsible for troubleshooting,

maintaining, and repair/replacement of correlator systems equipment.

Software maintenance engineers (2 persons). Responsible for troubleshooting, maintaining,

and upgrading correlator systems software.

Site/building maintenance engineer (1 person). Responsible for maintaining the correlator

building(s), and site services.

Operations personnel, operations scientists (4 persons). There will be the need for several

operations personnel (operators) and operations scientists having to do with observation

scheduling, exceptions handling etc.

8.14.2 Training

Void

8.15 Characteristics of Secondary Items

Void

8.16 Priority

Void


9 Cost and Power Estimates

This section includes 2 tables estimating cost and power of the entire X‐part correlator installation

shown in Figure 7‐1, as well as operating costs. The estimates do not include cost and power of the F‐part, or

data transport from the F‐part to the X‐part, but do include subsumed gridding CPUs (FRPPs) and

associated VBA network. Cost estimates are based on recent experience with the EVLA,

incorporating rough estimates for volume pricing, and technology improvements.

9.1 Cost Table

Description Qty Cost ea Cost total

GSA X Board

X421 ASIC (<=~30 nm) (9 column x 8‐row board) 72 $50 $3600

Repeater Hard Copy FPGA 2 $300 $600

350 W of power supplies (~$1/W) 1 $300 $300

VBA data switching hardcopy FPGAs 2 $400 $800

Multi‐core Field‐Replaceable Processor Pack 2 $500 $1000

M&C FPGA 1 $100 $100

M&C and Gridded data output SFP Module 1 $25 $25

Miscl power, temp, monitor semis 1 $20 $20

Decoupling caps, resistors, various passives, LEDs 1 $200 $20

0XT 40‐pair ERNI connector 8 $10 $80

18” x 16.5” multi‐layer (est. 14 layers) 1 $200 $200

Cooling plate (liquid or air heatsink) 1 $100 $100

Miscl hardware 1 $100 $100

Sub‐total $6935

Turn‐key mfg costs (incl. part procure., test) 25% $1734

TOTAL per GSA board (includes Gridding CPUs) $8669

Per‐GSA Board Connectivity

Rear mid‐plane (turn‐key cost) 1 $100 $100

Horizontal patch board (turn‐key cost) 1 $50 $50

Vertical patch board (turn‐key cost) 1 $30 $30

Telescope GSA+Connectivity+Rack costs

Dish‐WBSPF (1 GHz/pol’n, 3072 antennas)

GSA X Boards 1008 $8669 $8.74M

Rear connectivity22 (1 mid‐plane, 1 h‐patch, 1 v‐patch) 1008 $180 $0.18M

Front connectivity (1 h‐patch, 1 v‐patch) 1008 $80 $0.08M

48U, 19”, precision rack 24 $1000 $0.024M

Per‐rack, network switch, breaker panel, power cables 24 $5000 $0.12M

Dish‐WBSPF X‐correlator TOTAL $9.1M (A)

22 Assumes cables are not required to connect boards along diagonal.


Dish‐PAF (750 MHz/pol’n, 2048 antennas), 30 beams

GSA X Boards (5 band slices x 8x8 array, x 30 beams) 9600 $8669 $83.2M

Rear connectivity (1 mid‐plane, 1 h‐patch, 1 v‐patch) 9600 $180 $1.73M

Front connectivity (1 h‐patch, 1 v‐patch) 9600 $80 $0.77M

48U, 19”, precision rack 184 $1000 $0.184M


Per‐rack, network switch, breaker panel, power cables 184 $5000 $0.92M

30 Beam, 750 MHz/pol Dish‐PAF X‐correlator TOTAL $86.8M (B)

SAA (300 MHz/pol’n, 256 elements), 1000 beams

GSA X Boards (2 band slices(boards) x 1000 beams) 2000 $8669 $17.3M

48U, 19”, precision rack 42 $1000 $0.042M

Per‐rack, network switch, breaker panel, power cables 42 $5000 $0.21M

1000 Beam, 300 MHz/pol SAA X‐correlator TOTAL $17.6M (C)

DAA (300 MHz/pol’n, 256 elements), 1000 beams

GSA X Boards (2 band slices(boards) x 1000 beams) 2000 $8669 $17.3M

48U, 19”, precision rack 42 $1000 $0.042M

Per‐rack, network switch, breaker panel, power cables 42 $5000 $0.21M

1000 Beam, 300 MHz/pol DAA X‐correlator TOTAL $17.6M (D)

System Infrastructure

System 200 kW 48VDC battery‐backed power plant 27 $200k $5.4M

Shielded room, cooling, and power distribution infrastructure 1 $3M $3M

System infrastructure TOTAL $8.4M (E)

Industry NREs (not incl. institute design costs)

X421 ASIC, <=~30 nm 1 $6.5M23 $6.5M

GSA and Patch Board design and fabrication NRE 1 $500k $0.5M

System infrastructure engineering 1 $1M $1M

NRE TOTAL $8M (F)

System TOTAL w/o 35% contingency (A‐F) $147M

TOTAL c/w 35% contingency $199M

Table 9‐1 Cost Table.

23 Possibly less than this; likely the “gold-plated” cost.
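The roll‐up in Table 9‐1 can be cross‐checked with a few lines of arithmetic. The sketch below is illustrative only: it starts from the stated per‐board turn‐key cost and the quantities in the table, sums the infrastructure rows directly (27 x $200k + $3M), and applies the 35% contingency. The function and variable names are hypothetical. SAA and DAA rows list no patch‐board connectivity in the table, so none is charged for them here.

```python
BOARD_COST = 8669       # $ per GSA X board, turn-key, incl. gridding CPUs
REAR_CONN = 180         # $ per board: mid-plane + h-patch + v-patch
FRONT_CONN = 80         # $ per board: h-patch + v-patch
RACK = 1000             # $ per 48U, 19-inch precision rack
RACK_SERVICES = 5000    # $ per rack: network switch, breaker panel, cables


def telescope_cost(boards: int, racks: int, with_patching: bool) -> int:
    """Boards plus racks for one telescope type, in dollars."""
    per_board = BOARD_COST + (REAR_CONN + FRONT_CONN if with_patching else 0)
    return boards * per_board + racks * (RACK + RACK_SERVICES)


subsystems = {
    "A Dish-WBSPF": telescope_cost(1008, 24, with_patching=True),
    "B Dish-PAF": telescope_cost(9600, 184, with_patching=True),
    "C SAA": telescope_cost(2000, 42, with_patching=False),
    "D DAA": telescope_cost(2000, 42, with_patching=False),
    "E Infrastructure": 27 * 200_000 + 3_000_000,
    "F Industry NRE": 6_500_000 + 500_000 + 1_000_000,
}

total = sum(subsystems.values())
with_contingency = total * 1.35

for name, cost in subsystems.items():
    print(f"{name:18s} ${cost / 1e6:6.1f}M")
print(f"{'System total':18s} ${total / 1e6:6.1f}M")
print(f"{'With contingency':18s} ${with_contingency / 1e6:6.1f}M")
```

Summed this way, the system total lands between $147M and $148M and the contingency figure just above $199M, in line with the table's bottom lines.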


9.2 Power Table

Power estimates are very rough and are heavily leveraged by the X‐board power, itself leveraged by

ASIC power. Further refinements and optimizations based on detailed design studies are required.

Description Qty P each P total

GSA X Board

X421 ASIC (9 column x 8‐row board) 72 3 W24 216 W

Repeater Hard Copy FPGA (170 mW per 48 MGT+some) 2 15 W 30 W

VBA data switching hardcopy FPGAs (170 mW per 40 MGT+some) 2 15 W 30 W

Multi‐core Field‐Replaceable Processor Pack25 2 20 W 40 W

M&C FPGA + SFP 1 5 W 5 W

Subtotal 311 W

Power supplies, 90% efficient26 31.1 W

GSA Board TOTAL Power 355 W

Telescope GSA+Rack Power

Dish‐WBSPF (1 GHz/pol’n, 3072 antennas)

GSA X Boards 1008 355 W 357.8 kW

Per‐rack network switch 24 200 W 4.8 kW

Per‐rack, breaker panel, power cables (2% losses) 1 7.1 kW 7.3 kW

Dish‐WBSPF TOTAL 370 kW

Dish‐PAF (750 MHz/pol’n, 2048 antennas, 30 beams)

GSA X Boards 9600 355 W 3.41 MW


Per‐rack, breaker panel, power cables (2% losses) 1 7.1 kW 70 kW

Dish‐PAF TOTAL 3.52 MW

SAA (300 MHz/pol’n, 256 elements, 1000 beams)

GSA X Boards 2000 355 W 710 kW



SAA TOTAL 732.8 kW

24 With 12 10G MGTs this will be challenging to meet. A new ultra-low-power transceiver technology may be required, perhaps short-distance pulse-phase modulation transceivers [10] [11].

25 There is wide-scale intensive development of low-power high-performance computing. This will likely not be a COTS module, but rather a module optimized for image processing, using a COTS CPU core. By the time of the full-scale SKA, 20 W should buy significant processing capability.

26 Based on current-generation 48 VDC to LVDS converters (Artesyn). A two-stage conversion might be necessary for best efficiency for voltages <1.0 V.


DAA (300 MHz/pol’n, 256 elements, 1000 beams)

GSA X Boards 2000 355 W 710 kW



DAA TOTAL 732.8 kW

TOTAL Correlator Rack Power 5.36 MW

‐48 VDC power plant efficiency (90%)27 1 536 kW 536 kW

Subtotal 5.96 MW

HVAC cooling power (30%)28 1 1.8 MW

TOTAL SYSTEM POWER 7.76 MW

Electricity operating cost, per year, assuming $0.10 per kW‐hr 68 M kW‐hrs $0.10 $6.8 M/yr

Table 9‐2 Power Table.
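The chain from board power to annual electricity cost in Table 9‐2 can be sketched as follows. This is a rough cross‐check under stated assumptions, not the method used to build the table: rack counts are taken from Table 9‐1, the 200 W per‐rack network switch (listed explicitly only for Dish‐WBSPF) is assumed to apply to every rack, the 90% power‐plant efficiency is applied by division, and HVAC is taken as 30% of the resulting subtotal. All names are hypothetical. Because the table rounds intermediate values, the results agree only to within about 1%.

```python
BOARD_W = 355.0          # W per GSA X board (Table 9-2)
SWITCH_W = 200.0         # W per rack network switch (assumed for all racks)
DISTRIBUTION_LOSS = 0.02 # breaker panel and power cable losses
PLANT_EFFICIENCY = 0.90  # -48 VDC power plant
HVAC_FRACTION = 0.30     # cooling power as a fraction of electrical input
PRICE_PER_KWH = 0.10     # $ per kW-hr
HOURS_PER_YEAR = 8760


def telescope_power_w(boards: int, racks: int) -> float:
    """Rack-level power for one telescope type, in watts."""
    return (boards * BOARD_W + racks * SWITCH_W) * (1 + DISTRIBUTION_LOSS)


# (boards, racks) per telescope type; rack counts come from Table 9-1.
telescopes = {
    "Dish-WBSPF": (1008, 24),
    "Dish-PAF": (9600, 184),
    "SAA": (2000, 42),
    "DAA": (2000, 42),
}

rack_w = sum(telescope_power_w(b, r) for b, r in telescopes.values())
plant_input_w = rack_w / PLANT_EFFICIENCY
total_w = plant_input_w * (1 + HVAC_FRACTION)
annual_kwh = total_w / 1000 * HOURS_PER_YEAR
annual_cost = annual_kwh * PRICE_PER_KWH

print(f"Rack power   {rack_w / 1e6:.2f} MW")
print(f"Total power  {total_w / 1e6:.2f} MW")
print(f"Energy       {annual_kwh / 1e6:.1f} M kW-hr/yr")
print(f"Cost         ${annual_cost / 1e6:.1f}M/yr")
```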

9.3 Operating Costs

9.3.1 Power

The X‐part correlator power operating cost is according to TOTAL SYSTEM POWER in Table 9‐2, and

is roughly $6.8M/year, assuming $0.10 per kW‐hr.

9.3.2 Staffing

With staffing levels outlined in section 8.14.1, and at $150k/year/person, roughly $1.8M/year

operating cost.

9.3.3 Maintenance

Staffing for maintenance also includes various contractors such as electrical contractors, HVAC service

contractors etc. Estimate $500k/year for these kinds of additional maintenance contractors.

9.3.4 Spares & Replacements

For the EVLA, 5% spare modules were produced, with additional spare components, and COTS sub‐

modules at varying spares levels depending on the perceived (and analysed) failure rate,

availability/lifetime, and criticality. Overall, count on an additional ~7% cost for spares (7% of

$199M—Table 9‐1). Cost is ~$14M.

27 As per the EVLA COTS -48 VDC power plant, Emerson Power, Model LPS48E1.

28 Private communication, Bob Broilo NRAO EVLA-site EE w.r.t. EVLA correlator.


The largest on‐going replacement items will be anything with moving components, and this primarily

means central HVAC systems, if the cooling scheme of Figure 7‐2 is employed.
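For convenience, the operating‐cost items of sections 9.3.1 to 9.3.4 can be gathered in one place. The sketch below is illustrative only: the head‐counts come from section 8.14.1 and the rates from the text above, while the recurring total is simply the sum of the three annual items and is not a figure stated in this document. Names are hypothetical.

```python
# Head-counts from section 8.14.1.
STAFF = {
    "Power electrical maintenance": 1,
    "Network administrators": 2,
    "Correlator maintenance engineers": 2,
    "Software maintenance engineers": 2,
    "Site/building maintenance engineer": 1,
    "Operations personnel and scientists": 4,
}
COST_PER_PERSON = 150_000          # $ per year per person (section 9.3.2)
POWER_COST = 6_800_000             # $ per year (section 9.3.1)
MAINTENANCE_CONTRACTS = 500_000    # $ per year (section 9.3.3)
CAPITAL_WITH_CONTINGENCY = 199e6   # $ (Table 9-1)
SPARES_FRACTION = 0.07             # section 9.3.4

staffing = sum(STAFF.values()) * COST_PER_PERSON
recurring = POWER_COST + staffing + MAINTENANCE_CONTRACTS  # derived sum
spares = SPARES_FRACTION * CAPITAL_WITH_CONTINGENCY        # one-time cost

print(f"Staffing   ${staffing / 1e6:.1f}M/yr ({sum(STAFF.values())} persons)")
print(f"Recurring  ${recurring / 1e6:.1f}M/yr")
print(f"Spares     ${spares / 1e6:.1f}M (one-time)")
```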


10 Quality Assurance Provisions

10.1 General

Void

10.1.1 Qualification

Void

10.1.2 Acceptance

Void

10.2 Requirements Conformance Verification

Void

10.2.1 Verification Methods

Void

10.2.2 Verification Matrix

Void

11 Delivery Provisions

Void

12 Notes

Void


13 Appendices

13.1 X421 ASIC Data Sheet

This appendix contains the first two pages of the X421 data sheet.

(B. Carlson) CA-X421-150

NRC Confidential—Preliminary Data Sheet—December 16, 2010

________________________________________________________________

________________________________________________________________

Features:

1024 baselines, 75 MHz/polarization, 2048 channels per product, all 4 polarization products.

15-level, 4-bit complex MAC, with data valid counting for each channel allowing for temporal/spectral RFI flagging/excision.

Complete on-chip memory, with seamless no dead-time integration/dumping/readout.

Y-input selection of X-input data for maximum packing in medium array size applications.

Quad-cell operation for ½ antennas, 2X BW operation.

Antenna-stream-controlled, baseline-based time + channel integration to minimize output data volumes. Selectable epoch-less integration.

Baseline-based, u,v,w calculation/generation.

Dynamic/flexible channel-burst lengths to minimize upstream packet buffering. On-chip triple buffering for max 256-burst lengths.

All hi-speed I/Os are 10 Gbps clear channel SERDES 64/66B differential pairs (10.3125 Gbps line rate).

High-performance 4x10G-lane daisy-chained data in/out with buffering for blind nearest neighbour central controller-less output.

All IEEE 802.3 packet-based processing.

I2C bus for monitor and control; 3 address bits.

Single F672 flip-chip package; minimal I/O count for high yield, low-power, cost-sensitive applications.

________________________________________________________________

Functional Block Diagram

1024 baseline, 4-bit, 2048 channel, 150 MHz, X-CMAC Array Correlator Chip

CA-X421-150



Description:

The CA-X421-150 (the “X421”) is a 1024 baseline, full-polarization, 4-bit complex spectral channel cross-correlator. The device has enough on-chip memory for accumulation, eliminating the need for external DRAM. This reduces power and minimizes I/O pin count, which eases board routing and maximizes board production yield, in turn minimizing cost and maximizing reliability. It supports a bandwidth up to 75 MHz per polarization (less bandwidth can be processed, in which case the chip operates at a lower duty cycle), and provides 2048 complex channels per correlation product, with enough memory for 20 second on-chip accumulation (for autocorrelations; longer for cross-correlations, inversely proportional to the correlation coefficient). Four lanes of 10G each, with serial “frame in” and “frame out”, provide for nearest-neighbour daisy-chained routing of output data in a blind, controller-less manner, simplifying board routing and eliminating any requirement for centralized real-time device control or scheduling. Each lane can be statically enabled or disabled to support varying board configurations. Enough output bandwidth is provided to allow for 170 msec integration on all baselines, with 8 chips stacked in a daisy-chain.
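The 170 msec figure can be put in perspective with a back-of-envelope output budget. The sketch below is not taken from the data sheet: it assumes that all four 10G lanes carry output data at a 10 Gbps payload rate, that the 8 daisy-chained chips share that capacity equally, and that packet overhead is ignored. The output word format is not given in this excerpt, so the result is only the number of bits available per complex channel product, not a statement of the actual format.

```python
LANES = 4                 # 10G output lanes per daisy-chain
LANE_RATE_BPS = 10e9      # payload rate per lane (64/66B overhead excluded)
CHIPS_IN_CHAIN = 8        # chips sharing the daisy-chain
INTEGRATION_S = 0.170     # minimum integration time on all baselines

BASELINES = 1024
CHANNELS = 2048
POL_PRODUCTS = 4

# ASSUMPTION: equal sharing of the chain capacity, no packet overhead.
bits_per_chip = LANES * LANE_RATE_BPS * INTEGRATION_S / CHIPS_IN_CHAIN
products_per_chip = BASELINES * CHANNELS * POL_PRODUCTS
bits_per_product = bits_per_chip / products_per_chip

print(f"{products_per_chip} complex products per chip per dump")
print(f"~{bits_per_product:.0f} bits available per complex product")
```

Under these assumptions roughly 100 bits are available per complex product per dump, which has to cover the real and imaginary accumulations plus any share of the data valid counts and headers.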

The X421 works on Ethernet IEEE 802.3 packets, either UDP/IP, or raw transport 802.3, allowing signals to be delivered to the chip over commercial equipment/networks. Two packet formats are supported, one which allows 8 antennas, 75 MHz/polarization, with 64 to 256 channel-burst-length, to be multiplexed into a single packet in a single serial stream. The other allows for ½ antenna, 4-band (effective 2X bandwidth) operation, allowing for 4 antennas, 4 bands at 37.5 MHz/polarization per band to be multiplexed into a single packet. Incoming X and Y packets must be reasonably synchronized in time; intelligent synchronization operating on packet sequence numbers, and on-chip triple packet buffering allows for up to 3.2 µsec skew between arriving packets.
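A quick consistency check on the two packet formats is sketched below. It assumes critically sampled complex data (sample rate equal to the bandwidth) with a 4-bit real and 4-bit imaginary word per sample, which is an assumption made here and not a statement from the data sheet; the function name is hypothetical. Under that assumption both formats carry the same raw sample payload, a little under the 10 Gbps lane rate, leaving a few percent for packet headers.

```python
BITS_PER_COMPLEX_SAMPLE = 4 + 4  # 4-bit Re + 4-bit Im
POLARIZATIONS = 2


def payload_bps(antennas: int, bands: int, bandwidth_hz: float) -> float:
    """Raw sample payload multiplexed into one serial stream, in bits/s."""
    # ASSUMPTION: complex sample rate equals the bandwidth per polarization.
    samples_per_s = antennas * bands * POLARIZATIONS * bandwidth_hz
    return samples_per_s * BITS_PER_COMPLEX_SAMPLE


full_antenna = payload_bps(antennas=8, bands=1, bandwidth_hz=75e6)
half_antenna = payload_bps(antennas=4, bands=4, bandwidth_hz=37.5e6)

print(f"8 antennas x 75 MHz/pol:             {full_antenna / 1e9:.1f} Gbps")
print(f"4 antennas x 4 bands x 37.5 MHz/pol: {half_antenna / 1e9:.1f} Gbps")
```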

Each full polarization frequency channel accumulator contains a data valid accumulator to allow for temporal channel-based flagging of data. The “-8” (1000b) state of the 4-bit 2’s complement word is used to indicate data invalid; the X421 decodes this state, and if either the R-pol or L-pol, Re or Im 4-bit word contains this value, a “0” is added to each polarization product accumulator, and the valid count accumulator is not incremented. The output data frame contains this data valid count for each complex frequency channel, allowing for precision normalization by downstream floating-point processors.
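The flagging rule can be illustrated with a small software model of one frequency channel on one baseline. This is a sketch under assumptions, not the chip's implementation: the product convention (X times the complex conjugate of Y), the tuple layout, and the function name are all chosen here for illustration. Only the "-8 means invalid, add zero, do not count" behaviour is taken from the description above.

```python
INVALID = -8  # 1000b in 4-bit two's complement marks invalid data


def accumulate(samples):
    """Accumulate the 4 polarization products for one channel.

    samples: iterable of (x, y); x and y are ((re, im), (re, im)) for the
    R and L polarizations of the X and Y inputs (hypothetical layout).
    Returns (accumulators, valid_count).
    """
    acc = {"RR": 0j, "RL": 0j, "LR": 0j, "LL": 0j}
    valid = 0
    for x, y in samples:
        words = [w for station in (x, y) for pol in station for w in pol]
        if INVALID in words:
            continue  # "0" added to every product; valid count unchanged
        xr, xl = complex(*x[0]), complex(*x[1])
        yr, yl = complex(*y[0]), complex(*y[1])
        # ASSUMPTION: product is X * conj(Y); the real convention may differ.
        acc["RR"] += xr * yr.conjugate()
        acc["RL"] += xr * yl.conjugate()
        acc["LR"] += xl * yr.conjugate()
        acc["LL"] += xl * yl.conjugate()
        valid += 1
    return acc, valid


samples = [
    (((1, 0), (2, 0)), ((1, 0), (0, 1))),    # valid sample
    (((-8, 0), (1, 1)), ((1, 1), (1, 1))),   # flagged: contains -8
]
acc, valid = accumulate(samples)
# Downstream normalization divides by the valid count, not the sample count.
normalized = {k: v / valid for k, v in acc.items()} if valid else acc
print(valid, normalized)
```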

The device contains additional features to allow for intelligent minimization of output data rates, and the possibility of downstream u,v,w-based packet destination routing to facilitate on-the-fly, distributed image processing. An INFOrmation packet is defined that contains everything needed for the internal integrators to perform baseline u,v-based time integration and/or baseline u,v-based channel integration, with integration-epoch-less intelligent on-chip handling of different integration times on different products, in units of Packet Groups29. The INFO packet contains antenna-based 48-bit u, v, and w, and the chip calculates baseline-based u, v, w, allowing for high spatial frequency resolution over wide observing frequencies and any possible earth baselines. If desired, these features can be bypassed, and epoch-aligned, baseline-independent integration can then be performed using the same protocol.

An I2C bus allows for low data rate monitor and control of operation mode (single band, 1024 baseline, or quad band, 256 baselines), receiver/CDR and PLL status, as well as internal integration status such as integrator overrun, received packet errors etc. Three address bits allow 8 devices to be hung off the same I2C bus. Only a bare minimum of monitor and control functions are provided, to minimize monitor and control complexity in large distributed systems.

29 A “Packet Group” is a complete channel group of channel-burst packets, forming one complete sub-integration.
