TRANSCRIPT
The SKA Science Data Processor and Regional Centres
Paul Alexander
Leader of the Science Data Processor Consortium
SKA: A Leading Big Data Challenge for the 2020 Decade
[Figure: data flow from the antennas, through Digital Signal Processing (DSP), to a High Performance Computing (HPC) facility, over 10s to 1000s of km]
Transfer from antennas to DSP:
  2020: 5,000 PBytes/day
  2030: 100,000 PBytes/day
To process in the HPC facility:
  2020: 50 PBytes/day
  2030: 10,000 PBytes/day
HPC processing:
  2023: 250 PFlop
  2033+: 25 EFlop
Standard interferometer
[Figure: signal path from the sky image through Detect & amplify, Digitise & delay, Correlate, Process (calibrate, grid, FFT) and Integrate, for an astronomical signal (EM wave) arriving from direction s on a baseline B]
• Visibility: V(B) = ⟨E₁ E₂*⟩ = ∫ I(s) exp(iω B·s/c) dΩ
• Resolution determined by the maximum baseline: θ_max ≈ λ / B_max
• Field of View (FoV) determined by the size of each dish: θ_dish ≈ λ / D
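As a quick illustration of those two scalings, here is a minimal sketch (not SDP code) that evaluates them for assumed, SKA1-like numbers: a 21 cm observing wavelength, a 150 km maximum baseline and 15 m dishes.

```python
# Illustrative evaluation of theta_max ~ lambda/B_max and theta_dish ~ lambda/D.
# All numbers are assumptions for the example, not SKA specifications.
import math

wavelength = 0.21        # metres (21 cm line, assumed)
b_max = 150e3            # metres (assumed maximum baseline, ~150 km)
dish_diameter = 15.0     # metres (assumed dish size)

theta_max = wavelength / b_max          # synthesised-beam resolution, radians
theta_fov = wavelength / dish_diameter  # primary-beam field of view, radians

rad_to_arcsec = 180.0 / math.pi * 3600.0
print(f"Resolution    ~ {theta_max * rad_to_arcsec:.2f} arcsec")
print(f"Field of view ~ {theta_fov * 180.0 / math.pi:.2f} deg")
```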
Image formation is computationally intensive
• Images are formed by an inverse algorithm similar to that used in MRI
• Typical image sizes are up to 30k x 30k x 64k voxels
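For a sense of scale, a back-of-envelope sizing of such a cube, assuming 4-byte (single-precision) voxels; this is an illustrative assumption, not an official SDP figure.

```python
# Rough sizing of the quoted maximum image cube, assuming float32 voxels.
n_x, n_y, n_chan = 30_000, 30_000, 64_000
bytes_per_voxel = 4  # assumption: single precision

voxels = n_x * n_y * n_chan
size_pb = voxels * bytes_per_voxel / 1e15
print(f"{voxels:.2e} voxels  ->  ~{size_pb:.0f} PBytes for a full cube")
```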
Work Flows: Imaging Processing Model
[Work-flow diagram, flattened below; hardware components: UV data store, UV processors, imaging processors]
• Correlator output passes through RFI excision and phase rotation
• Major cycle:
  – Subtract the current sky model from the visibilities using the current calibration model
  – Grid the UV data, e.g. W-projection
  – Image the gridded data
  – Deconvolve the imaged data (minor cycle)
  – Solve for the telescope and image-plane calibration model
  – Update the current sky model
  – Update the calibration model
• Output: astronomical quality data
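The control flow of that loop, reduced to a structural sketch: the numerical steps below are toy stand-ins, and only the ordering of the major and minor cycles follows the diagram.

```python
# Structural sketch of the major/minor-cycle imaging loop (not SDP code).
import numpy as np

def subtract_sky_model(vis, sky_model, cal):
    """Subtract the current sky model from the visibilities (stand-in)."""
    return vis - cal * sky_model.sum()

def grid_and_image(vis, npix=256):
    """Grid the UV data and FFT to the image plane (toy version)."""
    grid = np.zeros((npix, npix), dtype=complex)
    grid[npix // 2, npix // 2] = vis.mean()          # toy gridding
    return np.abs(np.fft.ifft2(grid))

def deconvolve(dirty, n_minor=10, gain=0.1):
    """Minor cycle: CLEAN-like peak subtraction (toy version)."""
    model = np.zeros_like(dirty)
    residual = dirty.copy()
    for _ in range(n_minor):
        peak = np.unravel_index(np.argmax(residual), residual.shape)
        model[peak] += gain * residual[peak]
        residual[peak] -= gain * residual[peak]
    return model, residual

def solve_calibration(vis, sky_model):
    """Solve for a (scalar) calibration term against the sky model (stand-in)."""
    return 1.0

vis = np.random.default_rng(0).normal(size=1024) + 1.0   # fake UV data
sky_model = np.zeros((256, 256))
cal = 1.0

for major in range(3):                                   # major cycles
    residual_vis = subtract_sky_model(vis, sky_model, cal)
    dirty = grid_and_image(residual_vis)
    component_model, _ = deconvolve(dirty)               # minor cycles
    sky_model += component_model                         # update sky model
    cal = solve_calibration(vis, sky_model)              # update calibration
```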
One SDP, Two Telescopes
Ingest rates:
  SKA1_Low: 500 GB/s
  SKA1_Mid: 1000 GB/s
In total we eventually need to deploy a system that provides close to 0.25 EFlop of processing.
Scope of the SDP
The SDP System
Multiple instances (1…N) of the "process data" component, some real time and some batch.
Illustrative Computing Requirements
• ~250 PetaFLOP system
• ~200 PetaByte/s aggregate BW to fast working memory
• ~80 PetaByte Storage
• ~1 TeraByte/s sustained write to storage
• ~10 TeraByte/s sustained read from storage
• ~ 10000 FLOPS/byte read from storage
• ~2 Bytes/Flop memory bandwidth
SDP High-Level Architecture
What does SDP do?
The SDP is coupled to the rest of the telescope. We try to make the coupling as loose as possible, but there are some time-critical aspects.
For each observation:
• Controlled by a scheduling block
• Run a real-time (RT) process to ingest data and feed back information
• Schedule batch processing for later
• Must manage resources so that the SDP keeps up on a timescale of approximately a week
The work therefore splits into real-time activity and batch activity.
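A toy sketch of that split; the class, the queue and the weekly budget below are assumptions for illustration, not the SDP control design.

```python
# Sketch: a scheduling block drives one real-time ingest process and defers a
# batch job, which is drained later against a ~weekly resource budget.
from dataclasses import dataclass
from collections import deque

@dataclass
class SchedulingBlock:
    obs_id: str
    ingest_rate_gbs: float       # real-time ingest rate, GB/s
    batch_cost_pflop_h: float    # estimated batch processing cost

batch_queue = deque()

def run_realtime_ingest(sb):
    # Real-time: ingest data and feed information back to the telescope.
    print(f"[RT] ingesting {sb.obs_id} at {sb.ingest_rate_gbs} GB/s")
    batch_queue.append(sb)       # schedule batch processing for later

def drain_batch_queue(weekly_budget_pflop_h):
    # Batch: process queued observations within the weekly resource budget.
    used = 0.0
    while batch_queue and used + batch_queue[0].batch_cost_pflop_h <= weekly_budget_pflop_h:
        sb = batch_queue.popleft()
        used += sb.batch_cost_pflop_h
        print(f"[batch] processed {sb.obs_id} ({sb.batch_cost_pflop_h} PFLOP-h)")

run_realtime_ingest(SchedulingBlock("obs-001", 500.0, 1200.0))
run_realtime_ingest(SchedulingBlock("obs-002", 1000.0, 800.0))
drain_batch_queue(weekly_budget_pflop_h=1500.0)
```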
SDP Architecture and Architectural Drivers
• How do we achieve this functionality?
• An architecture that delivers at scale
• Some key considerations:
  – The SDP is required to evolve over time to support the changing work flows needed to deliver the science
  – Different science experiments have different computational costs and resource demands
    • Use this for load balancing
    • Expect the system to evolve towards more demanding experiments during the operational lifetime
Data Driven Architecture
• Smaller FFT size at the cost of data duplication
• Further data parallelism via spatial indexing (UVW space)
• Used to balance memory bandwidth per node
• Some overlap regions on the target grids are needed
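A sketch of what that data-driven partitioning could look like: visibilities are binned per frequency slice and coarse UV cell, and samples near cell edges are duplicated into the overlap regions. The partitioning scheme and all numbers below are illustrative assumptions, not the SDP design.

```python
# Sketch of data-driven partitioning of visibilities (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n_vis = 100_000
u = rng.uniform(-1000.0, 1000.0, n_vis)     # u coordinate (wavelengths)
v = rng.uniform(-1000.0, 1000.0, n_vis)     # v coordinate (wavelengths)
chan = rng.integers(0, 64, n_vis)           # frequency channel index

n_freq_slices = 8                            # data parallelism in frequency
n_uv_cells = 4                               # coarse spatial indexing per axis
overlap = 50.0                               # overlap margin in wavelengths

partitions = {}                              # (freq_slice, iu, iv) -> sample indices
cell = 2000.0 / n_uv_cells
for key_f in range(n_freq_slices):
    in_slice = (chan // (64 // n_freq_slices)) == key_f
    for iu in range(n_uv_cells):
        for iv in range(n_uv_cells):
            lo_u, hi_u = -1000.0 + iu * cell, -1000.0 + (iu + 1) * cell
            lo_v, hi_v = -1000.0 + iv * cell, -1000.0 + (iv + 1) * cell
            # Samples inside the cell plus the overlap margin; edge samples
            # are duplicated into neighbouring partitions.
            sel = (in_slice
                   & (u >= lo_u - overlap) & (u < hi_u + overlap)
                   & (v >= lo_v - overlap) & (v < hi_v + overlap))
            partitions[(key_f, iu, iv)] = np.nonzero(sel)[0]

dup = sum(len(ix) for ix in partitions.values()) / n_vis
print(f"{len(partitions)} partitions, duplication factor ~{dup:.2f}x")
```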
How do we get performance and manage data volume?
Approach: build on Big Data concepts
• A "data driven", graph-based processing approach is receiving a lot of attention
• Inspired by Hadoop, but adapted for our complex data flow
• [Diagram: comparison of the graph-based approach with Hadoop]
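To make "graph-based processing" concrete, here is a minimal sketch of executing an explicit data-flow DAG. The task names are illustrative stand-ins; this is not the SDP execution engine.

```python
# Minimal data-flow DAG execution: each task runs once its inputs exist.
from graphlib import TopologicalSorter

def ingest():        return "raw visibilities"
def calibrate(vis):  return f"calibrated({vis})"
def grid(vis):       return f"gridded({vis})"
def image(grid_):    return f"image({grid_})"

# Each node lists the nodes whose outputs it consumes.
graph = {
    "ingest": set(),
    "calibrate": {"ingest"},
    "grid": {"calibrate"},
    "image": {"grid"},
}
funcs = {"ingest": ingest, "calibrate": calibrate, "grid": grid, "image": image}

results = {}
for node in TopologicalSorter(graph).static_order():
    inputs = [results[dep] for dep in graph[node]]
    results[node] = funcs[node](*inputs)
print(results["image"])
```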
Execution Engine: hierarchical – Processing Level 1
• Cluster and data distribution controller
• Relatively low data rate and cadence of messaging
• Staging: aggregation of data products
• Hands off to Processing Level 2 via static data distribution, exploiting the inherent parallelism
Execution Engine: hierarchical – Processing Level 2
• Data island with a shared file store across the island
• Worker nodes: task-level parallelism to achieve scaling
• Process controller (duplicated for resilience)
• Cluster manager, e.g. Mesos-like
Data flow for an observation
Execution Framework
What do we need?
• Processing Level 1:
  – A custom framework to provide scale-out
• Processing Level 2:
  – Many similarities to Big Data frameworks
  – Need to support multiple execution frameworks:
    • Streaming processing – possibly Apache STORM
    • Work flows with relatively large task size, or where data-driven load balancing is straightforward – possibly development of prototype code
    • "Big Data" work flows needing load balancing and resilience – possibly something like Apache SPARK (see the sketch after this list)
  – The framework manages tasks:
    • Tasks need to be pure functions
    • Most tasks are domain specific, but need supporting infrastructure:
      – Processing libraries
      – Interfaces to the execution framework
      – Performance tuning on target hardware
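A minimal sketch of how Level-2 task parallelism might be expressed on Apache Spark, one of the candidate frameworks above. The gridding task and chunk layout are assumptions for illustration; the point is that each task is a pure function mapped over distributed chunk IDs and reduced into one result (requires a pyspark installation).

```python
# Sketch: pure gridding tasks mapped over visibility chunks on Spark.
import numpy as np
from pyspark.sql import SparkSession

def grid_chunk(chunk_id: int) -> np.ndarray:
    """Pure function: grid one chunk of visibilities onto a small UV grid."""
    rng = np.random.default_rng(chunk_id)      # stand-in for reading the chunk
    grid = np.zeros((64, 64))
    iu = rng.integers(0, 64, 1000)
    iv = rng.integers(0, 64, 1000)
    np.add.at(grid, (iu, iv), 1.0)             # toy gridding of 1000 samples
    return grid

spark = SparkSession.builder.appName("sdp-spark-sketch").getOrCreate()
sc = spark.sparkContext

n_chunks = 128
summed_grid = (
    sc.parallelize(range(n_chunks), numSlices=16)  # distribute chunk IDs
      .map(grid_chunk)                             # run the pure task per chunk
      .reduce(lambda a, b: a + b)                  # combine partial grids
)
print("non-zero grid cells:", int((summed_grid > 0).sum()))
spark.stop()
```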
What do we need?
• Skills and work – e.g. for SPARK:
  – Need to develop for high throughput:
    • A High Throughput data analytics framework (HTDAF)
    • New data models
    • Links to external processes
    • Memory management between the framework and processes
  – Likely to have significant applicability elsewhere
  – The shared file system needs to be very intelligent about data placement / management, or be tightly coupled with the HTDAF
  – Integration of in-memory file system / object store systems with the cluster file system / object store
  – Software development in data analytics, distributed processing systems, JVM, …
Database services
• Tasks running within the processing framework need metadata and must communicate updates / results
  – High-performance distributed database
  – Prototyping using REDIS
  – Needs to interface to the processing requirements
  – Schemas
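A small sketch of the kind of metadata traffic this implies, using REDIS as named on the slide. A running Redis instance is assumed; the key names and fields are illustrative, not an SDP schema.

```python
# Sketch: tasks share calibration metadata through Redis (illustrative keys).
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# One task publishes a calibration-solution update as a hash plus a notification.
r.hset("obs:001:calibration", mapping={
    "solver": "gain-cal",
    "timestamp": "2016-01-01T00:00:00Z",
    "quality": "0.97",
})
r.publish("calibration-updates",
          json.dumps({"obs": "001", "table": "obs:001:calibration"}))

# Another task reads the current solution metadata.
print(r.hgetall("obs:001:calibration"))
```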
File / object store
• Multiple file / object stores, one per "data island"
  – Integration into the execution framework
  – Resource allocation integrated with the cluster management system
  – Complex resource management
  – Migration to / interface with the long-term persistent store (Hierarchical Storage Manager) is likely to be separate
Data lifecycle management
Manage the data lifecycle across the cluster and through time:
• Strong link to the overall processing and observing schedule
• Performance is key
• Determine initial data placements
• Manage data movement and migration
• Local high performance is obtained by a lightweight layer (e.g. policies) on top of a file system
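A sketch of such a lightweight, policy-driven placement layer; the tier names, sizes and thresholds below are illustrative assumptions.

```python
# Sketch: placement decisions driven by the processing schedule, then handed
# to the underlying file system / storage tiers (all names illustrative).
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str
    size_tb: float
    needed_within_days: float    # from the processing/observing schedule

def placement_policy(p: DataProduct) -> str:
    """Map urgency to a storage tier."""
    if p.needed_within_days <= 1:
        return "fast-tier"          # in-memory / NVMe close to the workers
    if p.needed_within_days <= 7:
        return "bulk-tier"          # cluster file system
    return "persistent-tier"        # hierarchical storage manager / archive

for prod in [
    DataProduct("obs-001-uvdata", 400.0, 0.5),
    DataProduct("obs-001-images", 30.0, 5.0),
    DataProduct("obs-000-archive", 120.0, 90.0),
]:
    print(f"{prod.name}: place on {placement_policy(prod)}")
```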
Hardware Platform
Complex Network
Performance Prototype Platform
Example tests:
• automated deployment from containers (github)
• infrastructure management and orchestration
• simulate streamed visibility data from the correlator to N ingest nodes
• basic calibration on ingest
• calibration and imaging with suitable modelling across 2 compute islands
• data products written to preservation (staging)
• basic quality assurance metrics visible
• distributed database performance
• some automated built-in testing at various levels
• LMC interface prototyping (platform management and telemetry)
• Execution Framework – Big Data prototyping (Apache Spark, etc.)
• scheduling

• Significant £400k STFC/UK investment in the hardware prototype
• Currently being installed in Cambridge
Control layer
Complex functionality
– Resilience is required here
– Interface to the cluster manager for resource management
– Prototyping around adding functionality to OpenStack
– Built around various open-source products integrated into a broader framework
Control and system components
Delivering Data
Delivery System
• User Interfaces
• Delivery System Service Manager
• Delivery System Interfaces Manager
• Tiered Data Delivery Manager
Science data catalogue and Prepare Science Product
– This component manages the catalogue of data products and how they are assembled for distribution, including data model conversion if needed
Domain specific components
Not discussed in detail here:
• Some performance tuning
• Interfacing
• I/O system?
Other system Software
Compute System Software
• Compute Operating System software
  – The O/S will be based on a widely-adopted community version of Linux such as Scientific Linux (Fermi Lab, et al.) or CentOS (CERN, et al.)
• Middleware
  – A messaging layer based on TANGO's messaging layer and/or open-source messaging layers (like ZeroMQ) is possible with effort
• Interface to TANGO
  – TANGO is the selected product for overall monitor and control; interfacing of other components to TANGO will be required, e.g. for monitor information from the cluster manager
• Middleware stack
  – Based on widely adopted versions of community-supported software that will mature up to deployment, like OpenStack. Effort will be required during the construction period to develop enhancements and SDP-specific customisation, and to maintain those enhancements as the hardware and software requirements change. Close collaboration and convergence with large contributors (like CERN) in the open-source community could lead to reduced long-term maintenance effort.
Other activities
• Expect the SDP to be delivered by a consortium with strong links to, and involvement from, SKAO and academic partners
– Consortium building and leadership
– Management
– Software Engineering and System Engineering
– Integration and acceptance
– Some on-going support activities after construction
Archive Estimates SKA1
Archive growth rate: ~45 – 120 PBytes/yr
Larger including discovery space: ~300 PBytes/yr
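Compounded over a first decade of operations, and assuming these rates simply hold constant (a rough assumption for illustration only):

```python
# Back-of-envelope cumulative archive size from the quoted growth rates.
years = 10
low, high, with_discovery = 45, 120, 300   # PBytes per year, from the slide
print(f"After {years} yr: {low*years}-{high*years} PBytes "
      f"(~{with_discovery*years/1000:.0f} EBytes including discovery space)")
```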
Data Products
• Standard products:
  • calibrated multi-dimensional sky images
  • time-series data
  • catalogued data for discrete sources (global sky model) and pulsar candidates
• No derived products from science teams
Further Processing and Science Extraction at Regional Centres
Regional Centres
• Data distribution and access for astronomers
• Provide regional data centres and High Performance Computing services
• Software development
• Support for key science teams
• Technology development and engineering support
• Significant compute in their own right
• In the UK, likely to be supported by layering on top of a coordinated compute infrastructure – working title UK-T0
Global Headquarters
South African Site
Australian Site
Regional Centre 1
Regional Centre 2
Regional Centre 3
• Data Analytics support
• Science analysis
• Source extraction
• Bayesian analysis
• Joining of catalogues (see the sketch after this list)
• Visualisation
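As one concrete example of the analysis listed above, joining of catalogues can be done by positional cross-matching; a minimal sketch using astropy (an assumed tool choice, with random, illustrative source positions):

```python
# Sketch: cross-match two source catalogues by sky position with astropy.
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

rng = np.random.default_rng(2)
cat_a = SkyCoord(ra=rng.uniform(0, 10, 500) * u.deg,
                 dec=rng.uniform(-5, 5, 500) * u.deg)
cat_b = SkyCoord(ra=rng.uniform(0, 10, 800) * u.deg,
                 dec=rng.uniform(-5, 5, 800) * u.deg)

# For every source in catalogue A, find its nearest neighbour in catalogue B.
idx, sep2d, _ = cat_a.match_to_catalog_sky(cat_b)
matched = sep2d < 10 * u.arcsec          # accept matches closer than 10 arcsec
print(f"{matched.sum()} of {len(cat_a)} sources matched within 10 arcsec")
```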
END