Accelerated Acceleration Searching · 2015-09-18
Ewan Barr, Swinburne University of Technology
Pulsar hunting
Search in five parameters:
Position
Dispersion measure
Period
Width
Acceleration
Tools of the trade
SIGPROC:
Lorimer et al.
C / FORTRAN
Procedural paradigm.
Easily extended.
Suffers from too many cooks...
Implements “time domain” acceleration searching.
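A minimal sketch of what “time domain” acceleration searching involves (an illustration under simple assumptions, not SIGPROC's actual implementation): each sample is shifted by the first-order Doppler delay a·t²/(2c) for a trial acceleration, so that a pulsar with that acceleration becomes strictly periodic again and an ordinary periodicity search can follow.

```cpp
#include <cstddef>
#include <vector>

// Resample a time series for one trial acceleration (nearest-bin version).
// A signal accelerating at `accel` m/s/s in the input becomes periodic in
// the output, at the cost of one full resampling pass per trial.
std::vector<float> resample(const std::vector<float>& in,
                            double tsamp,   // sampling interval (s)
                            double accel)   // trial acceleration (m/s/s)
{
    const double c = 299792458.0;           // speed of light (m/s)
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i) {
        double t = i * tsamp;
        double shifted = t + accel * t * t / (2.0 * c);  // corrected time
        auto j = static_cast<std::size_t>(shifted / tsamp + 0.5);
        if (j < out.size()) out[j] = in[i]; // nearest-bin assignment
    }
    return out;
}
```

With `accel = 0` the series is returned unchanged; the cost of repeating this (plus an FFT) for every acceleration trial is what makes the time-domain approach expensive.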
PRESTO:
Ransom et al.
ANSI C / PYTHON
Procedural paradigm.
Easily extended.
Well maintained.
Implements “Fourier domain” acceleration searching.
...CUDA support added!
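The motivation for the “Fourier domain” approach: over an observation of length T, a constant line-of-sight acceleration a drifts a signal of spin frequency f through roughly z = a·f·T²/c Fourier bins, and the search correlates the spectrum with templates for each trial z rather than resampling the whole time series. A small helper for that arithmetic (the 400 Hz figure below is an illustrative MSP-like value, not from the talk):

```cpp
// Number of Fourier bins a signal of frequency `freq` (Hz) drifts through
// under constant acceleration `accel` (m/s/s) over `tobs` seconds:
// z = a * f * T^2 / c.
double fourier_drift_bins(double accel, double freq, double tobs)
{
    const double c = 299792458.0;   // speed of light (m/s)
    return accel * freq * tobs * tobs / c;
}
```

For a 72-minute observation, a 400 Hz signal at 5 m/s/s drifts through ~125 bins, which is why an unaccelerated FFT search smears such a signal away.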
Problems
Previous codes were written for the machines of their time: less RAM, smaller cache sizes and no accelerator support.
Often require heavy disk I/O.
FFTs are “slow” to execute on the CPU.
Cannot keep up with data rates without large computing clusters.
Large lags between observation and discovery.
Execution times force smaller parameter space searches.
Case study
Take a 72-minute HTRU observation (2^26 samples, 1.2 hours).
For one DM trial (single-threaded on an Intel Xeon 2.67 GHz):

Accelerations            0 m/s/s      ±5 m/s/s
SIGPROC execution time   13 seconds   1.5 hours
PRESTO execution time    55 seconds   0.6 hours

Scaling up to a 2000-DM-trial search:

Accelerations            0 m/s/s      ±5 m/s/s
SIGPROC execution time   7.2 hours    125 days
PRESTO execution time    30.5 hours   50 days
GPUs vs CPUs
CUDA
Compute Unified Device Architecture
Enables us to quickly write code which can use a large fraction of a GPU's potential.
Problem domain must be decomposed to a high level of granularity.
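An illustration of what “high granularity” means in practice, using harmonic summing (a standard pulsar-search step; this two-harmonic fold is a simplification, not Peasoup's actual kernel). Every output bin is independent, so on a GPU the loop body becomes one lightweight thread with i = blockIdx.x * blockDim.x + threadIdx.x; plain C++ stands in for the kernel here:

```cpp
#include <cstddef>
#include <vector>

// Sum each spectral bin with its second harmonic: out[i] = p[i] + p[2i].
// Each iteration touches only bin i, so the decomposition is one thread
// per output bin, millions of threads for a 2^26-point spectrum.
std::vector<float> harmonic_sum2(const std::vector<float>& powers)
{
    std::vector<float> out(powers.size(), 0.0f);
    for (std::size_t i = 0; i < powers.size(); ++i) {  // "one thread per bin"
        out[i] = powers[i];
        if (2 * i < powers.size()) out[i] += powers[2 * i];
    }
    return out;
}
```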
Peasoup
CUDA / C++ pulsar searching pipeline/toolbox
Based on SIGPROC plus code by Ben Barsdell, Paul Coster and Matthew Bailes.
Currently the only “end-to-end” acceleration searching pipeline in the world.
Very, very fast...
Peasoup pipeline
A thread is spawned for each “worker” GPU.
A master “foreman” thread controls allocation of dispersion trials to GPUs.
Each GPU executes all accelerations for its current DM trial.
Candidates undergo a hierarchical sifting system and are finally folded and “scored”.
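The foreman/worker scheduling described above can be sketched as follows (an illustration of the pattern, not Peasoup's actual code): a shared list of DM trials, a mutex standing in for the foreman, and one host thread per “GPU”, each pulling whole DM trials and running every acceleration for that trial.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Dispatch n_trials DM trials across n_gpus worker threads and return how
// many trials were processed in total.
int run_pipeline(int n_gpus, int n_trials)
{
    std::mutex foreman;                 // serialises trial hand-out
    int next = 0;                       // index of next unclaimed DM trial
    int processed = 0;

    auto worker = [&]() {
        for (;;) {
            int trial;
            {
                std::lock_guard<std::mutex> lock(foreman);
                if (next == n_trials) return;   // no work left
                trial = next++;
            }
            (void)trial;                // here: all accelerations for this
                                        // DM trial run on this worker's GPU
            std::lock_guard<std::mutex> lock(foreman);
            ++processed;
        }
    };

    std::vector<std::thread> gpus;
    for (int g = 0; g < n_gpus; ++g) gpus.emplace_back(worker);
    for (auto& t : gpus) t.join();
    return processed;                   // each trial handled exactly once
}
```

Handing out whole DM trials keeps the granularity coarse on the host side while leaving the fine-grained parallelism (all accelerations for one trial) to each GPU.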
Benchmarks
Same HTRU “lowlat” file as before (2000 DM trials):

Accelerations            0 m/s/s      ±5 m/s/s
SIGPROC execution time   7.2 hours    125 days
PRESTO execution time    30.5 hours   50 days
PEASOUP execution time   5 minutes    30 hours

Rates of ~36 trials per second for 2^23-length time series.
Rates of ~7 trials per second for 2^26-length time series.
Results
Reprocessed all of HTRU “medlat” in under 30 days.
Produced >50 million candidates...
To analyze and select candidates we must employ neural nets + ranking algorithms.
This side still needs work, but it is a very active field.
J1705-1908
[figures: HTRU detection; Jacoby & Edwards survey]
PSRJ    J1705-1908
RAJ     17:05:42.6108168         1  0.27649538519546339521
DECJ    -19:08:02.5899999999903
F0      403.17844714540158663    1  0.00000048482103529509
F1      0
PEPOCH  56582.211983880566549
DM      57.490000000000002
BINARY  ELL1
PB      0.18395397110741149573   1  0.00000004178723294063
A1      0.10435891291481527563   1  0.00000040610657989516
TASC    56582.212268774442624    1  0.00000060684296922283
EPS1    0
EPS2    0
START   56583.219462382228812
FINISH  56585.350115984103468
GL      3.18
GB      12.99
Binary MSP
~4 hour orbit.
Low eccentricity.
Low mass companion (0.04 Msol).
Probably a “black widow”?
Companion Roche lobe is roughly twice A1.
Shows eclipses over ~5% of the orbit.
Expect r.m.s. residual of ~3-6 us.
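The ~0.04 Msol companion mass follows from the binary mass function using PB and A1 from the ephemeris above, assuming a 1.4 Msol pulsar and an edge-on orbit (i = 90°): solve m2³ = f·(m1 + m2)² for the minimum companion mass. A sketch of that calculation:

```cpp
#include <cmath>

// Minimum companion mass (solar masses) from the mass function
// f = 4*pi^2 * (a1*sin i)^3 / (G * Pb^2), assuming sin i = 1, solved by
// fixed-point iteration of m2 = cbrt(f * (m1 + m2)^2).
double min_companion_mass(double pb_days, double a1_ltsec, double m1 = 1.4)
{
    const double c = 299792458.0, G = 6.674e-11, Msun = 1.989e30;
    double a1 = a1_ltsec * c;                       // projected semi-major axis (m)
    double pb = pb_days * 86400.0;                  // orbital period (s)
    double f  = 4 * M_PI * M_PI * a1 * a1 * a1 / (G * pb * pb) / Msun;
    double m2 = 0.1;                                // initial guess (Msun)
    for (int i = 0; i < 50; ++i)
        m2 = std::cbrt(f * (m1 + m2) * (m1 + m2));
    return m2;
}
```

With PB = 0.18395397 days and A1 = 0.10435891 lt-s this gives ~0.042 Msol, consistent with the black-widow interpretation.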
J1705-1908
Low ecliptic latitude (~3.7 degrees) presents a minor positional problem.
No nearby 2FGL sources.
Several X-ray sources within 2 arcminutes.
Several bright stars within 2 arcminutes.
If we wish to do optical follow-up (possibly with GEMINI), then we might require an interferometric position (ATCA or GMRT).
May be of interest to PTAs despite its “black widowiness”, as the profile is very narrow and it can be observed by most 100-m-class radio telescopes.
Source is moderately bright (~S/N 20 in 8 mins with Parkes) and so may be interesting for RM/DM measurements at eclipse ingress/egress.
Summary
GPUs likely represent the future of astronomical data analysis hardware.
The fine-grained parallelism of many DSP operations makes GPUs an ideal choice for pulsar searching.
The speed offered by systems such as Peasoup will be vital in dealing with the SKA’s “Big Data”.
Peasoup has already passed the most important test of pulsar finding software by finding a pulsar.
Peasoup is currently being used in Manchester (for simulated SKA data), Germany (for processing the HTRU-North pulsar survey) and at Swinburne (for processing archival data sets).