introduction to sky survey problems
Post on 14-Jan-2016
Introduction to Sky Survey Problems
Bob Mann
Introduction to sky survey database problems
Astronomical data
Astronomical databases
– The Virtual Observatory – concept & status
– Large sky survey databases
Spatial indexing in astronomical databases
Case Study: SDSS & SkyServer
Observational Astronomy
Electromagnetic spectrum
[Figure panels: IRAS 25 µm, 2MASS 2 µm, DSS Optical, IRAS 100 µm, NVSS 20 cm, GB 6 cm, ROSAT ~keV, WENSS 92 cm]
Astronomical data – in original form
Optical
– Image: array of pixel values
X-ray
– Event list: positions, arrival times, energies of all detected photons
Radio
– Interferometric visibilities: sparse Fourier transform of a region of the sky
Very different types of data
Astronomical data – in final form
Most research done using catalogue data
– i.e. tables of attributes of detected sources – mainly discrete sources (stars, galaxies, etc.)
– Data compression
  Catalogue - a few % of image data volume
– Amenable to representation in relational DB
  Natural indexing by location in sky
Astronomical Databases
Sky survey archives
– Homogeneous data, standard reduction pipeline
– “Science Archive” – do science on DB
Telescope archives
– Semi-indexed collections of raw data files from all observations taken – heterogeneous
– Download data for reduction and analysis
Specialist data centres – collections of catalogues
Bibliographic databases – scans of major journals
The Virtual Observatory
Concept:
– Interoperable federation of all the world’s significant astronomical databases
– Facilitate multi-wavelength astronomy
Status:
– Several projects underway – AstroGrid in UK
– 5+ years’ work to create a fully working VO
The VO sets the context for the design of new sky survey databases
AstroGrid: www.astrogrid.org
Consortium:
– Edinburgh, Leicester, Cambridge, RAL, MSSL, Jodrell Bank, Queens Belfast
3 year (~£4M) project:
– 1 yr Phase A Study – finished end of 2002
– 2 yr Phase B Implementation – to end 2004
Web (later Grid) service framework; in Java
Currently building web services, portals, etc. - researching OGSA and OGSA-DAI
Large sky survey databases
Major science driver for AstroGrid – and VO
– New science – mining multi-wavelength data
Largest are optical/near-infrared sky surveys
Largest of these hosted in Edinburgh:
– current - SuperCOSMOS, SDSS (mirror)
– future - WFCAM, VISTA
– Each yields 1-10TB of catalogue data in RDBMS
Spatial queries in astronomy
Two important types:
– Select entries (with predicate) in area of sky
– Match entries (esp. between two tables)
Second is special case of first
– i.e. both boil down to “point-within-distance-of-point”
– but distances in two cases can be very different
Advantage in using a hierarchical spatial indexing scheme
– Perform spatial query at appropriate granularity
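
The two query types above can be sketched in Python to show that both rest on the same point-within-distance-of-point test. This is a minimal brute-force illustration with no spatial index; the function names, the dictionary-based tables and the sample coordinates are all made up for the example.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a unit vector on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def within(p, q, radius_deg):
    """True if unit vectors p and q are separated by at most radius_deg."""
    cos_sep = sum(a * b for a, b in zip(p, q))
    return cos_sep >= math.cos(math.radians(radius_deg))

def cone_search(table, ra_deg, dec_deg, radius_deg, predicate=lambda row: True):
    """Query type 1: rows satisfying a predicate inside a circular area of sky."""
    centre = unit_vector(ra_deg, dec_deg)
    return [row for row in table
            if predicate(row)
            and within(unit_vector(row["ra"], row["dec"]), centre, radius_deg)]

def cross_match(table_a, table_b, radius_deg):
    """Query type 2: a cone search in table_b around every row of table_a."""
    return [(a["id"], b["id"])
            for a in table_a
            for b in cone_search(table_b, a["ra"], a["dec"], radius_deg)]

# Hypothetical sample data: two tiny catalogues (positions in degrees).
optical = [{"id": 1, "ra": 150.000, "dec": 2.000, "mag": 18.2},
           {"id": 2, "ra": 150.500, "dec": 2.300, "mag": 21.7}]
radio = [{"id": "R1", "ra": 150.0002, "dec": 2.0001},
         {"id": "R2", "ra": 200.0, "dec": -30.0}]

# Area query: 1 degree radius, with a predicate on magnitude.
bright = cone_search(optical, 150.0, 2.0, 1.0, lambda r: r["mag"] < 20)
# Match query: the same test, but with a 2 arcsecond radius.
matches = cross_match(optical, radio, 2.0 / 3600.0)
```

The two calls differ only in radius (a degree versus arcseconds), which is why an index that can work at different granularities is attractive.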
Spatial Indexing in Astronomy
The Celestial Sphere
Many coordinate systems
Most common is the equatorial system, with Right Ascension and Declination as analogues of Longitude & Latitude
Spatial indexing in astronomical databases
Basic DBMS indexes are 1-D – e.g. B-trees
Some DBMSs support general 2-D indexing
– Usually using R-trees (or variants) – rectangles: astronomical experiments not too successful: [Clive]
Some DBMSs have native spatial indexing
– Little knowledge of this in astronomy - want to know more
But the Celestial Sphere is a sphere(!)
– Many geographical spatial DBs use planar projections
So, astronomers have felt the need to develop spatial indexing prescriptions of their own
Hierarchical Triangular Mesh - HTM
Developed by Sloan survey archive team at JHU
Start with projection of octahedron on sphere and subdivide triangles at their midpoints
Generate unique pixel ID code based on position in the sky and level in hierarchy – can index that with B-tree
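
A simplified Python sketch of this kind of scheme may help: it starts from the eight spherical triangles of an octahedron, splits each triangle at its edge midpoints, and appends two bits to the ID per level. The root numbering (8 to 15) and child ordering used here are assumptions chosen for illustration and are not guaranteed to match the JHU HTM package.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def midpoint(a, b):
    """Midpoint of the great-circle arc from a to b, pushed back onto the sphere."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(dot(s, s))
    return (s[0] / n, s[1] / n, s[2] / n)

def contains(tri, p, eps=1e-12):
    """True if p lies inside the spherical triangle (vertices counterclockwise)."""
    a, b, c = tri
    return (dot(cross(a, b), p) >= -eps and
            dot(cross(b, c), p) >= -eps and
            dot(cross(c, a), p) >= -eps)

# Octahedron vertices and the eight root triangles (assumed numbering 8..15).
V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
ROOTS = [(8,  (V[1], V[5], V[2])), (9,  (V[2], V[5], V[3])),
         (10, (V[3], V[5], V[4])), (11, (V[4], V[5], V[1])),
         (12, (V[1], V[0], V[4])), (13, (V[4], V[0], V[3])),
         (14, (V[3], V[0], V[2])), (15, (V[2], V[0], V[1]))]

def htm_id(ra_deg, dec_deg, depth):
    """Integer ID of the triangle containing the point after `depth` subdivisions."""
    p = unit_vector(ra_deg, dec_deg)
    for ident, tri in ROOTS:
        if contains(tri, p):
            break
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k, child in enumerate(children[:3]):
            if contains(child, p):
                break
        else:
            # Not in a corner triangle, so it must be in the central one.
            k, child = 3, children[3]
        ident = ident * 4 + k   # append two bits for this level
        tri = child
    return ident
```

Because each level only appends two bits, the ID at a coarse level is a prefix of the ID at any finer level, so all objects inside a coarse triangle fall in one contiguous integer range that an ordinary B-tree can scan.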
Hierarchical Equal Area Iso-Latitude Pixelisation (HEALPix)
Developed by Kris Gorski (now JPL/Caltech)
Start with division of sphere into twelve equal area curvilinear quadrilaterals, then divide each into four
Like HTM, produces a pixel code on which a B-tree index can be made
(Ian – HEALPix in Oracle?)
Sky survey DB case study: SkyServer for SDSS
Sloan Digital Sky Survey (SDSS):
– first of new generation of sky surveys
US-led team, dedicated telescope & camera
Image half of northern sky in 5 optical bands
Then obtain optical spectra for 1,000,000 galaxies
Estimated ~1TB of catalogue data
SDSS Archive
First of new generation of sky survey archives
– Represents the state-of-the-art in sky survey databases
Developed by Alex Szalay’s team at Johns Hopkins
Project started in earnest in about 1996
– OODBMSs seen as the coming thing
– SDSS chose Objectivity/DB for their archive:
~15 staff-years of effort later, they’d rewritten much of the DBMS themselves…and then jumped ship and started using MS SQL Server! - SkyServer (in collaboration with Jim Gray, MS Research)
SkyServer design considerations
Power & flexibility to pose arbitrary queries
Simple – astronomers ignorant of SQL!
Hide messy spherical trigonometry
– Distance on sphere between (a1,d1) and (a2,d2) is given in SQL by
  2.0*asin(sqrt(square(sin(0.5*(radians(d1-d2)))) + cos(radians(d1))*cos(radians(d2))*square(sin(0.5*(radians(a1-a2))))))
– Don’t want users typing this
– Don’t really want DBMS to evaluate expressions like this often
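
For readers who want to check the expression, here is the same haversine formula as a small Python function. It is a direct transcription for illustration only; the function name is made up, inputs are assumed to be in degrees, and the result is converted to degrees (the SQL expression itself returns radians).

```python
import math

def angular_separation_deg(a1, d1, a2, d2):
    """Great-circle separation in degrees between (a1, d1) and (a2, d2).

    a = Right Ascension, d = Declination, all in degrees.
    """
    a1, d1, a2, d2 = map(math.radians, (a1, d1, a2, d2))
    h = (math.sin(0.5 * (d1 - d2)) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin(0.5 * (a1 - a2)) ** 2)
    # Clamp guards against rounding pushing h slightly above 1.
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h))))
```

Each call costs several trigonometric evaluations and a square root, which is why evaluating it for every candidate pair in a large table is unattractive.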
SkyServer spatial queries
Simple table-valued functions exposed to user:
– E.g. select count(*) from fGetNearbyObjEq(a,d,radius)
  (a,d)=(Right Ascension, Declination)
Functions call SQL Server Extended Stored Procedure
– HTM index manipulation routines, implemented in a Dynamically Linked Library (DLL)
– DLL generated from HTM package in C++
Lessons from HTM implementation in SkyServer
SQL is not great for spherical trigonometry
– Messy to write, slow to compute
Have to define stored procedures/functions
– Expose a clean interface to users
– Let them pose queries the way they want to
Replace trig operations by integer arithmetic
– Library of HTM index operations underneath
Precompute tables of neighbouring objects
– Far fewer spatial match operations at query time
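
Two of these lessons (hide the trigonometry behind a function, and precompute neighbours) can be sketched with SQLite through Python's sqlite3 module, used here purely as a stand-in for SQL Server. The table names `obj` and `neighbours`, the function `ang_sep_deg`, the sample rows and the 0.01 degree radius are all hypothetical, and the pair-wise join is brute force rather than HTM-driven.

```python
import math
import sqlite3

def ang_sep_deg(a1, d1, a2, d2):
    """Haversine separation in degrees; inputs in degrees."""
    a1, d1, a2, d2 = map(math.radians, (a1, d1, a2, d2))
    h = (math.sin(0.5 * (d1 - d2)) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin(0.5 * (a1 - a2)) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h))))

conn = sqlite3.connect(":memory:")
# Expose the trigonometry to SQL under a simple name.
conn.create_function("ang_sep_deg", 4, ang_sep_deg)

conn.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL)")
conn.executemany("INSERT INTO obj VALUES (?, ?, ?)",
                 [(1, 10.0, 20.0), (2, 10.001, 20.0), (3, 50.0, -5.0)])

# Pay for the spatial match once, at load time, and store the result.
conn.execute("""
    CREATE TABLE neighbours AS
    SELECT a.id AS id, b.id AS neighbour_id,
           ang_sep_deg(a.ra_deg, a.dec_deg, b.ra_deg, b.dec_deg) AS sep_deg
    FROM obj AS a JOIN obj AS b ON a.id <> b.id
    WHERE ang_sep_deg(a.ra_deg, a.dec_deg, b.ra_deg, b.dec_deg) < 0.01
""")

# At query time, finding close pairs is a plain table lookup with no trig.
pairs = conn.execute(
    "SELECT id, neighbour_id FROM neighbours ORDER BY id").fetchall()
```

Users then write queries against `neighbours` or call the named function, and never see the spherical trigonometry.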
Problems with this approach
How easy to develop stored procedures, etc?
– Needs detailed knowledge of DBMS
– Extended Stored Procedure calls slow
How well will query optimiser use HTM?
– …less well than built-in spatial index?…
– …but that might be poorly suited to astronomical applications…
How easy to implement all this in DBMSs other than SQL Server?
But this works reasonably well in practice!