Post on 26-Dec-2015
From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories
R. Chris Smith
AURA/NOAO/CTIO/LSST
“Classical” Optical Astronomy
• 1-4 investigators propose for telescope time
• Obtain 1 to 5 nights, or 1 to 5 hours!
  Oversubscription on the largest telescopes (e.g., Gemini) severely limits time per investigator
• Travel to a distant telescope site… or not: Remote Observing, Service Observing, Queue Observing
• Observe… or not: clouds (OUCH!)
• Take 0.5 to 50 GB of data home (on tapes)
• Reduce & analyze “by hand”
  Extract every detail from those bits
  Often takes months per night of data
Optical Windows
Can “classical” techniques answer the BIG questions?
• Where do we come from? Star Formation, Nucleosynthesis
• Are we alone? Proto-planetary disks, search for planets
• Where are we going? Big Bang & the Expansion of the Universe
• What is the Universe made of? What types of matter? What types of energy?
Where are we going? A “Repulsive” Result
• In the 1990s, astronomers began looking for deceleration
• Found that the expansion of the Universe is accelerating!!!
• Implies something NEW!
• Regions of empty space REPEL each other?
  “Cosmological constant”?
• Einstein’s greatest blunder… OR NOT?!!
  Something going on in the vacuum?
• NEW FUNDAMENTAL PHYSICS!
Today’s BIG Questions: Dark Energy & Dark Matter
Dark Energy is the dominant constituent of the Universe. Dark Matter is next.
95% of the Universe is in Dark Energy and Dark Matter, for which we have little or no detailed understanding.
Science magazine’s “Breakthrough of the Year” in 1998 and 2003
Attacking the Question of Dark Energy
• “Classical” approach won’t work
  Not enough telescope time
  Difficult to control calibrations & systematics
• LARGE SURVEYS
  Goal: Provide large, uniform, well calibrated, controlled, and documented datasets to allow for advanced statistical analyses
  Control calibrations & systematics to <1%
  Larger collaborations provide both manpower and diverse expertise
• Including traditional astronomers, high-energy physicists, mathematicians, and computer scientists
Sociology of Dark Energy
• Dark Energy may be pushing the universe APART
• But it is pulling the Astronomy, High Energy Physics (HEP), and Computer Science (CS) communities TOGETHER
  HEP interest in fundamental physics
  HEP experience in large datasets
  CS interest in CPUs, storage, networks, and (of course) algorithms & optimization!
Dark Energy ROADMAP to Understanding
• Today
  ESSENCE, a large international group of astronomers
• Coming soon to a telescope nearby: Dark Energy Survey
  • Camera built by Fermilab, majority DOE funding
  • Data Management System led by NCSA
  • Groups from Spain, United Kingdom, and Brazil recently joined
• The next BIG step: LSST
  • Camera built by SLAC, Data Mgmt with NCSA
  • NSF + DOE funding, also inc. LLNL, Brookhaven, others
• Stepping UP to space-based work: JDEM (SNAP and/or others)
  • NASA + DOE funding
Today: ESSENCE (+SuperMACHO)
• Use a LARGE (~200 SNe), UNIFORM set of supernova light curves to allow us to study the evolution of the expansion of the universe
  Constrain “w”, the equation-of-state parameter of Dark Energy, to ~10%
• 30 half-nights per year for 5 years (2002-2006)
• Use the other half of the nights to constrain possible DARK MATTER candidates
  The ‘SuperMACHO’ project
  Search the Large Magellanic Cloud for microlensing
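The parameter “w” is conventionally defined as the ratio of Dark Energy’s pressure to its energy density. The standard relations below (textbook cosmology, not taken from the slides) show why w = -1 corresponds to a cosmological constant and why an accelerating expansion points to a strongly negative w:

```latex
w \equiv \frac{p}{\rho c^{2}}, \qquad
\rho_{\mathrm{DE}}(a) \propto a^{-3(1+w)} \quad (\text{constant } w), \qquad
\frac{\ddot{a}}{a} = -\frac{4\pi G}{3}\left(\rho + \frac{3p}{c^{2}}\right)
 = -\frac{4\pi G}{3}\,\rho\,(1+3w) \quad (\text{single component})
```

For w = -1 the density stays constant as the scale factor a grows, which is the behavior of a cosmological constant; if one component dominates, the expansion accelerates (positive second derivative of a) only when w < -1/3.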
Searching for Supernovae (and other transients)
High-z SN Team
ESSENCE+SuperMACHO: The data flows…
• The telescope
  CTIO’s Blanco 4m
• The camera
  MOSAIC 8Kx8K imager (67 megapixels)
• Exposures of 60s to 400s
• Collect 20GB of RAW data per night
• Data must be reduced and analyzed in near REAL TIME (within ~10min)
• Data ‘Reduction’ = >5x EXPANSION!
  Roughly 3TB per year
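The numbers above hang together arithmetically. The sketch below checks them under stated assumptions that are not on the slide: “8K” means 8192 pixels, units are decimal (1 TB = 1000 GB), and the 30 half-nights shared by ESSENCE and SuperMACHO amount to about 30 full observing nights per year.

```python
# Back-of-envelope check of the ESSENCE+SuperMACHO data volumes.
# Assumptions (not stated on the slides): "8K" = 8192 pixels, decimal units
# (1 TB = 1000 GB), ~30 full observing nights per year shared by both projects.

MOSAIC_SIDE_PX = 8192        # 8K x 8K imager
RAW_GB_PER_NIGHT = 20        # raw data collected per night
EXPANSION_FACTOR = 5         # data "reduction" expands the data by >5x
NIGHTS_PER_YEAR = 30         # assumption, see above

pixels = MOSAIC_SIDE_PX ** 2
raw_gb_per_year = RAW_GB_PER_NIGHT * NIGHTS_PER_YEAR
reduced_tb_per_year = raw_gb_per_year * EXPANSION_FACTOR / 1000

print(f"MOSAIC: {pixels / 1e6:.1f} megapixels")                # 67.1
print(f"Raw data per year: {raw_gb_per_year} GB")              # 600
print(f"Reduced data per year: ~{reduced_tb_per_year:.0f} TB")  # 3
```

Under these assumptions, 600 GB of raw data per year expands to about 3 TB, matching the slide.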
… and flows… and flows
• MUCH larger data flow than most other astronomical projects
• With the ADDITIONAL complication of a real-time reduction & alert requirement
  Must plan spectroscopic follow-up on the LARGEST telescopes (Gemini, Keck, VLT, Magellan, …)
• We THOUGHT we were ready
  A few CPUs (cluster of 20 x 1GHz)
  A few disks (4 x 4TB “data bricks”)
• But…
Challenges
• Moving the data
  From Chile to the U.S.
• Storing the data
  Filling up racks with “data bricks”
  Keeping track of the data: the initial database didn’t cut it
• Reprocessing the data
  Pipeline can keep up with the real-time flow
  But need to reprocess past years of data when improvements are made to software
Coming Soon (2009?): Dark Energy Survey
• Investigate Dark Energy using 4 complementary and independent methods
  Various types of distance measurements, based on standard luminosities, standard yardsticks, and standard volumes
• Combine the results to provide the best (to date) constraints on the equation of state of Dark Energy
The Instrument: Dark Energy Camera
Focal Plane:
• 64 2k x 4k CCDs
• Plus guiding and wavefront-sensing (WFS) CCDs
• 0.5 GIGApixel camera
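The “0.5 GIGApixel” figure and the 1GB-per-image figure on the next slide follow from the CCD layout. The sketch assumes that “2k x 4k” means 2048 x 4096 pixels, that raw pixels are stored as 16-bit (2-byte) values, and that only the 64 science CCDs count; the byte depth is an assumption, not something the slides state.

```python
# Pixel count and raw image size for the DECam science focal plane.
# Assumptions (not from the slides): 2k x 4k = 2048 x 4096 pixels,
# 16-bit raw pixels, guider/WFS CCDs excluded.
N_SCIENCE_CCDS = 64
CCD_WIDTH_PX, CCD_HEIGHT_PX = 2048, 4096
BYTES_PER_PIXEL = 2

science_pixels = N_SCIENCE_CCDS * CCD_WIDTH_PX * CCD_HEIGHT_PX
image_bytes = science_pixels * BYTES_PER_PIXEL

print(f"{science_pixels / 1e9:.2f} gigapixels")     # 0.54
print(f"{image_bytes / 1e9:.2f} GB per raw image")  # 1.07
```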
The Data: Dark Energy Survey
• Each image = 1GB
• 350 GB of raw data / night
• Data must be moved to a supercomputer center (NCSA) before the next night begins (<24 hours)
  Need >36Mbps internationally
• Data must be processed within ~24 hours
  Need to inform next night’s observing
• Total raw data after 5 yrs ~0.2 PB
• TOTAL Dataset 1 to 5 PB
  Reprocessing planned using TeraGrid resources
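The >36Mbps requirement comes from moving one night of raw data within the 24-hour window; the sketch below computes the bare minimum, which the quoted figure exceeds with some headroom. It also shows how many observing nights the ~0.2 PB five-year total implies. The slides do not state a night count, so that number is only a derived estimate, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
# Minimum sustained bandwidth for DES raw data, and the night count implied
# by the five-year total. Decimal units assumed (1 GB = 1e9 B, 1 PB = 1e15 B).
RAW_GB_PER_NIGHT = 350
TRANSFER_WINDOW_S = 24 * 3600   # data must reach NCSA before the next night
TOTAL_RAW_PB = 0.2              # total raw data after 5 years (approximate)

required_mbps = RAW_GB_PER_NIGHT * 1e9 * 8 / TRANSFER_WINDOW_S / 1e6
implied_nights = TOTAL_RAW_PB * 1e15 / (RAW_GB_PER_NIGHT * 1e9)

print(f"Minimum sustained rate: {required_mbps:.1f} Mbps")              # 32.4
print(f"Implied observing nights over 5 years: ~{implied_nights:.0f}")  # 571
```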
The Large Synoptic Survey Telescope – Massively Parallel Astronomy
Survey the entire visible sky every 4-5 nights, to simultaneously detect and study:
• Dark Matter via weak gravitational lensing
• Dark Energy via thousands of SNe per year
• Potentially hazardous near-Earth asteroids
• Tracers of the formation of the solar system
• Fireworks in the heavens: GRBs, quasars…
• Periodic and transient phenomena
• …the unknown
LSST: The Instrument
• 8.2m telescope
  Optimized for a WIDE field of view (FOV)
• 3.5 degree FOV
• 3.5 GIGApixel camera
• Deep images in 15-30s
• Able to scan the whole visible sky every 4-5 nights
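A 3.5-gigapixel camera produces several gigabytes per exposure. Assuming 16-bit (2-byte) raw pixels, which the slides do not state, one readout is 7 GB, or about 6.5 GiB, consistent with the “roughly 6.5GB” per image quoted for the LSST data flow.

```python
# Raw size of one LSST exposure.
# Assumptions (not from the slides): exactly 3.5e9 pixels, 16-bit raw pixels.
CAMERA_PIXELS = 3.5e9
BYTES_PER_PIXEL = 2

image_bytes = CAMERA_PIXELS * BYTES_PER_PIXEL
image_gib = image_bytes / 2 ** 30

print(f"{image_bytes / 1e9:.1f} GB = {image_gib:.2f} GiB per image")  # 7.0 GB = 6.52 GiB
```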
LSST: Deep, Wide, Fast
[Figure: field of view (FOV) comparison. Keck Telescope (10 m): 0.2 degrees; LSST: 3.5 degrees.]
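A 3.5 degree versus 0.2 degree field is a much larger gain than the linear ratio suggests, because sky area scales with the square of the field’s angular size. The sketch assumes the quoted FOV values are comparable linear field sizes, ignores central obstructions, and uses etendue (collecting area times field area), a common survey-speed figure of merit that the slides do not themselves quote.

```python
# Field-of-view and etendue comparison, Keck vs LSST, using the slide values.
# Assumptions: FOV values are comparable linear sizes; obstructions ignored.
KECK_FOV_DEG, KECK_APERTURE_M = 0.2, 10.0
LSST_FOV_DEG, LSST_APERTURE_M = 3.5, 8.2

# Sky area scales with the square of the field's angular size.
fov_area_ratio = (LSST_FOV_DEG / KECK_FOV_DEG) ** 2
# Collecting area scales with the square of the aperture diameter.
collecting_area_ratio = (LSST_APERTURE_M / KECK_APERTURE_M) ** 2
# Etendue = collecting area x field area.
etendue_ratio = fov_area_ratio * collecting_area_ratio

print(f"Field area ratio: ~{fov_area_ratio:.0f}x")  # ~306x
print(f"Etendue ratio:    ~{etendue_ratio:.0f}x")   # ~206x
```

Even with a smaller mirror than Keck, the wide field gives LSST roughly two orders of magnitude more survey speed under this rough measure.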
LSST Site: Cerro Pachon in Northern Chile
[Figure: LSST site plan on Cerro Pachon, showing LSST, a ~1.5m calibration telescope, support facilities, El Peñón, and the neighboring Gemini (South) and SOAR telescopes.]
LSST: Distributed Data Management
• Mountain Site: data acquisition, temp. storage
• Base Facility: real-time processing
• Long-Haul Communications: data transport & distribution
• Archive/Data Access Centers: data processing, long-term storage, & public access
LSST: The Data Flow
• Each image roughly 6.5GB
• Cadence: ~1 image every 15s
• 15 to 18 TB per night
  ALL must be transferred to the U.S. “data center”
• Mountain-to-base within the image timescale (15s), ~10-20Gbps
• Internationally within <24 hours, >2-10Gbps
• REAL TIME reduction, analysis, & alerts
  Send out alerts of transient sources within minutes
  Provide automatic data quality evaluation, alert to problems
  Processed data grows to >100TB per night!
• Just catalogs >3 PB per year!
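The quoted link speeds can be checked against the image size and cadence. The sketch computes only the minimum sustained rates (decimal units assumed); the slide’s ~10-20Gbps and >2-10Gbps figures sit above these minima and leave headroom. It also shows that 15 to 18 TB per night corresponds to roughly 10 to 11.5 hours of continuous 15s exposures, i.e., about a full night.

```python
# Minimum sustained rates implied by the LSST data-flow numbers.
# Decimal units assumed (1 TB = 1000 GB, 1 GB = 8 Gb).
IMAGE_GB = 6.5           # each image, roughly
CADENCE_S = 15           # ~1 image every 15 s
NIGHTLY_TB = (15, 18)    # 15 to 18 TB per night
DAY_S = 24 * 3600        # international transfer window

mountain_base_gbps = IMAGE_GB * 8 / CADENCE_S
international_gbps = [tb * 1000 * 8 / DAY_S for tb in NIGHTLY_TB]
images_per_night = [tb * 1000 / IMAGE_GB for tb in NIGHTLY_TB]
exposure_hours = [n * CADENCE_S / 3600 for n in images_per_night]

print(f"Mountain-to-base minimum: {mountain_base_gbps:.1f} Gbps")
print("International minimum: " + " to ".join(f"{g:.1f}" for g in international_gbps) + " Gbps")
print("Images per night: " + " to ".join(f"{n:.0f}" for n in images_per_night))
print("Implied exposure time: " + " to ".join(f"{h:.1f}" for h in exposure_hours) + " h")
```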
DES, LSST, … and now for the REST of the Science
• Ongoing (ESSENCE, SuperMACHO, etc.) and future (DES, LSST, etc.) projects will provide PETABYTES of archived data
• Only a small fraction of the science potential will be realized by the planned investigations
• How do we maximize the investment in these datasets and provide for their future scientific use?
The Virtual Observatory
• What is VO?
  Provides the framework for global access to the various data archives by facilitating the standardization of archiving and data-mining protocols.
  Enables data analysis by providing common standards and state-of-the-art analysis tools which work over high-speed wide area networks
• What is VO not?
  An organization funded to provide a single universal archive of all astronomical data
  A provider of resources (storage, computation, bandwidth)
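Standardized protocols mean any client can query any compliant archive the same way. As one illustration (a swapped-in example, not taken from the slides), the IVOA Simple Cone Search protocol is an HTTP GET with `RA`, `DEC`, and `SR` (search radius) in decimal degrees, returning a VOTable. The sketch only builds the request URL. The endpoint is hypothetical, and the position is approximately the center of the Large Magellanic Cloud.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; each real archive publishes its own base URL.
SCS_BASE_URL = "https://archive.example.org/scs"

def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build a Simple Cone Search request (RA, DEC, SR in decimal degrees)."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0 and radius_deg > 0):
        raise ValueError("position or radius out of range")
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    sep = "&" if "?" in base_url else "?"
    return f"{base_url}{sep}{urlencode(params)}"

# Approximate center of the Large Magellanic Cloud, 0.1 degree radius.
url = cone_search_url(SCS_BASE_URL, 80.894, -69.756, 0.1)
print(url)  # https://archive.example.org/scs?RA=80.894&DEC=-69.756&SR=0.1
```

Because every compliant service accepts the same three parameters, the same function can be pointed at many archives and the VOTable responses merged on the client side.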
VO: A Global Effort
[Figure: national Virtual Observatory projects worldwide, including BR-VO.]
VO Challenges
• Provide Access to the Content
  Multiple distributed archives, some on the scale of many petabytes
  Archives provide content; the VO knits those resources together
• Provide the Standards
  Allow a variety of archives to talk to each other
  Develop generalized data model(s) for different instruments/different wavelengths
VO Challenges
• Provide the User Interfaces
  Streamline data discovery, data understanding, data movement, and data analysis
• Support the Analysis
  Support large queries across distributed DBs
  Support statistical analysis across results (Grid)
• All the “boring” bits (infrastructure)
  Security, handshaking, resource management
Chris Miller/NOAO
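A basic step in analyzing results gathered from distributed archives is a positional cross-match: pairing sources from two catalogs that lie within a small angular radius of each other. The sketch below is a brute-force illustration with toy, hypothetical catalogs (e.g., an optical and an infrared query result). It is O(N x M) and is not how survey-scale catalogs would be matched; those need spatial indexing.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """Pair each cat_a source with its nearest cat_b source within radius_arcsec.

    Catalogs are lists of (id, ra_deg, dec_deg). Returns (id_a, id_b, sep_arcsec).
    Brute force: fine for small query results only.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches

# Toy catalogs standing in for results from two hypothetical archives.
optical = [("opt-1", 80.8940, -69.7560), ("opt-2", 81.0000, -69.0000)]
infrared = [("ir-7", 80.8941, -69.7560), ("ir-9", 85.0000, -70.0000)]
matches = cross_match(optical, infrared)
print(matches)
```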
VO Case Study: NOAO Data Products Program
• Management of data from all NOAO and some affiliated facilities
  KPNO, including Mayall 4m (MOSAIC, NEWFIRM)
  CTIO, including Blanco 4m (MOSAIC, ISPI)
  SOAR & WIYN systems
• Virtual Observatory “back end”: CONTENT
  Provide access to a large volume (TBs to PBs) of archived ground-based optical & infrared data and data products
• Virtual Observatory “front end”: UI and TOOLS
  Enable science based upon distributed data and data products, developing tools and services
NVO portal @ NOAO
• Focus on the scientific USER
• First support data DISCOVERY
• NOAO-supported NVO portals:
  http://nvo.noao.edu
  And for S. America… http://nvo.ctio.noao.edu
• 15Mbps for Chilean use!
Many challenges ahead…
• Security
  Enforcing proprietary periods while allowing PIs to combine data
• VOSpace
  Virtual (and real) distributed workspaces to collect and analyze data from many sites
  Shared spaces for collaborations
• Combining GRID storage and CPU resources with VO queries and analysis
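The security challenge of enforcing proprietary periods while letting PIs combine data can be reduced to an access rule applied to every record, whichever archive it comes from. The sketch assumes a simple hypothetical policy: data become public a fixed 18 months (548 days) after observation, and before that only the dataset’s PI may use them. Real proprietary periods, identities, and authentication are set by each facility and are not modeled here.

```python
from datetime import date, timedelta

# Hypothetical policy parameter: ~18 months; real periods vary by facility/program.
PROPRIETARY_DAYS = 548

def can_access(requester: str, pi: str, observed: date, today: date,
               proprietary_days: int = PROPRIETARY_DAYS) -> bool:
    """True if the data are already public, or if the requester is the PI."""
    release_date = observed + timedelta(days=proprietary_days)
    return today >= release_date or requester == pi

def accessible_records(records, requester: str, today: date):
    """Filter records (possibly merged from several archives) to those the requester may use."""
    return [r for r in records
            if can_access(requester, r["pi"], r["observed"], today)]

# Toy records from two hypothetical archives.
records = [
    {"id": "archive-A/001", "pi": "pi_a", "observed": date(2006, 1, 10)},
    {"id": "archive-B/042", "pi": "pi_b", "observed": date(2006, 2, 1)},
    {"id": "archive-B/007", "pi": "pi_b", "observed": date(2004, 3, 5)},
]
today = date(2006, 6, 1)
usable = accessible_records(records, "pi_a", today)
print([r["id"] for r in usable])  # ['archive-A/001', 'archive-B/007']
```

Here pi_a can combine their own still-proprietary data with another team’s data whose proprietary period has expired, but not with data that are still proprietary to pi_b.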
Strategic Partnerships
• In Local Systems
  Vendors: local storage, processing, servers
• In Remote Systems
  Supercomputer center(s) to provide bulk storage, large-scale processing (e.g., NCSA, SDSC’s SRB)
  Grid processing, storage
• Connectivity
  High-speed national and international bandwidth
• Scientific VO
  Partners to develop standards, provide tools
  Providing services to, and collecting feedback from, physics and astronomy user communities
  Providing a strong VO node in South America