data-intensive science: addressing common needs with shared tools christopher stubbs professor...
Post on 19-Dec-2015
TRANSCRIPT
![Page 1: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/1.jpg)
Data-Intensive Science: Addressing common needs with shared tools
Christopher Stubbs
Professor
Department of Physics
Department of Astronomy
cstubbs@fas.harvard.edu
![Page 2: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/2.jpg)
Storing, analyzing, and exploiting large data sets
Searching for dark matter and dark energy
Searching for new elementary particles
Detailed imaging of brain function
![Page 3: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/3.jpg)
Some common threads
• Ambitious instruments, copious data
  • E.g. tens of TB per night from imminent astronomy surveys
• Loosely coupled computing
  • Don’t need linked analysis that uses all images
• Diverse applications from common data
• Simulations are an integral aspect
• Build apparatus here, run it elsewhere
• International collaborations
• Computer science aspects
  • World’s largest non-proprietary databases
  • Clustering, data mining, file system optimization…
![Page 4: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/4.jpg)
![Page 5: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/5.jpg)
27 km
CERN, outside Geneva
![Page 6: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/6.jpg)
Seriously Big Toys.
Harvard involvement in ATLAS detector:
• J. DaCosta and G. Brandenberg at CERN now, in shakedown
• Built muon chambers here
• J. Huth plays leadership role in scientific computing for LHC
![Page 7: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/7.jpg)
Event Simulations
>30 Million event simulations are typical
Pick an interaction
Propagate through model of the detector
Measure detection efficiencies
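The three steps above can be sketched as a toy Monte Carlo loop. This is only an illustration under loud assumptions: the "interaction" is reduced to a random emission direction, the "detector model" is a hypothetical angular acceptance plus a few layers that each fire with a fixed probability, and none of the parameter values come from ATLAS.

```python
import random


def simulate_efficiency(n_events, acceptance=0.8, hit_probability=0.95,
                        layers=3, seed=0):
    """Toy Monte Carlo estimate of a detection efficiency.

    Every parameter here is hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    detected = 0
    for _ in range(n_events):
        # 1. Pick an interaction: here just an emission direction,
        #    with cos(theta) drawn uniformly from [-1, 1].
        cos_theta = rng.uniform(-1.0, 1.0)

        # 2. Propagate through the detector model: particles outside the
        #    (assumed) angular acceptance never reach any layer.
        if abs(cos_theta) >= acceptance:
            continue
        # Each layer registers a hit independently with hit_probability.
        hits = sum(rng.random() < hit_probability for _ in range(layers))

        # 3. Count the event as detected only if every layer fired.
        if hits == layers:
            detected += 1
    return detected / n_events


efficiency = simulate_efficiency(100_000)
expected = 0.8 * 0.95 ** 3  # analytic value for this toy model
```

For this toy model the simulated efficiency should approach the analytic product of acceptance and per-layer hit probabilities as the number of events grows, which is why large event counts are needed to pin down small effects.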
![Page 8: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/8.jpg)
On-the-fly event reconstruction
Find tracks and trigger/store if interesting
Precise track determination
Aggregate event statistics
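The select-then-aggregate flow on this slide can be illustrated with a small streaming filter. This is a sketch only: real triggers reconstruct tracks, whereas here each event carries a single hypothetical `energy` value and the "interesting" test is an assumed threshold cut.

```python
def select_events(events, threshold):
    """Yield only events whose (hypothetical) 'energy' exceeds the threshold."""
    for event in events:
        if event["energy"] > threshold:
            yield event


def aggregate(events):
    """Compute the count and mean energy of the stored events in one pass."""
    count = 0
    total = 0.0
    for event in events:
        count += 1
        total += event["energy"]
    mean = total / count if count else 0.0
    return {"count": count, "mean_energy": mean}


# Made-up event stream for illustration.
stream = [{"id": i, "energy": e}
          for i, e in enumerate([5.0, 42.0, 17.5, 88.0, 3.2])]

stored = list(select_events(stream, threshold=20.0))  # "trigger/store"
stats = aggregate(stored)                             # "aggregate statistics"
```

Because `select_events` is a generator, events that fail the cut are discarded as they arrive and never held in memory, which mirrors the idea of deciding on the fly what to keep.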
![Page 9: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/9.jpg)
ATLAS computing
• 5 million lines of code
• 200 developers, worldwide
• 200 collision events per second
• Automated event selection in firmware
• Selected subset of events to disk
• These selected events distributed worldwide to a hierarchy of data centers.
![Page 10: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/10.jpg)
Sky Surveys in Astronomy
Optical: PanSTARRS
1.4 Gpix, 1.8m
Radio: Mileura Wide-Field Array
1 km array of 8000 custom antennas
128 gigabit/s computing challenge
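To put the 128 gigabit/s figure in perspective, the arithmetic below converts a sustained rate into bytes per day. It assumes decimal units (1 gigabit = 10^9 bits, 1 PB = 10^15 bytes) and a rate held for a full 24 hours; the slide gives a processing rate, so this is not a claim about how much data is actually stored.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(gigabits_per_second):
    """Convert a sustained rate in gigabit/s (10**9 bit/s) to bytes per day."""
    bytes_per_second = gigabits_per_second * 1e9 / BITS_PER_BYTE
    return bytes_per_second * SECONDS_PER_DAY


mwa_daily_bytes = bytes_per_day(128)
mwa_daily_petabytes = mwa_daily_bytes / 1e15  # decimal petabytes
```

Under these assumptions the stream amounts to 16 GB every second, or a little under 1.4 PB per day.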
![Page 11: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/11.jpg)
Our View of the Expanding Universe
Expansion causes stretching of light, “redshift”
Expansion history can be mapped by measuring both distances and redshifts
Close: recent. Far: ancient.
![Page 12: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/12.jpg)
(Hubble Space Telescope, NASA)
Supernovae are powerful cosmological probes
Distances to ~6% from brightness
Redshifts from features in spectra
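As a rough illustration of "distances from brightness", the sketch below uses the standard distance modulus, m − M = 5 log10(d / 10 pc), and its first-order error propagation. It ignores dust extinction, K-corrections, and cosmological effects, and the function names are hypothetical. By this relation a ~6% distance uncertainty corresponds to a magnitude uncertainty of about 0.13 mag.

```python
import math


def distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus m - M = 5 log10(d / 10 pc)."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)


def fractional_distance_error(mag_error):
    """First-order fractional distance error for a given magnitude error."""
    return math.log(10.0) / 5.0 * mag_error


def mag_error_for(fractional_error):
    """Magnitude error that corresponds to a given fractional distance error."""
    return 5.0 / math.log(10.0) * fractional_error


d = distance_pc(apparent_mag=5.0, absolute_mag=0.0)   # modulus of 5 mag
needed_mag_error = mag_error_for(0.06)                # for ~6% distances
```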
![Page 13: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/13.jpg)
Redshift = Δλ / λ
[Figure: distance to supernova (nearby to far away) versus redshift Δλ / λ, from 0.01 to 1.0. Schmidt et al., High-z SN Team]
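The redshift definition on this slide is a one-line calculation once a spectral feature has been identified. In the sketch below the rest wavelength is that of the hydrogen H-alpha line (about 656.3 nm); the observed wavelength is a made-up value chosen for illustration.

```python
def redshift(observed_wavelength, rest_wavelength):
    """Redshift z = (lambda_observed - lambda_rest) / lambda_rest."""
    if rest_wavelength <= 0:
        raise ValueError("rest wavelength must be positive")
    return (observed_wavelength - rest_wavelength) / rest_wavelength


H_ALPHA_REST_NM = 656.3      # rest wavelength of H-alpha, in nm
observed_nm = 984.45         # hypothetical measured wavelength, in nm

z = redshift(observed_nm, H_ALPHA_REST_NM)
```

Both wavelengths must be in the same units; the result is dimensionless.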
![Page 14: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/14.jpg)
Near Earth Asteroids
• Inventory of solar system is incomplete
• R=1 km asteroids are dinosaur killers
• R=300m asteroids in ocean wipe out a coastline
• Demanding project: requires mapping the sky down to 24th magnitude every few days, individual exposures not to exceed ~20 sec.
• PanSTARRS will detect NEAs to ~400m
![Page 15: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/15.jpg)
Cosmic Cinematography: Challenges
The “static” sky:
  optimal co-adding of images
  database issues
The transient sky:
  variability classification
  asteroid association and orbits
  light curve analysis
  fusion with other data sets
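One common way to co-add images is an inverse-variance weighted mean, sketched below. This is an assumption about what "optimal" could mean here, not the survey's actual pipeline: it takes images that are already aligned on the same pixel grid, with one uniform noise variance per image, and ignores differences in seeing and calibration.

```python
def coadd(images, variances):
    """Inverse-variance weighted co-add of aligned, equally sized images.

    images:    list of 2-D lists of pixel values
    variances: one noise variance per image (assumed uniform over the image)
    Returns the stacked image and the variance of each stacked pixel.
    """
    if not images or len(images) != len(variances):
        raise ValueError("need one variance per image")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    n_rows, n_cols = len(images[0]), len(images[0][0])
    stacked = [[0.0] * n_cols for _ in range(n_rows)]
    # Accumulate the weighted sum pixel by pixel.
    for image, weight in zip(images, weights):
        for r in range(n_rows):
            for c in range(n_cols):
                stacked[r][c] += weight * image[r][c]
    # Normalize by the total weight to get the weighted mean.
    stacked = [[value / total_weight for value in row] for row in stacked]
    return stacked, 1.0 / total_weight


# Two made-up 1 x 2 exposures; the second is three times noisier.
exposures = [[[10.0, 20.0]], [[14.0, 24.0]]]
stack, stack_variance = coadd(exposures, variances=[1.0, 3.0])
```

The noisier exposure contributes less to each stacked pixel, and the variance of the stack is lower than that of either input.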
![Page 16: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/16.jpg)
A New Approach to Radio Astronomy Hardware
![Page 17: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/17.jpg)
A Brief History of the Universe
Era of Reionization
• culmination of structure formation
• first luminous structures
• turning point after the Dark Ages
[Figure labels: ionized, neutral (H), ionized; z~6.2; “The Gap”]
![Page 18: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/18.jpg)
BOOLARDY
![Page 19: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/19.jpg)
Lincoln Greenhill (CfA), MWA project
![Page 20: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/20.jpg)
IIC affords us the opportunity to share resources, tools and know-how
• Shared hardware maximizes effectiveness
• Shared archival data storage, cooperatively
• Reap benefits of sophisticated system administrators and database professionals
  People are quantized, unaffordable for single group
• Learn from each other on technical topics of common interest
  Often large discrepancies across subfields, IIC raises all boats.
![Page 21: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/21.jpg)
8K x 8K pixel array
16 independent amplifiers
Each is a 1024 x 2048 subimage
![Page 22: Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy cstubbs@fas.harvard.edu](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d395503460f94a12f39/html5/thumbnails/22.jpg)