source: alex szalay. example: sloan digital sky survey the sdss telescope array is systematically...
TRANSCRIPT
source: Alex Szalay
Example: Sloan Digital Sky Survey
The SDSS telescope array is systematically mapping ¼ of the entire sky
Discoveries are made by querying the database, not through a zero-sum wrestling match for telescope time
Managed by an RDBMS (MS SQL Server), equipped with a hierarchical triangular mesh index, among other customizations
15 TB in the final release in 2007
818 GB in the RDBMS (13.6B tuples)
source: Alex Szalay
Drowning in data; starving for information
Acquisition eventually outpaces analysis
Medicine: Online publishing, digital charts
Astronomy: Big telescopes (more in a bit)
Genetics: PCR, Shotgun Sequencing
Oceanography: ??
Marine Microbiology: ??
Empirical X Analytical X Computational X X-informatics
“Increase Data Collection Exponentially in Less Time, with FlowCAM”
Cyber-Observatories
Arctic Observing Network (AON)
Ocean Observatories Initiative (OOI)
National Ecological Observatory Network (NEON)
The Waters Network
The Long-Term Ecological Research (LTER) network
The Geosciences Network (GEON)
EarthScope/Incorporated Research Institutions for Seismology (IRIS)
Virtual Solar-Terrestrial Observatory (VSTO)
Linked Environments for Atmospheric Discovery (LEAD)
source: Alex Szalay
source: Jim Gray
Relational Databases (In Codd we Trust…)
At IBM Almaden in the 60s and 70s, Ted Codd worked out a formal basis for tabular data representation, organization, and access [1].
The early systems were buggy and slow (and sometimes reviled), but programmers only had to write 5% of the code they previously did.
Key Idea: Programs that manipulate tabular data exhibit an algebraic structure; Codd proposed a relational algebra to manipulate these data sets in their logical form, independently of their physical representation
[1] E. F. Codd, “A Relational Model of Data for Large Shared Data Banks”, Communications of the ACM 13(6), pp. 377-387, 1970
physical data independence
logical data independence
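A minimal sketch of physical data independence, using Python's built-in `sqlite3` module rather than any system named on the slides. The table `obj`, its columns, and the sample rows are hypothetical. The point it illustrates: the same logical query returns the same answer before and after the physical layout changes (here, by adding an index).

```python
import sqlite3

# In-memory database with a small, made-up catalog of sky objects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obj (id INTEGER, ra_deg REAL, dec_deg REAL, mag REAL)")
conn.executemany(
    "INSERT INTO obj VALUES (?, ?, ?, ?)",
    [
        (1, 10.5, -1.2, 17.3),
        (2, 185.0, 15.8, 21.1),
        (3, 186.2, 16.1, 19.4),
        (4, 200.7, 30.0, 22.6),
    ],
)

# The query is written against the logical form only: selection plus projection.
QUERY = (
    "SELECT id FROM obj "
    "WHERE ra_deg BETWEEN 180 AND 190 AND mag < 22 "
    "ORDER BY id"
)

before = conn.execute(QUERY).fetchall()

# Change the physical representation: add an index on ra_deg.
conn.execute("CREATE INDEX obj_ra ON obj (ra_deg)")

after = conn.execute(QUERY).fetchall()

print(before == after)
```

The application code (the query string) did not change when the index was added; only the access path available to the engine did.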
source: Raghu Ramakrishnan
Characteristics of Cloud Computing
• Virtual – Physical location and underlying infrastructure details are transparent to users
• Scalable – Able to break complex workloads into pieces to be served across an incrementally expandable infrastructure
• Efficient – “Services Oriented Architecture” for dynamic provisioning of shared compute resources
• Flexible – Can serve a variety of workload types – both consumer and commercial
Cloud Computing as Hosted Data Management Services
Yahoo
• Yahoo Distributed Hash Tables: Key/value pairs
• Yahoo Distributed Ordered Tables: Ordered ranges
• PNUTS: Relational-style storage, indexing and query
Amazon
• S3: Simple Storage
• SimpleDB: Quasi-Relational features
Google
• APIs for: Storage, Visualization, Document processing, Images, Mail
Microsoft
• CloudDB: Relational-style features
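The difference between key/value pairs and ordered ranges can be sketched in a few lines. This is an in-process illustration only, not the API of any hosted service listed above; the `OrderedTable` class and the `sample:` keys are hypothetical. A hash table supports exact-key lookups, while an ordered table additionally supports range scans in key order.

```python
from bisect import bisect_left, bisect_right, insort

# Hash-table style: exact-key lookups, no ordering guarantee across keys.
hash_table = {}
hash_table["sample:020"] = {"id": "sample:020"}
point_lookup = hash_table["sample:020"]


class OrderedTable:
    """Toy ordered table: keys are kept sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi, in key order."""
        start = bisect_left(self._keys, lo)
        stop = bisect_right(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:stop]]


table = OrderedTable()
for key in ["sample:030", "sample:010", "sample:020", "sample:040"]:
    table.put(key, {"id": key})

in_range = [k for k, _ in table.scan("sample:015", "sample:030")]
print(in_range)
```

A distributed version would partition the key space across servers; that part is omitted here.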
Workflow at CMOP
[Diagram: workflow stages, data products, and sites]
Stages: Cloning/cDNA/…, Sequencing, Inspection, Cleaning (e.g., trim bad reads at the end), BLAST, Post processing, Link, Analyze
Data products: plates, FASTA files, hit tables, hit tables + metadata, synopsis
Sites: OHSU, Washington University, PNNL
Other labels: Shared Knowledge, Cloud
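As a rough illustration of the cleaning stage, the sketch below parses FASTA text and trims ambiguous bases from the end of each read. The slide does not say how "bad" is defined, so treating trailing `N` characters as bad is an assumption, and the function names `parse_fasta` and `trim_trailing_ambiguous` are hypothetical rather than part of the CMOP pipeline.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_trailing_ambiguous(seq, bad="Nn"):
    """Drop ambiguous bases from the end of a read (assumed definition of 'bad')."""
    return seq.rstrip(bad)


# Made-up input: the first read ends in a run of ambiguous bases.
raw = ">read1\nACGTNN\nNN\n>read2\nGGCC\n"
cleaned = [(name, trim_trailing_ambiguous(seq)) for name, seq in parse_fasta(raw)]
print(cleaned)
```

A real cleaning step would more likely use per-base quality scores, which plain FASTA files do not carry.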