Cyberinfrastructure and California
UNIVERSITY OF CALIFORNIA
SAN DIEGO SUPERCOMPUTER CENTER
Fran Berman
UCSD
Dr. Francine Berman
Director, San Diego Supercomputer Center
Professor and High Performance Computing Endowed Chair, UC San Diego
Cyberinfrastructure and California
The Digital World
Commerce
Entertainment
Information
Science
Today’s Technology is a Team Sport
• Today’s “computer” is a coordinated set of hardware, software, data, and services providing an “end-to-end” resource.
• Cyberinfrastructure captures the integrated character of today’s IT environment
[Figure: The “computer” as an integrated set of resources: computers, networks, data, storage, visualization, sensors, field instruments, and wireless links]
Cyberinfrastructure -- An Integrating Concept
Cyberinfrastructure = Resources (computers, data storage, networks, scientific instruments, experts, etc.) + “Glue” (integrating software, systems, and organizations)
How does Cyberinfrastructure Work? Cyberinfrastructure-enabled Neurosurgery
• PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue
• Brain deforms during surgery
• Surgeons must align the preoperative brain image with intra-operative images to have the best opportunity for intra-surgical navigation
• Radiologists and neurosurgeons at Brigham and Women’s Hospital (BWH), Harvard Medical School are exploring transmission of 30/40 MB brain images (generated during surgery) to SDSC for analysis and alignment
• Finite element simulation on a biomechanical model for volumetric deformation is performed at SDSC; output results are sent to BWH, where updated images are shown to surgeons
• Transmission is repeated every hour during a 6-8 hour surgery
• Transmission and output must take on the order of minutes
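To make the "order of minutes" constraint concrete, here is a rough back-of-the-envelope sketch of how long a 40 MB image might take to transmit. The link speeds and the 50% efficiency factor are illustrative assumptions, not figures from the talk; only the image size comes from the slide.

```python
def transfer_seconds(size_mb: float, link_mbps: float, efficiency: float = 0.5) -> float:
    """Estimate seconds needed to send size_mb megabytes over a link.

    link_mbps is the nominal link rate in megabits per second (assumed).
    efficiency is the assumed fraction of that rate actually achieved.
    """
    bits = size_mb * 8e6                      # megabytes -> bits (decimal units)
    return bits / (link_mbps * 1e6 * efficiency)


# Hypothetical link speeds; the slide does not state the actual network used.
for link in (10, 100, 1000):
    seconds = transfer_seconds(40, link)
    print(f"{link:>5} Mbit/s link: {seconds:6.2f} s for a 40 MB image")
```

Under these assumed rates the transfer itself takes about a minute or less, which would leave most of a minutes-scale budget for the finite element simulation and the return of results.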
SDSC is a National Cyberinfrastructure Center
• National facility funded by NSF, NIH, DOE, Library of Congress, NARA, etc.
• Employs nearly 400 researchers, staff and students
• National Facility and UCSD Organized Research Unit
• Home to many associated activities including
  • Protein Data Bank
  • Biomedical Informatics Research Network (BIRN) Coordinating Center
  • Geosciences Network (GEON)
  • NEES IT Center, etc.
[Figure labels: Grid and Cluster Computing; Data-oriented Science and Engineering; Networking; High Performance Computing; Data and Knowledge Systems; Computational Science and Engineering; Community Databases and Data Collections; SW tools, workbenches, toolkits]
SDSC Resources Are Available to the Community
COMPUTE SYSTEMS
• DataStar
  • 2,528 Power4+ processors
  • IBM p655 8-way and p690 32-way nodes
  • 7 TB total memory
  • Up to 3 GBps I/O to disk
• TeraGrid Cluster
  • 512 Itanium2 IA-64 processors
  • 1 TB total memory
  • Also 128 2-way data nodes
• Blue Gene Data
  • First academic IBM Blue Gene system
  • 2,048 PowerPC processors
  • 128 I/O nodes
http://www.sdsc.edu/user_services/
SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES
• User Services
• Application/Community Collaborations
• Education and Training
• SDSC Synthesis Center
• Community SW, toolkits, portals, codes
• http://www.sdsc.edu/
DATA ENVIRONMENT
• 1.4 PB Storage-area Network (SAN)
• 6 PB StorageTek tape library
• HPSS and SAM-QFS archival systems
• DB2, Oracle, MySQL
• Storage Resource Broker
• 72-CPU Sun Fire 15K
• IBM p690s – HPSS, DB2, etc.
• http://datacentral.sdsc.edu/
Support for community data collections and databases
Data management, mining, analysis, and preservation
Cyberinfrastructure Can Help Harness Today’s Deluge of Data
• Over the next decade, data will come from everywhere
  • Scientific instruments
  • Experiments
  • Sensors and sensornets
  • New devices (personal digital devices, computer-enabled clothing, cars, …)
• And be used by everyone
  • Scientists
  • Consumers
  • Educators
  • General public
• Cyberinfrastructure must support unprecedented diversity, globalization, integration, scale, and use
[Figure labels: Data from sensors; Data from simulations; Data from instruments; Data from analysis; Volunteer Data]
How much Data is there?*
Kilo (10^3): 1 low-resolution photo = 100 KiloBytes
Mega (10^6): 1 novel = 1 MegaByte; iPod Shuffle (up to 120 songs) = 512 MegaBytes
Giga (10^9)
Tera (10^12): Printed materials in the Library of Congress = 10 TeraBytes
Peta (10^15): 1 human brain at the micron level = 1 PetaByte; SDSC HPSS tape archive = 6 PetaBytes
Exa (10^18): All worldwide information in one year = 2 ExaBytes
* Rough/average estimates
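The scale of these estimates is easier to grasp as ratios. The short sketch below converts the slide's figures to bytes using decimal prefixes and compares them; the variable names are illustrative, and the inputs are the same rough estimates quoted above.

```python
# Decimal (SI) prefixes, as listed on the slide.
PREFIX = {
    "kilo": 10**3,
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
}


def to_bytes(value: int, prefix: str) -> int:
    """Convert a value with a named prefix to bytes."""
    return value * PREFIX[prefix]


# Rough estimates taken from the slide.
novel = to_bytes(1, "mega")
loc_print = to_bytes(10, "tera")      # printed materials, Library of Congress
hpss_archive = to_bytes(6, "peta")    # SDSC HPSS tape archive
world_year = to_bytes(2, "exa")       # all worldwide information in one year

print("Novels per Library of Congress print collection:", loc_print // novel)
print("Library of Congress print collections per HPSS archive:", hpss_archive // loc_print)
print("HPSS archives per year of worldwide information:", round(world_year / hpss_archive))
```

By these estimates the printed Library of Congress is about ten million novels, the SDSC tape archive could hold about 600 such collections, and a year of worldwide information is a few hundred times the archive.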
Cyberinfrastructure and Data: Using Data for Analysis and Simulation
Cyberinfrastructure-enabled Disaster Preparedness
Major Earthquakes on the San Andreas Fault, 1680-present
• 1680: M 7.7
• 1857: M 7.8
• 1906: M 7.8
How dangerous is the southern San Andreas Fault?
• The SCEC TeraShake simulation is the result of more than 10 years of immense effort by the geoscience community
• Focus is on understanding big earthquakes and how they will impact sediment-filled basins.
• Simulation combines massive amounts of data, high-resolution models, and large-scale supercomputer runs
• TeraShake results provide new information enabling better
  • Estimation of seismic risk
  • Emergency preparation, response and planning
  • Design of next generation of earthquake-resistant structures
• Such simulations provide potentially immense benefits, saving many lives and billions in economic losses
Domain: 600 km x 300 km x 80 km
Mesh dimension: 3000 x 1500 x 400
Spatial resolution = 200 m
Simulated time = 200 s
Number of time steps = 20,000
• What you’re looking at:
  • L.A. experiences strong ground motion from the S->N scenario
  • The N->S rupture generates strong reverberations in the Imperial Valley, ultimately hitting Mexicali and other northern Mexico cities.
  • Large local peaks in ground motion near Palm Springs, resulting in immense damage.
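The simulation parameters above are mutually consistent, which a few lines of arithmetic can confirm. This sketch derives the mesh dimensions from the domain size and spatial resolution, and the time step from the simulated time and step count. The derived totals (grid points, point updates) are computed here for illustration and are not quoted in the talk.

```python
# Parameters as stated on the slide.
domain_km = (600, 300, 80)
resolution_m = 200
simulated_s = 200
time_steps = 20_000

# Mesh cells per axis = domain length / spatial resolution.
mesh = tuple(length_km * 1000 // resolution_m for length_km in domain_km)

# Derived quantities (not on the slide).
grid_points = mesh[0] * mesh[1] * mesh[2]
dt_seconds = simulated_s / time_steps
point_updates = grid_points * time_steps

print("Mesh dimension:", mesh)
print("Grid points:", f"{grid_points:,}")
print("Time step (s):", dt_seconds)
print("Grid-point updates over the run:", f"{point_updates:.1e}")
```

The computed mesh matches the stated 3000 x 1500 x 400, which works out to 1.8 billion grid points advanced in steps of 0.01 s.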
Making TeraShake Work -- Resources
• Data Storage
  • 47 TB archival tape storage on Sun StorEdge SAM-QFS
  • 47 TB backup on High Performance Storage System (HPSS)
  • SRB Collection with 1,000,000 files
• Funding
  • SDSC Cyberinfrastructure resources for TeraShake funded by NSF
  • Southern California Earthquake Center is an NSF-funded geoscience research and development center
• Computers and Systems
  • 80,000 hours on 240 processors of DataStar
  • 256 GB memory p690 used for testing, p655s used for production run, TeraGrid (TG) used for porting
  • 30 TB global parallel file system (GPFS)
  • Run-time 100 MB/s data transfer from GPFS to SAM-QFS
  • 27,000 hours post-processing for high resolution rendering
• People
  • 20+ people involved in information technology support
  • 20+ people involved in geoscience modeling and simulation
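A rough calculation helps relate these resource figures to each other. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a sustained 100 MB/s transfer rate, and that "80,000 hours" means processor-hours spread evenly over 240 processors. None of these interpretations is confirmed by the talk.

```python
TB = 10**12  # assumed decimal terabyte
MB = 10**6   # assumed decimal megabyte

# Figures from the slide.
archive_bytes = 47 * TB
transfer_rate = 100 * MB   # bytes per second, GPFS to SAM-QFS
cpu_hours = 80_000
processors = 240

# Time to move the full 47 TB at the stated rate, if sustained throughout.
transfer_hours = archive_bytes / transfer_rate / 3600

# Wall-clock time, ASSUMING the 80,000 hours are processor-hours.
wall_hours = cpu_hours / processors

print(f"Moving 47 TB at 100 MB/s: {transfer_hours:.1f} hours ({transfer_hours / 24:.1f} days)")
print(f"80,000 processor-hours on 240 processors: {wall_hours:.1f} hours ({wall_hours / 24:.1f} days)")
```

Under these assumptions, archiving the output is itself a multi-day task, which is one reason the data was streamed from GPFS to SAM-QFS at run time.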
Cyberinfrastructure and Data: Preserving our Scientific and Cultural Heritage
Data Preservation
• Many Science, Cultural, and Official Collections must be sustained for the foreseeable future
• Critical collections must be preserved:
  • community reference data collections (e.g. Protein Data Bank)
  • irreplaceable collections (e.g. Shoah collection)
  • longitudinal data (e.g. PSID – Panel Study of Income Dynamics)
• No plan for preservation often means that data is lost or damaged
“… the progress of science and useful arts … depends on the reliable preservation of knowledge and information for generations to come.”
“Preserving Our Digital Heritage”, Library of Congress
Key Challenges for Digital Preservation
• What should we preserve?
  • What materials must be “rescued”?
  • How to plan for preservation of materials by design?
• How should we preserve it?
  • Formats
  • Storage media
  • Stewardship – who is responsible?
• Who should pay for preservation?
  • The content generators?
  • The government?
  • The users?
• Who should have access?
Print media provides easy access for long periods of time but is hard to data-mine
Digital media is easier to data-mine but requires management of evolution of media and resource planning over time
Planning Ahead for Preservation
• Comprehensive approach to infrastructure for long-term preservation requires the integration of
  • Collection ingestion
  • Access and Services
  • Research and development for new functionality and adaptation to evolving technologies
  • Business model, data policies, and management issues critical to success of the infrastructure
[Figure labels: Ingestion; Services; Policy; R&D; Consortium]
Cyberinfrastructure Resources at SDSC
SDSC Data Central
• First program of its kind to support research and community data collections and databases
• Comprehensive resources
  • Disk: 400 TB accessible via HPC systems, Web, SRB, GridFTP
  • Databases: DB2, Oracle, MySQL
  • SRB: Collection management
  • Tape: 6 PB, accessible via file system, HPSS, Web, SRB, GridFTP
• Data collection and database hosting
• Batch oriented access
• Collection management services
• Collaboration opportunities:
  • Long-term preservation
  • Data technologies and tools
New Allocated Data Collections include
• Bee Behavior (Behavioral Science)
• C5 Landscape DB (Art)
• Molecular Recognition Database (Pharmaceutical Sciences)
• LIDAR (Geoscience)
• LUSciD (Astronomy)
• NEXRAD-IOWA (Earth Science)
• AMANDA (Physics)
• SIO_Explorer (Oceanography)
• Tsunami and Landsat Data (Earthquake Engineering)
• UC Merced Library Japanese Art Collection (Art)
• Terabridge (Structural Engineering)
SDSC Cyberinfrastructure Resources Heavily Used by UC faculty and students
• UC PIs account for 329+ trillion bytes of data stored at SDSC
• In FY05, over 5 million CPU hours on HPC machines at SDSC were used by UC faculty and students at all campuses
• UCSD faculty make up 40% of the top users of SDSC compute resources
SDSC Academic Associates Program Targets Enabling Cyberinfrastructure Collaborations
SDSC/UC Academic Associates Program Cyberinfrastructure and “Seeding” Activities
• Targeted workshops
• Priority SW installation and support
• Priority participation for Cyberinfrastructure Summer Institute
• Focused assistance with developing successful proposals for national allocation programs
• Targeted user services
• Special UC compute and data allocations
• Priority for “early usage” of new national resources
Cyberinfrastructure is Fundamental for California
• Cyberinfrastructure captures the practice and potential of modern science and engineering
• Cyberinfrastructure is the focus of an increasing number of federal programs
  • NSF (all directorates), NIH (BISTI, Bioinformatics, Computational Biology, etc.), DOE (Science Grid), etc.
• Cyberinfrastructure is critical for success in modern research and education initiatives
  • Stem cell research
  • Grid computing
  • Multi-disciplinary science and engineering
Leadership in Cyberinfrastructure provides a competitive edge to California researchers, educators, practitioners, and business leaders
Thank You