collaborations between calit2, sio, and the venter institute-a beginning

Collaborations Between Calit2, SIO, and the Venter Institute—a Beginning" Talk to the Venter Institute Board La Jolla, CA December 5, 2005 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD

Upload: larry-smarr

Post on 20-Aug-2015


TRANSCRIPT

“Collaborations Between Calit2, SIO, and the Venter Institute—a Beginning"

Talk to the

Venter Institute Board

La Jolla, CA

December 5, 2005

Dr. Larry Smarr

Director, California Institute for Telecommunications and Information Technology;

Harry E. Gruber Professor,

Dept. of Computer Science and Engineering

Jacobs School of Engineering, UCSD

Driving Cyberinfrastructure with Environmental Metagenomics

Samples Collected by Sorcerer II

How did Calit2, SIO, and VI Arrive at This Unified Vision?

Funded Today! $24.5 M Over 7 Years

J. Craig Venter, et al., Science 2 April 2004: Vol. 304, pp. 66-74

[Figure labels: Prochlorococcus, Microbacterium, Burkholderia, Rhodobacter, SAR-86, unknown, unknown]

Metagenomics “Extreme Assembly” Requires Large Amount of Pixel Real Estate

Source: Karin Remington, J. Craig Venter Institute

Metagenomics Requires a Global View of Data and the Ability to Zoom Into Detail Interactively

Overlay of Metagenomics Data onto Sequenced Reference Genomes (This Image: Prochlorococcus marinus MED4)

Source: Karin Remington, J. Craig Venter Institute

The OptIPuter – Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data

Green: Purkinje Cells; Red: Glial Cells; Light Blue: Nuclear DNA

Source: Mark Ellisman, David Lee, Jason Leigh

300 MPixel Image!

Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr, PI

Partners: SDSC, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST

Scalable Displays Allow Both Global Content and Fine Detail

Source: Mark Ellisman, David Lee, Jason Leigh

30 MPixel SunScreen Display Driven by a 20-node Sun Opteron Visualization Cluster

Allows for Interactive Zooming from Cerebellum to Individual Neurons

Source: Mark Ellisman, David Lee, Jason Leigh
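
The display figures quoted on these slides can be related with some simple arithmetic. The sketch below is only illustrative: it assumes the 30 MPixel display is split evenly across the 20 cluster nodes and that the 300 MPixel image is shown at one image pixel per display pixel, neither of which is stated on the slides. The function names are hypothetical.

```python
# Figures quoted on the slides.
IMAGE_MPIXELS = 300      # "300 MPixel Image!"
DISPLAY_MPIXELS = 30     # "30 MPixel SunScreen Display"
CLUSTER_NODES = 20       # "20-node Sun Opteron Visualization Cluster"


def mpixels_per_node(display_mpixels, nodes):
    """Pixels each node drives, ASSUMING an even split across nodes."""
    return display_mpixels / nodes


def fraction_visible(image_mpixels, display_mpixels):
    """Share of the image that fits on screen at one image pixel per
    display pixel (an assumption; capped at the whole image)."""
    return min(1.0, display_mpixels / image_mpixels)


per_node = mpixels_per_node(DISPLAY_MPIXELS, CLUSTER_NODES)
visible = fraction_visible(IMAGE_MPIXELS, DISPLAY_MPIXELS)
print(f"{per_node:.1f} MPixel driven per node")
print(f"{visible:.0%} of the image visible at native resolution")
```

Under these assumptions only part of the image fits on the wall at full detail, which is why interactive zooming matters.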

Why Optical Networks Will Become the 21st Century Driver

Scientific American, January 2001

[Chart: Performance per Dollar Spent versus Number of Years (0 to 5)]

Data Storage (bits per square inch): Doubling Time 12 Months

Optical Fiber (bits per second): Doubling Time 9 Months

Silicon Computer Chips (Number of Transistors): Doubling Time 18 Months
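
To see what these doubling times imply over the chart's five-year span, a growth factor can be computed as 2^(elapsed time / doubling time). The sketch below is a minimal illustration using only the three doubling times quoted on the slide; the function name `growth_factor` is hypothetical, and it assumes steady exponential growth over the whole period.

```python
# Doubling times (in months) as quoted on the slide.
DOUBLING_MONTHS = {
    "Optical Fiber (bits per second)": 9,
    "Data Storage (bits per square inch)": 12,
    "Silicon Computer Chips (number of transistors)": 18,
}


def growth_factor(years, doubling_months):
    """Multiplicative improvement after `years`, assuming steady
    exponential growth with the given doubling time."""
    return 2 ** (years * 12 / doubling_months)


for name, months in DOUBLING_MONTHS.items():
    print(f"{name}: about {growth_factor(5, months):.0f}x in 5 years")
```

The shorter the doubling time, the faster the curve pulls away, which is the point of the slide: optical fiber capacity outpaces both storage density and transistor counts.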

Challenge: Average Throughput of NASA Data Products to End User Is Less Than 50 Megabits/s

Tested from GSFC-ICESAT, January 2005

http://ensight.eos.nasa.gov/Missions/icesat/index.shtml

c = λ * f

Solution: Individual 1 or 10Gbps Lightpaths -- “Lambdas on Demand”

(WDM)

Source: Steve Wallach, Chiaro Networks

“Lambdas”
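
The gap between the measured throughput and a dedicated lightpath is easiest to appreciate as transfer time. The sketch below compares the rates named on the slides (50 Mbps, 1 Gbps, 10 Gbps) for a hypothetical 1 TB data set; the data set size and the function name are assumptions for illustration, and the calculation is idealized (it ignores protocol overhead and assumes the full line rate is sustained).

```python
def transfer_hours(dataset_bytes, rate_bits_per_second):
    """Idealized transfer time in hours at a sustained line rate."""
    return dataset_bytes * 8 / rate_bits_per_second / 3600


DATASET_BYTES = 10**12  # hypothetical 1 TB data set (not from the slides)

# Rates named on the slides.
RATES_BPS = {
    "50 Mbps (measured average)": 50e6,
    "1 Gbps lightpath": 1e9,
    "10 Gbps lightpath": 10e9,
}

for label, rate in RATES_BPS.items():
    hours = transfer_hours(DATASET_BYTES, rate)
    print(f"{label}: {hours:.2f} hours")
```

Because transfer time scales inversely with rate, a 10 Gbps lambda is 200 times faster than the 50 Mbps baseline for the same data set.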

National Lambda Rail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers

[Map labels: San Francisco; Pittsburgh; Cleveland; San Diego; Los Angeles; Portland; Seattle; Pensacola; Baton Rouge; Houston; San Antonio; Las Cruces / El Paso; Phoenix; New York City; Washington, DC; Raleigh; Jacksonville; Dallas; Tulsa; Atlanta; Kansas City; Denver; Ogden / Salt Lake City; Boise; Albuquerque; UC-TeraGrid; UIC/NW-Starlight; Chicago]

International Collaborators

NLR: 4 x 10 Gb Lambdas Initially, Capable of 40 x 10 Gb Wavelengths at Buildout

NSF’s TeraGrid Has 4 x 10Gb Lambda Backbone

Links Two Dozen State and Regional Optical Networks

DOE, NSF, & NASA Using NLR

Extending Telepresence with Remote Interactive Analysis of Data Over NLR

HDTV Over Lambda

OptIPuter Visualized Data

SIO/UCSD

NASA Goddard

www.calit2.net/articles/article.php?id=660

August 8, 2005

25 Miles

Venter Institute

First Trans-Pacific Super High Definition Telepresence Meeting in New Calit2 Digital Cinema Auditorium

Keio University President Anzai

UCSD Chancellor Fox

Sony NTT SGI

Lays Technical Basis for Global Scientific Collaboration

September 26-30, 2005

Calit2 @ University of California, San Diego

California Institute for Telecommunications and Information Technology

Calit2@UCSD Is Connected to the World at 10,000 Mbps

iGrid 2005

The Global Lambda Integrated Facility

Maxine Brown, Tom DeFanti, Co-Chairs

www.igrid2005.org

50 Demonstrations, 20 Countries, 10 Gbps/Demo

Calit2 is Partnering with SIO to Prototype a Digital Environment Research System

• Viewing and Analyzing Earth Satellite Data Sets
• Earth Topography
• Atmospheric Brown Clouds
• Climate Modeling
• Surface, Subsurface, and Ocean Floor Observatories
• Coastal Zone Data Assimilation
• Ocean Environmental Metagenomics

John Orcutt, Director, CEOA; Deputy Director, SIO

Smarr March 2005 Talk to SIO Council Led to Calit2 Discussions with Craig Venter

First Remote Interactive High Definition Video Exploration of Deep Sea Vents

Source: John Delaney & Deborah Kelley, UWash

Canadian-U.S. Collaboration

A Near Future Metagenomics Fiber Optic-Enabled Data Generator

Source: John Delaney, UWash

www.sccoos.org

Use SCCOOS As Prototype for Coastal Zone Data Assimilation Testbed

Goal: Link SCCOOS Sites with LambdaGrid to Prototype Future Ocean and Earth Sciences Observing System

Yellow—Proposed Initial Lambda Backbone

Use OptIPuter to Couple Data Assimilation Models to Remote Data Sources Including Biology

Regional Ocean Modeling System (ROMS) http://ourocean.jpl.nasa.gov/

NASA MODIS Mean Primary Productivity for April 2001 in California Current System

Marine Microbial Metagenomics: From Species Genomes to Ecological Genomes

• Each Sequence is a Part of an Entire Biological Community
• Sequences, Genes and Gene Families, Coupled With Environmental Metadata
– Tremendous Potential to Better Understand the Functioning of Natural Ecosystems
• Challenge
– Much More Powerful Information Infrastructure Required to Support Metagenomics

Scripps Genome Center

Dr. Terry Gaasterland

Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World

You Are Here

Source: Carl Woese, et al

Much of Genome Work Has Occurred in Animals

Comparative Genomics Can Reveal Biological Facts That Are Not Visible Within a Species

“After sequencing these three genomes, it is clear that substantial rearrangements in the human genome happen only once in a million years, while the rate of rearrangements in the rat and mouse is much faster.” -- Glenn Tesler, UCSD Dept. of Mathematics

www.calit2.net/culture/features/2004/4-1_pevzner.html

Co-Authors Pavel Pevzner and Glenn Tesler, UCSD

April 1, 2004; December 05, 2002; December 9, 2004

Advanced Algorithmic Techniques Reveal Unexpected Results

“Many of the chicken–human aligned, non-coding sequences occur far from genes, frequently in clusters that seem to be under selection for functions that are not yet understood.”

Nature 432, 695 - 716 (09 December 2004)

David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin, Eleazar Eskin, Dennis G. Ballinger, Kelly A. Frazer, David R. Cox. “Whole-Genome Patterns of Common DNA Variation in Three Human Populations.” Science 18 February 2005: 307(5712):1072-1079.

Calit2 Researcher Eskin Collaborates with Perlegen Sciences on Map of Human Genetic Variation Across Populations

“We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry.”

“Although knowledge of a single genetic risk factor can seldom be used to predict the treatment outcome of a common disease, knowledge of a large fraction of all the major genetic risk factors contributing to a treatment response or common disease could have immediate utility, allowing existing treatment options to be matched to individual patients without requiring additional knowledge of the mechanisms by which the genetic differences lead to different outcomes.”

“More detailed haplotype analysis results are available at http://research.calit2.net/hap/wgha/”

The Bioinformatics Core of the Joint Center for Structural Genomics will be Housed in the Calit2@UCSD Building

Extremely Thermostable -- Useful for Many Industrial Processes (e.g. Chemical and Food)

173 Structures (122 from JCSG)

• Determining the Protein Structures of the Thermotoga maritima Genome
• 122 T.M. Structures Solved by JCSG (75 Unique in the PDB)
• Direct Structural Coverage of 25% of the Expressed Soluble Proteins
• Probably Represents the Highest Structural Coverage of Any Organism

Source: John Wooley, UCSD

Providing Integrated Grid Software and Infrastructure for Multi-Scale BioModeling

[Diagram labels]

Web Portal; Rich Clients

Telescience Portal

Grid Middleware and Web Services

Workflow Middleware

PMV, ADT, Vision, Continuity, APBSCommand

Grid and Cluster Computing Applications Infrastructure

Rocks; Grid of Clusters

APBS, Continuity, Gtomo2, TxBR, Autodock, GAMESS, QMView

National Biomedical Computation Resource, an NIH-Supported Resource Center

Located in Calit2@UCSD Building

Calit2 Intends to Jump Beyond Traditional Web-Accessible Databases

[Diagram labels]

Data Backend (DB, Files)

Web Portal (pre-filtered, queries metadata)

Response / Request

BIRN, PDB, NCBI Genbank + many others

Source: Phil Papadopoulos, SDSC, Calit2

Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server

[Diagram labels]

Flat File Server Farm

Web Portal

+ Web Services

Traditional User: Response / Request

Dedicated Compute Farm (100s of CPUs)

TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs)

Web (other service)

Local Cluster; Local Environment

Direct Access Lambda Cnxns

OptIPuter Cluster Cloud

Data-Base Farm

10 GigE Fabric

Source: Phil Papadopoulos, SDSC, Calit2

What Will Our Core Data Sets Be?

• Metagenomic
– Sargasso Sea + Sorcerer II Expedition (GOS)
– JGI Community Sequencing Project
• Microbial Genomes
– Moore Marine Microbial Project
– JGI Community Sequencing Project
– Other Relevant Genomes (e.g., from Genbank)
• Standard
– Non-Redundant Nucleotide and AA Databases
• Environmental and Satellite Data
– NOAA Oceans and NASA Goddard Satellite Data

Source: Saul Kravitz, Director of Software Engineering, J. Craig Venter Institute

Looking Back Nearly 4 Billion Years in the Evolution of Microbe Genomics

Science Falkowski and Vargas 304 (5667): 58