
Page 1: Next Steps for Working Scientists: Access to Information · funded centralized facilities, including a stock center for the cloned DNA fragments generated in the mapping and sequencing

Next Steps for Working Scientists: Access to Information

( http://www.esp.org/rjr/codata.pdf )

Robert J. Robbins
Fred Hutchinson Cancer Research Center

1100 Fairview Avenue North, LV-101
Seattle, Washington 98109

[email protected]
(206) 667 2920


Abstract

Over the next few years, the relentless exponential effect of Moore’s Law will have profound effects upon the use of computation in science and technology. By 2005, analytical power previously available only at supercomputer centers will exist on every desktop and the volume of electronic data flow will be enormous. Even now, a current Intel computer delivers more MIPS than the first Cray and GenBank acquires more data every ten weeks than it did in its first ten years.

The information infrastructure needed to support the explosion in scientific computation and scientific data will be substantial. If working scientists are to have adequate access to these resources, significant changes in the way information infrastructure is provided will be required.


Topics

• Moore’s Law constantly transforms IT (and everything else).

• Information Technology (IT) has a special relationship with biology.

• Current approaches to supporting bio-information infrastructure seem inadequate for 21st-century biology.

• Without better support, much post-genome-era biology may shift entirely into the private sector.


Moore’s Law

Transforms InfoTech (and everything else)


Moore’s Law: The Statement

Every eighteen months, the number of transistors that can be placed on a chip doubles.

Gordon Moore, co-founder of Intel...
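As a rough illustration of the statement above, the doubling rule can be turned into a growth factor. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and the assumption that cost falls in exact inverse proportion to performance is an idealization.

```python
DOUBLING_MONTHS = 18  # doubling period quoted on the slide


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Multiplicative increase after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_months)


def relative_cost(months, doubling_months=DOUBLING_MONTHS):
    """Cost of a fixed level of performance relative to today.

    Assumes cost falls in inverse proportion to the growth factor.
    """
    return 1 / growth_factor(months, doubling_months)


if __name__ == "__main__":
    for years in (3, 6, 10):
        months = 12 * years
        print(f"{years:>2} years: x{growth_factor(months):,.1f} performance, "
              f"{relative_cost(months):.4f} of today's cost")
```

Under these assumptions, three years gives a fourfold increase and ten years roughly a hundredfold increase.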

Moore’s Law: The Effect

[Figure: performance at constant cost versus time, on a logarithmic axis running from 10 to 100,000.]

The computational performance that can be obtained at constant cost increases exponentially.

Moore’s Law: The Effect

[Figure: performance at constant cost and cost at constant performance versus time, each on a logarithmic axis running from 10 to 100,000.]

Similarly, the cost of fixed computational performance declines exponentially.


Moore’s Law: The Effect

Three Phases of Novel IT Applications

• It’s Impossible

• It’s Impractical

• It’s Overdue

In many fields, those who are overdue with key IT projects have experienced catastrophic losses in competitive advantage.

Moore’s Law: The Effect

[Figure: the same performance and cost curves versus time, with points labeled P, D, C, and A added over successive slides.]

Relevance for biology?

Cost (constant performance)

[Figure: cost at constant performance, on a logarithmic axis from 1,000 to 10,000,000, plotted against year from 1975 to 2005. Successive slides add the labels University Purchase, Department Purchase, RO1 Grant Purchase, and Personal Purchase.]

These prices are in uncorrected dollars, so the change is even more profound than the figure suggests.

Catching the Wave

Catching the Wave

Fields Transformed by IT:

• finance & banking

• travel

• discount retailing

• biomedical research ?

Why biomedical research? (i) biology is inherently information rich, (ii) appropriately powered computers are now affordable for the research community, and (iii) post-genome biology will thrive on computation.

IT-Biology Synergism

IT is Special

Information Technology:

• affects the performance and the management of tasks

• allows the manipulation of huge amounts of highly complex data

• is incredibly plastic (programming and poetry are both exercises in pure thought)

• improves exponentially (Moore’s Law)

Biology is Special

Life is Characterized by:

• individuality

• historicity

• contingency

• high (digital) information content

No law of large numbers...

IT-Biology Synergism

• Physics needs calculus, the method for manipulating information about statistically large numbers of vanishingly small, independent, equivalent things.

• Biology needs information technology, the method for manipulating information about large numbers of dependent, historically contingent, individual things.

Biology is Special

For it is in relation to the statistical point of view that the structure of the vital parts of living organisms differs so entirely from that of any piece of matter that we physicists and chemists have ever handled in our laboratories or mentally at our writing desks.

Erwin Schrödinger. 1944. What is Life.

The Digital Basis of Life

[The] chromosomes ... contain in some kind of code-script the entire pattern of the individual’s future development and of its functioning in the mature state. ... [By] code-script we mean that the all-penetrating mind, once conceived by Laplace, to which every causal connection lay immediately open, could tell from their structure whether [an egg carrying them] would develop, under suitable conditions, into a black cock or into a speckled hen, into a fly or a maize plant, a rhododendron, a beetle, a mouse, or a woman.

Erwin Schrödinger. 1944. What is Life.

The Digital Basis of Life

We now know that Schrödinger’s mysterious human “code-script” consists of 3.3 billion base pairs of DNA.

Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted, it could be stored on one CD-ROM. Biologically encoded, it fits easily within a single cell.

Information is passed from parent to child in a form that is genuinely, not metaphorically, digital. The biological encoding of digital information is incredibly efficient.
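The figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two-bits-per-base packing is an assumption (the slide does not say how the sequence would be digitally formatted), and the function names are hypothetical.

```python
BASE_PAIRS = 3_300_000_000   # human genome size quoted on the slide
CHARS_PER_INCH = 10          # 10-pitch type prints ten characters per inch
INCHES_PER_MILE = 63_360     # 5,280 feet x 12 inches
BITS_PER_BASE = 2            # assumption: A, C, G, T packed into two bits each


def printed_length_miles(bases=BASE_PAIRS):
    """Length of the sequence typed as one line of 10-pitch characters."""
    return bases / CHARS_PER_INCH / INCHES_PER_MILE


def packed_size_bytes(bases=BASE_PAIRS):
    """Uncompressed size when each base is stored in BITS_PER_BASE bits."""
    return bases * BITS_PER_BASE // 8


if __name__ == "__main__":
    print(f"printed length: {printed_length_miles():,.0f} miles")
    print(f"packed size:    {packed_size_bytes():,} bytes")
```

Under these assumptions the printed line comes to a little over 5,200 miles, consistent with the slide, and the packed sequence to about 825 million bytes before any compression, which is the same order of magnitude as the capacity of a CD-ROM.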

Bio-digital Information

DNA is a highly efficient digital storage device:

• There is more mass-storage capacity in the DNA of a side of beef than in all the hard drives of all the world’s computers.

• Storing all of the (redundant) information in all of the world’s DNA on computer hard disks would require that the entire surface of the Earth be covered to a depth of three miles in Conner 1.0 GB drives.

Genomics: An Example

Human Genome Project - Goals

– construction of a high-resolution genetic map of the human genome;

– production of a variety of physical maps of all human chromosomes and of the DNA of selected model organisms;

– determination of the complete sequence of human DNA and of the DNA of selected model organisms;

– development of capabilities for collecting, storing, distributing, and analyzing the data produced;

– creation of appropriate technologies necessary to achieve these objectives.

USDOE. 1990. Understanding Our Genetic Inheritance. The U.S. Human Genome Project: The First Five Years.

Infrastructure and the HGP

Progress towards all of the [Genome Project] goals will require the establishment of well-funded centralized facilities, including a stock center for the cloned DNA fragments generated in the mapping and sequencing effort and a data center for the computer-based collection and distribution of large amounts of DNA sequence information.

National Research Council. 1988. Mapping and Sequencing the Human Genome. Washington, DC: National Academy Press. p. 3

Base Pairs in GenBank

[Figure: base pairs in GenBank, from 0 to 1,200,000,000, plotted against GenBank release number (0 to 105), with the years 87 through 97 marked along the axis.]

Growth in GenBank is spectacular. More data were added in the last 10 weeks than were added in the first 10 years of the project.

At this rate, what’s next...

ABI Bass-o-Matic Sequencer

In with the sample, out with the sequence...

[Figure: a mock sequencer with a numeric keypad, “Enter” and “Defrost” buttons, a speed control labeled “gB/min”, and the output sequence TGCGCATCGCGTATCGATAG.]

What’s Really Next

The post-genome era will take for granted ready access to huge amounts of genomic data.

The challenge will be understanding those data and using the understanding to solve real-world problems...

The path to understanding will require even more data...

21st Century Biology

The Science

Fundamental Dogma

The fundamental dogma of molecular biology is that genes act to create phenotypes through a flow of information from DNA to RNA to proteins, to interactions among proteins (regulatory circuits and metabolic pathways), and ultimately to phenotypes.

Collections of individual phenotypes, of course, constitute a population.

DNA → RNA → Proteins → Circuits → Phenotypes → Populations
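The first two steps of this flow (DNA to RNA to protein) can be illustrated with a toy example. This is only a sketch: the twelve-base input sequence is hypothetical, the codon table is deliberately partial (it holds just the four codons the example needs), and the function names are illustrative.

```python
# Partial codon table: only the codons used in the example below.
# "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "GCC": "A", "UUU": "F", "UAA": "*"}


def transcribe(coding_dna):
    """Return the mRNA corresponding to a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCTTTTAA"  # hypothetical toy sequence
    rna = transcribe(dna)
    print(rna, "->", translate(rna))
```

The later steps in the flow (circuits, phenotypes, populations) have no comparably simple lookup rule.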

Fundamental Dogma

[Figure: the flow DNA → RNA → Proteins → Circuits → Phenotypes → Populations, annotated with existing databases (GenBank, EMBL, DDBJ, map databases, SwissPROT, PIR, PDB) and with further topics marked with question marks: Gene Expression? Clinical Data? Regulatory Pathways? Metabolism? Biodiversity? Neuroanatomy? Development? Molecular Epidemiology? Comparative Genomics?]

Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.

21st Century Biology

Data Volume

Base Pairs in GenBank (changes)

[Figure: changes in base pairs in GenBank, from 0 to 140,000,000, plotted against GenBank release number (0 to 105), with the years 87 through 97 marked along the axis.]

Base Pairs in GenBank (Percent Increase)

[Figure: percent increase in GenBank base pairs, from 0% to 100%, plotted by year from 85 to 96.]

Average = 56%

Projected Base Pairs

[Figure: projected base pairs, on a logarithmic axis from 1 to 100,000,000,000,000, plotted against year from 1990 to 2025.]

Assumed annual growth rate: 50% (less than current rate)
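A projection of this kind is plain compound growth. The sketch below shows the calculation under stated assumptions: the 50% rate is the one given on the slide, but the starting point of one billion base pairs in 1997 is a hypothetical round number chosen for illustration, and the function names are not from the original.

```python
import math

ANNUAL_GROWTH = 0.50            # rate assumed on the slide
START_YEAR = 1997               # assumption, for illustration only
START_BASE_PAIRS = 1e9          # assumption: hypothetical round starting value


def project(base_pairs_now, annual_growth, years):
    """Compound growth: size after `years` at a fixed annual rate."""
    return base_pairs_now * (1 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    print(f"doubling time: {doubling_time_years(ANNUAL_GROWTH):.2f} years")
    for year in (2005, 2015, 2025):
        size = project(START_BASE_PAIRS, ANNUAL_GROWTH, year - START_YEAR)
        print(f"{year}: {size:.2e} base pairs")
```

At 50% per year the collection doubles roughly every 1.7 years, which is why the projection is plotted on a logarithmic axis.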

Projected Base Pairs

[Figure: the same projection, annotated at three points with 5,000, 50,000, and 500,000.]

Ridiculous growth numbers, indicated as number of base pairs per individual medical record in the US.

21st Century Biology

Post-Genome Era

The Post-Genome Era

Post-genome research involves:

• applying genomic tools and knowledge to more general problems

• asking new questions, tractable only to genomic or post-genomic analysis

• moving beyond the structural genomics of the human genome project and into the functional genomics of the post-genome era


The Post-Genome Era

Suggested definition:

• functional genomics = biology

The Post-Genome Era

An early analysis:

Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.

Paradigm Shift in Biology

To use [the] flood of knowledge, which will pour across the computer networks of the world, biologists not only must become computer literate, but also change their approach to the problem of understanding life.

Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.

Paradigm Shift in Biology

The new paradigm, now emerging, is that all the ‘genes’ will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture, only then turning to experiment to follow or test that hypothesis.

Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.


Paradigm Shift in Biology

Case of Microbiology

< 5,000 known and described bacteria

5,000,000 base pairs per genome

25,000,000,000 TOTAL base pairs

If a full, annotated sequence were available for all known bacteria, the practice of microbiology would match Gilbert’s prediction.
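The total on this slide is a simple product, and the same numbers give a feel for the storage involved. In this sketch the species count and genome size come from the slide, while the two-bits-per-base packing and the variable names are assumptions made for illustration.

```python
KNOWN_BACTERIA = 5_000             # upper bound quoted on the slide
BASE_PAIRS_PER_GENOME = 5_000_000  # per-genome size quoted on the slide
BITS_PER_BASE = 2                  # assumption: four bases packed into two bits

total_base_pairs = KNOWN_BACTERIA * BASE_PAIRS_PER_GENOME
raw_bytes = total_base_pairs * BITS_PER_BASE // 8

if __name__ == "__main__":
    print(f"total base pairs: {total_base_pairs:,}")
    print(f"raw sequence:     {raw_bytes:,} bytes (sequence only, no annotation)")
```

The product reproduces the slide's 25,000,000,000 base pairs. The byte count covers the bare sequence only, so an annotated collection would be larger.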

Funding for Bio-Information Infrastructure


Call for Change

Among the many new tools that are or will be needed (for 21st-century biology), some of those having the highest priority are:

• bioinformatics

• computational biology

• functional imaging tools using biosensors and biomarkers

• transformation and transient expression technologies

• nanotechnologies

Impact of Emerging Technologies on the Biological Sciences: Report of a Workshop. NSF-supported workshop, held 26-27 June 1995, Washington, DC.

The Problem

• IT will play a central role in 21st-century, post-genome-era biology.

• IT moves at “Internet Speed” and responds rapidly to market forces.

• Current levels of support for public bio-information infrastructure are too low.

• Compared to Internet speed, federal grant-funding decision processes are ponderously slow and inefficient.

Federal Funding of Bio-Databases

The challenges:

• providing adequate funding levels

• making timely, efficient decisions

IT Budgets

A Reality Check

Rhetorical Question

Which is likely to be more complex:

• identifying, documenting, and tracking the whereabouts of all parcels in transit in the US at one time, or...

• identifying, documenting, and analyzing the structure and function of all individual genes in all economically significant organisms; then analyzing all significant gene-gene and gene-environment interactions in those organisms and their environments.

Business Factoids

United Parcel Service:

• uses two redundant 3 Terabyte (yes, 3000 GB) databases to track all packages in transit.

• has 4,000 full-time employees dedicated to IT.

• spends one billion dollars per year on IT.

• has an income of 1.1 billion dollars (against revenues of 22.4 billion dollars).

Business Comparisons

Company                     Revenues ($)   IT Budget ($)       Pct
Bristol-Myers Squibb      15,065,000,000     440,000,000    2.92 %
Pfizer                    11,306,000,000     300,000,000    2.65 %
Pacific Gas & Electric    10,000,000,000     250,000,000    2.50 %
K-Mart                    31,437,000,000     130,000,000    0.41 %
Wal-Mart                 104,859,000,000     550,000,000    0.52 %
Sprint                    14,235,000,000     873,000,000    6.13 %
MCI                       18,500,000,000   1,000,000,000    5.41 %
United Parcel             22,400,000,000   1,000,000,000    4.46 %
AMR Corporation           17,753,000,000   1,368,000,000    7.71 %
IBM                       75,947,000,000   4,400,000,000    5.79 %
Microsoft                 11,360,000,000     510,000,000    4.49 %
Chase-Manhattan           16,431,000,000   1,800,000,000   10.95 %
Nation’s Bank             17,509,000,000   1,130,000,000    6.45 %
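The Pct column is simply IT budget divided by revenues. A minimal Python sketch that recomputes it from the figures on the slide follows; the names `COMPANIES` and `it_share` are illustrative and do not come from the talk.

```python
# Revenues and IT budgets in dollars, copied from the Business Comparisons slide.
COMPANIES = {
    "Bristol-Myers Squibb": (15_065_000_000, 440_000_000),
    "Pfizer": (11_306_000_000, 300_000_000),
    "Pacific Gas & Electric": (10_000_000_000, 250_000_000),
    "K-Mart": (31_437_000_000, 130_000_000),
    "Wal-Mart": (104_859_000_000, 550_000_000),
    "Sprint": (14_235_000_000, 873_000_000),
    "MCI": (18_500_000_000, 1_000_000_000),
    "United Parcel": (22_400_000_000, 1_000_000_000),
    "AMR Corporation": (17_753_000_000, 1_368_000_000),
    "IBM": (75_947_000_000, 4_400_000_000),
    "Microsoft": (11_360_000_000, 510_000_000),
    "Chase-Manhattan": (16_431_000_000, 1_800_000_000),
    "Nation's Bank": (17_509_000_000, 1_130_000_000),
}


def it_share(revenues, it_budget):
    """Return the IT budget as a percentage of revenues."""
    return 100.0 * it_budget / revenues


# Percentage for every company, keyed by name.
shares = {name: it_share(rev, it) for name, (rev, it) in COMPANIES.items()}

# Print the companies from the largest IT share to the smallest.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{pct:6.2f} %")
```

Sorting by share makes the pattern easier to see: the information-intensive businesses (banking, airlines, telecommunications) sit at the top, the retailers at the bottom.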

Bio IT Support

A Modest Proposal

Level of Support

Appropriate funding level:

• approx. 5-10% of research funding

• i.e., 1 - 2 billion dollars per year

Source of estimate:

- Experience of IT-transformed industries.

- Current support for IT-rich biological research.
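The slide gives a percentage range and a dollar range but does not state the research-funding total they are taken from. As a rough check, the sketch below back-solves for that total; the resulting figure is an inference from the two ranges, not a number given in the talk, and `implied_base` is a hypothetical helper name.

```python
def implied_base(it_dollars, share_percent):
    """Research-funding total implied by an IT amount and its percentage share.

    Integer arithmetic is used so the result is exact.
    """
    return it_dollars * 100 // share_percent


# Low end of both ranges: 1 billion dollars at 5%.
low_end = implied_base(1_000_000_000, 5)
# High end of both ranges: 2 billion dollars at 10%.
high_end = implied_base(2_000_000_000, 10)

print(f"implied base (low end):  {low_end:,} dollars per year")
print(f"implied base (high end): {high_end:,} dollars per year")
```

Both ends of the range point to the same implied base, which suggests the two bullets were derived from a single research-funding estimate.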

Process of Support

Possible solutions:

• increase the direct support of federal service organizations providing information infrastructure (e.g., NCBI).

• reduce supply-side support for investigator-initiated, grant-funded public database projects.

• increase demand-side support for market-provided biomedical information resources.

Market Forces

[Diagram: Vendors supply products and services to Buyers; Buyers pay Vendors ($) through purchases.]

In a simple market economy, vendors try to anticipate the needs of buyers and offer products and services to meet those needs.

Real users decide whether or not to buy a product or service, depending upon whether or not it meets a real need at a reasonable price.

Business 101 Insight:

Successful vendors target a niche and excel at meeting the needs of that niche.

Market Forces

[Diagram: Investors (venture capital, stock offerings) provide vendor investment ($) to Vendors; Vendors supply products and services to Buyers; Buyers pay Vendors ($) through purchases.]

Funding to initiate the development of products and services comes from investors, not from buyers.

Investors decide whether or not to provide start-up funding based upon the estimated ability of the vendor to create products and services that will meet real needs at competitive prices.

Federal Funding

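[Diagram: Investors provide funding ($) to the Database; the Database supplies products and services to Users; Users pay the Database ($) through purchases.]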

If biological databases were driven by market forces, individual users would choose what services they need and individual database providers would choose what services to make available.

Investors would provide start-up money on the likelihood of successful products and services being developed.

Ultimate success would depend on meeting the needs of real users. Decisions could be made rapidly, in response to changing needs and emerging opportunities.

Federal Funding

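[Diagram: funding ($) and decisions pass among Congress, OMB, the Agency, Other Agencies, Agency Advisors, Reviewers, and Database Advisors before reaching the Database, which supplies products and services to Users.]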

Instead, funding decisions for grant-supported biological databases can follow a ponderously slow course, with almost no opportunity for real-time input from real users.

Even with the best of intentions at all levels, this process is slow, inefficient, risk-averse, and non-responsive to the real and changing needs of users.

Federal Funding of Bio-Databases

Creating market forces:

• stop supporting the supply side of biodatabases through slow, inefficient processes.

• start supporting the demand side through fast, efficient processes.

• provide guaranteed supplementary funding, redeemable only for access to bio-databases.

• data stamps, AKA food (for thought) stamps?!

Food (for thought) Stamps

Funding Agencies could:

• provide a 10% supplement to every research grant in the form of “stamps” redeemable only at database providers.

• allow the “stamps” to be transferable among scientists, so that a market for them could emerge.

• provide funding only after the stamps have been redeemed at a database provider.
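The three rules above (a 10% supplement issued as stamps, transfers between scientists, and agency payment only on redemption) can be sketched as a small ledger. This is a toy illustration under stated assumptions, not a design from the talk: the class `StampLedger`, its method names, and the names "alice", "bob", and "ExampleDB" are all hypothetical, and one stamp is assumed to be worth one dollar.

```python
from dataclasses import dataclass, field

SUPPLEMENT_PERCENT = 10  # the 10% supplement proposed on the slide


@dataclass
class StampLedger:
    """Hypothetical ledger for the stamp scheme; one stamp is assumed to equal one dollar."""
    approved_providers: set
    balances: dict = field(default_factory=dict)  # scientist -> unredeemed stamps
    paid_out: dict = field(default_factory=dict)  # provider -> dollars paid by the agency

    def award_grant(self, scientist, grant_dollars):
        """Issue stamps worth 10% of the grant; no money moves yet."""
        stamps = grant_dollars * SUPPLEMENT_PERCENT // 100
        self.balances[scientist] = self.balances.get(scientist, 0) + stamps
        return stamps

    def transfer(self, sender, receiver, amount):
        """Move stamps between scientists, so that a market for them could emerge."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid or insufficient stamp balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount

    def redeem(self, scientist, provider, amount):
        """Spend stamps at an approved provider; only now does the agency pay."""
        if provider not in self.approved_providers:
            raise ValueError("provider is not approved")
        if amount <= 0 or self.balances.get(scientist, 0) < amount:
            raise ValueError("invalid or insufficient stamp balance")
        self.balances[scientist] -= amount
        self.paid_out[provider] = self.paid_out.get(provider, 0) + amount


ledger = StampLedger(approved_providers={"ExampleDB"})
ledger.award_grant("alice", 200_000)      # alice receives stamps, not cash
ledger.transfer("alice", "bob", 5_000)    # stamps change hands
ledger.redeem("bob", "ExampleDB", 3_000)  # agency funds flow to the provider
print(ledger.balances, ledger.paid_out)
```

Keeping the agency payment inside `redeem` mirrors the third rule: providers are funded only to the extent that real users choose to spend stamps with them.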

Food (for thought) Stamps

Problems:

• how to estimate the amount of FFT stamps that would actually be redeemed (and thus the required budget set-aside).

• how to identify “approved” database providers.

• how to initiate the FFT system.

• etc.
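The first problem reduces to one unknown: the fraction of issued stamps that will actually be redeemed. A minimal sketch of the set-aside arithmetic follows; the grant amounts and redemption rates are made-up inputs chosen only to show the calculation, and `required_set_aside` is a hypothetical name.

```python
def required_set_aside(grant_totals, supplement_pct=10, redemption_rate=1.0):
    """Dollars an agency would need to reserve for stamp redemptions.

    grant_totals    -- iterable of grant sizes in dollars
    supplement_pct  -- stamp supplement as a percentage of each grant (10 on the slide)
    redemption_rate -- assumed fraction of stamps redeemed, between 0 and 1
    """
    if not 0.0 <= redemption_rate <= 1.0:
        raise ValueError("redemption_rate must be between 0 and 1")
    issued = sum(grant_totals) * supplement_pct / 100
    return issued * redemption_rate


# Hypothetical portfolio of three grants.
grants = [100_000, 300_000, 600_000]

# The set-aside scales linearly with the assumed redemption rate.
for rate in (0.25, 0.5, 1.0):
    print(f"redemption rate {rate:.2f}: {required_set_aside(grants, redemption_rate=rate):,.0f} dollars")
```

Because the rate is unknown before the system exists, the worst case (every stamp redeemed) bounds the set-aside at the full supplement.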

Food (for thought) Stamps

Alternatives (if no solution emerges):

• increasingly inefficient research activities (abject failure will occur when it becomes simpler to repeat research than to obtain prior results).

• loss of access to bio-databases for public-sector research.

• movement of the majority of “important” biological research into the private sector.

• loss of American pre-eminence (if other countries solve the problems first).

Slides:

http://www.esp.org/rjr/codata.pdf