![Page 1: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/1.jpg)
What is Biological Informatics?( http://www.esp.org/rjr/canberra.pdf )
Robert J. RobbinsFred Hutchinson Cancer Research Center
1100 Fairview Avenue North, LV-101Seattle, Washington 98109
[email protected]://www.esp.org/rjr
(206) 667 2920
![Page 2: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/2.jpg)
AbstractAbstract
In the last 25 years, Moore's Law has transformed society, delivering exponentiallybetter computers at exponentially lower prices. Biological informatics is theapplication of powerful, affordable information technology to the problems ofbiology. With $2500 desktop PCs now delivering more raw computing power thanthe first Cray, bioinformatics is rapidly becoming the critical technology for 21st-century biology.
DNA is legitimately seen as a biological mass-storage device, makingbioinformatics asine qua nonfor genomic research. Others areas of biologicalinvestigation are equally information rich –– an exhaustive tabulation of the Earth'sbiodiversity would involve a cross–index of the millions of known species againstthe approximately 500,000,000,000,000 square meters of the Earth's surface.
Bioinformatics is also becoming a scholarly discipline in its own right, meldinginformation science with computer science, seasoning it with engineering methods,and applying it to the most information rich component of the known universe ––the Biosphere.
![Page 3: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/3.jpg)
3
What is Biological Informatics?
What are the differences among:• biological informatics• bioinformatics• informatics• computational biology
What are the differences among:• biological informatics• bioinformatics• informatics• computational biology
![Page 4: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/4.jpg)
4
What is Bioinformatics?
Semantic tuning:• X-informatics is the application of
information technology to discipline X,with emphasis on persistent data stores.
• Computational X is the application ofinformation technology to discipline X,with emphasis analytical algorithms.
Semantic tuning:• X-informatics is the application of
information technology to discipline X,with emphasis on persistent data stores.
• Computational X is the application ofinformation technology to discipline X,with emphasis analytical algorithms.
![Page 5: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/5.jpg)
5
What is Bioinformatics?
Bioinformatics is:• the use of computers (and persistent data
structures) in pursuit of biological research.• an emerging new discipline, with its own
goals, research program, and practitioners.• thesine qua nonfor 21st-century biology.• the mostsignificantly underfunded
component of 21st-century biology.• all of the above.
Bioinformatics is:• the use of computers (and persistent data
structures) in pursuit of biological research.• an emerging new discipline, with its own
goals, research program, and practitioners.• thesine qua nonfor 21st-century biology.• the mostsignificantly underfunded
component of 21st-century biology.• all of the above.
![Page 6: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/6.jpg)
6
Topics
• Biotechnology and information technology will be the“magic” technologies of the 21st Century.
• Moore’s Law constantly transforms IT (and everythingelse).
• Information Technology (IT) has a special relationshipwith biology.
• 21st-century biology will be based on bioinformatics.• Bioinformatics is emerging as an independent
discipline.• A connected, federated information infrastructure for
21st-century biology is needed.• Current support for public bio-information
infrastructure seems inadequate.
• Biotechnology and information technology will be the“magic” technologies of the 21st Century.
• Moore’s Law constantly transforms IT (and everythingelse).
• Information Technology (IT) has a special relationshipwith biology.
• 21st-century biology will be based on bioinformatics.• Bioinformatics is emerging as an independent
discipline.• A connected, federated information infrastructure for
21st-century biology is needed.• Current support for public bio-information
infrastructure seems inadequate.
![Page 7: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/7.jpg)
Introduction
MagicalTechnology
![Page 8: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/8.jpg)
8
Magic
To a person from 1897, much currenttechnology would seem like magic.
What technology of 2097 would seemmagical to a person from 1997?
Candidate: Biotechnology so advanced that the distinctionbetween living and non-living is blurred.
Information technology so advanced that accessto information is immediate and universal.
Candidate: Biotechnology so advanced that the distinctionbetween living and non-living is blurred.
Information technology so advanced that accessto information is immediate and universal.
![Page 9: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/9.jpg)
Moore’s Law
Transforms InfoTech(and everything else)
![Page 10: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/10.jpg)
10
Moore’s Law: The Statement
Every eighteen months, thenumber of transistors that canbe placed on a chip doubles.
Gordon Moore, co-founder of Intel...Gordon Moore, co-founder of Intel...
![Page 11: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/11.jpg)
11
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Time
100,000
10,000
1,000
100
10
![Page 12: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/12.jpg)
12
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
![Page 13: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/13.jpg)
13
Moore’s Law: The Effect
Three Phases of Novel IT Applications
• It’s Impossible
• It’s Impractical
• It’s Overdue
Three Phases of Novel IT Applications
• It’s Impossible
• It’s Impractical
• It’s Overdue
![Page 14: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/14.jpg)
14
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
![Page 15: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/15.jpg)
15
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
![Page 16: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/16.jpg)
16
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
![Page 17: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/17.jpg)
17
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
![Page 18: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/18.jpg)
18
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
C
![Page 19: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/19.jpg)
19
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
C
![Page 20: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/20.jpg)
20
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
AA
C
![Page 21: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/21.jpg)
21
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
A
C
![Page 22: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/22.jpg)
22
Moore’s Law: The EffectP
erfo
rman
ce(c
on
sta
ntc
ost
)
Cos
t(c
on
sta
nt
pe
rfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
A
C
Relevance for biology?Relevance for biology?
![Page 23: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/23.jpg)
23
Cost (constant performance)
1975 1980 1985 1990 1995 2000 2005
1,000
10,000
100,000
1,000,000
10,000,000 UniversityPurchase
![Page 24: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/24.jpg)
24
Cost (constant performance)
1975 1980 1985 1990 1995 2000 2005
1,000
10,000
100,000
1,000,000
10,000,000 UniversityPurchase
DepartmentPurchase
![Page 25: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/25.jpg)
25
Cost (constant performance)
1975 1980 1985 1990 1995 2000 2005
1,000
10,000
100,000
1,000,000
10,000,000
ProjectPurchase
UniversityPurchase
DepartmentPurchase
![Page 26: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/26.jpg)
26
Cost (constant performance)
1975 1980 1985 1990 1995 2000 2005
1,000
10,000
100,000
1,000,000
10,000,000
PersonalPurchase
ProjectPurchase
UniversityPurchase
DepartmentPurchase
![Page 27: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/27.jpg)
27
Cost (constant performance)
1975 1980 1985 1990 1995 2000 2005
1,000
10,000
100,000
1,000,000
10,000,000
PersonalPurchase
ProjectPurchase
UniversityPurchase
DepartmentPurchase
UnknownPurchases
![Page 28: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/28.jpg)
IT-BiologySynergismIT-BiologySynergism
![Page 29: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/29.jpg)
29
IT is Special
Information Technology:
• affects the performanceand themanagement of tasks
• allows the manipulation of hugeamounts of highly complex data
• is incredibly plastic
Information Technology:
• affects the performanceand themanagement of tasks
• allows the manipulation of hugeamounts of highly complex data
• is incredibly plastic(programming and poetry are both exercises in pure thought)
• improves exponentially(Moore’s Law)
![Page 30: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/30.jpg)
30
Biology is Special
Life is Characterized by:
• individuality
• historicity
• contingency
• high (digital) information content
Life is Characterized by:
• individuality
• historicity
• contingency
• high (digital) information content
No law of large numbers, since everyliving thing is genuinely unique.No law of large numbers, since everyliving thing is genuinely unique.
![Page 31: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/31.jpg)
31
IT-Biology Synergism
• Physics needs calculus, the method formanipulating information aboutstatistically large numbers of vanishinglysmall, independent, equivalent things.
• Biology needs information technology, themethod for manipulating informationabout large numbers of dependent,historically contingent, individual things.
• Physics needs calculus, the method formanipulating information aboutstatistically large numbers of vanishinglysmall, independent, equivalent things.
• Biology needs information technology, themethod for manipulating informationabout large numbers of dependent,historically contingent, individual things.
![Page 32: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/32.jpg)
32
Biology is Special
For it is in relation to the statistical point of viewthat the structure of the vital parts of livingorganisms differs so entirely from that of anypiece of matter that we physicists and chemistshave ever handled in our laboratories ormentally at our writing desks.
For it is in relation to the statistical point of viewthat the structure of the vital parts of livingorganisms differs so entirely from that of anypiece of matter that we physicists and chemistshave ever handled in our laboratories ormentally at our writing desks.
Erwin Schrödinger. 1944.What is Life.Erwin Schrödinger. 1944.What is Life.
![Page 33: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/33.jpg)
33
Genetics as Code
[The] chromosomes ... contain in some kind of code-script the entire pattern of the individual's futuredevelopment and of its functioning in the mature state.... [By] code-script we mean that the all-penetratingmind, once conceived by Laplace, to which everycausal connection lay immediately open, could tellfrom their structure whether [an egg carrying them]would develop, under suitable conditions, into a blackcock or into a speckled hen, into a fly or a maize plant,a rhodo-dendron, a beetle, a mouse, or a woman.
[The] chromosomes ... contain in some kind of code-script the entire pattern of the individual's futuredevelopment and of its functioning in the mature state.... [By] code-script we mean that the all-penetratingmind, once conceived by Laplace, to which everycausal connection lay immediately open, could tellfrom their structure whether [an egg carrying them]would develop, under suitable conditions, into a blackcock or into a speckled hen, into a fly or a maize plant,a rhodo-dendron, a beetle, a mouse, or a woman.
Erwin Schrödinger. 1944.What is Life.Erwin Schrödinger. 1944.What is Life.
![Page 34: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/34.jpg)
34
One Human Sequence
Typed in 10-pitch font, one human sequence would stretch for morethan 5,000 miles. Digitally formatted, it could be stored on one CD-ROM. Biologically encoded, it fits easily within a single cell.
We now know thatSchrödinger’s mysterioushuman “code-script”consists of 3.3 billionbase pairs of DNA.
![Page 35: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/35.jpg)
35
One Human Sequence
For details on Ripley’s interest in DNA, see
HTTP://LX1.SELU.COM/~rjr/factoids/genlen.html
A variant of this factoid actuallymade it intoRipley’s Believe It orNot, but that’s another story...
![Page 36: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/36.jpg)
36
Bio-digital Information
DNA is a highly efficient digital storage device:
• There is more mass-storage capacity in theDNA of a side of beef than in all the hard drivesof all the world’s computers.
• Storing all of the (redundant) information in allof the world’s DNA on computer hard diskswould require that the entire surface of the Earthbe covered to a depth of three miles in Conner1.0 gB drives.
DNA is a highly efficient digital storage device:
• There is more mass-storage capacity in theDNA of a side of beef than in all the hard drivesof all the world’s computers.
• Storing all of the (redundant) information in allof the world’s DNA on computer hard diskswould require that the entire surface of the Earthbe covered to a depth of three miles in Conner1.0 gB drives.
![Page 37: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/37.jpg)
Genomics:An ExampleGenomics:
An Example
![Page 38: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/38.jpg)
38
Human Genome Project - Goals– construction of a high-resolution genetic map of the human
genome;
– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;
– determination of the complete sequence of human DNA andof the DNA of selected model organisms;
– development of capabilities for collecting, storing,distributing, and analyzing the data produced;
– creation of appropriate technologies necessary to achievethese objectives.
– construction of a high-resolution genetic map of the humangenome;
– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;
– determination of the complete sequence of human DNA andof the DNA of selected model organisms;
– development of capabilities for collecting, storing,distributing, and analyzing the data produced;
– creation of appropriate technologies necessary to achievethese objectives.
USDOE. 1990.Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
USDOE. 1990.Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
![Page 39: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/39.jpg)
39
Infrastructure and the HGP
Progress towards all of the [Genome Project]goals will require the establishment of well-funded centralized facilities, including a stockcenter for the cloned DNA fragmentsgenerated in the mapping and sequencingeffort and a data center for the computer-basedcollection and distribution of large amounts ofDNA sequence information.
Progress towards all of the [Genome Project]goals will require the establishment of well-funded centralized facilities, including a stockcenter for the cloned DNA fragmentsgenerated in the mapping and sequencingeffort and a data center for the computer-basedcollection and distribution of large amounts ofDNA sequence information.
National Research Council. 1988.Mapping and Sequencing theHuman Genome. Washington, DC: National Academy Press. p. 3
National Research Council. 1988.Mapping and Sequencing theHuman Genome. Washington, DC: National Academy Press. p. 3
![Page 40: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/40.jpg)
40
GenBank Totals(Release 103)
DIVISION
Phage Sequences (PHG)Viral Sequences (VRL)
Bacteria (BCT)
Plant, Fungal, and Algal Sequences (PLN)
Invertebrate Sequences (INV)
Rodent Sequences (ROD)Primate Sequences (PRI1–2)
Other Mammals (MAM)Other Vertebrate Sequences (VRT)
High-Throughput Genome Sequences (HTG)
Genome Survey Sequences (GSS)Structural RNA Sequences (RNA)
Sequence Tagged Sites Sequences (STS)Patent Sequences (PAT)
Synthetic Sequences (SYN)Unannotated Sequences (UNA)
EST1-17
TOTALS
Entries
1,31345,35538,023
44,553
29,657
36,96775,58712,74417,713
1,120
42,6284,802
52,82487,767
2,5772,480
1,269,737
1,765,847
Base Pairs
2,138,81044,484,84888,576,641
92,259,434
105,703,550
45,437,309134,944,314
12,358,31017,040,159
72,064,395
22,783,3262,487,397
18,161,53227,593,724
5,698,9451,933,676
466,634,317
1,160,300,687
Per Cent
0.184%3.834%7.634%
7.951%
9.110%
3.916%11.630%
1.065%1.469%
6.211%
1.964%0.214%1.565%2.378%0.491%0.167%
40.217%
100.000%
Per Cent
0.074%2.568%2.153%
2.523%
1.679%
2.093%4.280%0.722%1.003%
0.063%
2.414%0.272%2.991%4.970%0.146%0.140%
71.905%
100.000%
![Page 41: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/41.jpg)
41
Base Pairs in GenBank
0
200 ,000 ,000
400 ,000 ,000
600 ,000 ,000
800 ,000 ,000
1 ,000 ,000 ,000
1 ,200 ,000 ,000
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105
GenBank Release Numbers
9493929190898887 95 96 97
![Page 42: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/42.jpg)
42
Base Pairs in GenBank
0
200 ,000 ,000
400 ,000 ,000
600 ,000 ,000
800 ,000 ,000
1 ,000 ,000 ,000
1 ,200 ,000 ,000
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105
GenBank Release Numbers
9493929190898887 95 96 97
Growth in GenBank is exponential.More data were added in the last tenweeks than were added in the first tenyears of the project.
Growth in GenBank is exponential.More data were added in the last tenweeks than were added in the first tenyears of the project.
At this rate, what’s next...At this rate, what’s next...
![Page 43: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/43.jpg)
43
ABI Bass-o-Matic Sequencer
In with the sample, out with the sequence...
TGCGCATCGCGTATCGATAG
speed
gB/sec
EnterDefrost
7 8 9
4 5 6
1 2 3
0
+
-
![Page 44: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/44.jpg)
44
What’s Really Next
The post-genome era in biologicalresearch will take for granted readyaccess to huge amounts of genomicdata.
The challenge will beunderstandingthose data and using the understandingto solve real-world problems...
![Page 45: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/45.jpg)
45
Base Pairs in GenBank
0
20 ,000 ,000
40 ,000 ,000
60 ,000 ,000
80 ,000 ,000
100 ,000 ,000
120 ,000 ,000
140 ,000 ,000
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105
GenBank Release Numbers
9493929190898887 95 96 97
Net ChangesNet Changes
![Page 46: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/46.jpg)
46
Base Pairs in GenBank(Percent Increase)
0 .00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
90.00%
100 .00%
85 86 87 88 89 90 91 92 93 94 95 96
Year
Percent Increaseaverage = 56%
Percent Increaseaverage = 56%
![Page 47: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/47.jpg)
47
Projected Base Pairs
110
1001 ,000
10 ,000100 ,000
1,000 ,00010,000 ,000
100 ,000 ,0001 ,000,000 ,000
10 ,000,000 ,000100 ,000,000 ,000
1 ,000 ,000 ,000 ,00010 ,000 ,000 ,000 ,000
100 ,000 ,000 ,000 ,000
90 95 0 5 10 15 20 25
Year
Assumed annual growth rate: 50%(less than current rate)
![Page 48: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/48.jpg)
48
Projected Base Pairs
110
1001 ,000
10 ,000100 ,000
1,000 ,00010,000 ,000
100 ,000 ,0001 ,000,000 ,000
10 ,000,000 ,000100 ,000,000 ,000
1 ,000 ,000 ,000 ,00010 ,000 ,000 ,000 ,000
100 ,000 ,000 ,000 ,000
90 95 0 5 10 15 20 25
Year
Assumed annual growth rate: 50%(less than current rate)
Is this crazy?One trillion bp by 2015100 trillion by 2025
Is this crazy?One trillion bp by 2015100 trillion by 2025
![Page 49: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/49.jpg)
49
Projected Base Pairs
110
1001 ,000
10 ,000100 ,000
1,000 ,00010,000 ,000
100 ,000 ,0001 ,000,000 ,000
10 ,000,000 ,000100 ,000,000 ,000
1 ,000 ,000 ,000 ,00010 ,000 ,000 ,000 ,000
100 ,000 ,000 ,000 ,000
90 95 0 5 10 15 20 25
Year
Is this crazy?One trillion bp by 2015100 trillion by 2025
Is this crazy?One trillion bp by 2015100 trillion by 2025
500,00050,0005,000
Maybe not...Maybe not...
Projected size of thesequence database,indicated as the numberof base pairs perindividual medicalrecord in the US.
![Page 50: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/50.jpg)
50
Projected Base Pairs
110
1001 ,000
10 ,000100 ,000
1,000 ,00010,000 ,000
100 ,000 ,0001 ,000,000 ,000
10 ,000,000 ,000100 ,000,000 ,000
1 ,000 ,000 ,000 ,00010 ,000 ,000 ,000 ,000
100 ,000 ,000 ,000 ,000
90 95 0 5 10 15 20 25
Year
Is this crazy?One trillion bp by 2015100 trillion by 2025
Is this crazy?One trillion bp by 2015100 trillion by 2025
500,00050,0005,000
Maybe not...Maybe not...
Projected size of thesequence database,indicated as the numberof base pairs perindividual medicalrecord in the US.
Note too:The amount of digital data necessary tostore 1014 bases of DNA is only a fractionof the data necessary to describe theworld’s microbial biodiversity at onesquare meter resolution...
Note too:The amount of digital data necessary tostore 1014 bases of DNA is only a fractionof the data necessary to describe theworld’s microbial biodiversity at onesquare meter resolution...
![Page 51: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/51.jpg)
21st CenturyBiology
Post-Genome Era
![Page 52: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/52.jpg)
52
The Post-Genome Era
Post-genome research involves:
• applying genomic tools and knowledge to moregeneral problems
• asking new questions, tractable only to genomicor post-genomic analysis
• moving beyond the structural genomics of thehuman genome project and into the functionalgenomics of the post-genome era
Post-genome research involves:
• applying genomic tools and knowledge to moregeneral problems
• asking new questions, tractable only to genomicor post-genomic analysis
• moving beyond the structural genomics of thehuman genome project and into the functionalgenomics of the post-genome era
![Page 53: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/53.jpg)
53
The Post-Genome Era
Suggested definition:
• functional genomics = biology
Suggested definition:
• functional genomics = biology
![Page 54: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/54.jpg)
54
The Post-Genome Era
An early analysis:An early analysis:
Walter Gilbert. 1991. Towards a paradigmshift in biology. Nature, 349:99.
![Page 55: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/55.jpg)
55
Paradigm Shift in Biology
To use [the] flood of knowledge, which will pouracross the computer networks of the world,biologists not only must become computerliterate, but also change their approach to theproblem of understanding life.
To use [the] flood of knowledge, which will pouracross the computer networks of the world,biologists not only must become computerliterate, but also change their approach to theproblem of understanding life.
Walter Gilbert. 1991. Towards a paradigm shift in biology.Nature, 349:99.Walter Gilbert. 1991. Towards a paradigm shift in biology.Nature, 349:99.
![Page 56: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/56.jpg)
56
Paradigm Shift in Biology
The new paradigm, now emerging, is that all the‘genes’ will be known (in the sense of beingresident in databases available electronically),and that the starting point of a biologicalinvestigation will be theoretical. An individualscientist will begin with a theoretical conjecture,only then turning to experiment to follow or testthat hypothesis.
The new paradigm, now emerging, is that all the‘genes’ will be known (in the sense of beingresident in databases available electronically),and that the starting point of a biologicalinvestigation will be theoretical. An individualscientist will begin with a theoretical conjecture,only then turning to experiment to follow or testthat hypothesis.
Walter Gilbert. 1991. Towards a paradigm shift in biology.Nature, 349:99.Walter Gilbert. 1991. Towards a paradigm shift in biology.Nature, 349:99.
![Page 57: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/57.jpg)
57
Paradigm Shift in BiologyCase of Microbiology
< 5,000 known and described bacteria
5,000,000 base pairs per genome
25,000,000,000 TOTAL base pairs
If a full, annotated sequence were available for all known bacteria, the practiceof microbiology would match Gilbert’s prediction.
If a full, annotated sequence were available for all known bacteria, the practiceof microbiology would match Gilbert’s prediction.
![Page 58: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/58.jpg)
21st CenturyBiology
The Science
![Page 59: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/59.jpg)
59
Fundamental Dogma
The fundamental dogma of molecular biologyis that genes act to create phenotypes througha flow of information from DNA to RNA toproteins, to interactions among proteins(regulatory circuits and metabolic pathways),and ultimately to phenotypes.
Collections of individual phenotypes, ofcourse, constitute a population.
DNA
RNA
Proteins
Circuits
Phenotypes
Populations
![Page 60: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/60.jpg)
60
Fundamental DogmaDNA
RNA
Proteins
Circuits
Phenotypes
Populations
GenBankEMBLDDBJ
MapDatabases
SwissPROTPIR
PDB
Although a few databases already existto distribute molecular information,
Although a few databases already existto distribute molecular information,
![Page 61: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/61.jpg)
61
Fundamental DogmaDNA
RNA
Proteins
Circuits
Phenotypes
Populations
GenBankEMBLDDBJ
MapDatabases
SwissPROTPIR
PDB
Gene Expression?
Clinical Data ?
Regulatory Pathways?Metabolism?
Biodiversity?
Neuroanatomy?
Development ?
Molecular Epidemiology?
Comparative Genomics?
the post-genomic era will need manymore to collect, manage, and publishthe coming flood of new findings.
the post-genomic era will need manymore to collect, manage, and publishthe coming flood of new findings.
Although a few databases already existto distribute molecular information,
Although a few databases already existto distribute molecular information,
![Page 62: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/62.jpg)
62
Fundamental DogmaDNA
RNA
Proteins
Circuits
Phenotypes
Populations
GenBankEMBLDDBJ
MapDatabases
SwissPROTPIR
PDB
Gene Expression?
Clinical Data ?
Regulatory Pathways?Metabolism?
Biodiversity?
Neuroanatomy?
Development ?
Molecular Epidemiology?
Comparative Genomics?
the post-genomic era will need manymore to collect, manage, and publishthe coming flood of new findings.
the post-genomic era will need manymore to collect, manage, and publishthe coming flood of new findings.
Although a few databases already existto distribute molecular information,
Although a few databases already existto distribute molecular information,
If this extension covers functionalgenomics, then “functional genomics”is equivalent to biology.
If this extension covers functionalgenomics, then “functional genomics”is equivalent to biology.
![Page 63: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/63.jpg)
21st CenturyBiology
The Literature
![Page 64: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/64.jpg)
64
P I R -- Beta Hemoglobin----------------------------------------------DEFINITION
[HBHU] Hemoglobin beta chain- Human, chimpanzee, pygmy
chimpanzee, and gorilla
SUMMARY [SUM]#Type Protein#Molecular-weight 15867#Length 146#Checksum 1242
SEQUENCEV H L T P E E K S A V T A L W GK V N V D E V G G E A L G R L LV V Y P W T Q R F F E S F G D LS T P D A V M G N P K V K A H GK K V L G A F S D G L A H L D NL K G T F A T L S E L H C D K LH V D P E N F R L L G N V L V C
G D B -- Beta Hemoglobin----------------------------------------------
** Locus Detail View **--------------------------------------------------------Symbol: HBBName: hemoglobin, betaMIM Num: 141900Location: 11p15.5Created: 01 Jan 86 00:00
--------------------------------------------------------** Polymorphism Table **
--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI
Electronic Data Publishing
O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA,INCLUDED; BETA-THALASSEMIAS,INCLUDED; HEINZ BODY ANEMIAS,BETA-GLOBIN TYPE, ...]
The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning
GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF]
[HUMHBB] Human betaglobin region
LOCUS [LOC]HUMHBB
ACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161
KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;
SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactcca
Many findings will be publishedin structured databases...Many findings will be publishedin structured databases...
![Page 65: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/65.jpg)
65
Electronic Scholarly Publishing
HTTP://WWW.ESP.ORGFor example, the ESP site is dedicated to the electronicpublishing of scientific and other scholarly materials. Ofparticular interest are the history of science, genetics,computational biology, and genome research.
Scientific papers will also bepublished in electronic format,perhaps even including someimportant papers originallypublished only in print format.
Scientific papers will also bepublished in electronic format,perhaps even including someimportant papers originallypublished only in print format.
![Page 66: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/66.jpg)
66
Electronic Scholarly Publishing
TheClassical Genetics:Foundationsseriesprovides ready accessto typeset-quality,electronic editions ofimportant publicationsthat can otherwise bevery difficult to find.
![Page 67: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/67.jpg)
67
Electronic Scholarly Publishing
For example, “Hardy” (ofHardy-Weinberg) is aname well known to moststudents of biology, butfew have access to hisoriginal publicationsdealing with the basicprinciples of populationgenetics.
![Page 68: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/68.jpg)
68
Electronic Scholarly Publishing
And when they do haveaccess, they may besurprised to find thatallof Hardy’s biologicalwritings consist of asingle, one-page letter tothe editor ofScience.
![Page 69: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/69.jpg)
21st CenturyBiology
The People
![Page 70: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/70.jpg)
70
Human Resources Issues
• Reduction in need for non-IT staff
• Increase in need for IT staff, especially“information engineers”
• Reduction in need for non-IT staff
• Increase in need for IT staff, especially“information engineers”
In modern biology, a general trend is toconvert expert work into staff work andfinally into computation. New expertise isrequired to design, carry out, and interpretcontinuing work.
In modern biology, a general trend is toconvert expert work into staff work andfinally into computation. New expertise isrequired to design, carry out, and interpretcontinuing work.
![Page 71: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/71.jpg)
71
Human Resources Issues
Elbert Branscomb: “You must recognize thatsome day you may need as many computerscientists as biologists in your labs.”
Craig Venter: “At TIGR, we already havetwice as many computer scientists on ourstaff.”
Elbert Branscomb: “You must recognize thatsome day you may need as many computerscientists as biologists in your labs.”
Craig Venter: “At TIGR, we already havetwice as many computer scientists on ourstaff.”
Exchange at DOE workshop on high-throughput sequencing.
Exchange at DOE workshop on high-throughput sequencing.
![Page 72: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/72.jpg)
New Disciplineof Informatics
New Disciplineof Informatics
![Page 73: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/73.jpg)
73
What is Informatics?
Informatics
ComputerScience
Research
BiologicalApplicationPrograms
Informatics is a conveyor belt that carries thefruits of basic computer science research tothe biological community, transforming themalong the way from theory to practicalapplications.
Informatics is a conveyor belt that carries thefruits of basic computer science research tothe biological community, transforming themalong the way from theory to practicalapplications.
![Page 74: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/74.jpg)
74
What is Informatics?
Informatics combines expertise from:
• domain science (e.g., biology)
• computer science
• library science
• management science
Informatics combines expertise from:
• domain science (e.g., biology)
• computer science
• library science
• management science
All tempered with an engineering mindset...All tempered with an engineering mindset...
![Page 75: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/75.jpg)
75
What is Informatics?
ISIS
MedicalInformatics
MedicalInformatics
BioInformatics
BioInformatics
OtherInformatics
OtherInformatics
LibraryScience
LibraryScience
ComputerScience
ComputerScience
MgtScience
MgtScience
DomainKnowledge
EngineeringPrinciples
![Page 76: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/76.jpg)
76
Engineering Mindset
Engineering is often defined as the use ofscientific knowledge and principles for practicalpurposes. While the original usage restrictedthe word to the building of roads, bridges, andobjects of military use, today's usage is moregeneral and includes chemical, electronic, andeven mathematical engineering.
Engineering is often defined as the use ofscientific knowledge and principles for practicalpurposes. While the original usage restrictedthe word to the building of roads, bridges, andobjects of military use, today's usage is moregeneral and includes chemical, electronic, andeven mathematical engineering.
Parnas, David Lorge. 1990.Computer, 23(1):17-22.Parnas, David Lorge. 1990.Computer, 23(1):17-22.
... or even information engineering.
![Page 77: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/77.jpg)
77
Engineering Mindset
Engineering education ... stresses findinggood, as contrasted with workable, designs.Where a scientist may be happy with a devicethat validates his theory, an engineer is taughtto make sure that the device is efficient,reliable, safe, easy to use, and robust.
Engineering education ... stresses findinggood, as contrasted with workable, designs.Where a scientist may be happy with a devicethat validates his theory, an engineer is taughtto make sure that the device is efficient,reliable, safe, easy to use, and robust.
Parnas, David Lorge. 1990.Computer, 23(1):17-22.Parnas, David Lorge. 1990.Computer, 23(1):17-22.
The assembly of working, robust systems, on time and onbudget, is the key requirement for a federated informationinfrastructure for biology.
![Page 78: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/78.jpg)
78
Informatics Triangle
BS
LSCS
MS
MS MS
IS
IS
ISIS
An unfolded tetrahedron canrepresent the relationships amongmanagement science (MS), computerscience (CS), library science (LS),and biological science (BS), as theyconnect around a core of informationscience (IS), or informatics.
An unfolded tetrahedron canrepresent the relationships amongmanagement science (MS), computerscience (CS), library science (LS),and biological science (BS), as theyconnect around a core of informationscience (IS), or informatics.
![Page 79: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/79.jpg)
79
What is Informatics?
MS
IS
BS
LSCS
This might also be drawn like amethane molecule...
This might also be drawn like amethane molecule...
![Page 80: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/80.jpg)
FederatedInformation
Infrastructure
FederatedInformation
Infrastructure
![Page 81: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/81.jpg)
81
ODN Model
ODN Bearer ServiceLayer 1
ATM
Network TechnologySubstrate
LANs
SMDS
Wireless
Dial-upModems
FrameRelay
Point-to-PointCircuits
DirectBroadcastSatellites
Middleware ServicesLayer 3
ServiceDirectories
Privacy
ElectronicMoney
StorageRepositories
MultisiteCoordination
FileSystems
Security
NameServers
ApplicationsLayer 4
VideoServer
Fax AudioServer
FinancialServices
ElectronicMail
Tele-Conferencing
InteractiveEducation
InformationBrowsing
RemoteLogin
Open BearerService Interface
Transport Services andRepresentation Standards
Layer 2
(fax, video, audio, text, etc.)
Several years ago, a report,Realizing the InformationFuture,laid out a vision of anOpen Data Network, in whichany information appliancecould be operated over genericnetworking protocols...
Several years ago, a report,Realizing the InformationFuture,laid out a vision of anOpen Data Network, in whichany information appliancecould be operated over genericnetworking protocols...
![Page 82: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/82.jpg)
82
National Information Infrastructure
analog
commercialuses
non-commercialuses
digital
Edu Libother ResETC
�A national information infrastructure can be partitioned according to technology(analog vs digital) and usage (commercial vs non-commercial). Scientific use of theinternet involves the non-commercial, digital component. Entertainment, telephony,and communication now primarily falls in the commercial, analog realm.
A national information infrastructure can be partitioned according to technology(analog vs digital) and usage (commercial vs non-commercial). Scientific use of theinternet involves the non-commercial, digital component. Entertainment, telephony,and communication now primarily falls in the commercial, analog realm.
![Page 83: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/83.jpg)
83
FIIST & NII
FIIST(science & technology)
FIIE(engineering)
FIIS(science)
FII(climatology)
FII(chemistry)
FII(geology)
FII(physics)
FII(biology)
FII(physiology)
FII(ecology)
FII(structural biology)
FII(• • •)
FII(geography)
FII(human)
FII(mouse)
FII(• • •)
FII(Arabidopsis)
FII(genomics)
FII(E. coli)
FII(zoology)
FII(botany)
FII(• • •)
FII(systematics)
FII(• • •)
analog
commercialuses
non-commercialuses
digital
Edu Libother ResETC
�
The research component ofthe NII contains a FederatedInformation Infrastructurefor Science and Technology..
The research component ofthe NII contains a FederatedInformation Infrastructurefor Science and Technology..
![Page 84: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/84.jpg)
84
FIIST
FIIST(science & technology)
FIIE(engineering)
FIIS(science)
FII(climatology)
FII(chemistry)
FII(geology)
FII(physics)
FII(biology)
FII(• • •)
FII(geography)
![Page 85: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/85.jpg)
85
FIIB
FII(biology)
FII(physiology)
FII(ecology)
FII(structural biology)
FII(human)
FII(mouse)
FII(• • •)
FII(Arabidopsis)
FII(genomics)
FII(E. coli)
FII(zoology)
FII(botany)
FII(• • •)
FII(systematics)
FII(• • •)
![Page 86: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/86.jpg)
Funding forBio-Information
Infrastructure
Funding forBio-Information
Infrastructure
![Page 87: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/87.jpg)
87
Call for Change
Among the many new tools that are or will be needed (for 21st-century biology), some of those having the highest priority are:
• bioinformatics
• computational biology
• functional imaging tools using biosensors and biomarkers
• transformation and transient expression technologies
• nanotechnologies
Among the many new tools that are or will be needed (for 21st-century biology), some of those having the highest priority are:
• bioinformatics
• computational biology
• functional imaging tools using biosensors and biomarkers
• transformation and transient expression technologies
• nanotechnologies
Impact of Emerging Technologies on the Biological Sciences: Report of aWorkshop.NSF-supported workshop, held 26-27 June 1995, Washington, DC.
Impact of Emerging Technologies on the Biological Sciences: Report of aWorkshop.NSF-supported workshop, held 26-27 June 1995, Washington, DC.
![Page 88: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/88.jpg)
88
The Problem
• IT moves at “Internet Speed” and respondsrapidly to market forces.
• IT will play a central role in 21st Centurybiology.
• Current levels of support for public bio-information infrastructure are too low.
• Reallocation of federal funding is difficult,and subject to political pressures.
• Federal-funding decision processes areponderously slow and inefficient.
• IT moves at “Internet Speed” and respondsrapidly to market forces.
• IT will play a central role in 21st Centurybiology.
• Current levels of support for public bio-information infrastructure are too low.
• Reallocation of federal funding is difficult,and subject to political pressures.
• Federal-funding decision processes areponderously slow and inefficient.
![Page 89: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/89.jpg)
89
Public Funding of Bio-Databases
The challenges:
• providing adequate funding levels
• making timely, efficient decisions
The challenges:
• providing adequate funding levels
• making timely, efficient decisions
![Page 90: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/90.jpg)
IT Budgets
A Reality Check
![Page 91: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/91.jpg)
91
Rhetorical Question
Which is likely to be more complex:
• identifying, documenting, and tracking thewhereabouts ofall parcels in transit in the US atone time
• identifying, documenting, and analyzing thestructure and function ofall individual genes inall economically significant organisms; thenanalyzingall significant gene-gene and gene-environment interactions in those organismsand their environments
Which is likely to be more complex:
• identifying, documenting, and tracking thewhereabouts ofall parcels in transit in the US atone time
• identifying, documenting, and analyzing thestructure and function ofall individual genes inall economically significant organisms; thenanalyzingall significant gene-gene and gene-environment interactions in those organismsand their environments
![Page 92: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/92.jpg)
92
Business Factoids
United Parcel Service:
• uses two redundant 3 Terabyte (yes, 3000 GB)databases to track all packages in transit.
• has 4,000 full-time employees dedicated to IT
• spends one billion dollars per year on IT
• has an income of 1.1 billion dollars, againstrevenues of 22.4 billion dollars
United Parcel Service:
• uses two redundant 3 Terabyte (yes, 3000 GB)databases to track all packages in transit.
• has 4,000 full-time employees dedicated to IT
• spends one billion dollars per year on IT
• has an income of 1.1 billion dollars, againstrevenues of 22.4 billion dollars
![Page 93: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/93.jpg)
93
Business ComparisonsCompany Revenues IT Budget Pct
Bristol-Myers Squibb 15,065,000,000 440,000,000 2.92 %
Pfizer 11,306,000,000 300,000,000 2.65 %
Pacific Gas & Electric 10,000,000,000 250,000,000 2.50 %
K-Mart 31,437,000,000 130,000,000 0.41 %
Wal-Mart 104,859,000,000 550,000,000 0.52 %
Sprint 14,235,000,000 873,000,000 6.13 %
MCI 18,500,000,000 1,000,000,000 5.41 %
United Parcel 22,400,000,000 1,000,000,000 4.46 %
AMR Corporation 17,753,000,000 1,368,000,000 7.71 %
IBM 75,947,000,000 4,400,000,000 5.79 %
Microsoft 11,360,000,000 510,000,000 4.49 %
Chase-Manhattan 16,431,000,000 1,800,000,000 10.95 %
Nation’s Bank 17,509,000,000 1,130,000,000 6.45 %
![Page 94: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/94.jpg)
94
Federal Funding of Biomedical-IT
Appropriate funding level:
• approx. 5-10% of research funding
• i.e., 1 - 2billion dollars per year
Appropriate funding level:
• approx. 5-10% of research funding
• i.e., 1 - 2billion dollars per year
Source of estimate:
- Experience of IT-transformed industries.
- Current support for IT-rich biological research.
Source of estimate:
- Experience of IT-transformed industries.
- Current support for IT-rich biological research.
![Page 95: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/95.jpg)
Conference on Biological Informatics
Conference Sessions
• Overview of Biological Informatics
• Biodiversity Informatics
• Environmental Informatics
• Molecular Informatics
• Medical / Neuroinformatics
• Teaching and Training in Informatics
Conference Sessions
• Overview of Biological Informatics
• Biodiversity Informatics
• Environmental Informatics
• Molecular Informatics
• Medical / Neuroinformatics
• Teaching and Training in Informatics
![Page 96: biological informatics What are the differences among · One Human Sequence Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted,](https://reader034.vdocuments.us/reader034/viewer/2022042711/5f77afc4835d8e5461177513/html5/thumbnails/96.jpg)
96
Slides:
http://www.esp.org/rjr/canberra.pdfhttp://www.esp.org/rjr/canberra.pdf