
Page 1: Summer Bioinformatics Workshop 2008 Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester clin@winona.edu

Summer Bioinformatics Workshop 2008

Chi-Cheng Lin, Ph.D., Professor

Department of Computer Science

Winona State University – Rochester

clin@winona.edu

Introduction to Bioinformatics


Outline

• What is Bioinformatics

• The Human Genome Project

• Applications of Bioinformatics

• References

Acknowledgement: The presentation includes adaptations from DOE’s “Human Genome Project and Beyond Primer” and Dr. Yan Asmann’s (Mayo Clinic) lecture notes


Bioinformatics

• Living things have the ability to store, utilize, and pass on information

• Bioinformatics strives to
  – determine what information is biologically important
  – decipher how it is used to precisely control the chemical environment within living organisms


What is Bioinformatics

• The collaboration of Biology and Informatics

• Originally referred to the use of computational tools to organize and analyze genetic and protein sequence data (first coined by Dr. Hwa Lim in 1988)


NCBI’s Definition of Bioinformatics

• NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/)
  – “Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.”
  – “The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.”


Human Genome Project


Human Genome Project

• Goals include
  – Identify genes in human DNA
  – Determine the sequence making up human DNA
  – Store this information in databases
  – Improve tools for data analysis
  – Etc.
• Milestone
  – April 2003: HGP sequencing is completed and the project is declared finished two years ahead of schedule


Interesting Numbers characterizing the Human Genome

• 3 billion:
  – The number of chemical nucleotide bases (A, C, G, and T) contained in the haploid human genome
• 3 million:
  – The number of locations where single-base DNA differences occur in the human genome
• 2.4 million:
  – The number of bases comprising the largest known human gene (the average gene comprises 3000 bases)
• 30,000:
  – The total number of genes estimated (much lower than previous estimates of 80,000 to 140,000)


Interesting Numbers characterizing the Human Genome

• 99.9%
  – Fraction of nucleotide bases that are exactly the same in all people
• 50%
  – Fraction of discovered genes for which function is unknown
• 2%
  – Fraction of genome that codes for proteins (the rest: “junk” (?) DNA)
• 9%, 11%, 26%, 28%, 45%, 83%, 89%, and 95%
  – The percentage of genes E. coli, rice, roundworm, yeast, fruit fly, zebrafish, mouse, and chimpanzee share with humans, respectively.
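A quick arithmetic check shows how some of these numbers relate to one another. The sketch below uses only the rounded figures quoted on the slides; the variable names are illustrative, and the results are rough estimates rather than measured values.

```python
# Rounded figures as quoted on the slides (not measured values).
GENOME_BASES = 3_000_000_000   # bases in the haploid human genome
VARIANT_SITES = 3_000_000      # locations with single-base differences
CODING_FRACTION = 0.02         # share of the genome that codes for proteins

# Share of positions where people can differ by a single base.
variant_fraction = VARIANT_SITES / GENOME_BASES

# The remaining positions are the same in everyone, under this estimate.
identical_percent = (1 - variant_fraction) * 100

# Approximate number of bases that code for proteins.
coding_bases = GENOME_BASES * CODING_FRACTION

print(f"Variable positions: {variant_fraction:.1%} of all bases")
print(f"Identical positions: {identical_percent:.1f}%")
print(f"Protein-coding bases: about {coding_bases / 1_000_000:.0f} million")
```

Under these figures, 3 million variable positions out of 3 billion bases is 0.1%, which matches the 99.9% of bases quoted as identical in all people.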


How does the human genome stack up?

Organism                              Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9,700                 9

Humans share most of the same protein families with worms, flies, and plants!
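One way to read the table is to compute gene density, that is, estimated genes per million bases. The sketch below is only an illustration built from the table's rounded values; the helper name `genes_per_megabase` and the dictionary layout are assumptions made for this example.

```python
# (genome size in bases, estimated genes), copied from the table above.
genomes = {
    "Human (Homo sapiens)": (3_000_000_000, 30_000),
    "Laboratory mouse (M. musculus)": (2_600_000_000, 30_000),
    "Mustard weed (A. thaliana)": (100_000_000, 25_000),
    "Roundworm (C. elegans)": (97_000_000, 19_000),
    "Fruit fly (D. melanogaster)": (137_000_000, 13_000),
    "Yeast (S. cerevisiae)": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "Human immunodeficiency virus (HIV)": (9_700, 9),
}


def genes_per_megabase(bases, genes):
    """Return the estimated number of genes per million bases."""
    return genes / (bases / 1_000_000)


density = {
    name: genes_per_megabase(bases, genes)
    for name, (bases, genes) in genomes.items()
}

# Print organisms from the sparsest to the densest genome.
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:<38}{value:>8.1f} genes per million bases")
```

With these figures the human genome has the lowest gene density in the table, while the small microbial and viral genomes pack genes far more tightly.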


Anticipated Benefits of Genome Research

• Molecular medicine
• Microbial genomics
• Bioarchaeology
• Anthropology
• Evolution
• Human migration
• DNA identification (forensics)
• Agriculture, livestock breeding, and bioprocessing


ELSI: Ethical, Legal, and Social Issues

• Privacy and confidentiality of genetic information
• Fairness in the use of genetic information
• Psychological impact, stigmatization, and discrimination
• Reproductive issues
• Clinical issues
• Uncertainties associated with gene tests for susceptibilities and complex conditions
• Fairness in access to advanced genomic technologies
• Conceptual and philosophical implications
• Health and environmental issues
• Commercialization of products


Mike Thompson, Detroit, Michigan -- from The Detroit Free Press Source: http://cagle.msnbc.com/news/gene/gene5.asp


Future Challenges: What We Still Don’t Know

• Gene prediction and discovery
  – Location, function, structure, regulation, etc.
• Single-base DNA variations among individuals
  – Correlation with health and disease
  – Disease-susceptibility prediction
• Genes involved in complex traits and multigene disorders
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Systems biology
  – Coordination of gene expression and protein synthesis
  – Interaction of proteins in complex molecular machines
  – Microbial consortia useful for environmental restoration
• Developmental genetics and genomics
• Evolutionary conservation among organisms
• And many more …


Tackle Future Challenges: Bioinformatics

• High volume of data to store, compute, and analyze

• Huge amount of information to retrieve, interpret, and visualize

• Complex system to study, model, and simulate

THAT’S WHY BIOINFORMATICS IS INDISPENSABLE!!


Genomics Studies

• Genomics
  – Study of the whole genome
  – Sequencing and annotating genomes
• Comparative genomics
  – Comparison and characterization of genomes from different species to identify genes and their functions and to investigate evolutionary history
• Functional genomics
  – Understanding the function of genes and other parts of the genome
• Structural genomics
  – Determining the 3D structure of all proteins
• Pharmacogenomics
  – Study of how an individual's genetic inheritance affects the body's response to drugs
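To give a feel for what "comparison" means at the smallest scale, the sketch below compares two short DNA strings position by position. Everything in it is a simplifying assumption for illustration: the two sequences are invented toy data, they are taken to be already aligned and of equal length, and the function names are hypothetical. Real comparative genomics relies on alignment tools and handles insertions and deletions, which this sketch does not.

```python
def differing_positions(seq_a, seq_b):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a, seq_b):
    """Return the percentage of positions carrying the same base."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = len(differing_positions(seq_a, seq_b))
    return 100.0 * (len(seq_a) - mismatches) / len(seq_a)


# Invented toy sequences, not real genomic data.
sequence_one = "ATGGCCATTGTAATGGGCCGC"
sequence_two = "ATGGCCATTGTTATGGGCCGA"

print("Differing positions:", differing_positions(sequence_one, sequence_two))
print(f"Percent identity: {percent_identity(sequence_one, sequence_two):.1f}%")
```

The same position-by-position idea underlies the single-base differences between individuals mentioned earlier, only applied across billions of bases.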


Genome Sequencing

Drew Sheneman, New Jersey -- The Newark Star Ledger Source: http://cagle.msnbc.com/news/gene/gene14.asp


Human Migration Patterns using DNA Sequences


Medicine and the New Genetics

Gene Testing, Pharmacogenomics, Gene Therapy

• Anticipated benefits:
  – Improved diagnosis of disease
  – Earlier detection of genetic predispositions to disease
  – Pharmacogenomics:
    • Genetic testing before prescribing drugs
    • Dose selection based on genetic variations
    • Drugs tailor-made to each patient

However, the application of pharmacogenomics in medical practice is still quite limited today, due to the lack of genetic information from a large population.


References

• NCBI (National Center for Biotechnology Information) homepage: http://www.ncbi.nlm.nih.gov/

• NCBI Science Primer: http://www.ncbi.nlm.nih.gov/About/primer/

• Human Genome Project Information (esp. link to the Education module): http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml

• The Human Genome Project and Beyond Primer: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/primer.ppt