bioinformatics session...bioinformatics session kathryn dempsey, ph.d. research associate school of...

55
Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: [email protected] Phone: 402-554-2562

Upload: others

Post on 18-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Bioinformatics Session

Kathryn Dempsey, Ph.D.

Research Associate

School of Interdisciplinary Informatics

University of Nebraska at Omaha

Email: [email protected]

Phone: 402-554-2562

Page 2: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

OVERVIEW

• Introductions

• Part I: What is bioinformatics?– Activity 1: Extraction of DNA

• Part II: How is bioinformatics useful?– Activity 2: Finding Disease Genes

• Conclusions

Page 3: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

OVERVIEW

• Introductions

• Part I: What is bioinformatics?– Activity 1: Extraction of DNA

• Part II: How is bioinformatics useful?– Activity 2: Finding Disease Genes

• Conclusions

Page 4: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

• The term was originally coined in the mid-1980s to refer to analysis of biological sequences*

• Later, used to describe all computer applications in biological sciences.– Definition varies

• Bioinformatics is a new scientific discipline with foundations in computer science and molecular biology (and chemistry and mathematics and statistics and. . .)

• Very few formally trained bioinformaticians—most have migrated from other fields (myself included)

WHAT IS BIOINFORMATICS?

Page 5: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

WHAT IS BIOINFORMATICS?

calculusstatistics

discrete math

calculusstatistics

discrete math

computer science

Programmingalgorithms

computer science

Programmingalgorithms

geneticsmolecular

biologychemistry

geneticsmolecular

biologychemistry

databases text mining

parallel computing

databases text mining

parallel computing

BIOINFORMATICS

Page 6: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Brief History of Bioinformatics

Charles Darwin

publishes “The

Origin of

Species”

18

59

Gregor Mendel

publishes allelic

work

18

65

“Genetics” first

used

19

05

Alan Turing

develops the

“Turing Machine”

19

36

19

53

Francis & Crick’s

discovery of the

double helix

structure

19

64

World’s first

supercomputer!1

97

0

First use of

“bioinformatics”

19

74

Internet

is born!

19

75

Microsoft

is born!

19

87

Perl is

released

19

90

Linus Torvalds

announces Linux

19

92

Affymetrix

founded

20

01

Human genome

sequenced

20

06

23andMe

founded

Page 7: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

WHY DO WE NEED BIOINFORMATICS?

http://www.ncbi.nlm.nih.gov/genbank/statistics

Biomedical “Big data”!

• High-throughput experiments

• Genomes, personalized sequencing

• Complexity of disease

• Health records, public health

• Disaster preparedness

Page 8: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

“Big Data”

Image Source: http://api.ning.com/files/O6-JQcfS6sxRuZ8I2i5nJVa59xL-krT-a6UqeoLNaHwL2w-JSR-Cy56PmikOywRQgy2gDfYxLAb0Hs*VFr8IePv5QFBJdhDH/BigData.001.jpg

Page 9: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

http://www.genome-engineering.com/from-hypothesis-to-data-driven-research.html

TRADITIONAL BIOLOGY VS.

BIOINFORMATICS?

Bioi

nfor

mat

ics

Gather Lots of Data

Gather Lots of Data

Find patterns, needles in dataFind patterns, needles in data

Test the modelTest the model

Make a computational

model!

Make a computational

model!

Page 10: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

TYPICAL PROJECTS

Page 11: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

TYPICAL PROJECTS

Page 12: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

THE CENTRAL DOGMA OF

BIOINFORMATICS

Page 13: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

DNA (Deoxyribose Nucleic Acid)

Genetic material

Polymer of nitrogenous

bases (A, T, G & C)

Contains hereditary information (Genes) in the chromosome

Chromosome is a thread like linear strand of DNA and associated proteins (histones)

Chromosomes constitute the genome of an organism

Page 14: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

A Quick note on Chromosomes…

• Human cells are diploid

• Chromosomes

– Autosomes (22, 2 copies)

– X|Y (2 total – XX or XY)

– 23 pairs (2x23) or

– 46 chromosomes Total

• Strawberries are normally diploid but growers have modified them

• The ones we will use today are octaploid!

• Normal=– 7 types, 2 copies = 14 chr

– 7 types, 8 copies = 56 chr

• Bananas– 11 types, 2 copies = 22chr

Page 15: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Rosalind Franklin!

Page 16: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

THE CENTRAL DOGMA OF

BIOINFORMATICS

Page 17: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

RNA (RiboNucleic Acid)

Polymer of nucleotides (A,

U, G & C)

The temporary copy of a

gene

Copied in the nucleus,

transported to cytoplasm to

become protein

Page 18: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

THE CENTRAL DOGMA OF

BIOINFORMATICS

Page 19: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Proteins

– Functional unit of life

– Polymer of 20 naturally

occurring amino acids

– Made from RNA

molecules during

translation by ribosome

Page 20: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Onward to Activity 1

• Central Dogma

– If you remember one thing, remember this!

• Bioinformatics has roots in biology

• To learn what the human genome is, we must

first get the genome out of the cells!

Page 21: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

ACTIVITY 1:

Extracting DNA from a Strawberryhttp://www.youtube.com/watch?v=hOpu4iN5Bh4&noredirect=1

http://www.fanpop.com/clubs/fruit/images/34914851/title/strawberry-photo

http://img.wonderhowto.com/img/97/38/63488691134384/0/extract-dna-from-strawberry-with-basic-kitchen-items.w654.jpg

Page 22: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

OVERVIEW

• Introductions

• Part I: What is bioinformatics?– Activity 1: Extraction of DNA

• Part II: How is bioinformatics useful?– Activity 2: Finding Disease Genes

• Conclusions

Page 23: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

OVERVIEW

• Introductions

• Part I: What is bioinformatics?– Activity 1: Extraction of DNA

• Part II: How is bioinformatics useful?– Activity 2: Finding Disease Genes

• Conclusions

Page 24: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Why don’t we have personalized medicine?

Where is the cure for cancer?

Why is AIDS still misunderstood?

Page 25: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Personalized Medicine Study

• 54yr old male volunteer

• Plasma and serum used for testing

• 14 month time course

• Complete medical exams and labs at each meeting (20 time points total)

• Extensive sampling at 2 periods of viral infection:– HRV (human rhinovirus) – common cold

– RSV (respirtatory synticial) - bronchitis

Page 26: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 27: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Personalized Medicine

Page 28: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Techniques Used

• Summary of techniques used:– Sample collection

– HRV and RSV detection

– Whole-genome sequencing

– Whole-exome sequencing

– Sanger-DNA sequencing

– Whole-transcriptome sequencing: mRNA-Seq

– Small RNA sequencing: microRNA-Seq

– Serum Shotgun Proteome Profiling

– Serum Metabolome Profiling

– Serum Cytokine Profiling

– Autoantibodyome Profiling

– Telomere Length Assay

– Genome Phasing

– Omics Data Analysis

Page 29: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Why don’t we have personalized medicine?

Where is the cure for cancer?

Why is AIDS still misunderstood?

We don’t know everything.

There’s lots and lots of data.

Life is complex.

Everyone is unique.

Page 30: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Databases

• Pubmed – Journal articles on biomedical

research

• OMIM – Disease genes in humans

• Genbank – All known data on genes and their

proteins, and their DNA sequence

• PDB – 3-D proteins structures

• MGD – Organism specifc (mouse)

Page 31: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 32: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Tools

BLAST

If I give you a gene sequence, tell

me which of the billions of known

sequences is most similar to it.

Page 33: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Tools

Alignment (Clustal, MUSCLE, Tcoffee)

If I give you a bunch of

sequences, tell me where the are

the same and where they are

different.

Page 34: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Tools

Phylogenetics

If I give you a bunch of sequences

from different animals, tell me how

they are related.

Page 35: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

>Mouse

ATTCAGATCA

>Rat

TTTCAGATCG

>Horse

TACCAATCGC

Page 36: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

ATTCAGATCA

TTTCAGATCG

TACCAATCGC

80% Similar

20% Similar

Page 37: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

99.9%

96-99.4%

94%

50-60%

Page 38: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

We already have some databases and tools .

• ….but we need more to solve those questions.

• Example: A disease where only one set of 3 DNA bases is missing. – Do you know what this disease is?

• Activity 2: – The knowledge in bioinformatics databases

– How to use some tools: BLAST, Alignment, Translation!

Page 39: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

OVERVIEW

• Introductions

• Part I: What is bioinformatics?– Activity 1: Extraction of DNA

• Part II: How is bioinformatics useful?– Activity 2: Finding Disease Genes

• Conclusions

Page 40: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

o Massive amounts of data

o Many generation methods

o FEW ANALYSIS methods

o “Signal corruption”

o How to model data??

o How to extract knowledge?

Jeong et al.,2001. Nature 411: 41-42.

Motivation

Page 41: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

• A network:

• Elements and their interactions.

• Nodes elements

• Edges interactions

• Any relationship can be modeled using the network model

Page 42: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Gene 1

Gene 2

Prot A

Prot B

John Bob

Page 43: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Young edges

Mid edges

Aged edges

Page 44: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

BRCA1(mutant)

BRCA1(mutant)

PARP1(needed

by

BRCA1)

PARP1(needed

by

BRCA1)

Normal Diseased

BRCA1(healthy)

BRCA1(healthy)

BRCA1(mutant)

BRCA1(mutant)

PARP1PARP1

Treated

Cell death by

apoptosis

Uncontrolled cell

growth

Cell death by

apoptosis

PARP1

Inhibitor

EXAMPLE: SYNTHETIC LETHALITY NETWORKS

Page 45: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Activity 3: Networks

Page 46: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 47: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 48: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 49: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 50: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu
Page 51: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

The nodes with the most

interactions are almost always

going to get the signal….

….whether it be the flu or the

winning lotto numbers.

Page 52: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Networks!

• Human social networks work this way

• Cell networks work this way

• Many other networks act this way

• Flu pandemic planning

• Vaccination planning

• Drug targets for the cell

• National security planning

Page 53: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

Conclusions

• DNA RNA Protein

• We need bioinformatics– Understanding cellular systems

– Personalized medicine

– Prevention vs. treatment

• Many skills gained– Biomedical research

– Computer science

– Mathematics

– Team science

– …. & many more!

Page 54: Bioinformatics Session...Bioinformatics Session Kathryn Dempsey, Ph.D. Research Associate School of Interdisciplinary Informatics University of Nebraska at Omaha Email: kdempsey@unomaha.edu

A Career in Bioinformatics

Skills needed Programming (e.g., Perl, Python, Java, C++, PHP)

Database administration (e.g. MySQL, Oracle )

UNIX/Linux Operating System

Information Management

Types of Jobs Scientific curators, Software Developer, Network Engineering,

Administrator/analyst, Bio-Statistics or any jobs where biologists are currently hired.

Types of Employers Pharmaceutical, Biotech and Software development companies

Academic Institutes and Hospitals

Research Institutes (JCVI, Broad Institute, JGI, Tgen, )Summer Workshop 2013