1 Functional Genomics Introduction Julie A Dickerson Electrical and Computer Engineering Iowa State University


Post on 12-Jan-2016


Functional Genomics Introduction

Julie A Dickerson

Electrical and Computer Engineering

Iowa State University


Module Structure: Day 1

Introduction to Functional Genomics
Transcriptomics
Analysis and Experiment Design for Microarray Data (Dr. Peng Liu)
RNA-Seq Data (Mr. Kun Liang)
LAB: Using R for normalizing and processing microarray data, and clustering analysis of ‘omics data (John Van Hemert)


June 15, 2010

BBSI - 2010


Module Structure: Day 2

Metabolomics (Dr. Ann Perera)
Proteomics (Dr. Young-Jin Lee)
Pathways and data integration methods (Dr. Julie Dickerson and Erin Boggess)
Lab: Analyzing integrated sets of microarray, proteomics, and metabolomics data (Erin Boggess)


F1: Outline

Module Structure
What is Functional Genomics?
Data Types Available
Transcriptomics
Basic biology behind microarrays
What can you learn from microarrays?
Types of arrays
Limitations of microarrays


Functional Genomics Definition

Functional genomics is a field of molecular biology that attempts to make use of the data produced by genomic projects to describe gene (and protein) functions and interactions. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects such as DNA sequence or structures.

From Wikipedia, the free encyclopedia


Genome Wide View of Metabolism

Streptococcus pneumoniae

Explore capabilities of global network

How do we go from a pretty picture to a model we can manipulate?


Metabolic Pathways

Metabolites: glucose

Enzymes: phosphofructokinase

Reactions & Stoichiometry: 1 F6P => 1 FBP

Kinetics

Regulation: gene regulation, metabolite regulation

hexokinase

phosphoglucoisomerase

phosphofructokinase

aldolase

triosephosphate isomerase

G3P dehydrogenase

phosphoglycerate kinase

phosphoglycerate mutase

enolase

pyruvate kinase


Metabolic Modeling: The Dream


Data Types Available for Determining Function

Genomes: Sequence
Genes: Microarrays, Nextgen sequencing
Proteins: Proteomics
Metabolites: Metabolomics
Phenotypes: Phenomics


A VERY Simplified Eukaryotic Cell

nucleus

chromosome

DNA strands

DNA contains thousands of genes.

cytoplasm

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Posttranscriptional Modifications to Primary Transcript

Primary transcript: 5’ UTR, coding portions of RNA sequence corresponding to exons, intervening sequences corresponding to introns that are removed through splicing, 3’ UTR

Primary transcript after modification: messenger RNA (mRNA), with 5’ cap (G), 5’ UTR, 3’ UTR, and poly-A tail (AAAAAA...AAAA)

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Transcription takes place inside the nucleus.

nucleus

chromosome

DNA strands cytoplasm

Translation takes place outside the nucleus.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Translation

mRNA

Ribosome

amino acid sequence

folds to become a protein

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


During translation, transfer RNA (tRNA) translates the genetic code.

Figure labels: mRNA codons; tRNA anticodons (AAU, UGC); amino acids (leu, thr).

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


The Genetic Code

Each entry is an mRNA codon followed by its amino acid. Blocks are grouped by first base (U, C, A, G); columns are ordered by second base (U, C, A, G).

UUU phe   UCU ser   UAU tyr    UGU cys
UUC phe   UCC ser   UAC tyr    UGC cys
UUA leu   UCA ser   UAA STOP   UGA STOP
UUG leu   UCG ser   UAG STOP   UGG trp

CUU leu   CCU pro   CAU his    CGU arg
CUC leu   CCC pro   CAC his    CGC arg
CUA leu   CCA pro   CAA gln    CGA arg
CUG leu   CCG pro   CAG gln    CGG arg

AUU ile   ACU thr   AAU asn    AGU ser
AUC ile   ACC thr   AAC asn    AGC ser
AUA ile   ACA thr   AAA lys    AGA arg
AUG met   ACG thr   AAG lys    AGG arg

GUU val   GCU ala   GAU asp    GGU gly
GUC val   GCC ala   GAC asp    GGC gly
GUA val   GCA ala   GAA glu    GGA gly
GUG val   GCG ala   GAG glu    GGG gly

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton
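
The codon table can be turned into a small lookup, which may make the translation step more concrete. This is only a sketch: the names `CODON_TABLE` and `translate` are made up for illustration, and the function simply reads codons in frame from the first base until it hits a STOP codon.

```python
from itertools import product

BASES = "UCAG"  # same base order as the table: U, C, A, G

# Amino acids in table order: first base varies slowest, third base fastest.
AMINO_ACIDS = """
phe phe leu leu  ser ser ser ser  tyr tyr STOP STOP  cys cys STOP trp
leu leu leu leu  pro pro pro pro  his his gln gln    arg arg arg arg
ile ile ile met  thr thr thr thr  asn asn lys lys    ser ser arg arg
val val val val  ala ala ala ala  asp asp glu glu    gly gly gly gly
""".split()

# Map each of the 64 codons (e.g. "AUG") to its amino acid (e.g. "met").
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA string three bases at a time until a STOP codon."""
    peptide = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


# Hypothetical example sequence (not taken from the slides).
print(translate("AUGUUAACGUAA"))
```

A real translation tool would also search for the start codon and handle incomplete reading frames; both are left out here.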


Miscellaneous Comments

The biology is more complicated than I described.

Humans have somewhere around 30,000 genes. (The exact number is a subject for debate.) Regulation of these genes seems to be more important than number!

Much of the variation is created by differences in how cells use the genes they have.

Microarrays are a tool that can help us understand how cells of various types use their genes in response to varying conditions.


04/21/23 BCB570 Gene Expression Data Analysis

Microarrays

With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes.

Only a fraction of these genes are turned on, however, and it is the subset that is "expressed" that confers unique properties to each cell type.

"Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells.


Microarrays

Microarrays work by exploiting the ability of a given mRNA molecule (target) to bind specifically to, or hybridize to, the DNA template (probe) from which it originated.

This mechanism acts as both an "on/off" switch to control which genes are expressed in a cell as well as a "volume control" that increases or decreases the level of expression of particular genes as necessary.

Source: The Genetic Science Learning Center, University of Utah
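
Hybridization rests on base pairing, so an idealized version can be sketched as a reverse-complement check. The function names below are hypothetical, and the model is a simplification: it uses the DNA alphabet for both strands and demands an exact match, whereas real hybridization tolerates some mismatches and depends on temperature and sequence composition.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the strand that would pair with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridizes(probe, target):
    """Idealized rule: the target binds only if it is the exact reverse complement."""
    return reverse_complement(probe) == target


# Hypothetical probe and targets for illustration.
probe = "ATGCCGTA"
print(hybridizes(probe, reverse_complement(probe)))  # matching target
print(hybridizes(probe, "AAAAAAAA"))                 # unrelated target
```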


DNA Microarrays

Small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations.

The DNA is printed, spotted, or actually synthesized directly onto the support.

The spots themselves can be DNA, complementary DNA (cDNA, DNA synthesized from an mRNA template), or oligonucleotides (oligos, short fragments of single-stranded DNA that are typically 5 to 50 nucleotides long).


Why do microarray experiments?

Comparing two conditions to find differentially expressed genes: control/treatment, disease/normal

Compare more than two conditions, some of which may interact: different treatments, different strains

Exploratory analysis: What genes are expressed under drought stress?


Why use microarrays (cont)?

What happens over time? Developmental stages

Predicting certain conditions (cancer vs. normal)

Patterns of gene expression that characterize a patient’s or organism’s response


Differentially Expressed Genes

Find genes that show a large difference in expression between groups and are similar within a group

Use statistical tests (e.g., the t-test) to check whether the groups have different means or variances (chi-squared, F-statistics)

Adapted from “Practical Microarray Analysis”, Presentation by Benedikt Brors, German Cancer Research Center
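
As a rough illustration of the two-group comparison, the sketch below computes a t-statistic for one gene. It assumes the Welch (unequal-variance) form, which is one common choice and not necessarily the one used in the course; the expression values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """t-statistic for the difference in means, allowing unequal variances."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log-expression values for one gene in two conditions.
control = [7.1, 6.9, 7.3, 7.0]
treated = [8.4, 8.9, 8.6, 8.7]
print(welch_t(control, treated))
```

A large absolute value means the group means differ by much more than the within-group spread. With thousands of genes tested at once, the resulting p-values also need a multiple-testing correction.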


Multiple Conditions

Are there differences in expression level between the k conditions?

Analysis of Variance (ANOVA)

Design: Mutant 1 (Inoculated, Control); Mutant 2 (Inoculated, Control)
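
For k conditions, the one-way ANOVA F-statistic compares variation between condition means with variation within conditions. The sketch below is a minimal version for a single gene, treating each mutant/treatment combination as its own group; the function name and the numbers are made up, and a factorial model with interaction terms would be the fuller analysis for this design.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-expression values for one gene in four conditions.
conditions = [
    [5.1, 5.3, 4.9],  # mutant 1, inoculated
    [5.0, 5.2, 5.1],  # mutant 1, control
    [7.8, 8.1, 7.9],  # mutant 2, inoculated
    [5.2, 4.8, 5.0],  # mutant 2, control
]
print(one_way_anova_f(conditions))
```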


Some Example Microarray Experiments from Iowa State University

Jim Reecy from Animal Science: muscle undergoing hypertrophy vs. normal muscle

David Putthoff, Steve Rodermel, Thomas Baum from Plant Pathology: roots infected with soybean cyst nematodes vs. uninfected roots

Anne Bronikowski in Genetics: wheel-running mice vs. non-runners

Roger Wise, Rico Caldo in Plant Pathology: interaction between multiple isolates of powdery mildew and multiple genotypes of barley.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Wild-type vs. Myostatin Knockout Mice

Belgian Blue cattle have a mutation in the myostatin gene.


Identifying Genes Involved in Pathways That Distinguish Compatible from Incompatible Interactions

Table: Bgh isolates (5874, K1) by barley genotypes (Mla6, Mla13, Mla1). Each isolate-genotype combination is labeled Incompatible or Compatible (four Incompatible, two Compatible).

Caldo, Nettleton, Wise (2004). The Plant Cell. 16, 2514-2528.


An Example Gene of Interest

Figure: log expression vs. hours after inoculation, for Incompatible and Compatible interactions.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Exploratory Analysis

Find patterns in data to see what genes are expressed under different conditions

Analysis includes clustering methods

Used when little or no prior knowledge exists about the problem
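
One simple clustering method is k-means, sketched below in plain Python for illustration. Everything here is an assumption for demonstration purposes: the function names, the naive initialization (the first k profiles become the starting centroids), the fixed iteration count, and the data. Real analyses would normally use a tested library and several random restarts.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(profile, centroids):
    """Index of the centroid closest to the profile."""
    distances = [squared_distance(profile, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(profiles, k, iterations=20):
    """Assign each profile to one of k clusters; returns a list of cluster indices."""
    centroids = [list(p) for p in profiles[:k]]  # naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in profiles:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in profiles]


# Hypothetical expression profiles: four genes measured under three conditions.
profiles = [[0.1, 0.2, 2.9], [3.0, 2.8, 0.1], [0.0, 0.3, 3.1], [2.9, 3.1, 0.0]]
print(kmeans(profiles, k=2))
```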


Copyright ©1999 by the National Academy of Sciences

Perou, Charles M. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 9212-9217

Fig. 5 (see Supplemental data at http://www.pnas.org for the full cluster diagram with all gene names)


Time Series

Goal: find patterns of co-expressed genes over time or partial time

Typical length is 3-10 time points

Cluster to find similar patterns (k-means, self-organizing maps)

Correlations to find genes that behave like a given gene of interest.

0 hours 4 hours 12 hours 24 hours
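
The correlation idea can be sketched as ranking genes by Pearson correlation with a gene of interest across the time points. The gene names and values below are invented, and `most_similar` is a hypothetical helper, not part of any package used in the lab.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def most_similar(query, profiles):
    """Other genes sorted from most to least correlated with the query gene."""
    others = [gene for gene in profiles if gene != query]
    return sorted(
        others, key=lambda g: pearson(profiles[query], profiles[g]), reverse=True
    )


# Hypothetical log expression at 0, 4, 12, and 24 hours.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.1, 2.2, 0.9],
}
print(most_similar("geneA", profiles))
```

With only 3-10 time points, high correlations arise by chance fairly often, so rankings like this are best treated as leads to follow up.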


Classification

Learn characteristic patterns from a training set and evaluate with a test set.

Classify tumor types based on expression patterns

Predict disease susceptibility, stages, etc.
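
A minimal train/test sketch, assuming a nearest-centroid rule (one of the simplest classifiers; the slides do not name a specific method). The sample values, class labels, and function names are all hypothetical.

```python
def train_centroids(samples, labels):
    """Average the training profiles of each class to get one centroid per class."""
    sums, counts = {}, {}
    for profile, label in zip(samples, labels):
        total = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, profile):
    """Return the class whose centroid is closest to the profile."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(profile, centroids[label])),
    )


# Hypothetical two-gene expression profiles.
train_x = [[1.0, 1.2], [2.0, 1.8], [8.1, 7.9], [9.0, 9.2]]
train_y = ["normal", "normal", "tumor", "tumor"]
test_x = [[1.5, 2.0], [8.5, 8.0]]
test_y = ["normal", "tumor"]

centroids = train_centroids(train_x, train_y)
predictions = [predict(centroids, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The key point is that accuracy is measured on samples that played no part in training; selecting genes using the test samples would make the estimate too optimistic.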


Source: “Practical Microarray Analysis”, Presentation by Benedikt Brors, German Cancer Research Center


Some Commonly Used Tools for Microarray Analysis

Oligonucleotide arrays

Affymetrix GeneChips

Nimblegen

Agilent


Oligonucleotides

An oligonucleotide is a short sequence of nucleotides (oligonucleotide = oligo for short).

An oligonucleotide microarray is a microarray whose probes consist of synthetically created DNA oligonucleotides.

Probe sequences are chosen to have good and relatively uniform hybridization characteristics.

A probe is chosen to match a portion of its target mRNA transcript that is unique to that sequence.

Oligo probes can distinguish among multiple mRNA transcripts with similar sequences.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Simplified Example

Figure: gene 1 and gene 2; shared green regions indicate high degree of sequence similarity throughout much of the transcript.

oligo probe for gene 1: ATTACTAAGCATAGATTGCCGTATA

oligo probe for gene 2: GCGTATGGCATGCCCGGTAAACTGG

Source: Dan Nettleton Course Notes Statistics 416/516X


Oligo Microarray Fabrication

Oligos can be synthesized and stored in solution.

Oligo sequences can be synthesized on a slide or chip using various commercial technologies.

The company Affymetrix uses a photolithographic approach which we will describe briefly.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Affymetrix GeneChips

Affymetrix (www.affymetrix.com) manufactures GeneChips.

GeneChips are oligonucleotide arrays.

Each gene (more accurately sequence of interest or feature) is represented by multiple short (25-nucleotide) oligo probes.

Some GeneChips include probes for around 120,000 genes and gene variants.

mRNA that has been extracted from a biological sample can be labeled (dyed) and hybridized to a GeneChip.

Only one sample is hybridized to each GeneChip.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Different Probe Pairs Represent Different Parts of the Same Gene

gene sequence

Probes are selected to be specific to the target gene and have good hybridization characteristics.

Source: Dan Nettleton Course Notes Statistics 416/516X


Affymetrix Probe Sets

A probe set is used to measure mRNA levels of a single gene.

Each probe set consists of multiple probe cells.

Each probe cell contains millions of copies of one oligo.

Each oligo is intended to be 25 nucleotides in length.

Probe cells in a probe set are arranged in probe pairs.

Each probe pair contains a perfect match (PM) probe cell and a mismatch (MM) probe cell.

A PM oligo perfectly matches part of a gene sequence.

A MM oligo is identical to a PM oligo except that the middle nucleotide (13th of 25) is replaced by its complementary nucleotide.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


A Probe Set for Measuring Expression Level of a Particular Gene

gene sequence: ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT...

perfect match sequence: AATGGGTCAGAAGGACTCCTATGTG

mismatch sequence: AATGGGTCAGAACGACTCCTATGTG

Figure labels: probe set, probe pair, probe cell

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton
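
The PM/MM relationship is easy to check in code. The sketch below derives a mismatch oligo from a perfect-match oligo by complementing the middle nucleotide, using the 25-mer shown on this slide; the function name `mismatch_probe` is made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_probe(pm):
    """Replace the middle base (13th of 25) of a PM oligo with its complement."""
    middle = len(pm) // 2  # index 12 for a 25-mer, i.e. the 13th nucleotide
    return pm[:middle] + COMPLEMENT[pm[middle]] + pm[middle + 1:]


gene_fragment = "TGCAATGGGTCAGAAGGACTCCTATGTGCCT"  # from the slide
pm = "AATGGGTCAGAAGGACTCCTATGTG"                   # perfect match from the slide

print(pm in gene_fragment)   # the PM oligo matches part of the gene sequence
print(mismatch_probe(pm))    # differs from the PM oligo only at position 13
```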


Affymetrix’s Photolithographic Approach

Figure labels: GeneChip; a series of masks; nucleotides (A, C, G, T) added to the growing oligos.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton

Source: www.affymetrix.com

Image from Hybridized GeneChip


Image Processing for Affymetrix GeneChips

Image processing for Affymetrix GeneChips is typically done using proprietary Affymetrix software.

The entire surface of a GeneChip is covered with square-shaped cells containing probes.

Probes are synthesized on the chip in precise locations.

Thus spot finding and image segmentation are not major issues.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton


Probe Cell

8 x 8 = 64 pixels; border pixels excluded.

The 75th percentile of the 36 pixel intensities corresponding to the center 36 pixels is used to quantify fluorescence intensity for each probe cell.

These values are called PM values for perfect-match probe cells and MM values for mismatch probe cells.

The PM and MM values are used to compute expression measures for each probe set.

Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY,Copyright © 2008 Dan Nettleton
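
The probe-cell summary can be sketched as follows: drop the one-pixel border of the 8 x 8 cell and take the 75th percentile of the remaining 36 intensities. The percentile rule used here (linear interpolation between ranks) is an assumption, since the slides do not say which convention the Affymetrix software uses, and the pixel values are invented.

```python
def percentile(values, q):
    """q-th percentile with linear interpolation between ranks (one common convention)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def probe_cell_intensity(cell):
    """Summarize an 8 x 8 probe cell by the 75th percentile of its center 6 x 6 pixels."""
    center = [cell[row][col] for row in range(1, 7) for col in range(1, 7)]
    return percentile(center, 75)


# Hypothetical cell: bright border pixels (excluded) around center values 1..36.
cell = [[1000] * 8 for _ in range(8)]
value = 1
for row in range(1, 7):
    for col in range(1, 7):
        cell[row][col] = value
        value += 1

print(probe_cell_intensity(cell))
```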


Normalization

Outputs from each individual probe pair are statistically combined to give an expression level for the gene represented by the probe set.

Normalization accounts for background noise on the chip, levels of control probes, etc.

Key methods are MAS5.0, RMA, GCRMA
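
To give a flavor of what these methods do, the sketch below shows quantile normalization, the between-chip step commonly described as part of RMA. It is not a full implementation of RMA (background correction and probe-set summarization are omitted), ties are broken arbitrarily, and the function name and intensities are hypothetical.

```python
def quantile_normalize(chips):
    """Force every chip to share the same intensity distribution.

    chips: list of equal-length lists, one list of probe intensities per chip.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices by rank
        result = [0.0] * n
        for rank, probe_index in enumerate(order):
            result[probe_index] = reference[rank]
        normalized.append(result)
    return normalized


# Hypothetical intensities for three probes on two chips.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After this step each chip has exactly the same set of values, so remaining differences between chips reflect only the ranking of probes within each chip.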


Summary of Microarrays

Positives: commercial chips are accurate and repeatable in experienced hands and the statistics and modeling have been well-explored

Negatives: cost; you can only see what is on the chip, and chips are difficult to update as new knowledge emerges.


Short Read Sequencing

Sequencing technology has evolved in the last 15 years

Eventual goal is to be able to sequence a genome for $1000 (NIH).

Why not just sequence the transcriptome directly and see what is there?


Sequencing by synthesis (454)

Takes a single strand of DNA and synthesizes its complementary strand enzymatically one base pair at a time, detecting which base was actually added at each step.

Pyrosequencing detects the activity of DNA polymerase with a chemiluminescent enzyme.

Reads are about 400-500 bp
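
A toy model of the detection step, assuming the flow-based scheme usually described for pyrosequencing: one nucleotide type is offered at a time, and the light signal is proportional to how many bases get incorporated. The flow order `TACG`, the function name, and the template are assumptions for illustration; real signals are noisy, which makes long runs of the same base hard to count.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template, flow_order="TACG", max_cycles=100):
    """Simulate idealized signals: (nucleotide offered, number of bases incorporated)."""
    signals = []
    position = 0
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            incorporated = 0
            # The polymerase extends while the next template base pairs with this nucleotide.
            while position < len(template) and COMPLEMENT[template[position]] == nucleotide:
                incorporated += 1
                position += 1
            signals.append((nucleotide, incorporated))
            if position == len(template):
                return signals
    return signals


def read_from_flowgram(signals):
    """Rebuild the synthesized strand from the recorded signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


# Hypothetical template strand.
signals = flowgram("AACG")
print(signals)
print(read_from_flowgram(signals))
```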


Other Technologies

Illumina Solexa: 40-100 bp, tag DNA or RNA at both ends

ABI SOLiD: around 50 bp


Digital Gene Expression

Sequence census methods for functional genomics, Barbara Wold & Richard M Myers