
Functional Genomics Introduction

Julie A Dickerson

Electrical and Computer Engineering

Iowa State University

Module Structure: Day 1

Introduction to Functional Genomics

Transcriptomics

Analysis and Experiment Design for Microarray Data (Dr. Peng Liu)

RNA-Seq Data (Mr. Kun Liang)

Lab: Using R for normalizing and processing microarray data, and clustering analysis of ‘omics data (John Van Hemert)

June 15, 2010

BBSI - 2010

Module Structure: Day 2

Metabolomics (Dr. Ann Perera)

Proteomics (Dr. Young-Jin Lee)

Pathways and data integration methods (Dr. Julie Dickerson and Erin Boggess)

Lab: Analyzing integrated sets of microarray, proteomics, and metabolomics data (Erin Boggess)

Outline

Module Structure

What is Functional Genomics?

Data Types Available

Transcriptomics

Basic biology behind microarrays

What can you learn from microarrays?

Types of arrays

Limitations of microarrays

Functional Genomics Definition

Functional genomics is a field of molecular biology that attempts to make use of the data produced by genomic projects to describe gene (and protein) functions and interactions. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects such as DNA sequence or structures.

From Wikipedia, the free encyclopedia

Genome Wide View of Metabolism

Streptococcus pneumoniae

Explore capabilities of global network

How do we go from a pretty picture to a model we can manipulate?

Metabolic Pathways

Metabolites: glucose

Enzymes: phosphofructokinase

Reactions & Stoichiometry: 1 F6P => 1 FBP

Kinetics

Regulation: gene regulation, metabolite regulation
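One way to move from a pathway picture toward a model that can be manipulated is to store each reaction's stoichiometry as data. The following is a minimal sketch, not a kinetic model: the `REACTIONS` dictionary, the `apply_reaction` helper, and the pool sizes are hypothetical names and numbers chosen for illustration, and the reaction is the simplified 1 F6P => 1 FBP form shown on the slide.

```python
# Hypothetical representation: each reaction maps a metabolite to its
# stoichiometric coefficient (negative = consumed, positive = produced).
REACTIONS = {
    "phosphofructokinase": {"F6P": -1, "FBP": +1},  # simplified: 1 F6P => 1 FBP
}


def apply_reaction(pools, reaction, extent):
    """Return new metabolite pools after the reaction proceeds by `extent`."""
    updated = dict(pools)
    for metabolite, coefficient in REACTIONS[reaction].items():
        updated[metabolite] = updated.get(metabolite, 0.0) + coefficient * extent
    return updated


# Illustrative starting pools (arbitrary units, not measured values).
pools = apply_reaction({"F6P": 2.0, "FBP": 0.5}, "phosphofructokinase", 0.5)
print(pools)
```

Kinetics and regulation would determine how large `extent` is at each step; this sketch only captures the bookkeeping.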

hexokinase

phosphoglucoisomerase

phosphofructokinase

aldolase

triosephosphate isomerase

G3P dehydrogenase

phosphoglycerate kinase

phosphoglycerate mutase

enolase

pyruvate kinase

Metabolic Modeling: The Dream


Data Types Available for Determining Function

Genomes: Sequence

Genes: Microarrays, Nextgen sequencing

Proteins: Proteomics

Metabolites: Metabolomics

Phenotypes: Phenomics

A VERY Simplified Eukaryotic Cell

nucleus

chromosome

DNA strands

DNA contains thousands of genes.

cytoplasm

Post-transcriptional Modifications to Primary Transcript

Primary transcript: 5’ UTR, intervening sequences corresponding to introns that are removed through splicing, 3’ UTR

Primary transcript after modification, messenger RNA (mRNA): 5’ cap, 5’ UTR, coding portions of RNA sequence corresponding to exons, 3’ UTR, poly-A tail (AAAAAA...AAAA)

Transcription takes place inside the nucleus.

nucleus

chromosome

DNA strands cytoplasm

Translation takes place outside the nucleus.

Translation

Ribosome

amino acid sequence

folds to become a protein

During translation transfer RNA (tRNA) translates the genetic code

... ...A A C GU GU

codon codon

tRNA anticodon

amino acids

The Genetic Code (mRNA codon, amino acid)

                         Second Base
First Base   U          C          A           G

U            UUU phe    UCU ser    UAU tyr     UGU cys
             UUC phe    UCC ser    UAC tyr     UGC cys
             UUA leu    UCA ser    UAA STOP    UGA STOP
             UUG leu    UCG ser    UAG STOP    UGG trp

C            CUU leu    CCU pro    CAU his     CGU arg
             CUC leu    CCC pro    CAC his     CGC arg
             CUA leu    CCA pro    CAA gln     CGA arg
             CUG leu    CCG pro    CAG gln     CGG arg

A            AUU ile    ACU thr    AAU asn     AGU ser
             AUC ile    ACC thr    AAC asn     AGC ser
             AUA ile    ACA thr    AAA lys     AGA arg
             AUG met    ACG thr    AAG lys     AGG arg

G            GUU val    GCU ala    GAU asp     GGU gly
             GUC val    GCC ala    GAC asp     GGC gly
             GUA val    GCA ala    GAA glu     GGA gly
             GUG val    GCG ala    GAG glu     GGG gly
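The genetic code table can be read as a simple lookup from codon to amino acid. The sketch below builds that lookup and translates an mRNA string codon by codon until a STOP codon; the names `CODON_TABLE` and `translate` and the example sequence are illustrative, and the sketch assumes translation starts at the first base and ignores any incomplete trailing codon.

```python
from itertools import product

BASES = "UCAG"

# Amino acids listed in the order first base, second base, third base,
# each running through U, C, A, G (three-letter codes as in the table).
AMINO_ACIDS = (
    "phe phe leu leu ser ser ser ser tyr tyr STOP STOP cys cys STOP trp "
    "leu leu leu leu pro pro pro pro his his gln gln arg arg arg arg "
    "ile ile ile met thr thr thr thr asn asn lys lys ser ser arg arg "
    "val val val val ala ala ala ala asp asp glu glu gly gly gly gly"
).split()

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string into a list of amino acids, stopping at STOP."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGUUUGGAUAA"))  # illustrative sequence, not from a real gene
```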

Miscellaneous Comments

The biology is more complicated than I described.

Humans have roughly 20,000 to 25,000 protein-coding genes; earlier estimates were around 30,000. (The exact number is a subject for debate.) Regulation of these genes seems to be more important than their number!

Much of the variation is created by differences in how cells use the genes they have.

Microarrays are a tool that can help us understand how cells of various types use their genes in response to varying conditions.

04/21/23 BCB570 Gene Expression Data Analysis

Microarrays

With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes.

Only a fraction of these genes are turned on, however, and it is the subset that is "expressed" that confers unique properties to each cell type.

"Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells.


Microarrays

Microarrays work by exploiting the ability of a given mRNA molecule (target) to bind specifically to, or hybridize to, the DNA template (probe) from which it originated.

Regulation of gene expression acts as both an "on/off" switch to control which genes are expressed in a cell and a "volume control" that increases or decreases the level of expression of particular genes as necessary.

Source: The Genetic Science Learning Center, University of Utah


DNA Microarrays

Small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations.

The DNA is printed, spotted, or actually synthesized directly onto the support.

The spots themselves can be DNA, complementary DNA (cDNA, DNA synthesized from an mRNA template), or oligonucleotides (oligos, short fragments of single-stranded DNA that are typically 5 to 50 nucleotides long).


Why do microarray experiments?

Comparing two conditions to find differentially expressed genes: Control/treatment, Disease/normal

Compare more than two conditions, some of which may interact: Different treatments, different strains

Exploratory analysis: What genes are expressed under drought stress?


Why use microarrays (cont)?

What happens over time? Developmental stages

Predicting certain conditions (cancer vs. normal)

Patterns of gene expression that characterize a patient’s or organism’s response


Differentially Expressed Genes

Find genes that show a large difference in expression between groups and are similar within a group

Statistical tests look at whether the groups have different means (t-test) or different variances (chi-squared, F-statistics).

Adapted from “Practical Microarray Analysis”, Presentation by Benedikt Brors, German Cancer Research Center
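As a rough illustration of a two-group comparison for a single gene, the sketch below computes a t-statistic for the difference in means. It uses the Welch (unequal-variance) form, which is one common variant and an assumption here, since the slide does not say which t-test is meant. The function name `welch_t` and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """t-statistic for mean(group_a) - mean(group_b), unequal variances allowed."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Made-up log-scale expression values for one gene.
control = [7.1, 6.9, 7.0]
treatment = [9.0, 9.2, 8.8]

t_statistic = welch_t(treatment, control)
print(t_statistic)
```

A large absolute value means the difference between groups is large relative to the variation within each group. With thousands of genes tested at once, the resulting p-values would also need a multiple-testing adjustment.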


Multiple Conditions

Are there differences in expression level between the k conditions?

Analysis of Variance (ANOVA)

Mutant 1 Mutant 2

Inoculated Control Inoculated Control
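For k conditions, a one-way ANOVA compares the variation between condition means with the variation within conditions. The sketch below computes the F-statistic for one gene; `one_way_anova_f` is an illustrative name and the values are made up. It treats the four mutant-by-treatment combinations as four separate groups, which is a simplification: a factorial design like the one above would normally be analyzed with a two-way model that includes an interaction term.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA over a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = mean([value for group in groups for value in group])

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical expression values for one gene in four conditions.
conditions = {
    "mutant 1, inoculated": [8.1, 8.4, 8.0],
    "mutant 1, control": [6.0, 6.2, 5.9],
    "mutant 2, inoculated": [6.1, 6.3, 6.0],
    "mutant 2, control": [6.0, 5.8, 6.1],
}
f_statistic = one_way_anova_f(list(conditions.values()))
print(f_statistic)
```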

Some Example Microarray Experiments from Iowa State University

Jim Reecy from Animal Science: muscle undergoing hypertrophy vs. normal muscle

David Putthoff, Steve Rodermel, Thomas Baum from Plant Pathology: roots infected with soybean cyst nematodes vs. uninfected roots

Anne Bronikowski in Genetics: wheel-running mice vs. non-runners

Roger Wise, Rico Caldo in Plant Pathology: interaction between multiple isolates of powdery mildew and multiple genotypes of barley.

Wild-type vs. Myostatin Knockout Mice

Belgian Blue cattle have a mutation in the myostatin gene.

Identifying Genes Involved in Pathways That Distinguish Compatible from Incompatible Interactions

Barley Genotype: Mla6, Mla13, Mla1

Figure: grid of barley genotypes, with each combination labeled Incompatible or Compatible.

Caldo, Nettleton, Wise (2004). The Plant Cell. 16, 2514-2528.

An Example Gene of Interest

Figure: expression plotted against hours after inoculation for incompatible and compatible interactions.


Exploratory Analysis

Find patterns in data to see what genes are expressed under different conditions

Analysis includes clustering methods

Used when little or no prior knowledge exists about the problem

Perou, Charles M. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 9212-9217

Fig. 5 (see Supplemental data at www.pnas.org for the full cluster diagram with all gene names)


Time Series

Goal: find patterns of co-expressed genes over time or partial time

Typical length is 3-10 time points

Cluster to find similar patterns (k-means, self-organizing maps)

Correlations to find genes that behave like a given gene of interest.

0 hours 4 hours 12 hours 24 hours
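The correlation approach can be sketched as follows: compute the Pearson correlation between the time profile of a gene of interest and every other gene, then rank the genes. The gene names, the expression values at the four time points, and the helper names `pearson` and `genes_like` are all hypothetical. Returning 0.0 for a flat profile is a convenience assumed here to avoid division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    denominator = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denominator == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def genes_like(profiles, gene_of_interest, top=2):
    """Rank other genes by correlation with the gene of interest."""
    reference = profiles[gene_of_interest]
    scores = [
        (name, pearson(reference, profile))
        for name, profile in profiles.items()
        if name != gene_of_interest
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]


# Made-up expression values at 0, 4, 12, and 24 hours.
profiles = {
    "gene_a": [1.0, 2.0, 4.0, 8.0],
    "gene_b": [2.0, 4.0, 8.0, 16.0],
    "gene_c": [8.0, 4.0, 2.0, 1.0],
    "gene_d": [5.0, 5.0, 5.0, 5.0],
}
ranked = genes_like(profiles, "gene_a")
print(ranked)
```

With only 3-10 time points, high correlations arise by chance fairly often, so a ranking like this is a starting point for follow-up rather than evidence of co-regulation.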


Classification

Learn characteristic patterns from a training set and evaluate with a test set.

Classify tumor types based on expression patterns

Predict disease susceptibility, stages, etc.

Source: “Practical Microarray Analysis”, Presentation by Benedikt Brors, German Cancer Research Center
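The train-then-test idea can be illustrated with a nearest-centroid classifier, chosen here only because it is short; it is one of many possible classifiers and is not named in the slides. All sample values, class labels, and function names below are hypothetical.

```python
def centroid(samples):
    """Average expression vector of a list of samples."""
    return [sum(values) / len(samples) for values in zip(*samples)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(training_set):
    """Learn one centroid per class from {label: [expression vectors]}."""
    return {label: centroid(samples) for label, samples in training_set.items()}


def classify(model, sample):
    """Assign the label whose centroid is closest to the sample."""
    return min(model, key=lambda label: squared_distance(model[label], sample))


def accuracy(model, test_set):
    """Fraction of (sample, label) pairs in the test set classified correctly."""
    hits = sum(classify(model, sample) == label for sample, label in test_set)
    return hits / len(test_set)


# Made-up expression values for three genes per sample.
training_set = {
    "tumor": [[9.0, 2.0, 7.5], [8.5, 2.5, 7.0]],
    "normal": [[3.0, 6.0, 2.0], [3.5, 6.5, 2.5]],
}
test_set = [([8.8, 2.2, 7.2], "tumor"), ([3.2, 6.1, 2.4], "normal")]

model = train(training_set)
print(accuracy(model, test_set))
```

Keeping the test samples out of training is the essential point: accuracy measured on the training set itself would overstate how well the patterns generalize.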

Some Commonly Used Tools for Microarray Analysis

Oligonucleotide arrays

Affymetrix GeneChips

Nimblegen

Agilent

Oligonucleotides

An oligonucleotide is a short sequence of nucleotides (oligonucleotide = oligo for short).

An oligonucleotide microarray is a microarray whose probes consist of synthetically created DNA oligonucleotides.

Probe sequences are chosen to have good and relatively uniform hybridization characteristics.

A probe is chosen to match a portion of its target mRNA transcript that is unique to that sequence.

Oligo probes can distinguish among multiple mRNA transcripts with similar sequences.


Simplified Example

gene 1

gene 2

Shared green regions indicate a high degree of sequence similarity throughout much of the transcript.

Oligo probe for gene 1: ATTACTAAGCATAGATTGCCGTATA

Oligo probe for gene 2: GCGTATGGCATGCCCGGTAAACTGG

... ...

Source: Dan Nettleton Course Notes Statistics 416/516X
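The idea of choosing a probe from a region unique to one transcript can be sketched as a search for windows of the target sequence that do not occur in any similar transcript. This is only the uniqueness part: real probe design also weighs hybridization characteristics, and strand orientation is ignored here. The function name `candidate_probes` and the toy sequences are made up, and a window length of 5 is used instead of 25 to keep the example readable.

```python
def candidate_probes(target, other_transcripts, k=25):
    """Return the k-long windows of `target` absent from all other transcripts."""
    windows = (target[i:i + k] for i in range(len(target) - k + 1))
    return [
        window
        for window in windows
        if all(window not in other for other in other_transcripts)
    ]


# Toy transcripts that share most of their sequence and differ near the end.
gene_1 = "ATTACGGGCCATA"
gene_2 = "ATTACGGGCCTTT"

probes_for_gene_1 = candidate_probes(gene_1, [gene_2], k=5)
print(probes_for_gene_1)
```

Only windows overlapping the region where the two transcripts differ survive, which mirrors the figure: probes are placed outside the shared regions.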

Oligo Microarray Fabrication

Oligos can be synthesized and stored in solution.

Oligo sequences can be synthesized on a slide or chip using various commercial technologies.

The company Affymetrix uses a photolithographic approach which we will describe briefly.

Affymetrix GeneChips

Affymetrix (www.affymetrix.com) manufactures GeneChips.

GeneChips are oligonucleotide arrays.

Each gene (more accurately sequence of interest or feature) is represented by multiple short (25-nucleotide) oligo probes.

Some GeneChips include probes for around 120,000 genes and gene variants.

mRNA that has been extracted from a biological sample can be labeled (dyed) and hybridized to a GeneChip.

Only one sample is hybridized to each GeneChip.


Different Probe Pairs Represent Different Parts of the Same Gene

gene sequence

Probes are selected to be specific to the target gene and have good hybridization characteristics.

Source: Dan Nettleton Course Notes Statistics 416/516X

Affymetrix Probe Sets

A probe set is used to measure mRNA levels of a single gene.

Each probe set consists of multiple probe cells.

Each probe cell contains millions of copies of one oligo.

Each oligo is intended to be 25 nucleotides in length.

Probe cells in a probe set are arranged in probe pairs.

Each probe pair contains a perfect match (PM) probe cell and a mismatch (MM) probe cell.

A PM oligo perfectly matches part of a gene sequence.

A MM oligo is identical to a PM oligo except that the middle nucleotide (13th of 25) is replaced by its complementary nucleotide.

A Probe Set for Measuring Expression Level of a Particular Gene

Gene sequence: ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT...

Perfect match sequence: AATGGGTCAGAAGGACTCCTATGTG

Mismatch sequence: AATGGGTCAGAACGACTCCTATGTG

Figure labels: probe cell, probe pair, probe set
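The rule for deriving a mismatch oligo from a perfect match oligo is simple enough to write down directly. The sketch below replaces the middle nucleotide (the 13th of 25) with its complement and checks the result against the sequences shown on the slide; the function name `mismatch_probe` is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match):
    """Replace the middle nucleotide of a PM oligo with its complement."""
    middle = len(perfect_match) // 2  # index 12, the 13th base of a 25-mer
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


pm = "AATGGGTCAGAAGGACTCCTATGTG"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Applying the function twice returns the original perfect match sequence, since only one position changes and complementation is its own inverse.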


Affymetrix’s Photolithographic Approach

Figure: a GeneChip and a series of masks.

Source: www.affymetrix.com

Image from Hybridized GeneChip

Image Processing for Affymetrix GeneChips

Image processing for Affymetrix GeneChips is typically done using proprietary Affymetrix software.

The entire surface of a GeneChip is covered with square-shaped cells containing probes.

Probes are synthesized on the chip in precise locations.

Thus spot finding and image segmentation are not major issues.

Probe Cell

8 x 8 = 64 pixels; border pixels excluded.

The 75th percentile of the 36 pixel intensities corresponding to the center 36 pixels is used to quantify fluorescence intensity for each probe cell.

These values are called PM values for perfect-match probe cells and MM values for mismatch probe cells.

The PM and MM values are used to compute expression measures for each probe set.
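The probe cell quantification described above can be sketched in a few lines: drop the border pixels of the 8 x 8 cell and take the 75th percentile of the remaining 36 intensities. The interpolation rule used by `statistics.quantiles` with `method="inclusive"` is an assumption, since the exact percentile definition in the Affymetrix software is not given here. The function name and pixel values are made up.

```python
from statistics import quantiles


def probe_cell_intensity(pixels):
    """75th percentile of the center 6 x 6 pixels of an 8 x 8 probe cell."""
    center = [value for row in pixels[1:-1] for value in row[1:-1]]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(center, n=4, method="inclusive")[2]


# Made-up cell: bright border pixels (ignored) around center values 1..36.
cell = [[1000] * 8 for _ in range(8)]
center_values = iter(range(1, 37))
for row in range(1, 7):
    for column in range(1, 7):
        cell[row][column] = next(center_values)

print(probe_cell_intensity(cell))
```

Excluding the border keeps signal from neighboring cells out of the measurement.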

Normalization

Outputs from each individual probe pair are statistically combined to give an expression level for the gene represented by the probe set.

Normalization accounts for background noise on the chip, levels of control probes, etc

Key methods are MAS5.0, RMA, GCRMA
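As one concrete example of a normalization step, the sketch below implements quantile normalization, which RMA applies across chips between background correction and probe-set summarization. It is a bare-bones illustration, not the RMA implementation: ties are broken by position rather than averaged, and the function name and intensity values are hypothetical.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    `chips` is a list of equal-length lists, one list of probe intensities
    per chip. The k-th smallest value on each chip is replaced by the mean
    of the k-th smallest values across all chips.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [
        sum(chip[rank] for chip in sorted_chips) / len(chips)
        for rank in range(n_probes)
    ]

    normalized = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda index: chip[index])
        result = [0.0] * n_probes
        for rank, index in enumerate(order):
            result[index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Made-up intensities for three probes on two chips.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After normalization each chip contains the same set of values, so the remaining differences between chips lie only in which probes hold which ranks.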

Summary of Microarrays

Positives: commercial chips are accurate and repeatable in experienced hands and the statistics and modeling have been well-explored

Negatives: cost, can only see what is on the chip and difficult to update to new knowledge.


Short Read Sequencing

Sequencing technology has evolved in the last 15 years

Eventual goal is to be able to sequence a genome for $1000 (NIH).

Why not just sequence the transcriptome directly and see what is there?


Sequencing by synthesis (454)

Takes a single strand of DNA and synthesizes its complementary strand enzymatically one base at a time, detecting which base was actually added at each step.

Pyrosequencing detects the activity of DNA polymerase with a chemiluminescent enzyme.

Reads are about 400-500 bp


Other Technologies

Illumina Solexa: 40-100 bp, tag DNA or RNA at both ends

ABI SOLiD: around 50 bp

Digital Gene Expression

Sequence census methods for functional genomics, Barbara Wold & Richard M Myers
