BIONF/BENG 203: Functional Genomics Lecture TI 1 Trey Ideker UCSD Department of Bioengineering Sources of Functional Data Lectures 1 and 2

Upload: briana-lane

Post on 20-Jan-2016


Page 1

BIONF/BENG 203: Functional Genomics

Lecture TI 1
Trey Ideker
UCSD Department of Bioengineering

Sources of Functional Data
Lectures 1 and 2

Page 2

Grading

40% Problem Sets (best 4 of 5)
30% Midterm
30% Final Project

Page 3

Outline of the course

[Course flow diagram, with lecture counts in parentheses]

Biological data sources (2)

Data pre-processing (6)

Unsupervised: Clustering, Inference

Supervised: Classification

Population Genetics and Linkage

Single Source (3) (3) (1)

Multi-Source (2)

Final Project

Project Presentations (2)

Total of 17 lectures

Page 4

Functional Genomics Data

– Expression: mRNA, protein

– Molecular interactions: protein, mRNA, small molecules

– Knockout phenotypes: 1st, 2nd, higher orders

– SNP sequence (polymorphism) data

– Imaging data: sub-cellular localization, cell morphology

– Gene ontology

Page 5

Dividing the data into two classes of information: Biological Networks and Network States

1) Directly observe the network “wires” themselves

Protein-protein interactions: Two-hybrid system, coIP, protein antibody arrays (databases: BIND, DIP)

Protein-DNA interactions: Chromatin IP (databases: BIND, Transfac, SCPD)

Other types not yet possible: e.g., protein-small molecule

2) Observe molecular states that result from the interaction wiring

DNA/RNA gene expression: DNA microarrays, SAGE

Protein levels, locations, and modifications: Mass spectrometry, fluorescence microscopy, protein arrays

Gross phenotypes: e.g., growth rates of single and double deletion strains

Page 6

High-throughput methods for measuring cellular states

Gene expression levels: RT-PCR, arrays

Protein levels, modifications: mass spec

Protein locations: fluorescent tagging

Metabolite levels: NMR and mass spec

Systematic phenotyping

Page 7

The transcriptome and proteome

The transcriptome is the full complement of RNA molecules produced by a genome

The proteome is the full complement of proteins enabled by the transcriptome

DNA → RNA → protein
Genome → transcriptome → proteome
30,000 genes → ??? RNAs → ??? proteins?

For example, the Drosophila gene Dscam can generate about 40,000 distinct transcripts through alternative splicing.

What is the minimum number of exons that would be required?
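One way to approach the question above is a minimal sketch under a simplifying assumption that is not in the slides: each alternative exon is independently included or skipped, so n optional exons give 2^n possible transcripts. The function name is illustrative. For comparison, Dscam itself uses four clusters of mutually exclusive exons (12, 48, 33 and 2 alternatives), which multiply to a number close to the quoted 40,000.

```python
import math

def min_cassette_exons(n_transcripts: int) -> int:
    """Smallest n with 2**n >= n_transcripts.

    Assumption: every optional exon is spliced in or out independently.
    """
    n = 0
    while 2 ** n < n_transcripts:
        n += 1
    return n

# 2**15 = 32,768 is too few and 2**16 = 65,536 is enough.
print(min_cassette_exons(40_000))  # 16

# Dscam's mutually exclusive exon clusters: one choice from each cluster.
dscam_clusters = [12, 48, 33, 2]
print(math.prod(dscam_clusters))   # 38016
```

Under the independent-exon assumption the answer is 16 optional exons; mutually exclusive clusters reach a similar count with a different exon layout.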

Page 8

Expression: High-throughput approaches

RNA
– DNA Microarrays
– cDNA / EST sequencing
– RT-PCR
– Differential display
– SAGE
– Massively parallel signature sequencing (MPSS)

Proteins
– 2D PAGE
– Mass spectrometry

Page 9

Gene expression arrays

They are really, really, really, really, really, really, really, really, really, really, really, really, really important

Page 10

Microarrays

Monitor the level of each gene:

Is it turned on or off in a particular biological condition?

Is this on/off state different between two biological conditions?

A microarray is a rectangular grid of spots printed on a glass microscope slide, where each spot contains DNA for a different gene

Page 11

Two-color DNA microarray design

Reverse Transcription

Page 12

cDNA-chip of brain glioblastoma

Page 13

Types of microarrays

Spotted (cDNA)
– Robotic transfer of cDNA clones or PCR products
– Spotting on nylon membranes or glass slides coated with poly-lysine

Synthetic (oligo)
– Direct oligo synthesis on solid microarray substrate
– Uses photolithography (Affymetrix) or ink-jet printing (Agilent)

All configurations assume the DNA on the array is in excess of the hybridized sample, so the kinetics are linear and the spot intensity reflects the amount of hybridized sample.

Labeling can be radioactive, fluorescent (one-color), or two-color

Page 14

Microarray Spotter

Page 15

Affymetrix High Density Arrays

Page 16

Microarrays (continued)

Imaging
– Radioactive 32P labeling: Autoradiography or phosphorimager
– Fluorescent labeling: Confocal microscope (invented by Marvin Minsky!!)

Feature density
– Nylon membrane macroarrays: 100-1000 features
– Glass slide spotted array: 5,000 features / cm²
– Synthesized arrays: 50,000 features / cm²

Page 17

Microarray confocal scanner

Collects sharply defined optical sections from which 3D renderings can be created

The key is spatial filtering to eliminate out-of-focus light or glare in specimens whose thickness exceeds the immediate plane of focus.

– Two lasers for excitation
– Two color scan in less than 10 minutes
– High resolution, 10 micron pixel size

Page 18

cDNA / EST sequencing projects

cDNA = complementary or copy DNA
EST = Expressed Sequence Tag

The microarray could be described as a “closed system” because information about RNAs is limited by the targets available for hybridization. RNAs not represented on the array are not interrogated.

Direct sequencing of cDNAs (yielding ESTs) overcomes this problem by large-scale random sampling of sequences from a whole-cell RNA extract

Statistical counting of distinct sequences provides an estimate of expression level

Conversely, cDNA library can be normalized to capture rare messages

Requires large scale sequencing to get statistical significance
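To see why random sampling needs large-scale sequencing, here is a minimal sketch that treats each sequenced clone as an independent draw from the mRNA pool (a simplifying assumption; real libraries have cloning biases). The function names and the example abundance of 1 message in 100,000 are hypothetical.

```python
import math

def prob_detect(abundance: float, n_clones: int) -> float:
    """P(at least one sequenced clone comes from a transcript that
    makes up the fraction `abundance` of the mRNA pool)."""
    return 1.0 - (1.0 - abundance) ** n_clones

def clones_needed(abundance: float, confidence: float = 0.95) -> int:
    """Smallest number of clones giving at least `confidence` chance
    of seeing the transcript once."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))

rare = 1e-5  # hypothetical rare message: 1 in 100,000 transcripts
n = clones_needed(rare)
print(n, round(prob_detect(rare, n), 4))
```

For this hypothetical abundance the answer is on the order of 300,000 clones just to observe the message once, and estimating its level with any precision would take several times more.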

Page 19

cDNA / EST Sequencing: Preparation of a cDNA library in phage vector

Page 20

SAGE Technology

Serial Analysis of Gene Expression

Takes idea of sequence sampling to the extreme

Generates short ESTs (9-14 nt) which are joined into long concatemers and then sequenced

4^9 is 262,144, ~5-fold the number of human genes

The count of each type of tag estimates RNA copy number

>50X more efficient than cDNA sequencing because many RNAs are represented in a single sequencing run
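The figure of 262,144 possible tags on this slide is 4 raised to a tag length of 9. A small sketch of that arithmetic follows; the function name is illustrative, and the gene count is only what the slide's "~5-fold" implies, not an independent estimate.

```python
def tag_space(tag_length: int) -> int:
    """Number of distinct DNA tags of the given length (4 bases per position)."""
    return 4 ** tag_length

for k in (9, 10, 14):
    print(k, tag_space(k))

# Gene count implied by "~5-fold" for 9-nt tags (an inference from the slide).
print(tag_space(9) / 5)
```

Longer tags enlarge the space quickly, which reduces the chance that two different transcripts share a tag.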

Page 21

Steps to SAGE

– Copy mRNA → ds cDNA using biotinylated oligo(dT)
– Cleave with anchoring enzyme (AE), which cleaves within ~250 bp of the poly-A tail at the 3’ end
– Capture this segment on streptavidin beads
– Ligate to linkers containing a type IIs restriction site, whose enzyme cleaves DNA 14 bp away from this site
– Ligate sequences to each other and PCR amplify
– Cleave with AE to remove linkers
– Concatenate, clone, and sequence

Page 22

WHY DI-TAGS?

Ditags are used to detect bias in the PCR amplification step.

The probability of any two tags being coupled in the same ditag is small.

Biased amplification can be detected as many ditags always having the same 2 tags present.

Velculescu et al. Science (1995)

[Figure: tags A and B joined into ditags and amplified with Primer A and Primer B]
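The ditag check can be sketched as follows, assuming each ditag has already been reduced to a pair of tag identifiers. The data and function names are hypothetical, and collapsing repeated ditags to one copy is one common way to act on the check, not necessarily the procedure used in the cited paper.

```python
from collections import Counter

def normalize(ditag):
    """Treat (tag1, tag2) and (tag2, tag1) as the same ditag."""
    return tuple(sorted(ditag))

def repeated_ditags(ditags):
    """Return ditags seen more than once, with their copy numbers."""
    counts = Counter(normalize(d) for d in ditags)
    return {d: n for d, n in counts.items() if n > 1}

def tag_counts(ditags):
    """Count tags after collapsing repeated ditags to a single copy."""
    unique = {normalize(d) for d in ditags}
    return Counter(tag for d in unique for tag in d)

# Hypothetical data: the A/B ditag appears three times, as PCR bias would produce.
ditags = [("A", "B"), ("B", "A"), ("A", "B"), ("A", "C"), ("D", "B")]
print(repeated_ditags(ditags))  # {('A', 'B'): 3}
print(tag_counts(ditags))
```

Because two given tags rarely pair by chance, a ditag seen many times is more likely an amplification artifact than a sign of high expression.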

Page 23

SAGE (continued)

Example of a concatemer:

CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG

TAG1 TAG2 TAG3 TAG4

Counting the tags:

Tag Sequence    Count
ATCTGAGTTC      1075
GCGCAGACTT      125
TCCCCGTACA      112
TAGGACGAGG      92
GCGATGGCGG      91
TAGCCCAGAT      83
GCCTTGTTTA      80
GCGATATTGT      66
TACGTTTCCA      66
TCCCGTACAT      66
TCCCTATTAA      66
GGATCACAAT      55
AAGGTTCTGG      54
CAGAACCGCG      50
GGACCGCCCC      48
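Tag counting from a sequenced concatemer can be sketched as below. Several things here are assumptions rather than statements from the slides: the example concatemer is read as ditags separated by CATG anchoring sites, tags are taken as the outer 10 nt of each ditag (matching the 10-nt tags in the count table), and the second tag of each ditag is reverse-complemented because it comes from the opposite strand. Function names are illustrative.

```python
from collections import Counter

ANCHOR = "CATG"   # anchoring-enzyme site separating ditags in this example
TAG_LENGTH = 10   # assumption: matches the 10-nt tags in the count table

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def tags_from_concatemer(concatemer, tag_length=TAG_LENGTH):
    """Split on the anchor site and take one tag from each end of every ditag."""
    tags = []
    for ditag in concatemer.split(ANCHOR):
        if len(ditag) < 2 * tag_length:
            continue  # skip empty end pieces and fragments too short for two tags
        tags.append(ditag[:tag_length])
        tags.append(reverse_complement(ditag[-tag_length:]))
    return tags

concatemer = "CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG"
tags = tags_from_concatemer(concatemer)
print(tags)
print(Counter(tags).most_common())
```

Running the same function over many concatemers and summing the counters yields a tag table of the kind shown on this slide.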

Page 24

Proteomics

SDS PAGE

2D PAGE

MS/MS

Page 25

An example SDS-PAGE

Protein stains: Silver, Copper, Coomassie Blue

How many proteins are in a band?

Page 26

2D-PAGE

Dimension 1: Isoelectric focusing gel

Dimension 2: size

Page 27

2D gel from macrophage phagosomes

Page 28

Mass spectrometry

Mass spectrometers consist of three essential parts

– Ionization source: Converts peptides into gas-phase ions (MALDI + ESI)

– Mass analyzer: Separates ions by mass to charge (m/z) ratio (Ion trap, time of flight, quadrupole)

– Ion detector: Current over time indicates amount of signal at each m/z value

Page 29

MS/MS Overview

Page 30

MS/MS Overview

Page 31

Page 32

Page 33

A raw fragmentation spectrum

By calculating the molecular weight difference between ions of the same type, the sequence can be determined.

SEQUEST uses the fragmentation pattern to search through a complete protein database to identify the sequence which best fits the pattern.
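Reading a sequence from mass differences can be illustrated with an idealized ladder. This is a minimal sketch, not how SEQUEST works (SEQUEST matches spectra against database sequences): it assumes a complete, noise-free series of singly charged b ions and unmodified residues, and leucine and isoleucine share one entry because they have identical mass. The peptide EACDPLR is borrowed from the ICAT slide purely as a test string.

```python
PROTON = 1.00728  # mass of a proton, added to a b ion carrying charge 1+

# Monoisotopic residue masses in Da; "L" stands for Leu or Ile.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def b_ions(peptide):
    """m/z of the singly charged b-ion series for a peptide."""
    total, ladder = PROTON, []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder

def read_ladder(mz_values, tol=0.01):
    """Infer residues from differences between consecutive ions."""
    sequence, previous = [], PROTON
    for mz in sorted(mz_values):
        delta = mz - previous
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        sequence.append(hits[0] if len(hits) == 1 else "?")
        previous = mz
    return "".join(sequence)

print(read_ladder(b_ions("EACDPLR")))  # EACDPLR
```

Real spectra have missing and extra peaks and mixed ion types, which is one reason database search is used in practice.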

Page 34

Tandem Mass Spec (MS/MS)

Page 35

Typical nanoelectrospray source

Page 36

Isotope Coded Affinity Tags (ICAT)

Mass spec based method for measuring relative protein abundances between two samples

ICAT Reagents:
Heavy reagent: d8-ICAT (X = deuterium)
Normal reagent: d0-ICAT (X = hydrogen)

[Figure: chemical structure of the ICAT reagent, labeled Biotin tag, Linker (d0 or d8) with eight positions X, and Thiol specific reactive group]

Page 37

Protein Quantification & Identification via ICAT Strategy

Mixture 1 and Mixture 2: ICAT-labeled cysteines

Combine and proteolyze (trypsin)

Affinity separation (avidin)

Quantitation: [Spectrum: Light and Heavy peaks, m/z axis 550 to 580, intensity 0 to 100]

Protein identification: [Spectrum: NH2-EACDPLR-COOH, m/z axis 200 to 800, intensity 0 to 100]

ICAT Flash animation: http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/ICAT/ICAT.html

Page 38

ICAT continued

The heavy (blue) and light (gray) peptides are separated and quantified to produce a ratio for each peptide – here, a single peptide ratio is shown

Each peptide is subjected to CID fragmentation in the second MS stage in order to identify it
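The per-peptide ratio can be sketched as pairing peaks separated by the d8/d0 mass difference. This is a hedged illustration: it assumes one labeled cysteine per peptide, a known charge state, and peak lists given as (m/z, intensity) tuples; the peak values and function name are hypothetical.

```python
# Mass difference between eight deuteriums and eight hydrogens, in Da.
D8_MASS_SHIFT = 8 * (2.014102 - 1.007825)

def pair_icat_peaks(peaks, charge=1, tol=0.02):
    """Return (light_mz, heavy_mz, heavy/light intensity ratio) for peak pairs
    whose m/z spacing matches one ICAT label at the given charge."""
    spacing = D8_MASS_SHIFT / charge
    pairs = []
    for light_mz, light_int in peaks:
        for heavy_mz, heavy_int in peaks:
            if light_int > 0 and abs((heavy_mz - light_mz) - spacing) <= tol:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs

# Hypothetical singly charged peaks: (m/z, intensity)
peaks = [(560.30, 100.0), (568.35, 50.0), (575.00, 20.0)]
pairs = pair_icat_peaks(peaks)
print(pairs)
```

In this made-up example the heavy form is half as abundant as the light form, suggesting the protein is about two-fold lower in the heavy-labeled sample.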

Page 39

Metabolomic measurements

2D NMR or mass spectrometry

Currently not global and in less widespread use than microarrays, but have tremendous potential

Page 40

Gene knockout and RNAi libraries for model species

Example from yeast:

Replacement of yeast ORFs with kanMX gene flanked by unique oligo barcodes – Yeast Deletion Project Consortium

Page 41

YFP tagging for protein localization

NIC96 Nuclear Pore

TUB1 Tubulin cytoskeleton

HHF2 Histone Nucleus

BNI4 Bud neck

YFP is green, transmitted light is red

Images courtesy T. Davis lab

See also recent work by Weissman and O’Shea labs at UCSF

Page 42

Systematic phenotyping

Deletion strain:   yfg1      yfg2      yfg3

Barcode (UPTAG):   CTAACTC   TCGCGCA   TCATAAT

Growth 6 hrs in minimal media (how many doublings?)

Rich media

Harvest and label genomic DNA

Page 43

Systematic phenotyping with a barcode array

Ron Davis and friends…

These oligo barcodes are also spotted on a DNA microarray

Growth time in minimal media:
– Red: 0 hours
– Green: 6 hours
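The barcode readout and the "how many doublings?" question can be connected with a small sketch. Everything numeric here is hypothetical: the 2-hour doubling time in minimal media, the spot intensities, and the idea that intensities are already normalized so that a normally growing strain gives a ratio of 1.

```python
import math

def expected_doublings(hours, doubling_time_hours):
    """Number of population doublings in the given growth time."""
    return hours / doubling_time_hours

def log2_ratio(green_6h, red_0h):
    """log2 of barcode signal after growth over signal before growth."""
    return math.log2(green_6h / red_0h)

print(expected_doublings(6, 2))  # 3.0 with the assumed 2 h doubling time

# Hypothetical normalized intensities per barcode: (red at 0 h, green at 6 h)
spots = {"yfg1": (1000.0, 1000.0), "yfg2": (1000.0, 125.0), "yfg3": (1000.0, 500.0)}
for strain, (red, green) in spots.items():
    print(strain, log2_ratio(green, red))
```

Under these assumptions a strain that stops dividing while the rest of the pool doubles three times is diluted 8-fold, giving a log2 ratio of -3 (as for the made-up yfg2 values), while a strain with no growth defect stays near 0.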

Page 44

Molecular Interactions

Among proteins, mRNA, small molecules, and so on…

Page 45

[Diagram: each network type (▲, with the method that measures it) paired with the state it produces (▼, with the method that measures it)]

Protein→DNA interactions (▲ Chromatin IP) / Gene levels (on/off) (▼ DNA microarray)

Protein-protein interactions (▲ Protein coIP) / Protein levels (present/absent) (▼ Mass spectrometry)

Biochemical reactions (▲ Not yet!!!) / Biochemical levels (▼ Metabolic flux measurements)

Page 46

Also like sequence, protein interaction data are exponentially growing…

(As are the false positives!!!)

[Figure: EMBL Database Growth, total nucleotides (gigabases, axis 0 to 10), 1980 to 2000, alongside DIP Database Growth, total interactions]

Page 47

High-throughput methods for measuring interaction networks

2-hybrid

co-immunoprecipitation w/ mass spec

chIP-on-chip

systematic genetic analysis

Page 48

Yeast two-hybrid method

Fields and Song

Page 49

Detection of protein interactions with antibody arrays

MacBeath and Schreiber

Page 50

Kinase-target interactions

Mike Snyder and colleagues

Page 51

High-throughput methods for measuring networks

2-hybrid

co-immunoprecipitation w/ mass spec

chIP-on-chip

systematic genetic analysis

Page 52

Protein interactions by protein immunoprecipitation followed by mass spectrometry

Gavin / Cellzome

TEV = Tobacco Etch Virus proteolytic site

CBP = Calmodulin binding peptide

Protein A = IgG binding from Staphylococcus

Page 53

TAP purification

Image courtesy of Bertrand Seraphin

Page 54

High-throughput methods for measuring networks

2-hybrid

co-immunoprecipitation w/ mass spec

chIP-on-chip

systematic genetic analysis

Page 55

ChIP-chip measurement of protein→DNA interactions

From Figure 1 of Simon et al. Cell 2001

Page 56

High-throughput methods for measuring networks

2-hybrid

co-immunoprecipitation w/ mass spec

chIP-on-chip

systematic genetic analysis

Page 57

Genetic interactions: synthetic lethals and suppressors

Adapted from Tong et al., Science 2001

Genetic Interactions:

Widespread method used by geneticists to discover pathways in yeast, fly, and worm

Implications for drug targeting and drug development for human disease

Thousands are now reported in literature and systematic studies

As with other types, the number of known genetic interactions is exponentially increasing…

Page 58

Most recorded genetic interactions are synthetic lethal relationships

Adapted from Hartman, Garvik, and Hartwell, Science 2001

[Figure: four diagrams of genes A and B]
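The two kinds of genetic interaction named in these slides can be told apart from single- and double-mutant growth. The sketch below uses the standard definitions (synthetic lethal: each single mutant is viable but the double mutant is not; suppressor: a sick single mutant is rescued in the double mutant). The fitness scale, the viability threshold of 0.2, and the function name are hypothetical choices for illustration.

```python
def classify_interaction(fit_a, fit_b, fit_ab, viable=0.2):
    """Classify a gene pair from relative fitness (wild type = 1.0).

    fit_a, fit_b: single-mutant fitness; fit_ab: double-mutant fitness.
    `viable` is an assumed cutoff below which a strain counts as dead.
    """
    a_ok, b_ok, ab_ok = fit_a >= viable, fit_b >= viable, fit_ab >= viable
    if a_ok and b_ok and not ab_ok:
        return "synthetic lethal"
    if (not a_ok or not b_ok) and ab_ok:
        return "suppressor"
    return "no interaction called"

print(classify_interaction(0.9, 0.95, 0.0))   # synthetic lethal
print(classify_interaction(0.05, 1.0, 0.8))   # suppressor
print(classify_interaction(0.9, 0.9, 0.85))   # no interaction called
```

Quantitative screens usually compare the double mutant with an expected value computed from the singles rather than using a fixed cutoff; the threshold here only keeps the example short.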

Page 59

Suppressor protein interaction

Synthetic-lethal protein interaction

[Figure: diagrams of proteins A and B, with X marking a mutation, illustrating the two cases]

Page 60

Interpretation of genetic interactions (Guarente T.I.G. 1990)

Parallel Effects (Redundant or Additive): Single A or B mutations typically abolish their biochemical activities

Sequential Effects (Additive): Single A or B mutations typically reduce their biochemical activities

[Figure: A and B acting in parallel and in sequence]

GOAL: Identify downstream physical pathways