Post on 20-Dec-2015


Final

• Final: 2 of the following 3 choices,

– 1 hour exam covering recent materials (June 11),

– 2 page review of an assigned paper (due June 11),

– Self-study of a remaining chapter in the text, answers to the “odd” problems (due June 11).

DNA Arrays

…DNA systematically arrayed at high density,

– virtual genomes for expression studies,

• RNA hybridization to DNA for expression studies,

– comparative genomics,

• DNA hybridization to DNA,

– inter- and intra-species comparisons, etc.

– potential yet to be developed.

Arrays

solid substrate

DNA Chip: oligonucleotides, up to 1000s kb fragments.

Probes/Targets

...Probes: are the tethered nucleic acids with known sequence,

– the DNA on the chip,

...Target: is the free nucleic acid sample whose identity/abundance is being detected,

– the labeled nucleic acid that is washed over the chip.

DNA-Probes

– cDNA arrays, DNA arrays,

• DNA Microarrays,

– oligonucleotide arrays,

• DNA chips.

nucleic acid is spotted onto the substrate.

nucleic acid is synthesized directly onto the substrate.

DNA Chips

…oligonucleotides systematically synthesized in situ at high density.

Affymetrix DNA Chip

Allele-Specific Oligonucleotides (DNA Chips)

…allele specific oligonucleotides (ASOs) recognize single base pair differences in DNA sequences.

--AGTAGCTGTAGCT--
--TCATCGACATCGA--

--AGTAGCTaTAGCT--
--TCATCGACATCGA--

mismatch: no binding
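The two pairings above can be checked mechanically. The following is a minimal sketch, assuming an idealized all-or-nothing rule (a single mismatch prevents binding); the function names are hypothetical and real hybridization depends on temperature and salt conditions.

```python
# Illustrative sketch only: idealized ASO matching, hypothetical names.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(probe, target):
    """Return 0-based positions where target is not the complement of probe.

    Both strands are written left to right exactly as aligned on the slide.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    pairs = zip(probe.upper(), target.upper())
    return [i for i, (p, t) in enumerate(pairs) if COMPLEMENT[p] != t]


def binds(probe, target):
    """Idealized rule: the target binds only if every base pairs."""
    return not mismatch_positions(probe, target)


aso = "TCATCGACATCGA"        # tethered probe from the slide
wild_type = "AGTAGCTGTAGCT"  # perfectly complementary target
variant = "AGTAGCTaTAGCT"    # single-base change (lower-case a)

print(binds(aso, wild_type))
print(mismatch_positions(aso, variant))
```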

Ordered Array of ASOs

linker molecule

...over a million ASOs and controls can be gridded per cm².

Photolithography

…the process of using an optical image and a photosensitive substrate to produce a pattern,

• oligonucleotide synthesis can be inhibited by a ‘protection group’ molecule,

• the ‘protection group’ can be linked by a photosensitive bond, and thus cleaved by light.
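The idea can be illustrated with a toy simulation. This is a sketch under simplifying assumptions (perfect deprotection and coupling, one base per step); `synthesize` and the mask layout are hypothetical and do not describe any vendor's actual process.

```python
# Toy model of light-directed oligonucleotide synthesis (illustrative only).

def synthesize(n_sites, steps):
    """Grow one oligo per site.

    steps is a list of (mask, base) pairs. mask is the set of site indices
    exposed to light in that step; only those sites lose their protection
    group and couple the next base. The coupled base is assumed to carry
    its own protection group, so every site is protected again afterwards.
    """
    oligos = [""] * n_sites
    for mask, base in steps:
        for site in mask:
            if not 0 <= site < n_sites:
                raise ValueError(f"site {site} is outside the array")
            oligos[site] += base
    return oligos


# Four sites, four masking steps, four different dinucleotides.
steps = [({0, 1}, "A"), ({2, 3}, "C"), ({0, 2}, "G"), ({1, 3}, "T")]
print(synthesize(4, steps))  # ['AG', 'AT', 'CG', 'CT']
```

Each step needs its own mask, which is why mask design dominates the cost of making a new chip.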


Targets

...fluorescent targets,

– genomic DNA,

– cDNA, mRNA or cRNA for expression studies,

…targets are washed over the chip for hybridization.

cDNA Microarrays

...denatured, double-stranded DNA (500 - 5000 bp) is dotted or sprayed onto a glass or nylon substrate,

...up to tens of thousands of spots per array,

quill technology...

Hybridization Detection

…fluorescent images are read by an optical scanner, and intensities are compared using algorithms to differentiate artifacts.

Screening for Genetic Disease

• Cystic fibrosis: 75% of mutations are the deletion at codon 508 (ΔF508),

– 8% are in three additional specific locations in the gene, the rest are spread across the length of the gene,

• Pre-Array tests yielded only an ~83% chance of detecting a mutation.

Cystic fibrosis Detection

• Create a DNA chip with ASOs for wild-type Cystic fibrosis gene,

– approximately 4.5 kb of the 250 kb gene is coding sequence,

• 225 20-mers span 4.5 kb,

• 20 mismatches per 20-mer requires 4500 ASOs, or grids, plus controls.
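The arithmetic behind these counts, written out as a small sketch. It assumes non-overlapping 20-mers laid end to end, which is what the slide's numbers imply; the variable names are illustrative.

```python
# Worked arithmetic for the cystic fibrosis chip (figures from the slide).
coding_bp = 4500          # ~4.5 kb of coding sequence
oligo_len = 20            # each ASO is a 20-mer
variants_per_oligo = 20   # one mismatch ASO per position, as on the slide

tiles = coding_bp // oligo_len      # 20-mers needed to span the coding region
asos = tiles * variants_per_oligo   # grid positions, before controls

print(tiles, asos)  # 225 4500
```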

Creating the Mask

…computer algorithms are used to design the mask,

– creation of mask is now the limiting process, requires months to accomplish, and about $100,000 per mask,

– masks have limited lifetimes, each array costs about $100 currently.

Cystic fibrosis Chip

…using photolithography, create a chip with ASOs to identify any difference from wild-type DNA,

…match results with mutations at known deleterious loci,

…catalog new deleterious loci.

1 Gene of Many

…with controls, the Cystic fibrosis gene may require up to 20,000 grids,

…new chips can accommodate up to 1 million grids,

…can look at 50 similarly sized genes on one chip.

4000+ Genetic Diseases

…as genes are linked to diseases, quick, inexpensive tests can be performed to determine who carries specific mutations,

…computer analysis will provide genome profiles that predict a variety of traits.

Genome Profiling

…with 1500 SNPs now, and up to thousands available, genetic profiles can be made,

…choose SNPs in or near genes involved in traits or diseases,

…compare profiles over large populations.

How are we different?…at the RNA level.

Northern Analysis

DNA hybridizing to RNA

DNA Arrays and Expression

…grid gene-specific ASOs onto the DNA chip, or cDNAs onto microarrays,

…assay with labeled cDNA; genes that are expressed at a specific time or place, or under a specific condition, will bind to the chip for display.

Genes and Targets

• once the Human Genome Project is done, all of the genes can be gridded,

– presently, several completely sequenced genomes have been gridded,

• yeast,

• E. coli,

• various bacteria,

• drug identification, fundamental research, etc.,

Gene Expression Technologies

• DNA Chips (Affymetrix) and MicroArrays can measure mRNA concentration of thousands of genes simultaneously

• General scheme: Extract RNA, synthesize labeled cDNA, Hybridize with DNA on chip.

The Experiment

• After hybridization

– Scan the chip and obtain an image file

– Image analysis (find spots, measure signal and noise)

• Output file

– Affymetrix chips: measure each gene’s signal and make a present/absent call.

– cDNA microarrays: competing hybridization of target and control. For each gene, the log ratio of target and control.
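For the two-channel case, the per-gene output can be sketched as follows. This is an assumption-laden illustration: base 2 is a common convention but is not stated on the slide, and the gene names and intensities are made up.

```python
import math


def log_ratios(target, control):
    """Per-gene log2(target / control).

    Genes with a non-positive reading in either channel get None,
    because the logarithm is undefined there.
    """
    ratios = {}
    for gene, t in target.items():
        c = control[gene]
        ratios[gene] = math.log2(t / c) if t > 0 and c > 0 else None
    return ratios


# Hypothetical background-corrected spot intensities.
target = {"geneA": 800.0, "geneB": 100.0, "geneC": 0.0}
control = {"geneA": 200.0, "geneB": 400.0, "geneC": 50.0}
print(log_ratios(target, control))
```

A value of +2 means four-fold higher in the target, and -2 means four-fold lower, so up- and down-regulation are symmetric around zero.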

Preprocessing: From one experiment to many

• Chip and Channel Normalization

– Aim: bring readings of all experiments to be on the same scale

– Cause: different RNA amounts, labeling efficiency and image acquisition parameters

– Method: Multiply readings of each array/channel by a scaling factor such that:

• The sum of the scaled readings will be the same for all arrays

• Find scaling factor by a linear fit of the highly expressed genes
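A minimal sketch of the first option (equal sums across arrays). The common target sum is taken here to be the mean of the raw sums, which is an assumption; the linear-fit variant is not shown, and the function name and readings are hypothetical.

```python
def scale_to_common_sum(arrays, target_sum=None):
    """Scale each array so that all arrays have the same total signal.

    arrays maps an array/channel name to a list of readings (same gene
    order in every list). If target_sum is not given, the mean of the
    raw sums is used.
    """
    sums = {name: sum(values) for name, values in arrays.items()}
    if any(s <= 0 for s in sums.values()):
        raise ValueError("every array needs a positive total signal")
    if target_sum is None:
        target_sum = sum(sums.values()) / len(sums)
    scaled = {}
    for name, values in arrays.items():
        factor = target_sum / sums[name]  # one scaling factor per array
        scaled[name] = [v * factor for v in values]
    return scaled


# chip2 was hybridized with half as much RNA as chip1 (made-up numbers).
raw = {"chip1": [100.0, 200.0, 700.0], "chip2": [50.0, 100.0, 350.0]}
print(scale_to_common_sum(raw))
```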

Preprocessing: From one experiment to many

• Filtering of Genes

– Remove genes that are absent in most experiments

– Remove genes that are constant in all experiments

– Remove genes with low readings which are not reliable.
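The three filters can be sketched as below. The thresholds (`min_present_fraction`, `min_reading`) and the 'P'/'A' call encoding are assumptions for illustration; the slides do not give cut-off values.

```python
def filter_genes(data, calls, min_present_fraction=0.5, min_reading=50.0):
    """Keep genes that pass all three filters.

    data maps gene -> list of readings, one per experiment.
    calls maps gene -> list of 'P' (present) or 'A' (absent) calls.
    """
    kept = {}
    for gene, readings in data.items():
        present = sum(1 for c in calls[gene] if c == "P") / len(calls[gene])
        if present < min_present_fraction:
            continue  # absent in most experiments
        if max(readings) == min(readings):
            continue  # constant in all experiments
        if max(readings) < min_reading:
            continue  # readings too low to be reliable
        kept[gene] = readings
    return kept


# Made-up example: only g1 survives.
data = {
    "g1": [500.0, 900.0, 300.0, 700.0],
    "g2": [20.0, 30.0, 25.0, 10.0],      # low readings
    "g3": [400.0, 400.0, 400.0, 400.0],  # constant
    "g4": [60.0, 80.0, 700.0, 90.0],     # mostly absent
}
calls = {"g1": "PPPP", "g2": "PPPP", "g3": "PPPP", "g4": "AAPA"}
print(list(filter_genes(data, calls)))  # ['g1']
```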

Noise and Repeats

• >90% 2 to 3 fold

• Multiplicative noise

• Repeat experiments

• Log scale: dist(4,2) = dist(2,1)

log – log plot
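The log-scale identity can be verified directly. A small sketch, with base 2 chosen for convenience (any base gives the same equality); `log_dist` is a hypothetical helper.

```python
import math


def log_dist(a, b):
    """Distance between two positive readings on a log2 scale."""
    return abs(math.log2(a) - math.log2(b))


# On a linear scale the two gaps differ; on a log scale both are one doubling.
print(abs(4 - 2), abs(2 - 1))          # 2 1
print(log_dist(4, 2), log_dist(2, 1))  # 1.0 1.0
```

With multiplicative noise, a two-fold change therefore looks the same whether the gene is weakly or strongly expressed.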

We can ask many questions

• Which genes are expressed differently in two known types of conditions?

• What is the minimal set of genes needed to distinguish one type of condition from the others?

• Which genes behave similarly in the experiments?

• How many different types of conditions are there?

Supervised Methods (use predefined labels)

Unsupervised Methods (use only the data)

• Goal A: Find groups of genes that have correlated expression profiles. These genes are believed to belong to the same biological process and/or are co-regulated.

• Goal B: Divide conditions to groups with similar gene expression profiles. Example: divide drugs according to their effect on gene expression.
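Goal A can be sketched with a plain Pearson correlation between expression profiles. The threshold of 0.9 and the profiles are invented for illustration; the slides do not specify a similarity measure, so this is one reasonable choice rather than the lecture's method.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlated_pairs(profiles, threshold=0.9):
    """List gene pairs whose profiles correlate at or above the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    ]


# Hypothetical log-ratio profiles over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}
print(correlated_pairs(profiles))  # [('geneA', 'geneB')]
```

For Goal B the same function applies after transposing the data, so that each profile is one condition measured across all genes.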

Unsupervised Analysis

Clustering Methods

Linear Round

What is clustering?

T (RESOLUTION)

Cluster Analysis Yields Dendrogram
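How a dendrogram arises from repeated merging can be sketched with a small agglomerative clustering routine. Average linkage and Euclidean distance are assumptions chosen for simplicity, and the function names and data are hypothetical; the slides do not say which clustering algorithm was used.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hierarchical_cluster(profiles):
    """Average-linkage agglomerative clustering.

    Returns (tree, merges). tree is a nested tuple of names (the
    dendrogram topology); merges lists (subtree, height) in merge order.
    """
    # Each cluster is (tree, member names); a leaf's tree is its own name.
    clusters = [(name, [name]) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                total = sum(euclidean(profiles[a], profiles[b]) for a in mi for b in mj)
                d = total / (len(mi) * len(mj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        tree = (clusters[i][0], clusters[j][0])
        members = clusters[i][1] + clusters[j][1]
        merges.append((tree, d))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [(tree, members)]
    return clusters[0][0], merges


# Two tight pairs that are far apart from each other (made-up data).
profiles = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [5.0, 5.0], "d": [5.0, 6.0]}
tree, merges = hierarchical_cluster(profiles)
print(tree)  # (('a', 'b'), ('c', 'd'))
```

Cutting the tree at a chosen height (the resolution) yields a particular set of clusters.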

Applications

• Monitor expression patterns under the experimental conditions of your choosing to determine the function of thousands of genes,

• Common expression patterns can be used to identify genes that are members of the same pathway,

• Explore expression of candidate/unknown genes.

Gene/Drug Discovery

…genes involved in cancer and other diseases have been identified through a variety of techniques,

– genome expression analysis provides a means of discovering other genes that are concomitantly expressed,

– genome expression analysis provides a means of monitoring drug/treatment regimes.

Applications

• Can study the role of more than 1700 cancer-related genes in association with the rest of the genome,

• Define interactions and describe pathways,

• Measure drug response,

• Build databases for use in molecular tumor classifications,

– benign vs. cancerous, slow vs. aggressive

Extended Applications

• Water quality testing (4 hours vs. 4 days),

• Environmental watchdogs,

• Fundamental research on non-human subjects,

• Direct sequencing of related species for evolutionary studies,

• Comparisons of gene regulation between closely related species,

• etc.

What’s the Question

• Human and chimp DNA is ~98.7% similar,

• But, we differ in many and profound ways,

• Can this difference be attributed, at least in part, to differences in gene expression, rather than differences in the actual genes and gene products?

Huh?

• Prevailing notion: a gene is mutated, better alleles survive and, in fact, out-compete old alleles…evolution marches on.

• Paper’s hypothesis: it’s not the genes that are changing, but the REGULATION of the genes.

Regulation?

• Although the number of genes (~35,000) in the genome remains controversial, it appears to be far fewer than early dogma held (100,000 - 150,000 genes),

• One thought: “many” of the additional genes found in complex organisms are transcription factors.

First

...What does it mean that our genomes are 98.7% similar at the DNA level, and how do we know this?

DNA Sequence Comparisons

Bacterial Artificial Chromosomes (BACs)

• F plasmid ancestry,

– maintain bacterial replication system and copy number control system.

BAC End Sequencing

• “Mate Pairs”,

– sequence both ends of the BAC using vector derived sequencing primers,

– yields about 600 bp per sequence.

Contiguous Sequences (contigs)

...looks for end-to-end overlaps of at least 40 bp with no more than 6% differences in match.
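The overlap criterion can be sketched as a suffix/prefix comparison. This simplified version counts substitutions only and ignores insertions and deletions, which a real assembler must handle; `end_overlap` and its parameters are hypothetical names.

```python
def end_overlap(left, right, min_len=40, max_diff_pct=6):
    """Length of the longest acceptable end-to-end overlap, or 0.

    Compares a suffix of `left` with the equally long prefix of `right`
    and accepts it if at most max_diff_pct percent of positions differ.
    Only substitutions are considered (no gaps).
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        suffix, prefix = left[-k:], right[:k]
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        # Integer comparison avoids floating-point edge cases at exactly 6%.
        if mismatches * 100 <= max_diff_pct * k:
            return k
    return 0


# Two made-up reads sharing a 50 bp region at their ends.
shared = "AC" * 25
read1 = "G" * 20 + shared
read2 = shared + "T" * 20
print(end_overlap(read1, read2))
```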

What’s the significance? ...a one in 10¹⁷ event.

...if 100% sequenced.

x 543bp / read =

Science 291 (5507), 1304-1351

8 September 1999 - 25 June 2000

Chimp DNA Sequences

• 3.3x Coverage of the genome.

Human/Chimp BES Similarity

This represents coding (highly conserved) and non-coding (low conservation) regions of the genome.

Are our Phenotypes 98.7% Similar?

• Some apparent differences,

– HIV susceptibility, epithelial neoplasms (cancers), malaria, and Alzheimer’s,

• In fact, there is only one well understood biochemical difference,

– A 92 bp deletion in a gene that codes for a hydroxylase results in an un-hydroxylated secretion protein in our immune system.

The Experiment

• Check patterns of gene expression level, using DNA chips, for 12,000 genes in humans, chimps, orangutans, and macaques, (TRANSCRIPTOME),

– brain, liver, and blood

• Check for protein levels using 2-D gel analysis, (PROTEOME)

• Controls,

– Microarray analysis (17,997 transcripts),

– Rodent tests.

Affymetrix

• U95A array...

Targets

• Labeled Human cDNA, Chimp cDNA, Macaque cDNA,

– Collect tissue,

– Extract RNA,

– Label RNA.

Cluster Analysis

• Distances represent the relative differences in expression changes.

So What?

• Changes in gene expression are greatest in the Human gene cluster.

Primates

Mice

Probably Rejected by the Journal

• Why?

– Probe was human, target only ~98.7% identical to it (up to ~1.3% different),

– At the “allele specific oligonucleotide” level, single base changes may skew the data.

Microarray

• Spotted 17,997 PCR products onto nylon, probed with labeled cDNAs,

– PCR primers are available, in kits, that will amplify just about any part of the human genome,

– 1000 bp fragments were generated,

• Base pair differences won’t affect probe sensitivity over this large a target.

Microarray Data

5:1 difference in expression profiles.

Proteomics (2-D gels)

• Proteins separated by charge (isoelectric point), then by mass.

• Qualitative (positions), Quantitative (amount)

8500 Protein Spots

What do You Think?

Monday

• Schedule change...

*RNAi (June 3)

Background: Review of RNAi

Specific and heritable genetic interference by double-stranded RNA in Arabidopsis thaliana
