

Pathway analysis of -omics data

Unit 21

BIOL221T: Advanced Bioinformatics for Biotechnology

Irene Gabashvili, PhD

PS4 & Exam:

No additional time will be allowed. Work returned after the deadline receives 0 points.

Projects

Jennifer:

• Abstract
• Brief Introduction to Pharmacogenetics
• Molecular techniques available
• Bioinformatics tools available
• Current market state
• Ethical Concerns
• Conclusion

See http://www.nigms.nih.gov/Initiatives/NIH-RFI

NIH wants to hear suggestions about how to address needs and challenges

Nancy:

• Abstract
• Brief Introduction to Molecular Cytogenetics
• Materials and Methods: data, software tools: Genotyping Console 2.1 for Cytogenetics & Partek
• Results
• Conclusion

Annie:

Exploring the royal disease with IPA and other bioinformatics tools

Tanzeema:

Investigating drug-protein interactions by structure visualization tools

Dahn:

Research on heart valve disease

Projects

Jyoti:

Studying the characteristics of psoriatic arthritis genes and common genetic control for Crohn’s disease and psoriatic arthritis in the pathway.

Chris:

On humans, chimps and mitochondrial Eve

Projects

Harshal:

Exploring genetic resistance to HIV and the Black Death

http://home.comcast.net/~igabashvili/hiv.htm

Priyanka:

Designing Cloning Strategies with Commercial Tools and Freeware

-Omes & -Omics:

• Genome - all the genes of an organism
• Transcriptome - all the transcripts (mRNAs) of an organism
• Proteome - all the proteins of an organism
• Metabolome - all metabolites (low molecular weight molecules participating in the general metabolic reactions required for maintenance and growth) of an organism

Genomics to Proteomics

Gene → (transcription) → mRNA → (translation) → Protein

Genome → Transcriptome → Proteome

static → dynamic

One gene → Many transcripts (alternative splicing) → Many proteins (post-translational modifications)

Systems Biology

• Human Genome = 30,000 to 60,000 genes
• Human Proteome = 300,000 to 1,200,000 protein variants
• Human Metabolome = metabolic products of the organism (lipids, carbohydrates, amino acids, peptides, prostaglandins, etc.)

Proteins

• Exhibit far more sequence and chemical complexity than DNA or RNA
• Properties and structure are defined by the sequence and side chains of their constituent amino acids
• The “engines” of life
• >95% of all drug targets are proteins
• Favorite topic of the post-genomic era

mRNA Doesn't Mean You Have Protein

Storage

Decay

Transport

Protein Complex Discovery

• Who: identity of proteins in complex?
• What: biological process involved?
• Where: is the complex localized?
• When: are proteins involved in the complex?
• How much: stoichiometry of proteins in complex, quantity (relative vs absolute)
• Regulation: modifications (kinase, etc.), proteolysis (protease)

Post Translational Regulation

• What structural changes occur to create an active protein: alternative splicing, proteolytic processing?
• How is a protein’s activity regulated?
• Are modifications involved in regulation?

The Post-genomic Challenge

• How to rapidly identify a protein?
• How to rapidly purify a protein?
• How to identify post-translational modifications?
• How to find information about function?
• How to find information about activity?
• How to find information about location?
• How to find information about structure?

Answer: Look at Protein Features

Examples of Protein Features

• Composition Features: Mass, pI, Absorptivity, Rg
• Sequence Features: Active sites, Binding Sites, Targeting, Location, Property Profiles, 2° structure
• Structure Features: Super-Secondary Structure, Global Fold, Volume

http://www.expasy.org/tools/

Molecular Weight

• Useful for SDS PAGE and 2D gel analysis
• Useful for deciding on SEC matrix
• Useful for deciding on MWCO (molecular weight cutoff) for dialysis
• Essential in synthetic peptide analysis
• Essential in peptide sequencing (classical or mass-spectrometry based)
• Essential in proteomics and high throughput protein characterization
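As a rough illustration of how molecular weight follows from sequence, the sketch below sums average residue masses and adds one water for the free termini. The function name is hypothetical and the mass table holds commonly tabulated average values. It ignores modifications and disulfide bonds, so treat the result as an estimate rather than what a dedicated tool would report.

```python
# Average residue masses in daltons (amino acid minus one water).
# These are commonly tabulated values; small differences between sources are normal.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O for the free N- and C-termini


def average_molecular_weight(sequence: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified peptide."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unsupported residue codes: {sorted(unknown)}")
    if not seq:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Hypothetical example peptide, chosen only for illustration.
    print(round(average_molecular_weight("AVANESGANFISVK"), 2))
```

Average masses suit gel-based estimates (SDS PAGE, SEC, dialysis cutoffs), while mass spectrometry work normally uses monoisotopic masses instead.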

What is Proteomics?

• The study of the proteome, which is the protein complement of the genome
• Everything post-genomic, “protein chemistry on an unprecedented, high-throughput scale”, including structure, function and interactions of proteins
• As coined in 1994 by Marc Wilkins: the functional study of proteins using Mass Spectrometry

Proteomics

The Proteome is the complete set of proteins in the cell under a set of conditions. It is dynamic and complex, and characterized in terms of:

• Structure - shape, electrostatics
• Abundance - protein expression
• Localization - subcellular location
• Modifications - post-translational modifications
• Interactions - protein-protein interactions (interactome)

Components of Classical Proteomics

Protein Separation

Bioinformatics

Mass Spectroscopy

Mass Spectrometry

Challenges facing Proteomic Technologies

• Limited sample material - no PCR!
• Sample degradation (occurs rapidly, even during sample preparation)
• Post-translational modifications (often skew results)
• Specificity among tissue, developmental and temporal stages
• Perturbations by environmental (disease/drugs) conditions
• Dynamics

Analytical Challenges

• Cell biology techniques to isolate structures
• Sensitivity
• Dynamic range: low affinity binders
• Throughput: Biochemical Throughput, Analytical Throughput
• Direct measurement of intact complex
• Quantitation of components and modifications

Basic Proteomic Analysis Scheme

Protein Mixture → Separation (2D-SDS-PAGE) → Individual Proteins → Spot Cutting → Digestion (Trypsin) → Peptides → Mass Spectrometry (MALDI-TOF) → Peptide Mass → Database Search → Protein Identification
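The trypsin digestion step in this scheme can be imitated in software, which is how database search tools produce the theoretical peptides they compare against measured masses. The sketch below assumes the usual simplified rule (cleave after K or R unless the next residue is P). The function name and the `missed_cleavages` parameter are illustrative and not taken from any specific package.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """In-silico trypsin digest using the simplified K/R-not-before-P rule."""
    seq = sequence.strip().upper()
    # Zero-width split points: after K or R, but not when followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", seq) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical test sequence, for illustration only.
    for peptide in tryptic_digest("MKWVTFISLLRAKPGR", missed_cleavages=1):
        print(peptide)
```

Real digests are never perfectly complete, which is why search tools usually allow one or two missed cleavages.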

General Strategy for Protein Characterization

Purification/Enrichment: 1-DE, 2-DE, Solution

Protein or Peptides

Mass Spectrometry: Measurement, Analysis

• Identification
• Sequencing

Protein Separation methods for Proteomics

Dynamic range is the central issue for separations

Gel Electrophoresis

• 1 and 2-Dimensional Separations
• Native and Denaturing
• Detection: stains

Chromatographic or Electrophoretic

• Liquid Chromatography
• Capillary Electrophoresis
• Affinity Chromatography
• Multi-Dimensional Separations
• Detection

2D PAGE

2-D gel electrophoresis is a multi-step procedure that can be used to separate hundreds to thousands of proteins with extremely high resolution.

It separates proteins by their pI in the first dimension, using an immobilized pH gradient (isoelectric focusing), and then by their MW in the second dimension.
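To make the first-dimension coordinate concrete, the sketch below estimates a pI by finding the pH at which the computed net charge crosses zero. The pKa values are one commonly used set and are an assumption here, since published scales differ by several tenths of a pH unit. The function names are illustrative.

```python
# Side-chain and terminal pKa values: an assumed, commonly used set.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of an unmodified peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # deprotonated C-terminus
    for aa in sequence.upper():
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged, so pI is higher
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    print(round(isoelectric_point("ACDEFGHIK"), 2))
```

Bisection works because net charge only decreases as pH rises, so there is a single crossing point between pH 0 and 14.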

2D PAGE

The 2-D gel electrophoresis process consists of these steps:

• Sample preparation
• First dimension: isoelectric focusing
• Second dimension: gel electrophoresis
• Staining
• Imaging analysis via software

Bioinformatics tools for PAGE:

• http://world-2dpage.expasy.org/repository/ (database)
• http://expasy.org/melanie/
• http://www.2d-gel-analysis.com/

Drawbacks of 2D PAGE

• The technique lacks reliable reproducibility.
• Spots often overlap, making identifications difficult.
• More of “an art” than “a science.”
• Slow and tedious.
• The process contains many “open” phases where contamination is possible.

Protein Sequencing: fragmenting into peptides

Protein sequencing: by Edman degradation.

Separation by HPLC and detection by absorbance at 269 nm.

Array-based Proteomics

• Employ two-hybrid assays
• Use GFP, FRET, and GST
• GFP = green fluorescent protein
• FRET = fluorescence resonance energy transfer
• GST = glutathione S-transferase, a well characterized protein used as a marker protein.

Array-based Proteomics

• Offer a high-throughput technique for proteome analysis.
• These small plates are able to hold many different samples at a time.

Two-Hybrid Assay

Figure 12-35. Griffiths et al. Modern Genetic Analysis.

Structural Proteomics

• Current techniques are not considered “high throughput” within the structural realm.
• Work is under way to significantly reduce the amount of painstaking labor in the crystallization of proteins.
• Novel solutions combine current technologies, such as NMR and X-ray crystallography (XRC).

Next Lecture: More about protein structures

Clinical Proteomics

• This area of proteomics focuses on accelerating drug development for diseases through the systematic identification of potential drug targets.
• More specific information on proteins, instead of raw genes, will make computational analysis simpler in the coming years.

Mass Spectrometry

• Another tool to analyze the proteome.
• In general, a mass spectrometer consists of: Ion Source, Mass Analyzer, Detector
• Mass spectrometers are used to measure the mass-to-charge (m/z) ratios of substances.
• From this measurement, a mass is determined, proteins are identified, and further analysis is performed.

MS is an analytical technique that measures the mass-to-charge ratio of ions. It is used to find the composition of a physical sample by generating a mass spectrum representing the masses of sample components.

“Mass Spec” Analyses can be run in Tandem

• MS/MS refers to two MS experiments performed “in tandem.”
• Among other things, MS/MS allows for the determination of sequence information, usually in the form of peptides (small parts of a protein).
• This information is used by algorithms to identify a protein on the basis of mass of a constituent peptide.

Other Proteomics Abbreviations

• MALDI, Matrix-Assisted Laser Desorption Ionization
• TOF, Time Of Flight
• ESI, Electrospray Ionization
• MS/MS, tandem mass spectrometry
• FTICR, Fourier Transform Ion Cyclotron Resonance Mass Spectrometry, a high-resolution, high-sensitivity MS technique

Mass Spectrometry Analysis of Proteins

Analysis of Peptides - digested proteins & mixtures of proteins (“Bottom Up” Approach)

• ESI-Tandem Mass Spectrometers (QIT, LIT, Q-TOFs, TSQs)
• MALDI-Tandem Mass Spectrometers (LIT, QIT, Q-TOFs, TOF/TOFs)
• ESI-FTMS
• MALDI-FTMS

Analysis of Intact Proteins (“Top Down” Approach)

• FTMS
• ESI-TOF
• MALDI-TOF

Analysis of Protein Complexes

• Ion Mobility mass spectrometers
• GEMMA

Mass Spectrometry technology evolves at a constant rate: product cycles are 18-24 months

If you are lost…

• Consider an example: calculating a person’s weight, without them knowing.
• If we have a backpack that we know is 10 pounds, we could have them put it on.
• Then, walk the subject over a hidden scale in the floor.
• The weight of the person could be obtained by subtracting the weight of the backpack.

Mass Spectrometry*

Analytical method to measure the molecular or atomic weight of samples

Typical Mass Spectrum

In a similar manner:

• Mass spectrometers allow the determination of a mass-to-charge ratio of the analyte.
• By knowing the charge state of the analyte, which results from the addition of protons (the backpack in the example), the mass can be calculated after deconvolution of the spectrum.
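The backpack arithmetic can be written out directly. This sketch assumes positive-mode ions formed purely by adding z protons, so m/z = (M + z × 1.007276) / z. The function names are illustrative. The last function shows the simplest deconvolution step: inferring the charge from two adjacent peaks of the same analyte that differ by one proton.

```python
PROTON_MASS = 1.007276  # Da; the "backpack" added once per charge


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z observed for a neutral mass M carrying `charge` added protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass recovered from m/z once the charge state is known."""
    return charge * (mz - PROTON_MASS)


def charge_from_adjacent_peaks(mz_high: float, mz_low: float) -> int:
    """Charge z of the higher-m/z peak, given the neighbouring peak at charge z + 1.

    Derived from z * (mz_high - p) = (z + 1) * (mz_low - p), where p is the proton mass.
    """
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


if __name__ == "__main__":
    # Hypothetical protein mass, chosen only for illustration.
    mass = 16951.5
    peak_z10 = mz_from_mass(mass, 10)
    peak_z11 = mz_from_mass(mass, 11)
    z = charge_from_adjacent_peaks(peak_z10, peak_z11)
    print(z, round(mass_from_mz(peak_z10, z), 2))
```

Instrument software fits the whole charge-state envelope rather than a single pair of peaks, but the subtraction is the same idea.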

LCQ Mass Spectrometer

Compare to Microarrays…

…and other biochips

Example MS/MS Spectrum

This spectrum shows the fragmentation of a peptide, which is used to determine the sequence of the peptide, via a search algorithm.
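A search algorithm needs theoretical fragment masses to compare against such a spectrum. The sketch below computes singly charged b- and y-ion m/z values from monoisotopic residue masses, assuming an unmodified peptide. The function name is illustrative, and real search engines add scoring, other ion types and modifications on top of this.

```python
# Monoisotopic residue masses in daltons (standard tabulated values).
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return (b_ions, y_ions) as singly charged m/z lists, b1..b(n-1) and y1..y(n-1)."""
    masses = [MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide.strip().upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b ions hold the first i residues (N-terminal side).
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ions hold the last i residues plus water (C-terminal side).
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    b, y = fragment_ions("PEPTIDE")
    for index, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{index} = {b_mz:.4f}   y{index} = {y_mz:.4f}")
```

Consecutive ions in one series differ by exactly one residue mass, which is what lets a sequence be read off the spectrum.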

Typical MS experiment:

Comprehensive Analysis of Protein-Protein Interactions

Multiprotein Complex, isolated by:

• Co-immunoprecipitation
• Protein Interaction Chromatography
• TAP-Tagged Proteins
• Cell Biology/Genetics

[Figure labels: Agarose-GST-Protein; Agarose-IgG; TEV]

Proteolysis, then LC/MS/MS or LC/LC/MS/MS

• Identification of Protein Components
• Identification of Modifications
• Dynamics of components and modifications

Second Generation Proteomics Technology: Shotgun Proteomics

Identification of Proteins in Mixtures

Digestion → Complex Peptide Mixture → Separation (LC) → MS/MS → SEQUEST → identified proteins (ymr130, ymr142, ymr154, ymr201)

Protein identification data acquired at ~1 peptide per 1-3 secs

Eng, McCormack, Yates, JASMS 1994

[Figure: mass spectrum. Axes: Counts (0 to 40000) vs. Mass (m/z) (800 to 2000). Labeled peaks: 766.4868, 836.4362, 904.4685, 997.5691, 1209.5710, 1221.7473, 1570.6782, 1697.8175, 1800.9144, 1890.9643, 2061.1366.]

1406.7220

…PPGTGKTLLAK AVANESGANFISVK FYVINGPEIM...
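The value 1406.7220 and the sequence fragment shown with it can be checked against each other. The sketch below computes the monoisotopic [M+H]+ of the complete tryptic peptides visible in the fragment and reports which ones fall within a tolerance of the observed value. The 10 ppm tolerance and the function names are assumptions made for illustration, and the slide does not state how the match was actually made.

```python
# Monoisotopic residue masses in daltons (standard tabulated values).
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated, unmodified peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER + PROTON


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peptides(observed: float, candidates: list[str], tol_ppm: float = 10.0) -> list[str]:
    """Candidates whose theoretical [M+H]+ lies within tol_ppm of the observed value."""
    return [p for p in candidates if abs(ppm_error(observed, mh_plus(p))) <= tol_ppm]


if __name__ == "__main__":
    observed_peak = 1406.7220                  # value printed on the slide
    candidates = ["TLLAK", "AVANESGANFISVK"]   # complete tryptic peptides in the fragment
    for peptide in match_peptides(observed_peak, candidates):
        error = ppm_error(observed_peak, mh_plus(peptide))
        print(f"{peptide}: theoretical {mh_plus(peptide):.4f}, error {error:.1f} ppm")
```

Under these assumptions only AVANESGANFISVK lies within tolerance of 1406.7220, with an error of a few ppm.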

Molecular Weight

[Figure: spectrum. Axes: Relative Abundance (0 to 100) vs. m/z (200 to 1400). Labeled peaks: 333.1, 468.1, 619.0, 778.5, 835.4, 922.4, 961.4, 1051.6, 1074.5, 1236.7.]

Fragmentation

MS videos

• http://video.google.com/videoplay?docid=-6140373375438015688&q=mass+spectrometry&hl=en (Berkeley lecture, 2006)
• http://video.google.com/videoplay?docid=4083728878452715101 (short, 2007)

Example: Kinetochore

Ctf19p

Mcm21p

Okp1p

Ortiz et al. 1999

Mcm19p

Chl4p

Ghosh et al. 2001

** Defined by 2-hybrid and co-immunoprecipitation **

Ctf3p

Mcm22p

Mcm16p

Measday et al. 2002

Cheeseman et al. 2001

Janke et al. 2002

Li et al. 2002