Pathway analysis of –omics data
Unit 21
BIOL221T: Advanced Bioinformatics for Biotechnology
Irene Gabashvili, PhD
PS4 & Exam:
No additional time will be allowed; if returned after the deadline, 0 points
Projects
Jennifer:
• Abstract
• Brief Introduction to Pharmacogenetics
• Molecular techniques available
• Bioinformatics tools available
• Current market state
• Ethical Concerns
• Conclusion
See http://www.nigms.nih.gov/Initiatives/NIH-RFI
NIH wants to hear suggestions about how to address needs and challenges
Nancy:
• Abstract
• Brief Introduction to Molecular Cytogenetics
• Materials and Methods: data, software tools: Genotyping Console 2.1 for Cytogenetics & Partek
• Results
• Conclusion
Annie
• Exploring the royal disease with IPA and other bioinformatics tools
Tanzeema
• Investigating drug-protein interactions by structure visualization tools
Projects
Jyoti
• Studying the characteristics of psoriatic arthritis genes and common genetic control for Crohn’s disease and psoriatic arthritis in the pathway.
Chris:
• On humans, chimps and mitochondrial Eve
Projects
Harshal
• Exploring genetic resistance to HIV and the Black Death
  http://home.comcast.net/~igabashvili/hiv.htm
Priyanka:
• Designing Cloning Strategies with Commercial Tools and Freeware
-Omes & -Omics:
• Genome – all the genes of an organism
• Transcriptome – all the transcripts (mRNAs) of an organism
• Proteome – all the proteins of an organism
• Metabolome – all metabolites (low molecular weight molecules participating in general metabolic reactions required for the maintenance and growth) of an organism
Genomics to Proteomics
Gene       → (transcription) →  mRNA                    → (translation) →  Protein
Genome                          Transcriptome                              Proteome
static                                                                     dynamic
One gene                        Many transcripts                           Many proteins
                                (alternative splicing)                     (post-translational modifications)
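The gene → mRNA → protein flow on this slide can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a coding-strand DNA input with no introns; the function names `transcribe` and `translate` and the example sequence are illustrative and do not come from the lecture.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first codon until a stop codon or the end."""
    dna_like = mrna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna_like) - 2, 3):
        residue = CODON_TABLE[dna_like[pos:pos + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCAAGTAA")  # hypothetical toy gene
    print(mrna, translate(mrna))
```

Alternative splicing and post-translational modification, the sources of "many transcripts" and "many proteins", are deliberately left out of this sketch.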
Systems Biology
• Human Genome = 30,000 to 60,000 genes
• Human Proteome = 300,000 to 1,200,000 protein variants
• Human Metabolome = metabolic products of the organism (lipids, carbohydrates, amino acids, peptides, prostaglandins, etc.)
Proteins
• Exhibit far more sequence and chemical complexity than DNA or RNA
• Properties and structure are defined by the sequence and side chains of their constituent amino acids
• The “engines” of life
• >95% of all drug targets are proteins
• Favorite topic of the post-genomic era
Protein Complex Discovery
• Who: identity of proteins in complex?
• What: biological process involved?
• Where: is the complex localized?
• When: are proteins involved in the complex?
• How much: stoichiometry of proteins in complex, quantity (relative vs absolute)
• Regulation: modifications (kinase, etc.), proteolysis (protease)
Post Translational Regulation
• What structural changes occur to create an active protein: alternate splicing, proteolytic processing?
• How is a protein’s activity regulated?
• Are modifications involved in regulation?
The Post-genomic Challenge
• How to rapidly identify a protein?
• How to rapidly purify a protein?
• How to identify post-translational modifications?
• How to find information about function?
• How to find information about activity?
• How to find information about location?
• How to find information about structure?
Answer: Look at Protein Features
Examples of Protein Features
• Composition Features
  • Mass, pI, Absorptivity, Rg
• Sequence Features
  • Active sites, Binding Sites, Targeting, Location, Property Profiles, 2° structure
• Structure Features
  • Super-Secondary Structure, Global Fold, Volume
http://www.expasy.org/tools/
Molecular Weight
• Useful for SDS PAGE and 2D gel analysis
• Useful for deciding on SEC matrix
• Useful for deciding on MWCO (molecular weight cutoff) for dialysis
• Essential in synthetic peptide analysis
• Essential in peptide sequencing (classical or mass-spectrometry based)
• Essential in proteomics and high throughput protein characterization
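As a rough illustration of how a molecular weight is obtained from sequence alone, the sketch below sums average residue masses and adds one water for the free termini. It assumes an unmodified, linear chain of the 20 standard amino acids; the function name `molecular_weight` is illustrative, and dedicated tools handle modifications and non-standard residues that this sketch ignores.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide or protein."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


if __name__ == "__main__":
    # Peptide taken from the spectrum slide later in this deck.
    print(f"{molecular_weight('AVANESGANFISVK'):.2f} Da")
```

Average masses suit gel and chromatography estimates; mass spectrometry of peptides normally uses monoisotopic masses instead.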
What is Proteomics?
• The study of the proteome, which is the protein complement of the genome
• Everything post-genomic, “protein chemistry on an unprecedented, high-throughput scale”, including structure, function and interactions of proteins
• As coined in 1994 by Marc Wilkins: the functional study of proteins using Mass Spectrometry
Proteomics
The Proteome is the complete set of proteins in the cell under a set of conditions. It is dynamic and complex, and characterized in terms of:
• Structure – shape, electrostatics
• Abundance – protein expression
• Localization – subcellular location
• Modifications – post translational modifications
• Interactions – protein-protein interactions (interactome)
Components of Classical Proteomics
• Protein Separation
• Mass Spectrometry
• Bioinformatics
Challenges facing Proteomic Technologies
• Limited sample material – no PCR!
• Sample degradation (occurs rapidly, even during sample preparation)
• Post-translational modifications (often skew results)
• Specificity among tissue, developmental and temporal stages
• Perturbations by environmental (disease/drugs) conditions
• Dynamics
Analytical Challenges
• Cell biology techniques to isolate structures
• Sensitivity
• Dynamic range: low affinity binders
• Throughput
  • Biochemical Throughput
  • Analytical Throughput
• Direct measurement of intact complex
• Quantitation of components and modifications
Basic Proteomic Analysis Scheme
Protein Mixture
  → Separation (2D-SDS-PAGE) → Individual Proteins
  → Spot Cutting, Digestion (Trypsin) → Peptides
  → Mass Spectrometry (MALDI-TOF) → Peptide Mass
  → Database Search → Protein Identification
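The digestion, peptide-mass and database-search steps of this scheme can be sketched in a few lines of Python. This is a minimal illustration, not the search engine used in the lecture: it assumes trypsin cleaves after K or R except before P, no missed cleavages, unmodified residues, and singly protonated peptides. The "database" here is only the sequence fragment shown on the spectrum slide later in the deck (both ends are elided there, so its first and last peptides are incomplete), and the names `tryptic_digest`, `mh_plus` and `match_peaks` are illustrative.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

# Sequence fragment from a later slide; treated here as a one-entry database.
FRAGMENT = "PPGTGKTLLAKAVANESGANFISVKFYVINGPEIM"


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    seq = protein.upper()
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed, protein, tol_ppm=20.0):
    """Return (observed peak, peptide) pairs that agree within tol_ppm."""
    matches = []
    for peptide in tryptic_digest(protein):
        calc = mh_plus(peptide)
        for peak in observed:
            if abs(peak - calc) / calc * 1e6 <= tol_ppm:
                matches.append((peak, peptide))
    return matches


if __name__ == "__main__":
    peaks = [766.4868, 1209.5710, 1406.7220]  # three peaks read off the later slide
    print(tryptic_digest(FRAGMENT))
    print(match_peaks(peaks, FRAGMENT))
```

With these inputs only the 1406.7220 peak finds a partner, the peptide AVANESGANFISVK; a real search scores many such matches across every protein in a database.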
General Strategy for Protein Characterization
Purification/Enrichment: 1-DE, 2-DE, Solution
Protein or Peptides
Mass Spectrometry
Measurement
Analysis
• Identification
• Sequencing
Protein Separation methods for Proteomics
Dynamic range is the central issue for separations
• Gel Electrophoresis
  • 1 and 2-Dimensional Separations
  • Native and Denaturing
  • Detection: stains
• Chromatographic or Electrophoretic
  • Liquid Chromatography
  • Capillary Electrophoresis
  • Affinity Chromatography
  • Multi-Dimensional Separations
  • Detection
2D PAGE
• 2-D gel electrophoresis is a multi-step procedure that can be used to separate hundreds to thousands of proteins with extremely high resolution.
• It works by separation of proteins by their pI's in one dimension using an immobilized pH gradient (first dimension: isoelectric focusing) and then by their MW's in the second dimension.
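The first dimension separates by pI, the pH at which a protein carries no net charge. A minimal sketch of how a theoretical pI can be estimated from sequence follows. It assumes independent ionizable groups described by the Henderson-Hasselbalch equation and one particular set of pKa values; published pKa scales differ, so the numbers below are an assumption and results vary between tools. The function names are illustrative.

```python
# Assumed side-chain and terminal pKa values; other published scales differ.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of an unmodified chain at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, tol=0.001):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid  # negative or zero: the pI lies at lower pH
    return (low + high) / 2


if __name__ == "__main__":
    for seq in ("KKKK", "DDDD", "AVANESGANFISVK"):
        print(seq, round(isoelectric_point(seq), 2))
```

Bisection works here because net charge only decreases as pH rises, so there is a single crossing point.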
2D PAGE
The 2-D gel electrophoresis process consists of these steps:
• Sample preparation
• First dimension: isoelectric focusing
• Second dimension: gel electrophoresis
• Staining
• Imaging analysis via software
Bioinformatics tools for PAGE:
• http://world-2dpage.expasy.org/repository/ (database)
• http://expasy.org/melanie/
• http://www.2d-gel-analysis.com/
Drawbacks of 2D PAGE
• The technique lacks reliable reproducibility.
• Spots often overlap, making identifications difficult.
• More of “an art” than “a science.”
• Slow and tedious.
• Process contains many “open” phases where contamination is possible.
Array-based Proteomics
• Employ two-hybrid assays
• Use GFP, FRET, and GST
  • GFP = green fluorescent protein
  • FRET = fluorescence resonance energy transfer
  • GST = glutathione S-transferase, a well characterized protein used as a marker protein.
Array-based Proteomics
• Offer a high-throughput technique for proteome analysis.
• These small plates are able to hold many different samples at a time.
Structural Proteomics
• Current techniques are not considered “high throughput” within the structural realm.
• Work is under way to significantly reduce the amount of painstaking labor in the crystallization of proteins.
• Novel solutions combine current technologies, such as NMR and X-ray crystallography (XRC).
Next Lecture: More about protein structures
Clinical Proteomics
• This area of proteomics focuses on accelerating drug development for diseases through the systematic identification of potential drug targets.
• More specific information on proteins, instead of raw genes, will make computational analysis simpler in the coming years.
Mass Spectrometry
• Another tool to analyze the proteome.
• In general a Mass Spectrometer consists of:
  • Ion Source
  • Mass Analyzer
  • Detector
• Mass Spectrometers are used to quantify the mass-to-charge (m/z) ratios of substances.
• From this quantification, a mass is determined, proteins are identified, and further analysis is performed.
MS is an analytical technique used to measure the mass-to-charge ratio of ions, used to find the composition of a physical sample by generating a mass spectrum representing the masses of sample components.
“Mass Spec” Analyses can be run in Tandem
• MS/MS refers to two MS experiments performed “in tandem.”
• Among other things, MS/MS allows for the determination of sequence information, usually in the form of peptides (small parts of a protein).
• This information is used by algorithms to identify a protein on the basis of the mass of a constituent peptide.
Other Proteomics Abbreviations
• MALDI, Matrix-Assisted Laser Desorption Ionization
• TOF, Time Of Flight
• ESI, Electrospray Ionization
• MS/MS, tandem mass spectrometry
• FTICR, Fourier Transform Ion Cyclotron Resonance Mass Spectrometry, a high-resolution, high-sensitivity MS technique
Mass Spectrometry Analysis of Proteins
• Analysis of Peptides – digested proteins & mixtures of proteins (“Bottom Up” Approach)
  • ESI-Tandem Mass Spectrometers (QIT, LIT, Q-TOFs, TSQs)
  • MALDI-Tandem Mass Spectrometers (LIT, QIT, Q-TOFs, TOF/TOFs)
  • ESI-FTMS
  • MALDI-FTMS
• Analysis of Intact Proteins (“Top Down” Approach)
  • FTMS
  • ESI-TOF
  • MALDI-TOF
• Analysis of Protein Complexes
  • Ion Mobility mass spectrometers
  • GEMMA
• Mass Spectrometry technology evolves at a constant rate
  • Product cycles are 18-24 months
If you are lost…
• Consider an example: calculating a person’s weight, without them knowing.
• If we have a backpack that we know is 10 pounds, we could have them put it on.
• Then, walk the subject over a hidden scale in the floor.
• The weight of the person could be obtained by subtracting the weight of the backpack.
Mass Spectrometry*: analytical method to measure the molecular or atomic weight of samples
In a similar manner:
• Mass spectrometers allow the determination of a mass-to-charge ratio of the analyte.
• By knowing the charged state of the analyte through the addition of protons (the backpack in the example), the mass can be calculated after deconvolution of the spectrum.
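The backpack arithmetic can be written out explicitly. The sketch below assumes positive-mode ionization in which each charge comes from one added proton, and that two neighbouring peaks in the spectrum belong to the same molecule with charges z and z + 1. The function names and the example mass are illustrative assumptions, not values from the lecture.

```python
PROTON = 1.007276  # mass of one proton in daltons (the "backpack")


def mz(mass, charge):
    """m/z of a molecule of neutral mass `mass` carrying `charge` added protons."""
    return (mass + charge * PROTON) / charge


def neutral_mass(mz_value, charge):
    """Remove the added protons and undo the division by charge."""
    return charge * (mz_value - PROTON)


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge of the peak at mz_high, given the neighbouring peak at mz_low
    that carries exactly one more proton.

    From mz_high - PROTON = M / z and mz_low - PROTON = M / (z + 1) it follows
    that z = (mz_low - PROTON) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


if __name__ == "__main__":
    true_mass = 16951.5  # arbitrary illustrative protein mass
    peak_a, peak_b = mz(true_mass, 10), mz(true_mass, 11)
    z = charge_from_adjacent_peaks(peak_a, peak_b)
    print(z, round(neutral_mass(peak_a, z), 2))
```

For a singly charged peptide peak the same subtraction applies with a charge of 1, so a peak at m/z 1406.7220 corresponds to a neutral mass of about 1405.71 Da.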
Example MS/MS Spectrum
This spectrum shows the fragmentation of a peptide, which is used to determine the sequence of the peptide, via a search algorithm.
Comprehensive Analysis of Protein-Protein Interactions
• Isolation of the Multiprotein Complex (Cell Biology/Genetics)
  • Co-immunoprecipitation (Agarose, Ig-G)
  • Protein Interaction Chromatography (Agarose, GST-Protein)
  • TAP-Tagged Proteins (Agarose, Ig-G, TEV)
• Proteolysis
• LC/MS/MS or LC/LC/MS/MS
• Identification of Protein Components
• Identification of Modifications
• Dynamics of components and modifications
Second Generation Proteomics Technology: Shotgun Proteomics
Identification of Proteins in Mixtures
Digestion → Complex Peptide Mixture → Separation (LC) → MS/MS → SEQUEST → identified proteins (ymr130, ymr142, ymr154, ymr201)
Protein Identification data acquired at ~1 peptide per 1-3 secs
Eng, McCormack, Yates, JASMS 1994
[Figure, panel “Molecular Weight”: peptide mass spectrum. x-axis: Mass (m/z), 800 to 2000; y-axis: Counts, 0 to 40000. Labeled peaks: 766.4868, 836.4362, 904.4685, 997.5691, 1209.5710, 1221.7473, 1570.6782, 1697.8175, 1800.9144, 1890.9643, 2061.1366. Highlighted peak: 1406.7220]
…PPGTGKTLLAK AVANESGANFISVK FYVINGPEIM...
[Figure, panel “Fragmentation”: MS/MS spectrum. x-axis: m/z, 200 to 1400; y-axis: Relative Abundance, 0 to 100. Labeled peaks: 333.1, 468.1, 619.0, 778.5, 835.4, 922.4, 961.4, 1051.6, 1074.5, 1236.7]
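The fragmentation step can be made concrete by computing the expected fragment masses for the peptide AVANESGANFISVK from the sequence above and comparing them with the peak labels of the fragmentation spectrum. This is a minimal sketch: it assumes only singly charged b- and y-ions, monoisotopic masses, no modifications and a 0.5 Da matching tolerance, and it is not the SEQUEST scoring method. The function names are illustrative.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

# Peak labels read off the fragmentation spectrum.
OBSERVED = [922.4, 835.4, 1051.6, 778.5, 333.1, 619.0, 1074.5, 1236.7, 468.1, 961.4]


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide.upper()]
    ions = {}
    for i in range(1, len(masses)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON           # N-terminal fragment
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment
    return ions


def match_peaks(observed, ions, tol=0.5):
    """Map each observed peak to the ion labels within `tol` Da of it."""
    matches = {}
    for peak in observed:
        labels = sorted(name for name, value in ions.items()
                        if abs(value - peak) <= tol)
        if labels:
            matches[peak] = labels
    return matches


if __name__ == "__main__":
    ions = fragment_ions("AVANESGANFISVK")
    for peak, labels in sorted(match_peaks(OBSERVED, ions).items()):
        print(peak, labels)
```

Under these assumptions eight of the ten labeled peaks line up with b- or y-ions of this peptide (for example 922.4 with y9 and 1074.5 with b11); the peaks at 468.1 and 619.0 are left unassigned by this simple model.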
MS videos
• http://video.google.com/videoplay?docid=-6140373375438015688&q=mass+spectrometry&hl=en (Berkeley lecture, 2006)
• http://video.google.com/videoplay?docid=4083728878452715101 (short, 2007)