
CHAPTER 1 : Review of literature

"Development of Mass spectrometry (MS) to embrace biological macromolecules has meant a revolutionary breakthrough, making chemical biology into the "big science" of our time. Chemists can now rapidly and reliably identify what proteins a sample contains. They can also produce three-dimensional images of protein molecules in solution. Hence scientists can both "see" the proteins and understand how they function in the cells."

"Nobel prize foundation during prize announcement for Nobel prize in Chemistry (2002)"


The Immune System: A SUPERSYSTEM

The term "supersystem", coined by Tomio Tada to designate highly integrated

life systems such as the immune system, nervous system, and embryogenesis,

conceptualizes the supersystem as: "While the mechanistic system is defined as a

set of diverse elements so connected and related as to form an organic whole for a

particular purpose. The "supersystem" engenders its own elements from a single

progenitor. The diverse elements generated form relationships by mutual

adaptation and coadaptation, create a dynamic self-regulating system through

self-organization. Human body is a closed self-satisfied system, yet open to the

environment, receiving outside signals to transduce them into internal messages

for self-regulation and expansion" (Tada, 1997).

The immune response in higher vertebrates is a remarkably adaptive

defense mechanism that has evolved to combat myriad invading pathogens and

malignant cellular growths. It is orchestrated through a complex interplay of a wide

variety of cell types and soluble molecules, collectively referred to as 'the Immune

System'. The immune system functions through two interrelated components: the

innate (non-adaptive) and the adaptive immune system. The important differences


between these two components are the specificity and memory associated with the

response mounted against the pathogens. Several physical and chemical factors,

apart from some specialized cells such as the blood monocytes, neutrophils, tissue

macrophages, dendritic cells and NK cells, constitute the innate immune system. The

adaptive immune system, however, functions centrally through specialized cells

termed lymphocytes, which are derived from pluripotent hematopoietic

stem cells in the bone marrow. Two major populations of the lymphocytes, the B

and the T lymphocytes, are distinguished primarily on the basis of function,

maturation and the expression of cell surface markers.

Humoral Immune System

The Humoral Immune Response (HIR) is the aspect of immunity that is

mediated by secreted antibodies (as opposed to cell-mediated immunity, which

involves T lymphocytes) produced by cells of the B lymphocyte lineage (B

cells). B cells play a vital role in the immune response as they produce antibodies

that recognize specific antigens that are foreign to the body and potentially

pathogenic. In the absence of pathogens, B lymphocytes do not secrete antibodies.

Resting cells have a small cytoplasm with scarce endoplasmic reticulum (ER)

cisternae. Upon encounter with antigens, B lymphocytes start to proliferate rapidly

and differentiate primarily into Ig-secreting plasma cells that secrete specific

antibodies against potentially pathogenic non-self antigens.

B cell signaling

The B-cell antigen receptor (BCR) is characterized by a complex

hetero-oligomeric structure in which antigen binding and signal transduction are

compartmentalized into distinct receptor subunits. For effective humoral immune

responses, mature B cells must respond to foreign antigens and generate

antigen-specific effector cells. It is therefore easy to see that the BCR complex is

essential for the later stages of B cell maturation as well as the effector phases of

mature B cells.

B cell activation is initiated following the engagement of the B cell receptor

(BCR) with specific antigen. Cross-linking of the B cell receptor is a prerequisite for

the initiation of signal transduction. Aggregation of receptors by binding of

multivalent antigen results in conformational changes in the ITAM-containing

cytoplasmic tails, triggering a cascade of phosphorylation events that

culminates in the nucleus.

The Ig-α/Ig-β heterodimer is an essential component of the BCR (Kane et al.,

2000; Kurosaki, 2000; Latour and Veillette, 2001; Moretta et al., 2001; Tamir and

Cambier, 1998; Turner and Kinet, 1999). A key feature of the cytoplasmic tail of

both Ig-α and Ig-β is the presence of a structural motif termed the immunoreceptor

tyrosine-based activation motif (ITAM), with a consensus sequence of

[D/E]X7[D/E]X2YX2[L/I]X7YX2[L/I] (Kane et al., 2000; Kurosaki, 2000; Moretta

et al., 2001; Tamir and Cambier, 1998; Turner and Kinet, 1999).
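
The ITAM consensus can be made concrete as a pattern search over a protein sequence. The following is a minimal sketch, assuming the spacing exactly as written in the consensus above and treating X as any of the 20 standard amino acids; some formulations of the consensus allow a variable spacer between the two tyrosine half-sites, which this sketch does not model. The function name find_itams and the test peptide are hypothetical and are not taken from any real Ig-α or Ig-β sequence.

```python
import re

# ITAM consensus as given in the text: [D/E]X7[D/E]X2YX2[L/I]X7YX2[L/I]
# X is taken to mean "any standard amino acid" (an assumption of this sketch).
X = "[ACDEFGHIKLMNPQRSTVWY]"
ITAM_PATTERN = re.compile(
    f"[DE]{X}{{7}}[DE]{X}{{2}}Y{X}{{2}}[LI]{X}{{7}}Y{X}{{2}}[LI]"
)

def find_itams(sequence):
    """Return (start_index, matched_text) for each non-overlapping ITAM-like match."""
    return [(m.start(), m.group()) for m in ITAM_PATTERN.finditer(sequence.upper())]

# Hypothetical peptide built to satisfy the consensus; not a real receptor tail.
toy_itam = "E" + "A" * 7 + "D" + "AA" + "Y" + "AA" + "L" + "A" * 7 + "Y" + "AA" + "I"
toy_tail = "MK" + toy_itam + "GG"

print(find_itams(toy_tail))                    # one match, starting at index 2
print(find_itams(toy_tail.replace("Y", "F")))  # no match once both tyrosines are lost
```

Replacing the tyrosines with phenylalanine abolishes the match, which mirrors the central role of the two tyrosines as the phosphorylation sites of the motif.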

The ITAM tyrosines are phosphorylated upon receptor engagement by the

Src-family protein tyrosine kinase Lyn and the cytosolic protein tyrosine kinase

spleen tyrosine kinase (Syk), generating binding sites for the tandem Src homology

2 (SH2) domains of Syk itself (Chan et al., 1994; Chow and Veillette, 1995; Latour

and Veillette, 2001). Binding of Syk to the phosphorylated ITAM and its



subsequent autophosphorylation and phosphorylation by Src kinases result in Syk

activation. Activation of Syk is the key event, which results in a plethora of

downstream signaling events, i.e. phosphorylation, protein recruitment, generation

of secondary messengers, activation of cyclic GMP-regulated pathways, and gene

activation. The Src-family tyrosine kinases (SFTKs) also get activated by recruitment to the receptor, although

dephosphorylation by CD45 is likely to be the major mechanism of SFTK

activation (Chan et al., 1994; Latour and Veillette, 2001; van Oers and Weiss,

1995). Once activated, the SFTKs and Syk initiate distinct and inter-related

signaling pathways. SFTKs are required for the activation of NF-κB and serve to

phosphorylate additional important signaling substrates such as CD22 and

BAM32. Syk phosphorylates BLNK, also through a unique phosphorylated

non-ITAM tyrosine in the Ig-α cytosolic tail (Clark et al., 1994). BLNK coordinates the

assembly and activation of a receptor-retained signalosome containing PLCγ2, Vav,

Btk, Nck, and Grb2 (Kurosaki and Tsukada, 2000).

The signals are transmitted through the action of kinases, which pass the

signal on by phosphorylating the downstream effector proteins. These activated

molecules are returned to their normal basal level by the action of phosphatases.

Understanding the mechanisms that allow intracellular signals to be relayed from

the cell membrane to specific intracellular targets still remains a daunting

challenge. Many protein kinases and protein phosphatases have relatively broad

substrate specificities and may be used in varying combinations to achieve distinct

biological responses. Thus, mechanisms must exist to organize the correct

repertoires of enzymes into individual signaling pathways. Alongside kinases and

phosphatases, there is another set of proteins, called adaptors, which, by definition, lack

both enzymatic and transcriptional activities but control lymphocyte activation by

mediating constitutive or inducible protein-protein or protein-lipid interactions via

modular interaction domains.

Role of adaptors/scaffolds in immune cell signaling

Extracellular signals are relayed by receptors to the interior of the cell and then

translated by transducers into intracellular signals that diverge and are amplified by

signal regulators, received by effector proteins, and finally erased by signal

terminators. Adaptor/scaffold proteins couple the BCR to transducer

elements, initiating the signaling cascade from the receptor.

Scaffold proteins play an important role in bringing several signaling elements

together in one preformed protein complex, thereby ensuring the specificity and

the speed with which a signal can travel down such a preorganized response

pathway. The deletion of a scaffold protein compromises (increases or decreases)

the efficiency of signaling through a particular response pathway. The co-ordinate

interaction between adaptor and effector molecules is required for the

propagation and dynamic modification of externally applied signals. Adapter

molecules are multidomain proteins lacking intrinsic catalytic activity, functioning

instead by nucleating molecular complexes during signal transduction.

These scaffolds can function in at least four ways: by acting as platforms on

which signaling molecules can assemble, by localizing signaling molecules at

specific sites in a cell, by coordinating positive and negative feedback signals to



modify the signaling pathways and by protecting activated signaling molecules

from inactivation. These functions of scaffold proteins can provide additional

complexity to the signaling cascade and create signaling thresholds or regulate

complex signaling behaviors, such as graded or digital signaling, transient or

sustained signaling, and oscillatory signaling.

Domains and motifs present in adaptors

The major domains of lymphocyte adapters are Src homology 2 and 3

(SH2 and SH3), phosphotyrosine-binding (PTB) and pleckstrin homology (PH).

These are 40-150 amino acid modular structures containing a ligand-binding

recognition pocket that, in the case of SH2, SH3 and PTB domains, recognizes a

specific 3-6 amino acid sequence motif.

SH2 domains are phosphotyrosine-binding modules that generally

recognize pYxxψ motifs (where pY represents phosphotyrosine and ψ represents

any hydrophobic amino acid). SH3 domains interact with proline-containing

peptides that often conform to R/KxxPxxP (class I) or PxxPxR/K (class II)

motifs. PTB domains recognize NPxY sequences in which the tyrosine is not

necessarily phosphorylated. PH domains share a similar fold to the PTB domain

but bind to phosphatidylinositol phosphates (PIPs) and play a role in membrane

targeting. Two domain types that have not been found in the adapters of antigen

receptor signaling pathways are WW domains, which recognize proline-rich

ligands (the name 'WW' refers to two conserved tryptophan residues that are

spaced 20-22 amino acids apart), and PDZ domains, which generally recognize



carboxy-terminal motifs (the name 'PDZ' domain is derived from the PDZ

domain-containing proteins PSD-95, Dlg and ZO-1). PDZ domains are present in

adapters that localize signaling complexes at specialized regions of cell-cell

contact, such as the neuronal synapse. PDZ adapters may potentially play an

analogous role in lymphocytes, in which the stable cell-cell contact observed upon

TCR binding to major histocompatibility complex (MHC)-peptide has been

termed the 'immunological synapse'.
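
The SH2, SH3 and PTB recognition motifs quoted above are short enough to be written as regular expressions and scanned against a sequence. The sketch below is illustrative only: the set of residues counted as hydrophobic is an assumption, a plain sequence carries no phosphorylation information (so a tyrosine-based hit is only a candidate SH2 site), and the names MOTIFS and scan_motifs as well as the test sequence are hypothetical.

```python
import re

# Sequence motifs quoted in the text; "." stands for any residue (x).
# The hydrophobic set used for the SH2 motif is an assumption of this sketch.
HYDROPHOBIC = "AVILMFWC"
MOTIFS = {
    "SH2_candidate": f"Y..[{HYDROPHOBIC}]",  # Yxx(hydrophobic); binds only if Tyr is phosphorylated
    "SH3_class_I": "[RK]..P..P",             # R/KxxPxxP
    "SH3_class_II": "P..P.[RK]",             # PxxPxR/K
    "PTB": "NP.Y",                           # NPxY
}

def scan_motifs(sequence):
    """Map each motif name to a list of (start_index, matched_text), overlaps included."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead wrapping a capture group also reports overlapping occurrences.
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]
    return hits

toy = "GGRSAPLAPGGNPTYGG"  # hypothetical sequence, for illustration only
for name, found in scan_motifs(toy).items():
    print(name, found)
```

A match of this kind only flags a candidate site; whether a given domain actually binds depends on flanking residues, phosphorylation state and accessibility, none of which a pattern search captures.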

Adaptors of B lymphocytes

It has become clear that scaffold proteins have a crucial role in regulating

signaling cascades. By binding two or more components of a signaling pathway,

scaffold proteins can help to localize signaling molecules to a specific part of the

cell or to enhance the efficacy of a signaling pathway. Scaffold proteins can also

affect the thresholds and the dynamics of signaling reactions by coordinating

positive and negative feedback signals. Table 1 lists the adaptor proteins

present in B lymphocytes.

Table 1: Cytoplasmic adapter proteins with a potential role in antigen receptor signaling. The domain structure of each adapter is shown in diagrammatic form, followed by size (kDa) on SDS-PAGE under reducing conditions, expression pattern, associated molecules and key references.

Name | Size (kDa) | Expression | Associated molecules
Bam32 | 32 | B cells | PLCγ2
BLNK | 65 | B cells and macrophages | PLCγ2, Btk, Nck, Vav, Grb2
BRDG1 | 37 | B cells and myeloid cells | Tec
Cbl family | 120 | Ubiquitous | Grb2, p85, Crk, SLAP, Syk, ZAP-70, BLNK
Crk family | 28, 40, 42 | Ubiquitous | Cbl, C3G, Paxillin, Cas
Grap | 28 | | LAT, Shc, Sos, Sam68
 | | | SHP2, Grb2, CrkL, p85
Shb | 55, 66 | Ubiquitous | LAT, p85, Src, Eps8, Grb2, CD3ζ, PLCγ1
Shc | 46, 52, 66 | Ubiquitous | SHIP, Grb2, RasGAP
3BP2 | 80 | | LAT, Cbl, ZAP-70, Grb2, PLCγ1, Syk
Nck | 47 | Ubiquitous | PAK, SLP-76, Sos, Cbl, WASP, IRS-1, NIK

Structure legend: potential pTyr residue, proline-rich motif, PTB domain, PH domain, NLS motif, SH2 domain, RING domain, SH3 domain, SOCS box.

Adaptors in MAPK signaling

The Ras-Raf-MEK-ERK/MAPK pathway (MEK is MAPK and ERK

kinase, MAPK is mitogen-activated protein kinase, and ERK is extracellular

signal-regulated kinase) is an evolutionarily conserved pathway that is involved in the control

of many fundamental cellular processes that include cell proliferation, survival,

differentiation, apoptosis, motility and metabolism. Therefore, cells have developed

mechanisms by which this single pathway modulates numerous cellular responses

from a wide range of activating factors. This specificity is achieved by several

mechanisms, including temporal and spatial control of MAPK signaling components.

Key to this control are protein scaffolds, which are multidomain proteins that interact

with components of the MAPK cascade in order to assemble signaling complexes.

Studies conducted on different scaffolds, in different biological systems, have shown

that scaffolds exert substantial control over MAPK signaling, influencing the signal

intensity, time course and, importantly, the cellular responses. Protein scaffolds,

therefore, are integral elements to the modulation of the MAPK network in

fundamental physiological processes. Originally identified in yeast (Elion, 2001),

several scaffolds that modulate MAPK activity in mammalian cells have been

recognized (Kolch, 2005; Morrison, 2001; Sacks, 2006).

Kinase suppressor of Ras (KSR)

Kinase suppressor of Ras (KSR) was originally isolated from a genetic screen as a

positive regulator of MAPK signaling (Kornfeld et al., 1995). Although not present in

yeast, KSR homologues have been identified in all multicellular organisms examined,

including Drosophila, the nematode C. elegans and mammals (Morrison, 2001). The

biological function of KSR was obscure for some time. Due to a high degree of

sequence similarity to C-Raf, KSR was initially considered to be an enzyme, but

kinase activity had not been unequivocally demonstrated (Morrison, 2001).

Subsequent investigation led to the realization that KSR is a MAPK scaffold.

KSR is one of the best characterized scaffolds in the MAPK pathway, and

binds to C-Raf, MEK1/2 and ERK1/2 (Morrison, 2001). More recent evidence

reveals that KSR also binds B-Raf (Ritt et al., 2007), but the physiological significance

of this interaction has not been established. Other proteins known to interact with

KSR include 14-3-3, G protein-βγ, heat shock proteins 70 and 90, cdc37 and

C-TAK1 (Morrison, 2001). Interestingly, MEK is constitutively associated with KSR, while

ERK binds only in response to a stimulus. As is typical for scaffolds, optimal

expression levels of KSR are required for maximal responses of MAPK to signaling

cues.
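
One way to see why an optimal scaffold level can exist is a simple counting argument: with too little scaffold, few complexes form; with too much, the kinases are spread over separate scaffold molecules and rarely meet on the same one. The sketch below is a toy model under stated assumptions (two kinases, tight and independent binding to two separate sites, arbitrary units); it is not a model taken from the studies cited here, and the function name complete_complexes and the copy numbers are hypothetical.

```python
def complete_complexes(scaffold, kinase_a, kinase_b):
    """Expected number of scaffold molecules carrying both kinases (toy model).

    Assumes tight, independent binding at two separate sites, so each site on a
    given scaffold is occupied with probability min(1, kinase / scaffold).
    """
    if scaffold <= 0:
        return 0.0
    p_a = min(1.0, kinase_a / scaffold)
    p_b = min(1.0, kinase_b / scaffold)
    return scaffold * p_a * p_b

# Arbitrary units; 100 copies of each kinase is a hypothetical choice.
for s in (10, 50, 100, 200, 1000):
    print(s, round(complete_complexes(s, 100, 100), 1))
```

In this sketch the number of fully assembled complexes rises with scaffold abundance, peaks when scaffold and kinases are present in equal amounts, and falls again when the scaffold is in excess, giving the bell-shaped dependence implied by the requirement for optimal expression levels.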

Loss-of-function analysis has provided some of the best evidence for the in

vivo function of KSR in MAPK signaling. There are two ksr genes in C. elegans and

between them they are required for most Ras-dependent signaling during

development (Ohmachi et al., 2002). Although KSR knockout mice are

developmentally normal, they have defects in antigen triggered T cell proliferation

(Nguyen et al., 2002), and are resistant to antibody-induced arthritis (Fusello et al.,

2006). Furthermore, mouse embryonic fibroblasts from KSR knockout mice have



defective activation of ERK by TNF-α and interleukin-1 (Fusello et al., 2006).

Together, these studies strongly suggest that KSR fulfils an important role in the

regulation of MAPK signaling during the immune response and inflammation.

Interestingly, KSR null mice are less susceptible to Ras-mediated skin cancer (Lozano

et al., 2003), identifying a role for KSR in the regulation of MAPK-mediated cell

proliferation.

Protein scaffolds in MAPK specificity

Despite progress in our understanding of MAPK signaling, one question that

remains to be explored is how a particular stimulus elicits the correct response.

This topic, termed MAPK specificity, seems remarkable when one considers the

diverse range of cellular responses induced by many different activators, all of which

signal through the MEK/ERK pathway. Spatial and temporal changes to MAPK

signaling influence the cellular response to a specific stimulus and are of particular

interest when considering MAPK specificity. Protein scaffolds provide one

mechanism by which spatiotemporal MAPK signaling is controlled. However, there

are additional means by which scaffolds are able to control aspects of MAPK

signaling. It has been proposed that scaffolds can provide both positive and negative

regulatory mechanisms (Carrington and Johnson, 1999). By assembling individual

components of the MAPK cascade, scaffolds facilitate their interactions and

propagation of the signal. However, the assembly of the multiprotein complexes also

sequesters these same components away from other signaling pathways.

Consequently, protein scaffolds preferentially activate specific cascades, while

concomitantly inhibiting others.

Scaffold regulation of MAPK compartmentalization

KSR seems to provide a docking platform at the plasma membrane onto

which C-Raf, MEK1/2 and ERK1/2 can form a complex, and allow efficient

propagation of the signaling cascade. In support of this notion is the finding that in

quiescent cells, KSR is maintained in the cytosol through an interaction with 14-3-3

(Muller et al., 2001), and in a Triton insoluble fraction through an interaction with

'impedes mitogenic signal propagation' (IMP) (Matheny et al., 2004). Following

stimulation by growth factors, KSR translocates to the plasma membrane, where it

facilitates activation of MEK and ERK (Muller et al., 2001). Therefore, KSR is able

to regulate the spatial activation of MEK/ERK and presumably the cellular response.

This concept is bolstered by the observation that overexpression in PC12 cells of

B-KSR, a neuronal-specific isoform of KSR, switches EGF signaling from a

proliferative signal to a differentiation signal (Muller et al., 2000). Phosphorylation of

ERK induces dimerisation and this process has been proposed to affect ERK activity

(Philipova and Whitaker, 2005). Recent evidence reveals that KSR1 is required for

ERK dimerisation (Casar et al., 2008). Following EGF activation, KSR1 acts as a

platform for the formation of ERK dimers (Casar et al., 2008). Importantly, these

dimers specifically phosphorylate cytosolic substrates, while ERK monomers, which

are not bound to KSR1, are translocated into the nucleus to catalyze phosphorylation

of nuclear substrates (Casar et al., 2008). Consequently, by regulating dimerisation of

ERK, KSR1 can control whether cytosolic or nuclear substrates are activated by

MAPKs.

Scaffolds in temporal regulation of MAPK signaling

Analogous to their role in regulating spatial MAPK signaling, protein

scaffolds can also control the duration of a MAPK signal. Like the originally

identified isoform of KSR, a brain specific isoform of KSR, termed B-KSR, interacts

with MEK and ERK (Muller et al., 2000). Overexpression of B-KSR in PC12 cells

results in increased basal levels of active phosphorylated ERK (Muller et al., 2000),

and increases NGF-induced ERK activation and NGF-dependent differentiation.

Interestingly, overexpression of B-KSR also causes a sustained increase in ERK

activation following EGF treatment, resulting in differentiation (Muller et al., 2000).

As described above, PC12 cells usually differentiate or proliferate in response to

NGF or EGF, respectively, and these differences are due to the duration of ERK

activation (Marshall, 1995). Therefore, it appears B-KSR can alter the time course of

MAPK signaling in response to growth factors, and as a consequence, alter the

cellular outcome to these growth factors.

Graded and threshold signaling

A recent characteristic that has been applied to MAPK signaling is that of graded or

"all or nothing" signaling. For example, following activation some MAPK pathways

reach a critical level of signal strength and behave in a switch-like manner. Therefore,

individual cells within a population will be either "on" or "off" with respect to the

particular outcome (Takahashi and Pryciak, 2008). Examples of such cellular

responses are proliferation, differentiation and programmed cell death. Other

pathways respond in a graded fashion, where all the cells within a population show a

uniform "output" increase that is proportional to the activating stimulus (Takahashi

and Pryciak, 2008). This can be observed in Drosophila embryo development, where

graded concentrations of morphogens determine the dorsoventral axis. While still

incompletely understood, analysis of the mating MAPK pathway in yeast has shown

that protein scaffolds are able to influence graded and switch-like MAPK signaling

(Takahashi and Pryciak, 2008). The yeast scaffold Ste5 (analogous to KSR) is

essential for MAPK activation in response to pheromone stimulation (Elion, 2001).

Ste5 interacts with multiple kinases, recruiting them to the plasma membrane. When

Ste5 is restricted to the cytosol, the activated kinases are inefficient in signal

propagation, and so a strong signal is required in order to produce any significant

output (Takahashi and Pryciak, 2008). Consequently, pheromone signaling is more

switch-like. However, when Ste5 is localized to the membrane, pheromone signaling

is more graded, as low levels of active kinases are able to efficiently propagate the

signal (Takahashi and Pryciak, 2008). While additional studies are required to

elucidate how graded and switch-like MAPK signaling is regulated, protein scaffolds

do appear to have an important role.
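
The contrast between graded and switch-like behavior can be illustrated with a Hill function, in which a single steepness parameter moves the dose-response curve from a gradual rise to a near step. This is a phenomenological sketch only: the Hill form, the parameter values and the name hill_response are assumptions made for illustration and are not the model used in the studies cited above.

```python
def hill_response(stimulus, half_max=1.0, n=1.0):
    """Fractional output of a Hill function; n sets the steepness of the response."""
    if stimulus <= 0:
        return 0.0
    return stimulus**n / (half_max**n + stimulus**n)

doses = [0.25, 0.5, 1.0, 2.0, 4.0]  # arbitrary units relative to the half-maximal dose

# n = 1: output rises gradually with dose (graded response).
graded = [round(hill_response(d, n=1), 3) for d in doses]
# n = 8: output stays near 0 below the threshold and near 1 above it (switch-like).
switch_like = [round(hill_response(d, n=8), 3) for d in doses]

print("graded     ", graded)
print("switch-like", switch_like)
```

With a steep curve, small cell-to-cell differences in stimulus around the threshold push individual cells fully "on" or "off", whereas with a shallow curve every cell responds in proportion to the stimulus it receives.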



Role of mass spectrometry in proteomic research

Proteomics started to attract tremendous attention after the completion of the

sequencing of the human genome. Functional genomics, with its focus on the dynamics of gene

transcription, translation and protein-protein interactions, also became one of the

central topics in modern biomedical research.

Both genomics and proteomics are closely tied to biochemical separation

techniques. Gel electrophoresis was successfully developed for oligonucleotide

sequencing in the late 1970s, and the 2D gel was developed to separate proteins at about the

same time. Changes in protein abundance between samples can be quantitatively

measured by comparing the results from different 2D gels.

Nevertheless, a reliable and fast method to determine the amino acid sequence of

proteins was not available until the early 1990s.

In 1988, Tanaka et al. obtained mass spectra of large biomolecules by using

laser desorption assisted by nanometal particles, which roused much interest in pursuing

different methods for biomolecular ionization (Tabata et al., 2007). Later, Hillenkamp

and co-workers developed matrix-assisted laser desorption/ ionization (MALDI)

mass spectrometry (MS) which can rapidly measure the molecular weights of

different proteins with a time-of-flight (TOF) mass spectrometer (Karas et al., 2000).

At about the same time, Fenn and co-workers developed electrospray ionization

(ESI) mass spectrometry, which also gives soft ionization of proteins (Wong et al.,

2008). Gradually, MALDI and ESI mass spectrometers became the two major tools

for protein analysis.



In 1993, Henzel et al. reported the first work on the identification of

proteins separated on 2D gels. The peptides were generated by in situ tryptic

digestion of proteins. Masses of the peptides were measured with a MALDI-TOF

mass spectrometer. The mass patterns were compared with known

libraries to confirm the peptides, which were then used for protein identification.

This work has been regarded as a major milestone in using mass spectrometry for

proteomics application.
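
The idea behind this kind of mass-pattern matching can be sketched in a few lines: digest a candidate protein sequence in silico, compute the expected peptide masses, and compare them with the measured peaks. The code below is a simplified illustration, assuming cleavage after K or R except before P, no missed cleavages, no modifications and singly protonated ions; the function names, the tolerance and the test sequence are hypothetical and do not reproduce the procedure of Henzel et al.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def tryptic_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def mh_plus(peptide):
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def match_peaks(observed, protein, tol=0.2):
    """Pair each observed m/z with any theoretical tryptic peptide within tol Da."""
    theoretical = {p: mh_plus(p) for p in tryptic_digest(protein)}
    return [(mz, p) for mz in observed for p, m in theoretical.items() if abs(mz - m) <= tol]

toy_protein = "AAKGGRPLKMDE"  # hypothetical sequence, for illustration only
print(tryptic_digest(toy_protein))
print(match_peaks([289.19, 500.0], toy_protein))
```

In practice the comparison is run against every protein in a sequence database, and the protein whose predicted peptides explain the most observed peaks is reported as the best candidate.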

In general, the mass resolution and accuracy of a MALDI-TOF mass

spectrometer are not high enough to give an unambiguous identification of a peptide

with high confidence. In addition, some amino acid residues have a very similar or

even an identical molecular weight. For example, the residue masses of both isoleucine and

leucine are 113.16 Daltons (Da) since they are isomers. The average (chemical) residue masses of

glutamine and lysine are 128.131 and 128.174 Da, respectively. The difference

between these two amino acids is only about 0.04 Da. Therefore, it is desirable to have an

alternative approach to get higher confidence on peptide sequencing information

than the measurement of the masses of peptides only.
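
The practical consequence of these near-identical masses can be put in numbers by asking what mass accuracy is needed to tell two peptides apart that differ by a single residue. The sketch below uses monoisotopic residue masses, which differ slightly from the average masses quoted above; the 1500 Da peptide mass and the function names are hypothetical choices for illustration.

```python
# Monoisotopic residue masses (Da); the text above quotes average masses instead.
MONO = {"L": 113.08406, "I": 113.08406, "Q": 128.05858, "K": 128.09496}

def mass_gap(res_a, res_b):
    """Absolute mass difference (Da) between two residues."""
    return abs(MONO[res_a] - MONO[res_b])

def required_ppm(res_a, res_b, peptide_mass):
    """Mass accuracy (ppm) below which peptides differing only by res_a vs res_b separate."""
    return mass_gap(res_a, res_b) / peptide_mass * 1e6

print(mass_gap("I", "L"))                         # isomers: no mass difference at all
print(round(mass_gap("Q", "K"), 5))               # a few hundredths of a dalton
print(round(required_ppm("Q", "K", 1500.0), 1))   # accuracy needed for a 1500 Da peptide
```

Leucine and isoleucine cannot be separated by mass at any accuracy, and the glutamine/lysine pair demands an accuracy of a few tens of ppm for a mid-sized peptide, which is why fragmentation data are needed in addition to intact peptide masses.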

Low energy dissociation methods for peptide fragmentation

Collision-induced dissociation (CID) was introduced in 1968 to obtain

structure information with a tandem mass spectrometer. The first mass analyzer is

used to determine the mass spectrum of the sample and the second mass spectrum

(MS2) is used to determine the structure of selected peaks from the first mass



spectrum by a collision process with selected gas molecules. Sometimes, higher orders

of mass spectra (MSn) due to CID can be obtained to get more information on the

identification of biomolecular structures. After the publication of the proteomics

paper by Henzel et al., CID was quickly adopted by the proteomics community to give

more reliable determination of peptide sequences, which can subsequently be used

for more accurate protein determination. In addition to CID, electron capture

dissociation (ECD), infrared multiphoton dissociation (IRMPD) and electron transfer

dissociation (ETD) were also developed during the past decade to help with sequence determination.

Some mass spectrometers, such as the ion trap mass spectrometer and the

Fourier-transform ion cyclotron resonance mass spectrometer (FTICR-MS), can use the same

device as both the first and the higher-order mass analyzer.
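
How fragmentation yields sequence information can be illustrated by computing the fragment masses expected from a peptide. Low-energy CID commonly cleaves the peptide backbone at amide bonds, giving N-terminal b ions and C-terminal y ions; the spacing between consecutive ions of one series then corresponds to a residue mass. The sketch below assumes unmodified peptides, singly charged fragments and monoisotopic masses; the function name fragment_ions and the test peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # first i residues (N-terminal)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # last i residues (C-terminal)
    return b_ions, y_ions

b, y = fragment_ions("AAK")  # hypothetical tripeptide
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
# The gap between consecutive b ions equals the mass of the residue added.
print(round(b[1] - b[0], 5))
```

Reading a sequence from a spectrum runs this logic in reverse: successive mass gaps in a b or y series are matched to residue masses, which is also where the isoleucine/leucine ambiguity reappears.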

In proteomics, the determination of the entire amino acid sequence of a

protein is basically done through two types of approaches. One is to sequence a few

peptide fragments and match these sequences with a protein derived from genome

sequencing information or previous mass spectra libraries. Up to now, most complete

proteomics studies have been in this category. The other approach is de novo

sequencing, which is needed when neither genomic sequencing information nor

sufficient mass spectrum data are available. Nevertheless, correct identification of the

entire sequence from de novo sequencing of peptides is still a big challenge since

peptide-fragmentation data might not contain sufficient information to

unambiguously derive the complete amino acid sequence. Up to now, the effort on de

novo sequencing is still a small percentage of total proteomics research.



Strategies for mass spectrometry based proteomics

Currently, there are two fundamental strategies for proteomics study. One is

bottom-up and the other is top-down.

(A) Bottom-up approach:

In the bottom-up approach, purified proteins or complex protein mixtures are subjected

to chemical or enzymatic cleavage and the peptide products are usually separated by

chromatography followed by mass spectrometry analysis.

(B) Top-down approach:

In top-down proteomics, intact protein ions or large protein fragments are subjected

to gas-phase fragmentation for mass spectrometry analysis directly (Bogdanov and

Smith, 2005; Sze et al., 2002). With top-down analysis, all post-translational

modifications will be subjected to analysis while bottom-up analysis may skip the

fragments with post-translational modifications. Since many fragmentation processes

such as CID are not efficient for very large proteins (MW > 100,000 Da) in routine

operation, a true top-down strategy only works for relatively small proteins. Some

researchers also consider mass spectrometry analysis of peptides obtained by in situ

digestion of proteins after gel separation to be a top-down strategy.

With rapid progress in mass spectrometry technology development and

bioinformatics during the past few years, the identification of the various

proteins in proteomic samples is expected to become a routine

high-throughput exercise in the near future. However, quantitative determination of each

individual protein still needs more effort. Furthermore, the extension of the dynamic


range to measure ultra-low quantity of proteins inside of a proteomic sample is still in

high demand and deserves special attention.

Mass spectrometry based techniques for biomolecule detection

Mass spectrometry developers put major effort into developing

mass spectrometers for biomolecule detection for many decades, but had little

success until the development of MALDI and ESI. Today,

MALDI and ESI remain the two key methods for protein and peptide analysis.

Matrix-assisted laser desorption/ionization (MALDI)

In 1988, Hillenkamp, Karas and co-workers (Karas and Kruger, 2003)

discovered that large protein molecular ions can be produced by laser desorption

without much fragmentation when these biomolecules are mixed with small organic

compounds that serve as a matrix for strong absorption of the laser beam. In the proposed MALDI

desorption mechanism, large biomolecules are carried into the

gas phase by the many small matrix molecules that are vaporized upon absorption of

laser photons. The major advantages of MALDI-TOF include:

(1) Fast analysis speed: some commercial MALDI-TOF-MS can finish 100

samples in less than 10 min.

(2) No mass range limitation: It can measure from short peptides to a very

large antibody (> 100,000 Da).



(3) Simple mass spectrum for analysis: most ions are with one charge and only

a small percentage of ions are doubly charged. Triply charged ions are

seldom observed.

(4) Molecular imaging: since the laser beam can be focused to a tiny spot

(~5 μm, depending on the focal lens used and the beam divergence),

MALDI-TOF has been successfully used for tissue imaging. This makes proteomic

imaging feasible.

(5) High detection sensitivity: the detection sensitivity can reach to a few

attomoles for short peptides.

The disadvantages of MALDI-TOF-MS include: (1) low reproducibility: due

to the difficulty in controlling the crystallization process, and because ion production is a strong

function of laser fluence, MALDI-TOF-MS is poor in reproducibility. (2) Low

mass resolution: due to the broad energy spread of biomolecules during the

desorption plus the possible matrix attachment and biomolecule fragmentation, mass

resolution is usually poor for very large biomolecules. Although MALDI-TOF-MS

can measure very large intact biomolecular ion, its low mass resolution gives a major

limit on its application for direct large protein identification. (3) Inconvenience in

detecting small biomolecules due to the interference of ions produced by matrix

molecules. Direct ionization on silicon (DIOS) (Wei et al., 1999) and direct ionization

on metal (DIOM) have been developed to reduce this concern. In these approaches,

biomolecules are placed on surfaces with porous structures or sharp metal needles to

achieve ionization without the need for matrices. MALDI is still the primary tool for

imaging of large biomolecules (Caldwell and Caprioli, 2005; Chaurand et al., 2006;

Reyzer and Caprioli, 2007) without the need of labeling.

Electrospray ionization

ESI was developed by Fenn and co-workers (Wong et al., 2008). ESI has been

very broadly used in proteomics and other applications that require determination

of the mass of selected biomolecules. In electrospray ionization, a liquid is pushed

through a very small capillary, with a high voltage applied between the capillary tip and an

extraction plate. A schematic of a typical electrospray ionization ion trap mass

spectrometer is shown in Fig. 2. This liquid contains the analyte biomolecules

dissolved in a selected solvent. Volatile acids, bases or buffers are often added to the

solution. The strong electric field draws the liquid out of the capillary, where it

forms an aerosol, a mist of small droplets. An uncharged carrier gas such as a rare

gas or nitrogen is sometimes used to help nebulize the liquid and to help produce

the solvent droplets. As the solvent evaporates, biomolecular ions are

produced for mass spectrometric analysis.

[Fig. 2 schematic not reproduced. Legible labels: HPLC inlet, nebulizer gas inlet, high flow capillary, ring electrode, octopoles, waste.]

Fig. 2: Experimental schematic of electrospray ionization (ESI) ion trap mass spectrometry (Courtesy: Applied Biosystems ESI ion trap mass spectrometer)

In electrospray processes, the biomolecular ions observed are quasi-molecular

ions created by protonation or deprotonation. Some biomolecules, such as

polysaccharides, tend instead to form adducts with an alkali ion.

Multiply charged ions are often observed, especially for large biomolecules, which

can exhibit many charge states. The number of charges depends on the

mass and chemical properties of the biomolecule as well as the solvent used. Therefore,

there are often many ion peaks for a single biomolecular compound. For

samples containing a pure compound, the pattern of these multiple peaks can help to give

a very accurate mass determination. For a complex proteomic sample, it can add

tremendous complexity to the mass spectrum. Therefore, a pre-separation such as

high performance liquid chromatography (HPLC) is often needed for electrospray

ionization mass spectrometry in proteomic applications.
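
The way a ladder of multiply charged peaks yields an accurate mass can be illustrated with a short sketch. It assumes positive-mode ions formed purely by protonation, and that two neighbouring peaks belong to the same molecule and differ by exactly one proton; the function names and the example mass are hypothetical and are not taken from any instrument software.

```python
PROTON = 1.007276  # mass of a proton in Da

def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge z of the peak at mz_high, assuming the peak at mz_low is the
    same molecule carrying one additional proton (charge z + 1)."""
    # From mz_high = (M + z*H)/z and mz_low = (M + (z+1)*H)/(z+1):
    #   z = (mz_low - H) / (mz_high - mz_low)
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral molecular mass M from an [M + zH]z+ peak."""
    return z * (mz - PROTON)

def deconvolute(peaks):
    """Average the neutral mass estimated from each adjacent peak pair."""
    peaks = sorted(peaks, reverse=True)  # highest m/z = lowest charge first
    estimates = []
    for mz_high, mz_low in zip(peaks, peaks[1:]):
        z = charge_from_adjacent_peaks(mz_high, mz_low)
        estimates.append(neutral_mass(mz_high, z))
    return sum(estimates) / len(estimates)

# Hypothetical protein of mass 16951.5 Da observed in charge states 10+ to 15+.
true_mass = 16951.5
peaks = [(true_mass + z * PROTON) / z for z in range(10, 16)]
print(round(deconvolute(peaks), 3))
```

Averaging over several charge states is what makes the mass estimate for a pure compound more precise than a single peak would allow; for real spectra the peak assignment itself is the hard part and is not addressed here.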

ESI is a very soft ionization technique in which very few fragments are

observed. Therefore, it is very suitable for biomolecule analysis. The special

advantages of ESI include: (1) high reproducibility: no crystallization process is

involved. (2) High flexibility in coupling to different types of mass spectrometer: because

multiply charged ions are favored, biomolecular ions appear at lower

m/z, so the electrospray ionization source can be fitted to ion-trap,

quadrupole, Fourier-transform ion cyclotron resonance (FTICR) and TOF mass

spectrometers.

The major disadvantages are:

(1) Complex spectra due to peaks from multiply charged ions.

(2) Large sample quantity: this disadvantage more or less disappears after the

introduction of nanospray (Chatman et al., 1999).

In addition, ESI cannot be used for molecular imaging.

Quantitative proteomics

Up to now, the bottom-up approach has proved quite efficient in achieving

protein identification. However, quantitative measurement of the different proteins

in a sample is equally important, if not more so. There are two different approaches

to quantitative proteomics. One relies on labeling methodology. The

other is based strictly on mass spectra without labeling.

Label-free approach

Most proteomics researchers would prefer a label-free approach to quantitative

proteomics if it were reliable and trustworthy. Nevertheless, the entire process of proteomic

analysis is quite complex. It is very difficult to assure quality control at every

purification and analytical step across different laboratories. Indeed, it is not even easy to

assure data quality within the same laboratory. For example, sample collection may be

a big concern: it is known that degradation of proteins/peptides can occur if precautions

are not taken. The ionization efficiency of a given protein can differ considerably from one

environment to another in both MALDI and ESI ionization processes; it is well known

that ionization depends strongly on the acidity of the sample. When on-line HPLC-MS

analysis is used, the quantity of protein/peptide being analyzed depends strongly

on the time at which it elutes for MS analysis. For example, the signal in a mass

spectrum acquired at the apex of an HPLC peak can be significantly higher than that in a

spectrum acquired near the base of the peak. All these complications are very difficult

to eliminate completely.

For label-free LC-MS experiments to achieve quantitative determination,

peptide ion intensity counting and spectral counting have been used extensively. For

peptide ion counting, the peak height or the area of a peak at a selected mass-to-charge

ratio is obtained by counting the number of ions. Some commercial

instruments such as Thermo's LTQ-FT-ICR are set up for spectral counting.
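
The two label-free measures can be sketched in a few lines. The example assumes that an extracted ion chromatogram is available as paired lists of retention times and intensities, and that spectral counting means counting the MS/MS spectra assigned to each protein; the function names and the toy data are illustrative only.

```python
from collections import Counter

def peak_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram.

    Integrating over the whole elution profile avoids the bias of reading
    the intensity from a single spectrum at the apex or base of the peak.
    """
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area

def spectral_counts(identifications):
    """Count identified MS/MS spectra per protein.

    identifications: iterable of (spectrum_id, protein_name) pairs.
    """
    return Counter(protein for _, protein in identifications)

# Toy elution profile (minutes, arbitrary intensity units).
print(peak_area([0, 1, 2, 3, 4], [0, 10, 30, 10, 0]))

# Toy identification list: three spectra, two proteins.
print(spectral_counts([(1, "protA"), (2, "protA"), (3, "protB")]))
```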

Labeling approach

Four major methods are briefly introduced below: (1)

isotope-coded affinity tag (ICAT), (2) isobaric tags for relative and absolute

quantitation (iTRAQ), (3) stable isotope labeling with amino acids in cell culture

(SILAC) and (4) H2(18)O labeling.

[Figure 3 schematic not reproduced. Panels: Chemical modification; Metabolic labeling. Workflow stages: Cells or tissue; Purification or fractionation; Protein; Peptides; MS sample.]

Stable isotope labels are incorporated at various stages through the different experimental workflows, allowing protein populations to be mixed (denoted by horizontal lines between boxes). Newer methods allow multiplexing of samples, which is advantageous for comparing up to four sample states (iTRAQ) or three states (triple encoding SILAC) in a single MS analysis. Chemical modification methods label extracted proteins or peptides, and despite taking care to process samples in parallel, losses occurring during any handling steps cannot be accounted for (dotted boxes). This is a potential source of quantitative error. In metabolic labeling, samples can be mixed as intact cells and processed together throughout the sample processing workflow. Therefore, sample losses at a particular step do not affect quantitative accuracy.

Stable isotope labeling with amino acids in cell culture (SILAC)

SILAC was developed by Mann and co-workers to detect proteomic

differences between two cell samples (Ong and Mann, 2007). One of the cultured cell

populations is fed with normal amino acids. In contrast, the second cell population is fed

with amino acids labeled with stable (non-radioactive) isotopes. For example, the

medium can contain arginine labeled with six 13C atoms instead of the normal 12C. As

the cells grow, they incorporate the heavy arginine into all of their proteins.

Therefore, every peptide containing one heavy arginine is heavier than its normal

counterpart by 6 Da. The proteins from both cell populations can be combined and

analyzed together by mass spectrometry. Pairs of chemically identical peptides of

different stable isotope composition can be clearly distinguished by their mass

difference. There must be a difference of at least 4 Da between two differently

isotopically labeled amino acids. The ratio of peak intensities for such peptide pairs

accurately reflects the abundance ratio of the corresponding protein in the two populations. Presently, SILAC has

become a powerful tool to study cell signaling (Ong et al., 2002; Ong et al., 2003;

Ong and Mann, 2006; Ong and Mann, 2007). Cell culture is, however, required for

SILAC. The disadvantages of SILAC include: (1) the culture process is

time-consuming, (2) some samples cannot be obtained through culture processes and (3)

some cell types cannot accommodate certain amino acids. However, this technique

worked well with our system.
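
As a rough numerical illustration of a SILAC pair, the sketch below computes where the heavy partner of a light peptide should appear and the heavy/light intensity ratio. It assumes arginine labeled with six 13C atoms, as in the example above, and complete label incorporation; the function names and input values are hypothetical.

```python
C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C in Da

def heavy_mz(light_mz, charge, n_arg=1, labels_per_arg=6):
    """Expected m/z of the heavy partner of a light peptide ion.

    Each heavy arginine adds labels_per_arg * 1.003355 Da (about 6 Da for
    13C6-arginine); on the m/z axis the shift is divided by the charge.
    """
    mass_shift = n_arg * labels_per_arg * C13_MINUS_C12
    return light_mz + mass_shift / charge

def silac_ratio(light_intensity, heavy_intensity):
    """Heavy/light abundance ratio from the intensities of a peptide pair."""
    return heavy_intensity / light_intensity

# A doubly charged light peptide at m/z 500.0 with one arginine:
print(round(heavy_mz(500.0, charge=2), 4))
# Heavy peak twice as intense as the light peak:
print(silac_ratio(1.0e6, 2.0e6))
```

For a doubly charged ion the pair is separated by only about 3 m/z units, which is one practical reason for requiring a mass difference of several Da between the labeled and unlabeled forms.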

Stable isotope encoding changes the physicochemical properties of the peptides by

the least possible amount. Protein/peptide samples that are labeled with stable

isotopes have shifted m/z values when compared to their natural, non-isotope-labeled

counterparts but are otherwise identical in all respects (Zhang and Regnier,

2002). Thus, stable isotopes such as 13C, 15N, and 18O do not induce shifts in HPLC

retention times. Therefore labeled and non-labeled peptides show up as pairs in mass

spectra (Ong et al., 2002). Their relative intensities can be directly visualized.

Deuterated peptides, by contrast, shift slightly in retention time from their non-deuterated counterparts (Zhang and

Regnier, 2002). However, this problem can be corrected if quantitation is based on

the entire HPLC elution profile instead of on a few single observations.

Generally, there are three ways to label proteins or peptides with stable isotopes

(Figure 3). Metabolic labeling supplies stable isotopes during the growth and

development of cells (Bantscheff et al., 2007; Ong and Mann, 2005) and organisms

(Krijgsveld et al., 2003; Oda et al., 1999). Chemical labeling modifies certain amino

acid side chains with natural or isotope-labeled reagents (Gygi et al., 1999; Ross et al.,

2004). Enzymatic labeling uses trypsin or Glu-C catalyzed incorporation of 18O

during protein digestion (Reynolds et al., 2002; Yao et al., 2001). To distinguish the

labeled and non-labeled forms of peptides by MS, it is recommended to generate a

minimum of 4 Da difference in the peptide pairs.

Posttranslational modification study: Phosphoproteomics

Since its first characterization on glycogen phosphorylase in 1955, protein

phosphorylation has been recognized as a central mechanism for cell regulation and

signaling. It is estimated that one-third of eukaryotic proteins are phosphorylated, a

result of carefully regulated protein kinase and phosphatase activities (Cohen, 2002).

Protein phosphorylation events are detected by an increase in amino-acid residue mass

of +80 Da, which reports the addition of HPO3. Sites of phosphorylation can be

identified from mass shifts in fragment ions generated by gas-phase fragmentation

(MS/MS) of phosphopeptides.
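
How a mass shift in fragment ions points to the modified residue can be sketched with singly charged b ions. The example uses standard monoisotopic residue masses and a made-up peptide; the function names are illustrative, and real site localization also uses y ions and has to cope with missing or ambiguous fragments.

```python
PROTON = 1.007276  # Da
HPO3 = 79.96633    # Da, monoisotopic mass added by one phosphate group

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def b_ions(peptide, phospho_site=None):
    """Singly charged b-ion m/z values b1 .. b(n-1).

    phospho_site is the 0-based index of the phosphorylated residue,
    or None for the unmodified peptide.
    """
    ions, total = [], PROTON
    for i, aa in enumerate(peptide[:-1]):
        total += RESIDUE[aa]
        if i == phospho_site:
            total += HPO3
        ions.append(total)
    return ions

def first_shifted_b_ion(peptide, phospho_site):
    """Number n of the first b ion that carries the +80 Da shift.

    Returns None if no b ion is shifted (site on the C-terminal residue).
    """
    plain = b_ions(peptide)
    modified = b_ions(peptide, phospho_site)
    for n, (p, m) in enumerate(zip(plain, modified), start=1):
        if abs((m - p) - HPO3) < 1e-6:
            return n
    return None

# Hypothetical peptide AGSLK phosphorylated on the serine (index 2):
# b1 and b2 are unchanged, b3 and b4 are shifted by +79.966 Da.
print(first_shifted_b_ion("AGSLK", 2))
```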

PTM ion signatures can be monitored using MS or MS/MS scanning methods

tailored to specific gas-phase reactivities. For example, peptides containing

phosphotyrosine can often be detected by a characteristic fragment ion of 216 Da,

formed by peptide bond cleavages on both sides of the phosphotyrosine residue. In

addition, peptides containing phosphorylated serine and threonine often undergo

cleavage of the phosphoester bond and loss of H3PO4 as a neutral species ('neutral

loss'), yielding a product with mass lowered by 98 Da (-18 Da from the

unphosphorylated species). Neutral loss often reduces additional peptide

fragmentation and so increases the difficulty of matching peptide sequences to the

MS/MS spectrum. An 'MS3' scanning method can be used in this situation, wherein

the neutral loss product ion is isolated for an extra fragmentation step. This generates

an MS/MS/MS spectrum where the phosphorylated serine or threonine residue is

replaced by a dehydrated form (-18 Da) (Beausoleil et al., 2004). A related 'multistage

activation' strategy fragments the neutral loss product and the parent ion

simultaneously, generating a hybrid spectrum combining both MS/MS and

MS/MS/MS fragmentation products (Olsen and Mann, 2004).
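
The arithmetic behind the neutral loss signature can be checked with a small sketch. It uses monoisotopic masses and assumes the charge of the precursor is retained on the product ion; the function name is hypothetical.

```python
H3PO4 = 97.976897  # Da, phosphoric acid lost as a neutral species
HPO3 = 79.966332   # Da, mass added by phosphorylation
H2O = 18.010565    # Da

def neutral_loss_mz(precursor_mz, charge):
    """m/z of the product ion after loss of neutral H3PO4.

    The ion keeps its charge, so the shift on the m/z axis is 98/charge.
    """
    return precursor_mz - H3PO4 / charge

# Losing H3PO4 from a phosphopeptide (unmodified mass + HPO3) leaves the
# unmodified mass minus water, i.e. the dehydrated form mentioned above.
print(round(H3PO4 - HPO3, 4))

# A doubly charged phosphopeptide at m/z 600.0 gives a product near 551.01.
print(round(neutral_loss_mz(600.0, 2), 4))
```

An MS3 method that triggers on this signature would look for a dominant product ion at this position before isolating it for the extra fragmentation step.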

Typically, MS/MS is performed using low-energy collisionally activated

dissociation (CAD) in positive ion mode, in which ions commonly acquire positive

charge by addition of protons. CAD of peptides mainly occurs by nucleophilic

reactions; therefore sites of cleavage are strongly influenced by peptide sequences and

the distribution of protons across backbone and side-chain atoms. Other

fragmentation methods that are becoming popular for PTM identification are

electron capture dissociation (ECD) and electron transfer dissociation (ETD). These

achieve fragmentation through peptide interactions with low-energy electrons (ECD)

or radical anions (ETD), forming peptide radicals that rapidly undergo backbone

cleavage (Stensballe et al., 2000; Syka et al., 2004). ETD and ECD have advantages

over CAD for detecting phosphorylation and other PTMs unstable to MS/MS,

because peptide fragmentation is less influenced by peptide sequence, and neutral loss

reactions are reduced. ECD and ETD are complementary to CAD, however, since

they perform optimally with highly charged analytes (charge state ≥ +3) whereas

CAD is more efficient with ions of lower charge (Zubarev et al., 2000). MS of

negatively charged ions, most commonly formed by proton removal during

ionization, can be more sensitive than positive-mode MS for detecting

phosphopeptides (Carr et al., 1996). In general, negative-mode MS/MS spectra are

difficult to decipher and have not been extensively investigated. However, negative-mode

MS/MS of phosphorylated serine, threonine and tyrosine residues yields

fragments of -79 Da (PO3-) or -63 Da (PO2-). A very sensitive method involves

selective monitoring of phosphopeptide parent ions in negative mode based on their

-79 Da ion signature, followed by polarity switching to obtain positive-ion MS/MS

spectra (Carr et al., 1996).

There is also an increasing need for improved technologies that enable quick

and routine assay of known PTM chemistries. New mass spectrometry technology

development will play the most critical role. Novel approaches that achieve better

ionization, better resolution and better dynamic range than present MALDI and ESI

deserve a great effort from the mass spectrometry community. Interdisciplinary

collaboration is definitely needed for the next quantum leap in biomics.

Strong Cation Exchange (SCX)

To enrich low-abundance phosphorylated proteins or peptides, various methods

have been developed (Beausoleil et al., 2004). SCX is a low-resolution but robust

enrichment method. The principle of using SCX in phosphopeptide analysis is based

on the reduced positive charge of phosphorylated peptides. Most tryptic peptides

carry one positive charge at each peptide terminus at pH 2.7, as specified in the SCX

buffer (NH3+ from the N-terminal amino group and the positively charged side

chain of the C-terminal arginine or lysine). The negatively charged phosphate group can counter the

positive charges, effectively reducing the charge state by one, and therefore decreases

binding to the SCX column. Generally, multiply phosphorylated peptides bind to

the column with minimum affinity, while non-phosphorylated peptides bind to the

column strongly. However, acidic amino acids (glutamic acid and aspartic acid) can

interfere with this strategy. Gygi and coworkers demonstrated large-scale

identification of 2001 phosphopeptides using SCX fractionation (Ballif et al., 2004).
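
The charge bookkeeping behind SCX enrichment can be written down as a very rough estimate. The sketch assumes that at pH 2.7 the N-terminus and every lysine, arginine and histidine carry one positive charge (counting histidine is an addition to the description above), that each phosphate contributes one negative charge, and that carboxyl groups are treated as neutral; the function name is hypothetical and the model ignores the interference from acidic residues noted above.

```python
def scx_solution_charge(peptide, n_phospho=0):
    """Approximate net charge of a peptide at pH ~2.7.

    +1 for the N-terminal amino group, +1 per K, R or H residue,
    -1 per phosphate group. Carboxyl groups are treated as neutral.
    """
    basic = sum(peptide.count(aa) for aa in "KRH")
    return 1 + basic - n_phospho

# A typical tryptic peptide: +2 unmodified, +1 when singly phosphorylated,
# so the phosphorylated form binds the SCX column less strongly.
print(scx_solution_charge("AGSLK"))
print(scx_solution_charge("AGSLK", n_phospho=1))
```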

Methods and tools for data integration and visualization

Mass spectrometry-based proteomic surveys and other high-throughput approaches

(ranging from genomics to combinatorial chemistry) are becoming increasingly

important in biology and chemistry. As a result, we need to develop our ability to

"see" the information in the massive tables of quantitative measurements that these

approaches produce. A system of cluster analysis (a form of hierarchical clustering)

for proteome-wide expression data from high throughput peptide mass fingerprinting

uses standard statistical algorithms to arrange proteins according to similarity in

pattern of expression.
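
A minimal sketch of such a clustering is given below. It assumes average-linkage agglomerative clustering with a Pearson correlation distance, which is one common choice rather than necessarily the algorithm of any particular package; the protein names and profiles are invented, and a profile with constant values would need special handling because its correlation is undefined.

```python
from math import sqrt

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical patterns, 2 for opposite."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering of expression profiles.

    profiles: dict mapping protein name -> list of expression values.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        d = [pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(d) / len(d)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Two rising and two falling hypothetical expression profiles.
profiles = {
    "P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8.5],
    "P3": [4, 3, 2, 1], "P4": [8, 6, 4.5, 2],
}
print(hierarchical_clusters(profiles, 2))
```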

The rapid increase in experimentally identified binary interactions between

proteins has brought us to a stage where we are now able to start viewing how these

interactions and components come together to form large functional regulatory

networks (Ma'ayan et al., 2005). Networks, formally graphs, are simple abstract

representations of biomolecular interactions where cellular components are

represented as nodes, and interactions connect these nodes through links. Several

commercial and academic initiatives have been attempting to address the need for

integration, consolidation, visualization, querying and organization of information

about binary mammalian protein-protein interactions and signaling pathways from

sparse sources. For example, Cytoscape (Shannon et al., 2003) is Java-based

desktop software for protein and gene network visualization. Several Cytoscape

plugins allow for analysis and integration of experimental data as well as

incorporation of Gene Ontology annotations (Maere et al., 2005). The Gene Ontology (GO)

project (Ashburner et al., 2000), initiated in the late 1990s, aims at capturing the

increasing knowledge on gene function in a controlled vocabulary applicable to all

organisms. GO consists of three hierarchically structured vocabularies that describe

gene products in terms of their associated biological processes, molecular functions

and cellular components. Biological Networks Gene Ontology tool (BiNGO) is a

plugin for Cytoscape. BiNGO assesses the overrepresentation of GO categories in a

subgraph of a biological network, or any other set of genes. The main advantage of

BiNGO over other tools is its interactive use on molecular interaction networks, e.g.

protein interaction networks or transcriptional coregulation networks, visualized in

Cytoscape. Furthermore, BiNGO offers great flexibility in the use of ontologies and

annotations. Besides the traditional GO ontologies, BiNGO also supports the use of

GOSlim ontologies, as well as custom ontologies and annotations. Finally, the

Cytoscape graphs produced by BiNGO can be viewed, laid out, modified and saved

in various manners. BiNGO assesses the functional themes that are present in a set

of genes. Even though a P-value gives a good indication of the prominence of a

certain functional category, it is risky to draw conclusions based solely on uncorrected P-values.

Because BiNGO tests the significance of all GO labels present in the test set, it applies one

of the most basic multiple testing corrections, the false discovery rate (FDR), i.e. the

expected proportion of false positives among the positively identified tests (Maere

et al., 2005).
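
The idea of an overrepresentation P-value followed by an FDR correction can be sketched as follows. The example assumes a hypergeometric test for each GO category and the Benjamini-Hochberg procedure for the correction; it is a generic illustration and not BiNGO's own code, and the function names and numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a test set
    of n genes drawn from N genes, of which K carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# 8 of 20 test-set genes carry a GO label found on 50 of 1000 genes overall.
p = hypergeom_pvalue(8, 20, 50, 1000)
print(p < 0.001)

# Correcting a small set of category P-values for multiple testing.
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
```

Categories would then be reported as overrepresented only if their adjusted P-value falls below the chosen FDR threshold.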
