
Upload: buihanh

Post on 08-Feb-2018


Page 1: IN SEARCH OF A BETTER INTERNAL RIBOSOME ENTRY · PDF fileCellular IRESes are potentially useful in gene therapy vector because they are endogenous part of the cells, and the activity


BIOINFORMATIC REPORT

IN SEARCH OF A BETTER INTERNAL RIBOSOME ENTRY SITE (IRES) FOR THE COEXPRESSION OF MULTIPLE GENES FOR GENE THERAPY

WONG EE TSIN

Department of Biochemistry, Faculty of Medicine, National University of Singapore, SINGAPORE

Computational tools are increasingly important in the day-to-day work of biologists. They range from DNA analysis tools (programs for DNA sequence editing, restriction enzyme analysis, alignment, contig analysis and so on) through primer analysis tools and protein motif and structure prediction programs to RNA analysis tools. Such tools have greatly reduced the time needed for data handling and have simplified the processing of enormous amounts of data and complex sequences.

DNA analysis tools are used extensively in my lab; the most commonly used are DNAStar and Clone Manager. I would like to take this opportunity to recommend Clone Manager to all molecular biologists who do extensive cloning and vector construction work. Its diagrammatic representation of the constructs, with all the restriction sites displayed for selection, allows vector construction strategies to be designed and tested in silico with greater ease and with better control over the choice of manipulative steps used in vector construction.

The correlation between the structure and the function of biomolecules is fundamental to understanding how things work in cells. Studying the structure and function of the internal ribosome entry site (IRES) is important for understanding the mode of gene expression of picornaviruses and of some cellular genes and, more importantly from a gene therapist's point of view, for learning how to optimize the IRES structure for efficient gene expression in cells. Hence, this report focuses mainly on IRES RNA structure analysis and motif identification.


CONTENT PAGE

INTRODUCTION

OBJECTIVES AND FINDINGS

Predict RNA folded structures by Turner’s nearest neighbour rules using RNA structure prediction software

Systematically collate the properties of the IRESes into a database

Identify common motif(s) in viral and cellular IRESes by comparison of their RNA structures

FUTURE GOALS

Identify RNA motif(s) on the IRES that are important for interactions with protein translation initiation factors

Model an IRES by combining the desirable motif(s) identified


INTRODUCTION

An Internal Ribosome Entry Site (IRES) is incorporated into a gene therapy vector for long-term expression and co-expression of genes. Simultaneous co-expression of several therapeutic genes within cells is desirable in therapy against complex diseases such as cancers and infectious diseases. Cancer arises from mutations in several genes, so it is necessary to target several of these defective genes by gene therapy in order to reverse the cancerous state of the cells. Conventional gene therapy vector designs co-express two genes either with a two-promoter system or by fusing the two genes; the former suffers from promoter attenuation and the latter from protein misfolding or mistargeting. Bicistronic gene therapy was developed to overcome these problems. In a bicistronic gene therapy vector, the therapeutic genes are under the control of a single upstream promoter and are linked by an IRES. Transcription from the upstream promoter produces a dicistronic mRNA (an mRNA containing the open reading frames of two genes) in which the upstream gene is separated from the downstream gene by the IRES.

An IRES is an RNA sequence required for efficient cap-independent translation of the downstream gene. To ensure long-term expression and co-expression of genes, a selectable marker is often placed downstream of the IRES so that, under selective conditions, transcription of the marker gene also ensures transcription of the upstream therapeutic gene. IRESes are found naturally in the 5’ untranslated regions of picornavirus genes and of some cellular genes. However, the translation efficiency of the IRES most commonly used in gene therapy vectors (the EMCV IRES) is not satisfactory. Cellular IRESes are potentially useful in gene therapy vectors because they are an endogenous part of the cell, and the activity of some of them can be regulated.

OBJECTIVES AND FINDINGS

The translation efficiency mediated by an IRES depends strongly on the stability of its folded RNA structure and on the recognition of that structure by cellular translation initiation factors. The conventional way to test the efficiency of an IRES is to clone it into a bicistronic vector containing two different reporter genes and to measure the expression of the reporter genes. This method is, however, tedious and time-consuming. Fortunately, the presence of conserved secondary and tertiary RNA folded structures and of a few primary sequence features allows us to predict the efficiency of an IRES. We hope to identify the conserved structures in the viral and cellular IRESes that are important for mediating efficient translation. Once these are identified, the IRES can be engineered to fit different gene therapy needs. The specific aims are defined below.

Predict RNA Folded Structures By Turner’s Nearest Neighbour Rules Using RNA Structure Prediction Software

A great number of programs are available on the internet for predicting the secondary and tertiary structures of RNA. For the prediction, display and export of the secondary structures of small RNA molecules (fewer than 3000 bases), I recommend Zuker’s MFOLD site. Its advantages are that it is available free on the internet, is fast and easy to use, allows cellular conditions to be simulated by changing several folding parameters (such as temperature and salt concentration) and generates several possible RNA foldings according to their thermodynamic stability. However, since it is only a prediction program, it cannot be used to manipulate the RNA structures. For more sophisticated manipulation and comparison of RNA structures, the Vienna RNA package, RNAdraw and ESSA may be used. The higher-end program ESSA provides functions such as structure prediction, manipulation, calculation and structure comparison for larger molecules. However, it is available only on UNIX workstations. Thus, programs ranging from simple prediction software to complex manipulation software are available to cater to specific needs.
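To make the idea of computational folding concrete, the sketch below folds a short RNA by maximizing the number of nested base pairs (the Nussinov algorithm). This is a deliberately simpler technique than the thermodynamic energy minimization with nearest-neighbour parameters that MFOLD performs, and it is not how MFOLD works internally. The function name, the `min_loop` parameter and the toy sequence are illustrative assumptions.

```python
# Assumption: a teaching sketch of base-pair maximization (Nussinov),
# NOT the nearest-neighbour free-energy minimization used by MFOLD.

ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}  # includes G-U wobble


def nussinov_fold(seq, min_loop=3):
    """Return (pair_count, dot_bracket) for one maximally paired nested fold.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in seq[i..j]

    # Fill the table for increasingly long subsequences.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    count, fold = nussinov_fold("GGGAAAUCCC")  # toy hairpin, not a real IRES
    print(count, fold)
```

Pair counting ignores stacking energies and loop penalties, so for real IRES sequences a thermodynamic program such as MFOLD or the Vienna RNA package remains the appropriate tool.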

Systematically Collate The Properties Of The IRESes Into A Database

The primary sequences, secondary structures and other unique features of 3 representative viral IRESes and 8 human IRESes were collated into a database. The structures and properties of the viral IRESes are better characterized than those of the cellular IRESes, about which not much is known. Hence, greater emphasis was placed on the human IRESes. RNA secondary structures were predicted using the MFOLD algorithm. The list of human IRESes is expected to expand as the importance of the IRES in the post-transcriptional regulation of gene expression, especially under stress conditions, becomes recognized.
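One way such a collection could be organized is sketched below. The record layout, the field names and the two toy entries are hypothetical and are not taken from the database described in this report; the sequences are placeholders, not real IRES sequences.

```python
# Assumption: an illustrative record layout only; field names are hypothetical.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class IresRecord:
    name: str
    origin: str               # "viral" or "cellular"
    sequence: str             # primary RNA sequence
    structure: str = ""       # predicted secondary structure (dot-bracket)
    features: dict = field(default_factory=dict)  # e.g. PPT positions

    def __post_init__(self):
        if self.origin not in ("viral", "cellular"):
            raise ValueError("origin must be 'viral' or 'cellular'")
        if self.structure and len(self.structure) != len(self.sequence):
            raise ValueError("structure and sequence lengths differ")


def by_origin(records, origin):
    """Select the records of one origin, e.g. all cellular IRESes."""
    return [r for r in records if r.origin == origin]


def to_json(records):
    """Serialize the collection so it can be stored or exchanged."""
    return json.dumps([asdict(r) for r in records], indent=2)


if __name__ == "__main__":
    # Placeholder entries with toy sequences.
    records = [
        IresRecord("toy_viral", "viral", "GGGAAAUCCC", "(((...)))."),
        IresRecord("toy_cellular", "cellular", "ACUUUCCGAUG",
                   features={"ppt": [2, 7]}),
    ]
    print([r.name for r in by_origin(records, "cellular")])
    print(to_json(records))
```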

Identify Common Motif(s) In Viral And Cellular IRESes By Comparison Of Their RNA Structures

Alignment of the primary sequences of the viral IRESes revealed some homology, especially among IRESes belonging to the same family (Table 1 and Figure 1). However, alignment with the human IRESes did not reveal any homologous regions (Table 1 and Figure 1). Primary features that are conserved in the viral IRESes, such as the polypyrimidine tract (PPT) and rRNA complementary sites, can also be found in some of the human IRESes. A PPT is a short continuous stretch of cytosine and uracil bases in the RNA sequence of the IRES (for example, the viral PPTs UUUUC and UUUC). The cellular IRES sequences were scanned for PPTs, which were found in the eIF4G, PDGF-2 and VEGF IRESes (Table 2). In the viral IRES, the distance between the PPT and the downstream translation start codon affects the efficiency and accuracy of translation initiation. The functional significance of the PPT in cellular IRESes is not known and could be tested through mutation studies. The significance of the rRNA complementary sites at the 3’ end of the IRES is more obvious, since they affect the recognition of the ribosome and the stability of the interaction with it during translation initiation. Special alignment programs may be needed to determine the extent of sequence complementarity between the 3’ end of the 18S rRNA and the 3’ end of the IRES, since base pairing in RNA does not always follow the Watson-Crick rule.
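The scan for pyrimidine tracts, the PPT-to-start-codon distance and a complementarity check that tolerates non-Watson-Crick G-U pairs can be sketched as follows. The function names, the minimum tract length of four bases and the toy sequence are assumptions made for illustration; they are not the settings used to produce Table 2.

```python
# Assumption: illustrative helpers; min_len=4 is an arbitrary threshold.
import re

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}  # Watson-Crick plus G-U wobble


def to_rna(seq):
    """Upper-case the sequence and read DNA-style T as U."""
    return seq.upper().replace("T", "U")


def find_ppts(seq, min_len=4):
    """Return (start, end, tract) for each run of C/U of at least min_len.

    Positions are 1-based and inclusive.
    """
    pattern = re.compile(r"[CU]{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(to_rna(seq))]


def distance_to_start(seq, tract_end, start_codon="AUG"):
    """Bases lying between a tract's last base and the next start codon.

    tract_end is the 1-based end position reported by find_ppts.
    Returns None when no start codon follows the tract.
    """
    idx = to_rna(seq).find(start_codon, tract_end)
    return None if idx < 0 else idx - tract_end


def can_pair(strand_a, strand_b):
    """True if two equal-length strands (both written 5' to 3') can pair
    antiparallel along their whole length, allowing G-U wobble pairs."""
    a, b = to_rna(strand_a), to_rna(strand_b)
    return len(a) == len(b) and all(
        (x, y) in PAIRS for x, y in zip(a, reversed(b)))


if __name__ == "__main__":
    toy = "GGACUUUCCGAAUGGC"  # toy sequence, not a real IRES
    for start, end, tract in find_ppts(toy):
        print(start, tract, end, distance_to_start(toy, end))
    print(can_pair("GGU", "GCC"))
```

A gapless, full-length check like `can_pair` is only a starting point; real rRNA complementarity searches would also need to allow mismatches and bulges.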

Page 5: IN SEARCH OF A BETTER INTERNAL RIBOSOME ENTRY · PDF fileCellular IRESes are potentially useful in gene therapy vector because they are endogenous part of the cells, and the activity

5

Several complex RNA folded structures were generated using MFOLD. A Y-type stem loop, formed by intrastrand base pairing between complementary bases along the RNA, is conserved among the viral IRESes. This Y-type stem loop can also be found in most of the cellular IRESes listed, so it is one of the RNA motifs that could be included when engineering an IRES. The motif may be the binding site for some essential translation initiation factors. Proteins that bind to viral IRESes have been identified by mobility shift assays and by UV-induced cross-linking of RNA to proteins; they include translation initiation factor 4G, polypyrimidine tract binding protein (PTB) and La. Proteins that bind to cellular IRESes are poorly characterized. Thus, the interaction of RNA-binding proteins with the IRES is best studied using the viral IRES as a model.
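A predicted structure can be screened for Y-shaped elements automatically. The sketch below works on dot-bracket notation and rests on an assumption that is mine rather than a definition from this report: a Y-type stem loop is treated as a stem whose closing base pair encloses exactly two further helices (a three-way junction). The function names and the toy structure are illustrative.

```python
# Assumption: "Y-type stem loop" is modelled here as a three-way junction,
# i.e. a base pair whose loop contains exactly two enclosed helices.

def pair_table(structure):
    """Map each paired position to its partner (0-based) from dot-bracket."""
    stack, partner = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            opener = stack.pop()
            partner[opener] = pos
            partner[pos] = opener
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partner


def y_junctions(structure):
    """Return the closing pairs (i, j) whose loop holds exactly two helices."""
    partner = pair_table(structure)
    found = []
    for i, j in sorted(partner.items()):
        if i > j:
            continue  # visit each pair once, from its 5' side
        branches = 0
        k = i + 1
        while k < j:
            if k in partner:        # start of an enclosed helix
                branches += 1
                k = partner[k] + 1  # jump past that helix
            else:
                k += 1
        if branches == 2:
            found.append((i, j))
    return found


if __name__ == "__main__":
    toy = "((..((...))..((...))..))"  # toy structure, not a real IRES fold
    print(y_junctions(toy))
```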

FUTURE GOALS

Identify RNA Motif(s) On The IRES That Are Important For Interactions With Protein Translation Initiation Factors

The three-dimensional structures of both the protein and the IRES must be available. A protein 3D structure may be determined by x-ray crystallography or predicted by homology modelling. The 3D structures of eIF4G (Figure 2) and other translation initiation factors have been determined by x-ray crystallographic analysis and are available in the Protein Data Bank (PDB). Few programs are available for predicting RNA 3D structure, and this may be the main obstacle to progress.

Model An IRES By Combining The Desirable Motif(s) Identified

Thus far, the features identified that may be important for IRES function are the PPT, the rRNA complementary sequence and the Y-type stem loop, all of which are located near the 3’ end of the IRES sequence. Tertiary interactions such as pseudo-loops are equally important for IRES function, so the next fruitful step will be to examine tertiary interactions in the RNA structure.

TABLE 1: SEQUENCE PAIR DISTANCE OF IRESES USING CLUSTAL METHOD OF MEGALIGN

(A) EMCV IRES WITH CELLULAR IRESES

Upper triangle: % similarity. Lower triangle: % divergence.

        1      2      3      4
1       -    27.1   32.1   27.1    1  BIP IRES
2     46.2     -    24.5   20.5    2  EIF4G IRES
3     29.0   41.8     -    22.1    3  EMCV IRES
4     46.4   48     45.6     -     4  C-MYC IRES


(B) PICORNAVIRAL IRESES

Upper triangle: % similarity. Lower triangle: % divergence.

        1     2     3     4     5     6     7     8     9
1       -   87.7  75.0  58.5  22.8  23.6  25.3  20.6  26.5   1  COX IRES
2     11.1    -   68.5  57.3  25.3  26.1  22.8  26.2  29.0   2  ECHO IRES
3     16.1  20.2    -   60.1  25.9  24.2  21.6  21.9  22.6   3  PV IRES
4     30.6  32.4  28.7    -   23.4  22.9  24.7  20.0  23.9   4  HRV IRES
5     65.1  63.8  66.0  66.0    -   90.4  48.1  36.1  25.8   5  EMCV IRES
6     65.6  64.9  66.4  66.5   6.4    -   48.4  37.6  23.2   6  MENGO IRES
7     66.2  65.6  68.4  66.7  33.3  33.6    -   31.9  25.2   7  TMEV IRES
8     65.8  65.2  69.9  67.1  53.7  52.0  54.2    -   20.0   8  FMDV IRES
9     51.4  52.7  49.7  58.4  71.4  69.8  73.2  68.6    -    9  HAV IRES

FIGURE 1: PHYLOGENETIC TREE OF VIRAL AND CELLULAR IRESES USING CLUSTAL METHOD IN MEGALIGN

Phylogenetic tree of cellular and EMCV IRES: BIP IRES, EMCV IRES, EIF4G IRES, C-MYC IRES

Phylogenetic tree of viral IRESes:
Cardio-Aphthoviruses: EMCV IRES, MENGO IRES, TMEV IRES, FMDV IRES
Entero-Rhinoviruses: COX IRES, ECHO IRES, PV IRES, HRV IRES
Hepatovirus: HAV IRES


FIGURE 2: X-RAY CRYSTAL STRUCTURE OF eIF4E/4G.

TABLE 2: PYRIMIDINE TRACTS FOUND IN THE IRESES

IRES: EMCV, HAV, PV, EIF4G, PDGF2, VEGF

PPT (start position, sequence, end position):
807 TTT TCC TTT 816
695 TTT TTC CT 702
719 TTT C 722
304 CTT TCT TTC CCC 315
623 CTT TCC 628
640 CCT TTC C 646
731 TTC CTT TTC CTC TT 744
169 TTT TTT CTT 177