applications of structural bioinformatics

Page 1: Applications of structural bioinformatics

2010/11/22

1

Applications of structural bioinformatics

徐唯哲 Paul Wei-Che HSU

Assistant Research Specialist

Bioinformatics Core, Institute of Molecular Biology, Academia Sinica, Taiwan, R.O.C.

RNA structure

• Primary structure of an RNA: a sequence of the bases A, G, C and U

• Hydrogen bonds allow the bases of an RNA to form base pairs
– Watson-Crick base pairs:
  • G≡C: held by three hydrogen bonds
  • A=U: held by two hydrogen bonds
– Wobble base pairs:
  • G−U: held by two hydrogen bonds, but weaker than a Watson-Crick pair

• Secondary structure of an RNA: the set of Watson-Crick and wobble base pairs formed when the RNA folds
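The pairing rules on this slide can be written down as a small lookup. This is only an illustrative sketch: the function name `classify_pair` is an assumption made for this example, and it checks base identity only, not geometry or energy.

```python
# Canonical RNA base pairs, stored as unordered sets so G-C equals C-G.
WATSON_CRICK = {frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU")}

def classify_pair(a, b):
    """Return 'Watson-Crick', 'wobble', or None for two RNA bases."""
    pair = frozenset((a.upper(), b.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return None  # the two bases do not form one of the pairs listed above

for a, b in [("G", "C"), ("A", "U"), ("U", "G"), ("A", "G")]:
    print(a, b, classify_pair(a, b))
```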

RNA secondary structure

• RNA base pairing: A-U, C-G, G-U

• Structural elements (figure labels): a. hairpin loop; b. internal loop; c. bulge loop; d. multibranched loop; e. stem; f. pseudoknot

Thermodynamic Calculations

ΔG = ΔH - TΔS

• ΔG: free energy of duplex formation
• ΔH: enthalpy
• ΔS: entropy
• T: temperature in K

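Minimum free energy (MFE)

• E(S): total free energy of a structure S, summed over its base pairs (r_i, r_j)

E(S) = Σ_{(r_i, r_j) ∈ S} e(r_i, r_j)

• The MFE structure is the one with the lowest total free energy

E = min_S E(S)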
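A minimal sketch of the two formulas above, assuming a made-up per-pair energy table. The values in `PAIR_ENERGY` are hypothetical and chosen only for illustration; real folding programs use stacking and loop energies and search all structures with dynamic programming rather than scoring a hand-written candidate list.

```python
# Hypothetical per-pair energies e(r_i, r_j) in kcal/mol (illustration only).
PAIR_ENERGY = {
    frozenset("GC"): -3.0,
    frozenset("AU"): -2.0,
    frozenset("GU"): -1.0,
}

def structure_energy(seq, pairs):
    """E(S): sum of e(r_i, r_j) over the base pairs (i, j) of S (0-based)."""
    return sum(PAIR_ENERGY[frozenset((seq[i], seq[j]))] for i, j in pairs)

def minimum_free_energy(seq, candidates):
    """Return the candidate structure with the lowest E(S)."""
    return min(candidates, key=lambda s: structure_energy(seq, s))

seq = "GGGAAAUCC"
candidates = [
    [],                        # unfolded
    [(0, 8), (1, 7)],          # two G-C pairs
    [(0, 8), (1, 7), (2, 6)],  # two G-C pairs plus one G-U wobble pair
]
best = minimum_free_energy(seq, candidates)
print(best, structure_energy(seq, best))  # [(0, 8), (1, 7), (2, 6)] -7.0
```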

Nearest-neighbor energy parameters (Δh in kcal/mol, Δs in cal/(K·mol))

                  Breslauer         SantaLucia        Sugimoto
                  Δh      Δs        Δh      Δs        Δh      Δs
AA/TT            -9.1   -24.0      -8.4   -23.6      -8.0   -21.9
AG/CT            -7.8   -20.8      -6.1   -16.1      -6.6   -16.4
AT/TA            -8.6   -23.9      -6.5   -18.8      -5.6   -15.2
AC/GT            -6.5   -17.3      -8.6   -23.0      -9.4   -25.5
GA/TC            -5.6   -13.5      -7.7   -20.3      -8.8   -23.5
GG/CC           -11.0   -26.6      -6.7   -15.6     -10.9   -28.4
GC/GC           -11.1   -26.7     -11.1   -28.4     -10.5   -26.4
TA/TA            -6.0   -16.9      -6.3   -18.5      -6.6   -18.4
TG/CA            -5.8   -12.9      -7.4   -19.3      -8.2   -21.0
CG/CG           -11.9   -27.8     -10.1   -25.5     -11.8   -29.0
nuc (GC% > 0)     0.0   -16.8       0.0    -5.9       0.6    -9.0
nuc (GC% = 0)     0.0   -20.1       0.0    -9.0       0.6    -9.0

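Example: the duplex GGAT/CCTA with the Breslauer parameters

GGAT
||||
CCTA

ΔH = Δh_int + (Δh_GG/CC + Δh_GA/TC + Δh_AT/TA)
   = 0 + (-11.0) + (-5.6) + (-8.6) = -25.2 (kcal/mol)

ΔS = Δs_int + (Δs_GG/CC + Δs_GA/TC + Δs_AT/TA)
   = (-16.8) + (-26.6) + (-13.5) + (-23.9) = -80.8 (cal/(K·mol))

ΔG at 25 °C = ΔH - TΔS
            = (-25.2) - (25 + 273) × (-0.0808) = -1.1 (kcal/mol)

Here Δh_int and Δs_int are the initiation terms, taken from the "nuc (GC% > 0)" row of the table.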
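The arithmetic of this example can be checked with a few lines of Python. This is a sketch under stated assumptions: the dictionary holds only the three Breslauer entries the example needs, keyed exactly as they are labeled in the table, and the function name `duplex_free_energy` is made up for this illustration.

```python
# Breslauer nearest-neighbor parameters copied from the table:
# (dH in kcal/mol, dS in cal/(K*mol)). Only the steps used by GGAT/CCTA.
BRESLAUER = {
    "GG/CC": (-11.0, -26.6),
    "GA/TC": (-5.6, -13.5),
    "AT/TA": (-8.6, -23.9),
}
NUC_GC = (0.0, -16.8)  # initiation term, the "nuc (GC% > 0)" row

def duplex_free_energy(steps, temp_c=25.0, params=BRESLAUER, nuc=NUC_GC):
    """Return (dH, dS, dG) for a duplex given its nearest-neighbor steps."""
    dh = nuc[0] + sum(params[s][0] for s in steps)
    ds = nuc[1] + sum(params[s][1] for s in steps)
    temp_k = temp_c + 273.0          # the slide uses 273, not 273.15
    dg = dh - temp_k * ds / 1000.0   # convert cal to kcal before combining
    return dh, ds, dg

dh, ds, dg = duplex_free_energy(["GG/CC", "GA/TC", "AT/TA"])
print(f"dH = {dh:.1f} kcal/mol, dS = {ds:.1f} cal/(K*mol), dG = {dg:.1f} kcal/mol")
# dH = -25.2 kcal/mol, dS = -80.8 cal/(K*mol), dG = -1.1 kcal/mol
```

Note the unit conversion: ΔS is tabulated in cal/(K·mol), so it is divided by 1000 before being combined with ΔH in kcal/mol.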

• Mfold
– http://mfold.rna.albany.edu/
– Folds many RNA/DNA sequences at once
– Folds RNA/DNA at different temperatures (between 0 °C and 100 °C)

M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-15 (2003)

Folding of a DNA sequence at different temperatures (figure panels: 37 °C, 60 °C, 90 °C)
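Why folding changes with temperature follows directly from ΔG = ΔH - TΔS. The sketch below reuses the ΔH and ΔS values of the GGAT/CCTA worked example from the earlier slide as an assumed input; it is not what Mfold computes internally, only an illustration of how the sign of ΔG flips as T rises.

```python
DH = -25.2  # kcal/mol, from the GGAT/CCTA worked example
DS = -80.8  # cal/(K*mol), from the same example

def delta_g(temp_c, dh=DH, ds=DS):
    """Free energy in kcal/mol at a temperature given in degrees Celsius."""
    return dh - (temp_c + 273.0) * ds / 1000.0

for t in (25, 37, 60, 90):
    dg = delta_g(t)
    state = "favorable" if dg < 0 else "unfavorable"
    print(f"{t:>3} C: dG = {dg:+.2f} kcal/mol ({state})")
```

Because ΔS is negative, the -TΔS term grows with temperature, so a duplex that is marginally stable at low temperature becomes unstable at high temperature.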

14 utilities in RNA Studio

• RNAz
– http://rna.tbi.univie.ac.at/cgi-bin/RNAz.cgi
– Predicts structural noncoding RNAs

Washietl S., Hofacker I.L., Stadler P.F. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. U.S.A. 102, 2454-2459 (Feb. 2005)

RNAz

• Predicts structurally conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments

• Can be used in genome-wide screens:
– Detects functional RNA structures, as found in noncoding RNAs and cis-acting regulatory elements of mRNAs

Step 1: File upload

Step 2: Analysis options

Step 3: Output options

Step 4: View results

The biogenesis of microRNAs

(Esquela-Kerscher and Slack, 2006)

RISC: RNA-induced silencing complex

Genes can be regulated by miRNAs

• microRNAs (miRNAs) take part in critical biological processes by repressing the translation of protein-coding genes.

• An earlier report found that more than one-third of the human genome is regulated by RNA (Life science news, 2005)

• Thousands of human genes are microRNA targets (Lewis et al., Cell, 2005; Selbach et al., Nature, 2008)

microRNA cluster

miRNA function (figure summary)

• mRNA degradation: common in plants
• Translation repression: common in animals
• Transcription repression (histone methylation turns active chromatin into silent chromatin): common in yeasts, plants, and possibly animals

Schematic of the structure of five human pri-miRNAs

(Cullen et al. Mol. Cell., 2004)

Category in miRNAMap (one label per pri-miRNA in the figure): intergenic, intronic, intergenic, intergenic, exonic

• miRBase
– http://www.mirbase.org/
– The microRNA database

Search result

Genomic location

Seed region of miRNA

• Perfect match to bases 2-8 from the 5' end of the miRNA

• Seed matches correlate with both mRNA degradation and translational repression (Selbach et al., Nature, 2008)

Figure: seed pairing between the miRNA and its target gene
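The seed rule can be sketched as a simple string search: take bases 2-8 of the miRNA, build the reverse complement, and look for it in the target. The function names and both input sequences are illustrative assumptions; the target in particular is a made-up toy sequence. Tools such as TargetScan or miRanda apply further criteria (conservation, site context, pairing energy) that are not modeled here.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna, start=2, end=8):
    """Sequence a target must contain to pair perfectly with miRNA bases start..end."""
    seed = mirna[start - 1:end]  # bases 2-8, 1-based inclusive
    return "".join(COMPLEMENT[b] for b in reversed(seed))

def find_seed_matches(mirna, target):
    """Return 0-based start positions of perfect seed matches in the target."""
    site = seed_site(mirna.upper().replace("T", "U"))
    target = target.upper().replace("T", "U")
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits

mirna = "UGAGGUAGUAGGUUGUAUAGUU"    # illustrative miRNA sequence, 5' to 3'
target = "AAGCUACCUCAUUGCUACCUCGG"  # toy target written 5' to 3', not a real UTR
print(seed_site(mirna))                  # CUACCUC
print(find_seed_matches(mirna, target))  # [3, 14]
```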

Tools for identifying miRNA targets

• miRNA.org (miRanda)
– http://www.microrna.org

• TargetScan
– http://www.targetscan.org/

• RNAhybrid
– http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/submission.html

• PicTar
– http://pictar.mdc-berlin.de/

Predicted miRNA targets - miRNA.org

View target View expression profile

Predicted miRNA targets - TargetScan

RNAhybrid

miRNA sequence (in FASTA format)

Target RNA (in FASTA format)

Predicted miRNA targets - PicTar

Known miRNA targets: TarBase

• TarBase: A comprehensive database of experimentally supported animal microRNA targets.
– Sethupathy, P. et al. (RNA, 12:192-197, 2006)

• The database provides a means of searching a comprehensive set of experimentally supported microRNA targets in at least 9 organisms.
– Number of miRNAs represented: 177
– Number of target genes: 995
– Number of target sites: 883
– http://www.diana.pcbi.upenn.edu/tarbase

Known miRNA targets: TarBase

• How can we obtain the promoter region of a miRNA gene?

Schematic of the structure of five human pri-miRNAs

(Cullen et al. Mol. Cell., 2004)

Category in miRNAMap (one label per pri-miRNA in the figure): intergenic, intronic, intergenic, intergenic, exonic

Few complete pri-miRNA sequences have been identified

• The positions of most transcription start sites (TSSs) of intergenic miRNAs are unknown

TSS ?

Experimental data

• FANTOM3: CAGE tags
– Cap-analysis gene expression (CAGE) tags
– (Carninci et al., Nat Genet, 2006)
– http://fantom3.gsc.riken.jp/

• DBTSS: DBTSS Solexa tags
– 5'-ends of the Solexa sequences of human cell lines (MCF7, HEK293)
– (Wakaguri et al., Nucleic Acids Res, 2008)
– http://dbtss.hgc.jp

• UCSC: expressed sequence tags (ESTs)
– EST positions in the human genome
– (Kuhn et al., Nucleic Acids Res, 2007)

Statistics of DBTSS

http://dbtss.hgc.jp

Retrieve promoter sequence

Comparative analysis of the promoters
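Once a TSS has been located from CAGE, DBTSS or EST evidence, retrieving a promoter sequence comes down to choosing a window around it. The sketch below shows the strand-aware coordinate arithmetic only. The function name and the window sizes (1000 bp upstream, 100 bp downstream) are assumptions for illustration, not values from these slides.

```python
def promoter_window(tss, strand, upstream=1000, downstream=100):
    """Return (start, end), 1-based inclusive, of a promoter window around a TSS.

    'Upstream' means lower coordinates on the + strand and
    higher coordinates on the - strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end  # clamp at the start of the chromosome

print(promoter_window(5000, "+"))  # (4000, 5100)
print(promoter_window(5000, "-"))  # (4900, 6000)
```

For a gene on the minus strand, the extracted sequence would also need to be reverse-complemented before motif searching.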

Search for TF Binding Site

Search result of TF binding site

Promoter prediction tools

• EP3
– Method: identification of structural features of DNA
– Species: eukaryotic genomes
– Features: DNA denaturation values, duplex-free energy, GC content
– Data source: UCSC, ENCODE
– Reference: Abeel et al., Genome Res, 2008
– http://bioinformatics.psb.ugent.be/webtools/ep3/

• NNPP 2.2
– Method: neural network
– Species: prokaryote/eukaryote
– Features: TATA box
– Data source: EPD
– Reference: Reese et al., Comput Chem, 2001

• Promoter 2.0
– Method: neural network
– Species: vertebrate
– Features: four TFBSs (TATA box, CCAAT box, GC box, Inr)
– Data source: EPD
– Reference: Knudsen, Bioinformatics, 1999
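GC content, one of the features listed for EP3, is easy to compute along a sequence. The sketch below is not EP3's algorithm; it only illustrates a sliding-window GC profile, and the function names, window size and step are assumptions chosen for this example.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def sliding_gc(seq, window=4, step=1):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

print(gc_content("GGAT"))                         # 0.5
print(sliding_gc("GGCCAATT", window=4, step=2))   # [(0, 1.0), (2, 0.5), (4, 0.0)]
```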

BDGP: Neural Network Promoter Prediction

• http://www.fruitfly.org/seq_tools/promoter.html

Promoter 2.0

• http://www.cbs.dtu.dk/services/Promoter/

Protein structure

(Adapted from a slide by P. Johansson, E. Jakobsson)

Drug discovery

Functional study

Protein structure determination
1. NMR
2. X-ray crystallography

Protein structure databases

• 1. PDB (Protein Data Bank):
– PDB contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies.
– http://www.rcsb.org/pdb/home/home.do

• 2. MMDB (Molecular Modeling Database):
– Data come from PDB, with value-added features such as explicit chemical graphs and computationally identified 3D domains that are used to identify similar 3D structures
– http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure

• 3. Pfam database (Protein Family Database)
– Pfam is a large collection of protein families. Proteins are generally composed of one or more functional regions, commonly termed domains.
– http://pfam.sanger.ac.uk/

Protein Data Bank (PDB)

• http://www.pdb.org/pdb/home/home.do
– Structure data determined by X-ray crystallography and NMR
– The data include atomic coordinates, references, sequence, secondary structure, disulfide bonds, etc.
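Atomic coordinates in a legacy PDB-format file are stored in fixed-width ATOM records, so they can be read by slicing columns. The sketch below assumes the standard column layout of the PDB format; the function name is hypothetical and the sample record uses made-up coordinates, not data from a real entry.

```python
def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record of a legacy PDB-format file."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "atom_name": line[12:16].strip(), # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21],             # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }

# Illustrative record assembled field by field so the column widths are explicit.
sample = (
    "ATOM  "    # record name
    "    1"     # atom serial number
    " "
    " N  "      # atom name
    " "         # alternate location indicator
    "MET"       # residue name
    " "
    "A"         # chain identifier
    "   1"      # residue sequence number
    "    "      # insertion code and padding
    "  27.340"  # x
    "  24.430"  # y
    "   2.614"  # z
)
atom = parse_atom_line(sample)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

For real work, an established parser (for example the one in Biopython) handles the many edge cases this sketch ignores.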

Enzyme Classification Histogram

Amprenavir: a protease inhibitor used to treat HIV infection.

MMDB

Pfam

• Pfam is a collection of protein families and domains

• In Pfam, you can

– Look at multiple alignments

– View protein domain architectures

– Examine species distribution

– Follow links to other databases

– View known protein structures

Pfam-A: Families from Pfam

Pfam-B: A large number of small families taken from the ProDom database

Protein and domain families (Note: a single protein can belong to several Pfam families)

URL: http://pfam.sanger.ac.uk/

Keyword Search

apoptosis

Bcl-2 family

Alignment

HMM logo

Phylogenetic tree for Bcl-2 family

Structures

Domain organization

Other databases of structural classification of proteins

• 1. SCOP (Structural Classification of Proteins): folds, superfamilies, and families
– http://scop.mrc-lmb.cam.ac.uk/scop/

• 2. CATH (Classification by Class, Architecture, Topology & Homology)
– http://www.cathdb.info/

• 3. Dali: a network service for comparing protein structures in 3D
– DALI server http://ekhidna.biocenter.helsinki.fi/dali_server/index.html
– DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start

Applications of protein structure software

Software for Protein Structure Visualization

• PyMol http://www.pymol.org/
• Jmol http://jmol.sourceforge.net/
• RasMol http://www.umass.edu/microbio/rasmol/
• MolPOV http://www.chem.ufl.edu/~der/der_pov2.htm
• MolMol http://www.mol.biol.ethz.ch/wuthrich/software/molmol/
• Ribbons http://www.cmc.uab.edu/ribbons/
• MolScript http://www.avatar.se/molscript/
• WebLab ViewerLite and ViewerPro http://www.accelrys.com/about/msi.html
• Swiss-PDB Viewer http://www.expasy.ch/spdbv/
• XtalView http://www.scripps.edu/pub/dem-web/toc.html
• MolView and MolView Lite http://bilbo.bio.purdue.edu/~tom/

STRING

(‘Search Tool for the Retrieval of Interacting Genes/Proteins’)

Thanks for your attention