Databanks + new tools = new insights
Databanks + New tools = New insights
THE AXIOM
SADIC: Simple Atom Depth Index Calculator
protein fold barcoding: CATH-ADAPT…
protein folding
Birth of the Earth
Digging inside objects to discover their origins
SADIC: a new tool to analyze atom depth
* Chakravarty S, Varadarajan R. Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold Des. 1999 7:723-732.
* Pintar A, Carugo O, Pongor S. Atom depth as a descriptor of the protein interior. Biophys J. 2003 84:2553-2561.
atom depth calculated as the distance to:
the closest external water*
the closest dot of the water accessible surface*
the closest surface exposed atom*
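The three distance-based definitions above share one operation: take the minimum distance from an atom to a set of reference points (external waters, surface dots, or surface-exposed atoms). A minimal sketch follows, assuming the reference points have already been computed by some other tool; the function name and the toy coordinates are hypothetical and not taken from the cited papers.

```python
import math

def depth_to_nearest(atom, reference_points):
    """Distance (same units as the coordinates) from `atom` to the closest
    reference point, e.g. an external water oxygen or a surface-exposed atom."""
    return min(math.dist(atom, p) for p in reference_points)

# Toy coordinates in Å (illustrative only, not from any PDB entry).
buried_atom = (0.0, 0.0, 0.0)
external_waters = [(3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
toy_depth = depth_to_nearest(buried_atom, external_waters)
```

The choice of reference set is what distinguishes the three definitions; the minimisation itself is identical.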
atom depth, 2D (HEWL, PDB ID 4LZT)
Daniele Varrazzo, Andrea Bernini, Ottavia Spiga, Arianna Ciutti, Stefano Chiellini, Vincenzo Venditti, Luisa Bracci and Neri Niccolai. Three-dimensional Computation of Atomic Depth in Complex Molecular Structures. Bioinformatics 2005 21:2856-2860.
atom depth, 3D (HEWL, PDB ID 4LZT)
Calculation of exposed volumes
Depth index: $D_{i,r} = 2V_{i,r} / V_{0,r}$
where $V_{i,r}$ is the exposed volume of a sphere of radius $r$ centered on atom $i$ of the molecule and $V_{0,r}$ is the exposed volume of the same sphere when centered on an isolated atom
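The depth index definition can be illustrated with a rough numerical sketch. The code below is not the SADIC implementation: it assumes every atom is a hard sphere of the same radius (1.8 Å, an arbitrary choice), estimates exposed volumes by counting grid points inside the probe sphere that fall outside all atoms, and uses a toy cubic lattice instead of a protein. All function names are hypothetical.

```python
import math
from itertools import product

def exposed_fraction(center, atoms, r, atom_radius=1.8, n=12):
    """Fraction of the probe sphere (radius r, centred on `center`) that lies
    outside every atom sphere, estimated on an n x n x n grid."""
    step = 2.0 * r / n
    offsets = [-r + (k + 0.5) * step for k in range(n)]
    inside = exposed = 0
    for dx, dy, dz in product(offsets, repeat=3):
        if dx * dx + dy * dy + dz * dz > r * r:
            continue  # grid point is outside the probe sphere
        inside += 1
        p = (center[0] + dx, center[1] + dy, center[2] + dz)
        if all(math.dist(p, a) > atom_radius for a in atoms):
            exposed += 1
    return exposed / inside

def depth_index(i, atoms, r, atom_radius=1.8, n=12):
    """D_{i,r} = 2 V_{i,r} / V_{0,r}; the sphere volume cancels in the ratio,
    so exposed fractions are used in place of volumes. Requires r > atom_radius."""
    v_i = exposed_fraction(atoms[i], atoms, r, atom_radius, n)
    v_0 = exposed_fraction(atoms[i], [atoms[i]], r, atom_radius, n)
    return 2.0 * v_i / v_0

# Toy "molecule": a 5 x 5 x 5 cubic lattice with 3 Å spacing (not a real protein).
lattice = [(3.0 * x, 3.0 * y, 3.0 * z) for x, y, z in product(range(5), repeat=3)]
d_centre = depth_index(lattice.index((6.0, 6.0, 6.0)), lattice, r=4.0)
d_corner = depth_index(lattice.index((0.0, 0.0, 0.0)), lattice, r=4.0)
d_isolated = depth_index(0, [(0.0, 0.0, 0.0)], r=4.0)
```

By construction an isolated atom gets a depth index of 2, and the index decreases as more of the probe sphere is filled by the rest of the molecule, so the lattice centre scores lower than the lattice corner.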
the sphere radius $r$ should be the largest value for which $V_{i,r} = 0$ for the most buried atom
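The radius criterion can be sketched as a scan over candidate radii. In the sketch below, `min_exposed_volume` is a hypothetical callable that returns the exposed volume of the most buried atom at a given radius; the toy function used in the example is invented purely so that the code runs and is not derived from any structure.

```python
def choose_radius(candidate_radii, min_exposed_volume):
    """Largest candidate r for which the most buried atom still has zero
    exposed volume. `min_exposed_volume(r)` is supplied by the caller."""
    zero_radii = [r for r in candidate_radii if min_exposed_volume(r) == 0.0]
    if not zero_radii:
        raise ValueError("every candidate radius exposes the most buried atom")
    return max(zero_radii)

# Toy stand-in (an assumption): the most buried atom starts to "see" solvent
# once the probe sphere radius exceeds 9 Å.
def toy_min_exposed_volume(r):
    return max(0.0, r - 9.0)

chosen_r = choose_radius([4.0, 8.0, 9.0, 12.0, 16.0], toy_min_exposed_volume)
```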
Plot: $D_{i,r}$ (0.0 to 2.0) versus $r$ [Å] (4.0 to 24.0) for the α carbons of residues 47, 58 and 28
Thr 47 α carbon: $D_{i,9}$ = 1.59
Ile 58 α carbon: $D_{i,9}$ = 0.13
Trp 28 α carbon: $D_{i,9}$ = 0.03
atom depth: 3D vs 2D (HEWL, PDB ID 4LZT)
3D atom depth analysis (Di)
from PDB ID 1UBQ
http://www.sbl.unisi.it/prococoa/
SBL Bioinformatics Projects
Projects related to SADIC:
1. fold-dependent aa compositions of protein cores;
2. towards i-SADIC.
Projects unrelated to SADIC:
1. systematic analysis of PPI
Di analysis of protein atoms
defining structural layers in protein 3D structures
each structural layer includes atoms with similar Di values
fast and accurate analysis of aa content of structural layers
Ln   Di          color
L6   > 1.2       red
L5   1.0 - 1.2   orange
L4   0.8 - 1.0   yellow
L3   0.6 - 0.8   green
L2   0.4 - 0.6   blue
L1   0.2 - 0.4   indigo
L0   < 0.2       violet
3VTR (chitinolytic enzyme, 572 aa)
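The layer table maps directly onto a small lookup. The sketch below assumes that each shared boundary belongs to the upper layer (so Di = 0.4 falls in L2), except that Di = 1.2 stays in L5 because L6 is defined as strictly greater than 1.2; the slide does not state how boundaries are assigned, and the function name is hypothetical.

```python
# (upper bound, layer, colour) for layers whose upper bound is exclusive.
LAYER_BOUNDS = [
    (0.2, "L0", "violet"),
    (0.4, "L1", "indigo"),
    (0.6, "L2", "blue"),
    (0.8, "L3", "green"),
    (1.0, "L4", "yellow"),
]

def layer_of(di):
    """Return (layer, colour) for a depth index value Di."""
    for upper, layer, colour in LAYER_BOUNDS:
        if di < upper:
            return layer, colour
    if di <= 1.2:
        return "L5", "orange"
    return "L6", "red"

example_layers = [layer_of(d) for d in (0.03, 0.13, 1.59)]
```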
Di analysis of protein atoms (3D atom depth analysis, from PDB ID 1UBQ), with Dimax marked for each residue
K63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29
E24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE1 1.24, OE2 1.24
L43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD1 0.02, CD2 0.00
Dimax analysis of protein residues
defining aa occupancy in protein structural layers
each structural layer includes residues with similar Dimax values
fast and accurate analysis of aa distribution in protein structures
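A per-residue Dimax can be sketched from the per-atom Di values listed for K63, E24 and L43 of 1UBQ. The code assumes that Dimax is the largest Di among the atoms of a residue, which is a reading of the name rather than something the slides state explicitly, and it applies the "core aa if Dimax < 0.2" rule given later in the deck. Function names are hypothetical.

```python
# Per-atom Di values for three ubiquitin residues, copied from the slide.
UBIQUITIN_DI = {
    "K63": {"N": 0.19, "CA": 0.30, "C": 0.25, "O": 0.23, "CB": 0.50,
            "CG": 0.68, "CD": 0.91, "CE": 1.11, "NZ": 1.29},
    "E24": {"N": 0.38, "CA": 0.52, "C": 0.50, "O": 0.52, "CB": 0.76,
            "CG": 0.95, "CD": 1.17, "OE1": 1.24, "OE2": 1.24},
    "L43": {"N": 0.10, "CA": 0.05, "C": 0.11, "O": 0.18, "CB": 0.02,
            "CG": 0.02, "CD1": 0.02, "CD2": 0.00},
}

def dimax(atom_di):
    """Assumed definition: the maximum Di over the atoms of one residue."""
    return max(atom_di.values())

def is_core(atom_di, threshold=0.2):
    """Core (L0) residue if Dimax is below the threshold used in the slides."""
    return dimax(atom_di) < threshold

residue_dimax = {name: dimax(atoms) for name, atoms in UBIQUITIN_DI.items()}
core_residues = [name for name, atoms in UBIQUITIN_DI.items() if is_core(atoms)]
```

Under this reading only L43 is a core residue: every one of its atoms, not just its backbone, lies below the L0 threshold.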
Dimax analysis of protein singles
quite a few proteins like to stay single (at least in the crystalline state)
Bioinformatiha 2, Florence, 18 October
a database of protein singles
Experimental Method: X-RAY (79,770)
Chain Type: Protein (74,456)
Only 1 chain in asym. unit (28,803)
Oligomeric state: 1 (21,193)
Number of Entities: 1 (3,517)
Homologue Removal @ 95% identity (2,410)
2,410 proteins in the dataset
4,657,574 atoms; 589,383 residues
DOOPS: a database of protein singles (2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues)
Swiss-Prot: 540,958 proteins in the dataset (192 Maa)
calculation of % amino acid content in L0
the first quantitative analysis of a large array of protein cores!
aa             % in L0
Alanine        11.51
Cysteine        2.63
Aspartate       1.77
Glutamate       1.2
Phenylalanine   6.36
Glycine        10.81
Histidine       1.32
Isoleucine     11.74
Lysine          0.58
Leucine        16.27
Methionine      2.49
Asparagine      1.7
Proline         2.45
Glutamine       1.21
Arginine        0.83
Serine          4.85
Threonine       4.65
Valine         13.7
Tryptophan      1.43
Tyrosine        2.5
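The percentages in the L0 table are simple frequency counts over all residues classified as core. A minimal sketch, assuming the core residues are available as a flat list of three-letter codes; the function name is hypothetical and the four-residue input is a toy example, not DOOPS data.

```python
from collections import Counter

def l0_composition(core_residues):
    """Percentage of each amino acid type among the residues assigned to L0."""
    counts = Counter(core_residues)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}

# Toy input (illustrative only).
toy_composition = l0_composition(["LEU", "LEU", "VAL", "ALA"])
```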
Dimax analysis of protein cores
DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues
core aa if Dimax < 0.2
Σ DOOPS aa(L0) = 106,088 (from 2,410 proteins)
~20 % of total molecular volume
Class                Architectures   Topology   Homologous superfamily   Domains
1 (mainly α)                     5        386                      875    37,038
2 (mainly β)                    20        229                      520    43,881
3 (α & β)                       14        594                     1113    90,029
4 (few sec. str.)                1        104                      118     2,588
Total                           40       1313                     2626   173,536
Di analysis of protein cores
folding clues from aa core composition?
Architecture   Proteins (mono)
1.10           213 (84)
1.20           84 (40)
1.25           19 (17)
1.50           10 (3)
2.10           17 (13)
2.30           57 (37)
2.40           94 (73)
2.60           134 (110)
2.80           12 (12)
3.10           84 (73)
3.20           52 (44)
3.30           139 (106)
3.40           218 (203)
3.60           10 (8)
3.90           49 (49)
total          1,190 (872)
DOOPS + CATH: selected Architectures with ≥ 10 PDB files (plot axis: # domain)
Towards protein folding barcodes
Legend: aa % average value (av), av + σ, av + 2σ, av - σ, av - 2σ
Cys: ribbon, PDB ID 1UZK (A01)
Leu, Phe: trefoil, PDB ID 1RG8 (A00)
Val: four-layer sandwich, PDB ID 2IMH (A01)
% L0       1.10     1.20     1.25     1.50     2.10     2.30     2.40     2.60     2.80     3.10     3.20     3.30     3.40     3.60     3.90  overall
ALA       13.28    10.32    21.46    12.74     9.26    10.05     8.43     9.32      5.5    10.69    10.08    12.58    11.88    14.95    12.01    11.51
ARG         0.6     1.28     0.24     1.39        0     0.64     1.72     0.75        0     0.55     1.11     1.75      0.3     0.47     0.95     0.83
ASN        0.67     2.62     0.73     2.77     1.85     2.04     1.77     1.36        0      2.1      2.9     0.96     1.52      2.8      2.1     1.70
ASP        1.61     2.62     0.24     2.91     1.23     1.27     2.03     1.79        0      2.1      2.9     3.02     1.77     2.34     0.95     1.77
CYS        3.35     2.99     5.37     0.83    22.84     2.04     1.46     4.42     0.92     2.83      2.1     1.49     1.86      1.4     3.05     2.63
GLN         0.6      1.5     0.24     1.11     1.23     1.15     1.81     1.69        0     0.46     1.56     2.15     0.99      1.4     1.33     1.21
GLU        1.48     1.44     0.73     1.52        0     1.15     1.19     1.04        0     0.91     2.59     2.41     1.08     0.93     0.67     1.20
GLY        8.05     8.72     9.76    13.85    16.05     9.92     16.2    10.82     9.17     8.78    11.81    11.35    12.64    13.08     9.91    10.81
HIS        1.01      1.6     2.44     1.11     0.62     0.76     0.79     0.56        0     2.65     1.96     3.02     1.91     0.47     2.48     1.32
ILE       12.68     9.95    10.73     8.59     6.79    13.61    10.68    10.78    13.76     12.8    11.77    12.53    11.53     7.01    11.34    11.74
LEU       23.88    18.34    22.44    11.77     8.02    17.18    12.97    13.98    33.94    16.54     11.9    14.33    14.22    15.42    13.63    16.27
LYS        0.67     0.91        0     1.11        0     0.38     0.49     0.56        0     0.09     0.62     1.36     0.55        0     0.67     0.58
MET        2.62     4.17     1.71     4.99        0      2.8     2.65     3.15     1.83     2.93     2.76     2.41     2.39     3.27     1.91     2.49
PHE        6.44     6.79     2.93     4.57     4.32     7.12     7.06     6.73     15.6     7.22     4.95     6.18     6.07     4.21     6.01     6.36
PRO        1.34     2.46     3.41     2.63     3.09     3.31        3     2.78        0     3.29      2.9     1.84     2.25      1.4     1.81     2.45
SER        3.49     4.55     3.66     5.96     3.09     5.34     5.56     5.13     2.75     2.83     5.35     4.43     4.23     6.07     5.34     4.85
THR        2.28     4.81     4.15      7.2     5.56     3.31     5.12     4.47     0.92      3.2     5.22     4.25     4.94     5.14     5.91     4.65
TRP        1.01     1.55        0     2.77      3.7     0.38     1.63     2.78     2.75     2.19     1.52     0.66     1.26     0.47      2.1     1.43
TYR        2.62     3.69     0.24     4.57     2.47     1.27     2.69     4.38     0.92     3.29     3.12     1.58     2.32        0     2.29     2.50
VAL       12.34     9.68     9.51     7.62     9.88    16.28    12.75    13.51    11.93    14.53    12.88     11.7    16.29    19.16    15.54     13.7
# PDB   213(84)   84(40)   19(17)    10(3)   17(13)   57(37)   94(73) 134(110)   12(12)   84(73)   52(44) 139(106) 218(203)    10(8)   49(49)    2,410
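One possible reading of the av ± σ and av ± 2σ lines in the barcode legend is a banding of each architecture's L0 percentage relative to the spread across architectures. The sketch below applies that reading to the CYS row of the table above. It assumes that av and σ are the unweighted mean and population standard deviation over the 15 architectures; the slides do not say how av and σ were computed, and the function name is hypothetical.

```python
from statistics import mean, pstdev

ARCHITECTURES = ["1.10", "1.20", "1.25", "1.50", "2.10", "2.30", "2.40", "2.60",
                 "2.80", "3.10", "3.20", "3.30", "3.40", "3.60", "3.90"]
# CYS % in L0 per architecture, copied from the table.
CYS_L0 = [3.35, 2.99, 5.37, 0.83, 22.84, 2.04, 1.46, 4.42,
          0.92, 2.83, 2.10, 1.49, 1.86, 1.40, 3.05]

def sigma_band(value, av, sigma):
    """-2, -1, 0, +1 or +2: which of the av ± σ / av ± 2σ lines the value exceeds."""
    z = (value - av) / sigma
    if z > 2:
        return 2
    if z > 1:
        return 1
    if z < -2:
        return -2
    if z < -1:
        return -1
    return 0

cys_av = mean(CYS_L0)
cys_sigma = pstdev(CYS_L0)  # assumption: population standard deviation
cys_barcode = {arch: sigma_band(v, cys_av, cys_sigma)
               for arch, v in zip(ARCHITECTURES, CYS_L0)}
```

With these assumptions, architecture 2.10 is the only one whose core cysteine content lies above av + 2σ, which agrees with Cys being the residue highlighted for the ribbon example.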
Di of 173,536 CATH domains: 28 h 5 min (average computation time 1.72 s/domain)
Calculations performed on a 6-core 990X CPU based computer
Ala: alpha horseshoe, PDB ID 3CKC (A02)
CATH-ADAPT: CATH - atom depth assisted protein tomography
Towards protein folding barcodes
Putting the protein universe in order
towards i-SADIC (implemented SADIC)
H/D exchange rate profiles
2D atom depth or 3D atom depth
H/D exchange rate profiles
data from Pedersen TG, Thomsen NK, Andersen KV, Madsen JC, Poulsen FM. Determination of the rate constants k1 and k2 of the Linderstrom-Lang model for protein amide hydrogen exchange. A study of the individual amides in hen egg-white lysozyme. J Mol Biol. 1993 230(2):651-660.
dnwi: atom distance from the nearest water molecule
$D_{i,9}$: atom depth index with a probe of radius 9 Å
iSADIC atom depth; 3D atom depth
H/D exchange rate profiles
$iD_{i,9} = aD_{i,9} + b\,ASA_i$
$cD_{i,9} + d\,Dnw_i$
protein-protein interface analysis
biological vs crystallographic interfaces
crystallographic dimers vs biological dimers
ARG atoms: N, CA, C, O, CB, CG, CD, NE, CZ, NH1, NH2, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE, HH11, HH12, HH21, HH22
LYS atoms: N, CA, C, O, CB, CG, CD, CE, NZ, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE2, HE3, HZ1, HZ2, HZ3