Introduction to Biomolecular Structure and Modeling
Dhananjay Bhattacharyya
Biophysics Division
Saha Institute of Nuclear Physics
Kolkata
[email protected]
Biomolecular Structures
These are determined experimentally by
• X-Ray Crystallography
• Nuclear Magnetic Resonance Spectroscopy
• Neutron Diffraction Study
• Raman Spectroscopy
And also by theoretical methods
Bragg's law: $2d \sin\theta = n\lambda$
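The diffraction condition used in X-ray crystallography, Bragg's law $2d \sin\theta = n\lambda$, can be tried out numerically. The sketch below is only an illustration: the function name `bragg_spacing` is hypothetical, and the Cu K-alpha wavelength and the 30 degree angle are assumed example inputs, not values from the slides.

```python
import math

def bragg_spacing(wavelength, theta_deg, order=1):
    """Lattice-plane spacing d from Bragg's law 2 d sin(theta) = n lambda."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

# Assumed example: Cu K-alpha radiation (about 1.5418 Angstrom),
# first-order reflection observed at theta = 30 degrees.
d = bragg_spacing(1.5418, 30.0)
print(f"d = {d:.4f} Angstrom")
```

Because sin(30°) = 1/2, the spacing in this example equals the wavelength itself.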
Nucleic Acid Backbone is Connected to Either of Four Different Bases
Bases: A, G, T, C
Double-helical forms shown: A-DNA, B-DNA, Z-DNA
Proteins (polymers) are made up of Amino Acids (monomer units)
There are Twenty different Amino Acids
with different shape, size and electrostatic
properties.
These amino acids form covalent
bonds to form a linear polypeptide chain.
Examples: Alanine, Phenylalanine, Serine, Cysteine, Glutamic Acid (negatively charged), Arginine (positively charged)
Amino acids are joined together by covalent bonds, called peptide bonds, which are structurally very important.
α-helix: hydrogen bonding between residues i and i+4
β-sheet: hydrogen bonding between residue pairs (i, j) and (i+1, j-1) (antiparallel), or (i, j) and (i+1, j+1) (parallel)
Coordinate System:
• External coordinates, such as Cartesian (x, y, z), spherical polar (r, θ, φ) or cylindrical (r, θ, z)
• Internal coordinates (BondLength, BondAngle, TorsionAngle)
Bond Length
Bond Angle
Torsion Angle
Internal to External Coordinate Conversion
Generated coordinates
• H 0.000000 0.000000 0.000000
• C 0.000000 0.000000 1.089000
• C 1.367073 0.000000 1.572333
• C 2.050610 -1.183920 1.089000
• C 3.417683 -1.183920 1.572333
• H -0.513360 0.889165 1.452000
• H -0.513360 -0.889165 1.452000
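The conversion from internal coordinates (bond length, bond angle, torsion angle) to Cartesian coordinates can be sketched with a standard vector construction: each new atom is placed relative to the three atoms before it. The function `place_atom` and its helpers are hypothetical names. The bond length of 1.45 Å, the tetrahedral bond angle and the torsion of -60° are assumptions inferred from the generated coordinates listed above, not values stated in the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / norm, a[1] / norm, a[2] / norm)

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Cartesian position of atom d bonded to c, such that
    |c-d| = bond, angle(b, c, d) = angle_deg and torsion(a, b, c, d) = torsion_deg."""
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))                 # unit vector along the b -> c bond
    n = _unit(_cross(_sub(b, a), bc))      # normal to the a-b-c plane
    m = _cross(n, bc)                      # completes the local right-handed frame
    along = -bond * math.cos(theta)
    in_plane = bond * math.sin(theta) * math.cos(phi)
    out_of_plane = bond * math.sin(theta) * math.sin(phi)
    return tuple(c[i] + along * bc[i] + in_plane * m[i] + out_of_plane * n[i]
                 for i in range(3))

# First three atoms taken from the generated coordinates above.
h1 = (0.0, 0.0, 0.0)
c2 = (0.0, 0.0, 1.089)
c3 = (1.367073, 0.0, 1.572333)

# Assumed internal coordinates for the fourth atom (inferred, see lead-in).
tetrahedral = math.degrees(math.acos(-1.0 / 3.0))
c4 = place_atom(h1, c2, c3, bond=1.45, angle_deg=tetrahedral, torsion_deg=-60.0)
print("C4 = %.6f %.6f %.6f" % c4)
```

With these assumed internal coordinates the placed atom coincides with the fourth atom of the list, which suggests the list was generated by a construction of this kind.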
Theoretical Modeling of Biomolecules:
Quantum Mechanics based Methods
Statistics based Methods
Classical or Molecular Mechanics methods
Peptide modeling was initiated in India by G.N. Ramachandran (1950s)
Postulates:
• Impenetrable spherical volumes for each atom
• Radius of the sphere depends on atom type
• No two atomic spheres can overlap if they are not covalently bonded
Between   H          N          O          C          P           S
H         2.0 (1.9)  2.4 (2.2)  2.4 (2.2)  2.4 (2.2)  2.65 (2.5)  2.65 (2.5)
N                    2.7 (2.6)  2.7 (2.6)  2.9 (2.8)  3.2 (3.1)   3.1 (3.0)
O                               2.7 (2.6)  2.8 (2.7)  3.2 (3.1)   3.1 (2.9)
C                                          3.0 (2.9)  3.4 (3.2)   3.3 (3.1)
P                                                     3.5 (3.3)
S
Normal and extreme limit (in parentheses) contact distances (Å) used by Ramachandran and co-workers
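The hard-sphere postulates can be expressed as a simple distance test against the contact limits tabulated above. The sketch below is a minimal illustration, not the procedure used by Ramachandran and co-workers: the names `CONTACT_LIMITS` and `classify_contact` and the three labels are hypothetical, and only a subset of the table is included.

```python
# (normal limit, extreme limit) in Angstrom, copied from the table above.
# Keys are alphabetically sorted pairs of atom types.
CONTACT_LIMITS = {
    ("H", "H"): (2.0, 1.9),
    ("H", "N"): (2.4, 2.2),
    ("H", "O"): (2.4, 2.2),
    ("C", "H"): (2.4, 2.2),
    ("N", "N"): (2.7, 2.6),
    ("N", "O"): (2.7, 2.6),
    ("C", "N"): (2.9, 2.8),
    ("O", "O"): (2.7, 2.6),
    ("C", "O"): (2.8, 2.7),
    ("C", "C"): (3.0, 2.9),
}

def classify_contact(atom_a, atom_b, distance):
    """Classify a non-bonded contact between two atom types.

    Returns "normal" if the distance is at or beyond the normal limit,
    "extreme" if it lies between the extreme and the normal limit,
    and "clash" if the two spheres overlap beyond the extreme limit.
    """
    normal, extreme = CONTACT_LIMITS[tuple(sorted((atom_a, atom_b)))]
    if distance >= normal:
        return "normal"
    if distance >= extreme:
        return "extreme"
    return "clash"

print(classify_contact("C", "C", 3.1))
print(classify_contact("O", "C", 2.75))
print(classify_contact("H", "H", 1.5))
```

A conformation would then be fully allowed when all of its non-bonded contacts are "normal", and only partially allowed when some contacts fall in the "extreme" band.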
Original Ramachandran Plot
Fully Allowed Regions
Partially Allowed Regions
Ramachandran plot for 202 proteins at 1.5 Å or better resolution
A variation of the angle by 5° is allowed in order to fit the observed phi-psi values of protein structures.
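Schrödinger Equation: Quantum Mechanics
Time dependent (3 dimensional):
$$\left(-\frac{h^2}{8\pi^2 m}\nabla^2 + V\right)\Psi = \frac{ih}{2\pi}\frac{\partial \Psi}{\partial t}$$
Time independent:
$$\left(-\frac{h^2}{8\pi^2 m}\frac{d^2}{dx^2} + V\right)\psi = E\psi$$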
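DFT formalism with B3LYP
Pseudoeigenvalue equation:
$$\sum_{i=1}^{N} h_i^{KS}\,|\chi_1 \chi_2 \chi_3 \cdots \chi_N\rangle = \sum_{i=1}^{N} \varepsilon_i\,|\chi_1 \chi_2 \chi_3 \cdots \chi_N\rangle$$
where
$$h_i^{KS} = -\frac{1}{2}\nabla_i^2 - \sum_{k}^{nuclei} \frac{Z_k}{|r_i - r_k|} + \int \frac{\rho(r')}{|r_i - r'|}\,dr' + V_{xc}$$
Potential due to exchange-correlation, $V_{xc}$, is defined by
$$V_{xc} = \frac{\delta E_{xc}}{\delta \rho}$$
$$E_{xc}^{B3LYP} = (1-a)E_x^{LSDA} + aE_x^{HF} + b\,\Delta E_x^{B} + (1-c)E_c^{LSDA} + cE_c^{LYP}$$
with a, b and c as parameters obtained from a fit to experimental data for sample compounds; the $E_x$ terms are for electron exchange and the $E_c$ terms are for correlation.
Essentials of Computational Chemistry by C.J. Cramer (2002), John Wiley & Sons Ltd.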
FLOW CHART DESCRIBING THE DFT METHODOLOGY
• Input data (atom coordinates, basis sets)
• Generate input guess density (overlap integrals)
• Construct the potential and solve the Kohn-Sham equation
• Generate output densities from the solutions to the Kohn-Sham equations
• Are input and output density the same?
• NO: repeat the cycle using the output density as the input density
• YES: analyze electronic population
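The self-consistency cycle of the flow chart can be sketched as a plain loop. This is a toy illustration only, not a DFT implementation: the "density" is a single number, and `toy_solve` is a hypothetical stand-in for constructing the potential and solving the Kohn-Sham equations. All names are assumptions made for this example.

```python
def self_consistent_cycle(solve, guess, tol=1e-8, max_cycles=200):
    """Iterate density -> solve(density) until input and output agree."""
    density = guess
    for cycle in range(1, max_cycles + 1):
        new_density = solve(density)          # construct potential, solve, get output density
        if abs(new_density - density) < tol:  # "Are input and output density the same?"
            return new_density, cycle         # YES: converged
        density = new_density                 # NO: output density becomes the new input
    raise RuntimeError("self-consistency not reached")

def toy_solve(density):
    """Hypothetical model: the potential grows with the density,
    and the resulting density falls as the potential grows."""
    potential = density
    return 1.0 / (1.0 + potential)

density, cycles = self_consistent_cycle(toy_solve, guess=0.5)
print(f"converged density {density:.6f} after {cycles} cycles")
```

In this toy model the self-consistent density satisfies n = 1/(1 + n), so the loop should settle at (√5 - 1)/2. Real codes usually mix input and output densities rather than replacing one with the other outright.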
Interaction energies E (kcal/mol) of base pairs (edges: W = Watson-Crick, H = Hoogsteen, S = sugar; orientation: C = cis, T = trans)
Base pair   Edges   Orientation   E
G:C         W:W     C             -26
A:U         W:W     C             -14
G:U         W:W     C             -15
A:G         H:S     T             -10
A:G         S:S     T             -6
A:U         H:W     T             -13
A:A         H:H     T             -10
G:A         W:W     C             -15
G:A         S:W     T             -11
A:A         W:W     T             -12
A:U         W:W     T             -13
A:A         H:W     T             -11
[Figure: structures of these base pairs, each annotated with its numbers of N-H…O, N-H…N and C-H…O hydrogen bonds]
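Strengths of different H-bonds from 33 non-canonical Base Pairs
The energy components considered, $E_{NHO}$, $E_{NHN}$, etc., are additive. Additional stabilities, $\delta_i$, may come from van der Waals, dipole-dipole and other interactions.
$$E_i^{int} = n_{NHO}^i E_{NHO} + n_{NHN}^i E_{NHN} + n_{OHN}^i E_{OHN} + n_{CHO}^i E_{CHO} + n_{CHN}^i E_{CHN} + \delta_i$$
A least squares fit determines the component energies: the sum of the squared errors $\delta_i$ should be smallest for the best fit.
$$\sum_i \delta_i^2 = \sum_i \left(E_i^{int} - n_{NHO}^i E_{NHO} - n_{NHN}^i E_{NHN} - n_{OHN}^i E_{OHN} - n_{CHO}^i E_{CHO} - n_{CHN}^i E_{CHN}\right)^2$$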
Type of H-bond E (kcal/mol)
N-H…O -7.82
N-H…N -5.62
O-H…N -6.89
C-H…O -1.33
C-H…N -0.67
A. Roy, M. Bhattacharyya, S. Panigrahi, D. Bhattacharyya, (2008) J. Phys. Chem. B (in press)
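Since the H-bond energy components are taken to be additive, the fitted values in the table can be used to estimate the interaction energy of a base pair from its H-bond counts. The sketch below is a worked example under stated assumptions: the names are hypothetical, and the H-bond counts for the canonical G:C W:W cis pair (two N-H…O and one N-H…N) are supplied here as an assumption rather than read from the slides.

```python
# Fitted H-bond energies (kcal/mol) from the table above.
HBOND_ENERGY = {
    "N-H...O": -7.82,
    "N-H...N": -5.62,
    "O-H...N": -6.89,
    "C-H...O": -1.33,
    "C-H...N": -0.67,
}

def additive_energy(counts):
    """Sum n_k * E_k over the H-bond types present in a base pair."""
    return sum(n * HBOND_ENERGY[kind] for kind, n in counts.items())

# Assumed H-bond counts for the G:C W:W cis pair.
gc_counts = {"N-H...O": 2, "N-H...N": 1}
gc_estimate = additive_energy(gc_counts)

# The slides list E = -26 kcal/mol for this pair; the difference is the
# residual attributed to van der Waals, dipole-dipole and other terms.
gc_residual = -26.0 - gc_estimate
print(f"additive estimate {gc_estimate:.2f}, residual {gc_residual:.2f} kcal/mol")
```

The additive estimate is 2 × (-7.82) + (-5.62) = -21.26 kcal/mol, leaving a residual of -4.74 kcal/mol for this pair.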
Netropsin-like drugs bind in the narrow and deep minor groove of B-DNA
Actinomycin D-like drugs insert themselves between two stacked base pairs by distorting the DNA double helix
DNA kinks by 90° at the dyad location while binding to the two subunits of Catabolite Activator Protein (CAP)
TATA-box binding protein transforms the interfacing DNA region to an A-DNA-like structure
Smooth DNA curvature is induced by histone proteins in chromatin (nucleosome)
Definition and Nomenclature of Base Pair Doublet Parameters
Calculation of Base Pair Parameters by NUPARM
Local Step Parameters:
Mean local helix axis: $Z_m = X_m \times Y_m$, where $X_m = X_{axis1} + X_{axis2}$ and $Y_m = Y_{axis1} + Y_{axis2}$
M is the base pair center-to-center vector
Tilt: $2.0 \sin^{-1}(-Z_m \cdot Y_1)$
Roll: $2.0 \sin^{-1}(Z_m \cdot X_1)$
Twist: $\cos^{-1}((X_1 \times Z_m) \cdot (X_2 \times Z_m))$
Shift (Dx): $M \cdot X_m$
Slide (Dy): $M \cdot Y_m$
Rise (Dz): $M \cdot Z_m$
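The local step parameter definitions can be tried out on an idealized base-pair step. The sketch below is not the NUPARM program: it reads the formulas with cross products for the axis constructions and dot products for the projections, and it normalizes the intermediate vectors, which is an assumption of this example. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def step_parameters(x1, y1, x2, y2, m):
    """Local step parameters from the x and y axes of two successive
    base pairs and the center-to-center vector m (angles in degrees)."""
    xm = unit(add(x1, x2))
    ym = unit(add(y1, y2))
    zm = unit(cross(xm, ym))                       # mean local helix axis
    cos_twist = dot(unit(cross(x1, zm)), unit(cross(x2, zm)))
    cos_twist = max(-1.0, min(1.0, cos_twist))     # guard against rounding
    return {
        "tilt": 2.0 * math.degrees(math.asin(-dot(zm, y1))),
        "roll": 2.0 * math.degrees(math.asin(dot(zm, x1))),
        "twist": math.degrees(math.acos(cos_twist)),
        "shift": dot(m, xm),
        "slide": dot(m, ym),
        "rise": dot(m, zm),
    }

# Idealized step (assumed for illustration): pure 36 degree twist about z
# with a 3.4 Angstrom rise and no tilt, roll, shift or slide.
t = math.radians(36.0)
params = step_parameters((1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                         (math.cos(t), math.sin(t), 0.0),
                         (-math.sin(t), math.cos(t), 0.0),
                         (0.0, 0.0, 3.4))
print(params)
```

For this idealized input the routine should return the twist and rise that were put in, which is a useful sanity check before applying it to axes derived from real coordinates.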
Partial list of DNA crystal structures available at http://ndbserver.rutgers.edu
bd0001 12: A C C G A C G T C G G T
bd0003 12: A C C G G T A C C G G T
bd0004 12: C G C G A A T T C G C G
bd0006 10: G G C C A A T T G G
bd0011 12: C G C A A A T A T G C G
bd0014 12: C G C G A A T T C G C G
bd0015 10: C C G C C G G C G G
bd0017 9: C G C G C G G A G
bd0018 11: G C G A A T T C G C G
bd0019 12: G G C G A A T T C G C G
bd0022 12: A C C G G C G C C A C A
bd0023 10: C C A G T A C T G G
bd0024 10: C C G A A T G A G G
Average Structural Parameters from Crystal Structures
Base-Pair Step   Size of Database   Tilt (°)   Roll (°)   Twist (°)   Rise (Å)
G:G              37                 -0.24      5.80       30.99       3.46
G:C              106                -0.33      -5.37      38.52       3.32
C:G              157                0.66       3.81       36.26       3.46
A:A              116                -0.01      0.67       35.92       3.21
A:T              54                 0.20       -0.60      32.76       3.25
T:A              18                 -0.02      0.07       40.39       3.30
A:C              20                 -0.37      0.97       32.73       3.43
C:A              47                 -0.19      2.17       37.75       3.48
A:G              34                 0.16       5.34       31.92       3.44
G:A              55                 -0.23      0.52       38.40       3.14
DNA Bending: Experiment and Theory
Sequence          Experimental RL   Theoretical bending (d/l)
Random            1.00              0.98
(AAANNNNNNN)n     1.23              0.85
(AAAANNNNNN)n     1.60              0.81
(AAAAANNNNN)n     2.00              0.74
(AAAAAANNNN)n     2.31              0.72
(AAAAAAAANN)n     2.21              0.67
(AAAAAAAAAN)n     1.73              0.82
Curved DNA models built from crystal parameters: (A3G7)n, (A6G4)n, (A10)n
Bond Angle Deformation
Deformation from the equilibrium value costs energy. The simplest form of the energy penalty is:
$$E_{\theta} = \frac{1}{2} k_{\theta} (\theta - \theta_o)^2$$
Bonds are also stretchable, but at a cost of energy:
$$E_{bond} = \frac{1}{2} K_{bond} (b - b_o)^2$$
Bond breaking energy
Ethane (three-fold symmetry): $E = k\{1 + \cos(3\phi)\}$
Ethylene (two-fold symmetry): $E = k\{1 - \cos(2\phi)\}$
Normal and extreme limiting (in parentheses) contact distances (Å) used by Ramachandran and co-workers
Minimum energy position: $r_{ij}^{o}$
Between   H          N          O          C          P           S
H         2.0 (1.9)  2.4 (2.2)  2.4 (2.2)  2.4 (2.2)  2.65 (2.5)  2.65 (2.5)
N                    2.7 (2.6)  2.7 (2.6)  2.9 (2.8)  3.2 (3.1)   3.1 (3.0)
O                               2.7 (2.6)  2.8 (2.7)  3.2 (3.1)   3.1 (2.9)
C                                          3.0 (2.9)  3.4 (3.2)   3.3 (3.1)
P                                                     3.5 (3.3)
S
Interaction between Instantaneous Atomic dipoles and Induced Atomic dipoles
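Force Field for Biomolecular Simulation
$$E_{pot} = \sum_{bonds} \frac{1}{2} k_b (b - b_0)^2 + \sum_{angles} \frac{1}{2} k_\theta (\theta - \theta_0)^2 + \sum_{torsions} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right]$$
$$+ \sum_{i<j} \varepsilon_{ij}\left[\left(\frac{r^o_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{r^o_{ij}}{r_{ij}}\right)^{6}\right] + \sum_{i<j} \frac{q_i q_j}{\epsilon\, r_{ij}}$$
$$+ \text{(optional)} \sum \left[\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right] + \sum \frac{1}{2} F_{ik} (r_{ik} - r^0_{ik})^2$$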
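The individual terms of a force field of this kind are simple closed-form functions. The sketch below evaluates the bond, angle, torsion, van der Waals and electrostatic terms for single interactions. It is an illustration under assumptions: the function names, the phase angle `gamma` and the Coulomb conversion factor (about 332.06 kcal·Å/(mol·e²)) are not taken from the slides, and the parameter values in the example are made up.

```python
import math

COULOMB_KCAL = 332.06  # assumed conversion factor, kcal*Angstrom/(mol*e^2)

def bond_energy(k_b, b, b0):
    """Harmonic bond stretching: 1/2 k_b (b - b0)^2."""
    return 0.5 * k_b * (b - b0) ** 2

def angle_energy(k_theta, theta_deg, theta0_deg):
    """Harmonic angle bending with angles given in degrees."""
    d = math.radians(theta_deg - theta0_deg)
    return 0.5 * k_theta * d ** 2

def torsion_energy(v_n, n, phi_deg, gamma_deg=0.0):
    """Periodic torsion term: V_n/2 [1 + cos(n phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(math.radians(n * phi_deg - gamma_deg)))

def vdw_energy(eps, r0, r):
    """12-6 term with its minimum of depth -eps located at r = r0."""
    ratio6 = (r0 / r) ** 6
    return eps * (ratio6 ** 2 - 2.0 * ratio6)

def coulomb_energy(q_i, q_j, r, dielectric=1.0):
    """Electrostatic term q_i q_j / (dielectric * r), in kcal/mol."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r)

# Made-up parameters, for illustration only.
print(bond_energy(300.0, 1.6, 1.5))
print(torsion_energy(2.0, 3, 60.0))
print(vdw_energy(0.1, 3.4, 3.4))
print(coulomb_energy(1.0, -1.0, 3.3206))
```

Writing the 12-6 term in the form with the factor 2 puts the energy minimum exactly at the contact distance r0, which is why the minimum energy position can be tabulated directly per pair of atom types.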
Search for Conformation with Lowest Energy
Multivariable Optimization: NP-hard Problem
• Systematic grid search procedure, evaluating E(x, y, z), E(x+1, y, z), E(x+2, y, z), …: impossible for a large number of variables
• Guided grid search: depends on the choice made
• Approximate method based on Taylor series
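Newton-Raphson Method:
$$\left.\frac{dE}{dx}\right|_{x_m} = \left.\frac{dE}{dx}\right|_{x_0} + (x_m - x_0)\left.\frac{d^2E}{dx^2}\right|_{x_0} + \frac{(x_m - x_0)^2}{2!}\left.\frac{d^3E}{dx^3}\right|_{x_0} + \cdots = 0$$
Keeping terms up to first order in $(x_m - x_0)$:
$$x_m = x_0 - \frac{(dE/dx)_{x_0}}{(d^2E/dx^2)_{x_0}}$$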
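The Newton-Raphson update, in which the new estimate of the minimum is the old position minus the ratio of the first to the second derivative of the energy, can be iterated until the step becomes negligible. The sketch below is a one-dimensional illustration: the function names and the test energy function are assumptions of this example.

```python
def newton_raphson_minimize(dE, d2E, x0, tol=1e-10, max_iter=50):
    """Iterate x <- x - E'(x) / E''(x) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = dE(x) / d2E(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Assumed test function: E(x) = (x - 2)^2 + 0.1 (x - 2)^4, minimum at x = 2.
def dE(x):
    return 2.0 * (x - 2.0) + 0.4 * (x - 2.0) ** 3

def d2E(x):
    return 2.0 + 1.2 * (x - 2.0) ** 2

x_min = newton_raphson_minimize(dE, d2E, x0=0.5)
print(f"minimum near x = {x_min:.6f}")
```

For a purely quadratic energy a single step lands exactly on the minimum. The quartic term here is what makes several iterations necessary.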
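Energy Landscape of typical bio-molecules
[Figure: energy plotted against positional variables]
Boltzmann probability of a conformation x:
$$p(x) \sim e^{-E(x)/kT}$$
Property average:
$$\langle Q \rangle = \sum_x p(x)\,Q(x) \quad \text{with} \quad p(x_i) = \frac{e^{-E_i/kT}}{\sum_i e^{-E_i/kT}}$$
Relative probability of a final state f with respect to an initial state i:
$$\frac{p_f}{p_i} = e^{-(E_f - E_i)/kT}$$
[Figure: energy profile showing Monte Carlo moves; moves to lower energy are always accepted, moves to higher energy are either accepted or rejected]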
Uniformly generated random numbers (between 0 and 1) are used: a move is accepted if
exp(-ΔU/kT) > random number
and rejected otherwise
Conformation 0: Calculate energy (Ei)
Alter conformation randomly
Calculate energy (Ei+1)
Calculate ρ = exp(-(Ei+1-Ei)/kT)
If ρ > random no
accept the conformation
Repeat the procedure
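The procedure above is the Metropolis Monte Carlo scheme, and it can be sketched for a single positional variable. This is a minimal illustration, not the author's program: the function names, the harmonic test energy, the move size and the value of kT are assumptions chosen for the example.

```python
import math
import random

def metropolis(energy, x0, kT, n_steps, max_move=0.5, seed=0):
    """Sample conformations x with probability proportional to exp(-E(x)/kT)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)                  # conformation 0 and its energy E_i
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_move, max_move)   # alter conformation randomly
        e_new = energy(x_new)                          # energy E_(i+1)
        delta = e_new - e
        # Downhill moves are always accepted; uphill moves are accepted
        # when rho = exp(-delta/kT) exceeds a uniform random number.
        if delta <= 0.0 or math.exp(-delta / kT) > rng.random():
            x, e = x_new, e_new
        samples.append(x)                  # a rejected move repeats the old conformation
    return samples

def harmonic(x):
    """Assumed test energy, E(x) = x^2 / 2 in arbitrary units."""
    return 0.5 * x * x

samples = metropolis(harmonic, x0=5.0, kT=0.6, n_steps=20000)
tail = samples[len(samples) // 2:]         # discard the first half as equilibration
mean_energy = sum(harmonic(x) for x in tail) / len(tail)
print(f"average energy over the second half: {mean_energy:.3f}")
```

For a harmonic energy the average should approach kT/2, far below the starting energy of 12.5. The exact value printed depends on the random seed.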
Deterministic Method
Molecular Dynamics
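Newton's equation of motion:
$$F_i = -\frac{\partial E}{\partial x_i} = m_i a_i = m_i \frac{d^2 x_i}{dt^2}$$
Taylor expansion of the position:
$$x(t + \Delta t) = x(t) + \frac{dx}{dt}\Delta t + \frac{1}{2}\frac{d^2x}{dt^2}\Delta t^2 + \cdots$$
Verlet Algorithm:
$$x(t \pm \Delta t) = x(t) \pm \Delta t\,\frac{dx}{dt}(t) + \frac{\Delta t^2}{2!}\frac{d^2x}{dt^2}(t) \pm \frac{\Delta t^3}{3!}\frac{d^3x}{dt^3}(t) + \cdots$$
$$x(t + \Delta t) = 2x(t) - x(t - \Delta t) + \frac{\Delta t^2}{m}F(x(t))$$
$$v(t) = \frac{x(t + \Delta t) - x(t - \Delta t)}{2\Delta t}$$
Leapfrog-Verlet Algorithm:
$$v(t + \tfrac{\Delta t}{2}) = v(t - \tfrac{\Delta t}{2}) + a(t)\,\Delta t$$
$$x(t + \Delta t) = x(t) + v(t + \tfrac{\Delta t}{2})\,\Delta t$$
Temperature from the kinetic energy:
$$\frac{1}{2}\sum_i m_i v_i^2 = \frac{3}{2} N k T$$
[Figure: leapfrog time line; positions X and energies E are evaluated at t0, t0+Δt, t0+2Δt, t0+3Δt and t0+4Δt, velocities v at the half steps t0-Δt/2, t0+Δt/2, t0+3Δt/2, t0+5Δt/2 and t0+7Δt/2]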
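The leapfrog form of the Verlet algorithm advances velocities at half steps and positions at full steps. The sketch below integrates a one-dimensional harmonic oscillator with it. The function names, the unit mass and force constant, and the time step are assumptions made for illustration. The first half-step velocity is obtained from the initial velocity and acceleration.

```python
import math

def leapfrog(force, x0, v0, mass, dt, n_steps):
    """Leapfrog-Verlet integration; returns the list of positions."""
    x = x0
    v_half = v0 + 0.5 * dt * force(x0) / mass     # v(t0 + dt/2)
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v_half * dt                        # x(t + dt) = x(t) + v(t + dt/2) dt
        v_half = v_half + dt * force(x) / mass     # v(t + 3dt/2) = v(t + dt/2) + a(t + dt) dt
        trajectory.append(x)
    return trajectory

def spring_force(x):
    """Assumed test force for E = x^2 / 2 (unit force constant)."""
    return -x

dt = 0.01
trajectory = leapfrog(spring_force, x0=1.0, v0=0.0, mass=1.0, dt=dt, n_steps=628)
t_end = dt * 628
print(f"x({t_end:.2f}) = {trajectory[-1]:.5f}, exact cos(t) = {math.cos(t_end):.5f}")
```

With unit mass and force constant the exact solution is x(t) = cos(t), so after about one period the integrated position should return close to its starting value of 1.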
Time scale of Vibrational Motions
Type                        Wave no. (cm-1)   Period Tp = λ/c (fs)   Tp/π (fs)
O-H, N-H stretch            3200-3600         9.8                    3.1
C-H stretch                 3000              11.1                   3.5
O-C-O asymm. stretch        2400              13.9                   4.5
C≡C, C≡N stretch            2100              15.9                   5.1
C=O (carbonyl) stretch      1700              19.6                   6.2
C=C stretch
H-O-H bend                  1600              20.8                   6.4
C-N-H, H-N-H bend           1500              22.2                   7.1
C-N stretch (amides)        1250              26.2                   8.4
Water libration (rocking)   700               41.7                   13
C=C-H bending
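The periods in this table follow from the wave numbers through Tp = 1/(c ν̃), with c the speed of light in cm/s. The sketch below reproduces two of them and derives a time step of one tenth of the period for the fastest motion considered. The function name is hypothetical, and tying the time step to one tenth of the period is an assumption of this example.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10

def period_fs(wavenumber_cm):
    """Vibrational period in femtoseconds for a wave number in cm^-1."""
    return 1.0e15 / (SPEED_OF_LIGHT_CM_PER_S * wavenumber_cm)

ch_period = period_fs(3000.0)   # C-H stretch
co_period = period_fs(1700.0)   # C=O (carbonyl) stretch
time_step = ch_period / 10.0    # assumed rule: sample a tenth of the period

print(f"C-H stretch: {ch_period:.1f} fs, C=O stretch: {co_period:.1f} fs")
print(f"time step for C-H stretch: about {time_step:.1f} fs")
```

A time step of about 1 fs is what leads to the 10^-15 s figure used when counting the number of simulation steps.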
Simple Pendulum
[Figure: average position of a simple pendulum, sampled at points 1 to 5]
Period of measurement of position: ~2.3 T
Recommended period of measurement: ~T/10
Duration of Simulation
• Protein folding requires 1 μs to 1 ms
• Ligand binding/dissociation requires 1 μs
• No. of steps = 1 ms / Δt = 10^-3 s / 10^-15 s = 10^12
This calls for:
• Faster computers
• Engaging several computers in parallel
• Increasing Δt by the SHAKE, RATTLE or LINCS algorithms
Software for Molecular Simulation
• Accelrys, MOE, SYBYL, TATA-BioSuite (composite packages, costly)
• CHARMM, AMBER (for simulation, special academic price)
• GROMACS, NAMD (for simulation, free)
• MOLDEN (for molecule building, free)
• GAMESS (for QM calculation, free)
Heating phase
Equilibration
Dickerson Dodecamer seq: d(CGCGAATTCGCG)2
CURVES calculated values
S replaces O in the backbone of substituted DNA. It yields two chiral conformers of DNA: PS-R and PS-S.
S. Mukherjee and D. Bhattacharyya (2004) Biopolymers 73, 269–282
[Figure panels: Normal PO, PS-R, PS-S]
Students:
Dr. Debashree Bandyopadhyay
Dr. Shayantani Mukherjee
Dr. Kakali Sen
Mr. Sudipta Samanta
Collaborators:
Dr. Rabi Majudar
Dr. Samita Basu
Dr. Sangam Banerjee
Dr. Abhijit Mitra (IIIT, Hyderabad)
Dr. N. Pradhan (NIMHANS, Bangalore)
Partially supported by CSIR, DBT and CAMCS (SINP)