Torsion angle dynamics: Difference between revisions

From CYANA Wiki
Jump to navigation Jump to search
 
(37 intermediate revisions by 6 users not shown)
Line 9: Line 9:
When compared with other structure calculation algorithms based on simulated annealing (Brünger, 1992), the principal difference of torsion angle dynamics is that it works with internal coordinates rather than Cartesian coordinates. Thereby the number of degrees of freedom is decreased about ten-fold, since the covalent structure parameters (bond lengths, bond angles, chiralities and planarities) are kept fixed at their optimal values during the calculation. The strong potentials required to preserve the covalent structure in conventional Cartesian space molecular dynamics and the concomitant high-frequency motions are absent in torsion angle dynamics. This results in a simpler potential energy function and the use of longer time-steps for the numerical integration of the equations of motion, and hence in higher efficiency of the structure calculation. An additional, practical advantage of working in torsion angle space is that this approach excludes inadvertent changes of chirality during the course of a structure calculation, which may happen with Cartesian space molecular dynamics (Schultze & Feigon, 1997).  
When compared with other structure calculation algorithms based on simulated annealing (Brünger, 1992), the principal difference of torsion angle dynamics is that it works with internal coordinates rather than Cartesian coordinates. Thereby the number of degrees of freedom is decreased about ten-fold, since the covalent structure parameters (bond lengths, bond angles, chiralities and planarities) are kept fixed at their optimal values during the calculation. The strong potentials required to preserve the covalent structure in conventional Cartesian space molecular dynamics and the concomitant high-frequency motions are absent in torsion angle dynamics. This results in a simpler potential energy function and the use of longer time-steps for the numerical integration of the equations of motion, and hence in higher efficiency of the structure calculation. An additional, practical advantage of working in torsion angle space is that this approach excludes inadvertent changes of chirality during the course of a structure calculation, which may happen with Cartesian space molecular dynamics (Schultze & Feigon, 1997).  


To fully exploit the potential advantages of torsion angle dynamics for macromolecular structure calculation, its implementation has to take account of the fact that the Lagrange equations of motion with torsion angles as degrees of freedom are more complex than Newton’s equations in Cartesian coordinates. The computation time for “naïve” (but by no means simple) approaches (Mazur & Abagyan, 1989; Mazur et al., 1991) to torsion angle dynamics rises with the third power of the system size, which renders these algorithms unsuitable for use with macromolecules. CYANA gets around this potential drawback by using a fast recursive implementation of the equations of motion that was originally developed for spacecraft dynamics and robotics (Jain et al., 1993), which entails a computational effort proportional to n. With the fast torsion angle dynamics algorithm in CYANA the advantages of torsion angle dynamics, especially that much longer integration time steps can be used, are effective for molecules of all sizes.
To fully exploit the potential advantages of torsion angle dynamics for macromolecular structure calculation, its implementation has to take account of the fact that the Lagrange equations of motion with torsion angles as degrees of freedom are more complex than Newton’s equations in Cartesian coordinates. The computation time for “naïve” (but by no means simple) approaches (Mazur & Abagyan, 1989; Mazur et al., 1991) to torsion angle dynamics rises with the third power of the system size, which renders these algorithms unsuitable for use with macromolecules. CYANA gets around this potential drawback by using a fast recursive implementation of the equations of motion that was originally developed for spacecraft dynamics and robotics (Jain et al., 1993), which entails a computational effort proportional to the number of torsion angles. With the fast torsion angle dynamics algorithm in CYANA the advantages of torsion angle dynamics, especially that much longer integration time steps can be used, are effective for molecules of all sizes.


=== NMR structure calculation versus molecular dynamics simulation ===
=== NMR structure calculation versus molecular dynamics simulation ===
Line 22: Line 22:


A typical “geometric” force field used in NMR structure calculation therefore retains only the most important part of the non-bonded interaction by a simple repulsive potential that replaces the Lennard-Jones and electrostatic interactions of the full empirical energy function. This short-range repulsive function can be calculated much faster and significantly facilitates large-scale conformational changes that are required during the folding process by lowering energy barriers induced by the overlap of atoms.
A typical “geometric” force field used in NMR structure calculation therefore retains only the most important part of the non-bonded interaction by a simple repulsive potential that replaces the Lennard-Jones and electrostatic interactions of the full empirical energy function. This short-range repulsive function can be calculated much faster and significantly facilitates large-scale conformational changes that are required during the folding process by lowering energy barriers induced by the overlap of atoms.
== Fast algorithm for torsion angle dynamics ==
   
   
=== Tree structure ===
== Tree structure ==


The key idea of the fast torsion angle dynamics algorithm in CYANA is to exploit the fact that a chain molecule such as a protein or nucleic acid can be represented in a natural way as a tree structure consisting of ''n'' + 1 rigid bodies or “clusters” that are connected by ''n'' rotatable bonds. Each rigid body is made up of one or several mass points (atoms) with fixed relative positions. The tree structure starts from a base, typically at the N-terminus of the polypeptide chain, and terminates with “leaves” at the ends of the side-chains and at the C-terminus. The only degrees of freedom are rotations about single bonds, and parameters that define the position and orientation of the molecule in space. The clusters are numbered from 0 to ''n''. The base cluster has the number ''k'' = 0. Each of the other clusters, with numbers ''k'' &ge; 1, has a single nearest neighbor in the direction toward the base, which has a number ''p''(''k'') < ''k''. The torsion angle between the rigid bodies ''p''(''k'') and ''k'' is denoted by ''&theta;<sub>k</sub>''. The conformation of the molecule is uniquely specified by the values of all torsion angles, (''&theta;<sub>1</sub>'',…,''&theta;<sub>n</sub>'').
The key idea of the fast torsion angle dynamics algorithm in CYANA is to exploit the fact that a chain molecule such as a protein or nucleic acid can be represented in a natural way as a tree structure consisting of ''n'' + 1 rigid bodies or “clusters” that are connected by ''n'' rotatable bonds. Each rigid body is made up of one or several mass points (atoms) with fixed relative positions. The tree structure starts from a base, typically at the N-terminus of the polypeptide chain, and terminates with “leaves” at the ends of the side-chains and at the C-terminus. The only degrees of freedom are rotations about single bonds, and parameters that define the position and orientation of the molecule in space. The clusters are numbered from 0 to ''n''. The base cluster has the number ''k'' = 0. Each of the other clusters, with numbers ''k'' &ge; 1, has a single nearest neighbor in the direction toward the base, which has a number ''p''(''k'') < ''k''. The torsion angle between the rigid bodies ''p''(''k'') and ''k'' is denoted by ''&theta;<sub>k</sub>''. The conformation of the molecule is uniquely specified by the values of all torsion angles, (''&theta;<sub>1</sub>'',…,''&theta;<sub>n</sub>'').
Line 31: Line 29:
[[Image:TreeStructureRigidBodies.jpg|thumb|500px|Fig. 1. (a) Tree structure of torsion angles for the tripeptide Val-Ser-Ile. Circles represent rigid units. Rotatable bonds are indicated by arrows that point towards the part of the structure that is rotated if the corresponding torsion angle is changed. (b) Excerpt from the tree structure formed by the torsion angles of a molecule, and definition of quantities required by the CYANA torsion angle dynamics algorithm.]]
[[Image:TreeStructureRigidBodies.jpg|thumb|500px|Fig. 1. (a) Tree structure of torsion angles for the tripeptide Val-Ser-Ile. Circles represent rigid units. Rotatable bonds are indicated by arrows that point towards the part of the structure that is rotated if the corresponding torsion angle is changed. (b) Excerpt from the tree structure formed by the torsion angles of a molecule, and definition of quantities required by the CYANA torsion angle dynamics algorithm.]]


The following quantities are defined for each cluster ''k'' (see Fig. 1): the “reference point”, ''r<sub>k</sub>'', which is the position vector of the end point of the bond between the clusters ''p''(''k'') and ''k''; <math>\scriptstyle v_k = \dot{r}_k</math>, the velocity of the reference point; ''&omega;<sub>k</sub>'', the angular velocity vector of the cluster; ''Y<sub>k</sub>'', the vector from the reference point to the center of mass of the cluster; ''m<sub>k</sub>'', the mass of the cluster ''k''; ''I<sub>k</sub>'', the inertia tensor of the cluster ''k'' with respect to the reference point, given by
The following quantities are defined for each cluster ''k'' (see Fig. 1): the “reference point”, ''r<sub>k</sub>'', which is the position vector of the end point of the bond between the clusters ''p''(''k'') and ''k''; <math>v_k = \dot{r}_k</math>, the velocity of the reference point; ''&omega;<sub>k</sub>'', the angular velocity vector of the cluster; ''Y<sub>k</sub>'', the vector from the reference point to the center of mass of the cluster; ''m<sub>k</sub>'', the mass of the cluster ''k''; ''I<sub>k</sub>'', the inertia tensor of the cluster ''k'' with respect to the reference point, given by


::<math>\textstyle I_k = \sum_\alpha m_\alpha I(y_\alpha)</math>
:<math>\textstyle I_k = \sum_\alpha m_\alpha I(y_\alpha)</math>


where the sum runs over all atoms in the cluster ''k'', ''m<sub>&alpha;</sub>'' is the mass of the atom ''&alpha;'', ''y<sub>&alpha;</sub>'' is the vector from the reference point of cluster ''k'' to the atom ''&alpha;'', and ''I''(''y<sub>&alpha;</sub>'') is the symmetric 3 &times; 3 matrix defined by the relation ''I''(''y'') = ''y'' Λ (''x'' Λ ''y'') for all three-dimensional vectors ''x''. The symbol “Λ” denotes the vector product. All position vectors are in an inertial frame of reference that is fixed in space.
where the sum runs over all atoms in the cluster ''k'', ''m<sub>&alpha;</sub>'' is the mass of the atom ''&alpha;'', ''y<sub>&alpha;</sub>'' is the vector from the reference point of cluster ''k'' to the atom ''&alpha;'', and ''I''(''y<sub>&alpha;</sub>'') is the symmetric 3 &times; 3 matrix defined by the relation ''I''(''y'') = ''y'' Λ (''x'' Λ ''y'') for all three-dimensional vectors ''x''. The symbol “Λ” denotes the vector product. All position vectors are in an inertial frame of reference that is fixed in space.


=== Kinetic energy ===
== Kinetic energy ==


The angular velocity vector ''&omega;<sub>k</sub>'' and the linear velocity ''v<sub>k</sub>'' of the reference point of the rigid body ''k'' are calculated recursively from the corresponding quantities of the preceding rigid body ''p''(''k''):
The angular velocity vector ''&omega;<sub>k</sub>'' and the linear velocity ''v<sub>k</sub>'' of the reference point of the rigid body ''k'' are calculated recursively from the corresponding quantities of the preceding rigid body ''p''(''k''):
Line 47: Line 45:
:<math>E_{\text{kin}}=\frac{1}{2}\sum_{k=0}^n[m_k v_k^2 + \omega_k \cdot I_k \omega_k + 2v_k\cdot(\omega_k \wedge m_k Y_k)]</math>.
:<math>E_{\text{kin}}=\frac{1}{2}\sum_{k=0}^n[m_k v_k^2 + \omega_k \cdot I_k \omega_k + 2v_k\cdot(\omega_k \wedge m_k Y_k)]</math>.


=== Potential energy = target function ===
== Potential energy = target function ==


The CYANA target function (Güntert et al., 1991; Güntert et al., 1997) is defined such that it is zero if and only if all experimental distance restraints and torsion angle restraints are fulfilled and all non-bonded atom pairs satisfy a check for the absence of steric overlap. A conformation that satisfies the restraints more closely than another one will lead to a lower target function value. The CYANA target function for distance restraints and torsion angle restraints is defined by
For torsion angle dynamics the CYANA [[target function]] (Güntert et al., 1991; Güntert et al., 1997), ''V'', has the role of the potential energy, ''E''<sub>pot</sub> = ''V'', and energies are reported in units of the target function ''V'', i.e. Å<sup>2</sup>, in the current version of CYANA. If energies and temperatures are to be reported in their standard units of kJ/mol and K, we assume that ''E''<sub>pot</sub> = ''w''<sub>0</sub>''V'', with an overall (arbitrary) scaling factor ''w''<sub>0</sub> = 10 kJ mol<sup>-1</sup>Å<sup>-2</sup>.


:<math>V = \sum_{c=u,l,v} w_c \sum_{(\alpha,\beta)\in I_c} w_{\alpha\beta}^c (d_{\alpha\beta} - b_{\alpha\beta})^2 + w_a \sum_{i\in I_a} w_i \left[1 - \frac{1}{2}\left(\frac{\Delta_i}{\Gamma_i}\right)^2\right]\Delta_i^2</math>
Because the target function is not a physical potential, there is no unique “natural” way to define the energy and temperature scale for protein structure calculations.


Upper and lower bounds, ''b<sub>&alpha;&beta;</sub>'', on distances ''d<sub>&alpha;&beta;</sub>'' between two atoms ''&alpha;'' and ''&beta;'', and restraints on individual torsion angles ''&theta;<sub>i</sub>'' in the form of allowed intervals <math>[\theta_i^\min,\theta_i^\max]</math> are considered. ''I<sub>u</sub>'', ''I<sub>l</sub>'' and ''I<sub>v</sub>'' are the sets of atom pairs (''&alpha;'',''&beta;'') with violated upper, lower or van der Waals distance bounds, respectively, and ''I<sub>a</sub>'' is the set of restrained torsion angles. ''w<sub>u</sub>'', ''w<sub>l</sub>'', ''w<sub>v</sub>'' and ''w<sub>a</sub>'' are overall weighting factors for the different types of restraints, and <math>w_{\alpha\beta}^c</math>  and ''w<sub>i</sub>'' are relative weighting factors for individual restraints. The half-width of the forbidden range of torsion angle values is denoted by <math>\Gamma_i = \pi - (\theta_i^\max - \theta_i^\min)/2</math>, and ''&Delta;<sub>i</sub>'' is the size of the torsion angle restraint violation.
== Generalized forces = torques = –gradient of the target function ==


The target function may include additional terms for restraints on vicinal scalar coupling constants, residual dipolar couplings, and pseudocontact shifts, as well as identity and symmetry restraints for symmetric multimers. Alternatives to the simple square potential for violated distance restraints have also been implemented.
The torques about the rotatable bonds, i.e., the negative gradients of the potential energy or target function with respect to torsion angles, -&nabla;''V''(''&theta;''), are calculated by a fast recursive algorithm (Abe et al., 1984). The gradient of the target function can be calculated efficiently because the target function is a sum of functions of individual interatomic distances and torsion angles. The partial derivative of the target function ''V'' with respect to a torsion angle ''&theta;<sub>k</sub>'' is given by


For torsion angle dynamics the CYANA target function (Güntert et al., 1991; Güntert et al., 1997), ''V'', has the role of the potential energy, ''E''<sub>pot</sub> = ''V'', and energies are reported in units of the target function ''V'', i.e. Å<sup>2</sup>, in the current version of CYANA. If energies and temperatures are to be reported in their standard units of kJ/mol and K, we assume that ''E''<sub>pot</sub> = ''w''<sub>0</sub>''V'', with an overall (arbitrary) scaling factor ''w''<sub>0</sub> = 10 kJ mol<sup>-1</sup>Å<sup>-2</sup>. Because the target function is not a physical potential, there is no unique “natural” way to define the energy and temperature scale for protein structure calculations.
:<math>\frac{\partial V}{\partial\theta_k} = -e_k\cdot f_k - (e_k \wedge r_k)\cdot g_k + 2 w_a \sum_{i\in I_a} w_i\left[1-\left(\frac{\Delta_i}{\Gamma_i}\right)^2\right]\Delta_i \delta_{ik}</math>
with


=== Forces = torques = –gradient of the target function ===
:<math>f_k = \sum_{c=u,l,v} w_c \sum_{(\alpha,\beta)\in I_c\atop\alpha\in M_k} w_{\alpha\beta}^c 2\,\frac{d_{\alpha\beta}-b_{\alpha\beta}}{d_{\alpha\beta}}(r_\alpha\wedge r_\beta)</math>,
:<math>g_k = \sum_{c=u,l,v} w_c \sum_{(\alpha,\beta)\in I_c\atop\alpha\in M_k} w_{\alpha\beta}^c 2\,\frac{d_{\alpha\beta}-b_{\alpha\beta}}{d_{\alpha\beta}}(r_\alpha - r_\beta)</math>.


The torques about the rotatable bonds, i.e., the negative gradients of the potential energy or target function with respect to torsion angles, V(), are calculated by a fast recursive algorithm (Abe et al., 1984). The gradient of the target function can be calculated efficiently because the target function is a sum of functions of individual interatomic distances and torsion angles. The partial derivative of the target function V with respect to a torsion angle k is given by
''r<sub>&alpha;</sub>'' and ''r<sub>&beta;</sub>'' are the position vectors of the atoms ''&alpha;'' and ''&beta;'', and ''M<sub>k</sub>'' denotes the set of all atoms whose position is affected by a change of the torsion angle ''&theta;<sub>k</sub>'' if the base cluster is held fixed. Because ''M<sub>k</sub>'' is a subset of ''M''<sub>''p''(''k'')</sub>, the quantities ''f<sub>k</sub>'' and ''g<sub>k</sub>'' can be calculated recursively for ''k'' = ''n'',''n'' - 1,…,1 starting from the leaves of the tree structure by evaluating the interaction for each atom pair (''&alpha;'',''&beta;'') only once.
with
r and r are the position vectors of the atoms  und , and Mk denotes the set of all atoms whose position is affected by a change of the torsion angle k if the base cluster is held fixed. Because Mk is a subset of Mp(k), the quantities fk and gk can be calculated recursively for kn, n1, …, 1 starting from the leaves of the tree structure by evaluating the interaction for each atom pair (, ) only once.


=== Equations of motion ===
== Equations of motion ==


The calculation of the torsional accelerations, i.e. the second time derivatives of the torsion angles, is the crucial point of a torsion angle dynamics algorithm. The equations of motion for a classical mechanical system with generalized coordinates are the Lagrange equations  
The calculation of the torsional accelerations, i.e. the second time derivatives of the torsion angles, is the crucial point of a torsion angle dynamics algorithm. The equations of motion for a classical mechanical system with generalized coordinates are the Lagrange equations  
:<math>\frac{d}{dt}\left(\frac{\partial L}{\partial\dot\theta_k}\right) - \frac{\partial L}{\partial\theta_k} = 0</math>
    
    
with the Lagrange function ''L'' = ''E''<sub>kin</sub> - ''E''<sub>pot</sub>. They lead to equations of motion of the form
with the Lagrange function ''L'' = ''E''<sub>kin</sub> - ''E''<sub>pot</sub>. They lead to equations of motion of the form
  .  
 
In the case of n torsion angles as degrees of freedom, the n &times; n mass matrix M() and the ''n''-dimensional vector can be calculated explicitly (Mazur & Abagyan, 1989; Mazur et al., 1991). To generate a trajectory this linear set of n equations would have to be solved in each time step for the torsional accelerations , formally by
:<math>M(\theta)\ddot\theta + C(\theta,\dot\theta) = 0</math>.
.
   
[[Image:CPUtimeVsSize.jpg|thumb|300px|Fig. 2. Dependence on the protein size of the CPU time for the calculation of one conformer using the CYANA torsion angle dynamics algorithm with 4000 time steps on two different computers (solid and dotted lines) (Güntert et al., 1997).]]
 
In the case of n torsion angles as degrees of freedom, the n &times; n mass matrix ''M''(''&theta;'') and the ''n''-dimensional vector <math>C(\theta,\dot\theta)</math>, can be calculated explicitly (Mazur & Abagyan, 1989; Mazur et al., 1991). To generate a trajectory this linear set of n equations would have to be solved in each time step for the torsional accelerations <math>\ddot\theta</math>, formally by
 
:<math>\ddot\theta = M(\theta)^{-1}C(\theta,\dot\theta)</math>.
 
This requires a computational effort proportional to ''n''<sup>3</sup>, which is prohibitively expensive for larger systems. Therefore, in CYANA the fast recursive algorithm of (Jain et al., 1993) is implemented to compute the torsional accelerations, which makes explicit use of the tree structure of the molecule in order to obtain  with a computational effort that is only proportional to ''n''. The mathematical details and a proof of correctness of the CYANA torsion angle dynamics algorithm are given in  (Jain et al., 1993).
This requires a computational effort proportional to ''n''<sup>3</sup>, which is prohibitively expensive for larger systems. Therefore, in CYANA the fast recursive algorithm of (Jain et al., 1993) is implemented to compute the torsional accelerations, which makes explicit use of the tree structure of the molecule in order to obtain  with a computational effort that is only proportional to ''n''. The mathematical details and a proof of correctness of the CYANA torsion angle dynamics algorithm are given in  (Jain et al., 1993).


[[Image:CPUtimeVsSize.jpg|thumb|300px|Fig. 2. Dependence on the protein size of the CPU time for the calculation of one conformer using the CYANA torsion angle dynamics algorithm with 4000 time steps on two different computers (solid and dotted lines) (Güntert et al., 1997).]]
== Torsional accelerations ==
 
The algorithm of (Jain et al., 1993) computes a factorization of the inverse of the mass matrix, ''M''(''&theta;'')<sup>-1</sup>, into a product of highly sparse matrices with non-zero elements only in 6 &times; 6 blocks on or near the diagonal. As a result, the torsional accelerations can be obtained by executing a series of three linear loops over all rigid bodies similar to the single loop that is needed to compute the kinetic energy, ''E''<sub>kin</sub>.
 
The computation of the torsional accelerations  by the algorithm of Jain et al. (1993) is initialized by calculating for all rigid bodies, ''k'' = 1,…,n, the six-dimensional vectors ''a<sub>k</sub>'', ''e<sub>k</sub>'' and ''z<sub>k</sub>'':
 
:<math>a_k = \begin{bmatrix}\omega_k \wedge e_k)\dot\theta_k \\\omega_{p(k)} \wedge (v_k - v_{p(k)})\end{bmatrix}</math>,
 
:<math>e_k = \begin{bmatrix}e_k\\0\end{bmatrix}</math>,
 
:<math>a_k = \begin{bmatrix}\omega_k \wedge I_k\omega_k\\(\omega_k \cdot m_k Y_k)\omega_k - |\omega_k|^2 m_k Y_k\end{bmatrix}</math>,
 
and the 6 &times; 6 matrices ''P<sub>k</sub>'' and ''&phi;<sub>k</sub>'':
:<math>P_k = \begin{bmatrix}I_k & m_k A(Y_k)\\-m_k A(Y_k) & m_k 1_3\end{bmatrix}</math>,
 
:<math>\phi_k = \begin{bmatrix}1_3 & A(r_k-r_{p(k)})\\0_3 & 1_3\end{bmatrix}</math>.
 
0<sub>3</sub> is the 3 &times; 3 zero matrix, 1<sub>3</sub> is the 3 &times; 3 unit matrix, and ''A''(''y'') denotes the antisymmetric 3 &times; 3 matrix associated with the cross product, i.e., ''A''(''y'')''x'' = ''y'' Λ ''x'' for all vectors ''x''.
 
Next, several auxiliary quantities are calculated by executing a recursive loop over all rigid bodies in the backward direction, ''k'' = ''n'',''n'' - 1,…,1:
 
:<math>\begin{align}D_k &= e_k \cdot P_k e_k\\
                    G_k &= P_k e_k/D_k\\
                    \epsilon_k &= -e_k \cdot (z_k + P_k a_k) - \frac{\partial V}{\partial\theta_k}\\
                    P_{p(k)} &\leftarrow P_{p(k)} + \phi_k(P_k - G_k e_k^T P_k)\phi_k^T\\
                    z_{p(k)} &\leftarrow z_{p(k)} + \phi_k(z_k + P_k a_k - G_k \epsilon_k)
      \end{align}</math>
 
''D<sub>k</sub>'' and ''&epsilon;<sub>k</sub>'' are scalars, ''G<sub>k</sub>'' is a six-dimensional vector, and “&larr;” means: “assign the result of the expression on the right hand side to the variable on the left hand side.” Finally, the tor-sional accelerations are obtained by executing another recursive loop over all rigid bodies in the forward direction, ''k'' = 1,…,''n'':
 
:<math>\begin{align}\alpha_k &= \phi_k^T \alpha_{p(k)}\\
                    \ddot\theta_k &= \epsilon_k/D_k - G_k\cdot\alpha_k\\
                    \alpha_k &\leftarrow \alpha_k + e_k \ddot\theta_k + a_k
      \end{align}</math>
 
The auxiliary quantities ''&alpha;<sub>k</sub>'' are six-dimensional vectors, with ''&alpha;''<sub>0</sub> being equal to the zero vector.
 
== Time step ==
 
=== Leap-frog algorithm ===
 
The integration scheme for the equations of motion in torsion angle dynamics (Mathiowetz et al., 1994) is a variant of the “leap-frog” algorithm used in Cartesian space molecular dynamics (Allen & Tildesley, 1987). To obtain a trajectory, the equations of motion are numerically integrated by advancing the ''i'' = 1,…,''n'' (generalized) coordinates ''q<sub>i</sub>'' and velocities <math>\dot q_i</math> that describe the system by a small but finite time step &Delta;''t'':


=== Torsional accelerations ===
:<math>\begin{align}
\dot q_i(t + \Delta t/2) &= \dot q_i(t - \Delta t/2) + \Delta t \ddot q_i(t) + O(\Delta t^3)\\
q_i(t + \Delta t) &= q_i(t) + \dot q_i(t + \Delta t/2) + O(\Delta t^3)
\end{align}</math>


The algorithm of (Jain et al., 1993) computes a factorization of the inverse of the mass matrix, M(), into a product of highly sparse matrices with non-zero elements only in 66 blocks on or near the diagonal. As a result, the torsional accelerations can be obtained by executing a series of three linear loops over all rigid bodies similar to the single loop that is needed to compute the kinetic energy, Ekin.
The degrees of freedom, ''q<sub>i</sub>'', are the Cartesian coordinates of the atoms in conventional molecular dynamics simulation, or the torsion angles in CYANA. The ''O''(&Delta;''t''<sup>3</sup>) terms indicate that the errors with respect to the exact solution incurred by the use of a finite time step &Delta;''t'' are proportional to &Delta;''t''<sup>3</sup>.
The computation of the torsional accelerations  by the algorithm of Jain et al. (1993) is initialized by calculating for all rigid bodies, k = 1, …, n, the six-dimensional vectors ak,  ek and zk:
[3]
and the 66 matrices Pk and k:
[4]
0 is the three-dimensional zero vector, 03 is the 33 zero matrix, 13 is the 33 unit matrix, and A(y) denotes the antisymmetric 33 matrix associated with the cross product, i.e.,  for all vectors x.
Next, several auxiliary quantities are calculated by executing a recursive loop over all rigid bodies in the backward direction, kn, n1, …, 1:
[5]
Dk and k are scalars, Gk is a six-dimensional vector, and “” means: “assign the result of the expression on the right hand side to the variable on the left hand side.” Finally, the tor-sional accelerations are obtained by executing another recursive loop over all rigid bodies in the forward direction, k1, …, n:
  [6]
The auxiliary quantities k are six-dimensional vectors, with 0 being equal to the zero vec-tor.


=== Time step ===
=== Temperature control ===


The integration scheme for the equations of motion in torsion angle dynamics (Mathiowetz et al., 1994) is a variant of the “leap-frog” algorithm used in Cartesian space molecular dynamics (Allen & Tildesley, 1987). To obtain a trajectory, the equations of motion are numerically integrated by advancing the i1, …, n (generalized) coordinates qi and ve-locities  that describe the system by a small but finite time step t:
The time step &Delta;''t'' must be small enough to sample adequately the fastest motions. Because the fastest motions in conventional molecular dynamics simulation are oscillations of bond lengths and bond angles, which are “frozen” in torsion angle space, longer time steps can be used for torsion angle dynamics than for molecular dynamics in Cartesian space.  
The degrees of freedom, qi, are the Cartesian coordinates of the atoms in conventional molecular dynamics simulation, or the torsion angles in CYANA. The O(t3) terms indicate that the errors with respect to the exact solution incurred by the use of a finite time step t are proportional to t3. The time step t must be small enough to sample adequately the fastest motions. Because the fastest motions in conventional molecular dynamics simulation are oscillations of bond lengths and bond angles, which are “frozen” in torsion angle space, longer time steps can be used for torsion angle dynamics than for molecular dynamics in Cartesian space.  


In practical applications with proteins time steps of about 100, 30 and 7 fs at low (1 K), medium (400 K) and high (10000 K) temperatures, respectively, can be used in torsion angle dynamics calculations with CYANA (Güntert et al., 1997), whereas time steps in Cartesian space molecular dynamics simulation generally have to be in the range of 2 ns. The concomitant fast exploration of conformation space provides the basis for the efficient CYANA structure calculation protocol.
In practical applications with proteins time steps of about 100, 30 and 7 fs at low (1 K), medium (400 K) and high (10000 K) temperatures, respectively, can be used in torsion angle dynamics calculations with CYANA (Güntert et al., 1997), whereas time steps in Cartesian space molecular dynamics simulation generally have to be in the range of 2 ns. The concomitant fast exploration of conformation space provides the basis for the efficient CYANA structure calculation protocol.
Line 103: Line 141:
The temperature is controlled by weak coupling to an external bath (Berendsen et al., 1984). Using a similar approach, the length of the time step is adapted automatically based on the accuracy of energy conservation (Güntert et al., 1997).  
The temperature is controlled by weak coupling to an external bath (Berendsen et al., 1984). Using a similar approach, the length of the time step is adapted automatically based on the accuracy of energy conservation (Güntert et al., 1997).  


A time step, tt + t, that follows a preceding time step, tt't, consists of the fol-lowing parts:
=== Time step ===
 
A time step, ''t'' &rarr; ''t'' + &Delta;''t'', that follows a preceding time step, ''t'' - &Delta;''t''' &rarr; ''t'', consists of the following parts:
 
(1) On the basis of the torsional positions, ''&theta;''(''t''), [[calculate the Cartesian coordinates|Coordinate generation]] of all atoms (Güntert, 1993).


# On the basis of the torsional positions, (t), calculate the Cartesian coordinates of all atoms (Güntert, 1993).
(2) Using the Cartesian coordinates, calculate the [[target function]], i.e. the potential energy function ''E''<sub>pot</sub>(''t'') = ''E''<sub>pot</sub>(''&theta;''(''t'')), and its gradient &nabla;''E''<sub>pot</sub>(''t'').
# Using the Cartesian coordinates, calculate the potential energy function Epot(t) = Epot((t)), and its gradient Epot(t).
# Determine the time-step, t = t', using the time-step scaling factor
.
 is based on the reference value for the relative accuracy of energy conservation, ref, and on the relative change of the total energy, E = Ekin + Epot, in the preceding time-step, (t), as given by
.
  is the maximally allowed value of the scaling factor, which is set to  in CYANA. The time constant, 1, is a user-defined parameter that is measured in units of the time-step. Typically, a value of 20 is used for structure calculations with CYANA. To calculate (t), the total energy E(t) is evaluated before velocity scaling is ap-plied (step 4 below), whereas for E(tt') the value after velocity scaling in the preceding time-step is used. Thus, the measurement of the accuracy of energy conservation is not affected by the scaling of velocities. An exact algorithm would yield E(t)E(tt') and consequently (t)0.
# Adapt the temperature by scaling of the torsional velocities, i.e., replace  by  and  by  (see step 7 below for the definition of  ). The velocity scaling factor, T, is given by (Berendsen et al., 1984)
,
where Tref is the reference value of the temperature. The instantaneous temperature, T(t), is given by T(t) = 2 Ekin(t)/nkB, where n denotes the number of torsion angles and kB is the Boltzmann constant. The kinetic energy, Ekin(t), is taken from equation [1]. Temperature control by coupling to an external heat bath (and time-step control) can be turned off by setting .
# Calculate the torsional accelerations,  , using equations [3–6].
# Calculate the new velocities at half time-step,
# Calculate the new estimated velocities at full time step,
.
# Calculate the new torsional positions,
.


The algorithm is initialized by setting t = 0, t't and  . Furthermore, the initial torsional velocities,  , are chosen randomly from a normal distribution with zero mean value and a standard deviation which ensures that the initial temperature has a predefined value, T(0). Once the time step tt + t is completed by going through the operations 1 to 8, t is replaced by t + t, and t' by t.
(3) Determine the time-step, &Delta;''t'' = ''&lambda;<sub>&epsilon;</sub>''&Delta;''t''', using the time-step scaling factor


:<math>\lambda_\epsilon = \min\left(\lambda_\epsilon^\max,\sqrt{1+\frac{\epsilon^\text{ref} - \epsilon(t)}{\tau\epsilon(t)}}\right)</math>.
''&lambda;<sub>&epsilon;</sub>'' is based on the reference value for the relative accuracy of energy conservation, ''&epsilon;''<sup>ref</sup>, and on the relative change of the total energy, ''E'' = ''E''<sub>kin</sub> + ''E''<sub>pot</sub>, in the preceding time-step, ''&epsilon;''(''t''), as given by
:<math>\epsilon(t) = \left|\frac{E(t)-E(t-\Delta t')}{E(t)}\right|</math>.
<math>\lambda_\epsilon^\max</math> is the maximally allowed value of the scaling factor, which is set to 1.025 in CYANA. The time constant, ''&tau;'' >> 1, is a user-defined parameter that is measured in units of the time-step. Typically, a value of ''&tau;'' = 20 is used for structure calculations with CYANA. To calculate ''&epsilon;''(''t''), the total energy ''E''(''t'') is evaluated before velocity scaling is applied (step 4 below), whereas for ''E''(''t'' - &Delta;''t''&#39;) the value after velocity scaling in the preceding time-step is used. Thus, the measurement of the accuracy of energy conservation is not affected by the scaling of velocities. An exact algorithm would yield ''E''(''t'') = ''E''(''t'' - &Delta;''t''') and consequently ''&epsilon;''(''t'') = 0.
(4) Adapt the temperature by scaling of the torsional velocities, i.e., replace <math>\dot\theta_i(t - \Delta t'/2)</math>  by <math>\lambda_T\dot\theta_i(t - \Delta t'/2)</math> and <math>\dot\theta_e(t)</math> by <math>\lambda_T\dot\theta_e(t)</math> (see step 7 below for the definition of <math>\dot\theta_e(t)</math>). The velocity scaling factor, ''&lambda;<sub>T</sub>'', is given by (Berendsen et al., 1984)
:<math>\lambda_T = \sqrt{1+\frac{T^\text{ref}-T(t)}{\tau T(t)}}</math>,
   
   
Fig. 3. Plots versus the number of torsion angle dynamics steps for a structure calculation with simulated annealing for the protein cyclophilin A. (a) Temperature of the heat bath to which the system is weakly coupled. (b) Integration time step length, t. (c) RMS torsion angle change, , between successive time steps. Data are averaged over 50 time steps and 80 independently calculated conformers (Güntert et al., 1997).
where ''T''<sup>ref</sup> is the reference value of the temperature. The instantaneous temperature, ''T''(''t''), is given by ''T''(''t'') = 2 ''E''<sub>kin</sub>(''t'')/''nk''<sub>B</sub>, where ''E''<sub>kin</sub>(''t'') is the kinetic energy at time ''t'', ''n'' denotes the number of torsion angles and ''k''<sub>B</sub> is the Boltzmann constant. Temperature control by coupling to an external heat bath (and time-step control) can be turned off by setting ''&tau;'' = &infin;.
 
(5) Calculate the torsional accelerations, <math>\ddot\theta(t)=\ddot\theta(\theta(t),\dot\theta_e(t))</math>, using equations [3–6].
 
(6) Calculate the new velocities at half time-step,
 
:<math>\dot\theta(t + \Delta t/2) = \dot\theta(t - \Delta t'/2) + \frac{\Delta t+\Delta t'}{2} \ddot\theta(t)</math>.
 
(7) Calculate the new estimated velocities at full time step,
 
:<math>\dot\theta_e(t + \Delta t) = \left(1+\frac{\Delta t}{\Delta t+\Delta t'}\right)\dot\theta(t + \Delta t/2) - \frac{\Delta t}{\Delta t+\Delta t'}{2} \dot\theta(t-\Delta t'/2)</math>.
 
(8) Calculate the new torsional positions,


Unlike the situation in Cartesian dynamics, where the accelerations are a function of the positions only, the torsional accelerations also depend on the velocities. These, however, are only known at half time-steps, whereas the positions and accelerations are required at full time-steps. In step 7 the presently used algorithm employs linear extrapolation to obtain an estimate of the velocity after the full time-step,  , which is used in the next integration step to calculate the torsional accelerations (step 5). They are therefore beset with an additional error that could be eliminated by iterating the steps 5–7. In general, no significant improvement can be observed after one iteration (Mathiowetz et al., 1994). Using  instead of  introduces an additional error of order O(t3) into the velocity calculation of the leap-frog algorithm in step 6. However, the intrinsic accuracy of the velocity step is also O(t3). Thus, using the estimated velocities,  , does not change the order of accuracy of the integration algorithm. Each iteration of steps 5–7 requires the calculation of the torsional accelerations which, in general, takes as long as the calculation of the target function and its gradient. It is therefore more efficient to slightly decrease the time step, t, than to perform steps 5–7 twice in each integration step. The situation could be different if a full physical force field were used, since then the calculation of torsional accelerations would require only a negligible fraction of the total computation time.
:<math>\theta(t + \Delta t) = \theta(t) + \Delta t \dot\theta(t+\Delta t/2)</math>.


Since in structure calculations with torsion angle dynamics the time-steps are made as long as possible a safeguard against occasional strong violations of energy conservation was introduced: If the total energy, E, has changed by more than 10% in a single time-step, this time-step is rejected and replaced by two time-steps of half length.
The algorithm is initialized by setting ''t'' = 0, &Delta;''t&#39;'' = &Delta;''t'' and <math>\dot\theta_e(0)=\dot\theta(-\Delta t/2)</math>. The initial torsional velocities, <math>\dot\theta(-\Delta t/2)</math>, are chosen randomly from a normal distribution with zero mean value and a standard deviation which ensures that the initial temperature has a predefined value, ''T''(0). Once the time step ''t'' &rarr; ''t'' + &Delta;''t'' is completed by going through the operations 1 to 8, ''t'' is replaced by ''t'' + &Delta;''t'', and &Delta;''t&#39;'' by &Delta;''t''.


=== Global orientation of the molecule ===
=== Accuracy of acceleration calculation ===


In the original implementation of torsion angle dynamics in the program DYANA (Güntert et al., 1997), the N-terminus of the protein was fixed in space. This was not a prerequisite of the torsion angle dynamics algorithm of (Jain et al., 1993), but it appeared conceptually more appealing since the system then had only one type of degrees of freedom, the torsion angles, whereas for the description of the global position and orientation of a molecule that can freely reorient in space, one has to introduce six additional degrees of freedom of a different type. As a consequence of fixing the N-terminus of the polypeptide chain, the total angular and linear momenta of the system cannot be conserved, since the conservation laws for these quantities result from the fact that a mechanical system is invariant under global rotation and translation (Arnold, 1978). Therefore, if a part of the molecule is fixed in space, it is in general not possible for the total angular momentum to be zero throughout the calculation. Global rotations associated with a non-vanishing total angular momentum may lead to centrifugal forces that can influence the sampling of conformation space for long, flexible molecules (Güntert & Wüthrich, 2001). Therefore, CYANA allows the molecule to reorient freely in space, and the total angular and linear momenta of the system are conserved and periodically reset to zero.
[[Image:Schedule.jpg|thumb|300px|Fig. 3. Plots versus the number of torsion angle dynamics steps for a structure calculation with simulated annealing for the protein cyclophilin A. (a) Temperature of the heat bath to which the system is weakly coupled. (b) Integration time step length, &Delta;''t''. (c) RMS torsion angle change, ''&sigma;<sub>&theta;</sub>'', between successive time steps. Data are averaged over 50 time steps and 80 independently calculated conformers (Güntert et al., 1997).]]
             
Fig. 4. Structures and Ramachandran plots of unconstrained Ala60 polypeptide chains. (a) Generated by randomizing the torsion angles. (b) Calculated using the standard CYANA torsion angle dynamics with momentum compensation. (c) Calculated using torsion angle dynamics in CYANA without momentum compensation. Each structure bundle consists of 50 conformers that were superimposed for minimal RMSD (Güntert & Wüthrich, 2001).


The position and orientation of the molecule in the inertial frame are specified, respectively, by the three Cartesian coordinates of the reference point of the base cluster, r0, and by 4 quaternion parameters q = (q0, q1, q2, q3), which are subject to the normalization condition  . The rotation matrix that describes the orientation of the base cluster with respect to the inertial frame is given in terms of the quaternion parameters by
Unlike the situation in Cartesian dynamics, where the accelerations are a function of the positions only, the torsional accelerations also depend on the velocities. These, however, are only known at half time-steps, whereas the positions and accelerations are required at full time-steps. In step 7 the presently used algorithm employs linear extrapolation to obtain an estimate of the velocity after the full time-step, <math>\dot\delta_e(t)</math>, which is used in the next integration step to calculate the torsional accelerations (step 5). They are therefore beset with an additional error that could be eliminated by iterating the steps 5–7. In general, no significant improvement can be observed after one iteration (Mathiowetz et al., 1994). Using <math>\dot\theta_e(t)</math> instead of <math>\dot\theta(t)</math> introduces an additional error of order ''O''(&Delta;''t''<sup>3</sup>) into the velocity calculation of the leap-frog algorithm in step 6. However, the intrinsic accuracy of the velocity step is also ''O''(&Delta;''t''<sup>3</sup>). Thus, using the estimated velocities, <math>\dot\delta_e(t)</math>, does not change the order of accuracy of the integration algorithm. Each iteration of steps 5–7 requires the calculation of the torsional accelerations which, in general, takes as long as the calculation of the target function and its gradient. It is therefore more efficient to slightly decrease the time step, &Delta;''t'', than to perform steps 5–7 twice in each integration step. The situation could be different if a full physical force field were used, since then the calculation of torsional accelerations would require only a negligible fraction of the total computation time.
.
The following two relations hold between the angular velocity of the base cluster, 0, and the time-derivative of the quaternion parameters,   (Kneller & Hinsen, 1994):
and
, [7]
where the superscript T denotes the transpose, and A(q) is given by
.
The additional degrees of freedom that enable free global reorientation of the molecule are incorporated in the CYANA torsion angle dynamics algorithm by extending the loops over the clusters  so as to include the base cluster (cluster 0). Instead of using  00 and v00, the angular and linear velocities of the base cluster are now given by  and  , respectively, and the value of 0 in the recursive calculation of the torsional accelerations is the solution of the six-dimensional linear system of equations P00z0, in which P0 and z0 are calculated as described above. The second time derivatives of the quaternion parameters are obtained by differentiation of the equation [7] with respect to time. This implementation of the algorithm preserves the total linear momentum and the total angular momentum of the system.
2.9. Compensation of angular and linear momenta
The total angular momentum of the molecule with respect to the origin of the inertial frame is
.
Similarly, the total linear momentum of the molecule is
.
A change in the angular and linear velocity of the base cluster, 00 + 0 and v0v0 + v0, leads to concomitant changes in the velocities of the other clusters to the extent of k0 and vkv00 (rk – r0). Hence, the changes induced in the total angular and linear momenta of the molecule are
, [8]
(with  for all three-dimensional vectors x) and
, [9]
where    is the total mass of the molecule, and 
the position of its center of mass. The angular and linear momentum changes are linear functions of the base cluster velocity changes:
, [10]
where B is a 66 matrix with elements given by equations [8–9]. Solving equation [10] for the case LL and PP yields the values 0 and v0 by which the velocities of the base cluster have to be changed in order to set the angular and linear momenta of the molecule from given values of L and P to zero.
Using this procedure, the angular and linear momenta are reset to zero after each sequence of 10 torsion angle dynamics steps. The additional computational effort for the momentum compensation is minimal and scales linearly with the size of the molecule.
=== Torsion angle dynamics command ‘md’ ===


md steps=integer (required)
Since in structure calculations with torsion angle dynamics the time-steps are made as long as possible a safeguard against occasional strong violations of energy conservation was introduced: If the total energy, ''E'', has changed by more than 10% in a single time-step, this time-step is rejected and replaced by two time-steps of half length.
dt=real (0.05)
level=integer (0)
temperature=real (0.1)
accuracy=real (0.0)
tau=real (0.0)
nprint=integer (0)
tinit=real (0.0)
estart=real (0.01)
angdev=real (10.0°)
vdwupdate=integer (100)
exact  continue
Performs molecular dynamics simulation in torsion angle space (torsion angle dynamics) with the following parameters:
steps Number of time steps.
dt Time step length.
level Maximal residue index difference for restraints that are included into the target function.
temperature  Temperature of the heat bath to which the system is coupled, or 0.0 for a simulation at constant total energy.
accuracy Desired relative accuracy of energy conservation for automatic time step length adaption, or 0.0 for a simulation with constant time step length.
tau Coupling constant for temperature and time step length updates, given in units of the time step length.
nprint Number of time steps between intermediate output.
tinit Initial time.
estart Initial temperature (kinetic energy per degree of freedom).
angdev Maximal change of a dihedral angle (in degrees) between two updates of the van der Waals pair list.
vdwupdate Maximal number of torsion angle dynamics steps between two updates of the van der Waals pair list.
The md command can start a new torsion angle dynamics run, or, if the option continue is set, continue a preceding torsion angle dynamics calculation. A new molecular dynamics trajectory starts at time tinit (normally 0.0) with random torsional velocities, chosen as Gaussian random variables such that the initial temperature (kinetic energy per degree of freedom) equals estart. If the md command is used to continue a previous calculation, then the velocities from the end of the previous md command are used and all parameters that are not specified explicitly are kept at the values of the previous md command. The parameters tinit and estart are not allowed in conjunction with the continue option.
If the temperature is set to 0.0, a molecular dynamics simulation at constant energy is performed. Normally, however, the system is weakly coupled to a heat bath of the given temperature with a time constant of tau times the time step length (Berendsen et al., 1984). The temperature can either be a constant or a function of the parameter 's', which varies linearly from 0 to 1 during the time steps that are performed during the execution of the md command, i.e., s(n)(n1)(N1) for step n out of a total of Nsteps time steps.
If the accuracy reference value for the accuracy of energy conservation has a positive value, then the length of the integration time step will be adapted during the molecular dynamics simulation such that the relative change of the total energy in successive integration steps is close to the given accuracy. The adaption of the time step length works in the same way as the temperature control. The time step length corresponds to the torsional velocities, and the relative change of the total (kinetic plus potential) energy corresponds to the temperature. In this case, the parameter 'dt' specifies the only initial value for the time step length.
The van der Waals interaction pair list is updated after at most vdwupdate torsion angle dynamics steps or whenever a torsion angle has changed its value by more than angdev degrees since the last update of the an der Waals interaction list.
The “leap-frog” algorithm is used to perform the torsion angle dynamics steps. Usually, torsional accelerations are computed on the basis torsional velocity values that are linearly extrapolated from those half a time-step earlier. More exact torsional acceleration values that are calculated iteratively (Mathiowetz et al., 1994) will be used if the option exact is set.
One line of output is written every nprint time steps, giving the current step, current time, potential energy (target function value), kinetic energy, total energy, the root-mean-square torsion angle change per time step (in degrees; averaged over all time steps since the last output), the maximal torsion angle change per time step (in degrees; since the last output), the number of updates of the van der Waals interaction list since the last output, and the number of target function evaluations since the last output. For example:
step     time      Epot      Ekin      Etot      rmsdev  maxdev  #up    #f
  0 0.000 17817.672  5776.000 23593.672   1      1
200  13.778  4367.090  7321.274 11688.363      2.842  18.576    4    204
400  28.471  2896.928  6002.219  8899.147      2.763  16.301    4    206
600  42.374  2464.380  6988.264  9452.645      2.330  13.941    4    200
800  60.234  2496.055  6167.296  8663.351      2.815  15.211    4    200
1000  76.882  1654.211  5322.900  6977.111      2.779  15.591    4    200
Intermediate output is written only if nprint is positive. All energies are measured in target function units. Temperatures are measured in target function units per degree of freedom. Each rotatable torsion angle constitutes a degree of freedom.
A warning is printed if in a single time step the value of a dihedral angle changed by more than 35°, and an error occurs if the change exceeds 90°.
3. Simulated annealing
The potential energy landscape of a protein is complex and studded with many local minima, even in the presence of experimental restraints and when using a simplified target function. Because the temperature, i.e. kinetic energy, determines the maximal height of energy barriers that can be overcome in a molecular dynamics trajectory, the temperature schedule is important for the success and efficiency of a simulated annealing calculation. Elaborated protocols have been devised for structure calculations using molecular dynamics in Cartesian space (Brünger, 1992; Nilges et al., 1988). In addition to the temperature, other parameters such as force constants and repulsive core radii are varied in these schedules that may involve several stages of heating and cooling. The fast exploration of conformation space with torsion angle dynamics allows for simpler schedules.
3.1. Standard CYANA simulated annealing schedule
The standard simulated annealing protocol in the program CYANA includes N torsion angle dynamics time steps. It starts from a conformation with all torsion angles treated as independent, uniformly distributed random variables and consists of five stages:
(1) Initial minimization. A short conjugate gradient minimization is applied to reduce high energy interactions that could otherwise disturb the torsion angle dynamics algorithm: 100 conjugate gradient minimization steps are performed, including only distance restraints between atoms up to 3 residues apart along the sequence, followed by a further 100 minimization steps including all restraints. For efficiency, all hydrogen atoms are excluded from the check for steric overlap, the repulsive core radii of heavy atoms without covalently bound hydrogen atoms are decreased by 0.2 Å with respect to their standard values, and the radii of heavy atoms with covalently bound hydrogens are decreased by 0.05 Å. The weighting factors in the target function are set to 1 for user-defined upper and lower distance bounds, and to 0.5 for steric lower distance bounds.
(2) First simulated annealing stage with reduced heavy atom radii. A torsion angle dynamics trajectory with (N200)3 time steps is generated. Typically, one fifth of these torsion angle dynamics steps are performed at a constant high reference temperature Thigh of, typically, 10000 K, followed by slow cooling according to a fourth-power law to an intermediate reference temperature Tmed = Thigh/20. The time step is initialized to 2 fs. The list of van der Waals lower distance bounds is updated every 50 steps using a cutoff equal to twice the largest van der Waals radius plus 1 Å (= 4.2 Å for proteins) for the van der Waals pair list generation throughout all torsion angle dynamics phases.
(3) Second simulated annealing stage with normal heavy atom radii and, later, normal hydrogen atom radii. The repulsive core radii of all heavy atoms are reset to their standard values, 50 conjugate gradient minimization steps are performed, and the torsion angle dynamics trajectory is continued for 2(N200)3 time step starting with an initial time step that is half as long as the last preceding time step. The reference temperature is decreased according to a fourth-power law from the intermediate temperature Tmed to zero reference temperature. After two thirds of these time steps, the hydrogen atoms are included, with their standard radii, in the steric overlap check, and 50 conjugate gradient minimization steps are performed before continuing the trajectory, starting with a time step that is half as long as the last preceding time step.
(4) Low temperature phase with increased weight for steric repulsion. The weighting factor for steric restraints is increased to 2, and 50 conjugate gradient minimization steps are performed, followed by 200 torsion angle dynamics steps at zero reference temperature, starting with a time step that is half as long as the last preceding time step.
(5) Final minimization. A final minimization with a maximum of, typically, 1000 conjugate gradient steps is applied.
3.2. Simulated annealing command ‘anneal’
anneal thigh=real (8.0)
steps=integer (4000)
highsteps=integer ((steps – 200)/15)
minsteps=integer (1000)
relax
Performs simulated annealing on the current structure with the given total of torsion angle dynamics steps, starting with highsteps of torsion angle dynamics at temperature thigh followed by slow cooling during the remaining torsion angle dynamics steps to zero final reference temperature. Finally, minsteps of conjugate gradient minimization are performed. The temperature is measured in target function units per degree of freedom. More minimization can be performed with the relax option to reduce strong overlaps and restraint violations prior to the start of the MD calculation. The relax option can be useful for proteins with more than 250 residues to avoid excessively large target function values at the outset of the simulated annealing torsion angle dynamics.


== References ==
== References ==

Latest revision as of 19:30, 25 February 2023

Introduction

Torsion angle dynamics versus other structure calculation algorithms

The structure calculation algorithm in CYANA is based on the idea of simulated annealing (Kirkpatrick et al., 1983) by molecular dynamics simulation in torsion angle space. The distinctive feature of molecular dynamics simulation when compared to the straightforward minimization of a target function is the presence of kinetic energy that allows overcoming barriers of the potential surface, which reduces greatly the problem of becoming trapped in local minima.

Torsion angle dynamics, i.e. molecular dynamics simulation using torsion angles instead of Cartesian coordinates as degrees of freedom, provides at present the most efficient way to calculate NMR structures of biological macromolecules. The only degrees of freedom are the torsion angles, i.e. rotations about single bonds, such that the conformation of the molecule is uniquely specified by the values of all torsion angles. Covalent bonds that are incompatible with a tree structure because they would introduce closed flexible rings, for example disulphide bridges, are treated, as in Cartesian space dynamics, by distance constraints.

When compared with other structure calculation algorithms based on simulated annealing (Brünger, 1992), the principal difference of torsion angle dynamics is that it works with internal coordinates rather than Cartesian coordinates. Thereby the number of degrees of freedom is decreased about ten-fold, since the covalent structure parameters (bond lengths, bond angles, chiralities and planarities) are kept fixed at their optimal values during the calculation. The strong potentials required to preserve the covalent structure in conventional Cartesian space molecular dynamics and the concomitant high-frequency motions are absent in torsion angle dynamics. This results in a simpler potential energy function and the use of longer time-steps for the numerical integration of the equations of motion, and hence in higher efficiency of the structure calculation. An additional, practical advantage of working in torsion angle space is that this approach excludes inadvertent changes of chirality during the course of a structure calculation, which may happen with Cartesian space molecular dynamics (Schultze & Feigon, 1997).

To fully exploit the potential advantages of torsion angle dynamics for macromolecular structure calculation, its implementation has to take account of the fact that the Lagrange equations of motion with torsion angles as degrees of freedom are more complex than Newton’s equations in Cartesian coordinates. The computation time for “naïve” (but by no means simple) approaches (Mazur & Abagyan, 1989; Mazur et al., 1991) to torsion angle dynamics rises with the third power of the system size, which renders these algorithms unsuitable for use with macromolecules. CYANA gets around this potential drawback by using a fast recursive implementation of the equations of motion that was originally developed for spacecraft dynamics and robotics (Jain et al., 1993), which entails a computational effort proportional to the number of torsion angles. With the fast torsion angle dynamics algorithm in CYANA the advantages of torsion angle dynamics, especially that much longer integration time steps can be used, are effective for molecules of all sizes.

NMR structure calculation versus molecular dynamics simulation

There is a fundamental difference between molecular dynamics simulation that has the aim of simulating the trajectory of a molecular system as realistically as possible in order to extract molecular quantities of interest and NMR structure calculation that is driven by experimental restraints.

Classical molecular dynamics simulations rely on a full empirical force field to ensure proper stereochemistry, and are generally run at a constant temperature, close to room temperature. Substantial amounts of computation time are required because the empirical energy function includes long-range pair interactions that are time-consuming to evaluate, and because conformation space is explored slowly at room temperature.

When molecular dynamics algorithms are used for NMR structure calculations, however, the objective is quite different. Here, such algorithms simply provide a means to efficiently optimize a target function that takes the role of the potential energy. Details of the calculation, such as the course of a trajectory, are unimportant, as long as its end point comes close to the global minimum of the target function.

Therefore, the efficiency of NMR structure calculation can be enhanced by simplification or modification of the force field and/or the algorithm that do not significantly alter the location of the global minimum (the correctly folded structure) but shorten, in terms of the computation time needed, the path by which it can be reached from the start conformation.

A typical “geometric” force field used in NMR structure calculation therefore retains only the most important part of the non-bonded interaction by a simple repulsive potential that replaces the Lennard-Jones and electrostatic interactions of the full empirical energy function. This short-range repulsive function can be calculated much faster and significantly facilitates large-scale conformational changes that are required during the folding process by lowering energy barriers induced by the overlap of atoms.

Tree structure

The key idea of the fast torsion angle dynamics algorithm in CYANA is to exploit the fact that a chain molecule such as a protein or nucleic acid can be represented in a natural way as a tree structure consisting of n + 1 rigid bodies or “clusters” that are connected by n rotatable bonds. Each rigid body is made up of one or several mass points (atoms) with fixed relative positions. The tree structure starts from a base, typically at the N-terminus of the polypeptide chain, and terminates with “leaves” at the ends of the side-chains and at the C-terminus. The only degrees of freedom are rotations about single bonds, and parameters that define the position and orientation of the molecule in space. The clusters are numbered from 0 to n. The base cluster has the number k = 0. Each of the other clusters, with numbers k ≥ 1, has a single nearest neighbor in the direction toward the base, which has a number p(k) < k. The torsion angle between the rigid bodies p(k) and k is denoted by θk. The conformation of the molecule is uniquely specified by the values of all torsion angles, (θ1,…,θn).

Fig. 1. (a) Tree structure of torsion angles for the tripeptide Val-Ser-Ile. Circles represent rigid units. Rotatable bonds are indicated by arrows that point towards the part of the structure that is rotated if the corresponding torsion angle is changed. (b) Excerpt from the tree structure formed by the torsion angles of a molecule, and definition of quantities required by the CYANA torsion angle dynamics algorithm.

The following quantities are defined for each cluster k (see Fig. 1): the “reference point”, rk, which is the position vector of the end point of the bond between the clusters p(k) and k; , the velocity of the reference point; ωk, the angular velocity vector of the cluster; Yk, the vector from the reference point to the center of mass of the cluster; mk, the mass of the cluster k; Ik, the inertia tensor of the cluster k with respect to the reference point, given by

where the sum runs over all atoms in the cluster k, mα is the mass of the atom α, yα is the vector from the reference point of cluster k to the atom α, and I(yα) is the symmetric 3 × 3 matrix defined by the relation I(y) = y Λ (x Λ y) for all three-dimensional vectors x. The symbol “Λ” denotes the vector product. All position vectors are in an inertial frame of reference that is fixed in space.

Kinetic energy

The angular velocity vector ωk and the linear velocity vk of the reference point of the rigid body k are calculated recursively from the corresponding quantities of the preceding rigid body p(k):

,
.

The kinetic energy can then be computed in a linear loop over all rigid bodies:

.

Potential energy = target function

For torsion angle dynamics the CYANA target function (Güntert et al., 1991; Güntert et al., 1997), V, has the role of the potential energy, Epot = V, and energies are reported in units of the target function V, i.e. Å2, in the current version of CYANA. If energies and temperatures are to be reported in their standard units of kJ/mol and K, we assume that Epot = w0V, with an overall (arbitrary) scaling factor w0 = 10 kJ mol-1Å-2.

Because the target function is not a physical potential, there is no unique “natural” way to define the energy and temperature scale for protein structure calculations.

Generalized forces = torques = –gradient of the target function

The torques about the rotatable bonds, i.e., the negative gradients of the potential energy or target function with respect to torsion angles, -∇V(θ), are calculated by a fast recursive algorithm (Abe et al., 1984). The gradient of the target function can be calculated efficiently because the target function is a sum of functions of individual interatomic distances and torsion angles. The partial derivative of the target function V with respect to a torsion angle θk is given by

with

,
.

rα and rβ are the position vectors of the atoms α and β, and Mk denotes the set of all atoms whose position is affected by a change of the torsion angle θk if the base cluster is held fixed. Because Mk is a subset of Mp(k), the quantities fk and gk can be calculated recursively for k = n,n - 1,…,1 starting from the leaves of the tree structure by evaluating the interaction for each atom pair (α,β) only once.

Equations of motion

The calculation of the torsional accelerations, i.e. the second time derivatives of the torsion angles, is the crucial point of a torsion angle dynamics algorithm. The equations of motion for a classical mechanical system with generalized coordinates are the Lagrange equations

with the Lagrange function L = Ekin - Epot. They lead to equations of motion of the form

.
Fig. 2. Dependence on the protein size of the CPU time for the calculation of one conformer using the CYANA torsion angle dynamics algorithm with 4000 time steps on two different computers (solid and dotted lines) (Güntert et al., 1997).

In the case of n torsion angles as degrees of freedom, the n × n mass matrix M(θ) and the n-dimensional vector , can be calculated explicitly (Mazur & Abagyan, 1989; Mazur et al., 1991). To generate a trajectory this linear set of n equations would have to be solved in each time step for the torsional accelerations , formally by

.

This requires a computational effort proportional to n3, which is prohibitively expensive for larger systems. Therefore, in CYANA the fast recursive algorithm of (Jain et al., 1993) is implemented to compute the torsional accelerations, which makes explicit use of the tree structure of the molecule in order to obtain with a computational effort that is only proportional to n. The mathematical details and a proof of correctness of the CYANA torsion angle dynamics algorithm are given in (Jain et al., 1993).

Torsional accelerations

The algorithm of (Jain et al., 1993) computes a factorization of the inverse of the mass matrix, M(θ)-1, into a product of highly sparse matrices with non-zero elements only in 6 × 6 blocks on or near the diagonal. As a result, the torsional accelerations can be obtained by executing a series of three linear loops over all rigid bodies similar to the single loop that is needed to compute the kinetic energy, Ekin.

The computation of the torsional accelerations by the algorithm of Jain et al. (1993) is initialized by calculating for all rigid bodies, k = 1,…,n, the six-dimensional vectors ak, ek and zk:

,
,
,

and the 6 × 6 matrices Pk and φk:

,
.

03 is the 3 × 3 zero matrix, 13 is the 3 × 3 unit matrix, and A(y) denotes the antisymmetric 3 × 3 matrix associated with the cross product, i.e., A(y)x = y Λ x for all vectors x.

Next, several auxiliary quantities are calculated by executing a recursive loop over all rigid bodies in the backward direction, k = n,n - 1,…,1:

Dk and εk are scalars, Gk is a six-dimensional vector, and “←” means: “assign the result of the expression on the right hand side to the variable on the left hand side.” Finally, the tor-sional accelerations are obtained by executing another recursive loop over all rigid bodies in the forward direction, k = 1,…,n:

The auxiliary quantities αk are six-dimensional vectors, with α0 being equal to the zero vector.

Time step

Leap-frog algorithm

The integration scheme for the equations of motion in torsion angle dynamics (Mathiowetz et al., 1994) is a variant of the “leap-frog” algorithm used in Cartesian space molecular dynamics (Allen & Tildesley, 1987). To obtain a trajectory, the equations of motion are numerically integrated by advancing the i = 1,…,n (generalized) coordinates qi and velocities that describe the system by a small but finite time step Δt:

The degrees of freedom, qi, are the Cartesian coordinates of the atoms in conventional molecular dynamics simulation, or the torsion angles in CYANA. The Ot3) terms indicate that the errors with respect to the exact solution incurred by the use of a finite time step Δt are proportional to Δt3.

Temperature control

The time step Δt must be small enough to sample adequately the fastest motions. Because the fastest motions in conventional molecular dynamics simulation are oscillations of bond lengths and bond angles, which are “frozen” in torsion angle space, longer time steps can be used for torsion angle dynamics than for molecular dynamics in Cartesian space.

In practical applications with proteins time steps of about 100, 30 and 7 fs at low (1 K), medium (400 K) and high (10000 K) temperatures, respectively, can be used in torsion angle dynamics calculations with CYANA (Güntert et al., 1997), whereas time steps in Cartesian space molecular dynamics simulation generally have to be in the range of 2 ns. The concomitant fast exploration of conformation space provides the basis for the efficient CYANA structure calculation protocol.

The temperature is controlled by weak coupling to an external bath (Berendsen et al., 1984). Using a similar approach, the length of the time step is adapted automatically based on the accuracy of energy conservation (Güntert et al., 1997).

Time step

A time step, tt + Δt, that follows a preceding time step, t - Δt't, consists of the following parts:

(1) On the basis of the torsional positions, θ(t), Coordinate generation of all atoms (Güntert, 1993).

(2) Using the Cartesian coordinates, calculate the target function, i.e. the potential energy function Epot(t) = Epot(θ(t)), and its gradient ∇Epot(t).

(3) Determine the time-step, Δt = λεΔt', using the time-step scaling factor

.

λε is based on the reference value for the relative accuracy of energy conservation, εref, and on the relative change of the total energy, E = Ekin + Epot, in the preceding time-step, ε(t), as given by

.

is the maximally allowed value of the scaling factor, which is set to 1.025 in CYANA. The time constant, τ >> 1, is a user-defined parameter that is measured in units of the time-step. Typically, a value of τ = 20 is used for structure calculations with CYANA. To calculate ε(t), the total energy E(t) is evaluated before velocity scaling is applied (step 4 below), whereas for E(t - Δt') the value after velocity scaling in the preceding time-step is used. Thus, the measurement of the accuracy of energy conservation is not affected by the scaling of velocities. An exact algorithm would yield E(t) = E(t - Δt') and consequently ε(t) = 0.

(4) Adapt the temperature by scaling of the torsional velocities, i.e., replace by and by (see step 7 below for the definition of ). The velocity scaling factor, λT, is given by (Berendsen et al., 1984)

,

where Tref is the reference value of the temperature. The instantaneous temperature, T(t), is given by T(t) = 2 Ekin(t)/nkB, where Ekin(t) is the kinetic energy at time t, n denotes the number of torsion angles and kB is the Boltzmann constant. Temperature control by coupling to an external heat bath (and time-step control) can be turned off by setting τ = ∞.

(5) Calculate the torsional accelerations, , using equations [3–6].

(6) Calculate the new velocities at half time-step,

.

(7) Calculate the new estimated velocities at full time step,

.

(8) Calculate the new torsional positions,

.

The algorithm is initialized by setting t = 0, Δt' = Δt and . The initial torsional velocities, , are chosen randomly from a normal distribution with zero mean value and a standard deviation which ensures that the initial temperature has a predefined value, T(0). Once the time step tt + Δt is completed by going through the operations 1 to 8, t is replaced by t + Δt, and Δt' by Δt.

Accuracy of acceleration calculation

Fig. 3. Plots versus the number of torsion angle dynamics steps for a structure calculation with simulated annealing for the protein cyclophilin A. (a) Temperature of the heat bath to which the system is weakly coupled. (b) Integration time step length, Δt. (c) RMS torsion angle change, σθ, between successive time steps. Data are averaged over 50 time steps and 80 independently calculated conformers (Güntert et al., 1997).

Unlike the situation in Cartesian dynamics, where the accelerations are a function of the positions only, the torsional accelerations also depend on the velocities. These, however, are only known at half time-steps, whereas the positions and accelerations are required at full time-steps. In step 7 the presently used algorithm employs linear extrapolation to obtain an estimate of the velocity after the full time-step, , which is used in the next integration step to calculate the torsional accelerations (step 5). They are therefore beset with an additional error that could be eliminated by iterating the steps 5–7. In general, no significant improvement can be observed after one iteration (Mathiowetz et al., 1994). Using instead of introduces an additional error of order Ot3) into the velocity calculation of the leap-frog algorithm in step 6. However, the intrinsic accuracy of the velocity step is also Ot3). Thus, using the estimated velocities, , does not change the order of accuracy of the integration algorithm. Each iteration of steps 5–7 requires the calculation of the torsional accelerations which, in general, takes as long as the calculation of the target function and its gradient. It is therefore more efficient to slightly decrease the time step, Δt, than to perform steps 5–7 twice in each integration step. The situation could be different if a full physical force field were used, since then the calculation of torsional accelerations would require only a negligible fraction of the total computation time.

Since in structure calculations with torsion angle dynamics the time-steps are made as long as possible a safeguard against occasional strong violations of energy conservation was introduced: If the total energy, E, has changed by more than 10% in a single time-step, this time-step is rejected and replaced by two time-steps of half length.

References

ABE, H., BRAUN, W., NOGUTI, T. & GO, N. (1984). Rapid calculation of 1st and 2nd derivatives of conformational energy with respect to dihedral angles for proteins - General recurrent equations. Computers & Chemistry 8, 239–247.

ALLEN, M. P. & TILDESLEY, D. J. (1987). Computer simulation of liquids. Oxford: Clarendon Press.

ARNOLD, V. I. (1978). Mathematical methods of classical mechanics. New York: Springer.

BERENDSEN, H. J. C., POSTMA, J. P. M., VAN GUNSTEREN, W. F., DINOLA, A. & HAAK, J. R. (1984). Molecular dynamics with coupling to an external bath. Journal of Chemical Physics 81, 3684-3690.

BRÜNGER, A. T. (1992). X-PLOR, Version 3.1. A System for X-ray Crystallography and NMR. New Haven, CT: Yale University Press.

GÜNTERT, P. (1993). Neue Rechenverfahren für die Proteinstrukturbestimmung mit Hilfe der magnetischen Kernspinresonanz. Ph.D. thesis, ETH.

GÜNTERT, P., BRAUN, W. & WÜTHRICH, K. (1991). Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. Journal of Molecular Biology 217, 517–530.

GÜNTERT, P., MUMENTHALER, C. & WÜTHRICH, K. (1997). Torsion angle dynamics for NMR structure calculation with the new program DYANA. Journal of Molecular Biology 273, 283–298.

GÜNTERT, P. & WÜTHRICH, K. (2001). Sampling of conformation space in torsion angle dynamics calculations. Computer Physics Communications 138, 155–169.

JAIN, A., VAIDEHI, N. & RODRIGUEZ, G. (1993). A fast recursive algorithm for molecular dynamics simulation. Journal of Computational Physics 106, 258–268.

KIRKPATRICK, S., GELATT, C. D. & VECCHI, M. P. (1983). Optimization by Simulated Annealing. Science 220, 671–680.

KNELLER, G. R. & HINSEN, K. (1994). Generalized Euler equations for linked rigid bodies. Physical Review E 50, 1559–1564.

MATHIOWETZ, A. M., JAIN, A., KARASAWA, N. & GODDARD, W. A., III. (1994). Protein simulations using techniques suitable for very large systems: the cell multipole method for nonbond interactions and the Newton-Euler inverse mass operator method for internal coordinate dynamics. Proteins: Structure, Function, and Bioinformatics 20, 227–247.

MAZUR, A. K. & ABAGYAN, R. A. (1989). New methodology for computer-aided modeling of biomolecular structure and dynamics 1. Non-cyclic structures. Journal of Biomolecular Structure & Dynamics 6, 815–832.

MAZUR, A. K., DOROFEEV, V. E. & ABAGYAN, R. A. (1991). Derivation and testing of explicit equations of motion for polymers described by internal coordinates. Journal of Computational Physics 92, 261–272.

NILGES, M., CLORE, G. M. & GRONENBORN, A. M. (1988). Determination of three-dimensional structures of proteins from interproton distance data by hybrid distance geometry-dynamical simulated annealing calculations. FEBS Letters 229, 317–324.

SCHULTZE, P. & FEIGON, J. (1997). Chirality errors in nucleic acid structures. Nature 387, 668-668.