The state of molecular mechanics simulations and their analysis has evolved considerably
since the field's inception in the mid-1960s [1, 2]. This radical evolution has been driven
by increased scientific knowledge in conjunction with faster and cheaper computing
hardware. As our knowledge improves (often by co-application of experiment
and theory to individual problems), aging, monolithic software codes are being
discarded in favor of those that are modern, extensible, and easier to use.
Experimentalists want an approachable molecular mechanics tool that allows them
to ask and rapidly answer questions regarding their systems of interest. Similarly,
theoreticians desire a reliable, validated base upon which they can quickly build
new tools to probe aspects of molecular behavior. Extensibility and ease of use
have been problems for molecular mechanics simulation and analysis environments.
The desire for easy-to-use and extensible molecular mechanics platforms is accompanied
by the need for software environments that can take advantage of the inexpensive
multi-processor hardware that has become available over the past decade. Further,
there is increased multi-disciplinary dialog between computer science and structural
biology such that new approaches to traditionally hard computational problems have
been outlined – but not yet fully incorporated into the codes available to the scientific
community. For these reasons, we have written a scalable, parallel software environment
for performing and analyzing molecular mechanics simulations: in lucem
Molecular Mechanics (ilmm, © Beck, D. A. C., Alonso, D. O. V., Daggett, V., University of Washington, 2000-2009).
Several fundamental aspects of the methods employed by ilmm contribute
significantly to the computational expense of its molecular mechanics calculations.
First and foremost is the choice of a classical potential model
that represents solvent explicitly. Similarly, all solute atoms are modeled
explicitly. Second, there is a minimum spherical cutoff distance that
is required to accurately reproduce experimental observations, typically at least
8 Å, although longer cutoffs generally improve accuracy at greater computational cost.
Finally, in order to perform a biophysically correct simulation, the time-step for
numerical integration must be small enough to prevent discontinuities in
the trajectory, which would spoil conservation of the fixed parameters of the underlying
ensemble [2, 5, 7]. With the methods employed in ilmm, we commonly use
a 2 femtosecond time step. The final contribution to computational expense
is therefore the highly repetitive nature of the calculations: 1 picosecond of MD requires
500 iterations of the energy function calculation and integration cycle, and a microsecond
of simulation time requires 500,000,000 iterations.
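The iteration counts above follow directly from the time step and can be checked with a short calculation (a minimal sketch; the function name is ours, not part of ilmm):

```python
FEMTOSECONDS_PER_STEP = 2  # the 2 fs time step quoted above

def steps_required(simulated_time_fs):
    """Number of energy-function/integration cycles needed to cover
    a span of simulated time, given the fixed time step."""
    return simulated_time_fs // FEMTOSECONDS_PER_STEP

# 1 picosecond = 1,000 fs
print(steps_required(1_000))          # 500 iterations
# 1 microsecond = 1,000,000,000 fs
print(steps_required(1_000_000_000))  # 500,000,000 iterations
```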
These three contributors to computational expense interact, complicating optimization
of the MD algorithms. The inclusion of all atoms
in the solute and solvent, in conjunction with the use of longer cutoff ranges, must
be addressed together in any attempt to reduce the multiplicative effect they have
on the wall-clock time required to perform simulations.
Sampling in molecular simulations is almost always the most difficult aspect to
address: how much simulation time and/or how many simulations are required
to accurately sample the event(s) of interest. Many simulations attempt to
document processes that occur on timescales of hundreds or thousands of nanoseconds. As
such, we are constantly pushing the boundaries of the MD timescale to extend our
simulations. Mathematically, the standard deviation of the mean varies as 1/N^(1/2),
where N is the number of samples (time steps), such that longer and multiple simulations
can provide more precise results.
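This 1/N^(1/2) behavior of the standard error can be illustrated numerically; the sketch below (not ilmm code) estimates the standard deviation of the sample mean empirically and shows that quadrupling the sample count roughly halves it:

```python
import random

def std_error_of_mean(n_samples, n_trials=2000, seed=0):
    """Empirical standard deviation of the sample mean over many trials,
    for n_samples unit-variance Gaussian draws per trial."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        means.append(sum(draws) / n_samples)
    mu = sum(means) / n_trials
    var = sum((m - mu) ** 2 for m in means) / (n_trials - 1)
    return var ** 0.5

# 4x the samples -> roughly half the standard error (1/sqrt(N) scaling)
ratio = std_error_of_mean(25) / std_error_of_mean(100)
print(ratio)  # close to 2
```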
Given the desire for longer simulations and the importance of using realistic methodologies,
we have developed and implemented several computing approaches to reduce the computational
expense of molecular mechanics calculations and simulations without sacrificing
the fidelity of the underlying algorithms. The first is parallelization of computational
tasks: splitting the work in a logical way over a number of computational
elements (central processing units, or CPUs). With the proliferation of inexpensive
dual and quad CPU computing platforms, parallel computing has become an attractive
solution to reducing the real-world time required for computationally intensive
tasks. In addition, the governments of American and European nations have invested
heavily in grantee-accessible supercomputing platforms with hundreds and even thousands
of CPUs. Under a perfect parallelization strategy, the time required for a computation
is reduced to 1/P, where P is the number of available CPUs. Perfect parallelization is impossible
to achieve in practice, but it can be approached with aggressive load balancing: the act of
equalizing the amount of work participating CPUs perform.
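The effect of load imbalance on the ideal 1/P scaling can be sketched as follows (illustrative only; the work-unit figures are hypothetical, not measurements from ilmm):

```python
def parallel_time(work_per_cpu):
    """Wall-clock time is set by the busiest CPU; the others
    finish early and wait at the next synchronization point."""
    return max(work_per_cpu)

def speedup(total_work, work_per_cpu):
    """Speedup relative to a single CPU performing all the work."""
    return total_work / parallel_time(work_per_cpu)

total = 100.0
# Perfectly balanced over P = 4 CPUs: speedup reaches P.
print(speedup(total, [25.0, 25.0, 25.0, 25.0]))  # 4.0
# The same work split unevenly: the slowest CPU limits speedup.
print(speedup(total, [40.0, 30.0, 20.0, 10.0]))  # 2.5
```

Load balancing, in these terms, is the act of making `work_per_cpu` as uniform as possible so the maximum approaches the mean.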
The second significant advancement in ilmm is its algorithm for neighbor
discovery (hash3d). This involves identifying the atoms to include in the non-bonded
term of the potential energy function. The problem of neighbor discovery
is not unique to MD; it is found in other physics simulation arenas, including astrophysics.
However, solutions from astrophysics are generally not appropriate for MD, as galaxies
and solar systems have non-uniform densities, whereas the densities in MD systems are quite uniform.
There are a number of algorithms for MD neighbor discovery in the literature
[2, 10, 11]; however, they are all based on transformations of the same approach. In
fact, hash3d differs primarily in that we have formalized the treatment
of the problem in terms of a hash. Hash functions are essentially rapid index and
retrieval mechanisms, usually applied to one-dimensional data. Hash3d
is a three-dimensional extension to this approach. As the factors of computational
expense in molecular simulations are linked, so are the solutions: parallelization
is applied within the hash3d algorithm to further accelerate neighbor
discovery in the presence of multiple CPUs.
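The ilmm implementation is not reproduced here, but the general idea behind a three-dimensional hash for neighbor discovery can be sketched as follows: each atom is hashed to the integer cell (with edge length equal to the cutoff) containing it, so candidate neighbors need only be drawn from the 27 surrounding cells rather than from all N atoms. A minimal, serial sketch under those assumptions (names are ours, not ilmm's):

```python
from collections import defaultdict
from itertools import product

def build_cells(coords, cutoff):
    """Hash each atom index into the integer cell containing it."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells[key].append(i)
    return cells

def neighbors_within(coords, cutoff):
    """Unique (i, j) pairs separated by at most `cutoff`, via the cell hash."""
    cells = build_cells(coords, cutoff)
    cutoff2 = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Search only the 3x3x3 block of cells around each occupied cell.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for i in members:
                for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                    if i < j:
                        xi, yi, zi = coords[i]
                        xj, yj, zj = coords[j]
                        d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                        if d2 <= cutoff2:
                            pairs.add((i, j))
    return pairs

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(neighbors_within(atoms, 8.0))  # {(0, 1)}: the third atom is out of range
```

Because MD densities are nearly uniform, each cell holds a roughly constant number of atoms, so the per-atom cost of the search is independent of N; parallelization can then be layered on by assigning cells to different CPUs.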
The computational techniques used to improve the speed of molecular simulations
can also be employed in the subsequent analyses of the simulation trajectories.
For example, in the calculation of the solvent accessible surface area (SASA) for
a solute using the algorithm of Lee and Richards, it is necessary to examine
the overlap of neighboring atoms’ Connolly surfaces with a given probe radius. Finding
the neighboring atoms is an O(N²) problem, where N is the number of atoms.
As such, the aforementioned hash3d can be used to accelerate this process
and reduce the time required for the SASA calculation.
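As a concrete illustration of the neighbor test being accelerated: two atoms' solvent-accessible spheres can overlap only when their centers are closer than the sum of their van der Waals radii plus twice the probe radius, and this is the candidate filter that spatial hashing restricts to nearby atoms. A sketch of the test (the radii and probe values are illustrative, not ilmm parameters):

```python
def sasa_neighbor_candidate(center_i, radius_i, center_j, radius_j, probe=1.4):
    """True if atom j's probe-inflated sphere can intersect atom i's.

    Each atom's van der Waals sphere is inflated by the probe radius;
    the inflated spheres can overlap only when the center separation
    is below the sum of the inflated radii.
    """
    reach = radius_i + radius_j + 2.0 * probe
    d2 = sum((a - b) ** 2 for a, b in zip(center_i, center_j))
    return d2 <= reach * reach

# Two carbon-like atoms (illustrative vdW radius 1.7 A) 5 A apart:
# within reach = 1.7 + 1.7 + 2 * 1.4 = 6.2 A, so they overlap.
print(sasa_neighbor_candidate((0.0, 0.0, 0.0), 1.7, (5.0, 0.0, 0.0), 1.7))   # True
# 10 A apart: beyond 6.2 A, no overlap possible.
print(sasa_neighbor_candidate((0.0, 0.0, 0.0), 1.7, (10.0, 0.0, 0.0), 1.7))  # False
```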