We use 15 physical properties of the protein to construct a multidimensional-embedded,
one-dimensional (1D) reaction coordinate that faithfully captures the complex nature of unfolding.
For any particular fold, the distances in property space between structures in the native ensemble
and the unfolding trajectories are calculated; the resulting histogram of the mean distances
is the reaction coordinate. The unfolding reaction coordinates for 188 fold representatives
(1534 simulations and 22.9 μs in explicit water) were calculated. Native, transition, intermediate
and denatured states are readily identified using this reaction coordinate. Principal component analysis
(PCA) and the projections of the first 2 or 3 principal components show distinct clusters for unfolding species.
The following guide describes how to calculate a one-dimensional reaction coordinate for a simulation. It also
shows how to run PCA on a matrix of normalized properties to find clusters of structures with similar properties.
Before you start, you will need:
At least 1 native (298 K) trajectory with all analyses loaded, to serve as the reference ensemble.
All analyses loaded for the unfolding trajectory.
NB. For the reaction coordinate calculations that we ran across Dynameomics, we used a generic set of 15 properties based on preexisting analyses. You can use these too, but to improve the differentiation between unfolding species you may want to use protein-specific properties, such as core Cα-RMSD. If you wish to do this, you will need to have these tables loaded in the database; you can, of course, calculate them in the database if necessary.
A series of scripts needs to be run to calculate the one-dimensional reaction coordinate:
Step 1: Collates the raw properties (chosen for the multi-property reaction coordinate) by simulation.
Step 2: Collates the relevant simulation tables created in Step 1 for a given protein, e.g. the raw properties for the 298 K and the 498 K simulations.
Step 3: Normalizes the properties by range across the 298 K and 498 K simulations.
Step 4: Runs the comparator program, which takes the normalized table of properties from Step 3. The output from comparator is the mean distance to the reference ensemble for each time point.
Step 5: Runs the histogram maker program on the output from Step 4.
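The normalization, comparison and histogram steps can be sketched in a few lines of Python with NumPy. This is only an illustration and not the actual scripts, comparator or histogram maker: the function names, the synthetic data, the min-max form of the range normalization, the use of Euclidean distance and the bin count are all assumptions made for the example.

```python
import numpy as np

def normalize_by_range(props):
    """Scale each property (column) to [0, 1] using its range over all rows.

    Assumption: "normalize by range" is taken to mean min-max scaling.
    """
    props = np.asarray(props, dtype=float)
    lo = props.min(axis=0)
    span = props.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (props - lo) / span

def mean_distance_to_reference(reference, trajectory):
    """For each trajectory frame, mean distance to every reference frame.

    Assumption: distance in property space is Euclidean.
    """
    diff = trajectory[:, None, :] - reference[None, :, :]
    return np.linalg.norm(diff, axis=2).mean(axis=1)

# Synthetic stand-in data: rows are time points, columns are 15 properties.
rng = np.random.default_rng(0)
native = rng.normal(0.0, 1.0, size=(200, 15))      # hypothetical 298 K run
drift = np.linspace(0.0, 5.0, 300)[:, None]        # properties drift as the protein unfolds
unfolding = rng.normal(0.0, 1.0, size=(300, 15)) + drift  # hypothetical 498 K run

# Normalize across both simulations together (as in Step 3).
combined = normalize_by_range(np.vstack([native, unfolding]))
native_n, unfolding_n = combined[:200], combined[200:]

# Mean distance to the reference ensemble for each time point (as in Step 4).
coordinate = mean_distance_to_reference(native_n, unfolding_n)

# Histogram of the mean distances (as in Step 5).
counts, edges = np.histogram(coordinate, bins=50)
```

In this sketch, `coordinate` holds one value per time point of the unfolding run, and peaks in `counts` would correspond to well-populated regions along the reaction coordinate.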
We run principal component analysis on the normalized properties. This reduces the multi-dimensional data
to principal components ordered by the amount of variance they capture. We can thus visualize
15 properties in 2 or 3 dimensions. We can also determine which properties contribute the
most to the variance in each principal component; these contributions are called loadings.
PCA is run in Mathematica. It takes the normalized properties table created by Step 3.
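For readers without Mathematica, the same kind of analysis can be sketched in Python with NumPy. This is an illustrative substitute and not the Mathematica workflow used here: the function name `pca`, the synthetic two-cluster data and the choice to compute PCA through a singular value decomposition of the mean-centered table are assumptions made for the example.

```python
import numpy as np

def pca(normalized):
    """PCA of a (time points x properties) table via SVD of the centered data.

    Returns the projections (scores), the per-component property weights
    (one row per principal component) and the fraction of variance captured.
    """
    X = np.asarray(normalized, dtype=float)
    centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)   # ordered from largest to smallest
    scores = centered @ Vt.T          # coordinates of each frame on the components
    return scores, Vt, explained

# Synthetic stand-in for the normalized properties table: 500 frames, 15 properties.
# The second half of the frames is shifted in the first five properties to mimic
# a second, distinct cluster of structures.
rng = np.random.default_rng(1)
table = rng.normal(0.5, 0.05, size=(500, 15))
table[250:, :5] += 0.4

scores, loadings, explained = pca(table)

# Projection onto the first two components, e.g. for a 2D scatter plot.
pc1, pc2 = scores[:, 0], scores[:, 1]

# Properties (column indices) with the largest weights in the first component.
top_properties = np.argsort(np.abs(loadings[0]))[::-1][:5]
```

Plotting `pc1` against `pc2` for such data would show the two groups of frames as separate clusters, and `top_properties` lists which columns dominate the first component.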