<%@ Page Language="C#" MasterPageFile="~/MasterPages/Dynameomics_Nav.master" AutoEventWireup="true" CodeFile="reaction.aspx.cs" Inherits="datamining_reaction" Title="Dynameomics - Reaction Coordinates" %>

Reaction Coordinates

One dimensional reaction coordinate

We use 15 physical properties of the protein to construct a multidimensional embedded, one-dimensional (1D) reaction coordinate that faithfully captures the complex nature of unfolding. For any particular fold, the distances in property space between structures in the native ensemble and the unfolding trajectories are calculated; the resulting histogram of the mean distances is the reaction coordinate. The unfolding reaction coordinates for 188 fold representatives (1534 simulations and 22.9 μs in explicit water) were calculated. Native, transition, intermediate and denatured states are readily identified using this reaction coordinate. Principal component analysis (PCA) and the projections of the first 2 or 3 principal components show distinct clusters for unfolding species [1].

The following guide will describe how to calculate a one-dimensional reaction coordinate for a simulation, it will also show you how to run PCA on a matrix of normalized properties to find clusters of structures with similar properties.

References

Toofanny RD, Jonsson AL, Daggett V. A Comprehensive Multidimensional-Embedded One-Dimensional Reaction Coordinate for Protein Unfolding/Folding. Biophysical Journal, 98:2671-2681, 2010. [DOI]

Guide to Reaction Coordinates

Requirements:

At least 1 native (298K) trajectory with all analyses loaded for the reference ensemble. Analyses loaded for unfolding trajectory NB. For the reaction coordinate calculation we ran over Dynameomics we used a generic set of 15 properties based on preexisting analyses. You can use these too, but to improve the differentiation between unfolding species you may want to use protein specific properties, such as core Cα-RMSD. You will need to have these tables loaded in the database, if you wish to do this. You can, of course, calculate these in the database if necessary.

There are a series of scripts that need to be run in order for you to calculate the one-dimensional reaction coordinate.

Step 1: Collates the raw properties (chosen for the multi property reaction coordinate) by simulation

Step 2: Collates the relevant simulation table created in step 1 for a given protein e.g. the raw properties for the 298 K and the 498 K simulations.

Step 3: Normalizes the properties by range across the 298 K and 498 K simulations

Step 4: Runs the comparator program which takes the normalized table of properties from Step 3. The output from comparator will be the mean distance to the reference ensemble for each time point.

Step 5: Runs the histogram maker program on the output from step 4.

Step 1 <%=QueryTransform.SQL_Transform(@" DECLARE @sim_id INT DECLARE @table_prefix VARCHAR(64) DECLARE @server_name VARCHAR(32) DECLARE @database_name VARCHAR(32) DECLARE @first BIT DECLARE @view_cmd VARCHAR(MAX) DECLARE @struct_id INT DECLARE @residues INT SET @first = 1 DECLARE sim_id_cursor CURSOR FOR select distinct a.sim_id, struct_id, server_name,database_name from [directory].dbo.[master_property_v] as a join [directory].dbo.[master_simulation_simulationgroup] as b on a.sim_id=b.sim_id where a.property_abbrev='coord' and b.sim_grp_id=1000 and a.sim_id=1626 and a.sim_id not in (2650, 2651, 2652, 2653, 2654) OPEN sim_id_cursor FETCH NEXT FROM sim_id_cursor INTO @sim_id, @struct_id, @server_name, @database_name SET @table_prefix = '[' + @server_name + '].[' + @database_name + '].' SELECT @residues=MAX(residue_number) FROM [helix].[directory].[dbo].[Master_ID] WHERE struct_id = @struct_id WHILE @@FETCH_STATUS = 0 BEGIN SELECT @residues=MAX(residue_number) FROM [helix].[directory].[dbo].[Master_ID] WHERE struct_id = @struct_id SET @table_prefix = '[' + @server_name + '].[' + @database_name + '].' SET @view_cmd = ' SELECT a.sim_id, a.struct_id, a.struct_inst, f.temp, f.run, a.time_step, a.rmsd, b.dissimilarity_score, c.native, c.non_native, e.radgyr, e.end2end, d.Main_chain, d.side_chain, d.polar, d.non_polar,d.mc_polar,d.mc_non_polar, d.sc_polar, d.sc_non_polar, d.total, g.fraction_helix, h.fraction_beta INTO dbo.[PS_n_'+ CAST(@sim_id AS VARCHAR) +'] FROM ' + @table_prefix + 'dbo.[rmsd_'+ CAST(@sim_id AS VARCHAR) +'] AS a Join ' + @table_prefix + 'dbo.[congen_'+ CAST(@sim_id AS VARCHAR)+'] as b on a.time_step=b.time_step Join ' + @table_prefix + 'dbo.[contact_' + CAST(@sim_id AS VARCHAR)+ '] as c on a.time_step=c.time_step Join (SELECT time_step, SUM(main_chain)as Main_chain, SUM(side_chain)as Side_chain, SUM(polar)as Polar, SUM(non_polar)as Non_polar, SUM(mc_polar)as MC_polar, SUM(mc_non_polar)as MC_non_polar, SUM(sc_polar)as SC_polar, SUM(sc_non_polar) as SC_non_polar, SUM(total)as Total from ' + @table_prefix + 'dbo.[sasa_'+ CAST(@sim_id AS VARCHAR)+'] group by [time_step]) as d on a.time_step=d.time_step Join ' + @table_prefix + 'dbo.[radgee_'+ CAST(@sim_id AS VARCHAR)+'] as e on a.time_step=e.time_step Join (select distinct Sim_id, struct_id, temp, run from [Helix].[directory].dbo.[master_property_v]) as f on a.struct_id = b.struct_id and a.sim_id=f.sim_id LEFT OUTER Join (SELECT time_step, CAST(COUNT(*) AS DECIMAL)/CAST(' + CAST(@residues AS VARCHAR) + ' AS DECIMAL) AS fraction_helix FROM ' + @table_prefix + 'dbo.[dssp_'+ CAST(@sim_id AS VARCHAR)+'] WHERE ss_id = 12 GROUP BY time_step) AS g on a.time_step=g.time_step LEFT OUTER JOIN (SELECT time_step, CAST(COUNT(*) AS DECIMAL)/CAST(' + CAST(@residues AS VARCHAR) + ' AS DECIMAL) AS fraction_beta FROM ' + @table_prefix + 'dbo.[dssp_' + CAST(@sim_id AS VARCHAR)+'] WHERE ss_id = 3 OR ss_id = 4 GROUP BY time_step) as h on a.time_step=h.time_step ' FETCH NEXT FROM sim_id_cursor INTO @sim_id, @struct_id, @server_name, @database_name PRINT @view_cmd --EXEC (@view_cmd) END CLOSE sim_id_cursor DEALLOCATE sim_id_cursor") %> Step 2 <%=QueryTransform.SQL_Transform(@" SET NOCOUNT ON DECLARE @pdb4 VARCHAR(4) DECLARE @struct_id INT DECLARE pdb4_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v] as a JOIN (SELECT DISTINCT i.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as i JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as j on i.sim_id=j.sim_id WHERE (j.sim_grp_id=1000) )as b on a.pdb4=b.pdb4 WHERE a.pdb4='1gcp' --(a.run=1 or a.run=2) OPEN pdb4_cursor FETCH NEXT FROM pdb4_cursor INTO @pdb4 WHILE @@FETCH_STATUS = 0 BEGIN DECLARE @sim_id INT DECLARE @table_prefix VARCHAR(64) DECLARE @server_name VARCHAR(32) DECLARE @database_name VARCHAR(32) DECLARE @first BIT DECLARE @view_cmd VARCHAR(MAX) SET @first = 1 DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT sim_id, server_name, database_name FROM [helix].[Directory].[dbo].[Master_Property_v] WHERE pdb4 = pdb4 and sim_id<>2628 and sim_id<>2629 and sim_id<>2630 and sim_id<>2631 ORDER BY sim_id OPEN sim_id_cursor FETCH NEXT FROM sim_id_cursor INTO @sim_id, @server_name, @database_name SET @table_prefix = '[' + @server_name + '].[' + @database_name + '].' WHILE @@FETCH_STATUS = 0 BEGIN IF (@first = 1) SET @view_cmd = ' CREATE VIEW dbo.Property_space_n_' + @pdb4 + '_498_all AS SELECT sim_id, struct_id, struct_inst, temp, run, time_step, native,non_native, radgyr, end2end, main_chain, side_chain, polar, non_polar, mc_polar, mc_non_polar, sc_polar, sc_non_polar, total, fraction_helix, fraction_beta FROM dbo.ps_n_' + CAST(@sim_id AS VARCHAR) + '' ELSE SET @view_cmd = @view_cmd + ' UNION ALL SELECT sim_id, struct_id, struct_inst, temp, run, time_step, native, non_native, radgyr, end2end, main_chain, side_chain, polar, non_polar, mc_polar, mc_non_polar, sc_polar, sc_non_polar, total, fraction_helix, fraction_beta FROM dbo.ps_n_' + CAST(@sim_id AS VARCHAR) + '' SET @first = 0 FETCH NEXT FROM sim_id_cursor INTO @sim_id, @server_name, @database_name END CLOSE sim_id_cursor DEALLOCATE sim_id_cursor --PRINT @view_cmd EXEC (@view_cmd) FETCH NEXT FROM pdb4_cursor INTO @pdb4 END CLOSE pdb4_cursor DEALLOCATE pdb4_cursor") %> Step 3 <%=QueryTransform.SQL_Transform(@" DECLARE @pdb4 VARCHAR(4) DECLARE @table_prefix VARCHAR(64) DECLARE @server_name VARCHAR(32) DECLARE @database_name VARCHAR(32) DECLARE @first BIT DECLARE @view_cmd VARCHAR(MAX) DECLARE @struct_id INT DECLARE @residues INT SET @first = 1 DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as a JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as b on a.sim_id=b.sim_id WHERE --a.struct_id = @struct_id and b.sim_grp_id=1000 and a.pdb4='1gcp' and a.sim_id<>2628 and a.sim_id<>2629 and a.sim_id<>2630 and a.sim_id<>2631 --and (a.pdb4 not in ('1iom', '4wbc')) and a.server_name='Turn' ORDER BY a.pdb4 OPEN sim_id_cursor FETCH NEXT FROM sim_id_cursor INTO @pdb4 WHILE @@FETCH_STATUS = 0 BEGIN SET @view_cmd = 'SELECT b.sim_id,b.struct_id, b.struct_inst, b.temp, b.run, b.time_step, --(b.rmsd - a.min_rmsd)/(a.max_rmsd - a.min_rmsd) As norm_rmsd, --(b.dissimilarity_score - a.min_dissimilarity_score)/(a.max_dissimilarity_score - a.min_dissimilarity_score) as norm_dissimilarity_score, (cast(b.native as float) - cast (a.min_native as float))/(cast (a.max_native as float) - cast(a.min_native as float)) as norm_native, (cast(b.non_native as float) - cast (a.min_non_native as float))/(cast(a.max_non_native as float)- cast(a.min_non_native as float)) as norm_non_native, (cast(b.radgyr as float) - cast(a.min_radgyr as float))/(cast(a.max_radgyr as float) - cast(a.min_radgyr as float)) as norm_radgyr, (cast(b.end2end as float) - cast(a.min_end2end as float))/ (cast(a.max_end2end as float) - cast(a.min_end2end as float)) as norm_end2end, (b.main_chain - a.min_main_chain)/(a.max_main_chain - a.min_main_chain) as norm_main_chain, (b.side_chain - a.min_side_chain)/(a.max_side_chain - a.min_side_chain) as norm_side_chain, (b.polar - a.min_polar)/(a.max_polar - a.min_polar) as norm_polar, (b.non_polar - a.min_non_polar) / (a.max_non_polar - a.min_non_polar) as norm_non_polar, (b.mc_polar - a.min_mc_polar) / (a.max_mc_polar - a.min_mc_polar) as norm_mc_polar, (b.mc_non_polar - a.min_mc_non_polar) / (a.max_mc_non_polar - a.min_mc_non_polar) as norm_mc_non_polar, (b.sc_polar - a.min_sc_polar) / (a.max_sc_polar - a.min_sc_polar) as norm_sc_polar, (b.sc_non_polar - a.min_sc_non_polar) / (a.max_sc_non_polar - a.min_sc_non_polar) as norm_sc_non_polar, (b.total - a.min_total_sasa)/(a.max_total_sasa - a.min_total_sasa) as norm_total_sasa, (cast(b.fraction_helix as float) - cast(a.min_fraction_helix as float))/(cast(a.max_fraction_helix as float) - cast(a.min_fraction_helix as float)) as norm_fraction_helix, (cast(b.fraction_beta as float) - cast (a.min_fraction_beta as float))/(cast(a.max_fraction_beta as float) - cast(a.min_fraction_beta as float)) as norm_fraction_beta into dbo.[Property_space_n_norm_' + @pdb4 + '_498_all] FROM dbo.[property_space_n_' + @pdb4 + '_498_all] as b JOIN ( select --max(rmsd) as max_rmsd, min(rmsd) as min_rmsd, --max(dissimilarity_score) as max_dissimilarity_score, min(dissimilarity_score) as min_dissimilarity_score, max(native) as max_native, min(native) as min_native, max(non_native) as max_non_native, min(non_native) as min_non_native, max(radgyr)as max_radgyr, min(radgyr)as min_radgyr, max(end2end) as max_end2end, min(end2end) as min_end2end, max(main_chain) as max_main_chain, min(main_chain) as min_main_chain, max(side_chain) as max_side_chain, min(side_chain) as min_side_chain, max(polar) as max_polar, min(polar) as min_polar, max(non_polar)as max_non_polar, min(non_polar)as min_non_polar, max(mc_polar) as max_mc_polar, min(mc_polar) as min_mc_polar, max(mc_non_polar) as max_mc_non_polar, min(mc_non_polar) as min_mc_non_polar, max(sc_polar) as max_sc_polar, min(sc_polar) as min_sc_polar, max(sc_non_polar) as max_sc_non_polar, min(sc_non_polar) as min_sc_non_polar, max(total) as max_total_sasa, min(total) as min_total_sasa, max(fraction_helix) as max_fraction_helix, min(fraction_helix) as min_fraction_helix, max(fraction_beta) as max_fraction_beta, min(fraction_beta) as min_fraction_beta from dbo.[property_space_n_' + @pdb4 + '_498_all] )as a on 1=1 order by b.[time_step]' FETCH NEXT FROM sim_id_cursor INTO @pdb4 PRINT @view_cmd --EXEC (@view_cmd) END CLOSE sim_id_cursor DEALLOCATE sim_id_cursor") %> Step 4 <%=QueryTransform.SQL_Transform(@" DECLARE @pdb4 VARCHAR(4) DECLARE @table_prefix VARCHAR(64) DECLARE @server_name VARCHAR(32) DECLARE @database_name VARCHAR(32) DECLARE @first BIT DECLARE @view_cmd VARCHAR(MAX) DECLARE @struct_id INT DECLARE @residues INT SET @first = 1 DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as a JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as b on a.sim_id=b.sim_id WHERE --a.struct_id = @struct_id and b.sim_grp_id=1000 and pdb4 in ('1a6n', '1cvz', '1hgu','1ifc','1ixa', '1tmc', '2afp', '2giw') --ORDER BY a.pdb4 OPEN sim_id_cursor FETCH NEXT FROM sim_id_cursor INTO @pdb4 WHILE @@FETCH_STATUS = 0 BEGIN SET @view_cmd = 'CREATE TABLE dbo.[Property_space_rxn_n_' + @pdb4 + '] ( [sim_id] INT not null, [struct_id] INT not null, [struct_inst] INT not null, [temp] INT not null, [run] INT not null, [time_step] INT not null, [Mean_distance_to_reference] REAL not null) INSERT into dbo.[Property_space_rxn_n_' + @pdb4 + '] EXECUTE comparator @input_name=''[Helix].[property_space].dbo.[property_space_n_norm_' + @pdb4 + '_498_all]'', @ignore_self = 0, @min_reference_time_step =0, @min_sample_time_step=0 ' FETCH NEXT FROM sim_id_cursor INTO @pdb4 PRINT @view_cmd --EXEC (@view_cmd) END CLOSE sim_id_cursor DEALLOCATE sim_id_cursor") %> Step 5 <%=QueryTransform.SQL_Transform(@" DECLARE @pdb4 VARCHAR(4) DECLARE @table_prefix VARCHAR(64) DECLARE @server_name VARCHAR(32) DECLARE @database_name VARCHAR(32) DECLARE @first BIT DECLARE @view_cmd VARCHAR(MAX) DECLARE @struct_id INT DECLARE @residues INT SET @first = 1 DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as a JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as b on a.sim_id=b.sim_id WHERE --a.struct_id = @struct_id and b.sim_grp_id=2 and a.pdb4<>'1iom' and a.pdb4<>'4wbc' and a.server_name='turn' ORDER BY a.pdb4 OPEN sim_id_cursor FETCH NEXT FROM sim_id_cursor INTO @pdb4 WHILE @@FETCH_STATUS = 0 BEGIN SET @view_cmd = 'CREATE TABLE dbo.[Property_space_n_histogram_' + @pdb4 + '] ( [sim_id] INT not null, [struct_id] INT not null, [struct_inst] INT not null, [temp] INT not null, [run] INT not null, [bin_midpoint] REAL not null, [bin_population] INT not null, [bin_fraction] REAL) INSERT into dbo.[Property_space_n_histogram_' + @pdb4 + '] EXECUTE pistogramor @input_name = ''[Turn].[property_space].dbo.[Property_space_rxn_n_' + @pdb4 + '_498_all]'', @bins = 100 ' FETCH NEXT FROM sim_id_cursor INTO @pdb4 --PRINT @view_cmd EXEC (@view_cmd) END CLOSE sim_id_cursor DEALLOCATE sim_id_cursor") %>

Running PCA

We run principal component analysis on the normalized properties. This reduces this multi-dimensional data into principal components ordered by the amount of variance they capture. We can thus vizualize 15 properties in 2 or 3 dimensions. We can also determine which properties are contributing the most to the variance in each principal component these are called Loadings.

PCA is run in Mathematica, it takes the normalised properties table created by step 3.

<<MultivariateStatistics`
dbTmp=OpenSQLConnection[JDBC["jtds_sqlserver","SERVER;database=DATABASE",Username"USER",Password"PASSWORD"];

mtx=Transpose[SQLExecute[dbDir,"SELECT * From [helix].[property_space].dbo.[Property_space_normalise_498_all_100ps] order by struct_id, temp, run, time_step "]]; mtx=(mtx//.Null0); mtxa=Take[mtx, 6]; mtx= Take[mtx, -15]; mtx=Transpose[mtx]; orig=mtx; mtx=PrincipalComponents[mtx]; ListPlot[mtx[[All,{1,2}]]] pcs=mtx[[All,{1,2, 3}]]; cor=Correlation[mtx, orig]; cor//MatrixForm {matU,matD, matV}=SingularValueDecomposition[Correlation[orig]]; varcap=MapThread[#1[[#2]]/Tr[matD]&,{matD,Range[1,Length[matD]]}] ArrayPlot[cor] mtxb=Join[mtxa,Transpose[pcs]]; mtxb=Transpose[mtxb]; Export["pc1vpc2vpc3_all_2.csv",mtxb] Export ["PC_coefficents_all_2.csv", cor] Export ["Variance-captured_PCA_all_2.csv", varcap]