4.2.2. Native contacts analysis — MDAnalysis.analysis.contacts
¶
This module contains classes to analyze native contacts Q over a trajectory. Native contacts of a conformation are contacts that exist in a reference structure and in the conformation. Contacts in the reference structure are always defined as being closer then a distance radius. The fraction of native contacts for a conformation can be calculated in different ways. This module supports 3 different metrics listed below, as well as custom metrics.
- Hard Cut: To count as a contact the atoms i and j have to be at least as close as in the reference structure.
- Soft Cut: The atom pair i and j is assigned based on a soft potential
that is 1 if the distance is 0, 1/2 if the distance is the same as in
the reference and 0 for large distances. For the exact definition of the
potential and parameters have a look at function
soft_cut_q()
. - Radius Cut: To count as a contact the atoms i and j cannot be further apart than some distance radius.
The “fraction of native contacts” Q(t) is a number between 0 and 1 and calculated as the total number of native contacts for a given time frame divided by the total number of contacts in the reference structure.
4.2.2.1. Examples for contact analysis¶
4.2.2.1.1. One-dimensional contact analysis¶
As an example we analyze the opening (“unzipping”) of salt bridges when the AdK enzyme opens up; this is one of the example trajectories in MDAnalysis.
import MDAnalysis as mda
from MDAnalysis.analysis import contacts
from MDAnalysis.tests.datafiles import PSF,DCD
import matplotlib.pyplot as plt
# example trajectory (transition of AdK from closed to open)
u = mda.Universe(PSF,DCD)
# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and
# OE*/OD* in ASP/GLU. You might want to think a little bit harder about the
# problem before using this for real work.
sel_basic = "(resname ARG LYS) and (name NH* NZ)"
sel_acidic = "(resname ASP GLU) and (name OE* OD*)"
# reference groups (first frame of the trajectory, but you could also use a
# separate PDB, eg crystal structure)
acidic = u.select_atoms(sel_acidic)
basic = u.select_atoms(sel_basic)
# set up analysis of native contacts ("salt bridges"); salt bridges have a
# distance <6 A
ca1 = contacts.Contacts(u, selection=(sel_acidic, sel_basic),
refgroup=(acidic, basic), radius=6.0)
# iterate through trajectory and perform analysis of "native contacts" Q
ca1.run()
# print number of averave contacts
average_contacts = np.mean(ca1.timeseries[:, 1])
print('average contacts = {}'.format(average_contacts))
# plot time series q(t)
f, ax = plt.subplots()
ax.plot(ca1.timeseries[:, 0], ca1.timeseries[:, 1])
ax.set(xlabel='frame', ylabel='fraction of native contacts',
title='Native Contacts, average = {:.2f}'.format(average_contacts))
fig.show()
The first graph shows that when AdK opens, about 20% of the salt bridges that existed in the closed state disappear when the enzyme opens. They open in a step-wise fashion (made more clear by the movie AdK_zipper_cartoon.avi).
Notes
Suggested cutoff distances for different simulations
- For all-atom simulations, cutoff = 4.5 Å
- For coarse-grained simulations, cutoff = 6.0 Å
4.2.2.1.2. Two-dimensional contact analysis (q1-q2)¶
Analyze a single DIMS transition of AdK between its closed and open conformation and plot the trajectory projected on q1-q2 [Franklin2007]
import MDAnalysis as mda
from MDAnalysis.analysis import contacts
from MDAnalysisTests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)
q1q2 = contacts.q1q2(u, 'name CA', radius=8)
q1q2.run()
f, ax = plt.subplots(1, 2, figsize=plt.figaspect(0.5))
ax[0].plot(q1q2.timeseries[:, 0], q1q2.timeseries[:, 1], label='q1')
ax[0].plot(q1q2.timeseries[:, 0], q1q2.timeseries[:, 2], label='q2')
ax[0].legend(loc='best')
ax[1].plot(q1q2.timeseries[:, 1], q1q2.timeseries[:, 2], '.-')
f.show()
Compare the resulting pathway to the MinActionPath result for AdK [Franklin2007].
4.2.2.1.3. Writing your own contact analysis¶
The Contacts
class has been designed to be extensible for your own
analysis. As an example we will analyze when the acidic and basic groups of AdK
are in contact which each other; this means that at least one of the contacts
formed in the reference is closer than 2.5 Å.
For this we define a new function to determine if any contact is closer than
2.5 Å; this function must implement the API prescribed by Contacts
:
def is_any_closer(r, r0, dist=2.5):
return np.any(r < dist)
The first two parameters r and r0 are provided by Contacts
when it
calls is_any_closer()
while the others can be passed as keyword args
using the kwargs parameter in Contacts
.
Next we are creating an instance of the Contacts
class and use the
is_any_closer()
function as an argument to method and run the analysis:
# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and
# OE*/OD* in ASP/GLU. You might want to think a little bit harder about the
# problem before using this for real work.
sel_basic = "(resname ARG LYS) and (name NH* NZ)"
sel_acidic = "(resname ASP GLU) and (name OE* OD*)"
# reference groups (first frame of the trajectory, but you could also use a
# separate PDB, eg crystal structure)
acidic = u.select_atoms(sel_acidic)
basic = u.select_atoms(sel_basic)
nc = contacts.Contacts(u, selection=(sel_acidic, sel_basic),
method=is_any_closer,
refgroup=(acidic, basic), kwargs={'dist': 2.5})
nc.run()
bound = nc.timeseries[:, 1]
frames = nc.timeseries[:, 0]
f, ax = plt.subplots()
ax.plot(frames, bound, '.')
ax.set(xlabel='frame', ylabel='is Bound',
ylim=(-0.1, 1.1))
f.show()
4.2.2.2. Functions¶
-
MDAnalysis.analysis.contacts.
hard_cut_q
(r, cutoff)[source]¶ Calculate fraction of native contacts Q for a hard cut off.
The cutoff can either be a float or a
ndarray
of the same shape as r.Parameters: - r (ndarray) – distance matrix
- cutoff (ndarray | float) – cut off value to count distances. Can either be a float of a ndarray of the same size as distances
Returns: Q – fraction of contacts
Return type:
-
MDAnalysis.analysis.contacts.
soft_cut_q
(r, r0, beta=5.0, lambda_constant=1.8)[source]¶ Calculate fraction of native contacts Q for a soft cut off
The native contact function is defined as [Best2013]
\[Q(r, r_0) = \frac{1}{1 + e^{\beta (r - \lambda r_0)}}\]Reasonable values for different simulation types are
- All Atom: lambda_constant = 1.8 (unitless)
- Coarse Grained: lambda_constant = 1.5 (unitless)
Parameters: Returns: Q – fraction of native contacts
Return type: References
[Best2013] RB Best, G Hummer, and WA Eaton, “Native contacts determine protein folding mechanisms in atomistic simulations” _PNAS_ 110 (2013), 17874–17879. doi: 10.1073/pnas.1311599110.
-
MDAnalysis.analysis.contacts.
radius_cut_q
(r, r0, radius)[source]¶ calculate native contacts Q based on the single distance radius.
Parameters: Returns: Q – fraction of contacts
Return type: References
[Franklin2007] (1, 2, 3) Franklin, J., Koehl, P., Doniach, S., & Delarue, M. (2007). MinActionPath: Maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Research, 35(SUPPL.2), 477–482. doi: 10.1093/nar/gkm342
-
MDAnalysis.analysis.contacts.
contact_matrix
(d, radius, out=None)[source]¶ calculate contacts from distance matrix
Parameters: Returns: contacts – boolean array of formed contacts
Return type: ndarray
-
MDAnalysis.analysis.contacts.
q1q2
(u, selection=’all’, radius=4.5, start=None, stop=None, step=None)[source]¶ Perform a q1-q2 analysis.
Compares native contacts between the starting structure and final structure of a trajectory [Franklin2007].
Parameters: - u (Universe) – Universe with a trajectory
- selection (string, optional) – atoms to do analysis on
- radius (float, optional) – distance at which contact is formed
- start (int, optional) – First frame of trajectory to analyse, Default: 0
- stop (int, optional) – Last frame of trajectory to analyse, Default: -1
- step (int, optional) – Step between frames to analyse, Default: 1
Returns: contacts – Contact Analysis that is set up for a q1-q2 analysis
Return type:
4.2.2.3. Classes¶
-
class
MDAnalysis.analysis.contacts.
Contacts
(u, selection, refgroup, method=’hard_cut’, radius=4.5, kwargs=None, **basekwargs)[source]¶ Calculate contacts based observables.
The standard methods used in this class calculate the fraction of native contacts Q from a trajectory.
Contact API
By defining your own method it is possible to calculate other observables that only depend on the distances and a possible reference distance. The Contact API prescribes that this method must be a function with call signature
func(r, r0, **kwargs)
and must be provided in the keyword argument method.-
timeseries
¶ list – list containing Q for all refgroup pairs and analyzed frames
Parameters: - u (Universe) – trajectory
- selection (tuple(string, string)) – two contacting groups that change over time
- refgroup (tuple(AtomGroup, AtomGroup)) – two contacting atomgroups in their reference conformation. This can also be a list of tuples containing different atom groups
- radius (float, optional (4.5 Angstroms)) – radius within which contacts exist in refgroup
- method (string | callable (optional)) – Can either be one of
['hard_cut' , 'soft_cut']
or a callable with call signaturefunc(r, r0, **kwargs)
(the “Contacts API”). - kwargs (dict, optional) – dictionary of additional kwargs passed to method. Check respective functions for reasonable values.
- start (int, optional) – First frame of trajectory to analyse, Default: None becomes 0.
- stop (int, optional) – Frame index to stop analysis. Default: None becomes n_frames. Iteration stops before this frame number, which means that the trajectory would be read until the end.
- step (int, optional) – Step between frames to analyse, Default: None becomes 1.
-
4.2.2.4. Deprecated¶
The following classes are deprecated and are scheduled for removal in release 0.17.0.
-
class
MDAnalysis.analysis.contacts.
ContactAnalysis1
(*args, **kwds)[source]¶ Perform a very flexible native contact analysis with respect to a single reference.
This analysis class allows one to calculate the fraction of native contacts q between two arbitrary groups of atoms with respect to an arbitrary reference structure. For instance, as a reference one could take a crystal structure of a complex, and as the two groups atoms one selects two molecules A and B in the complex. Then the question to be answered by q is, is which percentage of the contacts between A and B persist during the simulation.
First prepare
AtomGroup
selections for the reference atoms; this example uses some arbitrary selections:ref = Universe('crystal.pdb') refA = ref.select_atoms('name CA and segid A and resid 6:100') refB = ref.select_atoms('name CA and segid B and resid 1:40')
Load the trajectory:
u = Universe(topology, trajectory)
We then need two selection strings selA and selB that, when applied as
u.select_atoms(selA)
produce a list of atoms that is equivalent to the reference (i.e.u.select_atoms(selA)
must select the same atoms asrefA
in this example):selA = 'name CA and resid 1:95' # corresponds to refA selB = 'name CA and resid 150:189' # corresponds to refB
Note
It is the user’s responsibility to provide a reference group (or groups) that describe equivalent atoms to the ones selected by selection.
Now we are ready to set up the analysis:
CA1 = ContactAnalysis1(u, selection=(selA,selB), refgroup=(refA,refB), radius=8.0, outfile="q.dat")
If the groups do not match in length then a
ValueError
is raised.The analysis across the whole trajectory is performed with
CA1.run()
Results are saved to outfile (
framenumber q N
per line) and can also be plotted withCA1.plot() # plots the time series q(t) CA1.plot_qavg() # plots the matrix of average contacts <q>
Description of computed values in the output file:
- N
- number of native contacts
- q
- fraction of native contacts relative to the reference
Deprecated since version 0.15.0.
__init__ is deprecated, use Contacts instead! This class will be removed in 0.17
Calculate native contacts within a group or between two groups.
Arguments: - topology
psf or pdb file
- trajectory
dcd or xtc/trr file
- universe
instead of a topology/trajectory combination, one can also supply a
MDAnalysis.Universe
Keywords: - selection
selection string that determines which distances are calculated; if this is a tuple or list with two entries then distances are calculated between these two different groups [“name CA or name B*”]
- refgroup
reference group, either a single
AtomGroup
(if there is only a single selection) or a list of two such groups. The reference contacts are directly computed from refgroup and hence the atoms in the reference group(s) must be equivalent to the ones produced by the selection on the input trajectory.- radius
contacts are deemed any atoms within radius [8.0 A]
- outfile
name of the output file; with the gz or bz2 suffix, a compressed file is written. The average <q> is written to a second, gzipped file that has the same name with ‘array’ included. E.g. for the default name “q1.dat.gz” the <q> file will be “q1.array.gz”. The format is the matrix in column-row format, i.e. selection 1 residues are the columns and selection 2 residues are rows. The file can be read with
np.loadtxt()
. [“q1.dat.gz”]
The function calculates the percentage of native contacts q1 along a trajectory. “Contacts” are defined as the number of atoms within radius of a given other atom. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory).
The timeseries is written to a file outfile and is also accessible as the attribute
ContactAnalysis1.timeseries
.-
output_exists
(force=False)[source]¶ Return True if default output file already exists.
Disable with force=True (will always return False)
-
plot
(filename=None, **kwargs)[source]¶ Plot q(t).
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to
pylab.plot()
.
-
plot_qavg
(filename=None, **kwargs)[source]¶ Plot
ContactAnalysis1.qavg
, the matrix of average native contacts.If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to
pylab.imshow()
.
-
qN
(q, out=None)[source]¶ Calculate native contacts relative to reference state.
q is the matrix of contacts (e.g.
q
).If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
This method is typically only used internally.
-
qarray
(d, out=None)[source]¶ Return distance array with True for contacts.
d is the matrix of distances. The method uses the value of
ContactAnalysis1.radius
to determine if adistance < radius
is considered a contact.If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
This method is typically only used internally.
-
run
(store=True, force=False, start=0, stop=None, step=1, **kwargs)[source]¶ Analyze trajectory and produce timeseries.
Stores results in
ContactAnalysis1.timeseries
(if store=True) and writes them to a data file. The average q is written to a second data file.- start
- The value of the first frame index in the trajectory to be used (default: index 0)
- stop
- The value of the last frame index in the trajectory to be used (default: None – use all frames)
- step
- The number of frames to skip during trajectory iteration (default: use every frame)
-
class
MDAnalysis.analysis.contacts.
ContactAnalysis
(*args, **kwds)[source]¶ Perform a native contact analysis (“q1-q2”).
The analysis of the trajectory is performed with the
ContactAnalysis.run()
method. The result is stored inContactAnalysis.timeseries
. It is a numpy array which contains the frame number at index 0, q1 and q2 at index 1 and 2, and the total number of contacts in 3 and 4.frame q1 q2 n1 n2
The total number of contacts in the reference states 1 and 2 are stored in
ContactAnalysis.nref
(index 0 and 1).The
ContactAnalysis.run()
method calculates the percentage of native contacts q1 and q2 along a trajectory. “Contacts” are defined as the number of Ca atoms (or per-residue centroids of a user defined selection) within radius of a primary Ca. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory) and q2 is the fraction of contacts relative to the conformation 2.The timeseries is written to a bzip2-compressed file in targetdir named “basename(trajectory)infix_q1q2.dat.bz2” and is also accessible as the attribute
ContactAnalysis.timeseries
.Parameters: - topology (filename as str) – topology file
- trajectory (filename as str) – trajectory
- ref1 (filename or
None
, optional) – structure of the reference conformation 1 (pdb); ifNone
the first frame of the trajectory is chosen - ref2 (filename or
None
, optional) – structure of the reference conformation 2 (pdb); ifNone
the last frame of the trajectory is chosen - radius (float, optional, default 8 A) – contacts are deemed any Ca within radius
- targetdir (path, optional, default
.
) – output files are saved in this directory - infix (string, optional) –
- additional tag string that is inserted into the output filename of
- the data file
- selection : string, optional, default
"name CA"
- MDAnalysis selection string that selects the particles of
interest; the default is to only select the C-alpha atoms
in ref1 and ref2
Note
If selection produces more than one atom per residue then you will get multiple contacts per residue unless you also set centroids =
True
- centroids : bool
- If set to
True
, use the centroids for the selected atoms on a per-residue basis to compute contacts. This allows, for instance defining the sidechains as selection and then computing distances between sidechain centroids.
Deprecated since version 0.15.0.
__init__ is deprecated, use Contacts instead! This class will be removed in 0.17
-
get_distance_array
(g, **kwargs)[source]¶ Calculate the self_distance_array for atoms in group g.
Parameters: - g (AtomGroup) – group of atoms to calculate distance array for
- results (array, optional) – passed on to
MDAnalysis.lib.distances.self_distance_array()
as a preallocated array - centroids (bool, optional, default
None
) –True
: calculate per-residue centroids from the selected atoms;False
: consider each atom separately;None
: use the class default for centroids [None
]
-
output_exists
(force=False)[source]¶ Return True if default output file already exists.
Disable with force=True (will always return False)
-
qN
(q, n, out=None)[source]¶ Calculate native contacts relative to reference state.
Note
This method is typically only used internally.
Parameters: Returns: - contacts (integer) – total number of contacts
- fraction (float) – fraction of contacts relative to the reference state
-
qarray
(d, out=None)[source]¶ Return array with
True
for contacts.Note
This method is typically only used internally.
Parameters: - d (array) – 2D array of distances. The method uses the value of
radius
to determine if adistance < radius
is considered a contact. - out (array, optional) – If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
Returns: contact matrix
Return type: - d (array) – 2D array of distances. The method uses the value of