4.2.2. Native contacts analysis — MDAnalysis.analysis.contacts

This module contains classes to analyze native contacts Q over a trajectory. Native contacts of a conformation are contacts that exist in a reference structure and in the conformation. Contacts in the reference structure are always defined as being closer then a distance radius. The fraction of native contacts for a conformation can be calculated in different ways. This module supports 3 different metrics listed below, as well as custom metrics.

  1. Hard Cut: To count as a contact the atoms i and j have to be at least as close as in the reference structure.
  2. Soft Cut: The atom pair i and j is assigned based on a soft potential that is 1 if the distance is 0, 1/2 if the distance is the same as in the reference and 0 for large distances. For the exact definition of the potential and parameters have a look at function soft_cut_q().
  3. Radius Cut: To count as a contact the atoms i and j cannot be further apart than some distance radius.

The “fraction of native contacts” Q(t) is a number between 0 and 1 and calculated as the total number of native contacts for a given time frame divided by the total number of contacts in the reference structure.

4.2.2.1. Examples for contact analysis

4.2.2.1.1. One-dimensional contact analysis

As an example we analyze the opening (“unzipping”) of salt bridges when the AdK enzyme opens up; this is one of the example trajectories in MDAnalysis.

import MDAnalysis as mda
from MDAnalysis.analysis import contacts
from MDAnalysis.tests.datafiles import PSF,DCD
import matplotlib.pyplot as plt
# example trajectory (transition of AdK from closed to open)
u = mda.Universe(PSF,DCD)
# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and
# OE*/OD* in ASP/GLU. You might want to think a little bit harder about the
# problem before using this for real work.
sel_basic = "(resname ARG LYS) and (name NH* NZ)"
sel_acidic = "(resname ASP GLU) and (name OE* OD*)"
# reference groups (first frame of the trajectory, but you could also use a
# separate PDB, eg crystal structure)
acidic = u.select_atoms(sel_acidic)
basic = u.select_atoms(sel_basic)
# set up analysis of native contacts ("salt bridges"); salt bridges have a
# distance <6 A
ca1 = contacts.Contacts(u, selection=(sel_acidic, sel_basic),
                        refgroup=(acidic, basic), radius=6.0)
# iterate through trajectory and perform analysis of "native contacts" Q
ca1.run()
# print number of averave contacts
average_contacts = np.mean(ca1.timeseries[:, 1])
print('average contacts = {}'.format(average_contacts))
# plot time series q(t)
f, ax = plt.subplots()
ax.plot(ca1.timeseries[:, 0], ca1.timeseries[:, 1])
ax.set(xlabel='frame', ylabel='fraction of native contacts',
       title='Native Contacts, average = {:.2f}'.format(average_contacts))
fig.show()

The first graph shows that when AdK opens, about 20% of the salt bridges that existed in the closed state disappear when the enzyme opens. They open in a step-wise fashion (made more clear by the movie AdK_zipper_cartoon.avi).

Notes

Suggested cutoff distances for different simulations

  • For all-atom simulations, cutoff = 4.5 Å
  • For coarse-grained simulations, cutoff = 6.0 Å

4.2.2.1.2. Two-dimensional contact analysis (q1-q2)

Analyze a single DIMS transition of AdK between its closed and open conformation and plot the trajectory projected on q1-q2 [Franklin2007]

import MDAnalysis as mda
from MDAnalysis.analysis import contacts
from MDAnalysisTests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)
q1q2 = contacts.q1q2(u, 'name CA', radius=8)
q1q2.run()

f, ax = plt.subplots(1, 2, figsize=plt.figaspect(0.5))
ax[0].plot(q1q2.timeseries[:, 0], q1q2.timeseries[:, 1], label='q1')
ax[0].plot(q1q2.timeseries[:, 0], q1q2.timeseries[:, 2], label='q2')
ax[0].legend(loc='best')
ax[1].plot(q1q2.timeseries[:, 1], q1q2.timeseries[:, 2], '.-')
f.show()

Compare the resulting pathway to the MinActionPath result for AdK [Franklin2007].

4.2.2.1.3. Writing your own contact analysis

The Contacts class has been designed to be extensible for your own analysis. As an example we will analyze when the acidic and basic groups of AdK are in contact which each other; this means that at least one of the contacts formed in the reference is closer than 2.5 Å.

For this we define a new function to determine if any contact is closer than 2.5 Å; this function must implement the API prescribed by Contacts:

def is_any_closer(r, r0, dist=2.5):
    return np.any(r < dist)

The first two parameters r and r0 are provided by Contacts when it calls is_any_closer() while the others can be passed as keyword args using the kwargs parameter in Contacts.

Next we are creating an instance of the Contacts class and use the is_any_closer() function as an argument to method and run the analysis:

# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and
# OE*/OD* in ASP/GLU. You might want to think a little bit harder about the
# problem before using this for real work.
sel_basic = "(resname ARG LYS) and (name NH* NZ)"
sel_acidic = "(resname ASP GLU) and (name OE* OD*)"

# reference groups (first frame of the trajectory, but you could also use a
# separate PDB, eg crystal structure)
acidic = u.select_atoms(sel_acidic)
basic = u.select_atoms(sel_basic)

nc = contacts.Contacts(u, selection=(sel_acidic, sel_basic),
                       method=is_any_closer,
                       refgroup=(acidic, basic), kwargs={'dist': 2.5})
nc.run()

bound = nc.timeseries[:, 1]
frames = nc.timeseries[:, 0]

f, ax = plt.subplots()

ax.plot(frames, bound, '.')
ax.set(xlabel='frame', ylabel='is Bound',
       ylim=(-0.1, 1.1))

f.show()

4.2.2.2. Functions

MDAnalysis.analysis.contacts.hard_cut_q(r, cutoff)[source]

Calculate fraction of native contacts Q for a hard cut off.

The cutoff can either be a float or a ndarray of the same shape as r.

Parameters:
  • r (ndarray) – distance matrix
  • cutoff (ndarray | float) – cut off value to count distances. Can either be a float of a ndarray of the same size as distances
Returns:

Q – fraction of contacts

Return type:

float

MDAnalysis.analysis.contacts.soft_cut_q(r, r0, beta=5.0, lambda_constant=1.8)[source]

Calculate fraction of native contacts Q for a soft cut off

The native contact function is defined as [Best2013]

\[Q(r, r_0) = \frac{1}{1 + e^{\beta (r - \lambda r_0)}}\]

Reasonable values for different simulation types are

  • All Atom: lambda_constant = 1.8 (unitless)
  • Coarse Grained: lambda_constant = 1.5 (unitless)
Parameters:
  • r (array) – Contact distances at time t
  • r0 (array) – Contact distances at time t=0, reference distances
  • beta (float (default 5.0 Angstrom)) – Softness of the switching function
  • lambda_constant (float (default 1.8, unitless)) – Reference distance tolerance
Returns:

Q – fraction of native contacts

Return type:

float

References

[Best2013]RB Best, G Hummer, and WA Eaton, “Native contacts determine protein folding mechanisms in atomistic simulations” _PNAS_ 110 (2013), 17874–17879. doi: 10.1073/pnas.1311599110.
MDAnalysis.analysis.contacts.radius_cut_q(r, r0, radius)[source]

calculate native contacts Q based on the single distance radius.

Parameters:
  • r (ndarray) – distance array between atoms
  • r0 (ndarray) – unused to fullfill Contacts API
  • radius (float) – Distance between atoms at which a contact is formed
Returns:

Q – fraction of contacts

Return type:

float

References

[Franklin2007](1, 2, 3) Franklin, J., Koehl, P., Doniach, S., & Delarue, M. (2007). MinActionPath: Maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Research, 35(SUPPL.2), 477–482. doi: 10.1093/nar/gkm342
MDAnalysis.analysis.contacts.contact_matrix(d, radius, out=None)[source]

calculate contacts from distance matrix

Parameters:
  • d (array-like) – distance matrix
  • radius (float) – distance below which a contact is formed.
  • out (array (optional)) – If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
Returns:

contacts – boolean array of formed contacts

Return type:

ndarray

MDAnalysis.analysis.contacts.q1q2(u, selection=’all’, radius=4.5, start=None, stop=None, step=None)[source]

Perform a q1-q2 analysis.

Compares native contacts between the starting structure and final structure of a trajectory [Franklin2007].

Parameters:
  • u (Universe) – Universe with a trajectory
  • selection (string, optional) – atoms to do analysis on
  • radius (float, optional) – distance at which contact is formed
  • start (int, optional) – First frame of trajectory to analyse, Default: 0
  • stop (int, optional) – Last frame of trajectory to analyse, Default: -1
  • step (int, optional) – Step between frames to analyse, Default: 1
Returns:

contacts – Contact Analysis that is set up for a q1-q2 analysis

Return type:

Contacts

4.2.2.3. Classes

class MDAnalysis.analysis.contacts.Contacts(u, selection, refgroup, method=’hard_cut’, radius=4.5, kwargs=None, **basekwargs)[source]

Calculate contacts based observables.

The standard methods used in this class calculate the fraction of native contacts Q from a trajectory.

Contact API

By defining your own method it is possible to calculate other observables that only depend on the distances and a possible reference distance. The Contact API prescribes that this method must be a function with call signature func(r, r0, **kwargs) and must be provided in the keyword argument method.

timeseries

list – list containing Q for all refgroup pairs and analyzed frames

Parameters:
  • u (Universe) – trajectory
  • selection (tuple(string, string)) – two contacting groups that change over time
  • refgroup (tuple(AtomGroup, AtomGroup)) – two contacting atomgroups in their reference conformation. This can also be a list of tuples containing different atom groups
  • radius (float, optional (4.5 Angstroms)) – radius within which contacts exist in refgroup
  • method (string | callable (optional)) – Can either be one of ['hard_cut' , 'soft_cut'] or a callable with call signature func(r, r0, **kwargs) (the “Contacts API”).
  • kwargs (dict, optional) – dictionary of additional kwargs passed to method. Check respective functions for reasonable values.
  • start (int, optional) – First frame of trajectory to analyse, Default: None becomes 0.
  • stop (int, optional) – Frame index to stop analysis. Default: None becomes n_frames. Iteration stops before this frame number, which means that the trajectory would be read until the end.
  • step (int, optional) – Step between frames to analyse, Default: None becomes 1.
save(outfile)[source]

save contacts timeseries

Parameters:outfile (str) – file to save contacts

4.2.2.4. Deprecated

The following classes are deprecated and are scheduled for removal in release 0.17.0.

class MDAnalysis.analysis.contacts.ContactAnalysis1(*args, **kwds)[source]

Perform a very flexible native contact analysis with respect to a single reference.

This analysis class allows one to calculate the fraction of native contacts q between two arbitrary groups of atoms with respect to an arbitrary reference structure. For instance, as a reference one could take a crystal structure of a complex, and as the two groups atoms one selects two molecules A and B in the complex. Then the question to be answered by q is, is which percentage of the contacts between A and B persist during the simulation.

First prepare AtomGroup selections for the reference atoms; this example uses some arbitrary selections:

ref = Universe('crystal.pdb')
refA = ref.select_atoms('name CA and segid A and resid 6:100')
refB = ref.select_atoms('name CA and segid B and resid 1:40')

Load the trajectory:

u = Universe(topology, trajectory)

We then need two selection strings selA and selB that, when applied as u.select_atoms(selA) produce a list of atoms that is equivalent to the reference (i.e. u.select_atoms(selA) must select the same atoms as refA in this example):

selA = 'name CA and resid 1:95'     # corresponds to refA
selB = 'name CA and resid 150:189'  # corresponds to refB

Note

It is the user’s responsibility to provide a reference group (or groups) that describe equivalent atoms to the ones selected by selection.

Now we are ready to set up the analysis:

CA1 = ContactAnalysis1(u, selection=(selA,selB), refgroup=(refA,refB),
                       radius=8.0, outfile="q.dat")

If the groups do not match in length then a ValueError is raised.

The analysis across the whole trajectory is performed with

CA1.run()

Results are saved to outfile (framenumber q N per line) and can also be plotted with

CA1.plot()        # plots the time series q(t)
CA1.plot_qavg()   # plots the matrix of average contacts <q>

Description of computed values in the output file:

N
number of native contacts
q
fraction of native contacts relative to the reference

Deprecated since version 0.15.0.

__init__ is deprecated, use Contacts instead! This class will be removed in 0.17

Calculate native contacts within a group or between two groups.

Arguments:
topology

psf or pdb file

trajectory

dcd or xtc/trr file

universe

instead of a topology/trajectory combination, one can also supply a MDAnalysis.Universe

Keywords:
selection

selection string that determines which distances are calculated; if this is a tuple or list with two entries then distances are calculated between these two different groups [“name CA or name B*”]

refgroup

reference group, either a single AtomGroup (if there is only a single selection) or a list of two such groups. The reference contacts are directly computed from refgroup and hence the atoms in the reference group(s) must be equivalent to the ones produced by the selection on the input trajectory.

radius

contacts are deemed any atoms within radius [8.0 A]

outfile

name of the output file; with the gz or bz2 suffix, a compressed file is written. The average <q> is written to a second, gzipped file that has the same name with ‘array’ included. E.g. for the default name “q1.dat.gz” the <q> file will be “q1.array.gz”. The format is the matrix in column-row format, i.e. selection 1 residues are the columns and selection 2 residues are rows. The file can be read with np.loadtxt(). [“q1.dat.gz”]

The function calculates the percentage of native contacts q1 along a trajectory. “Contacts” are defined as the number of atoms within radius of a given other atom. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory).

The timeseries is written to a file outfile and is also accessible as the attribute ContactAnalysis1.timeseries.

load(filename)[source]

Load the data file.

output_exists(force=False)[source]

Return True if default output file already exists.

Disable with force=True (will always return False)

plot(filename=None, **kwargs)[source]

Plot q(t).

If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to pylab.plot().

plot_qavg(filename=None, **kwargs)[source]

Plot ContactAnalysis1.qavg, the matrix of average native contacts.

If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to pylab.imshow().

qN(q, out=None)[source]

Calculate native contacts relative to reference state.

q is the matrix of contacts (e.g. q).

If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.

This method is typically only used internally.

qarray(d, out=None)[source]

Return distance array with True for contacts.

d is the matrix of distances. The method uses the value of ContactAnalysis1.radius to determine if a distance < radius is considered a contact.

If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.

This method is typically only used internally.

run(store=True, force=False, start=0, stop=None, step=1, **kwargs)[source]

Analyze trajectory and produce timeseries.

Stores results in ContactAnalysis1.timeseries (if store=True) and writes them to a data file. The average q is written to a second data file.

start
The value of the first frame index in the trajectory to be used (default: index 0)
stop
The value of the last frame index in the trajectory to be used (default: None – use all frames)
step
The number of frames to skip during trajectory iteration (default: use every frame)
class MDAnalysis.analysis.contacts.ContactAnalysis(*args, **kwds)[source]

Perform a native contact analysis (“q1-q2”).

The analysis of the trajectory is performed with the ContactAnalysis.run() method. The result is stored in ContactAnalysis.timeseries. It is a numpy array which contains the frame number at index 0, q1 and q2 at index 1 and 2, and the total number of contacts in 3 and 4.

frame  q1 q2  n1 n2

The total number of contacts in the reference states 1 and 2 are stored in ContactAnalysis.nref (index 0 and 1).

The ContactAnalysis.run() method calculates the percentage of native contacts q1 and q2 along a trajectory. “Contacts” are defined as the number of Ca atoms (or per-residue centroids of a user defined selection) within radius of a primary Ca. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory) and q2 is the fraction of contacts relative to the conformation 2.

The timeseries is written to a bzip2-compressed file in targetdir named “basename(trajectory)infix_q1q2.dat.bz2” and is also accessible as the attribute ContactAnalysis.timeseries.

Parameters:
  • topology (filename as str) – topology file
  • trajectory (filename as str) – trajectory
  • ref1 (filename or None, optional) – structure of the reference conformation 1 (pdb); if None the first frame of the trajectory is chosen
  • ref2 (filename or None, optional) – structure of the reference conformation 2 (pdb); if None the last frame of the trajectory is chosen
  • radius (float, optional, default 8 A) – contacts are deemed any Ca within radius
  • targetdir (path, optional, default .) – output files are saved in this directory
  • infix (string, optional) –
    additional tag string that is inserted into the output filename of
    the data file
    selection : string, optional, default "name CA"
    MDAnalysis selection string that selects the particles of interest; the default is to only select the C-alpha atoms in ref1 and ref2

    Note

    If selection produces more than one atom per residue then you will get multiple contacts per residue unless you also set centroids = True

    centroids : bool
    If set to True, use the centroids for the selected atoms on a per-residue basis to compute contacts. This allows, for instance defining the sidechains as selection and then computing distances between sidechain centroids.

Deprecated since version 0.15.0.

__init__ is deprecated, use Contacts instead! This class will be removed in 0.17

get_distance_array(g, **kwargs)[source]

Calculate the self_distance_array for atoms in group g.

Parameters:
  • g (AtomGroup) – group of atoms to calculate distance array for
  • results (array, optional) – passed on to MDAnalysis.lib.distances.self_distance_array() as a preallocated array
  • centroids (bool, optional, default None) – True: calculate per-residue centroids from the selected atoms; False: consider each atom separately; None: use the class default for centroids [None]
load(filename)[source]

Load the data file.

output_exists(force=False)[source]

Return True if default output file already exists.

Disable with force=True (will always return False)

plot(**kwargs)[source]

Plot q1-q2.

qN(q, n, out=None)[source]

Calculate native contacts relative to reference state.

Note

This method is typically only used internally.

Parameters:
  • q (array) – contact matrix (see Contacts.qarray())
  • out (array, optional) – If out is supplied as a pre-allocated array of the correct shape then it will contain the contact matrix relative to the reference state, i.e. only those contacts that are also seen in the reference state.
Returns:

  • contacts (integer) – total number of contacts
  • fraction (float) – fraction of contacts relative to the reference state

qarray(d, out=None)[source]

Return array with True for contacts.

Note

This method is typically only used internally.

Parameters:
  • d (array) – 2D array of distances. The method uses the value of radius to determine if a distance < radius is considered a contact.
  • out (array, optional) – If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
Returns:

contact matrix

Return type:

array

run(store=True, force=False)[source]

Analyze trajectory and produce timeseries.

Stores results in ContactAnalysis.timeseries (if store=True) and writes them to a bzip2-compressed data file.