11.2.8. Helper functions — MDAnalysis.lib.util
Small helper functions that don’t fit anywhere else.
Changed in version 0.11.0: Moved mathematical functions into lib.mdamath
11.2.8.1. Files and directories
MDAnalysis.lib.util.filename(name, ext=None, keep=False)
Return a new name that has suffix attached; replaces other extensions.
Parameters:
- name (str or NamedStream) – filename; extension is replaced unless keep=True; name can also be a NamedStream (and its NamedStream.name will be changed accordingly)
- ext (None or str) – extension to use in the new filename
- keep (bool) – False: replace existing extension with ext; True: keep old extension if one existed
Changed in version 0.9.0: Also permits NamedStream to pass through.
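The extension rules can be illustrated with a minimal sketch for plain string names. filename_sketch is a hypothetical stand-in written for this page, not the library function; it leaves out the NamedStream case and only touches the last extension.

```python
import os


def filename_sketch(name, ext=None, keep=False):
    """Hypothetical re-implementation of filename() for plain strings."""
    if ext is None:
        return name  # nothing to attach
    if not ext.startswith(os.extsep):
        ext = os.extsep + ext  # accept both "gro" and ".gro"
    root, old_ext = os.path.splitext(name)
    if keep and old_ext:
        return name  # keep=True: an existing extension wins
    return root + ext  # replace (or add) the extension


print(filename_sketch("protein.pdb", "gro"))             # protein.gro
print(filename_sketch("protein.pdb", "gro", keep=True))  # protein.pdb
print(filename_sketch("protein", "gro", keep=True))      # protein.gro
```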
MDAnalysis.lib.util.openany(*args, **kwds)
Context manager for anyopen().
Open the datasource and close it when the context of the with statement exits.
datasource can be a filename or a stream (see isstream()). A stream is reset to its start if possible (via seek() or reset()).
The advantage of this function is that very different input sources ("streams") can be used for a "file", ranging from files on disk (including compressed files) to open file objects to sockets and strings, as long as they have a file-like interface.
Parameters:
- datasource (a file or a stream) – file name or stream to open
- mode ({'r', 'w'} (optional)) – open in r(ead) or w(rite) mode
- reset (bool (optional)) – try to read (mode 'r') the stream from the start [True]
Examples
Open a gzipped file and process it line by line:
    with openany("input.pdb.gz") as pdb:
        for line in pdb:
            if line.startswith('ATOM'):
                print(line)
Open a URL and read it:
    import urllib2
    with openany(urllib2.urlopen("http://www.mdanalysis.org/")) as html:
        print(html.read())
See also: anyopen()
MDAnalysis.lib.util.anyopen(datasource, mode='r', reset=True)
Open datasource (gzipped, bzipped, uncompressed) and return a stream.
datasource can be a filename or a stream (see isstream()). By default, a stream is reset to its start if possible (via seek() or reset()).
If possible, the attribute stream.name is set to the filename or "<stream>" if no filename could be associated with the datasource.
Parameters:
- datasource – a file (from file or open()) or a stream (e.g. from urllib2.urlopen() or cStringIO.StringIO)
- mode ({'r', 'w', 'a'} (optional)) – Open in r(ead), w(rite) or a(ppend) mode. More complicated modes ('r+', 'w+', …) are not supported; only the first letter of mode is used and thus any additional modifiers are silently ignored.
- reset (bool (optional)) – try to read (mode 'r') the stream from the start
Returns: a stream
Return type: file-like object
Changed in version 0.9.0: Only returns the stream and tries to set stream.name = filename instead of the previous behavior to return a tuple (stream, filename).
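The idea shared by anyopen() and openany() can be sketched with the standard library's gzip and bz2 modules. This is an illustration under stated assumptions, not the MDAnalysis code: anyopen_sketch and openany_sketch are hypothetical names, the decompressor is picked purely from the file name ending, files are opened in text mode, streams are always rewound (there is no reset keyword), and stream.name is not set.

```python
import bz2
import contextlib
import gzip


def anyopen_sketch(datasource, mode="r"):
    """Hypothetical sketch: open a path with a decompressor picked by extension."""
    if hasattr(datasource, "read") or hasattr(datasource, "write"):
        # already a stream: rewind it if the stream allows it, then hand it back
        try:
            datasource.seek(0)
        except (AttributeError, OSError, ValueError):
            pass
        return datasource
    mode = mode[0] + "t"  # only the first letter of mode counts; use text mode
    if datasource.endswith(".gz"):
        return gzip.open(datasource, mode)
    if datasource.endswith(".bz2"):
        return bz2.open(datasource, mode)
    return open(datasource, mode)


@contextlib.contextmanager
def openany_sketch(datasource, mode="r"):
    """Context-manager wrapper: the stream is closed when the with block exits."""
    stream = anyopen_sketch(datasource, mode)
    try:
        yield stream
    finally:
        stream.close()
```

Wrapping the opener with contextlib.contextmanager is what turns a function that returns a stream into a with-statement helper that always closes it, mirroring the relationship between anyopen() and openany() described above.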
MDAnalysis.lib.util.greedy_splitext(p)
Split extension in path p at the left-most separator.
Extensions are taken to be separated from the filename with the separator os.extsep (as used by os.path.splitext()).
Parameters: p (str) – path
Returns: (root, extension) – where root is the full path and filename with all extensions removed whereas extension is the string of all extensions.
Return type: tuple
Example
    >>> greedy_splitext("/home/joe/protein.pdb.bz2")
    ('/home/joe/protein', '.pdb.bz2')
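A "greedy" split can be obtained by calling os.path.splitext() repeatedly on the file name part until no extension is left. The sketch below (greedy_splitext_sketch is a hypothetical name) shows that approach; it is an illustration, not necessarily the library's exact code.

```python
import os


def greedy_splitext_sketch(p):
    """Strip *all* extensions from the file name part of path p."""
    head, root = os.path.split(p)  # never split extensions in directory names
    extension = ""
    while True:
        root, ext = os.path.splitext(root)
        if not ext:
            break
        extension = ext + extension  # prepend so the original order is kept
    return os.path.join(head, root), extension


print(greedy_splitext_sketch("protein.pdb.bz2"))  # ('protein', '.pdb.bz2')
```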
MDAnalysis.lib.util.which(program)
Determine full path of executable program on PATH.
(Jay at http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python)
Parameters: program (str) – name of the executable
Returns: path – absolute path to the executable if it can be found, else None
Return type: str or None
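The lookup can be sketched as: if program already contains a directory part, test that file directly; otherwise try every directory listed in PATH. which_sketch is a hypothetical name and the executable test (a regular file with os.X_OK permission) is an assumption about what "executable" means here. In Python 3.3+ the standard library's shutil.which() offers the same service.

```python
import os


def which_sketch(program):
    """Return the absolute path of an executable, or None if none is found."""
    def is_exe(path):
        return os.path.isfile(path) and os.access(path, os.X_OK)

    head, _ = os.path.split(program)
    if head:
        # a path was given: check that file only
        candidate = os.path.realpath(program)
        return candidate if is_exe(candidate) else None
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if is_exe(candidate):
            return os.path.abspath(candidate)
    return None
```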
MDAnalysis.lib.util.realpath(*args)
Join all args and return the real path, rooted at /.
Expands '~', '~user', and environment variables such as $HOME.
Returns None if any of the args is None.
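The described behavior maps onto a chain of os.path calls: join, expand variables, expand the user directory, then resolve symlinks. realpath_sketch is a hypothetical name; the order of the expansions is an assumption.

```python
import os


def realpath_sketch(*args):
    """Join args, expand $VARS and ~, and resolve to an absolute real path."""
    if None in args:
        return None  # documented: any None argument gives None
    joined = os.path.join(*args)
    return os.path.realpath(os.path.expanduser(os.path.expandvars(joined)))
```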
MDAnalysis.lib.util.guess_format(filename)
Return the format of filename.
The current heuristic simply looks at the filename extension and can work around compressed format extensions.
Parameters: filename (str or stream) – path to the file or a stream, in which case filename.name is looked at for a hint to the format
Returns: format – format specifier (upper case)
Return type: str
Raises: ValueError – if the heuristics are insufficient to guess a supported format
New in version 0.11.0: Moved into lib.util
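The extension heuristic, including looking through a compression suffix, can be sketched as follows. guess_format_sketch is hypothetical, and SUPPORTED_FORMATS and COMPRESSION_EXTENSIONS are small illustrative sets assumed for this example; the real lists of supported formats live inside MDAnalysis.

```python
import os

# Illustrative values only (assumption); not MDAnalysis' actual tables.
SUPPORTED_FORMATS = {"PDB", "GRO", "PSF", "DCD", "XTC"}
COMPRESSION_EXTENSIONS = {"gz", "bz2"}


def guess_format_sketch(filename):
    """Guess an upper-case format specifier from a file name or a named stream."""
    name = getattr(filename, "name", filename)  # streams: look at their .name
    root, ext = os.path.splitext(name)
    ext = ext.lstrip(os.extsep).lower()
    if ext in COMPRESSION_EXTENSIONS:
        # look through the compression suffix: "x.pdb.gz" -> "pdb"
        root, ext = os.path.splitext(root)
        ext = ext.lstrip(os.extsep).lower()
    fmt = ext.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("cannot guess a supported format for {!r}".format(name))
    return fmt
```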
11.2.8.2. Streams
Many of the readers are not restricted to just reading files. They can
also use gzip-compressed or bzip2-compressed files (through the
internal use of openany()). It is also possible to provide more
general streams as inputs, such as cStringIO.StringIO() instances
(essentially, a memory buffer), by wrapping these instances
into a NamedStream. This NamedStream can then be
used in place of an ordinary file name, typically with an
MDAnalysis.core.universe.Universe, but it is also possible to
write to such a stream using MDAnalysis.Writer().
In the following example, we use a PDB stored as a string pdb_s
:
import MDAnalysis
from MDAnalysis.lib.util import NamedStream
import cStringIO
pdb_s = "TITLE     Lonely Ion\nATOM      1  NA  NA+     1      81.260  64.982  10.926  1.00  0.00\n"
u = MDAnalysis.Universe(NamedStream(cStringIO.StringIO(pdb_s), "ion.pdb"))
print(u)
# <Universe with 1 atoms>
print(u.atoms.positions)
# [[ 81.26000214 64.98200226 10.92599964]]
It is important to provide a proper pseudo file name with the correct extension
(".pdb") to NamedStream because the file type recognition uses the
extension of the file name to determine the file format. Alternatively,
provide the format="pdb" keyword argument to the
Universe.
The use of streams becomes more interesting when MDAnalysis is used as glue
between different analysis packages and when one can arrange things so that
intermediate frames (typically in the PDB format) are not written to disk but
remain in memory via e.g. cStringIO
buffers.
Note
A remote connection created by urllib2.urlopen()
is not seekable
and therefore will often not work as an input. But try it…
class MDAnalysis.lib.util.NamedStream(stream, filename, reset=True, close=False)
Stream that also provides a (fake) name.
By wrapping a stream stream in this class, it can be passed to code that uses inspection of the filename to make decisions. For instance, os.path.split() will work correctly on a NamedStream.
The class can be used as a context manager.
NamedStream is derived from io.IOBase (to indicate that it is a stream). Many operations that normally expect a string will also work with a NamedStream; for instance, most of the functions in os.path will work with the exception of os.path.expandvars() and os.path.expanduser(), which will return the NamedStream itself instead of a string if no substitutions were made.
Example
Wrap a cStringIO.StringIO() instance to write to:
    import cStringIO
    import os.path
    stream = cStringIO.StringIO()
    f = NamedStream(stream, "output.pdb")
    print(os.path.splitext(f))
Wrap a file instance to read from:
    stream = open("input.pdb")
    f = NamedStream(stream, stream.name)
Use as a context manager (with close=True, the stream is closed automatically when the with block is left):
    with NamedStream(open("input.pdb"), "input.pdb", close=True) as f:
        # use f
        print(f.closed)  # --> False
        # ...
    print(f.closed)  # --> True
Note
This class uses its own __getitem__() method so if stream implements stream.__getitem__() then that will be masked and this class should not be used.
Warning
By default, NamedStream.close() will not close the stream but instead reset() it to the beginning. [1] Provide the force=True keyword to NamedStream.close() to always close the stream.
Initialize the NamedStream from a stream and give it a name.
The constructor attempts to rewind the stream to the beginning unless the keyword reset is set to False. If rewinding fails, an MDAnalysis.StreamWarning is issued.
Parameters:
- stream (stream) – an open stream (e.g. file or cStringIO.StringIO())
- filename (str) – the filename that should be associated with the stream
- reset (bool (optional)) – start the stream from the beginning (either reset() or seek()) when the class instance is constructed
- close (bool (optional)) – close the stream when a with block exits or when close() is called; note that the default is not to close the stream
Notes
By default, this stream will not be closed by with and close() (see there) unless the close keyword is set to True.
New in version 0.9.0.
close(force=False)
Reset or close the stream.
If NamedStream.close_stream is set to False (the default) then this method will not close the stream and only reset() it.
If the force=True keyword is provided, the stream will be closed.
Note
This close() method is non-standard. del NamedStream always closes the underlying stream.
closed
True if stream is closed.
fileno()
Return the underlying file descriptor (an integer) of the stream if it exists.
An IOError is raised if the IO object does not use a file descriptor.
flush()
Flush the write buffers of the stream if applicable.
This does nothing for read-only and non-blocking streams. For file objects one also needs to call os.fsync() to write contents to disk.
readable()
Return True if the stream can be read from.
If False, read() will raise IOError.
seek(offset, whence=0)
Change the stream position to the given byte offset.
Parameters:
- offset (int) – offset is interpreted relative to the position indicated by whence.
- whence ({0, 1, 2} (optional)) – Values for whence are:
  - io.SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive
  - io.SEEK_CUR or 1 – current stream position; offset may be negative
  - io.SEEK_END or 2 – end of the stream; offset is usually negative
Returns: the new absolute position in bytes.
Return type: int
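The three whence values follow the standard io convention. The short example below uses io.BytesIO from the standard library (not NamedStream itself) to show how each one moves the position and what seek() returns.

```python
import io

buf = io.BytesIO(b"0123456789")
pos_start = buf.seek(3)               # io.SEEK_SET (0): absolute offset -> 3
pos_cur = buf.seek(2, io.SEEK_CUR)    # io.SEEK_CUR (1): move forward by 2 -> 5
pos_end = buf.seek(-4, io.SEEK_END)   # io.SEEK_END (2): 4 bytes before the end -> 6
tail = buf.read()                     # everything from position 6: b"6789"
```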
seekable()
Return True if the stream supports random access.
If False, seek(), tell() and truncate() will raise IOError.
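The central NamedStream ideas (a name attached to a stream, close() that only rewinds unless close=True or force=True, and context-manager use) can be sketched in Python 3 as a small io.IOBase subclass. MinimalNamedStream is a hypothetical, stripped-down stand-in, not the MDAnalysis class; it relies on the __fspath__ protocol (Python 3.6+) so that os.path functions accept it, which is an assumption about how such a wrapper can work, not a description of the library internals.

```python
import io
import os


class MinimalNamedStream(io.IOBase):
    """Hypothetical, stripped-down stand-in for NamedStream (not the real class)."""

    def __init__(self, stream, filename, reset=True, close=False):
        super().__init__()
        self.stream = stream
        self.name = filename
        self.close_stream = close
        if reset:
            self.reset()

    def reset(self):
        self.stream.seek(0)  # rewind to the beginning

    def __fspath__(self):
        return self.name  # lets os.path functions treat the wrapper as a path

    def read(self, *args):
        return self.stream.read(*args)

    def write(self, data):
        return self.stream.write(data)

    def close(self, force=False):
        # default: only rewind; really close with close=True or force=True
        if self.close_stream or force:
            self.stream.close()
        else:
            self.reset()

    @property
    def closed(self):
        return self.stream.closed
```

Because io.IOBase.__exit__ calls close(), leaving a with block rewinds the wrapped stream by default and only closes it when the wrapper was created with close=True, matching the default described above.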
MDAnalysis.lib.util.isstream(obj)
Detect if obj is a stream.
We consider anything a stream that has the methods
- close()
and either set of the following
- read(), readline(), readlines()
- write(), writeline(), writelines()
Parameters: obj (stream or str)
Returns: True if obj is a stream, False otherwise
Return type: bool
See also
New in version 0.9.0.
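The rule above is a duck-typing test and can be written down literally. isstream_sketch is a hypothetical name. Taken literally, the write set includes writeline(), which standard Python file objects such as io.StringIO do not provide, so ordinary file objects qualify through the read set.

```python
def isstream_sketch(obj):
    """Duck-typing test following the rule stated above (a sketch)."""
    def has_all(names):
        return all(callable(getattr(obj, name, None)) for name in names)

    if not has_all(("close",)):
        return False
    # either the complete read set or the complete write set must be present
    return (has_all(("read", "readline", "readlines"))
            or has_all(("write", "writeline", "writelines")))
```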
11.2.8.3. Containers and lists
MDAnalysis.lib.util.iterable(obj)
Returns True if obj can be iterated over and is neither a string nor a NamedStream.
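The reason for excluding strings is that a file name would otherwise be treated as a sequence of characters. A minimal sketch, assuming iter() as the iterability test (the library may check differently) and omitting the NamedStream exclusion; iterable_sketch is a hypothetical name.

```python
def iterable_sketch(obj):
    """True for iterables other than strings (sketch; ignores NamedStream)."""
    if isinstance(obj, (str, bytes)):
        return False  # do not treat a file name as a sequence of characters
    try:
        iter(obj)
    except TypeError:
        return False
    return True
```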
11.2.8.4. File parsing
class MDAnalysis.lib.util.FORTRANReader(fmt)
FORTRANReader provides a method to parse FORTRAN formatted lines in a file.
The contents of lines in a file can be parsed according to FORTRAN format edit descriptors (see Fortran Formats for the syntax).
Only simple one-character specifiers are supported here: I F E A X (see FORTRAN_format_regex).
Strings are stripped of leading and trailing white space.
Set up the reader with the FORTRAN format string.
The string fmt should look like '2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10'.
Parameters: fmt (str) – FORTRAN format edit descriptor for a line as described in Fortran Formats
Example
Parsing of a standard CRD file:
    atomformat = FORTRANReader('2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10')
    for line in open('coordinates.crd'):
        serial, TotRes, resName, name, x, y, z, chainID, resSeq, tempFactor = atomformat.read(line)
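The parsing scheme (split the format at commas, decode each edit descriptor, then slice the line into fixed-width fields and convert them) can be sketched with the regular expression given as FORTRAN_format_regex. parse_descriptor, read_fortran_line and the CONVERTERS mapping are hypothetical names for this illustration; number_of_matches() is not covered, and treating a missing repeat count as 1 is an assumption.

```python
import re

# Same pattern as the value of FORTRAN_format_regex.
FORTRAN_RE = re.compile(
    r"(?P<repeat>\d+?)(?P<format>[IFEAX])"
    r"(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?")

# How each field type becomes a Python value (assumed mapping).
CONVERTERS = {"I": int, "F": float, "E": float, "A": str.strip}


def parse_descriptor(desc):
    """Parse one edit descriptor such as '2I10', 'A8' or '2X'."""
    desc = desc.strip().upper()
    # the pattern needs a repeat count, so retry with an explicit "1" prefix
    m = FORTRAN_RE.fullmatch(desc) or FORTRAN_RE.fullmatch("1" + desc)
    if m is None:
        raise ValueError("unrecognized FORTRAN format {!r}".format(desc))
    fmt = m.group("format")
    if fmt == "X":
        length = 1  # nX skips n single columns
    elif m.group("length") is None:
        raise ValueError("field width missing in {!r}".format(desc))
    else:
        length = int(m.group("length"))
    repeat = int(m.group("repeat"))
    return {"repeat": repeat, "format": fmt, "length": length,
            "totallength": repeat * length}


def read_fortran_line(fmt, line):
    """Split a fixed-width line according to a comma-separated format."""
    values, start = [], 0
    for d in (parse_descriptor(item) for item in fmt.split(",")):
        if d["format"] == "X":
            start += d["totallength"]  # blank columns carry no value
            continue
        convert = CONVERTERS[d["format"]]
        for _ in range(d["repeat"]):
            # int("   ") raises ValueError, as documented for read()
            values.append(convert(line[start:start + d["length"]]))
            start += d["length"]
    return values
```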
number_of_matches(line)
Return how many format entries could be populated with legal values.
parse_FORTRAN_format(edit_descriptor)
Parse the descriptor.
Parameters: edit_descriptor (str) – FORTRAN format edit descriptor
Returns: dict with totallength (in chars), repeat, length, format, decimals
Return type: dict
Raises: ValueError – The edit_descriptor is not recognized and cannot be parsed
Note
Specifiers: L ES EN T TL TR / r S SP SS BN BZ are not supported, and neither are the scientific notation Ew.dEe forms.
read(line)
Parse line according to the format string and return list of values.
Values are converted to Python types according to the format specifier.
Parameters: line (str)
Returns: list of entries with appropriate types
Return type: list
Raises: ValueError – Any of the conversions cannot be made (e.g. space for an int)
See also
MDAnalysis.lib.util.FORTRAN_format_regex = '(?P<repeat>\d+?)(?P<format>[IFEAX])(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?'
Regular expression (see re) to parse a simple FORTRAN edit descriptor.
(?P<repeat>\d?)(?P<format>[IFELAX])(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?
11.2.8.5. Data manipulation and handling
MDAnalysis.lib.util.fixedwidth_bins(delta, xmin, xmax)
Return bins of width delta that cover xmin, xmax (or a larger range).
The bin parameters are computed such that the bin size delta is guaranteed. In order to achieve this, the range [xmin, xmax] can be increased.
Bins can be calculated for 1D data (then all parameters are simple floats) or nD data (then parameters are supplied as arrays, with each entry corresponding to one dimension).
Parameters:
- delta (float or array) – bin width
- xmin (float or array) – lower bound of the range
- xmax (float or array) – upper bound of the range
Returns: The dict contains 'Nbins', 'delta', 'min', and 'max'; these are either floats or arrays, depending on the input.
Return type: dict
Example
Use with numpy.histogram():
    B = fixedwidth_bins(delta, xmin, xmax)
    h, e = np.histogram(data, bins=B['Nbins'], range=(B['min'], B['max']))
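For 1D input the computation can be sketched as: take enough bins of width delta to cover xmax - xmin, then widen the range by the leftover amount. fixedwidth_bins_sketch is a hypothetical name, and splitting the extra width evenly between both ends is an assumption about how the range is enlarged.

```python
import math


def fixedwidth_bins_sketch(delta, xmin, xmax):
    """1D sketch: bins of exact width delta covering [xmin, xmax]."""
    if not xmin < xmax:
        raise ValueError("need xmin < xmax")
    length = xmax - xmin
    nbins = math.ceil(length / delta)        # enough bins to cover the range
    excess = 0.5 * (nbins * delta - length)  # spread the overhang over both ends
    return {"Nbins": nbins, "delta": delta,
            "min": xmin - excess, "max": xmax + excess}


B = fixedwidth_bins_sketch(1.0, 0.0, 9.5)
# Nbins = 10, min = -0.25, max = 9.75: the range grew so that the width stays 1.0
```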
11.2.8.6. Strings
MDAnalysis.lib.util.convert_aa_code(x)
Converts between 3-letter and 1-letter amino acid codes.
Parameters: x (str) – 1-letter or 3-letter amino acid code
Returns: 3-letter or 1-letter amino acid code
Return type: str
Raises: ValueError – No conversion can be made; the amino acid code is not defined.
Note
Data are defined in amino_acid_codes and inverse_aa_codes.
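The conversion amounts to two lookup tables, one per direction, with the code length selecting the table. The sketch uses a hypothetical four-entry subset (ONE_TO_THREE, THREE_TO_ONE) and a hypothetical function name; the library keeps its full data in amino_acid_codes and inverse_aa_codes.

```python
# Hypothetical subset of the code tables, for illustration only.
ONE_TO_THREE = {"A": "ALA", "G": "GLY", "K": "LYS", "W": "TRP"}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def convert_aa_code_sketch(x):
    """Map 1-letter -> 3-letter codes and 3-letter -> 1-letter codes."""
    if len(x) == 1:
        table = ONE_TO_THREE
    elif len(x) == 3:
        table = THREE_TO_ONE
    else:
        raise ValueError("code must have 1 or 3 letters: {!r}".format(x))
    try:
        return table[x]
    except KeyError:
        raise ValueError("no amino acid code defined for {!r}".format(x)) from None
```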
MDAnalysis.lib.util.parse_residue(residue)
Process residue string.
Parameters: residue (str) – The residue must contain a 1-letter or 3-letter or 4-letter residue string, a number (the resid) and optionally an atom identifier, which must be separated from the residue with a colon (":"). White space is allowed in between.
Returns: (3-letter aa string, resid, atomname); known 1-letter aa codes are converted to 3-letter codes
Return type: tuple
Examples
- "LYS300:HZ1" --> ("LYS", 300, "HZ1")
- "K300:HZ1" --> ("LYS", 300, "HZ1")
- "K300" --> ("LYS", 300, None)
- "4GB300:H6O" --> ("4GB", 300, "H6O")
- "4GB300" --> ("4GB", 300, None)
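The documented rules can be captured by one regular expression. In this sketch, RESIDUE_RE, ONE_TO_THREE and parse_residue_sketch are hypothetical names; the pattern assumes upper-case names where a 3/4-letter name may start with a digit but otherwise consists of letters (so the resid digits are not swallowed), and the code table is a tiny subset that covers the examples above.

```python
import re

# Hypothetical subset of 1-letter -> 3-letter codes (enough for the examples).
ONE_TO_THREE = {"A": "ALA", "G": "GLY", "K": "LYS"}

RESIDUE_RE = re.compile(
    r"^\s*(?P<aa>[0-9A-Z][A-Z]{2,3}|[A-Z])"  # 3/4-letter name or 1-letter code
    r"\s*(?P<resid>\d+)"                      # resid
    r"\s*(?::\s*(?P<atom>\w+))?\s*$"          # optional ":atomname"
)


def parse_residue_sketch(residue):
    m = RESIDUE_RE.match(residue)
    if m is None:
        raise ValueError("cannot parse residue {!r}".format(residue))
    aa = m.group("aa")
    if len(aa) == 1:
        aa = ONE_TO_THREE[aa]  # KeyError for codes missing from the subset
    return aa, int(m.group("resid")), m.group("atom")
```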
11.2.8.7. Class decorators
MDAnalysis.lib.util.cached(key)
Cache a property within a class.
Requires the class to have a cache dict called _cache.
Example
How to add a cache for a variable to a class by using the @cached decorator:
    class A(object):
        def __init__(self):
            self._cache = dict()

        @property
        @cached('keyname')
        def size(self):
            # This code gets run only if the lookup of keyname fails
            # After this code has been run once, the result is stored in
            # _cache with the key: 'keyname'
            size = 10.0
            return size
New in version 0.9.0.
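A decorator with this behavior can be sketched as a lookup in self._cache that falls back to running the wrapped method once. cached_sketch and the Box class are hypothetical; the sketch assumes the cached value depends only on key (method arguments are not part of the cache key), which fits the property use case shown above.

```python
import functools


def cached_sketch(key):
    """Decorator: store the method's result in self._cache[key] on first call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return self._cache[key]
            except KeyError:
                value = self._cache[key] = func(self, *args, **kwargs)
                return value
        return wrapper
    return decorator


class Box:
    def __init__(self):
        self._cache = {}
        self.calls = 0

    @property
    @cached_sketch("volume")
    def volume(self):
        self.calls += 1  # counts how often the expensive code really runs
        return 10.0 ** 3
```

Clearing self._cache (or deleting a single key) is enough to force a recomputation on the next access.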
Footnotes
[1] The reason why NamedStream.close() does
not close a stream by default (but just rewinds it to the
beginning) is so that one can use the class NamedStream as
a drop-in replacement for file names, which are often re-opened
(e.g. when the same file is used as a topology and coordinate file
or when repeatedly iterating through a trajectory in some
implementations). The close=True keyword can be supplied in
order to make NamedStream.close() actually close the
underlying stream and NamedStream.close(force=True) will also
close it.