PyOpenMS offers Python bindings to a large part of the OpenMS API.
Install Instructions
We offer pre-built packages on PyPI (the pyopenms package), which do not require compilation. If you want to use pyOpenMS in production, we recommend following the binary installation instructions for your platform on PyPI.
Build Instructions
In order to configure and build pyOpenMS successfully from source, you will need to follow these steps. Please note that compiling pyOpenMS requires substantial memory resources. Currently, Python 2.7 as well as 3.x are officially supported.
- Install Python (preferably 2.7, but it may also run with Python 2.6). Alternatively, Python 3.x also works and the same build instructions apply. On Microsoft Windows, use Anaconda.
- On Microsoft Windows, you need the 64-bit C++ compiler from Visual Studio 2008. This is important: otherwise you get a different C runtime library than the one Python 2.7 is built with, and pyOpenMS will crash on import.
- The easiest way to install all Python packages on which pyOpenMS depends is through virtualenv:
$ virtualenv pyopenms_venv
$ source pyopenms_venv/bin/activate
$ pip install -U setuptools
$ pip install -U pip
$ pip install -U autowrap
$ pip install -U nose
$ pip install -U numpy
$ pip install -U wheel
If this worked for you, you can directly skip to the "configure" step. On Microsoft Windows, you will have virtualenv if you install through Anaconda.
- If not using virtualenv, first install setuptools, see https://pypi.python.org/pypi/setuptools (you will need at least version 0.12).
- If not using virtualenv, install pip and use it to install the other required Python modules:
$ easy_install pip
$ pip install autowrap
$ pip install nose
If Cython doesn't get installed, install it with
Note that when using pip without root permissions, you have to add a path prefix: --install-option="--prefix=/path/to/local/python/"
- If not using virtualenv, install numpy next:
- Configure OpenMS with pyOpenMS: execute cmake as usual, but with the parameter "-DPYOPENMS=ON". Also, if using virtualenv, add "-DPYTHON_EXECUTABLE:FILEPATH=`which python`" to ensure that the correct Python executable is used.
On Windows, add "-D CMAKE_BUILD_TYPE=Release", as the standard python27.dll is built in release mode.
- Build pyOpenMS (now there should be pyOpenMS-specific build targets):
On Linux, ensure that the directory containing libOpenMS.so is listed in your $LD_LIBRARY_PATH (the library needs to be accessible to Python).
- Run the Python-specific tests to make sure that everything went well:
$ cd pyOpenMS
$ run_nose.py
Run the memory leak test:
- Optionally, if you want to install locally:
$ python setup.py install
If you want to build Python installers:
$ make pyopenms_bdist_egg
or
You will find the built installer files in pyOpenMS/dist.
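Before starting a long build, it can help to check the prerequisites named in the steps above. The following is a minimal sketch, assuming Python 3; the helper names missing_packages and find_library are hypothetical and not part of pyOpenMS. It only checks that the packages from the virtualenv step can be found and that libOpenMS.so sits in a directory listed in $LD_LIBRARY_PATH.

```python
import importlib.util
import os

# Packages installed in the virtualenv step of the build instructions.
REQUIRED_PACKAGES = ["setuptools", "pip", "autowrap", "nose", "numpy", "wheel"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the subset of `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def find_library(lib_name="libOpenMS.so", search_path=None):
    """Return the first directory on the search path containing lib_name, or None.

    By default the search path is taken from $LD_LIBRARY_PATH (Linux).
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            return directory
    return None


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    print("libOpenMS.so found in:", find_library())
```

An empty list from missing_packages() and a directory from find_library() suggest that the environment is ready for the configure step; neither check replaces running the actual tests.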
Wrapping Workflow and wrapping new Classes
How pyOpenMS wraps C++ classes
General concept of how the wrapping is done (all files are in src/pyOpenMS/):
- Step 1: The author declares which classes and which functions of these classes they want to wrap (expose to Python). This is done by writing the function declarations in a file in the pxds/ folder.
- Step 2: The Python tool "autowrap" (developed for this project) creates the wrapping code automatically from the function declarations; see https://github.com/uweschmitt/autowrap for an explanation of the autowrap tool. Since not all code can be wrapped automatically, manual code can also be written in the addons/ folder. Autowrap will create an output file at pyopenms/pyopenms.pyx, which can be interpreted by Cython.
- Step 3: Cython translates pyopenms/pyopenms.pyx to C++ code at pyopenms/pyopenms.cpp.
- Step 4: A compiler compiles the C++ code to a Python module, which is then importable in Python with import pyopenms.
Maintaining existing wrappers: If the C++ API changes, pyOpenMS will no longer build. In that case, find the corresponding file in the pyOpenMS/pxds/ folder and adjust the function declaration accordingly.
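When a build fails, it is often useful to see how far the four steps got. The sketch below is an illustration only: the helper pipeline_status is hypothetical, and it assumes the folder and file names given in the steps above, relative to src/pyOpenMS/. It reports which of the intermediate artifacts exist.

```python
import os

# Artifacts named in steps 1-3, relative to src/pyOpenMS/.
PIPELINE_ARTIFACTS = [
    ("step 1: declarations (pxds/)", "pxds"),
    ("step 2: manual code (addons/)", "addons"),
    ("step 2: autowrap output", os.path.join("pyopenms", "pyopenms.pyx")),
    ("step 3: Cython output", os.path.join("pyopenms", "pyopenms.cpp")),
]


def pipeline_status(root):
    """Map each stage label to True if its artifact exists under `root`."""
    return {
        label: os.path.exists(os.path.join(root, relative_path))
        for label, relative_path in PIPELINE_ARTIFACTS
    }


if __name__ == "__main__":
    for label, present in pipeline_status(os.path.join("src", "pyOpenMS")).items():
        print("%-32s %s" % (label, "ok" if present else "missing"))
```

A missing pyopenms.pyx points at autowrap (step 2), while a present pyopenms.pyx with a missing pyopenms.cpp points at Cython (step 3).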
How to wrap new classes
A simple example
To wrap a new OpenMS class, create a new "pxd" file in the folder ./pxds. As a small example, look at CVTerm.pxd to get started. Start with the following structure:
from xxx cimport *
cdef extern from "<OpenMS/path/to/header/Classname.h>" namespace "OpenMS":
    cdef cppclass ClassName(DefaultParamHandler):
        # wrap-inherits:
        #    DefaultParamHandler
        ClassName() nogil except +
        ClassName(ClassName) nogil except +
- Make sure to use ClassName: instead of ClassName(DefaultParamHandler) to wrap a class that does not inherit from another class, and also remove the two comments regarding inheritance below that line.
- Always use cimport and not the Python import.
- Always add the default constructor AND the copy constructor to the code (note that the C++ compiler will add a default copy constructor to any class, so there is always one even if it is not declared; see http://www.cplusplus.com/articles/y8hv0pDG/: "The implicit copy constructor does a member-wise copy of the source object.").
- To expose a function to Python, copy its signature to your pxd file, e.g. DataValue getValue(), and make sure you cimport all corresponding classes. Replace std::vector with the corresponding vector from libcpp.vector (see for example PepXMLFile.pxd).
- Remember to include a copy constructor (even if none was declared in the C++ header file), since Cython will need it for certain operations. Otherwise you might see error messages like
item0.inst = shared_ptr[_ClassName](new _ClassName(deref(it_terms))) Call with wrong number of arguments
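The structure and rules above are regular enough that a starting point can be generated. The following sketch is not part of pyOpenMS or autowrap; make_pxd_skeleton is a hypothetical helper that only fills in the template shown above, always emitting both the default and the copy constructor. It assumes that a base class, if given, has its own pxd file of the same name to cimport from.

```python
def make_pxd_skeleton(class_name, header, base=None):
    """Return the text of a minimal pxd declaration for `class_name`.

    header: path of the C++ header, e.g. "OpenMS/path/to/header/Classname.h"
    base:   optional name of a wrapped base class (assumed to have its own pxd)
    """
    lines = []
    if base is not None:
        # Always cimport, never a plain Python import.
        lines.append("from %s cimport *" % base)
    lines.append('cdef extern from "<%s>" namespace "OpenMS":' % header)
    if base is not None:
        lines.append("    cdef cppclass %s(%s):" % (class_name, base))
        lines.append("        # wrap-inherits:")
        lines.append("        #    %s" % base)
    else:
        # No base class: plain declaration, no inheritance comments.
        lines.append("    cdef cppclass %s:" % class_name)
    # Default constructor AND copy constructor, as required above.
    lines.append("        %s() nogil except +" % class_name)
    lines.append("        %s(%s) nogil except +" % (class_name, class_name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pxd_skeleton("ClassName",
                            "OpenMS/path/to/header/Classname.h",
                            base="DefaultParamHandler"))
```

The generated text still needs the method signatures to be copied in by hand, together with the cimport lines for every class they mention.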
A further example
A slightly more complicated class could look like this, where we demonstrate how to handle templated classes and static methods:
from xxx cimport *
from AbstractBaseClass cimport *
from AbstractBaseClassImpl1 cimport *
from AbstractBaseClassImpl2 cimport *
cdef extern from "<OpenMS/path/to/header/Classname.h>" namespace "OpenMS":
    cdef cppclass ClassName[T](DefaultParamHandler):
        # wrap-inherits:
        #    DefaultParamHandler
        #
        # wrap-instances:
        #    ClassName := ClassName[X]
        #    ClassNameY := ClassName[Y]
        ClassName() nogil except +
        ClassName(ClassName[T]) nogil except + # wrap-ignore
        void method_name(int param1, double param2) nogil except +
        T method_returns_template_param() nogil except +
        size_t size() nogil except +
        T operator[](int) nogil except + # wrap-upper-limit:size()
        libcpp_vector[T].iterator begin() nogil except + # wrap-iter-begin:__iter__(T)
        libcpp_vector[T].iterator end() nogil except + # wrap-iter-end:__iter__(T)
        void getWidgets(libcpp_vector[String] & keys) nogil except +
        void getWidgets(libcpp_vector[unsigned int] & keys) nogil except + # wrap-as:getWidgetsAsIntegers
        # C++ signature: void process(AbstractBaseClass * widget)
        void process(AbstractBaseClassImpl1 * widget) nogil except +
        void process(AbstractBaseClassImpl2 * widget) nogil except +
cdef extern from "<OpenMS/path/to/header/Classname.h>" namespace "OpenMS::Classname<OpenMS::X>":
    void static_method_name(int param1, double param2) nogil except + # wrap-attach:ClassName
cdef extern from "<OpenMS/path/to/header/Classname.h>" namespace "OpenMS::Classname<OpenMS::Y>":
    void static_method_name(int param1, double param2) nogil except + # wrap-attach:ClassNameY
Here the copy constructor will not be wrapped, but the Cython parser will import it from C++ so that it is present (using wrap-ignore). The operator[] will return an object of type X or Y depending on the template argument and contains a guard that the index may not exceed size().
The wrapping of iterators allows for iteration over the objects inside the Classname container using the appropriate Python function (here __iter__ with the indicated return type T).
The wrap-as keyword allows the Python function to assume a different name.
Note that pointers to abstract base classes can be passed as arguments, but the implementing classes have to be known at compile time. For example, the function process takes a pointer to AbstractBaseClass, which has two known implementations, AbstractBaseClassImpl1 and AbstractBaseClassImpl2. The function then needs to be declared and overloaded with both implementations as arguments, as shown above.
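To make the effect of these hints more concrete, here is a pure-Python stand-in for how such a wrapped class might behave from the caller's side. This is a mock written for illustration, not the code that autowrap generates: the class MockWrappedContainer and the widget values are made up, and the assumption that an out-of-range index surfaces as an IndexError should be checked against the generated pyopenms.pyx.

```python
class MockWrappedContainer:
    """Pure-Python mock of the Python-facing behaviour described above."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def __getitem__(self, index):
        # wrap-upper-limit:size() -> the index is checked against size()
        # before the C++ operator[] would be called.
        if index < 0 or index >= self.size():
            raise IndexError("invalid index %d" % index)
        return self._items[index]

    def __iter__(self):
        # wrap-iter-begin / wrap-iter-end -> Python iteration over the items.
        return iter(self._items)

    def getWidgets(self, keys):
        # Overload taking a vector of String; fills `keys` in place.
        keys.extend("widget%d" % i for i in range(self.size()))

    def getWidgetsAsIntegers(self, keys):
        # Second overload, exposed under another name via wrap-as.
        keys.extend(range(self.size()))


if __name__ == "__main__":
    container = MockWrappedContainer([1.5, 2.5])
    print(list(container))      # iteration via __iter__
    print(container[1])         # guarded element access
    names, numbers = [], []
    container.getWidgets(names)
    container.getWidgetsAsIntegers(numbers)
    print(names, numbers)
```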
An example with handwritten addon code
A more complex example requires some hand-written wrapper code (pxds/Classname.pxd), for example for singletons that implement a getInstance() method that returns a pointer to the singleton resource. Note that in this case it is important not to let autowrap take over the pointer and possibly delete it when the lifetime of the Python object ends (leading to segfaults in Python).
from xxx cimport *
cdef extern from "<OpenMS/path/to/header/Classname.h>" namespace "OpenMS":
    cdef cppclass ModificationsDB "OpenMS::ModificationsDB":
        # wrap-manual-memory
        # wrap-hash:
        #    getFullId().c_str()
        ClassName(ClassName[T]) nogil except + # wrap-ignore
        void method_name(int param1, double param2) nogil except +
        int process(libcpp_vector[Peak1D].iterator, libcpp_vector[Peak1D].iterator) nogil except + # wrap-ignore
cdef extern from "<OpenMS/path/to/header/Classname.h>" namespace "OpenMS::Classname":
    const ClassName* getInstance() nogil except + # wrap-ignore
Here the wrap-manual-memory keyword indicates that memory management will be handled manually and that autowrap can assume that a member called inst will be provided that implements a get() method to obtain a pointer to an object of C++ type Classname.
We then have to provide such an object (addons/Classname.pyx):
# This will go into the header
# NOTE: _Classname is the C++ class while Classname is the Python class
from Classname cimport Classname as _Classname
cdef class ClassnameWrapper:
    # A small utility class holding a ptr and implementing get()
    cdef const _Classname* wrapped
    cdef setptr(self, const _Classname* wrapped): self.wrapped = wrapped
    cdef const _Classname* get(self) except *: return self.wrapped
    # This will go into the class
    # NOTE: using shared_ptr for a singleton will lead to segfaults, use raw ptr instead
    cdef ClassnameWrapper inst
    def __init__(self):
        self.inst = ClassnameWrapper()
        self.inst.setptr(_getInstance_Classname()) # calls the import getInstance method to obtain raw ptr
    def __dealloc__(self):
        # Careful here, the wrapped ptr is a single instance and we should not
        # reset it, therefore use 'wrap-manual-dealloc'
        pass
    def process(self, Container c):
        return self.inst.get().process(c.inst.get().begin(), c.inst.get().end())
Note how the manual wrapping of the process function allows us to access the inst pointer of the argument as well as of the object itself, allowing us to call C++ functions on both pointers. This makes it easy to generate the required iterators and process the container efficiently.
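The reason for holding a raw pointer instead of an owning one can be illustrated without any C++. The sketch below is an analogy in plain Python, not pyOpenMS code: Resource, OwningHandle and BorrowingHandle are made-up names, and the RuntimeError stands in for what would be a segfault in the real extension module.

```python
class Resource:
    """Stand-in for the C++ singleton object."""

    def __init__(self):
        self.alive = True

    def process(self, values):
        if not self.alive:
            # In C++ this would be a use-after-free, i.e. a likely segfault.
            raise RuntimeError("singleton was already destroyed")
        return sum(values)


class OwningHandle:
    """Mimics an owning smart pointer: releasing the handle destroys the object."""

    def __init__(self, resource):
        self._resource = resource

    def get(self):
        return self._resource

    def release(self):
        self._resource.alive = False


class BorrowingHandle:
    """Mimics ClassnameWrapper: holds the pointer but never destroys the object."""

    def __init__(self, resource):
        self._resource = resource

    def get(self):
        return self._resource

    def release(self):
        # Like the empty __dealloc__ above: the singleton is left untouched.
        pass


if __name__ == "__main__":
    singleton = Resource()
    first, second = BorrowingHandle(singleton), BorrowingHandle(singleton)
    first.release()
    print(second.get().process([1, 2, 3]))  # still usable after first is gone
```

With OwningHandle in place of BorrowingHandle, releasing the first handle would invalidate the singleton for every other user, which is the failure mode the manual memory handling avoids.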
Considerations and limitations
Further considerations and limitations:
- Inheritance: there are some limitations, see for example Precursor.pxd
- Reference: arguments passed by reference may be copied under some circumstances. For example, if they are in an array, the original argument is not handed back, so comparisons might fail. Also, simple Python types like int, float etc. cannot be passed by reference.
- operator+=: see for example AASequence.iadd in AASequence.pxd
- operator==, !=, <=, <, >=, > are wrapped automatically
- Iterators: some limitations apply, see MSExperiment.pxd for an example
- copy-constructor becomes __copy__ in Python
- shared pointers: these are handled automatically, check DataAccessHelper using shared_ptr[Spectrum]. Use from smart_ptr cimport shared_ptr as the import statement
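Two of these points follow directly from how Python itself works and can be shown in plain Python. The example below does not use pyOpenMS; the Point class is made up for illustration. It shows why an int cannot act as an output argument, while a list can, and how copy.copy() picks up a __copy__ method such as the one a wrapped copy constructor provides.

```python
import copy


def increment(counter):
    """Try to modify an int 'in place'; only the local name is rebound."""
    counter += 1
    return counter


def append_one(values):
    """Modify a list in place; the caller sees the change."""
    values.append(1)


class Point:
    """Small class with an explicit __copy__, as a wrapped class would have."""

    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y

    def __copy__(self):
        return Point(self.x, self.y)


if __name__ == "__main__":
    n = 5
    increment(n)
    print(n)                 # the caller's int is unchanged

    data = []
    append_one(data)
    print(data)              # the caller's list was modified

    original = Point(1.0, 2.0)
    duplicate = copy.copy(original)   # dispatches to Point.__copy__
    duplicate.x = 9.0
    print(original.x, duplicate.x)
```

For wrapped functions that fill an int or float by reference in C++, this means the result has to come back some other way, for example as a return value.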
These hints can be given to autowrap classes (also check the autowrap documentation):
- wrap-ignore: is a hint for autowrap to not wrap the class (but the declaration might still be important for Cython to know about)
- wrap-instances: for templated classes (see MSSpectrum.pxd)
- wrap-hash: hash function to use for __hash__ (see Residue.pxd)
- wrap-manual-memory: hint that memory management will be done manually
These hints can be given to autowrap functions (also check the autowrap documentation):
- wrap-ignore: is a hint for autowrap to not wrap the function (but the declaration might still be important for Cython to know about)
- wrap-as: see for example AASequence ":"
- wrap-iter-begin, wrap-iter-end (see ConsensusMap.pxd)
- wrap-attach: enums, static methods (see for example VersionInfo.pxd)
- wrap-upper-limit:size or size() (see MSSpectrum.pxd)
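Since all hints are ordinary comments starting with wrap-, it is easy to get an overview of the hints used in a pxd file. The following sketch is a hypothetical helper, not part of autowrap, and it does not validate the hints; it only lists them together with their line numbers.

```python
import re

# A hint is a comment whose text starts with "wrap-", e.g. "# wrap-ignore".
HINT_PATTERN = re.compile(r"#\s*(wrap-[a-z-]+)")


def find_wrap_hints(pxd_text):
    """Return (line_number, hint_name) pairs for every wrap-* hint comment."""
    hints = []
    for number, line in enumerate(pxd_text.splitlines(), start=1):
        for match in HINT_PATTERN.finditer(line):
            hints.append((number, match.group(1)))
    return hints


SAMPLE = """\
cdef cppclass ClassName[T](DefaultParamHandler):
    # wrap-inherits:
    #    DefaultParamHandler
    ClassName(ClassName[T]) nogil except + # wrap-ignore
    T operator[](int) nogil except + # wrap-upper-limit:size()
"""

if __name__ == "__main__":
    for number, hint in find_wrap_hints(SAMPLE):
        print("line %d: %s" % (number, hint))
```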
Wrapping code yourself in ./addons
Not all code can be wrapped automatically (yet). Place a file with the same (!) name in the addons folder (e.g. myClass.pxd in pxds/ and myClass.pyx in addons/) and leave the first two lines empty (this is important). Start with 4 spaces of indent and write your additional wrapper functions, adding a wrap-ignore comment to the pxd file. For some examples, look into the src/pyOpenMS/addons/ folder:
- IDRipper.pyx
  - for a reference for both input and output of a complex STL construct (map< String, pair<vector<>, vector<> > )
- MSQuantifications.pyx
  - for a vector< vector< pair <String,double > > > as input in registerExperiment
  - for a map< String, Ratio> in getRatios to get returned
- QcMLFile.pyx
  - for a map< String, map< String,String> > as input
- SequestInfile.pyx
  - for a map< String, vector<String> > to get returned
- Attachment.pyx
  - for a vector< vector<String> > to get returned
- ChromatogramExtractorAlgorithm.h
  - for an example of an abstract base class (ISpectrumAccess), to see how it is handled please refer to the ./addons folder
Make sure that you _always_ declare your objects (all C++ and all Cython objects need to be declared) using cdef Type name. Otherwise you get Cannot convert ... to Python object errors.
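The two layout rules for addon files (same name as the pxd file, first two lines empty) are easy to get wrong and can be checked mechanically. The sketch below is a hypothetical helper, not part of the pyOpenMS build; it assumes that root points at a folder containing pxds/ and addons/, such as src/pyOpenMS/.

```python
import os


def check_addons(root):
    """Return a list of human-readable problems found in root/addons."""
    problems = []
    addons_dir = os.path.join(root, "addons")
    pxds_dir = os.path.join(root, "pxds")
    for file_name in sorted(os.listdir(addons_dir)):
        stem, extension = os.path.splitext(file_name)
        if extension != ".pyx":
            continue
        # Rule 1: addons/<name>.pyx needs a pxds/<name>.pxd with the same name.
        if not os.path.isfile(os.path.join(pxds_dir, stem + ".pxd")):
            problems.append("%s: no matching pxds/%s.pxd" % (file_name, stem))
        # Rule 2: the first two lines of the addon file must be empty.
        with open(os.path.join(addons_dir, file_name)) as handle:
            first_two = [handle.readline(), handle.readline()]
        if any(line.strip() for line in first_two):
            problems.append("%s: first two lines are not empty" % file_name)
    return problems


if __name__ == "__main__":
    for problem in check_addons(os.path.join("src", "pyOpenMS")):
        print(problem)
```

An empty result only means that the layout rules hold; it says nothing about whether the addon code itself compiles.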