sqlamp — Materialized Path for SQLAlchemy

author:Anton Gritsay, http://angri.ru
version:0.5.2
license:2-clause BSD (see LICENSE)
download:http://sqlamp.angri.ru/sqlamp-0.5.2.tar.gz

sqlamp is an implementation of an efficient algorithm for working with hierarchical data structures — Materialized Path. sqlamp uses (and depends of) SQLAlchemy.

Materialized Path is a way to store (and fetch) a trees in a relational databases. It is the compromise between Nested Sets and Adjacency Relations in respect to simplicity and efficiency. Method was promoted by Vadim Tropashko in his book SQL Design Patterns. Vadim’s description of the method can be read in his article Trees in SQL: Nested Sets and Materialized Path (by Vadim Tropashko).

Implemented features:

  • Setting up with declarative.ext or without it.
  • Saving node roots — if no parent set for node. The tree will have a new tree_id.
  • Saving child nodes — if node has some parent. The whole dirty job of setting values in tree_id, path and depth fields is done by sqlamp.
  • Fetching node’s descendants, ancestors and children using the most efficient way available (see MPInstanceManager).
  • Autochecking exhaustion of tree size limits — maximum number of children and maximum nesting level (see MPManager to learn more about limits fine-tuning) is done during session flush.
  • Rebuilding all trees (see MPClassManager.rebuild_all_trees()) and any subtree (MPClassManager.rebuild_subtree()) on the basis of Adjacency Relations.
  • Collapsing flat tree returned from query to recursive structure (see tree_recursive_iterator()).
  • Node classes may use polymorphic inheritance.

Moving of nodes is not yet implemented.

Known-to-work supported DBMS include sqlite (tested with 3.6.14), MySQL (tested using both MyISAM and InnoDB with server version 5.1.34) and PostgreSQL (tested with 8.3.7), but sqlamp should work with any other DBMS supported by SQLAlchemy.

Supported versions of SQLAlchemy include current minor versions of branches 0.5 and 0.6 as well as 0.7 since 0.7.2.

Quickstart

Note

Code examples here are all runable by copy-paste to interactive interpreter.

import sqlalchemy, sqlalchemy.orm
engine = sqlalchemy.create_engine('sqlite:///:memory:', echo=False)
metadata = sqlalchemy.MetaData(engine)

node_table = sqlalchemy.Table('node', metadata,
    sqlalchemy.Column('id', sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column('parent_id', sqlalchemy.ForeignKey('node.id')),
    sqlalchemy.Column('name', sqlalchemy.String)
)

There is nothing special to sqlamp here. Note self-reference “child to parent” (‘parent_id’ is foreign key to table’s primary key) just as in any other implementation of adjacency relations.

import sqlamp
class Node(object):
    mp = sqlamp.MPManager(node_table)
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
    def __repr__(self):
        return '<Node %r>' % self.name

Attach instance of MPManager to class that represents node. The only required argument for MPManager constructor is the table object.

Now we can create the table and define the mapper (it is important to create table after MPManager was created as created MPManager appends three new columns and one index to the table):

node_table.create()

Setting up the mapper requires only one extra step — providing Node.mp as mapper extension:

mapper = sqlalchemy.orm.mapper(
    Node, node_table,
    extension=[Node.mp],
    properties={
        'parent': sqlalchemy.orm.relation(
            Node, remote_side=[node_table.c.id]
        )
    }
)

You may see value provided as properties argument: this is a way recommended by the official SQLAlchemy documentation to set up an adjacency relation.

Alternative way to set up: ext.declarative

Starting from version 0.5 it is able and convenient to use declarative approach to set your trees up:

import sqlalchemy, sqlalchemy.orm
from sqlalchemy.ext.declarative import declarative_base
import sqlamp

engine = sqlalchemy.create_engine('sqlite:///:memory:', echo=False)
metadata = sqlalchemy.MetaData(engine)

BaseNode = declarative_base(metadata=metadata,
                            metaclass=sqlamp.DeclarativeMeta)

class Node(BaseNode):
    __tablename__ = 'node'
    __mp_manager__ = 'mp'
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    parent_id = sqlalchemy.Column(sqlalchemy.ForeignKey('node.id'))
    parent = sqlalchemy.orm.relation("Node", remote_side=[id])
    name = sqlalchemy.Column(sqlalchemy.String())
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
    def __repr__(self):
        return '<Node %r>' % self.name

Node.__table__.create()

As you can see it is pretty much the same as usual for sqlalchemy’s “declarative” extension. Only two things here are sqlamp-specific: metaclass argument provided to declarative_base() factory function should be DeclarativeMeta and the node class should have an __mp_manager__ property with string value. See DeclarativeMeta for more information about options.

Now all the preparation steps are done. Lets try to use it!

session = sqlalchemy.orm.sessionmaker(engine)()
root = Node('root')
child1 = Node('child1', parent=root)
child2 = Node('child2', parent=root)
grandchild = Node('grandchild', parent=child1)
session.add_all([root, child1, child2, grandchild])
session.flush()

We have just created a sample tree. This is all about AL, nothing special to sqlamp here. The interesting part is fetching trees:

root.mp.query_children().all()
# should print [<Node 'child1'>, <Node 'child2'>]

root.mp.query_descendants().all()
# [<Node 'child1'>, <Node 'grandchild'>, <Node 'child2'>]

grandchild.mp.query_ancestors().all()
# [<Node 'root'>, <Node 'child1'>]

Node.mp.query(session).all()
# [<Node 'root'>, <Node 'child1'>, <Node 'grandchild'>, <Node 'child2'>]

for node in root.mp.query_descendants(and_self=True):
    print '  ' * node.mp_depth, node.name
# root
#   child1
#     grandchild
#   child2

As you can see all sqlamp functionality is accessible via MPManager descriptor (called 'mp' in this example).

Note: Node.mp (a so-called “class manager”) is not the same as node.mp (“instance manager”). Do not confuse them as they are for different purposes and their APIs has no similar. Class manager (see MPClassManager) exists for features that are not intended to particular node but for the whole tree: basic setup (mapper extension) and tree-maintenance functions. And an instance managers (MPInstanceManager) are each unique to and bounded to a node. They allow to make queries for related nodes and other things specific to concrete node. There is also third kind of values that MPManager descriptor may return, see its reference for more info.

Implementation details

sqlamp borrowed some implementation ideas from django-treebeard. In particular, sqlamp uses the same alphabet (which consists of numeric digits and latin-letters in upper case), sqlamp as like as django-treebeard doesn’t use path parts delimiter — path parts has fixed adjustable length. But unlike django-treebeard sqlamp stores each tree absolutely stand-alone — two or more trees may (and will) have identical values in path and depth fields and be different only by values in tree_id field. This is the way that can be found in django-mptt.

sqlamp works only on basis of Adjacency Relations. This solution makes data more denormalized but more fault-tolerant. It makes possible rebuilding all paths for all trees using only AL data. Also it makes applying sqlamp on existing project easier.

Support

Feel free to email author directly to send bugreports, feature requests, patches or just to say “thanks”! :)

API Reference

exception sqlamp.PathOverflowError

Base class for exceptions in calculations of node’s path.

exception sqlamp.TooManyChildrenError

Maximum children limit is exceeded. Raised during flush.

exception sqlamp.PathTooDeepError

Maximum depth of nesting limit is exceeded. Raised during flush.

class sqlamp.MPManager(table, parent_id_field=None, path_field='mp_path', depth_field='mp_depth', tree_id_field='mp_tree_id', steplen=3, instance_manager_key='_mp_instance_manager')

Descriptor for access class-level and instance-level API.

Basic usage is simple:

class Node(object):
    mp = sqlamp.MPManager(node_table)

Now there is an ability to get instance manager or class manager via property 'mp' depending on way to access it. Node.mp will return mapper extension till class is mapped, class manager MPClassManager after that and instance_node.mp will return instance_node’s MPInstanceManager. See that classes for more details about their public API.

Changed in version 0.5.1: Previously mapper extension was accessible via class manager’s property.

Parameters:
  • table – instance of sqlalchemy.Table. A table that will be mapped to node class and will hold tree nodes in its rows. It is the only one strictly required argument.
  • parent_id_field=None – a foreign key field that is reference to parent node’s primary key. If this parameter is omitted, it will be guessed joining a table with itself and using the right part of join’s onclause as parent id field.
  • path_field=’mp_path’ – the name for the path field or the field object itself. The field will be created if the actual parameter value is a string and there is no such column in the table table. If value provided is an object column some sanity checks will be performed with the column object: it should have nullable=False and have PathField type.
  • depth_field=’mp_depth’ – the same as for path_field, except that the type of this column should be DepthField.
  • tree_id_field=’mp_tree_id’ – the same as for path_field, except that the type of this column should be TreeIdField.
  • steplen=3 – an integer, the number of characters in each part of the path. This value allows to fine-tune the limits for max tree depth (equal to (255 / steplen) + 1) and max children in each node (36 ** steplen). Default 3 value sets the following limits: max depth is 86 and max children number is 46656.
  • instance_manager_key=’_mp_instance_manager’ – name for node instance’s attribute to cache node’s instance manager.

Warning

Do not change the values of MPManager constructor’s attributes after saving a first tree node. Doing this will corrupt the tree.

__get__(obj, objtype)

There may be three kinds of return values from this getter.

The first one is used when the class which this descriptor is attached to is not yet mapped to any table. In that case the return value is an instance of MPMapperExtension. which is intended to be used as mapper extension.

The second scenario is access to MPManager via mapped class. The corresponding MPClassManager instance is returned.

Note

If the nodes of your tree use polymorphic inheritance it is important to know that class manager is accessible only via the base class of inheritance hierarchy.

And the third way is accessing it from the node instance. Attached to that node instance manager is returned then.

class sqlamp.DeclarativeMeta(name, bases, dct)

Metaclass for declaratively defined node model classes.

New in version 0.5.

See usage example above in Quickstart.

All options that accepts MPManager can be provided with declarative definition. To provide an option you can simply assign value to class’ property with name like __mp_tree_id_field__ (for tree_id_field parameter) and so forth. See the complete list of options in MPManager‘s constructor parameters. Note that you can use only string options for field names, not the column objects.

A special class variable __mp_manager__ should exist and hold a string name which will be used as MPManager descriptor property.

class sqlamp.MPClassManager(node_class, opts)

Node class manager. No need to create it by hand: it’s created by MPManager.

Parameters:
  • node_class – class which was mapped to the tree table.
  • opts – instance of MPOptions.

Changed in version 0.6: Previously existing method __clause_element__ which used to allow using the instances of MPClassManager as arguments to methods query.order_by() was removed in 0.6. Use query() instead.

max_children

The maximum number of children in each node, readonly.

max_depth

The maximum level of nesting in this tree, readonly.

query(session)

Query all stored trees.

Parameters:session – a sqlalchemy Session object to bind a query.
Returns:Query object with all nodes of all trees sorted as usual by (tree_id, path).

Changed in version 0.6: Before 0.6 this method was called query_all_trees. The old name still works for backward compatibility.

rebuild_all_trees(order_by=None)

Perform a complete rebuild of all trees on the basis of adjacency relations.

Drops indexes before processing and recreates it after.

Parameters:order_by – an “order by clause” for sorting root nodes and a children nodes in each subtree.
rebuild_subtree(root_node_id, order_by=None)

Reset paths for all nodes in subtree defined by root_node_id on the basis of adjacency relations.

Parameters:
  • root_node_id – the value of subtree root’s primary key.
  • order_by – an “order by clause” for sorting children nodes in each subtree.
class sqlamp.MPInstanceManager(opts, root_node_class, obj)

A node instance manager, unique for each node. First created on access to MPManager descriptor from instance. Implements API to query nodes related somehow to particular node: descendants, ancestors, etc.

Parameters:
  • opts – instance of MPOptions.
  • root_node_class – the root class in the node class’ polymorphic inheritance hierarchy. This class will be used to perform queries.
  • obj – particular node instance.
filter_descendants(and_self=False)

Get a filter condition for node’s descendants.

Requires that node has path, tree_id and depth values available (that means it has “persistent version” even if the node itself is in “detached” state or it is in “pending” state in autoflush-enabled session).

Usage example:

Node.mp.query(session).filter(root.mp.filter_descendants())

This example is silly and only shows an approach of using filter_descendants. Don’t use it for such purpose as there is a better way for such simple queries: query_descendants().

Parameters:and_selfbool, if set to True self node will be selected by filter.
Returns:a filter clause applicable as argument for sqlalchemy.orm.Query.filter() and others.
query_descendants(session=None, and_self=False)

Get a query for node’s descendants.

Requires that node is in “persistent” state or in “pending” state in autoflush-enabled session.

Parameters:
  • session – session object for query. If not provided, node’s session is used. If node is in “detached” state and session is not provided, query will be detached too (will require setting session attribute to execute).
  • and_selfbool, if set to True self node will be selected by query.
Returns:

a sqlalchemy.orm.Query object which contains only node’s descendants and is ordered by path.

filter_children()

The same as filter_descendants() but filters children nodes and does not accepts and_self parameter.

query_children(session=None)

The same as query_descendants() but queries children nodes and does not accepts and_self parameter.

filter_ancestors(and_self=False)

The same as filter_descendants() but filters ancestor nodes.

query_ancestors(session=None, and_self=False)

The same as query_descendants() but queries node’s ancestors.

sqlamp.tree_recursive_iterator(flat_tree, class_manager)

Make a recursive iterator from plain tree nodes sequence (Query instance for example). Generates two-item tuples: node itself and it’s children collection (which also generates two-item tuples...) Children collection evaluates to False if node has no children (it is zero-length tuple for leaf nodes), else it is a generator object.

Parameters:
  • flat_tree – plain sequence of tree nodes.
  • class_manager – instance of MPClassManager.

Can be used when it is simpler to process tree structure recursively. Simple usage example:

def recursive_tree_processor(nodes):
    print '<ul>'
    for node, children in nodes:
        print '<li>%s' % node.name,
        if children:
            recursive_tree_processor(children)
        print '</li>'
    print '</ul>'

query = root_node.mp.query_descendants(and_self=True)
recursive_tree_processor(
    sqlamp.tree_recursive_iterator(query, Node.mp)
)

Changed in version 0.6: Before this function was sorting flat_tree if it was a query-object. Since 0.6 it doesn’t do it, so make sure that flat_tree is properly sorted. The best way to achieve this is using queries returned from public API methods of MPClassManager and MPInstanceManager.

Warning

Process flat_tree items once and sequentially so works right only if used in depth-first recursive consumer.

class sqlamp.PathField

Varchar field subtype representing node’s path.

class sqlamp.DepthField

Integer field subtype representing node’s depth level.

class sqlamp.TreeIdField

Integer field subtype representing node’s tree identifier.

Changelog

0.6: not released yet

The most notable change in 0.6 is the full support of SQLAlchemy 0.7.2, 0.6.8 and 0.5.8. Adding support of 0.7 required some backward-incompatible changes in public API, from which the biggest one is inability to use MPClassManager instances as arguments for order_by(). If you use query* methods from MPClassManager and MPInstanceManager you don’t need to worry — query objects which come from there are properly ordered. If you use old semantic like session.query(Node).order_by(Node.mp) you will need to convert such code to use MPClassManager.query(), like Node.mp.query(session).

Another similar change is that tree_recursive_iterator() doesn’t reorder the argument provided, so if you construct your queries by hand (which is not recommended) — make sure that they’re properly ordered.

  • Python branches 2.4 to 2.7 as well as python3 are supported now.
  • MPClassManager can not be used as an argument for order_by(). Instead use method MPClassManager.query() for constructing queries.
  • Method MPClassManager.query_all_trees() was renamed to query(). The old name still works though.
  • tree_recursive_iterator() doesn’t reorder query argument anymore.
  • Added support of SQLAlchemy 0.7.2.
  • Documentation was cleaned up and updated.
  • Workaround for bug in sqlite 3.6.x (problems with binding two integer attributes). Initial patch by Josip Delic.

0.5.2: released 2010-09-19

  • SQLAlchemy of versions 0.6.x is now supported as well as 0.5.x.
  • tree_recursive_iterator() does not require python 2.6+ anymore, python 2.5 is now fully supported.

0.5.1: released 2009-11-29

  • mapper_extension property now removed from class manager, descriptor MPManager returns it instead of class manager if the class is not mapped yet.
  • Joined table inheritance now supported both in imperative and declarative ways: thanks to Laurent Rahuel for testing.

0.5: released 2009-09-05

This release contains some backward-incompatible changes in setup facilities. The main highlights are support of declarative SQLAlchemy extension and some cleaning up in MPManager‘s constructor options.

  • Index name now includes table name as prefix so there is an ability to have two or more mp-driven tables in the same database.
  • There is only one strictly required option for MPManager now: the table object. pk_field option removed at all (can be safely determined from the table) and parent_id_field could be guessed for almost every simple table (if yours are not so simple you can provide this option as it was with 0.4.x).
  • changed names of path_field, depth_field and tree_id_field parameters of MPManager‘s constructor: removed _name suffix: now the values can be column objects and they are not redefined if such a column exists already.

0.4.1: released 2009-07-16

  • Fixed another bug in MPClassManager.rebuild_all_trees(): tree_id and depth for root nodes were not updated. Method also was slightly optimized to do less queries.
  • Small fixes in documentation.

0.4: released 2009-06-11

  • Small fixes in documentation: actually Tropashko was not the first who introduced MP, he only promoted it.

  • Implemented MPClassManager.query_all_trees().

  • Fixed a bug of MPClassManager.rebuild_all_trees() did not reset path for root nodes.

  • Implemented tree_recursive_iterator().

  • Changed the value of path field for a root nodes. Previously they used to had '0' * steplen path and so first-level children gain '0' * steplen * 2, but now new roots will have an empty string in their path field. This change should be backward-compatible as it touches only new trees. But if you want to have no difference between two identical old and new trees in your table you can rebuild all your trees by Node.mp.rebuild_all_trees() or use sql query like this:

    UPDATE <table> SET mp_path = substr(mp_path, <steplen> + 1);

    This will remove steplen characters from the beginning of each node’s path. Do not forget to backup your data!

0.3: released 2009-05-23

The first public release.