Implements a DOM atop the XML parser, supporting document
parsing, tree traversal and ad-hoc tree manipulation.
The DOM API is non-conformant, yet simple and functional in
style - locate a tree node of interest and operate upon or
around it. In all cases you will need a document instance to
begin, whereupon it may be populated either by parsing an
existing document or via API manipulation.
This particular DOM employs a simple free-list to allocate
each of the tree nodes, making it quite efficient at parsing
XML documents. The tradeoff with such a scheme is that copying
nodes from one document to another requires a little more care
than otherwise. We felt this was a reasonable tradeoff, given
the throughput gains vs the relative infrequency of grafting
operations. For grafting within or across documents, please
use the move() and copy() methods.
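To make the tradeoff concrete, here is a minimal stand-alone sketch of a chunked free-list allocator. It is not this module's implementation: PoolNode, NodePool and the chunking scheme are hypothetical names and choices made purely for illustration.

```d
// Hypothetical sketch of a chunked free-list; not the Document internals.
struct PoolNode
{
    string    name;
    PoolNode* parent;
}

struct NodePool
{
    private PoolNode[][] chunks;    // each chunk is allocated once and never moved
    private size_t       chunkSize; // number of nodes per chunk
    private size_t       chunk;     // chunk currently being filled
    private size_t       index;     // next unused slot within that chunk

    this(size_t nodes)
    {
        assert(nodes > 0);
        chunkSize = nodes;
        chunks ~= new PoolNode[chunkSize];
    }

    // Hand out the next slot, adding a chunk when the current one is full.
    PoolNode* allocate()
    {
        if (index >= chunkSize)
        {
            index = 0;
            if (++chunk >= chunks.length)
                chunks ~= new PoolNode[chunkSize];
        }
        auto node = &chunks[chunk][index++];
        *node = PoolNode.init;      // clear stale content from any prior use
        return node;
    }

    // Rewind the pool: later allocations overwrite earlier nodes.
    void reset()
    {
        chunk = 0;
        index = 0;
    }
}
```

In a scheme like this every node lives in storage owned by one pool, so moving a node to another pool means allocating a fresh slot there and copying the fields across, rather than relinking a pointer.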
Another simplification is related to entity transcoding. This
is not performed internally, and becomes the responsibility
of the client. That is, the client should perform appropriate
entity transcoding as necessary. Paying the (high) transcoding
cost for all documents doesn't seem appropriate.
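As an illustration of what client-side transcoding might look like, the sketch below decodes the five predefined XML entities with Phobos only. The helper name decodeEntities is hypothetical and is not part of this API; numeric character references are deliberately left out.

```d
import std.array : replace;

// Hypothetical helper: decode the five predefined XML entities.
// Numeric references such as &#65; are not handled here.
string decodeEntities(string text)
{
    return text
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&"); // last, so "&amp;lt;" yields "&lt;" rather than "<"
}
```

A client would apply such a function only to the node values it actually reads, which keeps the cost away from documents that never need it.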
Parse example
auto doc = new Document!(char);
doc.parse (content);
auto print = new DocPrinter!(char);
Stdout(print(doc)).newline;
API example
auto doc = new Document!(char);
// attach an xml header
doc.header;
// attach an element with some attributes, plus
// a child element with an attached data value
doc.tree.element (null, "element")
.attribute (null, "attrib1", "value")
.attribute (null, "attrib2")
.element (null, "child", "value");
auto print = new DocPrinter!(char);
Stdout(print(doc)).newline;
Note that the document tree() includes all nodes in the tree,
and not just elements. Use doc.elements to address the topmost
element instead. For example, adding an interior sibling to
the prior illustration
doc.elements.element (null, "sibling");
Printing the name of the topmost (root) element:
Stdout.formatln ("first element is '{}'", doc.elements.name);
XPath examples:
auto doc = new Document!(char);
// attach an element with some attributes, plus
// a child element with an attached data value
doc.tree.element (null, "element")
.attribute (null, "attrib1", "value")
.attribute (null, "attrib2")
.element (null, "child", "value");
// select named-elements
auto set = doc.query["element"]["child"];
// select all attributes named "attrib1"
set = doc.query.descendant.attribute("attrib1");
// select elements with one parent and a matching text value
set = doc.query[].filter((doc.Node n) {return n.children.hasValue("value");});
Note that path queries are transient - they do not retain content
across multiple queries. That is, the lifetime of a query result
is limited unless you explicitly copy it. For example, this will
fail to operate as one might expect
auto elements = doc.query["element"];
auto children = elements["child"];
The above will lose elements because the associated document reuses
node space for subsequent queries. In order to retain results, do this
auto elements = doc.query["element"].dup;
auto children = elements["child"];
The above .dup is generally very small (a set of pointers only). On
the other hand, recursive queries are fully supported
set = doc.query[].filter((doc.Node n) {return n.query[].count > 1;});
Typical usage tends to follow this pattern, where each query
result is processed before another is initiated
foreach (node; doc.query.child("element"))
{
    // do something with each node
}
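The lifetime rule for query results can be pictured with a small stand-alone model in which every result is a slice of one shared buffer. This is a simplified sketch, not the XmlPath implementation: Scratch and its query method are hypothetical, and plain integers stand in for nodes.

```d
// Hypothetical model of slice-backed query results; not the XmlPath code.
struct Scratch
{
    private int[]  buffer; // shared storage reused by every query
    private size_t used;   // number of slots filled by the current query

    this(size_t capacity)
    {
        buffer = new int[capacity];
    }

    // Each query restarts at the front of the buffer and returns a slice of it.
    int[] query(const(int)[] source, scope bool delegate(int) keep)
    {
        used = 0;
        foreach (v; source)
        {
            if (keep(v))
            {
                if (used == buffer.length)
                    buffer.length = buffer.length * 2 + 4;
                buffer[used++] = v;
            }
        }
        return buffer[0 .. used];
    }
}
```

With such a model, a slice returned by one call is overwritten by the next call unless it was duplicated first, and the duplicate costs only as much as the slice itself.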
Note that the parser is templated for char, wchar or dchar.
- this(size_t nodes = 1000);
- Construct a DOM instance. The optional parameter indicates
the initial number of nodes assigned to the freelist
- XmlPath!(T).NodeSet query();
- Return an xpath handle to query this document. This starts
at the document root.
See also Node.query
- Node tree();
- Return the root document node, from which all other nodes
are descended.
Returns null where there are no nodes in the document
- Node elements();
- Return the topmost element node, which is generally the
root of the element tree.
Returns null where there are no top-level element nodes
- Document reset();
- Reset the freelist. Subsequent allocation of document nodes
will overwrite prior instances.
- Document header(const(T)[] encoding = null);
- Prepend an XML header to the document tree
- void parse(const(T[]) xml);
- Parse the given xml content, which will reuse any existing
node within this document. The resultant tree is retrieved
via the document 'tree' attribute
- Node allocate();
- allocate a node from the freelist
- void newlist();
- allocate a new list of nodes for the freelist
- struct Visitor;
- foreach support for visiting and selecting nodes.
A fruct is a low-overhead mechanism for capturing context
relating to an opApply, and we use it here to sweep nodes
when testing for various relationships.
See Node.attributes and Node.children
- bool exist();
- Is there anything to visit here?
Time complexity: O(1)
- int opApply(scope int delegate(ref Node) dg);
- traverse sibling nodes
- Node name(const(T[]) prefix, const(T[]) local, scope bool delegate(Node) dg = null);
- Locate a node with a matching name and/or prefix,
and which passes an optional filter. Each of the
arguments will be ignored where they are null.
Time complexity: O(n)
- bool hasName(const(T[]) prefix, const(T[]) local);
- Scan nodes for a matching name and/or prefix. Each
of the arguments will be ignored where they are null.
Time complexity: O(n)
- Node value(const(T[]) prefix, const(T[]) local, const(T[]) value);
- Locate a node with a matching name and/or prefix,
and which matches a specified value. Each of the
arguments will be ignored where they are null.
Time complexity: O(n)
- Node value(const(T[]) match);
- Sweep nodes looking for a match, and return either
a node or null. See value(x,y,z) or name(x,y,z) for
additional filtering.
Time complexity: O(n)
- bool hasValue(const(T[]) match);
- Sweep the nodes looking for a value match. Returns
true if found. See value(x,y,z) or name(x,y,z) for
additional filtering.
Time complexity: O(n)
- struct NodeImpl;
- The node implementation
- void* user;
- open for usage
- Document document();
- Return the hosting document
- XmlNodeType type();
- Return the node type-id
- Node parent();
- Return the parent, which may be null
- Node child();
- Return the first child, which may be null
- Node childTail();
- Return the last child, which may be null
Deprecated:
exposes too much implementation detail.
Please file a ticket if you really need
this functionality
- Node prev();
- Return the prior sibling, which may be null
- Node next();
- Return the next sibling, which may be null
- const(T[]) prefix();
- Return the namespace prefix of this node (may be null)
- Node prefix(const(T[]) replace);
- Set the namespace prefix of this node (may be null)
- const(T[]) name();
- Return the vanilla node name (sans prefix)
- Node name(const(T[]) replace);
- Set the vanilla node name (sans prefix)
- const(T[]) value();
- Return the data content, which may be null
- void value(const(T[]) val);
- Set the raw data content, which may be null
- const(T[]) toString(T[] output = null);
- Return the full node name, which is a combination
of the prefix & local names. Nodes without a prefix
will return local-name only
- size_t position();
- Return the index of this node, or how many
prior siblings it has.
Time complexity: O(n)
- Node detach();
- Detach this node from its parent and siblings
- XmlPath!(T).NodeSet query();
- Return an xpath handle to query this node
See also Document.query
- Visitor children();
- Return a foreach iterator for node children
- Visitor attributes();
- Return a foreach iterator for node attributes
- bool hasAttributes();
- Returns whether there are attributes present or not
Deprecated:
use node.attributes.exist instead
- bool hasChildren();
- Returns whether there are children present or not
Deprecated:
use node.child or node.children.exist
instead
- Node copy(Node tree);
- Duplicate the given sub-tree into place as a child
of this node.
Returns a reference to the subtree
- Node move(Node tree);
- Relocate the given sub-tree into place as a child
of this node.
Returns a reference to the subtree
- Node element(const(T[]) prefix, const(T[]) local, const(T[]) value = null);
- Appends a new (child) Element and returns a reference
to it.
- Node attribute(const(T[]) prefix, const(T[]) local, const(T[]) value = null);
- Attaches an Attribute and returns this, the host
- Node data(const(T[]) data);
- Attaches a Data node and returns this, the host
- Node cdata(const(T[]) cdata);
- Attaches a CData node and returns this, the host
- Node comment(const(T[]) comment);
- Attaches a Comment node and returns this, the host
- Node doctype(const(T[]) doctype);
- Attaches a Doctype node and returns this, the host
- Node pi(const(T[]) pi);
- Attaches a PI node and returns this, the host
- Node element_(const(T[]) prefix, const(T[]) local, const(T[]) value = null);
- Attaches a child Element, and returns a reference
to the child
- Node attribute_(const(T[]) prefix, const(T[]) local, const(T[]) value = null);
- Attaches an Attribute, and returns the host
- Node data_(const(T[]) data);
- Attaches a Data node, and returns the host
- Node cdata_(const(T[]) cdata);
- Attaches a CData node, and returns the host
- Node comment_(const(T[]) comment);
- Attaches a Comment node, and returns the host
- Node pi_(const(T[]) pi, const(T[]) patch);
- Attaches a PI node, and returns the host
- Node doctype_(const(T[]) doctype);
- Attaches a Doctype node, and returns the host
- void attrib(Node node);
- Append an attribute to this node. The given attribute
cannot have an existing parent.
- void append(Node node);
- Append a node to this one. The given node cannot
have an existing parent.
- void prepend(Node node);
- Prepend a node to this one. The given node cannot
have an existing parent.
- Node set(const(T[]) prefix, const(T[]) local);
- Configure node values
- Node create(XmlNodeType type, const(T[]) value);
- Creates and returns a node of the given type, holding the given value
- Node remove();
- Detach this node from its parent and siblings
- Node patch(const(T[]) text);
- Patch the serialization text, causing DocPrinter
to ignore the subtree of this node, and instead
emit the provided text as raw XML output.
Warning:
this function does *not* copy the provided
text, and may be removed from future revisions
- Node mutate();
- purge serialization cache for this node and its
ancestors
- Node dup();
- Duplicate a single node
- Node clone();
- Duplicate a subtree
- void migrate(Document host);
- Reset the document host for this subtree
XPath support
Provides support for common XPath axis and filtering functions,
via a native-D interface instead of typical interpreted notation.
The general idea here is to generate a NodeSet consisting of those
tree-nodes which satisfy a filtering function. The direction, or
axis, of tree traversal is governed by one of several predefined
operations. All methods facilitate call-chaining, where each step
returns a new NodeSet instance to be operated upon.
The set of nodes themselves are collected in a freelist, avoiding
heap-activity and making good use of D array-slicing facilities.
XPath examples
auto doc = new Document!(char);
// attach an element with some attributes, plus
// a child element with an attached data value
doc.tree.element (null, "element")
.attribute (null, "attrib1", "value")
.attribute (null, "attrib2")
.element (null, "child", "value");
// select named-elements
auto set = doc.query["element"]["child"];
// select all attributes named "attrib1"
set = doc.query.descendant.attribute("attrib1");
// select elements with one parent and a matching text value
set = doc.query[].filter((doc.Node n) {return n.children.hasValue("value");});
Note that path queries are transient - they do not retain content
across multiple queries. That is, the lifetime of a query result
is limited unless you explicitly copy it. For example, this will
fail to operate as one might expect
auto elements = doc.query["element"];
auto children = elements["child"];
The above will lose elements, because the associated document reuses
node space for subsequent queries. In order to retain results, do this
auto elements = doc.query["element"].dup;
auto children = elements["child"];
The above .dup is generally very small (a set of pointers only). On
the other hand, recursive queries are fully supported
set = doc.query[].filter((doc.Node n) {return n.query[].count > 1;});
Typical usage tends to exhibit the following pattern, where each query
result is processed before another is initiated
foreach (node; doc.query.child("element"))
{
    // do something with each node
}
Supported axes include:
.child        immediate children
.parent       immediate parent
.next         following siblings
.prev         prior siblings
.ancestor     all parents
.descendant   all descendants
.data         text children
.cdata        cdata children
.attribute    attribute children
Each of the above accepts an optional string, which is used in an
axis-specific way to filter nodes. For instance, .child("food")
selects only those child elements named "food". These variants are
shortcuts for post-processing a result with a filter. Each of the
above also has a variant which accepts a delegate instead.
In general, you traverse an axis and operate upon the results. The
operation applied may be another axis traversal, or a filtering
step. All steps can be, and generally should be, chained together.
Filters are implemented via a delegate mechanism
.filter (bool delegate(Node))
Where the delegate returns true if the node passes the filter. An
example might be selecting all nodes with a specific attribute
auto set = doc.query.descendant.filter (
(doc.Node n){return n.attributes.hasName (null, "test");}
);
Obviously this is not as clean and tidy as true XPath notation, but
that can be wrapped atop this API instead. The benefit here is one
of raw throughput - important for some applications.
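The chaining of axis steps and delegate filters can be mimicked in a few lines of stand-alone D. The following is only a miniature of the style, not the XmlPath types: Item and ItemSet are hypothetical, and results are heap-allocated here for brevity rather than drawn from a freelist.

```d
// Hypothetical miniature of the axis/filter chaining style; not XmlPath itself.
class Item
{
    string name;
    Item[] kids;

    this(string name, Item[] kids = null)
    {
        this.name = name;
        this.kids = kids;
    }
}

struct ItemSet
{
    Item[] nodes;

    // Axis step: collect the children of every node in the set,
    // optionally restricted to a given name.
    ItemSet child(string name = null)
    {
        Item[] result;
        foreach (n; nodes)
            foreach (k; n.kids)
                if (name is null || k.name == name)
                    result ~= k;
        return ItemSet(result);
    }

    // Filtering step: keep the nodes for which the delegate returns true.
    ItemSet filter(scope bool delegate(Item) test)
    {
        Item[] result;
        foreach (n; nodes)
            if (test(n))
                result ~= n;
        return ItemSet(result);
    }

    size_t count() { return nodes.length; }
}
```

Because every step returns a new set, an expression such as ItemSet([root]).child("food").filter((Item n) => n.kids.length > 1) reads left to right in much the same way as a path expression.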
Note that every operation returns a discrete result. Methods first()
and last() also return a set of one or zero elements. Some
language-specific extensions are also provided:
.child() can be substituted with [] notation instead
[] notation can be used to index a specific element, like .nth()
the .nodes attribute exposes an underlying Node[], which may be
sliced or traversed in the usual D manner
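For readers unfamiliar with how D maps bracket notation onto methods, the sketch below shows the three forms on a toy type. Names is a hypothetical stand-in holding strings, not the NodeSet type; it only mirrors the shape of the opSlice and opIndex overloads listed in this reference.

```d
// Hypothetical toy type showing how [] forms map onto operator overloads.
struct Names
{
    string[] items;

    // set[] : the whole set
    Names opSlice()
    {
        return this;
    }

    // set[1] : a set holding just the item at that position, or an empty set
    Names opIndex(size_t i)
    {
        return i < items.length ? Names(items[i .. i + 1]) : Names(null);
    }

    // set["x"] : a set holding the items equal to the given name
    Names opIndex(string name)
    {
        string[] result;
        foreach (item; items)
            if (item == name)
                result ~= item;
        return Names(result);
    }
}
```

The compiler picks the overload from the argument type, so an integer index and a string name can share the same bracket syntax without ambiguity.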
Other (query result) utility methods include
.dup
.first
.last
.opIndex
.nth
.count
.opApply
XmlPath itself needs to be a class in order to avoid forward-ref issues.
- alias Doc;
- the typed document
- alias Node;
- generic document node
- NodeSet start(Node root);
- Prime a query
Returns a NodeSet containing just the given node, which
can then be used to cascade results into subsequent NodeSet
instances.
- struct NodeSet;
- This is the meat of XPath support. All of the NodeSet
operators exist here, in order to enable call-chaining.
Note that some of the axes do double-duty as a filter
also. This is just a convenience factor, and doesn't
change the underlying mechanisms.
- Node[] nodes;
- array of selected nodes
- NodeSet dup();
- Return a duplicate NodeSet
- size_t count();
- Return the number of selected nodes in the set
- NodeSet first();
- Return a set containing just the first node of
the current set
- NodeSet last();
- Return a set containing just the last node of
the current set
- NodeSet opIndex(size_t i);
- Return a set containing just the nth node of
the current set
- NodeSet nth(size_t index);
- Return a set containing just the nth node of
the current set
- NodeSet opSlice();
- Return a set containing all child elements of the
nodes within this set
- NodeSet opIndex(const(T[]) name);
- Return a set containing all child elements of the
nodes within this set, which match the given name
- NodeSet parent(const(T[]) name = null);
- Return a set containing all parent elements of the
nodes within this set, which match the optional name
- NodeSet data(const(T[]) value = null);
- Return a set containing all data nodes of the
nodes within this set, which match the optional
value
- NodeSet cdata(const(T[]) value = null);
- Return a set containing all cdata nodes of the
nodes within this set, which match the optional
value
- NodeSet attribute(const(T[]) name = null);
- Return a set containing all attributes of the
nodes within this set, which match the optional
name
- NodeSet descendant(const(T[]) name = null);
- Return a set containing all descendant elements of
the nodes within this set, which match the given name
- NodeSet child(const(T[]) name = null);
- Return a set containing all child elements of the
nodes within this set, which match the optional name
- NodeSet ancestor(const(T[]) name = null);
- Return a set containing all ancestor elements of
the nodes within this set, which match the optional
name
- NodeSet prev(const(T[]) name = null);
- Return a set containing all prior sibling elements of
the nodes within this set, which match the optional
name
- NodeSet next(const(T[]) name = null);
- Return a set containing all subsequent sibling
elements of the nodes within this set, which
match the optional name
- NodeSet filter(scope bool delegate(Node) filter);
- Return a set containing all nodes within this set
which pass the filtering test
- NodeSet child(scope bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element);
- Return a set containing all child nodes of
the nodes within this set which pass the
filtering test
- NodeSet attribute(scope bool delegate(Node) filter);
- Return a set containing all attribute nodes of
the nodes within this set which pass the given
filtering test
- NodeSet descendant(scope bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element);
- Return a set containing all descendant nodes of
the nodes within this set, which pass the given
filtering test
- NodeSet parent(scope bool delegate(Node) filter);
- Return a set containing all parent nodes of
the nodes within this set which pass the given
filtering test
- NodeSet ancestor(scope bool delegate(Node) filter);
- Return a set containing all ancestor nodes of
the nodes within this set, which pass the given
filtering test
- NodeSet next(scope bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element);
- Return a set containing all following siblings
of the ones within this set, which pass the given
filtering test
- NodeSet prev(scope bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element);
- Return a set containing all prior sibling nodes
of the ones within this set, which pass the given
filtering test
- int opApply(scope int delegate(ref Node) dg);
- Traverse the nodes of this set
- bool always(Node node);
- Common predicate
- NodeSet assign(size_t mark);
- Assign a slice of the freelist to this NodeSet
- void test(scope bool delegate(Node) filter, Node node);
- Execute a filter on the given node. We have to
deal with potential query recursion, so we save
and restore state around the call to recover from that
- bool has(Node p);
- We typically need to filter ancestors in order
to avoid duplicates, so this is used for those
purposes
- size_t mark();
- Return the current freelist index
- size_t push();
- Recurse and save the current state
- void pop(size_t prior);
- Restore prior state
- Node[] slice(size_t mark);
- Return a slice of the freelist
- size_t allocate(Node node);
- Allocate an entry in the freelist, expanding as necessary