Zorba processes XML data (modeled and accessed as XDM instances, http://www.w3.org/TR/xpath-datamodel/) accessible in the data store. Below is a short overview of the Zorba data lifecycle: how to load data into the Zorba store, how to query and update it, and how to remove it from the store.
The store contains several types of data containers: documents, collections, and other structures. Each such container is identified by a name, which can be a URI or a QName. The association between the name and the content of the containers is maintained by the store during the lifetime of the container.
There are two kinds of containers: static and dynamic. Collections can be either static or dynamic. Documents, maps, stacks, and queues can only be dynamic.
While dynamic containers require no static knowledge from the query processor (it does not need to know about their existence at compilation time), static collections require the query processor to be aware of their existence and to take this information into account at compilation and optimization time. Note that static collections can have indexes and integrity constraints defined on them (declared using the XQuery Data Definition Facility), while dynamic structures cannot. At compilation time, the query processor needs to be aware of the indexes and integrity constraints of the static collections in order to generate correct and optimal execution plans.
Both static and dynamic collections have functions that allow queries to manipulate them. As in most other databases, the functions are grouped into Data Definition modules (DDL; e.g. creation, deletion) and Data Manipulation modules (DML; e.g. query, update).
The following tables give the complete list of Zorba modules that allow the manipulation of static and dynamic containers.
Static Containers | ||
Container Type | Definition (DDL) / Manipulation (DML) | Module Namespace |
Collections | DDL | http://www.zorba-xquery.com/modules/store/static/collections/ddl |
Collections | DML | http://www.zorba-xquery.com/modules/store/static/collections/dml |
Indexes | DDL | http://www.zorba-xquery.com/modules/store/static/indexes/ddl |
Indexes | DML | http://www.zorba-xquery.com/modules/store/static/indexes/dml |
Integrity Constraints | DDL | http://www.zorba-xquery.com/modules/store/static/integrity_constraints/ddl |
Integrity Constraints | DML | http://www.zorba-xquery.com/modules/store/static/integrity_constraints/dml |
Dynamic Containers | ||
Container Type | Definition (DDL) / Manipulation (DML) | Module Namespace |
Collections | DDL | http://www.zorba-xquery.com/modules/store/dynamic/collections/ddl |
Collections | DML | http://www.zorba-xquery.com/modules/store/dynamic/collections/dml |
W3C Collections | DDL | http://www.zorba-xquery.com/modules/store/dynamic/collections/w3c/ddl |
W3C Collections | DML | http://www.zorba-xquery.com/modules/store/dynamic/collections/w3c/dml |
Documents | DDL / DML | http://www.zorba-xquery.com/modules/store/dynamic/documents |
Unordered Maps | DDL / DML | http://www.zorba-xquery.com/modules/store/dynamic/data-structures/unordered-map |
Stacks | DDL / DML | http://www.zorba-xquery.com/modules/store/dynamic/data-structures/stack |
Queues | DDL / DML | http://www.zorba-xquery.com/modules/store/dynamic/data-structures/queue |
Please note that all of the modules listed above require XQuery version 3.0 or later.
Other than the fact that static and dynamic containers are treated differently by the query processor during compilation, their lifetime is exactly the same. Data can be loaded into any container, static or dynamic, and is then available for queries and updates until it is explicitly deleted from the store or the store itself expires. (Note that not all Zorba stores are persistent.) Please refer to the section below on the various Zorba stores.
Also note that a data container that is available in the store is available to all XQuery programs that are being executed synchronously. Please see the section on Zorba data stores below for details about data consistency.
In the following, we show a couple of examples to demonstrate how data can be stored in and retrieved from various kinds of containers. It is important to note that most of the examples use the XQuery Scripting Extension to apply pending updates in order to make them visible to subsequent expressions in the same program.
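In the first scenario, we show how to use the XQuery Data Definition Facility and the XQuery Update Facility to declare a collection and an index, create them in the store, populate and query the collection, delete and update nodes in it, and finally delete the collection and the index.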
First, we declare an unordered collection that can store KML placemarks (see http://code.google.com/apis/kml/documentation/). On top of this collection, we declare a unique value index on the names of the parks. Please note that this module is a library module because collections and indexes cannot be declared in a main module. All of the following examples import the library module in order to be able to access the collection.
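As a rough sketch of what such a library module could look like: the module namespace, the KML namespace, and the loose node()* typing below are assumptions made for illustration, and the annotation namespace and declaration syntax may differ between Zorba versions.

```xquery
xquery version "3.0";

(: Hypothetical namespace for this library module. :)
module namespace data = "http://www.example.com/data-lifecycle/data";

import module namespace dml =
  "http://www.zorba-xquery.com/modules/store/static/collections/dml";

(: Assumed annotation namespace; check the one used by your Zorba version. :)
declare namespace ann = "http://www.zorba-xquery.com/annotations";

(: Assumed KML namespace; use the namespace of the schema you work with. :)
declare namespace kml = "http://earth.google.com/kml/2.1";

(: QNames exported so that importing modules can refer to the containers. :)
declare variable $data:placemarks as xs:QName := xs:QName("data:placemarks");
declare variable $data:park-names as xs:QName := xs:QName("data:park-names");

(: An unordered collection, typed loosely as node()* in this sketch. :)
declare %ann:unordered collection data:placemarks as node()*;

(: A unique value index keyed on the name of each placemark. :)
declare %ann:unique %ann:value-equality index data:park-names
  on nodes dml:collection(xs:QName("data:placemarks"))
  by xs:string(kml:name) as xs:string;
```

A schema-aware variant would import the KML schema and type the collection with a schema-element type instead of node()*.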
This is an administration program that imports the module declaring the collection and invokes the create functions of the collection and index DDL modules to create the empty collection and index containers, respectively.
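A minimal sketch of such an administration program might look as follows. The data module namespace and the two variables holding the QNames of the collection and the index are hypothetical names standing in for whatever the library module actually exports.

```xquery
xquery version "3.0";

import module namespace ddl =
  "http://www.zorba-xquery.com/modules/store/static/collections/ddl";
import module namespace index_ddl =
  "http://www.zorba-xquery.com/modules/store/static/indexes/ddl";

(: Hypothetical library module that declares the collection and the index. :)
import module namespace data = "http://www.example.com/data-lifecycle/data";

(: Each statement ends with a semicolon (XQuery Scripting Extension), so
   the collection exists in the store before the index is created. :)
ddl:create($data:placemarks);
index_ddl:create($data:park-names);
```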
Once the collection has been created, we can populate it with some placemarks. The placemarks (data about national wildlife parks in India) are retrieved from the web. The resource retrieved is a single KML document, from which we select only the placemarks. Before they are inserted into the collection, the validate expression makes sure that each placemark is valid according to the KML schema.
This very simple query shows how to invoke the collection function of the dml module to retrieve the contents of the collection. The query returns the names of all national parks in India that have elephants.
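Such a query could be sketched as follows. The data module, the KML namespace, and the assumption that the description of a placemark mentions "Elephant" are illustrative and not taken from the actual data set.

```xquery
xquery version "3.0";

import module namespace dml =
  "http://www.zorba-xquery.com/modules/store/static/collections/dml";

(: Hypothetical library module that declares the collection. :)
import module namespace data = "http://www.example.com/data-lifecycle/data";

(: Assumed KML namespace. :)
declare namespace kml = "http://earth.google.com/kml/2.1";

(: Assumption: a park "has elephants" if its description mentions them. :)
for $park in dml:collection($data:placemarks)
where $park/kml:description[fn:contains(., "Elephant")]
return $park/kml:name/fn:string(.)
```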
To clean up the data in the collection, the following snippet deletes all national parks that do not have elephants from the database.
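A sketch of such a cleanup, under the same illustrative assumptions (hypothetical data module, assumed KML namespace, and elephants detected through the description text), could be:

```xquery
xquery version "3.0";

import module namespace dml =
  "http://www.zorba-xquery.com/modules/store/static/collections/dml";

(: Hypothetical library module that declares the collection. :)
import module namespace data = "http://www.example.com/data-lifecycle/data";

(: Assumed KML namespace. :)
declare namespace kml = "http://earth.google.com/kml/2.1";

(: Delete every placemark whose description does not mention elephants. :)
dml:delete-nodes(
  dml:collection($data:placemarks)
    [fn:not(kml:description[fn:contains(., "Elephant")])]
)
```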
In order to modify a particular node in a collection, the XQuery Update Facility can be used. In the example below, we set the visibility of all national parks that contain wild pigs to false because we don't want them to show up on a map. Please note that we need to insert the visibility element after the name element because otherwise the revalidation of the node in the collection would fail.
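The update could be sketched along these lines. The data module, the KML namespace, and the "Wild Pig" text match are assumptions; only the insert-after-name placement comes from the description above.

```xquery
xquery version "3.0";

import module namespace dml =
  "http://www.zorba-xquery.com/modules/store/static/collections/dml";

(: Hypothetical library module that declares the collection. :)
import module namespace data = "http://www.example.com/data-lifecycle/data";

(: Assumed KML namespace. :)
declare namespace kml = "http://earth.google.com/kml/2.1";

(: Assumption: parks with wild pigs mention them in their description. :)
for $park in dml:collection($data:placemarks)
where $park/kml:description[fn:contains(., "Wild Pig")]
return
  (: The new element goes directly after kml:name to keep the node valid. :)
  insert node <kml:visibility>0</kml:visibility> after $park/kml:name
```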
Finally, the last example shows how the collection and index containers can be deleted from the store. All the nodes stored in the collection are also deleted. Please note that the index has to be deleted before the corresponding collection.
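A sketch of the teardown, again assuming a hypothetical data module that exports the QNames of the collection and the index:

```xquery
xquery version "3.0";

import module namespace ddl =
  "http://www.zorba-xquery.com/modules/store/static/collections/ddl";
import module namespace index_ddl =
  "http://www.zorba-xquery.com/modules/store/static/indexes/ddl";

(: Hypothetical library module that declares the collection and the index. :)
import module namespace data = "http://www.example.com/data-lifecycle/data";

(: The index is deleted first, then the collection it is defined on. :)
index_ddl:delete($data:park-names);
ddl:delete($data:placemarks);
```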
The examples in this subsection show how to create a dynamic collection, insert data into it, and query it.
Analogous to the creation of a collection in Create a Collection and Index, the following example demonstrates how a dynamic collection can be created. The collection is called "earthquakes" and will be used to contain data about worldwide earthquakes from July 29, 2011 to August 5, 2011 (retrieved from data.gov).
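A minimal sketch of creating the dynamic collection is shown below. Using a QName without a namespace is a simplification made here; no declaration in a library module is needed because the collection is dynamic.

```xquery
xquery version "3.0";

import module namespace ddl =
  "http://www.zorba-xquery.com/modules/store/dynamic/collections/ddl";

(: Dynamic collections are not declared; they are simply created by name. :)
ddl:create(xs:QName("earthquakes"))
```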
This example fetches the data as CSV from the web, converts it to XML, and inserts it into the earthquakes collection. The conversion is done using Zorba's CSV converter module.
Given all the information of earthquakes in the collection, the following example shows how to query that data. The query selects all the earthquakes having a magnitude of three or higher whose region contains the string "California".
Previous examples have shown how to work with static and dynamic collections. The next set of examples focuses on documents. As a data set, we use the meat, poultry, and egg inspection directory from data.gov. The data is available as a CSV file in our file system.
In this example, we read a file from the file system whose name is available in the external variable named input-context. The CSV content of the file is parsed and converted into XML using the CSV module. The resulting document is put into the store and given the name "meat_poultry.xml".
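The final step, putting a document into the store and reading it back, can be sketched as follows. To keep the sketch self-contained, the file reading and CSV conversion are replaced by a small inline document whose element names and content are placeholders, not the actual output of the CSV module.

```xquery
xquery version "3.0";

import module namespace doc =
  "http://www.zorba-xquery.com/modules/store/dynamic/documents";

(: Placeholder document standing in for the converted CSV data. :)
doc:put(
  "meat_poultry.xml",
  document { <csv><row><column>placeholder</column></row></csv> }
);

(: The semicolon above applies the pending update, so the document is
   already visible to this expression in the same program. :)
doc:document("meat_poultry.xml")
```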
The XML format of the resulting document (from the CSV conversion in the previous example) is not very convenient to work with. For example, a sketch of the document is as follows:
The columns in the first row element define the names of the columns in the subsequent row elements. The following query uses an XQuery Update expression to rename the columns.
Please note that the first-row-is-header option of the csv:parse function would also have done the job, but we thought it was more fun to present this query. ;-)
In this last example, we show how the document resulting from the previous example can be serialized to JSON using Zorba's JSON module.
Note: data can be loaded into the Zorba store either via API calls (see the C++ API and the other APIs) or directly via XQuery function calls. Both ways are executed internally in exactly the same way; in fact, all such C++ API functions are mirrored 100% by XQuery functions. We strongly encourage users to use the XQuery modules for data manipulation instead of the C++ API. The reason is simple: the XQuery processor can understand the data flow and data lifecycle when the XQuery modules are used, while it cannot when the data is manipulated through the C++ API.
Zorba defines a Store API that allows developers to seamlessly process XML data stored in different places. Essentially, the Store API is a C++ interface for
Implementing this API allows, for example, XML processing of data stored in main memory, on mobile devices, in browsers, or disk- and cloud-based environments.
It is important to understand that each store implementation may define its own semantics regarding persistence and transactional semantics. For example, a mobile device store can safely assume that only a single request at a time is processed whereas a store backed by a relational database might provide full-fledged ACID behavior. Analogously, a main memory store does not provide persistence of data across process boundaries.
The Zorba source distribution as well as the packages provided by http://www.zorba-xquery.com/ come with a main-memory-based store. The lifecycle of the data in this store is bounded by the lifetime of the process in which it is running. For example, a document added to the store can be accessed by XQuery programs in the same process. As soon as the process terminates (or even earlier if Zorba is shut down before the process terminates), the default in-memory store destroys the XML data it contains. Changes to this data are not propagated automatically to any persistent storage.
However, propagating the data from the in-memory store to a persistent storage can be achieved manually using the XML serializer and the file module. For example:
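One way this could look is sketched below. It assumes that the file module follows the EXPath File Module and offers file:write-text, and that fn:serialize from XQuery 3.0 is available; the function names in your Zorba version may differ. The output file name is hypothetical.

```xquery
xquery version "3.0";

import module namespace doc =
  "http://www.zorba-xquery.com/modules/store/dynamic/documents";

(: Assumption: the file module follows the EXPath File Module. :)
import module namespace file = "http://expath.org/ns/file";

(: Serialize the stored document to a string and write it to disk.
   "meat_poultry_backup.xml" is a hypothetical target path. :)
file:write-text(
  "meat_poultry_backup.xml",
  fn:serialize(doc:document("meat_poultry.xml"))
)
```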