HXQ


Introduction

Developed by Leonidas Fegaras, HXQ is a fast and space-efficient translator from XQuery (the standard query language for XML) to embedded Haskell code. The translation is based on Template Haskell. HXQ takes full advantage of Haskell's lazy evaluation to keep in memory only those parts of the XML data needed at each point of evaluation, thus performing stream-based evaluation for forward queries (queries that do not contain backward steps). This results in an implementation that is as fast and space-efficient as any stream-based implementation based on SAX filters or finite state machines. Furthermore, the code is far simpler and more extensible since it is based on XML trees rather than SAX events.
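The effect of lazy evaluation on forward queries can be illustrated with a small toy model. The sketch below is not HXQ's API or its internal representation: the XML type and the names book, library, child, textOf, titles and firstTitles are all hypothetical and chosen only for illustration. It builds a document with an unbounded number of book elements and evaluates a forward path, roughly library/book/title, over it. Because only child steps are used, the result is produced incrementally and only the consumed prefix of the document is ever constructed.

```haskell
-- Toy XML tree (an assumption for this sketch, not HXQ's data type).
data XML = Elem String [XML] | Text String
  deriving (Eq, Show)

-- Hypothetical helper: one <book><title>Book i</title></book> element.
book :: Int -> XML
book i = Elem "book" [Elem "title" [Text ("Book " ++ show i)]]

-- A "document" with infinitely many books; it is only built on demand.
library :: XML
library = Elem "library" (map book [1 ..])

-- Forward step: child elements with the given tag, in document order.
child :: String -> XML -> [XML]
child tag (Elem _ cs) = [c | c@(Elem t _) <- cs, t == tag]
child _ (Text _) = []

-- Concatenated text content of a node.
textOf :: XML -> String
textOf (Text s) = s
textOf (Elem _ cs) = concatMap textOf cs

-- Roughly library/book/title/text(), using forward steps only.
titles :: XML -> [String]
titles doc = [textOf t | b <- child "book" doc, t <- child "title" b]

-- Consume only the first n results; the rest of the document is never forced.
firstTitles :: Int -> [String]
firstTitles n = take n (titles library)
```

Loaded into GHCi, firstTitles 3 evaluates to ["Book 1","Book 2","Book 3"] and terminates even though the document is unbounded. A backward step such as parent would need access to nodes that have already been consumed, which is one way to see why such steps do not fit this streaming style.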

Comparison to other implementations

For example, a complex XQuery against the DBLP bibliography (420MB XML) runs in 39 seconds on my laptop (using 18MB of max heap space in GHC). By contrast, Qexo, which compiles XQueries to Java bytecode, took 1 minute 17 seconds (using no less than 1400MB of heap space), and XQilla, which is written in C++, took 1 minute 10 seconds (using 1150MB of heap space).


Current Status

HXQ supports most essential XQuery features, although some system functions are missing (they are easy to add). To see the list of supported system functions, run xquery -help. HXQ does not perform static typechecking; it leaves all checking to Haskell. As a consequence, it distinguishes regular predicates from indexing at run time: if an XPath predicate returns an integer at run time, it is treated as an index and compared against the current node position. The most important omission is backward step axes, such as /.. (parent). Some, but not all, parent axis steps are removed by optimization rules; all others cause a compilation error. Finally, XQuery semantics require duplicate elimination and sorting by document order for every XPath step, which is very expensive and unnecessary in most cases. HXQ does not currently do this, so, for example, e//*//* may return duplicate elements. This will be addressed in the future; it needs a static analysis to determine when duplicate elimination is necessary.
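Two of the run-time behaviours described above can be sketched with a toy model. None of the names below (Node, PredResult, applyPredicate, descendants, descDesc, descDescDistinct, chain) come from HXQ; they are assumptions made for illustration only. The first part shows a predicate whose result is inspected at run time: a Boolean filters the node, while an integer is compared with the node's 1-based position. The second part shows why a path like e//*//* yields duplicates when no duplicate elimination is performed.

```haskell
import Data.List (nub)

-- Toy element: an identifier plus child elements (not HXQ's representation).
data Node = Node { nodeId :: Int, children :: [Node] }

-- Node identity is decided by the identifier alone.
instance Eq Node where
  a == b = nodeId a == nodeId b

-- What a predicate may evaluate to for a single node.
data PredResult = PBool Bool | PInt Int

-- Decide at run time how to interpret the predicate result:
-- a Boolean filters, an integer is matched against the 1-based position.
applyPredicate :: (Node -> PredResult) -> [Node] -> [Node]
applyPredicate p ns = [n | (pos, n) <- zip [1 ..] ns, keep pos (p n)]
  where
    keep :: Int -> PredResult -> Bool
    keep _ (PBool b) = b
    keep pos (PInt i) = pos == i

-- Models e//* : every element strictly below the node, in document order.
descendants :: Node -> [Node]
descendants n = concatMap (\c -> c : descendants c) (children n)

-- Models e//*//* with no duplicate elimination.
descDesc :: Node -> [Node]
descDesc e = concatMap descendants (descendants e)

-- Same path with duplicates removed. nub is quadratic and does not sort,
-- so this is only a stand-in for what full XQuery semantics would require.
descDescDistinct :: Node -> [Node]
descDescDistinct = nub . descDesc

-- A chain of four nested elements with identifiers 1 > 2 > 3 > 4.
chain :: Node
chain = Node 1 [Node 2 [Node 3 [Node 4 []]]]
```

On this chain, map nodeId (descDesc chain) is [3,4,4]: element 4 is reached once as a descendant of element 2 and once as a descendant of element 3. With descDescDistinct the result is [3,4]. For the predicate part, applyPredicate (const (PInt 2)) (descendants chain) selects the second descendant (identifier 3), while a Boolean predicate such as \n -> PBool (even (nodeId n)) keeps the elements with identifiers 2 and 4.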

Further information

For examples and further information on usage, refer to index.html in the HXQ distribution, which can be found on Hackage.

See also,