XPathLog is a logic-programming-style language for XML
querying, data manipulation, and integration. It has been designed as
a crossbreed between F-Logic (which has been successfully applied to
semistructured data in pre-XML times) and XPath. Its main features are
the extension of XPath with variable bindings and the definition of a
constructive semantics for XPath atoms in rule heads. For updates and
data integration, we favor a graph-based data model instead of the XML
tree model: The XTreeGraph is an extension of the XML data
model that allows multiple overlapping trees in a graph-like
database. Result views are then defined as XML tree views over this
internal database. XPathLog and the XTreeGraph have been
implemented in the LoPiX system.
Post-Workshop Summary of the Talk:
Due to many questions at the workshop, the talk was given a second title,
"From F-Logic to XPathLog": in addition to presenting XPathLog, it
describes how the language developed as a reconciliation between F-Logic,
a "proprietary" data model and language for knowledge representation and
data integration, and the standards of the XML world.
The XPathLog language is a Datalog-like extension of XPath for
querying, manipulating and integrating XML data. Based on navigation
and filtering, its basic, internal semantics is closely related to
F-Logic. The querying part extends XPath by binding variables to the
XML nodes that are "traversed" when evaluating an XPath expression.
Variables can be bound to literals, nodes, and even names, allowing
for metadata reasoning. The variable bindings can be output as answers,
or they can be communicated to the rule head for specifying updates in
the database. In contrast to other approaches, the XPath syntax and
semantics are also used to specify declaratively how the
database should be updated: when used in rule heads, XPath filters are
interpreted as specifications of elements and properties which should
be added to the database.
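To make variable bindings and the constructive reading of rule heads more concrete, here is a minimal Python sketch. It is not the LoPiX implementation: the `Node` class, the `match_path` and `apply_rule` helpers, and the country/city sample data are hypothetical, only child steps are supported, and variables are bound to nodes only (not to literals or names). The rule is written in an XPathLog-like notation: the body `/mondial/country->C[@name->N]/city->Y` yields one binding per traversed country/city pair, and the head `Y[@country->N]` is read as "add this attribute" for every binding.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal XML element: a name, attributes, and ordered children."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def match_path(node, steps, binding=None):
    """Evaluate a path of child steps starting below `node`.

    Each step is a pair (element_name, variable); the variable may be None.
    Yields one dict of variable -> node bindings per matching path, so every
    node traversed on the way can be handed on to a rule head.
    """
    binding = binding or {}
    if not steps:
        yield binding
        return
    name, var = steps[0]
    for child in node.children:
        if name in ("*", child.name):
            extended = dict(binding)  # copy, so sibling matches stay independent
            if var is not None:
                extended[var] = child
            yield from match_path(child, steps[1:], extended)


def apply_rule(doc):
    """Sketch of the rule  Y[@country->N] :- /mondial/country->C[@name->N]/city->Y.

    The head filter is read constructively: for each binding of the body,
    the attribute `country` is added to the city node bound to Y.
    """
    body = [("mondial", None), ("country", "C"), ("city", "Y")]
    for b in match_path(doc, body):
        if "name" in b["C"].attrs:  # the filter [@name->N] must match
            b["Y"].attrs["country"] = b["C"].attrs["name"]


doc = Node("#document", children=[Node("mondial", children=[
    Node("country", {"name": "Germany"},
         [Node("city", {"name": "Berlin"}), Node("city", {"name": "Bonn"})]),
    Node("country", {"name": "France"}, [Node("city", {"name": "Paris"})]),
])])
apply_rule(doc)
```

After `apply_rule(doc)`, each city element carries a `country` attribute derived from the binding of `C`; the update is specified by the same path-and-filter notation that the query uses.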
The restriction that XML uses a tree data model directly affects the
semantics of updates: if an update specifies that some subtree of a
document should also be inserted at another place, the subtree must be
copied. Thus, for references into this tree, it must be decided
whether they point into the original or into the copy; "sharing"
subtrees (as in graph data models such as OEM or F-Logic) is not
possible. On the data integration level, when restructuring trees,
fusing nodes, and introducing synonyms, this problem becomes even more severe.
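The difference between copying and sharing can be illustrated with a small Python sketch. The `Elem` class and the two insert helpers are hypothetical and only model the general point: in a tree, a node has a single parent, so inserting an existing subtree elsewhere yields a copy; in a graph, a second edge to the same node suffices.

```python
import copy


class Elem:
    """Hypothetical minimal element with a name, text content, and children."""

    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []


def insert_as_tree(parent, subtree):
    """Tree model: a node has a single parent, so the subtree is deep-copied."""
    duplicate = copy.deepcopy(subtree)
    parent.children.append(duplicate)
    return duplicate


def insert_as_shared(parent, subtree):
    """Graph model: add a second edge to the very same node (sharing)."""
    parent.children.append(subtree)
    return subtree


address = Elem("address", "Main St")
alice = Elem("person")
alice.children.append(address)

bob_tree = Elem("person")
copied = insert_as_tree(bob_tree, address)

carol_graph = Elem("person")
shared = insert_as_shared(carol_graph, address)

# An update through the original reference reaches the shared node, not the copy.
address.text = "Elm St"
print(copied.text, shared.text)  # Main St Elm St
```

The copy silently diverges from the original after the update, which is exactly the question of whether a reference points into the original or into the copy.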
Thus, as a data manipulation and integration language, XPathLog and
its LoPiX implementation internally use a graph-based, edge-labeled
model, called XTreeGraph. The XTreeGraph extends the basic XML data
model by modeling multiple overlapping trees, and thus allows for
restructuring existing XML trees into a densely connected graph
database. XML result trees are then defined as XML tree views by
projections from this database.
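A rough sketch of overlapping trees and projected tree views follows. The `TreeGraph` class, its methods, and the sample data are hypothetical; in particular, selecting a view by a plain set of edge labels is a simplification standing in for XPathLog's signature-based view definitions. The shared `berlin` node appears in two different tree views, and an update to it is visible in both.

```python
class TreeGraph:
    """Hypothetical minimal edge-labeled graph in the spirit of the XTreeGraph.

    Edges are (parent, label, child) triples; a node may have several parents,
    so several XML trees can overlap on shared nodes.
    """

    def __init__(self):
        self.edges = {}  # node id -> list of (label, child id)
        self.text = {}   # node id -> text content

    def add_edge(self, parent, label, child):
        self.edges.setdefault(parent, []).append((label, child))

    def tree_view(self, node, labels, path=()):
        """Project an XML fragment below `node`, following only edges whose
        label is in `labels`; `path` guards against cycles in the graph."""
        if node in path:
            return ""
        parts = [self.text.get(node, "")]
        for label, child in self.edges.get(node, []):
            if label in labels:
                inner = self.tree_view(child, labels, path + (node,))
                parts.append(f"<{label}>{inner}</{label}>")
        return "".join(parts)


g = TreeGraph()
g.text["berlin"] = "Berlin"
g.add_edge("root", "country", "de")
g.add_edge("de", "city", "berlin")         # Berlin as a city of a country ...
g.add_edge("root", "organization", "org1")
g.add_edge("org1", "seat", "berlin")       # ... and as the seat of a hypothetical organization

print(g.tree_view("root", {"country", "city"}))
print(g.tree_view("root", {"organization", "seat"}))
```

The first view prints `<country><city>Berlin</city></country>` and the second `<organization><seat>Berlin</seat></organization>`; both are trees, while the underlying database stores a single Berlin node.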
In summary, the "pure" XPathLog language provides an intuitive Datalog-style
extension of XPath for querying, data manipulation, and data
integration which is very close to the standard syntax and semantics
of XPath. Extended features of the language provide a class hierarchy
(including nonmonotonic inheritance), a lightweight signature
formalism (which is also used for defining tree views from the
XTreeGraph), and data-driven Web access to extend the database with
further XML documents. The expressiveness and flexibility of the full
language make it a candidate for the combined handling of data,
schema metadata, and semantic metadata such as ontologies.