A Hyperset Approach to Semistructured or Web-like Databases
Since 1970, when E.F. Codd introduced the relational approach to
databases, Relational Databases have become the mainstream both in
database theory and in the industry of database management
systems. This is one of the brightest examples of the influence of
mathematics and mathematical logic (where relation is also a
fundamental concept) on computer practice. Now a new idea of
semistructured databases (SSD) has arisen. It does not fit well with
the relational approach although it is usually presented in terms of
edge labelled graphs which are mathematically just two column
relational tables (with labelled rows). The point is that some other
kind of issues with such graphs arises which are not so "relational".
The Hyperset approach to SSD is intimately connected with the graph
approach and is, in fact, a natural generalization of the relational
approach. The idea is extremely simple, but abstract.
Each vertex X of a graph is considered as a (hyper)set name and
generates a set equation X = {l_1:X_1,...,l_n:X_n} where l_i:X -> X_i
are all outgoing l_i-labelled edges. According to the
(non-wellfounded) Hyperset Theory (from a book of P.Azcel or another
book of J. Barwise and L. Moss) each such system of set equations
generated by a graph has a unique solution in the universe of
(abstract) hypersets. Each hyperset X is a set of labelled sets of
sets of sets, etc., possibly with cycles (according to the given graph
or system of set equations).
Thus, graph vertices only represent hypersets. These are hypersets
which are considered as (abstract) SSD states (like relations which
are also sets of tuples). Any query to SSD in this approach is just an
operation over hypersets (like operations over relations in the
relational approach to databases), whereas graphs or, equivalently,
systems of set equations representing hypersets and SSDs serve to
compute these abstract operations/queries by appropriate
(restructuring and bisimulation invariant) graph transformers.
A hyperset query language Delta and its more practical version
will be discussed whose expressive power proves to be exactly
PTIME (also for the case of multi-hypersets). Some variations
of Delta for the case of acyclic graphs (sets) have expressive
power (N/D)LOGSPACE. No such results are known to the author
for purely graph approaches to SSD (without involving hypersets).
Any SSD, as a system of hyperset equations, may be divided into parts
and represented as, say, HTML files distributed potentially over the
world with each set name X_i "clickable" and leading to (either the
same or possibly remote) file with the corresponding equation X_i =
{...}. This, as well as the arbitrary graph structure of SSD, suggests
calling such databases Web-like (WDB). Thus, in principle, the
Delta-language can be considered as a powerful, flexible and
restructuring (unlike the usual querying mechanisms of, e.g., Google)
query language to WDB or even to WWW.
Some implementation issues potentially leading to a WDBMS, (like many
existing RDBMS) for example, multiple mobile distributed agent
execution of Delta queries may be discussed, too.
(This talk is based on joint works with my colleagues Alexei Lisitsa,
Alexandr Leontjev and Yuri Serdyuk. Also, student Richard Molyneux is
currently working on a new attempt of implementation of Delta based on
HTML files of the above kind.)