Haven Fundamentals Part One

At the lowest level, Haven uses a document-oriented, schema-less database. I’ll use the term “object” where other systems would use “document”, to avoid connotations of “document” that are not appropriate at such a low level. Each object has a UUID that defines its identity and an arbitrary number of key/value pairs as content. Key names are informally standardized using reverse-DNS naming. Once assigned a UUID, an object is immutable. Because the object referenced by a given UUID never changes, the Haven database can be distributed over many, perhaps distant, computers. Ideally, there is only one Haven database (or object namespace) for all eternity.
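
To make the core object model concrete, here is a minimal in-memory sketch in Python. It is not Haven code: the ObjectStore class, its put/get methods, and the org.example.* key names are all hypothetical, chosen only to illustrate a UUID identity, reverse-DNS keys, and immutability after creation.

```python
import uuid
from types import MappingProxyType


class ObjectStore:
    """Toy store (hypothetical): each UUID maps to one frozen set of key/value pairs."""

    def __init__(self):
        self._objects = {}

    def put(self, content):
        # Assign a fresh identity; the content is copied and frozen from here on.
        object_id = uuid.uuid4()
        self._objects[object_id] = MappingProxyType(dict(content))
        return object_id

    def get(self, object_id):
        # Returns a read-only view; there is deliberately no update method.
        return self._objects[object_id]


store = ObjectStore()
note_id = store.put({
    "org.example.title": "Hello",       # reverse-DNS style key names (assumed)
    "org.example.body": "First note",
})
note = store.get(note_id)
```

A changed note would be stored as a new object with a new UUID rather than as an update to this one.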

Object references

Objects can be referenced using a URL of the form:

haven:UUID[?key=value][#fragment-key]

If a query string with key=value pairs is provided, it is used to validate the referenced object before it is returned to the referencer. This is most useful with cryptographic hashes, which are not necessarily stored as part of the object but are provided by havend instances as part of the content when an object is requested.

If the fragment-key is provided, the reference will be to the value of the specified key within the object, rather than to the entire object. This is to provide Xanadu-style transclusion.
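
The reference form above could be parsed and resolved roughly as follows. This is a sketch under stated assumptions: parse_reference and resolve are hypothetical names, the objects are held in a plain dict, and the query-string check simply compares against keys present in the content (it does not model values that havend would compute on request, such as hashes).

```python
import uuid
from urllib.parse import parse_qsl


def parse_reference(url):
    """Split haven:UUID[?key=value][#fragment-key] into its three parts."""
    scheme, sep, rest = url.partition(":")
    if scheme != "haven" or not sep:
        raise ValueError("not a haven reference: " + url)
    rest, _, fragment = rest.partition("#")
    id_part, _, query = rest.partition("?")
    return uuid.UUID(id_part), dict(parse_qsl(query)), fragment or None


def resolve(objects, url):
    """Look up the object, validate it against the query pairs, then
    return either the whole object or the value of the fragment key."""
    object_id, checks, fragment = parse_reference(url)
    content = objects[object_id]
    for key, expected in checks.items():
        if content.get(key) != expected:
            raise ValueError("validation failed for key " + key)
    return content if fragment is None else content[fragment]


# Hypothetical object with a fixed UUID so the example is repeatable.
example_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
objects = {example_id: {"org.example.title": "Hello", "org.example.lang": "en"}}

title = resolve(objects, f"haven:{example_id}?org.example.lang=en#org.example.title")
whole = resolve(objects, f"haven:{example_id}")
```

Here title is the value of a single key, which is the transclusion case, while whole is the entire object.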

Object contents and semantics

I will refer to the objects managed directly by the storage system as “core” objects, and to standard relationships among multiple core objects that provide higher-level features as “abstract” objects. The core object is kept simple to allow future migration to new, unforeseen usage semantics. Although I will describe the abstract features I believe higher-level clients need, keeping the definition of the core object simple means those features can change in the future in a reasonably automated fashion. All of the abstract object features that follow use objects that, at the very least, carry the “haven.class” key.

Signed objects

One step up from a core object is the abstract “signed object”. This consists of a primary core object that contains the data to be signed, and a secondary core object (with the haven.class key set to haven.signature) that contains a reference to the primary object, as well as a cryptographic signature of the primary object.

A benefit of this system versus a system wherein the signatures are included as a part of the objects themselves is that the authority of a given object’s content can be validated by multiple people, not just the creator, allowing for a distributed web-of-trust.
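
A rough sketch of the signed-object relationship, with several loud assumptions. The key names haven.signature.target, haven.signature.signer, and haven.signature.value are hypothetical, as is the canonical JSON serialization. The Python standard library has no public-key signatures, so HMAC-SHA256 with a per-signer secret stands in for a real signature scheme here; a real web-of-trust would need public-key signatures so that anyone can verify.

```python
import hashlib
import hmac
import json
import uuid


def canonical_bytes(content):
    # Assumed canonical form: JSON with sorted keys and no extra whitespace.
    return json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(primary_id, primary_content, signer, secret):
    """Build a separate haven.signature object that points at the primary."""
    digest = hmac.new(secret, canonical_bytes(primary_content), hashlib.sha256)
    return {
        "haven.class": "haven.signature",
        "haven.signature.target": f"haven:{primary_id}",   # hypothetical key
        "haven.signature.signer": signer,                   # hypothetical key
        "haven.signature.value": digest.hexdigest(),        # hypothetical key
    }


def verify(signature_object, primary_content, secret):
    expected = hmac.new(secret, canonical_bytes(primary_content), hashlib.sha256)
    return hmac.compare_digest(expected.hexdigest(),
                               signature_object["haven.signature.value"])


post_id = uuid.uuid4()
post = {"haven.class": "org.example.post", "org.example.body": "Hello"}

# Two different parties vouch for the same, untouched primary object.
by_author = sign(post_id, post, "author", b"author-secret")
by_reviewer = sign(post_id, post, "reviewer", b"reviewer-secret")
```

Because each signature lives in its own object, adding the reviewer's signature does not require changing the primary object or the author's signature.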

Revisioned objects

One step up from a core object, again, is the abstract “revisioned object”. This consists of two primary core objects, one representing the previous revision, and another representing the next revision, and one secondary object (with the haven.class key set to haven.pedigree) that contains appropriate references to both primaries.

The benefit of the pedigree being stored separately from the content of the revisions themselves is that the creator of an object has no absolute control over the history represented by the database. This avoids problems where the creator of an object makes a mistake about what prior object is appropriately the progenitor.
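
The pedigree relationship can be sketched the same way. The key names haven.pedigree.previous and haven.pedigree.next are hypothetical, as are the add, link, and history helpers. This toy version also assumes each revision has at most one pedigree naming its predecessor; if several pedigrees disagree, it simply takes the last one it sees.

```python
import uuid


def add(objects, content):
    """Store a new core object under a fresh UUID and return that UUID."""
    object_id = uuid.uuid4()
    objects[object_id] = dict(content)
    return object_id


def link(objects, previous_id, next_id):
    """Record, as a separate object, that next_id is a revision of previous_id."""
    return add(objects, {
        "haven.class": "haven.pedigree",
        "haven.pedigree.previous": f"haven:{previous_id}",  # hypothetical key
        "haven.pedigree.next": f"haven:{next_id}",          # hypothetical key
    })


def history(objects, latest_id):
    """Walk pedigree objects backwards from the latest revision, newest first."""
    previous_of = {}
    for content in objects.values():
        if content.get("haven.class") == "haven.pedigree":
            previous_of[content["haven.pedigree.next"]] = content["haven.pedigree.previous"]
    chain = [f"haven:{latest_id}"]
    # Stop at the first revision, or if the pedigrees would loop back on themselves.
    while chain[-1] in previous_of and previous_of[chain[-1]] not in chain:
        chain.append(previous_of[chain[-1]])
    return chain


objects = {}
v1 = add(objects, {"haven.class": "org.example.note", "org.example.body": "draft"})
v2 = add(objects, {"haven.class": "org.example.note", "org.example.body": "edited"})
v3 = add(objects, {"haven.class": "org.example.note", "org.example.body": "final"})
link(objects, v1, v2)
link(objects, v2, v3)

chain = history(objects, v3)
```

The revisions themselves never mention each other, so a mistaken pedigree can be superseded by adding a new pedigree object without touching any revision.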

Signed and revisioned objects

Both signed and revisioned object semantics can be combined to provide signed and revisioned objects. In addition to both revisions being signed, the pedigree object is signed as well.

  1. Dean Landolt says:

    From the first paragraph it sounds like you’ve started prototyping. Is this the case? Is the code out in the open somewhere? Do you have an idea of what you plan on implementing in? Again, from the wording of the first paragraph it sounds like CouchDB.

    This is a problem I’ve also spent quite a few hours ruminating on — I really like where you’re going with it. I’d love to take a look if you plan on putting the code out in the open.

    • Dean,

      The design described is the result of a series of different prototypes that I’ve been working on over the years. So, actually, a prototype matching the description does not yet exist, as I wanted to get my conclusions down so that I’d have something to reference as I started working on this (hopefully final) prototype.

      I’ve been on the fence about whether to fork or patch CouchDB to do what I want, or just start from scratch. I’m leaning towards the latter. The only components of CouchDB that would really be useful are its data storage mechanism and view logic. And, as I’m only a beginner with Erlang, I could get this done much faster with C.

      I do plan on making everything described open source, as such a system would certainly need adoption to really fulfill its potential. As soon as I get some code together worth posting, up it goes. I’ve just been busy this month as I’m starting a new day job. Stay tuned, and I would certainly welcome any ideas that you have that I’m missing or not addressing in my posts.
