Prelude to Haven
I often find myself thinking about how the next great computing environment should function. Although computer hardware has advanced considerably over time, the user-oriented software that gives us access to that hardware's potential has stagnated. We keep rehashing the same tired metaphors that were developed decades ago: keyboards, mice, windows, icons, files, applications, clients, servers, users, passwords, and so on. A little more than a year ago, I started thinking about how the computers of the future should store and access data. After much refactoring of my ideas and their consequences, I have settled on a system that I call <strong>Haven</strong>, which I will describe in this and later posts. This first post covers the problems that I have attempted to solve.<!--more-->
Most general-purpose computers use the standard hierarchical filesystem model. This is the system you're all undoubtedly familiar with: directories (folders) contain files and other directories. Files are individual containers of data, and the data within them is generally stored in an application-specific format. Having used computers for more than half my life, I have accumulated hundreds of thousands of files in tens of thousands of directories, in many different formats, and I have found this model to be unsustainable.
<h3>I hate files</h3>
Filesystems, as a rule, do not impose any structure on the data stored within files; they are "content agnostic". To a filesystem, a file is merely a collection of bytes. Only applications designed to interpret a given file format can accurately represent the information stored in such files, so that information becomes inaccessible if you do not have such an application. Furthermore, depending on the application and how it uses files, a single file may contain many discrete pieces of information.
As an example, the music jukebox application iTunes works with audio files stored in your filesystem, each one representing a single piece of music. To let the user see and search "albums", "artists", and "playlists" as conceptual entities, iTunes also manages an "iTunes Library" file, which holds a database of metadata about, and references to, those audio files. iTunes does offer settings that make it store audio files in a standard hierarchy automatically (audio files go in directories named after their album, and album directories go in directories named after their artist). Even so, if you stop using iTunes, you lose access to much of the data that iTunes keeps only in its database. Your customized playlists, ratings, comments, and play counts are all stored within the "iTunes Library" file, and you must use iTunes to store and access that data.
Friends often send me pieces of music. I manage all of my friends' biographical information with Mac OS X's Address Book application, which stores that information in a database file similar to the "iTunes Library" file, except designed for managing information about people rather than pieces of music. So, let's say that when a friend sends me a piece of music, I'd like my computer to remember who sent it. How can this be done? Although iTunes knows about "tracks", "albums", "artists", and "playlists", it knows nothing about ordinary "people". Apple would have to give iTunes the ability to read my Address Book database so that I could select someone from my Address Book as the "Source" of a given track. And even if that ability were added to iTunes, if I were looking at my friend's entry in Address Book, it would be non-trivial to "ask" my computer for all pieces of music that friend sent me, unless Address Book, too, were granted the ability to read the iTunes Library file.
Of course, these problems arise with just two applications. What if I'm using Finder (Mac OS X's file management application) to look at a particular audio file directly, rather than using iTunes, and have the same questions? More often than not, when I receive a piece of music from a friend, it arrives via e-mail or instant message, media that inherently carry identifying information about the person I'm corresponding with. Shouldn't my computer be able to create these relationships for me automatically, without my even having to open iTunes and specify the "Source" of a particular track, even if iTunes supported that feature? Well, to do that, my e-mail or instant messaging application would need to understand the iTunes Library and Address Book database file formats, as well as the audio file itself. Furthermore, applications such as Address Book and iTunes generally assume that they are the sole "owners" of their respective databases, so if another application modified their database files directly, things could go wrong, and an error could cost you your entire database.
Even worse, because iTunes and Address Book (and even my instant messaging application, iChat) are proprietary, closed-source applications, there is no technical documentation available to programmers that defines the structure of their files. As a programmer, if I wanted to write my own application that works with any of the data managed by those applications, it would be very difficult. So, I am not only limited by what a given application allows me to do with my data, but I am also, in many cases, tied to using those applications forever. If I wanted to switch to another application, there would be no guarantee that my data would migrate over, and I might effectively lose it. To me, this is <strong>wrong</strong>. It is <strong>my</strong> data, and it should not be under the effective control of the companies that produce the applications I use to manage it.
<h3>I hate hierarchies</h3>
Even when files do represent individual pieces of information, the directory hierarchy is often the only tool users have for relating them conceptually. Directory hierarchies provide just two concepts: directories, which can contain both files and other directories, and the names given to directories and files. If you're navigating through your filesystem trying to find something, you often have only its location in the hierarchy and its name to go on. This demands that you manage your filesystem carefully, naming and placing files well enough that you will be able to identify them later. And, depending on your usage, you may find that no single conceptual hierarchy fits all of your needs.
For example, I enjoy writing essays. I store each essay in a directory named after its subject (e.g., Metaphysics, Politics, Sociology, Psychology), which is stored in a directory named "Essays", which is in turn stored in a directory named "Documents". I also enjoy working with others on projects, so I create a directory for each project inside a directory named "Projects", also stored in "Documents". One project that I have ongoing with a friend of mine is related to some of the political essays I've written. Now, let's say that I'm looking at the project's directory, and I want to see those essays. Well, by default, I cannot. I need to remember that the essays are stored in a different hierarchy, with its own classification system. Most filesystems do support a "symbolic link", "alias", or "shortcut" metaphor that lets you put a reference to a file in multiple places, but the file itself must still be stored in one original location. If I write a political essay while working on the project, should I save it in the project's directory and then create a link to it in my "Essays" directory? Or should I do the reverse? And should I give it the same name in both directories?
What about keeping track of who is contributing to a given project? If my friend sends me some files related to the project we're working on, I'll store them in that project's directory. But how do I remember which files he worked on and which ones I worked on? I wouldn't want to accidentally create a link in my "Essays" directory to an essay my friend wrote, as that directory is supposed to be for essays that I wrote. But then, what if I do want to start storing essays written by others on my computer? Should I create a new hierarchy? Should I modify my old one? And what about the earlier Address Book example? What if I want to select a file in the project's directory and get to my friend's telephone number so that I can call him and talk about it?
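The dependence of a symbolic link on its original location can be seen concretely. The following Python sketch, assuming a POSIX system where `os.symlink` works without special privileges, builds a throwaway copy of the essay and project layout described above (the project and file names are hypothetical), links a political essay into the project's directory, and then moves the original. The link keeps pointing at the old path and stops resolving.

```python
import os
import tempfile


def demo_symlink_dependency(root: str) -> dict:
    """Show that a symbolic link only records the path of one original file."""
    # Hypothetical layout mirroring Documents/Essays/<subject> and Documents/Projects/<project>.
    essays = os.path.join(root, "Documents", "Essays", "Politics")
    project = os.path.join(root, "Documents", "Projects", "JointProject")
    os.makedirs(essays)
    os.makedirs(project)

    # The essay's single "original location".
    original = os.path.join(essays, "essay.txt")
    with open(original, "w") as f:
        f.write("draft")

    # A second, secondary place for the same essay: a link inside the project directory.
    link = os.path.join(project, "essay.txt")
    os.symlink(original, link)

    # os.path.exists follows the link, so it reports whether the target is reachable.
    results = {"readable_before": os.path.exists(link)}

    # Rename the original: the link still names the old path, so it now dangles.
    os.rename(original, os.path.join(essays, "renamed.txt"))
    results["is_link_after"] = os.path.islink(link)
    results["readable_after"] = os.path.exists(link)
    return results


with tempfile.TemporaryDirectory() as tmp:
    print(demo_symlink_dependency(tmp))
```

On a typical Linux or Mac OS X system this prints `{'readable_before': True, 'is_link_after': True, 'readable_after': False}`: the link still exists, but it no longer leads anywhere, because only the original location is authoritative.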
<h3>What about sharing?</h3>
So far, I've only described some problems that individuals might encounter when managing their own files on their own computers and trying to keep them, at best, personally meaningful and accessible. What about sharing? If I'm using a directory hierarchy to store my data in a way that I can understand, is that necessarily the most useful hierarchy for someone else? What if I want to share my essays with someone who would prefer to see them in chronological order? What if I alone want to see them in chronological order, if only temporarily? What if I used an application that let me embed graphics in my essays, but the person I'm sending them to doesn't have an application that can read the file format my essays are in? What if that person is blind, and none of the applications that can decode the file format support screen reading? What if I send my friend's essay to a third friend, who wants to call the author and discuss it? Did whatever system I set up to remember that it was my friend's essay carry over to their computer, so that they can easily figure out whom to call?
<h3>Solving the problems</h3>
These are the sorts of problems and questions that led to the design I will describe in future posts, the design I call <strong>Haven</strong>. I believe that I've come up with something that takes the best from every system of data storage and access that I've encountered, and that not only solves all of the problems I have with the status quo of filesystems but can also improve many other things, such as e-mail, instant messaging, and the World Wide Web. Stay tuned!