More on synchronization

I started a draft implementation of my identity synchronization idea. In the course of doing this I ran across a minor problem. My initial idea was to keep things very simple: a shared resource would be a collection of bits, and the users of the API would handle all aspects of interpretation. The library would handle the bookkeeping details but would defer conflict resolution to the caller — handing back two versions of a file (three-way merge is a possibility too, but I’m not convinced it is worth the effort) and having the caller respond with a merged version.
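As a rough sketch of what that division of labor could look like (every name here is hypothetical, and Python stands in only for brevity):

    # Hypothetical sketch: the library treats a shared resource as opaque
    # bytes and defers merging to a caller-supplied callback. None of
    # these names come from the actual draft.

    def sync(base, local, remote, resolve_conflict):
        """Reconcile one resource, where base is the last version both
        sides agreed on; the library never interprets the bytes."""
        if local == base:
            return remote                  # only the server side changed
        if remote == base:
            return local                   # only we changed
        # Both sides changed: hand the two versions back to the caller
        # and have it respond with a merged version.
        return resolve_conflict(local, remote)

    # Example caller: an application whose resource happens to be
    # newline-separated entries, merged by simple union.
    def union_merge(local, remote):
        return b"\n".join(sorted(set(local.splitlines()) |
                                 set(remote.splitlines())))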

Now suppose you log in on your laptop and make some changes while disconnected. When you subsequently reconnect, the ideal would be for all the identity data to automatically resynchronize with the server. This seems like the most useful thing to do — I don’t want to have to actually restart RSSOwl for my saved blog-reading sessions to be available to my other machines.

Unfortunately this throws a wrench into the naive approach outlined above. If the synchronized data is just uninterpreted bytes, then any conflict defeats generic synchronization: the merge cannot happen without running the particular application that knows how to interpret the data.

One way to fix this would be to mandate a file format so that the merge code can be generic. I’m not very fond of this. The other, and I think superior, plan would be to handle conflicts monotone-style: let all commits succeed, and handle merging on demand. In this model we can upload a file (with some ancestry info); when downloading updated files from the server we would simply download all available “head” files and the application would perform multiple merges.
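To make the model concrete, here is a toy in-memory version of it (entirely invented for illustration; monotone’s real machinery looks nothing like this):

    import uuid

    class Store:
        """Toy server-side store: every commit succeeds; nothing merges here."""
        def __init__(self):
            self.versions = {}                     # id -> (data, parent ids)

        def upload(self, data, parents=()):
            vid = str(uuid.uuid4())
            self.versions[vid] = (data, tuple(parents))
            return vid

        def heads(self):
            # A head is any version that no other version names as a parent.
            parents = {p for _, ps in self.versions.values() for p in ps}
            return [v for v in self.versions if v not in parents]

    def pull_and_merge(store, resolve_conflict):
        """Download every head and merge on demand, pair by pair."""
        heads = store.heads()
        data, _ = store.versions[heads[0]]
        for h in heads[1:]:
            data = resolve_conflict(data, store.versions[h][0])
        if len(heads) > 1:
            store.upload(data, parents=heads)      # record the merge result
        return data

Two disconnected edits simply become two heads; whichever client pulls next resolves them and uploads the result.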

This means having a smart server, but I think that was inevitable anyhow. (Probably Nathaniel is on target, as usual, and I should be reusing monotone’s code for this…)

Another little problem that came up is keyring management. In particular, the GNOME keyring will be needed to decrypt data downloaded from the server, but we also want to store the keyring itself as a shared resource, which makes for a chicken-and-egg problem. I think the only answer here is a special API, used only by gnome-keyring, that lets us skip the download step the first time the keyring file is needed.
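The shape of that side door might be as simple as this (the path and function name are made up):

    import os

    CACHE_DIR = os.path.expanduser("~/.cache/identity-sync")   # invented path

    def open_keyring_bootstrap(name="login.keyring"):
        """Return the locally cached keyring bytes without touching the
        server. Only gnome-keyring would use this; once the keyring is
        unlocked, the keyring file syncs like any other shared resource."""
        with open(os.path.join(CACHE_DIR, name), "rb") as f:
            return f.read()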

Finally, there is a completely different design available. We could use a file-based service like monotone, check things in, and then provide merge utilities for each different file type. One thing I don’t like about this is that I suspect current programs don’t differentiate between the different types of stored data; changing the programs themselves ensures a clean separation between identity-related data and transient data. At least for the time being I’m sticking with my first approach.

One Comment

  • > let all commits succeed, and handle merging on
    > demand.

    Right — and this is what lets you do the sort of opportunistic replication I was talking about on the last post. It’s the Right Way.

    I’m not sure why you need a smart server — it’s (in principle) possible for monotone to sync over dumb servers, and the same tricks would probably work just as well here.

    Just because monotone’s data model happens to be based on filesystem conventions doesn’t mean you have to store raw files in it. Really it is versioning trees, where each node in the tree is annotated with a unique-among-siblings name (in unicode) and zero or more key->value mappings, and leaf nodes carry an additional anonymous annotation (“file content”). This data structure is a bit quirky and shows its file-based heritage, but it still seems pretty flexible to me. (Cf., say, XML, as evidence that pretty much anything you want to express, you can express as a tree.)
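    To make the shape concrete, a toy version of that structure (names invented):

        class Node:
            def __init__(self, name, attrs=None, content=None):
                self.name = name           # unicode, unique among siblings
                self.attrs = attrs or {}   # zero or more key -> value mappings
                self.content = content     # anonymous leaf annotation, else None
                self.children = {}

            def add(self, child):
                assert child.name not in self.children   # sibling uniqueness
                self.children[child.name] = child
                return child

        # Nothing says the leaves must be literal files: a saved RSSOwl
        # session could just as well be a subtree of annotated nodes.
        root = Node(u"identity")
        feeds = root.add(Node(u"feeds", attrs={u"app": u"rssowl"}))
        feeds.add(Node(u"planet", content=b"http://example.org/feed.rss"))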
