Brooke and I talked about reconciliation in our 1on1 again, and it came up that because of the structural differences between the public and the private filesystem, reconciliation will work differently. E.g. on the private side, usually the file on with more changes on a replica will win against the same file with fewer changes on another replica.
Thus, I’ve been thinking about making the private filesystem more “DAG-like” again.
The private filesystem is split into two sections:
- an immutable block store at
/ipfs/<userRootCID>/privateBlocks. It stores encrypted filesystem nodes keyed by CID as an MMPT for efficiency.
- a somewhat mutable “pointer” store at
The pointer store is similar to a design brooke and daniel evaluated a while ago, which brooke calls “an index” into the private filesystem. It’s (conceptually, possibly via MMPT) a map from namefilters to (possibly encrypted?) CIDs.
The pointer should be thought of similar to the root CID of the public filesystem: They’re the entry point for your filesystem. If you’ve got the root CID (i.e. know which block is supposed to be the root) and you know the key for decrypting it, you can unlock the rest of the DAG.
Now, exactly which pointers get stored isn’t obvious:
- we could store a pointer to every block. (it’s not clear to me what the source of truth should be. Do we follow the links inside each block, or do we just use the index?)
- we could store only the root of the file system, but then apps which don’t have root access wouldn’t be able to update parts of the private filesystem, because they can’t change the root pointer
- instead, I propose apps should update the pointer for the outermost part of the filesystem they changed. If they have root access, then this will be the root of the filesystem.
If you’ve got a private filesystem and you want to walk its tree, you do this primarily via the DAG structure pointed to from the root pointer, but once you reach parts of the filesystem where there’s a newer pointer that covers the subtree, you use follow the pointer’s CID instead of the CID from the block you just looked at.
Apps should the merge in these “temporary overlays” into the root DAG and update the root pointer accordingly.
Conflict resolution should basically work exactly the same as in the public filesystem, as we’ve got multiple DAG structure representing WNFSes just like in the public case.
Once conflicts have been resolved, we’ll have a new root which has the other roots as nodes somewhere in the DAG.
If we allow arbitrarily changing the pointer to the root CID, we’d allow all apps to write to rewind the state of the filesystem back destructively, even if they’ve only got
APPEND access, and there would be no way to check this on a server.
To fix this, for each path (e.g. the root path), we store a list of pointers. If an app produces a new update and only has
APPEND access, it’ll push this update into the list. A third party can then check that an
APPEND update is valid by making sure that … the update only appends. Duh.
Semantically, the actual filesystem root is then the ‘merge’ of all pointers. We have this merge operation defined through conflict resolution.
This leads us with two things:
- We generate a lot of entries for each namefilter. E.g. every private filesystem change in an app with only
APPENDaccess is another list entry. Thus, we leak some metadata.
- If we want to prune redundant pointers in the list (some of them are superseded by others because of fast-fowards), we need to log into an app that has
OVERWRITEaccess. However, we don’t want to have apps requesting overwrite access just for this purpose. But for a third party, it’s impossible to distinguish whether someone is trying to e.g. time-travel the filesystem destructively or is just trying to prune the amount of pointers.
Original discord message for completeness:I have an idea for a more DAG-like private filesystem (on the data layer) that I just want to run by you: We store encrypted blocks by their CID, at `/privateBlocks`. These almost look just like our current blocks, but without the namefilters. They reference each other by CID. Because we store the encrypted blocks in an MMPT by CID, not by namefilter, it's really efficient to test whether a CID is in under the root of a filesystem or not. (We kind of replicate an ipfs blockstore). When we store everything by CID, the immediate next question is: How do we know what the root block in the filesystem should be? For that, I'd propose an MMPT at `/privateRoots` with namefilters as keys, just like we have them right now, pointing at the 'root' block. Now, the `/privateBlocks` MMPT is meant to be append-only, while `/privateRoots` is mutable. So if you want to update the private filesystem, you'll add the new root block to `/privateBlocks` and then change the root CID pointer at `/privateRoots`. If you're an app that doesn't have access to the root of the private filesystem, you'll put another entry into `/privateRoots` to update the highest nodes you have access to. An app that has root access can later come and resolve these pointers by updating the actual root. When traversing the private filesystem DAG, you need to account these non-root 'overrides'.
Here’s an example: Let’s say you’re trying to traverse the filesystem from the root, but another app has an entry for an update at
private/Apps/flatmate. Now there will be two entries in
<root>/privateRoots. One for
private/ and one for
private/Apps/flatmate. You can start traversing the filesystem by looking at the block with the cid pointed to from the namefilter for
<root>/privateRoots. You do the normal thing of decrypting the block, looking at the keys and cids inside it and decrypting the next blocks in the DAG. No need for looking revisions ahead. The CIDs inside blocks will always refer to the most recent entry. Except for (in this case) the path at
private/Apps/flatmate. While traversing, you’ll need to check whether any of the namefilters in
<root>/privateRoots match your next paths. If so, you’ll need to look at the CID pointed to from the there, instead of what the block tells you. If you have two replicas with two different “root CIDs” for the private filesystem and you’re trying to merge these two filesystems on an actor that doesn’t have any permissions (e.g. the fission server), you simply put all conflicting root CIDs under the root namefilter in
<root>/privateRoots. At some point a client that can read the conflicting roots will be able to reconcile the changes with exactly the same algorithm as for the public side: Compare DAGs, see whether one DAG is a subDAG of the other, if so fast-forward, otherwise merge and finally replace the root CID with only one entry, the merged filesystem.
— The biggest problem I see is that the fission server will have no way to distinguish an update that only has permissions for
APPEND from an update that has permission to
OVERWRITE. One “simple” solution to this would be: Anything that’s append will only be allowed to append in
<root>/privateRoots, but anything that can
OVERWRITE will be allowed to actually apply the reconciliation. However, that means a (probably not insignificant) overhead if the user mostly uses apps that only have
APPEND permissions and rarely uses apps which have
OVERWRITE permissions: They’ll accumulate lots of entries under the namefilter, which all subsume each other (should be detected as fast-forwarded).