Immutable chain-of-blooms

jonathan.essex · December 11, 2022, 4:57pm

This is mostly a bookmark for future thought.

Background - in the base car mirror we must (at least) transfer a bloom filter as CBOR. To deal with overflow gracefully we have an abstraction which can wrap multiple bloom filters of different sizes. With the wrapper one looses the ability to perform intersections but unions and membership checks still work, with gracefully degrading performance and false positive rates.

Then we started thinking about how these blooms might be persisted between sessions.

Since this is all in CBOR the leap to implementing each bloom as an IPLD block and the wrapper as and IPLD schema is minimal.

So… we could specify a bloom size which is optimized for underlying storage. We could specify a read cache size related to the number of such blooms we are prepared to keep in memory. We have one ‘working bloom’ which is updated whenever car-mirror reads a block and that block is not present in any of the blooms in the cache. When that bloom reaches a defined saturation, it is written to the block store with a link back to the last such bloom that was written to the block store. With reasonable assumptions about the temporal clustering of reads on a given block, the write shouldn’t actually happen all that often.

Since car-mirror can already handle using multiple blooms to cull transmitted blocks, the chain of blooms thus created becomes a kind of incrementally updated fingerprint of the block store content which could be used to efficiently prime any car mirror transfer to that block store.