The Bloom Clock

expede · August 14, 2019, 9:19pm

Hey folks,

I as chatting with the Textile team about a few things yesterday, and this paper came up. There’s a few different variants on this approach floating around, but I found this one to be the easiest to consume. Would love anyone’s thoughts. Enjoy!

bnchrch · August 19, 2019, 6:30pm

Here’s a good related discussion on the topic.
https://news.ycombinator.com/item?id=20093806

Some point of consideration

How does the bloom clock operate in an adversarial environment? (Unsure, but no applicable to a lot of our use cases)
How do you handle nodes being added/removed https://news.ycombinator.com/item?id=20095559

expede · August 19, 2019, 6:34pm

Interesting POV from HN

The upper-bound you mentioned technically does not happen directly because of the nodes, but because of the increased “rate of events”

The Bloom clock is more concerned about data than by participants. This is in line with the content-over-location mindset.

expede · August 19, 2019, 6:39pm

Excellent question! IMO you would probably handle that at a different layer. There are many authentication schemes, plus choice of subscription to a particular stream to anchor your bloom to theirs (user-initiated, and thus essentially pull not push). There are interesting potential applications for merging streams after-the-fact with partial information. But that’s done by choice, and doesn’t overwrite your history up to that point. In scenarios where you need true global consensus and ordering, your best bet is a blockchain with all the tradeoffs that comes with.

expede · August 19, 2019, 6:45pm

Thinking about it a bit more, you could also use an invertible bloom filter to remove bad inserts to your clock when you discover that they’re bad. Doesn’t prevent the issue in the first place, but does recover your stream like the bad merge didn’t happen.

bnchrch · August 19, 2019, 7:20pm

Where do you store all these papers brook?!

Thats a really great modification. I couldnt tell if there were any downsides mentioned. It seems like the false positive rate was the same.

expede · August 19, 2019, 11:58pm

Thanks!

Yes, I would assume the rate would be the same. There may be some weirdness if you’re ignoring updates from ID X, but a trusted peer is including them, since you’re not tracking all of the IDs coming through (there’s an unbounded number of participants). If you’re using RAFT or similar this should be pretty straightforward since everyone agrees on certain principles, but it would be ideal to have a eventually-consistent global clock of some kind. Maybe include the XOR part of the invertable bloom alongside it? There’s a few edge cases of this type that make it hard to work in an adversarial environment while keeping the clock in constant space. At that stage, we may be able to add some work or a zkSNARK, but now we’re wading pretty far into blockchain territory

sanderpick · August 31, 2019, 10:16pm

Hi Brook,

Thanks for sharing this! I was recently pointed to Hybrid Logical Clocks as well. From http://drops.dagstuhl.de/opus/volltexte/2016/6677/pdf/LIPIcs-OPODIS-2015-34.pdf:

Hybrid vector clocks (HVC) implement vector clocks (VC) in a space-efficient manner by ex- ploiting the availability of loosely-synchronized physical clocks at each node.

Have you come across any good comparisons of these two space-saving approaches?

expede · September 2, 2019, 9:06pm

Hey @sanderpick

I’ll have to read the paper again, but if I remember correctly for that technique you need to have control over (or at least detailed knowledge of) the wall clocks at each node. You then construct a event window that is equal-or-greater-than the amount of drift between the wall clocks. You only commit events at the end of each window. This is great if you’re doing distributed systems for Google’s data centres.

The Bloom Clock has the disadvantage of being probabilistic, but the advantage that you stay fully in logical-clock land with an unbounded* number of participants with unknown wall clock drift. It depends on the events (rather than participants) much like how IPFS depends on the data not the location.

*Practically unbounded, depending on the size of the bloom. There are options for increasing the size after the fact, but could possibly break for people now aware of the increase (but maybe not… would have to look at it more closely).