Petabyte-scale Webnative archives?

ianconsolata · April 25, 2023, 3:03pm

Hey everyone, I work at the Filecoin Foundation helping large archival projects at major institutions use decentralized storage. I love WNFS and its general approach to providing an easy-to-use abstraction on top of the decentralized web, but I’m not sure whether it could handle the scale of some of the deployments we work with.

Do you think WNFS could support petabyte-scale archives? I know it’s mostly designed for personal user data, but one of the big archival projects we’re working with just wants a file system interface on top of Filecoin/IPFS. They have other requirements around specific regional distribution and replication of data that may disqualify WNFS for other reasons, but I’m just curious whether you think the tools you’ve been working on are something I should even be considering for deployments like this.

boris · April 27, 2023, 11:52am

Banyan https://banyan.computer/ is working on making Petabyte scale work with WNFS.

Replication and other concerns about the physical location of data are operational requirements that can be solved at a different layer.

@matheus23 may be able to summarize some of the in-progress activity around petabyte scale.

matheus23 · April 28, 2023, 11:25am

Yeah, Banyan is currently prototyping a CLI that prepares a bunch of CAR files for upload to Filecoin.
The way we designed rs-wnfs (the library) should make it possible to deal with huge amounts of data. We’ve recently built in some streaming methods, so it’s no longer required to keep big files in memory all at once.
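
To illustrate the general idea behind streaming (this is not the rs-wnfs API, which is a Rust library and is not shown in this thread), here is a minimal Python sketch that reads a stream in fixed-size chunks and hashes each one, so memory use is bounded by the chunk size rather than the file size. The names `iter_chunks`, `hash_chunks`, and the `CHUNK_SIZE` value are hypothetical and chosen only for this example.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple

# Hypothetical chunk size for illustration; not a constant taken from rs-wnfs.
CHUNK_SIZE = 256 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until the stream ends."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def hash_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, int]]:
    """Yield (sha256 hex digest, length) for each chunk, one chunk at a time."""
    for chunk in iter_chunks(stream, chunk_size):
        yield hashlib.sha256(chunk).hexdigest(), len(chunk)


if __name__ == "__main__":
    # A tiny in-memory stream stands in for a large file on disk.
    data = io.BytesIO(b"x" * 10)
    for digest, size in hash_chunks(data, chunk_size=4):
        print(size, digest[:8])
```

The same loop works on a file opened with `open(path, "rb")`, where only one chunk is held in memory at any point.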

In general, we have “Big Dataset Support” on the roadmap this quarter. This includes sharding directories so they support modification even when they have more than 1000 entries.
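
As a rough picture of what directory sharding can mean, here is a generic hash-bucket sketch in Python. It is an assumption-laden illustration, not the design planned for rs-wnfs: entries are assigned to a shard by hashing their name, so adding or changing one entry only touches one small shard instead of rewriting a single huge directory node. `ShardedDirectory`, `shard_index`, and `NUM_SHARDS` are hypothetical names, and the string "refs" stand in for content identifiers.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical fan-out for illustration only.
NUM_SHARDS = 256


def shard_index(name: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an entry name to a shard deterministically via its SHA-256 hash."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class ShardedDirectory:
    """Directory whose entries are spread across many small shards."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # Shards are created lazily, keyed by shard index.
        self.shards: Dict[int, Dict[str, str]] = {}

    def put(self, name: str, ref: str) -> int:
        """Insert or update an entry; return the index of the one shard that changed."""
        idx = shard_index(name, self.num_shards)
        self.shards.setdefault(idx, {})[name] = ref
        return idx

    def get(self, name: str) -> Optional[str]:
        """Look up an entry by consulting only the shard its name hashes to."""
        return self.shards.get(shard_index(name, self.num_shards), {}).get(name)

    def __len__(self) -> int:
        return sum(len(shard) for shard in self.shards.values())


if __name__ == "__main__":
    directory = ShardedDirectory(num_shards=16)
    for i in range(5000):
        directory.put(f"file-{i}", f"ref-{i}")
    print(len(directory), directory.get("file-42"))
```

In a content-addressed setting, each shard would be stored as its own block, so a modification re-serializes one shard plus the small index above it rather than the whole listing.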

Let us know if you want to collaborate! Together we can move this ship even faster.