Linked Asset Metadata Protocol (LAMP)

dylan · March 20, 2021, 11:58pm

Linked Asset Metadata Protocol (LAMP) Proposal

This is a suuuper rough proposal for a protocol I have been envisioning for years, but it makes sense to begin developing it with others (namely Boris and the Fission team), given the interest in the particular phases listed below. There is certainly more that can be done with this proposal and more to be laid out, but this is a basis for grabbing metadata on a decentralized network, leaving room to expand Web 3.0 into a network of linked concepts.

Disclaimer: The implementation of this protocol and its future development are key to Cortex.

Objective: Use IPFS to build a protocol for retrieving, storing, and versioning metadata for linked assets (both on Web2 and Web3).

Tech Stack:

Microlink for grabbing metadata + screenshots and locating assets
Fission/IPFS for storing and versioning data
Project Cambria for schema maintenance
GraphQL + Next.JS for SSR?

Phases:

Metadata linking for documents on IPFS
Metadata link versioning for documents on IPFS
Ontology versioning
Potential fourth phase: Web Monetization?

dylan · March 21, 2021, 1:27am

Any initial thoughts @boris @expede ? What did I miss?

boris · March 21, 2021, 3:26am

Going to need some more detail. I think exploring use cases is a good start.

Let’s start with screenshots. Attached URL, time stamp, and maybe some details around the initial submitter.

How does one do discovery? For example — a unique URL can be hashed to get a unique content address. That can be used as a key to attach to metadata values.
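
A minimal sketch of that discovery idea, assuming SHA-256 over a lightly normalized URL. The function names, the normalization rules, and the in-memory dictionary are illustrative assumptions, and the hex digest is only a stand-in for a real IPFS CID:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase the scheme and host and drop the fragment so trivial variants share a key."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def url_key(url):
    """Return the hex SHA-256 digest of the normalized URL (a stand-in for a real CID)."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


# In-memory stand-in for whatever key-value store ends up holding the metadata.
metadata_by_key = {}
metadata_by_key[url_key("https://Example.com")] = {"url": normalize_url("https://Example.com")}
```

Anyone who knows the URL can recompute the key and look up the metadata, which is what makes the hash usable for discovery. How aggressively to normalize (query strings, trailing slashes) is an open design choice.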

What are you thinking? How do you want this to work?

dylan · March 21, 2021, 3:59am

Yes, for sure. I would say that is the best route for linking IPFS content and attaching metadata. But maybe the best route is for Fission to create a metadata endpoint that is stored as its own collection, the same way other collections are stored, so apps can sync their data back to the user’s DID.

boris · March 21, 2021, 4:32am

OK. WNFS does have the concept of metadata and one COULD run a big public instance of WNFS, although a lot of this is getting into the territory of #developers:webnative-db instead.

Running a centralized instance is possible — the trickier part is how one “agrees” to run something where anyone can contribute.

I’m making stuff up here — it’s still not clear to me at a lower level what you would want from this / how you want it to work.

@dylan I think describing a step by step flow of data storage and retrieval for your particular use case is going to be most helpful. Don’t focus on the “how” - that’s kind of covered by the tech stack you suggested — but rather the “what”.

dylan · March 21, 2021, 5:12am

I certainly have to draw this up, but for a second let’s forget about the metadata and focus on screenshots. Metadata already exists in the Microlink API, and while there is an opportunity to expand on metadata at a concept level and at an ontological level with Fission, I think it might be best to start off by focusing on creating IPFS resources for screenshots. Linking that original tweet here: https://twitter.com/flancian/status/1373335199384285189?s=21

I think our first goal is being able to publish an IPFS address for a screenshot and have that address also be linked to some simple metadata about the website at that particular time, something I don’t believe the Internet Archive does. We could host our own version of WNFS for this, or we could just run a large Fission app that serves all of this content, with the screenshots themselves triggered through something like a Chrome extension (or a Cortex).

Anyways, I will draw up a very basic diagram in the morning. I hope that helps better clarify it.

boris · March 21, 2021, 6:02am

OK – this is manageable scope.

So if it really is just screenshots and time stamps, we essentially have:

URL
timestamp
screenshot

Request screenshot of URL of current page from an endpoint. If screenshot exists, return it as https://some-ipfs-gateway.io/ipfs/CID

Use Microlink (or Wayback API?) to request a screenshot, push screenshot into IPFS.

Schema something like:

{
  "url": "https://example.com",
  "screenshot": "CID",
  "timestamp": "some-time-format"
}

Might want a few other things while we’re grabbing this – title, opengraph, etc. if available?

{
    "opengraph": { "opengraphobject": "goes here" }
}

Linked list of versions?

{
    "lastversion": "CID"
}
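
A rough sketch of the linked-list idea, assuming each record stores the key of the previous one in "lastversion". The dictionary stands in for IPFS, the SHA-256 of the canonical JSON stands in for a real CID, and all helper names are hypothetical:

```python
import hashlib
import json

# In-memory stand-in for IPFS: maps a content hash to the stored record.
store = {}


def put(record):
    """Store a record under the SHA-256 of its canonical JSON (a placeholder for a real CID)."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    key = hashlib.sha256(encoded).hexdigest()
    store[key] = record
    return key


def add_version(url, screenshot, timestamp, lastversion=None):
    """Create a record that points at the previous version and return its key."""
    return put({
        "url": url,
        "screenshot": screenshot,
        "timestamp": timestamp,
        "lastversion": lastversion,
    })


def history(head, limit):
    """Follow lastversion links from the newest record, returning at most `limit` records."""
    records = []
    key = head
    while key is not None and len(records) < limit:
        record = store[key]
        records.append(record)
        key = record["lastversion"]
    return records


first = add_version("https://example.com", "CID-1", "2019-09-23T00:00:00Z")
second = add_version("https://example.com", "CID-2", "2021-03-20T00:00:00Z", lastversion=first)
```

Calling `history(second, 5)` walks from the newest record back to the oldest. Because each record is immutable, a new version never rewrites old ones; only the pointer to the current head has to change.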

Gimme time travel! This would be a different query request – “show me last N screenshots”. The Memento Protocol may be relevant?

{
    "versions": {
        "20210320": { "screenshot": "CID", "metadata": "CID" },
        "20190923": { "screenshot": "CID", "metadata": "CID" }
    }
}
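
The “show me last N screenshots” query could look something like this, assuming the versions index is a mapping keyed by YYYYMMDD date strings. The function name `last_n` is made up for illustration:

```python
def last_n(versions, n):
    """Return the `n` most recent (date, entry) pairs; YYYYMMDD keys sort correctly as strings."""
    newest_first = sorted(versions, reverse=True)
    return [(date, versions[date]) for date in newest_first[:n]]


versions = {
    "20190923": {"screenshot": "CID", "metadata": "CID"},
    "20210320": {"screenshot": "CID", "metadata": "CID"},
}

latest = last_n(versions, 1)
```

Sorting the keys as plain strings only works because the date format is fixed-width and most-significant-first; a different timestamp format would need real date parsing.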

Depending on how the request is formatted, either just return the screenshot as a gateway URL as listed, or as a JSON response.

Doing this “natively” within IPFS / IPLD?

Can we handle discovery? This is the part that is breaking my brain: whether it can be done without a centralized API. Right now I don’t think it can.

Maybe something with IPNS? Like /ipns/{URL}.webnativescreenshots.com – yeah, this could work maybe! Each URL gets hashed to the CID that represents it, and becomes a subdomain of the app we run.
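
One wrinkle worth sketching: a raw URL cannot be used as a subdomain, and a DNS label is limited to 63 characters, so a hex SHA-256 digest (64 characters) is one too long. The sketch below assumes lowercase base32 of the SHA-256 digest instead, which comes out at 52 characters. The function names are hypothetical, and the domain is the one floated above, not a live service:

```python
import base64
import hashlib


def subdomain_label(url):
    """Encode the SHA-256 of a URL as lowercase, unpadded base32 (52 characters)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def ipns_path(url, domain="webnativescreenshots.com"):
    """Build an /ipns/ path for a URL under the hypothetical app domain."""
    return "/ipns/" + subdomain_label(url) + "." + domain


example_path = ipns_path("https://example.com")
```

Whoever runs the domain would still have to publish a record for each label, so this moves the central piece into DNS rather than removing it.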

This needs some more tinkering, but maybe an interesting direction. Then, even without a central server, there might be a “default” current-as-possible screenshot available on IPFS.

FYI we did some earlier research on multiple image sizes / thumbnails by abusing some of the properties of IPFS / IPLD, see Basquiat, an IPFS-ready image resizing tool

Wayback API

https://archive.org/help/wayback_api.php

Maybe we can use the Wayback API in tandem with Microlink to bootstrap this?

e.g. see if any screenshots are available, call it in parallel with microlink to get a screenshot now, and then return both the now screenshot and the last N wayback screenshots
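
A sketch of that parallel flow, assuming two injected fetcher functions rather than real HTTP calls. `fetch_now` and `fetch_archived` are hypothetical stand-ins for the Microlink and Wayback requests, and the stubs only exist so the example runs offline:

```python
from concurrent.futures import ThreadPoolExecutor


def gather_screenshots(url, fetch_now, fetch_archived, limit=3):
    """Run both lookups concurrently; return the fresh screenshot plus up to `limit` archived ones."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        now_future = pool.submit(fetch_now, url)
        archived_future = pool.submit(fetch_archived, url, limit)
        return {
            "url": url,
            "now": now_future.result(),
            "archived": archived_future.result()[:limit],
        }


# Stub fetchers; real ones would call Microlink and the Wayback API.
def fake_now(url):
    return {"screenshot": "CID-now", "timestamp": "20210321"}


def fake_archived(url, limit):
    return [
        {"screenshot": "CID-a", "timestamp": "20210320"},
        {"screenshot": "CID-b", "timestamp": "20190923"},
    ][:limit]


result = gather_screenshots("https://example.com", fake_now, fake_archived, limit=1)
```

Passing the fetchers in as arguments keeps the orchestration testable without network access and leaves the choice of screenshot provider open.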

Write these results to IPFS. We’re essentially building a cache of parts of Wayback, plus adding to it (and I think requesting a URL means that Wayback screenshots it again?)

That Wayback page also led me to the Time Travel / Memento API

About the Memento Project – TLDR access dated versions of websites through a Browser extension
Memento Guide: Introduction (2015)
RFC HTTP Framework for Time-Based Access to Resource States -- Memento

dylan · March 21, 2021, 3:16pm

This is a much more manageable scope. Running through a centralized endpoint might make the most sense for now, so all of the metadata resources are in one place. We could even use Fission versioning, which would make it easier to incorporate data already on the Fission/IPFS network.

Re Memento/Internet Archive: we might not even need it for adding to the registry of screenshots; we could just use that data to scrape old screenshots for metadata and add those to our registry. Just an idea, but I think that might make more sense.

I will still draw up a simple diagram of the processes, but I think we are getting closer to the initial scope

expede · March 21, 2021, 9:31pm

If you’re interested, I’d be happy to get on a call and rubber duck the design for something like this. This is fully in the “knowledge management” design space.

Yeah, this was my reaction as well

Totally an interesting project. From what I can tell it’s ambitious, but that also means that it could soak up a ton of your time that you may want to use shipping. Here be dragons: a warning in advance that this road is littered with the corpses of thousands of developer hours and side projects that have come before you. Which doesn’t mean that it’s not worth doing; just that there are way more sharp edges here than at first blush. It’s “developer-nip”, because we know that there’s something there, but also keep crashing into the fact that this is an extremely deep domain to model.

Big chunks of this push towards semantic web territory, which is great for certain kinds of projects, but is unlikely to be the substrate for wide developer use, so I have open questions about what this needs to support as a first step.

If I’m understanding correctly, one of the biggest factors in the approach is whether you want top-down or bottom-up knowledge management. How clean does the data model need to be? Does everyone need to be interoperable? What are the specific use cases? Can you build a really tight MVP, learn from it, and grow organically, or does the design need to be imposed from the beginning?

Genuine question: which parts of this would require IPFS? The internet itself is decentralized, and de-risking the project as much as possible by using RESTful endpoints is something to consider (unless you need content addressing). How can you convince others to adopt your protocol? What problems does it solve for them? Where on the generality-to-power spectrum does it fall (the more general, the less powerful, and vice versa)? How can you make it easy to adopt: what’s the shortest conceptual leap, the most automated way to use it? There’s roughly infinite depth here, so are you hoping to extract this out of Cortex for others to use (i.e. build for your use case and see if others can use this, like how Rails got started), or does Cortex only succeed if you get massive adoption?

GraphQL + Next.JS for SSR?

That’s at a different layer, I wouldn’t worry about this at all to start

Yeah, we have all of that today in WNFS, though it would be at a lower level (“implementation detail”) for your proposed system, but would give you a speed boost not having to learn, design, and implement all of that first

For all of the above, I can point you at resources for persistent data structures, temporal structures, category theory (it’s a stereotype, but actually relevant here), traversal algorithms, and so on. Also, picking up some lean development methodologies would be helpful. But before doing a really nontrivial amount of reading, circle back to what Boris said above: really think about your use cases, the what rather than the how, and how you can start for your own use case, iterate, learn from, and so on.

Anyhow, always happy to help! Feel free to book some time in when you get further along and want to riff on design space

expede · March 21, 2021, 9:32pm

(Also probably worth noting that LAMP is a widely used term in industry: LAMP (software bundle) - Wikipedia)

expede · March 21, 2021, 9:37pm

Also not sure if you’ve seen Underlay (IIRC, coming out of MIT), but feels similar to what you’re describing. Again, need to know more about your specific use cases, but could be a good starting point to help you think through various tradeoffs:

https://www.underlay.org/

dylan · March 21, 2021, 9:50pm

Haha — I understand. I had a call with Joel of Underlay about a month ago. They’re still in RFC stage and it seems they’re taking it in a different direction right now, although when I first heard about the project I also wanted to find a way to integrate.

Would love to hop on a call and rubber duck; it will be easier for me to respond to all these points in one place.