Notes on Data Interoperability and Schemas

A continuation of “Where does data live on your filesystem?”
Inspired by the Cambria talk “Project Cambria Overview” with Geoffrey Litt and Peter van Hardenberg

Goal:
Make it easy to use different apps with the same data.

Webnative SDK

To encourage data reuse in apps, we figured the webnative SDK would benefit from common paths such as:

const playlistPath = webnative.common.audio.music.playlists(
  "Feel Good Songs.m3u"
)
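A minimal sketch of how such a tree of common paths could be built, assuming each level is a function that turns a file name into a full path and exposes its subdirectories as properties. The `node` helper, the `common` object and the `/public/Audio/Music/Playlists` layout are hypothetical, not part of the actual webnative SDK.

```javascript
// Hypothetical sketch, not the real webnative API.
// A directory node: calling it with a file name returns the full path,
// and its properties are the subdirectories declared in `children`.
function node(segments, children = {}) {
  const pathTo = (fileName) => "/" + [...segments, fileName].join("/");
  for (const [key, [dirName, grandChildren]] of Object.entries(children)) {
    pathTo[key] = node([...segments, dirName], grandChildren);
  }
  return pathTo;
}

// Hypothetical layout for shared audio data.
const common = node(["public"], {
  audio: ["Audio", {
    music: ["Music", {
      playlists: ["Playlists", {}],
    }],
  }],
});

const playlistPath = common.audio.music.playlists("Feel Good Songs.m3u");
console.log(playlistPath); // → /public/Audio/Music/Playlists/Feel Good Songs.m3u
```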

Following that reasoning, would it make sense to have the schemas in the SDK as well?

webnative.common.audio.music.playlists.schema

Or does our original idea make more sense: using the schema of a specific app?

/public/Apps/Published/APP_NAME.fission.app/schemas/playlist.json

I’m not sure. I think we should support this as well, since we can’t possibly add all schemas to the SDK.
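One way to support both options: look the schema up in a registry bundled with the SDK first, and fall back to the schema an app publishes under the path convention above. This is a sketch under assumptions; `bundledSchemas`, the data-type keys and `resolveSchema` are hypothetical names.

```javascript
// Hypothetical registry of common schemas shipped with the SDK.
const bundledSchemas = new Map([
  ["audio/music/playlist", { title: "Playlist", type: "object" }],
]);

// Path convention for schemas published by an individual app.
function appSchemaPath(appName, schemaName) {
  return `/public/Apps/Published/${appName}.fission.app/schemas/${schemaName}.json`;
}

// Prefer the SDK's copy; otherwise point at the app's own published schema,
// since the SDK can't ship every schema.
function resolveSchema(dataType, appName, schemaName) {
  if (bundledSchemas.has(dataType)) {
    return { source: "sdk", schema: bundledSchemas.get(dataType) };
  }
  return { source: "app", path: appSchemaPath(appName, schemaName) };
}

console.log(resolveSchema("audio/music/playlist", "music-player", "playlist").source); // → sdk
console.log(resolveSchema("audio/podcasts/episode", "podcaster", "episode").path);
// → /public/Apps/Published/podcaster.fission.app/schemas/episode.json
```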

Changes to the schema

We can’t assume the schema never changes, so we’d have to account for schema changes somehow. We could do that using Cambria lenses (inkandswitch/cambria on GitHub: “Schema evolution with bi-directional lenses”).

The tricky thing here is: where do we store these lenses? Different applications will have different needs, and thus different lenses. What if we stored the lenses, or JSON patches, alongside the data?

That way, apps can revert the data to the common schema, or perhaps to a common lens (I think Cambria’s lens graphs make this possible), apply their own lenses, and then store the data along with the lenses they used.
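A minimal sketch of storing lenses alongside the data, assuming a lens is a list of reversible steps. Only a `rename` step is implemented, and `applyLens`, `reverseLens` and `readAs` are hypothetical helpers written for illustration, not Cambria’s API. App B reverts the data to the common schema by undoing app A’s stored lenses, applies its own lens, and keeps the lens it used next to the result.

```javascript
// Apply each step of a lens to a shallow copy of the document.
// Only "rename" is supported in this sketch.
function applyLens(doc, lens) {
  const out = { ...doc };
  for (const step of lens) {
    if (step.op === "rename" && Object.prototype.hasOwnProperty.call(out, step.from)) {
      out[step.to] = out[step.from];
      delete out[step.from];
    }
  }
  return out;
}

// Reverse a lens: swap each rename and undo the steps in opposite order.
function reverseLens(lens) {
  return [...lens].reverse().map((step) => ({ ...step, from: step.to, to: step.from }));
}

// Revert stored data to the common schema, then apply this app's lenses.
function readAs(envelope, appLenses) {
  let doc = envelope.data;
  for (const lens of [...envelope.lenses].reverse()) {
    doc = applyLens(doc, reverseLens(lens));
  }
  for (const lens of appLenses) {
    doc = applyLens(doc, lens);
  }
  return { lenses: appLenses, data: doc };
}

// Common schema: { title, tracks }. App A renames "title" to "name";
// app B prefers "label".
const storedByA = {
  lenses: [[{ op: "rename", from: "title", to: "name" }]],
  data: { name: "Feel Good Songs", tracks: ["a.mp3"] },
};
const storedByB = readAs(storedByA, [[{ op: "rename", from: "title", to: "label" }]]);
console.log(storedByB.data.label); // → Feel Good Songs
```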


This is a recurring topic in remoteStorage as well:

My way of doing it would be:

  1. assume that the notion of schema is not static or cannot be “common”, that there will always be inconsistencies (between multiple apps, between different versions of the same app, between representational differences like JSON vs RDF)
  2. that those inconsistencies extend to “common paths” too
  3. apps can declare lenses for translating their custom formats and custom paths to more common ones (there does not need to be consensus on which are the ‘common ones’), perhaps as part of manifest.json
  4. use something like Cambria to re-interpret paths and formats at runtime
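A sketch of point 3, assuming a hypothetical `lenses` field in manifest.json that maps an app’s custom path prefix to a more common one and lists rename steps for its format. `toCommon` is an illustrative helper, not a remoteStorage or Cambria API; point 4 suggests something like Cambria for the format part in practice.

```javascript
// Hypothetical manifest.json contents; the "lenses" field is an assumption.
const manifest = {
  name: "my-player",
  lenses: [
    {
      path: { from: "/my-player/lists/", to: "/music/playlists/" },
      format: [{ op: "rename", from: "songs", to: "tracks" }],
    },
  ],
};

// Re-interpret an app-specific path and document as a more common one at runtime.
function toCommon(manifest, path, doc) {
  for (const lens of manifest.lenses) {
    if (!path.startsWith(lens.path.from)) continue;
    const commonPath = lens.path.to + path.slice(lens.path.from.length);
    const commonDoc = { ...doc };
    for (const step of lens.format) {
      if (step.op === "rename" && Object.prototype.hasOwnProperty.call(commonDoc, step.from)) {
        commonDoc[step.to] = commonDoc[step.from];
        delete commonDoc[step.from];
      }
    }
    return { path: commonPath, doc: commonDoc };
  }
  return { path, doc }; // no lens declared for this path: leave it as-is
}

const result = toCommon(manifest, "/my-player/lists/chill.json", { title: "Chill", songs: ["a.mp3"] });
console.log(result.path); // → /music/playlists/chill.json
```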

More discussion from Geoffrey Litt in “Bring Your Own Client”:

  • Schema compatibility: do all the editors need to agree on a single rigidly specified format? If there are reconcilable differences between formats, can we build “live converters” that convert between them on every change? (Essentially, imagine collaborating between Pages and Microsoft Word, running a file export in both directions on every keystroke from either app) This problem is closely related to the problem of schema versioning within a single editor, but BYOC can complicate things much further.
  • Preserving intent: the decoupling of git + text editors has a downside: the text format fails to capture the intent of edits, so git can’t be very smart about merging conflicts. Is this something fundamental to decoupling editors from collaboration? Or are there ways to design APIs that preserve intent better, while also supporting an open client ecosystem? (It seems like deciding on how you store your data in a CRDT is the key question here?)
  • Additional editor-specific metadata: Some editors need to store additional data that isn’t part of the “core data model.” Eg, Sublime Text stores my .sublime-workspace file alongside the code source. How does this work smoothly without polluting the data being used by other editors?
  • Code distribution: Traditionally code distribution happens through centralized means, but could code be distributed in a decentralized way alongside documents? If we’re collaborating together in a doc, can I directly share a little editor widget/tool that I’m using, without needing to send you a Github link? This might be overcomplicating things / orthogonal to the general idea here… (This idea inspired by Webstrates, linked below)
  • Innovation: Unfortunately stable open formats can limit product innovation—eg, email clients are tied down by adherence to the email standard. Can we mitigate that effect? I think web browsers have struck a good balance between progress and openness, despite frustrations in both directions.

Some random thoughts:

  • We should add the paths to the app’s JSON schemas to its manifest file
  • It should be easy to fork or reuse a schema from an app
  • We need better tools for working with JSON schemas (such as https://www.jsonschema.net/)
  • Would a “scripting app” be useful? That is, a Fission app that loads a user’s filesystem, lets the user write a script to run against it, and saves the result back to the filesystem. Could be useful for data processing?
  • Users should be able to customise where an app looks for data (effectively changing its permissions). This could be done by saving a “permissions config file” in the app data folder.
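A sketch of the first and last points together, assuming a hypothetical manifest that lists the app’s schema files and default data paths, and a hypothetical user-saved config that overrides where the app looks for data. None of these field names come from an existing Fission manifest format.

```javascript
// Hypothetical app manifest listing schema files and default data locations.
const manifest = {
  appName: "music-player",
  schemas: { playlist: "schemas/playlist.json" },
  dataPaths: { playlist: "/private/Audio/Music/Playlists/" },
};

// Hypothetical "permissions config file" saved by the user in the app data folder.
const userConfig = {
  dataPaths: { playlist: "/private/Shared/Playlists/" },
};

// Where the app's published schema for a data type lives.
function schemaPathFor(type, manifest) {
  return `/public/Apps/Published/${manifest.appName}.fission.app/${manifest.schemas[type]}`;
}

// The user's override wins; otherwise use the app's default location.
function dataPathFor(type, manifest, userConfig = {}) {
  const overrides = userConfig.dataPaths || {};
  return overrides[type] ?? manifest.dataPaths[type];
}

console.log(schemaPathFor("playlist", manifest));
// → /public/Apps/Published/music-player.fission.app/schemas/playlist.json
console.log(dataPathFor("playlist", manifest, userConfig)); // → /private/Shared/Playlists/
```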

Yes! I want this. I’d love an app that takes some sort of declarative dependency file and makes sure everything in the filesystem is “up to date”, something like Makefiles, justfiles, Shake, etc.
There’s also some interesting research on these things by the people behind Shake: https://dl.acm.org/doi/pdf/10.1145/3236774
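A minimal sketch of the “up to date” idea using make’s timestamp rule: an output is rebuilt when it is missing or older than any of its inputs. The in-memory `files` map, the rule format and `bringUpToDate` are hypothetical stand-ins for the filesystem; Shake and the linked paper go well beyond this simple timestamp check.

```javascript
// In-memory stand-in for the filesystem: path -> { content, mtime }.
const files = new Map();
let clock = 0;
function write(path, content) {
  files.set(path, { content, mtime: ++clock });
}

// make's rule: an output is stale if it is missing or older than an input.
function isStale(rule) {
  for (const path of rule.inputs) {
    if (!files.has(path)) throw new Error(`missing input: ${path}`);
  }
  const output = files.get(rule.output);
  return !output || rule.inputs.some((path) => files.get(path).mtime > output.mtime);
}

// Rebuild stale outputs; rules are assumed to be listed in dependency order.
function bringUpToDate(rules) {
  const rebuilt = [];
  for (const rule of rules) {
    if (isStale(rule)) {
      const inputs = rule.inputs.map((path) => files.get(path).content);
      write(rule.output, rule.build(...inputs));
      rebuilt.push(rule.output);
    }
  }
  return rebuilt;
}

// Example: merge two JSON playlists into one .m3u file.
write("/playlists/a.json", JSON.stringify(["x.mp3"]));
write("/playlists/b.json", JSON.stringify(["y.mp3"]));
const rules = [{
  output: "/playlists/all.m3u",
  inputs: ["/playlists/a.json", "/playlists/b.json"],
  build: (a, b) => [...JSON.parse(a), ...JSON.parse(b)].join("\n"),
}];

console.log(bringUpToDate(rules)); // → [ '/playlists/all.m3u' ]
console.log(bringUpToDate(rules)); // → [] (nothing changed)
```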


Scenario 1

Basics.

  • I start with SCHEMA_0.
  • I make a few changes to my schema (going through SCHEMA_1), which results in SCHEMA_2.
  • Those changes are made through Cambria lenses.
  • When writing data to the filesystem, I store those lenses along with it.
  • Data is written using SCHEMA_2.
  • Reading is done using SCHEMA_2.
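Scenario 1 as a sketch, assuming lenses are lists of rename steps (a tiny stand-in for Cambria lenses, not its API). The two hypothetical lenses take a playlist from SCHEMA_0 through SCHEMA_1 to SCHEMA_2, and the file that gets written keeps both lenses next to the data.

```javascript
// Hypothetical lenses: lens 1 takes SCHEMA_0 to SCHEMA_1, lens 2 takes SCHEMA_1 to SCHEMA_2.
const lens1 = [{ op: "rename", from: "name", to: "title" }];
const lens2 = [{ op: "rename", from: "songs", to: "tracks" }];

// Apply a lens's rename steps to a shallow copy of the document.
function applyLens(doc, lens) {
  const out = { ...doc };
  for (const step of lens) {
    if (step.op === "rename" && Object.prototype.hasOwnProperty.call(out, step.from)) {
      out[step.to] = out[step.from];
      delete out[step.from];
    }
  }
  return out;
}

// Upgrade a SCHEMA_0 document and serialize it together with the lenses used.
function writeWithLenses(doc0, lenses) {
  return JSON.stringify({ lenses, data: lenses.reduce(applyLens, doc0) });
}

// Write: the data ends up in SCHEMA_2, stored with lens 1 and lens 2.
const file = writeWithLenses({ name: "Feel Good Songs", songs: ["a.mp3"] }, [lens1, lens2]);

// Read: an app on SCHEMA_2 can use the data as-is.
const stored = JSON.parse(file);
console.log(stored.data.title, stored.data.tracks); // → Feel Good Songs [ 'a.mp3' ]
```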

Scenario 2

If I happen to have data that was written using SCHEMA_1, how do I get my app to read it?

  • We have one lens alongside our data (lens 1).
  • The app applies the missing lenses to the data (lens 2 in this case).
  • The app is able to read the data.

Unresolved questions:

  • How do we programmatically determine we have to apply “lens 2”?
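One possible answer, sketched under a strong assumption: every version of the app extends the same linear list of lenses, so the number of lenses stored with the data tells us which schema it is in, and the missing lenses are the app’s lenses beyond that count. The lens contents and the `upgrade` helper are hypothetical.

```javascript
// Hypothetical lenses shared by all versions of the app, in order.
const appLenses = [
  [{ op: "rename", from: "name", to: "title" }], // lens 1: SCHEMA_0 -> SCHEMA_1
  [{ op: "rename", from: "songs", to: "tracks" }], // lens 2: SCHEMA_1 -> SCHEMA_2
];

// Apply a lens's rename steps to a shallow copy of the document.
function applyLens(doc, lens) {
  const out = { ...doc };
  for (const step of lens) {
    if (step.op === "rename" && Object.prototype.hasOwnProperty.call(out, step.from)) {
      out[step.to] = out[step.from];
      delete out[step.from];
    }
  }
  return out;
}

// Assumes the stored lens list is a prefix of the app's lens list, so the
// lenses to apply are simply the ones past the stored count.
function upgrade(stored, appLenses) {
  const missing = appLenses.slice(stored.lenses.length);
  return {
    lenses: [...stored.lenses, ...missing],
    data: missing.reduce(applyLens, stored.data),
  };
}

// Data written with SCHEMA_1: one lens alongside it.
const stored = {
  lenses: [appLenses[0]],
  data: { title: "Feel Good Songs", songs: ["a.mp3"] },
};
const upgraded = upgrade(stored, appLenses);
console.log(upgraded.data.tracks); // → [ 'a.mp3' ]
```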

Scenario 3

We saved the data using SCHEMA_2, but we’re using an older version of the app that’s still on SCHEMA_1. How do we read the data?

  • We have two lenses alongside our data.
  • The app reverts the data to SCHEMA_1 by going “backwards” through lens 2.
  • The app is able to read the data.

Unresolved questions:

  • How do we programmatically determine that we’re at lens 1, and that we have to use lens 2 to go backwards?
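Scenario 3 under the same linear-history assumption: the stored lenses beyond the ones the older app knows are reversed and applied newest first. `downgrade`, `reverseLens` and the lens contents are hypothetical, written for illustration rather than taken from Cambria.

```javascript
// Hypothetical lenses: lens 1 takes SCHEMA_0 to SCHEMA_1, lens 2 takes SCHEMA_1 to SCHEMA_2.
const lens1 = [{ op: "rename", from: "name", to: "title" }];
const lens2 = [{ op: "rename", from: "songs", to: "tracks" }];

// Apply a lens's rename steps to a shallow copy of the document.
function applyLens(doc, lens) {
  const out = { ...doc };
  for (const step of lens) {
    if (step.op === "rename" && Object.prototype.hasOwnProperty.call(out, step.from)) {
      out[step.to] = out[step.from];
      delete out[step.from];
    }
  }
  return out;
}

// Reverse a lens: swap each rename and undo the steps in opposite order.
function reverseLens(lens) {
  return [...lens].reverse().map((step) => ({ ...step, from: step.to, to: step.from }));
}

// The older app only knows its first appLenses.length lenses; undo the extra
// stored lenses, newest first. Assumes a linear history.
function downgrade(stored, appLenses) {
  const extra = stored.lenses.slice(appLenses.length);
  const data = [...extra].reverse().map(reverseLens).reduce(applyLens, stored.data);
  return { lenses: stored.lenses.slice(0, appLenses.length), data };
}

const stored = { lenses: [lens1, lens2], data: { title: "Feel Good Songs", tracks: ["a.mp3"] } };
const forOldApp = downgrade(stored, [lens1]); // older app still on SCHEMA_1
console.log(forOldApp.data.songs); // → [ 'a.mp3' ]
```

Renames reverse cleanly; lenses that drop data would need extra care to go backwards.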

Scenario 4

Another app decides to use our SCHEMA_2 as its SCHEMA_0. How do we keep our data compatible?

🤷‍♂️

I think all of the unresolved questions are part of what Cambria claims to do automatically: determining which lenses to apply, and when, so that the consumer doesn’t have to. If I remember correctly, in their model you store a ‘schema version number’ with the data and pass it to Cambria to help determine which lenses to apply.

I would think so too, but I can’t figure out if the library actually does that (yet).
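The schema-version-number idea can be sketched independently of the library: store a schema ID with the data, treat each lens as an edge between two schema versions that can be walked in either direction, and search for the shortest chain of lenses from the stored version to the one the app wants. This is a from-scratch illustration of the lens-graph idea, not Cambria’s actual API; the `playlist@N` IDs and all helper names are hypothetical.

```javascript
// Hypothetical lens graph: each edge links two schema versions.
const edges = [
  { from: "playlist@0", to: "playlist@1", lens: [{ op: "rename", from: "name", to: "title" }] },
  { from: "playlist@1", to: "playlist@2", lens: [{ op: "rename", from: "songs", to: "tracks" }] },
];

// Apply a lens's rename steps to a shallow copy of the document.
function applyLens(doc, lens) {
  const out = { ...doc };
  for (const step of lens) {
    if (step.op === "rename" && Object.prototype.hasOwnProperty.call(out, step.from)) {
      out[step.to] = out[step.from];
      delete out[step.from];
    }
  }
  return out;
}

// Reverse a lens: swap each rename and undo the steps in opposite order.
function reverseLens(lens) {
  return [...lens].reverse().map((step) => ({ ...step, from: step.to, to: step.from }));
}

// Breadth-first search for the shortest chain of lenses from start to goal.
function lensPath(edges, start, goal) {
  const adjacent = new Map();
  const link = (a, b, lens) => {
    if (!adjacent.has(a)) adjacent.set(a, []);
    adjacent.get(a).push({ next: b, lens });
  };
  for (const e of edges) {
    link(e.from, e.to, e.lens); // walk forwards
    link(e.to, e.from, reverseLens(e.lens)); // walk backwards
  }
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === goal) break;
    for (const { next, lens } of adjacent.get(node) || []) {
      if (!previous.has(next)) {
        previous.set(next, { node, lens });
        queue.push(next);
      }
    }
  }
  if (!previous.has(goal)) throw new Error(`no lens path from ${start} to ${goal}`);
  // Follow the back-pointers from goal to start, collecting lenses in order.
  const lenses = [];
  for (let at = goal; previous.get(at) !== null; at = previous.get(at).node) {
    lenses.unshift(previous.get(at).lens);
  }
  return lenses;
}

// Convert a stored { schema, data } envelope to the schema the app expects.
function convert(envelope, targetSchema) {
  const lenses = lensPath(edges, envelope.schema, targetSchema);
  return { schema: targetSchema, data: lenses.reduce(applyLens, envelope.data) };
}

const old = { schema: "playlist@0", data: { name: "Feel Good Songs", songs: ["a.mp3"] } };
console.log(convert(old, "playlist@2").data.title); // → Feel Good Songs
```

Because every edge can be walked backwards, the same search covers both Scenario 2 (upgrading) and Scenario 3 (downgrading).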