Introducing TreeLDR: A Canopy Across Your Data Schema Dreams

TreeLDR is an open-source developer tool with a DSL that makes managing data schemas as easy as defining data structures in your favorite (sane) statically-typed language.

As we discover new ways to let users control their data across the web, we face plenty of hard problems along the way. One challenge we keep encountering is managing data schemas, especially once digital signatures are involved, as with W3C Verifiable Credentials.

  • How can you have a handle on your data when you don’t know how to describe it?
  • How do you go from machine-readable JSON to human-friendly understanding?
  • Which fields are required, and what do they mean? Is that bankId referring to a financial institution or a river bank?

Fortunately, a crop of solutions has emerged over the years for the problem of JSON data schema management, including Semantic Web technologies (JSON-LD and SPARQL), JSON Schema, CouchDB views, and IPLD. The downside is that these tools address different concerns, primarily semantic meaning and validation, and combining them into a complete data schema management system is full of pitfalls and unpaved paths. For example:

  • JSON-LD will add semantic meaning to what a “LeaseAgreement” is in a specific context, but has no straightforward way to enforce that the “startDate” is an ISO 8601 date like “2022-08-16”.
  • JSON Schema can be used to require that “age” is greater than or equal to 21, but cannot explain who or what is being described by the age field in a way understandable by both humans and machines.
  • There is no agreed-upon way to perform wholesale migrations from one schema to the next, or to roll back changes. There are some low-level protocols such as JSON Patch that can serve as building blocks, but how would one automatically transform an OpenBadges V2 credential into an OpenBadges V3 one by configuring a managed migration instead of writing custom software that needs its own deployment pipeline?
  • How would you describe a JSON credential schema that must have been issued (digitally signed) by specific Ethereum or Solana accounts? What if this list of issuers needs to change, or networks need to be added based on different cryptography?
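
The JSON Patch building block mentioned in the list above can be sketched as follows. This is only an illustration of a migration expressed as data, not a TreeLDR feature: the `apply_patch` helper, the field names (`fullName`, `schemaVersion`), and the migration list are all hypothetical, and only a small subset of JSON Patch (RFC 6902) is handled (object members only, no array indices or pointer escapes).

```python
import copy


def _walk(doc, path):
    """Return (parent_dict, final_key) for a JSON Pointer such as "/a/b".

    Simplification: only object members are supported, with no array
    indices and no "~0"/"~1" escapes.
    """
    parts = path.lstrip("/").split("/")
    parent = doc
    for key in parts[:-1]:
        parent = parent[key]
    return parent, parts[-1]


def apply_patch(doc, patch):
    """Apply a list of JSON Patch style operations and return a new document."""
    result = copy.deepcopy(doc)  # never mutate the caller's document
    for op in patch:
        kind = op["op"]
        if kind in ("add", "replace"):
            parent, key = _walk(result, op["path"])
            if kind == "replace" and key not in parent:
                raise KeyError(op["path"])
            parent[key] = op["value"]
        elif kind == "remove":
            parent, key = _walk(result, op["path"])
            del parent[key]
        elif kind == "move":
            src_parent, src_key = _walk(result, op["from"])
            value = src_parent.pop(src_key)
            dst_parent, dst_key = _walk(result, op["path"])
            dst_parent[dst_key] = value
        else:
            raise ValueError(f"unsupported op: {kind}")
    return result


# Hypothetical migration between two versions of a layout (invented fields).
MIGRATION_V1_TO_V2 = [
    {"op": "move", "from": "/fullName", "path": "/name"},
    {"op": "add", "path": "/schemaVersion", "value": 2},
]

old = {"fullName": "Ada Lovelace", "age": 36}
new = apply_patch(old, MIGRATION_V1_TO_V2)
print(new)
```

Because the migration is plain data, it could in principle be stored, versioned, and reviewed alongside the schemas. What such a list cannot express on its own is when it is safe to apply or how to reverse it, which is part of the gap described above.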

We like emergence, and therefore oppose solutions that assume a single entity can efficiently propose, define, and evolve data schemas for all conceivable use cases across disparate verticals. We believe this to be technically infeasible, politically difficult, and also against the tenets of decentralization.

Instead, we much prefer approaches where developers are empowered to self-serve, leveraging their specific domain knowledge to create data schemas that suit their use cases well and, when they need to, can easily collaborate with other developers to reach a rough consensus on what would work for even more implementers.

Introducing TreeLDR (Tree Linked Data Representation)

That’s why we’re happy to introduce TreeLDR, which is an open-source developer tool with a DSL that makes managing data schemas as easy as defining data structures in your favorite (sane) statically-typed language.

TreeLDR provides a single language to define common concepts (types) and shared data representations (layouts) that can then be compiled into a concert of data schema artifacts. It can be used to produce JSON Schemas, JSON-LD contexts, migration strategies, and eventually entire SDKs (with credential issuance and verification) in various target programming languages.

In TreeLDR, you can import not only other TreeLDR definitions but also existing schemas such as JSON-LD contexts or XML XSDs. This way, developers can define data layouts in a familiar way and focus purely on the application they wish to build. Today, it can only output JSON Schemas and JSON-LD contexts, but it’s already usable, and more features are on the way. We felt it was most important to release early and iterate quickly on feedback from implementers, so here it is. We already use it to represent credential schemas using W3C Verifiable Credentials for the Rebase project.

Here's an example of a TreeLDR file that can be compiled into both a JSON-LD context and a JSON Schema:

// Sets the base IRI of the document.
base <https://example.com/>;

// Defines an `xs` prefix for the XML schema datatypes.
use <http://www.w3.org/2001/XMLSchema#> as xs;

/// A person.
type Person {
    /// Full name.
    name: required xs:string,

    /// Parents.
    parent: multiple Person,

    /// Age.
    age: xs:nonNegativeInteger
}

After defining the TreeLDR file, we can run the following to generate a JSON-LD context:

tldrc -i example/xsd.tldr -i example/person.tldr json-ld context https://example.com/Person

This produces the following output:

{
    "name": "https://example.com/Person/name",
    "parent": "https://example.com/Person/parent",
    "age": "https://example.com/Person/age"
}
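
To illustrate what such a context is for, here is a rough sketch of how its term-to-IRI mapping gives short JSON keys an unambiguous meaning. It assumes all three terms resolve under the base IRI `https://example.com/`. The `expand_terms` helper is hypothetical and far simpler than real JSON-LD expansion, which also wraps values, handles keywords such as `@id` and `@type`, and drops unmapped terms (this sketch keeps them unchanged).

```python
# Term-to-IRI mapping, assuming every property lives under the base IRI.
CONTEXT = {
    "name": "https://example.com/Person/name",
    "parent": "https://example.com/Person/parent",
    "age": "https://example.com/Person/age",
}


def expand_terms(node, context):
    """Recursively replace short keys with the IRIs listed in the context.

    Keys that are not in the context are left as they are.
    """
    if isinstance(node, list):
        return [expand_terms(item, context) for item in node]
    if isinstance(node, dict):
        return {
            context.get(key, key): expand_terms(value, context)
            for key, value in node.items()
        }
    return node  # strings, numbers, booleans, None


person = {"name": "Alice", "age": 30, "parent": [{"name": "Bob"}]}
expanded = expand_terms(person, CONTEXT)
print(expanded)
```

After this step, two documents that both use a key called `name` but mean different things can be told apart, because each key has been replaced by the IRI that defines it.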

We can also run the following to generate JSON Schema:

tldrc -i example/xsd.tldr -i example/person.tldr json-schema https://example.com/Person

This produces the following output:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/person.schema.json",
    "description": "Person",
    "type": "object",
    "properties": {
        "name": {
            "description": "Full name",
            "type": "string"
        },
        "parent": {
            "description": "Parents",
            "type": "array",
            "items": {
                "$ref": "https://example.com/person.schema.json"
            }
        },
        "age": {
            "description": "Age",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": [
        "name"
    ]
}
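
As a rough illustration of what this schema enforces, the sketch below checks a few documents against it. The schema is written as a Python dict using the standard `items` keyword. The `validate` function is a hypothetical, hand-rolled checker that covers only the keywords appearing in this schema (`type`, `properties`, `required`, `items`, `minimum`, and a `$ref` back to the root), so it is not a substitute for a real JSON Schema validator.

```python
PERSON_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/person.schema.json",
    "description": "Person",
    "type": "object",
    "properties": {
        "name": {"description": "Full name", "type": "string"},
        "parent": {
            "description": "Parents",
            "type": "array",
            "items": {"$ref": "https://example.com/person.schema.json"},
        },
        "age": {"description": "Age", "type": "integer", "minimum": 0},
    },
    "required": ["name"],
}

PY_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def validate(instance, schema, root=None, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    root = schema if root is None else root
    if "$ref" in schema:
        # Simplification: only a reference back to the root schema is resolved.
        if schema["$ref"] != root.get("$id"):
            raise ValueError("only references to the root schema are supported")
        schema = root
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, PY_TYPES[expected])
        if expected == "integer" and isinstance(instance, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += validate(instance[key], sub, root, f"{path}.{key}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], root, f"{path}[{i}]")
    is_int = isinstance(instance, int) and not isinstance(instance, bool)
    if "minimum" in schema and is_int and instance < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    return errors


good = validate({"name": "Alice", "age": 30, "parent": [{"name": "Bob"}]}, PERSON_SCHEMA)
bad = validate({"age": -1}, PERSON_SCHEMA)
print(good)
print(bad)
```

The recursive `$ref` is what lets each entry in `parent` be checked as a full `Person`, mirroring the `parent: multiple Person` field in the TreeLDR definition.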

What's Next

  • Enable merged representation of JSON-LD contexts and JSON Schema into a single file.
  • Define platform-agnostic and language-agnostic migration formats to upgrade/downgrade data schema versions automatically.
  • Begin integration with cryptoscript and DIDKit to produce developer SDKs that speak W3C Verifiable Credentials, including signing, verifying, and enforcing simple trust frameworks.
  • Investigate support for the IPLD ecosystem, including IPLD Data Schemas and Advanced Data Layouts (ADLs).

TreeLDR is one of many open source tools released by Spruce to help developers tame the complexity of decentralized identity. We are sharing it publicly before an official release to get your feedback, so please send any thoughts on the roadmap, feature requests, and general reactions to hello@spruceid.com or share them in our Discord.

In the meantime, we will dogfood it across our projects, so if this is the kind of thing you'd like to work on, please check out our careers page.

To give TreeLDR a spin yourself, check out the Quickstart in our documentation:

TreeLDR Quickstart - SpruceID

Spruce lets users control their data across the web. Spruce provides an ecosystem of open source tools and products for developers that let users collect their data in one place they control, and show their cards however they want.

If you're curious about integrating Spruce's technology into your project, come chat with us in our Discord: