libipld: Block API

There is a new Block API in the JavaScript IPLD implementation, which I find quite nice.

I was thinking about using a similar one for Rust IPLD. I’m still implementing/playing around with the idea, but I thought I’d publish a draft early on:

Block

A Block is an IPLD object together with a CID. The data can be encoded and decoded.

All operations are cached. This means that encoding, decoding and CID calculation each happen at most once. All subsequent calls will use a cached version.

Methods

`impl<'a> Block<'a>`

`pub fn new<R>(cid: Cid, raw: Vec<u8>) -> Self where R: Registry`

Create a new Block from the given CID and raw binary data.

It needs a registry that contains codec and hash algorithm implementations in order to be able to decode the data into IPLD.
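To make the registry idea a bit more concrete, here is a minimal sketch of what `R: Registry` could look like. Everything in it is an assumption for illustration: the `Registry` trait shape, `DefaultRegistry`, the toy `Ipld`, `Cid` and `Error` types, the `decode_with` helper and the `TOY_TEXT` code are all made up. Only `0x55` is the real multicodec code for raw bytes.

```rust
/// Toy stand-in for an IPLD value; not libipld's real `Ipld` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Ipld {
    Bytes(Vec<u8>),
    String(String),
}

/// Toy stand-in for a CID: only the codec code matters in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Cid {
    pub codec: u64,
    pub digest: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum Error {
    UnknownCodec(u64),
    InvalidData,
}

/// Hypothetical registry trait: resolves a codec code to a decoder.
pub trait Registry {
    fn decode(codec: u64, raw: &[u8]) -> Result<Ipld, Error>;
}

/// Multicodec code for raw bytes.
pub const RAW: u64 = 0x55;
/// Made-up code for a UTF-8 text codec, for this example only.
pub const TOY_TEXT: u64 = 0x30_0001;

pub struct DefaultRegistry;

impl Registry for DefaultRegistry {
    fn decode(codec: u64, raw: &[u8]) -> Result<Ipld, Error> {
        match codec {
            RAW => Ok(Ipld::Bytes(raw.to_vec())),
            TOY_TEXT => String::from_utf8(raw.to_vec())
                .map(Ipld::String)
                .map_err(|_| Error::InvalidData),
            other => Err(Error::UnknownCodec(other)),
        }
    }
}

/// Same shape as `Block::new::<R>(cid, raw)`: the registry is a type
/// parameter, and the codec is looked up from the CID.
pub fn decode_with<R: Registry>(cid: &Cid, raw: &[u8]) -> Result<Ipld, Error> {
    R::decode(cid.codec, raw)
}
```

With this shape, `decode_with::<DefaultRegistry>(&cid, &raw)` picks the decoder from the codec code stored in the CID, and a code the registry doesn't know ends up as `Error::UnknownCodec`.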

`pub fn encoder(node: Ipld, codec: &'a dyn Codec<Error = Error>, hash_alg: &'a dyn Hash) -> Self`

Create a new Block from the given IPLD object, codec and hash algorithm.

No computation is done up front: the CID creation and the encoding are only performed when the corresponding methods are called.

`pub fn decoder(raw: Vec<u8>, codec: &'a dyn Codec<Error = Error>, hash_alg: &'a dyn Hash) -> Self`

Create a new Block from encoded data, codec and hash algorithm.

No computation is done up front: the CID creation and the decoding are only performed when the corresponding methods are called.

`pub fn decode(&mut self) -> Ipld`

Decode the Block into an IPLD object.

The actual decoding is only performed if the Block doesn’t hold a copy of the IPLD object yet. If this method was called before, it returns the cached result.

`pub fn encode(&mut self) -> Vec<u8>`

Encode the Block into raw binary data.

The actual encoding is only performed if the Block doesn’t hold a copy of the encoded raw binary data yet. If this method was called before, it returns the cached result.

`pub fn cid(&mut self) -> Cid`

Calculate the CID of the Block.

The CID is calculated from the encoded data. If it wasn’t encoded before, that operation will be performed. If the encoded data is already available, from a previous call of encode() or because the Block was instantiated from raw data via new() or decoder(), then it isn’t re-encoded.
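The caching behaviour described for `encode()`, `decode()` and `cid()` could be sketched roughly like this. It is a toy under loud assumptions: the codec and hash algorithm parameters are left out, `Ipld` is just a string, "encoding" is plain UTF-8, and the `Cid` is a 64-bit `DefaultHasher` digest rather than a real multihash. The `encodes`, `decodes` and `hashes` counters are not part of the proposal; they only exist so the "at most once" property can be observed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy IPLD value (a single string); not libipld's real `Ipld` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Ipld(pub String);

/// Toy CID: a 64-bit digest of the encoded bytes, not a real multihash.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cid(pub u64);

#[derive(Default)]
pub struct Block {
    node: Option<Ipld>,
    raw: Option<Vec<u8>>,
    cid: Option<Cid>,
    // Counters for how often real work was done (illustration only).
    pub encodes: usize,
    pub decodes: usize,
    pub hashes: usize,
}

impl Block {
    /// Start from an IPLD object; nothing is computed yet.
    pub fn encoder(node: Ipld) -> Self {
        Block { node: Some(node), ..Default::default() }
    }

    /// Start from encoded bytes; nothing is computed yet.
    pub fn decoder(raw: Vec<u8>) -> Self {
        Block { raw: Some(raw), ..Default::default() }
    }

    /// Encode on first use, then return the cached bytes.
    pub fn encode(&mut self) -> Vec<u8> {
        if self.raw.is_none() {
            let node = self.node.as_ref().expect("a block holds a node or raw bytes");
            let bytes = node.0.as_bytes().to_vec();
            self.raw = Some(bytes);
            self.encodes += 1;
        }
        self.raw.clone().unwrap()
    }

    /// Decode on first use, then return the cached object.
    pub fn decode(&mut self) -> Ipld {
        if self.node.is_none() {
            let raw = self.raw.as_ref().expect("a block holds a node or raw bytes");
            let node = Ipld(String::from_utf8_lossy(raw).into_owned());
            self.node = Some(node);
            self.decodes += 1;
        }
        self.node.clone().unwrap()
    }

    /// Hash the encoded bytes on first use; encodes first if needed.
    pub fn cid(&mut self) -> Cid {
        if let Some(cid) = self.cid {
            return cid;
        }
        let raw = self.encode();
        let mut hasher = DefaultHasher::new();
        hasher.write(&raw);
        let cid = Cid(hasher.finish());
        self.hashes += 1;
        self.cid = Some(cid);
        cid
    }
}
```

In this sketch, calling `cid()` twice on a block built with `Block::encoder(..)` encodes and hashes exactly once, while a block built with `Block::decoder(..)` never runs the encoder at all because the raw bytes are already there.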

About this issue

State: closed
Created 4 years ago
Comments: 30 (16 by maintainers)

Most upvoted comments

Expect an update to this proposal, this week.

Next week 😕

vmx on Apr 9, 2020

Also reading the proposal again, the concept of a registry is weird in a compiled language. What’s wrong with something like this:

```rust
match codec {
    #[cfg(feature = "cbor")]
    cbor => cbor.encode()
}
```
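Spelled out a bit more, such a compile-time dispatch might look like the sketch below. The `Codec` enum, the `encode` function and the `cbor_text` helper are hypothetical names for illustration, and the `cbor` Cargo feature is assumed to be declared by the crate. The CBOR part is a hand-rolled stand-in, not a call into any real CBOR library.

```rust
/// Hypothetical codec selector. A variant only exists when its
/// feature is enabled, so there is nothing to look up at runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Codec {
    Raw,
    #[cfg(feature = "cbor")]
    Cbor,
}

/// Encode a string with the selected codec. The match stays
/// exhaustive whether or not the `cbor` feature is compiled in.
pub fn encode(codec: Codec, text: &str) -> Vec<u8> {
    match codec {
        Codec::Raw => text.as_bytes().to_vec(),
        #[cfg(feature = "cbor")]
        Codec::Cbor => cbor_text(text),
    }
}

#[cfg(feature = "cbor")]
fn cbor_text(text: &str) -> Vec<u8> {
    // CBOR text string: major type 3 with a 4-byte big-endian length (0x7a).
    // Valid CBOR, though not the shortest possible encoding.
    let mut out = vec![0x7a];
    out.extend_from_slice(&(text.len() as u32).to_be_bytes());
    out.extend_from_slice(text.as_bytes());
    out
}
```

Without the feature, `Codec::Cbor` simply doesn't exist, so asking for an unsupported codec is a compile error rather than a failed registry lookup.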

dvc94ch on Apr 4, 2020

Not an expert but some high level thoughts: it seems like by implementing a set of independent traits one can “eat your cake and have it too” while both keeping the code idiomatic to Rust and also achieving the API signature that you want / are used to.

I think some careful thought about how to design the individual traits, and as much helpful context as we can get, is gonna be the trick here. This might also be something one could reason about and experiment with either in the tests or in the examples folder, since it’s hard to design all this abstractly without building toward a use case or two, no matter how contrived and simple.

aphelionz on Mar 6, 2020

One last and rather important point I somehow forgot. Part of the value of IPLD is to be codec agnostic. That flexibility is lost if you build on codec specific interfaces and a lot of thought went into how the Block API could maintain that flexibility throughout the stack. It’s probably worth pulling in @Gozala since he provided a lot of the insight here when we were designing the new JS Block interface.

mikeal on Mar 5, 2020

let next = store.read_cbor::<UserType>(&cid).await?;

I was wondering what store is about and then found it in an older comment from @dvc94ch that I missed replying to.

I’m also not really sure that a block as a (cid, bytes) tuple provides a meaningful abstraction, as I tried various Block structs and then decided that having a store trait was the more sensible approach. The Block struct was only getting in the way and not really adding anything.

The current direction of IPLD implementations is to move away from having block storage as a central piece. The data should be able to come from anywhere, memory, network or disk. Hence the idea of the Block API came up. You get the block from “somewhere” and then work with it.

vmx on Mar 5, 2020

Looking at the proposed structs, I feel like I’m not able to see the use case for keeping all of these components together in a single struct?

I can speak pretty well to how this ends up being used in JavaScript.

Once you start layering different interfaces you end up with a lot of actors doing encoding, decoding, and/or hashing. Combining these into a single interface means that you can provide a common API to pass around that normalizes and caches all of this.

This also means that you can pass a single interface to any actor, regardless of whether or not they want the encoded state, the decoded state, or the content address for linking. All of these states can be generated just-in-time and then cached for future consumers of the block.

mikeal on Mar 4, 2020