cosmos-sdk: Protobuf tx UX is bad for multi-chain wallets/block explorers

Problem Definition

When I first ran a testnet for Regen Network, my team and I were able to customize the Lunie wallet and Big Dipper block explorer to support our testnet by just modifying a few parameters in configuration files. In spite of the UX challenges of the amino encoding, this was a fairly good experience.

The current protobuf tx design would make this much harder. Every app will define its own Transaction, Message, and SignDoc types, and client applications will need to run code generation for every new chain they intend to support, as well as write custom code to:

  • marshal module Msgs into the app Message
  • sign transactions using the app-specific SignDoc
  • marshal interface types into app-specific Msgs like MsgSubmitProposal

After having spent just a bit of time actually dogfooding our work, I find these tradeoffs unacceptable.

Proposal

I propose we leverage google.protobuf.Any to solve these issues. This can be done in a way that improves client developer UX without increasing encoding payload size, by separating signing messages from encoding messages.

google.protobuf.Any provides essentially the same UX as amino JSON for these use cases:

// amino json
{"type": "cosmos-sdk/Account", "value": { "address": "cosmos...", ... }}

// google.protobuf.Any
{"@type": "/cosmos_sdk.x.account.v1.Account", "value": { "address": "cosmos...", ... }}

One of the biggest concerns around using Any initially was that it increases stored message size by encoding the long type URL. If, however, we use Any just for signing (and a simplified client UX) and still use oneofs for encoding, we don’t have to accept a poor compromise between developer UX and performance.
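As a rough sketch of that split (all package, message, and field names below are hypothetical, not the actual SDK definitions): the sign doc wraps module messages in Any so that a client only needs the module .proto files, while the chain-specific transaction keeps a compact oneof for encoding.

syntax = "proto3";

package demo.tx.v1;

import "google/protobuf/any.proto";

// Hypothetical module-level messages, stand-ins for e.g. bank.MsgSend.
message MsgSend {
  string from_address = 1;
  string to_address = 2;
  string amount = 3;
}

message MsgVote {
  uint64 proposal_id = 1;
  string voter = 2;
  string option = 3;
}

// Chain-agnostic sign doc: messages are wrapped in Any, so a client only
// needs the module .proto files, not a per-chain master file.
message SignDoc {
  repeated google.protobuf.Any msgs = 1;
  string chain_id = 2;
  uint64 account_number = 3;
  uint64 sequence = 4;
}

// Chain-specific transaction used for encoding/broadcast: the oneof keeps
// the stored payload small, but its shape differs from chain to chain.
message Transaction {
  repeated Message msgs = 1;
}

message Message {
  oneof sum {
    MsgSend send = 1;
    MsgVote vote = 2;
  }
}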

A more detailed proposal will be submitted in the form of a PR.


Most upvoted comments

I just got to catch up on this thread. Here are my thoughts:

First, the framing of the issue is missing a bit. There are a few issues mixed together:

(a) Compatibility of JSON-to-Protobuf encodings, AND
(b) (Re-)Use of Protobuf types

I will focus just on (b). I think we also need a solid cross-platform JSON-to-Protobuf solution, or decide that we do not need to sign this custom “canonical JSON” representation. But that is another issue I think.


First off, amino doesn’t “just work”. If my chain uses a different Account type, all wallets and explorers will fail. If I use the same concrete types for the interfaces, then the binary representations are compatible. That is true. Assuming people develop wallets and explorers for the gaia hub, we get compatibility when:

(a) for wallets (sending tx): the target chain tx/msg types are a superset of the types in the hub
(b) for explorers (parsing tx): the target chain tx/msg types are a subset of the types in the hub

They only work perfectly when you have the same type definitions as the hub. You know the issue where the custom regen governance votes don’t show up in the explorers (as they are not a subset of the gaia/hub types).

This means the two chains have slightly different type definitions for tx/msg (the oneof/interface types), but since there is significant overlap, much of the tooling (targeting that common ground) just works.


Given the above, we get the exact same condition with protobuf oneof. It just takes a little thinking. This is how we did it in weave. In fact, most demos of weave had two chains: one was bnsd, which had the name service, governance, and some more modules; the other was bcpd, which had a much more limited set of actions (bank, multisig, atomic swap, more or less). We wrote client code and wallets that worked identically for both of them (you switch the URL and the code just works). The reason was that we chose standard numbering for the oneof fields, such that we used the same numbers for all standard types, and thus if I encode a msg that falls in the intersection of the tx types of the two chains, it is guaranteed to have the same representation with both codecs.

Please take a look at the comment at https://github.com/iov-one/weave/blob/master/cmd/bnsd/app/codec.proto#L25-L35. We have since removed bcpd, but I know @alpe wrote his own weave app and it was compatible with the existing frontend tooling for all common msg types and queries.
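To sketch the principle (the real numbering lives in the weave codec.proto linked above; the package, types, and numbers here are made up): two chains can expose different oneofs, but as long as the shared messages use the same field numbers, anything in the intersection encodes to identical bytes under both codecs.

syntax = "proto3";

package demo.weavestyle.v1;

// Common messages supported by both hypothetical chains.
message SendMsg { string src = 1; string dest = 2; string amount = 3; }
message MultisigMsg { repeated string signers = 1; }
message GovVoteMsg { uint64 proposal_id = 1; string option = 2; }

// Chain A: full feature set (bnsd-like).
message ChainATx {
  oneof sum {
    SendMsg send = 51;
    MultisigMsg multisig = 52;
    GovVoteMsg gov_vote = 60;
  }
}

// Chain B: a smaller feature set (bcpd-like), but the shared messages keep
// the same field numbers, so a SendMsg or MultisigMsg encodes to exactly
// the same bytes under both codecs.
message ChainBTx {
  oneof sum {
    SendMsg send = 51;
    MultisigMsg multisig = 52;
  }
}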

Back in the day, we had to register go-wire types with a type byte, which is like this protobuf field number. Now, this type prefix is autogenerated from a hash of the descriptor, so this overlap happens as long as you register the types under the same name. This is fundamentally the same as using the same field numbers. If, instead of cdc.RegisterConcrete(MsgSend{}, "cosmos-sdk/MsgSend", nil), I use cdc.RegisterConcrete(MsgSend{}, "regen/MsgSend", nil), then my amino types will also be incompatible.

I think this whole issue can be resolved very easily via (1) as long as we set a standard naming for all types exposed from the standard modules (cosmos-sdk and other common modules, like the group account work). Custom chains can freely use fields from 250 or so to make sure there is no overlap, and there would be some discussion to claim a field number below that (when it is meant to be reused between many apps and “standardized”), kind of like how you need root to bind a port below 1024.

It just takes a bit of discipline and a canonical registry of types here. We will need the same discipline (or more) to maintain any kind of backwards-compatibility guarantees as well.

@alexanderbez just double-checking, are we in agreement on migrating state encoding to Any as well? I think it’s nice that we have the Codec interfaces to allow SDK users the choice (they could choose amino for instance) and I think we should keep that option open. But choosing the same default of Any for state and tx seems to make sense. What do you think?

This is how I think we should proceed as well. It seemed like on our call last week most folks were in favor of using Any in both state & transaction encoding, but saw it as a stretch goal. In my understanding, the conversations in this thread since that call have shown hesitation with having state & transaction encoding diverge. So aligning both on Any seems to be the favored strategy.

Does anyone have concerns about approach 3 besides message size/performance?

I have run some contrived benchmarks to assess the impact of Any on both performance and disk usage. In my scenarios with random data, the type URL for Any takes up ~47% of the total message size, which is a relatively “bad” scenario. The performance overhead for round-trip encoding is ~13%. Disk usage with leveldb is only ~20% more, so even leveldb’s basic compression does something. In a real scenario I would expect the type URL to take up at most maybe 15-20% of a tx (consider the size of signatures and pubkeys). So we should be seeing a much smaller hit (maybe 5-6% performance, 5-10% disk usage?). Furthermore, as @iramiller mentioned on Discord, since we know exactly what type of data we’re trying to compress, adding a custom compressor to the stack is totally feasible.

So all that’s to say that if performance is the main concern, a) I’m not too alarmed by my preliminary benchmarks and b) there are optimizations we can introduce later to address size concerns.

With all of that, I’m going to agree with @iramiller and @jackzampolin in advocating approach 3).


I do also want to note that even if @ethanfrey’s and @webmaster128’s approaches for assigning field numbers are feasible, there is a non-trivial developer impact in having to deal with master proto files that all have slightly different combinations of supported msgs and interface oneof numbers. How will this work for creating generic signing libraries in statically typed languages? Which master proto file will they reference for their tx? I’m not saying it’s impossible using things like traits, interfaces, etc., but it is something to consider.

My primary concern with (2) is that it assumes the client does not need, and does not get, full control of the bytes exchanged with Tendermint. It optimizes for script-kiddie-style client development and assumes a fully powered external instance will take care of the rest. If the client wants to get all the features and not a minimal subset, they need to re-implement a massive document manipulation process that is hard to get right even for experienced low-level developers.

This is a fair ask and likely something we should consider more seriously.

My secondary concern is that when the bytes sent and the bytes signed differ, you open the door for transaction malleability bugs. I found one of those in Lisk, which I could have used to destroy 300 million market cap at that time. This is probably more dangerous for JSON/protobuf mappings than protobuf/protobuf mappings, but still. And even if it was secure, you still need to convince a lot of people that it is. Every audit of every implementation gets much harder and more expensive when there are multiple representations of a document.

I tend to think of protobuf-to-JSON as more secure than just signing the protobuf, as the JSON includes more information. Nobody ever proposed mapping JSON to protobuf for signing, but that seems similar to what actually happened with the Lisk bug: a high-resolution format was reduced to a lower-resolution format for signing. With protobuf to JSON it is generally the opposite. The JSON representation includes more information than the protobuf representation - specifically field and sometimes type names - and I think this is the general argument for doing JSON signing. Now, theoretically there could be some types of data which are higher resolution in protobuf than in their JSON representation, but it should be pretty straightforward to figure out which ones those are. Taking a quick look, it does appear that Timestamp.nanos can contain more resolution (int32, max 10 digits) in protobuf than JSON (max 9 digits). I don’t know if that malleability could ever be exploited, but I’ll take your point as proven there. It can be mitigated, of course: we can audit the JSON and tweak the spec to not allow that malleability for Timestamp, but you are right that it needs to be audited and proven.
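For reference, a simplified restatement of google.protobuf.Timestamp (the field layout matches the upstream definition; the comments are mine) shows where this resolution mismatch comes from.

syntax = "proto3";

package demo.timestamp.v1;

// Simplified restatement of google/protobuf/timestamp.proto for reference.
message Timestamp {
  // Seconds of UTC time since the Unix epoch.
  int64 seconds = 1;
  // The spec restricts nanos to 0..999,999,999 (9 digits), but the int32
  // field type itself does not enforce that bound, while the proto3 JSON
  // mapping renders at most 9 fractional digits of a second; that gap is
  // the kind of resolution mismatch discussed above.
  int32 nanos = 2;
}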

A chain can always fool a multi-chain client by using a different implementation for a common instruction that is binary compatible (e.g. replacing send tokens with burn tokens). But then this is a shitty chain you should avoid, just like a chain that uses reserved field numbers for non-standard purposes.

This is the scenario I’m actually more concerned about - not that a chain would intentionally fool clients, but that simple user error could cause this to happen. Consider two messages MsgMint and MsgBurn that have identical fields (say just signer and coins). Now say this is what’s in the big master oneof registry:

oneof sum {
   ...
   MsgMint mint = 10315;
   MsgBurn burn = 10316;
}

Now say a developer is copying this to the chain proto file (or alternatively to a client app proto file) and they write:

oneof sum {
   ...
    MsgBurn burn = 10315;
    MsgMint mint = 10316;
}

Maybe there’s no malicious intent, just user error, and then the chain is mostly compatible with standard wallets, with one important exception! Just signing the proto binary with approach 1) would not prevent this, but signing the JSON would (because burn and mint would show up in the signed JSON).

However, JSON wouldn’t fix this sort of screw up:

oneof sum {
   ...
    MsgBurn mint = 10315;
    MsgMint burn = 10316;
}

But approach 3) would, because the actual type name (MsgBurn or MsgMint) would end up in both the proto binary and the JSON serialization.

So, if security is our number 1 concern, so far approach 3) seems the most secure - the least amount of manual user coordination with the highest resolution information.
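As a minimal sketch of what approach 3) looks like at the schema level (names hypothetical): because each message slot is an Any, the fully qualified type name travels inside the encoded and signed bytes, so there is no per-chain field-number mapping to copy incorrectly in the first place.

syntax = "proto3";

package demo.anysign.v1;

import "google/protobuf/any.proto";

// Approach 3): no per-chain oneof at all. Each msg is an Any whose type_url
// carries the fully qualified message name (e.g. "/demo.bank.v1.MsgBurn"),
// so swapping MsgMint and MsgBurn would change the signed bytes themselves.
message Tx {
  repeated google.protobuf.Any msgs = 1;
  // signatures, fee, memo, ...
}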

I think we should move the JSON representation discussion to a different place and get clarity about what is signed before discussing which representation is signed. (Spoiler: I wonder to what degree the current proposal makes sense.)

I wonder what others who advocated for 3) think about this option? @webmaster128 @iramiller ?

I was not aware of the solution that Ethan proposed for multi-chain support. But he is right, and I support both (1) and, to a slightly lesser degree, (3).

My primary concern with (2) is that it assumes the client does not need, and does not get, full control of the bytes exchanged with Tendermint. It optimizes for script-kiddie-style client development and assumes a fully powered external instance will take care of the rest. If the client wants to get all the features and not a minimal subset, they need to re-implement a massive document manipulation process that is hard to get right even for experienced low-level developers.

Instead, I think we should optimize for a client lib / client app developer split as Billy mentioned before. It is enough that there is one properly maintained core library per programming language. This is easy to get if the overhead in the spec is minimal. This also ensures we don’t end up with a standard defined by messy Go internals (again). As a client lib developer, I want to be able to implement the full standard to interact with Tendermint (read/write) without knowing any Go.

My secondary concern is that when the bytes sent and the bytes signed differ, you open the door for transaction malleability bugs. I found one of those in Lisk, which I could have used to destroy 300 million market cap at that time. This is probably more dangerous for JSON/protobuf mappings than protobuf/protobuf mappings, but still. And even if it was secure, you still need to convince a lot of people that it is. Every audit of every implementation gets much harder and more expensive when there are multiple representations of a document.

Given those two concerns, I really would not mind a few percent of extra storage if it ensures that the resulting chain is (a) used and (b) secure. Without the latter, storage optimizations don’t matter.

One of the biggest reasons folks were not in favor of @ethanfrey’s registry proposal had to do with the governance overhead of maintaining a registry of message types for a fully decentralized ecosystem.

There are about 530 million protobuf field numbers. I don’t see any significant governance overhead in defining something like: the first 1 million are reserved for SDK core modules in this repo, everything up to 10 million is open for commonly used, community-supported external modules (e.g. CosmWasm), and everything above can be used by chains as they want.
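A sketch of what such a partition could look like inside a master oneof (the types and exact boundaries here are illustrative, not a decided scheme):

syntax = "proto3";

package demo.registry.v1;

message MsgSend { string from = 1; string to = 2; string amount = 3; }
message MsgStoreCode { bytes wasm = 1; }
message MsgMyChainThing { string payload = 1; }

message Tx {
  oneof sum {
    // 1 .. 1,000,000: SDK core modules maintained in this repo.
    MsgSend send = 100;
    // 1,000,001 .. 10,000,000: commonly used community modules (e.g. CosmWasm).
    MsgStoreCode store_code = 1000100;
    // above 10,000,000: free for individual chains.
    MsgMyChainThing my_chain_thing = 10000100;
  }
}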

A chain can always fool a multi-chain client by using a different implementation for a common instruction that is binary compatible (e.g. replacing send tokens with burn tokens). But then this is a shitty chain you should avoid, just like a chain that uses reserved field numbers for non-standard purposes.

I’m not quite sure what you mean here @ethanfrey. Are you saying that protobuf-js can extend protobuf functionality at runtime? Or just that a code generation step isn’t needed?

You can inject custom .protos into protobuf-js at runtime and get dynamically generated classes with encoders and decoders in JavaScript. In contrast to .protos known at library build time, you don’t get the code generator that gives you TypeScript interfaces, but it is still pretty neat. This could be wrapped in a Cosmos-specific lib that makes the API a bit nicer for this context.

I advocate against 2. It seems way too complex.

Although I think 1 is the most efficient and organized version, and the idea of a registry is made out to be a much bigger deal than it is (just refer to the app protobuf definition in gaia as a basis), I have no desire to try to convince anyone. The other approaches also need canonical names for the Any variants; it just becomes less visible that a registry is useful until later on.

2 adds even more complexity - JSON plus 2 protobuf files. 🤯

With 3 we end up with self-describing protobuf, which may be larger than (1) but smaller than the JSON representation, which is also self-describing. In such a case, I would see no need at all for using JSON for signing. (I do see the argument that you want a self-describing format that can be presented to the user if we encode with approach 1.) I see 3 as meaningful if developers:

a. want a similar UX to amino (dynamically registering types rather than defining them in one file), and
b. want a self-describing binary format

I see three different potential approaches with different tradeoffs. I think they’re pretty simply classified based on their use of oneof vs Any for different things. Basically: 1) oneof everywhere, 2) oneof for encoding, Any for signing, 3) Any everywhere.

I’ve tried to list out the pros and cons as I see them.


1) oneof everywhere

The current approach in cosmos-sdk master, which uses oneofs to encode all instances of interface types.

Pros:

  • small message size (no overhead from type URLs in Any)
  • clear communication of which interfaces are actually supported via the oneof
  • same message for signing as for tx encoding

Cons:

  • clients need to run code generation for every new app that they want to support and write adapter functions to handle:
    • transaction generation
    • signing
    • any types that use interfaces (e.g. gov.MsgSubmitProposal will be slightly different for every chain)

I was happy with this approach until I took some time to explore the client-side proto experience and realized that the con above would create a worse developer experience than amino for wallets and block explorers that want to (and currently do) support multiple chains.

2) oneof for encoding, Any for signing

The approach described in #6031 where Any is used for tx signing and also for tx broadcasting via an RPC service

Pros:

  • client apps with a set of module files can support any chains that support those modules without need for code generation or custom adapter functions (same as now with amino)
  • encoded message size is still small (because encoded tx’s use oneofs instead of Any)
  • app-level proto files can still be used to figure out the set of interfaces supported

Cons:

  • different types for signing and encoding, so clients still need the app-level proto files if they want to:
    • do merkle state proofs
    • calculate transaction IDs
    • broadcast transactions to Tendermint (fewer dependencies, more entry points to the network)
    • decode transactions from Tendermint (e.g. tx search and subscriptions)

3) Any everywhere

Use Any instead of oneofs for both signing and encoding. This is the approach @Liamsi and I advocated in https://github.com/tendermint/go-amino/issues/267, but it was discarded in favor of 1) to reduce encoded message size and to more clearly communicate which interfaces are supported. We haven’t been considering it recently, but since we’re having this discussion, I think it deserves a mention.

Pros:

  • client apps with a set of module files can support any chains that support those modules without need for code generation or custom adapter functions (same as now with amino)
  • same message for signing as for tx encoding

Cons:

  • out-of-band information is needed to know which interfaces are supported by a given chain (same as now with amino); note that this could be discoverable via some RPC service
  • larger encoded tx size, although this could be offset by rocksdb compression
  • tight coupling of state encoding and client API - chains need to break support of the client API if it doesn’t work for their app, unlike 2)

#6031 (approach 2) is my attempt to balance our current approach 1) with what seems like a better client-side UX, but I could be wrong. I would appreciate the input of client-side developers on this. @webmaster128 would love to hear your thoughts.

The reason we sign JSON encodings of transactions rather than the bytes sent over the wire is that we prioritize the ability of humans to inspect what they are signing over the ease of confusion attacks.

The strongly typed nature of amino JSON, achieved by reflecting over the paths, has been a pretty robust defense against type confusion / malleability attacks.

As long as we retain this property in what we sign we should be fine.