protobuf: encoding/protojson: DiscardUnknown doesn't work for unknown enum strings

What version of protobuf and what language are you using? Version: 1.24.0 (golang)

What did you do? We’re using protojson to encode our internal protobufs for indexing in ElasticSearch. For interop with other users, we encode enums as strings. At some point, we added a new enum value, which has caused older golang consumers to fail with the following error: invalid value for enum type: "CLASS_RESULTS_PRE_CLINICAL"

What did you expect to see? We set DiscardUnknown in UnmarshalOptions, so I would have expected the value to be parsed as the default enum value.

What did you see instead? invalid value for enum type: "CLASS_RESULTS_PRE_CLINICAL"

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 21 (12 by maintainers)

Commits related to this issue

Most upvoted comments

Seems like the conformance tests are now officialised. How can we expedite @cybrcodr 's PR ?

It would be a lot better if the discussion is being done externally, preferably on the Protobuf issue tracker. It’ll at least allow for contributors and users of different implementations to provide feedback, or at the very least understand the rationale behind a decision. We do have active non-Googler contributors in the Go Protobuf open source project, and I’m sure they can provide valuable feedback. A solution for one immediate use case may present issues in other use cases. It’ll also make reviewing the CL a lot easier if the reviewers have insight to the discussion.

Given the lack of external discussion, I’ll go ahead and present my feedback on the proposed solution as being implemented in https://golang.org/cl/350469 here.

Even if this was only implemented with the DiscardUnknown option enabled, a JSON value with an unknown integer enum would result in a different deserialized value from a JSON value with an unknown string enum. This can be problematic.

https://developers.google.com/protocol-buffers/docs/proto3#enum

In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation.

While the current parser can accept integer value outside the specified range, it intentionally rejects unknown string value in order to avoid producing a different enum value.

A possible solution would be to present a different option for this special treatment of enum values outside of specified range regardless of whether the value is an integer or a string. This would keep the DiscardUnknown option as discarding unknown fields only.

While one may think we should simply just implement this in the Go implementation, we have learned through experience that even optional behavior should be specified in the Protobuf spec/guide for consistency as this does present issues if not.

Yeah, I think it’s kind of hard to say what appropriate behavior should be here. Arguably, if you add an enum and a service/client gets that value and doesn’t know which integer value that corresponds to‌… I mean, setting it to the default value is probably not a good idea, unless everyone has followed good procedure and defined their default/zero-value to be ”INVALID VALUE”.

Imagine perhaps, something using googleapis/rpc/code.Code with strings, and a new code gets added, and a service returns that error value as an enum string, and then the client receives it, and goes “I don’t know what this value should be, so it’s just the zero-value” which is‌… OK.

I understand my example is contrived, but it’s just supposed to be an example of how treating the value as an unknown field, or a default value could cause semantics bugs in code.

P.S.: I think if a valid optional support were to be added, it should be explicitly to turn this behavior on, and not conflating unknown fields with unknown enum values. Users of the code should know exactly what behaviors they are enabling/disabling.

@noahdietz

I’d appreciate it if your team or the Protobuf team can update the Protobuf Language Guide

Definitely! I’ll pass that along 😃

Sorry, I should be more clear on this. I’d like to request that the Protobuf Language Guide be updated with regards to this because the current behavior when deserializing enum strings already vary across language implementations. Changing it based on some internal discussion won’t provide for a valid reference especially if other users have different expectations.

Thanks!

Hi @noahdietz,

Thanks for engaging the Protobuf team on this.

I’d appreciate it if your team or the Protobuf team can update the Protobuf Language Guide with regards to what the expected behavior should be across all languages for this issue.

I’m guessing the discussion was done just within the teams and not via https://groups.google.com/g/protobuf or some other channels?