spec: handling of datacontenttype is inconsistent
CloudEvents 1.0
Consider this example, straight from the spec:
{
    ...
    "datacontenttype" : "text/xml",
    "data" : "<much wow=\"xml\"/>"
}
Clearly, data is some structure that has been encoded using the XML format and put into the event as a string (binary). Naturally, I’d assume the same behavior for JSON encoding:
{
    ...
    "datacontenttype" : "application/json",
    "data" : "{\"foo\": \"bar\"}"
}
However, that doesn’t seem to be the case; as the example in the HTTP protocol binding spec shows, the JSON object is not sent in its encoded form but rather nested directly into the event:
{
    ...
    "datacontenttype" : "application/json",
    "data" : {
        "foo": "bar"
    }
}
Note that removing the optional datacontenttype attribute doesn’t change this, as the spec clearly states:
A JSON-format event with no datacontenttype is exactly equivalent to one with datacontenttype=“application/json”.
To sum it up, it is not possible to put a JSON-encoded data blob into a CloudEvent; and a parser needs to treat application/json differently from any other datacontenttype.
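A minimal sketch of the special case this forces on a parser (the function name and the helper decodeFor are mine, not from any SDK):

function decodeEventData(event) {
  var ct = event.datacontenttype || "application/json"; // the JSON-format default
  if (ct === "application/json" || /\+json$/.test(ct)) {
    // JSON data is nested directly in the envelope, so it is already a JSON value
    return event.data;
  }
  // every other content type: data is a string (or data_base64) that still
  // has to be decoded according to datacontenttype
  return decodeFor(ct, event.data);
}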
HTTP Protocol Binding 1.0
For structured content mode, the spec says:
The chosen event format defines how all attributes, and data, are represented.
Does this mean that datacontenttype must be present and set to the event format? Or does structured mode implicitly change the default of datacontenttype from application/json to whatever event format is in use? What if datacontenttype is present and set to a different encoding - must a parser treat this event as malformed?
JSON Event Format 1.0
As a side note, the JSON Format spec makes this even more confusing:
If the implementation determines that the type of data is Binary, the value MUST be represented as a JSON string expression containing the Base64 encoded binary value, and use the member name data_base64 to store it inside the JSON object.
This basically says that you have to Base64-encode any simple JSON string (which is, of course, binary). Also, if a receiver does not implement the optional (!) JSON Format spec, it won’t be able to parse the data_base64 value; consequently, implementing the JSON Format spec as a sender means not implementing the full CloudEvents spec.
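For example, if "Binary" is read that broadly, a sender following the JSON Format spec would have to emit even a plain text payload like this (a sketch; the payload is made up):

{
    ...
    "datacontenttype" : "text/plain",
    "data_base64" : "aGVsbG8="
}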
Regarding these two comments about data_json, data_text and data_base64: https://github.com/cloudevents/spec/issues/558#issuecomment-873890684 and https://github.com/cloudevents/spec/issues/558#issuecomment-876218637

I realize this is about solving the issue in spec version 1.0 without a breaking change, but going beyond that, is there any discussion for the next version anywhere that would allow for breaking changes like this?
I’d prefer to see a dataencoding attribute “re”-added with a value of either json, text or base64, and then only a single data attribute to hold the payload. I’m not seeing the benefit of instead defining individual attributes such as data_json, data_text or data_base64. It sounds like a dataencoding attribute was once part of the spec but was dropped; maybe it needs to be re-introduced. This would remove any “special” case of */json or */*+json for the datacontenttype attribute and simplify the whole confusion here. Or maybe I’m missing why it wouldn’t.

I also question the attribute naming format for consistency. The other attributes are all lowercase, neither camel nor snake case, so why is data_base64 all of a sudden using snake case? For consistency it should be database64. But to avoid this inconsistency altogether, and to avoid adding any more data_xxx fields later, I propose just using data only and adding dataencoding to specify the encoding format.

This issue is still open, so I thought I would add my suggestion. I’m a bit confused by the merges as to whether this is considered fixed for spec 1.0 or not now, but I’m suggesting how I think it could be simplified for a future version anyway.
Examples
JSON as JSON
If dataencoding is json, then only a datacontenttype of */json or */*+json is allowed.
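For illustration, a hypothetical event for this case (only dataencoding, datacontenttype and data carry meaning here; the payload value is made up):

{
    ...
    "datacontenttype" : "application/json",
    "dataencoding" : "json",
    "data" : { "value": 42 }
}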
To read this would be:

var value = event.data.value;

JSON as text
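A hypothetical event, with the same made-up payload now carried as a string:

{
    ...
    "datacontenttype" : "application/json",
    "dataencoding" : "text",
    "data" : "{\"value\": 42}"
}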
To read this would be:
var value = parseJson(event.data).value;

XML as text
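A hypothetical event, reusing the XML example from the spec:

{
    ...
    "datacontenttype" : "text/xml",
    "dataencoding" : "text",
    "data" : "<much wow=\"xml\"/>"
}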
To read this would be:
var wow = parseXml(event.data).attr("wow");

JSON as bytes
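A hypothetical event; the data value is the Base64 encoding of {"value": 42}:

{
    ...
    "datacontenttype" : "application/json",
    "dataencoding" : "base64",
    "data" : "eyJ2YWx1ZSI6IDQyfQ=="
}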
To read this would be:
var value = parseJson(toUtf8String(fromBase64(event.data))).value;

XML as bytes
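A hypothetical event; the data value is the Base64 encoding of <much wow="xml"/>:

{
    ...
    "datacontenttype" : "text/xml",
    "dataencoding" : "base64",
    "data" : "PG11Y2ggd293PSJ4bWwiLz4="
}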
To read this would be:
var wow = parseXml(toUtf8String(fromBase64(event.data))).attr("wow");

Binary as bytes
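A hypothetical event; the data value stands in for the Base64 of the image bytes:

{
    ...
    "datacontenttype" : "image/png",
    "dataencoding" : "base64",
    "data" : "iVBORw0K..."
}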
To read this would be:
var imageBytes = fromBase64(event.data);

Thank you.
@duglin @deissnerk This is coming up in my work on the Ruby SDK, and I want to bring up a clarification question.
To summarize a conclusion from above:
In the following CE:
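Taking the JSON-string example from the issue description as the event in question (other attributes elided):

{
    ...
    "datacontenttype" : "application/json",
    "data" : "{\"foo\": \"bar\"}"
}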
… it sounds like the data should be considered a JSON value of type string. The fact that the string’s value happens to look like serialized JSON is irrelevant. It is simply a string. Therefore, if we were to serialize this CE in HTTP Binary mode, it might look like this:
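A sketch of that binary-mode request (the other ce- headers are elided):

POST /events HTTP/1.1
ce-specversion: 1.0
ce-id: ...
Content-Type: application/json

"{\"foo\": \"bar\"}"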
The data must be “escaped” in this way, so that a receiver parsing this content with the application/json content type will end up with a JSON string and not an object.

As a corollary, when deserializing an HTTP Binary mode CE with Content-Type: application/json, the HTTP protocol handler must parse the JSON and set the data attribute in memory to the actual JSON value (rather than the string representation of the JSON document). Otherwise, the content’s semantics will change when the CE gets re-serialized. And this, of course, all implies that an SDK’s HTTP protocol handler (and perhaps other protocol handlers as well) must understand JSON, even if the JSON structured format is not in use.
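Roughly, the receive path this implies would look like the following sketch (readCeHeaders and makeCloudEvent are hypothetical names, not the actual SDK API):

function fromBinaryHttp(headers, body) {
  var attributes = readCeHeaders(headers); // hypothetical: collect the ce-* headers
  var ct = headers["content-type"] || "";
  var data = body;
  if (ct === "application/json" || /\+json$/.test(ct)) {
    // must parse here, otherwise re-serializing to the JSON structured format
    // would turn the payload from a value into a string
    data = JSON.parse(body);
  }
  return makeCloudEvent(attributes, data, ct); // hypothetical constructor
}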
Taking that as given, consider this implication: earlier a comparison was made with application/xml, noting a possible inconsistency. Consider this parallel example:
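Borrowing the XML example from the spec (other attributes elided):

{
    ...
    "datacontenttype" : "application/xml",
    "data" : "<much wow=\"xml\"/>"
}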
If we were to treat this XML data consistently with how we treated the earlier JSON data, we would consider this data as a string node in an XML document, whose contents just happen to look like XML. Hence, serializing this as HTTP Binary might yield something like:
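A sketch of what that would mean: the markup entity-escaped, so that an XML parser would read it back as text rather than as an element (ce- headers elided):

POST /events HTTP/1.1
ce-specversion: 1.0
ce-id: ...
Content-Type: application/xml

&lt;much wow="xml"/&gt;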
However, my understanding of the spec, and my understanding of the current behavior of the SDKs, suggests we are not doing that. (And indeed I’m glad, because that would, in turn, imply that all protocol handlers would also need to understand XML.) Instead, we actually consider the above data as semantically an XML document and not a string. Hence, serializing this as HTTP Binary actually looks like:
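Sketch, with the ce- headers again elided:

POST /events HTTP/1.1
ce-specversion: 1.0
ce-id: ...
Content-Type: application/xml

<much wow="xml"/>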
In other words, our handling of the XML content-type appears to be inconsistent with our handling of the JSON content-type.

So my clarification question is:
Is it indeed the intent that we treat string data with content-type application/json specially, differently from string data with content-type application/xml (or indeed any other content-type), as illustrated above?

If so, follow-up questions:
A plain application/json is obvious, but what if the datacontenttype is itself application/cloudevents+json (i.e. a cloudevent whose payload is another cloudevent)? If we do consider JSON special, it seems it might be a good idea for the spec to state that explicitly, and define how it is identified, perhaps with reference to fields in RFC 2046 or similar.

To my mind, one problem is that “dataencoding” is a perfectly valid context attribute name, but its use here isn’t really part of the CloudEvent itself. I would be happier with “data_encoding”, to indicate that it’s metadata about the “data” property rather than a separate context attribute.
In terms of supporting both: I’d prefer not to do that, personally. We can’t make this change until 2.0 (it would be a breaking change) and I’d really like to aim for 2.0 to be very, very long-lived. Instead, I think it makes sense for a CloudEvent 1.0 to use the existing format, and a CloudEvent 2.0 to use “whatever we decide is best” - individual SDKs can decide which versions of CloudEvents they support. They may decide to support both 1.0 and 2.0 forever, or drop 1.0 support after 2.0 is widely adopted. Making that an SDK choice rather than having both options in the spec itself feels like a more flexible approach.
To my mind, it’s as “fixed for spec 1.0” as it can reasonably be (modulo clarifying tweaks etc). Yes, there’s a lot that could change for 2.0, although the larger the change in 2.0 (not just here, but everywhere), the harder it will be for SDKs etc to support both 1.0 and 2.0. I haven’t heard any detailed discussions of expectations around timelines for a 2.0 - I think more of the activity is around getting Discovery etc across the line first.
I have been looking into this for cloudevents/sdk-javascript. At the moment, the SDK would produce A as the result of this transformation. In the example below, I’ve left out the transformation from a structured event, and am just creating the event from whole cloth, since this is how it would look after deserialization anyway.
In @deissnerk’s example, the event representation in A isn’t actually a code representation. It’s the representation on the wire. In my illustration above, the s object is the in-memory representation of the event as a Message object as defined in the SDK. For users of the SDK, it is their responsibility to push this data across the wire. The reasoning behind this is that the networking world in Node.js is rife with lots and lots of competing frameworks, and the underlying Node.js native APIs need a lot of scaffolding around them to be user friendly. So, we’ve just provided interfaces for developers to implement, and we send/receive through whatever framework they want, as long as what they hand us conforms to our API.

When the user ultimately sends the event with something like
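for example (axios is just a stand-in framework here, and I’m assuming the Message object s exposes roughly headers and body):

const axios = require("axios");

// push the serialized Message across the wire; s.body is a plain string
axios.post("https://receiver.example.com/events", s.body, { headers: s.headers });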
That string is just a string, and nothing is wrapping it in quotes. So over the wire, there are no quotes.
Which got me wondering. What if a binary event arrives and it looks like A. Is it invalid? Should the SDK wrap it in quote marks? It’s not very clear.
@Thoemmeli I agree with your point, but "{\"foo\": \"bar\"}" is a string and therefore also a JSON value. In that sense the example is valid, but the datacontenttype in this case refers to the string and not to the escaped JSON object.

@deissnerk thank you for the explanation! Got one more question:
CE also defines the “dataschema” property as “Identifies the schema that data adheres to.” Could we then not define our schema as, for example, { "type": "string", "contentEncoding": "base64", "contentMediaType": "image/png" } (https://json-schema.org/understanding-json-schema/reference/non_json_data.html), which would then clearly identify the contents of data (and data_base64 would not be needed at all)?

@n3wscott In the JSON Schema, data is defined as being one of two types, object or string:
but array type should not be allowed afaik.
You can however of course:
@duglin I think the important text in the section you referenced is
So, if data is already a JSON value, no translation is needed.