azure-functions-nodejs-worker: The charset of content-type request header is ignored
Abstract
It seems that the charset of the content-type response header is ignored in a specific case. If a request has an Accept-Charset request header including the value “utf-16, utf-8”, Azure Functions returns a response whose body is encoded as UTF-16LE, ignoring the charset set in the “content-type” response header, and whose “Content-Type” response header contains “charset=utf-16” (which means UTF-16BE). As a result, the client cannot handle the response.
Expected Behavior
- Azure Functions should determine the charset of the response body according to the charset set in the “content-type” response header by the code, without using the “Accept-Charset” request header value. That is, if “charset=utf-8” is specified, the response body charset should be UTF-8. If “charset=utf-16” or “charset=utf-16be” is specified, the response body charset should be UTF-16BE. And if “charset=utf-16le” is specified, the response body charset should be UTF-16LE.
- Otherwise, the charset in the content-type response header should be changed to “charset=utf-16le” to match the UTF-16LE response body, or the response body should be re-encoded as UTF-16BE.
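The expected behavior above could be sketched like this (a minimal illustration with helper names of my own invention, not the worker's actual code):

```javascript
// Sketch: derive the body encoding from the charset declared in the
// content-type response header, instead of from Accept-Charset.
// Note: per RFC 2781, a bare "utf-16" label without a BOM means big-endian.
function charsetFromContentType(contentType) {
  const match = /charset=([^;\s]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

function encodeBody(text, contentType) {
  const charset = charsetFromContentType(contentType);
  switch (charset) {
    case 'utf-8':
      return Buffer.from(text, 'utf8');
    case 'utf-16le':
      return Buffer.from(text, 'utf16le');
    case 'utf-16':
    case 'utf-16be':
      // Node.js has no native UTF-16BE encoder; swap the byte pairs of
      // the little-endian encoding to get big-endian.
      return Buffer.from(text, 'utf16le').swap16();
    default:
      throw new Error(`Unsupported charset: ${charset}`);
  }
}
```

With this mapping, “charset=utf-8” yields UTF-8 bytes and a bare “charset=utf-16” yields big-endian bytes, consistent with the expectations listed above.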
Detail of Current Behavior
For example, assume that I write the following code to return UTF-8 content:
```js
module.exports = (context, req) => {
  context.res = {
    status: 200,
    headers: {
      'content-type': 'application/json; charset=utf-8'
    },
    body: {
      'fulfillmentText': 'Hi, Yoichiro!'
    }
  };
  context.done();
};
```
If a request has an Accept-Charset request header with the value “utf-8” only, the response is the following:
- Content-Type: application/json; charset=utf-8
- Response Body Charset: UTF-8
This is correct behavior.
Next, if a request has an Accept-Charset request header including “utf-16,utf-8”, the response is the following:
- Content-Type: application/json; charset=utf-16
- Response Body Charset: UTF-16LE
This is incorrect behavior, for the following reason:
If the response body charset (UTF-16LE) is correct per the specification (that is, if the content-type response header set by the code is supposed to be ignored), then the “charset=utf-16” in the content type is wrong, because “utf-16” means UTF-16BE (big-endian), not UTF-16LE (little-endian), when there is no BOM. This is specified in RFC 2781:
> Text labelled with the “UTF-16” charset might be serialized in either big-endian or little-endian order. If the first two octets of the text is 0xFE followed by 0xFF, then the text can be interpreted as being big-endian. If the first two octets of the text is 0xFF followed by 0xFE, then the text can be interpreted as being little-endian. If the first two octets of the text is not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as being big-endian.
The response body string returned by Azure Functions doesn’t have a BOM (Byte Order Mark), and the string is labeled “utf-16” by the content-type. As a result, the response body must be interpreted as UTF-16BE. However, the actual response body charset is UTF-16LE.
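The RFC 2781 decoding rule quoted above can be sketched as follows (an illustrative helper, not part of any Azure Functions API), which shows why a BOM-less UTF-16LE body labeled “utf-16” is misread:

```javascript
// Sketch of RFC 2781 decoding for text labelled "utf-16": check the first
// two octets for a BOM; with no BOM, the text SHOULD be read as big-endian.
function decodeUtf16(buf) {
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    // BOM FE FF: big-endian. Copy, then swap byte pairs so that Node's
    // native little-endian decoder can read it.
    return Buffer.from(buf.slice(2)).swap16().toString('utf16le');
  }
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    // BOM FF FE: little-endian, which Node decodes natively.
    return buf.slice(2).toString('utf16le');
  }
  // No BOM: RFC 2781 says interpret as big-endian.
  return Buffer.from(buf).swap16().toString('utf16le');
}
```

Feeding this decoder a BOM-less UTF-16LE body (what the worker actually sends) produces byte-swapped garbage, which is exactly the client-side failure described in this report.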
On the other hand, the code above sets “charset=utf-8” in the content-type; that is, I want to return the response as a UTF-8 string. However, the response is actually sent as UTF-16LE, and the content-type is replaced with “charset=utf-16”. If the Accept-Charset request header value didn’t include “utf-8”, this might be a useful behavior (though I’m not sure it is correct). But at the least, I expect that the response body charset should be UTF-8, without the specified charset being ignored.
In addition, assume that I write the following code:
```js
module.exports = (context, req) => {
  context.res = {
    status: 200,
    headers: {
      'content-type': 'application/json; charset=utf-16le'
    },
    body: {
      'fulfillmentText': 'Hi, Yoichiro!'
    }
  };
  context.done();
};
```
If a request has an Accept-Charset request header with the value “utf-16le,utf-16,utf-8”, the response is the following:
- Content-Type: application/json; charset=utf-16
- Response Body Charset: UTF-16LE
The content-type value I set is forcibly changed from “utf-16le” to “utf-16”. I think this is also incorrect behavior.
Steps to Reproduce
Write the following code and deploy it:
```js
module.exports = (context, req) => {
  context.res = {
    status: 200,
    headers: {
      'content-type': 'application/json; charset=utf-8'
    },
    body: {
      'fulfillmentText': 'Hi, Yoichiro!'
    }
  };
  context.done();
};
```
Next, call the function with the curl command, like the following:
$ curl -v -X POST -d "{}" -H "Accept-Charset: utf-16,utf-8" <Function URL>
You should get a response with “Content-Type: application/json; charset=utf-16” and a UTF-16LE response body.
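To confirm what encoding actually came back, you can save the body with `curl ... --output body.bin` and inspect the raw bytes in Node.js. This is a quick heuristic check of my own, not part of the repro itself:

```javascript
// Heuristic check for the repro above: guess whether saved response bytes
// are UTF-16LE or UTF-8. A UTF-16LE encoding of ASCII JSON puts a 0x00
// byte after each ASCII character (e.g. '{' is encoded as 7B 00).
function describeEncoding(buf) {
  const looksUtf16le = buf.length >= 2 && buf[0] !== 0x00 && buf[1] === 0x00;
  return looksUtf16le ? 'utf16le' : 'utf8';
}

// Example usage after saving the body with curl:
//   const buf = require('fs').readFileSync('body.bin');
//   console.log(describeEncoding(buf), buf.toString(describeEncoding(buf)));
```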
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 17 (4 by maintainers)
I can reproduce this issue and am suffering from the same defect. Azure Functions seems to ignore my carefully crafted content-type response header and does its own content negotiation anyway.
One could re-phrase the bug as “Content negotiation is on by default and cannot be turned off” (though @yoichiro’s report is far more comprehensive)
I spent some time investigating this bug. I was able to repro, but as mentioned above there is a workaround. Fortunately, this bug does not repro for the new model v4, likely because we made several changes to the http types. If you’re not familiar with model v4, see the upgrade guide here for more details.
Trying to fix this for the v3 model would be difficult because people could be relying on the old behavior. Since there is a workaround for v3 and this is already fixed in v4, I’ll go ahead and close this issue.
Repro code for model v3

Expected: The response charset is set to ISO-8859-1
Actual: The response charset is set to charset=utf-8

Workaround code for model v3
Encode the body as a buffer using the intended charset before returning the response. I used the popular iconv-lite npm package to help since many encodings are not supported natively by Node.js.
Example code for model v4 that works as-is
Yes, at least, I believe that the UTF-16LE/BE issue is a bug which must be fixed.
Nice job investigating this, @yoichiro! I reported something similar to this a couple months ago and they said they were going to investigate content-type being ignored as well: https://github.com/Azure/azure-functions-host/issues/3435. However the utf-16be/le issue you identified is probably the bigger issue, since we wouldn’t have any need to set content-type if Azure Functions was doing it correctly itself.