ktor: application/json default charset is not UTF-8 when parsing request

This is a follow-up to #80. Apparently responses were fixed, but there are still issues with requests.

A request with Content-Type: application/json is not decoded as UTF-8, while a request with Content-Type: application/json; charset=utf-8 is decoded correctly. Both have to behave the same and be decoded as UTF-8.
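
The effect can be reproduced without Ktor. The following is a minimal sketch using only the Kotlin standard library; the payload and the helper name misdecodeAsLatin1 are illustrative assumptions, not taken from the issue. It encodes a string as UTF-8, as a client would, and then decodes the bytes as ISO-8859-1, as a server falling back to that default would:

```kotlin
// Hypothetical helper: encode text as UTF-8 (what the client sends),
// then decode the bytes as ISO-8859-1 (what a server using that fallback does).
fun misdecodeAsLatin1(text: String): String {
    val wireBytes = text.toByteArray(Charsets.UTF_8)
    return String(wireBytes, Charsets.ISO_8859_1)
}

fun main() {
    // Illustrative JSON body with one non-ASCII character (U+00EB).
    val body = "{\"name\":\"Zo\u00EB\"}"

    // Decoding with the charset the client actually used round-trips cleanly.
    println(String(body.toByteArray(Charsets.UTF_8), Charsets.UTF_8)) // {"name":"Zoë"}

    // Decoding as ISO-8859-1 turns the two UTF-8 bytes C3 AB into two characters.
    println(misdecodeAsLatin1(body)) // {"name":"ZoÃ«"}
}
```

ASCII-only bodies come out identical under both charsets, which is why the problem only shows up for some requests.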

About this issue

State: closed
Created 6 years ago
Reactions: 15
Comments: 30 (7 by maintainers)

Commits related to this issue

Add tests for Gson and Jackson features charset encodings related to issue #384 — committed to ktorio/ktor by deleted user 6 years ago
Add tests for Gson and Jackson features charset encodings related to issue #384 — committed to schleinzer/ktor by deleted user 6 years ago
Fix reading of the PubSubHubbub payload https://github.com/ktorio/ktor/issues/384 — committed to LorittaBot/Loritta by MrPowerGamerBR 4 years ago

Most upvoted comments

Use this function if you just want to “have it work”:

/**
 * Receive the request body as a String.
 * If the Content-Type header specifies no charset, use ISO_8859_1 as the default, see https://www.w3.org/International/articles/http-charset/index#charset.
 * For application/json, use UTF-8 as the default instead, see https://tools.ietf.org/html/rfc4627#section-3
 */
private suspend fun ApplicationCall.receiveTextWithCorrectEncoding(): String {
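  // Compare without parameters so that e.g. "application/json; foo=bar" still matches.
  fun ContentType.defaultCharset(): Charset = when (withoutParameters()) {
    ContentType.Application.Json -> Charsets.UTF_8
    else -> Charsets.ISO_8859_1
  }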
  
  val contentType = request.contentType()
  val suitableCharset = contentType.charset() ?: contentType.defaultCharset()
  return receiveStream().bufferedReader(charset = suitableCharset).readText()
}

@cy6erGn0m Maybe you want to change the default implementation of receiveText to the implementation of receiveTextWithCorrectEncoding()?

+29

functionaldude on Jan 29, 2019

I have run into this problem as well. The receiveTextWithCorrectEncoding() work-around solved it for now, but it seems like it is still an issue.

clydebarrow on Feb 14, 2020

Oh, I just got what @cy6erGn0m means in his comment. So it’s expected behavior. But it still doesn’t sound good or user-friendly, even if it follows the HTTP standard. According to the RFC, JSON must be encoded in UTF-8. Maybe it makes sense to open another issue and reconsider the current behavior. Even for non-JSON content types, I can hardly imagine ISO-8859-1 as the default encoding on the modern web, and in most cases it will just cause bugs.

gildor on Aug 10, 2018

@MOZGIII

since I use call.receive<T>()

Actually, receiveText also uses call.receive<String>(). With Gson/Jackson the behavior is different because they assume UTF-8 encoding for JSON.

Yes, I completely agree with you. I solved my problem by reading the body as a ByteArray and converting it to a String with the encoding specified explicitly (actually, Kotlin’s String(bytes) uses UTF-8 by default). But the current behavior is very error-prone, and it is sometimes hard to understand what went wrong: in my case the signature check failed sporadically for different events (the ones with non-ASCII characters).

gildor on Aug 13, 2018
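
To illustrate why a signature check can fail only for some events, here is a hedged sketch using only JDK classes. The secret, the payload, the helper name hmacSha256Hex, and the choice of HMAC-SHA256 are all assumptions for illustration; the thread does not say which signature scheme was involved. The sender signs the raw UTF-8 bytes. A receiver that first decodes the body as ISO-8859-1 and then re-encodes the text no longer has the same bytes to verify:

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hex-encoded HMAC-SHA256 of the given bytes (algorithm chosen for illustration only).
fun hmacSha256Hex(secret: ByteArray, data: ByteArray): String {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(secret, "HmacSHA256"))
    return mac.doFinal(data).joinToString("") { "%02x".format(it) }
}

fun main() {
    val secret = "hypothetical-secret".toByteArray(Charsets.UTF_8)

    // Raw request body as sent on the wire: UTF-8 with one non-ASCII character.
    val rawBody = "{\"text\":\"caf\u00E9\"}".toByteArray(Charsets.UTF_8)
    val senderSignature = hmacSha256Hex(secret, rawBody)

    // Verifying against the untouched bytes succeeds.
    println(hmacSha256Hex(secret, rawBody) == senderSignature) // true

    // Decoding as ISO-8859-1 and re-encoding as UTF-8 changes the bytes,
    // so the recomputed signature no longer matches.
    val misdecoded = String(rawBody, Charsets.ISO_8859_1)
    val reencoded = misdecoded.toByteArray(Charsets.UTF_8)
    println(hmacSha256Hex(secret, reencoded) == senderSignature) // false
}
```

An ASCII-only body survives the same round trip unchanged, so verification passes for those events. Keeping the body as bytes until the signature is verified, and decoding with an explicit charset afterwards, avoids the dependency on the framework's default.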