i18n: [BUG] UTF-8 YAML files with accents in version 1.9.0 raise incompatible character encodings: UTF-8 and ASCII-8BIT

For a long time I have had code like this:

STATES = I18n.t('states').with_indifferent_access.freeze

The yaml file is in UTF-8 and it has accents in some words:

pt-BR:
  states:
    Acre: AC
    Amapá: AP
    Ceará: CE
    Piauí: PI
    Paraná: PR

With the new 1.9.0 release, it started failing in our CI in a lot of places:

ActionView::Template::Error incompatible character encodings: UTF-8 and ASCII-8BIT
Failure/Error: = f.select(:state, City::STATES,

ActionView::Template::Error:
  incompatible character encodings: UTF-8 and ASCII-8BIT
./app/views/customers/_form.html.slim:86:in `block in 

Downgrading to 1.8.11 makes everything work again.

With 1.9.0, the Rails console loads it like this:

rails c
Running via Spring preloader in process 4964
Loading development environment (Rails 6.1.4.4)
[1] pry(main)> I18n.t('states').with_indifferent_access.freeze
=> {"Acre"=>"AC",
 "Alagoas"=>"AL",
 "Amazonas"=>"AM",
 "Amap\xC3\xA1"=>"AP",
 "Bahia"=>"BA",
 "Cear\xC3\xA1"=>"CE",
 "Distrito Federal"=>"DF",
 "Esp\xC3\xADrito Santo"=>"ES",
 "Goi\xC3\xA1s"=>"GO",
 "Maranh\xC3\xA3o"=>"MA",
 "Minas Gerais"=>"MG",
 "Mato Grosso do Sul"=>"MS",
 "Mato Grosso"=>"MT",
 "Par\xC3\xA1"=>"PA",
 "Para\xC3\xADba"=>"PB",
 "Pernambuco"=>"PE",
 "Piau\xC3\xAD"=>"PI",
 "Paran\xC3\xA1"=>"PR",
 "Rio de Janeiro"=>"RJ",
 "Rio Grande do Norte"=>"RN",
 "Rond\xC3\xB4nia"=>"RO",
 "Roraima"=>"RR",
 "Rio Grande do Sul"=>"RS",
 "Santa Catarina"=>"SC",
 "Sergipe"=>"SE",
 "S\xC3\xA3o Paulo"=>"SP",
 "Tocantins"=>"TO"}

With 1.8.11, the Rails console loads it like this:

rails c
Running via Spring preloader in process 3411
Loading development environment (Rails 6.1.4.4)
[1] pry(main)> I18n.t('states').with_indifferent_access.freeze
=> {"Acre"=>"AC",
 "Alagoas"=>"AL",
 "Amazonas"=>"AM",
 "Amap\xC3\xA1"=>"AP",
 "Bahia"=>"BA",
 "Cear\xC3\xA1"=>"CE",
 "Distrito Federal"=>"DF",
 "Esp\xC3\xADrito Santo"=>"ES",
 "Goi\xC3\xA1s"=>"GO",
 "Maranh\xC3\xA3o"=>"MA",
 "Minas Gerais"=>"MG",
 "Mato Grosso do Sul"=>"MS",
 "Mato Grosso"=>"MT",
 "Par\xC3\xA1"=>"PA",
 "Para\xC3\xADba"=>"PB",
 "Pernambuco"=>"PE",
 "Piau\xC3\xAD"=>"PI",
 "Paran\xC3\xA1"=>"PR",
 "Rio de Janeiro"=>"RJ",
 "Rio Grande do Norte"=>"RN",
 "Rond\xC3\xB4nia"=>"RO",
 "Roraima"=>"RR",
 "Rio Grande do Sul"=>"RS",
 "Santa Catarina"=>"SC",
 "Sergipe"=>"SE",
 "S\xC3\xA3o Paulo"=>"SP",
 "Tocantins"=>"TO"}

The output seems exactly the same. Is this a bug in the new version?

It is probably something between the new version and how Rails loads the translations. This is just a sample; I have other files with accents, and all of them raise the same exception.
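Two strings can hold identical bytes and still carry different encoding tags, which is why comparing console output alone may not reveal the difference. A minimal sketch in plain Ruby, using a hypothetical stand-in for one of the keys above (it does not call I18n at all):

```ruby
# Hypothetical stand-ins for one of the keys returned by I18n.t('states').
utf8_key   = "Amapá"      # source literal, tagged UTF-8
binary_key = utf8_key.b   # same bytes, tagged ASCII-8BIT (also called BINARY)

same_bytes = utf8_key.bytes == binary_key.bytes
encodings  = [utf8_key, binary_key].map(&:encoding)

# Joining two non-ASCII strings with these different tags raises,
# which matches the class of error the template reported.
error = begin
  "Estado: São Paulo, " + binary_key
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts same_bytes
puts encodings.inspect
puts error.class
```

In a console, something like I18n.t('states').keys.map(&:encoding).uniq should show which tags the loaded keys actually carry under each i18n version.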

Versions of i18n, rails, and anything else you think is necessary

ruby: 3.0.3
i18n: 1.9.0
rails: 6.1.4.3
rspec-rails: 5.1.0
rspec: 3.10.0

and # frozen_string_literal: true in Ruby files.

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 4
  • Comments: 21

Most upvoted comments

msgpack 1.4.5 was released a few hours ago and should solve this issue: https://rubygems.org/gems/msgpack/versions/1.4.5
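For anyone picking up that release, a Gemfile constraint along these lines should keep Bundler from resolving an older msgpack. This is only a sketch: the version floor comes from the comment above, and the right constraint style depends on your own Gemfile.

```ruby
# Gemfile
# Require at least the msgpack release that contains the encoding fix.
gem 'msgpack', '>= 1.4.5'
```

Running bundle update msgpack afterwards should refresh the lockfile.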

Ok, so the bug is actually in msgpack, I opened a PR here: https://github.com/msgpack/msgpack-ruby/pull/246

You can apply the patch with:

gem 'msgpack', github: 'Shopify/msgpack-ruby', branch: 'symbolize-keys-fix-encoding'

>> I18n.t(:foobar)
=> {:ö=>"ü"}

Alternatively, if you’d rather not run a gem branch, you can disable Bootsnap YAML caching. Sorry for the bug 😕
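A sketch of what disabling the YAML cache might look like, assuming your app currently uses the default require "bootsnap/setup" line in config/boot.rb. The cache_dir value here is an assumption; compile_cache_yaml is the Bootsnap.setup option that controls the YAML cache.

```ruby
# config/boot.rb (sketch; replaces the default `require "bootsnap/setup"` line)
require "bootsnap"

Bootsnap.setup(
  cache_dir:          "tmp/cache",  # assumed cache location, adjust to your app
  compile_cache_yaml: false         # skip the msgpack-backed YAML cache
)
```

The other Bootsnap caches keep their defaults, so only YAML loading falls back to a regular parse.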

Reviewing this again, I think we’ll just wait for a new msgpack release to happen, and then advise people who encounter this issue to upgrade to that new version.

I’ll be leaving this issue open until that new version is out.

He said early next week hopefully.

Ah damn it, I know what the problem is. It’s because Bootsnap uses msgpack to accelerate YAML parsing, and MessagePack uses an API that doesn’t preserve symbol encodings properly. See an issue I opened a while ago https://github.com/msgpack/msgpack-ruby/pull/211

Let me go over my old research and see how we could sidestep this in bootsnap. I’ll update here ASAP.
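To illustrate what losing a symbol's encoding means, here is a small plain-Ruby sketch. It does not use msgpack or Bootsnap; the binary symbol is only a stand-in for a key rebuilt from raw bytes without its encoding tag.

```ruby
# What the YAML loader is expected to produce: a UTF-8 symbol.
utf8_sym = "Amapá".to_sym

# Stand-in for a symbol rebuilt from raw bytes with no encoding information.
binary_sym = "Amapá".b.to_sym

utf8_tag   = utf8_sym.encoding
binary_tag = binary_sym.encoding

# Converting the key back to a string (as with_indifferent_access does)
# carries the binary tag along with it.
string_tag = binary_sym.to_s.encoding

# Re-tagging the same bytes as UTF-8 recovers the original symbol.
repaired = binary_sym.to_s.dup.force_encoding(Encoding::UTF_8).to_sym

puts utf8_tag
puts binary_tag
puts string_tag
puts repaired == utf8_sym
```

The last step is only for demonstration; upgrading msgpack is the fix suggested in this thread.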

Commit that breaks this behaviour is 0fda789ea745cd462658a8948ee085201aba5c6f, as discovered through a git bisect:

 ~/code/gems/i18n   v1.9.0~7^2 (bisect)  bad
0fda789ea745cd462658a8948ee085201aba5c6f is the first bad commit
commit 0fda789ea745cd462658a8948ee085201aba5c6f
Author: Paarth Madan <paarth.madan@shopify.com>
Date:   Wed Nov 3 12:33:12 2021 -0400

    Symbolize and freeze keys when loading from YAML

 lib/i18n/backend/base.rb    | 2 +-
 test/backend/simple_test.rb | 7 ++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

Thank you @joergschiller, I am not alone 🙏🏻 I was going to make a repo and you saved me. I had believed this was intended behavior in the new version, but now I think we have a bug.