commonmarker: Error "incompatible character encodings: UTF-8 and ASCII-8BIT" when used in a Rails app

This might not be a commonmarker bug as such, but the error is not raised with either pandoc-ruby or redcarpet, so commonmarker is involved somehow.

Here you can see a test run from the command line with both cmark and commonmarker and there’s no problem:

$ cat test-curly-quotes.md
This curly quote “makes commonmarker throw an exception”.

$ cmark --version
cmark 0.20.0 - CommonMark converter
(C) 2014, 2015 John MacFarlane

$ cmark test-curly-quotes.md
<p>This curly quote “makes commonmarker throw an exception”.</p>

$ gem list --local commonmarker

*** LOCAL GEMS ***

commonmarker (0.2.0)

$ cat test-curly-quotes.md | ruby -r commonmarker -e "puts CommonMarker.render_html(gets)"
<p>This curly quote “makes commonmarker throw an exception”.</p>

That said, I’m testing different markdown parsers/renderers for our Rails 4.1.12 (Ruby 2.2.2) app, and I’m getting the following error:

ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT):
    12:       - if user_signed_in?
    13:         .outline-content
    14:           = commonmarker_markdown(@quimbee_outline.source)
  app/views/outlines/show.html.slim:15:in `_app_views_outlines_show_html_slim___3317075370232322437_70158621096300'


  Rendered /Users/oboxodo/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.1.12/lib/action_dispatch/middleware/templates/rescues/_trace.html.erb (2.9ms)
  Rendered /Users/oboxodo/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.1.12/lib/action_dispatch/middleware/templates/rescues/_request_and_response.html.erb (1.7ms)
  Rendered /Users/oboxodo/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.1.12/lib/action_dispatch/middleware/templates/rescues/template_error.html.erb within rescues/layout (69.1ms)

I have these helpers:

# encoding: UTF-8
module ApplicationHelper
  def commonmarker_markdown(text)
    CommonMarker.render_html(text, :smart).html_safe
  end

  def pandoc_markdown(text)
    converter = PandocRuby.new(text, from: :markdown, to: :html)
    converter.convert.html_safe
  end

  def redcarpet_markdown(text)
    # ...
  end
end

Changing the call to commonmarker_markdown to either pandoc_markdown or redcarpet_markdown renders the expected result with no errors.

It’s not a DB (PostgreSQL) encoding problem either: hardcoding the test phrase in place of the text variable (no DB involved) causes the same error.

Any ideas about what could be happening?
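
For anyone trying to narrow this down, here is a minimal sketch of how this error message can arise in plain Ruby, with no Rails and no commonmarker involved. It assumes (this is a guess, not something confirmed above) that the rendered HTML comes back tagged as ASCII-8BIT rather than UTF-8. render_as_binary and retag_utf8 are hypothetical helper names made up for the illustration; they are not part of any gem.

```ruby
# Hypothetical stand-in for a renderer that hands back raw bytes.
# This is NOT CommonMarker; it only imitates a string tagged ASCII-8BIT.
def render_as_binary(text)
  "<p>#{text}</p>\n".b # String#b returns a copy tagged ASCII-8BIT
end

# Re-tag the same bytes as UTF-8 without transcoding them.
def retag_utf8(html)
  html.dup.force_encoding(Encoding::UTF_8)
end

source = "This curly quote “makes commonmarker throw an exception”."
html = render_as_binary(source)

# Printing a binary string just writes its bytes, so output to a UTF-8
# terminal looks fine even though the encoding tag is wrong.
puts html
puts html.encoding

# Concatenating it with a UTF-8 string that also holds non-ASCII
# characters (as a template buffer might) raises the error.
begin
  "“page title” " + html
rescue Encoding::CompatibilityError => e
  puts e.message
end

# After re-tagging, the same concatenation succeeds.
fixed = retag_utf8(html)
puts fixed.valid_encoding?
puts "“page title” " + fixed
```

If that assumption holds, it would also be consistent with the one-liner above working from the command line, since puts never has to combine the two encodings. Checking the encoding of the string returned by the helper in a Rails console would confirm or rule this out.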

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 28 (16 by maintainers)

Most upvoted comments

Due to https://github.com/gjtorikian/commonmarker/pull/186, walking over nodes has been removed in v1.0.0. Users can use https://github.com/gjtorikian/html-pipeline if they wish to iterate over HTML after the fact.

Interesting, I ran those steps on a fresh rbenv env, and I still get the same result. Do you get a different result with the code block I posted above?

Oh shoot, I do. Ok. I’ll make time for this today.

Got it. In Ruby, the convention is to use \u followed by hexadecimal digits to write a Unicode code point inside a double-quoted string:

irb(main):006:0> str = "hello: <https://world.com\u200b>"
=> "hello: <https://world.com​ >"

I can now reproduce the problem; now we’re getting somewhere.
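
Since U+200B (zero-width space) is invisible in most terminals, a short sketch like the following can help confirm it is really in the string. It only uses core String methods; the show_format_chars helper name is made up for this illustration.

```ruby
# Replace invisible "format" characters (Unicode category Cf, which
# includes U+200B ZERO WIDTH SPACE) with a visible \uXXXX escape.
# Hypothetical helper, for debugging only.
def show_format_chars(str)
  str.gsub(/\p{Cf}/) { |ch| format("\\u%04x", ch.ord) }
end

str = "hello: <https://world.com\u200b>"

# The character is present even though it does not show when printed.
puts str.include?("\u200b")

# List the last two code points in U+XXXX form.
puts str.codepoints.last(2).map { |cp| format("U+%04X", cp) }.join(" ")

# Print the string with the invisible character made visible.
puts show_format_chars(str)
```

Running the same helper over a suspect markdown document is a quick way to spot stray zero-width characters pasted in from elsewhere.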

So you can absolutely walk the AST: https://github.com/gjtorikian/commonmarker#example-walking-the-ast

But that’s very slow/time-consuming, and ideally shouldn’t be necessary. Are you able to share your markdown doc or create a small (failing) test to show the error?