jekyll: "Liquid Exception: invalid byte sequence in UTF-8..." with binaries

  • I believe this to be a bug, not a question about using Jekyll.
  • I updated to the latest Jekyll (or, if on GitHub Pages, to the latest github-pages).
  • I am on (or have tested on) _macOS_ 10+
  • I was trying to build.

My Reproduction Steps

  1. Create a subdirectory under the _posts folder called 2016-08-02-example.
  2. Add a file 2016-08-02-example.md to this folder. (With proper frontmatter, etc.)
  3. bundle exec jekyll serve and all is well.
  4. Add an image file (e.g. 2016-08-02-image.png) to this folder.
  5. bundle exec jekyll serve now yields an error.

With both JEKYLL_LOG_LEVEL=debug and the -t switch, the output is:

...
 Rendering Markup: _posts/2016-08-02-example/2016-08-02-example.md
         Rendering: _posts/2016-08-02-example/2016-08-02-image.png
  Pre-Render Hooks: _posts/2016-08-02-example/2016-08-02-image.png
$USER_DIR$/blog/_posts/2016-08-02-example/2016-08-02-image.png render_with_liquid?  false 
  Rendering Liquid: _posts/2016-08-02-example/2016-08-02-image.png
  Liquid Exception: invalid byte sequence in UTF-8 in $USER_DIR$/blog/_posts/2016-08-02-example/2016-08-02-image.png
$USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `split': invalid byte sequence in UTF-8 (ArgumentError)
  from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `tokenize'
  from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:122:in `parse'
  from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:108:in `parse'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:11:in `block in parse'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:47:in `measure_time'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:10:in `parse'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/renderer.rb:109:in `render_liquid'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/renderer.rb:62:in `run'
  from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/site.rb:447:in `block (2 levels) in render_docs'
...

The Output I Wanted

I would expect binary files (images, etc.) to simply be ignored and copied through to the output directory unmodified.

Previous issues

This issue seems to have come up in past issues which may relate, including: #2592, #4276, #2262, and #2228.

Root cause

This appears to be due to the logic of Jekyll::Document#render_with_liquid?:

    def render_with_liquid?
      !(coffeescript_file? || yaml_file?)
    end

As you can see, it returns true if the source file is neither CoffeeScript nor YAML, which a .png isn’t. A .png isn’t suitable for Liquid either, though, hence the problem.

As a quick and dirty test, I added a case for .png (and a couple of other binary) extensions:

    def render_with_liquid?
      !(coffeescript_file? || yaml_file? || %w(.png .jpg .bin).include?(extname))
    end

This has the desired effect, and seems to confirm I’m on the right track. Obviously it’s not a good solution though (hence no PR). Assuming that storing non-Liquid files in the same directory is supported (I believe it is?), this probably needs to ensure the content is suitable (i.e. text, likely UTF-8) before passing it to Liquid. Maybe something along the lines of checking against the existing markdown_ext setting in _config.yml?
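To make the markdown_ext idea concrete, here is a minimal sketch of an extension allowlist. The helper name markdown_extension? is hypothetical (it is not part of Jekyll), and the default list is what I understand Jekyll's default markdown_ext value to be; a real site may override it.

```ruby
# Assumed to match Jekyll's default `markdown_ext`; sites can override it in _config.yml.
DEFAULT_MARKDOWN_EXT = "markdown,mkdown,mkdn,mkd,md"

# Hypothetical helper: true when the file's extension appears in the
# comma-separated markdown_ext list.
def markdown_extension?(path, markdown_ext = DEFAULT_MARKDOWN_EXT)
  allowed = markdown_ext.split(",").map { |ext| ".#{ext.strip.downcase}" }
  allowed.include?(File.extname(path).downcase)
end
```

On its own this would be too strict, since posts can also be written in other formats such as .html, so it is probably only one part of a proper check.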

I’ll see if I can get a test together to send as a PR. While I’m not quite sure of the proper solution, I think I’ve got a solid feel for the issue behind this behavior at least.

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

I’m still having this issue, and it appears to occur without regard to whether there is a date in the image’s filename.

I’m still seeing this with Jekyll 3.4.2.

I am also seeing this issue, and have an idea that the reason the error seems intermittent is that it depends on the binary content of the image file.

In my case, I get the error for a .png file, no matter how I name it. My guess is that this is because the initial bytes of the PNG header, 0x89 0x50 0x4E 0x47, are not valid UTF-8: the first byte of a UTF-8 sequence must have either no leading one bits or two or more, whereas 0x89 (binary 10001001) has exactly one, which marks a continuation byte.

This means that as soon as you try converting even the first few bytes of a png-file to a string, you will get an encoding error.
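This is easy to check in plain Ruby, without Jekyll or Liquid. The snippet below is only an illustration: it builds the PNG signature by hand and applies a regular-expression split, which is roughly the kind of call that fails at liquid/template.rb:232 in the trace above. The regular expression is a stand-in, not Liquid's actual tokenizer pattern.

```ruby
# The 8-byte signature that starts every PNG file, as raw bytes.
png_signature = "\x89PNG\r\n\x1A\n".b

# The first byte in binary: exactly one leading one bit (10xxxxxx).
first_byte_bits = format("%08b", png_signature.getbyte(0))

# Reinterpret the same bytes as UTF-8 text, as happens when the
# file is read as a document rather than copied as a static file.
as_text = png_signature.dup.force_encoding(Encoding::UTF_8)
utf8_ok = as_text.valid_encoding?

# Try a regex split on the mis-labelled string and capture any error.
# In the trace above (Ruby 2.2.3) this kind of call raised ArgumentError.
split_error =
  begin
    as_text.split(/\{\{/) # stand-in pattern, not Liquid's real one
    nil
  rescue ArgumentError => e
    e.message
  end

puts first_byte_bits
puts utf8_ok
puts split_error.inspect
```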

A solution might be to have the front-matter detection code look for the ASCII-encoded byte string corresponding to ---\n rather than converting to a string and then comparing.
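A byte-level comparison along those lines could look like the sketch below. The constant and method names are hypothetical and this is not how Jekyll currently detects front matter; it only shows that the check can be done without ever treating the file as UTF-8 text.

```ruby
# Hypothetical marker: the front matter delimiter as raw bytes.
FRONT_MATTER_MARKER = "---\n".b

# Hypothetical helper: compare the first four bytes of the file with the
# marker. Opening with "rb" keeps the data binary, so no encoding check
# is ever applied. An empty file yields nil from read, which compares false.
def starts_with_front_matter?(path)
  File.open(path, "rb") do |file|
    file.read(FRONT_MATTER_MARKER.bytesize) == FRONT_MATTER_MARKER
  end
end
```

Note that this strict version would miss files with Windows line endings or trailing spaces after the dashes, so a real implementation would likely need to be a little more lenient.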

@nhoizey good catch! I just did a test: no error message when the image file name does not start with a date, error message when the image file is named like a post.

I’ve made some progress on this, but I have a few problems.

My branch now only renders liquid in Posts which have YAML frontmatter (@jake-low’s idea above). Other files are copied over unmodified.

However, there are a few failing tests. This is due to these files still being added to site.posts. I’m having trouble figuring out a good way to handle it; the PostReader simply creates Jekyll::Document instances for each file in _posts, which in turn populates site.posts. It seems that flagging these as Jekyll::StaticFile would be better, but due to some differences between StaticFile and Document constructors and logic, this is turning out to be somewhat difficult.

Does anyone who’s more familiar with the ...Reader classes have an idea how to address this?

@parkr No problem! It looked like other people had encountered this in the past, and not knowing what was behind it was bugging me. Happy to help when I can!

I’d maybe put that into a method like image_file? and exclude .png, .jpg, .jpeg, .svg, and maybe 1 or 2 others, but otherwise it is an acceptable PR. In theory, we should be reading those as Jekyll::StaticFiles and copying them like normal. Curious why it is seen as a document.

That’d work for my immediate need, but I worry that it’s not too flexible. Right now, I really just want image folders in post directories (which I realize is a hotly debated topic), but down the road this might expand to other binary file formats; .pdf, .zip, .tar, .tgz, etc.

For that reason, I’d really prefer an approach that confirms a Document should be rendered via Liquid rather than trying to enumerate those which shouldn’t be, or which handles a failure of Liquid to render more gracefully.
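As a rough illustration of the "confirm first, fail gracefully" idea, here is a small sketch. The method name render_or_copy is hypothetical, and the block stands in for the real Liquid rendering step; nothing here is Jekyll's actual API.

```ruby
# Hypothetical helper: render the content only if it is valid UTF-8 text,
# otherwise hand the bytes back untouched so they can be copied as-is.
def render_or_copy(raw_bytes)
  text = raw_bytes.dup.force_encoding(Encoding::UTF_8)
  return [:copied, raw_bytes] unless text.valid_encoding?

  # The block is a placeholder for the actual rendering step.
  [:rendered, yield(text)]
rescue ArgumentError
  # If the renderer still rejects the input, fall back to a plain copy.
  [:copied, raw_bytes]
end
```

With this shape there is no list of extensions to maintain: a .pdf or .zip is copied for the same reason a .png is, because its bytes are not valid text.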

I’m not sure why it’s handled as a Document object and not a StaticFile. I was playing around with the jekyll_post_files plugin, which explicitly adds files to site.static_files, but was still encountering this. The plugin seems to leave the original Document object intact, though why it’s there in the first place I’m not sure. I have this issue both with and without jekyll_post_files enabled, so I don’t think it’s the cause but it seems to exhibit the same issue.

This might be another argument for not processing posts that don’t have frontmatter. Other documents already require frontmatter, but posts are special. Presumably it’s unlikely for a .png to begin with “---\n---” or similar.

Good idea, @jake-low! I like this approach better, for reasons noted above; it’s more flexible when encountering file types we didn’t anticipate and explicitly add logic for. Nearly any binary format (really, any I’m aware of) should have some sort of magic number that isn’t “---\n”, as will other text files that don’t need rendering. I don’t think we should check deeper than that though; the frontmatter itself will vary, and I’m not sure that making sure the frontmatter tag is closed is important enough here to concern ourselves with, when it would make us sensitive to various text encodings. This should be a relatively simple change, and one that I think is much more future-proof and flexible.

So long as there aren’t cases where we need to render liquid files that don’t contain frontmatter at all, this seems like a really good approach. I can’t think of any such cases in the context of rendering collection directories. Liquid includes are handled later (by Liquid) and so wouldn’t be affected by this logic. Anyone foresee problems?

I’ll work on putting together a pull request that implements a has_frontmatter? method which performs this check and is called as part of the render_with_liquid? check. From there, the maintainers can determine if that’s something they’d like to incorporate.
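Roughly, the shape I have in mind is sketched below. SketchDocument is a stand-in class for illustration only (it is not Jekyll::Document), the coffeescript_file? and yaml_file? bodies are my assumptions about what those checks do, and the has_frontmatter? body is one possible implementation rather than the final one.

```ruby
# Stand-in for illustration only; this is not Jekyll::Document.
class SketchDocument
  attr_reader :path

  def initialize(path)
    @path = path
  end

  def extname
    File.extname(path)
  end

  # Assumed behaviour of the existing checks.
  def coffeescript_file?
    extname == ".coffee"
  end

  def yaml_file?
    %w(.yaml .yml).include?(extname)
  end

  # Proposed check: read at most 64 bytes of the first line in binary
  # mode and test for the opening front matter delimiter.
  def has_frontmatter?
    first_line = File.open(path, "rb") { |file| file.gets("\n", 64) }
    !first_line.nil? && first_line.match?(/\A---\s*\r?\n/)
  end

  # Only files that open with front matter are handed to Liquid.
  def render_with_liquid?
    has_frontmatter? && !(coffeescript_file? || yaml_file?)
  end
end
```

Because the file is read in binary mode, the PNG signature never goes through a UTF-8 validity check; it simply fails to match the delimiter.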