jekyll: "Liquid Exception: invalid byte sequence in UTF-8..." with binaries
- I believe this to be a bug, not a question about using Jekyll.
- I updated to the latest Jekyll (or, if on GitHub Pages, to the latest `github-pages`)
- I am on (or have tested on) _macOS_ 10+
- I was trying to build.
My Reproduction Steps
- Create a subdirectory under the `_posts` folder called `2016-08-02-example`.
- Add a file `2016-08-02-example.md` to this folder (with proper frontmatter, etc.). Run `bundle exec jekyll serve` and all is well.
- Add an image file (e.g. `08-02-16-image.png`) to this folder. `bundle exec jekyll serve` now yields an error.
With both `JEKYLL_LOG_LEVEL=debug` and the `-t` switch, the output is:
...
Rendering Markup: _posts/2016-08-02-example/2016-08-02-example.md
Rendering: _posts/2016-08-02-example/2016-08-02-image.png
Pre-Render Hooks: _posts/2016-08-02-example/2016-08-02-image.png
$USER_DIR$/blog/_posts/2016-08-02-example/2016-08-02-image.png render_with_liquid? false
Rendering Liquid: _posts/2016-08-02-example/2016-08-02-image.png
Liquid Exception: invalid byte sequence in UTF-8 in $USER_DIR$/blog/_posts/2016-08-02-example/2016-08-02-image.png
$USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:232:in `tokenize'
from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:122:in `parse'
from $USER_DIR$/.gem/ruby/2.2.3/gems/liquid-3.0.6/lib/liquid/template.rb:108:in `parse'
from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:11:in `block in parse'
from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:47:in `measure_time'
from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/liquid_renderer/file.rb:10:in `parse'
from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/renderer.rb:109:in `render_liquid'
from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/renderer.rb:62:in `run'
from $USER_DIR$/.gem/ruby/2.2.3/bundler/gems/jekyll-6d2b344c0e24/lib/jekyll/site.rb:447:in `block (2 levels) in render_docs'
...
The Output I Wanted
I would expect binary files (images, etc.) to simply be ignored and copied through to the output directory unmodified.
Previous issues
This issue seems to have come up in past issues which may relate, including: #2592, #4276, #2262, and #2228.
Root cause
This appears to be due to the logic of Jekyll::Document.render_with_liquid?
def render_with_liquid?
!(coffeescript_file? || yaml_file?)
end
As you can see, it returns true if the source file is not CoffeeScript or YAML, which a `.png` isn’t. It’s not suitable for Liquid either, though, hence the problem.
As a quick and dirty test, I added a case for `.png` extensions:
def render_with_liquid?
  # Quick test only: also skip a few hard-coded binary extensions.
  !(coffeescript_file? || yaml_file? || %w(.png .jpg .bin).include?(extname))
end
This has the desired effect, and seems to confirm I’m on the right track. Obviously it’s not a good solution though (hence no PR). Assuming that storing non-Liquid files in the same directory is supported (I believe it is?), this probably needs to ensure the content is suitable (i.e. text, likely UTF-8) before passing it to Liquid. Maybe something along the lines of checking against `_config.yml`’s existing `markdown_ext`?
I’ll see if I can get a test together to send as a PR. While I’m not quite sure of the proper solution, I think I’ve got a solid feel for the issue behind this behavior at least.
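To make the `markdown_ext` idea a bit more concrete, here is a minimal sketch of an extension allow-list. The helper name `renderable_extension?` is hypothetical and not part of Jekyll, and the default list is assumed to match Jekyll’s documented default for `markdown_ext`. Note that an allow-list like this would also exclude other text formats (HTML posts, for instance) unless they were added to it.

```ruby
# Assumption: this mirrors Jekyll's documented default for markdown_ext.
DEFAULT_MARKDOWN_EXT = "markdown,mkdown,mkdn,mkd,md"

# Hypothetical helper (not a Jekyll API): true when the file's extension
# appears in the comma-separated markdown_ext list.
def renderable_extension?(path, markdown_ext = DEFAULT_MARKDOWN_EXT)
  allowed = markdown_ext.split(",").map { |ext| ".#{ext.strip.downcase}" }
  allowed.include?(File.extname(path).downcase)
end

puts renderable_extension?("_posts/2016-08-02-example/2016-08-02-example.md")
puts renderable_extension?("_posts/2016-08-02-example/2016-08-02-image.png")
```

With a check like this the Markdown post would go to Liquid and the image would not, without having to enumerate binary extensions.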
About this issue
- State: closed
- Created 8 years ago
- Comments: 19 (11 by maintainers)
Commits related to this issue
- Add files to repro issue in https://github.com/jekyll/jekyll/issues/5181 - `mkdir _posts/2016-08-05-example` - `cd _posts/2016-08-05-example` - `echo "---\n---Test!" > _posts/2016-08-05-example/2016-... — committed to jjarmoc/jekyll-example by jjarmoc-sfdc 8 years ago
- Work in progress for https://github.com/jekyll/jekyll/issues/5181 [x] Does not invoke Liquid renderer on files without YAML frontmatter. [x] Copies files without frontmatter to the corresponding outp... — committed to jjarmoc/jekyll by jjarmoc-sfdc 8 years ago
- skip adding binary files as posts fixes #5181 — committed to Crunch09/jekyll by Crunch09 7 years ago
I’m still having this issue, and it appears to occur without regard to whether there is a date in the image’s filename.
I’m still seeing this with Jekyll 3.4.2.
I am also seeing this issue, and have an idea that the reason the error seems intermittent is that it depends on the binary content of the image file.
In my case, I get the error for a .png file, no matter how I name it. My guess is that this is because the initial bytes of the PNG header, which read `0x8950 4e47`, are not valid UTF-8: the first byte of every valid UTF-8 sequence has either no leading ones or two or more leading ones in its binary form, whereas 0x89 (10001001) has exactly one. This means that as soon as you try converting even the first few bytes of a PNG file to a string, you will get an encoding error.
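This is easy to check in plain Ruby, independent of Jekyll. The sketch below is only an illustration: `valid_utf8?` is a hypothetical helper name, and the byte values are the standard eight-byte PNG signature.

```ruby
# The eight-byte PNG signature; it begins with 0x89 0x50 0x4E 0x47.
PNG_SIGNATURE = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack("C*")

# Hypothetical helper: true when the given bytes form valid UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Show the bit pattern of the first byte (a single leading one).
puts format("%08b", PNG_SIGNATURE.getbyte(0))

puts valid_utf8?("---\ntitle: Example\n---\n")
puts valid_utf8?(PNG_SIGNATURE)

begin
  # Regexp operations refuse strings whose encoding is broken, which is
  # the kind of failure the Liquid tokenizer runs into.
  PNG_SIGNATURE.dup.force_encoding(Encoding::UTF_8).match?(/\{\{/)
rescue ArgumentError => e
  puts e.message
end
```

Because the very first byte is already invalid, the file name and the rest of the image content should not matter once the file reaches Liquid.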
A solution might be to have the front-matter detection code look for the ASCII-encoded byte string corresponding to `---\n` rather than converting to a string and then comparing.
@nhoizey good catch! I just did a test: no error message when the image file does not start with a date, error message when the image file is named like a post.
I’ve made some progress on this, but I have a few problems.
My branch now only renders liquid in Posts which have YAML frontmatter (@jake-low’s idea above). Other files are copied over unmodified.
However, there are a few failing tests. This is due to these files still being added to `site.posts`. I’m having trouble figuring out a good way to handle it; the `PostReader` simply creates `Jekyll::Document` instances for each file in `_posts`, which in turn populates `site.posts`. It seems that flagging these as `Jekyll::StaticFile` would be better, but due to some differences between `StaticFile` and `Document` constructors and logic, this is turning out to be somewhat difficult.
Does anyone who’s more familiar with the `...Reader` classes have an idea how to address this?
@parkr No problem! It looked like other people had encountered this in the past, and not knowing what was behind it was bugging me. Happy to help when I can!
That’d work for my immediate need, but I worry that it’s not too flexible. Right now, I really just want image folders in post directories (which I realize is a hotly debated topic), but down the road this might expand to other binary file formats: .pdf, .zip, .tar, .tgz, etc.
For that reason, I’d really prefer an approach that confirms a `Document` should be rendered via Liquid rather than trying to enumerate those which shouldn’t be, or which handles a failure of Liquid to render more gracefully.
I’m not sure why it’s handled as a `Document` object and not a `StaticFile`. I was playing around with the jekyll_post_files plugin, which explicitly adds files to `site.static_files`, but was still encountering this. The plugin seems to leave the original `Document` object intact, though why it’s there in the first place I’m not sure. I have this issue both with and without `jekyll_post_files` enabled, so I don’t think it’s the cause, but it seems to exhibit the same issue.
Good idea, @jake-low! I like this approach better, for reasons noted above; it’s more flexible when encountering file types we didn’t anticipate and explicitly add logic for. Nearly any binary format (really, any I’m aware of) should have some sort of magic number that isn’t `---\n`, as will other text files that don’t need rendering. I don’t think we should check deeper than that though; the frontmatter itself will vary, and I’m not sure that making sure the frontmatter tag is closed is important enough here to concern ourselves with, when it would make us sensitive to various text encodings. This should be a relatively simple change, and one that I think is much more future-proof and flexible.
So long as there aren’t cases where we need to render Liquid files that don’t contain frontmatter at all, this seems like a really good approach. I can’t think of any such cases in the context of rendering collection directories. Liquid includes are handled later (by Liquid) and so wouldn’t be affected by this logic. Anyone foresee problems?
I’ll work on putting together a pull request that implements a `has_frontmatter?` method which performs this check and is called as part of the `render_with_liquid?` check. From there, the maintainers can determine if that’s something they’d like to incorporate.
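Roughly, the byte-level check could look like the sketch below. This is a standalone illustration rather than the actual patch: both methods take a path instead of living on `Jekyll::Document`, the extension test is only an assumed approximation of `coffeescript_file?` and `yaml_file?`, and files with CRLF line endings are not handled.

```ruby
require "tempfile"

# The front matter marker as raw bytes, so no string decoding is involved.
FRONT_MATTER_MARKER = "---\n".b

# Hypothetical standalone version: compares only the leading bytes of the
# file against the marker, reading in binary mode.
def has_frontmatter?(path)
  File.binread(path, FRONT_MATTER_MARKER.bytesize) == FRONT_MATTER_MARKER
end

# Stand-in for the real method. Assumption: the extension list below
# approximates the coffeescript_file? / yaml_file? predicates.
def render_with_liquid?(path)
  !%w(.coffee .yml .yaml).include?(File.extname(path)) && has_frontmatter?(path)
end

# A post with front matter should be rendered.
Tempfile.create(["2016-08-02-example", ".md"]) do |post|
  post.binmode
  post.write("---\ntitle: Example\n---\nTest!\n")
  post.flush
  puts render_with_liquid?(post.path)
end

# A file starting with the PNG signature should be passed through.
Tempfile.create(["2016-08-02-image", ".png"]) do |image|
  image.binmode
  image.write([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack("C*"))
  image.flush
  puts render_with_liquid?(image.path)
end
```

Since only raw bytes are compared, the file content is never interpreted as UTF-8 at this stage, so a binary file cannot trigger the encoding error here.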