jekyll: Slugify a string doesn't seem to work on Unicode/Swedish letters

What version of Jekyll are you using (jekyll -v)?

v3.0.3 (using github-pages gem)

What operating system are you using?

Windows 7 Ultimate 64bit

What did you do?

I tried to slugify strings that contain Swedish letters. The filter does not behave as I would expect, which is to either remove those letters or replace them with their ASCII equivalents. Example string: {{ 'kvalité, då, äta, öl' | slugify }}

What did you expect to see?

kvalit-d-ta-l OR kvalite-da-ata-ol

What did you see instead?

kvalité-då-äta-öl


My workaround would be to replace or remove the individual letters before slugifying, but I’m wondering if there’s anything else I can do. Example replace: {{ 'kvalité, då, äta, öl' | replace: 'é', 'e' | replace: 'å', 'a' | replace: 'ä', 'a' | replace: 'ö', 'o' | slugify }}

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

I know that Rails has a method for this: http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-transliterate

I do not know whether we can use it directly or would have to copy the code.

To keep the discussion going, I opened up a PR: #6509

I believe this issue was actually solved when ascii mode was introduced to slugify: https://github.com/jekyll/jekyll/pull/4680

Basically, the author could do {{ 'kvalité, då, äta, öl' | slugify: 'ascii' }} and get one of the expected outputs: kvalit-d-ta-l. But I thought it was a good idea to run transliterate first, so that we convert as many characters to ASCII as we can before dropping the rest. This does add ActiveSupport as a dependency, and I’m not sure whether the maintainers would prefer to avoid that.
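To make the difference between the two expected outputs concrete, here is a minimal Ruby sketch. It is not Jekyll's implementation and it does not use ActiveSupport: `ascii_slug` and `strip_diacritics` are hypothetical helper names, and the transliteration step is approximated with the standard library's `String#unicode_normalize` rather than `ActiveSupport::Inflector#transliterate`.

```ruby
# Hypothetical helpers for illustration only; not Jekyll's internals.

# Approximates an ASCII-only slug: every run of characters outside
# A-Z, a-z and 0-9 collapses into a single hyphen.
def ascii_slug(text)
  text.gsub(/[^A-Za-z0-9]+/, "-").gsub(/\A-+|-+\z/, "").downcase
end

# Decomposes accented letters (NFD) and removes the combining marks,
# so "é" becomes "e" before the ASCII filter runs.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

input = "kvalité, då, äta, öl"

puts ascii_slug(input)                    # kvalit-d-ta-l
puts ascii_slug(strip_diacritics(input))  # kvalite-da-ata-ol
```

Note that this stand-in only handles letters that decompose into a base letter plus a combining mark. Letters such as "ø" or "ß" have no such decomposition and would still be dropped, which is one reason a real transliteration table may be preferable.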

Besides that, I think from the discussion above, there’s another potential PR to add a config option to slugify categories. Maybe it’d be best to spawn off a new issue specifically for that feature and take it from there.

@DirtyF There is no built-in Ruby method for this IMHO, that’s why Rails added ActiveSupport::Inflector#transliterate like @rriemann pointed out.

I have read #129 and #782, where @parkr explained his concerns about breaking URLs created with the current system. I think one way to provide the behavior expected by many users whose languages use accented Latin characters would be an option such as slugify_categories that can be set in the config. It would default to false to ensure backward compatibility.

If set to true, it would slugify the categories as follows:

'Actualité européenne'  -> 'actualite-europeenne'
'Acentuação'            -> 'acentuacao'

What do you think?
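As a rough sketch of how such an opt-in switch might behave: `slugify_categories` is the option proposed in this comment, not an existing Jekyll setting, and `category_for_url` is a hypothetical helper that uses Unicode normalization from Ruby's standard library as a stand-in for real transliteration.

```ruby
# Hypothetical sketch: neither the option nor the helper exists in Jekyll.
def category_for_url(category, config)
  # Default to false so existing URLs stay unchanged.
  return category unless config.fetch("slugify_categories", false)

  category.unicode_normalize(:nfd)    # split "é" into "e" + combining accent
          .gsub(/\p{Mn}/, "")         # drop the combining marks
          .downcase
          .gsub(/[^a-z0-9]+/, "-")    # collapse everything else into hyphens
          .gsub(/\A-+|-+\z/, "")      # trim leading and trailing hyphens
end

enabled = { "slugify_categories" => true }

puts category_for_url("Actualité européenne", enabled)  # actualite-europeenne
puts category_for_url("Acentuação", enabled)            # acentuacao
puts category_for_url("Acentuação", {})                 # Acentuação (unchanged)
```

Keeping the default at false means a site that never sets the option would generate the same category URLs as before.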