psych: YAML.safe_load fails when a string contains a non-existent date

YAML.safe_load will raise an exception when you try to load text that happens to contain a sequence of numbers that looks like a date but is not:

require 'yaml'

s = "2016-02-31"
YAML.safe_load(s.to_yaml)
# => Psych::DisallowedClass: Tried to load unspecified class: Date

Using YAML.load instead of safe_load works fine, and so does text that contains a valid date. But this means anyone can trigger an exception (accidentally or otherwise) in any application that calls YAML.safe_load on user-provided text.
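An application that parses user-provided text can at least keep the exception from propagating. The following is a minimal sketch, assuming that returning nil for rejected input is acceptable; `load_user_yaml` is a hypothetical helper name, not part of Psych:

```ruby
require 'yaml'

# Hypothetical helper: parse untrusted YAML without letting Psych's
# exceptions escape to the caller.
def load_user_yaml(text)
  YAML.safe_load(text)
rescue Psych::DisallowedClass, Psych::SyntaxError => e
  warn "rejected YAML input: #{e.message}"
  nil
end

load_user_yaml("--- 2016-02-28\n") # unquoted date: returns nil instead of raising
load_user_yaml("--- hello\n")      # ordinary string: returns "hello"
```

This only contains the failure; the input is still rejected rather than parsed.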

About this issue

  • State: open
  • Created 9 years ago
  • Reactions: 15
  • Comments: 22 (4 by maintainers)

Most upvoted comments

I just got hit by this after the hashie library updated and they now call safe_load. I would really like to request that this method just returns strings for anything that is deemed unsafe to be parsed.

The difference is that at the end of the chain I can work with sanitized data. If I really want a Date object I can do the conversion myself, or I can use the value as a string. But if reading a user-supplied configuration file leads to a backtrace, then that’s it. Game over. Epic fail. Because a library like hashie makes the call, I have no control over whether load or safe_load is used. Even if I did, I would prefer to be safe and decide how I validate date strings before converting them, but I don’t want it to become impossible to read in a perfectly valid YAML file.

My program doesn’t need to have values automagically converted, but asking every user of my library, for all eternity, to never put an unquoted date in a YAML file is impossible.

Can anyone explain what is the problem with @najamelan’s suggestion?

“I would really like to request that this method just returns strings for anything that is deemed unsafe to be parsed.”

Users can put quotes around dates themselves, but in many cases that may prevent interoperability between Ruby/Psych-based apps (in my case gollum) and other applications that parse YAML, when the latter do support the parsing of dates. If Psych just returned a string for anything that looks like a date, the user wouldn’t need to worry about how to specify their data.
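The requested behaviour can be approximated today on the caller's side. The following is only a sketch, assuming Psych 3.1 or later (for the `permitted_classes:` keyword); `stringify_dates` and `safe_load_dates_as_strings` are hypothetical helper names. It permits Date during loading and then converts every Date back to a string:

```ruby
require 'yaml'
require 'date'

# Recursively replace Date objects with their ISO 8601 string form.
def stringify_dates(value)
  case value
  when Date  then value.iso8601
  when Hash  then value.to_h { |k, v| [stringify_dates(k), stringify_dates(v)] }
  when Array then value.map { |v| stringify_dates(v) }
  else value
  end
end

# Load with Date permitted, then hand back plain strings only.
def safe_load_dates_as_strings(text)
  stringify_dates(YAML.safe_load(text, permitted_classes: [Date]))
end

safe_load_dates_as_strings("released: 2016-02-28\nbogus: 2016-02-31\n")
# => {"released"=>"2016-02-28", "bogus"=>"2016-02-31"}
```

One caveat of this approach: a valid date is normalised on the way through, so an input such as 2016-2-3 would come back as "2016-02-03" rather than in its original spelling. Timestamps that Psych would load as Time are not covered.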

When you do '2013-01-01'.to_yaml, Psych looks at the string and realizes that it is ambiguous, so the resulting YAML is quoted:

irb(main):002:0> require 'psych'
=> true
irb(main):003:0> '2013-01-01'.to_yaml
=> "--- '2013-01-01'\n"
irb(main):004:0>

Note the single quotes around the date string in the resulting YAML.

Since '2013-02-31' isn’t a valid date, Psych does not consider it ambiguous, so it doesn’t add quotes around it:

irb(main):004:0> '2013-02-31'.to_yaml
=> "--- 2013-02-31\n...\n"
irb(main):005:0>

The single quotes tell the parser “this is absolutely a string, do not check for other values”. The second value isn’t ambiguous when dumping the YAML, but is ambiguous when parsing. Maybe that helps?
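To check which strings are affected on a given Psych version, the dump and safe_load steps can be combined into one predicate. This is a small sketch; `round_trips_safely?` is a hypothetical helper name:

```ruby
require 'yaml'

# Hypothetical helper: does a string survive dump -> safe_load unchanged?
def round_trips_safely?(str)
  YAML.safe_load(str.to_yaml) == str
rescue Psych::DisallowedClass
  false
end

round_trips_safely?('2013-01-01') # true: quoted on dump, so it loads as a String
round_trips_safely?('2013-02-31') # false on versions that dump it unquoted
```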

Ensure that a string that looks like a date but is not a valid one, such as 0000-00-00, is treated as “ambiguous” and gets quoted. What would be the drawback of that type of solution?

Honestly, I can’t think of any drawback. I’d merge a commit that does that.
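The idea can be tried outside Psych by building the scalar node by hand. This is a sketch only, not the change that would be merged: `DATE_LIKE` and `dump_string_quoting_date_like` are hypothetical names, and the pattern is an assumption about what counts as date-like rather than the expression Psych itself uses.

```ruby
require 'yaml'

# Assumed pattern: anything shaped like YYYY-M-D, whether or not the
# calendar date exists.
DATE_LIKE = /\A\d{4}-\d{1,2}-\d{1,2}\z/

def dump_string_quoting_date_like(str)
  return str.to_yaml unless DATE_LIKE.match?(str)

  # Build the YAML AST manually so the scalar is emitted in single quotes.
  scalar = Psych::Nodes::Scalar.new(
    str, nil, nil, false, true, Psych::Nodes::Scalar::SINGLE_QUOTED
  )
  doc = Psych::Nodes::Document.new
  doc.children << scalar
  stream = Psych::Nodes::Stream.new
  stream.children << doc
  stream.to_yaml
end

yaml = dump_string_quoting_date_like('2013-02-31')
YAML.safe_load(yaml) # the quoted scalar loads as a String again
```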

On a different subject, maybe in the future we should add a safe_dump. I want to guarantee that objects can round trip through dump and load, but I’m not sure if we can make the same guarantee about dump -> safe_load. This case seems like we can though.

“Users can put quotes around dates themselves, but in many cases that may prevent interoperability between Ruby/Psych-based apps”

Fully agree with @dometto.

The YAML spec does support “date” natively: https://yaml.org/spec/1.2.2/

Example 2.22 Timestamps

date: 2002-12-14

Example 2.23 Various Explicit Tags

not-date: !!str 2002-04-28

It is clear that quoting or escaping the date is not the correct solution (according to the spec).
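For comparison, this is how the two spec examples behave under safe_load with no extra options: the explicitly tagged value loads as a string, while the plain date is rejected.

```ruby
require 'yaml'

# Example 2.23: the !!str tag forces a string, so no Date is involved.
tagged = YAML.safe_load("not-date: !!str 2002-04-28")
# tagged => {"not-date"=>"2002-04-28"}

# Example 2.22: the plain scalar resolves to a Date, which safe_load refuses.
begin
  YAML.safe_load("date: 2002-12-14")
rescue Psych::DisallowedClass => e
  plain_error = e.message
end
```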

Is there a plan to resolve this issue? @najamelan’s suggestion of “just returns strings for anything that is deemed unsafe to be parsed.” seems reasonable as it allows for the user to handle any “unsafe” cases.

Thanks!

Is there any way around the issue? A date notation like 20190101 works fine, but I need 2019-01-01.
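One possible workaround, assuming Psych 3.1 or later and assuming it is acceptable for the application to receive Date objects, is to permit the Date class explicitly:

```ruby
require 'yaml'
require 'date'

valid   = YAML.safe_load("--- 2019-01-01\n", permitted_classes: [Date])
invalid = YAML.safe_load("--- 2016-02-31\n", permitted_classes: [Date])

valid.class   # Date
invalid.class # String, because the value is not a real calendar date
```

This only helps when the call to safe_load is under your control; it does not apply when a third-party library makes the call.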

I guess this is because Psych leans on the Date class to determine what is a valid date and what is not. I’m not really keen on writing my own date validation logic. Do you have a suggestion for how to fix this?
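For reference, this is the kind of validation the Date class provides, shown here outside Psych. `parse_iso_date` is a hypothetical helper that returns nil instead of raising:

```ruby
require 'date'

Date.valid_date?(2016, 2, 29) # true, 2016 is a leap year
Date.valid_date?(2016, 2, 31) # false

# Hypothetical helper: parse YYYY-MM-DD, or return nil if the date
# does not exist.
def parse_iso_date(str)
  Date.strptime(str, '%Y-%m-%d')
rescue ArgumentError
  nil
end

parse_iso_date('2016-02-31') # nil
```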