pyyaml: order in dict is not preserved
Python 3.6.3
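import yaml
document = """
b:
  c: 3
  d: 4
a: 1
"""
# yaml.load() needs an explicit Loader on PyYAML >= 6.0; safe_load gives the same result here.
print(yaml.dump(yaml.safe_load(document), default_flow_style=False))
Output:
a: 1
b:
  c: 3
  d: 4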
About this issue
- State: closed
- Created 7 years ago
- Reactions: 34
- Comments: 34 (8 by maintainers)
This is a property of Python, not PyYAML. Python does not preserve the order of dictionaries and so we cannot either. To do so, you’d have to yaml.load into an OrderedDict (a dictionary implementation that preserves order). Cheers!
Correct.
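A minimal sketch of the OrderedDict approach suggested here, assuming a recent PyYAML and a SafeLoader subclass so the global loaders stay untouched. OrderedLoader and construct_ordered_mapping are illustrative names, not part of PyYAML; flatten_mapping, construct_pairs and add_constructor are existing PyYAML methods.

```python
from collections import OrderedDict

import yaml


class OrderedLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that builds every YAML mapping as an OrderedDict."""


def construct_ordered_mapping(loader, node):
    # Expand YAML merge keys (<<) first, as SafeLoader does for plain dicts.
    loader.flatten_mapping(node)
    # construct_pairs returns (key, value) tuples in document order.
    return OrderedDict(loader.construct_pairs(node))


# Register on the subclass only, so yaml.SafeLoader itself is not modified.
OrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_ordered_mapping
)

document = """
b:
  c: 3
  d: 4
a: 1
"""

data = yaml.load(document, Loader=OrderedLoader)
print(list(data))       # ['b', 'a']
print(list(data["b"]))  # ['c', 'd']
```

On Python 3.7+ a plain dict from safe_load already keeps this order, so this mainly matters on older interpreters or when you explicitly want OrderedDict objects.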
Wrong. Since the spec doesn’t guarantee an order, that means any order is valid. PyYAML could return dict keys in any arbitrary order (alphabetical, reverse-alphabetical, shortest first, random, order of creation, etc.) and it would still be perfectly consistent with the YAML specification.
In practice, the only ordering that makes any sense is the order in which they were created, because if they are returned in a different order then the information about which was created first is lost forever. If the user requires any other form of ordering (alphabetical, etc.), then he/she is able to sort the dict themself after it has been returned in creation order. However, if the dict is not returned in creation order then the user can never put it back in creation order (except by a lucky guess).
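To illustrate the argument, a small sketch assuming Python 3.7+ and yaml.safe_load (which builds plain dicts in document order): alphabetical order can always be derived from creation order, but not the other way round. The keys are made-up sample data.

```python
import yaml

document = """
zebra: 1
apple: 2
mango: 3
"""

# On Python 3.7+, safe_load returns a plain dict whose key order follows the document.
config = yaml.safe_load(document)
creation_order = list(config)

# Any other ordering can be derived afterwards, e.g. alphabetical.
alphabetical = dict(sorted(config.items()))

print(creation_order)      # ['zebra', 'apple', 'mango']
print(list(alphabetical))  # ['apple', 'mango', 'zebra']
```

Once only alphabetical is kept, nothing in it records that zebra came first.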
It is for this very reason that, since Python 3.7, dictionaries are ordered by default as a feature of the language (and not just as an implementation detail as they were in 3.6).
This is why I think returning in creation order should be the default in PyYAML (at least for Python >= 3.7) and not just an option, though I understand the desire to ensure backwards compatibility. (It should be noted, however, that nobody complained when Python dicts became ordered by default, even though it could be seen as a backwards-incompatible change.)
I wrote a drop-in replacement to address this problem: https://github.com/wimglenn/oyaml
You may import oyaml as yaml and use it as usual.
It seems Python only guarantees to keep the insertion order of dicts since 3.7: https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6 I think a sort option (default=true) would make sense. Setting it to false would then keep the original order (with Python >= 3.7).
@sigmavirus24, that’s not completely true.
So it would seem that Python does preserve the order of the dictionary. In fact, the sorting is done by PyYAML during yaml.dump(). This line in representer.py appears to be the culprit.
A whole bunch of people have created forks/extensions to PyYAML specifically to get around this issue, so it would be nice if it was fixed in PyYAML itself.
Source: https://stackoverflow.com/a/45984742
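One way to sidestep the sort without forking, sketched under the assumption of a SafeDumper subclass: register a representer that hands represent_mapping an items view instead of the dict itself. UnsortedDumper and represent_in_insertion_order are hypothetical names, and the trick relies on an implementation detail of represent_mapping (it only sorts arguments that have an .items() method), so it may not hold in every PyYAML version.

```python
from collections import OrderedDict

import yaml


class UnsortedDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass that emits mapping keys in insertion order."""


def represent_in_insertion_order(dumper, data):
    # represent_mapping() only sorts when the object it receives has an
    # .items() method; a dict_items view has none, so the key order is kept.
    return dumper.represent_mapping("tag:yaml.org,2002:map", data.items())


# Plain dicts (ordered on Python 3.7+) and OrderedDicts share the same representer.
UnsortedDumper.add_representer(dict, represent_in_insertion_order)
UnsortedDumper.add_representer(OrderedDict, represent_in_insertion_order)

data = {"b": {"c": 3, "d": 4}, "a": 1}
print(yaml.dump(data, Dumper=UnsortedDumper, default_flow_style=False), end="")
# b:
#   c: 3
#   d: 4
# a: 1
```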
sort_keys=True is not something I would expect from a dumper by default, especially not from a YAML dumper. A dumper should convert data from one format into another; there is no reason for it to sort the data. YAML is a human-readable format and is used a lot for user configs. If you run a key sort on a user config you end up with a mess. It has already happened to me a couple of times working with Python/PyYAML. It’s annoying and it shouldn’t be. Just my 2 cents. 😃
An option to not sort would be great, thanks, and fine for my intended use case, but if Python doesn’t sort by default then perhaps PyYAML shouldn’t sort by default either?
@perlpunk, the original order is not “random” order, please stop calling it that. The only random thing here was the random idea of applying a dictionary sort when the spec does not require it.
I just installed 5.1.1 using pip, and it does look like this has all been fixed for both the loader and the dumper. The loader now preserves the order by default, whereas the dumper requires setting sort_keys=False. Thanks!
Fixed by #254
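Assuming PyYAML 5.1 or newer (where dump and safe_dump accept sort_keys, as reported above for 5.1.1), a round trip that keeps the document's key order looks roughly like this:

```python
import yaml

document = """\
b:
  c: 3
  d: 4
a: 1
"""

data = yaml.safe_load(document)

# Default (sort_keys=True): keys come out alphabetically, "a: 1" first.
print(yaml.safe_dump(data, default_flow_style=False), end="")

# sort_keys=False: keys keep the dict's insertion order, i.e. the document order.
round_trip = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)
print(round_trip == document)  # True
```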
YAML dumpers can (and probably should) dump mappings with their keys sorted (by default) in environments where insertion order is not preserved. PyYAML sorts keys and doesn’t have an option not to. Having keys in a deterministic order is generally more useful than not.
The most correct and useful thing to do here is to provide a sort_keys option to dump that defaults to True. Setting it to False will get you whatever key order the Python implementation being used provides natively.
It looks like @perlpunk++'s #143 does this. I’ll try to get it released soon.
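A quick sketch of the trade-off described here, assuming PyYAML 5.1+: with the default sort_keys=True, two equal dicts built in different orders dump to identical text, while sort_keys=False follows each dict's own insertion order. The sample dicts are made up.

```python
import yaml

# Same contents, different insertion order.
first = {"name": "app", "debug": True}
second = {"debug": True, "name": "app"}

# sort_keys=True (the default) gives identical text for equal dicts...
print(yaml.safe_dump(first) == yaml.safe_dump(second))  # True

# ...while sort_keys=False reflects each dict's own insertion order.
print(yaml.safe_dump(first, sort_keys=False) == yaml.safe_dump(second, sort_keys=False))  # False
```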
@perlpunk Before 3.7 you could use OrderedDict.
@jasweet, you are actually wrong, sort_keys has been defaulting to False ever since Python 2.7 (https://docs.python.org/2.7/library/json.html#json.dump). Now try to bury this fact with “unlikes” 😉
I agree with @shoogle that, while the Spec does not guarantee order, it’s not a requirement to return keys in random order.
Regarding backwards compatibility, people might rely on the current behaviour that keys are sorted. That’s a bit different from the change in Python 3.6/3.7, where the keys were previously in random order. Changing the sort_keys option in my PR #143 to have a different default depending on the Python version might be a good compromise; OTOH it could also be confusing.
So, I have to use yet another package (oyaml), or is this going to be fixed anytime soon? 😃
btw: pprint is sorting too! -___-
pprint.pprint({}, sort_dicts=False)
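For reference, the pprint switch mentioned above exists since Python 3.8; a small sketch using pprint.pformat so the result can be compared as a string (the dict is sample data):

```python
import pprint

data = {"b": 1, "a": 2}

# pprint sorts dict keys by default...
print(pprint.pformat(data))                    # {'a': 2, 'b': 1}
# ...unless sort_dicts=False is passed (Python 3.8+).
print(pprint.pformat(data, sort_dicts=False))  # {'b': 1, 'a': 2}
```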
Agree with both of you. Like I mentioned above, json.dumps has a sort_keys option that defaults to true but can be set to false. That functionality should be added to PyYAML for sure. It simply needs a parameter-based conditional around the try block I posted above.
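A simplified sketch of the parameter-based conditional being proposed, written as a standalone hypothetical helper (mapping_items) rather than PyYAML's actual representer code: sorting is attempted only when sort_keys is true, and keys that cannot be compared fall back to insertion order.

```python
def mapping_items(mapping, sort_keys=True):
    """Hypothetical helper: return a mapping's (key, value) pairs for dumping.

    Sorting is attempted only when sort_keys is true; mixed key types that
    cannot be compared fall back to insertion order instead of raising.
    """
    items = list(mapping.items())
    if sort_keys:
        try:
            items = sorted(items)
        except TypeError:
            pass  # e.g. {1: "x", "a": "y"}: int and str keys are not comparable
    return items


print(mapping_items({"b": 1, "a": 2}))                   # [('a', 2), ('b', 1)]
print(mapping_items({"b": 1, "a": 2}, sort_keys=False))  # [('b', 1), ('a', 2)]
print(mapping_items({1: "x", "a": "y"}))                 # [(1, 'x'), ('a', 'y')]
```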
I created PR #143. Maybe @sigmavirus24 or @ingydotnet can have a look at it.
@perlpunk Just as a quip and curiosity note, the order is not random. It’s “arbitrary”, which means it’s consistent but not to be relied upon before 3.7. You can however make it truly random by either specifying PYTHONHASHSEED=random or -R when you invoke python.
I think a fundamental problem is that the YAML spec does not guarantee an order. As PyYAML is a YAML parser, guaranteeing order seems like a slight breach of the YAML spec.
That’s why there is Phynix/yamlloader, which is based on PyYAML but extends the functionality by explicitly keeping the order of OrderedDicts (and dicts for Python 3.7+). Though I want to stress that this actually breaks the YAML specification! But it is still useful…
My proposal would be not to guarantee that behavior directly in PyYAML and to rely on extensions like yamlloader. Or, in other words, how far should PyYAML deviate from the YAML spec? Any thoughts on that?
(This of course is not a vote against the ordered flag, just against giving the guarantee here.)
If it is any consolation, Python’s JSON module preserves order when dumping. Since oyaml now exists, it’s not that big of an issue but I just thought I’d throw it out there.
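A quick check of the json behaviour discussed in this thread (sample data, standard library only): json.dumps keeps insertion order unless sort_keys=True is passed.

```python
import json

data = {"b": 1, "a": 2}

# json.dumps keeps insertion order by default (sort_keys=False)...
print(json.dumps(data))                  # {"b": 1, "a": 2}
# ...and only sorts when asked to.
print(json.dumps(data, sort_keys=True))  # {"a": 2, "b": 1}
```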
@wimglenn Thanks! Worked great!
@mayou36, if insertion-ordering is optional, as it is in PR #143, then there is no problem with backwards compatibility. Furthermore, as you say, PyYAML made no guarantees about ordering anyway, so it would not be breaking the API to change to insertion-ordering by default. I’m not saying it should happen right away, but maybe after one or two releases where it was provided as an option.
@shoogle I guess both defaults can make sense. Important for me would be backwards compatibility.
Pull requests are welcome. I don’t know when I can implement that, I’m busy for a couple of weeks, and additionally I only just started to learn Python 😉 It shouldn’t be that hard, but the classes and objects in dumper/representer.py are confusing me right now.