pyyaml: Duplicate keys are not handled properly
YAML spec in version 1.2 says that in a valid YAML file, mapping keys are unique.
This is not the case in pyyaml, as can be seen by loading this sample file:
a:
- b
- c
a:
q: a
q: b
The correct result from loading this file is an error. pyyaml instead naively overwrites results with the last key, resulting in this dict: {'a': {'q': 'b'}}
.
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 42
- Comments: 20 (7 by maintainers)
Commits related to this issue
- Update git submodules * Update placement from branch 'master' - Merge "Fix a bad granular gabbi test" - Fix a bad granular gabbi test A gabbi test for multiple member_of<N> was using the... — committed to openstack/openstack by deleted user 5 years ago
- Fix a bad granular gabbi test A gabbi test for multiple member_of<N> was using the `query_parameters` keyword to construct the querystring; but it was trying to test what happens when member_of<N> is... — committed to openstack/placement by deleted user 5 years ago
- Disallow duplicate keys when loading dicts from YAML Part of the fix for: https://github.com/pyvec/naucse/issues/2 Workaround for: https://github.com/yaml/pyyaml/issues/165 The workaround is incompl... — committed to encukou/naucse_render by encukou 5 years ago
- Disallow duplicate keys when loading dicts from YAML Part of the fix for: https://github.com/pyvec/naucse/issues/2 Workaround for: https://github.com/yaml/pyyaml/issues/165 The workaround is incompl... — committed to encukou/naucse_render by encukou 5 years ago
- Report errors when loading YAML files with duplicate keys This adds a custom constructor to the YAML loader created in get_yaml_loader(), so that duplicate keys in the YAML file result in a PyYAML Co... — committed to jgehring/omegaconf by jgehring 4 years ago
- Report errors when loading YAML files with duplicate keys This adds a custom constructor to the YAML loader created in get_yaml_loader() so that duplicate keys in the YAML file result in a PyYAML Con... — committed to jgehring/omegaconf by jgehring 4 years ago
- Report errors when loading YAML files with duplicate keys (#257) * Report errors when loading YAML files with duplicate keys This adds a custom constructor to the YAML loader created in get_yaml_... — committed to omry/omegaconf by jgehring 4 years ago
- Report errors when loading YAML files with duplicate keys (#257) * Report errors when loading YAML files with duplicate keys This adds a custom constructor to the YAML loader created in get_yaml_... — committed to omry/omegaconf by jgehring 4 years ago
- Throw ParseError on duplicate YAML keys Duplicate keys are not allowed, according to the YAML specification. Under the hood, we use the PYYAML library. For the time being, PYYAML parser accepts dupli... — committed to transifex/openformats by ThemisB 4 years ago
- Throw ParseError on duplicate YAML keys Duplicate keys are not allowed, according to the YAML specification. Under the hood, we use the PYYAML library. For the time being, PYYAML parser accepts dupli... — committed to transifex/openformats by ThemisB 4 years ago
- Update on "assert no duplicate yaml keys in codegen" The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate key... — committed to pytorch/pytorch by bdhirsh 3 years ago
- Update on "assert no duplicate yaml keys in codegen" The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate key... — committed to pytorch/pytorch by bdhirsh 3 years ago
- Update base for Update on "assert no duplicate yaml keys in codegen" The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwri... — committed to pytorch/pytorch by bdhirsh 3 years ago
- Update on "assert no duplicate yaml keys in codegen" The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate key... — committed to pytorch/pytorch by bdhirsh 3 years ago
- Update base for Update on "assert no duplicate yaml keys in codegen" The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwri... — committed to pytorch/pytorch by bdhirsh 3 years ago
- Update on "assert no duplicate yaml keys in codegen" The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate key... — committed to pytorch/pytorch by bdhirsh 3 years ago
PyYAML should reject duplicate keys per the YAML spec. These are silent errors in users’ code. Do what
ruamel.yaml
does, and add ayaml.allow_duplicate_keys = True
property for those who want to opt-in.The YAML 1.1 spec also says that keys should be unique: https://yaml.org/spec/1.1/#id861060
@parrenin I think this is misguided. Since this violates the spec, you don’t know whether the original or the overridden value would be active (without looking at the implementation, which could change at any time, since it is not part of the spec.) Simply reading the “base” and “overriding” settings as two separate files, and combining as you see fit would be more correct.
I just wanted to comment that in my case, the overwrite is a feature and not a bug. In my model, I have several “sites”, each of them with a yaml parameter file. And I have also a parameter file which apply to all sites. So I concatenate the general parameter file with each individual parameter file which thus overwrite the general parameter file when there is a duplicate entry. So it would be helpful to at least keep an option for the overwrite feature.
@parrenin
I think this sounds like a use case for merge keys, actually.
The workaround that anz-rfc posted works well enough for me, though a more canonical solution would be preferable.
some discussion of workarounds here: https://gist.github.com/pypt/94d747fe5180851196eb
@wimglenn the correct way to use multiple merge keys is actually:
That way duplicate merge keys can be avoided.
Other than that, the spec is actually not very specific about equality of nodes. For example,
0xa
and10
are different before resolving, but equal after.One problem is that this logic would break the functionality of merge keys: https://yaml.org/type/merge.html See
def flatten_mapping
.To fix this, we would need a seperate handling of the merge keys and the actual keys in the mapping.
I don’t know which tests were failing (haven’t tested your patch yet), but I guess the ones with merge keys were among them.
This seems to duplicate #41.