pyyaml: Incorrect indentation with lists

When setting indent, the indentation seems to be applied to the values inside the list instead of to the list itself. As you can see below, indent=4 is applied after the leading - rather than to the list itself.

>>> print(yaml.dump(data['vars']['yaml'], indent=4, allow_unicode=True, default_flow_style=False))
list_of_dict_attr:
-   attr1: value1
    attr2: value2
    attr3:
    - item1
    - item2
single_attr: value1
>>> print(yaml.dump(data['vars']['yaml'], indent=2, allow_unicode=True, default_flow_style=False))
list_of_dict_attr:
- attr1: value1
  attr2: value2
  attr3:
  - item1
  - item2
single_attr: value1
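A minimal, self-contained reproduction of the output above. The contents of `data` are an assumption reconstructed from the dumped YAML, since the original `data['vars']['yaml']` is not shown.

```python
import yaml

# Assumed input, reconstructed from the dumped output above
data = {
    "list_of_dict_attr": [
        {"attr1": "value1", "attr2": "value2", "attr3": ["item1", "item2"]},
    ],
    "single_attr": "value1",
}

for n in (4, 2):
    # With PyYAML's default Dumper, block sequences that are mapping values
    # are emitted "indentless", whatever indent is set to
    print(yaml.dump(data, indent=n, allow_unicode=True, default_flow_style=False))
```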

Original issue: https://github.com/ansible/ansible/issues/48865

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 38
  • Comments: 25 (4 by maintainers)


Most upvoted comments

The workaround mentioned above:

import yaml

class Dumper(yaml.Dumper):
    # Never emit "indentless" sequences, so list items are indented under their key
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)

print(yaml.dump(data, Dumper=Dumper))

Output:

Gateways:
  - 14
  - 4
  - 18

From my point of view, the most widely accepted indentation style for sequences is the one used repeatedly in the official YAML specification. For instance, Example 2.3 in section 2.1 looks like this:

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves
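Both layouts are valid YAML and load to the same data. A short check with PyYAML, using Example 2.3 written in both styles; dumping the loaded data back shows which of the two forms PyYAML chooses.

```python
import yaml

# Example 2.3 as printed in the spec: sequences indented under their keys
indented = """\
american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves
"""

# The same document with zero-indented sequences, as PyYAML emits them
zero_indented = """\
american:
- Boston Red Sox
- Detroit Tigers
- New York Yankees
national:
- New York Mets
- Chicago Cubs
- Atlanta Braves
"""

teams = yaml.safe_load(indented)
print(teams == yaml.safe_load(zero_indented))  # do both styles parse identically?
print(yaml.safe_dump(teams) == zero_indented)  # does PyYAML re-emit the zero-indented style?
```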

The question is whether tools like pyyaml should render sequences this way when the indentation size is 4, or when it is 2.

I would argue that rendering sequences this way with an indentation size of 4 looks incorrect, because the other items would visually appear to be indented more:

mapping:
    one: 1
    two: 2
list:
  - 1
  - 2

Therefore, I think it is more appropriate to render sequences this way with an indentation size of 2:

mapping:
  one: 1
  two: 2
list:
  - 1
  - 2

That being said, someone may prefer not to indent sequence items to a level visually similar to the indentation level of the other items. That is a fair requirement, but fully supporting it would require a separate configuration option for the indentation size of sequences.
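For comparison, a sketch of what the Dumper subclass workaround from this thread (overriding `increase_indent` so sequences are never indentless) produces with `indent=4`: the dashes land in the same column as the mapping keys, which is neither of the two layouts above. `IndentDumper` is just a local name for that subclass.

```python
import yaml

class IndentDumper(yaml.Dumper):
    # Workaround from this thread: never emit indentless block sequences
    def increase_indent(self, flow=False, indentless=False):
        return super().increase_indent(flow=flow, indentless=False)

data = {"mapping": {"one": 1, "two": 2}, "list": [1, 2]}

# sort_keys=False keeps "mapping" before "list", as in the examples above
print(yaml.dump(data, Dumper=IndentDumper, indent=4, sort_keys=False))
```

With a single `indent` setting, mapping keys and sequence dashes move together, so keeping `    one: 1` next to `  - 1` is not something this override can express.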

Well, I wouldn’t call that behaviour incorrect. I guess it’s a matter of taste, and I can find arguments showing that it’s consistent. What I’m also missing in this issue is a description of the expected behaviour.

Let’s look at both examples:

--- # spaces = 4
list_of_dict_attr:
-   attr1: value1
    attr2: value2
    attr3:
    - item1
    - item2
single_attr: value1
--- # spaces = 2
list_of_dict_attr:
- attr1: value1
  attr2: value2
  attr3:
  - item1
  - item2
single_attr: value1

The top-level mapping has an indentation of zero (0 * spaces). The value of list_of_dict_attr, the sequence, also has an indentation of zero, because PyYAML always chooses zero-indented sequences. That’s why the dashes have no indentation in both cases: with zero indentation, the dash position simply does not depend on the number of spaces you configured.

The value of the first sequence item, the mapping attr1: ..., has an indentation of 1 * spaces (4 or 2, respectively). The sequence under attr3 is again zero-indented relative to its parent mapping, so its dashes stay at 1 * spaces. The items of this sequence are scalars on the same line as their dashes, so they get no further indentation.
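The column arithmetic described above can be checked directly. A small sketch, again assuming the `data` reconstructed from the issue’s output; `columns` is a hypothetical helper that reports where the outer dash, the `attr1` key, and the inner `- item1` dash start.

```python
import yaml

# Assumed input, reconstructed from the issue's output
data = {
    "list_of_dict_attr": [
        {"attr1": "value1", "attr2": "value2", "attr3": ["item1", "item2"]},
    ],
    "single_attr": "value1",
}

def columns(indent):
    """Return (outer dash column, attr1 column, inner dash column)."""
    lines = yaml.dump(data, indent=indent, default_flow_style=False).splitlines()
    outer_dash = lines[1].index("-")   # "- attr1: value1" or "-   attr1: value1"
    attr1 = lines[1].index("attr1")    # first key of the mapping inside the item
    inner_dash = lines[4].index("-")   # "- item1" under attr3
    return outer_dash, attr1, inner_dash

for n in (2, 4):
    # The outer dash stays at 0; the mapping and the attr3 sequence sit at 1 * n
    print(n, columns(n))
```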

I assume you would expect this instead?

--- # spaces = 4
list_of_dict_attr:
  - attr1: value1
    attr2: value2
    attr3:
      - item1
      - item2
single_attr: value1
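With PyYAML’s default indent of 2, the `increase_indent` override quoted in this thread yields exactly this layout. A sketch under that assumption, reusing the reconstructed `data`; `IndentDumper` is a local name.

```python
import yaml

class IndentDumper(yaml.Dumper):
    # Workaround from this thread: ignore PyYAML's request for indentless sequences
    def increase_indent(self, flow=False, indentless=False):
        return super().increase_indent(flow=flow, indentless=False)

# Assumed input, reconstructed from the issue's output
data = {
    "list_of_dict_attr": [
        {"attr1": "value1", "attr2": "value2", "attr3": ["item1", "item2"]},
    ],
    "single_attr": "value1",
}

print(yaml.dump(data, Dumper=IndentDumper, default_flow_style=False))
```

Note that it takes indent=2 to get this output. With indent=4, the same override puts the dash at column 4 and the item’s keys at column 8 (`    -   attr1: value1`).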

Another year (2022) is coming. Is there an easy way to resolve this issue?

@pkit I know you’re frustrated about the slow progress on this issue - many of us are - but please do not take your frustration out on your fellow commenters and contributors. By doing so you reduce trust and diminish the quality of all open source projects related to this one.

Also running into this via Ansible. It would be nice if the indentation were consistent.

I agree with @pbasista about what the output should look like; that is also the behavior yamllint expects. A separate configuration option might satisfy both those who want this style and those who don’t.

I’m currently dealing with YAML files generated by pyyaml that yamllint rejects because of the list indentation.

@Acidherr

#234 (comment)

This worked for me, and it is also quite easy to implement.

What part of indent=4 is so hard to understand? No, it doesn’t work with indent=4, even in 2023.

The workaround mentioned above:

class Dumper(yaml.Dumper):
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)

print(yaml.dump(data, Dumper=Dumper))

Unfortunately, it does not work with CDumper. When working with a lot of YAML, or with large YAML files, I’d rather have no indentation than a “good-looking” one produced by a much, much slower generator.
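The override works because PyYAML’s pure-Python emitter calls `increase_indent`; `CDumper` hands emitting to libyaml, which never calls that Python method, so subclassing it has no effect. One way to live with the trade-off is to pick the dumper per call. A sketch: `dump` and its `pretty` flag are hypothetical names, and the fast path assumes PyYAML was built with libyaml (`yaml.__with_libyaml__`).

```python
import yaml

class IndentDumper(yaml.Dumper):
    # Pure-Python dumper with indented block sequences (the workaround above)
    def increase_indent(self, flow=False, indentless=False):
        return super().increase_indent(flow=flow, indentless=False)

def dump(data, pretty=True):
    """Hypothetical helper: indented but slower, or libyaml-fast but indentless."""
    if pretty or not yaml.__with_libyaml__:
        # Falls back to the Python dumper when libyaml is not available
        return yaml.dump(data, Dumper=IndentDumper, sort_keys=False)
    # CDumper keeps libyaml's layout: sequences under a key are not indented
    return yaml.dump(data, Dumper=yaml.CDumper, sort_keys=False)

print(dump({"Gateways": [14, 4, 18]}))
print(dump({"Gateways": [14, 4, 18]}, pretty=False))
```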

I just use prettier as a pre-commit hook and it takes care of making yaml look good.

I tried using prettier, but it made changes that both yamllint and I disagreed with.

It seems the spec itself varies its formatting. The Preview section shows sequences indented from their key.

Example 2.3. Mapping Scalars to Sequences (ball clubs in each league)

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

However, the Failsafe Schema section does use the style pyyaml produces:

10.1. Failsafe Schema

The failsafe schema is guaranteed to work with any YAML document. It is therefore the recommended schema for generic YAML tools. A YAML processor should therefore support this schema, at least as an option.

…

10.1.1.2. Generic Sequence

URI: tag:yaml.org,2002:seq

Kind: Sequence.

Definition: Represents a collection indexed by sequential integers starting with zero. Example bindings to native types include Perl’s array, Python’s list or tuple, and Java’s array or Vector.

Example 10.2. !!seq Examples

Block style: !!seq
- Clark Evans
- Ingy döt Net
- Oren Ben-Kiki

Flow style: !!seq [ Clark Evans, Ingy döt Net, Oren Ben-Kiki ]

Personally, I prefer the indented format, and it would be nice if pyyaml supported it as an option. But the code isn’t doing anything wrong by omitting the indents, even if yamllint disagrees.

Is there any progress on this? For now, the workaround here is working for me: https://stackoverflow.com/questions/25108581/python-yaml-dump-bad-indentation