ruff: Formatter: Keep right-hanging comments aligned

ruff format follows black’s style for right-hanging comments (i.e. inline comments), but that produces hard-to-read results, especially when a series of consecutive lines aligns its comments by column. Blue fixes this readability problem.

Let’s start with this file:

def foo():
    x = 1                                         # this comment
    why = 2                                       # aligns with this comment
    zebra = 3                                     # and this one

As you can see, every # on each of these lines appears at the same column. How does ruff reformat these lines?

def foo():
    x = 1  # this comment
    why = 2  # aligns with this comment
    zebra = 3  # and this one

This is the way black does it, i.e. by squeezing the gap between the last code character and the # down to exactly two spaces. This destroys the alignment.

How does blue format the file? Nothing changes! Blue preserves the number of spaces between the last code character and the # with the assumption that the spacing is deliberate and done for readability.

ruff format should follow blue’s rule here.
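
The two rules can be compared with a small Python sketch. This is not how ruff, black, or blue are implemented; `trailing_comment_columns` and `respace_comments` are hypothetical helpers that use the standard-library `tokenize` module to find comments that follow code on the same line.

```python
import io
import tokenize


def trailing_comment_columns(source: str) -> dict[int, int]:
    """Map 1-based line numbers to the column where a hanging comment starts."""
    lines = source.splitlines()
    columns = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            # Skip own-line comments: only code followed by '#' counts.
            if lines[row - 1][:col].strip():
                columns[row] = col
    return columns


def respace_comments(source: str, preserve: bool) -> str:
    """Apply the fixed two-space rule, or keep the existing gap when preserve=True."""
    lines = source.splitlines()
    for row, col in trailing_comment_columns(source).items():
        code = lines[row - 1][:col].rstrip()
        comment = lines[row - 1][col:]
        gap = col - len(code) if preserve else 2
        lines[row - 1] = code + " " * gap + comment
    return "\n".join(lines) + "\n"


SOURCE = (
    "def foo():\n"
    "    x = 1          # this comment\n"
    "    why = 2        # aligns with this comment\n"
    "    zebra = 3      # and this one\n"
)

print(respace_comments(SOURCE, preserve=False))  # comments collapse to two spaces
print(respace_comments(SOURCE, preserve=True))   # text comes back unchanged
```

Using `tokenize` rather than searching for `#` avoids mistaking a hash inside a string literal for a comment.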

About this issue

  • State: open
  • Created 9 months ago
  • Reactions: 16
  • Comments: 25 (7 by maintainers)

Most upvoted comments

For what it’s worth, I have consistently rejected this sort of feature in Black, as I don’t think it leads to a consistent, stable formatting style. Obviously, what style Ruff chooses is up to you.

Barry’s proposed rule is that the number of spaces before the hash gets preserved. But what if someone comes along and decides that zebras are worth more:

def foo():
    x = 1                                         # this comment
    why = 2                                       # aligns with this comment
    zebra = 31                                     # and this one

Now either the code gets misaligned (ugly), or they have to manually fiddle with the spaces, which defeats the point of an autoformatter.

Now, possibly the formatter could be made smart enough to detect that the comments are misaligned, and fix the spacing on the zebra line to get them aligned again.

But now what if we decide that zebras are worth a lot more:

def foo():
    x = 1                                         # this comment
    why = 2                                       # aligns with this comment
    zebra = 300000000000000000000000000000000000000  # and this one

Then, to get the lines aligned again, the formatter would have to change the x and why line, creating noise in the diff.
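
To make the diff-noise concern concrete, here is a minimal sketch of a hypothetical group-aligning rule (not something ruff or black implements). `align_group` is an invented helper that takes already-split `(code, comment)` pairs and pads every line to the shortest column that fits the longest code.

```python
def align_group(pairs, min_gap=2):
    """Align the comments of consecutive lines on one shared column."""
    # The column is dictated by the longest code fragment in the group.
    column = max(len(code) for code, _ in pairs) + min_gap
    return [code.ljust(column) + comment for code, comment in pairs]


before = align_group([
    ("x = 1", "# this comment"),
    ("why = 2", "# aligns with this comment"),
    ("zebra = 3", "# and this one"),
])

# Only the zebra value is edited by the author...
after = align_group([
    ("x = 1", "# this comment"),
    ("why = 2", "# aligns with this comment"),
    ("zebra = 300000000000", "# and this one"),
])

# ...but every line in the group is rewritten by the formatter.
changed = [i for i, (old, new) in enumerate(zip(before, after)) if old != new]
print(changed)
```

A one-line change touching the whole group is exactly the non-local behavior the comment above objects to.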

Hm… a comment-column setting is an interesting approach

Thinking about this overnight, this could be the best way forward. It would be a local-to-the-line only setting – no need for global parsing or alignment. It also, um, aligns more with the Emacs behavior. ruff would do its best to preserve comment-column but in the “big zebra” case above, it would just push the hanging comment out to the right. That’s fine because the user would still have a lot of flexibility in getting the format they want.

Straw man proposal:

  • comment-column = 60 - best effort to align the hanging comment # on column 60, the Emacs behavior
  • comment-column = +2 - a relative setting, i.e. the current black/ruff default
  • comment-column = false - preserve existing whitespace, i.e. the blue behavior
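
A rough sketch of how the three straw-man values might behave, line by line. Everything here is an assumption for illustration: `place_comment` is a hypothetical function, the column is taken to be 0-based, and the two-space minimum used when the code already reaches the column is not part of the proposal.

```python
def place_comment(code: str, comment: str, existing_gap: int, comment_column) -> str:
    """Join code (no trailing whitespace) and its hanging comment.

    comment_column may be an int (absolute column), a string such as "+2"
    (relative gap), or False (keep whatever gap the author typed).
    """
    if comment_column is False:
        # Preserve existing whitespace (the blue behavior).
        gap = existing_gap
    elif isinstance(comment_column, str) and comment_column.startswith("+"):
        # Relative setting (the current black/ruff default is "+2").
        gap = int(comment_column[1:])
    else:
        # Absolute column, best effort: if the code is too long, the comment
        # is simply pushed further right (the "big zebra" case).
        gap = max(comment_column - len(code), 2)
    return code + " " * gap + comment


print(place_comment("x = 1", "# this comment", 7, 20))
print(place_comment("zebra = 300000000000000000", "# and this one", 2, 20))
print(place_comment("x = 1", "# this comment", 7, "+2"))
print(place_comment("x = 1", "# this comment", 7, False))
```

Each call looks only at its own line, which is the "local-to-the-line" property described above.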

I’d prefer not to support this. I think comments directly after the statements, or alternatively leading own-line comments (where they are column aligned), are fine enough to read. For the case of not modifying whitespace, I agree with @JelleZijlstra that it would defeat the point of an autoformatter, while for the case of column aligning, I don’t think the possible improvement warrants introducing a column layout when we don’t use one elsewhere¹.

This is also bad from a locality standpoint: currently the formatting of one statement does not influence the formatting of other statements (we see those comments as trailing on the statement).


¹ By column layout, I mean e.g.

class Foo:
    x     = 1                  # this comment
    why   = 22                 # aligns with this comment
    zebra = 333                # and this one

Yes, I can disable formatting for these structures, but there are nice features of formatting, such as uniformizing indentations and quote style, and it would be nice not to have to throw out the baby with the bathwater.

To be clear, I do not want the formatter to calculate the correct columns. Bumping 0/1 spaces to 2 is a completely fine rule to apply, but otherwise I just want them left alone. If I had indicated to the formatter to leave comments alone, I would expect the tuple to get reformatted and the comment spacing kept the same. If I then decided to adjust the spacing before the comment, I would expect the formatter to leave the new spacing alone.
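
The rule described here (raise a gap of 0 or 1 to 2, otherwise leave it alone) is small enough to sketch directly. `normalize_gap` and `render` are hypothetical names, and the input is assumed to be already split into code, gap, and comment.

```python
def normalize_gap(gap: int, minimum: int = 2) -> int:
    """Keep the author's spacing, but never fewer than `minimum` spaces."""
    return max(gap, minimum)


def render(code: str, gap: int, comment: str) -> str:
    return code + " " * normalize_gap(gap) + comment


for gap in (0, 1, 2, 7):
    once = normalize_gap(gap)
    twice = normalize_gap(once)
    # Formatting a second time never changes the result, so the rule is stable.
    assert once == twice
    print(gap, "->", once)

print(render("('sizeof_hdr', 'i4'),", 6, "# 0; must be 348"))
```

Because the rule never looks at neighboring lines, adjusting one comment by hand cannot cause the formatter to rewrite another.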

FWIW, if your situation occurred, I would probably reformat it to:

header_dtd = [
    ('sizeof_hdr', 'i4'),      # 0; must be 348
    ...,
    (
        'a very long header that exceeds the line width so that it breaks',
        'i2',
        (8,),
    ),                         # 40; data array dimensions
]

Just as a meta-comment, it really is odd to me that black et al have decided that using spacing in comments to improve readability needs to be done away with, like arguments over tabs vs spaces or where to put newlines in parameter lists. Comments are the one bit of code whose only purpose is communication with other humans (okay, modulo pragma: and fmt: and so on), so it strikes me as particularly funny that everybody seems to be saying “We can’t come up with an algorithm to make these look good all the time, so let’s prevent humans from doing it themselves.”

There’s a little bit of missing context that might be helpful. In python-mode you can set a comment column, and then hit M-; (I’m not sure if that’s a default keybinding, but it’s set to comment-dwim for me) and Emacs will do its best to ensure that the hanging comment starts at that column. Of course in @JelleZijlstra's second example, it will put a minimum number of spaces there but they won’t align.

So in @JelleZijlstra's first example, it’s relatively easy, albeit manual, to re-indent everything to an aligning column. Emacs only looks at the current line, so it doesn’t actively try to keep things lined up, but the effect is usually that they are. The preservation of whitespace rule that blue implements works well with this approach.

It’s never going to be perfect, but I’d argue that blue’s rule is better than the step-wise always-2-spaces rule that black (and currently ruff) implements because that leads to difficult to read lines. It’s worse with black though because you don’t even have the option of manually aligning those hanging comments – black will just crush them.

I would generalize this issue as vertical alignment (colons, assignments, and comments), including cases such as:

@dataclass
class Example:
    an_attr:      int
    another_attr: str | None
    whatever:     dict[str, str]


class Colors(IntEnum):
    BLACK  = 0  # a comment
    RED    = 1  # another comment
    YELLOW = 2

I use a VSCode extension for vertical alignment, Better Align, which has a very convenient configuration method.

So, the above formatting would be:

colon = [ -1, 1 ]
assignment = [ 1, 1 ]
comment = 2

If we’re still in the spit-balling ideas phase, I’d like to use a double hash ## for @warsaw's comment-column = false behavior (in addition to, not as a replacement). For me it’s only a few spots where I use comment columns, and I’d prefer having something easy enough to remember, not a pragma or conf setting. And I’m happy that ruff format is fixing the whitespace in 99% of cases.

fwiw, I came up with this as the most pleasant to my eyes workaround

class Foo(Enum):
    AAA = -2  #      > for the a's
    BBBBBBBB = 0  #  > NOT USED
    CCCC = 1  #      > only when you need sea

it strikes me as particularly funny that everybody seems to be saying “We can’t come up with an algorithm to make these look good all the time, so let’s prevent humans from doing it themselves.”

We want to support the best possible comment formatting that matches users’ intuitions. It’s just that comment formatting is surprisingly hard. I would estimate that about 30% of the development effort of our formatter is related to handling comments as best as we can. Comments are hard because they can appear in any position, there’s no formal definition of what a comment comments on, the output must be stable (on the first try), the formatter must support all possible comment placements, and misplacing comments can easily lead to syntax errors.

What do you think about my https://github.com/astral-sh/ruff/issues/7684#issuecomment-1747129718? I think it covers all the use cases.

Assuming we align all comments at, e.g. column 60. What happens if the comment has a width of 29, exceeding the line width by 1? For example, let’s pretend the comment below is aligned at column 60 and does exceed the line width.

a = [1, 2]           # comment that exceeds the line width when aligned to column 60

The comment would fit fine if it isn’t aligned at column 60. What’s your expected behavior?
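
The arithmetic behind the question can be sketched as follows, assuming a 0-based comment column of 60 and a line width of 88 (ruff's default line-length). `widths` is a hypothetical helper, not part of ruff.

```python
LINE_WIDTH = 88  # assumed limit; adjust to your configuration


def widths(code: str, comment: str, comment_column: int, min_gap: int = 2):
    """Return (aligned_width, compact_width) for a line with a hanging comment."""
    # Aligned: the '#' sits at comment_column, or further right if code is long.
    start = max(comment_column, len(code) + min_gap)
    aligned = start + len(comment)
    # Compact: the comment follows the code after the minimum gap.
    compact = len(code) + min_gap + len(comment)
    return aligned, compact


code = "a = [1, 2]"
comment = "# " + "x" * 27  # a comment that is 29 characters wide
aligned, compact = widths(code, comment, comment_column=60)

print(aligned, aligned > LINE_WIDTH)  # 89 True: one character too wide
print(compact, compact > LINE_WIDTH)  # 41 False: fits easily
```

So the same line is fine under the two-space rule and too long under column alignment, and the formatter would need a policy for which constraint wins.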

I would expect that ruff format would leave the comment alone, but ruff check would complain.

Yes. There are plenty of ruff checks that are not auto-fixable, and that hasn’t been an existential problem. If the problem is this difficult, the obvious solution is to work on something that is tractable.

Clearly some people like black’s behavior, so I am not saying do not include it. Just let it be optional, please. Or you can go further and allow some other specific behaviors, as Barry suggests.

Anyway, I feel like I’ve taken up enough space in this thread. I like ruff, I like autoformatting. Thanks for all the effort, whatever the end result of this conversation is.

The comment would fit fine if it isn’t aligned at column 60. What’s your expected behavior?

Speaking just for myself, I would expect that ruff format would leave the comment alone, but ruff check would complain. I actually think it’s an anti-pattern to have long hanging comments, and would personally rewrite such comments as a separate block preceding the code. My personal preference is that hanging comments usually end up being pragmas or type checker hints, although I do occasionally have short, lined-up “blocks” like the examples above.

Aside: I wish there was a better way to signal intent to static checkers, rather than pragma/type comments.

How frequently do you use these carefully crafted comments? IMO, disabling formatting might be the best option if they are rare.

Considering:

header_dtd = [
    ('sizeof_hdr', 'i4'),      # 0; must be 348
    ('data_type', 'S10'),      # 4; unused
    ('db_name', 'S18'),        # 14; unused
    ('extents', 'i4'),         # 32; unused
    ('session_error', 'i2'),   # 36; unused
    ('regular', 'S1'),         # 38; unused
    ('dim_info', 'u1'),        # 39; MRI slice ordering code
    ('dim', 'i2', (8,)),       # 40; data array dimensions
]

It’s unclear how the formatter should format the comments once a line exceeds the configured line width:

header_dtd = [
    ('sizeof_hdr', 'i4'),      # 0; must be 348
    ('data_type', 'S10'),      # 4; unused
    ('db_name', 'S18'),        # 14; unused
    ('extents', 'i4'),         # 32; unused
    ('session_error', 'i2'),   # 36; unused
    ('regular', 'S1'),         # 38; unused
    ('dim_info', 'u1'),        # 39; MRI slice ordering code
    (
        'a very long header that exceeds the line width so that it breaks', 'i2', (8,)
    ),       # 40; data array dimensions
]

The top two reasons I use blue and not black are single quotes and not messing with comment spacing. I don’t mind bumping the spaces to 2, but removing additional spaces is a real pain. If I’m using more, there’s pretty much always a reason.

Here’s an example that even blue currently damages (blue fixed the single-line expression case). These are pairs of field names and numpy dtype codes, followed by comments:

 header_dtd = [
-    ('sizeof_hdr', 'i4'),      # 0; must be 348
-    ('data_type', 'S10'),      # 4; unused
-    ('db_name', 'S18'),        # 14; unused
-    ('extents', 'i4'),         # 32; unused
-    ('session_error', 'i2'),   # 36; unused
-    ('regular', 'S1'),         # 38; unused
-    ('dim_info', 'u1'),        # 39; MRI slice ordering code
-    ('dim', 'i2', (8,)),       # 40; data array dimensions
+    ('sizeof_hdr', 'i4'),  # 0; must be 348
+    ('data_type', 'S10'),  # 4; unused
+    ('db_name', 'S18'),  # 14; unused
+    ('extents', 'i4'),  # 32; unused
+    ('session_error', 'i2'),  # 36; unused
+    ('regular', 'S1'),  # 38; unused
+    ('dim_info', 'u1'),  # 39; MRI slice ordering code
+    ('dim', 'i2', (8,)),  # 40; data array dimensions
 ]

I think it’s hard to argue that this change makes things more readable. I understand that preserving an aligned column is difficult, and I really don’t care if an autoformatter chooses not to try. But if I go through the trouble of making an aligned column of comments, it’s frustrating for an autoformatter to destroy it and give no option to just leave it alone. The PR I opened for blue about this (https://github.com/grantjenks/blue/pull/83) still enforces at least two spaces, but otherwise leaves inline comment spacing alone.

I have also been annoyed with this when using Black. I’m not sure if there’s a downside to supporting this. Ideally it would only apply if they start out aligned. Additionally, it’d be cool if it shrunk to the minimum compatible distance for alignment — but that’s a little magical.