go: runtime: append growth sort of inconsistent and surprising
What version of Go are you using (go version)?
1.15
Does this issue reproduce with the latest release?
y
What operating system and processor architecture are you using (go env)?
Any, but also playground.
What did you do?
This actually grew out of a different question. Consider the fairly common slice-insert idiom:
s = append(s, 0)
copy(s[i+1:], s[i:])
s[i] = inserted
Assume i is less than len(s). You could also do:
s = append(s[:i+1], s[i:]...)
s[i] = inserted
I thought this might be faster because it sort of skips one copy of the s[i+1:] part of the slice – it can in theory copy each half only once. In fact, it was dramatically slower, and allocated more memory, which surprised me. Further investigation led to:
https://play.golang.org/p/cR51ks72emv
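For reference, here is a small sketch in the same spirit as that playground program (it is not the original program; the lengths and insert index are arbitrary), printing the capacity each idiom ends up with:

```go
package main

import "fmt"

func main() {
	for _, n := range []int{850, 1023, 1024, 2000} {
		i := 1 // insert near the front of the slice

		// Idiom 1: append one element, then shift the tail with copy.
		a := make([]int, n)
		a = append(a, 0)
		copy(a[i+1:], a[i:])
		a[i] = -1

		// Idiom 2: append the tail onto a re-sliced prefix.
		b := make([]int, n)
		b = append(b[:i+1], b[i:]...)
		b[i] = -1

		fmt.Printf("n=%d: copy-insert cap=%d, append-insert cap=%d\n", n, cap(a), cap(b))
	}
}
```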
What did you expect to see?
I don’t know. I think I expected the behavior of slice growth on append to be the same regardless of how I achieved the outcome of “increase total length by 1”.
What did you see instead?
Two different things, both somewhat surprising to me. The first is that, starting at N=1024, appending one slice to another produces wildly different resulting capacities from appending a couple of items to a slice. (Curiously, appending two items explicitly, like “append(s, 0, 1)”, doesn’t produce the different behavior, and appending b[size-1:]... also produces the single-item-append behavior.) Specifically, appending a single item grows the slice by 25%, but appending “a slice” does something much fancier. It will at least double the capacity of the first parameter to append(), but if you specify a cap on that parameter, it will pick a size somewhat larger than the total size it needs, though not necessarily as large as the doubled capacity. The result might even be smaller than the size picked when appending a single item to the existing slice.
The second is that, for N=1023 exactly, the single-item append always goes straight to 2048. So, appending a single item to a 1023-item slice produces a 2048 capacity, but appending a single item to a slice of 1024 or more items almost always produces a smaller capacity. This seems like a weird special case, and might be a bug in the sense of “a check that is supposed to do something specific is doing something else specific”. (Or it might just be “grow by doubling until 1024, then grow by 25%”, I guess.) But batch-append doesn’t have that behavior. So, for N from 850 to 1023, batch-append grows the slice more slowly (and thus allocates less memory and has to do less initialization in this particular use case), but for N from 1024 on up, batch-append tends to grow the slice faster (and thus has to do more extra work). It feels like they should be roughly comparable, probably?
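A hedged illustration of both observations (the capacities in the comments are what Go 1.15 on a 64-bit platform appears to produce after size-class rounding; they are not guaranteed and may differ in other releases):

```go
package main

import "fmt"

func main() {
	a := make([]int, 1023)
	a = append(a, 0) // len was 1023 (< 1024): capacity doubles, to 2048

	b := make([]int, 1024)
	b = append(b, 0) // len was 1024: capacity grows by ~25%, to 1280

	// Batch-append onto a short re-sliced prefix: the doubling is based on
	// the first argument, so this jumps to 2048 even though only one extra
	// slot is needed.
	c := make([]int, 1024)
	c = append(c[:2], c[1:]...)

	fmt.Println(cap(a), cap(b), cap(c))
}
```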
I am not sure what the right outcome is. I think what I’d ideally like would be for the shortcut append to have the same semantics as the append-and-copy version (when it’s valid at all), including the amount by which capacity grows.
The “append instead of appending and copying” approach actually seems moderately rewarding (a few percent faster, sometimes), and it becomes more valuable as N increases, but currently it’s only actually better when N is large and the index is small, so the array isn’t accidentally doubling in size. If the two paths used the same growth-sizing logic, I think it would be a potentially significant win for some use cases.
(Thanks to josharian for pointing out that the capacity of the first slice would be a significant factor.)
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 32 (17 by maintainers)
I think there is something to fix here. I would expect that both of these appends should allocate the same amount:
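(The snippet from the original comment isn’t reproduced here; the following is a sketch consistent with the description below, assuming a slice that starts with exactly one free slot of capacity.)

```go
package main

import "fmt"

func main() {
	s1 := make([]int, 1023, 1024)
	s1 = append(s1, 0)    // fills the one free slot, no growth
	s1 = append(s1, 1, 2) // overflows by two, triggers growslice

	s2 := make([]int, 1023, 1024)
	s2 = append(s2, 0, 1, 2) // overflows by two in a single append

	fmt.Println(cap(s1), cap(s2)) // one might expect these to be equal
}
```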
Both overflow by the same amount and need a growslice, and both start from a slice with the same capacity. The fact that one needed to fill in one available slot in the cap-len space first seems like it shouldn’t matter.
This isn’t to say that callers should depend on this behavior. But growth should ideally not depend on how elements were added (one at a time, two at a time, whatever) but just on how many were added. (With the exception that if you add so many in a single append that you would overflow what we would do for a single grow, then of course we have to grow more.)
Although it is not a must, I think it would be great to make reflect.Append consistent with the non-reflection way. (The following outputs are the same for 1.15 and tip):
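(The original output isn’t reproduced here; the following is a sketch of the kind of comparison meant, contrasting reflect.Append with the built-in append from the same starting slice.)

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	s := make([]int, 1023)

	s1 := append(s, 0) // built-in append

	v := reflect.Append(reflect.ValueOf(s), reflect.ValueOf(0))
	s2 := v.Interface().([]int) // reflect.Append

	// The two capacities may differ, which is the inconsistency noted above.
	fmt.Println(cap(s1), cap(s2))
}
```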
Similar to @ianlancetaylor, I think the slice capacities that append produces should not be relied on to be consistent between call sites or versions.
I think the difference in resulting capacities is due to the runtime's growslice taking the starting capacity and length into account, which differ between the two versions: https://github.com/golang/go/blob/c489330987eca992cee0bb018a6fdb7ff5401704/src/runtime/slice.go#L144
So, given different starting parameters, I don’t think these two versions need to come up with the same resulting capacities.
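For context, a simplified paraphrase of that capacity computation as it stood in Go 1.15 (ignoring the size-class rounding applied afterwards; this is a sketch, not the actual runtime code):

```go
// oldLen and oldCap describe the first argument passed to append; needed is
// the total number of elements required after the append.
func nextCap(oldLen, oldCap, needed int) int {
	newcap := oldCap
	doublecap := newcap + newcap
	if needed > doublecap {
		return needed
	}
	if oldLen < 1024 {
		return doublecap
	}
	// Grow by 25% at a time until the request fits.
	for 0 < newcap && newcap < needed {
		newcap += newcap / 4
	}
	if newcap <= 0 {
		newcap = needed
	}
	return newcap
}
```

Because both the length and the capacity of the first argument feed into this computation, the whole-slice and re-sliced-prefix versions enter growslice with very different inputs and so land on different capacities.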