godot: [3.x] GDScript `String` append extremely slow with large strings
Tested versions
Reproducible in: 3.6 beta 4 3.4 stable 3.22 stable
May also be present in 4.x
System information
N/A
Issue description
Last night, while testing https://github.com/fractilegames/godot-obj-export/blob/main/objexport.gd I noticed the script was running extremely slowly for medium sized scene (over an hour of runtime, and I killed the editor because I ran out of time).
Some investigation this afternoon indicates there is a pathological performance problem from String concatenation with large string sizes, from gdscript. Profiling revealed a large amount of time was being spent in memcpy, primarily from CowData::_copy_on_write().
This same drop does not to seem to be present when calling from c++. This is suggestive of some intermediates being used from gdscript which are drastically lowering performance.
Steps to reproduce
This code shows the problem (see MRP). The slowdown grows exponentially with the string length (modify the 50000).
func benchmark_strings():
var sz = "start"
for n in range(50000):
sz += "blah" + "blah" + str(n)
print("String length is " + str(sz.length()))
pass
func _ready():
var before = OS.get_ticks_msec()
benchmark_strings()
var after = OS.get_ticks_msec()
print("Took " + str(after-before) + " ms.")
Minimal reproduction project (MRP)
About this issue
- Original URL
- State: open
- Created 3 months ago
- Reactions: 2
- Comments: 24 (21 by maintainers)
I’ve confirmed it is indeed due to incorrectly shared data, in both 3.x and 4.x, by running tests in c++, running the same code in c++ with
a = a + ...yields the same results, ~3 seconds, so the issue is indeed shared data, will see about digging through and finding how to fix thatWith these modifications, exporting the test scene with the obj exporter mentioned above has gone down from over an hour, to 1594ms. So quite a difference. 👍
Just to clarify the changes:
PoolStringArrayrather thanString, and call append.var string = "".join(my_pool_string_array)to convert to string at the very end.The issue here doesn’t seem to be the allocation BTW, it’s the
memcpy.See profile:
Strings seem to already use 2x reservation on grow, likeVectors. The problem is more intermediates I think.Update, realized the difference between the previous testing I did and these tests, and where it is different in 4.x currently:
So the way the second string is processed matters here, but it doesn’t matter if you concatenate it
This matches what I tested before where I found doing
a = a + band etc. didn’t matter in 4.x, but in this case it does probably because it updates it multiple times, possibly triggering ref-countingEdit: Seems we arrived at the same realization there secondarily
Edit 2: For 4.x a possible improvement we can look into would be to have a compiler conversion that somehow converts:
into:
As an optimization step (it’d actually use a special method of course with pre-processed stuff so it isn’t slow)
I tested and:
So then it seems the problem behaves the same way in both cases, interesting, can’t profile my version at the moment because I’m on Windows and haven’t got a decent setup ready, but it seems the behavior is the same
That makes it more feasible for me to dig through some of the 4.x GDScript code when I find the time
But for now I’m gonna have some birthday cake and relax, good luck with the digging!
A bit difficult to pin down an exact solution, but the compound assignment operator doesn’t actually use compound assignment in GDScript, and it might be difficult to exactly work out how to do the assignment depending on where it’s available, especially with getters/setters in the mix
I’d say that for properties it’s not going to be feasible to avoid this, but there’s a significant overhead for them anyway
So specifically for plain variables it seems to be the case, but it depends on how to carry it out, we could add dedicated compound assignments for those to handle this, but it’s very specific, but should be some way to wipe the target before we carry it out after having copied the data, will try look into something tomorrow
Edit: dug through and tested some and the current system in 4.x doesn’t allow this directly, will test some more tomorrow but we’d probably need a VM compound assignment operator, might be possible to work around it though
(1) and (2) seem within margin of error, there might be an extra allocation but likely not copying.
(3) is interesting as it is way faster than (2).
It seems likely the entire right side is added first (forming a single line) that is added once to the main string.
Whereas in (2), I’d hazard a guess it is adding the main string to each element on the right side in turn, hence is doing many times more copying.
(4) is clear winner as expected, as it does no long string operations until the final merge.
Actually it’s slower in 4.x on my machine, the example with the string takes 6975 ms for me, but will try to break it down into what is happening later, probably tomorrow, but I think the problem is different in 4.x
For comparison, changing the string append to
sz = sz + ...takes over 20 sAlso rewrote the code a bit to ensure we’re benchmarking just the appends and stuff: