godot: [3.x] GDScript `String` append extremely slow with large strings

Tested versions

Reproducible in: 3.6 beta 4 3.4 stable 3.22 stable

May also be present in 4.x

System information

N/A

Issue description

Last night, while testing https://github.com/fractilegames/godot-obj-export/blob/main/objexport.gd I noticed the script was running extremely slowly for medium sized scene (over an hour of runtime, and I killed the editor because I ran out of time).

Some investigation this afternoon indicates there is a pathological performance problem from String concatenation with large string sizes, from gdscript. Profiling revealed a large amount of time was being spent in memcpy, primarily from CowData::_copy_on_write().

This same drop does not to seem to be present when calling from c++. This is suggestive of some intermediates being used from gdscript which are drastically lowering performance.

Steps to reproduce

This code shows the problem (see MRP). The slowdown grows exponentially with the string length (modify the 50000).

func benchmark_strings():
	var sz = "start"
	
	for n in range(50000):
		sz += "blah" + "blah" + str(n)
		
	print("String length is " + str(sz.length()))
	pass



func _ready():
	var before = OS.get_ticks_msec()
	benchmark_strings()
	var after = OS.get_ticks_msec()
	
	print("Took " + str(after-before) + " ms.")

Minimal reproduction project (MRP)

String_Slowdown.zip

About this issue

  • Original URL
  • State: open
  • Created 3 months ago
  • Reactions: 2
  • Comments: 24 (21 by maintainers)

Most upvoted comments

I’ve confirmed it is indeed due to incorrectly shared data, in both 3.x and 4.x, by running tests in c++, running the same code in c++ with a = a + ... yields the same results, ~3 seconds, so the issue is indeed shared data, will see about digging through and finding how to fix that

With these modifications, exporting the test scene with the obj exporter mentioned above has gone down from over an hour, to 1594ms. So quite a difference. 👍

Just to clarify the changes:

  • Use PoolStringArray rather than String, and call append.
  • Use var string = "".join(my_pool_string_array) to convert to string at the very end.

The issue here doesn’t seem to be the allocation BTW, it’s the memcpy.

See profile: profile_gdscript_string

Strings seem to already use 2x reservation on grow, like Vectors. The problem is more intermediates I think.

Update, realized the difference between the previous testing I did and these tests, and where it is different in 4.x currently:

var added := "blah" + "blah" + str(n)

var tmp := sz
sz = ""
# 1
sz = tmp + added # Takes ~1 s with 20000 runs

# 2
sz = sz + added # Takes ~1 s with 20000 runs

# 3
sz += added # Takes ~1 s with 20000 runs

So the way the second string is processed matters here, but it doesn’t matter if you concatenate it

This matches what I tested before where I found doing a = a + b and etc. didn’t matter in 4.x, but in this case it does probably because it updates it multiple times, possibly triggering ref-counting

Edit: Seems we arrived at the same realization there secondarily

Edit 2: For 4.x a possible improvement we can look into would be to have a compiler conversion that somehow converts:

a = b + c + d

into:

a = String().join(PackedStringArray([b, c, d]))

As an optimization step (it’d actually use a special method of course with pre-processed stuff so it isn’t slow)

I tested and:

sz = sz + "blah" + "blah" + str(n) # ~3 s.
sz = String().join([sz, "blah", "blah", str(n)]) # ~1.3 s

So then it seems the problem behaves the same way in both cases, interesting, can’t profile my version at the moment because I’m on Windows and haven’t got a decent setup ready, but it seems the behavior is the same

That makes it more feasible for me to dig through some of the 4.x GDScript code when I find the time

But for now I’m gonna have some birthday cake and relax, good luck with the digging!

A bit difficult to pin down an exact solution, but the compound assignment operator doesn’t actually use compound assignment in GDScript, and it might be difficult to exactly work out how to do the assignment depending on where it’s available, especially with getters/setters in the mix

I’d say that for properties it’s not going to be feasible to avoid this, but there’s a significant overhead for them anyway

So specifically for plain variables it seems to be the case, but it depends on how to carry it out, we could add dedicated compound assignments for those to handle this, but it’s very specific, but should be some way to wipe the target before we carry it out after having copied the data, will try look into something tomorrow

Edit: dug through and tested some and the current system in 4.x doesn’t allow this directly, will test some more tomorrow but we’d probably need a VM compound assignment operator, might be possible to work around it though

(1) and (2) seem within margin of error, there might be an extra allocation but likely not copying.

(3) is interesting as it is way faster than (2).

It seems likely the entire right side is added first (forming a single line) that is added once to the main string.

Whereas in (2), I’d hazard a guess it is adding the main string to each element on the right side in turn, hence is doing many times more copying.

(4) is clear winner as expected, as it does no long string operations until the final merge.

Actually it’s slower in 4.x on my machine, the example with the string takes 6975 ms for me, but will try to break it down into what is happening later, probably tomorrow, but I think the problem is different in 4.x

For comparison, changing the string append to sz = sz + ... takes over 20 s

Also rewrote the code a bit to ensure we’re benchmarking just the appends and stuff:

func benchmark_strings() -> String:
	var sz = "start"
	var arr = PackedStringArray()
	
	for n in range(20000):
		# 1
		#var tmp = sz
		#sz = ""
		#sz = tmp + "blah" + "blah" + str(n) # Takes ~3 s with 20000 runs
		
		# 2
		#sz = sz + "blah" + "blah" + str(n) # Takes ~3 s with 20000 runs
		
		# 3
		#sz += "blah" + "blah" + str(n) # Takes ~1 s with 20000 runs
		
		# 4
		# arr.append("blah" + "blah" + str(n))
		
		pass
	
	# 4
	# sz = "".join(arr) # Takes 18 ms with 20000 runs
	
	return sz


func _ready():
	var before = Time.get_ticks_msec()
	var sz := benchmark_strings()
	var after = Time.get_ticks_msec()
	
	print("String length is " + str(sz.length()))