runtime: Iterating over a string builder by index becomes ~exponentially slow for large builders
This is motivated by this issue originally opened against MSBuild, which internally in its reuseable string builder factory compares string builders by index. It happens to start from the end, but that doesn’t seem to matter. The opener observed this can be enormously slower that creating a string first.
Using this benchmark program I get the following results:
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.98)
Processor=Intel Xeon CPU E5-1620 0 3.60GHz, ProcessorCount=8
Frequency=3507174 Hz, Resolution=285.1299 ns, Timer=TSC
[Host] : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0
DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0
Method | N | Mean | Error | StdDev |
---------------- |----- |-------------:|------------:|------------:|
ToStringFirst | 100 | 2.028 us | 0.0403 us | 0.0465 us |
IndexedBackward | 100 | 21.399 us | 0.4123 us | 0.5063 us |
IndexedForward | 100 | 23.020 us | 0.4461 us | 0.4581 us |
ToStringFirst | 1000 | 43.210 us | 0.5032 us | 0.4202 us |
IndexedBackward | 1000 | 323.959 us | 7.3317 us | 6.8581 us |
IndexedForward | 1000 | 335.538 us | 6.3072 us | 5.8997 us |
ToStringFirst | 5000 | 217.657 us | 4.1058 us | 4.3931 us |
IndexedBackward | 5000 | 6,542.491 us | 131.7964 us | 223.8003 us |
IndexedForward | 5000 | 6,301.246 us | 102.0129 us | 90.4317 us |
64 bit
BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.98)
Processor=Intel Xeon CPU E5-1620 0 3.60GHz, ProcessorCount=8
Frequency=3507174 Hz, Resolution=285.1299 ns, Timer=TSC
[Host] : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
Method | N | Mean | Error | StdDev |
---------------- |----- |-------------:|------------:|------------:|
ToStringFirst | 100 | 1.944 us | 0.0369 us | 0.0410 us |
IndexedBackward | 100 | 22.955 us | 0.2145 us | 0.1902 us |
IndexedForward | 100 | 22.674 us | 0.2493 us | 0.2332 us |
ToStringFirst | 1000 | 40.258 us | 0.4520 us | 0.4007 us |
IndexedBackward | 1000 | 325.231 us | 4.6369 us | 4.3374 us |
IndexedForward | 1000 | 332.819 us | 8.0014 us | 7.4845 us |
ToStringFirst | 5000 | 200.979 us | 1.2610 us | 1.1179 us |
IndexedBackward | 5000 | 7,950.240 us | 157.2641 us | 253.9522 us |
IndexedForward | 5000 | 8,137.382 us | 161.3316 us | 251.1739 us |
Clearly indexing is going to have a lot of overhead such as bounds checking. But I would expect that to be proportional to N, and we see a big jump when N gets large. Related to fragmentation of the SB? Can we do better?
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 29 (21 by maintainers)
@jkotas I can’t believe that the best SB iterator is a recursive function that calls a callback function!.. If SB has only 10 MB of text, this means 10 million pushes to the stack waiting for 10 millions callbacks, just to print or count something. Using enumerators is the only efficient way.