cc65: Performance regression of generated code in comparison to cc65 v2.13.3

Dozens of NES games are written with cc65, but many of them still use cc65 v2.13.3 because newer versions of the compiler generate slower code, and sometimes that is critical. The time between frames is very limited, and if the code is not fast enough to finish within one frame, the game runs at half speed.

Shiru gave me “Sir Ababol” by the Mojon Twins as an example of a game that hit this issue and had to be built with cc65 v2.13.3 to work properly. It is not the only example. I ran into the same issue myself years ago while experimenting with cc65, when it was at v2.14. I even wrote to the mailing list about it back then (here; I was a newbie in this area in 2013, so my explanation wasn’t good enough). The issue wasn’t resolved, so I decided to stick with cc65 v2.13.3, as others do.

Today I checked whether cc65 v2.17 generates code that is at least as fast as what v2.13.3 produces… Still no success.

Comparing the assembly listings reveals the reason (v2.17 on the left, v2.13.3 on the right): diff

The newer compiler generates calls to slow external subroutines instead of fast inline instructions. For example, jsr pushax takes 50 cycles instead of 8, and it sits inside a loop, so of course it is slow. -Oi and other optimization flags don’t help.
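To put the numbers in perspective, here is a rough sketch of the arithmetic. It uses the 50-cycle and 8-cycle figures quoted above and assumes a budget of about 29780 CPU cycles per NTSC frame; that budget, the function names, and the workload figures are illustrative assumptions, not measurements from the demo. Real code has even less time, because part of each frame goes to video updates.

```c
#include <assert.h>

/* Assumption: approximate CPU cycles available per NTSC NES frame. */
#define CYCLES_PER_FRAME 29780UL

/* Cycle figures quoted in the report above. */
#define CYCLES_JSR_PUSHAX 50UL
#define CYCLES_INLINE      8UL

/* Extra cycles spent when a loop that runs `iterations` times
 * uses the subroutine call instead of the inline sequence. */
unsigned long extra_cycles(unsigned long iterations)
{
    return iterations * (CYCLES_JSR_PUSHAX - CYCLES_INLINE);
}

/* Number of whole frames needed to finish `work_cycles` of game logic,
 * rounded up, given `budget` cycles per frame. Work that does not fit
 * into one frame spills into the next one. */
unsigned long frames_needed(unsigned long work_cycles, unsigned long budget)
{
    assert(budget > 0);
    if (work_cycles == 0) {
        return 1;
    }
    return (work_cycles + budget - 1) / budget;
}
```

Under these assumptions, game logic that takes a hypothetical 25000 cycles fits into one frame, but adding extra_cycles(200) = 8400 cycles pushes it to 33400 cycles, so frames_needed reports two frames and the game updates at half the rate.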

The demo to play with: http://veg.by/temp/cc65-regression.zip

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 25 (25 by maintainers)

Most upvoted comments

This small piece of code causes it:

                /* If the registers from the push (A/X) are used before they're
                 * changed, we cannot change the sequence, because this would
                 * with a high probability change the register contents.
                 */
                UsedRegs |= E->Use;
                if ((UsedRegs & ~ChangedRegs) & REG_AX) {
                    I = Data.PushIndex;
                    State = Initialize;
                    break;
                }
                ChangedRegs |= E->Chg;
                break;

According to the comment, it does something useful. Now we need to find out whether it is really needed and whether the code works the way its author intended =)
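To make the condition easier to reason about, here is a minimal standalone sketch of the same bitmask check. The Entry struct, the REG_* values, and the function name are hypothetical stand-ins, not the actual cc65 definitions; only the expression (UsedRegs & ~ChangedRegs) & REG_AX is taken from the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register bit masks; the real values live in the cc65 sources. */
enum {
    REG_NONE = 0x00,
    REG_A    = 0x01,
    REG_X    = 0x02,
    REG_Y    = 0x04,
    REG_AX   = REG_A | REG_X
};

/* Hypothetical stand-in for a code entry: the registers an
 * instruction reads (Use) and the registers it overwrites (Chg). */
typedef struct {
    unsigned Use;
    unsigned Chg;
} Entry;

/* Returns 1 if A or X is read somewhere in the sequence before the
 * sequence itself has overwritten it, 0 otherwise. */
int ax_used_before_changed(const Entry *entries, size_t count)
{
    unsigned UsedRegs = REG_NONE;
    unsigned ChangedRegs = REG_NONE;
    size_t i;

    assert(entries != NULL || count == 0);

    for (i = 0; i < count; ++i) {
        /* Same order as the quoted code: record reads, test, then record writes. */
        UsedRegs |= entries[i].Use;
        if ((UsedRegs & ~ChangedRegs) & REG_AX) {
            return 1;
        }
        ChangedRegs |= entries[i].Chg;
    }
    return 0;
}
```

In this sketch, a sequence whose first instruction reads A returns 1, which corresponds to the case where the quoted code gives up on the sequence and resets its state. A sequence that overwrites A and X before reading them returns 0.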

My goal is to reduce code size at almost any cost. So I used it, tried it on every single function, and got positive results in all of these cases.

Been there, done that 😉

i’m not sure if this is really a regression - eg for what i do with cc65, optimizing for size is much more important than speed. no idea how to fix this though 😃