cc65: Regression: "0 < i--" is optimized to "0 < --i".

I have tried to build “Sir Ababol” using the latest version of cc65. For some reason, it had a very glitchy title screen. I have investigated it a bit, and it turned out that the issue is in this function:

void __fastcall__ un_rle_screen (unsigned char *rle) {
	wx = 0; wy = 0; i = 0;
	while (i < 240) {
		t = (*rle);
		rle ++;
		if (t & 128) {
			top = t & 127;
			t = (*rle);
			rle ++;
			while (0 < top--) {
				draw_tile (wx, wy, t);
				wx = (wx + 2) & 31; if (!wx) wy += 2;
				i ++;
			}
		} else {
			draw_tile (wx, wy, t);
			wx = (wx + 2) & 31; if (!wx) wy +=2;
			i ++;
		}
	}
	// 64 attributes
	i = 64; gp_addr = 0x23c0;
	while (0 < i--) {
		vram_adr (gp_addr++);
		vram_put (*rle++);
	}
}

Just look at this part of generated assembly:

;
; while (0 < top--) {
;
L330D:	lda     #$00
	dec     _top
	cmp     _top
	bcc     L330B
;
; } else {
;
	jmp     L3B6B

It decrements _top; and only then, it does the comparison. But, it should do comparison first; and only then, it should decrement _top. I have replaced it to this code:

L330D:
	lda _top
	dec _top
	cmp #$00
	beq L3B68
	bcs L330B
	jmp L3B68

And now, it works as expected.

Old cc65 had generated this monstrous code:

;
; while (0 < top--) {
;
L330D:	jsr     push0
	lda     _top
	pha
	sec
	sbc     #$01
	sta     _top
	pla
	jsr     tosicmp0
	bcc     L330B
;
; } else {
;
	jmp     L3B6B

It seems that someone had tried to add a great optimization, and forgot to take into account the difference between --i and i--. It would be really nice to preserve the optimization, but to do it in a right way.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 17 (16 by maintainers)

Commits related to this issue

Most upvoted comments

Register usage isn’t currently tracked correctly for OptStackOps transformations, which is the cause of this issue as well as #167. I’m working on an improved way to handle this.

The error is introduced in the compiler pass “OptStackOps”. I used a slightly simplified version of the test program posted by @greg-king5 .

static unsigned char failure = 1;
static unsigned char count = 1;

int main(void)
{
    while (0 < count--)
    {
        --failure;
    }
    return failure;
}

Last good pass:

;
; while (0 < count--)
;
	jmp     L0006
;
; --failure;
;
L0004:	ldx     #$00
	dec     _failure
	lda     _failure
;
; while (0 < count--)
;
L0006:	ldx     #$00
	txa
	jsr     push0
	lda     _count
	dec     _count
	jsr     tosicmp0
	bcc     L0004
;
; return failure;
;
	ldx     #$00
	lda     _failure
;
; }
;
	rts

After OptStackOps:

;
; while (0 < count--)
;
	jmp     L0006
;
; --failure;
;
L000A:	dec     _failure
	lda     _failure
;
; while (0 < count--)
;
L0006:	ldx     #$00
	txa
	dec     _count
	cmp     _count	; (A)
	bcc     L000A
;
; return failure;
;
	lda     _failure
;
; }
;
	rts

On line (A), a new cmp instruction is inserted by the optimizer pass, testing count after it has already been decremented, despite the fact that it was post-decremented.

static int failure = 1;
static unsigned char count = 1;

int main(void)
{
    while (0 < count--) {
        --failure;
    }
    return failure;
}

The unoptimized program succeeds. The optimized program fails.

its still close to impossible to fix anything without a minimal testcase (and then the testcase will go into the regression tests, to make sure it wont break again)