DRW wrote: I'm just a bit confused why even the assigned value makes a difference in calculating the array itself.
Well, honestly, it's not like the operation of cc65's optimizer is at all obvious. It generates the same initial assembly code whether or not -O is used; with -O, it then runs a series of pattern-matching steps that rewrite the already generated assembly code. It's a bit of a backward approach to the problem, and really weird. It would be much better to start optimizing at a higher level, but I'm not going to fantasize too much about that. This is what the compiler we have does.
If you want to understand what it does, like I said just above, you can use --debug-opt-output to observe the process. It will show you exactly what the initial generated assembly code is, and every step it takes to optimize it.
If you want an example:
// starting C line
(oam+3)[oam_pos] = x+5;
; 1. initial generated assembly:
lda #<(_oam+3)
ldx #>(_oam+3)
clc
adc _oam_pos
bcc L02DF
inx
L02DF: jsr pushax
ldy #$04
ldx #$00
lda (sp),y
jsr incax5 ; this is the +5 operation, which critically breaks up the optimization pattern (see below)
ldx #$00
ldy #$00
jsr staspidx
; 2. OptAdd5
lda #<(_oam+3)
ldx #>(_oam+3)
clc
adc _oam_pos
bcc L02DF
inx
L02DF: jsr pushax
ldy #$04
ldx #$00
lda (sp),y
clc
adc #$05 ; jsr incax5 replaced with an inline add
ldx #$00
ldy #$00
jsr staspidx
; 3. OptUnusedLoads
lda #<(_oam+3)
ldx #>(_oam+3)
clc
adc _oam_pos
bcc L02DF
inx
L02DF: jsr pushax
ldy #$04 ; unused ldx #$00 eliminated
lda (sp),y
clc
adc #$05
ldy #$00 ; unused ldx #$00 eliminated
jsr staspidx
; 4. OptStackOps
lda #<(_oam+3)
ldx #>(_oam+3)
clc
adc _oam_pos
bcc L02DF
inx
L02DF: sta ptr1 ; jsr pushax / staspidx (temporary pointer on stack) replaced by keeping it in ptr1
stx ptr1+1
ldy #$02
lda (sp),y
clc
adc #$05
ldy #$00
sta (ptr1),y
For comparison, without the intervening +5:
// starting C line
(oam+3)[oam_pos] = x;
; 1. initial generated assembly
lda #<(_oam+3)
ldx #>(_oam+3)
clc
adc _oam_pos
bcc L02DF
inx
L02DF: jsr pushax
ldy #$04
ldx #$00
lda (sp),y
ldy #$00
jsr staspidx
; 2. OptPtrStore2
lda #<(_oam+3)
ldx #>(_oam+3)
ldy #$02
ldx #$00
lda (sp),y
ldy _oam_pos
sta _oam+3,y ; the temporary pointer on stack is eliminated, and the index add is replaced with Y index
; 3. OptUnusedLoads
ldy #$02 ; unused ldx #$00 eliminated
lda (sp),y
ldy _oam_pos
sta _oam+3,y
The optimization patterns are each a function in the cc65 source. They tend to have a good explanation of the operation. Looking up OptPtrStore2:
unsigned OptPtrStore2 (CodeSeg* S)
/* Search for the sequence:
**
** clc
** adc xxx
** bcc L
** inx
** L: jsr pushax
** ldy yyy
** ldx #$00
** lda (sp),y
** ldy #$00
** jsr staspidx
**
** and replace it by:
**
** sta ptr1
** stx ptr1+1
** ldy yyy-2
** ldx #$00
** lda (sp),y
** ldy xxx
** sta (ptr1),y
**
** or by
**
** ldy yyy-2
** ldx #$00
** lda (sp),y
** ldy xxx
** sta (zp),y
**
** or by
**
** ldy yyy-2
** ldx #$00
** lda (sp),y
** ldy xxx
** sta label,y
**
** or by
**
** ldy yyy-2
** ldx #$00
** lda (sp),y
** ldy xxx
** sta $xxxx,y
**
** depending on the code preceeding the sequence above.
*/
The problem is simply that each of them can only match a very simple pattern. It's looking for a specific beginning, middle, and end. You can't just start adding arbitrary extra code in the middle (i.e. the expression to be resolved); that new stuff in the middle has to fit a pattern that's known to be safe to optimize. It's probably possible to write such a thing, but it's complicated and difficult, so instead only this bunch of simpler cases was written.
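To make the "fixed beginning, middle, and end" point concrete, here's a toy matcher in C. It is only a sketch of the idea, not cc65's actual code: the names PATTERN, STORE_X, STORE_X_PLUS_5 and find_pattern are made up for illustration, and instructions are reduced to plain strings with the operands dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one optimizer step: the tail of the
   OptPtrStore2 sequence, reduced to opcode strings. Not cc65's real code. */
static const char *const PATTERN[] = {
    "jsr pushax", "ldy", "ldx", "lda (sp),y", "ldy", "jsr staspidx"
};
#define PATTERN_LEN (sizeof PATTERN / sizeof PATTERN[0])

/* Storing plain x: nothing sits between the fetch and the store. */
static const char *const STORE_X[] = {
    "jsr pushax", "ldy", "ldx", "lda (sp),y", "ldy", "jsr staspidx"
};

/* Storing x+5: jsr incax5 and an extra ldx land in the middle. */
static const char *const STORE_X_PLUS_5[] = {
    "jsr pushax", "ldy", "ldx", "lda (sp),y",
    "jsr incax5", "ldx", "ldy", "jsr staspidx"
};

/* Return the index where PATTERN occurs contiguously in code, or -1. */
static long find_pattern(const char *const *code, size_t n_code)
{
    size_t start, i;

    if (n_code < PATTERN_LEN) {
        return -1;
    }
    for (start = 0; start + PATTERN_LEN <= n_code; ++start) {
        for (i = 0; i < PATTERN_LEN; ++i) {
            if (strcmp(code[start + i], PATTERN[i]) != 0) {
                break;              /* any mismatch rejects this position */
            }
        }
        if (i == PATTERN_LEN) {
            return (long)start;     /* every instruction lined up */
        }
    }
    return -1;
}
```

With these tables, find_pattern(STORE_X, 6) finds the sequence at index 0, while find_pattern(STORE_X_PLUS_5, 8) returns -1, because the two extra instructions break the run. The real steps are smarter than a string compare, but the all-or-nothing behaviour is the same.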
When you put an expression in a temporary variable first, that code gets generated before the array access portion, so it doesn't interfere with the simple "generate address, fetch, store" pattern that these are capable of matching.
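In C terms, the two forms look something like this. It's a minimal sketch: oam and oam_pos stand in for the variables from the example, the array size is an assumption, and the function names and the temporary t are made up. Both functions store the same value; whether the second one actually gets the shorter code on your cc65 version is something to confirm with -O and --debug-opt-output rather than take on faith.

```c
/* Stand-ins for the names used in the example; the array size is an
   assumption made for this sketch. */
unsigned char oam[256];
unsigned char oam_pos;

/* The +5 is evaluated in the middle of the store sequence. */
void store_direct(unsigned char x)
{
    (oam + 3)[oam_pos] = x + 5;
}

/* The +5 is evaluated first, so the store itself is just
   "generate address, fetch, store". */
void store_via_temp(unsigned char x)
{
    unsigned char t = x + 5;    /* hypothetical temporary */
    (oam + 3)[oam_pos] = t;
}
```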
Anyhow, that's just one example of how to analyze cc65's optimizer. It's really all spelled out by --debug-opt-output, so if you want to know about any specific cases you're dealing with, that's the tool to use.
Still probably easier to just rewrite offending code in assembly as needed* than it is to try and understand whatever this byzantine optimizer is doing, though.
* ...and don't worry about it when it's not needed.