STA indirect indexed double-increments PPU address?

exdeath
Posts: 103
Joined: Sat Sep 15, 2012 6:58 pm

Re: STA indirect indexed double-increments PPU address?

Post by exdeath »

Dwedit wrote:STA (xx),Y adds a dummy read.

Code: Select all

        1      PC       R  fetch opcode, increment PC
        2      PC       R  fetch pointer address, increment PC
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address
They did it this way in case they needed to fix up the high byte before performing a write, because they figured that reads wouldn't have side effects like writes would.
Trying to think of a need that can abuse this where you'd want interleaved writes to CIRAM :twisted:
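
For anyone following along, here is a rough C sketch of cycles 3 to 6 from the table, run against a toy bus where one port auto-increments on every access, read or write. The `bus_read`/`bus_write` helpers, the `ppu_addr` counter, and the fixed +1 step are all made up for illustration. Real hardware has more going on (the increment can be 32, reads go through a buffer, and timing matters), so treat this only as a model of why the dummy read is visible.

```c
#include <stdint.h>

/* Toy model (assumption): a port at $2007 that advances an internal
   address on EVERY access, whether it is a read or a write. */
static uint8_t  ram[0x0800];   /* 2 KiB of CPU RAM, holds the zero-page pointer */
static uint16_t ppu_addr;      /* internal address of the modelled port */
static unsigned ppu_accesses;  /* how many times $2007 was touched */

static uint8_t bus_read(uint16_t addr) {
    if (addr == 0x2007) { ppu_addr++; ppu_accesses++; return 0; }
    return ram[addr & 0x07FF];
}

static void bus_write(uint16_t addr, uint8_t value) {
    if (addr == 0x2007) { ppu_addr++; ppu_accesses++; return; }
    ram[addr & 0x07FF] = value;
}

/* Cycles 3-6 of STA (zp),Y as listed in the table above. */
static void sta_indirect_indexed(uint8_t zp, uint8_t y, uint8_t a) {
    uint8_t lo = bus_read(zp);                  /* cycle 3: pointer low  */
    uint8_t hi = bus_read((uint8_t)(zp + 1));   /* cycle 4: pointer high */

    /* Low byte already has Y added, high byte not yet fixed. */
    uint16_t unfixed = (uint16_t)((hi << 8) | (uint8_t)(lo + y));
    uint16_t fixed   = (uint16_t)(((hi << 8) | lo) + y);

    (void)bus_read(unfixed);                    /* cycle 5: dummy read   */
    bus_write(fixed, a);                        /* cycle 6: real write   */
}
```

With the pointer at $10/$11 set to $2000 and Y = 7, calling `sta_indirect_indexed(0x10, 7, 0xAA)` touches $2007 twice in this model: once for the cycle 5 read and once for the cycle 6 write.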
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Re: STA indirect indexed double-increments PPU address?

Post by Dwedit »

DMC will still screw with the read.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: STA indirect indexed double-increments PPU address?

Post by blargg »

exdeath wrote:Trying to think of a need that can abuse [extra read] where you'd want interleaved writes to CIRAM :twisted:
As I remember, the behavior depended on the CPU-PPU clock alignment at power-on, so it wasn't reliably the same each time.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: STA indirect indexed double-increments PPU address?

Post by Sik »

Dwedit wrote:STA (xx),Y adds a dummy read.

Code: Select all

        1      PC       R  fetch opcode, increment PC
        2      PC       R  fetch pointer address, increment PC
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address
They did it this way in case they needed to fix up the high byte before performing a write, because they figured that reads wouldn't have side effects like writes would.
That's... pretty stupid, couldn't they have made it so that the bus was left unused in that 5th cycle? Pretty sure that reads with side-effects were already common when the 6502 was first designed =/
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: STA indirect indexed double-increments PPU address?

Post by tokumaru »

Sik wrote:couldn't they have made it so that the bus was left unused in that 5th cycle?
I'm sure they did everything they could to keep costs down when designing the 6502, so you can be certain that this decision was made to reduce the number of transistors.
Sik wrote:Pretty sure that reads with side-effects were already common when the 6502 was first designed =/
Yes, but they probably assumed that the more exotic addressing modes wouldn't be commonly used to access memory-mapped registers.
exdeath
Posts: 103
Joined: Sat Sep 15, 2012 6:58 pm

Re: STA indirect indexed double-increments PPU address?

Post by exdeath »

tokumaru wrote:
Sik wrote:couldn't they have made it so that the bus was left unused in that 5th cycle?
I'm sure they did everything they could to keep costs down when designing the 6502, so you can be certain that this decision was made to reduce the number of transistors.
Sik wrote:Pretty sure that reads with side-effects were already common when the 6502 was first designed =/
Yes, but they probably assumed that the more exotic addressing modes wouldn't be commonly used to access memory-mapped registers.
Or auto-incrementing single-address FIFOs. This is the fault of the 6502 and the PPU combined, not just the 6502.
This exact combination of CPU, addressing mode, and PPU port is like winning the lottery or something.
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: STA indirect indexed double-increments PPU address?

Post by rainwarrior »

exdeath wrote:This exact combination of CPU, addressing mode, and PPU port is like winning the lottery or something.
Don't forget I also did it by discovering a compiler bug!
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Re: STA indirect indexed double-increments PPU address?

Post by qbradq »

It wasn't a bug, it's what you wrote :D cc65 did exactly what you told it to.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: STA indirect indexed double-increments PPU address?

Post by tepples »

Subscripting in C is commutative. If the code that a compiler generates for a[b], b[a], and *(a+b) differs with optimization turned on, then either A. the compiler is Doing It Wrong with respect to efficiency of commutative operations in general or B. you are coding in C++ and have overloaded some operator.
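
A quick sketch of the three spellings, in case the equivalence looks odd. The function names are invented for the example. The language defines all three to mean the same element; what code a given compiler emits for each is a separate question.

```c
#include <stddef.h>

/* Three spellings of the same element access (hypothetical helper names). */
static unsigned char get_ab(const unsigned char *a, size_t b) {
    return a[b];        /* the usual form */
}

static unsigned char get_ba(const unsigned char *a, size_t b) {
    return b[a];        /* legal C: defined as *(b + a) */
}

static unsigned char get_deref(const unsigned char *a, size_t b) {
    return *(a + b);    /* what both subscript forms are defined to mean */
}
```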
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: STA indirect indexed double-increments PPU address?

Post by blargg »

Or C, the optimizer is simplistic and takes advantage of the fact that most people write array[n] instead of n[array]. That is, it generates better code for most array expressions, and that's good enough. Handling it fully generally would be more work just for obscure cases.
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: STA indirect indexed double-increments PPU address?

Post by rainwarrior »

tepples wrote:Subscripting in C is commutative. If the code that a compiler generates for a[b], b[a], and *(a+b) differs with optimization turned on, then either A. the compiler is Doing It Wrong with respect to efficiency of commutative operations in general or B. you are coding in C++ and have overloaded some operator.


The problem here is there's a difference between array[7] and pointer[7]. One of them has a fixed address at compile/link time, and one of them needs to be resolved by extra code. cc65 normally correctly identifies these two types, and does generate different code for the array (absolute) and pointer variable (indirect indexed). This is neither incorrect, nor undesired, and does not require optimization to be turned on.
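
A minimal sketch of the two cases, with invented names (`table`, `cursor`, and the two store functions are not from any real project). The comments about which addressing mode a 6502 compiler can pick restate the description above and are not verified compiler output.

```c
/* Array object: its address is a constant known at compile/link time. */
static unsigned char table[16];

/* Pointer variable: the address it holds is only known at run time. */
static unsigned char *cursor = table;

void store_via_array(unsigned char v) {
    table[7] = v;   /* address is table+7, a constant: absolute store possible */
}

void store_via_pointer(unsigned char v) {
    cursor[7] = v;  /* must load the pointer first: indirect indexed on a 6502 */
}
```

Both functions store to the same byte here, so the result is identical; only the way the address is obtained differs.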

In the ((unsigned char*)0)[7] example, it is compiled as if this literal is cast to a pointer variable, rather than an array, and does all the things associated with such a thing. Yes, all arrays can be generically considered pointer variables, but that is a generalization which would generate a lot more (bigger/slower) code than necessary. This isn't really part of the optimization process; this is a problem further up the pipe. If the type is misidentified, you can't optimize away the indirection.

Anyhow, I'll report this to the cc65 mailing list, since someone on the project might be interested in fixing this problem. If not, we've covered a few ways to avoid it already. This was just a case of very poor code generation, which I do consider a bug, but I've no wish to argue the semantics of what should or should not be classified a bug. Yes, the code is correct (when the extra read has no side-effect), but it's also slow as hell compared to what it could be.
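
One common idiom for memory-mapped registers, sketched here as an assumption and not necessarily one of the workarounds referred to above, is to dereference a constant address directly instead of subscripting a null-based pointer. `REG8` and `reg_store` are hypothetical names. The expectation that a constant address lets the compiler emit a plain absolute store is also an assumption, so check the generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper: view an integer address as a volatile 8-bit register. */
#define REG8(addr) (*(volatile unsigned char *)(uintptr_t)(addr))

/* Assumed usage on the NES side (not compiled here):
 *     #define PPUDATA REG8(0x2007)
 *     PPUDATA = value;
 */

/* Host-testable wrapper: stores one byte through the macro. */
static void reg_store(uintptr_t addr, unsigned char value) {
    REG8(addr) = value;   /* a single store, with no index expression */
}
```

On a host machine the wrapper can be pointed at an ordinary variable to confirm that the macro performs exactly one store to the given address.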
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: STA indirect indexed double-increments PPU address?

Post by Sik »

I just remembered this quirk of the 6502. More specifically:

Code: Select all

        When an NMI occurs, the processor jumps to Kernal code, which jumps to
        ($0318), which points to the following routine:

        DD09    LSR $40         ; clear N flag
                BPL $DD0A       ; Note: $DD0A contains RTI.

        Operational diagram of BPL $DD0A:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   10    $DD0B    R   fetch opcode
          2   11    $DD0C    R   fetch argument
          3   xx    $DD0D    R   fetch opcode, add argument to PCL
          4   40    $DD0A    R   fetch opcode, (fix PCH)
Not to mention all the spurious accesses that can happen everywhere when crossing pages (lower byte is updated in a different cycle than the higher byte but the bus is still taken by the processor). Yeah, it's a mess. Looks like they literally assumed reads would never have side effects.
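
To make the page-crossing case concrete, this sketch computes the address that appears on the bus during the cycle before the high byte is fixed. The function names are invented for the example, and it only models the address arithmetic, not timing.

```c
#include <stdint.h>

/* Address seen on the bus while the high byte is still unfixed:
   the low byte has the index added, the high byte is the original one. */
static uint16_t unfixed_address(uint16_t base, uint8_t index) {
    return (uint16_t)((base & 0xFF00u) | ((base + index) & 0x00FFu));
}

/* Nonzero when the spurious access and the real access differ,
   which happens exactly when base + index crosses a page. */
static int crosses_page(uint16_t base, uint8_t index) {
    return unfixed_address(base, index) != (uint16_t)(base + index);
}
```

For example, a base of $20F0 with an index of $17 targets $2107, but the intermediate access lands on $2007.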
tepples wrote:Subscripting in C is commutative. If the code that a compiler generates for a[b], b[a], and *(a+b) differs with optimization turned on, then either A. the compiler is Doing It Wrong with respect to efficiency of commutative operations in general or B. you are coding in C++ and have overloaded some operator.

The standard doesn't require compilers to generate optimal code, only to ensure the final results are correct =P Though one could argue that the 6502 quirk here prevents it from being correct... (though we're starting to enter the realm of platform-specific hacks).