MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 2:25 am
by mikaelmoizt
These questions might have been answered before on this forum, but I could not find anything. :oops:

Ok, so let's say you want to do a regular upper bank switch (or whatever you want to accomplish):

Code: Select all

switch:
  sta $ffff
  lsr 
  sta $ffff
  lsr 
  sta $ffff
  lsr 
  sta $ffff
  lsr 
  sta $ffff
  rts
I'm fine with needing to write 5 times, with the 5th write being the one that matters, and all that.
Looking into commercial games - all have the same procedure as above, so I am not questioning the wiki nor the setup itself.

But.. to me it seems like

Code: Select all

switch:
 ldx #$05
 
write:
 sta $ffff
 lsr
 dex
 bne write
 rts
would do the same thing, but waste fewer cycles and memory? Is there a timing issue with doing this? I have tested this loop method and it worked just fine (at least in an emulator, not tested with hardware).
Seeing all the compression, cycle critical code and what not that went into making some of the commercial games, it makes you wonder why this is so "inefficient".

ps: for anyone stumbling upon this thread; in a nutshell, yes you can loop writes like this, but it will only save memory, not execution time. In fact it will increase cycles (a lot..)

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 3:15 am
by rainwarrior
The timing is not critical. You can even do 4 writes, and then save your 5th write for a later time for a quick bankswitch.
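
A rough sketch of that split, assuming nothing else writes to the MMC1 between the two routines (an NMI handler that bankswitches would disturb the half-filled shift register). The routine names and the `pending_bit` RAM variable are made up for illustration:

```asm
; prepare_switch: A = bank number. Sends the first four bits now.
prepare_switch:
  sta $ffff        ; bit 0
  lsr
  sta $ffff        ; bit 1
  lsr
  sta $ffff        ; bit 2
  lsr
  sta $ffff        ; bit 3
  lsr
  sta pending_bit  ; hypothetical RAM byte: keep bit 4 for later
  rts

; commit_switch: the single remaining write performs the switch.
commit_switch:
  lda pending_bit
  sta $ffff        ; fifth write; its address selects the MMC1 register
  rts
```

The later call then costs one load and one store instead of the full sequence.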

Your version has two disadvantages: 28 extra cycles of execution, and it clobbers the X register.
Your version has one advantage: 10 fewer bytes of code.
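
Counting it out against the two listings in the first post gives the following rough tally. It assumes standard 6502 timings and a branch that does not cross a page; the `_unrolled` and `_looped` label suffixes are only there so both versions can sit in one file:

```asm
; Unrolled: 20 bytes, 34 cycles (including rts)
switch_unrolled:
  sta $ffff        ; 3 bytes, 4 cycles
  lsr              ; 1 byte,  2 cycles
  sta $ffff
  lsr
  sta $ffff
  lsr
  sta $ffff
  lsr
  sta $ffff        ; 5 stores = 20 cycles, 4 shifts = 8 cycles
  rts              ; 1 byte,  6 cycles

; Looped: 10 bytes, 62 cycles (including rts)
switch_looped:
  ldx #$05         ; 2 bytes, 2 cycles
write:
  sta $ffff        ; 3 bytes, 4 cycles
  lsr              ; 1 byte,  2 cycles (runs 5 times; the last shift is unused)
  dex              ; 1 byte,  2 cycles
  bne write        ; 2 bytes, 3 cycles when taken, 2 on the final pass
  rts              ; 1 byte,  6 cycles
```

By this count the loop spends 4 x 11 + 10 = 54 cycles in its body plus 2 for the `ldx`, so it comes out 28 cycles slower and 10 bytes smaller.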

I don't think your version is clearly better at all?

Even if it was, commercial games are hardly an example of best or most-efficient solutions. Commercial programmers are usually more interested in finishing the job than trying to optimize every last thing. In a lot of cases they'll just copy a solution that they know already works (e.g. used in previous game, given to them in example code, etc.) rather than spend time re-evaluating it.

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 4:40 am
by mikaelmoizt
Please note that this is just for trying to understand some of the things I never got. It would be solved by just using another mapper if I made my own game.. 8-)
Also, this issue is leaning more towards hacking than homebrewing.
Still, there is some food for thought here..
rainwarrior wrote:The timing is not critical. You can even do 4 writes, and then save your 5th write for a later time for a quick bankswitch.
I see.
rainwarrior wrote:Your version has two disadvantages: 28 extra cycles of execution, and it clobbers the X register.
Your version has one advantage: 10 fewer bytes of code.
As you might have guessed, it was all about the code size issue when you "occupy" X-reg like that. I did not realize it took more cycles though. Anyway, here is what I am thinking:

You have a fixed bank holding many small parts of your game's core engine. The upper bank could be switched in to feed the engine with "everything".
So to keep the amount of code that has to live in the fixed bank as small as possible, sooner or later you are going to have to be conservative about size.. right?
rainwarrior wrote: I don't think your version is clearly better at all? (Also you have a typo in your BNE instruction; I think you meant to put the "write" label in there.)
Well it does save space when you value every byte you have left :P

Also, yes, code was wrong. Sorry. Corrected :roll:
I guess that answers my starting question: yes it does work like that, but I never saw it being done like that.
rainwarrior wrote: Even if it was, commercial games are hardly an example of best or most-efficient solutions. (..)
And that is exactly why I want to "improve" things like that to make room for features.
Also, you are spot on.
[Attachment: ghost.jpg]
Even though this must have been generated by a macro..

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 5:34 am
by tokumaru
mikaelmoizt wrote:I did not realize it took more cycles though.
Maybe you counted the cycles of a single iteration? But remember, this is a loop, so the bottom part of the code will run multiple times, and you obviously have to add up the cycles of every iteration.

For something this small, I'll always pick speed over size, especially if I need to bankswitch several times in a single frame.

You probably have a high-level programming background, where loops are often the cleanest way to implement repetitive operations. In 8-bit assembly though, where speed is lacking and the registers are few, it's very common to unroll small loops like these. In some cases it's even necessary to unroll big loops if you expect to finish certain tasks in time (such as updating VRAM).

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 9:10 am
by thefox
mikaelmoizt wrote:Even though this must have been generated by a macro..
Whenever you see an unrolled piece of code like that, especially when transferring data to the PPU, it's almost guaranteed that it was done to maximize performance (at the cost of using up more ROM space than a loop would). In many cases if you "optimize" it back to a loop, the game may start to run out of vblank time.

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 9:17 am
by rainwarrior
mikaelmoizt wrote:I want to "improve" things like that to make room for features.
- picture -
Even though this must have been generated by a macro..
Some assemblers have handy ways to write unrolled loops like this. In ca65 the source code for the pictured code might have looked like:

Code: Select all

.repeat 32, I
    lda vram_buffer+I
    sta $2007
.endrepeat
Just because the code is repetitive doesn't mean it's bad. This technique is actually one of the more effective ways to write data to the PPU quickly. During your very limited vertical blank time, often every cycle counts. Here's a recent thread with more info: viewtopic.php?f=2&t=13037

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 11:17 am
by mikaelmoizt
:shock:
Woo. I did some calculations of my own and yes indeed: looping uses a lot more cycles when I recreated what was done in my screenshot. (Ghostbusters btw)
If I were to transfer 256 bytes from RAM to the PPU like that, it would take 1552 bytes of ROM, versus an 11-byte loop that takes roughly twice the cycles.
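
For reference, an 11-byte loop of that kind could look like the sketch below. It reuses the `vram_buffer` name from the ca65 example earlier in the thread and assumes the buffer is 256 bytes, page-aligned and addressed in absolute mode; the `copy` label is made up:

```asm
  ldx #$00           ; 2 bytes, 2 cycles
copy:
  lda vram_buffer,x  ; 3 bytes, 4 cycles (no page crossing if page-aligned)
  sta $2007          ; 3 bytes, 4 cycles
  inx                ; 1 byte,  2 cycles
  bne copy           ; 2 bytes, 3 cycles when taken, 2 on exit
```

That is about 13 cycles per byte, or 2 + 256 x 13 - 1 = 3329 cycles in total. The unrolled `lda`/`sta` pairs cost 6 bytes and 8 cycles per byte, so 256 x 6 = 1536 bytes (before any setup code) and 256 x 8 = 2048 cycles.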

I am starting to get a general idea of how size vs. execution time is valued depending on what the code is supposed to do.
The technically possible 'how's and 'why's sometimes matter less than the more direct way of "just doing things" because it is actually good enough or even better in some cases.
* 1 XP received *

Re: MMC1 question - sequential writes & loops

Posted: Wed Jul 29, 2015 12:14 pm
by Kasumi
You can also do partially unrolled loops.

For transferring 16 bytes (all zero for simplicity), you might do this:

Code: Select all

  lda #$00
  ldy #$03

loop:
  sta $2007
  sta $2007
  sta $2007
  sta $2007
  dey
  bpl loop
That is 4 writes, looped 4 times (as opposed to 16 unrolled writes, or 1 write looped 16 times).

For 256 bytes, you could do 16 writes looped 16 times. That uses far fewer bytes than full unrolling and is still fairly fast.
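
A minimal sketch of the 16 x 16 version, again writing zeros for simplicity and using ca65's `.repeat` to generate the stores (the label name is arbitrary, and the cycle figures assume the branch does not cross a page):

```asm
  lda #$00
  ldy #$0F          ; 16 passes: Y counts 15 down to 0

fill_loop:
.repeat 16
  sta $2007         ; 3 bytes, 4 cycles each
.endrepeat
  dey               ; 2 cycles
  bpl fill_loop     ; 3 cycles when taken, 2 on the last pass
```

By a rough count this is 55 bytes and about 1107 cycles including the two loads. Fully unrolled stores would be 768 bytes and 1024 cycles, and a one-store loop would be around 2300 cycles, so most of the speed is kept for a fraction of the size.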