Trying to structure my code for PRG bank switching
After making several small NROM projects, I want to start working on a bigger project that I've had in mind for a while. I know that at the very least I'll probably need MMC1, as I expect WRAM will be necessary. I've messed around with CNROM a little, but bank switching, especially PRG bank switching, is still a pretty foreign concept to me.
From what I understand (And by all means, correct me if I've got this all wrong.) from having read the MMC1 page on the wiki, you can have part of the rom be fixed, and have another part be swapped. I'm guessing that I'd want the code for my engine in the fixed bank, and then all of my .db's and code for each game state and whatnot in the banks that are switchable. I've also heard that things like interrupt vectors have to be added to the end of each bank. So I guess what I'm asking here is what all is generally put in the switched banks?
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Trying to structure my code for PRG bank switching
There are really only two critical requirements when it comes to what data can go in a bank:
1. Any bank that can appear at $FFFA-FFFF should have a reset vector (and NMI/IRQ vectors if needed). You might also need some reset stub code somewhere in the bank so it has somewhere to point to.
2. When playing a DPCM sample, the bank it is in should remain resident, or else there will be audible errors when that bank is switched out. (Only affects $C000-FFFF. MMC3 and FME7 have a convenient 8k bank at $C000-DFFF which helps greatly for DPCM sample banking.)
If you are using a mapper with a fixed upper bank (e.g. UxROM), then #1 is really easy to satisfy. If you're using a mapper that could have any bank on reset, you'll need a reset vector and reset stub code in each bank. #2 isn't relevant if you simply don't use DPCM (also makes controller reading simpler).
Otherwise, just make sure that the correct bank is switched in before you try to fetch data from it, or jump to code in it. NMI and IRQ can be tricky in some cases; if your NMI needs to do some bankswitching, you'll want it to put the banks back as they were before your return from it.
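A rough sketch of the NMI point, assuming a UxROM-style mapper (one switchable 16 KB bank at $8000, fixed bank at $C000) and ca65 syntax. The names cur_bank, banktable, MUSIC_BANK and music_update are made up for illustration, and the segment names are whatever your linker config provides. The main thread records the bank it wants in RAM before switching, so the NMI handler can always put that bank back:

```asm
; Sketch only, not taken from any particular game.
MUSIC_BANK   = 3          ; assumed: bank that holds the music engine
music_update = $8000      ; assumed: entry point inside that bank

.segment "ZEROPAGE"
cur_bank: .res 1          ; bank number last selected by the main thread

.segment "CODE"           ; assumed to be linked into the fixed bank
banktable:
  .byte 0, 1, 2, 3, 4, 5, 6   ; byte n holds n (avoids bus conflicts)

; Main-thread switch: record the bank in RAM first, then switch to it.
bankswitch_y:
  sty cur_bank
; NMI-side switch: changes the bank without touching cur_bank.
bankswitch_nosave:
  lda banktable, y
  sta banktable, y
  rts

nmi:
  pha                     ; save A, X, Y
  txa
  pha
  tya
  pha
  ldy #MUSIC_BANK
  jsr bankswitch_nosave   ; map in the music bank
  jsr music_update
  ldy cur_bank
  jsr bankswitch_nosave   ; put back whatever the main thread selected
  pla                     ; restore Y, X, A
  tay
  pla
  tax
  pla
  rti
```

Because cur_bank is written before the mapper write, an NMI that lands between the two instructions restores the bank the main thread was about to select anyway.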
I personally am very fond of AxROM/BNROM 32k banking. It's not very good for DPCM, but having 32k banks makes it really easy to organize data. All the music code and data goes together in a single bank, graphics unpacking code and data goes in a bank together, level data and loading code goes together, etc.
I tend to treat going to another bank as a function call, i.e. I jsr to a special banked-code entry function, which bankswitches, does the thing it's there to do, then switches back to the original bank before returning.
Re: Trying to structure my code for PRG bank switching
rainwarrior wrote:I personally am very fond of AxROM/BNROM 32k banking. It's not very good for DPCM, but having 32k banks makes it really easy to organize data. All the music code and data goes together in a single bank, graphics unpacking code and data goes in a bank together, level data and loading code goes together, etc.
What do you do if you have 192 KiB of map data and 192 KiB of tile data? What I'm doing in my current project, which runs on a 512 KiB oversize BNROM, involves sticking copy-to-RAM and unpack routines in 192 bytes of RAM. Or is it common to put the unpacker in all ROM banks?
- rainwarrior
Re: Trying to structure my code for PRG bank switching
tepples wrote:What do you do if you have 192 KiB of map data and 192 KiB of tile data? What I'm doing in my current project, which runs on a 512 KiB oversize BNROM, involves sticking copy-to-RAM and unpack routines in 192 bytes of RAM. Or is it common to put the unpacker in all ROM banks?
I can't imagine that you need me to provide for you a general rule to solve such a specific problem. Do what seems to fit best for your case (or just do anything that gets the job done, really).
Re: Trying to structure my code for PRG bank switching
Sogona wrote:you can have part of the rom be fixed, and have another part be swapped.
It depends on the mapper. Some have a fixed part, some don't (the MMC1 lets you choose). When you don't have a fixed part, it's common to simulate one by replicating small pieces of code across multiple banks. CPU vectors and a reset stub should be present in all banks, and trampoline code should be present in banks that "talk" to each other.
Sogona wrote:I'm guessing that I'd want the code for my engine in the fixed bank, and then all of my .db's and code for each game state and whatnot in the banks that are switchable.
That really depends on how much space each part of your game needs. I like to optimize things for the main game, so I'd put as much of the main game engine in the fixed bank as possible (physics, object management, etc.), allowing the switchable part to be used for data (level maps) and less common code (some enemy A.I. maybe). This might mean putting anything not related to the main game engine (reset code, splash screens, menus, etc.) in separate switchable banks, so as to not waste any space with things that are not necessary during the most important part of the program.
Without a fixed bank, my approach would be to dedicate 1 or 2 banks to controlling the game states, and have everything else be data along with the functions that make use of that data. For example, banks with level data would also have a function to check for collisions between objects and the level map.
Sogona wrote:I've also heard that things like interrupt vectors have to be added to the end of each bank.
Only when the mapper doesn't have a fixed bank where the CPU vectors are. The MMC1 has bankswitching modes where the vectors are switchable, so in order to be completely safe you should have a reset stub at the end of every 16KB bank. This could be unnecessary depending on the power-up state of the MMC1, but I couldn't find any information about that.
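For illustration, a reset stub for the MMC1 could look something like the sketch below (ca65 syntax). It relies on the MMC1 behaviour that any write to $8000-$FFFF with bit 7 set resets the mapper's shift register and fixes the last bank at $C000-$FFFF. The segment name STUB0 and the labels reset, nmi and irq are assumptions: the segment would have to be placed by your linker config at $FFF0-$FFFF of the bank, and the three handlers are assumed to live in the last bank.

```asm
; Sketch only: one copy of this is needed at the end of every 16 KB bank.
.import reset, nmi, irq   ; assumed: handlers exported from the last bank

.segment "STUB0"          ; assumed: last 16 bytes ($FFF0-$FFFF) of this bank
reset_stub:
  sei                     ; 1 byte   ignore IRQs
  ldx #$FF                ; 2 bytes
  txs                     ; 1 byte   set up the stack pointer
  stx $8000               ; 3 bytes  bit 7 set: reset MMC1, last bank now at $C000
  jmp reset               ; 3 bytes  continue in the last bank
  .addr nmi, reset_stub, irq   ; 6 bytes: vectors at $FFFA, $FFFC, $FFFE
```

The stub is 10 bytes plus 6 bytes of vectors, so it fills exactly the last 16 bytes of the bank. Each bank needs its own copy under a different segment (and label or scope), which is commonly wrapped in a macro.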
Re: Trying to structure my code for PRG bank switching
tepples wrote:Or is it common to put the unpacker in all ROM banks?
I'm leaning towards this approach. When you use 32KB PRG-ROM banks, it's a given that you'll be wasting space with redundancy. I'd rather keep things simple and fast, even if that means losing 1, 2 or 3KB out of every 32KB. That's not such a big deal.
- rainwarrior
Re: Trying to structure my code for PRG bank switching
Sogona wrote:I'm guessing that I'd want the code for my engine in the fixed bank...
tokumaru wrote:I'd put as much of the main game engine in the fixed bank as possible...
It's not like code in a switchable bank runs any slower than in a fixed bank. The bankswitch itself takes a handful of cycles, but as long as your banking structure isn't requiring you to bankswitch 100 times per frame it's probably not an appreciable difference. Only bad usage patterns will make a significant impact.
Group stuff by how you use it. "Engine" is too vague a category. Think about the tasks you need to do each frame, and what code needs to be called by a lot of things, and what code/data only needs to be called within a small group. For example, you could probably group all your character drawing code, sprite rendering code, and related data (metasprites, etc.) into a single bank.
If you group your code early on, it's easier to move around later too. You could start with all code in the fixed bank, and then just move code groups out of it when you need more space there as the project grows. The only kind of stuff that really needs to be in the fixed bank are things that are called/needed/referenced by more than one switched bank. Otherwise it's just the same as any other place to put it.
Re: Trying to structure my code for PRG bank switching
rainwarrior wrote:It's not like code in a switchable bank runs any slower than in a fixed bank.
No, but it's easier/faster to manage the switchable banks from the fixed bank, rather than having one switchable bank call another. Another point of saving most of the fixed bank for the most complex part of the game (the main engine) is to make the most of the limited address space. Having more of the main engine available means less need to bankswitch.
Of course this is completely irrelevant to 32KB bankswitching. Without a fixed bank, you can max out every bank with relevant code/data, and it's mandatory that you have switchable banks call other switchable banks.
My point is that while having a fixed bank is handy in the sense that you can make this bank manage everything, it also makes things less versatile because you can have less dynamic stuff loaded at any given time. This is why I think you should think carefully about what to put in the fixed bank.
Re: Trying to structure my code for PRG bank switching
Personally, I like to start putting stuff in switchable banks right from the beginning, because it's much harder to move stuff out of the fixed bank than it is to move it back in. Running out of space in the fixed bank can be quite annoying to deal with.
As for mappers, my personal preference (based on functionality, not price) is something like FME-7, because it has:
- 8 KB PRG-RAM bank at $6000..7FFF
- 8 KB PRG-ROM bank at $8000..9FFF (could be used for data)
- 8 KB PRG-ROM bank at $A000..BFFF (could be used for code that operates on the data at $8000..9FFF)
- 8 KB PRG-ROM bank at $C000..DFFF (could be used for DPCM samples)
- 8 KB PRG-ROM bank at $E000..FFFF (fixed bank, for vectors, trampolines, etc)
The possibility of having an 8 KB switchable bank for data, as well as another 8 KB switchable bank for code means that the amount of data can be extended very easily without having to duplicate code (as long as the code doesn't need to see more than 8 KB of the data at a time).
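A minimal sketch of how that data/code split might be driven, in ca65 syntax. It assumes the usual FME-7 register layout (command register at $8000, parameter register at $A000, commands $9 and $A selecting the 8 KB PRG banks at $8000 and $A000); the routine and constant names are made up for illustration, and the routines are assumed to run from the fixed bank at $E000.

```asm
; Sketch only: FME-7 PRG bank selection from the fixed bank.
FME7_CMD    = $8000       ; command register
FME7_PARAM  = $A000       ; parameter register
CMD_PRG8000 = $09         ; command: 8 KB PRG bank at $8000-$9FFF
CMD_PRGA000 = $0A         ; command: 8 KB PRG bank at $A000-$BFFF

; Map the data bank given in A at $8000-$9FFF. Clobbers X.
set_data_bank:
  ldx #CMD_PRG8000
  stx FME7_CMD
  sta FME7_PARAM
  rts

; Map the code bank given in A at $A000-$BFFF. Clobbers X.
set_code_bank:
  ldx #CMD_PRGA000
  stx FME7_CMD
  sta FME7_PARAM
  rts
```

With this split, stepping through a large data set only means calling set_data_bank with successive bank numbers while the code at $A000 stays mapped. If the NMI handler also writes the command register, it can land between the two writes here, so the last command would need to be tracked and rewritten on exit.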
UxROM (or MMC1 in 16 KB mode) is quite annoying with its only one switchable PRG bank. I think I'd prefer 32 KB banking to it, although I haven't tried to use that in a real project yet.
Re: Trying to structure my code for PRG bank switching
Could you guys please explain to me what trampoline code is?
Re: Trying to structure my code for PRG bank switching
A trampoline is used to jump from one piece of code to another. Trampoline (computing) in Wikipedia.
On the NES, a trampoline might be used to jump from code in one bank to code in another bank.
With a single fixed bank, as in UNROM (or MMC3 if you're using $C000-$DFFF for audio), you usually put code that operates on a single bank of data in the same bank as the data and code that operates on multiple banks of data in the fixed bank.
Re: Trying to structure my code for PRG bank switching
Sogona wrote:Could you guys please explain to me what trampoline code is?
A trampoline is a piece of code that's either in a fixed bank or replicated across multiple switchable banks (simulating a fixed bank) that allows switchable banks to call each other. For example, my game engine is running from a 32KB switchable bank, and I need to read level data from another switchable bank. To do this, I can JSR to a piece of code that's present in both banks at the same memory location, which will make the switch and JMP to the actual routine that reads the level data. Once the data is read, the program jumps back to the trampoline code, which swaps the old bank back and returns to the location after the original JSR.
It's sort of a slow process, so you don't want to be doing this hundreds of times per frame.
Re: Trying to structure my code for PRG bank switching
Alright, well I seem to have gotten it to work so far 
Re: Trying to structure my code for PRG bank switching
rainwarrior wrote:I personally am very fond of AxROM/BNROM 32k banking. It's not very good for DPCM, but having 32k banks makes it really easy to organize data. All the music code and data goes together in a single bank, graphics unpacking code and data goes in a bank together, level data and loading code goes together, etc.
I completely agree with this, but I still haven't decided what the best way to handle inter-bank calls is in this case.
I know about solutions that take the bank index and subroutine address as parameters (which can be passed in registers, global variables, or even in .db/.dw statements after the subroutine call) and handle everything dynamically, but that always seemed so slow to me, and sometimes crippling, if the solution in question prevents you from using registers and/or the stack for passing/returning actual parameters to/from the subroutines being called.
I have considered making a separate trampoline for each subroutine that can be called, because the bank and the address would always be known and wouldn't have to be passed around. It would also be faster than handling indices and addresses dynamically. The obvious disadvantage is the space these routines would occupy in every bank. I don't think the typical game would have that many subroutines scattered across different banks though, seeing as most banks would be occupied by data, and the few routines necessary to interact with that data.
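To make the per-subroutine idea concrete, here is a minimal sketch for BNROM-style 32 KB banking in ca65 syntax. The names MAIN_BANK, LEVEL_BANK, read_level, bank_ids and the SHARED segment are all hypothetical; SHARED is assumed to be placed at the same address in every bank, so that execution continues on the next instruction after the bank changes.

```asm
; Sketch only: one hard-coded trampoline per cross-bank routine.
MAIN_BANK  = 0            ; assumed: bank the caller runs from
LEVEL_BANK = 2            ; assumed: bank that contains read_level
.import read_level        ; assumed: exported by the level bank's code

.segment "SHARED"         ; assumed: same address in every bank
bank_ids:
  .byte 0, 1, 2, 3        ; byte n holds n (avoids bus conflicts)

; Callable only from MAIN_BANK. X and Y pass through untouched.
call_read_level:
  lda #LEVEL_BANK
  sta bank_ids + LEVEL_BANK   ; switch; the next fetch comes from the new bank
  jsr read_level
  lda #MAIN_BANK
  sta bank_ids + MAIN_BANK    ; switch back to the caller's bank
  rts
```

Compared with a plain jsr/rts this costs about 24 extra cycles (two lda/sta pairs plus the inner jsr/rts) and 14 bytes per routine in every bank. It leaves X, Y and the stack free for parameters, but A and the flags are clobbered on the way back, and the return bank is fixed at assembly time.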
In the kinds of games I have designed, I would expect to switch banks at least once for each active object that collides with the level map, but maybe more than that if objects need more complex A.I. or bigger lookup tables. Then a few times more for scrolling purposes, loading new objects, that kind of thing. Then again in the NMI for different types of VRAM updates and audio updates. All things considered, I expect around 50 round trips to other banks every frame, so optimizing for speed does sound like a good idea in this case.
What are the solutions you have personally used, or seen other games using? What are their advantages and disadvantages?
- rainwarrior
Re: Trying to structure my code for PRG bank switching
Okay, well since you asked, here is how my upcoming BxROM (32k banking) game does bankswitching:
My game is based on "rooms", so collision data is unpacked to RAM between room transitions. As such, there's no need to bankswitch to a collision-data bank for each collision test. If you want to know specifically how my game is arranged, I have basically 5 types of bank:
1. Level data and unpacking code
2. CHR data and upload code
3. Music data and code
4. Character update code and data
5. "Main" bank containing game loops, player control and data, NMI handler, etc.
Banks 1/2 are mostly only used during room transitions, no performance issue there.
Bank 3 is called once per frame after the NMI handler finishes with the PPU.
Bank 4 is called once per frame to update all the characters.*
My "banked function call" trampoline is a piece of code at the same position in every bank that looks like:
Code:
    bus_conflict: .byte 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 ; byte n holds n
    bank_call:                ; in: A = target bank
      tay                     ; keep the target bank in Y
      lda bank                ; current bank number (bank is not updated in this snippet)
      pha                     ; save it for bank_return
      tya
      sta bus_conflict, Y     ; switch to the target bank
      jmp bank_entry          ; entry point of the bank that is now mapped in
    bank_return:
      pla                     ; bank number saved by bank_call
      tay
      sta bus_conflict, Y     ; switch back to it
      rts                     ; return to the original caller
So you lda #target_bank then jsr bank_call. When the bank is finished, it does jmp bank_return, and the whole operation is basically a "long" jsr. It adds 30 extra cycles compared to a non-banked jsr/rts, and you can use X as a parameter to the call. Not horrible, and it could be optimized for specific cases easily.
If I had to bank for collision, my first approach would probably just be to try it with the banked call and see what the overall performance was like. An extra ~1800 cycles for 60 collision calls (this is an upper bound estimate, I think most frames might have 15 or fewer collision calls) doesn't seem too bad to me; it's not wonderful, but might be good enough. If it was a problem, maybe I'd think about batching multiple collision calls into a single call (collision tests often come in groups). My collision routines are actually kind of slow anyway, since the data is bit-packed to save RAM.
During unpause, I have to reload part of the room to redraw the screen area covered by the pause overlay. My level bank's unpacking code has an option to unpack 64 bytes at a time, which gets placed in my NMI update buffer. You can use RAM to shuttle blocks of data between banks like this.
* As my game grew, eventually I had a lot of character code and needed a second bank for it, so at this point some character updates are behind an additional banked call. (There are at most 16 characters at once, and 2 functions for update and draw, so at most ~960 extra cycles? Typically far fewer, often 0.) There is extra overhead, but it's easy to prioritize characters that appear in CPU-heavy rooms to the "primary" character bank where they don't have to bankswitch. Any characters that don't appear in performance critical areas of the game (i.e. most of them) can be freely moved to the auxiliary bank. I could probably split the character banks into one for update code and one for draw code, which might eliminate any extra bankswitching, but I'd rather keep all the code for a single character in one place (no good reason for this other than I don't want to do the work to separate them now that they don't fit in one bank; not going to do that work until I actually have a performance problem to solve with it).
I don't really know how other games do it. Battletoads seemed to have all its music in a single bank. I expect banks are kind of dedicated to a particular kind of level, e.g. a bank for the vertical platforming levels, a bank for the vehicle riding levels, etc. but I haven't really looked into it.
Tepples suggested dedicating a little bit of RAM for a trampoline, or putting some simple unpacking code in RAM to avoid having to duplicate it in many banks (if ROM space is tight). However, I tend to think of RAM as more scarce than ROM, especially when you have PRG banking available, so I'd rather trade ROM space for RAM in most cases. (If I had WRAM it might be a different story.) Depends on your RAM budget though, it can be perfectly fine to use some of it for code or some large transfer buffer.