NROM larger than 32K?
Moderators: B00daW, Moderators
Re: NROM larger than 32K?
As was said, cc65 can handle banking with some help from the user. I haven't used cc65 for C, only assembly, but I have macro code to enhance the .segment command and other macros that create a 'bank' property for all function's entry points. With MMC3 it seems ideal to setup the 8k banks at $8000 and $E000 as fixed banks, and the rest banked at $A000 and $C000. I can designate the banks as code or data. Code banks are tracked and calls verfied that they are banking/not banking as needed at build. This works as long as you follow the rule to not bank code banks in NMI, otherwise you need a currentCodeBank variable. Databanks are tracked with a variable and banked as needed, but each label is also assigned a bank property. This could be enhanced by tracking and verifying databanks at runtime with Lua. (Lua: see NintendulatorDX).
Re: NROM larger than 32K?
Both of these workarounds make the author think extensively about memory allocation. It's not strictly bad, but it slows you down and now you have to think about things beyond "writing the game".rainwarrior wrote:lidnariq: It's true that CC65 has no built in support for bankswitching, but it's not true that you can't do it without modifying CC65 itself.
Also, the cc65 linker doesn't (didn't?) like duplicated banks of compiled C code in the final result, so either you have to write your own linker, or any shared code has to be written in assembly. The point behind NROM-368 was just that it minimizes barrier to entry as much as possible without modifying the C compiler.
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: NROM larger than 32K?
No, you do not have to modify the linker. To put C code in more than one bank, just link each of those banks separately, and combine them after linking (e.g. with the DOS copy command). Either do this, or put the CRT and shared stuff in a fixed bank, and use pragmas to tell the compiler where various C code/data belongs.
Personally, though, I tried this to see if I could do it (and it worked okay), but I didn't actually have a project that needed it, so I went back to my AxROM one C bank method that lets me have 32k for just C code. It's much simpler, and so far 32k seems like a lot of space for the code. The amount of assembly required for this is pretty low- you need a couple of data readers (CHR uploader, map loader, etc.), and probably a music engine (easy to just use shiru's for this) for the ASM only banks, and the little trampoline stubs. I don't think memory management for this method is more complex than making an NROM game beyond some really trivial setup (for multiple C banks it's a little more of a problem). When I finish my project I will post source for this as an example.
I didn't make any comment about oversized NROM, other than I think it's an okay idea. It has an advantage of being the most simple for the programmer, yes. The AxROM-one-C-bank suggestion I think is the next simplest thing, and it has the advantage of a lot more available space for data, and not requiring new/updated emulators.
Personally, though, I tried this to see if I could do it (and it worked okay), but I didn't actually have a project that needed it, so I went back to my AxROM one C bank method that lets me have 32k for just C code. It's much simpler, and so far 32k seems like a lot of space for the code. The amount of assembly required for this is pretty low- you need a couple of data readers (CHR uploader, map loader, etc.), and probably a music engine (easy to just use shiru's for this) for the ASM only banks, and the little trampoline stubs. I don't think memory management for this method is more complex than making an NROM game beyond some really trivial setup (for multiple C banks it's a little more of a problem). When I finish my project I will post source for this as an example.
I didn't make any comment about oversized NROM, other than I think it's an okay idea. It has an advantage of being the most simple for the programmer, yes. The AxROM-one-C-bank suggestion I think is the next simplest thing, and it has the advantage of a lot more available space for data, and not requiring new/updated emulators.
Re: NROM larger than 32K?
lidnariq wrote:either you have to write your own linker,
Sure looks like what I said.rainwarrior wrote:just link each of those banks separately, and combine them after linking (e.g. with the DOS copy command).
One has to manually allocate which bank any given chunk of data or code is in. One has to manually rebalance things between banks when you exceed the amount available in a given bank. One has to manually issue bankswitching commands. Like I said, fiddly. Many moving parts requiring precision and easy to screw up.rainwarrior wrote:I don't think memory management for this method is more complex than making an NROM game beyond some really trivial setup (for multiple C banks it's a little more of a problem). When I finish my project I will post source for this as an example.
That said, the AxROM "32kb code, up to 224kb data" split is pretty elegant. How do you deal an NMI during data fetch? It looks like this requires making a separate tool to build a data blob and making tables of more-than-16bit constants for the code to refer to?
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: NROM larger than 32K?
That doesn't look like writing my own linker at all. What do you mean?lidnariq wrote:lidnariq wrote:either you have to write your own linker,Sure looks like what I said.rainwarrior wrote:just link each of those banks separately, and combine them after linking (e.g. with the DOS copy command).
The C setup is the same as it would be for NROM, so that part is exactly the same amount as fiddly. Every other bank basically has one kind of thing in it, there is only one segment per bank, which is one line in the .cfg and one line in the assembly file. Very hard to screw up. The bankswitch trampolines are almost trivial, it's not like there is 100 functions in each bank, there's basically one or two function calls per bank, very easy to set up. Harder than doing nothing yes, but not "fiddly" at all, and you get a ton of data space for this trade. (Also there are at most 8 banks.)lidnariq wrote:One has to manually allocate which bank any given chunk of data or code is in. One has to manually rebalance things between banks when you exceed the amount available in a given bank. One has to manually issue bankswitching commands. Like I said, fiddly. Many moving parts requiring precision and easy to screw up.rainwarrior wrote:I don't think memory management for this method is more complex than making an NROM game beyond some really trivial setup (for multiple C banks it's a little more of a problem). When I finish my project I will post source for this as an example.
My current solution for NMI is just to duplicate the NMI routine across all banks. The routine is all assembly and not very large, so the space isn't a problem. Alternatively you can put a push/pop bankswitch in all the banks except one, and lose a few cycles for the switch. Either way, it just becomes part of the common stub you already have to put at the end of each bank.lidnariq wrote:That said, the AxROM "32kb code, up to 224kb data" split is pretty elegant. How do you deal an NMI during data fetch? It looks like this requires making a separate tool to build a data blob and making tables of more-than-16bit constants for the code to refer to?
I am not sure what your question about a separate tool to build a data blob. What data blob? I have tools for making gfx/map/music data, but that's nothing to do with the banking scheme. What is a "more-than-16bit constant" for?
Last edited by rainwarrior on Thu Jun 13, 2013 2:25 pm, edited 1 time in total.
Re: NROM larger than 32K?
Action 53, a multicart menu compatible with BxROM, keeps the menu code in the last bank and disables NMI during data fetch. But that's because it has to preserve the NMI handler of the game in each bank.lidnariq wrote:That said, the AxROM "32kb code, up to 224kb data" split is pretty elegant. How do you deal an NMI during data fetch?
In the general case of a non-multicart, you probably want to do data fetches with code in the same bank as the data. So you could just put a bare-bones NMI handler in RAM that switches to bank 0, calls a 24-byte NMI handler in RAM, and switches back. (Or as rainwarrior ninja'd me, you could mirror these 24 bytes in each PRG bank.)
Code: Select all
cur_bank_number = $FFF0
known_zero = $FFF3 ; in reset handler
ram_nmi_trampoline:
pha
tya
pha
lda cur_bank_number
pha
lda #0
sta known_zero
jmp real_nmi_handler
nmi_handler_jmps_here_when_done:
pla
tay
sta identity,y
pla
tay
pla
rti
Code: Select all
nmi:
inc retraces
rti
What you meant, I'm guessing, is how you'd store the bank bytes. For Action 53, I made a separate data blob builder tool, but I had a very specific use for that: quickly adding and rearranging games in a multicart. In the general case, you could just use 24-bit far addresses, which any 65816-compatible assembler will support. AxROM, BxROM, and GxROM happen to have the same memory layout as Super NES LoROM: an array of 32 KiB banks, each at $8000-$FFFF. C code would go in bank 0 and specialized code for accessing particular data tables would go elsewhere.It looks like this requires making a separate tool to build a data blob and making tables of more-than-16bit constants for the code to refer to?
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: NROM larger than 32K?
Oh, I also had a thought about the oversized NROM:
It seems that beginning the usable space at $4800 has been favoured because of the simpler components for building the ROM, but as far as the mapper 0 extension goes, why not allow it all the way down as low as it goes, and let those concerned about a particular hardware implementation decide whether to pad to $4020 or $4800 or $6000 as they see fit? $4800 feels to me like it should be a recommendation, not a requirement.
It seems that beginning the usable space at $4800 has been favoured because of the simpler components for building the ROM, but as far as the mapper 0 extension goes, why not allow it all the way down as low as it goes, and let those concerned about a particular hardware implementation decide whether to pad to $4020 or $4800 or $6000 as they see fit? $4800 feels to me like it should be a recommendation, not a requirement.
Re: NROM larger than 32K?
ETA:; Sorta ignore it, this is how other hardware (has successfully) worked but I doubt anyone would be able to work around what this takes away without tearing their hair out.
Put it at $4000. Make it all reads come from ROM, and all writes go to the PPU. Only problem would be that the reads for NMI clear and such would be wonky. Although you could have a bus-selection chip(s) in between if needed. And you also can't use the PPU Sprite 0, but hey...making it clean isn't easy.
Put it at $4000. Make it all reads come from ROM, and all writes go to the PPU. Only problem would be that the reads for NMI clear and such would be wonky. Although you could have a bus-selection chip(s) in between if needed. And you also can't use the PPU Sprite 0, but hey...making it clean isn't easy.
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: NROM larger than 32K?
I don't think it's worth trying to use <$4020 but the rest of that space is perfectly usable, isn't it? The whole reason for this proposal is extra space. It'd be nice to squeeze out another almost-2k there, and let the people who want to make it in hardware decide how much circuit complexity vs. space they want.
Re: NROM larger than 32K?
For what it's worth:
This is a simple example, but you can track if your code should be banking or not with this idea even when importing from other pre-assembled modules. I'm not sure if you can do something like that in C.
Code: Select all
; ca65:
; create a function with a bank property. Example: myFunction::bank
.macro func name
.ident(.sprintf("%s_bank", .string(name))) = currentBank ; currentBank is set with an enhanced .segment directive
.proc name
bank = currentBank
.endmacro
.macro endfunc
.endproc
.endmacro
; -----------------------------
; export function name and its bank
.macro exportfunc name
.export name
.export .ident(.sprintf("%s_bank", .string(name)))
.endmacro
; import a function and its bank and create a bank property ( 'bank' only usable at link, but .assert works at link )
.macro importfunc name
.import name
.import .ident(.sprintf("%s_bank", .string(name)))
.scope name
bank = .ident(.sprintf("%s_bank", .string(name)))
.endscope
.endmacro
Re: NROM larger than 32K?
rainwarrior wrote:That doesn't look like writing my own linker at all. What do you mean?
Well, ok, you're linking by hand when you concatenate a bunch of 32kiB segments. What do I mean when I say that?rainwarrior wrote:I am not sure what your question about a separate tool to build a data blob. What data blob? I have tools for making gfx/map/music data, but that's nothing to do with the banking scheme. What is a "more-than-16bit constant" for?
The role of the linker is to take a symbol (say, "my_map") and convert it into an address. Since cc65 doesn't natively support addresses longer than 16 bits, you have to manage all the higher order bits. You could do this with an automated tool that will take all of the parts and move them around between 32kiB slices (specifically assuming your AxROM-style banking) and keep track of them. This could be a simple thing: such a packer (or "linker") could easily spit out a data blob and a header file with 4+16 bit constants that would then be #included so that the program knew where to load its resources from. Or (as I think you're implying) you can manually keep track of which bank and address any given resource lives in.
Or you could automate less of it. Either way, however, something needs to keep track of it.
Why is this NOT the same as using cc65 with NROM? Because there's no bank, and cc65 natively keeps track of the other 16 address bits. You just compile a bunch of files and link them and either they fit, or they don't and you figure out what to rewrite or cut.
Back to the original thread:
The reason for $4800-$ffff was that it's a single IC solution. Routing involves A11-A15 and M2. 32 and 64 kiB PROMs appear to be approximately the same cost. So the incremental cost is 40¢ (looking at current prices as of today) for an additional 14kiB of addressable ROM space. This is the exact same cost as for 64 kiB AMROM, but it doesn't require any sophistication in working with cc65.
On the other hand, adding 2016 more bytes to that takes another IC (a 74'4078) as well as routing six more address lines. While getting 16352 bytes for 80¢ is nice, getting 14336 for 40¢ is usually better. And since iNES requires the 16kiB quantization, it's better to enforce the more restrictive version so there are no surprises. (See also: emulators should enforce bus conflicts). Obviously, the UNIF encapsulation could just provide a PRG0 section of 49120 bytes to specify the larger version. But iNES doesn't have that option.
Ultimately, it can't start any lower than $4020 without routing M2, 13-16 address lines, and most likely R/W. (R/W is necessary if we're going to overlay the APU registers. Otherwise there will be bus conflicts on writes) At this point, the logical equation describing when the ROM should be enabled is too complex to do in discrete logic, and the cheapest CPLD or PAL is $1. Spending the additional cost on at most 29 bytes of ROM is not clearly useful in comparison to other things you could do with the CPLD.
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: NROM larger than 32K?
Aside from the "main" C bank, each bank has a trampoline entry point at a fixed address (part of the stuff I put in every bank). I don't do any managing of resource locations, the resources within a bank are only ever consumed by code within that bank. Basically I put a few arguemnts on zp, switch to the desired bank and jump on the trampoline. In C that's abstracted away, and in practice the code I write looks like: "load_chr_block(3,2);" that loads 1k CHR data block "2" into 1k CHR-RAM slot "3", implemented by the trampoline in a bank that holds all my CHR data and assembly CHR-RAM upload code. Similarly the map bank trampoline just dumps a small block of map data into a designated place in RAM. The music bank trampoline has a couple of different functions, but in the C code portion it's just basically "play_sound(7);".
If you do want to manage resource locations inter-bank, which I don't, it's actually possible to get the CC65 toolchain to generate and use the kinds of labels and addresses you need, but it is fairly complicated and I don't think I'd get much use out of it. As I said, I'm still a long way off of filling up 32k with C code (with no data in there, there's a lot of room left for code), and the rest are basically just data banks with very little code in them.
That's also why I didn't really proceed much father looking at C in multiple banks. I verified that it can be done with separate linking and block concatenation, but I never ended up with a use for it. Once I started partitioning my data so it doesn't have to share space with the C code, I've had lots of room for that code. And, well, this is why I think it might be an attractive solution for people who are considering expanded NROM as well. Being able to move your sound/map data out of the main bank likely makes at least as much room for your C code as expanding NROM does, and potentially gives you A LOT more room for data.
Anyhow, my suggestion that the expanded NROM spec should go down to $4020 is that on the emulator side this is no harder than supporting it at $4800, and is "backward" (upward?) compatible with the smaller memory sizes. I understand it makes a less complicated circuit, and I already said so, but I don't see why the emulator implementations should be limited like that. Homebrewers can still pad to $4800, build their cart with the simple circuit, and have it emulate fine even if the emulator goes down to $4020. Why restrict that person who really needs to squeeze in an extra 2k?
There's no "surprises" to avoid here, if someone knows how to make a repro for an expanded NROM they should know enough to figure out how much padding is in the ROM to determine what circuit they'd need to build. There's already a bunch of factors to look at when building an NROM repro, this is just one more to learn. There's going to be somebody for whom an extra 2k of space is worth 40 cents, and we shouldn't pre-emptively take that option away from them. I'm saying that a strong recommendation is better than a frustrating arbitrary limitation, if emulators adopted $4800 instead of $4020. It doesn't take away from the people who want to use $4800 for hardware reasons to also support to $4020 in emulators.
If you do want to manage resource locations inter-bank, which I don't, it's actually possible to get the CC65 toolchain to generate and use the kinds of labels and addresses you need, but it is fairly complicated and I don't think I'd get much use out of it. As I said, I'm still a long way off of filling up 32k with C code (with no data in there, there's a lot of room left for code), and the rest are basically just data banks with very little code in them.
That's also why I didn't really proceed much father looking at C in multiple banks. I verified that it can be done with separate linking and block concatenation, but I never ended up with a use for it. Once I started partitioning my data so it doesn't have to share space with the C code, I've had lots of room for that code. And, well, this is why I think it might be an attractive solution for people who are considering expanded NROM as well. Being able to move your sound/map data out of the main bank likely makes at least as much room for your C code as expanding NROM does, and potentially gives you A LOT more room for data.
Anyhow, my suggestion that the expanded NROM spec should go down to $4020 is that on the emulator side this is no harder than supporting it at $4800, and is "backward" (upward?) compatible with the smaller memory sizes. I understand it makes a less complicated circuit, and I already said so, but I don't see why the emulator implementations should be limited like that. Homebrewers can still pad to $4800, build their cart with the simple circuit, and have it emulate fine even if the emulator goes down to $4020. Why restrict that person who really needs to squeeze in an extra 2k?
There's no "surprises" to avoid here, if someone knows how to make a repro for an expanded NROM they should know enough to figure out how much padding is in the ROM to determine what circuit they'd need to build. There's already a bunch of factors to look at when building an NROM repro, this is just one more to learn. There's going to be somebody for whom an extra 2k of space is worth 40 cents, and we shouldn't pre-emptively take that option away from them. I'm saying that a strong recommendation is better than a frustrating arbitrary limitation, if emulators adopted $4800 instead of $4020. It doesn't take away from the people who want to use $4800 for hardware reasons to also support to $4020 in emulators.
Re: NROM larger than 32K?
<facepalm> Thanks for explaining. Somehow I kept on thinking you were using a more explicit symbol table.rainwarrior wrote:Aside from the "main" C bank, each bank has a trampoline entry point at a fixed address (part of the stuff I put in every bank). I don't do any managing of resource locations, the resources within a bank are only ever consumed by code within that bank. Basically I put a few arguemnts on zp, switch to the desired bank and jump on the trampoline. In C that's abstracted away, and in practice the code I write looks like: "load_chr_block(3,2);" that loads 1k CHR data block "2" into 1k CHR-RAM slot "3", implemented by the trampoline in a bank that holds all my CHR data and assembly CHR-RAM upload code. Similarly the map bank trampoline just dumps a small block of map data into a designated place in RAM. The music bank trampoline has a couple of different functions, but in the C code portion it's just basically "play_sound(7);".
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: NROM larger than 32K?
Well, I understand the ideal goal is to have the C layer abstract away the idea of banks and just transparently figure out how to execute code or read data between banks, but yes, that kind of thing is either extremely onerous to accomplish in CC65 and/or would require significant modification to the compiler and linker. Movax12 probably has some interesting macro solutions for CA65, but I don't really like complex macros, myself.
There are some fairly simple paradigms for doing bankswitching in CC65/CA65, which require some organization and at least a little bit of assembly knowledge, but IMHO this works pretty well. Not really for the newbie who just wants to write in C and could use a bit of extra space (for this person expanded NROM is nice), but for someone with a little bit of comfort in assembly I think this is a lot more versatile than just having another 16k to work with. I'd like to give a good open source example, but I'm not ready to share it yet, probably in a few months.
There are some fairly simple paradigms for doing bankswitching in CC65/CA65, which require some organization and at least a little bit of assembly knowledge, but IMHO this works pretty well. Not really for the newbie who just wants to write in C and could use a bit of extra space (for this person expanded NROM is nice), but for someone with a little bit of comfort in assembly I think this is a lot more versatile than just having another 16k to work with. I'd like to give a good open source example, but I'm not ready to share it yet, probably in a few months.
Re: NROM larger than 32K?
Five months later, did anyone ever get around to implementing this in a cart PCB?