The Open-Source USB-NES + NES-FASTGFX project

digilogistist
Posts: 21
Joined: Sat May 05, 2012 9:28 pm

The Open-Source USB-NES + NES-FASTGFX project

Post by digilogistist »

Hi NESDEV!

Here you go:

https://github.com/digilogistist/The-Op ... ES-Project

This is the Open-Source USB-NES Project. It includes all the engineering documents needed to manufacture a new type of cartridge boardset I designed called NES-FASTGFX-01, as well as the USB-NES Lite, which presents NES carts as a flash drive over USB to a host machine, where emulators can load the ROM.NES file right off the cart. The USB-NES project was also created as a convenient way to do NES development with modern flash carts using file drag-and-drop on the host.

I'm releasing the project now because USB-NES needs to go to a higher place to become a better tool. In an effort to add NES gamepad ports to USB-NES, I spent way too much time trying to figure out how to implement a USB human interface device alongside the mass storage class on an STM32F103 micro; at this point I just don't know how to make it work on my own. I spent two weeks working with ChatGPT to try and figure it out, and at the end of the day, when I still couldn't get it to work, I asked ChatGPT whether the STM32F103 has enough RAM to actually implement both USB devices simultaneously, and the answer I gathered was along the lines of "not too sure about that". I feel stupid because I can't figure out this USB tech, despite its massive adoption in the wired tech world.

The novel NES-FASTGFX cartridge boardset design efficiently expands the number of PPU background tiles to 32768, using a whopping 512 KB of CHR RAM and a handful of easily sourced discrete logic chips. It is also possible to do 4-CPU-cycle CHR-RAM byte filling with this mapper, which makes 16-bit name table filling as easy as it gets on NES. However, the design is untested and uses a lot of chips, so it is hard to say with full certainty that it works properly and can be manufactured cheaply enough to warrant a potential production run for any new NES game software.
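As a quick sanity check of those numbers, here is a small Python calculation. It assumes only the standard NES tile format (8x8 pixels at 2 bits per pixel, so 16 bytes per tile); the variable names are made up for illustration.

```python
# Sanity check of the tile count quoted above (plain arithmetic, not project code).
CHR_RAM_BYTES = 512 * 1024            # 512 KB of CHR RAM, as stated in the post
BYTES_PER_TILE = 8 * 8 * 2 // 8       # 8x8 pixels, 2 bits per pixel = 16 bytes

tile_count = CHR_RAM_BYTES // BYTES_PER_TILE      # number of distinct tiles
index_bits = (tile_count - 1).bit_length()        # bits needed for a tile index
banks_4k = CHR_RAM_BYTES // (4 * 1024)            # number of 4 KB pattern banks

print(tile_count, index_bits, banks_4k)
```

So a full tile index needs 15 bits, and the same RAM can also be viewed as 128 banks of 4 KB.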
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Re: The Open-Source USB-NES + NES-FASTGFX project

Post by Memblers »

I've been looking at NES-FASTGFX, I like it, it seems like a nice design, especially for being all 74xx. I've spent a lot of time thinking about odd possible mappers, but I never considered anything like, e.g. LDA #$AA \ STA $2557 to put $55AA into VRAM, or having any use for internal VRAM and cart RAM simultaneously. I understood most of it, but I had to show it to kevtris to get the full view.

I do have one concern about the CTRL_CLK signal. When doing an absolute,X write, the CPU also does a dummy read before the write, and I think CTRL_CLK will be active for both of those, but I'm not 100% sure what the side effects would be in this case. I'm afraid it could be the same case as GTROM, when I (needlessly, in my case) did indexed writes to the mapper register in a mapper hack of Castlevania. The side effect looked just like 8x1-sized background glitches any time the game changed PRG banks, because the CHR bank bits were picking up junk data for that one cycle. It's kind of neat looking, but was not at all what I first expected to see. If I'm right, you may need to bring the PRG R/W signal into that logic.
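To make the concern concrete, here is a rough Python model (not 6502 code) of the five bus cycles of STA abs,X, including the dummy read at the not-yet-corrected address. The decode window $4800-$4FFF and the function names are hypothetical and not taken from the schematics; the point is only to show how an address-only decoder sees two accesses while one that also looks at R/W sees a single write.

```python
def sta_abs_x_bus_cycles(pc, base, x, a):
    """Return (address, is_write, data) for each bus cycle of STA base,X."""
    lo_sum = (base & 0xFF) + x
    unfixed = (base & 0xFF00) | (lo_sum & 0xFF)   # high byte not yet corrected
    effective = (base + x) & 0xFFFF
    return [
        (pc, False, None),                  # opcode fetch
        ((pc + 1) & 0xFFFF, False, None),   # operand low byte
        ((pc + 2) & 0xFFFF, False, None),   # operand high byte
        (unfixed, False, None),             # dummy read before the write
        (effective, True, a),               # the actual write
    ]

def count_ctrl_strobes(cycles, window, gate_with_rw):
    """Count bus cycles that would clock a register decoded in `window`."""
    lo, hi = window
    hits = 0
    for addr, is_write, _ in cycles:
        if lo <= addr <= hi and (is_write or not gate_with_rw):
            hits += 1
    return hits

CTRL_WINDOW = (0x4800, 0x4FFF)   # hypothetical decode range, for illustration only
cycles = sta_abs_x_bus_cycles(pc=0xC000, base=0x4800, x=0x10, a=0xAA)

print(count_ctrl_strobes(cycles, CTRL_WINDOW, gate_with_rw=False))
print(count_ctrl_strobes(cycles, CTRL_WINDOW, gate_with_rw=True))
```

In this model the ungated decoder is clocked twice per store and the R/W-gated one only once, which is the motivation for bringing R/W into the logic.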

That's the only potential problem I could see, the rest of my comments are just observations about how to potentially use these features.

To use the timestamp counter, it seems like the simple approach would be to poll it until it matches a specific range. There will be a race condition when reading more than 8 bits though, since the low count can overflow in-between reads. I guess you'd read high, low, then high (compare). There's a chance this could make some positions unreachable, if it's close to that overflow point. Not a big deal though, you just choose a better scanline ahead of time. It just needs the jitter to remain in hblank. Hopefully DMC samples can work with that. Or heck, if it's not being used for audio, one could use the timestamp to capture the upper overscan, and DMC IRQ + sprite zero poll to capture the lower overscan.
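The high, low, high read can be sketched in Python against a toy counter. This assumes a free-running 16-bit counter whose halves are read separately with no hardware latching, and that the counter advances by one on every read; the class and function names are invented for the example.

```python
class FreeRunningCounter:
    """Toy 16-bit counter that advances by `step` after every read."""
    def __init__(self, start, step=1):
        self.value = start & 0xFFFF
        self.step = step

    def _read(self, shift):
        byte = (self.value >> shift) & 0xFF
        self.value = (self.value + self.step) & 0xFFFF
        return byte

    def read_lo(self):
        return self._read(0)

    def read_hi(self):
        return self._read(8)

def read16_naive(counter):
    """Read high then low with no check; can tear across a carry."""
    hi = counter.read_hi()
    lo = counter.read_lo()
    return (hi << 8) | lo

def read16_hi_lo_hi(counter):
    """Read high, low, high and retry until both high reads agree."""
    while True:
        hi1 = counter.read_hi()
        lo = counter.read_lo()
        hi2 = counter.read_hi()
        if hi1 == hi2:
            return (hi1 << 8) | lo

torn = read16_naive(FreeRunningCounter(0x01FF))
safe = read16_hi_lo_hi(FreeRunningCounter(0x01FF))
print(hex(torn), hex(safe))
```

The naive read returns a value the counter never held, while the checked read retries once and returns a consistent one. The retry is also where the unreachable positions near the overflow point come from.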

This 512kB CHR RAM will be great for background art, but sprite art calls for a novel approach. The simpler one is to use the extra vblank time to write the tiles with an unrolled loop, like Battletoads, and always use the same 4kB page. An alternative would be to divide the 4kB into multiple slots, of whatever size you want. Say I arbitrarily want 1kB of tiles always mapped in, one 2kB tile bank with 8 pages, and another 1kB tile bank with 8 pages. That's 25kB of unique tile data. The number of unique page combinations is 8 * 8 = 64, and 64 * 4kB = 256kB, half the CHR RAM, leaving the other 256kB free for background. It will take some time to load up the CHR-RAM like that, though.
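The arithmetic for that slot layout, spelled out in Python. The slot sizes and page counts are the arbitrary ones from the paragraph above; the variable names are just for illustration.

```python
KB = 1024
CHR_RAM = 512 * KB

fixed_slot = 1 * KB                        # tiles that are always mapped in
slot_a_size, slot_a_pages = 2 * KB, 8      # 2 kB slot with 8 selectable pages
slot_b_size, slot_b_pages = 1 * KB, 8      # 1 kB slot with 8 selectable pages

# Unique tile data the artist actually has to provide.
unique_bytes = fixed_slot + slot_a_size * slot_a_pages + slot_b_size * slot_b_pages

# Every combination of (page A, page B) needs its own pre-built 4 kB bank.
combinations = slot_a_pages * slot_b_pages
sprite_bytes = combinations * 4 * KB
background_bytes = CHR_RAM - sprite_bytes

print(unique_bytes // KB, combinations, sprite_bytes // KB, background_bytes // KB)
```

The gap between 25 kB of unique data and 256 kB of occupied RAM is the duplication cost of pre-building every combination.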
digilogistist
Posts: 21
Joined: Sat May 05, 2012 9:28 pm

Re: The Open-Source USB-NES + NES-FASTGFX project

Post by digilogistist »

Hi Memblers,

Thanks for your criticism.

You are right about the extra CPU cycle when writing to the control register using an absolute indexed mode; the value written will be garbage for 1 cycle before the correct one. Six of the control register address lines control PRG bankswitching for flash+RAM, so I don't think that will be a problem (EXCEPT during the reprogramming of the PRG-FLASH; I'm not sure if that will be problematic). But for the two bits that select the mirroring mode, like you said, it will draw a really skinny glitch on the screen. My only suggestions to work around this glitch are either to set the control register during HBLANK, or to use self-modifying code on a plain absolute store instruction to change the high displacement bits. I admit, organizing the control word this way is a bit of a compromise, due to the lack of more elaborate CPU address decoding.

The timestamp counter is also registered, so every time you read the LO byte, both the LO and HI counts are latched, which can then be subsequently read. This means the timestamp returned will always be behind a few clock cycles. The proper read sequence for returning the first 16-bit count would be LO, HI, LO, where the first LO read data is discarded.
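Here is a toy Python model of the LO, HI, LO sequence. It is only one reading of the description above: it assumes a LO read returns the previously latched low byte and then captures the live 16-bit count into the latch, while a HI read returns the latched high byte. The class name, the method names, and the tick-per-read behaviour are invented for the sketch and may not match the real hardware exactly.

```python
class LatchedTimestamp:
    """Toy model: reading LO returns the old latch, then re-latches the counter."""
    def __init__(self, start=0, step=1):
        self.counter = start & 0xFFFF   # free-running count
        self.step = step                # how far the count moves per read
        self.latch = 0                  # registered copy seen by the CPU

    def _tick(self):
        self.counter = (self.counter + self.step) & 0xFFFF

    def read_lo(self):
        value = self.latch & 0xFF
        self.latch = self.counter       # both halves are captured together
        self._tick()
        return value

    def read_hi(self):
        value = (self.latch >> 8) & 0xFF
        self._tick()
        return value

def read_timestamp(ts):
    """LO (discarded, triggers the latch), then HI, then LO."""
    ts.read_lo()
    hi = ts.read_hi()
    lo = ts.read_lo()
    return (hi << 8) | lo

ts = LatchedTimestamp(start=0x01FF, step=4)
first = read_timestamp(ts)
second = read_timestamp(ts)
print(hex(first), hex(second))
```

Because both halves come from the same latched value, the result is consistent even when the low byte carries between reads; it is just a few cycles old by the time the CPU has it.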

Like you said, you can poll the register to wait for a certain line or position in the frame with (potentially) great precision. What I also had in mind was VBLANK code that could exceed the normal 2273 CPU clocks without requiring the code to be perfectly timed out. Ideally, you run your VBLANK routine, which may routinely take more PPU time for name table and pattern table filling, and then you read the timestamp counter to know what scroll values to program the PPU with, to create the effect of a stationary background scroll, while the top of the screen may occasionally flicker when the VBLANK engine needs to copy more data into PPU memory than would ordinarily be possible. If nothing else, the timestamp counter provides a way to sync the CPU up with the PPU with greater precision than is possible with an NMI event coming off a NOP slide.
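A back-of-the-envelope sketch of that idea in Python, using NTSC timing (341 PPU dots per scanline, 3 dots per CPU cycle, 20 VBLANK lines plus 1 pre-render line before the picture). It assumes, purely for illustration, that the timestamp can be converted to CPU cycles elapsed since the start of VBLANK; the real counter's units and zero point are not stated here. The function names are made up, and the scroll wrap assumes a single 240-line name table.

```python
DOTS_PER_LINE = 341        # NTSC PPU dots per scanline
DOTS_PER_CPU_CYCLE = 3     # NTSC PPU dots per CPU cycle
VBLANK_LINES = 20
PRE_RENDER_LINES = 1

def vblank_cpu_cycles():
    """CPU cycles available in VBLANK (rounded down)."""
    return VBLANK_LINES * DOTS_PER_LINE // DOTS_PER_CPU_CYCLE

def first_whole_scanline(cycles_since_vblank):
    """First visible scanline that has not started yet at this point in time."""
    dots = cycles_since_vblank * DOTS_PER_CPU_CYCLE
    dots_into_picture = dots - (VBLANK_LINES + PRE_RENDER_LINES) * DOTS_PER_LINE
    if dots_into_picture <= 0:
        return 0                                       # still before line 0
    return -(-dots_into_picture // DOTS_PER_LINE)      # integer ceiling

def compensated_scroll_y(base_scroll_y, start_line):
    """Y scroll to program so the background appears not to have moved."""
    return (base_scroll_y + start_line) % 240

late_line = first_whole_scanline(3000)
print(vblank_cpu_cycles(), late_line, compensated_scroll_y(100, late_line))
```

With a routine that ran for 3000 cycles, rendering would be switched on at line 6 with the Y scroll advanced by 6, leaving a few blank lines at the top of that frame.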

There are 128 banks of 4K for sprites, which may or may not be ideal for one's game needs. It's assumed that you would need to copy reused sprite assets across all the banks you plan to use for bankswitching, which is a luxury that using a large RAM chip can offer but like you said, could slow down the game. Moreover, this approach involves a lot of planning when decompressing graphics to RAM in-between screen transitions of the game, to minimize the traffic to the PPU during rendering.

But it is tough to take advantage of the PPU's 8 KB sprite mode, since this involves measuring out the timing of the PPU scanline by waiting for four consecutive reads with A13 high, and then counting out 126 or so read clocks from there to get to the beginning of the sprite fetch sequence. There's not much else to do with sprites, unless you could have a 256-element table of 15 bits each that would sit between the PPU and sprite memory, so that each tile index code could technically span the entire 512 KB of graphics RAM, but that's just getting crazy already.
digilogistist
Posts: 21
Joined: Sat May 05, 2012 9:28 pm

Re: The Open-Source USB-NES + NES-FASTGFX project

Post by digilogistist »

I should mention that if the extra CPU store cycle to the control register somehow interferes with reprogramming the flash, it is also possible to program the control register by using an absolute indexed read, which just does what you expect with no extra invalid write cycle. Of course, by doing this, the sprite bank will be filled with open bus data (maybe $48 or something), so you may want to switch the sprites off when programming the PRG flash using this approach.

In the case of reprogramming flash and needing sprites, they would be limited to the bank value that loads from open bus, as previously mentioned. To determine the actual value that loads into the sprite bankswitching register with a load from open bus, one byte in each of the 128 banks of the CHR-RAM will have to hold a sequence number that can be read back through the PPU I/O port at $2007 on game bootup. This method does require some setup, but it may work. However, it's hard to say how consistent the returned bus data will be, so I'd consider using, e.g., LDA $4800,X as a predictable way to program the sprite banking reg to be an interesting experiment at best.
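The sequence-number trick can be modelled on the host side in Python. This is a simulation only: CHR-RAM is a plain byte array, the probe offset is arbitrary, and the function names are invented; on the real cart the tagging and read-back would go through the PPU ports instead.

```python
BANK_SIZE = 4 * 1024
BANK_COUNT = 128
PROBE_OFFSET = 0x0FFF      # arbitrary byte within each bank reserved for the tag

def tag_banks(chr_ram):
    """Store each bank's own number at the same offset in every 4 KB bank."""
    for bank in range(BANK_COUNT):
        chr_ram[bank * BANK_SIZE + PROBE_OFFSET] = bank

def read_through_window(chr_ram, mapped_bank, offset):
    """Model a read of the 4 KB sprite window with some bank mapped in."""
    return chr_ram[mapped_bank * BANK_SIZE + offset]

def identify_bank(chr_ram, mapped_bank):
    """Recover the bank number by reading the tag through the window."""
    return read_through_window(chr_ram, mapped_bank, PROBE_OFFSET)

chr_ram = bytearray(BANK_SIZE * BANK_COUNT)
tag_banks(chr_ram)

unknown = 0x48             # pretend open bus latched this value into the register
print(hex(identify_bank(chr_ram, unknown)))
```

Once the game knows which bank the open-bus value selects, it can place the sprite tiles it needs during flash programming into that bank.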
aquasnake
Posts: 515
Joined: Fri Sep 13, 2019 11:22 pm

Re: The Open-Source USB-NES + NES-FASTGFX project

Post by aquasnake »

NES dumper/flasher + NES flash cart

which can't be a USB-NES; at least, others have already implemented this