Converting CHR RAM to RGBA textures

tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Converting CHR RAM to RGBA textures

Post by tepples »

Say I'm emulating an NES PPU by using SDL2's texture renderer to draw background and sprite characters to the screen. The intent is to draw a larger (32x32-pixel) character in place of each 8x8-pixel character in the ROM, high-level-emulating the game's background and metasprite drawing routines. I successfully did this on an NROM program where I had taken time to redraw the entire background and sprite character sets. My boss now wants me to repeat the process with an MMC3 game with 512 KiB of CHR ROM and 32 KiB of CHR RAM. Because the artists have not yet provided redrawn images for everything in the game, I must fill in the gaps with upscaled assets taken from whatever the game has written to CHR RAM. I warned the director about mixels, and the director told me he would prefer mixels over blank assets for a proof of concept.

A typical scene in the game uses up to 16 of the 32 CHR RAM banks: four for the playfield (banks 0-3), four for the parallax background (banks 4-7), four for the dialogue window (usually banks 12-15), and four for the sprites (chosen from banks 16-31). The playfield, parallax background, and dialogue each use a single set of four consecutive CHR RAM banks for an entire row of characters. The sprite banks in use can vary rapidly depending on which cel each actor is displaying at any given time. The first 16 characters of banks 0 and 4 are often animated, producing an effect similar to the moving grass and cherries of *Super Mario Bros. 2* or the moving coins and `?` blocks of *Super Mario Bros. 3*.

CHR RAM consists of 32 banks of 64 characters each, for 2048 characters in total. I had planned to convert the data in CHR RAM in real time to a texture that represents the contents of CHR RAM. Ideally, that'd be a single 256x512-pixel texture, or 131072 bytes at 8 bits per pixel, that I would draw with whatever palette is assigned to a particular sprite or a particular area of the background.
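
As a rough sketch of that conversion, the following Python decodes CHR RAM from the NES's 2-bit planar character format into one index byte per pixel. The texture layout (characters in memory order, 32 per row, which gives 256x512 for 32 KiB) and the name `decode_chr` are assumptions made for illustration; nothing in the game or in SDL2 dictates them.

```python
TILE_BYTES = 16      # one 8x8 character: 8 bytes of bit plane 0, then 8 of bit plane 1
TILES_PER_ROW = 32   # assumed layout: 32 characters * 8 px = 256 px wide
TEX_W = TILES_PER_ROW * 8


def decode_chr(chr_ram: bytes) -> bytearray:
    """Decode NES CHR data into one byte per pixel (values 0-3), row-major."""
    num_tiles = len(chr_ram) // TILE_BYTES
    tex_h = (num_tiles + TILES_PER_ROW - 1) // TILES_PER_ROW * 8
    out = bytearray(TEX_W * tex_h)
    for t in range(num_tiles):
        base = t * TILE_BYTES
        x0 = (t % TILES_PER_ROW) * 8
        y0 = (t // TILES_PER_ROW) * 8
        for row in range(8):
            lo = chr_ram[base + row]       # bit 0 of each pixel in this row
            hi = chr_ram[base + row + 8]   # bit 1 of each pixel in this row
            dst = (y0 + row) * TEX_W + x0
            for col in range(8):
                bit = 7 - col              # bit 7 is the leftmost pixel
                out[dst + col] = ((lo >> bit) & 1) | (((hi >> bit) & 1) << 1)
    return out
```

With a full 32768-byte CHR RAM image, `decode_chr` returns 131072 bytes, matching the 256x512 figure above.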

However, I've been warned in forum posts that several GPUs do not support indexed textures, only RGBA textures. This means the game would have to convert the data in CHR RAM to RGBA once for each palette that uses it, potentially all 8. This expands the data by a factor of 32: a factor of 4 for expanding 8-bit indexed color to RGBA8888, times a factor of 8 for repeating the process in all 8 palettes. The result is a 1024x1024-pixel texture occupying 4194304 bytes.
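
For reference, the size arithmetic can be checked in a few lines of Python. The variable names are only illustrative.

```python
CHARS = 32 * 64                      # 32 banks of 64 characters
PIXELS = CHARS * 8 * 8               # each character is 8x8 pixels
indexed_bytes = PIXELS * 1           # 8-bit indexed: one byte per pixel
rgba_bytes = PIXELS * 4              # RGBA8888: four bytes per pixel
all_palettes_bytes = rgba_bytes * 8  # one RGBA copy per palette

print(indexed_bytes)                        # 131072
print(all_palettes_bytes)                   # 4194304
print(all_palettes_bytes // indexed_bytes)  # 32
```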

I'd also have to repeat the conversion process every time a value in palette RAM (CGRAM) or CHR RAM changes. A naïve implementation would end up needing to reconvert and resend all 4 MiB quite often, as forum posts imply that partial updates to a texture are slow. What optimization is possible?
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Converting CHR RAM to RGBA textures

Post by rainwarrior »

Some GPUs can do indexed colour, but it was never a well-supported or widely used feature.

However, shader programs have long been able to implement their own indexing easily. You could have a second texture containing your palette, and look the colour up in that texture using the value fetched from the CHR texture.
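
To make the two-fetch idea concrete, here is a CPU-side model in Python of what such a fragment shader would do per pixel. It is only a sketch: `shade_pixel`, the list-of-rows palette layout, and the placeholder colours are assumptions for illustration, and a real shader would do the same two lookups with texture sampling instead of list indexing.

```python
def shade_pixel(chr_tex, tex_w, x, y, palette_tex, palette_row):
    """Model of a fragment shader that does its own palette indexing."""
    index = chr_tex[y * tex_w + x]           # first fetch: 0-3 from the CHR texture
    return palette_tex[palette_row][index]   # dependent fetch from the palette texture


# Placeholder data (not real NES colours): 8 palettes of 4 RGBA entries each.
palette_tex = [[(p * 32, i * 64, 0, 255) for i in range(4)] for p in range(8)]
chr_tex = bytes([0, 1, 2, 3])  # a 4x1 strip of colour indices

pixel = shade_pixel(chr_tex, 4, 2, 0, palette_tex, 5)  # index 2 drawn with palette 5
```

The CHR texture stays at one byte per pixel and is shared by all 8 palettes, so a palette change only touches the 32-entry palette texture.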
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Re: Converting CHR RAM to RGBA textures

Post by Dwedit »

Partial texture updates are not slow (edit: at least when using the original native APIs). Where did you hear that?
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Converting CHR RAM to RGBA textures

Post by calima »

I echo rainwarrior: write a shader to do your indexing. Also, do not bother with partial updates. A 128 KiB texture is so small that the overhead of fetching and locking over PCIe will be more than that of just re-uploading it wholesale.

The palette is so small that it should be sent as uniforms instead of a texture. That should also be faster than dependent texture reads. Not that speed matters for this; an HD NES is something 10-year-old mobile GPUs can handle.
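
A minimal sketch of packing the palette for upload as uniforms, in Python. It assumes the emulator already has a 64-entry table mapping NES colour numbers to RGB; `master_palette` and `pack_palette_uniforms` are hypothetical names, and the grey ramp in the usage lines is placeholder data, not the real NES colours. The flat list of 128 floats is shaped for a call such as `glUniform4fv` with a count of 32.

```python
def pack_palette_uniforms(palette_ram, master_palette):
    """Flatten 32 palette RAM entries into 128 floats (32 RGBA vec4 values)."""
    flat = []
    for entry in palette_ram[:32]:
        r, g, b = master_palette[entry & 0x3F]  # palette RAM holds 6-bit colour numbers
        flat.extend((r / 255.0, g / 255.0, b / 255.0, 1.0))
    return flat


# Placeholder master palette (assumption): a grey ramp, NOT the real NES colours.
master_palette = [(i * 4, i * 4, i * 4) for i in range(64)]
palette_ram = bytes(range(32))
uniform_data = pack_palette_uniforms(palette_ram, master_palette)
```

In the shader, the colour for index `i` of palette `p` would then be element `p * 4 + i` of the uniform array.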
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Re: Converting CHR RAM to RGBA textures

Post by Dwedit »

I'm just going by my past experience with DirectDraw and Direct3D on a 2006-era laptop with Radeon X300 graphics.

The bottleneck was CPU -> GPU transfers. What mattered was how much data was transferred. Once the data was in video memory, drawing it elsewhere was much faster. (Meanwhile, going the other way, GPU -> CPU transfers were so slow that you should avoid them at all costs.)

So for a system like that, you would keep the main copy of your texture in System Memory, then copy any dirty rectangles to video RAM.
As for which OpenGL API to use: use glTexSubImage2D to update a texture; glTexImage2D is for creating brand-new textures only. You can peek inside your graphics library to make sure it isn't picking the wrong API function.
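
As an illustration of the dirty-rectangle bookkeeping, this Python sketch turns a set of modified character numbers into one bounding rectangle in texture pixels. It assumes the characters are laid out 32 per row in the texture; `dirty_bounds` is a hypothetical helper, not part of any library. The four returned values correspond to the xoffset, yoffset, width, and height arguments of glTexSubImage2D.

```python
def dirty_bounds(dirty_tiles, tiles_per_row=32, tile_size=8):
    """Return (x, y, width, height) in pixels covering all dirty tiles, or None."""
    if not dirty_tiles:
        return None
    cols = [t % tiles_per_row for t in dirty_tiles]
    rows = [t // tiles_per_row for t in dirty_tiles]
    x0, x1 = min(cols), max(cols) + 1
    y0, y1 = min(rows), max(rows) + 1
    return (x0 * tile_size, y0 * tile_size,
            (x1 - x0) * tile_size, (y1 - y0) * tile_size)
```

The emulator would add a character number to the set on each CHR RAM write, upload the bounding rectangle once per frame, and then clear the set.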

---

Another thing that helps reduce GPU usage is Dirty Rectangles. Yes, seriously. Those are still a thing today.
This requires the Direct3D COPY or FLIP swap effect; DISCARD won't work.
Use the Scissor Rectangle for the overall bounding box, and use the Z buffer for the rest of the rectangles within that bounding box. Clear the Z buffer to 0, then draw quads with value 1 everywhere that drawing should be allowed.
Once you have your Scissor Rectangle set and your Z buffer prepared, you can simply attempt to draw the whole scene. Through GPU magic (clipping, culling, early depth test, etc.), everything outside of those rectangles won't be drawn.
After the scene is drawn, it's time to put it in the front buffer.
For Swap Effect COPY, you can provide a bounding rectangle and a list of all rectangles to copy, and the graphics driver will handle it.
For Swap Effect FLIP, it's more complicated. You keep track of dirty regions, but then you need to UNION them with the dirty region of the previous frame. For more than 2 buffers, UNION with that many more previous frames.
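
The FLIP bookkeeping could be sketched like this in Python. `FlipDirtyTracker` is a hypothetical class, rectangles are plain `(x, y, w, h)` tuples, and the "union" here is simply a de-duplicated list of rectangles rather than a merged region.

```python
from collections import deque


class FlipDirtyTracker:
    """Remember dirty rectangles from earlier frames for a flip-style swap chain."""

    def __init__(self, buffer_count=2):
        # The buffer about to be drawn is buffer_count frames old, so it is
        # missing the changes of the previous buffer_count - 1 frames.
        self.history = deque(maxlen=buffer_count - 1)

    def begin_frame(self, dirty_now):
        """Return this frame's rectangles plus those of the remembered frames."""
        to_draw = list(dirty_now)
        for past in self.history:
            for rect in past:
                if rect not in to_draw:
                    to_draw.append(rect)
        self.history.append(list(dirty_now))
        return to_draw
```

With two buffers, a rectangle dirtied in one frame is redrawn in that frame and the next, then dropped.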