I worked on the sprites today and yesterday. First I copy-pasted them all into a single image and started trying to figure out the palette sets. I managed to get a pretty good compromise once again. The "Superb Joe" and Leopardman sprites are problematic though, since some of the colors they use aren't used anywhere else. Technically they can be drawn with the current palette sets, but the sprites would have to be layered and would end up using colors from three different palette sets, so they would probably flicker like hell, especially since Superb Joe and Leopardman need to be on screen simultaneously.
Anyway, next I assigned the palettes to each spritesheet, split them up with ImageMagick, and fed them to my sprite converter. That gave me a bunch of .SPR (the sprite definition) and .CHR files. I took the most common sprites (excluding the Master Y, Dr. Tary, Leopardman and Superb Joe sprites, since those won't be needed simultaneously with the normal player sprite), combined them into a single .CHR file, optimized the tileset, and guess what: it came out at EXACTLY 256 tiles (4KB). Pretty funny coincidence. Here's an image of the tileset in its current state (the tiles are in pseudo-random order due to the way the optimizer works):
You may notice that some of the tiles are very similar, but not quite identical. This is a small problem with the sprite converter: it works on individual images, so it doesn't always produce optimal results when the size of the global tileset is a concern. Since it's a genetic algorithm, it's also somewhat random in how the sprites are placed (several placements can give the "optimal" result, i.e. the same number of sprites used).
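To put a number on "very similar, but not quite", here is a rough Python sketch that lists near-duplicate tiles in a .CHR blob. It relies on the standard NES tile layout (16 bytes per 8x8 tile, two bit planes). The function names and the `max_diff` threshold are made up for illustration, it is not part of my converter, and it ignores the fact that sprites can also be flipped.

```python
from itertools import combinations

TILE_BYTES = 16  # one 8x8 NES tile: 8 bytes of bit plane 0, then 8 bytes of bit plane 1


def decode_tile(tile):
    """Return the 64 two-bit pixel values of one 16-byte tile, row by row."""
    pixels = []
    for row in range(8):
        lo, hi = tile[row], tile[row + 8]
        for bit in range(7, -1, -1):  # bit 7 is the leftmost pixel
            pixels.append(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1))
    return pixels


def near_duplicates(chr_data, max_diff=4):
    """List (i, j, differing_pixels) for tile pairs differing in 1..max_diff pixels.

    max_diff is an arbitrary illustrative threshold, not a tuned value.
    """
    tiles = [decode_tile(chr_data[i:i + TILE_BYTES])
             for i in range(0, len(chr_data), TILE_BYTES)]
    result = []
    for (i, a), (j, b) in combinations(enumerate(tiles), 2):
        diff = sum(1 for p, q in zip(a, b) if p != q)
        if 0 < diff <= max_diff:  # skip exact duplicates, keep near misses
            result.append((i, j, diff))
    return result
```

Running `near_duplicates(open("sprites.chr", "rb").read())` on a 256-tile file (the file name is just an example) compares 32640 pairs, which is slow-ish but fine for an offline tool.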
There are some ways to fix this problem. I could do a first pass that looks for similarities between images, splits the images up along those areas, converts the pieces, and then recombines them. This should produce a slightly smaller final tileset, possibly at the cost of the metasprites using a few more 8x8 sprites.
Another way to fix it would be to modify the genetic algorithm so that placements which re-use existing tiles (from the global tileset) get some extra "fitness" points. I'm probably not going to bother with that for now.
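For what it's worth, the fitness tweak could look something like the sketch below. This is not how my converter scores placements today; `fitness`, `placement_tiles`, `global_tiles` and `reuse_bonus` are hypothetical names, and the base score (fewer hardware sprites is better) is a simplifying assumption.

```python
def fitness(placement_tiles, global_tiles, reuse_bonus=0.1):
    """Score one candidate placement; higher is better.

    placement_tiles: list of 16-byte tiles, one per 8x8 hardware sprite used.
    global_tiles: set of tiles already present in the shared tileset.
    reuse_bonus: hypothetical weight given to each distinct reused tile.
    """
    sprite_cost = len(placement_tiles)  # assumed base criterion: sprite count
    reused = sum(1 for tile in set(placement_tiles) if tile in global_tiles)
    return -sprite_cost + reuse_bonus * reused
```

With a small `reuse_bonus` the bonus acts mostly as a tie-breaker between placements that use the same number of sprites. A larger value would start trading extra hardware sprites for a smaller tileset.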
One way to get more wiggle room in the sprite bank would of course be to upload some of the sprite frames dynamically. Player sprites would be a good candidate for this. They take 8 tiles in the worst case, I think, so that would be 8 * 16 = 128 bytes to upload per frame, or 128 * 8 = 1024 CPU cycles with the fastest possible code (counting 8 cycles per byte). Considering the only other PPU uploads I need are the OAM and the palette, it should be very much doable.
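Here is the same back-of-the-envelope budget as a small Python sketch. The 8 cycles per byte is the figure from the estimate above. The vblank length (roughly 2273 CPU cycles on NTSC), the 513-cycle OAM DMA, and copying the 32-byte palette at the same per-byte rate are assumptions I'm adding for illustration, and loop and address setup overhead is ignored.

```python
TILE_BYTES = 16        # one 8x8 tile at 2 bits per pixel
CYCLES_PER_BYTE = 8    # per-byte cost used in the estimate above
VBLANK_CYCLES = 2273   # assumption: approximate NTSC vblank length in CPU cycles
OAM_DMA_CYCLES = 513   # assumption: OAM DMA cost (513 or 514 cycles)
PALETTE_BYTES = 32     # full background + sprite palette


def upload_cycles(byte_count, cycles_per_byte=CYCLES_PER_BYTE):
    """CPU cycles needed to copy byte_count bytes to the PPU."""
    return byte_count * cycles_per_byte


tile_cycles = upload_cycles(8 * TILE_BYTES)    # 8 player tiles
palette_cycles = upload_cycles(PALETTE_BYTES)  # assumes the same copy rate
total_cycles = tile_cycles + OAM_DMA_CYCLES + palette_cycles
remaining_cycles = VBLANK_CYCLES - total_cycles

print(tile_cycles, total_cycles, remaining_cycles)
```

Under those assumptions the tile upload, OAM DMA and palette together stay below the vblank budget with a few hundred cycles to spare, but a PAL target or slower copy code would change the numbers.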
I should modify my sprite converter (image -> sprite definition) tool to support 8x16 sprites, just to see what kind of results that would give.
...
On another topic, I've been thinking about how to implement the "ghost" from the original game. You see, there's a 5 minute time trial mode in the Flash game. If you complete it, the next time you race it you'll see a ghost of your fastest run, with Leopardman (:)) playing the role of the ghost. Or so I've been told; I haven't actually been able to beat the time trial in 5 minutes yet.
So the problem is how to squeeze 5 minutes' worth of controller input and other state data so that it fits in the WRAM. (I think WRAM is absolutely necessary for this, especially given that there have to be two copies of the data: one for the current run and one for the fastest run.) The data would need to be compressed and decompressed in real time. Compressing just the controller input wouldn't be enough, because we don't have easy access to the collision data and so on of each screen all the time (and the player might be on a completely different screen than the ghost).
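To get a feel for how tight this is, here is a quick Python sketch of the storage budget plus a toy run-length coder for a per-frame state byte. The 60 frames per second and the 8KB of WRAM are assumptions (typical, but not decided for this project), the function names are made up, and plain RLE is only a stand-in for whatever scheme would actually be needed.

```python
def bits_per_frame(minutes=5, fps=60, wram_bytes=8192, copies=2):
    """Bits available per frame if all WRAM were split evenly between the runs."""
    frames = minutes * 60 * fps
    return (wram_bytes // copies) * 8 / frames


def rle_encode(samples):
    """Encode a bytes-like sequence as (run_length, value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(samples):
        value = samples[i]
        run = 1
        while i + run < len(samples) and samples[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(data):
    """Inverse of rle_encode."""
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)
```

With those defaults `bits_per_frame()` comes out at a bit under 2 bits per frame, so whatever state gets recorded has to change rarely (or be sampled less often than every frame) for a scheme like this to fit.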
This is certainly not top priority, but it's fun to think about.