Fast 3D blitting on Super FX
Posted: Wed Mar 15, 2017 3:59 pm
by ARM9
Re: Fast 3D blitting on Super FX
Posted: Thu Mar 16, 2017 11:34 pm
by 93143
Ah. You have at least a rudimentary 3D engine running on the Super FX. Good stuff. I'm afraid 3D is still mostly voodoo to me; flat perspective in Mode 7 is about as far as I've gone...
How much compute time does that scene take? I can't imagine the chip is anywhere near pegged... Also, why does pressing Start cause the screen to momentarily black out?
Are you working on a game?
Re: Fast 3D blitting on Super FX
Posted: Fri Mar 17, 2017 4:00 pm
by ARM9
The frame rate depends heavily on the amount of polygons that actually get drawn, the rest is relatively fast.
I switched to 2bpp and it's just about breaking 60fps if the objects don't get too large, you can see it dip down to 30 as they get closer to the camera.
https://www.dropbox.com/s/448s5i5wi52qq ... s.sfc?dl=0
When you press start it just resets the framebuffer.
That was actually an older engine. I wrote a better engine recently for a thing I'm working on
https://www.youtube.com/watch?v=MjgdSikMXSA
It's rendering too fast since I had to record with no$sns (30fps, actual frame rate is 15). You can just set the playback to 0.5 for the full superfx experience.
This model has around 300 polygons which isn't too bad, 20fps should be doable even with simple lighting.
One problem with lighting at 4bpp is the palette limit, 15 colors just isn't cutting it for these scenes.
Fill rate is the main bottleneck, and 8bpp doubles the time that takes. But if there's a fast way to build the palette and map each frame after determining which tiles need which colors, you could do up to 127 colors at 4bpp.
Could just cut the resolution and/or start targeting PAL to offset the increased DMA.
Re: Fast 3D blitting on Super FX
Posted: Fri Mar 17, 2017 5:47 pm
by Drew Sebastino
ARM9 wrote:The frame rate depends heavily on the amount of polygons that actually get drawn, the rest is relatively fast.
Wait, really? I would have thought it would be the size. I understand that even the "relatively fast" stuff still has the potential to be enough to put it over the edge, but you also said the framerate drops when the objects are closer to the camera.
How do you program filling in flat-shaded polygons? Is it possible to look at the left side of the polygon, then the right side on the same row of pixels, do a loop to fill out the row, and then move down one and check the bounds of the polygon again?
Re: Fast 3D blitting on Super FX
Posted: Sat Mar 18, 2017 1:48 am
by Bregalad
Heh, it's pretty cool, congratulations ! How can you test it on the hardware ?
Re: Fast 3D blitting on Super FX
Posted: Sat Mar 18, 2017 7:51 am
by KungFuFurby
Sometimes I wonder if under certain circumstances (especially if movement is small), it would be speedier to determine which pixels have actually changed and simply modify those instead? Just thinking of theory, that's all.
Re: Fast 3D blitting on Super FX
Posted: Sat Mar 18, 2017 8:24 am
by ARM9
Espozo wrote:
Wait, really? I would have thought it would be the size. I understand that even the "relatively fast" stuff still has the potential to be enough to put it over the edge, but you also said the framerate drops when the objects are closer to the camera.
It'd be more efficient to render one large polygon than the same area comprised of several smaller polygons.
The larger a polygon gets the longer it'll take to draw, and with fill rate being the largest bottleneck you want to draw as few of them as possible. Given a sufficiently complex scene you're likely to draw on most of the screen and you'll have a lot of overdraw unless you discard as many polygons as possible.
It's hard to sustain 60fps at a decent resolution. I could push a few more polygons with the new engine, but there's a hard cap on the number of pixels that the hardware can draw in the allotted time.
Some numbers for 21 MHz, 224x192, 2bpp @ 60fps NTSC (assuming all loops are cached):
- You have about 257,500 cycles to work with each frame on the superfx.
- Plotting 8 pixels without crossing a pixel cache (pcache) boundary (when x wraps to 0 in `x mod 8`) takes 10 cycles (16 with LOOP;PLOT sequence used for span fill).
- A pcache miss stalls for 10 cycles.
- Clearing the framebuffer takes roughly 53,760 cycles.
- Filling the entire screen using LOOP;PLOT sequence takes about 86,000 cycles in the best case (plot x from 0-223 on each line).
The latter two consume over 50% of the cycles we're working with. But in that time you also have to copy, transform and project your polygons, do visible surface determination (which is an entire rabbit hole in itself, I've yet to find an optimal solution for arbitrary geometry on the superfx) and clipping.
Due to the nature of a 3D scene you're unlikely to reach the optimal 86,000 raster cycles even when not filling the entire screen. Lots of pcache misses, some overdraw is unavoidable without perfect vsd (zbuffer is expensive in time and impractical in space, coverage buffer is potentially expensive both in time and space).
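As a sanity check on the figures above, here is a small Python sketch that rebuilds them from the resolution and the per-operation costs quoted in the list. The 10-cycles-per-word clear cost is an assumption inferred from 53,760 cycles over a 5,376-word buffer, not something stated in the post, and all names are made up for illustration.

```python
WIDTH, HEIGHT, BPP = 224, 192, 2
CYCLES_PER_FRAME = 257_500       # per-frame budget quoted above
LOOP_PLOT_CYCLES_PER_8PX = 16    # LOOP;PLOT cost per 8 pixels, quoted above
CLEAR_CYCLES_PER_WORD = 10       # assumption: inferred from 53,760 / 5,376 words


def framebuffer_bytes(width, height, bpp):
    """Size of a planar framebuffer in bytes."""
    return width * height * bpp // 8


def clear_cycles():
    """Cycles to clear the framebuffer one 16-bit word at a time."""
    words = framebuffer_bytes(WIDTH, HEIGHT, BPP) // 2
    return words * CLEAR_CYCLES_PER_WORD


def best_case_fill_cycles():
    """Cycles to fill every line from x=0 to x=223 with no pcache misses."""
    groups_per_line = WIDTH // 8
    return groups_per_line * LOOP_PLOT_CYCLES_PER_8PX * HEIGHT


def fixed_cost_fraction():
    """Share of the frame budget spent on clear plus a best-case full fill."""
    return (clear_cycles() + best_case_fill_cycles()) / CYCLES_PER_FRAME


if __name__ == "__main__":
    print(clear_cycles(), best_case_fill_cycles(), fixed_cost_fraction())
```

Under these assumptions the clear and the best-case fill together come to a bit over half of the budget, which matches the "over 50%" estimate.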
Espozo wrote:
How do you program filling in flat-shaded polygons? Is it possible to look at the left side of the polygon, then the right side on the same row of pixels, do a loop to fill out the row, and then move down one and check the bounds of the polygon again?
That's the gist of the algorithm, determine the span of each row in a polygon and plot them. I use the slope of the edges to determine the bounds.
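For readers who want to see the span approach spelled out, here is a minimal Python sketch of a scanline fill for a convex polygon. It is not the engine's actual code: it computes the left and right bounds of each row from the edge slopes and fills the span with a plain loop, where the Super FX version would use the LOOP;PLOT sequence. The function name, the list-of-lists framebuffer, and the pixel-centre sampling rule are all illustrative assumptions.

```python
import math


def fill_convex_polygon(fb, pts, color):
    """Fill a convex polygon (list of integer (x, y) vertices) into fb.

    fb is a list of rows, each a list of pixel values. A pixel is filled
    when its centre lies inside the polygon.
    """
    height, width = len(fb), len(fb[0])
    n = len(pts)
    y_min = max(min(y for _, y in pts), 0)
    y_max = min(max(y for _, y in pts), height)
    for y in range(y_min, y_max):
        yc = y + 0.5  # sample each row at its vertical centre
        xs = []
        for i in range(n):
            (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
            if y0 == y1:
                continue  # horizontal edges never bound a span
            if min(y0, y1) <= yc < max(y0, y1):
                slope = (x1 - x0) / (y1 - y0)  # change in x per row
                xs.append(x0 + (yc - y0) * slope)
        if len(xs) < 2:
            continue
        # Convert the span bounds to pixel columns and clip to the buffer.
        left = max(math.ceil(min(xs) - 0.5), 0)
        right = min(math.ceil(max(xs) - 0.5), width)
        for x in range(left, right):
            fb[y][x] = color


if __name__ == "__main__":
    fb = [[0] * 8 for _ in range(8)]
    fill_convex_polygon(fb, [(0, 0), (8, 0), (0, 8)], 1)
    for row in fb:
        print("".join(str(p) for p in row))
```

A real rasterizer would step the edge x values incrementally by the slope instead of recomputing the intersection on every row; the result is the same, the incremental form just avoids a multiply per edge per line.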
Bregalad wrote:Heh, it's pretty cool, congratulations ! How can you test it on the hardware ?
Hey, thanks! Just sacrifice a copy of your favourite superfx game and replace the rom. The old engine only uses 32K ram so no need to download more ram.
The new engine uses 64K so far, might end up using 128K.
KungFuFurby wrote:Sometimes I wonder if under certain circumstances (especially if movement is small), it would be speedier to determine which pixels have actually changed and simply modify those instead? Just thinking of theory, that's all.
With coverage buffers you could clear just the areas that were drawn last frame. But I'm not convinced it'd be faster than a `store word` loop in my case since planar format mandates using the slightly slower PLOT instruction to clear spans, and scenes fill most of the screen. Any ideas?
Re: Fast 3D blitting on Super FX
Posted: Sun Mar 19, 2017 1:26 pm
by 93143
ARM9 wrote:That was actually an older engine. I wrote a better engine recently for a thing I'm working on
This model has around 300 polygons which isn't too bad, 20fps should be doable even with simple lighting.
Now that's what I'm talking about. What sort of differences are there between the old engine and the new one?
I don't mean to disparage the older demo, because it's still cool to see a hobbyist get a real 3D engine running on the Super FX. But the fact remains that if you ignore fill rate, 300 triangles at 15 fps (or, better yet, 20 fps with lighting) is just way more impressive than 18 triangles at 30 fps... Fill rate is kinda important, though...
Anybody have reasonably hard numbers (within an order of magnitude or so would be nice) on polygon count in Star Fox 2? Virtua Racing? The answers I get from the internet seem to have a log-random distribution, though I admit I find some numbers more plausible than others...
One problem with lighting at 4bpp is the palette limit, 15 colors just isn't cutting it for these scenes.
Yeah, that's why Star Fox limits itself to orange, blue, and gray. You can have five shades of each, plus dither.
I found this out when I was messing with the idea of a TIE Fighter port. Gray, blue, red, green, and yellow, with engine glow cycle in three of those and frame rate being very important for gameplay - the lighting in that game would have looked rough...
Stunt Race FX looks much more balanced, because it's 8bpp, which is probably what murdered the frame rate...
Just sacrifice a copy of your favourite superfx game and replace the rom.
I'm hoping an FPGA Super FX gets developed at some point. I'll probably end up ruining a GSU2 cartridge (or at least hiring somebody to do it for me) to make a devcart, so I can be sure my game is real, but making additional copies would get increasingly unjustifiable without a sustainable source of chips.
Re: Fast 3D blitting on Super FX
Posted: Sun Mar 19, 2017 5:22 pm
by Drew Sebastino
93143 wrote:Yeah, that's why Star Fox limits itself to orange, blue, and gray. You can have five shades of each, plus dither.
How possible would it be to have 64 colors via a 4bpp layer and a 2bpp layer with color math (possibly even just varying shades of gray with color subtraction)? I think that would be a good compromise between the number of colors and performance, but I don't know if having two buffers like that will actually hurt performance even if it reduces the amount of data you have to transfer to vram.
Re: Fast 3D blitting on Super FX
Posted: Mon Mar 20, 2017 3:14 am
by Stef
ARM9 wrote:
Some numbers for 21 MHz, 224x192, 2bpp @ 60fps NTSC (assuming all loops are cached):
- You have about 257,500 cycles to work with each frame on the superfx.
- Plotting 8 pixels without crossing a pixel cache (pcache) boundary (when x wraps to 0 in `x mod 8`) takes 10 cycles (16 with LOOP;PLOT sequence used for span fill).
- A pcache miss stalls for 10 cycles.
- Clearing the framebuffer takes roughly 53,760 cycles.
- Filling the entire screen using LOOP;PLOT sequence takes about 86,000 cycles in the best case (plot x from 0-223 on each line).
Cool to have these numbers; I can compare them to the numbers I have for my MD bitmap rendering code.
I don't understand why you said you have only 257,500 cycles to work with each frame. Shouldn't it be 21 MHz / 60 ≈ 350,000 cycles? As you have double buffering, I thought you could use the SFX while the frame buffer is being transferred.
In my case I'm working with a 256x160 resolution 4bpp software bitmap buffer (which needs bitmap --> tile conversion).
I have slightly fewer pixels than you (~41000 pixels compared to ~43000 pixels), but I use 4bpp, which I think is the minimum and can work well with a good palette (as in Star Fox). I guess that for 4bpp you need to divide your numbers by 2 (not sure about the PLOT instruction?).
Anyway here are the raw numbers in the MD case:
- You have about ~127800 cycles per frame at 60 FPS but realistically on NTSC systems we can assume 20 FPS as the maximum possible with software bitmap transfer to VRAM.
--> ~383500 cycles per frame @20 FPS NTSC
- Clearing the framebuffer takes about 43000 cycles.
- Transferring (and converting) the framebuffer to VRAM takes about 123000 cycles.
- Plotting 8 pixels takes 12 cycles.
- A minimum of 200-250 cycles per horizontal polygon line (filling edges and handling the line loop)
Add to that :
- 3D transformation calculation :
--> can transform and project to 2D about ~10000 vertices / second = ~500 vertices per frame @ 20 FPS.
Since you also need to do the 3D rendering, you can't realistically reach this number, of course.
- Polygon sorting and BSP handling for correct draw order
- Clipping (can consume a lot of multiplication / division operations)
- All the game logic and other stuff
So if you want to keep 20 FPS, you have to work with only 383500 - (43000 + 123000) ~ 217 500 cycles per frame.
These 217 500 cycles should handle:
- 3D transformation
- polygon sorting
- polygon clipping
- polygon rendering
- game logic and other stuff
That's definitely not a lot to work with... Of course you can accept frame drops, but I think it starts to hurt when you go below 10 FPS.
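The MD budget above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: the 7.67 MHz clock is an assumed approximate value for the NTSC 68000 (the post only gives the per-frame totals), and the names are invented for illustration.

```python
M68K_HZ = 7_670_000           # assumption: approximate NTSC Mega Drive 68000 clock
CLEAR_CYCLES = 43_000         # framebuffer clear, from the post
TRANSFER_CYCLES = 123_000     # convert + transfer to VRAM, from the post
VERTICES_PER_SECOND = 10_000  # transform + projection throughput, from the post


def cycles_per_frame(fps):
    """CPU cycles available per rendered frame at the given frame rate."""
    return M68K_HZ // fps


def remaining_cycles(fps):
    """Cycles left after the fixed clear and transfer costs."""
    return cycles_per_frame(fps) - (CLEAR_CYCLES + TRANSFER_CYCLES)


def vertices_per_frame(fps):
    """Upper bound on vertices transformed per frame, ignoring all other work."""
    return VERTICES_PER_SECOND // fps


if __name__ == "__main__":
    print(cycles_per_frame(20), remaining_cycles(20), vertices_per_frame(20))
```

With these inputs the fixed clear and transfer costs take a bit over 40% of each 20 FPS frame before any transformation, sorting, clipping or rasterization happens.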
Re: Fast 3D blitting on Super FX
Posted: Mon Mar 20, 2017 9:57 am
by 93143
Stef wrote:I don't understand why you said you have only 257,500 cycles to work with each frame. Shouldn't it be 21 MHz / 60 ≈ 350,000 cycles? As you have double buffering, I thought you could use the SFX while the frame buffer is being transferred.
No. If that's what you meant specifically, I misled you.
It takes multiple frames to transfer the data in most cases, and you can have the Super FX working on the next frame before the current one is done transferring. But that only refers to the time between the actual DMA transfers, which is most of the frame, but not all of it.
There's only one RAM pool, and if the SNES is accessing it, the Super FX can't. 224x192 at 2bpp 60 fps is about 65 lines worth of DMA, or about a quarter of a frame, and the GSU can't touch the framebuffer during that time.
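A rough Python check of the "65 lines" figure, for anyone following along. The DMA throughput of about 165.5 bytes per scanline and the 262-line NTSC frame are assumptions brought in for this sketch (they are commonly quoted values, not numbers from this thread), and the function names are hypothetical.

```python
DMA_BYTES_PER_LINE = 165.5   # assumption: commonly quoted SNES DMA throughput per scanline
NTSC_LINES_PER_FRAME = 262   # assumption: NTSC scanlines per frame


def dma_lines(width, height, bpp):
    """Scanlines of DMA time needed to move one framebuffer to VRAM."""
    return (width * height * bpp / 8) / DMA_BYTES_PER_LINE


def dma_fraction_of_frame(width, height, bpp):
    """Share of one NTSC frame during which the SNES holds the shared RAM."""
    return dma_lines(width, height, bpp) / NTSC_LINES_PER_FRAME


if __name__ == "__main__":
    print(dma_lines(224, 192, 2), dma_fraction_of_frame(224, 192, 2))
```

Under those assumptions the 10,752-byte buffer needs roughly 65 scanlines, which is about a quarter of a frame.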
Re: Fast 3D blitting on Super FX
Posted: Mon Mar 20, 2017 10:02 am
by Stef
Thanks for the clarification. I thought the SFX RAM could be split into 2 banks of 32 KB so you could have one bank on the SFX side while the other bank was on the S-CPU side (like the word RAM in the Sega CD, which is really convenient).
So the 257,500 cycles come from the cycles eaten by the DMA transfer, I guess.
Re: Fast 3D blitting on Super FX
Posted: Mon Mar 20, 2017 8:54 pm
by Oziphantom
Why do you clear the frame buffer? If you are already rendering over the whole screen, there's no point wasting cycles clearing it. Or, if you are only rendering part of the screen, how about adding a skybox plane to render over the rest? That gets you a bunch of clocks back.
Can you use a Super FX with an SA-1? If so, you could use the SNES CPU for AI/game logic and get the SA-1 to do clipping, draw order, and backface culling of the triangles, then pump them to the Super FX for rendering. That might give a hefty boost.
Re: Fast 3D blitting on Super FX
Posted: Mon Mar 20, 2017 9:24 pm
by Drew Sebastino
Given how they're hooked to the cartridge bus, I doubt you could do it. Plus, in real life, the power draw is probably too great for the SNES.
Re: Fast 3D blitting on Super FX
Posted: Mon Mar 20, 2017 9:43 pm
by tepples
If you're willing to hook up two coprocessors, why not just use an ARM SoC? There's precedent for using older ARM coprocessors in shogi games, and there are modern ones that sip well under a watt of power.
See Atmel microchips with an ARM core.