Clever tricks developers implemented for performance

Discussion of development of software for any "obsolete" computer or video game system. See the WSdev wiki and ObscureDev wiki for more information on certain platforms.
IHaveOneQuestion
Posts: 2
Joined: Sun Oct 29, 2023 11:11 am

Clever tricks developers implemented for performance

Post by IHaveOneQuestion »

Greetings friends, long time no see. Recently I read this Dolphin blog post https://dolphin-emu.org/blog/2019/04/01 ... -backends/ in which a clever trick is described to achieve a fast bloom effect. From that link:
The Wii is certainly not capable of volumetrics, so they use a clever little trick to create this atmosphere. The game takes a very low resolution EFB copy of the screen, reads that copy with the CPU using EFB peeks, and then uses EFB pokes to write a luminosity map to brighten and darken parts of the screen. The result is surprisingly convincing for what is effectively a fancy post-processing shader!
Image
Image

That left me wondering if developers during the third/fourth generation (snes/nes, the stuff you usually deal with here) did similar things to achieve impressive scenes with modest means. From what I gather the nes is so limited the developers have to fight hard in order to overcome its limitation (such as using the background to render big bosses). Maybe I'm wrong. On the other hand, the snes is more powerful. And though the most beautiful scenes are usually "prerendered" like this one
Image
I wonder if other games pushed the console in what effects they could pull, special modes aside.
aa-dav
Posts: 201
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: Clever tricks developers implemented for performance

Post by aa-dav »

IMHO, the most exploited trick on the NES was the 'hblank trap for scrolling'. It's described in full detail here: https://www.nesdev.org/wiki/PPU_scrolling
This led to parallax scrolling effects and these 'huge bosses on backplane'.
Image
The latter needs to manipulate the single existing background plane to split it into several independently scrolled parts:
Image
And so on.
Out of the box, the NES could not do more than 1 hblank split per frame, so they managed it via mappers.
SNES has built-in capabilities for such things.
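To make the split idea concrete, here is a rough Python toy model (not NES code, and not how any particular game does it): the same background is drawn with a different horizontal scroll on each band of scanlines. The function name `render_with_splits`, the band positions and the speed ratios are all made up for illustration.

```python
# Toy model of mid-frame scroll splits: one background, several scroll values.

def render_with_splits(background, splits, width):
    """background: list of rows (lists of pixel values, wider than the screen).
    splits: list of (first_scanline, scroll_x), sorted by first_scanline.
    Returns the visible frame, wrapping horizontally like a name table."""
    frame = []
    scroll_x = 0
    pending = list(splits)
    for y, row in enumerate(background):
        # A real game would rewrite the scroll registers during hblank here.
        while pending and pending[0][0] == y:
            scroll_x = pending.pop(0)[1]
        frame.append([row[(x + scroll_x) % len(row)] for x in range(width)])
    return frame

background = [list(range(16)) for _ in range(6)]
camera_x = 8
# Hypothetical parallax bands: far at 1/4 speed, middle at 1/2, near at full speed.
splits = [(0, camera_x // 4), (2, camera_x // 2), (4, camera_x)]
frame = render_with_splits(background, splits, width=8)
for row in frame:
    print(row)
```

Each band starts at a different column of the same background, which is all a parallax split really is. The hard part on real hardware is hitting the register writes at the right moment.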
tokumaru
Posts: 12416
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Clever tricks developers implemented for performance

Post by tokumaru »

Bucky O'Hare on the NES has this cool visual effect that simulates 3 overlapping background planes, while the hardware actually only has 1:

https://youtu.be/dWHPNnrTmKs?si=DNB0Jkh ... g&t=36m26s

This is achieved through a combination of various tricks. First, there's the structure on the right (the more detailed one), which never changes so it's drawn normally using the name tables. The scroll is changed over time to make this structure move. Second, there's the structure on the right, which intentionally lacks any horizontal detail, allowing it to be "moved" with just a few tile updates to the parts where the spikes are. The spikes, BTW, are sprites, used to mask the coarse tile updates and create the illusion of smooth movement. Finally, there's the background which should theoretically be moving as the scroll changes, but they use CHR bank switching to replace those tiles with rotated versions of themselves, visually cancelling any horizontal movement (i.e. if 2 pixels are added to the X scroll, a tileset with the tiles shifted 2 pixels to the right is used on the next frame).

Bucky O'Hare has a bunch of other cool visual effects, but this one stands out to me as the most original and creative.
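The scroll-cancelling part can be checked with a small Python sketch. It is only a model of the idea, assuming a horizontally repeating 8-pixel tile, and the names `rotate_tile_right` and `visible_row` are mine. On a cartridge the rotated copies would be pre-made CHR banks in ROM, not computed at run time.

```python
def rotate_tile_right(tile_rows, shift):
    """Rotate each 8-pixel row right by `shift` pixels, wrapping around."""
    shift %= 8
    return [row[8 - shift:] + row[:8 - shift] for row in tile_rows]

def visible_row(tile_row, scroll_x, width=16):
    """Pixels seen on screen when a repeating tile is scrolled by scroll_x."""
    return [tile_row[(x + scroll_x) % 8] for x in range(width)]

tile = [[0, 1, 2, 3, 4, 5, 6, 7]]                        # one row of one tile
banks = [rotate_tile_right(tile, s) for s in range(8)]   # stand-ins for 8 CHR banks

still = visible_row(tile[0], 0)
for scroll_x in range(20):
    bank = banks[scroll_x % 8]          # pick the bank matching the fine scroll
    assert visible_row(bank[0], scroll_x) == still
print("background looks static for every scroll value")
```

Scrolling by N pixels moves the picture left, the bank rotated N pixels right moves it back, so the two cancel and that layer appears not to move.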
Pokun
Posts: 2600
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: Clever tricks developers implemented for performance

Post by Pokun »

IHaveOneQuestion wrote: Sun Oct 29, 2023 11:34 am From what I gather the nes is so limited the developers have to fight hard in order to overcome its limitation (such as using the background to render big bosses). Maybe I'm wrong. On the other hand, the snes is more powerful. And though the most beautiful scenes are usually "prerendered" like this one
It depends on what you compare it with. When the NES was released there was pretty much nothing like it as far as home video game systems went, and the hardware was mostly underused by games in its earlier days, as most games didn't scroll at the time and games normally used very simple BGMs. It uses a custom video chip with great sprite and scrolling capability, making it similar to arcade systems of the time and much more capable than most consoles and computers in these areas (while lacking in others like RAM). And it was still quite expensive when new.

Later games used the hardware to its fullest and relied on mappers, extra RAM and other hardware in the cartridge to get around the limitations. But the NES still had a lot to give and I know some developers complained that Nintendo killed it too fast. Like Sunsoft who had to scrap a lot of games when Nintendo denied them licenses in order to focus on the SNES.

Technology was improving at a quadratic pace at the time, so the difference between one generation and the next, like the NES and SNES, is pretty large (though there were in-between systems like the Sega Master System and the PC Engine that filled that gap).

That screen from Seiken Densetsu 3 doesn't really do anything special AFAIK. It simply uses background tiles in multiple layers, which the SNES offers natively. It's just very beautifully drawn by the artists; those delicate sunrays are quite characteristic of Square Soft games, I think. Maybe the screen uses some transparency effects, I don't know, but that is also one of many features offered by the two very capable video chips in the SNES, and not some special trick.


One simple technique often used on NES is taking advantage of the 8 sprites/scanline limitation. By placing 8 invisible sprites with the lowest index numbers at a certain height, any other sprite that will go there will start to clip and gradually disappear as there can't be 9 or more sprites shown on those scanlines. Zelda 1 uses this at cave entrances and dungeon doors so that Link is clipped when he enters to make it look like he is partly obscured by the edge of the door as he enters.
Castlevania: Simon's Quest uses this at the height of the swamp surface to make Simon's lower body be clipped off as he enters the swamp water.
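A simplified Python model of that masking trick, in case it helps. It ignores real PPU details such as the one-line sprite Y offset and 8x16 sprites, and the names (`visible_sprites_on_line`, the `mask`/`hero` entries) are just for illustration.

```python
MAX_SPRITES_PER_LINE = 8

def visible_sprites_on_line(oam, scanline, height=8):
    """oam: list of (name, y) in priority order (index 0 is evaluated first).
    Returns the names of the sprites that actually get drawn on this scanline."""
    in_range = [name for name, y in oam if y <= scanline < y + height]
    return in_range[:MAX_SPRITES_PER_LINE]

# Eight blank sprites at the lowest indices cover scanlines 100-107.
oam = [(f"mask{i}", 100) for i in range(8)] + [("hero", 96)]

for line in (96, 99, 100, 103):
    print(line, "hero" in visible_sprites_on_line(oam, line))
# 96 True
# 99 True
# 100 False
# 103 False
```

The hero's upper half (lines 96-99) is drawn normally, while the lower half falls on lines where all eight slots are already taken by the blank sprites, so it is cut off exactly at the mask's top edge.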


Using backgrounds for large bosses is very common. Rockman games use this a lot. The main drawback is that the lines the boss appears on can not contain any other background tiles or they would move with the boss (since the boss is moved by scrolling that part of the background). That's why those lines just have a black backdrop (though this color can be any one single color) on those boss fights.


One pretty unique thing with the NES is that the sprite and tile graphics tables (AKA pattern tables) are on the cartridge instead of in the console itself. On the Game Boy, SNES and most other systems these tables are in the console itself as part of the video RAM.
This allows the NES to use either ROM or RAM for these tables, and most NES games do use ROM. With a mapper to allow switching multiple banks of ROM, games can animate the full screen by simply bank switching the graphics, very cheap and powerful.
This is indeed very powerful stuff, someone made some proof-of-concept cartridges that ran all kinds of advanced video (something like Doom) by using the fact that the pattern tables are exposed to the cartridge slot, something not possible on other systems.
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: Clever tricks developers implemented for performance

Post by Dwedit »

I think the clever trick for the SNES would be intelligent use of Additive and Subtractive blending to get multiple levels of translucency. Normally you can only get 50% transparency, but additive and subtractive blending are not subject to that rule.
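As a rough illustration of why add/subtract gives more than one "strength", here is a per-channel sketch in Python on 5-bit colour values (0-31). It is my own simplified model of the colour math as it is commonly described, not an exact description of the PPU, and `blend_channel`/`blend` are made-up names.

```python
def blend_channel(main, sub, subtract=False, half=False):
    """Combine two 5-bit channel values with add or subtract, optional halving."""
    value = main - sub if subtract else main + sub
    value = max(0, value)        # subtraction bottoms out at black
    if half:
        value //= 2              # the "div 2" option
    return min(31, value)        # addition tops out at full brightness

def blend(main_rgb, sub_rgb, **mode):
    return tuple(blend_channel(m, s, **mode) for m, s in zip(main_rgb, sub_rgb))

pixel = (20, 10, 5)
for strength in (4, 8, 16):      # brightness of the overlay layer
    overlay = (strength,) * 3
    print(strength, blend(pixel, overlay), blend(pixel, overlay, subtract=True))
print(blend(pixel, (8, 8, 8), half=True))
```

With the halving option the result is always a 50/50 mix, but with plain add or subtract the overlay's own brightness decides how strong the effect is, so a dim overlay gives a faint glow or shadow and a bright one a strong one.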
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
creaothceann
Posts: 586
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany
Contact:

Re: Clever tricks developers implemented for performance

Post by creaothceann »

IHaveOneQuestion wrote: Sun Oct 29, 2023 11:34 amClever tricks developers implemented for performance
Broadly speaking:
  • Activating VSync half a line early is a hack of the NTSC standard to get a progressive display (224p/240p) and a faster frame rate. Games can use the latter to fake transparency.
  • Another way of faking transparency is to use dithering / 1-pixel wide vertical columns.
  • Any cartridge that uses a coprocessor does overcome the limitations of the console; can't do that with a disk-based game. They can also use SRAM on the cartridge as extra RAM.
  • SNES HDMA is a method to save RAM; without it a game would have to store a display list in VRAM or one of the PPU chips.
--------

A couple games, e.g. Earthworm Jim 2, stream songs to the APU to overcome its 64 KiB limit / reduce audio loading times / music channel vs. sfx conflicts.

The SNES sprite RAM usually can't/shouldn't be written to while rendering takes place, because the address register is changed and used internally. Uniracers / Unirally does it anyway, because it only has a few sprites on screen during gameplay and the address change is deterministic.

For Another World / Out Of This World, Rebecca Heineman used the DMA scratchpad registers to execute code at 3.58 MHz.

From fullsnes:
Hires and Pseudo 3-Layer Math
In Hires modes (BG Mode 5,6 and Pseudo Hires via SETINI), the main/sub screen pixels are rendered as half-pixels of the high-resolution image. The TV picture is so blurry, that the result will look quite similar to Color Addition with Div2 - some games (Jurassic Park and Kirby's Dream Land 3) are actually using it for that purpose; the advantage is that one can additionally apply COLDATA addition to (both) main/sub-screen layers, ie. the result looks like "(main+sub)/2+coldata".


HDMA is used by many games to change the backdrop color to create a sky, e.g. DKC, or to create parallax (or even 3D) effects. It can be used to change sprites and BG modes, and makes BG Mode 7 and the window registers actually usable.
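For the sky gradient case, the data side can be sketched in Python like this: compute one backdrop colour per scanline, then collapse identical neighbouring lines into (line count, colour) runs. This only models the idea. A real HDMA table has its own byte layout, and the function names and colours here are invented.

```python
def sky_gradient(top, bottom, lines=224):
    """Linearly interpolate a 5-bit-per-channel colour for every scanline."""
    return [
        tuple(t + (b - t) * y // (lines - 1) for t, b in zip(top, bottom))
        for y in range(lines)
    ]

def to_runs(colours):
    """Collapse identical consecutive scanlines into [line_count, colour] runs."""
    runs = []
    for colour in colours:
        if runs and runs[-1][1] == colour:
            runs[-1][0] += 1
        else:
            runs.append([1, colour])
    return runs

# Hypothetical sky: dark blue at the top, bright blue at the horizon.
runs = to_runs(sky_gradient(top=(0, 0, 8), bottom=(0, 0, 31)))
print(len(runs), "runs cover", sum(count for count, _ in runs), "scanlines")
```

The table stays small because the colour only changes every few lines, and the hardware walks through it once per frame without the CPU doing anything during rendering.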
Last edited by creaothceann on Tue Oct 31, 2023 3:07 pm, edited 1 time in total.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: Clever tricks developers implemented for performance

Post by Dwedit »

creaothceann wrote: Tue Oct 31, 2023 4:24 am Broadly speaking:
  • Activating VSync one line too early is a hack of the NTSC standard to get a progressive display (224p/240p) and a faster frame rate. Games can use the latter to fake transparency.
Is that correct? I thought the interlaced fields of video started one half-scanline (not full scanline) later, causing the field to be drawn slightly lower (between the two scanlines of the previous frame).
You got 240p by omitting that half-scanline delay.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
creaothceann
Posts: 586
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany
Contact:

Re: Clever tricks developers implemented for performance

Post by creaothceann »

Dwedit wrote: Tue Oct 31, 2023 12:36 pm I thought the interlaced fields of video started one half-scanline (not full scanline) later
Yeah... I was thinking of how it removes an entire line from the standard line count.
Last edited by creaothceann on Tue Oct 31, 2023 5:03 pm, edited 1 time in total.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
Catyak
Posts: 51
Joined: Mon Apr 25, 2022 4:33 pm

Re: Clever tricks developers implemented for performance

Post by Catyak »

Some of my favorite "performance tricks" would be the tricks used to fake sprite scaling on 16 bit consoles, since real 2D image scaling is slow. This includes drawing pre-scaled frames and switching between them (what a lot of Mode 7 SNES games do), but my favorite trick is basically smushing sprites closer together to fake scaling down a sprite, which Treasure used in their Genesis Yu Yu Hakusho game. You can't scale the graphics down too much when only using this technique, but if used carefully, it works well.

Image

For more detail, see this article from Raster Scroll's excellent series on Genesis graphics tricks, and another article from plutiedev detailing how you can combine this technique with pre-scaled frames to create a better illusion of scaling a sprite down.

https://rasterscroll.com/mdgraphics/gra ... s/scaling/

https://plutiedev.com/scaling-sprites
turboxray
Posts: 327
Joined: Thu Oct 31, 2019 12:56 am

Re: Clever tricks developers implemented for performance

Post by turboxray »

PCE:

The HuC6280 has some additional (specialized) opcodes just for the PCE. Namely, you can store immediate values directly to the VRAM data port. You can use this to do quick register selections and updates, but you can also write to VRAM this way (and it's faster than a Txx block transfer instruction, and it doesn't stall interrupts). I've used it in my homebrew to encode sprite cells for fast transfer (the graphics are stored as ST1/ST2 opcodes that I JSR to). ST0 #imm selects a VDC register, and ST1 and ST2 write to the lower and upper data port.

The VDC chip has a 16bit WORD interface - upper byte and lower byte. You can only update/write/read VRAM a WORD value at a time. But the "latch" is on the upper byte. So if the lower byte doesn't need to change, in the 16bit WORD, you can just write to the upper byte and whatever was left over in the lower byte comes along for free in the transfer. Pairing this with the above ST1/ST2 embedded graphics as opcodes, you can increase the transfer bandwidth even further.
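A build-time converter for this could look roughly like the Python sketch below. It is an assumption-laden illustration, not turboxray's actual tool: the opcode bytes ($13 for ST1, $23 for ST2, $60 for RTS) are what I understand the encodings to be and should be checked against a HuC6280 reference, and it assumes the VRAM write register and address were already set up before the routine is called.

```python
ST1, ST2, RTS = 0x13, 0x23, 0x60   # assumed opcode bytes, verify before use

def encode_words(words):
    """Turn 16-bit VRAM words into a stream of ST1/ST2 instructions plus RTS.
    ST1 is skipped when the low byte is the same as the one already latched."""
    code = bytearray()
    latched_low = None
    for word in words:
        low, high = word & 0xFF, word >> 8
        if low != latched_low:
            code += bytes([ST1, low])
            latched_low = low
        code += bytes([ST2, high])   # writing the high byte commits the word
    code.append(RTS)
    return bytes(code)

words = [0x1200, 0x3400, 0x56FF, 0x78FF]
stream = encode_words(words)
print(stream.hex(" "))
# 13 00 23 12 23 34 13 ff 23 56 23 78 60
```

Four words would normally need eight store instructions. Here two of the ST1s drop out because the low byte did not change, which is the "comes along for free" part.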

For VDC registers (unlike VRAM reading and writing), there is no latch for the WORD values. You can update just the lower, just the upper, or both. This shaves some cycle times for the h-sync interrupt routines.

The HuC6280 has TRB and TSB instructions. If you keep Acc clear, you can use these as a fast copy-byte for ports, since it's a read-modify-write instruction to the same address, but with Acc clear no modification happens. Falcom does this in some games as a fast copy mechanism for VRAM during active display; it does a TRB on the low byte port and a TRB on the high byte port, in a loop. This works because VRAM READ pointers and VRAM WRITE pointers are separate registers, and you can both read and write to VRAM via the same port. You can freely write and/or read from VRAM during active display on the PCE (unlike other consoles), but the VRAM to VRAM copy DMA cannot be used during active display, so this is a good/fast workaround.
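Here is a toy Python model of why that works. It is heavily simplified (the real VDC has a read buffer and more state than this), and the class and function names are invented for the example.

```python
class VramPort:
    """Toy data port with separate read and write pointers into the same VRAM."""
    def __init__(self, vram, read_addr, write_addr):
        self.vram, self.read_addr, self.write_addr = vram, read_addr, write_addr
        self.low_latch = 0

    def read_low(self):
        return self.vram[self.read_addr] & 0xFF

    def read_high(self):
        value = self.vram[self.read_addr] >> 8
        self.read_addr += 1              # reading the high byte advances the read pointer
        return value

    def write_low(self, value):
        self.low_latch = value

    def write_high(self, value):
        self.vram[self.write_addr] = (value << 8) | self.low_latch
        self.write_addr += 1             # writing the high byte commits and advances

def trb(read, write, acc):
    """Read-modify-write: clear the bits set in acc, then store the result back."""
    write(read() & ~acc & 0xFF)

vram = [0x1111, 0x2222, 0x3333, 0, 0, 0]
port = VramPort(vram, read_addr=0, write_addr=3)
for _ in range(3):
    trb(port.read_low, port.write_low, acc=0)    # low byte port
    trb(port.read_high, port.write_high, acc=0)  # high byte port
print([hex(w) for w in vram])
```

With the accumulator at zero the "modify" step changes nothing, so each TRB is just a read from the read pointer followed by a write to the write pointer, which makes it a VRAM to VRAM copy.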
93143
Posts: 1681
Joined: Fri Jul 04, 2014 9:31 pm

Re: Clever tricks developers implemented for performance

Post by 93143 »

Catyak wrote: Tue Oct 31, 2023 3:30 pmsmushing sprites closer together to fake scaling down a sprite, which Treasure used in their Genesis Yu Yu Hakusho game.
There are a couple of SNES games that do that. Revolution X, for instance, uses it to scale a helicopter on a scaling Mode 7 background. Art of Fighting 2 uses a somewhat more expensive version where the edges of the sprites are trimmed before mashing them together - again, using a Mode 7 background.
IHaveOneQuestion
Posts: 2
Joined: Sun Oct 29, 2023 11:11 am

Re: Clever tricks developers implemented for performance

Post by IHaveOneQuestion »

Interesting thread so far, filled with facts. Catyak, I read the links you provided; it's curious how the techniques described are device agnostic. I especially liked how rotation was cached.
tokumaru
Posts: 12416
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Clever tricks developers implemented for performance

Post by tokumaru »

Catyak wrote: Tue Oct 31, 2023 3:30 pmImage
I wonder how the games that use this technique implement it under the hood... In the case of Yu Yu Hakusho, I think that the scaling animation is only ever applied to a single pose of each character, so they could very well just have hardcoded all the sprite positions in the ROM and it wouldn't have been so bad, but it'd be interesting if the sprite positions were dynamically calculated at run time. Maybe you could make it so that sprite coordinates were relative to a reference point, and you'd proportionally scale these coordinates down according to each object's scaling factor.
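The run-time version of that idea might look something like this Python sketch. It is purely hypothetical and not taken from any game: `squash_sprites` and the 32x32 example object are made up, and a real 16-bit implementation would more likely use fixed-point math or a lookup table than floats.

```python
def squash_sprites(offsets, scale, origin=(0, 0)):
    """offsets: (dx, dy) of each hardware sprite relative to the object's reference point.
    Returns on-screen positions with the offsets multiplied by `scale` (0 < scale <= 1)."""
    ox, oy = origin
    return [(ox + round(dx * scale), oy + round(dy * scale)) for dx, dy in offsets]

# A 32x32 object built from four 16x16 sprites, reference point at its centre.
parts = [(-16, -16), (0, -16), (-16, 0), (0, 0)]
print(squash_sprites(parts, 1.0, origin=(100, 80)))
# [(84, 64), (100, 64), (84, 80), (100, 80)]
print(squash_sprites(parts, 0.75, origin=(100, 80)))
# [(88, 68), (100, 68), (88, 80), (100, 80)]
```

At 0.75 the sprites overlap by 4 pixels, so the whole object covers 28x28 instead of 32x32 while each individual sprite stays the same size, which is why the effect falls apart if you push the scale too far.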