CHR prerendering tricks

Disch
Posts: 1848
Joined: Wed Nov 10, 2004 6:47 pm

I decided to fork this thread from this one:

http://nesdev.com/bbs/viewtopic.php?t=3964&start=30
Fx3 wrote:

Code:

   unsigned char layerA = (src[8] & 0xAA) | ((*src >> 1) & 0x55);
   unsigned char layerB = ((src[8] & 0x55) << 1) | (*src & 0x55); 
That's a very neat way to do it! I use the following:

Code:

	static const u8 lut[4] = {0x00,0xFD,0xFE,0xFF};

	u8 a = src[0];
	u8 b = src[8];

	dst[0] = lut[ ((a >> 7) & 1) | ((b >> 6) & 2) ];
	dst[1] = lut[ ((a >> 6) & 1) | ((b >> 5) & 2) ];
	dst[2] = lut[ ((a >> 5) & 1) | ((b >> 4) & 2) ];
	dst[3] = lut[ ((a >> 4) & 1) | ((b >> 3) & 2) ];
	dst[4] = lut[ ((a >> 3) & 1) | ((b >> 2) & 2) ];
	dst[5] = lut[ ((a >> 2) & 1) | ((b >> 1) & 2) ];
	dst[6] = lut[ ((a >> 1) & 1) | ((b     ) & 2) ];
	dst[7] = lut[ ((a     ) & 1) | ((b << 1) & 2) ];
You can ignore the 'lut' thing for now; that's just doing the conversion to 00,Fx that I was talking about before. Your way seems to involve much less shifting. I think I like your way better =)
So, values are not 00, Fx as you mentioned.
The reason I convert 0,1,2,3 to 00,Fx is for the shorter attribute combination. The idea is the following:

- the easiest way to determine whether or not a pixel is transparent is to see if it's zero.

- if you OR attribute bits... transparent pixels will not be zero (since they will have their attribute bits set)

- you can get around that by conditionally ORing attribute bits (ie: only OR them if the original pixel is nonzero), but this requires an additional conditional for every onscreen sprite and BG pixel!

- by using 00,FD,FE,FF and combining attribute bits with AND, this ensures that pixel 0 will stay zero even after you combine attribute bits, but pixels 1-3 will retain their attribute bits.
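
The three options in the list above can be compared side by side. This is only a minimal sketch: the table values come from this post, but the function names are made up for illustration, and it assumes CHR pixels and attributes are both plain 0-3 values.

```c
#include <assert.h>
#include <stdint.h>

/* CHR pixel 0-3 expanded to 00,FD,FE,FF (the 'lut' values from the post). */
static const uint8_t chr_lut[4] = { 0x00, 0xFD, 0xFE, 0xFF };

/* Attribute 0-3 turned into an AND mask: (attr << 2) | 3. */
static const uint8_t attr_mask[4] = { 0x03, 0x07, 0x0B, 0x0F };

/* Plain OR: a transparent pixel picks up attribute bits and stops being zero. */
static uint8_t combine_or(uint8_t chr, uint8_t attr)
{
    assert(chr < 4 && attr < 4);
    return (uint8_t)(chr | (attr << 2));
}

/* Conditional OR: transparent stays zero, at the cost of a test per pixel. */
static uint8_t combine_or_checked(uint8_t chr, uint8_t attr)
{
    assert(chr < 4 && attr < 4);
    return chr ? (uint8_t)(chr | (attr << 2)) : 0;
}

/* AND trick: same result as the conditional OR, with no test. */
static uint8_t combine_and(uint8_t chr, uint8_t attr)
{
    assert(chr < 4 && attr < 4);
    return (uint8_t)(chr_lut[chr] & attr_mask[attr]);
}
```

For example, pixel 0 with attribute 2 comes out as 0x08 from combine_or but as 0x00 from combine_and, while pixel 1 with attribute 1 comes out as 0x05 either way.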


A lot of this depends on how you're rendering, too. But I think for the most part we all do it the same way, since at the end of the day you have to use the pixel in a palette lookup.


Given BG pixel 'a' and sprite pixel 'b', you need to produce a value between 0x00-0x1F for the output (which goes to palette lookup). This value will be either the BG or sprite pixel depending on which has priority -- or it will be 0x00 if both are transparent. The simplest way to approach this that I have found is:

Code:

a = BG_Pixel;
b = Sprite_Pixel;  // will be ORd with 0x80 if it has foreground priority

// apply clipping here -- you can set 'a' or 'b' to 0 if sprites/bg is disabled
//  or if this is being clipped from the left-8 pixels or whatever
if(dot < ppu.nBGClip)  a = 0;
if(dot < ppu.nSpClip)  b = 0;

// determine whether to output BG or sprite pixel
//  you output 'a'  (the BG pixel) unless:
//  a is zero (transparent)
//  or sprite pixel 'b' has foreground priority

if(!a || (b & 0x80) )
  a = b & 0x1F;

// if both a and b were zero (both pixels transparent)
//  result will be 0x00 here, which will output $3F00

// 'a' is now 0x00-0x1F -- our output pixel
OutputPixel(a);
This is the shortest and simplest I've been able to make this code. Since it runs more often than anything else (256x240 times every frame), I figure it needs to be the quickest.

This only works if transparent pixels are always 0, regardless of their attribute bits or other information (like the 0x80 sprite priority bit -- that must also be 0 if the pixel is transparent).
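
The selection step can be wrapped in a function so it is easy to try on its own. This is a hedged sketch rather than actual emulator code: the function name and parameters are hypothetical, and it assumes a non-transparent sprite pixel already carries the 0x10 sprite-palette bit so that masking with 0x1F yields a value in 0x10-0x1F.

```c
#include <assert.h>
#include <stdint.h>

#define SPR_FOREGROUND 0x80  /* priority flag ORed onto the sprite pixel, as in the post */

/* bg:  0x00-0x0F, exactly 0 when transparent.
 * spr: assumed 0x10-0x1F, optionally ORed with SPR_FOREGROUND; exactly 0 when transparent.
 * dot: current x position; bg_clip / spr_clip: first x where each layer is visible.
 * Returns the 0x00-0x1F palette index to output. */
static uint8_t select_pixel(uint8_t bg, uint8_t spr, int dot, int bg_clip, int spr_clip)
{
    assert(bg <= 0x0F);

    /* clipping: force the layer to transparent */
    if (dot < bg_clip)  bg = 0;
    if (dot < spr_clip) spr = 0;

    /* keep the BG pixel unless it is transparent or the sprite is in front */
    if (!bg || (spr & SPR_FOREGROUND))
        bg = spr & 0x1F;

    return bg;
}
```

With both layers visible, select_pixel(0x05, 0x13, 100, 8, 8) keeps the BG pixel 0x05, while select_pixel(0x05, 0x93, 100, 8, 8) returns the foreground sprite pixel 0x13.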

So then the next part of this trick is making transparent pixels always be 0 (rather than 0x04, 0x08, or 0x0C -- which are also transparent pixel values). This is where the AND trick comes in:

Code:

static const u8 at_lookup[4] = { 0x03,0x07,0x0B,0x0F };

u8 at = attribute_bits;  // I'll spare you my calculations here, but basically
// the 'at' gets a value 0-3 for the attribute bits

at = at_lookup[ at ]; // left shift by 2, OR with 3

BG_Pixel = CHR_Pixel & at;  // combine attribute bits with CHR bits
By combining attribute bits this way, BG_Pixel stays 0 for transparent pixels without any conditionals. CHR_Pixel here is the prerendered 00,FD,FE,FF value, so for pixel 0 it is 0, and anything you AND with it will also be 0.
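
Putting the two tables together on one 8-pixel tile row might look like the sketch below. The table values are the ones from this post; the two function names and the split into "prerender once, mask at draw time" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t lut[4]       = { 0x00, 0xFD, 0xFE, 0xFF };
static const uint8_t at_lookup[4] = { 0x03, 0x07, 0x0B, 0x0F };

/* Done once when CHR data changes: expand one row of a tile.
 * lo and hi are the two bitplane bytes (src[0] and src[8] in the earlier code). */
static void prerender_row(uint8_t lo, uint8_t hi, uint8_t dst[8])
{
    for (int x = 0; x < 8; x++) {
        int bit = 7 - x;  /* leftmost pixel lives in bit 7 */
        dst[x] = lut[((lo >> bit) & 1) | (((hi >> bit) & 1) << 1)];
    }
}

/* Done at draw time: one AND per pixel, no transparency test. */
static void apply_attribute(const uint8_t row[8], uint8_t attr, uint8_t out[8])
{
    assert(attr < 4);
    uint8_t mask = at_lookup[attr];
    for (int x = 0; x < 8; x++)
        out[x] = row[x] & mask;
}
```

With plane bytes 0xA5 and 0xC3 and attribute 2, the row comes out as 0B 0A 09 00 00 09 0A 0B: the two transparent pixels in the middle stay 0 and the rest carry the attribute bits.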


Anyway that's my approach.
hap
Posts: 355
Joined: Thu Mar 24, 2005 3:17 pm

cool :)
If you've got an earlier source of Schpune, the one before you added this, could you measure the speed improvement?
Zepper
Formerly Fx3
Posts: 3264
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

- OK, I got your idea working here, but a benchmark might be required...
Disch wrote:

- you can get around that by conditionally ORing attribute bits (ie: only OR them if the original pixel is nonzero), but this requires an additional conditional for every onscreen sprite and BG pixel!
Extras:
1. You need a table with 00, FD, FE, FF to index every decoded CHR pixel.
2. Another table with 03, 07, 0B, 0F is required for every onscreen pixel.
3. The final AND is performed, but a few table[ table[ table ] ] lookups are added.

Code:

/* parenthesized so the macros expand safely inside larger expressions */
#define TILEBANK   (ppu->bg_tile | (*bg->tile >> 6))
#define TILELINE   (((*bg->tile & 0x3F) << 3) | bg->line)

   /* pattern pixel
    */
   ppupattern = chr_cache[TILEBANK][TILELINE] & attrmask[ (bg->name[ppu_bgattr[attrnum]] >> ppu_bgshft[attrnum]) & 3 ];
EDIT: well, in my case, the emulation runs around 15 FPS slower using the AND trick.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

I posted a pretty complete description of how I do rendering a while back. In short, four pixels are processed at a time by using 32-bit integers (or 64-bit if the processor supports it natively), and there is no branching (if statements), so it pipelines really well. The CHR cache just reorders data rather than expanding it, so it doesn't take lots of space either.
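
The renderer described in that post isn't reproduced in this thread, so the following is not that code. It is only a rough sketch of what "four pixels at a time in a 32-bit integer, with no branching" could look like for the BG/sprite selection discussed earlier, assuming one pixel per byte lane, BG lanes in 0x00-0x0F, and the 0x80 sprite priority flag. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: the same rule as the if() shown earlier in the thread. */
static uint8_t select1(uint8_t bg, uint8_t spr)
{
    return (!bg || (spr & 0x80)) ? (uint8_t)(spr & 0x1F) : bg;
}

/* Four pixels packed one per byte, selected without any branches. */
static uint32_t select4(uint32_t bg, uint32_t spr)
{
    assert((bg & 0xF0F0F0F0u) == 0);  /* BG lanes are assumed to be 0x00-0x0F */

    /* 0x80 in every lane whose BG pixel is nonzero; the add never carries across lanes */
    uint32_t bg_nz = (((bg & 0x7F7F7F7Fu) + 0x7F7F7F7Fu) | bg) & 0x80808080u;

    /* 0x80 in every lane where the sprite wins: BG transparent, or sprite in front */
    uint32_t win = (bg_nz ^ 0x80808080u) | (spr & 0x80808080u);

    /* widen each 0x80 flag into a full 0xFF lane mask */
    uint32_t mask = (win >> 7) * 0xFFu;

    return (spr & mask & 0x1F1F1F1Fu) | (bg & ~mask);
}
```

As a quick check, select4(0x00050005, 0x93139313) gives 0x13051305: the lanes with a nonzero BG pixel and a background-priority sprite keep 0x05, and the lanes with a transparent BG pixel take the sprite's 0x13.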