Got any tips for Early NES Emulator Development?

Disch
Posts: 1848
Joined: Wed Nov 10, 2004 6:47 pm

Post by Disch »

A trick I came up with for the renderer. It actually consists of several parts:


1) Pre-render CHR to a separate graphics buffer (where individual pixels are stored in their own byte). That will make rendering faster and easier, since you don't have to decode 2bpp repeatedly.

For CHR-ROM this can be done once on ROM load

For CHR-RAM you'll need to re-decode an 8x1 section of the tile every time CHR-RAM is written to via $2007. This isn't a big deal, since $2007 isn't written to anywhere NEAR as often as pixels are rendered... so this approach still pays off.

Needless to say you'll still have to maintain the CHR buffers (you can't replace them with these graphics buffers) because you'll still need to respond to $2007 reads and other things.


2) When you decode CHR, each pixel can be one of 4 colors (2bpp). Have these colors be:

0x00, 0xFD, 0xFE, 0xFF

Don't use 0, 1, 2, 3. The reason is explained below.


3) when you're applying attribute bits to this CHR, your attribute will be 0x00, 0x04, 0x08, or 0x0C as you'd expect.. but don't use those values... OR them with 0x03:
0x03, 0x07, 0x0B, 0x0F

4) With this setup, attributes and transparency can be applied with a single AND operation, rather than the conditionals and ORs you might otherwise need:

Code: Select all

output_pixel = decoded_chr_pixel & attribute;
I found that before I did this trick... I had to have something like the following:

Code: Select all

output_pixel = decoded_chr_pixel;
if(output_pixel != 0)
  output_pixel |= attribute;
The single AND is preferable to the conditional+OR.
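Putting steps 1 to 4 together, a minimal sketch might look like the following. The names `kPixelCode`, `decode_chr_row`, `attribute_mask`, `combine_and` and `combine_branch` are made up for illustration, and the sketch assumes the usual CHR layout in which a tile row's low bit plane byte is followed 8 bytes later by its high bit plane byte.

```c
#include <assert.h>
#include <stdint.h>

/* Step 2: encoded values stored for raw 2bpp pixel values 0..3. */
static const uint8_t kPixelCode[4] = { 0x00, 0xFD, 0xFE, 0xFF };

/* Step 1: decode one 8x1 row into one byte per pixel.
 * src points at the low-plane byte; the high-plane byte is at src[8]. */
static void decode_chr_row(const uint8_t *src, uint8_t *dst)
{
    for (int x = 0; x < 8; x++) {
        int bit = 7 - x; /* leftmost pixel lives in bit 7 */
        uint8_t raw = (uint8_t)(((src[0] >> bit) & 1) |
                                (((src[8] >> bit) & 1) << 1));
        dst[x] = kPixelCode[raw];
    }
}

/* Step 3: attribute bits (palette 0..3) shifted up and ORed with 0x03. */
static uint8_t attribute_mask(uint8_t palette)
{
    assert(palette < 4);
    return (uint8_t)((palette << 2) | 0x03);
}

/* Step 4: branch-free combine. Transparent pixels (0x00) stay 0. */
static uint8_t combine_and(uint8_t decoded_chr_pixel, uint8_t attribute)
{
    return decoded_chr_pixel & attribute;
}

/* Reference version using raw values, a conditional and an OR. */
static uint8_t combine_branch(uint8_t raw, uint8_t palette)
{
    uint8_t output_pixel = raw;
    if (output_pixel != 0)
        output_pixel |= (uint8_t)(palette << 2);
    return output_pixel;
}
```

For every raw value 0..3 and palette 0..3, `combine_and(kPixelCode[raw], attribute_mask(palette))` gives the same palette index as `combine_branch(raw, palette)`. The set high bits in 0xFD/0xFE/0xFF are what let the attribute bits pass through the AND, while 0x00 masks everything off.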


Anyway just a trick. You don't have to use it... I'm just throwing ideas at you ^^
MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Post by MottZilla »

Well, I have quite a lot of games running fairly well now, including UNROM games like MegaMan and Contra. =)

For rendering right now I started off with an inaccurate tile based, full screen renderer.

For CHR-ROM games, I decode all tiles on load of the ROM into an array of tiles (enough for 256 KB of CHR-ROM). These are used for quicker and easier rendering. I also have an array of pointers to the ROM data that is swapped in for reading from the PPU. Sections can be as small as 1K.

For CHR-RAM, there is an array (512 bytes) which consists of 0s on load. Anytime CHR-RAM is written to, it figures out which tile was modified and marks that it must be decoded before it can be rendered. Then my renderer checks for needed updates before drawing.
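One way to sketch this dirty-flag scheme, assuming 8 KB of CHR-RAM (512 tiles of 16 bytes each) and decoded pixel values 0 to 3. All identifiers here are hypothetical and not taken from the emulator being described.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_TILES = 512, TILE_BYTES = 16, TILE_PIXELS = 64 };

static uint8_t chr_ram[NUM_TILES * TILE_BYTES];     /* raw 2bpp data */
static uint8_t tile_pixels[NUM_TILES][TILE_PIXELS]; /* decoded cache */
static uint8_t tile_dirty[NUM_TILES];               /* all zero on load */

/* Called for every CHR-RAM write: store the byte, mark its tile stale. */
static void chr_ram_store(uint16_t addr, uint8_t value)
{
    assert(addr < sizeof chr_ram);
    chr_ram[addr] = value;
    tile_dirty[addr / TILE_BYTES] = 1;
}

/* Called by the renderer: decode lazily, only if the tile was modified. */
static const uint8_t *get_tile(uint16_t tile)
{
    assert(tile < NUM_TILES);
    if (tile_dirty[tile]) {
        const uint8_t *src = &chr_ram[tile * TILE_BYTES];
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                int bit = 7 - x;
                tile_pixels[tile][y * 8 + x] =
                    (uint8_t)(((src[y] >> bit) & 1) |
                              (((src[y + 8] >> bit) & 1) << 1));
            }
        }
        tile_dirty[tile] = 0;
    }
    return tile_pixels[tile];
}
```

Compared with re-decoding a row on every write, this defers the work until the tile is actually drawn, so a tile that is rewritten many times between frames is only decoded once.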

As for the values, the tile arrays use 0,1,2,3. I haven't had any issue with that yet.

1943 is giving me issues (sprites not appearing) that I need to trace, so I'll have that to look at tomorrow. I will also get to work on the "Line Renderer", which will render each line after the CPU completes it; that will allow for sprite 0 hit and other split-screen scrolling effects. That will get SMB and, I think, Excitebike working.

I have to say working on the graphics has been a lot more fun than working on the CPU core was. Not that it wasn't fun, but the CPU was more time consuming and frustrating, and the rewards weren't easily visible. With graphics, however, today I went from a static name table display with incorrect colors all the way to a display with correct colors, full sprites, scrolling, etc.

Overall I'm just happy to have gotten somewhere with this project. I really wasn't sure I would get anywhere with this until I first saw Donkey Kong plotting something on the name table. And that was trumped when I finally saw the actual graphics.
Zepper
Formerly Fx3
Posts: 3264
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper »

- Cool tip, but here's how the CHR is decoded:

Code: Select all

   unsigned char layerA = (src[8] & 0xAA) | ((*src >> 1) & 0x55);
   unsigned char layerB = ((src[8] & 0x55) << 1) | (*src & 0x55);
   unsigned char *buf = dst;
   
   *buf = (layerA >> 6); buf++;
   *buf = (layerB >> 6); buf++;
   *buf = (layerA >> 4) & 3; buf++;
   *buf = (layerB >> 4) & 3; buf++;
   *buf = (layerA >> 2) & 3; buf++;
   *buf = (layerB >> 2) & 3; buf++;
   *buf = layerA & 3; buf++;
   *buf = layerB & 3;
- src is a pointer from the CHR data.
- dst is a pointer to the decoded CHR data.
MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Post by MottZilla »

I decode CHR by masking the bits needed, bit shifting, and adding them together to get the final value (0, 1, 2, or 3). It really isn't very hard to do. In fact, I did it before this project. I was making a NES map editor for a homebrew ROM and wanted to be able to load the NES graphics rather than a converted BMP.

I'm sure there are many ways to decode CHR. There's no one right way to do it. Your way, Fx3, I'd have to study to fully understand. I'm sure it works, but so does my way, which I find much easier to understand. After all, I wrote it. :p

I'm sure I'll have some questions when it comes time to emulate the APU. But so far the hardest part was getting the CPU up and working.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Fx3's algorithm appears to have the odd pixels (x=1, 3, 5, 7) in one "layer" and the even pixels (x=0, 2, 4, 6) in the other.
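A quick way to check this reading is to compare the layered decode against a plain bit-by-bit one. The sketch below restates the posted algorithm with the two layers renamed `even` and `odd` (names chosen here for illustration), next to a simple mask-and-shift version of the kind MottZilla describes. Both functions assume `src[0]` is the low plane byte and `src[8]` the high plane byte of one tile row.

```c
#include <assert.h>
#include <stdint.h>

/* Plain version: mask one bit from each plane per pixel, then combine. */
static void decode_row_simple(const uint8_t *src, uint8_t *dst)
{
    assert(src != 0 && dst != 0);
    for (int x = 0; x < 8; x++) {
        int bit = 7 - x; /* leftmost pixel lives in bit 7 */
        dst[x] = (uint8_t)(((src[0] >> bit) & 1) |
                           (((src[8] >> bit) & 1) << 1));
    }
}

/* Layered version: each layer packs four 2-bit pixels into one byte. */
static void decode_row_layered(const uint8_t *src, uint8_t *dst)
{
    /* Pixels x = 0, 2, 4, 6: high-plane bits stay put, low-plane bits
     * move down one position to sit beside them. */
    uint8_t even = (uint8_t)((src[8] & 0xAA) | ((src[0] >> 1) & 0x55));
    /* Pixels x = 1, 3, 5, 7: high-plane bits move up one position. */
    uint8_t odd = (uint8_t)(((src[8] & 0x55) << 1) | (src[0] & 0x55));

    assert(dst != 0);
    for (int i = 0; i < 4; i++) {
        int shift = 6 - 2 * i;
        dst[2 * i]     = (uint8_t)((even >> shift) & 3);
        dst[2 * i + 1] = (uint8_t)((odd >> shift) & 3);
    }
}
```

Running both over all 65536 combinations of plane bytes and comparing the outputs is a cheap regression test if you ever swap one decoder for the other.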
mozz
Posts: 94
Joined: Mon Mar 06, 2006 3:42 pm
Location: Montreal, canada

Post by mozz »

tepples wrote:
blargg wrote:It's a bad idea to assume the sizes of the integral types in C and C++.
Two specifications impose constraints on a C compiler: the C standard and each platform's application binary interface (ABI). In C, char means byte, and C guarantees that char is always at least 8 bits (CHAR_BIT >= 8).[1] The ABIs of the most popular platforms (x86, PowerPC, ARM) guarantee that CHAR_BIT == 8, making unsigned char and uint8_t equivalent.
There are several things that are not guaranteed in C or C++ (except maybe in C99 or the latest version of the C++ spec, I dunno?). However, they are true on practically all platforms that anyone has made in the past 20 years, and will continue to be true basically forever:

(1) A byte is 8 bits, and types exist which are 8, 16 and 32 bits in size. Modern compilers all support 64-bit integer types also. Except, you might not know which types are which size! What most people do is simply define their own types for known sizes. Then if you want to support multiple compilers or port it to a different platform, it's easy to supply alternate definitions.

In my own code, I usually use the following definitions:

Code: Select all

typedef unsigned char  U8;
typedef  signed  char  S8;
typedef unsigned short U16;
typedef  signed  short S16;
typedef unsigned int U32;
typedef  signed  int S32;
typedef unsigned long long U64;
typedef  signed  long long S64;
Then I use those types everywhere, so that it is easy for me to keep track of what is going on when I do arithmetic or other operations on them. The only time I would use "int" or "unsigned" is as a loop counter where I'm not doing any operations with the counter that mix it with those fixed-size types. For example, if I'm only using it to index an array or something, then I might use "int" or "unsigned". But even then I tend to prefer U32 or S32 for loop counters. If it makes you feel better, then typedef these to the new language types (uint8_t or whatever) but I've personally never bothered to do that.

(2) Integers are stored using 2's complement representation for negative integers (i.e. the top bit is the sign bit, there is only one representation of zero--with all bits clear--and the representation of -1 is the number with all bits set. Contrast this with floating-point numbers, where they actually have *two* representations of zero). No one has made a machine with other int representations for at least 20 years.

(3) NULL pointers to any data type (including void*) can be represented by a bit-pattern of all clear bits. So you can (for example) use memset(data, 0, sizeof(MyStruct)); to clear a structure, and assume that any pointers in it are now NULL. The C/C++ languages actually allow the implementation to use almost anything they want for a NULL pointer--even different values for different types! But nobody does this, and too much existing code would break if they ever tried to change it. So go ahead and assume it.

(4) Most platforms nowadays are "32-bit", which means sizeof(int)==4 and sizeof(void*)==4 (in fact, the size of any pointer type, except for C++ pointer-to-member types, should be 32 bits). If you want to be future-proof for 64-bit platforms, it's a good idea to keep in mind that their pointer types might be 64 bits instead of 32. But supporting those two combinations should be plenty for most code (unless you plan to port it to cell phones or something... and most of those have 32-bit processors now anyway).

(5) "Natural" alignment: this is not guaranteed on every platform, but it works on all x86-based platforms (as well as all of the common PPC-, Sparc- and ARM-based platforms, and probably most others). Basically, small types like to be aligned to their size (i.e. a 4-byte integer type should be aligned on a 4-byte boundary, i.e. bottom 2 bits of its address should be zero). Structures need alignment and size to the largest alignment of any of their members. *Also a structure's size is rounded up to a multiple of its alignment by adding padding at the end*, so that if you have an array of that struct, the members of the array are all properly aligned. Classes == structures (but if there are any virtual methods or virtual base classes, assume the compiler added some crud to your structure that you can't see to support the virtual stuff). On some platforms, a mis-aligned type is harmless (on x86 this is anything 8-byte-aligned or less), though it is probably slower to access. In other cases it is NOT harmless and causes the program to crash! So compilers have to insert extra code to do misaligned accesses (which is a lot slower), AND they have to know that they're doing it---so if you cast a structure pointer to an aligned U64* for example, you might get crashes because you tricked the compiler into thinking the data accessed through the pointer would be aligned when it isn't.

Anyway, you can avoid nearly all alignment problems if you use "natural alignment" for all of your data: Simply don't change structure packing from the compiler default (some people like #pragma pack(1) and such, but I always avoid them because of these alignment requirements), and always put the larger members of your structure first, *or* count the sizes of the members to make sure the later ones are properly aligned:

Code: Select all

struct Foo
{
    U8 m_type;
    U8 m_flags;
    U16 m_blockSize;    // <-- offset 2,  "natural" alignment == 2
    U8* m_pData;   // <-- offset 4, "natural" alignment == 4 (on most platforms anyway)
    U16 m_dataAge;  // <-- offset 8, "natural" alignment == 2
    U16 m_padding0;  // <-- only exists to make the next field 4-byte aligned
    U32 m_counter;
};
Two things to notice about this little example:
(1) I assumed that sizeof(U8*) == sizeof(U32) == 4. You can always check that with a compile-time assertion, but it's true on all 32-bit platforms. (NOT necessarily on some of the newer 64-bit platforms though! There the compiler would have inserted an extra 4 bytes of padding before the m_pData field!)
(2) I inserted a 2-byte m_padding0 field, just so that m_counter would have the proper alignment. Actually, the compiler will insert padding by itself (if it's necessary, and unless you've told it not to)... but I prefer to stick to the "natural" alignment rule by inserting padding fields myself so that the compiler never has to add them. That makes it easier to manually add up the size of the structure at a glance, too.

[Edit: I forgot to describe the main usefulness of the "natural alignment" rule... many platforms, such as x86 for example, have rules where a 2-, 4- or 8-byte type can have any alignment you want, but if it happens to cross a cache line boundary then it will be slower to access (sometimes much slower). Or they have rules where the integer types support misaligned accesses but the floating point types don't. So if you just stick to "natural alignment", then you guarantee that no 4-byte or 8-byte type is ever going to cross a 32- or 64-byte cache line boundary, and you avoid having to deal with any of those special cases. "Natural alignment" is a simple rule that's easy to follow, and will avoid 99% of potential alignment problems for most code.]
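The size assumptions and the member-ordering rule can both be checked by the compiler. Below is a minimal sketch using a pre-C11 compile-time assertion (a negative array size is a compile error). The `COMPILE_TIME_ASSERT` macro, the two example structs and `is_naturally_aligned` are hypothetical names introduced here, and the expected sizes assume a typical ABI where `U32` is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char  U8;
typedef unsigned short U16;
typedef unsigned int   U32;

/* Fails to compile (array of size -1) when expr is false. */
#define COMPILE_TIME_ASSERT(name, expr) typedef char name[(expr) ? 1 : -1]

COMPILE_TIME_ASSERT(u8_is_1_byte,   sizeof(U8)  == 1);
COMPILE_TIME_ASSERT(u16_is_2_bytes, sizeof(U16) == 2);
COMPILE_TIME_ASSERT(u32_is_4_bytes, sizeof(U32) == 4);

/* Small members interleaved with large ones: the compiler usually has
 * to insert padding before counter and before block_size. */
struct Interleaved {
    U8  type;
    U32 counter;
    U8  flags;
    U16 block_size;
};

/* Largest members first: on common ABIs no padding is needed. */
struct Ordered {
    U32 counter;
    U16 block_size;
    U8  type;
    U8  flags;
};

/* Documents and checks the "no padding" assumption for this layout. */
COMPILE_TIME_ASSERT(ordered_has_no_padding, sizeof(struct Ordered) == 8);

/* Nonzero if a member of the given size at the given offset sits on a
 * multiple of its own size. */
static int is_naturally_aligned(size_t offset, size_t size)
{
    assert(size != 0);
    return offset % size == 0;
}
```

For example, `is_naturally_aligned(offsetof(struct Ordered, counter), sizeof(U32))` can be used in a unit test. On typical x86 and ARM compilers `struct Interleaved` comes out larger than `struct Ordered` even though both hold the same members.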

Anyway, just some ideas. Happy coding!
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

mozz wrote:
tepples wrote:Two specifications impose constraints on a C compiler: the C standard and each platform's application binary interface (ABI).
There are several things that are not guaranteed in C or C++ (except maybe in C99 or the latest version of the C++ spec, I dunno?). However, they are true on practically all platforms that anyone has made in the past 20 years
That's what I meant by ABI constraints.
(1) A byte is 8-bits
CHAR_BIT (number of bits in a byte) can be larger than 8 on some digital signal processors, which might have, say, 32-bit bytes. But I agree that most of us won't ever write NES emulators for such architectures.
In my own code, I usually use the following definitions:

Code: Select all

typedef unsigned char  U8;
typedef  signed  char  S8;
typedef unsigned short U16;
typedef  signed  short S16;
typedef unsigned int U32;
typedef  signed  int S32;
typedef unsigned long long U64;
typedef  signed  long long S64;
Those names sound familiar. Did you learn them from the GBA scene?
Classes == structures
That's actually true by the C++ standard. Within C++, the only difference between the two is the privilege of members that precede the first privilege statement.
Anyway, you can avoid nearly all alignment problems if you use "natural alignment" for all of your data
You'll also have to use byte-wise I/O for file formats too, as plenty of common file formats (such as .bmp) do not use natural alignment.
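As a sketch of what byte-wise I/O looks like in practice: the .bmp file header stores a 32-bit little-endian file size at offset 2 and the pixel data offset at offset 10, so neither field is naturally aligned. Reading them one byte at a time avoids both the alignment problem and any dependence on host endianness. The helper names `read_le32` and `parse_bmp_header` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four individual bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Parse the 14-byte BMP file header from a byte buffer.
 * Returns 1 on success, 0 if the buffer is too short or lacks "BM". */
static int parse_bmp_header(const uint8_t *buf, size_t len,
                            uint32_t *file_size, uint32_t *pixel_offset)
{
    assert(buf != 0 && file_size != 0 && pixel_offset != 0);
    if (len < 14 || buf[0] != 'B' || buf[1] != 'M')
        return 0;
    *file_size    = read_le32(buf + 2);  /* offset 2: not 4-byte aligned */
    *pixel_offset = read_le32(buf + 10); /* offset 10: not 4-byte aligned */
    return 1;
}
```

The same approach applies to the iNES header or save-state files: define the on-disk layout in terms of byte offsets, and keep the in-memory struct naturally aligned.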
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

(3) NULL pointers to any data type (including void*) can be represented by a bit-pattern of all clear bits. So you can (for example) use memset(data, 0, sizeof(MyStruct)); to clear a structure, and assume that any pointers in it are now NULL.
Being portable costs very little in this case. Instead of

Code: Select all

MyStruct* s = ...
memset( s, 0, sizeof *s );
you can do

Code: Select all

MyStruct* s = ...
static const MyStruct zero = { 0 };
*s = zero;
This will work properly even if MyStruct has floating-point types in it. If you are declaring MyStruct locally, you can even just do

Code: Select all

MyStruct s = { 0 };
(4) Most platforms nowadays are "32-bit", which means sizeof(int)==4 and sizeof(void*)==4 (in fact, the size of any pointer type, except for C++ pointer-to-member types, should be 32 bits). If you want to be future-proof for 64-bit platforms, it's a good idea to keep in mind that their pointer types might be 64 bits instead of 32.
If you're coding for a modern platform, why not use intptr_t (or uintptr_t)? The reader then knows that you're stuffing a pointer into an int, and it's guaranteed portable.
(5) "Natural" alignment: this is not guaranteed on every platform, but it works on all x86-based platforms (as well as all of the common PPC-, Sparc- and ARM-based platforms, and probably most others). Basically, small types like to be aligned to their size.
This pretty much has to be the case, because it's guaranteed that for an array of T, elements will be sizeof (T) bytes apart. So the only way a type's alignment wouldn't be sizeof (T) bytes as well is if it were at some offset, for example if sizeof (int) were 4 and proper alignment required that its address % 4 be some non-zero value.
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

tepples wrote: CHAR_BIT (number of bits in a byte) can be larger than 8 on some digital signal processors, which might have, say, 32-bit bytes. But I agree that most of us won't ever write NES emulators for such architectures.
Who would refer to it as a "Byte" rather than a "Word" if it's more than 8 bits large?
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
mozz
Posts: 94
Joined: Mon Mar 06, 2006 3:42 pm
Location: Montreal, canada

Post by mozz »

blargg wrote:This will work properly even if MyStruct has floating-point types in it. If you are declaring MyStruct locally, you can even just do

Code: Select all

MyStruct s = { 0 };
Interesting, thanks for that idiom!

Usually when memset is used it's for something that is not statically initialized (e.g. an array of structures on the stack or something), but memset is pretty slow on some platforms anyway, so your way might be faster for individual stack-allocated structs. And being portable is nice too, of course!
blargg wrote:If you're coding for a modern platform, why not use intptr_t (or uintptr_t)? The reader then knows that you're stuffing a pointer into an int, and it's guaranteed portable.
That might work, but then you either have to not assume a fixed size for those types (is it 4 bytes or 8? depends on the size of the pointers, *and also the size of int on your platform*) or you have to check sizeof(intptr_t) in your code at which point I'd rather be making my own union type anyway. :wink: Depending on your reason for doing such tricks (I usually encounter them in the context of a memory size optimization), you might need to know the pointer size, in which case you are better off using your fixed-size types (as well as putting a compile-time assertion near the code that uses it, that serves to both document and check the assumption). What I've found over the years is that I hate programming with types that I don't know the sizes of. 8) But there is no easy way to avoid it if you want a pointer-sized union... oh well.
blargg wrote:
(5) "Natural" alignment: this is not guaranteed on every platform, but it works on all x86-based platforms (as well as all of the common PPC-, Sparc- and ARM-based platforms, and probably most others). Basically, small types like to be aligned to their size.
This pretty much has to be the case, because it's guaranteed that for an array of T, elements will be sizeof (T) bytes apart. So the only way a type's alignment wouldn't be sizeof (T) bytes as well is if it were at some offset, for example if sizeof (int) were 4 and proper alignment required that its address % 4 be some non-zero value.
Close, but don't forget that you can have a type T where sizeof(T)==4 but the alignment required for T is only 1, for example. The "natural alignment" rule suggests that you align them on their size anyway, even if that is more than the CPU strictly requires, so e.g. 8-byte double variables should be aligned on an 8-byte boundary, even if some platforms would be perfectly happy with a 4-aligned or 1-aligned double.
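The difference between a type's size and its required alignment can be observed without C11's `_Alignof` by using a probe struct: the offset of a member placed right after a single `char` is, on practical compilers, that member's alignment. This is only a sketch; the struct and function names are invented, and the results are ABI-dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Four bytes large, but built only from chars, so compilers typically
 * give it 1-byte alignment. */
struct FourChars { unsigned char b[4]; };

/* Probe structs: the offset of t reveals the alignment chosen for it. */
struct ProbeFourChars { char c; struct FourChars t; };
struct ProbeDouble    { char c; double t; };

static size_t align_of_four_chars(void)
{
    return offsetof(struct ProbeFourChars, t);
}

static size_t align_of_double(void)
{
    return offsetof(struct ProbeDouble, t);
}

/* Padding bytes needed so that a member with the given alignment can
 * start at or after offset. */
static size_t padding_before(size_t offset, size_t align)
{
    assert(align != 0);
    return (align - offset % align) % align;
}
```

On a typical x86 compiler `sizeof(struct FourChars)` is 4 while `align_of_four_chars()` is 1. `align_of_double()` is where ABIs differ: some report 8 and some 4, which is the case the natural alignment rule is meant to paper over.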

Many compilers will already align structure members to natural alignment for you (all x86 compilers I know of do this by default). Knowing this rule means you can put the fields in the struct in an order where the compiler doesn't have to insert padding (or inserts only minimal padding). For example, if you have some U8's and some U32's in the same struct, either put all the U32's first, or make sure you group four U8's together, so that the U32's are 4-byte-aligned. If you don't do that, the compiler might need to insert more padding in the struct in order to satisfy its alignment rules. I'm not sure if it's legal according to the C/C++ specs for compilers to *re-order* the fields in your struct, but I've never ever seen a compiler that does that; instead they just add padding whenever the next field would not be properly aligned without padding.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

blargg wrote:why not use intptr_t (or uintptr_t)?
Because not everybody has a C99 compiler. And because C++ compilers aren't yet required to provide C99's new types as an extension.
Dwedit wrote:
tepples wrote:CHAR_BIT (number of bits in a byte) can be larger than 8 on some digital signal processors, which might have, say, 32-bit bytes. But I agree that most of us won't ever write NES emulators for such architectures.
Who would refer to it as a "Byte" rather than a "Word" if it's more than 8 bits large?
octet n. A vector of eight bits. [From Latin octo = eight.]

byte n. A vector of bits whose size is that of an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (C standard, clause 3.6). [From "bite", modified in spelling to distinguish from "bit".]

word n. A vector of bits whose size is a machine's preferred size for integers, floats, or addresses.

On x86, PowerPC, MIPS, and ARM, a byte is the same size as an octet. On some specialized architectures, a byte is the same size as a word. C makes no explicit provision for architectures that have different sizes of bytes for different regions of memory, such as the VRAM of some Nintendo handhelds.
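For emulator code, the practical consequence can be sketched in two parts: state the 8-bit-byte assumption once so the build fails loudly where it does not hold, and mask emulated 8-bit values explicitly instead of relying on `unsigned char` wrapping at 256. The `add8` helper below is a made-up example, not taken from any emulator in this thread.

```c
#include <assert.h>
#include <limits.h>

#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes (CHAR_BIT == 8)."
#endif

/* 8-bit addition as an emulated CPU would perform it. The explicit
 * 0xFF masks give the same result whatever width the host uses for
 * unsigned arithmetic. */
static unsigned add8(unsigned a, unsigned b, unsigned *carry_out)
{
    unsigned sum = (a & 0xFFu) + (b & 0xFFu);
    assert(carry_out != 0);
    *carry_out = sum >> 8;  /* 1 if the result overflowed 8 bits */
    return sum & 0xFFu;     /* the octet that stays in the register */
}
```

With the `#error` guard in place the masks are redundant on mainstream hosts, but they keep the arithmetic correct if the guard is ever removed for an unusual target.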
MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Post by MottZilla »

Back to you know, writing emulators..

My emulator is coming along nicely, I think. I started working on MMC1 and it was a bitch because of various things that weren't very clear to me. But I think I've managed to get it working for everything except the 32K switching mode. Does anyone know of a MMC1 game that uses 32K switching? Also, are there any games where I need to worry about what happens to PRG when you change between 32K and 16K modes, and what gets mapped where, etc.?
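For reference while debugging the 32K mode, here is a rough sketch of how the PRG mode bits are commonly documented to select banks. It assumes the usual MMC1 register layout (control register bits 2-3 select the PRG mode, the PRG register's low 4 bits select a 16 KB bank) and ignores PRG-RAM and the extra 256 KB select line used by 512 KB carts. `struct Mmc1` and `mmc1_prg_offset` are names invented for this example; check the details against a mapper document before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapper state; field names are not from the thread. */
struct Mmc1 {
    uint8_t control;   /* bits 2-3: PRG mode */
    uint8_t prg_bank;  /* bits 0-3: PRG bank number */
    uint8_t prg_banks; /* number of 16 KB banks in the ROM (1..16) */
};

/* Translate a CPU address in $8000-$FFFF to an offset into PRG-ROM. */
static uint32_t mmc1_prg_offset(const struct Mmc1 *m, uint16_t cpu_addr)
{
    assert(m != 0 && m->prg_banks > 0 && cpu_addr >= 0x8000);

    uint8_t mode  = (m->control >> 2) & 3;
    uint8_t bank  = m->prg_bank & 0x0F;
    uint8_t last  = (uint8_t)(m->prg_banks - 1);
    int     upper = (cpu_addr & 0x4000) != 0; /* $C000-$FFFF half */
    uint8_t bank16;

    if (mode < 2) {
        /* Modes 0 and 1: switch 32 KB at $8000, low bank bit ignored. */
        bank16 = (uint8_t)((bank & 0x0E) | upper);
    } else if (mode == 2) {
        /* Mode 2: first bank fixed at $8000, switch 16 KB at $C000. */
        bank16 = upper ? bank : 0;
    } else {
        /* Mode 3: switch 16 KB at $8000, last bank fixed at $C000. */
        bank16 = upper ? last : bank;
    }
    return (uint32_t)(bank16 % m->prg_banks) * 0x4000u + (cpu_addr & 0x3FFFu);
}
```

Because the mapping is recomputed from the current register values on every access, a switch between 16K and 32K modes needs no special handling in this sketch: whatever the control and bank registers hold at that moment decides what appears at $8000 and $C000.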

I also finally wrote a real scanline renderer. Prior to this I was just using a hacked up version of my tiled screen renderer. This should allow me to better emulate Sprite 0 hit I think.

Most importantly, perhaps, I found out my emulator timing was totally broken. NMI would happen at a constant rate and all, but it wasn't the correct number of cycles, and the vblank period and such was just missing. Whoops. :p
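A minimal sketch of frame timing, assuming the commonly cited NTSC figures: 341 PPU cycles per scanline, 262 scanlines per frame, 3 PPU cycles per CPU cycle, and vblank beginning at scanline 241. It ignores the skipped cycle on odd frames and the exact dot at which the vblank flag is set, and `struct PpuClock` and `ppu_advance` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PPU_CYCLES_PER_SCANLINE = 341,
    SCANLINES_PER_FRAME     = 262,
    PPU_CYCLES_PER_CPU      = 3,
    VBLANK_START_SCANLINE   = 241,
    PPU_CYCLES_PER_FRAME    = PPU_CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME
};

struct PpuClock {
    uint32_t cycle;    /* PPU cycle within the current scanline */
    uint32_t scanline; /* 0..261 */
};

/* Advance the PPU by the CPU cycles the last instruction took.
 * Returns 1 if vblank began during this step (time to raise NMI,
 * if enabled), otherwise 0. */
static int ppu_advance(struct PpuClock *c, unsigned cpu_cycles)
{
    int entered_vblank = 0;
    assert(c != 0);
    c->cycle += cpu_cycles * PPU_CYCLES_PER_CPU;
    while (c->cycle >= PPU_CYCLES_PER_SCANLINE) {
        c->cycle -= PPU_CYCLES_PER_SCANLINE;
        c->scanline = (c->scanline + 1) % SCANLINES_PER_FRAME;
        if (c->scanline == VBLANK_START_SCANLINE)
            entered_vblank = 1;
    }
    return entered_vblank;
}
```

Counting in PPU cycles sidesteps the fact that a scanline is not a whole number of CPU cycles (341 / 3 is about 113.67), which is an easy place for a fixed "cycles per line" constant to drift.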

Not so important but nice, I looked at Loopy's docs to figure out why SMB's status bar flickered. It fixed that and I'm pretty happy with my progress now.
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

MottZilla wrote: Does anyone know of a MMC1 game that uses 32K switching?
Dragon Warrior 3 + 4 use it.
MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Post by MottZilla »

How bout one that doesn't also use the 512k or 1024k cart banking? :p
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

1024k MMC1 cart banking does not exist, no matter how many DW4 overdumps you find.
"Forbidden Four" multicart example by Tepples uses 32k bankswitching and a size of 256k.