Theoretical maximum speed for cycle-accurate NES emulation

Josh
Posts: 69
Joined: Sat Mar 19, 2005 11:18 am

Post by Josh »

The most accurate NES emulators execute one opcode at a time and also update the PPU/APU/MMC states at each clock cycle. Nintendulator falls into this category. (What, if any, other emulators do so?) I can't determine what Nintendulator's full speed is on my system, since I can't find any option to turn off throttling. From what I've heard, though, it runs into speed problems with any system much below 1 GHz. Nintendulator appears to be written in a mixture of C and inline assembly. I don't know how much, if any, speed optimization has been done.

If an accurate NES emulator were written in pure, optimized assembly, how fast do you think it would run? Would it be possible to obtain 60fps on a 400 MHz or so Celeron, or is there just too much computational work to be done? I've been trying to get a NES emulator off the ground (I currently have a working cycle accurate CPU core in C), but I'm still a bit undecided about which language to use. Many emulators are written in C++, which makes the coding easier in some aspects; how much of a speed hit does this incur? Would assembly mean a major speedup, or would it be fairly subtle? What area of Nintendulator is the biggest bottleneck?
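For a sense of scale, the lockstep approach described above can be sketched roughly as follows. The clock figures are the standard NTSC ones (CPU at about 1.79 MHz, three PPU dots per CPU cycle). The `Machine` struct and all function names are hypothetical, and the tick functions are stubs that only count; a real core would do its work inside them.

```c
#include <assert.h>
#include <stdint.h>

/* NTSC clock rates in Hz (master clock 21.477272 MHz; CPU = /12, PPU = /4). */
#define NTSC_CPU_HZ 1789773u
#define NTSC_PPU_HZ 5369318u

/* Hypothetical machine state: the counters stand in for real chip state. */
typedef struct {
    uint64_t cpu_cycles;
    uint64_t ppu_dots;
    uint64_t apu_ticks;
} Machine;

void ppu_tick(Machine *m) { m->ppu_dots++; }   /* stub: one PPU dot */
void apu_tick(Machine *m) { m->apu_ticks++; }  /* stub: one APU clock */

/* Advance everything by one CPU cycle: three PPU dots and one APU tick. */
void cpu_cycle(Machine *m)
{
    m->cpu_cycles++;
    ppu_tick(m);
    ppu_tick(m);
    ppu_tick(m);
    apu_tick(m);
}

/* Run the whole machine in lockstep for n CPU cycles. */
void run_cpu_cycles(Machine *m, uint64_t n)
{
    while (n--)
        cpu_cycle(m);
}

/* Host clock cycles available per emulated PPU dot at full NTSC speed. */
uint64_t host_cycles_per_ppu_dot(uint64_t host_hz)
{
    assert(host_hz > 0);
    return host_hz / NTSC_PPU_HZ;
}
```

Under these assumptions a 400 MHz host has about 74 clock cycles per PPU dot, and that budget also has to cover the CPU core, the APU, and video output.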
Quietust
Posts: 2028
Joined: Sun Sep 19, 2004 10:59 pm

Re: Theoretical maximum speed for cycle-accurate NES emulati

Post by Quietust »

Josh wrote:I can't determine what Nintendulator's full speed is on my system, since I can't find any option to turn off throttling.
Turn off sound playback (Ctrl+S) and it'll run as fast as it can. You might also want to turn off auto-frameskip (and set frameskip to zero) if you want to measure its speed.
Josh wrote:I don't know how much, if any, speed optimization has been done. What area of Nintendulator is the biggest bottleneck?
Reasonable optimization in the CPU core, and some very small bits in the PPU.
Overall, the PPU is probably the biggest bottleneck, with the APU a close second.
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.
Josh
Posts: 69
Joined: Sat Mar 19, 2005 11:18 am

Post by Josh »

Ah. You're using the sound buffer callback for timing. That makes sense; it's about the easiest way to do it on a Win32 platform. Well, I'm seeing between 90 and 95 fps with throttling disabled. This is on a 1.8 GHz P4 @ 2.4 GHz.
Zepper
Formerly Fx3
Posts: 3262
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper »

If you want to do the same with RockNES, set the sound switch to 0 and the blitter to default (256x240 NES screen size) in the config file. My last measurement was around 135 FPS on my Celeron D 2.66 GHz.
Quietust
Posts: 2028
Joined: Sun Sep 19, 2004 10:59 pm

Post by Quietust »

Josh wrote:Ah. You're using the sound buffer callback for timing. That makes sense, it's about the easiest way to do it on a Win32 platform.
Actually, I'm not using the callback - I'm repeatedly polling the buffer via IDirectSoundBuffer_GetCurrentPosition (with a sleep thrown in the loop for good measure).
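The polling approach can be sketched in portable C as below. This is not Nintendulator's code: `FakeDevice`, `fake_play_cursor`, `fake_nap`, `queued_bytes`, and `wait_for_room` are all hypothetical names. The fake cursor query stands in for a real call such as IDirectSoundBuffer_GetCurrentPosition, and the fake nap stands in for a short sleep.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the audio device: play is the hardware's read
   cursor in a ring buffer of size bytes; step is how far it moves per nap. */
typedef struct {
    uint32_t play;
    uint32_t step;
    uint32_t size;
} FakeDevice;

/* Stands in for a cursor query such as IDirectSoundBuffer_GetCurrentPosition. */
uint32_t fake_play_cursor(const FakeDevice *d) { return d->play; }

/* Stands in for a short sleep; here it just advances the play cursor. */
void fake_nap(FakeDevice *d) { d->play = (d->play + d->step) % d->size; }

/* Bytes written but not yet played, accounting for ring-buffer wraparound. */
uint32_t queued_bytes(uint32_t play, uint32_t write, uint32_t size)
{
    assert(size > 0 && play < size && write < size);
    return (write + size - play) % size;
}

/* Poll until the backlog drops to max_queued or less; returns naps taken. */
unsigned wait_for_room(FakeDevice *d, uint32_t write, uint32_t max_queued)
{
    unsigned naps = 0;
    while (queued_bytes(fake_play_cursor(d), write, d->size) > max_queued) {
        fake_nap(d);
        naps++;
    }
    return naps;
}
```

The emulator would call something like `wait_for_room` before emulating the next frame, so the audio hardware's playback rate sets the pace. With sound off there is nothing to wait on, which matches the advice above about disabling sound to measure raw speed.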
mattmatteh
Posts: 345
Joined: Fri Jul 29, 2005 3:40 pm
Location: near chicago

Post by mattmatteh »

josh

i think c is fast enough for the nes; that's what i am coding mine in. to get faster you might want to use inline asm. i will do that on my emulator, but last, with the c source still available as an option to maintain platform independence.

if you are working on an emulator then you need to learn how to profile and analyze your code. that is what i am doing now. (no asm)

x86:
cpu ~ 100 MHz
ppu ~ 400 MHz
sdl drawing ~ 200 MHz

and less on ppc, but i think the x86 has cache problems where the ppc has a larger cache.

and all i have working now is cpu and partial ppu.
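The thread does not say how these per-component figures were derived. One plausible reading is "host MHz consumed at real-time speed", which can be estimated by timing a component and scaling by the host clock. A minimal sketch under that assumption follows; `dummy_ppu_frame`, `time_calls`, and `component_mhz` are hypothetical names, and the workload is a placeholder loop.

```c
#include <assert.h>
#include <time.h>

/* Dummy workload standing in for one emulated PPU frame (hypothetical). */
void dummy_ppu_frame(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 89342u; i++)  /* 341 dots x 262 scanlines */
        sink += i;
}

/* Processor time, in seconds, spent in n calls of fn. */
double time_calls(void (*fn)(void), unsigned n)
{
    clock_t start = clock();
    for (unsigned i = 0; i < n; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Host MHz a component consumes at real-time speed: the fraction of the
   emulated time span it takes to run, scaled by the host clock. */
double component_mhz(double host_mhz, double component_s, double emulated_s)
{
    assert(emulated_s > 0.0);
    return host_mhz * component_s / emulated_s;
}
```

For example, a PPU that needs 0.5 s of processor time to emulate 1 s of NES time on an 800 MHz host would be charged 400 MHz. A sampling profiler gives a finer breakdown, but a coarse timer like this is enough to rank components.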

matt
WedNESday
Posts: 1311
Joined: Thu Sep 15, 2005 9:23 am
Location: London, England

Post by WedNESday »

Currently, WedNESday's stats are (pure C++): CPU 25 MHz, PPU 450 MHz. On my P4 2.2 GHz I get about 180 FPS.

100 MHz does seem like rather a lot for your CPU core. Why is it so slow?
mattmatteh
Posts: 345
Joined: Fri Jul 29, 2005 3:40 pm
Location: near chicago

Post by mattmatteh »

not sure.... still profiling
WedNESday
Posts: 1311
Joined: Thu Sep 15, 2005 9:23 am
Location: London, England

Post by WedNESday »

Could you post your source code, or show us the basics of your CPU emulator? My CPU emulator needs 25 MHz of host CPU to sustain 60 FPS for a 1.8 MHz 6502.
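An unthrottled frame rate can be converted into a "host clock needed for 60 FPS" figure with simple scaling. The sketch below assumes speed scales linearly with clock rate, which ignores cache and memory effects; `host_khz_for_target` is a hypothetical helper, and it works in kHz to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Host clock (kHz) needed to reach target_fps, given the frame rate
   measured unthrottled at host_khz. Assumes linear scaling with clock. */
uint64_t host_khz_for_target(uint64_t host_khz, uint64_t measured_fps,
                             uint64_t target_fps)
{
    assert(measured_fps > 0);
    return host_khz * target_fps / measured_fps;
}
```

With numbers quoted in this thread, 180 FPS on a 2.2 GHz P4 works out to roughly 733 MHz for the whole emulator at 60 FPS, and 90 FPS at 2.4 GHz works out to 1600 MHz.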
mattmatteh
Posts: 345
Joined: Fri Jul 29, 2005 3:40 pm
Location: near chicago

Post by mattmatteh »

the source will be posted. not ready to do that yet... let me work on it some more. and i don't even have a good name for it yet.

matt
mattmatteh
Posts: 345
Joined: Fri Jul 29, 2005 3:40 pm
Location: near chicago

Post by mattmatteh »

i think my cpu is fast now, still have to profile that.

ppu was slow till i did one simple change... i was accessing the palette with the same function as the ppu memory reads; switched to direct reading and gained 25% cpu on a p3 800. wow! got that idea from valgrind showing cache misses and the fact that it's a function call that gets called over 60,000 times per ppu frame.
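The change described above might look something like this. The actual code is not shown in the thread, so the `Ppu` struct and all function names are hypothetical, and the nametable handling is simplified to a flat 4 KB with no mirroring logic. The general read path decodes the address on every call, while the direct path indexes the 32-byte palette array with the pixel's palette index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PPU memory: pattern tables, nametables, palette RAM. */
typedef struct {
    uint8_t chr[0x2000];
    uint8_t nametables[0x1000];
    uint8_t palette[32];
} Ppu;

/* Map a palette address to 0..31, folding $3F10/$14/$18/$1C onto
   $3F00/$04/$08/$0C as the hardware does. */
unsigned palette_index(uint16_t addr)
{
    unsigned i = addr & 0x1Fu;
    if ((i & 0x13u) == 0x10u)
        i &= 0x0Fu;
    return i;
}

/* General read path: decodes the address on every call. */
uint8_t ppu_read(const Ppu *p, uint16_t addr)
{
    addr &= 0x3FFF;
    if (addr < 0x2000) return p->chr[addr];
    if (addr < 0x3F00) return p->nametables[addr & 0x0FFF];
    return p->palette[palette_index(addr)];
}

/* Writes store into both mirrored slots so the fast path needs no branch. */
void palette_write(Ppu *p, uint16_t addr, uint8_t value)
{
    unsigned i = palette_index(addr);
    p->palette[i] = value;
    if ((i & 0x03u) == 0)
        p->palette[i | 0x10u] = value;
}

/* Fast path for rendering: index the palette array directly. */
static inline uint8_t palette_read_direct(const Ppu *p, unsigned pixel_index)
{
    assert(pixel_index < 32);
    return p->palette[pixel_index];
}
```

Handling the mirrored entries at write time is one possible design choice: palette writes are rare compared with the per-pixel reads, so the cost moves off the hot path.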

matt