emulator performance?
(In the interest of not turning this into a speed vs. accuracy debate, let's just assume that any cycle accurate emulator that can handle traditionally tough to emulate games without hacks is equal).
I've been working on optimizing my emulator's code recently and have bumped up the performance of the cycle accurate mode to >500fps (Intel i7 920 w/ Nvidia 9800 GTX). Just wondering how this compares to other emulators out there. In other words, have I been successful or do I have a lot of room for improvement?
James
Quote: What requirements do you need to meet to succeed? ... More simultaneous emulators at once?
My emulator drops to a scanline-accurate emulation mode in the selection menu because cycle-accurate is too slow. Ultimately, I'd like to run the cycle-accurate mode throughout. Still not there today (at 60 fps, at least), but there is some point where software efficiency and hardware speed will allow it. I can help one of those along.
Beyond that, there isn't a particular target in mind. I like the optimization process and am just curious as to how my work compares to others.
Sounds extremely good to me. My emulator is nowhere near that.
On my machine, I typically get 40 fps, and my emulator does not yet support sound, has major PPU issues (like SMB title screen) and only about 6 mappers implemented.
For me, I've learned a ton and have enjoyed a lot of the time spent on the emulator, so even if I can never get Battletoads working and playing at full rate, I'm still happy.
Al
I think Nesticle and Famtasia are the fastest Windows-based emulators right now, mainly because nobody ever ported LoopyNES to Windows and brought its accuracy up a few notches.
What do you consider to be "cycle-accurate"? Does that mean that it would simulate explicit reads and writes for each cycle within the instruction, and possibly execute something triggered for each access? Does that mean merely getting page-crossing timing correct?
What do you consider to be "hacks"? Detecting a game and tweaking the timing slightly? Idle loop skipping?
Idle loop skipping is some really good stuff, especially when you don't need to emulate the PPU.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Dwedit wrote: What do you consider to be "cycle-accurate"?
Yeah, I guess that's a little vague. PPU cycle accurate. Enough for mid-scanline effects to work properly.
Dwedit wrote: What do you consider to be "hacks"?
For example (from this thread: http://nesdev.com/bbs/viewtopic.php?t=6736), detecting Battletoads and forcing sprite 0 hits at a specific time to work around timing issues.
albailey wrote: For me, I've learned a ton and have enjoyed a lot of the time spent on the emulator, so even if I can never get Battletoads working and playing at full rate, I'm still happy.
That's the attitude that's kept me going all these years. It was a long time before I could get Battletoads working, but all I learned along the way was the real reward. Keep it up!
get nemulator
http://nemulator.com
- cpow
- NESICIDE developer
- Posts: 1097
- Joined: Mon Oct 13, 2008 7:55 pm
- Location: Minneapolis, MN
albailey wrote: For me, I've learned a ton and have enjoyed a lot of the time spent on the emulator, so even if I can never get Battletoads working and playing at full rate, I'm still happy.
James wrote: That's the attitude that's kept me going all these years. It was a long time before I could get Battletoads working, but all I learned along the way was the real reward. Keep it up!
I couldn't agree with this more. My emulator is getting more and more accurate as the days go by: 141 of 163 test ROMs passing! At least for me it runs sufficiently fast, but I am having problems with others who use Win7 64-bit getting sub-par performance.
The quest for accuracy and performance is most of the fun!
Re: emulator performance?
James wrote: I've been working on optimizing my emulator's code recently and have bumped up the performance of the cycle accurate mode to >500fps (Intel i7 920 w/ Nvidia 9800 GTX). Just wondering how this compares to other emulators out there. In other words, have I been successful or do I have a lot of room for improvement?
My emulator is not exactly cycle-accurate (though it can handle most mid-frame PPU effects) and it runs at > 1000 FPS on an Intel i5-760 processor, for what it's worth. (This is without actually copying the PPU/APU output to the screen/sound card; i.e. just calling my "calc frame" function inside a timed loop.)
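A headless benchmark of the kind described above can be sketched roughly as follows. This is only an illustration in Python, not the poster's code: `measure_fps` and `dummy_calc_frame` are hypothetical names, and the dummy frame function just burns a little CPU time where a real emulator would compute a frame without presenting video or audio.

```python
import time


def measure_fps(calc_frame, frames=200):
    """Call calc_frame() `frames` times back to back and return frames per second."""
    start = time.perf_counter()
    for _ in range(frames):
        calc_frame()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return frames / elapsed if elapsed > 0 else float("inf")


def dummy_calc_frame():
    """Hypothetical stand-in for an emulator's per-frame function."""
    total = 0
    for dot in range(1000):
        total += dot & 1
    return total
```

Numbers measured this way exclude blitting and audio output, so they are not directly comparable with figures taken with rendering enabled.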
What areas of your code have you been optimizing? Find any good tricks? I've been working on speeding up my emulation core over the past month and have made about a 20% improvement. I still have some more areas I want to look into, but when I'm done I was planning on posting a list of things that happened to boost performance for my particular emulator implementation. For example, I profiled a lot of games and found that LDA (zero page) was by far the most frequent instruction (accounting for about 16% of all instructions) and added a special case for that particular opcode which sped things up. Not exactly ground-breaking stuff, but it was helpful to me so maybe it will be helpful for someone else.
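The LDA (zero page) special case mentioned above might look something like the sketch below. It is an assumption about the general shape of such a fast path, not the poster's implementation: the `Cpu` class, its flat 64 KiB `bytearray` memory, and the tiny handler table are all hypothetical, and Python is used only to show the structure, not to demonstrate the speed gain.

```python
LDA_ZP = 0xA5   # 6502 opcode: LDA zero page, 3 cycles
LDA_IMM = 0xA9  # 6502 opcode: LDA immediate, 2 cycles
NOP = 0xEA      # 6502 opcode: NOP, 2 cycles


class Cpu:
    """Tiny illustrative 6502 subset; memory is a flat 64 KiB bytearray (assumption)."""

    def __init__(self, memory):
        self.mem = memory
        self.pc = 0
        self.a = 0
        self.flag_z = False
        self.flag_n = False
        # Generic path: table dispatch, as a full core would use for every opcode.
        self.handlers = {LDA_IMM: self._lda_imm, NOP: self._nop}

    def _load_a(self, value):
        self.a = value
        self.flag_z = value == 0
        self.flag_n = value >= 0x80

    def _lda_imm(self):
        self._load_a(self.mem[self.pc])
        self.pc = (self.pc + 1) & 0xFFFF
        return 2

    def _nop(self):
        return 2

    def step(self):
        """Execute one instruction and return the number of cycles it took."""
        opcode = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        if opcode == LDA_ZP:
            # Fast path, checked before the table lookup. Zero page is plain
            # internal RAM on the NES, so the operand can be read directly
            # without going through any mapper or I/O decoding.
            self._load_a(self.mem[self.mem[self.pc]])
            self.pc = (self.pc + 1) & 0xFFFF
            return 3
        # Opcodes outside this tiny subset raise KeyError in the sketch.
        return self.handlers[opcode]()
```

Whether a check like this pays off depends on the dispatch mechanism and the instruction mix, so it is worth profiling before and after.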
Quote: At least for me it runs sufficiently fast but I am having problems with others who use Win7 64-bit having sub-par performance.
I just bought a new computer with Windows 7 64-bit and was disappointed to see that my emulator ran significantly worse than on a lesser machine running XP. Very frustrating. I think it is because I only have GDI and DirectDraw-based renderers, and neither appears to be hardware accelerated in Windows 7. Hopefully a Direct2D renderer will perform better.
Quote: My emulator is not exactly cycle-accurate
What method are you using? Looks like it might be scanline-based and, if so, I'm interested in hearing about how you handle mid-frame effects. My scanline-based renderer is a lot faster than the cycle accurate one, but it can't handle, for example, Marble Madness.
Quote: What areas of your code have you been optimizing? Find any good tricks?
Nothing especially fancy. I've been doing stuff like using lookup tables where it makes sense (pattern bit interleaving, attribute table stuff, etc.), and, in general, just running under a profiler and focusing on hot spots. The biggest improvements have come from rethinking stuff that's specific to my implementation.
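One way a lookup table can handle pattern bit interleaving is sketched below. This is an illustration under assumptions, not the poster's code: `SPREAD`, `interleave_row`, and `row_pixels` are hypothetical names. It relies on the NES tile format, where each 8-pixel row is stored as two bytes (a low and a high bit plane) and the leftmost pixel is bit 7.

```python
def _spread(byte):
    """Move bit n of `byte` to bit 2n, leaving the odd bits clear."""
    out = 0
    for bit in range(8):
        if byte & (1 << bit):
            out |= 1 << (2 * bit)
    return out


# Built once at startup: 256 entries, one per possible plane byte.
SPREAD = [_spread(b) for b in range(256)]


def interleave_row(lo, hi):
    """Combine the two bit planes of a tile row into 8 packed 2-bit pixels."""
    return SPREAD[lo] | (SPREAD[hi] << 1)


def row_pixels(lo, hi):
    """Return the 8 colour indices (0-3) of a tile row, leftmost pixel first."""
    row = interleave_row(lo, hi)
    return [(row >> (2 * (7 - x))) & 3 for x in range(8)]
```

With the table in place, each tile row costs two table reads and an OR instead of eight per-pixel shift-and-mask steps.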
Quote: DirectDraw-based renderers, and neither appears to be hardware accelerated in Windows 7.
This was why I switched from DirectDraw to Direct3D -- not just for performance reasons, but also because blits on Vista+ are no longer bilinearly filtered (yeah, I could roll my own, but...). With Direct3D, I'm simply rendering a texture-mapped quad and it's quite fast. I haven't tried Direct2D.
James wrote: What method are you using? Looks like it might be scanline-based and, if so, I'm interested in hearing about how you handle mid-frame effects. My scanline-based renderer is a lot faster than the cycle accurate one, but it can't handle, for example, Marble Madness.
My approach is almost tile-based; I try to do the cycle-accurate "catch-up" design but I only sync between CPU instructions; I do not sync between all of the individual stages of a single instruction. I also do some cheating in the PPU emulation to try to make the code run a little faster. It's good enough to run games like Marble Madness and Rad Racer but it's definitely a step below the most accurate emulators out there now. A re-design is probably about 6 years overdue.
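A "catch-up" design that syncs only at instruction boundaries could be sketched like this. It is a guess at the general technique, not the poster's implementation: `CatchUpPpu`, `run`, and the `(cycles, touches_ppu)` instruction tuples are hypothetical, and the NTSC ratio of 3 PPU dots per CPU cycle is assumed.

```python
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC ratio (assumption: no PAL support)


class CatchUpPpu:
    """PPU that stays idle until asked to catch up to the CPU's clock."""

    def __init__(self):
        self.dots_run = 0    # total dots simulated so far
        self.sync_count = 0  # how many times the CPU forced a sync

    def run_dots(self, n):
        # A real emulator would render pixels here; the sketch only counts.
        self.dots_run += n

    def catch_up(self, cpu_cycles):
        """Advance the PPU to the dot matching `cpu_cycles` elapsed CPU cycles."""
        pending = cpu_cycles * PPU_DOTS_PER_CPU_CYCLE - self.dots_run
        if pending > 0:
            self.run_dots(pending)
        self.sync_count += 1


def run(instructions, ppu):
    """Run (cycles, touches_ppu) tuples, syncing only between instructions."""
    cpu_cycles = 0
    for cycles, touches_ppu in instructions:
        if touches_ppu:
            # Sync once, before the instruction starts. The access is treated
            # as if it happened on the instruction's first cycle, which is the
            # accuracy trade-off of not syncing per bus cycle.
            ppu.catch_up(cpu_cycles)
        cpu_cycles += cycles
    ppu.catch_up(cpu_cycles)  # final sync, e.g. at end of frame
    return cpu_cycles
```

The saving comes from running the PPU in long batches: here four instructions trigger only two syncs rather than one PPU call per cycle.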
Quote: This was why I switched from DirectDraw to Direct3D -- not just for performance reasons, but also because blits on Vista+ are no longer bilinearly filtered (yeah, I could roll my own, but...). With Direct3D, I'm simply rendering a texture-mapped quad and it's quite fast. I haven't tried Direct2D.
That's encouraging to hear that you are getting good performance with Direct3D. As I understand it, Direct2D is just a wrapper on top of Direct3D, so it should perform similarly well.
Hmm... it would be easy enough to convert my scanline engine into a tile-based one. Might give that a try for the boost in compatibility.
Quote: I try to do the cycle-accurate "catch-up" design but I only sync between CPU instructions; I do not sync between all of the individual stages of a single instruction.
FWIW, I'm using PPU cycles as my timebase and am calling my CPU code every 3 ticks (NTSC only). It was easy to implement and, while I could probably get the biggest boost in performance by converting this to a catch-up design, it's not as slow as I thought it would be (heck, I think it's actually pretty fast).
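A PPU-driven timebase of this kind might be structured as below. This is a sketch under assumptions rather than the poster's code: the `Machine` class and its callback parameters are hypothetical, the NTSC frame is taken as 341 dots by 262 scanlines, and the dot skipped on odd frames is ignored. The sketch runs the CPU after every third dot, which is one arbitrary choice of ordering.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262  # NTSC
DOTS_PER_FRAME = DOTS_PER_SCANLINE * SCANLINES_PER_FRAME


class Machine:
    """Drives emulation from the PPU clock; the CPU runs on every third dot."""

    def __init__(self, ppu_tick, cpu_tick):
        self.ppu_tick = ppu_tick  # callback: advance the PPU by one dot
        self.cpu_tick = cpu_tick  # callback: advance the CPU by one cycle
        self.phase = 0            # dots since the last CPU cycle (0-2)

    def run_dots(self, dots):
        for _ in range(dots):
            self.ppu_tick()
            self.phase += 1
            if self.phase == 3:
                # The phase persists across calls, so a frame whose dot
                # count is not a multiple of 3 does not drift the CPU.
                self.phase = 0
                self.cpu_tick()
```

Calling `run_dots(DOTS_PER_FRAME)` once per frame gives 29780 CPU cycles for the first frame, with the leftover two dots carried into the next call.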
Quote: That's encouraging to hear that you are getting good performance with Direct3D. As I understand it Direct2D is just a wrapper on top of Direct3D so it should perform similarly well.
Yeah, I'm sure it will work well. My benchmarks are done with rendering enabled and I'm getting >1700 fps with the scanline engine. It's definitely not a bottleneck!
James wrote: FWIW, I'm using PPU cycles as my timebase and am calling my CPU code every 3 ticks (NTSC only).
- Odd. I thought you should run 1 CPU cycle, then call the PPU to run 3 dots (pixels). You do the reverse...
- My emu gets around 120FPS in my Core2Duo 2GHz. In a Pentium 4, it doesn't run at full speed if I use the blitter to double the image size & stretch it.
- cpow
- NESICIDE developer
James wrote: FWIW, I'm using PPU cycles as my timebase and am calling my CPU code every 3 ticks (NTSC only).
Zepper wrote: Odd. I thought you should run 1 CPU cycle, then call the PPU to run 3 dots (pixels). You do the reverse... Interesting, anyway. My emu gets around 120FPS in my Core2Duo 2GHz. In a Pentium 4, it doesn't run at full speed if I use the blitter to double the image size & stretch it.
I also do it by PPU cycles, running one CPU and APU cycle every third PPU cycle... seems the most logical way.
NESICIDE wrote: I also do it by PPU cycles, running one CPU and APU cycle every third PPU cycle...seems the most logical way.
- You mean after the third PPU cycle...?
- Why "most logical way"? Indeed, I use PPU cycles to control the emulation timing. The only cycle counter used here is for PPU: from 0 to 341, plus the scanline counter, obviously.
I smell an offtopic discussion