GSU revision comparison test

ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: GSU revision comparison test

Post by ARM9 »

AWJ wrote:Why does this ROM depend on cache behaviour when the previous one didn't?
I think it's just that the granularity of the counter was too coarse on the previous one. I could make the scpu loop shorter (at the cost of more complexity).
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ »

No, that's okay, if we can fix incorrect cache behaviour in bsnes as well that's two birds with one stone :)
ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: GSU revision comparison test

Post by ARM9 »

I was planning to make a more thorough cache test to check any edge cases there, such as how cache lines are loaded, fetching stalls, and various invalidation methods (for example, what happens if you write to < xxxfh in a cache line from the scpu).
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: GSU revision comparison test

Post by Sik »

byuu wrote:This seems like a really convoluted way to improve GSU timing ...

AWJ looks at instructions, ARM9 writes tests, qwertymodo runs tests and reports numbers, and I apply submitted patches.

... but hey, if it improves the emulation, then I'm all for it.
You have seen nothing yet, wait until people start bringing in oscilloscopes =P
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ »

qwertymodo, we're all waiting for you to run that new test ROM :)
qwertymodo
Posts: 775
Joined: Mon Jul 02, 2012 7:46 am

Re: GSU revision comparison test

Post by qwertymodo »

mult.sfc

GSU-1 (didn't test it on the other GSU revisions since the last one was identical on all of them)
Image

higan v0.94
Image
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ »

As I suspected, fmult and lmult were reversing the sense of ms0 and were also off by one. The manual says that they take "4 or 8 cycles", but one of those cycles is the one that every one-byte instruction takes.

Code:

diff --git a/bsnes/snes/chip/superfx/core/opcodes.cpp b/bsnes/snes/chip/superfx/core/opcodes.cpp
index 3b14d81..35da5ef 100644
--- a/bsnes/snes/chip/superfx/core/opcodes.cpp
+++ b/bsnes/snes/chip/superfx/core/opcodes.cpp
@@ -476,7 +476,7 @@ void SuperFX::op_fmult() {
   regs.sfr.cy = (result & 0x8000);
   regs.sfr.z  = (regs.dr() == 0);
   regs.reset();
-  add_clocks(4 + (regs.cfgr.ms0 << 2));
+  add_clocks((regs.cfgr.ms0 ? 3 : 7) * cache_access_speed);
 }
 
 //$9f(alt1): lmult
@@ -488,7 +488,7 @@ void SuperFX::op_lmult() {
   regs.sfr.cy = (result & 0x8000);
   regs.sfr.z  = (regs.dr() == 0);
   regs.reset();
-  add_clocks(4 + (regs.cfgr.ms0 << 2));
+  add_clocks((regs.cfgr.ms0 ? 3 : 7) * cache_access_speed);
 }
 
 //$a0-af(alt0): ibt rN,#pp
With this patch, bsnes-classic comes very close to hardware, just 2-5 cycles off:

Image
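
For readers who want to see the effect of the patch in numbers, here is a minimal stand-alone sketch of the two formulas from the diff. It is not bsnes code: the function names are hypothetical, and the mapping of CLSR to a multiplier of 1 (21MHz) or 2 (10MHz) is an assumption about what cache_access_speed holds.

```cpp
// Hypothetical stand-alone model of the fmult/lmult timing change; not bsnes code.
// Assumption: cache_access_speed is 1 when CLSR selects 21MHz and 2 when it selects 10MHz.
constexpr unsigned cache_access_speed(bool clsr) {
  return clsr ? 1u : 2u;
}

// Old formula, from the removed line: 4 + (ms0 << 2).
constexpr unsigned mult_clocks_old(bool ms0) {
  return 4u + (static_cast<unsigned>(ms0) << 2);
}

// New formula, from the added line: (ms0 ? 3 : 7) * cache_access_speed.
constexpr unsigned mult_clocks_new(bool ms0, bool clsr) {
  return (ms0 ? 3u : 7u) * cache_access_speed(clsr);
}
```

Under these assumptions the old formula charges 4 clocks with ms0 clear and 8 with ms0 set, while the new one charges 7 and 3 in 21MHz mode and 14 and 6 in 10MHz mode. That shows both fixes at once: the sense of ms0 is flipped, and one clock is dropped because the opcode fetch is already counted elsewhere.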
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: GSU revision comparison test

Post by Near »

Wow, yeah. Even ignoring my boolean flag inversion (did that with the GBA sequential access speeds too >_>), 4 or 8 cycles isn't right at all. This can be up to 14 cycles. Did not expect CLSR to factor in here, too.

Retested Winter Gold and SMW2, doesn't seem to cause any regressions. Which is good, the former was always a nightmare.

Looks like we're just a tiny bit too slow in every case. But since it's such a small difference, it's a one-time error rather than cumulative for each loop of this test. Probably some kind of delay in starting the GSU ("go" or whatever)?

Anyway, thanks everyone for all the help on this! v095's shaping up to be a great release :D
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ »

byuu wrote:Wow, yeah. Even ignoring my boolean flag inversion (did that with the GBA sequential access speeds too >_>), 4 or 8 cycles isn't right at all. This can be up to 14 cycles. Did not expect CLSR to factor in here, too.

Anyway, thanks everyone for all the help on this! v095's shaping up to be a great release :D
Using CLSR there is a complete guess. The timing test ROM only tests 21MHz mode, so it produces the same results whether I multiply by cache_access_speed or not. Perhaps ARM9 can modify the test ROMs to test 10MHz mode as well (in fact, it should be possible just by hex-editing a single byte in the ROM...)
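
A rough sketch of why a 21MHz-only test cannot settle this: the two candidate models below only disagree in 10MHz mode. All names are hypothetical, and the multiplier values (1 at 21MHz, 2 at 10MHz) are an assumption, not something measured on hardware.

```cpp
// Hypothetical candidate models for fmult/lmult clocks; neither is confirmed by hardware.
// Assumption: the multiplier is 1 in 21MHz mode and 2 in 10MHz mode.
constexpr unsigned speed_multiplier(bool fast_clock) {
  return fast_clock ? 1u : 2u;
}

// Candidate A: the cost scales with the clock-speed setting (what the patch does).
constexpr unsigned clocks_scaled(bool ms0, bool fast_clock) {
  return (ms0 ? 3u : 7u) * speed_multiplier(fast_clock);
}

// Candidate B: the cost is fixed regardless of the clock-speed setting.
constexpr unsigned clocks_fixed(bool ms0) {
  return ms0 ? 3u : 7u;
}

// True when a timing test run at this clock setting can tell A and B apart.
constexpr bool distinguishes(bool ms0, bool fast_clock) {
  return clocks_scaled(ms0, fast_clock) != clocks_fixed(ms0);
}
```

With fast_clock set, distinguishes() is false for both values of ms0, so only a 10MHz run can confirm or reject the multiplication.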
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: GSU revision comparison test

Post by Near »

Yeah, and maybe also test for that cache invalidation on stop thing, if that's even practical.

For now though, I think it's probably wise to guess that we multiply off CLSR. The whole point of 21MHz mode is supposed to be that it's "twice as fast"; at least for non-memory accesses.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ »

byuu wrote:Yeah, and maybe also test for that cache invalidation on stop thing, if that's even practical.

For now though, I think it's probably wise to guess that we multiply off CLSR. The whole point of 21MHz mode is supposed to be that it's "twice as fast"; at least for non-memory accesses.
bsnes is already running the test slightly slower than hardware rather than slightly faster, so adding more cache invalidation will probably only make the error worse.
qwertymodo
Posts: 775
Joined: Mon Jul 02, 2012 7:46 am

Re: GSU revision comparison test

Post by qwertymodo »

If you have any other tests you want me to run, just let me know. Just know that it is a bit of a pain to reprogram the cart, since the GSU doesn't provide /CS or /WE signals, meaning I can't program the thing in-circuit from the cart edge and have to resort to desoldering the ROM, cleaning the rosin flux off the pins, reprogramming the chip in a socket, then resoldering it... I wish it was as easy as the Cx4 :(
ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: GSU revision comparison test

Post by ARM9 »

Wow that is really close, nice one.
qwertymodo wrote:and have to resort to desoldering the ROM, cleaning the rosin flux off the pins, reprogramming the chip in a socket, then resoldering it... I wish it was as easy as the Cx4 :(
Ouch, that's quite an arduous process. I'm making a test suite so you don't have to reprogram it for each test. So far I've got mult, cache, and plot timing tests, and you can toggle between 10/21MHz. I'm going to add rom/sram buffer tests. Any suggestions are welcome.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: GSU revision comparison test

Post by Sik »

Maybe solder in a socket so you don't have to desolder the memory? o.o
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: GSU revision comparison test

Post by tepples »

I suggest testing multiply accuracy, possibly as a way to figure out why fast multiply in fast mode was discouraged.