AWJ wrote: Why does this ROM depend on cache behaviour when the previous one didn't?

I think it's just that the granularity of the counter on the previous one was too coarse. I could make the S-CPU loop shorter (at the cost of more complexity).
GSU revision comparison test
Re: GSU revision comparison test
No, that's okay, if we can fix incorrect cache behaviour in bsnes as well that's two birds with one stone 
Re: GSU revision comparison test
I was planning to make a more thorough cache test to check edge cases such as how cache lines are loaded, fetch stalls, and the various invalidation methods (for example, what happens if you write to < xxxFh in a cache line from the S-CPU).
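To make the xxxFh question concrete, here is a minimal sketch of the cache model a test like that would be probing. It assumes the usual description of the GSU code cache (512 bytes, 32 lines of 16 bytes) and assumes that only an S-CPU write to the last byte of a line marks it valid; `GsuCacheModel`, `scpu_write` and `flush` are made-up names for illustration, not bsnes code. Whether hardware really treats a partially written line as invalid is exactly what the test would have to confirm.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the GSU code cache: 512 bytes, 32 lines of 16 bytes.
struct GsuCacheModel {
  std::array<std::uint8_t, 512> buffer{};
  std::array<bool, 32> valid{};

  // S-CPU write into the cache window.
  // Assumption: only a write to the last byte of a line (offset xxxFh)
  // marks that line valid; writes below xxxFh leave the flag untouched.
  void scpu_write(std::uint16_t addr, std::uint8_t data) {
    addr &= 511;
    buffer[addr] = data;
    if ((addr & 15) == 15) valid[addr >> 4] = true;
  }

  // Invalidate every line (one of the invalidation paths a test could cover).
  void flush() { valid.fill(false); }
};
```

Under this model, writing bytes 0h to Eh of a line and stopping there would leave the line invalid, so the GSU would refetch it from ROM or RAM.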
Re: GSU revision comparison test
byuu wrote: This seems like a really convoluted way to improve GSU timing ...
AWJ looks at instructions, ARM9 writes tests, qwertymodo runs tests and reports numbers, and I apply submitted patches.
... but hey, if it improves the emulation, then I'm all for it.

You have seen nothing yet, wait until people start bringing in oscilloscopes =P
Re: GSU revision comparison test
qwertymodo, we're all waiting for you to run that new test ROM 
-
qwertymodo
- Posts: 775
- Joined: Mon Jul 02, 2012 7:46 am
Re: GSU revision comparison test
mult.sfc
GSU-1 (didn't test it on the other GSU revisions since the last one was identical on all of them)

higan v0.94

Re: GSU revision comparison test
As I suspected, fmult and lmult were reversing the sense of ms0 and were also off by one. The manual says that they take "4 or 8 cycles", but one of those cycles is the one that every one-byte instruction takes.
With this patch, bsnes-classic comes very close to hardware, just 2-5 cycles off:

Code:
diff --git a/bsnes/snes/chip/superfx/core/opcodes.cpp b/bsnes/snes/chip/superfx/core/opcodes.cpp
index 3b14d81..35da5ef 100644
--- a/bsnes/snes/chip/superfx/core/opcodes.cpp
+++ b/bsnes/snes/chip/superfx/core/opcodes.cpp
@@ -476,7 +476,7 @@ void SuperFX::op_fmult() {
   regs.sfr.cy = (result & 0x8000);
   regs.sfr.z = (regs.dr() == 0);
   regs.reset();
-  add_clocks(4 + (regs.cfgr.ms0 << 2));
+  add_clocks((regs.cfgr.ms0 ? 3 : 7) * cache_access_speed);
 }
 
 //$9f(alt1): lmult
@@ -488,7 +488,7 @@ void SuperFX::op_lmult() {
   regs.sfr.cy = (result & 0x8000);
   regs.sfr.z = (regs.dr() == 0);
   regs.reset();
-  add_clocks(4 + (regs.cfgr.ms0 << 2));
+  add_clocks((regs.cfgr.ms0 ? 3 : 7) * cache_access_speed);
 }
 
 //$a0-af(alt0): ibt rN,#pp
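To spell out what the patch changes, the two formulas can be put side by side as a standalone sketch. The function names are made up for illustration, and the values of `cache_access_speed` (1 in 21MHz mode, 2 in 10MHz mode) are an assumption here rather than something the test ROM has confirmed.

```cpp
// Extra clocks charged by the old code: 4 when ms0 = 0, 8 when ms0 = 1.
constexpr unsigned old_mult_clocks(bool ms0) {
  return 4u + (static_cast<unsigned>(ms0) << 2);
}

// Extra clocks charged by the patched code, on top of the one cycle that
// every one-byte instruction already takes for its opcode fetch.
// Assumption: cache_access_speed is 1 in 21MHz mode and 2 in 10MHz mode.
constexpr unsigned new_mult_clocks(bool ms0, unsigned cache_access_speed) {
  return (ms0 ? 3u : 7u) * cache_access_speed;
}
```

At 21MHz the old code charged 4 extra clocks with ms0 clear and 8 with it set, while the patched code charges 7 and 3 respectively, so both the sense of ms0 and the off-by-one are corrected in one line.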
Re: GSU revision comparison test
Wow, yeah. Even ignoring my boolean flag inversion (did that with the GBA sequential access speeds too >_>), 4 or 8 cycles isn't right at all. This can be up to 14 cycles. Did not expect CLSR to factor in here, too.
Retested Winter Gold and SMW2, doesn't seem to cause any regressions. Which is good, the former was always a nightmare.
Looks like we're just a tiny bit too slow in every case. But since it's such a small difference, it's a one-time error rather than cumulative for each loop of this test. Probably some kind of delay in starting the GSU ("go" or whatever)?
Anyway, thanks everyone for all the help on this! v095's shaping up to be a great release :D
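The one-time versus cumulative distinction can be checked mechanically if the same test is run with two different loop counts. The sketch below is only a model of that reasoning: the function names are hypothetical and no real measurements are implied.

```cpp
// Hypothetical error model: a fixed startup offset plus a per-loop component.
constexpr long error_after(long startup_error, long loop_error, long loops) {
  return startup_error + loop_error * loops;
}

// Given the total error observed at two different loop counts, estimate the
// per-loop component. A result of 0 means the error is a one-time offset.
constexpr long estimate_loop_error(long err_a, long loops_a,
                                   long err_b, long loops_b) {
  return (err_b - err_a) / (loops_b - loops_a);
}
```

If the error stays in the same few-cycle range no matter how many iterations the test runs, the per-loop term is zero and the remaining offset points at something like GSU startup.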
Re: GSU revision comparison test
byuu wrote: Wow, yeah. Even ignoring my boolean flag inversion (did that with the GBA sequential access speeds too >_>), 4 or 8 cycles isn't right at all. This can be up to 14 cycles. Did not expect CLSR to factor in here, too.
Anyway, thanks everyone for all the help on this! v095's shaping up to be a great release

Using CLSR there is a complete guess. The timing test ROM only tests 21MHz mode, so it produces the same results whether I multiply by cache_access_speed or not. Perhaps ARM9 can modify the test ROMs to test 10MHz mode as well (in fact, it should be possible just by hex-editing a single byte in the ROM...)
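A small sketch of why the current ROM cannot settle this: the two hypotheses give identical counts at 21MHz and only diverge at 10MHz. The function names are invented for the comparison, and the mapping of CLSR to a factor of 1 or 2 is the assumption under test.

```cpp
// Assumption: CLSR = 1 selects 21MHz (factor 1), CLSR = 0 selects 10MHz (factor 2).
constexpr unsigned cache_access_speed(bool clsr) { return clsr ? 1u : 2u; }

// Hypothesis A: the extra multiply clocks scale with CLSR (what the patch does).
constexpr unsigned scaled_clocks(bool ms0, bool clsr) {
  return (ms0 ? 3u : 7u) * cache_access_speed(clsr);
}

// Hypothesis B: the extra multiply clocks are a fixed count regardless of CLSR.
constexpr unsigned fixed_clocks(bool ms0, bool /*clsr*/) {
  return ms0 ? 3u : 7u;
}
```

With CLSR set, both functions return 3 or 7; with CLSR clear, hypothesis A gives 6 or 14 while hypothesis B still gives 3 or 7, so a single 10MHz run on hardware would tell them apart.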
Re: GSU revision comparison test
Yeah, and maybe also test for that cache invalidation on stop thing, if that's even practical.
For now though, I think it's probably wise to guess that we multiply off CLSR. The whole point of 21MHz mode is supposed to be that it's "twice as fast"; at least for non-memory accesses.
Re: GSU revision comparison test
byuu wrote: Yeah, and maybe also test for that cache invalidation on stop thing, if that's even practical.
For now though, I think it's probably wise to guess that we multiply off CLSR. The whole point of 21MHz mode is supposed to be that it's "twice as fast"; at least for non-memory accesses.

bsnes is already running the test slightly slower than hardware rather than slightly faster, so adding more cache invalidation will probably only make the error worse.
-
qwertymodo
- Posts: 775
- Joined: Mon Jul 02, 2012 7:46 am
Re: GSU revision comparison test
If you have any other tests you want me to run, just let me know. Just know that it is a bit of a pain to reprogram the cart, since the GSU doesn't provide /CS or /WE signals, meaning I can't program the thing in-circuit from the cart edge and have to resort to desoldering the ROM, cleaning the rosin flux off the pins, reprogramming the chip in a socket, then resoldering it... I wish it was as easy as the Cx4 
Re: GSU revision comparison test
Wow that is really close, nice one.
qwertymodo wrote: and have to resort to desoldering the ROM, cleaning the rosin flux off the pins, reprogramming the chip in a socket, then resoldering it... I wish it was as easy as the Cx4

Ouch, that's quite an arduous process. I'm making a test suite so you don't have to reprogram it for each test. So far it has mult, cache and plot timing tests, and you can toggle between 10/21MHz; I'm going to add ROM/SRAM buffer tests. Any suggestions are welcome.
Re: GSU revision comparison test
Maybe solder in a socket so you don't have to desolder the memory? o.o
Re: GSU revision comparison test
I suggest multiply accuracy, possibly as a way to figure out why fast multiply in fast mode was discouraged.
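For an accuracy test, the ROM's results would need a reference to compare against. Below is a rough sketch of one, assuming the usual emulator model: a signed 16x16 multiply giving a 32-bit product, with the carry flag taken from bit 15 of the product as in the `regs.sfr.cy` line of the patch context. `MultResult` and `reference_mult` are hypothetical names, and which half each opcode stores where is left out.

```cpp
#include <cstdint>

// Hypothetical reference result for a signed 16x16 -> 32-bit multiply.
struct MultResult {
  std::uint16_t high;  // upper 16 bits of the product
  std::uint16_t low;   // lower 16 bits of the product
  bool carry;          // bit 15 of the product (result & 0x8000)
};

constexpr MultResult reference_mult(std::uint16_t a, std::uint16_t b) {
  // Reinterpret both operands as signed 16-bit values before multiplying.
  const std::int32_t product =
      static_cast<std::int16_t>(a) * static_cast<std::int16_t>(b);
  const std::uint32_t bits = static_cast<std::uint32_t>(product);
  return {static_cast<std::uint16_t>(bits >> 16),
          static_cast<std::uint16_t>(bits & 0xFFFFu),
          (bits & 0x8000u) != 0};
}
```

Running the same operand pairs with ms0 set and clear in both clock modes, then diffing against this reference, would show whether the discouraged combination actually produces wrong products.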