higan CPU emulation mode bug? (attn: byuu or any 65816 guru)

Discussion of hardware and software development for Super NES and Super Famicom.

Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> it wasn't until the 65816 that they had separate vectors. The break flag was used to differentiate the two, and that flag does exist in emulation mode.

At least on the SNES, the 1/B (M/X native mode) flags are always forced high in emulation mode.
Nicole
Posts: 218
Joined: Sun Mar 27, 2016 7:56 pm

Post by Nicole »

Must just be on the SNES then; the emulation mode's status register flags are shown on page 60 of Programming the 65816.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

byuu wrote:
> > it wasn't until the 65816 that they had separate vectors. The break flag was used to differentiate the two, and that flag does exist in emulation mode.
>
> At least on the SNES, the 1/B (M/X native mode) flags are always forced high in emulation mode.
On the 6502, they're also always forced high because the flip-flops storing M and X don't even exist. When P is read on a 6502, bit 5 (M) is always pushed as 1, and bit 4 (B, X) is pushed as 1 for PHP or BRK or 0 for /IRQ or /NMI. I imagine that emulation mode on a 65816 behaves the same way. Anyone up for a quick test of emulation mode PHP, BRK, /IRQ, and /NMI behavior on Super NES?
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

tepples wrote:
> byuu wrote:
> > > it wasn't until the 65816 that they had separate vectors. The break flag was used to differentiate the two, and that flag does exist in emulation mode.
> >
> > At least on the SNES, the 1/B (M/X native mode) flags are always forced high in emulation mode.
>
> On the 6502, they're also always forced high because the flip-flops storing M and X don't even exist. When P is read on a 6502, bit 5 (M) is always pushed as 1, and bit 4 (B, X) is pushed as 1 for PHP or BRK or 0 for /IRQ or /NMI. I imagine that emulation mode on a 65816 behaves the same way. Anyone up for a quick test of emulation mode PHP, BRK, /IRQ, and /NMI behavior on Super NES?
bsnes' emulation mode interrupt behaviour is completely correct for a 65816. There is no emulation BRK vector because a 6502/65C02 uses the same vector for both IRQ and BRK. byuu should know this; he's written a 6502 core. The 65816 emulates the 6502 "break flag" by pushing p (bits 4 and 5 are always high because it's in emulation mode) on a BRK/COP/PHP, and p & ~0x10 on a hardware interrupt.
> It feels like it should be $fff6, which is otherwise unused.
But that wouldn't emulate a 6502.
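A minimal C++ sketch of the behaviour described above. The names pushedStatus, emulationVector and the Event enum are hypothetical, not bsnes/higan identifiers. It shows that the pushed status byte differs only in bit 4, and that BRK and IRQ share one emulation-mode vector, so a handler has to inspect the stacked byte to tell them apart.

```cpp
#include <cstdint>

// Hypothetical event type for illustration only.
enum class Event { PHP, BRK, COP, IRQ, NMI };

// Status byte pushed while E=1. Bits 5 and 4 are forced high in emulation
// mode; hardware interrupts push bit 4 (the 6502 "break flag") clear.
uint8_t pushedStatus(uint8_t p, Event e) {
  uint8_t value = p | 0x30;
  if(e == Event::IRQ || e == Event::NMI) value &= ~0x10;
  return value;
}

// Emulation-mode vectors: BRK shares the IRQ vector, as on a 6502/65C02,
// which is why $FFF6 is left unused.
uint16_t emulationVector(Event e) {
  switch(e) {
  case Event::COP: return 0xfff4;
  case Event::NMI: return 0xfffa;
  case Event::BRK:
  case Event::IRQ: return 0xfffe;
  default:         return 0;  // PHP is not an interrupt
  }
}
```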
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> byuu should know this, he's written a 6502 core

Never underestimate what the human mind can forget 4-5 years after something was 'obvious' that you haven't bothered to recall at all since then. But thanks, that makes sense now. I simply wasn't bothering to set 'b' in op_interrupt because it was already 1, got it.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

How do you feel about this change? I folded the interrupt_pending() virtual functions into last_cycle(). So last_cycle() now returns a bool, and op_io_irq() now looks like this:

Code: Select all

alwaysinline void CPUcore::L_op_io_irq() {
  if(last_cycle()) {
    //modify I/O cycle to bus read cycle, do not increment PC
    op_read(regs.pc.d);
  } else {
    op_io();
  }
}
The L_ prefix indicates that this function incorporates last_cycle(), so opcodes that call it shouldn't prefix it with the L macro.

Unfortunately, performance with this change is a complete wash. I was hoping for a tiny speedup, but it looks like the impact of one fewer virtual function call is exactly cancelled out by the extra instruction or two it takes last_cycle() to return a value every time that value is thrown away.

Still, it's a bit less code, so I consider it an improvement. But maybe you think the L_ is ugly? I'd like to move some common instruction tails into helper functions, to avoid if(size) { L read(); } else { read(); L read(); } repetition all over the place when the byte and word opcodes are unified. Since these helper functions will necessarily call last_cycle() themselves, I figure their names should be decorated to indicate that.
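A rough C++ sketch of the kind of shared tail helper described above. CoreSketch, read_byte() and L_read_operand() are hypothetical names, and last_cycle() here only records when it ran instead of polling interrupts. The point is that the helper places last_cycle() before the final bus cycle for either operand width, so callers never write L themselves.

```cpp
#include <cstdint>

// Hypothetical stand-in for a CPU core; not higan's CPUcore.
struct CoreSketch {
  const uint8_t* memory;
  uint32_t addr = 0;
  unsigned cycle = 0;
  int lastCycleAt = -1;  // cycle count at the moment last_cycle() ran

  explicit CoreSketch(const uint8_t* m) : memory(m) {}

  // In the proposal this would poll interrupts and return whether one is
  // pending; the sketch only records the timing.
  bool last_cycle() { lastCycleAt = int(cycle); return false; }
  uint8_t read_byte() { ++cycle; return memory[addr++]; }

  // "L_" prefix: this helper calls last_cycle() itself.
  uint16_t L_read_operand(bool wide) {
    if(!wide) { last_cycle(); return read_byte(); }
    uint8_t lo = read_byte();
    last_cycle();
    uint8_t hi = read_byte();
    return uint16_t(lo | hi << 8);
  }
};
```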

I've been poring over the S-CPU code again (I mean sfc/cpu/*, not the 65816 core). Now that the balanced PPU is gone, the sfc/cpu/* stuff is the oldest surviving code in higan, and I hope you don't mind if I say that it shows its age. I think the DMA and IRQ handling can be simplified a ton if we assume that all $42xx writes are delayed until the beginning of the next CPU cycle. To implement this, mmio_write() would merely stash the address and data, and op_io(), op_read() and op_write() would call a function called something like mmio_edge() which would actually apply any write that was done on the previous cycle. I think this would eliminate the need for irq_lock and dma_pending and a bunch of other crap (irq_lock has the effect that writing to $4200 even causes IRQs from coprocessors to be delayed, which seems like complete nonsense to me).
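A minimal sketch of the deferred-write idea, assuming hypothetical names (SCPUSketch, PendingWrite) and handling only $4200. It needs C++17 for std::optional and illustrates the proposal, not code from higan.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of delayed $42xx writes.
struct SCPUSketch {
  struct PendingWrite { uint16_t addr; uint8_t data; };
  std::optional<PendingWrite> pending;
  uint8_t nmitimen = 0;  // $4200, only updated at a cycle edge
  unsigned cycles = 0;

  // The bus write itself only latches address and data.
  void mmio_write(uint16_t addr, uint8_t data) { pending = PendingWrite{addr, data}; }

  // Run at the start of every CPU cycle: commit the previous cycle's write.
  void mmio_edge() {
    if(!pending) return;
    if(pending->addr == 0x4200) nmitimen = pending->data;
    pending.reset();
  }

  // op_read() and op_write() would call mmio_edge() the same way.
  void op_io() { mmio_edge(); ++cycles; }
};
```

Since every write becomes visible at a single, well-defined point, IRQ and DMA logic can be evaluated against committed register state rather than via flags like irq_lock.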

But of course, any such sweeping change to the S-CPU implementation would require massive regression testing, given all the work that you put in 11 years ago to nail down all those IRQ/NMI/DMA edge cases in the first place. Is there any chance that the hardware test programs linked in this 2005 ZSNES board thread still exist somewhere?
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> How do you feel about this change?

Half like it, half dislike it. The L_bla definitely breaks convention.

One idea ... if we're willing to reorder the instructions, we could place all of the <opcode fetch, I/O> instructions together, and redefine L to do this check for us. But that's kind of evil too.

> when the byte and word opcodes are unified

As far as I see it, the key to pulling this off is going to be finding a way to eliminate the need for L (lastCycle) declarations from the code. If we can do that, then we can instead use M/X #defines, expanding to if(!regs.p.m) and if(!regs.p.x), to guard the 16-bit codepaths. It even works easily with multi-line statements:

Code: Select all

#define M if(!regs.p.m)  //run the next statement or block only with a 16-bit accumulator

void ldaConst() {
  regs.a.l = readPC();
M regs.a.h = readPC();
  bla();
M {
  multiLine();
  bla();
}
  moreBla();
}
We could make a big series of extra macros:
M => if(!regs.p.m)
ML => if(!regs.p.m && (lastCycle(), true)) //I know, I know ...

There's just no damn way to abbreviate to one-letter and have these make sense.
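To make the macro idea concrete, here is a self-contained C++ sketch with stand-in state (Reg16, Status, Regs, program and lastCycleAt are all hypothetical). It also shows the catch behind ML: in 8-bit mode the final cycle is the low-byte read, so that path still needs its own lastCycle() call.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the core's state; not higan's actual types.
struct Reg16 { uint8_t l = 0, h = 0; };
struct Status { bool m = true; };  // m set: 8-bit accumulator
struct Regs { uint16_t pc = 0; Reg16 a; Status p; };

Regs regs;
const uint8_t* program = nullptr;
int lastCycleAt = -1;  // PC value when lastCycle() was called
void lastCycle() { lastCycleAt = regs.pc; }
uint8_t readPC() { return program[regs.pc++]; }

// The macros proposed above; ML relies on && short-circuiting, so
// lastCycle() only runs when the 16-bit path is taken.
#define M  if(!regs.p.m)
#define ML if(!regs.p.m && (lastCycle(), true))

// One LDA #const handler for both accumulator widths.
void ldaConst() {
  if(regs.p.m) lastCycle();  // 8-bit: the low byte read is the final cycle
  regs.a.l = readPC();
ML regs.a.h = readPC();      // 16-bit: lastCycle(), then the high byte
}
```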

I'm pretty sure this unification will end up hurting performance, though. I realize we're testing M/X in the switch table cases anyway, but there are cases where we have to test more than once.

> the sfc/cpu/* stuff is the oldest surviving code in higan, and I hope you don't mind if I say that it shows its age

If sfc/cpu is the oldest, worst code in higan, then I think I'm finally starting to get the codebase into a good state.

The sfc/alt/ppu-balanced code was an abortion. sfc/cpu is loaded with comments compared to most of my other code. I know it's very complex, because as you noted ... it took a very long time to pass all the tests I came up with at the same time.

> Is there any chance that the hardware test programs linked in this 2005 ZSNES board thread still exist somewhere?

The only ones I know of are here:

http://snesemu.black-ship.net/emus/bsne ... _tests.zip

(the demo ones are much older; before I knew how to initialize the SNES properly; so you'll have to power cycle a few times)
(the test ones are newer; and way more thorough)
(a blue screen indicates a pass; a red screen indicates a failure. Yeah, awful tests, I know.)

Unfortunately, my file archival system looks something like this: https://xkcd.com/1360/

There's one chance that the old files may exist on an old secondary Debian Linux hard drive that's been sitting around in a box collecting dust. But probably not.

On my list of 20 billion things to do that I'll probably never get around to, I want to write new test ROMs that are based around the 21fx protocol, and then use them for automated regression testing. Which is a system I should have been working on since day one, but hindsight is 20/20, right?

...

By the way, if you want a real kick, read through this code some time:

http://snesemu.black-ship.net/emus/bsne ... 02_ir9.rar

See how much worse things could have been for you ;)
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

I don't think either byuu or Revenant is going to like this...

https://github.com/awjackson/bsnes-clas ... f4f9f0ea2d

Doom ingame goes from 72 to 79 FPS. Yoshi's Island title screen goes from 68 to 72. Winter Gold's main menu (with the dancing polygon person in the background) goes from 73 to 80. Needless to say the performance increase is even bigger in the non-accuracy profiles.

What that means is that even in the presence of the giant bottleneck that is the accuracy PPU, the overhead of those reg16_t callbacks accounts for up to 10% of bsnes' total CPU load. (I have no idea why YI benefits so much less than the other two SFX2 games.)
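The arithmetic behind that estimate, assuming uncapped, CPU-bound emulation so that host time per frame is proportional to 1/FPS:

```latex
\text{share of frame time removed} = 1 - \frac{\mathrm{FPS}_{\text{before}}}{\mathrm{FPS}_{\text{after}}} \\
\text{Doom: } 1 - \tfrac{72}{79} \approx 8.9\%, \qquad
\text{Winter Gold: } 1 - \tfrac{73}{80} \approx 8.8\%, \qquad
\text{Yoshi's Island: } 1 - \tfrac{68}{72} \approx 5.6\%
```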
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> I don't think either byuu or Revenant is going to like this...

You're right about that. I might toy around to see if I can get most of this speedup without such drastic changes. Maybe just keeping a bool modified; for all registers would be faster than a function<> callback.

In lighter news, I finished de-templating all of my processor cores. The higan.exe file on Windows has dropped from 8046KiB to 5635KiB (3830KiB after stripping symbols). Most of the size now is due to nall and hiro both being quite large. That and this being five emulators in one.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

byuu wrote:
> > I don't think either byuu or Revenant is going to like this...
>
> You're right about that. I might toy around to see if I can get most of this speedup without such drastic changes. Maybe just keeping a bool modified; for all registers would be faster than a function<> callback.
Drastic changes? It's just adding a return to each opcode handler, and deleting the no-longer-needed reg16 class and callbacks.

I whipped up a version where reg16 has a bool modified member instead of a callback, and the flags for r14 and r15 are checked (and reset) after opcode dispatch. It yielded about half the speed gain of the version that's up on github. Doom and Winter Gold are ~77 FPS, YI is ~69. Coincidentally (or maybe not), the compiled code size is also intermediate:

152400 master
145720 what I whipped up just now
139936 superfx

(the object sizes are larger than higan's because the CPU core and the plotting/MMIO stuff are still all together in bsnes-classic)
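A sketch of the bool modified variant, assuming a hypothetical Reg16 wrapper and counters standing in for the real r14 (ROM buffer) and r15 side effects. Assignment only sets a flag; the dispatcher acts on it once per opcode.

```cpp
#include <cstdint>

// Hypothetical register wrapper: writes just mark the register as modified.
struct Reg16 {
  uint16_t data = 0;
  bool modified = false;
  operator uint16_t() const { return data; }
  Reg16& operator=(uint16_t value) { data = value; modified = true; return *this; }
};

struct GsuRegs {
  Reg16 r[16];
  int romBufferReloads = 0;  // stand-in for the r14 side effect
  int r15Updates = 0;        // stand-in for the r15 side effect

  // Called once after each opcode handler returns.
  void afterDispatch() {
    if(r[14].modified) { r[14].modified = false; ++romBufferReloads; }
    if(r[15].modified) { r[15].modified = false; ++r15Updates; }
  }
};
```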

On a semi-related note, I think I know why that multiply timing test ROM is a few cycles off in bsnes, and why the sound gradually desyncs in the Yoshi's Island intro. The way bsnes fills instruction cache lines doesn't match the SuperFX documentation. You've got it reading an entire cache line at once, but the real chip apparently reads one byte at a time, and sets the "valid" bit for the current line when it reads $xxxF from RAM/ROM into the cache. If you jump/branch/change r15 while the current cache line is partially filled, the chip stops and finishes filling the cache line before it takes the branch. Likewise, if you branch to a cacheable but uncached address that isn't $xxx0, the chip stops and fills the bytes of the cache line up to the branch target address, then starts fetching and executing one byte at a time until the cache line is full.

I've got a fairly good idea how to implement this, but I need to think about it a bit more.
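A simplified model of the fill rules described above, in C++. Assumptions: 16-byte lines, 32 lines (512 bytes), addresses already relative to the cache base, and no timing. GsuCacheModel and its members are hypothetical, not bsnes code.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct GsuCacheModel {
  std::vector<uint8_t> rom;          // backing ROM/RAM, cache-relative addresses
  std::array<uint8_t, 512> cache{};
  std::array<bool, 32> valid{};      // one valid bit per 16-byte line
  int nextFill = -1;                 // next byte of a partially filled line, or -1
  unsigned busReads = 0;

  // One byte moves from ROM/RAM into the cache; reading $xxxF completes the line.
  void load(unsigned addr) {
    cache[addr & 0x1ff] = rom[addr];
    ++busReads;
    if((addr & 0xf) == 0xf) {
      valid[(addr >> 4) & 0x1f] = true;
      nextFill = -1;
    } else {
      nextFill = int(addr + 1);
    }
  }

  // Sequential fetch: use the cache if the line is valid, else fetch one byte.
  uint8_t fetch(unsigned pc) {
    if(!valid[(pc >> 4) & 0x1f]) load(pc);
    return cache[pc & 0x1ff];
  }

  // Taking a jump/branch (or an r15 change) to target.
  void branch(unsigned target) {
    // 1. Finish the line that was only partially filled.
    while(nextFill >= 0) load(unsigned(nextFill));
    // 2. Uncached target not at $xxx0: fill the line up to the target first.
    if(!valid[(target >> 4) & 0x1f]) {
      for(unsigned a = target & ~0xfu; a < target; ++a) load(a);
    }
  }
};
```

After step 2, execution continues one byte at a time through fetch(), and the line becomes valid once $xxxF is loaded.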
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> Drastic changes?

That's what I said, yes. I have to press page-down 23 times to scroll through your commit.

> It yielded about half the speed gain of the version that's up on github

I can confirm that here; a 3% speed increase in the best possible case, and a wash on Yoshi's Island.

It's somewhat hard to believe that making all opcodes return values isn't as harmful as one extra boolean assignment on register writes (the reg16_t class should be a wash ... there's not even masking and it's all in a header file so entirely inlined); but I'll take your word for it. In any case, 'bool modified' is as far as I'm willing to go on this.

It did bring up an interesting point ... do we really expect SNES CPU writes to $3000-301f to reload the ROM buffer and/or potentially affect r15 increment behavior? Because before it was doing both. Affecting r15 increment definitely seems wrong (and you're fucking crazy to try writing that while the GSU is running); r14 ... I really don't know. Games run with or without it, but I'll leave it in since it was there before.

> The way bsnes fills instruction cache lines doesn't match the SuperFX documentation.

The documentation is a heaping pile of garbage. It's damn near incomprehensible.

Try and figure out what happens when the secondary pixel cache is filled and instructions are executing out of RAM, and see if you can maintain your sanity ;)

> I've got a fairly good idea how to implement this, but I need to think about it a bit more.

Cool, I hope you're successful with that. Would be really great to improve GSU timing.

Here's hoping the change isn't as large as removing reg16_t.

Note that we're probably never going to get perfect matches, especially on the SFX revisions that have their own separate on-cart oscillators.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

byuu wrote:
> Note that we're probably never going to get perfect matches, especially on the SFX revisions that have their own separate on-cart oscillators.
Does that mean Super Game Boy 2 will never have perfect timing either? It also uses a separate crystal at 20 MiHz (20.97 MHz) so that the division by 5 produces a speed closer to that of the handheld, which is important for the Game Link port that's the SGB2's big selling point.
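The arithmetic behind the SGB2 crystal choice, for reference: the handheld's CPU runs at 4 MiHz, and dividing the SGB2's 20 MiHz crystal by 5 hits that exactly. For comparison, the original Super Game Boy derives its clock from the NTSC SNES master clock (about 21.477 MHz), which lands roughly 2.4% fast:

```latex
\text{SGB2: } \frac{20 \times 2^{20}\,\text{Hz}}{5} = 4 \times 2^{20}\,\text{Hz} = 4\,194\,304\,\text{Hz} \\
\text{SGB (NTSC): } \frac{21\,477\,272\,\text{Hz}}{5} \approx 4\,295\,454\,\text{Hz} \approx 1.024 \times 4\,194\,304\,\text{Hz}
```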
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

byuu wrote:
> > Drastic changes?
>
> That's what I said, yes. I have to press page-down 23 times to scroll through your commit.
Now you know how I feel when you do things like change every function declaration in higan to auto foo() -> type syntax, or change every name to camelCase :mrgreen:

Okay, I'll summarize the changes.

core.hpp:
s/void/uint8
opcode_table.cpp:
Added a macro for opcodes that do totally different things based on sfr.b (to/move and from/moves). Otherwise just name changes: I couldn't very well use opb() both for branches and for instructions that depend on sfr.b.
opcodes.cpp:
The only interesting part. Every opcode returns the index of the register it changed. Usually this is dr (i.e. regs.dreg), n (i.e. the low 4 bits of the opcode), 15 (branch/jump/loop), or 0 (meaning no change). Since regs.reset() resets regs.dreg to 0, I made it return the previous value of dreg, so that return regs.reset(); can be a (very welcome) shorthand for uint8 ret = regs.dreg; regs.reset(); return ret;
Only a few opcodes modify two registers, and none of them can modify both r14 and r15 at once. lmult modifies r4 and dreg; since r4 has no side effects, only dreg is of interest. loop always modifies r12, and modifies r15 if the loop was taken. Of those two, we only care about r15. I inverted the sfr.z test in loop because it was tidier that way, and separated to/move and from/moves for the same reason.
registers.hpp:
reset() now returns the prior value of dreg, as mentioned.
superfx.cpp:
Added a check for r14 in addition to the one for r15 after opcode dispatch.
Other than those it's all deletions.
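A reduced sketch of the return-index scheme, assuming hypothetical names (GsuSketch, op_add, op_loop, step) and simplified opcode semantics: op_add here just adds rN into dreg rather than modeling sreg. Returning 0 for "nothing interesting" works because r0 has no side effects.

```cpp
#include <array>
#include <cstdint>

struct GsuSketch {
  std::array<uint16_t, 16> r{};
  uint8_t dreg = 0;
  int r14Writes = 0, r15Writes = 0;  // stand-ins for the real side effects

  // Like the modified regs.reset(): clear the prefix state and return the
  // old dreg, so a handler can simply "return reset();".
  uint8_t reset() { uint8_t prev = dreg; dreg = 0; return prev; }

  // ADD-like (simplified): dreg += rN; reports dreg as modified.
  uint8_t op_add(uint8_t n) { r[dreg] += r[n]; return reset(); }

  // LOOP-like: decrement r12; if nonzero, jump to r13. Only r15 matters.
  uint8_t op_loop() {
    reset();
    if(--r[12] == 0) return 0;  // not taken: r12 has no side effects
    r[15] = r[13];
    return 15;
  }

  // Post-dispatch check on the returned index.
  void step(uint8_t modified) {
    if(modified == 14) ++r14Writes;  // e.g. reload the ROM buffer
    if(modified == 15) ++r15Writes;  // e.g. r15 fetch handling
  }
};
```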
> It's somewhat hard to believe that making all opcodes return values isn't as harmful as one extra boolean assignment on register writes (the reg16_t class should be a wash ... there's not even masking and it's all in a header file so entirely inlined); but I'll take your word for it. In any case, 'bool modified' is as far as I'm willing to go on this.
Returning an integer is free if the compiler can arrange for it to already be in rax. In nearly all the opcodes, the value returned is one that's just been used as an index into r[].
> It did bring up an interesting point ... do we really expect SNES CPU writes to $3000-301f to reload the ROM buffer and/or potentially affect r15 increment behavior? Because before it was doing both. Affecting r15 increment definitely seems wrong (and you're fucking crazy to try writing that while the GSU is running); r14 ... I really don't know. Games run with or without it, but I'll leave it in since it was there before.
$3000-301F supposedly aren't accessible at all while the GSU is running. Do you have any reason to believe the documentation is wrong on that point?

r15_modified was irrelevant when r15 was changed from the S-CPU side, because peekpipe() and pipe() both reset it to false. The only time the test for r15_modified could actually see true was when r15 was modified by an instruction. See, this is the other problem with any solution that depends on overloading operator=()--it obfuscates the code because r15 is "modified" continuously but only one of the places where it can be modified actually matters. Your "elegant" solution isn't so elegant if you have to add code to undo what it does 2/3 of the time.

In any case, setting r15 from the CPU definitely makes the GSU execute starting at that address, not the address after it.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

Current object size analysis for higan: http://i.imgur.com/hj4DILw.png

Mostly what I'd expect to see. But I'm not sure why sfc-cartridge.o is so bloated ...

fc-cartridge.o has all the board mappers, at least ...

> Does that mean Super Game Boy 2 will never have perfect timing either?

Correct. At least until we get bsnes-physics, the particle-accurate emulator.

> Now you know how I feel when you do things like change every function declaration in higan to auto foo() -> type syntax, or change every name to camelCase

Believe me, this shit is driving me insane as well.

Diffs are a complete disaster, and for the past few months I've been getting intense joint pain that's way worse than anything I've had before. There are many times I just won't type at all because it hurts so much.

But on the bright side, it's going hand-in-hand with massive code cleanups to code that's sat idle for, in some instances, over a decade.

> $3000-301F supposedly aren't accessible at all while the GSU is running. Do you have any reason to believe the documentation is wrong on that point?

That's what they said about the SNES PPU registers, OAM and CGRAM :P

My reason to believe it is because the SNES is the devil :P
Revenant
Posts: 455
Joined: Sat Apr 25, 2015 1:47 pm
Location: FL

Post by Revenant »

AWJ's last superfx commit is no problem on my end, since my working copy already (finally) replaced RegisterEdit's reference abuse with a proper interface in ChipDebugger, so (assuming that the superfx branch or something like it gets merged into master) it's just a matter of making sure the SuperFX implementation has consistent behavior for R14/R15, which is pretty trivial.