Writing $4203 twice too fast gives erroneous result (not emulated)


Post by rainwarrior »

If you write to $4203, starting a multiply, writing $4203 again before the multiply is finished will corrupt the result.

Demo ROM which demonstrates this: dizworld.sfc before vs. dizworld.sfc after

Code: Select all

.a16
.i16
sta a:$4202 ; WRMPYA = z (16-bit store, so the high byte is a spurious write to $4203)
nop ; needed because writing $4203 again too soon causes an erroneous result
ldx z:pv_scale+0
stx a:$4203 ; WRMPYB = scale a (16-bit store, so the high byte is a spurious write to $4204)
The reason I ran into this is that I was writing $4202 as 16-bit to avoid changing register sizes. I thought that this was safe because I was rewriting $4203 after. Originally my code had an extra step in between, giving the needed 8 cycle delay, but at some point during optimization it was removed and my code broke on hardware, but was fine on all emulators (including bsnes accurate). Adding that nop cleared up the issue.
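
To spell out why the 16-bit store matters: a 16-bit store on the 65816 writes the low byte to the target address and the high byte to the next address, so the snippet above touches $4203 twice. Here is a rough Python sketch of the resulting register writes. The function name `store16` and the operand values are made up purely for illustration; nothing here models timing.

```python
def store16(addr, value):
    """Return the two byte writes of a 16-bit 65816 store (low byte first)."""
    value &= 0xFFFF
    return [(addr, value & 0xFF), (addr + 1, value >> 8)]

# Hypothetical operand values, chosen only for illustration.
z, scale = 0x0123, 0x0045

# sta a:$4202 followed by stx a:$4203, both with 16-bit registers.
writes = store16(0x4202, z) + store16(0x4203, scale)
for addr, byte in writes:
    print(f"${addr:04X} <- ${byte:02X}")

# WRMPYB ($4203) is written twice: once with the high byte of z,
# then again with the low byte of scale.
wrmpyb_writes = [byte for addr, byte in writes if addr == 0x4203]
print(len(wrmpyb_writes))
```

The first of those two $4203 writes starts a multiply with the high byte of z, and the second is the one that can arrive while that multiply is still running.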

There is also an old comment from Near explaining that trying to multiply with a divide in flight causes a bad result that isn't emulated by anything. I guess the same goes for a multiply during a multiply.

Also, thanks to Undisbeliever for helping me figure this out.

Edit: added hardware capture screenshots of what it looks like on boot, also fixed ROM links.

Before the nop was added:
dizworld_before.jpg
After the nop was added:
dizworld_after.jpg
Probably we should write a test ROM to investigate this and spit out the results, but for now this ROM is a quick litmus test that indicates something not emulated.

In the screenshot taken, it looks as if the mode 7 A coefficient result is getting all 0s instead of whatever the intended scale was. If you move around there's a lot of variety to the results, hard to describe. It might be something like $4203 only starts if ready and otherwise we're just changing the operand halfway through the computation.
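
To make that guess concrete, here is a toy Python sketch of a bit-serial 8x8 multiply in which the B operand is replaced partway through without restarting. This is only one possible reading of the guess, not a confirmed description of the hardware: the function name, the step numbering, and the assumption that B is the operand scanned bit by bit are all hypothetical.

```python
def multiply_with_swap(a, b1, b2=None, swap_step=None):
    """Shift-and-add multiply; optionally swap in b2 just before step `swap_step`."""
    a &= 0xFF
    b = b1 & 0xFF
    product = 0
    for step in range(8):
        if swap_step is not None and step == swap_step:
            b = b2 & 0xFF  # hypothesised: operand changes, multiply keeps going
        if (b >> step) & 1:
            product += a << step
    return product & 0xFFFF

print(hex(multiply_with_swap(0x12, 0x34)))           # ordinary multiply
print(hex(multiply_with_swap(0x12, 0x34, 0xFF, 4)))  # B replaced halfway
```

In this sketch the second call behaves like a multiply by a mix of the two operands (low bits from the first B, high bits from the second). Whether hardware does anything like this is exactly what a test ROM would need to check.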

Post by UnDisbeliever »

I can write a test ROM.

It will take me a week to code it. I'll need to code a text-buffer to display the results.

Post by lidnariq »

What do you want out of textbuffer code? I've already written a very simple one ( viewtopic.php?p=190848#p190848 ) that displays 64x28 ASCII using mode 5, but I didn't provide myself many convenience functions - just putHexAt(location,byte)

Post by Myself086 »

I'll write one real quick, I should be done within the next 24 hours.

Post by UnDisbeliever »

Thank you for offering to help. I've written text-buffers before; I just don't have one written for my snes-test-roms environment.

With the amount of free time I have, it should easily take a day or two to code/bugfix and a day to document/proofread.

While I'm coding a text-buffer, I may as well write a split text-buffer (something on my todo list), with one u8[1024] buffer for the tileId (character) and a second u8[1024] buffer for the tile attributes (palette and priority). That halves the amount of data transferred to VRAM during VBlank when the text changes and the palette is unchanged.


EDIT:
Myself086 wrote: Fri Aug 19, 2022 8:30 pm I'll write one real quick, I should be done within the next 24 hours.
Thank you. It's always great to have more people looking into the SNES internals and creating test ROMs.

Post by Myself086 »

Done!

There are 3 operands and the ability to change between Mul and Div.

What the code does with Mul:
- Operand 0 is written to $4202 in 16-bit mode
- Wait some cycles, mostly too few for the operation to finish
- Operand 1 is written to $4203 in 8-bit mode
- Wait proper amount before copying results

What the code does with Div:
- Operand 0 is written to $4204 in 16-bit mode
- Operand 2 is written to $4206 in 8-bit mode
- Wait some cycles, mostly too few for the operation to finish
- Operand 1 is written to $4206 in 8-bit mode
- Wait proper amount before copying results

I have a SNES but no way to test it myself, let me know how it goes.
Attachments
Mul Div Test.sfc
(128 KiB)

Post by Myself086 »

I fixed the timing by waiting for h-blank per test so we don't get interrupted by memory refresh.

I haven't found anyone to test this on real hardware yet, if anyone is interested.

Mesen-S seems to emulate this behavior but 1 cycle off. If you're just 1 cycle faster, Mesen-S should show the correct behaviors.
Attachments
Mul Div Test.sfc
(128 KiB)

Post by regiscaelus »

Myself086 wrote: Tue Aug 23, 2022 7:46 pm I fixed the timing by waiting for h-blank per test so we don't get interrupted by memory refresh.

I haven't found anyone to test this on real hardware if anyone is interested.

Mesen-S seems to emulate this behavior but 1 cycle off. If you're just 1 cycle faster, Mesen-S should show the correct behaviors.
Photos of your test program running on an SFC. I compared with Mesen, bsnes-plus and Mednafen, and they all return different values.
Attachments
IMG_20220824_094341.jpg
IMG_20220824_093507.jpg

Post by rainwarrior »

I can verify that with the same inputs on mine, I get what's in that screenshot, as you might expect...

But as far as testing this on hardware, what are you looking for? We can set whatever values we want, but there's endless combinations.

I guess maybe someone could sit down with this and try to come up with a theory for the internal operation, until they can figure out a model that fits the data? Otherwise, I'm not sure what data to report that would be useful.

Post by rainwarrior »

Mul Div Test_000.png
So this screenshot is from Mesen... if I set this up on my SNES, the last line is different. RDDIVx says 8077 on cycle 8, and RDMPYx says 0 on cycle 8.

In fact, every value I try for Mul gives 0 on cycle 8 of RDMPYx. I can't find any input values that cause it to be anything else. Is there a bug with this?

Mul Div Test_001.png
Mul Div Test_002.png
These look like hardware until the cycle 8 line as well. For Mul, RDDIVx ends with ff22 (not ff) and RDMPYx ends with 0 (not 21de). For Div on cycle 16, RDDIVx ends with 8b (not 76) and RDMPYx ends with 7622 (not 98). All other lines are identical for both operations.

Mul Div Test_003.png
Mul Div Test_004.png
These two again, same deal. Mesen is apparently accurate until the final row. Mul's last row is ffff, 0 on hardware. Div's last row is 101, fffff on hardware.

Post by regiscaelus »

I checked with my FPGA implementation and it is somehow doing something close to the real hardware for some values. Is it possible to have access to the source code to hard code the register values? This will be easier for simulation.

Post by rainwarrior »

To give something at the other extreme:
Mul Div Test_005.png
Mul Div Test_006.png
Mul on hardware ends with 302, 0. Div on hardware ends with ffff, 2. (Otherwise same.)
Mul Div Test_007.png
Mul Div Test_008.png
Mul should be: 309, 0
Div should be: ffff, 9


Mesen always seems to be accurate until the final row.

Post by Myself086 »

It looks like Mesen-S is allowing writes on Mul cycle 8 and Div cycle 16 but hardware isn't.

Post by UnDisbeliever »

Here is my test ROM.

This test uses an IRQ to wait until after DRAM refresh, then writes WRMPYA and WRMPYB with a 16-bit A, waits a few CPU cycles, writes WRMPYB again, waits 8 cycles, and reads RDMPYL and RDMPYH.

INPUTS:
A = value to write to WRMPYA
B1 = value of the first write to WRMPYB
B2 = value of the second write to WRMPYB after 2-9 CPU cycles

OUTPUTS:
#cy = The RDMPY output when there are # CPU cycles in between WRMPYB writes


Screen Capture from a 1/1/1 SFC console:
Screen Capture from a 1/1/1 SFC console

bsnes v115 screenshot:
bsnes v115 screenshot

The inputs of the test ROM can be edited using the controller. Use B/Y to change the selected input and the D-Pad to change the input value.

I have tried a few different input values on my 2/1/3 SFC and compared them to bsnes/Mesen-S. For the values I tested, the only output that differed between real hardware (0) and emulator (`A*B2`) was the one with 7 cycles in between the WRMPYB writes.

I have also run this test ROM on my 1/1/1 SFC and 1-chip SFC consoles, and they all output the same values for the initial test inputs (no other inputs were tested on those consoles).


Source code on GitHub (MIT Licensed)
Attachments
wrmpyb-in-flight.zip
(2.08 KiB)

Post by regiscaelus »

I simulated wrmpyb-in-flight.sfc and fixed my code to get the correct multiplication behaviour. In the screenshot from the simulation, you can see that the multiplication is started by the first write to $4203 and is not restarted by the second write; the second write does, however, clear the current result. On real hardware, I expect that the multiplication advances every cycle but the result is overwritten when writing to $4203.
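
As a rough illustration of that description (first write starts the multiply, a write while busy only clears the result), here is a small Python toy model. It is a sketch under assumptions, not the FPGA code: the class and method names are hypothetical, the shift-and-add structure and which operand is scanned bit by bit are guesses, and `steps_before_second_write` counts internal add steps, not CPU cycles, because the mapping between the two is not established here.

```python
class Multiplier:
    """Toy model: 8-step shift-and-add; a write while busy only clears the result."""

    def __init__(self):
        self.a = 0           # WRMPYA ($4202)
        self.b = 0           # operand latched from WRMPYB ($4203)
        self.product = 0     # RDMPY ($4216/$4217)
        self.steps_left = 0  # non-zero while a multiply is in flight

    def write_wrmpya(self, value):
        self.a = value & 0xFF

    def write_wrmpyb(self, value):
        self.product = 0     # every write clears the current result
        if self.steps_left:  # busy: the multiply is not restarted
            return
        self.b = value & 0xFF
        self.steps_left = 8

    def tick(self):
        if self.steps_left:
            step = 8 - self.steps_left
            if (self.b >> step) & 1:
                self.product = (self.product + (self.a << step)) & 0xFFFF
            self.steps_left -= 1


def run(a, b1, b2, steps_before_second_write):
    m = Multiplier()
    m.write_wrmpya(a)
    m.write_wrmpyb(b1)
    for _ in range(steps_before_second_write):
        m.tick()
    m.write_wrmpyb(b2)
    for _ in range(8):
        m.tick()
    return m.product


for steps in range(0, 9):
    print(steps, hex(run(0x12, 0x0F, 0x34, steps)))
```

In this model a second write after all 8 steps gives the ordinary `A*B2`, while a second write during the multiply discards what was accumulated so far and never brings B2 into play. That is at least in the same spirit as the 0 result seen on hardware for the 7-cycle case, but the exact cycle alignment is not something this sketch tries to pin down.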
Attachments
Screenshot from 2022-08-28 00-39-52.png