Mesen-S - SNES Emulator
Re: Mesen-S - SNES Emulator
Thanks for the Rendering Ranger R2 analysis - it does look like the other games I've checked before. The SPC/CPU are somehow just slightly out of sync and it's causing some of the transmission data to be lost in the process. There's definitely something causing the timing problem - I just need to find what it is. I spent some time yesterday reviewing a lot of the CPU's code to make sure it was timed properly, and will keep doing a bit more of that today.
Re: CGRAM, should be fixed now - I had completely overlooked that detail (I thought there was a test rom for this, but I think I might be confusing myself with the OAM test roms?). Thanks!
For the post-DMA IRQ/NMI timing, changing it to that does fix the bug in Power Rangers, but I'm not sure whether it breaks something else; I would need to retest some games.
For Axelay, I tried comparing with higan, but can't see a difference? Mesen-S doesn't look like the Snes9x screenshot shown in the issue you linked. Am I looking at the wrong thing?
Re: Mesen-S - SNES Emulator
Quote: "According to anomie's docs, writing to $420B (DMA enable) allows the CPU to continue up to the point where it reads the next instruction's op code before DMA starts."
That's another thing that also happens. That's part of the DMA<>CPU synchronization.
This is a really complex topic to talk about ... anomie's docs describe it in detail. But yes, one more instruction *cycle* is executed before the DMA begins, and you need to record whether that cycle took 6, 8, or 12 clocks. The clock is then aligned to a multiple of 8, then the DMA transfer runs, then the clock is aligned to a multiple of that last, extra instruction cycle (8, 6, or 12.)
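The sequence described above can be sketched as arithmetic on a master-clock counter. This is a hypothetical illustration, not code from bsnes/higan or Mesen-S. It assumes that alignment is measured against an absolute clock counter and that the DMA length is already known in master clocks.

```cpp
#include <cstdint>

// Hypothetical sketch of the $420B DMA alignment described above.
// 'clock' is assumed to be an absolute master-clock counter.

// Round 'clock' up to the next multiple of 'multiple' (no-op if aligned).
uint64_t alignUp(uint64_t clock, uint64_t multiple) {
    uint64_t rem = clock % multiple;
    return rem == 0 ? clock : clock + (multiple - rem);
}

// lastCycleClocks: length of the one extra CPU cycle that runs after the
// $420B write (6, 8 or 12 master clocks). dmaClocks: DMA transfer length.
uint64_t runDmaAligned(uint64_t clock, uint64_t lastCycleClocks,
                       uint64_t dmaClocks) {
    clock += lastCycleClocks;                // one more CPU cycle runs first
    clock = alignUp(clock, 8);               // DMA starts on a multiple of 8
    clock += dmaClocks;                      // the DMA transfer itself
    clock = alignUp(clock, lastCycleClocks); // resync to that cycle's length
    return clock;
}
```

For example, with a 12-clock extra cycle starting at clock 0 and a 64-clock transfer, the counter goes 12 → 16 → 80 → 84. The final step re-aligns to the recorded cycle length, not to 8.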
As you can no doubt imagine, emulating this as a state machine is nightmare fuel.
But what I was mentioning, IRQ lock, is a different phenomenon. After (H)DMA, the next instruction cycle won't fire interrupts, which *can* be enough to let one more instruction execute (since sta $420b writes to $420b on its last cycle).
Sorry, I have to rely a lot on my imperfect memory, because I implemented this stuff ten or so years ago. I hope I'm doing it all justice, but it's definitely all there in my CPU code.
Quote: "Game changes bg2 nametables mid-scanline during active draw. bsnes-plus executes same timing but does not pixel flicker."
I'm afraid bsnes' PPU timings are not to be relied on. I have 15 years' worth of game bugfixes to get everything commercial games do just right (see e.g. the Axelay thing), but unfortunately, my test ROMs can't verify things that can't be read back (e.g. video output), so I don't know when the PPU actually latches settings for rendering.
It is, with no exaggeration, the last real frontier for SNES emulation. If we get someone with a logic analyzer to give us PPU fetch timings, then we can basically have our emulation be more precise and more faithful than the SNES Jr was.
There's about a dozen minor things I know aren't emulated, and probably several dozen I don't know about, but they're indeed very minor.
Sour wrote: "Thanks for the Rendering Ranger R2 analysis - it does look like the other games I've checked before."
FWIW, Rendering Ranger R2 was a game that broke whenever I tried synchronizing the SMP to the CPU after every opcode, rather than after every cycle. It's very tightly timed code.
Re: Mesen-S - SNES Emulator
Sour wrote: "For Axelay, I tried comparing with higan, but can't see a difference? Mesen-S doesn't look like the Snes9x screenshot shown in the issue you linked"
I have no idea what happened, but it always looks correct now, using the same test build as when I created the report.
byuu wrote: "then the clock is aligned to a multiple of that last, extra instruction cycle (8, 6, or 12.)"
Ah. That's something I missed. I assumed it was aligned to the next CPU internal operation cycle.
byuu wrote: "But yes, one more instruction *cycle* is executed before the DMA begins"
Something Sour and I were wondering: that DMA delay of 1 CPU cycle is enough to fetch the next opcode from memory, but not enough to finish CPU internal operations?
So, in effect, it's:
- sta $420b
- do dma first
- then run (finish) next cpu instruction
- now handle pending interrupts
...?
I should check the code.
Re: Mesen-S - SNES Emulator
byuu wrote: "It is, with no exaggeration, the last real frontier for SNES emulation. If we get someone with a logic analyzer to give us PPU fetch timings,"
Speaking of which...
If I have a logic analyzer with 16 clips, what signals do you want recorded? You've previously stated that it would start off with the 9 pins between the two PPUs (CHR0..CHR3, PRIO0..1, COLOR0..2 in jwdonal's redrawn schematic), but what after that?
(Also, what ROMs should be tested against?)
Re: Mesen-S - SNES Emulator
Quote: "I have no idea what happened but it always looks correct now, using same test build when report created. :confused: I must have some really random glitchy stuff going on. :grumble:"
Whoever you are, I wish you were around when I was bug-hunting my SNES core, hahah. You're doing a thoroughly awesome job so far.
Quote: "now handle pending interrupts"
Well ... interrupt triggering is tested at the start of the last work cycle, or the start of the first bus cycle of the next instruction. So to an emulator without explicit pipelining, it's one cycle before the end of the instruction. If you skip an IRQ test on that last cycle, then it effectively won't fire until the next whole instruction, even though it was only one cycle.
The one cycle does the whole deal, one bus cycle plus one work cycle, in full.
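To make the one-cycle effect concrete, here is a small cycle-granular model. It is a sketch under stated assumptions rather than any emulator's actual code. `irqServicedAfter` and its parameters are hypothetical, and cycles are counted as whole CPU cycles rather than master clocks. The model polls the interrupt only at the start of each instruction's last cycle and skips that poll when the cycle is the one locked after (H)DMA.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the instruction after which the IRQ is serviced,
// or -1 if it never is. instrCycles: cycle count of each instruction.
// irqAssertCycle: global cycle index at which the IRQ line is asserted.
// lockedCycle: global cycle index suppressed by IRQ lock (-1 = none).
int irqServicedAfter(const std::vector<int>& instrCycles,
                     int irqAssertCycle, int lockedCycle) {
    int cycle = 0;  // global cycle index at the start of the instruction
    for (std::size_t i = 0; i < instrCycles.size(); ++i) {
        int lastCycle = cycle + instrCycles[i] - 1;
        bool polled = (lastCycle != lockedCycle);  // lock skips this poll
        if (polled && irqAssertCycle <= lastCycle) {
            return static_cast<int>(i);
        }
        cycle += instrCycles[i];
    }
    return -1;
}
```

Take three instructions of 2, 3 and 2 cycles with the IRQ asserted at cycle 1. Without a lock, the IRQ is taken after instruction 0. Locking only cycle 1 pushes it to after instruction 1: a one-cycle lock costs a whole instruction.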
Re: Mesen-S - SNES Emulator
byuu wrote: "The one cycle does the whole deal, one bus cycle plus one work cycle, in full."
Oh.
Phew. That whole procedure could get messy. Glad I'm not an emulator author. I can imagine running some derpy code in RAM (or, maybe worse, running it from system registers).
Code:
sta $420b
dma_target:
lda $18    ; dp = $2100
I didn't understand why the DMA controller had to sync to a multiple of 8 cycles before starting, and then ran everything in steps of 8, until I realized its CLK line is probably fed at crystal / 8. So it has to wait for each falling/rising edge to actually do anything.
And when it's done, the CPU CLK line has what, a multiplexer upstream somewhere that chooses 6, 8, or 12? So the CPU is forced to wait for that edge to arrive before moving on.
I have to test out an SPC idea in Mesen-S to see what happens.
Re: Mesen-S - SNES Emulator
Working solution:
Let the CPU read the old SPC700 value.
ActRaiser 2, Rendering Ranger R2, Illusion of Gaia / Time, American Tail, Hiouden - Mamono-tachi to no Chikai, Tales of Phantasia = working.
Kishin Douji Zenki - Tenchi Meidou = broken
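Code:
uint8_t Spc::CpuReadRegister(uint16_t addr)
{
    // Experiment: sample the output port *before* catching the SPC up,
    // so the CPU sees the value from before this access
    uint8_t val = _state.OutputReg[addr & 0x03];
    Run(); // catch the SPC700 up to the CPU's current time
    return val;
}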
Re: Mesen-S - SNES Emulator
Ys III - Wanderers from Ys: save games aren't written to SRAM correctly
IIRC, this uses a custom mapper board. SRAM should be mapped to 70:8000-FFFF.
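As a rough illustration of that mapping, here is a hypothetical address decoder (`sramOffset` is not Mesen-S's API). It assumes only bank $70 is decoded, that the SRAM size is a power of two, and that the SRAM mirrors within the 32 KiB window. The thread confirms none of those details.

```cpp
#include <cstdint>
#include <optional>

// Map a 24-bit CPU address to an SRAM offset for a board with SRAM at
// 70:8000-FFFF. Returns std::nullopt for addresses outside that window.
// Mirroring via the size mask is an assumption.
std::optional<uint32_t> sramOffset(uint32_t cpuAddr, uint32_t sramSize) {
    uint8_t bank = (cpuAddr >> 16) & 0xFF;
    uint16_t addr = cpuAddr & 0xFFFF;
    if (bank != 0x70 || addr < 0x8000) {
        return std::nullopt;
    }
    return (addr - 0x8000u) & (sramSize - 1);
}
```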
And I guess it's worth bringing up that there are two non-enhanced games that use the pseudo-512 blending mode:
- Jurassic Park (hud, dialog boxes)
- Bishoujo Senshi Sailor Moon S - Kondo wa Puzzle de Oshioki yo! (main menu)
You could cheat by merging every discrete left + right pair back down to 256.
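A minimal sketch of that "cheat", assuming a 512-pixel line of 0x00RRGGBB pixels: each left/right pair is averaged per channel down to one pixel. The function name and pixel format are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Average each horizontal pair of a 512-wide line down to 256 pixels.
// Subtracting the per-channel low bits of (a ^ b) before the shift makes
// every channel sum even, so no channel's bit spills into its neighbor.
std::vector<uint32_t> mergePairs(const std::vector<uint32_t>& line512) {
    std::vector<uint32_t> out(line512.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i) {
        uint32_t a = line512[2 * i];
        uint32_t b = line512[2 * i + 1];
        out[i] = (a + b - ((a ^ b) & 0x010101u)) >> 1;
    }
    return out;
}
```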
Re: Mesen-S - SNES Emulator
topspoon wrote: "Working solution:"
That's interesting - I don't think the hardware does this, though? But at least it shows that there isn't much missing for it to actually work properly.
RE: Ys III - Wanderers from Ys - thanks, I'll take a look!
I haven't checked the Sailor Moon game, but the Jurassic Park one should be displaying properly; I've tested it a few times. Is it displaying wrong on your end?
In other news, I spent the day creating tests that measure the cycles used for every single opcode (and losing my sanity in the process) by running each one 100 times and then calculating how many PPU dots have elapsed using the H/V counters. I found a couple of issues along the way, but it matches bsnes-plus now. Sadly, it didn't fix any of the freezes. But at least now I can be confident that my code is timed properly.
I still need to compare with higan and see if I can merge the tests into a single ROM (instead of 27 ROMs). I'll post the ROMs/source sometime tomorrow; out of time for the day.
byuu wrote: "Whoever you are, I wish you were around when I was bug-hunting my SNES core, hahah. You're doing a thoroughly awesome job so far."
I absolutely second that! Thank you so much for the time you've spent testing and even giving me outright solutions to the issues you find!
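The dot count described above comes from two H/V counter snapshots. A minimal sketch of that arithmetic, assuming a fixed 341 dots per NTSC scanline (ignoring the short-scanline special case) and a measurement shorter than one frame that never wraps past V=0. `elapsedDots` is a hypothetical helper, not part of the test ROM.

```cpp
#include <cstdint>

// Elapsed PPU dots between (h1, v1) and (h2, v2), with H in dots and V in
// scanlines. dotsPerLine defaults to the usual NTSC value of 341.
int32_t elapsedDots(uint16_t h1, uint16_t v1, uint16_t h2, uint16_t v2,
                    int32_t dotsPerLine = 341) {
    return (int32_t(v2) - int32_t(v1)) * dotsPerLine
         + (int32_t(h2) - int32_t(h1));
}
```

Dividing the result by the iteration count (100 here) gives the average dots per run of the opcode.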
Re: Mesen-S - SNES Emulator
Sour wrote: "I haven't checked the sailor moon game, but the jurassic park one should be displaying properly, I've tested it a few times. Is it displaying wrong on your end?"
Does the frontend do the blending? I ask because I'm using an unofficial one. The raw picture itself looks great; just the TV effect is missing.
Sour wrote: "That's interesting - I don't think the hardware does this, though? But at least it shows that there isn't much missing for it to actually work properly."
Honestly, I'm not sure. I just imagined it this way:
- Thread A = cpu. In 1 "sync CLK" pulse, it does "nop : lda $2140"
- Thread B = spc700. In 1 "sync CLK" pulse, it does "sta $f4" (A = B0)
- SMP -> CPU 0 = EA
On its last cycle, the CPU reads from $2140
On its last cycle, the SPC700 writes to $F4
I'm not sure a CLK pulse can update the CPU-side input port at the exact same time the SPC700 writes it. My guess is that the SPC700 has a slower bus than the CPU side.
So, assuming further that it then takes 1 SPC700 CLK for the port values to propagate in/out on the bus, that SPC700 CLK would be what controls this sync relationship: the slower chip controls the handshaking.
The CPU would read the stale value $EA first, then spin in its loop and get $B0.
If the update were immediate, that might be too tight. It's like how DMA sync can wait up to 8 cycles: the next CLK edge is that far out, and the DMA won't act on a 0-cycle CLK edge since the logic gates aren't built to handle it.
All speculation above.
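A minimal model of that speculation follows: an SPC700 write to a port only becomes visible to the CPU after one SPC700 clock. `DelayedPort` and the one-clock delay are hypothetical, taken from the guess above rather than from measured behavior.

```cpp
#include <cstdint>

struct DelayedPort {
    uint8_t visible = 0;     // value the CPU currently sees
    uint8_t pending = 0;     // value most recently written by the SPC700
    bool hasPending = false;

    void spcWrite(uint8_t value) { pending = value; hasPending = true; }

    // Called once per SPC700 clock: commits the pending write, if any.
    void spcClock() {
        if (hasPending) { visible = pending; hasPending = false; }
    }

    uint8_t cpuRead() const { return visible; }
};
```

In the scenario above, the port holds $EA and the SPC700 writes $B0. A CPU read in the same slot still returns $EA, and the next read after `spcClock()` returns $B0.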
But Kishin Douji Zenki - Tenchi Meidou could spook everything. Have to get a log running.
edit:
While it might be possible for the SPC to write and the CPU to read the same SMP port at the exact same time, I expect you'd get an unstable, indeterminate mix of the old and new values.
It would take another CPU read to get the true value.
edit2:
This would be an interesting replacement idea to test, instead of returning old value straight.
Re: Mesen-S - SNES Emulator
topspoon wrote: "Does the frontend do the blending? Because I'm using an unofficial one. The raw picture itself looks great, just the TV effect is missing."
The core supports blargg's NTSC filter - if you turn that on, the blending should look alright.
RE: SPC, without calling Run() before the read, you actually end up reading an old value that the SPC might have set hundreds of cycles ago. The SPC is only executed when the CPU reads/writes to it (or once at the end of each frame). If I change the code to run the SPC in sync with the CPU, reading the value before calling Run() no longer has an impact. Still, it might be a clue as to what is causing the actual problem here.
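Below is a minimal sketch of the catch-up scheme described above, in which the SPC700 only advances when the CPU touches it. `CatchUpSpc`, the shared timebase and the fixed step length are hypothetical simplifications. A real core steps variable-length instructions and converts between the two chips' clocks.

```cpp
#include <cstdint>

struct CatchUpSpc {
    uint64_t clock = 0;       // SPC time, in a timebase shared with the CPU
    uint64_t stepLength = 1;  // time consumed per emulated SPC step
    uint64_t steps = 0;       // number of steps executed so far

    // Run the SPC until it has caught up to the CPU's time.
    void runUntil(uint64_t targetClock) {
        while (clock < targetClock) {
            clock += stepLength;  // one SPC instruction would execute here
            ++steps;
        }
    }
};
```

Calling `runUntil()` before every port access gives catch-up behavior. Calling it every CPU cycle approximates running the two chips in sync. Reading a port *before* catching up can return a value that is far behind the CPU.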
The CPU/SPC can read/write the same port at the same time - I think anomie's documentation said this usually returns the AND of both values (this isn't implemented in any way in my code, though)
--
I finished working on my CPU timing test (attached, with source) and the results from a sd2snes are available here: https://www.youtube.com/watch?v=3myuKodnw_k (thanks to koitsu for recording this!)
The test goes through almost every single op code on the 65816, runs a small benchmark and displays a value representing the number of PPU dots it took to perform the test. The actual values are meaningless - the important part is that the results should match the hardware values (+/- 1, or sometimes +/- 10 when dram refresh gets in the way)
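A hypothetical comparison helper for that tolerance rule: ±1 dot normally, up to ±10 when DRAM refresh falls inside the measured window. `matchesHardware` is an illustrative name, and whether refresh landed in the window is assumed to be known by the caller.

```cpp
#include <cstdlib>

bool matchesHardware(int emulated, int hardware, bool dramRefreshInWindow) {
    int tolerance = dramRefreshInWindow ? 10 : 1;
    return std::abs(emulated - hardware) <= tolerance;
}
```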
The ROM goes through 54 separate screens (27 without fastrom, and then the same tests with fastrom turned on), testing most op codes with various combinations of the X/M flags. It takes about 2 minutes to run. I might try to add a few more test cases eventually, but this is a pretty decent start - hopefully this is useful to someone else making a SNES core at some point!
I've found and fixed a few timing issues thanks to this, but it still hasn't been enough to fix the games that freeze. Will have to start looking at other stuff (DMA timing, IRQ timing, etc) to see if I can find more issues.
Attachment: op_timing_test_v2.zip (831.35 KiB)
Re: Mesen-S - SNES Emulator
Sour wrote: "The core supports blargg's NTSC filter - if you turn that on, the blending should look alright."
Yup. That does it.
Sour wrote: "RE: SPC, without calling Run() before the read, you actually end up reading an old value that the SPC might have set hundreds of cycles ago. The SPC is only executed when the CPU reads/writes to it (or once at the end of each frame). If I change the code to run the SPC in sync with the CPU, reading the value before calling Run() no longer has an impact."
Oof.
Sour wrote: "The CPU/SPC can read/write the same port at the same time - I think anomie's documentation said this usually returns the AND of both values (this isn't implemented in any way in my code, though)"
Oh cool. So it returns a possibly unstable value when it happens. This is worth looking into. I'll do that first, then.
Sour wrote: "The test goes through almost every single op code on the 65816, runs a small benchmark and displays a value representing the number of PPU dots it took to perform the test. The actual values are meaningless - the important part is that the results should match the hardware values (+/- 1, or sometimes +/- 10 when dram refresh gets in the way)"
That's very neat!
edit:
old & new
Rendering Ranger R2, Hiouden = gets farther but hangs
ActRaiser 2, American Tail, Illusion of Gaia = okay
Re: Mesen-S - SNES Emulator
Sour wrote: "The CPU/SPC can read/write the same port at the same time - I think anomie's documentation said this usually returns the AND of both values (this isn't implemented in any way in my code, though)"
It's OR:
anomie (timing.txt) wrote:The SPC700 communicates with the S-CPU via 4 registers. Exact memory access
timings on these registers is not known, however it is possible that the 5A22
will be performing a read at the instant the SPC700 is performing a write. The
5A22 will then read the logical OR of the old and new values of the register.
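A minimal sketch of the conflict rule quoted above. The quote doesn't say how to decide that a read and a write land on the "same instant", so that decision is passed in as a flag. `cpuReadPort` is a hypothetical name.

```cpp
#include <cstdint>

// Value the 5A22 reads from an APU port when the SPC700 writes newValue
// over oldValue. On a simultaneous read/write, the read sees old | new.
uint8_t cpuReadPort(uint8_t oldValue, uint8_t newValue, bool writeCollides) {
    return writeCollides ? uint8_t(oldValue | newValue) : newValue;
}
```

For the $EA to $B0 example earlier in the thread, a colliding read would return $FA.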
Sour wrote: "I finished working on my CPU timing test (attached, with source)"
I ran the test on my units:
http://www.mediafire.com/folder/dg6s2he ... ng_test_v2
Last edited by creaothceann on Sun Jun 30, 2019 10:24 pm, edited 1 time in total.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
Re: Mesen-S - SNES Emulator
Quote: "Oh cool. So it returns a possibly unstable value when it happens. This is worth looking into. Will do that first then."
Many, many years ago (ten, probably), I wrote a test ROM to start trying to emulate this behavior.
I loaded it onto my SNES copier (Super UFO 8.3j), and it froze the system. The copier stopped working after that. Luckily I had a second one.
I am 99% sure this was just coincidental. But it completely scared me away from trying to test bus conflicts and hardware crashes.
If anyone's willing to give it a go though, this and the SNES CPU revision 1 DMA<>HDMA crash would be very good things to emulate. They're definitely problems homebrew devs could accidentally run into. (and in fact, Parallel Worlds did with the DMA crash.)
Sour wrote: "The actual values are meaningless - the important part is that the results should match the hardware values (+/- 1, or sometimes +/- 10 when dram refresh gets in the way)"
The big revolution for bsnes' timings was me writing two key functions that I use in all of my test_* ROMs:
First, write a function that you can call, and once it returns, Vcounter=0, and Hcounter=0 (not Hdot ... the actual counter.)
Second, write a function that, when called, will consume N master clock (~21 MHz) cycles, where N is the value in the 16-bit A register.
With these two tests, you can get 100% stable results every time you run a timing test. You can write tests that write to PPU registers at exact moments, and then log the results, etc.
If you're not sure how to do it, some of my test ROMs that Evan rehosted on snescentral.com's homebrew page should have the source with them. But I went with a really gross brute-force code generating method for doing them. I'm sure you could do much better ^-^;
I'd highly recommend you make these two functions for your own tests. Use bsnes or Mesen tracelogs that store Vcounter/Hcounter per CPU instruction to ensure the functions are correct.
Once you have this, DRAM refresh is pretty stable. When you want to get into hopeless pedantry, the actual cycle where DRAM refresh fires each scanline varies between CPU revision 1 and 2, but it's a million times easier than the absolute depraved insanity that is the Sega Genesis' four asynchronous DRAM refresh behaviors ^-^;
Quote: "You could cheat by merging every discreet left + right pair back down to 256."
A better way to do it is to blend every pixel 50% with the *input* (not output) pixel before (or after) it. Here's my code for it (supports 24-bit and 30-bit color):
Code:
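if(colorBleed) {
  //mask selects the lowest bit of each color channel (8-bit channels for
  //24-bit, 10-bit channels for 30-bit), so (a + b - ((a ^ b) & mask)) >> 1
  //averages every channel without carries spilling into the next one
  uint32 mask = depth == 30 ? 0x40100401 : 0x01010101;
  //note: this isn't demanding enough for #pragma omp parallel for
  //unlike with snes_ntsc or HQ2x, OpenMP will just make it slower.
  for(uint y : range(height)) {
    auto target = output + y * width;
    for(uint x : range(width)) {
      auto a = target[x];
      //right-hand neighbor is not yet blended (input pixel); the last column blends with itself
      auto b = target[x + (x != width - 1)];
      target[x] = (a + b - ((a ^ b) & mask)) >> 1;
    }
  }
}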
Pseudo-hires is used to blend two 256-width layers together, and hires is used to draw one 512-width layer.
However, it is absolutely possible to draw a 512-width layer using pseudo-hires, or two 256-width layers using hires, if you interleave the tile data appropriately. Now make a demo where you switch between the two with a single button press and what you find is that the output looks 100% identical. That means there is light blurring (as a result of analog video) on a real SNES in both pseudo-hires and true hires mode. So, even though it makes true hires 512-width text (eg G.O.D., Marvelous, Rudra no Hihou, etc) look a bit worse ... it should be blended always.
It's perfectly fine to require blargg's snes_ntsc to simulate the hires blending, but if you want the lightest-weight filter that won't distort the image more than necessary to simulate the pseudo-hires translucency effects, the above code should get the job done.
Up to you of course, Sour ^-^;
Re: Mesen-S - SNES Emulator
creaothceann wrote: "I ran the test on my units"
Nice! Thanks for taking the time to record that. At first glance it seems like both NTSC recordings are essentially identical (+/- 1 dot on some values). I haven't compared with koitsu's numbers, though, but I'd assume they're about the same too. It looks like the performance gap with PAL is so tiny that it doesn't really alter the numbers much - maybe if I made each test 10x longer it would be more obvious (might try doing that sometime).
byuu wrote: "If you're not sure how to do it, some of my test ROMs that Evan rehosted on snescentral.com's homebrew page should have the source with them. But I went with a really gross brute-force code generating method for doing them."
Oh, that's right - I do think I have the source for those. I had completely forgotten about them (and I'm actually unsure whether they work as intended on Mesen-S at the moment). I'll try to dig them up and see if I can figure out how to use them next time I try to time some behavior.
For this particular test, I was mostly just trying to confirm all idle cycles and the like - as is, any missing/extra cycles causes the value to jump up/down by ~150, so it makes it pretty obvious to find any major implementation issues.
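The ~150 figure follows from simple arithmetic. The calculation below assumes 100 iterations per opcode (as described earlier in the thread), 4 master clocks per PPU dot (ignoring the occasional long dots), and 6, 8 or 12 master clocks per CPU cycle. `dotShift` is just an illustrative helper.

```cpp
// Dots added to a test result by one extra CPU cycle of the given length.
constexpr int dotShift(int cycleMasterClocks, int iterations = 100,
                       int masterClocksPerDot = 4) {
    return cycleMasterClocks * iterations / masterClocksPerDot;
}

static_assert(dotShift(6) == 150, "fast/idle cycle");
static_assert(dotShift(8) == 200, "slow memory cycle");
static_assert(dotShift(12) == 300, "XSlow cycle");
```

A missing 6-clock idle cycle thus shows up as a jump of about 150 dots, far outside the ±1 tolerance.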
byuu wrote: "It's perfectly fine to require blargg's snes_ntsc to simulate the hires blending, but if you want the lightest-weight filter that won't distort the image more than necessary to simulate the pseudo-hires translucency effects, the above code should get the job done."
I hadn't even really considered the whole blending & transparency side of things, actually. It might definitely be worth adding another video filter that does just a simple blending like the one you just posted - I'll add it to my list, thanks!
---
I've fixed up the SRAM mappings for Wanderers (based on the info byuu had posted a few pages back) - so it should be working properly now (and hopefully nothing else broke in the process!)
I gave fixing the whole DMA/IRQ timing issue a try, too. I wrote a pretty simple test suite to validate a few scenarios on hardware. I'm waiting on the hardware results, but for now I'm assuming higan is correct, and my implementation gives the same result as higan for the test. It also fixes the Power Rangers game.