Isolated Warrior unemulated graphical glitches

Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

Isolated Warrior (MMC3) has a couple different graphical glitches that don't seem to be present in any emulators I've tried so far. BMF54123, who noticed these issues, was kind enough to take some video.

http://bmf.rustedlogic.net/temp/2020-04 ... -34-59.mp4
Here, you can see glitchy slivers above and below the mid-screen animation. Their locations correspond to writes to $8001 for CHR swapping. They don't show up on Everdrive (so I've been unable to test it, myself), but are apparently more severe on PowerPak. They're not present on every power-on and reset only affects them on a frontloader, which implies they're sensitive to CPU/PPU alignment. The particular tiles showing the bad slivers are tile index $00, which is blank in both the source and destination banks. They are white, which implies the pixel values are binary 01.


http://bmf.rustedlogic.net/temp/2020-04 ... -36-53.mp4
Here, the sprite tiles making up the PAUSE text randomly flicker. This also occurs during gameplay and in special rooms. In special rooms, which don't shuffle the location of tiles in the OAM buffer after initial placement, the same tiles will flicker on a given entry. It occurs on all alignments and on flash carts. I've reproduced it on rev E and G PPUs, but haven't seen it on rev B. This issue appears to be caused by the game fully disabling rendering after the HUD, which the wiki claims should not be done while sprite evaluation is ongoing (< 192 or 240). This is usually done at the very end of the scanline, but sometimes it's delayed enough to be around the start of the next one, which might be triggering the flickering.

Indeed, hacking the game to keep rendering enabled seems to remove the random flicker. Visiting the secret room with the game hacked to not shuffle sprites shows flicker specifically on tile sprite indices 4 and 5.

However, OAM is fully written during every vblank, as shown below. Shouldn't any corruption caused by disabling rendering be fixed by this?

Code:

F5D7    LDA #$00                 
F5D9    STA OamAddr_2003         
F5DC    LDA $06C2                
F5DF    STA SpriteDma_4014       

I'd appreciate any insight anyone has into either of these two issues.
Last edited by Fiskbit on Mon Apr 06, 2020 8:44 am, edited 1 time in total.
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Dwedit »

Just rechecked which mapper the game uses, and it's standard MMC3B, not the Acclaim variant.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Isolated Warrior unemulated graphical glitches

Post by lidnariq »

Fiskbit wrote: Mon Apr 06, 2020 7:17 am [Isolated Warrior title screen]
Here, you can see glitchy slivers above and below the mid-screen animation. Their locations correspond to writes to $8001 for CHR swapping. They're not present on every power-on and reset only affects them on a frontloader, which implies they're sensitive to CPU/PPU alignment.
Perhaps the value of the write propagates through the MMC3 at the exact wrong time, such that the value exposed on the CHR banking bus is either (old|new) or (old&new). Or perhaps the MMC3 is another asynchronous device (latching while /ROMSEL is low or M2 is high?) and this is another shoot-through glitch with the wrong value being on the CHR banking bus for one sliver. Do any other CHR banks have solid color %01 there?

edit: the only banks I see where tile 0 is mostly slivers of color %01 are $7B, $6F, maybe $4E, maybe $0D. Since the shoot-through glitch "should" be the upper half of the address, and bank 0=$80 doesn't have color %01 in tile 0, that hypothesis seems less likely.

edit2: a shoot-through glitch won't affect both bitplanes. If the resulting color is %01, then it could also be source pixels that would have been %11 had it not corrected itself. In that case, banks $18 and $73 might also count as plausible sources of the glitchy pixels.

edit3: Unfortunately, at the location of the raster split, the game is switching from 2KB bank 0 to 2KB bank $70. There's no obvious source of a 1 bit to produce a wrong bank here.
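
As a rough sanity check of the (old|new) / (old&new) idea against the banks named in this post, here is a minimal Python sketch. The function name `transient_banks` and the list name are made up for illustration, and the sketch only models the two combinations mentioned above, not real MMC3 internals.

```python
def transient_banks(old, new):
    """Bank values a glitch could expose under the OR/AND hypothesis.

    Only values that differ from both endpoints are returned, since
    seeing the old or the new bank for one sliver is not a third bank.
    """
    candidates = {old | new, old & new}
    return sorted(candidates - {old, new})


# Banks listed in this post whose tile 0 might yield colour %01.
SUSPECT_BANKS = [0x7B, 0x6F, 0x4E, 0x0D, 0x18, 0x73]

# At the raster split the game goes from 2KB bank $00 to 2KB bank $70.
glitch = transient_banks(0x00, 0x70)
matches = [bank for bank in SUSPECT_BANKS if bank in glitch]
print([hex(b) for b in glitch], [hex(b) for b in matches])
```

With one endpoint at $00, the OR is just the new bank and the AND is just the old one, so this model produces no third bank at all, which is the problem edit3 describes.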
Fiskbit wrote: However, OAM is fully written during every vblank, as shown below. Shouldn't any corruption caused by disabling rendering be fixed by this?
No. Whatever bug is tickled by disabling rendering too early within a scanline is not fixed by re-filling OAM. My best guess is that this is what Visual2C02 calls "spr_ptr" having some wrong value (while writes to $2003 set "spr_addr").
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

I've attached a test ROM for the OAM flicker issue. I probably went way overboard on this test, but I guess I'd rather have too many features than not enough.

The test allows you to disable rendering at a user-controlled location (shown with greyscale enabled) and observe its effects on the 64 sprite tiles, which are drawn as a square with sprite 0 in the top left and increasing in index top to bottom, left to right. The location can be specified with the D-pad (right/left are +/- 3 dots, down/up are +/- 342 dots), and you can hold B to make it act on button down instead of button press. It uses blargg's PPU/NMI synchronization code to get the location down to a pair of dots. With this, you can see which target dots will corrupt which OAM rows. No emulators that I'm aware of support this behavior at all. I've seen this on E, G and H PPUs, but not B.

I took down some results for a single alignment on a rev H PPU with an Everdrive N8 Pro. Glitches occur if rendering is disabled during a period beginning at the start of the scanline and another period beginning at the start of hblank. For start-of-scanline, the first glitchy dot is the right dot of 6F,80 (scanline,cycle delay) and the last is the left dot of AD,80. The width of each region that corrupts a specific OAM row is 2 dots. For start-of-hblank, the first glitchy dot is the right half of 6F,D5 and the last is the left half of AE,D5. Here, the width of each region is 1 dot, with an extra 4 dots for every 4th region (so, it goes 1, 1, 1, 5, repeat).

The test also lets you toggle whether rendering is disabled at the target location by pressing A, toggle whether $2003 is set (to 0) during vblank (see this thread), and toggle whether OAM DMA is performed. Pressing A on joypad 2 will test a single instance of the flicker glitch over a multiple-frame window: it first does OAM DMA and keeps rendering on, then does OAM DMA (OAM should be clean now) and turns off rendering, then does OAM DMA and keeps rendering on (the glitch should be visible despite the DMA), and then avoids OAM DMA and keeps rendering on (the glitch should persist). Since this test has a period where rendering is disabled early and OAM DMA is not done, the split should be near the bottom of the screen to avoid OAM decay.

A couple other interesting things I found while playing with this test:
- Greyscale seems to take effect 2 dots before rendering is disabled. You can see this by putting the target location on top of the sprite tiles. I'm pretty sure I've read this before and Mesen seems to get this right, but other emulators I tried (Nintendulator, NesHawk) do not.
- Since the variable length of OAM DMA is a critical part of maintaining the PPU/NMI sync, disabling OAM DMA (by pressing start) prevents us from being able to target a specific dot pair. The test attempts to solve this by falling back on DMC DMA. It plays a 1-byte DPCM sample without loop and with a write cycle 3 cycles after the write to $4015 so the DMA length can vary between 3 and 4 cycles. While this works in Mesen and Nintendulator, it doesn't work consistently on real hardware (it looks like it works on most frames, but there are frequent individual frames where it breaks sync). DMA timing in these emulators seems very good, so I'm wondering if the problem is that they're initiating DMC DMA on the wrong cycle compared to real hardware.

If anyone finds any issues with the test, sees a problem with the timing (there's a lot of timed code), or has any suggestions, please let me know. I'd like to do some testing on the MMC3 issue, too, but that'll probably wait until I make an MMC3 dev cart because the test requires a real MMC3.
Attachments
oam_flicker_test.zip
(13.23 KiB) Downloaded 336 times
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

Sour and I worked out a lot of the behavior around the OAM flicker issue and it's now supported in Mesen as of 0.9.9.41 with the Options -> Emulation -> Advanced -> "Enable PPU OAM row corruption emulation" setting. It seems to be decently accurate in both Isolated Warrior and oam_flicker_test, but has some known limitations.

Assuming a 3-dot delay in the time to disable rendering after a $2001 write, the issue occurs on dots 1-64 and 257-320. Landing in these regions will trigger future corruption of 1 row of OAM, setting its data equal to row 0's. For the 1-64 region, the corrupted row's index is the dot index divided by 2. For the 257-320 region, it's similar, but the window for each corrupted row isn't a uniform 2 dots wide; rather, it's 1 dot except for every 4th row (bottom 2 bits of row index are %11), where it's 5 dots.
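
The mapping from dot to corrupted row can be sketched in Python as below. This is only an illustration of the description above, not Mesen's code: the function names are hypothetical, and it assumes the offset from the start of each region (dot - 1 or dot - 257) is what selects the row, which keeps the result within rows 0-31.

```python
def corrupted_oam_row(dot):
    """Return the OAM row (0-31) flagged by disabling rendering on `dot`.

    Returns None outside the two sensitive regions. Dots are numbered
    as in the post, i.e. after the assumed 3-dot $2001 delay.
    """
    if 1 <= dot <= 64:
        # Uniform windows: 2 dots per row (assumed zero-based offset).
        return (dot - 1) // 2
    if 257 <= dot <= 320:
        # 8-dot groups of 4 rows with window widths 1, 1, 1, 5.
        group, pos = divmod(dot - 257, 8)
        return group * 4 + min(pos, 3)
    return None


def apply_corruption(oam, row):
    """Overwrite one 8-byte OAM row with a copy of row 0."""
    oam[row * 8:(row + 1) * 8] = oam[0:8]


oam = bytearray(range(256))
apply_corruption(oam, corrupted_oam_row(260))   # dot 260 selects row 3
```

In the 257-320 region, rows whose low two bits are %11 cover 5 dots (for example, dots 260-264 all select row 3), and every other row covers a single dot.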

Corruption appears to occur on the first dot of rendering. When this occurs coming out of vblank, where rendering was disabled until vblank and is then enabled within vblank (as in both Isolated Warrior and oam_flicker_test), Sour's testing in Visual NES shows that the corruption occurs instantly on scanline -1, dot 0. This is why the issue affects any new data that may have been written to OAM during vblank. We currently believe that the bad state that causes this corruption lasts until the next time rendering occurs, regardless of how many frames later it may be. We also believe this state persists across PPU reset, which explains the behavior I was seeing here on a frontloader, where 2 sequential sprite tiles would sometimes be missing on reset (and apparently on boot, too, in lidnariq's testing). This reset behavior hasn't been explicitly tested yet.
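
To illustrate why refilling OAM during vblank doesn't help, here is a small Python model of the behavior as currently understood. The class and method names are invented for this sketch, and clearing the pending state once it has been applied is an assumption rather than something that has been verified on hardware.

```python
class OamCorruptionModel:
    """Toy model: corruption is recorded early and applied later."""

    ROW_BYTES = 8

    def __init__(self):
        self.oam = bytearray(256)
        self.pending_rows = set()    # the hypothesised hidden "bad state"

    def disable_rendering_on_bad_dot(self, row):
        # Nothing visible happens yet; the row is only remembered.
        self.pending_rows.add(row)

    def oam_dma(self, page):
        # DMA rewrites all 256 bytes but leaves the bad state alone.
        self.oam[:] = page

    def first_dot_of_rendering(self):
        # Each remembered row is replaced by a copy of row 0.
        for row in sorted(self.pending_rows):
            start = row * self.ROW_BYTES
            self.oam[start:start + self.ROW_BYTES] = self.oam[0:self.ROW_BYTES]
        self.pending_rows.clear()    # assumption: state is consumed here


ppu = OamCorruptionModel()
page = bytes(range(256))
ppu.disable_rendering_on_bad_dot(4)  # mid-frame $2001 write
ppu.oam_dma(page)                    # vblank: OAM looks clean again
clean_after_dma = bytes(ppu.oam) == page
ppu.first_dot_of_rendering()         # scanline -1, dot 0
```

In this model OAM is fully correct right after the DMA, and row 4 still ends up as a copy of row 0 once rendering starts, matching what is seen in Isolated Warrior.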

In oam_flicker_test, which has a jitter of 1 dot, row 1 should corrupt on both dots (ie completely disappear) from the 1-64 region with UD,LR values of 64,84 or 65,84, depending on CPU/PPU alignment. Row 1 should flicker from the 257-320 region at 64,67 or 65,67, depending on alignment. Pressing down will advance by 1 dot and cause corruption of rows as described above (with flickering due to the 1-dot jitter from frame to frame). Tiles will also disappear on the right from sprite overflow.

We also tested the case where rendering is enabled outside of vblank (the "reenable" case). A modified version of oam_flicker_test that enables rendering again 6 CPU cycles after disabling it is attached. Behavior in this case appears to be more complicated and is not yet fully understood. Corruption occurs when rendering is enabled again, so if you perform OAM DMA every frame, corruption can only be visible below the $2001 writes. Corruption does not appear to be affected by varying the timing of when rendering is enabled again. Unlike the vblank case, corruption in the reenable case only occurs on some CPU/PPU alignments; referring back to the previous paragraph's UD,LR values, the alignments that require the one-smaller UD value don't reproduce the glitch in the reenable case, while the one-larger UD value alignments do.

The reenable case features more corruption than is present in the vblank case. In addition to the vblank case's corruption, the reenable case can cause sprite tiles to get moved to new locations and also cause visible corruption of OAM row 0. Corruption of some kind can also occur between dots 65-256, but its behavior is not yet understood. At least some of the additional corruption doesn't appear to be entirely stable; holding on the same 2 dots can cause the corruption to change over time. This testing was done on an Everdrive N8 Pro, and it's not known if that could be impacting the stability, but the N8 Pro doesn't appear to cause the OAM corruption produced by the original N8 (likely because of termination resistors on the N8 Pro's CPU data lines) and the access pattern that triggers that corruption isn't being done here.


During this testing (particularly with the reenable case), additional PPU behavior related to interrupting rendering was observed. Most or all of this is probably known, but as it doesn't appear to be properly emulated, I'll describe what we found in more detail.

- Disabling rendering during sprite evaluation causes sprite problems on the next scanline. These problems include some sprite tiles not being drawn (appearing as a black bar in the reenable test) or a single bad sprite sliver caused by OAM data being used with a misaligned byte index (but the misalignment does not persist beyond that one sliver). Support for this has been improved in Mesen in 0.9.9.41, but still doesn't quite match real hardware.

- There appears to be a delay of approximately 4 pixels when toggling rendering with $2001 writes. Turning greyscale on and rendering off will show some greyscale BG/sprite pixels before rendering stops, and toggling both of these again turns off greyscale some pixels before rendering begins again. On a CRT with composite, it's hard to distinguish the exact size of these regions, especially with the jitter, but I've attached a picture showing real hardware compared to Mesen's current emulation, where Mesen is on the right of the two jitter states. The bright grey is greyscale sprites, the dark grey is greyscale non-rendering, and the black is non-rendering. The length of non-rendering is 18 dots, and the line appears to cover 23 pixels, which likely includes 1 jitter pixel, so it's probably 22 pixels long in total.

- When rendering is reenabled, the PPU draws the first 8 pixels that would have been drawn where rendering was disabled. You can see this in the picture as approximately 2 pixels of red triangle and 6 pixels of blue triangle after the post-greyscale black region.

- The first background sliver after those 8 delayed pixels is corrupt, presumably because of the timing of each rendering toggle relative to the 4 background fetches.
Attachments
Rendering toggle artifacts.png
oam_flicker_test_reenable.nes
(24.02 KiB) Downloaded 263 times
Alyosha_TAS
Posts: 173
Joined: Wed Jun 15, 2016 11:49 am

Re: Isolated Warrior unemulated graphical glitches

Post by Alyosha_TAS »

Hello, and thanks for a cool new test!

I started looking at this in NESHawk (the greyscale disable thing is a simple drawing bug).

However, more so than the OAM glitches, I am very interested in the results from pressing start to turn off OAM DMA and use DMC instead.

It looks like in Mesen it still produces a normal pattern: 1-dot-on, 1-dot-off.
In NESHawk though it produces a much different pattern: 1-dot-on, 1-dot-off, 5-dots-off, 5-dots-on.
Since you have said there is also some desync happening on hardware, I wonder if this test can help in ironing out some details.

As far as I know, both emulators pass all relevant DMA tests, so if this test can somehow be used to determine where differences lie and where things don't match hardware, I think that would be a big advancement!
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

I've written a new test to try to get a better handle on the timings here. This test uses greyscale to measure the timing of various cases related to $4015 writes. The greyscale region shows a white dot every 3 pixels to allow for easy counting of CPU cycles, though because of the 1-dot jitter of the PPU/NMI synchronization code, two dots may fall on the jittery ends of the line such that it looks like there's an extra, so some care is needed.

DMC is configured to use the fastest rate, play a 1 byte sample, and not loop:

Code:

    LDA #$0F
    STA $4010  ; DMC_FREQUENCY: rate $F (fastest), no loop, no IRQ
    LDA #<(dDpcmAddress >> 6)
    STA $4012  ; DMC_ADDRESS
    LDA #$00
    STA $4011  ; DMC_RAW
    STA $4013  ; DMC_LENGTH: 1-byte sample
There are 6 test cases. Ignoring DMA, cases 0-4 should be 13 cycles long for easy comparison and case 5 should be 8 cycles long. Case 3 matches what is done in oam_flicker_test. The cases are as follows:

Code:

X == #$0B (kPpuMaskEnableBg | kPpuMaskEnableLeftmostColumnBg | kPpuMaskGreyscale)
Y == #$10 (kSoundDmc)
A == #$0A (kPpuMaskEnableBg | kPpuMaskEnableLeftmostColumnBg)

Test 0: No work is done. This serves as a control case.
    STX PPU_MASK
    LDX $8000
    LDX $00
    NOP
    STA PPU_MASK

Test 1: $4015 is written. The next 8 cycles are all read cycles.
    STX PPU_MASK
    STY $4015  ; SOUND_CONTROL
    LDX $00
    NOP
    STA PPU_MASK

Test 2: $4015 is written. Write cycles occur 4 and 5 cycles later.
    STX PPU_MASK
    STY $4015  ; SOUND_CONTROL
    INC $00
    STA PPU_MASK

Test 3: $4015 is written. A write cycle occurs 3 cycles later.
    STX PPU_MASK
    STY $4015  ; SOUND_CONTROL
    STX $00
    NOP
    STA PPU_MASK

Test 4: $4015 is written. A write cycle occurs 4 cycles later.
    STX PPU_MASK
    STY $4015  ; SOUND_CONTROL
    STX $0000
    STA PPU_MASK - $10,Y

Test 5: $4015 is written. A write cycle that disables greyscale occurs 4 cycles later.
    STX PPU_MASK
    STY $4015  ; SOUND_CONTROL
    STA PPU_MASK
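
The cycle counts for these cases can be double-checked with a short Python sketch using standard 6502 instruction timings. The table and function names are made up for this example; each entry gives the instruction length and which of its cycles are writes.

```python
# name -> (total cycles, 1-based cycles within the instruction that write)
TIMING = {
    "STX abs": (4, (4,)),
    "STY abs": (4, (4,)),
    "STA abs": (4, (4,)),
    "STA abs,Y": (5, (5,)),
    "STX zp": (3, (3,)),
    "INC zp": (5, (4, 5)),   # dummy write, then the real write
    "LDX abs": (4, ()),
    "LDX zp": (3, ()),
    "NOP": (2, ()),
}

TESTS = {
    0: ["STX abs", "LDX abs", "LDX zp", "NOP", "STA abs"],
    1: ["STX abs", "STY abs", "LDX zp", "NOP", "STA abs"],
    2: ["STX abs", "STY abs", "INC zp", "STA abs"],
    3: ["STX abs", "STY abs", "STX zp", "NOP", "STA abs"],
    4: ["STX abs", "STY abs", "STX abs", "STA abs,Y"],
    5: ["STX abs", "STY abs", "STA abs"],
}


def write_times(instrs):
    """Cycle of every write, counting the first PPU_MASK write as 0."""
    writes, elapsed = [], 0
    for name in instrs:
        cycles, write_cycles = TIMING[name]
        writes += [elapsed + w for w in write_cycles]
        elapsed += cycles
    return [w - writes[0] for w in writes]


def greyscale_span(test):
    """Cycles between the two PPU_MASK writes, ignoring DMA."""
    return write_times(TESTS[test])[-1]


def writes_after_4015(test):
    """Write cycles relative to the $4015 write (tests 1-5 only)."""
    times = write_times(TESTS[test])
    return [t - times[1] for t in times[2:]]


for test in TESTS:
    print(test, greyscale_span(test))
```

Note that `writes_after_4015` also includes the final PPU_MASK write, so test 2 reports writes 4, 5, and 9 cycles after the $4015 write.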
I tested on an AV Famicom with non-laser RP2A03H/RP2C02H and an AV-modded Famicom with RP2A03/RP2C02A. On the rev H, I got these results:

Test 0: 13 cycles solid.
Test 1: 16 cycles solid. Occasional blips of 1 or 4 cycles.
Test 2: 16 cycles solid. Occasional blips of 1 or 4 cycles.
Test 3: 16 cycles solid. 1 cycle flickering. Occasional blips of 1, 3, or 4 cycles.*
Test 4: 16 cycles solid. 1 cycle flickering. Occasional blips of 1, 3, or 4 cycles.*
Test 5: 8 cycles solid. 3 cycles flickering. Occasional blips of 4 cycles.

The letterless CPU matched these results, except that for blips, only the 1 cycle blips occurred. The blips are infrequent; I'd guess they occur around 1% of the time. *The blips are only ever 1 or 4 cycles; the 3-cycle blips mentioned above are just 4-cycle ones when the 1 flickering cycle is off.

I'll refer here to DMC DMAs initiated by the $4015 write as 'load' DMAs, and DMC DMAs for fetching additional bytes as 'reload' DMAs. Because I can't make the 1st and 2nd cycles after the $4015 write be write cycles, I can't check whether they impact the length of the DMA and thus whether the load DMA ever lands there. However, I can conclude that:
- Load DMAs normally take 3 cycles. (Test 1)
- Individual write cycles can make load DMAs take 4 cycles. (Tests 3 and 4)
- Pairs of write cycles make the DMA take 3 cycles. (Test 2)
- This suggests load DMAs initiate on the opposite cycle parity of reload DMAs, which take 4 cycles on reads and 3 cycles when delayed once by a write.
- Load DMAs occur on the 3rd and 4th cycles after the $4015 write. I can't say for sure they don't occur on other cycles, but I suspect they don't, and if they landed on the 5th cycle, test 2 (INC) should have alternated between 3 and 4 cycle DMAs.
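
These conclusions can be turned into a toy Python model and compared against the measured widths. Everything here is a sketch under the assumptions listed above: the load DMA lands on cycle 3 or 4, waits out any write cycles, and takes 3 cycles plus 1 if it was delayed by an odd number of cycles. Blips are not modeled, and the names are hypothetical.

```python
BASE_DMA_CYCLES = 3   # undelayed load DMA length (test 1)
STY_CYCLES = 4        # cycles from the first PPU_MASK write to the $4015 write

# CPU cycles following the $4015 write, ending with the write that turns
# greyscale off. R = read cycle, W = write cycle.
AFTER_4015 = {
    1: "RRRRRRRRW",
    2: "RRRWWRRRW",
    3: "RRWRRRRRW",
    4: "RRRWRRRRW",
    5: "RRRW",
}


def load_dma_cycles(pattern, landing):
    """Cycles the load DMA adds before greyscale is turned off."""
    closing = len(pattern)            # 1-based cycle of the final write
    cycle = landing
    while cycle < closing and pattern[cycle - 1] == "W":
        cycle += 1                    # DMA cannot start on a write cycle
    if cycle >= closing:
        return 0                      # DMA runs after greyscale has ended
    delay = cycle - landing
    return BASE_DMA_CYCLES + delay % 2


def predicted_span(test, landing):
    pattern = AFTER_4015[test]
    return STY_CYCLES + len(pattern) + load_dma_cycles(pattern, landing)


for test in AFTER_4015:
    print(test, predicted_span(test, 3), predicted_span(test, 4))
```

The model gives 16/16 for tests 1 and 2, a 16/17 mix for tests 3 and 4, and 11/8 for test 5, which lines up with the solid and flickering cycle counts reported for rev H.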

Ignoring blips, Mesen doesn't match cases 4 and 5, but Sour has been able to fix it by adding a 1 cycle delay so that its DMAs land 3 or 4 cycles later. NESHawk doesn't match cases 1, 2, and 3; it looks as though write cycles in NESHawk simply eat into the DMA time rather than rescheduling the full DMA and having the appropriate length depending on whether it's a get or put cycle. This might be the case even on reload DMAs, since it keeps the same parity and thus would still allow NESHawk to pass my test that does synced controller reads forever.

For the blips, I wondered if an extra DMA is occurring, and Sour had the idea that maybe the APU is mistakenly issuing a reload DMA after the $4015 write when it lines up with the current rate, which would explain the infrequency. This seems plausible, though the 1-cycle blips don't seem obviously explainable as an extra DMA. Perhaps the 1 cycle blip is just a change to timing on the load DMA, while the 4 cycle one is an extra DMA. The exact timing of these isn't clear, nor what address is being fetched.

To test Sour's theory, I added the ability to change the DMC rate (select) and found that slower rates also make blips less frequent. I also found that blips seem to have a pattern, which I've documented below. To dig into this issue more, I added the ability to control on which frames the test is done (start) so that the flickering dots are either always on or always off, and found that certain blips in the pattern occur on certain 2-frame parities. In the data below, which is all with Rate F, Frame toggle 0 tests all frames, while Frame toggle 1 and 2 test every other 2 frames.

Code:

Test 1:
- Frame toggle 0: 4 4 1 1
- Frame toggle 1:   4   1
- Frame toggle 2: 4   1
Test 2:
- Frame toggle 0: 4 4 1
- Frame toggle 1: 4   1
- Frame toggle 2:   4
Test 3:
- Frame toggle 0: 4 4 1 1
- Frame toggle 1: 4   1   (short)
- Frame toggle 2:   4   1 (long)
Test 4:
- Frame toggle 0: 4 4 1 1
- Frame toggle 1: 4   1   (long)
- Frame toggle 2:   4   1 (short)
Test 5:
- Frame toggle 0: 4
- Frame toggle 1:         (short)
- Frame toggle 2: 4       (long)
I've checked these results multiple times and am pretty sure they're correct, and they don't seem to vary by alignment. Frame toggle 1 appears to correspond to the DMA landing on the 4th cycle after the $4015 write, and Frame toggle 2 the 3rd cycle. RP2A03 (letterless) results are the same without the 4-cycle blips. I've confirmed the 4-cycle blips are present on RP2A03G.


Two additional findings I had that aren't directly related to the goal of the test:
- There appears to be some unemulated complexity with greyscale timing and CPU/PPU alignment. The greyscale region appears to be 3*n - 1 dots on some alignments rather than the expected 3*n dots. This results in two adjacent scanlines each having one white tick on a jittery dot rather than one scanline with white ticks in both jittery dots. You can move the test location around on the screen; the test targets UD,LR 4E,6F by default because it appears to always be safe on hardware. On some alignments, you get the full 3*n width and see white ticks in the jitter on 4C,6F, but on the shorter alignments, you'll have a white tick in the left jitter on 4C,6F and in the right jitter on 4D,6F. The greyscale region appears to start on the same pixel on both types of alignment; the difference is in how late they end.
- RP2C02A displays corruption of one background sliver when $2001 is written to, similar to the corruption seen when enabling rendering mid-screen such as in oam_flicker_test_reenable.nes. This occurs on both $2001 writes in this test.
Attachments
dmc_dma_start_test.zip
(13.37 KiB) Downloaded 246 times
Alyosha_TAS
Posts: 173
Joined: Wed Jun 15, 2016 11:49 am

Re: Isolated Warrior unemulated graphical glitches

Post by Alyosha_TAS »

This is great! We've needed a test like this for years! I'll definitely start looking into it. Thanks a lot for putting this together.
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

TL;DR: Added 9 new tests in v2:
1. Tests 6-7 test DMC DMA behavior relative to OAM DMA and show that:
- 1a. $4015-initiated DMC DMA adds 2 cycles if it lands on a $4014 write cycle.
- 1b. 4-cycle blips either immediately precede or follow the DMC DMA with a 0-cycle gap.
- 1c. 4-cycle blips add only 2 cycles during OAM DMA.
- 1d. 1-cycle blips add 0 cycles if they land in OAM DMA.
2. Tests 8-B show that the 1-cycle blip occurs 2 cycles after the $4015-initiated DMC DMA and does not occur if that is a write cycle. These blips are probably the RDY pin being deasserted for 1 cycle.
3. Tests C-E check for joypad 2 input corruption resulting from 4- and 1-cycle blips. These require a second joypad that returns 1's after 8 reads.
- 3a. On RP2A03H, the 4-cycle blips cause 1 extra joypad read (2 extra total) and the 1-cycle blips do not.
- 3b. On RP2A03, every DMA delay cycle causes an extra joypad read, so there are 2 extra reads from $4015-initiated DMA and 1 extra read from each 1-cycle blip. Normal DMC DMA causes 1-3 extra reads (not sure why this isn't always 3).

-----


I've written a new version with 9 additional tests and will walk through my analysis and results. Most of this is presented in the README in one place and with less analysis, so refer to that if you want the raw data in an easier-to-consume form. Here are the new tests:

Test 6: OAM DMA is performed. This serves as a control case for test 7.
Test 7: $4015 is written. A write cycle occurs 4 cycles later that triggers OAM DMA.

Test 8: $4015 is written. A write cycle occurs 5 cycles later.
Test 9: $4015 is written. A write cycle occurs 6 cycles later.
Test A: $4015 is written. Write cycles occur 5 and 6 cycles later.
Test B: $4015 is written. Write cycles occur 3 and 6 cycles later.

Test C: $4015 is written. Joypad 2 is read 4 cycles later.
Test D: $4015 is written. Joypad 2 is read 5 cycles later.
Test E: $4015 is written. Joypad 2 is read 6 cycles later.

An important note about the results below: the Frame toggle 1 and 2 data will swap when changing the cycle parity the test is run on using left or right. The results here are for the parity from the default cycle delay of 6F (the LR value), which puts cycle 3 DMA on Frame toggle 2 and cycle 4 DMA on Frame toggle 1.

First, I added tests 6 and 7 to test behavior around OAM DMA. Cycle 3 is the read cycle right before the $4014 write, and cycle 4 is the $4014 write. Disch's findings were that landing on the $4014 write should take 2 cycles, regardless of whether the write is a get or put cycle. This matches up with my findings here, but the result I was more interested in is how this affects the blips. I'm not going to try to count the number of dots on the screen, but compared to test 6, test 7 shows:

Frame toggle 1 (cycle 4): +2 cycles solid. 2 cycle blips.
Frame toggle 2 (cycle 3): +2 cycles solid. 4 cycle blips.

Code:

Blip pattern:
- Frame toggle 0: 2 4
- Frame toggle 1: 2
- Frame toggle 2:   4
(It's a little unintuitive, but the cycle 3 case only adds 2 cycles normally instead of the expected 3 from the length of the DMC DMA. This is because without DMC DMA, the $4014 write in this case always lands on a put cycle and so OAM DMA takes an extra cycle, while DMC DMA changes the alignment so the write is on a get cycle, saving a cycle on OAM DMA.)
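
The arithmetic behind that parenthetical can be written out in Python. This is only a worked example using the usual 513/514-cycle OAM DMA lengths and the parity rule as described here; the function name is made up.

```python
DMC_LOAD_DMA = 3   # cycles for an undelayed load DMA (test 1)


def oam_dma_cycles(write_on_put):
    """OAM DMA length, with the extra cycle when $4014 is written on a put."""
    return 514 if write_on_put else 513


# Test 6 (control): the $4014 write lands on a put cycle.
without_dmc = oam_dma_cycles(write_on_put=True)

# Test 7, cycle 3 case: the 3-cycle DMC DMA flips the parity, so the
# $4014 write now lands on a get cycle.
with_dmc = DMC_LOAD_DMA + oam_dma_cycles(write_on_put=False)

print(with_dmc - without_dmc)   # 2
```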

I'm guessing at this point that the 4-cycle blip is a reload DMA that occurs immediately after the load DMA. We see that the length is 4 immediately before the start of OAM DMA and 2 immediately after, which suggests that there is no cycle gap between the load and glitch DMAs. DMC DMA always ends on a get cycle, so the very next cycle is a put and is suitable for a reload DMA. If it occurred before by delaying the load DMA, it would mean engaging on a get and making the load DMA engage on a put. Either arrangement results in the same cycle length, but occurring after lets both DMAs land on their preferred cycle parities instead of reversed parities. I can't guess the ordering if they get delayed by a write cycle.

The 1-cycle blips seem to be impacted by what follows the load DMA, which means they have to occur after and have at least a 1 cycle gap. In test 2, the cycle 3 case is followed by 2 write cycles and doesn't show 1-cycle blips. In test 5, the cycle 3 case is followed by a write cycle that ends greyscale and doesn't show the 1-cycle blips. In test 7, neither case shows 1 cycle blips, which means OAM DMA is interfering. It sounds like the RDY pin is being deasserted for one cycle either 1 or 2 cycles later if the cycle is a read; just 1 cycle would prevent cycle 3 in test 4 from having the blip, and more than 2 should show 1-cycle blips on test 2's cycle 3 case. Just a 2 cycle gap should explain everything we see.

To try to test the 1-cycle blip timing, I wrote tests 8-B assuming the blip occurs 2 cycles later and is suppressed by a write cycle. We get the following blip results on RP2A03H, and the same without the 4-cycle blips on RP2A03:

Code:

Test 8:
- Frame toggle 0: 4 4 1
- Frame toggle 1:   4 1
- Frame toggle 2: 4
Test 9:
- Frame toggle 0: 4 4 1
- Frame toggle 1:   4
- Frame toggle 2: 4   1
Test A:
- Frame toggle 0: 4 4
- Frame toggle 1: 4
- Frame toggle 2:   4
Test B:
- Frame toggle 0: 4 4
- Frame toggle 1: 4
- Frame toggle 2:   4
This seems to confirm that the 1-cycle blips do indeed occur 2 cycles later and are suppressed by writes. Test B verifies that it's always a 2-cycle gap, even when the load DMA is delayed by a write cycle.

Next, I wanted to get a better idea of what's going on during the extra delays by checking how they interact with joypad reads. Tests C-E overlap cycle 4 load DMA and cycle 5 and 6 1-cycle blips with $4017 reads and display a count on the top right of the screen of the number of times joypad bits have been set, which indicates the number of lost bits (since each lost bit will shift in a 1 from after the button state). Here are the results:

RP2A03H:
Test C:
- Frame toggle 1: 1 bit deleted every frame. 1 additional bit deleted occasionally.
- Frame toggle 2: No bits deleted.
Test D: No bits deleted.
Test E: No bits deleted.

RP2A03:
Test C:
- Frame toggle 1: 2 bits deleted every frame.
- Frame toggle 2: No bits deleted.
Test D:
- Frame toggle 1: No bits deleted.
- Frame toggle 2: 1 bit deleted occasionally.
Test E:
- Frame toggle 1: 1 bit deleted occasionally.
- Frame toggle 2: No bits deleted.

Revision H matches what we would expect: the 4-cycle blip causes the joypad to be clocked 1 additional time, while the 1-cycle blips have no effect, so the joypads are only clocked on the first of consecutive-cycle reads (like with MMC1 writes). The letterless results are much more interesting: the joypads get clocked on every cycle, consecutive or not. I guessed that this means that normal DMC DMA, which takes 4 cycles, would clock 3 extra times, and a quick and dirty test confirms this, but I also saw cases where it clocked only 1 or 2 extra times. Since test C seems to consistently lose 2, never 1, perhaps something else is going on here, assuming my test is working correctly; I'll make a more targeted test for that later.
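
For anyone unfamiliar with why extra clocks show up as set bits, here is a minimal Python model of the joypad shift register. It assumes a standard controller that returns 1s after its 8 button bits, no buttons held, and that the CPU sees the bit present after the last clock of a read; the function name is hypothetical.

```python
def read_joypad(buttons, clocks_per_read):
    """Return the bits seen by the CPU for a series of joypad reads.

    buttons: the 8 button bits in shift order.
    clocks_per_read: how many times each read clocks the pad (1 = normal).
    """
    stream = list(buttons) + [1] * 64   # 1s follow the button state
    pos, seen = 0, []
    for clocks in clocks_per_read:
        pos += clocks - 1               # extra clocks throw bits away
        seen.append(stream[pos])
        pos += 1
    return seen


no_buttons = [0] * 8
normal = read_joypad(no_buttons, [1] * 8)
rev_h = read_joypad(no_buttons, [2] + [1] * 7)       # 1 extra clock
letterless = read_joypad(no_buttons, [3] + [1] * 7)  # 2 extra clocks
print(sum(normal), sum(rev_h), sum(letterless))
```

Each deleted bit pulls one more trailing 1 into the 8 bits read, so the number of set bits equals the number of extra clocks in this model.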

I can't think of any more tests I can run that can narrow down the behavior of the blip glitches any further. I'm open to suggestions if anyone has ideas, but I suspect we may need to use Visual 2A03 to figure out the exact APU timing for triggering the problem.
Attachments
dmc_dma_start_test_v2.zip
(14.82 KiB) Downloaded 279 times
Alyosha_TAS
Posts: 173
Joined: Wed Jun 15, 2016 11:49 am

Re: Isolated Warrior unemulated graphical glitches

Post by Alyosha_TAS »

Brilliant work. Accurate DMC DMA has long been a missing piece in NES emulation. I think you've pretty much got it covered now.
Alyosha_TAS
Posts: 173
Joined: Wed Jun 15, 2016 11:49 am

Re: Isolated Warrior unemulated graphical glitches

Post by Alyosha_TAS »

I have all the basic things working properly for these tests, but I am not getting the 1- and 4-cycle blips that are mentioned. I understand that the 4-cycle blips are just coincidental timing between loading a byte and reloading one when the timing lines up, but I'm not understanding the 1-cycle blips. What does the '2-cycle gap' mean?

Is it something like this?:


RDY    __ __ __ __ --- --- __ --- --- --- --- 
        (DMC DMA) (2 cyc)(blip)(normal)
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

Yes, that's correct; when the 1-cycle blip occurs, it is always 2 cycles after the $4015-initiated DMC DMA has completed (i.e., the CPU is allowed to run 2 cycles in between). It can be fully suppressed by colliding it with a CPU write cycle or another DMA. It also occurs just as frequently as the 4-cycle blips. So far as I can tell, it does not occur on the same $4015 write as 4-cycle blips (it could be masked by occurring during the extra DMA time, but that would suggest 1-cycle blips should be twice as frequent on letterless CPUs, which I don't remember seeing). Test B suggests that the gap remains 2 cycles even when DMAs get moved or change in length; in this test, the cycle 3 DMA is delayed to cycle 4, making it take 1 cycle longer, and yet timing a write for the 6th cycle of execution suppresses 1-cycle blips from both the 4th and delayed 3rd cycle DMAs, which are different lengths. However, I don't know if that change in timing changes which frames the 1-cycle blip occurs on.

Kitrinx looked into this issue a while ago at the transistor level and gave me this explanation:
Kitrinx: the conditions for starting a dma transfer are "enabled_delayed & ~have_buffer". have_buffer is recharged on aclk2(phi2) when the dma results are put on the bus. The evaluation of the dmc clock is on aclk1_d, which is the first cpu tick after the results come in. Since enable is delayed, I believe what you're seeing is that the transfer takes place, enabled is set to false, but delayed, on the very next cpu cycle the buffer is consumed and have_buffer is set to false, thus enabled_delayed is still true, and have_buffer is again, false, and it initiates a new DMA session, probably with 1 cycle of ready high
Kitrinx: I don't latch the condition for starting dma, so my core will only produce 1 cycle blips
Fiskbit: Nice, that looks correct for the 1 cycle blips. For the 4-cycle blips, it sounds like you're saying they occur after the $4015 write's DMC DMA. If have_buffer is set when DMA results are put on the bus, why is it false such that it can cause another DMA afterward? And what address does this extra DMA target?
Kitrinx: right, so basically it goes like this
Kitrinx: it really depends on what's currently playing and the alignment of the bit counter
Kitrinx: but you'll do a dma fetch, and on the cycle where you get results from it, ie when the data for your dma address is on the bus, it will set the "have buffer" flag to true
Kitrinx: if that cycle, or (I assume) the next aclk1_d as well, the bit counter becomes 7 and calls for a reload, the enable flag will not yet have been cleared
Kitrinx: but the have_buffer flag will be set to 0, because the dma player will reload itself and consume the buffer
Kitrinx: with the have_buffer flag being false, and the enable_delayed flag being true, the conditions for starting a new DMA become true
Kitrinx: it will use an incremented-by-one dma address
Fiskbit: Will it use an incremented address even if the sample is only 1 byte?
Kitrinx: I believe so, but it's hard to read the logic, and most of the nodes aren't labeled, so I have to kind of wing it
Kitrinx: I think enable_delayed allows for up to two cycles for this to happen
Kitrinx: and depending on which, you'll get the 1 or 4 cycle blips
Turning this information into exact timing is beyond my current skills, so I intend to push forward more on software testing for this issue. I'd hold off on trying to emulate the blips until we have the timing nailed down; I think it'd be better not to emulate them than get them wrong. A test where we attempt to time the blips from CPU power-on should give us the necessary info by placing the blips relative to known power-on state, but as I don't yet have a cartridge that would allow me to run such a test, I've not prioritized it. Note that this still wouldn't show what byte is being fetched by the 4-cycle blips, and it's not clear to me what happens to the extra fetch (which byte is kept in the buffer?), though presumably we could at least play a 17 byte sample and see if the blips cause it to end early. Kitrinx says this test should work from reset, as well, because the bits-remaining counter is set to 0 on reset, so cold boot apparently isn't a strict requirement.
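To make the start condition from the chat above easier to reason about, here is a toy Python sketch of "enabled_delayed & ~have_buffer". It is not cycle-accurate and is not derived from the die: the class name DmcToy, the step() arguments, and the one-step delay on the enable flag are all assumptions made purely for illustration (Kitrinx suggests the real delay may span up to two cycles).

```python
from dataclasses import dataclass


@dataclass
class DmcToy:
    """Toy model of the DMA start condition; NOT cycle-accurate."""
    enabled: bool = True
    enabled_delayed: bool = True   # assumed to lag `enabled` by one step
    have_buffer: bool = False

    def wants_dma(self) -> bool:
        # Condition quoted in the thread: enabled_delayed & ~have_buffer
        return self.enabled_delayed and not self.have_buffer

    def step(self, fetch_done=False, clear_enable=False, output_reloads=False) -> bool:
        """Advance one step and report whether a new DMA would be requested."""
        if fetch_done:
            self.have_buffer = True        # DMA result has landed in the buffer
        if clear_enable:
            self.enabled = False           # sample finished; enable cleared (not yet visible)
        if output_reloads and self.have_buffer:
            self.have_buffer = False       # output unit consumed the buffer
        start = self.wants_dma()           # evaluated with the stale, delayed enable
        self.enabled_delayed = self.enabled
        return start


# Case 1: the output unit reloads on the same step the fetch completes.
glitch = DmcToy()
extra_dma = glitch.step(fetch_done=True, clear_enable=True, output_reloads=True)

# Case 2: no reload coincides with the fetch, so the buffer stays full.
normal = DmcToy()
no_extra_dma = normal.step(fetch_done=True, clear_enable=True)
```

In case 1 the buffer is emptied while the delayed enable is still set, so the model requests another DMA; in case 2 it does not. Whether that extra request shows up as a 1-cycle or a 4-cycle blip is exactly the timing question that still needs hardware tests.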
Alyosha_TAS
Posts: 173
Joined: Wed Jun 15, 2016 11:49 am

Re: Isolated Warrior unemulated graphical glitches

Post by Alyosha_TAS »

Thank you for the reply, that is very helpful information.

So is the bits remaining counter confirmed zero at power on? I've been wondering that myself. Also do you know anything about the state of the timer?

These blips and the exact power-on state are the last two things keeping me from verifying TASes on console that use the DMC channel. The startup state does seem to be consistent, but I haven't been able to figure out what it is, and I also lack a dev board to run any tests (I really need some power-on results for count_errors.nes).
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Isolated Warrior unemulated graphical glitches

Post by Fiskbit »

As far as I know, it hasn't been confirmed in software that the bits remaining counter is 0 on reset, but Kitrinx double checked while discussing with me and said the reset signal grounds out all 3 bits.

Regarding the DMC channel's LFSR timer, my understanding is that it starts at 0 on power-on and becomes 1 on the next tick due to special logic to inject a 1 if it's ever 0 (and is otherwise LFSR.8 ^ LFSR.4). The timer expires when the LFSR becomes 0x100, which takes a full 1024 CPU cycles in this case, and then I think it gets reloaded with whatever the current rate's LFSR value is (0 by default), as normal. On reset, it sounds like the rate gets set to 0 and the LFSR gets set to the value for rate 0, not to value 0 like in the cold boot case.
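As a rough sanity check of the cold-boot case, here is a small Python sketch of a 9-bit LFSR with the feedback described above. The shift direction (left, with the new bit entering at bit 0) and the names lfsr_tick and ticks_until_expire are assumptions for illustration only; the real shift direction and bit numbering should be taken from the die, not from this model.

```python
LFSR_MASK = 0x1FF      # 9-bit register
EXPIRE_STATE = 0x100   # value at which the timer is said to expire


def lfsr_tick(state: int) -> int:
    """Advance the assumed 9-bit LFSR by one timer tick."""
    if state == 0:
        feedback = 1   # special case: inject a 1 if the register is all zeros
    else:
        feedback = ((state >> 8) ^ (state >> 4)) & 1   # LFSR.8 ^ LFSR.4
    # Assumed direction: shift left, feedback enters at bit 0.
    return ((state << 1) | feedback) & LFSR_MASK


def ticks_until_expire(start: int = 0) -> int:
    """Count ticks from `start` until the register equals EXPIRE_STATE."""
    state, ticks = start, 0
    while state != EXPIRE_STATE:
        state = lfsr_tick(state)
        ticks += 1
        if ticks > 1024:   # guard against a state that never reaches 0x100
            raise RuntimeError("expire state not reached")
    return ticks


cold_boot_ticks = ticks_until_expire(0)
```

In this model 0x100 is the state immediately before 0x001, so the walk from 0 takes 511 ticks to reach 0x100. Counting the reload on the following tick gives 512, which would line up with the 1024 CPU cycles mentioned above if the timer advances once every 2 CPU cycles (again, an assumption here).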

I'd be interested to know if a sufficiently fast power cycle could cause the DMC timer to be non-0 on boot. [Edit: I wonder this because of this post by lidnariq about the noise channel that says "To be clear: the noise LFSR just happens to have all 0s on initial power up. A warm reset (or even an "insufficiently" cold boot) won't reset its contents." Kitrinx also says in the thread linked below "Incidentally, the noise timer is also the same exact setup". It sounds like a fast power cycle could potentially cause any value to be present in the LFSR.]

(See DMC LFSR formula for more information.)
Alyosha_TAS
Posts: 173
Joined: Wed Jun 15, 2016 11:49 am

Re: Isolated Warrior unemulated graphical glitches

Post by Alyosha_TAS »

Ok cool, thanks again for the info, I will try to see if I can match any game play with that even without the blips.