Help me integrate get-put synchronized controller reading

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

tepples
Posts: 22693
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Help me integrate get-put synchronized controller reading

Post by tepples »

The nrom-template, snrom-template, and bnrom-template repositories currently contain a controller reading routine that first reads both controllers, second reads both controllers again, and third discards any button press data that does not match between the two reads. The intent was to provide a sample of a routine that is somewhat robust to input corruption through the DMC DMA bit deletion bug on 60 Hz systems.

It was discovered by a user in the NESdev Discord server that at high playback rates (particularly $0F = 33143 Hz), the DMC DMA bit deletion bug can corrupt both the first and second reads. The symptom, as seen in games like Gimmick!, is occasional presses of Right on the Control Pad when bit deletion causes the 11111111 epilogue to spill into the first eight reads after a strobe. It may, for example, delete an early bit of the first read and a late bit of the second read. In each case, the input bits both get corrupted to 00000001, and because they match, the game accepts them as valid.

A staff member claims that I am doing the NESdev community a disservice by allowing this flawed bit deletion avoidance routine to remain in my repositories. I ought to either not provide bit deletion avoidance at all or use OAM DMA to synchronize to the DMA unit's get-put cycle, based on an idea originally proposed by Rahsennor in May 2016 in topic "Glitch-free controller reads with DMC?". However, I have found a few challenges of a get-put synchronized controller reading routine.

Incompatibility with frameskip
Full Quiet, Garbage Pail Kids, and any future games that we make on the same engine avoid slowdown by replacing it with frameskip. Normally, they alternate code to move game characters with code to draw the updated positions: move, draw, move, draw. If, however, they detect that the previous move and draw combined took longer than the 29,780 cycles of one frame, they run another move without the draw. This helps keep the game responsive even in cases where it would otherwise slow down, reducing perceived input lag. Because the sound driver gets called for each move, the player still gets instant feedback through sound effects even if video is delayed a frame.

The get-put synchronized controller reading routine must execute immediately after OAM DMA. This means that if OAM DMA is not scheduled, such as during a draw frame, the controller cannot be read during that same frame. A few workarounds have been proposed:
  • The first is to run OAM DMA outside vertical blanking, relying on the data not being written anywhere. I fear what effect this might have on the internal OAM address (and therefore on object evaluation) during the scanlines where OAM DMA is taking place. I also fear OAM DMA spilling into the next frame's vblank if lag gets deep enough that calculation completes somewhere in lines 237 through 240.
  • The second is to double-buffer shadow OAM. One problem with this is that my object plotting routine is already using register Y to index into a metasprite shape definition in ROM using (zp),Y mode, leaving register X free to index into a shadow OAM table at a fixed address using abs,X mode. Double-buffering shadow OAM would require shadow OAM table's address not to be fixed. This would probably need to use the rarely used (zp,X) mode with constant X = $00.
  • The third is to optimize my code better. I don't know how to do this. Collision detection for the player, the player's projectile, and ten on-screen enemies with a background collision map at 8×4-pixel granularity takes a while, and I don't know how to search the collision map any faster than I already do. How much do you charge for this service?
  • The fourth is to just accept increased input lag. I currently do this on every detected corruption, using the previous frame's data. I'd end up having to do this on every lag frame, reducing the overall input polling rate to 30 Hz. If samples are used for percussion, as in Konami games and Garbage Pail Kids, it should still be safe to poll the controller once the sample length has run out ($4015 bit 4 = 0). If samples are used for bass, chords, or other sustained sounds, as in later Sunsoft games, there is so much DMC active time that lag becomes more noticeable.
Post-vblank mapper preparation
Full Quiet, Garbage Pail Kids, and any future games that we make on the same engine perform a handful of tasks after OAM DMA. These can spill over into the pre-render scanline and scanline 0 of the picture, which are never and practically never visible respectively. They mostly involve setting up the MMC3's timer and CHR banks. I would need to make them constant-timed so as not to disrupt synchronization.

IRQs landing during controller reading
Full Quiet, Garbage Pail Kids, and any future games that we make on the same engine scroll in eight directions. This means the raster split points move up and down while jumping or climbing ladders. If there is a raster split near the top of the screen, it could interrupt controller reading and thereby cause it to lose synchronization. I would need to hide the background in the top scanlines so that raster splits can never occur there.

Integration testing
What emulators can be set to log or break when $4016 or $4017 accesses occur on the wrong half of the get-put cycle?
User avatar
TakuikaNinja
Posts: 76
Joined: Mon Jan 09, 2023 6:42 pm
Location: New Zealand
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by TakuikaNinja »

One weird thing I ran into is that the polling routine in the FDS BIOS has a loop branch crossing a page boundary, which somehow allows their "expansion port + reread loop" to work reliably. (Someone please prove me wrong...)
I know this is the case because Dr. Mario has the same code, and making it use the aforementioned reread routine causes the input corruption you've mentioned. Moving the code so the polling part crosses a page boundary managed to avoid any spurious right inputs for the entire duration I was testing it (~2 hours). I'll paste the relevant code here soon.
User avatar
TakuikaNinja
Posts: 76
Joined: Mon Jan 09, 2023 6:42 pm
Location: New Zealand
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by TakuikaNinja »

Here are the relevant routines from the BIOS disassembly:

Code: Select all

;  poll controllers and expansion port
ReadPads:
$E9EB	LDX $FB
$E9ED	INX
$E9EE	STX $4016
$E9F1	DEX
$E9F2	STX $4016
$E9F5	LDX #$08
$E9F7	LDA $4016
$E9FA	LSR A
$E9FB	ROL $F5
$E9FD	LSR A
$E9FE	ROL $00
$EA00	LDA $4017
$EA03	LSR A
$EA04	ROL $F6
$EA06	LSR A
$EA07	ROL $01
$EA09	DEX
$EA0A	BNE $E9F7;  note the page cross here
$EA0C	RTS

; combine expansion port inputs
OrPads:
$EA0D	LDA $00
$EA0F	ORA $F5
$EA11	STA $F5
$EA13	LDA $01
$EA15	ORA $F6
$EA17	STA $F6
$EA19	RTS

; expansion port + reread loop
ReadOrDownVerifyPads:
$EA4C	JSR ReadPads
$EA4F	JSR OrPads
$EA52	LDY $F5
$EA54	LDA $F6
$EA56	PHA
$EA57	JSR ReadPads
$EA5A	JSR OrPads
$EA5D	PLA
$EA5E	CMP $F6
$EA60	BNE $EA52
$EA62	CPY $F5
$EA64	BNE $EA52
$EA66	BEQ $EA25; unconditional branch to exit
Zeta was suggesting that "a reread with very clever alignment is a valid, if difficult, alternate strategy to merely making the reread quick" after I concluded my testing session. I don't claim that this is correct but I thought it would be useful to mention for the sake of research.
Fiskbit
Posts: 861
Joined: Sat Nov 18, 2017 9:15 pm

Re: Help me integrate get-put synchronized controller reading

Post by Fiskbit »

tepples wrote: Thu Nov 16, 2023 11:34 am A staff member claims that I am doing the NESdev community a disservice by allowing this flawed bit deletion avoidance routine to remain in my repositories. I ought to either not provide bit deletion avoidance at all or use OAM DMA to synchronize to the DMA unit's get-put cycle, based on an idea originally proposed by Rahsennor in May 2016 in topic "Glitch-free controller reads with DMC?".
Let's be clear here: I think that providing a function that looks like it works around the DMC glitch but doesn't, and in a template that largely used by new or less-experienced developers, is misleading at best. It makes developers think the problem is solved when it isn't, while incurring a cycle penalty for the majority of users who aren't using DMC because the template expects everyone to use repeated reads. The comment you added after learning the function doesn't work properly doesn't even say that it doesn't work properly, so a user reading it still won't know there's a problem.

I've suggested multiple options to you before you made this post:
1. At minimum, add a clear comment stating that the solution is not reliable and link to other solutions on the wiki. Because you don't want to make any code changes without extensive testing, this is by far the easiest solution while still being somewhat effective. (However, your test burden is higher than that of most hobbyist developers and leads to situations like this where you'd rather offer code with a known bug than fix it at risk of introducing a different bug. I think this is an unnecessarily burden.)

2. Change the code to a similar, but bug-free solution. This would require repeatedly reading and verifying one single controller, then doing the same on the other controller. As long as two passes (from first joypad read to last joypad read) take less time than 428 cycles, it is safe. The problem with your routine is that doing the controllers both in the same verification pass takes too long. Because this solution is similar, it comes with less risk. However, the majority of users not using DMC still have a cycle penalty.

3. Use the OAM synced read solution from the wiki. The code can be used as-is. It has two caveats: it (like OAM DMA) should not be done on lag frames, and the branch must not cross a page boundary. This is a different model than your current solution and thus presumably a riskier change, but it is known to work and fast for all users.
However, I have found a few challenges of a get-put synchronized controller reading routine.
The unfortunate reality is that no one solution works for all use cases. The strict timing requirements of Zeta's sawtooth code makes synced reads impractical or impossible. I think your frameskip and IRQ cases (and maybe the post-vblank work, depending on the complexity of the work) are cases where this solution is difficult or impossible, though repeated reads are also not reliable if they can be interrupted with IRQs. However, we're talking about the controller reading code in your NROM, SNROM, and BNROM templates. None of these offer IRQs, and templates like this are presumably beginner-oriented, with users that are very unlikely to be doing the complex tasks you or Zeta are undertaking. Anyone at your skill level doesn't need to rely on the code in the template.
Integration testing
What emulators can be set to log or break when $4016 or $4017 accesses occur on the wrong half of the get-put cycle?
None that I know of. However, in the newest version of Mesen, a breakpoint on $4016-4017 reads with condition isDma should trigger on corruption, even in the case where a joypad read is erroneously caused by DMA interrupting a CPU read in the range of $4000-401F.


Also, for what it's worth, I think these templates are fantastic and a great tool for new users, and I really appreciate that you offer resources like this to people. It is because we so enthusiastically recommend them to new users that I think this issue should be fixed or at least clearly documented within the template.
User avatar
Memblers
Site Admin
Posts: 4042
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by Memblers »

Why not both? OAM synced read on completed frames, and slower verified reads on lag frames. Idle cycles on a lag frame, and some free PRG-ROM for code both seem like they would be abundant things. I might just adopt that myself.

Answering my own question, I guess in a template it might appear questionable to someone who doesn't know what's going on there. As well as requiring standardized lag frame handling, which is more work.
User avatar
TakuikaNinja
Posts: 76
Joined: Mon Jan 09, 2023 6:42 pm
Location: New Zealand
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by TakuikaNinja »

The other thing is, rereading the controllers one at a time is the strategy SMB3 uses. That seems to be safe afaik.
User avatar
tokumaru
Posts: 12424
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Help me integrate get-put synchronized controller reading

Post by tokumaru »

I had this happen to me and, if I remember correctly, I simply unrolled the controller-reading loop to make it fast enough even for the fastest DMC rate. I personally am not a big fan of the OAM DMA trick, because it's not particularly versatile (it affects the structure of your NMI handler, you still need a fallback if you read controllers while rendering is off, etc.) so I'm fine with double reads.
tepples
Posts: 22693
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by tepples »

Fiskbit wrote: Thu Nov 16, 2023 5:11 pmI've suggested multiple options to you before you made this post:
1. At minimum, add a clear comment stating that the solution is not reliable and link to other solutions on the wiki.
I thought that I had already added such a comment before creating this topic. The comment in the latest commit as of this writing gives the URL of another solution that works "even more reliably". Quoted for convenience:

Code: Select all

; In 2016, Rahsennor invented a way to synchronize to OAM DMA that
; even more reliably avoids read corruption in many cases.
; https://www.nesdev.org/wiki/Controller_reading_code#DPCM_Safety_using_OAM_DMA
I'm trying to understand what I failed to do. Is it that I used language with inadequate force? Or is it that links in assembly language source code are not clickable, and I should add a clickable link in README.md as well?
Fiskbit wrote: Thu Nov 16, 2023 5:11 pm2. Change the code to a similar, but bug-free solution. This would require repeatedly reading and verifying one single controller, then doing the same on the other controller. As long as two passes (from first joypad read to last joypad read) take less time than 428 cycles, it is safe.
So that I may begin to write "a similar, but bug-free solution": How many times did you have in mind to strobe the controller by writing 1 then 0 to $4016? Four, two for player one and two for player two?
Fiskbit wrote: Thu Nov 16, 2023 5:11 pm3. Use the OAM synced read solution from the wiki. The code can be used as-is.
The code can be used as is, but may it? Who is its author and under what terms may I incorporate it into my own works?
Memblers wrote: Thu Nov 16, 2023 7:25 pmWhy not both? OAM synced read on completed frames, and slower verified reads on lag frames. Idle cycles on a lag frame, and some free PRG-ROM for code both seem like they would be abundant things.
You are correct that I have "some free PRG-ROM for code." However, I would then also need to keep the code to switch to the PRG ROM bank containing the controller reading code synchronized. The treemaps of fixed banks in my last three projects with Retrotainment show that PRG ROM space in a fixed bank isn't quite abundant. (Full disclosure: The "Slack" space in each of the treemaps represents free space. Much of it may be tiny blocks of free space and/or space occupied by debug hooks that are conditioned out in release builds.)
Fiskbit
Posts: 861
Joined: Sat Nov 18, 2017 9:15 pm

Re: Help me integrate get-put synchronized controller reading

Post by Fiskbit »

I find that comment very unclear because nothing earlier suggests that the solution in the template isn't a reliable workaround. If the new one is more reliable, what makes the older one less reliable? I think the fact that your implementation of repeated reads does not solve the problem under standard use cases is a major problem worth pointing out explicitly.

Yes, the "similar, but bug-free solution" would strobe, read controller 1, strobe, read controller 1, compare, and (optionally) repeat. Then you would do the same for controller 2. For a solution that reads a single controller indefinitely until 2 consecutive results match, I need to do some analysis to see if it's possible (and if so, how likely) for it to never terminate. Because I'm very busy working on a compo game right now, this will have to wait until next week.

The contents of the wiki are supposed to be public domain unless otherwise stated. If material on the wiki is not public domain, it should be removed. For the other wikis, we explicitly note on the bottom of each page that the material is CC0 unless otherwise stated (public domain, or as close as possible in jurisdictions where not permissible); I don't currently know if we can legally reclassify the NES wiki as CC0 like the others.
tepples
Posts: 22693
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by tepples »

Is this commit any good? I ran it at the same time as a rate $4F (looping, max rate) DMC playback, and there were no $4016-4017 reads with condition isDma.
Drag
Posts: 1604
Joined: Mon Sep 27, 2004 2:57 pm
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by Drag »

What's the problem with using a byte comparison? It's hard to detect a bit deletion when you only check one bit at a time, and simply comparing the full byte and rereading until you get a match was fine enough for most games.

I understand that we have a desire to iterate on the solutions to already-solved problems, but it's always possible that the "lame boring original" solution is the best one. :P
Fiskbit
Posts: 861
Joined: Sat Nov 18, 2017 9:15 pm

Re: Help me integrate get-put synchronized controller reading

Post by Fiskbit »

This isn't an already-solved problem; it's been discovered that repeated-read solutions that take longer than 1 DMC period to perform a pair of complete reads can miss errors, leading to issues like spurious start presses (and, more obviously, right presses). This affects real games (eg Gimmick) and homebrew software (eg Pino's templates), and the behavior can differ by emulator (eg may require emulation of the bug where DMA can trigger 2A03 register reads) and by console model (RF and Twin Famicoms has different joypad bit deletion behavior).

Repeated read solutions also corrupt the DMC DMA value on about 3/32 collisions, and 2 of those 3 cases also cause a bus conflict between cartridge ROM and the joypad inverters. There are lots of reasons I think this is an inferior solution. After the compo wraps up on Sunday night, I plan to make a post in the old synced reads thread about the latest findings.
tepples
Posts: 22693
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by tepples »

When reading the controller twice takes more than 428 cycles, two successive reads may both trigger bit deletions. Because a bit is deleted both times, the deletion is not detected, resulting in a phantom Right press. As I wrote in the OP:
at high playback rates (particularly $0F = 33143 Hz), the DMC DMA bit deletion bug can corrupt both the first and second reads. The symptom, as seen in games like Gimmick!, is occasional presses of Right on the Control Pad when bit deletion causes the 11111111 epilogue to spill into the first eight reads after a strobe. It may, for example, delete an early bit of the first read and a late bit of the second read. In each case, the input bits both get corrupted to 00000001, and because they match, the game accepts them as valid.
Rereading is strictly better than nothing. It's also strictly worse than get-put synchronized reading except in the one case of needing to read the controller on a lag frame. If you know you'll need to read the controller on a lag frame, you'll need to fit both the first and second reads in less than 428 cycles. That's doable for one NES controller, less so for a Super NES controller, and impossible for things where strobing has a side effect (such as the mouse or the Arkanoid controller).
Drag
Posts: 1604
Joined: Mon Sep 27, 2004 2:57 pm
Contact:

Re: Help me integrate get-put synchronized controller reading

Post by Drag »

Speaking strictly from a "I want the joypads to be read correctly" standpoint, it should be possible to read a standard NES controller three times back-to-back within one period of the fastest DMC period setting ($F = 54 cycles per 1-bit sample * 8 samples per byte = 432 cycles per DMC fetch), even when using a routine that accounts for the Famicom expansion controller port:

Code: Select all

; Time starts the first moment $4016 is physically read
; and ends on the moment of its final read.
LDX #$00
LDY #$01

STY $00
STY $4016
STX $4016
loop1:
LDA $4016	;4 (-3, time starts on final cycle of this instruction)
AND #$03	;2
CMP #$01	;2
ROL $00		;5
BCC loop1	;3 (-1 on final loop)
		; 16 * 8 - 4 = 124
STY $01		;3
STY $4016	;4
STX $4016	;4
		; = 135
loop2:
LDA $4016	;4
AND #$03	;2
CMP #$01	;2
ROL $01		;5
BCC loop2	;3 (-1 on final loop)
		; 16 * 8 - 1 = 127
STY $02		;3
STY $4016	;4
STX $4016	;4
		; 135 + 138 = 273
loop3:
LDA $4016	;4 (time stops on final cycle of this instruction on final loop)
AND #$03	;2
CMP #$01	;2
ROL $02		;5
BCC loop3	;3
		; 16 * 7 + 4 = 116
		; 135 + 138 + 116 = 389 of 432 CPU cycles.
Unless I've made a mistake with this code or this math, there's a maximum of just one DMC-related bit deletion across these three reads, so there should usually be two that match. If none match (like if button bouncing is occurring), then the process could be repeated, or the inputs from the previous frame can be reused.

Also, assuming the exact physical $4016 read cycle is the specific cycle the DMC needs to interrupt, that's 1 cycle out of 432, so even a strategy of just rereading pairs over and over until you get a match seems reasonable when the DMC will interfere only 16/432=3.7% of the time. It seems realistic that you may be able to do a pair of 12-bit reads from a SNES controller within 432 cycles, maybe just not a full 16-bit read.

You're still right that you'll need something more technologically advanced if you want to accommodate something other than standard NES controllers while DMC is playing at its highest frequency though, it's just that I'm not sure how common that is.
Fiskbit
Posts: 861
Joined: Sat Nov 18, 2017 9:15 pm

Re: Help me integrate get-put synchronized controller reading

Post by Fiskbit »

Your cycle counts are sound. Note, though, that some of the period time is spent on DMC DMA, so the code actually becomes unsafe at period minus 4 cycles (428, not 432). Also, you are correct that for our purposes here, DMA has to interrupt the exact cycle on which the joypad is read to delete a bit. (If DMA interrupts any CPU read from $4000-401F, dummy reads included, this can cause a joypad or $4015 read, but the only such reads here are $4016.)

There's also a pathological case where corruption and input changes occur on multiple frames in a row. If input for these frames is discarded, this can cause a button press during this window to be delayed the entire duration, or cause two presses that are deliberately at different times to end up being registered at the same time. It's probably very low odds of mattering, though, because input timing is already somewhat uncertain at lengths of around one frame because the user doesn't know when input is actually polled. (Indeed, under normal circumstances, a one-frame button press could be seen by the software as 2 frames or not seen at all, depending on how things line up.)
Post Reply