A user in the NESdev Discord server discovered that at high playback rates (particularly $0F = 33143 Hz), the DMC DMA bit deletion bug can corrupt both the first and second reads. It may, for example, delete an early bit of the first read and a late bit of the second read. In either case, bit deletion causes the 11111111 epilogue to spill into the first eight reads after a strobe, so both reads get corrupted to 00000001, and because they match, the game accepts them as valid. The symptom, as seen in games like Gimmick!, is occasional presses of Right on the Control Pad.
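The double corruption can be reproduced in a small model. The following is a minimal sketch in Python, not 6502 code and not the routine under discussion. It assumes a standard controller that reports A first and Right last, followed by 1 bits, and that a DMC DMA landing on a read clocks one extra bit out of the shift register. The names `read_pad` and `accept_if_matching` are hypothetical.

```python
def read_pad(buttons, deleted=None):
    """Return the byte seen after eight reads of the controller port.

    buttons: 8-bit state, A in bit 7 through Right in bit 0.
    deleted: index (0-7) of the read on which an extra clock
             discards a bit, or None for a clean read.
    """
    # Bits the shift register delivers: eight buttons, then the epilogue of 1s.
    stream = [(buttons >> (7 - i)) & 1 for i in range(8)] + [1] * 8
    pos = 0
    value = 0
    for read_index in range(8):
        if read_index == deleted:
            pos += 1  # assumed model: the extra clock skips one bit
        value = (value << 1) | stream[pos]
        pos += 1
    return value


def accept_if_matching(buttons, deleted_first, deleted_second):
    """Read twice and accept the result only if both reads agree."""
    first = read_pad(buttons, deleted_first)
    second = read_pad(buttons, deleted_second)
    return first if first == second else None


# No buttons held; an early bit is deleted from the first read
# and a late bit from the second. Both reads agree on "Right".
print(format(accept_if_matching(0x00, 1, 6), "08b"))  # prints 00000001
```

With nothing held, a deletion at any position produces the same byte, which is why comparing two reads cannot catch this case.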
A staff member claims that I am doing the NESdev community a disservice by allowing this flawed bit deletion avoidance routine to remain in my repositories. I ought to either not provide bit deletion avoidance at all or use OAM DMA to synchronize to the DMA unit's get-put cycle, based on an idea originally proposed by Rahsennor in May 2016 in topic "Glitch-free controller reads with DMC?". However, I have found a few challenges with a get-put synchronized controller reading routine.
Incompatibility with frameskip
Full Quiet, Garbage Pail Kids, and any future games that we make on the same engine avoid slowdown by replacing it with frameskip. Normally, they alternate code to move game characters with code to draw the updated positions: move, draw, move, draw. If, however, they detect that the previous move and draw combined took longer than the 29,780 cycles of one frame, they run another move without the draw. This helps keep the game responsive even in cases where it would otherwise slow down, reducing perceived input lag. Because the sound driver gets called for each move, the player still gets instant feedback through sound effects even if video is delayed a frame.
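The scheduling rule can be sketched as follows. This is an illustrative Python model, not the engine's code: it assumes the cost of each move and draw is known in CPU cycles, whereas how the engine actually detects an overrun is not described here. The name `schedule` is hypothetical.

```python
FRAME_CYCLES = 29780  # CPU cycles in one frame, the figure given above


def schedule(costs):
    """Return the sequence of steps run for a list of game ticks.

    costs: list of (move_cycles, draw_cycles), one pair per tick.
    A tick that follows an overrunning move+draw runs its move only.
    """
    steps = []
    lagging = False
    for move_cycles, draw_cycles in costs:
        steps.append("move")
        if lagging:
            # The previous move and draw overran one frame:
            # skip this draw to catch up.
            lagging = False
            continue
        steps.append("draw")
        lagging = move_cycles + draw_cycles > FRAME_CYCLES
    return steps


ticks = [(10000, 10000), (20000, 15000), (10000, 10000), (10000, 10000)]
print(schedule(ticks))
# prints ['move', 'draw', 'move', 'draw', 'move', 'move', 'draw']
```

The second tick costs 35,000 cycles, so the third tick runs its move without a draw.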
The get-put synchronized controller reading routine must execute immediately after OAM DMA. This means that if OAM DMA is not scheduled, such as during a draw frame, the controller cannot be read during that same frame. A few workarounds have been proposed:
- The first is to run OAM DMA outside vertical blanking, relying on the data not being written anywhere. I fear what effect this might have on the internal OAM address (and therefore on object evaluation) during the scanlines where OAM DMA is taking place. I also fear OAM DMA spilling into the next frame's vblank if lag gets deep enough that calculation completes somewhere in lines 237 through 240.
- The second is to double-buffer shadow OAM. One problem with this is that my object plotting routine is already using register Y to index into a metasprite shape definition in ROM using (zp),Y mode, leaving register X free to index into a shadow OAM table at a fixed address using abs,X mode. Double-buffering shadow OAM would require the shadow OAM table's address not to be fixed. This would probably need to use the rarely used (zp,X) mode with constant X = $00.
- The third is to optimize my code better. I don't know how to do this. Collision detection for the player, the player's projectile, and ten on-screen enemies with a background collision map at 8×4-pixel granularity takes a while, and I don't know how to search the collision map any faster than I already do. How much do you charge for this service?
- The fourth is to just accept increased input lag. I currently do this on every detected corruption, using the previous frame's data. I'd end up having to do this on every lag frame, reducing the overall input polling rate to 30 Hz. If samples are used for percussion, as in Konami games and Garbage Pail Kids, it should still be safe to poll the controller once the sample length has run out ($4015 bit 4 = 0). If samples are used for bass, chords, or other sustained sounds, as in later Sunsoft games, there is so much DMC active time that lag becomes more noticeable.
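The decision in the fourth workaround can be sketched as below. This is a Python model under stated assumptions, not a tested routine: `oam_dma_ran` stands for "the synchronized read is available this frame", `read_pad` is a hypothetical callable that performs the actual controller read, and `input_for_frame` is a hypothetical name.

```python
DMC_ACTIVE = 0x10  # $4015 bit 4: set while the DMC sample has bytes remaining


def input_for_frame(oam_dma_ran, status_4015, read_pad, previous):
    """Pick this frame's controller state.

    oam_dma_ran: True if OAM DMA ran, so the read can be synchronized.
    status_4015: value read from $4015.
    read_pad:    callable returning a fresh 8-bit controller state.
    previous:    the state accepted on the previous frame.
    """
    if oam_dma_ran:
        return read_pad()  # synchronized read
    if (status_4015 & DMC_ACTIVE) == 0:
        return read_pad()  # sample has run out, so no DMC DMA can interfere
    return previous        # lag frame with DMC active: reuse old data


# Lag frame while a sample is playing: the previous state is reused.
print(hex(input_for_frame(False, 0x10, lambda: 0x80, 0x01)))  # prints 0x1
```

With percussion-only samples the last branch is rarely taken. With sustained samples it is taken on nearly every lag frame, which is where the added lag comes from.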
Full Quiet, Garbage Pail Kids, and any future games that we make on the same engine perform a handful of tasks after OAM DMA. These can spill over into the pre-render scanline and scanline 0 of the picture, which are never and practically never visible respectively. They mostly involve setting up the MMC3's timer and CHR banks. I would need to make them constant-timed so as not to disrupt synchronization.
IRQs landing during controller reading
Full Quiet, Garbage Pail Kids, and any future games that we make on the same engine scroll in eight directions. This means the raster split points move up and down while jumping or climbing ladders. If there is a raster split near the top of the screen, it could interrupt controller reading and thereby cause it to lose synchronization. I would need to hide the background in the top scanlines so that raster splits can never occur there.
Integration testing
What emulators can be set to log or break when $4016 or $4017 accesses occur on the wrong half of the get-put cycle?