6.nmi_disable #7 + 7.nmi_timing #3

Discuss emulation of the Nintendo Entertainment System and Famicom.

Moderator: Moderators

Post Reply
L. Spiro
Posts: 14
Joined: Sun Jan 29, 2023 11:27 pm

6.nmi_disable #7 + 7.nmi_timing #3

Post by L. Spiro »

I’m grouping these together because I suspect there is a common separate issue causing both of them. This is only my suspicion so they might also be separate individual issues.

These are in the vbl_nmi_timing tests.
6.nmi_disable #7 is 7) NMI should occur when disabled 2 PPU clocks after VBL
7.nmi_timing #3 is 3) NMI occurred 2 PPU clocks too early

It’s the 2nd that makes me think something very unhip is happening, because my V-blank is definitely not occurring any number of PPU cycles off.
In this document (taken using 7.nmi_timing, which suggests my NMI timing is off) I have printed every PPU cycle where V is set, every read to $2002, each read that results in suppression, and each NMI triggination:
https://www.dropbox.com/s/v8hltpq7r5flc ... g.txt?dl=0

V is set on 241,1 (with an NMI) and cleared on 261,1, and nothing can change these PPU timings. As the print-out shows, there are the correct number of PPU cycles taking place.
So this should mean the issue must either be:
#1: That the CPU has lost count of correct cycles while running the tests or;
#2: There is an edge case I am not covering such as a 2-cycle delay between something being set on the CPU/PPU and being seen by the PPU/CPU.

The problem with #1 is:
#1: My CPU passes all CPU tests, including all 2,550,000 of these: https://github.com/TomHarte/ProcessorTe ... in/nes6502
Every single opcode is verified to be completely cycle-accurate, and each cycle is verified to be correctly a read or write cycle, each read and write reads from or writes to the correct address, and all register and memory side-effects are known to be correct for every single instruction.
#2: It doesn’t seem to have any timing issues with the previous checks that the vbl_nmi_timing test ROM’s make. For 6.nmi_disable, I got stuck on previous error codes and had to make modifications to get past them. The modifications made sense and it seemed as if the whole test ROM was working perfectly fine. Until it gets to test #7 and refuses to succeed even though I definitely don’t disable NMI if $2002 is read on 241,3+. Rather, it appears in those cases that NMI doesn’t fire because the NMI flag is not set.

BUT I don’t have a working APU yet, so the only interrupt happening is for NMI. The APU interrupts didn’t seem to cause troubles in the previous checks for each ROM and I can’t imagine that the APU would be set to send interrupts during CPU cycle-timing operations.



Could it be a CPU cycle-timing issue?
An edge-case where some register read/write is supposed to be delayed by some cycles?

Why is the NMI flag disabled during most of the reads in the linked log file? For example:

Code: Select all

VBlank Set: 261, 0.  Frame: 20
Write $2000 NMI ON: 240, 299.  Frame: 21
NMI TRIGGINATED: 241, 1.  Frame: 21
VBlank Set: 241, 1.  Frame: 21
VBlank Set: 241, 2.  Frame: 21
VBlank Set: 241, 3.  Frame: 21
…
REad $2002: 240, 324.  Frame: 22
NMI NOT TRIGGINATED DUE TO FLAG: 241, 1.  Frame: 22
VBlank Set: 241, 1.  Frame: 22
VBlank Set: 241, 2.  Frame: 22
VBlank Set: 241, 3.  Frame: 22
VBlank Set: 241, 4.  Frame: 22
REad $2002: 241, 4.  Frame: 22 <- The read doesn’t suppress NMI, but the NMI flag isn’t set.
Write $2000 NMI OFF: 241, 46.  Frame: 22
What could be the issue?


L. Spiro
L. Spiro
Posts: 14
Joined: Sun Jan 29, 2023 11:27 pm

Re: 6.nmi_disable #7 + 7.nmi_timing #3

Post by L. Spiro »

https://github.com/L-Spiro/BeesNES/blob ... NPpu2C0X.h

This is my code for reading $2002.

Code: Select all

		static void LSN_FASTCALL						Read2002( void * _pvParm0, uint16_t /*_ui16Parm1*/, uint8_t * /*_pui8Data*/, uint8_t &_ui8Ret ) {
			CPpu2C0X * ppPpu = reinterpret_cast<CPpu2C0X *>(_pvParm0);
			// Reading $2002 within a few PPU clocks of when VBL is set results in special-case behavior.
			if ( ppPpu->m_ui16CurY == (_tPreRender + _tRender + _tPostRender) ) {
				// Reading one PPU clock before reads it as clear and never sets the flag or generates NMI for that frame.
				if ( ppPpu->m_ui16CurX == (1 - 1) ) {
					ppPpu->m_bSuppressNmi = true;
					ppPpu->m_psPpuStatus.s.ui8VBlank = 0;
					//::OutputDebugStringA( ("ULTIMATE MOVE: SUPPRESSION!!\r\n") );
				}
				else if ( ppPpu->m_ui16CurX == (1 - 0) || ppPpu->m_ui16CurX == (1 + 1) ) {
					// Reading on the same PPU clock or one later reads it as set, clears it, and suppresses the NMI for that frame.
					ppPpu->m_bSuppressNmi = true;
					ppPpu->m_psPpuStatus.s.ui8VBlank = 1;
					//::OutputDebugStringA( ("ULTIMATE MOVE: SUPPRESSION!!\r\n") );
				}
			}

			//::OutputDebugStringA( ("REad $2002: " + std::to_string( ppPpu->m_ui16CurY ) + ", " + std::to_string( ppPpu->m_ui16CurX ) + ".  Frame: " + std::to_string( ppPpu->m_ui64Frame ) + "\r\n").c_str() );


			ppPpu->m_ui8IoBusLatch = (ppPpu->m_ui8IoBusLatch & 0x1F) | (ppPpu->m_psPpuStatus.ui8Reg & 0xE0);
			_ui8Ret = ppPpu->m_ui8IoBusLatch;
			// Reads cause the v-blank flag to reset.
			ppPpu->m_psPpuStatus.s.ui8VBlank = 0;
			ppPpu->m_pnNmiTarget->ClearNmi();
			// The address latch also gets reset.
			ppPpu->m_bAddresLatch = false;

			/*char szBuffer[256];
			std::sprintf( szBuffer, "Read2002: %.2X Frame: %u [%u,%u]\r\n", _ui8Ret, uint32_t( ppPpu->m_ui64Frame ), ppPpu->GetCurrentRowPos(), ppPpu->GetCurrentScanline() );
			::OutputDebugStringA( szBuffer );*/
		}


On PPU [241,1], TriggerNmi() is called.

Code: Select all

		/**
		 * Unless NMI is suppressed, this sets the VBL flag and triggers NMI.
		 */
		inline void										TriggerNmi() {
			if ( !m_bSuppressNmi ) {
				m_psPpuStatus.s.ui8VBlank = 1;
				if ( m_pcPpuCtrl.s.ui8Nmi ) {
					m_pnNmiTarget->Nmi();
					//::OutputDebugStringA( ("NMI TRIGGINATED: " + std::to_string( m_ui16CurY ) + ", " + std::to_string( m_ui16CurX ) + ".  Frame: " + std::to_string( m_ui64Frame ) + "\r\n").c_str() );
				}
				else {
					//::OutputDebugStringA( ("NMI NOT TRIGGINATED DUE TO FLAG: " + std::to_string( m_ui16CurY ) + ", " + std::to_string( m_ui16CurX ) + ".  Frame: " + std::to_string( m_ui64Frame ) + "\r\n").c_str() );
				}
			}
			else {
				//::OutputDebugStringA( ("NMI NOT TRIGGINATED DUE TO SUPPRESSION: " + std::to_string( m_ui16CurY ) + ", " + std::to_string( m_ui16CurX ) + ".  Frame: " + std::to_string( m_ui64Frame ) + "\r\n").c_str() );
			}
		}
When m_pnNmiTarget->Nmi(); is called on the CPU, it sets a bool that is basically the NMI status line. This is basically the PPU setting an NMI flag.

Code: Select all

	/**
	 * Notifies the class that an NMI has occurred.
	 */
	void CCpu6502::Nmi() {
		m_bNmiStatusLine = true;
		//m_ui8NmiCounter = 7;
	}
This status line is checked by the CPU each cycle, and, it set, it sets an internal CPU flag that tells the CPU to go to the NMI handler on the next instruction.

Code: Select all

	/**
	 * Performs a single cycle update.
	 */
	void CCpu6502::Tick() {
		(this->*m_pfTickFunc)();

		//m_bHandleNmi |= (m_bNmiStatusLine && --m_ui8NmiCounter == 0);
		m_bHandleNmi |= m_bNmiStatusLine;
		m_bHandleIrq |= m_bIrqStatusLine;

		++m_ui64CycleCount;
	}
Setting the internal flag, m_bHandleNmi is done at the end of the current CPU cycle.
So if the NMI status line was set at the end of the next CPU cycle, m_bHandleNmi gets set. We have a delay of 1 CPU cycle before the NMI can be handled because of this. This should be correct behavior.
The internal CPU flag is checked at the start of the next instruction.

Code: Select all

	/** Fetches the next opcode and begins the next instruction. */
	void CCpu6502::Tick_NextInstructionStd() {
		if ( m_bHandleNmi ) {
			m_bNmiStatusLine = false;
			BeginInst( LSN_SO_NMI );
		}
		else if ( m_bHandleIrq ) {
			m_bIrqStatusLine = false;
			BeginInst( LSN_SO_IRQ );
		}
		else {
			FetchOpcodeAndIncPc();
		}
	}
Now that I laid this all out I am seeing that it does not quite accurately model what is mentioned here: https://www.nesdev.org/wiki/CPU_interrupts
#1: If the PPU trigger NMI edge on the last CPU cycle of an instruction, m_bHandleNmi gets set at the end of that instruction and then NMI gets handled immediately after, so there is in fact not a delay of a single cycle that allows me to adhere to "As can be deduced from above, it's really the status of the interrupt lines at the end of the second-to-last cycle that matters."
#2: I really need to do proper edge detection on NMI instead of this: m_bHandleNmi |= m_bNmiStatusLine;
That was supposed to be a placeholder.

So I will fix those when I get home and then report back that it didn't fix anything and I still have this problem.


L. Spiro
L. Spiro
Posts: 14
Joined: Sun Jan 29, 2023 11:27 pm

Re: 6.nmi_disable #7 + 7.nmi_timing #3

Post by L. Spiro »

https://github.com/L-Spiro/BeesNES/comm ... 1a2382b012

NMI is now properly edge-detected although that didn’t impact anything.
It is now properly delayed by 1 cycle, which can push it back by an instruction when it occurs on the last cycle of the current instruction. IE, it matters what the status line says on the second-to-last cycle, as mentioned on the Wiki.

The error changed from 7.nmi_timing #3 to 7.nmi_timing #7:
3) NMI occurred 2 PPU clocks too early
7) NMI occurred 1 PPU clock too late

So all I did was move detection back by 1 3-cycle instruction. It’s still equally wrong.

Ideathoughts?


L. Spiro
notyourdad
Posts: 16
Joined: Sun Mar 12, 2023 5:49 am

Re: 6.nmi_disable #7 + 7.nmi_timing #3

Post by notyourdad »

I haven't tried to grok all of your code, but I'll share my experience with this. I also struggled a lot with getting the NMI timing tests to work in my NYDES FPGA emulator. Until I started honoring the following note from the CPU interrupts document, I couldn't eliminate all of the off-by-1 issues:
The NMI input is connected to an edge detector. This edge detector polls the status of the NMI line during φ2 of each CPU cycle...
For my design, I found that I could not perform edge detection right before running a CPU cycle. Rather, I had to perform it in a plausible place for the phi2 tick. I tried several options, but I ended up getting it to work by performing interrupt detection after the first PPU cycle following a CPU cycle. I'm afraid to send you on a wild goose chase trying to replicate my approach, but I'll share the order of operations for each full CPU / 3xPPU cycle in my system.
  1. Perform a CPU tick (a single fetch, decode, execute, etc.)
  2. Perform a single APU and PPU tick simultaneously (either can depend on register requests queued up by the CPU in step 1)
  3. Run a mapper "tick" to check for interrupts (used by MMC3 address sniffing)
  4. Perform interrupt edge / level detection (either from the APU/PPU in step 2 or the mapper in step 3)
  5. Perform a single PPU tick
  6. Perform a single PPU tick
  7. Go back to 1
Before any tick can proceed (steps 1, 2, 5, 6), the PRG or CHR bus (for CPU/APU and PPU, respectively) must be ready with data from the previous request, if any. In software, you'll probably not have to deal with memory requests being asynchronous.

The interrupt detectors in step 4 store the interrupt "armed" states. The CPU tick in step 1 tracks whether to enter an interrupt "request" state based on the "armed" state being true. The "request" state will get handled on the following CPU tick. Finally, entering the request state can be prevented on the third cycle of a taken branch with no page crossing:
Interrupts are always polled before the second CPU cycle (the operand fetch), but not before the third CPU cycle on a taken branch.
For my design, I found that this delicate dance was the heart of the order of operations. I had to adjust all the other timing problems (vsync, APU interrupts, MMC3 interrupts, etc.) around it. It's certainly probable that other orders of operations will work too, but this is where my system settled.

This ordering should certainly not be treated as canonical, because it fits only a subset of CPU/PPU alignments. With this approach, I got all the blargg timing tests working, but the scanline test still shows some artifacts. It appears that FCEUX passes this scanline test, but Mesen and BizHawk both show some artifacts too, for whatever that's worth.

[edit: fixed the description of the mapper update in step 3]
L. Spiro
Posts: 14
Joined: Sun Jan 29, 2023 11:27 pm

Re: 6.nmi_disable #7 + 7.nmi_timing #3

Post by L. Spiro »

Maybe I need to try something like this, since I think I have tried everything else.
Well, maybe for now I can let it be and work on the APU, then return to this issue when I make my actual emulator: a sub-cycle emulator that properly goes through rising and falling edges.

Though it would be nice to have this working in my current cycle-accurate prototype for reference. Either way it’s burned me out so I think I will get back to it later.


L. Spiro
ReyBrujo
Posts: 1
Joined: Sun Aug 06, 2023 10:10 am

Re: 6.nmi_disable #7 + 7.nmi_timing #3

Post by ReyBrujo »

L. Spiro wrote: Mon Mar 13, 2023 1:45 am#1: My CPU passes all CPU tests, including all 2,550,000 of these: https://github.com/TomHarte/ProcessorTe ... in/nes6502
Every single opcode is verified to be completely cycle-accurate, and each cycle is verified to be correctly a read or write cycle, each read and write reads from or writes to the correct address, and all register and memory side-effects are known to be correct for every single instruction.
Sorry to resurrect a dead thread but I kind of discovered an issue with some illegal opcodes between the processor tests by Tom and the Nestest golden log: It looks like opcode 0C only lists 3 cycles when documentation from several sources indicate 4 (including NesDev wiki and going all the way back to Atari and Commodore 64 documentation). Also opcodes 1C, 3C, 5C, 7C, DC and FC all show samples with 4 cycles in the test files but whenever a page boundary is crossed it should add an extra read as the fourth one is wrong. That throws me off by a few cycles, however implementing the extra reading in 0C and the one when crossing pages in the others makes my cycle count match with the log perfectly (note that I don't have PPU emulation yet, only CPU so I run it in batch).

Did you find that problem too? Did you just add the extra readings and let the test break for those 7 opcodes? I reported the issue at Github just in case but for the time being it looks like I need to choose between passing the tests or perfect CPU emulation.

(Edited almost a month later: Just in case anyone else ever comes here for the same question, the processor tests for opcodes 0c, 1c, 3c, 5c, 7c, dc and fc were indeed incorrect. After some back-and-forth he fixed the test files so now a CPU emulator that passes every single processor test (at least all the ones that are necessary for nestest plus the 80 used by Beauty and the Beast) will correctly match the golden log (including cycle count) except flag at the beginning).
Post Reply