cycle for cycle stuff

Discuss emulation of the Nintendo Entertainment System and Famicom.


blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

I don't get Quietust's original comment in the first place. What he described is executing partial instructions, and it will handle the case of reading $2002 just after the VBL flag is cleared (as will any method which communicates the time of the memory read on the fourth instruction clock, even just read_memory( addr, timestamp + 3 )).
Quietust
Posts: 1786
Joined: Sun Sep 19, 2004 10:59 pm

Post by Quietust »

When someone says "executing partial instructions", I think of the ability to halt the CPU in the middle of an instruction. My approach is not the same - though it does emulate the individual cycles, it still must execute one full instruction at a time.
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.
WedNESday
Posts: 1231
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Post by WedNESday »

According to my diagram, would Nintendulator return $00 or $80?
ReaperSMS
Posts: 174
Joined: Sun Sep 19, 2004 11:07 pm

Post by ReaperSMS »

It would return $00.

As Q mentioned, his core is cycle-accurate, but execution is instruction-granular, so if you tell it to execute N cycles, it will likely execute slightly fewer or slightly more, depending on where the instruction boundaries fall. In the grand scheme of things, keeping the rest of the system in sync with this is trivial.
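
A minimal sketch of what instruction-granular execution means for the outer loop. Everything here is an assumption for illustration and not taken from Nintendulator: `Cpu`, `execute_instruction()` and `run_cycles()` are hypothetical names, and the stub pretends every instruction takes 4 cycles.

```cpp
#include <cassert>

// Hypothetical CPU core; none of these names come from a real emulator.
struct Cpu {
    long cycles_run = 0;  // total CPU cycles executed so far

    // Stub: pretend every instruction is a 4-cycle LDA abs.
    int execute_instruction() {
        cycles_run += 4;
        return 4;
    }
};

// Runs whole instructions until at least `budget` cycles have elapsed.
// Returns the overshoot (cycles run past the budget), which the caller
// subtracts from the next budget so the system stays in sync.
int run_cycles(Cpu& cpu, int budget) {
    assert(budget >= 0);
    int remaining = budget;
    while (remaining > 0)
        remaining -= cpu.execute_instruction();
    return -remaining;
}
```

Asking this stub for 10 cycles runs three instructions (12 cycles) and reports an overshoot of 2, so the next request would be for 8.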

I wrote a CPU core that could halt in the middle of an instruction. To say it was slow would be a vast understatement. It wasn't terribly useful aside from the novelty of the idea.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

Quietust wrote: "When someone says "executing partial instructions", I think of the ability to halt the CPU in the middle of an instruction."
If you're running the PPU and APU each CPU clock, then the CPU emulator is halting in the middle of an instruction:

Code:

int opcode = read_mem( pc++ );
next_clock(); // halts here
switch ( opcode )
{
case 0xAD: { // LDA abs; braces keep lo/hi local to this case
    int lo = read_mem( pc++ );
    run_ppu_and_apu(); // halts here
    int hi = read_mem( pc++ );
    run_ppu_and_apu(); // halts here
    a = read_mem( (hi << 8) | lo );
    run_ppu_and_apu(); // halts here
    set_nz( a );
    break;
}
...
}
What I take you to mean is that the CPU emulator function can return in the middle of an instruction, which would require something inefficient like

Code:

case 0xAD: // LDA abs
switch ( phase++ )
{
case 0: lo = read_mem( pc++ ); break;
case 1: hi = read_mem( pc++ ); break;
case 2: a = read_mem( (hi << 8) | lo ); break;
case 3: set_nz( a ); opcode = read_mem( pc++ ); phase = 0; break;
}
break;
where you have to call this four times to execute a single LDA instruction. These are equivalent if you're just turning around and calling it like this:

Code:

while ( clocks_remain-- )
{
    run_one_cpu_clock();
    run_ppu_and_apu();
}
WedNESday
Posts: 1231
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Post by WedNESday »

I understand what you are saying, but I prefer my method. First of all the 6502 emulator that I wrote will also be used on some of my other emulators (e.g. Atari 2600). Secondly it is easier to handle interrupts this way and I can also emulate the BRK bug (I am going for MAXIMUM accuracy here baby!). Thirdly is the simplicity of it all, as I am emulating what the NES does exactly. I also don't have the function overheads/increased .exe size from updating the PPU/APU like Nintendulator does. I have inlined all of the opcodes for maximum speed.

Example;

Code:

inline void OpticCodeAD()
{
	switch(CPU.Cycle)
	{
		case 0:
			CPU.PC++;
			CPU.Cycle++;
			break;
		case 1:
			CPU.TMP2 = CPU.Memory[CPU.PC];
			CPU.PC++;
			CPU.Cycle++;
			break;
		case 2:
			CPU.TMP2 += (CPU.Memory[CPU.PC] << 8);
			CPU.PC++;
			CPU.Cycle++;
			break;
		case 3:
			CPU.A = CPU.Memory[CPU.TMP2];
			CPU.P &= 0x7D;
			if( !CPU.A )
				CPU.P += 0x02;
			CPU.P += (CPU.A & 0x80);
			CPU.Cycle = 0;
			break;
	}
	CPU.CC++;
}
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

WedNESday wrote: "Thirdly is the simplicity of it all, as I am emulating what the NES does exactly."
The NES works via electrons moving in transistors (or even more basic, if you want to go to a subatomic level). An emulator doesn't emulate this. Most work at a higher level, emulating the behavior of the CPU instructions.
WedNESday wrote: "I also don't have the function overheads/increased .exe size from updating the PPU/APU like Nintendulator does. I have inlined all of the opcodes for maximum speed."
What you show above is probably slower since it adds lots of branching and function calls. But programmer intuition has never been what determines the speed of code. What does your profiler say?
WedNESday
Posts: 1231
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Post by WedNESday »

Yeah, yeah, I meant at a higher level anyway. I know that this method will make my emulator very slow due to the switch/case branching, but since I have inlined every opcode there are actually no function calls at all. As for my profiler, I don't have one, but my probation officer says that if I don't keep my nose clean, it'll be back to the state pen. for me.
Guest

Post by Guest »

blargg: technically, yes, it's stopping in the middle of the instruction. As far as performance goes, there are drastic differences between the two.

WedNESday: if you think about it, there is no logical difference between the two approaches. If the other parts are implemented properly, the CPU won't be able to tell the difference, and the only thing you get out of that approach is a slight cleanup in the outer loop running the CPU core. Going down that route for the purpose of personal curiosity is fine, but keep in mind that you get no technical benefit, and a slowdown of about 100x compared to the instruction-granular approach with cycle-accurate side effects.
WedNESday
Posts: 1231
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Post by WedNESday »

Thanks for the advice. I know that it will make it slower, but I am implementing this because the core will also be used in my emulators for other consoles/computers. Also, I like the simplicity that it involves.

For example, the (NTSC) VBlank time is 2273 (.3) cc's. If we are on cycle 2272 and STA Absolute is executed, then the first cycle wouldn't need any PPU drawing/fetching, but the others would. Observe;

Code:

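// Assumes each FetchOpcode() call runs exactly one CPU clock.
for( int cc = 0; cc < 2273; cc++ ) // VBlank: no PPU drawing needed
{
    FetchOpcode();
}

for( int cc = 0; cc < 29393; cc++ ) // outside VBlank: 3 pixels per CPU clock
{
    FetchOpcode();
    Draw3Pixels();
}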
This way, if we are in a VBlank period the PPU won't need any checking. My method ensures that there are no wasted calls to Draw3Pixels(). Also observe the following; (let's say that we are on a different console/computer)

Code:

inline void OpticCodeAD() 
{ 
   switch(CPU.Cycle) 
   { 
      case 0: 
         CPU.PC++; 
         CPU.Cycle++; 
         break; 
      case 1: 
         CPU.TMP2 = CPU.Memory[CPU.PC];
         CPU.PC++; 
         CPU.Cycle++; 
         break; 
      case 2: 
         CPU.TMP2 += (CPU.Memory[CPU.PC] << 8); 
         CPU.PC++; 
         CPU.Cycle++; 
         break; 
      case 3: 
         CPU.A = CPU.Memory[CPU.TMP2]; 
         CPU.P &= 0x7D; 
         if( !CPU.A ) 
            CPU.P += 0x02; 
         CPU.P += (CPU.A & 0x80); 
         CPU.Cycle = 0; 
         break; 
   } 
   CPU.CC++; 
}
Let's pretend that after case 1 was executed, some kind of automatic bankswitching occurred, so that a different high byte should be fetched. Running one cycle at a time would ensure that the correct byte is fetched.
Zepper
Formerly Fx3
Posts: 3264
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper »

-deleted-
Last edited by Zepper on Sun Jun 21, 2009 8:16 pm, edited 1 time in total.
ReaperSMS
Posts: 174
Joined: Sun Sep 19, 2004 11:07 pm

Post by ReaperSMS »

There's nothing stopping instruction granular from handling the situation you mention regarding a timed bankswitch, if implemented correctly.

If the switch is timed, then it should be updated per CPU cycle like the rest of the hardware, and the memory accesses should take into account that side effects may invalidate direct fetches, so memory fetches should go through code rather than direct access.
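
A rough sketch of that idea, assuming a made-up machine with two tiny banks and a switch that fires at a fixed CPU clock. `Machine`, `catch_up()`, `read_mem()` and `fetch_abs_operand()` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical machine with a bank switch that fires at a fixed CPU clock.
// The two-bank layout and all names are assumptions for illustration.
struct Machine {
    uint8_t banks[2][0x100] = {};  // two tiny switchable banks
    int     current_bank = 0;
    long    switch_time  = 0;      // clock at which bank 1 is switched in
    long    now          = 0;      // current CPU clock

    // Bring the bank hardware up to the present before it is observed.
    void catch_up() {
        if (now >= switch_time)
            current_bank = 1;
    }

    // Every fetch goes through here instead of indexing memory directly,
    // so a switch that happens mid-instruction is seen by later fetches.
    uint8_t read_mem(uint16_t addr) {
        assert(addr < 0x100);
        catch_up();
        uint8_t value = banks[current_bank][addr];
        ++now;  // each memory access takes one CPU clock
        return value;
    }
};

// Operand fetch of an absolute-addressed instruction, run as one
// uninterrupted instruction rather than one call per cycle.
uint16_t fetch_abs_operand(Machine& m, uint16_t pc) {
    uint16_t lo = m.read_mem(pc);
    uint16_t hi = m.read_mem(static_cast<uint16_t>(pc + 1));
    return static_cast<uint16_t>((hi << 8) | lo);
}
```

With the switch timed between the two operand fetches, the low byte comes from bank 0 and the high byte from bank 1, even though the CPU core never returns mid-instruction.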
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

Any NES CPU emulator which includes the timestamp of memory accesses can be used as the basis for a "cycle-accurate" NES emulator. The general rule is, any number of hardware modules can be emulated on an as-needed ("catch-up") basis as long as the future effects of all but one module on others can easily be predicted in advance. This is the case for the NES, where the CPU is the only entity whose future effect can only be determined by doing the actual emulation.
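
A minimal sketch of the catch-up approach applied to the $2002 case discussed earlier in the thread. It assumes a hypothetical `Ppu` that models only the VBL flag, and the set/clear times are made-up constants in CPU clocks, not real NES timings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical catch-up PPU: only the VBL flag is modeled, and the
// set/clear times are made-up constants, not real NES timings.
struct Ppu {
    static constexpr long vbl_set_time   = 100;
    static constexpr long vbl_clear_time = 200;

    long time = 0;          // how far this PPU has been emulated
    bool vbl_flag = false;

    // Emulate the PPU forward to `when`, applying events on the way.
    void run_until(long when) {
        assert(when >= time);  // catch-up only ever moves forward
        if (time < vbl_set_time && when >= vbl_set_time)
            vbl_flag = true;
        if (time < vbl_clear_time && when >= vbl_clear_time)
            vbl_flag = false;
        time = when;
    }

    // Timestamped $2002 read: catch up first, then return the flag in
    // bit 7 and clear it, as a status read does.
    uint8_t read_status(long when) {
        run_until(when);
        uint8_t value = vbl_flag ? 0x80 : 0x00;
        vbl_flag = false;
        return value;
    }
};

// LDA $2002 starting at `timestamp`: the data read happens on the
// fourth clock of the instruction, so it is stamped timestamp + 3.
uint8_t lda_2002(Ppu& ppu, long timestamp) {
    return ppu.read_status(timestamp + 3);
}
```

The PPU is only run when the CPU actually touches it, yet an instruction that starts before the flag is cleared and reads after it still sees $00, because the read carries its own timestamp.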
WedNESday
Posts: 1231
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Post by WedNESday »

OK, today I finally finished the new cycle-for-cycle accurate 6502 emulator. I immediately hooked it up to WedNESday to test it out. I didn't bother to include any PPU/APU accesses, memory mapping/trapping, or blitting, and used a x1 window, as I just wanted a rough estimate of how slow the core was.

Boy, nothing could prepare me for it.

On my P4 2.2 GHz I had 60 FPS, a full 30 times slower than the previous core, which had 1800 FPS in the same situation.

Please don't say "I told you so". I did listen to you guys and I always agreed with you all the way; it was just that I wanted to give it a try because no one had done it before.
lord_Chile
Posts: 120
Joined: Thu Feb 02, 2006 7:07 am
Location: Chile (South America), Quilpué

a question

Post by lord_Chile »

What is the name of Quietust's emulator? Has it been released?
Good day to nesdev people. Lord..
Author of nothing =P
UTFSM Sansano programmer.. lord_Chile
Greetings to the JMC campus of UTFSM... Viña del Mar, CHILE