Well, here's what I've got for now, just figuring out some kind of test framework to make these measurements.
I've got code that synchronizes with VBL, then has a STA ABS instruction execute as late as possible just before NMI is vectored, and also has NMI get vectored just before that instruction (same code; which of the two happens depends on the CPU-PPU alignment after reset). On the scope is /NMI going low and, to show what the CPU is doing, A8. The STA ABS is at an address with A8 high, and the instruction just before it (a JMP to it) is at an address with A8 low. The STA ABS stores into $00xx, so A8 goes low during the store. During NMI vectoring, A8 is high. So you can see what's happening. I figured an address line would assert sooner than anything else. If anyone has better ideas, I'm open to them.
The pictures are at 1250ns/division (250ns per subdivision dot). A=A8, B=/NMI. Sorry about leaving the vertical cursor bars up.
In the first one, /NMI goes low just too late, so the STA ABS executes. A8 goes high about 612ns after /NMI goes low. It does three fetches with A8 high (opcode, two bytes of address), 1780ns, then low for 560ns, then high for 3920ns, then low (the NMI handler code has A8 low).
In the second one, /NMI goes low just in time, so you get 3920ns of vectoring with A8=high, then A8=low for NMI handler.
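As a rough sanity check on those intervals, here is a small sketch that converts the scope readings to CPU cycles. It assumes an NTSC console (master clock of about 21.477 MHz, CPU clock = master / 12); the labels on each interval are my reading of the description above, not something measured separately. The 3920ns stretch coming out at about 7 cycles would be consistent with the 6502's 7-cycle interrupt sequence.

```python
# NTSC NES clocking (an assumption here; PAL consoles differ).
MASTER_HZ = 21_477_272          # master clock, about 21.477 MHz
MASTER_PER_CPU_CYCLE = 12       # the 2A03 divides the master clock by 12

CPU_CYCLE_NS = 1e9 / MASTER_HZ * MASTER_PER_CPU_CYCLE  # about 558.7 ns


def ns_to_cpu_cycles(ns: float) -> float:
    """Convert a scope interval in nanoseconds to CPU cycles."""
    return ns / CPU_CYCLE_NS


if __name__ == "__main__":
    # Intervals quoted in the post; the labels are an interpretation.
    readings = [
        ("A8 high, STA ABS fetches", 1780),
        ("A8 low, store to $00xx", 560),
        ("A8 high, NMI vectoring", 3920),
    ]
    for label, ns in readings:
        print(f"{label}: {ns} ns = {ns_to_cpu_cycles(ns):.2f} CPU cycles")
```

The fetch interval does not land exactly on a whole number of cycles, which is not surprising when reading edges off a display with 250ns subdivisions.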
(For triggering this, I run the scope in one-shot digital capture mode, and trigger it using EXT TRIG connected to the $4016 read strobe with a BIT $4016 a little before these events)
I've figured out how to capture the scope's RS-232 plotter output for better pictures (hp2xx FTW!), but need to get a DB-9 F-F null modem adapter before it's convenient.
CPU/PPU timing
Re: CPU/PPU timing
I just figured out that you can make reads start at an exact master cycle in Visual 2C02 (see the updated http://wiki.nesdev.com/w/index.php/Visual_2C02 for how. The io_ce node can be used to confirm where the read starts). The duty cycle is the same as the 2A03's, and it assumes the value is sampled at the end of the read cycle.
There seem to be four cycles at which the read can start where both the interrupt line and the setting of the VBL flag get suppressed completely, so for those it's probably safe to say that no NMI would happen on the real thing. For other start cycles INT/VBL rises momentarily, and those might be trickier I guess. Maybe this info could be combined with some of the earlier stuff to figure out what's going on.
Brain dump below with results from Visual 2C02. The numbers at the top are PPU dots on the line after the post-render line. The 0's are the first phase of the read, the 1's the second (the duty cycle is 5/8).
Some kind of buffering is used in the PPU for the value "returned" by the read, which is why it can be read as set even though INT drops really quickly.
Code:
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear
----------------------------... VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear
----------------------------... VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear
--------------------------... VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear
------------------------... VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear
----------------------... VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear, INT and VBL completely suppressed
VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear, INT and VBL completely suppressed
VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear, INT and VBL completely suppressed
VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear, INT and VBL completely suppressed
VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set (note the short INT assertion)
- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
--- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
----- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
------- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
--------- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
----------- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
------------- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
--------------- VBL/INT
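For anyone trying to re-create the read row in the diagrams above, here is a minimal sketch of how the 0/1 string follows from the 5/8 duty cycle. It assumes each character column is half a master clock (so each "* " pair on the MASTER row is one master clock) and that a CPU cycle is 12 master clocks; both are my reading of the diagram, not something stated in the post.

```python
from fractions import Fraction

MASTER_PER_CPU_CYCLE = 12                   # NTSC 2A03: master clock / 12
COLUMNS_PER_CYCLE = 2 * MASTER_PER_CPU_CYCLE  # assumed: one column = half a master clock


def read_phases(high_fraction: Fraction = Fraction(5, 8)) -> str:
    """Return one CPU read cycle as a string of '0' (first phase) and '1' (second phase)."""
    high = int(COLUMNS_PER_CYCLE * high_fraction)
    low = COLUMNS_PER_CYCLE - high
    return "0" * low + "1" * high


if __name__ == "__main__":
    row = read_phases()
    print(row, f"({row.count('0')} low columns, {row.count('1')} high columns)")
```

With a 5/8 duty cycle this gives 9 low columns followed by 15 high columns, which appears to match the read row used in every diagram.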
Re: CPU/PPU timing
Taking just the ticks from the "preferred" alignment, you get the following:
Code:
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear
----------------------------... VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as clear, INT and VBL completely suppressed
VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set (note the short INT assertion)
- VBL/INT
-2 -1 0 1 2 3 4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111 CPU Reads as set
--------- VBL/INT
I'm guessing that those correspond, in order, to 2 before, 1 before, at, and 1 after, as specified in http://wiki.nesdev.com/w/index.php/PPU_frame_timing and the vbl_nmi_timing tests. If the NMI input is sampled near the end of a CPU cycle (not sure if it is), then it makes sense that the last two would miss the NMI even though it momentarily rises. Later starting points would have parts of the NMI assertion overlapping the end of the previous CPU cycle however, and so would not be missed.
Edit: I'm assuming this alignment corresponds to your 742. Is that correct? Just realized it's a bit tricky to see whether this one would be 742 or 743 in http://i.imgur.com/nq78U8I.gif .
Re: CPU/PPU timing
I've done several new tests and am finishing them up before posting the readings. I'm forming a picture of the meta-situation here that isn't pleasing. I'm thinking there's no clear "this happens at this moment" situation, especially since the CPU can be at any master clock offset with respect to the PPU. A master clock is less than 50ns, so a few gate propagation delays can mean the difference between something being read on one cycle or another. Because of the buffers/counters that sit between the clock and what the CPU reads/writes, timing is spread across cycles and dots. I think this means that there won't be a clear "this happens on this cycle, that happens on that", unless you're emulating to master clocks and emulating all four CPU-PPU alignments. Even then, you'll have to implement delays, where something can happen but not be visible until a master clock later or so.
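To make the "four alignments" concrete, here is a small sketch assuming NTSC dividers (CPU cycle = 12 master clocks, PPU dot = 4 master clocks). The `offset` numbering is hypothetical and is not meant to line up with any particular naming of the alignments in this thread.

```python
MASTER_PER_CPU_CYCLE = 12   # NTSC 2A03 divider
MASTER_PER_DOT = 4          # NTSC 2C02 divider


def cpu_cycle_starts(offset: int, n_cycles: int = 5) -> list[int]:
    """Master-clock timestamps at which successive CPU cycles begin."""
    return [offset + i * MASTER_PER_CPU_CYCLE for i in range(n_cycles)]


def phase_within_dot(master_clock: int) -> int:
    """Position of a master clock inside its PPU dot (0..3)."""
    return master_clock % MASTER_PER_DOT


if __name__ == "__main__":
    for offset in range(MASTER_PER_DOT):
        phases = {phase_within_dot(t) for t in cpu_cycle_starts(offset)}
        print(f"offset {offset}: CPU cycles always begin at dot phase {sorted(phases)}")
```

Because 12 is a multiple of 4, every CPU cycle in a given run begins at the same position within a dot, so whichever of the four offsets you get at reset stays fixed until the next reset.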
Re: CPU/PPU timing
What I'm doing above is picking a particular alignment for which the behavior is known (742 I believe - would be nice to have this confirmed), checking the results you get for different PPU clock offsets near a point (or region perhaps) of interest with that alignment in Visual 2C02 (VBL setting and INT assertion in this case), and then doing some elimination to figure out what these offsets correspond to ("1 before", "at", "1 after", etc.). Of course you can only be completely sure by testing on the real hw, but I think you could still come up with some pretty good guesses.
Barring me having messed up, there doesn't seem to be any other way to assign 1 before, etc., with the 742 alignment that would be likely to produce the observed behavior, even taking propagation delays and such into account.
Edit: s/master clock offsets/PPU clock offsets/
Edit 2: Calling it "1 before", etc. is a bit arbitrary. It's not a point, as you say, and depends on CPU/PPU interaction around the region of interest. I'm basically just trying to figure out what happens when a read starts at PPU tick n for the 742 alignment at the moment.
Re: CPU/PPU timing
OK, finally posting a lot of timing test results. Not complete by any measure, but if I don't post they might get lost in the shuffle of life. Pictures and some docs: nes-signal-timings.zip
The goal here is to get a clearer picture of when things happen and how these correspond to the various test ROMs and timings we see from a purely programming perspective.
* terms - lays out basic terms, the four alignments, why they aren't trivial, how we can synchronize to the PPU frame, and a common timing reference in software to use
* timing pictures - descriptions of all the scope pictures showing various timings
* CPU cycle timing - distillation of the timings into one diagram
One thing that is wanted is a picture of how CPU cycles correspond to PPU dots. It's still not clear when those begin.
I believe that for the CPU, a cycle may be considered to begin/end on the falling edges of PHI2. I've used the CPU address line changes as a 0 reference, since it's useful for showing which cycle is of interest. I measured PHI2's falling edge to be 60ns before the address line change, and have read that this might be the "proper" beginning of a CPU cycle.
Do we have verification that /VBL goes low on the first dot of a frame? If so, then we have enough to form a clear picture of the timings for the four alignments.
The PPU apparently latches internally when reading from it, so the CPU sees the state of things around the time PPU /CE is asserted. For example, even though /VBL goes high during the read, D7 to the CPU doesn't change, indicating that the PPU has a latch for outputs when reading from it.
For what I'm calling the +0 clocks alignment (in the previously posted list, the last is +0 clocks, the next-to-last +1 clocks, etc.), we have CPU A8 going low for the access at +0ns, then at +163ns PPU /CE going low, presumably latching the VBL flag internally. If the CPU had read even one clock sooner, it wouldn't have found VBL set. So if VBL is set on the first dot, then we have a dot beginning around +163ns, and the CPU cycle at +0ns. A PPU dot is 186.24ns, so this is slightly less than a dot. There's probably a little propagation delay from the dot beginning to VBL getting set internally, so the dot probably begins a little sooner.
So here are the probable timings, with the two interpretations of a CPU cycle shown:
Code:
ns ------- 11111111112222222222333333333344444444445555555555
7654321012345678901234567890123456789012345678901234567890123456789
_______ ___
A0-A15_______\_______________________________________________________/___
0 559
_ _________________________________
PHI2 \___________________/ \___________
-60 135 487
___________________ _______
PPU /CE \___________________________________/
163 517
dots 0clk[ ][VBL set ][ ][
dots 1clk ][VBL set ][ ][
dots 2clk ][VBL set ][ ][ ]
dots 3clk ][VBL set ][ ][ ]
cpu a8 ][ Read ][
phi2 [ Read ][
Note that if we examine only CPU reads, it's irrelevant where the CPU cycle is, since all the reads occur at the same point. The important thing is that an alignment of +0 clocks corresponds to the CPU reading the VBL flag the earliest possible in the frame.
I haven't looked as much at writes. I'm guessing that they occur on the rising edge of the /CE line, once data lines have stabilized. So that would put them about 352ns after reads, or about 7.6 clocks later, slightly less than 2 dots.
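The ns figures in this post can be cross-checked against master clocks and dots with a few lines. This is only a sketch, assuming the NTSC master clock of about 21.477 MHz and 4 master clocks per dot; the 163ns and 352ns inputs are the values quoted above.

```python
MASTER_HZ = 21_477_272                 # NTSC master clock (assumed)
MASTER_NS = 1e9 / MASTER_HZ            # about 46.56 ns per master clock
DOT_NS = 4 * MASTER_NS                 # about 186.24 ns per PPU dot


def ns_to_master_clocks(ns: float) -> float:
    """Express a delay in nanoseconds as master clocks."""
    return ns / MASTER_NS


def ns_to_dots(ns: float) -> float:
    """Express a delay in nanoseconds as PPU dots."""
    return ns / DOT_NS


if __name__ == "__main__":
    for label, ns in [("address change to /CE low", 163),
                      ("/CE low to /CE high (read to write)", 352)]:
        print(f"{label}: {ns} ns = {ns_to_master_clocks(ns):.2f} clocks"
              f" = {ns_to_dots(ns):.2f} dots")
```

Under these assumptions 163ns comes out a bit under one dot and 352ns a bit under two, in line with the figures in the post.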
Re: CPU/PPU timing
About to look into this some more. I poked around a bit in Visual 2A03/2C02. Here's some stuff that might be useful before I forget it:
- I added a section on PPU address bus contents/timing to http://wiki.nesdev.com/w/index.php/PPU_rendering . In Visual 2C02 ALE is high exactly during the first cycle of a two-cycle VRAM access, so barring propagation delays, ALE goes high right when a PPU cycle starts (but obv. only for the first PPU cycle).
- http://wiki.nesdev.com/w/index.php/CPU_interrupts (renamed from "CPU interrupt quirks") now has more precise info on when interrupts are polled. Seems to be during the first phase (half-cycle) of the CPU cycle.
- http://wiki.nesdev.com/w/index.php/CPU_ ... escription has some more info on the timing of reads and writes.
- Also added some misc. info to http://wiki.nesdev.com/w/index.php/PPU_ ... escription, though it might not be as relevant.
Re: CPU/PPU timing
I've figured out roughly what goes on with the VBlank in the PPU that causes the weird reading behavior.
Inside the PPU, there are two key players when it comes to signal timing:
- Read/write signals like /r2002 and /w2001.
- The _io_ce signal.
The _io_ce line comes from the address decoder (the chip marked "LS139" in http://wiki.nesdev.com/w/images/f/f3/Neswires.jpg). The address decoder is set up so that the chip enable signals are generated during the second, high phase of M2 (it basically has an "AND M2" condition on all the outputs, though some are inverted). The /DBE input to the PPU in the diagram corresponds to _io_ce, and is also inverted, so that _io_ce will be low during the second, high phase of M2. M2 has a modified non-50% duty cycle (see http://wiki.nesdev.com/w/index.php/CPU_ ... escription).
To summarize: For a read/write cycle from the CPU, the internal read/write signals in the PPU will get their values right away, while _io_ce will go low during the second (high) phase of M2.
Inside the PPU, the same signal, read_2002_output_vblank_flag, is used both to clear the VBlank flag and to hold the value of the read buffer (see the tutorial). While the read buffer is not held (closed off), it simply mirrors the VBlank flag. The relationship between the signals is
Code:
read_2002_output_vblank_flag = /r2002 NOR _io_ce
This means that read_2002_output_vblank_flag goes high during the second phase of M2 during a read from $2002. Since this is the point where the value is held, we can make the following observation:
- The value returned by the read is the value the VBlank flag has when the high phase of M2 starts.
The NMI output is directly tied to the VBlank flag (it just has an additional "AND nmi_enabled" condition). Looking at the above, you can see how an NMI might be missed, as the high "clearing" phase of M2 might completely override set_vbl_flag, and thus the NMI.
It's a bit confusing, but clicking around a bit helps. I also put some notes (mostly for myself) at http://wiki.nesdev.com/w/index.php/User:Ulfalizer .
Sorry for not looking into the timing stuff you posted yet btw, but attacking the issue from two angles is probably a good idea at least.
Maybe read_2002_output_vblank_flag should be renamed to something like "read_2002_clear_vblank_flag_and_hold_value" by the way.
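Here is a toy model of the behavior described in this post, in case it helps to see the three outcomes side by side. It is a deliberately simplified sketch, not a transcription of the Visual 2C02 netlist: time is in arbitrary steps, the flag is set by a one-step pulse, clear is assumed to win over set while read_2002_output_vblank_flag is high, /r2002 is treated as low for the whole window, and NMI is assumed enabled. The `simulate` function and its parameters are made up for illustration.

```python
def read_2002_output_vblank_flag(r2002_n: int, io_ce_n: int) -> int:
    """NOR of the two active-low inputs: high only when both are low."""
    return int(not (r2002_n or io_ce_n))


def simulate(set_step, read_high_steps, n_steps=8):
    """Toy model of one $2002 read near the point where the VBlank flag is set.

    set_step:        step at which the flag-set pulse arrives
    read_high_steps: steps during which M2 is high for the $2002 read
    Returns (value_read, flag_at_end, nmi_seen).
    """
    flag = 0
    buffer = 0
    value_read = None
    nmi_seen = False
    for t in range(n_steps):
        io_ce_n = 0 if t in read_high_steps else 1
        hold = read_2002_output_vblank_flag(0, io_ce_n)
        if t == set_step:
            flag = 1
        if hold:
            flag = 0             # assumed: clear overrides set
            value_read = buffer  # buffer is held, so this is the value returned
        else:
            buffer = flag        # buffer mirrors the flag while not held
        nmi_seen = nmi_seen or bool(flag)
    return value_read, flag, nmi_seen


if __name__ == "__main__":
    print("set before high phase:", simulate(1, range(3, 8)))
    print("set during high phase:", simulate(4, range(3, 8)))
    print("set after the read:   ", simulate(5, range(0, 3)))
```

In this model, a flag set before the high phase reads as set with a short NMI assertion, a flag set during the high phase is suppressed entirely, and a flag set after the read ends reads as clear with the NMI left asserted. Real hardware adds propagation delays that this sketch ignores.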