6502 SBC and NES Clocking

gmcastil
Posts: 9
Joined: Sun Mar 12, 2017 4:47 pm

6502 SBC and NES Clocking

Post by gmcastil »

TL;DR - I'm making a NES / 6502 SBC entirely in an SoC, have the clock and reset circuits done, and am asking for a code review.

I've been working on a 6502-based single board computer that I'm implementing entirely in the programmable logic (i.e., FPGA) of a Xilinx Zynq 7000 SoC. I won't get into too many details, but basically the RAM, the ROM, the CPU and some other basic peripherals are all going to be in fabric, and I can interact with the entire hardware design from Linux, which is running on the processor. The RAM and ROM are basically dual port block RAM which I can load, dump, purge, program, etc. from Linux; I can then start the SBC and it will run just as if it were on a circuit board in hardware.

My plan is to support NES emulation - so my CPU will have a NES compatibility mode and I'm going to use the PPU as a basic video output driver instead of an I2C LCD or something like that. To that end, I've been working on my clocking and reset circuits - I could quite easily run whatever clock frequency I wanted to since it's all in fabric, but I wanted to start by building what I thought would support cycle accurate clocking of the NES CPU and PPU. I can support two input clock sources - either an external 100MHz oscillator or a fabric clock at the same frequency - and I believe I've nailed the NTSC frequencies that are on the wiki.

I started with the 100 MHz oscillator and then used an MMCM to produce a single 236.25 MHz reference clock, which I divided down by 11 to get the master clock. Then (and this is where it gets a bit interesting) I used a couple of shift registers to divide that single master clock reference by 12 and 4 to produce single pulse CPU and PPU clock enables (on the same domain as the master). This way, the entire fabric can run at the master frequency, but the CPU and PPU logic will still run at what should be effectively the same frequency as they would in a real system. I also generated reset signals which are basically synchronous to those pulses as well.
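
As a sanity check on the numbers above, here is a minimal Python sketch of the divider chain. It is a software model, not the actual RTL: the function and variable names are made up for illustration, and both shift registers are assumed to start aligned out of reset.

```python
from fractions import Fraction

REF_HZ = Fraction(236_250_000)      # MMCM reference clock from the post
MASTER_HZ = REF_HZ / 11             # ~21.477 MHz master clock
CPU_DIV, PPU_DIV = 12, 4            # master ticks per CPU / PPU enable


def ring_enables(length, ticks):
    """Model a one-hot shift register; yield 1 when the hot bit is in stage 0."""
    ring = [1] + [0] * (length - 1)
    for _ in range(ticks):
        yield ring[0]
        ring = ring[-1:] + ring[:-1]    # rotate by one stage per master tick


ticks = 12 * 100                        # simulate 100 CPU cycles of master clock
cpu_en = list(ring_enables(CPU_DIV, ticks))
ppu_en = list(ring_enables(PPU_DIV, ticks))

print(f"master: {float(MASTER_HZ) / 1e6:.6f} MHz")
print(f"cpu   : {float(MASTER_HZ / CPU_DIV) / 1e6:.6f} MHz")
print(f"ppu   : {float(MASTER_HZ / PPU_DIV) / 1e6:.6f} MHz")
print("PPU enables per CPU enable:", sum(ppu_en) // sum(cpu_en))
```

Because both enables are derived from the same master tick counter, every CPU enable in this model coincides with a PPU enable, which is the fixed phase relationship the real dividers are meant to guarantee.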

If anyone is interested in seeing how I did it, I would very much appreciate the feedback and any suggestions. Here's a link to the RTL. Also, I've attached a screenshot of the waveforms.

Edit: I said 5 in the original post, but I meant 4. Confirmed in the simulator that the PPU to CPU relationship is 3:1.
Attachments
nes_clks.PNG
Last edited by gmcastil on Sat Nov 18, 2023 9:29 pm, edited 1 time in total.
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm

Re: 6502 SBC and NES Clocking

Post by Dwedit »

Dividing by 5 doesn't sound right?

PPU should draw 3 pixels every 1 CPU cycle. That number gets you 2.4 dots per CPU cycle.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
gmcastil
Posts: 9
Joined: Sun Mar 12, 2017 4:47 pm

Re: 6502 SBC and NES Clocking

Post by gmcastil »

Dwedit wrote: Sat Nov 18, 2023 9:27 pm Dividing by 5 doesn't sound right?

PPU should draw 3 pixels every 1 CPU cycle. That number gets you 2.4 dots per CPU cycle.
Yep, I noted that when I wrote it, fixed the RTL, and then forgot to edit my post. Thanks. Any other thoughts?
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm

Re: 6502 SBC and NES Clocking

Post by Dwedit »

6 ticks of Master Clock gets you the color subcarrier, but 4 ticks of master clock gets you a pixel.
Master clock effectively gets doubled (counts both the rising and falling edges) and that controls the phase of the composite video generator.
I see in the picture that the master clock is a perfect 50% square wave, so that will help with generating the phase.
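
To put numbers on that, here is a small Python check. It assumes the 236.25 MHz / 11 master clock described in the first post; the variable names are only for illustration.

```python
from fractions import Fraction

master_hz = Fraction(236_250_000, 11)   # master clock from the first post

subcarrier_hz = master_hz / 6           # 6 master ticks per color subcarrier cycle
pixel_hz = master_hz / 4                # 4 master ticks per pixel

# Counting both edges of the master clock gives 12 half-ticks per subcarrier
# cycle, i.e. 12 distinct phase positions for the composite generator.
phases_per_cycle = 2 * 6
half_ticks_per_pixel = 2 * 4

print(f"subcarrier: {float(subcarrier_hz):.1f} Hz")
print(f"pixel rate: {float(pixel_hz):.1f} Hz")
print("phase steps per subcarrier cycle:", phases_per_cycle)
print("phase steps advanced per pixel  :", half_ticks_per_pixel)
```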
gmcastil
Posts: 9
Joined: Sun Mar 12, 2017 4:47 pm

Re: 6502 SBC and NES Clocking

Post by gmcastil »

Yes, what I'm referring to as `clk_mst` is a 50% duty cycle clock created in the following way:
  • 100 MHz external oscillator to a PLL (you can see the locked indicator at the bottom).
  • PLL has two clock outputs - a 236.25MHz clock, which I refer to as the reference clock and then a divided by 11 version of that, which I refer to as the master clock.
  • The two PLL outputs are phase locked to each other (the 236.25MHz clock is not shown in the waveform)
  • The master clock is then divided by 12 and 4 to get CPU and PPU pulses. Because they both come from the same source, their phase relationship shouldn't change.
  • Rather than creating additional clock domains, I'm choosing to treat the CPU and PPU pulses as clock enables, then I can run the entire fabric at the master clock frequency.
  • This is important because in order to emulate an asynchronous memory with block RAM, I need to be able to read and write in one CPU cycle but the memory itself needs to have a couple of clocks to service the request. I think this is actually the part I'm going to be working on next - the block RAM wrapper that masquerades as an asynchronous memory to a CPU clock, but actually runs off the master clock domain.
I think I'm really looking for someone to see if I've done something silly or am missing something significant that is going to cause a major redesign in the future. You mentioned rising and falling edges and at present, I have no intent to ever do anything with falling edges inside the chip. If I need to do something in a half a cycle, I was anticipating either doing it in parallel on a single edge or doing it on multiple edges of the reference clock domain.
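
One rough way to reason about the block RAM wrapper mentioned in the list above is to model it in software. The Python sketch below is a toy model, not the actual design: it assumes a synchronous RAM whose read data appears a fixed number of master clocks after the address is presented (`READ_LATENCY = 3` is an assumed figure), and checks that the data is valid well before the next CPU enable, 12 master clocks later. All class and function names are hypothetical.

```python
MASTER_PER_CPU = 12     # master clocks per CPU clock enable (from the thread)
READ_LATENCY = 3        # assumed block RAM read latency, in master clocks


class PipelinedRam:
    """Toy synchronous RAM: read data appears READ_LATENCY clocks after the address."""

    def __init__(self, size):
        self.mem = [0] * size
        self.pipe = [None] * READ_LATENCY   # in-flight read data
        self.dout = None                    # output register, None = not valid yet

    def clock(self, addr=None):
        """Advance one master clock; optionally present a read address."""
        self.pipe.append(self.mem[addr] if addr is not None else None)
        head = self.pipe.pop(0)
        if head is not None:
            self.dout = head                # output holds until the next read completes


def cpu_read(ram, addr):
    """Run one CPU cycle of master clocks; return (data, tick the data became valid)."""
    ram.dout = None
    valid_at = None
    for tick in range(MASTER_PER_CPU):
        # The address is presented only on the tick where the CPU enable is high.
        ram.clock(addr if tick == 0 else None)
        if valid_at is None and ram.dout is not None:
            valid_at = tick
    return ram.dout, valid_at


ram = PipelinedRam(0x800)
ram.mem[0x0123] = 0xA9
data, valid_at = cpu_read(ram, 0x0123)
print(f"data=0x{data:02X}, valid after master tick {valid_at} of {MASTER_PER_CPU}")
```

With these assumed numbers the read completes with plenty of master clocks to spare, which is the slack a wrapper would use to look asynchronous from the CPU's point of view.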
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm

Re: 6502 SBC and NES Clocking

Post by Dwedit »

Just saying that if you are generating an NTSC composite signal, you need something that runs twice as fast as the master clock in order to get the video square wave into the correct location (phase).

I don't know how long the CPU-related clock signals stay high on real NES hardware, or know what cartridges expect to see. They may need a 50% duty cycle square wave?
gmcastil
Posts: 9
Joined: Sun Mar 12, 2017 4:47 pm

Re: 6502 SBC and NES Clocking

Post by gmcastil »

Thanks for the response.

I'm not going to generate a composite video signal at all. I will probably start with an output interface using a VGA form factor connector and signaling, with some sort of scaling to fit reasonably in 1024x768 or 640x480, but I'm not sure how I'll eventually want to do it. It's not going to be analog though and I can synthesize other frequencies as necessary to create whatever output video format I want.

Is there any relationship between the CPU and PPU clocks other than just their frequencies and that their phase relationship doesn't change post-reset? Beyond that, once data is written into the PPU, my understanding is that the output video signal (wherever it goes) is completely independent of the processor. Is that correct?
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm

Re: 6502 SBC and NES Clocking

Post by Dwedit »

NES games will perform scroll changing and graphics bankswitching at various times during the frame. So there will be writes to the PPU while a frame is rendering. So it's not independent of the processor in that way. The PPU also interacts with the CPU in a limited way (vblank interrupt, sprite 0 hit, rarely used and buggy sprite overflow flag)
lidnariq
Posts: 11373
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: 6502 SBC and NES Clocking

Post by lidnariq »

To generate higher-resolution video, you'll be able to use a simple linemultiplier (e.g. one scanline in, two scanlines out) but NTSC used a substantially more underscanned video (220-ish scanlines active out of 262 - 84%) than almost all modern video standards (usually 90-97%). If you want to fix that you'll need a partial framebuffer (a circular buffer of linebuffers) and fill/replay from them as needed.

If you intend to display on cranky HDTVs, you shouldn't generate the "correct" 21.47MHz clock at all (and to directly answer your initial question: your multiphase enables sound fine. See footnote below.) but should instead generate the required ( 60Hz vsync x 262 hsync/vsync x 1364 clk/hsync ) clock so that the HDTV doesn't reject your input.
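
For reference, the arithmetic behind that suggested clock can be worked out in a few lines of Python. This is only a sketch: the comparison value assumes the 236.25 MHz / 11 master clock from the first post, and the variable names are made up.

```python
from fractions import Fraction

VSYNC_HZ = 60            # target refresh rate
LINES_PER_FRAME = 262    # hsync per vsync
CLKS_PER_LINE = 1364     # master clocks per hsync

tv_friendly_hz = VSYNC_HZ * LINES_PER_FRAME * CLKS_PER_LINE
ntsc_master_hz = Fraction(236_250_000, 11)   # the "correct" ~21.477 MHz clock

slowdown = 1 - Fraction(tv_friendly_hz) / ntsc_master_hz

print("TV-friendly master clock:", tv_friendly_hz, "Hz")
print(f"NES master clock        : {float(ntsc_master_hz):.1f} Hz")
print(f"relative slowdown       : {float(slowdown) * 100:.3f} %")
```

The result is a clock a fraction of a percent slower than the original, so the whole system runs very slightly slow in exchange for an exact 60Hz vsync.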


Footnote: Real bugs on the NES depend on the exact sub-pixel phase of CPU vs PPU cycles. (Example: PPU does some register updates the entire time it's selected, before the CPU has driven the data bus. A non-stable data bus can sometimes write nonsense to registers.) Modern HDL design dogma discourages the kind of flaws that produce these bugs, which will be annoying if you intend on bug-for-bug compatibility.
gmcastil
Posts: 9
Joined: Sun Mar 12, 2017 4:47 pm

Re: 6502 SBC and NES Clocking

Post by gmcastil »

lidnariq wrote: If you intend to display on cranky HDTVs, you shouldn't generate the "correct" 21.47MHz clock at all (and to directly answer your initial question: your multiphase enables sound fine. See footnote below.) but should instead generate the required ( 60Hz vsync x 262 hsync/vsync x 1364 clk/hsync ) clock so that the HDTV doesn't reject your input.
Yes, that was something I expected I would need to do. I made a VGA core not too long ago that I might use and for the output, I generated a 25.175MHz clock I think. The Xilinx 7-Series has a lot of clocking resources and flexibility so I shouldn't have any problems making whatever I need.

I generated the 21.47MHz clock because a) as I understood it, processor cycle times and counts determine gameplay speeds and my processor design is going to have the correct number of cycles and b) the CPU and PPU needed to have a fixed relationship and it was much simpler to just create the master clock and then divide it down rather than trying to create the two clocks independently and establish a fixed phase.

I also need a much faster clock that is synchronous to the CPU clock so that I can overclock the block RAM which I'm using to emulate the RAM and ROM. A read from the block RAM (at least, as I have it configured) requires three clock edges to get the data to appear on the outputs. So, from the processor's perspective, when it asserts the address and appropriate control signals (on the rising edge of the master clock and with the CPU clock enable asserted) I need at least 3 clocks to present the data before the processor will expect it to be on the data bus.

I think it will actually work out pretty well - for the device I'm using, the master clock frequency is fine for running virtually all of the SBC logic, I'm using 100MHz as my principal frequency for all of the non-SBC logic, and I get guaranteed CPU and PPU relationships.
lidnariq wrote: Footnote: Real bugs on the NES depend on the exact sub-pixel phase of CPU vs PPU cycles. (Example: PPU does some register updates the entire time it's selected, before the CPU has driven the data bus. A non-stable data bus can sometimes write nonsense to registers.) Modern HDL design dogma discourages the kind of flaws that produce these bugs, which will be annoying if you intend on bug-for-bug compatibility.
I was aware there were things like this - my plan was to emulate a modern 65C02 out of the box and then make a NES compatible mode that implements or enables the buggy behavior, either through a control register or a compile-time switch. As long as the behavior is well-understood I can emulate it.
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm

Re: 6502 SBC and NES Clocking

Post by Dwedit »

65C02 vs 6502:
* No illegal instructions
* Extra cycle consumed by arithmetic operators (when in decimal mode) to correct the flags
* JMP (nnnn) wraps correctly
* Several added instructions

If you emulate the 65C02, there's a lot in there that's not in the regular 6502, and vice versa.
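
As one concrete example from that list, the JMP (nnnn) difference can be sketched in Python. This is an illustrative model rather than code from any particular core; `jmp_indirect_target` and the `cmos` flag are made-up names.

```python
def jmp_indirect_target(mem, ptr, cmos=False):
    """Return the target of JMP (ptr) for an NMOS 6502 or a 65C02 (cmos=True)."""
    lo = mem[ptr]
    if cmos:
        hi_addr = (ptr + 1) & 0xFFFF                        # carries into the high byte
    else:
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)     # NMOS: wraps within the page
    return lo | (mem[hi_addr] << 8)


mem = bytearray(0x10000)
mem[0x02FF] = 0x34      # low byte of the pointer
mem[0x0300] = 0x12      # high byte a 65C02 reads
mem[0x0200] = 0x56      # high byte an NMOS 6502 reads instead

print(hex(jmp_indirect_target(mem, 0x02FF, cmos=False)))   # 0x5634
print(hex(jmp_indirect_target(mem, 0x02FF, cmos=True)))    # 0x1234
```

The two cores only disagree when the pointer sits on the last byte of a page, which is why a NES compatibility mode has to reproduce the NMOS wrap rather than the corrected behavior.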
lidnariq
Posts: 11373
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: 6502 SBC and NES Clocking

Post by lidnariq »

I've heard anecdotally that some modern famiclones are CMOS and use a 65C02 core. Unfortunately, I haven't heard of any research about which.
Dwedit
Posts: 4877
Joined: Fri Nov 19, 2004 7:35 pm

Re: 6502 SBC and NES Clocking

Post by Dwedit »

gmcastil wrote: Mon Nov 20, 2023 5:45 am ...as I understood it, processor cycle times and counts determine gameplay speeds and my processor design is going to have the correct number of cycles and b) the CPU and PPU needed to have a fixed relationship...
Let's talk about the Dendy for a second.
The Dendy was designed to run games intended for ~60FPS NTSC on a 50FPS PAL TV.
To make the games work correctly, they kept the 3 PPU dots per 1 CPU cycle ratio. This keeps all the timed code working well, and even timing-sensitive games like Battletoads run with no problems.

Normal NTSC NES has 262 total scanlines: 240 visible scanlines, 1 postrender scanline, the vblank interrupt followed by 20 vblank scanlines, and 1 prerender scanline.
But Dendy has to generate a PAL picture, which is 312 total scanlines. So they added extra scanlines to the *postrender* part of the frame, making it 51 scanlines long instead of 1 scanline long.

Adding the extra scanlines on to the end of the frame also has a side effect of causing lag reduction. The game gets more CPU time to complete running the frame, so it is less likely to miss the deadline of the vblank period.
Some emulators even let you "overclock" the NES by acting like a Dendy, but then playing at ~60FPS rather than 50FPS.

So do "processor cycle times and counts determine gameplay speeds"? It determines if the CPU and PPU stay synchronized well enough to make raster effects work without any glitches or crashes, and it determines if you get slowdown in the same situations as original games.
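
Summing the per-region scanline counts quoted above gives a feel for how much extra CPU time the Dendy layout buys. The Python sketch below is only a rough estimate: it assumes 341 PPU dots per scanline (a figure not stated in this thread) and ignores any per-frame dot skipping.

```python
DOTS_PER_LINE = 341          # PPU dots per scanline (assumed; not stated in the thread)
DOTS_PER_CPU_CYCLE = 3       # the 3:1 ratio both machines keep

# (visible, postrender, vblank, prerender) scanline counts from the post above
NTSC = (240, 1, 20, 1)
DENDY = (240, 51, 20, 1)


def frame_stats(lines):
    """Return (total scanlines, CPU cycles per frame) for a scanline layout."""
    total = sum(lines)
    cpu_cycles = total * DOTS_PER_LINE / DOTS_PER_CPU_CYCLE
    return total, cpu_cycles


ntsc_total, ntsc_cycles = frame_stats(NTSC)
dendy_total, dendy_cycles = frame_stats(DENDY)

print("NTSC :", ntsc_total, "lines,", round(ntsc_cycles), "CPU cycles per frame")
print("Dendy:", dendy_total, "lines,", round(dendy_cycles), "CPU cycles per frame")
print("extra CPU cycles before vblank on Dendy:", round(dendy_cycles - ntsc_cycles))
```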

----------

So now, VGA.

Let's say you wanted to generate a standard 480 scanline VGA picture in real time without buffering a complete frame.
640x480 mode has 525 total scanlines per frame, 480 for the visible picture, and 45 scanlines for vblank and sync time.

How would you map that on to a NES?
You'd need two scanline buffers. One that's getting filled, and one that's getting output. You'd output a scanline buffer twice to generate two VGA lines.
Scanline buffer would need to be filled two VGA scanlines before it gets output to the screen.

VGA line counts:
45 lines not visible picture (10 front porch, 2 sync, 33 back porch)
480 visible picture

NES line counts:
22 lines not visible picture (2 scanlines of background color, 3 scanlines of blank, 3 scanlines of sync, 14 scanlines of blank, 1 prerender line)
240 visible picture

If doubled, the 22 scanlines becomes 44 scanlines. Just 1 VGA scanline short (1/2 NES scanline). Based on the Dendy example, you can add additional time before the vblank interrupt happens, and most games won't notice. So you add one half-scanline worth of time to happen before the NES gets a vblank interrupt.

This will change the framerate from 60.098FPS to 60FPS.

---

So then...
You're making a VGA picture (front porch, sync, back porch, visible picture)
Two VGA lines before it's time to output the VGA picture's visible area, you start running the NES's visible area, capturing to a scanline buffer. Each buffered scanline then gets output twice, once on each of two consecutive VGA scanlines.
One VGA line after the VGA picture ends, you give the NES a vblank interrupt.

Also VGA wants a dot clock of 25.175 MHz. Will need to generate that somehow.
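
The line budget above can be checked with a few lines of Python. This is just a sketch of the bookkeeping, with a stand-in "frame" where each scanline is represented by its index; the names are made up and nothing here models real sync timing.

```python
VGA_TOTAL, VGA_VISIBLE = 525, 480    # 640x480 line counts from the post
NES_TOTAL, NES_VISIBLE = 262, 240    # NTSC NES line counts

vga_blank = VGA_TOTAL - VGA_VISIBLE                  # non-visible VGA lines
nes_blank_doubled = 2 * (NES_TOTAL - NES_VISIBLE)    # non-visible NES lines, doubled
extra_vga_lines = vga_blank - nes_blank_doubled      # lines to pad per frame


def doubled_lines(nes_frame):
    """Yield each NES scanline twice, as a 2x line doubler would output them."""
    for line in nes_frame:
        yield line
        yield line


# Stand-in frame: each "scanline" is just its own index.
nes_frame = list(range(NES_VISIBLE))
vga_picture = list(doubled_lines(nes_frame))

print("VGA lines to pad per frame:", extra_vga_lines)
print("NES scanlines to add      :", extra_vga_lines / 2)
print("doubled picture height    :", len(vga_picture))
```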
TmEE
Posts: 949
Joined: Wed Feb 13, 2008 9:10 am
Location: Norway (50 and 60Hz compatible :P)

Re: 6502 SBC and NES Clocking

Post by TmEE »

One other thing that can be done is to freeze the NES side to make things fit into VGA (or some other resolution) specs. Games/software would be unaware that anything had happened, which may work better than inserting an excess line on the NES side.
gmcastil
Posts: 9
Joined: Sun Mar 12, 2017 4:47 pm

Re: 6502 SBC and NES Clocking

Post by gmcastil »

Dwedit wrote: Mon Nov 20, 2023 4:53 pm So do "processor cycle times and counts determine gameplay speeds"? It determines if the CPU and PPU stay synchronized well enough to make raster effects work without any glitches or crashes, and it determines if you get slowdown in the same situations as original games.
Yeah, keeping CPU and PPU enable signals synchronized is trivial, because they're created from the same source and they can't drift. This isn't a concern, given the way I've generated clock enables on my single clock domain.
Dwedit wrote: Mon Nov 20, 2023 4:53 pm So now, VGA.

Let's say you wanted to generate a standard 480 scanline VGA picture in real time without buffering a complete frame.
640x480 mode has 525 total scanlines per frame, 480 for the visible picture, and 45 scanlines for vblank and sync time.

How would you map that on to a NES?
You'd need two scanline buffers. One that's getting filled, and one that's getting output. You'd output a scanline buffer twice to generate two VGA lines.
Scanline buffer would need to be filled two VGA scanlines before it gets output to the screen.

VGA line counts:
45 lines not visible picture (10 front porch, 2 sync, 33 back porch)
480 visible picture

NES line counts:
22 lines not visible picture (2 scanlines of background color, 3 scanlines of blank, 3 scanlines of sync, 14 scanlines of blank, 1 prerender line)
240 visible picture

If doubled, the 22 scanlines becomes 44 scanlines. Just 1 VGA scanline short (1/2 NES scanline). Based on the Dendy example, you can add additional time before the vblank interrupt happens, and most games won't notice. So you add one half-scanline worth of time to happen before the NES gets a vblank interrupt.

This will change the framerate from 60.098FPS to 60FPS.
I'll make sure to refer back here when I'm actually implementing video output. None of that sounds crazy, but I don't really have much understanding for how the PPU works yet. I figured that was the sort of thing I would need to do though.
Dwedit wrote: Mon Nov 20, 2023 4:53 pm Also VGA wants a dot clock of 25.175 MHz. Will need to generate that somehow.
I can synthesize that from the same 100MHz that I'm using to create my reference clock (which then gets divided down to make the master clock). I've done it before on a VGA core I made here. The device I'm using has 4 MMCMs in total, of which one is in use now, so I should be fine.