Reliability of raster timing
Hello guys,
I am currently developing an NES game using the MMC3 mapper. It will basically be a typical jump'n'run game, where the screen is split between the game area at the top and a status display at the bottom.
For the raster split, I am using the MMC3 IRQ feature, which works quite well.
One feature of my raster routine is that it changes the palette mid-frame, to get a totally independent status display. As we all (probably) know, to change the palette one has to turn off PPU rendering, which in turn means that writing new colours produces coloured stripes in the blanked area. Fortunately, I was able to time my routine so that those stripes are hidden in the HBLANK.
I was able to get a stable routine which runs very well on both Nintendulator and Nestopia, which both seem to emulate the PPU behaviour correctly. I still haven't built my MMC3 devcart to test my routines on real hardware. So now my question is: how far can both emulators be relied on when coding timing-intensive raster routines?
Quote: Nestopia are very accurate, they even emulate the imperfection of the MMC3's counter quite well. But you'd still want to test on the real hardware after you've developed the routine, and possibly fine-tune the timing a little.
Absolutely, since there are slight differences between Nestopia and Nintendulator. To produce a clean, stable raster split on Nintendulator, I need to delay the writes to palette RAM by one more cycle than on Nestopia. But currently it looks perfect on all 3 major emulators (even FCE Ultra), so I am guessing (hoping) that it will work equally well on real hardware.
What I am most proud of is the fact that the routine works equally well in both PAL and NTSC mode (after lots of tweaking), despite PAL having considerably fewer cycles per line. Coming from the C64, this scenario is much worse than the 63 vs 64 vs 65 cycle issue on the C64.
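To put rough numbers on the hblank window and on what a one-cycle delay means on screen, here is a minimal sketch. It assumes 341 PPU dots per scanline with 256 of them visible, and 3 dots per CPU cycle on NTSC versus 3.2 on PAL; the function names are made up for illustration, and none of this is measured on hardware.

```python
# Rough hblank budget for mid-frame palette writes.
# Assumptions (not taken from the thread): 341 PPU dots per scanline,
# 256 visible dots, 3 dots per CPU cycle on NTSC and 3.2 on PAL.

DOTS_PER_LINE = 341
VISIBLE_DOTS = 256
HBLANK_DOTS = DOTS_PER_LINE - VISIBLE_DOTS  # dots outside the visible area

DOTS_PER_CPU_CYCLE = {"NTSC": 3.0, "PAL": 3.2}


def hblank_cpu_cycles(region):
    """CPU cycles that fit into the non-visible part of one scanline."""
    return HBLANK_DOTS / DOTS_PER_CPU_CYCLE[region]


def dots_shifted(region, cpu_cycles):
    """How many dots a write moves on screen when delayed by cpu_cycles."""
    return cpu_cycles * DOTS_PER_CPU_CYCLE[region]


if __name__ == "__main__":
    for region in ("NTSC", "PAL"):
        print(region,
              "hblank cycles:", round(hblank_cpu_cycles(region), 2),
              "dots per 1-cycle delay:", dots_shifted(region, 1))
```

Under these assumptions the window is roughly 28 CPU cycles on NTSC against about 26.5 on PAL, and a single cycle of delay moves a write by about three pixels, which is why a one-cycle disagreement between emulators is visible at all.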
Now the other issue is: should homebrew software be made so it can run on emulators, or should it further explore the quirks of the original hardware....
Quote: Now the other issue is: should homebrew software be made so it can run on emulators, or should it further explore the quirks of the original hardware.... ;)
I defy you to find things modern emulators don't implement, except a trick about reading $2004 that was recently found, which I don't think any emulator has implemented yet.
Useless, lumbering half-wits don't scare us.
Bregalad wrote: I defy you to find things modern emulators don't implement, except a trick about reading $2004 that was recently found, which I don't think any emulator has implemented yet.
Interesting answer, considering the fact that in your previous post you recommended checking my timing against real hardware.
As soon as you deliberately modify registers during screen rendering, especially in the middle of a scanline, I am quite sure one can produce code which will not display correctly using an emulator.
Even on the best explored systems, like the C64 or the Atari VCS, which absolutely require 99.9% perfect timing in order to run everything, there is still untapped potential left. I kind of doubt that NES emulation has reached the same kind of perfection, though Nestopia is really a huge step forward compared to emulators like FCE Ultra.
Don't take it the wrong way, I don't mean to bash anyone. But I sense a lot of untapped potential in the NES. The only thing which kind of destroys any attempt to unveil this potential is the huge difference between PAL and NTSC consoles: your timing will always be off depending on the region you develop for. Why on earth did Nintendo choose to do this? They could have easily divided the 26.58 MHz by 15, and PAL users would have had a 1.77 MHz CPU speed.
Anyway, thanks for your help!
Quote: Interesting answer, considering the fact that in your previous post, you recommend checking my timing against real hardware.
Of course you can't know if it will work on real hardware before trying. But it's now hard to encounter a situation where Nestopia or Nintendulator are "wrong". Maybe if you play around a lot with the registers, as you said, but this will not output anything useful on the screen, so there is no point in doing that except for tests.
Quote: Even on the best explored systems, like the C64 or the Atari VCS, which absolutely require 99.9% perfect timing in order to run everything, there is still untapped potential left. I kind of doubt that NES emulation has reached the same kind of perfection, though Nestopia is really a huge step forward compared to emulators like FCE Ultra.
This is possible, but I still defy you to find anything really new. I'm not saying this isn't possible or anything, I'm just defying you.
Bregalad wrote: Of course you can't know if it will work on real hardware before trying. But it's now hard to encounter a situation where Nestopia or Nintendulator are "wrong".
Hmm, I have already encountered a case where the two emulators behave slightly differently. In order to hide the colour RAM artifacts on Nintendulator, I have to add 1 extra cycle. Furthermore, artifacts caused by updating $2006 too late (when I was still figuring out the timing) look slightly different on the two emulators. So logically, one or both of those emulators must be wrong.
But at least, they DO implement a pixel based PPU renderer which makes them behave close to each other. Otherwise, it wouldn't be possible at all to do this kind of tweaking I did. Even the worst flickering raster looks perfectly stable on FCE Ultra...
Quote: Maybe if you play a lot with the register as you said, but this will not output anything useful on the screen so there is no point in doing that except for tests.
Ummm, what makes you so sure? For example, have you ever played with the colour emphasis bits in the middle of a scanline?
The reason why Atari VCS and C64 emulators are so accurate is that lots of people tried lots of weird things on those machines during their lifetime, with no one asking about "usefulness".
Quote: This is possible, but I still defy you to find anything really new. I'm not saying this isn't possible or anything, I'm just defying you.
Hmm, guess I need to collect some parts to build my devcart.
6502freak wrote: Why on earth did Nintendo choose to do this? They could have easily divided the 26.58 MHz by 15, and PAL users would have 1.77 MHz CPU speed.
When Nintendo designed the Famicom, they probably had no intention of bringing it to the States, much less to Europe. 26.58 MHz also won't neatly divide to the NTSC colorburst (and multiples of it) for the PPU. Anyway, the CPU speed hardly matters in porting, with the completely different video timing and all.
kyuusaku wrote: When Nintendo designed the Famicom, they probably had no intention of bringing it to the states much less to Europe. 26.58 MHz also won't neatly divide to the NTSC colorburst
Yes, but it does divide to the PAL colourburst. PAL NES systems are clocked with a 26.58 MHz master clock: 26.58 MHz / 6 = 4.43 MHz = PAL colourburst.
NTSC NES systems are clocked with a 21.48 MHz master clock: 21.48 MHz / 6 = 3.58 MHz = NTSC colourburst.
Now, the PAL pixel clock is generated by:
26.58 MHz / 5 = 5.32 MHz
So far, they solved this most elegantly, because the 5.32 MHz PAL pixel clock is very close to the NTSC pixel clock of 5.37 MHz (21.48 MHz / 4). With 341 PPU cycles per line, the PAL NES yields about 15.6 kHz and roughly 50 Hz using 312 lines.
But why on earth did they choose 26.58 MHz / 16 = 1.66 MHz, instead of 26.58 MHz / 15 = 1.77 MHz, which would be nearly identical to the NTSC CPU speed (21.48 MHz / 12 = 1.79 MHz)? Most PAL/NTSC conversion problems would have been eliminated. The sound hardware wouldn't need a PAL update, because the difference between 1.77 and 1.79 MHz is hardly noticeable. Furthermore, the formula 1 CPU cycle = 3 PPU pixels would be the same on PAL and NTSC.
Quote: (and multiples of) for the PPU. Anyways, the CPU speed is hardly a matter in porting, with the completely different video timing and all.
Once you start writing timing-intensive code, it matters a lot, because on PAL you have quite a few CPU cycles fewer per line than on NTSC. Of course you would still have 50 Hz vs. 60 Hz, but at least the video AND CPU timing would be nearly identical.
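The clock arithmetic in this post can be checked with a short sketch. It assumes the commonly documented master clock values of 21.477272 MHz (NTSC) and 26.601712 MHz (PAL), which the thread rounds to 21.48 and 26.58, and 341 PPU dots per scanline; the helper names are hypothetical.

```python
# Clock arithmetic for the NTSC and PAL NES.
# Assumptions: master clocks of 21.477272 MHz (NTSC) and 26.601712 MHz (PAL),
# and 341 PPU dots per scanline. The thread uses rounded clock values.

MASTER_HZ = {"NTSC": 21_477_272, "PAL": 26_601_712}
PPU_DIVIDER = {"NTSC": 4, "PAL": 5}
CPU_DIVIDER = {"NTSC": 12, "PAL": 16}
DOTS_PER_LINE = 341


def mhz(region, divider):
    """Master clock of `region` divided by `divider`, in MHz."""
    return MASTER_HZ[region] / divider / 1e6


def cpu_cycles_per_line(region, cpu_divider=None):
    """CPU cycles per scanline; pass cpu_divider to try a hypothetical divider."""
    if cpu_divider is None:
        cpu_divider = CPU_DIVIDER[region]
    dots_per_cpu_cycle = cpu_divider / PPU_DIVIDER[region]
    return DOTS_PER_LINE / dots_per_cpu_cycle


if __name__ == "__main__":
    print("PAL colourburst  :", round(mhz("PAL", 6), 2), "MHz")
    print("PAL CPU (/16)    :", round(mhz("PAL", 16), 2), "MHz")
    print("PAL CPU (/15)    :", round(mhz("PAL", 15), 2), "MHz")
    print("NTSC cycles/line :", round(cpu_cycles_per_line("NTSC"), 2))
    print("PAL cycles/line  :", round(cpu_cycles_per_line("PAL"), 2))
    print("PAL /15 (hypoth.):", round(cpu_cycles_per_line("PAL", 15), 2))
```

With the actual /16 divider the PAL CPU gets about 106.56 cycles per scanline against 113.67 on NTSC; with the hypothetical /15 divider both regions would land on the same figure.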
6502freak wrote: But why on earth did they choose 26.58 MHz / 16 = 1.66 MHz, instead of 26.58 MHz / 15 = 1.77 MHz, which would be nearly identical to the NTSC CPU speed (21.48 MHz / 12 = 1.79 MHz)? Most PAL/NTSC conversion problems would have been eliminated. The sound hardware wouldn't need a PAL update, because the difference between 1.77 and 1.79 MHz is hardly noticeable. Furthermore, the formula 1 CPU cycle = 3 PPU pixels would be the same on PAL & NTSC.
If I were to take a guess from what I learned in my digital systems lab, it allowed them to avoid a 4-input AND gate as the input to the clock divider reset. (/12: 4-bit divider, AND q2 & q3 and use the result as the reset signal. /16: no reset signal. /15: must AND all 4 outputs.) Since the NES dates to the era when ICs were designed almost purely by hand, I'm guessing it was easier to remove the (N)AND gate than to replace it with one approximately twice the size.
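The reset-gate argument can be illustrated with a small simulation. This is only a sketch of a generic 4-bit counter that is cleared as soon as it reaches the divisor; it is an assumption about how such a divider might be built, not a description of the actual 2A03 circuit, and both function names are invented here.

```python
def reset_gate_inputs(divisor, width=4):
    """Counter bits that must be ANDed to detect the count `divisor`.

    Returns an empty list when the counter wraps on its own (divisor == 2**width),
    in which case no reset gate is needed.
    """
    if divisor == 1 << width:
        return []
    return [bit for bit in range(width) if (divisor >> bit) & 1]


def simulate_divider(divisor, ticks, width=4):
    """Run the counter for `ticks` master cycles and return the counts seen."""
    gate = reset_gate_inputs(divisor, width)
    count, seen = 0, []
    for _ in range(ticks):
        seen.append(count)
        count = (count + 1) % (1 << width)
        # The AND gate only looks at the selected bits and clears the counter.
        if gate and all((count >> bit) & 1 for bit in gate):
            count = 0
    return seen


if __name__ == "__main__":
    for divisor in (12, 15, 16):
        print("/%d needs AND of bits %s" % (divisor, reset_gate_inputs(divisor)))
```

In this model /12 needs only q2 and q3, /15 needs all four outputs, and /16 needs no gate at all, matching the three cases listed in the post.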
Quote: Ummm, what makes you so sure? For example, ever played with the colour emphasize bits in the middle of a scanline? ;)
Yes. Commercial games also did so, and it takes effect instantly.
For NTSC and PAL, the rule is that if the VBlank works in NTSC it's possible to make it work in PAL, and if raster timing works in PAL it's possible to make it work in NTSC.
lidnariq wrote: If I were to take a guess from what I'd learned in my digital systems lab, it allowed them to not use a 4-input AND gate as the input to the clock divider reset. (/12: 4 bit divider, AND q2 & q3 and use it as the reset signal. /16: no reset signal. /15: must AND all 4 outputs).
I very much doubt that a single tiny gate is the reason for this.
Quote: Since the NES dates to the era when ICs were designed almost purely by hand, I'm guessing it was easier to remove the (N)AND gate than to replace it with one approximately twice the size.
That doesn't convince me, because the PAL colour encoding alone is more complex to implement than the NTSC one (you have to shift the phase on every other line). It requires quite a bit of rework. There are also other changes in the PAL PPU; for example, the line that toggles between 341 and 340 cycles is not present.
In the end, we'll never know for sure.
Bregalad wrote: Yes. Commercial games also did so, and it takes effect instantly.
Which commercial games change the emphasis bits in the MIDDLE OF A SCANLINE? Notice: SCANLINE, not SCREEN.
6502freak wrote: Doesn't convince me, because the PAL colour encoding alone is more complex to implement than the NTSC one (you have to shift the phases on every odd line). It requires quite a rework. There are also other changes in the PAL PPU, for example the 341/340 cycle toggling line is not present. In the end, we'll never know for sure.
I think the PAL change is easy, and requires very little space... but I need to do some research on the wiki before I mouth off.
The CPU and PPU were entirely separate, and may well have been updated by entirely different teams. As far as I know, the only differences in the PAL CPU were the divide-by-16 and the changes in the lookup tables for noise and DPCM. The sound generator was already fitted onto the CPU by pulling out the 6502's BCD mode, so it's not altogether unreasonable to think that space was at a premium on the CPU die.
Some more guesses, then: The division needs to be even, because the hardware is tremendously easier for a /6 or /8 which produces the high and low-going edges of the output clock, in comparison to two different comparisons which set it high and low respectively. Or the duty cycle of the resultant clock needs to be 50%, not 46%. Or they were lazy and removed the AND gate instead of drawing new silicon.
But yeah, we can only take educated guesses.
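The duty-cycle guess above can be made concrete with a tiny sketch. It assumes the divided clock can only change state on whole master cycles and stays high for half the divisor, rounded down; that is a simplification for illustration, not a statement about the real chip.

```python
from fractions import Fraction


def duty_cycle(divisor):
    """Duty cycle of a divide-by-`divisor` clock that is high for divisor // 2
    master cycles (assumption: edges fall only on whole master cycles)."""
    return Fraction(divisor // 2, divisor)


if __name__ == "__main__":
    for divisor in (12, 15, 16):
        print("/%d: %.1f%% high" % (divisor, 100 * float(duty_cycle(divisor))))
```

Under that assumption the even divisors give an exact 50% duty cycle, while /15 gives 7/15, a little under 47%.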