Fun with MMC3 Scanline IRQs

darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Re: Fun with MMC3 Scanline IRQs

Post by darryl.revok »

lidnariq wrote:you should know that FCEUX's renderer is scanline-at-a-time
Once I knew that it was safe, I started testing it on hardware, and I don't see any timing-related glitches anymore.
thefox wrote:you may be able to chain two IRQs to minimize latency
When are some times that this would be useful? It seems like the combination of MMC3 IRQs and just hScroll/vScroll setting is pretty lenient. I will eventually want to work in CHR bankswitching.

I noticed some cool raster effects in Ninja Gaiden III: https://www.youtube.com/watch?v=_FmWbIqe7FQ (at 00:17). Maybe this trick would be useful for something complex like that. (I wonder why it shows garbled tiles when bankswitching. I wonder if that's the emulator or the cartridge. Apparently nobody gets that far on the cartridge...)
tokumaru wrote:But those affect sprites too, so they won't help with creating more colorful backgrounds with overlaying sprites.
Yeah that's probably going to rule those out as well. The biggest problem is that the strips scroll vertically as well so they'd really stand out.

As for the tile and attribute fetching, I wondered if anyone had any thoughts on this.

What I did was basically this:
  • Keep a master hScroll value
  • Set hScroll hi/lo of each scanline split
  • Calculate each required variable separately for each scanline split: essentially, scroll values, counter reload value, nametable addresses for tiles and attributes, the last column for which tiles were fetched, and a separate copy of last frame's scroll position so that it doesn't get changed while waiting on scanlines
  • Read data from metatiles and write to zero page slots pretty much the same as normal, except in a loop, once per scanline split
  • Now the part that I feel is probably the messiest: in NMI, since not all of the preloaded tiles are used every time tiles are drawn, I basically made eight routines for writing background tiles to the nametable, one for each scanline strip. Then, as I check for new tiles needed, I pull the address of the routine from an array and JMP (indirect) to it. That means that my NMI is hardcoded for this particular level and would need to be changed depending on how many scanline splits there are.

Now that I've gotten past the first big hurdles in getting this working, my next step is reintegrating attributes. It's going to be a bit of a pain, but before I do, I was curious whether it seems like I'm taking a relatively logical approach so far. I'm a little concerned about how to integrate PRG switching, but I can't really see that far ahead. I know 8KB will be more than enough for each level, the way I have them designed, so I can probably squeeze the enemies for the levels into that space too. NMI, as I understand it, could really be swapped in anywhere, since it doesn't need access to other code. So I guess that would be a flexible part to change per level.
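For reference, the per-strip dispatch described in the list above could look roughly like the sketch below. It is not the actual code from this project: `splitIndex`, `jumpVector`, the `UpdaterLo`/`UpdaterHi` tables and the `UpdateSplitN` routines are made-up names, only three of the eight routines are shown, and the `.db` and `<`/`>` syntax assumes an asm6-style assembler.

```asm
; Hypothetical zero-page variables: splitIndex (0-based strip number),
; jumpVector (2 bytes). jumpVector must not sit at $xxFF, because the 6502's
; JMP (indirect) fetches the high byte from $xx00 in that case.
DispatchSplitUpdate:
  LDX splitIndex
  LDA UpdaterLo,X         ; low byte of the routine address
  STA jumpVector
  LDA UpdaterHi,X         ; high byte of the routine address
  STA jumpVector+1
  JMP (jumpVector)        ; each routine jumps back to UpdateDone

UpdaterLo:
  .db <UpdateSplit1, <UpdateSplit2, <UpdateSplit3
UpdaterHi:
  .db >UpdateSplit1, >UpdateSplit2, >UpdateSplit3

UpdateSplit1:
  ; (write strip 1's preloaded tiles to $2007 here)
  JMP UpdateDone
UpdateSplit2:
  ; (write strip 2's preloaded tiles to $2007 here)
  JMP UpdateDone
UpdateSplit3:
  ; (write strip 3's preloaded tiles to $2007 here)
  JMP UpdateDone

UpdateDone:
  ; continue with the rest of the NMI handler
```

With the addresses in a table like this, a level with a different number of splits might only need a different table rather than a rewritten NMI handler.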
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Post by thefox »

darryl.revok wrote:
thefox wrote:you may be able to chain two IRQs to minimize latency
When are some times that this would be useful? It seems like the combination of MMC3 IRQs and just hScroll/vScroll setting is pretty lenient. I will eventually want to work in CHR bankswitching.
Whenever you need to more precisely land on a specific pixel. That's all it is for. Obviously there's no need to use techniques like that if you can get your stuff to work without them. But it's at least good to be aware that the sync can be improved, if needed. Blargg's nmi_sync is an extreme (and impressive) example of this.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Post by darryl.revok »

thefox wrote:Whenever you need to more precisely land on a specific pixel.
I wonder if that would be precise enough for a vertical scroll split.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq »

Depends entirely on the code that's being interrupted by the interrupt.

A NOP slide will have 2 cycles of jitter; more complicated code will have more. Two CPU cycles is only six pixels, which is definitely tight enough to fit a full $2006-$2005-$2005-$2006 write in hblank. It's about as close as you can get to the 2 pixels of error of blargg's nmi_sync without ... well, either nmi_sync and timed code, or special interrupt hardware that will inject clockslides.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

thefox wrote:If you can't control what code is executing when the IRQ fires, you may be able to chain two IRQs to minimize latency.
This is a really cool idea! For most typical raster effects this wouldn't be necessary though.
darryl.revok wrote:When are some times that this would be useful? It seems like the combination of MMC3 IRQs and just hScroll/vScroll setting is pretty lenient.
The $2006/$2005/$2005/$2006 trick only needs the last 2 writes to happen during hblank, and if you pre-load both values before hblank starts, you only need 5 or so cycles for the writes themselves (the last cycle of the second to last write and the 4 cycles of the last write), out of the 21 you have before the PPU starts fetching NT/AT data for the next scanline, so you have about 16 cycles to absorb any possible latency, way more than the 7 cycles that the slowest 6502 instruction takes. So yeah, changing the scroll is perfectly safe even with the regular timing techniques.
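A rough sketch of that write sequence, with the last two values pre-loaded into X and Y so only two stores have to land in hblank. The variable names (`splitNametable`, `splitScrollX`, `splitScrollY`, `temp`) are made up for illustration, the wait for hblank is left out because it depends entirely on your IRQ timing, and the write toggle is assumed to be clear on entry.

```asm
; Hypothetical variables: splitNametable (0-3), splitScrollX (0-255),
; splitScrollY (0-239), temp.

  ; Precompute the final $2006 byte: ((Y & $F8) << 2) | (X >> 3)
  LDA splitScrollY
  AND #$F8
  ASL A
  ASL A
  STA temp
  LDA splitScrollX
  LSR A
  LSR A
  LSR A
  ORA temp
  TAY                   ; Y register = value for the last $2006 write
  LDX splitScrollX      ; X register = value for the second $2005 write

  ; First two writes: these can happen while the scanline is still drawing
  LDA splitNametable
  ASL A
  ASL A
  STA $2006             ; nametable number << 2
  LDA splitScrollY
  STA $2005             ; Y scroll

  ; (wait here until hblank; how depends on your IRQ timing)

  STX $2005             ; X scroll
  STY $2006             ; low address byte; the new scroll takes effect here
```

The two final stores are 4 cycles each, which lines up with the "5 or so cycles" figure above when you count from the last cycle of the `STX`.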
darryl.revok wrote:I will eventually want to work in CHR bankswitching.
Mid-screen CHR bankswitching is relatively painless because it's completely transparent to the PPU, since its registers are not used at all. How long it takes will depend on the mapper, and to time it right you have to keep in mind when during the scanline the PPU fetches the patterns for drawing the graphics. If you look at this very useful diagram, you'll notice that sprite patterns for the next scanline are fetched near the beginning of hblank, so you'll definitely want to switch sprite patterns before that, while the scanline is being rendered. As for background patterns, they start to be fetched 20 or so PPU cycles before the end of hblank, so you should have the new patterns loaded by then.
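On the MMC3 specifically, the switch itself comes down to a bank select write to $8000 followed by a bank number write to $8001. The sketch below swaps the 2 KB CHR bank at PPU $0000 (register R0). `newChrBank` and `bankSelectShadow` are hypothetical variables, and the select value assumes bits 6-7 of $8000 are clear in your setup; adjust it to whatever PRG/CHR mode your game uses.

```asm
; Hypothetical variables: newChrBank, bankSelectShadow (a copy of the last
; value the main program wrote to $8000).
  LDA #%00000000        ; select R0: 2 KB CHR bank at PPU $0000-$07FF
                        ; (bits 6-7 clear here; keep your game's mode bits)
  STA $8000
  LDA newChrBank
  STA $8001             ; the new bank is visible to the PPU from here on
  LDA bankSelectShadow  ; restore the main program's selection, in case the
  STA $8000             ; IRQ landed between its $8000 and $8001 writes
```

Where in the scanline the `STA $8001` should land depends on whether the bank holds sprite or background patterns, as described above.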
darryl.revok wrote:I noticed some cool raster effects in Ninja Gaiden III: https://www.youtube.com/watch?v=_FmWbIqe7FQ (at 00:17). Maybe this trick would be useful for something complex like that.
Back in the day, developers didn't really know all these crazy tricks we know today. The $2006/5/5/6 trick for example, certainly wasn't documented anywhere, because apparently no commercial game from back then ever used it. All games that changed the vertical scroll mid-screen did it for status bars or floor tiles that were always aligned to the tile grid, so the fine Y scroll would always be 0. Back then, people didn't have the time or dedication to find out the things that smart people like blargg do. As far as we know, most of the timing in the old games was done by trial and error, which is why even big name games like Super Mario Bros. 3, Kirby's Adventure and Mega Man 3 have noticeable glitches in their raster effects.
darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Post by darryl.revok »

Okay, I gotta throw this out there. I've been stuck on a bug that's driving me crazy. It does not appear on FCEUX but it appears on NES hardware and Nintendulator DX.

First off, when my game scrolls, I'm getting garbled tiles. This is new since I've adapted everything for split scrolling.

I've narrowed it down to the writing of two attributes. Only writing to the last row of attributes does this. If I disable those, it stops.

So, my first thought is bad math on the nametable address. Well, here's where it gets really confusing.

In troubleshooting, I set it to write that tile to an absolute location. Even THEN the glitch still happens. I am highly perplexed.

Is there any reason that this:

Code: Select all

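  LDA #$00
  STA $2000                 ; +1 address increment (this also clears the NMI-enable bit)

  LDA $2002                 ; Read PPU status to reset the high/low latch

  LDA #$23
  STA $2006                 ; High byte of the attribute address ($23D8)

  LDA #$D8
  STA $2006                 ; Low byte of the attribute address

  LDA preloadNametable00Attributes+8
  AND #%00001111            ; Keep only the low four attribute bits
  STA $2007                 ; Write the attribute byte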
Could write to anywhere but $23D8? Somehow it's getting into other attributes and sometimes even tiles...
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq »

Any chance you didn't disable rendering, or ran out of time before rendering automatically started?
darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Post by darryl.revok »

Actually... I had not been disabling rendering during NMI. Is that odd? If I set it to do so, I get crazy glitches with my scanline IRQ for some reason. I imagine this is pretty important.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq »

It shouldn't be "crazy" glitches...

During NMI you have ≈2200 cycles to do everything you want. If you run out of time, rendering starts unless you've explicitly disabled it. And that'll change the address in a (seemingly) random manner.

If you do disable rendering, and the time at which you re-enable rendering varies, then you could have the MMC3 IRQ vary by that much (since the MMC3 IRQ is counted in "number of lines rendered since IRQ enabled")
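To make the "since IRQ enabled" part concrete, here is a rough sketch of arming the MMC3 counter for the first split. `firstSplitLine` is a hypothetical variable, and the exact relationship between the latch value and the scanline the IRQ fires on is something to verify in a debugger rather than take from this sketch.

```asm
; Hypothetical variable: firstSplitLine (roughly the number of rendered
; scanlines to count before the first split).
  LDA firstSplitLine
  STA $C000             ; IRQ latch: the value the counter reloads from
  STA $C001             ; clear the counter so it reloads on its next clock
                        ; (the value written is ignored)
  STA $E000             ; disable IRQs and acknowledge any pending one
  STA $E001             ; enable IRQs again
```

Because the counter is clocked by the PPU's pattern fetches, it only advances while rendering is on. If this code runs at a fixed point but rendering is re-enabled at a varying time, the first split can shift along with it.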
darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Post by darryl.revok »

Hmmm... I'm going to need quite a bit of reworking. I see a max of 2400 cycles.

Now, the way my NMI is set up to handle tile fetches of varying rates, there is going to be a lot of variability in the number of cycles it takes to reach the end of the NMI.

Am I correct in understanding that what you're saying is that not disabling rendering during NMI managed to mask that variability without me realizing?

If that's correct, it's going to be easier for what I'm doing to reduce the max length than to time all of the possibilities to reach the end of NMI in the same number of cycles.

I'm wondering though if the rendering issue is actually the problem. Here's why:

My NMI starts loading attributes/tiles, by LDXing the number of scanline splits, then proceeds to decrement.
It calls the updater for split 8 first, this one glitches.
It then calls the updaters for splits 7-5. These are fine.
It calls the updater for split 4 next, this one glitches.
It calls the updater for splits 3-1, these are fine.

So it's not the last one that messes up.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq »

darryl.revok wrote:Am I correct in understanding that what you're saying is that not disabling rendering during NMI managed to mask that variability without me realizing?
That's what I am saying ... although now I'm wondering whether I actually saw the problem correctly.
darryl.revok wrote:It calls the updater for split 8 first, this one glitches.
[... others ...]
It calls the updater for split 4 next, this one glitches.
Nintendulator's debugger will let you find out what pixel/scanline these writes are happening on. In Nintendulator's GUI, rendering is active from scanline -1 through 239; NMI happens on scanline 241.

Should be an easy way to see if I'm entirely barking up the wrong tree.
darryl.revok
Posts: 520
Joined: Sat Jul 25, 2015 1:22 pm

Post by darryl.revok »

I think you were right. Nintendulator occasionally throws up an error that there was a write to $2007 during rendering. It wasn't the actual attribute being written that was causing the problem, but I believe removing the two attributes removed enough cycles that the problem didn't manifest during tile writing (which occurs after attributes).

Removing the tile updates but leaving attribute updates removed the issue.

I'm trying to tweak the code now. I managed to clear enough cycles to get one more of the attribute updates in there. I think if I can squeeze out a few more, I can get all of my updates in.

Thank you very much for the suggestion. This was driving me crazy.

So is it typical to disable rendering during NMI? I was looking this up a little bit, and it seems like not disabling rendering is wasting some cycles. viewtopic.php?f=2&t=11117 I'll definitely want to do that in my levels without scanline IRQs, but I don't even know if it would be possible to time everything right in this level with rendering disabled in NMI. The NMI could load anywhere from 0-8 sets of tiles and 0-8 sets of attributes. If it could all be timed out, it definitely wouldn't be easy.

Edit: In my quest for more cycles, I removed all of the reads from $2002. This made my IRQs a little jittery, so I added in one single read from $2002 at the beginning of the NMI, instead of before each nametable write, and that fixed the issue. From what I'm reading, that should be okay, but I have a few questions.

1. Okay, I understand that $2002 is read so that you can say for sure that $2006 is expecting a high byte, but I'm wondering why it ever wouldn't be. Every time that I've had to write to $2006 thus far, I've used two bytes. Is there ever a time that something will be done intentionally that causes a mismatch in address order, or is this just an error-proofing method, in case NMI hits at an inopportune time, or something?

2. Is there any particular reason why $2006 seems to be the only part of the system that's expecting the high byte first?

Okay, this one isn't related to the high/low latch, but it's something else I've been wondering. How long does setting the PPU to +32 increment stay in effect?

Let's say I'm writing a column of tiles. I write #%00000100 to $2000 for +32 increment mode. Now if I want to write attributes next, do I have to write #$00 to $2000, or will it default to +1 on the next write?

It's looking pretty tight in NMI, so I'm looking for any options available to save cycles.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

darryl.revok wrote:So is it typical to disable rendering during NMI?
I personally don't do it, but I was curious so I just debugged a handful of games I had lying around, and there doesn't seem to be a consensus. These are the games that did turn rendering off:

Code: Select all

Super Mario Bros.
Duck Tales
Street Fighter 2010
Bucky O'Hare
And these are the ones that didn't:

Code: Select all

Alfred Chicken
Balloon Fight
Felix the Cat
Galaxian
The 3-D Battles of World Runner
Gimmick!
I intentionally didn't test Battletoads or anything known to use forced blanking, because those obviously have to disable rendering.

Anyway, I guess that disabling rendering can also be seen as a safety measure... If your NMI handler blows the VRAM access budget by accident, there will be no persistent VRAM corruption; the screen will just jump for a frame. I guess this is why most of the games that disable rendering do it: to minimize the damage in case something goes wrong.
darryl.revok wrote:I was looking this up a little bit, and it seems like not disabling rendering is wasting some cycles.
Yeah, you can buy yourself a few more cycles of VRAM access during the pre-render scanline that way. I think I can get by without using those cycles, because there's some cleaning up I have to do in my NMI handler before setting the scroll anyway, and the pre-render scanline is a good time to do it.
darryl.revok wrote:In my quest for more cycles, I removed all of the reads from $2002. This made my IRQs a little jittery, so I added in one single read from $2002 at the beginning of the NMI, instead of before each nametable write, and that fixed the issue.
Most of the time I don't read $2002 at all, and I never had any problems with that. I'm extra careful to always perform $2005/$2006 writes in pairs, though.
darryl.revok wrote:1. Okay, I understand that $2002 is read so that you can say for sure that $2006 is expecting a high byte, but I'm wondering why it ever wouldn't be. Every time that I've had to write to $2006 thus far, I've used two bytes. Is there ever a time that something will be done intentionally that causes a mismatch in address order, or is this just an error-proofing method, in case NMI hits at an inopportune time, or something?
It must be a bug in your code. This flag doesn't change unless you read $2002 or write to $2005/$2006. I just noticed you didn't mention $2005... was that just an omission or did you not know that these registers share the even/odd write flag?
darryl.revok wrote:2. Is there any particular reason why $2006 seems to be the only part of the system that's expecting the high byte first?
Maybe the engineers who designed the PPU thought programmers would like to write addresses in the order humans read them. I don't think there's any technical reason for this; the designers simply had to pick a byte to go first and they decided on the high byte.
darryl.revok wrote:Okay, this one isn't related to the high/low latch, but something else I've been wondering. How long does setting the PPU to +32 increment stay in effect?
AFAIK, until you change that. I don't think there's anything automatic touching that setting.
darryl.revok wrote:Let's say I'm writing a column of tiles. I write #%00000100 to $2000 for +32 increment mode. Now if I want to write attributes next, do I have to write #$00 to $2000, or will it default to +1 on the next write?
It will definitely not default back, you have to change it yourself. If you're writing attributes for columns though, increments of 32 bytes can still be useful: Since each row of attributes is 8 bytes long, 32 bytes is equivalent to 4 rows, so you can update a full column by setting the address for the 1st byte and writing the 1st and 5th bytes, then set the address for the 2nd byte and write the 2nd and 6th bytes, and so on. This way you can write the 8 bytes of a column while setting the address only 4 times, instead of 8.
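The four-address column update described above could be sketched like this. `attrColumn`, `attrBuffer` and `addrLo` are hypothetical names, the target is assumed to be the attribute table at $23C0, and it is written as a loop only to keep the example short.

```asm
; Hypothetical variables: attrColumn (0-7), attrBuffer (8 bytes, one per
; attribute row, top to bottom), addrLo (scratch byte).
  LDA #%00000100        ; +32 increment; OR in whatever other $2000 bits you use
  STA $2000
  LDA attrColumn
  ORA #$C0
  STA addrLo            ; low byte of $23C0 + column
  LDY #0
AttrColumnLoop:
  LDA #$23
  STA $2006             ; high address byte
  LDA addrLo
  STA $2006             ; low address byte
  LDA attrBuffer,Y      ; attribute row Y
  STA $2007             ; address then advances by 32 = 4 attribute rows
  LDA attrBuffer+4,Y    ; attribute row Y+4
  STA $2007
  LDA addrLo
  CLC
  ADC #8                ; move down one attribute row for the next pair
  STA addrLo
  INY
  CPY #4
  BNE AttrColumnLoop
```

For column 7 the last pair lands at $23DF and $23FF, so the writes stay inside the attribute table.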
darryl.revok wrote:I'm looking for any options available to save cycles.
If you have any loops at all, you should really look into unrolling them. Even partially unrolling can have incredible results. For example, if you have a loop that counts each byte that's being copied, that's one decrement instruction + a branch for each byte (a total of 5 cycles), which is a lot of overhead for a single byte. If you're copying 20 bytes, that's 100 cycles you're losing, while only 160 cycles are actually being spent copying bytes (assuming 8 cycles per byte). If you partially unroll that loop and count pairs of bytes instead, copying 2 bytes per iteration, you'll be cutting back that overhead by half! The more you unroll, the less overhead you'll have.
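As a rough illustration of those numbers, here is a 20-byte copy written both ways. `tileBuffer` is a hypothetical buffer at an absolute (non-zero-page) address that does not cross a page, and because the loops count down, it is assumed to have been filled in reverse order.

```asm
; Rolled: 8 cycles of copying + 5 cycles of loop overhead per byte,
; about 260 cycles for 20 bytes.
  LDX #20
CopyLoop:
  LDA tileBuffer-1,X    ; 4 cycles
  STA $2007             ; 4 cycles
  DEX                   ; 2 cycles
  BNE CopyLoop          ; 3 cycles when taken

; Unrolled by two: 16 cycles of copying + 7 cycles of overhead per pair,
; about 230 cycles for 20 bytes.
  LDX #20
CopyLoop2:
  LDA tileBuffer-1,X    ; 4 cycles
  STA $2007             ; 4 cycles
  LDA tileBuffer-2,X    ; 4 cycles
  STA $2007             ; 4 cycles
  DEX                   ; 2 cycles
  DEX                   ; 2 cycles
  BNE CopyLoop2         ; 3 cycles when taken
```

This particular version still decrements the index twice per pair, so the overhead drops from 5 to 3.5 cycles per byte rather than a full half. Getting it down to 2.5 would need a counter that steps once per pair, for example by splitting the data into two buffers indexed by the same X.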
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

darryl.revok wrote:1. Okay, I understand that $2002 is read so that you can say for sure that $2006 is expecting a high byte, but I'm wondering why it ever wouldn't be.
If you're careful, it doesn't ever have to be read after startup. If you're not desperate to save 4 cycles, reading it once at the start of your NMI might be worthwhile just in case it corrects some edge case you missed.
darryl.revok wrote:2. Is there any particular reason why $2006 seems to be the only part of the system that's expecting the high byte first?
It's the only part of the PPU that has a 16-bit interface, so it's not really inconsistent with itself, or anything else in the NES that Nintendo actually designed. The 6502 of course is little endian, but it's a different component designed at a different time by different people.
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Post by thefox »

rainwarrior wrote:
darryl.revok wrote:1. Okay, I understand that $2002 is read so that you can say for sure that $2006 is expecting a high byte, but I'm wondering why it ever wouldn't be.
If you're careful, it doesn't ever have to be read after startup. If you're not desperate to save 4 cycles, reading it once at the start of your NMI might be worthwhile just in case it corrects some edge case you missed.
If you do read it at the start of NMI, be careful not to have things like bulk PPU uploads running in the main thread that could be affected by it. A $2002 read in NMI happening between the two PPU address writes in the main thread = bad news. (I have actually been bitten by this.)
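One possible way to guard against that (a common pattern, not something described in this thread) is a flag the main thread sets while it owns the PPU, which the NMI handler checks before touching any PPU register. `ppuBusy` is a hypothetical variable, and a real handler would also save X and Y if it uses them.

```asm
; Hypothetical variable: ppuBusy (nonzero while the main thread is mid-upload).

; Main thread, around a bulk upload:
  LDA #1
  STA ppuBusy
  ; (the $2006 address writes and $2007 data writes go here)
  LDA #0
  STA ppuBusy

; NMI handler:
NMI:
  PHA
  LDA ppuBusy
  BNE NMISkipPPU        ; main thread owns the PPU: leave $2002/$2006/$2007 alone
  LDA $2002             ; safe to reset the write toggle here
  ; (normal VRAM updates go here)
NMISkipPPU:
  PLA
  RTI
```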