Just how cranky is the PPU OAM?
Re: Just how cranky is the PPU OAM?
I realize now also that since I don't know exactly what The Three Stooges was doing (and I can't find the original story with Google; it may be in the really old email archive .zip on this site, though), I can't say for sure whether they turned the sprites off or not. The post did mention that the glitches went away when they sprayed it with freon to chill it (and that it only happened on a specific revision of the NES). So I wonder whether putting a heater on the PPU could make it happen on any NES, heheh.
- rainwarrior
- Posts: 8719
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
Re: Just how cranky is the PPU OAM?
Yeah, after thinking about the possibilities, I don't think there's any situation where the PPU won't need to read at least the Y coordinate of each entry. So, yeah, if one read refreshes an entire line of memory, then that should be perfectly fine as a refresh.
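To make the refresh argument concrete, here is a minimal Python sketch. It assumes, as Quietust's Visual2C02 observations in this thread suggest, that OAM is organized as rows of 8 bytes (two sprites per row) and that reading any byte refreshes its whole row; both are assumptions, not documented facts. Under them, reading only the Y byte (offset 0) of each of the 64 entries touches every row:

```python
OAM_SIZE = 256   # 64 sprites x 4 bytes
ROW_BYTES = 8    # assumed DRAM row width: two sprites per row

def rows_touched(addresses):
    """Return the set of (assumed) OAM rows refreshed by reading these addresses."""
    return {addr // ROW_BYTES for addr in addresses}

# Sprite evaluation reads byte 0 (the Y coordinate) of each of the 64 entries.
y_reads = range(0, OAM_SIZE, 4)
touched = rows_touched(y_reads)
all_rows = set(range(OAM_SIZE // ROW_BYTES))
print(len(touched), touched == all_rows)  # 32 True
```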
Re: Just how cranky is the PPU OAM?
So why can't we check out what happens in Visual2A03?
Re: Just how cranky is the PPU OAM?
Given the failure modes, these seem to be analog and/or timing effects, and I believe I was told that Visual6502 (and so also Visual2A03 and Visual2C02) assumes digital static discipline. Unfortunately, qmtpro.com seems to be down right now, so I can't look for e.g. a carry out line from the OAM address.
thefox wrote: Why wouldn't these work then?
$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx, $2003←$00
$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx, $2003←$FF, $2004←tx
Maybe the problem there is that they go FF→00(→FF)→00? I just wish I had any idea what to even try poking... Same problem blargg had originally, it seems.
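One way to check the FF→00 idea is to trace OAMADDR through each sequence. This Python sketch assumes only the usual model (an 8-bit OAMADDR loaded by $2003 and incremented by every $2004 write, wrapping $FF→$00) and records how the address reached $00 each time: by a wrap, or by an explicit $2003 write. `ty` and `tx` are placeholder values; they don't affect the address.

```python
def trace_oamaddr(writes):
    """Follow OAMADDR through a list of (register, value) writes.

    Returns (register, resulting OAMADDR, how) tuples, where `how` is "set" for a
    $2003 write and "inc" or "wrap" for the increment after a $2004 write.
    """
    addr = 0
    log = []
    for reg, value in writes:
        if reg == 0x2003:
            addr = value & 0xFF
            log.append((reg, addr, "set"))
        elif reg == 0x2004:
            # The data byte goes to OAM[addr]; then the 8-bit address increments.
            new = (addr + 1) & 0xFF
            log.append((reg, new, "wrap" if new == 0 else "inc"))
            addr = new
    return log

ty, tx = 0x80, 0x80  # placeholder coordinates (hypothetical)
works = [(0x2003, 0xFC), (0x2004, ty), (0x2004, 0xFC), (0x2004, 0xFC), (0x2004, tx)]
fails1 = works + [(0x2003, 0x00)]
fails2 = works + [(0x2003, 0xFF), (0x2004, tx)]

for name, seq in [("works", works), ("fails1", fails1), ("fails2", fails2)]:
    zeros = [how for _, a, how in trace_oamaddr(seq) if a == 0]
    print(name, zeros)
# works ['wrap']
# fails1 ['wrap', 'set']
# fails2 ['wrap', 'wrap']
```

Both failing sequences bring the address back to $00 a second time, once by an explicit write and once by a second wrap.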
- rainwarrior
Re: Just how cranky is the PPU OAM?
3gengames: That depends on whether Visual2A03 accurately emulates the physical causes of this problem.
Anyhow, I have to take back what I said about my example. I have now been able to get my program to run on my PowerPak Lite just fine. The problem was that I was writing my OAM before I wrote the nametable. By making sure the OAM write loop is the last thing I do before turning on rendering, it now comes up consistently (and will remain stable while rendering is on for as long as I've tested it, currently going on 20 minutes).
I dunno why it worked okay on my PowerPak earlier. Maybe it was just a cold day? (It's currently out of commission so I can only test with my PowerPak Lite at the moment; I am in the middle of making some modifications I have been putting off.)
According to FCEUX's cycle counter, the nametable write took 15422 clocks (~9ms). I suppose the OAM must be really sensitive to degradation then, more than most DRAM? I guess at minimum it has to be able to survive vblank plus a little bit of overlap (enough time to write all 256 bytes, i.e. 64 sprites, to $2004?), so maybe it really is starting to degrade after just 9ms.
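For reference, the cycle-to-time conversion, as a small Python sketch assuming the NTSC CPU clock (master clock 21.477272 MHz divided by 12):

```python
NTSC_CPU_HZ = 21_477_272 / 12   # ~1.789773 MHz (NTSC master clock / 12)

def cycles_to_ms(cycles, cpu_hz=NTSC_CPU_HZ):
    """Convert a CPU cycle count (e.g. from FCEUX's cycle counter) to milliseconds."""
    return cycles * 1000.0 / cpu_hz

print(round(cycles_to_ms(15422), 2))  # nametable upload: 8.62
```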
Edit: after further testing, even this is not quite enough! See below.
Last edited by rainwarrior on Fri Mar 15, 2013 9:19 pm, edited 1 time in total.
Re: Just how cranky is the PPU OAM?
thefox wrote: Why wouldn't these work then?
$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx, $2003←$00
$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx, $2003←$FF, $2004←tx
For the first one: The final [2003]=00 does somehow seem to disturb it; that might indicate that there is really a secondary address register that gets set to one value on address wraps from FFh to 00h, and to another value when manually setting the address to 00h.
For the second one: I would assume that it DOES work, technically. The only problem might be that lidnariq did set the tile number to FCh; if that is a blank tile then nothing would be displayed (assuming that the other sprites are also blank or offscreen).
EDIT: If tileno=FCh was the problem, then that might be the problem in both of the above cases.
But then, I don't understand why this "$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx" did work.
- rainwarrior
Re: Just how cranky is the PPU OAM?
Wow, after testing a bit more, it sometimes degrades even if the OAM write loop is the very last thing I do! It looks fine most of the time, but every 3 or so resets I'll be missing 1-5 sprites, I think always from the low indices (I've verified each of sprites 0-4 missing at least once after a reset; most often it is sprite 0 or 1).
The $2004 write loop (lda/sta/inx/bne) takes 3466 cycles (~2ms) until writing $2001 to enable rendering. I guess that's a bit longer than a vblank (~1.3ms). Wow, can it really degrade that fast?
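The vblank comparison can be worked out the same way. A Python sketch, assuming NTSC timing (20 vblank scanlines of 341 PPU dots, 3 dots per CPU cycle) and taking the 3466-cycle figure as measured above:

```python
NTSC_CPU_HZ = 21_477_272 / 12    # CPU clock
DOTS_PER_SCANLINE = 341
DOTS_PER_CPU_CYCLE = 3           # the NTSC PPU runs at 3x the CPU clock
VBLANK_SCANLINES = 20            # NTSC vertical blanking scanlines

vblank_cycles = VBLANK_SCANLINES * DOTS_PER_SCANLINE / DOTS_PER_CPU_CYCLE
loop_cycles = 3466               # measured $2004 loop, up to the $2001 write

def to_ms(cycles):
    return cycles * 1000.0 / NTSC_CPU_HZ

print(round(vblank_cycles), round(to_ms(vblank_cycles), 2))                  # 2273 1.27
print(round(to_ms(loop_cycles), 2), round(loop_cycles / vblank_cycles, 2))   # 1.94 1.52
```

So the loop runs roughly one and a half vblanks' worth of time with rendering off.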
Did someone say this problem doesn't exist on the PAL PPU? Did they have to re-engineer it to fix this because of the much longer blank time?
I guess an unrolled sequence of lda immediate / sta could probably write all 256 bytes in time, though it'd be more sensible to create an aligned block of data and DMA it, to save space (and cycles). Given how narrow the window seems to be, you might still have trouble if the end of your loop occurs near the start of vblank.
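A rough budget for the two options, as a Python sketch using standard 6502 cycle counts and NTSC timing. It assumes every OAM byte gets its own `LDA #imm / STA $2004` pair (no reuse of repeated values) and ignores whatever code produces the sprite data; DMA is taken at its worst case of 514 halted cycles plus the `LDA #>page / STA $4014` that starts it:

```python
VBLANK_CYCLES = 20 * 341 / 3       # ~2273 NTSC CPU cycles

# Unrolled "LDA #imm / STA $2004" per OAM byte, as (cycles, bytes of code):
LDA_IMM = (2, 2)
STA_ABS = (4, 3)
unrolled_cycles = 256 * (LDA_IMM[0] + STA_ABS[0])
unrolled_code = 256 * (LDA_IMM[1] + STA_ABS[1])

# OAM DMA: "LDA #>page / STA $4014", then the CPU is halted for 513 or 514 cycles.
dma_cycles = (2 + 4) + 514         # worst case
dma_code, dma_data = 5, 256        # 5 bytes of code plus one page-aligned 256-byte table

print(unrolled_cycles, unrolled_code, unrolled_cycles < VBLANK_CYCLES)  # 1536 1280 True
print(dma_cycles, dma_code + dma_data)                                   # 520 261
```

Both fit in vblank, but DMA is about three times faster and needs roughly a fifth of the ROM.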
Maybe it would be good advice to suggest only writing OAM during vblank (when rendering is on), so you can ensure that you never miss your refresh window. From my tests there is no need to write OAM every frame, but just getting the data in there reliably before turning on rendering seems really tricky without doing it in vblank.
Re: Just how cranky is the PPU OAM?
rainwarrior wrote: Did someone say this problem doesn't exist on the PAL PPU? Did they have to re-engineer it to fix this because of the much longer blank time?
Maybe related to these results I got a couple of years ago: viewtopic.php?p=81695#p81695
(Looking back at it, I have absolutely no idea why it wasn't working with forced vblank on back then, that seems... strange.)
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
Re: Just how cranky is the PPU OAM?
nocash wrote:
$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx, $2003←$00
$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx, $2003←$FF, $2004←tx
For the first one: The final [2003]=00 does somehow seem to disturb it; that might indicate that there is really a secondary address register that gets set to one value on address wraps from FFh to 00h, and to another value when manually setting the address to 00h.
For the second one: I would assume that it DOES work, technically. The only problem might be that lidnariq did set the tile number to FCh; if that is a blank tile then nothing would be displayed (assuming that the other sprites are also blank or offscreen).
[…]
But then, I don't understand why this "$2003←$FC, $2004←ty, $2004←$FC, $2004←$FC, $2004←tx" did work.
For the test I was doing, tileno=$FC is "<". tileno=$00 is "@". (folded ASCII, $40-$5F, then $20-$3F, repeated 8 times through all of CHR-RAM to mimic m218 using m7 so that I should get the same results using emulators as my test hardware)
In any case, my only guess is that somehow returning to address 0 twice in rapid succession causes it to somehow break things. Hence asking about whether the internal OAMADDR has a carry out or =0 detection anywhere.
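The folded-ASCII layout can be written as a tiny Python lookup. This is my reading of the description (tile & $3F, with $00-$1F showing $40-$5F and $20-$3F showing themselves), not lidnariq's actual CHR data:

```python
def folded_ascii(tile):
    """Character shown by `tile`: $40-$5F then $20-$3F, repeating every 64 tiles."""
    c = tile & 0x3F
    return chr(c + 0x40) if c < 0x20 else chr(c)

print(folded_ascii(0xFC), folded_ascii(0x00))  # < @
```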
rainwarrior wrote: Wow, can it really degrade that fast?
The contemporary Intel C2116 (16k × 1-bit DRAM, used in the IBM CGA card), or its documentation-findable equivalent NTE2117, specifies a refresh period of no more than 2ms, which is right around as cranky as the timing you're finding...
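Putting that 2ms figure in CPU cycles, as a Python sketch that assumes (unverified) that the OAM has a similar refresh requirement, using the NTSC CPU clock. The measured $2004 loop lands just under the limit, which would fit with it being marginal, and the low-index sprites (written first, so waiting longest) being the ones that go missing:

```python
NTSC_CPU_HZ = 21_477_272 / 12   # NTSC CPU clock, ~1.789773 MHz
REFRESH_MS = 2.0                # C2116/NTE2117 refresh limit (assumed to apply to OAM)
SCANLINE_CYCLES = 341 / 3       # one NTSC scanline in CPU cycles (~113.7)

budget = REFRESH_MS / 1000.0 * NTSC_CPU_HZ
print(round(budget))                          # 3580
print(round(budget / SCANLINE_CYCLES, 1))     # 31.5 scanlines fit in 2ms
for name, cycles in [("nametable upload", 15422), ("$2004 loop", 3466)]:
    print(name, cycles <= budget)
# nametable upload False
# $2004 loop True
```

While rendering, sprite evaluation revisits every row each scanline, far more often than this.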
Re: Just how cranky is the PPU OAM?
Some games don't do any sprite DMA at all for several frames on end, such as Metal Max when a text box is displayed.
Other games do a second sprite DMA in the middle of the screen.
RC Pro Am (PRG0) does weird things with sprites: It DMAs a bunch of FF bytes from ROM, then manually writes a few sprites with OAM writes. This happens during rendering time with the screen turned off. The game also does a normal sprite DMA during vblank time.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Re: Just how cranky is the PPU OAM?
A few days ago, I was testing something in visual2c02 and saw some very strange behavior, but after thinking about it for a while I may know what's going on:
The oddity I was seeing was that writing to $2003 would seemingly copy 8 bytes from the previously selected sprite "row" (00-07, 08-0F, 10-17, etc.) to the newly selected row. This turned out to be because my sprite RAM wasn't actually initialized with any values at all - every cell has two transistors, one to store "0" and one to store "1", so if a cell goes too long without being refreshed it will end up with "no value" in it, causing it to inherit values from another row if the sprite RAM address happens to change during the 'refresh' phase (as opposed to the 'bus precharge' phase).
Initializing all of sprite RAM with constant values caused this glitch to go away in the simulation, but it's very possible that this is exactly what's happening on the real chip when you write to $2003; when SPR-RAM increments on writes to $2004, it likely happens during precharge to avoid this glitch, so one possible way of 'safely' reading back sprite RAM might be to read the value and then immediately write it back so that the address increments at the right time. This might also explain the phenomenon of writing to $2003 post-DMA causing different sprites to be used for #0 and #1 (though that would seem to be an instance of values from the new page leaking over to the old page as a result of both rows being enabled at the same time, something which technically shouldn't happen).
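A toy Python model of the simulation behavior described above, purely illustrative. `ToyOAM`, `NO_VALUE`, and the rule that a row change during the 'refresh' phase lets undefined cells inherit the previously selected row are hypothetical simplifications of Quietust's description, not a model of real silicon. Address increments from $2004 are treated as happening during precharge, so they never copy:

```python
ROW_BYTES = 8
NO_VALUE = None   # a cell holding neither "0" nor "1" (never written / decayed)

class ToyOAM:
    """Hypothetical model of the Visual2C02 observation, not real hardware."""

    def __init__(self, initialized=False):
        self.cells = [0xFF if initialized else NO_VALUE] * 256
        self.addr = 0

    def write_2003(self, value):
        old_row = self.addr // ROW_BYTES
        new_row = (value & 0xFF) // ROW_BYTES
        if old_row != new_row:
            # Row changed during 'refresh': undefined cells in the new row pick up
            # the contents of the previously selected row.
            for i in range(ROW_BYTES):
                dst = new_row * ROW_BYTES + i
                if self.cells[dst] is NO_VALUE:
                    self.cells[dst] = self.cells[old_row * ROW_BYTES + i]
        self.addr = value & 0xFF

    def write_2004(self, value):
        # Store, then increment (assumed to happen during precharge: no copying).
        self.cells[self.addr] = value & 0xFF
        self.addr = (self.addr + 1) & 0xFF

for init in (False, True):
    oam = ToyOAM(initialized=init)
    oam.write_2003(0x00)
    for v in (0x10, 0x20, 0x30, 0x40):
        oam.write_2004(v)
    oam.write_2003(0x08)        # switch to row 1
    print(init, oam.cells[8:16])
```

With uninitialized cells, row 1 picks up the four bytes just written to row 0; with every cell initialized, row 1 keeps its $FF contents, matching the glitch going away in the simulation.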
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.
- rainwarrior
Re: Just how cranky is the PPU OAM?
Have a working PowerPak again. I retried my earlier test on it, and very strangely it works 90% of the time (i.e. about one reset in 10 I'm missing maybe 1 sprite). I suspect something on the PowerPak cartridge that differs from the PowerPak Lite somehow makes the OAM more stable. Very odd.
Re: Just how cranky is the PPU OAM?
Here's a simple test ROM: U/D/L/R move the "current" sprite, A and B rotate which sprite is current, Start changes what is done after the four bytes are written to the PPU:
It always starts $2003←4×Cur $2004←Y[Cur] $2004←Cur $2004←Cur&3 $2004←X[Cur]
-- : it does nothing further
RS : finishes with $2003←0
ST : finishes with $2003←$FF $2004←X[$3F]
The X and Y arrays are initialized with X[foo]:=(foo&7)×32+12 and Y[foo]:=foo×3+16. In an earlier version I initialized Y[foo]:=(foo>>3)×24+28, but that showed different weird behavior (the sprite "O" would flicker as I slowly filled up the OAM with "good" data, once its scanlines exceeded the 8-sprites-per-scanline limit). This version pre-fills OAM with all $FF before the interactive portion starts.
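The two Y layouts can be compared by counting how many sprites cover each scanline. A Python sketch assuming 8×8 sprites (the one-line vertical offset of sprite Y doesn't change the counts): with the earlier formula each group of 8 sprites shares a Y, so those bands already sit at the 8-per-scanline limit, while the new formula never puts more than 3 on a line:

```python
SPRITE_HEIGHT = 8  # 8x8 sprites assumed

def xy_new(foo):
    """Current test: X[foo] = (foo & 7) * 32 + 12, Y[foo] = foo * 3 + 16."""
    return (foo & 7) * 32 + 12, foo * 3 + 16

def y_old(foo):
    """Earlier version: Y[foo] = (foo >> 3) * 24 + 28."""
    return (foo >> 3) * 24 + 28

def max_per_scanline(ys):
    """Largest number of sprites whose 8-line span covers any one scanline."""
    counts = {}
    for y in ys:
        for line in range(y, y + SPRITE_HEIGHT):
            counts[line] = counts.get(line, 0) + 1
    return max(counts.values())

new_ys = [xy_new(f)[1] for f in range(64)]
old_ys = [y_old(f) for f in range(64)]
print(max_per_scanline(new_ys), max_per_scanline(old_ys))  # 3 8
```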
On an emulator (and on a 2C07?), the only shown effects are the known "sprite 0 and 1 instead render the sprites pointed to by OAMADDR an extra time". On my NES, I get these effects:
* Almost always, only the latter half written to of any given row of OAM shows. So if I step through the OAM forward, only even sprites appear. If I step backwards, only odd sprites appear. Except:
* Sprites 8 or 9 are usually copied into Sprite 6 or 11, the opposite of the direction I step (i.e. if I step through it as 5, 6, 7, 8, 9, 10, 11, 10, 9, 8, I can move 8, and there's an extra copy of the sprite. When I finally step back to sprite 6, the duplicate disappears. If I go the other direction, 9 is copied to 11. At least once, I'd seen sprites 8 and 9 made triple, but I haven't been able to recreate it with this test.)
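For comparison, the emulator-side effect can be sketched in Python. This assumes the effect is that the 8-byte OAM row containing OAMADDR (OAMADDR & $F8) is copied over sprites 0 and 1 when rendering starts, if that row isn't row 0. The helper names are hypothetical, and this models only the known effect, not the extra glitches seen on real hardware. `final_oamaddr` tracks the address each of the three test modes leaves behind:

```python
ROW_BYTES = 8

def apply_oamaddr_row_copy(oam, oamaddr):
    """If OAMADDR isn't in the first row when rendering starts, copy the 8-byte
    row containing it over the first 8 bytes (sprites 0 and 1)."""
    oam = list(oam)
    start = oamaddr & 0xF8
    if start != 0:
        oam[0:ROW_BYTES] = oam[start:start + ROW_BYTES]
    return oam

def final_oamaddr(cur, mode):
    """OAMADDR left behind for sprite `cur` in mode '--', 'RS', or 'ST'."""
    addr = (4 * cur + 4) & 0xFF   # $2003<-4*Cur, then four $2004 writes
    if mode == "RS":
        addr = 0                  # $2003<-0
    elif mode == "ST":
        addr = (0xFF + 1) & 0xFF  # $2003<-$FF, one $2004 write wraps to 0
    return addr

oam = list(range(256))            # dummy contents: byte i holds i
for mode in ("--", "RS", "ST"):
    out = apply_oamaddr_row_copy(oam, final_oamaddr(5, mode))
    print(mode, out[0:8])
# -- [24, 25, 26, 27, 28, 29, 30, 31]
# RS [0, 1, 2, 3, 4, 5, 6, 7]
# ST [0, 1, 2, 3, 4, 5, 6, 7]
```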
This test can be accurately emulated as ... well, really, anything with CHR-RAM. I've used it on mapper 218 hardware, and emulators as 0, 7, and 218.
I'd like it if other people could test it in their own hardware, just to make sure I'm not exceptional.
- Attachments
- oamtest2.zip (5.96 KiB)
Re: Just how cranky is the PPU OAM?
$2003←4×Cur $2004←Y[Cur] $2004←Cur $2004←Cur&3 $2004←X[Cur]
RS : finishes with $2003←0
ST : finishes with $2003←$FF $2004←X[$3F]
I'm assuming that notation translates to the following code:
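```asm
; Common part: point OAMADDR at sprite `cur` and write its 4 bytes
    LDA cur
    ASL A           ; A = cur * 4 (byte offset of the sprite)
    ASL A
    STA $2003       ; OAMADDR <- 4*cur
    TAX
    LDA oam,X       ; Y[cur] from the CPU-side copy
    STA $2004
    LDA cur         ; tile number = cur
    STA $2004
    AND #$03        ; attributes = cur & 3
    STA $2004
    LDA oam+3,X     ; X[cur]
    STA $2004

; The routine then ends with one of these alternative tails
; (RS or ST), or with nothing for the "--" mode.
RS:
    LDA #$00
    STA $2003       ; OAMADDR <- 0

ST:
    LDA #$FF
    STA $2003       ; OAMADDR <- $FF
    LDA oam+$FF     ; X[$3F] is byte 4*$3F+3 = $FF
    STA $2004       ; write it; OAMADDR wraps to $00
```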
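As a cross-check of the translation, a Python sketch listing the (OAM address, value) pairs the common part writes for a given `cur`, using a hypothetical CPU-side `shadow` table laid out like OAM and filled with lidnariq's X/Y formulas. It also confirms that `oam+$FF` is X[$3F]:

```python
def build_shadow():
    """Hypothetical CPU-side OAM copy: Y at byte 4n, X at byte 4n+3, rest $FF."""
    shadow = [0xFF] * 256
    for n in range(64):
        shadow[4 * n] = n * 3 + 16             # Y[n]
        shadow[4 * n + 3] = (n & 7) * 32 + 12  # X[n]
    return shadow

def sprite_writes(cur, shadow):
    """(OAM address, value) pairs written by the common part for sprite `cur`."""
    base = 4 * cur                             # LDA cur / ASL A / ASL A
    return [
        (base + 0, shadow[base]),              # LDA oam,X   -> Y[cur]
        (base + 1, cur),                       # LDA cur     -> tile number
        (base + 2, cur & 0x03),                # AND #$03    -> attribute byte
        (base + 3, shadow[base + 3]),          # LDA oam+3,X -> X[cur]
    ]

shadow = build_shadow()
print(sprite_writes(5, shadow))  # [(20, 31), (21, 5), (22, 1), (23, 172)]
print(4 * 0x3F + 3 == 0xFF)      # True: oam+$FF holds X[$3F]
```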