AtariAge "CPU comparison"

Discussion of hardware and software development for Super NES and Super Famicom.

Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: AtariAge "CPU comparison"

Post by Sik »

tepples wrote:Educated guess of the fetch pattern over the course of 16 pixels, at 2 pixels per 4-byte burst:
  1. BGA/window map (2 cells)
  2. BGB map (2 cells)
  3. BGA/window sliver for left tile of pair
  4. BGB sliver for left tile of pair
  5. sprite fetch?
  6. refresh?
  7. BGA/window sliver for right tile of pair
  8. BGB sliver for right tile of pair
Is that close?
Yeah now that you make me think about it I'm dumb, there's not enough space for a third background (maybe for more sprites instead? but if they were too strained for more color RAM I'm not hopeful about that). It goes like this, not necessarily in order though, and the fetches happen 16 pixels ahead of time (yes, it starts rendering from within border area already):
  • Plane A 2x cell entries
  • Plane B 2x cell entries
  • Plane A 1st cell sliver
  • Plane A 2nd cell sliver
  • Plane B 1st cell sliver
  • Plane B 2nd cell sliver
  • Sprite cell sliver
  • Free slot (sometimes refresh)
So yeah mostly like you said, although I think it just reads the pairs of slivers in a row instead of separately (being 16px ahead of time makes this feasible). I just arranged it in a neater way to understand =P (plane A can be either scroll or window, btw)
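A rough way to check the slot budget implied by the list above, as a minimal Python sketch. It assumes eight slots per 16 pixels and one 4-byte burst per slot (the figures quoted in this thread); the names and the "three slots per extra plane" count (one map slot plus two sliver slots, by analogy with planes A and B) are illustrative assumptions, not documented VDP behaviour.

```python
# Slot budget for one 16-pixel column, following the list in the post.
BYTES_PER_SLOT = 4       # one burst of four consecutive bytes
PIXELS_PER_GROUP = 16    # the fetch pattern repeats every 16 pixels

SLOTS = [
    "plane A map (2 cells)",
    "plane B map (2 cells)",
    "plane A sliver, 1st cell",
    "plane A sliver, 2nd cell",
    "plane B sliver, 1st cell",
    "plane B sliver, 2nd cell",
    "sprite sliver",
    "free slot (sometimes refresh)",
]

# Assumption: another plane would need the same three slots as A or B.
SLOTS_PER_EXTRA_PLANE = 3


def bytes_per_pixel(slots=SLOTS):
    """Average VRAM bytes fetched per displayed pixel."""
    return len(slots) * BYTES_PER_SLOT / PIXELS_PER_GROUP


def free_slots(slots=SLOTS):
    """Slots not committed to plane or sprite data."""
    return sum(1 for name in slots if name.startswith("free"))


def third_plane_fits(slots=SLOTS):
    """True if the free slots alone could feed one more plane."""
    return free_slots(slots) >= SLOTS_PER_EXTRA_PLANE
```

With these numbers the pattern works out to two bytes per pixel, and the single free slot falls short of the three an extra plane would need.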

In any case they had to use a faster clock for 320px than for 256px which resulted in larger bandwidth, so if the SNES had to do the same it'd end up with the same result (can fetch more stuff per line, so it doesn't lose anything). The only question is whether the memory was fast enough for this.

EDIT: btw the fetches above show why there's a bug if window plane is at the left side and scroll A moves horizontally, there's a 16px column where it would need to fetch data for both but it can't do it, so instead it repeats what it had fetched for the window.

EDIT 2: by the way, sprite table fetches happen during hblank, if anybody wonders.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: AtariAge "CPU comparison"

Post by tepples »

Sik wrote:I just arranged it in a neater way to understand
I arranged it to be more similar to the NES pattern (nametable, attribute, pattern plane 0, pattern plane 1).
Sik wrote:by the way, sprite table fetches happen during hblank, if anybody wonders.
So the exact opposite of NES and Super NES, where sprite table scanning happens in parallel with background fetches and the sprite patterns are fetched in hblank.

Correct me if I'm wrong, but it sounds like sprites appearing on line n are fetched earlier on the Genesis VDP:
  • Nintendo: table scan during draw of line n-1, then pattern fetch during hblank before line n
  • Sega (I think): table scan during hblank before line n-1, then pattern fetch during draw of line n-1
Sega's way would thus appear to need more secondary OAM inside the VDP than Nintendo's way. Perhaps the Genesis VDP has so little CRAM because it needed more die space for secondary OAM to design around a Nintendo patent.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: AtariAge "CPU comparison"

Post by Sik »

Or maybe that was just the idea they came up with? I mean, they weren't shy about trying to make a D-pad despite Nintendo having owned a patent on it (though Sega's had a different mechanism internally I believe). Heck, the term "D-pad" actually comes from Sega.

Anyway: the crucial difference here is that the VDP keeps a cache with half of the sprite table (more specifically: Y coordinates, size and link order). It scans this list to figure out which (up to) 20 sprites may appear on that line, and then proceeds to fetch their X coordinates and tile IDs. This means that it doesn't need to retrieve the entire table from VRAM, just about a quarter of it. The time spent scanning the cache can be used to fetch other data from memory. And yeah, sprites are entirely rendered a line ahead of time, which is why if you reenable display mid-screen the first visible line won't show any sprites. Nearly nobody notices though =P
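The two-step scan described above can be sketched roughly like this in Python. It is only an illustration under stated assumptions: the field names, the "link 0 ends the list" convention and the plain (un-offset) Y coordinates are simplifications chosen for the example, and only the 20-sprite limit and the cached fields (Y, size, link) come from the post.

```python
from dataclasses import dataclass

MAX_SPRITES_PER_LINE = 20  # limit quoted in the post


@dataclass
class CachedEntry:
    """The part of a sprite entry held in the on-chip cache (hypothetical layout)."""
    y: int             # top scanline of the sprite
    height_cells: int  # height in 8-pixel cells
    link: int          # index of the next sprite to visit; 0 ends the list


def sprites_on_line(cache, line):
    """Walk the link chain and return indices of sprites covering `line`."""
    hits = []
    index = 0                    # the walk starts at sprite 0
    for _ in range(len(cache)):  # bounded, so a looping chain still terminates
        entry = cache[index]
        if entry.y <= line < entry.y + 8 * entry.height_cells:
            hits.append(index)
            if len(hits) == MAX_SPRITES_PER_LINE:
                break
        index = entry.link
        if index == 0:           # assumed end-of-list marker
            break
    return hits
```

Only the sprites returned by the scan would then need their X coordinates and tile IDs read from VRAM, which is where the bandwidth saving comes from.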

Actually, technically this should result in less internal memory, right? If I recall correctly the PPU and SPPU keep all of the table on-die, while the VDP only needs to keep half the table this way. I wonder how much die space was spent on the linebuffers though (since to make this work it'd need two lines' worth of buffered data, at 7 bits per pixel).
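For a sense of scale, the linebuffer estimate is simple arithmetic. The sketch below assumes the two line widths mentioned earlier in the thread (320px and 256px) plus the "two lines at 7 bits per pixel" figure from the post; the function name is made up for the example, and the real buffers may be organised differently.

```python
def linebuffer_bits(width_px, bits_per_pixel=7, lines=2):
    """Total storage, in bits, for `lines` buffered scanlines."""
    return width_px * bits_per_pixel * lines


bits_320 = linebuffer_bits(320)  # 4480 bits, i.e. 560 bytes
bits_256 = linebuffer_bits(256)  # 3584 bits, i.e. 448 bytes
```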

EDIT: also remember Sega's original intention was to allow for scaling sprites. In this case fetching and rendering ahead of time was probably a good thing.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: AtariAge "CPU comparison"

Post by Drew Sebastino »

Sik wrote:most games just statically assign slots anyway. Seriously, you lot are the only ones obsessed with the idea =P
I'm not sure if you were talking directly to psycopathicteen, but I believe I was the one who came up with the idea because I wanted to use it, and Stef seems to be trying to do the same thing, so yeah... :lol:
Sik wrote:Most games actually would have those graphics compressed in ROM which makes streaming not much of a feasible option in the first place.
I don't think the people here are trying to make "most games". :lol:
psycopathicteen wrote:Yes, most people are perfectly fine with giving each enemy only 4 frames each.
Exactly.

Good luck trying to pull off anything like this without trying to dynamically allocate sprites in vram: https://www.youtube.com/watch?v=lMx4iLp-EAc
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: AtariAge "CPU comparison"

Post by Sik »

Espozo wrote:I'm not sure if you were talking directly to psycopathicteen
This forum in general.
Espozo wrote:Good luck trying to pull off anything like this without trying to dynamically allocate sprites in vram: https://www.youtube.com/watch?v=lMx4iLp-EAc
Most of the time you won't be trying to go that far though, here I'm seeing people behave like you absolutely need this even for simpler games =P

For the record, you could still probably get by changing sprites slots every so often (and just because an address is reserved for something it doesn't mean you can't stream it). That's still different from dynamically assigning addresses all the time. Stream those backgrounds though, I don't think there's much tiling going on. Also I'd try to push the large sprites into backgrounds where possible, luckily there are usually only one or two and normally not overlapping in a bad way.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: AtariAge "CPU comparison"

Post by Drew Sebastino »

Sik wrote:This forum in general.
Yeah, I'm stupid, the way you said it even implied a group of people.
Sik wrote:here I'm seeing people behave like you absolutely need this even for simpler games =P
Well, psycopathicteen and Stef are advanced programmers, and I already know psycopathicteen is doing something pretty ambitious. I don't know how much he needs to do this though, at least yet. However, I'm not a good programmer; I guess I'm good at coming up with ideas but not at implementing them. :lol:
Sik wrote:Stream those backgrounds though
It'd be impossible not to. I actually had a conversation about the feasibility of porting Metal Slug to the SNES (inside of the froyo topic) and it seems possible for the most part; you'd just need a giant cartridge. The most difficult part would be running out of colors for sprites, and I said you could put them on a list vertically and swap out the colors when necessary. This takes a surprisingly large amount of CPU time when you think about it, but the game does run at 30fps to begin with...

Anyway, yeah.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: AtariAge "CPU comparison"

Post by Sik »

Enemies pretty much always come in groups of similar types (which means a lot of palette sharing), and you can't go backwards in the level, so you could probably just swap palettes when a new group comes in. That'd be pretty cheap actually.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: AtariAge "CPU comparison"

Post by Drew Sebastino »

No, I mean it's down to the point that all objects onscreen at once will use too many palettes. However, it's very rare that there are objects with more than 8 different palettes on any given horizontal line. I tested this with several screenshots. It's possible for this to get messed up, but it would almost have to be deliberately done, at least in single player. Hell, if there are too many objects per line, they won't even display anyway, never mind the palette being incorrect. (Of course, it's not like this is synced together, but most objects would be about 32x32, so no more than 8 objects in general.) The problem is changing the colors in time. I think you can update a color per line per HDMA channel, but I think you can also time code to run during Hblank somehow and update more colors than possible with HDMA, they just have to be continuous.
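The screenshot test described above could be automated along these lines. This is a hedged Python sketch: the `(top, height, palette)` tuple format and the function names are invented for the example, and the limit of 8 is the per-line palette count mentioned in the post.

```python
from collections import defaultdict

MAX_SPRITE_PALETTES = 8  # per-line limit discussed in the post


def palettes_per_line(objects):
    """Map each scanline to the set of palette IDs used on it.

    `objects` is an iterable of (top, height, palette) tuples,
    a hypothetical format chosen for this sketch.
    """
    lines = defaultdict(set)
    for top, height, palette in objects:
        for y in range(top, top + height):
            lines[y].add(palette)
    return lines


def overloaded_lines(objects, limit=MAX_SPRITE_PALETTES):
    """Scanlines that would need more than `limit` distinct palettes."""
    return sorted(y for y, pals in palettes_per_line(objects).items()
                  if len(pals) > limit)
```

Any line this reports is one where the per-line colour swapping would have to give up and share or drop a palette.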

So yeah, if you wanted to get this to work, it'd require a crap ton of tile swapping and changing colors per line, but I see it working. 30 fps definitely helps to deal with dumb stuff like this that the Neo Geo original didn't even have to.
93143
Posts: 1371
Joined: Fri Jul 04, 2014 9:31 pm

Re: AtariAge "CPU comparison"

Post by 93143 »

Espozo wrote:I think you can also time code to run during Hblank somehow and update more colors than possible with HDMA, they just have to be continuous.
That'd be an H- (or HV-)position-triggered interrupt with a DMA transfer inside. And yes, the colours would have to be in a small number of contiguous chunks, no more than four and perhaps only one chunk, otherwise you wouldn't be able to just DMA them in; you'd have to do a bunch of maneuvering and it would never fit. Technically you could do 8 chunks of one colour each, but then why aren't you using HDMA?

Theoretically, I think up to 21 colours should fit if they're all in one chunk, but the timing would have to be ridiculously precise, and IRQs aren't that precise; you'd need timed code, and doing it every line would eat most or all of your CPU time. A 15-colour sprite palette should work, but you wouldn't be able to use HDMA at all, for anything, lest it bump part of the DMA out into the next active line.

The advantage of HDMA is that it's automatic and has very low overhead. Running an H-IRQ every line can eat a lot of CPU time even for simple tasks.
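One plausible back-of-the-envelope budget that lands on the same figure, as a Python sketch. The constants are commonly cited SNES timing numbers and are assumptions here, not something stated in the post: 1364 master clocks per scanline, 4 master clocks per dot, 256 active dots, 8 master clocks per DMA byte, and 2 bytes per colour. DMA setup and IRQ latency are ignored unless passed in as `overhead_clocks`.

```python
MASTER_CLOCKS_PER_LINE = 1364    # assumed typical NTSC scanline length
MASTER_CLOCKS_PER_DOT = 4        # assumed
ACTIVE_DOTS = 256                # visible pixels per line
MASTER_CLOCKS_PER_DMA_BYTE = 8   # assumed general-purpose DMA rate
BYTES_PER_COLOUR = 2             # one 15-bit colour entry


def colours_per_hblank(overhead_clocks=0):
    """How many whole colours one DMA could write outside active display."""
    blank = (MASTER_CLOCKS_PER_LINE
             - ACTIVE_DOTS * MASTER_CLOCKS_PER_DOT
             - overhead_clocks)
    return blank // (MASTER_CLOCKS_PER_DMA_BYTE * BYTES_PER_COLOUR)
```

With zero overhead this gives 21, and even a couple of dozen clocks of setup or jitter pushes it below that, which fits the point about needing ridiculously precise timing.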
Stef
Posts: 259
Joined: Mon Jul 01, 2013 11:25 am

Re: AtariAge "CPU comparison"

Post by Stef »

Sik wrote:And most games just statically assign slots anyway. Seriously, you lot are the only ones obsessed with the idea =P Most games actually would have those graphics compressed in ROM which makes streaming not much of a feasible option in the first place.
Hm, I don't really agree with that. I think most games use a mix of statically and dynamically allocated sprites in their engines.
Having everything static is a big constraint on game design: you have to count how many sprites of each kind and size a level needs, work out how to recycle them, and so on. In terms of level and code design, having everything statically allocated can be really painful.
Dynamic resource allocation offers much more freedom and makes the graphics engine code simpler and less convoluted. You "just" need an efficient sprite engine capable of dealing with dynamic resource allocation, and that's it.
Also, in my case I'm basically developing an API. I want it to be simple but still powerful and flexible, so you can use it for almost any game situation. You always have the choice to build your own engine and only use the low-level methods, but you will spend much more time doing that.
Sik wrote:The Mega Drive has a completely different problem, which is the fact it has slower memory altogether (all accesses have to be in bursts of four consecutive bytes, and it only has enough time to read two bytes per pixel). To be fair they could have probably gotten room for a third background plane if they got rid of the free slots (much like the SNES does), although some of those were reserved for memory refresh so that may have been an issue =O)
Not sure what you mean by slower memory altogether, but given the video memory in each system I don't think we can say that. The MD VRAM is specially designed to give (very) fast burst reads but slower random accesses. The VDP was designed around this special memory to take advantage of it (in practice it only partially does), and if you count the total bandwidth you get from each system, you end up with a bit more on the MD than on the SNES.
Last edited by Stef on Wed May 04, 2016 1:04 pm, edited 1 time in total.
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Re: AtariAge "CPU comparison"

Post by psycopathicteen »

For the Genesis, I think you can avoid fragmentation problems by having objects use multiples of 16 tiles, and then use power-of-2 sizes for objects/sprites that are smaller than 32x32.
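The sizing rule above can be written down as a small helper. A minimal Python sketch, assuming 8x8 tiles so that 32x32 pixels equals 16 tiles; the function name and the treatment of counts between 9 and 15 (rounded up to 16) are choices made for the example.

```python
SLOT_TILES = 16  # a 32x32-pixel block of 8x8 tiles


def rounded_tiles(tiles):
    """Round an object's tile count up to an allocation size.

    Below 16 tiles: next power of two. Otherwise: next multiple of 16.
    """
    if tiles <= 0:
        raise ValueError("tile count must be positive")
    if tiles < SLOT_TILES:
        size = 1
        while size < tiles:
            size *= 2
        return size
    return -(-tiles // SLOT_TILES) * SLOT_TILES  # ceiling to a multiple of 16
```

Because every allocation is then a power of two or a whole number of 16-tile blocks, a freed region can always be reused by a same-sized or smaller object without leaving odd-sized gaps, at the cost of some padding.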
Last edited by psycopathicteen on Wed May 04, 2016 2:44 pm, edited 1 time in total.
Stef
Posts: 259
Joined: Mon Jul 01, 2013 11:25 am

Re: AtariAge "CPU comparison"

Post by Stef »

psycopathicteen wrote:For the Genesis, I think it you can avoid defragment problems by having objects use multiples of 16 tiles, and then use power of 2 sizes, for objects/sprites that are smaller than 32x32.
Indeed it can help, but honestly that hurts :p Having non-square sprites is one of the strengths of the MD's sprite capabilities; it is really handy for limiting the number of sprites used and the scanline sprite overflow.
HihiDanni
Posts: 186
Joined: Tue Apr 05, 2016 5:25 pm

Re: AtariAge "CPU comparison"

Post by HihiDanni »

Nobody says you have to use the full slot when drawing a single sprite. You can just use a 16-tile slot for a 4x1 tile graphic, though since you have space leftover you might as well cram in a few frames of animation so there is less DMAing going on.

My game is actually going to be using 32x32 pixel slots, which will hold both 32x32 and 16x16 sprites.
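A fixed-slot scheme like this needs very little bookkeeping. The following Python sketch shows one way it might look; the class and method names are hypothetical, and it deliberately hands out slot numbers only, since turning a slot number into a VRAM address depends on the system's tile layout.

```python
class SlotAllocator:
    """Free-list allocator for equally sized sprite graphics slots."""

    def __init__(self, slot_count):
        # Reverse order so pop() hands out slot 0 first.
        self._free = list(range(slot_count - 1, -1, -1))
        self._used = set()

    def alloc(self):
        """Return a free slot number, or None if every slot is taken."""
        if not self._free:
            return None
        slot = self._free.pop()
        self._used.add(slot)
        return slot

    def release(self, slot):
        """Give a slot back; releasing an unallocated slot is an error."""
        if slot not in self._used:
            raise ValueError(f"slot {slot} is not allocated")
        self._used.remove(slot)
        self._free.append(slot)
```

Since every slot is the same size, any freed slot fits any new sprite, so fragmentation cannot occur; a 16x16 sprite simply leaves part of its slot unused or filled with extra animation frames.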
SNES NTSC 2/1/3 1CHIP | serial number UN318588627
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Re: AtariAge "CPU comparison"

Post by psycopathicteen »

Stef wrote:
psycopathicteen wrote:For the Genesis, I think it you can avoid defragment problems by having objects use multiples of 16 tiles, and then use power of 2 sizes, for objects/sprites that are smaller than 32x32.
Indeed it can help but honestly that hurts :p Having no square sprite is one of the strength of the MD sprite capabilities, it is really handy to limit the number of sprite to use and the scanline sprite overflow.
I meant power of 2 for smaller sized sprites/metasprites. Larger metasprites would be multiples of 16.
Espozo wrote:I don't know how much he needs to do this though, at least yet.
You mean, how far it is to what I want it? For the most part, it's what I want. The only thing I'm still deciding is which animation method to do fire/energy balls, and elevators. It feels weird that there is always one static fireball and elevator sprite in VRAM, when I only use fireballs during boss fights, and I only use an elevator in one part of the level. I'll probably come up with an "automatic animation generator" mode, where certain metasprites trigger a generator to turn on, and there's a routine that automatically runs the animation for as long as sprites of that type are onscreen.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: AtariAge "CPU comparison"

Post by Drew Sebastino »

psycopathicteen wrote:You mean, how far it is to what I want it? For the most part, it's what I want.
I mean how much you need this system for it to be possible to display the level of graphics you currently are. By checking duplicate tiles, the number of tiles across all the small robots onscreen would be greatly reduced.
Stef wrote:
psycopathicteen wrote:For the Genesis, I think it you can avoid defragment problems by having objects use multiples of 16 tiles, and then use power of 2 sizes, for objects/sprites that are smaller than 32x32.
Indeed it can help but honestly that hurts :p Having no square sprite is one of the strength of the MD sprite capabilities, it is really handy to limit the number of sprite to use and the scanline sprite overflow.
It's kind of funny how the backwards sprite capabilities of the SNES actually kind of come in handy here. There are fewer (by a lot) sprite sizes, but there are also about one and a half times as many sprites on the SNES.
psycopathicteen wrote:The only thing I'm still deciding is which animation method to do fire/energy balls, and elevators.
Personally, I'd do them the same way as any other object. They don't have to be erased from VRAM every time they're not onscreen; I'd code it so that they could only be overwritten once the boss is defeated. Shoot, I'd do every object outside of a status bar (if that even counts) or something extremely commonplace and non-changing in size (like coins) this way.

One kind of random thing I've noticed is the complete lack of sprite status bars that take advantage of the fact that each tile within a sprite can be different in VRAM. For example, if I were making a game with 16x16 and 32x32 sprites and wanted to build the status bar out of sprites, I wouldn't give each number its own 16x16 tile, halving bandwidth where the score is; I'd put a line of unique 16x16s together and DMA tile data to wherever on the sprite I want to overwrite. So instead of having 16x16 tiles of the numbers 00-99, you'd just have 0-9. This helps further if you want to use the top or bottom 8 pixels for displaying something else.