Map rendering for side scroller, coding question

Banshaku
Posts: 2404
Joined: Tue Jun 24, 2008 8:38 pm
Location: Japan

Post by Banshaku »

There is one approach I want to try but I'm not sure if it's good or not. I'm aware that I'm not always good at explaining things so I will do my best to make it clear.

The goal is a side scroller. I have 2x2 meta-metatiles, each made of 2x2 metatiles (which means a meta-metatile is 4x4 tiles).

The metatiles are exported in column order. This means the content order will be:

Code: Select all

  1 3
  2 4
The meta-metatiles have the same format. The map data is saved in column order too.

What I want to do is to write the map column by column. Let's say we have 7 meta-metatiles per column.

The PPU is put in column mode (increment 32). Now when I read the first meta-metatile, I want to write the first "column" of the meta-metatile to the screen. This means I will write all the A tiles first, as shown in the example below:

Code: Select all

Meta-metatile content

    AB      CD

 1  13   3  13
    24      24

 2  13   4  13
    24      24
This means I will read tiles 1-2 of metatile 1 first, then repeat the same process for metatile 2. Then I will skip to the next meta-metatile and repeat the same process (writing column A) for the 6 meta-metatiles left. Once the first column is done, the PPU address will be in column 2 of the screen (not really, there are 2 tiles left, but let's assume it is) and I must repeat the same process 3 more times (columns B, C, D).

Repeat the same idea for the rest of the columns in the map. The only thing left to think about is when to write the background attributes. My guess is that doing it at the end of map rendering is better because it avoids changing the PPU address pointer many times.
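
To make the column walk concrete, here is a minimal Python model of it (Python only for illustration; the real thing would be 6502 code). The table names `metatiles` and `meta_metatiles`, the sample tile numbers and the function `tile_column` are all assumptions made up for this sketch. Both block levels use the column order described above (index 0 top-left, 1 bottom-left, 2 top-right, 3 bottom-right).

```python
# Sample data, invented for the example. Each block lists its 4 children in
# column order: [top-left, bottom-left, top-right, bottom-right].
metatiles = [
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
    [0x40, 0x41, 0x42, 0x43],
]
meta_metatiles = [
    [0, 1, 2, 3],          # meta-metatile 0 is built from metatiles 0..3
]
map_column = [0] * 7       # one map column = 7 meta-metatile indexes


def tile_column(map_column, col):
    """Tiles of tile column `col` (0..3 = A..D), top to bottom."""
    half = col // 2        # 0: left pair of metatiles, 1: right pair
    sub = col % 2          # 0: left tiles of a metatile, 1: right tiles
    out = []
    for mm_index in map_column:
        mm = meta_metatiles[mm_index]
        for row in (0, 1):                  # upper metatile, then lower one
            mt = metatiles[mm[half * 2 + row]]
            out.append(mt[sub * 2])         # upper tile of this column
            out.append(mt[sub * 2 + 1])     # lower tile of this column
    return out


column_a = tile_column(map_column, 0)
print(len(column_a), [hex(t) for t in column_a[:4]])
```

Each call returns 28 tiles (7 meta-metatiles of 4 tiles), which is why two rows of the 30-row name table are left over. Note that every meta-metatile is looked up once per tile column, so four times in total.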

Does this approach make sense?
Thanks for any comments.
Memblers
Site Admin
Posts: 3901
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

Yeah, that's the best way to do it.
Banshaku wrote:My guess at the end of map rendering is better because this will avoid to change the PPU address pointer many times.
Just make sure the map rendering is completely separate from any PPU access, because there's not a lot of vblank time. As long as it goes into a buffer that's output during NMI, you can do just about anything.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

This is one way to do it, but I have another suggestion that might make attribute handling easier. Instead of decoding a tile column at a time for every 8 pixels scrolled, you might consider decoding whole meta-metatiles at once.

Sure, the buffer needed to hold the tiles will have to be larger, but handling full blocks at once is easier than skipping and such. And since the area being decoded at once is conveniently as wide as the area covered by an attribute byte, attributes will be easier to code.

So, whenever the camera scrolls 32 pixels (the width of one meta-metatile) you could decode 7 whole meta-metatiles for a total of 112 bytes of name table data plus 7 bytes of attribute data. Just spread that amount over the next few VBlanks and you should be fine. If you arrange your buffers well and use the index registers cleverly, you can have the same piece of code always decode the blocks to the buffers and the same piece of code always send the buffers to the PPU, without having to handle special cases depending on what column you are rendering, and without wasting time going through the same blocks multiple times just to fetch different parts of them each time.

Anyway, this is just a suggestion, an alternative to the way you first proposed, but both ways are fine of course. Since you thought of the other way first, you might be more comfortable implementing that.

Most people have a really hard time handling attributes in scrolling games where tile data is updated on the fly, so you might want to consider them right from the start when designing your rendering engine. It's much easier when the game only scrolls horizontally (no need to worry about vertical misalignment of attribute bytes), and you might have chosen to define attributes at the meta-metatile level (a meta-metatile is conveniently the same size as the area covered by an attribute byte), but I thought I'd mention it.
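
As a rough illustration of the attribute side, here is a small Python sketch. It assumes each of the 4 metatiles in a meta-metatile carries its own 2-bit palette number (the thread does not say how palettes are stored, so this is a guess), and the function names are made up. The bit layout and the $3C0 offset are the standard NES attribute table layout.

```python
def attribute_byte(top_left, top_right, bottom_left, bottom_right):
    """Pack four 2-bit palette numbers into one attribute byte."""
    return ((top_left & 3)
            | ((top_right & 3) << 2)
            | ((bottom_left & 3) << 4)
            | ((bottom_right & 3) << 6))


def attribute_address(nametable_base, block_col, block_row):
    """Address of the attribute byte for the 32x32 pixel block at (col, row)."""
    return nametable_base + 0x3C0 + block_row * 8 + block_col


# Attribute addresses for block column 3 of the name table at $2000.
column = [attribute_address(0x2000, 3, row) for row in range(7)]
print([hex(a) for a in column])
```

The 7 attribute bytes of one block column are 8 bytes apart, so neither the +1 nor the +32 increment walks down them; each one needs its own address write.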
Banshaku wrote:Once the first column is over, the PPU address will now be in column 2 (not really, there are 2 tiles left but let's assume it is) of the screen.
Well, even if you did write all 30 tiles of a column, the address would not automatically move on to the second column. It would go into the attribute table and then into the next name table, so you'd still have to set the address for every tile column.

Post by Banshaku »

Memblers wrote:Just make sure the map rendering is completely separate from any PPU access, because there's not a lot of vblank time. As long as it's in a buffer that's output during NMI you can do about anything.
Thanks for the comment. I will keep that in mind.
tokumaru wrote:Instead of decoding a tile column at a time for every 8 pixels scrolled, you might consider decoding whole meta-metatiles at once.
So far I haven't said anything about decoding only 1 tile column for scrolling, since I haven't yet considered in detail how to implement it. What I had in mind is to always decode a whole column of meta-metatiles into 4 columns of tiles. But this is still only theoretical: I don't have any code, I'm still thinking about how to approach the problem. The explanation above is how I would decode a column of meta-metatiles.
tokumaru wrote:Sure, the buffer needed to hold the tiles will have to be larger, but handling full blocks at once is easier than skipping and such. And since the area being decoded at once is conveniently as wide as the area covered by an attribute byte, attributes will be easier to code.
I fully agree. Having to skip columns of tiles inside a metatile seems like a pain.

tokumaru wrote:without having to handle special cases depending on what column you are rendering, and without wasting time going through the same blocks multiple times just to fetch different parts of them each time.
For now my approach requires reading each meta-metatile 4 times. If you process each meta-metatile only once, does the data format have an impact? Should it be column based or row based? What is the impact on the buffer and on the way the buffer is written to the PPU?
tokumaru wrote:since you thought of the other way first you might be more comfortable implementing that.
Since it's all theoretical and I don't mind about the data format, because my editor can produce the format I want (or I can program it), either way is fine with me. I'm just trying to find the way that is simplest to develop, since my 6502 coding has become quite rusty already ^^;;

tokumaru wrote:Well, even if you did write all 30 tiles of a column the address would not automatically move on to the second column, it would go into the attribute table and then into the next name table, so you'd still have to set the address for every tile column.
Oh. I thought the column increment was a little bit more intelligent than that. I hadn't seen any mention of it before. For the attribute table I can see why, since it's at the end of the name table, but why does it jump into the second name table after that? Hmm... I guess it's related to the way the memory is organized. I need to check the wiki or some doc to figure this one out.

Thanks for the comment Tokumaru.

Edit:

I checked and I can see why now. If I use vertical mirroring and write 1 tile at a time starting at $2000 with an increment of 32, once I have written the last tile I end up at $23C0. Two more increments and it becomes $2400, the first column of the second name table. So it just keeps increasing the address counter.
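
The arithmetic above can be checked with a few lines of Python. This is only a model of the address counter (the function names are made up for the sketch); it adds the increment after every write and knows nothing about name table boundaries.

```python
def ppu_address_after(start, writes, increment=32):
    """Address held by the PPU after `writes` writes to $2007."""
    addr = start
    for _ in range(writes):
        addr = (addr + increment) & 0x3FFF   # plain addition, no column logic
    return addr


def column_start(nametable_base, tile_col):
    """Address to set before drawing tile column `tile_col` (0..31)."""
    return nametable_base + tile_col


print(hex(ppu_address_after(0x2000, 30)))    # after a full 30-tile column
print(hex(ppu_address_after(0x2000, 32)))    # two more increments
print(hex(column_start(0x2000, 1)))          # where column 2 really starts
```

Since the counter never lands on the top of the next column by itself, the address has to be set again before every tile column.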
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Post by Bregalad »

What is really hard in scrolling is handling direction changes from the user. If you decode a large portion of metatiles and split the updates into smaller parts, it works fine when scrolling in the same direction, but if you reverse the direction during the update, real crap will happen and there is no workaround for that.
Useless, lumbering half-wits don't scare us.

Post by tokumaru »

Bregalad wrote:If you decode a large portion of metatiles and split the updates into smaller parts, it works fine when scrolling in the same direction, but if you reverse the direction during the update, real crap will happen and there is no workaround for that.
It doesn't have to be as catastrophic as you make it seem. In his situation, if a column of meta-metatiles is being rendered at the right, and the player does go back to the point where a new column has to be rendered at the left, there is nothing wrong with aborting the previous update.

See, when you go past a certain point (since his blocks are 32 pixels wide, that would be whenever bit 5 of the camera's coordinate changes) a column update is triggered. Say that it takes 4 VBlanks for him to fully update the column. If the player goes back and crosses the point in the opposite direction before those 4 frames are done, there is nothing bad about canceling that update to process the new one. There aren't going to be glitches: for the half-updated section to scroll into view, he'd have to go past the trigger again and keep moving in that direction for more than 4 frames, and by then the restarted update will have finished.
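
The trigger test can be sketched in Python like this. The function names are invented, and the sketch assumes the camera moves less than 32 pixels per frame (otherwise bit 5 could flip twice and the crossing would be missed).

```python
BLOCK_WIDTH = 32                   # pixels; bit 5 of the camera X flips here


def crossed_block_boundary(old_x, new_x):
    """True when bit 5 of the camera coordinate differs between two frames."""
    return ((old_x ^ new_x) & BLOCK_WIDTH) != 0


def update_side(old_x, new_x):
    """Which side needs a new block column, or None if nothing was crossed."""
    if not crossed_block_boundary(old_x, new_x):
        return None
    return "right" if new_x > old_x else "left"


for old_x, new_x in [(30, 31), (31, 32), (32, 31), (63, 64)]:
    print(old_x, new_x, update_side(old_x, new_x))
```

On the 6502 the same check is an `eor` of the old and new low bytes followed by `and #$20`.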

Of course I'm assuming that the correctly rendered area is wider than the visible screen by 1 unit in order to hide scrolling glitches. This is the reason why scrolling in both directions is hard with only the stock 2 name tables: in one of the directions it's not possible to correctly render more than 1 screen worth of blocks, so there will be glitches unless you find a way to make the visible area smaller (with IRQs, sprite masks or whatever).

Post by tokumaru »

Banshaku wrote:For now my approach requires to check each meta-metatile 4 times. If you process the meta-metatile once, does the data format have an impact? Should it be column based or row based? What is the impact on the buffer and the way to write the buffer to the PPU?
Sometimes the way in which data is arranged makes all the difference. I don't know if you'd need to arrange the data differently in the ROM, but the RAM buffers will probably look a bit "unconventional" if you're aiming at performance. Here's how I'd do it:

Each meta-metatile expands to 16 tiles, right? So, for easy indexing, I'd have 16 7-byte buffers to hold the decoded blocks. Something like this:

Code: Select all

Tile00 .dsb 7
Tile01 .dsb 7
Tile02 .dsb 7
Tile03 .dsb 7
Tile10 .dsb 7
Tile11 .dsb 7
Tile12 .dsb 7
Tile13 .dsb 7
Tile20 .dsb 7
Tile21 .dsb 7
Tile22 .dsb 7
Tile23 .dsb 7
Tile30 .dsb 7
Tile31 .dsb 7
Tile32 .dsb 7
Tile33 .dsb 7
The numbers are the (row, column) coordinates of the tiles inside the meta-metatile ((0, 0) to (3, 3)). OK, I don't know how you are getting the index of the meta-metatile, but I'd write the decoding routine somewhat like this:

Code: Select all

	ldx #$00	;X = position of the meta-metatile within the column (0-6)
-Decode:
	;LOGIC TO GET THE INDEX OF THE META-METATILE INTO THE ACCUMULATOR GOES HERE!
	;(it must leave X untouched)

	;get the index of the 4 metatiles
	tay
	lda Metatile3, y
	pha
	lda Metatile2, y
	pha
	lda Metatile1, y
	pha
	lda Metatile0, y

	;decode the 1st
	tay
	lda Tile0, y
	sta Tile00, x
	lda Tile1, y
	sta Tile01, x
	lda Tile2, y
	sta Tile10, x
	lda Tile3, y
	sta Tile11, x

	;decode the 2nd
	pla
	tay
	lda Tile0, y
	sta Tile02, x
	lda Tile1, y
	sta Tile03, x
	lda Tile2, y
	sta Tile12, x
	lda Tile3, y
	sta Tile13, x

	;decode the 3rd
	pla
	tay
	lda Tile0, y
	sta Tile20, x
	lda Tile1, y
	sta Tile21, x
	lda Tile2, y
	sta Tile30, x
	lda Tile3, y
	sta Tile31, x

	;decode the 4th
	pla
	tay
	lda Tile0, y
	sta Tile22, x
	lda Tile1, y
	sta Tile23, x
	lda Tile2, y
	sta Tile32, x
	lda Tile3, y
	sta Tile33, x

	;move on to the next meta-metatile
	inx
	cpx #$07
	bne -Decode
It's partially unrolled, so it's fast but not too big. I use Y to index the block data because you might want to read it with zero page pointers instead. Oh, I don't know how you are storing the attributes: they are probably either ready as part of the meta-metatile or you'll have to form them by combining the bits of each of the 4 metatiles, but either way you'll have to store them in another array also indexed by X. Anyway, after the buffers are ready, you can write them to VRAM with something like this:

Code: Select all

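	;the PPU address must already be set and the increment set to 32
	lda FirstTile	;0, 7, 14 or 21
	clc
	adc #$07
	sta LastTile	;LastTile: extra temporary (not in the original sketch) marking the end of this column
	ldx FirstTile
-Update:
	lda Tile00, x
	sta $2007
	lda Tile10, x
	sta $2007
	lda Tile20, x
	sta $2007
	lda Tile30, x
	sta $2007

	inx
	cpx LastTile	;comparing against #$07 would only work for FirstTile = 0
	bne -Update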
Of course you can unroll that, arrange the buffer backwards so that you don't need a "cpx" instruction, whatever you want in order to optimize it. Anyway, "FirstTile" would be set beforehand to 0, 7, 14 or 21, depending on which of the 4 columns you want to update.
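
For anyone who wants to see why the offsets 0, 7, 14 and 21 work, here is a Python model of the same layout. It is only a sketch: the sample tables are invented, the function names are made up, and it assumes the 16 buffers sit back to back in RAM in the order declared above, with metatiles and tiles stored row by row as the decoding routine implies.

```python
BLOCKS = 7                         # meta-metatiles per map column
SIZE = 4                           # tiles per side of a meta-metatile

# Invented sample data, row order: [top-left, top-right, bottom-left, bottom-right].
metatiles = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
meta_metatiles = [[0, 1, 2, 3]]


def buffer_index(row, col, block):
    """Offset of TileRC + block in one flat buffer (Tile00 is offset 0)."""
    return (row * SIZE + col) * BLOCKS + block


def decode(map_column):
    """Expand 7 meta-metatile indexes into the 112-byte interleaved buffer."""
    buf = [0] * (SIZE * SIZE * BLOCKS)
    for block, mm_index in enumerate(map_column):
        for m, mt_index in enumerate(meta_metatiles[mm_index]):
            base_row, base_col = (m // 2) * 2, (m % 2) * 2
            for t, tile in enumerate(metatiles[mt_index]):
                buf[buffer_index(base_row + t // 2, base_col + t % 2, block)] = tile
    return buf


def flush_column(buf, first_tile):
    """Bytes sent to $2007 for one tile column; first_tile is 0, 7, 14 or 21."""
    out = []
    for x in range(first_tile, first_tile + BLOCKS):
        for row in range(SIZE):
            out.append(buf[row * SIZE * BLOCKS + x])   # Tile00+x, Tile10+x, ...
    return out


buf = decode([0] * BLOCKS)
print(flush_column(buf, 0)[:4], flush_column(buf, 7)[:4])
```

Indexing past the end of one 7-byte buffer simply lands in the next one, so adding 7 to the index selects the next tile column without any extra code.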

I know Bregalad will probably say I'm crazy, he always does because he usually doesn't agree with the way I do things. But yeah, I am a fan of data interleaving and moderate levels of unrolling for some extra speed and reduced complexity of the code (fewer branches, fewer special cases and such). I hope my ideas give you some good ones.

Post by Bregalad »

Well, maybe you found a working workaround for the column update, yet I'm pretty sure that if the user is going, for example, right, then goes left a very small amount and goes right again, it's very hard for the scroll engine not to screw up if it decodes larger parts, which is almost always required by the fact that you use metatiles, which are needed unless you have infinite ROM and time to draw your levels.

I remember having multidirectional scrolling with a status bar "hiding" the row update glitches (using 1-screen mirroring), and it was impossible for me to have a system that allowed the player to repeatedly change the vertical direction without any major screw-up.
Anyway, I lost all the code that did that, so if I rewrite it I will make a better version of it (it wasn't for my current game project, but for another hypothetical future one).
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

What about fully unrolled code in RAM? I think MC Kids uses that to do scrolling updates. That game also happens to have the 8K WRAM chip, but still...
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!

Post by tokumaru »

Of course it's not possible to have clean scrolling in both directions, as we've discussed countless times. But provided there is some "wiggle room" (as is always the case when only one type of scrolling is used), it is possible to flush a buffer small pieces at a time.

Image
The red blocks are blocks that have already been rendered, the yellow box represents the area that is rendered to the screen.

Image
Once the camera moves right and crosses a block boundary, a new column of blocks must be drawn at the far right, after the block that will immediately be displayed (this is why you need some wiggle room).

Image
The data is fully decoded to RAM but only one column is sent to VRAM, so there are 3 more to go during the next few VBlanks.

Image
If the player changes his mind and goes back, there is no problem. If he causes the camera to cross a block boundary, the update in progress is canceled and a new one starts for the left side. If a boundary is not crossed, the update continues normally.

Image
The camera did cross the block boundary after 2 tile columns were updated, but that's not important at all. The new update at the left is processed normally.

Image
If the player doesn't change his mind again, that block will be updated, and that will continue to happen as long as the same direction is maintained. If the camera ever goes back to the right, that partially rendered block will be rendered from the start again.

So, there is no problem in updating blocks little by little if you have something valid to display while the new data is rendered. The valid stuff will keep you from seeing any glitches.
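
The cancel-and-restart behaviour described in this post can be modelled in a few lines of Python. This is a sketch under assumptions: the class and method names are invented, one tile column is sent per VBlank, and the order in which the 4 tile columns are sent is not taken from the post.

```python
TILE_COLUMNS_PER_BLOCK = 4


class ColumnUpdater:
    """Sends one tile column per VBlank and restarts on every new trigger."""

    def __init__(self):
        self.pending = []          # (block_column, tile_column) still to send

    def trigger(self, block_column):
        # A boundary was crossed: whatever was still pending is dropped,
        # finished or not, and the new block column is queued instead.
        self.pending = [(block_column, c) for c in range(TILE_COLUMNS_PER_BLOCK)]

    def vblank(self):
        # Called once per NMI; returns the tile column to upload, if any.
        return self.pending.pop(0) if self.pending else None


updater = ColumnUpdater()
updater.trigger(9)                 # camera crossed a boundary going right
sent = [updater.vblank(), updater.vblank()]
updater.trigger(0)                 # player turned back and crossed it again
sent.append(updater.vblank())
print(sent, updater.pending)
```

The half-drawn block column 9 is simply forgotten here. If the camera later crosses the right-hand boundary again, `trigger(9)` queues all 4 tile columns from scratch.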

Post by tokumaru »

Dwedit wrote:What about fully unrolled code in RAM?
Totally unnecessary, in my opinion. It is very much possible to update a great amount of VRAM with partially unrolled code in ROM.

In my scrolling engine I can scroll 16 pixels in both directions if necessary every frame, while also updating sprites, and the bulk of the unrolled code takes only 96 bytes. To update the palette and sparse blocks I need a break from the columns or rows, but I challenge you to find a game (especially a NES game, since not even Sonic on the MD goes that fast) that scrolls 16 pixels in both directions every frame. So chances are there will be time available for other updates besides rows and columns quite often.

If Banshaku really wanted to update all 4 columns during a single VBlank, that would be quite possible with a little extra unrolling of the code I presented. I just assumed he'd want to spread the update in order to have a simpler NMI routine, but it could work both ways.
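
A rough cycle budget, written as Python arithmetic, suggests why this is plausible. The figures assume NTSC timing (20 scanlines of VBlank, 341 PPU dots per line, 3 dots per CPU cycle), fully unrolled `lda abs,x` / `sta $2007` pairs with no page crossings, and a sprite DMA in the same VBlank. Loop overhead, the `$2006` address writes and NMI entry/exit are left out, so treat the result as an upper bound on what is left.

```python
VBLANK_CYCLES = 20 * 341 // 3      # CPU cycles in an NTSC VBlank
LDA_ABS_X = 4                      # lda Tile00, x (no page crossing)
STA_ABS = 4                        # sta $2007
OAM_DMA = 513                      # sprite DMA through $4014


def unrolled_cost(byte_count):
    """Cycles to copy `byte_count` bytes with one lda/sta pair each."""
    return byte_count * (LDA_ABS_X + STA_ABS)


tiles = unrolled_cost(4 * 28)      # four tile columns of 28 tiles each
attrs = unrolled_cost(7)           # one attribute byte per meta-metatile
remaining = VBLANK_CYCLES - OAM_DMA - tiles - attrs
print(VBLANK_CYCLES, tiles, attrs, remaining)
```

With these assumptions the tile and attribute writes use well under half of the VBlank, which leaves room for the overhead that the sketch ignores.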
Dwedit wrote:I think MC Kids uses that to do scrolling updates.
I believe I have expressed my opinions on this game a few times on this board already. I feel like it's nothing but a mediocre game that uses mediocre programming solutions, favoring hardware enhancements (such as extra RAM) over clever logic solutions.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

tokumaru wrote:
Dwedit wrote:I think MC Kids uses that to do scrolling updates.
I believe I have expressed my opinions on this game a few times on this board already. I feel like it's nothing but a mediocre game that uses mediocre programming solutions, favoring hardware enhancements (such as extra RAM) over clever logic solutions.
But it is a benchmark against which to measure your own programming solutions.

Post by tokumaru »

tepples wrote:But it is a benchmark against which to measure your own programming solutions.
Yes, but just because the author wrote an article about how he made the game many people seem to look at it like the holy bible of platformers.

I have nothing against comparing current solutions to solutions used in the old commercial games, but we have to stop thinking of the programmers of those games as gods, celebrities or something. What they came up with is in no way better than what we can come up with nowadays. The only reason production was better back then is that they were paid to do it. If we were too, we'd have much better stuff being released.

Also, this game in particular seems to come up more often than others, probably because of the article, that drew more attention to its internal workings.

Post by Dwedit »

I only brought it up because it's one of the few games which I've examined the VRAM update code for. The other games I've looked at are Battletoads, and Monster Max for the GB.

Monster Max used some crazy tricks for its platform, a series of ld hl,XXXX / push hl instructions stored inside RAM that wrote to VRAM.
Battletoads is just plain unroll crazy, even though it screws up by triggering the page crossing 1-cycle penalty many times.

Post by Bregalad »

I guess completely unrolled loops are useless when partially unrolled loops can get to about 95% of their performance and waste far fewer resources.
I can do 2 row/column updates + sprite OAM + palette update with completely rolled loops though.

Tokumaru, your schematics are nice. Yet I fail to see exactly how you "trigger" updates, but I guess it doesn't really matter, as long as you say it works. I'd rather come up with my own project-specific solution anyway. I was just saying that direction changes are often the biggest pain in the ass when working with updates split into small parts.
One cause of this is combining that with multidirectional scrolling.
I had a system (the one I lost, mentioned above) that was the opposite of yours: I updated single 8-pixel columns at a time, and large 4-tile rows by splitting the updates into small parts.

The problem was that when a column was updated during the split updates of a row, the row got updated in a place it wasn't supposed to be (where the column was updated). When continuing to scroll in the same direction it would just get updated again, so that wasn't a problem, but when changing direction my system relied on the rest of the row not having been updated in order to go backward, and this resulted in possible garbage when changing direction vertically while scrolling horizontally. The workaround I eventually found was to only allow direction changes when the vertical scroll value is a multiple of $10.
Also note that I was trying to minimize updates on that scroller to save them for CHR-RAM.