Map data and compression for scrollable map?

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.
tomaitheous
Posts: 592
Joined: Thu Aug 28, 2008 1:17 am

Post by tomaitheous »

I'm not sure I would even compress my tilemaps. Since all the methods described so far use sequential access for decompression, that would add some overhead, not to mention require creating a marker system for entry points (I had to do this for a block of RLE-compressed fonts). For 8-way scrolling, anyway.

Just setting up a 16x16 metatile tilemap has some nice savings already. You'd only need 6 bits to represent any given metatile, and the upper bits could be used for that metatile's palette association. 8 bits per 16x16 entry is pretty decent; plus, the compression ratio is fixed and the data remains randomly accessible. If you broke larger tilemaps down into 256x240-pixel maps/screens, then you could compress away redundant screens.
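
As an illustration of the packing described above (a sketch of mine, not tomaitheous's actual scheme), one map byte can hold a 6-bit metatile index with a 2-bit palette number in the upper bits:

```python
# Illustrative sketch: one byte per 16x16 metatile entry, with a 6-bit
# metatile index (0-63) in the low bits and a 2-bit palette (0-3) on top.

def pack_entry(metatile, palette):
    """Combine a metatile index (0-63) and palette (0-3) into one byte."""
    assert 0 <= metatile < 64 and 0 <= palette < 4
    return (palette << 6) | metatile

def unpack_entry(byte):
    """Recover (metatile, palette) from a packed map byte."""
    return byte & 0x3F, byte >> 6

packed = pack_entry(metatile=37, palette=2)
print(hex(packed))           # 0xa5
print(unpack_entry(packed))  # (37, 2)
```

Because every entry is exactly one byte, the map can be indexed directly, which is the random-access property the post is pointing at.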

Some examples that I've seen ingame:

Bonk's tilemap engine uses 16x16 metatiles. The palette association is hardcoded to the tile number for the stage. The overall tilemap is made up of screen-sized tilemaps, and these are stored in vertical strips. I don't remember offhand whether the vertical strips are RLE-compressed or not (they probably are).
User avatar
clueless
Posts: 496
Joined: Sun Sep 07, 2008 7:27 am
Location: Seattle, WA, USA

Post by clueless »

Crystalis does something like that. "Maps" are composed of a small M x N grid of block numbers and some palettes and music data. Each grid # represents a 16x15 screen of metatiles. This is why so much of the game looks repetitive.
tepples
Posts: 22994
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

clueless wrote:Crystalis does something like that. "Maps" are composed of a small M x N grid of block numbers and some palettes and music data. Each grid # represents a 16x15 screen of metatiles.
Animal Crossing does much the same thing, building the town map out of acre-sized chunks.
doynax
Posts: 162
Joined: Mon Nov 22, 2004 3:24 pm
Location: Sweden

Post by doynax »

clueless wrote:And I disagree that RLE compression doesn't compress enough. It is enough to fit the entire overworld into a single MMC1 or MMC3 PRG-ROM bank. That makes decompression significantly easier (and faster) to implement. Typical MMC1 bank switching (LSR, STA 5 times) requires 5*(2+4) + jsr + rts = (too lazy to add it up) CPU cycles.
If it's enough, then it's enough. RLE is certainly better than nothing. *shrug*

The thing about RLE compression isn't so much that it compresses poorly as that it forces you to use a particular graphical style. This works nicely for some games, such as SMB3 with its straight lines and open spaces, but to be honest I think it makes RPGs look rather plain. A bit of dithering in the grass and the occasional flower or tree stump makes all the difference for a game's look.

Granted, a general-purpose compression algorithm is unlikely to rival the compression rates of a well-designed custom scheme with data designed to make the best use of it. But I consider the freedom it gives the artist, as well as not having to work out half a dozen specialized packers for different parts of the game, well worth the cost. The other drawback here is, of course, that you'd have to split up the overworld and other huge maps. Keeping the whole map in RAM is not without certain advantages though, such as being able to keep track of updated tiles or allowing NPCs to move around off-screen. Not to mention that a gigantic 128x128 map (73 screens?) could surely use a few tileset and palette changes.

As for bank-switching overhead, well, that would depend on your mapper and how the decompressor is designed. For MMC3 you can simply set up both slots to straddle the pair of banks crossed. And some compressors (my own included) have to deal with 256-byte boundaries anyway in order to support floppy streaming. Besides, anything which saves you from having to juggle half the data in the game in order to fill up banks as best as possible is a good thing.
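
For reference, the bank-switching cost quoted earlier can be totted up from standard 6502 cycle counts (this arithmetic is my own, not from the thread):

```python
# Back-of-the-envelope cost of an MMC1 bank switch, assuming the usual
# serial write loop: LSR A + STA absolute, repeated 5 times, wrapped in
# a subroutine. Cycle counts are standard NMOS 6502 values.
LSR_A   = 2  # LSR accumulator
STA_ABS = 4  # STA absolute
JSR     = 6
RTS     = 6

total = 5 * (LSR_A + STA_ABS) + JSR + RTS
print(total)  # 42 cycles per switch, excluding the initial LDA
```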
Celius
Posts: 2158
Joined: Sun Jun 05, 2005 2:04 pm
Location: Minneapolis, Minnesota, United States

Post by Celius »

RLE is good for lots of things, I think. For example, the overworld in Final Fantasy is probably more than 50% water. RLE compresses these rows to be two or so bytes instead of 140 in some spots.

I made an RPG map scroller that loads an RLE-compressed map like this. The major disadvantage is that it can take FOREVER to work through. The problem is that you have to go through every individual byte to get to the tile you want. There can be no calculating your way there, as there's no way to tell what's compressed and what's not without checking every individual byte. Depending on where you are in the map, it can take longer than a frame to fetch a column of metatiles.
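
The linear-scan problem can be sketched like this (an illustration of mine, not Celius's actual engine):

```python
# Why raw RLE forces a sequential scan: to find the tile at map index
# `target` you must walk the stream from the start, because run lengths
# vary and there is no way to jump ahead.

def rle_tile_at(stream, target):
    """stream is a list of (tile, run_length) pairs; return the tile at
    index `target` in the decompressed row."""
    pos = 0
    for tile, run in stream:          # every pair before the target is visited
        if pos + run > target:
            return tile
        pos += run
    raise IndexError("target beyond end of map")

# 140 water tiles, 3 land tiles, 57 more water -- a Final Fantasy-ish row
row = [(0x00, 140), (0x12, 3), (0x00, 57)]
print(rle_tile_at(row, 141))  # 18 (0x12), found only after scanning prior runs
```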
tepples
Posts: 22994
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

For big maps that can be scrolled in any direction, recognizing long "runs" in one direction can be hard to deal with. Perhaps a quadtree subdivision method like tokumaru's might help: if a node is entirely water, entirely grass, etc., we can replace it with one byte.
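
A minimal sketch of the subdivision idea (my illustration, not tokumaru's actual format): a square that is entirely one tile collapses to a single leaf, and anything else splits into four subsquares.

```python
# Build a quadtree over a square tile grid: a uniform square becomes one
# leaf value; a mixed square becomes a 4-tuple of children (NW, NE, SW, SE).

def build_quadtree(grid, x, y, size):
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return first  # entirely water, entirely grass, etc.: one byte
    half = size // 2
    return (build_quadtree(grid, x,        y,        half),
            build_quadtree(grid, x + half, y,        half),
            build_quadtree(grid, x,        y + half, half),
            build_quadtree(grid, x + half, y + half, half))

water, grass = 0, 1
grid = [[water] * 4 for _ in range(4)]
grid[0][3] = grass  # one grass tile in the NE corner
tree = build_quadtree(grid, 0, 0, 4)
print(tree)  # (0, (0, 1, 0, 0), 0, 0) -- only the NE quadrant subdivides
```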
Celius
Posts: 2158
Joined: Sun Jun 05, 2005 2:04 pm
Location: Minneapolis, Minnesota, United States

Post by Celius »

It might be a little hard to make an RPG map with block metatiles. I use 64x64-pixel metatiles for my platformer, compressed similarly to Tokumaru's metatiles, but RLE works better for RPG maps, as the land needs more unique-looking areas. Though it'd be nice to compress blocks of water to 1 byte.
tepples
Posts: 22994
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Celius wrote:It might be a little hard to make an RPG map with block metatiles.
I figured that. I didn't mean apply tokumaru's solution directly, but instead to decompose each square area of the map (starting at 16x16 cell areas) into four subsquares. If a subsquare is solid, stop; otherwise, subdivide it and repeat.

Perhaps it might be easier for me to express my idea by drawing a diagram of how it would be applied to an actual RPG map. Which map should I try it on?
Celius
Posts: 2158
Joined: Sun Jun 05, 2005 2:04 pm
Location: Minneapolis, Minnesota, United States

Post by Celius »

Final Fantasy III. Or I, if it's easier to find.
doynax
Posts: 162
Joined: Mon Nov 22, 2004 3:24 pm
Location: Sweden

Post by doynax »

tepples wrote:
Celius wrote:It might be a little hard to make an RPG map with block metatiles.
I figured that. I didn't mean apply tokumaru's solution directly, but instead to decompose each square area of the map (starting at 16x16 cell areas) into four subsquares. If a subsquare is solid, stop; otherwise, subdivide it and repeat.
Hm. Presumably you'd pre-calculate/store a pointer to the start of each such 16x16 block for "random access." And avoid decoding partial blocks by extending the screen buffer by 16 tiles on each axis, perhaps combined with some convoluted scheme to spread out the cost of decompressing a row of blocks over multiple frames. Besides that you'd need some way of indicating whether to fill up the current subsection with a fixed pattern or further subdivide it. Like a bit vector or an escape tile.

Here's my counter-proposal. Why not do nearly the same thing, that is, divide the map into 16x16 blocks with a pointer to the start of each one, but use RLE within these blocks instead? And to improve locality, let's store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code).
That ought to compress about as well as the subdivision approach and be quite a bit easier to implement. Well, hopefully anyway...
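
In outline, the block-pointer-plus-RLE idea might look like this (a sketch of mine, not doynax's code; each 16x16 block is flattened to a 256-entry list):

```python
# RLE each block independently and keep a start offset per block, so any
# block can be decoded without scanning the rest of the map.

def rle_encode(tiles):
    """Encode a tile list as [run, tile, run, tile, ...], runs capped at 255."""
    out, i = [], 0
    while i < len(tiles):
        run = 1
        while i + run < len(tiles) and tiles[i + run] == tiles[i] and run < 255:
            run += 1
        out += [run, tiles[i]]
        i += run
    return out

def pack_blocks(blocks):
    """Return (pointers, data): pointers[n] is block n's offset into data."""
    pointers, data = [], []
    for block in blocks:
        pointers.append(len(data))
        data += rle_encode(block)
    return pointers, data

blocks = [[0] * 256, [0] * 200 + [5] * 56]  # two flattened 16x16 blocks
pointers, data = pack_blocks(blocks)
print(pointers)  # [0, 4] -- block 1 starts right after block 0's 4 bytes
```

The pointer table is the price paid for random access; seeking to a tile then only has to scan within one block instead of the whole map.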
User avatar
clueless
Posts: 496
Joined: Sun Sep 07, 2008 7:27 am
Location: Seattle, WA, USA

Post by clueless »

doynax wrote: Here's my counter-proposal. Why not do nearly the same thing, that is, divide the map into 16x16 blocks with a pointer to the start of each one, but use RLE within these blocks instead? And to improve locality, let's store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code).
That ought to compress about as well as the subdivision approach and be quite a bit easier to implement. Well, hopefully anyway...
That is what I'm doing right now in my own project. My maps are arrays of 8x8 metatile blocks. Each block is RLE compressed. Identical blocks share the same RLE sequence. I plan to buffer more than a screen's worth of blocks in SRAM.

Except that I've not thought of the 'space filling curve' thing. I'm not quite sure what you mean by that.

My goal right now is not to get the most optimal compression. My goal is to get a reasonable trade-off between size and decompression speed and jump into actual scrolling and character actions (jumping, fighting, interacting with objects, etc...)

Can you please elaborate on the 'space filling curve'? Sounds like theoretical physics :)
doynax
Posts: 162
Joined: Mon Nov 22, 2004 3:24 pm
Location: Sweden

Post by doynax »

clueless wrote: My goal right now is not to get the most optimal compression. My goal is to get a reasonable trade-off between size and decompression speed and jump into actual scrolling and character actions (jumping, fighting, interacting with objects, etc...)
Again, I've got to wonder why you don't just make it easy for yourself and decompress the whole thing into memory when entering a region. Together with 2x2 metatiles you'll still get more than 18 screens if you set aside half of SRAM for the map.
Just plug in Exomizer and let it do its thing and I promise you that you'll get excellent compression rates.
clueless wrote: Can you please elaborate on the 'space filling curve'? Sounds like theoretical physics :)
There's nothing remotely scary about it.
Essentially there's no reason why you have to store the tiles within each 16x16 block in left-to-right, top-to-bottom order. One possible set of alternative orders are the so-called space-filling curves. Think of it as a puzzle where you have to cover every square of a grid without ever lifting your pencil.

With the right curve you can make sure that the starting point of a long RLE run is as close to the endpoint as possible, as opposed to the traditional scheme where the opposite is true. The idea here is much the same as for tepples' subdivision approach, that is, to exploit the fact that four tiles in a square are (presumably) more likely to share the same pattern than four tiles on a line.

You should also notice that you don't have to work out the curve itself at runtime. Rather you can simply reorder the tiles within a decompressed block through a series of LDA/STA opcodes (or let the RLE routine push onto the stack and use a PLA/STA series.)
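
To make the reordering concrete, here is a sketch of mine using Z-order, one simple space-filling curve, on a 4x4 block; a solid 2x2 patch that row-major order splits across rows becomes a single run:

```python
# Visit the cells of a size x size grid in Z-order (Morton order) by
# sorting on the interleaved coordinate bits.

def morton_order(size):
    def interleave(v):
        # spread the bits of v apart: bit i moves to position 2*i
        return sum(((v >> i) & 1) << (2 * i) for i in range(8))
    cells = [(x, y) for y in range(size) for x in range(size)]
    return sorted(cells, key=lambda c: interleave(c[0]) | (interleave(c[1]) << 1))

grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
row_major = [grid[y][x] for y in range(4) for x in range(4)]
z_order   = [grid[y][x] for x, y in morton_order(4)]
print(row_major)  # [1, 1, 0, 0, 1, 1, 0, ...] -- the 2x2 patch is split up
print(z_order)    # [1, 1, 1, 1, 0, ...] -- the patch is one run of four
```

At runtime, as the post says, none of this curve math is needed: the packer emits tiles in curve order and the decoder undoes it with a fixed unrolled copy.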
tomaitheous
Posts: 592
Joined: Thu Aug 28, 2008 1:17 am

Post by tomaitheous »

clueless wrote: My goal right now is not to get the most optimal compression. My goal is to get a reasonable trade-off between size and decompression speed and jump into actual scrolling and character actions (jumping, fighting, interacting with objects, etc...)
You could set up the decompression routine as something similar to a processing thread. You just have to make sure to decompress enough rows into the RAM buffer ahead of what the game's tilemap engine reads, which means requesting the decompression at an earlier point. It's tricky and requires knowing/calculating the maximum scrolling speed and some other variables. Plus, since you'd more than likely budget it by number of bytes instead of a "time slice", you'd want to calculate for a near-worst-case scenario in CPU cycles. This would lend itself nicely to LZSS or even more layered compression variants like pucrunch, or adding Huffman to LZSS, etc.

If you limit your compression to strips of a defined horizontal or vertical length, you can build in small headers for fast skipping, i.e. speed up "seeking" in sequentially accessed streams. Add a coarser layer (a seek map/array) on top of that to really cut the seek time. RLE works great with this approach, and "token pointer" compression also works pretty well, albeit slower than RLE. Do a variable-length control-code system and use both ;)

I'm curious whether anyone else has seen fixed "seeded" buffers for LZSS decompression, set up before the decompressor starts? The decompression engine knows the LZ window is primed with a series of values beforehand and can/will reference them from the very start. I saw it once in a game (it was a 4K ring buffer) and it looked like it helped for structured patterns and data like tilemaps.
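
The "seeded window" idea survives in modern LZ libraries as a preset dictionary; a rough analogy (mine, not the scheme described in the post) using Python's zlib:

```python
# zlib's zdict parameter primes the LZ window before any input is seen,
# so matches against common patterns are available from the first byte.
import zlib

dictionary = bytes([0x00, 0x01, 0x02, 0x03] * 64)  # common tilemap patterns
data = bytes([0x00, 0x01, 0x02, 0x03] * 200)       # map data full of them

plain = zlib.compress(data)
co = zlib.compressobj(zdict=dictionary)
seeded = co.compress(data) + co.flush()
print(len(plain), len(seeded))  # the seeded stream is typically no larger

# The decompressor must be primed with the same dictionary:
do = zlib.decompressobj(zdict=dictionary)
assert do.decompress(seeded) == data
```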

Also, if anyone's interested, pucrunch sets up nicely for a ring buffer. The standalone decompressor is already written in 6502; you just need to make some minor modifications. Chris Covell has already adapted it for ROM-based projects, so it shouldn't be too much trouble to move it back over to 6502 from the 6280. It works fairly decently with 256- and 512-byte ring buffers. If you don't know, a ring buffer setup allows you to decompress an LZ stream directly to an I/O port.
tepples
Posts: 22994
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

doynax wrote:
tepples wrote:
Celius wrote:It might be a little hard to make an RPG map with block metatiles.
I figured that. I didn't mean apply tokumaru's solution directly, but instead to decompose each square area of the map (starting at 16x16 cell areas) into four subsquares. If a subsquare is solid, stop; otherwise, subdivide it and repeat.
Hm. Presumably you'd pre-calculate/store a pointer to the start of each such 16x16 block for "random access." And avoid decoding partial blocks by extending the screen buffer by 16 tiles on each axis
That wouldn't be entirely necessary. You might traverse the quadtree a tile at a time, much like in tokumaru's decoder.
Besides that you'd need some way of indicating whether to fill up the current subsection with a fixed pattern or further subdivide it. Like a bit vector or an escape tile.
$00-$7F: subdivide this section and jump several nodes ahead
$80-$FF: fill this section with constant tile
Here's my counter-proposal. Why not do nearly the same thing, that is divide the map into 16x16 blocks with a pointer to the start of each one, but use RLE within these blocks instead?
You just described how Animala stores ACWW towns for copying them between the PC and DS.
And to improve locality lets store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code.)
RLE on a space-filling curve, such as the Hilbert curve, tends to look a lot like a quadtree.
Again, I've got to wonder why you don't just make it easy for yourself and decompress the whole thing into memory when entering a region. Together with 2x2 metatiles you'll still get more than 18 screens if you set aside half of SRAM for the map.
That's still a lot of SRAM. I'm trying to explore different points on the time-space tradeoff, especially for projects that can't use much more than the 2048 bytes in the NES itself.
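
The $00/$80 node encoding tepples gives above could be decoded along these lines (a sketch of my own; it ignores the "jump several nodes ahead" offsets that real random access would need):

```python
# Bytes $80-$FF fill a square with constant tile (byte & $7F); bytes
# $00-$7F mean "subdivide", with the four child nodes following in the
# stream in NW, NE, SW, SE order.

def decode_square(stream, pos, grid, x, y, size):
    """Decode one node at stream[pos] into grid; return the next index."""
    byte = stream[pos]
    pos += 1
    if byte >= 0x80:                      # leaf: constant tile
        tile = byte & 0x7F
        for dy in range(size):
            for dx in range(size):
                grid[y + dy][x + dx] = tile
        return pos
    half = size // 2                      # internal node: recurse four times
    pos = decode_square(stream, pos, grid, x,        y,        half)
    pos = decode_square(stream, pos, grid, x + half, y,        half)
    pos = decode_square(stream, pos, grid, x,        y + half, half)
    pos = decode_square(stream, pos, grid, x + half, y + half, half)
    return pos

# 4x4 area: all tile 0 ($80) except a tile-1 ($81) NE quadrant -- 5 bytes
stream = [0x00, 0x80, 0x81, 0x80, 0x80]
grid = [[None] * 4 for _ in range(4)]
decode_square(stream, 0, grid, 0, 0, 4)
print(grid[0][3], grid[3][0])  # 1 0
```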
User avatar
tokumaru
Posts: 12669
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

tepples wrote:You might traverse the quadtree a tile at a time, much like in tokumaru's decoder.
You know, I intentionally worked with quadtrees before for CHR compression, but I didn't realize I was doing it with my map compression! O_o

Although I don't take advantage of completely filled blocks in my engine (they all must have 4 sub-blocks), I could easily add a flag for each square, indicating if that square uses a sub-block or is simply filled with a single type of metatile.

And yeah, you can easily read rows and columns from this map on the fly, without having to decompress a whole screen. All you need to do is set up the high bytes of the pointers to each type of block. Once you have the index of the huge (256x256 pixels) block, you use it as the lower byte of the addresses that point to the next set of blocks (128x128).

When rendering rows or columns, you can be sure that only 2 sub-blocks of each block will be read (for rows it's either the top 2 or the bottom 2, and for columns it's either the left 2 or the right 2). So I need 2 pointers for each type of block (256x256, 128x128, 64x64, 32x32).

Here are some details about my decompression method:

Code:

Metatiles are 16x16 pixels. The largest block is 16x16 metatiles, or 256x256 pixels. It is repeatedly divided into 4 pieces, each divided into 4 others, and so on until the metatiles are reached.

How to decode a 256x256 block:

X Coordinate (used for decoding columns): DCBAPPPP

PPPP: pixel coordinates, not used for decoding the map;
A: left or right of the 32x32 block;
B: left or right of the 64x64 block;
C: left or right of the 128x128 block;
D: left or right of the 256x256 block;

Y coordinate (used for decoding rows): DCBAPPPP
PPPP: pixel coordinates, not used for decoding the map;
A: top or bottom of the 32x32 block;
B: top or bottom of the 64x64 block;
C: top or bottom of the 128x128 block;
D: top or bottom of the 256x256 block;

These bits can be shifted right into the high bytes of the pointers. This will be handled differently depending on whether you're rendering rows or columns, but once the pointers are ready, the same code can be used to read rows and columns. The necessary pointers are the following:

Block256A, Block256B: 2 128x128 blocks of the 256x256 block;
Block128A, Block128B: 2 64x64 blocks of the 128x128 block;
Block64A, Block64B: 2 32x32 blocks of the 64x64 block;
Block32A, Block32B: 2 metatiles of the 32x32 block;

The pointers are reused as the parent blocks are completely decoded. The code looks basically like this:

	;decode 128x128 first block
	lda (Block256A), y ;get the index of the first 128x128 block
	sta Block128A+0 ;use it to read the first 128x128 block
	sta Block128B+0 ;use it to read the second 128x128 block

	;DECODE 128x128 BLOCK HERE

	;THE FIRST BLOCK WAS DECODED, REUSE POINTERS TO DECODE THE NEXT ONE!

	;decode 128x128 second block
	lda (Block256B), y ;get the index of the second 128x128 block
	sta Block128A+0 ;use it to read the first 128x128 block
	sta Block128B+0 ;use it to read the second 128x128 block

	;DECODE 128x128 BLOCK HERE

Y must be loaded with 0, because we don't use it (although it is used when reading single metatiles for collision purposes, because it's faster). This code is repeated for each block. The blocks are read from the largest to the smallest, and 2 of each type are processed. You can see that the index taken from the parent block is used to read the indexes of 2 child blocks, until the metatiles are reached, and they are read 2 at a time into a buffer.
This is what I do when scrolling, and for the amount of compression I get I'd say it's not that complicated or slow. With little modification, the blocks could be flagged as "compressed blocks", in which case the index read would not point to a child block, but to a metatile that is used to fill the entire block. This would prevent one from wasting blocks if there are no details inside them.

Thanks tepples, for suggesting this expansion. I don't know if I'll use it in my current project, but it's good to keep in mind that there is an interesting compression method that offers somewhat random access to the data.
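
As a model of the coordinate decoding tokumaru describes (a Python paraphrase of mine, not his 6502 code), the four high bits of a coordinate byte pick a half at each block level:

```python
# An X or Y coordinate byte has the layout DCBAPPPP: bit D picks the half
# of the 256x256 block, C the half of the 128x128, B the 64x64, A the
# 32x32; the low PPPP bits are the pixel position within a metatile row.

def block_path(coord):
    """Return the half (0 or 1) chosen at each level, largest block first."""
    d = (coord >> 7) & 1   # half of the 256x256 block
    c = (coord >> 6) & 1   # half of the 128x128 block
    b = (coord >> 5) & 1   # half of the 64x64 block
    a = (coord >> 4) & 1   # half of the 32x32 block
    return [d, c, b, a]

# pixel X = 0xB7 = %10110111: right half of the 256x256 block, left of
# the 128x128, right of the 64x64, right of the 32x32
print(block_path(0xB7))  # [1, 0, 1, 1]
```

In the 6502 version these same bits are shifted into the Block256/Block128/Block64/Block32 pointer high bytes rather than collected into a list.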