Tile encoder/decoder
Moderator: Moderators
Tile encoder/decoder
I'm releasing to the public the work I've done so far on the tile compressor based on the format used by Codemasters in some of their games (like Bee52 and Big Nose Freaks Out). Feel free to use it in projects that require CHR compression. Note that it works specifically with NES tiles, so it will hardly work well with other kinds of data.
Apparently, the compressed tiles aren't as compact as the folks at Codemasters were able to make them, and my encoder is to blame. The format has the potential to be even more compact than theirs (because it doesn't have a few redundant bits the original has), but my encoder can't find the optimal way to divide the tiles into groups, and the key for this compression scheme is organizing tiles in groups of lengths that result in better compression.
The source code is included, and if anyone wants to give a shot at modifying the compressor so that it divides the tiles better than it currently does, they are welcome to. As long as nothing about the format is changed, the decompressor will not need to be changed.
Download here.
Please let me know if you plan on using this, and please post here if you have suggestions on how to improve the encoder.
Apparently, the compressed tiles aren't as compact as the folks at Codemasters were able to make them, and my encoder is to blame. The format has the potential to be even more compact than theirs (because it doesn't have a few redundant bits the original has), but my encoder can't find the optimal way to divide the tiles into groups, and the key for this compression scheme is organizing tiles in groups of lengths that result in better compression.
The source code is included, and if anyone wants to give a shot at modifying the compressor so that it divides the tiles better than it currently does, they are welcome to. As long as nothing about the format is changed, the decompressor will not need to be changed.
Download here.
Please let me know if you plan on using this, and please post here if you have suggestions on how to improve the encoder.
ooooo... I'm kind of a fan of compression stuff.
I might take a crack at touching up the compressor. Is the compression format included in the download?
*gets*
EDIT:
doh, no format outline in the package. Is the format documented anywhere? Or do I just have to decipher the source to figure it out?
EDIT2:
I dug up this post. Is this the same compression scheme?
http://nesdev.com/bbs/viewtopic.php?t=3 ... 2&start=15
EDIT3:
Tepples' followup post clarified a lot of it for me (actually I was totally lost until he throw in that C-ish syntax -- that really made it easier to understand). But there are still some format details that are confusing me. Specificlaly, what bits in the stream represent what.
Here's what I understand:
First byte in the stream is the number of tiles to decode, after that:
Do I understand this right? I have a few questions about this:
- I assume the tables can be changed mid-tile, correct? Just not mid-row.
- Can you change tables B,C, or D? I only see how you'd change table A from this. What determines the contents of B,C,D?
I might take a crack at touching up the compressor. Is the compression format included in the download?
*gets*
EDIT:
doh, no format outline in the package. Is the format documented anywhere? Or do I just have to decipher the source to figure it out?
EDIT2:
I dug up this post. Is this the same compression scheme?
http://nesdev.com/bbs/viewtopic.php?t=3 ... 2&start=15
EDIT3:
Tepples' followup post clarified a lot of it for me (actually I was totally lost until he throw in that C-ish syntax -- that really made it easier to understand). But there are still some format details that are confusing me. Specificlaly, what bits in the stream represent what.
Here's what I understand:
First byte in the stream is the number of tiles to decode, after that:
Code: Select all
1-bit to determine whether or not to rebuild the tables.
1: skip table rebuilding
0: rebuild table:
4 groups of 2 bits follow (8 bits total). These bits
rebuild Table A.
1-bit to determine row repeat
1: repeat previous row (next bit is the table rebuild bit)
0: this row is not a repeat
<<if row is not a repeat>>
2-bits to determine the color, and the mode (mode is A[color])
if the mode is 0, the same color is repeated for the whole row
if the mode is nonzero:
7 groups of 1-3 bits to determine the colors for the rest of the row
(as per tepples outline in that other thread)
- I assume the tables can be changed mid-tile, correct? Just not mid-row.
- Can you change tables B,C, or D? I only see how you'd change table A from this. What determines the contents of B,C,D?
Hehe... Sorry I didn't explain the format, but I didn't expect anyone to mess with the format itself. While coding the compressor I tried to separate the encoding (turning the tiles into a stream of bits) from the parsing (deciding how long each block should be).
If you look at the source you can see that I tried a few different parsing functions, and those are kinda self explanatory (I hope!). The parsing functions can use whatever method they see fit to fill an array of block lengths, which later are used for actual encoding.
But since you are interested in the format, let's get to the questions! You almost got it. Here is the format with as much detail as I can think of:
I hope the above is clear enough, I'm having trouble organizing it any better than this.
If you look at the source you can see that I tried a few different parsing functions, and those are kinda self explanatory (I hope!). The parsing functions can use whatever method they see fit to fill an array of block lengths, which later are used for actual encoding.
But since you are interested in the format, let's get to the questions! You almost got it. Here is the format with as much detail as I can think of:
Code: Select all
.start a new block by building the decompression tables:
.for each color (3 down to 0):
.2 bits indicate how many colors can follow the current color.
This value goes straight to table A, indexed by the current
color.
.if 1 or more colors can follow it, a color is specified in
encoded form. Since there are only 3 possibilities, the colors
are represented by the following codes: 0 (first possibility),
10 (second possibility) and 11 (third possibility). Depending
on the current color, the possibilities are different, because
they don't include the current color (for example, the 3
possible colors that can follow color 2 are 0, 1 or 3). The
specified color is enough to find all of them: if only one
color can follow the original one, the specified one is it. If
it's 2 colors, the specified one is discarded and the 2 that
are not it are used. If it's 3 colors, the other 2 are the
ones that are not it. These colors go to tables B (first color
that can follow the original), C (second) and D (third).
.decode the tiles that are part of the same block:
.1 bit selects between repeating the previous row or decoding a
new one (0 = new row, 1 = repeat previous row). It is possible
to repeat the first row of the first tile of a block, and a row
filled with color 0 is used in this case.
.if the row is not a repeat:
.2 bits specify the color of the first pixel. This color is
then used as an index into table A so that we know how many
colors can follow the current one.
.if no color can follow the current one, the same color is
repeated for the rest of the row.
.if there are colors following the current one, 1 bit selects
between repeating the previous pixel or reading a new one
(0 = new pixel, 1 = repeat).
.if the pixel is not a repeat, the next pixel depends on how
many colors follow the current one. If only 1 can follow it,
this is used (it's the color from table B). If 2 can follow
it, 1 bit selects between them (0 = color from table B, 1 =
color from table C). If 3 can follow it a value with 1 or 2
bits select between the 3 possibilities (0 = from table B,
10 = from table C, 11 = from table D).
.after each pixel is draw, its color is used as an index into
table A again so that the next pixel is processed the same way.
.once the tile is complete, 1 bit selects between starting a new
block or not (0 = no new block, 1 = new block).
No, they can't. A compressed stream always starts with a new block (and the definition of new tables), and after each completed tile a bit will select between continuing the current block or ending it and starting a new one.I assume the tables can be changed mid-tile, correct? Just not mid-row.
These are changed whenever a color can be followed by colors that are not itself. The first possible color goes to table B, the second goes to table C and the third goes to table D. Naming them with single letters was not a very good choice, but back then I wasn't sure what to name them. In the decoder I named them ColorCount (A), NextColor0 (B), NextColor1 (C) and NextColor2 (D), I hope those make more sense.Can you change tables B,C, or D? I only see how you'd change table A from this. What determines the contents of B,C,D?
- MetalSlime
- Posts: 186
- Joined: Tue Aug 19, 2008 11:01 pm
- Location: Japan
I don't know anything about those. If you have any links to information about them, I'd be interested.Dwedit wrote:How does it compare to the Contra algorithm, or the Zelda Oracles algorithm?
I can tell you though that this compression scheme often compresses tiles to 60% of their size, sometimes more, sometimes less. It's one of the most efficient I have studied.
I'll call it CMM (Codemasters Markov) for this post.
Both Contra and Who's Cuter use variants of byte run-length encoding for their tiles. (LJ65 and my current project use the same thing for their nametables.) In essence, RLE has a short code for "repeat the previous byte". RLE also has a short code for 2-color tiles, such as those in text fonts: the bottom (blank) row of a glyph's first plane and the 9 blank rows of the font's second plane make up one run. CMM likewise has short codes for "repeat the previous row" and "group of tiles uses only two colors".
But CMM also shrinks the data even when you don't have identical rows because it has a short code for repeated pixels within a row. Each pixel that's the same color as its neighbor to the left takes only 1 bit, not 2. This is a strict gain in tiles with three colors and an amortized gain in tiles with four colors because repeats are more likely than those colors that have to be coded with 3 bpp.
If this codec works, multicarts can get bigger. Imagine Forbidden Five (though not Punch-Out!! obviously).
And no, it isn't limited to NES tiles; it would be trivial to make the decoder output in Game Boy, Virtual Boy/NGPC, or any other 2bpp tile format. I seem to remember Commodore 64 also using 2bpp tiles in its 160px-wide mode.
Both Contra and Who's Cuter use variants of byte run-length encoding for their tiles. (LJ65 and my current project use the same thing for their nametables.) In essence, RLE has a short code for "repeat the previous byte". RLE also has a short code for 2-color tiles, such as those in text fonts: the bottom (blank) row of a glyph's first plane and the 9 blank rows of the font's second plane make up one run. CMM likewise has short codes for "repeat the previous row" and "group of tiles uses only two colors".
But CMM also shrinks the data even when you don't have identical rows because it has a short code for repeated pixels within a row. Each pixel that's the same color as its neighbor to the left takes only 1 bit, not 2. This is a strict gain in tiles with three colors and an amortized gain in tiles with four colors because repeats are more likely than those colors that have to be coded with 3 bpp.
If this codec works, multicarts can get bigger. Imagine Forbidden Five (though not Punch-Out!! obviously).
And no, it isn't limited to NES tiles; it would be trivial to make the decoder output in Game Boy, Virtual Boy/NGPC, or any other 2bpp tile format. I seem to remember Commodore 64 also using 2bpp tiles in its 160px-wide mode.
Last edited by tepples on Mon Dec 21, 2009 9:48 am, edited 1 time in total.
I don't know the Contra algorithm, I've only stepped through it a couple times, but didn't bother to decipher the format. It does not appear to be byte-aligned.
Zelda Oracles either uses an LZ based scheme, or its own special format which I call "16 bits and common byte".
"16 bits and common byte":
Read a 16 bit value, these will indicate whether to use the common byte, or to copy a byte.
If the value isn't zero (indicating that it has a common byte), read the common byte.
For each bit in the 16 bits, either output the common byte, or copy a byte.
Really simple. It was intended for level data but is also used for graphics for some reason.
You need at 3 instances of the common byte to break even.
"LZ" format:
You read a byte to indicate whether the next 8 things are literal bytes, or are LZ runs. You read another byte for the next 8 flags when you are out of bits.
For an LZ run, the distance is 5 bits, and the length is 3 bits. If the length is zero, it reads a byte to get an 8-bit length.
I think the game's "LZ format" has way too much overhead, of one bit per each byte!
Zelda Oracles either uses an LZ based scheme, or its own special format which I call "16 bits and common byte".
"16 bits and common byte":
Read a 16 bit value, these will indicate whether to use the common byte, or to copy a byte.
If the value isn't zero (indicating that it has a common byte), read the common byte.
For each bit in the 16 bits, either output the common byte, or copy a byte.
Really simple. It was intended for level data but is also used for graphics for some reason.
You need at 3 instances of the common byte to break even.
"LZ" format:
You read a byte to indicate whether the next 8 things are literal bytes, or are LZ runs. You read another byte for the next 8 flags when you are out of bits.
For an LZ run, the distance is 5 bits, and the length is 3 bits. If the length is zero, it reads a byte to get an 8-bit length.
I think the game's "LZ format" has way too much overhead, of one bit per each byte!
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Disch, if you need to ask anything, please do.
If anyone is wondering what are the differences between this format and the one by Codemasters, there are basically 2:
1. The original format used a bit to indicate that the first block was starting. I considered this redundant because before decompressing anything the first set of tables has to be built. This doesn't save much space, but why would I waste it, even if it's a single bit?
2. When defining the tables, no matter how many colors can follow the original one, I can always build the whole list from a single color. The original format needed an extra bit to select between the 2 remaining colors when 2 colors followed the original one, in addition to specifying the color that doesn't follow it.
Another difference, which has no effect on how compact the encoded data is, is that I use different codes for compressed colors, possibilities and such. The codes I picked seemed more appropriate to me, but the codes are the same size as before, so it really doesn't matter.
So, if the format is more compact, why aren't the encoded files? Because I'm not that good with math, and couldn't find an algorithm to divide the tiles into blocks of optimal lengths. I order words, I can't seem to select the best times to end a block and start another one. However, I took note of the block lengths used by Bee52 in the Codemasters logo tileset, hardcoded them to my compressor and the result was indeed smaller than the original data in the ROM (by only 4 bytes, but still). This proves that this format is more compact then theirs, once the optimal lengths are selected. Hopefully one day I (possibly with some help) will be able to make an algorithm that finds the best lengths.
If anyone is wondering what are the differences between this format and the one by Codemasters, there are basically 2:
1. The original format used a bit to indicate that the first block was starting. I considered this redundant because before decompressing anything the first set of tables has to be built. This doesn't save much space, but why would I waste it, even if it's a single bit?
2. When defining the tables, no matter how many colors can follow the original one, I can always build the whole list from a single color. The original format needed an extra bit to select between the 2 remaining colors when 2 colors followed the original one, in addition to specifying the color that doesn't follow it.
Another difference, which has no effect on how compact the encoded data is, is that I use different codes for compressed colors, possibilities and such. The codes I picked seemed more appropriate to me, but the codes are the same size as before, so it really doesn't matter.
So, if the format is more compact, why aren't the encoded files? Because I'm not that good with math, and couldn't find an algorithm to divide the tiles into blocks of optimal lengths. I order words, I can't seem to select the best times to end a block and start another one. However, I took note of the block lengths used by Bee52 in the Codemasters logo tileset, hardcoded them to my compressor and the result was indeed smaller than the original data in the ROM (by only 4 bytes, but still). This proves that this format is more compact then theirs, once the optimal lengths are selected. Hopefully one day I (possibly with some help) will be able to make an algorithm that finds the best lengths.
I meant that the encoder and decoder I provided are specific to NES tiles, just in case someone decides to try with name table data, for example, which I'm pretty sure wouldn't work well. But sure, modifying it to work with other 2bpp tile formats should be trivial (the encoder would also have to be modified to interpret the input file correctly, but that's trivial too).tepples wrote:And no, it isn't limited to NES tiles; it would be trivial to make the decoder output in Game Boy, Virtual Boy/NGPC, or any other 2bpp tile format.
Dwedit, it appears that the other formats you mentioned are byte-oriented, and I believe that CMM (as tepples named it), being pixel-oriented, will almost certainly compress better. This is because byte oriented compression will only take advantage of vertical redundancy (seeing as a byte represents an entire row of a tile), while pixel-oriented schemes will also take advantage of horizontal redundancy.
The way I see CMM, it looks like they wanted something like RLE (they use a bit to repeat rows, which resembles the way RLE works), but they wanted to be able to repeat pixels within a row as well. However, since adding repetition flags to every pixel and row would cause a lot of expansion when there was no repetition, they came up with this crazy scheme of limiting the possibilities of colors that can be used, so that shorter codes could be used, and that would counterbalance the expansion caused by the excess of flags. I don't know if it was actually designed like that, but this is what it looks like to me.
-
- Posts: 592
- Joined: Thu Aug 28, 2008 1:17 am
- Contact:
I call this RLZ. Just basically a bit mask for repeat byte/word/nibble/etc or literal fetch. 1bit RLE. I've a few versions of this. Some repeat a common/fixed value every so many bits (the size of the mask. I.e. next fixed value is fetched along with the bit mask) and some variations just repeat the last literal.Dwedit wrote:Zelda Oracles either uses an LZ based scheme, or its own special format which I call "16 bits and common byte".
"16 bits and common byte":
Read a 16 bit value, these will indicate whether to use the common byte, or to copy a byte.
If the value isn't zero (indicating that it has a common byte), read the common byte.
For each bit in the 16 bits, either output the common byte, or copy a byte.
Really simple. It was intended for level data but is also used for graphics for some reason.
You need at 3 instances of the common byte to break even.
Heh - LZSS with an extended/additional control code. 5bits is kind of a small window. But then again, the element size is smaller (2bit pixels)."LZ" format:
You read a byte to indicate whether the next 8 things are literal bytes, or are LZ runs. You read another byte for the next 8 flags when you are out of bits.
For an LZ run, the distance is 5 bits, and the length is 3 bits. If the length is zero, it reads a byte to get an 8-bit length.
I think the game's "LZ format" has way too much overhead, of one bit per each byte!
But then LZ77 works a lot better on platforms that have random access to VRAM (like Game Boy with rendering off), a work RAM big enough to hold a fairly large history (like Game Boy with rendering on or Super NES), or both (like GBA or PC). Methods with a static dictionary (e.g. Huffman, word encoding) and methods with a small state (e.g. RLE, CMM) work better when you have to decompress directly to a parallel port.