Banshaku wrote: For now, my approach requires checking each meta-metatile 4 times. If you process the meta-metatile once, does the data format have an impact? Should it be column based or row based? How does it impact the buffer and the way the buffer is written to the PPU?
Sometimes the way in which data is arranged makes all the difference. I don't know if you'd need to arrange the data differently in the ROM, but the RAM buffers will probably look a bit "unconventional" if you're aiming at performance. Here's how I'd do it:
Each meta-metatile expands to 16 tiles, right? So, for easy indexing, I'd have 16 7-byte buffers to hold the decoded blocks. Something like this:
Code:
Tile00 .dsb 7
Tile01 .dsb 7
Tile02 .dsb 7
Tile03 .dsb 7
Tile10 .dsb 7
Tile11 .dsb 7
Tile12 .dsb 7
Tile13 .dsb 7
Tile20 .dsb 7
Tile21 .dsb 7
Tile22 .dsb 7
Tile23 .dsb 7
Tile30 .dsb 7
Tile31 .dsb 7
Tile32 .dsb 7
Tile33 .dsb 7
The numbers are the coordinates of the tiles inside the meta-metatile (row first, then column), from (0, 0) to (3, 3). OK, I don't know how you are getting the index of the meta-metatile, but I'd write the decoding routine somewhat like this:
Code:
ldx #$00
-Decode:
;LOGIC TO GET THE INDEX OF THE META-METATILE INTO THE ACCUMULATOR GOES HERE!
;get the index of the 4 metatiles
tay
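;ASSUMPTION, not part of the original routine: if every meta-metatile comes with
;a ready-made attribute byte in a hypothetical "MetaAttr" table, this is a handy
;place to buffer it, since Y still holds the index of the meta-metatile.
;"Attr" would be one more 7-byte buffer (Attr .dsb 7), also indexed by X.
lda MetaAttr, y
sta Attr, x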
lda Metatile3, y
pha
lda Metatile2, y
pha
lda Metatile1, y
pha
lda Metatile0, y
;decode the 1st
tay
lda Tile0, y
sta Tile00, x
lda Tile1, y
sta Tile01, x
lda Tile2, y
sta Tile10, x
lda Tile3, y
sta Tile11, x
;decode the 2nd
pla
tay
lda Tile0, y
sta Tile02, x
lda Tile1, y
sta Tile03, x
lda Tile2, y
sta Tile12, x
lda Tile3, y
sta Tile13, x
;decode the 3rd
pla
tay
lda Tile0, y
sta Tile20, x
lda Tile1, y
sta Tile21, x
lda Tile2, y
sta Tile30, x
lda Tile3, y
sta Tile31, x
;decode the 4th
pla
tay
lda Tile0, y
sta Tile22, x
lda Tile1, y
sta Tile23, x
lda Tile2, y
sta Tile32, x
lda Tile3, y
sta Tile33, x
;move on to the next meta-metatile
inx
cpx #$07
bne -Decode
It's partially unrolled, so it's fast but not too big. I use Y to index the block data because you might want to read it through zero page pointers instead, and indirect indexed addressing only works with Y. Oh, I don't know how you are storing the attributes: either they come ready as part of the meta-metatile, or you'll have to form them by combining the bits of each of the 4 metatiles. Either way, you'll have to store them in another array, also indexed by X. Anyway, after the buffers are ready, you can write them to VRAM with something like this:
Code:
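ldx FirstTile ;0, 7, 14 or 21
txa
clc
adc #$07
sta LastTile ;index where this column ends (LastTile is a scratch variable)
-Update:
lda Tile00, x
sta $2007
lda Tile10, x
sta $2007
lda Tile20, x
sta $2007
lda Tile30, x
sta $2007
inx
cpx LastTile ;stop 7 entries after FirstTile, whichever column was picked
bne -Update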
Of course you can unroll that, arrange the buffer backwards so that you don't need a "cpx" instruction, or whatever else you want in order to optimize it. Anyway, "FirstTile" would be set beforehand to 0, 7, 14 or 21, depending on which of the 4 columns you want to update. That works because the buffers are declared back to back, so Tile00 + 7 is Tile01, Tile00 + 14 is Tile02, and so on.
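Before the update loop runs, the PPU has to be told where the column starts and that the address should move down one row per write. Here's a minimal sketch of that setup, not a definitive routine. The names "Ctrl2000" (a RAM copy of whatever you last wrote to $2000), "ColumnHi" and "ColumnLo" (the name table address of the topmost tile of the column) are hypothetical, and I'm assuming the whole column fits inside one name table:

```asm
  lda $2002       ;reading the status register resets the $2006 write toggle
  lda Ctrl2000    ;hypothetical RAM copy of the current $2000 settings
  ora #%00000100  ;bit 2 set: the VRAM address goes up by 32 after each $2007 write
  sta $2000
  lda ColumnHi    ;hypothetical: high byte of the column's name table address
  sta $2006
  lda ColumnLo    ;hypothetical: low byte
  sta $2006
```

With the increment set to 32, each write to $2007 lands one row below the previous one, so the 28 writes of the loop (4 tiles for each of the 7 meta-metatiles) draw one tile column from top to bottom. Like any VRAM access, this should happen during VBlank or with rendering turned off.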
I know Bregalad will probably say I'm crazy; he always does, because he usually doesn't agree with the way I do things. But yeah, I am a fan of data interleaving and moderate levels of unrolling for some extra speed and reduced code complexity (fewer branches, fewer special cases and such). I hope my ideas give you some good ones.