Re: Map data formats
Posted: Sat Mar 07, 2015 2:30 pm
Eh, it probably still helps with ROM space usage, which is probably what matters most.
thefox wrote: Anyway, I've been thinking of a 6502 implementation of a multidirectional map scroller now, and wanted to ask if anybody has any useful tricks when it comes to that before starting on it.
Not exactly a trick, more like a piece of advice: don't bother with vertical alignment between name tables and 256x256-pixel metatiles. Converting between map space and NT space will probably require an awkward and time-consuming division by 15. It's much simpler to just have 2 CameraY coordinates, one relative to the level (which you use to read data from the map) and another one relative to the name tables (which you use for scrolling and drawing). Just update both by the same amount every time and have the NT one wrap from 239->0 and vice-versa.
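As a rough illustration of the two-coordinate idea, here is a minimal Python sketch (not 6502 code, and not anyone's actual engine). The class name `Camera` and the fields `level_y` and `nt_y` are made up for the example, and it assumes a single 240-pixel-tall name table vertically and per-frame movement smaller than 240 pixels.

```python
NT_HEIGHT = 240  # pixel rows in one name table (30 tile rows * 8 pixels)


class Camera:
    """Hypothetical camera holding both a map-space and an NT-space Y."""

    def __init__(self, level_y=0):
        self.level_y = level_y           # used to read data from the map
        self.nt_y = level_y % NT_HEIGHT  # used for scrolling and drawing;
        # the modulo happens only once here, never during scrolling

    def move(self, dy):
        """Scroll by dy pixels (assumes abs(dy) < NT_HEIGHT)."""
        self.level_y += dy
        self.nt_y += dy
        # Wrap the NT coordinate by add/subtract instead of dividing.
        if self.nt_y >= NT_HEIGHT:
            self.nt_y -= NT_HEIGHT
        elif self.nt_y < 0:
            self.nt_y += NT_HEIGHT
```

Both coordinates receive the same delta, so they never drift apart, and the only per-frame cost on the NT side is a compare and an add or subtract.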
Quote: - Nametable crossing (both vertical and horizontal). When creating PPU updates, the update needs to be split into two pieces whenever the update would cross a nametable boundary. There should be at most one nametable crossing, since it wouldn't make sense to have row/column updates longer than roughly one nametable width/height. In the typical case there's exactly one crossing.
I accounted for this in the code that draws the tiles to VRAM. The buffers are always 68 (rows) or 60 (columns) bytes, but the code that copies them to VRAM uses counters and/or pointers to break those in 2 parts (actually 4, since each metatile is 2x2 tiles).
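To make the splitting concrete, here is a small Python sketch of breaking one row of tiles at the name table boundary. It assumes two name tables side by side (64 tile columns in total), and the function name `split_row_update` and the tuple layout are invented for the example; it is not the code from either engine discussed here.

```python
NT_TILES_WIDE = 32  # tile columns in one name table


def split_row_update(start_col, length):
    """Split a horizontal run of tiles into per-name-table pieces.

    start_col is a tile column in the 64-column space formed by two
    side-by-side name tables. Returns a list of
    (nametable, column_within_nametable, tile_count) tuples.
    """
    total_cols = 2 * NT_TILES_WIDE
    pieces = []
    col = start_col % total_cols
    remaining = length
    while remaining > 0:
        nametable, local_col = divmod(col, NT_TILES_WIDE)
        # Copy up to the right edge of the current name table.
        count = min(remaining, NT_TILES_WIDE - local_col)
        pieces.append((nametable, local_col, count))
        col = (col + count) % total_cols
        remaining -= count
    return pieces
```

For example, a 34-tile row starting at column 30 yields two pieces: 2 tiles at the end of the first name table and 32 tiles in the second.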
Quote: - Screen crossing. Again, there should be at most one of these per horizontal/vertical update. Not necessarily aligned with nametable crossings.
I handle that by always preparing 2 256x256-pixel metatiles for reading. When I start decoding data from the level map, I get the index of the block where the row/column starts and the one after it. For columns, the second block isn't always used.
Quote: - Different sub-metatile offsets (and metatile boundary crossing). E.g. we might start a vertical update from the left or the right side of a 32x32px metatile (and likewise for the 16x16px metatile within it). Might have a different vertical starting offset, too.
I use pointers to indicate which parts of the metatile I'll be reading. Each metatile has 4 children, but I only need to read 2, and based on the row/column coordinate I know which 2 those are, so I build a set of pointers before reading the data. This might not work so well in your case, since there are many 32x32-pixel metatiles inside the 256x256 ones, not only 4.
Quote: - Attribute updates. Need a "cache" in CPU memory of the current attribute contents to be able to construct the updates non-destructively. A cache of roughly 64 bytes should be enough.
I have all 128 bytes cached just to make them easier to access (i.e. the addresses are always in sync with the attribute tables), but a portion the size of the screen should indeed be enough. In fact, if you don't plan on modifying tiles in the middle of the screen, you could probably get away with keeping track of the edges of the screen only.
Quote: I was thinking it might make sense to figure out all of the different scenarios (different sub-metatile starting offsets, etc) that can happen, and write (probably with macros) separate routines for handling each one of them.
I considered doing this too, but routines for columns or rows starting on each of the 256 metatiles of a screen would be too insane to do, and combining that with other solutions (like a pointer system) would probably not result in such an improvement.
tokumaru wrote: Not exactly a trick, more like a piece of advice: don't bother with vertical alignment between name tables and 256x256-pixel metatiles. Converting between map space and NT space will probably require an awkward and time-consuming division by 15. It's much simpler to just have 2 CameraY coordinates, one relative to the level (which you use to read data from the map) and another one relative to the name tables (which you use for scrolling and drawing). Just update both by the same amount every time and have the NT one wrap from 239->0 and vice-versa.
Yeah, I was planning to do this. In my last scroll engine I actually had a separate set of coordinates for each edge of the view area. I'm not sure that was a good idea, though, and I'm undecided at this point about whether to do it again. I'm also not sure whether to require even horizontal alignment. I think it would be kind of nice if there was no requirement about the horizontal map-nametable alignment (except obviously the attribute tile alignment requirement, and probably 32x32px metatile alignment would be good to have also).
Quote: I accounted for this in the code that draws the tiles to VRAM. The buffers are always 68 (rows) or 60 (columns) bytes, but the code that copies them to VRAM uses counters and/or pointers to break those in 2 parts (actually 4, since each metatile is 2x2 tiles).
Interesting. I hadn't thought about doing it on the VRAM copy side.
Quote: Another piece of advice: don't bother with partial attribute table updates. The logic to handle name table crossing will take as much time as simply updating an entire row or column, including the parts that are off screen.
Thanks. I believe I did partial updates in my previous scrolling engine. This time I plan to profile the code before and after any changes.
Quote: - Screen crossing
Quote: I handle that by always preparing 2 256x256-pixel metatiles for reading. When I start decoding data from the level map, I get the index of the block where the row/column starts and the one after it. For columns, the second block isn't always used.
Ah, that's also a good point. It can easily be done, since we always know there are at most two.
Quote: - Attribute updates. Need a "cache" in CPU memory of the current attribute contents to be able to construct the updates non-destructively. A cache of roughly 64 bytes should be enough.
Quote: I have all 128 bytes cached just to make them easier to access (i.e. the addresses are always in sync with the attribute tables), but a portion the size of the screen should indeed be enough. In fact, if you don't plan on modifying tiles in the middle of the screen, you could probably get away with keeping track of the edges of the screen only.
Good point about modifying stuff in the middle of the screen. I do want to have support for that, because of the aforementioned genericity goal.
Quote: I was thinking it might make sense to figure out all of the different scenarios (different sub-metatile starting offsets, etc) that can happen, and write (probably with macros) separate routines for handling each one of them.
Quote: I considered doing this too, but routines for columns or rows starting on each of the 256 metatiles of a screen would be too insane to do, and combining that with other solutions (like a pointer system) would probably not result in such an improvement.
I wasn't actually planning to have a full set of routines for all the options, but rather to figure out a fair set of "similar" ones. But yeah, I'm not really sure yet whether I'll go with this one.
Quote: A solution I have used at some point is to always decode rows and columns that are 32 metatiles wide/tall from the level map using the fastest possible unrolled code, and then extract from there the tiles that are actually necessary. Sounds like a waste of resources, but sometimes it's faster to just do the unrestricted full process (which can usually be optimized) and discard some of the data than worrying about boundaries, counters and such during the process. You could maybe write specific routines for each of the 16 rows and 16 columns, that always read 32 metatiles, and have the VRAM update code only use the part that will be visible next frame.
You mean 32 metatiles 16x16px that are aligned to the 256x256px metatiles? That's an interesting idea, too. Hadn't thought about that one either, thanks.
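A simplified Python sketch of the "decode everything, then discard" approach follows. It assumes, purely for illustration, that each 256x256-pixel block is a flat list of 256 metatile indices stored 16 per row; the real formats discussed in this thread are hierarchical, and the names `decode_row` and `visible_slice` are made up.

```python
BLOCK_SIDE = 16  # 16x16px metatiles per side of a 256x256px block


def decode_row(block_a, block_b, row):
    """Return all 32 metatile indices of one row across two adjacent blocks.

    No boundary checks or counters here: the full row is always decoded,
    which is what would let a 6502 version be completely unrolled.
    """
    start = row * BLOCK_SIDE
    end = start + BLOCK_SIDE
    return block_a[start:end] + block_b[start:end]


def visible_slice(full_row, first, count):
    """Keep only the metatiles that will actually be drawn."""
    return full_row[first:first + count]
```

The boundary logic is confined to `visible_slice`, which runs once per update instead of once per metatile.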
thefox wrote: Interesting. I hadn't thought about doing it on the VRAM copy side.
Well, my pointers and indices were calculated during draw time, and the VRAM routine just loaded up the indices and jumped to the locations indicated by the pointers. I had separate routines for each kind of VRAM update though, but if you have something more generic already set up (i.e. "copy NN bytes from XXXX to YYYY") it would probably make more sense to split rows and columns into multiple copy commands beforehand.
Quote: I believe I did partial updates in my previous scrolling engine. This time I plan to profile the code before and after any changes.
Profiling would be a good idea. In my own design, the same logic I used to break name table updates in half was saving me too little time when applied to attributes to be worth the trouble. I gave up on it mostly because my VBlank time was split into "VRAM update slots", and there were only 2 slots per frame. Taking the DMA transfer, scroll setting and other minor housekeeping tasks into account, there were 840 or so cycles left for each of the 2 update routines. That was enough for updating a row/column of tiles along with a full row/column of attributes, so there really was no reason to insist on the split, because the little time saved wouldn't have been used for anything.
Quote: Good point about modifying stuff in the middle of the screen. I do want to have support for that, because of the aforementioned genericity goal.
Yeah, that's a serious requirement for me. I want to be able to destroy/move background objects and have actual content behind them, not only color 0.
Quote: You mean 32 metatiles 16x16px that are aligned to the 256x256px metatiles? That's an interesting idea, too.
Yeah, but I don't fully decode the 16x16s at this time, I just get their indices into a 32-byte array, in which I scan the range I'll actually need and then decode the tile indices and attributes in preparation for the VRAM updates.
Quote: Hadn't thought about that one either, thanks.
I'm glad to have given you something to think about. =)
tokumaru wrote: Now that I think of it, I'm not sure if only 64 bytes would be enough for keeping track of attributes, since the area covered by the camera can be slightly wider than 256 pixels depending on the alignment with the metatiles. Maybe if you force columns to be recalculated when switching scrolling directions, but I'm not sure. It's something to consider.
Yeah, that's why I said "roughly 64" originally. I guess 9x9 = 81 bytes should be enough to cover all scenarios. But like you said, addressing the cache becomes trickier, and I'd probably need to keep a separate set of counters for the position(s) inside the attribute cache as well. For one-screen mirroring 64 bytes should be enough.
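The 9x9 figure can be checked with a few lines of Python. This is only a sanity check under the assumption of a 256x240-pixel view and 32x32-pixel attribute cells; the function name `attr_cells_covered` is invented for the example.

```python
ATTR_PX = 32  # one attribute byte covers a 32x32-pixel area


def attr_cells_covered(offset, view_px):
    """Number of attribute cells touched by a view starting at `offset`."""
    first = offset // ATTR_PX
    last = (offset + view_px - 1) // ATTR_PX
    return last - first + 1


# Worst case over every possible alignment of the camera.
worst_cols = max(attr_cells_covered(x, 256) for x in range(ATTR_PX))
worst_rows = max(attr_cells_covered(y, 240) for y in range(ATTR_PX))
cache_bytes = worst_cols * worst_rows
```

With the camera aligned to the attribute grid the view touches 8 columns, and any other alignment touches 9. The same worst case applies vertically once the offset inside a cell exceeds 16 pixels, so the cache needs 81 bytes.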
Code: Select all
0x00-0x03 = Associated tiles
0x04 = Collision settings
0x04 0B---xxxxx = Collision type / Slope shape
0x04 0B-xx----- = On solid part collision (0: Obstacle, 1: Ladder, 2: Top-collision only obstacle, 3: Unused)
0x04 0Bx------- = On empty part collision (0: Air, 1: Water)
0x05 = Collision behavior
0x05 0B-------x = On top collision (0: Regular, 1: Ice/low friction)
0x05 0B---abcd- = Get hurt on (a=1: top, b=1: left, c=1: right, d=1: bottom)
0x05 0B--x----- = Breakable block from underneath
0x05 0B-x------ = Pickable object
0x05 0Bx------- = Bounce down when bumped from underneath
0x06 = Block type
0x06 0B----xxxx = How does the block work (0: Regular, 1: Collectable, 2: Has something inside, 3: Unused, 4: 2-block pipe entrance up, 5: 2-block pipe entrance left, 6: 2-block pipe entrance right, 7: 2-block pipe entrance down, etc)
0x06 0B-1xx---- = This block is a segment of a 2x2 block (Giant World) (0: Upleft, 1: Upright, 2: Downleft, 3: Downright)
0x07 = Extra data (if 0x05=0B-1------ then it's the entity ID of the object that will be held (like Blue Block from SMB3). Else if 0x06=0B----0001 then it defines what you get when you collect. Else if 0x06=0B----0010 then it's what gets ejected from the block. Else if 0x06=0B----01-- then you select which pointer on the map will send the player to another location)
Code: Select all
0x00 = Number of pages
0x01+Number of pages*2 (24-bit value) = Pointer to which place from which PRG-ROM bank
NEXT SEGMENT:
0x00 = Number of entities (enemies, pointers, etc.)
to be done
8bitMicroGuy wrote: ...
What metadata to store for blocks is not really relevant for this discussion. The 16x16px metatile format you propose is simple, but also uses quite a lot of space. Also I forgot to mention it in the original post I think, but I don't want the solution to depend on WRAM in any way.
thefox wrote: - Should not depend on extra cartridge RAM (but should be able to work with maps decompressed to WRAM).
8bitMicroGuy wrote: ...
thefox wrote: Also I forgot to mention it in the original post I think, but I don't want the solution to depend on WRAM in any way.
Huh? Did I miss something in-between the posts that I didn't read?
Roth wrote: Huh? Did I miss something in-between the posts that I didn't read?
I edited that into the first post after I noticed it. There's an "EDIT" note at the bottom of the first message.
thefox wrote: Also I forgot to mention it in the original post I think, but I don't want the solution to depend on WRAM in any way.
That's usually my approach too. At some point I noticed you didn't say anything about this, but once we started discussing the decoding process it started looking like you didn't plan on using WRAM.
Code: Select all
Regular blocks:
0b XXXXXXXX XXXXYYCC CCCCCCCC
X's are block art
Y's are the color pattern
C's are the object's collision data: slope shape, obstacle/ladder/top-collision, deadly on touch, air/water; like explained in my previous post
Special blocks:
0b XXXXYYMM MMMMMMCC CCCCCCCC
X's are block art
Y's are block color pattern
C's are the object's collision data: same as above.
M's are the object metadata (Pipe target, what it contains, etc)
8bitMicroGuy wrote: 2048-256(stack)-256(zero page for player and enemy data)=1536 bytes
Hah, you're dreaming if you think you can get away with only 512 bytes of game state just so you can have 1536 bytes for the level.
Code: Select all
MAPS
Each map is comprised of a Map Header, Superchunk Pointer Set, and Superchunk Data.
MAP HEADER
Each map is described by a 2b header value. These headers are located in the core ROM bank.
1b bank/index - location of main map bank
76543210
BBBBBiii
B = bank where Superchunk Pointer Set is located (0-31)
i = index of SuperChunk location in specified bank (0-7)
1b tileset used by this map and size flag
76543210
sstttttt
s = size of map
00 = 8x8 (2048p square map, pointer set is 64x2 = 128b)
01 = 16x16 (4096p square map, pointer set is 256x2 = 512b)
10 = 32x32 (8192p square map, pointer set is 1024x2 = 2kb)
11 = 64x64 (16384p square map, pointer set is 4096x2 = 8kb)
t = tileset used by this map.
Code: Select all
SUPERCHUNK POINTER SET
The Superchunk Pointer Set is 2kb, and is located in bank B (0-31), at
[$8000 + %00iii000 00000000], where B and i are taken from the Map header.
The Superchunk Pointer Set designates a set of 8x8 to 64x64 superchunks. The Superchunk
Pointers are interleaved by halves. For a 32x32 superchunk map, the pointers are laid out
as such:
$0000 Map Lo byte SuperChunk pointer (1024)
$0400 Map Hi byte SuperChunk pointer (1024)
Each pointer is 2 bytes, and is comprised of:
76543210 76543210
bbbppppp ppPPPPPP
b = bank offset (0-7) added to the bank value B from the map header.
pP = pointer, pointing to $8000 + %00PPPPPP ppppppp0
Note that bank index can be incremented based on map index.
If a pointer is $0000, then there is no data for this superchunk. Otherwise, the
pointer points to memory location: [$8000 + %00PPPPPP ppppppp0] in bank [B + b].
Code: Select all
SUPERCHUNK DATA SET
Each SuperChunk data set is comprised of 1b flags, 5b Chunk indexes, and any
combination of the following:
* Alternate SuperChunk data
* Actor data
* Egg data
1b Flags
76543210
.....EAL
L = 1 : has aLternate superchunk data
A = 1 : has Actor data
E = 1 : has Eggs data
5b Chunk Indexes
4b Chunks in this SuperChunk
1b Hi bits for Chunks
76543210
BRblURul
Code: Select all
TileSet banks (4 TileSets per bank, 800 left over)
* Updated 01/02/13
Tileset header: 2b pointers.
Each tileset is 1848b, organized as follows:
255 MetaTile bitfields
1b pal index zero
255 MetaTile attributes
1b palette 0
255 MetaTile graphics UL
1b palette 1
255 MetaTile graphics UR
1b palette 2
255 MetaTile graphics LL
1b palette 3
255 MetaTile graphics LR
byte tilecount, always 208
tilecount b tiles low bytes.
tilecount / 2 b tiles hi byte
- (lo nibble is for $0 index, hi nibble is for $1 index)
- Tile graphic index is a 12 bit number, 0-4095
200 free bytes follow each tileset.
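For the packed tile table at the end of the tileset, here is a hedged Python sketch of rebuilding a 12-bit tile graphic index. It assumes that "$0 index" and "$1 index" refer to the low bit of the tile number, so that each hi byte serves an even/odd pair of tiles; the spec above doesn't spell that out, and the function name `tile_index` is made up for the example.

```python
def tile_index(lo_bytes, hi_nibbles, n):
    """Rebuild the 12-bit graphic index of tile n.

    lo_bytes:   one low byte per tile (tilecount entries)
    hi_nibbles: one byte per pair of tiles (tilecount / 2 entries);
                assumed layout: low nibble for the even tile of the pair,
                high nibble for the odd tile
    """
    packed = hi_nibbles[n >> 1]
    if n & 1:
        hi = packed >> 4    # odd tile: high nibble
    else:
        hi = packed & 0x0F  # even tile: low nibble
    return (hi << 8) | lo_bytes[n]
```

With 4 high bits and 8 low bits the result spans 0-4095, which matches the range stated in the layout.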