Actually making progress: tricked out metasprite routine
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Actually making progress: tricked out metasprite routine
I know why I crashed and burned the last time I tried to implement my crazy vram setup: I was trying to do a lot at once without any way to test whether what I was doing even worked. So I realized it would make the most sense to get my metasprite routine working with my vram setup before making an animation engine or a vram slot finder. (I do have a tile uploader that works, though.) Because I really liked psycopathicteen's linked list idea for vram slots, I built it into my metasprite routine: each sprite in a metasprite either stays on the same spot in vram or goes to the next location on the linked list. I imagine the way I did it isn't very efficient, partly because I used up x, y, and the direct page, so I had to push and pull x. I don't actually want 16x16's and 32x32's for what I'm planning to do (not that many sprites in total, but a lot of overlaying ones), so I only implemented 16x16 vram slots, with a miniature offset that selects a specific 8x8 inside a 16x16 sized slot.
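As a rough illustration of the linked-list idea (a sketch in Python rather than 65816, with made-up names, since the actual routine is not shown in this post): each metasprite entry either reuses the current 16x16 vram slot or follows the link to the next one, and a small sub-offset picks an 8x8 tile inside the slot.

```python
# Hypothetical model: next_slot[i] is the slot that follows slot i in the
# list, and 0 means "no slot" (slot numbers therefore start at 1).
def resolve_slots(next_slot, first_slot, entries):
    """Return the (slot, sub_tile) pair used by each metasprite entry.

    entries is a list of (advance, sub_tile) tuples: advance=True moves to
    the next slot on the linked list before drawing, advance=False stays on
    the current slot. sub_tile (0-3) selects an 8x8 inside the 16x16 slot.
    """
    slot = first_slot
    result = []
    for advance, sub_tile in entries:
        if advance:
            slot = next_slot[slot]
            if slot == 0:
                raise ValueError("linked list ended before the metasprite did")
        result.append((slot, sub_tile))
    return result


# Slots 5 -> 9 -> 2 form one frame's list; index 0 is unused.
links = [0] * 16
links[5] = 9
links[9] = 2
sprites = [(False, 0), (False, 1), (True, 0), (True, 0)]
print(resolve_slots(links, 5, sprites))
```

The first two sprites share slot 5 (as two 8x8 tiles), and the last two each move one step along the list.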
One problem I ran into is that a smaller sprite flips just like a larger one, so I had the routine check whether the sprite is small or large and add 8 to the sprite's position if it is flipped. I did it in a very lousy way, but I don't know how else to do it. Also, metasprites just flip wherever instead of according to the width, because I don't really know how to program that and don't feel like thinking too hard, considering I pretty much did all of this today.
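One way to avoid the special case, sketched in Python with hypothetical names and assuming each sprite stores its offset from the metasprite's left edge: mirror each sprite using its own size and the metasprite's width. The extra 8 pixels for small sprites then falls out of the formula instead of needing a separate check.

```python
def sprite_x(origin_x, dx, size, metasprite_width, flipped):
    """Screen X of one hardware sprite inside a metasprite.

    dx is the sprite's left edge relative to the metasprite's left edge,
    size is 8 or 16, and metasprite_width is the full width in pixels.
    """
    if flipped:
        # Mirror the sprite's left edge about the metasprite's centre line.
        dx = metasprite_width - dx - size
    return origin_x + dx


# A 32-pixel-wide metasprite at X=100 with a 16x16 and an 8x8 both at dx=0.
print(sprite_x(100, 0, 16, 32, flipped=True))  # 116
print(sprite_x(100, 0, 8, 32, flipped=True))   # 124, i.e. 8 further right
```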
Enjoy...
Kind of random, but that strobe-like effect has proven to be pretty useful as a CPU usage meter and especially to tell me if what I am programming has crashed or not.
- psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Actually making progress: tricked out metasprite routine
Nice. I'm glad you're making progress.
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: Actually making progress: tricked out metasprite routine
Thanks! I had a lot of trouble thinking about how to handle things like double buffering, but I think I've got a solution. Instead of having one space in ram for the start of each metasprite's tiles, I'll have a separate space for each double buffered area. So a 64x64 double buffered object will have a slot for each 64x32 half. The top part will still be linked to the bottom on the vram table, but the routine will first check whether the bottom part even exists in vram (and upload it if it doesn't). This is also useful if you had a tank or something and needed to animate the treads but nothing else: the part that differs (the treads) would link into the commonly shared part (the tank body).
This is somewhat limited, but my original idea was way overcomplicated and had no real purpose. It was like the above, except that each slot could link into any other, which totally screwed up the metasprite routine. If you wanted something that complicated, you would just use another object slot. I edited the object code so that if the identity is #$0001 (#$0000 is nothing), the object identifier won't jump to it, but the object slot searcher also won't overwrite it like it will with #$0000. So if I am animating a tank in my game engine, the body and treads will be one object, but the turret will be a separate object that doesn't actually have any code, as it is really only for visual purposes. Yeah, I'm not entirely sure how I'm going to program my vram idea, but it doesn't seem too hard.
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: Actually making progress: tricked out metasprite routine
Okay, so I've been successful in nearly everything related to making this, except one thing: I can't seem to get it to where I'm uploading from the correct address. So, it is able to find what slots are empty, make the linked list correctly, and upload the tiles in the correct location, but it just isn't able to upload them from the correct location.
I even tried hardcoding it so that it loads "LOWORD(Test1Tiles)" and the like, but inexplicably it still doesn't work. I noticed that I have the direct page somewhere other than 0 (it's at the start of the object currently being looked at), so I've put an "a:" in front of everything else. Is that not always the same as loading something normally when the direct page is #$0000? I mean, that's all I can think of.
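For what it's worth, here is a small Python model of how the 65816 forms the two kinds of address (the function names are made up for illustration). Direct page operands are added to D and always land in bank 0, while absolute operands ignore D and use the data bank register. So "a:" reaches the same location as a direct page access with D = 0 only when the data bank maps that address to the same memory, as banks $00-$3F and $80-$BF do for low RAM on the SNES.

```python
def direct_page_address(d, operand):
    """Effective 24-bit address of a direct page access: bank 0, D + operand."""
    return (d + operand) & 0xFFFF


def absolute_address(dbr, operand):
    """Effective 24-bit address of an absolute access: data bank + operand."""
    return (dbr << 16) | (operand & 0xFFFF)


# With D = 0 and DBR = 0, both forms reach the same address...
print(hex(direct_page_address(0x0000, 0x12)), hex(absolute_address(0x00, 0x0012)))
# ...but once D points at an object slot, only the absolute form still does.
print(hex(direct_page_address(0x0300, 0x12)), hex(absolute_address(0x00, 0x0012)))
```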
The code is a mess because I kept running out of hardware registers and things are named poorly (which is easy enough to fix though). I already know I'll have to go back and optimize it, but that's for another day.
This almost looks like gibberish to me, so I don't expect anyone else to be able to understand it, but I figure I might as well post it here. The object's identity is #$0004. #$0000 counts as nothing for objects, and it also counts as nothing for vram slots, which is why a majority of the tables are offset by -2.
Kind of random, but about things that are only used during one routine, I think I'll just have everything use its own space in vram and then just replace the names of everything so that they're all the same, like "Temporary1" or something like that. I have bigger concerns right now though.
Code: Select all
.proc vram_engine
  rep #$30 ;A=16, X/Y=16
  lda #ObjectTable
  tcd ;direct page = first object slot
  ldy #$0002 ;Y = vram slot index (0 means "no slot", so start at 2)
vram_engine_loop:
  ldx ObjectSlot::RequestedFrame ;X = requested frame identity (0 = none)
  beq next_object
  lda a:AnimationFrameSlotUsageTable-2,x
  beq find_vram ;usage count is 0: go find vram for this frame
  inc a:AnimationFrameSlotUsageTable-2,x
  lda a:AnimationFrameLinkedListTable-2,x
  sta ObjectSlot::VramOffset
  stz ObjectSlot::RequestedFrame
next_object:
  stz ObjectSlot::RequestedFrame
  inx
  inx
  tdc ;move the direct page to the next object slot
  clc
  adc #ObjectSlotSize
  cmp #ObjectTable+ObjectTableSize
  bcs vram_engine_done
  tcd
  bra vram_engine_loop
vram_engine_done:
  rts
find_vram:
  lda a:TilesInFrameTable-2,x ;copy the frame's table entries to temporaries
  sta a:TilesInFrame
  lda a:VramAddressFrameTable-2,x
  sta a:VramAddressOfFrame
  lda a:VramBankByteFrameTable-2,x
  sta a:BankByteOfFrame
find_vram_loop_1:
  lda a:VramLinkedListTable-2,y ;0 = empty vram slot
  beq open_slot_found_1
  iny
  iny
  cpy #$0102 ;past the last slot?
  bcc find_vram_loop_1
  bra next_object
open_slot_found_1:
  tya ;the first slot becomes the head of the frame's list
  sta a:AnimationFrameLinkedListTable-2,x
  sta ObjectSlot::VramOffset
  phy
  ldy a:TileRequestCounter16x16 ;Y = offset of the next free upload request
  lda a:VramAddressToTransferAddressTable-2,x
  sta a:TileRequestTable+VramAddress,y
  lda a:VramAddressOfFrame
  sta a:TileRequestTable+TileAddress,y
  lda a:BankByteOfFrame
  sta a:TileRequestTable+BankNumber,y
  lda a:VramAddressOfFrame ;advance the source address for the next tile
  clc
  adc #$0020
  sta a:VramAddressOfFrame
  lda a:TileRequestCounter16x16 ;each request takes 6 bytes
  clc
  adc #$0006
  sta a:TileRequestCounter16x16
  ply
  dec a:TilesInFrame
  beq next_object
  tyx ;X = slot just used
  iny
  iny
  cpy #$0102
  bcc find_vram_loop_2
  bra next_object
find_vram_loop_2:
  lda a:VramLinkedListTable-2,y
  beq open_slot_found_2
  iny
  iny
  cpy #$0102
  bcc find_vram_loop_2
  bra next_object
open_slot_found_2:
  tya ;link the slot in X to the newly found slot
  sta a:VramLinkedListTable-2,x
  phy
  ldy a:TileRequestCounter16x16
  lda a:VramAddressToTransferAddressTable-2,x
  sta a:TileRequestTable+VramAddress,y
  lda a:VramAddressOfFrame
  sta a:TileRequestTable+TileAddress,y
  lda a:BankByteOfFrame
  sta a:TileRequestTable+BankNumber,y
  lda a:VramAddressOfFrame
  clc
  adc #$0020
  sta a:VramAddressOfFrame
  lda a:TileRequestCounter16x16
  clc
  adc #$0006
  sta a:TileRequestCounter16x16
  ply
  dec a:TilesInFrame
  beq jump_to_next_object
  iny
  iny
  cpy #$0102
  bcc find_vram_loop_2
jump_to_next_object:
  brl next_object
.endproc
Code: Select all
;=========================================================================================
.segment "RODATA"
;=========================================================================================
VramAdressToTileNumberTable:
.word $0000,$0002,$0004,$0006,$0008,$000A,$000C,$000E
.word $0020,$0022,$0024,$0026,$0028,$002A,$002C,$002E
.word $0040,$0042,$0044,$0046,$0048,$004A,$004C,$004E
.word $0060,$0062,$0064,$0066,$0068,$006A,$006C,$006E
.word $0080,$0082,$0084,$0086,$0088,$008A,$008C,$008E
.word $00A0,$00A2,$00A4,$00A6,$00A8,$00AA,$00AC,$00AE
.word $00C0,$00C2,$00C4,$00C6,$00C8,$00CA,$00CC,$00CE
.word $00E0,$00E2,$00E4,$00E6,$00E8,$00EA,$00EC,$00EE
.word $0100,$0102,$0104,$0106,$0108,$010A,$010C,$010E
.word $0120,$0122,$0124,$0126,$0128,$012A,$012C,$012E
.word $0140,$0142,$0144,$0146,$0148,$014A,$014C,$014E
.word $0160,$0162,$0164,$0166,$0168,$016A,$016C,$016E
.word $0180,$0182,$0184,$0186,$0188,$018A,$018C,$018E
.word $01A0,$01A2,$01A4,$01A6,$01A8,$01AA,$01AC,$01AE
.word $01C0,$01C2,$01C4,$01C6,$01C8,$01CA,$01CC,$01CE
.word $01E0,$01E2,$01E4,$01E6,$01E8,$01EA,$01EC,$01EE
VramAddressToTransferAddressTable:
.word $0000,$0020,$0040,$0060,$0080,$00A0,$00C0,$00E0
.word $0200,$0220,$0240,$0260,$0280,$02A0,$02C0,$02E0
.word $0400,$0420,$0440,$0460,$0480,$04A0,$04C0,$04E0
.word $0600,$0620,$0640,$0660,$0680,$06A0,$06C0,$06E0
.word $0800,$0820,$0840,$0860,$0880,$08A0,$08C0,$08E0
.word $0A00,$0A20,$0A40,$0A60,$0A80,$0AA0,$0AC0,$0AE0
.word $0C00,$0C20,$0C40,$0C60,$0C80,$0CA0,$0CC0,$0CE0
.word $0E00,$0E20,$0E40,$0E60,$0E80,$0EA0,$0EC0,$0EE0
.word $1000,$1020,$1040,$1060,$1080,$10A0,$10C0,$10E0
.word $1200,$1220,$1240,$1260,$1280,$12A0,$12C0,$12E0
.word $1400,$1420,$1440,$1460,$1480,$14A0,$14C0,$14E0
.word $1600,$1620,$1640,$1660,$1680,$16A0,$16C0,$16E0
.word $1800,$1820,$1840,$1860,$1880,$18A0,$18C0,$18E0
.word $1A00,$1A20,$1A40,$1A60,$1A80,$1AA0,$1AC0,$1AE0
.word $1C00,$1C20,$1C40,$1C60,$1C80,$1CA0,$1CC0,$1CE0
.word $1E00,$1E20,$1E40,$1E60,$1E80,$1EA0,$1EC0,$1EE0
;=========================================================================================
TilesInFrameTable:
.word $0002,$0002
VramAddressFrameTable:
.word .LOWORD(Test1Tiles)
VramBankByteFrameTable:
.word .BANKBYTE(Test1Tiles),$00
;=========================================================================================
Test1Tiles:
.incbin "Test1.pic"
Test2Tiles:
.incbin "Test2.pic"
;=========================================================================================
Actually, hell, why not, here's the rom:
Dang it, I keep realizing I have things to say... The vram engine doesn't have any sort of double buffering thing because I realized that I really don't need it right now. I'll incorporate one if I ever get to 16x16 and 32x32 sized sprites.
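The two lookup tables in the listing follow a regular pattern, which a few lines of Python can reproduce as a cross-check. This is only a sketch with hypothetical function names. It assumes eight 16x16 slots per 16-tile-wide row of sprite tiles and 4bpp tiles of 16 words each, which is what the table values suggest.

```python
SLOTS_PER_ROW = 8  # eight 16x16 slots across one 16-tile-wide row


def slot_tile_number(slot):
    """Tile number of the top-left 8x8 tile of a 16x16 slot."""
    return (slot // SLOTS_PER_ROW) * 0x20 + (slot % SLOTS_PER_ROW) * 2


def slot_vram_offset(slot):
    """VRAM word offset of that tile: 16 words per 4bpp 8x8 tile."""
    return slot_tile_number(slot) * 16


for slot in (0, 1, 8, 9, 65, 127):
    print(slot, hex(slot_tile_number(slot)), hex(slot_vram_offset(slot)))
```

For example, slot 65 comes out as tile $102 and word offset $1020.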
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: Actually making progress: tricked out metasprite routine
Of course, in my "lol plz help omg" moment, I actually realized half of the mistakes I made. (For starters, the frame's identity was 4, but not all the tables even went to that.)
I actually have it working now after somewhat randomly moving things around and then trying to make sense of the result, except for one thing I don't get: if you had a 16x16, 4bpp tile at one offset and several more following it, wouldn't you add #$80 to get the address of each additional tile? I mean, 16x16=256/2=128.
For whatever reason, it's not working, and it's leading me to believe that the assembler is causing the problem in how it is arranging data, because if I offset the thing by #$40, it shows half of the first tile.
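The arithmetic itself checks out, assuming each 16x16 picture really is stored as 128 contiguous bytes (whether the files are laid out that way is exactly what is in question here). A quick Python check, with a hypothetical helper name:

```python
def tile_bytes(width, height, bits_per_pixel):
    """Size in bytes of one tile's pixel data."""
    return width * height * bits_per_pixel // 8


size = tile_bytes(16, 16, 4)
print(size, hex(size))                    # 128 0x80
print([hex(n * size) for n in range(4)])  # offsets of four consecutive tiles
```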
Yeah, is the data here non-linear?
Code: Select all
TestTiles:
.incbin "Test1.pic"
.incbin "Test2.pic"
Re: Actually making progress: tricked out metasprite routine
That should work fine as far as I know, assuming those files are the right size.
Re: Actually making progress: tricked out metasprite routine
Espozo wrote: I mean, 16x16=256/2=128.
For whatever reason, it's not working, and it's leading me to believe that the assembler is causing the problem in how it is arranging data, because if offset the thing by #$40, it shows half of the first tile.
$40 != 128.
EDIT: I misread. Owell.
Last edited by thefox on Sat Jul 09, 2016 7:40 pm, edited 1 time in total.
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: Actually making progress: tricked out metasprite routine
Nicole wrote: That should work fine as far as I know, assuming those files are the right size.
And apparently, they aren't... They're 512 bytes, for whatever reason: the first fourth is actual graphical data, the rest is 0 filled. I pcx2snes.
Yeah, I just now made a 16x32 picture instead of two 16x16's (there was really no point in having it split in the first place), and it works perfectly now. I don't think I need to upload another file, as it's not like it looks any different on the surface.
Anyway, I did find out one thing from all of this that you people might be able to use... It appears pcx2snes just doesn't output any file smaller than 512 bytes. I think we're long overdue for a new tool, but I don't have the kind of skill to make one. (I only know 65816 and a smidge of 80186 assembly.)
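Until a better converter is at hand, the padding could be trimmed after the fact. The following is only a sketch in Python with a hypothetical helper. It assumes the real tile data comes first and that everything after it is zero fill, as observed with these files.

```python
def strip_zero_padding(data, expected_size):
    """Return the first expected_size bytes if everything after them is zero.

    Raises ValueError when the file is too short or the tail holds real data,
    so a genuinely different file is never silently truncated.
    """
    if len(data) < expected_size:
        raise ValueError("file is shorter than the expected tile data")
    if any(data[expected_size:]):
        raise ValueError("bytes after the expected size are not all zero")
    return data[:expected_size]


# Stand-in for a padded converter output: 128 bytes of data, then zeros.
padded = bytes(range(1, 129)) + bytes(384)
trimmed = strip_zero_padding(padded, 128)
print(len(padded), len(trimmed))  # 512 128
```

With a real file, `data` would come from `open(path, "rb").read()`. ca65's `.incbin` also accepts an optional offset and size after the file name, which may be another way to include only the first 128 bytes.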
Man though, it sucks that it appears absolute addressing always takes one more cycle per instruction, because I'm going to have to fix a lot of my stuff for good performance.
- KungFuFurby
- Posts: 264
- Joined: Wed Jul 09, 2008 8:46 pm
Re: Actually making progress: tricked out metasprite routine
PVSnesLib has gfx2snes, which can handle .pcx, .tga and .bmp files.
Re: Actually making progress: tricked out metasprite routine
I made one in Python that can handle at least BMP and PNG in multiple tile formats, including Super NES 4-bit. It's included with my Super NES project template.
Re: Actually making progress: tricked out metasprite routine
Espozo wrote: it appears absolute addressing always takes one more cycle per instruction
That's not quite true. Direct page takes an extra cycle if it's not page-aligned, meaning it takes just as long as an absolute instruction (assuming you're running in FastROM, so the extra byte fetch in the absolute instruction doesn't take any longer than the internal add in the direct-page instruction). And while indexing adds a cycle to absolute instructions if X/Y are 16-bit, it adds a cycle to direct page instructions regardless of the index register size.
So, for simple load/store/add/whatever instructions (not RMW or anything fancy):
- direct - 3 cycles
- absolute - 4 cycles
- direct non-page-aligned - 4 cycles
- direct indexed - 4 cycles
- direct indexed non-page-aligned - 5 cycles
- absolute 8-bit indexed - 4 cycles
- absolute 16-bit indexed - 5 cycles
Notice that for indexed accesses, if X/Y are 8-bit and the bottom byte of DP is nonzero, absolute is faster.
Add one cycle to all of those if the data is 16-bit. I'll knock off there; see 65c816.txt for further information.
If you want to know how many slow cycles each instruction has, just count the number of byte accesses in slow memory. Everything else is fast.
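The list above can be folded into a small lookup, shown here as a Python sketch with made-up mode names. It only encodes the rules stated in this post and ignores page-crossing penalties and read-modify-write instructions.

```python
def cycles(mode, dp_low_byte_zero=True, index_16bit=False, data_16bit=False):
    """Cycle count for a simple load-type instruction, per the list above.

    mode is one of "direct", "absolute", "direct_indexed", "absolute_indexed".
    """
    base = {"direct": 3, "absolute": 4,
            "direct_indexed": 4, "absolute_indexed": 4}[mode]
    if mode.startswith("direct") and not dp_low_byte_zero:
        base += 1  # D is not page-aligned
    if mode == "absolute_indexed" and index_16bit:
        base += 1  # 16-bit X/Y
    if data_16bit:
        base += 1  # second data byte
    return base


# 16-bit accumulator and index registers, D pointing at an unaligned object:
print(cycles("absolute_indexed", index_16bit=True, data_16bit=True))  # 6
print(cycles("direct", dp_low_byte_zero=False, data_16bit=True))      # 5
print(cycles("direct", data_16bit=True))                              # 4
```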
Re: Actually making progress: tricked out metasprite routine
Espozo wrote: Anyway, I did find out one thing from all of this that you people might be able to use... It appears pcx2snes just doesn't output any file smaller than 512 bytes.
pcx2snes/gfx2snes are known to be buggy, even with bigger files.
I'll try out tepples' script shortly.
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: Actually making progress: tricked out metasprite routine
93143 wrote: That's not quite true. Direct page takes an extra cycle if it's not page-aligned, meaning it takes just as long as an absolute instruction (assuming you're running in FastROM, so the extra byte fetch in the absolute instruction doesn't take any longer than the internal add in the direct-page instruction). And while indexing adds a cycle to absolute instructions if X/Y are 16-bit, it adds a cycle to direct page instructions regardless of the index register size.
Man, all that is hard to keep track of.
Ramsis wrote: pcx2snes/gfx2snes are known to be buggy, even with bigger files.
Would you like to have your blank data in the front, or the back?
Anyway, I got to thinking that the next step in my grand SNES adventure would be to make it so that old frames are deleted whenever an object changes frames. I suppose I'll have it to where there's the existing "FrameRequest" thing, but also have a "CurrentFrame". What it will do is see if the frame request is equal to the current frame, and if it is, do nothing. If it isn't, it would upload the frame and copy the frame request into the current frame. It would also get rid of what was then the current frame if nothing else is using it. (There's a counter of how many objects are using a particular frame, so if it's 0, follow the linked list, replacing every entry with #$0000, as that acts as an empty slot.)
I'm not really sure I'll have an animation engine, because it would be a giant mess if I were to implement everything that I want out of it. For example, implementing tank treads or tires moving is a major pain: there would have to be a way to play animations at different speeds and also backwards. Also, if one thing is this fancy, everything has to be, and that could be unnecessarily slow. I think I'll just hardcode everything.
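The frame request / current frame scheme described above can be modelled in a few lines of Python. This is a toy sketch, not the engine's code: the class and method names are invented, and `END` is an assumed marker for the last slot of a list, so that a slot still in use is never mistaken for an empty (#$0000) one.

```python
EMPTY = 0        # as in the posts above, 0 marks an unused vram slot
END = 0xFFFF     # assumed marker for the last slot of a frame's list


class FrameCache:
    """Toy model of reference-counted frames stored in linked vram slots."""

    def __init__(self, slot_count):
        self.next_slot = [EMPTY] * (slot_count + 1)  # index 0 is never used
        self.first_slot = {}  # frame id -> first slot of its list
        self.users = {}       # frame id -> number of objects showing it

    def acquire(self, frame, slots_needed):
        """Count one more user, allocating slots if the frame is not loaded."""
        if self.users.get(frame, 0) == 0:
            free = [s for s in range(1, len(self.next_slot))
                    if self.next_slot[s] == EMPTY][:slots_needed]
            if len(free) < slots_needed:
                raise MemoryError("not enough free vram slots")
            for slot, following in zip(free, free[1:] + [END]):
                self.next_slot[slot] = following
            self.first_slot[frame] = free[0]
            # A real engine would queue the tile upload for these slots here.
        self.users[frame] = self.users.get(frame, 0) + 1

    def release(self, frame):
        """Drop one user; free the whole list once nobody uses the frame."""
        self.users[frame] -= 1
        if self.users[frame] == 0:
            slot = self.first_slot.pop(frame)
            while slot != END:
                following = self.next_slot[slot]
                self.next_slot[slot] = EMPTY
                slot = following

    def change_frame(self, current, requested, slots_needed):
        """Apply a frame request; returns the object's new current frame."""
        if requested == current:
            return current  # nothing to do
        self.acquire(requested, slots_needed)
        if current is not None:
            self.release(current)
        return requested


cache = FrameCache(slot_count=8)
a = cache.change_frame(None, "walk1", 2)  # object A loads walk1 into slots 1, 2
b = cache.change_frame(None, "walk1", 2)  # object B shares it, no new slots
a = cache.change_frame(a, "walk2", 2)     # A moves on; walk1 stays for B
b = cache.change_frame(b, "walk2", 2)     # last user gone, slots 1 and 2 freed
print(cache.next_slot[1:5])
```

Acquiring the new frame before releasing the old one keeps the old tiles on screen until the replacement has its slots.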
Re: Actually making progress: tricked out metasprite routine
Espozo wrote: It would be so awesome if there was a way to have the cycles per instruction shown while you were typing code, but that would mean this theoretical program would have to assemble the whole file each and every time you did anything.
That might be a job for a profiler. Instead of a breakpoint, you set a profile point on a JSR, and the emulator counts cycles for you.
Espozo wrote: I suppose I'll have it to where there's the existing "FrameRequest" thing, but also have a "CurrentFrame". What it will do is see if the frame request is equal to the current frame, and if it is, do nothing. If it isn't, it would upload the frame and copy the frame request into the current frame.
So far this sounds like the scheme I used for Haunted: Halloween '85. I was sometimes able to fit two frames' tiles into one slot if they shared many, so that I could get away with this trivial frame request more often.
Espozo wrote: There's a counter of how many objects are using a particular frame
And there it differs. You're using reference-counted GC, which is quite a bit more complicated than what I used. I just fully double-buffered all actors' cels, which was fine for the number and size of enemies that engine supported but may not be fine for a more detailed game.
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: Actually making progress: tricked out metasprite routine
So I take it you took the simpler but faster method of having a fixed spot in vram for each object? Yeah, I can't think of a single game that does anything as complicated as what I'm doing, and I am worried about how it will run. However, I won't concern myself with compression, because I'll just make the cartridge bigger, so that'll save a good amount of time. My whole thing is trying to fit as much data as possible into the little 16KB of vram available to sprites. I think one of the main differences you can see between an SNES game and a Neo Geo game or something is how much less diverse the backgrounds, and especially the sprites, are on the SNES, because almost everything is crammed into vram and never swapped out. (Oftentimes, the only thing that is swapped out is the character.) Heck, most games don't even seem to go anywhere near using the whole vram bandwidth, which isn't even large to begin with.
Anyway, I'll post when I get my slot deletion thing working. The current slot and frame slot approach seems to be the best, which is why it also seems pretty popular.