Optimizing the usage of character-related data

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

User avatar
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Optimizing the usage of character-related data

Post by DRW »

I'm looking for a way to optimize the usage of my character-related data.

In my game, the properties of all characters on screen are stored in separate arrays. Something like this:

Code: Select all

byte ChrsType[CharCount];
byte ChrsX[CharCount];
byte ChrsY[CharCount];
byte ChrsEnergy[CharCount];
/* ... */
When I process a specific character, I first copy all variables from the current array index to single variables, so that I don't have to use too much array access in the function:

Code: Select all

LDX CharIndex
LDA ChrsType, X
STA CharType
LDA ChrsX, X
STA CharX
LDA ChrsY, X
STA CharX
LDA ChrsEnergy, X
STA CharEnergy
And after the function, I copy the values back to the array.

However, while the variables for the player character, for weapons and items are relatively few, each opponent has 35 variables.
This means, in each frame for each opponent, I have 140 LDAs and STAs just for preparation stuff. And most of the stuff gets used only once or not at all during this specific frame.
But simply omitting the rarely-used variables and accessing them directly in the array wouldn't be a solution either, since each read or write of those variables would have yet another LDX CharIndex.

So, is there any way to improve this stuff in a meaningful way? How can I reduce the huge overhead of copying 35 variables from arrays to single variables and then back to the arrays again?


(Unfortunately, I'm not aware of any banking method for RAM, where I would store the variables for each character in the same memory location of different banks and then activate the bank of the currently-processed character.)


By the way, in case you're interested, these are the variables of an opponent. Apart from shortening the access, I would also welcome ways to reduce the number of variables in general:

Code: Select all

BaseType
Type
MetaSpriteIndex
CurrentMetaSpriteIndexForStanding
FacingDirection
X
Y
UntouchableStatus
FlickerDuringUntouchable
PaletteCycleCounter
HorizontalDirection
VerticalDirection
Energy
WalkingPixelsPerFramesIndex
WalkingSpeedId
WalkingCounter
PatternIndex
SubPatternType
SubPatternCounter
UnattackableSoundCounter
PatternIsAdditional
CooldownCounter
CooldownType
StunnedOrThrownAwayCounter
HorizontalDirectionBeforeThrownAway
VerticalDirectionBeforeThrownAway
LoadAfterAttackedPatternNext
DyingStatus
DyingCounter
PreDyingType
IsBoss
LastHorizontalFacingDirection
SubPatternParameter1
SubPatternParameter2
Pattern
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Optimizing the usage of character-related data

Post by calima »

Not much can be done. I usually just do the array access, and to avoid constant loads, don't use the X/Y reg for any computation.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Optimizing the usage of character-related data

Post by Oziphantom »

what are we optimising for? speed? RAM cost? ROM cost?

while having yet another ldx CharIndex shouldn't really be a problem as you want to load it as infrequently as possible, and maintain X for most of your code as its the important look up value.
FacingDirection, UntouchableStatus, HorizontalDirection, VerticalDirection, PatternIsAdditional, HorizontalDirectionBeforeThrownAway, VerticalDirectionBeforeThrownAway, LastHorizontalFacingDirection
these all seem to be bool flags, so 1 bit. you could do a NV pack and put 2 per byte and use BIT to branch and detect each sub bit. You could also then pack 8 into a single byte and use lda #value BIT address,x to check for the status of sub bits with NV being faster for more commonly used values.
DyingStatus
DyingCounter
does the counter need all 8 bits? how many bits does the status need? is the status a bool? then the counter being +ve tells you that status is set and that +ve value is the counter.
CooldownCounter
CooldownType
likewise.
SubPatternType
SubPatternCounter
again.
User avatar
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing the usage of character-related data

Post by DRW »

calima wrote: Wed Jun 29, 2022 11:46 pm I usually just do the array access, and to avoid constant loads, don't use the X/Y reg for any computation.
Since my code is written mostly in C, I cannot preserve any registers. But even if I write in pure Assembly, having to work without a usable X register because it has to keep the same variable at all times would be a huge hassle.

Oziphantom wrote: Thu Jun 30, 2022 2:28 am what are we optimising for? speed? RAM cost? ROM cost?
Speed. The only thing that's not extendable.
Oziphantom wrote: Thu Jun 30, 2022 2:28 amFacingDirection, UntouchableStatus, HorizontalDirection, VerticalDirection, PatternIsAdditional, HorizontalDirectionBeforeThrownAway, VerticalDirectionBeforeThrownAway, LastHorizontalFacingDirection
these all seem to be bool flags, so 1 bit.
Unfortunately no. The direction variables are up, down, left, right, none. Untouchable status is also a counter value for mercy invincibility. Only PatternIsAdditional is an actual boolean value.
Oziphantom wrote: Thu Jun 30, 2022 2:28 amyou could do a NV pack and put 2 per byte and use BIT to branch and detect each sub bit. You could also then pack 8 into a single byte and use lda #value BIT address,x to check for the status of sub bits with NV being faster for more commonly used values.
Since I program most of the game logic in C, I cannot really use any registers or the N and V flag.
But even if I could use them: Would this really be a realistic way to do things? Apart from the fact that using the N and V flag would merely save me two of my 35 variables, aren't those the values that get overwritten by certain operations? How can I use these flags as general-purpose variables if they can get changed by common operations? Isn't that totally dangerous?

And about using one byte for multiple stuff:
Oziphantom wrote: Thu Jun 30, 2022 2:28 amDyingStatus
DyingCounter
does the counter need all 8 bits? how many bits does the status need? is the status a bool? then the counter being +ve tells you that status is set and that +ve value is the counter.
Doesn't the whole bit manipulation that I need to do to access the values in the byte require just as much time as if I copy the variable from the array and back?
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
User avatar
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Optimizing the usage of character-related data

Post by rainwarrior »

Honestly, it'd be hard to suggest any real speed optimization that isn't writing in assembly.

In assembly, it's probably quite reasonable to just keep the current index in X, and in cases where you need to reuse X temporarily, reload it quickly from a ZP variable afterward. I find that having only one free indexing register (Y) is quite sufficient for the majority of object/character handling, and most relevant instructions have an X indexed variation.

In C, I can't really give a suggestion beyond what you're doing. Probably doing a copy in assembly to and from the indexed block to a temporary "character" is about as good as you'll get for optimizing speed. Removing the indexing in C is likely to help a lot.

If you do need to use an index in C, pass it as a static variable with zeropage pragma. Make sure the index is 8-bit, and make sure the arrays you're using are also 8-bit (and static). The compiler probably won't be too efficient with reusing X, but at least these steps would avoid the overhead involved when the pointers have to resort to 16-bit addition.

I suspect though that in C the temporary copy approach will break even against indexing after only a couple of uses of the index. (At least with cc65.)

...in assembly, on the other hand, retaining X for indexing would most likely be faster overall than a temporary copy. If the computations are extensive enough, even there a temporary copy to ZP would help, but it's really a matter of how much. (You could even do both, making the temporary copy only for objects that need a complicated enough update.)
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Optimizing the usage of character-related data

Post by Oziphantom »

ok coding in c... cc65? to which best bet is just use a better C compiler, port to Kick-C for best performance but harder to code in. Or VBCC,however LLVM-MOS is probably not ready for the prime time yet.

however to the
The direction variables are up, down, left, right, none. and NV being used.
ok so lets encode this as
00000000 = none
10000000 = up
11000000 = down
01000000 = left
00XXXXXX = right
where X is one of them at least has to be set. Doesn't matter which or how.
Now when you want to "run some code that handles each case" you can do a fast NV dispatch like so

Code: Select all

lda #0
bit Direction
beq none
bpl testLR
bvc up
; down starts here
testLR
bcc right
; left starts here
it takes longer to clear/set them, but that happens rarely, and you will want to dispatch them more often for movement, collision, animation etc
alternatively you could store it
000TTTTT = none
100TTTTT = up
110TTTTT = down
010TTTTT = left
001TTTTT = right
to which the code then becomes

Code: Select all

lda #$20
bit Direction
... as before
and you can store a 5 bit counter 0-32 in the lower half which is accessed with

Code: Select all

lda direction
and #$1f
to which how often would you need to use that counter value, vs the time it takes to copy and copy back said value on every entity. paying a slightly higher cost 2 clocks for 1~2 entities vs 14 clocks for every entity is still a big saving.
User avatar
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing the usage of character-related data

Post by DRW »

rainwarrior wrote: Thu Jun 30, 2022 4:05 pm In assembly, it's probably quite reasonable to just keep the current index in X
I don't know. Even in my functions that are written in Assembly, I usually need A, X and Y and cannot afford to keep one with the same value. Especially not in those huge character game logic functions.

So, yeah all the stuff that relies on using registers with fixed values would not be something that's really an option.

Unfortunately, banking in RAM doesn't seem to be a thing.

Likewise, if it's possible that for example 10 opponents are on screen at once, I could simply use 10 copies of each character movement function, so that it doesn't use Property[CharIndex], but Property[0], Property[1], Property[2] etc. which then translates to regular variable access again. But yeah, I don't want to waste that much ROM space.

And I guess there's no indirect access with fixed indices, right? I.e. stuff like this: LDA (Pointer) + 5 instead of LDY #5. LDA (Pointer), Y does not exist, does it?


I guess one thing I can do is reduce the number of total variables: When an enemy is hit and is being thrown away, then his movement pattern resets anyway. Likewise no movement pattern is necessary when he dies. So, all those pattern variables can probably double as the status values for getting hit and dying.
But I have to see how much I can actually save with this approach. I mean, apart from these special situations, you really do need everything at once: The direction is independent from the animation phase counter which is independent from the position of the movement pattern array or the current blinking counter, energy or cooldown counter for when the next non-standard pattern is alowed to hit again.

rainwarrior wrote: Thu Jun 30, 2022 4:05 pm If you do need to use an index in C, pass it as a static variable with zeropage pragma. Make sure the index is 8-bit, and make sure the arrays you're using are also 8-bit (and static). The compiler probably won't be too efficient with reusing X, but at least these steps would avoid the overhead involved when the pointers have to resort to 16-bit addition.
Those things are of course already done. I mean, global variables instead of local ones, zeropage and byte-sized types, that should be basic knowledge anyway.

Oziphantom wrote: Fri Jul 01, 2022 9:39 am ok coding in c... cc65? to which best bet is just use a better C compiler, port to Kick-C for best performance but harder to code in. Or VBCC
Are those compilers tested to be truly reliable for NES development?


About the direction optimizations: I don't know whether it would save a lot, but in any case, this would be just one of 35 variables that has slightly less calculation time. That's merely a drop on the hot stone. All the other variables still need to be copied back and forth. That's why I'm looking for something to drastically shorten the whole copy stuff.
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
User avatar
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Optimizing the usage of character-related data

Post by rainwarrior »

DRW wrote: Fri Jul 01, 2022 2:32 pmI don't know. Even in my functions that are written in Assembly, I usually need A, X and Y and cannot afford to keep one with the same value. Especially not in those huge character game logic functions.
I've found it very practical on multiple major projects. X fits best, because it has more relevant instructions (especially inc), and Y's indirect ability would be wasted. Y is also good for static lookup tables, so it's fine to be free for that. Otherwise the instances where I felt X needed to be use something else were rare, but the zp copy of the index made that quick to restore after.
DRW wrote:And I guess there's no indirect access with fixed indices, right? I.e. stuff like this: LDA (Pointer) + 5 instead of LDY #5. LDA (Pointer), Y does not exist, does it?
Nope. I've seen stuff like that on other CPUs but not 6502. (65816 with DP, or x86 instruction-level offsets, etc.)
User avatar
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Optimizing the usage of character-related data

Post by tokumaru »

DRW wrote: Fri Jul 01, 2022 2:32 pmI don't know. Even in my functions that are written in Assembly, I usually need A, X and Y and cannot afford to keep one with the same value. Especially not in those huge character game logic functions.
I code in assembly and I always adhere to the standard of having X always pointing to the current object slot when processing objects. I hardly ever need to use X for anything else, but in the very few cases that I do, I restore its value from a global ZP variable like rainwarrior already mentioned.

I've written fairly complex object logic, and for nearly everything I can get by using exclusively Y for indexing, leaving X untouched. This includes physics, level geometry access, object collisions, animations, and a lot more. The notable exception for me is metasprite processing - Y is used to read the metasprite data while X is needed to write the results to the OAM buffer, but this is not a problem for me since metasprite processing is either the last thing my objects do, or it's a separate step from the object logic altogether.

Most of the object processing does not need simultaneous access to two different tables/arrays, at least the way I coded mine.
Unfortunately, banking in RAM doesn't seem to be a thing.
It can be a thing of you're a hardware engineer creating your own mappers... cartridge RAM banking is technically possible, but not really done much, and specially not in such small chunks. IIRC, the MMC5 can access 32KB of RAM in 8KB chunks, but that wouldn't be of much use to you unless you had only 4 active objects and didn't mind swapping the remaining 7.9KB of RAM of each bank along with your object variables.
User avatar
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: Optimizing the usage of character-related data

Post by Dwedit »

DRW wrote: Wed Jun 29, 2022 3:32 pm I'm looking for a way to optimize the usage of my character-related data.

In my game, the properties of all characters on screen are stored in separate arrays.

When I process a specific character, I first copy all variables from the current array index to single variables, so that I don't have to use too much array access in the function:

And after the function, I copy the values back to the array.
Blaster Master did that too.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Optimizing the usage of character-related data

Post by Oziphantom »

to echo rainwarrior and tokumaru X index really shouldn't be a problem, and having a temp stash of X in ZP makes rapid reloads fast and cheap.

so at 10 enemies 14 clocks per item, each item saved is a line.

HorizontalDirection and VerticalDirection surly this is just L/0/R and U/0/D. So 2 bits each.

What is WalkingPixelsPerFramesIndex and how is it not a property of WalkingSpeedId?
WalkingCounter is the counter to next frame? next movement? to which this is >16 or >8 even? does the WalkingSpeedID need all 8bits? will it fit into 4 or 5 bits so you can put the counter up the top and then either check for overflow on the byte add or prescale the check values and compare for >= rather than ==?

IsBoss can this not be inferred from Type and/or SubType?

If one is dying does one still need a CooldownCounter? or can he CooldownCounter also be the DeathCounter?
StunnedOrThrownAwayCounter, you can't really walk during this period so if you still have the walkingConter could it not be used? or can you have a cooldown while being stunned or thrown?

can every enemy be thrown or stunnded at the same time? if not drop all of those variables to single index and then allocate a fixed number of the need variables for it. So when an enemy is stunned it uses one of the stunned slots. or one of the dead slots, of one of the thrown slots. Thus you have a single byte for the index and the index state ( thrown, stunned, dead ).
User avatar
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing the usage of character-related data

Post by DRW »

Well, since my game is written mostly in C, keeping X with a pre-defined value isn't really an option either way.
So, I guess all I can do is reduce the total number of variables. And maybe leave rarely-used variables out of the copy process. Or maybe leaving them out of the copy-back-to-array process.

How did commercial games, apart from "Blaster Master", handle this? How are the enemy status values in "Super Mario Bros.", "Castlevania", "Mega Man" etc. stored and used?

Oziphantom wrote: Fri Jul 01, 2022 11:41 pm What is WalkingPixelsPerFramesIndex and how is it not a property of WalkingSpeedId?
Since I don't use sub pixels, I have an array that tells you how many pixels per frames a character walks:
Regular speed is 1, 1, 2.
Fast is 2, 2, 2.
Slow is 1, 1, 1.
And very slow is 1, 0, 1.
WalkingSpeedId is the speed itself, i.e. regular, fast etc. to identify which array should be used to begin with.
WalkingPixelsPerFramesIndex is the index within the current array where the above numbers are stored.
Oziphantom wrote: Fri Jul 01, 2022 11:41 pm WalkingCounter is the counter to next frame? next movement?
It's the counter until the next animation phase for the meta sprite is loaded.
Oziphantom wrote: Fri Jul 01, 2022 11:41 pm to which this is >16 or >8 even? does the WalkingSpeedID need all 8bits? will it fit into 4 or 5 bits so you can put the counter up the top and then either check for overflow on the byte add or prescale the check values and compare for >= rather than ==?
There are surely values that can be put into less than eight bits. But even though I save time by copying them at the start of the function, wouldn't I lose this again when I cannot use the values directly anymore, but have to fiddle with bit manipulation?
Oziphantom wrote: Fri Jul 01, 2022 11:41 pm IsBoss can this not be inferred from Type and/or SubType?
Yeah, that's probably one variable that will go.
Oziphantom wrote: Fri Jul 01, 2022 11:41 pm If one is dying does one still need a CooldownCounter? or can he CooldownCounter also be the DeathCounter?
StunnedOrThrownAwayCounter, you can't really walk during this period so if you still have the walkingConter could it not be used? or can you have a cooldown while being stunned or thrown?
You're right. As I said in my previous post, thrown away and death variables are indeed variables that can be shortened.
Oziphantom wrote: Fri Jul 01, 2022 11:41 pm can every enemy be thrown or stunnded at the same time? if not drop all of those variables to single index and then allocate a fixed number of the need variables for it. So when an enemy is stunned it uses one of the stunned slots. or one of the dead slots, of one of the thrown slots. Thus you have a single byte for the index and the index state ( thrown, stunned, dead ).
I don't think that's possible. If all enemies happen to be in the same location, then one sword slash of course throws them all at the same time. And if they're slightly next to each other, then some are thrown earlier than others, whenever the sword hits them.
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Optimizing the usage of character-related data

Post by Oziphantom »

DRW wrote: Sat Jul 02, 2022 6:01 am Well, since my game is written mostly in C, keeping X with a pre-defined value isn't really an option either way.
So, I guess all I can do is reduce the total number of variables. And maybe leave rarely-used variables out of the copy process. Or maybe leaving them out of the copy-back-to-array process.

How did commercial games, apart from "Blaster Master", handle this? How are the enemy status values in "Super Mario Bros.", "Castlevania", "Mega Man" etc. stored and used?
They use ASM to which point the whole rest of the code is lot faster, they also have less bytes per ent, less ents and slowdown a lot. Konami were fond of this method and hence the term "Konami Code" coming to mean Bad.
Oziphantom wrote: Fri Jul 01, 2022 11:41 pm What is WalkingPixelsPerFramesIndex and how is it not a property of WalkingSpeedId?
Since I don't use sub pixels, I have an array that tells you how many pixels per frames a character walks:
Regular speed is 1, 1, 2.
Fast is 2, 2, 2.
Slow is 1, 1, 1.
And very slow is 1, 0, 1.
WalkingSpeedId is the speed itself, i.e. regular, fast etc. to identify which array should be used to begin with.
WalkingPixelsPerFramesIndex is the index within the current array where the above numbers are stored.
well not using powers of two, rookie move. However, so you have 4 modes and then 3 in each mode lets pack this like so
XXXXFFMM
where F is the frame number and M is the speed mode.
Now you make a table that is the first value for each, then the second value for each, like so.

Code: Select all

;r,f,s,v,r,f,s,v,r,f,s,v
 1,2,1,1,1,2,1,0,2,2,1,1
to get to the next frame index you add 4 not 1, you have gone to far when you >= 12 and then to reset the index you do &= 3. This makes the reset slightly more expensive, but stagger your enemies so they start at a different index, so the reset hits over multiple frames and you are ahead. As you've dropped from 2 variables down to 1 variable and the lookup into the table is now single index without needing any maths or branches to work out which table to look up.

possibly others can be combined this way?
User avatar
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing the usage of character-related data

Post by DRW »

Oziphantom wrote: Sat Jul 02, 2022 7:10 am They use ASM to which point the whole rest of the code is lot faster
Yes, I know that commercial games used much more optimized hand-written Assembly. Unfortunately, im my case this means: Either I program the game mostly in C or there won't be any game at all because writing the whole game in Assembly would be too much hassle to me.

I'd still like to know what different approaches commercial games used. Did they use array access and actually kept the X register? Or did they do other stuff?
Oziphantom wrote: Sat Jul 02, 2022 7:10 am well not using powers of two, rookie move.
That was a necessary design decision. In my old version of the game, the movement was fixed to the 8x8 tile grid. So, I had to use speeds where it's possible to start on a full tile and end on a full tile. Walking one pixel per frame was too slow for the hero. Two pixels was too fast. So, the speed was set to 1, 1, 2, 1, 1, 2.

But now that you mention it, my new version has pixel-based movement, so I might use eight values per speed and then set the regular speed as something that's similar to the old speed. This way the array can consist of 32 entries and the reset can be done easier, with just one variable and even without bit manipulation.
Thanks for bringing this to my attention.
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
Fiskbit
Posts: 891
Joined: Sat Nov 18, 2017 9:15 pm

Re: Optimizing the usage of character-related data

Post by Fiskbit »

DRW wrote: Sat Jul 02, 2022 6:01 am How did commercial games, apart from "Blaster Master", handle this? How are the enemy status values in "Super Mario Bros.", "Castlevania", "Mega Man" etc. stored and used?
Using X as a dedicated object index as others have described is very common across a lot of games. Zelda and Mega Man both do this. When X is needed for something else, games clobber it and then reload the index from a variable after.
Post Reply