Optimizing important code parts

DRW
Posts: 2273
Joined: Sat Sep 07, 2013 2:59 pm

Optimizing important code parts

Post by DRW »

Before I go on with my general game logic, I wanted to make sure that important basic functions are optimized well enough.

I started with the sprite rendering function. (Next thing will be the PPU update, then reading the current level data.)

So, could you please have a look at the following sprite rendering function and tell me if there's something that I could improve to make it faster?

Some information:

All meta sprites are stored in the same array. The function takes the array index and stores the resulting start address in a pointer.

The meta sprites are declared like this:
Width, height, Y offset.
Tile, palette, tile, palette, tile, palette...
All meta sprites are drawn as rectangles, so I don't have to read an X and Y offset for each tile. This makes the function much faster.
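
A minimal sketch of this layout in C, meant to be compiled on a PC just to check the indexing. The names `META_HEADER_SIZE`, `pair_index`, `meta_tile` and `meta_palette` as well as the tile numbers are made up for illustration; the column-by-column, bottom-to-top order is taken from the rendering loop further down in this post:

```c
#include <assert.h>

/* Hypothetical names; the layout follows the description above:
   width, height, Y offset, then one (tile, palette) pair per tile. */
enum { META_HEADER_SIZE = 3, META_BYTES_PER_TILE = 2 };

/* Example: 2 tiles wide, 2 tiles high, Y offset 0. Tile numbers are made up. */
static const unsigned char ExampleMetaSprite[] = {
    2, 2, 0,
    0x10, 0x01,  0x00, 0x00,   /* left column: bottom tile, then top tile */
    0x11, 0x01,  0x01, 0x00    /* right column: bottom tile, then top tile */
};

/* Index of the pair for the tile in `column` (0 = leftmost)
   and `row` (0 = bottom). */
static unsigned pair_index(const unsigned char *meta, unsigned column, unsigned row)
{
    unsigned width = meta[0];
    unsigned height = meta[1];
    assert(column < width && row < height);
    return META_HEADER_SIZE + (column * height + row) * META_BYTES_PER_TILE;
}

static unsigned char meta_tile(const unsigned char *meta, unsigned column, unsigned row)
{
    return meta[pair_index(meta, column, row)];
}

static unsigned char meta_palette(const unsigned char *meta, unsigned column, unsigned row)
{
    return meta[pair_index(meta, column, row) + 1];
}
```

Because the position of every tile follows from its column and row, no per-tile X or Y offset has to be stored.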

As global variables, we have:
Absolute X: The value in the center of the meta sprite (not the leftmost position).
Absolute Y: The value at the bottom of the meta sprite (not the top position).
The meta sprites array index that the data is read from.
The next PPU sprites index that the data is written to.
The mirror attribute that tells whether the meta sprite should be flipped.

Each X and Y position is two bytes, so that characters who leave the screen on one side do not reappear on the other side. (X and Y offset values that are relative to the actual sprite's position are of course just one byte.)
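
To illustrate why two bytes are needed, here is a small C model of the position arithmetic. It is only a sketch under the assumptions stated in the comments; `on_screen`, `tile_left` and `tile_top` are hypothetical helper names, not functions from the game:

```c
#include <assert.h>
#include <stdint.h>

/* A hardware sprite only has 8-bit coordinates, so a tile can be placed
   only when the high byte of its 16-bit position is zero. */
static int on_screen(int16_t x, int16_t y)
{
    return ((uint16_t)x >> 8) == 0 && ((uint16_t)y >> 8) == 0;
}

/* Left edge of the tile in `column` (0 = leftmost) for a meta sprite that is
   `width_in_tiles` wide and centered on `center_x`. */
static int16_t tile_left(int16_t center_x, unsigned width_in_tiles, unsigned column)
{
    assert(column < width_in_tiles);
    return (int16_t)(center_x - (int16_t)(width_in_tiles * 4) + (int16_t)(column * 8));
}

/* Top edge of the tile in `row` (0 = bottom row) for a meta sprite whose
   bottom edge sits at `bottom_y`. */
static int16_t tile_top(int16_t bottom_y, unsigned row)
{
    return (int16_t)(bottom_y - 8 - (int16_t)(row * 8));
}
```

With a center X of 4 and a width of two tiles, the left column ends up at -4. As an 8-bit value that would be 252, i.e. the tile would show up at the right edge of the screen. The 16-bit value has a non-zero high byte instead, so the tile can be recognized as offscreen.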

Three remarks:

1. Characters in the game can be assigned to more than one palette. (For example, my main character has: Palette 1 = skin color, hair color, t-shirt color. Palette 2 = skin color, pants color, shoe color.) So, I cannot save the palette for the whole meta sprite. I have to use one value per tile.

2. Yes, I know: I write all sprite values first and only then check whether the sprite should actually be rendered, instead of skipping the code as soon as one coordinate is outside the screen.
I did this for the following reason:
If fewer than the maximum number of characters are on screen, the unused characters have an IsActive variable set to false in the game logic code. I.e. if there are only two characters on the screen while the game can handle five at once, UpdateSprites will be called only two times anyway.
This means that said optimization would only help in the rare cases where a character is partly on screen and partly offscreen.
But if a character is rendered, the game engine has to be able to handle him fully on screen anyway, so there's no need to add more comparisons just to save some cycles in the one second where he's partly outside the screen.
A character who is on screen is fully visible 99 % of the time, so additional branches for the 1 % case where only parts of him are visible would actually make the code slower, since most of the time nothing can be skipped anyway.

3. Checking for mirroring whenever a new X value is set is actually faster than saving a mirror bit mask and a subtraction value in the beginning and then using that for calculation. At least when the characters are only two tiles wide, which is the case with almost all of my characters.
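
Remarks 2 and 3 can be modeled in a few lines of C. This is only a sketch for checking the logic on a PC, not the actual routine; `Sprites` and `SpritesIndex` are stand-ins for the real sprite buffer and index, and `mirrored_relative_x` and `put_tile` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t Sprites[256];   /* stand-in for the sprite buffer */
static uint8_t SpritesIndex;   /* next free slot, always a multiple of 4 */

/* Remark 3: X offset of a column after mirroring. Inverting all bits
   gives -x - 1; subtracting 7 more gives -x - 8. */
static int16_t mirrored_relative_x(int16_t relative_x)
{
    return (int16_t)(~relative_x - 7);
}

/* Remark 2: write the four bytes unconditionally, then claim the slot only
   if both high bytes are zero. An unclaimed slot is simply overwritten
   by the next call. */
static void put_tile(int16_t x, int16_t y, uint8_t tile, uint8_t attributes)
{
    assert(SpritesIndex % 4 == 0);
    Sprites[SpritesIndex + 0] = (uint8_t)y;
    Sprites[SpritesIndex + 1] = tile;
    Sprites[SpritesIndex + 2] = attributes;
    Sprites[SpritesIndex + 3] = (uint8_t)x;
    if (((uint16_t)x >> 8) == 0 && ((uint16_t)y >> 8) == 0)
        SpritesIndex = (uint8_t)(SpritesIndex + 4);
}
```

For a character that is two tiles wide, the relative X values are -8 and 0. Mirroring turns them into 0 and -8, so the two columns simply swap places.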


Alright, that's my code:

Code:

	.importzp _ConstPointer

	.import _MetaSprites

.segment "ZEROPAGE"

	_UpdateSpritesSpritesIndex: .res 1
	.export _UpdateSpritesSpritesIndex
	_UpdateSpritesMetaSpritesIndex: .res 2
	.export _UpdateSpritesMetaSpritesIndex
	_UpdateSpritesX: .res 2
	.export _UpdateSpritesX
	_UpdateSpritesY: .res 2
	.export _UpdateSpritesY
	_UpdateSpritesMirrorAttributes: .res 1
	.export _UpdateSpritesMirrorAttributes

	XCounter: .res 1
	YCounter: .res 1

	HalfWidth: .res 1
	HeightInTiles: .res 1

	AbsoluteX: .res 2
	AbsoluteY: .res 2

	RelativeX: .res 2
	PossiblyMirroredRelativeX: .res 2

.segment "CODE"

_UpdateSprites_:
.export _UpdateSprites_

	; The start position from the meta sprites array
	; for the current sprites is set to the const pointer.
	CLC
	LDA #<(_MetaSprites)
	ADC _UpdateSpritesMetaSpritesIndex
	STA _ConstPointer
	LDA #>(_MetaSprites)
	ADC _UpdateSpritesMetaSpritesIndex + 1
	STA _ConstPointer + 1

	; The index offset of the meta sprites array,
	; starting at the position of the const pointer.
	LDY #$00

	; The size of the current meta sprites,
	; counted in tiles, not in pixels.
	; That's the counter value for the X loop,
	; i.e. the outer loop.
	LDA (_ConstPointer), Y
	INY
	STA XCounter

	; XCounter * 4 = Half of the width of the meta sprite.
	ASL
	ASL
	STA HalfWidth

	; The absolute X position is in the center of the meta sprite.
	; The relative X position gets moved from the center to the left,
	; so that this value points to the leftmost position of the meta sprite.
	SEC
	LDA #$00
	SBC HalfWidth
	STA RelativeX
	LDA #$00
	SBC #$00
	STA RelativeX + 1

	; Height, counted in tiles.
	LDA (_ConstPointer), Y
	INY
	STA HeightInTiles

	; The absolute Y position is at the bottom of the meta sprite.
	; So, it is moved eight pixels to the top,
	; so that the tiles' bottoms are actually at the desired position.
	SEC
	LDA _UpdateSpritesY
	SBC #$08
	STA _UpdateSpritesY
	LDA _UpdateSpritesY + 1
	SBC #$00
	STA _UpdateSpritesY + 1

	; Some characters cannot be drawn with their feet in the bottom position.
	; For these meta sprites, the offset value is added to the Y position,
	; so that they're still in the correct position.
	CLC
	LDA _UpdateSpritesY
	ADC (_ConstPointer), Y
	INY
	STA _UpdateSpritesY
	LDA _UpdateSpritesY + 1
	ADC #$00
	STA _UpdateSpritesY + 1

	; The index of the PPU sprites that are written next.
	LDX _UpdateSpritesSpritesIndex

	; The outer loop: All columns are drawn from left to right.
@loopX:

	; The height in tiles becomes the loop counter.
	LDA HeightInTiles
	STA YCounter

	; The absolute Y value is set to its starting position.
	LDA _UpdateSpritesY
	STA AbsoluteY
	LDA _UpdateSpritesY + 1
	STA AbsoluteY + 1

	; If the meta sprite shall be mirrored,
	; we have to manipulate the X position.
	LDA _UpdateSpritesMirrorAttributes
	BEQ @noMirroring

	; The relative X position gets bitwise inverted (which gives -x - 1)
	; and then reduced by 7, so the result is -x - 8.
	; This way, it has the correct value to render the tile
	; on the opposite side of the meta sprite's center.
	; The new value is stored in a separate variable.
	SEC
	LDA RelativeX
	EOR #%11111111
	SBC #$07
	STA PossiblyMirroredRelativeX
	LDA RelativeX + 1
	EOR #%11111111
	SBC #$00
	STA PossiblyMirroredRelativeX + 1

	JMP @endMirroring

@noMirroring:

	; If no mirroring is done,
	; the value is simply copied into the new variable.
	LDA RelativeX
	STA PossiblyMirroredRelativeX
	LDA RelativeX + 1
	STA PossiblyMirroredRelativeX + 1

@endMirroring:

	; We take the original absolute centered X position
	; and add the relative X position to it.
	; This way we get the actual value
	; that needs to be used for the rendering.
	CLC
	LDA _UpdateSpritesX
	ADC PossiblyMirroredRelativeX
	STA AbsoluteX
	LDA _UpdateSpritesX + 1
	ADC PossiblyMirroredRelativeX + 1
	STA AbsoluteX + 1

	; The inner loop: Every tile in this column is rendered from bottom to top.
@loopY:

	; The low byte of the Y position is written to the sprites array.
	LDA AbsoluteY
	STA _Sprites + 0, X

	; The tile is read from the meta sprites array
	; and set to the sprites array.
	LDA (_ConstPointer), Y
	INY
	STA _Sprites + 1, X

	; The attributes are read from the meta sprites array.
	; They are OR-connected with the mirror attributes
	; and then written to the sprites array.
	LDA (_ConstPointer), Y
	INY
	ORA _UpdateSpritesMirrorAttributes
	STA _Sprites + 2, X

	; The low byte of the X position is written to the sprites array.
	LDA AbsoluteX
	STA _Sprites + 3, X

	; If the high byte of X or Y is not 0,
	; this means this specific sprite is outside the screen.
	; In this case, the rendering is skipped.
	; It doesn't matter that the values in the sprites array are already written.
	; As long as _UpdateSpritesSpritesIndex isn't incremented,
	; the _ClearSprites function will make sure
	; that all unused sprites are put outside the screen in the end.
	LDA AbsoluteX + 1
	BNE @endRendering
	LDA AbsoluteY + 1
	BNE @endRendering

	; If everything is alright, then _UpdateSpritesSpritesIndex and the X register
	; get incremented with the value 4.
	; This value corresponds to the four bytes that we have written to the sprites array.
	; The PPU will render the current sprite on the screen.
	INX
	INX
	INX
	INX
	STX _UpdateSpritesSpritesIndex

@endRendering:

	; If the Y counter is 0,
	; the inner loop isn't repeated anymore
	; and all of the loop preparation is skipped.
	DEC YCounter
	BEQ @noLoopY

	; For the next loop,
	; the Y position is decremented with 8,
	; i.e. one tile height.
	SEC
	LDA AbsoluteY
	SBC #$08
	STA AbsoluteY
	LDA AbsoluteY + 1
	SBC #$00
	STA AbsoluteY + 1

	; The inner loop is repeated.
	JMP @loopY

@noLoopY:

	; If the X counter is 0, the function ends.
	; Otherwise, the outer loop is repeated.
	DEC XCounter
	BEQ @noLoopX

	; For the next loop,
	; the X position is incremented with 8,
	; i.e. one tile width.
	CLC
	LDA RelativeX
	ADC #$08
	STA RelativeX
	LDA RelativeX + 1
	ADC #$00
	STA RelativeX + 1

	; The outer loop is repeated.
	JMP @loopX

@noLoopX:

	RTS
Last edited by DRW on Sat Nov 07, 2015 2:23 pm, edited 1 time in total.
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
furrykef
Posts: 35
Joined: Fri Mar 02, 2012 11:10 pm

Re: Optimizing important code parts

Post by furrykef »

This won't help make anything faster, but if you're using ca65, I highly recommend using ".macpack generic" and the macros within. The ADD and SUB macros are less error-prone than writing CLC (or SEC) with ADC and SBC explicitly, and I much prefer writing BLT and BGE to writing BCC and BCS after comparisons.
DRW
Posts: 2273
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing important code parts

Post by DRW »

My goal was actually not to use any library functions, but to write everything myself.

Except in a few special cases:

I use the randomizer provided by CC65.
Although I took the source code and changed some minor things, like no longer setting the seed to 1 in the beginning. Firstly, that initial value would require a DATA segment, which I don't have because I don't need it. And secondly, my game doesn't call the rand function without having called srand first anyway.

And I use Shiru's FamiTone library completely unaltered because even after working through the Nerdy Nights NES music tutorial, I would still be unable to create a decent sound driver.

But all those little code snippets with small functions or macros: I don't use them because these things I want to write myself.
It might be good to write my own little macros ADDITION16BIT and SUBTRACTION16BIT though.
furrykef
Posts: 35
Joined: Fri Mar 02, 2012 11:10 pm

Re: Optimizing important code parts

Post by furrykef »

Well then you can write your own "ADD" and "SUB" macros. Although I don't see the point in doing that instead of using the ones already written for you.
DRW
Posts: 2273
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing important code parts

Post by DRW »

As I said: Because I want to do everything myself.

I have no problem in using external stuff when it comes to more complicated things like a randomizer, a sound library or a compiler that transforms C code into Assembly. I.e. longer stuff where I don't understand the inner workings even if I read the code.

And of course, I have no problem in asking how certain things are done and when somebody explains it to me, I implement it into my code.

But I don't want to clutter my game with these little mundane library calls.

When I have to write 99.9 % of the game myself anyway (unlike a Windows application where the standard library is of actual help and where there are hundreds of external function calls in your own code), why should I use an external library for 0.1 % of the game? These little code details can be self-made as well then. No need to add another external dependency for something as simple as an addition or a subtraction.
Movax12
Posts: 541
Joined: Sun Jan 02, 2011 11:50 am

Re: Optimizing important code parts

Post by Movax12 »

Macros aren't library calls. Personally I don't like ADD and SUB macros because I don't want to get in the habit of using them and missing opportunities to optimize where the carry flag is already in the correct state. But macros can be super helpful.
DRW
Posts: 2273
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing important code parts

Post by DRW »

Movax12 wrote:Macros aren't library calls.
Still, it doesn't invalidate what I wrote about it.
tokumaru
Posts: 12536
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Optimizing important code parts

Post by tokumaru »

Movax12 wrote:Personally I don't like ADD and SUB macros because I don't want to get in the habit of using them and missing opportunities to optimize where the carry flag is already in the correct state. But macros can be super helpful.
I agree. I hardly ever need typical additions or subtractions, I usually find myself doing multiple things at once and using the carry to my advantage.
furrykef
Posts: 35
Joined: Fri Mar 02, 2012 11:10 pm

Re: Optimizing important code parts

Post by furrykef »

Movax12 wrote:Personally I don't like ADD and SUB macros because I don't want to get in the habit of using them and missing opportunities to optimize where the carry flag is already in the correct state.
Y'know what they say: computer cycles are cheap, people cycles are expensive. Just one bug caused by forgetting to CLC or SEC would negate whatever advantage you get out of that.

If you're doing it in a tight loop where it can actually make a performance difference, you won't forget that ADC/SBC is faster than ADD/SUB.
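
For readers who don't have the carry rules in their head, here is a tiny C model of what is being discussed. It is only a sketch of binary-mode ADC (no decimal mode, no other flags); `carry`, `adc` and `add16` are hypothetical names for this illustration:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t carry;  /* models the 6502 carry flag (0 or 1) */

/* Models ADC in binary mode: A + operand + carry, with carry-out. */
static uint8_t adc(uint8_t a, uint8_t operand)
{
    unsigned sum;
    assert(carry <= 1);
    sum = (unsigned)a + operand + carry;
    carry = (uint8_t)(sum >> 8);
    return (uint8_t)sum;
}

/* 16-bit addition the way the routines in this thread do it:
   CLC once, then ADC on the low bytes and ADC on the high bytes. */
static uint16_t add16(uint16_t x, uint16_t y)
{
    uint8_t lo, hi;
    carry = 0;                                        /* CLC */
    lo = adc((uint8_t)x, (uint8_t)y);                 /* low bytes */
    hi = adc((uint8_t)(x >> 8), (uint8_t)(y >> 8));   /* high bytes plus carry */
    return (uint16_t)((hi << 8) | lo);
}
```

If the carry happens to be set and the CLC is forgotten, `adc(5, 3)` yields 9 instead of 8. That is the kind of bug an ADD macro prevents. On the other hand, the second ADC in `add16` deliberately has no CLC in front of it, which is the kind of reuse Movax12 means.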
tokumaru
Posts: 12536
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Optimizing important code parts

Post by tokumaru »

I have a 6502 emulator in my head, I'm always minding the carry flag. :twisted:

I still write the CLC and SEC instructions in my programs when they're not necessary, but I comment them out. This makes it easier to see that it's an optimization, and not a mistake.
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Re: Optimizing important code parts

Post by GradualGames »

The only thing I can think of is that, at least for large sprites it could save some time to pre-clip the sprite (roughly 20% faster in the cases I tested, large metasprites with 18 or more sprites). I.e. clip the sprite's rectangle and figure out how to iterate over the metasprite calculating some increments to traverse the sub-rectangle that is actually visible. That way you won't be doing redundant clipping tests in the inner loop, which is where the lion's share of time goes. This is the improvement I was referring to over in "Efficiency of development process using C versus 6502". As you mention you're only doing small sprites, it might not be useful in your game. *edit* once I clean it up I'll share my routine here.

*edit* This is my current WIP. I actually realized I need to re-work this to perform mirroring the way I've been doing it, which has been to bake it into the metasprite data itself. But, it at least demonstrates the pre-clipping idea I mentioned. Though, taking thefox's comment into consideration (later in this thread), this might be overkill. Still, I was pleased with how little has to be in the innermost loop of the routine, which was an improvement over my previous efforts.

Code:

;****************************************************************
;This routine pre-clips and draws a metasprite using 16 bit
;coordinates.
;C prototype:
;void __fastcall__ sprite_draw_metasprite(int x, int y, unsigned char chr_handle, const unsigned char *metasprite);
;****************************************************************
.proc _sprite_draw_metasprite
    left = w0
    top = w1
    right = w2
    bottom = w3
    metasprite = w4
    start_row = b0
    end_row = b1
    start_column = b2
    end_column = b3
    row = b4
    column = b5
    chr_handle = b6
    width = b7
    height = b8
    width_in_columns = b9
    height_in_rows = b10
    bytes_to_skip_per_row = b11
    metasprite_offset = b12
    metasprite_offset_term1 = b13

    ;const metasprite_entry *metasprite_entries = (const metasprite_entry*) (metasprite + 2);
    sta metasprite
    stx metasprite+1

    jsr popa
    sta chr_handle

    ;int top = y - 1;
    jsr popax
    sta top
    stx top+1

    dec16 top

    ;int left = x;
    jsr popax
    sta left
    stx left+1

    ;int right = left + (metasprite[0] * 8) - 1;

    ;(metasprite[0] * 8)
    ldy #0
    lda (metasprite),y
    asl
    asl
    asl
    sta width

    ;left + (metasprite[0] * 8)
    clc
    lda left
    adc width
    sta right
    lda left+1
    adc #0
    sta right+1

    ; - 1;
    dec16 right

    ;int bottom = top + (metasprite[1] * 8) - 1;

    ;(metasprite[1] * 8)
    iny
    lda (metasprite),y
    asl
    asl
    asl
    sta height

    ;top + (metasprite[1] * 8)
    clc
    lda top
    adc height
    sta bottom
    lda top+1
    adc #0
    sta bottom+1

    ; - 1;
    dec16 bottom

    ;if (left > 255) return;
    cmp16 #255, left
    blt :+
    jmp :++
:   rts
:

    ;if (top > 238) return;
    cmp16 #238, top
    blt :+
    jmp :++
:   rts
:

    ;if (right < 7) return;
    cmp16 right, #7
    blt :+
    jmp :++
:   rts
:

    ;if (bottom < 7) return;
    cmp16 bottom, #7
    blt :+
    jmp :++
:   rts
:

    ;if (left < 0) {
    .scope
    cmp16 left, #0
    blt clip_left
no_clip_left:
    ;no clip on left side
    ;start_column = 0;
    lda #0
    sta start_column
    jmp done
clip_left:
    ;clip on left side
    ;start_column = (-(left + 1) >> 3) + 1;
    lda left
    sta start_column
    inc start_column
    clc
    lda start_column
    eor #$ff
    adc #$01
    lsr
    lsr
    lsr
    sta start_column
    inc start_column
done:
    .endscope

    ;if (right > 255) {
    .scope
    cmp16 #255, right
    blt clip_right
no_clip_right:
    ;no clip on right side
    ;end_column = metasprite[0] - 1;
    ldy #0
    lda (metasprite),y
    sta end_column
    dec end_column
    jmp done
clip_right:
    ;end_column = (255 - left) >> 3;
    sec
    lda #255
    sbc left
    lsr
    lsr
    lsr
    sta end_column
done:
    .endscope

    ;if (top < 0) {
    .scope
    cmp16 top, #0
    blt clip_top
no_clip_top:
    ;no clip on top
    ;start_row = 0;
    lda #0
    sta start_row
    jmp done
clip_top:
    ;clip on top
    ;start_row = (-(top + 1) >> 3) + 1;
    lda top
    sta start_row
    inc start_row
    clc
    lda start_row
    eor #$ff
    adc #$01
    lsr
    lsr
    lsr
    sta start_row
    inc start_row
done:
    .endscope

    ;if (bottom > 239) {
    .scope
    cmp16 #239, bottom
    blt clip_bottom
no_clip_bottom:
    ;no clip on bottom
    ;end_row = metasprite[1] - 1;
    ldy #1
    lda (metasprite),y
    sta end_row
    dec end_row
    jmp done
clip_bottom:
    ;clip on bottom
    ;end_row = (239 - top) >> 3;
    sec
    lda #239
    sbc top
    lsr
    lsr
    lsr
    sta end_row
done:
    .endscope

    lda start_row
    sta row
    lda start_column
    sta column

    ;metasprite_offset = (start_row * metasprite[0] + start_column) * 5;

    ;(start_row * metasprite[0]
    ldy #0
    lda (metasprite),y
    tax
    lda #0
:   clc
    adc start_row
    dex
    bne :-

    ; + start_column)
    clc
    adc start_column
    sta metasprite_offset_term1

    ; * 5;
    asl
    asl
    clc
    adc metasprite_offset_term1
    sta metasprite_offset

    inc metasprite_offset

    ;number_of_bytes_to_skip_per_row = ((metasprite[0] - (end_column - start_column + 1)) * 5);

    ;(end_column - start_column + 1)
    sec
    lda end_column
    sbc start_column
    sta bytes_to_skip_per_row
    inc bytes_to_skip_per_row

    ;((metasprite[0] - (end_column - start_column + 1))
    sec
    ldy #0
    lda (metasprite),y
    sbc bytes_to_skip_per_row
    sta bytes_to_skip_per_row

    ;((metasprite[0] - (end_column - start_column + 1)) * 5)
    lda bytes_to_skip_per_row
    asl
    asl
    clc
    adc bytes_to_skip_per_row
    sta bytes_to_skip_per_row

    sec
    lda end_row
    sbc start_row
    sta height_in_rows
    inc height_in_rows

    sec
    lda end_column
    sbc start_column
    sta width_in_columns
    inc width_in_columns

    lda height_in_rows
    sta row
next_row:

    lda width_in_columns
    sta column

    ldy metasprite_offset
next_column:

    ldx _next_sprite_address

    ;get y
    iny
    clc
    lda (metasprite),y
    adc top
    sta _sprite_ram+sprite_struct::ycoord,x
    ;get tile
    iny
    lda (metasprite),y
    sta _sprite_ram+sprite_struct::tile,x
    ;get attribute
    iny
    lda (metasprite),y
    sta _sprite_ram+sprite_struct::attribute,x
    ;get x
    iny
    clc
    lda (metasprite),y
    adc left
    sta _sprite_ram+sprite_struct::xcoord,x
    ;skip flipped x
    iny

    clc
    lda _next_sprite_address
    adc #4
    sta _next_sprite_address

    dec column
    bne next_column

    clc
    tya
    adc bytes_to_skip_per_row
    sta metasprite_offset

    dec row
    bne next_row

    rts

.endproc
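
The column part of the pre-clipping idea can be sketched in C as follows. This is only a model of the formulas in the comments of the routine above, written for a PC compiler; `column_range` and `visible_columns` are hypothetical names, and the preconditions mirror the early-out tests (`left > 255` and `right < 7`) that the routine performs first:

```c
#include <assert.h>

/* Hypothetical result type: first and last column that need drawing. */
struct column_range { int start; int end; };

/* `left` is the 16-bit X position of the metasprite's left edge,
   `width_in_tiles` is metasprite[0]. */
static struct column_range visible_columns(int left, int width_in_tiles)
{
    struct column_range r;
    int right = left + width_in_tiles * 8 - 1;

    assert(width_in_tiles > 0);
    /* The caller has already returned for left > 255 and right < 7,
       so at least one column starts at an X position of 0..255. */
    assert(left <= 255 && right >= 7);

    /* Skip every column whose left edge is negative. */
    r.start = (left < 0) ? ((-(left + 1)) >> 3) + 1 : 0;

    /* Stop at the last column whose left edge is still <= 255. */
    r.end = (right > 255) ? (255 - left) >> 3 : width_in_tiles - 1;

    return r;
}
```

The rows work the same way with 239 in place of 255. The inner loop then only has to walk from `start` to `end` without testing each sprite.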

Last edited by GradualGames on Sat Jan 09, 2016 11:59 am, edited 4 times in total.
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: the universe

Re: Optimizing important code parts

Post by thefox »

GradualGames wrote:The only thing I can think of is that, at least for large sprites it could save some time to pre-clip the sprite (roughly 20% faster in the cases I tested, large metasprites with 18 or more sprites). I.e. clip the sprite's rectangle and figure out how to iterate over the metasprite calculating some increments to traverse the sub-rectangle that is actually visible. That way you won't be doing redundant clipping tests in the inner loop, which is where the lion's share of time goes. This is the improvement I was referring to over in "Efficiency of development process using C versus 6502". As you mention you're only doing small sprites, it might not be useful in your game.
That's a good point. In fact, it might be a decent enough optimization to simply have two rendering routines: one that clips per sprite, and another one that doesn't clip at all. The routine would be selected depending on whether the screen edges intersect the bounding box of the metasprite (the whole call could be skipped if the bounding box is outside the screen boundaries). You would think that most of the time the metasprites would be entirely visible.

Might also be worth having some extra logic for small and large metasprites (for really small ones the overhead of the extra checks might not be worth it).
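
A rough sketch of that selection in C, assuming a 256x240 screen and a bounding box given with inclusive edges in 16-bit coordinates. `draw_mode` and `choose_draw_mode` are hypothetical names; the two actual rendering routines are not shown:

```c
#include <assert.h>

enum draw_mode { DRAW_NOTHING, DRAW_UNCLIPPED, DRAW_CLIPPED };

/* Decide once per metasprite which routine to call, based on its
   bounding box in screen coordinates (all four edges inclusive). */
static enum draw_mode choose_draw_mode(int left, int top, int right, int bottom)
{
    assert(left <= right && top <= bottom);

    /* Entirely off screen: the whole call can be skipped. */
    if (right < 0 || left > 255 || bottom < 0 || top > 239)
        return DRAW_NOTHING;

    /* Entirely visible: no per-sprite tests needed. */
    if (left >= 0 && right <= 255 && top >= 0 && bottom <= 239)
        return DRAW_UNCLIPPED;

    /* A screen edge intersects the box: test every sprite. */
    return DRAW_CLIPPED;
}
```

Whether the extra comparisons pay off depends on the metasprite size, which is why really small ones might be better off going straight to the clipping routine.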
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
DRW
Posts: 2273
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing important code parts

Post by DRW »

GradualGames wrote:The only thing I can think of is that, at least for large sprites it could save some time to pre-clip the sprite
The code above is not the most recent one. I actually did some more improvements, using the rule that all meta sprites have to have a rectangular shape which saves you reading the X and Y coordinate for each sprite.

I can post my most recent function in a few hours.
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Re: Optimizing important code parts

Post by GradualGames »

DRW wrote:
GradualGames wrote:The only thing I can think of is that, at least for large sprites it could save some time to pre-clip the sprite
The code above is not the most recent one. I actually did some more improvements, using the rule that all meta sprites have to have a rectangular shape which saves you reading the X and Y coordinate for each sprite.

I can post my most recent function in a few hours.
That's a neat idea.
DRW
Posts: 2273
Joined: Sat Sep 07, 2013 2:59 pm

Re: Optimizing important code parts

Post by DRW »

O.k., this is my current sprite rendering function. If you have any questions, just ask.

Code:

	.importzp _ConstPointer

	.import _CharactersSprites

.segment "ZEROPAGE"

	_UpdateSpritesSpritesIndex: .res 1
	.export _UpdateSpritesSpritesIndex
	_UpdateSpritesCharactersSpritesIndex: .res 2
	.export _UpdateSpritesCharactersSpritesIndex
	_UpdateSpritesX: .res 2
	.export _UpdateSpritesX
	_UpdateSpritesY: .res 2
	.export _UpdateSpritesY
	_UpdateSpritesMirrorAttributes: .res 1
	.export _UpdateSpritesMirrorAttributes

	XCounter: .res 1
	YCounter: .res 1

	HalfWidth: .res 1
	HeightInTiles: .res 1

	AbsoluteX: .res 2
	AbsoluteY: .res 2

	RelativeX: .res 2
	PossiblyMirroredRelativeX: .res 2
	
	Palette: .res 1

.segment "CODE"

_UpdateSprites_:
.export _UpdateSprites_

	; The start position from the meta sprites array
	; for the current sprites is set to the const pointer.
	CLC
	LDA #<(_CharactersSprites)
	ADC _UpdateSpritesCharactersSpritesIndex
	STA _ConstPointer
	LDA #>(_CharactersSprites)
	ADC _UpdateSpritesCharactersSpritesIndex + 1
	STA _ConstPointer + 1

	; The index offset of the meta sprites array,
	; starting at the position of the const pointer.
	LDY #$00

	; The size of the current meta sprites,
	; counted in tiles, not in pixels.
	; That's the counter value for the X loop,
	; i.e. the outer loop.
	LDA (_ConstPointer), Y
	INY
	STA XCounter

	; XCounter * 4 = Half of the width of the meta sprite.
	ASL
	ASL
	STA HalfWidth

	; The absolute X position is in the center of the meta sprite.
	; The relative X position gets moved from the center to the left,
	; so that this value points to the leftmost position of the meta sprite.
	SEC
	LDA #$00
	SBC HalfWidth
	STA RelativeX
	LDA #$00
	SBC #$00
	STA RelativeX + 1

	; Height, counted in tiles.
	LDA (_ConstPointer), Y
	INY
	STA HeightInTiles

	; The absolute Y position is at the bottom of the meta sprite.
	; So, it is moved eight pixels to the top,
	; so that the tiles' bottoms are actually at the desired position.
	SEC
	LDA _UpdateSpritesY
	SBC #$08
	STA _UpdateSpritesY
	LDA _UpdateSpritesY + 1
	SBC #$00
	STA _UpdateSpritesY + 1

	; Some characters cannot be drawn with their feet in the bottom position.
	; For these meta sprites, the offset value is added to the Y position,
	; so that they're still in the correct position.
	CLC
	LDA _UpdateSpritesY
	ADC (_ConstPointer), Y
	INY
	STA _UpdateSpritesY
	LDA _UpdateSpritesY + 1
	ADC (_ConstPointer), Y
	INY
	STA _UpdateSpritesY + 1
	
	; The palette that is shared by all tiles of this meta sprite.
	LDA (_ConstPointer), Y
	INY
	STA Palette
	
	; The index of the PPU sprites that are written next.
	LDX _UpdateSpritesSpritesIndex

	; The outer loop: All columns are drawn from left to right.
@loopX:

	; The height in tiles becomes the loop counter.
	LDA HeightInTiles
	STA YCounter

	; The absolute Y value is set to its starting position.
	LDA _UpdateSpritesY
	STA AbsoluteY
	LDA _UpdateSpritesY + 1
	STA AbsoluteY + 1

	; If the meta sprite shall be mirrored,
	; we have to manipulate the X position.
	LDA _UpdateSpritesMirrorAttributes
	BEQ @noMirroring

	; The relative X position gets bitwise inverted (which gives -x - 1)
	; and then reduced by 7, so the result is -x - 8.
	; This way, it has the correct value to render the tile
	; on the opposite side of the meta sprite's center.
	; The new value is stored in a separate variable.
	SEC
	LDA RelativeX
	EOR #%11111111
	SBC #$07
	STA PossiblyMirroredRelativeX
	LDA RelativeX + 1
	EOR #%11111111
	SBC #$00
	STA PossiblyMirroredRelativeX + 1

	JMP @endMirroring

@noMirroring:

	; If no mirroring is done,
	; the value is simply copied into the new variable.
	LDA RelativeX
	STA PossiblyMirroredRelativeX
	LDA RelativeX + 1
	STA PossiblyMirroredRelativeX + 1

@endMirroring:

	; We take the original absolute centered X position
	; and add the relative X position to it.
	; This way we get the actual value
	; that needs to be used for the rendering.
	CLC
	LDA _UpdateSpritesX
	ADC PossiblyMirroredRelativeX
	STA AbsoluteX
	LDA _UpdateSpritesX + 1
	ADC PossiblyMirroredRelativeX + 1
	STA AbsoluteX + 1

	; The inner loop: Every tile in this column is rendered from bottom to top.
@loopY:

	; The low byte of the Y position is written to the sprites array.
	LDA AbsoluteY
	STA _Sprites + 0, X

	; The tile is read from the meta sprites array
	; and set to the sprites array.
	LDA (_ConstPointer), Y
	INY
	STA _Sprites + 1, X

	; The palette of the meta sprite is OR-combined
	; with the mirror attributes
	; and then written to the sprites array.
	LDA Palette
	ORA _UpdateSpritesMirrorAttributes
	STA _Sprites + 2, X

	; The low byte of the X position is written to the sprites array.
	LDA AbsoluteX
	STA _Sprites + 3, X

	; If the high byte of X or Y is not 0,
	; this means this specific sprite is outside the screen.
	; In this case, the rendering is skipped.
	; It doesn't matter that the values in the sprites array are already written.
	; As long as _UpdateSpritesSpritesIndex isn't incremented,
	; the _ClearSprites function will make sure
	; that all unused sprites are put outside the screen in the end.
	LDA AbsoluteX + 1
	BNE @endRendering
	LDA AbsoluteY + 1
	BNE @endRendering

	; If everything is alright, then _UpdateSpritesSpritesIndex and the X register
	; get incremented with the value 4.
	; This value corresponds to the four bytes that we have written to the sprites array.
	; The PPU will render the current sprite on the screen.
	INX
	INX
	INX
	INX
	STX _UpdateSpritesSpritesIndex

@endRendering:

	; If the Y counter is 0,
	; the inner loop isn't repeated anymore
	; and all of the loop preparation is skipped.
	DEC YCounter
	BEQ @noLoopY

	; For the next loop,
	; the Y position is decremented with 8,
	; i.e. one tile height.
	SEC
	LDA AbsoluteY
	SBC #$08
	STA AbsoluteY
	LDA AbsoluteY + 1
	SBC #$00
	STA AbsoluteY + 1

	; The inner loop is repeated.
	JMP @loopY

@noLoopY:

	; If the X counter is 0, the function ends.
	; Otherwise, the outer loop is repeated.
	DEC XCounter
	BEQ @noLoopX

	; For the next loop,
	; the X position is incremented with 8,
	; i.e. one tile width.
	CLC
	LDA RelativeX
	ADC #$08
	STA RelativeX
	LDA RelativeX + 1
	ADC #$00
	STA RelativeX + 1

	; The outer loop is repeated.
	JMP @loopX

@noLoopX:

	RTS
This is the function call definition within C:

Code:

#define UpdateSprites(charactersSpritesIndex, x, y, directionAsMirrorAttributes)\
{\
	UpdateSpritesCharactersSpritesIndex = charactersSpritesIndex;\
	UpdateSpritesX = x;\
	UpdateSpritesY = y;\
	UpdateSpritesMirrorAttributes = directionAsMirrorAttributes;\
	UpdateSprites_();\
}
The meta sprites are all part of one huge array:

Code:

#define SPRITES_INIT(width, height, offsetY, palette)\
	width, height, LowByte(offsetY), HighByte(offsetY), palette
	
const byte CharactersSprites[] =
{
	/* Goon
	   ---- */
	
	/* Walking0 */
	SPRITES_INIT(GoonWidth, GoonHeight, GoonOffsetY, GoonPalette),
	0x90, 0x80, 0x70, 0x60, 0x50,
	0x91, 0x81, 0x71, 0x61, 0x51,

	/* Walking1 */
	SPRITES_INIT(GoonWidth, GoonHeight, GoonOffsetY, GoonPalette),
	0x92, 0x82, 0x72, 0x62, 0x52,
	0x93, 0x83, 0x73, 0x63, 0x53,

	/* etc. */
};
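
The listing above doesn't show `LowByte`, `HighByte` or the Goon constants, so the following is only a guess at how the pieces fit together, written for a PC compiler to check the arithmetic. The macro definitions, the Goon values, `frame_size` and `frame_offset_y` are assumptions for illustration; the five header bytes match what the assembly routine reads:

```c
#include <assert.h>

typedef unsigned char byte;

/* Hypothetical definitions; the post does not show the real ones. */
#define LowByte(value)  ((byte)((unsigned)(value) & 0xFFu))
#define HighByte(value) ((byte)(((unsigned)(value) >> 8) & 0xFFu))

#define SPRITES_INIT(width, height, offsetY, palette)\
	width, height, LowByte(offsetY), HighByte(offsetY), palette

enum { SPRITES_HEADER_SIZE = 5 };

/* Assumed values, chosen to match the two columns of five tiles above. */
enum { GoonWidth = 2, GoonHeight = 5, GoonOffsetY = -2, GoonPalette = 1 };

static const byte CharactersSprites[] =
{
	/* Walking0 */
	SPRITES_INIT(GoonWidth, GoonHeight, GoonOffsetY, GoonPalette),
	0x90, 0x80, 0x70, 0x60, 0x50,
	0x91, 0x81, 0x71, 0x61, 0x51,

	/* Walking1 */
	SPRITES_INIT(GoonWidth, GoonHeight, GoonOffsetY, GoonPalette),
	0x92, 0x82, 0x72, 0x62, 0x52,
	0x93, 0x83, 0x73, 0x63, 0x53
};

/* Number of bytes one frame occupies: header plus one byte per tile. */
static unsigned frame_size(const byte *frame)
{
	assert(frame[0] > 0 && frame[1] > 0);
	return SPRITES_HEADER_SIZE + (unsigned)frame[0] * frame[1];
}

/* Signed 16-bit Y offset, reassembled from its two header bytes. */
static int frame_offset_y(const byte *frame)
{
	unsigned raw = frame[2] | ((unsigned)frame[3] << 8);
	return (raw & 0x8000u) ? -(int)(0xFFFFu - raw) - 1 : (int)raw;
}
```

With these assumed values, Walking0 starts at index 0 and Walking1 at index 15, which are the kind of values that would be passed as charactersSpritesIndex.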