Unrolling loops?

Are you new to 6502, NES, or even programming in general? Post any of your questions here. Remember - the only dumb question is the question that remains unasked.

Moderator: Moderators

Post Reply
CrowleyBluegrass
Posts: 42
Joined: Sun Jun 30, 2013 7:59 am

Unrolling loops?

Post by CrowleyBluegrass »

Can somebody give me a simple (6502 is my first foray into programming) explanation of what loop unrolling is? These concepts are quite hard to understand because (it seems) a lot of prior programming knowledge is assumed in many online explanations. Makes sense though I guess, why would someone with no programming experience care about what loop unrolling is?

At the risk of sounding like a right numpty, is it something like this? (original code from Nerdy Nights 5)

Code: Select all

;INSTEAD OF DOING THIS

LoadSprites:

  LDX #$00              ; start at 0

LoadSpritesLoop:

  LDA sprites, x        ; load data from address (sprites + x)

  STA $0200, x          ; store into RAM address ($0200 + x)

  INX                   ; X = X + 1

  CPX #$10              ; Compare X to hex $10, decimal 16

  BNE LoadSpritesLoop   ; Branch to LoadSpritesLoop if compare was Not Equal to zero

                        ; if compare was equal to 16, continue down

;DO THIS?

LoadSprites:

  LDX #$00              ; start at 0

LoadSpritesLoop:

  LDA sprites, x        ; load data from address (sprites + x)

  STA $0200, x          ; store into RAM address ($0200 + x)

  INX                   ; X = X + 1

  LDA sprites, x        ; load data from address (sprites + x)

  STA $0200, x          ; store into RAM address ($0200 + x)

  INX                   ; X = X + 2

  LDA sprites, x        ; load data from address (sprites + x)

  STA $0200, x          ; store into RAM address ($0200 + x)

  INX                   ; X = X + 3

  LDA sprites, x        ; load data from address (sprites + x)

  STA $0200, x          ; store into RAM address ($0200 + x)

  INX                   ; X = X + 4

  CPX #$10              ; Compare X to hex $10, decimal 16

  BNE LoadSpritesLoop   ; Branch to LoadSpritesLoop if compare was Not Equal to zero

                        ; if compare was equal to 16, continue down
The way I see it (with my very limited math skills) instead of repeating the loop 15 times and loading 1 address each iteration, you load 4 addresses. This way, the loop is only repeated 4 times (4*4=16).

I work out the total clock cycles for the first loop to be 18 if the branch is taken, and 17 if the branch is not taken. So 18*15=270 and then add 17 for the last run through in which the branch is not taken 270+17=287.

I work out the second loop as taking 51 cycles if the branch is taken, 50 if the branch is not taken. So 51*3=153 and then add 50 for the last run through in which the branch is not taken 153+50=203.

Therefore, ya save 84 cycles?

The last time I did any serious math (though I doubt you guys would consider this serious) was when I was in school, so please, be nice!
Thanks y'all :)
User avatar
Movax12
Posts: 529
Joined: Sun Jan 02, 2011 11:50 am

Re: Unrolling loops?

Post by Movax12 »

You got the idea. You are saving cycles on not have to branch as often. You can save more cycles though, depending on what is happening in a given loop and thinking about it logically.


You could make things a bit quicker here:

Code: Select all


LDA #$00
CLC
LoadSpritesLoop:

  TAX

  LDY sprites, x
  STY $0200, x

  LDY sprites+1, x
  STY $0200+1, x

  LDY sprites+2, x
  STY $0200+2, x

  LDY sprites+3, x
  STY $0200+3, x

  ADC #$04
  CMP #$10
  BNE LoadSpritesLoop   ; Branch to LoadSpritesLoop if compare was Not Equal to zero

  ; if compare was equal to 16, continue down
You should also try to start loops from the end and count backwards to zero when possible so you don't have to use a compare instruction (a zero value will set the zero flag). As well, though this may just be an example, if the data you are copying to the sprite shadow OAM is raw sprite data, you should just put it there in the first place.
User avatar
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland
Contact:

Re: Unrolling loops?

Post by thefox »

The idea is right but the cycle counts seem off to me. I got 60 cycles saved. The easiest way to calculate it is to see that the 2nd loop gets rid of 3 pairs of CPX+BNE per each 4 iterations, so a total of 12 CPX+BNE pairs. Those CPX+BNE take a total of 2+3 = 5 cycles, so the number of cycles saved is 12*5 = 60.

But really the easiest way to do this stuff is to use something like NintendulatorDX. :) No more second guessing if you got it right or not.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
CrowleyBluegrass
Posts: 42
Joined: Sun Jun 30, 2013 7:59 am

Re: Unrolling loops?

Post by CrowleyBluegrass »

Thanks guys. I wouldn't be surprised if the amount of clock cycles saved was incorrect, if anyone can tell from a glance why (obviously my lackluster arithmetic) I'd be interested to know.

I've never used the + operator before. I assume it just replaces the task of adding x to the address by just adding a number to the address directly in the assembler? In other words, in the first run, adding x to the address doesn't do anything because x is 0 and incrementing the address is done with the + operator? Then the second time, each address is incremented by an additional 4 (due to the ADC from the end of the previous run) and so on? The CMP just checks to see if the zero flag is set before moving on.

Many thanks!
User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Re: Unrolling loops?

Post by qbradq »

You can save a few more cycles by only incrementing the counter once, then compare against 4 instead of 16.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Unrolling loops?

Post by tepples »

And in some cases, there's an even slicker advanced way to avoid increments at all, which I call a "pop slide" (by analogy with a nop slide). It's no faster than an absolute indexed loop like what Movax12 posted, but it may allow deeper unrolling in a given amount of ROM space. I just described it on the wiki's article about the stack. Don't worry if you don't understand all of it.
User avatar
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Unrolling loops?

Post by tokumaru »

CrowleyBluegrass wrote:I've never used the + operator before. I assume it just replaces the task of adding x to the address by just adding a number to the address directly in the assembler?
Yes. Since the base address is explicitly given 4 times, nothing stops you from incrementing the address each time instead of incrementing the index register, the end result is the same (the final address will be base + x, so it doesn't matter whether you increment the base address or the index register). Then in the end you increment the index register by 4 units at once, to catch up.

This trick is useful for accessing arrays of structures. If you think about it, the OAM is an array of 4-byte structures, so when you load an index register with 0, 4, 8, etc. you can have free access to any byte withing the structure just by incrementing the base address.

The same technique could be used for processing game objects, for example. Say that each active object in your game has 16 bytes of RAM. Then, by loading X with 0, 16, 32, 48, 64, etc. you can have free access to any attribute of that object. Like so:

Code: Select all

;this specifies which object bytes correspond to which attributes:

PositionX = 0
PositionY = 1
Health = 3
Speed = 4
(...)
Strength = 15

;this is an example of how to take one health point from the 3rd object (index 2, because we count from 0):

ldx #(16 * 2)
dec ObjectRAM+Health, x
This allows you to have generic subroutines to manipulate objects, you just have to supply the routines with an index.
Post Reply