Scrolling large sprites off the screen using only 8-bit x/y

UncleSporky
Posts: 388
Joined: Sat Nov 17, 2007 8:44 pm

Post by UncleSporky »

I was going to ask a question about implementing this, but the solution turned out to be much easier than I'd thought. I'm going to go ahead and post this anyway, since the concept could prove useful for some people.

My sprite drawing routine takes in 16-bit x and y values for a metasprite and calculates which parts of it to draw and which to hide. In other words, if a 32 pixel wide metasprite has an x of -16, the routine will hide the left two sprites and draw the remaining part at the edge of the screen. Most people probably draw sprites this way; it's something that needs to be solved early in many NES games' development.

What I've decided to do is save on valuable RAM space by only using 8 bit x and y values for simple or unimportant objects, then convert these 8 bit values to 16 bit for the drawing routine, adjusted so that when x = 0 the entire sprite is off screen.

I like illustrating things, so here's a visual:

Image

The idea is to have a whole screen full of clouds like this that scroll smoothly across and off the screen, but only use up two coordinate bytes instead of four. The sprite will be moving slightly faster than you tell it to, in order to account for its own width.

What you have to do is this:

sprite's 16-bit x for drawing = sprite's x - (((sprite's x eor $FF) * sprite's width) / 256)

As an example, say you have a 24 pixel wide sprite with an x value of 45. You eor with $FF to invert it, resulting in 210. 210 * 24 = 5040, and 5040 / 256 = 19.6875. Dropping the fraction and subtracting 19 from the sprite's x value of 45, we find that this sprite should actually be drawn at an x of 26.
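
To sanity-check the arithmetic off the NES, here is a minimal Python sketch of the formula. The function name `draw_x` is made up for this illustration, and the `>> 8` stands in for "divide by 256 and drop the fraction".

```python
def draw_x(x, width):
    """Drawing x for an 8-bit x, following the formula in the post.

    Hypothetical helper for checking the math; not NES code.
    """
    inverted = x ^ 0xFF                # eor #$FF, same as 255 - x
    offset = (inverted * width) >> 8   # multiply, keep only the high byte
    return x - offset


print(draw_x(45, 24))    # 26
print(draw_x(0, 24))     # -23
print(draw_x(255, 24))   # 255
```

Note that with a 24 pixel wide sprite, x = 0 comes out as -23 rather than -24, so the rightmost pixel column would still be on screen at that position.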

This turns out to be quite easy to do in assembly. Use a multiplication routine that takes two 8-bit values and gives a 16-bit result, like this one I found elsewhere on this forum:

; routine originally created by frantik

; val1 = first 8-bit number to multiply (destroyed)
; val2 = second 8-bit number to multiply (cannot be zero; destroyed)
; tmp16x = 16-bit result (output, low byte first)
; temp = temporary variable (high byte of the shifted val1)
; x and y registers are preserved

mult16
   lda #$00     ; clear temporary variables
   sta tmp16x
   sta tmp16x+1
   sta temp
   jmp multstart
-loop
   asl val1     ; double first value
   rol temp     ; using 16bit precision
   lsr val2     ; halve second value
multstart
   lda val2
   and #01      ; is new 2nd value an odd number?
   beq -loop
   clc          ; if so, add new 1st value to running total
   lda tmp16x
   adc val1
   sta tmp16x
   lda tmp16x+1
   adc temp
   sta tmp16x+1
   lda val2
   cmp #01      ; is 2nd value 1?  if so, we're done
   bne -loop    ; otherwise, loop 
   rts

Then simply use whatever result is in the high byte as the value to subtract! With the example above, tmp16x+1 will contain the decimal value 19.
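
For anyone who wants to step through the routine without an assembler, here is a rough Python model of the same shift-and-add loop. It is a sketch for checking results, not a cycle-accurate emulation; the variable `multiplicand` stands in for `val1` with `temp` as its high byte.

```python
def mult16(val1, val2):
    """Shift-and-add multiply of two 8-bit values.

    Mirrors the control flow of the assembly routine and returns
    (low byte, high byte) of the 16-bit product.
    """
    if val2 == 0:
        raise ValueError("val2 must be nonzero, as in the original routine")
    total = 0
    multiplicand = val1                      # val1 plus temp as the high byte
    while True:
        if val2 & 1:                         # odd: add the shifted val1
            total = (total + multiplicand) & 0xFFFF
            if val2 == 1:                    # last set bit handled, done
                break
        multiplicand = (multiplicand << 1) & 0xFFFF   # asl val1 / rol temp
        val2 >>= 1                                    # lsr val2
    return total & 0xFF, total >> 8


low, high = mult16(210, 24)
print(hex(low), high)   # 0xb0 19
```

The high byte is the only part the coordinate conversion needs, which is why the divide by 256 costs nothing extra.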

What do you guys think? Useful, not useful, potential hangups? I haven't actually implemented it yet but I like the idea. :)
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

It's an interesting idea, but I'm not a big fan of the trade-off here... To gain 2 bytes of RAM you have to dedicate quite a lot of CPU cycles to this multiplication routine. Also, since there's no sub-pixel positioning, the speed of the clouds (or whatever it is the sprites represent) is not very configurable.

Personally, I'd never do something like this for such a small gain, especially considering what you have to sacrifice (CPU cycles and smooth movement). If I were really desperate for a few bytes of RAM I'd cut down the coordinates to 3 bytes instead, so that each coordinate would be 12 bits. 9 bits is enough to smoothly scroll sprites in and out of the screen and there are still 3 bits left for sub-pixel positioning.
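
One possible way to lay out two 12-bit coordinates in 3 bytes is sketched below in Python. The byte layout (low bytes first, both high nibbles sharing the third byte) is an assumption made for illustration; the post does not specify one, and the function names are hypothetical.

```python
def pack_xy(x12, y12):
    """Pack two 12-bit coordinates into 3 bytes (assumed layout)."""
    assert 0 <= x12 < 4096 and 0 <= y12 < 4096
    return bytes([
        x12 & 0xFF,                       # byte 0: low 8 bits of x
        y12 & 0xFF,                       # byte 1: low 8 bits of y
        (x12 >> 8) | ((y12 >> 8) << 4),   # byte 2: x high nibble low, y high nibble high
    ])


def unpack_xy(data):
    """Recover the two 12-bit coordinates from the 3 packed bytes."""
    x12 = data[0] | ((data[2] & 0x0F) << 8)
    y12 = data[1] | ((data[2] >> 4) << 8)
    return x12, y12


def split(coord12):
    """Split a 12-bit coordinate into (9-bit pixel position, 3-bit subpixel)."""
    return coord12 >> 3, coord12 & 0x07


packed = pack_xy((300 << 3) | 5, (17 << 3) | 2)
x12, y12 = unpack_xy(packed)
print(len(packed), split(x12), split(y12))   # 3 (300, 5) (17, 2)
```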
UncleSporky
Posts: 388
Joined: Sat Nov 17, 2007 8:44 pm

Post by UncleSporky »

I thought of that too, and subpixel movement precision is nice to have, but I'd think it would ultimately cost you more CPU cycles elsewhere just setting up and dealing with a fragmented 3 byte coordinate set. Stuff like rolling bits in and out repeatedly.

It's true that this method really wouldn't be very useful in a scrolling game like Mario 3; it's more of a single screen thing, or for effects that aren't affected by the scroll (clouds/wind/rain).


EDIT: Well, actually...there could be other uses for this way of thinking.

Currently in my demo I use 4 byte coordinates: 2 each of 4 bit high/8 bit screen/4 bit subpixel.

This would arguably make the 4 high bits unnecessary, if all you're using them for is to scroll sprites smoothly on and off the screen in a non-scrolling game. If nothing has to leave the screen and survive, you could still have 4 byte coordinates, but go with 8 bit screen/8 bit subpixel. It'd be much smoother and also easier to manage.
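
A small Python sketch of the 8 bit screen/8 bit subpixel idea, treating each coordinate as one 16-bit value with the pixel position in the high byte. The names `step`, `pos` and `vel` are made up for the example.

```python
def step(pos, vel):
    """Advance an 8.8 fixed-point coordinate by one frame, wrapping at 16 bits."""
    return (pos + vel) & 0xFFFF


pos = 45 << 8          # pixel 45, subpixel 0
vel = 0x0080           # half a pixel per frame
for _ in range(3):
    pos = step(pos, vel)

pixel, subpixel = pos >> 8, pos & 0xFF
print(pixel, subpixel)   # 46 128
```

On the 6502 the pixel part is simply the high byte, so no shifting is needed to read it back out.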
bogax
Posts: 34
Joined: Wed Jul 30, 2008 12:03 am

Post by bogax »

Assuming you've only got a few sprite widths, looks like
an obvious candidate for LUTs or dedicated code.

You could find a better multiply routine.

Here's some dedicated code with a hardcoded
multiply for a 24 pixel width.

First, what you've got is:

x-((256-x-1)*w/256)

I think what you want is:

x-(256-x)*w/256

So

x-(w*256/256-x*w/256)
x-(w-x*w/256)
x-w+x*w/256
x*w/256-w+x

And with a little care you can fit that into 8 bits

   lda x 
   lsr        ; .5*x       
   adc x      ; 1.5*x
   ror
   lsr
   lsr
   lsr        ; 1.5*x/16 = 24*x/256
   sbc #$18   ; defer the sec to facilitate
   sec        ; rounding
   adc x
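
The carry handling in that sequence is easy to get lost in, so here is a rough Python model that tracks the accumulator and carry flag one instruction at a time. It is a sketch for checking values, assuming standard 6502 behaviour for lsr, ror, adc and sbc in binary mode; the function name is invented for the example.

```python
def bogax_draw_x(x):
    """Model of the 8-bit sequence for a 24 pixel wide sprite.

    Returns (a, carry) as they would be after the final adc.
    """
    carry = x & 1                    # lda x / lsr
    a = x >> 1
    total = a + x + carry            # adc x
    a, carry = total & 0xFF, total >> 8
    a, carry = (carry << 7) | (a >> 1), a & 1      # ror
    for _ in range(3):               # lsr / lsr / lsr
        a, carry = a >> 1, a & 1
    total = a - 0x18 - (1 - carry)   # sbc #$18 (carry holds the rounding bit)
    a, carry = total & 0xFF, 1 if total >= 0 else 0
    total = a + x + 1                # sec / adc x
    a, carry = total & 0xFF, total >> 8
    return a, carry


print(bogax_draw_x(45))    # (25, 1)
print(bogax_draw_x(0))     # (232, 0)
print(bogax_draw_x(255))   # (255, 1)
```

In this model the final carry comes out clear exactly when the true result is negative, so 232 with carry clear reads as -24: the sprite is fully off the left edge at x = 0.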
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Post by Bregalad »

And what is the problem with 16-bit coordinates?
I first programmed my engine with 8-bit coordinates, and when all the problems arose I spent weeks converting it to 16-bit. The lower 4 bits are subpixel precision, the middle 8 bits are pixels, and the higher 4 bits are the "screen" (only used for overflow if not scrolling).

It really makes it a lot easier to move objects smoothly and move them off-screen properly.
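
As an illustration of that 4/8/4 split, here is a short Python sketch that decodes such a coordinate and moves it across the screen boundary. The helper names and the starting values are invented for the example.

```python
def decode(coord):
    """Split a 16-bit coordinate into (screen, pixel, subpixel)."""
    screen = (coord >> 12) & 0x0F    # upper 4 bits
    pixel = (coord >> 4) & 0xFF      # middle 8 bits
    subpixel = coord & 0x0F          # lower 4 bits
    return screen, pixel, subpixel


def move(coord, velocity):
    """Add a velocity in 1/16 pixel units, wrapping at 16 bits."""
    return (coord + velocity) & 0xFFFF


pos = 250 << 4                 # screen 0, pixel 250, subpixel 0
for _ in range(12):
    pos = move(pos, 0x0008)    # half a pixel per frame

print(decode(pos))             # (1, 0, 0)
```

After 12 frames the object has moved 6 pixels, carried out of the pixel byte, and landed on screen 1, which is how a drawing routine can tell it is no longer visible.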
Useless, lumbering half-wits don't scare us.
UncleSporky
Posts: 388
Joined: Sat Nov 17, 2007 8:44 pm

Post by UncleSporky »

Nothing's really wrong with them, I like 16 bit coordinates. This is just a method of saving some precious RAM, perhaps only necessary for people with a really bloated setup. I'm not pretending this is a great way to do it or anything, I know it'll be situational. Just wanted to share.

If your engine supports a player, 16 enemies and 16 objects, you could save up to 66 bytes if you decided to use this method on all of them, about 3% of your total RAM.
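
The numbers above work out as follows, sketched in Python. The only figure not taken from the post is the 2048 bytes of internal RAM on a stock NES with no extra cartridge RAM.

```python
NES_INTERNAL_RAM = 2048             # bytes of built-in work RAM

objects = 1 + 16 + 16               # player + enemies + other objects
bytes_saved_per_object = 4 - 2      # 16-bit x/y (4 bytes) down to 8-bit x/y (2 bytes)

saved = objects * bytes_saved_per_object
percent = 100 * saved / NES_INTERNAL_RAM
print(saved, round(percent, 1))     # 66 3.2
```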

Or as I said above, if you'd like to have a full 8 bits of subpixel precision but don't want to move to 3 byte x and y values, this is a way to do that too.

I mean, we can let this topic die if you want, I didn't intend for it to be a big point of contention. If people think it's not a good idea compared to other options, that's fine, I'm not bummed. :) Just an idea I had.

And thanks for that code, bogax, I didn't think of hardcoding it like that. It's true that sprites would always be specific widths that are simpler to work with than using a generic multiply.
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Post by Bregalad »

UncleSporky wrote: If your engine supports a player, 16 enemies and 16 objects, you could save up to 66 bytes if you decided to use this method on all of them, about 3% of your total RAM.
My two cents: if that many objects are present, chances are the game will slow down. So you should make slots for fewer objects, and thereby save RAM more intelligently than with the method you describe in your post.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

If 66 bytes make that much of a difference, you're in serious trouble. IMO there are better ways of getting more RAM, like using a great portion of page 1, something not many people do because that's the stack page.
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

I don't usually use more than 32 bytes of stack, but I still let it go up to 64 bytes.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

I leave the stack page alone unless I am 100% out of RAM everywhere else. It gives me the creeps :D
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

Stack page is great to use for PPU updates when scrolling, since you can do PLA / STA $2007 in an unrolled loop.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Dwedit wrote: Stack page is great to use for PPU updates when scrolling, since you can do PLA / STA $2007 in an unrolled loop.
Too bad the only advantage in doing that is ROM use, since PLA takes as many cycles as LDA absolute (or indexed, which makes the stack pointer not so useful either). If PLA was faster I'd do this more often.
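
To put numbers on that comparison, here is a small Python tally of instruction sizes and cycle counts for the two unrolled variants. The figures are the standard documented 6502 timings; the 32-byte transfer length is just an example size, and the table and helper names are made up.

```python
# (bytes, cycles) per instruction on a 6502
OPS = {
    "PLA":       (1, 4),
    "LDA abs":   (3, 4),
    "LDA abs,X": (3, 4),   # one extra cycle if indexing crosses a page
    "STA abs":   (3, 4),
}


def cost(*ops):
    """Total (bytes, cycles) for a sequence of instructions."""
    size = sum(OPS[op][0] for op in ops)
    cycles = sum(OPS[op][1] for op in ops)
    return size, cycles


pla_pair = cost("PLA", "STA abs")        # PLA / STA $2007
lda_pair = cost("LDA abs", "STA abs")    # LDA buffer+n / STA $2007
print(pla_pair, lda_pair)                # (4, 8) (6, 8)

count = 32                               # example: 32 bytes copied per update
print(pla_pair[0] * count, lda_pair[0] * count)   # 128 192
```

Both variants spend 8 cycles per byte, so the PLA version only wins on code size: 128 bytes of ROM instead of 192 for a 32-byte unrolled copy.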

So far I haven't used more than 10 bytes of my stack, and I don't expect that to change much. I have set aside only 32 bytes for it.
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

I like using the stack for backing up a value without having to use a memory address, and my call stack gets fairly deep when I am doing out-of-frame stuff, like decompressing a map.

It all depends on your personal style I guess :)