Sprite mappings

tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

tepples wrote:This is NESdev, where four thousand cycles equal 35.2 scanlines.
Yes... Boy, I wish it were just 4!
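For reference, the 35.2 figure follows from NTSC timing: a scanline is 341 PPU dots and the CPU runs one cycle per 3 dots, so a scanline lasts about 113.67 CPU cycles. A quick Python check of that arithmetic (the function name is only for illustration):

```python
# NTSC NES: 341 PPU dots per scanline, 3 PPU dots per CPU cycle.
PPU_DOTS_PER_SCANLINE = 341
PPU_DOTS_PER_CPU_CYCLE = 3

def cycles_to_scanlines(cpu_cycles):
    """Convert a CPU cycle count to (fractional) NTSC scanlines."""
    return cpu_cycles * PPU_DOTS_PER_CPU_CYCLE / PPU_DOTS_PER_SCANLINE

print(round(cycles_to_scanlines(4000), 1))  # 35.2
```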
Celius wrote:My game will scroll all 4 directions just as yours does
Oh, I see... I kinda remembered Castlevania games being side-scrollers, with stairs taking you to the other floors, but with no vertical scrolling. Well, then you know that scrolling in both directions is not exactly trivial! =)
So it took you 4000 cycles just to draw those three metasprites? I guess I'll have to see how long it'll take for my routine. But I have to ask. What exactly is happening in your drawing routine?
It'd probably be easier for me to just paste it here than to explain everything, so here it is:

Code: Select all

DrawMetaSprite:
	;-- SUBROUTINE --------------------------------------------------
	;DESCRIPTION:
	; Processes the sprites in a sprite definition.
	;INPUT:
	; A: Mask used to modify the attributes;
	; X: bytes to skip when moving to the next slot (4 or -4);
	; SpriteDefinition: Address of the sprite definition;
	; SpriteStep: value to add to the slot index after each sprite;
	; SpriteX, SpriteY: coordinates of the object;
	;DESTROYS: A, X, Y, SpriteX, SpriteY;
	;----------------------------------------------------------------

	;Verify if there are slots left
	ldy SlotsLeft
	bne +
	rts
+
	;Save the attributes
	sta SpriteAttrib

	;Point to the first byte in the definition
	ldy #$00

	;Copy the sprite count
	lda (SpriteDefinition), y
	sta SpritesLeft

	;Calculate the central X coordinate of the sprite
	sec
	lda SpriteX+0
	sbc CameraX+0
	sta SpriteX+0
	lda SpriteX+1
	sbc CameraX+1
	sta SpriteX+1

	;Fix the coordinate if the sprite is flipped horizontally
	bit SpriteAttrib
	bvc NoHorFlip
	sec
	lda SpriteX+0
	sbc #$07
	sta SpriteX+0
	lda SpriteX+1
	sbc #$00
	sta SpriteX+1
NoHorFlip:

	;Calculate the central Y coordinate of the sprite
	sec
	lda SpriteY+0
	sbc CameraY+0
	sta SpriteY+0
	lda SpriteY+1
	sbc CameraY+1
	sta SpriteY+1

	;Compensate for the sprite delay and blank scanlines
	clc
	lda SpriteY+0
	adc #$0f
	sta SpriteY+0
	lda SpriteY+1
	adc #$00
	sta SpriteY+1

	;Fix the coordinate if the sprite is flipped vertically
	bit SpriteAttrib
	bpl NoVertFlip
	sec
	lda SpriteY+0
	sbc #$0f
	sta SpriteY+0
	lda SpriteY+1
	sbc #$00
	sta SpriteY+1
NoVertFlip:

	;Load the correct index of the slot
	stx SpriteStep
	txa
	bmi +
	ldx SpriteSlotA
	jmp DrawSprite
+	ldx SpriteSlotB
	jmp DrawSprite

OutOfScreen:
	dec SpritesLeft
	beq SpritesFinished

	;Advance to the next definition block
	clc
	tya
	and #%11111100
	adc #%00000100
	tay

DrawSprite:
	;Advance to the next definition byte
	iny

	;Load the relative X coordinate
	lda (SpriteDefinition), y
	;Check if the sprite is flipped horizontally
	bit SpriteAttrib
	bvc +
	;Invert the value if it is
	eor #$ff
+	sta SpriteTemp
	;Add the displacement
	clc
	adc SpriteX+0
	;Store the result
	sta SpritePage+3, x
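	;Check if the result was valid
	php		;Save the carry from the low byte addition
	lda #$7f
	cmp SpriteTemp	;Carry set if the offset is positive ($00-$7f)
	adc #$80	;A = $00 for a positive offset, $ff for a negative one
	plp		;Restore the carry from the low byte addition
	adc SpriteX+1	;A = high byte of the 16-bit result
	and ScreenXMask
	bne OutOfScreen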

	;Advance to the next definition byte
	iny

	;Load the relative Y coordinate
	lda (SpriteDefinition), y
	;Check if the sprite is flipped vertically
	bit SpriteAttrib
	bpl +
	;Invert the value if it is
	eor #$ff
+	sta SpriteTemp
	;Add the displacement
	clc
	adc SpriteY+0
	;Store the result
	sta SpritePage+0, x
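	;Check if the result was valid
	php		;Save the carry from the low byte addition
	lda #$7f
	cmp SpriteTemp	;Carry set if the offset is positive ($00-$7f)
	adc #$80	;A = $00 for a positive offset, $ff for a negative one
	plp		;Restore the carry from the low byte addition
	adc SpriteY+1	;A = high byte of the 16-bit result
	and ScreenYMask
	bne OutOfScreen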

	;Advance to the next definition byte
	iny

	;Load the index of the sprite
	lda (SpriteDefinition), y
	;Store it in the slot
	sta SpritePage+1, x

	;Advance to the next definition byte
	iny

	;Load the byte with the attributes of the sprite
	lda (SpriteDefinition), y
	;Modify it as necessary
	eor SpriteAttrib
	;Store it in the slot
	sta SpritePage+2, x

	;Move on to the next slot
	clc
	txa
	adc SpriteStep
	tax

	dec SlotsLeft
	beq SpritesFinished

	;Move on to the next definition
	dec SpritesLeft
	bne DrawSprite

SpritesFinished:
	lda SpriteStep
	bmi +
	stx SpriteSlotA
	rts
+	stx SpriteSlotB
	rts
This is the working version. As far as I tested, no errors. There are probably ways to optimize it, and I'll look into it soon. But since the sprites are fully working now, I'll go back to working on the background code, which is almost ready.

EDIT: Oh, you must remember to clear the unused sprites after you're done drawing all the objects. I do this with the following code:

Code: Select all

	;Clear the unused sprite slots
	lda SlotsLeft
	beq SlotsCleared
	ldx SpriteSlotA
	lda #$ef
ClearSlot:
	sta SpritePage+0, x
	inx
	inx
	inx
	inx
	dec SlotsLeft
	bne ClearSlot
SlotsCleared:
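
In other words, a sprite is hidden by giving it a Y coordinate below the visible picture. A small Python model of the loop above, assuming a 256-byte OAM mirror with 4 bytes per sprite (Y first); the function and argument names are made up for illustration:

```python
HIDDEN_Y = 0xEF  # Y value used above to push a sprite off the bottom of the screen

def clear_unused_slots(oam, first_free, slots_left):
    """Hide `slots_left` sprites, starting at byte offset `first_free`."""
    x = first_free
    for _ in range(slots_left):
        oam[x] = HIDDEN_Y        # byte 0 of each slot is the Y coordinate
        x = (x + 4) & 0xFF       # four INX, with 8-bit wraparound
    return oam

oam = bytearray(256)
clear_unused_slots(oam, first_free=0xF8, slots_left=2)
print(hex(oam[0xF8]), hex(oam[0xFC]))  # 0xef 0xef
```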
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Post by Bregalad »

I think no game where the player is supposed to interact in real time should ever run slower than the console's framerate. All Mega Man games run at the console frame rate, and all Castlevania games too. Don't take any false information as true or you'll end up making wrong decisions.

Hey, I'm impressed. If I put the maximum of 8 objects, the sprite mazing routine effectively takes a lot of time. Something like 50% of the whole CPU time, I assume. About 2/3 of the screen is grayed when I gray this part through $2001. However, it's fairly rare that that many objects are active, and this is the only time-consuming task during gameplay if you're not scrolling.

Also, I wrote my routine with ease of use in mind and it's not very well optimised. For every single sprite, the program does a lot of checks before actually mazing it. The idea of having different sprite configuration tables is crazy, but maybe it could work, who knows?

EDIT: Tokumaru, it's amazing how differently you program things from how I do. You take your sprite coordinates, add something to them, store them back, then add something to them again, store them again, etc. You do everything step by step. I would never do anything like this myself, I'd always take the coordinates perform all checks and calculations on them, and then store them back at the end. I guess your way of doing things is clearer to understand than mine, but in the end maybe it's slightly less optimised.

Post by tokumaru »

Bregalad wrote:You do everything step by step. I would never do anything like this myself, I'd always take the coordinates perform all checks and calculations on them, and then store them back at the end.
I'd rather do that too (I'm all for performance rather than making understandable code), but when you are working with 16-bit values there is not much choice... you have to store the result because you have to use A again for the high byte! Unless you used X and/or Y to hold temporary results, something I do in other parts of my code, but for just 1 extra CPU cycle this is hardly worth it.

Do you have any ideas on how I could optimize the code above? Optimizing the loop has a much bigger effect than optimizing the setup that comes before it, that's for sure.

On a somewhat related topic, the background-drawing routine seems to perform much better than this one. Even when rendering a row and a column in the same frame, it uses far fewer cycles than the sprites do.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Bregalad wrote:I think no game where the player is supposed to interact in real time should ever run slower than the console's framerate.
Doom for PC ran much slower than the 70 fps of VGA mode 13h. A lot of PS1 games ran at 30 or even 20 fps.
All Mega Man games run at the console frame rate, and all Castlevania games too.
Including the Castlevania games on Game Boy?
Hey, I'm impressed. If I put the maximum of 8 objects, the sprite mazing routine effectively takes a lot of time.
Where did the word "mazing" come from?

Post by Bregalad »

@ Tokumaru: The inc and dec instructions can handle 16-bit operations when the high byte is only there for testing purposes. For example, this:

Code: Select all

   clc
   lda SpriteY+0
   adc #$0f
   sta SpriteY+0
   lda SpriteY+1
   adc #$00
   sta SpriteY+1

   ;Fix the coordinate if the sprite is flipped vertically
   bit SpriteAttrib
   bpl NoVertFlip
   sec
   lda SpriteY+0
   sbc #$0f
   sta SpriteY+0
   lda SpriteY+1
   sbc #$00
   sta SpriteY+1 
could be optimized into this:

Code: Select all

   lda #$00
   sta Temp
   clc
   lda SpriteY+0
   adc #$0f
   sta SpriteY+0
   bcc +
   inc Temp
+
   ;Fix the coordinate if the sprite is flipped vertically
   bit SpriteAttrib
   bpl NoVertFlip
   sec
   lda SpriteY+0
   sbc #$0f
   sta SpriteY+0
   bcs +
   dec Temp
+
etc...
Or even better, use X or Y instead of a temporary variable (but this isn't always manageable, especially in a sprite mazing routine where you keep an index into the OAM all the time (at least I do this)).
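
To see why the `bcc`/`inc` pair is enough, here is a short Python model (not NES code) comparing it with the full `adc #$00` version. It covers only the case of adding an unsigned 8-bit constant, and the function names are invented for this sketch:

```python
def add_full(lo, hi, const):
    """clc / lda lo / adc #const / sta lo / lda hi / adc #$00 / sta hi"""
    total = lo + const
    carry = total >> 8
    return total & 0xFF, (hi + carry) & 0xFF

def add_with_inc(lo, hi, const):
    """clc / lda lo / adc #const / sta lo / bcc + / inc hi"""
    total = lo + const
    lo = total & 0xFF
    if total > 0xFF:             # carry set: branch not taken, so INC runs
        hi = (hi + 1) & 0xFF
    return lo, hi

# Both versions agree for every low byte, with a sample high byte
assert all(add_full(lo, 0x12, 0x0F) == add_with_inc(lo, 0x12, 0x0F)
           for lo in range(256))
print(add_with_inc(0xF8, 0x00, 0x0F))  # (7, 1)
```

The subtraction case is the mirror image: `bcs` skips a `dec` of the high byte when no borrow occurred.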

@ tepples: I don't remember where mazing comes from, but I'm pretty sure I didn't make it up. Isn't this a correct English word?

Oh, and by the way, I don't know any games that run slower than the console framerate while looking good. I never played Doom, but this is an early 3D game, so I think the lag is excusable. I also never played any original Game Boy Castlevania games; I was talking about NES Castlevania games, which run at full framerate (60 fps on NTSC and 50 fps on PAL).

I don't know much about original Game Boy games, but I'm pretty sure the only Game Boy game I really love, which is Final Fantasy Adventure, runs at full speed. All 2D Game Boy Color and Advance games I played seem to run at full speed.

Post by tokumaru »

I see what you mean... but this can only be done when one of the numbers is 8 bits and positive. But yeah, I could do what you suggested. This part of the code is still outside of the loop, so I could use X or Y if I needed to. But inside the loop, X is used to point to the sprite slots (OAM mirror) and Y is used to point to the sprite definitions.

About the word "mazing", I believe it makes some sense because of sprite cycling, where the sprites are distributed semi-randomly (or "mazed") across the sprite slots. I don't know. =)

Post by tokumaru »

Bregalad wrote:Oh, and by the way, I don't know any games that run slower than the console framerate while looking good. I never played Doom, but this is an early 3D game, so I think the lag is excusable. I also never played any original Game Boy Castlevania games; I was talking about NES Castlevania games, which run at full framerate (60 fps on NTSC and 50 fps on PAL).

I don't know much about original Game Boy games, but I'm pretty sure the only Game Boy game I really love, which is Final Fantasy Adventure, runs at full speed. All 2D Game Boy Color and Advance games I played seem to run at full speed.
Or maybe they did a good job of it and you didn't even notice the difference! =)
Celius
Posts: 2159
Joined: Sun Jun 05, 2005 2:04 pm
Location: Minneapolis, Minnesota, United States

Post by Celius »

So I made my sprite drawing routine, and I think I'm gonna have to come up with a better idea. What I did was I had the tile values and attribute values in an array in RAM. After fetching those, I calculated the coordinates of every sprite, and I copied all the data from the arrays into the OAM page. In the tile fetching routine, I checked to see if there was a flip. If so, I copied the values accordingly. The problem is that I didn't do it for the coloring.

It took me about 8 scanlines to draw a 2x2 sprite, which I don't think is very good. If I took a 4x4, it'd take about 32 scanlines. So I think I might want to take a different approach.

Tokumaru, I looked at your code, and I really don't understand how you handle flips. Could you explain?

Post by tokumaru »

Yeah, your times are not looking very good... 32 scanlines is the time it took me to draw 3 2x4 sprites, and that's not very good either.
Celius wrote:Tokumaru, I looked at your code, and I really don't understand how you handle flips. Could you explain?
You mean vertical and horizontal flipping? Well, my definitions have the relative (relative to the position of the object, in my routine, SpriteX and SpriteY) coordinates.

So, I can say that a sprite is 8 pixels to the left of the central point and 16 pixels above it, for example. Before adding the relative X value, I check if the sprite is flipped horizontally. If yes, I invert the 8, turning it into -8, so it's moved to the other side. But this is still not enough, because the coordinates of the sprite are for its top left corner, and when flipping it you'd kinda like those coordinates to be for the right corner. Since this is impossible, I just move the coordinates of the object to the side to compensate for this before entering the loop. Flipping vertically works exactly the same.

About inverting the number: to do it you have to invert all the bits (eor #$ff) and add one. To avoid having to add one to each sprite, I take this 1 into account when compensating for the width of the sprite as I said above.

The idea is that I tweak the coordinates of the object in case of flipping, so that each relative coordinate can be flipped with a simple EOR command. When outputting the attributes of each sprite, the individual flipping bits are EOR'ed with the flipping bits of the whole object, so if the object used any flipped sprites originally, they'd be unflipped, causing them to look flipped relative to the other ones that were just flipped. Well, this sounded confusing, but trust me: the definitions can contain flipped and unflipped sprites, and the final structure of the object is maintained in case of flipping because all the sprites will be flipped, even the ones that were already flipped.
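
The horizontal case can be checked with a few lines of Python. This is only a model of the arithmetic described above, assuming 8-pixel-wide sprites and relative offsets stored as signed bytes; the helper names are hypothetical:

```python
def to_signed(byte):
    """Interpret an 8-bit value as a two's complement number."""
    return byte - 256 if byte & 0x80 else byte

def sprite_left_edge(object_x, rel_x, flipped):
    """Screen X of a sprite whose definition holds the signed offset rel_x."""
    rel = rel_x & 0xFF
    if flipped:
        object_x -= 7            # done once per object, before the loop
        rel ^= 0xFF              # done per sprite: EOR #$FF gives -rel - 1
    return object_x + to_signed(rel)

def flipped_attributes(sprite_attr, object_mask):
    """EOR the sprite's own flip bits with the object's flip bits."""
    return sprite_attr ^ object_mask

# A sprite 8 pixels left of the centre ends up just right of it when flipped
print(sprite_left_edge(100, -8, False))  # 92
print(sprite_left_edge(100, -8, True))   # 100
```

The 7 subtracted up front plus the 1 missing from the EOR add up to the sprite width of 8, which is why no per-sprite "add one" is needed.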

But if I'm not mistaken, your sprites are arranged in grids, right? So you don't define the coordinates of each sprite, but only of the whole block, right? I must admit that this seems harder to flip. But since this was my original design, I had a solution for this.

My design used relative coordinates for the top left corner of the grid, its width and height (in sprite units), and then the indexes and attributes of each individual sprite. To flip that, you'd also have to invert the relative coordinates to have them go to the other side of the block. When inverting, you'd probably have to account for the width of the sprites (8) too. After that you've got the coordinates of the first sprite, and can enter the loop that will draw them all.

In this loop, you should check the high byte of each coordinate and if both are 0, output the sprite. Increment the X coordinate for the next sprite. The amount you use to increment should probably be in a variable, because you'll want to add 8 when it's not flipped and -8 when it is. Just set this variable with the proper value before entering the loop.

I'd keep the number of horizontal sprites (width of the block) in an index register, so that I could decrement it and detect when the first row ended. When the row ends, reset the X coordinate (to the number you calculated right before entering the loop), and increment the Y coordinate by 16 or -16 (the amount should be in a variable, like for the X coordinate), assuming you are using 8x16 sprites. When the number of vertical sprites (height) ends, you're done.

That would not need any buffers, you could just keep updating the same pair of coordinates for all the sprites (and just keep the calculated X coordinate for when starting new rows). This is how I'd do it.
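
The grid walk described above could be sketched like this in Python. It assumes 8x16 sprites and a top-left origin, and it finds the flipped starting corner from the block's width and height, which is one possible way to do it; all names are invented for the example, and only the coordinate bookkeeping is modelled:

```python
def grid_positions(x, y, width, height, flip_h=False, flip_v=False):
    """Yield (x, y) for each on-screen cell of a width x height block of 8x16 sprites."""
    step_x, step_y = 8, 16
    if flip_h:
        x += (width - 1) * 8     # start at the opposite column...
        step_x = -8              # ...and walk back towards the left
    if flip_v:
        y += (height - 1) * 16
        step_y = -16
    for _ in range(height):
        cur_x = x                # reset X at the start of every row
        for _ in range(width):
            # "High byte is zero" test: keep only coordinates in 0..255
            if 0 <= cur_x <= 0xFF and 0 <= y <= 0xFF:
                yield cur_x, y
            cur_x += step_x
        y += step_y

print(list(grid_positions(100, 50, 2, 2)))
# [(100, 50), (108, 50), (100, 66), (108, 66)]
print(list(grid_positions(100, 50, 2, 2, flip_h=True)))
# [(108, 50), (100, 50), (108, 66), (100, 66)]
```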

I'm considering implementing a routine like this and using both types of sprites in my game, because this other type seems to leave more room for optimization. Depending on the type of the object, it will call one routine or the other, and I won't waste precious cycles when they are not needed.

Oh, I think I should advise against using a lot of RAM buffers/arrays, especially when it's possible to output the data directly. Handling arrays is a very time-consuming process, because of the loops and all that. Did you see how the output in my routine works? When I output the X and Y coordinates of the sprite, I always write them to the OAM page directly, even before knowing if they are valid or not. I leave the validity check for later (and I don't even store the high byte anywhere, I just need to know if it is zero or not), and in case a coordinate was not valid, I simply do not advance a slot, and that invalid information will be overwritten by the next valid sprite. This makes the cases when the coordinates are valid much faster than buffering the results.
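
The validity check at the end of each coordinate (the `php` ... `plp` sequence in the routine) is compact enough that a model may help. The Python below mirrors it step by step, assuming a screen mask of `$ff` so that any non-zero high byte counts as off screen; it is a sketch of the arithmetic, not a replacement for the 6502 code:

```python
def add_offset(coord_lo, coord_hi, offset, mask=0xFF):
    """Add a signed 8-bit offset to a 16-bit coordinate.

    Returns (low_byte, on_screen). The low byte is what gets written
    to the OAM page before the result is known to be valid.
    """
    total = coord_lo + offset            # clc / adc SpriteX+0
    low, carry = total & 0xFF, total >> 8
    # lda #$7f / cmp offset / adc #$80: $00 if offset < $80, else $ff
    sign_ext = 0x00 if offset <= 0x7F else 0xFF
    high = (sign_ext + coord_hi + carry) & 0xFF   # plp / adc SpriteX+1
    return low, (high & mask) == 0

print(add_offset(0x10, 0x00, 0xF8))  # (8, True):    16 - 8 = 8
print(add_offset(0x04, 0x00, 0xF8))  # (252, False): 4 - 8 = -4
print(add_offset(0xFC, 0x00, 0x08))  # (4, False):   252 + 8 = 260
```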

Heh, I had never thought that this task could use so much CPU time!

Bregalad, you said you used tables to make this process faster... what kind of tables are those? I can't think of anything you could pre-calculate to make this whole process faster...

Post by tokumaru »

You know something that sucks? Having to switch banks to access different types of data (level maps, sprite mappings, etc) with the MMC1, which requires a lot of time to complete a register write.

I'm saying this because I need the screen mappings to be loaded when the object routines are executed, because objects may need that information when walking, and so on. The actual level map and the object definitions are in RAM, so those are fine. But to render their sprites, the sprite mappings must be loaded. They can't be in the same bank, because there are many different screen mappings, spread across multiple banks.

The only solution I see is to buffer the parameters that would otherwise be sent to the drawing routine, and send them all at once after all the objects have been processed, so I'd bankswitch only once. This solution is annoying, because it uses more RAM, and wastes more time with the managing of this new list.

Another option would be to dedicate part of the object RAM itself to hold the buffered values, and just scan all objects again sending the buffered values to the drawing routine, when these are present. In any case, the sprites should be rendered last.
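
A rough Python sketch of the first option, with a hypothetical `switch_bank` callback standing in for the MMC1 register writes. Objects queue their draw parameters while the screen-mapping bank is loaded, and the queue is flushed after a single switch:

```python
class DrawQueue:
    """Collects draw requests so the sprite-mapping bank is selected once per frame."""

    def __init__(self, switch_bank, draw):
        self.switch_bank = switch_bank   # hypothetical: selects a PRG bank
        self.draw = draw                 # hypothetical: the metasprite routine
        self.pending = []

    def request(self, definition, x, y, attributes):
        # Called from object logic, while the screen mappings are still banked in
        self.pending.append((definition, x, y, attributes))

    def flush(self, sprite_bank):
        self.switch_bank(sprite_bank)    # one switch for all the objects
        for request in self.pending:
            self.draw(*request)
        self.pending.clear()

switches, drawn = [], []
queue = DrawQueue(switches.append, lambda *args: drawn.append(args))
queue.request("player", 120, 80, 0x00)
queue.request("enemy", 40, 96, 0x40)
queue.flush(sprite_bank=3)
print(len(switches), len(drawn))  # 1 2
```

The cost is the extra list in RAM and the time spent filling and walking it, as noted above.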

Post by Bregalad »

Oh my, it's amazing how you can get complicated stuff out of simple stuff.

Well, I don't see the problem with having all sprite definitions in a single bank. And bankswitching with MMC1 takes a bit longer than with a discrete logic mapper, but it's really nothing to worry about, I think. 5 writes and 4 shifts take something like 30 cycles or so.
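
For context, the "5 writes and 4 shifts" come from the MMC1's serial interface: each write delivers one bit (bit 0), least significant first, and the fifth write latches the 5-bit value. A small Python model of both sides, with made-up class and function names and only one target register:

```python
class Mmc1Port:
    """Minimal model of the MMC1 serial load register (one target register only)."""

    def __init__(self):
        self.shift, self.count, self.latched = 0, 0, None

    def write(self, byte):
        if byte & 0x80:                      # bit 7 set: reset the shift register
            self.shift, self.count = 0, 0
            return
        self.shift |= (byte & 1) << self.count   # bits arrive LSB first
        self.count += 1
        if self.count == 5:                  # fifth write latches the value
            self.latched = self.shift
            self.shift, self.count = 0, 0

def set_bank(port, bank):
    """CPU side: STA, then (LSR A, STA) four times = 5 writes, 4 shifts."""
    value = bank
    port.write(value & 0x7F)
    for _ in range(4):
        value >>= 1
        port.write(value & 0x7F)

port = Mmc1Port()
set_bank(port, 0b01101)
print(port.latched)  # 13
```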

And myself, I've used completely different sprite definitions for flipped sprites, so that they aren't forced to be symmetric. However, it's a top-down game, so this is really different from a platformer where everything is flipped horizontally anyway.

Post by Celius »

That gives me some great ideas! Inverting the object as a whole sounds much simpler than going through some inverting loop that takes a million cycles.

And also, I did the same thing in my code. I took the sprite, calculated its coordinates, and if the high byte was used, I simply moved on to the next cel in the object. This I would not change.

I just have to think about how to go about this wisely. I won't use arrays, because in the end, it's a waste of time and RAM. The only thing I'll use arrays for is updating while scrolling. But yeah, inverting the top left corner to the other side is a really smart idea.

Also, in my sprite tables, I had the color data compressed, and this is why it took a lot longer. I had 4 attributes compressed into one byte, so I had to do numerous shifts to get these out. I think I'll just stick to using decompressed values, but I feel like I'm wasting so much by only using 2 bits in every definition. What do you think about that? Should I leave them decompressed?
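
For comparison, unpacking four 2-bit palette values from one byte looks like this in a Python model. The bit order (first cel in the lowest two bits) is an assumption, since the post does not say how the byte was laid out:

```python
def pack_palettes(palettes):
    """Pack four 2-bit palette indices into one byte (first cel in bits 0-1)."""
    byte = 0
    for i, palette in enumerate(palettes):
        byte |= (palette & 0b11) << (2 * i)
    return byte

def unpack_palettes(byte):
    """Recover the four indices; on the 6502 each step costs two shifts."""
    palettes = []
    for _ in range(4):
        palettes.append(byte & 0b11)
        byte >>= 2
    return palettes

packed = pack_palettes([3, 0, 2, 1])
print(hex(packed), unpack_palettes(packed))  # 0x63 [3, 0, 2, 1]
```

Unpacked, each cel spends a whole byte on 2 bits of colour but needs no shifting, which is the ROM-versus-cycles trade-off in question.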

And yes, I have my sprites in arranged grids, so I have one general coordinate for that object. I really have to think about how to do this routine wisely.

EDIT: I've tested my new routine. It takes about as long as yours, Tokumaru. I can probably shorten it a bit. I would post it up, but I don't have any comments or anything on it, and it wouldn't make much sense. I'll post it up later.

You said something about inverting your positions. I pretty much avoided inverting. Last time, I took a different tile/color for a certain set of coordinates if the object was flipped. BAD IDEA. I just read the data as is this time, and calculated the coordinates for that specific cell depending on whether or not it was flipped. For a flip, I took the tile width of the sprite - 1, multiplied it by 8, and just added it to the X coord. This will give me the X coord for the tile on the right side of the metasprite. I then subtract 8 for every tile placement instead of adding. I did the same for vertical flips, except I multiplied the tile height - 1 by 16.

But drawing a 2x4 sprite took about 12 scanlines, but this can be shortened. I would really rather draw from an array in RAM, because I could compress my tables to not take up so much space.

I'll modify my routine to take less time. I'll also add some comments and post it up.

Also, Tokumaru, I see in your first screen shot that the top of the screen is partially pink to show how long the routine is. Did you take Vblank into account when seeing how long your routine was? In the beginning of my routine, I waste time so I can get out of Vblank to see how many scanlines it takes.

Post by tokumaru »

Celius wrote:EDIT: I've tested my new routine. It takes about as long as yours, Tokumaru. I can probably shorten it a bit.
When you optimize it, tell me what you did, maybe you can give me some ideas! =) I will not work on this again until my scrolling engine fully works. I got the columns updating fine now, I just gotta do the rows, but I got everything pretty much worked out already. There's some tweaking to the code that handles attributes too.
I just read the data as is this time, and calculated the coordinates for that specific cell depending on whether or not it was flipped. For a flip, I took the tile width of the sprite - 1, multiplied it by 8, and just added it to the X coord. This will give me the X coord for the tile on the right side of the metasprite. I then subtract 8 for every tile placement instead of adding. I did the same for vertical flips, except I multiplied the tile height - 1 by 16.
Yeah, I think this is the way to go for grid-aligned metasprites. Now let me ask you one thing: from what I can see, your coordinates always indicate the top left corner of the sprite, right? This is the only part where I seem to disagree with you, as I chose to have a pair of coordinates relative to the central point of the object (Sonic's is at the bottom, by his feet, centered horizontally) indicate where the sprites are. This keeps me from having to manually calculate the position of the sprites every time... Well, unless you consider your player's coordinates to be at the top left corner, like the sprite. I wouldn't do that, but if it works for you, OK.
Also, Tokumaru, I see in your first screen shot that the top of the screen is partially pink to show how long the routine is. Did you take Vblank into account when seeing how long your routine was? In the beginning of my routine, I waste time so I can get out of Vblank to see how many scanlines it takes.
Yeah, there are other things before the sprite code that take up most of VBlank (I update the palette, draw a few patterns, and there's some other test code), so I guess that was pretty accurate.

Post by Celius »

Here's the code, but it didn't really end up taking less time. But it's commented:

Code: Select all

;The first three bytes of the sprite definition are the number of tiles in the metasprite,
;The X dimension, and the Y dimension of the metasprite. The rest of the bytes
;define colors and tile IDs. So the next byte after the Y dimension byte will represent
;the Tile ID for the first tile. The next one will be the color data for that tile.
;The next two will represent the tile ID and the color for the next cel in the metasprite.
;It goes on for however many cels are in the metasprite.

DrawMetaSprite:
	ldx #4
	ldy #0
-
	iny
	bne -
	dex
	bne -
	lda #$00
	sta $2001
	lda #<NoFlipX		;We may be jumping to these locations
	sta TempAddL		;Depending on if there's a flip or not.
	lda #>NoFlipX
	sta TempAddH
	lda #<NoFlipY
	sta TempAdL1
	lda #>NoFlipY
	sta TempAdH1
	ldy #0			;Start at the beginning. Obviously.
	lda (SampleL),y		;Load the number of cels in the metasprite.
	sta SpritesLeft
	iny			;Go to the next byte.
	lda (SampleL),y		;Load the width of the metasprite
	sta DimX
	iny			;Go to the next byte.
	lda (SampleL),y		;Load the Height of the metasprite.
	sta DimY
	iny			;Go to the next byte.

	
;**************************************************
	sec			;Here we take the coords of the object,
	lda ObjectXL		;Subtract the coordinates of the screen
	sbc ScreenXL		;And it becomes the relative coordinates of the metasprite.
	sta StartingXL		;But we need to remember it for when we start a new row
	sta CurrentXL		;Of sprites, so we have a starting X value.

	lda ObjectXH		;All coordinates are 16-bit. So we need to take that into account.
	sbc ScreenXH		;The only reason for a 16-bit X coordinate is so we can determine
	sta StartingXH		;if a cel in a sprite will be displayed or not.
	sta CurrentXH

;*********************
	sec			;The same goes for the Y coord. However, a starting value and
	lda ObjectYL		;The current value do not need to be separate, because we don't
	sbc ScreenYL		;Need to refresh the value once we're done with it.
	sta CurrentYL

	lda ObjectYH
	sbc ScreenYH
	sta CurrentYH
;**************************************************

	bit FlipStatus		;We'll test to see if it's flipped horizontally.
	bvc +			;If not, skip ahead.
	ldx DimX			;To calculate the X position of the opposite side,
	dex			;We use the formula NewXPos = (Width - 1) * 8 + CurrentXPos
	txa			;After getting that, we'll tell the routine to subtract 8
	asl a			;For every tile instead of adding. We lay the tiles right to left
	asl a			;Instead of left to right.
	asl a
	clc
	adc StartingXL
	sta StartingXL
	sta CurrentXL
	lda StartingXH
	adc #0
	sta StartingXH
	sta CurrentXH
	lda #<FlipHrzntl		;Instead of doing comparisons to see if it's flipped or not, we'll just jump
	sta TempAddL		;Directly to where we need to go with a Temporary address.
	lda #>FlipHrzntl
	sta TempAddH

;*********************
+
	bit FlipStatus		;We check here to see if there's a vertical flip
	bpl +			;If not, just skip ahead.
	ldx DimY			;We can use a formula very similar to the one to
	dex			;calculate the Y coord of the bottom cels.
	txa			;NewYPos = (Height - 1) * 16 + CurrentYPos
	asl a
	asl a
	asl a
	asl a
	clc
	adc CurrentYL
	sta CurrentYL
	lda CurrentYH
	adc #0
	sta CurrentYH
	lda #<FlipVrtcl		;We also tell the routine to go bottom to top instead of
	sta TempAdL1		;Top to bottom if there's a vertical flip.
	lda #>FlipVrtcl
	sta TempAdH1

;**************************************************
+
	lda DimX			;Here we copy the value of the X dimension because
	sta Variable1		;We'll be needing to do a certain loop for however many tiles the sprite is wide.
	ldx CurrentPos		;Start where we left off if we call this routine more than once. (It starts off as 0)
DrawSprites:
	lda CurrentYH		;Before doing anything, we need to check if the cel is actually on screen
	beq +			;If the High byte is used, it's off screen.
	iny			;Move on to the next set of definitions
	iny
	jmp ++			;Skip past table copying
+
	lda CurrentXH		;See if the high byte is used for the X coord
	beq +
	iny			;Move on to the next set of definitions
	iny
	jmp ++			;Skip past the copying
+
	lda CurrentYL		;Copy the current Y value
	sta OAMPage,x
	inx
	lda (SampleL),y		;Copy the current tile ID
	sta OAMPage,x
	iny			;Get the next byte
	inx
	lda (SampleL),y		;Copy the Attribute data
	ora FlipStatus		;This byte can include priority data, I just called it FlipStatus for some reason.
	sta OAMPage,x
	iny
	inx
	lda CurrentXL		;Copy the X position
	sta OAMPage,x
	inx
++
	jmp (TempAddL)
--
	dec Variable1
	bne DrawSprites

	lda DimX
	sta Variable1
	lda StartingXL
	sta CurrentXL
	lda StartingXH
	sta CurrentXH	
	jmp (TempAdL1)
-
	dec DimY
	bne DrawSprites
	lda #$1E
	sta $2001
	stx CurrentPos
	jsr Clear_Unused
	ldx #0
	stx CurrentPos
	jmp Return

NoFlipX:
	clc
	lda CurrentXL
	adc #8
	sta CurrentXL
	lda CurrentXH
	adc #0
	sta CurrentXH
	jmp --
NoFlipY:
	clc
	lda CurrentYL
	adc #16
	sta CurrentYL
	lda CurrentYH
	adc #0
	sta CurrentYH
	jmp -
FlipHrzntl:
	sec
	lda CurrentXL
	sbc #8
	sta CurrentXL
	lda CurrentXH
	sbc #0
	sta CurrentXH
	jmp --
FlipVrtcl:
	sec
	lda CurrentYL
	sbc #16
	sta CurrentYL
	lda CurrentYH
	sbc #0
	sta CurrentYH
	jmp -

Clear_Unused:
	lda #0
	sec
	sbc CurrentPos
	tay
	ldx CurrentPos
	lda #$FF
-
	sta OAMPage,x
	inx
	dey
	bne -
	rts
At the beginning, I waste time just to get it out of Vblank. Then I shut the screen off until it's done with the loop. And the thing at the end will be changed. I won't jump directly into the clearing routine after the first sprite is done being drawn. There are many things just there for testing purposes. I'll also be doing a different routine to check whether or not the metasprite is touching the screen. After confirming, I'll call the routine.

EDIT: I had to hurry, so I left some things out of my post. Yes, my object positions are always defined by the top left coordinate. I don't really see a reason to change it. I think it works fine the way it is.

But after looking at your routine again, I notice that it allows for objects that aren't completely surrounded by a box, while mine doesn't. This would be really good in some cases, but generally metasprites are so small that you wouldn't really have sprites displayed that are blank tiles. In my game, most of the background enemies are the big ones. But yours allows for it because you define all the positions in the metasprite. I personally see this as a lot of ROM being used, but if it works for you, that's good.
tokumaru wrote:I will not work on this again until my scrolling engine fully works. I got the columns updating fine now, I just gotta do the rows, but I got everything pretty much worked out already. There's some tweaking to the code that handles attributes too.
I took a long break from NESdev a couple months ago. As soon as I got back in, I finally conquered that task once and for all. I hope to never have to make another scrolling routine. I felt really really good once I finished it, because I can use it in pretty much any game that uses scrolling. I just need to tweak it to allow scrolling speeds faster than 4 pixels. If my rows or columns are split between two nametables, I write the data for one half in one frame, and write the data for the second half in the next. This is the reason I can't scroll faster than 4 pixels a frame, because I update every section of 8 pixels. By the time the second part of the column/row needs to be written, it's already displaying a new column/row that needs to be updated. So the first half of the column/row would be updated correctly, while the next part appears in the newly displayed row/column. It's dumb, and I have to fix it. Then I'll be able to scroll 8 pixels a frame. This will be a problem for my character falling down a long pit or something, because gravity will grow to have the character falling faster than 4 pixels a frame, and my camera needs to follow the character.

I suppose yours has to support really really high speeds, huh?

Post by tokumaru »

Celius wrote:If my rows or columns are split between two nametables, I write the data for one half in one frame, and write the data for the second half in the next.
Boy, you'd flip if you saw my routines that draw columns and rows!
Celius wrote:I suppose yours has to support really really high speeds, huh?
16 pixels per frame, in both directions if necessary! 8)

I always draw full metatiles, never just tiles. Rows are always 17 metatiles long, and columns 15 metatiles tall. I always assume they will cross the name table boundary (rows in fact always do, because they are wider than the name table, and columns almost always do too). It's really not hard at all to handle this...

See, you most likely have the destination address (the one you write to $2006) stored somewhere, because you use it to write the first half of the row/column being updated. After you write the first half, a small modification to that address gets you ready to draw the second half! When drawing rows, for example: if you crossed the edge of the name table and entered the other one, you should flip the bit in the address that selects between the 2 name tables (so, if you updated name table 1, now you'll update name table 0). The other little modification is to clear all the bits that select the X coordinate, because since you just entered a new name table, you'll surely start updating it from the far left.

What I'm saying is that you don't have to spread your update across 2 frames, since with this simple modification of the address you can find the address to where the rest of the tiles should go.
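
As a rough sketch of that address fix for a row (not the actual engine code): this assumes the 16-bit address used for the first half is kept in a hypothetical variable RowAddress, and that the two name tables sit side by side, so bit 10 of the PPU address (bit 2 of the high byte) selects between them.

```asm
	;RowAddress+0/+1 is a hypothetical variable holding the address
	;that was written to $2006 for the first half of the row
	lda RowAddress+1
	eor #$04		;flip address bit 10: switch to the other name table
	sta RowAddress+1
	sta $2006		;high byte goes first
	lda RowAddress+0
	and #$e0		;clear the 5 X bits: start at tile column 0
	sta RowAddress+0
	sta $2006		;then the low byte
```

For a column the same idea would apply to the vertical name table bit ($08 of the high byte) and to the Y bits of the address instead.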

Now, if your problem is speed, it most likely is because you are drawing the tiles with a loop. Loops are slow, and for maximum speed I use a series of LDA & STA instead of loops. You may ask how I can do this if I don't know how many tiles will go to each name table... the answer is pretty simple... a jump table. So, my drawing "loop" looks something like this:

Code: Select all

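DrawMetatiles:
	;Y holds the number of metatiles to draw
	;X holds the buffer offset (explained below)
	lda SkipDrawLo, y
	sta TempAddress+0
	lda SkipDrawHi, y
	sta TempAddress+1
	;Skip a number of metatiles
	;(TempAddress must not sit at $xxFF, or the indirect JMP reads the wrong high byte)
	jmp (TempAddress)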
Draw16:
	lda TileBufferA-16, x
	sta $2007
	lda TileBufferB-16, x
	sta $2007
Draw15:
	lda TileBufferA-15, x
	sta $2007
	lda TileBufferB-15, x
	sta $2007

(...)

Draw02:
	lda TileBufferA-02, x
	sta $2007
	lda TileBufferB-02, x
	sta $2007
Draw01:
	lda TileBufferA-01, x
	sta $2007
	lda TileBufferB-01, x
	sta $2007
Draw00:
	rts
Two tables hold the addresses to jump to, depending on how many metatiles I have to draw:

Code: Select all

SkipDrawLo:
	.db <Draw00, <Draw01, <Draw02, <Draw03, <Draw04, (...)
SkipDrawHi:
	.db >Draw00, >Draw01, >Draw02, >Draw03, >Draw04, (...)
The routine must be called twice, once for each half. Note that because of the "Draw00" label, you can always assume the tiles are divided, because even if they aren't, there will be no harm done.
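
To put a rough number on the loop-versus-unrolled difference mentioned above, here is what a looped version of the same copy might look like. This is only a comparison sketch: the label DrawMetatilesLoop and the register use are assumptions, and the cycle counts assume the indexed loads and the branch don't cross a page.

```asm
	;Looped equivalent, for comparison only
	;X: index of the first metatile, Y: number of metatiles (at least 1)
DrawMetatilesLoop:
	lda TileBufferA, x	;4 cycles
	sta $2007		;4 cycles
	lda TileBufferB, x	;4 cycles
	sta $2007		;4 cycles
	inx			;2 cycles
	dey			;2 cycles
	bne DrawMetatilesLoop	;3 cycles when taken
	rts
```

That comes to about 21 cycles per metatile, against 16 for each unrolled LDA/STA group in the jump table version, so a 17-metatile row would save somewhere around 85 cycles of vblank time per pass.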

Then there is the value of X... This is a big part of the trick: the first time the routine is called, it should be the number of metatiles you want to draw. So if you wanted to draw 4 metatiles, X would be 4. The jump would send you directly to the "Draw04" label, where the value at "TileBufferA-04, x" is loaded. If X is 4, the address will be TileBufferA, which is the beginning of the buffer, and this is exactly what we want.

The second time, X should be whatever makes the last copy command see the last slot of your buffer. Since this is a row of 17 metatiles, the last slot is numbered 16, and for that last address evaluation (TileBufferA-01, x) to be 16, X must be 17. So, the calls to the drawing routine will look like this:

Code: Select all

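	;WRITE THE ADDRESS TO $2006 HERE!
	ldy TileCount0	;metatiles in the first half
	ldx TileCount0	;first load lands on slot 0
	jsr DrawMetatiles
	;MODIFY THE ADDRESS AND WRITE TO $2006 HERE!
	ldy TileCount1	;metatiles in the second half
	ldx #$11	;17, so the last load lands on slot 16
	jsr DrawMetatiles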
There you have it, the secret for my fast scrolling! =) Of course, since I draw full metatiles, I actually call the drawing routine 4 times for a row, and 4 times for a column, for a total of 8 calls if both rows and columns are being rendered, and the value sent in X is more complex because it selects between rows and columns, first half or second half, left or right side, etc. But it's still pretty fast.
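
For completeness, the two counts passed in the calls above could be derived along these lines. This is a sketch under assumptions: StartColumn is a hypothetical variable holding the metatile column (0-15) where the row starts, and a name table is taken to be 16 metatiles wide (16x16 pixel metatiles).

```asm
	;StartColumn is hypothetical: metatile column (0-15) where the row begins
	lda #16
	sec
	sbc StartColumn
	sta TileCount0		;metatiles up to the right edge of this name table
	ldx StartColumn
	inx
	stx TileCount1		;the remaining 17 - TileCount0 go to the other one
```

With StartColumn = 12, for example, TileCount0 is 4 and TileCount1 is 13: the first call copies slots 0-3, and the second call (X = 17, entering at Draw13) starts at TileBufferA-13+17, which is slot 4, and ends at slot 16.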