
Re: 2bpp bullet rendering

Posted: Sat Feb 13, 2016 11:08 pm
by 93143
psycopathicteen wrote:Should I make the picture full screen?
Wouldn't that limit you to 20 fps at best? Frankly it looks a little choppy to me - still impressive, but a bit suboptimal for a shooter where tracking these bullets by eye is the whole point. If you ended up wanting a pattern with a huge number of fairly slow bullets, you could always drop to 20 temporarily...

With 256x208, you could easily do 30 fps with a couple KB free every frame to do other stuff in. With 256x216, you could still do 30 fps, but there's not much room left for anything besides updating OAM, and even that's getting tight... A fully-occluding status bar reduces the amount of data to transfer, but doesn't help DMA bandwidth - 256x208 plus an 8-line status bar leaves less than a KB per frame...
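The budget figures above can be roughly reproduced with a back-of-the-envelope model. This is only a sketch under stated assumptions: 262 scanlines per NTSC frame, the commonly quoted figure of about 165.5 DMA bytes per scanline, forced blanking on every line outside the active area, and no per-transfer overhead. The function names are made up for illustration.

```python
# Rough VBlank DMA budget for an NTSC SNES (approximate constants, see assumptions above).
LINES_PER_FRAME = 262        # NTSC scanlines per frame
DMA_BYTES_PER_LINE = 165.5   # commonly quoted approximation, ignores DMA setup overhead


def buffer_bytes(width, height, bpp=2):
    """Size in bytes of a width x height bitmap at the given bit depth."""
    return width * height * bpp // 8


def spare_bytes_per_frame(width, picture_lines, active_lines=None, frames_per_update=2):
    """Bytes of DMA time left per frame after uploading the picture.

    picture_lines: height of the software-rendered picture.
    active_lines:  lines the PPU actually displays (picture plus any status bar).
    frames_per_update: 2 means 30 fps, 3 means 20 fps.
    A negative result means the picture cannot be uploaded at that rate.
    """
    if active_lines is None:
        active_lines = picture_lines
    blank_lines = LINES_PER_FRAME - active_lines
    budget = blank_lines * DMA_BYTES_PER_LINE * frames_per_update
    return (budget - buffer_bytes(width, picture_lines)) / frames_per_update


for label, args in [
    ("256x208 at 30 fps", dict(width=256, picture_lines=208)),
    ("256x216 at 30 fps", dict(width=256, picture_lines=216)),
    ("256x208 + 8-line status bar", dict(width=256, picture_lines=208, active_lines=216)),
    ("256x224 at 30 fps", dict(width=256, picture_lines=224)),
    ("256x224 at 20 fps", dict(width=256, picture_lines=224, frames_per_update=3)),
]:
    print(label, round(spare_bytes_per_frame(**args)))
```

Under these assumptions, full screen only fits at 20 fps, 256x208 leaves a bit over 2 KB per frame, and the other two cases leave less than 1 KB.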

I'd suggest trimming the edges, but I wonder if that would complicate wrap detection in the renderer...

...

Another thing I thought of: would you try to finish blitting a contiguous transfer chunk as soon as possible, so you could get a jump on the DMA, or would you finish the whole screen before starting the transfer and just eat the lag?
Espozo wrote:Wait, couldn't you use one 32x32 tilemap of 16x16 tiles and just change the vertical scroll value to switch between "two tilemaps"? I mean, in terms of data size, that's the same as a 32x32 tilemap of 8x8 tiles.
Clever... and it can be switched per-layer too, so you aren't locking yourself into anything this way. I guess this is one of those SNES features I noticed but never put much thought into, since I figured I didn't need it...

As for the partial buffering in VRAM, 30 fps has an advantage there too. You don't have to actually sustain 30 fps, but if you have enough blanking time that you theoretically could, you can do 3/2 buffering, where you only need to buffer half the screen size over and above what the display is using. You'd save a couple of KB as compared with 5/3 buffering, and several as compared with full double buffering...
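To put rough numbers on the buffering comparison, here is a small sketch. It assumes the 256x208 2bpp picture size discussed above (13312 bytes); the function name is hypothetical and the ratios are simply the ones named in the post.

```python
import math
from fractions import Fraction

SCREEN_BYTES = 256 * 208 * 2 // 8  # assumed picture size: 256x208 at 2bpp


def vram_needed(screen_bytes, ratio):
    """VRAM bytes needed when buffering `ratio` screens' worth of tile data."""
    return math.ceil(screen_bytes * ratio)


three_halves = vram_needed(SCREEN_BYTES, Fraction(3, 2))
five_thirds = vram_needed(SCREEN_BYTES, Fraction(5, 3))
double = vram_needed(SCREEN_BYTES, Fraction(2, 1))

print("3/2 buffering:", three_halves, "bytes")
print("saved vs 5/3: ", five_thirds - three_halves, "bytes")
print("saved vs 2x:  ", double - three_halves, "bytes")
```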
Generally in these games, if there are a ton of bullets, there isn't anything else to run into. (I never understood why. I always thought it would be neat to do a duck and cover sort of thing.)
It would be cool, but it would require the terrain collision routine to be run for every single bullet.

Re: 2bpp bullet rendering

Posted: Sat Feb 13, 2016 11:42 pm
by Drew Sebastino
93143 wrote:Clever...
I'm good at coming up with ideas, just not implementing them. :lol: Irrelevant, but I made the realization that if you were to have a 4096 color image on the SNES using color math between an 8bpp and a 4bpp layer, then you'd only need one tilemap.

Actually, back to the discussion: it's still crazy, but like I said, the collision for the bullets could be a single point. Really, BG collision would be easy, that is, if it weren't being run several hundred times... Because bosses in these games often fire the most bullets and are more often than not around obstacles, you could switch between checking for BG collision and not when appropriate. Actually, you could do it for everything.

Off topic, but I really want to know how they handled BG collision here, even if it is slow as all get out:

Image

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 1:01 am
by 93143
Espozo wrote:I made the realization that if you were to have a 4096 color image on the SNES using color math between an 8bpp and a 4bpp layer, then you'd only need one tilemap.
Well, 7/16 of a tilemap. If the rest of it never shows up on screen, you can put tile data in it. Meaning a 224x192 still image (perfect 4:3 with 16-pixel borders all around) is feasible at 12bpp.

Hang on; that's just 3/8 of the tilemap, even if you only count free space if tiles will fit in it... I suppose you could fit a few sprite tiles into the remaining 256 bytes...

I figured this subject wasn't likely to trigger an extended digression... I'll shut up now...

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 1:38 am
by Sik
93143 wrote:With 256x208, you could easily do 30 fps with a couple KB free every frame to do other stuff in. With 256x216, you could still do 30 fps, but there's not much room left for anything besides updating OAM, and even that's getting tight... A fully-occluding status bar reduces the amount of data to transfer, but doesn't help DMA bandwidth - 256x208 plus an 8-line status bar leaves less than a KB per frame...
Try 256x192, some MD games use that. There's a good reason: it pretty much doubles blanking time in NTSC, but the borders still aren't very noticeable, especially with the image centered (and overscan will eat part of the borders as well). That could be worth trying.
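As a quick sanity check on the "pretty much doubles" claim, assuming 262 scanlines per NTSC frame and a default active height of 224 lines:

```python
NTSC_LINES = 262  # assumed scanlines per NTSC frame


def blank_lines(active_lines):
    """Scanlines per frame that are not being displayed."""
    return NTSC_LINES - active_lines


ratio = blank_lines(192) / blank_lines(224)
print(blank_lines(224), blank_lines(192), round(ratio, 2))
```

That is 38 blank lines against 70, a bit under twice as many.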

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 3:42 pm
by psycopathicteen
What would really speed up rendering would be to have the graphics embedded into the code itself as immediate values.

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 3:58 pm
by Drew Sebastino
I really kind of wondered why you didn't do that, but I thought maybe you had a specific reason which was why I asked how it worked.

Again though, how do you know when to draw an extra tile horizontally?

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 5:25 pm
by tepples
Espozo wrote:how do you know when to draw an extra tile horizontally?
if (bullet_left % tile_width) + bullet_width - 1 >= tile_width then you need two tiles across.

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 5:27 pm
by Drew Sebastino
I don't have a clue what that means, but I was also wondering if you could also use a lookup table for that and see if it would go faster. For this, I'd do the craziest things just to save a couple of cycles.

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 6:17 pm
by tepples
bullet_left: The X coordinate of the left side of a bullet
tile_width: 8 on most platforms, 4 if trying to use 16-bit reads and writes on 4bpp packed pixel platforms such as Genesis or GBA, 7 on oddball platforms such as Apple II
bullet_left % tile_width: The remainder when dividing bullet_left by tile_width, which will be in the range 0 through 7. Interpret it as the distance in pixels from the left side of the first tile that the bullet occupies.
bullet_width: The width of the bullet in pixels, such as 5
bullet_width - 1: The distance from the center of the leftmost pixel of a bullet to the center of its rightmost pixel
(bullet_left % tile_width) + bullet_width - 1: The distance in pixels from the left side of the first tile to the rightmost pixel of the bullet
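(bullet_left % tile_width) + bullet_width - 1 >= tile_width: Whether this distance is at least one tile width (one byte per bitplane row when tile_width is 8), meaning the bullet spills into a second tile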

Let's try an example, with bullet_left=101 and bullet_width=5
bullet_left % tile_width = 5, meaning that the bullet starts 5 pixels from the left side of the tile
bullet_left % tile_width + bullet_width - 1 = 5 + 5 - 1 = 9, meaning that the bullet's rightmost pixel is 9 pixels from the left side of the leftmost tile containing the bullet.
Because this distance is at least 8 pixels (the width of a tile), the bullet will occupy two tiles.

Now move this bullet a bit, with bullet_left=97 and bullet_width=5
bullet_left % tile_width = 1, meaning that the bullet starts 1 pixel from the left side of the tile
bullet_left % tile_width + bullet_width - 1 = 1 + 5 - 1 = 5, meaning that the bullet's rightmost pixel is 5 pixels from the left side of the leftmost tile containing the bullet.
Because this distance is less than 8 pixels (the width of a tile), the bullet will occupy only one tile.

These two inequalities are equivalent:
bullet_left % tile_width + bullet_width - 1 >= tile_width
bullet_left % tile_width + bullet_width >= tile_width + 1
The first means that the last pixel goes into a new tile. The second means that the right edge of the last pixel is past the end of the first tile.

The following subroutine calculates the second inequality and puts the result in carry:

Code:

; as usual, capitals denote constants
TILE_WIDTH = 8
MOD_TILE_WIDTH = TILE_WIDTH - 1
.assert TILE_WIDTH & MOD_TILE_WIDTH = 0, error, "TILE_WIDTH must be a power of 2 to calculate remainders with AND"

.proc bullet_x_needs_two_tiles
  lda bullet_left,x
  and #MOD_TILE_WIDTH  ; A = bullet_left % tile_width
  clc
  adc bullet_width,x  ; A = (bullet_left % tile_width) + bullet_width
  cmp #TILE_WIDTH + 1
  rts
.endproc
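
For readers who don't follow 65816 assembly, the same test can be sketched in Python. This is an illustrative model, not part of the subroutine above. The table variant is one way to read the lookup-table idea raised earlier in the thread; `two_tile_table` is a hypothetical name, and it only helps if every bullet has the same width.

```python
TILE_WIDTH = 8
# Same power-of-two requirement as the assembly version, so AND can stand in for %.
assert TILE_WIDTH & (TILE_WIDTH - 1) == 0


def needs_two_tiles(bullet_left, bullet_width):
    """True if the bullet's rightmost pixel falls in the next tile over."""
    offset = bullet_left & (TILE_WIDTH - 1)  # bullet_left % TILE_WIDTH
    return offset + bullet_width >= TILE_WIDTH + 1


def two_tile_table(bullet_width):
    """Precompute the answer for each of the possible sub-tile offsets."""
    return [needs_two_tiles(offset, bullet_width) for offset in range(TILE_WIDTH)]


print(needs_two_tiles(101, 5))  # first worked example
print(needs_two_tiles(97, 5))   # second worked example
print(two_tile_table(5))
```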

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 8:11 pm
by psycopathicteen
Setting $2115 to $84 allows you to DMA 2bpp tiles as if they are arranged in a bitmap. The buffer is arranged from left to right, up to down, with each pair of bytes representing an 8x1 sliver.
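
A sketch of what that address remapping does, as I understand the 2bpp translation mode: the low 8 bits of the VRAM word address, laid out as BBBccccc, are rotated to cccccBBB. Here BBB is the row within a tile and ccccc is the tile column. The function names are hypothetical, and the example assumes the buffer is uploaded to VRAM word address 0.

```python
def linear_word_index(x, y):
    """Word index of the 8x1 sliver containing pixel (x, y) in a 256-wide bitmap buffer."""
    return y * 32 + (x >> 3)  # 32 slivers (64 bytes) per scanline


def remap_2bpp(word_addr):
    """Model of the 2bpp address translation: aaaaaaaaBBBccccc -> aaaaaaaacccccBBB."""
    high = word_addr & 0xFF00
    row_in_tile = (word_addr >> 5) & 0x07
    tile_column = word_addr & 0x1F
    return high | (tile_column << 3) | row_in_tile


# Pixel (23, 5) sits in tile column 2, row 5 of that tile.
# A 2bpp tile is 8 words, so it should land at word 2*8 + 5.
print(remap_2bpp(linear_word_index(23, 5)))
```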

Here is 30fps, 256x216, with 256 4x4 "sprites." I think they look a little too tiny.

Re: 2bpp bullet rendering

Posted: Sun Feb 14, 2016 8:53 pm
by Drew Sebastino
psycopathicteen wrote:Here is 30fps, 256x216
Now we're talking! :D
psycopathicteen wrote: I think they look a little too tiny.
I agree. Try this: (I tried 6x6, but I couldn't get a convincing circle)
[Attachment: 5x5 Bullet.png (193 bytes)]
You know though, I still don't necessarily understand how you're finding the tiles (although that's changed now that you said it acts like a bitmap), but I tried my own code where you'd index two different tables by the x and y positions, and it kind of exploded... :lol: I must seriously be doing something wrong... I pretty much just doubled the amount of data for the actual drawing code just to get rid of one "tay"...

Code:

  rep #$30	;A=16, X/Y=16
  ldx BulletYPosition
  sec
  sbc #BulletHeight-1
  cmp #ScreenWidth
  beq done
  lda BulletYPositionBufferOffsetTable,x
  tax
  jsr (StartOfBulletYPositionCode,x)

bullet_x_position_code_finder:
  ldx BulletXPosition
  sec
  sbc #BulletWidth-1
  cmp #ScreenWidth
  beq done
  lda BulletXPositionCodeJumpTable,x
  tax
  jsr (StartOfBulletXPositionCode,x)

;============================================================
bullet_y_position=5_start:
  ldy #$0200
  ldx #$0200+BulletHeight*32
  stx EndOfBullet
  ldx #$0200+8
  stx BottomOfTile

;============================================================
bullet_x_position=23_start:
  sep #$20	;A=8
  lda Buffer+4,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4,y

  tya
  clc
  adc #$08
  cpy BottomOfTile
  beq bullet_x_position=23_next_tile_start
  cpy EndOfBullet
  beq bullet_x_position=23_loop
  rts

bullet_x_position=23_loop:
  tay
  lda Buffer+4,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4,y

  tya
  clc
  adc #$08
  cpy BottomOfTile
  beq bullet_x_position=23_next_tile_start
  cpy EndOfBullet
  bne bullet_x_position=23_loop
  rts

bullet_x_position=23_next_tile_start:
  lda Buffer+4+128,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4+128,y

  tya
  clc
  adc #$08
  cpy EndOfBullet
  bne bullet_x_position=23_next_tile_loop
  rts

bullet_x_position=23_next_tile_loop:
  tay
  lda Buffer+4+128,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4+128,y

  tya
  clc
  adc #$08
  cpy EndOfBullet
  bne bullet_x_position=23_next_tile_loop
  rts
Edit: hey, psycopathicteen, you saw the discussion 93143 and I were having about tilemap data and stuff, right? I'm just wondering because you could make the tilemap half as large if you use 16x16 tiles and move the tilemap up and down to "switch" it. Also, what's with the gaps between the buffers?

Re: 2bpp bullet rendering

Posted: Mon Feb 15, 2016 8:42 am
by Drew Sebastino
Because you said you could have it like a regular bitmap, I thought I'd improve the disastrous code I made earlier: (It's a bullet that's 5 pixels tall.)

Code:

  rep #$30   ;A=16, X/Y=16
  ldx BulletYPosition
  sec
  sbc #BulletHeight-1
  cmp #ScreenWidth
  beq done
  lda BulletYPositionBufferOffsetTable,x
  tay

bullet_x_position_code_finder:
  ldx BulletXPosition
  sec
  sbc #BulletWidth-1
  cmp #ScreenWidth
  beq done
  lda BulletXPositionCodeJumpTable,x
  tax
  jsr (StartOfBulletXPositionCode,x)

;============================================================
bullet_x_position=23_start:

  lda Buffer+4,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4,y

  lda Buffer+4+64,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4+64,y

  lda Buffer+4+128,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4+128,y

  lda Buffer+4+192,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4+192,y

  lda Buffer+4+256,y
  and #BulletMask>>7
  ora #BulletPattern>>7
  sta Buffer+4+256,y

  rts
So if the CPU is running between 2.5 and 3.5 MHz or something, we can say it's running at 3 MHz, and 3,000,000 / 60 = 50,000 cycles per frame. I was thinking about the feasibility of running just this at 60 fps: you can't upload tiles that fast, but collision detection and whatnot will certainly bring the frame rate down, and you wouldn't want the game running at 20 fps.
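
The same arithmetic as a tiny sketch, using the 3 MHz ballpark from the post and the 256 bullets from the earlier demo. Both are assumptions taken from the thread, not measurements, and the function names are made up.

```python
def cycles_per_frame(cpu_hz, fps):
    """CPU cycles available in one frame at the given frame rate."""
    return cpu_hz // fps


def cycles_per_bullet(cpu_hz, fps, bullets):
    """Cycle budget per bullet if drawing were the only work done in a frame."""
    return cycles_per_frame(cpu_hz, fps) // bullets


print(cycles_per_frame(3_000_000, 60))        # whole-frame budget at 60 fps
print(cycles_per_bullet(3_000_000, 60, 256))  # per-bullet budget at 60 fps
print(cycles_per_bullet(3_000_000, 30, 256))  # per-bullet budget at 30 fps
```

The per-bullet figure is an upper bound, since clearing the buffer, collision detection and everything else have to come out of the same frame.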

Re: 2bpp bullet rendering

Posted: Mon Feb 15, 2016 9:14 am
by tepples
I wonder whether you could use the 8x8 bit multiplier to avoid having to store eight versions of the bullet sprite. Load the shift amount ($80, $40, $20, $10, $08, $04, $02) into $4202 before drawing a bullet. Then write each byte of the bullet sprite to $4203, and eight cycles later, read the tile out of $4217 (left tile) and $4216 (right tile). That'd let you use a wider variety of bullet sprites and even reuse the same routine for a proportional font.
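
Here is a sketch of the arithmetic behind that trick: multiplying an 8-bit row by a power of two gives a 16-bit product whose high byte is the part that stays in the left tile and whose low byte is the part that spills into the right tile. This is a Python model of the math only, not of the hardware registers or their timing, and `shift_sliver` is a hypothetical name.

```python
def shift_sliver(pattern_byte, pixel_offset):
    """Split one 8-pixel row across two tiles, pixel_offset (0-7) pixels to the right.

    Returns (left_tile_byte, right_tile_byte).
    """
    if pixel_offset == 0:
        return pattern_byte, 0           # no shift needed, nothing spills over
    factor = 1 << (8 - pixel_offset)     # $80 for 1 pixel ... $02 for 7 pixels
    product = pattern_byte * factor      # what the 8x8 multiply would produce
    return product >> 8, product & 0xFF  # high byte = left tile, low byte = right tile


print([hex(b) for b in shift_sliver(0xF8, 5)])
```

This also shows why only seven shift amounts are listed: an offset of zero needs no multiply at all.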

Collision detection with that many bullets will have to use either 1D sorting (if the current bullet is too far from the player, all subsequent bullets in the same direction will be likewise) or the 2D "sector method" described in a 1995 Dr. Dobb's article by Dave Roberts.
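
A minimal sketch of the 1D sorting idea, assuming bullets are kept as (x, y) tuples sorted by x and collision is a simple square test around the player. The Dr. Dobb's sector method isn't reproduced here, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def hits_sorted_by_x(sorted_bullets, player, radius):
    """Return bullets within `radius` of the player on both axes.

    sorted_bullets must be sorted by x. Only the slice whose x is close
    enough is examined, so every bullet outside it is skipped without
    being tested individually.
    """
    px, py = player
    lo = bisect_left(sorted_bullets, (px - radius, float("-inf")))
    hi = bisect_right(sorted_bullets, (px + radius, float("inf")))
    return [(x, y) for x, y in sorted_bullets[lo:hi] if abs(y - py) <= radius]


bullets = sorted([(10, 10), (50, 52), (52, 200), (120, 50), (49, 47)])
print(hits_sorted_by_x(bullets, player=(50, 50), radius=4))
```

On the real hardware the cost of keeping the list sorted every frame would have to be weighed against the tests saved.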

Re: 2bpp bullet rendering

Posted: Mon Feb 15, 2016 9:25 am
by Drew Sebastino
tepples wrote:I wonder whether you could use the 8x8 bit multiplier to avoid having to store eight versions of the bullet sprite.
If it's slower, then don't bother.

Re: 2bpp bullet rendering

Posted: Mon Feb 15, 2016 10:12 am
by tepples
In some cases, you might be right. One difference between the NES and the Super NES is that the memory in a Super NES Game Pak is typically about eight times as big. This is large enough to store eight copies of each bullet and each glyph in a font, each shifted by a different amount, without causing unacceptable compromises to the detail of other graphics. But you'll need to write a program that makes said eight copies of each bullet graphic, and you'll need to have your build process re-run that program every time you edit the bullet graphics.
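
A sketch of what such a build step might look like, assuming each bullet is stored as one byte per row for a single bitplane, with bit 7 as the leftmost pixel. The 5x5 pattern below is made up for illustration and is not the attached graphic. A real tool would also need to read the image file and write out assembler data.

```python
def preshift(rows, tile_width=8):
    """Return tile_width copies of the sprite, copy n shifted right by n pixels.

    Each copy is a list of (left_tile_byte, right_tile_byte) pairs, one per row.
    """
    mask = (1 << tile_width) - 1
    copies = []
    for shift in range(tile_width):
        copies.append([
            ((row >> shift) & mask, (row << (tile_width - shift)) & mask)
            for row in rows
        ])
    return copies


# Hypothetical 5x5 round bullet, left-aligned in an 8-pixel row.
BULLET_5X5 = [0x70, 0xF8, 0xF8, 0xF8, 0x70]

shifted = preshift(BULLET_5X5)
for shift, copy in enumerate(shifted):
    print(shift, [f"{left:02X}{right:02X}" for left, right in copy])
```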