2bpp bullet rendering
Posted: Tue Feb 09, 2016 10:28 pm
by psycopathicteen
I have an idea for rendering bullets for a bullet hell shmup. Store 2bpp bullet sprite patterns and their transparency masks in the ROM at 8 horizontal scroll values. One row of 8 pixels would work like this:
lda buffer,y
and mask,x
ora pattern,x
sta buffer,y
That's already 8 pixels in ~20 cycles. I'm not too sure if that's enough though.
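For illustration, here is a rough Python model of that four-instruction row, meant only as a sketch. It assumes 16-bit registers, so one load/store covers a whole 2bpp tile row (plane 0 in the low byte, plane 1 in the high byte), and it assumes a shifted bullet spills into the tile column to the right, which takes a second and/or pass. The names `split`, `preshift`, `draw_row` and `col_stride` are hypothetical, not from the post.

```python
def spread(bits8):
    """Copy an 8-bit row into both bitplanes of a 16-bit 2bpp word."""
    return bits8 | (bits8 << 8)

def split(bits8, shift):
    """Shift an 8-pixel row right by `shift` pixels; return (left, right) bytes."""
    wide = (bits8 << 8) >> shift
    return (wide >> 8) & 0xFF, wide & 0xFF

def preshift(plane0, plane1, opaque):
    """Build [(mask, pattern), (mask, pattern)] for each of the 8 offsets.

    Bit 7 is the leftmost pixel. `opaque` has a 1 for every solid pixel.
    Entry 0 of each pair is the left tile column, entry 1 the right one.
    """
    table = []
    for shift in range(8):
        keep = split(opaque, shift)
        p0 = split(plane0 & opaque, shift)
        p1 = split(plane1 & opaque, shift)
        table.append([
            (spread(~keep[col] & 0xFF), p0[col] | (p1[col] << 8))
            for col in (0, 1)
        ])
    return table

def draw_row(buffer, index, table, shift, col_stride=8):
    """Model of lda/and/ora/sta on a list of 16-bit words.

    col_stride is an assumption: 8 words (16 bytes) separate the same row
    of two horizontally adjacent 2bpp tiles stored back to back.
    """
    for col, (mask, pattern) in enumerate(table[shift]):
        word = buffer[index + col * col_stride]
        buffer[index + col * col_stride] = (word & mask) | pattern
```

The table costs ROM space (8 offsets, two columns each, mask plus pattern), which is the trade the original idea makes to avoid shifting at run time.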
Re: 2bpp bullet rendering
Posted: Wed Feb 10, 2016 12:02 am
by darryl.revok
I'm not sure if I understand the idea for this. If your bullets are sprites, why do you need to update the pattern? Wouldn't it be easier to have a single sprite shared between as many bullets as you want, and move the bullets on-screen via object position?
You're going to need a value with which to calculate collisions anyway. Perhaps I'm missing something, but the explanation is very vague.
It could be that I'm missing something because this is SNES and not NES, but you can still move sprites on SNES, right?
If you can figure out what they did for Aero Fighters, from what I've seen I feel like that game has probably the best bullet:slowdown ratio for the console.
Re: 2bpp bullet rendering
Posted: Wed Feb 10, 2016 12:21 am
by Drew Sebastino
darryl.revok wrote:I'm not sure if I understand the idea for this. If your bullets are sprites, why do you need to update the pattern?
But they aren't sprites.

It's BG3 being used as a screen with a bunch of bullets being "painted" on it. The reason you'd do this is to avoid the sprite limit and the sprite pixel per scanline limit. It's really a shame that oam can't just be updated during vblank, because I think sprites are being drawn to a linebuffer then?
Wait... Couldn't you just write to oam during active display? I must be missing something, because this is way too obvious. I just don't see how oam could be used during hblank and active display, because I thought I remembered hearing about how sprites are drawn and then BGs or something like that.
Sorry for derailing this already. Sprites are really one of those situations where I'm not completely sure why they aren't just meant to be CPU driven, as in it wouldn't just have the same number of sprites per scanline as the total and you'd just multiplex it. Doesn't the Amiga actually work this way? Oh wait, I just said I was sorry for interrupting this.

Re: 2bpp bullet rendering
Posted: Wed Feb 10, 2016 12:22 am
by Sik
Honestly, if I were to make a bullet hell for a 4th gen platform, I'd try to come up with regular patterns that are easy to recreate (e.g. by using tilemaps). Bonus in that it simplifies collision calculations (e.g. if there's a row of bullets, first check if the ship is inside the row, then if within the loop it's a bullet or a gap)
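A minimal sketch of that two-step check, assuming a horizontal row of evenly spaced square bullets and a single hit point for the ship. The function name and all parameters are made up for illustration.

```python
def hits_bullet_row(ship_x, ship_y, row_x, row_y, count, spacing, size):
    """Return True if the point (ship_x, ship_y) touches a bullet in the row.

    The row starts at (row_x, row_y) and holds `count` square bullets of
    `size` pixels, one every `spacing` pixels (spacing >= size).
    """
    # Step 1: is the ship inside the row at all?
    if not (row_y <= ship_y < row_y + size):
        return False
    offset = ship_x - row_x
    if not (0 <= offset < count * spacing):
        return False
    # Step 2: inside the row, is this position a bullet or a gap?
    return offset % spacing < size
```

The cost is the same whether the row holds 4 bullets or 40, which is the appeal over testing each bullet separately.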
Re: 2bpp bullet rendering
Posted: Wed Feb 10, 2016 12:32 am
by UnDisbeliever
psycopathicteen wrote:That's already 8 pixels in ~20 cycles. I'm not too sure if that's enough though.
Umm, according to the bsnes/higan source it would be 24 cycles if the Accumulator & Index registers are 16 bits long.
Anyway, I thought about coding a bullet hell game. Let me find my notes.
There would be 2 buffers, one for player bullets, one for enemy bullets. Each buffer would be a 1bpp bitmap, 256x192 px in size (6144 bytes). Bullets would be a single pixel in size.
Some VMAIN magic would combine the two buffers into a single 2bpp tileset.
Transfer One: player bullet buffer DMA DMAP_TRANSFER_1REG to VMDATAL with VMAIN set to $04 (increment on VMDATAL, 8 bit address shift).
Transfer Two: enemy bullet buffer DMA DMAP_TRANSFER_1REG to VMDATAH with VMAIN set to $84 (increment on VMDATAH, 8 bit address shift).
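As a sketch of what the two transfers end up producing, the Python below models only the result in VRAM: each word gets the player byte in its low half and the enemy byte in its high half. It does not model the address remapping that VMAIN performs, and the function names are hypothetical.

```python
def combine_2bpp(player, enemy):
    """Interleave two 1bpp buffers: low byte = player, high byte = enemy."""
    assert len(player) == len(enemy)
    return [p | (e << 8) for p, e in zip(player, enemy)]

def pixel_colour(word, bit):
    """2bpp colour index of one pixel; `bit` selects the pixel within the row."""
    plane0 = (word >> bit) & 1
    plane1 = (word >> (8 + bit)) & 1
    return plane0 | (plane1 << 1)
```

With this layout a player bullet shows as colour 1, an enemy bullet as colour 2, and a pixel where both overlap as colour 3.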
I never actually implemented this. My napkin-math suggested that I would not have been able to fit 250 bullets and 10 enemies onto the screen at 30fps (I lost that sheet and I can't remember how I reached that conclusion).
Draw Bullet Code:
Code:
.A8
.I16
; DP = bullet address
LDA z:Bullet::xPos
AND #$07
TAY ; with 16-bit index registers TAY copies all 16 bits, so the hidden B byte must be 0 here
LDA z:Bullet::xPos
LSR
LSR
LSR
STA tmp
REP #$30
.A16
LDA z:Bullet::yPos
AND #$00FF
XBA
LSR
LSR
LSR
; C always clear
; value at address tmp+1 is always 0
ADC tmp
TAX
; X = yPos * 32 + xPos / 8
; Y = (xPos & 7)
SEP #$20
.A8
LDA buffer, X
ORA SetBulletTable, Y
STA buffer, X
Code:
SetBulletTable:
.repeat 8, i
.byte 1 << i
.endrepeat
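The address arithmetic in the draw code can be checked with a small Python model. This is only a sketch of the same calculation: `plot_bullet` is a hypothetical name, and the bit order follows SetBulletTable as posted.

```python
WIDTH, HEIGHT = 256, 192
ROW_BYTES = WIDTH // 8                          # 32 bytes per row of pixels

SET_BULLET_TABLE = [1 << i for i in range(8)]   # same values as SetBulletTable

def plot_bullet(buffer, x, y):
    """Set the bit for pixel (x, y) in a 1bpp buffer; return (offset, bit)."""
    offset = y * ROW_BYTES + (x >> 3)   # byte offset into the buffer
    bit = x & 7                         # index into the bit table
    buffer[offset] |= SET_BULLET_TABLE[bit]
    return offset, bit

buffer = bytearray(ROW_BYTES * HEIGHT)  # 6144 bytes, as stated in the post
```

For example, `plot_bullet(buffer, 10, 3)` touches byte 3 * 32 + 1 = 97 and sets bit 2.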
Collision code would have been pixel perfect:
Code:
.A16
.I16
Check_8x8Collision:
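; X = frame collision data offset + (xPos & 7) * 2
; Y = yPos * 32 + xPos / 8
.repeat 8, i
LDA buffer + i * 32, Y
; each row of collision data is 8 pre-shifted words = 16 bytes
AND frameCollisionData + i * 8 * 2, X
BNE CollisionOccoured
.endrepeat
; no collision: return or branch away here, otherwise execution falls through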
CollisionOccoured:
; collision code
Code:
; CollisionData
; -------------
.macro _buildRow data
.repeat 8, i
.word data << i
.endrepeat
.endmacro
CollisionDataFrame1:
_buildRow %00011000
_buildRow %00011000
_buildRow %00111100
_buildRow %00111100
_buildRow %00111100
_buildRow %00111100
_buildRow %01111110
_buildRow %11111111
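To make the collision test concrete, here is a Python model of the same idea: eight 16-bit reads from the bullet buffer, each ANDed with the sprite row pre-shifted by xPos & 7. It is a sketch under the post's own conventions (32 bytes per buffer row, `row << shift` for the shifted masks). The function names are hypothetical, and the caller is assumed to keep the sprite inside the buffer.

```python
ROW_BYTES = 32

def build_frame(rows):
    """Mirror _buildRow: for each 8-bit row, the 8 words `row << shift`."""
    return [[row << shift for shift in range(8)] for row in rows]

def read_word(buffer, offset):
    """16-bit little-endian read, like LDA with a 16-bit accumulator."""
    return buffer[offset] | (buffer[offset + 1] << 8)

def check_8x8_collision(buffer, frame, x, y):
    """True if any bullet bit overlaps the 8x8 frame placed at (x, y)."""
    offset = y * ROW_BYTES + (x >> 3)
    shift = x & 7
    for i in range(8):
        if read_word(buffer, offset + i * ROW_BYTES) & frame[i][shift]:
            return True
    return False

FRAME1 = build_frame([0x18, 0x18, 0x3C, 0x3C, 0x3C, 0x3C, 0x7E, 0xFF])
```

Because each mask is a full word, a sprite that straddles two byte columns is still tested with one read per row.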
EDIT: Added info about combining buffers.
Re: 2bpp bullet rendering
Posted: Wed Feb 10, 2016 6:08 pm
by psycopathicteen
I was basically thinking of doing this.
Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it. (Or underneath it, wait I need to check how priorities work again) The whole game would probably run at 30fps, alternating between updating the normal sprites and backgrounds, and updating bullets. It would need the screen to be cropped at 184 pixels in order to fit a whole 2bpp screen in one frame.
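The 184-line figure can be sanity-checked with some rough arithmetic. This sketch rests on two assumptions that are not in the post: an NTSC frame of 262 scanlines, and DMA moving roughly 165 bytes per scanline while the display is blanked. It also ignores every other transfer competing for that time, so treat it as a ballpark only.

```python
SCANLINES_PER_FRAME = 262   # assumption: NTSC
DMA_BYTES_PER_LINE = 165    # assumption: approximate DMA throughput per scanline

def screen_bytes(visible_lines, width=256, bpp=2):
    """Size in bytes of a full bitmap layer with the given height."""
    return width * visible_lines * bpp // 8

def dma_budget(visible_lines):
    """Bytes that fit in the blanked lines of one frame."""
    return (SCANLINES_PER_FRAME - visible_lines) * DMA_BYTES_PER_LINE

def fits_in_one_frame(visible_lines):
    return screen_bytes(visible_lines) <= dma_budget(visible_lines)
```

Under these assumptions a 184-line 2bpp screen is 11776 bytes against a budget of 12870, while the full 224 lines would need 14336 bytes against only 6270.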
Re: 2bpp bullet rendering
Posted: Wed Feb 10, 2016 9:52 pm
by psycopathicteen
This is 256 bullets moving at 20fps.
Re: 2bpp bullet rendering
Posted: Thu Feb 11, 2016 12:14 am
by UnDisbeliever
psycopathicteen wrote:This is 256 bullets moving at 20fps.
Nice.
The movements are smoother than I expected them to be.
Are you going to do more with this?
Re: 2bpp bullet rendering
Posted: Thu Feb 11, 2016 8:13 am
by psycopathicteen
UnDisbeliever wrote:Umm, according to the bsnes/higan source it would be 24 cycles if the Accumulator & Index registers are 16 bits long.
At first I didn't know what you were talking about, but then I found this at
http://www.defence-force.org/computing/ ... /annexe_2/:
3) Add 1 cycle if adding index crosses a page boundary
I seriously never knew that. Surprisingly the long index addressing doesn't have a similar limitation. I wonder if that was just something left over from the 6502, because I don't see why a CPU with a 16-bit ALU would need to do that.
Re: 2bpp bullet rendering
Posted: Thu Feb 11, 2016 5:51 pm
by Drew Sebastino
psycopathicteen wrote:I was basically thinking of doing this. Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it.
I think 93143 is doing the same thing.
psycopathicteen wrote: It would need the screen to be cropped at 184 pixels in order to fit a whole 2bpp screen in one frame.
Why not double buffer?
Re: 2bpp bullet rendering
Posted: Thu Feb 11, 2016 6:43 pm
by Sik
Presumably you need memory for everything else too. (also could be referring to transfer bandwidth)
Also I was thinking, most bullet hells are vertical. You could probably just use that as an excuse to render only half the screen (the extra space would presumably be used for the HUD).
Re: 2bpp bullet rendering
Posted: Thu Feb 11, 2016 9:56 pm
by UnDisbeliever
psycopathicteen wrote:I seriously never knew that. Surprisingly the long index addressing doesn't have a similar limitation. I wonder if that was just something left over from the 6502, because I don't see why a CPU with a 16-bit ALU would need to do that.
It is because absolute indexed addressing can increment the bank when the index crosses the bank boundary. This means that 2 processing cycles are needed to perform the 24 bit addition with the 65816's 16 bit ALU.
The 65816 first performs an 8 bit addition between the low byte of the address and the low byte of the index when it reads ADDR.H.
In the next cycle it performs a 16 bit addition between the 16 bit DB:ADDR.H and IH.
If the page boundary is never crossed (8 bit index && carry of {ADDR.L + I} is 0) then DB:ADDR.H is unchanged and the addition is skipped, saving an unneeded cycle.
(source)
With absolute long addressing the second addition is processed in the half-cycle after the bank byte is read from memory and will not save a cycle if skipped.
EDIT: added source, reordered sentences.
Re: 2bpp bullet rendering
Posted: Fri Feb 12, 2016 12:29 pm
by psycopathicteen
Add 1 cycle for indexing across page boundaries, or write, or X=0
Do they mean X=0 as in the status register bit that controls the size of the index registers? So does that mean that it always takes an extra cycle when the index registers are 16-bit?
Re: 2bpp bullet rendering
Posted: Fri Feb 12, 2016 9:40 pm
by UnDisbeliever
psycopathicteen wrote:Add 1 cycle for indexing across page boundaries, or write, or X=0
Do they mean X=0 as in the status register bit that controls the size of the index registers? So does that mean that it always takes an extra cycle when the index registers are 16-bit?
Yes, that extra cycle always occurs when the Index registers are 16 bit.
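Putting the rule quoted above into a small Python model shows where the 24 comes from. This is a sketch that counts CPU cycles only, for absolute,X / absolute,Y addressing; it does not model memory speed or master clocks, and the function names are made up.

```python
def indexed_cycles(write, a16, index16, page_cross=False):
    """CPU cycles for one absolute,X or absolute,Y access on the 65816."""
    cycles = 4                      # opcode, two address bytes, one data byte
    if a16:
        cycles += 1                 # second data byte with a 16-bit accumulator
    if write or index16 or page_cross:
        cycles += 1                 # the extra indexing cycle discussed above
    return cycles

def row_cycles(a16, index16, page_cross=False):
    """Cycles for the lda / and / ora / sta sequence from the first post."""
    reads = 3 * indexed_cycles(False, a16, index16, page_cross)
    return reads + indexed_cycles(True, a16, index16)
```

With everything 16-bit each instruction costs 6 cycles, giving 24 in total. With 8-bit registers and no page crossing the same sequence costs 4 + 4 + 4 + 5 = 17, but then it only covers one bitplane byte per pass.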
Re: 2bpp bullet rendering
Posted: Fri Feb 12, 2016 9:58 pm
by 93143
Espozo wrote:psycopathicteen wrote:I was basically thinking of doing this. Making normal objects act like a normal game with hardware sprites, but also have a layer of software bullets on top of it.
I think 93143 is doing the same thing.
Yeah, for a port of an existing game. But I'm not doing software rendering on the S-CPU, partly because of the sheer number, size, and colour depth of bullets in the original, and partly because I got stubborn about look&feel and blew 3/4 of my CPU budget on raster effects. I'm using the Super FX chip for bullets and collisions, which moves me out of direct competition with anything that doesn't need a coprocessor.
Also, it's been almost two years and I still haven't done a bullet test. Advantage: not me...