Of course, this code is probably not final. I'm still not very good at Super FX.
I had also considered using a sort of data-as-code format for large quantities of small identical objects. The entire graphic would be hardcoded and require no ROM access, metadata handling, or branching. But I haven't written anything like that yet.
I've got to say, I find the 8-bit busing to be far more aggravating in the case of the Super FX than in the case of the S-CPU. What they were trying to do with the Super FX really needed more bits per word than they had...
Code:
; SINGLE-PIXEL BLITTING (slowest and most general):
to R12 ; pixel count goes in the LOOP index register
getb ; get pixel count for first line
inc R14 ; increment ROM address, triggering a buffer load
Start:
getc ; get pixel data (one byte per pixel) from ROM buffer
inc R14 ; and increment ROM address
loop ; decrement pixel count; if not zero, go to address in R13, ie: "Start"
plot ; plot pixel and increment X-counter in R1 (since the GSU is pipelined, this byte gets executed regardless)
getb ; get carriage return X-component (goes in R0)
inc R14 ; increment ROM address
with R1 ; update X-coordinate
sub R0 ; by subtracting carriage return X-component
inc R2 ; increment Y-coordinate
to R12 ; update LOOP index register
getb ; with pixel count for next line, plus one (the LOOP below decrements it before branching)
inc R14 ; increment ROM address
loop ; decrement pixel count and branch to Start if not zero
nop ; dummy fill pipeline (nothing else to do before GETC, and the ROM buffer isn't ready anyway)
; The main loop has only two bytes between INC R14 and GETC, so in high-speed mode it's probably 6 cycles rather than 4.
; Blitting a sliver in 4bpp is probably at least 40 cycles, but that's still only 5 cycles per pixel, so this method is
; bottlenecked by code unless you're drawing in 8bpp.
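As a sanity check on the data layout the single-pixel routine expects, here is a rough host-side model in Python. It is only a sketch of my reading of the listing: the function name is made up, colour 0 is assumed to be transparent, the first count is assumed to be nonzero, and every count after the first is assumed to be stored plus one because the second LOOP decrements it before branching (so a stored 1 ends the blit).

```python
def decode_single_pixel(stream, x=0, y=0):
    """Host-side model of the single-pixel loop; returns {(x, y): colour}."""
    pos = 0
    plotted = {}
    count = stream[pos]                # first line: raw pixel count
    pos += 1
    while True:
        for _ in range(count):
            colour = stream[pos]       # GETC / INC R14
            pos += 1
            if colour != 0:            # assumption: colour 0 is transparent
                plotted[(x, y)] = colour
            x += 1                     # PLOT always advances R1
        x -= stream[pos]               # carriage return: WITH R1 / SUB R0
        pos += 1
        y += 1                         # INC R2
        count = stream[pos] - 1        # LOOP decrements before testing
        pos += 1
        if count == 0:                 # a stored count of 1 terminates
            return plotted
```

For example, `[2, 5, 6, 2, 3, 7, 0, 2, 1]` describes two lines of two pixels each, with the last pixel of the second line transparent.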
Code:
; DUAL-PIXEL BLITTING (faster for long solid runs, slower for short runs, doesn't support gaps):
to R12 ; pixel count goes in the LOOP index register
getb ; get pixel count for first line, plus two if odd
inc R14 ; increment ROM address, triggering a buffer load
with R12 ; operate on pixel count
SStart:
lsr ; turn pixel count into pixel pair count
bcc DStart ; if the pixel count was even, go to dual-pixel blitting
nop ; waste a cycle, because it's better than wasting 5 cycles at the end of the loop
getc ; fetch the first pixel from the ROM buffer
inc R14 ; increment the ROM address
loop ; decrement pixel pair count (hence the +2 for odd pixel counts) and go to DStart if nonzero
plot ; plot first pixel to buffer and increment X-coordinate (happens regardless of LOOP result)
bra EndL ; go to end of line (at this point it's been determined that the line was only one pixel long)
getb ; get carriage return X-component in R0 (happens after branch)
DStart:
getc ; get pixel pair
inc R14 ; increment ROM address
plot ; plot pixel to buffer and increment X-coordinate
loop ; decrement pixel pair count and go to DStart if nonzero
plot ; plot pixel to buffer (relying on dither flag to switch colours) and increment X-coordinate
getb ; get carriage return X-component in R0
EndL:
inc R14 ; increment ROM address
with R1 ; update X-coordinate
sub R0 ; with carriage return value
inc R2 ; increment Y-coordinate
to R12 ; refresh pixel counter
getb ; with next line's pixel count, plus three if odd and one if even
inc R14 ; increment ROM address
dec R12 ; decrement pixel count (hence the +1 for lines other than the first)
bne SStart ; branch to SStart if pixel count is nonzero
with R12 ; set up for right shift of pixel count
; This one uses the dither functionality to plot two pixels per byte fetched from ROM. Naturally this means all the
; graphics have to be duplicated in ROM so there's a version for each value of the dither bit (XOR of the X and Y
; bottom bits). Also, since dither can't plot transparent with non-transparent (it always checks the bottom of the
; colour register for colour #0, because it's checking the dither bit at the same time and doesn't yet know which half
; to use), this method does not support gaps in a line.
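To make the ROM-side consequences of the dither trick concrete, here is a small Python sketch of how a conversion tool might pack the data. The helper names are hypothetical, and the nibble order (odd dither cells read the high nibble, even cells the low nibble) is my assumption about how the dither bit selects a colour; flip it if the hardware turns out to work the other way.

```python
def pack_pair(p0, p1, x, y):
    """Pack the 4-bit pixels at (x, y) and (x + 1, y) into one byte."""
    if (x ^ y) & 1:
        # First pixel sits on an odd dither cell (assumed: high nibble).
        return (p0 << 4) | p1
    # First pixel sits on an even dither cell (assumed: low nibble).
    return (p1 << 4) | p0


def count_byte(n, first_line):
    """Stored pixel count: +2 if odd, and +1 more on lines after the first."""
    stored = n + 2 if n & 1 else n
    return stored if first_line else stored + 1
```

The same pair of pixels packs to a different byte depending on the parity of its position, which is why each graphic needs one copy per dither-bit value.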
Code:
; DUAL-PIXEL WITH GAPS (a bit slower than basic dual-pixel blitting, but more flexible):
to R12 ; pixel count goes in the LOOP index register
getb ; get pixel count for first line, plus two if odd
inc R14 ; increment ROM address, triggering a buffer load
with R12 ; operate on pixel count
SStart:
lsr ; turn pixel count into pixel pair count
bcc DStart ; if the pixel count was even, go to dual-pixel blitting
nop ; waste a cycle, because it's better than wasting 5 cycles at the end of the loop
getc ; fetch the first pixel from the ROM buffer
inc R14 ; increment the ROM address
loop ; decrement pixel pair count (hence the +2 for odd pixel counts) and go to DStart if nonzero
plot ; plot first pixel to buffer and increment X-coordinate (happens regardless of LOOP result)
bra EndL ; go to end of line (at this point it's been determined that the line was only one pixel long)
getb ; get X increment in R0, shifted left and added to the Y increment bit
DStart:
getc ; get pixel pair
inc R14 ; increment ROM address
plot ; plot pixel to buffer and increment X-coordinate
loop ; decrement pixel pair count and go to DStart if nonzero
plot ; plot pixel to buffer (relying on dither flag to switch colours) and increment X-coordinate
getb ; get X increment in R0, shifted left and added to the Y increment bit
EndL:
inc R14 ; increment ROM address
sex ; ensure that negative X increments remain negative when shifted
asr ; shift X increment into position (arithmetic shift, so the sign from SEX survives), pushing the Y increment out into the carry flag
bcs NewLine ; if the Y increment was one, go to NewLine (duplicated code for speed)
with R1 ; update X-coordinate (sits in the BCS delay slot, so it also applies to the SUB R0 at NewLine)
sub R0 ; with X increment
to R12 ; refresh pixel counter
getb ; with next run's pixel count, plus three if odd and one if even
inc R14 ; increment ROM address
dec R12 ; decrement pixel count (hence the +1 for runs other than the first)
bne SStart ; branch to SStart if pixel count is nonzero
with R12 ; set up for right shift of pixel count
bra EndBlit ; branch past duplicated code
nop ; fill the delay slot so the SUB R0 at NewLine doesn't run (and clobber R12) on the way out
NewLine:
sub R0 ; update X-coordinate with X increment
to R12 ; refresh pixel counter
getb ; with next line's pixel count, plus three if odd and one if even
inc R14 ; increment ROM address
inc R2 ; increment Y-coordinate
dec R12 ; decrement pixel count
bne SStart ; branch to SStart if pixel count is nonzero
with R12 ; set up for right shift of pixel count
EndBlit:
; This one encodes the X-coordinate carriage return value shifted left with a Y-increment bit shoved in on the right, so
; as to allow the algorithm to jump across gaps in a line without jumping down. This limits the size of the object
; somewhat, since there are now only 7 bits for the X-increment value, but I'm not too worried about that. I could
; encode TWO Y-increment bits this way, so as to allow vertical gaps in the object, but with what most of the graphics
; in my game look like, I doubt plotting a transparent pixel now and then is less efficient than doing a bunch of extra
; maneuvering at the end of every single run of solid pixels...
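The packed end-of-run byte can be modelled on the host like this. It is a Python sketch with made-up function names, and it assumes the routine sign-extends the byte and then shifts it right arithmetically, so that bit 0 falls into the carry as the Y-increment flag and the remaining 7 bits are the signed value subtracted from X.

```python
def encode_step(x_step, new_line):
    """Build the end-of-run byte: 7-bit signed X step, Y flag in bit 0."""
    if not -64 <= x_step <= 63:
        raise ValueError("X step must fit in 7 signed bits")
    return ((x_step << 1) | int(new_line)) & 0xFF


def decode_step(byte):
    """Model SEX followed by an arithmetic right shift by one."""
    signed = byte - 256 if byte & 0x80 else byte  # SEX: sign-extend the byte
    # Python's >> is arithmetic on negative ints; bit 0 models the carry.
    return signed >> 1, bool(signed & 1)
```

With a logical shift instead of an arithmetic one, a step of -5 would decode as a large positive number, so the choice of shift matters for negative steps.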
I suppose dumps of untested code aren't especially useful or interesting, since there's no indication of what might or might not be wrong...
EDIT: Just had an idea:
Code:
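getb ; get pixel pair (two 4-bit pixels) in R0
inc R14 ; increment ROM address
color ; set colour from R0 (only the low nibble matters in 4bpp)
plot ; plot first pixel and increment X-coordinate
mult R3 ; where R3 contains 0010h (moves the high nibble up into bits 8-11)
swap ; swap bytes, bringing the high nibble down into bits 0-3
color ; set colour from the second pixel
loop ; decrement pixel pair count and go to address in R13 if nonzero
plot ; plot second pixel and increment X-coordinate (happens regardless of LOOP result)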
On the other hand, my single-pixel blit routine is even slower, and the extra pixel this method tacks onto odd-sized lines is transparent and can't cause a sliver overflow, so it might actually be better...
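The MULT/SWAP nibble trick in the EDIT snippet is easy to check exhaustively on the host. This Python sketch assumes MULT is a signed 8x8 multiply of the low bytes giving a 16-bit result, that SWAP exchanges the two bytes of the register, and that only the low four bits of the colour matter in 4bpp; the function name is made up.

```python
def mult_swap_high_nibble(byte, r3=0x0010):
    """Model MULT R3 then SWAP on R0, returning the low 4 bits for COLOR."""
    signed = byte - 256 if byte & 0x80 else byte     # MULT treats R0 as signed
    product = (signed * r3) & 0xFFFF                 # 16-bit result
    swapped = ((product << 8) | (product >> 8)) & 0xFFFF  # SWAP bytes
    return swapped & 0x0F                            # 4bpp colour bits
```

The sign extension from the signed multiply only disturbs bits 12-15, so the nibble that ends up in bits 0-3 after the swap is the original high nibble for every possible input byte.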