One problem is that the horizontal scrolling and clipping would have to be brute-forced, i.e. each window coordinate needs to be scrolled and clipped individually (800 times in the above example). That would cost a fair amount of frame time (not vblank time), roughly 30 to 40 scanlines.
I wonder if it can be done with 16-bit instructions (somehow scrolling two coordinates at once), but I don't think it's viable. Doing it via LUT would require a 24-bit LUT. Doing it arithmetically (i.e., with adc) would make the addition of the low bytes overflow into the high byte. Even if that could be fixed by masking out the first bit (essentially halving horizontal resolution), there is still the clipping problem, and I have no idea how to solve that.
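To make the overflow problem concrete, here is a small host-side model in Python (not SNES code). It is only a sketch: `packed_add` and `pack` are made-up helper names, and reading "the first bit" as bit 0 of each coordinate is my assumption.

```python
def packed_add(pair, scroll):
    """Add the same 8-bit scroll to both bytes of a 16-bit pair at once.

    Models a 16-bit adc after clc; returns (16-bit result, carry out).
    """
    total = pair + ((scroll << 8) | scroll)
    return total & 0xFFFF, total > 0xFFFF


def pack(high, low):
    """Pack two 8-bit coordinates into one 16-bit word."""
    return (high << 8) | low


# The low byte overflows (0xF0 + 0x20), so its carry leaks into the high byte.
result, carry = packed_add(pack(0x10, 0xF0), 0x20)
high, low = result >> 8, result & 0xFF
# high is 0x31 instead of 0x30, low has wrapped around to 0x10

# Assumption: if coordinates and scroll are all even, bit 0 of the true high
# sum is always 0, so the leaked carry can be removed by masking that bit.
fixed_high = high & 0xFE
```

The mask only repairs the high byte. The wrapped low byte still has to be clipped, which is the part that remains unsolved here.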
In 8 bits, my best ideas so far are either a 64K LUT that does the addition and clipping in one lookup, or addition and clamping with the carry flag (using the buffer prefill / skipping method, just like with the lighting). The good thing is that only one screen border (left OR right) needs to be clipped against.
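As a rough illustration of the two 8-bit ideas, here is a host-side Python sketch that builds such a 64K table and checks it against the carry-based clamp. The index layout (scroll in the high byte, coordinate in the low byte) and the function names are assumptions for the example, and it only covers clipping against the right border.

```python
def build_clamp_lut():
    """64K table: entry [scroll << 8 | coord] = min(scroll + coord, 255)."""
    lut = bytearray(0x10000)
    for scroll in range(256):
        for coord in range(256):
            lut[(scroll << 8) | coord] = min(scroll + coord, 0xFF)
    return lut


def add_clamp_carry(scroll, coord):
    """Model of an 8-bit clc / adc followed by 'if carry set, load $ff'."""
    total = scroll + coord
    carry = total > 0xFF  # carry flag after the 8-bit addition
    return 0xFF if carry else total


LUT = build_clamp_lut()

# Both approaches should agree for every scroll / coordinate combination.
mismatches = sum(
    LUT[(s << 8) | c] != add_clamp_carry(s, c)
    for s in range(256)
    for c in range(256)
)
```

Since both give the same results, the choice comes down to ROM (64K for the table) versus the extra branch in the carry version.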
Edit: In fact, the bit 1 mask problem is no problem at all. For example, when clipping against the right screen border: when the left border is clipped, the right one will be clipped too, so no problem.
Edit 2: These are the best solutions I could come up with...
For 16 bits:
Code:
tya ; (2) y = monster horizontal position
clc ; (2)
adc window_right_left, x ; (5)
bcs _clip_both_window_borders ; (2)
bit window_right_mask, x ; (6)
bne _write ; (2)
ora #$00ff ; (3*)
_write:
xba ; (2)
sta hdma_table, x ; (6)
_clip_both_window_borders:
iny ; (2)
; total 27 / 2 = 13.5 cycles per coordinate
For 8 bits:
Code:
txa ; (2) x = monster horizontal position
clc ; (2)
adc [window], y ; (6)
bcc :+ ; (2)
lda #$ff ; (2*)
:
sta reg_wmdata ; (4)
iny ; (2)
; total 18 cycles per coordinate
The 16-bit solution requires double the amount of ROM for the masking part, though...