koitsu, this is really funny: you have provided exactly the same code, from exactly the same source, as the current libtcc.asm version.
mic_'s examples of hardware divider usage take about 60 cycles, by the way, including necessary delays.
tcc816 math optimization
Re: tcc816 math optimization
Ha! Imagine that. I guess the only difference is that mine has inline comments, haha. :-)
Using the hardware registers on the SNES/SFC is probably your best bet then, as the only thing that can beat that is a large pre-calculated table. The downside is that during the 8 and 16-cycle wait times, you could actually be doing some other things (potentially), but with a subroutine you're just going to have a bunch of NOPs. Possibly using in-line macros (instead of a subroutine) would be more efficient (not just for saving JSR/RTS or JSL/RTL cycles but for those 8 or 16 cycles where you could do other things)? It means more "wasted" ROM space but the trade off is additional flexibility. I'm not sure if 8/16 cycles really matters in your project or not.
Re: tcc816 math optimization
This should be 2 cycles faster per iteration than the current version:
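Code: Select all
tcc__udiv:
stz.b tcc__r9 ; quotient = 0
ldy #1 ; Y counts the quotient bits to produce
- asl a ; shift the divisor left...
bcs + ; ...until its top set bit falls into carry
iny
cpy #17 ; stop after 16 shifts (divisor was 0)
bne -
+ ror a ; rotate the carry back into bit 15
- sta.b tcc__r5 ; tcc__r5 = current shifted divisor
txa ; A = remainder so far (kept in X)
sec
sbc.b tcc__r5 ; trial subtraction
bcc + ; borrow: divisor does not fit, keep the old remainder
tax ; fits: X = reduced remainder
+ rol.b tcc__r9 ; shift the carry (1 = fits) into the quotient
lda tcc__r5
lsr a ; move the divisor one bit position down
dey
bne -
rtl
Or this, which should be one additional cycle faster in some cases (when "bcc +" is taken at least half of the time):
Code: Select all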
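tcc__udiv:
stz.b tcc__r9 ; quotient = 0
ldy #1 ; Y counts the quotient bits to produce
- asl a ; shift the divisor left...
bcs + ; ...until its top set bit falls into carry
iny
cpy #17 ; stop after 16 shifts (divisor was 0)
bne -
+ ror a ; rotate the carry back into bit 15
- sta.b tcc__r5 ; tcc__r5 = current shifted divisor
cpx tcc__r5 ; does the divisor fit into the remainder in X?
bcc + ; no: carry is clear, so the quotient bit is 0
txa
sbc.b tcc__r5 ; carry is already set by cpx, so no sec is needed
tax ; X = reduced remainder, carry stays set
+ rol.b tcc__r9 ; shift the carry into the quotient
lda.b tcc__r5
lsr a ; move the divisor one bit position down
dey
bne -
rtl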
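For anyone who wants to check the routine off-target, here is a rough C model of what the loop above appears to do. It is only a sketch: the register mapping (dividend in X, divisor in A, quotient in tcc__r9, remainder left in X) is inferred from the listing rather than stated in the thread, the name udiv_model is made up, and division by zero is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the tcc__udiv loop (names are illustrative).
   Assumed mapping: dividend = X, divisor = A, quotient = tcc__r9,
   remainder = X on return. Division by zero is not modelled. */
uint16_t udiv_model(uint16_t dividend, uint16_t divisor, uint16_t *remainder)
{
    uint16_t quotient = 0;      /* stz tcc__r9 */
    unsigned count = 1;         /* ldy #1 */

    assert(divisor != 0);

    /* asl/bcs/iny followed by ror: left-align the divisor and count
       how many quotient bits are needed. */
    while ((divisor & 0x8000u) == 0) {
        divisor = (uint16_t)(divisor << 1);
        count++;
    }

    while (count-- > 0) {
        quotient = (uint16_t)(quotient << 1);   /* rol tcc__r9 */
        if (dividend >= divisor) {              /* cpx / bcc */
            dividend = (uint16_t)(dividend - divisor);
            quotient |= 1u;                     /* carry shifted in as 1 */
        }
        divisor >>= 1;                          /* lsr a */
    }

    *remainder = dividend;
    return quotient;
}
```

The point of the normalisation step is that the main loop only runs once per significant quotient bit, so large divisors finish in very few iterations.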
Re: tcc816 math optimization
Thanks. What I have done now, using code suggestions from this thread, is:
Code: Select all
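tcc__mul:
lda #0 ; A accumulates the product
.repeat 4
.repeat 4
ldx.b tcc__r9 ; any multiplier bits left?
beq ++ ; no: skip the rest of this group of four bits
lsr.b tcc__r9 ; lowest multiplier bit into carry
bcc +
clc
adc.b tcc__r10 ; bit was set: add the shifted multiplicand
+ asl.b tcc__r10 ; double the multiplicand for the next bit
.endr
++
.endr
rtl
tcc__udiv:
stz.b tcc__r9 ; quotient = 0
ldy #1 ; Y counts the quotient bits to produce
.repeat 16
asl a ; shift the divisor left...
bcs tcc__udiv1 ; ...until its top set bit falls into carry
iny
.endr
tcc__udiv1:
ror a ; rotate the carry back into bit 15
- sta.b tcc__r5 ; tcc__r5 = current shifted divisor
cpx tcc__r5 ; does the divisor fit into the remainder in X?
bcc + ; no: carry is clear, so the quotient bit is 0
txa
sbc.b tcc__r5 ; carry is already set by cpx, so no sec is needed
tax ; X = reduced remainder, carry stays set
+ rol.b tcc__r9 ; shift the carry into the quotient
lda.b tcc__r5
lsr a ; move the divisor one bit position down
dey
bne -
rtl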
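The unrolled multiply can be sanity-checked the same way. The following is a rough C model, assuming the operands arrive in tcc__r9 and tcc__r10 and the low 16 bits of the product are returned in A (this is read off the listing, not confirmed in the thread); mul_model is a made-up name.

```c
#include <stdint.h>

/* Hypothetical C model of the unrolled tcc__mul (names are illustrative).
   Assumed mapping: r9 = multiplier, r10 = multiplicand, result = A. */
uint16_t mul_model(uint16_t r9, uint16_t r10)
{
    uint16_t acc = 0;                           /* lda #0 */

    for (int group = 0; group < 4; group++) {   /* outer .repeat 4 */
        for (int bit = 0; bit < 4; bit++) {     /* inner .repeat 4 */
            if (r9 == 0) {
                break;                          /* ldx tcc__r9 / beq ++ */
            }
            if (r9 & 1u) {                      /* lsr tcc__r9 / bcc + */
                acc = (uint16_t)(acc + r10);    /* clc / adc tcc__r10 */
            }
            r9 >>= 1;
            r10 = (uint16_t)(r10 << 1);         /* asl tcc__r10 */
        }
        /* "++" lands here: an exhausted multiplier skips to the next group,
           where the same test fails again immediately. */
    }
    return acc;
}
```

Note that the early exit only jumps to the end of the current group of four, so a zero multiplier still costs four ldx/beq pairs rather than one; that seems to be the price of keeping every branch short in the unrolled code.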