tcc816 math optimization

Shiru
Posts: 1161
Joined: Sat Jan 23, 2010 11:41 pm

Re: tcc816 math optimization

Post by Shiru »

koitsu, this is really funny: you have provided exactly the same code, from exactly the same source, as the current libtcc.asm version.

mic_'s examples of hardware divider usage take about 60 cycles, by the way, including necessary delays.
koitsu
Posts: 4203
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: tcc816 math optimization

Post by koitsu »

Ha! Imagine that. I guess the only difference is that mine has inline comments, haha. :-)

Using the hardware registers on the SNES/SFC is probably your best bet then, as the only thing that can beat that is a large pre-calculated table. The downside is that during the 8-cycle (multiply) and 16-cycle (divide) wait times you could potentially be doing other things, but with a subroutine you're just going to have a bunch of NOPs. Using inline macros instead of a subroutine would possibly be more efficient, not just because it saves the JSR/RTS or JSL/RTL cycles but because those 8 or 16 cycles could be filled with other work. It means more "wasted" ROM space, but the trade-off is additional flexibility. I'm not sure whether 8/16 cycles really matter in your project or not.
mic_
Posts: 922
Joined: Thu Oct 05, 2006 6:29 am

Re: tcc816 math optimization

Post by mic_ »

This should be 2 cycles faster per iteration than the current version:

Code:

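; In: A = divisor, X = dividend. Out: tcc__r9 = quotient, X = remainder.
tcc__udiv:
stz.b tcc__r9      ; quotient = 0
ldy #1             ; Y = number of divide steps
- asl a            ; shift the divisor left...
bcs +              ; ...until its top set bit drops into carry
iny                ; one more divide step per shift
cpy #17            ; stop after 16 shifts (zero divisor only)
bne -
+ ror a            ; rotate carry back in: divisor is now left-aligned
- sta.b tcc__r5    ; r5 = divisor at the current bit position
txa                ; A = remainder so far
sec
sbc.b tcc__r5      ; trial subtraction
bcc +              ; borrow: divisor does not fit, keep the old remainder
tax                ; it fits: X = new remainder
+ rol.b tcc__r9    ; shift the result bit (carry) into the quotient
lda.b tcc__r5
lsr a              ; move the divisor down one bit position
dey
bne -
rtl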

Or this, which should be one additional cycle faster in some cases (when "bcc +" is taken at least half of the time):

Code:

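; In: A = divisor, X = dividend. Out: tcc__r9 = quotient, X = remainder.
tcc__udiv:
stz.b tcc__r9      ; quotient = 0
ldy #1             ; Y = number of divide steps
- asl a            ; shift the divisor left...
bcs +              ; ...until its top set bit drops into carry
iny                ; one more divide step per shift
cpy #17            ; stop after 16 shifts (zero divisor only)
bne -
+ ror a            ; rotate carry back in: divisor is now left-aligned
- sta.b tcc__r5    ; r5 = divisor at the current bit position
cpx.b tcc__r5      ; carry set if remainder (X) >= divisor
bcc +              ; too small: skip the subtraction, result bit is 0
txa
sbc.b tcc__r5      ; carry is already set by cpx, so no sec is needed
tax                ; X = new remainder, carry still set: result bit is 1
+ rol.b tcc__r9    ; shift the result bit into the quotient
lda.b tcc__r5
lsr a              ; move the divisor down one bit position
dey
bne -
rtl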
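
For anyone following the logic rather than the cycle counts, here is a rough C model of what both versions compute. It is only a sketch: the names `udiv_model` and `udiv_result` are made up for illustration, it assumes A holds the divisor and X the dividend on entry (which is how the code reads), and it does not try to model what the assembly does for a zero divisor, so it simply asserts that the divisor is nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: not part of libtcc.asm. */
typedef struct {
    uint16_t quotient;   /* what the assembly leaves in tcc__r9 */
    uint16_t remainder;  /* what the assembly leaves in X */
} udiv_result;

/* Hypothetical C model of tcc__udiv (shift-and-subtract division). */
static udiv_result udiv_model(uint16_t dividend, uint16_t divisor)
{
    udiv_result r = { 0, dividend };
    uint16_t d = divisor;
    int steps = 1;                          /* ldy #1 */

    assert(divisor != 0);                   /* zero divisor is not modelled */

    /* Left-align the divisor, counting one extra divide step per shift. */
    while ((d & 0x8000u) == 0) {
        d = (uint16_t)(d << 1);
        steps++;
    }

    /* One compare-and-subtract per step, highest bit position first. */
    while (steps-- > 0) {
        r.quotient = (uint16_t)(r.quotient << 1);       /* rol tcc__r9 */
        if (r.remainder >= d) {                         /* cpx / bcc + */
            r.remainder = (uint16_t)(r.remainder - d);  /* txa / sbc / tax */
            r.quotient |= 1;                            /* carry was set */
        }
        d = (uint16_t)(d >> 1);                         /* lsr a */
    }
    return r;
}
```

For example, `udiv_model(1000, 7)` gives a quotient of 142 and a remainder of 6. The number of loop passes depends on the divisor only, which is why small dividends do not make the assembly any faster.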
Shiru
Posts: 1161
Joined: Sat Jan 23, 2010 11:41 pm

Re: tcc816 math optimization

Post by Shiru »

Thanks. What I have done now, using code suggestions from this thread, is:

Code:

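tcc__mul:
	lda #0              ; A = running product
	.repeat 4
	.repeat 4
	ldx.b tcc__r9       ; multiplier used up?
	beq ++              ; yes: skip the rest of this group of 4 steps
	lsr.b tcc__r9       ; next multiplier bit into carry
	bcc +
	clc
	adc.b tcc__r10      ; bit was 1: add the shifted multiplicand
+	asl.b tcc__r10      ; multiplicand *= 2
	.endr
++
	.endr
	rtl


tcc__udiv:
	stz.b tcc__r9       ; quotient = 0
	ldy #1              ; Y = number of divide steps
	.repeat 16
	asl a               ; shift the divisor left...
	bcs tcc__udiv1      ; ...until its top set bit drops into carry
	iny                 ; one more divide step per shift
	.endr
tcc__udiv1:
	ror a               ; rotate carry back in: divisor is now left-aligned
-	sta.b tcc__r5       ; r5 = divisor at the current bit position
	cpx.b tcc__r5       ; carry set if remainder (X) >= divisor
	bcc +               ; too small: skip the subtraction, result bit is 0
	txa
	sbc.b tcc__r5       ; carry is already set by cpx, so no sec is needed
	tax                 ; X = new remainder, carry still set: result bit is 1
+	rol.b tcc__r9       ; shift the result bit into the quotient
	lda.b tcc__r5
	lsr a               ; move the divisor down one bit position
	dey
	bne -
	rtl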
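
As a cross-check of the multiply, this is roughly what tcc__mul above computes, written as a C model. It is a sketch only: `mul_model` is a made-up name, and it assumes the two operands arrive in tcc__r9 and tcc__r10 and the low 16 bits of the product come back in A, which is how the code reads.

```c
#include <stdint.h>

/* Hypothetical C model of tcc__mul (shift-and-add multiplication). */
static uint16_t mul_model(uint16_t r9, uint16_t r10)
{
    uint16_t a = 0;                         /* lda #0 */
    int i;

    /* Up to 4 x 4 = 16 steps; stop as soon as the multiplier runs out,
       which is what the "beq ++" early-out achieves in the assembly. */
    for (i = 0; i < 16 && r9 != 0; i++) {
        if (r9 & 1)                         /* lsr tcc__r9 / bcc + */
            a = (uint16_t)(a + r10);        /* clc / adc tcc__r10 */
        r9 = (uint16_t)(r9 >> 1);
        r10 = (uint16_t)(r10 << 1);         /* asl tcc__r10 */
    }
    return a;                               /* low 16 bits of the product */
}
```

For example, `mul_model(123, 45)` returns 5535. Since the loop ends when tcc__r9 reaches zero, it is cheaper to have the smaller operand in tcc__r9.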