Espozo wrote:
Wait, are you saying that you can do 16x8 or 16x16 multiplication with mode 7's multiplication and division registers? Is there any realistic way to "chain" two 8 bit multiplications together like you can do with addition or subtraction? Probably not...

You can do an 8x16 --> 24 multiplication with the mode 7 registers, but no division as far as I know.

And of course you can chain them to obtain a 16x16 --> 32 multiplication:

(8 low) x16 = x

(8 high) x16 = y

then just do a 32-bit addition: x + (y << 8) to get the final 32-bit result.

Still, even using the mode 7 registers, I bet you spend a large number of cycles just to do 16x16=32.
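The chaining above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SNES code: `mul8x16` is a hypothetical stand-in for one hardware 8x16 multiply, and both operands are assumed to be unsigned. If the hardware treats an operand as signed, an extra correction step would be needed that is not shown here.

```python
def mul8x16(a8, b16):
    """Hypothetical stand-in for one 8x16 -> 24-bit multiply (unsigned)."""
    assert 0 <= a8 <= 0xFF and 0 <= b16 <= 0xFFFF
    return (a8 * b16) & 0xFFFFFF  # 255 * 65535 always fits in 24 bits


def mul16x16(a16, b16):
    """Build an unsigned 16x16 -> 32-bit product from two 8x16 multiplies."""
    lo = a16 & 0xFF          # (8 low)
    hi = (a16 >> 8) & 0xFF   # (8 high)
    x = mul8x16(lo, b16)     # (8 low)  x 16 = x
    y = mul8x16(hi, b16)     # (8 high) x 16 = y
    return (x + (y << 8)) & 0xFFFFFFFF  # 32-bit addition


if __name__ == "__main__":
    print(hex(mul16x16(0x1234, 0x5678)))
```

The result matches a direct `a16 * b16` for unsigned inputs, since a16 = lo + 256 * hi.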

That's why I was thinking it wouldn't be that fast. 70 to 140 cycles sounds astronomically high to me, but on the SNES, it's more like half which is still ridiculously large, but it's at least somewhat reasonable.

Actually, on the 68000 a multiplication takes up to 70 cycles (for the signed version) and a signed division up to 140 cycles, but based on some benchmarks I did, you can assume a mean of about 50 cycles for multiplication and about 90 to 100 cycles for division, which is not that bad.

With the Genesis 68000 CPU (7.67 MHz) I can transform (3D transformation + 2D projection) about 10000 vertices per second, which is not bad (I expected 6000 at most).

A single 3D transformation consists of:

- 9 16x16=32 multiplications

- 6 32bit additions

- 3 16bit additions

A single 2D projection consists of:

- 3 16bit additions

- 1 32:16=16 division

- 2 16x16=32 multiplications

The projection could be done differently, but I handled it that way for convenience.

Still, that is a large number of complex operations, and I am not counting the load / store / shift operations here.
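For illustration, here is one arrangement that matches those operation counts. It is a guess, not the actual routine: the fixed-point format (`FRAC`), the function names, and the projection formula (`cam_dist`, `cx`, `cy`) are all assumptions made for this sketch.

```python
FRAC = 6  # assumed number of fixed-point fraction bits (hypothetical)


def transform(m, t, v):
    """Rotate vertex v by the 3x3 fixed-point matrix m, then translate by t."""
    out = []
    for row, tr in zip(m, t):
        # Per row: 3 multiplications and 2 32-bit additions
        acc = row[0] * v[0] + row[1] * v[1] + row[2] * v[2]
        # Per row: 1 16-bit addition for the translation
        out.append((acc >> FRAC) + tr)
    return out  # total: 9 multiplications, 6 32-bit adds, 3 16-bit adds


def project(p, cam_dist, cx, cy):
    """Perspective projection; assumes z + cam_dist is never zero."""
    x, y, z = p
    scale = (cam_dist << FRAC) // (z + cam_dist)  # 1 addition, 1 division
    sx = cx + ((x * scale) >> FRAC)               # 1 multiplication, 1 addition
    sy = cy + ((y * scale) >> FRAC)               # 1 multiplication, 1 addition
    return sx, sy


if __name__ == "__main__":
    identity = [[64, 0, 0], [0, 64, 0], [0, 0, 64]]  # 1.0 == 64 with FRAC = 6
    p = transform(identity, (0, 0, 0), (10, 20, 64))
    print(project(p, 64, 160, 112))
```

Computing one reciprocal-style `scale` per vertex and then multiplying x and y by it is what trades a second division for two multiplications.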

If we just count the maximum cycles of the multiplications and the division, we already get (11*70) + 140 = 910, which is close to 1000 cycles per vertex! So we shouldn't be able to transform more than about 8000 vertices per second just because of these operations... but fortunately that is not the case. I wonder how many we could transform with the SNES hardware using smart interleaving of operations across the different available multiplier / divider units.
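The estimate can be reproduced with a small calculation. The worst-case figures and the 50-cycle multiplication mean come from the numbers above; the 95-cycle division figure is just the midpoint of the 90 to 100 range, picked here as an assumption. Only multiplications and the division are counted, so real throughput is lower once loads, stores, shifts, and additions are included.

```python
CLOCK_HZ = 7_670_000   # Genesis 68000 clock, 7.67 MHz
MULS_PER_VERTEX = 11   # 9 for the transformation + 2 for the projection
DIVS_PER_VERTEX = 1    # 1 for the projection


def vertex_budget(mul_cycles, div_cycles, clock_hz=CLOCK_HZ):
    """Return (cycles per vertex, vertices per second) for mul/div only."""
    cycles = MULS_PER_VERTEX * mul_cycles + DIVS_PER_VERTEX * div_cycles
    return cycles, clock_hz // cycles


if __name__ == "__main__":
    print("worst case:", vertex_budget(70, 140))  # (910, 8428)
    print("mean case: ", vertex_budget(50, 95))   # (645, 11891)
```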