David Wheeler's ideas were a big help in figuring out the runtime for a high level language. I decided to build a split stack machine.
There are 4 data stacks -- STACK0, STACK1, STACK2, and STACK3. STACK0 could be on the zero page to save code space and allow some ldy ZP, x instructions, and STACK1, STACK2, STACK3 can be elsewhere. There is only one stack pointer, stored in the X register. This allows stack access quite quickly using indexed absolute addressing. Multi-byte values are stored across multiple stacks. For example, the bytes of a 24-bit integer would be stored at STACK0+x, STACK1+x, and STACK2+x. Manipulating the stack pointer is also very cheap, it only takes a dex or inx.
As an optimization, the current top-of-stack value is stored contiguously in the zero page at TOS. This means that most operations, which conceptually pop 2 values from the stack, then push the result back on the stack, only have to modify TOS in-place and then increment x in order to calculate their values.
Here's an example, the 2-byte addition operation:
Code: Select all
add2: ; 30 cycles total
clc ; 2 cycles
lda STACK0, x ; 4 cycles
adc TOS ; 3 cycles
sta TOS ; 3 cycles
lda STACK1, x ; 4 cycles
adc TOS+1 ; 3 cycles
sta TOS+1 ; 3 cycles
inx ; 2 cycles
rts ; 6 cycles
Code: Select all
main: ; calculates $1234 + $1337
; put $1234 on the stack
lda #<$1234
sta STACK0, x
lda #>$1234
sta STACK1, x
; put $1337 on the top of the stack
lda #<$1337
sta TOS
lda #>$1337
sta TOS+1
jmp add2