Compiled stack proposal

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Compiled stack proposal

Post by tepples »

Bananmos wrote: Tue May 12, 2020 7:06 am for 99.9% of all code targeting the NES you wouldn't use recursion anyway, but have all your functions use zeropage. The problem is you need to avoid collisions between different variables. But this is easily done by assuming no recursion / call-by-pointer and creating a "compiled stack" of zeropage variables. In fact, this is exactly what Microchip's PIC C compiler appears to do
I'm proposing a preprocessor that automatically allocates local variables on zero page without the risk of one subroutine stepping on another subroutine's toes. It's analogous to this sort of "compiled stack" used by some C compilers that target microcontrollers and assumes that a subroutine is not recursive or otherwise reentrant, which Jarhmander mentioned earlier and Dwedit suggested applying elsewhere. I haven't yet written any of the needed code; I'm mostly spitballing here to see if I'm on the right track.

First some definitions:
  • A callee is a subroutine that reserves space on the static stack for its local variables.
  • A caller is a subroutine that calls one or more callees and does not overwrite the static stack of any of its callees.
In order for a caller to also be a callee, it needs to know how much of the data stack its callees have reserved. The preprocessor is mostly there to extract scope names from .proc keywords; it uses the linker to calculate where each item on the data stack is placed in zero page based on where its callees' data stack allocation ends.

The preprocessor would understand these keywords in code:
  • var varname, sz
    Creates a symbol on this subroutine's static stack called varname, local to this subroutine. Its size is sz bytes, defaulting to 1.
  • calls another_sub
    Marks the current subroutine as a caller of another_sub (the callee). This causes the start of the current subroutine's static stack to be placed no earlier than the end of the callee's static stack. If a jump table is used, the jump table needs to be a .proc that calls all routines inside it, and the subroutine that reads the jump table also needs to be a .proc that calls the jump table.
  • call another_sub
    Creates a caller-callee relationship and then actually calls another_sub. Equivalent to calls another_sub followed by jsr another_sub.
  • tailcalls another_sub
    Marks the current subroutine as a tail caller of another_sub. In a tail call, all local variables' lifetimes end before the call. This causes the start of the current subroutine's static stack to be placed no earlier than the start of the callee's static stack. A fall-through to another subroutine should also be marked as a tail call.
  • tailcall another_sub
    Creates a tail caller-callee relationship and actually tail calls another_sub. Equivalent to tailcalls another_sub followed by jmp another_sub.
  • leaf
    Suppresses a warning about declaring a var but having no calls. A leaf function is a callee but not a caller, and its static stack begins at the start of static stack space. If a function declares a var but isn't marked as a caller, the preprocessor otherwise emits a warning in case the programmer forgot to use calls.
Would anyone be interested in a proof of concept? Did I miss something critical?
User avatar
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: Compiled stack proposal

Post by Dwedit »

If you're trying to build a call tree, your root isn't just the entry point. Every possible 'thread' (interrupt handler) becomes its own root, and then function pointers make everything more complicated.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
User avatar
aa-dav
Posts: 220
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: Compiled stack proposal

Post by aa-dav »

I was amazed how very old computer processors handle calls of procedures. They just don't have stack at all!
Popular idea was to have word at the beginning of procedure where processor writes return adress while executing CALL insttruction.
That is CALL ADDR writes current IP pointer to ADDR and jumps to ADDR+1.
And "return instruction" was just indirect JUMP referring to head of procedure. JMP (HEAD_OF_THIS_PROCEDURE)
As you can note this absolutely excludes concept of reentry.
That is why Fortran introduced "recursive" keyword in Fortran 90 standard: http://fortranwiki.org/fortran/show/procedure
Only procedures with the keyword recursive are permitted to call themselves.
It's interesting that we can exclude functions which do not contain call of another functions from execution tree of program requiring stack: let's call it 'leaf-node functions'.
Leaf-node functions are subject of heavy optimization in modern compilers.
But they can be much more perfect targets for optimization in 8-bit systems, IMHO.
So, I belive these caller/callee relationships can be fully understood in compile-time.

It's really interesting question how to optimise 8 bit compilers....
User avatar
Bregalad
Posts: 8056
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Compiled stack proposal

Post by Bregalad »

My $2:

I find the concept of auto-allocating variables so that they don't step on other functions' feet very interesting. Howver I'd suggest re-inventing new names entierely and never mention the word "stack" in it, because the main point is to not use a stack.

When it comes to interrupt, I'd exclude them from auto-allocating in the 1st place, resolving the issue. Except if you go for an "all in NMI" coding style (Konami-style), in that case I guess it's the opposite, the main thread would just be a short initialization routine and the NMI would be the root of the calling tree.

I'd be nice to determine it from JSR instructions directly, rather than having to put extra stuff in the comments that would be hard to be up-to-date and lead to nasty bugs.

BUT there's also many evil traps, for instance, I can call a subroutine by using JMP instead of JSR when it's the last instruction of my subroutine, sometimes I'll even use no instruction at all and leave the 6502 continue fetching instructions from the subroutine which is placed just below.

Sometimes I use a "double-return" technique of PLA/PLA/RTS which has no equivalent in any high level languages, except maybe the throw/catch of an exception.

@Bananmos : I'd say it's more 99% than 99.99% of the code. Although rarely useful, recursion can be useful in some rare cases.
User avatar
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: Compiled stack proposal

Post by Dwedit »

I would have called it "Stack Flattening".
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: Compiled stack proposal

Post by Garth »

It seems like some key characteristics of stacks are perhaps not being understood correctly. There's a treatise on 6502 stacks (plural, not just the page-1 hardware stack) at http://wilsonminesco.com/stacks/ . Later chapters do get into local variables and environments, and recursion. You may not need recursion; but once you have local variables and environments, there's nothing keeping you from doing recursion. It's simply not a problem, as long as you don't run out of stack space.
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
Bregalad
Posts: 8056
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Compiled stack proposal

Post by Bregalad »

Garth wrote: Wed May 13, 2020 2:10 am It seems like some key characteristics of stacks are perhaps not being understood correctly. There's a treatise on 6502 stacks (plural, not just the page-1 hardware stack) at http://wilsonminesco.com/stacks/ . Later chapters do get into local variables and environments, and recursion. You may not need recursion; but once you have local variables and environments, there's nothing keeping you from doing recursion. It's simply not a problem, as long as you don't run out of stack space.
Of course it is a problem, why would you sacrifice a register for being your stack pointer, and use longer, slower instructions, and possibly waste a lot of ROM space, when you're not going to use recursion anyway ?!
Bananmos
Posts: 552
Joined: Wed Mar 09, 2005 9:08 am
Contact:

Re: Compiled stack proposal

Post by Bananmos »

Nice to see this being picked up. Though I would probably suggest avoiding the *requirement* to specify / maintain calle-caller relationships. The point is after all to automate all of this for conveniency, and lots of new metadata requirements kind of work against that. They would lead to subtle bugs when you update the code and forget to update the metadata.

But I appreciate that some calls (like those using function pointers) might be tricky to fully automate and may require some manual work. But this is fine if it's a much less common case, rather than the norm.
Bananmos
Posts: 552
Joined: Wed Mar 09, 2005 9:08 am
Contact:

Re: Compiled stack proposal

Post by Bananmos »

Bregalad wrote: Wed May 13, 2020 12:40 am @Bananmos : I'd say it's more 99% than 99.99% of the code. Although rarely useful, recursion can be useful in some rare cases.
Yes, recursive algorithms have good uses even on the 6502. But again I would argue that even with a recursive algorithm, you'd probably re-write it to explicitly push / pull the needed state onto the stack, rather than having a long sequence of JSRs.

2 bytes per call can be very limiting for deeply recursive algorithms where some cases can lead to a very deep stack depth, as you can often not guarantee the worst case of traversing your connected elements in a way that ends up pushing all of them on the stack. But often the state you need to return to can be encapsulated into a single byte. If you require more state it becomes even more important to conserve the 256-byte stack page. 240 bytes means you can safely apply your recursive algorithm on at least a metatile screen

And if you do need to go beyond what the $100 page area can provide you, then your call / return code has already done the abstraction to not rely on a jsr and should be easier to move to using more memory, perhaps by storing state into parallel 256-byte pages.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Compiled stack proposal

Post by tepples »

Bregalad wrote: Wed May 13, 2020 12:40 am I'd suggest re-inventing new names entierely and never mention the word "stack" in it, because the main point is to not use a stack.
Microchip's term. If the computer science literature offers a better term, I'm willing to use it instead.
Bregalad wrote: Wed May 13, 2020 12:40 am When it comes to interrupt, I'd exclude them from auto-allocating in the 1st place, resolving the issue.
Agreed.
Bregalad wrote: Wed May 13, 2020 12:40 am I'd be nice to determine it from JSR instructions directly, rather than having to put extra stuff in the comments that would be hard to be up-to-date and lead to nasty bugs.
That's exactly what call does: emit a jsr and add it as a callee.
Bregalad wrote: Wed May 13, 2020 12:40 am BUT there's also many evil traps, for instance, I can call a subroutine by using JMP instead of JSR
Analogously, you would call a subroutine with tailcall instead of call.
Bregalad wrote: Wed May 13, 2020 12:40 am Sometimes I'll even use no instruction at all and leave the 6502 continue fetching instructions from the subroutine which is placed just below.
I mentioned fallthroughs. You'd end the routine with tailcalls following_sub.
Bregalad wrote: Wed May 13, 2020 12:40 am Sometimes I use a "double-return" technique of PLA/PLA/RTS which has no equivalent in any high level languages, except maybe the throw/catch of an exception.
That shouldn't cause a problem here, as the lifetime of your variables is ending anyway.
Bananmos wrote: Wed May 13, 2020 4:20 am The point is after all to automate all of this for conveniency, and lots of new metadata requirements kind of work against that.
In the common case, the call pseudo-instruction would call the subroutine and insert the metadata for you.
Bananmos
Posts: 552
Joined: Wed Mar 09, 2005 9:08 am
Contact:

Re: Compiled stack proposal

Post by Bananmos »

In the common case, the call pseudo-instruction would call the subroutine and insert the metadata for you.
Well, I was thinking it'd be nice if the system could just deduce it from parsing jsr instructions, without a need for the pseudo-instruction for most functions.

But OTOH, if the preprocessor works well enough then a search-and-replace + subsequent corrections isn't a lot of work on the coder's behalf. Certainly less than the work I occasionally spend on accidentally overwriting my ZPs in nested calls, not to mention all those tedious "@localVar = TEMP+something" manual ZP assignments... :)
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Compiled stack proposal

Post by Oziphantom »

Window Allocator I feel is a more correct term.

As somebody who has built tree analysis and a Static Analyzer for 6502, this is not an easy thing to do in the uncontrolled environment of RAW ASM. Just working out "what is a function" as I do in the VICE debugger is only mostly accurate. C has a strict protocol and limited way it will use data, greatly simplifying the process.
SelfMod, Dispatch via RTS, Double Interrupt stabilization, Main, IRQ, NMI and potentially BRK threads to consider, stack manipulation, rts chaining and just well I want to return a couple up so lets just pop a bunch of stuff off the stack and go for it. Make it a lot harder. Having function jump into the middle of other functions starts to make trees break.

I would think having a narrower, you explicitly mark the start and end of a region, and then give it variable names that are "you will allocate this for me" would help contain the scope and make it more viable. But the maintenance of such info becomes more trouble than it is worth, so automation is key.
User avatar
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: Compiled stack proposal

Post by Dwedit »

This is more of a thing for languages like C than actual assembly.
Build the call tree, and allocate 'stack space'.

And you will end up with multiple parallel trees once you consider Interrupt Handlers.
From within your interrupt handler, anything called by that won't be using the same memory as something called from the main entry point. You may need to duplicate functions called by both just to ensure that they won't use the same memory.

You'd only need manual decoration for identifying other thread roots, such as your interrupt handler.


Then the hard part is function pointers. Sometimes you might get lucky, and function pointers will end up being a way to select between one of multiple different functions, and that can be represented in the call tree simply enough. Otherwise, all bets are off.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Bananmos
Posts: 552
Joined: Wed Mar 09, 2005 9:08 am
Contact:

Re: Compiled stack proposal

Post by Bananmos »

Oziphantom wrote: Wed May 13, 2020 10:46 am Window Allocator I feel is a more correct term.

As somebody who has built tree analysis and a Static Analyzer for 6502, this is not an easy thing to do in the uncontrolled environment of RAW ASM. Just working out "what is a function" as I do in the VICE debugger is only mostly accurate.
That's why I think it's key to provide automation for convenience, but manual overrides when assumptions break down.

I agree there's a lot of edge cases, but still believe automating this tedious and bug-prone process would save programming effort compared to no automation at all. I'll be following this thread with great interest.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Compiled stack proposal

Post by lidnariq »

As far as I can tell, "compiled stack" is a term Microchip inherited from the much older 3rd party HI-TECH PICC.
Post Reply