Protecting against stack overflow/underflow

Discussion of hardware and software development for Super NES and Super Famicom. See the SNESdev wiki for more information.

Moderator: Moderators

Forum rules
  • For making cartridges of your Super NES games, see Reproduction.
User avatar
rainwarrior
Posts: 8735
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Protecting against stack overflow/underflow

Post by rainwarrior »

In most cases you can probably assume the stack starts where it's initialized and is never intended to go above that. There's probably counter-examples but I'm sure it would cover a majority. Maybe you could have an option to set it below or above if there's some special case.

As for where it ends, I dunno if there's a good heuristic for that. If you keep a code-data-log (CDL) you might look for the highest non-push write below the start of the stack? Would probably have to have an option to skip the first write, in case it gets initialized once at startup.

On NES there's probably more special cases, since it's not an uncommon technique to do temporary stack relocations for PLA/STA vblank uploads. I'm not sure if this technique would have as much use on SNES, but maybe not?
User avatar
jeffythedragonslayer
Posts: 344
Joined: Thu Dec 09, 2021 12:29 pm

Re: Protecting against stack overflow/underflow

Post by jeffythedragonslayer »

I also wonder if there are any games that leak stack space - like keep pushing things and forget to pull them, until the stack wraps around.
Myself086
Posts: 158
Joined: Sat Nov 10, 2018 2:49 pm

Re: Protecting against stack overflow/underflow

Post by Myself086 »

rainwarrior wrote: Mon Aug 22, 2022 3:55 pm On NES there's probably more special cases, since it's not an uncommon technique to do temporary stack relocations for PLA/STA vblank uploads. I'm not sure if this technique would have as much use on SNES, but maybe not?
I use SP to write to the OAM buffer on Project Nested. It's very useful for what I'm doing but I wonder if something similar can be used efficiently for native games.
User avatar
rainwarrior
Posts: 8735
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Protecting against stack overflow/underflow

Post by rainwarrior »

Myself086 wrote: Mon Aug 22, 2022 4:45 pmI use SP to write to the OAM buffer on Project Nested. It's very useful for what I'm doing but I wonder if something similar can be used efficiently for native games.
I'm curious how that works. I tried to search for it on the github code but I couldn't spot where it happens.

For the purpose I'm used to on NES, I think normally WMDATA would offer the same benefit as PLA without having to relocate the stack, and slight FastROM speed advantage? Though, if the target is a 16-bit register PLA would still be much better. ...but then again both of these cases are probably better served by DMA in most cases...

Though I'm used to PLA/STA pattern on NES, in lieu of DMA, but I suppose SNES has cases that would make PHA useful for storing to a buffer serially, which WMDATA couldn't accommodate as well if it's 16-bit data.
Myself086
Posts: 158
Joined: Sat Nov 10, 2018 2:49 pm

Re: Protecting against stack overflow/underflow

Post by Myself086 »

rainwarrior wrote: Mon Aug 22, 2022 5:07 pm
Myself086 wrote: Mon Aug 22, 2022 4:45 pmI use SP to write to the OAM buffer on Project Nested. It's very useful for what I'm doing but I wonder if something similar can be used efficiently for native games.
I'm curious how that works. I tried to search for it on the github code but I couldn't spot where it happens.
https://github.com/Myself086/Project-Ne ... O.4014.asm
Entry point is "IO__w4014_In"

SP points to the SNES OAM buffer.
DP points to the NES OAM buffer or a shadow copy if the transfer was requested outside of pages $00-$1f.
Pushes are as fast as accessing DP and is indeed faster than my previous methods that weren't using SP.

There are 4 variants: 8x8 or 8x16 and limit or no limit. All of which are unrolled.

The 8x8 variants make use of PEI which is hardly ever useful.

I refer to the OAM bytes as X, Y, T, A.

After everything is copied over to the SNES OAM buffer, only the changing bytes in OAM buffer are uploaded to OAM during vblank, this is possible because gaps are removed as part of using SP. Every unused sprite is skipped during conversion from NES to SNES format.
User avatar
rainwarrior
Posts: 8735
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Protecting against stack overflow/underflow

Post by rainwarrior »

Ah neat. Yeah that's a good use case for the PEA/PEI instructions too.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Protecting against stack overflow/underflow

Post by calima »

I don't see any perf issues with this really. It's not like we're still running P3s that can barely emulate SNES, even if you're running an inefficient emu and checking your stack ops in lua/whatever, it will still run at full speed. A native compiled check? Leaf in the wind.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Protecting against stack overflow/underflow

Post by Oziphantom »

Side note.
Always init your stack to a know value, as Reset might do this or that, jmp ($FFFC) won't and you might put in a soft reset at some point to handle "return to title screen" or something else similar to that.

Break points are the really the only way to put in guards, as your stack probably won't be static and you may have multiple stacks you need to guard against. To which a simple if the SP > X will then fail as you have moved your stack.

Working out the min and max used stack again gets tricky when you move it around is probably something code analysis should handle, however write some code to fill your main stack with a sentinel pattern, play the game, check as much as you can to avoid the "nobody pressed up while pruning code paths" problem. Then look up the last modified address and then add a couple for safety.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: Protecting against stack overflow/underflow

Post by Garth »

jeffythedragonslayer wrote: Mon Aug 22, 2022 11:16 am But the stack is a circular buffer, so (on the 6502) you can initialize the stack pointer to anywhere on page one (or even not initialize it at all!) and everything will still be just fine, even through the push and pull that causes the stack pointer to wraparound between $100 and $1FF. The CPU doesn't care.
That's neglecting a very useful stack-relative addressing which you can do even on the '02. The following is from the middle of the stack-addressing page of my treatise on 6502 stacks (stack plural, not just the page-1 hardware stack):

  • So far, regarding the hardware stack, we have only approached it as a last-on-first-off memory. That's not to say there aren't tricks to read or replace items in the stack that aren't at the top. There are! It's just that the additions to, and removals from, the stack only happen at the top. Stack items further down the stack are still accessible, without having to pull everything above them off first.

    On the 6502, it starts with TSX, the "Transfer Stack pointer to X register" instruction. (Previously we touched on TXS to go the opposite direction (ie, X to Stack pointer) in the reset routine to initialize the stack pointer. Don't confuse them.) TSX lets us use X instead of S for indexing, and the instructions available in the abs,X addressing mode allow changing the base address, so we can index different depths into the stack, without even changing the X value, or even necessarily caring what it is.

    Note that stack addressing is the same as a 100,S addressing mode, with a hypothetical preceding INS or trailing DES (INncrement S or DEcrement S) instruction built in, for pulling or pushing a byte, respectively. You could synthesize the stack instructions with the abs,X addressing mode. (Part of the penalty is that you lose the use of X for other things if you don't save it somewhere else—possibly on the stack itself.) You could for example synthesize PHA with STA $100,X, DEX, and PLA with INX, LDA $100,X. But why would you? You wouldn't. It would be a waste, since PHA and PLA are built in.

    But! there is a related technique for gaining stack-relative addressing. Again, stack addressing is indexed already; so stack relative addressing is doubly indexed. When you write the program, you won't normally know the value of S (the stack pointer register) at any given point; but if you do for example TSX, ADC $105,X (it was mentioned near the top of this section that you cannot directly pull a byte off the stack and add it to the accumulator's content all in one step), you have the base value $100, plus the index 5 that you added, plus the X index which you got from S, and now you've just accessed the fifth byte on the stack without first pulling off everything above it to get there. Also, it's still there for future use. You accessed it without pulling (or in the language of some other processors, "popping") it. Going in order then is not required, and you're not limited to just PLA, PLX, and PLY, (as we showed with the ADC example). (To make things easier, the 65816 has additional built-in stack-relative addressing modes. These will be covered briefly in section 10 and further in section 13.)

    Note that for something like the TSX, LDA $105,X example above to always work, you must start the program with (or have in the reset routine) LDX #$FF, TXS (or replace $FF with whatever is the highest address in page 1 that you want the hardware stack to use). Otherwise, with a random start point, if the stack wraps from $100 to $1FF to $1FE long before the stack space is full, the abs,X indexing could put you into page 2, completely out of the stack area. For example, if you started randomly at $107 and put nine bytes on the stack, they would be at $107 through $100, then wrap to $1FF, and S would contain $FE. Now if you do TSX, LDA $105,X, $105+$FE puts you at $203, which is out of the stack area.

    Another way to do stack-relative addressing in stack frames (which will be discussed later, in Section 14, on local variables and environments) is the (ZP),Y addressing mode. <snip>

Note the importance of initializing the stack!

Section 15 is on recursion.

As for measuring where you are in the stack, this is the 2nd-to-last paragraph of the "Enough Stack Space?" page of my 6502 stacks treatise:

  • There is no warning of overflowing or underflowing the stack, or even imminent over/underflow. If underflow happens, you have a serious bug you need to fix; but overflow may be a semi-innocent result of running too much nesting, especially with a recursive subroutine, or of putting unexpectedly large sets of data on the stack when there wasn't enough room left. Again, normally you'll have plenty of stack space and no protections are needed; but TSX, CPX can be used to check the depth of the hardware stack. In the case of a ZP data stack, the pointer is X, and again you can check that easily. Section 18 ("Stacks Potpourri") has a section on measuring stack depth, and valid reasons for doing so.

The '816 of course lends itself better to large stack frames; but even there, you probably don't need to allow any more than one extra page. How much of course depends on what you're doing. You'll have to figure out what's the maximum you might need, worst case.

In all my decades of 6502/816 programming, I don't remember ever underflowing or overflowing the hardware stack. I think that once you get past the beginner stages, it takes very little care to keep from losing control.

Pokun wrote: I thought the RESET interrupt worked exactly like every other interrupt which includes pushing both PC and P to the stack. But since neither PC, P, SP nor RAM are initialized on boot, that normally doesn't mean much to the programmer (unless non-volatile RAM is used).
I came across at least one manufacturer of the '02 that wrote to the stack in the reset sequence. It could be helpful in figuring out which way the bear went into the woods if you crash and have to press the reset button to get control back. The reset sequence would push the program counter onto the stack, which then you could examine and see if for example you had a loop there whose exit condition is never met. I don't remember which brand(s) of 6502 or 65c02 did this.
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
jeffythedragonslayer
Posts: 344
Joined: Thu Dec 09, 2021 12:29 pm

Re: Protecting against stack overflow/underflow

Post by jeffythedragonslayer »

Garth wrote: Sun Aug 28, 2022 1:04 amIn all my decades of 6502/816 programming, I don't remember ever underflowing or overflowing the hardware stack. I think that once you get past the beginner stages, it takes very little care to keep from losing control.
Sure. What I'm trying to accomplish with this emulator feature is to make these beginner stages less intimidating.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: Protecting against stack overflow/underflow

Post by Garth »

jeffythedragonslayer wrote: Sun Aug 28, 2022 4:22 pm
Garth wrote: Sun Aug 28, 2022 1:04 amIn all my decades of 6502/816 programming, I don't remember ever underflowing or overflowing the hardware stack. I think that once you get past the beginner stages, it takes very little care to keep from losing control.
Sure. What I'm trying to accomplish with this emulator feature is to make these beginner stages less intimidating.
Would you be single-stepping? If so, part of the display would be telling you what's in the stack pointer register, right? Is there any need for more than that?
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
jeffythedragonslayer
Posts: 344
Joined: Thu Dec 09, 2021 12:29 pm

Re: Protecting against stack overflow/underflow

Post by jeffythedragonslayer »

Garth wrote: Sun Aug 28, 2022 6:00 pm
jeffythedragonslayer wrote: Sun Aug 28, 2022 4:22 pm
Garth wrote: Sun Aug 28, 2022 1:04 amIn all my decades of 6502/816 programming, I don't remember ever underflowing or overflowing the hardware stack. I think that once you get past the beginner stages, it takes very little care to keep from losing control.
Sure. What I'm trying to accomplish with this emulator feature is to make these beginner stages less intimidating.
Would you be single-stepping? If so, part of the display would be telling you what's in the stack pointer register, right? Is there any need for more than that?
Yes, yes, and yes. The reason people single-step debug is that they aren't sure which line of code the bug is on (and/or unsure about the machine state), otherwise they would fix it in the source code as soon as they see it in the source. On a project with several thousands of lines of code, single-stepping can be a lot of work. An emulator that can notice when the program is doing something wrong can tell the programmer exactly which line the problem was noticed on (therefore the bug has to have been encountered on that line or before) and help them drill down into where the bug is faster than an emulator that can't.
Last edited by jeffythedragonslayer on Sun Aug 28, 2022 7:00 pm, edited 1 time in total.
User avatar
rainwarrior
Posts: 8735
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Protecting against stack overflow/underflow

Post by rainwarrior »

I guess I'd agree that stack underflow and overflow are pretty rare. Most stack problems are just missing one of a push/pull pair, which won't underflow but end up returning to a nonsense location to resume execution. For this in particular, I've found filling the unused areas of ROM with 0 = BRK and having a BRK handler (or just break on BRK in debugger) frequently picks up the problem for me, though usually a trace is needed to understand how it got there. Mesen's history viewer can be super helpful for rewinding and replaying stuff like this.

Overflow most often happens with out of control recursion, which also usually creates a very evident error, most often an infinite loop hang. Usually easy to diagnose.

Underflow.... I dunno, can't really think of many times I've seen it, offhand. The most common stack error usually doesn't pull enough from the stack to get back to the top. I think I've seen it happen from the RTS-to-nonsense problem, if it his a few RTSes by coincidence, but by then it's usually well past the point of error.

If you have a really limited stack space overflow might be more of a problem, but I've usually allowed 100 bytes or so and used only a small portion of it. Unless storing significant data structures on the stack with significant call depth, it just isn't a big problem? At least, not on these systems.
User avatar
jeffythedragonslayer
Posts: 344
Joined: Thu Dec 09, 2021 12:29 pm

Re: Protecting against stack overflow/underflow

Post by jeffythedragonslayer »

If you're writing a lot of subroutines that pass parameters by stack it's more common to mess up the stack.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: Protecting against stack overflow/underflow

Post by Garth »

jeffythedragonslayer wrote: Sun Aug 28, 2022 6:25 pm
Garth wrote: Sun Aug 28, 2022 6:00 pm
jeffythedragonslayer wrote: Sun Aug 28, 2022 4:22 pm
Sure. What I'm trying to accomplish with this emulator feature is to make these beginner stages less intimidating.
Would you be single-stepping? If so, part of the display would be telling you what's in the stack pointer register, right? Is there any need for more than that?
Yes, yes, and yes. The reason people single-step debug is that they aren't sure which line of code the bug is on (and/or unsure about the machine state), otherwise they would fix it in the source code as soon as they see it in the source.
The only time I had any use for a simulator was when I was new to a processor and would get the wrong addressing mode. I could look right at it and think it was correct, but when I single-stepped and looked at the registers and program counter in the simulator before and after executing the instruction, I'd see that it wasn't what I wanted. Otherwise my main need for a simulator was for the I/O, and I've never seen one that was really any good, since so much of it (input especially) is asynchronous in real life. Even Microchip's, which simulates their microcontrollers' onboard timers and I/O and processor support, proved to be useless to me.

On a project with several thousands of lines of code, single-stepping can be a lot of work. An emulator that can notice when the program is doing something wrong can tell the programmer exactly which line the problem was noticed on (therefore the bug has to have been encountered on that line or before) and help them drill down into where the bug is faster than an emulator that can't.
I highly doubt a simulator (a true emulator involves hardware, not just software—if I may be a purist) will have the intelligence to figure out and tell you which way the bear went into the woods. By the time it senses that the program is doing something wrong or that the stack overflowed or underflowed, the cause may be thousands of cycles in the past.

If you're writing a lot of subroutines that pass parameters by stack it's more common.
The '816 (which is more conducive to using large stack frames than the '02 is) lets you allot a lot more memory for the stack than the '02 does; so if you figure out the maximum stack space you'd need under the most extreme circumstances and allot that much plus a small pad, you should always be fine.

When you know you're accessing the stacks constantly but don't know what the maximum depth is you're using, the tendency is to go overboard and keep upping your estimate, "just to be sure." I did this for years myself, and finally decided to do some tests to find out. I filled the 6502 stack area with a constant value (maybe it was 00—I don't remember), ran a heavy-ish application with all the interrupts going too, did compiling, assembling, and interpreting while running other things in the background, and after a while looked to see how much of the stack area had been written on. It wasn't really much—less than 20% of each of page 1 (return stack) and page 0 (data stack). This was in Forth, which makes heavy use of the stacks. The IRQ interrupt handlers were in Forth too, although the software RTC (run off a timer on NMI) was in assembly language.
http://WilsonMinesCo.com/ lots of 6502 resources
Post Reply