Nintendo Jump tables
Posted: Tue May 29, 2012 10:26 pm
by Movax12
This is from SMB and is pretty smart:
Code:
OperModeExecutionTree:
lda OperMode ;this is the heart of the entire program,
jsr JumpEngine ;most of what goes on starts here
.dw TitleScreenMode ; <-- *
.dw GameMode
.dw VictoryMode
.dw GameOverMode
;....
;later somewhere else in ROM:
;$04 - address low to jump address
;$05 - address high to jump address
;$06 - jump address low
;$07 - jump address high
JumpEngine:
asl ;shift bit from contents of A
tay
pla ;pull saved return address from stack
sta $04 ;save to indirect
pla
sta $05
iny
lda ($04),y ;load pointer from indirect
sta $06 ;note that if an RTS is performed in next routine
iny ;it will return to the execution before the sub
lda ($04),y ;that called this routine
sta $07
jmp ($06) ;jump to the address we loaded
This is an interesting solution, and it was used in other games too, but is using the stack really better than having a label to reference the pointers where I marked with an '*'? For example, you could load X and Y with the high and low bytes of the label and then call the jump engine. Or is there a big advantage I am missing?
(Note there are multiple blocks of code that call the jumpengine, not just OperModeExecutionTree.)
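For anyone following the listing: the first INY is needed because JSR pushes the address of its own last byte, not the address of the next instruction, so the pointer in $04/$05 sits one byte before the table. A rough trace as comments, using made-up addresses (the $8212 placement is an assumption for illustration only, not taken from the SMB ROM):

```asm
; Hypothetical layout, for illustration only:
;   $8212: jsr JumpEngine        ; occupies $8212-$8214, pushes $8214
;   $8215: .dw TitleScreenMode   ; entry 0
;   $8217: .dw GameMode          ; entry 1
;
; Entering JumpEngine with A = 1:
;   asl / tay   -> Y = 2                 (entries are 2 bytes each)
;   pla, pla    -> $04/$05 = $8214       (low byte is pulled first)
;   iny         -> Y = 3, lda ($04),y reads $8217 (low byte of entry 1)
;   iny         -> Y = 4, lda ($04),y reads $8218 (high byte of entry 1)
;   jmp ($06)   -> continues at GameMode
```

Because the return address was pulled off the stack, an RTS in the handler returns to whoever called OperModeExecutionTree. Since Y is only 8 bits, this form tops out at 127 entries (indexes 0 to 126).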
Posted: Tue May 29, 2012 11:55 pm
by Memblers
For that one example by itself, you could replace the PLA instructions with "LDA table,y" and get the same result. But using the stack like that allows a unique table at every call site. That matters for SMB because the routine is reused by many callers and SMB has no PRG-ROM left. Even some PRG data is stored in CHR and read out manually.
In the example of having X and Y contain the high/low of the label, it seems better to put the address straight into zero page instead, which leaves nothing for the jump engine to do except jump.
Posted: Wed May 30, 2012 5:03 am
by Movax12
Actually, I thought about it more. Keeping the same structure and using X and Y (or zero page) for the table pointer is okay, but then you would need two RTS, or still PLA, PLA, RTS, to return to the code that first called into the jump table, so it might as well be done this way.
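As a point of comparison, here is a rough sketch of the register-passing alternative discussed above. The names OperModeTable and JumpEngineXY are hypothetical, the `<`/`>` low/high-byte operators assume an asm6-style assembler, and none of this is from the SMB source.

```asm
; Hypothetical alternative: table address passed in X (low) / Y (high).
OperModeExecutionTree:
      ldx #<OperModeTable   ; low byte of table address
      ldy #>OperModeTable   ; high byte of table address
      lda OperMode
      jmp JumpEngineXY      ; JMP, not JSR: the handler's RTS then
                            ; returns straight to our own caller

OperModeTable:
      .dw TitleScreenMode
      .dw GameMode
      .dw VictoryMode
      .dw GameOverMode

JumpEngineXY:
      stx $04               ; table pointer -> $04/$05
      sty $05
      asl                   ; A = index * 2 (entries are 2 bytes)
      tay
      lda ($04),y           ; low byte of handler address
      sta $06
      iny
      lda ($04),y           ; high byte of handler address
      sta $07
      jmp ($06)             ; jump to the handler
```

Using JMP at the call site sidesteps the second RTS, but each call site spends 4 more bytes on the LDX/LDY pair than the 3-byte JSR form, which adds up when many routines dispatch this way.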
Posted: Wed May 30, 2012 7:08 pm
by mcmartin
This is a fun trick - I associate it with OO-style method calls, myself. I first encountered it in Gradius, whose version is longer but preserves the X and Y registers while using the same 4 bytes of RAM:
Code:
asl ; A = A * 2
stx $9B ; Cache X and Y
sty $9A
tay
iny ; Y = A + 1
pla ; Put RTS's return address in $98
sta $98
pla
sta $99
lda ($98), y ; Y is the offset for the A'th address
tax ; after the caller's JSR. Read that
iny ; address...
lda ($98), y
sta $99 ; And put it in $98-$99.
stx $98
ldy $9A ; Restore arguments
ldx $9B
jmp ($0098) ; Then jump there.
The calling convention is otherwise identical. I didn't get much further in my disassembly of Gradius, but that trick alone was worth the price of admission.
Posted: Wed May 30, 2012 8:49 pm
by lidnariq
The exact same instructions from the SMB disassembly are also in Galaxian (with different addresses).
Posted: Wed May 30, 2012 9:41 pm
by Movax12
Apparently this style of code is in many NES games. Metroid has the version that preserves X and Y. My question was basically whether this is really a good solution or whether whoever coded it was outsmarting themselves with cleverness, but I suppose it is a decent way to solve that problem.
Posted: Thu May 31, 2012 12:07 am
by strat
The Super Mario Bros. style of dynamic jumping shows up in Game Boy games too, only the Z80 allows the actual jump to be done through registers.
From Balloon Kid:
Code:
048A:
add a
pop hl
ld e,a
ld d,00h
add hl,de
ld e,(hl)
inc hl
ld d,(hl)
push de
pop hl
ld pc,hl
Posted: Thu May 31, 2012 4:47 am
by Shiru
Strange. Why did they use that slow push/pop sequence instead of this:
Code:
...
ld a,(hl)
inc hl
ld h,(hl)
ld l,a
jp (hl)
Posted: Thu May 31, 2012 5:17 am
by smkd
Movax12 wrote:Apparently this style of code is in many NES games. Metroid has the version that preserves X and Y. My question was basically whether this is really a good solution or whether whoever coded it was outsmarting themselves with cleverness, but I suppose it is a decent way to solve that problem.
It's not the fastest way to do it, but it's really compact. Passing a pointer while also jumping to the dispatch code in only 3 bytes is pretty good. Memblers makes a good point about PRG space being scarce. They would've been doing everything they could think of to save as much space as possible.
This appears in SNES games too. SMW uses the same trick, although with 24-bit addresses. You just have JSL instead of JSR, LDA [$xx],y instead of LDA ($xx),y, etc.
Posted: Thu May 31, 2012 11:34 pm
by strat
Shiru: That's nothing. All graphics tiles in Balloon Kid are visible in a tile editor - in contrast, very few SNES games, not even Super Mario World, can be seen that way - and apparently the first screen of each stage (the first one at least) is also stored uncompressed. I plan on going back to disassembling it, and it's going to be a letdown if they used that 128k space to just store uncompressed level data.
Posted: Thu May 31, 2012 11:49 pm
by strat
Shiru wrote:Strange. Why did they use that slow push/pop sequence instead of this:
Code:
...
ld a,(hl)
inc hl
ld h,(hl)
ld l,a
jp (hl)
Just for grins I swapped in that instruction sequence.
Maybe the programmer didn't think an indirect load from hl into h would work. It looks like, at least with these early games, the programmers didn't know everything about the chips they coded for. The sprite 0 hit check in SMB doesn't take advantage of 'bit' copying bit 6 into the V flag. And whoever reverse-engineered Metroid made fun of the NMI handler for saving the processor status.
Posted: Fri Jun 01, 2012 3:51 am
by tepples
strat wrote:Shiru: That's nothing. All graphics tiles in Balloon Kid are visible in a tile editor - in contrast, very few SNES games, not even Super Mario World, can be seen that way
Super Mario Kart object graphics are uncompressed, but then it uses Battletoads style sprite cel copying. Super Mario All-Stars tiles are uncompressed, but I guess it too needs them uncompressed to simulate an MMC3's CHR bankswitching with DMA to VRAM.
strat wrote:- and apparently the first screen of each stage (the first one at least) is also stored uncompressed. I plan on going back to disassembling it and it's going to be a let down if they used that 128k space to just store uncompressed level data.
(Excuse me for the apologetics; I was a big fan of Balloon Kid at one time.)
Mask ROM fabrication rounds up the ROM size to a power of two. If a game is 128 KiB uncompressed or 68 KiB compressed, and you lack ideas for bonus minigames to fill the extra space, why waste effort on compression? That's why I didn't compress the 3.5 KiB of scripts in the cut scenes of Thwaite: it wouldn't have saved enough to let me add the things I wanted to add while keeping it NROM-128 should I ever get around to making version 0.04.
strat wrote:The Sprite 0 hit in SMB doesn't know 'bit' changes the V flag.
Apart from the 6502's famous die-space efficiency, one reason why Nintendo chose it is that it was an unfamiliar chip (or "stone"), as 8080 family CPUs were more popular in Japan at the time than the 6502 used in Apple, Commodore, and Atari products. See page 2 of this interview.
Posted: Fri Jun 01, 2012 11:43 am
by strat
That's really interesting, because I was also wondering why the Famicom didn't just use the same CPU as Donkey Kong. (Too bad they didn't go with the 65c02, if that was even out yet).
Normally, in porting Donkey Kong, the quickest way would have been to use the CPU in the arcade version. But Ricoh wanted us to use the 6502, which they had the license for. When I said I wanted to use the 6502 at Nintendo, the staff told me that I make such decisions because I didn’t make video games.
Posted: Fri Jun 01, 2012 1:45 pm
by 3gengames
^ Then the "game makers" ran with it, because in the long run it helped: the Z80 sucked at cycle efficiency and programming ease.

Posted: Fri Jun 01, 2012 3:31 pm
by tepples
But then the 6502 needed faster memory that responded within half a cycle, while the Z80 allowed a cycle and a half. This allowed Z80s to be clocked faster with the same spec memory chips, making up for the lower cycle efficiency. Compare a 1.8 MHz Ricoh 6502 clone (NES) to a 4.2 MHz Sharp 8080 clone with some Z80 features (Game Boy).