convoluted rts trick macro...need better approach

GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Post by GradualGames »

I had a few indirect jumps followed by hard-coded return locations in parts of my game engine. I decided to revise this and use the well-known "rts trick." The wiki pointed out that you need a subroutine that does the trick so that the jsr pushes the correct return address on the stack. However, I didn't like that I'd have to jump far away from the code just to jump far away again, so I developed this macro (ca65 syntax):

Code: Select all

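.macro indirectJsr address
  ; The offsets below assume "address" is a zero-page variable, so each
  ; "lda address" is 2 bytes and the whole macro is 13 bytes long.
  ; The value stored at "address" must be the target minus 1, because
  ; rts adds 1 to the address it pulls.

  ; Push the return address: the rts at the end of this macro,
  ; i.e. the following instruction minus 1.
  lda #>(*+12)
  pha
  lda #<(*+9)   ; * is 3 bytes further along here, so same location
  pha

  ; Push the destination (minus 1), high byte first.
  lda address+1
  pha
  lda address
  pha

  rts           ; "returns" into the destination routine

.endmacro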
Where * is the current program counter address (as calculated during assembly of your code). I wondered if anyone else has used a similar approach for their own usage of the rts trick?
Last edited by GradualGames on Thu Jan 28, 2010 7:40 am, edited 2 times in total.
Sivak
Posts: 316
Joined: Tue Jul 17, 2007 9:04 am
Location: Somewhere

Post by Sivak »

Interesting on the first part. But for the last part, you'd need to read from a table that has something like:

Code: Select all

TableOfPlaces:
 .dw DesiredAddressA - 1, DesiredAddressB - 1

LDY navigator
LDA TableOfPlaces + 1, Y
PHA
LDA TableOfPlaces, Y
PHA
RTS
That's how I did it, anyway. I think the indirect jump method is one fewer cycle though.
Disch
Posts: 1848
Joined: Wed Nov 10, 2004 6:47 pm

Post by Disch »

"Low overhead"?

This seems like high overhead. I never liked the "push your address then RTS it" crap. It always seemed absurd to me.

Code: Select all

; bytes,cycles
;  (my cycle count might be off.  I'm doing this from memory
;  and I'm rusty)

.macro indirectJsr address

  lda #>(*+12)  ; 2,2
  pha             ; 1,3
  lda #<(*+9) ; 2,2
  pha             ; 1,3
 
  lda address+1  ; 3,4
  pha         ; 1,3
  lda address  ; 3,4
  pha         ; 1,3
 
  rts    ; 1, 6

.endmacro

; total:   15 bytes
;           30 cycles
; AND your 'address' has to be -1 the actual address you want to jump to
;  (ugh)
The straightforward approach seems simpler:

Code: Select all

; this is pseudocode
; my ca65 macro syntax is rusty

.macro IndirectJSR address
  jmp phoneylabel_jsr  ; 3,3

phoneylabel_jmp:
  jmp (address)  ; 3,5

phoneylabel_jsr:
  jsr phoneylabel_jmp  ; 3,6
.endmacro

; total:  9 bytes
;          14 cycles
iirc you can have ca65 generate phoney labels that only appear in the macro, so it won't interfere with other labels in your program. I forget exactly how that works though.
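
For reference, ca65's `.local` directive is one way to get labels that exist only inside a single macro expansion. A minimal sketch of the same macro using it follows; the label names `do_jmp` and `do_jsr` are made up for illustration, and the byte and cycle counts are the same as in the pseudocode above.

```asm
.macro IndirectJSR address
  .local do_jmp, do_jsr   ; fresh labels for every expansion of the macro
  jmp do_jsr              ; 3,3  skip over the indirect jump
do_jmp:
  jmp (address)           ; 3,5  the target's rts returns to just after the jsr
do_jsr:
  jsr do_jmp              ; 3,6  pushes the return address, then takes the jmp
.endmacro
```

Because the labels are local, the macro can be expanded any number of times in the same file without "duplicate symbol" errors.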

But having a common indirect JMP somewhere in the hardwired bank and then JSRing to it still seems like the best solution.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: low overhead indirect jsr (rts trick)?

Post by tokumaru »

I have to honestly say that this method is not very good. First, even if you plant the return address like you're doing, I see no reason for you to use...

Code: Select all

  lda address+1
  pha
  lda address
  pha
  
  rts
...instead of...

Code: Select all

  jmp (address)
...which is much faster. It's like you want to use the RTS just for the heck of it, not because you need it. That trick is often used with jump tables, because you'd have to fetch the destination address from the table even if you were to use JMP (), but in your case the address is already at a known location in RAM, there is absolutely no need to copy it to the stack.

You also waste a lot of time planting the return address manually when you could do it with a JSR much quicker. If I were you I wouldn't worry about "having to jump far away from the code just to jump far away again", because although it sounds like a bad thing to do it's still faster and more compact than your current solution.

Here's how I do it: I have a few temp locations in ZP that I use as scratchpad memory. Somewhere in ROM I have a few (as many as necessary, but usually no more than 3 or 4) indirect jumps to some of the temp locations acting as subroutines.

Code: Select all

	Address0 .dsb 2
	Address1 .dsb 2
	Address2 .dsb 2
	Address3 .dsb 2

	(...)

CallAddress0:
	jmp (Address0)

CallAddress1:
	jmp (Address1)

CallAddress2:
	jmp (Address2)

CallAddress3:
	jmp (Address3)
Those locations act as virtual address registers, which I can use not only for indirect JSR'ing but also as pointers and such. Those few indirect JMPs take much less space than what your macro expands to.
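
A call site for one of these "address registers" might look like the sketch below. It is written in ca65 syntax (`.res` where the listing above uses `.dsb`), and `UpdateEnemy` and `Example` are hypothetical names used only for illustration.

```asm
.zeropage
Address0: .res 2            ; one 16-bit "address register"

.code
CallAddress0:
  jmp (Address0)            ; the target's rts returns to CallAddress0's caller

Example:
  lda #<UpdateEnemy         ; low byte of the real address (no -1 needed)
  sta Address0
  lda #>UpdateEnemy         ; high byte
  sta Address0+1
  jsr CallAddress0          ; behaves like the missing "jsr (Address0)"
  rts

UpdateEnemy:                ; hypothetical routine; any code ending in rts works
  rts
```

Only the two stores and the `jsr` appear at each call site; the 3-byte `jmp (Address0)` is shared by every caller.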
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Post by GradualGames »

I'm glad I posted. Thanks for the ideas/correction! I wrote a new macro based on Disch's idea, and holy crap, that's a lot simpler =).

Code: Select all

.macro indirectJsr address

  jmp *+6
  jmp (address)
  jsr *-3

.endmacro
*edit*
I guess I could use phony labels like Disch mentioned and do:

Code: Select all

  
jmp :++
: jmp (address)
: jsr :--
But I think I'll stick with the program counter approach, just to account for the extremely unlikely situation that I'm still using anonymous labels anywhere in my code. I tried to get rid of all of them a while back; they make code impossible to read.
Last edited by GradualGames on Wed Jan 27, 2010 9:23 pm, edited 2 times in total.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Disch wrote: I never liked the "push your address then RTS it" crap. It always seemed absurd to me.
I don't think it's necessarily crap, but it's also not the big find we sometimes make it out to be. It is 1 cycle slower than the indirect JMP way if the JMP uses ZP to hold the address, but 1 cycle faster if the JMP doesn't use ZP. Also, there are cases when we don't want to create a new variable just for a certain purpose, and we'd rather use the stack instead. But I admit that there are few advantages, if any, in using the RTS trick instead of an indirect jump.
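
The one-cycle difference can be checked by counting cycles for a jump-table dispatch done both ways. The sketch below is in ca65 syntax and assumes an even table index in Y and no page crossing on the indexed loads; `Ptr`, `Handlers`, `HandlersMinus1`, and the handler names are hypothetical.

```asm
.zeropage
Ptr: .res 2

.code
; RTS trick: the table holds each target minus 1.
DispatchRts:
  lda HandlersMinus1+1,y   ; 4
  pha                      ; 3
  lda HandlersMinus1,y     ; 4
  pha                      ; 3
  rts                      ; 6   total: 20 cycles

; Indirect JMP: the table holds the real targets.
DispatchJmp:
  lda Handlers+1,y         ; 4
  sta Ptr+1                ; 3   (4 if Ptr is outside zero page)
  lda Handlers,y           ; 4
  sta Ptr                  ; 3   (4 if Ptr is outside zero page)
  jmp (Ptr)                ; 5   total: 19 cycles (21 outside zero page)

HandlerA: rts
HandlerB: rts

.rodata
Handlers:       .word HandlerA, HandlerB
HandlersMinus1: .word HandlerA-1, HandlerB-1
```

So the RTS version lands between the two JMP variants: 20 cycles against 19 with a zero-page pointer and 21 with a pointer elsewhere in RAM.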
Celius
Posts: 2159
Joined: Sun Jun 05, 2005 2:04 pm
Location: Minneapolis, Minnesota, United States

Post by Celius »

If you're looking to have a substitute for the non-existent JSR ($XXXX), and you only want to use 2 bytes of RAM for the instruction, do this in your code:

Code: Select all

....
jsr IndirectJSR
....

IndirectJSR:
jmp ($XXXX)

That takes 3 or 5 extra cycles, and saves you a lot of hassle. And wherever $XXXX points to, you can have an RTS and it will take you right back to after the "jsr IndirectJSR". I haven't tested this method; I just came up with it and I think it would work great.

EDIT: Oh, I guess Disch already kind of posted the same solution! Except I'm not sure why there's a JMP to the JSR, which is after the JMP ($XXXX). Why not just have one universal "IndirectJSR" routine that you use and never have to define again? I suppose if you're using different values for $XXXX, then yes, you'd want more than one routine, but it wastes time to needlessly stick in a JMP + 6 to skip the indirect jump that you JSR to... It seems the macro makes things easier to program, but performance goes down a little, and it takes up more space, it seems.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Celius wrote: Except I'm not sure why there's a JMP to the JSR, which is after the JMP ($XXXX).
Disch's solution is a macro which is to be used "in place", so if you want to return to the correct location later you have to skip the indirect jump.
Celius wrote: Why not just have one universal "IndirectJSR" routine that you use and never have to define again? I suppose if you're using different values for $XXXX, then yes, you'd want more than one routine
Which is the solution I presented. With 3 or 4 fake "address registers" I have never had to worry about this again.
Celius wrote: It seems the macro makes things easier to program, but performance goes down a little, and it takes up more space, it seems
In this case, yes. Usually macros do need more space, but they are supposed to be faster, because there is no calling and returning. In this particular case, though, using macros is indeed a bit slower, so I really don't see a reason to use them here.

For every address you call this macro with, two JMP instructions will be generated, when you could very well manually write just the indirect one somewhere else... So you are really just wasting space and time (it may not be much, but there is no advantage here that justifies the waste) IMO.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: low overhead indirect jsr (rts trick)?

Post by tepples »

tokumaru wrote: I have a few temp locations in ZP that I use as scratchpad memory [for] indirect jumps to some of the temp locations
So you're writing the function pointer to a global variable in RAM. You almost hit on the one advantage of using the stack for jump table calls vs. a temporary location in allocated memory (zero page or BSS): you can use jump tables in both your main thread and your interrupt handler without them stepping on each other.
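
To illustrate the point about interrupts, here is a sketch in ca65 syntax of a dispatcher that keeps the target address only on the stack, so the main thread and the NMI handler can both call it. All names (`Dispatch`, `MainTask`, `NmiTask`, `HandlersMinus1`, the handlers) are hypothetical.

```asm
.zeropage
MainTask: .res 1          ; handler number * 2, owned by the main thread
NmiTask:  .res 1          ; handler number * 2, owned by the NMI handler

.code
; In: X = handler number * 2. No RAM is written, so an interrupt arriving
; between any two instructions cannot clobber a half-written pointer.
Dispatch:
  lda HandlersMinus1+1,x  ; high byte of (target - 1)
  pha
  lda HandlersMinus1,x    ; low byte of (target - 1)
  pha
  rts                     ; jumps to the handler; its rts returns to our caller

MainStep:
  ldx MainTask
  jsr Dispatch
  rts

Nmi:
  pha                     ; save A, X, Y
  txa
  pha
  tya
  pha
  ldx NmiTask
  jsr Dispatch
  pla                     ; restore Y, X, A
  tay
  pla
  tax
  pla
  rti

HandlerA: rts
HandlerB: rts

.rodata
HandlersMinus1:
  .word HandlerA-1, HandlerB-1
```

With a shared zero-page pointer instead, an NMI that fires between the main thread's two `sta` instructions would overwrite the pointer, and the main thread would then jump through a mixed-up address. The handlers themselves still have to avoid sharing scratch variables between the two contexts.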
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Post by GradualGames »

Thanks for the additional pointers. I'm starting to learn that Disch likes to frame his answer assuming the OP had some reason for the original way they structured their code: he once found a way to solve a convoluted problem I was trying to solve where I didn't really need the solution (wraparound test, remember?), and he has done it again for me =) In both cases, it seems I should just change my approach and do it the simpler way. I was using my macro in several different places in my code where I use a specific ZP variable to hold the address to jump to. I guess I'll change it to just a jsr to a location that does a jmp (whatever I was passing into the macro), which saves the extra jmp. Thanks everyone!

For some reason, I had been really locked into that "rts trick" thing, thinking it was the only way to simulate an indirect jsr. It didn't occur to me to search for some other way of doing it, hence my original, rather convoluted macro. Is there really any value to using the rts trick?
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Post by Bregalad »

The only place I use the rts trick, I used it instead of jmp() because it saved me 2 bytes (2 times pha instead of 2 times sta zeropage), and because it removes the need of 2 temp variables.

There is absolutely no other advantage of this over a regular jmp().
UncleSporky
Posts: 388
Joined: Sat Nov 17, 2007 8:44 pm

Post by UncleSporky »

I want to make sure I have something straight here, regarding the jmp indirect bug:

jmp ($xxxx) is safe to use in this case because you are jumping to a variable, which is obviously not going to straddle a page boundary? (Unless you set it up that way...)
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

UncleSporky wrote: jmp ($xxxx) is safe to use in this case because you are jumping to a variable, which is obviously not going to straddle a page boundary?
Correct. You can .align 2 before declaring the variable to be absolutely sure.
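
For context, the bug in question is that on the NMOS 6502 `jmp ($xxFF)` fetches the high byte of the target from `$xx00` rather than from the start of the next page. A sketch of guarding a pointer against that in ca65 follows; `JumpVector` and `CallVector` are hypothetical names, and depending on your ld65 configuration the segment may need a matching `align` attribute for `.align` to be accepted.

```asm
.bss
.align 2                  ; even address, so the pointer cannot start at $xxFF
JumpVector: .res 2

.code
CallVector:
  jmp (JumpVector)

; Extra guard: fail the build if the low byte ever lands on $FF.
; The address is not known until link time, so this check is expected to be
; deferred to the linker.
.assert .lobyte(JumpVector) <> $FF, error, "JumpVector straddles a page boundary"
```

Either measure alone is enough; the assertion has the advantage of not depending on how the segment is laid out.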