6502 ASM trick

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

After you've tried coding for the Atari 2600 and its 128 bytes of RAM, the 2KB of the NES seem like a lot of space, and suddenly flags that use entire bytes don't sound so bad.
zzo38
Posts: 1080
Joined: Mon Feb 07, 2011 12:46 pm

Re: 6502 ASM trick

Post by zzo38 »

tokumaru wrote:I also like very much the one where you push an address (minus 1) to the stack and then use the RTS instruction to jump to that address. This can be useful for implementing jump tables, and I'm using this a lot in my game.
I have thought of the same thing and used that in modifying PPMCK.
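This is a small C model of the stack arithmetic behind the RTS trick. It is not 6502 code. PHA stores to page 1 and then decrements S. RTS pulls the low byte, then the high byte, and adds 1. So the caller must push target-1, high byte first.

The names `Stack`, `push_rts_target`, `rts` and `dispatch` are hypothetical. Real jump tables often store their addresses already minus 1, so that no subtraction is needed at run time. This sketch subtracts at push time for clarity.

```c
#include <assert.h>
#include <stdint.h>

/* Page-1 hardware stack: a push writes at S then decrements S,
   a pull increments S then reads. */
typedef struct {
    uint8_t mem[256];
    uint8_t sp;
} Stack;

static void push(Stack *s, uint8_t v) { s->mem[s->sp--] = v; }
static uint8_t pull(Stack *s) { return s->mem[++s->sp]; }

/* Caller side of the trick: push (target - 1), high byte first. */
static void push_rts_target(Stack *s, uint16_t target) {
    uint16_t minus_one = (uint16_t)(target - 1);
    push(s, (uint8_t)(minus_one >> 8));
    push(s, (uint8_t)(minus_one & 0xFF));
}

/* RTS: pull the low byte, then the high byte, then add 1. */
static uint16_t rts(Stack *s) {
    uint8_t lo = pull(s);
    uint8_t hi = pull(s);
    return (uint16_t)(((hi << 8) | lo) + 1);
}

/* Jump-table dispatch: "return" into table[index]. */
static uint16_t dispatch(Stack *s, const uint16_t *table, uint8_t index) {
    push_rts_target(s, table[index]);
    return rts(s);
}
```

After a dispatch, S is back where it started, just as with a real JMP.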

Another (very simple) thing: if a JSR to a subroutine is immediately followed by RTS, you don't need the return address on the stack at all. Just JMP directly to the subroutine instead, making it a tail call. (Some NES programs did not do this, and I have fixed them.)
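The saving from a tail call can be tallied from the standard 6502 instruction sizes and timings:

- JSR absolute: 3 bytes, 6 cycles, 2 bytes pushed
- RTS: 1 byte, 6 cycles
- JMP absolute: 3 bytes, 3 cycles

The C sketch below (hypothetical names) just does that bookkeeping. The callee's own RTS runs in both versions, so it is left out.

```c
#include <assert.h>

/* Bytes of code, cycles, and bytes pushed on the stack. */
typedef struct { int bytes, cycles, stack; } Cost;

static const Cost JSR_ABS = {3, 6, 2}; /* pushes a 2-byte return address */
static const Cost RTS_IMP = {1, 6, 0};
static const Cost JMP_ABS = {3, 3, 0};

/* "JSR target / RTS" at the end of a routine: both instructions run. */
static Cost jsr_rts_tail(void) {
    Cost c = { JSR_ABS.bytes + RTS_IMP.bytes,
               JSR_ABS.cycles + RTS_IMP.cycles,
               JSR_ABS.stack };
    return c;
}

/* "JMP target": the target's own RTS returns straight to our caller. */
static Cost jmp_tail(void) { return JMP_ABS; }

/* What the tail call saves compared with JSR/RTS. */
static Cost tail_call_savings(void) {
    Cost a = jsr_rts_tail();
    Cost b = jmp_tail();
    Cost d = { a.bytes - b.bytes, a.cycles - b.cycles, a.stack - b.stack };
    return d;
}
```

So each tail call saves 1 byte of code and 9 cycles. It also saves 2 bytes of stack depth while the target runs.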
strat
Posts: 396
Joined: Mon Apr 07, 2008 6:08 pm
Location: Missouri

Post by strat »

I thought of a way to handle an object's state with bit-packed variables (2 bits each, so 4 variables per byte).

	LDA #$55 ; becomes #$AA
	STA var
	ldx #4
.Loop:
	JSR CheckState
	ROL
	ROL
	STA var
	DEX
	BNE .Loop
	ROL		; 9th ROL: A and C are back in their original positions
	STA var
	
	brk

CheckState:
	lda var
	bit var

	bpl .0x
	bvc .10
	jmp MakeState00		; @11
	
.10:		; @10
	jmp MakeState11
.0x:
	bvc .00
	jmp MakeState10	; @01
	
.00:	jmp MakeState01	; @00
	
MakeState00:
	AND #$3F
	RTS
MakeState01:
	AND #$3F
	ORA #$40
	RTS
MakeState10:
	AND #$3F
	ORA #$80
	RTS
MakeState11:
	ORA #$C0
	RTS
This way there's no need for a bunch of ANDs and CMPs.
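A C model of strat's loop may help show why the trailing ROL is there. ROL rotates through the carry, so A plus C form a 9-bit ring. After the 4 × 2 ROLs, one more ROL is needed to bring every bit, including the incoming carry, back into place. The carry never sits in bits 7 or 6 when CheckState looks at them, so it cannot disturb the state decisions.

BIT is modelled simply as reading bit 7 (N) and bit 6 (V). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ROL: shift left, old carry into bit 0, old bit 7 into the carry. */
static uint8_t rol(uint8_t a, int *carry) {
    int out = a >> 7;
    a = (uint8_t)((a << 1) | *carry);
    *carry = out;
    return a;
}

/* CheckState: BIT copies bit 7 to N and bit 6 to V; the branches
   then step the top pair 00 -> 01 -> 10 -> 11 -> 00.
   AND/ORA/BIT leave the carry alone, so it is not touched here. */
static uint8_t check_state(uint8_t a) {
    int n = (a & 0x80) != 0;
    int v = (a & 0x40) != 0;
    if (n)
        return v ? (uint8_t)(a & 0x3F)            /* @11: MakeState00 */
                 : (uint8_t)(a | 0xC0);           /* @10: MakeState11 */
    return v ? (uint8_t)((a & 0x3F) | 0x80)       /* @01: MakeState10 */
             : (uint8_t)((a & 0x3F) | 0x40);      /* @00: MakeState01 */
}

/* The loop: four passes of (CheckState, ROL, ROL), then one more ROL.
   A and C form a 9-bit ring, so 9 ROLs in total leave every bit
   (including the incoming carry) where it started. */
static uint8_t step_all(uint8_t var, int carry) {
    uint8_t a = var;
    for (int x = 4; x != 0; x--) {
        a = check_state(a);
        a = rol(a, &carry);
        a = rol(a, &carry);
    }
    return rol(a, &carry);
}
```

The value stored inside the loop is only the partially rotated one. The finished byte exists only after that last ROL.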
Jeroen
Posts: 1048
Joined: Tue Jul 03, 2007 1:49 pm

Re:

Post by Jeroen »

tokumaru wrote:After you've tried coding for the Atari 2600 and its 128 bytes of RAM, the 2KB of the NES seem like a lot of space, and suddenly flags that use entire bytes don't sound so bad.
/me shivers thinking back at that one time he tried Atari.
Movax12
Posts: 529
Joined: Sun Jan 02, 2011 11:50 am

Re: 6502 ASM trick

Post by Movax12 »

Specific to ca65, but it may work in other assemblers:
Call any compatible subroutine, passing parameters on the stack, without messing around with the return address:

.macro call function, param1, param2, param3, param4, param5, param6

	lda #>@return
	pha
	lda #<@return-1
	pha
	.ifnblank param6
		lda param6
		pha
	.endif	
	.ifnblank param5
		lda param5
		pha
	.endif	
	.ifnblank param4
		lda param4
		pha
	.endif	
	.ifnblank param3
		lda param3
		pha
	.endif	
	.ifnblank param2
		lda param2
		pha
	.endif	
	.ifnblank param1
		lda param1
		pha
	.endif
		
	jmp function
	@return:
.endmacro
'function' should use PLA to get all parameters and RTS when done.
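The stack layout the macro sets up can be modelled in C as below. This is a sketch, and `call_push` and `callee` are hypothetical names.

The return address minus 1 goes on first, high byte then low byte. The parameters are then pushed last-to-first, so the callee's first PLA yields param1. This model subtracts 1 from the whole 16-bit address, which matters for the page-wrap issue raised later in the thread.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t mem[256]; uint8_t sp; } Stack;

static void pha(Stack *s, uint8_t v) { s->mem[s->sp--] = v; }
static uint8_t pla(Stack *s) { return s->mem[++s->sp]; }

/* Caller side of the macro: return address (minus 1) first, then
   the parameters last-to-first so param1 ends up on top. */
static void call_push(Stack *s, uint16_t ret, const uint8_t *params, int n) {
    uint16_t r = (uint16_t)(ret - 1);
    pha(s, (uint8_t)(r >> 8));
    pha(s, (uint8_t)(r & 0xFF));
    for (int i = n - 1; i >= 0; i--)
        pha(s, params[i]);
}

/* Callee side: PLA each parameter in order, then RTS
   (pull low byte, pull high byte, add 1). */
static uint16_t callee(Stack *s, uint8_t *got, int n) {
    for (int i = 0; i < n; i++)
        got[i] = pla(s);
    uint8_t lo = pla(s);
    uint8_t hi = pla(s);
    return (uint16_t)(((hi << 8) | lo) + 1);
}
```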

Improved version:

.macro call function, param1, param2, param3, param4, param5, param6, param7
.local return

	.if .paramcount > 3+1
		lda #>(return-1)
		pha
		lda #<(return-1)
		pha
	.endif
	.ifnblank param7
		lda param7
		pha
	.endif	
	.ifnblank param6
		lda param6
		pha
	.endif	
	.ifnblank param5
		lda param5
		pha
	.endif	
	.ifnblank param4
		lda param4
		pha
	.endif	
	.ifnblank param3
		ldy param3
	.endif	
	.ifnblank param2
		ldx param2
	.endif	
	.ifnblank param1
		lda param1
	.endif
	.if .paramcount <4+1
		jsr function
	.else
		jmp function
	.endif
	return:	
.endmacro
This fixes the possible page wrap in the return address at the cost of a byte or two. It also only uses the stack as needed (the first three parameters go in A, X and Y) and uses a local label.
Last edited by Movax12 on Thu Aug 09, 2012 8:53 am, edited 4 times in total.
Jarhmander
Formerly ~J-@D!~
Posts: 521
Joined: Sun Mar 12, 2006 12:36 am
Location: Rive nord de Montréal

Re: 6502 ASM trick

Post by Jarhmander »

Movax12 wrote:

.macro call function, param1, param2, param3, param4, param5, param6

	lda #>@return
	pha
	lda #<@return-1
	pha
        ...
Potential rare but ugly bug here. Imagine that, when the macro is expanded, "@return" resolves to $C000 (for example). Then it'll push $C0 and then $FF, so RTS will return to $C0FF + 1 = $C100 instead of $C000, and the program will go haywire. So the "-1" has to apply to the whole address, high byte and low byte alike, not just the low byte. Otherwise, nice trick.
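The arithmetic can be checked with a tiny C model of what RTS does with the two pushed bytes. `buggy` and `fixed` are hypothetical names. Subtracting 1 from the low byte alone goes wrong exactly when that low byte is $00.

```c
#include <assert.h>
#include <stdint.h>

/* Where RTS lands after pulling lo, then hi, and adding 1. */
static uint16_t rts_target(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((hi << 8) | lo) + 1);
}

/* Original macro: high byte of 'ret', low byte of 'ret' minus 1. */
static uint16_t buggy(uint16_t ret) {
    uint8_t hi = (uint8_t)(ret >> 8);
    uint8_t lo = (uint8_t)((ret & 0xFF) - 1);   /* $00 - 1 wraps to $FF */
    return rts_target(hi, lo);
}

/* Fixed macro: subtract 1 from the whole 16-bit address first. */
static uint16_t fixed(uint16_t ret) {
    uint16_t r = (uint16_t)(ret - 1);
    return rts_target((uint8_t)(r >> 8), (uint8_t)(r & 0xFF));
}
```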

Also, that somehow reminds me of some TI DSPs that don't have a "call", "jsr", "bx", "blx" instruction or anything equivalent. On those you have to put the return address in a register (B3) and then jump to your subroutine.
((λ (x) (x x)) (λ (x) (x x)))
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Re: 6502 ASM trick

Post by thefox »

Another (small) thing: it's better to use the .local control command in CA65 instead of cheap local labels (@foo) for labels inside a macro, because a cheap local label will still be visible outside the macro.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
Movax12
Posts: 529
Joined: Sun Jan 02, 2011 11:50 am

Re: 6502 ASM trick

Post by Movax12 »

Thanks for the feedback. I did know about .local in a macro, fixed. I think I fixed the return address problem in a slightly hacky way, but I am happy with it.
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Re: 6502 ASM trick

Post by thefox »

Movax12 wrote:Thanks for the feedback. I did know about .local in a macro, fixed. I think I fixed the return address problem in a slightly hacky way, but I am happy with it.
Why not this:

   lda #>(return-1)
   pha
   lda #<(return-1)
   pha
Movax12
Posts: 529
Joined: Sun Jan 02, 2011 11:50 am

Re: 6502 ASM trick

Post by Movax12 »

Oh yeah that's much better. Thanks :)

Added one last revision to that post. It could be improved more, but the idea is there anyway.
bogax
Posts: 34
Joined: Wed Jul 30, 2008 12:03 am

Re:

Post by bogax »

Bregalad wrote: Second part, I analyzed the passing of >4 bytes of parameters very carefully.
bogax's solution is officially the best.

I counted bytes as follows, assuming you have M functions that use N parameters:

- Standard solution (last 3 args in registers and anything else stored to ZP before calling): 9+N bytes for each call, 0 overhead for the callee
Overall: (9+N)*M bytes.

- Nintendo's solution (store arguments after the jsr opcode and have the callee call a function that copies args to ZP): 3+N bytes for each call and 3 bytes of overhead for the callee; the argument-retrieving function takes 32 bytes:
Overall: (6+N)*M + 32 bytes

- bogax's solution (jsr to a special function followed by the address of the actual function, # of arguments and arguments): 6+N bytes for each call, 0 overhead for the callee; the argument-retrieving function takes 29 bytes:
Overall: (6+N)*M + 29 bytes

So bogax's is the winner (by a close margin) if M>3, that is, if a program needs at least 4 functions with more than 3 bytes of arguments.
I haven't checked speed, but I'm sure bogax's is faster since it avoids an extra jsr/rts of overhead each time.

His routine could be improved even further by using BRK to simulate a JSR to the argument-retrieving function, if IRQs aren't used.
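Taking the three byte-count formulas exactly as quoted, a few lines of C can find where bogax's scheme starts beating the standard one. The function names are hypothetical.

N cancels out of the comparison (3M > 29), so under these formulas the crossover is at M = 10. bogax's scheme is also always 3 bytes smaller than Nintendo's. The "M>3" above doesn't follow from the formulas as written, so it may be counting something else, for example several calls per function.

```c
#include <assert.h>

/* Total bytes for M functions taking N bytes of parameters each,
   using the three formulas quoted above. */
static int standard_bytes(int n, int m) { return (9 + n) * m; }
static int nintendo_bytes(int n, int m) { return (6 + n) * m + 32; }
static int bogax_bytes(int n, int m)    { return (6 + n) * m + 29; }

/* Smallest M for which bogax's scheme is strictly smaller than the
   standard one. */
static int bogax_break_even(int n) {
    int m = 1;
    while (bogax_bytes(n, m) >= standard_bytes(n, m))
        m++;
    return m;
}
```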
Just to confuse the issue some more :P

It occurred to me to move the target addresses and numbers of parameters
into tables. I think this might work out better provided you're calling each
target routine several times.

 ;gets parameters inline
 ;then jumps to a target routine that uses the parameters
 ;the first byte inline is an index into tables of
 ;addresses of target routines and their associated
 ;number of parameters
 ;the low byte of the pointer to the inline parameters
 ;is kept in y and pushed back on the stack as the
 ;lo byte of the address the target routine returns to
 ;the parameters are fetched inline in ascending order
 ;but put into memory in descending order
 ;there's no provision for propagating a carry to the
 ;high byte of the inline pointer/return address
 ;so the inline parameters should not straddle a page boundary
 ;the lo byte of ptr must hold $00 (y supplies the lo byte)
 ;each routine must take at least one parameter
 ;(a count of 0 would make the loop copy 256 bytes)
 ;jmp_add must not sit at an $xxFF address
 ;(6502 indirect JMP page-wrap bug)
 ;you jsr to this routine supplying the index as the
 ;first byte inline followed by the parameters to fetch
 ;in reverse order

GET_PARAMETERS
 pla                       ;first get the return address
 tay                       ;into the inline parameters pointer
 pla                       ;lo byte in y
 sta ptr+1                 ;hi byte in ptr
 pha                       ;put the hi byte back on the stack
 iny
 lda (ptr),y               ;get the first byte inline
 tax                       ;and put it in x as an index
 lda jmp_tbl_lo,x          ;to get the target routine's
 sta jmp_add               ;address
 lda jmp_tbl_hi,x
 sta jmp_add+1
 lda num_parameters_tbl,x
 tax                       ;number of parameters to fetch
LOOP
 iny
 lda (ptr),y
 sta parameters-1,x
 dex
 bne LOOP
 tya                       ;put the lo byte of the return address
 pha                       ;back on the stack
 jmp (jmp_add)

jmp_tbl_lo
 .db <ROUTINE1,<ROUTINE2,<ROUTINE3  ;etc

jmp_tbl_hi
 .db >ROUTINE1,>ROUTINE2,>ROUTINE3  ;etc

num_parameters_tbl
 .db num_parameters1,num_parameters2,num_parameters3  ;etc
So you jsr to the GET_PARAMETERS routine, and the first inline byte
is an index into the GET_PARAMETERS tables. These have an entry for each
routine that uses GET_PARAMETERS, containing that routine's address and
the number of parameters to fetch.
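Here is a C model of what GET_PARAMETERS does with the bytes that follow the JSR. The table contents and function names are hypothetical.

The model keeps the high byte fixed and increments only the low byte, like the Y register in the original. That is why the inline bytes must not cross a page boundary. It fills `parameters` from the top down, as `sta parameters-1,x` does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for jmp_tbl_lo/jmp_tbl_hi and
   num_parameters_tbl: two routines taking 2 and 3 bytes. */
static const uint16_t jmp_tbl[] = {0x8100, 0x8200};
static const uint8_t num_parameters_tbl[] = {2, 3};

/* Model of GET_PARAMETERS. 'mem' is the 64 KB address space and
   'ret' the address JSR pushed (the last byte of the JSR itself).
   Copies the inline parameters into 'parameters' from the top down,
   stores in *new_ret the value pushed back for the target's RTS
   (the address of the last inline byte), and returns the target. */
static uint16_t get_parameters(const uint8_t *mem, uint16_t ret,
                               uint8_t *parameters, uint16_t *new_ret) {
    uint16_t page = (uint16_t)(ret & 0xFF00);  /* ptr+1: high byte only */
    uint8_t y = (uint8_t)(ret & 0xFF);         /* low byte lives in Y */
    y++;
    uint8_t x = mem[page | y];                 /* first inline byte: index */
    uint16_t target = jmp_tbl[x];
    x = num_parameters_tbl[x];                 /* bytes to copy (must be > 0) */
    do {
        y++;                                   /* INY never carries into page */
        parameters[x - 1] = mem[page | y];     /* sta parameters-1,x */
        x--;
    } while (x != 0);
    *new_ret = (uint16_t)(page | y);           /* TYA / PHA */
    return target;
}
```

Take a JSR at $9000 as an example. The pushed return address is $9002, the index byte sits at $9003, and the parameters follow it. The target's RTS then resumes right after the last parameter.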
bogax
Posts: 34
Joined: Wed Jul 30, 2008 12:03 am

Re:

Post by bogax »

strat wrote:I thought of a way to handle an object's state with bit-packed variables (2 bits or 4 variables per byte).

[...]
This way there aren't a bunch of and's and cmp's.
If I'm reading that right, you just increment two bits at a time;
that is, you've got four independent two-bit counters that count
up to 11 and then wrap back to 00.
How about just incrementing the top two bits, where a carry
can't do you any harm?

 lda var
 ldx #$04
LOOP
 clc
 adc #$40   ; increment top two bits
 asl        ; rotate two bits through
 adc #$80   ; the carry
 rol
 dex
 bne LOOP
 sta var
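bogax's version can be checked with a C model of ADC, ASL and ROL. The helper names are hypothetical.

The interesting part is that ASL / ADC #$80 / ROL rotates A left by two bits within 8 bits rather than 9:

- ADC #$80 copies the bit shifted out by ASL into bit 0.
- It also puts bit 6 (as it was before the ASL) into the carry.
- ROL then moves that carry into bit 0.

So after four passes every counter has been bumped once and the byte is back in its original orientation.

```c
#include <assert.h>
#include <stdint.h>

/* ADC (binary mode): A + operand + C, carry out of bit 7. */
static uint8_t adc(uint8_t a, uint8_t operand, int *carry) {
    unsigned sum = (unsigned)a + operand + (unsigned)*carry;
    *carry = sum > 0xFF;
    return (uint8_t)sum;
}

/* ASL: bit 7 into the carry, 0 into bit 0. */
static uint8_t asl(uint8_t a, int *carry) {
    *carry = a >> 7;
    return (uint8_t)(a << 1);
}

/* ROL: bit 7 into the carry, old carry into bit 0. */
static uint8_t rol(uint8_t a, int *carry) {
    int out = a >> 7;
    a = (uint8_t)((a << 1) | *carry);
    *carry = out;
    return a;
}

/* bogax's loop over four 2-bit counters packed in one byte. */
static uint8_t bump_all_counters(uint8_t var) {
    uint8_t a = var;
    int c;
    for (int x = 4; x != 0; x--) {
        c = 0;                 /* CLC */
        a = adc(a, 0x40, &c);  /* bump the top 2-bit counter */
        a = asl(a, &c);        /* C = bit 7; the ADC's carry-out is lost */
        a = adc(a, 0x80, &c);  /* bit 0 = that C; C = bit 6 before ASL */
        a = rol(a, &c);        /* that bit into bit 0: rotated left by 2 */
    }
    return a;
}
```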
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Re: 6502 ASM trick

Post by Bregalad »

I just thought of a good old trick I'd like to share:

Divide a value by 2^n (in this example 8), but round to the nearest integer instead of rounding down:

  [...]
  lsr A
  lsr A
  lsr A
  adc #$00
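Here is why this rounds. The last LSR leaves bit 2 of the original value in the carry. Bit 2 is worth 4, which is half of 8, so ADC #$00 adds 1 exactly when the remainder is 4 or more. A C model (with the hypothetical name `div8_round`) makes that easy to check against (v + 4) / 8 for every byte value, i.e. halves round up.

```c
#include <assert.h>
#include <stdint.h>

/* LSR A three times, then ADC #$00. Each LSR moves bit 0 into the
   carry; after the third one the carry holds bit 2 of the original
   value (worth 4, half of 8), and ADC #$00 adds it back in. */
static uint8_t div8_round(uint8_t a) {
    int carry = 0;
    for (int i = 0; i < 3; i++) {
        carry = a & 1;           /* LSR: bit 0 -> carry */
        a = (uint8_t)(a >> 1);   /*      0 -> bit 7 */
    }
    return (uint8_t)(a + carry); /* ADC #$00 (a <= 31, so no overflow) */
}
```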
Do addition in another base (typically 10):

   lda foo
   clc
   adc bar   ;Both digits are supposed to be in the 0-9 range
   cmp #10
   bcc +
   sbc #10   ;Carry is set here, so this subtracts exactly 10
+ sta whatever      ;At this point, A holds the low digit AND the carry is set if and only if there was an overflow

;Then we can continue with the tens, etc...
   lda foo2
   adc bar2
   cmp #10

etc, etc....
Aah, this is good stuff; only assembly language can do it so elegantly.
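Below is a C model of the digit-per-byte addition above, for a whole multi-digit number. Both the layout (one digit per byte, least-significant digit first) and the name `add_decimal` are assumptions for this sketch. Each loop pass mirrors one ADC / CMP #10 / SBC #10 group, with the carry handed on to the next digit.

```c
#include <assert.h>
#include <stdint.h>

/* One digit (0-9) per byte, least significant digit first. Returns the
   carry out of the top digit (1 if the sum overflowed). */
static int add_decimal(uint8_t *sum, const uint8_t *foo,
                       const uint8_t *bar, int ndigits) {
    int carry = 0;                                       /* CLC */
    for (int i = 0; i < ndigits; i++) {
        assert(foo[i] <= 9 && bar[i] <= 9);
        uint8_t a = (uint8_t)(foo[i] + bar[i] + carry);  /* ADC */
        carry = a >= 10;                                 /* CMP #10 */
        if (carry)
            a = (uint8_t)(a - 10);     /* SBC #10 with C set: exactly 10 */
        sum[i] = a;                                      /* STA */
    }
    return carry;
}
```

For example, 0957 + 0068 is stored as {7,5,9,0} + {8,6,0,0}. That gives {5,2,0,1}, i.e. 1025, with no final carry.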
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: 6502 ASM trick

Post by tokumaru »

Bregalad wrote:

  [...]
  lsr A
  lsr A
  lsr A
  adc #$00
I've done this a couple of times.
Do addition in another base (typically 10)
And this too, for scores and such. Instead of converting numbers from binary to decimal just for displaying, I prefer to give each digit 1 byte and do all the math straight in decimal. This only gets complicated if you have to do more than add/subtract/compare these numbers.
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Re: 6502 ASM trick

Post by Bregalad »

Same here. I don't even bother packing the digits into nybbles, because the minor RAM savings aren't worth the loads of shifts you'd have to add to your code.
Also, I'm pretty sure most games do this. Only games with a decent amount of numbers and math make it worthwhile to actually store them in binary and convert them for display on the screen. Heh, I'm even sure it would be possible to code an entire RPG while keeping the HP, Mana, Money, etc... in BCD at all times. It would also make it harder for people to find cheat codes :)
tokumaru wrote:I've done this a couple of times.
I know it's nothing spectacular, but it shows the power of assembly language; there is no way to use the carry like this in any high-level language.
If I wanted, for example, to do this in C (divide by 8 and round to the nearest integer), I'd have no choice but to do this:

   if((variable & 0x4) == 0)     /* bit 2 is the "half" bit for /8; parentheses needed since == binds tighter than & */
         result = variable / 8;
    else
          result = variable / 8 + 1;
It would be the exact same, but I think there is no way the compiler would make it this efficient (completely removing the if-else clause), unless it was specifically written with this case in mind.