6502 ASM trick
Re: 6502 ASM trick
tokumaru wrote: I also like very much the one where you push an address (minus 1) to the stack and then use the RTS instruction to jump to that address. This can be useful for implementing jump tables, and I'm using this a lot in my game.
I have thought of the same thing and used it when modifying PPMCK.
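A minimal C model of the push-and-RTS dispatch may make the "minus 1" clearer. JSR pushes the return address minus one (high byte first), and RTS pulls the low byte, then the high byte, and adds one. So pushing target-1 and executing RTS lands exactly on the target. The 256-byte page-$01 stack with S starting at $FF and the three table addresses are assumptions for illustration; on the 6502 the table would normally store target-1 directly so no subtraction happens at run time.

```c
#include <stdint.h>

/* Model of the 6502 hardware stack (page $01); S grows downward. */
typedef struct { uint8_t mem[256]; uint8_t s; } Stack;

static void push(Stack *st, uint8_t v) { st->mem[st->s--] = v; } /* PHA */
static uint8_t pull(Stack *st) { return st->mem[++st->s]; }      /* PLA */

/* Hypothetical jump table of handler addresses. */
static const uint16_t jump_table[3] = { 0x8123, 0x8200, 0x9FFF };

/* Push (target - 1) high byte first, then low byte, then "execute" RTS.
   Returns the program counter RTS would produce. */
static uint16_t rts_dispatch(Stack *st, uint8_t index) {
    uint16_t t = (uint16_t)(jump_table[index] - 1);
    push(st, (uint8_t)(t >> 8));    /* LDA hi / PHA */
    push(st, (uint8_t)(t & 0xFF));  /* LDA lo / PHA */
    uint16_t lo = pull(st);         /* RTS pulls the low byte first */
    uint16_t hi = pull(st);         /* then the high byte */
    return (uint16_t)(((hi << 8) | lo) + 1); /* and adds 1 */
}
```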
Another (very simple) thing: if you call a subroutine and the call is immediately followed by an RTS, you don't need the return address on the stack; you can just JMP directly to the subroutine instead, which makes it a tail call. (Some NES programs didn't do this, and I have fixed them.)
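JSR plus RTS costs 4 bytes of code, 12 cycles and 2 bytes of stack, while a single JMP costs 3 bytes and 3 cycles. The caveat is that the callee must not inspect its own return address (as the inline-parameter routines later in this thread do). The sketch below only tracks the stack pointer through a hypothetical chain A → B → C, with and without the tail call, starting from S = $FF.

```c
#include <stdint.h>

/* Tracks the 6502 stack pointer S and the lowest value it reaches. */
typedef struct { uint8_t s; uint8_t lowest; } StackUse;

/* JSR pushes two bytes (return address - 1). */
static void jsr(StackUse *u) {
    u->s = (uint8_t)(u->s - 2);
    if (u->s < u->lowest) u->lowest = u->s;
}

/* RTS pulls those two bytes back. */
static void rts(StackUse *u) { u->s = (uint8_t)(u->s + 2); }

/* Hypothetical chain: A does JSR B; B ends by calling C; C ends with RTS.
   tail_call == 0: B ends with "JSR C / RTS"; otherwise B ends with "JMP C".
   Returns the lowest S reached. */
uint8_t lowest_sp(int tail_call) {
    StackUse u = { 0xFF, 0xFF };
    jsr(&u);          /* A: JSR B */
    if (!tail_call)
        jsr(&u);      /* B: JSR C (a JMP C would push nothing) */
    rts(&u);          /* C: RTS, back to B, or straight to A after JMP */
    if (!tail_call)
        rts(&u);      /* B: RTS, back to A */
    return u.lowest;
}
```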
I thought of a way to handle an object's state with bit-packed variables (2 bits per variable, so 4 variables per byte).
This way there aren't a bunch of ANDs and CMPs.
Code: Select all
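LDA #$55 ; four 2-bit fields, all %01; becomes #$AA (all %10)
STA var
ldx #4 ; one pass per field
.Loop:
JSR CheckState ; advance the field in bits 7-6
ROL ; rotate A and the carry left by two (a 9-bit rotate),
ROL ; bringing the next field into bits 7-6
STA var
DEX
BNE .Loop
ROL ; the 9th rotate puts every bit back in its original place
STA var ; store the result ($AA)
brk
CheckState:
lda var
bit var ; N = bit 7, V = bit 6 (carry is untouched)
bpl .0x
bvc .10
jmp MakeState00 ; @11
.10: ; @10
jmp MakeState11
.0x:
bvc .00
jmp MakeState10 ; @01
.00: jmp MakeState01 ; @00
MakeState00:
AND #$3F
RTS
MakeState01:
AND #$3F
ORA #$40
RTS
MakeState10:
AND #$3F
ORA #$80
RTS
MakeState11:
ORA #$C0 ; bits 7-6 are %10 here, so no AND is needed
RTS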
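To check the rotate-through-carry bookkeeping, here is a rough C model of the loop. It computes CheckState arithmetically instead of with the BIT/BVC branch tree (same 00→01→10→11→00 mapping on bits 7-6), and takes the incoming carry as a parameter, since the original never clears it. Eight ROLs inside the loop plus the final one make nine, a full turn of the 9-bit (A + carry) ring, so the carry never lands in the field being updated.

```c
#include <stdint.h>

/* 6502 ROL: shift left, old carry into bit 0, bit 7 into carry. */
static uint8_t rol(uint8_t a, int *carry) {
    int out = (a >> 7) & 1;
    a = (uint8_t)((a << 1) | *carry);
    *carry = out;
    return a;
}

/* CheckState: advance bits 7-6 as 00->01->10->11->00, keep bits 5-0. */
static uint8_t check_state(uint8_t a) {
    uint8_t top = (uint8_t)(((a >> 6) + 1) & 3);
    return (uint8_t)((a & 0x3F) | (top << 6));
}

/* The loop: 4 x (CheckState, ROL, ROL), then one more ROL.
   carry is whatever the 6502 carry flag held on entry (0 or 1). */
uint8_t step_all(uint8_t var, int carry) {
    for (int x = 4; x != 0; x--) {
        var = check_state(var);
        var = rol(var, &carry);
        var = rol(var, &carry);
    }
    return rol(var, &carry);
}
```

For example, $E4 (fields 11 10 01 00) comes out as $39 (00 11 10 01).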
Re:
tokumaru wrote: After you've tried coding for the Atari 2600 and its 128 bytes of RAM, the 2KB of the NES seem like a lot of space, and suddenly flags that use entire bytes don't sound so bad.
/me shivers thinking back at that one time he tried Atari.
Re: 6502 ASM trick
Specific to ca65, but maybe workable in other assemblers:
Call any compatible subroutine and pass stack parameters without messing around with the return address.
'function' should use PLA to get all parameters and RTS when done.
The improved version further down fixes a possible page wrap of the return address at the cost of a byte or two, only uses the stack as needed, and uses a local label.
Code: Select all
.macro call function, param1, param2, param3, param4, param5, param6
lda #>@return
pha
lda #<@return-1
pha
.ifnblank param6
lda param6
pha
.endif
.ifnblank param5
lda param5
pha
.endif
.ifnblank param4
lda param4
pha
.endif
.ifnblank param3
lda param3
pha
.endif
.ifnblank param2
lda param2
pha
.endif
.ifnblank param1
lda param1
pha
.endif
jmp function
@return:
.endmacro
Improved version:
Code: Select all
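.macro call function, param1, param2, param3, param4, param5, param6, param7
.local return
; function + 4 or more parameters: some go on the stack,
; so push return-1 (both bytes) for the callee's RTS
.if .paramcount > 3+1
lda #>(return-1)
pha
lda #<(return-1)
pha
.endif
; params 7..4 are pushed in reverse, so the callee PLAs param4 first
.ifnblank param7
lda param7
pha
.endif
.ifnblank param6
lda param6
pha
.endif
.ifnblank param5
lda param5
pha
.endif
.ifnblank param4
lda param4
pha
.endif
; the first three parameters are passed in registers
.ifnblank param3
ldy param3
.endif
.ifnblank param2
ldx param2
.endif
.ifnblank param1
lda param1
.endif
.if .paramcount <4+1
jsr function ; nothing on the stack: a normal JSR will do
.else
jmp function ; callee pulls its stack params, then its RTS returns to 'return'
.endif
return:
.endmacro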
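A small C model of the stack traffic the improved macro produces when parameter 4 and up are used. Return-1 goes on first, then the extra parameters in reverse order, so the callee's PLAs see param4 first and its final RTS lands on `return`. The 256-byte stack with S starting at $FF and the sample addresses and values are assumptions for illustration.

```c
#include <stdint.h>

/* Model of the 6502 hardware stack (page $01); S grows downward. */
typedef struct { uint8_t mem[256]; uint8_t s; } Stack;

static void pha(Stack *st, uint8_t v) { st->mem[st->s--] = v; }
static uint8_t pla(Stack *st) { return st->mem[++st->s]; }

/* Caller side: push (ret - 1) high then low, then the stack parameters
   in reverse (extra[0] is param4, extra[1] is param5, ...). */
static void push_call(Stack *st, uint16_t ret, const uint8_t *extra, int n) {
    uint16_t r = (uint16_t)(ret - 1);
    pha(st, (uint8_t)(r >> 8));
    pha(st, (uint8_t)(r & 0xFF));
    for (int i = n - 1; i >= 0; i--)
        pha(st, extra[i]);
}

/* RTS: pull low, then high, add 1. */
static uint16_t rts(Stack *st) {
    uint16_t lo = pla(st);
    uint16_t hi = pla(st);
    return (uint16_t)(((hi << 8) | lo) + 1);
}
```

The callee must pull exactly as many bytes as the caller pushed; one PLA too few or too many and the RTS goes somewhere else.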
Last edited by Movax12 on Thu Aug 09, 2012 8:53 am, edited 4 times in total.
- Jarhmander
- Formerly ~J-@D!~
Re: 6502 ASM trick
Movax12 wrote:
Code: Select all
.macro call function, param1, param2, param3, param4, param5, param6
lda #>@return
pha
lda #<@return-1
pha
...
Potential rare but ugly bug here. Let's imagine that when the macro is expanded, "@return" resolves to $C000 (for example). Then it pushes $C0 and then $FF; RTS pulls $C0FF, adds 1 and returns to $C100, so the program goes haywire. Both the high and the low byte should come from "@return-1", not just the low byte. Otherwise, nice trick.
Also, this somehow reminds me of some TI DSPs that don't have a "call", "jsr", "bx" or "blx" instruction or anything equivalent; with those you have to put the return address in a register (B3) and then jump to your subroutine.
((λ (x) (x x)) (λ (x) (x x)))
Re: 6502 ASM trick
Another (small) thing: it's better to use the .local control command in ca65 instead of cheap local labels (@foo) for labels inside a macro, because a cheap local label will still be visible outside the macro.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
Re: 6502 ASM trick
Thanks for the feedback. I did know about .local in a macro, fixed. I think I fixed the return address problem in a slightly hacky way, but I am happy with it.
Re: 6502 ASM trick
Movax12 wrote: Thanks for the feedback. I did know about .local in a macro, fixed. I think I fixed the return address problem in a slightly hacky way, but I am happy with it.
Why not this:
Code: Select all
lda #>(return-1)
pha
lda #<(return-1)
pha
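To see why taking both bytes from `return-1` matters, here is a small C model of the byte arithmetic. It assumes, as in Jarhmander's example, that the first macro pushes the high byte of @return and the low byte of @return minus 1 (truncated to 8 bits); the function names are made up for illustration.

```c
#include <stdint.h>

/* Address RTS reaches from the two pushed bytes: pull, then add 1. */
static uint16_t rts_from(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((hi << 8) | lo) + 1);
}

/* First macro: high byte of @return, (low byte of @return) - 1. */
uint16_t rts_target_v1(uint16_t ret) {
    uint8_t hi = (uint8_t)(ret >> 8);
    uint8_t lo = (uint8_t)((ret & 0xFF) - 1); /* $00 - 1 wraps to $FF */
    return rts_from(hi, lo);
}

/* Fixed macro: both bytes taken from (return - 1). */
uint16_t rts_target_fixed(uint16_t ret) {
    uint16_t r = (uint16_t)(ret - 1);
    return rts_from((uint8_t)(r >> 8), (uint8_t)(r & 0xFF));
}
```

With a return label at $C000 the first version lands at $C100, while any label whose low byte is not $00 works either way, which is why the bug is rare.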
Re: 6502 ASM trick
Oh yeah, that's much better. Thanks!
Added one last revision to that post. It could be improved more, but the idea is there anyway.
Re:
Bregalad wrote: Second part, I analyzed the passing of >4 bytes of parameters very carefully.
bogax's solution is officially the best.
I counted bytes as follows, assuming you have M functions that use N parameters:
- Standard solution (3 last args in registers and anything else stored to ZP before calling): 9+N bytes for each call, 0 overhead for the callee.
Overall: (9+N)*M bytes.
- Nintendo's solution (store arguments after the JSR opcode and have the callee call a function that copies args to ZP): 3+N bytes for each call, 3 bytes of overhead for the callee, and the argument retrieval function takes 32 bytes.
Overall: (6+N)*M + 32 bytes.
- bogax's solution (JSR to a special function followed by the address of the actual function, the number of arguments and the arguments): 6+N bytes for each call, 0 overhead for the callee, and the argument retrieval function takes 29 bytes.
Overall: (6+N)*M + 29 bytes.
So bogax's is the winner (by a close margin) if M>3, that is, if a program needs at least 4 functions with more than 3 bytes of arguments.
I haven't checked speed, but I'm sure bogax's is faster, since it avoids an extra JSR/RTS of overhead each time.
His routine could possibly be improved further by using BRK to simulate a JSR to the argument retrieval function, if IRQs aren't used.
Just to confuse the issue some more:
It occurred to me to move the target addresses and numbers of parameters
into tables. I think this might work out better provided you're calling each
target routine several times.
Code: Select all
;gets parameters inline
;then jumps to a target routine that uses the parameters
;the first byte inline is an index into tables of
;target routine addresses and their associated
;number of parameters
;the low byte of the pointer to the inline parameters
;is kept in y and pushed back on the stack as the
;lo byte of the address the target routine returns to,
;so the lo byte of ptr must be $00 (set it once at startup)
;the parameters are fetched inline in ascending order
;but put into memory in descending order
;there's no provision for propagating a carry to the
;high byte of the inline pointer/return address
;so the inline parameters should not straddle a page boundary
;every target routine must take at least one parameter
;(with 0, DEX wraps and the loop runs 256 times)
;you jsr to this routine supplying the index as the
;first byte inline followed by the parameters to fetch
;in reverse order
GET_PARAMETERS
pla ;first get the return address
tay ;lo byte in y
pla ;hi byte in ptr+1
sta ptr+1
pha ;put the hi byte back on the stack
iny
lda (ptr),y ;get the first byte inline
tax ;and use it to index the tables
lda jmp_tbl_lo,x ;get the target routine's
sta jmp_add ;address
lda jmp_tbl_hi,x
sta jmp_add+1
lda num_parameters_tbl,x
tax ;number of parameters to fetch
LOOP
iny
lda (ptr),y
sta parameters-1,x
dex
bne LOOP
tya ;put the lo byte of the return address
pha ;back on the stack
jmp (jmp_add) ;jmp_add must not sit at $xxFF (6502 indirect JMP bug)
jmp_tbl_lo
.db <ROUTINE1,<ROUTINE2,<ROUTINE3 ;etc
jmp_tbl_hi
.db >ROUTINE1,>ROUTINE2,>ROUTINE3 ;etc
num_parameters_tbl
.db num_parameters1,num_parameters2,num_parameters3 ;etc
The first byte inline is an index into the GET_PARAMETERS tables,
which hold an entry for each routine that uses GET_PARAMETERS,
containing that routine's address and the number of parameters to fetch.
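A rough C model of GET_PARAMETERS, to show where Y ends up and why the target's RTS returns just past the inline data. It assumes ptr's low byte is $00, a flat 64 KB `mem` array, and two hypothetical routines at $9000 and $9100 taking 2 and 3 parameters. Y is kept as an 8-bit value, mirroring the missing carry into the high byte.

```c
#include <stdint.h>

/* Hypothetical memory image and tables (not from the original post). */
static uint8_t mem[0x10000];
static uint8_t parameters[8];
static const uint16_t jmp_tbl[2] = { 0x9000, 0x9100 };  /* ROUTINE1, ROUTINE2 */
static const uint8_t num_parameters_tbl[2] = { 2, 3 };

/* ret: the address JSR pushed (the last byte of the JSR instruction).
   Returns the target address; *new_ret receives the address pushed back
   for the target's RTS (which adds 1, landing just past the inline data). */
uint16_t get_parameters(uint16_t ret, uint16_t *new_ret) {
    uint8_t y = (uint8_t)(ret & 0xFF);        /* PLA / TAY: low byte in Y */
    uint16_t page = (uint16_t)(ret & 0xFF00); /* PLA / STA ptr+1, ptr = $00 */
    y++;                                      /* 8-bit INY: no carry out */
    uint8_t x = mem[page + y];                /* first inline byte: index */
    uint16_t target = jmp_tbl[x];
    x = num_parameters_tbl[x];                /* must be at least 1 */
    do {
        y++;
        parameters[x - 1] = mem[page + y];    /* STA parameters-1,X */
        x--;
    } while (x != 0);                         /* DEX / BNE */
    *new_ret = (uint16_t)(page | y);          /* TYA / PHA over the high byte */
    return target;
}
```

For a JSR GET_PARAMETERS at $8000 followed by the bytes 1, $AA, $BB, $CC, the model jumps to $9100 with parameters[2] = $AA, parameters[1] = $BB, parameters[0] = $CC, and the pushed-back address $8006 makes RTS continue at $8007.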
Re:
strat wrote: I thought of a way to handle an object's state with bit-packed variables (2 bits or 4 variables per byte).
This way there aren't a bunch of and's and cmp's.
If I'm reading that right, you just increment two bits at a time,
that is, you've got four independent two-bit counters that count
to 11 and then wrap back to 00.
How about just incrementing the top two bits where a carry
can't do you any harm?
Code: Select all
lda var
ldx #$04
LOOP
clc
adc #$40 ; increment top two bits
asl ; rotate two bits through
adc #$80 ; the carry
rol
dex
bne LOOP
sta var
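A C model of this loop, emulating the carry of each instruction, may help show that ASL / ADC #$80 / ROL is an 8-bit rotate left by two. ADC #$80 toggles bit 7 and copies the old bit 7 (the original bit 6) into the carry while adding the ASL carry (the original bit 7) into bit 0; the final ROL then brings the original bit 6 into bit 0 and the original bit 7 into bit 1. The function names are made up for illustration.

```c
#include <stdint.h>

/* One pass of CLC / ADC #$40 / ASL / ADC #$80 / ROL. */
static uint8_t step(uint8_t a) {
    unsigned c, t;
    a = (uint8_t)(a + 0x40u);      /* CLC; ADC #$40: carry out is discarded */
    c = a >> 7;                    /* ASL: bit 7 -> carry */
    a = (uint8_t)(a << 1);
    t = a + 0x80u + c;             /* ADC #$80 with that carry */
    a = (uint8_t)t;
    c = t >> 8;                    /* carry out of the ADC */
    return (uint8_t)((a << 1) | c); /* ROL */
}

/* LDX #4 / LOOP ... DEX / BNE LOOP */
uint8_t increment_all(uint8_t var) {
    for (int x = 4; x != 0; x--)
        var = step(var);
    return var;
}
```

Four passes rotate the byte by eight bits, so every field ends up back in place after being incremented once: $55 becomes $AA and $E4 (11 10 01 00) becomes $39 (00 11 10 01).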
Re: 6502 ASM trick
I just thought of a good old trick I'd like to share:
Divide a value by 2^n (in this example 8) but round to the nearest integer, instead of rounding down:
Code: Select all
[...]
lsr A
lsr A
lsr A
adc #$00 ;the last bit shifted out (bit 2) is in the carry, so this rounds halves up
Do addition in another base (typically 10):
Code: Select all
lda foo
clc
adc bar ;Supposed to be in the 0-9 range
cmp #10
bcc +
sbc #10 ;the carry is already set here (A >= 10), so no SEC is needed
+ sta whatever ;At this point, we got the low digit in A AND the carry is set if and only if there was an overflow
;Then we can continue with the tens, etc...
lda foo2
adc bar2
cmp #10
etc, etc....
Aah, this is good stuff; only assembly language can do it so elegantly.
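The per-digit addition can be sketched in C as a loop over the digits. This assumes one decimal digit per byte, stored least significant digit first; the post only shows the first two digit positions, so the array layout and the function name are illustrative assumptions.

```c
#include <stdint.h>

/* Adds two n-digit numbers (one digit 0-9 per byte, least significant
   digit first) the way the ADC / CMP #10 / SBC #10 sequence does.
   Returns the final carry: 1 if the sum overflowed n digits. */
int add_digits(uint8_t *dst, const uint8_t *a, const uint8_t *b, int n) {
    int carry = 0;                      /* CLC before the first digit */
    for (int i = 0; i < n; i++) {
        int sum = a[i] + b[i] + carry;  /* ADC: at most 9 + 9 + 1 = 19 */
        if (sum >= 10) {                /* CMP #10 sets the carry */
            sum -= 10;                  /* SBC #10, no borrow */
            carry = 1;
        } else {
            carry = 0;                  /* BCC taken: carry clear */
        }
        dst[i] = (uint8_t)sum;
    }
    return carry;
}
```

For example, 099 + 001 gives 100 with no carry, while 999 + 001 gives 000 with the carry set.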
Re: 6502 ASM trick
Bregalad wrote:
Code: Select all
[...]
lsr A
lsr A
lsr A
adc #$00
I've done this a couple of times.
Bregalad wrote: Do addition in another base (typically 10)
And this too, for scores and such. Instead of converting numbers from binary to decimal just for displaying, I prefer to give each digit 1 byte and do all the math straight in decimal. This only gets complicated if you have to do more than add/subtract/compare these numbers.
Re: 6502 ASM trick
Same here, I don't even bother packing the digits into nybbles, because the minor RAM savings aren't worth the loads of shifts you'd add to your code.
Also, I'm pretty sure most games do this. Only games with a decent amount of numbers and math make it worthwhile to actually code them in binary and convert them for display on the screen. Heh, I'm even sure it would be possible to code an entire RPG while keeping the HP, Mana, Money, etc. in BCD at all times. It would also make it harder for people to find cheat codes.
tokumaru wrote: I've done this a couple of times.
I know it's nothing spectacular, but it shows the power of assembly language; there is no way to use the carry like this in any high-level language.
If I wanted, for example, to do this in C (divide by 8 and round to the nearest integer) I'd have no choice but to do this:
Code: Select all
if((variable & 0x4) == 0) /* bit 2 is the last bit the third LSR shifts out */
    result = variable / 8;
else
    result = variable / 8 + 1;
It would be the exact same, but I think there is no way the compiler would make it this efficient (completely removing the if-else clause), unless it was specifically written with this case in mind.
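A C model of the LSR/ADC sequence, next to a branchless C expression for the same rounding. The bit that decides the rounding is bit 2 (value 4), the last bit the third LSR pushes into the carry. Note also that in C `==` binds tighter than `&`, so a test on a masked bit needs its own parentheses. Whether a compiler turns either C form into LSR/LSR/LSR/ADC is another matter; the function names are illustrative.

```c
#include <stdint.h>

/* LSR A three times, then ADC #$00: the last bit shifted out
   (bit 2 of the input) is added back, rounding halves up. */
uint8_t div8_round_asm(uint8_t a) {
    unsigned carry = 0;
    for (int i = 0; i < 3; i++) {
        carry = a & 1u;   /* LSR: bit 0 -> carry */
        a >>= 1;
    }
    return (uint8_t)(a + carry); /* ADC #$00; at most 31 + 1, no overflow */
}

/* The same result in plain C, without a branch. */
uint8_t div8_round_c(uint8_t v) {
    return (uint8_t)((v >> 3) + ((v >> 2) & 1));
}
```

Both agree with (v + 4) / 8 for every byte value, e.g. 11 / 8 rounds down to 1 and 12 / 8 rounds up to 2.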