snes assembly - beginnings, a few questions (wla-65816)
Re: snes assembly - beginnings, a few questions (wla-65816)
It is still not very clear.
Why, when I have "variable" in LowRAM ($0000-$1FFF), does "INC variable" work, but when I have it at $7E2000+, I can't do it?
Re: snes assembly - beginnings, a few questions (wla-65816)
The 816's op-code table is full, and there aren't enough positions in the table to do all the addressing modes in all the instructions; so INC's long addressing modes got cut.
http://WilsonMinesCo.com/ lots of 6502 resources
Re: snes assembly - beginnings, a few questions (wla-65816)
So some processor instructions are limited when they are used with 3-byte addresses (higher RAM banks)?
Last edited by sdm on Thu Nov 17, 2022 1:42 pm, edited 1 time in total.
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: snes assembly - beginnings, a few questions (wla-65816)
The set of operations which work on 3-byte (long/far) addresses is more limited than the 2-byte (absolute) ones. All of the read-modify-write instructions like INC/ASL/ROR/etc. are missing, along with some others, though you can use long addresses with stuff like ADC or CMP. There are a few options for getting around this:
1. The 2-byte addressing modes use the data bank register to fill in the 3rd byte. You can switch the data bank to the bank of your variable:
Code:
.accu 8 ; assuming 8-bit accumulator
lda.b #bankbyte(variable)
pha
plb ; sets data bank to the bank byte (bits 16-23) of variable's address
inc.w loword(variable)
2. Use a 3-byte load to temporarily move the variable into a register, do the operation on the register, then use a 3-byte store to put it back:
Code:
lda.l variable
inc
sta.l variable
3. You can also use a 3-byte pointer on the direct page. It's the same as indirect addressing, but with square brackets instead of round ones to indicate a long pointer instead of an absolute one.
4. Use the $2180 WMDATA register. This is usually only convenient when you need to access many contiguous bytes in series. It's not so good for stuff like trying to increment something in place.
Since it's a bit inconvenient to change the data bank, #1 is usually used when you have many operations to do at once on the same bank. You could also just always keep your data bank at $7E. Remember also that if you want to access tables in ROM areas, they are subject to the same data bank.
Indexing with X and Y is also a bit limited with long addressing. There is LONG,X and [DIRECT],Y but no LONG,Y.
This page is a good reference: http://www.6502.org/tutorials/65c816opcodes.html
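Here is a minimal Python model (not SNES code) of the address formation behind these options. It also answers the original question. An absolute operand below $2000 works from bank $00 because LowRAM ($7E:0000-$1FFF) is mirrored at $0000-$1FFF of banks $00-$3F and $80-$BF. By contrast, $7E:2000 and up is only reachable with DB = $7E/$7F or with a 3-byte address. All function names are hypothetical. Timing, emulation mode, and page/bank wrapping details are ignored.

```python
# Simplified model of 65816 data-address formation (hypothetical helper names).
# Ignores timing, emulation mode and direct-page/bank wrapping edge cases.


def absolute_address(dbr: int, operand16: int) -> int:
    """2-byte operand: the data bank register (DB) supplies bits 16-23."""
    return ((dbr & 0xFF) << 16) | (operand16 & 0xFFFF)


def long_address(operand24: int) -> int:
    """3-byte operand: the whole address is encoded in the instruction."""
    return operand24 & 0xFFFFFF


def indirect_long_address(memory: dict, d: int, dp_offset: int) -> int:
    """[dp]: read a little-endian 3-byte pointer from bank 0 at D + dp_offset."""
    base = (d + dp_offset) & 0xFFFF
    lo, hi, bank = (memory.get((base + i) & 0xFFFF, 0) for i in range(3))
    return (bank << 16) | (hi << 8) | lo


def wram_offset(addr24: int):
    """Offset into the 128 KiB of WRAM that addr24 reaches, or None.

    Only banks $7E-$7F and the LowRAM mirror ($0000-$1FFF in banks
    $00-$3F and $80-$BF) are modelled; everything else returns None.
    """
    bank, offset = addr24 >> 16, addr24 & 0xFFFF
    if bank in (0x7E, 0x7F):
        return addr24 - 0x7E0000
    if (bank <= 0x3F or 0x80 <= bank <= 0xBF) and offset < 0x2000:
        return offset
    return None


if __name__ == "__main__":
    variable = 0x7E2345  # hypothetical variable above $7E:1FFF
    # Absolute operand with DB = $00: fine for LowRAM, misses $7E:2345.
    print(hex(wram_offset(absolute_address(0x00, 0x1234))))  # 0x1234
    print(wram_offset(absolute_address(0x00, variable & 0xFFFF)))  # None
    # Option 1 (DB = $7E) and option 2 (long operand) both reach it.
    print(hex(wram_offset(absolute_address(0x7E, variable & 0xFFFF))))  # 0x2345
    print(hex(wram_offset(long_address(variable))))  # 0x2345
    # Option 3: a 3-byte pointer stored at direct-page offset $10 (D = $0000).
    ram = {0x10: 0x45, 0x11: 0x23, 0x12: 0x7E}
    print(hex(indirect_long_address(ram, 0x0000, 0x10)))  # 0x7e2345
```

The model makes the trade-off visible. Option 1 keeps the short 2-byte operand but makes every other absolute access depend on DB. Options 2 and 3 leave DB alone and pay for it with longer operands or a pointer setup.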
Re: snes assembly - beginnings, a few questions (wla-65816)
inc isn't an instruction, or even an opcode. It's a mnemonic for the use of the assembler. It can be used to refer to several different 8-bit opcodes, each corresponding to a different addressing mode.
Think of it this way. Bytes are just bytes. The CPU has no way of distinguishing a 2-byte address from a 3-byte address when just looking at the raw program bytestream. So when it loads an opcode (the first byte of the instruction), that opcode needs to contain all the necessary information for the CPU to load and correctly interpret the rest of the instruction before trying to load the next opcode, and that includes the length and nature of the operand.
In the case of inc, the mnemonic can refer to one of five different opcodes: $1A, $E6, $EE, $F6, or $FE. $1A is inc A, or ina, which constitutes a single-byte instruction with no operand that tells the CPU to increment the contents of the accumulator. $EE, by contrast, is the first byte of a 3-byte instruction, in which the following two bytes (the operand) are a 16-bit address. When the CPU sees the $EE opcode, it interprets it as "load two more bytes from the program counter, use them as an address, load one or two bytes from that address depending on the accumulator size setting, increment the read value, and store it back".
There is unfortunately no opcode corresponding to inc that tells the CPU to load three bytes and use them as an address. With 8-bit opcodes, you only get 256 of them, and they ran out of room.
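A tiny Python decoder shows that the opcode byte alone fixes the operand length. The table is a hand-picked subset of real 65816 opcodes: the five INC encodings plus LDA/STA/ADC long. It is not a complete opcode map. Immediate modes are left out because their length depends on the M/X flags. Note that no INC entry has a 3-byte operand.

```python
# Partial 65816 opcode table: opcode -> (mnemonic, addressing mode, operand bytes).
# Hand-picked subset for illustration; not a complete opcode map.
OPCODES = {
    0x1A: ("INC", "accumulator", 0),
    0xE6: ("INC", "direct", 1),
    0xEE: ("INC", "absolute", 2),
    0xF6: ("INC", "direct,X", 1),
    0xFE: ("INC", "absolute,X", 2),
    0xAF: ("LDA", "long", 3),
    0x8F: ("STA", "long", 3),
    0x6F: ("ADC", "long", 3),
}


def decode(stream: bytes, pc: int = 0):
    """Decode one instruction at pc: (mnemonic, mode, operand or None, length)."""
    mnemonic, mode, size = OPCODES[stream[pc]]
    operand = int.from_bytes(stream[pc + 1 : pc + 1 + size], "little") if size else None
    return mnemonic, mode, operand, 1 + size


def disassemble(stream: bytes):
    """Walk a byte stream; each opcode tells us where the next one starts."""
    pc, out = 0, []
    while pc < len(stream):
        mnemonic, mode, operand, length = decode(stream, pc)
        out.append((mnemonic, mode, operand))
        pc += length
    return out


def modes_for(mnemonic: str) -> set:
    return {mode for m, mode, _ in OPCODES.values() if m == mnemonic}


if __name__ == "__main__":
    # lda.l $7E2345 / inc a / sta.l $7E2345  (the load-modify-store workaround)
    code = bytes([0xAF, 0x45, 0x23, 0x7E, 0x1A, 0x8F, 0x45, 0x23, 0x7E])
    for insn in disassemble(code):
        print(insn)
    print("long" in modes_for("INC"))  # False
```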
(The same problem exists on the Super FX, but worse, because the Super FX has a RISC-like architecture with 16 general-purpose registers, and a bunch of instructions pack a 4-bit opcode with a 4-bit operand indicating the register to use. They did a good job IMO considering the constraints, but when coding in hex you can really feel the squeeze...)
Re: snes assembly - beginnings, a few questions (wla-65816)
Thanks, that's clearer. However, now I can see that the MC68K is a much "friendlier" processor than the 65816...
Would it be better to leave wla-65816 and switch to ca65 for SNES?
- rainwarrior
Re: snes assembly - beginnings, a few questions (wla-65816)
For 65816 I personally prefer ca65 for a bunch of reasons, but wla-dx is pretty capable.
I'd suggest using whichever feels more comfortable for now. When you already know your way around one, learning and switching to another one isn't so bad.
-
- Posts: 1565
- Joined: Tue Feb 07, 2017 2:03 am
Re: snes assembly - beginnings, a few questions (wla-65816)
wla-dx is an amazing Z80 assembler, but it has lots of issues and quirks with 6502-based CPUs, and even more so with the 65816.
I prefer 64tass as it is more powerful than ca65, but I would recommend ca65 over wla-dx all the same.
Re: snes assembly - beginnings, a few questions (wla-65816)
I agree with Oziphantom. I had some bad experiences while using WLA with 65816, and I don't trust that this assembler always outputs the code you want it to for this architecture.
CA65 is much better but I heard that it has problems with handling the direct page (it tends to prefer to think that the direct page is always page 0). As I said earlier in this thread I use 64TASS for SNES development which supposedly handles the direct page mechanisms correctly.
Should you decide to try 64TASS, you can check my attempts here to get it up and running. It's actually quite simple compared to CA65 since there is no need for a separate config file like in CA65, just assembler directives.
- rainwarrior
Re: snes assembly - beginnings, a few questions (wla-65816)
Pokun wrote (Wed Dec 07, 2022 3:12 pm):
"CA65 is much better but I heard that it has problems with handling the direct page (it tends to prefer to think that the direct page is always page 0). As I said earlier in this thread I use 64TASS for SNES development which supposedly handles the direct page mechanisms correctly."
I think ca65 is quite alright for direct-page usage, even though it has no built-in concept for it.
The potential problem is that it automates detection of zeropage instructions whenever a label is known as zeropage or has a known value < 256. There are a lot of ways to easily avoid this though.
On NES I think automatic ZP detection is great, but on the SNES, direct, absolute, and far are three independent address spaces, and I don't really want them mixed automatically. So, instead of using the automation, I just put a z:/a:/f: prefix on addresses and it safely checks ranges for me. When the direct page moves, I can add < if the variables are page-aligned (but with no check), or I might define relative direct-page aliases for those variables (which can be range-checked with an .assert).
Code:
lda z:label ; zero page instruction (checked)
lda a:label ; absolute address (checked)
lda f:label ; far address (no check needed)
lda z:<label ; direct page aligned (unchecked)
dp_label = <(label - $435) ; relative alias
.assert (label>=$435)&&(label-$435)<256, error, "error message" ; range checking for alias (recommend: use a macro for this)
lda z:dp_label ; direct page instruction
lda z:Struct::Member ; direct-page-relative struct access
I just try to put clear comments in the code about direct-page assumptions, which for me seems sufficient. While I think a D-awareness language feature might be nice, from my experience I don't feel it's terribly important to have. I still make use of automatic addressing in some cases, but generally only in LoROM code sections where D=0.
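The arithmetic behind the relative alias and its .assert can be sketched in Python. The D = $0435 value comes from the snippet above; the helper names are hypothetical, and this is not ca65 code. A direct-page operand is an 8-bit offset added to D, so an alias is only valid when label - D falls in 0..255. The unchecked z:<label form instead assumes that D equals the label's page.

```python
# Hypothetical helpers modelling direct-page operand arithmetic (not ca65 syntax).


def dp_alias(label: int, d: int) -> int:
    """Offset to use as a direct-page operand when D == d.

    Mirrors `dp_label = <(label - d)` plus the range .assert: the access
    lands on bank 0 address d + offset, so the offset must fit in 8 bits.
    """
    offset = label - d
    if not 0 <= offset < 0x100:
        raise ValueError(f"${label:04X} is not within 256 bytes above D=${d:04X}")
    return offset


def page_aligned_offset(label: int) -> int:
    """`z:<label`: just the low byte, correct only if D == label & $FF00 (unchecked)."""
    return label & 0xFF


if __name__ == "__main__":
    D = 0x0435
    for label in (0x0440, 0x0534, 0x0535):
        try:
            print(f"${label:04X} -> dp ${dp_alias(label, D):02X}")
        except ValueError as err:
            print(err)
    # With D = $0400, the unchecked low-byte form happens to be right for $0440...
    print(hex(0x0400 + page_aligned_offset(0x0440)))  # 0x440
    # ...but silently wrong if D is actually $0435.
    print(hex(0x0435 + page_aligned_offset(0x0440)))  # 0x475
```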
64TASS on the other hand has a .dpage directive which adapts automatic instruction selection to take the given direct page address into account. So, the advantage there is that you can just set .dpage/.databank and have any LowRAM variables automatically sized for you. As long as you set those correctly, it's a low-effort automatic optimization vs. everything being assumed absolute.
I haven't tried 64TASS very much, but I believe it's more powerful at detecting which things can be direct-page as well, since it can multi-pass to resolve addresses and doesn't need the kind of explicit "zeropage" segment tagging like ca65. This is an advantage if you really like the automation.
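This Python sketch of a shortest-mode chooser illustrates what "automatic sizing" means once the assembler knows D and DB. It is a simplification under stated assumptions: mirrors are ignored, direct page exists only in bank 0, and there is no wrapping. It is not 64tass's actual selection logic, and the function name is hypothetical.

```python
def pick_mode(addr24: int, dpage: int, databank: int) -> str:
    """Choose the shortest operand that reaches addr24 (hypothetical model).

    direct   (1-byte operand): bank 0, within dpage..dpage+255
    absolute (2-byte operand): any offset in the current data bank
    long     (3-byte operand): everything else
    Mirrors (e.g. LowRAM appearing in several banks) are deliberately ignored.
    """
    bank, offset = addr24 >> 16, addr24 & 0xFFFF
    if bank == 0 and 0 <= offset - dpage < 0x100:
        return "direct"
    if bank == databank:
        return "absolute"
    return "long"


if __name__ == "__main__":
    # Assumed settings, comparable to setting the direct page to $0400 and the data bank to $7E.
    for addr in (0x000440, 0x000540, 0x7E2000, 0x7F0000):
        print(f"${addr:06X}: {pick_mode(addr, dpage=0x0400, databank=0x7E)}")
```

Because this model ignores mirrors, $00:0540 comes out as "long" here even though the same LowRAM byte is reachable as absolute $0540 with DB = $7E. A real assembler has to be told about such equivalences, which is part of why explicit prefixes can be preferable.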
Personally I don't generally like the automation in a SNES context, and I'd rather use slightly more verbose explicit/checked variations, usually. Yeah I can save having to create an alias or type an extra two characters for a prefix... but at the same time it hides what the real instruction is, and where my variables actually live.
I like automation for "don't care" kinda code, where I don't need to think about which variables are where, and performance doesn't matter too much. In a case like that, it's nice to just let the assembler safely "upgrade" direct-page accesses for me. If it's a high performance piece of code, I want to know exactly which instructions are direct and which are not, and I need explicit awareness of the location of my variables. Automation is the opposite of what I want there. So... ca65 only being able to automate the D=0 case isn't a big loss to me.
Re: snes assembly - beginnings, a few questions (wla-65816)
I see, that doesn't sound so bad. Thanks for explaining what the problem is and how to work around it.
I actually agree with you that this type of automation isn't really desirable in this case, and I don't rely on it in 64TASS either. I make sure to let the assembler know what D and K are whenever it needs to.
I do like some other aspects of 64TASS however, like the multipass thing in general and not having the need for a config file (though the config file has its advantages as well).
Knowing this I'd say either CA65 or 64TASS are fine, I've used both and had good experiences with both, but avoid WLA-DX (at least for 65816 programming).
Re: snes assembly - beginnings, a few questions (wla-65816)
Auto direct page is nice, but it's not even the main reason to use 64tass.
I do prefer the ,d and ,b addressing modes over the instruction prefix, though; they also allow the assembler to check my assumptions in some cases.
Being able to write <> instead of LOWORD(), and being able to get the upper word (>`) for when you are doing a 24-bit write in 16-bit mode, i.e.
Code:
lda #<>someAddress ; 16-bit A: low word (bits 0-15)
sta $02 ; writes $02-$03
lda #>`someAddress ; upper word (bits 8-23)
sta $03 ; writes $03-$04, so $02-$04 now hold the full 24-bit address
Being able to use >< to get a byte-swapped word for when you are putting things on the stack and using ,s to reference it.
Having
Code:
.virtual #1,s
p1 .addr ? ; at #1,s
tmp .byte ? ; at #3,s
.endvirtual
lda (p1),y ; lda ($01,s),y
practically makes it painless.
But the real tour-de-force is the ability to manipulate lists in the assembler to do Array of Structs to Struct of Arrays conversion and other packing operations. So you can make:
Code:
; sprites = (x,y,tile,vflip,hflip,priority,pal,size)
spritesData := [(0,0,0,0,0,3,2,1)]
spritesData ..= [(16,0,0,0,0,3,2,0)]
spritesData ..= [(0,16,0,1,0,3,3,1)]
; low OAM table: 4 bytes per sprite (X low, Y, tile low, vhoopppN attributes)
spritesOAM .for spr in spritesData
.byte <spr[0], spr[1], <spr[2], (spr[3]<<7)|(spr[4]<<6)|(spr[5]<<4)|(spr[6]<<1)|(spr[2]>>8)
.next
; high OAM table: 2 bits per sprite (bit 0 = X bit 8, bit 1 = size), sprite 0 in the lowest bits
tempOAM := 0
tempCounter := 0
spritesOAMUpper .for spr in spritesData
tempOAM |= (((spr[0]>>8)&1)|(spr[7]<<1))<<(tempCounter*2)
tempCounter += 1
.if tempCounter == 4
.byte tempOAM
tempOAM := 0
tempCounter := 0
.endif
.next
.if tempCounter != 0
.byte tempOAM ; flush a partially filled last byte
.endif
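For readers without 64tass at hand, the same Array of Structs to OAM conversion can be cross-checked with a Python sketch. It assumes the standard SNES OAM layout. The low table has 4 bytes per sprite: X bits 0-7, Y, tile bits 0-7, then vhoopppN attributes, where N is tile bit 8. The high table has 2 bits per sprite (bit 0 = X bit 8, bit 1 = size), with sprite 0 in the lowest bits. Field order follows the snippet's comment, and the sample sprites are illustrative.

```python
from typing import List, Sequence, Tuple

# (x, y, tile, vflip, hflip, priority, pal, size); illustrative record layout.
Sprite = Tuple[int, int, int, int, int, int, int, int]


def pack_oam(sprites: Sequence[Sprite]) -> Tuple[bytes, bytes]:
    """Convert a list of sprite records into (low_table, high_table) bytes."""
    low = bytearray()
    high = bytearray((len(sprites) + 3) // 4)  # 4 sprites per high-table byte
    for i, (x, y, tile, vflip, hflip, prio, pal, size) in enumerate(sprites):
        attr = (vflip << 7) | (hflip << 6) | (prio << 4) | (pal << 1) | ((tile >> 8) & 1)
        low += bytes([x & 0xFF, y & 0xFF, tile & 0xFF, attr])
        # Two bits per sprite: bit 0 = X bit 8, bit 1 = size select.
        high[i // 4] |= (((x >> 8) & 1) | (size << 1)) << ((i % 4) * 2)
    return bytes(low), bytes(high)


if __name__ == "__main__":
    sprites: List[Sprite] = [
        (0, 0, 0, 0, 0, 3, 2, 1),
        (16, 0, 0, 0, 0, 3, 2, 0),
        (0, 16, 0, 1, 0, 3, 3, 1),
    ]
    low, high = pack_oam(sprites)
    print(low.hex(" "))  # 00 00 00 34 10 00 00 34 00 10 00 b6
    print(high.hex())    # 22
```

A full OAM image for all 128 sprites would be a 512-byte low table followed by a 32-byte high table.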
- rainwarrior
Re: snes assembly - beginnings, a few questions (wla-65816)
Yeah, I do wish there was a more succinct operator than .loword for ca65, but it does let me .define my own anyway, so even that complaint is a bit malleable.
The lists/tuples capability of 64tass looked interesting but I haven't tried it yet. ca65 does have a way to do some of this, e.g. a .define list and .lobytes, but it's less generic.
Usually, if I have a significant sized table it's generated either outside the program (build tool, python script, etc.) or in-place with a .repeat so the need doesn't really seem to come up for me.
It'd be nice if the C side of things had a "striped array" construction as an easy optimization, but it would have to be a C language extension. Though, where the desire comes up I generally just write the array access in assembly.
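Here is a Python sketch of the byte-extraction operators and the "striped array" idea: one table per byte position instead of an array of multi-byte values. The helper names are hypothetical. The upper-word helper assumes that 64tass's >` yields bits 8-23, which is what the earlier lda #>`someAddress / sta $03 example relies on. store16 models a little-endian 16-bit store.

```python
# Hypothetical helpers for splitting addresses/values into bytes and words.


def lobyte(v: int) -> int:
    return v & 0xFF


def hibyte(v: int) -> int:
    return (v >> 8) & 0xFF


def bankbyte(v: int) -> int:
    return (v >> 16) & 0xFF


def loword(v: int) -> int:
    return v & 0xFFFF


def upper_word(v: int) -> int:
    """Bits 8-23 (assumed meaning of 64tass's >` operator)."""
    return (v >> 8) & 0xFFFF


def swapped_word(v: int) -> int:
    """Low word with its two bytes exchanged (the >< operator described above)."""
    return (lobyte(v) << 8) | hibyte(v)


def stripe(values, nbytes: int = 2):
    """Split multi-byte values into one byte table per byte position (lo, hi, ...)."""
    return [bytes((v >> (8 * k)) & 0xFF for v in values) for k in range(nbytes)]


def store16(mem: bytearray, addr: int, value: int) -> None:
    """Little-endian 16-bit store, like STA with a 16-bit accumulator."""
    mem[addr] = lobyte(value)
    mem[addr + 1] = hibyte(value)


if __name__ == "__main__":
    lo, hi = stripe([0x1234, 0xABCD, 0x00FF])
    print(lo.hex(), hi.hex())  # 34cdff 12ab00
    # Two overlapping 16-bit stores build a 24-bit pointer at $02-$04.
    mem = bytearray(8)
    store16(mem, 0x02, loword(0x7E2345))
    store16(mem, 0x03, upper_word(0x7E2345))
    print(mem[2:5].hex())  # 45237e
```

Striping pays off on the 65816 because separate lo/hi tables can both be indexed with the same 8-bit index. An interleaved table of words needs the index doubled.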
Re: snes assembly - beginnings, a few questions (wla-65816)
I plan to return to SNES assembly in a while, but for now I'm trying to understand its architecture better.
Can someone explain to me, in practical terms, the main differences between LoROM and HiROM?
What are the pros and cons of each?
There are a lot of theories, but I haven't found a reasonable explanation...
I noticed that the HiROM memory map has space in banks $40-$7D only for 200ns ROM chips? Whereas $C0-$FF is mainly used because it works with both 200ns and 120ns ROM (SlowROM/FastROM)?
I guess (maybe wrongly) that the biggest advantage of HiROM and its 64KB ROM banks is the possibility of better code optimization, e.g. fewer long jumps to other banks: the main game mechanics could sit in one 64KB bank, whereas in LoROM they wouldn't fit in one bank and would require more jumps/24-bit addressing?
On the other hand, is the advantage of LoROM faster access to the first 8KB of RAM together with access to 32KB of ROM without 24-bit addressing, because the same bank contains the 32KB of ROM as well as the registers, RAM, etc.?
Could that make the code more efficient, as long as the main mechanics/game loop fits within 32KB?
Unfortunately, a lot about SNES is unclear to me, it is a much more twisted architecture than e.g. Sega Genesis
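The 32 KiB vs 64 KiB bank difference behind this question can be seen in a small Python sketch that maps a ROM file offset to a CPU address under each map. The sketch rests on a few assumptions:
- LoROM uses the $80-$FF mirror and HiROM uses $C0-$FF. These are the ranges that run at FastROM speed when bit 0 of MEMSEL ($420D) is set.
- Sizes are capped at 4 MiB.
- The helper names are hypothetical.

The third helper reflects the fact that HiROM also mirrors the upper 32 KiB of each bank at $8000-$FFFF of banks $80-$BF (and $00-$3F), alongside LowRAM and the I/O registers.

```python
ROM_LIMIT = 0x400000  # 4 MiB cap assumed for this sketch


def _check(offset: int) -> None:
    if not 0 <= offset < ROM_LIMIT:
        raise ValueError(f"ROM offset ${offset:X} out of range")


def lorom_address(offset: int) -> int:
    """LoROM: 32 KiB of ROM per bank at $8000-$FFFF, banks $80-$FF."""
    _check(offset)
    bank, within = divmod(offset, 0x8000)
    return ((0x80 + bank) << 16) | (0x8000 + within)


def hirom_address(offset: int) -> int:
    """HiROM: a full 64 KiB of ROM per bank, banks $C0-$FF."""
    _check(offset)
    bank, within = divmod(offset, 0x10000)
    return ((0xC0 + bank) << 16) | within


def hirom_system_bank_address(offset: int) -> int:
    """HiROM mirror in $80-$BF: only the upper 32 KiB of each 64 KiB bank."""
    _check(offset)
    bank, within = divmod(offset, 0x10000)
    if within < 0x8000:
        raise ValueError("lower half of a HiROM bank is not mirrored here")
    return ((0x80 + bank) << 16) | within


def banks_spanned(start: int, length: int, bank_size: int) -> int:
    """How many ROM banks a block of `length` bytes at `start` touches."""
    return (start + length - 1) // bank_size - start // bank_size + 1


if __name__ == "__main__":
    for off in (0x000000, 0x008000, 0x012345):
        print(f"{off:06X}: LoROM ${lorom_address(off):06X}  HiROM ${hirom_address(off):06X}")
    # A hypothetical 48 KiB block of game logic starting at offset 0:
    print(banks_spanned(0, 0xC000, 0x8000), banks_spanned(0, 0xC000, 0x10000))  # 2 1
```

The last line shows the trade-off the question describes. A 48 KiB block needs two LoROM banks but fits in one HiROM bank. Code placed in the upper half of a HiROM bank can still share a bank with LowRAM and the registers through the $80-$BF mirror.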