Excessive operand length?

CrowleyBluegrass
Posts: 42
Joined: Sun Jun 30, 2013 7:59 am

Excessive operand length?

Post by CrowleyBluegrass »

Couldn't think how to phrase the title :/

I'm writing a generic parser for 6502 assembler. The assembler I'm most familiar with is asm6 so it's based on that as much as anything. Anyway, the question I'd like to ask is whether it ever makes sense to accept syntax like:

Code:

LDA $42424242
that is, with the operand length being "excessive". It seems like every time I assume something must be completely wrong, someone provides a use (or should I say, abuse) case. I guess you could maybe use this to force bytes into the binary? In that case though you'd just use a directive.

Just asking because I'm writing the operand parts of the parser and I'm unsure whether to put a hard limit on operand values for instructions -- $FFFF for most addressing modes and $FF for zero page modes. I'm less unsure about the latter, because the operand length is practically the definition of the addressing mode, and without it you'd be unable to distinguish between absolute and zero page.

Thanks :)
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: Excessive operand length?

Post by Garth »

I would think that whatever you're trying to do would be better done with macros.
http://WilsonMinesCo.com/ lots of 6502 resources
CrowleyBluegrass
Posts: 42
Joined: Sun Jun 30, 2013 7:59 am

Re: Excessive operand length?

Post by CrowleyBluegrass »

Garth wrote:I would think that whatever you're trying to do would be better done with macros.
I'm just doing this as an exercise/project. It's a fairly regular syntax to parse, especially compared to the majority of languages.

Edit: I think asm6 does do this?:

Code:

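if(opsize[type]==1) {               // one-byte operand
    if(!dependant) {                // value does not depend on an unresolved label
        if(val>255 || val<-128)     // accepts both unsigned and signed byte values
            errmsg=OutOfRange;
    }
}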
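A minimal Python sketch of the same check, for anyone porting the idea into their own parser. The function name `encode_byte_operand` is hypothetical, and the -128..255 window is copied from the excerpt above rather than from any asm6 documentation:

```python
def encode_byte_operand(val: int) -> int:
    """Range-check a one-byte operand and return the byte that would be stored.

    Mirrors the excerpt above: anything from -128 to 255 is accepted,
    so both signed and unsigned byte values pass.
    """
    if val > 255 or val < -128:
        raise ValueError(f"operand out of range: {val}")
    # Negative values wrap to their two's complement byte, e.g. -1 -> 0xFF.
    return val & 0xFF
```

Note that this window lets an underflowed unsigned value such as -1 through as $FF, which is the trade-off discussed later in the thread.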
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: Excessive operand length?

Post by pubby »

You shouldn't allow addresses outside 16 bit, but you should allow numeric constants to be 32 or 64 bit.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Excessive operand length?

Post by koitsu »

To be a bit more clear than my above colleagues: 6502 operands technically can only be 0 bytes (ex. nop), 1 byte (ex. lda zp / lda $23 / bcc $e7, which is PC-relative; that's sort of assembly + bytecode combined, since in source it would be written as bcc $e230 or something like that), or 2 bytes (ex. jmp $800c). That's literally all the CPU supports. There is no other variance. Each addressing mode (and there are only a few) has a set/defined size. These are well-documented across tons of resources, books, everything. So, for your example LDA $42424242, this wouldn't assemble / would generate an out-of-range error or parser error of some sort.
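As a rough sketch of the sizes described above, here is a Python table of operand bytes per addressing mode, plus the arithmetic that turns a branch target into its one-byte relative operand. The mode names and the function `branch_offset` are illustrative choices, not taken from any particular assembler:

```python
# Operand size in bytes for each 6502 addressing mode (opcode byte not included).
OPERAND_BYTES = {
    "implied": 0, "accumulator": 0,
    "immediate": 1, "zeropage": 1, "zeropage,x": 1, "zeropage,y": 1,
    "(indirect,x)": 1, "(indirect),y": 1, "relative": 1,
    "absolute": 2, "absolute,x": 2, "absolute,y": 2, "indirect": 2,
}

def branch_offset(instr_addr: int, target: int) -> int:
    """Return the one-byte operand for a branch located at instr_addr.

    The offset is measured from the address of the next instruction,
    i.e. instr_addr + 2 (one opcode byte plus one operand byte).
    """
    delta = target - (instr_addr + 2)
    if not -128 <= delta <= 127:
        raise ValueError(f"branch target out of range: {delta}")
    return delta & 0xFF  # stored as a two's complement byte
```

For example, a bcc placed at $E247 that targets $E230 gets the operand byte $E7, which matches the "bcc $e7" versus "bcc $e230" distinction in the post.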

I say this with total respect, not judgement: I think you need to spend some more time getting to understand the CPU if you're having to ask this question. Saying you are "familiar with asm6" *should* mean you are familiar with writing 6502 code, but possibly some part of it in your head is "mangled" because you haven't actually looked at assembled results before. Maybe it would be helpful if you looked at the results of a _disassembler_, for learning purposes? Not sure. Since you're familiar with asm6, you should try generating a listings file (-l (lowercase ELL) flag) and look at the raw bytecode generated on a per-instruction basis. Or, well, just read actual CPU documentation... :-)

As for a "generic 6502 parser", you are going to have one hell of a time with this, specifically if you plan on comprehending *human-written assembly* with things like names for labels, equates, macros, and so on. This will probably break your brain, because every assembler is different. Consider variances like how NESASM forces you to use [] brackets for indirect addressing (ex. lda [$40,x]), while pretty much every other assembler since the 70s has used () parentheses (ex. lda ($40,x)). Now consider assembler directives (a.k.a. pseudo-ops), etc. Is it possible? Probably after years of work, but I don't see what would really be gained by this (FYI, that is not me posing a question).
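To make the bracket difference concrete, here is a small Python sketch that recognises the indexed-indirect form in either style. The function name `parse_indexed_indirect` is hypothetical, and real assemblers accept far more than a bare token inside the brackets (expressions, labels with operators, and so on):

```python
import re

# One pattern per bracket style, so mismatched pairs like "($40,x]" are rejected.
_PATTERNS = [
    re.compile(r"^\(\s*([^,()\[\]\s]+)\s*,\s*[xX]\s*\)$"),  # ($40,x)
    re.compile(r"^\[\s*([^,()\[\]\s]+)\s*,\s*[xX]\s*\]$"),  # [$40,x]  (NESASM style)
]

def parse_indexed_indirect(operand: str):
    """Return the inner address token if operand is (addr,x) or [addr,x], else None."""
    text = operand.strip()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group(1)
    return None
```

Both `parse_indexed_indirect("($40,x)")` and `parse_indexed_indirect("[$40,X]")` yield the token `$40`, while a mismatched pair yields `None`.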
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Excessive operand length?

Post by Oziphantom »

It depends on how you want to handle cart banks.
While a 6502 cannot see beyond $FFFF, carts can be a lot bigger than $FFFF; a 24-bit limit is probably sane enough. When you assemble for cart bin files you will need a 24-bit address. If you have labels that are in a bank, and you allow the PC to be set beyond $FFFF so you can place things at specific points in the bin file (or override points in the bin file to make a patch), you will get STA $XXXXXX cases, and your code will need to convert $XXXXXX into the correct $XXXX value before it can assemble.
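The conversion described here is just a division with remainder. A minimal sketch, assuming 16 KiB banks that all appear at $8000 in CPU space; both constants and the function name `split_linear` are assumptions for illustration, since the real values depend on the cartridge hardware:

```python
BANK_SIZE = 0x4000    # assumed: 16 KiB per bank
WINDOW_BASE = 0x8000  # assumed: CPU address where a bank is mapped in

def split_linear(linear: int):
    """Split a 24-bit linear file address into (bank number, 16-bit CPU address)."""
    if not 0 <= linear <= 0xFFFFFF:
        raise ValueError(f"linear address out of 24-bit range: {linear:#x}")
    bank, offset = divmod(linear, BANK_SIZE)
    return bank, WINDOW_BASE + offset
```

With these assumptions, linear address $01C123 falls in bank 7 and becomes CPU address $8123, so only the 16-bit part would ever reach an instruction operand.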
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Excessive operand length?

Post by koitsu »

Oziphantom wrote:It depends on how you want to handle cart banks.
While a 6502 cannot see beyond $FFFF, carts can be a lot bigger than $FFFF; a 24-bit limit is probably sane enough. When you assemble for cart bin files you will need a 24-bit address. If you have labels that are in a bank, and you allow the PC to be set beyond $FFFF so you can place things at specific points in the bin file (or override points in the bin file to make a patch), you will get STA $XXXXXX cases, and your code will need to convert $XXXXXX into the correct $XXXX value before it can assemble.
No 6502 assembler works like this (read: some kind of faux-linear-addressing that abstracts out mapper PRG-ROM switching). CPU addressing space is $0000-FFFF. Addressing range in operands cannot exceed 16-bit. You will just confuse the OP with what you've said.
CrowleyBluegrass
Posts: 42
Joined: Sun Jun 30, 2013 7:59 am

Re: Excessive operand length?

Post by CrowleyBluegrass »

koitsu wrote:To be a bit more clear than my above colleagues: 6502 operands technically can only be 0 bytes (ex. nop), 1 byte (ex. lda zp / lda $23 / bcc $e7, which is PC-relative; that's sort of assembly + bytecode combined, since in source it would be written as bcc $e230 or something like that), or 2 bytes (ex. jmp $800c). That's literally all the CPU supports. There is no other variance. Each addressing mode (and there are only a few) has a set/defined size. These are well-documented across tons of resources, books, everything. So, for your example LDA $42424242, this wouldn't assemble / would generate an out-of-range error or parser error of some sort.
That's how I've got it defined, I just wanted to be sure.
koitsu wrote:I say this with total respect, not judgement: I think you need to spend some more time getting to understand the CPU if you're having to ask this question. Saying you are "familiar with asm6" *should* mean you are familiar with writing 6502 code, but possibly some part of it in your head is "mangled" because you haven't actually looked at assembled results before. Maybe it would be helpful if you looked at the results of a _disassembler_, for learning purposes? Not sure. Since you're familiar with asm6, you should try generating a listings file (-l (lowercase ELL) flag) and look at the raw bytecode generated on a per-instruction basis. Or, well, just read actual CPU documentation... :-)
I'll definitely start looking at listings files if I have any doubts. This wasn't so much not understanding the CPU as it was not understanding whether this kind of thing might be done for some unknown purpose even though it's "technically" incorrect.

Thanks for the advice. I'll admit (not that it was ever in doubt) I'm not the most experienced 6502 programmer, but I do understand how opcodes are placed in the binary w.r.t addressing mode, and how labels are resolved. That's enough for me personally to still enjoy this project and feel like I can make something I'll be proud of, so I'll continue on. I have no illusions of making the best assembler ever, plenty of great ones already exist. I'm having a lot of fun - and learning a lot - doing this.
koitsu wrote:As for a "generic 6502 parser", you are going to have one hell of a time with this, specifically if you plan on comprehending *human-written assembly* with things like names for labels, equates, macros, and so on. This will probably break your brain, because every assembler is different. Consider variances like how NESASM forces you to use [] brackets for indirect addressing (ex. lda [$40,x]), while pretty much every other assembler since the 70s has used () parentheses (ex. lda ($40,x)). Now consider assembler directives (a.k.a. pseudo-ops), etc. Is it possible? Probably after years of work, but I don't see what would really be gained by this (FYI, that is not me posing a question).
Apologies, I've given completely the wrong impression with the word "generic". Generic as in, not specific. Just for parsing opcodes, operands, labels, and defines. Not my best choice of verbiage - my bad.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Excessive operand length?

Post by koitsu »

"Parsing labels and defines" is going to cause you grief. Nothing stops an assembler from evaluating a mathematical expression that results in, say, a 32-bit value. The 6502's addressing modes, however, obviously cannot handle that because registers are only 8 bits, and absolute addressing modes only support a maximum of 16-bit values for addressing, i.e. $0000-FFFF in CPU address space, so the programmer must use something like < (asm6 and others) or .LOBYTE() (ca65) to get at the lowest 8-bit piece of the 32-bit value. All of this is done at assemble-time, not run-time! There is no standard for what the maximum size of an expression can be (and I would say most assembler documentations do not disclose this). So, in the end, CPU-wise, the size limits are as I stated: opcodes are always 1 byte in length, operands are either 0 (implied), 1 (immediate, ZP), or 2 bytes (absolute) depending on addressing mode.

The point is: if you're parsing human-written source code, you are in for a massive world of hurt and hair-pulling. There is no "standard" for this, sorry to say; humans are very good at creating ways to solve conundrums or limitations of a tool (assembler, linker, etc.) through other means, so parsing/handling all of that may break your brain. I wish you genuine luck in this endeavour.
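The byte-extraction operators mentioned in this post amount to a shift and a mask applied at assemble time. A minimal Python sketch, with hypothetical helper names `lobyte` and `hibyte` standing in for operators like < and > (or ca65's .LOBYTE and .HIBYTE):

```python
def lobyte(value: int) -> int:
    """Lowest 8 bits of an expression result, like the < operator."""
    return value & 0xFF

def hibyte(value: int) -> int:
    """Bits 8..15 of an expression result, like the > operator."""
    return (value >> 8) & 0xFF

# A 32-bit expression result is cut down explicitly before it becomes an operand.
wide = 0x12345678
operand_lo = lobyte(wide)   # 0x78
operand_hi = hibyte(wide)   # 0x56
```

The point of making the truncation explicit is that the programmer, not the assembler, decides which piece of the wide value ends up in the instruction.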
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Excessive operand length?

Post by Oziphantom »

koitsu wrote:
Oziphantom wrote:It depends on how you want to handle cart banks.
While a 6502 cannot see beyond $FFFF, carts can be a lot bigger than $FFFF; a 24-bit limit is probably sane enough. When you assemble for cart bin files you will need a 24-bit address. If you have labels that are in a bank, and you allow the PC to be set beyond $FFFF so you can place things at specific points in the bin file (or override points in the bin file to make a patch), you will get STA $XXXXXX cases, and your code will need to convert $XXXXXX into the correct $XXXX value before it can assemble.
No 6502 assembler works like this (read: some kind of faux-linear-addressing that abstracts out mapper PRG-ROM switching). CPU addressing space is $0000-FFFF. Addressing range in operands cannot exceed 16-bit. You will just confuse the OP with what you've said.
64tass allows you to set a 24-bit address range for the output file, which effectively allows you to make a 16MB file; this is how I make 512K CRT files. So while all code is assembled within a 16-bit address limit, the output is positioned into a linear 24-bit file.
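A rough sketch of that separation in Python: the code bytes are assembled against 16-bit addresses elsewhere, and a separate step positions them in a linear output image. The function `place` and the zero-fill padding are assumptions for illustration and do not describe how 64tass is implemented:

```python
MAX_IMAGE = 1 << 24  # 24-bit output range, i.e. at most 16 MiB

def place(image: bytearray, file_offset: int, code: bytes) -> None:
    """Write already-assembled bytes at a linear offset in the output image."""
    end = file_offset + len(code)
    if file_offset < 0 or end > MAX_IMAGE:
        raise ValueError("output position outside the 24-bit image")
    if len(image) < end:
        image.extend(b"\x00" * (end - len(image)))  # pad the gap with zeros
    image[file_offset:end] = code  # later writes overwrite, which allows patching

# Usage: LDA #$01 assembled for some 16-bit address, stored past the 64K mark.
image = bytearray()
place(image, 0x10000, b"\xA9\x01")
```

Nothing in the instruction bytes themselves carries the 24-bit position; it only decides where in the file they land.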

How does a NES assembler handle a cart > 64K if everything in the file is limited to 64K?
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Excessive operand length?

Post by rainwarrior »

You only enforce the range of operands, not all expressions.

How do you find a label that's in a bank? Well, the label has that metadata attached somehow by the assembler. It might have an associated ".bank" or ".segment" or some other property like this depending on the assembler. That's not part of the operand, though, so it's not applicable. (If you need to get an associated bank number to write to a banking register, some assemblers have mechanisms for that too. CA65 lets you add a bank number attribute to a segment and retrieve it with a pseudo-function, for example.)

The address of a thing in the file probably isn't really something you'd want to use in a 6502 assembler... I'm not sure of an application for that. For any 6502 instruction operand, you need its address in memory, and that is never more than 16-bit. The platform doesn't do anything larger.


Some assemblers will truncate expressions larger than 16 bits without warning, and probably there are some people who prefer it that way, but personally I love to have range checking and will gladly accept having to manually truncate the rare cases where I need to. It's safer and more explicit. Though, one sticky point here is what to do about signed values... ca65's range checking is unsigned only and it doesn't have a mechanism to turn it off temporarily around code that needs signed stuff. On the other hand, something like allowing -128 to 255 has the converse problem of not catching unsigned values that have underflowed. I'd take either of these compromises over silent truncation in a heartbeat though.
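The three policies compared in this paragraph can be put side by side in a short Python sketch. All names here are hypothetical; the sketch only illustrates the trade-offs and does not reproduce any assembler's actual code:

```python
class OperandRangeError(ValueError):
    """Raised when an operand value does not fit the instruction."""

def check_unsigned(val: int, bits: int = 8) -> int:
    """Strict unsigned check: rejects negatives, so underflow is caught."""
    if not 0 <= val < (1 << bits):
        raise OperandRangeError(f"{val} does not fit in {bits} unsigned bits")
    return val

def check_signed_or_unsigned(val: int, bits: int = 8) -> int:
    """Lenient check, e.g. -128..255 for 8 bits: signed values pass,
    but an underflowed unsigned value such as -1 passes too."""
    if not -(1 << (bits - 1)) <= val < (1 << bits):
        raise OperandRangeError(f"{val} does not fit in {bits} bits")
    return val & ((1 << bits) - 1)

def truncate_silently(val: int, bits: int = 8) -> int:
    """No check at all: the high bits simply disappear."""
    return val & ((1 << bits) - 1)
```

With a value of -1, the strict check raises, the lenient check returns $FF, and silent truncation also returns $FF; with $1234 only silent truncation succeeds, quietly producing $34.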


Expressions on the other hand should be some practical large type size. 32-bit seems to be common. I wouldn't expect them to be limited to 16-bits unless this was a very old assembler written for an actual 16-bit computer. The extra bits are important when you need to do assemble-time calculations (especially multiplying and dividing). The range check only belongs on instruction operands.
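A small worked example of why the intermediate width matters, sketched in Python. The clock figure is the approximate NTSC NES CPU frequency, and the calculation (cycles in 1/60 of a second) is only an illustration chosen because its intermediate value exceeds 16 bits while its result does not:

```python
CPU_HZ = 1789773  # approximate NTSC NES CPU clock; needs more than 16 bits

# Evaluated in a wide type, the result fits comfortably in a 16-bit operand.
cycles_wide = CPU_HZ // 60

# If every intermediate were clamped to 16 bits, the constant itself would
# already be mangled before the division happens.
cycles_16bit = (CPU_HZ & 0xFFFF) // 60

def check_operand16(val: int) -> int:
    """Range check applied only at the point where a value becomes an operand."""
    if not 0 <= val <= 0xFFFF:
        raise ValueError(f"operand out of 16-bit range: {val}")
    return val

operand = check_operand16(cycles_wide)
```

The wide evaluation gives 29829, while the 16-bit evaluation gives 338, so the range check belongs on `operand`, not on `CPU_HZ`.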
koitsu wrote: There is no standard for what the maximum size of an expression can be (and I would say most assembler documentations do not disclose this).
Well, NESASM and ASM6 don't, but ca65 explicitly documents it. I think I've seen it documented in several assembler manuals, but maybe we're talking about a domain where a lot of assemblers don't have very comprehensive documentation to begin with. :P
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Excessive operand length?

Post by Oziphantom »

rainwarrior wrote:Some assemblers will truncate expressions larger than 16 bits without warning, and probably there are some people who prefer it that way, but personally I love to have range checking and will gladly accept having to manually truncate the rare cases where I need to. It's safer and more explicit. Though, one sticky point here is what to do about signed values... ca65's range checking is unsigned only and it doesn't have a mechanism to turn it off temporarily around code that needs signed stuff. On the other hand, something like allowing -128 to 255 has the converse problem of not catching unsigned values that have underflowed. I'd take either of these compromises over silent truncation in a heartbeat though.
I would think the type that you are using should dictate this.

Code:

.byte  <some expression> ; unsigned
.char  <some expression> ; signed
.word  <some expression> ; unsigned 16bits
.sint  <some expression> ; signed 16bits
.addr  <some expression> ; 16bit unsigned, forced to PC limits auto clips bank byte
.rta   <some expression> ; rts return address, 16bits, forced to PC limits auto clips bank byte
.long  <some expression> ; unsigned 24 bits
.lint  <some expression> ; signed 24 bits
.dword <some expression> ; unsigned 32 bits
.dint  <some expression> ; signed 32 bits
For immediate values:

Code:

lda #XX    <- unsigned range 0 to 255
lda #XXXX  <- unsigned range 0 to 65535
lda #+XX   <- signed range -128 to 127
lda #-XX   <- signed range -128 to 127
lda #+XXXX <- signed range -32768 to 32767
lda #-XXXX <- signed range -32768 to 32767
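A minimal Python sketch of the "type dictates the check" idea, using the directive names from the list above. The table `DIRECTIVES` and the function `emit` are hypothetical, and `.addr` and `.rta` are left out because their extra behaviour (bank clipping, return-address adjustment) is not modelled here:

```python
# directive -> (size in bytes, signed?)
DIRECTIVES = {
    ".byte": (1, False), ".char": (1, True),
    ".word": (2, False), ".sint": (2, True),
    ".long": (3, False), ".lint": (3, True),
    ".dword": (4, False), ".dint": (4, True),
}

def emit(directive: str, value: int) -> bytes:
    """Range-check value against the directive's type and return little-endian bytes."""
    size, signed = DIRECTIVES[directive]
    try:
        # int.to_bytes raises OverflowError when the value does not fit.
        return value.to_bytes(size, "little", signed=signed)
    except OverflowError as exc:
        raise ValueError(f"{value} out of range for {directive}") from exc
```

Here `emit(".char", -1)` produces the byte $FF, while `emit(".byte", -1)` is rejected, so the signed/unsigned decision follows from the directive rather than from a global assembler setting.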
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Excessive operand length?

Post by rainwarrior »

Yeah, that's something I proposed on the CC65 mailing list in the past (ref), signed data types and some sort of "signed immediate" mechanism. I was thinking that we would need some alternative symbol besides # to indicate a signed immediate, but treating #- and #+ as a digraph that indicates this actually seems like a pretty good way to do it. I say digraph meaning it should only work if they are directly adjacent, so you can still use the negate operator with an unsigned value if you need it:

Code:

#(-o) ; still use an unsigned range check
#-o ; signed
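The adjacency rule can be sketched in a few lines of Python. The function `immediate_range` is hypothetical and only decides which range check applies; it does not evaluate the expression, and no existing assembler is claimed to work this way:

```python
def immediate_range(operand: str, bits: int = 8):
    """Return the (min, max) range to enforce for an immediate operand.

    "#+" or "#-" written with no space in between selects the signed range;
    anything else after "#", including "#(-o)", keeps the unsigned range.
    """
    if not operand.startswith("#"):
        raise ValueError("not an immediate operand")
    signed = operand[1:2] in ("+", "-")  # the character directly after '#'
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1
```

So `#-o` is checked against -128..127, while `#(-o)` is still checked against 0..255.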
Anyway, despite this being a longstanding irritation, it's only a minor one for me. (In that linked mailing list post I indicated several workarounds for the same problem.) I'd love to see this feature, but also it doesn't seem enough of a problem for me to implement it myself, so far. The overwhelming majority of my code only needs unsigned types anyway, so occasional exceptions for signed isn't too much of an issue for me.

Incidentally the char type is unsigned on CC65, so it is not an appropriate symbol for a signed byte. (This is allowed by the C spec, and probably an appropriate choice for a 6502 compiler.)