ASM6 question -- silent addressing mode shifts?

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

cpow
NESICIDE developer
Posts: 1097
Joined: Mon Oct 13, 2008 7:55 pm
Location: Minneapolis, MN

Post by cpow »

Code:

.org $8000
.enum $80
ZPV .byte 0
.ende
.enum $800
NZPV .byte 0
.ende
lda (ZPV),y  ; *********
lda (NZPV),y ; **********

I am trying to decide how to handle disambiguation in my assembler. Specifically, the question is what to do when I encounter what appears to be an intended "post-indexed indirect" LDA, as on the lines marked with asterisks above [intent discerned by the presence of ()'s]. Obviously anyone can look at the second one and call me an idiot, because there's no way the programmer could have MEANT to use a non-zeropage address in a post-indexed indirect addressing mode. Furthermore, a programmer should be well within his rights to put as many parentheses as his paranoid mind requires without it causing a headache.

However, from the point of view of the assembler, the addressing mode of the marked lines cannot be determined from the syntax alone. I decided to run a test with ASM6. Sure enough, the program above assembles without error. It turns out ASM6 has decided for me that I really meant absolute indexed-by-y addressing mode for the second LDA, not post-indexed indirect as I may [or may not...I may just be a paranoid parenthesizer] have actually intended. Conversely, I may have intended absolute indexed-by-y addressing mode for the first LDA, but I get post-indexed indirect emitted without so much as a whiff of confusion.

I bring this up because I want to make sure I am sane [not with regard to parenthesis emittance]. I am thinking that I will emit a "warning" whenever such promotions/demotions occur [I refer to zp-to-non-zp addressing mode shifts like this as demotions, and non-zp-to-zp shifts as promotions]. Then the user can figure out whether this is an error in their code [i.e., the value NZPV really SHOULD be a zeropage value] or just the assembler being as helpful as possible. For the case where the programmer intends absolute indexed-by-y addressing mode and removing the ()'s is not possible, perhaps a warning-suppression flag or directive of some kind would be useful too.

I know there's at least one assembler out there that forces []'s upon the programmer to disambiguate this situation, but I would hate to go there.

Loopy?
loopy
Posts: 403
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Post by loopy »

ASM6 tries all addressing modes, ordered by (what I've deemed) most restrictive to least restrictive. The first one that works, it uses. By "works", I mean it must match a template (begins/ends with characters for that addressing mode like "( .. ),Y" ) and the address must fit (addr < 256 for ZP instructions). For LDA, it tries (indirect),Y before absolute,Y.

With your examples:

lda (ZPV),y : It sees "( ... ),Y" so it tries to make it indirect,Y, which it can.

lda (NZPV),y:
It also sees "( ... ),Y" so it tries to make an indirect,Y instruction. The address is too big so it tries absolute,Y. That works, so that's what it uses.
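As a rough illustration of the first-fit strategy described above, here is a hypothetical Python model of mode selection for an `LDA <operand>,Y` line. It is not ASM6's source: the two regex templates, the address limits, and the `evaluate` callback are simplifying assumptions, and Python's `eval` stands in for a real expression parser.

```python
import re

# Simplified first-fit selection for "LDA <operand>,Y", most restrictive first.
# Templates and limits are assumptions for illustration, not ASM6's tables.
CANDIDATES = [
    # (mode, template capturing the address expression, largest encodable address)
    ("(ind),Y", re.compile(r"^\((.*)\),y$", re.IGNORECASE), 0xFF),
    ("abs,Y",   re.compile(r"^(.*),y$",     re.IGNORECASE), 0xFFFF),
]

def select_mode(operand, evaluate, notes=None):
    """Return (mode, address) for the first candidate whose template
    matches and whose address fits.

    If `notes` is a list, a message is appended when a parenthesized
    operand falls through to abs,Y (the choice ASM6 makes silently).
    """
    operand = operand.replace(" ", "")
    indirect_rejected = False
    for mode, template, limit in CANDIDATES:
        match = template.match(operand)
        if match is None:
            continue
        value = evaluate(match.group(1))
        if value <= limit:
            if indirect_rejected and notes is not None:
                notes.append(f"{operand}: used {mode} (${value:04X}) instead of (ind),Y")
            return mode, value
        if mode == "(ind),Y":
            indirect_rejected = True  # template matched but the address is too big
    raise ValueError(f"no addressing mode fits {operand!r}")


if __name__ == "__main__":
    # Addresses from the .enum blocks in the first post.
    symbols = {"ZPV": 0x80, "NZPV": 0x800}

    def evaluate(expr):
        # Stand-in for a real expression parser; illustration only.
        return eval(expr, {"__builtins__": {}}, symbols)

    notes = []
    print(select_mode("(ZPV),y", evaluate, notes))   # ('(ind),Y', 128)
    print(select_mode("(NZPV),y", evaluate, notes))  # ('abs,Y', 2048)
    print(notes)
```

The `notes` list only records demotions, which is one way to surface cpow's proposed warning; warning on every `(zp),Y` line as well would be the noisy case.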

What's the right solution? I'm not sure. Warnings sound like a good idea, but I think they would get out of control. Every "lda (..),Y" could be interpreted as ABS,Y, so do you want to throw a warning all the time? That would REALLY annoy me.

I think what I would do is give the programmer a way to disambiguate, and if they don't want to, the assembler will just make its best guess without complaining. Perhaps something like INSTR.MODE:

lda.abs (blah),y ;use absolute,y
lda.ind (blah),y ;use (ind),y
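A minimal sketch of how such qualifiers might be parsed, assuming the `.abs`/`.ind` suffixes shown above and handling only `,Y` operands. The parser, the helper names, and the use of Python's `eval` as the expression evaluator are all hypothetical; no shipping assembler's syntax is implied.

```python
import re

# Hypothetical "mnemonic.mode" qualifiers after the INSTR.MODE idea.
LINE = re.compile(r"^\s*([a-z]+)(?:\.(abs|ind))?\s+(.+?)\s*,\s*y\s*$", re.IGNORECASE)

def wrapped(expr):
    """True if expr's opening '(' is closed by its final ')'."""
    if not (expr.startswith("(") and expr.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(expr) - 1:
                return False
    return depth == 0

def resolve(line, evaluate):
    """Return (mnemonic, mode, value); mode and value are None without a qualifier."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"not an '<op>[.mode] <expr>,y' line: {line!r}")
    mnemonic, qualifier, expr = m.group(1).lower(), m.group(2), m.group(3)
    if qualifier is None:
        return mnemonic, None, None  # caller falls back to its default rule
    if qualifier.lower() == "abs":
        # Parentheses are ordinary grouping here, so evaluate the whole operand.
        return mnemonic, "abs,Y", evaluate(expr)
    if not wrapped(expr):
        raise ValueError(f".ind needs a fully parenthesized operand: {expr!r}")
    value = evaluate(expr[1:-1])
    if value > 0xFF:
        raise ValueError(f".ind pointer ${value:04X} is not on the zero page")
    return mnemonic, "(ind),Y", value


if __name__ == "__main__":
    symbols = {"blah": 0x80}

    def evaluate(expr):
        return eval(expr, {"__builtins__": {}}, symbols)  # stand-in parser

    print(resolve("lda.abs (blah),y", evaluate))  # ('lda', 'abs,Y', 128)
    print(resolve("lda.ind (blah),y", evaluate))  # ('lda', '(ind),Y', 128)
```

Because LDA has no zero-page,Y form, `.abs` here always means a three-byte absolute,Y instruction, even when blah is below $100.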

Good question... and not an easy one to answer.
Disch
Posts: 1848
Joined: Wed Nov 10, 2004 6:47 pm

Post by Disch »

I thought about this too way back when I was kicking around the idea of writing an assembler.

IMO:

- [] should not be used to replace parentheses, because brackets are used for long indirection on other 65xx processors (65C816)

- (foo),Y should not evaluate to Absolute,Y EVER, no matter what 'foo' is. No expression will ever need to be fully encased in parentheses. The only reason to encase an expression in parentheses would be to indicate you want indirection.

Consider the following situations:

Code:

; obvious examples
lda foo,Y  ; clearly Abs,Y
lda (foo),Y  ; clearly Ind,Y
lda foo+1,Y  ; clearly Abs,Y

; more obfuscated
lda (foo+1),Y  ; Ind,Y (fully encased in parenthesis)
lda (foo)+1,Y  ; Abs,Y (not fully encased)
lda ((foo+3)*4),Y  ; Ind,Y
lda (foo+3)*4,Y  ; Abs,Y
Having (foo),Y evaluate to Abs,Y opens up some potential problems:

Code:

foo = $100

; ...

LDA (foo),Y
That's clearly intended to be Indirect,Y, yet ASM6 (if I understand the logic properly) will assemble it as Absolute,Y. What's worse, it will do it silently, without an error or anything. That's going to be one really tough bug to find.
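Disch's rule can be applied from syntax alone, before any symbol is evaluated, so a value like foo = $100 can no longer flip the mode. A minimal Python sketch covering only the `,Y` case (the function names are hypothetical), checked against the examples listed above:

```python
def fully_parenthesized(expr):
    """True if expr's first '(' is matched by its final character."""
    expr = expr.strip()
    if not (expr.startswith("(") and expr.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(expr) - 1:
                return False  # outer group closed early, e.g. "(foo)+1"
    return depth == 0

def classify(operand):
    """Classify '<expr>,Y' as 'Ind,Y' or 'Abs,Y' without evaluating expr."""
    expr, sep, index = operand.rpartition(",")
    if not sep or index.strip().upper() != "Y":
        raise ValueError(f"expected '<expr>,Y', got {operand!r}")
    return "Ind,Y" if fully_parenthesized(expr) else "Abs,Y"


if __name__ == "__main__":
    for op in ["foo,Y", "(foo),Y", "foo+1,Y", "(foo+1),Y",
               "(foo)+1,Y", "((foo+3)*4),Y", "(foo+3)*4,Y"]:
        print(f"lda {op:<15} ; {classify(op)}")
```

Once an operand is classified Ind,Y this way, an out-of-range value has to be reported rather than silently changing the addressing mode.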

- In the case of "lda (NZPV),Y", I'd think the best course of action would be either:
-- give an error and refuse to assemble
or
-- assemble as truncated (take low byte only) and give a warning that the value was truncated
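The two options differ only in what happens once an indirect operand has been evaluated. A minimal Python sketch with both behind a hypothetical `policy` argument (the names and messages are assumptions; the warning uses Python's standard `warnings` module):

```python
import warnings

class AssemblyError(Exception):
    """An operand cannot be encoded in the requested addressing mode."""

def zp_pointer(value, policy="error"):
    """Return the zero-page pointer byte for an (ind),Y operand.

    policy="error":    reject anything above $FF.
    policy="truncate": keep the low byte and emit a warning.
    """
    if not 0 <= value <= 0xFFFF:
        raise AssemblyError(f"${value:X} is not a 16-bit address")
    if value <= 0xFF:
        return value
    if policy == "error":
        raise AssemblyError(f"(${value:04X}),Y: pointer is not on the zero page")
    if policy == "truncate":
        low = value & 0xFF
        warnings.warn(f"(${value:04X}),Y truncated to (${low:02X}),Y")
        return low
    raise ValueError(f"unknown policy {policy!r}")


if __name__ == "__main__":
    print(zp_pointer(0x80))                      # 128
    print(zp_pointer(0x800, policy="truncate"))  # 0, plus a UserWarning
```

With NZPV = $800, the truncate policy would emit ($00),Y, a pointer the programmer never asked for; that is the basis of loopy's argument later in the thread for making this an error.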
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Disch wrote:- (foo),Y should not evaluate to Absolute,Y EVER, no matter what 'foo' is. No expression will ever need to be fully encased in parentheses.
What you say is true of assemblers that don't apply C-style preprocessing. In the C preprocessor, it is common to defensively parenthesize the replacement text of any #define that uses an operator, so that operator precedences don't interact in unexpected ways. A toy example to prove the point:

Code:

#define SIX 1+5
int twelve = 2*SIX;  // expands to 2*1+5, which is 7 not 12

#define SEVEN (1+6)
int fourteen = 2*SEVEN;  // expands to 2*(1+6), which is in fact 14
Some assembler toolchains, such as GNU as, support preprocessing with the C preprocessor. Pretend for a moment that GNU as supported 6502 as a target.

Code:

#define SEVEN (1+6)
go:
  lda SEVEN,y
Would this assemble to lda $07,y or lda ($07),y?

Another line of code to consider:

Code:

go:
  lda (4)+(5),y
If you parse this using the naive glob pattern "lda (*),y", you'll get 4)+(5 as the argument.
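The pitfall is easy to reproduce with a regular expression. This Python sketch (hypothetical helper names; a real assembler would tokenize rather than use regexes) contrasts the greedy template with one extra depth check on the captured text:

```python
import re

# The naive "lda (*),y" template: a greedy wildcard between the first '('
# and the last ')' before ",y".
NAIVE = re.compile(r"^lda\s+\((.*)\),y$", re.IGNORECASE)

def naive_inner(line):
    """Text between the outer '(' and '),y', with no balance check."""
    m = NAIVE.match(line.strip())
    return m.group(1) if m else None

def balanced_inner(line):
    """Same capture, but rejected unless the '(' after LDA is the one
    that closes right before ",y"."""
    inner = naive_inner(line)
    if inner is None:
        return None
    depth = 0
    for ch in inner:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None  # the leading '(' closed inside the operand
    return inner if depth == 0 else None


if __name__ == "__main__":
    line = "lda (4)+(5),y"
    print(naive_inner(line))     # 4)+(5
    print(balanced_inner(line))  # None
```

A `None` from `balanced_inner` means the operand is not one parenthesized expression, so under a fully-parenthesized rule it would be read as an ordinary absolute,Y expression.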
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

The fundamental problem is that the creators of 65xx assembler syntax made a poor choice: they gave special meaning to something that already had a meaning. A programmer could easily parenthesize something for clarity without realizing that he happened to fall into this special syntax. If an assembler is to accept parentheses for grouping as well as indirection, and minimize accidental errors, it should maximize the "distance" between the two valid forms and give an optional warning for the forms in between. But given that (addr),y is the common way to express indirect addressing, it's pretty hard to give a useful warning that won't be disabled by most users. One approach I've taken as a user to disambiguate is to add zero outside the parentheses:

lda (addr),y ; indirect, regardless of value of addr
lda (addr)+0,y ; absolute
lda 0+(addr),y ; absolute
lda +(addr),y ; absolute (if assembler supports unary + operator)

You could even name zero in some common include file, to better document the above:

ABS = 0
lda (addr)+ABS,y
lda ABS+(addr),y
Disch
Posts: 1848
Joined: Wed Nov 10, 2004 6:47 pm

Post by Disch »

tepples wrote:What you say is true of assemblers that don't apply C-style preprocessing. In the C preprocessor, it is common to defensively parenthesize the replacement text of any #define that uses an operator, so that operator precedences don't interact in unexpected ways. A toy example to prove the point:
But C-style preprocessing sucks for things like this, is known to cause exactly these kinds of screw-ups when used this way, and is therefore ill-advised.

Besides, in an assembler, macros and constants are defined totally differently.

Code:

; bad
#define ACONST 5
#define AMACRO() lda ACONST,Y

; good
ACONST = 5
ACONST = (5)  ; or even this would be fine and quirk-free

.beginmacro AMACRO
lda ACONST,Y
.endmacro
Problem solved.

Things to note:

- constants are evaluated once, unlike #defines, which are glorified copy/pastes. So ACONST = (5) would not cause AMACRO to become indirect like the #define version would.
Some assembler toolchains, such as GNU as, support preprocessing with the C preprocessor.
But does asm6? I don't think it necessarily should, but if it does, then it should do things the way #define does them (i.e., all the quirks and whatnot).
Pretend for a moment that GNU as supported 6502 as a target.
[snip] Would this assemble to lda $07,y or lda ($07),y?
The latter. But again this would be because #defines are a horrible way to define constants, not because this parenthesis approach is flawed.

In fact, the parentheses in the #define might be intentional, because the user might want to make the operand indirect. What about this:

Code:

#define foo (5),Y

lda foo ; absolute?  or Ind,Y?
Logically this should be Indirect,Y.
Another line of code to consider:

Code:

go:
  lda (4)+(5),y
If you parse this using the naive glob pattern "lda (*),y", you'll get 4)+(5 as the argument.
Well, that would be a dumb way to parse it then =P That wouldn't be a problem with this approach; it'd be a bug in the assembler.
cpow
NESICIDE developer
Posts: 1097
Joined: Mon Oct 13, 2008 7:55 pm
Location: Minneapolis, MN

Post by cpow »

Disch wrote: - [] should not be used to replace parentheses, because brackets are used for long indirection on other 65xx processors (65C816)
I completely agree, which is why I raised the concern.
Disch wrote: - (foo),Y should not evaluate to Absolute,Y EVER, no matter what 'foo' is. No expression will ever need to be fully encased in parentheses. The only reason to encase an expression in parentheses would be to indicate you want indirection.
No expression will ever need to, no. But no programmer can be expected never to break this rule, either on purpose or by mistake. Breaking it on purpose is okay, but an unintended break can lead to a difficult-to-debug situation.
Disch wrote:

Code:

foo = $100

; ...

LDA (foo),Y
That's clearly intended to be Indirect,Y, yet ASM6 (if I understand the logic properly) will assemble it as Absolute,Y. What's worse, it will do it silently, without an error or anything. That's going to be one really tough bug to find.
Exactly the reason I brought this up...ASM6 is doing that.
Disch wrote: - In the case of "lda (NZPV),Y", I'd think the best course of action would be either:
-- give an error and refuse to assemble
or
-- assemble as truncated (take low byte only) and give a warning that the value was truncated
I have opted for the warning/truncation route. I think I will also adopt the disambiguation qualifiers suggested by loopy; they make sense. That way you will be warned that the assembler may be doing something you don't want it to do, and you have a couple of options to resolve it: either (a) remove the parentheses or (b) disambiguate by adding the appropriate qualifier to the offending instruction.
loopy
Posts: 403
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Post by loopy »

asm6 allows both types of defines.

foo equ blah ;this is like #define
foo = blah ;blah is evaluated first, and foo is assigned that value
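This distinction is the crux of the SEVEN example from earlier. A toy Python model, assuming a symbol table that maps names to text and using Python's `eval` as a stand-in evaluator (the helper names are hypothetical, not ASM6 internals), shows why only the EQU form puts the parentheses back into the operand:

```python
import re

SYMBOL = re.compile(r"[A-Za-z_]\w*")

def substitute(text, table):
    """Replace each whole-word symbol found in `table` with its stored text."""
    return SYMBOL.sub(lambda m: table.get(m.group(0), m.group(0)), text)

def define_equ(table, name, text):
    # EQU-style: keep the replacement text verbatim, like a C #define.
    table[name] = text

def define_assign(table, name, text):
    # "="-style: evaluate once, then keep only the resulting number.
    # eval() with no builtins stands in for a real expression evaluator.
    table[name] = str(eval(substitute(text, table), {"__builtins__": {}}, {}))


if __name__ == "__main__":
    table = {}
    define_equ(table, "seven_text", "(1+6)")
    define_assign(table, "seven_value", "(1+6)")
    print(substitute("lda seven_text,y", table))   # lda (1+6),y
    print(substitute("lda seven_value,y", table))  # lda 7,y
```

Under a rule where a fully parenthesized operand always means indirect, the first line becomes (ind),Y and the second stays absolute,Y.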

This is explained in the documentation. I think both styles are useful. Assembly macros don't work like C macros; they are an entirely different beast and don't belong in this discussion.

I've decided I agree with Disch: if something is enclosed in parentheses, it should always be considered indirect.

Disch wrote:
Another line of code to consider:

Code:

go:
  lda (4)+(5),y
If you parse this using the naive glob pattern "lda (*),y", you'll get 4)+(5 as the argument.
Well, that would be a dumb way to parse it then =P That wouldn't be a problem with this approach; it'd be a bug in the assembler.
Dumb to a human, maybe. The rule should be stated more precisely: not "anything between parentheses" but "a full expression surrounded by parentheses".
loopy
Posts: 403
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Post by loopy »

Disch wrote: - In the case of "lda (NZPV),Y", I'd think the best course of action would be either:
-- give an error and refuse to assemble
or
-- assemble as truncated (take low byte only) and give a warning that the value was truncated
I would argue that truncating an address will ALWAYS give you the wrong thing, so it should be an error, not a warning.
cpow
NESICIDE developer
Posts: 1097
Joined: Mon Oct 13, 2008 7:55 pm
Location: Minneapolis, MN
Contact:

Post by cpow »

loopy wrote:
Disch wrote: - In the case of "lda (NZPV),Y", I'd think the best course of action would be either:
-- give an error and refuse to assemble
or
-- assemble as truncated (take low byte only) and give a warning that the value was truncated
I would argue that truncating an address will ALWAYS give you the wrong thing, so it should be an error, not a warning.
Good point. Great discussion.
koitsu
Posts: 4203
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

I agree with Disch and loopy, mostly. :-) In the case of the "lda (NZPV),Y" line, an error should be thrown by the 6502 assembler. Be glad you're not having to deal with 65816, otherwise this "debate" (note quotes) could go on for years.

The bottom line is that "it depends" on the assembler. As someone who has a history of doing 65C02 and 65816 work on both the Apple IIe and IIGS, I've worked with 3 different major assemblers over the years: Merlin, Merlin 16, and ORCA/M. Each of them has different syntax quirks, especially with regard to what we're discussing here.

If you really wanted label or macro expansion to trump addressing mode assumptions, using braces (i.e. {}) works quite well, as braces are not used by any 65xxx-series CPU.

Otherwise, some of the assemblers I mentioned used to treat double parentheses (e.g. (( )) ) specially, while others simply required you to write the equivalent of what Disch outlined in his macro.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

Not sure whether my reply got hidden in the shuffle, but I posted a disambiguation syntax that should already work on all assemblers:

lda (NZPV)+0,y ; abs,y
lda 0+(NZPV),y ; abs,y (alternate syntax)

With this "syntax", you can treat lda (NZPV),y as illegal (since NZPV>$FF).
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Disch wrote:
Some assembler toolchains, such as GNU as, support preprocessing with the C preprocessor.
But does asm6?
I have never used asm6, so I wouldn't know. I'm more familiar with ca65, which supports both C-style macros (.define) and structured macros (.macro/.endmacro). There are things that C-style macros can do that structured macros can't, and vice versa.
loopy wrote:I would argue that truncating an address will ALWAYS give you the wrong thing
This is true on 6502, but on 65C816, as I understand it, it's OK in some cases to truncate a 24-bit far address to a 16-bit near address provided the data bank register (DBR) is set correctly (PHA PLB).
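On the 65C816, an absolute-mode data access takes the top byte of its effective address from the data bank register, so dropping the bank byte is correct only when it matches the DBR at run time. A minimal Python sketch of that check, assuming the assembler somehow knows which bank will be in DBR (how it learns that is left open here; `near_address` is a hypothetical name):

```python
def near_address(far, data_bank):
    """Truncate a 24-bit 65C816 address to the 16-bit operand of an
    absolute-mode data access, provided its bank byte equals the value
    assumed to be in the data bank register (DBR) at that point.

    Index-register carries into the next bank are not considered.
    """
    if not 0 <= far <= 0xFFFFFF:
        raise ValueError(f"${far:X} is not a 24-bit address")
    bank, offset = far >> 16, far & 0xFFFF
    if bank != data_bank:
        raise ValueError(
            f"${far:06X} is in bank ${bank:02X}, but DBR is assumed to be ${data_bank:02X}"
        )
    return offset


if __name__ == "__main__":
    print(hex(near_address(0x7E1234, 0x7E)))  # 0x1234
```

On a plain 6502 there is no bank register to consult, so the same truncation has no safe case.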
loopy
Posts: 403
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Post by loopy »

tepples wrote:
loopy wrote:I would argue that truncating an address will ALWAYS give you the wrong thing
This is true on 6502, but on 65C816, as I understand it, it's OK in some cases to truncate a 24-bit far address to a 16-bit near address provided the data bank register (DBR) is set correctly (PHA PLB).
Noted. We are discussing 6502 assemblers though, so I'm not sure what your point is.
cpow
NESICIDE developer
Posts: 1097
Joined: Mon Oct 13, 2008 7:55 pm
Location: Minneapolis, MN

Post by cpow »

blargg wrote:Not sure whether my reply got hidden in the shuffle, but I posted a disambiguation syntax that should already work on all assemblers:

lda (NZPV)+0,y ; abs,y
lda 0+(NZPV),y ; abs,y (alternate syntax)

With this "syntax", you can treat lda (NZPV),y as illegal (since NZPV>$FF).
You are correct, this is a plausible disambiguation syntax. And, as you state, it does work. What I wonder, though, is whether someone looking through the code "a long, long time from now" might think "wtf are all these silly +0's doing?" They'll remove them and end up with something that completely crashes. If they're smart enough, they'll disassemble the output, notice the difference, and realize what is going on, but then they'll think "well, why on earth didn't the assembler SAY something?"

That was my main reason for bringing this up at all...to make sure it wasn't just me who thought it was strange that the assembler we talk about the most [at least from my experience here], ASM6, doesn't at least say "ehh, is this what you meant?"

I think I will still pursue the disambiguation qualifiers as a really obvious way of stating intent in an otherwise ambiguous situation.