Multiple symbols with one value in ld65 .dbg files

Post by tepples

"I broke this debugger with a byte with two names..."

Sometimes a variable has more than one name. Versions of the emulator Mesen published before the third quarter of 2022 have had trouble sorting this out, causing emu.getLabelAddress in a Lua script to stop execution with the error "label not found".

For example, six bytes of an actor's state may be devoted to its physics, such as subpixel position and velocity. Actors that don't use momentum and instead need more complex state may reuse those six bytes for something else, such as the state of their AI. I've modeled this by giving an alias to one of the momentum-related variable names.

In one of a game's core files:

Code:

actor_dxsub: .res NUM_ACTORS  ; Actors with momentum
actor_dx:    .res NUM_ACTORS  ; Actors with momentum
actor_xsub:  .res NUM_ACTORS  ; Actors with momentum
actor_x:     .res NUM_ACTORS  ; All actors
actor_xscr:  .res NUM_ACTORS  ; All actors
actor_dysub: .res NUM_ACTORS  ; Actors with momentum
actor_dy:    .res NUM_ACTORS  ; Actors with momentum
actor_ysub:  .res NUM_ACTORS  ; Actors with momentum
actor_y:     .res NUM_ACTORS  ; All actors
actor_yscr:  .res NUM_ACTORS  ; All actors

In the source file implementing one actor class's movement:

Code:

num_bubbles = actor_xsub

In the source file implementing another actor class's movement:

Code:

spirit_distance = actor_xsub

In a third source file implementing a third actor class's movement:

Code:

seen_space_characters = actor_xsub

I shared a .dbg file from a game with Sour, former maintainer of Mesen. From it, Sour inferred a convention of using : for an address's canonical name (or := if setting it as a numeric value, such as a hardware register or an offset from another pointer) and = for aliases. A symbol created with : or := is type lab (label), whereas one set with = is type equ (equate). Stripping out all type=equ symbols from the file, either after ld65 creates it or while Mesen is loading it, makes Mesen less likely to misbehave.
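
The stripping step described above could be done as a post-link pass over the .dbg file. Below is a minimal Python sketch, not anything Mesen or ld65 ships. It assumes each record is one line of the form "kind<TAB>key=value,key=value,..." with names in double quotes; the function names and the sample records (ids, values, scopes) are made up for illustration.

```python
import re

# Matches key=value pairs; a value is either "quoted" or runs to the next comma.
ATTR_RE = re.compile(r'(\w+)=("[^"]*"|[^,]*)')

def parse_record(line):
    """Split one .dbg line into (record_kind, {key: value})."""
    parts = line.strip().split(None, 1)
    if not parts:
        return "", {}
    rest = parts[1] if len(parts) > 1 else ""
    attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(rest)}
    return parts[0], attrs

def strip_equates(lines):
    """Return the lines with every sym record of type=equ dropped."""
    kept = []
    for line in lines:
        kind, attrs = parse_record(line)
        if kind == "sym" and attrs.get("type") == "equ":
            continue  # alias under the ":" vs "=" convention
        kept.append(line)
    return kept

# Hypothetical records modeled on the aliases above; not from a real build.
SAMPLE = [
    'version\tmajor=2,minor=0',
    'sym\tid=0,name="actor_xsub",addrsize=absolute,size=12,scope=0,val=0x300,seg=1,type=lab',
    'sym\tid=1,name="num_bubbles",addrsize=absolute,scope=1,val=0x300,type=equ',
    'sym\tid=2,name="spirit_distance",addrsize=absolute,scope=2,val=0x300,type=equ',
]

for line in strip_equates(SAMPLE):
    print(line)
```

This sketch leaves the symbol count in the info record and any records that refer to a removed symbol by id untouched. Whether a given loader tolerates that would need checking.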

However, this convention breaks when a source file defines multiple labels with : at the same address, such as an end label immediately followed by the next start label.

Code:

Ann_cels:
  .incbin "Ann_cels.chr"
Ann_cels_end:
Andy_cels:
  .incbin "Andy_cels.chr"
Andy_cels_end:

Another convention accommodates these by reserving : and := for symbols that refer to an address in CPU memory and = for sizes, offsets, velocities, addresses in PPU memory, and the like. Distinguishing a canonical name from an alias would then have to happen some other way, prioritized by one or more of these:
  • Whether the symbol is ever defined with a nonzero size=
  • How many other translation units .import the symbol, represented in the .dbg file as the count of sym entries of the same name= with type=imp
  • type=lab beats type=equ, as before
Ideally, all aliases would be accessible through emu.getLabelAddress. However, because some are local to one file and not .exported, definitions of file-local symbols with the same name may conflict. Counting type=imp could help here as well.
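
The priorities listed above can be sketched as a sort key. This is a rough Python illustration, not how any debugger currently does it. It assumes the same one-line "kind<TAB>key=value,..." record layout, assumes size= is decimal, and groups symbols by val= alone, which ignores segments and scopes. The order of the three criteria is also an assumption, since the list only says "one or more of these". The sample records are invented.

```python
import re
from collections import Counter, defaultdict

# Matches key=value pairs; a value is either "quoted" or runs to the next comma.
ATTR_RE = re.compile(r'(\w+)=("[^"]*"|[^,]*)')

def parse_syms(lines):
    """Yield the attribute dict of every sym record."""
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "sym":
            yield {k: v.strip('"') for k, v in ATTR_RE.findall(parts[1])}

def pick_canonical(lines):
    """Map each symbol value to the name that ranks highest."""
    syms = list(parse_syms(lines))
    # Number of type=imp records per name, as a proxy for how widely it is used.
    imports = Counter(s["name"] for s in syms if s.get("type") == "imp")

    by_val = defaultdict(list)
    for s in syms:
        if s.get("type") in ("lab", "equ") and "val" in s:
            by_val[int(s["val"], 0)].append(s)

    def rank(s):
        return (
            int(s.get("size", "0")) > 0,  # defined with a nonzero size
            imports[s["name"]],           # imported by more translation units
            s["type"] == "lab",           # lab beats equ
        )

    return {val: max(cands, key=rank)["name"] for val, cands in by_val.items()}

# Hypothetical records; ids, scopes, and values are made up.
SAMPLE = [
    'sym\tid=0,name="actor_xsub",addrsize=absolute,size=12,scope=0,val=0x300,seg=1,type=lab',
    'sym\tid=1,name="num_bubbles",addrsize=absolute,scope=1,val=0x300,type=equ',
    'sym\tid=2,name="actor_xsub",addrsize=absolute,scope=1,exp=0,type=imp',
    'sym\tid=3,name="Ann_cels_end",addrsize=absolute,scope=3,val=0x8400,seg=2,type=lab',
    'sym\tid=4,name="Andy_cels",addrsize=absolute,scope=3,val=0x8400,seg=2,type=lab',
    'sym\tid=5,name="Andy_cels",addrsize=absolute,scope=4,exp=4,type=imp',
]

print(pick_canonical(SAMPLE))
```

In this sample, actor_xsub wins over num_bubbles on size, and Andy_cels wins over Ann_cels_end only because something imports it. Two file-local labels at one address with no size and no imports would still tie.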

To better inform the priorities that a debugger can put in place, what are the usual conventions in ca65 programs to distinguish among canonical labels, alias labels, and non-address symbols?