Looking for feedback about a patching 6502 assembler for integration in larger C# projects

jroweboy
Posts: 2
Joined: Mon Apr 03, 2023 9:50 am

Post by jroweboy »

I'm starting work on a new 6502 assembler intended for game patching/randomization, and I'm just seeing if anyone has feedback.

Randomizers often need a LOT of code for managing the meta information for the game, and so this assembler is designed for generating modules that can patch existing modules at link time. For instance, using `.org` will default to overwriting the base code from the game, and `.reloc` generates relocatable blocks. `.free` can be used to mark sections of the game as unused which will let the linker place relocatable blocks in there.
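To make the `.free`/`.reloc` idea concrete, here is a minimal sketch of how a linker might drop relocatable blocks into ranges that were marked free. It assumes a simple first-fit strategy; `FreeRange`, `RelocBlock` and `Placer` are hypothetical names for illustration, not the project's actual API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only.
public record FreeRange(int Start, int Length);      // a region marked with .free
public record RelocBlock(string Name, byte[] Bytes); // a block produced by .reloc

public static class Placer
{
    // First-fit: put each block at the start of the first free range that can
    // hold it, then shrink that range. Returns block name -> assigned address.
    public static Dictionary<string, int> Place(List<FreeRange> free, IEnumerable<RelocBlock> blocks)
    {
        var placed = new Dictionary<string, int>();
        foreach (var block in blocks)
        {
            int i = free.FindIndex(r => r.Length >= block.Bytes.Length);
            if (i < 0)
                throw new InvalidOperationException($"No free space for {block.Name}");

            placed[block.Name] = free[i].Start;
            free[i] = new FreeRange(free[i].Start + block.Bytes.Length,
                                    free[i].Length - block.Bytes.Length);
        }
        return placed;
    }
}
```

A real linker would probably also want to sort blocks by size and respect bank/segment boundaries; this only shows the basic bookkeeping.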

The catch is that I'm doing the assembling and linking in C#, so the rest of the randomizer's C# code can interact with the ASM module, and I'm creating a custom C# class to represent all of the different expressions in the syntax. So if you wanted to write ASM in C#, it would look like

Code:

var a = new Asm(org: 0x8000, segment: Segments[0x2]);
var PowerOfTwos = a.label("PowerOfTwos");
a.db(0x1, 0x2, 0x4, ...); // etc...
var Exit = a.label("Exit");
a.rts();
a.lda("MyValue"); // link time value loads the Label and uses zp or abs depending on the label type
a.tay();
a.lday(PowerOfTwos); // uses Abs addressing since its a label
a.cmp(Imm(0x20));
a.bne(Exit);
a.rts();
But this is ugly, right? It has some cool things going for it, like the operand types being checked by the C# compiler, but it's a lot of cruft to write out just for basic blocks.

So this is where I feel I can do something novel. C# supports a feature called source generators (`ISourceGenerator`, and the newer `IIncrementalGenerator`), which lets you register a custom compiler step that generates C# source files. After I get the basic building blocks above compiling and linking, I plan to write a custom 6502 lexer/parser that reads regular 6502 asm files and spits out generated blocks like the one above. In your C# project, you'll be able to include full assembly source files and import/reference the modules from the C# code.
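As a rough illustration of the translation step (not the real generator, which would sit behind the Roslyn generator interfaces), here is a sketch that turns a tiny subset of 6502 source into the builder calls from the first example. It only handles labels, implied instructions and `#$xx` immediates; `AsmToCSharp` is a made-up name, and `Asm`, `label` and `Imm` are taken from the example above.

```csharp
using System;
using System.Text;

public static class AsmToCSharp
{
    // Translates "Label:", implied instructions ("rts") and immediates ("cmp #$20").
    // Anything else is rejected so the sketch never silently drops an operand.
    public static string Translate(string fileName, string source)
    {
        var sb = new StringBuilder("var a = new Asm();\n");
        string[] lines = source.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i].Split(';')[0].Trim();   // strip comments and whitespace
            if (line.Length == 0) continue;

            // Map compiler diagnostics back to the line in the .s file.
            sb.Append($"#line {i + 1} \"{fileName}\"\n");

            if (line.EndsWith(":"))
            {
                string name = line.TrimEnd(':');
                sb.Append($"var {name} = a.label(\"{name}\");\n");
                continue;
            }

            string[] parts = line.Split(' ', 2, StringSplitOptions.RemoveEmptyEntries);
            string mnemonic = parts[0].ToLowerInvariant();
            string arg = "";
            if (parts.Length == 2)
            {
                string operand = parts[1].Trim();
                if (!operand.StartsWith("#$"))
                    throw new NotSupportedException($"Operand not handled in this sketch: {operand}");
                arg = $"Imm(0x{operand.Substring(2)})";
            }
            sb.Append($"a.{mnemonic}({arg});\n");
        }
        return sb.ToString();
    }
}
```

Feeding it `"Start:\n  cmp #$20\n  rts"` yields a `label` call, a `cmp(Imm(0x20))` call and an `rts()` call, each preceded by a plain `#line N "file"` directive.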

What's neat about this approach is that the C# compiler will be doing type checking for the assembly, and using the `#line` directive, I can map errors from the generated source directly back onto the ASM file, with error underlining and everything.

Code:

BasicLabel:
  lda $80, y ; red squiggly under $80: error on line 2 no such function lda_y that takes ZeroPage
will turn into generated source that looks something like

Code:

var a = new Asm();
#line (1, 1) - (1, 11) "asmfile.s"
var BasicLabel = a.label("BasicLabel");
#line (2, 3) - (2, 6) "asmfile.s"
a.lda_y(
#line (2, 7) - (2, 10) "asmfile.s"
  ZeroPage(0x80) // This generated code has an error since there's no overload for lda_y that takes a ZeroPage
  );
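For anyone wondering how the compiler ends up rejecting that call, here is a minimal sketch of the overload idea. The operand structs and the `Bytes` list are assumptions made for illustration; only the method names follow the examples above. The opcodes are the standard 6502 ones for LDA.

```csharp
using System.Collections.Generic;

// Hypothetical operand types: one struct per addressing-mode operand.
public readonly record struct Immediate(byte Value);
public readonly record struct ZeroPage(byte Address);
public readonly record struct Absolute(ushort Address);

public class Asm
{
    public List<byte> Bytes { get; } = new();

    public void lda(Immediate v) { Bytes.Add(0xA9); Bytes.Add(v.Value); }
    public void lda(ZeroPage z)  { Bytes.Add(0xA5); Bytes.Add(z.Address); }
    public void lda(Absolute a)  { Emit(0xAD, a.Address); }

    // LDA has no zeropage,Y addressing mode on the 6502, so the only
    // overload offered here takes an Absolute operand.
    public void lda_y(Absolute a) { Emit(0xB9, a.Address); }

    private void Emit(byte opcode, ushort address)
    {
        Bytes.Add(opcode);
        Bytes.Add((byte)(address & 0xFF)); // low byte first (little-endian)
        Bytes.Add((byte)(address >> 8));
    }
}
```

With this shape, `a.lda_y(new Absolute(0x8000))` compiles, while `a.lda_y(new ZeroPage(0x80))` is a compile error because no overload accepts a `ZeroPage`, which is the error the `#line` mapping would surface in the .s file.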
Even cooler, I can use this same mechanism on ASM written inline in C#, like so:

Code:

public partial class Foo {

  [Asm("""
.segment "2"
.org $8000 ; Location where the player movement is checked
  ; Patch a function in the original game to call our custom handler
  jsr PreventPlayerFromGoingOffscreen

PreventPlayerFromGoingOffscreen:
  lda PlayerX
  cmp #$ef
  bcc +
    ; do something
+ ; do original hook code here
  rts
""")]
  private Module PreventPlayerFromGoingOffscreen;
  
  // Just example code for how it can be used
  void Randomizer(Flags flags) {
    if (flags.DontGoOffscreen)
      Assembler.AddModule(PreventPlayerFromGoingOffscreen);
  }
}
And using the magic of source generation, it will compile the ASM into equivalent C# code that uses the classes I talked about above. Since we declared the class as partial, the generator will "complete" it by filling in the value of the field `PreventPlayerFromGoingOffscreen`.

So yeah, I think this could be really handy for these kinds of ROM hacking/randomizer projects, where you want a lot of options toggled on/off, with high-level code dictating how they're used.
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

I made a patching 6502 assembler for C++ projects once.

First, you bind an array range to the Assembler (array, index and length), as well as the 6502 address that corresponds to the first byte of that range.

As you assemble instructions, you write bytes to the array and advance the current address.

When you write an instruction that needs a symbol, you write dummy data to the output. Instead of writing the symbol, you write to a "Fixup" list which takes in the Address of the code, and the name of the symbol. This allows you to define the symbols later. You'd also need to convert symbols into relative displacements for branches.

You can just simply create a label (symbol) for the current address, or create symbols for any provided address.

I ended up naming things like LDA_ZPG, LDA_IMM, LDA_ZPG_X, LDA_ABS_X, LDA_IND_Y. It's not as nice as the true text form of the instruction, but it skips any string parsing and keeps the function calls direct.
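A minimal C# sketch of the fixup approach described above, for illustration only. It assumes just two instructions (`JMP_ABS` and `BNE`); the class and member names are hypothetical and not taken from the original C++ project.

```csharp
using System;
using System.Collections.Generic;

public class PatchAssembler
{
    private readonly byte[] _data;
    private readonly int _index;        // array index of the first byte in the bound range
    private readonly int _length;       // number of bytes that may be overwritten
    private readonly int _baseAddress;  // 6502 address of that first byte
    private int _offset;

    private readonly Dictionary<string, int> _symbols = new();
    private readonly List<(int Address, string Symbol, bool Relative)> _fixups = new();

    public PatchAssembler(byte[] data, int index, int length, int baseAddress)
    {
        _data = data; _index = index; _length = length; _baseAddress = baseAddress;
    }

    public int Address => _baseAddress + _offset;

    public void Label(string name) => _symbols[name] = Address;               // symbol at current address
    public void Define(string name, int address) => _symbols[name] = address; // symbol at any address

    private void Emit(byte value)
    {
        if (_offset >= _length) throw new InvalidOperationException("Bound range is full");
        _data[_index + _offset++] = value;
    }

    public void JMP_ABS(string symbol)
    {
        Emit(0x4C);
        _fixups.Add((Address, symbol, false)); // remember where the operand lives
        Emit(0); Emit(0);                      // dummy data until the symbol is known
    }

    public void BNE(string symbol)
    {
        Emit(0xD0);
        _fixups.Add((Address, symbol, true));
        Emit(0);
    }

    public void ResolveFixups()
    {
        foreach (var (address, symbol, relative) in _fixups)
        {
            int target = _symbols[symbol]; // throws KeyNotFoundException if never defined
            int i = _index + (address - _baseAddress);
            if (relative)
            {
                // Branch displacement is relative to the byte after the operand.
                int displacement = target - (address + 1);
                if (displacement < -128 || displacement > 127)
                    throw new InvalidOperationException($"Branch to {symbol} is out of range");
                _data[i] = (byte)(displacement & 0xFF);
            }
            else
            {
                _data[i] = (byte)(target & 0xFF);
                _data[i + 1] = (byte)(target >> 8);
            }
        }
        _fixups.Clear();
    }
}
```

Because symbols are only looked up in `ResolveFixups`, a label can be defined after the instruction that uses it.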

---

It basically ended up being just like your first example. For someone who would want to maintain a project that used code assembled that way, it would be straightforward, though annoying to actually write any 6502 code. But it's pretty clear to an assembly programmer how it works, and how to use it.

(edit: remainder of post being made into a reply)
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!

Post by Dwedit »

An embedded assembler that actually parses text would get complicated once you try to automatically detect if a symbol is zeropage or not. You just can't emit any bytes until you know whether your instruction is zeropage or not.

So options include:
* Use < symbol (unary 'low byte of' operator) to annotate zeropage instructions
* Require symbols to be pre-declared, anything not pre-declared will be an absolute address rather than a zeropage address
-- Declaring the symbols could be done with C# code (add symbols to a list first), or with ASM code (equate the symbols before any code that uses them)
* Dummy first pass on all code to get addresses of symbols, and use this result to determine which symbols are zeropage
-- Can't use this with code that assumes you will be immediately outputting bytes
-- Does allow symbols to be declared after their use
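The "pre-declared symbols" option can be sketched roughly like this, using LDA as the example. `OperandSizer` is a hypothetical helper: a symbol that is already declared below $100 gets the zeropage encoding, and anything else gets the absolute encoding, with a placeholder operand if the address is not known yet.

```csharp
using System.Collections.Generic;

public static class OperandSizer
{
    // Returns the encoded LDA instruction and whether its operand still needs a fixup.
    public static (byte[] Bytes, bool NeedsFixup) Lda(string symbol, IReadOnlyDictionary<string, int> declared)
    {
        if (declared.TryGetValue(symbol, out int address))
        {
            if (address < 0x100)
                return (new byte[] { 0xA5, (byte)address }, false);               // LDA zeropage

            return (new byte[] { 0xAD, (byte)(address & 0xFF), (byte)(address >> 8) }, false); // LDA absolute
        }

        // Not declared yet: commit to the 3-byte absolute form so the
        // instruction size is fixed, and patch the operand later.
        return (new byte[] { 0xAD, 0x00, 0x00 }, true);
    }
}
```

The trade-off is the one listed above: a zeropage symbol that is declared after its first use ends up with the longer absolute encoding.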

(another reply being written)

Post by Dwedit »

I don't really like the idea of a pre-build step that's parsing 6502 code, I think it would make a project harder to maintain, or harder to explain to other people who might need to set up a similar solution.

I would think that the C# program knows the best time to emit the assembly code, so the C# code should just call a function called "Assemble", which takes in a string (or byte array) containing the assembly code.
It would be the same as if you were repeatedly calling Instruction Emit functions, and Create Label functions. You get bytes emitted into the binary, and you get any new entries in your symbol list and fixup list.

You can use """ syntax for multiple lines.

If you need to organize into separate files, you can use file resources. The data becomes available at "Properties.Resources.filename_ext" as a byte array.
Or if you prefer Static Classes, you could use something like: static partial class AsmCode { public static string file_asm = """ blah """; }

If you need to #include, you can use reflection to check for a resource with the correct filename. For example, your assembler encounters the line #include "something.asm". The filename could be reformatted as a variable named "something_asm", and you use reflection to check for an embedded resource (or static class member) with that name. If that succeeds, you include the embedded resource. If that fails, you can go out to the file system and look for that file.
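One possible shape for that #include lookup, as a sketch. It uses the assembly's manifest resource names rather than the generated `Properties.Resources` members, which is an assumption on my part; `IncludeResolver` is a made-up name. Embedded resource names are usually prefixed with the project namespace, hence the suffix match.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

public static class IncludeResolver
{
    // Looks for an embedded resource whose name ends with the requested file
    // name; falls back to the file system if none is found.
    public static string Load(string fileName, Assembly assembly)
    {
        var resourceName = assembly.GetManifestResourceNames()
            .FirstOrDefault(n => n.Equals(fileName, StringComparison.OrdinalIgnoreCase)
                              || n.EndsWith("." + fileName, StringComparison.OrdinalIgnoreCase));

        if (resourceName != null)
        {
            using var stream = assembly.GetManifestResourceStream(resourceName);
            if (stream != null)
            {
                using var reader = new StreamReader(stream);
                return reader.ReadToEnd();
            }
        }

        return File.ReadAllText(fileName);
    }
}
```

A call might look like `IncludeResolver.Load("something.asm", typeof(IncludeResolver).Assembly)`.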

Post by jroweboy »

Dwedit wrote: Mon Apr 03, 2023 11:34 am
I made a patching 6502 assembler for C++ projects once.

First, you bind an array range to the Assembler (array, index and length), as well as the 6502 address that corresponds to the first byte of that range.

As you assemble instructions, you write bytes to the array and advance the current address.

When you write an instruction that needs a symbol, you write dummy data to the output. Instead of writing the symbol, you write to a "Fixup" list which takes in the Address of the code, and the name of the symbol. This allows you to define the symbols later. You'd also need to convert symbols into relative displacements for branches.

You can just simply create a label (symbol) for the current address, or create symbols for any provided address.

I ended up naming things like LDA_ZPG, LDA_IMM, LDA_ZPG_X, LDA_ABS_X, LDA_IND_Y. It's not as nice as the true text form of the instruction, but it skips any string parsing and keeps the function calls direct.

---

It basically ended up being just like your first example. For someone who would want to maintain a project that used code assembled that way, it would be straightforward, though annoying to actually write any 6502 code. But it's pretty clear to an assembly programmer how it works, and how to use it.

(edit: remainder of post being made into a reply)

Thanks for the thoughtful replies! I did come across your project in my requirements gathering step, so it's cool to see you respond here as well :D I wanted something similar to your project but more ergonomic for a developer to use, while still providing the compile-time benefits of statically typed classes.

Dwedit wrote: Mon Apr 03, 2023 12:32 pm
An embedded assembler that actually parses text would get complicated once you try to automatically detect if a symbol is zeropage or not. You just can't emit any bytes until you know whether your instruction is zeropage or not.

So options include:
* Use < symbol (unary 'low byte of' operator) to annotate zeropage instructions
* Require symbols to be pre-declared, anything not pre-declared will be an absolute address rather than a zeropage address
-- Declaring the symbols could be done with C# code (add symbols to a list first), or with ASM code (equate the symbols before any code that uses them)
* Dummy first pass on all code to get addresses of symbols, and use this result to determine which symbols are zeropage
-- Can't use this with code that assumes you will be immediately outputting bytes
-- Does allow symbols to be declared after their use

Yes, and my solution is sort of a mixture of all of these. If a symbol has a known size at module build time, such as when using `.lobyte`/`.hibyte`, then it uses that size. But if a symbol is unknown, resolution is deferred until link time (which means it can NOT be statically checked at C# compile time like I want). I realize that I can't expect all symbols to be known for every module being built, but the output of the compile-time codegen isn't raw 6502 opcodes; it's an object "file" that can be linked with the other objects later, at which point we can check the other modules' exports/imports and resolve them.
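The export/import check at link time could look something like this sketch. `ObjModule` and `LinkCheck` are hypothetical names, and a real object file would carry code and fixups as well; this only shows matching imports against exports across modules.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, stripped-down view of an object "file": just its symbol tables.
public record ObjModule(string Name, string[] Exports, string[] Imports);

public static class LinkCheck
{
    // Returns "module: symbol" for every import that no module exports.
    public static List<string> Unresolved(IReadOnlyList<ObjModule> modules)
    {
        var exported = new HashSet<string>(modules.SelectMany(m => m.Exports));
        return modules
            .SelectMany(m => m.Imports
                .Where(symbol => !exported.Contains(symbol))
                .Select(symbol => $"{m.Name}: {symbol}"))
            .ToList();
    }
}
```

An empty result means every deferred symbol has a definition somewhere, so the linker can go on to assign addresses and patch operands.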

With the object file design, the runtime code generator can also use caching rules to skip over any blocks that haven't changed, so it doesn't have to reparse and regenerate code for them. I effectively get both a runtime assembler and a compile-time syntax checker.

Dwedit wrote: Mon Apr 03, 2023 12:56 pm
I don't really like the idea of a pre-build step that's parsing 6502 code, I think it would make a project harder to maintain, or harder to explain to other people who might need to set up a similar solution.

I would think that the C# program knows the best time to emit the assembly code, so the C# code should just call a function called "Assemble", which takes in a string (or byte array) containing the assembly code.
It would be the same as if you were repeatedly calling Instruction Emit functions, and Create Label functions. You get bytes emitted into the binary, and you get any new entries in your symbol list and fixup list.

You can use """ syntax for multiple lines.

If you need to organize into separate files, you can use file resources. The data becomes available at "Properties.Resources.filename_ext" as a byte array.
Or if you prefer Static Classes, you could use something like: static partial class AsmCode { public static string file_asm = """ blah """; }

If you need to #include, you can use reflection to check for a resource with the correct filename. For example, your assembler encounters the line #include "something.asm". The filename could be reformatted as a variable named "something_asm", and you use reflection to check for an embedded resource (or static class member) with that name. If that succeeds, you include the embedded resource. If that fails, you can go out to the file system and look for that file.

I think maybe the holdup here is that you're expecting the prebuild process to be complicated to configure, but from what I've tested so far, it's actually really straightforward. All it takes is including my project's assembly in your project (typically just a right click -> install package from NuGet) and then it's all set up. The source generator is registered through an attribute that the compiler scans for on build, and it'll automatically start up my custom source gen. On build, the generator will scan for all asm files, and for any fields/properties that have my custom `Asm` attribute on them. From there, it uses the same lexer/parser that I'll have available at runtime (like you are suggesting I build out in this comment) to tokenize the 6502, but instead of emitting 6502 machine code, it emits C# code that wraps the ASM as a precompiled object file. The linking and final 6502 machine code emission will still need to happen at application runtime.

The primary objective is to have a better-checked build environment for the ASM code. Since the types are checked as well as possible at compile time, any errors caught at that stage end up as compile errors for the application. While this isn't the same as a full language server implementation for 6502, I feel it's a good step towards that goal, as I'd love to have better tooling for NES romhacking.

All this said, maybe it's best if I just start work towards a language server implementation instead of this halfhearted intermediary codegen. Not sure which I prefer yet. ...
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Post by Oziphantom »

What advantage does this have over "spit out an asm file and call an assembler with it"?

Post by Dwedit »

Maybe I'm not understanding the problem, but I'm getting vibes of Scope Creep and Inner Platform Effect here, and solving interesting problems (compile time ASM!) for the sake of solving them rather than going for a goal (actually getting custom asm code into a game).