LLVM-MOS NES targets

mysterymath
Posts: 6
Joined: Sat Apr 23, 2022 12:34 pm

LLVM-MOS NES targets

Post by mysterymath »

Hi nesdev,

I'm the primary codegen author for the LLVM-MOS 6502 backend for Clang/LLVM. I finally got around to playing around with developing some quick example routines for my NES, and I ended up with the basis for a real port for LLVM-MOS to the NES.

I've added skeletal targets to the LLVM-MOS SDK for the NES-NROM-128, NES-NROM-256, and NES-SLROM (MMC1) boards. I'm planning to target at least one board for each Nintendo-produced mapper, to make sure that various banking schemes are all reasonably supportable.

Right now, only basic NES power-on functionality is provided, along with the usual C runtime initialization and finalization routines. I've also added a small PPU support library and a simple color-cycling example. The compiler outputs both ELF binaries (useful for command-line manipulation) and iNES 2.0 files. The contents of various fields in the iNES header can be controlled by setting the values of corresponding linker symbols. All the math to construct the eventual iNES 2.0 header is handled automatically.

Our targets are set up in a hierarchical fashion, so it can be as little as 50 lines to add a new one to the SDK. For example, both the NES-NROM-128 and NES-NROM-256 targets inherit most of their code/config from an incomplete NES-NROM target. Accordingly, I'd eventually like to collect pretty much all of the production boards (well, that anyone's interested in developing on, at least) into the SDK. If we set things up right, it shouldn't take too much maintenance overhead per board to keep them around, which should provide a nice out-of-the-box experience when developing for them.

Let me know if you have any questions about the NES targets, our plans, comments, critiques, etc. I'll add that we're perennially committed to improving the quality of generated code, although this competes with other concerns like correctness, portability, and maintainability. Still, at the end of the day, a compiler is only useful if it generates "good enough" code, and we want llvm-mos to be good enough for all but the tightest inner loops of a game or application (and maybe even those, someday).
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: LLVM-MOS NES targets

Post by lidnariq »

Any support for structure rearrangement? (e.g. an array of 256 16-bit numbers being striped into two separate 8-bit arrays to use the faster instructions)
mysterymath
Posts: 6
Joined: Sat Apr 23, 2022 12:34 pm

Re: LLVM-MOS NES targets

Post by mysterymath »

lidnariq wrote: Sat Apr 23, 2022 2:34 pm Any support for structure rearrangement? (e.g. an array of 256 16-bit numbers being striped into two separate 8-bit arrays to use the faster instructions)
Not yet, but it's something we'd definitely like to build. Making the analysis safe is a bit tricky: you have to prove that no wide pointers to the interior of the array can escape the analysis, and you have to convert all pointer uses that don't escape.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: LLVM-MOS NES targets

Post by lidnariq »

Would it be possible to add a special __attribute__ instead or in addition? A promise by the programmer that it will only be used in the optimized way?
mysterymath
Posts: 6
Joined: Sat Apr 23, 2022 12:34 pm

Re: LLVM-MOS NES targets

Post by mysterymath »

lidnariq wrote: Sat Apr 23, 2022 2:52 pm Would it be possible to add a special __attribute__ instead or in addition? A promise by the programmer that it will only be used in the optimized way?
Maybe, it's not something I've spent much time thinking about yet. We'd need a really precise definition of what the optimized way actually is, as it'd be undefined behavior if the programmer stepped out of line. Ideally, this would also be difficult to accidentally do.

That's why we've tended to shy away from hand annotation whenever possible; it adds to the number of things the programmer needs to keep in mind, and it decreases the compiler's flexibility.

For example, you probably wouldn't want to do this optimization if the array was of length 257; if it were automatic, then there's no risk of the programmer forgetting to remove the annotation if they change the array size. That's why modern compilers almost completely ignore the register keyword, for example. Such hints end up decreasing performance in practice, since their performance implications are complex and incompletely understood by programmers.

Still, we do use some hand annotation; there will always be things that the compiler won't ever reasonably be able to figure out. I'd wager for this one, doing it automatically may only be around 1.5x or 2x harder than via annotation, but I'll know more once I get around to it.
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Re: LLVM-MOS NES targets

Post by Dwedit »

One optimization I'd really like to see is elimination of variables on the stack.
This would be for code that isn't recursive (either directly, or indirectly).
I could describe it better later if there's any interest.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
mysterymath
Posts: 6
Joined: Sat Apr 23, 2022 12:34 pm

Re: LLVM-MOS NES targets

Post by mysterymath »

Dwedit wrote: Mon Apr 25, 2022 12:36 am One optimization I'd really like to see is elimination of variables on the stack.
This would be for code that isn't recursive (either directly, or indirectly).
I could describe it better later if there's any interest.
We actually do that one; we call it "static stack optimization". We analyze the call graph of each translation unit, and the stack frame of each function we can prove non-recursive is replaced with a global variable. We default to generating code at link time (i.e., link time optimization), so a "translation unit" is typically the whole program.

Eventually, we'd like to allow static stack regions for functions that cannot be simultaneously active to overlap. There's not much of a technical obstacle to doing so; I just haven't gotten around to it yet.
mysterymath
Posts: 6
Joined: Sat Apr 23, 2022 12:34 pm

Re: LLVM-MOS NES targets

Post by mysterymath »

I briefly mentioned this on the Discord, but there have been two fairly big additions to LLVM-MOS's code generator since the last time I checked in.

First, static stack frames of different functions can now overlap if the functions can be proven to never simultaneously be active.

Second, the compiler now allocates the zero page! It scans the whole-program call graph, estimates how often each instruction in each function is called, estimates the cycle/byte savings of moving each possible global/local/constant to the zero page, then greedily allocates candidates best-first until the available zero page is consumed. Each target defaults to using all available zero page, but the amount the compiler can use can be capped with a compiler flag. As with static stack, zero page frames can overlap.

Here's an example. Note that only 25 bytes of zero page are used; foo does not conflict with bar, so they share the same region of the zero page. The large array in main is placed in a static stack in main memory, as usual. Sections that begin with '.zp' are automatically placed in the zero page by the SDK's linker scripts. In this example there's no real savings, but it does show off the semantics.

static char * volatile global;

__attribute__((noinline)) void foo() {
  char foo_local[5];
  global = foo_local;
}

__attribute__((noinline)) void bar() {
  char bar_local[10];
  global = bar_local;
}

int main(void) {
  char main_local[15];
  char big_local[512];
  global = main_local;
  global = big_local;
  foo();
  bar();
  return 0;
}

foo:
   ldx   #mos8(.Lfoo_zp_stk)
   ldy   #mos8(0)
   stx   global
   sty   global+1
   rts
bar:
   ldx   #mos8(.Lbar_zp_stk)
   ldy   #mos8(0)
   stx   global
   sty   global+1
   rts
main:
   ldx   #mos8(.Lmain_zp_stk)
   ldy   #mos8(0)
   stx   global
   sty   global+1
   ldx   #mos16lo(.Lmain_sstk)
   ldy   #mos16hi(.Lmain_sstk)
   stx   global
   sty   global+1
   jsr   foo
   jsr   bar
   ldx   #0
   txa
   rts
   .section   .bss.global,"aw",@nobits
global:
   .short   0
   .section   .zp.noinit..Lzp_stack,"aw",@nobits
.Lzp_stack:
   .zero   25
   .section   .noinit..Lstatic_stack,"aw",@nobits
.Lstatic_stack:
   .zero   512

.set .Lfoo_zp_stk, .Lzp_stack+15
   .size   .Lfoo_zp_stk, 5
.set .Lbar_zp_stk, .Lzp_stack+15
   .size   .Lbar_zp_stk, 10
.set .Lmain_zp_stk, .Lzp_stack
   .size   .Lmain_zp_stk, 15
.set .Lmain_sstk, .Lstatic_stack
   .size   .Lmain_sstk, 512
There are a few things that the current approach can't do, but overall it works pretty well. (The compiler doesn't have a notion of an 8-bit pointer yet, so it can't see any benefit to rewriting general pointer loops over arrays lifted to the zero page. This only applies if absolute indexed addressing mode wasn't selected, though.)

The next big "humans do this but compilers just don't" optimization is converting arrays of structs to structs of arrays. But I think it's important to pause at this point and start improving the SDK's libraries; a full suite of hardware registers for the NES is near the top of my list. I'll try to port over cc65's headers wherever appropriate so there's a degree of compatibility between the compilers.

Take care!
asie
Posts: 14
Joined: Sun Sep 22, 2019 10:41 pm

Re: LLVM-MOS NES targets

Post by asie »

I've joined the LLVM-MOS project in some capacity and would like to bump the thread to bring an update on NES support in LLVM-MOS, and its code generation in general, since July 2022. The full changelog is available here, as usual.

Code generation
  • Whole-program automatic zero page allocation - LLVM-MOS now automatically allocates global variables/constants, function local variables, and callee-saved registers to function-specific zero page locations whenever possible. There are also heuristics implemented to estimate and try to maximize benefit during selection of variables to be promoted in such a way.
  • Marking sections and variables as zero-page; right now this relies on __attribute__((section)), but work on proper C-side support (__zeropage address space) is ongoing.
  • Small memory copy/set operations are now properly inlined, instead of emitting an expensive library call.
  • Many minor and major code generation optimizations have been added.
NES targets

The NES targets have been completely reworked:
  • Many additional mappers are supported, and existing ones have been reworked. The list is now: CNROM, NROM, MMC1, MMC3, Action 53 (thanks to jroweboy) and UNROM; homebrew scene favorites UNROM-512 and GTROM are scheduled for the upcoming release. Most mappers now have test suites, which also serve as examples on how to use their banking functionality in code. Suggestions for additional mappers are welcome!
  • The iNES header information is now fully configurable using either an assembly-language file or C macros.
Other changes include:
  • The neslib, nesdoug and FamiTone2 libraries have been ported over and are now available in LLVM-MOS.
  • The .dpcm section has been added for correctly allocating DPCM sample data without hassle; in addition, a __dpcm_offset symbol is defined with the correct value for APU usage. (Note that on 32K mappers, like GTROM, each bank has its own .dpcm_N section.)
  • Many cc65 headers (nes.h, peekpoke.h) have been ported over to LLVM-MOS for easier code porting.
While my initial focus has been PC Engine/HuC6280 and 65816 support, I'm glad to have been able to help with the NES port somewhat. Special thanks to mysterymath for working on this project tirelessly for so long, and his infinite patience towards my less-than-stellar contributions. There are some more things I'd like to try, and I'll do my best to keep all of you updated.
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Re: LLVM-MOS NES targets

Post by Memblers »

mysterymath, asie, anyone else who has worked on this project, I just wanted to say thanks for bringing this to the 6502, and including so many NES resources in the SDK. The results from the compiler are excellent.

I've attached a couple of simple demos. Shows how to build from a batch file in Windows, and how to include CHR-ROM data with an NROM program.

example11, by Shiru. Originally for cc65, where the worst-case frames use up the entire frame time. With LLVM-MOS, they take a little over 1/3rd of the frame. Note that it builds with the wrong mirroring; thankfully this is much easier to configure with the newest SDK version, but I left it as it was.

ballsc, the same benchmark test of naively written array-of-structs code that I've run on cc65, vbcc6502, and now here. I think vbcc was getting 62 of 64 objects; LLVM-MOS is the first one to handle all 64, and has significant idle time left: about 60 scanlines, more than 6K CPU cycles.

It's really cool to see support might be added for the unofficial DCP instruction. When I manually optimized neslib's vram_write and vram_read functions, I used DCP in there. AXS is also a nice one for incrementing X, inc by four is common when dealing with OAM.

I was wondering if I could help the project by adding optimizing info into the compiler, but looking through the LLVM docs, there's a lot to take in. I'd like to help out where I can, though.

I was wondering also, if it's worth considering including an "identity table" to extend the instruction set.
They are all 3-byte, 4-cycle instructions, but (for example) allowing something like SBC ident,y is 2 cycles faster than doing STY $00, SBC $00. https://www.nesdev.org/wiki/Identity_table
Attachments
example11-llvm-mos.zip (7.91 KiB)
ballsc.zip (4.6 KiB)
asie
Posts: 14
Joined: Sun Sep 22, 2019 10:41 pm

Re: LLVM-MOS NES targets

Post by asie »

Memblers wrote: Fri Sep 29, 2023 8:47 pm mysterymath, asie, anyone else who has worked on this project, I just wanted to say thanks for bringing this to the 6502, and including so many NES resources in the SDK. The results from the compiler are excellent.
Thank you for your long-time work for the NES scene, and I'm happy to have been able to add support for your mapper :-)

You might also find mysterymath's ports of the nesdoug examples to LLVM-MOS interesting.
Memblers wrote: It's really cool to see support might be added for the unofficial DCP instruction. When I manually optimized neslib's vram_write and vram_read functions, I used DCP in there. AXS is also a nice one for incrementing X, inc by four is common when dealing with OAM.
DCP for multiple-byte decrements has now been merged, and should be available in the next release. It will, however, require the -mcpu=mos6502x switch while compiling; we don't enable unofficial instructions by default. (As a side-note, we've also optimized multi-byte decrements for official 6502 opcode users.)

AXS is a little trickier to add support for.

(As another side-note, as a newcomer to NES development, I'm pretty sure it was this tweet which initially set me on the path of adding support for them. Thank you!)
Memblers wrote: I was wondering if I could help the project by adding optimizing info into the compiler, but looking through the LLVM docs, there's a lot to take in. I'd like to help out where I can, though.
The LLVM-MOS Discord server is where most of the chat happens. However, even without contributing to the code itself, identifying places which require optimization and documenting them (providing good test cases as issues on the GitHub repository, there's already some open) is a great help in itself. If you'd like to tackle LLVM itself nonetheless, I highly recommend following the video resources linked at the LLVM-MOS wiki as a start - that's what I did, and they really helped provide a "bird's eye" view of the target backend architecture. I also recommend playing around with the godbolt Compiler Explorer - it has support for many LLVM-MOS targets, and "Add New... -> LLVM Opt Pipeline" can be used to explore all of the compiler passes performed on the code in a (relatively) user-friendly manner.

Alternatively, contributing to the SDK side of things - libraries, non-LLVM tooling - would be of great help too. While llvm-mos-sdk mostly serves as a set of low-level implementations for many targets, a kind of starting point, there's almost certainly demand for wrapping the LLVM-MOS tooling around a more opinionated set of libraries and tools, providing an actual ready-to-use NES developer workflow. Cogwheel over on our Discord has been looking into this as an option, after working on some neslib optimizations - you may find that of particular interest.
Memblers wrote: I was wondering also, if it's worth considering including an "identity table" to extend the instruction set.
I don't see a reason not to support it, especially as we already implement an "identity table" (of dynamic size) for mappers with bus conflicts. I have opened a relevant issue, but I cannot promise an ETA.