NES boot loader specification

blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

I've just completed a preliminary version of the NES boot loader specification, along with implementations.
NES boot loader specification
NES boot loader usage
A boot loader is a tiny program that receives a larger program from a PC connected to the NES via RS-232 at 57600 bits per second. The larger program is loaded into zero page and executed there, where it can then communicate with the PC to determine what to do next. The format and protocol include a checksum, but still allow a very small implementation that does no checking. The smallest I've come up with is 30 bytes. Other implementations are included on the usage page.

        ; NTSC version
        ldx #0          ; Number of bytes received
byte:   lda #$01
start:  bit $4017       ; Wait for start bit
        beq start
        lsr             ; A = 0
        nop
dbit:   ldy #3          ; Delay between bits
        lsr $4017       ; Read bit. First time reads 1 for start bit.
dly:    dey             ; Delay
        bne dly
        rol a           ; Move bit into shift register
        sta 0,x         ; Delay, and store received byte on final iter
        bcc dbit
        inx
        bne byte
        jmp $0007       ; Execute received code
EDIT: updated for slight format change.
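
A note on how the inner loop knows when a byte is complete: reading the listing, A starts at 0 and the start bit is rotated in first, so it acts as a marker that falls out into carry on the ninth ROL. That is what lets the single BCC double as the bit counter. The Python sketch below models only that shift-register logic, not the timing or the line polarity; `rol` and `receive_byte` are names made up for this illustration.

```python
def rol(a, carry_in):
    """Model 6502 ROL A: shift left through carry. Returns (new_a, carry_out)."""
    result = (a << 1) | carry_in
    return result & 0xFF, (result >> 8) & 1


def receive_byte(data_bits):
    """Model the loader's inner loop for one byte.

    data_bits: the 8 values read from bit 0 of $4017 after the start bit,
    in the order they arrive. Returns (stored_byte, loop_iterations).
    """
    a = 0                              # 'lsr' leaves A = 0 before the first read
    samples = [1] + list(data_bits)    # the first read sees the start bit as 1
    iterations = 0
    for bit in samples:
        # 'lsr $4017' moves the line state into carry; 'rol a' shifts it into A
        a, carry = rol(a, bit)
        iterations += 1
        if carry:                      # 'bcc dbit' falls through once the marker pops out
            break
    return a, iterations


print(receive_byte([1, 0, 0, 0, 0, 0, 0, 0]))
```

Going by the listing alone, the first bit read after the start bit ends up in bit 7 of the stored byte, so whatever the PC sends has to account for that bit order.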
Last edited by blargg on Wed Sep 08, 2010 11:45 am, edited 1 time in total.
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Post by Dwedit »

So this could work with a Game Genie and any cartridge with battery backed SRAM? So then you can develop and run games which run entirely from the SRAM area, and take advantage of the CHR RAM built into the cartridge. Of course, you'd need to override the vectors.

But I'm a total hardware klutz: I don't have a soldering iron or random resistors lying around, and I can't build the cable.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

It would work as long as you could come up with a Game Genie patch that causes execution of SRAM. Run the game in a debugger to find where it first enables SRAM, then find a JMP/JSR instruction shortly after that and patch the high byte of its target address to $60. Then fill page $60 of SRAM with $EA (NOP) and put the boot loader at the start of the next page. For vector overrides, you can do the same: patch the high bytes to $07, then put JMP instructions where they happen to point in page 7 of RAM. The above would nicely fit in three Game Genie patch slots as well.

The main snag is that you need to find some way to initially get the boot loader into SRAM. If you could have someone put it on an EPROM and replace the ROM with that, then you'd have a really cheap devcart. Of course if you're going to the trouble of replacing the ROM, you might as well put a Flash ROM there instead, as it's virtually the same amount of rewiring and chip cost.
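
The SRAM layout described above (page $60 full of NOPs, loader on the next page) could be built on the PC side roughly like this. It is only a sketch: it assumes 8 KiB of SRAM mapped at $6000, which is common but depends on the cartridge, and `build_sram_image` and the three-byte placeholder loader are made up for illustration.

```python
SRAM_BASE = 0x6000   # assumed CPU address where cartridge SRAM starts
SRAM_SIZE = 0x2000   # assumed 8 KiB of SRAM
PAGE_SIZE = 0x100
NOP = 0xEA           # 6502 NOP opcode


def build_sram_image(loader: bytes) -> bytes:
    """Return an SRAM image with a NOP sled in page $60 and the loader at $6100."""
    if len(loader) > SRAM_SIZE - PAGE_SIZE:
        raise ValueError("loader does not fit after the NOP page")
    image = bytearray(SRAM_SIZE)                      # everything else left as zero
    image[0:PAGE_SIZE] = bytes([NOP]) * PAGE_SIZE     # page $60: all NOPs
    image[PAGE_SIZE:PAGE_SIZE + len(loader)] = loader # loader starts at $6100
    return bytes(image)


def sram_offset(cpu_addr: int) -> int:
    """Convert a CPU address in the SRAM window to an offset in the image."""
    return cpu_addr - SRAM_BASE


# Placeholder standing in for the real loader bytes: JMP $6100
image = build_sram_image(bytes([0x4C, 0x00, 0x61]))
print(hex(image[sram_offset(0x60AB)]), hex(image[sram_offset(0x6100)]))
```

The point of the NOP page is that the patched JMP/JSR keeps its original low byte, so it can land anywhere in $6000-$60FF and still slide into the loader at $6100.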
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

Or we could make a tacit agreement to include a bootloader like this in our homebrews so that people can boot by keying a code into the title screen of a repro.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

Nifty idea. It would also allow backing up/restoring battery-backed SRAM to a PC connected via second controller serial, without having to do any hotswapping. It would be desirable to support this in one's homebrew cartridge release, because it adds value with very little extra implementation cost.

You can even put such communication code in the title screen's main loop, where it merely checks for activity on D0 of the second controller. If found, it enters the boot loader. Then you have the PC send some $FF sync bytes before the program block, to give time for input to be detected and the boot loader to be started. This way you can boot the cartridge and begin sending a program, without having to do anything on the controller (this is how the Munchausen menu works in the recently-posted video).
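
To get a feel for how much time the $FF sync bytes buy, here is a small PC-side sketch. It assumes 8N1 framing (10 bits on the wire per byte) at 57600 baud; the number of sync bytes is left as a parameter because no count is given here, and `with_sync` and `sync_duration` are hypothetical helper names.

```python
BAUD = 57600
BITS_PER_BYTE = 10   # assumed 8N1 framing: start bit + 8 data bits + stop bit


def with_sync(block: bytes, sync_count: int) -> bytes:
    """Prefix the program block with sync_count bytes of $FF."""
    return b"\xff" * sync_count + block


def sync_duration(sync_count: int) -> float:
    """Seconds the sync prefix occupies on the wire."""
    return sync_count * BITS_PER_BYTE / BAUD


stream = with_sync(b"\x00" * 256, 96)   # 96 is an arbitrary example count
print(len(stream), sync_duration(96))
```

Under those assumptions, 96 sync bytes last 1/60 of a second, which is roughly one NTSC frame.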

Right now I'm working on a redone secondary loader that accepts small blocks of code, executes them, and can be re-entered. This allows easy uploading of data to any part of NES RAM, and execution of code to program things into Flash, configure an MMC chip, load CHR RAM, or whatever. I've implemented all this before, but not with this revamped boot loader design that I posted.
clueless
Posts: 496
Joined: Sun Sep 07, 2008 7:27 am
Location: Seattle, WA, USA

Post by clueless »

Would anyone be willing to modify an NES emulator to support connecting the second controller port to some sort of virtual serial port (a named pipe on Windows or a Unix socket on Unix)? Kind of like how VMware Workstation will let you attach a guest's emulated serial port to some logical "device" on the host that implements simple character I/O (named pipes, sockets, file handles and real serial ports).

That way homebrew carts can test this proposed functionality...

Blargg, would you be willing to license your boot loader code very permissively (I'm thinking BSD or equivalent) so that we can place it into our homebrew carts without needing to GPL the entire cart?

I could see adding some attribution to the "credits" screen if desired.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

byuu has probably done the closest to emulating serial in an emulator. He's made some sort of library for treating it the same as a serial port, so a PC-side program can communicate with the emulator as if it were the real thing.

And yeah, the boot loader code should be licensed modified BSD/MIT/zlib style for sure. No credit needed, just be sure to mention where someone interested in the code might find it by at least mentioning a name or something someone can search for.

I was hoping for more discussion of the boot loader itself, including its design and implementation, to iron out any problems before it gets put in cartridges. Once I implement some things with it I'll have a better idea of any problems.
clueless
Posts: 496
Joined: Sun Sep 07, 2008 7:27 am
Location: Seattle, WA, USA

Post by clueless »

The boot loader looks well thought out. It really reminds me of the compactness and power of the Apple ][ disk boot loader ($c600-$c6ff).

(briefly going off topic...)

I once had a text file of a heavily commented disassembly of the ROM boot loader and first two stages of DOS 3.3. The document explained all of the "tricks" used in the loaders, especially how the stage-1 DOS 3.3 loader will copy code from the disk II ROM (the GCR decoder IIRC).

I can't find it, nor any online copies (to cite), but I found this while searching for it:
http://home.comcast.net/~mjmahon/AppleCrateII.html

A 17-node Apple II parallel computer... wow.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

OK, one possible breaking change. It bothers me that the code begins at address 7 in zero page but at offset 4 in the program block; that makes the format just a little bit harder to understand. I'm trying to work it out so that the program block has only a header. It's just that this might add a byte to one of the larger loaders. I know it matters little, but I'm still obsessing over it. If this change works out, the format has one less thing someone could object to. The format I'm aiming for is this: 4-byte signature, 8-bit checksum, 16-bit CRC, 249 bytes of user data.
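
For anyone trying to picture the proposed layout, here is a rough Python sketch that just slices a 256-byte block into the four fields listed above. It makes no claim about the signature value, the checksum or CRC algorithms, or the byte order of the CRC (none of those are given here), so the fields come back as raw bytes; `split_block` and the constant names are invented for the example.

```python
SIGNATURE_LEN = 4
CHECKSUM_LEN = 1
CRC_LEN = 2
USER_LEN = 249
HEADER_LEN = SIGNATURE_LEN + CHECKSUM_LEN + CRC_LEN   # 7
BLOCK_LEN = HEADER_LEN + USER_LEN                     # 256


def split_block(block: bytes) -> dict:
    """Slice a program block into its header fields and user data (raw bytes)."""
    if len(block) != BLOCK_LEN:
        raise ValueError(f"expected {BLOCK_LEN} bytes, got {len(block)}")
    crc_start = SIGNATURE_LEN + CHECKSUM_LEN
    return {
        "signature": block[0:SIGNATURE_LEN],
        "checksum": block[SIGNATURE_LEN:crc_start],
        "crc": block[crc_start:HEADER_LEN],
        "user_data": block[HEADER_LEN:],
    }


fields = split_block(bytes(range(256)))
print(HEADER_LEN, BLOCK_LEN, fields["user_data"][0])
```

With this layout the header is 7 bytes and the whole block is exactly 256, so a loader that stores byte N of the block at zero-page address N ends up with the user data at address 7 in both places.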
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

OK, I made the above change. Sorry about breaking the spec already, but this removes some little conceptual snags and simplifies the specification.

The secondary loader that allows remote procedure calls is coming along very nicely.
p1xl
Posts: 32
Joined: Sat May 15, 2010 4:13 pm
Location: U.S./Canada
Contact:

Post by p1xl »

Very nice work on the bootloader, Blargg. And I really like the idea of supporting a developer and getting a game and a programmable cart.

Is it possible to have a second-stage boot loader that writes directly to CHR-RAM? If so, something 'official' would be nice. That way you could write little RAM games without having to worry about compressing tiles into your code space.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg »

Yeah, the secondary one I'm rebuilding will be fabulous. First version I've been using is really fun to program from C. You basically get a clean API for accessing the NES, for example write_chr( addr, ptr, size ) and it writes that from your C program to the NES CHR. Internally it just does a generalized RPC, sending the NES code for a small routine that loads the CHR, along with the CHR data. When that returns, this secondary loader is running, waiting for the next RPC call. Just to be clear, this isn't for writing games or anything (the latency would be too great), just for manipulating the NES hardware/loading things from the host in a very streamlined fashion.
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland
Contact:

Post by thefox »

How's the RPC API coming along, blargg? :)
Memblers
Site Admin
Posts: 3902
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis
Contact:

Post by Memblers »

Why not do 256 bytes of data? 249-byte blocks are kind of a weird size, and I would think that the header and error-checking stuff will be discarded immediately after it checks out OK. XMODEM by comparison is 128 bytes of data + 2 bytes of CRC-16, which is super easy to handle, with no problems crossing page boundaries.

I've been thinking about this lately, about hooking this up on the expansion port version of my Squeedo board. Given the choice between synchronous SPI and async UART, I'm definitely going to try synchronous. What I'm hoping would work is that the MCU could do an async bit-bang to be compatible with just the initial loader. After it gets the proper comms code loaded from that, then it should be OK to use any kind of hardware whatsoever, right?

See any potential problems with this idea? Seems OK, as far as I can tell so far.

EDIT - Sorry, never mind what I said about the block size, I wasn't considering that it's only one block, heheh. Still seems a little odd, but any arbitrary amount is fine in that case.

I guess my biggest concern (with my hardware as I imagine it), is wondering wtf happens if a controller is in port 2 at the same time the MCU (or anything for that matter) is bit-banging the same lines on the expansion port.. On the expansion port though it would be really easy to move to the other bits. I kind of wish the "standard" serial adapter didn't use D0. So I'll have to look for a work-around, probably.

Despite whatever issues I may or may not run into, a standard bootloader like this is a really great idea. I figured XMODEM would just be the standard (as it has been since before I was even born), but XMODEM has its faults (no filename or filesize given, no auto-start transfers), so this could handle things a lot better while still being standard enough.
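
For reference, the CRC-16 that XMODEM's CRC mode uses is the polynomial $1021 variant with a zero initial value. A straightforward bitwise version in Python looks like the sketch below. This is XMODEM's CRC only; nothing here says the boot loader's 16-bit CRC uses the same parameters.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                 # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


print(hex(crc16_xmodem(b"123456789")))   # standard check string; prints 0x31c3
```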
kyuusaku
Posts: 1665
Joined: Mon Sep 27, 2004 2:13 pm

Post by kyuusaku »

I hacked out some NES->PC code, maybe it's of some use to someone. It's a little big but it sends up to 64k as fast as possible (8N1 @ 57600 baud, no gaps in between bytes) while generating a 16-bit checksum.

dw count         ; byte count, $0000 will send 64k
dw tcheck        ; temp checksum
dw checksum      ; final checksum
dw ptr           ; start address
db byte          ; holds read byte
db invert        ; set to $FF for direct connection, $00 for MAX232/FTDI


    ldy #0
    sty <tcheck        ; temp checksum
    sty >tcheck
startbit:
    ; pla, branch to here  6
    lda <count    ; 3
    beq skip    ; 2
    nop        ; 2
    dec <count    ; 5
    jmp here    ; 4
skip:            ; 3
    dec <count    ; 5
    dec >count    ; 5
here:            ; ----- 16
    nop        ; 2
    lda invert    ; 3
    sta $4016    ; 4 ---- 31


    lda (ptr),y    ; 5
    sta byte    ; 3
    inc <ptr    ; 5
    beq one        ; 2
    nop        ; 2
    jmp two        ; 4
one:            ; 3
    inc >ptr    ; 5
two:            ; ------ 13
    lda byte    ; 3
    eor invert    ; 3
    sta $4016    ; 4 ---- 31


    lda byte    ; 3
    clc        ; 2
    adc <tcheck    ; 3
    sta <tcheck    ; 3
    lda >tcheck    ; 3
    adc #0        ; 2
    sta >tcheck    ; 3
    lda byte    ; 3
    eor invert    ; 3
    lsr a        ; 2
    sta $4016    ; 4 ---- 31 
    
    ; waste 3 cycles
    pha        ; 3
    ldx #7        ; 2
loop:
    pha    ; 3
    pla    ; 3
    pha    ; 3
    pla    ; 3
    pha    ; 3
    pla    ; 3
    nop    ; 2 -- 20
    lsr a        ; 2
    sta $4016    ; 4 -- 31
    dex        ; 2
    bne loop    ; 3

stopbit:
            ; 2 added cycles from bne
    lsr a        ; 2
    sta byte    ; 3 --- byte should be clear, maybe useful
    pla        ; 3
    lda <count    ; 3
    ora >count    ; 3 --- this is a done flag
    pha        ; 3
    lda invert    ; 3
    eor #1        ; 2    
    sta $4016    ; 4 -- 31

    pla        ; 3
    bne startbit
    lda <tcheck
    sta <checksum
    lda >tcheck
    sta >checksum
    rts
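
The "31" tallies in the listing are the CPU cycles spent per bit. The arithmetic behind that figure can be checked with a few lines of Python. This is a sketch that assumes the commonly quoted NTSC CPU clock of about 1.789773 MHz (PAL machines differ) and 8N1 framing; the function names are made up for the example.

```python
NTSC_CPU_HZ = 1_789_773   # commonly quoted NTSC 2A03 clock, in Hz (assumption)
BAUD = 57600
BITS_PER_BYTE = 10        # 8N1: start bit + 8 data bits + stop bit


def cycles_per_bit(cpu_hz: float = NTSC_CPU_HZ, baud: int = BAUD) -> float:
    """Ideal number of CPU cycles in one bit period."""
    return cpu_hz / baud


def drift_cycles(cycles_used: int, bits: int = BITS_PER_BYTE) -> float:
    """Cycles of accumulated timing error after `bits` bits at a fixed integer budget."""
    return (cycles_per_bit() - cycles_used) * bits


def transfer_seconds(byte_count: int, baud: int = BAUD) -> float:
    """Time to send byte_count bytes back to back with no gaps."""
    return byte_count * BITS_PER_BYTE / baud


print(round(cycles_per_bit(), 2))         # about 31.07
print(round(drift_cycles(31), 2))         # under one cycle per 10-bit frame
print(round(transfer_seconds(65536), 1))  # about 11.4 seconds for 64k
```

So a 31-cycle bit loop runs a fraction of a percent fast, which adds up to less than one CPU cycle over a whole byte.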
Last edited by kyuusaku on Thu Sep 16, 2010 7:42 am, edited 1 time in total.