So, I thought it would be fun to dump the text out of the ROM. At first I tried a straightforward search for known words with various offsets, and I found that some of the text is actually just uncompressed ASCII-ish strings right in the ROM, but not the level introductions. After digging in with a debugger/trace I eventually discovered that most of the text of the game is compressed with a huffman scheme.
I wrote a python script that will dump all the text of the ROM for you, and it's very closely based on the assembly code that actually does the decompression.
http://rainwarrior.ca/projects/nes/battletoads_text.py
I've disassembled and annotated the heart of the decompressor in the original ROM:
Code: Select all
; Battletoads text decompression code
;
; taken from 32k ROM bank 6 at memory $E60F
; dissassembled with DISASM6
; edited and annotated by Brad Smith 2/11/2012
; http://rainwarrior.ca
; The code at $E60F represents the main text decompression loop.
; It is a subroutine that will read bits from a bitstream
; and return a decoded byte in register A.
; $1B stores the current byte, each bit read shifts it left
; $1C/$1D is a pointer to the next byte of the stream (with a -7 byte offset)
; $1E counts how many bits remain in the current loaded byte
__E60F: LDX #$00 ; X stores current tree node index
__E611: DEC $1E ; $1E counts bits left in the current byte at $1B
BPL __E623 ; if we're out of bits, load a new byte
LDY #$07 ;
STY $1E ; reset $1E to tell us there are 8 available bits
LDA ($1C),Y ; load byte from indirect $1C/$1D address + 7
STA $1B ; store byte in $1B
INC $1C ; increment stream byte position ($1C/$1D)
BNE __E623 ;
INC $1D ;
__E623: ASL $1B ; shift the high bit out of $1B and...
BCS __E631 ;
LDA __E63B,X ; if bit is 0 load next node from $E63B table
TAX ;
BPL __E611 ; loop to next bit if node < 128
LDA __E611,X ;
RTS ; return byte from $E691 + (node - 128)
__E631: LDA __E666,X ; if bit is 1 load next node from $E666 table
TAX ;
BPL __E611 ; loop to next bit if node < 128
LDA __E611,X ;
RTS ; return byte from $E691 + (node - 128)
__E63B: ... ; 43 byte table, tree node index if bit is 0
__E666: ... ; 43 byte table, tree node index if bit is 1
__E691: ... ; 44 byte table, output symbol lookup
; note that the $E691 table is looked up from $E611 with an index that has 128 added
Overall this compressed about 10k of text down to about 6k (plus ~300 bytes of decompressor code and tables). Since this is an AxROM mapper, the whole 32k bank gets switched at once, so you need to keep your code and data together; this bank seems to be dedicated to all the cutscenes and title screen. I can imagine that saving that extra 4k of space was useful, especially since you need space for your graphics because it's a CHR-RAM mapper.
There are a few interesting optimization hacks in there; there seems to be a bit of need to keep this inner loop efficient. The code actually tends to decompress and discard several strings, usually, before it gets to the one it wants to use. One hack is the indirect index address being used is off by 7. Y is used to reset the $1E bit counter, I think specifically so that it will be set to a known number (7) before using it for the indirect index lookup. Another hack is the table at $E691 is actually addressed as $E611 and the high bit that indicates to use that table it just used as part of the index. Saves an AND, I guess. I had to wonder why it was looking up a symbol from the code itself before I realized X was always >= 128.
Anyhow, I thought you guys might enjoy seeing a real-world huffman coding implemention for the NES.
Image of Battletoads huffman tree
Text dump of the script