My attempt at RLE compression

dougeff
Posts: 2876
Joined: Fri May 08, 2015 7:17 pm
Location: DIGDUG
Contact:

Re: My attempt at RLE compression

Post by dougeff »

So do I have to do fancy bank switching, or do long references (lda $7fe000) work fine?
From tepples' example, it looks like you would read from ROM, decompress to WRAM, and then DMA to VRAM.

I would assume you would use STA long,X or something similar for the second part.
nesdoug.com -- blog/tutorial on programming for the NES
bazz
Posts: 476
Joined: Fri Sep 02, 2011 8:34 pm
Contact:

Re: My attempt at RLE compression

Post by bazz »

Thanks, tepples, for correcting me: system RAM is 128 KB.

all other comments are on the right track
SNES Tutorials (WLA DX)
SNES Memory Mapping Tutorial (Universal / LoROM) -- By Universal I introduce how memory mapping works, rather than just provide a LoROM map.
SNES Tracker (WIP) - Music/SFX composition tool / SPC Debugger
bazz
Posts: 476
Joined: Fri Sep 02, 2011 8:34 pm
Contact:

Re: My attempt at RLE compression

Post by bazz »

I just want to happily report that I wrote the asm code to stream RLE joypad log data to my game's joypad routine for its demo mode. It works :). It only required an extra byte of RAM to track the count -- and it made my data 1/4 the size :)
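
A rough Python model of the streaming idea, not the actual assembly: it assumes the log is stored as (count, value) pairs ending in $FF, and the only state carried between frames is the read position plus a one-byte remaining count. The class name `RleJoypadStream` and the sample pad values are made up for illustration.

```python
class RleJoypadStream:
    """Hands out one logged joypad value per frame from RLE data."""

    def __init__(self, data):
        self.data = data      # (count, value) pairs, terminated by 0xFF
        self.pos = 0          # read position in the compressed log
        self.remaining = 0    # the extra byte of state: frames left in this run
        self.value = 0        # pad value of the current run

    def next_frame(self):
        """Return the pad value for this frame, or None once the log ends."""
        while self.remaining == 0:
            count = self.data[self.pos]
            if count == 0xFF:         # terminator: stay here, keep returning None
                return None
            self.value = self.data[self.pos + 1]
            self.remaining = count    # a count of 0 is simply skipped
            self.pos += 2
        self.remaining -= 1
        return self.value


# Three frames of $80 followed by two frames of $00.
stream = RleJoypadStream(bytes([3, 0x80, 2, 0x00, 0xFF]))
frames = []
while (pad := stream.next_frame()) is not None:
    frames.append(pad)
print(frames)
```

Nothing is ever expanded into a buffer: each call consumes at most one new pair and otherwise just decrements the counter.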

I also wrote an RLE decompression program in Python 3 based on nicklausw's scheme.

Code: Select all

#!/usr/bin/env python3
# basic rle decompression program
# by bazz
# public domain.

# There is no error detection beyond stopping at end of input

import sys
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("in_file", help="the RLE data to be decompressed")
args = parser.parse_args()

with open(args.in_file, 'rb') as f_in:
  while True:
    header = f_in.read(1)
    # Stop at end of input (missing terminator) or at the $FF terminator
    if not header or header[0] == 0xFF:
      break
    count = header[0]
    byte = f_in.read(1)
    # Write the byte `count` times
    sys.stdout.buffer.write(byte * count)
I am a big fan of writing output to terminal, and deciding if I want to pipe output to other programs or to actual files. That's why I do not have the decompressor write explicitly to a file. I modified my local compressor to behave the same way.
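
For comparison, here is a minimal sketch of what a matching compressor could look like. It is not bazz's local compressor, which is not shown in the thread. The function name `rle_compress` and the `max_run` parameter are assumptions; runs are capped at $FE because a count of $FF is the terminator.

```python
def rle_compress(data, max_run=0xFE):
    """Encode data as (count, byte) pairs followed by a 0xFF terminator."""
    out = bytearray()
    i = 0
    while i < len(data):
        # Measure how far the current byte repeats, up to max_run.
        run = 1
        while i + run < len(data) and run < max_run and data[i + run] == data[i]:
            run += 1
        out += bytes([run, data[i]])
        i += run
    out.append(0xFF)  # a count of 0xFF marks the end of the stream
    return bytes(out)


packed = rle_compress(bytes([0, 0, 0, 0, 7, 7]))
print(packed.hex(" "))
```

To follow the same pipe-friendly style, the result could be written with `sys.stdout.buffer.write(packed)` instead of being printed as hex.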
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...
Contact:

Re: My attempt at RLE compression

Post by nicklausw »

Success! But now...Question time!

1. On the Z80, pushing and pulling from the stack is discouraged in loops because it's slow. Is it discouraged on the SNES? Because the only way I could get long RAM addressing while not making the rest of the code problematic was this:

Code: Select all

rle_inter:
  phx
  phy
  plx
  sta $7e2000,x   ; this only writes the low byte.
  phx
  ply
  plx
  iny
  dex
  bne rle_inter
  sty rle_cp_num
  ply
  bra rle_loop_done
2. How do you get CA65 to figure out that you're trying to use long addressing? This is why I had to write out $7e2000 above. EDIT: figured this one out. CA65 says it's "far" addressing.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: My attempt at RLE compression

Post by tepples »

nicklausw wrote:On the Z80, pushing and pulling from the stack is discouraged in loops because it's slow. Is it discouraged on the SNES?
A 16-bit push takes 4 cycles, and a 16-bit pull takes 5.
nicklausw wrote:Because the only way I could get long RAM addressing while not making the rest of the code problematic was this:

Code: Select all

rle_inter:
  phx
  phy
  plx
  sta $7e2000,x   ; this only writes the low byte.
  phx
  ply
  plx
  iny
  dex
  bne rle_inter
  sty rle_cp_num
  ply
  bra rle_loop_done
I'd have to see it in context to see whether you could rearrange the use of registers to minimize stack use, such as using Y for the source (especially using [dd],Y addressing) and X for the destination.
nicklausw wrote:2. How do you get CA65 to figure out that you're trying to use long addressing? This is why I had to write out $7e2000 above. EDIT: figured this one out. CA65 says it's "far" addressing.
To force far (24-bit) addressing, prefix the address with f:.
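
To put the push and pull timings quoted earlier in this post into perspective, here is a back-of-the-envelope tally of the stack traffic in the quoted rle_inter loop. It only counts CPU cycles for the six stack instructions and ignores memory access speed and the other instructions in the loop.

```python
PUSH16 = 4  # cycles for a 16-bit push (phx/phy), per the figure quoted above
PULL16 = 5  # cycles for a 16-bit pull (plx/ply), per the figure quoted above

# Stack instructions executed on every pass through rle_inter.
loop_body = ["phx", "phy", "plx", "phx", "ply", "plx"]

stack_cycles = sum(PUSH16 if op.startswith("ph") else PULL16 for op in loop_body)
print(stack_cycles)  # 27
```

So each byte written costs 27 cycles of register shuffling before the store and the loop counter are even counted, which is why rearranging the register use is worth a look.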
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...
Contact:

Re: My attempt at RLE compression

Post by nicklausw »

K then, I'll give the function more context.

variables: (I'm using your cfg file)

Code: Select all

.segment "ZEROPAGE"
  rle_cp_ram: .res 2
  rle_cp_num: .res 2
  
.segment "BSS7E" : far
  rle_cp_dat: .res 8192 ; 8 KB?
Function:

Code: Select all

.proc rle_copy_ram
  setxy16
  seta8
  ldy #$00
  sty rle_cp_num
  lda #0   ; clear Accum. hi-byte
  xba

 loop:
  lda (rle_cp_ram), y
  cpa #$ff
  beq done
  tax
  iny
  lda (rle_cp_ram),y
  bra rle_loop
rle_loop_done:
  iny
  bra loop
 
done:
  rtl

; IN: X = count
;      A = byte
rle_loop:
; INCOMPLETE / WILL NOT WORK (for above mentioned reasons in forum post)
; Please finish reading forum post and reply so we can work out your needs
  phy
  ldy rle_cp_num
rle_inter:
  phx
  phy
  plx
  sta rle_cp_dat,x   ; this only writes the low byte.
  phx
  ply
  plx
  iny
  dex
  bne rle_inter
  sty rle_cp_num
  ply
  bra rle_loop_done
.endproc
Function being called, I guess:

Code: Select all

; copy font
  setaxy16
  lda #font & $ffff
  sta rle_cp_ram
  jsl rle_copy_ram
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: My attempt at RLE compression

Post by tepples »

I'm assuming rle_cp_ram is the source address and that the data is in the current data bank. If so, rle_cp_src might be a clearer name.

In this loop, you may want to put Y (source index) on the stack first so that you can use Y as a remaining length counter.

See if this (untested) makes any sense:

Code: Select all

.segment "ZEROPAGE"
  rle_cp_src: .res 2    ; Renamed: serves as pointer (within current bank) to compressed data
  rle_cp_index: .res 2  ; Renamed: serves as index into rle_cp_dat
 
.segment "BSS7E" : far
  rle_cp_dat: .res 8192 ; 8 KB?

;;
; Decompresses data to rle_cp_dat using a simple RLE scheme.
; @param DBR:rle_cp_src pointer to compressed data
; @return rle_cp_index = number of bytes written to rle_cp_dat
.proc rle_copy_ram
  setxy16
  seta8
  ldy #$00
  sty rle_cp_index
  tya  ; clear low and high bytes of accumulator

 loop:
  lda (rle_cp_src),y
  cpa #$ff  ; RLE data is terminated by a run length of $FF
  beq done  ; But what does a run length of 0 do?
  tax
  iny
  lda (rle_cp_src),y
  iny
  phy

  ; At this point, Y (source index) is saved on the stack,
  ; A is the byte to write, and X is the length of the run.
  txy
  ldx rle_cp_index
  ; And here, Y is the length of the run, A is the byte to write,
  ; and X is the index into the decompression buffer.
rle_inter:
  sta rle_cp_dat,x
  inx
  dey
  bne rle_inter

  stx rle_cp_index
  ply  ; Restore source index
  bra loop
 
done:
  rtl
.endproc
I wonder if it could be made even faster by DMAing that byte from ROM to the WRAM's B bus port instead of using this CPU fill.
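
On the question in the code comment about a run length of 0: the rle_inter loop stores first and tests afterwards, so a small Python model of it (assuming 16-bit index registers, as set by setxy16) suggests that a zero count wraps around instead of doing nothing. The function name `writes_for_run` is made up for illustration.

```python
def writes_for_run(length):
    """Count the stores made by the rle_inter loop for a given run length."""
    y = length & 0xFFFF           # Y holds the run length (16-bit index register)
    writes = 0
    while True:
        writes += 1               # sta rle_cp_dat,x / inx
        y = (y - 1) & 0xFFFF      # dey wraps from $0000 to $FFFF
        if y == 0:                # bne rle_inter falls through once Y hits 0
            break
    return writes


print(writes_for_run(3), writes_for_run(0))
```

A run length of 0 would therefore write 65536 bytes and run far past the 8 KB buffer, so the compressor should never emit one, or the routine should skip zero-length runs before entering the loop.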
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...
Contact:

Re: My attempt at RLE compression

Post by nicklausw »

The code worked after I added a little something, because X's high byte never gets cleared before being transferred to Y (causing far too many bytes to be copied):

Code: Select all

; At this point, Y (source index) is saved on the stack,
  ; A is the byte to write, and X is the length of the run.
  txy
  
  ; additional code starts here
  
  ; no higher byte!
  pha
  seta16
  tya
  and #$ff
  tay
  seta8
  pla
Nothing worse about assembly programming than learning the limitations of registers.
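
The register-width issue can be modeled in a few lines of Python. With an 8-bit accumulator and 16-bit index registers, tax still copies both accumulator bytes (the hidden high byte B and the low byte A) into X, so whatever was left in B becomes part of the run length. The stale value $12 below is an arbitrary stand-in, and the function name is made up.

```python
def tax_16bit_index(a_low, b_high):
    """Model tax with 16-bit X: the whole 16-bit accumulator (B:A) lands in X."""
    return ((b_high & 0xFF) << 8) | (a_low & 0xFF)


run_length = 0x05   # the count byte just loaded with an 8-bit lda
stale_b = 0x12      # leftover hidden high byte of the accumulator (arbitrary)

x = tax_16bit_index(run_length, stale_b)
fixed = x & 0x00FF  # same effect as the and #$ff step in the fix above
print(hex(x), fixed)
```

Clearing B once before the loop, as the original routine did with lda #0 followed by xba, would avoid the mask on every run, provided nothing later in the loop changes B.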

Does anyone know of any documents that explain more about what exactly DMA does? Because as far as I know, it just does super fast data transfers and no one really questions why.
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...
Contact:

Re: My attempt at RLE compression

Post by nicklausw »

Thanks. I wasn't looking for anything exactly, just wanted to know how DMA works. So basically it moves data without passing it through the CPU first?

Also, here's the ROM for what I've been working on throughout this thread. It's a..."platformer". Quotes because there's no sprites yet, and even once they're there, it's not really designed to be scrolling.

I don't want to release source right now, so I'll just say this: the ROM...
Initializes the SNES registers (thanks koitsu),
Loads a little palette through DMA (thanks tepples and AntonioND for bgr macro),
Decompresses some RLE data to RAM and DMAs it to VRAM (thanks tepples, bazz, (and myself [semi-sarcasm?] because this is the only part where I got kind of original)),
Draws stuff to the screen (thank-yous already handled for that),
and fades in the screen (I actually figured this out myself).
Extra thank-yous to tepples for various stuff like the cfg file.

The point I'm trying to make with all these weird thank-yous is that this is one crazy system. :lol:
Attachments
platformer.sfc
(256 KiB) Downloaded 97 times
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...
Contact:

Re: My attempt at RLE compression

Post by nicklausw »

Update, here's a git repo with it all. The set-up is a little weird, so if for some reason you want to play with it then be sure that the makefile suits your environment.
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...
Contact:

Re: My attempt at RLE compression

Post by nicklausw »

Another update, I guess! The RLE now handles sequences of non-repeating bytes.

Previously, this:

Code: Select all

00 01 00 01 00 01
Would basically be expanded to this during "compression":

Code: Select all

01 00 01 01 01 00 01 01 01 00 01 01
Now, sequences like that can be stored literally, in this format:
$fe: tells the decompression routine to copy the following bytes directly.
$xx: the next byte is the number of bytes to copy.
Then come the bytes themselves.

So now these bytes would become:

Code: Select all

fe 06 00 01 00 01 00 01
This reduced the size of the font in ROM by 2 KB. Not a crazy amount given the size available on a SNES cartridge, but this is more complicated than I thought I'd be willing to get, so I'm happy.
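
A minimal Python sketch of a decoder for the extended format, to check the two encodings above against each other. It assumes the stream still ends with a $FF terminator and that $FE is no longer usable as a run count. The function name `rle_decompress` is made up, and there is no bounds checking.

```python
def rle_decompress(data):
    """Decode (count, byte) runs plus $FE literal blocks, ending at $FF."""
    out = bytearray()
    i = 0
    while i < len(data):
        control = data[i]
        if control == 0xFF:            # terminator
            break
        if control == 0xFE:            # literal block: $FE, length, raw bytes
            n = data[i + 1]
            out += data[i + 2 : i + 2 + n]
            i += 2 + n
        else:                          # ordinary run: count, byte
            out += bytes([data[i + 1]]) * control
            i += 2
    return bytes(out)


old = bytes.fromhex("01 00 01 01 01 00 01 01 01 00 01 01 ff")
new = bytes.fromhex("fe 06 00 01 00 01 00 01 ff")
print(rle_decompress(old) == rle_decompress(new))
```

Both streams decode to the same six bytes, but the literal form needs 8 bytes before the terminator instead of 12.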

It's all in the repo.