Page 1 of 2

My attempt at RLE compression

Posted: Sat Apr 09, 2016 2:31 pm
by nicklausw
I wanted a reason to practice my python skills, and so I've tried to make my own RLE compression thingy for the SNES.

The way the format works is like this. Say you have these bytes:

Code: Select all

00 ff 00 00 ff ff 00 00 00 00 00 00 ff ff ff ff ff ff
The new file is made up of pairs of two: one byte for how many of the second byte should be copied.
An FF is appended at the end to mark where the routine should stop copying, so the output would be:

Code: Select all

01 00 01 ff 02 00 02 ff 06 00 06 ff ff
The .py file does this perfectly:

Code: Select all

#!/usr/bin/env python3
# basic rle compression program
# by nicklausw

# this format works as follows.
# one byte is read, and that is
# how many of the next byte to copy over. repeat.
# when the first byte is $ff, the copy is done.

import sys
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("in_file", help="the chr data to be rle'd")
parser.add_argument("out_file", help="the rle'd output")
args = parser.parse_args()

f_in = open(args.in_file, 'rb')
f_out = open(args.out_file, 'wb')

# python doesn't need variable initialization i think,
# but for clarity's sake.
byte_n = 0 # byte
byte_c = 1 # byte count 

for c in f_in.read():
  if c != byte_n or byte_c == 0xFE:
    f_out.write(bytes([byte_c]))
    f_out.write(bytes([byte_n]))
    byte_c = 1
    byte_n = c
  else:
    byte_c += 1

# now catch the last time around
f_out.write(bytes([byte_c]))
f_out.write(bytes([byte_n]))

# the end mark
f_out.write(bytes([0xFF]))

f_in.close()
f_out.close()
The problem is the routine to copy the bytes to vram:

Code: Select all

.proc rle_copy_ppu
  setxy16
  seta8
  ldy #$00

 loop: 
  seta8
  lda (rle_cp_ram), y
  cpa #$ff
  beq done
  seta16
  and #$ff
  tax
  iny
  seta8
  lda (rle_cp_ram),y
  jsr rle_loop
  iny
  jmp loop
  
done:
  rts

rle_loop:
  seta16
  ;and #$ff
 loop2:
  sta PPUDATA
  dex
  bne loop2
  rts
.endproc
It's called like this:

Code: Select all

; copy font
  setaxy16
  stz PPUADDR  ; we will start video memory at $0000
  lda #font
  sta rle_cp_ram
  jsr rle_copy_ppu
Now, this is the expected output:
Image

And this is the actual output:
Image

I'll give more info on my code if need be, but can anyone see anything wrong that might cause an error in things being copied over?

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 2:43 pm
by bazz
I don't have time to look in-depth at your code, but I noticed that you are only using a byte for the count. So, if your original data has more than 254 repetitions (because you are also using 255 as a terminator), then you will need to accommodate this. Rather than make the count a 16-bit value and consistently waste space, you can just use a "divide and modulus" strategy to make combination entries to create the proper total.

EDIT: (Actually, your code is already handling this just fine!)

EDIT: supplying the actual data you are using for the RLE (2bpp font) would be helpful too.

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 2:46 pm
by tokumaru
That's an incredibly wasteful variation of RLE, seeing as every non repeated byte gets a "1" inserted in front of it! I wouldn't be surprised if this caused expansion instead of compression in a lot of cases!

The most common solution is to specify runs of uncompressed bytes, just like you represent runs of compressed bytes. You can use bit 7 to specify whether a run is compressed or uncompressed, and bits 0-6 to specify the length, from 1 to 128. Or you can assume that compressed and uncompressed runs will alternate, and use all 8 bits for the length. In this case, you'd need to allow runs of 0 length, which you'd use between two runs of the same type.

Another option is to use auto-triggered RLE, where a length is only specified after 2 (or 3) repeated bytes show up, and more are assumed to come. This assumption might be wrong, in which case the length will be 0.

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 2:57 pm
by nicklausw
Bazz: good point, but it doesn't affect the file I'm RLE-ing.

tokumaru: thanks for the suggestion, but if I can't even get this basic RLE to work then why over-complicate it more?

Anyway, I fixed the python file, the first byte was copied either once too many times if it's 0 or once too few if not. The bug is still present.

Code: Select all

#!/usr/bin/env python3
# basic rle compression program
# by nicklausw
# public domain.

# this format works as follows.
# one byte is read, and that is
# how many of the next byte to copy over. repeat.
# when the first byte is $ff, the copy is done.

import sys
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("in_file", help="the chr data to be rle'd")
parser.add_argument("out_file", help="the rle'd output")
args = parser.parse_args()

f_in = open(args.in_file, 'rb')
f_out = open(args.out_file, 'wb')

# python doesn't need variable initialization i think,
# but for clarity's sake.
byte_n = int.from_bytes(f_in.read(1), byteorder='little') # byte
byte_c = 0 # byte count

f_in.seek(0) 

for c in f_in.read():
  if c != byte_n or byte_c == 0xFE:
    f_out.write(bytes([byte_c]))
    f_out.write(bytes([byte_n]))
    byte_c = 1
    byte_n = c
  else:
    byte_c += 1

# now catch the last time around
f_out.write(bytes([byte_c]))
f_out.write(bytes([byte_n]))

# the end mark
f_out.write(bytes([0xFF]))

f_in.close()
f_out.close()

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 3:07 pm
by bazz
tokumaru wrote:That's an incredibly wasteful variation of RLE, seeing as every non repeated byte gets a "1" inserted in front of it! I wouldn't be surprised if this caused expansion instead of compression in a lot of cases!
It can only be considered wasteful because people here are generalizing (OP is not designing for any particular dataset or hasn't communicated that) -- and it's been on my mind how I believe a lot of concepts are discussed in this forum "generally" -- when we all know everything changes when a specific project provides actual constraints.

Now is a good time to mention "use-case." I bring this up because there's a difference between designing a general RLE or an RLE specific to a dataset. This same notion applies to almost any other part of development.

Personally, I am going to use RLE for my joypad recordings that run my game's demo mode, and it is likely to ALWAYS have repeats -- in which case, using a bit to specify uncompressed or compressed data would actually be wasteful (waste half a byte per entry). So always consider your use-case if you are designing for some specific dataset.

On the other hand, I'm not sure if there are better compression algos for such a dataset.

Let's also consider the size of the decompression routine in ROM compared to the volume of compressed data we will be using, and ensure there is still a sizable benefit.

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 3:23 pm
by dougeff
Edit...somehow I missed that you were talking about graphic compression for SNES. Never mind.

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 3:36 pm
by bazz
tokumaru: thanks for the suggestion, but if I can't even get this basic RLE to work then why over-complicate it more?
I am proud of you.

I am going to take a closer look at it now

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 4:00 pm
by bazz
no reason to make all of 8 / 16 / 8 bits changes to the accumulator.. just clear hi-8 before start loop

Next -- the problem appears to be in the vram write loop -- you only load the accumulator with a byte, but you are writing a whole word to VRAM, with the high byte as zero.. Is this intentional???? I thought you wanted to copy the bytes directly to VRAM as they are?? But in this case, you are only using them as low bytes of a word.... Since I'm not sure... I'm not going to finish any sort of solution.. But at least have done some refactoring below. (mainly removing all the rep/sep 's) (This could be refactored even further if you don't need the high byte zero at all).

I'm assuming PPUDATA is the VRAM, which is a $2118/$2119 IIRC? Don't forget to have properly configured the increment setting ($2115) (I won't add this in my code.

---- my version (incomplete)

Code: Select all

.proc rle_copy_ppu
  setxy16
  seta8
  ldy #$00
  lda #0	; clear Accum. hi-byte
  xba

 loop: 
  lda (rle_cp_ram), y
  cpa #$ff
  beq done
  tax
  iny
  lda (rle_cp_ram),y
  jsr rle_loop
  iny
  bra loop
  
done:
  rts

; IN: X = count
;      A = byte
rle_loop:
; INCOMPLETE / WILL NOT WORK (for above mentioned reasons in forum post)
; Please finish reading forum post and reply so we can work out your needs
  sta PPUDATA	; this is only writes the low byte.
  dex
  bne rle_loop
  rts
.endproc
-----
So now, let's talk about a solution if you do indeed want to write bytes directly to VRAM (and not just lobytes of a word with the high byte as zero, as you started with). we've identified how it's an issue since writing to VRAM requires maintaining an index of whether we are currently writing to the
low byte or high byte of the VRAM register.. and also, if we do the last transfer, ending on the low byte, we'll need to manually write an empty high byte or (some other solution) to get it transferred.

Because of this added difficulty -- well, you can figure it out if you'd like.. Shouldn't be too hard -- but also consider an alternative of decompressing to RAM and DMA'ing -- once again,
it's just an alternative -- and with no project constraints it doesn't matter (go with VRAM direct, it sounds more challenging :)

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 5:24 pm
by nicklausw
Is there enough RAM to store 8 KB of data for OAM? Because I'm scared of having to figure out the words and stuff for a direct VRAM copy.

Or is there a way for PPUDATA to just take bytes?

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 6:03 pm
by bazz
> Is there enough RAM to store 8 KB of data for OAM?

s/OAM/VRAM ??? There are 64KB 128KB of SysRam

> Or is there a way for PPUDATA to just take bytes?

There is only a way to have the VRAM word address increment based on lo-byte access or hi-byte access, but there is no single byte-port to write to.

----

Let's just verify that your decompression routine works -- you can decompress into SysRAM, and use a debugger to dump the RAM to file. then compare your decompressed section (you may need to trim in a hex editor) with the original data. (I use vbindiff for this)

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 8:15 pm
by bazz

Code: Select all

byte_n = int.from_bytes(f_in.read(1), byteorder='little') # byte
byte_c = 0 # byte count

f_in.seek(0)
should be

Code: Select all

byte_n = int.from_bytes(f_in.read(1), byteorder='little') # byte
byte_c = 1 # byte count
There's no point in re-seeking back the byte you've already read.

Caution: I'm not a big python programmer, let alone python3 - so if there's other points of improvement, I can't see them X_X

I tested out your compression program -- it seems to work just fine. And, it does automatically cover > 254 repetitions :)

I used a nifty perl command to print large repetitions of characters for testing (I actually learned this from "Hacking: The Art of Exploitation" like 14 years ago xD)

perl -e 'print "A"x256' > test.txt

Have fun proving your SNES decompression works ^_^

Re: My attempt at RLE compression

Posted: Sat Apr 09, 2016 8:39 pm
by tokumaru
nicklausw wrote:tokumaru: thanks for the suggestion, but if I can't even get this basic RLE to work then why over-complicate it more?
You're right. I actually didn't realize you were having trouble making your current design work until I had typed my entire post, but I ended up posting it anyway. I can't help much with the SNES side of things, so please consider what I wrote as tips for the future.

Re: My attempt at RLE compression

Posted: Sun Apr 10, 2016 10:29 am
by nicklausw
Okay, this routine copies to non-zeropage RAM properly (to $100), but there isn't enough of it available so everything screws up.

You say there's 64 KB of RAM (I said OAM on mistake)? Where? As that would fix everything.

The variables:

Code: Select all

.segment "ZEROPAGE"
  rle_cp_ram: .res 2
  rle_cp_num: .res 2
  
.segment "BSS"
  rle_cp_dat: .res 2 ; a LOT more
The fixed routine:

Code: Select all

.proc rle_copy_ram
  setxy16
  seta8
  ldy #$00
  sty rle_cp_num
  lda #0   ; clear Accum. hi-byte
  xba

 loop:
  lda (rle_cp_ram), y
  cpa #$ff
  beq done
  tax
  iny
  lda (rle_cp_ram),y
  jsr rle_loop
  iny
  bra loop
 
done:
  rts

; IN: X = count
;      A = byte
rle_loop:
  phy
  ldy rle_cp_num
rle_inter:
  sta rle_cp_dat,y
  iny
  dex
  bne rle_inter
  sty rle_cp_num
  ply
  rts
.endproc

Re: My attempt at RLE compression

Posted: Sun Apr 10, 2016 10:56 am
by tepples
The Super NES has 64 KiB of VRAM and 128 KiB of work RAM at $7E0000-$7FFFFF. (The bottom 8 KiB of work RAM is mirrored into the bottom 8 KiB of banks $00-$3F and $80-$BF.) One common pattern is to decompress tile data to work RAM and then schedule a DMA to copy a few KiB to VRAM during the next vertical blanking period.

Re: My attempt at RLE compression

Posted: Sun Apr 10, 2016 11:18 am
by nicklausw
So do I have to do fancy bank switching, or do long references (lda $7fe000) work fine?