Re: Is there a known algorithm for this?
Posted: Tue Jun 07, 2016 8:53 pm
by Myask
Thought: apply economics! Calculate the opportunity cost of each merge by subtracting, from its savings, the savings of the next-best merge for each of the two "ends" involved; then go for the merge with the best value after opportunity cost is counted.
Compare the result with the simple greedy approach.
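In sketch form, the selection step might look like this (all names here are invented for illustration; the savings and next-best figures would come from whatever overlap table the compressor already keeps):
Code: Select all
#include <vector>

// One candidate merge between two block "ends".
struct candidate {
    int savings;   // bytes saved if this merge is taken
    int nextbesta; // savings of the next-best merge wanting the first end
    int nextbestb; // savings of the next-best merge wanting the second end
};

// Pick the merge with the best value after opportunity cost, rather than
// the one with the largest raw savings (plain greedy).
int pickmerge(const std::vector<candidate>& cands){
    int best = -1;
    int bestscore = 0;
    for(int i = 0; i < (int)cands.size(); i++){
        int score = cands[i].savings - cands[i].nextbesta - cands[i].nextbestb;
        if(best < 0 || score > bestscore){
            bestscore = score;
            best = i;
        }
    }
    return best; // index into cands, or -1 if there are no candidates
}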
Re: Is there a known algorithm for this?
Posted: Tue Jun 07, 2016 9:06 pm
by lidnariq
... I think that sounds like A* search?
Re: Is there a known algorithm for this?
Posted: Wed Jun 08, 2016 1:27 am
by calima
tokumaru wrote: if you did it in sets of say 12 strings
Which 12 strings, though? Those involved in the longest overlaps? What kind of selection would make me lose fewer opportunities?
Going with the Monte Carlo approach, any random 12 strings. Pick, say, 10 random sets, brute-force the best for each, and keep the best overall. The program can also run the original greedy algo on the side, and at the end pick the better of the two.
If randomness is undesirable, this is the kind of selection a neural network can guess reasonably well, if you know how to set one up.
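A sketch of that flavor of approach is below. Note it shuffles whole merge orders and keeps the best, rather than brute-forcing true 12-string subsets: one 12-string set already means 12! (about 479 million) orderings. Everything here is invented for illustration:
Code: Select all
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Longest prefix of b matching a suffix of a (naive; fine for a sketch).
static int overlapof(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b){
    int maxlen = (int)std::min(a.size(), b.size());
    for(int len = maxlen; len > 0; len--){
        if(std::equal(a.end() - len, a.end(), b.begin())) return len;
    }
    return 0;
}

// Superstring length when strips are chained in the given order.
static size_t chainedlength(const std::vector<std::vector<uint8_t>>& strips,
                            const std::vector<size_t>& order){
    size_t total = strips[order[0]].size();
    for(size_t i = 1; i < order.size(); i++){
        total += strips[order[i]].size()
               - (size_t)overlapof(strips[order[i - 1]], strips[order[i]]);
    }
    return total;
}

// Try `trials` random orderings and keep the shortest result. A real
// version would also run the greedy algorithm and keep the overall winner.
size_t montecarlo(const std::vector<std::vector<uint8_t>>& strips, int trials){
    if(strips.empty()) return 0;
    std::vector<size_t> order(strips.size());
    std::iota(order.begin(), order.end(), 0);
    std::mt19937 rng(std::random_device{}());
    size_t best = SIZE_MAX;
    for(int t = 0; t < trials; t++){
        std::shuffle(order.begin(), order.end(), rng);
        best = std::min(best, chainedlength(strips, order));
    }
    return best;
}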
Re: Is there a known algorithm for this?
Posted: Wed Jun 08, 2016 8:13 am
by tokumaru
I did in fact think of considering the next merge before committing the current merge.
Re: Is there a known algorithm for this?
Posted: Wed Jun 08, 2016 8:32 am
by tokumaru
Here's an idea: instead of remembering the longest overlap for each block, remember the longest overlap on each end of each block. Then, before merging 2 blocks, I could calculate how much I'd be losing from merges I wouldn't be doing (i.e. add up all the merges that would use the ends being connected). So instead of merging the longest overlap each step, I'd merge the longest overlap that ruins the fewest alternate merges.
EDIT: Sounds like what Myask suggested.
Re: Is there a known algorithm for this?
Posted: Wed Jun 08, 2016 11:26 pm
by 3gengames
For each byte of data, why not keep a mirroring byte that counts how many things use that byte? Match all the bytes you could merge it with, and choose the one with the fewest relying entries.
Re: Is there a known algorithm for this?
Posted: Thu Jun 09, 2016 5:47 pm
by tokumaru
Kasumi, what logic are you using to merge two blocks? I'm actually having trouble coming up with something decent! Allowing small blocks not only to overlap the edges of large blocks, but also to be entirely contained by them, is something I didn't anticipate, and everything I'm coming up with is proving to be excruciatingly slow!

Re: Is there a known algorithm for this?
Posted: Thu Jun 09, 2016 6:32 pm
by Kasumi
Err... I wrote this a while ago, so I dunno. *checks*
I've got a "blockmaster" class, a "block" class, and a "subblock" class.
A blockmaster contains a pointer to a pointer to a list of blocks.
A block contains a name, a size, a pointer to the data, and a pointer to a pointer to a list of subblocks.
A subblock contains a name and its offset from the block's data.
To start, I add every block to the blockmaster. The blockmaster has three functions called merge, testmerge, and mergeall.
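In rough struct form, that layout is something like the following (a sketch using vectors for brevity; the real code uses those pointer-to-pointer lists):
Code: Select all
#include <string>
#include <vector>

struct metsubblock {
    std::string name;
    int offset; // offset of this subblock's data within the owning block
};

struct metblock {
    std::string name;
    int size;
    unsigned char* data; // pointer to the block's bytes
    std::vector<metsubblock> subblocks;
};

struct metblockmaster {
    std::vector<metblock> blocks;
    int testmerge(metblock block1, metblock block2); // shown below
    // merge() and mergeall() omitted here
};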
This is what testmerge looks like (Forgive my glorious lack of code style RE: everythinglowercase):
Code: Select all
// Slides block1 across block2 one offset at a time and returns the size
// of the largest clean overlap found (i.e. the bytes saved by merging).
int metblockmaster::testmerge(metblock block1, metblock block2){
    int bytessaved = 0;
    if(block1.size <= 0 || block2.size <= 0){
        return bytessaved;
    }
    // Try every relative position of block1 against block2, from block1
    // hanging off block2's left edge to hanging off its right edge.
    int startingoffset = -block1.size+1;
    int range = block1.size+block2.size-1;
    int bestoffset = -1;
    for(int x = startingoffset; x < startingoffset+range; x++){
        int tempbytessaved = 0;
        for(int y = 0; y < block1.size; y++){
            // Only compare bytes that actually overlap at this offset.
            if(y+x >= 0 && y+x < block2.size){
                if(block1.data[y] == block2.data[y+x]){
                    tempbytessaved++;
                }else{
                    // Any mismatch inside the overlap rules this offset out.
                    tempbytessaved = 0;
                    break;
                }
            }
        }
        if(tempbytessaved > bytessaved){
            bytessaved = tempbytessaved;
            bestoffset = x; // merge() uses this to line the blocks up
        }
    }
    return bytessaved;
}
I feel like I did it the most obvious way, so this may be what you're thinking of that's excruciatingly slow, I dunno. I actually see a few ways to improve the speed here.
I use testmerge to look for the best savings (since it's non-destructive). Merge basically does the same thing, except it merges the data using the best offset, adds one block as a subblock to the other (storing its offset from the main data and retaining its name, as well as adding all the subblocks from one block to the other), and removes that block from the blockmaster.
I use whether the best offset was positive or negative to decide which block becomes the subblock. (If the positively offset one becomes the subblock, the negatively offset one doesn't need its own subblocks' offsets updated.) This also means that to create the merged string, I can start by copying all of the negative block's data and then append the new block's data.
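A sketch of that splice, using the struct layout from above (the real merge also moves subblocks and names around, which is omitted here):
Code: Select all
#include <algorithm>
#include <vector>

// Sketch: combine the data of two blocks at the best offset found by
// testmerge. offset is block1's position relative to block2, as in
// testmerge: negative means block1's data starts first.
std::vector<unsigned char> splice(const metblock& block1,
                                  const metblock& block2, int offset){
    const metblock& first  = (offset < 0) ? block1 : block2;
    const metblock& second = (offset < 0) ? block2 : block1;
    int shift = (offset < 0) ? -offset : offset; // second's start inside first
    int mergedsize = std::max(first.size, shift + second.size);
    std::vector<unsigned char> merged(mergedsize);
    std::copy(first.data, first.data + first.size, merged.begin());
    // The overlap matched byte-for-byte, so overwriting it is harmless;
    // this also covers the case where second is entirely contained.
    std::copy(second.data, second.data + second.size, merged.begin() + shift);
    return merged;
}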
I can probably post merge too if you'd like, but it's... long, and this explanation is probably better than the code.
Re: Is there a known algorithm for this?
Posted: Thu Jun 09, 2016 6:53 pm
by tokumaru
This is exactly what I asked for. So, this is basically sliding one block over the other, from one end to the other, and comparing the overlapping parts element by element, right? I did think of something like this, but haven't tried this approach yet. Thanks for sharing the code!
Re: Is there a known algorithm for this?
Posted: Thu Jun 09, 2016 7:12 pm
by Kasumi
Correct. Whether a block is contained within another or overlaps at either end effectively doesn't matter, because of the if statement with the range checks. (Outside the overlap the count is not updated, but it also doesn't reset.)
You could break on y+x >= block2.size or do a few other things to speed it up, but this handles my 746 blocks fast enough that I'm not really waiting on it. If you do find a novel way to do it, still share, of course.
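For reference, that break would slot into testmerge's inner loop like so (same results, since y+x only grows within a given offset):
Code: Select all
for(int y = 0; y < block1.size; y++){
    if(y+x >= block2.size){
        break; // past block2's end; later y can't overlap, keep the count
    }
    if(y+x >= 0){
        if(block1.data[y] == block2.data[y+x]){
            tempbytessaved++;
        }else{
            tempbytessaved = 0;
            break;
        }
    }
}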
Re: Is there a known algorithm for this?
Posted: Fri Jun 10, 2016 3:44 am
by Rahsennor
I'm not entirely sure it's applicable to this particular problem, but the Knuth-Morris-Pratt algorithm should reduce the O(mn) comparisons to O(m+n), if you can work the range check into it.
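Concretely, the textbook trick would be: run one block as the KMP pattern over the other. The automaton state left at the end of the text is the longest prefix of the pattern matching a suffix of the text, and a full match along the way means containment. A standard sketch (not adapted from this thread's code):
Code: Select all
#include <cstdint>
#include <vector>

// KMP failure function: f[i] = length of the longest proper prefix of p
// that is also a suffix of p[0..i].
static std::vector<int> failure(const std::vector<uint8_t>& p){
    std::vector<int> f(p.size(), 0);
    for(size_t i = 1, k = 0; i < p.size(); i++){
        while(k > 0 && p[i] != p[k]) k = f[k-1];
        if(p[i] == p[k]) k++;
        f[i] = (int)k;
    }
    return f;
}

// Bytes saved by placing b after a: the longest prefix of b that is a
// suffix of a, or all of b if b occurs inside a. Runs in O(|a| + |b|).
// Call with the arguments swapped to test the other direction; together
// the two calls cover every case the sliding-window version checks.
int overlapsavings(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b){
    if(b.empty()) return 0;
    std::vector<int> f = failure(b);
    size_t k = 0; // current length of b's prefix matched against a
    for(uint8_t c : a){
        while(k > 0 && c != b[k]) k = f[k-1];
        if(c == b[k]) k++;
        if(k == b.size()) return (int)b.size(); // b is contained in a
    }
    return (int)k;
}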
Re: Is there a known algorithm for this?
Posted: Fri Jun 10, 2016 2:26 pm
by pubby
Here's a compressor in C++ using ant colony optimization to solve TSP:
http://pastebin.com/raw/geE75wfn
And some test results:
Code: Select all
  # |   Un |   NN |  Ant
500 | 8000 | 5452 | 5336
200 | 3200 | 2378 | 2304
 50 |  800 |  659 |  630
Where
# = number of strips
Un = uncompressed size
NN = nearest neighbor optimized size
Ant = ant colony optimized size
I would probably use nearest neighbor (or some other greedy algorithm) once the number of strips is >500.
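For anyone following the TSP mapping: each strip is a city, and the cost of following strip a with strip b is b's length minus their overlap, so a shorter tour means a shorter superstring. As an illustration (not code from the pastebin), nearest neighbor on that graph might look like:
Code: Select all
#include <algorithm>
#include <cstdint>
#include <vector>

// Longest prefix of b matching a suffix of a (naive O(|a||b|) version;
// a KMP routine like the one above drops this to O(|a|+|b|)).
static int overlapof(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b){
    int maxlen = (int)std::min(a.size(), b.size());
    for(int len = maxlen; len > 0; len--){
        if(std::equal(a.end() - len, a.end(), b.begin())) return len;
    }
    return 0;
}

// Nearest neighbor on the overlap graph: always append the unused strip
// that overlaps the current tail the most. Returns the superstring length.
size_t nearestneighbor(const std::vector<std::vector<uint8_t>>& strips){
    if(strips.empty()) return 0;
    std::vector<bool> used(strips.size(), false);
    size_t cur = 0;
    size_t total = strips[0].size();
    used[0] = true;
    for(size_t step = 1; step < strips.size(); step++){
        size_t best = 0;
        int bestsave = -1;
        for(size_t i = 0; i < strips.size(); i++){
            if(used[i]) continue;
            int save = overlapof(strips[cur], strips[i]);
            if(save > bestsave){
                bestsave = save;
                best = i;
            }
        }
        used[best] = true;
        total += strips[best].size() - (size_t)bestsave;
        cur = best;
    }
    return total;
}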
Re: Is there a known algorithm for this?
Posted: Sun Jun 19, 2016 6:50 pm
by tokumaru
pubby wrote:Here's a compressor in C++ using ant colony optimization to solve TSP
Sorry I didn't say this before, but thanks for giving this a try. I'm still trying to fully understand the whole ant colony thing!
Anyway, I'll probably not be using this technique to compress level maps in my game anymore. The results were OK, but not as good as I expected.
My idea was to break up each screen (an area of 16x16 blocks) into 16 strips of 16 blocks, and then compress all the strips into their shortest common supersequence. In my tests, only about 70% of the strips were unique, and the supersequence step worked reasonably well even with just the greedy algorithm, cutting the space required by the unique strips roughly in half, so only about 35% of the strip data remained. That sounds good, but once you factor in the space taken by the screens themselves (16 pointers, i.e. 32 bytes, per screen), the results aren't so hot.
Another thing that didn't work as expected was supporting both vertical and horizontal strips. Screens were encoded one way or the other depending on how many unique strips they introduced. This did reduce the number of unique strips a bit, but they didn't overlap as well, for obvious reasons, so the final supersequence ended up longer than if only horizontal strips had been used.
I'll probably revert to my old level format, which uses multiple levels of metatiles and is similar to quad-tree compression. For one, the compression algorithm is much more straightforward, and the ratios are better too, as long as I design the levels with enough grid-aligned redundancy. The main problem with that format was being limited to 256 blocks of each size (256x256, 128x128, 64x64, 32x32, 16x16): using pointers instead of indices in order to support more than 256 entries would not only double the space required for the entries, but also significantly slow down the decoding routine.
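To make the 256-entry limit concrete, a lookup through the metatile levels with 1-byte indices might look like the sketch below. The table names and exact layout are invented for illustration, not the actual format; the point is that widening these indices to 2-byte pointers doubles every table and adds work at each level.
Code: Select all
#include <cstdint>

// Hypothetical tables: each entry is four indices into the next size down
// (quad-tree style), which is what caps each table at 256 entries when
// the indices are single bytes.
extern const uint8_t meta128[256][4]; // 128x128 block -> four 64x64 indices
extern const uint8_t meta64[256][4];  // 64x64 block -> four 32x32 indices
extern const uint8_t meta32[256][4];  // 32x32 block -> four 16x16 indices

// Fetch the 16x16 block at (x, y) within one 256x256 block, where x and y
// are in 16x16 units (0..15). Each level selects one of four quadrants.
uint8_t block16at(const uint8_t block256[4], int x, int y){
    uint8_t b128 = block256[((y >> 3) & 1) * 2 + ((x >> 3) & 1)];
    uint8_t b64 = meta128[b128][((y >> 2) & 1) * 2 + ((x >> 2) & 1)];
    uint8_t b32 = meta64[b64][((y >> 1) & 1) * 2 + ((x >> 1) & 1)];
    return meta32[b32][(y & 1) * 2 + (x & 1)];
}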
I do expect to stay under 256 blocks most of the time, but I'd like to have the possibility of breaking this mark should the need arise. Oh well, maybe I'll come up with another way to expand this technique.