
Posted: Mon Jul 14, 2008 3:14 pm
by Andreas Naive
Ummm, I have failed both at figuring out how each bitplane #0 context is "split" into 2 contexts for bitplane #1, and at figuring out the exact way the "MPS" works for bitplane #0 (the "prediction" thing we saw in type 0).

In my tests, the selection of the context for bitplane #1 *seems* to depend on the output symbol for context #0, but that's not the only data to consider... and though I have seen some ideas that "seemed" to work, they fail when advancing through the stream.

Because of that, I switched to studying the "prediction" thing for bitplane #0; for the first pixels, it seemed that only the inversion of bit #0 of the previous pixel was used, but after some more bits things started to get more complex... so I haven't drawn any conclusions.

At this point I'm not sure if I'm looking at the data the wrong way, if my approaches were too simplistic, or if the programs I have used are buggy, so take your chances and let us know what you get.

Time for sleeping. Good luck.

EDITED: Back from bed to make an important observation:
I have been working under the implicit assumption that the 'MPS' thing worked at the bit level; I mean that, for every pixel, the first symbol determined one full bit, while the second one (whose context should be dependent on the first) determined the second bit. This is, of course, just a wild guess (I assumed it as a first hypothesis because I was expecting, like neviksti, that the first bit of the pixel determined which one of the 2 bitplane #1 contexts was selected), as the "MPS" thing could be more complex than just two 'orthogonal' values tracking the MPS for the first and second bit.

Posted: Tue Jul 15, 2008 9:27 am
by neviksti
Andreas Naive wrote:EDITED: Back from bed to make an important observation:
I have been working under the implicit assumption that the 'MPS' thing worked at the bit level; I mean that, for every pixel, the first symbol determined one full bit, while the second one (whose context should be dependent on the first) determined the second bit. This is, of course, just a wild guess (I assumed it as a first hypothesis because I was expecting, like neviksti, that the first bit of the pixel determined which one of the 2 bitplane #1 contexts was selected), as the "MPS" thing could be more complex than just two 'orthogonal' values tracking the MPS for the first and second bit.
Hmm... maybe my terminology choices were unclear before, but the whole reason the prob calculator failed at first was because each symbol was NOT related one-to-one to a bit. Every two symbols decide one pixel (otherwise the updated prob calculator would not work), but it seems to be more of an index into some lookup table than something directly related to the bit values.

Since you have been able to figure out the context for the first symbol of each pixel (there are five), have you tried selecting the context for the second symbol with something like:
5 + context of first symbol*2 + 1 if first symbol is lps?
Or if that doesn't work then something like:
5 + context of first symbol*2 + 1 if first symbol (lps^invert)?
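Spelled out as code, those two guesses would look something like this (a sketch only; the function names, and the assumption that the second-symbol contexts occupy indices 5 and up, are mine):

```c
#include <assert.h>

/* Two candidate mappings for the second-symbol context, as guessed above.
   Contexts 0..4 belong to the first symbol; the guess is that the second
   symbol then uses one of 10 further contexts starting at index 5. */
static int bp1_context_v1(int ctx0, int first_is_lps) {
    return 5 + ctx0 * 2 + (first_is_lps ? 1 : 0);
}

/* Variant with the mode-0 style invert flag folded in. */
static int bp1_context_v2(int ctx0, int first_is_lps, int invert) {
    return 5 + ctx0 * 2 + ((first_is_lps ^ invert) ? 1 : 0);
}
```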

Sorry if these are obvious things you already tried. I haven't had time to play with the data myself yet.

EDIT: I did take time to give the patent a quick read last night, but on the first time through I didn't understand their method of encoding the pixel. They did seem to indicate the symbols didn't directly relate to the bit values, but beyond that I didn't understand the details. I definitely need to give it a second read with more care, but if someone already understands that part of the patent... please share your interpretation of that.

EDIT(2): To be more explicit, I've seen the two bits of the pixel show up in this order in the current "range" of the arithmetic encoder:

something like this

10
01
00
11

and sometimes something like

01
10
00
11

etc.

That's why I don't believe each symbol relates directly to a bit; rather, the two symbols together are some kind of lookup into a table. The patent mentions prior art that used a lookup into a table of the pixel values in order of frequency, and that they improved on this somehow. The patent seems to still use the symbols as an index into some table (the patent comments on some ROM precalculated table based on states or something), but I don't fully understand it yet. Like I mentioned, I really need to give the patent a second read.
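A minimal sketch of that lookup-table hypothesis (everything here is an assumption: the names, the index convention, and reading the listed order top-of-range first, with the lps/lps pixel at the top as described later in the thread):

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical: the two decoded symbols together index an ordering of the
   four 2-bit pixel values, rather than mapping to bits directly. order[0]
   is taken to be the lps/lps pixel at the top of the range; the interior
   ordering (lps/mps before mps/lps) is a guess. */
static uint8_t pixel_from_symbols(const uint8_t order[4], int s1_lps, int s2_lps) {
    int idx = s1_lps ? (s2_lps ? 0 : 1)   /* lps,lps -> 0; lps,mps -> 1 */
                     : (s2_lps ? 2 : 3);  /* mps,lps -> 2; mps,mps -> 3 */
    return order[idx];
}
```

With the first ordering from the post (10, 01, 00, 11) read top-first, `order` would be {2, 1, 0, 3}, so two mps symbols would decode pixel value 3.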

Posted: Tue Jul 15, 2008 11:33 am
by Andreas Naive
5 + context of first symbol*2 + 1 if first symbol is lps?
Or if that doesn't work then something like:
5 + context of first symbol*2 + 1 if first symbol (lps^invert)?
Yeah, I tried that and many more ideas. :P
something like this

10
01
00
11

and sometimes something like

01
10
00
11
OK. It's clearer now. But then, when an "invert" occurs, what happens to that order? You mentioned you might be able to modify your program to reproduce that full order; maybe knowing what happens when an invert occurs in a context for bitplane #0 could be valuable... (or we could even see whether or not that order is related to the choice of contexts for bitplane #1).

Indeed, in your description:
bit 7 - first bit of 2bit color at top of current input range
bit 6 - second bit of 2bit color at top of current input range
I'm not sure what you refer to as the "current input range" after the first symbol for a pixel. Until now I have simply ignored those 2 bits.

As for the patent, I just did a "diagonal" (skim) read, so I suppose I should take another look if you think it says something interesting about the encoding of the pixels.

Posted: Tue Jul 15, 2008 12:25 pm
by Andreas Naive
OK. I took another look at the patent. I don't have the patience to do a careful read (as I still think it's too generic to deserve too much attention), but I looked again at the PBP thing, as that seems to be what we're missing to know how to select the context for bitplane #1.

If I have understood it correctly, it seems it would choose one context or the other based on the output for the first bit of the pixel.

However, if I have followed neviksti's explanations, we can have a symbol ordering for a pixel of the form
01
10
00
11
in which case the first symbol (the one from bitplane #0) doesn't give us an output bit that could be used to determine the context for bitplane #1.

However, if the ordering is like
01
00
10
11
then the first symbol effectively correlates with the first output bit of the pixel.

So... I was wondering if the context choice for bitplane #1 is indeed decided based on whether the first symbol has determined a full bit. That way, one of the contexts would be intended to optimize compression for the case where the second bit can be treated as "independent" of the first one, while the other context would be optimized to take into account the "joint probability estimation" for the 2 bits.

This is, of course, just (another) wild guess, and I don't even consider it very probable, but I thought I should share it. Now I'm not sure if I can check this with the available data. neviksti, could you explain what exactly those 2 bits are? (Which is the "top" of the current input range? The pixel value corresponding to the LPS/LPS pair? And what is it when referred to the second symbol for a pixel? The pixel value corresponding to an LPS as second symbol when the value for the first symbol has been fixed?)

Posted: Tue Jul 15, 2008 12:45 pm
by neviksti
Andreas Naive wrote:Now I'm not sure if I can check this with the available data. neviksti, could you explain what exactly those 2 bits are? (Which is the "top" of the current input range? The pixel value corresponding to the LPS/LPS pair? And what is it when referred to the second symbol for a pixel? The pixel value corresponding to an LPS as second symbol when the value for the first symbol has been fixed?)
Yes, for the first symbol it is the pixel value for the lps/lps pair. And for the second symbol it is the pixel value corresponding to an lps with the first symbol fixed to the chosen path.

For this reason, it is not possible to construct the correspondence to all four pixel values.

The place where I saw the cases in which the bits don't correspond directly to the symbols, and where I could finally see what was going on, was the "firstbyte" data for mode 1 (from way back, with the first data runs on the original cartridge). So that may be a good place to start until I get a chance to change the program and give you more appropriate data. (I may not be able to get to it tonight as I'm having some friends over for a cookout... but given my sleeping habits lately, who knows.)

Posted: Tue Jul 15, 2008 11:34 pm
by Lord Nightmare
This stuff is going so far over my head it's not funny. I need to read up on the arithmetic coding techniques used here...

LN

Posted: Tue Jul 15, 2008 11:44 pm
by Near
Lord Nightmare wrote:This stuff is going so far over my head it's not funny.
I'm glad it's not just me ... and I really, really want to help, too :(

Posted: Tue Jul 15, 2008 11:56 pm
by neviksti
Yeah! I finally had time to look at the data. With Andreas' explanation of the 5 contexts, everything is making decent sense now. I seem to be able to follow all the prob evolution.
Andreas Naive wrote:
5 + context of first symbol*2 + 1 if first symbol is lps?
Or if that doesn't work then something like:
5 + context of first symbol*2 + 1 if first symbol (lps^invert)?
Yeah, I tried that and many more ideas.
Hmm... I meant "invert" as it was for mode 0 (ie. the invert value used for that symbol). I think you may have used instead the invert value after it was updated from that symbol. For the second option seems to work fine.

I haven't rewritten the SNES program yet, but looking at the first-byte data it looks like the following to me:
the symbols are just indexes into a table, and the output is somehow xor'd with one of the reference pixels?

I don't know. I still have a ways to go on that.

Posted: Wed Jul 16, 2008 12:17 am
by Andreas Naive
This stuff is going so far over my head it's not funny. I need to read up on the arithmetic coding techniques used here...
As a first reading, i would suggest this document:
http://msp.csie.ncu.edu.tw/DC/pdf/QM.pdf
Hmm... I meant "invert" as it was for mode 0 (ie. the invert value used for that symbol). I think you may have used instead the invert value after it was updated from that symbol. For the second option seems to work fine.
Ummm, yes, maybe we were giving different meanings to the "invert" thing. I will recheck it when I'm at home.

Posted: Wed Jul 16, 2008 1:06 am
by Andreas Naive
OK. I took a look at your implementation for type 0. Definitely we were talking about different things. :P

Besides that, it's clear to me that the 0x7f "strange limit case" is only an artifact derived from the way you have chosen your internal states. If you look at the way QM-coders are implemented in the reference documents, you can see that the choice I made for the internal states is the standard one ('A' is what I was calling 'cutoff', and 'C' what I was calling 'value', but that's all). Then, instead of initializing top=0xff, you can do A=0x100=0x00 (modulo 0x100) and drop both the strange border case and the "<<1 + 1" thing when renormalizing.
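The equivalence between the two conventions can be shown in a few lines (variable and function names are mine; this is only the renormalization step, not a full decoder):

```c
#include <stdint.h>
#include <assert.h>

/* 'top' marks the largest value still inside the range, with implied
   low bits all 1s, so renormalization must shift in a 1. */
static uint8_t renorm_top(uint8_t top) {
    return (uint8_t)((top << 1) | 1);
}

/* 'A' marks the first value *not* in the range, i.e. A == (top + 1)
   mod 0x100, with the initial A = 0x100 stored as 0x00. A plain
   shift suffices; no "+1" is needed. */
static uint8_t renorm_A(uint8_t A) {
    return (uint8_t)(A << 1);
}
```

Since A == top + 1 (mod 0x100), renorm_A(A) - 1 == renorm_top(top) (mod 0x100) for every value, which is why the "+1" disappears in the standard formulation.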

Posted: Wed Jul 16, 2008 1:49 am
by neviksti
Andreas Naive wrote:Besides that, it's clear to me that the 0x7f "strange limit case" is only an artifact derived from the way you have chosen your internal states.
It is not an artifact. Yes there are several ways to choose your internal variables. But regardless of how you write them, if the current range of values being considered is constrained to the range:
0x000000..... to 0x7FFFFF....

It is possible to renormalize. They don't. That is why it is a strange border case.

Instead they wait until the range is constrained to something like:
0x000000..... to 0x7EFFFF....

This fact is independent of the choice of internal state variables.

;------------
I like top=0xFF (all lower bits are implied 1's) instead of top=0x00 (where 0x00 implies the next higher bit is 1), because the "implied bits" are always the same in the first case, while they aren't in the second (once you subtract and get top=0xA6, for example, the bit is no longer implied, so really top=0x1A6). I find the 'consistency' of the first one nicer for conceptual simplicity. Actually, it is probably this "inconsistency" that causes that strange border case, out of convenience: if you actually renormalized every time it was possible, you would have to test two conditions, "if A<=0x80 && A!=0x00 then shift"; instead they change this to "if A<0x80" for the convenience of testing only one condition, and in doing so introduce a small error.

It is, of course, merely a matter of preference at this point; that's just my personal preference and reasoning.

Posted: Wed Jul 16, 2008 1:56 am
by neviksti
Okay, I rewrote the SNES code to record enough data that we can see evolving "order of likely pixel values".

http://neviksti.com/SPC7110/output1_7030_an1.dat2

format:
Each pixel has 8 bytes worth of data
0 - bit4/bit0 show pixel value for lps then lps
1 - bit4/bit0 show pixel value for lps then mps
2 - bit4/bit0 show pixel value for mps then lps
3 - first symbol prob value
4 - bit 7 (lps?), lower bits are number of renormalization shifts after handling symbol
5 - (not needed, just a consistency check) bit4/bit0 show pixel value for second symbol lps given decided first symbol
6 - second symbol prob value
7 - bit 7 (lps?), lower bits are number of renormalization shifts after handling symbol
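For anyone scripting against this dump, a reader for one 8-byte record might look like the following (struct and field names are mine; only the byte layout and bit positions come from the list above):

```c
#include <stdint.h>
#include <assert.h>

/* One 8-byte record from the dump. Bytes 0-2 and 5 carry a 2-bit pixel
   value in bits 4 and 0; bytes 4 and 7 pack the lps flag in bit 7 and
   the renormalization shift count in the low bits. */
typedef struct {
    uint8_t px_lps_lps;   /* byte 0: pixel value for lps then lps */
    uint8_t px_lps_mps;   /* byte 1: pixel value for lps then mps */
    uint8_t px_mps_lps;   /* byte 2: pixel value for mps then lps */
    uint8_t prob1;        /* byte 3: first symbol prob value */
    int     sym1_lps;     /* byte 4, bit 7 */
    int     shifts1;      /* byte 4, low bits */
    uint8_t px_check;     /* byte 5: consistency check */
    uint8_t prob2;        /* byte 6: second symbol prob value */
    int     sym2_lps;     /* byte 7, bit 7 */
    int     shifts2;      /* byte 7, low bits */
} PixelRecord;

/* bits 4 and 0 hold the 2-bit pixel value */
static uint8_t pixel_bits(uint8_t b) {
    return (uint8_t)((((b >> 4) & 1) << 1) | (b & 1));
}

static PixelRecord read_record(const uint8_t d[8]) {
    PixelRecord r;
    r.px_lps_lps = pixel_bits(d[0]);
    r.px_lps_mps = pixel_bits(d[1]);
    r.px_mps_lps = pixel_bits(d[2]);
    r.prob1      = d[3];
    r.sym1_lps   = (d[4] >> 7) & 1;
    r.shifts1    = d[4] & 0x7F;
    r.px_check   = pixel_bits(d[5]);
    r.prob2      = d[6];
    r.sym2_lps   = (d[7] >> 7) & 1;
    r.shifts2    = d[7] & 0x7F;
    return r;
}
```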


EDIT:
The data is very bizarre. I'm glad I took this because I can test ideas quicker ... unfortunately they are all failing :(

The part I am having the most conceptual trouble with is how they handle the fact that we might be in context (A=B!=C), but the predicted values probably depend on A somehow. How do they depend on A?

The easiest way to see this now is to look at the first time a new context is used: the list of pixels is 00 01 10 11 only if the context pixels are all zero. That very first prediction for each context should be the easiest... yet I fail at even that. I'm clearly missing something.

Posted: Wed Jul 16, 2008 2:58 am
by Andreas Naive
I like the top=0xFF (all lower bits are implied 1's) instead of top=0x00
You are missing the point here. Your "top" doesn't play the same role as "A" in the standard implementations. To put it one way, your "top" tries to indicate the maximum value of a certain range, while 'A' indicates the smallest value not included in that range. That's why you continuously have to add 1 after shifting bits, while that is not needed for A.
A<=0x80 && A!=0x00
A can *never* be less than 0x80-0x5a=0x26 except at initialization.
in doing so introduce a small error
This is a funny sentence. The theoretical basis for all this is to try to keep the interval range as close to 1 as possible (to avoid discretization errors); by shifting a bit when A's MSB is 0, they try to keep the interval length in the [0.75-1.5] range. Even if I believed that the way you are doing this is the natural one (which I don't), considering a 1/128 shift at the extremes of that "length range" an "error" is funny, given how small it is compared with the discretization errors you are committing when sitting at the extremes of that interval.
It of course is merely a choice of preference at this point, that's just my personal preference and reason.
Yes, of course. At this point it is clear that we have different states of mind about this, but that is not very important as, indeed, we can treat the arithmetic decoder as a black box for the purposes of what is now being discussed.

EDITED:
A<=0x80 && A!=0x00
A can *never* be less than 0x80-0x5a=0x26 except at initialization.
Oh, I now understand what you referred to with the "A!=0x00". But no, it's not only a matter of implementation convenience; it is more "unnatural" to shift when A=0x80 than when A=0x7f, because when A is near 0x80 the interval length to which your internal variables are mapped is near 0.75 in the [0.75-1.5] range, and, from the point of view of the "natural" thing to do, it's preferable not to shift for as long as possible at that point: if you shift, the interval to which your internal variables are mapped will have a length near 1.5, and of course the discretization errors are bigger in the neighborhood of 1.5. So, indeed, if this were driven exclusively by the "most natural choice" we should renormalize at much lower values but, of course, checking just 1 bit is a big convenience for implementation purposes.

Posted: Wed Jul 16, 2008 3:43 am
by neviksti
Andreas Naive wrote:You are missing the point here. Your "top" doesn't play the same role as "A" in the standard implementations. To put it one way, your "top" tries to indicate the maximum value of a certain range, while 'A' indicates the smallest value not included in that range. That's why you continuously have to add 1 after shifting bits, while that is not needed for A.
I fully understand that. It is a choice of having your state variable mark what the current range is "less than", or "less than or equal to". Hence the choice of having your initial value be "0x00" or "0xFF", with different implications for the "unnamed bits". This doesn't change the fact that they don't renormalize when they can.
Andreas Naive wrote:
A<=0x80 && A!=0x00
A can *never* be less than 0x80-0x5a=0x26 except at initialization.
Yes, because of their choice to renormalize only when A < 0x80, instead of renormalizing whenever possible (which would be "top" < 0x80 or equivalently "A<=0x80 && A!=0x00").

The compression would be more efficient if they actually multiplied by the probabilities (true arithmetic compression), but instead we are of course making the approximation range * prob ~= prob. The further the range gets from 0000.... to FFFF...., the worse this approximation becomes.

So I don't understand why not renormalizing whenever possible wouldn't be considered a small error.
Andreas Naive wrote:...by shifting a bit when A's MSb is 0 they try to maintain the interval length in the [0.75-1.5] range.
Wait what? What is giving 1.5?


Ahhhh! I finally get it now.
I was considering $80 as 0.5, and the probability values as just downsized a bit to compensate for the fact that we are in the range 0.5 - 1.0 all the time.

Instead you are suggesting we consider something like 0xAA = 1.0 (so the initial range is called 1.5 and we always try to 'renormalize' to keep the range in $80-$FF ~ 0.75 - 1.5). While the result (the "black box") is the same, this interpretation makes much more sense. Now the discretization errors can appear more "balanced", as we can be above and below the optimal value... and it makes it easier to understand and interpret $5A as roughly 50%.

Now I also understand your complaint about my calling the lack of renormalization in the 0000....7FFF case an error, for the prob values were calculated such that they actually work slightly better at this extreme than at the "1.5" extreme.

Thank you for taking the time to prod me a bit more with that. I definitely learned more by finally thinking that through the rest of the way. Makes good sense. Thanks.

EDIT: Oh, I see you clarified now.

Posted: Wed Jul 16, 2008 3:59 am
by Andreas Naive
Instead you are suggesting we consider something like 0xAA = 1.0 (so the initial range is called 1.5 and we always try to 'renormalize' to keep the range in $80-$FF ~ 0.75 - 1.5). While the result (the "black box") is the same, this interpretation makes much more sense. Now the discretization errors can appear more "balanced", as we can be above and below the optimal value... and it makes it easier to understand and interpret $5A as roughly 50%.
Exactly. I'm not only suggesting it; it is clearly described in IBM's documents.
The representation for 1 is 0xaaaaaaaa... If they use 0x5a instead of 0x55 (as an approximation to 0x55555555...), it is because they applied a probability estimation technique over real data, based on Markov chains, to adjust those values.
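Under that reading, the numbers can be sanity-checked directly; `scaled` is just a throwaway helper of mine, with 0xAA standing for 1.0:

```c
#include <assert.h>

/* If 0xAA represents 1.0, the renormalized range 0x80..0xFF corresponds
   to interval lengths of roughly 0.75..1.5, and the prob value 0x5A
   lands near 50%. */
static double scaled(int v) {
    return (double)v / 0xAA;
}
```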