byuu wrote:
> To make rotation as easy as rotation, order the bits as
> 0 1 2
> 7 x 3
> 6 5 4
Reordering won't help.
01273654

Where did you get the number 01273654? Is there a reason that you're still scanning left to right, top to bottom, instead of scanning in a circle?
HQ2X Algorithm Ported to Verilog
Because real binary streams are not circular? :/
Okay, the raw pattern from diffed pixels:
IHG
FeD
CBA
You suggest re-arranging it like so:
FGH
DeI
CBA
I can re-order the hqTable like this, no problem.
But what have we accomplished? We need a counter-clockwise (CCW) rotation to give us:
HIA
GeB
FDC
And how do we get this value? pattern << 2 | pattern >> 6 won't do it. Neither will pattern << 6 | pattern >> 2.
Ultimately, we always have to look into an 8-bit array, so we MUST pack the result down into a bitstream to do it:
FGHDICBA
->
HIAGBFDC
With your method, we'd have to transform the values one at a time:
0->5, 1->3, 2->0, 4->1, 7->2, 6->4, 5->7, 3->6
01247653 needs to become 53012476
Which looks like our nice (n<<6)|(n>>2), but we have to move individual bits. So we really need:
((n&0x01)<<5) | ((n&0x02)<<2) | ((n&0x04)>>2) | ((n&0x10)>>3) | ((n&0x80)>>5) | ((n&0x40)>>2) | ((n&0x20)<<2) | ((n&0x08)<<3)
Which is ... no better than what we are doing now.
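For reference, the per-bit shuffle above can be written out as a small function. This is only a sketch of the mapping listed in this post (0->5, 1->3, 2->0, 4->1, 7->2, 6->4, 5->7, 3->6); the names rotate_pattern, RotateTable, and check_four_rotations_are_identity are made up for illustration and are not taken from the bsnes source.

```cpp
#include <cassert>
#include <stdint.h>

// Moves each of the 8 pattern bits individually, using the mapping
// 0->5, 1->3, 2->0, 4->1, 7->2, 6->4, 5->7, 3->6 from the post.
inline uint8_t rotate_pattern(uint8_t n) {
  return static_cast<uint8_t>(
      ((n & 0x01) << 5) | ((n & 0x02) << 2) |
      ((n & 0x04) >> 2) | ((n & 0x10) >> 3) |
      ((n & 0x80) >> 5) | ((n & 0x40) >> 2) |
      ((n & 0x20) << 2) | ((n & 0x08) << 3));
}

// Hypothetical 256-entry lookup table built from the function above,
// i.e. the kind of rotation table the bit shuffle would replace.
struct RotateTable {
  uint8_t value[256];
  RotateTable() {
    for (int n = 0; n < 256; ++n)
      value[n] = rotate_pattern(static_cast<uint8_t>(n));
  }
};

// Sanity check: the mapping consists of two 4-cycles (0->5->7->2->0 and
// 1->3->6->4->1), so applying it four times restores every pattern.
inline void check_four_rotations_are_identity() {
  for (int n = 0; n < 256; ++n) {
    uint8_t r = static_cast<uint8_t>(n);
    for (int i = 0; i < 4; ++i) r = rotate_pattern(r);
    assert(r == n);
  }
}
```

Either form costs something: the function is eight mask-and-shift terms per lookup, the table is 256 bytes of storage.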
---
But anyway, the code's posted. If you use the latest bsnes, you can compile the snesfilter HQ2x file separately using pure C++98 code. If you can get it to work and eliminate the rotation table, I'll pay you $20 =)
Dang, this is awesome. I just finished implementing a new version of my HQ2X Verilog filter. This version includes just the optimized rotation symmetry enhancement (i.e. it does not include any of the other optimizations that are shown in byuu's bsnes hq2x filter source code). The rotation symmetry upgrade all by itself resulted in a 34% overall reduction in FPGA resources. Rockin!! Thanks byuu!
More updates to come when I implement more of the bsnes optimizations.
Pz!
Jonathon
Hey byuu,
Would you mind explaining your grow/pack functions and how/why they work? And how they're better than MaxSt's.
I could just go ahead and implement them blindly in Verilog and they would work fine, but I always want to understand what I'm implementing otherwise I don't learn anything.
Thanks!
Jonathon
Code:
uint16_t blend2(uint32_t A, uint32_t B, uint32_t C) {
  grow(A); grow(B); grow(C);
  return pack((A * 2 + B + C) >> 2);
}
Code:
#define Interp02(c1, c2, c3) \
  (((((c1 & Mask_2) * 2 + (c2 & Mask_2) + (c3 & Mask_2) ) >> 2) & Mask_2) + \
  ((((c1 & Mask13) * 2 + (c2 & Mask13) + (c3 & Mask13) ) >> 2) & Mask13))
Ignoring that ... Lots of masking and repeated multiplications there.
It's masking FF00FF, performing math on that, then masking 00FF00 and doing the same again, and combining the results. Looks to be working on 24-bit input.
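To make the two masked passes concrete, here is the same arithmetic as a plain function. Treat it as a sketch: the mask values are an assumption inferred from the description above (Mask_2 = 0x00FF00 for the middle channel, Mask13 = 0xFF00FF for the outer two), since the macro itself doesn't show them, and the function name interp02 is made up here.

```cpp
#include <cassert>
#include <stdint.h>

// Assumed mask values; the macro's definitions are not shown in this thread.
static const uint32_t Mask_2 = 0x00ff00;  // middle 8-bit channel
static const uint32_t Mask13 = 0xff00ff;  // outer two 8-bit channels

// (2*c1 + c2 + c3) / 4 per channel on 24-bit input, done in two masked passes.
inline uint32_t interp02(uint32_t c1, uint32_t c2, uint32_t c3) {
  assert(((c1 | c2 | c3) >> 24) == 0);  // sketch assumes plain 24-bit pixels

  // Pass 1: middle channel only. The zeroed neighbours leave room for the
  // two extra carry bits produced by the weighted sum.
  uint32_t mid = (((c1 & Mask_2) * 2 + (c2 & Mask_2) + (c3 & Mask_2)) >> 2) & Mask_2;

  // Pass 2: outer channels together; the zeroed middle byte separates them.
  uint32_t outer = (((c1 & Mask13) * 2 + (c2 & Mask13) + (c3 & Mask13)) >> 2) & Mask13;

  return mid + outer;
}
```

Each pass repeats the multiply and the adds, which is the repeated work being pointed out here.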
Mine splits the channels apart and does the multiplication only once, works on SNES 15-bit input (can do 16-bit too.)
The idea is that n*4 in the worst case can spill over by two extra bits:
%11111*4=%(11)11100, the bits in parentheses have spilled over, which would alias into the next color channel. But if we have some zero values between them, we can shift around and mask. So mine turns:
0rrrrrgggggbbbb into:
000000ggggg00000 0rrrrr00000bbbbb
Then does the math on them, shifts back, and then packs it back together.
I couldn't say which was faster (would guess mine), you'd have to bench-mark it. I just like mine more for readability.
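Here's a minimal sketch of what grow and pack could look like for the layout described above. The mask 0x03e07c1f is derived from the 000000ggggg00000 0rrrrr00000bbbbb diagram and the bodies are an assumption for illustration; the functions in the posted bsnes source may differ in detail. The assert is only there to document that this sketch expects 15-bit input.

```cpp
#include <cassert>
#include <stdint.h>

// Spread a 15-bit 0rrrrrgggggbbbbb pixel across 32 bits so that every
// 5-bit channel has at least five zero bits above it:
//   000000ggggg00000 0rrrrr00000bbbbb
static void grow(uint32_t &n) {
  assert(n <= 0x7fff);  // this sketch handles 15-bit pixels only
  n |= n << 16;         // copy the pixel into the upper half
  n &= 0x03e07c1f;      // keep g in the upper half, r and b in the lower half
}

// Undo grow(): drop any bits that spilled into the gaps, then fold the
// upper half (g) back down between r and b.
static uint16_t pack(uint32_t n) {
  n &= 0x03e07c1f;
  return static_cast<uint16_t>(n | (n >> 16));
}

// (2*A + B + C) / 4 per channel, with a single multiply and a single shift.
// The worst case per channel is 31 * 4 = 124, which needs 7 bits and fits
// in the 5 channel bits plus the zero bits above them.
uint16_t blend2(uint32_t A, uint32_t B, uint32_t C) {
  grow(A); grow(B); grow(C);
  return pack((A * 2 + B + C) >> 2);
}
```

For example, blending full red (0x7c00) with two black pixels gives red = 62 >> 2 = 15, i.e. 0x3c00.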
Okay, thanks a lot.
Did you notice that in your blend() function the case 0 will never fire (since hqTable[] contains no 0 values)? Same goes for cases 7, 8, 9, 10, and 11.
Also, can you go into a little more detail on why you have both diff() and same() functions? Instead of just one or the other.
And one more thing...

byuu wrote:
0rrrrrgggggbbbb into:
000000ggggg00000 0rrrrr00000bbbbb

Maybe this is some weird SNES thing that I don't know about, but how does shifting and masking get you 5 'b's when you only have 4 'b's to start with?
Thanks byuu!
Jonathon
There should have been five b's.
I think I kept the "holes" to match HQ2x rules, but yeah, if our goal is code size, I should get rid of the duplicates, good idea. The table was generated by writing a parser for hq2x.cpp from MaxSt.
diff vs. same: there are two because one caches part of the decode process when comparing the center pixel against other pixels. Slight speedup.
Hello all!
I finally integrated my Verilog HQ2X pixel scaler into my VeriNES emulator. Now I can finally demo real games running with the scaler enabled rather than single static images (as in my first post). Unfortunately, the codec that I used to record these videos performs some of its own blending and such, but you can certainly still tell the difference between when the scaler is enabled and when it's not. The HQ2X implementation that I finally integrated into my emulator is ~75% smaller than my original unoptimized implementation. The biggest optimizations were byuu's (author of bsnes) symmetry optimization, a huge BRAM reduction, and a couple of major parallelization/pipelining optimizations.
Here are some videos (Xvid codec) - I think Solstice is the best demonstration of the scaler. There is really nothing to see here that can't be seen in either bsnes, Nestopia, or whatever. This is really just to prove that I accomplished what I originally set out to do.
Super Mario Bros. (HQ2X Demo) (31MB)
Legend of Zelda (HQ2X Demo) (56MB)
Solstice (HQ2X Demo) (38MB)
Major thanks to byuu for telling me about his symmetry optimization.
Pz!
Jonathon
Good job with hq2x.
I noticed a couple unrelated problems in the SMB1 video, in both scaled and unscaled mode. You seem to skip a single column of pixels near the left side: move forward while watching the hills and floor tiles closely. And you appear not to be doing the 33rd fetch and have blank pixels at the far right.
tepples wrote: Good job with hq2x.
Thanks!

tepples wrote: I noticed a couple unrelated problems in the SMB1 video, in both scaled and unscaled mode. You seem to skip a single column of pixels near the left side: move forward while watching the hills and floor tiles closely. And you appear not to be doing the 33rd fetch and have blank pixels at the far right.
Yeah, I've had those bugs for almost 2 years. LOL. I have literally just been working on everything else and implementing new features (fixing CPU bugs, APU, FIR filters, porting to Altera, etc). Once I got the PPU to a point where I could play pretty much every game without any major trouble I moved on to other things. But I really need to get back to fixin my PPU....some day.

infiniteneslives wrote: This is awesome to see.
Thanks!

infiniteneslives wrote: So do you plan to make it so that the user could just flick a switch (change an input) and turn it on and off seamlessly like you were in the video then?
Yep. It will also be controllable via my Qt GUI interface.