HQ2X Algorithm Ported to Verilog


tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

byuu wrote:> To make rotation as easy as rotation, order the bits as

0 1 2
7 x 3
6 5 4

Reordering won't help.

01273654
Where did you get the number 01273654? Is there a reason that you're still scanning left to right, top to bottom, instead of scanning in a circle?
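A minimal sketch of what the circular ordering buys. It assumes bit i of the 8-bit pattern is set when the neighbor at circular position i (0 1 2 / 7 x 3 / 6 5 4) differs from the center. Under that assumption, turning the neighborhood 90 degrees counter-clockwise is a plain 2-bit rotate right. The grid-walking function is only a reference to check it against, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Circular neighbor numbering (the center is -1 because it has no bit):
//   0 1 2
//   7 x 3
//   6 5 4
static const int kCircular[3][3] = {{0, 1, 2}, {7, -1, 3}, {6, 5, 4}};

// Rotating the 3x3 neighborhood 90 degrees counter-clockwise moves the
// neighbor at circular position i + 2 into position i, so the whole
// 8-bit pattern is rotated right by two bits.
uint8_t rotateCircularCCW(uint8_t n) {
  return static_cast<uint8_t>((n >> 2) | (n << 6));
}

// Reference version: rotate the grid cell by cell.
// Counter-clockwise rotation: new[row][col] = old[col][2 - row].
uint8_t rotateCircularByGrid(uint8_t n) {
  uint8_t out = 0;
  for (int row = 0; row < 3; ++row) {
    for (int col = 0; col < 3; ++col) {
      if (row == 1 && col == 1) continue;  // skip the center pixel
      int from = kCircular[col][2 - row];
      int to = kCircular[row][col];
      if (n & (1u << from)) out |= static_cast<uint8_t>(1u << to);
    }
  }
  return out;
}
```

Calling rotateCircularCCW four times returns the original pattern, since four 2-bit rotations of an 8-bit value make a full rotation.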
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

Because real binary streams are not circular? :/

Okay, the raw pattern from diffed pixels:

IHG
FeD
CBA

You suggest re-arranging it like so:

FGH
DeI
CBA

I can re-order the hqTable like this, no problem.

But what have we accomplished? We need a counter-clockwise rotation to give us:

HIA
GeB
FDC

And how do we get this value? pattern << 2 | pattern >> 6 won't do it. Neither will pattern << 6 | pattern >> 2.

Ultimately, we always have to look up a 256-entry table indexed by an 8-bit pattern, so we MUST pack the result down into a bitstream to do it:

FGHDICBA
->
HIAGBFDC

With your method, we'd have to transform the values one at a time:

0->5, 1->3, 2->0, 4->1, 7->2, 6->4, 5->7, 3->6
01247653 needs to become 53012476

Which looks like our nice (n<<6)|(n>>2), but we have to move individual bits. So we really need:
((n&0x01)<<5) | ((n&0x02)<<2) | ((n&0x04)>>2) | ((n&0x10)>>3) | ((n&0x80)>>5) | ((n&0x40)>>2) | ((n&0x20)<<2) | ((n&0x08)<<3)

Which is ... no better than what we are doing now.
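For reference, here is the per-bit shuffle above as a compilable function, checked against a cell-by-cell rotation of the FGH/DeI/CBA layout. It assumes bit p holds stream position p of FGHDICBA (F in bit 0), which is the numbering under which the 0->5, 1->3, ... mapping and the masks agree. Precomputing all 256 results, as in the last function, is one way to build a lookup table of the kind the post calls a rotation table; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stream layout (row-major, center skipped), bit p = stream position p:
//   F G H      bits 0 1 2
//   D e I  ->  bits 3 . 4
//   C B A      bits 5 6 7
static const int kStream[3][3] = {{0, 1, 2}, {3, -1, 4}, {5, 6, 7}};

// The shuffle from the post: 0->5, 1->3, 2->0, 4->1, 7->2, 6->4, 5->7, 3->6.
uint8_t rotateStreamCCW(uint8_t n) {
  return static_cast<uint8_t>(
      ((n & 0x01) << 5) | ((n & 0x02) << 2) | ((n & 0x04) >> 2) |
      ((n & 0x10) >> 3) | ((n & 0x80) >> 5) | ((n & 0x40) >> 2) |
      ((n & 0x20) << 2) | ((n & 0x08) << 3));
}

// Reference: counter-clockwise grid rotation, new[row][col] = old[col][2 - row].
uint8_t rotateStreamByGrid(uint8_t n) {
  uint8_t out = 0;
  for (int row = 0; row < 3; ++row) {
    for (int col = 0; col < 3; ++col) {
      if (row == 1 && col == 1) continue;  // the center pixel has no bit
      int from = kStream[col][2 - row];
      int to = kStream[row][col];
      if (n & (1u << from)) out |= static_cast<uint8_t>(1u << to);
    }
  }
  return out;
}

// Precompute all 256 results so the rotation becomes a single lookup.
std::array<uint8_t, 256> buildRotationTable() {
  std::array<uint8_t, 256> table{};
  for (int n = 0; n < 256; ++n) table[n] = rotateStreamCCW(static_cast<uint8_t>(n));
  return table;
}
```

rotateStreamCCW(0x01) gives 0x20, while the plain rotates of 0x01 give 0x04 and 0x40, consistent with the point that neither pattern << 2 | pattern >> 6 nor pattern << 6 | pattern >> 2 works for this ordering.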

---

But anyway, the code's posted. If you use the latest bsnes, you can compile the snesfilter HQ2x file separately using pure C++98 code. If you can get it to work and eliminate the rotation table, I'll pay you $20 =)
jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA

Post by jwdonal »

Dang, this is awesome. I just finished implementing a new version of my HQ2X Verilog filter. This version includes just the optimized rotation symmetry enhancement (i.e. it does not include any of the other optimizations that are shown in byuu's bsnes hq2x filter source code). The rotation symmetry upgrade all by itself resulted in a 34% overall reduction in FPGA resources. Rockin!! Thanks byuu!

More updates to come when I implement more of the bsnes optimizations.

Pz!

Jonathon :)
jwdonal

Post by jwdonal »

Hey byuu,

Would you mind explaining your grow/pack functions and how/why they work, and how they're better than MaxSt's?

I could just go ahead and implement them blindly in Verilog and they would work fine, but I always want to understand what I'm implementing; otherwise I don't learn anything.

Thanks!

Jonathon
Near

Post by Near »


uint16_t blend2(uint32_t A, uint32_t B, uint32_t C) {
  grow(A); grow(B); grow(C);
  return pack((A * 2 + B + C) >> 2);
}


#define Interp02(c1, c2, c3) \
(((((c1 & Mask_2) *  2 + (c2 & Mask_2)     + (c3 & Mask_2)    ) >> 2) & Mask_2) + \
 ((((c1 & Mask13) *  2 + (c2 & Mask13)     + (c3 & Mask13)    ) >> 2) & Mask13))

I'm unsure whether the equality test in some of those functions will help. It certainly will for solid-color screens, but how common/rare is that? The extra test could make it slower in some cases.

Ignoring that, there's a lot of masking and repeated multiplication there.

It's masking FF00FF, performing math on that, then masking 00FF00 and doing the same again, and combining the results. Looks to be working on 24-bit input.
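A sketch of the two-pass masking described above, written as a function rather than a macro. It assumes 24-bit 0x00RRGGBB pixels, with Mask_2 = 0x00FF00 (green) and Mask13 = 0xFF00FF (red and blue) as the post describes; the function names are hypothetical, and the per-channel version exists only to check the result.

```cpp
#include <cassert>
#include <cstdint>

// Masks as described in the post (assumed 0x00RRGGBB pixel layout).
const uint32_t kMask2 = 0x0000FF00;   // green
const uint32_t kMask13 = 0x00FF00FF;  // red and blue

// (2*c1 + c2 + c3) / 4 per channel, computed in two masked passes.
// Each 8-bit channel sum is at most 4 * 255 = 1020 (10 bits); the spill
// lands in bits the current mask has cleared, so channels never collide.
uint32_t interp02(uint32_t c1, uint32_t c2, uint32_t c3) {
  uint32_t g = (((c1 & kMask2) * 2 + (c2 & kMask2) + (c3 & kMask2)) >> 2) & kMask2;
  uint32_t rb = (((c1 & kMask13) * 2 + (c2 & kMask13) + (c3 & kMask13)) >> 2) & kMask13;
  return g + rb;
}

// Reference: the same weighted average done one channel at a time.
uint32_t interp02Reference(uint32_t c1, uint32_t c2, uint32_t c3) {
  uint32_t out = 0;
  for (int shift = 0; shift < 24; shift += 8) {
    uint32_t a = (c1 >> shift) & 0xFF, b = (c2 >> shift) & 0xFF, c = (c3 >> shift) & 0xFF;
    out |= ((a * 2 + b + c) >> 2) << shift;
  }
  return out;
}
```

The multiplication by 2 and the additions happen twice here, once per mask, which is the repetition the post points out.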

Mine splits the channels apart and does the multiplication only once. It works on SNES 15-bit input (it can do 16-bit too).

The idea is that n*4 can, in the worst case, spill over by two extra bits:
%11111*4 = %(11)11100. The bits in parentheses have spilled over and would alias into the next color channel. But if we leave some zero bits between the channels, we can shift around and mask. So mine turns:
0rrrrrgggggbbbb into:
000000ggggg00000 0rrrrr00000bbbbb
Then does the math on them, shifts back, and then packs it back together.
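A minimal sketch of the grow/pack idea for 15-bit 0rrrrrgggggbbbbb pixels (five bits per channel). The mask 0x03E07C1F is derived from the layout described above (green moved to bits 21-25, red at 10-14, blue at 0-4); the functions return values instead of modifying their argument in place like the grow(A) calls in blend2, and their names are hypothetical. Treat it as an illustration of the technique, not the bsnes source.

```cpp
#include <cassert>
#include <cstdint>

// 0rrrrrgggggbbbbb  ->  000000ggggg00000 0rrrrr00000bbbbb
// Green is copied into the upper half; the mask keeps green at bits 21-25,
// red at bits 10-14 and blue at bits 0-4, with at least five zero bits
// above each channel.
const uint32_t kGrowMask = 0x03E07C1F;

uint32_t growRgb555(uint32_t n) { return (n | (n << 16)) & kGrowMask; }

// Drop whatever landed in the gaps, then fold green back down to bits 5-9.
uint16_t packRgb555(uint32_t n) {
  n &= kGrowMask;
  return static_cast<uint16_t>(n | (n >> 16));
}

// (2A + B + C) / 4 per channel. Each 5-bit sum is at most 4 * 31 = 124,
// which needs 7 bits, so the two spilled bits fall into the zero gap
// above each channel instead of into the next channel.
uint16_t blend2Rgb555(uint32_t a, uint32_t b, uint32_t c) {
  return packRgb555((growRgb555(a) * 2 + growRgb555(b) + growRgb555(c)) >> 2);
}

// Reference: the same weighted average done one 5-bit channel at a time.
uint16_t blend2Reference(uint16_t a, uint16_t b, uint16_t c) {
  uint16_t out = 0;
  for (int shift = 0; shift < 15; shift += 5) {
    unsigned x = (a >> shift) & 31, y = (b >> shift) & 31, z = (c >> shift) & 31;
    out |= static_cast<uint16_t>(((x * 2 + y + z) >> 2) << shift);
  }
  return out;
}
```

The low two bits of each shifted sum end up in the gaps (or shift out, for blue), and the mask in packRgb555 discards them, so the result is the truncated average per channel.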

I couldn't say which is faster (I'd guess mine); you'd have to benchmark it. I just like mine more for readability.
jwdonal

Post by jwdonal »

Okay, thanks a lot.

Did you notice that in your blend() function the case 0 will never fire (since hqTable[] contains no 0 values)? Same goes for cases 7, 8, 9, 10, and 11.

Also, can you go into a little more detail on why you have both diff() and same() functions, instead of just one or the other?

And one more thing...
byuu wrote:0rrrrrgggggbbbb into:
000000ggggg00000 0rrrrr00000bbbbb
Maybe this is some weird SNES thing that I don't know about, but how does shifting and masking get you 5 'b's when you only have 4 'b's to start with?

Thanks byuu!

Jonathon
Near

Post by Near »

There should have been five b's.

I think I kept the "holes" to match the HQ2x rules, but yeah, if our goal is code size, I should get rid of the duplicates. Good idea. The table was generated by writing a parser for MaxSt's hq2x.cpp.

diff vs. same: one of them caches part of the decode process when comparing the center pixel against the other pixels. It's a slight speedup.
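The caching idea can be sketched as follows: one function decodes both pixels on every call, while the other takes a center pixel that was decoded once and reuses it for all eight neighbors. Everything here is an assumption for illustration: the names, the integer YUV approximation, and the thresholds (Y 0x30, U 0x07, V 0x06, in the style of MaxSt's hq2x). It is not bsnes's actual diff()/same() code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical YUV triple used only for comparisons.
struct Yuv { int y, u, v; };

// Expand a 15-bit 0rrrrrgggggbbbbb pixel to 8-bit channels and convert to an
// integer YUV approximation (an assumption, not taken from bsnes).
Yuv toYuv(uint16_t p) {
  int r = ((p >> 10) & 31) << 3;
  int g = ((p >> 5) & 31) << 3;
  int b = (p & 31) << 3;
  return Yuv{(r + g + b) / 4, 128 + (r - b) / 4, 128 + (2 * g - r - b) / 8};
}

// Assumed hq2x-style thresholds.
bool yuvDiffer(const Yuv& a, const Yuv& b) {
  return std::abs(a.y - b.y) > 0x30 || std::abs(a.u - b.u) > 0x07 ||
         std::abs(a.v - b.v) > 0x06;
}

// Decodes both pixels on every call.
bool diffPixels(uint16_t a, uint16_t b) { return yuvDiffer(toYuv(a), toYuv(b)); }

// Reuses a center pixel that the caller has already decoded.
bool diffFromCenter(const Yuv& center, uint16_t neighbor) {
  return yuvDiffer(center, toYuv(neighbor));
}

// Builds the 8-bit pattern with one decode of the center instead of eight.
uint8_t neighborPattern(uint16_t center, const uint16_t neighbors[8]) {
  Yuv c = toYuv(center);
  uint8_t pattern = 0;
  for (int i = 0; i < 8; ++i)
    if (diffFromCenter(c, neighbors[i])) pattern |= static_cast<uint8_t>(1u << i);
  return pattern;
}
```

The saving is seven center decodes per output pixel; whether that is measurable depends on how expensive the decode is compared to the rest of the filter.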
jwdonal

Post by jwdonal »

Hello all!

I finally integrated my Verilog HQ2X pixel scaler into my VeriNES emulator. Now I can demo real games running with the scaler enabled rather than single static images (as in my first post). Unfortunately, the codec that I used to record these videos performs some of its own blending, but you can still clearly tell when the scaler is enabled and when it's not. The HQ2X implementation in the emulator is ~75% smaller than my original unoptimized implementation. The biggest optimizations were byuu's (author of bsnes) symmetry optimization, a huge BRAM reduction, and a couple of major parallelization/pipelining optimizations.

Here are some videos (Xvid codec). I think Solstice is the best demonstration of the scaler. There is really nothing to see here that can't be seen in bsnes, Nestopia, or whatever; this is really just to prove that I accomplished what I originally set out to do.
Super Mario Bros. (HQ2X Demo) (31MB)
Legend of Zelda (HQ2X Demo) (56MB)
Solstice (HQ2X Demo) (38MB)

Major thanks to byuu for telling me about his symmetry optimization.

Pz!

Jonathon :)
tepples

Post by tepples »

Good job with hq2x.

I noticed a couple unrelated problems in the SMB1 video, in both scaled and unscaled mode. You seem to skip a single column of pixels near the left side: move forward while watching the hills and floor tiles closely. And you appear not to be doing the 33rd fetch and have blank pixels at the far right.
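The arithmetic behind the 33rd fetch, as a small sketch: the visible scanline is 256 pixels and a tile is 8 pixels wide, so with a nonzero fine horizontal scroll the visible window straddles 33 tiles, and fetching only 32 leaves the rightmost fineX pixels without data. This illustrates the tile count only; it does not model the PPU's actual fetch timing, and the function names are hypothetical.

```cpp
#include <cassert>

// With a fine horizontal scroll of fineX (0-7) pixels, the visible window
// starts fineX pixels into the first tile, so it spans pixels
// [fineX, fineX + 256) of the fetched tile row.
int tilesCovered(int fineX) { return (fineX + 256 + 7) / 8; }

// Visible pixels at the right edge that have no tile data when only
// 32 tiles (256 pixels) are fetched.
int blankRightPixels(int fineX) {
  int supplied = 32 * 8 - fineX;  // visible pixels the 32 tiles can cover
  return 256 - supplied;          // works out to fineX
}
```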
infiniteneslives
Posts: 2102
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

This is awesome to see. I'm even more excited to get my hands on an accurate NOAC that implements this!

So do you plan to make it so that the user could just flick a switch (change an input) and turn it on and off seamlessly, like you did in the video?
jwdonal

Post by jwdonal »

tepples wrote:Good job with hq2x.
Thanks!
tepples wrote:I noticed a couple unrelated problems in the SMB1 video, in both scaled and unscaled mode. You seem to skip a single column of pixels near the left side: move forward while watching the hills and floor tiles closely. And you appear not to be doing the 33rd fetch and have blank pixels at the far right.
Yeah, I've had those bugs for almost 2 years. LOL. I have literally just been working on everything else and implementing new features (fixing CPU bugs, APU, FIR filters, porting to Altera, etc.). Once I got the PPU to a point where I could play pretty much every game without any major trouble, I moved on to other things. But I really need to get back to fixin my PPU....some day. :)
infiniteneslives wrote:This is awesome to see.
Thanks!
infiniteneslives wrote:So do you plan to make it so that the user could just flick a switch (change an input) and turn it on and off seamlessly like you were in the video then?
Yep. It will also be controllable via my Qt GUI.