higan CPU emulation mode bug? (attn: byuu or any 65816 guru)

AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: higan CPU emulation mode bug? (attn: byuu or any 65816 g

Post by AWJ »

byuu wrote:> The bit size is required so that 1-bit bitfields can be specialized to have boolean semantics (you want flags.cy = 2 to set the bit to 1, not 0)

That's actually debatable. In most cases that is what you want, but there are fun things you can do with a uint1 type.

You know, that's a damn good point. Instead of specializing on size, I'll use a separate bitbool<type, position> template for the boolean-semantics version. That lets the parameters to the regular bitfield template be <type, lo, hi>, which is easier to eyeball for correctness.

byuu wrote:So your method would disallow treating the parent as the whole object.

Add a cast operator that returns a reference to the all-the-bits member. You have to define this operator for each bitfieldy union you define, but it's only two trivial one-liners, one for rvalue and one for lvalue.

byuu wrote:Okay, but what about the SPC700's YA register? You're going to end up with:
regs.ya.a
regs.ya.y
regs.ya.ya = yayaya.iAmLorde

Make a, y and ya an anonymous union inside regs. Problem solved. Anyway, I thought we agreed we weren't going to use these for byte-aligned stuff :)


(snip stuff about nall::vector, I haven't really looked at it in detail, I'll acknowledge that maybe it's not as bad as I thought it was)

What I really want are getter/setters. We shouldn't need to have Foo::value(), Foo::setValue(Value), Foo::_value. This breaks all of our math.
No, I don't mean the readonly<> crap. I mean properties in the Python sense, which look, walk and quack exactly like data members to the outside world, but transparently wrap a getter and setter method of the class they're a property of. I think there are other dynamic programming languages with a similar feature, Python is just the one I'm familiar with.

Code: Select all

struct Size {
  int width;  //our first iteration of our class; we're fine treating this as a variable
  property<int> height {  //a later iteration, turns out if height changes we need to do other stuff
    operator int() { return *this; }
    auto& operator=(int value) { *this = value; doStuff(); return *this; }
  };
  //the layout of the struct is the same after we changed height;
  //all existing code that used size.height still compiles and is backward-compatible;
  //although .o files that use Size::height will need to be recompiled for obvious reasons
};
Yes, that is exactly what Python properties are. If you've defined height as a property of the Size class, you can say size.height += 4 and the language transparently turns it into size._setheight(size._getheight() + 4). The only limitation is that you can't have a property with only a setter; if there's a setter there has to be a getter and the underlying data member has to be called something else like _height. (Python doesn't have public/protected/private access controls, but leading underscores are a convention for denoting "private" members.) You can have a property with a getter but no setter; the result is a readonly property that raises an exception if anyone tries to assign to it.

There's no requirement that Python properties have an "underlying data member" at all; you can have a readonly property that calculates some result on the fly, or a property whose getter and setter do completely unrelated things (though I'm not sure why you'd want to do the latter except to write obfuscated code). Since properties are invisible to client code, you can start with class.a as a property that's calculated on the fly from class.b which is a real data member, and later change it so class.a is the data member and class.b is calculated on the fly, and none of your clients will care (unless they try to assign to one of them).
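For comparison, here is a rough sketch of how the same idea can be approximated in C++. Property, onSet_ and relayouts are hypothetical names invented for this illustration; nothing here is nall code. The wrapper stores the value and runs a hook on every write, so client code keeps using plain member syntax.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not part of nall): wraps a value and runs a hook on every write.
template <typename T> class Property {
  T value_{};
  std::function<void()> onSet_;
public:
  explicit Property(std::function<void()> onSet) : onSet_(std::move(onSet)) {}
  operator T() const { return value_; }  // getter: reads look like plain member access
  Property& operator=(const T& v) {      // setter: store the value, then run the hook
    value_ = v;
    if (onSet_) onSet_();
    return *this;
  }
  Property& operator+=(const T& v) { return *this = value_ + v; }
};

struct Size {
  int width = 0;      // still a plain variable
  int relayouts = 0;  // hypothetical side effect standing in for doStuff()
  Property<int> height{[this] { ++relayouts; }};
};
```

With this, size.height = 10; size.height += 4; reads back as 14 and bumps relayouts twice. Unlike the Python version, the hook captures this, so copying a Size would leave the copy's hook pointing at the original object; a real implementation would have to deal with that, and every compound operator has to be spelled out by hand.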
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> You know, that's a damn good point.

The funny thing is, it came about by accident. I didn't like that uint2 and uint1 acted differently. Could even be a nasty surprise if you ever have a template<int n> Natural<n> foobar(); function. I also started using uintN in place of masking the lower bits. You may disagree here, but I like (uint23)n more than (n & 0x7fffff). And it's way more useful on signed types for sign extension, eg (int15)sample instead of sclamp<15>(sample) [which can't even be (int16)(sample << 1) >> 1 thanks to C leaving right shifts of negative signed values implementation-defined >_> ]
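To make the two casts concrete, here is a small sketch of the arithmetic that (uint23)n and (int15)sample stand for. truncateBits and signExtend are hypothetical free functions written for this example, not nall's API; the sign extension uses the xor-and-subtract trick so it never shifts a signed value.

```cpp
#include <cstdint>

// Hypothetical helper: keep only the low Bits bits, like (uint23)n.
template <unsigned Bits> inline uint32_t truncateBits(uint32_t n) {
  static_assert(Bits >= 1 && Bits <= 32, "width out of range");
  return n & static_cast<uint32_t>((uint64_t{1} << Bits) - 1);
}

// Hypothetical helper: treat bit (Bits - 1) as the sign bit, like (int15)sample.
template <unsigned Bits> inline int32_t signExtend(uint32_t n) {
  static_assert(Bits >= 1 && Bits <= 32, "width out of range");
  const uint32_t sign = uint32_t{1} << (Bits - 1);
  // Flipping the sign bit and then subtracting it maps 0..2^Bits-1 onto
  // -2^(Bits-1)..2^(Bits-1)-1; the arithmetic is unsigned, so it wraps cleanly.
  return static_cast<int32_t>((truncateBits<Bits>(n) ^ sign) - sign);
}
```

For example, truncateBits<23>(0xffffffff) gives 0x7fffff, and signExtend<15>(0x7fff) gives -1. The final conversion to int32_t assumes a two's complement target, which is true of every platform higan runs on.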

But after finding some (albeit rare) edge cases to benefit it, and not having to specialize things, I decided to keep it. I think it's only used in the ARM core right now.

That said, probably the dumbest class in all of nall is there for consistency as well, can you guess what it is? int1. It's a value that can hold either 0 or -1, and nothing else. I will be stunned if I ever find a use case for that.

> That lets the parameters to the regular bitfield template be <type, lo, hi>, which is easier to eyeball for correctness.

Yeah, I didn't want to nitpick because you had a good reason, and also because some people prefer <offset, length> to <lo, hi>. This was a really tough decision for Integer/Natural::bits() for me. offset, length has another benefit that you don't have to worry about reversing the arguments (with my class, hi,lo is identical to lo,hi, which is weird but ... the alternative is worse.)

> Add a cast operator that returns a reference to the all-the-bits member.

That will only work for direct assignment.

Code: Select all

Reg16 x;
uint16 y = x;  //ok
uint16 z = x++;  //not ok
x += 3;  //not ok
There may be a way to make Reg16 inherit right from bitrange<type, lo, hi>, but ... I'd have to think about it.

> Anyway, I thought we agreed we weren't going to use these for byte-aligned stuff

We did? Well, I like the idea of getting rid of the order_lsb macros. But if it has a speed hit, then I agree, we stick with what we have then.

I wonder if we could be evil fucks and detect bitrange<n, n+7> and add an operator uint(bits)_t&() that grabs the exact address inside the byte based on endian, so that bitrange<0, 7> l; can be passed to a function taking uint8_t&. I'm not going to try this, don't worry, just spit balling.

> Yes, that is exactly what Python properties are.

Then yes, you're right, I love this feature of Python.

I've thought about writing something to simulate them in C++, this is true, but I don't think I actually did it, did I? Unless this is about r14,r15 again.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

Never mind memory usage--making template bitfields inherit from Natural is just not going to work. The reason you can put different bitfields into a union and access them is that any instantiation of the bitfield template is a standard layout struct. The requirements of a standard layout struct include that it have no virtual functions, no data members that are references, its non-static data members can only come from one class (it can't have a base class with data members and also data members of its own), and its data members must all have the same access control. Your idea violates both the no references rule and the no mixing inherited and original data members rule.
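The rules above can be checked at compile time with std::is_standard_layout. The sketch below uses stand-in types invented for the illustration (Field, Base, Mixed and WithRef are not from higan): a template with a single data member qualifies, while a class that mixes inherited and own data members, or one that holds a reference, does not.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for a bitfield template: one data member, nothing else.
template <typename T, unsigned lo, unsigned hi> struct Field { T value; };

struct Base { uint16_t data; };
struct Mixed : Base { uint16_t more; };  // data members in both base and derived class
struct WithRef { uint16_t& ref; };       // reference data member

using LowByte = Field<uint16_t, 0, 7>;

constexpr bool fieldIsStandardLayout   = std::is_standard_layout<LowByte>::value;
constexpr bool mixedIsStandardLayout   = std::is_standard_layout<Mixed>::value;
constexpr bool withRefIsStandardLayout = std::is_standard_layout<WithRef>::value;

static_assert(fieldIsStandardLayout, "single-member template is standard layout");
static_assert(!mixedIsStandardLayout, "inherited plus own data members is not");
static_assert(!withRefIsStandardLayout, "a reference member is not");
```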
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

I spent the last day converting my audio resampler from a Blackman-windowed sinc FIR + integral decimator over to a direct form II quadratic Butterworth IIR. That gives a nice 10-15% speedup to SGB, DMG, CGB, WS, WSC emulation. Possibly a lot more than that on platforms with weak floating point. It also sounds better, surprisingly. Better attenuation of frequencies beyond 20 kHz. And we can easily chain more passes to keep making it sound better, but at 44.1 kHz+ output, there's plenty of headroom for being slightly less ideal than a brick wall response.
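For readers unfamiliar with the filter type, here is a minimal sketch of one second-order Butterworth low-pass section in direct form II. This is not the higan resampler; the Biquad name and its members are made up for the example, and the coefficients come from the widely used bilinear-transform low-pass formulas with Q fixed at 1/sqrt(2).

```cpp
#include <cmath>

// Hypothetical single biquad section; chain several for a steeper roll-off.
struct Biquad {
  double b0, b1, b2, a1, a2;  // coefficients, normalized so that a0 == 1
  double w1 = 0.0, w2 = 0.0;  // the two delay elements of direct form II

  Biquad(double cutoffHz, double sampleRateHz) {
    const double pi = 3.14159265358979323846;
    const double w0 = 2.0 * pi * cutoffHz / sampleRateHz;
    const double alpha = std::sin(w0) / std::sqrt(2.0);  // sin(w0) / (2Q), Q = 1/sqrt(2)
    const double a0 = 1.0 + alpha;
    b0 = (1.0 - std::cos(w0)) / 2.0 / a0;
    b1 = (1.0 - std::cos(w0)) / a0;
    b2 = b0;
    a1 = -2.0 * std::cos(w0) / a0;
    a2 = (1.0 - alpha) / a0;
  }

  double process(double x) {
    const double w = x - a1 * w1 - a2 * w2;       // feedback half
    const double y = b0 * w + b1 * w1 + b2 * w2;  // feedforward half
    w2 = w1;
    w1 = w;
    return y;
  }
};
```

Each sample costs five multiplies and four adds, which is where the speedup over a long windowed-sinc FIR comes from. A constant input settles to the same constant (unity DC gain), and a signal alternating at the Nyquist frequency settles to zero.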

Also refactored the SFX switch table with the opcode&15 trick. That shaved off an additional 30% of the line count, very nice!

Still haven't toyed around with a template bitfield class just yet, but I'll get to that soon.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

Code: Select all

template <typename T, unsigned i, unsigned j> class bitfield {
  // allow lo and hi to be specified in either order
  enum : unsigned { lo = i < j ? i : j, hi = i > j ? i : j };
  static_assert(hi < sizeof(T) * 8, "bitfield does not fit in type");
  enum : T { mask = (T)(((1ULL << (hi - lo)) + (1ULL << (hi - lo)) - 1) << lo) };
  T value;

  // helper for optimized assignment, requires newvalue to be pre-shifted
  bitfield& assign(T newvalue) { value = value & ~mask | newvalue & mask; return *this; }
public:
  // value >> lo & mask >> lo produces smaller code than (value & mask) >> lo
  operator T() const             { return value >> lo & mask >> lo; }
  explicit operator bool() const { return value & mask; }

  // optimized assignment--avoid doing two shifts
  bitfield& operator++() { return assign(value + ((T)1 << lo)); }
  bitfield& operator--() { return assign(value - ((T)1 << lo)); }
  T operator++(int) { T r = *this; assign(value + ((T)1 << lo)); return r; }
  T operator--(int) { T r = *this; assign(value - ((T)1 << lo)); return r; }
  bitfield& operator=(T other)  { return assign(other << lo); }
  bitfield& operator&=(T other) { return assign(value & (other << lo)); }
  bitfield& operator|=(T other) { return assign(value | (other << lo)); }
  bitfield& operator^=(T other) { return assign(value ^ (other << lo)); }
  bitfield& operator+=(T other) { return assign(value + (other << lo)); }
  bitfield& operator-=(T other) { return assign(value - (other << lo)); }
  // shift only this field's own bits so neighbouring fields can't leak into it
  bitfield& operator<<=(unsigned bits) { return assign((value & mask) << bits); }
  bitfield& operator>>=(unsigned bits) { return assign((value & mask) >> bits); }

  // these aren't so optimized... shouldn't be needed much anyway
  bitfield& operator*=(T other) { return assign((*this * other) << lo); }
  bitfield& operator/=(T other) { return assign((*this / other) << lo); }
  bitfield& operator%=(T other) { return assign((*this % other) << lo); }

  // ensure bitfield-to-bitfield assignment preserves other bits
  // this requires C++11 unrestricted unions
  bitfield& operator=(bitfield other) { return assign(other.value); }
};

template <typename T, unsigned pos> class bitflag {
  static_assert(pos < sizeof(T) * 8, "bitflag does not fit in type");
  enum : T { mask = (T)(1ULL << pos) };
  T value;

  // shift a different bitflag so its significant bit is aligned with ours
  template<typename otherT, unsigned otherpos>
  T align(bitflag<otherT, otherpos> other) {
    return (pos > otherpos) ? other.value << (pos - otherpos) :
           (otherpos > pos) ? other.value >> (otherpos - pos) :
           other.value; }

  // constructor used by the optimized bitwise operators
  // this requires C++11 unrestricted unions
  bitflag(T v) : value(v) {}
public:
  // even in C++11, unions require a trivial default constructor
  bitflag() = default;

  operator bool() const { return value & mask; }

  // don't use other as a number--should allow more optimal codegen in more situations
  bitflag& operator=(bool other)  { value = other ? value | mask : value & ~mask; return *this; }
  bitflag& operator&=(bool other) { if(!other) value &= ~mask; return *this; }
  bitflag& operator|=(bool other) { if(other) value |= mask; return *this; }
  bitflag& operator^=(bool other) { if(other) value ^= mask; return *this; }

  // these return a bitflag, not a bool--allows them to be chained without redundant masking
  template<typename otherT, unsigned otherpos>
  bitflag operator&(bitflag<otherT, otherpos> other) { return bitflag(value & align(other)); }
  template<typename otherT, unsigned otherpos>
  bitflag operator|(bitflag<otherT, otherpos> other) { return bitflag(value | align(other)); }
  template<typename otherT, unsigned otherpos>
  bitflag operator^(bitflag<otherT, otherpos> other) { return bitflag(value ^ align(other)); }

  // ensure bitflag-to-bitflag assignment preserves other bits
  // this requires C++11 unrestricted unions
  bitflag& operator=(bitflag other) { return operator=((bool)other); }

  // optimized bitwise operators need all bitflags to be friends with each other
  template<typename otherT, unsigned otherpos> friend class bitflag;
};
I did a really quick-and-dirty conversion of the 65816 flag_t, ignoring serialization for the moment:

Code: Select all

union flag_t {
  uint8_t all;
  bitflag<uint8_t, 7> n;
  bitflag<uint8_t, 6> v;
  bitflag<uint8_t, 5> m;
  bitflag<uint8_t, 4> x;
  bitflag<uint8_t, 3> d;
  bitflag<uint8_t, 2> i;
  bitflag<uint8_t, 1> z;
  bitflag<uint8_t, 0> c;

  inline operator unsigned() const { return all; }

  inline unsigned operator=(uint8 data) { return all = data; }

  inline unsigned operator|=(unsigned data) { return operator=(operator unsigned() | data); }
  inline unsigned operator^=(unsigned data) { return operator=(operator unsigned() ^ data); }
  inline unsigned operator&=(unsigned data) { return operator=(operator unsigned() & data); }
};
Shit works. Didn't have to touch a single line outside registers.hpp. Doom ingame and Masoukishin title screen are within 1FPS of before. The Sufami Turbo BIOS "please insert a cartridge" screen is a couple FPS faster. I didn't expect significant gains; I was just crossing my fingers that it wouldn't get significantly slower. Like I said, the real win with these is things like the GSU flags register where you can now access any of alt1, alt2 and alt for free without explicit bit-twiddling.
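The GSU case can be sketched like this. BitRange is a cut-down stand-in for the bitfield template above, and the bit positions (alt1 at bit 8, alt2 at bit 9 of the status register) are my assumption for the example rather than something taken from the higan source. It also relies on reading a union member other than the one last written, which GCC and Clang support but the C++ standard does not strictly guarantee.

```cpp
#include <cstdint>

// Cut-down stand-in for the bitfield template: only read and assign.
template <typename T, unsigned lo, unsigned hi> class BitRange {
  static_assert(lo <= hi && hi < sizeof(T) * 8, "range does not fit in type");
  enum : T { mask = (T)(((1ULL << (hi - lo)) * 2 - 1) << lo) };
  T value;
public:
  operator T() const { return T((value & mask) >> lo); }
  BitRange& operator=(T v) {
    value = T((value & ~mask) | ((v << lo) & mask));  // leave the other bits alone
    return *this;
  }
};

// Hypothetical status register: three overlapping views of the same 16 bits.
union Sfr {
  uint16_t all;
  BitRange<uint16_t, 8, 8> alt1;
  BitRange<uint16_t, 9, 9> alt2;
  BitRange<uint16_t, 8, 9> alt;  // both bits as one 2-bit number
};
```

Setting sfr.alt2 = 1 makes sfr.alt read as 2 with no extra code, and writing sfr.alt = 3 sets both flags at once. Only the bits inside each range are touched, so the rest of the register survives every assignment.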

Please try it out with the current, de-template-hellified higan 65816.

Oh, and you really don't want to default to providing the whole set of assignment and unary operators for the underlying data of a set of bitfields/bitflags, even if you could work out some syntax that let you do it. Yes, for Reg16 it's slightly more convenient not having to type .w, but what about a CPU flags word that has some bits that are unused and always 0? You want those 0 bits to be enforced on assignment, and adding/subtracting/multiplying/dividing a set of flags makes no sense. Each union of bitfields/bitflags should provide only the operators that make sense for it, on a case by case basis.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

Still working on my audio filtering. It has near-zero impact on the SNES (save for SGB), but it's of critical importance to all of my other emulators.

But yes, I'll definitely implement this and give it a go. I don't really care whether P is eight booleans or one bit-packed uint8. But if there's really no speed hit, then I'll go with yours since it makes the cast and assignment operators simpler.

As for these:

Code: Select all

  inline unsigned operator|=(unsigned data) { return operator=(operator unsigned() | data); }
  inline unsigned operator^=(unsigned data) { return operator=(operator unsigned() ^ data); }
  inline unsigned operator&=(unsigned data) { return operator=(operator unsigned() & data); }
At this point, I'd probably just write it like this:

Code: Select all

  inline auto& operator|=(uint data) { return all |= data; }
  inline auto& operator^=(uint data) { return all ^= data; }
  inline auto& operator&=(uint data) { return all &= data; }
(You could return uint as the type, but assignment is supposed to return *this, usually. We don't do anything stupid with chained P assignments, so it really doesn't matter.)

> Oh, and you really don't want to default to providing the whole set of assignment and unary operators for the underlying data of a set of bitfields/bitflags, even if you could work out some syntax that let you do it.

Well it's not the end of the world to not have that.

So for the 65816, right now I have this layout:

Code: Select all

.l = bits 0-7 (low)
.h = bits 8-15 (high)
.b = bits 16-23 (bank)
.w = bits 0-15 (word)
.d = bits 0-31 (doubleword)
The last one's kind of stupid. Normally, I'd want .l (long), but that's already taken. How would you feel about .a = bits 0-23 (absolute, or all)?
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

byuu wrote:So for the 65816, right now I have this layout:

Code: Select all

.l = bits 0-7 (low)
.h = bits 8-15 (high)
.b = bits 16-23 (bank)
.w = bits 0-15 (word)
.d = bits 0-31 (doubleword)
The last one's kind of stupid. Normally, I'd want .l (long), but that's already taken. How would you feel about .a = bits 0-23 (absolute, or all)?
If you use these for 65816 registers and leave all the regs.pc.l = read(addr); regs.pc.h = read(addr+1) stuff as-is, the compiler's going to generate a lot of superfluous masking operations. Where possible, replace that stuff with shifts-and-ors into a temporary variable (unsigned temp = read(addr); temp |= read(addr+1) << 8; pc.w = temp;)

I'm about to rewrite the 65816 so that by #including different versions of register.hpp I can choose between the old uint8/uint16 unions, these, and masking via lookup values stored in static const arrays (using three different definitions of reg16 and three versions of setreg(reg16 &reg, unsigned value, bool size))

I'm also going to do a bunch of other cleanups at the same time--makes no sense to compare different implementations for performance when the rest of the core is in a known sub-optimal state.

The 65816 has a bunch of member variables that are only used within an instruction, left over from when it was a state machine pre-libco. At one point I tried replacing them all with local variables but the code got bigger and slower--I think it's register starved because of the need to constantly call virtual functions for read/write/io cycles, so adding even one local variable to an opcode handler results in a stack frame being needed when there was none before. Note that this was before switchifying and de-templating. I'm hoping for better results after my cleanups.

Oh, and if you use these for the 65816 flags you're going to have to rewrite sec/clc/sei/cli/etc. so they don't take a reference to a bool (I already rewrote them as seven separate functions; that's why my bitflags were a drop-in replacement)

While we're being clever and evil, how do you feel about a class for the 65816 PC that has a postfix ++ operator that returns a copy of all 24 bits but only increments the lower 16? You just have to make sure you don't use that operator anywhere except readpc()...
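A sketch of what that could look like, assuming the PC is kept as one 24-bit value in a 32-bit integer. ProgramCounter is a hypothetical name for the illustration, not the existing reg24_t.

```cpp
#include <cstdint>

// Hypothetical 65816 program counter: bits 0-15 are the offset, bits 16-23 the bank.
struct ProgramCounter {
  uint32_t value = 0;

  // Postfix ++ returns the full 24-bit address, then advances only the
  // low 16 bits, so execution wraps within the current bank.
  uint32_t operator++(int) {
    const uint32_t old = value;
    value = (value & 0xff0000) | ((value + 1) & 0x00ffff);
    return old;
  }
};
```

readpc() could then be written as return read(pc++);. Starting from 0x12ffff, the call reads from 0x12ffff and leaves the PC at 0x120000 rather than 0x130000.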
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

I hope you can understand how, even if it ends up with extra masking (and an extra inc in this example):
pc.lo = read(addr++);
pc.hi = read(addr++);

is much nicer to read than:
uint16_t temp = read(addr);
temp |= read(addr + 1) << 8;
pc.word = temp;

This is where the two of us have our biggest disagreement, I suppose. I'm willing for my code to run a little slower, if it means that it looks a whole lot nicer and is a lot more compact. I'm not willing to sacrifice half my speed for that small change above, but if the difference of unrolling all my 16-bit reads to be like yours throughout all my CPU cores is a 1% speed boost, that's absolutely not worth it to me at all. Yes, I understand these little 1fps boosts add up. Throw in 100 such changes and you've potentially just doubled the speed of the emulator! But now you have a codebase I don't want to be the primary maintainer of; because it will be much harder for me to reason about and fix bugs in.

I think our principal disagreement is that the two of us are trying to achieve our separate goals with the same codebase. That's never what I intended for bsnes. I can't stop you from doing what you want on your fork (that's the GPL for you), but I feel you'd be much better off writing an SNES emulator from the ground up focused on speed. I would even be willing to work with you and help you to make that faster, by writing code in your style with your suggestions to try and get stuff faster at the expense of the niceness of the code.

At the rate you're going, you'll end up having rewritten every single piece of bsnes, little by little, and it'll be a good bit faster, but you won't ever get it as fast as you could with a clean start. And it'll forever be considered a fork of my code instead of your own code. Which wouldn't be fair if you rewrote everything yourself over time.

Note that this isn't to say I'm not appreciative of all the optimizations and fixes you're providing that I get to incorporate back into my own emulator, which I wouldn't receive if you had your own entirely separate project. I'm just speaking about what I feel is strictly best for you and your goals, at the expense of mine. Your own project would likely be a lot more popular than a fork of mine, which would cannibalize my userbase even more, so it'd be much worse for me if you actually took my advice above.

> The 65816 has a bunch of member variables that are only used within an instruction, left over from when it was a state machine pre-libco.

What I liked about them was not having to add extra "reg24_t aa, rd;" etc definitions at the top of each opcode instruction.

One way to reduce them: I was using them for no reason in things like the IRQ opcode stuff earlier, copying reads into rd and then just transferring that over to regs.pc instead of just writing directly to regs.pc on each read.

> Oh, and if you use these for the 65816 flags you're going to have to rewrite sec/clc/sei/cli/etc. so they don't take a reference to a bool

Another easy solution would be to have op_setflag(uint bit) and op_clearflag(uint bit) and just do:
regs.p |= 1 << bit;
regs.p &= ~(1 << bit);

> While we're being clever and evil, how do you feel about a class for the 65816 PC that has a postfix ++ operator that returns a copy of all 24 bits but only increments the lower 16?

Sounds a little too evil for me, but you may be able to persuade me with examples. If we can only use it in readpc() one time, then why bother with the extra code for it?
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

I haven't even tested which is faster out of size punning, bitfields and mask lookup tables yet. Let's get there before we start worrying about performance versus prettiness. Also, I think we both need to stop psychoanalyzing each other; I don't think either of us is very good at it.

bitfields are never going to replace size punning as a generic solution for CPU registers. They happen to work for the 65816 because it barely does any operations specifically on the upper byte(s) of a register, but for something like z80 you need to do the full range of ALU operations on b, c, d, e, h or l, and likewise on x86 with ah/al etc. I think we need to concentrate on applications like flags and the coarse/fine fields of the PPU scroll registers.
byuu wrote:What I liked about them was not having to add extra "reg24_t aa, rd;" etc definitions at the top of each opcode instruction.

This is C++, not C. You don't declare local variables at the top of functions. Well, I guess you didn't know that in 2005 when you wrote that code :)

byuu wrote:If we can only use it in readpc() one time, then why bother with the extra code for it?

Yeah, it's a bit silly for an operation that's only done in one place. A better example of a refactoring I'd like to do is r.p.setnz(result, size) instead of r.p.n = result & 0x80; r.p.z = (result == 0); repeated in every opcode. As well as being less repetitive, it makes the few instructions that only affect z and not n stand out more.
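Roughly what I have in mind, as a sketch only. StatusFlags is a hypothetical stand-in for the real flags type, and I'm assuming size is true for 8-bit operations and false for 16-bit ones.

```cpp
// Hypothetical stand-in for the 65816 flags; only n and z are shown.
struct StatusFlags {
  bool n = false;
  bool z = false;

  // size == true: 8-bit result, size == false: 16-bit result (assumed convention)
  void setnz(unsigned result, bool size) {
    const unsigned mask = size ? 0xff : 0xffff;
    const unsigned sign = size ? 0x80 : 0x8000;
    n = (result & sign) != 0;  // top bit of the operand width
    z = (result & mask) == 0;  // ignore any carry out above the width
  }
};
```

Masking inside setnz also means a handler can pass an unmasked intermediate result, such as 0x100 from an 8-bit add with carry out, and still get z set correctly.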
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> Also, I think we both need to stop psychoanalyzing each other; I don't think either of us is very good at it.

You're especially challenging, yeah. You keep telling me performance isn't your goal, but all of your code changes always seem to be directly about what's more performant. Very mixed messaging. I really don't know what your goals are.

For the record, I'm okay with wherever you are and whatever your ultimate goal is. As long as we're free to disagree on certain changes, then I've been really enjoying working with you on improving the things we do agree on. I'm elated about the improvements to the SuperFX core, and I like your suggestions with this bitfield class (even if I am a bit surprised you suggested it.)

I've been fighting back and forth on that for a while with the GBA CPU. This may be just the thing I need. The GBA has 32-bit registers with multiple fields that span byte boundaries, yet you can write to the regs in 8/16/32-bit blocks. I've been using .field = byte.bits(n, n+length) style coding for that, but I think this could be even better.

... or possibly worse. It will put a burden to extract every time we want to -read- bits. So even if a register is written once per frame, if it's read once per pixel, that's a very bad trade-off if the read now has to do (v>>start)&mask to get us a usable value.

Always hard to say without actually testing these things. So far, it looks to work just fine for flag registers, at least.

> This is C++, not C. You don't declare local variables at the top of functions.

It still has to be on its own line, separate from the assignments to .l and .h.

We're talking a good 10-20% increase in code size to not have those scratchpad variables defined.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

AWJ wrote:This is C++, not C. You don't declare local variables at the top of functions. Well, I guess you didn't know that in 2005 when you wrote that code :)
I thought C allowed declarations after a statement since C99.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Post by AWJ »

byuu wrote:I've been fighting back and forth on that for a while with the GBA CPU. This may be just the thing I need. The GBA has 32-bit registers with multiple fields that span byte boundaries, yet you can write to the regs in 8/16/32-bit blocks. I've been using .field = byte.bits(n, n+length) style coding for that, but I think this could be even better.

... or possibly worse. It will put a burden to extract every time we want to -read- bits. So even if a register is written once per frame, if it's read once per pixel, that's a very bad trade-off if the read now has to do (v>>start)&mask to get us a usable value.
bitflag<> should be as fast as native bools, as long as you only use them in a boolean sense and not as a number (flags.foo ? dothis() : dothat() is OK, but x = lookup[flags.foo] will have overhead)

bitfield<>, yeah, if they're not byte aligned and are read extremely often you'll probably want to extract them into separate variables. You can still use a bitfield union, only use it as a temporary during your MMIO handler.

If you have an n : 17 <= n <= 31 bit bitfield with unused bits above it (the 65816 PC, the GBA scroll registers, etc.), you can define a 32 bit bitfield that overlaps it and use the full-word one for reading and the exact-width one for writing. That will pretty much duplicate the behaviour of Natural<>. Just make sure you clear the unused bits at least on initialization and serialization. That reminds me, I still have to add serialization to these. I noticed that your Natural<> and Integer<> templates simply serialize the underlying native type, without doing any checking or masking; I suggest that you make them mask out the unused bits on load as a defensive measure.

I did some experimentation today with a bitfield-based reg16 and discovered that clang and gcc generate optimal code for writing to reg.h (they directly write the second byte without any masking or shifting) but not for reg.l (they always read, mask, and write back the full 16 bits). WTF, compilers?

ETA: seems to be a GCC bug that was fixed a while back, guess I need to update: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15184

Also, I misremembered what clang's exact problem was. clang correctly generated a single instruction for reg.l = value;, but for size ? reg.l = value : reg.w = value; (which is more than half the instructions on the 65816) it generated the whole masking enchilada for the reg.l branch (and changing the ternary to an if didn't make a difference)
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

Here's another interesting CPU edge case, emulation mode BRK instructions:

Image

The manual says they don't exist. I have them set to $fffe, which is emulation mode IRQ. But I certainly don't remember testing or confirming this behavior. It feels like it should be $fff6, which is otherwise unused.

I understand that the B bit doesn't exist in the processor status flags. Just, seems odd. Would have been nice if the official manual confirmed this behavior at least.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

I refactored all the template functions out of the SPC700 core. Drops the object file size from ~175kb to ~125kb, does not incur any speed penalty. Also used macros for the opcode switch table, which reduced a ton of red tape.

All that's left now are the 6502 and LR35902 (GBZ80) cores. Then we won't have any more cores with templated opcodes.
Nicole
Posts: 218
Joined: Sun Mar 27, 2016 7:56 pm

Post by Nicole »

On the 6502, both external IRQs and BRK instructions used the IRQ vector; it wasn't until the 65816 that they had separate vectors. The break flag was used to differentiate the two, and that flag does exist in emulation mode.

(The reason there seems to be a hole there in the middle of the vectors is because of the addition of the COP vector; that vector and instruction don't exist on a 6502, but do exist in the 65816's emulation mode, since all 65816 instructions can be used in that mode.)
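Put as code, the vector selection looks something like the sketch below. The function and enum names are made up for the illustration; the addresses are the ones listed in the WDC datasheet, with BRK and IRQ sharing $fffe in emulation mode and the gap at $fff6 left unused.

```cpp
#include <cstdint>

enum class Interrupt { Cop, Brk, Abort, Nmi, Irq };

// Hypothetical helper: address of the 16-bit vector fetched for each interrupt source.
inline uint16_t vectorAddress(Interrupt source, bool emulationMode) {
  //                                  Cop     Brk     Abort   Nmi     Irq
  static const uint16_t native[]    = {0xffe4, 0xffe6, 0xffe8, 0xffea, 0xffee};
  static const uint16_t emulation[] = {0xfff4, 0xfffe, 0xfff8, 0xfffa, 0xfffe};
  const int index = static_cast<int>(source);
  return emulationMode ? emulation[index] : native[index];
}

// Emulation mode only: the copy of P pushed to the stack has bit 4 (B) set
// for BRK and clear for a hardware IRQ, which is how a shared handler tells them apart.
inline uint8_t pushedStatus(uint8_t p, Interrupt source) {
  return source == Interrupt::Brk ? (p | 0x10) : (p & ~0x10);
}
```

So an emulation-mode handler at $fffe would pull the saved P off the stack and test bit 4 to decide whether it got there through BRK or IRQ.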