higan CPU emulation mode bug? (attn: byuu or any 65816 guru)

Discussion of hardware and software development for Super NES and Super Famicom.

AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: higan CPU emulation mode bug? (attn: byuu or any 65816 guru)

Post by AWJ »

We code golf now? This compiles to a several hundred bytes smaller object file :) Note that the addr parameters of the inlines that wrap within a bank are now uint16_t, and adding the DP or SP register is done by assigning to addr.

Code: Select all

// direct page address: wraps within bank zero
// if in emulation mode and direct page is aligned, wraps in 256-byte page
alwaysinline uint8_t op_readdp(uint16_t addr) {
  addr = regs.d + (regs.e && !(regs.d & 0xff) ? addr & 0xff : addr);
  return op_read(addr);
}

// direct page address without 6502 wrapping emulation
// used by pei and indirect long instructions
alwaysinline uint8_t op_readdpn(uint16_t addr) {
  addr += regs.d;
  return op_read(addr);
}

// stack relative: wraps within bank zero
alwaysinline uint8_t op_readsp(uint16_t addr) {
  addr += regs.s;
  return op_read(addr);
}
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

Don't ... make me code golf. You won't like me when I code golf. BYUU SMASH!! (the stack frame)

Post by AWJ »

Macroizing the M and X flag checks the way you did for the E flag isn't gonna work out, because most of the time which flag you have to check is variable. Like for op_read_const(fp op) (and everything else in opcode_read.cpp), if "op" is LDA or ADC you have to check the M flag, but if it's LDY or LDX you have to check the X flag.

Post by AWJ »

https://github.com/awjackson/bsnes-clas ... d83b83c10c

Remember how a while back I converted the superfx opcode dispatch to a switch table, and discovered that duplicating the conditional branch instructions was faster than unifying them? Today I figured out how to eliminate the duplicate code without a speed penalty: move the branch test out of the common handler and into the dispatch macros.

You should try this with the 65816 (I haven't gotten around to porting your switch table conversion yet) and see what impact it has. Luckily, taken conditional branches and the BRA instruction are identical cyclewise--you just have to add an "untaken branch" pseudo-instruction (you could reuse op_wdm since it's exactly what you need, a two-byte two-cycle NOP)

Post by Near »

> move the branch test out of the common handler and into the dispatch macros.

Interesting! I guess -O3 wasn't inlining things enough to do that for us.

> I haven't gotten around to porting your switch table conversion yet

Funny, I haven't gotten around to porting yours for the SuperFX yet ;)

> You should try this with the 65816

Perhaps, when I clear my list a little. I'm still in the middle of a massive code cleanup to unify things around camelCase, and gut a lot of old, nasty coding patterns.

But speaking of performance enhancements, I believe the real meat is in the PPU still. Just force inlining the hell out of functions in there helped, but there must be more we can do without resorting to things like tile decoding caches.

I know I can't ever reach balanced-profile performance levels on the accuracy-profile, but I'd like to try and get the accuracy one more reasonable now that it's my only maintained core. I relied too heavily on, "well there are faster cores!" when writing code for it, so I've started to let things deteriorate awfully close to my limit of "must run at full speed on modern CPUs" ... the Cx4, SFX2 and ST018 have moments where I get dangerously close on my i7 2600k.

> you could reuse op_wdm

How I wish Bill had used that instruction as a prefix to override the P flag size for a given instruction, and made it a chained sequence so there was no wasted I/O cycle (so it was just one extra cycle of overhead), eg:

lda #$01
wdm; lda #$0102

Post by AWJ »

byuu wrote:
>> move the branch test out of the common handler and into the dispatch macros.
>
> Interesting! I guess -O3 wasn't inlining things enough to do that for us.

We don't want the compiler to inline half a dozen entire copies of the branch instruction--we're trying to make the code smaller! We only want the branch-or-don't-branch decision inlined, and the way to do that is to move it outside the function.

I think I've ranted at you about this before, but unless a function is a one-liner that costs nothing to alwaysinline, you really shouldn't be writing functions that look like this:

Code: Select all

void do_something_or_other(bool which) {
  if(which) {
    // ...
    // something
    // ...
  } else {
    // ...
    // something completely different
    // ...
  }
  // maybe one little line that's common to both
}
byuu wrote:
> But speaking of performance enhancements, I believe the real meat is in the PPU still. Just force inlining the hell out of functions in there helped, but there must be more we can do without resorting to things like tile decoding caches.

There are two low hanging fruits in the PPU that I've noticed: the sprites and the windows. The sprites should render into a line buffer once each scanline (like the real hardware does), rather than searching through the list of 32 active sprites every pixel. This is a pure win--it's faster, more accurate to hardware, and should be equally simple if not simpler code.

As for the windows, you have them doing a large amount of branch-heavy calculations, on one bit at a time for six output bits, once for every pixel of the screen. There are two things you could do that would greatly speed things up, but you aren't going to like either of them.

One is to cache the window outputs for each combination of in/out/W1/W2, and recalculate them only when one of the window-related registers is written to. That reduces the per-pixel work to testing the X coordinate against the window coordinates and selecting one of the four precalculated results.

The other is to calculate all six bits at once, rather than one at a time, using a lot of masking. I mostly figured out how to do this "on paper" a long time ago, around when I did those hires color math tests for you, but I didn't finish it because I hadn't officially forked bsnes-classic yet and I knew you would never accept that code in mainline (at the time you were very emphatic "I don't care how slow accuracy is because there's balanced and performance for that")

Caching the window outputs is likely to have a bigger impact than changing how they're calculated, especially if you're willing to store them as byte-packed bitmasks and not as a kazillion bools.

Changing the subject, someone on the 6502.org forums was taking requests to test 65816 behaviour with a hardware test rig, so I signed up to ask him about the quirk you discovered with interrupts and two-cycle implied mode instructions. It turns out that it happens on a standard 65816--it's not specific to the S-CPU. The I/O cycle of those instructions mutates into an "operand" fetch (the 65816 has sufficient output signals to distinguish between opcodes, operands, interrupt vectors, data, and I/O cycles. The only one of the five that the S-CPU bothers to distinguish is I/O cycles, but the SA-1 presumably uses those signals more extensively, in order to fetch vectors from internal registers like it does and to implement the prefetch queue it must have)
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

AWJ wrote:
> There are two low hanging fruits in the PPU that I've noticed: the sprites and the windows. The sprites should render into a line buffer once each scanline (like the real hardware does), rather than searching through the list of 32 active sprites every pixel. This is a pure win--it's faster, more accurate to hardware, and should be equally simple if not simpler code.

I wonder if this is due to confusion with the NES PPU, which does use a set of eight shift registers (as far as I understand Visual 2C02). Yet I know the Neo Geo has a line buffer because it's stored on a discrete SRAM. When exactly did console PPUs switch from individual shift registers to a line buffer?
  1. The 2.5 generation (CreatiVision, ColecoVision, SG-1000)
  2. The third generation (NES, Master System)
  3. The 3.5 generation (TurboGrafx-16)
  4. The fourth generation (Genesis, Super NES, Neo Geo)

Post by Near »

Okay, I went to town on cleaning up the SuperFX core (processor/gsu).

Source: files.byuu.org/temp/higan_20160530.tar.xz

What I did was:
* remove the jump table and replace it with a switch table
* convert template arguments to function arguments
* merge the alt2/alt1 checks into the opcodes themselves
* merge the branch functions to just one test

This reduced the number of instruction functions from 81 (multiplied a lot thanks to templates) to 40. We also no longer need to compute the opcode to execute as alt2<<9 | alt1<<8 | peekpipe(), but instead as just peekpipe(), which cuts our switch table from 1024 entries down to 256.

We already checked sfr.b for certain opcodes (from/moves and to/move), so it made sense to handle sfr.alt2 and sfr.alt1 the same way. This also led to a huge decrease in duplicated code. We went from 695 lines of code to 424 lines of code for all instructions. As a whole, the core went from 42.7KiB to 34.0KiB.

This also led to a substantial decrease in compiled object size. We dropped from 497KB to 100KB (not KiB, but 1000s) for processor-gsu.o

And the best part of all ... no speed hit whatsoever.

If you don't want to download the source, here's a quick source link:

http://hastebin.com/zopovafifi.coffee

AWJ, if you're looking to speed things up even more, you could still use your ternary branch unroll macro trick to avoid a pointless test on the bra() case. I would also recommend adding alwaysinline tags to the hottest instructions. And perhaps most importantly, it might make sense to store regs.sfr.alt as a 2-bit value (0-3) instead of separating the flags. Or possibly do both. That would avoid some cases where we have to do two or more alt tests in some instructions.

Also, for other CPUs ... (regs.sfr.alt1 ? regs.sfr.cy : 0) may be faster as regs.sfr.alt1*regs.sfr.cy instead.

EDIT: we can reduce the #define op sizes with a simple trick. Instead of:
case id+#: return op_##name(#);
We can use:
case id+#: return op_##name(#&15);

Then we can make an op4() for link, make op6() inherit op4(), make op12() inherit op6(), and replace ophi() with op15(). Of course, if we really wanted to code golf, it's probably not too difficult to write a preprocessor for-loop to make the repetition count a parameter to the op() macro. But, yeah, screw that.

..........

> We don't want the compiler to inline half a dozen entire copies of the branch instruction

The conditional branch instruction is literally just:
auto GSU::op_branch(bool c) { regs.r[15] += c ? (int8)pipe() : 0; }

And pipe() can't be inlined since it's an abstract virtual function. I'm not too worried about it bloating out the code size from inlining that.

> I think I've ranted at you about this before, but unless a function is a one-liner that there's no cost to alwaysinline-ing, you really shouldn't be writing functions that look like this:

... heheh. You're gonna hate my new GSU core, then. Darn, I was hoping you'd like it, too. I'm very happy with my numbers (a ~5x object size decrease, a 20% total code size decrease, no speed hit, *much* more DRY than before.)

.......

> http://forum.6502.org/viewtopic.php?f=4&t=4129

Fascinating! Thanks for bringing that up there! I've always wondered what the hell that was about. Though I still don't fully understand the why part behind it.

I'd love to try and pick their brains and see if they can find any more obscure timing quirks that we don't know about. I only discovered this one by a very fortunate fluke, and because I was so obsessive about having my H/V counter latches be absolutely perfect for my test ROMs. I'm sure there has to be more things like this, and knowing them might even improve our chances at not failing things like the Super Bonk intro or Magical Drop 2 sound thing.

Post by AWJ »

byuu wrote:
> AWJ, if you're looking to speed things up even more, you could still use your ternary branch unroll macro trick to avoid a pointless test on the bra() case. I would also recommend adding alwaysinline tags to the hottest instructions. And perhaps most importantly, it might make sense to store regs.sfr.alt as a 2-bit value (0-3) instead of separating the flags. Or possibly do both. That would avoid some cases where we have to do two or more alt tests in some instructions.

That's similar to what I did, but I left some (not all) of the alt instructions as separate functions: mainly the ones where there's literally no code in common, plus cmp, because it violates the "!alt2 is register, alt2 is immediate" pattern and would require excessive, confusing bit-twiddling of the alt flags to merge it into sub/sbc. I also wrote the macros in the switch table somewhat differently:

https://github.com/awjackson/bsnes-clas ... _table.cpp

I'll test your version and tell you which is smaller and which is faster (using a performance build since it's easier to detect small performance differences when the time spent outside the GSU is minimized)

Re the alt flags, I'm currently doing some tests with Evan Teran's templated bitfields (fixing the correctness issue I pointed out in a comment on his blog, rewriting them to nall style circa bsnes 073, and adding serialization). If I'm satisfied with the code the compiler generates in all my tests, I'm going to start using them throughout bsnes-classic, starting with the 65816 and GSU. With his bitfields it's possible to access any of regs.sfr.alt, regs.sfr.alt1 or regs.sfr.alt2 with no getters/setters and no explicit masking in the client code.

One more thing: I'd advise reverting the split you did between the /processor/ and /sfc/coprocessor/ trees and just putting the whole GSU/SuperFX into one class again. The plotting stuff is deeply integrated into the CPU (it's done through special instructions rather than MMIO) and unlike, say, 65816 and ARM, we don't have any other CPUs using that instruction set to compare to--the decision which parts of the architecture are "generic" and which are "SNES-specific" is completely arbitrary.

Post by AWJ »

tepples wrote:
>> There are two low hanging fruits in the PPU that I've noticed: the sprites and the windows. The sprites should render into a line buffer once each scanline (like the real hardware does), rather than searching through the list of 32 active sprites every pixel. This is a pure win--it's faster, more accurate to hardware, and should be equally simple if not simpler code.
>
> I wonder if this is due to confusion with the NES PPU, which does use a set of eight shift registers (as far as I understand Visual 2C02). Yet I know the Neo Geo has a line buffer because it's stored on a discrete SRAM. When exactly did console PPUs switch from individual shift registers to a line buffer?
>   1. The 2.5 generation (CreatiVision, ColecoVision, SG-1000)
>   2. The third generation (NES, Master System)
>   3. The 3.5 generation (TurboGrafx-16)
>   4. The fourth generation (Genesis, Super NES, Neo Geo)
What's the per-scanline sprite limit on the TG16? I would guess that PPUs with a ratio close to 100% (you can completely cover a scanline with sprites) or greater than it are using a line buffer, because at that point a line buffer takes fewer transistors than per-sprite shift registers.

Post by tepples »

AWJ wrote:
> I'd advise reverting the split you did between the /processor/ and /sfc/coprocessor/ trees and just putting the whole GSU/SuperFX into one class again. The plotting stuff is deeply integrated into the CPU (it's done through special instructions rather than MMIO) and unlike, say, 65816 and ARM, we don't have any other CPUs using that instruction set to compare to--the decision which parts of the architecture are "generic" and which are "SNES-specific" is completely arbitrary.

There exist other processors using instruction sets derived from that of the Super FX. It appears they're used in embedded applications when even ARM is too heavy. What would it take to obtain info about other applications of ARC in order to distinguish generic aspects from MARIO/GSU-specific ones?

Post by AWJ »

tepples wrote:
>> I'd advise reverting the split you did between the /processor/ and /sfc/coprocessor/ trees and just putting the whole GSU/SuperFX into one class again. The plotting stuff is deeply integrated into the CPU (it's done through special instructions rather than MMIO) and unlike, say, 65816 and ARM, we don't have any other CPUs using that instruction set to compare to--the decision which parts of the architecture are "generic" and which are "SNES-specific" is completely arbitrary.
>
> There exist other processors using instruction sets derived from that of the Super FX. It appears they're used in embedded applications when even ARM is too heavy. What would it take to obtain info about other applications of ARC in order to distinguish generic aspects from MARIO/GSU-specific ones?
According to Wikipedia those are 32-bit processors with 16-bit or 32-bit instruction words. The GSU is 16-bit with an 8-bit instruction word. Even if ARC is ultimately derived from the GSU, they're even more different than a 6502 is from an SPC700.

Post by Near »

> mainly the ones where there's literally no code in common, and cmp because it violates the "!alt2 is register, alt2 is immediate" pattern and requires excessive confusing bit-twiddling of the alt flags to merge it into sub/sbc

Yeah. If not for the op_##name mangling, it probably would have been nicer to keep different classes of opcodes separate. There's three or four that don't share a single line of code.

I like your example though. You even beat me to the id&15 thing, nice. Very clever to just use opcode. I'll take some ideas from yours and refine mine more in that case, thanks.

> Re the alt flags, I'm currently doing some tests with Evan Teran's templated bitfields

I implemented this in bsnes in the past. It went away when I started using separate booleans for flags and the order_lsbN macros for registers. I liked not needing all the operator overloads for every kind of math operation inside every processor core.

I also had template Range<Lo, Hi> at one point, when I was doing "variable.b0, variable.w1". I moved to functions because then I could use any bit-range I wanted instead of having pre-defined byte/word/long granularity only.

But I was recently thinking, if I inherit from Natural<T>, then I could add my own Range<Lo, Hi> values for things like .b, .w, etc. And I can use the no-suffix version for when I want the full size (eg regs.pc instead of regs.pc.d)

One thing was, in the main Natural<T> class, you could get the Range<>s for free:

Code: Select all

template<uint Bits> struct Natural {
  ...
  template<uint Lo, uint Hi> struct Range {
    ...
    uint value;
  };
  union {
    uint value;
    Range<0, 7> b0;
    Range<8, 15> b1;
    ...
  };
};
static_assert(sizeof(Natural<3>) == sizeof(uint));
This may be a violation of ISO C++ standard section 37.218.6.17 or something, but ... it works on every platform I've tried it on so far.

When we inherit from it:

Code: Select all

struct Reg16 : Natural<16> {
  Range<0, 7> l;
  Range<8, 15> h;
  Range<0, 15> w;
};
Then this trick doesn't work. We'll need the Range objects to capture a Natural<T>& reference. This isn't a big deal though for a CPU core to expend a few extra bytes for a dozen registers. Hopefully C++11 class initializers will allow "Range<0, 7> l{*this};" or we'll need some ugly as hell constructors.

And again, the funny thing is, this trick tends to generate more efficient code than bit-field unions in my experience.

> One more thing: I'd advise reverting the split you did between the /processor/ and /sfc/coprocessor/ trees and just putting the whole GSU/SuperFX into one class again.

I agree completely. I've talked with Cydrak and others about this in the past. I feel processor/ is one of my bigger refactoring mistakes.

In reality, it gets two reuse cases: the SNES CPU and SNES SA1 (it would be trivial to just have the SA1 inherit from a CPUcore that's inside sfc/cpu); and the SNES ST018 and GBA ARM. The latter is a lot trickier. Technically, I'm cheating and the ST018 gets a full ARM7T(MD|DM)I, even though it's really an ARM6 or so. Doesn't matter in practice because the ROM is stored inside the CPU as mask ROM. But still, it's cheating badly.

Right now, we're paying virtualization costs on function calls because the derived class exists in a different object file, so the compiler can't figure out that it doesn't really need to be virtual (assuming compilers are even that smart in the first place.)

If I ever get to a point where I reduce the massive bloatedness of my ARM core, and iron out all the remaining bugs (ARM is way worse than the V30MZ [x86 clone] with ridiculous edge cases, I absolutely hate ARM), then I'll consider spinning them off again and have gba/cpu and sfc/coprocessor/st018/core.

And if I reach that point, then I will undo processor/ in its entirety. I'm not MAME/MESS, so I don't really need this kind of separation. Even looking forward, there's only one other possible CPU I might end up wanting to share at some point in the distant future, but it's a long shot.

Post by AWJ »

This is what Evan would do and what I'm planning to do for your Reg16 example. The template parameters are the underlying builtin type, the lowest bit position, and the size in bits (not the highest bit position). The underlying type has to be explicitly specified for the reasons I explained on Evan's blog. The bit size is required so that 1-bit bitfields can be specialized to have boolean semantics (you want flags.cy = 2 to set the bit to 1, not 0)

Code: Select all

union Reg16 {
    bitfield<uint16_t, 0, 8> l;
    bitfield<uint16_t, 8, 8> h;
    bitfield<uint16_t, 0, 16> w;
};
No per-application bit twiddling or operator overloads--all that crap is done by the bitfield template. No internal references bloating things and making them inefficient to pass around by value--an instance of Reg16 is the same size as a builtin uint16_t. You just need every bitfield union you define to contain one member that captures the entire range of used bits, and use that member for serialization.

Note that bitfield<uint16_t, 0, 16> is only needed for strict language conformance. Certainly GCC and probably every compiler will let you get away with using a straight uint16_t as the "whole shebang" member.

Of course for Reg16 this is probably still less efficient than an endian-dependent union. The real point of these unions is for fields that aren't byte-sized or -aligned.

The main thing you can't do with these is you can't pass a reference to an arbitrary bitfield into a function, because each named field is a different, incompatible type. So if you use these for CPU flags you can't have a generic set_flag(flag &which) opcode handler (but you wouldn't be able to do that with the reference-based version you're musing about either) My advice is that you accept that limitation, and if you really, really need to do that then use something else (e.g. a bunch of bools) for that specific application.

If you don't mind me criticizing your programming style once again, I think your worst habit is trying to cover excessively broad ranges of use cases with a single kitchen-sink class (your vector that's also a deque is exhibit #1).

Your second worst habit is language envy, where you insist on shoehorning simulated versions of other languages' features into C++ no matter the cost in efficiency. I've noticed that you really, really, really like property semantics, and it breaks your heart that C++ doesn't have them. Tough. If you like them so much then learn Python; it's a terrific language for just about everything except OS kernels and emulators. A large proportion of the awesome sauce in C++1x/y/z was inspired by Python, so you'll feel right at home.
Last edited by AWJ on Mon May 30, 2016 2:31 pm, edited 1 time in total.

Post by Near »

> The bit size is required so that 1-bit bitfields can be specialized to have boolean semantics (you want flags.cy = 2 to set the bit to 1, not 0)

That's actually debatable. In most cases that is what you want, but there are fun things you can do with a uint1 type.

Code: Select all

uint3 a = opcode >> 0;
uint1 b = opcode >> 3;
uint4 c = opcode >> 4;
I know you can rewrite it like this:

Code: Select all

uint3 a = opcode & 0x07;
uint1 b = (opcode & 0x08) >> 3;
uint4 c = (opcode & 0xf0) >> 4;
It's nice for the consistency. And you can also say:

Code: Select all

uint1 b = bool(data & 0x08);
For this reason, I have both uint1 and boolean types to represent the two differences -and- allow for member functions to do whatever on the types.

> No per-application bit twiddling or operator overloads--all that crap is done by the bitfield template.

So your method would disallow treating the parent as the whole object.

Okay, but what about the SPC700's YA register? You're going to end up with:
regs.ya.a
regs.ya.y
regs.ya.ya = yayaya.iAmLorde

Ick.

I see why the type is mandatory with this style. That's a shame, but no way around that.

> So if you use these for CPU flags you can't have a generic set_flag(flag &which) opcode handler

EDIT: oh, nevermind. I didn't read the flag part. Yeah, that's a shame.

Type erasure or template functions are the only possible workarounds. The latter is an absolute "fuck no" for obvious reasons.

> your vector that's also a deque is exhibit #1

nall::vector is as fast as std::vector on both accesses and appends, and is O(1) vs O(n) for prepends/pop_fronts.

My goal is to keep code size small. nall::vector is 8.86KiB of code. Without the deque-stuff, it'd be ~7.5KiB. Duplicating it all and then changing a small bit for a deque would make it ~16KiB, and I'd have to keep the APIs in parity. When I wanted to read a file, I'd get a vector<uint8_t> from file::read, and then when I wanted to use that with prepend/pop_front for some reason, I'd need to convert a vector to a deque, or write a file::readAsDeque function.

Further, nall::vector isn't a deque. std::deque does not guarantee contiguous storage. deque is slightly faster as a result, but you can't grab a raw memory pointer to the array, which is a deal breaker for a huge number of use cases.

The only overhead the prepend/pop_front functionality adds to nall::vector right now is an extra 4 bytes for the container object. It doesn't even waste any extra space on size growth amortization if you don't use that functionality.

For once ... instead of just saying, "this thing you're doing is dumb", I'd like to have you give a strong rationale for why having prepend functionality in my vector class is such a terrible thing. Please elaborate on all the problems caused by this functionality.

> I've noticed that you really, really, really like property semantics

You mean the readonly<> templates? I've been slowly removing those. I like the idea, but it's too ugly in the code.

What I really want are getter/setters. We shouldn't need to have Foo::value(), Foo::setValue(Value), Foo::_value. This breaks all of our math.

Example:

Code: Select all

size.width += size.height + 4;
size.setWidth(size.width() + size.height() + 4);
Ick ick ick ick ick. And not hyperbole. This kind of code is all over hiro with the Geometry class.

Code: Select all

struct Size {
  int width;  //our first iteration of our class; we're fine treating this as a variable
  struct {    //a later iteration: turns out if height changes we need to do other stuff
    operator int() const { return value; }
    auto& operator=(int v) { value = v; doStuff(); return *this; }
  private:
    int value;
  } height;
  //the layout of the struct is the same after we changed height;
  //all existing code that used size.height still compiles and is backward-compatible;
  //although .o files that use Size::height will need to be recompiled for obvious reasons
};
Of course, the thing I'd really, really kill for is unified function call syntax. But that got tanked by ... short-sighted people on the C++17 Jacksonville panel.

> Tough. If you like them so much then learn Python

Okay then. I'll get to work on rewriting bsnes in Python, so that I can have properties :P