Again, you'd only need to call this once.
Of course, the problem with such a test function is that it can only be evaluated at run time. So any time you actually used a big- or little-endian specific function, you'd have to go through a function pointer or a conditional test. Or go really evil and rely on self-modifying code :)
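To make the cost concrete, here is a minimal sketch of the run-time approach, assuming the goal is to store 16-bit values in the 6502's little-endian order. The names `host_is_little_endian`, `put16`, `init_endian` and `emit16` are hypothetical, not taken from any particular assembler. The probe runs once, but every store afterwards pays for an indirect call through the function pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Run-time probe: returns 1 if the host stores the low byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* byte at the lowest address */
    return first == 0x02;
}

/* Store v into dst[0..1]; host already uses little-endian order. */
static void put16_native(unsigned char *dst, uint16_t v)
{
    memcpy(dst, &v, sizeof v);
}

/* Store v into dst[0..1] on a big-endian host by swapping first. */
static void put16_swapped(unsigned char *dst, uint16_t v)
{
    uint16_t s = (uint16_t)((v >> 8) | (v << 8));  /* swap the two bytes */
    memcpy(dst, &s, sizeof s);
}

/* Chosen once at startup by init_endian(). */
static void (*put16)(unsigned char *, uint16_t) = NULL;

static void init_endian(void)
{
    put16 = host_is_little_endian() ? put16_native : put16_swapped;
}

/* Every emitted word goes through the pointer: an indirect call per store. */
static void emit16(unsigned char *dst, uint16_t v)
{
    assert(put16 != NULL && "call init_endian() once before emitting");
    put16(dst, v);
}
```

After `init_endian()`, `emit16(buf, 0x1234)` leaves `0x34, 0x12` in `buf` on either kind of host.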
Best bet is to try to detect the platform from compiler-specific #defines, and fall back on letting the user choose the endianness manually. Finally, add a run-time assertion on startup to make sure the correct endianness was chosen.
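A sketch of that scheme follows, assuming GCC or Clang, which predefine `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__`. The macro `HOST_LITTLE_ENDIAN` and the functions `check_endian_choice` and `to_le16` are hypothetical names. A user on another compiler (or overriding detection) passes something like `-DHOST_LITTLE_ENDIAN=0`; if neither is available, the build stops with an error instead of guessing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A manual -DHOST_LITTLE_ENDIAN=0/1 wins; otherwise ask the compiler. */
#ifndef HOST_LITTLE_ENDIAN
#  if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#    define HOST_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#  else
#    error "Cannot detect byte order: define HOST_LITTLE_ENDIAN as 0 or 1"
#  endif
#endif

/* Call once on startup: catches a wrong manual (or detected) choice.
   Note that assert() compiles to nothing when NDEBUG is defined. */
static void check_endian_choice(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    assert((first == 0x02) == (HOST_LITTLE_ENDIAN != 0));
}

/* Byte order is fixed at compile time, so no pointer or branch is needed. */
static uint16_t to_le16(uint16_t v)
{
#if HOST_LITTLE_ENDIAN
    return v;
#else
    return (uint16_t)((v >> 8) | (v << 8));
#endif
}
```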
Still, for an NES assembler, is the speed benefit really worth all the extra hassle, when you can use the same portable code on all platforms? I can't imagine writing more than 1MB of data this way, so surely the added overhead isn't even close to 1ms.
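The portable alternative is to emit bytes with shifts and masks, which operate on values rather than on memory layout, so the output is identical on every host and no detection is needed at all. This is a minimal sketch; `write16le` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a 16-bit value low byte first, as the 6502 expects.
   Returns 0 on success, -1 if either byte could not be written. */
static int write16le(FILE *out, uint16_t v)
{
    assert(out != NULL);
    if (fputc(v & 0xFF, out) == EOF)
        return -1;
    if (fputc((v >> 8) & 0xFF, out) == EOF)
        return -1;
    return 0;
}
```

Two `fputc` calls per word are cheap next to the I/O itself, which is the point being made about overhead.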
I was impressed with the simplicity of the memory model. It was clearly made to just assemble and work, without the arcane segments and other machinery that other assemblers rely on.
I've been trying to convince people of that approach for over a decade now. That kind of flexible magic can still be there; just don't require it unless it's really needed.
So it's a sort of bool/char* union, but without having to use a union. The pointer value itself may even serve as the type flag: if the pointer is (char*)1, it's a bool with the value true; if it's NULL, it's a bool with the value false; otherwise it's a normal char* pointing at actual data.
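The encoding described above might look like the following sketch. The names `VALUE_TRUE`, `value_is_bool`, `value_as_bool` and `value_as_string` are hypothetical. It relies on address 1 never holding a real string, which holds on common platforms, although converting the integer 1 to a pointer is implementation-defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One char* carries either a bool or a string:
   NULL = false, (char *)1 = true, anything else = real string. */
#define VALUE_TRUE ((char *)1)

static bool value_is_bool(const char *v)
{
    return v == NULL || v == VALUE_TRUE;
}

static bool value_as_bool(const char *v)
{
    assert(value_is_bool(v));
    return v == VALUE_TRUE;
}

static const char *value_as_string(const char *v)
{
    assert(!value_is_bool(v));  /* must never be dereferenced as a bool */
    return v;
}
```

The saving is one type tag per value; the price is that every reader must check `value_is_bool` before dereferencing.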
So then, I assume a value of (char*)2 represents a file not found condition?