What is the slowest part of 6502/65816 running C?

psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

What is the slowest part of 6502/65816 running C?

Post by psycopathicteen »

I've heard there is more of a performance hit with 65xx chips than with other CPUs. I want to know the reasons of why it is, and try to come up with ways to fix it.
FrankWDoom
Posts: 255
Joined: Mon Jan 23, 2012 11:27 pm

Re: What is the slowest part of 6502/65816 running C?

Post by FrankWDoom »

CPUs don't run C code. C, like any other language, gets compiled into instructions the CPU can understand. C compilers for the 65xx chips aren't the best at optimizing C statements into efficient low-level instructions the way mature compilers for other platforms are. We're still at the point where handwritten asm is likely to perform better. If you want to improve the situation, improve the C compilers.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: What is the slowest part of 6502/65816 running C?

Post by lidnariq »

The 6502 doesn't do stack-based indexing particularly well, and a naïve translation of C to machine code (as well as any instance requiring recursion) really wants fast stack-based indexing.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: What is the slowest part of 6502/65816 running C?

Post by tepples »

FrankWDoom wrote: C compilers for the 65xx chips aren't the best at optimizing C statements into efficient low-level instructions the way mature compilers for other platforms are.
What psycopathicteen asks is why this is the case: whether it's an inherent limit of the architecture or just a matter of lack of interest in a sub-32-bit architecture among authors of Free compilers. GCC won't save you because "We don’t support 16-bit machines in GNU" (GNU Coding Standards: Portability between CPUs).

For the 6502, the answer is simple: registers are smaller than int, smaller than size_t, and smaller than char *. Most C operations require promotion of char values to int. Nor can registers be paired into a pointer, making it harder to have automatic variables in a recursive program. So compilers have to use zero page to store 16-bit values, including the pointer to the automatic variable stack. Accessing arrays through pointers stored as local variables on the stack is thus relatively slow. True, it would be possible to cache pointer variables in software-defined registers on zero page, except for a couple of things. First, some people expect to have their programs interoperate with ROM BASIC interpreters on Apple II, Commodore 64, and Atari 800, which take a large chunk of zero page space. Second, some people expect to write interrupt handlers in C, which may clobber the software-defined registers, especially when an NMI interrupts an IRQ that interrupted the main thread.

In addition, because there is no hardware multiply, accessing elements of an array can be slow if the access is not sequential and the size of an element is not a power of two.
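To make the multiply cost concrete, here is a minimal sketch in C of the arithmetic a compiler has to emit when there is no hardware multiplier. The 16-byte and 20-byte element sizes and the function names are assumptions picked for illustration, not taken from any particular compiler.

```c
#include <stdint.h>

/* Power-of-two element size (16 bytes): the byte offset is just a shift. */
uint16_t offset_pow2(uint8_t index)
{
    return (uint16_t)((uint16_t)index << 4);      /* index * 16 */
}

/* 20 = 16 + 4, so the offset takes two shifted copies and a 16-bit add.
   On an 8-bit CPU every 16-bit shift and add is several instructions. */
uint16_t offset_odd(uint8_t index)
{
    uint16_t i = index;
    return (uint16_t)((i << 4) + (i << 2));       /* index * 20 */
}
```

When the access is sequential, a compiler (or the programmer) can usually sidestep both forms by adding the element size to a running pointer on each iteration.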

The 65816 fixes the pointer problem, as a near pointer can be held in X, Y, or D. The d,s and (d,s),y addressing modes largely fix the problems with pointers to local variables. But it shares the lack of hardware multiply. Thus there were some decent C compilers, such as APW and ORCA/C, but they're non-free and still mostly payware. A program that relies on a non-free compiler is said to be caught in the "Java trap".
If you want to improve the situation, improve the C compilers.
How would we go about attracting skilled compiler authors, especially for a niche platform like 65816, and set up a Kickstarter campaign?
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Re: What is the slowest part of 6502/65816 running C?

Post by Bregalad »

The major problems of "ANSI" C are:
  • C was designed for CPUs of 16 bits or more. The ANSI standard says that "int" should be at least 16 bits. The 6502 has only three 8-bit registers, so the use of 16-bit operations and variables is a major performance hit if you "naively" write C code. Of course it can be fixed by using the "char" type everywhere...
  • ...but the problem of automatic promotion remains. If you add two "chars" together, the ANSI standard specifies that the result is an "int". So if you conform to the standard you HAVE to do it the inefficient way. Of course it's possible to detect that the high byte will be unused later and delete those instructions internally, but this complicates the compiling process compared to a 16-bit CPU, for example.
  • C was designed for CPUs that address the whole address space equally. As such, without using any kind of clever trick a C compiler cannot take *any* advantage of the zero page, except for temporary storage. A C compiler also assumes all variables are on the stack, and on the 6502 accessing the stack other than the topmost element is tricky to say the least. It is possible but it kills the X register. Finally, the stack is limited to 256 bytes.
  • C was designed for CPUs with orthogonal instructions and addressing modes. It cannot take advantage of the $xxxx,Y and $xxxx,X addressing modes for addressing arrays, because the index of an array is always "int", which again is 16-bit and does not fit in the X and Y registers. Even worse, it's a signed int, so negative indexes into arrays should be allowed by the standard, and that does not work on a 6502 except within the zero page, which is unfortunately unusable. So the access to "any" array, even a single-dimensional one, will copy the array address to ZP temporaries, add a 16-bit value to that temporary and use that pointer to access the element, which is ridiculously bloated.
So the choices for compiling C for the 6502 would be either to depart largely from the standard, or to make a super clever compiler that handles all those cases and produces efficient code when it detects that doing so is possible.
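The promotion rule is easy to see in a few lines. This is only a sketch with made-up function names: the first function shows that the sum of two 8-bit values really is computed as an int, the second shows the case where a compiler could legally skip the high byte because the result is converted straight back to 8 bits.

```c
#include <stdint.h>

/* Both operands are promoted to int before the add, so the carry
   out of the low byte is kept: 200 + 100 gives 300, not 44. */
int sum_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* The result is converted back to 8 bits, so only the low byte is
   observable. A compiler that notices this can do an 8-bit add
   and never compute the high byte at all. */
uint8_t sum_truncated(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

Whether a given 6502 compiler actually performs the second optimization depends on the compiler and its settings.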

A third approach would be to say that since C will be slow and bloated anyway, you just generate an intermediate bytecode and interpret that bytecode on the 6502. That's an approach I'd seriously take, if only I could find a "suitable" bytecode which would be simple and easy to interpret. Most open bytecodes I could find on the net are way too complex unfortunately.
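For a sense of scale, the interpreter side of such a bytecode can be sketched as a small stack machine. The four opcodes below are invented for this example and are not an existing bytecode; a real one would also need loads, stores, branches and calls.

```c
#include <stdint.h>

/* Hypothetical opcodes, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_HALT };

/* Runs a program and returns the value left on top of the stack.
   No bounds or underflow checking: the program is trusted. */
int16_t run(const uint8_t *code)
{
    int16_t stack[16];
    uint8_t sp = 0;                   /* index of the next free slot */
    uint8_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:                 /* operand: one unsigned byte */
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                  /* pop two values, push their sum */
            sp--;
            stack[sp - 1] = (int16_t)(stack[sp - 1] + stack[sp]);
            break;
        case OP_SUB:                  /* pop two values, push the difference */
            sp--;
            stack[sp - 1] = (int16_t)(stack[sp - 1] - stack[sp]);
            break;
        default:                      /* OP_HALT or an unknown opcode */
            return sp ? stack[sp - 1] : 0;
        }
    }
}
```

A program such as { OP_PUSH, 7, OP_PUSH, 5, OP_ADD, OP_HALT } leaves the sum of the two pushed values on the stack. On the 6502 itself the dispatch loop would more likely be a jump table in assembly, but the structure is the same.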
GCC won't save you because "We don’t support 16-bit machines in GNU
But GCC has an AVR port, which is 8-bit (very different from the 6502, but still).
the size of an element is not a power of two
This is an extremely rare case and definitely not part of why C is slow on the 6502. GCC pads its structs to fix that problem by default, because on modern architectures it's better to waste memory than to waste time multiplying by the size of a weird-sized element.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: What is the slowest part of 6502/65816 running C?

Post by tepples »

Would the work done by Intel on support for 8086 and 80286 in LLVM help 65816 any?
the size of an element is not a power of two
This is an extremely rare case
Not if you have arbitrarily sized struct Actor.
GCC pads its structs to fix that problem by default
I thought GCC padded up to the next multiple of the largest alignment of a member, such as 4 bytes if CHAR_BIT == 8 and the struct contains an int32_t. What makes you think it pads, say, a struct with 5 int32_t members from 20 bytes up to 32?
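One way to check what a given compiler actually does is to ask it for the sizes. A minimal sketch with hypothetical struct names; the sizes in the comments assume a common ABI with CHAR_BIT == 8 and 4-byte alignment for int32_t, and may differ elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Five 4-byte members: commonly 20 bytes. That is already a multiple
   of the 4-byte alignment, so no tail padding is needed and the size
   is not rounded up to 32. */
struct Five {
    int32_t a, b, c, d, e;
};

/* 4 + 1 bytes of members: commonly padded to 8, the next multiple
   of the largest member alignment, so array elements stay aligned. */
struct Mixed {
    int32_t value;
    uint8_t flag;
};

/* The array stride the compiler really uses for each type. */
size_t five_stride(void)  { return sizeof(struct Five); }
size_t mixed_stride(void) { return sizeof(struct Mixed); }
```

With a 20-byte stride, indexing an array of struct Five by a non-constant index still needs a multiply by 20 or an equivalent shift-and-add sequence.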
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: What is the slowest part of 6502/65816 running C?

Post by lidnariq »

tepples wrote: Would the work done by Intel on support for 8086 and 80286 in LLVM help 65816 any?
[LLVMdev] 16-bit x86 status update wrote: In fact we've implemented no 16-bit ABI at all. This is really 32-bit code, 32-bit object formats, 32-bit ABIs. Just expecting to run on a CPU which happens to be in 16-bit mode and hence needs the 0x66 and 0x67 prefixes to be used. A lot.
So ... Doubtful.
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Re: What is the slowest part of 6502/65816 running C?

Post by Bregalad »

I haven't studied it deeply, but LLVM is a very complex "bytecode" that natively supports variable-sized strings and similar concepts. It has nothing to do with the "simple" bytecode I was thinking of, which would be made of between 15 and 30 simple instructions at most.
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: What is the slowest part of 6502/65816 running C?

Post by rainwarrior »

psycopathicteen wrote: What is the slowest part of 6502/65816 running C?
Honestly the architectural issues aren't really the big problem. The slowest thing is the existing compilers we have for this target. They're just not up to the optimization task.

Compilers are among the most difficult and complicated computing tasks. You can't expect a compiler like cc65, used by only a handful of people, to compare to an old workhorse like GCC with millions of development hours behind it.

The second slowest thing is just the CPU itself. The targets we're talking about just aren't very powerful computers. There's always a loss of efficiency when using C, but if you've got lots of computing power it doesn't have to matter. In a case like the NES, the lack of power magnifies the impact of an efficiency loss like this, which is already bad because of the compiler quality.

These CPUs are from an era before C was popular, so there were never very good C tools for them. The problem isn't really C itself, or the processor.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: What is the slowest part of 6502/65816 running C?

Post by tepples »

Consider 65816 vs. 68000, processors from roughly the same hardware generation. I was under the impression that the existing widely available compilers targeting 68000 were a lot better because of the sixteen 32-bit registers (of which D0 and A4-A7 correspond to the five 16-bit registers AXYDS on 65816), instructions that make more orthogonal use thereof (apart from a few instructions that hardcode A7 as the stack pointer), and hardware 16x16 multiplier. Is this an advantage of the 68000 architecture, or is it just that Atari ST, Amiga, and Mac outnumbered Apple IIGS?
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: What is the slowest part of 6502/65816 running C?

Post by rainwarrior »

tepples wrote: Atari ST, Amiga, and Mac outnumbered Apple IIGS?
Do you really consider Atari ST, Amiga and Mac the same generation as the Apple IIGS? I don't.

I thought the IIGS was popular for a long time just because it was cheap and backwards compatible, not because of computing power. (Kinda like Wii vs PS3?)

Also we're still talking about an era where assembly was vastly preferred for high performance applications like games. C existed, and a lot of people were using it on the 68000, especially hobbyists, but I don't think "good" C compilers really started to happen until some time in the 90s.
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Re: What is the slowest part of 6502/65816 running C?

Post by psycopathicteen »

C was designed for CPUs with orthogonal instructions and addressing modes. It cannot take advantage of the $xxxx,Y and $xxxx,X addressing modes for addressing arrays, because the index of an array is always "int", which again is 16-bit and does not fit in the X and Y registers. Even worse, it's a signed int, so negative indexes into arrays should be allowed by the standard, and that does not work on a 6502 except within the zero page, which is unfortunately unusable. So the access to "any" array, even a single-dimensional one, will copy the array address to ZP temporaries, add a 16-bit value to that temporary and use that pointer to access the element, which is ridiculously bloated.
So the 65816 got screwed over by having X and Y being unsigned instead of signed?
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: What is the slowest part of 6502/65816 running C?

Post by rainwarrior »

psycopathicteen wrote:
C was designed for CPUs with orthogonal instructions and addressing modes. It cannot take advantage of the $xxxx,Y and $xxxx,X addressing modes for addressing arrays, because the index of an array is always "int", which again is 16-bit and does not fit in the X and Y registers. Even worse, it's a signed int, so negative indexes into arrays should be allowed by the standard, and that does not work on a 6502 except within the zero page, which is unfortunately unusable. So the access to "any" array, even a single-dimensional one, will copy the array address to ZP temporaries, add a 16-bit value to that temporary and use that pointer to access the element, which is ridiculously bloated.
So the 65816 got screwed over by having X and Y being unsigned instead of signed?
Not really. The indexing issue can be optimized away any time the index being used is an unsigned char. The ZP pointer issue can be optimized away any time the array is static.

Trying to use a negative index into an array on the 6502 is problematic, yes, but that has nothing to do with C. Same deal with using 16- or 32-bit numbers everywhere: that's a problem for the platform, not really for C itself. If you want your code to run well you have to know your platform and make concessions for it. Likewise, having a multiplication instruction makes multiplication easier, but again that's specific to the platform, not to C; you avoid using that stuff on a platform that can't do it well.

Actually, that's basically why cc65 is somewhat usable, and someone like Shiru is able to develop games with it quickly. If you limit yourself to stuff that you know the platform + compiler will handle well, you can still have plenty of the utility of C without the code being too inefficient. Prefer static variables to locals, use unsigned char for everything, use static arrays instead of passing pointers around, etc. etc.
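As a rough illustration of that style, here is a sketch using hypothetical names (MAX_ACTORS, actor_x, actor_dx, move_actors). Everything is static, every value is an unsigned char, and arrays are indexed directly instead of through a pointer. Whether a particular compiler turns this into plain indexed addressing depends on the compiler and its options.

```c
#define MAX_ACTORS 8

/* Parallel static arrays instead of an array of structs, so each
   element is one byte and no index multiplication is needed. */
static unsigned char actor_x[MAX_ACTORS];
static unsigned char actor_dx[MAX_ACTORS];

/* Adds each actor's horizontal speed to its position, wrapping at 256. */
void move_actors(void)
{
    static unsigned char i;   /* static: lives at a fixed address, not on the C stack */

    for (i = 0; i < MAX_ACTORS; ++i) {
        /* 8-bit index into a static array: a natural fit for $xxxx,X */
        actor_x[i] = (unsigned char)(actor_x[i] + actor_dx[i]);
    }
}
```

The cost of this style is that move_actors is not reentrant, which is usually acceptable in a single-threaded game loop.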

It's perfectly possible to write a much better C compiler for NES / SNES, but it's just not practical given the resources we have. (How many people are both capable and interested? Approximately zero, it seems.)
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: What is the slowest part of 6502/65816 running C?

Post by Drew Sebastino »

I may not speak for everyone, but I thought programming in C kind of ruins the fun. You're doing this for fun, not because you need to get the game out the door faster.

I'm still waiting for WDC's Terbium...
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: What is the slowest part of 6502/65816 running C?

Post by nicklausw »

Espozo wrote: I may not speak for everyone, but I thought programming in C kind of ruins the fun.
I hate C, along with most other high-level languages. They all have the weirdest ways of doing things, and you can tell the creators came up with a lot of stupid workarounds for things that could be very simple.