
Desired Compiler Features

Posted: Wed Dec 15, 2010 1:31 pm
by 67726e
I am currently planning a 6502 compiler for a few reasons:
1) I'm kind of annoyed by different features or the lack thereof with different compilers.
2) I'm in an IB CS class and have to write a program for my final so I figured I'd go with something I might actually use and like.

On the note of the IB CS part, it is recommended that I interview others who have the potential to use it about what they would want in the program.

So what features would you find highly desirable in a compiler? Right now I'm looking at a few:

1) Basic Optimizations (Kind of obvious but not all compilers do this *cough*NESASM*cough*)
2) Block commenting (I do not know of any compilers that offer this)

Posted: Wed Dec 15, 2010 2:30 pm
by tepples
Are you making an "assembler", which compiles a dialect of assembly language to an object file, or a "compiler", which is generally understood to compile a higher-level language?

Posted: Wed Dec 15, 2010 2:39 pm
by clueless
Awesome! I wrote a mini C compiler for my final college project too.

I've often thought about writing a "NES Friendly" compiler for a "C-like" language. I like cc65, but not for NES development.

One feature that I would add, and I recommend to you, is the ability for your language to handle 8, 16, 24 and 32 bit signed and unsigned integers and fixed-point math.

For example, your game's world coordinate system might be 2^16 pixels wide, but you want sub-pixel precision for object movement. So you really want an unsigned 24-bit variable to store the X coordinate. (I'm ignoring multiple dimensions.) However, you want to store your object's velocities as single (signed) bytes, where the upper bit is the sign (two's complement of course), the next 4 bits are the pixel magnitude, and the lower 3 bits are the sub-pixel part. So you want to write "X += V" in code and have the compiler produce the correct bit shifts, CLCs, ADCs, etc.

Same for multiplication. You might want to express "X += V * sin(A)". There should be a way to give the compiler a hint that sin(A) is in range -1 to +1 and that V*sin(A) will not need all of the bits that a less-intelligent compiler would emit code for. (where 'sin' and 'cos' are LUTs).

Anyway, just some thoughts of mine about a NES-friendly compiler feature.
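
The "X += V" case above can be sketched as plain arithmetic. This is only an illustration in Python of what the generated shifts and adds would have to compute, not output from any real compiler. It assumes the 24-bit X is split as 16 pixel bits plus 8 sub-pixel bits (inferred from the 2^16-pixel world), and the names `sign_extend_8` and `add_velocity` are made up for this example.

```python
def sign_extend_8(byte):
    """Interpret an 8-bit value as a two's-complement signed integer."""
    return byte - 0x100 if byte & 0x80 else byte


def add_velocity(x24, v8, x_frac_bits=8, v_frac_bits=3):
    """Return X + V, with V aligned to X's fixed-point format.

    x24: unsigned 24-bit position, assumed 16 pixel bits + 8 sub-pixel bits.
    v8:  signed byte: 1 sign bit, 4 pixel bits, 3 sub-pixel bits.
    """
    v = sign_extend_8(v8 & 0xFF)
    # Shift left so both binary points line up (3 -> 8 fractional bits).
    v_aligned = v << (x_frac_bits - v_frac_bits)
    # Wrap the sum the way a 24-bit variable would.
    return (x24 + v_aligned) & 0xFFFFFF
```

With these assumptions, a velocity byte of $0C (1.5 pixels) moves X from $001000 to $001180, and $F4 (-1.5 pixels) moves it to $000E80.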

Posted: Wed Dec 15, 2010 4:06 pm
by 67726e
tepples wrote:Are you making an "assembler", which compiles a dialect of assembly language to an object file, or a "compiler", which is generally understood to compile a higher-level language?
I'm writing an assembler. Straight 6502 -> ROM.

Posted: Wed Dec 15, 2010 4:19 pm
by tokumaru
67726e wrote:Basic Optimizations (Kind of obvious but not all compilers do this *cough*NESASM*cough*)
67726e wrote:I'm writing an assembler. Straight 6502 -> ROM.
You got me a little confused then... What kind of optimizations should assemblers do? Assembly should be a straight conversion from mnemonics to binary code, and I don't know what kind of optimizations you think would be desirable during this process.

For example, on the NES there are times when you write the same value to the same memory location twice in a row, such as when you are resetting the scroll to (0, 0):

Code:

LDA #$00
STA $2005
STA $2005
An assembler that "optimizes" code might think that it's stupid to write the same value to the same location twice if it doesn't know that this address is mapped to a register and that all writes are being watched and have a meaning. I'd be very pissed if my assembler filtered out one of the writes, as that would produce a very hard-to-find bug.

I don't think there is any case I would want an assembler to monkey around with the code I wrote. The whole point of ASM is that you give the CPU precise instructions that should be followed to the letter, even if they don't always make sense.

Posted: Wed Dec 15, 2010 4:27 pm
by 67726e
tokumaru wrote: I'd be very pissed if my assembler filtered out one of the writes, as that would produce a very hard to find bug.
Well to be fair, I could program the optimization to recognize necessary double writes, seemingly useless reads, etc.

Also, I'm not necessarily saying the compiler will have carte blanche and just do what it will with your code. I'm thinking more of an option that outputs spots in the code that are not needed or could be better written. Just as an example:

Code:

LDA Some_Address
CMP #$00
BEQ Another_Location
The user would be prompted with this possible optimization:

Code:

LDA Some_Address
BEQ Another_Location
Finally, I really had in mind the whole zero-page issue with NESASM3 although I'm not sure if you would consider that an optimization or not.

Posted: Wed Dec 15, 2010 4:31 pm
by tepples
It sounds like someone's trying to make an assembler capable of peephole optimizations. Such an assembler would need a keyword to declare a symbol exempt from such optimization, just as C does with 'volatile'. It might look like this:

Code:

PPUCTRL   = volatile $2000
PPUMASK   = volatile $2001
PPUSTATUS = volatile $2002
OAMADDR   = volatile $2003
OAM_DMA   = volatile $4014
PPUSCROLL = volatile $2005
PPUADDR   = volatile $2006
PPUDATA   = volatile $2007
As for eliminating CMP #$00 after an instruction that loads A, CMP #$00 has the side effect of always setting the carry flag like SEC, and Another_Location might expect the carry to be set for an SBC, ROL, or ROR.
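
The carry point can be checked with a tiny model of the flag rules. This is a sketch in Python, not an emulator; `cmp_flags` and `lda_flags` are hypothetical helper names, and only the N, Z and C flags are modeled.

```python
def cmp_flags(a, operand):
    """Model 6502 CMP: compute A - operand and keep only the flags."""
    diff = (a - operand) & 0xFF
    return {
        "C": a >= operand,       # carry is set when no borrow is needed
        "Z": diff == 0,
        "N": bool(diff & 0x80),
    }


def lda_flags(a):
    """LDA sets N and Z from the loaded value and leaves C untouched."""
    return {"Z": a == 0, "N": bool(a & 0x80)}
```

For every possible value of A, `cmp_flags(a, 0)` gives the same Z and N as `lda_flags(a)` but always has C set, so the CMP #$00 is redundant for the BEQ yet still behaves like SEC.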

Posted: Wed Dec 15, 2010 4:47 pm
by tokumaru
67726e wrote:Well to be fair, I could program the optimization to recognize necessary double writes, seemingly useless reads, etc.
I think this severely limits the usefulness of your assembler. You might have a good understanding of the NES, but how much do you know about the hundreds of other computers that have a 6502 in them? I am really glad I can use the same assembler to make NES programs and Atari 2600 programs, something I probably couldn't do with NESASM and its stupid platform-specific things such as 8KB banks.
67726e wrote:Just as an example:

Code:

LDA Some_Address
CMP #$00
BEQ Another_Location
The user would be prompted with this possible optimization:

Code:

LDA Some_Address
BEQ Another_Location
I think of this kind of thing as unnecessary hand-holding. If you are coding in assembly, you must have some confidence in what you are doing. These little details are things that people usually learn to fix pretty early on, and it's not like they cause programs to be terribly inefficient. I think it's not worth the trouble because you'll spend a lot of time making a program smart enough to identify these situations (and the result is never 100% smart, I'm sure there will be dumb or plain wrong suggestions sometimes) but once newbies get past the "writing stupid code" phase it will rarely present any advantage. You might be spending your time better if you focused on other features related to macros, memory management and things like that, in order to make your assembler a very solid and useful program.
67726e wrote:Finally, I really had in mind the whole zero-page issue with NESASM3 although I'm not sure if you would consider that an optimization or not.
I guess this is an optimization after all. I'd rather have the option to do it one way or the other though, like using a command to pick the default option but still having the means to select a specific addressing mode for individual instructions when more than one is possible.
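
A default policy plus a per-instruction override could look roughly like this. It is a sketch in Python covering only LDA with a plain memory operand; `encode_lda` and its `mode`/`default` parameters are invented for illustration and do not describe any existing assembler. The opcodes $A5 (zero page) and $AD (absolute) are the standard 6502 encodings.

```python
LDA_ZEROPAGE = 0xA5  # LDA zp, 2 bytes
LDA_ABSOLUTE = 0xAD  # LDA abs, 3 bytes, little-endian address


def encode_lda(address, mode=None, default="zeropage"):
    """Encode LDA for a memory operand.

    mode:    "zeropage" or "absolute" to force a mode for this instruction,
             or None to follow the assembler-wide default.
    default: policy used for addresses that fit in zero page.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    fits_zero_page = address <= 0xFF
    chosen = mode or (default if fits_zero_page else "absolute")
    if chosen == "zeropage":
        if not fits_zero_page:
            raise ValueError("address does not fit in zero page")
        return bytes([LDA_ZEROPAGE, address])
    if chosen == "absolute":
        return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])
    raise ValueError("unknown addressing mode: " + chosen)
```

Here `encode_lda(0xF1)` picks the 2-byte zero-page form, while `encode_lda(0xF1, mode="absolute")` forces the 3-byte form for that one instruction.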

Posted: Wed Dec 15, 2010 4:50 pm
by tokumaru
tepples wrote:As for eliminating CMP #$00 after an instruction that loads A, CMP #$00 has the side effect of always setting the carry flag like SEC, and Another_Location might expect the carry to be set for an SBC, ROL, or ROR.
This is the perfect example of how optimizations like these can go terribly wrong. Assembly code is often more than what it appears to be, and making a program that always makes smart and *safe* decisions is not a trivial task.

Posted: Wed Dec 15, 2010 5:08 pm
by 3gengames
But what if the instruction is in there for cycle-accuracy? Then it would be taking out a VERY key component. I'd rather have no optimizations, but that's just me. If you want optimized code, it shouldn't be the assembler that does it, it should be you. :)

Posted: Wed Dec 15, 2010 5:18 pm
by tepples
tokumaru wrote:If you are coding in assembly, you must have some confidence in what you are doing.
Some people code in assembly because they have full confidence in what they are doing. (You're likely to agree with me that action games' graphics engines should be left to such people.) Others code in assembly because most of the available high-level languages are nowhere near efficient enough to make even a non-scrolling game. Perhaps an assembler with a peephole optimizer could be a useful step toward making NES programming more accessible, if only as a base on which to build a compiler.
tokumaru wrote:I think it's not worth the trouble because you'll spend a lot of time making a program smart enough to identify these situations
Whenever I compile a C program with gcc -Wall -O2, I thank the GCC team for making its compiler smart enough to find type errors for me and to optimize the RTL that the C translation front-end generates.
tokumaru wrote:You might be spending your time better if you focused on other features related to macros, memory management and things like that
At this point, would it be worth it to make this project an extension to ca65 (zlib licensed 6502 assembler) instead of a rewrite from scratch?

@3gengames: For a cycle-timed subroutine, one should be able to mark a block of code as not suitable for peephole optimizations, and the assembler won't apply them there.

Posted: Wed Dec 15, 2010 5:20 pm
by 67726e
Let me clarify: the compiler will not optimize the code for you; it will output something saying "Check out this line, you might want to do this".

Now I'm assuming someone who is writing code to be cycle-accurate would know what they are doing and know this would break the code.

The optimization recommendations are also triggered via a switch when the program is called e.g. 'assembler game.asm -optimize'. It is by no means a feature that is run without the user's expressed desire, and even then it will tread lightly.

That said, do you still think it is a bad idea that just plain needs to be scrapped? In that case, for something like zero-page, how would you go about doing that? Would you always do zero-page unless told not to, or always do $0001 unless told to do $01?

Also, are there any other things you would want in a compiler? Anything you think I should avoid?

Posted: Wed Dec 15, 2010 5:33 pm
by tepples
So we introduce two qualifiers 'absolute' and 'volatile', and we introduce a directive '.peephole'.

Code:

somelabel = $F1
anotherlabel = absolute $F2
SNDCHN = volatile $4015

; These generate zero page/direct page addressing mode
lda $F1
lda somelabel

; These generate absolute addressing mode
lda absolute $F2
lda anotherlabel

; This LDA/CMP won't be changed to LDA/SEC.
.peephole off
LDA Some_Address
CMP #$00
BEQ Another_Location
.peephole on
; But if peephole were turned on, the SEC can be slid
; upward until it meets up with another that sets C.

; This LDA won't get removed, even though the values of
; A and flags NZ after the second LDA would ordinarily be
; the same as after the first LDA, because the label has the
; volatile qualifier.
lda #$0F  ; instruction setting NZ flags
sta SNDCHN
lda SNDCHN
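
How a checker might honor 'volatile' and '.peephole' for the redundant-load case can be sketched as follows. This is a rough Python illustration under loud assumptions: the function name `redundant_loads` is invented, parsing is a naive whitespace split, and only the exact pattern "lda something / sta X / lda X" is recognized.

```python
def redundant_loads(source, volatile=()):
    """Return 1-based line numbers of 'lda X' that follows 'lda ... / sta X'.

    The reload is reported only when peephole checks are enabled and X is
    not declared volatile. Labels and any other line break the pattern.
    """
    volatile = {name.lower() for name in volatile}
    enabled = True
    window = []  # the last two (mnemonic, operand) pairs seen
    hits = []
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.split(";", 1)[0].strip().lower()  # drop comments
        if not text:
            continue
        parts = text.split()
        if parts[0] == ".peephole":
            enabled = parts[1:] == ["on"]
            window = []
            continue
        op = parts[0]
        arg = parts[1] if len(parts) > 1 else ""
        base = arg.split(",")[0].strip("()")  # symbol without index suffix
        if (enabled and len(window) == 2
                and window[0][0] == "lda"
                and window[1] == ("sta", arg)
                and op == "lda"
                and base not in volatile):
            hits.append(number)
        window = (window + [(op, arg)])[-2:]
    return hits
```

Run on the last three lines of the listing, it reports nothing when SNDCHN is passed in `volatile`, and reports the second LDA when it is not.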

Posted: Wed Dec 15, 2010 5:42 pm
by atarimike
As someone pointed out, you're talking about an assembler, not a compiler.

There are already great assemblers out there. You might want to focus just on the optimizations - instead of generating code and warnings, just generate warnings. Much like lint. Then people could run their code through your tool, fix the warnings they cared about, and then assemble it with ca65 or something else.
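
A warnings-only pass of that kind might start out like this. It is a minimal Python sketch, assuming one instruction per line and ';' comments; `lint_cmp_zero` is a made-up name, and it knows only the LDA followed by CMP #$00 pattern from earlier in the thread.

```python
import re

# Immediate zero written as #$00, #0 or #%00000000.
ZERO_IMMEDIATE = re.compile(r"^#(\$0+|0+|%0+)$")


def lint_cmp_zero(source):
    """Return warning strings for 'CMP #0' placed directly after 'LDA'."""
    warnings = []
    previous = None  # mnemonic of the previous non-empty line
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.split(";", 1)[0].strip()  # drop comments
        if not text:
            continue
        parts = text.split()
        op = parts[0].lower()
        arg = parts[1] if len(parts) > 1 else ""
        if op == "cmp" and previous == "lda" and ZERO_IMMEDIATE.match(arg):
            warnings.append(
                f"line {number}: CMP {arg} after LDA only sets carry; "
                "remove it unless later code needs C=1 or exact timing"
            )
        previous = op
    return warnings
```

Because it never rewrites anything, a wrong guess costs the user one ignored message rather than a broken ROM.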

Posted: Wed Dec 15, 2010 5:48 pm
by 67726e
atarimike wrote:As someone pointed out, you're talking about an assembler, not a compiler.

There are already great assemblers out there. You might want to focus just on the optimizations - instead of generating code and warnings, just generate warnings. Much like lint. Then people could run their code through your tool, fix the warnings they cared about, and then assemble it with ca65 or something else.
I just wanted to point out that I am doing this for a class and there are certain criteria my application must meet. Something that just goes through and generates warnings does not sound like it would meet all of my criteria.