Thoughts on Higher Level Language Design for 6502 Systems

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.


User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

Rudla,

Thank you very much for your input! You have given me a lot to think about.

Your points reinforce the insight this language is trying to express: that low-level design patterns should be expressible in a higher-level language. I think it is inappropriate to back-port the design patterns of established languages (such as C) to this architecture. For instance, I don't know of anyone (not to say that there aren't any) who uses a stack-based expression evaluator in hand-written assembly.

This will be a cross-compiler. No use in trying to write a self-hosting compiler on an NES now is there? :D



Expression parsing is stack based, but only at compile time. The expression processing will be reduced to standard 6502 arithmetic.

Example:

Code: Select all

r = a - b + 4
Becomes:

Code: Select all

clc
lda b
adc #4
sta ___temp0
sec
lda a
sbc ___temp0
sta r
Rather than using the hardware stack or a software-managed dynamic stack to store the intermediate value, a zero-page temporary variable is allocated statically at compile time, allowing this type of hand-written-equivalent code to be generated.

However, this static assignment of temporary variable space means that the entire scope of the expression must be constant, knowable at compile time, and consistent. This is why function calls within expressions are not possible.
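As a rough illustration of this compile-time scheme, here is a minimal Python sketch (with invented helper names, not the actual NICHE implementation) that flattens a binary expression tree into 6502 code using statically numbered zero-page temporaries instead of a runtime stack:

```python
# Hypothetical sketch: flatten a binary expression tree into 6502 code
# using zero-page temporaries numbered at compile time (no runtime stack).

def compile_expr(node, temps, out):
    """Emit code for node; return the operand name holding its value."""
    if isinstance(node, str):              # leaf: a variable or #immediate
        return node
    op, left, right = node                 # e.g. ('+', 'b', '#4')
    rhs = compile_expr(right, temps, out)  # evaluate the deeper subtree first
    lhs = compile_expr(left, temps, out)
    out.append('clc' if op == '+' else 'sec')
    out.append('lda ' + lhs)
    out.append(('adc ' if op == '+' else 'sbc ') + rhs)
    temp = '___temp%d' % temps[0]          # statically allocated slot
    temps[0] += 1
    out.append('sta ' + temp)
    return temp

def compile_assign(dest, tree):
    out, temps = [], [0]
    result = compile_expr(tree, temps, out)
    if out and out[-1] == 'sta ' + result:  # root result stores straight to dest
        out[-1] = 'sta ' + dest
    else:
        out += ['lda ' + result, 'sta ' + dest]
    return out
```

Calling compile_assign('r', ('-', 'a', ('+', 'b', '#4'))) reproduces the instruction sequence shown above; note the tree here groups the expression as a - (b + 4), matching that generated output rather than the usual left-to-right grouping.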

Here's a more detailed example:

Code: Select all

byte someFunc(a, b) {
  byte tempVal = a + b;
  tempVal |= %00100101;
  // This expression allocates temporary values on the static expression stack
  return tempVal - a + b;
}

// This expression must also allocate temporary values
byte myVal = c - someFunc(9, 11) + 9;
OK, so maybe that didn't clear things up. Anyhow, the point is that stack-based operations on the 6502 are inefficient compared to the generated code example above, and that's the level of optimization I am going for.



As for larger arrays, I had thought about using two different addressing modes for small and large arrays, but I gave up on the idea because some cases would have become inefficient. That's why I have the notion of "memory files" like you were asking about. Here's a quick example:

Code: Select all

word myPtr = mapDataArray + (mapYPos - 1 << 5/* x32 */) + mapXPos - 1;

// Read the tile we're standing on
//               Pointer       Y Index Value
byte tile = read(mapDataArray, 32 + 1);  // 32 + 1 is optimized down to 33 at compile time

// Read the tiles around us

tile = read(mapDataArray, 0);		// North-West
tile = read(mapDataArray, 1);		// North
tile = read(mapDataArray, 2);		// North-East
tile = read(mapDataArray, 32 + 0);	// West
tile = read(mapDataArray, 32 + 2);	// East
tile = read(mapDataArray, 64);		// South-West
tile = read(mapDataArray, 64 + 1);	// South
tile = read(mapDataArray, 64 + 2);	// South-East
The compiler sees that mapDataArray is being used in one of the file-access built-ins and requires it to be a zero-page variable. The resulting assembly looks something like this:

Code: Select all

ldy #0
lda (mapDataArray),y
sta tile
ldy #1
lda (mapDataArray),y
sta tile
This ends up being a lot more efficient than how I typically write this in assembly, simply because I get lazy :D
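The lowering of the read() built-in could be sketched like this (a Python sketch of hypothetical compiler internals; the constant folding of offsets like 32 + 1 happens at this step):

```python
# Sketch: lower read(ptr, constant_offset) to an (indirect),Y access
# through a zero-page pointer, folding the offset at compile time.

def fold_constant(expr):
    # minimal constant folder for offset expressions like '32 + 1'
    total = 0
    for term in expr.split('+'):
        total += int(term.strip())
    return total

def lower_read(ptr, offset_expr, dest):
    offset = fold_constant(offset_expr)   # e.g. '32 + 1' -> 33
    return ['ldy #%d' % offset,
            'lda (%s),y' % ptr,
            'sta %s' % dest]
```

For example, lower_read('mapDataArray', '32 + 1', 'tile') emits the same ldy/lda/sta pattern as the assembly listing above, with the offset already folded to 33.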

Note that the call graph-based variable space reuse would be a great optimization here to conserve zero-page space.



I had never thought of doing objects that way. That's a great idea! I'll have to give that some thought.




As for multiplication operations as functions, that's something I gave a lot of thought to. In the end I decided it was best to force the user to use a function so they understood the performance impact of the operation. I've seen plenty of timing-critical C code with division inside tight loops on platforms that have no hardware division, and the programmer trying in vain to understand why it's taking so long. I took the Pythonic approach of "explicit is better than implicit".



As for memory reuse, I've never used that in my demos and never had an issue with running out of RAM, even on NROM boards. Then again I use a set of sixteen zero-page variables for all of this temporary stuff, so I guess that is a limited form of variable reuse. Anyhow, that's an optimization I am saving for later.



What's in a name? But if you're curious, here's the project page. It's called NICHE, for Niche Instruction Code for Homebrew Enthusiasts. I've always wanted to name something with a recursive acronym :P
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

qbradq wrote:However, this static assignment of temporary variable space means that the entire scope of the expression must be constant, knowable at compile time, and consistent. This is why function calls within expressions are not possible.
Once on EFnet #nesdev, kevtris told me how he allocates local variables in his own programs. I'll try to express it in the language of graph theory so that it can be implemented in a compiler.

Assume the set of functions is a partially ordered set over the relation "f calls g"; that is, their call graph is acyclic. (This captures what I think you were trying to say by excluding recursive functions.) The source in the call graph is the reset handler, and the sinks are "leaf functions", functions that do not call any other functions. Add an edge from each function called while NMI is enabled to the NMI handler, and do the same for IRQ.

Once you have a call graph, you can topologically sort it, and the "height" of a function becomes the maximum of (A) the amount of temporary space it uses while not calling any function and (B), for each function f that it calls, the amount of temporary space that needs to persist across the call plus the height of f.
Here's a more detailed example:

Code: Select all

byte someFunc(a, b) {
  byte tempVal = a + b;
  tempVal |= %00100101;
  // This expression allocates temporary values on the static expression stack
  return tempVal - a + b;
}

// This expression must also allocate temporary values
byte myVal = c - someFunc(9, 11) + 9;
The height of this expression is the space needed by the expression plus the space needed by someFunc().
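This allocation can be sketched concretely (a Python sketch with assumed names, not code from either compiler in this thread): topologically sort the acyclic call graph so callers come before callees, then give each function a static frame base past its deepest caller's frame, so functions that can never be live at the same time overlay the same RAM.

```python
# Sketch of static overlay allocation over an acyclic call graph:
# callers are processed before callees, and each function's locals
# start just past the largest caller frame, so functions that can
# never be live simultaneously share the same bytes.

from graphlib import TopologicalSorter  # Python 3.9+

def assign_frame_bases(calls, frame_size):
    """calls: {fn: [callees]}; frame_size: {fn: bytes of locals/temps}."""
    callers = {f: set() for f in calls}
    for f, callees in calls.items():
        for g in callees:
            callers[g].add(f)
    base = {}
    # TopologicalSorter takes {node: predecessors}; predecessors here
    # are callers, so every caller is placed before its callees.
    for f in TopologicalSorter(callers).static_order():
        base[f] = max((base[g] + frame_size[g] for g in callers[f]),
                      default=0)
    return base
```

With calls = {'reset': ['main'], 'main': ['someFunc', 'draw'], 'someFunc': [], 'draw': []} and frame sizes of 0, 4, 2, and 3 bytes respectively, someFunc and draw both get base 4 and overlay each other; the total static RAM needed is the maximum of base plus size, 7 bytes.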
As for larger arrays, I had thought about using two different addressing modes for small and large arrays
Much like "near pointers", "far pointers", and "huge pointers" in 16-bit MS-DOS programs.
The resulting assembly looks something like this:

Code: Select all

ldy #0
lda (mapDataArray),y
sta tile
ldy #1
lda (mapDataArray),y
sta tile
This ends up being a lot more efficient than how I typically write this in assembly, simply because I get lazy :D
Which, incidentally, is a claim that some compiler publishers have made at various points: that their products could schedule instructions in compiled code better than all but the most skilled (and highest-paid) humans.
Note that the call graph-based variable space reuse would be a great optimization here to conserve zero-page space.
Extend this sort of reuse to all variables, and you don't need a data stack unless you're recursive.
As for multiplication operations as functions, that's something I gave a lot of thought to. In the end I decided it was best to force the user to use a function so they understood the performance impact of the operation.
In other words, the same complaint leveled against C++ operator overloading: it's "as efficient as C yet still rawther deceptive and easy to misuse because the 'simple' syntax hides how much code is actually being generated".
I've seen plenty of timing-critical C code with division inside tight loops on platforms that do not have hardware division, and the programmer trying in vain to understand why it's taking so long. I took the Pythonic approach of "explicit is better than implicit".
Yet Python has operator overloading :P
Then again I use a set of sixteen zero-page variables for all of this temporary stuff, so I guess that is a limited form of variable reuse.
And from the call graph, you can extend this set of sixteen variables to nearly the entire zero page (except for a few things that must persist in zero page, such as open "memory files" used by music engines and the like).
Last edited by tepples on Mon Jul 18, 2011 8:09 am, edited 1 time in total.
rudla.kudla
Posts: 21
Joined: Mon Nov 22, 2010 9:36 am

Post by rudla.kudla »

However, this static assignment of temporary variable space means that the entire scope of the expression must be constant, knowable at compile time, and consistent. This is why function calls within expressions are not possible.
That's not really true, you know. Atalan does it. Local variables and function input and output arguments are just normal global (static) variables; you simply do not allow the programmer to reference them outside the specified scope (function).
In the end I decided it was best to force the user to use a function so they understood the performance impact of the operation.
I knew this would be the reason :-) Well, it's a design decision based on taste, so it's hard to argue. However, I do not believe that this is the way to go.
It makes the language harder to use even for competent programmers for the sake of preventing newbies from writing inefficient code. Newbies will not write efficient code even if you force them to use the operator '$#&#%' to do the multiplication.



If I understand the idea of memory files, it is basically just a different syntax for C pointer arithmetic?
read(mapDataArray, 32 + 0) is equivalent to C's mapDataArray[32+0]?

In that case, common subexpression elimination should provide similar functionality.

Atalan provides 2D arrays for this. It generates an index table of the array's rows to eliminate the multiplication and provide fast access to elements. Access to one element is then:

Code: Select all

ldy mapYpos
lda mapIndexLo, y
sta _arr
lda mapIndexHi, y
sta _arr+1
ldy mapXPos
lda (_arr),y
In the case of several reads in a row, the row-address computation is removed by common subexpression elimination.
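For illustration, here is a small Python sketch (assumed details, not Atalan's actual generator) of building those per-row address tables at compile time:

```python
# Sketch: emit low/high byte tables of each map row's start address,
# so a row lookup is two indexed loads instead of a multiplication.

def row_tables(base, rows, row_len):
    addrs = [base + r * row_len for r in range(rows)]
    lo = ', '.join('$%02X' % (a & 0xFF) for a in addrs)
    hi = ', '.join('$%02X' % (a >> 8) for a in addrs)
    return ['mapIndexLo: .byte ' + lo,
            'mapIndexHi: .byte ' + hi]
```

For a 32-byte-wide map at $6000, row_tables(0x6000, 4, 32) yields tables whose Y-th entries point at row Y, which is exactly what the ldy/lda pairs above consume.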
User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

Most of the suggestions in the previous two replies are over my head at this point. The features I have described are already stretching my abilities, so I think I'll stick with what I've got for the initial version and start studying these other concepts as I gain more basic understanding of compiler design.

Just to be clear, I am experienced with compiler implementation, but not the design of a good compiler, if that makes any sense.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

Then let me help you through it a bit more gently:
  1. Do you understand what a call graph is?
  2. Do you understand what a leaf function is?[1]
  3. Do you understand how to perform a topological sort on a graph?
  4. Can you determine how much memory each leaf function uses for its local variables and temporary values?
[1] Other than to photosynthesize.
rudla.kudla
Posts: 21
Joined: Mon Nov 22, 2010 9:36 am

Post by rudla.kudla »

Local variables are quite easy. I will try to demonstrate the concept:

Code: Select all

p:x, y, z -> q =
   a = x + y
   q = a * z

s = p 13,10,4
can be translated to:

Code: Select all

p: p_x, p_y, p_z -> p_q =
  p_a = p_x + p_y
  p_q = p_a * p_z

p_x = 13
p_y = 10
p_z = 4
call p
s = p_q
Now all the variables are global; only some have a 'p_' prefix.
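That translation can be sketched mechanically (a toy Python pass; the names and structure are illustrative, not Atalan's real internals):

```python
# Toy sketch of the locals-to-globals pass shown above: rename a
# function's parameters and locals with a 'fn_' prefix, and lower a
# call site to plain stores, a call, and a load of the result.

import re

def globalize(fn, local_names, body_lines):
    # match whole identifiers only, so 'a' doesn't hit inside 'map'
    pat = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, local_names)))
    return [pat.sub(lambda m: fn + '_' + m.group(1), line)
            for line in body_lines]

def lower_call(fn, params, args, result, dest):
    code = ['%s_%s = %s' % (fn, p, a) for p, a in zip(params, args)]
    code.append('call ' + fn)
    code.append('%s = %s_%s' % (dest, fn, result))
    return code
```

Here globalize('p', ['x', 'y', 'z', 'q', 'a'], ...) and lower_call('p', ['x', 'y', 'z'], ['13', '10', '4'], 'q', 's') reproduce the translated listing above.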
User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

tepples wrote:Then let me help you through it a bit more gently:
  1. Do you understand what a call graph is?
  2. Do you understand what a leaf function is?[1]
  3. Do you understand how to perform a topological sort on a graph?
  4. Can you determine how much memory each leaf function uses for its local variables and temporary values?
[1] Other than to photosynthesize.
Thanks Tepples! You are always quite helpful!

1: Yes
2: Yes
3: Never done it before, but I think I have a handle on it now that I read that article.
4: Yes

This all makes sense for how I can reuse RAM locations for function parameters and locals. I also see how this could be extended to allow function calls within expressions without using a stack, though that would be a rather extreme application.

Rudla,

That's basically what I was going to do. I will assign a global RAM location to each parameter and local variable. The symbols are resolved within a local scope.
rudla.kudla
Posts: 21
Joined: Mon Nov 22, 2010 9:36 am

Post by rudla.kudla »

Well, if you implement local variables like this, then nothing prevents you from implementing function calls in expressions. You do not need the stack then.

It may be useful to think about the goals of your language. I understand very well that you want to implement your own language, not just use an existing one. :-) Learning by doing was one of my motivations for starting the development of Atalan. However, it may be useful for future discussions to clearly differentiate two reasons for restricting NICHE:

1. You do not want to implement some feature because you feel it would be too much work in the current state of development (there is always the possibility of implementing the feature later).

2. You believe the code generated by the feature would be so inefficient that it is not worth doing.


For example, in the case of nested function calls, I thought the reason was 2, while it is probably 1, right?
Basically, case 2 is always a compromise on complexity. For me this is recursive functions. I have an idea of how to implement them without sacrificing the efficiency of non-recursive functions. However, I believe Atalan will mostly be used for programming games, and maybe some simple utilities. This type of application has very little use for recursion. Therefore it is not very high on the todo list.
User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

I chose not to implement function calls within expressions because I feared the generated code would be inefficient. The only solution to the problem I could come up with was stack-based expression evaluation, which I know from experience on the 6502 is less efficient than evaluation with statically allocated temporaries.

You and Tepples have given me some insights into compiler design that I lacked previously. I now see how, taken to the extreme, a topologically sorted call graph could be used to allow function calls within expressions while retaining the static nature of expression evaluation. My only concern now is that it seems like this would require a very large area of RAM to be dedicated to the expression space.

My intuition is telling me it would be better to leave out functions as terms and be able to move more variables to the zero-page.

I won't let complexity of implementation stand in my way. After all you never stop learning :D

Also, about not using Atalan: I did give it a try prior to deciding to make my own compiler. It's just not what I'm looking for. Basically I want a very lean C compiler, which is what I am making. If you look at some of the other projects I've done you'll notice a pattern of re-implementing things "my way" :P
Shiru
Posts: 1161
Joined: Sat Jan 23, 2010 11:41 pm

Post by Shiru »

Just some thoughts on a high-level language designed to run fast on a certain CPU. Could it actually be useful? Is it useful enough to justify the amount of work needed?

For non-speed-critical parts you don't really need such a speed-optimized language. Fast code is usually large, there are speed/size tradeoffs, and you would probably prefer to have compact code for these parts.

For speed-critical parts, such as an NMI handler, it depends on the language whether it will be fast enough to use there, or whether you will have to write them in assembly anyway. For example, C on the 6502 is fast enough for a simple handler, but not for a complex one. If the new language is just twice as fast, it will not make much difference, because you will be able to use it for a slightly more complex handler, but still not for a truly complex one.

Also, using C on the 6502, speed has not been an issue so far (with the speed-critical parts in assembly), but the compiled code is pretty large, much larger than code written by hand in assembly.
User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

Shiru,

Thank you for the input! I appreciate your point of view.

What I am trying to do is create a language that produces code as efficient and as small as hand-written assembly. That may not necessarily be possible, but I'm giving it a shot :D The limitations of the language are the key to producing good machine code.

As for the effort versus benefit, that is a pretty moot argument for me. I am doing this because I enjoy it, and because if it works out I can make another game or demo with it. It's a hobby after all, the whole point is to kill time and enjoy yourself :D
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

And even if it isn't quite as fast as ASM, as long as it's more efficient than whatever the Micronics guys used, a language that reduces the entry barrier to NESdev will at least be a positive step in expanding the developer base and thus legitimizing NES homebrew.
Shiru
Posts: 1161
Joined: Sat Jan 23, 2010 11:41 pm

Post by Shiru »

There is a problem with new languages: they only increase the entry barrier, because they are unique. You can find a lot of books, docs, or people to ask about 6502 assembly or C. With a new language you will have just one doc written by the author, and a forum thread. Also, it will have bugs, which is a real problem for beginners, because they don't have the experience to tell whether they are doing something wrong or it is a bug.

I can recall only one sort-of-successful unique language optimized for micros: CCZ80. A few actual games were written with it, but generally it has not gained any major popularity in the three years it has existed. It has another problem, by the way: the compiler is closed-source.
User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

Shiru wrote:There is a problem with new languages: they only increase the entry barrier, because they are unique.
That's why it's based on C with a few limitations.
Shiru wrote:Also, it will have bugs, which is a real problem for beginners, because they don't have the experience to tell whether they are doing something wrong or it is a bug.
I can't argue with that. Hopefully my test suite will be comprehensive enough to eliminate bugs most folks are likely to encounter. The compiler will also be open source.
slobu
Posts: 275
Joined: Tue Jul 12, 2011 10:58 am

Post by slobu »

One thing from BASIC that should carry over is a relaxed syntax. If it's supposed to be easier for the layman, C-style { }, ;, and other typo-inducing requirements should go out the door.

I'd take a look at BatariBASIC for some ideas:
http://bataribasic.com/

Oh, yeah. Someone tried to port ZX Basic to the SMS:
http://www.smspower.org/forums/viewtopic.php?t=12902

It's for the Z80, but maybe some of the ideas implemented will help.

I've got a PowerPack ready to rumble when you get to beta :) I'd love to make a real NES game!