YCPU: an imaginary 16-bit processor.

lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: YCPU: an imaginary 16-bit processor.

Post by lidnariq »

pops wrote:I thought that there might be cases where a program would need to determine which interrupt to call at run-time. Is this not the case, even in modern operating systems? (Not that this is what I'm targeting, mind you).
Modern x86+ (Pentium 3, Athlon XP, and newer, I think?) machines just have simple SYSENTER / SYSEXIT instructions, and in software use the contents of the registers to decide which system call to use.

The goofy thing with the older INT instructions was that this implementation lost 1 KiB of RAM to hold a lookup table of 256 four-byte vectors, most of which weren't even used. Can you think of exactly 256 different things someone would want to do? Looking at Ralf Brown's list, 16 of the 256 are for hardware interrupts, and maybe another 16-32 are useful syscall entry points. Each of those 256 entry points has anywhere from 0 (e.g. int E1-EFh, all of which are used only by IBM BASICA) to 100ish (int 21h) subfunctions.

I don't really see an advantage of the 8086 software interrupt system. In practice, you still ended up with horrific messy hierarchies of subclassed syscalls, and the only real advantage was that you could rely on "Anything DOS is int 21h, anything video is int 10h, anything disk is int 13h, &c" and put your own stuff somewhere else knowing you probably wouldn't step on anything. But with more modern, better-planned systems (e.g. Linux), everything was just under int 80h (until someone pointed out this causes horrific cache misses, and they replaced it with linux-gate).

Between only "syscall table that dispatches based on register" vs "syscall table that dispatches based on immediate argument" the latter is clearly better: the former consumes a register that you could use for parameters.
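
To make that trade-off concrete, here is a rough Python sketch of the two dispatch styles. The handler names, the call numbers and the table itself are made up for illustration; none of this is a real kernel ABI or part of the YCPU spec.

```python
# Hypothetical sketch: handlers and call numbers are invented for illustration.

def sys_exit(args):
    return ("exit", list(args))

def sys_write(args):
    return ("write", list(args))

# One shared table; only the way the call number arrives differs.
SYSCALL_TABLE = {1: sys_exit, 4: sys_write}

def dispatch_on_register(regs):
    """Call number travels in a register (here regs[0]), so one register
    is no longer available for parameters."""
    number, args = regs[0], regs[1:]
    return SYSCALL_TABLE[number](args)

def dispatch_on_immediate(immediate, regs):
    """Call number is encoded in the instruction itself, so every
    register can carry a parameter."""
    return SYSCALL_TABLE[immediate](regs)
```

With three registers, the register-based form passes two parameters and the immediate-based form passes all three.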
Joe
Posts: 469
Joined: Mon Apr 01, 2013 11:17 pm

Re: YCPU: an imaginary 16-bit processor.

Post by Joe »

pops wrote:Joe, I've added a preliminary memory paging specification. Under this spec, the processor has 16 pages of on-chip memory, but the MMU can page in memory from a device on the bus - which could be an additional memory chip with 256mw of memory (2^16 4kw pages). However, the processor itself - and the active process running on it - are limited to a flat $10000 address space. A kernel that is aware of additional memory or slow storage could provide interrupts to a process that would switch out pages on demand.
Why are you giving the CPU internal RAM when it could just as easily map external RAM directly? The speed hit from copying between internal and external RAM will discourage developers from using it. (In fact, it sounds like you're trying to reinvent the cache. Any particular reason you're doing things this way instead of using a more typical cache design?)

Why are there separate "RAM" and "ROM" address spaces? Why does the "hardware device" page mechanism allocate a space the same size as the maximum amount of RAM to every hardware device? A clever system designer will probably use that function of the MMU to allow more than your limit of 2^16 pages of RAM by placing a few "devices" that are just more RAM. Is that the intended use?
pops
Posts: 91
Joined: Sun Apr 04, 2010 4:28 pm

Re: YCPU: an imaginary 16-bit processor.

Post by pops »

Joe, I'm going to rewrite the memory management specification to explain that 'switching' a page into the address space is an instantaneous procedure. There is no need to copy external memory to internal memory - the MMU switches in an external memory/rom page, and the processor works on it with no additional latency.
Why are you giving the CPU internal RAM when it could just as easily map external RAM directly?
I'm giving the CPU 16kw of internal memory so the processor has something to work with (a) before polling the bus, or (b) should there be no extra memory on the bus. So it's not a cache, per se - just a minimal amount of 'starter memory'.
Why are there separate "RAM" and "ROM" address spaces?
There's not separate address space for memory and rom and devices and so on - the only address space is the $10000 words that are currently loaded. The 'T' bits in the MMU allow the processor to switch pages of the 'rom' (like a BIOS) into a page of active address space, but doing so would switch out whatever else was in that space.
Why does the "hardware device" page mechanism allocate a space the same size as the maximum amount of RAM to every hardware device?
The maximum number of pages a device can have is 2^16 - but it could have fewer. For example, a hypothetical PPU device might have only 16kw of memory, and when selecting a page from that device, it would ignore all but the lower two bits of the page index selection word (I suppose the device itself would have to decide which page, if any, to expose to the MMU based on the page select index).
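
As a rough illustration of that masking, here is a small Python model of a device with 16kw of memory (four 4kw pages). The class and method names are invented for this sketch, and it assumes the page count is a power of two so a simple mask works.

```python
PAGE_WORDS = 0x1000  # 4 kw per page, as in the spec

class SmallDevice:
    """Hypothetical device exposing 16 kw of memory, i.e. 4 pages."""

    def __init__(self, total_words=16 * 1024):
        self.page_count = total_words // PAGE_WORDS
        self.memory = [0] * total_words

    def resolve_page(self, page_index):
        # With a power-of-two page count, masking keeps only the low bits
        # of the 16-bit page index and ignores the rest.
        return page_index & (self.page_count - 1)

    def read(self, page_index, offset):
        page = self.resolve_page(page_index)
        return self.memory[page * PAGE_WORDS + (offset & (PAGE_WORDS - 1))]
```

A page index of $FFFE would then select page 2 on this device, the same page as index $0002.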
A clever system designer will probably use that function of the MMU to allow more than your limit of 2^16 pages of RAM by placing a few "devices" that are just more RAM. Is that the intended use?
Absolutely - and the OS writer would have to poll the system at start-up to determine what memory is available to use, and where it is on the hardware bus.

I'm adamant about keeping the $10000 address space. The MMU exists to extend this so that multiple processes can run, each with their own active memory - and if the OS supports it, each process can access even more memory by requesting that the OS switch in additional pages.

Code:

=========================[ 1.F. Memory Management ]=============================
(Possible edit to Version 0.1e)
As the processor has a 16-bit address bus, it can only address $10000 words of
memory at a time. When the 'Memory Paging' status bit is clear, this address
space is filled with the processor's internal $10000 words of memory.

When the 'Memory Paging' status bit is set, the processor's integrated memory
management unit (MMU) is activated. The MMU divides the processor's address
space into 16 pages of 4 kilowords each, and allows each individual page to be
moved in memory, or switched with a page of address space from a hardware
device.

There is no additional latency incurred by accessing address space mapped to a
page of memory in an external device unless the external device is intrinsically
slower than the internal memory (an inexpensive but slow memory device might
incur some additional latency, for example).

Each of the 16 currently loaded 4 kiloword pages in address space is
described by 2 words:

WORD 0 (flags)
FEDC BA98 7654 3210
SWE. TT.. hhhh hhhh
    S - Supervisor only, 1: User mode accesses to this page cause a page fault.
    W - Write protect, 1: writing to this page causes a page fault.
    E - Execute protect, 1: executing on this page causes a page fault.
    T - Page type:
        00: Use processor internal memory page with index = (word 1 & 0x000F)
        01: Use blank page, reads/executes are 0x0000, writes fail silently.
        10: Use hardware page, device = h, page index = (word 1)
        11: Use processor ROM, page index = (word 1)

WORD 1 (index)
FEDC BA98 7654 3210
iiii iiii iiii iiii
    i - Index of device page mapped to this address page.
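
For anyone who wants to poke at the layout above, here is a minimal Python sketch that decodes one two-word page descriptor. The function name and the dictionary keys are invented for illustration; the bit positions are read straight off the SWE. TT.. hhhh hhhh diagram.

```python
PAGE_TYPES = {
    0b00: "internal memory",
    0b01: "blank",
    0b10: "hardware device",
    0b11: "processor ROM",
}

def decode_page_entry(word0, word1):
    """Decode one page descriptor (flags word, index word) into a dict."""
    page_type = (word0 >> 10) & 0b11  # TT lives in bits B and A
    if page_type == 0b00:
        page_index = word1 & 0x000F   # only 16 internal pages exist
    elif page_type == 0b01:
        page_index = None             # blank page: index is not used
    else:
        page_index = word1
    return {
        "supervisor_only": bool(word0 & 0x8000),   # bit F
        "write_protect": bool(word0 & 0x4000),     # bit E
        "execute_protect": bool(word0 & 0x2000),   # bit D
        "type": PAGE_TYPES[page_type],
        "device": (word0 & 0x00FF) if page_type == 0b10 else None,
        "page_index": page_index,
    }
```

For example, flags $4805 with index $0123 decodes as a write-protected page taken from hardware device 5, page $0123.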
pops
Posts: 91
Joined: Sun Apr 04, 2010 4:28 pm

Re: YCPU: an imaginary 16-bit processor.

Post by pops »

tepples wrote:Writing to memory management registers might need to be a privileged operation so that the kernel can virtualize the address space, translating between page numbers that the application sees and physical page numbers that the hardware sees.
Agreed. Now I need to determine how to access the MMU. I'm considering a privileged opcode.
I'd also recommend having a seventeenth page that replaces one of the pages in supervisor mode, so that the user process can see the full 64 Kwords.
I don't understand what you mean here - do you mean that the supervisor would have its own special page? For what purpose?
And what's this "processor memory page" and "hardware page"? Is it that there's a fast 64 Kword memory in the CPU package and a slower, larger memory accessed through 36-bit "hardware page" (8 bits h, 16 bits i, 12 bits address)?
I did a very poor job describing this, obviously. There's only a $10000 address space, but pages from hardware devices can be switched into this address space. Hardware device pages are not (necessarily) any slower than the internal memory.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: YCPU: an imaginary 16-bit processor.

Post by lidnariq »

pops wrote:I'm giving the CPU 16kw of internal memory so the processor has something to work with (a) before polling the bus, or (b) should there be no extra memory on the bus. So it's not a cache, per se - just a minimal amount of 'starter memory'.
Where are these 16 KiW of RAM in physical address? If you bank them out using the MMU, how do you get them back?
The 'T' bits in the MMU allow the processor to switch pages of the 'rom' (like a BIOS) into a page of active address space, but doing so would switch out whatever else was in that space.
Effectively, the TT bits increase the address space from 16 bits to 18, where the 17th and 18th bits mean "physical address range for ROM or MMIO or blank instead of RAM". Why bother specifying that it's memory, but it's read-only memory?
On x86, each PCI slot gets its own range of 64 KiB up to 256 MiB for use with MMIO (instead of PMIO, which is only 256 bytes). But the physical addresses for the peripheral I/O are shared with the physical addresses for RAM, e.g.:

Code:

$ cat /proc/iomem 
[...]
00100000-8974afff : System RAM
[...]
dfa00000-feafffff : PCI Bus 0000:00
  e0000000-efffffff : 0000:00:02.0
  f0000000-f03fffff : 0000:00:02.0
  f0400000-f041ffff : 0000:00:19.0
    f0400000-f041ffff : e1000e
  f0420000-f042ffff : 0000:00:14.0
    f0420000-f042ffff : xhci_hcd
  f0430000-f0433fff : 0000:00:1b.0
    f0430000-f0433fff : ICH HD audio
  f0435000-f04350ff : 0000:00:1f.3
  f0436000-f04367ff : 0000:00:1f.2
    f0436000-f04367ff : ahci
  f0437000-f04373ff : 0000:00:1d.0
    f0437000-f04373ff : ehci_hcd
  f0438000-f04383ff : 0000:00:1a.0
    f0438000-f04383ff : ehci_hcd
[...]
I'm adamant about keeping the $10000 address space. The MMU exists to extend this so that multiple processes can run, each with their own active memory - and if the OS supports it, each process can access even more memory by requesting that the OS switch in additional pages.
I think there's some confusion here, between logical addresses (the 64 KiW) and physical addresses (256 MiW to 1 GiW depending on exactly what the TT bits mean)
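
Those two figures fall out of the field widths in the spec quoted earlier: 12 offset bits within a 4 KiW page, 16 index bits, and possibly 2 more if the TT bits are counted as address bits. A quick check in Python (the helper name is made up, and treating TT as address bits is exactly the assumption in question):

```python
OFFSET_BITS = 12  # word within a 4 KiW page
INDEX_BITS = 16   # page index (word 1)
TT_BITS = 2       # page type bits, if they count as address bits

def words_for_bits(bits):
    """Number of addressable words for an address of the given width."""
    return 1 << bits

KIW = 1 << 10
MIW = 1 << 20
GIW = 1 << 30

logical = words_for_bits(16)
physical_without_tt = words_for_bits(INDEX_BITS + OFFSET_BITS)
physical_with_tt = words_for_bits(TT_BITS + INDEX_BITS + OFFSET_BITS)

print(logical // KIW, "KiW logical")
print(physical_without_tt // MIW, "MiW physical, TT not counted")
print(physical_with_tt // GIW, "GiW physical, TT counted")
```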
pops
Posts: 91
Joined: Sun Apr 04, 2010 4:28 pm

Re: YCPU: an imaginary 16-bit processor.

Post by pops »

I think I'm getting where the confusion is coming from. I know very little about hardware - and thus I'm not familiar with how 'physical address space' works. Again, my only real exploration of a processor/memory is the NES 6502 + a Mapper. It's no surprise, then, that my design of the YCPU's memory manager is very similar to how a MMC3 might be said to 'manage' memory pages.

If the NES had 65k of memory (and no I/O registers mapped in this space), and had a mapper that could switch out any 4kb bank of memory in the address space with a bank of rom --- this is closer to what I'm thinking. The TT bits select whether the specified bank is located in internal memory, internal rom, an external device, or is blank. The internal memory has its own address space. So does each external device. But the devices don't 'know' anything about address space. They only know that they offer X pages of memory that can be switched into the address space of the processor.

I'm going to start calling my pages 'banks' - I know this won't clear up the current confusion, but it might help in the future.

Code:

When the 'Memory Banking' status bit is set, the processor's integrated memory
management unit (MMU) is activated. The MMU divides the processor's address
space into 16 banks of 4 kilowords each, and allows each individual bank to be
filled with a bank of memory from internal memory, internal rom, a bank of
memory from a hardware device on the bus, or a blank bank.

The MMU has 32 words of memory internally that describe which banks are loaded
into memory. The MMU memory is accessed with the MMR and MMW instructions, which
allow values to be read from and written into the MMU.

The syntax for these instructions is:

    MMW r2,r0      ; write the value of r0 into MMU memory word at address = r2.
    MMR r2,r1      ; read the value of MMU memory word at address = r2 into r1.
    
MMU Bit Pattern
    FEDC BA98 7654 3210 
    iiii irrr WOOO OOOO
        O = Opcode
        W = 1: write to MMU word i, 0: read from MMU word i
        r = Source/Dest register
        i = index of MMU word to read from/write to (0-31)
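
Here is a small Python sketch of packing and unpacking that bit pattern. The opcode value is a placeholder, since the actual 7-bit opcode is not given here, and the function names are invented for illustration.

```python
MMU_OPCODE = 0x35  # hypothetical placeholder; the real opcode value is not stated

def encode_mmu(index, reg, write):
    """Pack iiii irrr WOOO OOOO into one 16-bit instruction word."""
    if not 0 <= index <= 31:
        raise ValueError("MMU word index must be 0-31")
    if not 0 <= reg <= 7:
        raise ValueError("register must be 0-7")
    return (index << 11) | (reg << 8) | (int(bool(write)) << 7) | MMU_OPCODE

def decode_mmu(word):
    """Unpack a 16-bit instruction word into its four fields."""
    return {
        "index": (word >> 11) & 0x1F,
        "reg": (word >> 8) & 0x07,
        "write": bool(word & 0x80),
        "opcode": word & 0x7F,
    }
```

Five index bits cover exactly the 32 MMU words (2 words for each of the 16 banks), and three register bits allow eight registers.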
lidnariq wrote:Where are these 16 KiW of RAM in physical address? If you bank them out using the MMU, how do you get them back?
Initially, the 64 KiW of internal memory are mapped into the entire physical address space. To restore the entire internal memory to the address space, you would do this:

Code:

    LOD r0,#$0000       ; r0 = $00 (MMU word address & loop counter)
    LOD r1,r0           ; r1 = $00 (used for mmu word 0/flags)
                        ;           flags are: SWE. TT.. hhhh hhhh
                        ;           S, W, E are all disabled.
                        ;           TT = 00, choosing banks from internal memory
                        ;           h is only used for TT = 10
    LOD r2,#$0001       ; r2 = $01 (used for increment - I need an inc opcode!)
    LOD r3,r0           ; r3 = $00 (used for mmu word 1/page idx; assumes r3 is free)
Loop:
    MMW r0,r1           ; write word 0 (flags) of the current MMU bank
    ADD r0,r2           ; equivalent to INC r0
    MMW r0,r3           ; write word 1 (page idx) of the current MMU bank
    ADD r0,r2
    ADD r3,r2           ; next internal memory page
    CMP r0,#$0020       ; loop until we write 20 words (2w per bank, 10 banks)
    BNE Loop
Last edited by pops on Wed Mar 12, 2014 5:42 pm, edited 1 time in total.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: YCPU: an imaginary 16-bit processor.

Post by lidnariq »

pops wrote: I know very little about hardware - and thus I'm not familiar with how 'physical address space' works. Again, my only real exploration of a processor/memory is the NES 6502 + a Mapper. It's no surprise, then, that my design of the YCPU's memory manager is very similar to how a MMC3 might be said to 'manage' memory pages.
So, on some level, an MMU is "nothing" more than a small bit of really fast memory that converts a few bits of address into more bits of address. The former is the "logical" address, and is what the CPU sees. The latter is the "physical" address, and is what the signals on the physical board would look like.

Alternatively, "logical" addresses are what the NES sees. "Physical" addresses are what the ROM sees.
pops wrote:Initially, the 64 KiW of internal memory are mapped into the entire physical address space. To restore the entire internal memory to the address space, you would do this:
Ok, so to translate your asm, the internal memory is mapped to the bottom 64 KiW of physical addresses (since you're using the 32 bit dword 0x00000000 and incrementing for it)
pops wrote: CMP r0,#$0020 ; loop until we write 20 words (2w per bank, 10 banks)
Your comment should probably be either "$20 words, $10 banks" or "32 words, 16 banks"
LOD r2,#$0001 ; r2 = $01 (used for increment - I need an inc opcode!)
One of the neat things I think I saw in the MSP430 instruction set is a constant generator: register encodings hardwired to produce the values 0, 1, 2, 4, 8 and -1.
Joe
Posts: 469
Joined: Mon Apr 01, 2013 11:17 pm

Re: YCPU: an imaginary 16-bit processor.

Post by Joe »

To extend upon what Tepples said, here's a diagram explaining one possible way your MMU could operate.

Code:

            +--------------+
            | Virtual addr |
            FEDCBA9876543210
            VVVVvvvvvvvvvvvv
            \  /\          /
             ||  |        |
             ||  |        |
+--------------+ |        |
|   MMU        | |        |
FEDCBA9876543210 |        |
iiiiiiiiiiiiiiii |        |
\              / |        |
/              \/          \
iiiiiiiiiiiiiiiivvvvvvvvvvvv
| 28-bit physical address  |
+--------------------------+
This MMU uses the high four bits of the virtual address to determine which page you want. It then uses the page index programmed for that page as the high 16 bits of the physical address, and passes the low 12 bits of the virtual address to the low 12 bits of the physical address.

Let's say I program MMU page 0x1 to point to 0xABCD, and then the CPU tries to access (virtual) address 0x1234. The CPU will be accessing physical address 0xABCD234.
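
The same translation as a minimal Python sketch, assuming the MMU is nothing more than a 16-entry table of 16-bit page indices (the names here are made up for illustration):

```python
PAGE_SHIFT = 12      # low 12 bits pass straight through
OFFSET_MASK = 0xFFF

def translate(page_table, virtual_addr):
    """Map a 16-bit virtual address to a 28-bit physical address."""
    page = (virtual_addr >> PAGE_SHIFT) & 0xF  # high four bits pick the MMU entry
    offset = virtual_addr & OFFSET_MASK
    return (page_table[page] << PAGE_SHIFT) | offset

page_table = [0] * 16
page_table[0x1] = 0xABCD

print(hex(translate(page_table, 0x1234)))  # 0xabcd234
```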

Others have suggested that the bits you label "T" and "h" would be used to add even more bits to the physical address space. In this way, RAM, ROM, and peripherals would be located at different 30-bit physical addresses, and you would need to program the MMU with the correct values to access them from a 16-bit virtual address.

Also, one thing has bothered me: if your CPU starts up with everything mapped to (empty) RAM, what will execute?
pops
Posts: 91
Joined: Sun Apr 04, 2010 4:28 pm

Re: YCPU: an imaginary 16-bit processor.

Post by pops »

With all the latest additions, I feel comfortable incrementing the version of the document to 0.2. I've updated the first post of this thread to the latest version of the specification. The specification is now 40kb in size, up from only 9kb in the initial draft. I've had a lot of fun fleshing it out, and I know there's a lot of work left to do. I'm very thankful to everyone who has provided feedback, pointed out my misunderstandings and errors or the document's ambiguities.

Thanks to Lidnariq, Joe, and Tepples for taking the time to explain virtual memory and physical addresses. I think I finally understand the concept - and I've modified my MMU spec to better fit what you're talking about. Physical address space is now 33 bits, with the 33rd bit selecting internal/external memory. The lower 12 bits select a word in a page, and the remaining 20 bits select a page within the specified memory. I've updated the memory management section of the spec to reflect these updates.
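
As a sanity check on the field widths (1 + 20 + 12 = 33 bits), here is a small Python sketch that packs and unpacks such an address. Which value of the top bit means "external" is my assumption for the sketch, and the function names are made up.

```python
OFFSET_BITS = 12
PAGE_BITS = 20
SELECT_BIT = OFFSET_BITS + PAGE_BITS  # bit 32, i.e. the 33rd bit

def physical_address(external, page, offset):
    """Pack (internal/external select, page, word offset) into 33 bits.
    Assumption: select bit set means external memory."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page must fit in 20 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 12 bits")
    return (int(bool(external)) << SELECT_BIT) | (page << OFFSET_BITS) | offset

def split_physical_address(addr):
    """Inverse of physical_address()."""
    external = bool((addr >> SELECT_BIT) & 1)
    page = (addr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return external, page, offset
```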
zzo38 wrote:I notice you can tell it to shift out zero bits. What happens in these cases?
Does LSR still always clear the negative flag if no bits are shifted out?
If ROL is used shifting nothing, will the carry flag copy the low bit (low four bits of the result of (16-0) is 0)?
If ROR is used shifting nothing, will the carry flag copy the high bit (low four bits of the result of (0-1) is 15)?
I've clarified all of these: on a zero bit shift, there is no operation, and no flags are changed. Thanks!
Also, one thing has bothered me: if your CPU starts up with everything mapped to (empty) RAM, what will execute?
That question has occurred to me as well. I suppose the easy answer would be to have the processor load a small ROM program which would POST and then look for bootable devices on the hardware bus. This would require that the MMU enable bit be set by the processor's boot routine.
lidnariq wrote:I don't really see an advantage of the 8086 software interrupt system.
I find that I agree with you. I'm reducing the size of the interrupt vector table to 16 entries, which includes just one 'software interrupt' vector, which is always called on SWI.

I've also added at least basic descriptions for the remaining instructions that have not yet been fleshed out.

I'm certain that it's only hubris to assume that anyone else would ever make use of this specification in their own project. That said, I want to make certain that should anyone ever want to build on it or use it in any way, that use should be unencumbered by any ambiguity as to copyright. With this in mind, it's my intention to release this specification to the public domain, as specified in http://creativecommons.org/publicdomain/.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: YCPU: an imaginary 16-bit processor.

Post by tepples »

Tiny licensing nit: I'd recommend Creative Commons Zero (CC0), which works even in countries whose copyright law lacks a concept of "donation of a work to the public domain" or "willful abandonment of copyright".
zzo38
Posts: 1080
Joined: Mon Feb 07, 2011 12:46 pm

Re: YCPU: an imaginary 16-bit processor.

Post by zzo38 »

The new ADI/SBI instruction is a very good idea. I suppose INC/DEC are then assembler macros for ADI/SBI?
pops wrote:I've clarified all of these: on a zero bit shift, there is no operation, and no flags are changed. Thanks!
It is good that you clarified it, although I disagree with that specification. It would be simpler to implement (no special cases), and more useful, to do something like this:
  • All bit shifts, with a shift count of zero, affect the negative and zero flag.
  • ASL, ASR, LSL, and LSR, with a shift count of zero, clear the carry flag.
  • RNL and RNR with a shift count of zero, do not affect the carry flag.
  • Not sure about ROL and ROR; perhaps they also should not affect the carry flag if the shift count is zero (like RNL and RNR).
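
A rough Python sketch of the difference between the two rules for a shift count of zero. The flag names N, Z and C, the 16-bit width, and the function itself are assumptions for illustration; ROL and ROR are treated like RNL and RNR here only because the list above leaves them open.

```python
CLEARS_CARRY = ("ASL", "ASR", "LSL", "LSR")

def zero_count_shift_flags(op, value, flags, rule):
    """Return the flags after shifting `value` by zero bits.

    rule = "current":  no operation, no flags change.
    rule = "proposed": N and Z follow the (unchanged) value,
                       ASL/ASR/LSL/LSR clear carry, others leave it alone.
    """
    new_flags = dict(flags)
    if rule == "current":
        return new_flags
    value &= 0xFFFF
    new_flags["N"] = bool(value & 0x8000)  # assumes N mirrors bit 15
    new_flags["Z"] = value == 0
    if op in CLEARS_CARRY:
        new_flags["C"] = False
    return new_flags
```

Under the proposed rule the shifter never needs a "count is zero" special case for N and Z: it always recomputes them from the result.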
I'm certain that it's only hubris to assume that anyone else would ever make use of this specification in their own project. That said, I want to make certain that should anyone ever want to build on it or use it in any way, that use should be unencumbered by any ambiguity as to copyright. With this in mind, it's my intention to release this specification to the public domain, as specified in http://creativecommons.org/publicdomain/.
I agree absolutely with this (although as tepples has stated, you may need CC0 instead).
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Re: YCPU: an imaginary 16-bit processor.

Post by psycopathicteen »

If you're going to have the MMU built into the chip, I still don't understand why you can't have the registers have their own selectable bank, since there's not going to be an absolute addressing mode. You can probably use two banks for the program counter, to avoid the need for a long jump instruction.
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: YCPU: an imaginary 16-bit processor.

Post by lidnariq »

psycopathicteen wrote:I still don't understand why you can't have the registers have their own selectable bank, since there's not going to be an absolute addressing mode.
How would you get the bank containing the MMU registers back if you banked it out? You'd either need a fixed bank, special instructions to address them, or a special instruction for "restore the MMU bank". But if you were to do the last, you may as well make it uniform.
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Re: YCPU: an imaginary 16-bit processor.

Post by psycopathicteen »

I thought the MMU registers were part of the register set. I guess I read it wrong.

EDIT:

Nope, I read it correctly. The MMU registers are separate from the logical memory space.

Code:

The MMU has 32 words of memory internally that describe which banks are loaded
into memory. The MMU memory is accessed with the MMR and MMW instructions, which
allow values to be read from and written into the MMU.

The syntax for these instructions is:

    MMW $00,r0      ; write the value of r0 into MMU memory word $00.
    MMR $1F,r1      ; read the value of MMU memory word $1F into r1
lidnariq
Posts: 10677
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: YCPU: an imaginary 16-bit processor.

Post by lidnariq »

What do you mean by "have their own selectable bank", then?