65816 really 16 bit or just a 6502 with 16 bit registers
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
psycopathicteen wrote:
> I'm surprised nobody made a dirt cheap 16-bit RISC cpu. I'm thinking maybe having 8 24-bit registers, with most ALU instructions being 16-bit, but some being 24-bit.
Nintendo could have done a custom 65816 with a 16-bit bus, thrown out the 6502 compatibility (which was becoming useless), and given it 1-cycle WRAM/ROM access.
tomaitheous
- Posts: 592
- Joined: Thu Aug 28, 2008 1:17 am
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
Bregalad wrote:
> > No SPC700 or 64KB of audio ram, just have the 65816 handle audio...
> Considering how much the 65c816 is criticized for not being fast enough, I cannot imagine how poor it'd be at rendering audio. The 16 MHz ARM on the GBA is already the bare minimum, and most games have audio engines of poor quality.
Hardly the bare minimum. Just for clarification, I'm pretty sure he meant the cpu handling writing to the DSP registers, not generating the waveforms itself. But that said, I've done 4 channels, frequency scaled with volume control, on the PC Engine at 35% cpu resource, all in software (and that's a 7.16mhz 8bit 65x). A simple set of 24bit fixed point auto increment regs/pointers in hardware on the cart would cut that down to 8% cpu resource. No need for a 16mhz arm chip.
The Genesis doesn't have a true 16bit data bus to the VDP, although it acts like it. I ran into this problem when setting odd values for the autoincrement settings (the two bytes of a word going to different places in video memory). The VDP access to vram itself is 8bit as well. My point being, none of that makes the slightest bit of difference.
Pointing out the inferiority of the SNES' 8bit data bus on the main cpu is about as irrelevant. All it really does is show ignorance of how the system works. The DMA is 8bit and damn fast at that. This whole "is the cpu 16bit or not" debate is a hardware engineering perspective vs a software engineering perspective. The 65816 is a 16bit processor from both perspectives, while the 68000 is 16bit from a hardware perspective and 32bit from a software perspective. I have no problem calling the 68000 a 32bit processor, having written enough code for it (Genesis), but it's the most underwhelming pathetic 32bit processor I've used. The ISA might be a dream to code with, but the thing is just so slow. A "crippled" '816 at less than half the frequency comes relatively close to it, and a hyper 8bit 7.16mhz processor often matches it. Not impressed.
Even when the SNES is running in 3.58mhz mode, it isn't really. Registers are essentially in ram (WRAM: address vectors) and the processor relies heavily on using ram - it's closer to ~3.1mhz. Try down clocking the Genesis at half speed (3.85mhz) and it would choke. The SNES is doing more with less. A 7.67mhz 65816 with no delays would exceed the 68000 in these consoles. A 65816 with a 16bit bus.... would smack the 68000 around like a little biatch. Crazy that they never made one.
__________________________
http://pcedev.wordpress.com
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
Genesis does what Nintendon't, and it has nothing to do with the data bus. The data bus is largely a wash, as the 68000 only accesses memory every four cycles. The real difference is Times. Not cycle times or The New York Times or Times New Roman, but *. The 68000 has a hardware 16x16=32 multiplier. The 65816 doesn't, and the multiplier in the 5A22 built around it is only 8x8. Hardware multiply and divide make basic 3D practical even without DSP or GSU.
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
tomaitheous wrote:
> I'm pretty sure he meant the cpu handling writing to the DSP registers, not generating the waveforms itself.
Yes. I'm pretty sure the SPC700 doesn't actually generate waveforms, which is why I don't really think it's needed, but of course I have no clue how sound hardware works. It seems like you could have just wired up the DSP to a bus from the cartridge. I would just say main ram, but doesn't it need to constantly pull data to generate waveforms? If the DSP were using main ram, I imagine the CPU couldn't at the same time.
tomaitheous wrote:
> Pointing out the inferiority of the SNES' 8bit data bus on the main cpu is about as irrelevant.
Yeah, I don't get the whole data bus thing. It doesn't matter if a cpu has a data bus that is twice as large if it only pulls from ram half as often, or if the clock speed is halved or whatever else. I actually kind of like the thought of a small data bus, because it's not wasteful: if you have a 16 bit data bus and use an 8 bit instruction, you're wasting a theoretical cycle vs if the data bus were 8 bit, if that makes sense. However, I think arguments like "if the 65816 ran at 7.18MHz or the 68000 at 3.58MHz" are kind of silly, because I imagine getting the 65816 to run at 7.18MHz would be a lot more difficult and expensive than getting the 68000 to go at that speed. It would be more powerful, which drives up the cost 90% of the time.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
The normal multiplier is u8 * u8 -> u16, and requires waiting 8 CPU cycles, regardless of waitstates (it uses the CPU clock), before you can read the result.
However, there's also the mode 7 multiplier, which is i16 * i8 -> i24, and can be used when you have mode 0-6 set, or during blanking in mode 7. It's also significantly faster, letting you read the result immediately after writing the operands.
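To make the operand widths above concrete, here is a minimal Python sketch of just the arithmetic the two multipliers perform. It does not model the register interface or the timing, and the function names are made up for this example.

```python
def mul_u8(a: int, b: int) -> int:
    """Normal multiplier: unsigned 8 x unsigned 8 -> unsigned 16."""
    return (a & 0xFF) * (b & 0xFF)


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mul_mode7(a16: int, b8: int) -> int:
    """Mode 7 multiplier: signed 16 x signed 8 -> signed 24 (raw 24-bit result)."""
    product = sign_extend(a16, 16) * sign_extend(b8, 8)
    return product & 0xFFFFFF
```

For instance, `mul_mode7(0xFFFF, 0x02)` is -1 times 2, which comes back as the raw 24-bit pattern `0xFFFFFE`.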
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
calima wrote:
> Excellent! You're one of the few people who care about portability, as it is, there are practically no good Gen emulators for Linux, especially non-32-bit x86.
Eh, BlastEm is getting there (some games still broken due to random bugs, but most should work) (・・ ) (though requires 64-bit x86)
psycopathicteen wrote:
> I'm surprised nobody made a dirt cheap 16-bit RISC cpu. I'm thinking maybe having 8 24-bit registers, with most ALU instructions being 16-bit, but some being 24-bit.
RISC gained traction when 32-bit was already common. Note that there are 32-bit RISC CPUs running off a 16-bit bus and with 16-bit opcodes (the SH2s on the 32X come to mind since if I recall correctly their bus is 16-bit, and the ARM in the GBA has Thumb mode as well as a 16-bit bus on everything but a portion of RAM).
tepples wrote:
> Genesis does what Nintendon't, and it has nothing to do with the data bus. The data bus is largely a wash, as the 68000 only accesses memory every four cycles. The real difference is Times. Not cycle times or The New York Times or Times New Roman, but *. The 68000 has a hardware 16x16=32 multiplier. The 65816 doesn't, and the multiplier in the 5A22 built around it is only 8x8. Hardware multiply and divide make basic 3D practical even without DSP or GSU.
No, it didn't. Multiply is a really slow operation on the 68000, and division even more so. On top of that, they mess with raster effects, since the 68000 can't process interrupts until the current instruction is finished (this is particularly bad for division, since each division eats up about 1/3 of a scanline's time). You really want to avoid them at all costs... but at least you can get away without them for 3D if you really need to. But the 68000 is still too slow for 3D precisely because it's damn too slow at memory accesses (nearly all the time is spent on filling polygons, and this requires an absurd amount of memory accesses, in addition to the time spent transferring this to video memory once done).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
The 68k multiply and divide timings are also clearly microcoded, and take variable time:
M68000 User's Manual wrote:
> DIVS, DIVU: The divide algorithm used by the MC68000 provides less than 10% difference between the best- and worst-case [158 or 140 clocks respectively] timings.
> MULS, MULU: The multiply algorithm requires 38+2n clocks where n is defined as:
> MULU: n = the number of ones in the <ea>
> MULS: n = concatenate the <ea> with a zero as the LSB; n is the resultant number of 10 or 01 patterns in the 17-bit source; i.e., worst case happens when the source is $5555.
This means the Genesis can do approximately 110k-190k 16×16→32 multiplications per second.
The SNES's mode 7 multiplier would be so I/O bound (instead of internal processing limited) that I'm having a hard time making a fair comparison.
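The 38+2n rule quoted from the manual can be worked through in a few lines. This is only a sketch of that formula: it ignores effective-address calculation time, and `CPU_HZ` and the function names are assumptions for illustration (the ~7.67 MHz figure is the clock mentioned elsewhere in this thread).

```python
CPU_HZ = 7_670_000  # assumption: ~7.67 MHz 68000 clock, as quoted in this thread


def mulu_clocks(src: int) -> int:
    """MULU: n is the number of one bits in the 16-bit source operand."""
    n = bin(src & 0xFFFF).count("1")
    return 38 + 2 * n


def muls_clocks(src: int) -> int:
    """MULS: append a zero LSB, then count 10/01 patterns in the 17-bit value."""
    x = (src & 0xFFFF) << 1
    # Bit i of x ^ (x >> 1) is set when bits i and i+1 of x differ (16 pairs).
    n = bin((x ^ (x >> 1)) & 0xFFFF).count("1")
    return 38 + 2 * n


def per_second(clocks: int) -> int:
    """Back-to-back instructions per second at CPU_HZ, with no other overhead."""
    return CPU_HZ // clocks
```

With these definitions, `muls_clocks(0x5555)` and `mulu_clocks(0xFFFF)` both give the 70-clock worst case, and `per_second(70)` lands near the 110k lower figure. The bare 38-clock best case works out to roughly 200k, so the quoted 190k upper bound presumably leaves some room for operand fetches.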
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
lidnariq wrote:
> I'm having a hard time making a fair comparison.
Uh, what? I imagine that dealing with the mode 7 multiplication registers, you're only doing loads and stores. LDA can be from 2-7 cycles (8 cycles for 16 bit) and STA can be from 3-7 cycles (8 for 16 bit). Just see the number of times you'll do either, and there you'll have it.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
tomaitheous wrote:
> Pointing out the inferiority of the SNES' 8bit data bus on the main cpu is about as irrelevant. All it really does is show ignorance of how the system works. The DMA is 8bit and damn fast at that. This whole "is the cpu 16bit or not" debate is a hardware engineering perspective vs a software engineering perspective. The 65816 is a 16bit processor from both perspectives, while the 68000 is 16bit from a hardware perspective and 32bit from a software perspective. I have no problem calling the 68000 a 32bit processor, having written enough code for it (Genesis), but it's the most underwhelming pathetic 32bit processor I've used. The ISA might be a dream to code with, but the thing is just so slow. A "crippled" '816 at less than half the frequency comes relatively close to it, and a hyper 8bit 7.16mhz processor often matches it. Not impressed.
> Even when the SNES is running in 3.58mhz mode, it isn't really. Registers are essentially in ram (WRAM: address vectors) and the processor relies heavily on using ram - it's closer to ~3.1mhz. Try down clocking the Genesis at half speed (3.85mhz) and it would choke. The SNES is doing more with less. A 7.67mhz 65816 with no delays would exceed the 68000 in these consoles. A 65816 with a 16bit bus.... would smack the 68000 around like a little biatch. Crazy that they never made one.
Oh sorry, I can't let that pass... Do you in fact understand how hardware works before accusing others of ignorance? Are you still one of those comparing CPUs based on their frequency... really? Arguing that a 7.67 MHz 65816 can do more than a 7.67 MHz 68000 (and of course it does) just shows your ignorance and is totally irrelevant... The problem is all about memory speed. Please tell me how you could have had 7 MHz ROM (or even 5 MHz) in 1990 with good capacity and at a reasonable price? Of course the 8-bit bus of the SNES is a pain in the butt, and of course the 65x0/65816 architecture is really inefficient in terms of what it can do with a given memory speed compared to the 68000, and that is what actually mattered back then, as memory (ROM and RAM) speed/cost was a major deal.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
Sik wrote:
> But the 68000 is still too slow for 3D precisely because it's damn too slow at memory accesses (nearly all the time is spent on filling polygons, and this requires an absurd amount of memory accesses, in addition to the time spent transferring this to video memory once done).
Not that slow. Of course you can't do advanced 3D at all, but you can get something not far from what the SNES did with an SFX:
https://www.youtube.com/watch?v=YUZpF2JLF4s
I ran benchmarks with my 3D maths methods in SGDK and you can barely do a complete 3D transformation of about 10000 vertices / second (that is, including 2D projection, so 11 multiplications + 11 additions + 1 division per vertex). If you spend 30% of your CPU time on it, that leaves you about 3000 vertices / second, or 200 vertices / frame (for a 15 FPS game). Not sure Starfox on SNES pushed more than that.
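The budget arithmetic above can be written out as a small Python sketch. The function name and parameters are invented for this example, and the inputs are simply the figures from the post.

```python
def vertex_budget(full_cpu_vertices_per_sec: int, cpu_percent: int, fps: int):
    """Return (vertices per second, vertices per frame) for a share of CPU time."""
    per_second = full_cpu_vertices_per_sec * cpu_percent // 100
    per_frame = per_second // fps
    return per_second, per_frame


# Figures from the post: ~10000 vertices/s at full CPU, 30% share, 15 FPS.
budget = vertex_budget(10_000, 30, 15)
```

Changing `fps` shows how quickly the per-frame budget shrinks at higher frame rates.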
Also I don't understand why you say that the "68000 is damn too slow at memory accesses" O_o? Compared to what? In fact I would say the exact opposite. The 68000 is quite efficient at memory operations given its bus speed... And that "starfox demo" shows it (not bad for a CPU with < 2 MHz memory operation speed).
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
> On top of that, they mess with raster effects
... so I take it I'm going to have to implement the VDP as a dot-based pixel renderer, instead of a scanline-based renderer, eh? :P
Not that I was planning on doing a scanline-based renderer anyway ...
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
I think you don't need a dot-based renderer to get into trouble with raster effects... If a 68k division happens right before an H-Int, then H-Int processing is postponed until after the division (let's say about 130 cycles too late). Depending on what you are doing in the H-Int callback, you can definitely miss the period where the VDP is fetching the next scanline's data (and that is the case even with a scanline-based renderer).
You will need dot-based rendering anyway if you plan to support that bitmap DMA mode.
I just dug into my old Gens (rewrite) sources, and it looks like I tried to implement all basic timing with a global event system with a centralized timer (split into global / current frame / current scanline timers). The basic idea was to push all upcoming relevant events for a whole scanline (H Blank flag change, VDP SAT prefetch, VDP line rendering, h-int trigger...), then execute CPU (all CPUs) cycles for 1 scanline, with the event handler system splitting the cycle slices accordingly so events occurred when expected. The idea was to use that framework to emulate the Sega Saturn as well, as it heavily relies on SMP.
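A rough Python sketch of that per-scanline slicing idea might look like the following. It is not taken from the Gens sources: the cycles-per-line figure, the event names, and the cycle offsets are all placeholders chosen for illustration.

```python
CYCLES_PER_LINE = 488  # assumption: rough 68000 cycles per scanline, illustration only


def run_scanline(events, run_cpu, fire):
    """Run one scanline, splitting CPU execution at each event offset.

    events  -- iterable of (cycle_offset, name) pairs within the line
    run_cpu -- callable taking a cycle count (stand-in for running all CPUs)
    fire    -- callable taking (name, cycle_offset) when an event is due
    """
    cursor = 0
    for offset, name in sorted(events):
        if offset > cursor:
            run_cpu(offset - cursor)  # execute only up to the next event
            cursor = offset
        fire(name, offset)
    if cursor < CYCLES_PER_LINE:
        run_cpu(CYCLES_PER_LINE - cursor)  # finish the rest of the line


# Hypothetical events for one line; the offsets are made up.
slices, fired = [], []
run_scanline(
    [(400, "hblank_flag"), (100, "sat_prefetch"), (400, "hint")],
    slices.append,
    lambda name, at: fired.append((name, at)),
)
```

Here the CPU runs in slices of 100, 300 and 88 cycles, and both events scheduled at offset 400 fire before the last slice.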
Last edited by Stef on Sat Jul 30, 2016 10:43 am, edited 1 time in total.
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
Stef, you are always talking about costs, but there is a big margin between 2.58 and 7.67 MHz.
With a simple 1-cycle RAM/ROM access like the HuC6280, you can go above 6 MHz and still use the ROM/RAM speed used in the MD (150 ns).
For RAM there was no need for 128 KB; 32/64 KB would have been enough, and as the cherry on the cake, you can boost your DMA too.
For me the SA-1 is a good example that costs are not as high as we think: a 10 MHz 65816 + embedded RAM + all the other features within a simple cartridge.
I even read that the Z80 RAM on the MD was 100 ns (at least on the VA0 revision).
http://segaretro.org/Mega_Drive_PCB_revisions
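The relation between memory access time and clock speed used in this argument can be sketched in Python as below. This is an idealized upper bound that assumes the whole bus cycle is available to the memory, which real hardware does not quite allow; the function names are invented for the example.

```python
def max_clock_mhz(access_time_ns: float) -> float:
    """Idealized max CPU clock when every cycle is one memory access."""
    return 1000.0 / access_time_ns


def access_rate_mhz(clock_mhz: float, cycles_per_access: int) -> float:
    """Memory accesses per microsecond for a CPU that needs several cycles per access."""
    return clock_mhz / cycles_per_access
```

Under that assumption, 150 ns memory allows about 6.67 MHz with 1-cycle access, while a 7.67 MHz CPU that accesses memory every four cycles only asks for about 1.92 million accesses per second.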
Re: 65816 really 16 bit or just a 6502 with 16 bit registers
Stef wrote:
> I just dig in my old Gens (rewrite) sources, it looks like i tried to implement all basic timing with a global event system. The basic idea was to push all incoming relevant events for a whole scanline (H Blank flag change, VDP SAT prefetch, VDP line rendering, h-int trigger...) then i was executing CPU (all CPU) cycles for 1 scanline and the event handler system was splitting cycles slices according so events occurred when expected. The idea was to use that framework to emulate the Sega Saturn as well as it heavily relies on SMP.
Yeah, I think that could really work well for the Genesis, being based off a single oscillator.
In many ways, I am starting to feel MAME's pain with supporting more and more systems. It's extremely rewarding intellectually, but my pride and strive for perfection really take a beating.
Anyway ... I would strongly recommend you look into binary min-heap arrays for this. Here's my implementation for reference: http://hastebin.com/raw/qurokahane
If you use this as a priority queue, it's pretty miraculous. The idea is, any time you know something is going to happen in N cycles, where N can be any number of cycles you want ... you can add it to the queue in logarithmic time. And whenever an event triggers, you can remove it in logarithmic time too. But the real magic that makes it so great ... as time passes, you can advance the queue by N cycles and trigger callback events in constant(!!) time ... which boils down to one compare.
My version above uses a trick to avoid having to normalize the queue periodically to avoid overflow. Makes it a bit harder to read, but easier to use since you never have to worry about that case.
So instead of having an add_cpu_cycles(uint N) loop that has to test if we need to fire an IRQ, an NMI, a DMA event, run the ALU, or do a bunch of other things like that ... you can test every single possible event with just one compare.
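For readers who want to see the shape of the idea, here is a minimal Python sketch of a min-heap event queue built on the standard `heapq` module. It is not the linked implementation: the class and method names are made up, and since Python integers do not overflow, it needs no normalization trick at all.

```python
import heapq
import itertools


class EventQueue:
    """Min-heap of (due_cycle, sequence, callback), ordered by due_cycle."""

    def __init__(self):
        self._heap = []
        self._now = 0
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        """Queue callback to fire `delay` cycles from now (logarithmic time)."""
        heapq.heappush(self._heap, (self._now + delay, next(self._seq), callback))

    def step(self, cycles):
        """Advance time; when nothing is due, this costs one compare."""
        self._now += cycles
        while self._heap and self._heap[0][0] <= self._now:
            when, _, callback = heapq.heappop(self._heap)
            callback(when)
```

A CPU core would then call `step(n)` after each instruction instead of testing for IRQ, NMI, DMA and so on one by one.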
There may be better data structures than binary min-heap for this, but I loved the simplicity of it. It's very rare that I'm able to implement algorithms when described by mathematicians.
Anyway, a Gens reboot sounds pretty awesome! Gens was always my favorite Genesis emulator (sorry Steve, but I don't use closed source stuff) ... would be fun to talk shop with you sometime in the future after I learn a lot more :D
...
As for the Saturn, that's my ultimate dream console to emulate. But short of a 100-fold increase in processing power before I reach 40, I'm not going to attempt it. It would require too many accuracy sacrifices and nothing kills my enjoyment of emu coding more than that =(