I know I'm a bit of an outlier here with using cooperative-threading to design emulators, and that most will use state machines that run each chip the minimum amount of time, but I ran into a rather interesting challenge with threads, and was curious if anyone here had some input.
The Sega Genesis + Sega CD turns out to be a bit of a nightmare of a design to emulate.
You have two 68Ks, a Z80, a VDP graphics chip, PSG audio, YM2612 audio. And tons of components in the Sega CD that I basically roll into its 68K: a graphics rotozoom ASIC, CD controller, CD driver, CD-DA playback, etc. Then there's controllers that may need threads of their own, cartridges as well, etc.
The way my scheduler works is, every time a thread is running, and it does something that takes time, I increment that thread's current clock cycle. There's some extra math in there to scale the value so that each thread can run at different clock rates.
Many chips share RAM and resources, and so you need to make sure the current thread doesn't perform an operation that could affect the other chips while it's ahead in time. So every time the CPU accesses a PSG register, it has to catch up the Z80 *and* the PSG, since the Z80 can also access the PSG. The same for the YM2612. In fact, the Z80 can access almost the entire bus of the 68K, and vice versa, so it ends up being a lot of synchronization.
The naive approach looks like this:
Code: Select all
CPU::step(uint clocks) {
clock += clocks;
//resume(thread) is a thread switch:
//after resume(cpu) later, CPU code execution continues here:
if(clock >= apu.clock) resume(apu);
if(clock >= psg.clock) resume(psg);
if(clock >= ym2612.clock) resume(ym2612);
}
APU::step(uint clocks) {
clock += clocks;
if(clock >= cpu.clock) resume(cpu);
if(clock >= psg.clock) resume(psg);
if(clock >= ym2612.clock) resume(ym2612);
}
YM2612::step(uint clocks) {
clock += clocks;
if(clock >= cpu.clock) resume(cpu);
if(clock >= apu.clock) resume(apu);
}So notice how the YM2612 can't modify the PSG. As a result, it doesn't need to sync the PSG right now and can happily keep running even if it's ahead of the PSG, so long as it doesn't run so long that the PSG runs out of buffered audio samples.
The design works well, but with one glaring flaw: let's say the APU runs for a few clock cycles, and calls step. It is currently behind the CPU, but it is ahead of the YM2612. So it switches to it. The YM2612 runs for quite a while (my core is sample-based, so it eats 144 clock cycles per sample), and now it's quite a ways ahead of the CPU, so then it switches over to the CPU. BUT!! The CPU is still behind the APU, and now the CPU goes on to perform a write that affects the APU.
The naive solution to that is: every time any thread is about to do something that would influence another processor, we scan through every single thread, and find the thread with the smallest clock value, and if that's not the current thread, switch to it.
Code: Select all
Scheduler::synchronize() {
uint minimum = ~0;
Thread* context = nullptr;
for(auto thread : allThreads) {
if(thread->clock < minimum) {
minimum = thread->clock;
context = minimum;
}
}
//should never happen, don't check in release builds
assert(context);
if(context != currentThread) resume(*context);
}My initial thought was something like a binary min-heap array for the threads, so I can find min(list) in O(1) time, but that doesn't solve the problem with synchronizing threads that aren't important.
I can also unroll the synchronization logic, for instance in the YM2612's case:
Code: Select all
YM2612::step(uint clocks) {
clock += clocks;
if(clock >= cpu.clock) {
if(clock >= apu.clock) {
resume(apu.clock >= cpu.clock ? apu : cpu);
} else [
resume(cpu);
}
} else if(clock >= apu.clock) {
resume(apu);
}
}So right now, the naive approach seems to be the most likely to work. But as it's way too slow, I was curious if anyone had better ideas?