In that case, allow me to share the hardest-fought and strangest behaviors I've encountered. They're documented on my forums, so I don't know whether you've seen them before. Some of these took me weeks to figure out, so hopefully this will save you some trouble.
CPU interrupts: obviously, there's a two-stage pipeline, and the effect is that it looks as though interrupts are tested one cycle before the instruction ends. But there's a strange anomaly: if this test succeeds and an IRQ is pending, then for any opcode that is exactly two cycles long and whose second cycle is an I/O cycle (e.g. nop, clc, inc, asl, xce, etc.), that second I/O cycle turns into a bus read of the PC address (without incrementing PC, of course). If you're in a slow memory area, that read takes 8 clocks instead of the I/O cycle's 6, so the instruction takes two clocks longer to execute.
CPU DMA synchronization: this is just psychopathic. The DMA runs on a 1/8 clock rate, so when DMA starts, the CPU has to wait N cycles (where N > 0) until it's aligned to the DMA clock. Then you have the DMA setup, then the transfers, and then it has to sync back to the CPU. To do that, it takes the length of the last CPU cycle (6, 8 or 12) and sleeps N cycles (where N > 0) until the total DMA time is a whole multiple of that cycle's length (e.g. for 6, the total DMA time must be 6, 12, 18, 24, ...).
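To make the two alignment steps concrete, here is a minimal sketch of the arithmetic as I read the description above. The function names are made up for illustration, and it assumes all counts are in master clocks and that the "total DMA time" includes the final sleep; neither detail is spelled out in the post.

```python
def wait_until_multiple(elapsed: int, period: int) -> int:
    """Clocks to wait so that elapsed + wait is a multiple of period.

    The wait is always greater than zero: if already aligned,
    a full period is waited (this models the N > 0 rule).
    """
    return period - (elapsed % period)


def dma_start_wait(cpu_clock: int) -> int:
    """Wait before DMA setup, aligning to the 1/8-rate DMA clock."""
    return wait_until_multiple(cpu_clock, 8)


def dma_end_wait(dma_clocks: int, last_cpu_cycle: int) -> int:
    """Wait after the transfers, syncing back to the CPU.

    last_cpu_cycle is the length of the last CPU cycle: 6, 8 or 12.
    """
    return wait_until_multiple(dma_clocks, last_cpu_cycle)


if __name__ == "__main__":
    # 20 clocks of DMA after a 6-clock cycle: sleep 4 for a total of 24.
    print(dma_end_wait(20, 6))
```

Note that an already-aligned count still waits a full period under this reading, which is what N > 0 seems to imply.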
CPU ALU: I would highly suggest looking at my source for this one. The multiply and divide operations work over 8 or 16 CPU cycles to slowly compute the result. Reading the result early returns partially computed values. The behavior becomes psychopathic if you try doing a DIV during an active MUL or vice versa.
HDMA MDR: so Speedy Gonzales tends to freeze in level 6-1. It turns out the game is bugged, and gets stuck in an infinite loop reading a non-existent register. The open bus value it reads will never satisfy the loop condition to exit. But it works on hardware. What happens is that HDMA continues to trigger, and eventually an HDMA will trigger right before that open bus read, and the HDMA read will update the MDR. This happens enough times, and eventually HDMA reads a value that can break out of the loop.
HDMA ordering: the expected order of operations used to be:
for(channel 0 to 7): transfer, update_indirect_address
But this is wrong. It is actually:
for(channel 0 to 7): transfer
for(channel 0 to 7): update_indirect_address
Doing it the latter way ensures that all HDMA transfers happen inside Hblank, even for the theoretical worst-case HDMA of eight channels and eight indirect address fetches.
HDMA early termination: this one's crazy. If the very last HDMA channel that terminates HDMA for the entire frame happens to use indirect addressing, it only performs a single fetch. So the indirect address becomes (low << 8), with the lower eight bits cleared.
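A small sketch of what that means for the address value. The names here (`hdma_indirect_address`, `read_byte`, `terminating_last_channel`) are hypothetical and not taken from any emulator source; the normal case assumes the usual little-endian two-byte fetch.

```python
def hdma_indirect_address(read_byte, terminating_last_channel: bool) -> int:
    """Return the 16-bit indirect address for one HDMA channel.

    read_byte is a hypothetical callable returning the next table byte.
    """
    first = read_byte()
    if terminating_last_channel:
        # Only one fetch happens: the byte lands in the upper eight
        # bits and the lower eight bits are cleared.
        return (first << 8) & 0xFFFF
    second = read_byte()
    # Normal case: two fetches, low byte first, then high byte.
    return first | (second << 8)


def address_from_bytes(table, terminating_last_channel: bool) -> int:
    """Convenience wrapper that reads from a list of byte values."""
    source = iter(table)
    return hdma_indirect_address(lambda: next(source), terminating_last_channel)
```

With table bytes 34h, 12h, the normal case gives 1234h and the early-termination case gives 3400h.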
PPU long dots: dots 322 and 326 last for 6 cycles instead of 4. Even more bizarre is that different models act differently. Sometimes it's dots 321 and 325; sometimes it's dots 323 and 327. On some systems it varies each time you test. This doesn't apply to the NTSC interlace field scanline 240 that is missing one dot. This also doesn't affect CPU IRQ timing, since the CPU has its own shadow H/V counter based off the PPU H/V signals being fed to it.
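As a quick check on the arithmetic, here is a sketch that sums dot lengths over a normal scanline. It assumes 340 dots per scanline (numbered 0 to 339), which is not stated above, and the function names are made up for illustration.

```python
DOTS_PER_SCANLINE = 340  # assumption: dots are numbered 0..339


def dot_clocks(dot: int, long_dots=(322, 326)) -> int:
    """Length of one dot in clocks: 6 for the long dots, 4 otherwise."""
    return 6 if dot in long_dots else 4


def scanline_clocks(long_dots=(322, 326)) -> int:
    """Total clocks in a normal scanline."""
    return sum(dot_clocks(d, long_dots) for d in range(DOTS_PER_SCANLINE))


def dot_start_clock(dot: int, long_dots=(322, 326)) -> int:
    """Clock offset within the scanline at which the given dot begins."""
    return sum(dot_clocks(d, long_dots) for d in range(dot))


if __name__ == "__main__":
    print(scanline_clocks())            # default long dots
    print(scanline_clocks((321, 325)))  # one of the model variations
```

Under that assumption the line comes to 338 * 4 + 2 * 6 = 1364 clocks, and moving the long dots to 321/325 or 323/327 only shifts where dots after them begin, not the total.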
PPU VRAM writes: if you write to VRAM during the last possible cycle before writes are completely blocked and disabled, an odd thing happens: it writes the CPU MDR into VRAM instead of the actual bus value. Two cycles before that (since you can only step by 2), it writes normally.
CPU $2180: this reads WRAM in 6 clocks instead of 8 clocks. Even if you exploit the system to read from $2180 twice in a row, both reads are still valid. My guess is the CPU internally buffers the WRAM twice, because if WRAM could really be read in 6 clocks, they would have made the system work that way everywhere. God knows the CPU needs all the speed it can get.
CPU differences: rev1 has a fixed position to trigger DRAM refresh. On rev2 it alternates between two positions (it uses another clock / 8 thing like DMA). rev1 and rev2 also differ in how the HDMA initialization start time is computed. Obviously rev2 fixes the DMA<>HDMA conflict that crashes rev1, but it's hard to emulate the rev1 crash since, obviously, tests of it crash the system :P
SMP test register: stay far away from this one. You can control the CPU scalar for instructions, enable/disable RAM reading, RAM writing, MMIO register accesses, and do really weird things to the timers that we don't fully understand yet. This register is a rabbit hole that's broken me, anomie, and blargg.
Probably half of that is required for at least one game. The other half not so much.
send(A3h), recv(E0h)
send(addr0), recv(anything)
send(addr1), recv(anything)
send(addr2), recv(anything)
send(addr3), recv(anything)
repeat 128 times: send(80h..01h), recv(data)
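For reference, the sequence above could be sketched like this. It rests on several assumptions that the listing does not confirm: each send/recv pair is treated as one full-duplex byte exchange through a hypothetical `xfer(byte)` callable, addr0..addr3 are taken to be the address in little-endian order, and "80h..01h" is read as a counter running down from 80h to 01h.

```python
def read_block(xfer, addr: int) -> bytes:
    """Read one 128-byte block using command A3h.

    xfer is a hypothetical callable: it sends one byte and returns
    the byte received in the same exchange.
    """
    if xfer(0xA3) != 0xE0:
        raise IOError("unexpected reply to command A3h")
    # Address bytes, low byte first (assumed); replies are ignored.
    for shift in (0, 8, 16, 24):
        xfer((addr >> shift) & 0xFF)
    # 128 exchanges, sending 80h down to 01h and collecting the data.
    return bytes(xfer(counter) for counter in range(0x80, 0x00, -1))


# Stand-in transport for trying the function without any hardware.
sent = []
_replies = iter([0xE0, 0, 0, 0, 0] + list(range(128)))


def fake_xfer(byte: int) -> int:
    sent.append(byte)
    return next(_replies)


data = read_block(fake_xfer, 0xA0000000)
```

The fake transport only records what was sent and plays back canned replies; a real one would talk to the chip.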
Ah, I was doing it wrong, then. I was using:
send A3
send addr (little-endian order)
wait a while (for chip to get ready)
receive 128 times
On that note, my f3/f4 dumps were similar:
send F3/F4
wait a while (for chip to get ready)
receive 128K/32K bytes
I'll give this a shot if we need to know more about the memory map, thanks.
The 32 Kbyte block dumped from A0000000 (command F4) should be RAM, like you said.
How do you know it's from a0000000? Just curious.
Given you know the commands too, I assume you found a debug mode that prints this info out onscreen?
Both no$gba and VisualBoyAdvance have an ARMv4 core, and DS emulators have an ARMv5 core.
Both the MESS team and nocash have a huge head start on me, since I haven't written an ARM CPU emulator yet. I probably won't be the first to get this game running, unless they are both uninterested in it. But then this isn't a competition.