What exactly are T-states doing?
Re: What exactly are T-states doing?
If I had to make a wild guess, I'd guess that it would finish executing the CB instruction instead of handling the interrupt. This is just a wild guess.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
Re: What exactly are T-states doing?
If someone cared to test (and had a flash cart to do it with), it should be simple enough to set up a test in which an interrupt triggers just after the CB prefix is read, then watch the address bus to see whether the next memory access goes to the stack (to push PC) or fetches the next instruction. The program could also push AF and inspect the saved F value to check whether any of the unused bits are set.
Quote: If I had to make a wild guess, I'd guess that it would finish executing the CB instruction instead of handling the interrupt. This is just a wild guess.
I did a quick test with something like this:
Code:
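ld a, $F0   ; A = $F0, so SWAP A should leave $0F
ei          ; IME turns on only after the next instruction completes
swap a      ; encoded as $CB $37: if $CB counted as its own instruction, a pending interrupt could be taken before $37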
On a Z80, this technique can be used to reveal prefix opcodes, but on the GB it just executes SWAP A as usual.
I'll add this test ROM to the mooneye test suite once I tidy up things a bit and verify it on all devices.
Quote: If CB flag is in the status flags (and POP itself doesn't clear it), the NOP should be interpreted as RLC B, so B will be 3 now.
Mooneye GB test acceptance/bits/reg_f confirms that the low bits of F are not usable. Also, the test I mentioned above would show something special in F if CB were a status flag.
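To make the prefix question concrete, here is a minimal Python sketch of CB-prefixed decoding, assuming the standard SM83 opcode table layout. `decode_cb` and `fetch` are hypothetical helpers, not Mooneye GB code. Treating `$CB` and the byte after it as one fetch unit is the behaviour the SWAP test suggests; if CB were instead a latched flag that survived the interrupt, the `$00` (NOP) after it would decode as RLC B, which is what `decode_cb(0x00)` returns.

```python
# Assumed table layout: bits 7-6 select the group, bits 5-3 the operation
# (or bit index), bits 2-0 the operand register.
REGS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
ROTATES = ["RLC", "RRC", "RL", "RR", "SLA", "SRA", "SWAP", "SRL"]


def decode_cb(op: int) -> str:
    """Return the mnemonic for the byte that follows a $CB prefix."""
    reg = REGS[op & 0x07]
    sub = (op >> 3) & 0x07
    group = op >> 6
    if group == 0:
        return f"{ROTATES[sub]} {reg}"
    return f"{('BIT', 'RES', 'SET')[group - 1]} {sub},{reg}"


def fetch(mem: bytes, pc: int) -> tuple[str, int]:
    """Fetch one instruction. $CB and its operand byte are consumed together,
    so an interrupt check can only happen before or after the pair."""
    if mem[pc] == 0xCB:
        return decode_cb(mem[pc + 1]), pc + 2
    return f"opcode ${mem[pc]:02X}", pc + 1
```

For example, `fetch(bytes([0xCB, 0x37, 0x00]), 0)` yields `("SWAP A", 2)`.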
Re: What exactly are T-states doing?
Just some comments about the questions on your "emulation accuracy" page.
Quote: What happens if the CPU accesses memory during OAM DMA?
I believe someone tested this and found that on the DMG, reads (presumably including opcode fetches) return the byte currently being DMAed. On the GBC, external WRAM (i.e. $C000-$DFFF) is on a separate physical bus from the cartridge slot, probably because of the WRAM bankswitching. If you read WRAM during DMA it has the same effect as on the DMG, but you can apparently run code from ROM normally while DMAing from WRAM (the Wizardry Famicom remakes do this--they don't bother copying their OAM DMA routine to $FF80). There are probably some limitations to executing code in parallel with DMA; the Wizardry games still do a 160-cycle delay loop after triggering DMA.
I've never seen any test results for writes during OAM DMA, or whether OAM DMA automatically suppresses interrupts.
Quote: What is the exact behaviour of EI?
On a real Z80 (and I believe an 8080 as well), "EI's effect is delayed one cycle" is not true so much as "EI actually disables interrupts until after the next instruction". The reason is to ensure that the sequence "EI; RET" is atomic. If you put a hundred EIs in a row, no interrupts can occur between any of them. You should test whether this is true on the GB as well.
On a real Z80, prefix instructions are the same: no interrupt can occur between the prefix and the instruction it's modifying (the 8080 doesn't have any prefix instructions). This is almost certainly true on the GB as well, otherwise chaos would ensue (you could never safely use a CB instruction any time an interrupt could possibly happen).
Quote: What is the exact timing of PUSH rr?
It's not surprising that PUSH has an extra internal delay that POP doesn't. Remember that the GB, like other 8080-family CPUs, has a "full" stack: SP points to the last item pushed. So PUSH has to decrement SP first to generate the address for the write, whereas POP can immediately read memory while incrementing SP in parallel. The 6502 family has an "empty" stack, and pops take one more cycle than pushes do--exactly the opposite of the 8080 family.
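The full-stack argument can be sketched as a tiny Python model that records one entry per M-cycle after the opcode fetch. It is a simplified sketch, not a cycle-accurate core: the exact split into M-cycles is an assumption, chosen to match the commonly documented 16 T-states for PUSH rr and 12 for POP rr, and `push`/`pop` are hypothetical helpers.

```python
# "Full" stack: SP points at the last byte pushed. Each cycles entry is one
# M-cycle (4 T-states) after the opcode fetch.


def push(mem: dict[int, int], sp: int, value: int) -> tuple[int, list[str]]:
    cycles = ["internal: SP-1"]          # the first write address must be computed first
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value >> 8
    cycles.append(f"write ${sp:04X}")    # second decrement assumed to overlap this write
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF
    cycles.append(f"write ${sp:04X}")
    return sp, cycles


def pop(mem: dict[int, int], sp: int) -> tuple[int, int, list[str]]:
    cycles = [f"read ${sp:04X}"]         # SP already addresses the data: no setup cycle
    lo = mem[sp]
    sp = (sp + 1) & 0xFFFF               # increment overlaps the read
    cycles.append(f"read ${sp:04X}")
    hi = mem[sp]
    sp = (sp + 1) & 0xFFFF
    return sp, (hi << 8) | lo, cycles
```

Adding the opcode fetch gives 4 M-cycles for `push` and 3 for `pop`; the difference is exactly the leading internal cycle.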
Quote: What does MBC1 do if you request a ROM bank number higher than what the cartridge supports?
MBC1 only has five data pins; it can't see the top three bits of the data bus at all. That's why ROMs bigger than 4 Mbit need a second register to select the upper bank bits. So a data value of 32 will mirror to 0 in the MBC and trigger the "0 actually selects 1" behaviour, but a data value of 16 on a 2 Mbit ROM, or 8 on a 1 Mbit ROM, will mirror to 0 in the ROM and won't be converted to 1. The MBC doesn't "know" how big the ROM is; smaller ROMs just leave the upper address lines of the MBC unconnected.
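The mirroring rules above fit in a few lines. This is a minimal Python sketch assuming a ROM of at most 4 Mbit (32 banks of 16 KiB) with a power-of-two bank count; it deliberately ignores the second register used by bigger ROMs, and `mbc1_rom_bank` is a hypothetical name.

```python
def mbc1_rom_bank(value: int, rom_banks: int) -> int:
    """Effective bank in the $4000-$7FFF window of an MBC1 cart up to 4 Mbit.

    rom_banks: number of 16 KiB banks actually present (power of two, <= 32).
    """
    reg = value & 0x1F            # the MBC only sees D4-D0, so 32 mirrors to 0
    if reg == 0:
        reg = 1                   # "0 actually selects 1", judged on the 5-bit value
    return reg & (rom_banks - 1)  # unconnected upper lines: the bank mirrors in the ROM
```

A 2 Mbit ROM has 16 banks, so `mbc1_rom_bank(16, 16)` returns 0, while `mbc1_rom_bank(32, 32)` returns 1.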
Likewise MBC2 only has four data pins; that's why its internal battery RAM is arranged in nybbles, and why it only supports up to 2 Mbit ROMs.
Also, MBC1 is only connected to A15, A14 and A13 of the cartridge bus, and MBC2 is only connected to A15, A14 and A8-A0. So MBC1 registers are mirrored over spans of $2000 bytes, and MBC2 registers are mirrored over spans of $100 bytes (you can select a ROM bank by writing to $0100-01FF, $0300-03FF, $0500-05FF, etc.) The reason most monochrome GB games write to $2100 to switch ROM banks is to be compatible with either MBC1 or MBC2.
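The partial address decoding can be sketched in Python to show why $2100 works for both mappers. The register labels are the conventional names from common documentation such as Pan Docs, used here as hypothetical identifiers; the RAM-enable role of MBC2 writes with A8=0 also comes from that documentation rather than from this thread.

```python
from typing import Optional

MBC1_REGS = ("ram_enable", "rom_bank", "upper_bits", "mode")  # conventional labels


def mbc1_register(addr: int) -> Optional[str]:
    """MBC1 sees only A15-A13, so each register spans $2000 bytes."""
    if addr & 0x8000:
        return None                   # A15 high: not a ROM-area register write
    return MBC1_REGS[(addr >> 13) & 0x03]


def mbc2_register(addr: int) -> Optional[str]:
    """MBC2 sees A15, A14 and A8-A0; within $0000-$3FFF, A8 picks the register."""
    if addr & 0xC000:
        return None                   # $4000 and up: no MBC2 register
    return "rom_bank" if addr & 0x0100 else "ram_enable"
```

`$2100` has A13 set (MBC1 ROM bank register) and A8 set (MBC2 ROM bank register), whereas `$2000` would hit MBC2's RAM enable instead.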
Re those bus timing diagrams: in case it isn't obvious, the reason accesses to $8000-9FFF don't show any bus activity is that VRAM is on a separate bus on the GB (on the 'Pocket and everything afterwards it's built right into the CPU).
DMG:
13-bit address bus and 8-bit data bus to VRAM
16-bit address bus and 8-bit data bus to WRAM and the cartridge slot
$FF80-FFFE internal to the CPU
GBP:
Unconnected external VRAM address and data bus (maybe it can be enabled and the internal VRAM disabled somehow?)
16-bit address bus and 8-bit data bus to WRAM and the cartridge slot
VRAM and $FF80-FFFE internal to the CPU
GBC:
15-bit address bus and 8-bit data bus to WRAM (the upper 3 bits come from the bank select register)
16-bit address bus and 8-bit data bus to the cartridge slot only
VRAM and $FF80-FFFE internal to the CPU
GBA:
WRAM moved inside the CPU as well. The only external RAM on the GBA is the big slow work RAM (which isn't usable in GBC mode)
Schematics showing pinouts of the GB CPU, MBC1 and MBC2 are at http://fms.komkon.org/GameBoy/Tech/Hardware.html
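The GBC's 15-bit WRAM bus can be sketched as an address calculation. This Python sketch assumes the usual GBC mapping: $C000-$CFFF is always bank 0, and the bank select register (SVBK at $FF70 per Pan Docs, not named in this thread) supplies the top 3 bits for $D000-$DFFF, with 0 behaving like 1. How the hardware actually drives those bits for the $C000 half is an assumption.

```python
def gbc_wram_bus_address(addr: int, svbk: int) -> int:
    """Map a CPU address in $C000-$DFFF to the 15-bit GBC WRAM bus address."""
    if not 0xC000 <= addr <= 0xDFFF:
        raise ValueError("not a WRAM address")
    offset = addr & 0x0FFF           # A11-A0 come straight from the CPU address
    if addr < 0xD000:
        bank = 0                     # $C000-$CFFF is always bank 0
    else:
        bank = svbk & 0x07 or 1      # a bank select value of 0 behaves like 1
    return (bank << 12) | offset     # 3 bank bits + 12 offset bits = 15 bits
```

With bank 7 selected, `$DFFF` maps to `$7FFF`, the top of the 15-bit range.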
Re: What exactly are T-states doing?
Nice, that answers a lot of questions!
One thing I've never been sure of is what actually is at FEA0-FEFF. It's only marked as "unusable" in documents, but what happens if you try to use it? Some have suggested it partially mirrors OAM.
Also, is FFFF actually within HRAM or is it separate? Can it be accessed during DMA?
Sent from my Game Boy.
Re: What exactly are T-states doing?
Rena wrote: Nice, that answers a lot of questions!
One thing I've never been sure of is what actually is at FEA0-FEFF. It's only marked as "unusable" in documents, but what happens if you try to use it? Some have suggested it partially mirrors OAM.
Also, is FFFF actually within HRAM or is it separate? Can it be accessed during DMA?
No idea about FEA0-FEFF; I doubt it's an OAM mirror, though.
FFFF is a memory-mapped register, like everything from FF00 to FF7F. I don't see why it wouldn't be accessible during OAM DMA (it's been discovered that accessing FF46 during DMA restarts the DMA) though some of the registers might have weird side effects.
You know about the OAM corruption hardware bug with 16-bit inc/dec instructions, right? Instructions that don't even perform memory accesses can corrupt OAM, suggesting that the CPU and PPU parts of the die aren't as well segregated as one might expect.
Re: What exactly are T-states doing?
Quote: Just some comments about the questions on your "emulation accuracy" page.
Thanks! The page is very out of date, so I already have answers to many of the questions, but I don't mind discussing these things and sharing knowledge.
Quote: I believe someone tested this and found that on the DMG, reads (presumably including opcode fetches) return the byte currently being DMAed. On the GBC, external WRAM (i.e. $C000-$DFFF) is on a separate physical bus from the cartridge slot, probably because of the WRAM bankswitching. If you read WRAM during DMA it has the same effect as on the DMG, but you can apparently run code from ROM normally while DMAing from WRAM (the Wizardry Famicom remakes do this--they don't bother copying their OAM DMA routine to $FF80). There are probably some limitations to executing code in parallel with DMA; the Wizardry games still do a 160-cycle delay loop after triggering DMA.
Yes, I've confirmed this as well. Basically, if you DMA stuff from the cartridge bus and the CPU wants to read stuff at the same time, the DMA wins and both the DMA and CPU see a byte from the current DMA source address. This applies regardless of whether it's an opcode fetch or a load.
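The read-conflict rule described above can be sketched in Python. Assumptions, all labelled: only reads are modelled (writes during DMA are untested per this thread), timing and the first DMA cycle are ignored, and regions the thread doesn't discuss (VRAM, OAM, I/O, HRAM, echo RAM) are treated as conflict-free. `external_bus` and `cpu_read_during_dma` are hypothetical helpers.

```python
from typing import Optional


def external_bus(model: str, addr: int) -> Optional[str]:
    """Name of the external bus an address uses, or None for regions not modelled."""
    if addr < 0x8000 or 0xA000 <= addr <= 0xBFFF:
        return "main"                  # cartridge ROM and cartridge RAM
    if 0xC000 <= addr <= 0xDFFF:
        # DMG/Pocket: WRAM shares the cartridge bus; GBC: WRAM has its own bus.
        return "wram" if model == "GBC" else "main"
    return None


def cpu_read_during_dma(model: str, dma_src: int, cpu_addr: int, mem: bytearray) -> int:
    """Byte seen by a CPU read (opcode fetch or load) while OAM DMA reads dma_src."""
    bus = external_bus(model, cpu_addr)
    if bus is not None and bus == external_bus(model, dma_src):
        return mem[dma_src]            # same bus: the DMA wins, CPU sees the DMA byte
    return mem[cpu_addr]
```

With DMA reading WRAM, a DMG fetch from ROM gets the DMA byte, while a GBC fetch from ROM proceeds normally, which is what the Wizardry remakes rely on.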
Quote: I've never seen any test results for writes during OAM DMA, or whether OAM DMA automatically suppresses interrupts.
I'm pretty sure I've checked that interrupts are not suppressed in any way, but I don't think I've tried writes... And I need to publish test ROMs for all these things. I've got sooo many unpublished test ROMs named oam_hell1, oam_hell2, oam_hell3, etc.
Quote: On a real Z80 (and I believe an 8080 as well), "EI's effect is delayed one cycle" is not true so much as "EI actually disables interrupts until after the next instruction". The reason is to ensure that the sequence "EI; RET" is atomic. If you put a hundred EIs in a row, no interrupts can occur between any of them. You should test whether this is true on the GB as well.
Unfortunately, this seems to be untrue on the GB, based on a test ROM I wrote. EI doesn't disable interrupts, so given a sequence of EIs, the interrupt happens between the second and third EI.
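The two EI behaviours can be compared with a small Python sketch. The Z80 branch follows the "EI blocks the next boundary" rule quoted above; the GB branch is only a model inferred from the test result described here (IME turns on once the instruction after EI has executed), not a verified cycle-accurate implementation. `interrupt_point` is a hypothetical helper, and it assumes IME starts off with an interrupt pending throughout.

```python
def interrupt_point(program: list[str], model: str) -> int:
    """Index of the instruction before which the pending interrupt is serviced;
    len(program) means "after the last instruction". Only "EI" is special."""
    ime = False
    prev_was_ei = False
    for i, op in enumerate(program):
        if model == "z80":
            # EI sets IME at once, but no interrupt is accepted at the
            # boundary right after an EI, so a run of EIs is never interrupted.
            if ime and not prev_was_ei:
                return i
            if op == "EI":
                ime = True
        else:
            # GB (as observed): EI blocks nothing; IME turns on once the
            # instruction following the EI has executed.
            if ime:
                return i
            if prev_was_ei:
                ime = True
        prev_was_ei = op == "EI"
    return len(program)
```

For `["EI", "EI", "EI", "NOP"]` the GB model gives 2 (between the second and third EI), while the Z80 model gives 4; both models keep `["EI", "RET"]` atomic.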
Quote: It's not surprising that PUSH has an extra internal delay that POP doesn't. Remember that the GB, like other 8080-family CPUs, has a "full" stack: SP points to the last item pushed. So PUSH has to decrement SP first to generate the address for the write, whereas POP can immediately read memory while incrementing SP in parallel. The 6502 family has an "empty" stack, and pops take one more cycle than pushes do--exactly the opposite of the 8080 family.
Aha! I never thought about this, but it makes perfect sense.
Quote: FFFF is a memory-mapped register, like everything from FF00 to FF7F. I don't see why it wouldn't be accessible during OAM DMA (it's been discovered that accessing FF46 during DMA restarts the DMA) though some of the registers might have weird side effects.
I've confirmed that it's accessible as usual. What exactly do you mean by "accessing FF46 during DMA restarts the DMA"? Do you mean writing, or also reading? Writing does restart the DMA, although the behaviour during the first DMA cycle is slightly different.
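A restart-on-write OAM DMA can be sketched as a small Python state machine. It assumes the commonly documented behaviour (160 bytes copied from $XX00-$XX9F to $FE00-$FE9F, one byte per M-cycle) and deliberately leaves out the different first-cycle behaviour gekkio mentions. `OamDma` is a hypothetical class, not taken from any emulator.

```python
class OamDma:
    """Writing $FF46 starts, or restarts, a 160-byte copy into OAM."""

    LENGTH = 160

    def __init__(self, mem: bytearray):
        self.mem = mem
        self.active = False
        self.source = 0
        self.index = 0

    def write_ff46(self, value: int) -> None:
        self.source = (value & 0xFF) << 8
        self.index = 0                 # a write during DMA starts over from byte 0
        self.active = True

    def tick(self) -> None:
        """Advance one M-cycle, copying one byte while the DMA is active."""
        if not self.active:
            return
        self.mem[0xFE00 + self.index] = self.mem[self.source + self.index]
        self.index += 1
        if self.index == self.LENGTH:
            self.active = False
```

Writing $FF46 halfway through (after 80 ticks) resets `index` to 0, so the full 160 bytes come from the new source.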
I've also thought about $FFFF, and I don't see any reason why it would have to be a separate register. It doesn't really make a difference in emulation, but I find it completely plausible that it's just the last byte of high RAM. After all, all of its bits are accessible, unlike in the IF register.
Re: What exactly are T-states doing?
gekkio wrote: Did you notice the logic analysis directory under tests in the mooneye-gb repository? I've done some logic analysis on the Game Boy hardware, and you might be interested in things like the write and read timings in the external bus.
Is there any possibility you could do something like this with the SNES? Or has someone already done it (maybe nocash)?
The SNES CPU is a lot closer to a standard 65816 than the GB is to a standard Z80, but it has a rather different bus from a 65816 (two address buses, separate /RD and /WR strobes instead of RD/WR) and I'm curious what the timings are, especially for DMA (which uses both address buses and the single data bus).
Re: What exactly are T-states doing?
We have some SNES logic analyzer traces from the repair effort with Poot36. It's not all 58-ish signals, but it is 32 of them.
Re: What exactly are T-states doing?
lidnariq wrote: We have some SNES logic analyzer traces from the repair effort with Poot36. It's not all 58-ish signals, but it is 32 of them.
Those are very interesting, thanks for pointing them out. It looks like "FastROM"/3.58MHz cycles have a 50% duty cycle (/RD or /WR is asserted 3 master clocks after the address is put on the bus), and "SlowROM"/2.68MHz cycles stretch the phase in which /RD or /WR is asserted by 2 master clocks (and, for read cycles, presumably the CPU delays latching the data by the same amount).
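The FastROM/SlowROM figures follow directly from counting master clocks. This Python sketch assumes the NTSC SNES master clock of about 21.477 MHz (a value not stated in the thread) and the 3+3 and 3+5 clock splits described above; `CYCLES` and `cycle_info` are hypothetical names.

```python
MASTER_HZ = 21_477_272  # NTSC SNES master clock, approximately 21.477 MHz

# (master clocks before /RD or /WR goes low, master clocks it stays low)
CYCLES = {
    "fast": (3, 3),  # "FastROM": 6 clocks per bus cycle, 50% duty
    "slow": (3, 5),  # "SlowROM": strobe phase stretched by 2 clocks
}


def cycle_info(kind: str) -> tuple[float, float]:
    """Return (bus cycle rate in MHz, fraction of the cycle the strobe is asserted)."""
    setup, strobe = CYCLES[kind]
    total = setup + strobe
    return MASTER_HZ / total / 1e6, strobe / total
```

Dividing the master clock by 6 and by 8 gives the familiar 3.58 MHz and 2.68 MHz rates.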
Unfortunately those traces are missing /PARD and /PAWR (the signals for the "B-bus" or $21xx address range), and as far as I can tell none of them shows any DMA operations (not that you could tell what was going on in DMA without both sets of signals...)
Also there's the little fact that they're traces from a defective CPU...
Re: What exactly are T-states doing?
There are 25 different traces; numbers 8 and up do have PARD and PAWR. Listings 12, 13, and 15 seem to show DMA.
The defect only appeared to be that the PLB and PLD instructions corrupted the stack pointer; I don't think there's any reason to believe that would affect timing.
Re: What exactly are T-states doing?
lidnariq wrote: There are 25 different traces; numbers 8 and up do have PARD and PAWR. Listings 12, 13, and 15 seem to show DMA.
The defect only appeared to be that the PLB and PLD instructions corrupted the stack pointer; I don't think there's any reason to believe that would affect timing.
Yeah, I see the DMA in listing 15 now. Looks like DMA cycles aren't quite the same as SlowROM cycles. In a SlowROM cycle /RD or /WR is asserted for 5 master clocks out of 8, but in a DMA cycle /RD and /PAWR are asserted for only 4 out of 8 (and it looks like they're asserted simultaneously, which was the main thing I was curious about. I wonder how byuu came up with that 'two stage pipeline' nonsense...)
Here's the relevant section of listing 15 annotated with what's going on during an 8-byte general-purpose DMA transfer:
Code:
Label > D CA CPURD CPUWR PA PARD PAWR RAMSEL REFRSH
Base > He Hex Hex Hex He Hex Hex Hex Hex
__________ __ ____ _____ _____ __ ____ ____ ______ ______
----- fetch STA $420B (three slow cycles)
2452 01 E31F 1 1 05 1 1 1 0
2453 01 E318 1 1 06 1 1 1 0
2454 01 E318 1 1 06 1 1 1 0
2455 FF E318 0 1 06 1 1 1 0
2456 8D E318 0 1 06 1 1 1 0
2457 8D E318 0 1 06 1 1 1 0
2458 8D E318 0 1 06 1 1 1 0
2459 8D E318 0 1 06 1 1 1 0
2460 8D E319 1 1 06 1 1 1 0
2461 8D E319 1 1 06 1 1 1 0
2462 8D E319 1 1 06 1 1 1 0
2463 FF E319 0 1 06 1 1 1 0
2464 0B E319 0 1 06 1 1 1 0
2465 0B E319 0 1 06 1 1 1 0
2466 0B E319 0 1 06 1 1 1 0
2467 0B E319 0 1 06 1 1 1 0
2468 0B E31B 1 1 06 1 1 1 0
2469 0B E31A 1 1 06 1 1 1 0
2470 0B E31A 1 1 06 1 1 1 0
2471 FF E31A 0 1 06 1 1 1 0
2472 42 E31A 0 1 06 1 1 1 0
2473 42 E31A 0 1 06 1 1 1 0
2474 42 E31A 0 1 06 1 1 1 0
2475 42 E31A 0 1 06 1 1 1 0
----- write to $420B (fast cycle, /WR asserted even though it's an internal CPU register!)
2476 42 E31B 1 1 06 1 1 1 0
2477 42 420B 1 1 02 1 1 1 0
2478 42 420B 1 1 02 1 1 1 0
2479 01 420B 1 0 02 1 1 1 0
2480 01 420B 1 0 02 1 1 1 0
2481 01 420B 1 0 02 1 1 1 0
----- fetch RTL (slow cycle)
2482 01 E31B 1 1 06 1 1 1 0
2483 01 E31B 1 1 06 1 1 1 0
2484 01 E31B 1 1 06 1 1 1 0
2485 FF E31B 0 1 06 1 1 1 0
2486 6B E31B 0 1 06 1 1 1 0
2487 6B E31B 0 1 06 1 1 1 0
2488 6B E31B 0 1 06 1 1 1 0
2489 6B E31B 0 1 06 1 1 1 0
----- DMA pre-sync: align to a multiple of 8 clocks since power-on
2490 6B E31F 1 1 06 1 1 1 0
2491 6B E31C 1 1 07 1 1 1 0
2492 6B E31C 1 1 07 1 1 1 0
2493 6B E31C 1 1 07 1 1 1 0
2494 6B E31C 1 1 07 1 1 1 0
2495 6B FFFF 1 1 CF 1 1 1 0
----- DMA setup: 8 clocks
2496 6B FFFF 1 1 CF 1 1 1 0
2497 6B FFFF 1 1 CF 1 1 1 0
2498 6B FFFF 1 1 CF 1 1 1 0
2499 6B FFFF 1 1 CF 1 1 1 0
2500 6B FFFF 1 1 CF 1 1 1 0
2501 6B FFFF 1 1 CF 1 1 1 0
2502 6B FFFF 1 1 CF 1 1 1 0
2503 6B FFFF 1 1 CF 1 1 1 1
----- DMA transfer: 8 clocks x 8 bytes
2504 6B F400 1 1 08 1 1 1 0
2505 6B F400 1 1 08 1 1 1 0
2506 6B F400 1 1 08 1 1 1 0
2507 FF F400 0 1 08 1 0 1 0
2508 00 F400 0 1 08 1 0 1 0
2509 00 F400 0 1 08 1 0 1 0
2510 00 F400 0 1 08 1 0 1 0
2511 00 F401 1 1 08 1 1 1 0
2512 00 F401 1 1 08 1 1 1 0
2513 00 F401 1 1 08 1 1 1 0
2514 00 F401 1 1 08 1 1 1 0
2515 FF F401 0 1 08 1 0 1 0
2516 00 F401 0 1 08 1 0 1 0
2517 00 F401 0 1 08 1 0 1 0
2518 00 F401 0 1 08 1 0 1 0
2519 00 F403 1 1 08 1 1 1 0
2520 00 F402 1 1 08 1 1 1 0
2521 00 F402 1 1 08 1 1 1 0
2522 00 F402 1 1 08 1 1 1 0
2523 FF F402 0 1 08 1 0 1 0
2524 CE F402 0 1 08 1 0 1 0
2525 CE F402 0 1 08 1 0 1 0
2526 CE F402 0 1 08 1 0 1 0
2527 CE F403 1 1 08 1 1 1 0
2528 CE F403 1 1 08 1 1 1 0
2529 CE F403 1 1 08 1 1 1 0
2530 CE F403 1 1 08 1 1 1 0
2531 FF F403 0 1 08 1 0 1 0
2532 39 F403 0 1 08 1 0 1 0
2533 39 F403 0 1 08 1 0 1 0
2534 39 F403 0 1 08 1 0 1 0
2535 39 F407 1 1 08 1 1 1 0
2536 39 F404 1 1 08 1 1 1 0
2537 39 F404 1 1 08 1 1 1 0
2538 39 F404 1 1 08 1 1 1 0
2539 FF F404 0 1 08 1 0 1 0
2540 18 F404 0 1 08 1 0 1 0
2541 18 F404 0 1 08 1 0 1 0
2542 18 F404 0 1 08 1 0 1 0
2543 18 F405 1 1 08 1 1 1 0
2544 18 F405 1 1 08 1 1 1 0
2545 18 F405 1 1 08 1 1 1 0
2546 18 F405 1 1 08 1 1 1 0
2547 FF F405 0 1 08 1 0 1 0
2548 63 F405 0 1 08 1 0 1 0
2549 63 F405 0 1 08 1 0 1 0
2550 63 F405 0 1 08 1 0 1 0
2551 63 F407 1 1 08 1 1 1 0
2552 63 F406 1 1 08 1 1 1 0
2553 63 F406 1 1 08 1 1 1 0
2554 63 F406 1 1 08 1 1 1 0
2555 FF F406 0 1 08 1 0 1 0
2556 10 F406 0 1 08 1 0 1 0
2557 10 F406 0 1 08 1 0 1 0
2558 10 F406 0 1 08 1 0 1 0
2559 10 F407 1 1 08 1 1 1 0
2560 10 F407 1 1 08 1 1 1 0
2561 10 F407 1 1 08 1 1 1 0
2562 10 F407 1 1 08 1 1 1 0
2563 FF F407 0 1 08 1 0 1 0
2564 7C F407 0 1 08 1 0 1 0
2565 7C F407 0 1 08 1 0 1 0
2566 7C F407 0 1 08 1 0 1 0
2567 7C E2FF 1 1 08 1 1 1 0
----- DMA teardown: 8 clocks
2568 7C 22FF 1 1 08 1 1 1 0
2569 7C 22FF 1 1 08 1 1 1 0
2570 7C 22FF 1 1 08 1 1 1 0
2571 7C 22FF 1 1 08 1 1 1 0
2572 7C 22FF 1 1 08 1 1 1 0
2573 7C 22FF 1 1 08 1 1 1 0
2574 7C 22FF 1 1 08 1 1 1 0
2575 7C 22FF 1 1 CF 1 1 1 0
----- DMA post-sync: align to a multiple of 6 clocks since start of pre-sync
2576 7C 22FF 1 1 CF 1 1 1 0
2577 7C 22FF 1 1 CF 1 1 1 0
2578 7C 22FF 1 1 CF 1 1 1 0
2579 7C 22FF 1 1 CF 1 1 1 0
----- opcode execution resumes: two internal operation cycles for RTL
2580 7C E31C 1 1 07 1 1 1 0
2581 7C E31C 1 1 07 1 1 1 0
2582 7C E31C 1 1 07 1 1 1 0
2583 7C E31C 1 1 07 1 1 1 0
2584 7C E31C 1 1 07 1 1 1 0
2585 7C E31C 1 1 07 1 1 1 0
2586 7C E31C 1 1 07 1 1 1 0
2587 7C E31C 1 1 07 1 1 1 0
2588 7C E31C 1 1 07 1 1 1 0
2589 7C E31C 1 1 07 1 1 1 0
2590 7C E31C 1 1 07 1 1 1 0
2591 7C E31C 1 1 07 1 1 1 0
----- fetch return address from stack
2592 7C E1FC 1 1 0F 1 1 1 0
2593 7C 01FA 1 1 CE 1 1 0 0
2594 7C 01FA 1 1 CE 1 1 0 0
2595 7C 01FA 0 1 CE 1 1 0 0
2596 3F 01FA 0 1 CE 1 1 0 0
2597 3F 01FA 0 1 CE 1 1 0 0
2598 3F 01FA 0 1 CE 1 1 0 0
2599 3F 01FA 0 1 CE 1 1 0 0
(snip)
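The phase annotations in the listing can be turned into a small master-clock calculator. This Python sketch covers only what the listing shows: one channel, 8 clocks of setup, 8 per byte, 8 of teardown, plus the two alignment steps. `start_clock` (master clocks since power-on when pre-sync begins) is a hypothetical input, and treating an already-aligned start as a zero-length pre-sync is an assumption the single trace can't confirm.

```python
def gp_dma_clocks(n_bytes: int, start_clock: int) -> dict[str, int]:
    """Master clocks for a single-channel general-purpose DMA, per the listing."""
    pre_sync = -start_clock % 8              # align to a multiple of 8 since power-on
    body = pre_sync + 8 + 8 * n_bytes + 8    # setup + 8 clocks per byte + teardown
    post_sync = -body % 6                    # pad to a multiple of 6 since pre-sync began
    return {
        "pre_sync": pre_sync,
        "transfer": 8 * n_bytes,
        "post_sync": post_sync,
        "total": body + post_sync,
    }
```

With the 6-clock pre-sync seen in the trace (`start_clock=2`) and 8 bytes, this gives a 4-clock post-sync and 90 clocks in total, matching samples 2490-2579.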