I figure you might be interested in this, given what you're currently working on:
http://www.devic.us/hacks/zilog-z80-und ... -behavior/
ETA: There's a slightly confusing thing about those tables: the timing for conditional instructions doesn't indicate whether the condition was met or not. Hence JR Z is shown as taking fewer cycles than the other JR cc instructions, but that's because it's showing an untaken branch whereas the others are showing taken branches. It gets especially confusing with RET because unconditional RET is 1 t-state faster than (taken) RET cc.
Timing for all Z80 opcodes incl. undocumented (attn: byuu)
-
AWJ
- Posts: 433
- Joined: Mon Nov 10, 2008 3:09 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
If you look at the Zilog manual and the bus traces, you can see that Z80 cycles break down into just a few categories (sketched in code after the list):
Opcode fetch cycles aka M1 cycles. 2 clocks to read memory + 2 clocks to decode the opcode while refreshing DRAM.
Read/write cycles, 3 clocks. Operand fetches (including the displacements of IX/IY indexed instructions) are normal read cycles, but the second byte of a prefixed instruction is an opcode fetch.
Port in/out cycles, 4 clocks.
Internal operations, which can come after any other cycle type, up to 5 in a row depending on the operation. Technically 1- or 2-clock internal operations are part of the memory cycle they come after, and 5-clock internal operations comprise an entire machine cycle in their own right, but I don't think the difference is visible externally or matters for emulation.
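To make those categories concrete, here's a minimal sketch of how they might map onto a cycle-stepped emulator core. Everything here (the Z80 struct, step(), the bus functions) is my own assumed naming, not anything from the Zilog manual:

```cpp
#include <cstdint>

struct Z80 {
  uint64_t clock = 0;

  void step(int clocks) { clock += clocks; }  // advance the master clock

  // M1 cycle: 2 clocks to read the opcode, then 2 clocks of decode + DRAM refresh.
  uint8_t fetchOpcode(uint16_t address) {
    step(2);
    uint8_t opcode = busRead(address);
    step(2);  // decode / refresh
    return opcode;
  }

  // Ordinary memory read/write cycles: 3 clocks each.
  uint8_t read(uint16_t address) { step(3); return busRead(address); }
  void write(uint16_t address, uint8_t data) { step(3); busWrite(address, data); }

  // Port I/O cycles: 4 clocks each.
  uint8_t in(uint16_t port) { step(4); return busIn(port); }
  void out(uint16_t port, uint8_t data) { step(4); busOut(port, data); }

  // Internal operations: 1-5 extra clocks hanging off the preceding cycle.
  void internal(int clocks) { step(clocks); }

  // bus stubs so the sketch is self-contained
  uint8_t busRead(uint16_t) { return 0xff; }
  void busWrite(uint16_t, uint8_t) {}
  uint8_t busIn(uint16_t) { return 0xff; }
  void busOut(uint16_t, uint8_t) {}
};
```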
Here's something very weird: normally, the second byte of a $CB or $ED prefixed instruction is an opcode fetch (i.e. it takes 4 clocks, asserts M1, and refreshes DRAM). However, in an instruction with both a $DD/$FD prefix and a $CB prefix (i.e. an indexed bitwise instruction), the instruction encoding is $DD/$FD, $CB, displacement, subop, and the subop fetch turns into a normal read with 2 internal operation clocks after it. I assume the instructions are encoded that way so that the effective address calculation can be overlapped with the sub-opcode fetch, and the mutation of the sub-opcode fetch into a non-M1 cycle is a side effect of the out-of-order encoding. One case where this quirk is important is arcade machines with encrypted opcodes.
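Continuing the hypothetical sketch above (PC and IX are assumed members), a decode path for the $DD $CB case might look like this, with the sub-opcode coming over the bus as a plain read rather than an M1 fetch:

```cpp
void Z80::instructionDDCB() {
  // $DD (or $FD) and $CB were already consumed by two normal 4-clock M1 fetches.
  int8_t displacement = (int8_t)read(PC++);  // ordinary 3-clock read
  uint8_t subop = read(PC++);                // also a read: no /M1, no refresh
  internal(2);                               // EA calculation clocks
  uint16_t ea = IX + displacement;
  // ... dispatch subop against memory at ea ...
  (void)ea; (void)subop;
}
```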
Since opcode fetches require the memory to respond faster than normal read/write cycles do, some machines (like the MSX) have an externally-inserted wait state only on M1 cycles. I don't think that applies to any of the Sega consoles though.
Examples (one of them is written out in code after the list):
PUSH rr:
Fetch opcode
1 internal operation (decrementing SP)
Write high register byte
Write low register byte
Total: 11 clocks
POP rr:
Fetch opcode
Read low register byte
Read high register byte
Total: 10 clocks
ADD A,(HL):
Fetch opcode
Read memory
Total: 7 clocks
ADD A,(IX+d):
Fetch $DD prefix
Fetch opcode
Read displacement
5 internal operations (calculating IX+d)
Read memory
Total: 19 clocks (remember when I said Z80 indexed instructions were slow?)
RES/SET/BIT/RLC/etc (HL):
Fetch $CB prefix
Fetch sub-opcode
Read memory
1 internal operation
Write memory (except for BIT)
Total: 15 clocks (12 for BIT)
RES/SET/BIT/RLC/etc (IX+d):
Fetch $DD prefix
Fetch $CB prefix
Read displacement
Read (not fetch!) sub-opcode
2 internal operations (calculating IX+d; partly overlapped with sub-opcode read)
Read memory
1 internal operation
Write memory (except for BIT)
Total: 23 clocks (20 for BIT)
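Here's the ADD A,(IX+d) breakdown written out with the same hypothetical helpers; add8() is an assumed ALU helper that sets the flags. 4 (prefix fetch) + 4 (opcode fetch) + 3 (displacement read) + 5 (internal) + 3 (memory read) = 19 clocks:

```cpp
void Z80::instructionADD_A_IXd() {
  // the $DD prefix and the opcode were each fetched in a 4-clock M1 cycle
  int8_t displacement = (int8_t)read(PC++);  // 3 clocks
  internal(5);                               // calculate IX+d
  A = add8(A, read(IX + displacement));      // 3-clock read, then the ALU op
}
```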
-
Near
- Founder of higan project
- Posts: 1553
- Joined: Mon Mar 27, 2006 5:23 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
Sorry, didn't notice this sooner. Don't usually frequent this subforum.
(That and I lost a week learning about X509 certificates. Those things are horrifically complex.)
Greatly appreciate the info!! Glad to have gotten it now before I went too far into writing the core.
I scrapped what I had and started over to support the T-cycles properly (including the extra clock for opcode fetches), as well as to handle the way you can stack DD/FD opcode prefix flags; and to roll them into the regular tables, so that there are only three now (main, CB, ED).
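A sketch of one way the stacked DD/FD prefixes might be handled in the main dispatch loop (my guess at the approach, names hypothetical). On hardware, each prefix byte is its own 4-clock M1 fetch, and when prefixes are stacked the last one wins:

```cpp
void Z80::instruction() {
  uint16_t* index = &HL;  // HL unless a prefix redirects to IX or IY
  uint8_t opcode = fetchOpcode(PC++);
  while(opcode == 0xdd || opcode == 0xfd) {
    index = (opcode == 0xdd) ? &IX : &IY;
    opcode = fetchOpcode(PC++);  // another full 4-clock M1 cycle per prefix
  }
  // ... dispatch opcode through the main table, substituting *index for HL ...
  (void)index;
}
```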
This CPU is certainly a lot less awful to emulate than the 68K, but it's still not very fun >_>
Just out of curiosity, does anyone know the bus hold delays for the various read/write/in/out operations on the Z80?
E.g. is it:
* wait 4 clocks
* read from in
* return in value
Or more like:
* wait 2 clocks
* read from in
* wait 2 clocks
* return in value
If we have no idea, then I'll just guess something for the time being.
-
AWJ
- Posts: 433
- Joined: Mon Nov 10, 2008 3:09 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
byuu wrote:Just out of curiosity, does anyone know the bus hold delays for the various read/write/in/out operations on the Z80?
You know that memory accesses aren't instantaneous but consist of a sequence of operations, right? The timing for every signal for every type of cycle (fetch, read, write, in, out) is shown starting on page 13 of the Zilog manual (page 33 of the PDF).
The important takeaway is that opcode fetches are compressed into just 2 clocks; the second 2 clocks of an M1 cycle are DRAM refresh, in which the Z80 puts the contents of the R register on the address bus and then increments the lower 7 bits of R. (You probably don't have to emulate the refresh itself, but you do need to emulate the R register, because software can read it; it's sometimes used by games as a PRNG seed.)
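Here's the fetch helper from the earlier sketch revised to fold in that behavior: R's low 7 bits increment on every M1 cycle, and bit 7 is only ever changed by LD R,A (still all assumed names):

```cpp
uint8_t Z80::fetchOpcode(uint16_t address) {
  step(2);
  uint8_t opcode = busRead(address);
  R = (R & 0x80) | ((R + 1) & 0x7f);  // refresh: bump low 7 bits, preserve bit 7
  step(2);                            // decode + refresh clocks
  return opcode;
}
```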
I think what you really want to know is "if the Z80 does a read/write that triggers an interrupt from some device, does the device respond fast enough to interrupt the Z80 before it starts the next instruction?" And that depends on the hardware responding to the write (e.g. the VDP), so you'll have to consult Sega-specific documentation.
byuu, on twitter wrote:Why does [inc (hl)] take 11 cycles?
Memory RMW operations on the Z80 have one internal operation between the read and the write for the same reason they do on the 6502: it takes time to actually do the inc/dec/shift/whatever. Like I said, the Z80 manual shows most IOs (internal operations) as part of the preceding memory cycle (which they are from the perspective of the chip's microcode, I guess). The exact breakdown of that instruction for bus timing purposes is:
fetch/decode opcode (2+2 clocks)
read memory (3 clocks)
internal operation (1 clock)
write memory (3 clocks)
Every place the manual shows a memory read taking more than 3 clocks, or an opcode fetch/decode taking more than 4 clocks, it should be interpreted as "standard read or fetch cycle with (n - 3) or (n - 4) internal operations after".
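That same breakdown in code, using the earlier hypothetical helpers; inc8() is an assumed ALU helper that increments and sets the flags. 4 + 3 + 1 + 3 = 11 clocks:

```cpp
void Z80::instructionINC_HL() {
  uint8_t data = read(HL);  // 3 clocks
  internal(1);              // time to actually perform the increment
  write(HL, inc8(data));    // 3 clocks
}
```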
-
Near
- Founder of higan project
- Posts: 1553
- Joined: Mon Mar 27, 2006 5:23 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
> You know that memory accesses aren't instantaneous but consist of a sequence of operations, right?
Considering I emulate that on the SNES and am asking about it now, I'd go with yes ;)
The problem is that I can't really determine how best to simulate these things, even on platforms where I can run my own code.
It's not really practical to emulate the entire bus propagation delay, especially when we don't even know when other things are supposed to respond.
> And that depends on the hardware responding to the write (e.g. the VDP), so you'll have to consult Sega-specific documentation.
They never document things at that fine a granularity :/
I only got where I did on the SNES because there were two ways to latch H/V counters. One for read, one for write.
> the Z80 manual shows most IOs as part of the preceding memory cycle
Ah, cool. Then my guess was correct. Thank you for confirming!
-
AWJ
- Posts: 433
- Joined: Mon Nov 10, 2008 3:09 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
> there's no reason for the bitwise and logical operators to be different precedence levels
No, just no. There's a very good reason why AND has higher precedence than OR or XOR in literally every programming language in the world. AND is the Boolean analogue of multiplication and OR is the Boolean analogue of addition (XOR is addition modulo 2). Fighting against the fundamentals of algebra is... not a good start for your programming language.
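Not from the post, just a quick sanity check of the analogy over single bits: AND behaves like multiplication, XOR like addition mod 2, and AND distributes over OR the way multiplication distributes over addition:

```cpp
#include <cassert>

int main() {
  for(int a = 0; a <= 1; a++)
  for(int b = 0; b <= 1; b++)
  for(int c = 0; c <= 1; c++) {
    assert((a & b) == a * b);                      // AND = multiplication
    assert((a ^ b) == (a + b) % 2);                // XOR = addition modulo 2
    assert((a & (b | c)) == ((a & b) | (a & c)));  // distributive law
  }
  return 0;
}
```

(Ironically, the parentheses around each bitwise expression are forced by C's comparison-operator precedence, which comes up again below.)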
-
Near
- Founder of higan project
- Posts: 1553
- Joined: Mon Mar 27, 2006 5:23 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
It's impressive that I've been programming for 20 years and have never heard AND referred to as Boolean multiplication, and OR as Boolean addition. Further, I've never come across code that relied on AND having higher precedence than OR. Why does XOR get precedence between AND and OR, then? And just out of curiosity, what about the rest of the operations? NOT, NAND, NOR, XNOR, etc.? Is one of them divide, subtract, regular modulus, etc.?
Ah well, at any rate, thanks for helping me dodge a bullet prior to any kind of formalization. A shame you weren't around when the PHP devs started out, to explain to them why ternary should have right-to-left associativity, heh.
But ... https://en.wikipedia.org/wiki/Logical_c ... precedence
Why isn't XOR (exclusive disjunction) listed in that table? Is there a more thorough table that includes it?
> not a good start for your programming language.
I didn't expect to knock it out of the park with only a month's worth of practice writing programming languages. Still, I hope to do the best I can, and take input from others that know more than I do here.
I'm pretty lost right now with a billion possibilities and trying to find the best compromises for my values.
-
thefox
- Posts: 3134
- Joined: Mon Jan 03, 2005 10:36 am
- Location: the universe
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
AWJ wrote:There's a very good reason why AND has higher precedence than OR or XOR in literally every programming language in the world. [...] Fighting against the fundamentals of algebra is... not a good start for your programming language.
Ada (and VHDL) take a different approach: and/or/xor have equal precedence, and it's an error to mix them without disambiguating. For example, "true and false or true" is an error, while "true and (false or true)" and "(true and false) or true" are OK.
-
AWJ
- Posts: 433
- Joined: Mon Nov 10, 2008 3:09 pm
Re: Timing for all Z80 opcodes incl. undocumented (attn: byuu)
byuu wrote:It's impressive that I've been programming for 20 years and have never heard AND referred to as Boolean multiplication, and OR as Boolean addition. [...] Why does XOR get precedence between AND and OR, then?
One of the important ways conjunction and disjunction are analogous to multiplication and addition is that the same distributive law applies: A & (B | C) equals A & B | A & C, just like A(B + C) equals AB + AC. Older types of programmable logic hardware such as PALs consisted of planes of AND gates linked by OR gates, and therefore implemented logic expressed in disjunctive normal form (example). Reverse engineering PAL dumps is one half figuring out what the inputs and outputs mean, and the other half algebraic factoring.
XOR isn't a fundamental operation in Boolean algebra because it can be expressed in terms of AND, OR, and NOT. I'm actually not sure why XOR has higher precedence than OR in C-derived languages. It doesn't in Ruby, but it does in Python. Both of those languages, incidentally, fix the C brain damage of bitwise operators having lower precedence than comparison operators.
A practical programming benefit to the algebraic AND/OR precedence rules is that bit-mixing operations like "a & amask | b & bmask | c & cmask" don't need parentheses.
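That example compiled (the masks and field layout are made up for illustration): & binds tighter than |, so each mask applies before the merge, with no parentheses needed:

```cpp
#include <cstdint>

uint8_t mix(uint8_t a, uint8_t b, uint8_t c) {
  const uint8_t amask = 0xe0, bmask = 0x1c, cmask = 0x03;  // hypothetical fields
  // parses as (a & amask) | (b & bmask) | (c & cmask)
  return a & amask | b & bmask | c & cmask;
}
```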