What's the real cycle timing for LDA abs,x?

Discussion of hardware and software development for Super NES and Super Famicom. See the SNESdev wiki for more information.

psycopathicteen
Posts: 3138
Joined: Wed May 19, 2010 6:12 pm

What's the real cycle timing for LDA abs,x?

Post by psycopathicteen »

With both the index registers and the accumulator in 16-bit mode, this was first documented as 5 cycles, then as 6 cycles, and now every document says it's 5 cycles again? I checked all the WDC manuals, and they say nothing about an extra cycle being added when the index registers are in 16-bit mode, only when indexing crosses a page boundary. Does anybody understand where the confusion came from?
93143
Posts: 1711
Joined: Fri Jul 04, 2014 9:31 pm

Re: What's the real cycle timing for LDA abs,x?

Post by 93143 »

From 65c816.txt:

Code:

12a Absolute,X -- a,x
   (BIT,LDY,STZ,ORA,AND,EOR,ADC,STA,LDA,CMP,SBC)
   (11 Op Codes)
   (3 bytes)
   (4,5 and 6 cycles)

		1	1  1   1  1	PBR,PC		Op code		1
		2	1  1   0  1	PBR,PC+1	AAL		1
		3	1  1   0  1	PBR,PC+2	AAH		1
	    (4) 3a	1  1   0  0	DBR,AAH,AAL+XL	IO		1
		4	1  1   1  0	DBR,AA+X	Data Low	1/0
	    (1) 4a	1  1   1  0	DBR,AA+X+1	Data High 	1/0

Code:

Notes
    (1) Add 1 byte (for immediate only) for M=0 or X=0 (i.e. 16 bit data),
	 add 1 cycle for M=0 or X=0.

Code:

    (4) Add 1 cycle for indexing across page boundaries, or write, or X=0.
	 When X=1 or in the emulation mode, this cycle contains invalid
	 addresses.
From Eyes and Lichty:

[Image: LDA.png]

From the wiki:

[Image: LDA_wiki.png]


...so, I'm not quite sure. I'd like to know too, considering I was cycle-counting a routine that uses this exact instruction with M=X=0 just last night.

I suppose one could dig into a known-accurate emulator's source code and get another vote...
psycopathicteen
Posts: 3138
Joined: Wed May 19, 2010 6:12 pm

Re: What's the real cycle timing for LDA abs,x?

Post by psycopathicteen »

A few years ago, superfamicom.wiki was the only website that didn't mention the extra cycle for a 16-bit index; now it's the only one that does.

BSNES takes 6 cycles, judging from its debugger trace.
turboxray
Posts: 346
Joined: Thu Oct 31, 2019 12:56 am

Re: What's the real cycle timing for LDA abs,x?

Post by turboxray »

I mean, you can always benchmark a few hundred of those same instructions on the real system to verify. But the rule about a page-boundary penalty cycle or a 16-bit index makes sense from this perspective: the page-boundary penalty still exists, and a 16-bit index could force a page-boundary crossing (and maybe the cycle is just always taken, regardless of the value of X/Y, as long as the index registers are in 16-bit mode). Not saying it's not weird, but it makes sense in that context.
Oziphantom
Posts: 1554
Joined: Tue Feb 07, 2017 2:03 am

Re: What's the real cycle timing for LDA abs,x?

Post by Oziphantom »

Adding 16 bits to 8 or 16 bits costs the same: when you add a 16-bit X, it doesn't matter whether the other operand is 8 or 16 bits, because it has to be expanded to 16 bits either way. However, the CPU has upper-byte skip detection: if you do 16 + 8 and the `+8` doesn't cross the "page" (i.e. doesn't need to increment the upper byte), the upper-byte increment is skipped. When X is 16 bits, it doesn't check whether the upper 8 bits of X are 0; it just does the upper add either way. So the `+X` operation always needs 2 cycles in the 16-bit index case. Crossing a page boundary and adding a 16-bit index cost the same cycle, so in 16-bit index mode you don't incur an extra cycle for crossing a page; you are already paying it.

So the instruction takes 4 cycles, plus:
+1 cycle if it needs to increment/add the upper 8 bits of the address, i.e. it crosses a page boundary or X is 16 bits
+1 cycle if it reads 16 bits of data into A, i.e. A is 16 bits
Thus

Code:

A8  X8  = 4 ( no  page, no  4, no  1 )
A8  X8  = 5 ( yes page, no  4, no  1 )
A16 X8  = 5 ( no  page, no  4, yes 1 )
A16 X8  = 6 ( yes page, no  4, yes 1 )
A8  X16 = 5 ( no  page, yes 4, no  1 )
A8  X16 = 5 ( yes page, yes 4, no  1 )
A16 X16 = 6 ( no  page, yes 4, yes 1 )
A16 X16 = 6 ( yes page, yes 4, yes 1 )
That is my understanding of how it works.
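The model above can be written as a small Python sketch. The function names are hypothetical; it assumes native mode and a read instruction (LDA, not a store), and it ignores bank-boundary effects. It encodes this post's understanding, not a verified hardware measurement.

```python
# Sketch of the LDA abs,X cycle model described above (65816, native mode).
# Hypothetical helper names; assumes a read instruction and ignores bank
# boundaries. This encodes the post's model, not verified hardware timing.


def upper_byte_add_needed(base: int, x: int, x16: bool) -> bool:
    """Return True if forming the address costs an extra cycle (note 4)."""
    if x16:
        # 16-bit index: the upper add is always done, even if X's high byte is 0.
        return True
    # 8-bit index: the upper byte is only incremented on a carry out of the
    # low byte, i.e. when the access crosses a page.
    return (base & 0xFF) + (x & 0xFF) > 0xFF


def lda_abs_x_cycles(base: int, x: int, a16: bool, x16: bool) -> int:
    cycles = 4  # opcode, AAL, AAH, data low
    if upper_byte_add_needed(base, x, x16):
        cycles += 1  # note (4): extra address-calculation cycle
    if a16:
        cycles += 1  # note (1): data high byte
    return cycles


if __name__ == "__main__":
    # Rebuild the table: from base $10F0, an X low byte of $01 stays in the
    # page and $20 crosses into the next one.
    for x16 in (False, True):
        for a16 in (False, True):
            for x, page in ((0x01, "no "), (0x20, "yes")):
                n = lda_abs_x_cycles(0x10F0, x, a16, x16)
                print(f"A{16 if a16 else 8:<2} X{16 if x16 else 8:<2} = {n} ({page} page)")
```

In the 16-bit index rows, both X values have a zero high byte, which is the point of the model: the extra cycle is paid whether or not the page actually changes.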
creaothceann
Posts: 599
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: What's the real cycle timing for LDA abs,x?

Post by creaothceann »

psycopathicteen wrote: Sat Feb 10, 2024 9:24 pm I checked all the WDC manuals and they say nothing about an extra cycle being added when index registers are in 16-bit mode
datasheet page 24 note 3
turboxray
Posts: 346
Joined: Thu Oct 31, 2019 12:56 am

Re: What's the real cycle timing for LDA abs,x?

Post by turboxray »

https://novasquirrel.github.io/SnesInst ... CycleTool/
^ This agrees with 6 cycles. And if you force a page-boundary crossing in the options while X=1, you don't get an additional penalty either (which, again, makes sense).
psycopathicteen
Posts: 3138
Joined: Wed May 19, 2010 6:12 pm

Re: What's the real cycle timing for LDA abs,x?

Post by psycopathicteen »

creaothceann wrote: Mon Feb 12, 2024 12:39 am
psycopathicteen wrote: Sat Feb 10, 2024 9:24 pm I checked all the WDC manuals and they say nothing about an extra cycle being added when index registers are in 16-bit mode
datasheet page 24 note 3
That's not what I was talking about. I know that it takes 5 cycles when the accumulator is in 16-bit mode. The question was whether or not it's 6 cycles when both accumulator and index registers are 16-bit at the same time.
Drag
Posts: 1609
Joined: Mon Sep 27, 2004 2:57 pm

Re: What's the real cycle timing for LDA abs,x?

Post by Drag »

My interpretation of the datasheet, plus what I know about the 6502:
  1. Read opcode.
  2. Read address LSB.
  3. Read address MSB while adding X LSB to address LSB.
  4. (Optional) Read from the incompletely calculated address and discard the result while adding X MSB (and the LSB carry) to address MSB.
  5. Read from fully-calculated address into accumulator LSB.
  6. (Optional) Read from address+1 into accumulator MSB.
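The six steps above can be turned into a bus-cycle listing. This is a hypothetical Python helper, assuming native mode and a 24-bit effective address of DBR:AA + X. Whether step 4 is taken is passed in as `dummy_read`, since that condition is exactly what the tables disagree on. The step-4 address keeps AAH unchanged and uses the low byte of AAL + X, following the `DBR,AAH,AAL+XL` row quoted from 65c816.txt.

```python
# Sketch: list the bus cycles of LDA abs,X following the six steps above.
# Assumptions (hypothetical helper): native mode, effective address
# DBR:AA + X with carry into the bank, and step 4 taken iff `dummy_read`.


def lda_abs_x_trace(pc: int, dbr: int, aa: int, x: int,
                    a16: bool, dummy_read: bool) -> list[str]:
    ea = ((dbr << 16) + aa + x) & 0xFFFFFF
    # Incompletely calculated address: AAH not yet adjusted, low byte AAL+XL.
    partial = (dbr << 16) | (aa & 0xFF00) | ((aa + x) & 0x00FF)
    steps = [
        f"1: read opcode       @ PBR:{pc:04X}",
        f"2: read AAL          @ PBR:{(pc + 1) & 0xFFFF:04X}",
        f"3: read AAH          @ PBR:{(pc + 2) & 0xFFFF:04X}",
    ]
    if dummy_read:
        steps.append(f"4: dummy read        @ {partial:06X}")
    steps.append(f"5: read data low     @ {ea:06X}")
    if a16:
        steps.append(f"6: read data high    @ {(ea + 1) & 0xFFFFFF:06X}")
    return steps


if __name__ == "__main__":
    for line in lda_abs_x_trace(pc=0x8000, dbr=0x7E, aa=0x10F0, x=0x0020,
                                a16=True, dummy_read=True):
        print(line)
```

With AA = $10F0 and X = $0020, the dummy read lands on $7E1010 (the low-byte sum without the carry into AAH), and the real reads hit $7E1110 and $7E1111.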
Table 5-7 and its footnotes suggest that a 16-bit index always results in step 4 happening, but Table 3-1 doesn't mention this; it says step 4 occurs only when a "page boundary is crossed when forming address".

This should be pretty easy to test, though: try benchmarking LDA $nnn0,X with the X register set to $0001 and then to $0101 (index registers in 16-bit mode), and see whether one takes longer than the other. If they take equal time, then Table 5-7's notes are correct and step 4 is always taken. If their times differ, then Table 3-1 is correct and step 4 happens only when the index calculation truly requires touching the MSB.
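The expected outcome of that test under each reading can be sketched as follows. The base address $1230 stands in for $nnn0 and is an arbitrary choice, both A and X are assumed 16-bit, and the function names are hypothetical. The counts are CPU cycles, not master clocks.

```python
# Sketch: predicted LDA abs,X cycle counts for the proposed test under the
# two readings of the datasheet. Hypothetical names; $1230 stands in for $nnn0.


def cycles_always(a16: bool, x16: bool, base: int, x: int) -> int:
    """Table 5-7 reading: a 16-bit index always takes step 4."""
    page = (base & 0xFF) + (x & 0xFF) > 0xFF
    return 4 + int(x16 or page) + int(a16)


def cycles_when_needed(a16: bool, x16: bool, base: int, x: int) -> int:
    """Table 3-1 reading: step 4 only when the address MSB must change."""
    carry = (base & 0xFF) + (x & 0xFF) > 0xFF   # carry out of the low byte
    msb_add = x16 and (x >> 8) != 0             # nonzero high byte of X
    return 4 + int(carry or msb_add) + int(a16)


if __name__ == "__main__":
    base = 0x1230
    for x in (0x0001, 0x0101):
        print(f"X=${x:04X}: always={cycles_always(True, True, base, x)}, "
              f"when-needed={cycles_when_needed(True, True, base, x)}")
```

Neither X value carries out of the low byte, so the only difference between the two cases is X's high byte: the "always" reading predicts equal counts, while the "when needed" reading predicts one extra cycle for X = $0101.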

EDIT: Page 50, point 7.5 of the datasheet also agrees with Table 5-7: when the index registers are set to 16-bit width, step 4 always occurs, regardless of whether a page crossing happens.