CPU budget questions

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

sonder
Posts: 116
Joined: Wed Jun 26, 2013 12:35 pm
Location: Baltimore

Post by sonder »

In my experiments last night I tried to update the attribute table for a single screen with 64 random values.

Boy, was I in for a surprise: 64 random values within vblank is out of the question for the NES. Glitch city. I was able to write a for loop in C that wrote 54 immediate values without glitching the screen. However, by my calculations I should have been able to manage something in the area of 80 values. With 113.667 CPU cycles per scanline and 20 scanlines in vblank, that is about 2273 cycles; an iteration of my loop was around 27 cycles (I counted the cycles in the CA65 output), which works out to roughly 84 iterations. (I'm working very roughly here; my math habits are often fuzzy.) That's well more than 54 iterations. What's going on here, assuming I counted the instruction timings correctly?
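For reference, the estimate above can be written out as a small host-side C sketch. It uses only the figures quoted in the post (113.667 cycles per scanline, 20 vblank scanlines, about 27 cycles per iteration); the function names are made up for illustration, and it ignores everything else the NMI handler does.

```c
/* Rough vblank budget using only the figures quoted in the post. */
#define CYCLES_PER_SCANLINE 113.667
#define VBLANK_SCANLINES    20

/* Total CPU cycles available during vblank, rounded down. */
int vblank_cycles(void)
{
    return (int)(CYCLES_PER_SCANLINE * VBLANK_SCANLINES);
}

/* How many loop iterations of a given cost fit into a cycle budget. */
int iterations_that_fit(int budget_cycles, int cycles_per_iteration)
{
    return budget_cycles / cycles_per_iteration;
}
```

Calling iterations_that_fit(vblank_cycles(), 27) gives the "area of 80" figure, provided the loop really has the whole vblank to itself.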

Here's my setup: I do everything in the NMI, and the very first thing it does is call the main loop routine, so anything done at the start of that routine should fall within vblank. Here is the asm of the main routine:

Code:

.segment	"CODE"

.proc	_main: near

	.dbg	func, "main", "00", extern, "_main"

.segment	"CODE"

;
; poke( PPU_CTRL, 0x90 );
;
	.dbg	line, "game.c", 57
	lda     #$90
	sta     $2000
;
; vram_adr( 0x23c0 );
;
	.dbg	line, "game.c", 58
	ldx     #$23
	lda     #$C0
	jsr     _vram_adr
;
; for ( i=0; i != 50; i++ ) {
;
	.dbg	line, "game.c", 59
	lda     #$00
L003E:	sta     _i
	cmp     #$32
	beq     L002F
;
; poke( PPU_DATA, 1 );
;
	.dbg	line, "game.c", 60
	lda     #$01
	sta     $2007
;
; for ( i=0; i != 50; i++ ) {
;
	.dbg	line, "game.c", 59
	lda     _i
	clc
	adc     #$01
	jmp     L003E
;
; j++;
;
	.dbg	line, "game.c", 62
L002F:	lda     _j
	clc
	adc     #$01
	sta     _j
;
; }
;
	.dbg	line, "game.c", 67
	rts
	.dbg	line

Just curious here. If the 6502's listed timings are different on the NES for some reason, that would be good to know.
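As a cross-check on the cycle count, here is a hedged tally of one loop iteration from the listing above, using the standard documented 6502 timings. It assumes `_i` is reached with absolute addressing (4 cycles per access); if the variable sat in zero page, each access would cost 3 cycles instead. The struct and function names are only for illustration.

```c
#include <stddef.h>

/* One instruction of the loop body and its documented 6502 cycle cost. */
struct insn {
    const char *text;
    int cycles;
};

/* Loop body from the listing; assumes _i uses absolute addressing. */
static const struct insn loop_body[] = {
    { "sta _i",    4 },
    { "cmp #$32",  2 },
    { "beq L002F", 2 },  /* branch not taken while looping */
    { "lda #$01",  2 },
    { "sta $2007", 4 },
    { "lda _i",    4 },
    { "clc",       2 },
    { "adc #$01",  2 },
    { "jmp L003E", 3 },
};

/* Sum the cycle costs of one pass through the loop body. */
int loop_cycles_per_iteration(void)
{
    int total = 0;
    size_t i;
    for (i = 0; i < sizeof loop_body / sizeof loop_body[0]; ++i) {
        total += loop_body[i].cycles;
    }
    return total;
}
```

Under those assumptions the tally is close to the "around 27" estimate, so 50-odd iterations of this loop should cost well under the vblank budget on their own.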
Shiru
Posts: 1161
Joined: Sat Jan 23, 2010 11:41 pm

Re: CPU budget questions

Post by Shiru »

There is a really short period when VRAM access is possible, ~2700 cycles, and 513+ of those are needed for sprite DMA. So if you need to put many values into VRAM, you have to write this part in assembly. Further, you should prepare the values before getting into the NMI, and use unrolled loops. If you have enough RAM, you can even generate a pusher subroutine, a sequence of 'lda #nn / sta PPU_DATA' pairs; that way you get 6 cycles/byte for sequential writes and can push a sequence of ~350 bytes. Much less for random writes, around 100 bytes.
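A minimal sketch of the "generated pusher" idea, written as host-side C so it can be tried off-target. It emits the raw 6502 bytes for a run of `lda #nn / sta $2007` pairs followed by `rts`. The function names and the buffer handling are assumptions made for illustration, not code from this thread; on the NES the buffer would live in RAM and be called with `jsr` during vblank, after the VRAM address has been set.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_LDA_IMM 0xA9   /* lda #nn   : 2 cycles */
#define OP_STA_ABS 0x8D   /* sta $nnnn : 4 cycles */
#define OP_RTS     0x60   /* rts */
#define PPU_DATA   0x2007

/* Emit "lda #value / sta PPU_DATA" for each value, then "rts".
   Returns the number of bytes written, or 0 if out is too small. */
size_t build_pusher(const uint8_t *values, size_t count,
                    uint8_t *out, size_t out_size)
{
    size_t needed = count * 5 + 1;  /* 5 code bytes per value + rts */
    size_t pos = 0;
    size_t i;

    if (out_size < needed) {
        return 0;
    }
    for (i = 0; i < count; ++i) {
        out[pos++] = OP_LDA_IMM;
        out[pos++] = values[i];
        out[pos++] = OP_STA_ABS;
        out[pos++] = (uint8_t)(PPU_DATA & 0xFF);  /* low byte first */
        out[pos++] = (uint8_t)(PPU_DATA >> 8);
    }
    out[pos++] = OP_RTS;
    return pos;
}

/* Cycles spent on the writes themselves: 2 + 4 per value. */
unsigned long pusher_write_cycles(size_t count)
{
    return (unsigned long)count * 6UL;
}
```

Each value costs 5 bytes of generated code, so a 350-byte push needs about 1.7 KB of buffer, which is most of the console's 2 KB of internal RAM. That is presumably why the suggestion is qualified with "if you have enough RAM".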
thefox
Posts: 3139
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Re: CPU budget questions

Post by thefox »

You can use NintendulatorDX to figure out how many cycles a piece of code is taking. Simply write to $4020 when you want to start the timing and to $4030 when you want it to end. The README file has more info.
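A hedged sketch of how that might look from C. The two addresses come from the post; the value written (0 here) and all macro and function names are assumptions, so check the NintendulatorDX README for the exact behaviour. The non-cc65 branch exists only so the sketch also compiles and runs on a host machine.

```c
#include <stdint.h>

/* Addresses taken from the post; the value written is an assumption. */
#define NDX_TIMER_START 0x4020
#define NDX_TIMER_STOP  0x4030

#ifdef __CC65__
/* On the NES target, write straight to the emulator's debug registers. */
#define DEBUG_POKE(addr, value) (*(volatile uint8_t *)(addr) = (value))
#else
/* On a host build, just record the last address so the sketch still runs. */
uint16_t last_debug_poke;
#define DEBUG_POKE(addr, value) \
    (last_debug_poke = (uint16_t)(addr), (void)(value))
#endif

/* Hypothetical workload standing in for the code being measured. */
uint8_t sum_bytes(const uint8_t *data, uint8_t count)
{
    uint8_t total = 0;
    uint8_t i;
    for (i = 0; i < count; ++i) {
        total = (uint8_t)(total + data[i]);
    }
    return total;
}

/* Bracket the workload with the start and stop writes. */
uint8_t timed_sum(const uint8_t *data, uint8_t count)
{
    uint8_t result;
    DEBUG_POKE(NDX_TIMER_START, 0);
    result = sum_bytes(data, count);
    DEBUG_POKE(NDX_TIMER_STOP, 0);
    return result;
}
```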
sonder
Posts: 116
Joined: Wed Jun 26, 2013 12:35 pm
Location: Baltimore

Re: CPU budget questions

Post by sonder »

Shiru wrote: There is a really short period when VRAM access is possible, ~2700 cycles, and 513+ of those are needed for sprite DMA. So if you need to put many values into VRAM, you have to write this part in assembly. Further, you should prepare the values before getting into the NMI, and use unrolled loops. If you have enough RAM, you can even generate a pusher subroutine, a sequence of 'lda #nn / sta PPU_DATA' pairs; that way you get 6 cycles/byte for sequential writes and can push a sequence of ~350 bytes. Much less for random writes, around 100 bytes.

Ohhhh, THAT was what was causing the glitches! The other stuff in the NMI routine. D'oh. No wonder I was also seeing flashing colors and shearing. On a side note, the glitches were actually quite beautiful to me. I have a mind to see how I can abuse the PPU to produce these kinds of effects intentionally.

Yeah, I agree, asm is definitely a better way to go here, along with preparing "batches" beforehand, as I see is already set up in your code. I was just using these tests as a way to get a handle on C and performance. Those are some good ideas for optimization and for squeezing more out of the hardware.

Thanks for the tip, thefox. I use NDX, but I will have to look into that.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: CPU budget questions

Post by blargg »

For this on-the-fly code, you can use X and Y for commonly used values. Do the updates to each 256-byte page of VRAM together; then X can hold $20 for the first batch, then $21, and so on, and Y can hold whatever the most common value is. So you get code like:

Code:

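ldx #$20        ; X = high byte of the VRAM page being updated
ldy #$00        ; Y = most common value
stx $2006       ; address high byte
lda #$08
sta $2006       ; address low byte: VRAM address is now $2008
sty $2007       ; write the common value from Y
stx $2006       ; address high byte again
lda #$25
sta $2006       ; VRAM address is now $2025
lda #$10
sta $2007       ; write a value that is not in Y
stx $2006       ; high byte for the next update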
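To put rough numbers on what the register trick saves, here is a small C tally of the cycles per single-byte update, using the standard 6502 timings (2 cycles for an immediate load, 4 for an absolute store). The function names are made up for this sketch, and it counts only the instructions shown, not any surrounding overhead.

```c
/* Documented 6502 timings for the instructions involved. */
enum {
    CYCLES_LOAD_IMM  = 2,  /* lda #nn */
    CYCLES_STORE_ABS = 4   /* sta / stx / sty $nnnn */
};

/* Plain version: lda/sta for address high, address low, and the value. */
int cycles_plain_update(void)
{
    return 3 * (CYCLES_LOAD_IMM + CYCLES_STORE_ABS);
}

/* X holds the address high byte; the value optionally comes from Y. */
int cycles_update_with_x(int value_in_y)
{
    int cycles = CYCLES_STORE_ABS                     /* stx $2006 */
               + CYCLES_LOAD_IMM + CYCLES_STORE_ABS   /* lda #lo / sta $2006 */
               + CYCLES_STORE_ABS;                    /* sta or sty $2007 */
    if (!value_in_y) {
        cycles += CYCLES_LOAD_IMM;                    /* lda #value */
    }
    return cycles;
}
```

By this count a scattered write drops from 18 cycles to 16, or to 14 when the value is already in Y.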
sonder
Posts: 116
Joined: Wed Jun 26, 2013 12:35 pm
Location: Baltimore

Re: CPU budget questions

Post by sonder »

I thought I should mention that I've settled on a hybrid NMI approach: VRAM updates and music in the NMI, with a "callback" in the NMI to do whatever custom stuff needs doing, and controller polling and game logic in the main thread. Trying to poll controllers in the NMI was a disaster, not sure why. Anyway, I think splitting the threads is a good idea in general for framerate and CPU control.
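A rough C sketch of that split, written so it can run on a host with the NMI simulated by an ordinary function call. Every name here is hypothetical, and the counters merely stand in for the real VRAM, music, and game-logic work; it is one possible way to arrange the structure described, not the actual code.

```c
#include <stdint.h>

typedef void (*nmi_callback_t)(void);

volatile uint8_t frame_count;        /* bumped once per NMI */
static nmi_callback_t nmi_callback;  /* optional per-frame hook */

/* Counters standing in for real work, so the flow can be observed. */
unsigned vram_flushes, music_ticks, logic_steps;

static void flush_vram_queue(void) { ++vram_flushes; }
static void update_music(void)     { ++music_ticks; }

void set_nmi_callback(nmi_callback_t cb) { nmi_callback = cb; }

/* NMI side: time-critical VRAM work first, then music, then the hook. */
void nmi_handler(void)
{
    flush_vram_queue();   /* must finish while still in vblank */
    update_music();
    if (nmi_callback) {
        nmi_callback();
    }
    ++frame_count;        /* tell the main thread a frame has passed */
}

/* Main thread: run one step of polling and logic per new frame.
   Returns 1 if a step ran, 0 if no NMI has fired since the last one. */
int main_loop_step(uint8_t *last_seen_frame)
{
    if (*last_seen_frame == frame_count) {
        return 0;
    }
    *last_seen_frame = frame_count;
    /* controller polling and game logic would go here */
    ++logic_steps;
    return 1;
}
```

The main thread only prepares data and the NMI only pushes it, so a slow logic frame delays the game without eating into vblank.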