Opcodes per frame

bigjt_2
Posts: 82
Joined: Wed Feb 10, 2010 4:00 pm
Location: Indianapolis, IN

Post by bigjt_2 »

Hey all,

To all my fellow yanks reading this, Happy 4th! To everyone else, hi!

Anyway, I seem to remember reading a post on here a while back that I think was discussing instructions per second. I can't find it now. I'm just curious to find out roughly (I know it changes depending on the opcodes) how many opcodes the 2A03 can handle during vblank after the NMI fires, and how many during rendering time once vblank is done. Does anyone know offhand?
Dwedit
Posts: 4470
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

On NTSC:
341 ppu pixels per scanline
262 total scanlines.

In one frame (starting from NMI):
20 Vblank scanlines
1 prerender scanline
240 visible scanlines
1 'pre-vblank' scanline

Divide by 3 to turn PPU pixels into CPU cycles.

So that's ~2273 total CPU cycles during vblank time. Of course, you don't get all of them, because entering an interrupt itself takes some cycles, and running all the logic to get ready to draw takes time too.

Most instructions you'd run during vblank time are 4 cycles long.
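
The arithmetic above can be sketched in a few lines of Python. This is only a rough model built from the numbers quoted in this post; the constant and function names are made up for illustration, and the 4-cycles-per-instruction figure is an assumed average, not a measured one.

```python
# NTSC figures quoted in the post above; all names here are illustrative.
PPU_DOTS_PER_SCANLINE = 341
PPU_DOTS_PER_CPU_CYCLE = 3
SCANLINES = {"vblank": 20, "prerender": 1, "visible": 240, "pre_vblank": 1}


def cpu_cycles(scanlines: int) -> float:
    """Convert a number of scanlines into CPU cycles."""
    return scanlines * PPU_DOTS_PER_SCANLINE / PPU_DOTS_PER_CPU_CYCLE


vblank_cycles = cpu_cycles(SCANLINES["vblank"])
frame_cycles = cpu_cycles(sum(SCANLINES.values()))
rendering_cycles = frame_cycles - vblank_cycles

# Assumption: every instruction costs 4 cycles, so this is only a ballpark.
ASSUMED_CYCLES_PER_INSTRUCTION = 4
vblank_instructions = int(vblank_cycles // ASSUMED_CYCLES_PER_INSTRUCTION)

print(f"vblank:    {vblank_cycles:.0f} CPU cycles")
print(f"rendering: {rendering_cycles:.0f} CPU cycles")
print(f"frame:     {frame_cycles:.0f} CPU cycles")
print(f"vblank instructions at 4 cycles each: about {vblank_instructions}")
```

The instruction count is a ceiling at best, since interrupt entry and any multi-cycle stalls come out of the same budget.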
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Let me answer what will probably be your next question.

To get the most out of vblank time, prepare a buffer in main RAM (for example, use unused parts of the stack at $0100-$019F) before vblank, and then copy from that buffer into VRAM during vblank. The limiting factor becomes how much you can stuff into VRAM. On NTSC, count on being able to copy 160 bytes to nametables using a moderately unrolled loop, plus one 256-byte display list to OAM.

Disch's doc explains more.
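
To see why 160 bytes plus one OAM transfer is a sensible target, here is a rough budget check in Python. It is a sketch under stated assumptions, not anyone's actual routine: each copied byte is assumed to cost one 4-cycle load plus one 4-cycle store to $2007, each loop pass is assumed to cost DEX (2) plus a taken BNE (3), and the unroll factor of 8 is an arbitrary pick for "moderately unrolled". The function and constant names are hypothetical.

```python
VBLANK_CYCLES = 2273        # NTSC figure quoted earlier in the thread
OAM_DMA_CYCLES = 513 + 4    # transfer plus the 4-cycle store to $4014

# Assumed per-byte cost: one 4-cycle load + one 4-cycle store to $2007.
CYCLES_PER_BYTE = 4 + 4
# Assumed per-pass loop overhead: DEX (2 cycles) + BNE taken (3 cycles).
LOOP_OVERHEAD = 2 + 3


def copy_cost(total_bytes: int, unroll: int) -> int:
    """Estimate cycles to copy total_bytes, copying `unroll` bytes per pass."""
    passes = -(-total_bytes // unroll)  # ceiling division
    return total_bytes * CYCLES_PER_BYTE + passes * LOOP_OVERHEAD


nametable = copy_cost(160, unroll=8)
used = nametable + OAM_DMA_CYCLES
remaining = VBLANK_CYCLES - used

print(f"nametable copy: {nametable} cycles")
print(f"with OAM DMA:   {used} cycles")
print(f"left over:      {remaining} cycles")
```

Under these assumptions the leftover cycles have to cover NMI entry, setting the VRAM address, and restoring scroll. Calling copy_cost(160, unroll=1) shows how a fully rolled loop eats most of the budget before the OAM transfer is even counted.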
Last edited by tepples on Sun Jul 04, 2010 3:46 pm, edited 1 time in total.
blargg
Posts: 3717
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

Probably the best answer to this question would be to profile a few common games to see the actual average number of instructions during each of those periods. Note that opcode refers to the operation code byte of an instruction, for example $A9 for LDA #imm. The opcode is examined to determine what instruction is executed.
bigjt_2
Posts: 82
Joined: Wed Feb 10, 2010 4:00 pm
Location: Indianapolis, IN

Post by bigjt_2 »

Dwedit wrote: So that's ~2273 total CPU cycles during vblank time.
Most instructions you'd run during vblank time are 4 cycles long.
So we're looking at approximately 500 instructions. Thanks Dwedit.
tepples wrote:Let me answer what will probably be your next question.
To get the most out of vblank time, prepare a buffer in main RAM before vblank, and then copy from that buffer into VRAM during vblank.
Yep, that's how I do it. I load all the columns of background tiles that need to be drawn, score updates, etc. in the game loop and then have the NMI handler load them if there's been a change during the previous frame. I was just curious for curiosity's sake. But also it might be helpful later on. Thanks everyone.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

bigjt_2 wrote:So we're looking at approximately 500 instructions.
If you want my honest opinion, timing it in terms of "instructions" is a very bad idea. Typical VRAM-updating code uses instructions that take anywhere from 2 to 5 cycles, most of them 3 or 4, but I'm not really sure what a good average would be.

Also, if you have loops, you can't just look at the source file and use the number of lines your code takes as an estimate of how long it will take to execute; you have to take into consideration how many times each loop will repeat.

Another important thing is that even though a sprite DMA is triggered by a 4-cycle instruction (ST* $4014), the actual data transfer takes 513 cycles, so the math will be really off if you time your update routines by counting instructions.
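
A small Python sketch makes the gap concrete. The routine below is hypothetical (a 32-byte copy loop followed by a sprite DMA), and the per-instruction cycle counts are the usual 6502 figures with no page crossings assumed; none of it is taken from anyone's real code.

```python
from typing import NamedTuple


class Step(NamedTuple):
    label: str
    cycles: int  # cycles for one execution
    times: int   # how many times it executes


# Hypothetical update routine: copy 32 bytes, then trigger sprite DMA.
instructions = [
    Step("LDA buffer,X", 4, 32),
    Step("STA $2007", 4, 32),
    Step("INX", 2, 32),
    Step("CPX #32", 2, 32),
    Step("BNE loop", 3, 32),   # counted as taken every time (slightly high)
    Step("STA $4014", 4, 1),
]
DMA_TRANSFER_CYCLES = 513      # the transfer itself, not an instruction

source_lines = len(instructions)
executed = sum(s.times for s in instructions)
cycles = sum(s.cycles * s.times for s in instructions) + DMA_TRANSFER_CYCLES
naive_estimate = executed * 4  # "every instruction is about 4 cycles"

print(f"source lines:          {source_lines}")
print(f"instructions executed: {executed}")
print(f"naive estimate:        {naive_estimate} cycles")
print(f"cycle-counted total:   {cycles} cycles")
```

Six lines of source turn into 161 executed instructions, and even the executed-instruction estimate misses the DMA transfer entirely.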

Have you tried debugging your code with Nintendulator? You could set a breakpoint for when the video updates finish, and based on the timing information the emulator shows, you will know how much time you have left (or whether you went past VBlank, which is not good!).
bigjt_2
Posts: 82
Joined: Wed Feb 10, 2010 4:00 pm
Location: Indianapolis, IN

Post by bigjt_2 »

tokumaru wrote: Another important thing is that even though a sprite DMA is triggered by a 4-cycle instruction (ST* $4014), the actual data transfer takes 513 cycles, so the math will be really off if you time your update routines by counting instructions.
I didn't even think about sprite DMA. It takes that many cycles? I guess I'm not surprised when I consider it's transferring everything in sprite RAM to the PPU, but that's pretty interesting.

Thanks all. As always, I learned a lot from this.
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

bigjt_2 wrote:I didn't even think about sprite DMA. It takes that many cycles? I guess I'm not surprised when I consider it's transferring everything in sprite RAM to the PPU, but that's pretty interesting.
Yeah, it transfers 256 bytes from CPU memory to OAM. 513 cycles may seem like a long time, but this is practically 2 cycles per byte, much faster than would be possible without DMA. Even with the fastest unrolled code possible, it would take 7 (if you use all of zero page for sprites, which is not practical at all) or 8 cycles for each byte, for a total of 1792 or 2048 cycles, nearly all of VBlank. If you look at it like that, 513 is pretty damn fast.
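
The comparison works out as follows in a quick Python sketch. It assumes the manual copy is a fully unrolled load/store pair per byte with a 4-cycle absolute store, and it reuses the ~2273-cycle VBlank figure from earlier in the thread; the names are illustrative.

```python
OAM_BYTES = 256
DMA_CYCLES = 513
VBLANK_CYCLES = 2273   # NTSC figure quoted earlier in the thread

STA_ABS = 4            # absolute store to the OAM data port
LDA_ZP = 3             # load from zero page
LDA_ABS = 4            # load from anywhere else in RAM

manual_zp = OAM_BYTES * (LDA_ZP + STA_ABS)    # 256 * 7 = 1792
manual_abs = OAM_BYTES * (LDA_ABS + STA_ABS)  # 256 * 8 = 2048

for name, cost in [("DMA", DMA_CYCLES),
                   ("manual, zero page", manual_zp),
                   ("manual, absolute", manual_abs)]:
    share = 100 * cost / VBLANK_CYCLES
    print(f"{name:18} {cost:5} cycles  {cost / OAM_BYTES:.2f} per byte  "
          f"{share:.0f}% of VBlank")
```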