VRAM buffer?

JoeGtake2
Posts: 333
Joined: Tue Jul 01, 2014 4:02 pm

VRAM buffer?

Post by JoeGtake2 »

I'm looking for a starting point on the best methods for creating a simple VRAM buffer. I understand the reasons to have one: the time available for graphical updates during vBlank is limited. So for things like HUD updates, background graphic changes, etc., I get it. I get that the concept is to do all of the math and logic outside of vBlank and have the results queued up so that when vBlank hits, it just does the update.

So I understand it conceptually, I just don't know where to start to make the most of that NMI time while doing the fewest compares in it. Part of this is a little proverbial rubber-duck debugging, but I'm genuinely curious about the best methods.

So for instance, simple HUD with a pictorial life meter. Just to muscle it to work, I had a system where in each NMI, I drew out a series of blank hearts, then looped through based on a health variable how many 'full' hearts to draw over the top. This is how I might handle it in a more modern engine, so it's where my mind went. This was fine, except I know that it'll kill me later.

I would figure that the math would be done outside the NMI to determine which would be full and which would be empty. It seems like I could create a little ram buffer table and in it determine the state of each image (full or empty) outside of NMI, and then just draw that whole table. This would cut the writes in half...but it still feels like there would be a more efficient way to change only the tiles that need changing. Make some function that stores the number of tiles that need changing, the address of those tiles, and the tiles they need to be changed to, and then call those values in the NMI? Maybe do different types of updates on even/odd numbered frames or something to ensure there is time for everything?

How do you guys handle this? Any advice?
tokumaru
Posts: 12106
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: VRAM buffer?

Post by tokumaru »

Here's a recent discussion about one way to implement a VRAM buffer, using the stack.

Yes, the idea is to do all the processing beforehand, so that only the actual data transfer takes place during VBlank.

For HUDs, it sure would make sense to decrease the frequency of the updates. I doubt anyone would notice such small delays. I personally wouldn't bother with updating only hearts that changed (or other similar partial updates). Too much complication for little reward.

I'm not a strong defender of generic VRAM update routines, even though I think they are very elegant, because of cases like this one with the HUD. With a generic buffer, from time to time you'd have to waste time pushing hearts to a buffer, but since you know this is a kind of update you'll be doing very often, it would make more sense to permanently reserve a RAM area for hearts and have a specialized routine to always copy those same bytes to the same VRAM location, eliminating the need to fill a buffer every time.
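
A minimal sketch of that dedicated-area idea, in ca65 syntax. Everything named here (`hudHearts`, `health`, `MAX_HEARTS`, `HUD_ADDR`, the tile numbers, and the segment names) is a made-up placeholder for illustration, not code from an actual game. It also assumes PPU_CTRL is already set to the +1 address increment when `CopyHearts` runs.

```asm
PPU_STATUS  = $2002
PPU_ADDRESS = $2006
PPU_DATA    = $2007

MAX_HEARTS  = 8         ; hypothetical HUD size
HUD_ADDR    = $2042     ; hypothetical nametable address of the first heart
TILE_FULL   = $10       ; hypothetical tile numbers
TILE_EMPTY  = $11

.segment "BSS"
    health:    .res 1           ; 0..MAX_HEARTS
    hudHearts: .res MAX_HEARTS  ; permanently reserved copy of the HUD tiles

.segment "CODE"

; Main thread: rebuild the reserved area whenever health changes.
.proc BuildHearts
    ldx #0
loop:
    lda #TILE_FULL
    cpx health          ; carry is clear while X < health
    bcc store
    lda #TILE_EMPTY
store:
    sta hudHearts, x
    inx
    cpx #MAX_HEARTS
    bne loop
    rts
.endproc

; NMI (vBlank): fixed destination, unrolled copy, no compares at all.
.proc CopyHearts
    bit PPU_STATUS      ; reset the $2006 write latch
    lda #>HUD_ADDR
    sta PPU_ADDRESS
    lda #<HUD_ADDR
    sta PPU_ADDRESS
    .repeat MAX_HEARTS, i
        lda hudHearts + i
        sta PPU_DATA
    .endrepeat
    rts
.endproc
```

All the decisions happen in `BuildHearts`, outside vBlank. `CopyHearts` costs the same number of cycles every time it runs, which makes the vBlank budget easy to reason about.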
JRoatch
Formerly 43110
Posts: 394
Joined: Wed Feb 05, 2014 7:01 am
Location: us-east

Re: VRAM buffer?

Post by JRoatch »

My current framework code (which I'll post in its respective thread once I figure out APU stuff) also uses the stack for VRAM buffers. I use two slightly different buffer formats in the same stack space.

First I have 5 buffers of 32 horizontal or 30 vertical bytes with the addresses in zeropage. The 2 high bits of the addresses are flags indicating whether the buffer is horizontal, vertical, or skipped. This is efficient because of the BIT opcode and because the PPU address space is mirrored at $4000-$FFFF. These 32 byte buffers are used for palette updates, tile uploads, and nametable rows, columns, and attributes. Having flags to indicate skipped also allows the 5 buffers to be independently kept intact without having to rewrite the whole buffer.
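
To illustrate why BIT fits here, this is a rough sketch of testing two flag bits in one instruction. The bit assignment (bit 7 = skipped, bit 6 = vertical) and the names `bufAddrHi`, `bufAddrLo`, `CTRL_NMI`, and `CTRL_INC32` are assumptions made for the example, not the framework's actual layout.

```asm
PPU_CTRL    = $2000
PPU_ADDRESS = $2006
CTRL_NMI    = %10000000     ; keep NMI enabled
CTRL_INC32  = %00000100     ; +32 address increment (column writes)

.segment "ZEROPAGE"
    bufAddrHi: .res 1       ; bits 7-6 = flags, bits 5-0 = address high byte
    bufAddrLo: .res 1

.segment "CODE"
.proc SetupBuffer
    bit bufAddrHi           ; N <- bit 7, V <- bit 6, A is left untouched
    bmi skipped             ; assumed: bit 7 set = buffer not used this frame
    ldx #CTRL_NMI           ; assumed: bit 6 clear = horizontal (+1)
    bvc :+                  ; LDX does not change V, so the flag is still valid
        ldx #(CTRL_NMI | CTRL_INC32)   ; bit 6 set = vertical (+32)
    :
    stx PPU_CTRL            ; also zeroes the nametable select bits; restore later
    lda bufAddrHi           ; flag bits can stay in place: only 14 address bits are kept
    sta PPU_ADDRESS
    lda bufAddrLo
    sta PPU_ADDRESS
    ; ... the 32 (or 30) data writes would follow here ...
skipped:
    rts
.endproc
```

Because the flags live in the same byte as the address, clearing or setting bit 7 is enough to park or re-enable a buffer without touching its contents.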

The other format is a series of PPU strings in stack. Each string has 2 bytes address/direction, 1 byte tile count, then the tile data. These PPU strings are used for smaller updates like HUD elements or destructible tiles in game.
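
A rough sketch of how a list of such strings could be flushed with PLA. The exact encoding is guessed for the example: bit 7 of the first byte ends the list, bit 6 selects column mode, and the count is 1..255. `STRINGS`, `savedSP`, and the other names are hypothetical as well. It also assumes no IRQ can fire while the stack pointer is borrowed, and that the real stack never grows down into the string area.

```asm
PPU_CTRL    = $2000
PPU_ADDRESS = $2006
PPU_DATA    = $2007
CTRL_NMI    = %10000000
CTRL_INC32  = %00000100
STRINGS     = $0100         ; hypothetical: strings start at the bottom of the stack page

.segment "ZEROPAGE"
    savedSP: .res 1

.segment "CODE"
.proc FlushStrings
    tsx
    stx savedSP             ; remember the real stack pointer
    ldx #<(STRINGS - 1)     ; PLA increments S before it reads
    txs
next:
    pla                     ; address high byte plus flags
    bmi done                ; assumed: bit 7 set = end of list
    ldx #CTRL_NMI
    cmp #%01000000          ; bit 7 is clear here, so carry set means bit 6 set
    bcc :+
        ldx #(CTRL_NMI | CTRL_INC32)
    :
    stx PPU_CTRL
    sta PPU_ADDRESS         ; flag bits are ignored by the PPU
    pla                     ; address low byte
    sta PPU_ADDRESS
    pla                     ; tile count (assumed 1..255)
    tax
copy:
    pla
    sta PPU_DATA
    dex
    bne copy
    beq next                ; always taken: Z is set when the loop falls through
done:
    ldx savedSP
    txs                     ; back to the real stack before returning
    rts
.endproc
```

The gain over an indexed loop is that PLA is a one-byte instruction that advances the read position for free, so no index register has to be maintained.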
Movax12
Posts: 529
Joined: Sun Jan 02, 2011 11:50 am

Re: VRAM buffer?

Post by Movax12 »

My current fastest VRAM update code that doesn't use the stack page (which is a good solution) or huge amounts of ROM:
It is partially unrolled, and does use a lot more space than slower versions of the same thing.
This is almost exactly the same format that Nintendo used in a lot of games. I modified this a bit without testing, but I think it should work correctly. It needs one temporary variable. The code is ca65 assembly.

It is still better to code separate routines in situations that need it.

Code:

;This is basically a tweaked version (tweaked for speed, not size) of Nintendo's generic vrambuffer write routine (used in SMB, Zelda etc)
 
;Format in the buffer:
    ;PPUAddressHI, PPUAddressLO, datalength, data, data, ..
    ; datalength is the number of data bytes minus 1 (0 = 1 byte, 31 = 32 bytes),
    ; so at least one byte is always written
    ; (makes things faster, and why wouldn't you write at least one byte?)
 
    ;Note, PPUAddressHI:
    ;bit7 set means no more data blocks (done, exit)
    ;bit6 set means use column mode.
 
 
.proc PPUtransferFast
 
    .pushseg
    .segment "ZEROPAGE"
        dataLength:  .res 1
    .popseg
   
    ; can copy about 140 bytes with OAM transfer? a bit more than this.
 
    ldy #0                           ; y is index into buffer
    sty vramBufferIndex         ; reset index to zero so the main thread can use a fresh buffer
    jmp Start                       ; start at bottom of the loop
 
    loopPPUwrite:
       
        ; reg A now holds the first byte from the buffer:
 
        ; assume horizontal writes, leave NMI active
        ldx #CT_NMI
 
        ; check for vertical write (bit 6)
        ; bit 7 cannot be set here, so we can use CMP
       
        cmp #%01000000
        bcc notVertical        
            ldx #(CT_NMI | CT_ADDRINC32 )
        notVertical:
       
        ; calling code expected to reset PPU_CTRL after if needed.
        stx PPU_CTRL
       
        ; write high byte of PPU address:
        sta PPU_ADDRESS                    ; bit6,7 ignored here.
       
        iny
        lda vramBuffer, y                  ; PPU address low
        sta PPU_ADDRESS
       
        ; load count:
        iny
        lda vramBuffer,y
 
        ; save count here:
        ; this will be shifted to the right to check bits 0 through 4
        ; and write to PPU for each power
 
        sta dataLength
 
        ; count is actually less 1, so 0 means 1 byte, 31 means 32 bytes
        ; so always do 1 write at least:
 
        iny
        lda vramBuffer, y
        sta PPU_DATA
        tya ; count index with reg A
 
        lsr dataLength
        bcc :+
            ldx vramBuffer + 1, y
            stx PPU_DATA
           
            ; we need to only add one to the offset, but still track in A:
            iny
            tya        
        :
 
        lsr dataLength
        bcc :+
            ldx vramBuffer + 1, y
            stx PPU_DATA
            ldx vramBuffer + 2, y
            stx PPU_DATA
           
            ; add 2 to the offset:
           
            adc #1 ; add 2, C is set
            tay
        :
       
        lsr dataLength
        bcc :+
            ldx vramBuffer + 1, y
            stx PPU_DATA
            ldx vramBuffer + 2, y
            stx PPU_DATA
            ldx vramBuffer + 3, y
            stx PPU_DATA
            ldx vramBuffer + 4, y
            stx PPU_DATA
           
            ; add 4 to the offset:
           
            adc #3 ; C is set
            tay
        :
 
        lsr dataLength
        bcc :+
            ldx vramBuffer + 1, y
            stx PPU_DATA
            ldx vramBuffer + 2, y
            stx PPU_DATA
            ldx vramBuffer + 3, y
            stx PPU_DATA
            ldx vramBuffer + 4, y
            stx PPU_DATA
            ldx vramBuffer + 5, y
            stx PPU_DATA
            ldx vramBuffer + 6, y
            stx PPU_DATA
            ldx vramBuffer + 7, y
            stx PPU_DATA
            ldx vramBuffer + 8, y
            stx PPU_DATA
           
            ; add 8 to the offset:
           
            adc #7 ; C is set
            tay
        :
 
        lsr dataLength
        bcc :+
            ldx vramBuffer + 1, y
            stx PPU_DATA
            ldx vramBuffer + 2, y
            stx PPU_DATA
            ldx vramBuffer + 3, y
            stx PPU_DATA
            ldx vramBuffer + 4, y
            stx PPU_DATA
            ldx vramBuffer + 5, y
            stx PPU_DATA
            ldx vramBuffer + 6, y
            stx PPU_DATA
            ldx vramBuffer + 7, y
            stx PPU_DATA
            ldx vramBuffer + 8, y
            stx PPU_DATA
            ldx vramBuffer + 9, y
            stx PPU_DATA
            ldx vramBuffer + 10, y
            stx PPU_DATA
            ldx vramBuffer + 11, y
            stx PPU_DATA
            ldx vramBuffer + 12, y
            stx PPU_DATA
            ldx vramBuffer + 13, y
            stx PPU_DATA
            ldx vramBuffer + 14, y
            stx PPU_DATA
            ldx vramBuffer + 15, y
            stx PPU_DATA
            ldx vramBuffer + 16, y
            stx PPU_DATA
           
            ; add 16 to the offset:
           
            adc #15 ; C is set
            tay
        :
   
        iny
        Start:
 
    ; if bit 7 is set, exit loop:
 
    lda vramBuffer, y
    bmi break
    jmp loopPPUwrite
 
    break:
   
    ; mark buffer as empty (reg A has the N bit set)
    ; the first byte of the buffer will be negative, so if this buffer is
    ; processed again before being refilled, nothing happens:
 
    sta vramBuffer              
    rts
.endproc
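
For the main-thread side, here is a minimal sketch of appending one entry in the format described above. `vramBuffer` and `vramBufferIndex` are the names used by the routine; `AppendRun`, `srcPtr`, `ppuAddr`, `runCount`, and the buffer size are hypothetical. It has no bounds check and is not safe if the NMI can flush the buffer in the middle of an append, so it assumes the NMI handler only calls `PPUtransferFast` once the main thread has signalled that the frame is ready.

```asm
.segment "ZEROPAGE"
    srcPtr:   .res 2        ; pointer to the tile data to queue
    ppuAddr:  .res 2        ; low byte first; OR %01000000 into the high byte for a column
    runCount: .res 1

.segment "BSS"
    vramBufferIndex: .res 1
    vramBuffer:      .res 128   ; assumed size

.segment "CODE"

; In: A = number of data bytes (1..32), srcPtr and ppuAddr set up by the caller.
.proc AppendRun
    sta runCount
    ldx vramBufferIndex
    lda ppuAddr + 1
    sta vramBuffer, x       ; PPU address high (overwrites the old terminator)
    lda ppuAddr
    sta vramBuffer + 1, x   ; PPU address low
    ldy runCount
    dey
    tya
    sta vramBuffer + 2, x   ; length is stored minus one
    ldy #0
copy:
    lda (srcPtr), y
    sta vramBuffer + 3, x
    inx
    iny
    cpy runCount
    bne copy
    ; X is now the old index plus the byte count
    lda #$FF
    sta vramBuffer + 3, x   ; new terminator (bit 7 set) right after the entry
    txa
    clc
    adc #3                  ; skip the three header bytes
    sta vramBufferIndex     ; the next entry starts on the terminator
    rts
.endproc
```

At startup the buffer needs one terminator written to `vramBuffer` with `vramBufferIndex` at zero. After that, the transfer routine leaves the buffer in that state every time it finishes.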