WIP: Wizard of Wor
Re: WIP: Wizard of Wor
Am slowing down again, as I need to squash a whole bunch of bugs that have crept into the code.
-Thom
Re: WIP: Wizard of Wor
Wizard of Wor WIP: Worrior and monster collision detection is fully implemented (both worriors and monsters can shoot and kill each other, with the appropriate points awarded). There is still at least one lingering bug in the laser code. But for now, I need to optimize all the bounding box checking code to gain back much-needed cycles, as the game slows down when everybody is on screen and shooting. Can't have that. The computer is playing blue.
And now, I need to take a break from new features to drastically optimize the bounds checking code, as I am doing lots of multiplies and divides all over the code for what are ostensibly the same or similar values (at least I think so). I need to do those calculations once per frame and reuse the results, and that should free up more than enough cycles to finish the gameplay implementation.
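A minimal C sketch of the "calculate once per frame" idea, assuming each actor's hitbox is a fixed-size box derived from its pixel position and that boxes never wrap past 255. All names here (`actor_x`, `box_left`, `update_boxes`, `boxes_overlap`, `ACTOR_W`, ...) are hypothetical, not taken from the game's source.

```c
#include <stdint.h>

#define NUM_ACTORS 8   /* hypothetical: worriors + monsters */
#define ACTOR_W    8   /* hypothetical hitbox size in pixels */
#define ACTOR_H    8

/* Actor positions in pixels, one array per field */
uint8_t actor_x[NUM_ACTORS];
uint8_t actor_y[NUM_ACTORS];

/* Bounding boxes, recomputed once per frame instead of inside every test */
uint8_t box_left[NUM_ACTORS], box_right[NUM_ACTORS];
uint8_t box_top[NUM_ACTORS], box_bottom[NUM_ACTORS];

/* Call once per frame, after movement has been applied.
   Assumes x + ACTOR_W - 1 and y + ACTOR_H - 1 never exceed 255. */
void update_boxes(void)
{
    uint8_t i;
    for (i = 0; i < NUM_ACTORS; ++i) {
        box_left[i]   = actor_x[i];
        box_right[i]  = (uint8_t)(actor_x[i] + ACTOR_W - 1);
        box_top[i]    = actor_y[i];
        box_bottom[i] = (uint8_t)(actor_y[i] + ACTOR_H - 1);
    }
}

/* Overlap test using only the precomputed values: no multiplies or divides. */
uint8_t boxes_overlap(uint8_t a, uint8_t b)
{
    return box_left[a] <= box_right[b] && box_left[b] <= box_right[a] &&
           box_top[a]  <= box_bottom[b] && box_top[b]  <= box_bottom[a];
}
```

With `update_boxes()` called once per frame, each pairwise check in the collision loop is just four byte comparisons.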

Latest build is here:
-Thom
Re: WIP: Wizard of Wor
Wizard of Wor NES WIP:
I had initially planned to do three major optimizations. I have done two, and the result is dramatic. It seems my game logic was taking at least two frames' worth of time. By simply rearranging the game state arrays and placing them in 6502 zero page, the game logic as it stands is running at full frame rate, speeding up by at least 200% ... WHAT A DIFFERENCE.
Basically, before, I was building macros that did:
Code: Select all
/* One record of NUM_FIELDS bytes per stamp, fields interleaved */
unsigned char stamps[NUM_FIELDS*NUM_STAMPS];
#define STAMP_NUM(x) ((x)*NUM_FIELDS)   /* start of stamp x's record: a multiply */
#define STAMP_X(x) (STAMP_NUM(x)+0)
#define STAMP_Y(x) (STAMP_NUM(x)+1)
...
stamps[STAMP_X(i)]=new_stamp_x_position;
stamps[STAMP_Y(i)]=new_stamp_y_position;
...
if (stamps[STAMP_X(i)]==... && stamps[STAMP_Y(i)]==... )
{
...
}
and so on...
This was causing a software multiply (the 6502 has no hardware multiply) on EACH AND EVERY read and write of game state, and I was doing this about 220 times throughout the game logic.
I replaced this with:
Code: Select all
/* One array per field, indexed directly by stamp number */
unsigned char stamp_x[NUM_STAMPS];
unsigned char stamp_y[NUM_STAMPS];
...
stamp_x[i]=new_stamp_pos_x;
stamp_y[i]=new_stamp_pos_y;
...
if (stamp_x[i]==... && stamp_y[i]==...)
{
...
}
You can see that not only does this look cleaner, it also runs much better, because the resulting accesses literally become either X-indexed or indirect Y-indexed loads and stores, which the 6502 loves to do. That's why I am KICKING myself for not doing it earlier. I KNOW from doing 6502 assembler that it's better to keep each field in its own array, side by side, instead of interleaved in a C-style struct or array of records, since reaching any element is then simply an index change.
I've posted a copy of the latest ROM here; you can see it runs a fuckload faster, wowza! And of course, here's a GIF showing the new speed. It flies, and I can now really start tuning the main game.

Damn, I feel good!
-Thom
Re: WIP: Wizard of Wor
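LDA zp,X is the same speed as LDA abs,X (4 cycles), at least as long as the absolute indexed read doesn't cross a page boundary (that costs an extra cycle), so if you find there's memory pressure on zero page addresses, you may be able to move some arrays up out of it. Stores differ slightly: STA zp,X is 4 cycles, STA abs,X is always 5.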
Re: WIP: Wizard of Wor
Now that everything is so smooth and zoomy-zoomy, I'm reworking the animation and delay routines to slow everything down and then gradually speed up as the level progresses (given a level #, adjust how fast the scaling happens and the top speed value).
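One way to express "given a level #, adjust how fast the scaling happens and the top speed value" is a pair of per-level tables driving an 8.8 fixed-point speed. This is a hypothetical sketch: the table names, the `start_level`/`ramp_speed` functions and all values are made up for illustration and would need tuning.

```c
#include <stdint.h>

#define NUM_LEVELS 4   /* hypothetical */

/* Hypothetical per-level tuning; speeds in 1/256 pixel per frame (8.8 fixed point). */
static const uint16_t start_speed[NUM_LEVELS] = { 0x0040, 0x0050, 0x0060, 0x0080 };
static const uint16_t ramp_rate[NUM_LEVELS]   = { 0x0001, 0x0002, 0x0003, 0x0004 };
static const uint16_t top_speed[NUM_LEVELS]   = { 0x00C0, 0x0100, 0x0140, 0x0180 };

uint16_t monster_speed;

void start_level(uint8_t level)
{
    monster_speed = start_speed[level];
}

/* Speed up by this level's ramp rate, capped at this level's top speed.
   How often to call it (every frame, once a second, ...) is a tuning choice. */
void ramp_speed(uint8_t level)
{
    uint16_t next = (uint16_t)(monster_speed + ramp_rate[level]);
    monster_speed = next > top_speed[level] ? top_speed[level] : next;
}
```

The resulting `monster_speed` would then be added to a 16-bit position each frame, with only the high byte used as the pixel coordinate.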
This is happening in the initiial_tuning branch.
-Thom
Re: WIP: Wizard of Wor
Does anyone have a decent algorithm for a fractional delay? I need to apply both an animation cel delay and a sprite position delay, and counting in whole frames seems too coarse for this.
-Thom
Re: WIP: Wizard of Wor
Keep the position as a 16-bit number and add the speed to it every frame, but only use the high byte to display where it is.
This will move the object a bit faster than one pixel every two frames (which would be adc #$80): $C0/256 = 0.75, so 3 pixels every 4 frames.
Code: Select all
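lda poslow,x   ;fractional (subpixel) part of the position
clc
adc #$C0       ;add 0.75 pixel ($C0/256)
sta poslow,x
lda poshigh,x  ;whole-pixel part
adc #0         ;add the carry out of the fraction
sta poshigh,x
sta OAM,y      ;only the high byte goes to the sprite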
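The same 8.8 trick in C, as a sketch with hypothetical names (`pos`, `move_step`, `anim_acc`, `anim_step`). The low byte is the fraction and only the high byte goes to OAM. The same carry-out idea also covers the fractional animation cel delay asked about above: add a rate to an 8-bit accumulator each frame and advance the cel whenever it wraps.

```c
#include <stdint.h>

/* 8.8 fixed point: high byte = pixel used for display, low byte = subpixel fraction. */
uint16_t pos;

/* Advance by speed/256 pixels: 0xC0 = 0.75 px per frame, 0x80 = 0.5 px per frame. */
uint8_t move_step(uint8_t speed)
{
    pos = (uint16_t)(pos + speed);
    return (uint8_t)(pos >> 8);   /* value to write to OAM */
}

/* Fractional animation delay: advance the cel when the accumulator wraps past 255. */
uint8_t anim_acc, anim_cel;

void anim_step(uint8_t rate, uint8_t num_cels)
{
    uint8_t before = anim_acc;
    anim_acc = (uint8_t)(anim_acc + rate);
    if (anim_acc < before) {      /* wrapped: this is the carry out */
        anim_cel = (uint8_t)((anim_cel + 1) % num_cels);
    }
}
```

With `rate` = 0x80 the cel advances every second frame; 0x55 gives roughly one step every three frames, without rounding to whole frames.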
Re: WIP: Wizard of Wor
Thanks.
The problem I seem to be having is that whatever delay I use, the timing seems uneven, and I suspect this may be because of the code in the runtime that not only detects NTSC vs. PAL but also sets the same frame rate for both (50 fps). Could this be the case? I'm going bonkers trying to see wtf is going on so I can do appropriate speed tuning.
-Thom
Re: WIP: Wizard of Wor
Your game appears to skip running game logic every sixth frame on NTSC.
So on NTSC:
5 gameplay frames are run for every 6 "real" frames.
50 gameplay frames are run for every 60 "real" frames.
At 60 frames per second (close enough), that's 50 gameplay frames every second.
And on PAL:
5 gameplay frames are run for every 5 "real" frames.
50 gameplay frames are run for every 50 "real" frames.
At 50 frames per second (close enough), that's 50 gameplay frames every second.
So yes, your game is attempting to match NTSC and PAL gameplay speed. I'm unsure whether you're asking this because you weren't aware it was doing that at all, or because you were aware and just want to do it a different way. (Or you don't want to do it at all, and want both versions to run 1 gameplay frame for every "real" frame, with the NTSC character moving 60 pixels per second and the PAL character moving 50 pixels per second.)
Re: WIP: Wizard of Wor
I'm simply trying to determine why, if I use e.g. a delay counter that decrements every 'frame', some frames seem to go faster than others.
-Thom
Re: WIP: Wizard of Wor
Here's the code in _ppu_wait_frame (comments mine):
Code: Select all
    lda #1            ;Tell the NMI the vram buffer is totally (rather than partially) updated (presumably)
    sta <VRAM_UPDATE
    lda <FRAME_CNT1   ;Load a counter changed in the NMI (presumably)
@1:
    cmp <FRAME_CNT1   ;Compare to what's in A. When the NMI changes this, it'll be different
    beq @1            ;and we'll stop looping
    lda <NTSC_MODE    ;Assuming PAL is zero, we're done
    beq @3            ;And branch
                      ;If NTSC (non zero presumably)
@2:
    lda <FRAME_CNT2   ;A second NMI counter, presumably cycling 0-5
    cmp #5            ;On the one frame in six where it reads 5,
    beq @2            ;keep waiting until it's not, so no game logic runs that frame
@3:
    rts
So if you want it to not do that, you could, in theory, do this:
Code: Select all
    lda #1
    sta <VRAM_UPDATE
    lda <FRAME_CNT1   ;Load a counter changed in the NMI (presumably)
@1:
    cmp <FRAME_CNT1   ;Compare to what's in A. When the NMI changes this, it'll be different
    beq @1            ;and we'll stop looping
    rts
But that may have other effects, since I'm not too familiar with neslib.
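A small C model of why a countdown measured in gameplay frames looks uneven on NTSC, assuming (as the comments above presume) that `FRAME_CNT2` cycles 0 to 5 and that game logic does not run on the frame where it reads 5. The function `real_frames_for_delay` and its `phase` parameter are hypothetical; this only simulates the timing, it is not neslib code.

```c
#include <stdint.h>

/* Number of real (vblank) frames that pass while a delay of `delay`
   gameplay frames counts down, starting when the NTSC skip counter
   reads `phase` (must be 0..5). On the frame where the counter reads 5
   no game logic runs, so the countdown does not advance. */
unsigned real_frames_for_delay(uint8_t delay, uint8_t phase)
{
    unsigned real = 0;
    uint8_t cnt = phase;
    while (delay > 0) {
        if (cnt != 5) {
            --delay;                  /* a gameplay frame: countdown advances */
        }
        ++real;                       /* every vblank is a real frame */
        cnt = (uint8_t)((cnt + 1) % 6);
    }
    return real;
}
```

Under this model a 3-frame delay takes 3 real frames when it starts at phase 0, but 4 when the skipped frame lands inside it (about 50 ms vs. 67 ms at 60 Hz), which would look like some delays running faster than others.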
Re: WIP: Wizard of Wor
OK, I replaced my BOX_PIXEL_X and BOX_PIXEL_Y multiply-by-24 macros with a straight table lookup, and this seems to have made everything extremely smooth, if fast. I'm debating whether to replace the div24 routine too, though it's already very fast.
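A sketch of the table approach, assuming box indices run 0 to 10 so every product fits in a byte (11*24 = 264 would not). The table `box_to_pixel` and function `box_pixel` are hypothetical, not the game's actual BOX_PIXEL_X/BOX_PIXEL_Y macros. Declaring the table `const` should let cc65 put it in read-only data (ROM) rather than RAM.

```c
#include <stdint.h>

/* n * 24 for box indices 0..10, precomputed (hypothetical range). */
static const uint8_t box_to_pixel[11] = {
    0, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240
};

/* One indexed load replaces a call into the generic multiply routine. */
uint8_t box_pixel(uint8_t n)
{
    return box_to_pixel[n];   /* caller must keep n <= 10 */
}
```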
-Thom
Re: WIP: Wizard of Wor
With the multiplies removed, things look smooth now that I am applying two types of delay: animation delay and move delay. I can now build a set of tables to scale those up per level.
With this and the current tuning that I've done for laser speeds and player movements, I just need to implement monster speed scaling, and it'll be good for the first pass of tuning.
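A hypothetical sketch of per-level tables for the two delays: each timer counts frames down and reloads from the current level's entry when it fires. The table names, values, `EV_*` flags and functions are invented for illustration; the timers must be loaded by `start_level_delays()` before the first tick.

```c
#include <stdint.h>

#define NUM_LEVELS 4   /* hypothetical */
#define EV_MOVE 0x01   /* move the monster this frame */
#define EV_ANIM 0x02   /* advance its animation cel this frame */

/* Hypothetical tables: frames between steps; smaller = faster on later levels. */
static const uint8_t move_delay_tbl[NUM_LEVELS] = { 4, 3, 2, 1 };
static const uint8_t anim_delay_tbl[NUM_LEVELS] = { 8, 6, 5, 4 };

uint8_t move_timer, anim_timer;

void start_level_delays(uint8_t level)
{
    move_timer = move_delay_tbl[level];
    anim_timer = anim_delay_tbl[level];
}

/* Call once per gameplay frame; returns which events fire this frame. */
uint8_t tick_delays(uint8_t level)
{
    uint8_t events = 0;
    if (--move_timer == 0) {
        move_timer = move_delay_tbl[level];
        events |= EV_MOVE;
    }
    if (--anim_timer == 0) {
        anim_timer = anim_delay_tbl[level];
        events |= EV_ANIM;
    }
    return events;
}
```

Whole-frame delays like these are the coarse option; the entries could instead be 8.8 rates fed to an accumulator if finer steps are needed.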
CC65's generalized multiply routines are, understandably, slower than grandma stuck in molasses in January going uphill in a fucking ice storm.
-Thom
Re: WIP: Wizard of Wor
If you're tight on space and want to ditch tables, notice that N*24 = N*8+N*16, or (N<<3)+(N<<4).
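The shift-and-add form in C, as a sketch (the name `mul24` is hypothetical). Widening to 16 bits first matters because N*24 exceeds 255 once N > 10; in 8-bit assembly you would have to carry the overflow into a high byte yourself.

```c
#include <stdint.h>

/* N*24 = N*16 + N*8 = (N<<4) + (N<<3), with no multiply routine. */
uint16_t mul24(uint8_t n)
{
    uint16_t w = n;   /* widen first so the shifts don't lose high bits */
    return (uint16_t)((w << 4) + (w << 3));
}
```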