Decreased S-CPU Velocity: Real or Imagined?

Posted: Sat Apr 11, 2015 11:17 am
by Drew Sebastino
In this post (http://forums.nesdev.com/viewtopic.php?p=144963#p144963), psycopathicteen wrote: This works fast enough. This makes me wonder how fast the Genesis would be if it had the same PPU as the SNES, and needed to do all this shit in order to have good animation, and vice-versa.
Agreed. It doesn't really seem that the SNES's CPU is slow; it actually seems like it has to do more work than the Genesis's because of the PPUs' design. The planar graphics format comes to mind...
psycopathicteen wrote:I would've used 80 equally sized 32x32 slots.
Wow.

Well, you got to look on the bright side. You have more than 64 colors. :wink: (I can barely get by with 256. :roll: )

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sat Apr 11, 2015 12:25 pm
by psycopathicteen
Stef said I only got my Gunstar Heroes demo running on the SNES because I "simplified" everything to run good on the system, when my demo was actually running a much more complicated dynamic animation engine than the original game, and still had plenty of CPU time left.

I decided to convert part of the code to 68000 myself to compare the cycle counts.

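Code:

65816 code:
-;				// skip slots that are completely full
inx				//2
lda {vram_slot_table},x		//4 6
cmp #$0f			//2 8
beq -				//3 11
-;				// count set low bits until a clear one shifts out
lsr				//2
bcc +				//3 5
iny				//2 7
bra -				//3 10
+;


68000 code:
-;				// skip slots that are completely full
move.b (a0)+,d0			//8
cmp.b d1,d0			//4  12
beq -				//10 22
-;				// count set low bits until a clear one shifts out
lsr.w #1,d0			//8
bcc +				//10 18
addq.b #1,d2			//4  22
bra -				//10 32
+;
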
This is one of the speed crucial parts of the code. It looks for a 32x32 slot that is not completely full, then it looks at which 16x16 slot is still open. The first loop is 11 vs 22 cycles, the second loop is 10 vs 32 cycles. Look at how "fast" the 68000 is. :roll:
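
For readers who don't follow either assembly dialect, here is a minimal Python sketch of what the two loops appear to do. It assumes (the post doesn't spell this out) that each table entry is a 4-bit occupancy mask for one 32x32 slot, one bit per 16x16 sub-slot, with $0F meaning "full". The function name and the None return for a completely full table are made up for the example; the posted loops have no such bounds check.

```python
FULL = 0x0F  # assumed: all four 16x16 sub-slots of a 32x32 slot are taken


def find_free_subslot(vram_slot_table):
    """Return (slot_index, subslot_index) of the first free 16x16 sub-slot.

    Mirrors the two loops above: skip entries equal to FULL, then shift
    the mask right until a clear bit falls out. Returns None when every
    slot is full (hypothetical behaviour, not in the posted code).
    """
    for slot, mask in enumerate(vram_slot_table):
        if mask == FULL:
            continue              # first loop: cmp #$0f / beq
        sub = 0
        while mask & 1:           # second loop: lsr / bcc
            mask >>= 1
            sub += 1              # iny
        return slot, sub
    return None


print(find_free_subslot([0x0F, 0x0F, 0x0B]))
```

With the sample table, the first two slots are skipped and the third one (%1011) has its lowest clear bit at position 2.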

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sat Apr 11, 2015 1:37 pm
by Drew Sebastino
psycopathicteen wrote:Stef said I only got my Gunstar Heroes demo running on the SNES because I "simplified" everything to run good on the system, when my demo was actually running a much more complicated dynamic animation engine than the original game, and still had plenty of CPU time left.
Why does it even matter how "simple" it is if it looks and plays exactly the same and doesn't have any slowdown? That'd be like me bragging that I could write "inc" 30 times instead of "clc adc #30" and the game would still run without slowdown. It isn't any more impressive from a gameplay perspective and it just looks stupid.
psycopathicteen wrote:This is one of the speed crucial parts of the code. It looks for a 32x32 slot that is not completely full, then it looks at which 16x16 slot is still open. The first loop is 11 vs 22 cycles, the second loop is 10 vs 32 cycles. Look at how "fast" the 68000 is.
Wow. :shock: It's funny, because even though the SNES is clocked about 1/2 as fast as the Genesis, it takes about twice (or three times in the second example) the amount of time to do the same thing. I feel like video hardware generally matters more in a video game console than the main CPU anyway. You can try to optimize on a CPU, but not on video hardware, and on something like the SNES you can add more processing power with an expansion chip (again, I never said it would be easy, just that it's possible), but you (unfortunately) can't add more overdraw. (No, the Yoshi's Island method is not a substitute for overdraw, tepples. :roll: )

This is pretty random, and I haven't been doing anything SNES related for the past week or so (I think you know why), but I've been wondering about something. I plan on letting both characters carry a rocket launcher, which will (obviously) cause an explosion. You won't be able to shoot more than one rocket at a time, so there will be no more than 2 explosions onscreen at the same time. The main problem is that the explosions are going to be 64x64 pixels each, so 1/4 of sprite VRAM will (obviously) be gone. There will also only be about two 32x32 sprites left that I can animate, so I figure I'd double buffer the explosions, which takes up 3/8 of sprite VRAM (the two 64x64 explosions plus two 64x32 areas for the double buffering) but only uses about 2/5 of the DMA bandwidth. I plan on cutting the top and bottom 8 pixels of the screen if they are not visible anyway, and if it comes down to it, I'll cut off 16 more pixels for DS resolution, but I'm only going to do that if I have to. I was originally going to make the scoreboard out of sprites, but I'm not sure how that's going to fit in the sprite VRAM...
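
The 1/4 and 3/8 figures can be checked with a few lines of arithmetic. This is only a sketch: it assumes 4bpp sprite tiles (32 bytes per 8x8 tile) and 16 KiB of VRAM addressable by sprites, neither of which is stated in the post, and it does not try to model the 2/5 DMA bandwidth estimate.

```python
from fractions import Fraction

BYTES_PER_TILE = 32             # assumed: one 8x8 tile at 4 bits per pixel
SPRITE_VRAM_BYTES = 16 * 1024   # assumed: 512 tiles addressable by sprites


def sprite_bytes(width, height):
    """VRAM bytes for a width x height pixel sprite built from 8x8 tiles."""
    return (width // 8) * (height // 8) * BYTES_PER_TILE


explosions = 2 * sprite_bytes(64, 64)   # two explosions on screen
buffers = 2 * sprite_bytes(64, 32)      # extra 64x32 areas for double buffering

share_single = Fraction(explosions, SPRITE_VRAM_BYTES)
share_double = Fraction(explosions + buffers, SPRITE_VRAM_BYTES)
print(share_single, share_double)
```

Under those assumptions the two explosions alone take 4 KiB and the double-buffered layout takes 6 KiB, which matches the fractions quoted in the post.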

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sat Apr 11, 2015 2:06 pm
by 93143
Espozo wrote:even though the SNES is clocked about 1/2 as fast as the Genesis, it takes about twice (or three times in the second example) the amount of time to do the same thing.
Cycles, not time. In FastROM, the procedure would take roughly 6/7 as long on the SNES as on the Genesis, assuming neither one can be further optimized (they look pretty simple, but they're out of context, and I don't know 68K).
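
To make the cycles-versus-time distinction concrete, here is a rough Python sketch that converts the posted cycle counts to microseconds. The clock figures are the usual NTSC values and are not taken from this thread: a 21.477272 MHz SNES master clock with 6 master clocks per fast cycle and 8 per slow cycle, and a 53.693175 MHz Genesis master clock divided by 7 for the 68000. It also assumes one pass through each loop, and (for the second figure) that the table read is the only slow access. DRAM refresh and other stalls are ignored.

```python
SNES_MASTER_HZ = 21_477_272       # NTSC SNES master clock (assumption, not from the thread)
GENESIS_MASTER_HZ = 53_693_175    # NTSC Genesis master clock (assumption, not from the thread)
M68K_HZ = GENESIS_MASTER_HZ / 7   # about 7.67 MHz


def snes_us(fast_cycles, slow_cycles=0):
    """Microseconds for a mix of fast (6 master clocks) and slow (8) cycles."""
    master_clocks = 6 * fast_cycles + 8 * slow_cycles
    return master_clocks / SNES_MASTER_HZ * 1e6


def m68k_us(cycles):
    """Microseconds for a number of 68000 cycles."""
    return cycles / M68K_HZ * 1e6


m68k_time = m68k_us(22 + 32)                    # both loops, 68000 counts
ratio_all_fast = snes_us(11 + 10) / m68k_time   # every 65816 cycle fast
ratio_one_slow = snes_us(20, 1) / m68k_time     # table read hits slow WRAM

print(round(ratio_all_fast, 3), round(ratio_one_slow, 3))
```

Both ratios land a little below 1, in the same neighbourhood as the "roughly 6/7" estimate: fewer cycles on the 65816, but each cycle takes about twice as long.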

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sat Apr 11, 2015 2:11 pm
by Drew Sebastino
Yes, that's what I meant. (The SNES's CPU obviously isn't twice as fast. :roll: )

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sat Apr 11, 2015 4:59 pm
by psycopathicteen
I felt like posting a YouTube video of me ranting about how it is bullshit to design game logic around perceived CPU speed. I'll still do it if I have enough time today.

I did notice that the second loop could be optimized a little by rearranging the loop.

Code:

65816 code:
lsr          //2
bcc +        //3 5
-;
iny          //2
lsr          //2 4
bcs -        //3 7
+;

68000 code:
lsr.w #1,d0     //8
bcc +           //10 18
-;
addq.b #1,d2    //4
lsr.w #1,d0     //8  12
bcs -           //10 22
+;

The 68000 still takes 3 times the cycles.

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 12:53 am
by Sik
Decided to redo the 68000 version to see how fast I could get (going by the code originally posted):

Code:

    moveq   #0, d0                  ; 4
    lea     @Table(pc), a1          ; 8
    
@Loop:
    move.b  (a0)+, d0               ; 8
    move.b  (a1,d0.w), d1           ; 14
    bmi.s   @Loop                   ; 10 usually

; ...

@Table:
    dc.b    $00, $01, $00, $02      ; %0000, %0001, %0010, %0011
    dc.b    $00, $01, $00, $03      ; %0100, %0101, %0110, %0111
    dc.b    $00, $01, $00, $02      ; %1000, %1001, %1010, %1011
    dc.b    $00, $01, $00, $FF      ; %1100, %1101, %1110, %1111
That's 12 cycles for init and 32 cycles per iteration. For the sake of comparison, at 65816's usual speeds that'd be about 6 and 16, respectively.

Um, ouch, although now I want to see psycopathicteen go ahead and try the same using look-up tables. Pretty sure that if there wasn't a dare (or I was heavily starved for cycles) I'd have tried an approach similar to his.

PS: if anybody wonders, bmi.s would take 6 cycles when not branching. That'd only happen in the last iteration though, so I've decided to not count that possibility for the purpose of profiling.
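
One reading of the table (Sik doesn't state it explicitly) is that it maps each 4-bit occupancy mask to the index of its lowest clear bit, with $FF marking a full slot so that bmi.s keeps looping. The Python sketch below rebuilds the table under that assumption; the function name is made up for the example.

```python
def lowest_clear_bit(mask, width=4):
    """Index of the lowest 0 bit in a width-bit mask, or 0xFF if there is none."""
    for bit in range(width):
        if not mask & (1 << bit):
            return bit
    return 0xFF  # slot completely full; negative when tested as a signed byte


table = [lowest_clear_bit(mask) for mask in range(16)]
print(" ".join(f"${value:02X}" for value in table))
```

The generated values agree with the sixteen dc.b entries above, so the lookup replaces the whole shift-and-count loop with a single indexed read.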

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 6:46 am
by TOUKO
psycopathicteen wrote:Stef said I only got my Gunstar Heroes demo running on the SNES because I "simplified" everything to run good on the system, when my demo was actually running a much more complicated dynamic animation engine than the original game, and still had plenty of CPU time left.
Your Bad Apple demo for the SNES is proof that the SNES CPU and the 65xx architecture are actually not crappy :wink:
The SNES CPU is "slow" because it doesn't run at its full speed (3.58 MHz), mainly because of the crappy slow WRAM :? ..
Add to this the inexperienced 65xx coders.
I saw the source code of Art of Fighting for the PCE: the game is full of 68k-style macros and, the icing on the cake, the code is plain 6502, not even HuC6280, yet the game runs very fast with a faked zoom ..
Stef said I only got my Gunstar Heroes demo running on the SNES because I "simplified" everything to run good on the system, when my demo was actually running a much more complicated dynamic animation engine than the original game, and still had plenty of CPU time left.
I have had some discussions with him about GH, and I told him that it is simpler than it seems.
I think you can't do an exact port, not because of the SNES CPU, but only because you cannot put the same number of sprites on screen in H32; readability would become terrible and there would be heavy sprite flicker.

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 6:53 am
by Drew Sebastino
TOUKO wrote:
psycopathicteen wrote:Stef said I only got my Gunstar Heroes demo running on the SNES because I "simplified" everything to run good on the system, when my demo was actually running a much more complicated dynamic animation engine than the original game, and still had plenty of CPU time left.
Your Bad Apple demo for the SNES is proof that the SNES CPU and the 65xx architecture are actually not crappy :wink:
The SNES CPU is "slow" because it doesn't run at its full speed (3.58 MHz), mainly because of the crappy slow WRAM :? ..
Pretty much. People just look at 3.58 and 7.6 and say "Duh, 7.6 is biger dan 3.58." They don't know anything deeper than that. It's funny when people try to compare processor speeds by seeing how fast the screen scrolls. :roll: Even James Rolfe does in his second SNES vs Genesis video. (You wouldn't believe how stupid the comments are there, and I'm including some of the people who are SNES side.)

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 7:00 am
by TOUKO
Pretty much. People just look at 3.58 and 7.6 and say "Duh, 7.6 is biger dan 3.58."
True, how many times have I heard that :? , mainly on the best 16-bit forum, the well-known Sega-16 :P
It's funny when people try to compare processor speeds by seeing how fast the screen scrolls.
And parallax? You know the SNES can't do the millions of tons of parallax that we have in practically all MD games .. :D
However, I like this kind of parallax. On the PCE you can do the same with only one background layer (you have no choice), but these are called "screen ruptures" and not parallax, because the strips are not overlapping.

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 7:08 am
by tepples
3.580 on the Master System or 4.194 on the monochrome Game Boy is also bigger than 1.790 on the NES, yet the 6502 gets more done in each cycle than Z80 or similar chips, so it's mostly a wash. I'm under the impression that the 68000 and 65816 share a similar relationship: the latter can do twice as much Work Per Clock.

So to be able to compare work per clock on an even playing field, some time ago I invented a unit of time called gencycles. One gencycle is a fraction of a 65816 cycle intended to roughly approximate the period of one 68000 cycle. Each slow access (WRAM, slow ROM, and each byte of DMA) takes 3 gencycles, and each fast access (most I/O ports, fast ROM, and "internal operation") takes 2 gencycles. So you can cycle count a 68000 subroutine, see how many cycles it used, cycle count a 65816 subroutine that does the same thing, count gencycles, and you should get something fairly close to the machines' relative speeds.
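
A minimal Python sketch of the gencycle count described above. The 3 and 2 weights come from the post; the sanity check on where they come from uses clock figures that are not in the thread (8 and 6 SNES master clocks per slow and fast access, and about 2.8 SNES master clocks per 68000 cycle). The example counts assume FastROM code with the table read as the only slow access.

```python
def gencycles(slow_accesses, fast_accesses):
    """Approximate 68000-cycle equivalents for a 65816 routine."""
    return 3 * slow_accesses + 2 * fast_accesses


# Sanity check on the weights (assumed clock figures, not from the post).
MASTER_CLOCKS_PER_M68K_CYCLE = 2.8   # 21.477272 MHz / (53.693175 MHz / 7)
slow_weight = 8 / MASTER_CLOCKS_PER_M68K_CYCLE   # about 2.86, rounded to 3
fast_weight = 6 / MASTER_CLOCKS_PER_M68K_CYCLE   # about 2.14, rounded to 2

# First posted loop: 11 cycles, one of them assumed to be a slow WRAM read.
first_loop = gencycles(slow_accesses=1, fast_accesses=10)
# Second posted loop: 10 cycles, all assumed fast.
second_loop = gencycles(slow_accesses=0, fast_accesses=10)
print(first_loop, second_loop)
```

Counted this way, the first loop comes out about even with the 22 cycles of the 68000 version, and the second loop stays well under its 32.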

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 7:20 am
by TOUKO
A Genesis emulator author (I don't remember his name) profiled the 68K in some games, and it manages 0.7 to 0.8 MIPS at best ..
This is not abnormal: the 68k was designed for workstations and servers, not for a gaming system.
Its strengths are that it is easy to code for, even in a high-level language (like C), and that it has a 24-bit address space, so you don't have to care about annoying memory bank management and can access a large memory space. Its frequency/cycles ratio is not one of them, which is why in 68K arcade systems the CPU manages only the game logic and lets the custom chips do the dirty work.

The 68k profiling files are here:
http://exophase.devzero.co.uk/profiles.zip

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 9:53 am
by psycopathicteen
http://forums.sonicretro.org/index.php?showtopic=33501

This thread has a lot of really stupid posts like this:
I'd rather see it attempted with a plain vanilla SNES with nothing added. I would think these approaches are a good idea:

1. Optimize the hell out of the code and make sacrifices where it may not be noticed.
2. Maybe slow Sonic's max speed to 5 pixels per frame so that it doesn't look excessively fast.
3. Have an SCD-style pan forward when running.
4. Scale what art you can down to 80% original scale horizontally so that at least those don't look distorted.
News flash! Did it ever occur to you that deliberately causing the game to run slow at full speed will cause **gasp** the game to run slow at full speed? Maybe that's the reason why you're having a hard time optimizing your game? The SNES isn't actually lagging, you just programmed it that way?

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 10:09 am
by TOUKO
Oh yes, I have had the same kind of discussion with Stef on a French forum about the SNES's 65816 and the HuC6280. He equates both with a vanilla 6502, and I think all the guys on the Sonic forum do the same.

And you know, the 68k is better because of its 32-bit instructions :P , but in 16-bit 2D games, who cares about that?
On the PCE I mainly use 8-bit operations and some 16-bit ones (for pointers), and the HuC6280 is clocked at 7.16 MHz, not 1.79.

At the same clock speed, the 65816 is close to twice as fast as a 6502.
News flash! Did it ever occur to you that deliberately causing the game to run slow at full speed will cause **gasp** the game to run slow at full speed? Maybe that's the reason why you're having a hard time optimizing your game? The SNES isn't actually lagging, you just programmed it that way?
Yes, if you code badly you get bad results; the SNES CPU doesn't have many cycles to waste on bad programming.
The big advantage of the 65xx architecture is how heavily its code can be optimised.

For GH, if the CPU is the limiting factor, then why doesn't a similar game exist with an SFX or SA-1?
Same for BTUs: if it is the CPU that limits them to 3 or 4 sprites on screen, why didn't Capcom put in a faster processor for FF 2/3 like it did with the MMX series?
Simply because the sprite pixel limit per scanline is the real limiting factor, and a faster CPU cannot change that.

Re: Dynamic Sprite Vram Routine Ideas

Posted: Sun Apr 12, 2015 11:10 am
by psycopathicteen
In the post above, I was referring to the sonicretro guy's suggestion to "maybe slow Sonic's max speed to 5 pixels per frame so that it doesn't look excessively fast". It makes no sense, because lowering Sonic's pixels-per-frame movement speed wouldn't speed up the CPU at all; it would just make the game look like it is lagging, even when it is running at 60fps.

It's like recording a slow-motion voice into a tape recorder. If you play it at normal speed, it will sound like slow-motion, because it was recorded that way.