Thunder Force 4 = overhyped

tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

psycopathicteen wrote:I know a little speed trick that can be done on the 65816 but not on a 6502 or 68000. I like to move the direct page to the object slot of the object to be processed. This way I can easily reach everything in the object slot, and when I have two identical objects, it tricks the CPU into thinking it's writing into the same registers when it isn't.
Isn't that the same as base+constant (4,A4) addressing on a 68000? Or is that slow due to an extra memory access? In any case, direct page and absolute,X addressing are the same speed on a 65816 if the direct page isn't 256-byte aligned and the absolute address is page-aligned.
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Post by psycopathicteen »

It's at least 12 cycles for the 68000: 4 for the opcode, 4 for the address, and 4 for the memory fetch.

On the 65816, it takes 3-5 cycles depending on word size and where the DP is located.
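
As a rough sketch of where the 3-5 figure comes from: a direct page read takes 3 CPU cycles, plus one if the accumulator is 16-bit, plus one if the low byte of D is nonzero. The function name and parameters below are made up for illustration.

```python
def direct_page_read_cycles(sixteen_bit: bool, d_low_byte: int) -> int:
    """CPU cycles for a plain direct page read such as LDA dp (hypothetical helper)."""
    cycles = 3                      # opcode fetch, offset fetch, data low byte
    if sixteen_bit:
        cycles += 1                 # extra cycle for the data high byte
    if d_low_byte & 0xFF:
        cycles += 1                 # extra internal cycle when D is not page-aligned
    return cycles


for wide in (False, True):
    for d_low in (0x00, 0x40):
        print(wide, hex(d_low), direct_page_read_cycles(wide, d_low))
```

These are CPU cycles, not master clocks, so they are not directly comparable to the 12 clocks quoted for the 68000.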
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Today's secret word please, Conky:

  gencycle
The Super NES dot rate is the same as that of the NES: 3/2 Fsc, or 945/176 = 5.369 MHz. The master clock rate is four times that, and cycles take six or eight master clocks depending on whether they access slow memory (RAM or slow ROM); fast ROM and internal operation cycles always take six clocks. This gives an effective CPU rate somewhere between 2.7 and 3.6 MHz.
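
The arithmetic in that paragraph can be checked with exact fractions. This is only a sketch; the one value it brings in that the paragraph doesn't state is the NTSC color subcarrier, Fsc = 315/88 MHz, and the variable names are arbitrary.

```python
from fractions import Fraction

# NTSC color subcarrier, about 3.579545 MHz (assumed value, standard for NTSC)
FSC_MHZ = Fraction(315, 88)

dot_rate = FSC_MHZ * Fraction(3, 2)   # 945/176 MHz
master_clock = dot_rate * 4           # four master clocks per dot
fast_cycle_rate = master_clock / 6    # fast ROM and internal cycles
slow_cycle_rate = master_clock / 8    # RAM and slow ROM cycles

for name, rate in [("dot rate", dot_rate),
                   ("master clock", master_clock),
                   ("all-fast CPU rate", fast_cycle_rate),
                   ("all-slow CPU rate", slow_cycle_rate)]:
    print(f"{name:18s} {float(rate):7.3f} MHz")
```

Real code mixes fast and slow cycles, so the effective rate lands somewhere between the last two figures.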

Because of the 68000's state machine, we can consider the Genesis to have a master clock of 7.67 MHz and actually run at 1.92 MHz, where each cycle of the internal state machine takes 4 master clocks. But how exactly is the 7.67 MHz rate related to the pixel clock? It's slightly faster than the Amiga and TG16 clock speed of 2*Fsc = 7.159 MHz. The dot rate in 256px mode is the same as the NES and Super NES, and the dot rate in 320px mode is 5/4 that: 6.712 MHz. Is the 68000 clock supposed to equal 8/7 of this 320px dot rate, which would equal 10/7 times the 256px dot rate or 15/7*Fsc?
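
For what it's worth, the three ratios in that question are at least consistent with each other and with the quoted 7.67 MHz. A quick check, again assuming Fsc = 315/88 MHz (names are arbitrary, and this says nothing about how the hardware actually derives the clock):

```python
from fractions import Fraction

FSC_MHZ = Fraction(315, 88)               # NTSC color subcarrier (assumed value)

dot_256 = FSC_MHZ * Fraction(3, 2)        # 256px dot rate, same as NES / Super NES
dot_320 = dot_256 * Fraction(5, 4)        # 320px dot rate
m68k_clock = dot_320 * Fraction(8, 7)     # the guess from the question
state_machine_rate = m68k_clock / 4       # 4 clocks per internal state-machine cycle

print(f"320px dot rate      {float(dot_320):.3f} MHz")
print(f"68000 clock (guess) {float(m68k_clock):.3f} MHz")
print(f"state machine rate  {float(state_machine_rate):.3f} MHz")
```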

To derive a clock speed useful for comparison between the Super NES and Genesis, I derive an abstract unit called the gencycle (AAAAAAAAA!), which is 1/2 of a fast (6 clock) cycle (that is, 7.159 MHz) or 1/3 of a slow (8 clock) cycle (that is, 8.054 MHz). Over the long run, the average period of a gencycle should be close to that of a Genesis master clock, allowing 65816 instruction timings to be quoted in units nearly commensurate with 68000 timings.
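
A small sketch of the two gencycle rates, assuming a Super NES master clock of 945/44 MHz (6 * Fsc); the variable names are only for illustration:

```python
from fractions import Fraction

MASTER_MHZ = Fraction(945, 44)            # Super NES master clock, about 21.477 MHz

# A fast cycle is 6 master clocks and counts as 2 gencycles,
# so one gencycle lasts 3 master clocks.
gc_rate_fast = MASTER_MHZ / Fraction(6, 2)

# A slow cycle is 8 master clocks and counts as 3 gencycles,
# so one gencycle lasts 8/3 master clocks.
gc_rate_slow = MASTER_MHZ / Fraction(8, 3)

print(f"gencycle rate in fast cycles: {float(gc_rate_fast):.3f} MHz")
print(f"gencycle rate in slow cycles: {float(gc_rate_slow):.3f} MHz")
```

The two rates bracket the 7.67 MHz Genesis clock, which is the point of the unit.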

This 12-clock 68000 instruction corresponds to 10 or 12 65816 gencycles, as seen here:

Ordinary direct page instruction: 10 gc
  1. opcode fetch: 2 gc (fast ROM)
  2. offset fetch: 2 gc (fast ROM)
  3. data low: 3 gc (slow RAM)
  4. data high: 3 gc (slow RAM) (if 16-bit M/X)
Direct page instruction, D not 256-byte aligned: 12 gc
  1. opcode fetch: 2 gc (fast ROM)
  2. offset fetch: 2 gc (fast ROM)
  3. address generation: 2 gc (internal)
  4. data low: 3 gc (slow RAM)
  5. data high: 3 gc (slow RAM) (if 16-bit M/X)
Absolute indexed instruction: 12 gc
  1. opcode fetch: 2 gc (fast ROM)
  2. address fetch low: 2 gc (fast ROM)
  3. address fetch high: 2 gc (fast ROM)
  4. data low byte: 3 gc (slow RAM)
  5. data high byte: 3 gc (slow RAM) (if 16-bit M/X)
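
The three totals above can be reproduced by adding up the per-step costs. This is just a tally of the list, with made-up names; it assumes 16-bit data, code in fast ROM, and data in slow RAM, as the list does.

```python
# Gencycle cost per kind of CPU cycle, as defined above
GENCYCLES = {"fast": 2, "slow": 3, "internal": 2}

INSTRUCTIONS = {
    "direct page":               ["fast", "fast", "slow", "slow"],
    "direct page, D unaligned":  ["fast", "fast", "internal", "slow", "slow"],
    "absolute indexed":          ["fast", "fast", "fast", "slow", "slow"],
}


def total_gencycles(steps):
    """Sum the gencycle cost of each CPU cycle in an instruction."""
    return sum(GENCYCLES[kind] for kind in steps)


for name, steps in INSTRUCTIONS.items():
    print(f"{name:26s} {total_gencycles(steps):2d} gc")
```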
But to what extent do direct MOVEs, adds and subtracts with a memory destination, hardware MULs and DIVs, and 32-bit addition and subtraction give 68000 the edge?
Last edited by tepples on Fri Apr 06, 2012 6:49 pm, edited 1 time in total.
3gengames
Formerly 65024U
Posts: 2281
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames »

Don't forget [pre](de)incrementation of index registers within instructions.
MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Post by MottZilla »

psycopathicteen wrote:
and in the case of the SNES it is arranged in a not very convenient way (one bit of the X coordinate and the size bit are packed together with those of three other sprites).
A little while ago I attempted to find what was causing slowdown in Gradius 3, and I found the oam clearing routine, and the way they dealt with the hi-oam was inefficient. Gradius 3 shuffles back and forth between the low-oam and high-oam, one sprite at a time.

The way I manage the high-oam is when I am writing a sprite to the oam, I "AND #$01ff" the x-coordinate, and "ORA #$0200" for big sprites. Then I store the result in both the oam buffer, and a second list that is 544 bytes after the oam buffer. Then I overwrite the high-byte of the x-coordinate with the y-coordinate, and overwrite the high-byte of the y-coordinate with the attribute word. After all the game logic has been calculated, I use the table of 16-bit x-coordinates, and build the high-oam with it.
See what I mean? It sounds like they wasted CPU cycles, causing more slowdown than was needed to achieve what had to be done. Here is a fun challenge, since you might still have notes: did you try rewriting the routine in optimized form and seeing what sort of performance boost you could get? That would be an impressive patch if you could significantly reduce slowdown in Gradius 3.

Keep in mind though I think Gradius III was Konami's first Super Famicom game. It is not that strange that it would not be coded very efficiently.
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Post by psycopathicteen »

http://www.emulationzone.org/consoles/snes/tech.htm


More idiots passing off incorrect technical information. They mentioned the cpu being "slow" 5 times, and wrote "slow" in all caps 4 of the 5 times.
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Post by Bregalad »

My god, this page is just so, so inaccurate and stupid; obviously written by biased Sonic fanboys.

It was last updated in 1999, though, so it's 13 years old.
Useless, lumbering half-wits don't scare us.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Yeah, that page is horrid. Have you tried e-mailing corrections to the maintainer? Say the presence of DMA to VRAM is equivalent to Blast Processing, and my gencycle theory might help alleviate the capitalized SLOW-ness. Apparently the most current e-mail address is wacko413 at Hotmail.

The "64K at a time" part appears to have something to do with the data segment register set with the PLB instruction. On the 68000, on the other hand, a pointer fits in a single register.
Last edited by tepples on Mon May 21, 2012 11:57 am, edited 1 time in total.
MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Post by MottZilla »

That was a very funny read. It reminds you how bad the internet was for information back then.
psycopathicteen
Posts: 3001
Joined: Wed May 19, 2010 6:12 pm

Post by psycopathicteen »

MottZilla wrote:That was a very funny read. It reminds you how bad the internet was for information back then.
You still see this kind of incorrect information on Sega-16.com. Those people just can't get over the fact that they were wrong and we were right.
tepples
Posts: 22345
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

psycopathicteen wrote:The way I manage the high-oam is when I am writing a sprite to the oam, I "AND #$01ff" the x-coordinate, and "ORA #$0200" for big sprites. Then I store the result in both the oam buffer, and a second list that is 544 bytes after the oam buffer. Then I overwrite the high-byte of the x-coordinate with the y-coordinate, and overwrite the high-byte of the y-coordinate with the attribute word. After all the game logic has been calculated, I use the table of 16-bit x-coordinates, and build the high-oam with it.
In other words, you emulate Metal Combat's OBC1 in software. I'm using this too, but I've discovered that the 32-byte buffer can be generated overlapping the 512-byte buffer.
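
A minimal sketch of the final "build the high-oam" pass, written in Python rather than 65816 assembly so the bit layout is easy to see. It assumes the layout described in the quote (bit 8 of each word is the ninth X bit, bit 9 is the size flag) and the standard packing of four sprites per high-OAM byte, two bits each. The function names are hypothetical, and this is not anyone's actual routine.

```python
def sprite_x_word(x: int, big: bool) -> int:
    """Mimic AND #$01FF / ORA #$0200 on a sprite's x-coordinate."""
    word = x & 0x01FF
    if big:
        word |= 0x0200
    return word


def build_high_oam(x_words) -> bytes:
    """Pack bits 8-9 of 128 x-coordinate words into the 32-byte high OAM table."""
    high = bytearray(32)
    for n, word in enumerate(x_words):
        two_bits = (word >> 8) & 0x03                  # X bit 8 and size flag
        high[n >> 2] |= two_bits << ((n & 3) * 2)      # 4 sprites per byte
    return bytes(high)


words = [sprite_x_word(0, False)] * 128
words[0] = sprite_x_word(-8, False)    # partly off the left edge: X bit 8 set
words[1] = sprite_x_word(100, True)    # big sprite
table = build_high_oam(words)
print(table[:2].hex())
```

Doing the packing once per frame, after all sprites are known, avoids the read-modify-write on a shared byte for every sprite.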
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Post by Sik »

Meh, since this thread was bumped I may as well read it... and wow, there are so many problems here.

Also: memory accesses on the 68000 are probably the worst part of it. They're so pathetically slow I'd compare them to cache misses on modern CPUs, except you can't work around it. Avoid memory accesses at all costs, no matter what. Luckily there are enough registers that in practice RAM will pretty much get used only to store permanent state rather than variables in a subroutine, but yeah...

The biggest advantages of the 68000 are 1) having a much easier time manipulating large numbers (here is where most of the important optimizations can come in) and 2) being a tad easier to program for. Although I think people are underestimating the amount of time available in a frame, unless you're being wasteful a frame is usually plenty of time for a game (my code tends to stay around vblank duration even when I'm sloppy...). This goes for both the 68000 and the 65816.
tepples wrote:But to what extent do direct MOVEs, adds and subtracts with a memory destination, hardware MULs and DIVs, and 32-bit addition and subtraction give 68000 the edge?
Hardware MUL and DIV don't give any edge at all since they're so pathetically slow nobody in their right mind would use them (and if you have raster effects, you outright can't use them because they'll delay the interrupt and you risk missing the timing window - DIVU/DIVS outright can delay as much as almost a third of a scanline).
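
The "almost a third of a scanline" figure can be sanity-checked with a bit of arithmetic. None of the numbers below come from this thread; they are assumptions taken from commonly cited documentation: worst-case execution times of 140 cycles for DIVU and 158 for DIVS, 3420 master clocks per scanline, and a 68000 clock of master/7.

```python
# Assumed figures (not from the thread), see the note above
DIVU_WORST_CYCLES = 140
DIVS_WORST_CYCLES = 158
MASTER_CLOCKS_PER_LINE = 3420
MASTER_CLOCKS_PER_CPU_CYCLE = 7

cpu_cycles_per_line = MASTER_CLOCKS_PER_LINE / MASTER_CLOCKS_PER_CPU_CYCLE

# An interrupt is only taken between instructions, so a divide that has
# just started can hold it off for up to its full execution time.
divu_fraction = DIVU_WORST_CYCLES / cpu_cycles_per_line
divs_fraction = DIVS_WORST_CYCLES / cpu_cycles_per_line

print(f"68000 cycles per scanline: {cpu_cycles_per_line:.1f}")
print(f"DIVU worst case: {divu_fraction:.1%} of a scanline")
print(f"DIVS worst case: {divs_fraction:.1%} of a scanline")
```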

32-bit operations help a lot from experience. Yeah, most of the time you'll use 16-bit (in fact, maybe even 8-bit moreso than 16-bit), but there are a lot of places where having 32-bit operations helps, and they're a lot faster than doing the equivalent with 16-bit operations. They can also be useful if you're dealing with a lot of data in a go.
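
To illustrate what "the equivalent with 16-bit operations" involves, here is a rough Python model of a 32-bit add done in two 16-bit halves with a carry in between, the way a CPU limited to 16-bit arithmetic has to do it. The function name is made up, and the model only shows the extra steps, not real cycle counts.

```python
def add32_via_16bit_halves(a: int, b: int) -> int:
    """Add two 32-bit values using only 16-bit additions plus a carry."""
    low = (a & 0xFFFF) + (b & 0xFFFF)            # first 16-bit add
    carry = low >> 16                            # carry out of the low half
    high = (a >> 16) + (b >> 16) + carry         # second 16-bit add, with carry
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


print(hex(add32_via_16bit_halves(0x0001FFFF, 0x00000001)))
```

With native 32-bit operations, all of this collapses into a single add instruction on a 32-bit register.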

Adds and subtracts to memory tend to be useful with counters and such. No need to waste time loading the value into a register when all you want is just to increment it. Direct moves fall under a similar usage, and also make things easier when you want to store constants.

PS: Thunder Force IV sometimes slows down with a single enemy on screen, so huh... =P (although the fact that PCM playback cuts the music is a much bigger offense to me, honestly)
rainwarrior
Posts: 8062
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Thunder Force 4 = overhyped

Post by rainwarrior »

Hardware multiply is a huge boon if you want to do 3D transforms.
Bregalad
Posts: 8036
Joined: Fri Nov 12, 2004 2:49 pm
Location: Caen, France

Re: Thunder Force 4 = overhyped

Post by Bregalad »

Ah, all those SEGA fanboys spreading bullshit lies about the SNES. That's nothing new. The fact is, the SNES has about the same processing power as the MD, despite a much lower clock rate, but better graphics and sound. I understand it's annoying; however, there's no hope of ever getting them to shut up about the so-called superiority of the MD.

In any case, who cares about the superiority or inferiority of a system? The NES is inferior to so many gaming systems and I still love it.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: Thunder Force 4 = overhyped

Post by Sik »

Actually, the format of the data used by the SNES hardware tends to get in the way too and makes code slower than it should be. The sprite table is probably the prime example of this: trying to add a sprite gets cumbersome when two of the bits are in a different place in the table and their byte is shared with other sprites. There's also the fact that the only way the SPC can get any new data at all is through a busy loop with the 65816 (which is why so many games pause for a couple of seconds when switching the background music). Considering it's a sample-based chip, that can get really bad, since samples need a lot more memory than synthesized sounds.

Using a planar format can also hurt performance: modifying a single pixel requires touching four bytes in RAM (assuming the usual 4bpp formats). If it were packed (like the Mega Drive's), it'd require touching only one byte, as well as potentially simplifying code logic (mode 7 makes this even easier since 1 byte = 1 pixel, but it has a very limited number of tiles, which reduces its usefulness). Granted, in practice probably only about 1% of games are affected by this, but I assume this is the main reason why the SuperFX is so slow at rendering (for comparison, the 68000 alone on the Mega Drive can do something comparable to Star Fox).
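
A sketch of the difference, in Python for readability. It assumes the usual 32-byte 8x8 tile layouts: on the SNES, planes 0 and 1 interleaved row by row in the first 16 bytes and planes 2 and 3 in the last 16; on the Mega Drive, 4 bytes per row with the left pixel in the high nibble. The function names are made up.

```python
def set_pixel_planar(tile: bytearray, x: int, y: int, color: int) -> None:
    """Write one pixel into a SNES-style 4bpp planar tile (touches 4 bytes)."""
    mask = 0x80 >> x                                   # leftmost pixel is bit 7
    for plane in range(4):
        offset = (plane >> 1) * 16 + y * 2 + (plane & 1)
        if color & (1 << plane):
            tile[offset] |= mask
        else:
            tile[offset] &= ~mask & 0xFF


def set_pixel_packed(tile: bytearray, x: int, y: int, color: int) -> None:
    """Write one pixel into a Mega Drive-style 4bpp packed tile (touches 1 byte)."""
    offset = y * 4 + (x >> 1)
    if x & 1:
        tile[offset] = (tile[offset] & 0xF0) | (color & 0x0F)         # right pixel
    else:
        tile[offset] = (tile[offset] & 0x0F) | ((color & 0x0F) << 4)  # left pixel


planar, packed = bytearray(32), bytearray(32)
set_pixel_planar(planar, 0, 0, 0xF)
set_pixel_packed(packed, 0, 0, 0xF)
print(sum(b != 0 for b in planar), "bytes changed in the planar tile")
print(sum(b != 0 for b in packed), "byte changed in the packed tile")
```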

Of course, in practice it's generally the sloppiness of the code that actually tends to affect performance the most (both systems are full of games that are slow as hell when there isn't any justification for it). Prime examples would be Sonic 2 in the case of the Mega Drive, and Super Mario World on the SNES seems to be prone to slowing down easily (though I don't recall the game itself slowing down; it could be mostly hacks that are affected, since that game does have a reputation for slowing down easily for some reason).