How many sprites can the Neo Geo update per frame?

93143
Posts: 1914
Joined: Fri Jul 04, 2014 9:31 pm

Re: How many sprites can the Neo Geo update per frame?

Post by 93143 »

creaothceann wrote: Tue Sep 16, 2025 12:38 am
If they had used 80 sprites like the Genesis
No no. I don't envy them that. I think I'd have real trouble with my shmup port if I only had 80 sprites to work with, even if they all had unique X and Y sizes. It'd be nice to have 8x16s (in fact some of the player bullet graphics would be more accurate), but I wouldn't trade away 48 sprites for it.

But yeah, that sort of layout would be much less restrictive, and 768 bytes isn't that much extra DMA (I assume 7 of those X bits don't actually exist on-die, so the whole thing only takes 112 additional real bytes). An extra bit for Y would be nice, but it would throw off the alignment, and with this level of size control it isn't strictly necessary...
NeoOne
Posts: 19
Joined: Sat Jul 22, 2023 8:52 am

Re: How many sprites can the Neo Geo update per frame?

Post by NeoOne »

93143 wrote: Mon Sep 15, 2025 11:52 am
NeoOne wrote: Mon Sep 15, 2025 10:16 am
I can't tell from the gif you showed but it def looks like it is removing edges. I guess that is relatively fast - just writing 0's to VRAM locations?
Look at the thick guy's boots.

Somebody in the thread had previously proposed simply trimming edges (and not uniformly across all sprites in the metasprite either) as a homebrew technique, which may be what confused me.

Also, no, the sprites on SNES use the interleaved bitplane format, so the fastest method of (vertical) edge trimming would probably involve a lot of 16-bit immediate AND, with the results written to WRAM. You don't want to be trying to write directly to VRAM in a software rendering loop on SNES, because during active display and HBlank it's closed for exclusive use of the video chips, and your writes will fail.
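As a rough illustration of the AND-mask idea (a sketch, not code from this thread), assume one 8x8 4bpp tile sits in a WRAM buffer as 16 16-bit words, each word holding one pixel row of two bitplanes. The function name `trim_tile_columns` and the `keep` mask are made up for the example. On the real machine the loop would be a run of 16-bit immediate ANDs, and the buffer would go to VRAM by DMA during VBlank.

```c
#include <stdint.h>

/* One 8x8 4bpp SNES tile: 16 words. Words 0-7 hold bitplanes 0/1 for
   rows 0-7, words 8-15 hold bitplanes 2/3. Each byte is one row of one
   bitplane, with bit 7 as the leftmost pixel. */
enum { TILE_WORDS = 16 };

/* Clear every pixel column whose bit is 0 in `keep` (bit 7 = leftmost).
   Cleared pixels become colour 0, which is transparent for sprites. */
void trim_tile_columns(uint16_t tile[TILE_WORDS], uint8_t keep)
{
    /* The same 8-bit column mask applies to both bitplanes in a word. */
    uint16_t mask = (uint16_t)((keep << 8) | keep);

    for (int i = 0; i < TILE_WORDS; i++)
        tile[i] &= mask;
}
```

For example, `trim_tile_columns(tile, 0x3F)` would blank the two leftmost pixel columns of the tile.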

This is why VRAM updates on SNES are generally limited to what you can push through with DMA during VBlank, and why games like Final Fight and Star Fox often letterbox with forced blank to give the DMA unit more time. Quite foreign to how the Neo Geo (and NES/Famicom) operate.
I'm not sure if you can join sprites together on the SNES so they all move as one?
No such luck. Sprites on SNES don't know about each other. Metasprites on SNES are a software concept. And the OAM format, while it is nice and compact (requiring only 544 bytes of DMA for 128 sprites), is such a pain to use that there's actually a special chip (the OBC-1, used in Metal Combat: Falcon's Revenge) solely to make it easier to compile an OAM image.
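To show why the 544-byte layout is awkward to fill in, here is a minimal C sketch of writing one sprite into a WRAM shadow copy of OAM. The `OamShadow` type and `oam_set` function are hypothetical names for the example; the attribute byte is passed through unchanged. The point is the second table: the ninth X bit and the size bit for four sprites share one byte, so each update is a read-modify-write.

```c
#include <assert.h>
#include <stdint.h>

enum { OAM_SPRITES = 128, OAM_LOW_BYTES = 512, OAM_HIGH_BYTES = 32 };

/* Shadow copy of OAM in work RAM: 512 + 32 = 544 bytes. */
typedef struct {
    uint8_t low[OAM_LOW_BYTES];   /* 4 bytes per sprite */
    uint8_t high[OAM_HIGH_BYTES]; /* 2 bits per sprite, 4 sprites per byte */
} OamShadow;

/* x is a 9-bit screen position; `large` selects the second global size. */
void oam_set(OamShadow *oam, unsigned n, uint16_t x, uint8_t y,
             uint8_t tile, uint8_t attr, int large)
{
    assert(n < OAM_SPRITES);

    oam->low[n * 4 + 0] = (uint8_t)(x & 0xFF); /* X bits 0-7 */
    oam->low[n * 4 + 1] = y;
    oam->low[n * 4 + 2] = tile;
    oam->low[n * 4 + 3] = attr;

    /* High table: bit 0 of each pair is X bit 8, bit 1 is the size bit. */
    unsigned shift = (n & 3u) * 2u;
    uint8_t bits = (uint8_t)(((x >> 8) & 1u) | (large ? 2u : 0u));
    uint8_t *h = &oam->high[n >> 2];
    *h = (uint8_t)((*h & ~(3u << shift)) | ((unsigned)bits << shift));
}
```

A metasprite routine would call something like this once per hardware sprite, then DMA the whole shadow to OAM during VBlank.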

The SNES sprite system is actually a pretty weird quasi-legacy design that could have (IMO) greatly benefited from earlier abandonment of the idea of Famicom backward compatibility. I look with envy at the Mega Drive, where any sprite can have its own unique H and V sizes rather than having to select from only two global square* sizes, and getting more than 16 KB of sprite data onscreen at the same time doesn't require potentially brittle, difficult, limiting and/or VRAM- and DMA-wasting tricks†...

† my application involves a fake BG layer made of sprites, which makes bypassing the 16 KB limit almost trivial. Something like Metal Slug would be a whole different ball game...
Yes I can see the software scaling now. This (scrunching the tiles with some software scaling) seems to be a scaling technique that is used by a few games on the SNES. I wonder if it was shared with developers somehow? (maybe even by Nintendo?)

It seems annoying to have to do more work on a slower CPU. You have all those sprites and then you have to waste some CPU time like that.

Even on the Neo Geo there are a few minor annoyances where they could have made things faster or easier to do. Maybe the SNES has more than most because, like you say, it was originally designed to be backwards compatible with the NES. And possibly to have an extra chip too? (I don't know if it's true but I read the extra chip they put in Pilotwings was going to be part of the SNES originally)

BTW your shooter - If you had less sprites available (e.g. 80), could you re-use them with interrupts?

Is there a video of your shooting game? Would be interesting to see it 👍
93143
Posts: 1914
Joined: Fri Jul 04, 2014 9:31 pm

Re: How many sprites can the Neo Geo update per frame?

Post by 93143 »

NeoOne wrote: Sat Sep 20, 2025 5:43 am
Yes I can see the software scaling now.
I can see another kind of software scaling too. It seems I neglected to put [/size] at the end of my last footnote...
I read the extra chip they put in Pilotwings was going to be part of the SNES originally)
That'd be the DSP-1, also used in Super Mario Kart and a bunch of other games (but not F-Zero). It helps with 3D transforms and such, but it's infuriating once you delve into it because the rigid Harvard architecture forces it to spend most of its time parsing commands rather than actually executing them. So much wasted potential... It's still an improvement over just the bare S-CPU, and if I'm not mistaken there was at one point a plan to include it in the console, which makes sense as it has obvious synergy with Mode 7. I have heard the rumour that Pilotwings had to be equipped with it at the last second because it was stripped out of the console for cost reasons.
BTW your shooter - If you had less sprites available (e.g. 80), could you re-use them with interrupts?
No. For two reasons.

1) I'm already using interrupts and HDMA so extensively that I have maybe a quarter of my S-CPU compute time left. Thank goodness for the Super FX...

2) Rewriting OAM outside VBlank or forced blank is insanely difficult, because (a) the S-PPU takes over the internal address, so writes will go wherever the video chip last looked instead of to the location specified in OAMADDL/H, and (b) it's likely that OAM rejects writes during S-PPU accesses, in which case most of the line would be unusable anyway (though CGRAM writes seem to go through 100% of the time, so who knows). I've been meaning to test this, but it's not high priority.

Uniracers is the only game that attempts OAM access during HBlank, and it only manages to change a byte in the high table (the interleaved X high bits and size bits, four sprites per byte. I told you the OAM format was goofy...).

Of course, if you specify that in addition to reducing the sprite count to 80, a FIFO or something was added to make it easier to update OAM live, reason (2) would be solved, but reason (1) would still be an issue. This game was not designed for the SNES, and brute-forcing the visual presentation and layout (including mid-scanline BGMODE switching) has left me without much room to maneuver.
Is there a video of your shooting game? Would be interesting to see it 👍
No, I'm trying to keep the identity of the game I'm porting secret. I also haven't made as much progress as you'd think in the last 11 years; hopefully that's going to change soon...

There is a Super FX bullet rendering demo, though: viewtopic.php?p=190917#p190917
It's janky on an FXPak Pro, but it works fine in any reasonably accurate emulator; I think the FXPak Super FX takes some shortcuts to fit on the FPGA...
NeoOne
Posts: 19
Joined: Sat Jul 22, 2023 8:52 am

Re: How many sprites can the Neo Geo update per frame?

Post by NeoOne »

Yes DSP-1 seems to be in a fair number of games. If it only does 3D stuff though, it doesn't seem that useful for most normal 2D games. Probably made sense to take it out in that case.

Thanks for the info! It does seem that Nintendo didn't plan for OAM raster updates! I guess with 128 sprites available though, it's not that much of an issue.

Nice work with the bullets. It seems like you are making some kind of bullet hell game - whichever one it is. Would be interested to know how you do collision checks quickly on the *player bullets* versus enemies. That's the proper CPU intensive one if you have a lot of enemies. I think about this a lot!
creaothceann
Posts: 862
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: How many sprites can the Neo Geo update per frame?

Post by creaothceann »

NeoOne wrote: Thu Sep 25, 2025 8:35 am It does seem that Nintendo didn't plan for OAM raster updates! I guess with 128 sprites available though, it's not that much of an issue
Nintendo didn't really have a choice - OAM is used all the time to either search for sprites that are on the next line, or for rendering them into the line buffer. The Neo Geo does have 2 sprite line buffers (which has implications regarding cost).
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
stan423321
Posts: 126
Joined: Wed Sep 09, 2020 3:08 am

Re: How many sprites can the Neo Geo update per frame?

Post by stan423321 »

From an Atari 2600 and even C64 viewpoint, the entire primary OAM on NES, SNES, MD, etc. is machinery for automatic raster updates to real (secondary) OAM, which is only scanline-limit-sized. I don't know if that analogy holds with Neo Geo or GBA.
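The analogy can be sketched in C as a very simplified model, not a description of any particular chip: before each scanline, the hardware scans the full (primary) sprite table and copies the sprites that cover that line into a small per-line list, up to the per-scanline limit. The `Sprite` type, `evaluate_line` and the `limit` parameter are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t y;       /* top line of the sprite */
    uint8_t height;  /* height in lines */
} Sprite;

/* Scan the whole primary table and collect the indices of sprites that
   cover `line`, stopping once `limit` sprites have been found.
   Returns the number of entries written to `out`. */
size_t evaluate_line(const Sprite *primary, size_t count, int line,
                     uint8_t *out, size_t limit)
{
    size_t found = 0;

    for (size_t i = 0; i < count && found < limit; i++) {
        int top = primary[i].y;
        if (line >= top && line < top + primary[i].height)
            out[found++] = (uint8_t)i;
    }
    return found;
}
```

Because this scan reads the primary table for every line, the table is busy for most of the frame, which fits the point above about OAM being "used all the time".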
93143
Posts: 1914
Joined: Fri Jul 04, 2014 9:31 pm

Re: How many sprites can the Neo Geo update per frame?

Post by 93143 »

NeoOne wrote: Thu Sep 25, 2025 8:35 am
Yes DSP-1 seems to be in a fair number of games. If it only does 3D stuff though, it doesn't seem that useful for most normal 2D games. Probably made sense to take it out in that case.
The DSP-1's functions are:

General:
- 16-bit multiplication
- floating-point inverse
- sin/cos
Vector:
- vector magnitude squared
- vector magnitude compare
- vector magnitude
Coordinate:
- 2D rotation
- 3D rotation
Projection:
- set parameters for screen projection
- calculate Mode 7 matrix elements for a scanline
- calculate object position and size onscreen
- calculate ground coordinates of screen pixel
Attitude Control:
- calculate attitude matrices
- convert from global to object coordinates
- convert from object to global coordinates
- inner product of forward attitude with a vector
New Angle:
- 3D gyration

It almost seems custom-built for Pilotwings specifically, but it would be helpful for any game using Mode 7 perspective, and probably some 2D games, not to mention something like Wolfenstein 3D. It would almost have been like a very primitive GTE or RSP, one generation early...
Thanks for the info! It does seem that Nintendo didn't plan for OAM raster updates! I guess with 128 sprites available though, it's not that much of an issue
It's not too bad, considering how much cheaper the SNES was than the Neo Geo, plus the fact that it has robust BG layer functionality.

The guy who made the recent Sonic demo on SNES suggested that the moving stage elements in Marble Zone would be hard to do on SNES due to the limits of the sprite system. Leaving aside the possibility that he was being too pessimistic about using sprites, I figured out that using Mode 2 would probably work fine because you can use HDMA to change the VRAM offset of the column scroll table...
Nice work with the bullets. It seems like you are making some kind of bullet hell game - whichever one it is. Would be interested to know how you do collision checks quickly on the *player bullets* versus enemies. That's the proper CPU intensive one if you have a lot of enemies. I think about this a lot!
Thanks. I've been thinking about that on and off, but I haven't finalized a method. Like I said, I haven't made as much progress as you'd expect, for reasons which will hopefully be out of the way in the near future. You're quite right; in rare circumstances it appears to be possible to have to deal with thousands of unique enemy/bullet combinations.

One fairly obvious approach would be to make sure the player bullets (which largely move upwards at constant speed) are strictly sorted by Y-coordinate. The simplest thing would then be to do a linear search from the near side, aborting on overshoot. A binary search would probably be a fair bit faster, although you'd have to check in both directions from the first detected hit to catch simultaneous collisions. This should already be a substantial improvement over just trying all of the possible combinations.
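A minimal C sketch of the sorted-list idea follows. It swaps in a lower-bound binary search (find the first bullet at or past the top of the enemy's box) so that a single forward scan catches every simultaneous hit, instead of searching outward in both directions from the first hit. The `Bullet` and `Box` types and the function names are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y; } Bullet;
typedef struct { int16_t left, right, top, bottom; } Box; /* inclusive */

/* Index of the first bullet with y >= top, or n if there is none.
   `b` must be sorted by ascending y. */
static size_t lower_bound_y(const Bullet *b, size_t n, int16_t top)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (b[mid].y < top)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count the bullets inside enemy box `e`. */
size_t count_hits(const Bullet *b, size_t n, const Box *e)
{
    size_t hits = 0;

    for (size_t i = lower_bound_y(b, n, e->top);
         i < n && b[i].y <= e->bottom;      /* abort on overshoot */
         i++) {
        if (b[i].x >= e->left && b[i].x <= e->right)
            hits++;
    }
    return hits;
}
```

Only bullets whose Y falls inside the box ever get an X test; everything else is skipped by the search or by the overshoot exit.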

Most types of player bullets in this game can be grouped into "flights", which are released at the same time, travel (mostly) upward, and maintain a narrow vertical extent. Checking for collision with whole flights would allow much faster rejection of distant bullets. Sorting the flights and enemies into bins in the Y-axis could make this even quicker.

Some types of bullets travel in a fairly narrow column, the horizontal extent of which can be easily tracked. This could allow substantial numbers of enemies to reject that entire bullet type before even checking for collision with flights.
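The two coarse rejection steps above (column per bullet type, then bounding box per flight) might look something like this in C. All of the bookkeeping here is hypothetical: the `Flight` and `BulletType` structs and their fields are invented for the sketch, and nothing in the thread says the game stores its data this way.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int16_t left, right, top, bottom; } Box; /* inclusive */

/* True if two inclusive boxes overlap on both axes. */
static bool overlaps(const Box *a, const Box *b)
{
    return a->left <= b->right && b->left <= a->right &&
           a->top <= b->bottom && b->top <= a->bottom;
}

/* Hypothetical bookkeeping: each flight tracks the bounding box of its
   bullets, and each bullet type tracks the column it occupies. */
typedef struct {
    Box bounds;
    int first_bullet, bullet_count;
} Flight;

typedef struct {
    int16_t col_left, col_right;
    const Flight *flights;
    int flight_count;
} BulletType;

/* Number of flights of `type` that still need per-bullet checks
   against `enemy`. */
int flights_to_check(const BulletType *type, const Box *enemy)
{
    /* Step 1: reject the whole bullet type by its column. */
    if (enemy->right < type->col_left || enemy->left > type->col_right)
        return 0;

    /* Step 2: reject individual flights by their bounding boxes. */
    int n = 0;
    for (int i = 0; i < type->flight_count; i++)
        if (overlaps(&type->flights[i].bounds, enemy))
            n++;
    return n;
}
```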

Homing shots can't really use the "flight" method and they're hard to sort by Y-coordinate, but they might benefit from a full 2D grid method like the one I used to do 128x128 collisions at 60 fps on the SNES CPU:
ROM: viewtopic.php?p=240647#p240647
Explanation: viewtopic.php?p=240751#p240751
This type of method should work well with the Y-axis bin sort used for flight collision checks, because the grid assignment can be easily reused as a 1D bin assignment by simply considering a row of cells as a bin, without having to redo anything.
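For readers who haven't followed the links, here is a generic uniform-grid sketch in C. It is not the method from the linked posts, just the textbook version under assumed parameters: a 256x256 playfield, 32x32-pixel cells, at most 128 objects, and one singly linked list per cell. It also shows how a cell index yields a Y-axis bin for free.

```c
#include <stdint.h>

enum {
    CELL_SHIFT  = 5,                  /* 32x32-pixel cells (assumed) */
    GRID_COLS   = 256 >> CELL_SHIFT,  /* 8 columns */
    GRID_ROWS   = 256 >> CELL_SHIFT,  /* 8 rows */
    GRID_CELLS  = GRID_COLS * GRID_ROWS,
    MAX_OBJECTS = 128,
    NONE        = 0xFF                /* end-of-list marker */
};

typedef struct {
    uint8_t head[GRID_CELLS];   /* first object in each cell, or NONE */
    uint8_t next[MAX_OBJECTS];  /* next object in the same cell, or NONE */
} Grid;

/* Empty every cell; done once per frame before re-inserting objects. */
void grid_clear(Grid *g)
{
    for (int i = 0; i < GRID_CELLS; i++)
        g->head[i] = NONE;
}

/* Cell index for a point; x and y are 0-255. */
uint8_t grid_cell(uint8_t x, uint8_t y)
{
    return (uint8_t)((y >> CELL_SHIFT) * GRID_COLS + (x >> CELL_SHIFT));
}

/* Push object `obj` onto the list of the cell containing (x, y). */
void grid_insert(Grid *g, uint8_t obj, uint8_t x, uint8_t y)
{
    uint8_t c = grid_cell(x, y);
    g->next[obj] = g->head[c];
    g->head[c] = obj;
}

/* A row of cells doubles as a 1D Y-axis bin. */
uint8_t grid_row_bin(uint8_t cell)
{
    return (uint8_t)(cell / GRID_COLS);
}
```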

I expect using box-point collision will be optimal. Basically you just have to adjust the hitbox of whatever thing you're checking for collisions to be appropriate for the size of the colliders in the list you're checking it against, which should be much more efficient than loading or calculating a hitbox every time you load a collider from the list.
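A small C sketch of box-point collision as described: grow the target's hitbox once by the half-size shared by every collider in a list, then treat each collider as a single point. The type and function names are made up for the example, and it assumes all colliders in one list have the same size.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int16_t left, right, top, bottom; } Box; /* inclusive */
typedef struct { int16_t x, y; } Point;                   /* collider centre */

/* Grow `target` once by the half-size of the colliders in a list. */
Box expand_box(Box target, int16_t half_w, int16_t half_h)
{
    target.left   = (int16_t)(target.left   - half_w);
    target.right  = (int16_t)(target.right  + half_w);
    target.top    = (int16_t)(target.top    - half_h);
    target.bottom = (int16_t)(target.bottom + half_h);
    return target;
}

/* Per-collider test: four compares, no hitbox loaded or computed. */
bool point_in_box(Point p, const Box *b)
{
    return p.x >= b->left && p.x <= b->right &&
           p.y >= b->top  && p.y <= b->bottom;
}
```

The expansion happens once per target per list, so the inner loop over colliders only ever does the point test.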

Did I miss any good ideas?

I'll have to do comprehensive testing to make sure I go with the optimal method. This game does load the Super FX fairly heavily just with the enemy bullet patterns, and in some cases I have to render backdrop elements into the bargain, so I can't afford to be lazy with collisions.
Pokun
Posts: 3442
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Re: How many sprites can the Neo Geo update per frame?

Post by Pokun »

93143 wrote: Fri Sep 26, 2025 9:29 pm It almost seems custom-built for Pilotwings specifically
And it probably was, considering Pilotwings evolved out of the SFC flight-sim tech demo "Dragonfly".
I guess they used the idea of a flight-sim as a goal when designing the hardware, resulting in Mode 7, the DSP and Pilotwings. Flight-sims were hard to do on consoles at the time, and if the SNES has hardware designed for a flight-sim it should be able to handle all sorts of games using similar advanced physics. It's kind of similar to how the Famicom was designed around Donkey Kong, which was an advanced arcade game that was hard to faithfully port to consoles at the time.


I guess the Neo Geo might have been designed for working well with fighting games with large sprites and animated backgrounds (at least SNK made many fighting games for it).
NeoOne
Posts: 19
Joined: Sat Jul 22, 2023 8:52 am

Re: How many sprites can the Neo Geo update per frame?

Post by NeoOne »

The first thing I'd say is that doing it the brute force way (every player bullet against every enemy) is actually good up to a certain number of objects, if you really streamline your code for that, get everything in registers (Not sure how many registers SNES CPU has? maybe it has zero page?) and organise your data structures well. It can be quite fast because each loop iteration doesn't use that many CPU cycles, and 80% of collision checks are probably terminated by the first 2 checks on one axis.
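Something like the following C sketch shows the shape of such a brute-force loop, with the two X-axis tests first so that most pairs exit early. It is an illustration only, not the routine from my game; the `Obj` layout and the function name are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t x, y;   /* top-left corner */
    uint8_t w, h;   /* size in pixels */
} Obj;

/* Test every bullet against every enemy.
   Returns the number of overlapping pairs. */
size_t brute_force_hits(const Obj *bullets, size_t nb,
                        const Obj *enemies, size_t ne)
{
    size_t hits = 0;

    for (size_t i = 0; i < nb; i++) {
        const Obj *b = &bullets[i];
        for (size_t j = 0; j < ne; j++) {
            const Obj *e = &enemies[j];

            /* Most pairs fail one of these two X tests and exit early. */
            if (b->x >= e->x + e->w) continue;
            if (e->x >= b->x + b->w) continue;
            /* Only pairs that overlap in X reach the Y tests. */
            if (b->y >= e->y + e->h) continue;
            if (e->y >= b->y + b->h) continue;
            hits++;
        }
    }
    return hits;
}
```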

The very simple horizontal shooter I am currently working on (it's just a demo game really, which I am optimizing) can have up to 80 enemies, 20 player bullets and 120 enemy bullets, and it works (currently) with brute force collision checks at 60fps (EDIT : up to 14 player bullets at 60fps). Everything *just now* is in C apart from the sprite update routines, which are in 68000 assembly so they can all be done in the VBlank period. I did have the brute force collision routine in assembler but had to put it back to C to make some changes, since I am not that good at coding assembler yet! Obviously though, the Neo Geo has a fast (12 MHz) 68000, which helps here.

In this game it would be possible for me to check half the player bullets one frame and half the next, and no collisions would be missed, because all player bullets move horizontally and the enemies are wide enough. But really I am thinking more about a future, more complex game I intend to make, so I have deliberately not taken much advantage of optimisations like that.
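A rough C sketch of the alternating-halves idea, with a helper that states the "wide enough" condition explicitly. Both function names are made up, and `rel_speed` is assumed to be the horizontal speed of the bullet relative to the enemy in pixels per frame. Note that a hit can still be detected one frame late.

```c
#include <stdbool.h>
#include <stddef.h>

/* Write the indices of the bullets to test this frame into `out`:
   even indices on even frames, odd indices on odd frames.
   Returns how many indices were written. */
size_t bullets_for_frame(unsigned frame, size_t bullet_count, size_t *out)
{
    size_t n = 0;

    for (size_t i = frame & 1u; i < bullet_count; i += 2)
        out[n++] = i;
    return n;
}

/* A bullet tested every second frame moves 2 * rel_speed pixels
   (relative to the enemy) between tests. It cannot jump clean over the
   enemy as long as that distance is smaller than the two widths combined. */
bool safe_to_alternate(int rel_speed, int enemy_w, int bullet_w)
{
    return 2 * rel_speed < enemy_w + bullet_w;
}
```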

I have looked at collision zones and until you get a high number of objects, it seems to take the program more time to move enemies etc between zones than it does to do the simple brute force check. But yes with a lot of bullets this does change!

Apart from that though, I have thought about a small 8 pixel x 8 pixel grid (I saw C64 games use this), but this seems inaccurate, and if you have larger objects you have to make them fill a lot of grid cells. Also I presume you have to clear the grid (or part of the grid) every frame too.

I like your thinking with the overall type of space the player bullets take up - I have thought of incorporating that check into my current game too.

I have never thought about Y sorting before, because I thought it would take a lot of time. So how fast can you sort the Y position of all your bullets/enemies (in display lines)?

BTW I am currently reading the links you posted and trying to understand it all. A lot of it is new to me. I like that you just went ahead and coded that 128 object example - while everyone was still talking about it!
Last edited by NeoOne on Fri Oct 10, 2025 1:40 pm, edited 1 time in total.
creaothceann
Posts: 862
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: How many sprites can the Neo Geo update per frame?

Post by creaothceann »

NeoOne wrote: Fri Oct 10, 2025 9:02 am (Not sure how many registers SNES CPU has? maybe it has zero page?)
The 65c816 core has only one 16-bit accumulator and two 16-bit index registers (both types can be switched to 8-bit). The 8-bit data bus doesn't help either...

The 6502 has a zero page fixed to address $0000; on the 65c816 it's renamed to direct page and can be moved to anywhere in the first 64 KiB.
NeoOne
Posts: 19
Joined: Sat Jul 22, 2023 8:52 am

Re: How many sprites can the Neo Geo update per frame?

Post by NeoOne »

creaothceann wrote: Fri Oct 10, 2025 10:42 am The 65c816 core has only one 16-bit accumulator and two 16-bit index registers (both types can be switched to 8-bit). The 8-bit data bus doesn't help either...

The 6502 has a zero page fixed to address $0000; on the 65c816 it's renamed to direct page and can be moved to anywhere in the first 64 KiB.
That's pretty cool about direct page though. I once did some simple tests with the 6502 and its speed is comparable to the 68000 for many functions, BUT only if the clock speed is the same, e.g. 8 MHz 6502 vs 8 MHz 68000. That's why the PC Engine is so good: it has a fast enhanced 6502 (7.16 MHz, with some faster data transfer instructions too).

I think for adding 2 x 16 bit numbers stored in memory the standard 6502 is faster than 68000! (assuming equal clock speed)

Code:

; 6502: Add two 16-bit numbers from memory (zero page)
; Inputs: num1 ($00-$01), num2 ($02-$03)
; Output: sum ($04-$05)
CLC           ; Clear carry (2 cycles)
LDA $00       ; Load low byte of num1 (3 cycles)
ADC $02       ; Add low byte of num2  (3 cycles)
STA $04       ; Store low byte of sum (3 cycles)
LDA $01       ; Load high byte of num1 (3 cycles)
ADC $03       ; Add high byte of num2 with carry (3 cycles)
STA $05       ; Store high byte of sum   (3 cycles)

= 20 cycles total

; 68000: Add two 16-bit numbers from memory (68000 has no zero page as far as I know!)
; Inputs: num1 at $1000, num2 at $1002
; Output: sum at $1004
MOVE.W $1000, D0  ; Load num1 into D0 (12 cycles)
ADD.W  $1002, D0  ; Add num2 to D0    (12 cycles)
MOVE.W D0, $1004  ; Store result to memory (12 cycles)

= 36 cycles total 

The actual ADD can be faster on the 68000 (4 cycles if both operands are in registers), but it's slower to get values from memory. But then the 6502's addressing abilities are not as good, and being limited to 8-bit values slows it down when storing proper 16-bit screen coordinates etc. The 68000 is also much easier, more compact (+ more fun) to code for.
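Cycle counts only turn into speed once the clock is factored in, so here is a tiny C helper for the arithmetic. Plugging in the totals above with the clock speeds mentioned in this thread (7.16 MHz and 12 MHz) is only a back-of-envelope comparison: it assumes the stock 6502 cycle counts carry over unchanged to the PC Engine's CPU, which may not be the case.

```c
#include <stdint.h>

/* Duration in nanoseconds of `cycles` clock cycles at `clock_hz`,
   rounded down. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / clock_hz);
}
```

With these assumptions, `cycles_to_ns(20, 7160000)` gives 2793 ns for the 6502 sequence and `cycles_to_ns(36, 12000000)` gives 3000 ns for the 68000 one, so the two land close together in wall-clock time.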
aa-dav
Posts: 339
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: How many sprites can the Neo Geo update per frame?

Post by aa-dav »

NeoOne wrote: Sat Oct 11, 2025 6:09 am MOVE.W $1000, D0 ; Load num1 into D0 (12 cycles)
ADD.W $1002, D0 ; Add num2 to D0 (12 cycles)
This should be rewritten as
MOVE.W $00001000, D0 ; Load num1 into D0 (12 cycles)
ADD.W $00001002, D0 ; Add num2 to D0 (12 cycles)
to highlight one important nuance: the m68k really does much more work reading and transferring bytes here because it is a 32-bit architecture. And being 32-bit has a lot of practical benefits for the developer.
It is fairer to compare register-to-register operations on the m68k against accumulator/zero-page operations on the 6502.
Tricks on the 6502 such as "move a structure to zero page, work on it intensively, then store the result in main memory" are analogous to caching data in registers on the m68k.
TmEE
Posts: 1074
Joined: Wed Feb 13, 2008 9:10 am
Location: Norway (50 and 60Hz compatible :P)

Re: How many sprites can the Neo Geo update per frame?

Post by TmEE »

A good assembler will use short addressing when the address fits in a 16-bit value (and one can request it explicitly too, as ($1234).W), which costs 4 cycles less than the 32-bit form. This is why the first 32 KB and last 32 KB of the 68K address space are precious: you can use 16-bit addresses to access either end in fewer cycles. A 68K programmer also makes good use of address increment and decrement to traverse data, since that comes nearly for free.
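The reason it is exactly the first and last 32 KB is that the CPU sign-extends the 16-bit address. A small C sketch of that rule, assuming the 68000's 24-bit address bus (the function names are made up for the example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Address actually accessed by absolute short addressing: the 16-bit
   word is sign-extended, and the 68000 drives only 24 address bits. */
uint32_t abs_short_target(uint16_t word)
{
    uint32_t extended = word;

    if (word & 0x8000u)
        extended |= 0xFFFF0000u;        /* sign extension */
    return extended & 0x00FFFFFFu;
}

/* True if `addr` can be reached with a 16-bit absolute address:
   $000000-$007FFF or $FF8000-$FFFFFF. */
bool reachable_by_abs_short(uint32_t addr)
{
    addr &= 0x00FFFFFFu;
    return addr < 0x008000u || addr >= 0xFF8000u;
}
```

So `$1000` from the example above qualifies, while anything in the middle of the address space needs the full 32-bit form.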
aa-dav
Posts: 339
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: How many sprites can the Neo Geo update per frame?

Post by aa-dav »

Oh... It seems I misinterpreted "address long" vs "address short" in this table: http://goldencrystal.free.fr/M68kOpcodes-v2.3.pdf
I thought it was about data width, but it is about address width: (...).W vs (...).L refers to the address. Of course, the data width is already encoded in the instruction.
Thanks.
I have never programmed for the m68k, only skimmed the architecture, so I missed such details.
Anyway, a real program is something different from synthetic tests with questionable comparisons. The common scheme in "modern" CPUs is to cache addresses and data in registers and minimize memory accesses. The 6502 architecture, in contrast, is from the mid-1970s, and every two-operand instruction must take its second operand from memory (or write it to memory).
NeoOne
Posts: 19
Joined: Sat Jul 22, 2023 8:52 am

Re: How many sprites can the Neo Geo update per frame?

Post by NeoOne »

aa-dav wrote: Sat Oct 11, 2025 6:06 pm
NeoOne wrote: Sat Oct 11, 2025 6:09 am MOVE.W $1000, D0 ; Load num1 into D0 (12 cycles)
ADD.W $1002, D0 ; Add num2 to D0 (12 cycles)
This should be rewritten as
MOVE.W $00001000, D0 ; Load num1 into D0 (12 cycles)
ADD.W $00001002, D0 ; Add num2 to D0 (12 cycles)
to highlight one important nuance: the m68k really does much more work reading and transferring bytes here because it is a 32-bit architecture. And being 32-bit has a lot of practical benefits for the developer.
It is fairer to compare register-to-register operations on the m68k against accumulator/zero-page operations on the 6502.
Tricks on the 6502 such as "move a structure to zero page, work on it intensively, then store the result in main memory" are analogous to caching data in registers on the m68k.
You only really need 32 bit for the addresses in 2D games. It's faster to use 16 bit variables and they can represent high enough numbers for 98% of uses. Normally the reason 32 bit CPUs are better is just because they are more modern and faster + have better features. The 68000 is from 1979, for example.

I think for 2D games, 32 bit is not that much use? I suppose you could move data blocks faster though - if you need to do that.

The reason I didn't use registers on the 68000 is that zero page amounts to a lot of "registers" compared to the 16 registers in the 68000. You could store a whole array in zero page! So I thought the way I did it was a fairer comparison. The real test would be a simple game though. But I think the 68000 and 6502 (at the same clock speed) would not be very far away from each other. Maybe the 68000 would be 1.3 times as fast overall, something like that.

It always seems to me that 6502 is more like RISC architecture and 68000 like CISC!