How many sprites can the Neo Geo update per frame?

Pokun
Posts: 3476
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Post by Pokun »

NeoOne wrote: Thu Oct 23, 2025 10:31 am Yes 8-bit variables good for many things although on 68000 I don't think byte operations are any faster than 16 bit ones.
Yeah, I think that's why the Amiga, Mega Drive, Neo Geo, X68000 etc. are considered 16-bit machines, even though the 68000 has some 32-bit capability. The SNES is considered 16-bit too, despite its CPU also having 8-bit modes; it can do both 8- and 16-bit operations equally well after switching modes.


NeoOne wrote: Thu Oct 23, 2025 10:31 am I was reading somewhere recently that all N64 games run in 32 bit mode. Just because there was no real point to using 64 bits. 32 bits is accurate enough! Even 3D had its limit back then
Yep, that's likely true. The R4300i CPU implements a 64-bit MIPS architecture, but it can also work in 32-bit mode, and that's what the games normally use. In hindsight it should have been called Nintendo 32!
aa-dav
Posts: 339
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Post by aa-dav »

turboxray wrote: Mon Oct 20, 2025 10:20 am What is your obsession with pipelines.
It's because pipelining shaped modern RISC: https://en.wikipedia.org/wiki/Classic_RISC_pipeline
That typical goal of fitting instructions neatly into the stages of a pipeline is still there.
Like everything else it has evolved, and a modern RISC can be a very complex out-of-order design, but looking at its instructions you always see this 'fetch-decode-execute-memory-store' pattern, which shapes each instruction's complexity and abilities. At least that holds for most instructions, because different designs made trade-offs too: something that blocks the pipeline could be introduced if the alternatives were worse or simply impossible (like syncs).

P.S.
Well, I don't want this argument to turn into a flame war.
My point was about the question 'is the 6502 RISC in any way?'.
I argue that it is not, because in many ways it isn't. It has operations that modify memory directly. It doesn't have many internal registers to minimize memory accesses, but instead uses memory as the second operand in almost every instruction. Overall, it is bad for pipelining and has no specific features for it.
OK, let's consider the idea that 'RISC is simple because compilers don't need that many of the CISC instructions'. Well, in my opinion the 6502 was developed without compilers in mind at all: there isn't a single instruction that could be described as 'ah, that is for high-level language support'. Even the i8080 has an instruction that can be thought of as 'support for stack addressing' (ADD HL, SP in Z80 syntax). So that's just my point: the 6502 is not about RISC, it's about the accumulator-memory architectures of the '70s.
tepples
Posts: 23006
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

When "bits" were used in marketing as a proxy for a game console's visual throughput, the count usually[1] corresponded to the widest data bus on the mainboard, counting multiple words per clock cycle as a width multiplier.
  • TG-16 and Super NES have their CPU on an 8-bit data bus and PPU on a 16-bit data bus implemented with two 32Kx8 SRAMs (e.g. 62256).
  • Genesis has its CPU on a 16-bit data bus and VDP on an 8-bit data bus running at double data rate (DDR), making it effectively 16-bit as well.
  • Jaguar's memory is on a 64-bit data bus. Only the blitter really uses 64 bits at once.
  • Nintendo 64 has its CPU behind a northbridge on a GPU with a 9-bit data bus to RDRAM operating at octal data rate, for a 72-bit effective data bus. (Very little uses the 9th bit of each byte.)
Neo Geo is kind of weird. Its Z80 and 68000 data buses are 8- and 16-bit as expected, and the S ROM bus for fixed scoreboard tiles is 8 bits as well (two 4bpp pixels). The C ROMs hold sprite tiles, the vast majority of graphics data. These are addressed in 32-bit words (typically implemented as two 16-bit ROMs), treated as one 8x1-pixel 4bpp sliver per word. As I understand it, an octal 4:1 multiplexer (PRO-CT0) converts each word to four 8-bit 2x1-pixel units fed to the sprite compositor (NEO-B1); this multiplexer is in the MVS (arcade) system board or each AES (console) cartridge. And all of this was labeled "24-bit" for some odd reason.
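
A quick worked check of the width-multiplier rule described above, as a Python sketch. The helper name `effective_width` is made up for illustration; the wire counts and transfers per clock are simply the ones quoted in this post, not independently measured figures.

```python
# Effective width = physical data wires x transfers per clock,
# using only the figures quoted in the post above.
def effective_width(wires: int, transfers_per_clock: int) -> int:
    return wires * transfers_per_clock

buses = {
    "Genesis VDP VRAM (8 wires, 2 transfers per clock)": effective_width(8, 2),
    "N64 RDRAM (9 wires, 8 transfers per clock)": effective_width(9, 8),
    "Neo Geo C ROM (32 wires, 1 transfer per clock)": effective_width(32, 1),
}

for name, width in buses.items():
    print(f"{name}: {width}-bit effective")
```

By this counting the Neo Geo sprite ROM path comes out at 32 bits, which is why the "24-bit" label looks odd.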

[1] Two exceptions to this trend predate bit count marketing: Intellivision (CPU has 16-bit data bus, of which contemporary cartridges used 10) and Master System (VDP has 16-bit data bus). Neo Geo is the other, with its oddball "24-bit" choice.
TmEE
Posts: 1078
Joined: Wed Feb 13, 2008 9:10 am
Location: Norway (50 and 60Hz compatible :P)

Post by TmEE »

tepples wrote: Sat Oct 25, 2025 12:02 pm Genesis has its CPU on a 16-bit data bus and VDP on an 8-bit data bus running at double data rate (DDR), making it effectively 16-bit as well.
Not DDR, at least not in the sense that both edges of the clock are used. The VRAM bus does run at twice the pixel clock, though.
Pokun
Posts: 3476
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Post by Pokun »

Doesn't the SNES CPU run two bytes per clock on the 8-bit data-bus effectively making it 16-bit?
tepples
Posts: 23006
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

No, and I don't know what would have led you to believe that. The closest thing I can think of is a simultaneous read and write on the 8-bit data bus when performing DMA from a device on address bus A to a device on address bus B. (This is usually ROM or WRAM to VRAM, or more rarely ROM to WRAM, coprocessor to WRAM or VRAM, or ROM or WRAM to APU I/O.)

During object pattern fetch, and during background nametable and pattern fetch in modes 0-6, the S-PPU reads even bit planes from one VRAM chip and odd from the same address in the other, treating them as a 32Kx16-bit memory. (The TG-16 behaves similarly.) The two are uncoupled during background fetch in mode 7, with nametable from one and pattern from the other. There's no DDR-like behavior in WRAM, VRAM, or ARAM that I'm aware of. CGRAM is read twice per pixel at different addresses, once for sub and once for main, but that's inside the S-PPU, not on the mainboard.
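
To illustrate the "two 8-bit chips treated as one 16-bit memory" idea, here is a minimal Python sketch that decodes one 8-pixel row of a 2bpp tile from a single VRAM word. It assumes the low byte holds bit plane 0, the high byte holds bit plane 1, and bit 7 is the leftmost pixel; `decode_sliver` is a hypothetical helper name, not something taken from hardware documentation.

```python
def decode_sliver(word: int) -> list[int]:
    """Decode one 16-bit VRAM word into eight 2bpp pixel values.

    Assumed layout: low byte = bit plane 0 (one chip),
    high byte = bit plane 1 (the other chip), bit 7 = leftmost pixel.
    """
    plane0 = word & 0xFF
    plane1 = (word >> 8) & 0xFF
    pixels = []
    for bit in range(7, -1, -1):
        low = (plane0 >> bit) & 1
        high = (plane1 >> bit) & 1
        pixels.append(low | (high << 1))
    return pixels

print(decode_sliver(0x0FF0))
```

Both planes of a row arrive in the same fetch, which is the point of pairing the two chips.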

That said, I haven't studied bus behavior of GSU, SA-1, CX4, MCD, SVP, or 32X.
93143
Posts: 1923
Joined: Fri Jul 04, 2014 9:31 pm

Post by 93143 »

NeoOne wrote: Fri Oct 10, 2025 9:02 am The first thing, I think, is that doing it the brute force way (every player bullet against every enemy) is actually good up to a certain number of objects.
Certainly, which is why I'd want to test each approach to see if it actually helps. The homing shots aren't especially numerous by themselves, so it's possible that just doing them naïvely could be optimal.
NeoOne wrote: (Not sure how many registers SNES CPU has? maybe it has zero page?)
I might try doing player bullet collisions on the S-CPU during boss fights, to free up a little more time on the Super FX, but it's the Super FX that's going to be doing the heavy-duty stuff with potentially 2000+ collisions per frame during stages.

The S-CPU has one accumulator and two index registers, and the instruction set is not orthogonal (though it's better than the 6502). You can't do math with the index registers. Direct page (a movable zero page) is generally the fastest way to do something like this.

The Super FX is a very different chip. It doesn't like accessing memory; it has 8-bit ROM and RAM buses with access times of 5 cycles per byte when the core is in high-speed mode. To minimize bus delays, it has a 512-byte instruction cache (no data access, but single-cycle instruction execution) and 16 16-bit general-purpose registers...

...well, kinda. R15 is the program counter, R14 is the ROM buffer pointer, and various other registers have special functions (at least R1, R2, R4, R6, R7, R8, R12, and R13 if I recall correctly) and sometimes restrictions, although you can still use them for math and for storing values. Also, R0 functions as a sort of accumulator, since the RISC-like instruction set is 8-bit and is already extremely cramped just specifying one register for the instructions that need it. The FROM, TO, and WITH prefix instructions are used to specify additional source and destination registers, but they reset after use, and if you don't use the prefixes, SREG and DREG default to R0, making it considerably quicker under some circumstances to do math and logic in or with R0.

I figured that for simple enemy bullets, I could probably just leave the player's position in registers during the entire move/collide/draw/despawn loop. I might make this into two loops, since some bullets require more complicated logic, and even the simplest ones are already overloading the I-cache such that some of the despawn code has to run directly from ROM.
NeoOne wrote: The very simple horizontal shooter I am currently working on (it's really just a demo game I am optimizing) can have up to 80 enemies, 20 player bullets and 120 enemy bullets, and it works (currently) with brute force collision checks at 60fps.
NeoOne wrote: Thu Oct 23, 2025 10:31 am it can now display (+ collision check) up to 140 enemies, 153 enemy bullets and 21 player bullets.
Very nice, and somewhat encouraging. In my case, though, I kinda need the collisions to be a secondary load, since I'm going to want the vast majority of the Super FX's compute time to go towards rendering bullets, often enemies, and sometimes backdrop elements. Even a Neo Geo doesn't have enough sprites to handle this game in hardware...
NeoOne wrote: On this game - it would be possible for me to check half the player bullets one frame and half the next and no collisions would be missed.
There are probably lots of situations where I could do that, but I really don't want to because it's a port, and I'm trying to make it as accurate as possible. I'm already using the same PRNG, and it would be nice if it ended up somewhat replay-compatible.
NeoOne wrote: I have never thought about Y sorting before, because I thought it would take a lot of time. So how fast can you sort the Y position of all your bullets/enemies (in display lines)?
I haven't tried it yet, and I'm intensely occupied with my job at the moment so I can't spare the brain power. I rather suspect it could be expensive. They're almost sorted, but not quite. I expect colliding with flights before individual bullets may be a better plan, and flights would be much easier to sort. Furthermore, since all the bullets in a flight are at roughly the same Y-coordinate, sorting individual bullets might be a waste of time.
NeoOne wrote: BTW I am currently reading the links you posted. Trying to understand it all. A lot of it is new to me. I like that you just went ahead and coded that 128 object example - while everyone was still talking about it!
Yeah, that was fun. I figured out most of the method while out on a walk. I still remember a particular pickup truck I walked past while thinking about it. I remember initially not liking the grid idea very much, but as you can see it grew on me...

NeoOne wrote: Sat Oct 11, 2025 6:09 am But then 6502's addressing abilities are not as good and 8 bit values slow it down
NeoOne wrote: Thu Oct 23, 2025 10:31 am on 68000 you have a MUL instruction to multiply and a DIV instruction to divide but on 6502 you have to write those out all in the little individual steps by adding or subtracting or whatever.
That's the nice part about the 65C816; it can operate in 16-bit mode, has a 16-bit ALU and can use 8-, 16-, or 24-bit addressing. You don't have to construct 16-bit operations out of 8-bit ones; it just takes an extra cycle to load or store a 16-bit value on the 8-bit bus. Plus the actual S-CPU has a bunch of custom bells and whistles, including an MMIO multiplier and divider. I think somebody roughly estimated once that the 3.58 MHz S-CPU is probably equivalent to a 5-6 MHz 68000. Weaker overall than the Mega Drive CPU, but not nearly as much weaker as the clock speed difference suggests.
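
As a rough illustration of what "constructing 16-bit operations out of 8-bit ones" means, this Python sketch models a 16-bit add done the 6502 way (low bytes, then high bytes plus carry) next to a single 16-bit add. It is a model of the arithmetic only, not of real cycle counts; both function names are hypothetical.

```python
def add16_via_8bit(a: int, b: int) -> int:
    """16-bit add built from two 8-bit adds with carry (6502 style)."""
    low_sum = (a & 0xFF) + (b & 0xFF)            # first 8-bit add
    carry = low_sum >> 8                         # carry out of the low byte
    high_sum = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry  # second add
    return ((high_sum & 0xFF) << 8) | (low_sum & 0xFF)

def add16_native(a: int, b: int) -> int:
    """Single 16-bit add, as a 16-bit ALU would do it."""
    return (a + b) & 0xFFFF

print(hex(add16_via_8bit(0x12FF, 0x0001)), hex(add16_native(0x12FF, 0x0001)))
```

The results match; the difference on real hardware is that the first version costs several instructions while the second is one.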

Fun fact: the Sony SPC700 (the 1 MHz 8-bit 6502 knockoff used as a sound CPU in the SNES) has MUL and DIV instructions. The Super FX has several multiply instructions, but it doesn't have a divide instruction; you have to use reciprocal tables.
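
For readers unfamiliar with reciprocal tables, here is a generic Python sketch of the idea: replace division with a table lookup and a multiply. The table size (divisors 1 to 255), the 16 fraction bits and the names are assumptions chosen for illustration, not how any particular Super FX game does it.

```python
FRACTION_BITS = 16

# Rounded-up reciprocals of 1..255 in 0.16 fixed point; index 0 is unused.
# On real hardware this table would be precomputed and stored in ROM.
RECIPROCALS = [0] + [((1 << FRACTION_BITS) + d - 1) // d for d in range(1, 256)]

def divide_8bit(n: int, d: int) -> int:
    """Return n // d for 0 <= n <= 255 and 1 <= d <= 255 using only
    a table lookup, a multiply and a shift."""
    return (n * RECIPROCALS[d]) >> FRACTION_BITS

print(divide_8bit(200, 7), divide_8bit(255, 5))
```

Rounding the reciprocals up keeps the result exact over this 8-bit range; wider operands would need more fraction bits or a correction step.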

aa-dav wrote: Sat Oct 18, 2025 7:05 pm What is RISC really about? It's about pipeline.
Oddly enough, the 65C816 and the Super FX (usually considered RISC) have the same pipeline length (and width). One byte, just enough for an opcode.

The major difference is that the 65C816 discards the pipelined byte if a branch is taken, and the Super FX doesn't. This means that the Super FX kinda has branch prediction, but it's up to the programmer rather than being automatic at runtime.

Pokun wrote: Thu Oct 23, 2025 2:51 pm In hindsight it should have been called Nintendo 32!
The PlayStation would have crushed it even harder than it did, coming out a year and a half late with a me-too name like that...

Well, since somebody brought up the Nintendo 64, I've got to say it again: I wish they'd done that respin to fix the memory interface bug. While they were at it, they might have realized that the additive blend mode was useless without clamping, and possibly even noticed the off-by-one multitexture bug. The CPU being too powerful was the least of its issues, and would have been at least partly solved by faster memory access (the RDRAM's data rate was faster than every piece of RAM in the PlayStation combined; the issue was latency, and if that article is right, a design error in the N64's chipset may have been largely to blame).

Pokun wrote: Sat Oct 25, 2025 2:01 pm Doesn't the SNES CPU run two bytes per clock on the 8-bit data-bus effectively making it 16-bit?
I wish. Loading or storing a 16-bit value takes two cycles. 24-bit addresses as operands take 3 cycles. It's an 8-bit bus.

In fact, it's worse than that. The 6502 has this weird quirk whereby only half of a memory access cycle counts as a memory access, meaning the memory has to be twice as fast to keep up. This quirk seems to have been removed from the PC Engine's HuC6280, resulting in the ability to run at 7.16 MHz on fast ROM or RAM. Unfortunately the licensed 65C816 core in the Ricoh 5A22 (the S-CPU) was not given this treatment, so memory that should be capable of over 8 MHz is only good enough for 3.58 MHz, and memory that should be capable of 5 MHz is only good enough for 2.68 MHz (it appears they were at least smart enough to only extend the actual memory access part of the cycle to deal with slow memory; the dead half is always 3 master clocks regardless).

It does (or rather can, and sometimes does) access the bus every cycle, as opposed to every 4 cycles for a 68000, which means that despite the Mega Drive's 16-bit bus and much higher clock speed vs. the SNES, the actual peak bus throughput is almost identical. (Plus the S-CPU doesn't know what a wait state is, so nothing else in the system can prevent it from using the bus, something that is not true of the 68000.)
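
The "almost identical" claim can be checked with back-of-envelope arithmetic. This Python sketch assumes a 7.67 MHz Mega Drive 68000 (a figure not stated in this post) and takes the one-access-per-cycle and one-access-per-4-cycles rates from the paragraph above; it is peak throughput only and ignores wait states and refresh.

```python
SCPU_HZ = 3.58e6   # S-CPU at its fastest setting
M68K_HZ = 7.67e6   # Mega Drive 68000 clock (assumed NTSC value)

def peak_bytes_per_second(clock_hz: float, bytes_per_access: int,
                          cycles_per_access: int) -> float:
    """Peak bus throughput if every possible bus slot is used."""
    return clock_hz * bytes_per_access / cycles_per_access

snes = peak_bytes_per_second(SCPU_HZ, bytes_per_access=1, cycles_per_access=1)
mega_drive = peak_bytes_per_second(M68K_HZ, bytes_per_access=2, cycles_per_access=4)

print(f"S-CPU: {snes / 1e6:.2f} MB/s, 68000: {mega_drive / 1e6:.2f} MB/s")
```

Under these assumptions the 68000 comes out only about 7% ahead.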

SNES DMA is also basically the same speed as the Mega Drive (at least in H32 mode, and H40 mode is only moderately faster) because the Mega Drive uses 8-bit VRAM, rendering half the bus useless when loading VRAM (the main task of DMA on both systems). It's genuinely weird how close these two systems' specs ended up, given how different all the design decisions were.
aa-dav
Posts: 339
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Post by aa-dav »

93143 wrote: Sat Oct 25, 2025 9:44 pm ...Super FX (usually considered RISC)...
This thing is really strange. I wouldn't have the heart to call it RISC.
It tries to have one-byte opcodes, but you've mentioned the TO/FROM/WITH instructions, which in fact make the opcodes variable-length. And the MOVE instruction is a TO or FROM preceded by WITH. Ehm... It does not look 'easy to decode'. It looks like the decoder must take several cycles to do its job.
The orthogonality of the registers is also affected by this. Well, it is a load/store architecture and has a link register to implement calls, IIRC, so it does have RISC influence. But I would call it semi-RISC. :)
NeoOne
Posts: 25
Joined: Sat Jul 22, 2023 8:52 am

Post by NeoOne »

tepples wrote: Sat Oct 25, 2025 12:02 pm When "bits" were used in marketing as a proxy for a game console's visual throughput, the count usually[1] corresponded to the widest data bus on the mainboard, counting multiple words per clock cycle as a width multiplier.
  • TG-16 and Super NES have their CPU on an 8-bit data bus and PPU on a 16-bit data bus implemented with two 32Kx8 SRAMs (e.g. 62256).
  • Genesis has its CPU on a 16-bit data bus and VDP on an 8-bit data bus running at double data rate (DDR), making it effectively 16-bit as well.
  • Jaguar's memory is on a 64-bit data bus. Only the blitter really uses 64 bits at once.
  • Nintendo 64 has its CPU behind a northbridge on a GPU with a 9-bit data bus to RDRAM operating at octal data rate, for a 72-bit effective data bus. (Very little uses the 9th bit of each byte.)
Neo Geo is kind of weird. Its Z80 and 68000 data buses are 8- and 16-bit as expected, and the S ROM bus for fixed scoreboard tiles is 8 bits as well (two 4bpp pixels). The C ROMs hold sprite tiles, the vast majority of graphics data. These are addressed in 32-bit words (typically implemented as two 16-bit ROMs), treated as one 8x1-pixel 4bpp sliver per word. As I understand it, an octal 4:1 multiplexer (PRO-CT0) converts each word to four 8-bit 2x1-pixel units fed to the sprite compositor (NEO-B1); this multiplexer is in the MVS (arcade) system board or each AES (console) cartridge. And all of this was labeled "24-bit" for some odd reason.

[1] Two exceptions to this trend predate bit count marketing: Intellivision (CPU has 16-bit data bus, of which contemporary cartridges used 10) and Master System (VDP has 16-bit data bus). Neo Geo is the other, with its oddball "24-bit" choice.
The Neo Geo's GPU does have a 24-bit data bus https://wiki.neogeodev.org/index.php?title=P_bus. It's really using it too! To process a lot of sprite information at once. So I say it's legit in that way. Also, I'd say the GPU is maybe 80% of a system's power and the CPU maybe 20%. For example, if you gave the Neo Geo the Mega Drive's graphics chip, you'd suddenly lose two thirds of your sprites. It doesn't matter what CPU you have, you aren't getting those sprites back!

It's interesting to think about, but really that whole bittage thing was only "invented" when a new generation was on the way, e.g. the Amiga and ST were called "the 16-bit computers" at the time, and not long after we had the "16-bit consoles" (the SNES and Mega Drive) on the way. The bittage definition was never meant to be applied to every other generation as a way of measuring power. The whole idea of classifying systems by the size of the CPU data bus was taken from computers; it should never have been applied to consoles with lots of support chips.

For me, I like the term "16-bit *generation* of consoles" to mark an era. That's how I'd deal with systems like the Intellivision, which is clearly from the 8-bit generation despite its 16-bit CPU. And yes, I'd have the Neo Geo as a 16-bit console using this way of classifying systems too.
NeoOne
Posts: 25
Joined: Sat Jul 22, 2023 8:52 am

Post by NeoOne »

93143 wrote: Sat Oct 25, 2025 9:44 pm
NeoOne wrote: Fri Oct 10, 2025 9:02 am The first thing, I think, is that doing it the brute force way (every player bullet against every enemy) is actually good up to a certain number of objects.
Certainly, which is why I'd want to test each approach to see if it actually helps. The homing shots aren't especially numerous by themselves, so it's possible that just doing them naïvely could be optimal.
Yes, organising the object data so it can be read in the fastest way possible helps make it fast. I'm not sure at what threshold (number of collision checks) it becomes slower than collision zones/grids etc. But yes, it would definitely be good for certain in-game objects, I think.
93143 wrote: The Super FX is a very different chip. It doesn't like accessing memory; it has 8-bit ROM and RAM buses with access times of 5 cycles per byte when the core is in high-speed mode. To minimize bus delays, it has a 512-byte instruction cache (no data access, but single-cycle instruction execution) and 16 16-bit general-purpose registers...
I remember reading an interview with Argonaut where they said the Super FX can only update the SNES graphics tiles at a set speed. So even if the game had somehow been made faster, the frame rate could not have been 60fps because of this screen update restriction. Will you have any problems to do with that?
93143 wrote: I figured that for simple enemy bullets, I could probably just leave the player's position in registers during the entire move/collide/draw/despawn loop. I might make this into two loops, since some bullets require more complicated logic, and even the simplest ones are already overloading the I-cache such that some of the despawn code has to run directly from ROM.
Yes, that seems like a good idea. All my collision routines are back in C at the moment, but I intend to convert them into assembler again soon. So far I haven't thought much about anything other than player bullets vs enemies, since all the other routines are so much faster naturally.
93143 wrote: Even a Neo Geo doesn't have enough sprites to handle this game in hardware...
It's possible to re-use sprites using a raster interrupt, but yes, sadly the 96 sprites per scanline limit is always there though :( Maybe for some bullet patterns it could use a single group of animated sprites though (Neo Geo is not short on graphics ROM!). That would be a nice trick.
NeoOne wrote: I have never thought about Y sorting before, because I thought it would take a lot of time. So how fast can you sort the Y position of all your bullets/enemies (in display lines)?
On the Y sorting I think you said a bin sort. Wouldn't this be slower than just using bit shifts to the right on object coordinates to just get a "screen zone" number
93143 wrote: That's the nice part about the 65C816; it can operate in 16-bit mode, has a 16-bit ALU and can use 8-, 16-, or 24-bit addressing. You don't have to construct 16-bit operations out of 8-bit ones; it just takes an extra cycle to load or store a 16-bit value on the 8-bit bus. Plus the actual S-CPU has a bunch of custom bells and whistles, including an MMIO multiplier and divider. I think somebody roughly estimated once that the 3.58 MHz S-CPU is probably equivalent to a 5-6 MHz 68000. Weaker overall than the Mega Drive CPU, but not nearly as much weaker as the clock speed difference suggests.
That's interesting to read. The SNES CPU seems underrated. I always thought a lot of developers back in the day were coding it wrong. But also I heard some SNES game used slower memory with slower CPU speed which seems crazy now. I think Castlevania was one
Pokun
Posts: 3476
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Post by Pokun »

tepples wrote: Sat Oct 25, 2025 2:25 pm No, and I don't know what would have led you to believe that.
I think I mixed it up with the bank address bus (BA0 to BA7) sharing pins with the data bus (D0 to D7) for outputting bank address during the first half of each cycle and data access on the second half.
lidnariq
Site Admin
Posts: 11811
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

NeoOne wrote: Sun Oct 26, 2025 8:44 am The Neo Geo's GPU does have a 24-bit data bus https://wiki.neogeodev.org/index.php?title=P_bus. It's really using it too!
Despite what their wiki says, that's primarily an address bus. Only the top 8 bits ever carry data. A bunch of different things are multiplexed on the full 24-bit bus, including fake-DDR 64-bit (32 wires, one clock used as an asynchronous address line; the contents of the P bus are latched by the cartridge, fed to two 16-bit ROMs, and data returned on the CRxx pins) for sprite tile data ("C address") and fake-DDR 16-bit (8 wires, another clock used as an asynchronous address line; again latched by the cartridge, fed to one 8-bit ROM, and data returned on the FIXDx pins) ("S address"). The vertical shrinking table output data is there so that it can be fed back into the address needed for sprite tile fetch, with the low 4 bits where they are so that they can be reused verbatim for the C fetch.
93143
Posts: 1923
Joined: Fri Jul 04, 2014 9:31 pm

Post by 93143 »

NeoOne wrote: Sun Oct 26, 2025 9:09 am I remember reading an interview with Argonaut where they said the Super FX can only update the SNES graphics tiles at a set speed. So even if the game had somehow been made faster, the frame rate could not have been 60fps because of this screen update restriction.
They were probably talking about DMA. The S-PPU can't see the cartridge, and the Super FX renders to a framebuffer in Game Pak RAM, on the cartridge. So you have to DMA the finished framebuffer into VRAM in order to see it.

Normally you get about 6 KB of DMA per TV frame (on an NTSC SNES; PAL gets way more), minus overhead and other operations, because VRAM is locked during active display and HBlank. But you can letterbox the screen with forced blanking to get more time.

Typically a Super FX game's framebuffer is something like 224x192 and usually 4bpp, and can in principle be transferred in two blanking periods for a maximum sustainable frame rate of 30 fps. But there are other things that also have to be transferred during VBlank/forced blank, like palette, sprite attributes, and perhaps tilemap information and auxiliary tile data, so 20 fps may be more practical. Star Fox is capped at 20 fps.
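
The numbers above can be reproduced with a small Python sketch. The timing constants (1364 master clocks per scanline, 40 of them lost to DRAM refresh, 8 master clocks per DMA byte, 262 lines per NTSC frame) are commonly cited figures assumed here rather than taken from this post, and the function names are hypothetical. Overhead and other transfers are ignored, so this gives an upper bound.

```python
import math

MASTER_CLOCKS_PER_LINE = 1364   # assumed NTSC scanline length
REFRESH_CLOCKS_PER_LINE = 40    # assumed DRAM refresh cost per line
CLOCKS_PER_DMA_BYTE = 8         # assumed DMA transfer rate
LINES_PER_FRAME = 262           # NTSC

def dma_bytes_per_frame(visible_lines: int) -> float:
    """Upper bound on bytes DMA can move while the display is blanked."""
    blank_lines = LINES_PER_FRAME - visible_lines
    bytes_per_line = (MASTER_CLOCKS_PER_LINE - REFRESH_CLOCKS_PER_LINE) / CLOCKS_PER_DMA_BYTE
    return blank_lines * bytes_per_line

def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    return width * height * bits_per_pixel // 8

normal_budget = dma_bytes_per_frame(224)       # ordinary 224-line display
letterbox_budget = dma_bytes_per_frame(192)    # display cut down to 192 lines
buffer_size = framebuffer_bytes(224, 192, 4)
frames_needed = math.ceil(buffer_size / letterbox_budget)

print(f"normal VBlank budget: {normal_budget:.0f} bytes")
print(f"letterboxed budget:   {letterbox_budget:.0f} bytes")
print(f"framebuffer: {buffer_size} bytes -> {frames_needed} TV frames, "
      f"{60 // frames_needed} fps at best")
```

With a 192-line display, the 21504-byte framebuffer fits in two blanking periods, which is where the 30 fps ceiling comes from.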

There's an additional wrinkle, in that Star Fox, Vortex, and probably a majority of Super FX games were (for budgetary reasons) given insufficient Game Pak RAM to double buffer on the Super FX side. This means that the Super FX has to wait for the whole framebuffer to be transferred before it can start drawing the next frame; all it can do while waiting is clear the parts that have already been transferred. This is why Vortex is basically as slow as Star Fox despite running on a 21 MHz GSU1; essentially, to sustain 20 fps, the game would have to finish rendering in less than one TV frame. Star Fox manages this at the very beginning of the training mission.
NeoOne wrote: Will you have any problems to do with that?
Yes and no, I think. The way I've set up the playfield size and letterboxing, I can transfer a 4bpp rendered layer at 30 fps, or a 2bpp layer at 60 fps, with enough room for the other stuff I have to transfer. Most scenarios will have to use the 4bpp 30 fps mode, but I doubt I'd be able to render some of the busier patterns at a reliable 60 fps anyway. But there are some patterns that do work at 2bpp 60 fps, and some are twitchy enough to need it...

I plan to run all motion and collisions at 60 Hz for accuracy reasons. Only certain components of the visuals will have to run at 30.
NeoOne wrote: It's possible to re-use sprites using a raster interrupt, but yes, sadly the 96 sprites per scanline limit is always there though :( Maybe for some bullet patterns it could use a single group of animated sprites though (Neo Geo is not short on graphics ROM!). That would be a nice trick.
Perhaps. But I think there are enough scenarios with several hundred bullets that are either heavily clustered or in highly chaotic patterns (or both) that tricks like these wouldn't work everywhere. These sorts of workarounds might fit better in an original game where you can design the game to work with the techniques, rather than a port.

The Neo Geo is certainly famous for its game sizes. Somebody once suggested that it could brute-force Space Harrier that way...
NeoOne wrote: On the Y sorting I think you said a bin sort. Wouldn't this be slower than just using bit shifts to the right on object coordinates to just get a "screen zone" number
...no, I said "sorting into bins". Right shift is probably one of the quickest ways to do bin assignment.

Reviewing my code for the 128x128 collisions test, it looks like I used a hash table in WRAM (not the fastest place to put it, but I wanted to keep the ROM small) to assign sprites to cells based on the high byte of Y concatenated with the high 7 bits of X (you can cat bytes quickly on 65C816 by using 8-bit mode; you load one byte, swap the active and hidden halves of the accumulator, and load the other byte)...
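
A minimal Python sketch of bin assignment by shifting, in the spirit of the scheme described above. It assumes 16-bit X and Y coordinates, so "high byte of Y" is `y >> 8` and "high 7 bits of X" is `x >> 9`; the coordinate format, the dictionary standing in for the WRAM hash table, and all names are illustrative assumptions rather than the actual test code.

```python
from collections import defaultdict

def cell_key(x: int, y: int) -> int:
    """High byte of Y concatenated with the high 7 bits of X (15-bit key)."""
    return (((y >> 8) & 0xFF) << 7) | ((x >> 9) & 0x7F)

def build_bins(objects: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Map each cell key to the indices of the objects inside that cell."""
    bins: dict[int, list[int]] = defaultdict(list)
    for index, (x, y) in enumerate(objects):
        bins[cell_key(x, y)].append(index)
    return bins

objects = [(0x0200, 0x0300), (0x03FF, 0x03FF), (0x8000, 0x0300)]
bins = build_bins(objects)
for key in sorted(bins):
    print(key, bins[key])
```

Only objects that land in the same cell (or, in a fuller version, neighbouring cells) then need a precise collision test.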
NeoOne wrote: The SNES CPU seems underrated. I always thought a lot of developers back in the day were coding it wrong.
I believe it is. You can do some pretty impressive-looking stuff with it if you know what you're doing. As for coding it wrong, I've played a very impressive slowdown removal hack of Gradius III that doesn't use the SA-1 (not to be confused with the more famous one that does), and unless they've fixed it recently, you basically can't play it on Arcade difficulty because the slowdown in the options menu is gone so you have to actually be Takahashi Meijin to tap the button fast enough to enable the secret mode...
NeoOne wrote: I heard some SNES game used slower memory with slower CPU speed which seems crazy now. I think Castlevania was one
Virtually all early SNES games did that, because it was cheaper.

The SNES CPU has funny timing. It's hardwired to access different regions of the memory map at different speeds. There are three types of bus access cycles: Fast (6 master clock cycles per CPU cycle), Slow (8 clocks per cycle), and XSlow (12 clocks per cycle).

Internal (non-bus) cycles are always Fast. Most MMIO access is Fast, except for the serial ports which are XSlow*. WRAM is Slow, annoyingly enough. ROM is Slow, unless you set bit 0 of $420D, in which case ROM in the top half of the memory map is Fast.

Real code is mostly a mix of Fast and Slow cycles; if you set $420D.0 and run your code from bank $80 or higher, it will be closer to 3.58 MHz, and if you don't it will be closer to 2.68 MHz. (Minus the DRAM refresh, of course, which steals ~3% of every scanline...)
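
The three speeds fall straight out of the master clock. This Python sketch assumes an NTSC master clock of about 21.477 MHz (not stated in this post) and ignores XSlow cycles and DRAM refresh; `effective_mhz` is a hypothetical helper that averages a mix of Fast and Slow cycles.

```python
MASTER_CLOCK_MHZ = 21.47727  # assumed NTSC master clock

CLOCKS_PER_CYCLE = {"Fast": 6, "Slow": 8, "XSlow": 12}

def effective_mhz(fast_fraction: float) -> float:
    """Average CPU rate when fast_fraction of cycles are Fast and the
    rest are Slow (no XSlow cycles, refresh ignored)."""
    average_clocks = (CLOCKS_PER_CYCLE["Fast"] * fast_fraction
                      + CLOCKS_PER_CYCLE["Slow"] * (1.0 - fast_fraction))
    return MASTER_CLOCK_MHZ / average_clocks

for name, clocks in CLOCKS_PER_CYCLE.items():
    print(f"{name}: {MASTER_CLOCK_MHZ / clocks:.2f} MHz")
print(f"half Fast, half Slow: {effective_mhz(0.5):.2f} MHz")
```

A real program lands somewhere between the Fast and Slow figures depending on how often it touches WRAM or SlowROM.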

Nintendo required the purchase of 120 ns ROM to allow the use of the "FastROM" setting. Otherwise you could cheap out and use 200 ns ROM. Understandably, the prevalence of FastROM increased over the commercial lifetime of the console. (Note that Rendering Ranger R², the game I linked above, used SlowROM despite being a late release; this makes it even more impressive IMO.)


* Fortunately you don't need them much, or at all if you're using standard controllers and you can just use the S-CPU's controller autoread feature, which runs in the background and doesn't steal cycles. This hasn't stopped Sega fanboys from claiming the SNES is capped at 1.79 MHz if a controller is plugged in (or was that guy just a troll?)...

aa-dav wrote: Sun Oct 26, 2025 7:37 am But I would call it semi-RISC. :)
Attempting RISC with 16 registers and 8-bit opcodes seems like a bit of a stretch; it reduces the instruction set a little too much...

I guess they made it work, and for a heck of a lot cheaper than the SVP...
Pokun
Posts: 3476
Joined: Tue May 28, 2013 5:49 am
Location: Hokkaido, Japan

Post by Pokun »

Sounds a bit RISCy...
turboxray
Posts: 402
Joined: Thu Oct 31, 2019 12:56 am

Post by turboxray »

93143 wrote: Sat Oct 25, 2025 9:44 pm I think somebody roughly estimated once that the 3.58 MHz S-CPU is probably equivalent to a 5-6 MHz 68000. Weaker overall than the Mega Drive CPU, but not nearly as much weaker as the clock speed difference suggests.
If it ran at 3.58 MHz. It's closer to 3.05 MHz, since its 65x design roots mean it touches RAM (the slower work RAM) in something like 90% of its instructions.


I did some comparisons between optimized object-object bound-box collision detection for the '816 and 68k - they were surprisingly close despite the clock speed difference.
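
For reference, this is the kind of test being compared, written as a Python sketch rather than '816 or 68k assembly. The `Box` layout (top-left corner plus width and height) and the brute-force pairing loop are assumptions for illustration, not the code that was benchmarked.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x: int   # left edge
    y: int   # top edge
    w: int   # width
    h: int   # height

def overlaps(a: Box, b: Box) -> bool:
    """Axis-aligned bounding box overlap test (touching edges do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)

def brute_force_hits(bullets: list[Box], enemies: list[Box]) -> list[tuple[int, int]]:
    """Check every bullet against every enemy; return (bullet, enemy) index pairs."""
    return [(i, j)
            for i, bullet in enumerate(bullets)
            for j, enemy in enumerate(enemies)
            if overlaps(bullet, enemy)]

bullets = [Box(10, 10, 4, 4), Box(100, 50, 4, 4)]
enemies = [Box(12, 8, 16, 16), Box(200, 200, 16, 16)]
print(brute_force_hits(bullets, enemies))
```

Each test is four comparisons, so the per-pair cost is dominated by how cheaply the CPU can load the coordinates.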

Also one for delta velocity updates. The famous Steve Snake had one on the 68k that used ADD.L (A0)+, (A1)+ ... but while that is really fast, it also had to be sorted, in a special format, etc. And it also doesn't take into account things like active and inactive objects in a level, etc (you typically don't want ALL off screen objects to be "active"). This is where the 65x design of the '816 pulls ahead or at least closer given the clock speed differences (because of the indexing modes.. and things like ldx arr,y ). Note: the ADD.L was for 16.16 fixed point numbers, which was overkill but fast. I used 16.8 + 8.8 in my version for '816, and I had a little optimization for signed numbers.
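
The two fixed-point formats mentioned above can be sketched in Python as follows. This models only the arithmetic (a 32-bit 16.16 add versus a 24-bit 16.8 position updated by a signed 8.8 velocity); the function names and the exact sign-extension approach are assumptions, not the original routines.

```python
def update_16_16(position: int, velocity: int) -> int:
    """position += velocity, both 16.16 fixed point in 32 bits
    (the single ADD.L case)."""
    return (position + velocity) & 0xFFFFFFFF

def sign_extend_16(value: int) -> int:
    """Interpret a 16-bit value as signed two's complement."""
    return value - 0x10000 if value & 0x8000 else value

def update_16_8(position: int, velocity_8_8: int) -> int:
    """position is 16.8 fixed point (24 bits); velocity is signed 8.8 (16 bits)."""
    return (position + sign_extend_16(velocity_8_8 & 0xFFFF)) & 0xFFFFFF

pos = 100 << 8                              # x = 100.0 in 16.8
print(hex(update_16_8(pos, 0x0180)))        # velocity +1.5
print(hex(update_16_8(pos, 0xFF80)))        # velocity -0.5
print(hex(update_16_16(10 << 16, 0x00018000)))  # 10.0 + 1.5 in 16.16
```

The narrower format saves memory and bus traffic per object, at the cost of handling the sign of the velocity explicitly.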


But yeah, optimized '816 code is most probably equivalent to a 68k in the 5-6 MHz range, compared with the Mega Drive's 7.67 MHz 68k. But that's one of the strengths of the 68k ISA: you can write sub-optimal code and still get really good results (I firmly believe this is why it was used in so many arcade machines: high clock rate + forgiving ISA). It's more resilient to poor code than the 65x design is.