SNES Doom Source Released! Now What?
Moderator: Moderators
Forum rules
- For making cartridges of your Super NES games, see Reproduction.
Re: SNES Doom Source Released! Now What?
Would any of these popcount algorithms be more efficient than a LUT?
-
- Posts: 611
- Joined: Mon Jan 23, 2006 7:47 am
- Location: Germany
- Contact:
Re: SNES Doom Source Released! Now What?
Those seems to be loops, LUTs, and some complicated-looking operations with larger registers.
A LUT (that fits into a CPU cache) is usually very fast, at the expense of space. Would the game be space-restrained?
A LUT (that fits into a CPU cache) is usually very fast, at the expense of space. Would the game be space-restrained?
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
Re: SNES Doom Source Released! Now What?
That's actually an issue. Without sketching out the surrounding code, I can't say how fast a LUT would be, because the Super FX has no data cache, so loading a byte takes five cycles no matter what. If I don't have four cycles of useful work to do in between setting R14 and reading the value, I'm wasting time waiting. RAM is worse because it doesn't buffer in parallel, and reading a byte is more inefficient because the 8-bit RISC instruction set prioritizes reading words; if my assumptions are correct, reading a byte from RAM consumes 8 cycles in 21 MHz mode regardless of what the surrounding code looks like.
I'm also pretty much out of registers, so anything I do that requires registers is going to take extra time. Loading an immediate byte is two cycles on Super FX, and an immediate word is three. The more unique values I need, the more extra time it will take (which means a popcount based on magic constants is going to be slow too).
We're seriously considering putting a three-layer compression algorithm in between the most time-critical part of the game and the data it needs. Yes, I'd say the game is space-constrained.Would the game be space-restrained?
Then again, a 256-byte LUT is not all that large, even if there's one in every bank (to avoid needing to switch banks, which costs cycles and registers). If it's part of the decompressor, I'd say it is what it is.
And there's still the possibility that the unconnected pin 21 of the GSU2 could turn out to be what nocash thought it was, which would permit 4 MB of ROM and ease things up considerably. It wasn't an official configuration, though, which makes me think twice about whether this would constitute cheating in the case of a port of a '90s-vintage game...
Re: SNES Doom Source Released! Now What?
My take on that one. If it would be a feasible option in the 90s, then I wouldn't consider it cheating. So in that sense, if connecting above mentioned pin would add the possibility of 4MB, I wouldn't consider it cheating.93143 wrote: ↑Sat Apr 16, 2022 7:19 am And there's still the possibility that the unconnected pin 21 of the GSU2 could turn out to be what nocash thought it was, which would permit 4 MB of ROM and ease things up considerably. It wasn't an official configuration, though, which makes me think twice about whether this would constitute cheating in the case of a port of a '90s-vintage game...
Adding raspberry pi 4 onto a chart to do all the heavy lifting, I'd consider cheating (while technically interesting).
Re: SNES Doom Source Released! Now What?
I'm leaning towards using the spare pin if it's what nocash thinks it is. The memory problem for this game in particular is quite severe, at least if the objective is to not have to make cuts.
But using RAM on the GSU's ROM bus to allow the S-CPU to change out the data would have been far too expensive back then, even discounting the substantial engineering required to make it work. That one's probably a trick for a different game...
Optional MSU1 music would be cool, as long as the game defaults to DSP music. Even apart from the better quality, it would allow the sound effects engine to use the full resources of the S-DSP without turning the music off. And if you try really hard, you can almost pretend it's a Voicer-kun (looping and two-player mode kinda ruin the illusion)...
MSU1 data, though, is hard to justify in this case. A few megabytes could potentially be defended in an emergency, but I'd rather not have to. The S-DD1 is period hardware and might work in a pinch...
But using RAM on the GSU's ROM bus to allow the S-CPU to change out the data would have been far too expensive back then, even discounting the substantial engineering required to make it work. That one's probably a trick for a different game...
Optional MSU1 music would be cool, as long as the game defaults to DSP music. Even apart from the better quality, it would allow the sound effects engine to use the full resources of the S-DSP without turning the music off. And if you try really hard, you can almost pretend it's a Voicer-kun (looping and two-player mode kinda ruin the illusion)...
MSU1 data, though, is hard to justify in this case. A few megabytes could potentially be defended in an emergency, but I'd rather not have to. The S-DD1 is period hardware and might work in a pinch...
Re: SNES Doom Source Released! Now What?
Well you can try to create a new wad format, to reduce memory footprint.
For example for the THINGS Lump in a wad is 10 bytes long (in DOS Doom standard), but you could adopt an "optimized" format (only 6 bytes):
Code: Select all
Things
Offset Size (bits) Description Format Example
0 16 x position int16_t -200
16 16 y position int16_t -128
32 8 Thing Type uint8_t 1
40 4 Angle facing 4bit BAM 0
34 4 Flags 4bit 7
Flags
bit hex description
0 0x01 Thing is on skill levels 1 & 2
1 0x02 Thing is on skill level 3
2 0x04 Thing is on skill levels 4 & 5
3 0x08 Thing is deaf
4 0x0010 Thing is not in single player <- drop to fit 1 byte
Thing type:
Doom, Doom II, Final Doom contain 123 different THINGS (https://doom.fandom.com/wiki/Thing_types or https://doomwiki.org/wiki/Thing_types)
There is room to optimize other Lumps of the map format, too
Re: SNES Doom Source Released! Now What?
I could probably take some cues from Randy's port - he managed to jam nearly the whole game into just 2 MB.
If the extra pin enables 4 MB and the texture compression works really well, it might be possible to include the level data in the Super FX ROM as well as the CPU ROM, eliminating the need to transfer geometry data to the Super FX RAM. Space saving on level data would then become more important. If it's only on the CPU side, I'm reasonably confident that everything will fit, because you've got up to 6 MB (or 18 MB with an S-DD1) that doesn't have to contain texture data. The original game was about 10-11 MB depending on version, and supposedly over half of it was texture data. You'd just have to be careful not to blow the ROM limit with unrolled loops and lookup tables (and even that's probably impossible if an S-DD1 can safely share a cart with a GSU)...
The above discussion is predicated on the S-CPU being powerful enough to preprocess level data in a heavily rethought Doom-equivalent engine, and send the result up to GPRAM. if it isn't, then either the largest single level has to fit in GPRAM in its entirety alongside the framebuffers and working data, or all of them have to go in GPROM whether the extra pin is real or not. Or, technically, a small amount of U-ROM could be implemented - 64 KB might be enough, and it wouldn't break the bank like 2 MB would - but it's still a hit to authenticity and feels a lot more like cheating than just connecting an existing unused pin does.
If the extra pin enables 4 MB and the texture compression works really well, it might be possible to include the level data in the Super FX ROM as well as the CPU ROM, eliminating the need to transfer geometry data to the Super FX RAM. Space saving on level data would then become more important. If it's only on the CPU side, I'm reasonably confident that everything will fit, because you've got up to 6 MB (or 18 MB with an S-DD1) that doesn't have to contain texture data. The original game was about 10-11 MB depending on version, and supposedly over half of it was texture data. You'd just have to be careful not to blow the ROM limit with unrolled loops and lookup tables (and even that's probably impossible if an S-DD1 can safely share a cart with a GSU)...
The above discussion is predicated on the S-CPU being powerful enough to preprocess level data in a heavily rethought Doom-equivalent engine, and send the result up to GPRAM. if it isn't, then either the largest single level has to fit in GPRAM in its entirety alongside the framebuffers and working data, or all of them have to go in GPROM whether the extra pin is real or not. Or, technically, a small amount of U-ROM could be implemented - 64 KB might be enough, and it wouldn't break the bank like 2 MB would - but it's still a hit to authenticity and feels a lot more like cheating than just connecting an existing unused pin does.
Re: SNES Doom Source Released! Now What?
The Doom 1 full wad is a little over 11MB. But for each level you only need to load a subset of the 11MB obviously.
That’s also what PC doom does, only load the required subset data into RAM.
Besides texture compression there are certain other optimization potentials on the doom level format:
- Texture references for walls, floors, ceilings are in 8byte (8 character) format. You could replace this with a UINT16 reference, reducing the need by 6bytes for each wall in level. Of course then you have a 64k max textures limit, but I don’t think Doom uses that many. Also this is what they went for in Doom64 for the N64.
- Flags and types entries in Doom are mostly 16bit, but these are unnecessarily large, and 8bit entries could be used. Example the Things type. It’s a 16bit entry originally. But Doom, Doom II, Final Doom use only 123 different THINGS (https://doom.fandom.com/wiki/Thing_types). So you can easily just use an 8bit entry. You have however convert some numbers compared to the original doom format. A 'conversion' of the type number would be fairly simple, however:
Type Number < 100 -> keep
2000 ≤ Type Number ≤ 3000 -> remove 1st two digits and replace with 1 (Type Number = Type Number - 2000 + 100). Example: 2019 (Megaarmor) -> 119
3000 < Type Number -> remove 1st two digits and replace with 2 (Type Number = Type Number - 2800). Example: 3006 (Lost soul) -> 206
- the SEGS entry in a map lump als has huge saving potentials in my opinion. The original format is 12bytes per entry. But I guess you could use an alternative format using 6bytes only (you’d add some performance hits however).
Idea: Drop Starting and Ending Vertex Info and get it via the Linedef Number. The direction indicates if Start and End need to be reversed (compared to Linedef).
But you can also Drop Direction and "encode" it in Linedef number. An opposite direction would be a negative linedef number. So -623 would mean line 623 opposite direction.
So you only keep angle, linedef and offset entry which is 6bytes.
Last but not least the blockmap entry in the maps lump can be compressed with the use ZokumBSP for example: https://github.com/zokum-no/zokumbsp
Just some ideas.
That’s also what PC doom does, only load the required subset data into RAM.
Besides texture compression there are certain other optimization potentials on the doom level format:
- Texture references for walls, floors, ceilings are in 8byte (8 character) format. You could replace this with a UINT16 reference, reducing the need by 6bytes for each wall in level. Of course then you have a 64k max textures limit, but I don’t think Doom uses that many. Also this is what they went for in Doom64 for the N64.
- Flags and types entries in Doom are mostly 16bit, but these are unnecessarily large, and 8bit entries could be used. Example the Things type. It’s a 16bit entry originally. But Doom, Doom II, Final Doom use only 123 different THINGS (https://doom.fandom.com/wiki/Thing_types). So you can easily just use an 8bit entry. You have however convert some numbers compared to the original doom format. A 'conversion' of the type number would be fairly simple, however:
Type Number < 100 -> keep
2000 ≤ Type Number ≤ 3000 -> remove 1st two digits and replace with 1 (Type Number = Type Number - 2000 + 100). Example: 2019 (Megaarmor) -> 119
3000 < Type Number -> remove 1st two digits and replace with 2 (Type Number = Type Number - 2800). Example: 3006 (Lost soul) -> 206
- the SEGS entry in a map lump als has huge saving potentials in my opinion. The original format is 12bytes per entry. But I guess you could use an alternative format using 6bytes only (you’d add some performance hits however).
Code: Select all
SEGS entry
Offset Bytes Description Idea
0 2 Starting vertex number (drop?)
2 2 Ending vertex number (drop?)
4 2 Angle
6 2 Linedef number "encode" direction.
8 2 Direction 1 byte (or drop)
10 2 Offset
But you can also Drop Direction and "encode" it in Linedef number. An opposite direction would be a negative linedef number. So -623 would mean line 623 opposite direction.
So you only keep angle, linedef and offset entry which is 6bytes.
Last but not least the blockmap entry in the maps lump can be compressed with the use ZokumBSP for example: https://github.com/zokum-no/zokumbsp
Just some ideas.
Last edited by M2m on Fri Apr 22, 2022 7:20 pm, edited 3 times in total.
-
- Posts: 18
- Joined: Thu Oct 10, 2013 3:30 pm
Re: SNES Doom Source Released! Now What?
93143 wrote: ↑Thu Mar 31, 2022 3:24 am There's a renderer for Mega Drive called PortalView, that uses "sorted linedefs". I'm imagining something where the linedefs associated with a subsector are ordered clockwise or something, so you can stop searching when you run off the edge of the portal. With subsectors of arbitrary convex geometry, you might want to be able to merge multiple portals opening into the same subsector so you don't have to process that subsector multiple times, but if you could come up with a visible linedef quickly and just search left and right until you get an occluded vertex, it might not be so bad. And a split portal is really two portals anyway, so maybe it's good to handle a split sector twice. It's hard to see how you could easily pick a linedef that can be seen through the portal, though - perhaps the portal data could suggest a likely candidate, resulting in a shorter search...
I'm the author of the megadrive portal engine you mention. I haven't read through the entire discussion, but I figured I could give a quick explanation of the sorted linedef idea. As far as I know, KK/Altair came up with this idea for Dread, a doom "clone" on Amiga.
Essentially, it's a merging of the visible surface determination algorithm used in build, and quake's pre-calculated PVS.
Build uses portals to search through the game world for rendering walls, but it uses concave sectors. This allows them to use far less sectors than a naive portal engine, which reduces clipping and transformation costs, but it also means that objects/walls can potentially occlude other walls in the same sector. It uses a complex test to determine which is in front, and this problem is solvable in 2D but not in 3D, because cyclic overlap cannot happen in 2D. Interestingly, this problem is why Ken Silverman didn't finish a 3D engine to compete with quake after build.
Anyway, the sorted linedef idea is to calculate all potentially visible walls for each sector, ala quake style PVS, and then sort those linedefs such that from any position in the sector, any wall B drawn after wall A cannot occlude wall A. There should always be a valid sorting order. As far as I can tell, this is probably the fastest way to handle visibility in "2.5D" worlds.
Re: SNES Doom Source Released! Now What?
Just came here to say, that your MD Doom "clone" is the most amazing thing, I have seen in 2021/22. Keep it up !ehaliewicz wrote: ↑Fri Apr 22, 2022 6:21 pm I'm the author of the megadrive portal engine you mention.
Re: SNES Doom Source Released! Now What?
Oh hey. Nice work.ehaliewicz wrote: ↑Fri Apr 22, 2022 6:21 pmI'm the author of the megadrive portal engine you mention.
Wow, I was way off.Anyway, the sorted linedef idea is to calculate all potentially visible walls for each sector, ala quake style PVS, and then sort those linedefs such that from any position in the sector, any wall B drawn after wall A cannot occlude wall A. There should always be a valid sorting order. As far as I can tell, this is probably the fastest way to handle visibility in "2.5D" worlds.
Are you using convex sectors or subsectors? I don't see how a single sorted list could handle a concave sector. The whole problem with concave sectors is that the Z order of potentially mutually-occluding linedefs is different when viewed from different parts of the sector. Isn't it?
Also, does this scheme require more storage than something like the Doom engine? I imagine it needs more than a Build-like engine at least.
It strikes me that this scheme pretty much gets rid of a significant chunk of the geometry preprocessing I was expecting the S-CPU to have to do in order to avoid needing to store level data in Super FX ROM. Frustum culling would still be helpful, I suppose, although it requires a lot of multiplication...
Funny - it had occurred to me that one could store per-subsector (or per-portal) lists of potentially visible subsectors, but I didn't think of taking it to the linedef level and I never considered the possibility that that's what you meant. I'm not familiar with the Quake engine; it seemed a bit too advanced to be relevant to Doom, but clearly that's not quite true...
-
- Posts: 18
- Joined: Thu Oct 10, 2013 3:30 pm
Re: SNES Doom Source Released! Now What?
Yes, this requires you to break down concave sectors to convex. Build gets away with concave sectors because it only ever calculates this order from a single view-point, at runtime.93143 wrote: ↑Fri Apr 22, 2022 8:27 pm Wow, I was way off.
Are you using convex sectors or subsectors? I don't see how a single sorted list could handle a concave sector. The whole problem with concave sectors is that the Z order of potentially mutually-occluding linedefs is different when viewed from different parts of the sector. Isn't it?
Yet, it takes quite a bit of space, but it can be in ROM since it never needs to be updated.
Yes, quake stores lists of potentially visible sectors, per-sector, so it's not quite as fine grained. It's also not pre-sorted in the same kind of order (because it's not possible in 3D).93143 wrote: ↑Fri Apr 22, 2022 8:27 pm It strikes me that this scheme pretty much gets rid of a significant chunk of the geometry preprocessing I was expecting the S-CPU to have to do in order to avoid needing to store level data in Super FX ROM. Frustum culling would still be helpful, I suppose, although it requires a lot of multiplication...
Funny - it had occurred to me that one could store per-subsector (or per-portal) lists of potentially visible subsectors, but I didn't think of taking it to the linedef level and I never considered the possibility that that's what you meant. I'm not familiar with the Quake engine; it seemed a bit too advanced to be relevant to Doom, but clearly that's not quite true...
Re: SNES Doom Source Released! Now What?
Okay, that makes sense. It also seems to match what I saw in that YouTube video, where the portals were actually shown on screen to demonstrate how the renderer worked...ehaliewicz wrote: ↑Fri Apr 22, 2022 8:39 pmYes, this requires you to break down concave sectors to convex. Build gets away with concave sectors because it only ever calculates this order from a single view-point, at runtime.
That might still turn out to be an issue. Doom has a lot of geometry, and according to another poster the level data in the IWAD takes up nearly 4 MB already. Then again, the fact that the existing SNES port fits most of the levels in 2 MB along with the entire rest of the game, without nerfing the geometry like the Jaguar-derived ports, suggests that the PC version might have been a little expansive with its level data format.it takes quite a bit of space, but it can be in ROM since it never needs to be updated.
Well, I guess there's no way to know without doing it. At this stage (basically I want to do this, but don't have the time), it's probably wise to keep my options open...
-
- Posts: 18
- Joined: Thu Oct 10, 2013 3:30 pm
Re: SNES Doom Source Released! Now What?
Well just storing potentially visible subsectors will probably help already, and it is very compressible (most sectors will not be visible from any given sector). The only problem is that it wouldn't provide a near to far sort order, you'll have to traverse a bsp tree or portal graph to determine the correct sorting order while checking if the sectors you're visiting are in the pvs.
One potential problem is that doom maps are less tightly constrained than quake maps, which often contain 90 degree angle tunnels and such to cut off visibility for the pvs generator.
One potential problem is that doom maps are less tightly constrained than quake maps, which often contain 90 degree angle tunnels and such to cut off visibility for the pvs generator.
Re: SNES Doom Source Released! Now What?
Is there a good way to do coarse frustum culling without having to transform every vertex?
I guess if you want very coarse culling you could just check if either vertex of a linedef is in either of the quadrants the view frustum is in. This would only require a single coordinate comparison per vertex, and no multiplication.
I guess if you want very coarse culling you could just check if either vertex of a linedef is in either of the quadrants the view frustum is in. This would only require a single coordinate comparison per vertex, and no multiplication.