Nicole wrote:It is possible to access all of WRAM through the WRAM registers as well ($2180-2183), though that's not really "direct access".
Yeah, that's damn near useless.
Stef wrote:Indeed, it completely rewrites the sprite table during forced blanking in the middle of the screen
That's pretty impressive... 96 objects really does seem excessive for any of the Sonic games though, it's not like it's Contra.
psycopathicteen wrote:1) The direct page and stack can only be located within bank 0.
I don't care about the stack, but the deal with direct page is a pretty big one. If you're loading from the other 120KB of ram, you're stuck with only using the data in bank $00. I'll probably just DMA some data to ram on startup. Object palettes are an example, because my routine needs access to them and it's not like the palettes are going to take an unrealistic amount of space. Plus, the data is rewritable. Donkey Kong Country has about 64 (Edit: I remembered that wrong, it's more like 128) sprite palettes total. Metal Slug probably has no more than twice that. (It's insane though, it appears Metal Slug does some sort of dynamic palette updating. Why? It never goes past 64 onscreen palettes, ever.)
Revenant wrote:if they weren't aiming for similarity with the NES then they could have used a different processor entirely.
It seems that this is the only reason the 65816 was ever used. I don't know anything else that used it except the Apple IIGS, and it used it there for Apple II compatibility.
I don't think the SNES's problem is the processor itself, but rather, the ram that's not even as fast as the CPU that's not exactly known for its horsepower (although I think its lack of power is exaggerated, but that's another story). The memory mapping is silly (like I said, you can't load from any ram past the first 8KB and rom at the same time, it's like, why did they even put the other 120KB of ram in there to begin with?) and the communication with the SPC700 is really poor in that it takes too much CPU time to upload a reasonable amount of data. I know I'm going far overboard (my animation scheme and my palette changing stuff) but I find that most of the CPU time is spent making up for the PPU's shortcomings. I know that's ridiculous to say, but I can honestly see just about any arcade game from the time period running at the speed they do with the 65816, but there are so many background layers and animation and palettes and sprites that unless you're doing all this crap to compensate with the CPU (like sprite multiplexing, dynamic animation, etc.), which would potentially slow down the CPU too much, the PPU won't cut it.
tokumaru wrote:but from the stuff I read here it looks like you have a lot of complexity related to sprites and patterns.
Basically the only thing I've been doing in terms of programming has been object or DMA to VRAM related. 32 bytes was kind of preposterous though. I'm at something like 26 bytes from all my routines, but I haven't even implemented any AI or physics stuff yet. I'll probably need at least 48 bytes, which with 96 objects (probably what I'd settle on. Maybe higher later, but 128 seems a little ridiculous) would be 4.5KB. However, say I want 64 bytes and 128 objects, I'm using the entire 8KB.
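To double check that arithmetic, here is a quick Python sketch. The helper name `object_table_bytes` and the constant `LOW_RAM_BYTES` are made up for illustration; the only assumption is that every object gets a fixed-size slot and the whole table has to fit in the first 8KB.

```python
# Hypothetical helper: RAM used by a table of fixed-size object slots.
def object_table_bytes(bytes_per_object, object_count):
    return bytes_per_object * object_count

# The first 8KB of ram, the part discussed above as reachable from bank $00.
LOW_RAM_BYTES = 8 * 1024

for size, count in [(48, 96), (64, 128)]:
    total = object_table_bytes(size, count)
    left = LOW_RAM_BYTES - total
    print(f"{size} bytes x {count} objects = {total} bytes "
          f"({total / 1024:.1f}KB), {left} bytes of low ram left")

# 48 bytes x 96 objects = 4608 bytes (4.5KB), 3584 bytes of low ram left
# 64 bytes x 128 objects = 8192 bytes (8.0KB), 0 bytes of low ram left
```

So the 64 byte, 128 object case leaves nothing in the first 8KB for the stack, direct page or anything else.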
I think I found out my battle plan though. Like I said earlier, whatever variables are only used by routines that don't need any information outside of the first bank will go past the first 8KB of ram. Whatever else will be in the first 8KB. Luckily for me, a lot of variables aren't (or aren't expected to be) touched by the actual object code. For example, I have X and Y position variables, but they are just for the total level. Then, I have onscreen X and Y position variables that a routine generates, and then these are used by the metasprite routine and other stuff. Non-onscreen X and Y are really only for object code, while onscreen is for everything else. What's nice (I want to make my engine as all-purpose as possible, just swap out a few routines for different types of games) is that I can have a simple "subtract x and y by camera x and y" or I can do something more complicated that uses multiplication and whatnot for a mode 7 racer.
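The simple "subtract camera" version of that routine could look roughly like this in Python. This is only a sketch: the function names are hypothetical, and it assumes 16-bit positions that wrap the way a 16-bit subtraction on the 65816 would, with the result read as signed so that objects left of or above the camera come out negative.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed 16-bit number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def onscreen_position(level_x, level_y, camera_x, camera_y):
    """Level (world) position minus camera position, as signed 16-bit."""
    return (to_signed16(level_x - camera_x),
            to_signed16(level_y - camera_y))

# Object to the right of and below the camera origin.
print(onscreen_position(0x0120, 0x0080, 0x0100, 0x0040))  # (32, 64)

# Object 16 pixels to the left of the camera: negative onscreen X.
print(onscreen_position(0x00F0, 0x0080, 0x0100, 0x0040))  # (-16, 64)
```

A mode 7 racer would swap `onscreen_position` for something that does the multiplication, while everything downstream (the metasprite routine and so on) keeps reading the same onscreen variables.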
Edit: (It's not worth double posting for this) Is "bmi" not the same thing as "and #$8000, bne"? It's not giving me the same result. I've been trying to speed up my metasprite routine, and I found that this was something minor I could do to help.
Also, what is the difference between "and" and "bit"? They both seem to do the exact same thing, except "bit" affects an additional "V" flag, which is the "overflow" flag. I didn't know what that meant, so I looked it up, and I still don't get it.
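For reference, here is a rough Python model of the flag behaviour being asked about. It is only a sketch under stated assumptions: a 16-bit accumulator, and a `Flags16` class with method names invented for illustration (not taken from any real emulator or assembler). It models two things: "bmi" tests the N flag as left by whichever instruction last changed it, while "and #$8000" followed by "bne" recomputes the test from A (and overwrites A in the process); and "bit" with a memory operand sets Z from A AND operand, but copies N and V straight from bits 15 and 14 of the operand and leaves A alone. An instruction such as "ldx" sitting between the load and the "bmi" is one possible reason the two branches disagree; that is a guess, since the surrounding routine isn't shown.

```python
class Flags16:
    """Tiny model of the 65816 N, V and Z flags, 16-bit accumulator only."""

    def __init__(self):
        self.a = 0
        self.n = self.v = self.z = False

    def _set_nz(self, value):
        self.n = bool(value & 0x8000)      # N = bit 15 of the result
        self.z = (value & 0xFFFF) == 0     # Z = result is zero

    def lda(self, value):
        self.a = value & 0xFFFF
        self._set_nz(self.a)

    def ldx(self, value):
        # X itself is not modelled; only its effect on N and Z matters here.
        self._set_nz(value)

    def and_imm(self, mask):
        self.a &= mask                     # "and" stores the result in A
        self._set_nz(self.a)

    def bit_mem(self, operand):
        # "bit" with a memory operand: A is not modified.
        self.z = (self.a & operand & 0xFFFF) == 0
        self.n = bool(operand & 0x8000)    # N = bit 15 of the operand
        self.v = bool(operand & 0x4000)    # V = bit 14 of the operand

    def bmi_taken(self):
        return self.n

    def bne_taken(self):
        return not self.z


cpu = Flags16()
cpu.lda(0x8123)          # A has bit 15 set, so N is set here...
cpu.ldx(0x0004)          # ...but this load clears N again
print(cpu.bmi_taken())   # False
cpu.and_imm(0x8000)      # tests A directly, but A is now $8000
print(cpu.bne_taken())   # True

cpu.lda(0x00FF)
cpu.bit_mem(0x4000)
print(cpu.z, cpu.n, cpu.v, hex(cpu.a))   # True False True 0xff
```

In this model the V flag after "bit" is just a copy of one bit of the operand, not an arithmetic overflow. As far as I know, "bit" with an immediate operand only changes Z, which the sketch does not cover.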