Every good programming technique
Moderator: Moderators
Forum rules
- For making cartridges of your Super NES games, see Reproduction.
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Every good programming technique
I do profile my code, it's just that there aren't that many large optimizations left.
Re: Every good programming technique
Actually, the iteration itself should be slightly (but pretty much negligibly) faster when using a linked list, because you can skip the step of testing whether each slot is empty, since all slots in the list are guaranteed to be occupied.rainwarrior wrote:The worst case performance is probably going to be when all the slots are full, too. The linked list as an optimization isn't effective on this case, too, unless you're iterating partial lists many times per frame
What about collision detection? That might be an issue if you have many active objects at once. I got a bit of a performance boost on the NES by reducing the number of 16-bit operations: the only 16-bit operation is calculating the distance between the 2 objects, which must be smaller than 256 for a collision to even be considered, and then the radiuses of the objects are subtracted from this distance using 8-bit math. This particular optimization might not apply to the SNES, which uses a 16-bit CPU, but the point is that sometimes optimization is about doing things in a different way, not in the same way but with the optimal sequence of instructions.psycopathicteen wrote:it's just that there aren't that many large optimizations left.
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Every good programming technique
In my current project, the "empty object" variable is actually a "type of object" index to a jump table, so technically it doesn't actually test for emptiness explicitly. ;P Not that this matters. I'm not trying to argue for or against a linked list (they have tons of useful applications), I'm trying to talk about optimization.tokumaru wrote:Actually, the iteration itself should be slightly (but pretty much negligibly) faster when using a linked list, because you can skip the step of testing whether each slot is empty, since all slots in the list are guaranteed to be occupied.
For example: I think the most important word in your response was "negligibly". If an "emptiness test" (or equivalent) and loop takes, say, 12 more cycles than the linked list, that savings is indeed negligible until you're in a situation where you have to do that test enough times that it matters. The question that is absolutely critical here is: "How many objects are empty, and how many times do we iterate through the objects?" I keep using the word specific because the answer to this question is a specific one; every situation is different.
The big point I was trying to make is that optimizing for the general case won't help. If you're trying to fix a lag frame, you need to tackle the specific things that happened in that lag frame. Optimizing how empty objects are handled usually benefits frames where the load on the system is already low and not having a performance problem.
So, my suggestion is:
1. Profile the worst case. Make sure that you have a way to identify the lag frames specifically. If your profiling tool can already sort out frames by total cycles spent, or even if you need to write code that identifies lag frames specifically (e.g. if you have an NMI counter, you can check at the end of the frame whether it's already been incremented), do it. Dump and profile trace logs of the specific frames that need optimization.
Presuming that you're already doing effective profiling and you've identified what's taking the most time, I guess I would just tell you to start with the largest thing you can find, and work down from there. I obviously can't make specific optimization suggestions without knowing all the details of your project, and being able to see the code, trace logs, and profiling information that you've got, so I'm not even going to try to speculate that it might be this or that thing. If your profiling tools are in place, that's the most important step; measuring performance is the most effective way to improve it.psycopathicteen wrote:there aren't that many large optimizations left
With all that out of the way, if it's not worth optimizting what you've got any more, here are some alternative approaches:
2. Change of algorithm. Sometimes there's a different way to accomplish the same thing. Gratuitous example: the advantages of an online sorting algorithm when you're maintaining a list that doesn't change much from frame to frame, versus an offline sorting algorithm which performs much better when sorting a whole list from scratch. Only you know what your game does, so I can't just make suggestions like "use algorithm X because it's better"; it always depends what you're doing with it.
3. Load balancing. If you don't have other options, consider making some object types only update every second frame (and coordinate alternation with others), or otherwise distribute the performance load over multiple frames. Look for behaviours where you don't think the player would notice if one of the objects was running at 30 hz instead of 60. This doesn't have to be all or nothing; for example a bullet might move and collide with the player at 60 hz, but only test for collision with a wall at 20 hz if it doesn't really matter if they stop with frame perfect accuracy.
4. Load reduction. Change the design of the area that produces lag to do less stuff. Fixing bad performance is very often not really the programmer's domain. There are plenty of other places to fix it. Don't put so many performance heavy objects in the same room.
5. Doing something else. This is really just a combination of 4 and 2, but if one type of enemy behaviour is really killing performance, maybe try to make something different that doesn't. There's probably another idea you could use in its place that would perform better but still be effective.
These are all very real and very practical was of optimizing, even though each of them may change the functionality of the code. A good algorithmic change is worth N2 peephole optimizations.
Re: Every good programming technique
Yeah, you can quickly tell when a slot is empty after loading the "type" variable by branching away if it's 0, for example, but with a linked list you wouldn't have that BEQ instruction at all. This is why I said that the difference is minimal, but I thought it was important to point out that linked lists shouldn't be considered slower when it comes to iterating through objects (i.e. this is not where the problem is). In my specific case, my current design actually uses both approaches: I used linked lists when updating the objects, but later I scan all slots (in random order) to render sprites, and I actually have to check a variable to tell whether each object needs drawing or not.rainwarrior wrote:In my current project, the "empty object" variable is actually a "type of object" index to a jump table, so technically it doesn't actually test for emptiness explicitly. ;P
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Every good programming technique
I implemented region based level-object spawning, and that helped a lot. I think I'll just leave it the way it is for now. It's not really noticeable unless you're using a CPU meter.
Re: Every good programming technique
That sounds like your real problem was just spawning way too many objects at once.
That spawning a single object takes up 11 scanlines sounds kind of worrisome however. I should probably profile the spawning code in my object managers before complaining though.
That spawning a single object takes up 11 scanlines sounds kind of worrisome however. I should probably profile the spawning code in my object managers before complaining though.
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Every good programming technique
It took 11 scanlines total to look through every object in the level, check bounds and spawn objects in range.Sik wrote:That sounds like your real problem was just spawning way too many objects at once.
That spawning a single object takes up 11 scanlines sounds kind of worrisome however. I should probably profile the spawning code in my object managers before complaining though.
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Every good programming technique
I just thought about another good programming practice: optimize you're code for consistency between frames. Make it so the amount of CPU time doesn't vary wildly from one frame to the next.
Take for instant, my animation engine does not give a consistent amount of CPU time per frame, because it depends on when a character is updated and what ratio of 16x16 cells are updated compared to 32x32 cells. I don't feel like fixing it now, but I'm thinking about ways to do so. I think it would be a good idea to have a 32x16 slot mode, so 2 16x16s can share a slot without extra overhead. Also, it might be beneficial to search through the list of VRAM slots for open 32x32 and 16x16 (and possibly 32x16) slots before hand. Heck, I might possibly get rid of 16x16 slots if they prove to be unnecessary with 32x16 slots.
EDIT: Just checked my animation engine again, and it was surprisingly consistent, so just ignore what I just said.
Take for instant, my animation engine does not give a consistent amount of CPU time per frame, because it depends on when a character is updated and what ratio of 16x16 cells are updated compared to 32x32 cells. I don't feel like fixing it now, but I'm thinking about ways to do so. I think it would be a good idea to have a 32x16 slot mode, so 2 16x16s can share a slot without extra overhead. Also, it might be beneficial to search through the list of VRAM slots for open 32x32 and 16x16 (and possibly 32x16) slots before hand. Heck, I might possibly get rid of 16x16 slots if they prove to be unnecessary with 32x16 slots.
EDIT: Just checked my animation engine again, and it was surprisingly consistent, so just ignore what I just said.
Last edited by psycopathicteen on Tue Apr 19, 2016 3:51 pm, edited 2 times in total.
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Every good programming technique
That's not an optimization. The only way to do this is make cases that could be doing less work waste extra time just to be consistent. This is the opposite of optimization.psycopathicteen wrote:I just thought about another good programming practice: optimize you're code for consistency between frames. Make it so the amount of CPU time doesn't vary wildly from one frame to the next.
See also load balancing mentioned above. If you have longer and shorter updates for each object, you can do things to coordinate other objects so that they don't all do their long updates on the same frame.
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Every good programming technique
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Every good programming technique
There was no a misunderstanding there. What you're proposing would eliminate branches and then always take 10 scanlines, instead of sometimes taking 10 scanlines. Again, not an optimization. You're slowing down the best case to match the worst case.
Branchless code on small scales has some important optimization potential on systems with caching and pipelining, but this has nothing to do with 6500 family architecture, which has neither of those things.
All you're doing is creating predictable timing, which is a goal you could have, but I don't understand the purpose of it here. The usual use case for that is raster effects, but really only because the NES has no built-in timer. It has nothing to do with optimization. Why do you think it's an advantage, otherwise?
Branchless code on small scales has some important optimization potential on systems with caching and pipelining, but this has nothing to do with 6500 family architecture, which has neither of those things.
All you're doing is creating predictable timing, which is a goal you could have, but I don't understand the purpose of it here. The usual use case for that is raster effects, but really only because the NES has no built-in timer. It has nothing to do with optimization. Why do you think it's an advantage, otherwise?
-
psycopathicteen
- Posts: 3001
- Joined: Wed May 19, 2010 6:12 pm
Re: Every good programming technique
I never said anything about keeping the worst case scenario the same.
- rainwarrior
- Posts: 8062
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Every good programming technique
But how would you propose to keep all the functionality that the worse case needs while reducing the time it takes?
Re: Every good programming technique
By spreading it over more frames.
Example from the NES scrolling code I'm working on: the old routine unpacks a 32-pixel-wide column at a time. In terms of cycles per byte it's easily the faster of the two. The new routine only unpacks an 8-pixel wide column at a time. It takes more cycles per byte, because it has to perform the same metatile decoding more than once, but it unpacks fewer bytes at a time.
When scrolling at 8 pixels per frame, the first routine is idle for three frames and spikes wildly every fourth. The second routine takes the same amount of time every frame. The total work over four frames is higher, but the worst case is significantly lower. If I only need to scroll 8 pixels per frame, the second routine will cause fewer lag frames despite being less efficient overall.
Example from the NES scrolling code I'm working on: the old routine unpacks a 32-pixel-wide column at a time. In terms of cycles per byte it's easily the faster of the two. The new routine only unpacks an 8-pixel wide column at a time. It takes more cycles per byte, because it has to perform the same metatile decoding more than once, but it unpacks fewer bytes at a time.
When scrolling at 8 pixels per frame, the first routine is idle for three frames and spikes wildly every fourth. The second routine takes the same amount of time every frame. The total work over four frames is higher, but the worst case is significantly lower. If I only need to scroll 8 pixels per frame, the second routine will cause fewer lag frames despite being less efficient overall.
Re: Every good programming technique
Maybe there's a lot of testing and branching that only saves time if there's less work to do.rainwarrior wrote:But how would you propose to keep all the functionality that the worse case needs while reducing the time it takes?