I'll attempt a more concise [er, longer it seems] version, covering some details I know you know but for the benefit of others.
The PPU has long and short frames. A long frame is 341*262 = 89342 PPU clocks, or 29780 2/3 CPU clocks. If rendering is disabled, every frame is long. Looking at where the CPU cycles fall on the first scanline, their position cycles through three alignments:
0--1--2-- frame 1
-0--1--2- frame 2
--0--1--2 frame 3
0--1--2-- frame 4
-0--1--2- frame 5
--0--1--2 frame 6
...
If rendering is enabled, you get an alternation between long and short frames (a short frame is one PPU clock shorter than a long one). This causes the position of the CPU cycles on the first scanline to toggle between only two positions:
0--1--2-- frame 1
-0--1--2- frame 2
0--1--2-- frame 3
-0--1--2- frame 4
0--1--2-- frame 5
-0--1--2- frame 6
...
So your code is delaying one extra CPU cycle every other frame, and the PPU is skipping a pixel every other frame. When you delay an extra clock, you effectively move your image three pixels to the right. On long PPU frames, the PPU effectively moves the image to the left by one pixel, and on short frames, to the left by two pixels. You have a choice as to which PPU frames you delay the extra CPU clock on, either short or long. You want to delay the extra clock, and so move your image right three pixels, on the frames where the PPU moves it two to the left, so that the net result is a shift of only one pixel to the right.
If you do it wrong, you'll move the image three pixels right when the PPU moves it only one to the left, for a net shift of two to the right. Then on the next frame the PPU will move it two to the left, and you'll get much more noticeable shaking:
0--1--2-- frame 1
--0--1--2 frame 2
0--1--2-- frame 3
--0--1--2 frame 4
0--1--2-- frame 5
--0--1--2 frame 6
...
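For anyone who wants to check the arithmetic, here is a rough Python sketch of the bookkeeping above. It is only a model, not emulator code: the function names (`cpu_alignment`, `image_positions`) are made up for this post, and the per-frame shift values (-1 on long frames, -2 on short frames, +3 for the extra CPU clock) are taken straight from the description above rather than measured.

```python
DOTS_PER_SCANLINE = 341
SCANLINES = 262
LONG_FRAME = DOTS_PER_SCANLINE * SCANLINES  # 89342 PPU clocks
SHORT_FRAME = LONG_FRAME - 1                # one PPU clock shorter
DOTS_PER_CPU_CYCLE = 3                      # 3 PPU clocks per CPU clock


def cpu_alignment(frame_lengths, start=0):
    """PPU-clock offset (0..2) of the first CPU cycle in each frame."""
    offsets = []
    offset = start
    for length in frame_lengths:
        offsets.append(offset)
        # Position of the first CPU cycle at or after the end of this frame,
        # measured from the start of the next frame.
        offset = (offset - length) % DOTS_PER_CPU_CYCLE
    return offsets


def image_positions(frame_is_short, extra_delay_on_short):
    """Cumulative image position in pixels (right is positive) per frame.

    Assumption from the text: the PPU shifts the image 1 left on long
    frames and 2 left on short frames; an extra CPU clock shifts it 3 right.
    """
    positions = []
    pos = 0
    for is_short in frame_is_short:
        ppu_shift = -2 if is_short else -1
        code_shift = 3 if is_short == extra_delay_on_short else 0
        pos += ppu_shift + code_shift
        positions.append(pos)
    return positions


if __name__ == "__main__":
    print(divmod(LONG_FRAME, DOTS_PER_CPU_CYCLE))          # (29780, 2): 29780 2/3 CPU clocks
    print(cpu_alignment([LONG_FRAME] * 6))                 # rendering disabled
    print(cpu_alignment([LONG_FRAME, SHORT_FRAME] * 3))    # rendering enabled
    frames = [False, True] * 3                             # long, short, long, ...
    print(image_positions(frames, extra_delay_on_short=True))   # 1 pixel of jitter
    print(image_positions(frames, extra_delay_on_short=False))  # 2 pixels of jitter
```

With all long frames the alignment runs 0, 1, 2, 0, 1, 2; with alternating frames it runs 0, 1, 0, 1, 0, 1. In the shift model, putting the extra clock on the short frames leaves the image moving back and forth by one pixel, while putting it on the long frames makes it move by two.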