[My emulator] Graphics glitches - SuperMarioBros
NameTable 0:
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$?????$$$$$$$$$$ ????$$????$$$
$$$??????$$.)??$$$$?(?$$$$$$$$$$
$$$$$DHHHHHHHHHHHHHHHHHHHHI$$$$$
$$$$$FÐÑØØÞÑÐÚÞÑ&&&&&&&&&&J$$$$$
$$$$$FÒÓÛÛÛÙÛÜÛß&&&&&&&&&&J$$$$$
$$$$$FÔÕÔÙÛâÔÚÛà&&&&&&&&&&J$$$$$
$$$$$FÖ×Ö×á&ÖÝáá&&&&&&&&&&J$$$$$
$$$$$FÐèÑÐÑÞÑØÐÑ&ÞÑÞÑÐÑÐÑ&J$$$$$
$$$$$FÛBBÛBÛBÛÛB&ÛBÛBÛBÛB&J$$$$$
$$$$$FÛÛÛÛÛÛßÛÛÛ&ÛßÛßÛÛäå&J$$$$$
$$$$$FÛÛÛÞCÛàÛÛÛ&ÛãÛàÛÛæã&J$$$$$
$$$$$FÛÛÛÛBÛÛÛÔÙ&ÛÙÛÛÔÙÔÙçJ$$$$$
$$$$$_??????????x?????????z$$$$$
$$$$$$$$$$$$$Ï????$????????$$$$$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$$$$$$$Î$?$???"??$????$$$$$$$$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$$$$$$$$$?$???"??$????$$$$$$$$
$$$$12$$$$$$$$$$$$$$$$$$$$$$$$$$
$$$0&43$$$$$$$$$$$$$$$$$$$$$$$$$
$$0&&&&3$$$$???($??????$$$$$$$$$
$0&4&&4&3$$$$$$$$$$$$$$$676767$$
0&&&&&&&&3$$$$$$$$$$$$$5%%%%%%8$
´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ
¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·
´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ´µ
¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·¶·
Attribute table 0:
AA AA AA AA AA AA AA AA
00 55 55 55 55 55 55 55
55 55 55 55 55 55 55 55
55 55 55 55 55 55 55 00
00 00 99 AA AA AA 00 00
00 00 00 00 00 00 00 00
50 50 50 50 50 50 50 50
05 05 05 05 05 05 05 05
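For readers following the dump above, here is a minimal sketch of how one attribute byte maps to the palette number of a single tile. It assumes the standard layout (each byte covers a 4x4-tile area, two bits per 2x2-tile quadrant); the function name `AttributePalette` is hypothetical and not taken from anyone's emulator in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the 2-bit background palette number (0..3)
// for the tile at (tileX, tileY), with tileX in 0..31 and tileY in 0..29.
inline unsigned AttributePalette(const std::uint8_t (&attr)[64],
                                 unsigned tileX, unsigned tileY)
{
    assert(tileX < 32 && tileY < 30);
    // One attribute byte per 4x4 tiles, eight bytes per row of the table.
    unsigned byteIndex = (tileY / 4) * 8 + (tileX / 4);
    // Quadrant select: bits 0-1 top-left, 2-3 top-right,
    // 4-5 bottom-left, 6-7 bottom-right.
    unsigned shift = ((tileY & 2) << 1) | (tileX & 2);
    return (attr[byteIndex] >> shift) & 3;
}
```

With the 99 byte from the fifth row of the table, the left quadrants select palette 1 and the right quadrants select palette 2.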
Re: oddly green
Bisqwit wrote:
That is, all reads/writes to palette indexes, whether internally or through I/O, are routed through the following map:
00 => 0    08 => 0    10 => 0     18 => 0
01 => 1    09 => 9    11 => 11    19 => 19
02 => 2    0A => A    12 => 12    1A => 1A
03 => 3    0B => B    13 => 13    1B => 1B
04 => 0    0C => 0    14 => 0     1C => 0
05 => 5    0D => D    15 => 15    1D => 1D
06 => 6    0E => E    16 => 16    1E => 1E
07 => 7    0F => F    17 => 17    1F => 1F
This is wrong. $4, $8 and $C should not be mirrored down to 0 when reading/writing to/from the palette memory, only when rendering (if you want to think about it that way). $10, $14, $18 and $1C are mirrors of 0, $4, $8 and $C.
See http://wiki.nesdev.com/w/index.php/PPU_palettes for details.
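A minimal sketch of the corrected mapping, for illustration only. The function names `PaletteIndex` and `RenderIndex` are hypothetical. The first is meant for CPU reads and writes of palette memory, the second for the lookup done while drawing a pixel.

```cpp
#include <cassert>
#include <cstdint>

// Index used for palette memory reads and writes.
// Only $10, $14, $18 and $1C fold down onto $00, $04, $08 and $0C.
constexpr unsigned PaletteIndex(unsigned addr)
{
    return ((addr & 0x13) == 0x10) ? (addr & 0x0F) : (addr & 0x1F);
}

// Index used when rendering: a pixel whose low two bits are zero
// is drawn with entry 0, whichever palette was selected.
constexpr unsigned RenderIndex(unsigned paletteAndPixel)
{
    return (paletteAndPixel & 3) ? (paletteAndPixel & 0x1F) : 0;
}

static_assert(PaletteIndex(0x3F10) == 0x00, "$3F10 mirrors $3F00");
static_assert(PaletteIndex(0x3F04) == 0x04, "$3F04 is its own entry");
```

Keeping the two lookups separate means that $3F04, $3F08 and $3F0C keep whatever the game wrote there, even though those entries are not normally displayed.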
Re: oddly green
Thanks for the help. Though I already mentioned it in IRC; I got it working.
Turns out my PPU rendering loop was changing the nametable address mid-frame even while background rendering was disabled.
I have another problem... My emulator causes the game to crash.
This nice 187-kilobyte animated screenshot illustrates the problem. Repeated frames were removed from the first part so that the intro gets moving sooner.

When the mushroom appears, the game crashes. I also tried running a TAS on it, and while the TAS (for as long as it synced) did not trigger the mushroom, the game still crashed around the same spot.
(The image was stitched into a form that avoids global motion with my tool called "animmerger"; this makes the GIF smaller. However, it may appear in the end as if the screen jerks forward. This did not happen; the stitcher was just confused by the HUD suddenly disappearing as a result of the game's crash.)
I thought it would be sprite-0-hit related, but my emulator passes all Blargg's sprite hit tests... Yet the game still crashes.
This is interesting, because e.g. Rockman 1 plays just fine (and syncs with the TAS exactly as long as the real console does).
3gengames wrote:
Are you clearing the sprite 0 hit only when rendering begins? I'd believe the test ROM's would fail for that but you never know, good luck.
I initially cleared it at the same time as I clear the vblank flag. However, to make check "9) Cleared at end of VBL too late" of "09.timing_basics" in "sprite_hit_tests_2005.10.05" pass, I moved the clear to cycle 340 of the last vblank line: 2 cycles before the vblank flag is cleared, 1 cycle before the pre-render scanline begins.
The vblank clearing time was established to be 1 cycle after the end of vblank in order to make "ppu_vbl_nmi" test "03-vbl_clear_time" pass. (The CPU emulator passes all timing tests, including the cyclewise disassembly trace of nestest.nes, so it's not that the tests get wrong timing.)
Changing the clearing time did not affect the game either way.
The sprite hit flag is not cleared at any other time. It is also not cleared by a read of any port.
The OAM address (which is used for the sprite-prescan for next scanline at ppu scanline cycles 0..255) is cleared at cycle 0. In addition, at the processing of sprite 2 it is set to 8 as is done by nestopia. (Both of these only happen if sprite rendering is enabled.)
Of the test ROMs "nestest", "instr_test-v3", "instr_misc", "branch_timing_tests", "cpu_timing_testv6", "oam_read", "oam_stress", "ppu_vbl_test", "ppu_open_bus", "sprite_hit_tests_2005.10.05" and "sprite_hit_timing", my emulator currently fails only two tests:
-- "ppu_vbl_nmi" test "07-nmi_on_timing": I get two N lines rather than 5.
-- "instr_misc" test "04-dummy_reads_apu": APU is not implemented yet.
In addition, "ppu_sprite_overflow" seems to produce a number of fails, curiously, including an unexplained complaint about wrong VBL timing (despite the passing of ppu_vbl_test).
Here is how I do Vblank and NMI currently:
CPU:
- All memory accesses are synchronous with the PPU: a memory-write and memory-read both incur an immediate three PPU cycles before the I/O is performed, regardless of the type of memory accessed. The same goes for extra tick() calls incurred by certain opcodes that need them to ensure proper timing.
- At the beginning of opcode fetch (before the opcode is fetched), the NMI line is checked and saved into a variable.
- After the opcode is fetched (and the PPU has been allowed to run for 3 cycles), the just-saved NMI variable is checked. If a rising edge was detected (i.e. the line was up and it was not up the last time it was checked), the fetched opcode is discarded and replaced with BRK instead. NMI processing begins. (Though a BRK opcode is processed, special conditions ensure that the vector is loaded from $FFFA and that the flags pushed are ORed with #$20 rather than with #$30. The return address pushed to the stack is also calculated properly for NMI.)
PPU:
- At the beginning of every cycle, a bitwise AND of the NMI enable flag ($2000 bit 7) and the Vblank flag ($2002 bit 7) is pushed into an internal queue of NMI states. The third element of the queue is popped, and assigned to the NMI line polled by the CPU. This ensures that the CPU always receives the NMI flag at a two (or three?) PPU cycle delay. Doing the pushing before the next step also ensures that the "06-suppression" test passes, among others.
- At the beginning of every cycle, an internal variable called VBlankState is checked. If it was 1, the VBlank flag ($2002 bit 7) is set. If it was -1, the $2002 register is set to #$00 (which clears the VBlank flag). After these tests, VBlankState is set to 0.
- At the beginning of the 0th cycle of the 241st scanline (the first vblank scanline, after the one idle waste scanline that follows the rendering), VBlankState is set to 1. This is the internal flag.
- At the beginning of the 0th cycle of the -1th scanline (pre-render scanline), VBlankState is set to -1. This is the internal flag.
- When $2000 is written to (by the CPU), no special processing happens aside from storing to the register.
- When $2002 is read from (by the CPU), the VBlank flag is cleared. If VBlankState happened to be 1, it is also cleared.
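The steps listed above can be condensed into a minimal sketch like the following. It is written from the prose description only, not from the linked source, and every name in it (`PpuNmiModel`, `NmiEdgeDetector`, `read2002Vblank`) is hypothetical. It models only the VBlank flag rather than all of $2002, and assumes 341 cycles per scanline with scanlines running from -1 to 260.

```cpp
#include <cassert>
#include <deque>

// PPU side: VBlank flag scheduling plus the NMI delay queue.
struct PpuNmiModel {
    int  scanline    = 0;      // -1 is the pre-render scanline
    int  cycle       = 0;      // 0..340
    int  vblankState = 0;      // +1: set flag next cycle, -1: clear next cycle
    bool vblankFlag  = false;  // $2002 bit 7
    bool nmiEnable   = false;  // $2000 bit 7
    bool nmiLine     = false;  // the line the CPU polls
    std::deque<bool> nmiQueue{false, false};  // holds 3 entries after a push

    void tick() {
        // Push the current NMI condition, then pop the oldest of the three.
        nmiQueue.push_back(nmiEnable && vblankFlag);
        nmiLine = nmiQueue.front();
        nmiQueue.pop_front();
        // Apply the flag change scheduled during the previous cycle.
        if (vblankState == 1)  vblankFlag = true;
        if (vblankState == -1) vblankFlag = false;  // the post zeroes all of $2002
        vblankState = 0;
        // Schedule the next change at the start of the relevant scanlines.
        if (cycle == 0 && scanline == 241) vblankState = 1;
        if (cycle == 0 && scanline == -1)  vblankState = -1;
        // Advance the dot counter.
        if (++cycle == 341) { cycle = 0; if (++scanline == 261) scanline = -1; }
    }

    // CPU read of $2002: returns bit 7, clears it, cancels a pending set.
    bool read2002Vblank() {
        bool value = vblankFlag;
        vblankFlag = false;
        if (vblankState == 1) vblankState = 0;
        return value;
    }
};

// CPU side: rising-edge detection on the polled NMI line.
struct NmiEdgeDetector {
    bool previous = false;
    bool poll(bool line) {
        bool edge = line && !previous;
        previous = line;
        return edge;
    }
};
```

In this sketch, a read of $2002 that lands in the cycle right after the flag was scheduled cancels the set entirely, which is the kind of suppression the "06-suppression" test is described as exercising.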
Can someone point out what exactly I am doing wrong that causes the two NMI and VBL related tests to fail? (In addition to possible further hints towards solving the Mario crash.)
Bisqwit wrote:
During the processing of an LDA $2000, the PPU runs for 12 cycles in total. The PPU register access happens right after the 12th PPU cycle. (I could not find a test that tells which cycle the access should happen on.)
I've noticed the significant omission of this data from all documents. It's quite annoying.
When a device reads from or writes to another device, it requires time to pass before the read/write actually occurs. When two devices are supposed to do something at the exact same time, either a conflict occurs or one takes priority. This information is completely missing in NES documentation.
On the SNES, each CPU cycle is 6, 8, or 12 master clocks long. Reads against the PPU happen at total_clocks-4, and writes at total_clocks (i.e. after the PPU has run for the same amount of time as our opcode).
Internally, the behavior is that the data is there the entire time, but has to be sitting on the bus with /RD or /WR for the right amount of time before it is acted upon.
Right now, my best guess for NES is that, assuming all chips are at an equal time, CPU > APU > PPU. And a CPU read/write happens before PPU runs. If CPU accesses PPU $2007 during rendering, who the hell knows what happens. It's guessed that it will read/write whatever the PPU fetched last, but it's never explained.
http://bisqwit.iki.fi/src/nesemu1_vbl_test_skeleton.cc
Here is a link to my V-Blank / NMI timing test skeleton, stripped of all features not related to V-Blank / NMI timing testing (370 lines remain). It can be used to run Blargg's tests. Note that it does not include any graphical / audio output. It outputs only to the console. Lacking any mapper functions, it only supports the "rom_singles" versions.
byuu, changing the tick() to occur _after_ read() or write() requires changing the NMI delay buffer length from 3 elements to 6 elements to keep the test pass rate from getting worse. I find this unlikely to be correct...
Just from the sounds of it, I think there is a small timing error somewhere in your PPU. I clear the entire contents of $2002 at the beginning of scanline -1 (dot 0), and I pass all the relevant PPU tests.
One thing I was doing wrong was in my $4014 handling: after I added the cycles for the sprite DMA (513 CPU cycles), I didn't catch the PPU up immediately. This caused an error of a few dots, which drove me crazy while I was trying to figure it out.
How are you handling $4014?
beannaich wrote:
How are you handling $4014?
In the way shown above. When a write to $4014 is encountered, 256 reads and writes will be issued, each consuming one cpu tick (three ppu ticks). The write() call will therefore last 256*2+1 = 513 cpu cycles total (instead of the normal 1 cpu cycle), plus the additional time required by the opcode (opcode and operand fetches (3 cycles), possible indexing and possible misfiring (2 cycles)). These cycles are also done synchronously with the PPU. So no, that is not the reason either.
Can you look at the source code I provided (or just the algorithm description in the preceding post) and point out where the timing error is?
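For reference, the $4014 handling described in this post can be sketched roughly as below. This is an illustration with hypothetical names (`OamDmaModel`, `write4014`), not the code from the linked skeleton. It assumes only 2 KB of RAM as the DMA source and copies straight into an OAM array starting at index 0, ignoring the OAM address register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct OamDmaModel {
    std::array<std::uint8_t, 0x800> ram{};  // assumption: internal RAM only
    std::array<std::uint8_t, 256>   oam{};
    unsigned long cpuCycles = 0;
    unsigned long ppuCycles = 0;

    // One CPU cycle always advances the PPU by three cycles.
    void tick() { ++cpuCycles; ppuCycles += 3; }

    // The tick happens before the I/O itself, as described in the post.
    std::uint8_t read(unsigned addr) { tick(); return ram[addr & 0x7FF]; }
    void writeOam(unsigned index, std::uint8_t value) {
        tick();
        oam[index & 0xFF] = value;
    }

    // Write to $4014: 1 cycle for the write itself, then 256 read/write pairs.
    void write4014(std::uint8_t page) {
        tick();
        for (unsigned i = 0; i < 256; ++i)
            writeOam(i, read((static_cast<unsigned>(page) << 8) | i));
    }
};
```

Because every read and write goes through tick(), the PPU advances in lockstep across the whole transfer (256*2+1 = 513 CPU cycles, 1539 PPU cycles), so no separate catch-up step is needed afterwards.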