Mapper capability development: DMA theft, Register Spying

Myask
Posts: 965
Joined: Sat Jul 12, 2014 3:04 pm

Post by Myask »

Memblers wrote:The OAM data shows up on the data bus, so a cart can interact with it. I think that would be a neat way to copy data to the mapper, but that's for another topic. :)
DMA theft? Sure, why not.

Pseudocode for Verilog:
0. Wait for enable. (0b. Wait for a write of the destination to the mapper.)
1. On the next write of #$xx to $4014 (requires all CPU address lines!! And data lines, since we want to spy.)...
2. For yy = $00...$FF:
3. wait for CPU_ADDR = $xxyy [alternatively, $2004]
4. copy CPU_DATA to [destination]+yy on the cart
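
The steps above can be modelled in software before committing to HDL. What follows is a minimal behavioral sketch in Python, not Verilog and not cycle-accurate. The class name `DmaThief`, the 8 KiB cart memory size, and the helper `simulate_oam_dma` are all assumptions made for illustration; only the step order comes from the pseudocode.

```python
class DmaThief:
    """Behavioral model of steps 0-4; every name here is hypothetical."""

    def __init__(self, cart_size=0x2000):
        self.cart_ram = bytearray(cart_size)  # assumed cart-side memory
        self.dest = None    # step 0b: destination written to the mapper
        self.page = None    # step 1: #$xx latched from the $4014 write
        self.count = 0      # step 2: yy

    def set_destination(self, dest):
        """Step 0/0b: arm the mapper and record where the copy should land."""
        self.dest = dest
        self.page = None
        self.count = 0

    def bus_cycle(self, addr, data, is_write):
        """Call once per CPU bus cycle with the observed address and data."""
        if self.dest is None:
            return                      # not armed, ignore the bus
        if self.page is None:
            if is_write and addr == 0x4014:
                self.page = data        # step 1: remember the source page
            return
        # Step 3: wait for the DMA read of $xxyy.
        if not is_write and addr == ((self.page << 8) | self.count):
            # Step 4: copy CPU_DATA to [destination]+yy.
            self.cart_ram[(self.dest + self.count) % len(self.cart_ram)] = data
            self.count += 1
            if self.count == 0x100:     # all 256 bytes seen, disarm
                self.dest = None
                self.page = None


def simulate_oam_dma(thief, page, source):
    """Feed the thief the bus traffic of one OAM DMA (idle cycles omitted)."""
    thief.bus_cycle(0x4014, page, True)
    for yy, value in enumerate(source):
        thief.bus_cycle((page << 8) | yy, value, False)  # DMA read
        thief.bus_cycle(0x2004, value, True)             # DMA write to OAM
```

Typical use would be `thief.set_destination(0x0100)` followed by the bus traffic of a DMA from page $02. A DMA that arrives while the model is not armed leaves the cart memory untouched.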
===
Meanwhile, register spying would mainly require memory space, since all of the CPU address and data lines are already included:
on read/write to 16'b001x_xxxx_xxxx_xyyy
--copy to memory
(If $2005/$2006, twiddle the address latch; on a $2002 read, clear it)
on read/write to 16'b0100_0000_000y_yyyy
--copy to memory

Then, make the copies accessible (on $4018? $4019? $4009? $400D?) after a write selecting which register you want. (There are only 30, but several have multiple modes: $2005/$2006 take two bytes, $4016/$4017 have reads and writes with different meanings, plus we might want the internal scroll variables or something.)
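
A rough software model of the spying side, assuming the two address patterns above mean $2000-$3FFF (mirrored every 8 bytes) and $4000-$401F. The class `RegisterSpy` and its field names are hypothetical, and the readback port is left out because the thread does not settle on an address for it.

```python
class RegisterSpy:
    """Shadow copies of PPU and APU/IO register traffic (illustrative only)."""

    def __init__(self):
        self.ppu = [0] * 8       # last byte seen at $2000-$2007
        self.apu_io = [0] * 32   # last byte seen at $4000-$401F
        self.w = 0               # shadow of the $2005/$2006 write toggle
        # $2005/$2006 take two bytes each, so keep both halves separately.
        self.two_byte = {(5, 0): 0, (5, 1): 0, (6, 0): 0, (6, 1): 0}

    def bus_cycle(self, addr, data, is_write):
        if addr & 0xE000 == 0x2000:          # 001x_xxxx_xxxx_xyyy
            reg = addr & 0x0007
            self.ppu[reg] = data             # copy to memory
            if is_write and reg in (5, 6):
                self.two_byte[(reg, self.w)] = data
                self.w ^= 1                  # twiddle the address latch
            elif not is_write and reg == 2:
                self.w = 0                   # a $2002 read clears it
        elif addr & 0xFFE0 == 0x4000:        # 0100_0000_000y_yyyy
            self.apu_io[addr & 0x001F] = data
```

Keeping the two halves of $2005/$2006 in separate slots is one possible answer to the "multiple modes" concern; a real design might instead reconstruct the internal scroll variables directly.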
tepples
Posts: 22819
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Post by tepples »

Myask wrote:DMA theft [...] pseudocode for verilog
0. Wait for enable. (0b. Wait for a write of the destination to the mapper.)
1. On the next write of #$xx to $4014 (requires all CPU address lines!! And data lines, since we want to spy.)...
2. For yy = $00...$FF:
3. wait for CPU_ADDR = $xxyy [alternatively, $2004]
4. copy CPU_DATA to [destination]+yy on the cart
You don't need all pins. You just need M2, R/W, /ROMSEL, and A14-A12 to decode a $4xxx write and a $2xxx write.
  • The host program sets up mapper ports to receive the next DMA
  • The mapper decodes a write to $4000-$4FFF
  • If a write to $2000-$2FFF is not decoded within 8 cycles, cancel the request
  • The mapper decodes each of 256 writes to $2000-$2FFF
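
The reduced-pin sequence in the list above can be sketched as a small state machine. This is a Python model under several assumptions: each call to `cycle` stands for one CPU cycle sampled while M2 is high, the data bus is also visible to the mapper (needed to capture anything), and the names `classify` and `ReducedPinCapture` are invented for the example.

```python
def classify(m2, rw, romsel_n, a14_12):
    """Decode a write using only M2, R/W, /ROMSEL and A14-A12."""
    if not m2 or rw or not romsel_n:
        return None            # not a write below $8000
    if a14_12 == 0b100:
        return "4xxx"          # $4000-$4FFF
    if a14_12 == 0b010:
        return "2xxx"          # $2000-$2FFF
    return None


class ReducedPinCapture:
    TIMEOUT = 8                # cycles allowed between $4xxx and $2xxx writes

    def __init__(self):
        self.state = "idle"
        self.timer = 0
        self.captured = []

    def arm(self):
        """The host program sets up the mapper to receive the next DMA."""
        self.state = "armed"
        self.captured = []

    def cycle(self, m2, rw, romsel_n, a14_12, data):
        hit = classify(m2, rw, romsel_n, a14_12)
        if self.state == "armed":
            if hit == "4xxx":
                self.state = "waiting"
                self.timer = self.TIMEOUT
        elif self.state == "waiting":
            if hit == "2xxx":
                self.captured.append(data)
                self.state = "capturing"
            else:
                self.timer -= 1
                if self.timer == 0:
                    self.state = "idle"      # cancel the request
        elif self.state == "capturing":
            if hit == "2xxx":
                self.captured.append(data)
                if len(self.captured) == 256:
                    self.state = "done"
```

In this model an ordinary APU write to $4xxx while armed simply times out and cancels the request, so the program would have to re-arm before the real DMA.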
lidnariq
Site Admin
Posts: 11606
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

I've been playing around with putting an ethernet IC – Microchip's ENC624J600 – in a NES cart. It has a bunch of indirect addresses that seem ideal targets for hijacking the NES's OAM DMA. One of them, at the IC-local address 0x7E84 (a post-increment indirect memory access), isn't too hard to make overlap with $2004.

My real problem is that I haven't figured out what getting data rapidly into this IC is useful for, especially since the rest of the IC's 24 KiB of RAM can be memory-mapped, so it's not like one'd want to assemble an ethernet packet in NES-local RAM and then copy it in.
zzo38
Posts: 1096
Joined: Mon Feb 07, 2011 12:46 pm

Post by zzo38 »

I have thought of DMA theft and register spying too, although there are still many more possibilities. Another possibility I have thought of is that the DMA is just used to access a range of memory in sequence, which the mapper might somehow use.
3gengames
Formerly 65024U
Posts: 2284
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames »

I could think of one: tons of tiles being written to CHR-RAM. That alone would be worth it for a few people. It would allow for lots of smooth animation. CPU-side preparation for it during gameplay would be pretty simple, too. As long as the update is in vblank, you could take control of the chip entirely without worrying about bus collisions. Would be interesting to see.
lidnariq
Site Admin
Posts: 11606
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

DMA to CHR-RAM either requires an FPGA that can entirely interpose the PPU's address bus, or using FPGA-internal block RAM.

And because the NES CPU is so slow, DMA isn't clearly better than just making a dual-ported RAM, unless you're blitting uncompressed tiles from PRG ROM.
mikejmoffitt
Posts: 1353
Joined: Sun May 27, 2012 8:43 pm

Post by mikejmoffitt »

lidnariq wrote:DMA to CHR-RAM either requires an FPGA that can entirely interpose the PPU's address bus, or using FPGA-internal block RAM.

And because the NES CPU is so slow, DMA isn't clearly better than just making a dual-ported RAM, unless you're blitting uncompressed tiles from PRG ROM.
Not only that - doesn't the 6502 demand bus mastering at all times? You'd have to pull it off of both busses, making the interposer just a little bigger, just so you can DMA with the PRG bus as a source.
lidnariq
Site Admin
Posts: 11606
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

This is (I think?) still assuming hijacking OAM DMA, so the 2A03 as bus master isn't really a problem...
Myask
Posts: 965
Joined: Sat Jul 12, 2014 3:04 pm

Post by Myask »

lidnariq wrote:DMA to CHR-RAM either requires an FPGA that can entirely interpose the PPU's address bus, or using FPGA-internal block RAM.
The NES drives PPU address lines during vblank when not accessing it? Huh. I was suggesting catching the data on the read half of each DMA cycle pair rather than the write half.
lidnariq wrote:And because the NES CPU is so slow, DMA isn't clearly better than just making a dual-ported RAM, unless you're blitting uncompressed tiles from PRG ROM.
Other simple uses: blit to name/attribute tables, blitting initial data to [W]RAM, saving from RAM to WRAM.
Yes, a dual-ported RAM would also serve for CHR-RAM, and have the advantage of being usable outside of VBlank...but DMA is faster than what the CPU can do otherwise; the best a program can do to copy from one place to another is 8 cycles per byte (fully unrolled LDA abs / STA abs). [You can set up LDA imm, but that's just moving the cost elsewhere.]
DMA is 2 cycles per byte, isn't it? That's a big savings if you aren't going for dual-ported RAM.
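
The comparison can be worked through numerically. The sketch below uses the standard 6502 timings (LDA absolute and STA absolute at 4 cycles each, LDA immediate at 2) and the usual OAM DMA cost of 513 cycles for 256 bytes, plus one more when it starts on an odd CPU cycle. The function names are made up for this example.

```python
LDA_ABS = 4   # cycles, LDA absolute
STA_ABS = 4   # cycles, STA absolute
LDA_IMM = 2   # cycles, LDA immediate


def unrolled_copy_cycles(n_bytes):
    """Fully unrolled LDA abs / STA abs: 8 cycles per byte."""
    return n_bytes * (LDA_ABS + STA_ABS)


def immediate_copy_cycles(n_bytes):
    """Unrolled LDA #imm / STA abs: 6 cycles per byte, data baked into code."""
    return n_bytes * (LDA_IMM + STA_ABS)


def oam_dma_cycles(odd_start=False):
    """One OAM DMA moves 256 bytes in 513 cycles (514 on an odd start)."""
    return 513 + (1 if odd_start else 0)


if __name__ == "__main__":
    print(unrolled_copy_cycles(256))    # 2048
    print(immediate_copy_cycles(256))   # 1536
    print(oam_dma_cycles())             # 513
```

So for a 256-byte block the DMA comes out roughly four times faster than the unrolled absolute copy, which matches the "2 cycles per byte" figure once the fixed startup cycle is ignored.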
tepples wrote:You don't need all pins. You just need M2, R/W, /ROMSEL, and A14-A12 to decode a $4xxx write and a $2xxx write.
That seems a little iffy and/or misfire-capable. Certainly wouldn't work if you wanted to map anything into $4xxx.
tepples
Posts: 22819
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Post by tepples »

Myask wrote:
You don't need all pins. You just need M2, R/W, /ROMSEL, and A14-A12 to decode a $4xxx write and a $2xxx write.
That seems a little iffy and/or misfire-capable. Certainly wouldn't work if you wanted to map anything into $4xxx.
If there are things mapped into $4xxx, then there are obviously more CPU address lines going into the mapper to decode "this is a mapper port, not an APU port".

I think the idea is that the program does SEI and writes a mapper port to specify the destination for the copy. This sets up a state machine inside the mapper with the following states:
  1. CPU writes a destination to the DMA destination port on the mapper
  2. CPU writes the source page (data bits 7-0, the high byte of the source address) to $4xxx
  3. Mapper waits for a CPU read where source bits 7-4 match !(PRG /CE) and A14-A12
  4. DMA is on
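
One way to read the four states above, as a Python model rather than HDL. It assumes the byte written to $4xxx is the source page, so its bits 7-4 are compared against A15-A12 of later reads, with A15 inferred as the inverse of /ROMSEL. `State` and `CopyStateMachine` are hypothetical names, and each call to `cycle` stands for one CPU cycle sampled with M2 high.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    HAVE_DEST = 1      # state 1: destination port written
    HAVE_SOURCE = 2    # state 2: source page written to $4xxx
    DMA_ON = 3         # state 4: matching read seen, capture running


class CopyStateMachine:
    def __init__(self):
        self.state = State.IDLE
        self.dest = None
        self.src_nibble = None

    def write_dest_port(self, dest):
        """State 1: the CPU writes a destination to the mapper's port."""
        self.dest = dest
        self.state = State.HAVE_DEST

    def cycle(self, rw, romsel_n, a14_12, data):
        a15 = 0 if romsel_n else 1           # /ROMSEL low means A15 high
        top = (a15 << 3) | a14_12            # A15-A12 as a 4-bit value
        if self.state is State.HAVE_DEST and rw == 0 and top == 0x4:
            self.src_nibble = data >> 4      # state 2: keep source bits 7-4
            self.state = State.HAVE_SOURCE
        elif (self.state is State.HAVE_SOURCE and rw == 1
              and top == self.src_nibble):
            self.state = State.DMA_ON        # state 3 satisfied
```

Because only four address bits are compared, any ordinary read from the same 4 KiB region as the source (an opcode fetch, for instance) would also satisfy state 3 in this model, so the calling code would need to run from a different region.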
lidnariq
Site Admin
Posts: 11606
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

Yes, the PPU's address bus drivers never turn off. Why would they? That would increase complexity for no reason. And anyway, the PPU's bus is completely busy during rendering: every cycle is either dealing with the PPU's multiplexed address bus or actively transferring data over it.
Myask wrote:Yes, a dual-ported RAM would also serve for CHR-RAM, and have the advantage of being usable outside of VBlank...but DMA is faster than what the CPU can do otherwise;
Because the PPU's address bus drivers never turn off, hijacking DMA requires something functionally equivalent to dual-ported RAM anyway.

DMA is really just another way of moving time costs around. Regardless of whether one prepares a buffer in RAM and transfers it using a slow indexed loop, or unrolled LDA $x/STA $y, or more aggressively LDA #im / STA $y, or using DMA, it's still just additional time costs on top of the original data setup. Dual-ported RAM is the logical extreme: "no copy" transfers, because it's already where you want it to be.

Which is why I said that the only use for DMA in preference to dual-ported RAM is specifically DMAing uncompressed data from ROM ... or copying data from a coprocessor like on the SNES with the S-DD1.
tepples
Posts: 22819
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Post by tepples »

Or if DMA can be done using fewer logic resources than dual-ported RAM that can be both written and read back. Some codecs refer to previous decompressed data, and they wouldn't work quite as well on a write-only pseudo-dual-port scheme that uses a FIFO to queue mapper writes to be committed to VRAM during the 14 dummy fetches on each line. The obvious example is LZ77-family codecs. The tile codec I've used in my recent projects is mostly RLE, but it does have a few commands involving back-references:
  • Plane 0 $82 repeats the previous 16-byte packet verbatim.
  • Plane 0 $83 repeats a tile from the previous half of the circular buffer. This is used when decoding pattern tables $0000 and $1000 in parallel, and it dramatically improves compression ratio in NROM games that use the background pattern table select bit to animate some tiles (such as many Shiru games).
  • Plane 1 $82 repeats the previous 8 bytes, which produces a tile with colors 0 and 3.
  • Plane 1 $83 repeats the previous 8 bytes XOR'd with $FF, which produces a tile with colors 1 and 2.
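
The two plane 1 back-references can be checked with a few lines of Python. This is only an illustration of why they give those color pairs, not the codec itself: it assumes the standard NES tile layout (8 bytes of plane 0 followed by 8 bytes of plane 1, plane 1 supplying the high bit of each pixel), and the function names are invented.

```python
def plane1_from_command(cmd, plane0):
    """Build plane 1 from the previous 8 bytes (plane 0 of the same tile)."""
    if cmd == 0x82:
        return bytes(plane0)                      # verbatim repeat
    if cmd == 0x83:
        return bytes(b ^ 0xFF for b in plane0)    # repeat XOR'd with $FF
    raise ValueError("only the two back-reference commands are modelled")


def tile_colors(plane0, plane1):
    """Return the set of 2-bit color indices used by an 8x8 tile."""
    colors = set()
    for lo, hi in zip(plane0, plane1):
        for bit in range(8):
            colors.add(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1))
    return colors


if __name__ == "__main__":
    ring = bytes([0x3C, 0x42, 0x81, 0x81, 0x81, 0x81, 0x42, 0x3C])
    print(tile_colors(ring, plane1_from_command(0x82, ring)))  # colors 0 and 3
    print(tile_colors(ring, plane1_from_command(0x83, ring)))  # colors 1 and 2
```

With identical planes every pixel has both bits equal (0 or 3); with inverted planes the bits always differ (1 or 2).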
Or if we want to shift updates to vblank to avoid tearing. People bring up tearing when someone mentions CHR HDMA on Game Boy Color.
Myask
Posts: 965
Joined: Sat Jul 12, 2014 3:04 pm

Post by Myask »

lidnariq wrote:Yes, the PPU's address bus drivers never turn off. Why would they? That would increase complexity for no reason.
For no reason (read: I didn't think about it) I was thinking that the CPU would be controlling it for a DMA. Which is, of course, the exact opposite of the point of DMA: something ELSE is directly accessing memory.