lidnariq wrote:
supercat wrote: That would seem likely to work, but if there's a race condition between the DRAM machinery and the CPU cycles,
The DRAM circuitry is completely idle during forced and vertical blanking... with the caveat that the PAL 2C07 enables DRAM refresh for 50 scanlines before video rendering starts, and the PAL famiclone PPU (UA6538) enables DRAM refresh for 50 scanlines after video rendering ends.
All operations involving DRAM reads (and that includes partial-row writes) need to be carefully sequenced. The mechanisms that force a regular sequence of actions may be idle, but if one wants to, e.g., write to addresses 9 and 10 without affecting the other bytes in the row, one of two sequences of events must occur:
1. Bytes 8-15 are read into a buffer, bytes 1 and 2 of that buffer are written with new data, and the buffer is written back to the row.
2. Bytes 8-15 are read into a buffer, byte 1 of that buffer is written with new data, and the buffer is written back to the row. Then bytes 8-15 are read into a buffer again, byte 2 of that buffer is written, and the buffer is written back to the row.
In the second sequence, one could insert an arbitrary number of "read row X" and "write back row X" operations [with X being the same row as the other operations or a different row] between the first write-back and the second read, but the first half and the second half of that sequence must each be processed without other intervening operations.
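To make the ordering rules concrete, here is a toy behavioral model, not a description of the real 2C02 circuitry. It assumes 8-byte rows (taken from the "bytes 8-15" example) and that the array can only be read or written a whole row at a time; `ToyDram`, `rmw_byte`, `sequence_1`, `sequence_2` and `refresh` are hypothetical names. It checks that both legal sequences give the same result even with unrelated row refreshes slipped into the gap of sequence 2, and shows how an operation slipped *inside* one half silently loses data.

```python
ROW_SIZE = 8  # assumption: 8-byte rows, matching the "bytes 8-15" example


class ToyDram:
    """Hypothetical model: the array is only ever accessed a whole row at a time."""

    def __init__(self, size=256):
        self.cells = bytearray(size)

    def read_row(self, row):
        start = row * ROW_SIZE
        return bytearray(self.cells[start:start + ROW_SIZE])  # copy into a row buffer

    def write_row(self, row, buf):
        start = row * ROW_SIZE
        self.cells[start:start + ROW_SIZE] = buf


def rmw_byte(dram, addr, value):
    """One indivisible read / modify / write-back of a single byte."""
    row, col = divmod(addr, ROW_SIZE)
    buf = dram.read_row(row)
    buf[col] = value
    dram.write_row(row, buf)


def sequence_1(dram, a, va, b, vb):
    """Sequence 1: both bytes (which must share a row) patched in one buffer."""
    row = a // ROW_SIZE
    assert b // ROW_SIZE == row
    buf = dram.read_row(row)
    buf[a % ROW_SIZE] = va
    buf[b % ROW_SIZE] = vb
    dram.write_row(row, buf)


def sequence_2(dram, a, va, b, vb, between=lambda d: None):
    """Sequence 2: two separate read-modify-writes. `between` may perform any
    whole-row read/write-back pairs, but only in the gap between the halves."""
    rmw_byte(dram, a, va)
    between(dram)
    rmw_byte(dram, b, vb)


def refresh(dram, row):
    dram.write_row(row, dram.read_row(row))


# Both legal sequences give the same result for addresses 9 and 10.
d1, d2 = ToyDram(), ToyDram()
sequence_1(d1, 9, 0xAA, 10, 0xBB)
sequence_2(d2, 9, 0xAA, 10, 0xBB, between=lambda d: (refresh(d, 1), refresh(d, 5)))
assert d1.cells == d2.cells

# Illegal: another write lands inside the first half of a read-modify-write.
d3 = ToyDram()
buf = d3.read_row(1)      # first half starts: row 1 is buffered
rmw_byte(d3, 11, 0xCC)    # intervening write to the same row
buf[1] = 0xAA
d3.write_row(1, buf)      # the stale buffer is written back
print(hex(d3.cells[11]))  # prints 0x0: the write of 0xCC to byte 11 was lost
```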
Note that the number of discrete steps involving the DRAM array exceeds the number of CPU writes involved in performing them, so some kind of sequenced machinery is required even for accesses involving OAMADDR and OAMDATA.
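A small count illustrates the point about step counts. This is a hedged sketch, not the 2C02's actual sequencer: `ToyOamPort` and its methods are hypothetical, rows are again assumed to be 8 bytes, PPU timing is not modeled, and each data write is assumed to expand into the three steps of sequence 2 above. The one documented NES behavior it borrows is that a write to OAMDATA advances OAMADDR by one.

```python
ROW_SIZE = 8  # assumption: 8-byte rows, as in the "bytes 8-15" example


class ToyOamPort:
    """Hypothetical model: the CPU only sees an address latch (like OAMADDR)
    and a data port (like OAMDATA); a sequencer turns each data write into
    row-level DRAM steps. PPU timing is not modeled."""

    def __init__(self):
        self.cells = bytearray(256)
        self.addr = 0
        self.cpu_writes = 0
        self.dram_steps = []  # row-level operations the sequencer performed

    def write_oamaddr(self, value):
        self.cpu_writes += 1
        self.addr = value & 0xFF  # only latches an address; no DRAM access

    def write_oamdata(self, value):
        self.cpu_writes += 1
        row, col = divmod(self.addr, ROW_SIZE)
        start = row * ROW_SIZE
        # These three steps must run back to back with nothing in between.
        buf = bytearray(self.cells[start:start + ROW_SIZE])
        self.dram_steps.append(("read row", row))
        buf[col] = value
        self.dram_steps.append(("patch buffer byte", col))
        self.cells[start:start + ROW_SIZE] = buf
        self.dram_steps.append(("write back row", row))
        self.addr = (self.addr + 1) & 0xFF  # OAMDATA writes advance OAMADDR


port = ToyOamPort()
port.write_oamaddr(9)
port.write_oamdata(0xAA)  # byte 9
port.write_oamdata(0xBB)  # byte 10, via the auto-incremented address
print(port.cpu_writes, len(port.dram_steps))  # prints: 3 6
```

Three CPU writes turn into six DRAM-side steps, and each group of three has to run uninterrupted, which is why some sequencing hardware is needed even on this simple path.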
In most DRAM chips, however, the "area" of the array is orders of magnitude larger than that of any individual reservoir.
The DRAM inside the 2C02 is weird; I haven't seen anything like it. It's NMOS, it holds both the bit and the inversion of the bit, and it still takes four transistors. The only way it's smaller than the SRAM that's also used on the die is that the NMOS pull-up is shared along an entire column instead of sitting next to each bit.
Both "bit" and "notbit" go in/out to the DRAM interface logic... look in the vicinity of node 426 in Visual2C02.
I'd noticed the weird cell shape and wondered what was going on. Bulk DRAM uses a one-transistor cell, and until I saw the chip layout I would have guessed they'd use a three-transistor design with a storage transistor (source grounded), write transistor (connects storage gate to write bus), and read transistor (connects storage drain to read bus). That would avoid the destructive read issue, but I don't think that's what they're doing.
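The difference between a destructive 1T read and the non-destructive 3T read described above can be sketched behaviorally. This is a logic-level toy, not a circuit simulation: the class names are hypothetical, "charge is lost" is modeled crudely as `None`, and the 3T read assumes the read bus is precharged high, so a storage transistor that is turned on pulls it to ground and the data comes out inverted.

```python
class OneTransistorCell:
    """Behavioral sketch of a 1T cell: reading dumps the stored charge onto
    the bit line, so the value must be written back after every read."""

    def __init__(self, bit):
        self.charge = bit

    def read(self, restore=True):
        value = self.charge
        self.charge = None        # charge shared with the bit line: contents now unknown
        if restore:
            self.charge = value   # sense amplifier writes the value back
        return value


class ThreeTransistorCell:
    """Behavioral sketch of the 3T cell described above (storage transistor
    with grounded source, plus separate write and read transistors)."""

    def __init__(self, bit):
        self.gate = bit           # charge held on the storage transistor's gate

    def write(self, bit):
        self.gate = bit           # write transistor connects the write bus to the gate

    def read(self):
        # Assumption: the read bus is precharged high. If the storage transistor
        # is on, the read transistor lets it pull the bus to ground. The gate
        # charge is never disturbed, so the read is non-destructive but inverted.
        return 0 if self.gate else 1


one_t = OneTransistorCell(1)
print(one_t.read(restore=False), one_t.charge)       # prints: 1 None

three_t = ThreeTransistorCell(1)
print(three_t.read(), three_t.read(), three_t.gate)  # prints: 0 0 1
```

Repeated 3T reads return the same (inverted) value without any write-back, whereas the 1T cell needs the restore step after every read.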
BTW, the area savings from eliminating the pull-up are significant, since the pull-ups would need to have routing to VDD.